anonymizer
Offline · Open Source · AGPL-3.0

Offline PII redactor for legal documents.

Replaces names, companies, financial IDs, addresses, emails and phone numbers with structured tokens in .docx, .pdf and .xlsx files. Runs locally. Supports Russian and English. No telemetry.

$ curl -fsSL anonymizer.site/install | sh
Zero network · macOS · Windows · Linux · AGPL-3.0

Why this exists

Lawyers want AI feedback on contracts but can't paste raw client data into third-party tools. Manual redaction is slow and error-prone, especially for scanned documents. anonymizer automates the redaction step locally so the rest of the AI workflow stays unchanged.

What it strips

Each detected value is replaced with a stable token that keeps its grammatical position in the sentence. Numbering is consistent within a session.

Names: John Smith → [Person_1]
Companies: Acme Corp. → [Company_1]
Emails: j.smith@example.com → [Email_1]
Phones: +1 (415) 555-1234 → [Phone_1]
Tax IDs: 12-3456789 (EIN) → [Tax_ID_1]
IBANs: GB29 NWBK 6016... → [IBAN_1]
Cards: 4276 1300 ... → [Card_1]
Addresses: 1 Main St, New York → [Address_1]
Dates: 03/12/2024 → [Date_1]
IPs / MACs: 192.168.1.1 → [IP_1]
URLs: example.com/dashboard → [URL_1]
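
To make "consistent within a session" concrete, here is a minimal sketch of how a per-session token map could behave. The TokenManager class and its token_for method are hypothetical names chosen for illustration, not the project's actual API. The sketch also ignores normalization (case, inflected Russian name forms), which a real implementation would likely need.

```python
from collections import defaultdict


class TokenManager:
    """Hypothetical sketch: maps each distinct value to one token per session."""

    def __init__(self):
        self._counters = defaultdict(int)  # last index used, per category
        self._tokens = {}                  # (category, value) -> token

    def token_for(self, category: str, value: str) -> str:
        key = (category, value)
        if key not in self._tokens:
            # First time this value is seen: allocate the next number.
            self._counters[category] += 1
            self._tokens[key] = f"[{category}_{self._counters[category]}]"
        return self._tokens[key]


session = TokenManager()
first = session.token_for("Person", "John Smith")
second = session.token_for("Person", "Jane Doe")
repeat = session.token_for("Person", "John Smith")
print(first, second, repeat)  # [Person_1] [Person_2] [Person_1]
```

A fresh TokenManager starts numbering from 1 again, which matches the FAQ statement that tokens are not stable across sessions.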

How it works

Drop a file

Drag a .docx, .pdf or .xlsx into the local web UI.

Detect locally

Natasha and spaCy run named-entity recognition on your CPU. Regular expressions catch structured PII such as emails, phone numbers and IBANs. Processing never opens a socket.

Get tokenized doc

Structure preserved, metadata cleared. Original file untouched.

No data leaves your laptop. Ever.
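
The regex part of the detection step can be sketched roughly as follows. The two patterns and the redact function are illustrative assumptions, not the project's actual rules. The real detectors also cover more categories and run alongside the NER models.

```python
import re

# Illustrative patterns only; the project's real patterns are not shown here.
PATTERNS = {
    "Email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text: str) -> str:
    """Replace each match with a [Category_N] token, numbered per distinct value."""
    for category, pattern in PATTERNS.items():
        seen = {}  # value -> token, for this category

        def replace(match, category=category, seen=seen):
            value = match.group(0)
            if value not in seen:
                seen[value] = f"[{category}_{len(seen) + 1}]"
            return seen[value]

        text = pattern.sub(replace, text)
    return text


sample = "Mail j.smith@example.com from 192.168.1.1, cc j.smith@example.com."
print(redact(sample))  # Mail [Email_1] from [IP_1], cc [Email_1].
```

Everything here uses only the standard library re module, so no network access is involved at any point.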

Install

macOS / Linux
$ curl -fsSL anonymizer.site/install | sh
Windows PowerShell
$ iwr -useb anonymizer.site/install.ps1 | iex
Manual / corporate
$ uv tool install docs-anonymizer

See /docs/installation/manual for SHA256 and offline mirror options.
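
For manual installs, checking a downloaded file against a published SHA256 can be done with the Python standard library alone. The sketch below is a generic checksum check, not the project's own tooling. The helper names and the demo file are placeholders, and the expected digest must come from the documentation page mentioned above.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published value, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.whl"  # placeholder file, not a real wheel
        demo.write_bytes(b"abc")
        print(matches_published(demo, hashlib.sha256(b"abc").hexdigest()))
```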

Why it's safe to install

Zero network in core

An integration test asserts no socket opens during processing (tests/integration/test_no_network.py).
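
One common way to write such a check is to replace socket creation with a function that raises and then run the processing code. The sketch below shows that idea only. It is not the contents of test_no_network.py, and process is a hypothetical stand-in for the real entry point.

```python
import socket
from unittest import mock


class NetworkAccessError(AssertionError):
    """Raised when code under test tries to create a socket."""


def _blocked(*args, **kwargs):
    raise NetworkAccessError("socket opened during processing")


def run_without_network(func, *args, **kwargs):
    """Call func with socket creation blocked; any attempt raises NetworkAccessError."""
    with mock.patch.object(socket, "socket", _blocked), \
         mock.patch.object(socket, "create_connection", _blocked):
        return func(*args, **kwargs)


def process(text: str) -> str:
    # Hypothetical stand-in for the real document-processing entry point.
    return text.replace("John Smith", "[Person_1]")


print(run_without_network(process, "Signed by John Smith"))  # Signed by [Person_1]
```

Patching the socket module this way catches Python-level socket use. It would not catch code that bound the original socket class before the patch was applied, or network calls made from native extensions.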

Open source AGPL-3.0

The full source ships as an sdist alongside the wheel on PyPI.

No telemetry

Feedback is opt-in via an in-UI button. No passive analytics, ever.

Signed wheel + reproducible build

Coming in v0.2.

FAQ

Does it work on scanned PDFs?

Not yet. OCR is planned for MVP-1.

Why AGPL?

Because we use PyMuPDF, which is licensed under AGPL-3.0. The sdist ships alongside the wheel.

Can I disable detectors I don't need?

Yes. Each category has a toggle in the UI and in the config file.

Does it phone home?

Only when you click the 'Check for updates' button. Zero passive telemetry.

Will tokens stay consistent across runs?

Within a session, yes. Across sessions, no, by design. See the Token Manager docs.

Does it run on Linux?

Yes. Use the curl one-liner from the Install section, or 'uv tool install docs-anonymizer'.

Is it stable for production legal work?

MVP-0 is pilot-grade: it reaches ≥99% recall on P0 categories in the golden corpus. Decide for yourself.