Replaces names, companies, financial IDs, addresses, emails and phone numbers with structured tokens in .docx, .pdf and .xlsx files. Runs locally. Russian + English. No telemetry.
curl -fsSL anonymizer.site/install | sh
Lawyers want AI feedback on contracts but can't paste raw client data into third-party tools. Manual redaction is slow and error-prone, especially for scanned documents. anonymizer automates the redaction step locally so the rest of the AI workflow stays unchanged.
Stable tokens that preserve grammatical position. Numbering is consistent within a session.
John Smith → [Person_1]
Acme Corp. → [Company_1]
j.smith@example.com → [Email_1]
+1 (415) 555-1234 → [Phone_1]
12-3456789 (EIN) → [Tax_ID_1]
GB29 NWBK 6016... → [IBAN_1]
4276 1300 ... → [Card_1]
1 Main St, New York → [Address_1]
03/12/2024 → [Date_1]
192.168.1.1 → [IP_1]
example.com/dashboard → [URL_1]
Drag a .docx, .pdf or .xlsx into the local web UI.
Natasha + spaCy run on your CPU. Regex catches structured PII. Never opens a socket.
Structure preserved, metadata cleared. Original file untouched.
No data leaves your laptop. Ever.
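As a rough illustration of the regex step and the session-stable numbering described above, here is a minimal Python sketch. It is not the tool's actual code: `PATTERNS` and `TokenSession` are hypothetical names, only two structured categories are covered, and the Natasha/spaCy name recognition is left out entirely.

```python
import re
from collections import defaultdict

# Hypothetical patterns for two structured categories (assumption, not the
# tool's real rules); a production set would be broader and stricter.
PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


class TokenSession:
    """Hypothetical helper: one token per distinct value, stable within this object."""

    def __init__(self):
        self._tokens = {}                   # (category, value) -> token
        self._counters = defaultdict(int)   # category -> last number issued

    def token_for(self, category, value):
        key = (category, value)
        if key not in self._tokens:
            # First time this value is seen: issue the next number for its category.
            self._counters[category] += 1
            self._tokens[key] = f"[{category}_{self._counters[category]}]"
        return self._tokens[key]

    def anonymize(self, text):
        # Apply each category's pattern in turn, replacing matches with tokens.
        for category, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, c=category: self.token_for(c, m.group(0)), text
            )
        return text


session = TokenSession()
print(session.anonymize("Write to j.smith@example.com from 192.168.1.1"))
```

Because the mapping lives in the `TokenSession` object, a repeated value gets the same token for as long as that object exists, and a fresh object starts numbering again from 1.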
curl -fsSL anonymizer.site/install | sh
iwr -useb anonymizer.site/install.ps1 | iex
uv tool install docs-anonymizer
See /docs/installation/manual for SHA256 and offline mirror options.
An integration test asserts no socket opens during processing (tests/integration/test_no_network.py).
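One common way to write such a check, shown here only as a sketch, is to replace `socket.socket` with a function that raises while the code under test runs. The contents of the real test file are not shown in this document; `forbid_sockets`, `NetworkAccessError` and `process_document` below are hypothetical names, and `process_document` is a stand-in for the actual processing entry point.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkAccessError(RuntimeError):
    """Raised when code under test tries to create a socket."""


@contextmanager
def forbid_sockets():
    """Make any attempt to create a socket fail while the block runs."""

    def _blocked(*args, **kwargs):
        raise NetworkAccessError("socket creation attempted during processing")

    # mock.patch.object restores the original socket class on exit.
    with mock.patch.object(socket, "socket", _blocked):
        yield


def process_document(text):
    # Hypothetical stand-in for the real processing entry point.
    return text.upper()


def test_no_network():
    # Passes only if processing completes without creating a socket.
    with forbid_sockets():
        assert process_document("hello") == "HELLO"
```

A test runner such as pytest would collect `test_no_network` automatically. The patch applies only inside the `with` block, so other tests in the same run are unaffected.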
Full source ships as sdist alongside the wheel on PyPI.
Feedback is opt-in via an in-UI button. No passive analytics, ever.
Coming in v0.2.
Not yet. MVP-1 brings OCR.
Because we use PyMuPDF. Sdist ships alongside the wheel.
Yes, per-category toggles in UI and in the config file.
Only the user-clicked 'Check for updates' button. Zero passive telemetry.
Within a session, yes; across sessions, no. This is by design. See the Token Manager docs.
Yes, but no install one-liner yet. Use 'uv tool install docs-anonymizer'.
MVP-0 is pilot-grade; ≥99% recall on P0 categories from the golden corpus. Decide for yourself.