Detectors
anonymizer ships with detectors grouped into three tiers (P0, P1, P2) according to how costly a missed detection is.
P0 — Critical (must be ≥99% recall)
These are the categories where a single miss is unacceptable.
| Category | Detector type | Languages |
|---|---|---|
| Names (personal) | NER (Natasha / spaCy) | ru, en |
| Russian identifiers: ИНН (INN, taxpayer number), СНИЛС (SNILS, insurance number), ОГРН (OGRN, state registration number) | Regex + checksum | language-agnostic |
| Bank cards | Regex + Luhn | language-agnostic |
| IBAN | Regex + checksum | language-agnostic |
| Emails | Regex | language-agnostic |
| Phone numbers (RU, E.164) | Regex + format normalization | language-agnostic |
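The "Regex + checksum" and "Regex + Luhn" rows follow a common pattern. A regex proposes candidates, and a checksum rejects digit runs that only look like identifiers. The sketch below assumes that pattern. The function names, the card-number regex, and the decision to show only the ИНН check (not СНИЛС or ОГРН) are illustrative, and none of this is anonymizer's actual code. The checks themselves are the standard ones: Luhn for card numbers, the ISO 13616 mod-97 rule for IBANs, and the weighted ИНН check digits.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) check used by bank card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97 check: move the first 4 chars to the end, map A..Z to 10..35."""
    s = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    numeric = "".join(str(int(ch, 36)) for ch in s[4:] + s[:4])
    return int(numeric) % 97 == 1


def inn_valid(inn: str) -> bool:
    """ИНН check digits: 10 digits for legal entities, 12 for individuals."""
    def check(digits: str, weights: list[int]) -> int:
        # zip() stops at the end of the weights, so only the leading digits count.
        return sum(int(d) * w for d, w in zip(digits, weights)) % 11 % 10

    if not re.fullmatch(r"\d{10}|\d{12}", inn):
        return False
    if len(inn) == 10:
        return check(inn, [2, 4, 10, 3, 5, 9, 4, 6, 8]) == int(inn[9])
    return (check(inn, [7, 2, 4, 10, 3, 5, 9, 4, 6, 8]) == int(inn[10])
            and check(inn, [3, 7, 2, 4, 10, 3, 5, 9, 4, 6, 8]) == int(inn[11]))


# Hypothetical candidate pattern: 13-19 digits, optionally split by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def find_card_numbers(text: str) -> list[str]:
    """The regex proposes candidates, and the Luhn check drops random digit runs."""
    found = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

For example, `find_card_numbers("Card 4111 1111 1111 1111, ref 1234 5678 9012 3456")` keeps only the first number, because the second fails the Luhn check. A genuine card number always passes its checksum, so this filter removes false positives without costing recall on valid numbers.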
P1 — Important (≥95% recall)
| Category | Detector type | Languages |
|---|---|---|
| Companies (legal entities) | NER | ru, en |
| Addresses | NER + postal-code regex | ru, en |
| Dates | Regex (all dates masked in MVP-0) | language-agnostic |
| IP / MAC | Regex | language-agnostic |
| URLs | Regex | language-agnostic |
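The regex-only P1 rows can be illustrated with a small masking pass. This is a sketch under stated assumptions. The patterns, the `[IP]`/`[MAC]`/`[DATE]` placeholders, and `mask_p1_regex` are hypothetical, and only two date layouts are covered. It does reflect two points from the table: a bare regex over-matches IPv4 (for example `999.1.1.1`), so candidates are validated with Python's standard `ipaddress` module, and MVP-0 masks every date it finds.

```python
import ipaddress
import re

# Hypothetical candidate patterns; the real detectors may differ.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Six hex octets with one consistent separator (":" or "-").
MAC = re.compile(r"\b[0-9A-Fa-f]{2}([:-])[0-9A-Fa-f]{2}(?:\1[0-9A-Fa-f]{2}){4}\b")
# Two common layouts only: 31.12.2024 and 2024-12-31.
DATE = re.compile(r"\b(?:\d{2}\.\d{2}\.\d{4}|\d{4}-\d{2}-\d{2})\b")


def _mask_ipv4(match: re.Match) -> str:
    # The regex only finds candidates; ipaddress rejects octets above 255.
    try:
        ipaddress.IPv4Address(match.group())
    except ValueError:
        return match.group()
    return "[IP]"


def mask_p1_regex(text: str) -> str:
    text = IPV4.sub(_mask_ipv4, text)
    text = MAC.sub("[MAC]", text)
    text = DATE.sub("[DATE]", text)  # MVP-0 masks every date, not only birth dates
    return text


print(mask_p1_regex("Host 192.168.0.1 (aa:bb:cc:dd:ee:ff) seen 31.12.2024, bad 999.1.1.1"))
# prints: Host [IP] ([MAC]) seen [DATE], bad 999.1.1.1
```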
P2 — Warn-only
Categories in this tier are surfaced as warnings but are not replaced automatically in MVP-0. The user can opt in by masking them manually.
No categories are assigned to this tier in MVP-0.
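The tier rules can be summarized as a small policy. This sketch is illustrative only. `Tier`, `Finding`, `action_for`, and `meets_target` are hypothetical names. The assumption that P0 and P1 findings are always replaced automatically is inferred from the contrast with P2 and is not stated explicitly on this page. Recall is the standard TP / (TP + FN), compared against the per-tier targets given in the headings above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    P0 = "critical"
    P1 = "important"
    P2 = "warn-only"


# Recall targets from the P0/P1 headings; P2 has no target (assumption).
RECALL_TARGET = {Tier.P0: 0.99, Tier.P1: 0.95, Tier.P2: None}


@dataclass
class Finding:
    category: str
    tier: Tier
    start: int  # character offsets of the detected span
    end: int


def action_for(finding: Finding, user_masked: bool = False) -> str:
    """Assumed policy: P0/P1 are replaced automatically; P2 only warns unless the user masks it."""
    if finding.tier in (Tier.P0, Tier.P1):
        return "replace"
    return "replace" if user_masked else "warn"


def meets_target(tier: Tier, true_positives: int, false_negatives: int) -> bool:
    """Recall = TP / (TP + FN), checked against the tier's target."""
    target = RECALL_TARGET[tier]
    if target is None:
        return True
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when the evaluation set has no positives")
    return true_positives / total >= target
```

For example, a P0 detector that finds 989 of 1,000 labeled names has a recall of 98.9%, so `meets_target(Tier.P0, 989, 11)` returns `False`.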
What is NOT detected
- Free-form aliases or nicknames not seen in NER training data
- Handwritten names in scanned PDFs (no OCR in MVP-0)
- Implicit references like “my client” or “the daughter of”
- PII inside image attachments or signature blocks
If something important is missed, see Reporting feedback.