API Reference

This section documents the classes and functions provided by Zink.

Main Module (zink.zink)

zink.zink.redact(text, categories=None, placeholder=None, use_cache=True, use_json_mapping=True, extractor=None, merger=None, replacer=None, auto_parallel=False, chunk_size=1000, max_workers=4)

Module-level convenience function that uses a global instance for caching. If 'auto_parallel' is True and len(text) > chunk_size, the concurrency-based pipeline is used; otherwise the single-pass logic is used.
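The dispatch rule described above can be illustrated with a small self-contained sketch. It does not import Zink and is not Zink's implementation: the helper names (split_into_chunks, process, mask_digits) are hypothetical, and fixed-size character chunks with a thread pool are assumptions made only to show how 'auto_parallel', 'chunk_size', and 'max_workers' could interact.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, chunk_size):
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def process(text, handle_chunk, auto_parallel=False, chunk_size=1000, max_workers=4):
    """Take the chunked path only when auto_parallel is set and the text is long."""
    if auto_parallel and len(text) > chunk_size:
        chunks = split_into_chunks(text, chunk_size)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Executor.map() yields results in input order, so the pieces
            # can be joined back together directly.
            return "".join(pool.map(handle_chunk, chunks))
    # Short text, or auto_parallel disabled: single pass.
    return handle_chunk(text)


def mask_digits(chunk):
    """Hypothetical stand-in for the per-chunk redaction step."""
    return "".join("#" if ch.isdigit() else ch for ch in chunk)


short_out = process("call 555", mask_digits, auto_parallel=True)
long_text = "id 42 " * 400  # 2400 characters, longer than chunk_size
long_out = process(long_text, mask_digits, auto_parallel=True)
```

Note that naive fixed-size chunks can cut an entity in half at a chunk boundary. This sketch ignores that case; how Zink handles boundaries is not stated here.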

zink.zink.replace(text, categories=None, user_replacements=None, ensure_consistency=True, use_cache=True, use_json_mapping=True, extractor=None, merger=None, replacer=None, auto_parallel=False, chunk_size=1000, max_workers=4)

Module-level convenience function that uses a global instance for caching.

zink.zink.replace_with_my_data(text, categories=None, user_replacements=None, ensure_consistency=True, use_json_mapping=True, extractor=None, merger=None, replacer=None, auto_parallel=False, chunk_size=1000, max_workers=4)

Module-level convenience function. 'replace_with_my_data' typically does not rely on caching, but concurrency can still be used for large texts when 'auto_parallel' is True.

Extractor Module (zink.extractor)

class zink.extractor.EntityExtractor(model_name='deepanwa/NuNerZero_onnx')

Bases: object

predict(text, labels=None)

Predict entities in the given text.

Parameters:
  • text (str) -- The input text.

  • labels (list of str, optional) -- Only entities with these labels will be returned. If None, all detected entities are returned.

Returns:

A list of dictionaries, each containing 'start', 'end', 'label', and 'text'.

Return type:

list of dict
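To make the return shape concrete, here is a minimal sketch of a list of entity dictionaries and of the label filtering described for 'labels'. It runs without Zink or the model: the sample entities are written by hand, filter_by_labels is a hypothetical helper, and treating 'end' as an exclusive offset is an assumption.

```python
def filter_by_labels(entities, labels=None):
    """Keep only entities whose label is in labels; keep all when labels is None."""
    if labels is None:
        return list(entities)
    wanted = set(labels)
    return [ent for ent in entities if ent["label"] in wanted]


text = "Alice moved to Paris."

# Hand-written example of the documented shape: 'start', 'end', 'label', 'text'.
# Assumption: 'end' is exclusive, so text[start:end] == ent["text"].
entities = [
    {"start": 0, "end": 5, "label": "person", "text": "Alice"},
    {"start": 15, "end": 20, "label": "location", "text": "Paris"},
]

people = filter_by_labels(entities, ["person"])
everything = filter_by_labels(entities)
```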

Merger Module (zink.merger)

class zink.merger.EntityMerger

Bases: object

Merges entities based on their labels and positions in the text. This class is designed to handle entities that are close together or have the same label, merging them into a single entity when appropriate.

merge(entities, text)
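The merging rules are not spelled out here, so the following is only one plausible reading of the class description: entities with the same label that sit next to each other are fused into one span. The function merge_adjacent and its max_gap parameter are hypothetical and are not part of Zink.

```python
def merge_adjacent(entities, text, max_gap=1):
    """Fuse same-label entities separated by at most max_gap characters."""
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["label"] == ent["label"]
            and ent["start"] - prev["end"] <= max_gap
        ):
            # Extend the previous span and re-read its text from the source.
            prev["end"] = max(prev["end"], ent["end"])
            prev["text"] = text[prev["start"]:prev["end"]]
        else:
            merged.append(dict(ent))  # copy so the input list is not mutated
    return merged


text = "John Smith lives in Rome."
entities = [
    {"start": 0, "end": 4, "label": "person", "text": "John"},
    {"start": 5, "end": 10, "label": "person", "text": "Smith"},
    {"start": 20, "end": 24, "label": "location", "text": "Rome"},
]
merged = merge_adjacent(entities, text)
```

In this example "John" and "Smith" are one space apart and share a label, so they collapse into a single "John Smith" entity, while "Rome" is left unchanged.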

Result Module (zink.result)

class zink.result.PseudonymizationResult(original_text: str, anonymized_text: str, replacements: List[ReplacementDetail] = <factory>, features: Dict = <factory>)

Bases: object

Result of the pseudonymization process.

anonymized_text: str
features: Dict
original_text: str
replacements: List[ReplacementDetail]

class zink.result.ReplacementDetail(label: str, original: str, pseudonym: str, start: int, end: int, score: float)

Bases: object

Details about the replacement of a sensitive entity.

end: int
label: str
original: str
pseudonym: str
score: float
start: int
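The two result classes can be pictured with a local mirror of the signatures above. This sketch defines its own dataclasses rather than importing from zink.result; that the real classes are dataclasses, and that the '<factory>' defaults are an empty list and an empty dict, are assumptions based on the rendered signatures. The sample values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplacementDetail:
    """Local mirror of the documented fields for one replaced entity."""
    label: str
    original: str
    pseudonym: str
    start: int
    end: int
    score: float


@dataclass
class PseudonymizationResult:
    """Local mirror of the documented result container."""
    original_text: str
    anonymized_text: str
    # Assumed defaults: a fresh empty list / dict per instance.
    replacements: List[ReplacementDetail] = field(default_factory=list)
    features: Dict = field(default_factory=dict)


detail = ReplacementDetail(
    label="person", original="Alice", pseudonym="Dana", start=0, end=5, score=0.9
)
result = PseudonymizationResult(
    original_text="Alice called.",
    anonymized_text="Dana called.",
    replacements=[detail],
)

for item in result.replacements:
    print(f"{item.label}: {item.original!r} -> {item.pseudonym!r}")
```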

Replacer Subpackage (zink.replacer)

class zink.replacer.EntityReplacer(use_json_mapping=False)

replace_entities(entities, text, user_replacements=None)

Replace entities in the text with pseudonyms, with randomized replacements.

Parameters:
  • entities (list of dict) -- A list of dictionaries, each containing 'start', 'end', 'label', and 'text'.

  • text (str) -- The original text.

  • user_replacements (dict, optional) -- A dictionary of user-defined replacements for specific entity labels. If provided, these will override the JSON-based mappings.

Returns:

The text with entities replaced by pseudonyms.

Return type:

str
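As a rough illustration of span-based replacement with random choices, the sketch below substitutes each entity with a value picked from a per-label pool. It is not Zink's code: replace_spans, the pools dictionary, and the rng argument are hypothetical, and 'end' is assumed to be exclusive. Working from the last entity to the first keeps the earlier offsets valid even when a pseudonym has a different length than the original.

```python
import random


def replace_spans(entities, text, pools, rng=None):
    """Replace each entity span with a random pseudonym from its label's pool."""
    rng = rng or random.Random()
    # Go right to left so replacements do not shift offsets still to be used.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        choices = pools.get(ent["label"])
        if not choices:
            continue  # no pool for this label: leave the span untouched
        text = text[:ent["start"]] + rng.choice(choices) + text[ent["end"]:]
    return text


text = "Alice moved to Paris."
entities = [
    {"start": 0, "end": 5, "label": "person", "text": "Alice"},
    {"start": 15, "end": 20, "label": "location", "text": "Paris"},
]

# Single-item pools make the outcome of this example predictable.
pools = {"person": ["Dana"], "location": ["Oslo"]}
replaced = replace_spans(entities, text, pools)
```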

replace_entities_ensure_consistency(entities, text, user_replacements=None)

Replace entities in the text with pseudonyms, ensuring consistent replacements.

Parameters:
  • entities (list of dict) -- A list of dictionaries, each containing 'start', 'end', 'label', and 'text'.

  • text (str) -- The original text.

  • user_replacements (dict, optional) -- A dictionary of user-defined replacements for specific entity labels. If provided, these will override the JSON-based mappings.

Returns:

The text with entities replaced by pseudonyms.

Return type:

str
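One way to picture "consistent replacements" is that every occurrence of the same original value receives the same pseudonym. The sketch below shows that idea with a small lookup table keyed by label and original text. The function replace_consistently, the pools dictionary, and the rng argument are hypothetical, and whether Zink keys its mapping this way is an assumption.

```python
import random


def replace_consistently(entities, text, pools, rng=None):
    """Replace entity spans so equal (label, text) pairs share one pseudonym."""
    rng = rng or random.Random()
    chosen = {}  # (label, original text) -> pseudonym picked for it
    # Go right to left so replacements do not shift offsets still to be used.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        key = (ent["label"], ent["text"])
        if key not in chosen:
            choices = pools.get(ent["label"])
            if not choices:
                continue  # no pool for this label: leave the span untouched
            chosen[key] = rng.choice(choices)
        text = text[:ent["start"]] + chosen[key] + text[ent["end"]:]
    return text


text = "Alice met Bob. Alice left."
entities = [
    {"start": 0, "end": 5, "label": "person", "text": "Alice"},
    {"start": 10, "end": 13, "label": "person", "text": "Bob"},
    {"start": 15, "end": 20, "label": "person", "text": "Alice"},
]
pools = {"person": ["Dana", "Eli", "Finn", "Gus"]}
out = replace_consistently(entities, text, pools, rng=random.Random(0))
```

Both mentions of "Alice" end up with the same pseudonym. Nothing in this sketch prevents "Bob" from drawing that same name by chance, so a stricter variant would also track which pseudonyms are already taken.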

This subpackage provides various replacement strategies. It is used internally by the main zink.replace function.