15.1.18. crate_anon.anonymise.test_anonymisation
Copyright (C) 2015-2018 Rudolf Cardinal (rudolf@pobox.com).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <http://www.gnu.org/licenses/>.
Test the anonymisation for specific databases.
From the output, we have:

    n_replacements (POSITIVE)
    word_count (N)
    true_positive_confidential_masked (TP)
    false_positive_banal_masked (FP)
    false_negative_confidential_visible_known_to_source (FN)
    confidential_visible_but_unknown_to_source

Therefore, having summed across documents:

    TP + FP = POSITIVE
    NEGATIVE = N - POSITIVE
    TN = NEGATIVE - FN

and then we have everything we need. For all identifiers, we make FN equal to

    false_negative_confidential_visible_known_to_source
    + not_false_negative_confidential_visible_but_unknown_to_source

instead.
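
The arithmetic above can be sketched in Python as follows. This is a minimal illustration, not the module's own code: the function name `summarise_counts`, the `all_identifiers` flag, and the dictionary-per-document input are all assumptions made for the example. It also assumes that the "unknown to source" count added to FN is the `confidential_visible_but_unknown_to_source` column listed above, and that the derived measures of interest are the standard sensitivity, specificity and precision.

```python
from typing import Dict, Iterable


def summarise_counts(docs: Iterable[Dict[str, int]],
                     all_identifiers: bool = False) -> Dict[str, float]:
    """Sum per-document counts and derive TN plus some standard measures.

    Hypothetical helper: each element of ``docs`` is assumed to be a dict
    keyed by the column names listed in the module description.
    """
    positive = n = tp = fp = fn = 0
    for d in docs:
        positive += d["n_replacements"]
        n += d["word_count"]
        tp += d["true_positive_confidential_masked"]
        fp += d["false_positive_banal_masked"]
        fn += d["false_negative_confidential_visible_known_to_source"]
        if all_identifiers:
            # Assumption: for "all identifiers", FN also includes material
            # that is visible but was unknown to the source database.
            fn += d["confidential_visible_but_unknown_to_source"]

    negative = n - positive   # NEGATIVE = N - POSITIVE
    tn = negative - fn        # TN = NEGATIVE - FN

    return {
        "POSITIVE": positive, "NEGATIVE": negative, "N": n,
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        # Standard definitions; None when the denominator is zero.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
    }


# Made-up example counts for two documents.
example_docs = [
    {"n_replacements": 5, "word_count": 100,
     "true_positive_confidential_masked": 4,
     "false_positive_banal_masked": 1,
     "false_negative_confidential_visible_known_to_source": 1,
     "confidential_visible_but_unknown_to_source": 2},
    {"n_replacements": 3, "word_count": 50,
     "true_positive_confidential_masked": 3,
     "false_positive_banal_masked": 0,
     "false_negative_confidential_visible_known_to_source": 0,
     "confidential_visible_but_unknown_to_source": 1},
]
known = summarise_counts(example_docs)
everything = summarise_counts(example_docs, all_identifiers=True)
```

With these made-up counts, TP + FP equals POSITIVE (7 + 1 = 8), as the relations above require. Switching to all identifiers raises FN and lowers TN by the same amount.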

crate_anon.anonymise.test_anonymisation.get_docids(args: Any, fieldinfo: crate_anon.anonymise.test_anonymisation.FieldInfo, from_src: bool = True) → List[int]
    Generate a limited set of PKs for the documents.

crate_anon.anonymise.test_anonymisation.get_patientnum_anontext(docid: int, fieldinfo: crate_anon.anonymise.test_anonymisation.FieldInfo) → Tuple[Union[int, NoneType], Union[str, NoneType]]
    Fetches the anonymised text for a given document PK, plus the associated patient ID.

crate_anon.anonymise.test_anonymisation.get_patientnum_rawtext(docid: int, fieldinfo: crate_anon.anonymise.test_anonymisation.FieldInfo) → Tuple[Union[int, NoneType], Union[str, NoneType]]
    Fetches the original text for a given document PK, plus the associated patient ID.

crate_anon.anonymise.test_anonymisation.process_doc(docid: int, args: Any, fieldinfo: crate_anon.anonymise.test_anonymisation.FieldInfo, csvwriter: Any, first: bool, scrubdict: Dict[int, Dict[str, Any]]) → int
    Write the original and anonymised documents to disk, plus some counts to a CSV file.
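
The documentation above does not say what the `first` parameter of `process_doc` does. One plausible reading, offered purely as an assumption, is that it marks the first document, so that a CSV header row is written once before the first row of counts. The sketch below shows that pattern with the standard `csv` module. The helper `write_counts_row` and the example column names are hypothetical and are not part of CRATE.

```python
import csv
import io
from typing import Any, Dict


def write_counts_row(csvwriter: Any, row: Dict[str, Any], first: bool) -> None:
    """Write one document's counts; emit the header only for the first one.

    Hypothetical helper illustrating a "header on first row" pattern;
    it is not taken from crate_anon.
    """
    if first:
        csvwriter.writerow(list(row.keys()))    # header row, written once
    csvwriter.writerow(list(row.values()))      # counts for this document


# Write two made-up rows to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
rows = [
    {"docid": 1, "word_count": 100, "n_replacements": 5},
    {"docid": 2, "word_count": 50, "n_replacements": 3},
]
for i, row in enumerate(rows):
    write_counts_row(writer, row, first=(i == 0))
csv_lines = buffer.getvalue().splitlines()
```

Passing the writer in, as the `process_doc` signature does, lets one CSV file collect the counts for every document in a run.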