15.1.9. crate_anon.anonymise.dd


Copyright (C) 2015-2018 Rudolf Cardinal (rudolf@pobox.com).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <http://www.gnu.org/licenses/>.


Data dictionary classes for CRATE anonymiser.

The data dictionary (DD) is stored as a TSV file rather than as a database table, so that multiple authors can edit it easily.
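One reason a TSV file suits this is that it can be parsed with nothing more than the standard library. The sketch below reads a small tab-separated DD with a header row. The column subset (`src_db`, `src_table`, `src_field`, `dest_table`, `dest_field`) and the names `DDRow` and `read_dd_tsv` are hypothetical, chosen for illustration; they are not CRATE's own row class or reader.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


# Hypothetical subset of DD columns, for illustration only.
@dataclass(frozen=True)
class DDRow:
    src_db: str
    src_table: str
    src_field: str
    dest_table: str
    dest_field: str


def read_dd_tsv(text: str) -> List[DDRow]:
    """Parse a tab-separated data dictionary whose first line is a header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        DDRow(
            src_db=r["src_db"],
            src_table=r["src_table"],
            src_field=r["src_field"],
            dest_table=r["dest_table"],
            dest_field=r["dest_field"],
        )
        for r in reader
    ]


EXAMPLE_TSV = (
    "src_db\tsrc_table\tsrc_field\tdest_table\tdest_field\n"
    "sourcedb\tpatient\tpatient_id\tpatient\tpatient_id\n"
    "sourcedb\tpatient\tforename\t\t\n"  # empty destination: not copied
)
rows = read_dd_tsv(EXAMPLE_TSV)
```

Because each row is one line and each column one tab-separated cell, authors can edit the file in a spreadsheet or text editor and review changes line by line.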

class crate_anon.anonymise.dd.DataDictionary(config: Config)

Class representing an entire data dictionary.

check_against_source_db() → None

Check DD validity against the source database. Also caches the SQLAlchemy type of each source column.

check_valid(prohibited_fieldnames: List[str] = None, check_against_source_db: bool = True) → None

Check DD validity internally and, if check_against_source_db is True, also against the source database.
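A check against the source database amounts to confirming that every field the DD names actually exists, while recording each column's type for later use. The sketch below shows that idea with the standard `sqlite3` module and SQLite's `PRAGMA table_info`; it is a swapped-in illustration, not CRATE's SQLAlchemy-based implementation. The function names, the `(table, field)` row format and the error-message text are all hypothetical, and the `schema` dict stores declared SQLite type strings where CRATE caches SQLAlchemy types.

```python
import sqlite3
from typing import Dict, List, Tuple


def source_columns(conn: sqlite3.Connection) -> Dict[str, Dict[str, str]]:
    """Map table name -> {column name: declared type} (acts as a type cache)."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    schema: Dict[str, Dict[str, str]] = {}
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        schema[table] = {
            col: coltype
            for _cid, col, coltype, _nn, _dflt, _pk in conn.execute(
                f'PRAGMA table_info("{table}")'
            )
        }
    return schema


def check_rows_against_source(
    rows: List[Tuple[str, str]], schema: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return an error message for each (table, field) absent from the source."""
    errors = []
    for table, field in rows:
        if field not in schema.get(table, {}):
            errors.append(f"{table}.{field}: not found in source database")
    return errors


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER, forename TEXT)")
schema = source_columns(conn)
errors = check_rows_against_source(
    [("patient", "patient_id"), ("patient", "dob")], schema
)
```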

get_dest_table_for_src_db_table

For a given source database/table, return its destination table (or, if there are several, the first).

get_dest_tables

Return a SortedSet of all destination tables.

get_dest_tables_for_src_db_table

For a given source database/table, return a SortedSet of destination tables.
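The three destination-table lookups above can be sketched as free functions over a list of rows. This is an illustration under stated assumptions, not CRATE's code: `Row`, `dest_tables`, `dest_tables_for` and `dest_table_for` are hypothetical names, a sorted `list` stands in for `SortedSet`, an empty `dest_table` is assumed to mean "not copied", and returning `None` when there is no destination is a choice made here.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical minimal DD row; empty dest_table means 'not copied'."""
    src_db: str
    src_table: str
    dest_table: str


def dest_tables(rows: List[Row]) -> List[str]:
    # Sorted, de-duplicated destination table names (stand-in for SortedSet).
    return sorted({r.dest_table for r in rows if r.dest_table})


def dest_tables_for(rows: List[Row], src_db: str, src_table: str) -> List[str]:
    return sorted({
        r.dest_table
        for r in rows
        if r.src_db == src_db and r.src_table == src_table and r.dest_table
    })


def dest_table_for(rows: List[Row], src_db: str,
                   src_table: str) -> Optional[str]:
    # The single destination table, or the first in sorted order if several.
    tables = dest_tables_for(rows, src_db, src_table)
    return tables[0] if tables else None


rows = [
    Row("db1", "notes", "notes"),
    Row("db1", "notes", "notes_summary"),
    Row("db1", "audit", ""),
]
```

Here `dest_table_for(rows, "db1", "notes")` gives `"notes"`, because "first" means first in sort order, not first in file order.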

get_dest_tables_with_patient_info

Return a SortedSet of destination table names that have patient information.

get_fieldnames_for_src_table

For a given source database name/table, return a SortedSet of source fields.

get_int_pk_ddr

For a given source database name and table, return the DD row for the integer PK for that table.

Returns None if there is no such data dictionary row.

get_int_pk_name

For a given source database name and table, return the field name of the integer PK for that table.
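A sketch of the integer-PK lookups, assuming each row carries a boolean PK flag and a declared SQL type string. The `Row` fields `is_pk` and `src_datatype`, the `INTEGER_TYPES` set and the function names are hypothetical; CRATE's DD uses its own flags and type handling. Returning `None` from `int_pk_name` when there is no integer PK is an assumption made here.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical minimal DD row with a PK flag and a declared SQL type."""
    src_db: str
    src_table: str
    src_field: str
    src_datatype: str
    is_pk: bool


# Illustrative set of type names treated as integers.
INTEGER_TYPES = {"INT", "INTEGER", "BIGINT", "SMALLINT", "TINYINT"}


def int_pk_row(rows: List[Row], src_db: str, src_table: str) -> Optional[Row]:
    """Return the DD row for the table's integer PK, or None if there is none."""
    for r in rows:
        if (r.src_db == src_db and r.src_table == src_table and r.is_pk
                and r.src_datatype.upper() in INTEGER_TYPES):
            return r
    return None


def int_pk_name(rows: List[Row], src_db: str, src_table: str) -> Optional[str]:
    row = int_pk_row(rows, src_db, src_table)
    return row.src_field if row is not None else None


rows = [
    Row("db1", "patient", "patient_id", "INTEGER", True),
    Row("db1", "letter", "letter_guid", "VARCHAR(36)", True),  # non-integer PK
]
```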

get_optout_defining_fields

Return a SortedSet of (src_db, src_table, src_field, pidfield, mpidfield) tuples, one for each field that defines a patient opt-out.
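The five-element tuples pair each opt-out-defining field with its table's patient ID (PID) and master patient ID (MPID) fields. The sketch below assumes those roles are marked by boolean flags on each row; `is_pid`, `is_mpid`, `defines_optout` and `optout_defining_fields` are hypothetical names, and using an empty string when a table lacks a PID or MPID field is an assumption.

```python
from typing import Dict, List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical DD row with illustrative role flags."""
    src_db: str
    src_table: str
    src_field: str
    is_pid: bool = False
    is_mpid: bool = False
    defines_optout: bool = False


def optout_defining_fields(rows: List[Row]) -> List[Tuple[str, str, str, str, str]]:
    # First pass: find each table's PID and MPID field names.
    pid: Dict[Tuple[str, str], str] = {}
    mpid: Dict[Tuple[str, str], str] = {}
    for r in rows:
        key = (r.src_db, r.src_table)
        if r.is_pid:
            pid[key] = r.src_field
        if r.is_mpid:
            mpid[key] = r.src_field
    # Second pass: one tuple per opt-out-defining field, de-duplicated and sorted.
    result = set()
    for r in rows:
        if r.defines_optout:
            key = (r.src_db, r.src_table)
            result.add((r.src_db, r.src_table, r.src_field,
                        pid.get(key, ""), mpid.get(key, "")))
    return sorted(result)


rows = [
    Row("db1", "patient", "patient_id", is_pid=True),
    Row("db1", "patient", "nhs_number", is_mpid=True),
    Row("db1", "patient", "optout_flag", defines_optout=True),
]
```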

get_patient_src_tables_with_active_dest

For a given source database name, return a SortedSet of source tables that have an active destination table.

get_pk_ddr

For a given source database name and table, return the DD row for the PK for that table, whether integer or not.

Returns None if there is no such data dictionary row.

get_rows_for_dest_table

For a given destination table, return a SortedSet of DD rows.

get_rows_for_src_table

For a given source database name/table, return a SortedSet of DD rows.
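Row lookups by source table and by destination table can both be served from indexes built in one pass over the DD. The sketch below groups rows with `collections.defaultdict` and sorts each group so results are deterministic; `Row` and `index_rows` are hypothetical names, and sorted lists stand in for `SortedSet`.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical minimal DD row; empty dest_table means 'not copied'."""
    src_db: str
    src_table: str
    src_field: str
    dest_table: str


def index_rows(
    rows: List[Row],
) -> Tuple[Dict[Tuple[str, str], List[Row]], Dict[str, List[Row]]]:
    """Group rows by (source db, source table) and by destination table."""
    by_src: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    by_dest: Dict[str, List[Row]] = defaultdict(list)
    for r in rows:
        by_src[(r.src_db, r.src_table)].append(r)
        if r.dest_table:
            by_dest[r.dest_table].append(r)
    # Sort each group so lookups return rows in a stable order.
    for group in list(by_src.values()) + list(by_dest.values()):
        group.sort()
    return dict(by_src), dict(by_dest)


rows = [
    Row("db1", "note", "text", "note"),
    Row("db1", "note", "note_id", "note"),
    Row("db1", "audit", "who", ""),
]
by_src, by_dest = index_rows(rows)
```

Building the indexes once trades a little memory for constant-time lookups, which helps when the same table is queried repeatedly during anonymisation.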

get_scrub_from_db_table_pairs

Return a SortedSet of (source database name, source table) tuples for tables that contain fields with scrub_src (scrub-from) information.

get_scrub_from_rows

Return a SortedSet of DD rows for all fields containing scrub_src (scrub-from) information.
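The two scrub-from queries are related: the (database, table) pairs are simply the distinct tables of the scrub-from rows. A sketch, assuming an empty `scrub_src` means the field is not a scrub source; `Row`, `scrub_from_rows` and `scrub_from_db_table_pairs` are hypothetical names, and the `scrub_src` values below are illustrative.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical DD row; empty scrub_src means 'not a scrub source'."""
    src_db: str
    src_table: str
    src_field: str
    scrub_src: str


def scrub_from_rows(rows: List[Row]) -> List[Row]:
    return sorted(r for r in rows if r.scrub_src)


def scrub_from_db_table_pairs(rows: List[Row]) -> List[Tuple[str, str]]:
    # Distinct (db, table) pairs that hold at least one scrub-from field.
    return sorted({(r.src_db, r.src_table) for r in scrub_from_rows(rows)})


rows = [
    Row("db1", "patient", "forename", "patient"),
    Row("db1", "patient", "surname", "patient"),
    Row("db1", "contact", "phone", "thirdparty"),
    Row("db1", "note", "text", ""),
]
```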

get_source_databases

Return a SortedSet of source database names.

get_src_db_tablepairs

Return a SortedSet of (source database name, source table) tuples.

get_src_db_tablepairs_w_int_pk

Return a SortedSet of (source database name, source table) tuples for tables that have an integer PK.

get_src_db_tablepairs_w_pt_info

Return a SortedSet of (source database name, source table) tuples for tables that have patient information.

get_src_dbs_tables_for_dest_table

For a given destination table, return a SortedSet of (source database name, source table) tuples for the source tables that feed it.

get_src_dbs_tables_with_no_pt_info_int_pk

Return a SortedSet of (source database name, source table) tuples where the table has no patient information and has an integer PK.

get_src_dbs_tables_with_no_pt_info_no_pk

Return a SortedSet of (source database name, source table) tuples where the table has no patient information and no integer PK.
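Several of the table-pair queries above partition source tables by two properties: whether they hold patient information and whether they have an integer PK. The sketch below computes that partition in one pass. It assumes, for illustration only, that "has patient information" means "has a field flagged as the patient ID"; `Row`, `is_pid`, `is_int_pk` and `classify_tables` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical DD row with illustrative flags."""
    src_db: str
    src_table: str
    src_field: str
    is_pid: bool = False     # field holds the patient ID
    is_int_pk: bool = False  # field is the table's integer primary key


Pair = Tuple[str, str]


def classify_tables(rows: List[Row]) -> Dict[str, List[Pair]]:
    """Split (db, table) pairs by patient information and integer PK."""
    has_pt: Dict[Pair, bool] = {}
    has_pk: Dict[Pair, bool] = {}
    for r in rows:
        key = (r.src_db, r.src_table)
        has_pt[key] = has_pt.get(key, False) or r.is_pid
        has_pk[key] = has_pk.get(key, False) or r.is_int_pk
    pairs = sorted(has_pt)
    return {
        "with_pt_info": [p for p in pairs if has_pt[p]],
        "no_pt_info_int_pk": [p for p in pairs if not has_pt[p] and has_pk[p]],
        "no_pt_info_no_pk": [p for p in pairs if not has_pt[p] and not has_pk[p]],
    }


rows = [
    Row("db1", "patient", "patient_id", is_pid=True, is_int_pk=True),
    Row("db1", "lookup", "code_id", is_int_pk=True),
    Row("db1", "settings", "name"),
]
groups = classify_tables(rows)
```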

get_src_tables

For a given source database name, return a SortedSet of source tables.

get_src_tables_with_active_dest

For a given source database name, return a SortedSet of source tables that have an active destination table.

get_src_tables_with_patient_info

For a given source database name, return a SortedSet of source tables that have patient information.

get_tsv() → str

Return the DD in TSV format.
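Producing TSV output is the inverse of reading it, and a round trip is a convenient sanity check. The sketch below uses `csv.DictWriter` with a tab delimiter and `"\n"` line endings; the `COLUMNS` subset and the names `rows_to_tsv` and `tsv_to_rows` are hypothetical and do not reflect CRATE's actual column list.

```python
import csv
import io
from typing import Dict, List

# Hypothetical subset of DD columns, in output order.
COLUMNS = ["src_db", "src_table", "src_field", "dest_table", "dest_field"]


def rows_to_tsv(rows: List[Dict[str, str]]) -> str:
    """Serialise rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def tsv_to_rows(text: str) -> List[Dict[str, str]]:
    return [dict(r) for r in csv.DictReader(io.StringIO(text), delimiter="\t")]


rows = [{"src_db": "db1", "src_table": "patient", "src_field": "forename",
         "dest_table": "", "dest_field": ""}]
tsv = rows_to_tsv(rows)
```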

has_active_destination

For a given source database name and table, return whether that table has an active destination.
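A sketch of what "active destination" could mean: a source table is active if at least one of its fields is copied to a destination table rather than omitted. That definition, the `omit` flag, and the names `table_has_active_dest` and `src_tables_with_active_dest` are assumptions for illustration, not CRATE's documented rule.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """Hypothetical DD row; 'omit' marks a field excluded from the destination."""
    src_db: str
    src_table: str
    src_field: str
    dest_table: str
    omit: bool


def table_has_active_dest(rows: List[Row], src_db: str, src_table: str) -> bool:
    """True if any field of the table is copied (not omitted) to a destination."""
    return any(
        r.src_db == src_db and r.src_table == src_table
        and bool(r.dest_table) and not r.omit
        for r in rows
    )


def src_tables_with_active_dest(rows: List[Row], src_db: str) -> List[str]:
    tables = {r.src_table for r in rows if r.src_db == src_db}
    return sorted(t for t in tables if table_has_active_dest(rows, src_db, t))


rows = [
    Row("db1", "patient", "forename", "patient", True),
    Row("db1", "patient", "patient_id", "patient", False),
    Row("db1", "audit", "who", "audit", True),
]
```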

read_from_file(filename: str) → None

Read the DD from a TSV file.

read_from_source_databases(report_every: int = 100) → None

Create a draft DD from the source database(s).
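A draft DD is essentially one row per source column, which a human then edits to set destinations and scrubbing. The sketch below builds such draft rows from a SQLite database using the standard `sqlite3` module and `PRAGMA table_info`; this is a swapped-in technique for illustration, not the reflection mechanism CRATE uses, and the function name `draft_dd_from_sqlite` and the four-element row format are hypothetical. The sketch does not model the `report_every` parameter.

```python
import sqlite3
from typing import List, Tuple


def draft_dd_from_sqlite(conn: sqlite3.Connection,
                         db_name: str) -> List[Tuple[str, str, str, str]]:
    """Return draft rows (src_db, src_table, src_field, src_datatype)."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk),
        # in column order. Simple double-quoting; fine for ordinary names.
        for _cid, col, coltype, _nn, _dflt, _pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            rows.append((db_name, table, col, coltype))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, forename TEXT)")
draft = draft_dd_from_sqlite(conn, "sourcedb")
```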