15.1.15. crate_anon.anonymise.models


Copyright (C) 2015-2018 Rudolf Cardinal (rudolf@pobox.com).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <http://www.gnu.org/licenses/>.


To create a SQLAlchemy Table programmatically:
http://docs.sqlalchemy.org/en/latest/core/schema.html
http://stackoverflow.com/questions/5424942/sqlalchemy-model-definition-at-execution
http://stackoverflow.com/questions/2580497/database-on-the-fly-with-scripting-languages/2580543#2580543
To create a SQLAlchemy ORM programmatically:
http://stackoverflow.com/questions/2574105/sqlalchemy-dynamic-mapping/2575016#2575016
class crate_anon.anonymise.models.OptOutMpid(**kwargs)
mpid

Patient ID

class crate_anon.anonymise.models.OptOutPid(**kwargs)
pid

Patient ID

class crate_anon.anonymise.models.PatientInfo(**kwargs)

Design decision in this class:

  • It gets too complicated if you try to make the fieldnames arbitrary and determined by the config.

  • So we always use ‘pid’, ‘rid’, etc.

    • Older config settings that this decision removes:

      mapping_patient_id_fieldname
      mapping_master_id_fieldname
      
    • These settings remain in active use, because they set the column names in the OUTPUT database (not the mapping database):

      research_id_fieldname
      trid_fieldname
      master_research_id_fieldname
      source_hash_fieldname
      
  • The config is allowed to set three column types:

    • the source PID type (e.g. INT, BIGINT, VARCHAR)
    • the source MPID type (e.g. BIGINT)
    • the encrypted (RID, MRID) type, which is set by the encryption algorithm (e.g. VARCHAR(128) for SHA-512).
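
The VARCHAR(128) figure follows from the digest size: SHA-512 produces 64 bytes, which is 128 characters when hex-encoded. The sketch below illustrates this with the standard library only. The key, the sample PID and the hash_identifier helper are hypothetical, and the use of a keyed HMAC is an assumption for illustration, not a description of how CRATE derives its RIDs.

```python
import hashlib
import hmac

# Hypothetical secret key and patient ID, for illustration only.
SECRET_KEY = b"example-secret-key"
pid = 12345


def hash_identifier(identifier, key=SECRET_KEY):
    """Return the hex-encoded HMAC-SHA-512 of an identifier (illustrative)."""
    message = str(identifier).encode("utf-8")
    return hmac.new(key, message, hashlib.sha512).hexdigest()


rid = hash_identifier(pid)

# SHA-512 yields a 64-byte digest, i.e. 128 hexadecimal characters,
# so a column storing it as hex text must hold 128 characters.
assert len(rid) == hashlib.sha512().digest_size * 2 == 128
```

A different algorithm would change the required width (SHA-256, for instance, gives 64 hex characters), which is why the encrypted column type depends on the algorithm rather than being a free choice.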
mpid

Master patient ID (MPID)

mrid

Master research ID (MRID)

patient_scrubber_text

Raw patient scrubber (for debugging only)

pid

Patient ID (PID) (PK)

rid

Research ID (RID)

scrubber_hash

Scrubber hash (for change detection)

tp_scrubber_text

Raw third-party scrubber (for debugging only)

trid

Transient integer research ID (TRID)

class crate_anon.anonymise.models.TridRecord(**kwargs)
classmethod new_trid(session: sqlalchemy.orm.session.Session, pid: Union[int, str]) → int

We check for existence by attempting the insert and letting the database accept or reject it, rather than by querying first, since other processes may be doing the same thing at the same time and a separate check would be open to a race.
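
The insert-first pattern described above can be sketched as follows. This is a minimal illustration using the standard-library sqlite3 module rather than SQLAlchemy. The table name trid_demo, the candidates argument and this function's signature are assumptions made for the example, not CRATE's implementation.

```python
import sqlite3


def new_trid(conn, pid, candidates):
    """Try candidate TRIDs in turn; return the first one the database accepts.

    Assumes that pid has no row yet (illustrative sketch only).
    """
    for trid in candidates:
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute(
                    "INSERT INTO trid_demo (pid, trid) VALUES (?, ?)",
                    (pid, trid),
                )
            return trid
        except sqlite3.IntegrityError:
            # A uniqueness constraint rejected the row: move on to the
            # next candidate instead of having checked beforehand.
            continue
    raise RuntimeError("no free TRID among the candidates")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trid_demo ("
    "pid INTEGER PRIMARY KEY, trid INTEGER NOT NULL UNIQUE)"
)

first = new_trid(conn, pid=1001, candidates=[7, 8, 9])
second = new_trid(conn, pid=1002, candidates=[7, 8, 9])  # 7 is taken
```

Because the uniqueness constraint is enforced by the database itself, two processes racing for the same value cannot both succeed: one insert is rejected and that process simply tries again.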

pid

Patient ID (PID) (PK)

trid

Transient integer research ID (TRID)