6.4. Preprocessing tools

These tools reshape specific databases for CRATE.

6.4.1. crate_preprocess_rio

The RiO preprocessor creates a unique integer field named crate_pk in every table. It does this by copying the existing integer PK, by creating one from an existing non-integer primary key, or by adding a new one using SQL Server’s INT IDENTITY(1, 1) type. For all patient tables, it converts the patient ID (RiO number) to an integer column called crate_rio_number. It then adds indexes and views. All of these can be removed again, or updated incrementally if you add new data.
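The following Python sketch illustrates the kind of SQL Server statements the first two steps involve. It is not CRATE's implementation. The function names, the BIGINT type for copied keys and the default `ClientID` column name are assumptions. The case of deriving a key from a non-integer primary key is omitted because its method is not described here.

```python
from typing import List, Optional


def crate_pk_statements(table: str, int_pk_col: Optional[str]) -> List[str]:
    """Hypothetical sketch: SQL Server statements giving `table` a crate_pk column."""
    if int_pk_col is not None:
        # An integer PK already exists: add a column and copy its values across.
        return [
            f"ALTER TABLE {table} ADD crate_pk BIGINT NULL",
            f"UPDATE {table} SET crate_pk = {int_pk_col}",
        ]
    # No usable key: SQL Server numbers the existing rows 1, 2, 3, ...
    # (Deriving a key from a non-integer PK is not shown.)
    return [f"ALTER TABLE {table} ADD crate_pk INT IDENTITY(1, 1) NOT NULL"]


def rio_number_statements(table: str, rio_col: str = "ClientID") -> List[str]:
    """Hypothetical sketch: add an integer copy of the RiO number.

    The default source column name is an assumption; real names may differ.
    """
    return [
        f"ALTER TABLE {table} ADD crate_rio_number BIGINT NULL",
        f"UPDATE {table} SET crate_rio_number = CAST({rio_col} AS BIGINT)",
    ]
```

Each returned statement is meant to be executed on its own. SQL Server compiles a whole batch before running it, so it rejects an UPDATE that refers to a column added earlier in the same batch.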

The views ‘denormalize’ the data for convenience, because it can be hard to follow the chain of keys through fully normalized tables. The views mostly use the names from the Servelec RiO CRIS Extraction Program (RCEP), with some added consistency. User lookups are common, so the following abbreviations are used to save typing and, in some cases, to keep column names within MySQL’s 64-character limit:

_Resp_Clinician_ … Responsible Clinician
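The following minimal Python sketch shows how the abbreviation helps. It applies the abbreviation to a view column name and checks the result against MySQL's 64-character limit. The function, the single-entry mapping and the example column names are hypothetical. Only the limit and the abbreviation come from the text above.

```python
# MySQL limits column names to 64 characters.
MYSQL_MAX_COLUMN_NAME_LENGTH = 64

# Hypothetical subset of the abbreviation table above.
ABBREVIATIONS = {
    "Responsible_Clinician": "Resp_Clinician",
}


def view_column_name(full_name: str) -> str:
    """Hypothetical sketch: abbreviate a view column name, then check its length."""
    name = full_name
    for long_form, short_form in ABBREVIATIONS.items():
        name = name.replace(long_form, short_form)
    if len(name) > MYSQL_MAX_COLUMN_NAME_LENGTH:
        raise ValueError(
            f"{name!r} has {len(name)} characters; "
            f"MySQL allows {MYSQL_MAX_COLUMN_NAME_LENGTH}"
        )
    return name
```

For example, `view_column_name("Responsible_Clinician_User_Name")` returns `"Resp_Clinician_User_Name"`.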

Options as of 2017-02-28:

usage: crate_preprocess_rio [-h] --url URL [-v] [--print] [--echo] [--rcep]
                            [--drop-danger-drop] [--cpft] [--debug-skiptables]
                            [--prognotes-current-only | --prognotes-all]
                            [--clindocs-current-only | --clindocs-all]
                            [--allergies-current-only | --allergies-all]
                            [--audit-info | --no-audit-info]
                            [--postcodedb POSTCODEDB]
                            [--geogcols [GEOGCOLS [GEOGCOLS ...]]]
                            [--settings-filename SETTINGS_FILENAME]

*   Alters a RiO database to be suitable for CRATE.

*   By default, this treats the source database as being a copy of a RiO
    database (slightly later than version 6.2; exact version unclear).
    Use the "--rcep" (+/- "--cpft") switch(es) to treat it as a
    Servelec RiO CRIS Extract Program (RCEP) v2 output database.


optional arguments:
  -h, --help            show this help message and exit
  --url URL             SQLAlchemy database URL
  -v, --verbose         Verbose
  --print               Print SQL but do not execute it. (You can redirect the
                        printed output to create an SQL script.)
  --echo                Echo SQL
  --rcep                Treat the source database as the product of Servelec's
                        RiO CRIS Extract Program v2 (instead of raw RiO)
  --drop-danger-drop    REMOVES new columns and indexes, rather than creating
                        them. (There's not very much danger; no real
                        information is lost, but it might take a while to
                        recalculate it.)
  --cpft                Apply hacks for Cambridgeshire & Peterborough NHS
                        Foundation Trust (CPFT) RCEP database. Only applicable
                        with --rcep
  --debug-skiptables    DEBUG-ONLY OPTION. Skip tables (view creation only)
  --prognotes-current-only
                        Progress_Notes view restricted to current versions
                        only (* default)
  --prognotes-all       Progress_Notes view shows old versions too
  --clindocs-current-only
                        Clinical_Documents view restricted to current versions
                        only (*)
  --clindocs-all        Clinical_Documents view shows old versions too
  --allergies-current-only
                        Client_Allergies view restricted to current info only
  --allergies-all       Client_Allergies view shows deleted allergies too (*)
  --audit-info          Audit information (creation/update times) added to
                        views
  --no-audit-info       No audit information added (*)
  --postcodedb POSTCODEDB
                        Specify database (schema) name for ONS Postcode
                        Database (as imported by CRATE) to link to addresses
                        as a view. With SQL Server, you will have to specify
                        the schema as well as the database; e.g. "--postcodedb
                        ONS_PD.dbo"
  --geogcols [GEOGCOLS [GEOGCOLS ...]]
                        List of geographical information columns to link in
                        from ONS Postcode Database. BEWARE that you do not
                        specify anything too identifying. Default: pcon pct
                        nuts lea statsward casward lsoa01 msoa01 ur01ind oac01
                        lsoa11 msoa11 parish bua11 buasd11 ru11ind oac11 imd
  --settings-filename SETTINGS_FILENAME
                        Specify filename to write draft ddgen_* settings to,
                        for use in a CRATE anonymiser configuration file.
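The paired options and their (*) defaults are easy to misread. The help text above appears to be Python argparse output. The following hypothetical re-creation of a subset of the interface is not CRATE's own source. It shows which value each view setting takes when you give neither flag of a pair. It also enforces the stated rule that --cpft applies only with --rcep.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of some crate_preprocess_rio options."""
    parser = argparse.ArgumentParser(prog="crate_preprocess_rio")
    parser.add_argument("--url", required=True, help="SQLAlchemy database URL")
    parser.add_argument("--rcep", action="store_true")
    parser.add_argument("--cpft", action="store_true")
    # (destination, "on" flag, "off" flag, default taken from the (*) marks)
    pairs = [
        ("prognotes_current_only", "--prognotes-current-only", "--prognotes-all", True),
        ("clindocs_current_only", "--clindocs-current-only", "--clindocs-all", True),
        ("allergies_current_only", "--allergies-current-only", "--allergies-all", False),
        ("audit_info", "--audit-info", "--no-audit-info", False),
    ]
    for dest, on_flag, off_flag, default in pairs:
        # Giving both flags of a pair is an error.
        group = parser.add_mutually_exclusive_group()
        group.add_argument(on_flag, dest=dest, action="store_true", default=default)
        group.add_argument(off_flag, dest=dest, action="store_false", default=default)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.cpft and not args.rcep:
        parser.error("--cpft is only applicable with --rcep")  # exits with status 2
    return args
```

With no view flags, `parse(["--url", "sqlite://"])` selects current-only progress notes and clinical documents, all allergies including deleted ones, and no audit information.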

6.4.2. crate_preprocess_pcmis

Options as of 2018-06-10:

usage: crate_preprocess_pcmis [-h] --url URL [-v] [--print] [--echo]
                              [--drop-danger-drop] [--debug-skiptables]
                              [--postcodedb POSTCODEDB]
                              [--geogcols [GEOGCOLS [GEOGCOLS ...]]]
                              [--settings-filename SETTINGS_FILENAME]

Alters a PCMIS database to be suitable for CRATE.

optional arguments:
  -h, --help            show this help message and exit
  --url URL             SQLAlchemy database URL
  -v, --verbose         Verbose
  --print               Print SQL but do not execute it. (You can redirect the
                        printed output to create an SQL script.)
  --echo                Echo SQL
  --drop-danger-drop    REMOVES new columns and indexes, rather than creating
                        them. (There's not very much danger; no real
                        information is lost, but it might take a while to
                        recalculate it.)
  --debug-skiptables    DEBUG-ONLY OPTION. Skip tables (view creation only)
  --postcodedb POSTCODEDB
                        Specify database (schema) name for ONS Postcode
                        Database (as imported by CRATE) to link to addresses
                        as a view. With SQL Server, you will have to specify
                        the schema as well as the database; e.g. "--postcodedb
                        ONS_PD.dbo"
  --geogcols [GEOGCOLS [GEOGCOLS ...]]
                        List of geographical information columns to link in
                        from ONS Postcode Database. BEWARE that you do not
                        specify anything too identifying. Default: pcon pct
                        nuts lea statsward casward lsoa01 msoa01 ur01ind oac01
                        lsoa11 msoa11 parish bua11 buasd11 ru11ind oac11 imd
  --settings-filename SETTINGS_FILENAME
                        Specify filename to write draft ddgen_* settings to,
                        for use in a CRATE anonymiser configuration file.
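The following Python sketch is a hedged example of a dry run. It builds an SQLAlchemy URL for SQL Server through a pyodbc DSN. It then runs crate_preprocess_pcmis with --print and redirects the printed SQL to a file. The helper names, DSN, credentials and output filenames are hypothetical. The sketch assumes the tool is on your PATH and writes only SQL to standard output.

```python
import subprocess
from typing import List
from urllib.parse import quote_plus


def mssql_odbc_url(user: str, password: str, dsn: str) -> str:
    """SQLAlchemy URL for SQL Server via a pyodbc DSN, with credentials URL-escaped."""
    return f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}@{dsn}"


def pcmis_dry_run_argv(url: str, drop: bool = False) -> List[str]:
    """Command line that prints the SQL instead of executing it."""
    argv = ["crate_preprocess_pcmis", "--url", url, "--print"]
    if drop:
        # Print the statements that would remove the new columns and indexes.
        argv.append("--drop-danger-drop")
    return argv


def write_sql_script(argv: List[str], filename: str) -> None:
    """Run the tool and redirect its printed SQL into a script file."""
    with open(filename, "w") as f:
        subprocess.run(argv, stdout=f, check=True)
```

For example, `write_sql_script(pcmis_dry_run_argv(mssql_odbc_url("crate_user", "secret", "PCMIS_DSN")), "pcmis_preprocess.sql")` saves the script for review. Passing `drop=True` saves the matching undo script. crate_preprocess_rio accepts the same --url and --print options, so the same approach applies to it.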