v1 (I2B2 Demodata)

The v1 loader expects an existing i2b2 database (in .csv format) and converts it to comply with the MedCo data model. This involves encrypting and ‘deterministically tagging’ some of the data.

List of input (‘original’) files:

  • all i2b2metadata files (e.g. i2b2.csv)
  • dummy_to_patient.csv
  • patient_dimension.csv
  • visit_dimension.csv
  • concept_dimension.csv
  • modifier_dimension.csv
  • observation_fact.csv
  • table_access.csv
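
As a quick sanity check before loading, the list above can be turned into a small shell helper that reports which input files are missing. This is only a sketch: the function name missing_v1_inputs is hypothetical, it assumes all files sit directly in one dataset directory, and it checks only i2b2.csv as a stand-in for the i2b2metadata files.

```shell
# Report which of the expected v1 input files are missing from a dataset directory.
# File names come from the list above; the flat directory layout is an assumption.
missing_v1_inputs() {
    dir="$1"
    for f in i2b2.csv dummy_to_patient.csv patient_dimension.csv visit_dimension.csv \
             concept_dimension.csv modifier_dimension.csv observation_fact.csv table_access.csv; do
        # Print the name of every file that does not exist in the directory.
        [ -f "$dir/$f" ] || echo "$f"
    done
}
```

Calling missing_v1_inputs ~/medco-loader/data/i2b2 prints one missing file name per line, and nothing if all listed files are present.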

Loading in the same host

If you are using the same host machine to deploy and load the data, you can use the table below to adapt the script parameters to the deployment scenario. This includes the test-network scenario, in which you load the data of each node from the machine hosting it. You need to repeat the loading process for all nodes, modifying the arguments “network”, “entryPointIdx” and “dbName”.

Deployment Profile | --network | -v (volumes) | --dbHost | --dbName
test-local-3nodes | test-local-3nodes_medco-network + test-local-3nodes_medco-srv<node index> | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml | postgresql | i2b2medcosrv<node index>
test-network | test-network-<network name>-node<node index>_default | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml | postgresql | i2b2medco
dev-local-3nodes | dev-local-3nodes_medco-network + dev-local-3nodes_medco-srv<node index> | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml | postgresql | i2b2medcosrv<node index>
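
The per-node values in the table follow a simple naming pattern, which could be captured in two small shell helpers. This is a sketch only: the function names v1_db_name and v1_local_networks are hypothetical, and the second helper covers only the two local 3-node profiles, since the test-network value also depends on a network name.

```shell
# Print the --dbName value for a deployment profile and node index (per the table above).
v1_db_name() {
    profile="$1"; idx="$2"
    case "$profile" in
        test-local-3nodes|dev-local-3nodes) echo "i2b2medcosrv${idx}" ;;
        test-network)                       echo "i2b2medco" ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

# Print the two --network arguments for a local 3-node profile and node index.
v1_local_networks() {
    profile="$1"; idx="$2"
    echo "--network=${profile}_medco-network --network=${profile}_medco-srv${idx}"
}
```

For instance, v1_db_name dev-local-3nodes 0 prints i2b2medcosrv0, the database name used for node 0 of that profile.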

Loading in a different host

If you are using an external machine (e.g. your laptop) to load the data into one of the nodes, you can use the table below to adapt the script parameters to the deployment scenario. In this case you do not need to specify the --network parameter. You need to repeat the loading process for all nodes, modifying the arguments “entryPointIdx” and “dbName”.

Deployment Profile | -v (volumes) | --dbHost | --dbName
test-local-3nodes | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml | <domain name> | i2b2medcosrv<node index>
test-network | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml | <domain name> | i2b2medco
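
To make the difference with same-host loading concrete, the sketch below prints (without running it) a load command for one node of test-local-3nodes from an external machine. The function name print_remote_load_cmd is hypothetical. The image tag, port, user, password and file paths are copied from the Example section of this document and are assumptions for a remote setup, where the database port must be reachable from outside.

```shell
# Print (not run) a v1 load command for node IDX of test-local-3nodes from an external host.
# DB_DOMAIN stands for the node's domain name; no --network argument is used here.
print_remote_load_cmd() {
    idx="$1"; db_domain="$2"
    echo "docker run -v ~/medco-loader/data/i2b2:/dataset" \
         "-v ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml" \
         "medco/medco-loader:v0.1.1 medco-loader -debug 2 v1 --group /group.toml" \
         "--entryPointIdx ${idx} --sen /dataset/sensitive.txt --files /dataset/files.toml" \
         "--dbHost ${db_domain} --dbPort 5432 --dbName i2b2medcosrv${idx} --dbUser i2b2 --dbPassword i2b2"
}
```

Printing the command first makes it easy to review the values for each node before pasting it into a terminal.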

Dummy Generation

The provided example data set files come with pre-generated dummy data. These random dummy entries are meant to prevent frequency attacks. For more information on how the dummies are generated, please refer to ~/medco-loader/data/scripts/import-tool/report/report.pdf. In a future release, the loader will generate them dynamically.

Example

The following example loads data into a running MedCo development deployment (dev-local-3nodes), on node 0. Adapt the arguments network, entryPointIdx and dbName accordingly for the other 2 nodes.

cd ~/medco-loader/deployment
docker run --network="dev-local-3nodes_medco-network" --network="dev-local-3nodes_medco-srv0" \
    -v ~/medco-loader/data/i2b2:/dataset -v ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml \
    medco/medco-loader:v0.1.1 medco-loader -debug 2 v1 --group /group.toml --entryPointIdx 0 --sen /dataset/sensitive.txt  \
    --files /dataset/files.toml --dbHost postgresql --dbPort 5432 --dbName i2b2medcosrv0 --dbUser i2b2 --dbPassword i2b2

The help output of the v1 command lists the available options:

NAME:
    medco-loader v1 - Convert existing i2b2 data model

USAGE:
    medco-loader v1 [command options] [arguments...]

OPTIONS:
    --group value, -g value               UnLynx group definition file
    --entryPointIdx value, --entry value  Index (relative to the group definition file) of the collective authority server to load the data
    --sensitive value, --sen value        File containing a list of sensitive concepts
    --dbHost value, --dbH value           Database hostname
    --dbPort value, --dbP value           Database port (default: 0)
    --dbName value, --dbN value           Database name
    --dbUser value, --dbU value           Database user
    --dbPassword value, --dbPw value      Database password
    --files value, -f value               Configuration toml with the path of the all the necessary i2b2 files
    --empty, -e                           Empty patient and visit dimension tables (y/n)
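
Since the example has to be repeated for every node, the per-node arguments can be generated in a loop. The sketch below only prints the three commands for dev-local-3nodes. The function name print_dev_load_cmds is hypothetical, the database host is passed in by the caller (take it from the table or the example for your setup), and every other value mirrors the node-0 example.

```shell
# Print (not run) the v1 load command for each of the three dev-local-3nodes nodes.
# Only the network, entryPointIdx and dbName arguments change with the node index.
print_dev_load_cmds() {
    db_host="$1"
    profile="dev-local-3nodes"
    for idx in 0 1 2; do
        echo "docker run --network=\"${profile}_medco-network\" --network=\"${profile}_medco-srv${idx}\"" \
             "-v ~/medco-loader/data/i2b2:/dataset" \
             "-v ~/medco-deployment/configuration-profiles/${profile}/group.toml:/group.toml" \
             "medco/medco-loader:v0.1.1 medco-loader -debug 2 v1 --group /group.toml" \
             "--entryPointIdx ${idx} --sen /dataset/sensitive.txt --files /dataset/files.toml" \
             "--dbHost ${db_host} --dbPort 5432 --dbName i2b2medcosrv${idx} --dbUser i2b2 --dbPassword i2b2"
    done
}
```

Each printed line is a complete command for one node, so the output can be reviewed and then run line by line.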

To check that the data was loaded correctly, you can query for:

-> Diagnoses -> Neoplasm -> Benign neoplasm -> Benign neoplasm of breast

You should obtain 2 matching subjects.