v1 (I2B2 Demodata)¶
The v1 loader expects an existing i2b2 database, exported as .csv files, and converts it so that it complies with the MedCo data model. This involves encrypting and ‘deterministically tagging’ some of the data.
List of input (‘original’) files:
- all i2b2metadata files (e.g. i2b2.csv)
- dummy_to_patient.csv
- patient_dimension.csv
- visit_dimension.csv
- concept_dimension.csv
- modifier_dimension.csv
- observation_fact.csv
- table_access.csv
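Before running the loader, it can help to confirm that the input files are present. The following is a minimal sketch, not part of medco-loader: `check_i2b2_inputs` is a hypothetical helper, it assumes all files sit directly in one directory (the real layout is defined by the files configuration passed to the loader and may differ), and it checks only `i2b2.csv` as a stand-in for the i2b2metadata files.

```shell
# Hypothetical helper (not part of medco-loader): list the expected input
# files that are missing from a dataset directory.
# Assumption: a flat directory layout; adapt the paths to your own dataset.
check_i2b2_inputs() {
  dir="$1"
  missing=0
  for f in i2b2.csv dummy_to_patient.csv patient_dimension.csv \
           visit_dimension.csv concept_dimension.csv modifier_dimension.csv \
           observation_fact.csv table_access.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Return 0 only when every file was found.
  return "$missing"
}
```

For example, `check_i2b2_inputs ~/medco-loader/data/i2b2` prints one line per missing file and returns a non-zero status if any file is absent.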
Loading in the same host¶
If you are using the same host machine to deploy and load the data, you can use the table below to adapt the script parameters to your deployment scenario.
This includes the test-network scenario, where the data of each node is loaded from the machine hosting that node.
You need to repeat the loading process for all nodes, by modifying the arguments “network”, “entryPointIdx” and “dbName”.
| Deployment Profile | --network | -v (volumes) | --dbHost | --dbName |
|---|---|---|---|---|
| test-local-3nodes | test-local-3nodes_medco-network + test-local-3nodes_medco-srv<node index> | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml | postgresql | i2b2medcosrv<node index> |
| test-network | test-network-<network name>-node<node index>_default | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml | postgresql | i2b2medco |
| dev-local-3nodes | dev-local-3nodes_medco-network + dev-local-3nodes_medco-srv<node index> | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml | postgresql | i2b2medcosrv<node index> |
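The same-host table can be read as a small lookup from deployment profile and node index to argument values. The sketch below illustrates that reading; `same_host_params` is a hypothetical helper name, and the third argument (the network name) is only used for the test-network profile.

```shell
# Hypothetical helper: print the --network, --dbHost and --dbName arguments
# for one node, following the same-host table.
# Usage: same_host_params <profile> <node index> [<network name>]
same_host_params() {
  profile="$1"
  idx="$2"
  netname="${3:-}"
  case "$profile" in
    test-local-3nodes|dev-local-3nodes)
      echo "--network=${profile}_medco-network --network=${profile}_medco-srv${idx} --dbHost postgresql --dbName i2b2medcosrv${idx}"
      ;;
    test-network)
      echo "--network=test-network-${netname}-node${idx}_default --dbHost postgresql --dbName i2b2medco"
      ;;
    *)
      echo "unknown profile: $profile" >&2
      return 1
      ;;
  esac
}
```

Calling `same_host_params dev-local-3nodes 1` prints the arguments for node 1 of the development deployment; the volume arguments are omitted here because they do not depend on the node index for the local profiles.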
Loading in a different host¶
If you are using an external machine (e.g. your laptop) to load the data into one of the nodes, you can use the table below to adapt the script parameters to your deployment scenario. In this case you do not need to specify the --network parameter.
You need to repeat the loading process for all nodes, by modifying the arguments “entryPointIdx” and “dbName” (“network” is not used in this case).
| Deployment Profile | -v (volumes) | --dbHost | --dbName |
|---|---|---|---|
| test-local-3nodes | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-local-3nodes/group.toml:/group.toml | <domain name> | i2b2medcosrv<node index> |
| test-network | ~/medco-loader/data/i2b2:/dataset + ~/medco-deployment/configuration-profiles/test-network-<network name>-node<node index>/group.toml:/group.toml | <domain name> | i2b2medco |
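As an illustration of the different-host case, the sketch below prints (without running) a loader command for one node of a test-local-3nodes deployment. `remote_load_cmd` is a hypothetical helper; the image tag, the database port 5432 and the i2b2/i2b2 credentials are assumptions carried over from the development example in this page and may not match your deployment.

```shell
# Hypothetical helper: print the docker command that loads one node from an
# external machine. No --network argument is needed in this case.
# Usage: remote_load_cmd <node index> <domain name of the node>
# Assumptions: port 5432 and user/password i2b2/i2b2, as in the dev example.
remote_load_cmd() {
  idx="$1"
  db_host="$2"
  profile="test-local-3nodes"
  echo docker run \
    -v "$HOME/medco-loader/data/i2b2:/dataset" \
    -v "$HOME/medco-deployment/configuration-profiles/${profile}/group.toml:/group.toml" \
    medco/medco-loader:v0.1.1 medco-loader -debug 2 v1 \
    --group /group.toml --entryPointIdx "$idx" --sen /dataset/sensitive.txt \
    --files /dataset/files.toml --dbHost "$db_host" --dbPort 5432 \
    --dbName "i2b2medcosrv${idx}" --dbUser i2b2 --dbPassword i2b2
}
```

Here `remote_load_cmd 1 node1.example.com` prints the command for node 1, where `node1.example.com` stands in for the real domain name. Printing the command first makes it easy to review before executing it.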
Dummy Generation¶
The provided example data set files come with dummy data pre-generated.
Those data are random dummy entries whose purpose is to prevent frequency attacks.
For more information on how this dummy generation is done please refer to ~/medco-loader/data/scripts/import-tool/report/report.pdf.
In a future release, the generation will be done dynamically by the loader.
Example¶
The following example shows how to load data into node 0 of a running MedCo development deployment (dev-local-3nodes).
Adapt the arguments network, entryPointIdx and dbName accordingly for the other two nodes.
cd ~/medco-loader/deployment
docker run --network="dev-local-3nodes_medco-network" --network="dev-local-3nodes_medco-srv0" \
-v ~/medco-loader/data/i2b2:/dataset -v ~/medco-deployment/configuration-profiles/dev-local-3nodes/group.toml:/group.toml \
medco/medco-loader:v0.1.1 medco-loader -debug 2 v1 --group /group.toml --entryPointIdx 0 --sen /dataset/sensitive.txt \
--files /dataset/files.toml --dbHost postgresql --dbPort 5432 --dbName i2b2medcosrv0 --dbUser i2b2 --dbPassword i2b2
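Since only the network, entryPointIdx and dbName arguments change between nodes, the three loads can be generated with a loop. The sketch below is a dry run that prints one command per node of dev-local-3nodes instead of executing it; `print_load_cmds` is a hypothetical name, and the --dbHost value follows the same-host table (postgresql).

```shell
# Dry-run sketch: print the load command for each node of dev-local-3nodes.
# Remove the leading "echo" to actually run the loads, one node at a time.
print_load_cmds() {
  profile="dev-local-3nodes"
  for idx in 0 1 2; do
    echo docker run \
      --network="${profile}_medco-network" --network="${profile}_medco-srv${idx}" \
      -v "$HOME/medco-loader/data/i2b2:/dataset" \
      -v "$HOME/medco-deployment/configuration-profiles/${profile}/group.toml:/group.toml" \
      medco/medco-loader:v0.1.1 medco-loader -debug 2 v1 \
      --group /group.toml --entryPointIdx "$idx" --sen /dataset/sensitive.txt \
      --files /dataset/files.toml --dbHost postgresql --dbPort 5432 \
      --dbName "i2b2medcosrv${idx}" --dbUser i2b2 --dbPassword i2b2
  done
}

print_load_cmds
```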
NAME:
   medco-loader v1 - Convert existing i2b2 data model

USAGE:
   medco-loader v1 [command options] [arguments...]

OPTIONS:
   --group value, -g value              UnLynx group definition file
   --entryPointIdx value, --entry value  Index (relative to the group definition file) of the collective authority server to load the data
   --sensitive value, --sen value       File containing a list of sensitive concepts
   --dbHost value, --dbH value          Database hostname
   --dbPort value, --dbP value          Database port (default: 0)
   --dbName value, --dbN value          Database name
   --dbUser value, --dbU value          Database user
   --dbPassword value, --dbPw value     Database password
   --files value, -f value              Configuration toml with the path of the all the necessary i2b2 files
   --empty, -e                          Empty patient and visit dimension tables (y/n)
To check that the loading worked, you can run the following query:
-> Diagnoses -> Neoplasm -> Benign neoplasm -> Benign neoplasm of breast
You should obtain 2 matching subjects.