🎓 Synthea by Bulk API

In this guide, we will generate synthetic data with Synthea and load it into Aidbox.

Generate synthea data

We will generate synthetic data with the Synthea project:

# brew install gradle
git clone https://github.com/synthetichealth/synthea
cd synthea

# edit src/main/resources/synthea.properties
# set exporter.fhir.bulk_data = true

# generate 100 patients
./run_synthea -p 100

cd output/fhir
ls -lah
# concatenate all files into all.ndjson
cat *.ndjson > all.ndjson

# gzip every ndjson file
gzip *.ndjson
ls -lah

# upload to storage
gsutil cp *.ndjson.gz gs://your-bucket/dir/
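The concatenate-and-compress step can also be scripted. The following is a minimal Python sketch, assuming the files sit in a local directory such as output/fhir. The function name `bundle_ndjson` is hypothetical, and only the standard library is used.

```python
import gzip
import shutil
from pathlib import Path


def bundle_ndjson(directory, combined_name="all.ndjson"):
    """Concatenate every *.ndjson file into one file, then gzip each file.

    Mirrors `cat *.ndjson > all.ndjson` followed by `gzip *.ndjson`.
    Returns the paths of the .ndjson.gz files that were written.
    """
    root = Path(directory)
    combined = root / combined_name
    # Collect the sources first so the combined file never includes itself.
    sources = sorted(p for p in root.glob("*.ndjson") if p != combined)

    with combined.open("wb") as out:
        for src in sources:
            # Reads each file fully; fine for a small sample such as 100 patients.
            data = src.read_bytes()
            out.write(data)
            # NDJSON needs one resource per line, so guard against a missing final newline.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")

    written = []
    for src in sources + [combined]:
        target = src.with_name(src.name + ".gz")
        with src.open("rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()  # the gzip CLI removes the original file; do the same here
        written.append(target)
    return written
```

Calling `bundle_ndjson("output/fhir")` would leave one .ndjson.gz file per resource type plus all.ndjson.gz, ready to upload.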

Load by Resource Type

Now we can load, for example, Patients and Observations into your Aidbox instance:

POST /fhir/Patient/$load
Accept: text/yaml
Content-Type: text/yaml

source: 'https://storage.googleapis.com/aidbox-public/synthea/100/Patient.ndjson.gz'

# resp
{total: 124}

POST /fhir/Observation/$load
Accept: text/yaml
Content-Type: text/yaml

source: 'https://storage.googleapis.com/aidbox-public/synthea/100/Observation.ndjson.gz'

# resp
{total: 20382}
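The same per-type requests can be prepared from a script. This is a sketch under stated assumptions: the function name `build_load_request` is hypothetical, the body is sent as JSON rather than YAML (assuming your instance accepts application/json), and Basic authentication with a client id and secret is assumed. It only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_load_request(base_url, resource_type, source, client_id, client_secret):
    """Build (but do not send) a POST /fhir/<ResourceType>/$load request."""
    # Assumption: the server accepts HTTP Basic auth for an API client.
    token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = json.dumps({"source": source}).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/fhir/{resource_type}/$load",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# To send it (placeholder URL and credentials):
# req = build_load_request(
#     "http://localhost:8888", "Patient",
#     "https://storage.googleapis.com/aidbox-public/synthea/100/Patient.ndjson.gz",
#     "my-client", "my-secret",
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```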

Let's see the data in Aidbox:

GET /Patient?_ilike=John&_revinclude=Observation:patient

Load all at once with $load

Using /fhir/$load, you can load an NDJSON file containing multiple resource types in one step:

POST /fhir/$load
Accept: text/yaml
Content-Type: text/yaml
​
source: 'https://storage.googleapis.com/aidbox-public/synthea/100/all.ndjson.gz'
​
# resp
​
{CarePlan: 356, Observation: 20382, MedicationAdministration: 150, Goal: 301, Patient: 124, DiagnosticReport: 1430, Practitioner: 181, ExplanationOfBenefit: 3460, Immunization: 1636, Claim: 4488, MedicationRequest: 1028, Encounter: 3460, Condition: 871, Procedure: 2854, Organization: 181, AllergyIntolerance: 40, ImagingStudy: 134}
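To cross-check the per-type totals in the response against your own file, you can count resources locally. A minimal sketch, assuming a gzipped NDJSON file on disk; the function name `count_resource_types` is hypothetical.

```python
import gzip
import json
from collections import Counter


def count_resource_types(path):
    """Count resources per resourceType in a gzipped NDJSON file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines between resources
            resource = json.loads(line)
            counts[resource.get("resourceType", "<missing>")] += 1
    return counts
```

For example, `count_resource_types("all.ndjson.gz")` returns a `Counter` keyed by resource type that can be compared with the map returned by $load.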

Let's see database stats:

SELECT relname, reltuples
FROM pg_class r
JOIN pg_namespace n ON (r.relnamespace = n.oid)
WHERE r.relkind = 'r' AND n.nspname = 'public'
ORDER BY reltuples DESC
LIMIT 20;
​
---
​
observation 20382
attribute 7257
claim 4488
encounter 3460

Cleanup data:

Truncate the tables from the DB console:

truncate CarePlan;
truncate Observation;
truncate MedicationAdministration;
truncate Goal;
truncate Patient;
truncate DiagnosticReport;
truncate Practitioner;
truncate ExplanationOfBenefit;
truncate Immunization;
truncate Claim;
truncate MedicationRequest;
truncate Encounter;
truncate Condition;
truncate "procedure";
truncate Organization;
truncate AllergyIntolerance;
truncate ImagingStudy;
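If you would rather generate these statements than type them, the following sketch builds them from a list of resource types. It assumes each resource type is stored in a table named after the lowercased type, which matches the table names in the stats output above. The function name `truncate_statements` is hypothetical.

```python
def truncate_statements(resource_types):
    """Return one TRUNCATE statement per resource type."""
    statements = []
    for resource_type in resource_types:
        # Resource type names are plain alphanumerics; reject anything else
        # so that no arbitrary text ends up inside the SQL.
        if not resource_type.isalnum():
            raise ValueError(f"unexpected resource type: {resource_type!r}")
        # Quoting the lowercased name also handles keywords such as "procedure".
        statements.append(f'TRUNCATE "{resource_type.lower()}";')
    return statements


# Example: print("\n".join(truncate_statements(["Patient", "Procedure"])))
```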
​

Load with Bulk $import

To load data asynchronously, use the FHIR Bulk $import operation:

POST /fhir/$import
Accept: text/yaml
Content-Type: text/yaml
​
id: synthea
inputFormat: application/fhir+ndjson
contentEncoding: gzip
mode: bulk
inputs:
- resourceType: AllergyIntolerance
  url: https://storage.googleapis.com/aidbox-public/synthea/100/AllergyIntolerance.ndjson.gz
- resourceType: CarePlan
  url: https://storage.googleapis.com/aidbox-public/synthea/100/CarePlan.ndjson.gz
- resourceType: Claim
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Claim.ndjson.gz
- resourceType: Condition
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Condition.ndjson.gz
- resourceType: DiagnosticReport
  url: https://storage.googleapis.com/aidbox-public/synthea/100/DiagnosticReport.ndjson.gz
- resourceType: Encounter
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Encounter.ndjson.gz
- resourceType: ExplanationOfBenefit
  url: https://storage.googleapis.com/aidbox-public/synthea/100/ExplanationOfBenefit.ndjson.gz
- resourceType: Goal
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Goal.ndjson.gz
- resourceType: ImagingStudy
  url: https://storage.googleapis.com/aidbox-public/synthea/100/ImagingStudy.ndjson.gz
- resourceType: Immunization
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Immunization.ndjson.gz
- resourceType: MedicationAdministration
  url: https://storage.googleapis.com/aidbox-public/synthea/100/MedicationAdministration.ndjson.gz
- resourceType: MedicationRequest
  url: https://storage.googleapis.com/aidbox-public/synthea/100/MedicationRequest.ndjson.gz
- resourceType: Observation
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Observation.ndjson.gz
- resourceType: Organization
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Organization.ndjson.gz
- resourceType: Patient
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Patient.ndjson.gz
- resourceType: Practitioner
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Practitioner.ndjson.gz
- resourceType: Procedure
  url: https://storage.googleapis.com/aidbox-public/synthea/100/Procedure.ndjson.gz
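Because every input follows the same URL pattern, the request body can be generated instead of written by hand. A minimal sketch: the function name `build_import_body` is hypothetical, and it assumes each file is named `<ResourceType>.ndjson.gz` under a common base URL, as in the request above.

```python
import json


def build_import_body(import_id, base_url, resource_types):
    """Build the $import body with one input per resource type."""
    base = base_url.rstrip("/")
    return {
        "id": import_id,
        "inputFormat": "application/fhir+ndjson",
        "contentEncoding": "gzip",
        "mode": "bulk",
        "inputs": [
            {"resourceType": rt, "url": f"{base}/{rt}.ndjson.gz"}
            for rt in resource_types
        ],
    }


# Example: serialize the body for three of the resource types.
body = build_import_body(
    "synthea",
    "https://storage.googleapis.com/aidbox-public/synthea/100",
    ["Patient", "Encounter", "Observation"],
)
payload = json.dumps(body, indent=2)
```

The resulting `payload` is the JSON equivalent of the YAML body; sending it assumes the server accepts application/json for this operation.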

The operation returns 200 immediately, and you can monitor the import status with:

GET /BulkImportStatus/synthea
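Checking the status repeatedly is easy to automate. The sketch below polls until the import reaches a terminal state. It rests on assumptions not stated in this guide: that the returned resource has a `status` field and that `finished` and `failed` are its terminal values, so adjust `terminal_states` to what your instance actually returns. The names `wait_for_import` and `fetch_status` are hypothetical; `fetch_status` is any callable that performs the GET and returns the parsed resource as a dict.

```python
import time


def wait_for_import(fetch_status, interval=2.0, timeout=600.0,
                    terminal_states=("finished", "failed"), sleep=time.sleep):
    """Poll fetch_status() until the import reports a terminal state.

    Returns the last status resource, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        # Assumption: the resource exposes its state in a "status" field.
        if status.get("status") in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("import did not reach a terminal state in time")
        sleep(interval)


# Example fetch_status using only the standard library (placeholder URL, no auth):
# import json, urllib.request
# def fetch_status():
#     req = urllib.request.Request(
#         "http://localhost:8888/BulkImportStatus/synthea",
#         headers={"Accept": "application/json"},
#     )
#     with urllib.request.urlopen(req) as resp:
#         return json.load(resp)
```

Passing the fetcher in as a parameter keeps the polling loop independent of how you authenticate and makes it easy to test without a running server.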