Cohorts
Cohorts define subsets of a population that you'd like to include in a dataset; cohorts are configured at a database level and restrict the content in all tables to records related to members of the cohort.
For example, if you restrict your healthcare database to patients with leukemia, then all tables (visits, diagnoses, labs, etc.) are restricted to that same set of patients; records for other patients will not be present in the synthetic database.
Cohorts will automatically be applied to all synthetic tables, and can be optionally referenced in views. Lookup tables are not affected by cohort definitions.
Defining a cohort
Cohort definitions involve two components:
A query that provides a list of IDs for members of the cohort
A reference to the field in the entities table (e.g., person) that contains the IDs
For example, you might select Person.PersonID as the entity field and provide a query like the following, which narrows the cohort to a specific set of patients (here, those diagnosed with COVID-19):
-- Patients with a COVID-19 diagnosis in a standard OMOP database
WITH covid_concepts AS (
    SELECT descendant_concept_id AS concept_id
    FROM concept_ancestor
    WHERE ancestor_concept_id = 37311061 -- COVID-19 (SNOMED)
)
SELECT DISTINCT person_id
FROM condition_occurrence
WHERE condition_concept_id IN (SELECT concept_id FROM covid_concepts);
Configuring a cohort
You can define cohorts while creating a synthetic database (see Creating synthetic databases), and you can update cohort definitions for existing databases in the Settings > Cohorts section of the web portal.
Applying a new cohort definition also recomputes row counts for all tables and views; this takes place in the background and may take a few minutes to complete.
Referencing cohorts in view definitions
Cohorts do not automatically apply to views, but a view can explicitly reference the database's designated cohort: the list of IDs is accessible as a relation called cohort.
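As a minimal sketch, a view query limited to cohort members might look like the following. The column name person_id on the cohort relation is an assumption (this page does not state how the ID column is exposed), and visit_occurrence is used only as an illustrative OMOP table; adjust both to match your database.

```sql
-- Hypothetical view query: visits for members of the database's cohort only.
-- ASSUMPTION: the cohort relation exposes the IDs in a column named person_id,
-- matching the column returned by the cohort query.
SELECT v.*
FROM visit_occurrence v
WHERE v.person_id IN (SELECT person_id FROM cohort);
```

Filtering with IN against the cohort relation keeps the view aligned with the database-level cohort, so an updated cohort definition does not require the ID query to be copied into each view.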