Cohorts define subsets of a population that you'd like to include in a dataset; cohorts are configured at a database level and restrict the content in all tables to records related to members of the cohort.
For example, if you restrict your healthcare database to patients with leukemia, then all tables (visits, diagnoses, labs, etc.) are restricted to that same set of patients; records for other patients will not be present in the synthetic database.
Cohorts are applied automatically to all synthetic tables and can optionally be referenced in views. Lookup tables are not affected by cohort definitions.
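To make the effect concrete, the restriction behaves roughly as if every table that references a person were filtered against the cohort's ID list. The sketch below is a conceptual illustration only: the toy schema and the `cohort_ids` table are hypothetical stand-ins, and it does not describe how the platform implements cohorts internally.

```sql
-- Hypothetical toy schema, for illustration only.
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE visit_occurrence (
  visit_occurrence_id INTEGER PRIMARY KEY,
  person_id INTEGER,
  visit_start_date DATE
);

INSERT INTO person VALUES (1), (2), (3);
INSERT INTO visit_occurrence VALUES
  (10, 1, '2024-03-01'),
  (11, 2, '2024-04-15'),
  (12, 3, '2024-05-20');

-- Stand-in for the cohort's ID list (persons 1 and 3).
CREATE TABLE cohort_ids (person_id INTEGER PRIMARY KEY);
INSERT INTO cohort_ids VALUES (1), (3);

-- Conceptually, each table ends up limited to cohort members.
SELECT v.*
FROM visit_occurrence AS v
JOIN cohort_ids AS c ON c.person_id = v.person_id;
-- Only visits 10 and 12 remain; visit 11 (person 2) is excluded.
```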
Defining a cohort
Cohort definitions involve two components:
A query that provides a list of IDs for members of the cohort
A reference to the field in the entities table (e.g., person) that contains the IDs
For example, you might commonly select Person.PersonID for the entity field, and provide a query like this to filter down to a specific set of people / patients (specifically, those diagnosed with COVID-19):
-- Patients with a COVID-19 diagnosis in a standard OMOP database
WITH covid_concepts AS (
  SELECT descendant_concept_id AS concept_id
  FROM concept_ancestor
  WHERE ancestor_concept_id = 37311061 -- COVID-19 (SNOMED)
)
SELECT DISTINCT person_id
FROM condition_occurrence
WHERE condition_concept_id IN (SELECT concept_id FROM covid_concepts);
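Before saving a cohort definition, it can help to check how many IDs the query returns relative to the full population. The following is a sketch of one way to do that, assuming the same standard OMOP tables as the COVID-19 example (`concept_ancestor`, `condition_occurrence`, `person`); the `cohort_members` name is illustrative, and this check is a suggestion rather than a step the platform requires.

```sql
-- Compare the size of the candidate cohort with the total population.
WITH covid_concepts AS (
  SELECT descendant_concept_id AS concept_id
  FROM concept_ancestor
  WHERE ancestor_concept_id = 37311061 -- COVID-19 (SNOMED)
),
cohort_members AS (  -- illustrative name for the candidate cohort query
  SELECT DISTINCT person_id
  FROM condition_occurrence
  WHERE condition_concept_id IN (SELECT concept_id FROM covid_concepts)
)
SELECT
  (SELECT COUNT(*) FROM cohort_members) AS cohort_size,
  (SELECT COUNT(*) FROM person) AS total_persons;
```

A `cohort_size` of zero, or one equal to `total_persons`, usually suggests the filter is not doing what was intended.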
Configuring a cohort
You can define cohorts while Creating synthetic databases, and you can update cohort definitions for existing databases in the Settings > Cohorts section of the web portal.
Applying a new cohort definition also recomputes row counts for all tables and views; this takes place in the background and may take a few minutes to complete.
Referencing cohorts in view definitions
Cohorts do not automatically apply to views, but a view can opt in to the database's designated cohort. The list of IDs is accessible as a relation called cohort:
SELECT *
FROM visit_occurrence
WHERE person_id IN (SELECT person_id FROM cohort) -- `cohort` is pre-defined
AND visit_start_date >= DATE '2024-01-01'
AND visit_start_date < DATE '2025-01-01';
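The `cohort` relation can also be joined instead of used in an `IN` subquery. The sketch below counts 2024 visits per cohort member; it assumes that `cohort` exposes a `person_id` column, as in the example above, and that `visit_occurrence` follows the standard OMOP layout.

```sql
-- Visits per cohort member in 2024.
-- Assumes `cohort` exposes a person_id column, as in the example above.
SELECT c.person_id,
       COUNT(v.visit_occurrence_id) AS visit_count
FROM cohort AS c
LEFT JOIN visit_occurrence AS v
  ON v.person_id = c.person_id
  AND v.visit_start_date >= DATE '2024-01-01'
  AND v.visit_start_date < DATE '2025-01-01'
GROUP BY c.person_id;
```

The date filter sits in the `ON` clause rather than in `WHERE`, so the `LEFT JOIN` keeps cohort members with no visits in 2024 and reports them with a `visit_count` of 0.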