Cohorts

Cohorts define subsets of a population that you'd like to include in a dataset; cohorts are configured at a database level and restrict the content in all tables to records related to members of the cohort.

For example, if you restrict your healthcare database to patients with leukemia, then every table (visits, diagnoses, labs, etc.) is restricted to that same set of patients; records for other patients will not be present in the synthetic database.

Cohorts will automatically be applied to all synthetic tables, and can be optionally referenced in views. Lookup tables are not affected by cohort definitions.
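
The behavior described above can be simulated locally. The sketch below uses Python's built-in sqlite3 module; the table names, sample rows, and the apply_cohort helper are hypothetical and only illustrate the described outcome (patient-related tables are filtered to cohort members, lookup tables are left alone). It is not how the product implements cohorts.

```python
import sqlite3

# Hypothetical miniature healthcare database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE diagnosis (person_id INTEGER, code TEXT);
    CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE code_lookup (code TEXT PRIMARY KEY, label TEXT);

    INSERT INTO person VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO diagnosis VALUES (1, 'LEUK'), (2, 'FLU'), (3, 'LEUK');
    INSERT INTO visit VALUES (10, 1), (11, 2), (12, 2), (13, 3), (14, 1);
    INSERT INTO code_lookup VALUES
        ('LEUK', 'Leukemia'), ('FLU', 'Influenza'), ('COLD', 'Common cold');
""")

# Component 1: a query returning the IDs of cohort members.
COHORT_QUERY = "SELECT DISTINCT person_id FROM diagnosis WHERE code = 'LEUK'"
# Tables that carry the entity ID; code_lookup is a lookup table and is skipped.
PERSON_TABLES = ["person", "diagnosis", "visit"]


def apply_cohort(conn, cohort_query, tables, id_field="person_id"):
    """Keep only rows whose id_field belongs to the cohort (hypothetical helper)."""
    # Materialise the cohort first so later deletes cannot change its membership.
    conn.execute(f"CREATE TEMP TABLE cohort AS {cohort_query}")
    for table in tables:
        conn.execute(
            f"DELETE FROM {table} "
            f"WHERE {id_field} NOT IN (SELECT {id_field} FROM cohort)"
        )
    conn.commit()


def row_counts(conn, tables):
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


apply_cohort(conn, COHORT_QUERY, PERSON_TABLES)
counts = row_counts(conn, PERSON_TABLES + ["code_lookup"])
print(counts)
```

In this toy data, patient 2 disappears from every patient-related table, while code_lookup keeps all three rows, including codes that no cohort member has.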

Defining a cohort

Cohort definitions involve two components:

A query that provides a list of IDs for members of the cohort
A reference to the field in the entities table (e.g., person) that contains the IDs

For example, you might select Person.PersonID as the entity field and provide a query like the following, which narrows the population to patients diagnosed with COVID-19:

-- Patients with a COVID-19 diagnosis in a standard OMOP database
WITH covid_concepts AS (
    SELECT descendant_concept_id AS concept_id
    FROM concept_ancestor
    WHERE ancestor_concept_id = 37311061  -- COVID-19 (SNOMED)
)
SELECT DISTINCT person_id
FROM condition_occurrence
WHERE condition_concept_id IN (SELECT concept_id FROM covid_concepts);
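
Before saving a cohort definition, it can help to confirm locally that the query and the entity field agree with each other. The following is a minimal sketch using Python's sqlite3 module: the check_cohort_query helper, the sample rows, and the concept IDs 500 and 600 are all hypothetical, and nothing here implies the product runs such a check itself. It verifies that the query returns a single column of IDs and reports any IDs that do not exist in the entity field.

```python
import sqlite3


def check_cohort_query(conn, cohort_query, entity_table, entity_field):
    """Return (cohort_ids, unknown_ids) for a candidate cohort query.

    Hypothetical helper: unknown_ids are IDs returned by the query that
    are missing from the entity field.
    """
    cursor = conn.execute(cohort_query)
    if len(cursor.description) != 1:
        raise ValueError("cohort query should return exactly one column of IDs")
    cohort_ids = {row[0] for row in cursor.fetchall()}
    entity_ids = {
        row[0] for row in conn.execute(f"SELECT {entity_field} FROM {entity_table}")
    }
    return cohort_ids, cohort_ids - entity_ids


# Toy data: person 99 has a condition row but no person record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
    INSERT INTO person VALUES (1), (2), (3);
    INSERT INTO condition_occurrence VALUES (1, 500), (1, 500), (2, 600), (99, 500);
""")

ids, unknown = check_cohort_query(
    conn,
    "SELECT DISTINCT person_id FROM condition_occurrence "
    "WHERE condition_concept_id = 500",
    entity_table="person",
    entity_field="person_id",
)
print(sorted(ids), sorted(unknown))
```

A non-empty second list suggests the query and the entity field refer to different ID spaces, or that the source data contains orphaned records.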

Configuring a cohort

You can define cohorts while creating synthetic databases, and you can update cohort definitions for existing databases in the Settings > Cohorts section of the web portal.

Applying a new cohort definition also recomputes row counts for all tables and views; this takes place in the background and may take a few minutes to complete.

Referencing cohorts in view definitions

Cohorts do not apply to views automatically, but a view can opt in to the database's designated cohort. The list of cohort IDs is available as a relation called cohort, which a view definition can query:

SELECT *
FROM visit_occurrence
WHERE person_id IN (SELECT person_id FROM cohort) -- `cohort` is pre-defined
  AND visit_start_date >= DATE '2024-01-01'
  AND visit_start_date <  DATE '2025-01-01';
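
To experiment with a view like this outside the portal, the pre-defined cohort relation can be stubbed with an ordinary table. The sketch below is a local approximation using Python's sqlite3 module: the stand-in cohort table, the view name cohort_visits_2024, and the sample rows are assumptions made for illustration. SQLite does not support the DATE '...' literal syntax, so the dates are stored and compared as ISO-8601 strings instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit_occurrence (
        visit_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        visit_start_date TEXT  -- ISO-8601 strings sort correctly as text
    );
    -- Stand-in for the pre-defined `cohort` relation (assumption for local testing)
    CREATE TABLE cohort (person_id INTEGER PRIMARY KEY);

    INSERT INTO visit_occurrence VALUES
        (1, 1, '2023-12-31'),
        (2, 1, '2024-03-15'),
        (3, 2, '2024-06-01'),
        (4, 3, '2024-12-31'),
        (5, 3, '2025-01-01');
    INSERT INTO cohort VALUES (1), (3);

    -- Same shape as the view above: cohort members, visits starting in 2024
    CREATE VIEW cohort_visits_2024 AS
    SELECT *
    FROM visit_occurrence
    WHERE person_id IN (SELECT person_id FROM cohort)
      AND visit_start_date >= '2024-01-01'
      AND visit_start_date <  '2025-01-01';
""")

visit_ids = [
    row[0]
    for row in conn.execute(
        "SELECT visit_occurrence_id FROM cohort_visits_2024 "
        "ORDER BY visit_occurrence_id"
    )
]
print(visit_ids)  # [2, 4]
```

Visit 3 is excluded because person 2 is not in the cohort; visits 1 and 5 are excluded by the half-open date range, which covers 2024 without needing an end-of-day timestamp.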
