# Cohorts

Cohorts define subsets of a population that you'd like to include in a dataset. They are configured at the database level and restrict the content of every table to records related to members of the cohort.

For example, if you restrict your healthcare database to patients with leukemia, then all tables (visits, diagnoses, labs, etc.) are restricted to that same set of patients; records for other patients will not be present in the synthetic database.

Cohorts will automatically be applied to all synthetic tables, and can be optionally referenced in views. Lookup tables are not affected by cohort definitions.
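
As a rough illustration of what this restriction means for a single table, the sketch below uses toy data. The `cohort_ids` table and the sample rows are hypothetical and exist only for this example; it is a conceptual picture, not a description of how the restriction is implemented.

```sql
-- Conceptual sketch with toy data; not a description of Subsalt internals.
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id           INTEGER NOT NULL
);
-- Hypothetical table holding the cohort's member IDs.
CREATE TABLE cohort_ids (person_id INTEGER PRIMARY KEY);

INSERT INTO visit_occurrence (visit_occurrence_id, person_id)
VALUES (10, 1), (11, 2), (12, 3), (13, 1);
INSERT INTO cohort_ids (person_id) VALUES (1), (2);

-- Only visits belonging to cohort members remain: rows 10, 11, and 13.
SELECT v.visit_occurrence_id, v.person_id
FROM visit_occurrence AS v
WHERE v.person_id IN (SELECT person_id FROM cohort_ids)
ORDER BY v.visit_occurrence_id;
```

The same kind of filter would apply to every other table that holds per-patient records, while lookup tables are left as they are.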

### Defining a cohort

Cohort definitions involve two components:

1. A query that provides a list of IDs for members of the cohort
2. A reference to the field in the entities table (e.g., `person`) that contains the IDs

For example, in an OMOP database you might select `person.person_id` as the entity field, and provide a query like this one to narrow the population to a specific set of patients (here, those diagnosed with COVID-19):

```sql
-- Patients with a COVID-19 diagnosis in a standard OMOP database
WITH covid_concepts AS (
    SELECT descendant_concept_id AS concept_id
    FROM concept_ancestor
    WHERE ancestor_concept_id = 37311061  -- COVID-19 (SNOMED)
)
SELECT DISTINCT person_id
FROM condition_occurrence
WHERE condition_concept_id IN (SELECT concept_id FROM covid_concepts);
```
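
Before applying a definition, it can help to confirm that the query returns a clean list of IDs. The check below is a minimal sketch that assumes you can run queries directly against the source OMOP database; `cohort_query` is just an illustrative CTE name wrapping the query above.

```sql
-- Sanity check: the cohort query should yield one row per member and no NULL IDs.
WITH covid_concepts AS (
    SELECT descendant_concept_id AS concept_id
    FROM concept_ancestor
    WHERE ancestor_concept_id = 37311061  -- COVID-19 (SNOMED)
),
cohort_query AS (
    SELECT DISTINCT person_id
    FROM condition_occurrence
    WHERE condition_concept_id IN (SELECT concept_id FROM covid_concepts)
)
SELECT
    COUNT(*)                  AS member_rows,       -- rows returned by the cohort query
    COUNT(DISTINCT person_id) AS distinct_members,  -- NULLs are not counted here
    COALESCE(SUM(CASE WHEN person_id IS NULL THEN 1 ELSE 0 END), 0) AS null_ids
FROM cohort_query;
```

If `member_rows` equals `distinct_members` and `null_ids` is 0, the query produces exactly one non-null ID per cohort member.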

### Configuring a cohort

You can define cohorts while [Creating synthetic databases](/configuration/creating-synthetic-databases.md), and you can update cohort definitions for existing databases in the `Settings > Cohorts` section of the web portal.

Applying a new cohort definition also recomputes row counts for all tables and views. This happens in the background and may take a few minutes to complete.

### Referencing cohorts in view definitions

Cohorts are not applied to views automatically, but a view can reference the database's designated cohort when needed: the list of member IDs is available as a relation called `cohort`:

```sql
SELECT *
FROM visit_occurrence
WHERE person_id IN (SELECT person_id FROM cohort) -- `cohort` is pre-defined
  AND visit_start_date >= DATE '2024-01-01'
  AND visit_start_date <  DATE '2025-01-01';
```
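
A view can also join against `cohort` instead of using `IN`. The following sketch assumes, as in the example above, that `cohort` exposes the member IDs in a column named `person_id` and contains one row per member; if it could contain duplicates, the join would inflate the counts and the `IN` form would be the safer choice.

```sql
-- Number of 2024 visits for each cohort member.
-- Assumes `cohort` has a `person_id` column with one row per member.
SELECT
    v.person_id,
    COUNT(*) AS visit_count
FROM visit_occurrence AS v
JOIN cohort AS c
  ON c.person_id = v.person_id
WHERE v.visit_start_date >= DATE '2024-01-01'
  AND v.visit_start_date <  DATE '2025-01-01'
GROUP BY v.person_id;
```

Members with no visits in the date range do not appear in the result, since the inner join and the date filter only keep matching visit rows.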

