# Constraints

Synthetic data is generated to match the underlying statistical patterns of a given source dataset, with some added noise. Sometimes a dataset must follow strict rules in order to be valid, and data synthesis will not recognize those rules by default. For example, if you have a `state` column alongside a `zip_code` column, only certain combinations of the two are allowed: you cannot have a California ZIP code in a row whose state is Texas.

Note that defining constraints generally does not affect statistical fidelity; their impact is mostly cosmetic. In the example above, you will still mostly see realistic ZIP/state pairs even without a constraint. Adding one ensures that *all* rows follow the rule.
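To make the "all rows" guarantee concrete, the sketch below checks a synthetic sample for `state`/`zip_code` pairs that never occur in the source data, which is the kind of rule the Group constraint in the table below describes. It is a minimal illustration in plain Python, not Subsalt's API: the helper name `unseen_combinations` and the sample rows are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def unseen_combinations(
    source_rows: Sequence[Row],
    synthetic_rows: Sequence[Row],
    columns: Sequence[str],
) -> List[Row]:
    """Return synthetic rows whose values for `columns` never appear together in the source.

    Hypothetical helper for illustration only; not part of any Subsalt API.
    """
    # Every combination of the chosen columns that the source data contains.
    allowed = {tuple(row[c] for c in columns) for row in source_rows}
    # A synthetic row is a violation if its combination is not in that set.
    return [
        row for row in synthetic_rows
        if tuple(row[c] for c in columns) not in allowed
    ]


# Made-up sample rows (assumption: columns are named `state` and `zip_code`).
source = [
    {"state": "CA", "zip_code": "94105"},
    {"state": "TX", "zip_code": "73301"},
]
synthetic = [
    {"state": "CA", "zip_code": "94105"},
    {"state": "TX", "zip_code": "94105"},  # California ZIP code paired with Texas
]

violations = unseen_combinations(source, synthetic, ["state", "zip_code"])
print(violations)
```

Without a constraint, a check like this may occasionally report a few rows; with the constraint in place, it should report none.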

### Adding constraints

You can add constraints to a database during the standard onboarding flow, which includes a step for adding them, or later via the "Constraints" tab on the database details page.

The full list of supported constraints is below.

<table><thead><tr><th width="129.1640625">Constraint</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td>Derive</td><td>Ensures that the specified columns are populated with data derived from another column</td><td><code>birth_year</code> is populated from the year portion of <code>birth_date</code></td></tr><tr><td>Group</td><td>Ensures that the specified columns never appear in a combination that was not observed in the source data</td><td>State, city, and ZIP code can only be combined as observed in the source data</td></tr><tr><td>Conditional</td><td>Ensures that a target column is populated with a specific value when a column contains a specified value</td><td>Values generated for <code>dischargable</code> will be "Y" when <code>healthy</code> is "true"</td></tr><tr><td>Calculate</td><td>Ensures that a target column is populated with the result of a calculation between two columns</td><td>Values generated for <code>total_cost</code> will be the result of <code>base_cost</code> + <code>fee</code></td></tr><tr><td>SpecialValues</td><td>Ensures that special values within columns are preserved</td><td><code>-1</code> has a special meaning in a continuous column and should be modeled independently from the primary distribution</td></tr><tr><td>Existence</td><td>Ensures that a target column is populated with 1 if the user-supplied condition matches and 0 otherwise. <strong>NOTE</strong>: Only integer columns are supported</td><td>Mask the age column by setting each value to 1 if age is greater than 20, and 0 otherwise</td></tr><tr><td>Delta</td><td>Ensures that the gap between two related columns stays realistic by learning the actual distribution of distances in the source data</td><td><code>discharge_date</code> is modeled as <code>admit_date</code> plus the duration between them, so that discharge always follows admission by a realistic interval</td></tr></tbody></table>
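As a rough illustration of what several of these rules mean for an individual row, the sketch below validates one row against the Derive, Conditional, Calculate, and Delta examples from the table. It assumes plain Python dictionaries and is not how Subsalt configures or enforces constraints; the function name `check_row` and the sample values are hypothetical, and the Delta check covers only the ordering of the two dates, not how realistic the interval is.

```python
from datetime import date
from typing import Dict, List


def check_row(row: Dict[str, object]) -> List[str]:
    """Return the names of the example rules that `row` violates.

    Hypothetical helper for illustration only; not part of any Subsalt API.
    """
    failures = []
    # Derive: birth_year is the year portion of birth_date.
    if row["birth_year"] != row["birth_date"].year:
        failures.append("derive")
    # Conditional: dischargable must be "Y" whenever healthy is true.
    if row["healthy"] and row["dischargable"] != "Y":
        failures.append("conditional")
    # Calculate: total_cost is base_cost + fee (integers here, so == is exact).
    if row["total_cost"] != row["base_cost"] + row["fee"]:
        failures.append("calculate")
    # Delta (ordering only): discharge must not precede admission.
    if row["discharge_date"] < row["admit_date"]:
        failures.append("delta")
    return failures


# Made-up rows: the first satisfies every rule, the second breaks all four.
valid_row = {
    "birth_date": date(1980, 5, 17), "birth_year": 1980,
    "healthy": True, "dischargable": "Y",
    "base_cost": 100, "fee": 15, "total_cost": 115,
    "admit_date": date(2024, 1, 3), "discharge_date": date(2024, 1, 9),
}
invalid_row = {
    "birth_date": date(1980, 5, 17), "birth_year": 1981,
    "healthy": True, "dischargable": "N",
    "base_cost": 100, "fee": 15, "total_cost": 100,
    "admit_date": date(2024, 1, 3), "discharge_date": date(2024, 1, 1),
}

print(check_row(valid_row))
print(check_row(invalid_row))
```

With floating-point costs, the Calculate comparison would need a tolerance (for example `math.isclose`) instead of exact equality.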


