# Constraints

Synthetic data is generated to match the underlying statistical patterns of a source dataset, with some added noise. Sometimes a dataset must follow strict rules to be valid, and data synthesis will not recognize those rules by default. For example, if a dataset has both a `state` column and a `zip_code` column, only certain combinations of the two are valid: a California ZIP code cannot appear in a row whose state is Texas.

Note that defining constraints generally does not affect statistical fidelity; their impact is mostly cosmetic. In the example above, even without a constraint you will still mostly see realistic ZIP/state pairs. Adding a constraint ensures that *all* rows follow a particular set of rules.
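To make the state/ZIP rule concrete, the following is a minimal sketch of a check you could run yourself on generated rows. It is not part of the product: the function name `unseen_combinations` and the sample rows are hypothetical, and it only illustrates the idea that a pair of columns should never appear in a combination that is absent from the source data.

```python
def unseen_combinations(source_rows, synthetic_rows, columns):
    """Return synthetic rows whose values for `columns` never co-occur in the source."""
    # Every combination of the chosen columns that appears in the source data.
    allowed = {tuple(row[c] for c in columns) for row in source_rows}
    return [
        row for row in synthetic_rows
        if tuple(row[c] for c in columns) not in allowed
    ]


# Hypothetical sample data for illustration only.
source = [
    {"state": "CA", "zip_code": "94105"},
    {"state": "TX", "zip_code": "73301"},
]
synthetic = [
    {"state": "CA", "zip_code": "94105"},  # combination seen in the source
    {"state": "TX", "zip_code": "94105"},  # California ZIP paired with Texas
]

violations = unseen_combinations(source, synthetic, ["state", "zip_code"])
```

Here `violations` contains only the row that pairs Texas with a California ZIP code. Without a constraint such rows may occasionally appear; with one, this list should be empty.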

### Adding constraints

You can add constraints to a database during onboarding: the standard onboarding flow includes a step for adding them. You can also add them later from the "Constraints" tab on the database details page.

The full list of supported constraints is below.

<table><thead><tr><th width="129.1640625">Constraint</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td>Derive</td><td>Ensures that specified columns will be populated with data from another column</td><td><code>birth_year</code> should be populated based on the "year" portion of <code>birth_date</code></td></tr><tr><td>Group</td><td>Ensures that the columns specified are never seen in a new unique combination</td><td>State, city, and ZIP can only be combined as observed in the source data</td></tr><tr><td>Conditional</td><td>Ensures that a target column will be populated with a specific value when another column contains a specified value</td><td>Values generated for <code>dischargable</code> will be <code>Y</code> when <code>healthy</code> is <code>true</code></td></tr><tr><td>Calculate</td><td>Ensures that a target column will be populated with the result of a calculation between two columns</td><td>Values generated for <code>total_cost</code> will be the result of <code>base_cost</code> + <code>fee</code></td></tr><tr><td>SpecialValues</td><td>Ensures that special values within columns are preserved</td><td><code>-1</code> has a special meaning in a continuous column and should be modeled independently from the primary distribution</td></tr><tr><td>Existence</td><td>Ensures that a target column will be populated with 1 if the user-specified condition matches, or 0 otherwise. <strong>NOTE</strong>: Only integer columns are supported</td><td>Mask the <code>age</code> column by setting values to 1 if age is greater than 20, otherwise 0</td></tr><tr><td>Delta</td><td>Ensures that the gap between two related columns stays realistic by learning the actual distribution of distances in the source data</td><td><code>discharge_date</code> is modeled as <code>admit_date</code> plus the duration between the two, ensuring discharge always follows admission by a realistic interval</td></tr></tbody></table>
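As a rough illustration of what several of these constraints guarantee, the sketch below checks a single row against the examples in the table. It is an assumption-laden illustration, not the product's implementation or API: the function `check_row` and the sample row are hypothetical, the column names are taken from the table's examples, and the Delta check covers only the ordering of the two dates, not whether the interval is realistic.

```python
from datetime import date


def check_row(row):
    """Report, per rule, whether one row satisfies the table's example."""
    return {
        # Derive: birth_year is the year portion of birth_date.
        "derive": row["birth_year"] == row["birth_date"].year,
        # Calculate: total_cost is base_cost + fee.
        "calculate": row["total_cost"] == row["base_cost"] + row["fee"],
        # Conditional: dischargable is "Y" whenever healthy is "true".
        "conditional": row["healthy"] != "true" or row["dischargable"] == "Y",
        # Delta (ordering only): discharge does not precede admission.
        "delta": row["discharge_date"] >= row["admit_date"],
    }


# Hypothetical row for illustration only.
row = {
    "birth_date": date(1980, 5, 17),
    "birth_year": 1980,
    "base_cost": 1200,
    "fee": 50,
    "total_cost": 1250,
    "healthy": "true",
    "dischargable": "Y",
    "admit_date": date(2024, 3, 1),
    "discharge_date": date(2024, 3, 4),
}

result = check_row(row)
```

A row generated with the corresponding constraints in place would be expected to pass every check it applies to; a row generated without them usually will, but is not guaranteed to.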
