Creating synthetic databases

A synthetic database is a de-identified, synthetic representation of a sensitive database that can be queried by authorized end users. Synthetic databases must be set up before they can be used, and can be periodically refreshed when the underlying data sources change.

By default, synthetic databases will include a subset of the tables that exist in the source database; you can also define Views that expose custom subsets of data if required.

Creating a new synthetic database

Configuring a data connector

Every synthetic database is tied to a source data system and mimics its schema and data distributions. The first step in creating a new synthetic database is to configure the connection to the source data system, which you can do by clicking the "New connection" button on the Databases tab in the Subsalt portal.

Select the connector you'd like to use, and fill out the connection information that the Subsalt system should use to read data from the source system.

The credentials that you provide must authorize the Subsalt system to run read-only (SELECT) queries against the namespaces and/or databases that you list in the form.
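As an illustration of what read-only access can look like, the sketch below generates GRANT statements for a dedicated role. It assumes the source system is PostgreSQL; other systems use different syntax. The role name `subsalt_reader` and the schema names are hypothetical placeholders, not values required by Subsalt.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_grants(role: str, schemas: list[str]) -> list[str]:
    """Build PostgreSQL statements that give `role` SELECT-only access.

    Assumption: the role already exists and can connect to the database.
    """
    statements = []
    for schema in schemas:
        # USAGE lets the role see objects in the schema at all.
        statements.append(
            f"GRANT USAGE ON SCHEMA {quote_ident(schema)} TO {quote_ident(role)};"
        )
        # SELECT on every existing table, with no write privileges.
        statements.append(
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {quote_ident(schema)} "
            f"TO {quote_ident(role)};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical role and schema names, for illustration only.
    for statement in read_only_grants("subsalt_reader", ["clinical", "billing"]):
        print(statement)
```

Have a database administrator review the generated statements before running them against the source system.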

Specifying a privacy standard

You'll also need to specify a privacy standard to enforce a particular de-identification policy. Privacy standards are typically defined by third-party auditors and cannot be modified manually; contact your administrator or Subsalt support if you're unsure of what privacy standard to select when onboarding a new data source.

Once you've configured your data connector and privacy standard, the system will attempt to connect to the source system and discover what data assets are accessible.

Schema and metadata configuration

Once the discovery process completes, each table that you want to include in your synthetic database needs to be configured with certain metadata in order to be processed; read more about the configuration process in Tables.

For more complex databases, it may be easier to download a representation of the table schema as a file, edit the file, and upload the full schema in one shot. This functionality is available via the "import" and "export" buttons visible on the table configuration page.

You can come back and modify table configurations later, so you can start with onboarding a few tables if you don't want to onboard an entire warehouse at once.

Define constraints

In certain cases it may be useful to configure Constraints on the database. Constraints express explicit relationships between fields; for example, you can define that:

  • birth_year should always match the year of birth_datetime, or

  • arrival_date should always occur before departure_date
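To make the two example rules concrete, here is a minimal sketch of them written as plain Python checks over a single row. It only illustrates the logic of the rules; it is not how Constraints are defined in the Subsalt portal. The row layout (a dict keyed by column name) and the function names are assumptions made for the example.

```python
from datetime import date, datetime


def birth_year_matches(row: dict) -> bool:
    """birth_year should always match the year of birth_datetime."""
    return row["birth_year"] == row["birth_datetime"].year


def arrival_before_departure(row: dict) -> bool:
    """arrival_date should always occur before departure_date."""
    return row["arrival_date"] < row["departure_date"]


# Hypothetical registry mapping a constraint name to its check.
CONSTRAINTS = {
    "birth_year_matches": birth_year_matches,
    "arrival_before_departure": arrival_before_departure,
}


def violated_constraints(row: dict) -> list[str]:
    """Return the names of all constraints that the row breaks."""
    return [name for name, check in CONSTRAINTS.items() if not check(row)]


if __name__ == "__main__":
    row = {
        "birth_year": 1984,
        "birth_datetime": datetime(1985, 3, 2, 14, 30),
        "arrival_date": date(2024, 1, 10),
        "departure_date": date(2024, 1, 12),
    }
    print(violated_constraints(row))
```

In the sample row, the birth year disagrees with the birth timestamp while the dates are in the right order, so only the first rule is reported.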

Review database configuration

After your database has been configured, the final step is to review the configuration and attest that it is correct and complete. Subsalt's automated privacy checks can only assess risk reliably when the configuration is accurate, so be sure to review all details before finalizing it.

Updating a synthetic database

For periodic updates, you can simply click the "Build all" button on the database table list page. This will automatically rebuild all configured tables using freshly sampled data from the configured data source.

You can also add or remove tables from your synthetic database by navigating to the database tables page and clicking the "Configure tables" button.

Rescanning a database is currently not supported. In order to add tables that didn't exist in the original scan, you should create a new synthetic database.
