Creating synthetic databases
A synthetic database is a de-identified, synthetic representation of a sensitive database that can be queried by authorized end users. Synthetic databases must be set up before they can be used, and can be refreshed periodically when the underlying data sources change.
By default, synthetic databases will include a subset of the tables that exist in the source database; you can also define Views that expose custom subsets of data if required.
Creating a new synthetic database
Configuring a data connector
Every synthetic database is tied to a source data system and mimics its schema and data distributions. The first step in creating a new synthetic database is to configure the connection to the source data system, which you can do by clicking the "New connection" button on the Databases tab in the Subsalt portal.
Select the connector you'd like to use, and fill out the connection information that the Subsalt system should use to read data from the system.
Specifying a privacy standard
You'll also need to specify a privacy standard to enforce a particular de-identification policy. Privacy standards are typically defined by third-party auditors and cannot be modified manually; contact your administrator or Subsalt support if you're unsure of what privacy standard to select when onboarding a new data source.
Once you've configured your data connector and privacy standard, the system will attempt to connect to the source system and discover what data assets are accessible.
Schema and metadata configuration
Once the discovery process completes, each table that you want to include in your synthetic database needs to be configured with certain metadata in order to be processed; read more about the configuration process in Tables.
For more complex databases, it may be easier to download a representation of the table schema as a file, edit the file, and upload the full schema in one shot. This functionality is available via the "import" and "export" buttons visible on the table configuration page.
Define constraints
In certain cases it may be useful to configure Constraints on the database. Constraints help when there are explicit relationships between certain fields. For example, you can define that birth_year should always match the year of birth_datetime, or that arrival_date should always occur before departure_date.
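To make the two example relationships concrete, the following minimal Python sketch expresses them as row-level checks. It only illustrates what the constraints mean; it is not Subsalt's constraint syntax or API. The function names, the CHECKS mapping, and the sample rows are all hypothetical, and "before" is assumed to mean strictly earlier.

```python
from datetime import date, datetime

# Hypothetical row-level checks illustrating the two example constraints.
# These names are for illustration only and are not part of Subsalt.

def birth_year_matches(row):
    """birth_year should equal the year component of birth_datetime."""
    return row["birth_year"] == row["birth_datetime"].year

def arrival_before_departure(row):
    """arrival_date should be strictly earlier than departure_date (assumed)."""
    return row["arrival_date"] < row["departure_date"]

CHECKS = {
    "birth_year matches birth_datetime": birth_year_matches,
    "arrival_date before departure_date": arrival_before_departure,
}

def find_violations(rows):
    """Return a (row_index, constraint_name) pair for every failed check."""
    return [
        (i, name)
        for i, row in enumerate(rows)
        for name, check in CHECKS.items()
        if not check(row)
    ]

# Made-up sample data: the first row satisfies both constraints,
# the second row violates both.
rows = [
    {
        "birth_year": 1980,
        "birth_datetime": datetime(1980, 5, 17, 8, 30),
        "arrival_date": date(2024, 1, 3),
        "departure_date": date(2024, 1, 5),
    },
    {
        "birth_year": 1991,
        "birth_datetime": datetime(1990, 12, 31, 23, 59),
        "arrival_date": date(2024, 2, 10),
        "departure_date": date(2024, 2, 9),
    },
]

for index, name in find_violations(rows):
    print(f"row {index} violates: {name}")
```

Running a check like this against a sample of the source data before you define a constraint can help confirm that the relationship actually holds in the source.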
Review database configuration
After your database has been configured, the final step is to review the configuration and attest that it is correct and complete. Subsalt's automated privacy checks can only assess risk correctly if the configuration is accurate, so be sure to review all details before finalizing it.
Updating a synthetic database
For periodic updates, you can simply click the "Build all" button on the database table list page. This will automatically rebuild all configured tables using freshly sampled data from the configured data source.
You can also add or remove tables from your synthetic database by navigating to the database tables page and clicking the "Configure tables" button.