Quick start

This guide assumes you have access to either Subsalt Cloud or a managed instance. Contact us for trial access to our Cloud-based sandbox environment.

This guide walks you step by step through building your first synthetic table from a one-million-row source dataset.

Creating the database

Detailed documentation: Creating synthetic databases

  1. Log into the Subsalt web portal and click the "New connection" button on the "Databases" tab.

  2. Select MySQL, click Next, and then add the following connection information:

    1. Synthetic database alias: first_database

    2. Privacy standard: HIPAA

    3. Hostname: subsalt-onboard.subsalt.svc.cluster.local

    4. Database name: healthcare

    5. Username: onboard

    6. Password: password

  3. After clicking the "Next" button, a scan status screen will appear with a progress bar indicating that a database scan is underway. During this process, the application analyzes the source dataset schema so that data columns can be properly configured in a subsequent step.

  4. Once the scanning process completes, the database configuration page will appear with a few tables listed. Check the boxes beside the patients, visits, and medications tables.

These three tables will be included in your new synthetic database. The next step is to configure each of them to maximize synthetic data quality and apply appropriate privacy protections.

Configuring tables

Detailed documentation: Tables

Next we need to select each of the three tables we've included and configure their schemas.

  1. First select the patients table and add the "Indirect identifier" property to the weight column. Then click "Mark reviewed" to go back to the database configuration page.

  2. Then select the visits table and add the "Join key" property to the patient_id column. An extra dropdown will appear, and you should select patients.patient_id to indicate which column the visits.patient_id references. Then select "Mark reviewed."

  3. Lastly, select the medications table and add the "Join key" property to patient_id, just as you did with the visits table above. Then select "Mark reviewed."

Then click the "Next" button, click "Next" on the Adding constraints page, and review your configurations for all three tables. Check the box indicating that you've reviewed the configurations and click the "Next" button.

Congratulations! You've completed onboarding for your first synthetic database!

This dataset is non-sensitive, so feel free to try different configurations.

Building your table(s)

In order to populate your synthetic database, you need to build each table. "Building" a table instructs the system to pull in data from the source system to train and evaluate generative models on. You can build tables in one of two ways:

  • Click the "Build all" button on the table list page - /databases/<database_id>/tables

  • Click on an individual table and click the "Build" button - /databases/<database_id>/tables/<table_id>

Building may take several hours depending on dataset size. You can click on the status indicator beside each table to monitor the progress of each build job.

Once a table enters "Ready" status it'll be immediately queryable by authorized users.

Querying a table

Detailed documentation: Running queries

You can retrieve synthetic data from a table as soon as it's marked as Ready. Before querying, there are a few one-time steps to enable access to the synthetic database:

  1. If your Subsalt portal account uses SSO, you need to set a distinct password for querying Subsalt. Go to Profile > Access Credentials in the portal to set it.

  2. Next, go to the Databases page and select the first_database you created above. Check the "Conditions" label and review any agreements that may be required for your organization.

Once those steps are complete, you're ready to query your database.

Open your preferred Postgres client - TablePlus is a good option if you don't have a favorite. Create a new Postgres connection using the following connection information:

  • Hostname: [Your access point hostname]

  • User: [Email address you used for the portal]

  • Password: [Password you set in Step 1 above]

  • Database: first_database

  • Port: 5432
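If you would rather connect from a script or a client that accepts a connection string, the settings above can be combined into a standard Postgres connection URI. The following is a minimal sketch, not an official Subsalt utility: the function name `build_postgres_uri`, the hostname, and the credentials are hypothetical placeholders. The one detail worth noting is that the user is an email address, so its "@" has to be percent-encoded.

```python
from urllib.parse import quote


def build_postgres_uri(host, user, password, database="first_database", port=5432):
    """Assemble a postgresql:// URI from the connection settings listed above."""
    # Percent-encode the user and password: the portal login is an email
    # address, and a raw "@" would be misread as the user/host separator.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Placeholder values only; substitute your own access point and credentials.
uri = build_postgres_uri("access-point.example.com", "analyst@example.com", "p@ss/word")
print(uri)
# prints postgresql://analyst%40example.com:p%40ss%2Fword@access-point.example.com:5432/first_database
```

Most Postgres clients and drivers that build on libpq accept a URI in this form.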

Once authenticated, run the simple query below to view a sample of the results from your first table:

select * from patients limit 5000;
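The same query can also be run from code. The sketch below assumes a Python driver that exposes a DB-API 2.0 connection (psycopg2 is one such driver, but it has not been verified against Subsalt here). The helper name `fetch_sample` is hypothetical, and an in-memory SQLite database with made-up rows stands in for the real connection so the example runs anywhere.

```python
import sqlite3


def fetch_sample(conn, table, limit=5000):
    """Return up to `limit` rows of `table` as a list of dicts.

    `conn` is any DB-API 2.0 connection (assumption: your Postgres driver
    exposes one, as psycopg2 does).
    """
    # Table names cannot be bound as query parameters, so validate instead.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"select * from {table} limit {int(limit)}")
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Local stand-in so the sketch runs without a Subsalt connection.
# The rows below are made-up placeholders, not Subsalt output.
demo = sqlite3.connect(":memory:")
demo.execute("create table patients (patient_id integer, weight real)")
demo.executemany("insert into patients values (?, ?)", [(1, 70.5), (2, 82.0)])
print(fetch_sample(demo, "patients", limit=1))
```

To use it against your synthetic database, pass a connection opened with the settings listed above in place of `demo`.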

Joining tables

Joins work the same way as single-table queries, but require an additional build step the first time two tables are joined. Run this query on first_database:

select * from patients
join medications on patients.patient_id = medications.patient_id
limit 5000;

If the patients and medications tables have never been joined before, you'll receive a message indicating that a new build job needs to run along with a link to a status page to track progress. Once that job completes, these tables will be joinable for all future users.
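When reading the join output, keep in mind standard SQL inner-join behavior: a patient appears once per matching medication row, and patients with no medications are omitted, so "limit 5000" caps joined rows rather than patients. The sketch below illustrates this locally with SQLite as a stand-in. The rows and the medications "name" column are made up for illustration and are not part of the onboarding dataset's documented schema.

```python
import sqlite3

# In-memory stand-in; none of this data comes from Subsalt.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table patients (patient_id integer, weight real);
    create table medications (patient_id integer, name text);
    insert into patients values (1, 70.5), (2, 82.0);
    insert into medications values (1, 'drug_a'), (1, 'drug_b');
""")

# Same join as above, run against the toy tables.
rows = conn.execute("""
    select * from patients
    join medications on patients.patient_id = medications.patient_id
    limit 5000
""").fetchall()

# Patient 1 has two medications and appears twice; patient 2 has none
# and is dropped by the inner join.
for row in rows:
    print(row)
```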
