Tables

Each table maps one-to-one to a relation in the source system. A table may contain only a subset of the source table's fields, but it never includes a field that doesn't exist in the source system.

Setup

Configure your tables carefully during onboarding so that data quality is high and privacy is measured correctly. Two things need to be configured for each field: a content label that describes the field's semantic content, and any properties that apply to the field, including privacy-related properties.

Content

Content labels describe what sort of data is contained in each column, which affects how the system models and trains on your data. Content labels will usually be auto-populated, but it's important that you review all labels for correctness.

| Content | Description | Examples |
| --- | --- | --- |
| Categorical | A nominal field with a discrete set of values | Gender, ZIP codes, ICD-10 codes |
| Numeric | An ordinal numeric field | Age, Height |
| Datetime | A date or datetime representation | 7/15/22 10:41:55, August 11 2022 |
| Currency | A string that corresponds to a USD ($) amount. Currency symbol must be the first character. | $99.99, $1.05 |
| Binary | Any field that contains 2 unique values | 1/0, yes/no, on/off |
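The Currency and Binary definitions are concrete enough to check by hand when reviewing auto-populated labels. The sketch below is one way to do that, not Subsalt's own labeling logic. The function names are hypothetical, and it assumes amounts have no thousands separators, at most two decimal places, and that nulls are represented as `None` and ignored when counting unique values.

```python
import re
from typing import Iterable, Optional

# Assumption: "$" first, then digits, then an optional 1-2 digit decimal part.
_USD_PATTERN = re.compile(r"\$\d+(\.\d{1,2})?")


def is_currency(value: str) -> bool:
    """Return True if the whole string is a USD amount starting with '$'."""
    return _USD_PATTERN.fullmatch(value) is not None


def is_binary(values: Iterable[Optional[object]]) -> bool:
    """Return True if the non-null values contain exactly two unique values."""
    distinct = {v for v in values if v is not None}
    return len(distinct) == 2


if __name__ == "__main__":
    print(is_currency("$99.99"))             # symbol is the first character
    print(is_currency("99.99$"))             # symbol is not first
    print(is_binary(["yes", "no", "yes"]))   # two unique values
    print(is_binary([1, 0, 2]))              # three unique values
```

A column whose values all pass `is_currency` is a reasonable candidate for the Currency label; a column that fails on even a few rows is worth a closer look before accepting the label.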

Properties

Properties provide additional metadata that's important for privacy evaluation and for other tasks, such as joining tables and modeling entities over time. If necessary, Subsalt can provide support from third-party auditors for populating HIPAA-compliant privacy labels.

| Property | Description | Examples |
| --- | --- | --- |
| Indirect identifier | A field that, combined with other information, would help single out an individual in a dataset | Age, Gender, Home state |
| Direct identifier | A field that can be used to directly single out an individual in a dataset | Names, SSNs, Contact info |
| Person's age | A field that indicates a person's age | Age, Birthdate |
| Join key | A field that can be used to join two or more tables | Patient ID, Facility ID |
| Medical code | A field that contains ICD-10 codes or other classification codes | Diagnoses, procedures |
| Entity identifier | A field that contains unique IDs for entities that need to be modeled over time | Patient ID |
| Sequence key | Datetime fields that indicate the sequence of events for the entity | Visit dates |
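It can help to think of each field's configuration as one content label plus a set of properties. The sketch below models that in Python purely for illustration; it is not Subsalt's configuration format or API. `FieldConfig`, `config_problems`, and the field names are hypothetical. The single check it performs comes from the Sequence key description above, which says sequence keys are Datetime fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Content(Enum):
    CATEGORICAL = "Categorical"
    NUMERIC = "Numeric"
    DATETIME = "Datetime"
    CURRENCY = "Currency"
    BINARY = "Binary"


class Property(Enum):
    INDIRECT_IDENTIFIER = "Indirect identifier"
    DIRECT_IDENTIFIER = "Direct identifier"
    PERSONS_AGE = "Person's age"
    JOIN_KEY = "Join key"
    MEDICAL_CODE = "Medical code"
    ENTITY_IDENTIFIER = "Entity identifier"
    SEQUENCE_KEY = "Sequence key"


@dataclass(frozen=True)
class FieldConfig:
    """Hypothetical container: one content label and any number of properties."""
    name: str
    content: Content
    properties: FrozenSet[Property] = frozenset()


def config_problems(cfg: FieldConfig) -> List[str]:
    """Flag a sequence key whose content label is not Datetime."""
    problems = []
    if Property.SEQUENCE_KEY in cfg.properties and cfg.content is not Content.DATETIME:
        problems.append(f"{cfg.name}: sequence keys are described as Datetime fields")
    return problems


# Illustrative fields only; the names are made up for this example.
age = FieldConfig(
    "age",
    Content.NUMERIC,
    frozenset({Property.INDIRECT_IDENTIFIER, Property.PERSONS_AGE}),
)
visit_date = FieldConfig("visit_date", Content.DATETIME, frozenset({Property.SEQUENCE_KEY}))
visit_number = FieldConfig("visit_number", Content.NUMERIC, frozenset({Property.SEQUENCE_KEY}))

if __name__ == "__main__":
    for cfg in (age, visit_date, visit_number):
        print(cfg.name, config_problems(cfg))
```

Here `age` carries two properties at once, matching the table, where Age appears as an example of both an indirect identifier and a person's age.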

Ineligible fields

Subsalt's only requirement for a field is that at least 50% of its values are non-null. Fields that do not meet this requirement are automatically marked as ineligible and are excluded from the synthetic database schema, so data consumers can neither see nor query them.
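To anticipate which fields will be marked ineligible, you can apply the 50% rule to a sample of your source data before onboarding. The sketch below is a minimal version of that check, not Subsalt's implementation. The function names are hypothetical, and it assumes nulls are represented as `None` and that an empty column counts as ineligible.

```python
from typing import Dict, List, Optional, Sequence

# Threshold from the rule above: a field must be at least 50% non-null.
MIN_NON_NULL_FRACTION = 0.5


def non_null_fraction(values: Sequence[Optional[object]]) -> float:
    """Fraction of values that are not None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def eligible_fields(columns: Dict[str, Sequence[Optional[object]]]) -> List[str]:
    """Names of the columns that meet the non-null threshold, in input order."""
    return [
        name
        for name, values in columns.items()
        if non_null_fraction(values) >= MIN_NON_NULL_FRACTION
    ]


if __name__ == "__main__":
    sample = {
        "age": [34, None, 51, 47],                # 75% non-null
        "fax": [None, None, None, "555-0100"],    # 25% non-null
        "state": ["NY", None],                    # exactly 50% non-null
    }
    print(eligible_fields(sample))
```

Note that a field at exactly 50% non-null passes, because the rule is "at least 50%".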

Lookup tables

Lookup tables are static fact tables that contain non-personal information, such as the OMOP concept tables or a list of ICD-10 codes with their classifications and definitions. These tables have two important properties:

  • They have no relationship to patients or patient populations on their own, and therefore carry no privacy risk until they're joined with patient-related information.

  • They need to stay accurate when joined with synthetic patient information; the definition of a particular Concept ID shouldn't change from row to row.

Tables that have these two properties can be configured as "lookup tables" during data onboarding. Subsalt copies lookup tables into the Subsalt cluster; they are not synthesized and are exempt from privacy audits.

Be sure to review potential lookup tables with appropriate stakeholders before marking a table as a lookup table; this setting has significant privacy implications.
