Tables
Tables map one-to-one with relations in the source system; in some cases they'll contain a subset of the fields from the source table, but they will never include fields that don't exist in the source system.
Setup
It's important to configure your tables properly during onboarding so that data quality stays high and privacy is measured correctly. Two things need to be configured for each field: a content label that describes the semantic content of the column, and any properties that apply to the field, including privacy-related properties.
Content
Content labels describe what sort of data is contained in each column, which affects how the system models and trains on your data. Content labels will usually be auto-populated, but it's important that you review all labels for correctness.
Categorical: A nominal field with a discrete set of values. Examples: Gender, ZIP codes, ICD-10 codes
Numeric: An ordinal numeric field. Examples: Age, Height
Datetime: A date or datetime representation. Examples: 7/15/22 10:41:55, August 11 2022
Currency: A string that corresponds to a USD ($) amount. Currency symbol must be the first character. Examples: $99.99, $1.05
Binary: Any field that contains 2 unique values. Examples: 1/0, yes/no, on/off
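Because auto-populated labels still need review, it can help to spot-check a column against the definitions above. The following is a minimal sketch, not Subsalt's labeling logic: the function names and the currency pattern (a leading "$", digits, optional two-digit cents, no thousands separators) are assumptions made for illustration.

```python
import re

# Assumed pattern: "$" must be the first character, followed by digits
# and optional cents. Thousands separators are deliberately not handled.
CURRENCY_RE = re.compile(r"\$\d+(\.\d{2})?")


def looks_like_currency(values):
    """Return True if every non-null value is a string such as "$99.99"."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, str) and CURRENCY_RE.fullmatch(v) is not None
        for v in present
    )


def looks_binary(values):
    """Return True if the non-null values take exactly two unique values."""
    return len({v for v in values if v is not None}) == 2
```

A column for which neither check passes is not necessarily mislabeled; the checks only flag candidates worth a second look during review.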
Properties
Properties provide additional metadata that is used for privacy evaluation and other tasks. If necessary, Subsalt can provide support from third-party auditors for populating HIPAA-compliant privacy labels.
Indirect identifier: A field that, combined with other information, could help single out an individual in a dataset. Examples: Age, Gender, Home state
Direct identifier: A field that can be used to directly single out an individual in a dataset. Examples: Names, SSNs, Contact info
Person's age: A field that indicates a person's age. Examples: Age, Birthdate
Join key: A field that can be used to join two or more tables. Examples: Patient ID, Facility ID
Medical code: A field that contains ICD-10 codes or other classification codes. Examples: Diagnoses, procedures
Entity identifier: A field that contains unique IDs for entities that need to be modeled over time. Example: Patient ID
Sequence key: A datetime field that indicates the sequence of events for the entity. Example: Visit dates
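Content labels and properties interact: a sequence key is described as a datetime field, so a field marked as a sequence key should also carry the Datetime content label. The sketch below shows one way to check that before onboarding. The FieldConfig class, the "sequence_key" property string, and the function name are hypothetical and do not reflect Subsalt's configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class FieldConfig:
    """Hypothetical per-field configuration: one content label plus properties."""
    name: str
    content: str
    properties: set = field(default_factory=set)


def sequence_key_problems(fields):
    """Return the names of sequence-key fields whose content label is not Datetime."""
    return [
        f.name
        for f in fields
        if "sequence_key" in f.properties and f.content != "Datetime"
    ]


fields = [
    FieldConfig("visit_date", "Datetime", {"sequence_key"}),
    FieldConfig("visit_number", "Numeric", {"sequence_key"}),
    FieldConfig("patient_id", "Categorical", {"entity_identifier", "join_key"}),
]
problems = sequence_key_problems(fields)
```

Here only visit_number would be flagged, since it is marked as a sequence key but labeled Numeric.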
Ineligible fields
The only requirement for any field in a table in Subsalt is that the field must be at least 50% non-null; fields that do not meet this requirement will be automatically marked as ineligible. These fields will not be included in the synthetic database schema, so they will not be visible to or queryable by data consumers.
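To anticipate which fields will be marked ineligible, you can compute the non-null fraction of each column before onboarding. This is a minimal sketch under stated assumptions: the table is represented as a dict of column name to list of values, None stands for null, and an empty column is treated as ineligible. The function names are illustrative, not part of Subsalt.

```python
def non_null_fraction(values):
    """Fraction of values that are not None; an empty column counts as 0.0."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def ineligible_fields(table, threshold=0.5):
    """Return the names of columns that are less than `threshold` non-null."""
    return [
        name
        for name, values in table.items()
        if non_null_fraction(values) < threshold
    ]


table = {
    "age": [34, None, 51, None],              # exactly 50% non-null: eligible
    "fax_number": [None, None, None, "555"],  # 25% non-null: ineligible
}
flagged = ineligible_fields(table)
```

Note that the comparison is strict: a field that is exactly 50% non-null meets the "at least 50%" requirement and stays eligible.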
Lookup tables
Lookup tables are static fact tables that contain non-personal information, such as the OMOP concept tables or a list of ICD-10 codes with their classifications and definitions. These tables have two important properties:
They have no relationship to patients or patient populations on their own, and therefore carry no privacy risk until they're joined with patient-related information.
They must stay accurate when joined with synthetic patient information; the definition of a particular Concept ID shouldn't change from row to row.
Tables that have these two properties can be configured as "lookup tables" during data onboarding; Subsalt copies lookup tables into the Subsalt cluster, and these tables are not synthesized and are exempt from privacy audits.
Be sure to review potential lookup tables with appropriate stakeholders before marking a table as a lookup table; this setting has significant privacy implications.
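The reason lookup tables are copied rather than synthesized can be illustrated with a small join. In this sketch the synthetic rows, field names, and helper function are hypothetical; the point is only that every row with the same code resolves to the same, unmodified definition.

```python
# Lookup table copied as-is: ICD-10 code -> description.
ICD10_LOOKUP = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
}

# Hypothetical synthetic patient rows; no real individuals are represented.
synthetic_visits = [
    {"patient_id": "syn-001", "diagnosis_code": "I10"},
    {"patient_id": "syn-002", "diagnosis_code": "E11.9"},
    {"patient_id": "syn-003", "diagnosis_code": "I10"},
]


def join_with_lookup(rows, lookup, key):
    """Attach the lookup description to each row; unknown codes map to None."""
    return [{**row, "description": lookup.get(row[key])} for row in rows]


joined = join_with_lookup(synthetic_visits, ICD10_LOOKUP, "diagnosis_code")
```

If the lookup table were synthesized like patient data, the two I10 rows could end up with different descriptions, which is exactly what the second property rules out.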