Overview

Subsalt is a Postgres-compatible query engine that legally de-identifies query results.

Subsalt is a Postgres-compatible generative database: it mirrors other databases that contain sensitive data (healthcare information, user data, financial records, etc.) and generates legally de-identified responses to standard SQL queries.

In practice, a generative database can dramatically speed up access to data from sensitive sources for a wide range of analytics use cases.

Using Subsalt

As an analyst or data scientist, using Subsalt feels very similar to using a traditional Postgres database: you write the same SQL queries you would have written against the sensitive data source and get results back. You can connect with the same Postgres-compatible tools you already use.

One important difference: the data Subsalt responds with is fully synthetic. In many cases, Subsalt data can be used in place of real data for machine learning applications and population-level analyses. In other cases, users do their exploratory and development work on synthetic data and then run their code against the real data for a blind final analysis.

In either case, you get faster access to data and keep using the tools you know and love. Subsalt is commonly used for machine learning, research, data science, and business intelligence use cases where studying population-level patterns (as opposed to individual rows) is the primary objective.
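Because Subsalt is Postgres-compatible, one way to support the "develop on synthetic data, run the final analysis on real data" workflow is to write the analysis once against a generic database connection and swap only the connection. The sketch below illustrates that pattern under stated assumptions; it is not Subsalt's documented client setup. The `claims` table, its columns, and the `region_summary` helper are hypothetical, and Python's built-in `sqlite3` module stands in for a real Postgres driver so the example runs anywhere.

```python
import sqlite3
from typing import Any, List, Tuple

# Hypothetical population-level query: one aggregate row per region rather
# than individual records. The SQL uses only features that both Postgres and
# SQLite support, so the same string runs on either.
REGION_SUMMARY_SQL = """
    SELECT region, COUNT(*) AS n, AVG(amount) AS avg_amount
    FROM claims
    GROUP BY region
    ORDER BY region
"""


def region_summary(conn: Any) -> List[Tuple[Any, ...]]:
    """Run the summary query on any DB-API 2.0 connection.

    Only cursor(), execute(), fetchall() and close() are used, so the caller
    decides which database the analysis runs against.
    """
    cur = conn.cursor()
    try:
        cur.execute(REGION_SUMMARY_SQL)
        return cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    # Stand-in data source: an in-memory SQLite database with made-up rows.
    # Against Subsalt you would instead pass a Postgres driver connection,
    # e.g. psycopg2.connect(host=..., dbname=..., user=..., password=...),
    # using the connection details of your own deployment (not shown here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO claims VALUES (?, ?)",
        [("east", 100.0), ("east", 300.0), ("west", 50.0)],
    )
    print(region_summary(conn))  # [('east', 2, 200.0), ('west', 1, 50.0)]
    conn.close()
```

In a real project, you would pass a connection pointed at your Subsalt instance during development and, for the blind final run, a connection to the real warehouse, assuming the warehouse accepts the same SQL. The query sticks to aggregates, which matches the population-level use cases described above.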

How it fits in

Subsalt is typically deployed in a Kubernetes cluster in a customer's cloud environment and connected to one or more data warehouses; Subsalt serves as an alternative access point for users who can't easily access the sensitive warehouse(s).

Subsalt periodically connects to these warehouses to train its generative models; answering queries does not require an active connection between Subsalt and the warehouse.

Users most likely to benefit are those who can't access (or can't easily access) the underlying warehouse data because of compliance or privacy concerns, but who need to complete projects that depend on that data. Because Subsalt de-identifies data automatically, these users can often get access to Subsalt dramatically faster than they could get access to the real data.

Managed vs Cloud

Subsalt instances can be provisioned directly in customer environments (Subsalt Managed) or in Subsalt's secure cloud (Subsalt Cloud).

| Factor | Subsalt Managed | Subsalt Cloud |
|---|---|---|
| Infrastructure Management | Customer-owned cloud account; Subsalt manages the cluster | Fully managed by Subsalt |
| Data Location | Customer's cloud environment | Stored in customer's cloud; accessible from Subsalt's cloud |
| Compliance Requirements | Customer-controlled environment | SOC 2 + HIPAA certified environment; BAA available on request |
| Time to Deploy | Requires cluster provisioning | No infrastructure setup |
| Network Requirements | Deployed within customer VPC | Private Link or similar for data access |
| Infrastructure Access | Customer has full infrastructure-level access | None |
| Support | 24/7 monitoring, automatic updates, email support | 24/7 monitoring, automatic updates, email support |
