Overview
Subsalt is a Postgres-compatible query engine that legally de-identifies query results.
Subsalt is a Postgres-compatible generative database. It mirrors other databases containing sensitive data (healthcare information, user data, financial records, etc.) and generates legally de-identified responses to standard SQL queries.
In practice, generative databases can dramatically shorten the time it takes to get access to sensitive data sources for a wide range of analytics use cases.
Using Subsalt
As an analyst or data scientist, using Subsalt feels very similar to using a traditional Postgres database: you write the same queries you would've written on the sensitive data source, and get results back. You can connect using all of the tools you're used to.
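As a rough illustration of what connecting with familiar tooling might look like, here is a minimal Python sketch. It assumes the standard `psycopg2` Postgres driver is installed and that your Subsalt instance accepts a libpq-style connection URI; the hostname, credentials, and database name used with it are hypothetical placeholders, not values from Subsalt's documentation.

```python
from urllib.parse import quote


def build_dsn(user, password, host, port=5432, dbname="postgres", sslmode="require"):
    """Build a libpq-style connection URI. All values passed in are placeholders."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}?sslmode={sslmode}"
    )


def run_query(dsn, sql, params=None):
    """Run one SQL statement and return all rows."""
    # Imported lazily so build_dsn works even without the driver installed.
    import psycopg2

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.close()


# Hypothetical host and credentials, for illustration only.
dsn = build_dsn("analyst", "p@ss word", "subsalt.example.internal")
```

With a reachable instance, `run_query(dsn, "SELECT 1")` would be issued exactly as it would against any other Postgres-compatible endpoint.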
One important difference: the data Subsalt responds with is fully synthetic. In many cases, Subsalt data can be used in place of real data for machine learning applications and population-level analyses; in other cases, users will do their exploratory and development work on synthetics and then ship their code back to real data for a blind final analysis.
In either case, you get faster access to data and get to keep using the tools you know and love. Subsalt is commonly used for machine learning, research, data science, and business intelligence use cases where population-level patterns (as opposed to individual rows) are the primary objective.
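One way to structure the "develop on synthetic data, then run on real data" workflow described above is to keep the analysis code independent of where the query runs. The sketch below is an assumption about how you might organize your own code, not a Subsalt API: the `transactions` table, its columns, and the helper names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Any function that takes SQL text and returns rows can be plugged in.
QueryFn = Callable[[str], Sequence[Tuple[str, float]]]

# Hypothetical table and columns, for illustration only.
AVG_SPEND_SQL = """
SELECT region, AVG(amount) AS avg_amount
FROM transactions
GROUP BY region
ORDER BY region
"""


def average_spend_by_region(run_query: QueryFn) -> Dict[str, float]:
    """Population-level summary that depends only on the injected query function."""
    return {region: float(avg) for region, avg in run_query(AVG_SPEND_SQL)}


def fake_run_query(sql: str):
    # Stand-in for a real connection so the sketch runs without a database.
    return [("east", 10.0), ("west", 12.5)]


print(average_spend_by_region(fake_run_query))
```

During development, `run_query` would point at Subsalt. For the final blind analysis, the same function can be handed a query function bound to the real data source, with no change to the analysis code itself.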
How it fits in
Subsalt is typically deployed in a Kubernetes cluster in a customer's cloud environment and connected to one or more data warehouses; Subsalt serves as an alternative access point for users who can't easily access the sensitive warehouse(s).
The system periodically connects to these warehouses to train generative models, but no active connection is required between Subsalt and the warehouse for handling queries.
Users most likely to benefit are those who can't access (or can't easily access) the underlying warehouse data due to compliance or privacy concerns, but need to complete projects that depend on that data. Because of Subsalt's automatic de-identification capabilities, these users can often get dramatically faster access to Subsalt than they could to the real data.
Managed vs Cloud
Subsalt instances can be provisioned directly in customer environments (Subsalt Managed) or in Subsalt's secure cloud (Subsalt Cloud).
| | Subsalt Managed | Subsalt Cloud |
|---|---|---|
| Infrastructure Management | Customer-owned cloud account, Subsalt manages cluster | Fully managed by Subsalt |
| Data Location | Customer's cloud environment | Stored in customer's cloud; accessible from Subsalt's cloud |
| Compliance Requirements | Customer-controlled environment | SOC2 + HIPAA certified environment, BAA available on request |
| Time to Deploy | Requires cluster provisioning | No infrastructure setup |
| Network Requirements | Deployed within customer VPC | Private Link or similar for data access |
| Infrastructure Access | Customer has full infrastructure-level access | None |
| Support | 24/7 monitoring, automatic updates, email support | 24/7 monitoring, automatic updates, email support |