Database Inference & Aggregation Simulator
Learn about database inference attacks through interactive guided scenarios. Query mock HR, medical, and financial databases using aggregation functions, discover how sensitive data can be deduced, and explore countermeasures like polyinstantiation, noise injection, and cell suppression.
What Is Database Inference
Database inference is a security threat in which an attacker derives sensitive information from seemingly innocuous query results. Even when direct access to confidential data is restricted, the combination of permitted queries, aggregate functions, and metadata can reveal protected information. This is a particular concern for statistical databases, data warehouses, and systems that provide analytical query access to multiple users with different privilege levels.
Unlike SQL injection, which exploits input validation flaws, inference attacks exploit the legitimate functionality of a database. An attacker uses authorized queries — counting, averaging, filtering — to narrow down results until they can deduce specific records or values that they should not be able to access.
How Database Inference Attacks Work
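Inference attacks exploit the mathematical relationship between aggregate query results and individual records. When a query set is very small, or when two query sets differ by a single record, an aggregate such as COUNT, SUM, or AVG effectively reveals an individual value.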
Common Inference Techniques
| Technique | Method | Example |
|---|---|---|
| Direct inference | Query results directly reveal sensitive data | "SELECT AVG(salary) FROM employees WHERE department = 'CEO Office'" returns one person's salary when that department has a single member |
| Indirect inference | Combining multiple queries isolates individuals | Two queries with overlapping filters differ by one record |
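| Tracker attacks | Padding a refused small query with an answerable "tracker" condition T, then subtracting the padding | COUNT(C OR T) + COUNT(C OR NOT T) minus the total record count (COUNT(T) + COUNT(NOT T)) equals COUNT(C), even when C alone is refused |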
| Homogeneity attacks | All records in a group share the same sensitive value | Every person in a filtered result has the same diagnosis |
| Background knowledge | External data combined with query results | Knowing someone is in a specific department plus aggregate data |
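The tracker technique in the table can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's implementation: the `EMPLOYEES` rows, the threshold `K`, and the functions `restricted_count` and `tracker_count` are all hypothetical. It assumes the database refuses any count whose query set is smaller than `K` or larger than `N - K`, and it relies on the identity |C| = |C or T| + |C or not T| - N.

```python
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Hypothetical mock HR rows, invented for illustration only.
EMPLOYEES = [
    {"name": "Alice", "dept": "Engineering", "salary": 120_000},
    {"name": "Bob", "dept": "Engineering", "salary": 135_000},
    {"name": "Carol", "dept": "Engineering", "salary": 210_000},
    {"name": "Dan", "dept": "Sales", "salary": 90_000},
    {"name": "Erin", "dept": "Sales", "salary": 95_000},
    {"name": "Frank", "dept": "Sales", "salary": 88_000},
    {"name": "Grace", "dept": "Finance", "salary": 110_000},
    {"name": "Heidi", "dept": "Finance", "salary": 105_000},
]
K = 2  # assumed minimum query-set size


def restricted_count(pred: Predicate) -> Optional[int]:
    """Answer COUNT(pred) only if the query set size lies in [K, N - K]."""
    n = sum(1 for row in EMPLOYEES if pred(row))
    if n < K or n > len(EMPLOYEES) - K:
        return None  # refused: query set too small or too large
    return n


def tracker_count(target: Predicate, tracker: Predicate) -> Optional[int]:
    """Recover COUNT(target) using only queries the restriction allows."""
    answers = [
        restricted_count(tracker),
        restricted_count(lambda r: not tracker(r)),
        restricted_count(lambda r: target(r) or tracker(r)),
        restricted_count(lambda r: target(r) or not tracker(r)),
    ]
    if any(a is None for a in answers):
        return None  # this tracker does not work for this target
    count_t, count_not_t, count_c_or_t, count_c_or_not_t = answers
    total = count_t + count_not_t  # N, the full record count
    return count_c_or_t + count_c_or_not_t - total


def is_target(r: Row) -> bool:
    return r["dept"] == "Engineering" and r["salary"] > 200_000


def is_sales(r: Row) -> bool:
    return r["dept"] == "Sales"


direct = restricted_count(is_target)           # refused, because only one row matches
inferred = tracker_count(is_target, is_sales)  # recovered through the tracker
print(direct, inferred)
```

Every individual query in `tracker_count` satisfies the size restriction, which is why a minimum group size on its own is usually paired with auditing or noise.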
Example Attack Scenario
- Attacker queries: "How many employees in Engineering earn over $200K?" → Result: 1
- Attacker knows there are 3 engineers: Alice, Bob, Carol
- Attacker queries: "How many employees named Alice or Bob in Engineering earn over $200K?" → Result: 0
- By elimination: Carol earns over $200K — sensitive information inferred without direct access
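The scenario above can be reproduced against a small in-memory SQLite database. This is a sketch under assumptions: the `employees` table, its columns, the salary figures, and the helper `count_high_earners` are invented for illustration and are not taken from the simulator.

```python
import sqlite3

# Hypothetical mock data: three engineers, one of whom earns over $200K.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", "Engineering", 150_000),
        ("Bob", "Engineering", 165_000),
        ("Carol", "Engineering", 230_000),
    ],
)


def count_high_earners(names=None):
    """COUNT of engineers earning over $200K, optionally limited to given names."""
    sql = (
        "SELECT COUNT(*) FROM employees "
        "WHERE department = 'Engineering' AND salary > 200000"
    )
    params = []
    if names:
        placeholders = ", ".join("?" for _ in names)
        sql += f" AND name IN ({placeholders})"
        params = list(names)
    return conn.execute(sql, params).fetchone()[0]


everyone = count_high_earners()                   # query 1: all engineers
alice_bob = count_high_earners(["Alice", "Bob"])  # query 2: Alice or Bob only
remaining = everyone - alice_bob                  # high earners among the rest

# The attacker already knows the only other engineer is Carol.
print(f"High earners who are not Alice or Bob: {remaining}")
```

Neither query returns a salary, yet the difference between the two counts pins the attribute on one person.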
Common Use Cases
- Privacy impact assessment: Test whether your database's query interface leaks personally identifiable information through aggregate queries
- Access control design: Determine what query restrictions are needed to prevent inference on sensitive columns
- HIPAA/GDPR compliance: Demonstrate that de-identified or aggregate health and personal data cannot be re-identified through query combinations
- Data warehouse security: Evaluate whether analytical dashboards expose underlying individual records
- Security training: Teach developers and data analysts how seemingly safe queries can leak confidential information
Defense Strategies
- Query restriction — Suppress results from aggregate queries where the group size falls below a minimum threshold (typically k=5 or k=11). This prevents queries from isolating individuals.
- Differential privacy — Add calibrated random noise to query results. The noise is large enough to protect individual records but small enough to preserve statistical accuracy for legitimate analysis.
- Query auditing — Log and analyze all queries to detect patterns consistent with inference attacks. Flag sequences of queries that progressively narrow result sets.
- Cell suppression — In statistical reports, suppress cells with too few contributors and also suppress complementary cells that would allow back-calculation.
- Data generalization — Replace precise values with ranges (e.g., salary bands instead of exact figures) and use k-anonymity to ensure each record is indistinguishable from at least k-1 others.
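Two of the defenses above, query restriction and noise injection, can be combined in a short sketch. The names `MIN_GROUP_SIZE`, `laplace_noise`, and `protected_count` are hypothetical, and the threshold of 5 is simply one of the typical values mentioned above. Laplace noise is drawn as the difference of two exponential samples from the standard library. Treat this as an illustration of the idea rather than a vetted differential privacy mechanism: refusing queries based on the true count is itself data-dependent, so the sketch does not claim a formal privacy guarantee.

```python
import random
from typing import Callable, Iterable, Optional

MIN_GROUP_SIZE = 5  # assumed suppression threshold


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def protected_count(
    values: Iterable[int],
    predicate: Callable[[int], bool],
    epsilon: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return a noisy count, or None when the group is too small to release."""
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    if true_count < MIN_GROUP_SIZE:
        return None  # query restriction: refuse small groups
    # Adding or removing one record changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


ages = list(range(20, 60))  # hypothetical column of 40 ages
print(protected_count(ages, lambda a: a > 57))   # None: only 2 matching rows
print(protected_count(ages, lambda a: a >= 30))  # roughly 30, plus noise
```

A smaller `epsilon` means more noise and stronger protection at the cost of accuracy, which is the trade-off the differential privacy bullet describes.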
Frequently Asked Questions
Common questions about the Database Inference & Aggregation Simulator
An inference attack uses legitimate queries on non-sensitive data to deduce sensitive information. For example, querying the average salary of a department with only one person reveals that person's exact salary. Even when direct access is denied, aggregation functions (COUNT, AVG, SUM) can leak individual data points.
When a query returns aggregate results for a small group, individual values can be deduced. If you know the sum of salaries for 5 people and the sum for 4 of them, simple subtraction reveals the 5th person's salary. This simulator demonstrates these attacks with guided scenarios on mock databases.
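The subtraction is easy to verify with made-up numbers. The salaries below are invented for illustration; the same arithmetic works for AVG whenever the group sizes are known, because an average multiplied by its count gives back the sum.

```python
# Hypothetical salaries for five people; the attacker never sees this list,
# only the aggregate answers computed from it.
salaries = [80_000, 90_000, 100_000, 110_000, 145_000]

sum_all = sum(salaries)             # SUM over all five people
sum_first_four = sum(salaries[:4])  # SUM over four of them
from_sums = sum_all - sum_first_four

avg_all = sum_all / 5               # AVG over all five people
avg_first_four = sum_first_four / 4 # AVG over four of them
from_avgs = avg_all * 5 - avg_first_four * 4

print(from_sums, from_avgs)  # both equal the fifth person's salary
```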
Polyinstantiation creates multiple instances of the same data at different classification levels. A Top Secret user sees the real data, while a Secret user sees a plausible but different version. This prevents inference attacks by eliminating the ability to detect that data exists at a higher classification level.
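A minimal sketch of the idea, assuming a hypothetical shipment table keyed by both an identifier and a classification level. The `Level` enum, the `SHIPMENTS` store, and the `read` and `insert` functions are invented for illustration and do not describe any particular DBMS.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class Level(IntEnum):
    UNCLASSIFIED = 0
    SECRET = 1
    TOP_SECRET = 2


# The key includes the classification level, so the same ship_id can exist
# at several levels with different contents (invented cover-story data).
SHIPMENTS: Dict[Tuple[str, Level], Dict[str, str]] = {
    ("SHIP-17", Level.SECRET): {"cargo": "food supplies", "destination": "Port A"},
    ("SHIP-17", Level.TOP_SECRET): {"cargo": "classified equipment", "destination": "Port B"},
}


def read(ship_id: str, clearance: Level) -> Optional[Dict[str, str]]:
    """Return the highest-classified instance the reader is cleared to see."""
    visible = [lvl for (sid, lvl) in SHIPMENTS if sid == ship_id and lvl <= clearance]
    if not visible:
        return None
    return SHIPMENTS[(ship_id, max(visible))]


def insert(ship_id: str, level: Level, row: Dict[str, str]) -> None:
    """Write at the caller's level without a duplicate-key error.

    A unique key on ship_id alone would reject this write and thereby
    reveal that a higher-classified row already exists.
    """
    SHIPMENTS[(ship_id, level)] = row


print(read("SHIP-17", Level.SECRET))      # the cover story
print(read("SHIP-17", Level.TOP_SECRET))  # the real data
```

Because a lower-cleared write with an existing identifier succeeds quietly, the user gets no signal that a more sensitive version of the record exists.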
Key countermeasures include: cell suppression (hiding values in small groups), noise injection (adding random perturbation to query results), query restriction (limiting queries that return small result sets), polyinstantiation (multiple data versions by clearance), and differential privacy (mathematical guarantees against inference).
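Cell suppression is the least obvious of these to implement, because hiding one small cell is not enough when the row total is still published. The sketch below handles a single row of counts under that assumption. The function `suppress_row`, the threshold of 5, and the diagnosis counts are hypothetical, and real complementary suppression across two-dimensional tables is a harder optimization problem than this one-row rule.

```python
from typing import Dict, Optional


def suppress_row(cells: Dict[str, int], threshold: int = 5) -> Dict[str, Optional[int]]:
    """Hide small cells, plus one extra cell if a lone gap could be back-calculated.

    Assumes the row total is published alongside the cells.
    """
    # Primary suppression: hide every cell below the threshold.
    published: Dict[str, Optional[int]] = {
        name: (count if count >= threshold else None) for name, count in cells.items()
    }
    hidden = [name for name, count in published.items() if count is None]

    # Complementary suppression: a single hidden cell equals the total minus
    # the visible cells, so hide the smallest remaining cell as well.
    if len(hidden) == 1:
        remaining = [name for name, count in published.items() if count is not None]
        if remaining:
            smallest = min(remaining, key=lambda name: cells[name])
            published[smallest] = None
    return published


# Hypothetical diagnosis counts for one clinic.
row = {"Flu": 40, "Asthma": 12, "Rare condition": 2}
print(suppress_row(row))
```

With two cells hidden, a reader who knows the row total can still learn their combined count, but not how it splits between them.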
Database security is covered in CISSP Domain 8: Software Development Security. Key topics include database inference and aggregation attacks, polyinstantiation, views for access control, database encryption, and the role of the DBMS in enforcing security policies. Understanding these attacks is essential for the CISSP exam.
ℹ️ Disclaimer
This tool is provided for informational and educational purposes only. All processing happens entirely in your browser; no data is sent to or stored on our servers. While we strive for accuracy, we make no warranties about the completeness or reliability of results. Use at your own discretion.