Database Inference & Aggregation Simulator
Learn about database inference attacks through interactive guided scenarios. Query mock HR, medical, and financial databases using aggregation functions, discover how sensitive data can be deduced, and explore countermeasures like polyinstantiation, noise injection, and cell suppression.
What Is Database Inference
Database inference is a security threat in which an attacker derives sensitive information from seemingly innocuous query results. Even when direct access to confidential data is restricted, the combination of permitted queries, aggregate functions, and metadata can reveal protected information. This is a particular concern for statistical databases, data warehouses, and systems that provide analytical query access to multiple users with different privilege levels.
Unlike SQL injection, which exploits input validation flaws, inference attacks exploit the legitimate functionality of a database. An attacker uses authorized queries — counting, averaging, filtering — to narrow down results until they can deduce specific records or values that they should not be able to access.
How Database Inference Attacks Work
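Inference attacks exploit the mathematical relationship between aggregate query results and individual records. When two permitted aggregates cover query sets that differ by a single record, the difference between their results is that record's value.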
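As a minimal sketch of that relationship, the snippet below builds a small in-memory table with Python's standard `sqlite3` module and recovers one salary from two SUM queries. The `employees` table, the `total_salary` helper, and every name and salary in it are assumptions made up for illustration; they are not the simulator's actual data.

```python
import sqlite3

# Hypothetical in-memory HR table; names and salaries are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", "Engineering", 120000),
        ("Bob", "Engineering", 135000),
        ("Carol", "Engineering", 210000),
        ("Dave", "Sales", 90000),
    ],
)

def total_salary(where_clause, params=()):
    """Aggregate-only interface: the caller never sees individual rows."""
    sql = f"SELECT COALESCE(SUM(salary), 0) FROM employees WHERE {where_clause}"
    return conn.execute(sql, params).fetchone()[0]

# Two permitted aggregates whose query sets differ by exactly one record.
everyone = total_salary("department = ?", ("Engineering",))
all_but_carol = total_salary(
    "department = ? AND name <> ?", ("Engineering", "Carol")
)

# The difference of the two sums is the one record that was excluded.
inferred_salary = everyone - all_but_carol
print(inferred_salary)  # 210000
```

Neither query returns a row, yet subtracting one result from the other yields an exact individual value.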
Common Inference Techniques
| Technique | Method | Example |
|---|---|---|
| Direct inference | Query results directly reveal sensitive data | "SELECT AVG(salary) FROM employees WHERE department = 'CEO Office'" returns one person's salary when the department has a single member |
| Indirect inference | Combining multiple queries isolates individuals | Two queries with overlapping filters differ by one record |
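| Tracker attacks | Padding a blocked query with a broad "tracker" condition T and its complement NOT T so that every query is large enough to be answered | COUNT(C OR T) + COUNT(C OR NOT T) - N = COUNT(C), where N = COUNT(T) + COUNT(NOT T) is the total number of records |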
| Homogeneity attacks | All records in a group share the same sensitive value | Every person in a filtered result has the same diagnosis |
| Background knowledge | External data combined with query results | Knowing someone is in a specific department plus aggregate data |
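The tracker row is the least intuitive entry in the table, so here is a hedged sketch of it. It assumes a hypothetical interface, `guarded_count`, that refuses any COUNT whose query set is smaller than `K` or larger than the table size minus `K`, and then applies the classic general-tracker identity to get around that limit. The table contents, the threshold, and the helper name are all assumptions for illustration.

```python
import sqlite3

K = 2  # hypothetical minimum query-set size enforced by the interface

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", "Engineering", 150000),
        ("Bob", "Engineering", 165000),
        ("Carol", "Engineering", 230000),
        ("Dan", "Engineering", 140000),
        ("Eve", "Sales", 95000),
        ("Frank", "Sales", 105000),
        ("Grace", "Finance", 120000),
        ("Heidi", "Finance", 125000),
    ],
)
TOTAL_ROWS = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

def guarded_count(condition):
    """Answer COUNT only if the query set is neither too small nor too large.

    Conditions are interpolated as raw SQL purely to keep the demo short.
    """
    sql = f"SELECT COUNT(*) FROM employees WHERE {condition}"
    n = conn.execute(sql).fetchone()[0]
    if n < K or n > TOTAL_ROWS - K:
        return None  # refused
    return n

target = "(name = 'Carol' AND salary > 200000)"  # C: isolates one person
tracker = "(department = 'Engineering')"         # T: broad, always answerable

blocked = guarded_count(target)  # refused because the query set is too small

# Every query below passes the size check.
n_total = guarded_count(tracker) + guarded_count(f"NOT {tracker}")
padded = (guarded_count(f"{target} OR {tracker}")
          + guarded_count(f"{target} OR NOT {tracker}"))

# |C or T| + |C or not T| = N + |C|, so subtracting N leaves |C|.
inferred_count = padded - n_total
print(blocked, inferred_count)  # None 1
```

A result of 1 tells the attacker that Carol earns over $200K even though the direct query was refused, which is why a size threshold alone is usually paired with auditing or noise.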
Example Attack Scenario
- Attacker queries: "How many employees in Engineering earn over $200K?" → Result: 1
- Attacker knows there are 3 engineers: Alice, Bob, Carol
- Attacker queries: "How many employees named Alice or Bob in Engineering earn over $200K?" → Result: 0
- By elimination: Carol earns over $200K — sensitive information inferred without direct access
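The steps above can be reproduced with a few lines of Python and `sqlite3`. This is a sketch under stated assumptions: the `employees` table, the `count_high_earners` helper, and the individual salaries are hypothetical, chosen only so that the counts match the scenario.

```python
import sqlite3

# Hypothetical data matching the scenario; the exact salaries are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", "Engineering", 150000),
        ("Bob", "Engineering", 165000),
        ("Carol", "Engineering", 230000),
    ],
)

def count_high_earners(names=None):
    """COUNT-only interface: engineers over $200K, optionally limited to names."""
    sql = ("SELECT COUNT(*) FROM employees "
           "WHERE department = 'Engineering' AND salary > 200000")
    params = []
    if names:
        placeholders = ", ".join("?" for _ in names)
        sql += f" AND name IN ({placeholders})"
        params = list(names)
    return conn.execute(sql, params).fetchone()[0]

known_engineers = {"Alice", "Bob", "Carol"}  # attacker's background knowledge
queried = {"Alice", "Bob"}

overall = count_high_earners()        # first query: 1
subset = count_high_earners(queried)  # second query: 0

# If the unexplained high earners exactly fill the remaining names,
# every remaining person must earn over the threshold.
unexplained = overall - subset
remaining = known_engineers - queried
inferred = remaining if unexplained == len(remaining) else set()
print(inferred)  # {'Carol'}
```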
Common Use Cases
- Privacy impact assessment: Test whether your database's query interface leaks personally identifiable information through aggregate queries
- Access control design: Determine what query restrictions are needed to prevent inference on sensitive columns
- HIPAA/GDPR compliance: Demonstrate that de-identified or aggregate health and personal data cannot be re-identified through query combinations
- Data warehouse security: Evaluate whether analytical dashboards expose underlying individual records
- Security training: Teach developers and data analysts how seemingly safe queries can leak confidential information
Defense Strategies
- Query restriction — Suppress results from aggregate queries where the group size falls below a minimum threshold (typically k=5 or k=11). This prevents queries from isolating individuals.
- Differential privacy — Add calibrated random noise to query results. The noise is large enough to protect individual records but small enough to preserve statistical accuracy for legitimate analysis.
- Query auditing — Log and analyze all queries to detect patterns consistent with inference attacks. Flag sequences of queries that progressively narrow result sets.
- Cell suppression — In statistical reports, suppress cells with too few contributors and also suppress complementary cells that would allow back-calculation.
- Data generalization — Replace precise values with ranges (e.g., salary bands instead of exact figures) and use k-anonymity to ensure each record is indistinguishable from at least k-1 others.
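Two of these defenses are easy to sketch in code. The example below shows a minimum group-size check and, separately, Laplace noise added to a COUNT (sensitivity 1, scale 1/epsilon). The threshold, the epsilon value, the table, and the function names are assumptions for illustration. The noise function is a teaching sketch only: it does not track a privacy budget across repeated queries, so it should not be read as a complete differential privacy implementation.

```python
import random
import sqlite3

MIN_GROUP_SIZE = 5  # assumed threshold (the k in the list above)
EPSILON = 0.5       # assumed privacy parameter; smaller means more noise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(f"eng{i}", "Engineering", 100000 + 5000 * i) for i in range(6)]
    + [("Lena", "Legal", 180000)],
)

def true_count(condition, params=()):
    """Unprotected COUNT, used internally by both defenses."""
    sql = f"SELECT COUNT(*) FROM employees WHERE {condition}"
    return conn.execute(sql, params).fetchone()[0]

def suppressed_count(condition, params=()):
    """Query restriction: refuse to answer for groups below the threshold."""
    n = true_count(condition, params)
    return n if n >= MIN_GROUP_SIZE else None

def noisy_count(condition, params=(), rng=random):
    """Noise injection: add Laplace noise with scale 1/EPSILON to the count.

    The difference of two exponential draws with mean `scale` is
    Laplace-distributed with that scale.
    """
    scale = 1.0 / EPSILON
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count(condition, params) + noise

print(suppressed_count("department = ?", ("Legal",)))        # None
print(suppressed_count("department = ?", ("Engineering",)))  # 6
print(noisy_count("department = ?", ("Engineering",), random.Random(0)))
```

The two functions are kept separate on purpose: a refusal is itself a signal, so combining suppression with noise needs more care than this sketch shows.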
Frequently Asked Questions
Common questions about the Database Inference & Aggregation Simulator
What is a database inference attack?
An inference attack uses legitimate queries on non-sensitive data to deduce sensitive information. For example, querying the average salary of a department with only one person reveals that person's exact salary. Even when direct access is denied, aggregation functions (COUNT, AVG, SUM) can leak individual data points.