Database Inference & Aggregation Simulator

Learn about database inference attacks through interactive guided scenarios. Query mock HR, medical, and financial databases using aggregation functions, discover how sensitive data can be deduced, and explore countermeasures like polyinstantiation, noise injection, and cell suppression.

What Is Database Inference

Database inference is a security threat in which an attacker derives sensitive information from seemingly innocuous query results. Even when direct access to confidential data is restricted, the combination of permitted queries, aggregate functions, and metadata can reveal protected information. This is a particular concern for statistical databases, data warehouses, and systems that provide analytical query access to multiple users with different privilege levels.

Unlike SQL injection, which exploits input validation flaws, inference attacks exploit the legitimate functionality of a database. An attacker uses authorized queries — counting, averaging, filtering — to narrow down results until they can deduce specific records or values that they should not be able to access.

How Database Inference Attacks Work

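Inference attacks exploit the mathematical relationship between aggregate query results and individual records. An aggregate is a function of the values in its group, so a group of one, or two groups that differ by a single member, exposes that member's value.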

Common Inference Techniques

| Technique | Method | Example |
| --- | --- | --- |
| Direct inference | Query results directly reveal sensitive data | "SELECT AVG(salary) FROM employees WHERE department = 'CEO Office'" returns one person's salary |
| Indirect inference | Combining multiple queries isolates individuals | Two queries with overlapping filters differ by one record |
| Tracker attacks | Padding a restricted query with complementary "tracker" queries (T and NOT T) so that each one passes size checks, then subtracting the padding | COUNT(C OR T) + COUNT(C OR NOT T) - COUNT(all records) = COUNT(C), even when COUNT(C) alone would be refused |
| Homogeneity attacks | All records in a group share the same sensitive value | Every person in a filtered result has the same diagnosis |
| Background knowledge | External data combined with query results | Knowing someone is in a specific department plus aggregate data |
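
The indirect inference row can be made concrete with a short sketch. It assumes a hypothetical in-memory `employees` table with invented names and salaries, and an attacker who may only run aggregate `SUM` queries. The helper `total_salary` is illustrative and not part of the simulator.

```python
import sqlite3

# Hypothetical mock data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, hire_year INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Alice", "Engineering", 2015, 120000),
        ("Bob", "Engineering", 2017, 135000),
        ("Carol", "Engineering", 2021, 210000),
        ("Dave", "Engineering", 2019, 110000),
    ],
)


def total_salary(where_clause: str) -> int:
    """Run an aggregate-only query, the kind an analyst role might be allowed.

    The filter is interpolated directly for brevity; this is a demo, not a
    pattern for handling untrusted input.
    """
    row = conn.execute(
        f"SELECT SUM(salary) FROM employees WHERE {where_clause}"
    ).fetchone()
    return row[0]


# Two permitted aggregates whose filters differ by exactly one record
# (the only engineer hired in 2021).
everyone = total_salary("department = 'Engineering'")
before_2021 = total_salary("department = 'Engineering' AND hire_year < 2021")

# Subtracting the two totals isolates that one person's salary.
inferred_salary = everyone - before_2021
print(inferred_salary)  # 210000
```

Neither query returns an individual row, yet the difference between them does.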

Example Attack Scenario

  1. Attacker queries: "How many employees in Engineering earn over $200K?" → Result: 1
  2. Attacker knows there are 3 engineers: Alice, Bob, Carol
  3. Attacker queries: "How many employees named Alice or Bob in Engineering earn over $200K?" → Result: 0
  4. By elimination: Carol earns over $200K — sensitive information inferred without direct access
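
The four steps above can be sketched against a mock table. The salaries are invented, and `count_high_earners` is a hypothetical helper standing in for whatever counting interface the attacker is allowed to use.

```python
import sqlite3

# Hypothetical mock data; only the counts are ever shown to the attacker.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", "Engineering", 150000),
        ("Bob", "Engineering", 165000),
        ("Carol", "Engineering", 230000),
    ],
)


def count_high_earners(names=None):
    """Count engineers earning over $200K, optionally limited to given names."""
    sql = (
        "SELECT COUNT(*) FROM employees "
        "WHERE department = 'Engineering' AND salary > 200000"
    )
    params = []
    if names:
        placeholders = ", ".join("?" for _ in names)
        sql += f" AND name IN ({placeholders})"
        params = list(names)
    return conn.execute(sql, params).fetchone()[0]


total = count_high_earners()                      # step 1: result is 1
known_engineers = {"Alice", "Bob", "Carol"}       # step 2: background knowledge
alice_bob = count_high_earners(["Alice", "Bob"])  # step 3: result is 0

# Step 4: the single high earner must be whoever was left out of step 3.
if total == 1 and alice_bob == 0:
    inferred = sorted(known_engineers - {"Alice", "Bob"})
else:
    inferred = []
print(inferred)  # ['Carol']
```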

Common Use Cases

  • Privacy impact assessment: Test whether your database's query interface leaks personally identifiable information through aggregate queries
  • Access control design: Determine what query restrictions are needed to prevent inference on sensitive columns
  • HIPAA/GDPR compliance: Demonstrate that de-identified or aggregate health and personal data cannot be re-identified through query combinations
  • Data warehouse security: Evaluate whether analytical dashboards expose underlying individual records
  • Security training: Teach developers and data analysts how seemingly safe queries can leak confidential information

Defense Strategies

  1. Query restriction — Suppress results from aggregate queries where the group size falls below a minimum threshold (typically k=5 or k=11). This prevents queries from isolating individuals.
  2. Differential privacy — Add calibrated random noise to query results. The noise is large enough to protect individual records but small enough to preserve statistical accuracy for legitimate analysis.
  3. Query auditing — Log and analyze all queries to detect patterns consistent with inference attacks. Flag sequences of queries that progressively narrow result sets.
  4. Cell suppression — In statistical reports, suppress cells with too few contributors and also suppress complementary cells that would allow back-calculation.
  5. Data generalization — Replace precise values with ranges (e.g., salary bands instead of exact figures) and use k-anonymity to ensure each record is indistinguishable from at least k-1 others.
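
Two of these defenses, query restriction and noise addition, are small enough to sketch. The following is a minimal illustration under stated assumptions, not the simulator's implementation: the threshold of 5, the `epsilon` values, the sample records, and the function names are all hypothetical. The noise is drawn from a Laplace distribution, the usual choice for counting queries under differential privacy, and is generated here as the difference of two exponential samples because Python's standard library has no Laplace sampler.

```python
import random

MIN_GROUP_SIZE = 5  # assumed threshold, matching the k=5 example above


def restricted_count(records, predicate, k=MIN_GROUP_SIZE):
    """Return the count, or None (suppressed) when fewer than k records match."""
    n = sum(1 for r in records if predicate(r))
    return n if n >= k else None


def noisy_count(records, predicate, epsilon=1.0, rng=random):
    """Return the count plus Laplace noise with scale 1/epsilon.

    A counting query changes by at most 1 when one record is added or
    removed, so its sensitivity is 1 and the noise scale is 1/epsilon.
    """
    n = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return n + noise


# Hypothetical records: six engineers and a one-person CEO office.
records = [
    {"dept": "Engineering", "salary": 100000 + 5000 * i} for i in range(6)
] + [{"dept": "CEO Office", "salary": 400000}]

print(restricted_count(records, lambda r: r["dept"] == "CEO Office"))   # None
print(restricted_count(records, lambda r: r["dept"] == "Engineering"))  # 6
print(noisy_count(records, lambda r: r["dept"] == "Engineering", epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger protection. A threshold check on its own does not stop differencing between two large query sets, which is one reason the defenses are typically combined.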

Frequently Asked Questions

Common questions about the Database Inference & Aggregation Simulator

What is a database inference attack?

An inference attack uses legitimate queries on non-sensitive data to deduce sensitive information. For example, querying the average salary of a department with only one person reveals that person's exact salary. Even when direct access is denied, aggregation functions (COUNT, AVG, SUM) can leak individual data points.
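
The single-person department case is easy to reproduce. This sketch assumes a hypothetical `employees` table with invented rows and shows how a grouped `AVG` exposes an exact value whenever a group has one member.

```python
import sqlite3

# Hypothetical mock data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Erin", "CEO Office", 400000),
        ("Alice", "Engineering", 120000),
        ("Bob", "Engineering", 140000),
    ],
)

# An aggregate-only report: group size and average salary per department.
rows = conn.execute(
    "SELECT department, COUNT(*), AVG(salary) "
    "FROM employees GROUP BY department ORDER BY department"
).fetchall()

# Any group with a single member exposes that member's exact value.
leaks = {dept: avg for dept, n, avg in rows if n == 1}
print(leaks)  # {'CEO Office': 400000.0}
```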

ℹ️ Disclaimer

This tool is provided for informational and educational purposes only. All processing happens entirely in your browser - no data is sent to or stored on our servers. While we strive for accuracy, we make no warranties about the completeness or reliability of results. Use at your own discretion.