Data Management

What are best practices for CSV headers and JSON keys?

Master the conventions for naming CSV headers and JSON keys to ensure compatibility, readability, and consistency across data formats.

By Inventive HQ Team

The Importance of Proper Naming

When converting between CSV and JSON, field names transition from CSV headers (first row of the file) to JSON keys (property names in objects). These names are critically important—they:

  • Define how you access data programmatically (user.email, record["date_of_birth"])
  • Appear in documentation and discussions about the data
  • Impact readability and understandability of the data structure
  • Affect data validation and schema compatibility
  • Influence database column naming if data is imported to a database
  • Are used in API documentation and client code

Poor naming conventions lead to confusion, bugs, and difficult-to-maintain code. Conversely, consistent, thoughtful naming makes data self-documenting and integration seamless.

CSV Header Conventions

CSV headers define the structure of the file and become JSON keys after conversion. Best practices for CSV headers:

Use descriptive, meaningful names: Headers should clearly indicate what data the column contains. Avoid cryptic abbreviations or vague names:

// Good headers:
first_name, last_name, email, phone_number, date_of_birth, employee_id

// Poor headers:
fn, ln, em, ph, dob, id

"first_name" is instantly understandable; "fn" requires documentation.

Use lowercase with underscores (snake_case): This convention is widely adopted, especially in data contexts:

customer_id, order_date, product_name, unit_price, quantity_ordered

Not:

CustomerID, OrderDate, ProductName, UnitPrice, QuantityOrdered

snake_case is easier to type, needs no quoting or escaping in code, and is the de facto standard in databases and data APIs.
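As a sketch of this normalization, a small helper (the name `to_snake_case` is illustrative) can convert PascalCase or camelCase headers like those above:

```python
import re

def to_snake_case(name: str) -> str:
    """Convert a PascalCase/camelCase header to snake_case."""
    # Insert an underscore before each uppercase letter that follows
    # a lowercase letter or digit, then lowercase the whole name.
    return re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', name).lower()

headers = ["CustomerID", "OrderDate", "ProductName", "UnitPrice"]
print([to_snake_case(h) for h in headers])
# ['customer_id', 'order_date', 'product_name', 'unit_price']
```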

Be specific and avoid ambiguity:

// Good:
order_date, delivery_date, payment_received_date

// Ambiguous:
date, date1, date2

Specific names prevent confusion about which date is which. Multiple date fields should all be named clearly.

Avoid spaces and special characters: Spaces complicate parsing and programmatic access:

// Good:
customer_email, phone_number, zip_code

// Problematic:
customer email, phone number, zip code  // Blocks dot access in code (data["zip code"]) and trips up many tools

Include units when relevant: If a field contains a measurement, include the unit in the name:

// Good:
height_cm, weight_kg, temperature_celsius, speed_mph

// Vague:
height, weight, temperature, speed  // What units?

This eliminates ambiguity about measurement scale.

Use consistent pluralization: Decide whether to use singular or plural names, then be consistent:

// Good (singular):
product_name, price, quantity

// Good (plural):
products, prices, quantities

// Inconsistent (avoid):
product_name, prices, quantity  // Mixed singular and plural

Singular is more common for individual field names.

Avoid reserved words: Some programming languages and databases have reserved keywords that can't be used as variable/field names without escaping:

// Problematic:
id, name, select, delete, class, type

// Better:
user_id, full_name, select_value, delete_flag, user_class, item_type

If you can't avoid reserved words, you'll have to quote them when accessing: data["class"] instead of data.class.
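The same constraint shows up in Python, where class is also a keyword; bracket access always works, but attribute access on a reserved word does not (a minimal illustration):

```python
from types import SimpleNamespace

record = {"class": "premium", "item_type": "subscription"}

# Bracket access works even for reserved words:
print(record["class"])            # premium

# Attribute access does not: writing obj.class is a SyntaxError in Python,
# just as data.class is in JavaScript.
obj = SimpleNamespace(**record)   # building the object is fine...
print(obj.item_type)              # subscription
print(getattr(obj, "class"))     # premium ("class" needs getattr)
```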

Keep names reasonably short: While descriptive is important, excessively long names become unwieldy:

// Good:
customer_address, account_creation_date, last_purchase_amount

// Too long:
customer_mailing_address_including_apartment_number, date_when_account_was_created, total_dollar_amount_of_last_purchase

Aim for clarity in 2-4 words.

JSON Key Conventions

When CSV headers become JSON keys, the same conventions apply. Additionally, JSON has specific requirements:

JSON keys must be strings: All property names in JSON must be enclosed in double quotes:

{
  "customer_id": 123,
  "order_date": "2025-01-31",
  "total_amount": 99.99
}

Not:

{
  customer_id: 123,  // Invalid JSON - keys must be strings
  "order_date": "2025-01-31",
  "total_amount": 99.99
}
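Python's standard json module enforces this strictness; a quick illustration:

```python
import json

valid = '{"customer_id": 123, "order_date": "2025-01-31"}'
print(json.loads(valid)["customer_id"])   # 123

invalid = '{customer_id: 123}'            # unquoted key
try:
    json.loads(invalid)
except json.JSONDecodeError as e:
    # The parser rejects unquoted property names outright.
    print("rejected:", e.msg)
```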

Use consistent casing: Choose between snake_case, camelCase, or PascalCase and be consistent. For API/JSON contexts, two conventions dominate:

snake_case (common in databases and back-end systems):

{
  "customer_id": 123,
  "order_date": "2025-01-31",
  "total_amount": 99.99,
  "shipping_address": "123 Main St"
}

camelCase (common in JavaScript and front-end applications):

{
  "customerId": 123,
  "orderDate": "2025-01-31",
  "totalAmount": 99.99,
  "shippingAddress": "123 Main St"
}

Both are valid. Choose one and use it consistently throughout your API or application. Many teams choose snake_case for APIs (matching database conventions) and convert to camelCase in front-end JavaScript (matching language conventions).
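Converting between the two conventions is mechanical; a minimal sketch (the helper name is illustrative):

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case key to camelCase."""
    first, *rest = name.split('_')
    return first + ''.join(word.capitalize() for word in rest)

record = {"customer_id": 123, "order_date": "2025-01-31", "total_amount": 99.99}
camel = {snake_to_camel(k): v for k, v in record.items()}
print(camel)
# {'customerId': 123, 'orderDate': '2025-01-31', 'totalAmount': 99.99}
```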

Avoid special characters: Even though JSON technically allows special characters in key names (when quoted), avoid them:

// Technically valid but poor practice:
{
  "customer@id": 123,
  "order$date": "2025-01-31",
  "total-amount": 99.99
}

This requires awkward quoting when accessing: data["customer@id"] instead of data.customer_id.

Be consistent with type information: Prefix boolean fields with "is_" or "has_" to clarify they contain boolean values:

{
  "is_active": true,
  "has_premium_account": false,
  "is_verified_email": true
}

These prefixes make the expected type obvious at a glance.

Conversion Best Practices

When converting from CSV headers to JSON keys:

Normalize during conversion: Use your conversion tool's rename features to standardize naming if the source CSV doesn't follow your conventions:

# Python example - rename headers during conversion
import pandas as pd

df = pd.read_csv('data.csv')
# Normalize headers: trim whitespace, lowercase, replace spaces with underscores
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
json_data = df.to_json(orient='records')
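If pandas isn't available, the same header normalization can be sketched with only the standard library csv and json modules (reading from an in-memory list of lines for illustration):

```python
import csv
import json

def rows_to_json(lines) -> str:
    """Convert CSV lines to a JSON string with normalized keys."""
    reader = csv.DictReader(lines)
    records = [
        {k.strip().lower().replace(' ', '_'): v for k, v in row.items()}
        for row in reader
    ]
    return json.dumps(records)

csv_lines = ["Customer ID,Order Date", "1,2025-01-31"]
print(rows_to_json(csv_lines))
```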

Map inconsistent sources: If your CSV comes from external sources with poor header naming, create a mapping:

{
  "headerMapping": {
    "CustomerName": "customer_name",
    "Cust. Email": "customer_email",
    "PhoneNum": "phone_number",
    "DOB": "date_of_birth"
  }
}
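Such a mapping can then be applied row by row during conversion; a minimal plain-Python sketch using the entries above:

```python
HEADER_MAPPING = {
    "CustomerName": "customer_name",
    "Cust. Email": "customer_email",
    "PhoneNum": "phone_number",
    "DOB": "date_of_birth",
}

def normalize_row(row: dict) -> dict:
    # Fall back to the original header when there is no mapping entry.
    return {HEADER_MAPPING.get(k, k): v for k, v in row.items()}

raw = {"CustomerName": "Ada Lovelace", "DOB": "1815-12-10"}
print(normalize_row(raw))
# {'customer_name': 'Ada Lovelace', 'date_of_birth': '1815-12-10'}
```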

Document the schema: Create documentation defining all field names and their meanings:

## Data Schema

- `customer_id` (number): Unique customer identifier
- `customer_name` (string): Full name of customer
- `email` (string): Primary email address
- `phone_number` (string): Primary phone number (format: XXX-XXX-XXXX)
- `date_of_birth` (string, ISO 8601): Customer birth date
- `is_active` (boolean): Whether account is active

Naming for Different Data Types

Different field types have naming conventions:

Numeric identifiers: Usually suffixed with "_id":

{
  "customer_id": 123,
  "order_id": 456,
  "product_id": 789
}

Dates and timestamps: Include "date" or "timestamp" in the name:

{
  "created_date": "2025-01-31",
  "updated_timestamp": "2025-01-31T14:30:00Z",
  "expiration_date": "2025-12-31"
}

Monetary amounts: Include the type or currency:

{
  "price_usd": 99.99,
  "salary_annual": 95000,
  "discount_percent": 15
}

Percentages: Suffix with "_percent" or "_percentage":

{
  "discount_percent": 15,
  "growth_percentage": 25.5,
  "success_rate": 0.95  // Or use 0-1 scale without "percent" suffix
}

Boolean fields: Prefix with "is_", "has_", "should_", or "can_":

{
  "is_active": true,
  "is_deleted": false,
  "has_premium_subscription": true,
  "should_notify": true,
  "can_edit": false
}

Status fields: Avoid ambiguous names:

{
  "order_status": "completed",    // Good: clear what status is
  "payment_status": "pending",
  "status": "unknown"  // Bad: which status?
}

Lists and arrays: Often pluralized:

{
  "tags": ["urgent", "customer-service"],
  "email_addresses": ["ada@example.com", "ada@work.example.com"],
  "phone_numbers": ["555-1234", "555-5678"]
}
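CSV has no native array type, so multi-valued cells are usually packed with an internal delimiter; assuming a semicolon-delimited convention, one way to expand them during conversion:

```python
def split_list_field(value: str, sep: str = ';') -> list:
    """Split a delimited CSV cell into a JSON-ready list."""
    return [item.strip() for item in value.split(sep) if item.strip()]

row = {"name": "Ada", "tags": "urgent; customer-service"}
row["tags"] = split_list_field(row["tags"])
print(row)
# {'name': 'Ada', 'tags': ['urgent', 'customer-service']}
```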

Handling Special Cases

Nested/hierarchical data: Use dot notation or nested objects:

// Dot notation in headers:
// address.street, address.city, address.zip_code

// Results in nested JSON:
{
  "address": {
    "street": "123 Main St",
    "city": "Boston",
    "zip_code": "02101"
  }
}
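Flattened dot-notation headers can be expanded into nested objects during conversion; a minimal sketch (the helper name is illustrative):

```python
def nest_flat_record(flat: dict) -> dict:
    """Expand dot-notation keys like 'address.city' into nested objects."""
    nested = {}
    for key, value in flat.items():
        parts = key.split('.')
        node = nested
        for part in parts[:-1]:
            # Walk or create each intermediate object.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested

row = {"name": "Ada", "address.street": "123 Main St", "address.city": "Boston"}
print(nest_flat_record(row))
# {'name': 'Ada', 'address': {'street': '123 Main St', 'city': 'Boston'}}
```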

Multiple-language content: Include language suffix:

{
  "product_name_en": "Laptop",
  "product_name_es": "Portátil",
  "product_name_fr": "Ordinateur portable"
}

Or use nested structure:

{
  "product_name": {
    "en": "Laptop",
    "es": "Portátil",
    "fr": "Ordinateur portable"
  }
}

Legacy/deprecated fields: Mark clearly:

{
  "customer_id": 123,
  "customer_email": "ada@example.com",
  "_deprecated_email": "ada@old-domain.example.com",  // Clearly marked as deprecated
  "_legacy_account_number": "ABC123"
}

Or remove entirely if truly unused.

Validation and Standards

To enforce naming standards:

Use JSON Schema with pattern validation:

{
  "type": "object",
  "properties": {
    "customer_id": {"type": "number"},
    "email": {"type": "string", "format": "email"}
  },
  "patternProperties": {
    "^[a-z_]+$": {"type": ["string", "number", "boolean", "null"]}
  },
  "additionalProperties": false
}

With "additionalProperties": false, this schema rejects any key that does not match the snake_case pattern ^[a-z_]+$.
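The same check can also be scripted without a schema validator; a small sketch using a slightly stricter pattern that additionally permits digits:

```python
import re

SNAKE_CASE = re.compile(r'[a-z][a-z0-9_]*')

def check_keys(record: dict) -> list:
    """Return the keys that violate the snake_case convention."""
    return [k for k in record if not SNAKE_CASE.fullmatch(k)]

good = {"customer_id": 1, "email": "a@example.com"}
bad = {"CustomerID": 1, "order date": "2025-01-31"}
print(check_keys(good))   # []
print(check_keys(bad))    # ['CustomerID', 'order date']
```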

Implement linting: Use tools to check naming conventions:

  • ESLint rules for camelCase in JavaScript
  • Python style checkers (flake8, pylint) for naming conventions
  • Custom validation scripts for CSV headers

Document and enforce in code review: Make naming conventions part of code review standards. Catch naming inconsistencies before code is merged.

Common Naming Mistakes to Avoid

  1. Mixing conventions: Don't mix snake_case, camelCase, and PascalCase in the same dataset
  2. Using reserved words: Avoid keywords from your target languages, such as "class" and "delete" in JavaScript or "select" in SQL
  3. Overly abbreviated names: "cst_em" instead of "customer_email" saves characters but costs clarity
  4. No type indication: For boolean fields especially, use "is_", "has_" prefixes
  5. Inconsistent pluralization: Don't mix "customer" and "customers" for the same concept
  6. Spaces and special characters: These complicate programmatic access and parsing
  7. Vague names: "value", "data", "info" need context about what they contain

Best Practices Summary

  • Use snake_case for CSV headers and JSON keys (unless camelCase is standard in your organization)
  • Make names descriptive but concise (2-4 words typically)
  • Avoid spaces, special characters, and reserved words
  • Use consistent conventions throughout your datasets
  • Prefix boolean fields with "is_", "has_", or similar
  • Include units in measurement field names
  • Include type hints (like "_percent", "_id") when helpful
  • Document the schema and naming conventions
  • Validate naming conventions in code reviews
  • Map external data sources to your internal naming standards

Conclusion

Proper CSV header and JSON key naming is foundational for data quality and usability. Following consistent conventions makes data self-documenting, simplifies integration, enables programmatic access, and prevents errors. By adopting snake_case naming, using descriptive names, avoiding special characters, and being consistent throughout your datasets, you create data structures that work seamlessly across formats, systems, and teams.
