A BigQuery dataset is a container that organizes and controls access to your tables and views. This guide walks you through creating a dataset for billing data exports, analytics, or any other data storage needs in Google Cloud.
Prerequisites
Before you begin, ensure you have:
- A Google Cloud project with the BigQuery API enabled
- Owner or Editor role on the project (or BigQuery Admin role)
- Access to the Google Cloud Console or gcloud CLI installed
- Basic understanding of data location and access control requirements
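If the BigQuery API isn't enabled yet, the gcloud CLI can take care of it. A minimal sketch, where my-project is a placeholder for your project ID:
# Authenticate and point gcloud at your project
gcloud auth login
gcloud config set project my-project
# Enable the BigQuery API for the project
gcloud services enable bigquery.googleapis.com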
Understanding BigQuery Datasets
A dataset in BigQuery is similar to a database in traditional SQL systems (see the query sketch after this list). Key characteristics:
- Organizational unit: Datasets contain tables, views, and models
- Access control boundary: Permissions are set at the dataset level
- Regional or multi-regional: Data location affects query performance and cost
- Naming restrictions: Dataset IDs may contain only letters, numbers, and underscores
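To make the database analogy concrete: every table is addressed as project.dataset.table, so the dataset is the namespace a query resolves tables in. A quick sketch, where the project, dataset, and table names below are placeholders:
# Count rows in a table, addressing it by its fully qualified name
bq query --use_legacy_sql=false \
'SELECT COUNT(*) FROM `my-project.billing_data.gcp_billing_export`'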
Step-by-Step Guide
Method 1: Using Google Cloud Console
1. Navigate to BigQuery
   - Open the Google Cloud Console
   - Click the navigation menu (three horizontal lines) in the top-left
   - Select BigQuery under the "Analytics" section
   - The BigQuery Studio interface will open
2. Open the Create Dataset Dialog
   - In the Explorer panel on the left, locate your project name
   - Click the three-dot menu (⋮) next to your project name
   - Select Create dataset
3. Configure Dataset Settings
   - Dataset ID: Enter a descriptive name (e.g., billing_data or analytics_dataset)
     - Use letters, numbers, and underscores only (lowercase is the usual convention)
     - Must be unique within the project
     - Cannot be changed after creation
   - Data location: Choose where data will be stored
     - Multi-region (recommended for billing exports): US (multiple regions in the United States) or EU (multiple regions in the European Union)
     - Region (for specific compliance requirements): us-central1, us-east1, europe-west1, etc.
     - Note: Location cannot be changed after creation
4. Set Default Table Expiration (Optional)
   - Enable table expiration: Check this box to automatically delete tables after a specified time
   - Days until expiration: Enter the number of days (e.g., 365 for one year)
   - Use case: Useful for managing storage costs on historical data
   - Note: This is a default setting; individual tables can override it
5. Configure Encryption (Optional)
   - Default encryption: Google-managed encryption key (recommended)
   - Customer-managed encryption key (CMEK): Select a Cloud KMS key for additional control
   - Use case: Required for some compliance frameworks
6. Create the Dataset
   - Click CREATE DATASET
   - The new dataset appears in the Explorer panel
   - You'll see a confirmation message
Method 2: Using the bq CLI (included with the gcloud CLI)
Basic Dataset Creation
# Create a dataset with default settings
bq mk \
--dataset \
--location=US \
PROJECT_ID:DATASET_NAME
Example:
bq mk \
--dataset \
--location=US \
my-project:billing_data
Dataset with Table Expiration
# Create dataset with 365-day default table expiration
bq mk \
--dataset \
--location=US \
--default_table_expiration=31536000 \
PROJECT_ID:DATASET_NAME
Note: Table expiration is specified in seconds (31,536,000 seconds = 365 days)
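Because this is only a dataset-level default, a specific table can override it after creation. A sketch, where archive_2023 is a hypothetical table name:
# Override the dataset default: expire this one table after 30 days (2,592,000 seconds)
bq update --expiration 2592000 PROJECT_ID:DATASET_NAME.archive_2023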
Dataset with Description
# Create dataset with description
bq mk \
--dataset \
--location=US \
--description="Billing export data from Cloud Billing" \
PROJECT_ID:DATASET_NAME
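Dataset with Customer-Managed Encryption (CMEK)
The console's CMEK option has a CLI counterpart. A hedged sketch, assuming you have already created a Cloud KMS key; the key path below is a placeholder:
# Create a dataset whose tables default to a customer-managed encryption key
bq mk \
--dataset \
--location=US \
--default_kms_key=projects/my-project/locations/us/keyRings/my-keyring/cryptoKeys/my-key \
PROJECT_ID:DATASET_NAME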
Verify Dataset Creation
# List all datasets in a project
bq ls --project_id=PROJECT_ID
# Show dataset details
bq show --format=prettyjson PROJECT_ID:DATASET_NAME
Method 3: Using Terraform (Infrastructure as Code)
For repeatable infrastructure deployments:
resource "google_bigquery_dataset" "billing_dataset" {
dataset_id = "billing_data"
friendly_name = "Billing Export Data"
description = "BigQuery dataset for Cloud Billing exports"
location = "US"
default_table_expiration_ms = 31536000000 # 365 days in milliseconds
access {
role = "OWNER"
user_by_email = "[email protected]"
}
access {
role = "READER"
special_group = "projectReaders"
}
labels = {
environment = "production"
team = "finance"
}
}
Apply with:
terraform init
terraform plan
terraform apply
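If the dataset already exists (for example, it was created through the console), you can bring it under Terraform management instead of recreating it:
# Import the existing dataset into the Terraform state
terraform import google_bigquery_dataset.billing_dataset projects/my-project/datasets/billing_data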
Configuring Dataset Access Control
After creating your dataset, configure who can access it:
Using Console
1. Navigate to Dataset Settings
   - Click on your dataset in the Explorer panel
   - Click the SHARING tab (or click the three-dot menu and select Share)
2. Add Principals
   - Click ADD PRINCIPAL
   - Enter email addresses or group names
   - Assign roles:
     - BigQuery Data Viewer: Read-only access to tables
     - BigQuery Data Editor: Read and write access
     - BigQuery Admin: Full control over the dataset
3. Save Changes
   - Click DONE
   - Changes take effect immediately
Using the bq CLI
The bq tool updates dataset-level access by round-tripping the dataset's metadata: export it to a JSON file, edit the access list, and apply the file.
# 1. Export the current dataset metadata, including its access entries
bq show --format=prettyjson PROJECT_ID:DATASET_NAME > dataset.json
# 2. Edit dataset.json and add entries to the "access" array, for example:
#    {"role": "READER", "userByEmail": "[email protected]"}     (read-only)
#    {"role": "WRITER", "groupByEmail": "[email protected]"}   (read and write)
# 3. Apply the updated access list
bq update --source dataset.json PROJECT_ID:DATASET_NAME
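To grant access to every dataset in a project rather than a single dataset, you can bind the predefined BigQuery roles at the project level with gcloud:
# Grant project-wide, read-only access to BigQuery data
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="user:[email protected]" \
--role="roles/bigquery.dataViewer"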
Best Practices
Naming Conventions
- Use descriptive names: billing_data, analytics_prod, customer_data_warehouse
- Include environment: billing_data_prod, billing_data_dev
- Avoid generic names: Don't use dataset1, temp, or test in production
- Use underscores: Not hyphens or spaces (e.g., billing_data, not billing-data)
Data Location Selection
- Billing exports: Use the US multi-region (Google Cloud billing is US-based)
- Compliance requirements: Select a region based on data residency laws (e.g., EU for GDPR)
- Performance: Choose a location close to your data sources and users
- Cost optimization: Storage and query pricing vary by location, so compare current pricing for candidate locations
Access Control
- Principle of least privilege: Grant minimum necessary permissions
- Use groups: Assign permissions to groups, not individual users
- Service accounts: Use dedicated service accounts for automated processes (see the sketch after this list)
- Audit regularly: Review dataset access quarterly
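As a sketch of the service-account guidance above (the account name etl-loader is a placeholder):
# Create a dedicated service account for an automated pipeline
gcloud iam service-accounts create etl-loader \
--display-name="BigQuery ETL loader"
# Grant it read/write access to BigQuery data in the project
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:etl-loader@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.dataEditor"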
Cost Management
- Set default table expiration: Automatically delete old data to reduce storage costs
- Use partitioned tables: Reduce query costs by scanning less data (see the sketch after this list)
- Monitor usage: Set up billing alerts for BigQuery storage and queries
- Archive old datasets: Export unused datasets to Cloud Storage
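As a sketch of the partitioned-tables item above (the table name and schema are placeholders):
# Create a table partitioned by event_date; queries that filter on
# event_date only scan the matching partitions
bq mk \
--table \
--time_partitioning_field=event_date \
--time_partitioning_type=DAY \
PROJECT_ID:analytics_prod.events \
event_date:DATE,user_id:STRING,event_name:STRING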
Security
- Enable encryption: Use customer-managed keys (CMEK) for sensitive data
- Audit logging: Enable Data Access audit logs for compliance
- VPC Service Controls: Restrict data access to specific networks (enterprise feature)
- Data masking: Use column-level security for sensitive fields
Common Dataset Configurations
For Billing Exports
# Billing export dataset; tables expire after 2 years (63,072,000 seconds)
bq mk \
--dataset \
--location=US \
--description="Cloud Billing export data" \
--default_table_expiration=63072000 \
my-project:billing_data
For Analytics Workloads
# Analytics dataset; tables expire after 90 days (7,776,000 seconds)
bq mk \
--dataset \
--location=us-central1 \
--description="Product analytics and user behavior data" \
--default_table_expiration=7776000 \
my-project:analytics_prod
For Data Warehousing
bq mk \
--dataset \
--location=US \
--description="Enterprise data warehouse" \
my-project:data_warehouse
Troubleshooting
Permission Denied Error
Problem: "Permission denied" when creating a dataset
Solution:
- Verify you have the Owner, Editor, or BigQuery Admin role
- Check that the BigQuery API is enabled in your project
- Ensure you're authenticated: gcloud auth login
- Verify you're working in the correct project: gcloud config get-value project
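Two quick CLI checks confirm the API and authentication items above:
# Confirm the BigQuery API is enabled
gcloud services list --enabled | grep bigquery
# Confirm which account you're authenticated as
gcloud auth list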
Dataset Already Exists
Problem: "Dataset already exists" error
Solution:
- Dataset names must be unique within a project
- List existing datasets: bq ls
- Choose a different dataset name
- Delete the existing dataset if it's no longer needed: bq rm -d PROJECT_ID:DATASET_NAME
Invalid Dataset ID
Problem: Error about invalid dataset name
Solution:
- Use only letters, numbers, and underscores (no hyphens, spaces, or other special characters)
- Maximum length is 1,024 characters
- Dataset IDs are case-sensitive and must be unique within the project
Location Mismatch
Problem: "Cannot query table from different location" error
Solution:
- Ensure your dataset location matches your billing export location
- Billing exports default to the US multi-region
- A dataset's location cannot be changed after creation
- Create a new dataset in the correct location and reconfigure exports
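You can confirm a dataset's location from the CLI before reconfiguring exports:
# The "location" field shows where the dataset's data is stored
bq show --format=prettyjson PROJECT_ID:DATASET_NAME | grep '"location"'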
Insufficient Quota
Problem: "Quota exceeded" when creating dataset
Solution:
- Review your BigQuery quotas in the Google Cloud Console under IAM & Admin > Quotas
- Dataset creation is governed by API rate limits rather than a fixed per-project dataset count, so retry with backoff if you hit a rate quota
- Delete unused datasets and tables you no longer need
- Request a quota increase via the Google Cloud Console if a limit is genuinely too low
Next Steps
After creating your BigQuery dataset:
- Enable billing exports: Configure Cloud Billing to export to your new dataset
- Create tables: Define table schemas for your data
- Set up access control: Grant appropriate permissions to team members
- Configure monitoring: Set up alerts for storage and query costs
- Create views: Build SQL views for common queries (a sketch follows below)
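For the views step, a hedged sketch using bq; the view name and query are placeholders:
# Create a view exposing only the last 30 days of billing rows
bq mk \
--use_legacy_sql=false \
--view='SELECT * FROM `my-project.billing_data.gcp_billing_export` WHERE usage_start_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)' \
PROJECT_ID:billing_data.recent_billing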