Google Cloud · Advanced

How to Enable GKE Cost Allocation

Get detailed cost insights at the namespace, workload, and pod level in Google Kubernetes Engine

15 min read · Updated January 2025

Google Kubernetes Engine (GKE) cost allocation provides detailed insights into resource consumption and costs at the namespace, deployment, and pod level. This guide walks you through enabling cost allocation metering to track and attribute Kubernetes costs accurately across teams and applications.

Prerequisites

Before you begin, ensure you have:

  • A GKE cluster (Standard or Autopilot mode)
  • Kubernetes Engine Admin (roles/container.admin) or Kubernetes Engine Cluster Admin (roles/container.clusterAdmin) role
  • kubectl command-line tool installed and configured
  • gcloud CLI installed and authenticated
  • BigQuery billing exports enabled (recommended for full cost visibility)
  • Basic understanding of Kubernetes namespaces, pods, and resources
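
A quick way to sanity-check these prerequisites from a terminal (PROJECT_ID and YOUR_EMAIL are placeholders to replace with your own values):

# Confirm the active gcloud account and project
gcloud config list --format="value(core.account,core.project)"

# Confirm kubectl is installed and pointed at the right cluster
kubectl version
kubectl config current-context

# List your roles on the project and look for
# roles/container.admin or roles/container.clusterAdmin
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:YOUR_EMAIL" \
  --format="table(bindings.role)"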

Understanding GKE Cost Allocation

GKE cost allocation helps you answer questions like:

  • Which teams are consuming the most resources? - Track costs by namespace
  • What does each application cost to run? - Measure resource consumption per workload
  • How can we optimize costs? - Identify over-provisioned or idle resources
  • How do we charge back cloud costs? - Attribute costs to specific business units

Key features:

  • Namespace-level breakdowns: See costs per namespace
  • Resource utilization tracking: CPU, memory, storage, and network usage
  • Pod-level granularity: Drill down to individual pods and containers
  • Integration with BigQuery: Export data for custom analysis
  • GKE usage metering: Track compute, storage, and network costs separately

Step-by-Step Guide

Phase 1: Enable GKE Usage Metering (Resource Consumption)

GKE usage metering collects CPU, memory, and storage consumption data at the pod and namespace level.

For New Clusters (During Creation)

Using Google Cloud Console
  1. Navigate to Kubernetes Engine > Clusters in the Google Cloud Console and click CREATE

  2. Configure Cluster Basics

    • Choose Standard or Autopilot mode
    • Set cluster name, location, and other basic settings
  3. Enable Cost Allocation

    • Expand Features section
    • Under "Resource usage export", check Enable usage metering
    • Dataset: Select or create a BigQuery dataset for usage data
    • Consumption metering: Check to enable (tracks CPU/memory usage)
    • Network egress metering: Check to enable (tracks network costs)
  4. Create Cluster

    • Complete other configuration steps
    • Click CREATE
Using gcloud CLI
# Create cluster with usage metering enabled
gcloud container clusters create my-cluster \
  --zone=us-central1-a \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --resource-usage-bigquery-dataset=PROJECT_ID.DATASET_NAME \
  --enable-network-egress-metering \
  --enable-resource-consumption-metering

Example:

gcloud container clusters create production-cluster \
  --zone=us-central1-a \
  --num-nodes=3 \
  --machine-type=n1-standard-2 \
  --resource-usage-bigquery-dataset=finops-billing-prod.gke_usage_metering \
  --enable-network-egress-metering \
  --enable-resource-consumption-metering

For Existing Clusters (Enable After Creation)

Using Google Cloud Console
  1. Navigate to Your Cluster

    • Go to Kubernetes Engine > Clusters
    • Click on your cluster name
  2. Edit Cluster Settings

    • Click the EDIT button at the top
    • Scroll to "Resource usage export"
  3. Enable Usage Metering

    • Check Enable usage metering
    • BigQuery dataset: Select a dataset in the format PROJECT_ID.DATASET_NAME
    • Check Enable resource consumption metering
    • Check Enable network egress metering (optional but recommended)
  4. Save Changes

    • Click SAVE
    • Changes take effect immediately
Using gcloud CLI
# Enable usage metering on existing cluster
gcloud container clusters update CLUSTER_NAME \
  --zone=ZONE \
  --resource-usage-bigquery-dataset=PROJECT_ID.DATASET_NAME \
  --enable-network-egress-metering \
  --enable-resource-consumption-metering

Example:

gcloud container clusters update production-cluster \
  --zone=us-central1-a \
  --resource-usage-bigquery-dataset=finops-billing-prod.gke_usage_metering \
  --enable-network-egress-metering \
  --enable-resource-consumption-metering

Verify Usage Metering is Enabled

# Check cluster configuration
gcloud container clusters describe CLUSTER_NAME \
  --zone=ZONE \
  --format="value(resourceUsageExportConfig)"

Expected output:

resourceUsageExportConfig:
  bigqueryDestination:
    datasetId: gke_usage_metering
  consumptionMeteringConfig:
    enabled: true
  enableNetworkEgressMetering: true

Phase 2: Create BigQuery Dataset for Usage Data

If you haven't already created a dataset:

# Create dataset for GKE usage metering
bq mk \
  --dataset \
  --location=US \
  --description="GKE usage metering and cost allocation data" \
  PROJECT_ID:gke_usage_metering
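
To confirm the dataset was created as expected:

# Confirm the dataset exists and show its location and access settings
bq show --format=prettyjson PROJECT_ID:gke_usage_metering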

Phase 3: Label Resources for Cost Attribution

Apply labels to namespaces, pods, and services for better cost tracking:

Label Namespaces

# Label namespace with team and environment
kubectl label namespace production \
  team=backend \
  environment=prod \
  cost-center=engineering

kubectl label namespace staging \
  team=backend \
  environment=staging \
  cost-center=engineering
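
To confirm the labels landed on the namespaces:

# Show team / environment / cost-center labels per namespace
kubectl get namespaces -L team,environment,cost-center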

Label Deployments

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app
    team: backend
    cost-center: engineering
    environment: prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        team: backend
        cost-center: engineering
    spec:
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app:latest
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"

Apply with:

kubectl apply -f deployment.yaml
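
To confirm the labels are present on the running pods (not only on the Deployment object):

# Pods created from the template should carry the team and cost-center labels
kubectl get pods -n production -l app=my-app --show-labels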

Phase 4: Set Resource Requests and Limits

For accurate cost allocation, all pods must have resource requests defined:

Example Pod with Resource Requests

apiVersion: v1
kind: Pod
metadata:
  name: cost-tracked-pod
  namespace: production
  labels:
    team: frontend
spec:
  containers:
  - name: app
    image: nginx:latest
    resources:
      requests:
        cpu: "250m"          # Required for cost allocation
        memory: "256Mi"      # Required for cost allocation
      limits:
        cpu: "500m"
        memory: "512Mi"

Check Pods Without Resource Requests

# Find pods without CPU requests (won't be tracked accurately)
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.containers[].resources.requests.cpu == null) |
  "\(.metadata.namespace)/\(.metadata.name)"'

Phase 5: Query Usage Data in BigQuery

After enabling usage metering, data will populate in BigQuery within 24 hours.

View Usage Data Tables

# List tables in the usage metering dataset
bq ls PROJECT_ID:gke_usage_metering

You'll see tables like:

  • gke_cluster_resource_usage - resources requested by workloads (CPU, memory, and network egress when egress metering is enabled)
  • gke_cluster_resource_consumption - actual CPU and memory consumption, with resource labels

Query Total Cost by Namespace

-- Monthly cost by namespace
SELECT
  cluster_name,
  namespace_name,
  ROUND(SUM(cpu_core_seconds) / 3600, 2) AS cpu_core_hours,
  ROUND(SUM(memory_byte_seconds) / (1024*1024*1024*3600), 2) AS memory_gb_hours,
  -- Estimate cost (adjust rates based on your machine types)
  ROUND((SUM(cpu_core_seconds) / 3600) * 0.0327, 2) AS estimated_cpu_cost,
  ROUND((SUM(memory_byte_seconds) / (1024*1024*1024*3600)) * 0.0044, 2) AS estimated_memory_cost
FROM
  `PROJECT_ID.gke_usage_metering.gke_cluster_resource_consumption`
WHERE
  usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  cluster_name,
  namespace_name
ORDER BY
  estimated_cpu_cost DESC
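
The same query can also be run from a terminal and exported, for example to CSV for a spreadsheet; the file names here are only illustrative:

# Save the namespace query to a file, run it with standard SQL, and export CSV
bq query --use_legacy_sql=false --format=csv < namespace_costs.sql > namespace_costs.csv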

Query Cost by Label (Team Attribution)

-- Cost by team label
SELECT
  cluster_name,
  resource_labels.value AS team,
  ROUND(SUM(cpu_core_seconds) / 3600, 2) AS cpu_core_hours,
  ROUND(SUM(memory_byte_seconds) / (1024*1024*1024*3600), 2) AS memory_gb_hours,
  ROUND((SUM(cpu_core_seconds) / 3600) * 0.0327 +
        (SUM(memory_byte_seconds) / (1024*1024*1024*3600)) * 0.0044, 2) AS estimated_total_cost
FROM
  `PROJECT_ID.gke_usage_metering.gke_cluster_resource_consumption`,
  UNNEST(resource_labels) AS resource_labels
WHERE
  usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND resource_labels.key = 'team'
GROUP BY
  cluster_name,
  team
ORDER BY
  estimated_total_cost DESC

Query Cost by Pod

-- Top 10 most expensive pods
SELECT
  cluster_name,
  namespace_name,
  pod_name,
  ROUND(SUM(cpu_core_seconds) / 3600, 2) AS cpu_core_hours,
  ROUND(SUM(memory_byte_seconds) / (1024*1024*1024*3600), 2) AS memory_gb_hours,
  ROUND((SUM(cpu_core_seconds) / 3600) * 0.0327 +
        (SUM(memory_byte_seconds) / (1024*1024*1024*3600)) * 0.0044, 2) AS estimated_total_cost
FROM
  `PROJECT_ID.gke_usage_metering.gke_cluster_resource_consumption`
WHERE
  usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY
  cluster_name,
  namespace_name,
  pod_name
ORDER BY
  estimated_total_cost DESC
LIMIT 10

Phase 6: Integrate with Cloud Billing Data

Combine GKE usage data with actual billing costs for precise cost attribution:

-- Join usage data with actual billing costs
WITH gke_usage AS (
  SELECT
    cluster_name,
    namespace_name,
    SUM(cpu_core_seconds) / 3600 AS cpu_core_hours,
    SUM(memory_byte_seconds) / (1024*1024*1024*3600) AS memory_gb_hours
  FROM
    `PROJECT_ID.gke_usage_metering.gke_cluster_resource_consumption`
  WHERE
    usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  GROUP BY
    cluster_name,
    namespace_name
),
billing_costs AS (
  SELECT
    resource.name AS cluster_name,
    SUM(cost) AS total_cost
  FROM
    `PROJECT_ID.billing_data.gcp_billing_export_v1_*`
  WHERE
    service.description = 'Kubernetes Engine'
    AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m01', DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH))
  GROUP BY
    cluster_name
)
SELECT
  u.cluster_name,
  u.namespace_name,
  u.cpu_core_hours,
  u.memory_gb_hours,
  b.total_cost AS cluster_total_cost,
  -- Proportionally allocate cost based on resource usage
  ROUND((u.cpu_core_hours / SUM(u.cpu_core_hours) OVER (PARTITION BY u.cluster_name)) * b.total_cost, 2) AS allocated_cost
FROM
  gke_usage u
JOIN
  billing_costs b
ON
  u.cluster_name = b.cluster_name
ORDER BY
  allocated_cost DESC
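
To persist the allocation results for dashboards or chargeback reports, the same query can be materialized into a table; the table and file names below are illustrative:

# Write the allocated costs into a reporting table for dashboards
bq query \
  --use_legacy_sql=false \
  --destination_table=PROJECT_ID:gke_usage_metering.namespace_allocated_costs \
  --replace \
  < allocation_query.sql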

Best Practices

Resource Management

  1. Always set resource requests: Pods without requests won't be accurately tracked
  2. Set realistic limits: Prevents cost overruns from runaway containers
  3. Use resource quotas: Enforce namespace-level resource limits (a sketch follows this list)
  4. Monitor utilization: Identify over-provisioned resources
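
For point 3, a minimal ResourceQuota sketch that caps total requests and limits in a namespace; the numbers are placeholders to adapt:

# Cap total CPU/memory requests and limits for the production namespace
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
EOF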

Labeling Strategy

  1. Consistent labels: Standardize labels across all clusters
  2. Required labels: team, environment, cost-center, app
  3. Label namespaces and pod templates: Namespace labels are not inherited by pods, so set workload labels in the pod template as well
  4. Document label taxonomy: Create a label governance document
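
A quick audit for the required labels, for example listing namespaces that are missing the team label (assumes jq is installed):

# List namespaces that have no 'team' label
kubectl get namespaces -o json | \
  jq -r '.items[] | select(.metadata.labels.team == null) | .metadata.name'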

Cost Optimization

  1. Rightsize nodes: Use appropriate machine types for workloads
  2. Enable autoscaling: HPA for pods, cluster autoscaler for nodes
  3. Use spot VMs: Reduce costs by 60-90% for fault-tolerant workloads
  4. Review idle resources: Delete unused deployments and services

Monitoring and Alerting

  1. Set budget alerts: Notify teams when costs exceed thresholds
  2. Track trends: Monitor cost changes month-over-month
  3. Create dashboards: Visualize costs in Looker Studio or Grafana
  4. Regular reviews: Monthly cost review meetings per team

Troubleshooting

No Data in BigQuery

Problem: Usage metering enabled but no data in BigQuery

Solution:

  • Wait 24-48 hours for initial data population
  • Verify dataset exists: bq ls PROJECT_ID:gke_usage_metering
  • Check cluster configuration: gcloud container clusters describe CLUSTER_NAME
  • Ensure pods have resource requests defined
  • Verify GKE service account has BigQuery Data Editor role

Permission Denied Errors

Problem: Cannot enable usage metering or query data

Solution:

  • Verify you have Kubernetes Engine Admin role
  • Ensure BigQuery API is enabled in the project
  • Grant roles/bigquery.dataEditor to the GKE service account (see the sketch after this list)
  • Check project-level IAM bindings
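
A sketch of the grant from the third bullet, assuming the default GKE service agent name service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com; substitute your own project ID and project number:

# Grant the GKE service agent permission to write usage data into BigQuery
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"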

Incomplete Cost Data

Problem: Some pods not showing in cost allocation

Solution:

  • All pods must have CPU and memory requests defined
  • Check for pods without requests: See Phase 4 query above
  • Update deployments to include resource requests
  • Wait 24 hours for updated data to populate

High Query Costs

Problem: BigQuery queries consuming excessive budget

Solution:

  • Use date partitioning: Filter on usage_start_time
  • Limit query scope: Query specific namespaces or time ranges
  • Create materialized views: Pre-aggregate common queries
  • Set query quotas: Limit per-user query costs

Labels Not Appearing

Problem: Resource labels missing from BigQuery data

Solution:

  • Verify labels are applied: kubectl describe pod POD_NAME
  • Labels must be on pod template, not just deployment metadata
  • Wait 24 hours for label changes to propagate
  • Check label format: Keys and values must use lowercase letters, numbers, hyphens, and underscores

Next Steps

After enabling GKE cost allocation:

  1. Create cost dashboards: Build Looker Studio reports for team visibility
  2. Implement chargebacks: Allocate costs to business units or projects
  3. Optimize resources: Identify and eliminate waste
  4. Set budget alerts: Notify teams of cost overruns
  5. Automate reporting: Schedule BigQuery queries for regular cost reports (a sketch follows this list)
  6. Enable pod autoscaling: Use HPA and VPA for dynamic resource management
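
For step 5, one option is a BigQuery scheduled query created with the bq CLI. This is a sketch under assumptions: the BigQuery Data Transfer API must be enabled in the project, and the simplified query, target dataset, and destination table name are placeholders to adapt:

# Create a daily scheduled query that materializes namespace costs into a report table
bq mk --transfer_config \
  --data_source=scheduled_query \
  --target_dataset=gke_usage_metering \
  --display_name="Daily namespace cost report" \
  --schedule="every 24 hours" \
  --params='{
    "query": "SELECT cluster_name, namespace_name, SUM(cpu_core_seconds)/3600 AS cpu_core_hours FROM `PROJECT_ID.gke_usage_metering.gke_cluster_resource_consumption` WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) GROUP BY cluster_name, namespace_name",
    "destination_table_name_template": "daily_namespace_costs",
    "write_disposition": "WRITE_TRUNCATE"
  }'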

Frequently Asked Questions

How do I enable GKE usage metering on an existing cluster?

To enable GKE usage metering on an existing cluster, navigate to the Kubernetes Engine section in the Google Cloud Console. Select your cluster, click the EDIT button, and scroll to the 'Resource usage export' section. Check 'Enable usage metering', select a BigQuery dataset in the format PROJECT_ID.DATASET_NAME, and check both 'Enable resource consumption metering' and 'Enable network egress metering' if desired. Finally, click SAVE to apply the changes. Note that changes take effect immediately, but data population in BigQuery may take up to 24 hours.
