Building an AI-Powered SaaS Platform on Cloudflare's Edge
How we designed and built a full-stack AI resume platform entirely on Cloudflare's edge infrastructure — leveraging Workers, D1, R2, Vectorize, AI Gateway, and Browser Rendering to deliver sub-100ms global latency at a fraction of traditional cloud costs.
Overview
When a startup approached us to build an AI-powered career platform, they had an ambitious vision: a SaaS application that could ingest a user's entire career history — resumes, performance reviews, certifications, LinkedIn profiles — and generate tailored, ATS-optimized resumes matched to specific job descriptions. They needed it fast, they needed it scalable, and they needed it affordable. The result is Superpower Resume, a production platform built entirely on Cloudflare's edge infrastructure.
Rather than assembling a traditional cloud stack with separate compute, database, storage, and AI services across multiple providers, we made a deliberate architectural decision: build the entire platform on Cloudflare. Every request, every database query, every AI inference, every file operation runs at the edge — no origin servers, no cold starts, no cross-region latency penalties.
The Challenges
Building an AI SaaS from scratch presents compounding technical challenges. Each one alone is manageable, but together they compound into a level of complexity that demands careful architectural thinking.
RAG Pipeline at the Edge
The core product requirement — generating resumes from a user's career history — demanded a full Retrieval-Augmented Generation (RAG) pipeline. Documents needed to be parsed, chunked, embedded into vectors, stored in a searchable index, and retrieved contextually at generation time. Traditional RAG architectures rely on centralized vector databases and GPU-heavy embedding services, creating latency bottlenecks and infrastructure complexity.
Multi-Tenant Data Isolation
Career documents are deeply personal. The platform needed airtight tenant isolation — not just at the application layer, but at every infrastructure boundary. A single misconfigured query or leaked vector namespace could expose one user's career history to another. This had to be enforced structurally, not just through application logic.
AI Cost Management
AI inference costs can spiral quickly. With multiple features requiring LLM calls — resume generation, cover letters, mock interviews, chat-based editing, gap analysis — a naive approach of routing everything through a single premium model would make per-user economics unsustainable. The platform needed intelligent model routing that matched task complexity to model capability.
Global Performance
Job seekers don't wait. A resume builder that takes 5 seconds to load or 30 seconds to generate loses users immediately. The platform needed sub-100ms page loads and fast streaming responses globally — from New York to Nairobi — without the operational overhead of multi-region deployments.
Zero-to-Scale Economics
As a startup, the product needed to launch with near-zero infrastructure costs and scale linearly with revenue. Traditional cloud architectures with always-on instances, reserved capacity, and baseline database costs create a cost floor that's hostile to early-stage products.
The Solution
We designed Superpower Resume as a Cloudflare-native application, leveraging 10+ Cloudflare products as integrated building blocks rather than bolting together services from multiple providers.
Edge-First Application Architecture
The application runs on Cloudflare Workers via the OpenNext adapter, which deploys a full Next.js application to Workers without modification. Every HTTP request is handled at Cloudflare's nearest edge location — there is no origin server to route to. Static assets are served from Workers Static Assets, API routes execute in Workers, and server-side rendering happens at the edge. This eliminates the cold start problem entirely and delivers consistent sub-100ms Time to First Byte globally.
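A deployment of this shape is configured almost entirely in a single Wrangler file. The fragment below is an illustrative sketch, not the project's actual configuration — the name, date, and paths are examples following the OpenNext adapter's conventions:

```toml
# Illustrative wrangler.toml for an OpenNext-on-Workers deployment
name = "superpower-resume"
main = ".open-next/worker.js"          # Worker entry point produced by the OpenNext build
compatibility_date = "2024-09-23"
compatibility_flags = ["nodejs_compat"]

# Static assets (JS bundles, CSS, images) served from Workers Static Assets
[assets]
directory = ".open-next/assets"
binding = "ASSETS"
```

With bindings for D1, R2, Vectorize, and AI added to this file, the whole stack deploys with one `wrangler deploy`.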
Cloudflare D1 as the Primary Database
All relational data — users, resumes, jobs, subscriptions, documents, interview transcripts, chat messages — lives in Cloudflare D1, a SQLite-based edge database. Because D1 is co-located with the Worker, database queries execute with sub-millisecond latency. No connection pooling, no VPC networking, no database proxy layers. We designed the schema to support the full application lifecycle: authentication, document management, resume versioning, job tracking, billing state, and usage metering.
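Because every table is tenant-scoped, each query carries a user-ID predicate. The helper below is a minimal sketch of that pattern — the table names, columns, and `env.DB` binding are illustrative, not the actual schema:

```typescript
// Tenant-scoped query builder: the user_id predicate is appended
// unconditionally, so callers cannot construct a cross-tenant query
// through this helper.
type Query = { sql: string; params: unknown[] };

function scopedSelect(
  table: string,
  userId: string,
  columns: string[] = ["*"],
): Query {
  return {
    sql: `SELECT ${columns.join(", ")} FROM ${table} WHERE user_id = ?`,
    params: [userId],
  };
}

// In a Worker, this would execute against the D1 binding:
//   const { sql, params } = scopedSelect("resumes", session.userId);
//   const rows = await env.DB.prepare(sql).bind(...params).all();

const q = scopedSelect("resumes", "user_123", ["id", "title"]);
console.log(q.sql); // SELECT id, title FROM resumes WHERE user_id = ?
```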
RAG Pipeline with Vectorize and Workers AI
The RAG pipeline is the technical heart of the platform. When a user uploads a career document, the client-side parser extracts text from PDFs, DOCX files, and HTML. The extracted text is sent to the server, where it's chunked into 450-token segments with 50-token overlap for retrieval continuity. Each chunk is embedded using Cloudflare Workers AI (BGE-M3, 1024 dimensions) and stored in Cloudflare Vectorize with the user's ID as the namespace — providing hard isolation at the vector index level.
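The chunking step can be sketched as follows. The real pipeline counts model tokens; this sketch approximates a "token" as a whitespace-delimited word, and the embedding/upsert calls shown in comments are illustrative:

```typescript
// Split extracted text into ~450-token chunks with a 50-token overlap,
// so context at a chunk boundary also appears at the start of the next chunk.
function chunkText(text: string, chunkSize = 450, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // 400-word stride yields 50 words of overlap
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // final chunk reached
  }
  return chunks;
}

// Each chunk would then be embedded with Workers AI and upserted into
// Vectorize under the user's namespace (sketch, not exact API output shape):
//   const { data } = await env.AI.run("@cf/baai/bge-m3", { text: chunks });
//   await env.VECTORIZE.upsert(chunks.map((c, i) => ({
//     id: `${docId}-${i}`, values: data[i], namespace: userId,
//   })));
```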
When a user pastes a job description and requests a resume, the system executes a multi-query RAG search: three parallel vector queries with different formulations of the job requirements. Results are deduplicated, scored, and assembled into an 8,000-token context window. This context, combined with the job description and the user's custom instructions, feeds into the LLM for streaming resume generation.
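Merging the three parallel query results comes down to deduplicating by chunk ID and keeping the best score per chunk. A minimal sketch, with the match shape mirroring a Vectorize query result:

```typescript
// Merge N parallel vector-search result sets: dedupe by chunk id,
// keep the highest score seen for each chunk, return top `limit`.
type Match = { id: string; score: number };

function mergeMatches(resultSets: Match[][], limit = 20): Match[] {
  const best = new Map<string, number>();
  for (const set of resultSets) {
    for (const m of set) {
      const prev = best.get(m.id);
      if (prev === undefined || m.score > prev) best.set(m.id, m.score);
    }
  }
  return [...best.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

The merged, score-ordered list is then packed greedily into the 8,000-token context window until the budget is exhausted.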
AI Gateway for Multi-Provider Model Routing
Different AI tasks have different quality and cost requirements. We implemented task-based model routing through Cloudflare AI Gateway, which acts as a unified proxy across OpenAI, Google Gemini, and Workers AI. Resume generation and chat revisions route to GPT-4.1-mini for quality. Job description parsing and summarization route to Gemini 2.5 Flash-Lite for cost efficiency. Embeddings use Workers AI at near-zero cost. Free-tier users fall back to Llama 3.3-70B running natively on Workers AI.
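The routing policy reduces to a small task-and-tier lookup. This is a sketch of that table — the model tiers come from the text above, but the exact model identifier strings and task names are illustrative assumptions:

```typescript
// Task-based model routing: match task complexity to model capability.
type Task =
  | "resume_generation"
  | "chat_revision"
  | "jd_parsing"
  | "summarization"
  | "embedding";
type Tier = "free" | "pro";

function routeModel(task: Task, tier: Tier): string {
  // Embeddings always run on Workers AI, regardless of tier.
  if (task === "embedding") return "@cf/baai/bge-m3";
  // Free-tier users fall back to Llama 3.3-70B on Workers AI.
  if (tier === "free") return "@cf/meta/llama-3.3-70b-instruct-fp8-fast";
  switch (task) {
    case "resume_generation":
    case "chat_revision":
      return "gpt-4.1-mini"; // quality tier
    case "jd_parsing":
    case "summarization":
      return "gemini-2.5-flash-lite"; // cost tier
  }
}
```

Every call then goes through the AI Gateway endpoint for the chosen provider, which is where caching, rate limiting, and fallbacks apply.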
AI Gateway provides more than just routing — it adds edge caching for identical requests (90%+ latency reduction on cache hits), rate limiting per model, DLP scanning for PII before requests reach external providers, automatic retries with fallback models, and unified analytics across all providers.
R2 for Document and Asset Storage
Original documents, generated PDFs, AI-enhanced headshots, and interview transcripts are stored in Cloudflare R2 with a structured key hierarchy: document type, user ID, document ID, and filename. R2's S3-compatible API made integration straightforward, and its zero-egress-fee pricing model means serving documents to users costs nothing beyond storage.
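The key hierarchy can be sketched as a small builder. The document-type names here are illustrative — the point is that every key is prefixed by type and user ID, so listing and deleting stay tenant-bound:

```typescript
// Structured R2 key: <documentType>/<userId>/<documentId>/<filename>
type DocType = "uploads" | "pdfs" | "headshots" | "transcripts";

function r2Key(
  docType: DocType,
  userId: string,
  docId: string,
  filename: string,
): string {
  // Scoping keys by user ID keeps bucket operations tenant-bound, e.g.:
  //   env.BUCKET.list({ prefix: `uploads/${userId}/` })
  return `${docType}/${userId}/${docId}/${encodeURIComponent(filename)}`;
}
```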
Browser Rendering for PDF Generation
Resume PDF export uses Cloudflare Browser Rendering — headless Chrome at the edge. The application renders the resume as styled HTML using one of three professional templates, then captures it as a pixel-perfect PDF. This approach avoids the complexity of server-side PDF libraries and produces output identical to what users see in the browser editor.
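The export flow is render-to-HTML, then print-to-PDF. Below, `renderResumeHtml` is a stand-in for the real template renderer, and the commented Worker portion follows the `@cloudflare/puppeteer` API shape (it only runs inside Cloudflare, against a Browser Rendering binding):

```typescript
// Stand-in for the template renderer: resume data in, styled HTML out.
function renderResumeHtml(name: string, sections: string[]): string {
  return `<!doctype html><html><body><h1>${name}</h1>${sections
    .map((s) => `<section>${s}</section>`)
    .join("")}</body></html>`;
}

// In the Worker (sketch; not runnable outside Cloudflare):
//   import puppeteer from "@cloudflare/puppeteer";
//   const browser = await puppeteer.launch(env.BROWSER);
//   const page = await browser.newPage();
//   await page.setContent(renderResumeHtml(user.name, sections));
//   const pdf = await page.pdf({ format: "letter", printBackground: true });
//   await browser.close();
```

Because the PDF is printed from the same HTML the editor displays, export fidelity is guaranteed by construction rather than by reimplementing layout in a PDF library.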
Multi-Layer Security and Isolation
Tenant isolation is enforced at every infrastructure layer. Vectorize namespaces partition vector data by user ID — there is no query path that can cross namespace boundaries. All D1 queries include user ID predicates. R2 keys are scoped by user ID. Authentication uses NextAuth v5 with JWT sessions, and password hashing uses PBKDF2 via the Web Crypto API (Workers-compatible, no native binary dependencies). Prompt injection defense uses a scoring system with 30+ regex patterns to detect and block adversarial inputs in job descriptions and chat messages.
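The prompt-injection scorer can be sketched as a weighted regex pass. The production system uses 30+ patterns; these three patterns, their weights, and the threshold are illustrative only:

```typescript
// Weighted regex scoring for adversarial input in job descriptions and chat.
const INJECTION_PATTERNS: { pattern: RegExp; weight: number }[] = [
  { pattern: /ignore (all |any )?(previous|prior) instructions/i, weight: 5 },
  { pattern: /you are now\b/i, weight: 3 },
  { pattern: /system prompt/i, weight: 2 },
];

function injectionScore(input: string): number {
  return INJECTION_PATTERNS.reduce(
    (score, { pattern, weight }) => score + (pattern.test(input) ? weight : 0),
    0,
  );
}

const BLOCK_THRESHOLD = 4; // illustrative cutoff

function isBlocked(input: string): boolean {
  return injectionScore(input) >= BLOCK_THRESHOLD;
}
```

Scoring rather than hard pattern-matching lets a single weak signal pass (job descriptions legitimately contain odd phrasing) while combinations of strong signals get blocked.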
The Results
The architectural decisions paid off across every dimension that matters for a startup: performance, cost, and time to market.
Near-Zero Infrastructure Baseline
At launch, the platform's infrastructure costs are effectively zero. Cloudflare Workers, D1, R2, and KV all offer generous free tiers. The only costs that scale are AI inference — and those scale linearly with actual usage. There are no idle instances, no minimum database commitments, no reserved capacity charges. This gave the startup runway to iterate on product-market fit without burning through infrastructure budget.

$0.15 per Active User in AI Costs
Task-based model routing through AI Gateway reduced AI costs dramatically. A Pro user generating 25 resumes, sending 200 chat messages, and uploading 10 documents in a month costs approximately $0.15 in AI inference — against $9/month in subscription revenue. This 60:1 revenue-to-AI-cost ratio gives the business healthy margins even at the lowest paid tier.
Sub-100ms Global Latency
Because every component runs at Cloudflare's edge — compute, database, storage, vector search, AI embeddings — there are no cross-region hops for any operation. Page loads consistently hit sub-100ms TTFB worldwide. Database queries return in under 1ms. Vector searches complete in under 10ms. The only latency-sensitive operations are external LLM calls, which are mitigated by streaming responses to the client as they generate.
Production-Ready in Weeks
By using Cloudflare's integrated platform rather than stitching together services from multiple providers, we eliminated weeks of infrastructure setup, networking configuration, and DevOps tooling. No Terraform modules for VPCs, no Kubernetes manifests, no database proxy configuration, no CDN setup. The entire deployment is a single wrangler deploy command. This let the team focus engineering time on product features rather than infrastructure plumbing.
Linear Scaling Path to 1M+ Users
The architecture includes a clear scaling roadmap. The current D1 instance supports approximately 2,000 active users. Phase 1 offloads text blobs to R2 to reach 20,000 users. Phase 2 introduces geo-sharded D1 databases with a central authentication database, embedding the user's shard ID in their JWT for zero-lookup routing — supporting 100 shards and 1 million users. At no point does the scaling plan require re-architecting the application or migrating to different infrastructure.
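The Phase 2 zero-lookup routing can be sketched as below. The `shard` claim name, the binding naming scheme, and the hash-based assignment are all assumptions for illustration:

```typescript
// Zero-lookup shard routing: the shard ID is embedded in the JWT at sign-in,
// so per-request routing needs no central database hit.
function shardBinding(jwtPayload: { sub: string; shard: number }): string {
  // Each shard maps to a D1 binding, e.g. shard 42 of 100 -> DB_SHARD_042.
  return `DB_SHARD_${String(jwtPayload.shard).padStart(3, "0")}`;
}

// Assigning a new user to a shard at signup via a stable hash of the user ID.
function assignShard(userId: string, shardCount = 100): number {
  let h = 0;
  for (const ch of userId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % shardCount;
}
```

Since the shard ID travels inside the signed token, a request can be routed to the correct regional D1 database in the same Worker invocation that validates the session.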