Skip to main content
Home/Blog/Is Qwen Code Still Free? The 2026 Free-Tier Shutdown (and 3 Free Alternatives)
Developer Tools

Is Qwen Code Still Free? The 2026 Free-Tier Shutdown (and 3 Free Alternatives)

Alibaba killed Qwen Code's free OAuth tier on April 15, 2026. Here's exactly what changed, what the paid Coding Plan costs, and three ways to keep running Qwen3-Coder for free.

By Sean

If you opened your terminal one morning in mid-April and got hit with Qwen OAuth free tier was discontinued on 2026-04-15, you weren't alone. That error broke a lot of workflows overnight, and the headlines didn't help — "Free Qwen Is Dead" ran across Decrypt, Yahoo Tech, and others around mid-April. The framing made it sound like Alibaba had nuked the whole project.

It didn't. The reality is narrower, and once you understand the distinction, you can be back to coding with Qwen3-Coder — for free — in about ten minutes. Let's clear up the confusion and walk through the three free paths that still work.

What actually changed (and what didn't)

Here's the single most important thing to get straight: "Qwen Code" the CLI is still free and open source. What ended was the free hosted OAuth login that gave you free cloud inference through Alibaba's servers. Those are two different things, and conflating them is the root of nearly every "Qwen is dead" take.

The shutdown happened in two stages. First, Alibaba cut the free OAuth quota from 1,000 requests/day down to 100 requests/day. Then, on April 15, 2026, they closed the free OAuth tier entirely. The official README now states plainly that the "Qwen OAuth free tier has been discontinued on April 15, 2026," and the docs surface the same message as the auth error you're probably seeing. The stated reason, per the policy adjustment issue on GitHub, was "a product strategy adjustment to better manage the free tier usage and costs."

The CLI didn't go away. The free inference did. You keep the tool; you now bring your own model.

So your install of qwen is fine. Your config and history are fine. You just need to re-authenticate against a source of inference that isn't the dead OAuth endpoint.

Re-authenticating: the one command you need

To switch auth methods, run the CLI and trigger the auth flow:

qwen
/auth

From there, the official docs and README point you to three replacement paths:

  1. API Key via Alibaba Cloud Model Studio / DashScope
  2. The paid Coding Plan (Alibaba's hosted subscription)
  3. Local Inference via Ollama / vLLM — explicitly described as the "recommended free alternative"

The README and discontinuation notices also name OpenRouter and Fireworks AI as bring-your-own-key (BYOK) providers. That gives us our menu. Let's price out the paid option, then cover the three genuinely free routes.

The paid option: Qwen Coding Plan

If you want a turnkey hosted experience and don't mind paying, the Qwen Coding Plan (a.k.a. the Alibaba Cloud Coding Plan) is the official answer. The headline/Pro tier is $50/month.

According to a detailed secondary source, that tier includes:

LimitAllowance
Per month~90,000 requests
Per rolling 5-hour window6,000 requests

It supports a wide model lineup — qwen3-coder-next, qwen3-coder-plus, qwen3-max, kimi-k2.5, glm-5, and MiniMax-M2.5 — through the endpoint https://coding.dashscope.aliyuncs.com/v1.

A word of caution on pricing you'll find elsewhere: some third-party marketplaces advertise much cheaper tiers (around $10/month "Lite," $7 "Standard," $22 "Business"). Those don't match the official $50 figure and appear to come from plan-reseller sites that may be conflating different products. Treat anything other than the $50 plan as unverified. Similarly, a claim floating around that the free developer API tier was replaced with a one-time 70M-token trial doesn't appear in the Qwen Code README — so I'd treat that as unconfirmed too.

If $50/month isn't where you want to be, here are the three free alternatives.

Free alternative #1: OpenRouter (~1,000 requests/day)

OpenRouter hosts Qwen3-Coder as a genuinely free model:

  • Model ID: qwen/qwen3-coder:free
  • Architecture: 480B-A35B MoE
  • Context: 1M tokens
  • Price: $0

Because it's an OpenAI-compatible endpoint, it drops straight into Qwen Code as a BYO key. The catch is the rate limits, which have some nuance worth understanding:

  • Free models are capped at 20 requests/minute, regardless of your tier.
  • You get roughly 50 requests/day by default.
  • You get 1,000 requests/day once you've purchased $10 or more in credits at any point — a one-time threshold that never expires.

So the widely-cited "~1,000/day free Qwen3-Coder" access isn't free out of the box; it requires a single $10 credit purchase to unlock, after which the higher daily ceiling sticks permanently. For most solo developers, that $10 one-time spend is the best value on this list — far cheaper than the $50/month plan if your usage fits inside ~1,000 requests/day.

One more OpenRouter perk worth knowing: all users get 1,000,000 free BYOK requests/month (routing your own provider key through OpenRouter), after which a 5% routing fee applies to normal model pricing.

Free alternative #2: Run it locally (no key, no limits)

This is the path the official docs call the recommended free alternative, and it's my favorite for anyone with the hardware. Qwen models are open-weight under Apache 2.0, which means you can run them locally at zero cost, with no API key and no usage limits, then point the Qwen Code CLI at your local server.

The documented runners are Ollama, vLLM, and llama.cpp. The general workflow looks like this:

# Option A: Ollama
ollama pull <qwen3-coder-model>
ollama serve   # exposes an OpenAI-compatible endpoint

# Option B: llama.cpp
# build llama.cpp, pull a quantized GGUF (e.g. an Unsloth Q4_K_XL),
# then run the OpenAI-compatible server

Then in Qwen Code, run /auth and point it at your local endpoint.

The hardware question is the real gate here. Qwen3-Coder-Next (announced February 2026) is an 80B MoE model with 3B active params and 256K context, designed for fast local agentic coding. It needs roughly:

  • ~46GB RAM/VRAM/unified memory at 4-bit
  • ~85GB at 8-bit

A 64GB Apple Silicon Mac is cited as a 4-bit "sweet spot." If you've got a workstation with a big GPU or a high-memory Mac, local is the cleanest answer: it's private, it's offline-capable, and there's genuinely no quota to hit.

If you'd rather not point Qwen Code directly at a single machine, a local-first AI gateway like Wide Area AI sits in front of your hardware as an OpenAI-compatible endpoint — serving requests from your own nodes at zero per-token cost and only failing over to a cloud provider when those nodes are offline. You just set Qwen Code's base URL to the gateway and keep the local-first economics without the single-point-of-failure.

Free alternative #3: BYO API key with free credits

The third route is BYOK against any OpenAI-compatible provider that gives you free starting credits or a free model. OpenRouter (above) is the obvious one, but the README also names Fireworks AI as a BYOK provider, and an Alibaba Cloud Model Studio / DashScope API key is a first-class supported option in Qwen Code.

The mechanics are identical across providers: get a key, run /auth in Qwen Code, select the API-key method, paste it in, and set the model. This is the most flexible option because it lets you mix and match — for example, a free local model for routine edits and a hosted key for the occasional heavy agentic task.

Which one should you pick?

PathCostBest for
OpenRouter :free$0 (or $10 once for 1,000/day)Solo devs who want hosted convenience
Local (Ollama/llama.cpp)$0 + hardwareAnyone with 46GB+ memory who wants privacy and no limits
BYO key (Fireworks/DashScope)Varies / free creditsMixing providers, flexibility
Coding Plan$50/monthTeams or heavy users wanting an official hosted SLA

Bottom line

The "Free Qwen Is Dead" headlines oversold it. The Qwen Code CLI is still free and open source — only the free hosted OAuth inference ended, on April 15, 2026. If you're staring at that auth error, run qwen then /auth and pick one of three free routes: the OpenRouter free model (best value after a one-time $10 unlock for 1,000 requests/day), fully local inference via Ollama or llama.cpp (zero cost, no limits, if your hardware can handle ~46GB at 4-bit), or a BYO API key with free credits. The $50/month Coding Plan is there if you'd rather pay for an official hosted experience — but you don't have to. Treat any sub-$50 "official" pricing or mystery free-trial claims with skepticism; they come from third-party sources that don't line up with Alibaba's own docs.

Frequently Asked Questions

Find answers to common questions

The Qwen Code CLI itself is still free and open source. What ended on April 15, 2026 was the free hosted OAuth login that provided free cloud inference. You can keep using Qwen Code at no cost by bringing your own inference: a free OpenRouter model, local models via Ollama/llama.cpp/vLLM, or a free-credit BYO API key.

That error means you were authenticated through the old free OAuth flow, which Alibaba shut down on April 15, 2026. Run qwen then /auth to switch to a supported method: an API key from Alibaba Cloud Model Studio (DashScope), the paid Coding Plan, or a local/OpenRouter endpoint.

Alibaba first cut the free OAuth quota from 1,000 requests/day down to 100 requests/day, then closed the free tier completely on April 15, 2026. The stated rationale was a product strategy adjustment to better manage free-tier usage and costs.

The official headline Coding Plan tier is $50/month. Per a detailed secondary source, it includes roughly 90,000 requests/month and 6,000 per rolling 5-hour window, and supports models like qwen3-coder-next, qwen3-coder-plus, qwen3-max, kimi-k2.5, glm-5, and MiniMax-M2.5. Some third-party marketplaces advertise cheaper tiers, but those don't match the official figure and should be treated as unverified.

OpenRouter hosts Qwen3-Coder as the free model ID qwen/qwen3-coder:free (480B-A35B MoE, 1M token context, $0 price). It's an OpenAI-compatible endpoint, so you can plug it into Qwen Code as a BYO key. Free models are capped at 20 requests/minute. You get about 50 requests/day by default, or 1,000/day once you've purchased $10+ in credits at any point (a one-time, never-expiring threshold).

Qwen models are open-weight (Apache 2.0), so you can run them locally with no API key and no usage limits. Install Ollama or build llama.cpp, pull a quantized GGUF, run an OpenAI-compatible server, and point Qwen Code at that local endpoint via /auth. The official docs call local inference the recommended free alternative.

Qwen3-Coder-Next is an 80B MoE model with 3B active params and 256K context. It needs roughly 46GB of RAM/VRAM/unified memory at 4-bit (about 85GB at 8-bit). A 64GB Apple Silicon Mac is cited as a 4-bit sweet spot for fast local agentic coding.

Yes. The open-source Qwen Code CLI is unaffected. Only the free hosted OAuth inference was discontinued. The README now directs users to API keys, the paid Coding Plan, or local inference instead.

Building Something Great?

Our development team builds secure, scalable applications. From APIs to full platforms, we turn your ideas into production-ready software.