GLM Models: The Open-Source Alternative from Zhipu AI
GLM-4 and CodeGeeX aren't widely discussed in Western developer circles. They should be. Open weights, strong multilingual coding, and a dedicated IDE extension that's genuinely good.
A colleague was building a bilingual internal tool — user-facing content in English, admin interface in Chinese, codebase with comments in both. He’d tried GPT-4 and Sonnet. Both produced workable code but required constant correction on the Chinese-facing parts. Then he tried CodeGeeX. The problem mostly went away.
GLM and CodeGeeX from Zhipu AI are the open-weight models most Western developers have never heard of. That’s a gap worth closing, especially if you have specific reasons to care about open weights, data sovereignty, or bilingual codebases.
What Zhipu AI is
Zhipu AI is a Chinese AI company founded in 2019, spun out of Tsinghua University’s KEG Lab — the same group that developed the original GLM (General Language Model) research. They’ve been building and publishing models since 2021, well before most Western open-weight labs entered the field.
Their models are genuinely open. Apache 2.0 licensing on the major versions, available on Hugging Face, self-hostable.
The models relevant to developers
GLM-4
General-purpose model, competitive with GPT-4 on most benchmarks. Strong at code, writing, and reasoning. Bilingual from the ground up — not “supports Chinese” as a secondary language, but trained with equivalent priority across Chinese and English.
Available via the Zhipu API and as open weights. The API is significantly cheaper than OpenAI/Anthropic equivalents; the open weights run on consumer hardware at the smaller quantised sizes.
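If you want to try the hosted side, here is a minimal sketch of an API call. It assumes Zhipu's OpenAI-compatible endpoint; the base URL and model names are assumptions to verify against Zhipu's current docs before you rely on them.

# Minimal sketch: calling GLM-4 through Zhipu's OpenAI-compatible API.
# Base URL and model names are assumptions; check Zhipu's current docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZHIPU_API_KEY",                      # issued at open.bigmodel.cn
    base_url="https://open.bigmodel.cn/api/paas/v4/",  # assumed OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="glm-4",  # the tier names below (e.g. "glm-4-flash") drop in here
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
print(resp.choices[0].message.content)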
GLM-4-Flash
Fast, cheap tier. Comparable to Haiku or Gemini Flash in the capability/speed tradeoff. Good for the same use cases: mechanical pipeline steps, simple explanations, boilerplate generation. Very fast response times.
GLM-4-Long
Extended context variant. Handles very long documents — up to 1M tokens in some configurations. Useful for tasks like “summarise this entire codebase” or “audit this monorepo for pattern violations.”
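As a sketch of that workflow: concatenate a repo's source files and hand the whole thing to the long-context model in one request. The "glm-4-long" model name and endpoint are assumptions; confirm them against Zhipu's docs.

# Sketch: "summarise this entire codebase" with the long-context tier.
# Model name and endpoint are assumptions; confirm against Zhipu's docs.
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key="YOUR_ZHIPU_API_KEY",
                base_url="https://open.bigmodel.cn/api/paas/v4/")

# Concatenate every Python file in the current repo into one prompt.
corpus = "\n\n".join(f"# file: {p}\n{p.read_text(errors='ignore')}"
                     for p in Path(".").rglob("*.py"))

resp = client.chat.completions.create(
    model="glm-4-long",
    messages=[{"role": "user",
               "content": "Summarise the architecture of this codebase:\n\n" + corpus}],
)
print(resp.choices[0].message.content)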
CodeGeeX4
The dedicated coding model. Open source under Apache 2.0. Trained specifically for:
- Code completion (including fill-in-the-middle)
- Code explanation
- Code generation from natural language
- Bug detection and fixing
- Unit test generation
Supports 80+ programming languages. 9B parameters. Strong on mainstream languages; bilingual code comments work well. This is the model powering the CodeGeeX IDE extension.
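If you'd rather load the weights yourself than go through the extension or API, a minimal sketch with Hugging Face transformers follows. It assumes the THUDM/codegeex4-all-9b checkpoint and roughly 18GB of VRAM at bf16; treat the repo ID and generation settings as starting points rather than gospel.

# Sketch: running CodeGeeX4 locally via Hugging Face transformers.
# Assumes the THUDM/codegeex4-all-9b checkpoint; ~18GB VRAM at bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/codegeex4-all-9b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Chat-style prompt; the tokenizer's template inserts the special tokens.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a unit test for a function that parses ISO-8601 dates."}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))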
CodeGeeX IDE extensions
Zhipu ships official extensions for VS Code and JetBrains. Free. No API key required to get started — you can authenticate with a Zhipu account and use their hosted API tier.
What the extension does:
- Inline code completion (like Copilot, but powered by CodeGeeX)
- Chat panel for asking questions about your code
- Explain selected code
- Generate tests from selected functions
- Translate code between languages
The quality on code completion is genuinely competitive. It’s not at Claude Code or Copilot level for complex multi-file reasoning, but for autocomplete on a single function it’s solid — and free at the basic usage tier.
If you want to evaluate it with zero commitment: install the VS Code extension, create a Zhipu account, and use it for a week on a real project. You’ll have a real-world quality read in a few days.
Self-hosting
This is where GLM and CodeGeeX have a structural advantage over closed models.
CodeGeeX4 at 9B parameters runs on:
- An RTX 3090 (24GB VRAM) at full precision (FP16, roughly 18GB of weights)
- An M2 Pro MacBook Pro with 16GB unified memory at Q4 quantisation
- An RTX 3080 (10GB) at Q4 quantisation
Quantised versions (Q4_K_M and similar) via Ollama are straightforward:
# Pull via Ollama (community-maintained tags)
ollama pull codegeex4
# Or use llama.cpp directly with a GGUF file
./llama-cli -m codegeex4-Q4_K_M.gguf -p "Write a Python function that..."

For GLM-4 (larger, more capable):
- 9B quantised: M2 Pro / RTX 3080 territory
- Full precision: needs 40GB+ VRAM
When you self-host, your code never leaves your machine. No API traffic, no data residency concerns, no per-token cost. For a team with a capable on-prem GPU, this is a strong option.
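Once the model is pulled, anything on the machine can query it over Ollama's local HTTP API, which listens on localhost:11434 by default. A minimal sketch, using only the standard library:

# Sketch: querying self-hosted CodeGeeX4 through Ollama's local HTTP API.
# No API traffic leaves the machine.
import json
import urllib.request

payload = {
    "model": "codegeex4",  # tag from the `ollama pull` step above
    "prompt": "Write a Python function that deduplicates a list while preserving order.",
    "stream": False,       # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])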
Where it shines
Chinese-English bilingual codebases and teams. This is where GLM genuinely outperforms Western models. Comments in mixed Chinese and English, error messages in Chinese, documentation for a Chinese-speaking user base — all handled naturally without the constant correction loop you get from models where Chinese is a secondary capability.
Open-weight deployments for data sovereignty. If your client contract requires all processing to happen on your own infrastructure, the open-weight GLM models are one of the strongest options available. You get GPT-4 class capability without routing anything to a third-party API.
Code completion tasks. CodeGeeX’s dedicated training shows in autocomplete quality. It’s not just a general model applied to code — it’s specifically trained on the autocomplete task, and the fill-in-the-middle performance reflects that.
Cost at API scale. The Zhipu API is priced significantly below OpenAI and Anthropic. For high-volume batch tasks (similar to the DeepSeek case), the economics are meaningful.
Where it’s weaker
Complex English instruction following. For nuanced, multi-part instructions written in English — the kind you’d put in a detailed CLAUDE.md — GLM-4 doesn’t follow them as reliably as Sonnet. It’s not bad, but it’s not as well-calibrated to the kind of structured instruction files that Anthropic’s model training has optimised for.
Architectural reasoning in English. Ask GLM-4 to design a system architecture and you’ll get a competent answer. Ask it to explain why one approach is better than another given a specific set of constraints — Sonnet and Opus still have the edge on that kind of judgment-heavy English-language reasoning.
Long-context coherence. GLM-4-Long addresses this directly, but the standard GLM-4 model’s coherence degrades on very long sessions in a way that’s more pronounced than Claude’s.
Community tooling. The ecosystem of guides, integrations, and third-party wrappers is smaller. If you hit a problem with CodeGeeX or GLM, you’re more likely to be debugging it yourself rather than finding a Stack Overflow answer.
Practical entry point
Start with the CodeGeeX VS Code extension. It’s free, requires minimal setup, and gives you a direct quality comparison against whatever completion tool you’re currently using. Use it for a week on a real project — not a toy — and you’ll have actual data on whether it fits your workflow.
If it does, the natural next step is either:
- The Zhipu API for programmatic access to GLM-4 at scale
- Self-hosting CodeGeeX4 via Ollama if data residency matters
Coming next
Post 5 covers local models more broadly — Qwen2.5-Coder, CodeLlama, Codestral, and the tooling (Ollama, LM Studio, llama.cpp) for running them. When the right call is that your code never leaves your machine at all.
Choosing Your AI Stack
A six-part series on picking models, pricing models, and building workflows that aren't locked to any single provider.
- 01 Token vs Subscription: Which AI Pricing Model Is Right for You
- 02 Claude Models Explained: Opus, Sonnet, Haiku for Coding
- 03 DeepSeek for Developers: Capable, Cheap, and Worth Knowing
- 04 GLM Models: The Open-Source Alternative from Zhipu AI
- 05 Local Models: When You Run the Weights Yourself
- 06 Building Model-Agnostic Workflows
What did you take away?
Thoughts, pushback, or a story of your own? Drop a reply below — I read every one.