Zhipu, DeepSeek push China’s trillion-parameter AI frontier amid US clampdown

Debby Wang
June 22, 2026 · 13 min read

TL;DR

DeepSeek and Zhipu AI are racing toward trillion-parameter scale using Huawei Ascend clusters and optimized mixture-of-experts architectures — precisely while US export controls have cut off access to Nvidia's H100 and A100 chips. The efficiency gains are documented and largely real. Whether the benchmark numbers being claimed for the next generation of models hold up against independent evaluation remains an open question worth tracking carefully.

Key Takeaways

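  • DeepSeek reportedly trained its V3 model (671 billion total parameters, 37 billion active per token; the released checkpoint is often listed at 685 billion because it also includes the multi-token prediction module weights) for approximately $5.6 million in compute costs, according to DeepSeek's technical report published in December 2024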
  • Zhipu AI, the Beijing lab spun out of Tsinghua University, has been pushing its GLM series toward trillion-parameter territory with state-linked investment backing, though independent benchmark verification of the largest variants remains limited as of mid-2026
  • US Bureau of Industry and Security export controls, tightened repeatedly since October 2022, have blocked Nvidia H100 and A100 shipments to Chinese entities — yet neither DeepSeek nor Zhipu has stopped shipping competitive models
  • DeepSeek-R1, released in January 2025, matched or exceeded GPT-4-class performance on several math and coding benchmarks per the team's self-reported results; third-party testing by Artificial Analysis broadly confirmed competitive performance across reasoning tasks
  • Huawei's Ascend 910B has become the de facto training chip for Chinese frontier labs cut off from Nvidia hardware — its theoretical FLOPS lag the H100, but Chinese labs are extracting usable throughput through aggressive low-level software optimization
  • DeepSeek's mixture-of-experts architecture activates only a fraction of parameters per inference pass, allowing trillion-parameter scale to be approached without proportional compute costs — a design pattern that has since spread to Western labs
  • The geography matters: Beijing (Zhipu, Baidu ERNIE), Hangzhou (DeepSeek, Alibaba Qwen), and Shenzhen (hardware-adjacent AI companies) each produce a meaningfully different kind of lab, optimizing for different things

The Setup: Why Trillion Parameters Under Sanctions Is the Story

Hangzhou is not the city most Western tech people picture when they think about frontier AI. Shenzhen makes hardware. Beijing makes policy. But Hangzhou — home to Alibaba's sprawling campus and a quietly influential cluster of quant-finance alumni — is where DeepSeek built the model that made Western AI executives publicly uncomfortable in early 2025.

The specific detail worth anchoring on: DeepSeek-V3 has 671 billion total parameters but activates only 37 billion per forward pass. That mixture-of-experts design is why the team could credibly claim a training run costing roughly $5.6 million, according to their December 2024 technical report. Compare that to estimates north of $100 million for comparable US frontier training runs, and the discomfort becomes legible.
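To make "activates only 37 billion per forward pass" concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind mixture-of-experts layers. It is a toy illustration under stated assumptions, not DeepSeek's actual router: the eight experts, the random router scores, and the function names `top_k_route` and `moe_forward` are all hypothetical.

```python
import math
import random

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k):
    """Combine the outputs of only the k routed experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy setup (hypothetical): 8 experts, each of which just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
random.seed(0)
logits = [random.gauss(0, 1) for _ in experts]

routes = top_k_route(logits, k=2)
output = moe_forward(3.0, experts, logits, k=2)
active_fraction = 2 / len(experts)  # share of expert parameters touched per token
```

With two of eight experts selected, only a quarter of the expert parameters do any work for a given token. The same idea at production scale is what lets total parameter count grow much faster than per-token compute.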

Then came DeepSeek-R1. The reasoning model, published in January 2025, showed competitive performance against OpenAI's o1 on math and coding benchmarks — as documented in the model's technical paper. The reinforcement-learning methodology was published openly. Western researchers could read it. Several said they hadn't seen it approached that way before.

Zhipu AI is a different kind of story. Where DeepSeek is a quant-finance spinout with a small, intensely focused team, Zhipu is a Tsinghua University spinout that has received significant state-linked investment and operates with the cultural logic of an academic lab that has been told to move faster. Its GLM series has been iterating steadily, with the latest variants reportedly pushing into trillion-parameter territory. Independent benchmarks at that scale are sparse, which is a genuine limitation this piece will not paper over.

Both labs are building under the same constraint: no Nvidia H100s. The US Bureau of Industry and Security's successive rounds of export controls have made that hardware effectively inaccessible to Chinese AI labs. The replacements, primarily Huawei's Ascend 910B, are real chips running real training jobs. They are not equivalent to H100s. They are good enough to keep frontier-scale training moving.

The Evidence: What We Can Verify, and What We Cannot

DeepSeek's efficiency claims hold up — with caveats

The $5.6 million training cost figure for DeepSeek-V3 was widely cited after the technical report dropped. It is worth being precise about what it measures: it covers the compute cost of the final training run, not the full R&D cost including failed experiments, infrastructure amortization, and engineering salaries. That distinction matters when comparing against US figures, which often roll in more overhead. The cost gap is real; the framing that treated $5.6 million as the full cost of building the model is not.
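The arithmetic behind the headline figure is short enough to reproduce. The sketch below uses the GPU-hour breakdown and the $2-per-GPU-hour rental price that the DeepSeek-V3 technical report itself states (the report counts Nvidia H800 GPU-hours). The function name `final_run_cost_usd` and the $3 comparison rate are hypothetical, added only for illustration.

```python
# Thousands of H800 GPU-hours per stage, as itemized in the DeepSeek-V3 technical report.
GPU_HOURS_K = {"pre_training": 2664, "context_extension": 119, "post_training": 5}

def final_run_cost_usd(rate_per_gpu_hour: float) -> float:
    """Rental cost of the final run only.

    Excludes failed experiments, salaries, and hardware amortization.
    """
    total_gpu_hours = sum(GPU_HOURS_K.values()) * 1_000
    return total_gpu_hours * rate_per_gpu_hour

# $2 per GPU-hour is the report's own assumed rental price.
print(f"${final_run_cost_usd(2.0) / 1e6:.3f}M")  # $5.576M

# A hypothetical higher rate, only to show how directly the headline tracks that assumption.
print(f"${final_run_cost_usd(3.0) / 1e6:.3f}M")  # $8.364M
```

The number is a rental-price estimate for one run, which is why it should not be set directly against figures that fold in research overhead.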

On benchmark performance, DeepSeek-R1 results have been corroborated by independent evaluators. Artificial Analysis, which runs standardized tests across frontier models, placed DeepSeek-R1 as competitive with OpenAI's o1 on reasoning-heavy tasks while being significantly cheaper to run via API. That is a verifiable claim based on public methodology.

The trillion-parameter framing is where I'll be direct: DeepSeek-V3, at 671 billion total parameters, is not a trillion-parameter model. The claim in the headline refers to where both labs are reportedly heading, not where they have publicly arrived. DeepSeek has not published a verified trillion-parameter model as of this writing. Treat "trillion-parameter frontier" as a directional description of a competitive race, not a confirmed product spec.

Zhipu's numbers: less public, still significant

Zhipu AI does not publish training cost breakdowns the way DeepSeek does. Its GLM-4 series has been benchmarked on Chinese-language tasks and general reasoning, with results competitive with mid-tier Western models on standard evaluations. The lab has stated publicly that it is working on larger parameter counts; the specific figures are not independently confirmed.

What is confirmed: Zhipu has raised substantial capital from state-linked investors and maintains a direct relationship with Tsinghua's computer science department that provides access to talent pipelines most Western labs cannot replicate. That structural advantage is separate from any benchmark number and arguably more durable.

The chip constraint is real — and being engineered around

Huawei's Ascend 910B is not a drop-in H100 replacement. Memory bandwidth is lower. The software stack — Huawei's CANN compute architecture — requires porting work that Nvidia's CUDA ecosystem does not. Chinese labs have invested heavily in that porting work. DeepSeek, in particular, has published details of low-level optimizations, including a custom multi-token prediction approach, that extract more throughput from constrained hardware.

This is not catching up by copying. The optimization work is technically original. It is also, in part, a direct response to constraint. Some of the efficiency innovations that made Western researchers pay attention to DeepSeek may not have emerged without the chip restriction forcing an engineering problem that Nvidia abundance would have papered over.

A Snapshot: Where the Key Players Stand

| Lab | Base | Parent / Backing | Flagship model | Parameter scale (verified) | Chip status | Open weights? |
|---|---|---|---|---|---|---|
| DeepSeek | Hangzhou | High-Flyer Capital | DeepSeek-V3 / R1 | 671B total (37B active) | Ascend 910B clusters | Yes |
| Zhipu AI | Beijing | Tsinghua / state-linked investors | GLM-4 series | Not publicly confirmed at scale | Ascend + domestic | Partial |
| Alibaba Qwen | Hangzhou | Alibaba Group | Qwen-2.5 series | Up to 72B confirmed; larger unconfirmed | Mixed | Yes (select models) |
| Baidu ERNIE | Beijing | Baidu | ERNIE 4.0 | Unspecified | Mixed | No |
| ByteDance Doubao | Beijing | ByteDance | Doubao / Seed series | Unspecified | Mixed | No |

How these models compare on coding and enterprise productivity tasks is worth reading separately. Those specific benchmarks are where the real differentiation lives, as covered in the China LLM showdown analysis on BestAIFor.

What This Changes for Western Founders and Professionals

The practical frame is this: you may already be running Chinese AI models in your stack without having thought about it carefully.

DeepSeek-R1 and V3 are both available via API and via open weights. Inference providers including Groq, Together AI, and Fireworks AI serve DeepSeek variants. If you are building on top of an open-model layer — through any of those providers, or your own hosted setup — there is a real chance you are already in the DeepSeek ecosystem.

The question "should we use Chinese models?" is increasingly less interesting than "what are the actual risks for this specific use case?" Code generation with DeepSeek on a non-sensitive internal codebase is a categorically different risk profile from routing customer data through an API with infrastructure in China. Treat those as separate decisions.

For competitive intelligence: if you are building an AI product in a vertical Chinese labs are specifically targeting — coding assistants, enterprise document processing, anything touching hardware control or industrial automation — the relevant question is not whether their models match OpenAI's top tier. It is whether they are good enough to undercut your pricing within your existing customer base. On several tasks at current API pricing, they already are.

When NOT to default to Chinese AI models

Don't route regulated data through Chinese-hosted APIs. Healthcare data, financial records, and anything subject to GDPR or HIPAA requires a clear legal opinion on where inference runs and who has access. "The model performs well" is not sufficient diligence.

Don't assume open-weights models are geopolitically neutral. The weights themselves are not subject to US export controls. The company that trained them and the terms attached to commercial use are separate considerations. Read the license; involve your legal team for production deployments.

Don't ignore the software supply chain. Enterprise software vendors — particularly Chinese-made SaaS sold into Western markets — increasingly embed GLM or Qwen models under the hood. If you are evaluating procurement, ask specifically what LLM powers any AI features. You may not get an honest answer; ask anyway.

Checklist: Evaluating Chinese AI Models Before Committing

  • [ ] Run your specific task — not a generic benchmark — through DeepSeek-R1 and at least one Chinese open-weight alternative
  • [ ] Confirm where inference runs: Chinese-hosted API vs. self-hosted open weights vs. Western provider serving the model
  • [ ] Identify whether any data processed is regulated, sensitive, or subject to contractual restrictions on third-party processing
  • [ ] Get a legal opinion on model license terms for your planned commercial use
  • [ ] Ask your vendors directly which LLM powers their AI features — document the answer
  • [ ] Set a re-evaluation interval: the benchmark gap between Chinese and Western models is moving; last year's answer may not be this year's
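For teams that want to track the checklist in code, here is one way to sketch it as a small triage function. It encodes only the rules this article states. The category names, the `Deployment` fields, and the wording of each outstanding item are hypothetical and no substitute for legal advice.

```python
from dataclasses import dataclass

HOSTING = {"chinese_api", "western_provider", "self_hosted"}
DATA_CLASSES = {"non_sensitive", "personal", "regulated"}

@dataclass
class Deployment:
    hosting: str            # where inference runs
    data_class: str         # most sensitive data the model will see
    license_reviewed: bool  # has counsel read the model license?

def open_items(d: Deployment) -> list[str]:
    """List the diligence steps still outstanding for a planned deployment."""
    if d.hosting not in HOSTING or d.data_class not in DATA_CLASSES:
        raise ValueError("unknown hosting or data class")
    items = []
    if d.data_class == "regulated":
        # Regulated verticals need compliance review regardless of hosting.
        items.append("compliance review")
        if d.hosting == "chinese_api":
            items.append("move inference off the Chinese-hosted API")
    elif d.data_class == "personal":
        items.append("data protection analysis for the inference location")
    if not d.license_reviewed:
        items.append("legal opinion on license terms")
    return items

# Internal developer tooling on self-hosted open weights: nothing outstanding.
print(open_items(Deployment("self_hosted", "non_sensitive", True)))  # []
```

An empty list means the stated rules raise no blockers. It does not replace running your own task through the model or setting a re-evaluation date.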

Where This Is Heading

The efficiency advantage compounds for now. The architectural innovations — MoE, multi-token prediction, aggressive quantization — that Chinese labs developed under chip constraints have spread to Western competitors. But the labs that were forced to optimize early have operational expertise running large models cheaply that newer adopters do not. That is not a permanent moat. It is a real advantage for the next 18 to 24 months.

Domestic chip capacity is the long variable. Huawei's Ascend roadmap is not public in the way Nvidia's is. Whether China can build a full-stack domestic semiconductor supply chain for AI training at scale — not just one chip generation, but the equipment, the software, the manufacturing yield — is a decade-long question, not a product cycle question. Do not model it as resolved in either direction.

Open weights change the geopolitical framing. The fact that DeepSeek's models are available as open weights means the US-China binary is only part of the story. A European startup, an Indian enterprise software company, or a Latin American SaaS vendor can use these models without engaging with China at all. The geopolitical dimension matters for governments and large regulated enterprises; it matters less for small teams selecting inference infrastructure on cost-performance grounds.

Zhipu's institutional backing is a different kind of bet. Where DeepSeek is lean and research-focused, Zhipu is building with longer-term institutional support that includes government procurement access. That structure tends to produce reliable funding through downturns. It does not tend to produce the kind of surprise technical result that DeepSeek produced in late 2024. Both matter; they matter differently.

Agentic and multimodal benchmarks are where the next comparison will land. The coding and reasoning benchmarks that defined the LLM horse race in 2024-2025 are a partial picture. Watch for agentic benchmarks — tool use, multi-step planning, real-world task completion — over the next 12 months. That is where the competitive differentiation is likely to shift.

FAQ

Is DeepSeek actually as good as GPT-4?

On specific tasks — math reasoning, code generation in Python, structured data analysis — DeepSeek-R1 has benchmarked as competitive with GPT-4-class models per both self-reported results and third-party evaluators. On tasks requiring strong English-language cultural knowledge, nuanced long-form writing, or complex multimodal reasoning, the picture is more mixed. "As good as GPT-4" is not a single answer because performance varies substantially by task type.

Can Chinese labs actually train frontier models without H100s?

Yes, with real caveats. They can train models that benchmark as frontier on many tasks. Training is slower per compute unit, and the software ecosystem requires more manual optimization. The constraint has not stopped them. It has changed what they built — arguably in ways that produced useful innovations.

What does Zhipu AI actually do that I might encounter?

Zhipu's API (marketed as BigModel) is embedded in Chinese enterprise software, coding tools, and document processing applications. If you are evaluating Chinese-made enterprise software for procurement, there is a reasonable probability that the underlying LLM is Zhipu or one of the Alibaba Qwen variants. Ask the vendor directly.

Is using DeepSeek open weights legally risky for a Western company?

This is a legal question, not a technical one. The model weights are not subject to US export controls. Running them on your own infrastructure to process non-sensitive data is generally considered low-risk per publicly available legal commentary. Routing sensitive personal or regulated data through any third-party API — including US ones — requires its own data protection analysis. Get proper counsel for production deployments.

Why does Beijing vs. Hangzhou vs. Shenzhen matter?

It is not scenery. Beijing labs operate near central government institutions and tend to optimize for regulatory compliance and government procurement contracts. Hangzhou labs come from commercial internet and fintech culture that prizes speed and measurable benchmarks. Shenzhen labs skew toward hardware integration and manufacturing applications. These are tendencies, not rules, but they shape what each lab prioritizes in ways that affect which use cases they are actually good at.

Are the trillion-parameter claims verified?

As of mid-2026, no Chinese lab has published a fully verified, independently benchmarked trillion-parameter model meeting Western academic standards for reproducibility. The directional claim — that multiple Chinese labs are racing toward that scale — is credible based on public statements and funding patterns. Specific parameter counts attached to specific unreleased models should be treated as unverified until independent testing is available.

Should I integrate DeepSeek into my product today?

Depends on the product. For internal developer tooling with no sensitive data: the cost-performance case is strong and the risk is low. For customer-facing applications processing personal data: confirm where inference runs before you deploy. For anything in a regulated vertical — healthcare, finance, legal — get compliance review first. The model performing well on benchmarks is one input to that decision, not the whole decision.

Debby Wang is BestAIFor's China AI Correspondent, covering the tools, startups, and policy shifts coming out of China's AI ecosystem. Based in Shenzhen, she writes for Western founders and professionals who want to understand what's actually happening, without the hype or the panic. Her focus areas include physical AI, robotics, medical applications, AI hardware, and the social and legal impact of automation.