DeepSeek V3.1 first launched on August 19, 2025 as an incremental upgrade of DeepSeek V3, and the Terminus refresh now tightens multilingual fidelity and agent reliability while keeping the same Mixture-of-Experts backbone. This guide contrasts the three checkpoints that matter to builders: DeepSeek-V3.1-Base, DeepSeek-V3.1, and DeepSeek-V3.1-Terminus.

Version Landscape

Version	Positioning	Key capabilities
DeepSeek-V3.1-Base	Foundation checkpoint for custom pretraining or domain adaptation	671B total parameters with 37B activated per token, 128K context window, released under MIT for downstream tuning.
DeepSeek-V3.1	Instruction-tuned chat model with hybrid thinking and non-thinking modes	Adds chat templates, optimized tool calling, and higher reasoning efficiency versus DeepSeek-V3 while retaining the base architecture.
DeepSeek-V3.1-Terminus	Reliability-focused patch on top of V3.1	Addresses language mixing, improves code and search agents, and raises agent benchmarks without changing the core structure.

Architecture and Training Stack

All three checkpoints share the DeepSeek MoE design with 671B expert parameters and 37B active per token, backed by a 128K token context window. V3.1 builds on the Base checkpoint by extending long-context training in two phases: the 32K stage scales to 630B tokens and the 128K stage to 209B tokens, while adopting UE8M0 FP8 microscaling for weights and activations.

DeepSeek reports that the V3.1 upgrade also expands the overall corpus to 14.8 trillion tokens and integrates the thinking pipeline directly into the main model so users no longer have to switch to a separate reasoning release.

Chat Templates and Tooling

DeepSeek-V3.1 introduces a unified chat template that can toggle between non-thinking and thinking prefixes, preserving the new </think> token in multi-turn contexts. Tool calling, code-agent, and search-agent formats are defined within the repository assets, enabling the same base weights to power structured agents. Terminus keeps these templates intact, so any tooling built for V3.1 remains compatible.

Benchmark Highlights

Terminus records incremental gains across reasoning and agent tasks compared with the August V3.1 build: MMLU-Pro edges up from 84.8 to 85.0, SWE Verified rises from 66.0 to 68.4, and SWE-bench Multilingual climbs from 54.5 to 57.8. BrowseComp improves from 30.0 to 38.5, while Terminal-bench moves from 31.3 to 36.7. These shifts reflect the decoding and agent template adjustments shipped in the Terminus update.

The earlier V3.1 release already expanded performance relative to DeepSeek V3, including stronger tool use, higher math pass rates, and improved code generation—maintaining parity with DeepSeek-R1-0528 in thinking mode while responding faster.

Language Reliability and Known Issues

Terminus specifically targets language consistency, reducing mixed Chinese–English outputs and abnormal characters, and refines the Code Agent and Search Agent templates shipped with the model. DeepSeek also flags a known issue in this checkpoint: the self_attn.o_proj parameters currently deviate from the UE8M0 FP8 scale and will be corrected in a future release.

Pricing and Access

DeepSeek’s public API exposes the V3.1 family with tiered token pricing—$0.27 per million input tokens on cache miss ($0.07 when cached) and $1.10 per million output tokens during peak hours, with half-price discounts during off-peak windows. Because DeepSeek’s app, web, and API endpoints already run on Terminus, updating your workloads mainly involves validating prompts rather than changing endpoints.

For self-hosting, MIT-licensed checkpoints are available on Hugging Face in BF16, FP8 (E4M3), and FP32 formats, covering Base, V3.1, and Terminus. ModelScope mirrors support mainland download needs, and the shared architecture allows you to fine-tune a Base model and hot-swap Terminus once stability requirements are met.

Adoption Checklist

Decide whether you need raw MoE control (choose Base), instruction-following out of the box (choose V3.1), or improved multilingual and agent stability (choose Terminus).
Re-run evaluation suites—especially SWE-bench Multilingual and BrowseComp—to confirm the Terminus decoding changes benefit your workloads.
If you rely on custom FP8 kernels, account for the self_attn.o_proj format fix scheduled for a future patch.
Update API budgeting models to reflect DeepSeek’s time-of-day pricing and Terminus’ slightly higher agent success rates.

By understanding how the three checkpoints differ in alignment, tooling, and benchmarks, teams can decide whether to build on the foundation, stick with the August instruction-tuned release, or adopt the Terminus refresh for production agents.

DeepSeek V3.1 Terminus Comprehensive Analysis

Version Landscape

Architecture and Training Stack

Chat Templates and Tooling

Benchmark Highlights

Language Reliability and Known Issues

Pricing and Access

Adoption Checklist

Popular Kimi K2 paths

Kimi K3

Kimi K2.7 Code

Kimi Code

Kimi K3 Status

Related Articles