DeepSeek open-sources V4 Preview with 1M context default

DeepSeek has just rolled out DeepSeek-V4 Preview, open-sourcing the model and making 1M context the default across its services. The launch includes V4-Pro and V4-Flash, plus new sparse attention and compression claims, fresh API pricing, and a planned 2026 retirement for older models.

TL;DR

  • DeepSeek-V4 Preview: Officially live, open-sourced; 1M context now default across official DeepSeek services
  • Two variants: DeepSeek-V4-Pro (1.6T total, 49B active) and DeepSeek-V4-Flash (284B total, 13B active)
  • Architecture update: Token-wise compression + DSA (DeepSeek Sparse Attention) for long context with lower compute/memory
  • Agent tooling: Integrated with Claude Code, OpenClaw, OpenCode; used internally for agentic coding
  • API availability: Same base_url; switch model to deepseek-v4-pro or deepseek-v4-flash; supports ChatCompletions, Anthropic APIs, Thinking/Non-Thinking modes
  • Pricing + deprecation: Pro $0.145/$1.74/$3.48; Flash $0.028/$0.14/$0.28 (cache hit/miss/output); deepseek-chat/deepseek-reasoner retire Jul. 24, 2026 15:59 UTC

DeepSeek posted on X that DeepSeek-V4 Preview is “officially live” and open-sourced, with 1M context length now set as the default across its official services.

According to the company’s announcement, the release comes in two variants: DeepSeek-V4-Pro, listed at 1.6T total parameters with 49B active parameters, and DeepSeek-V4-Flash, listed at 284B total parameters with 13B active parameters. DeepSeek claims V4-Pro offers “enhanced agentic capabilities,” “rich world knowledge,” and “world-class reasoning,” while V4-Flash “closely approaches” Pro on reasoning and performs “on par” with it on simple agent tasks. Those assertions remain the company’s own, as no independent evaluation was included in the post.

DeepSeek also points to what it calls a structural update built around “token-wise compression + DSA (DeepSeek Sparse Attention),” which it says enables long context with lower compute and memory costs. The company says the model has been integrated with agent tools including Claude Code, OpenClaw and OpenCode, and adds that it is already being used for in-house agentic coding. DeepSeek also linked a tech report and open weights alongside the launch.
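The post gives no implementation details for DSA, so the following is purely illustrative: a generic top-k sparse attention sketch in NumPy showing the basic idea of attending to a small selected subset of the context rather than all of it, which is where the compute and memory savings come from. None of the names or parameters here come from DeepSeek.

```python
# Illustrative only -- NOT DeepSeek's actual DSA implementation.
# Generic top-k sparse attention: each query attends to only its
# k highest-scoring keys instead of the full sequence.
import numpy as np

def topk_sparse_attention(q, K, V, k=32):
    """One query vector attends to only its k highest-scoring keys."""
    scores = K @ q / np.sqrt(q.shape[0])   # similarity score per key
    keep = np.argsort(scores)[-k:]         # indices of the top-k keys
    w = np.exp(scores[keep] - scores[keep].max())
    w /= w.sum()                           # softmax over kept keys only
    return w @ V[keep]                     # weighted sum of kept values

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
q = rng.standard_normal(d)
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))
out = topk_sparse_attention(q, K, V, k=32)
print(out.shape)  # (64,)
```

With k fixed, per-query cost of the weighted sum scales with k rather than the full sequence length, which is the rough intuition behind pairing sparse attention with very long (1M-token) contexts.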

The API is available now, with DeepSeek saying users can keep the same base_url and simply switch the model name to deepseek-v4-pro or deepseek-v4-flash. The company says both models support the OpenAI ChatCompletions and Anthropic APIs, along with “Thinking” and “Non-Thinking” modes. A pricing table accompanying the announcement lists V4-Pro at $0.145 for input cache hits, $1.74 for input cache misses and $3.48 for output, while V4-Flash is listed at $0.028, $0.14 and $0.28, respectively, with 1M context for both.
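Assuming the listed rates are quoted per million tokens (the post does not state the unit, but that is DeepSeek's usual pricing convention), a back-of-the-envelope cost estimator looks like this; the model names match the announced API identifiers, everything else is illustrative:

```python
# Rough cost sketch based on the announced prices, assuming they are
# quoted per 1M tokens (cache hit / cache miss / output) -- verify
# against DeepSeek's pricing page before relying on these numbers.
PRICES_PER_MTOK = {
    "deepseek-v4-pro":   {"cache_hit": 0.145, "cache_miss": 1.74, "output": 3.48},
    "deepseek-v4-flash": {"cache_hit": 0.028, "cache_miss": 0.14, "output": 0.28},
}

def estimate_cost(model, hit_tokens, miss_tokens, output_tokens):
    """Estimated USD cost of one request, given token counts."""
    p = PRICES_PER_MTOK[model]
    return (hit_tokens * p["cache_hit"]
            + miss_tokens * p["cache_miss"]
            + output_tokens * p["output"]) / 1_000_000

# e.g. a 100k-token uncached prompt with a 2k-token answer on Flash:
print(f"${estimate_cost('deepseek-v4-flash', 0, 100_000, 2_000):.4f}")
```

At these rates the Pro/Flash gap is roughly 5x on cache hits and about 12x on cache misses and output, which is why the cache-hit column matters for repeated long-context prompts.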

DeepSeek also warned that deepseek-chat and deepseek-reasoner will be retired on Jul. 24, 2026, at 15:59 UTC and are currently routing to V4-Flash. In a separate post, the company urged readers to rely only on its official accounts for news.
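Since the retiring names already route to V4-Flash and the base_url stays the same, migration amounts to a model-name swap. A minimal sketch, where the mapping simply mirrors the current routing described in the deprecation notice (it is not a sizing recommendation; reasoning-heavy workloads may prefer deepseek-v4-pro):

```python
# Migration sketch per the deprecation notice: same endpoint, new name.
# https://api.deepseek.com is DeepSeek's documented OpenAI-compatible
# base_url; check current docs before shipping.
BASE_URL = "https://api.deepseek.com"

ROUTING = {  # current routing per the announcement
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-flash",
}

def migrate(model):
    """Return the V4 replacement for a retiring model name."""
    return ROUTING.get(model, model)

request = {  # OpenAI ChatCompletions-style request body
    "model": migrate("deepseek-chat"),  # -> "deepseek-v4-flash"
    "messages": [{"role": "user", "content": "Hello"}],
}
print(request["model"])
```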

Source: DeepSeek on X
