NVIDIA AI posted on X that it is shipping Nemotron 3 Ultra, a “550B MoE frontier-intelligence open model” built for long-running agents. The company claims the model delivers “5x faster inference” and can lower the cost of complex agentic tasks by up to “30%” versus other open frontier models.
In the launch thread, NVIDIA AI says Ultra is aimed at workloads such as coding and deep research, where agents spend time planning, using tools, recovering from failures, and deciding what to do next. The company attributes the model’s efficiency to a hybrid Mamba-Transformer MoE architecture, which it says allows more reasoning cycles within the same time budget.
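To make the "more reasoning cycles within the same time budget" claim concrete, the arithmetic can be sketched as below. This is an illustration only: the function name, the 10-minute budget, the 2,000 tokens per cycle and the 50 tokens/s baseline are all hypothetical values chosen for the example, not figures from NVIDIA. Only the 5x multiplier comes from the company's claim, and the sketch ignores tool-call latency, which would shrink the gap in practice.

```python
def reasoning_cycles(time_budget_s: float, tokens_per_cycle: int, tokens_per_s: float) -> int:
    """Count whole plan/act/observe cycles that fit in a fixed time budget.

    Assumes generation speed is the only cost; tool calls and network
    latency are deliberately left out of this simplified model.
    """
    seconds_per_cycle = tokens_per_cycle / tokens_per_s
    return int(time_budget_s // seconds_per_cycle)


# Hypothetical workload: 10-minute budget, 2,000 generated tokens per cycle.
TIME_BUDGET_S = 600
TOKENS_PER_CYCLE = 2000

# Hypothetical baseline speed; 5.0 is the claimed "5x faster inference".
BASELINE_TOKENS_PER_S = 50.0
CLAIMED_SPEEDUP = 5.0

baseline = reasoning_cycles(TIME_BUDGET_S, TOKENS_PER_CYCLE, BASELINE_TOKENS_PER_S)
faster = reasoning_cycles(TIME_BUDGET_S, TOKENS_PER_CYCLE, BASELINE_TOKENS_PER_S * CLAIMED_SPEEDUP)

print(f"baseline cycles: {baseline}, cycles at claimed speedup: {faster}")
```

Under these assumed numbers the cycle count scales linearly with throughput. Real agent runs would see a smaller gain wherever time is dominated by waiting on tools rather than generating tokens.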
NVIDIA also published comparison visuals that place Nemotron 3 Ultra alongside GLM 5.1, Kimi K2.6 and Qwen3.5 across several agentic benchmarks. In the table shown in the post, Nemotron 3 Ultra is listed at 91% on PinchBench for agent productivity, 33% on EnterpriseOps-Gym for long-horizon planning, 54% on Terminal-Bench 2.0 for coding, 82% on IFBench, 1,448 on GDPval-AA, 56% on ProfBench (Search), and 95% on RULER @ 1M for long context. The same table shows some rivals with higher scores on selected rows, so the company’s broader “leading accuracy” claim appears to depend on the benchmark.
The images also show a “Nemotron 3 - Hybrid Mamba Transformer Latent MoE” architecture diagram and a separate agent workflow schematic, suggesting the model is being positioned for tool-heavy, long-running systems rather than simple chat. Another chart compares accuracy and “Relative Throughput (Output tokens/s/GPU),” with visible labels including 5.9, 1.0, 1.2 and 3.7.
NVIDIA AI also says Ultra was post-trained for agent harnesses including openclaw, NousResearch Hermes Agent, and LangChain. The company describes Nemotron 3 Ultra as “fully open,” with model weights, synthetic data and post-training recipes released, and says it is available on Hugging Face.
The launch drew quick responses from other accounts, including Glean, which posted that the model is “coming soon” to its enterprise stack, and Unsloth AI, which said it had uploaded Dynamic GGUF files for local use.
Source: NVIDIA AI on X
