Brendan McKeag

Deep Cogito Releases Suite of LLMs Trained with Iterative Policy Improvement

August 1, 2025

We're excited to announce that our partner Deep Cogito has just released Cogito v2, a groundbreaking collection of hybrid reasoning models that represents a fundamental shift in how we approach AI intelligence improvements. This isn't just another model release: it's a proof of concept for scalable superintelligence.

The Game-Changing Innovation: Efficiency Without Sacrificing Quality in Reasoning

While most recent advances in reasoning models have focused on scaling up thinking tokens, essentially making models "think longer" to solve problems, Cogito v2 takes a radically different approach. Instead of brute-force searching through longer reasoning chains, these models develop better intuition about which reasoning paths to take.

The results speak for themselves: Cogito models achieve performance equivalent to leading models while using reasoning chains that are 60% shorter. This isn't just an efficiency gain; it reflects a fundamental shift in how more intelligent systems get built. The field has focused deeply on making models smarter, but not necessarily faster or more usable. Larger models are great, but compute time becomes a real constraint, especially with dense models, which is a big part of why so many recent large-model releases are strictly Mixture of Experts (MoE). What Deep Cogito has accomplished is a model that not only delivers the answer quality the field has come to expect, but does so faster and in fewer tokens than comparably sized models, which translates into direct cost savings under time-based billing (such as serverless).

What makes this particularly valuable is that you get the same quality of reasoning with less computational cost. This means:

  • Lower inference costs per query
  • Better resource utilization
  • Ability to serve more users with the same hardware
  • More sustainable scaling as demand grows

Four Models, Four Opportunities

The Cogito v2 release includes four models designed to meet different computational needs, all released under an open license:

Small Models:

  • 70B Dense: Compact powerhouse for efficient deployment
  • 109B MoE: Mixture of Experts architecture balancing performance and resource usage

Large Models:

  • 405B Dense: Frontier-level performance in a dense architecture
  • 671B MoE: The flagship model that matches the latest DeepSeek v3 and R1 models

All models can answer directly or apply reasoning before answering.
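
For illustration, here's a minimal sketch of what switching between the two modes can look like with the standard transformers chat template. The model ID is a placeholder, and the "Enable deep thinking subroutine." system prompt follows the convention from Deep Cogito's earlier model cards; check the Hugging Face page for the exact v2 trigger.

```python
# Minimal sketch (assumptions noted): toggling Cogito's reasoning mode via the
# system prompt. The "Enable deep thinking subroutine." trigger follows Deep
# Cogito's earlier model cards; confirm the v2 convention on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepcogito/cogito-v2-preview-llama-70B"  # placeholder; check the repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

def generate(prompt: str, reasoning: bool = False) -> str:
    messages = []
    if reasoning:
        # Reasoning mode: the model emits a thinking trace before its answer.
        messages.append({"role": "system", "content": "Enable deep thinking subroutine."})
    messages.append({"role": "user", "content": prompt})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

print(generate("What is 17 * 24?"))                  # direct answer
print(generate("What is 17 * 24?", reasoning=True))  # reasoning before answering
```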

The Technical Breakthrough: Iterative Policy Improvement

The secret behind Cogito v2's success lies in its approach to iterative policy improvement. Rather than simply scaling inference-time reasoning, the models use a two-step process inspired by successful narrow AI systems like AlphaGo:

  1. Inference-Time Reasoning: The model searches for solutions during inference
  2. Iterative Policy Improvement: The discoveries from that search are distilled back into the model's parameters

This creates a virtuous cycle where each iteration makes the model's base intelligence stronger, rather than just making it search longer. (After all, there are diminishing returns to giving more tokens to the thought process; the model isn't going to find the Unified Field Theory of physics just because you gave it a million-token thinking budget.) Because of this cycle, the model develops better "intuition" about which reasoning trajectories are most promising, leading to more efficient and effective problem-solving.
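
To make the loop concrete, here's a conceptual sketch in Python. This is not Deep Cogito's actual training pipeline; the `generate`, `score`, `shorten`, and `finetune` callables are hypothetical stand-ins for a real sampler, verifier, trace compressor, and distillation step.

```python
# Conceptual sketch of the AlphaGo-style improvement loop described above --
# NOT Deep Cogito's actual code. All callables are hypothetical stand-ins.
from typing import Callable, List, Tuple

def iterative_policy_improvement(
    generate: Callable[[str], List[str]],   # samples several reasoning traces per problem
    score: Callable[[str], float],          # rates a trace (e.g., a verifier's score)
    shorten: Callable[[str], str],          # compresses the winning trace for distillation
    finetune: Callable[[List[Tuple[str, str]]], None],  # folds results back into the weights
    problems: List[str],
    num_iterations: int = 3,
) -> None:
    for _ in range(num_iterations):
        distillation_set: List[Tuple[str, str]] = []
        # Step 1: inference-time reasoning -- let the current model search.
        for problem in problems:
            traces = generate(problem)
            best = max(traces, key=score)  # keep the most promising trajectory
            distillation_set.append((problem, shorten(best)))
        # Step 2: policy improvement -- distill the search results into the
        # parameters, so the next iteration starts with stronger "intuition".
        finetune(distillation_set)
```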

To put this to the test, we pitted Cogito 405B against Llama 3 405B on some very long-context creative writing tasks with the exact same setup (8x H200, served on vLLM with identical configurations), and Deep Cogito's model demonstrated significant inference speed improvements, all else being equal. As noted above, under a serverless architecture this translates into a direct, proportional cost savings over an equally sized dense model. (A sketch of a simple timing harness follows the table.)

Context Length    DeepCogito    Llama 3    Improvement
32K tokens        15.7 s        20.7 s     24% faster
64K tokens        23.2 s        29.8 s     22% faster
112K tokens       38.9 s        47.7 s     18% faster
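
If you'd like to run a similar measurement yourself, here's a minimal sketch using vLLM's offline Python API. The model ID, parallelism, context length, and filler prompt are illustrative; benchmark each model in a separate process with an identical configuration.

```python
# Minimal long-context latency sketch with vLLM's offline API. Swap MODEL_ID
# for the baseline model and rerun with the exact same settings to compare.
import time
from vllm import LLM, SamplingParams

MODEL_ID = "deepcogito/cogito-v2-preview-llama-405B"  # placeholder; check the repo name

llm = LLM(model=MODEL_ID, tensor_parallel_size=8, max_model_len=131072)
params = SamplingParams(max_tokens=1024, temperature=0.7)

long_prompt = "lorem ipsum " * 16_000  # crude filler, roughly 32K tokens

start = time.perf_counter()
llm.generate([long_prompt], params)
print(f"~32K-token prompt: {time.perf_counter() - start:.1f} s")
```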

How to Get Started on Runpod With Deep Cogito

Getting Deep Cogito models up and running on Runpod is straightforward since they're built on the standard transformers architecture, so your existing inference engines and deployment workflows will work seamlessly with these new models. We have several templates and package deployment options such as vLLM, SGLang, and text-generation-webui; all you need to do is plug in the model you want from the Deep Cogito Hugging Face page and you're good to go.
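
Once a pod is up, querying it works like any other OpenAI-compatible endpoint. Here's a minimal sketch assuming a vLLM pod exposing port 8000 through Runpod's proxy; the URL and model ID are placeholders for your own deployment.

```python
# Minimal sketch: querying a Cogito model served by vLLM on a Runpod pod
# through the OpenAI-compatible API. URL and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-pod-id>-8000.proxy.runpod.net/v1",  # your pod's proxy URL
    api_key="EMPTY",  # vLLM's server doesn't require a key unless you set one
)

response = client.chat.completions.create(
    model="deepcogito/cogito-v2-preview-llama-70B",  # the model you deployed
    messages=[
        # Optional reasoning toggle, per Deep Cogito's model card convention:
        {"role": "system", "content": "Enable deep thinking subroutine."},
        {"role": "user", "content": "Summarize the plot of Moby-Dick in three sentences."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```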

Resource Recommendations

Here are the minimum (up to 8K context) and recommended (longer context) GPU specs for the suite of models; a quick back-of-envelope sanity check follows the lists below.

For 70B Dense model:

  • Minimum: 4x A100 (80GB VRAM) = 320GB total
  • Recommended: 4x H100 (80GB VRAM)

For 109B MoE model:

  • Minimum: 4x A100 (80GB VRAM) = 320GB total
  • Recommended: 6x A100 or 4x H100

For 405B Dense model:

  • Minimum: 12x A100 (80GB VRAM) = 960GB total
  • Recommended: 16x H100

For 671B MoE model:

  • Minimum: 16x A100 (80GB VRAM) = 1.28TB total
  • Recommended: 20x+ H100 for best throughput
  • Requires our largest multi-GPU configurations (Instant Clusters)
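
As a rough sanity check on these minimums: bf16 weights take about 2 bytes per parameter (the DeepSeek-based 671B MoE is commonly served in FP8 at roughly 1 byte per parameter), and KV cache plus activations add overhead on top, which is why the figures above sit over the weight footprint alone. A quick back-of-envelope sketch:

```python
# Back-of-envelope VRAM check for the minimums above. Bytes-per-parameter
# values are assumptions (bf16 for dense, FP8 for the DeepSeek-based MoE);
# KV cache and activations add real overhead beyond these floors.
GPU_VRAM_GB = 80  # A100/H100 80GB

models = [
    ("70B Dense",   70, 2),  # (name, billions of params, bytes per param)
    ("109B MoE",   109, 2),
    ("405B Dense", 405, 2),
    ("671B MoE",   671, 1),  # assuming FP8 serving
]
for name, params_b, bytes_per_param in models:
    weights_gb = params_b * bytes_per_param
    min_gpus = -(-weights_gb // GPU_VRAM_GB)  # ceiling division, weights only
    print(f"{name}: ~{weights_gb} GB of weights -> at least {min_gpus}x 80GB GPUs, before KV cache")
```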

Looking Ahead

Cogito v2 represents more than just another model release—it's a proof of concept that scalable self-improvement in AI systems is not just possible, but practical. By focusing on improving model intelligence rather than just scaling search, this approach could pave the way for the next generation of AI systems.

As these models become available on Runpod's platform, we're excited to see what the community will build with them. The combination of frontier performance, efficient reasoning, and open accessibility creates unprecedented opportunities for innovation.

The path to superintelligence may be closer than we think, and it might be more elegant than we imagined—sometimes the best solution isn't to think longer, but to think better.

Ready to experience DeepCogito's breakthrough reasoning capabilities? Try out Cogito v2 through our public endpoints and see the future of efficient AI in action. Visit our template marketplace to get started in minutes.
