📹 Watch the Complete Video Tutorial
📺 Title: NEW Kimi K2 Thinking – Best Open Model?
⏱️ Duration: 11:22 (682 seconds)
👤 Channel: Sam Witteveen
🎯 Topic: New Kimi Thinking
💡 This comprehensive article is based on the tutorial above. Watch the video for visual demonstrations and detailed explanations.
Two years ago, skepticism surrounded Chinese large language models (LLMs), with many doubting they could rival the dominant models from San Francisco giants like OpenAI, Anthropic, and Google. Fast forward to today, and the landscape has shifted dramatically. Enter Kimi K2 Thinking, the latest evolution from Moonshot AI, which not only surpasses open-weight rivals like the Llama series but also competes head-to-head with, and often beats, top proprietary models from Anthropic, OpenAI, and Google.
This guide covers every aspect of the Kimi K2 Thinking model as presented in a recent expert analysis: its architecture and agentic capabilities, real-world demonstrations, performance benchmarks, pricing, and deployment options, drawn from the original transcript to give you a full, actionable understanding of why this model is redefining the frontier of AI intelligence.
From Skepticism to Leadership: The Rise of Kimi Over Two Years
When Chinese LLMs were first covered on the channel two years ago, critics dismissed them as inferior to Western counterparts like the Llama series or models from OpenAI. But Moonshot AI's Kimi has defied expectations. The original Kimi K2 model launched in July of this year, followed by an updated version in September, culminating in the release of Kimi K2 Thinking, a model engineered for advanced reasoning and tool integration.
This progression reflects not just iterative improvement, but a massive investment in compute and training methodology—specifically, moving beyond basic instruction tuning to sophisticated interleaved chain-of-thought reasoning with tool calls.
What Makes Kimi K2 Thinking a “Thinking” Model?
Kimi K2 Thinking isn’t just another fine-tuned LLM. It’s explicitly trained to perform extended, interleaved chain-of-thought reasoning—a process where the model alternates between reasoning steps and external tool usage (like search or code execution) to solve complex problems.
Interleaved Chain-of-Thought Explained
In traditional models, reasoning happens in a linear block before a final answer is generated. In contrast, Kimi K2 Thinking dynamically interleaves reasoning with tool calls. For example, when solving a math problem, it may:
- Reason about the problem structure
- Call a search tool for relevant data
- Analyze the search results
- Make additional search calls if needed
- Execute Python code in a sandbox to compute the answer
- Refine its reasoning based on code output
The transcript notes one math example used 23 different interleaved reasoning and tool call steps to arrive at the correct solution—demonstrating the model’s capacity for deep, adaptive problem decomposition.
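The interleaved loop described above can be sketched in a few lines. This is a minimal illustration, not Moonshot's implementation: the tool names (`search`, `run_python`), the `model_step` callback, and the dispatch logic are all assumptions made for the example.

```python
def search(query):
    """Hypothetical search tool: returns a list of result snippets."""
    return [f"result for: {query}"]

def run_python(code):
    """Hypothetical sandbox: evaluates a Python expression."""
    return eval(code)  # a real sandbox would isolate this, not eval() it

TOOLS = {"search": search, "run_python": run_python}

def solve(problem, model_step, max_steps=30):
    """Alternate model reasoning steps with tool calls until an answer emerges."""
    transcript = [("problem", problem)]
    for _ in range(max_steps):
        action = model_step(transcript)  # model decides: reason, call a tool, or answer
        if action["type"] == "answer":
            return action["content"]
        if action["type"] == "tool_call":
            result = TOOLS[action["name"]](action["input"])
            transcript.append(("tool_result", result))
        else:  # a pure reasoning step, fed back into the context
            transcript.append(("thought", action["content"]))
    return None  # step budget exhausted
```

The 23-step math example from the transcript corresponds to 23 iterations of a loop like this, with the model choosing at each turn whether to think further, call a tool, or commit to an answer.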
Test-Time Scaling: The Secret Behind Kimi’s Performance
A key innovation in Kimi K2 Thinking is its use of test-time scaling—a technique that enhances performance during inference rather than just during training. This scaling is applied not only to the length of reasoning chains but also to the number and complexity of tool calls.
This allows the model to handle long-horizon tasks that require sustained planning, adaptive decision-making, and multiple external interactions—something most models, even proprietary ones, struggle with.
Real-World Benchmark Dominance
Kimi K2 Thinking has demonstrated superior performance on major benchmarks, including Humanity’s Last Exam (text version), where it outperforms both OpenAI and Anthropic models. This isn’t just theoretical—Moonshot AI showcases these results directly on their platform, reinforcing the model’s competitive edge.
Agentic Capabilities: Beyond Simple Text Generation
Moonshot AI’s platform, moonshot.ai, is built to support the full agentic potential of Kimi K2 Thinking. Users can:
- Run the model in “thinking” or “non-thinking” modes
- Integrate external tools directly into the inference pipeline
- Connect MCP (Model Context Protocol) servers on the backend for custom tool orchestration
- Access built-in features like Deep Researcher and Computer Use
- Use specialized modes for data visualization, slide creation, and more
This shift—from static LLM responses to dynamic, tool-augmented agents—marks a fundamental evolution in how foundation models are used.
Tool Orchestration at Scale: 200–300 Sequential Tool Calls
One of the most impressive capabilities highlighted is Kimi K2 Thinking's ability to execute 200 to 300 sequential tool calls driven by long-horizon planning and adaptive reasoning. This isn't just about making many calls; it's about intelligently decomposing ambiguous, open-ended problems into clear, actionable subtasks and executing them cohesively.
Live Demonstration: Building a Kimi Release Timeline Website
To showcase the model’s agentic power, the speaker gave Kimi a simple but open-ended prompt: “Find out all the dates of the Kimi releases from Moonshot and create a website using the Kimi computer.”
Step-by-Step Breakdown of the Demonstration
- Planning Phase: Kimi generated a to-do list outlining the steps needed.
- Search Execution: It performed multiple web searches to gather release dates.
- Data Aggregation: Search results were fed into a Python script to structure the data.
- Website Generation: Using the Kimi computer (a built-in code sandbox), it created HTML, CSS, and even generated images.
- Deployment: The final website was rendered and displayed in real time.
The resulting timeline included:
| Milestone | Details |
|---|---|
| Moonshot AI Founded | Approximately 2.5 years ago |
| Initial Kimi Chatbot | First public release |
| Kimi 1.5 | Released around the same time as DeepSeek R1; potentially under-recognized |
| Kimi K2 Launch | Major architectural upgrade |
| Kimi K2 Updated | Refined version released in September |
| Kimi K2 Thinking | Latest release (as of the video); notably missing from the auto-generated timeline—likely due to recency |
The entire process took only minutes, demonstrating the model’s ability to autonomously execute multi-step, real-world tasks.
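The plan-then-execute flow of the demonstration can be sketched as a tiny pipeline. Everything here is illustrative: the to-do items, the hard-coded release dates, and the helper functions are stand-ins for what Kimi's planner, search tools, and code sandbox do internally.

```python
def plan(prompt):
    """Stand-in for the planning phase: Kimi first emits a to-do list."""
    return ["search release dates", "structure data", "generate site"]

def execute(todo, state):
    """Each step reads and extends a shared state, like the demo's pipeline."""
    if todo == "search release dates":
        # placeholder results; the real agent gathered these via web search
        state["dates"] = {"Kimi K2": "July", "Kimi K2 update": "September"}
    elif todo == "structure data":
        state["rows"] = sorted(state["dates"].items())
    elif todo == "generate site":
        items = "".join(f"<li>{name}: {date}</li>" for name, date in state["rows"])
        state["html"] = f"<ul>{items}</ul>"
    return state

def run(prompt):
    state = {}
    for todo in plan(prompt):
        state = execute(todo, state)
    return state["html"]
```

The real system interleaves these phases adaptively (re-searching when results are thin, for example) rather than running a fixed sequence, but the plan/execute/aggregate shape is the same.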
Performance on Complex Problems: The Math Olympiad Test
In another test, Kimi K2 Thinking was given an International Math Olympiad question. After over 20 minutes of continuous reasoning—and requiring two manual “continue” prompts—it was still working toward a solution.
This mirrors the behavior of Google’s Gemini Deep Think model, which also required ~20 minutes to solve a similar problem. While speed remains a limitation, the model’s persistence and depth of analysis are remarkable for an open model.
Model Architecture and Technical Specifications
According to the Hugging Face model card, Kimi K2 Thinking is:
- An open model—freely downloadable and self-hostable
- A trillion-parameter Mixture of Experts (MoE) architecture
- With only 32 billion active parameters per token
- Following the now-common total-to-active parameter ratio of roughly 20:1 to 30:1 (here about 31:1)
- Trained with quantization-aware training (QAT), enabling high-quality 4-bit inference
This combination allows for powerful performance even on consumer-grade hardware when quantized, without significant quality loss.
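A back-of-envelope calculation shows why the MoE-plus-QAT combination matters. These figures are rough: they ignore embeddings, KV cache, and quantization overhead, and use the headline parameter counts from the model card.

```python
total_params = 1_000_000_000_000   # ~1 trillion total parameters
active_params = 32_000_000_000     # ~32 billion active per token
bits_per_weight = 4                # QAT enables 4-bit inference

ratio = total_params / active_params
weights_gb = total_params * bits_per_weight / 8 / 1e9   # full weight storage
active_gb = active_params * bits_per_weight / 8 / 1e9   # weights touched per token

print(f"total-to-active ratio: ~{ratio:.0f}:1")        # ~31:1
print(f"4-bit weight storage:  ~{weights_gb:.0f} GB")  # ~500 GB
print(f"active weights/token:  ~{active_gb:.0f} GB")   # ~16 GB
```

The sparsity is what makes the trillion-parameter scale tractable: each forward pass only reads about 16 GB of 4-bit weights, even though the full model occupies roughly 500 GB on disk.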
Breaking the Creative Ceiling: RLVR Models and Creative Writing
Historically, models trained via Reinforcement Learning with Verifiable Rewards (RLVR) excelled at math and coding but underperformed in creative domains like fiction and narrative writing, areas where proprietary models maintained an edge.
Kimi K2 Thinking, however, is explicitly noted for its strength in creative writing and fiction, suggesting Moonshot AI has overcome this limitation. This positions it as a rare model that excels across both analytical and creative tasks.
Pricing and Access Options
Kimi K2 Thinking is available through multiple channels, with transparent pricing:
| Access Method | Input Price | Output Price | Notes |
|---|---|---|---|
| Moonshot AI (Standard) | $0.60 per million tokens | $2.50 per million tokens | Official API |
| Moonshot AI (Turbo) | $1.15 per million tokens | $8.00 per million tokens | Faster but more expensive |
| Open Router | Varies | Varies | Third-party aggregator; check provider quality |
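A quick calculator makes the table concrete. The prices are the Moonshot figures quoted above; the token counts in the example are made-up workload numbers for illustration.

```python
PRICES = {  # tier -> (input $/M tokens, output $/M tokens), per the table above
    "standard": (0.60, 2.50),
    "turbo": (1.15, 8.00),
}

def cost(tier, input_tokens, output_tokens):
    """Total dollar cost of a request at the given tier's per-million prices."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a long agentic session: 2M input tokens, 500k output tokens
print(cost("standard", 2_000_000, 500_000))  # 2 * 0.60 + 0.5 * 2.50 = $2.45
print(cost("turbo", 2_000_000, 500_000))     # 2 * 1.15 + 0.5 * 8.00 = $6.30
```

Even a multi-million-token agentic session costs only a few dollars on the standard tier, which is the economic point the next sections build on.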
The Kimi K2 Vendor Verifier: Ensuring Model Integrity
Moonshot AI introduced a novel quality control mechanism: the Kimi K2 Vendor Verifier. This system benchmarks third-party providers against Moonshot’s official implementation to grade model fidelity.
The first verifier report revealed that many providers were delivering subpar performance—likely due to excessive quantization or misconfigured inference setups. A second report was released just weeks before the video, reinforcing the importance of sourcing the model from trusted providers.
Why Token Cost Is Becoming Irrelevant
At just $2.50 per million output tokens for frontier-level intelligence, the cost of using Kimi K2 Thinking is remarkably low. This economic efficiency shifts the focus from token counting to what you can build—specifically, long-horizon autonomous agents that can execute complex, multi-step workflows previously impossible with earlier models.
API Compatibility: Anthropic-Style Endpoints
Kimi supports the Anthropic-style Claude API format, allowing developers to plug existing Claude-compatible code directly into Kimi with minimal changes. This compatibility lowers the barrier to adoption and enables rapid experimentation with agentic workflows that require code sandbox integration and tool orchestration.
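To see what that compatibility means in practice, here is a sketch of an Anthropic-style Messages request body. The model name (`kimi-k2-thinking`) and any Moonshot base URL are assumptions; check Moonshot's API documentation for the real values. With the official `anthropic` Python SDK, switching an existing Claude integration over is typically just a matter of pointing `base_url` and `api_key` at the new provider.

```python
import json

def build_messages_request(prompt, model="kimi-k2-thinking", max_tokens=1024):
    """Construct the JSON body for an Anthropic-style POST to /v1/messages.
    The model name here is a hypothetical placeholder."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt},
        ],
    }

body = build_messages_request("Plan and execute a 3-step research task.")
print(json.dumps(body, indent=2))
```

Because the request and response shapes match the Claude API, agent frameworks and tool-orchestration code written against Anthropic's format should work with only configuration changes.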
Speed Limitations and Future Expectations
While Kimi K2 Thinking excels in depth and reasoning, it is not the fastest model available. The speaker draws a parallel to the anticipated GLM-4.6 Air model—a faster, optimized variant—and suggests a “Kimi K2 Air” or speed-optimized version may be on the horizon.
Use Cases Enabled by Kimi K2 Thinking
The model unlocks a new class of applications, including:
- Autonomous research agents that gather, verify, and synthesize information
- Self-deploying web applications generated from natural language prompts
- Long-horizon planning systems for business strategy or scientific exploration
- Creative co-pilots that generate fiction, scripts, or marketing content with deep contextual understanding
- Math and coding tutors capable of solving Olympiad-level problems through iterative reasoning
How to Try Kimi K2 Thinking Yourself
You have two primary options to experiment with the model:
- Moonshot AI’s official platform (moonshot.ai) – direct access with full tool support
- Open Router – a third-party API aggregator that routes requests to Moonshot
For developers, the Anthropic-compatible API makes integration straightforward. For researchers and hobbyists, the open weights on Hugging Face allow local deployment and fine-tuning.
The Bigger Picture: The Dawn of Agentic AI
Kimi K2 Thinking represents more than a model upgrade—it signals a paradigm shift. The industry is moving away from “prompt-and-response” LLMs toward agentic systems that plan, act, reflect, and iterate using external tools. Moonshot AI is positioning itself at the forefront of this transition, with Kimi as its flagship agentic engine.
What’s Missing? The Need for Long-Horizon Benchmarks
While Kimi excels on existing benchmarks like Humanity's Last Exam, the speaker notes a gap: there's no public evaluation on the METR benchmark, which specifically measures performance on long-horizon, multi-step tasks. Such an evaluation would provide even stronger evidence of Kimi's agentic capabilities.
Final Thoughts: A New Era of Accessible Frontier Intelligence
Kimi K2 Thinking proves that frontier-level AI intelligence is no longer the exclusive domain of Silicon Valley. With open weights, aggressive pricing, and unmatched agentic capabilities, Moonshot AI has delivered a model that challenges the status quo.
As token costs plummet and reasoning depth increases, the real value now lies in building agents that can do meaningful work—from automating research to creating entire applications from a single prompt.
Your Next Steps
- Try Kimi K2 Thinking on moonshot.ai or via Open Router
- Experiment with agentic prompts: ask it to “plan and execute” a multi-step task
- Check the Kimi K2 Vendor Verifier before choosing a third-party provider
- Explore the Hugging Face model card for self-hosting and quantization options
- Start designing long-horizon agents—this is where the future of AI lies
As the speaker concludes: “We’re at this point now where the cost of tokens is becoming so cheap for the amount of intelligence that we’re getting out of it that we really should be focused on long horizon agents and what you can actually build with those.”
The era of agentic AI is here—and Kimi K2 Thinking is leading the charge.