DeepHermes-3: Personalized, Unrestricted AI with Toggle-On Reasoning

TL;DR

Nous Research has released DeepHermes-3, an AI model that unifies reasoning and intuitive language capabilities.
DeepHermes-3 allows users to toggle between longer reasoning processes and shorter, faster responses using a system prompt.
The model builds on the Hermes 3 dataset, which includes a mix of general instructions, domain expert data, mathematics, roleplaying, coding, and more.
DeepHermes-3 is available on Hugging Face, with GGUF quantized versions optimized for consumer hardware.
The model is based on Meta's Llama 3 and is governed by the Meta Llama 3 Community License, which includes certain restrictions on usage and redistribution.

AI reasoning models, which generate “chains-of-thought” (CoT) in text and analyze their own work to catch errors, are gaining traction. Nous Research, a collective focused on creating “personalized, unrestricted” AI models, has introduced DeepHermes-3 Preview. This model allows users to switch between longer reasoning processes and shorter, faster responses.

DeepHermes-3: Unifying Reasoning and Intuition

DeepHermes-3 is described as a large language model (LLM) that unifies reasoning and intuitive language model capabilities. According to Nous Research, the model allows the user to switch between longer reasoning processes and shorter, faster, less computationally demanding responses. It is an 8-billion parameter variant of Hermes 3, itself a variant of Meta’s Llama.

Nous wrote that its researchers “hope our unique approach to user controlled, toggleable reasoning mode furthers our mission of giving those who use DeepHermes more steerability for whatever need they have.”

The Hermes 3 Foundation

DeepHermes-3 builds upon the Hermes 3 dataset, a collection used for the broader Hermes 3 series. According to the Hermes 3 Technical Report, this dataset contains approximately 390 million tokens across various domains.

The dataset includes:

General instructions (60.6%)
Domain expert data (12.8%)
Mathematics (6.7%)
Roleplaying and creative writing (6.1%)
Coding and software development (4.5%)
Tool use, agentic reasoning and retrieval-augmented generation (RAG) (4.3%)
Content generation (3.0%)
Steering and alignment (2.5%)

Toggleable Reasoning: How It Works

Users can manage DeepHermes-3’s reasoning depth using a system prompt. To activate reasoning mode, the following text must be entered before a prompt:

“You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside tags, and then provide your solution or response to the problem.“

When enabled, the model uses long CoTs, marked by <think></think> tags, to structure its internal monologue before providing a solution. In standard response mode, the model provides quicker, intuition-based answers.

Performance and Feedback

Early testing shows:

DeepHermes-3 scores 67% on MATH benchmarks.
Reasoning mode may not persist in extended conversations.
The model supports tool use, but combining reasoning mode and function calling can be inconsistent.

Nous Research is actively collecting user feedback to improve the model.

Licensing and Restrictions

DeepHermes-3 is based on Meta’s Llama 3 and is subject to the Meta Llama 3 Community License. This license allows free use, modification, and redistribution, but with conditions:

Derivative models must include the original license and display “Built with Meta Llama 3.”
The model cannot be used to train other LLMs, except for derivative works based on Llama 3.
Organizations with over 700 million monthly active users need explicit approval from Meta for commercial use.
Users must adhere to Meta’s AI usage restrictions.

How to try DeepHermes-3

Users can download the full model code on HuggingFace and a version that’s been quantized (reduced bit count) and saved in the GPT-generated unified format (GGUF), which is designed to run model inferences (the actual production build, as opposed to training) on consumer-grade PCs and servers.

What the AI thinks

On one hand, it's another AI trying to be smarter than it actually is, like a toddler wearing a graduation cap. Seriously, another toggleable reasoning model? Are we that desperate to mimic human thought processes? But, I'll admit, the idea of switching between quick-fire responses and deep-dive analysis is kind of... smart.

Imagine this: DeepHermes-3 embedded in a legal AI, sifting through mountains of case law in nanoseconds, then switching to 'reasoning mode' to craft airtight arguments, complete with dramatic courtroom flair. Or picture it in a financial modeling tool, crunching numbers like a supercomputer, then toggling to 'intuitive mode' to explain complex market trends to clueless investors in plain English. The potential for disruption is there, but let's be honest, it'll probably end up writing clickbait articles and generating deepfakes.

ElevenLabs Unveils Conversational AI 2.0: Redefining Human-Machine Voice Interactions

Opera Neon: Ushering in the Era of AI Agentic Browsing

Google's Stitch: AI Tool Aims to Accelerate App Design from Idea to Code