# Overview of LLMs

### 1. Find the Primary Use Cases for LLMs & R-LLMs

<table><thead><tr><th width="282.18182373046875">Use Case</th><th>LLM Models</th></tr></thead><tbody><tr><td>General Productivity</td><td>GPT 5.4, Gemini 2.5 Pro, Claude Haiku 4.5</td></tr><tr><td>Complex Reasoning</td><td>Gemini 2.5 Pro, Claude Sonnet 4.6, Claude Opus 4.6</td></tr><tr><td>Structured Writing &#x26; Synthesis</td><td>GPT 5.4, Claude Sonnet 4.6</td></tr><tr><td>Coding &#x26; Technical Workflows</td><td>GPT 5.3 Codex, Claude Sonnet 4.6</td></tr><tr><td>Fast &#x26; Scalable Processing</td><td>GPT 5.4 Non Thinking, Gemini 2.5 Flash (Lite), Claude Haiku 4.5</td></tr></tbody></table>

### 2. Hosting Preference

Choose where your data is processed based on your privacy needs and access priorities. We offer two hosting options, EU and US, each with different benefits around compliance, speed, and model access.

| Factor                       | EU Hosting (Privacy First)                                  | US Hosting (Feature First)                          |
| ---------------------------- | ----------------------------------------------------------- | --------------------------------------------------- |
| **GDPR Compliance**          | Fully GDPR-compliant                                        | Not GDPR-compliant by default                       |
| **Data Residency**           | Data stays in the EU                                        | Data stored globally                                |
| **Model Availability**       | Late model release depending on EU Data Center availability | Full access to the latest models and features first |
| **Legal & Regulatory Risks** | Meets stricter EU privacy laws                              | Subject to US law and transfer safeguards           |

**Summary:**

* **Choose EU hosting** if you prioritize **GDPR compliance** and **strict data privacy** within Europe.
* **Choose US hosting** if you want **the latest models** and global data centers.

### 3. Speed vs. Depth: What Matters More to You?

Some models are designed for quick, lightweight tasks. Others are built to dive deeper, think harder, and handle more complexity. Choose based on the kind of experience you need.

| Preference                                  | When to Choose                                                                     | Models                                                          |
| ------------------------------------------- | ---------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| <p>High Speed<br>(Fast, Responsive)</p>     | For fast answers or simple tasks where low latency matters most.                   | Gemini 2.5 Flash (Lite), Claude Sonnet 4.6 (Fast), GPT 5.4 Mini |
| <p>High Depth<br>(Detailed, Structured)</p> | For complex prompts, multi-step logic, or detailed analysis that needs reflection. | Gemini 2.5 Pro, Claude Sonnet 4.6, GPT 5.4 High Thinking        |

### 4. Choose your Desired LLMs

| Model                                | Description                                                                                                                                                                                                                                                                                                                                                                                  |
| ------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Claude Sonnet 4.6                    | Improved Sonnet model with stronger coding, computer use, and long-context reasoning. Better suited for agent workflows, large-codebase tasks, document analysis, and other multi-step work.                                                                                                                                                                                                 |
| Claude Sonnet 4.6 (Fast)             | A configuration of Claude Sonnet 4.6. Operates strictly in "Non-Reasoning" mode, bypassing long thinking steps to deliver low-latency, high-quality outputs.                                                                                                                                                                                                                                 |
| Claude Opus 4.6                      | <p>Improves on the coding strengths of prior models with more reliable performance for agentic tasks, codebase management, and structured code review and debugging.<br>It’s also stronger in everyday work, handling analysis, document review, and multitasking more consistently and efficiently.</p>                                                                                |
| Claude Opus 4.5                      | Built for agentic coding, long-horizon reasoning, and complex tool-driven workflows, with high correctness across multi-step execution. It excels at software engineering tasks, terminal and computer use, structured tool orchestration, and novel problem solving, achieving strong first-pass success with fewer retries.                                                                |
| Claude Haiku 4.5                     | Fast, efficient, and built for scale. Delivers near-Sonnet-level coding and reasoning at about 3× cheaper and 2× faster performance. Excels in tool use, UI interaction, and parallel task execution, ideal as a worker model in multi-agent or production setups. Strong on coding reliability. Best for backend automations, chat workloads, and agent systems needing speed and low cost. |
| Claude Sonnet 4.5                    | Extends balanced performance with stronger reasoning, longer task persistence, and more reliable tool use, achieving state-of-the-art results in coding and real-world computer tasks. Delivers higher accuracy, greater autonomy, and smoother multi-step workflows compared to Claude 4 Sonnet.                                                                                            |
| Claude Sonnet 4.5 (Fast)             | A configuration of Claude Sonnet 4.5. Operates strictly in "Non-Reasoning" mode, bypassing long thinking steps to deliver low-latency, high-quality outputs.                                                                                                                                                                                                                                 |
| Claude Opus 4.1                      | Sharper, steadier, smarter. Builds on Opus 4 with noticeably cleaner code fixes, longer autonomous agentic workflows, and more precise reasoning. Tackles long, multi-step tasks with hybrid reasoning and extended "thinking", including code refactoring, deep research, and strategic synthesis.                                                                                          |
| Claude Opus 4                        | Designed for long, demanding tasks that require deep thinking and consistent performance. Especially strong at powering autonomous agents, handling large, complex codebases, and running multi-step workflows that can last for hours. Ideal for advanced coding tasks, detailed analysis, and tasks that need smart, sustained focus.                                                      |
| Claude Sonnet 4                      | Enhances balanced performance through refined reasoning, extended autonomy, and precise instruction-following, delivering improved coding capabilities and greater task accuracy compared to Claude 3.7 Sonnet.                                                                                                                                                                              |
| Gemini 3.1 Pro                       | Great for complex reasoning, coding, and long-context work across text, images, audio, video, PDFs, and large codebases. Best for deep research, multi-step agent workflows, technical planning, and document-heavy analysis.                                                                                                                                                                |
| Gemini 3 Pro                         | High-performance multimodal model for teams working with text, images, documents, videos, and code. Offers strong long-context reasoning, reliable analysis across large files, and advanced tool use for more automated workflows.                                                                                                                                                          |
| Gemini 2.5 Pro (Enterprise Search)   | Excels at reasoning, conversation, and coding, configured for secure commercial use. Uses Web Grounding for Enterprise to provide access to live web data without the privacy risks of standard search tools.                                                                                                                                                                                |
| Gemini 2.5 Pro                       | Well-rounded, reliable model for everyday tasks. Excels at reasoning, conversation, and coding. This is a strong fit for smart assistants, business tools, and creative workflows that require speed, accuracy, and thoughtful output.                                                                                                                                                       |
| Gemini 2.5 Flash                     | Smarter and more capable than 2.0 Flash, with fast responses and visible reasoning. Best for real-time tasks that need both speed and lightweight thinking, like chatbots, copilots, and scalable AI tools.                                                                                                                                                                                  |
| Google Vertex Imagen 4 Generate      | Designed for premium text-to-image generation with rich detail and consistent output quality. With integrated safety, watermarking, and prompt refinement. This fits well into enterprise workflows requiring trustworthy and compliant visual creation.                                                                                                                                     |
| Google Vertex 2.5 Flash Image        | Optimized for fast, flexible image generation and editing. Handles text-to-image, targeted visual edits, character consistency, and multi-image fusion. Ideal for rapid visual experimentation and consistent product imagery in cost-effective workflows.                                                                                                                                   |
| Gemini 2.0 Flash (Thinking Mode)     | Designed for academic precision, complex logic, and structured problem-solving. Well-suited for scientific reports, step-by-step reasoning, and data interpretation tasks that require depth over speed. Not intended for fast-turnaround or real-time use.                                                                                                                                  |
| GPT 5.5 Pro                          | Same underlying model as GPT-5.5, but applies parallel test-time compute for the hardest tasks. Leads on BrowseComp (90.1%) and FrontierMath Tier 4 (39.6%). Best reserved for problems where standard reasoning effort isn't enough.                                                                                                                                                        |
| GPT 5.5                              | Handles complex agentic work alongside coding, research, computer use, and high token tasks. Scores 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval. Best for tasks requiring sustained reasoning across context.                                                                                                                                                                            |
| GPT 5.4 High Thinking with Search    | Uses more reasoning resources for complex and multi-step tasks. Best for deep analysis, strategy, and technical problem-solving with automatic search support.                                                                                                                                                                                                                               |
| GPT 5.4 Low Thinking with Search     | Uses additional reasoning compute to improve accuracy and structure. Suitable for summarization, light analysis, and basic decision-making with automatic search for up-to-date information.                                                                                                                                                                                                 |
| GPT 5.4 None Thinking with Search    | Uses baseline reasoning without extra compute. Optimized for speed and cost, making it ideal for formatting, classification, and structured outputs, with automatic search when needed.                                                                                                                                                                                                        |
| GPT 5.4 Mini                         | Mainly designed for coding, tool workflows, and high-volume execution tasks. Fast and cost efficient. Best used as a default production model rather than for deep reasoning.                                                                                                                                                                                                                |
| GPT 5.3 Codex                        | Built for agentic software engineering, combining GPT-5.2 Codex's coding performance with broader reasoning across research, documentation, and technical decision-making. This runs 25% faster than its predecessor.                                                                                                                                                                        |
| GPT 5.2 Thinking with Search         | Built for complex reasoning and multi-step agentic workflows. Optimized for long-horizon knowledge work with improved factuality and integrated search.                                                                                                                                                                                                                                      |
| GPT 5.2                              | Built for end-to-end professional knowledge work and long-running agents. Produces higher-quality work artifacts, writes and refactors code more effectively, understands long context better, and perceives images more accurately.                                                                                                                                                         |
| GPT 5.1 Codex Max                    | Built for complex, high-risk coding work and autonomous agents. Stronger reasoning and consistency across large codebases. Best for multi-step refactors where correctness and depth outweigh speed.                                                                                                                                                                                         |
| GPT 5.1 Codex                        | High value for developer workflows like code generation and programming tasks. Best for short-to-medium coding problems. Ideal for human-in-the-loop development.                                                                                                                                                                                                                            |
| GPT 5.1 High Reasoning               | Uses high reasoning resources for the highest quality responses. Designed for tasks where high-leverage thinking and strategy are necessary.                                                                                                                                                                                                                                                 |
| GPT 5.1 Low Reasoning                | Utilizes reduced reasoning resources for high responsiveness. Bypasses deep reflective processing for near-instant answers. Ideal for real-time conversation and high-volume data processing.                                                                                                                                                                                       |
| GPT 5.1 None Reasoning               | Uses the least reasoning resources for the lowest latency and maximum cost efficiency. Designed for simple text formatting, high-volume classification, and conversational flow.                                                                                                                                                                                                             |
| GPT 5.1                              | Built for fast, reliable conversations with stronger instruction-following and more natural dialogue. A refined upgrade to GPT-5 with smoother communication and improved everyday usability.                                                                                                                                                                                                |
| GPT 5 Codex                          | A version of GPT-5 tailored for software engineering. Builds projects, adds features, debugs, refactors code, and performs code reviews. Adapts reasoning time to task complexity.                                                                                                                                                                                                           |
| GPT-5 (Auto)                         | Automatically selects the most suitable GPT model based on query complexity and performance needs. Routes to GPT-5 & GPT-4.1 family and o4 Mini.                                                                                                                                                                                                                                             |
| GPT-5 Thinking                       | Thinks like an expert, adapts to the task. Handles writing, coding, research, and data analysis. Intended to replace OpenAI o3 and OpenAI o3 Pro.                                                                                                                                                                                                                                            |
| GPT-5 Nano                           | Ultra-light and lightning-fast. Built for low-latency needs like summarization, classification, and quick Q\&A at a fraction of the cost. Intended to replace GPT 4.1 Nano.                                                                                                                                                                                                                  |
| GPT-5 Mini                           | Lean, cost-conscious, and still sharp. Delivers reliable instruction-following, rich multimodal responses, and reduced latency at a lighter compute and price point. Intended to replace GPT 4o Mini and OpenAI o4 Mini.                                                                                                                                                                     |
| GPT-5                                | Tailored for natural, enterprise-grade conversations. Multimodal and context-aware. Intended to replace GPT 4o.                                                                                                                                                                                                                                                                              |
| Azure GPT Image 1.5                  | Enhanced version of Azure GPT Image 1 with improved image quality, faster generation times, and better handling of complex prompts.                                                                                                                                                                                                                                                          |
| Azure GPT-image-1                    | Built for high-quality image generation and precise instruction following within Azure's enterprise-grade environment.                                                                                                                                                                                                                                                                       |
| GPT 4o Realtime                      | Ideal for fast, conversational tasks with streaming audio. Extremely low latency and strong comprehension for voice-based interactions. Best for real-time assistants and audio-first products.                                                                                                                                                                                              |
| GPT Realtime                         | Built for live, low-latency conversations with voice-in, voice-out streaming. Best for interactive assistants and real-time translators. Not intended for long, complex reasoning tasks.                                                                                                                                                                                                     |
| GPT 4.5                              | Delivers subtle improvements in emotional tone, writing flow, and creative ideation, especially in chat-like settings. Best for polished conversation output. Note: Very expensive (10–15× more costly than GPT-4o).                                                                                                                                                                          |
| GPT 4o Mini                          | A smaller, more affordable version of GPT-4 Omni for fast, high-quality responses without the higher cost. Great for smart assistants, everyday tasks, and apps needing reliable reasoning with some image or audio input.                                                                                                                                                                   |
| GPT 4.1 Nano                         | The fastest and most lightweight GPT-4.1 model, built for speed and simplicity. Ideal for quick autocomplete, fast classifications, or lightweight assistants where cost and response time matter most.                                                                                                                                                                                      |
| GPT 4.1 Mini                         | A faster, more efficient version of GPT-4.1 that keeps much of its quality while using fewer resources. Great for developers, startups, and product teams needing quick, capable performance at lower cost.                                                                                                                                                                                  |
| GPT 4.1                              | A powerful and refined model for tasks that demand precision and depth. Excels in coding, instruction-following, and handling extensive documents with a 1 million-token context window.                                                                                                                                                                                                     |
| Magistral Medium                     | Lean, swift, and powerful. Built for rapid, high-volume tasks with low latency. Features strong instruction-following, conversational finesse, and robust multimodal understanding.                                                                                                                                                                                                          |
| Mistral Small 24B                    | Designed for long-form analysis, deep document comprehension, and multimodal tasks. Supports an immense 10-million token context and excels in reasoning, image understanding, and coding.                                                                                                                                                                                                   |
| Mistral Large                        | An enterprise-grade reasoning powerhouse built via reinforcement learning. Delivers fast, traceable, multilingual chains-of-thought. Handles complex tasks in math, code, logic, and rule-based workflows with precision and transparency.                                                                                                                                                   |
| Mistral Small                        | A frontier-class generalist for enterprise demands. Excels in programming, mathematical reasoning, long-document comprehension, dialogue, and multimodal tasks. Offers up to 128K token context with function calling and agentic workflows.                                                                                                                                                 |
| Mistral Medium                       | Focuses on language fluency, response quality, and multilingual coverage. Well-suited for chatbots, writing assistance, and global customer support.                                                                                                                                                                                                                                         |
| Mistral Nemo 12B                     | Understands both language and images with impressive accuracy. Ideal for research, enterprise tools, and tasks combining visual reasoning with strong language skills.                                                                                                                                                                                                                       |
| Mistral Nemo                         | A multilingual powerhouse built for deep reasoning, code, math, and agent workflows. Offers a 32K-token context window, native function-calling, and RAG support.                                                                                                                                                                                                                            |
| Mistral Codestral                    | Well-suited for real-time development, code automation, and debugging across various programming languages. Improves code generation speed, accuracy, and multi-language support over earlier open models.                                                                                                                                                                                   |
| Grok 4                               | Frontier-level Grok model with native tool use, real-time web access, and advanced reasoning. Ideal for power users needing live data, long-context comprehension, and complex task execution.                                                                                                                                                                                               |
| Grok 3 Mini (thinking - High Effort) | A more efficient Grok model tuned for thoughtful, deeper reasoning. Ideal for moderately complex tasks where you want accurate answers without the full cost of a flagship model.                                                                                                                                                                                                            |
| Grok 3 Mini (thinking - Low Effort)  | A fast, low-resource Grok model for casual interactions and simple tasks. Ideal when speed matters more than depth.                                                                                                                                                                                                                                                                          |
| Grok 3                               | Built for stronger reasoning and deeper conversations across everyday and technical topics. Well-suited for smart assistants, research tools, and advanced chat experiences.                                                                                                                                                                                                                 |
| Grok 2 Vision                        | Designed to understand both images and text. Helpful for visual queries, screenshots, and image-based tasks.                                                                                                                                                                                                                                                                                 |
| Nebius AI Flux Schnell               | Optimized for fast, high-volume image generation. Excels in rapid visual iteration for design exploration, marketing assets, and creative prototyping.                                                                                                                                                                                                                                       |
| Teuken 7B                            | Balanced power for demanding, mixed workloads. Combines strong multimodal reasoning, coding, and multilingual skills with a 1-million token context. Best for versatile enterprise apps and AI agents.                                                                                                                                                                                       |


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.en.theblockbrain.ai/for-users/all-about-llms/overview-of-llms.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
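As a sketch of how an agent might issue such a request, the snippet below builds the query URL (URL-encoding the question so spaces and punctuation survive transport) and defines a small helper to fetch the answer. The `build_ask_url` and `ask_docs` names are illustrative, not part of the documented API; only the base URL and the `ask` parameter come from this page.

```python
import urllib.request
from urllib.parse import urlencode

# Base URL taken from the documentation above.
BASE_URL = "https://docs.en.theblockbrain.ai/for-users/all-about-llms/overview-of-llms.md"

def build_ask_url(question: str) -> str:
    """Return the documentation-query URL for a natural-language question."""
    # urlencode handles percent-/plus-encoding of spaces and punctuation.
    return f"{BASE_URL}?{urlencode({'ask': question})}"

def ask_docs(question: str, timeout: float = 30.0) -> str:
    """Perform the GET request and return the raw response body as text.

    Hypothetical helper: assumes the endpoint returns a plain-text answer.
    """
    with urllib.request.urlopen(build_ask_url(question), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

print(build_ask_url("Which models support image generation?"))
```

A specific, self-contained question (as recommended above) works best, e.g. `ask_docs("Which hosting option gets new models first?")`.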

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
