Model directory
Curated NLP, scientific, engineering, and robotics models from Hugging Face for local and private deployment.
NLP & general language
Chat, summarization, and general-purpose language models for text and dialogue.
Llama 3.2
General chat, RAG, and low-latency on-device use.
Meta's efficient small language models for chat and instruction following. Strong general-purpose performance from the 1B and 3B text variants; the 11B and 90B variants add vision. Optimized for latency and throughput on consumer hardware.
Context: 128K
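A minimal local chat sketch using the transformers text-generation pipeline, assuming a recent transformers install with accelerate and access to the gated meta-llama/Llama-3.2-3B-Instruct checkpoint; the prompt and generation settings are illustrative only.

```python
# Minimal local chat sketch (assumed checkpoint: meta-llama/Llama-3.2-3B-Instruct).
# Requires transformers + accelerate; gated weights need prior access approval.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype="auto",   # use bf16/fp16 when the hardware supports it
    device_map="auto",    # place weights on the available GPU(s)
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Why do small models suit on-device deployment?"},
]

out = chat(messages, max_new_tokens=128)
# Recent pipelines return the whole conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```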
Llama 3.1 8B
RAG backends, chat, and medium-complexity reasoning.
Meta's 8B parameter model with strong instruction following and extended context. Good default for balanced quality and speed on a single GPU.
Context: 128K
Mistral 7B
Fast inference and multilingual chat.
Fast and capable 7B model for chat and instruction. Good balance of speed and quality for on-device or edge deployment. Strong in French and multilingual settings.
Context: 32K
Mixtral 8x7B
High quality at moderate cost; RAG and long documents.
Mistral's mixture-of-experts model: eight 7B experts with roughly 13B parameters active per token. Near-70B quality at much lower compute cost. Strong for long-form and reasoning.
Context: 32K
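For the larger entries in this directory, 4-bit quantization is a common way to fit a single large GPU. A sketch using bitsandbytes via BitsAndBytesConfig, assuming the mistralai/Mixtral-8x7B-Instruct-v0.1 checkpoint; memory requirements remain substantial even at 4-bit.

```python
# Sketch: 4-bit quantized load of a larger MoE model
# (assumed checkpoint: mistralai/Mixtral-8x7B-Instruct-v0.1).
# Requires transformers + accelerate + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
)

# Build a chat-formatted prompt and generate.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Outline a RAG pipeline for long PDF reports."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```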
Qwen2.5
Multilingual, tool use, and scalable sizes.
Alibaba's Qwen2.5 family for chat and long-context tasks. Strong multilingual and reasoning performance in 0.5B–72B sizes. Excellent tool use and instruction adherence.
Context: 32K–128K
Phi-3
Laptops, edge, and low-memory environments.
Microsoft's small language models (3.8B–14B) with strong reasoning and instruction following. Suited for resource-constrained and edge deployment. Good at math and logic.
Context: 4K–128K
Gemma 2
General chat and safe, aligned behavior.
Google's open weights models for chat and instruction. Available in 2B, 9B, and 27B; good for general NLP and RAG backends. Trained with RL and preference data.
Context: 8K
DeepSeek-V3
Heavy reasoning, math, and code when you have GPU capacity.
DeepSeek's large-scale model for complex reasoning, coding, and long-context tasks. Strong in math and code. Mixture-of-experts architecture for efficiency.
Context: 128K
Command R+
Enterprise RAG with citations and grounding.
Cohere's Command R+ for enterprise RAG and long-context tasks. Tuned for citation, grounding, and multilingual enterprise use.
Context: 128K
SOLAR 10.7B
Quality/size tradeoff on one GPU.
Upstage's SOLAR: strong 10.7B model for chat and instruction. Good performance per parameter; suitable for single-GPU deployment.
Context: 4K
SmolLM2
Fast iteration and minimal hardware.
Hugging Face's small, fast model for chat. Designed for low-resource and fast iteration; good for testing and lightweight agents.
Scientific
Models for biomedical, chemistry, and research text understanding and generation.
BioGPT
Biomedical Q&A and entity extraction.
Microsoft's biomedical language model for literature-based discovery, entity recognition, and question answering over scientific text. Trained on PubMed abstracts.
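A small generation sketch following the pattern shown on the model card, assuming the microsoft/biogpt checkpoint; the prompt is illustrative, and outputs are literature-style text rather than verified facts.

```python
# Sketch: biomedical text generation with BioGPT (assumed checkpoint: microsoft/biogpt).
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="microsoft/biogpt")
set_seed(42)  # reproducible sampling for the demo

outputs = generator(
    "COVID-19 is",
    max_length=40,
    num_return_sequences=3,
    do_sample=True,
)
for o in outputs:
    print(o["generated_text"])
```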
BioGPT-Large (v2)
Biomedical text generation and completion.
Biomedical generative model for PubMed-style text generation and downstream tasks in life sciences. Useful for hypothesis generation from literature.
Galactica
Scientific citations and formula-aware generation.
Meta's scientific language model trained on papers, reference material, and knowledge bases. For scientific Q&A and citation-aware generation. Handles formulas and references.
SciBERT
Scientific text embeddings and classification.
BERT pretrained on scientific papers (Semantic Scholar). Best for classification and embedding of scientific text, not generative chat.
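Encoder models like SciBERT are used for embeddings and classification rather than chat. A mean-pooling sketch, assuming the allenai/scibert_scivocab_uncased checkpoint; the pooling strategy is a design choice, not part of the model itself.

```python
# Sketch: mean-pooled sentence embeddings with SciBERT
# (assumed checkpoint: allenai/scibert_scivocab_uncased).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "allenai/scibert_scivocab_uncased"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = [
    "CRISPR-Cas9 enables targeted genome editing.",
    "Graphene exhibits exceptional electrical conductivity.",
]
batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state    # (batch, seq_len, hidden)

mask = batch["attention_mask"].unsqueeze(-1)     # zero out padding positions
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                          # e.g. torch.Size([2, 768])
```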
PubMedBERT
Biomedical NLP tasks (NER, RE, classification).
BERT trained on PubMed and PMC. Strong for biomedical NER, relation extraction, and classification on clinical and biology text.
ChemBERTa
Chemistry and molecular ML.
Domain BERT-style encoder for chemistry, pretrained on molecular string representations (SMILES). Useful for property prediction, reaction outcome prediction, and molecular representation learning.
Engineering & coding
Code generation, completion, and documentation for software and engineering.
Code Llama
Code completion, docs, and multi-language support.
Meta's code-specialized Llama for code completion, generation, and documentation. Supports multiple languages and fill-in-the-middle (FIM). Multiple size variants.
Context: 16K
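Fill-in-the-middle is exposed in transformers through a <FILL_ME> placeholder for the base (non-instruct) Code Llama checkpoints. A sketch assuming codellama/CodeLlama-7b-hf; the snippet being infilled is arbitrary.

```python
# Sketch: fill-in-the-middle with a base Code Llama checkpoint
# (assumed: codellama/CodeLlama-7b-hf; instruct variants do not support <FILL_ME>).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# <FILL_ME> marks the span the model should infill between prefix and suffix.
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result\n'

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, then splice them back into the prompt.
filling = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prompt.replace("<FILL_ME>", filling))
```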
Qwen2.5-Coder
Full-stack and multi-language code generation.
Qwen2.5 fine-tuned for code. Strong at completion, generation, and editing across many programming languages. Good at repo-level context.
Context: 32K
DeepSeek Coder
Code generation and FIM in IDEs.
DeepSeek's code models for generation and fill-in-the-middle. Strong on HumanEval and practical coding tasks. 1.3B to 33B sizes.
Context: 16K
StarCoder2
Code completion and open-weight code models.
BigCode's StarCoder2 for code completion and generation. Trained on The Stack v2; supports many languages. 3B, 7B, and 15B variants.
Codestral
High-quality code generation and editing.
Mistral's code-specialized model for generation, completion, and fill-in-the-middle. Strong on multiple languages and code reasoning.
Context: 32K
Magicoder
Instruction-following code generation.
Instruction-tuned code model trained with OSS-Instruct, instruction data synthesized from open-source code. Strong on diverse programming tasks and instruction following for code.
DeepSeek R1 Coder
Reasoning-heavy coding and debugging.
DeepSeek's reasoning-focused coder for complex programming tasks. Chain-of-thought and plan-then-code style outputs.
Robotics & orchestration
Reasoning, planning, and task orchestration for automation and robotics workflows.
Qwen2.5 72B (reasoning)
Orchestration and multi-step reasoning backends.
Large Qwen2.5 for complex reasoning and planning. Useful as the reasoning backbone for orchestration, workflow planning, and multi-step tasks. Strong tool use.
Context: 128K
Llama 3.1 70B
Heavy reasoning and agent backends.
Large Llama for reasoning and instruction. Suited for orchestration backends and agentic workflows requiring strong reasoning and long context.
Context: 128K
OpenVLA
Embodied AI and robot control from language.
Open vision-language-action model for robotics. Maps images and language to robot actions; for embodied and robotics research.
Muon (reasoning)
Complex planning and chain-of-thought.
Cognition's reasoning-oriented model for complex planning and chain-of-thought. Useful for orchestration and decision pipelines.
Qwen2.5 14B
Moderate-size orchestration and tool use.
Mid-size Qwen2.5 for reasoning and tool use. Good balance for orchestration when 72B is too large. Strong instruction and tool adherence.
Context: 128K
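Recent transformers releases let you pass tool definitions straight to the chat template, so the model can emit a structured tool call for the orchestration layer to execute. A sketch assuming the Qwen/Qwen2.5-14B-Instruct checkpoint; get_temperature is a hypothetical stub, and parsing or executing the emitted call is left to the surrounding runtime.

```python
# Sketch: passing a tool schema through the chat template
# (assumed checkpoint: Qwen/Qwen2.5-14B-Instruct; get_temperature is a hypothetical stub).
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_temperature(location: str) -> float:
    """Get the current temperature in Celsius for a location.

    Args:
        location: City name, e.g. "Berlin".
    """
    return 21.5  # stub; a real tool would call a weather API

model_id = "Qwen/Qwen2.5-14B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How warm is it in Berlin right now?"}]
prompt = tok.apply_chat_template(
    messages,
    tools=[get_temperature],      # docstring + type hints become the tool schema
    add_generation_prompt=True,
    tokenize=False,
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
# The reply should contain a structured call to get_temperature("Berlin").
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```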
Command R
RAG-centric orchestration and grounding.
Cohere's Command R for RAG and long-context orchestration. Tuned for retrieval-augmented generation and enterprise workflows.
Context: 128K
