Discovery

Model directory

Curated NLP, scientific, engineering, and robotics models from Hugging Face for local and private deployment.

NLP & general language

Chat, summarization, and general-purpose language models for text and dialogue.

Llama 3.2

3B / 11B

General chat, RAG, and low-latency on-device use.

Meta's efficient models for chat and instruction following: a 3B text variant and an 11B vision-capable variant. Strong general-purpose performance, optimized for latency and throughput on consumer hardware.

Context: 128K

Customer support bots
Document Q&A
Summarization
View on Hugging Face

Llama 3.1 8B

8B

RAG backends, chat, and medium-complexity reasoning.

Meta's 8B parameter model with strong instruction following and extended context. Good default for balanced quality and speed on a single GPU.

Context: 128K

Internal knowledge bases
Chat assistants
Drafting and editing
View on Hugging Face
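
A minimal local-inference sketch for an instruct model like this one, using the transformers pipeline API. The repo ID, prompt, and generation settings below are assumptions; use the exact checkpoint name from the model card, and note that gated Meta repos require accepting the license on Hugging Face first.

```python
# Hedged sketch: single-GPU local chat via the transformers pipeline API.
# The repo ID is an assumption; substitute the name from the model card.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed repo ID (gated; accept license first)
    device_map="auto",   # place weights on the available GPU(s)
    torch_dtype="auto",  # keep the checkpoint's native precision
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize why teams run language models locally."},
]

result = chat(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```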

Mistral 7B

7B

Fast inference and multilingual chat.

Fast and capable 7B model for chat and instruction. Good balance of speed and quality for on-device or edge deployment. Strong in French and multilingual settings.

Context: 32K

Real-time chat
Multilingual support
Thin clients
View on Hugging Face

Mixtral 8x7B

8x7B (MoE)

High quality at moderate cost; RAG and long documents.

Mistral's sparse mixture-of-experts model: 8 experts per layer, with roughly 13B of its ~47B parameters active per token. Near-70B quality at much lower compute. Strong for long-form and reasoning.

Context: 32K

Enterprise RAG
Report generation
Complex Q&A
View on Hugging Face
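
For MoE checkpoints of this size, a dedicated inference server is the usual route. Below is a minimal vLLM sketch; the repo ID and the tensor_parallel_size of 2 are assumptions to adjust for your hardware.

```python
# Hedged sketch: serving a mixture-of-experts model with vLLM offline inference.
# Repo ID and tensor_parallel_size are assumptions; size them to your GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed repo ID
    tensor_parallel_size=2,                        # split weights across two GPUs
)

params = SamplingParams(temperature=0.2, max_tokens=400)
prompts = ["Draft a one-paragraph summary of our quarterly incident-review process."]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```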

Qwen2.5

0.5B–72B

Multilingual, tool use, and scalable sizes.

Alibaba's Qwen2.5 family for chat and long-context tasks. Strong multilingual and reasoning performance in 0.5B–72B sizes. Excellent tool use and instruction adherence.

Context: 32K–128K

APIs and tools
Non-English content
Fine-tuning base
View on Hugging Face

Phi-3

3.8B

Laptops, edge, and low-memory environments.

Microsoft's small language models (3.8B) with strong reasoning and instruction following. Suited for resource-constrained and edge deployment. Good at math and logic.

Context: 4K–128K

Offline assistants
Embedded systems
Quick prototypes
View on Hugging Face
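
A sketch of fitting a small model into a low-memory environment with 4-bit quantization via bitsandbytes (CUDA-only). The repo ID microsoft/Phi-3-mini-4k-instruct and the generation settings are assumptions; CPU-only machines would need a different quantization path such as a GGUF runtime.

```python
# Hedged sketch: 4-bit quantized loading for low-memory GPUs (requires bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed repo ID

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",
)

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "List three tasks an offline assistant handles well."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=160)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```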

Gemma 2

2B–27B

General chat and safe, aligned behavior.

Google's open-weight models for chat and instruction. Available in 2B, 9B, and 27B; good for general NLP and RAG backends. Instruct variants are tuned with RLHF on preference data.

Context: 8K

Content moderation
Safe chatbots
RAG
View on Hugging Face

DeepSeek-V3

671B

Heavy reasoning, math, and code when you have GPU capacity.

DeepSeek's large-scale model for complex reasoning, coding, and long-context tasks. Strong in math and code. Mixture-of-experts architecture (671B total, ~37B active per token) for efficiency.

Context: 128K

Research
Code generation
Math and logic
View on Hugging Face

Command R+

104B

Enterprise RAG with citations and grounding.

Cohere's Command R+ for enterprise RAG and long-context tasks. Tuned for citation, grounding, and multilingual enterprise use.

Context: 128K

Document retrieval
Cited answers
Compliance-aware Q&A
View on Hugging Face

SOLAR 10.7B

10.7B

Quality/size tradeoff on one GPU.

Upstage's SOLAR: strong 10.7B model for chat and instruction. Good performance per parameter; suitable for single-GPU deployment.

Context: 32K

Chat
Summarization
RAG
View on Hugging Face

SmolLM2

1.7B

Fast iteration and minimal hardware.

Hugging Face's small, fast model for chat. Designed for low-resource and fast iteration; good for testing and lightweight agents.

Prototyping
Embedded
High-throughput
View on Hugging Face

Scientific

Models for biomedical, chemistry, and research text understanding and generation.

BioGPT

347M

Biomedical Q&A and entity extraction.

Microsoft's biomedical language model for literature-based discovery, entity recognition, and question answering over scientific text. Trained on PubMed abstracts.

Literature review
Drug discovery
Biomedical NER
View on Hugging Face

BioGPT-Large (v2)

1.5B

Biomedical text generation and completion.

Biomedical generative model for PubMed-style text generation and downstream tasks in life sciences. Useful for hypothesis generation from literature.

Abstract generation
Hypothesis suggestion
View on Hugging Face

Galactica

125M–120B

Scientific citations and formula-aware generation.

Meta's scientific language model trained on papers, reference material, and knowledge bases. Designed for scientific Q&A and citation-aware generation; handles formulas and references.

Literature Q&A
Citation extraction
Formula completion
View on Hugging Face

SciBERT

Scientific text embeddings and classification.

BERT pretrained on scientific papers (Semantic Scholar). Best for classification and embedding of scientific text, not generative chat.

Paper similarity
Topic classification
Search indexing
View on Hugging Face
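
Since this is an encoder rather than a chat model, the typical local use is producing embeddings for search or similarity. A mean-pooling sketch, assuming the commonly used allenai/scibert_scivocab_uncased checkpoint:

```python
# Hedged sketch: sentence embeddings for scientific text via mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "allenai/scibert_scivocab_uncased"  # commonly used SciBERT checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

texts = [
    "We propose a transformer for protein structure prediction.",
    "A new catalyst improves ammonia synthesis yield.",
]
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state       # (batch, seq_len, hidden)

mask = batch["attention_mask"].unsqueeze(-1)        # ignore padding positions
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean-pooled sentence vectors
emb = torch.nn.functional.normalize(emb, dim=-1)    # unit-length, cosine-ready
print(emb @ emb.T)                                  # pairwise similarity matrix
```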

PubMedBERT

Biomedical NLP tasks (NER, RE, classification).

BERT trained on PubMed and PMC. Strong for biomedical NER, relation extraction, and classification on clinical and biology text.

Clinical notes
Entity extraction
Evidence retrieval
View on Hugging Face

ChemBERTa

Chemistry and molecular ML.

Domain BERT-style model for chemistry (SMILES, IUPAC names, reactions). Useful for property prediction, reaction outcome modeling, and molecular representations.

Property prediction
Reaction modeling
Molecular embeddings
View on Hugging Face

Engineering & coding

Code generation, completion, and documentation for software and engineering.

Code Llama

7B–34B

Code completion, docs, and multi-language support.

Meta's code-specialized Llama for code completion, generation, and documentation. Supports multiple languages and fill-in-the-middle (FIM). Multiple size variants.

Context: 16K

IDE completion
Doc generation
Code review
View on Hugging Face
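
A fill-in-the-middle sketch, assuming the codellama/CodeLlama-7b-hf base checkpoint and the <FILL_ME> convention that the Code Llama tokenizer in transformers uses for infilling; check the model card, since instruct variants use a different prompt format.

```python
# Hedged sketch: fill-in-the-middle with a Code Llama base checkpoint.
# The <FILL_ME> marker is expanded by the Code Llama tokenizer into the
# model's prefix/suffix infilling tokens; the repo ID is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed repo ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = '''def remove_non_ascii(s: str) -> str:
    """<FILL_ME>"""
    return "".join(c for c in s if ord(c) < 128)
'''

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
# Decode only the generated middle segment, then splice it back in.
middle = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prompt.replace("<FILL_ME>", middle))
```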

Qwen2.5-Coder

7B–32B

Full-stack and multi-language code generation.

Qwen2.5 fine-tuned for code. Strong at completion, generation, and editing across many programming languages. Good at repo-level context.

Context: 32K

Code generation
Refactoring
Explanations
View on Hugging Face

DeepSeek Coder

1.3B–33B

Code generation and FIM in IDEs.

DeepSeek's code models for generation and fill-in-the-middle. Strong on HumanEval and practical coding tasks. 1.3B to 33B sizes.

Context: 16K

Autocomplete
Snippet generation
Code explanation
View on Hugging Face

StarCoder2

3B–15B

Code completion and open-weight code models.

BigCode's StarCoder2 for code completion and generation. Trained on The Stack; supports many languages. 3B, 7B, and 15B variants.

Completion
Multi-language
FIM
View on Hugging Face

Codestral

22B

High-quality code generation and editing.

Mistral's code-specialized model for generation, completion, and fill-in-the-middle. Strong on multiple languages and code reasoning.

Context: 32K

Code gen
Documentation
Testing
View on Hugging Face

Magicoder

6.7B

Instruction-following code generation.

Instruction-tuned code model trained with OSS-Instruct, which synthesizes instructions from open-source code. Strong on diverse programming tasks and instruction following for code.

Code from specs
Tutorial generation
Multi-file
View on Hugging Face

DeepSeek R1 Coder

1.3B

Reasoning-heavy coding and debugging.

DeepSeek's reasoning-focused coder for complex programming tasks. Chain-of-thought and plan-then-code style outputs.

Debugging
Algorithm design
Step-by-step code
View on Hugging Face

Robotics & orchestration

Reasoning, planning, and task orchestration for automation and robotics workflows.

Qwen2.5 72B (reasoning)

72B

Orchestration and multi-step reasoning backends.

Large Qwen2.5 for complex reasoning and planning. Useful as the reasoning backbone for orchestration, workflow planning, and multi-step tasks. Strong tool use.

Context: 128K

Workflow planning
Agent reasoning
Tool orchestration
View on Hugging Face

Llama 3.1 70B

70B

Heavy reasoning and agent backends.

Large Llama for reasoning and instruction. Suited for orchestration backends and agentic workflows requiring strong reasoning and long context.

Context: 128K

Agents
Planning
Complex Q&A
View on Hugging Face

OpenVLA

7B

Embodied AI and robot control from language.

Open vision-language-action model for robotics. Maps images and language instructions to robot actions; intended for embodied-AI and robotics research.

Robot control
Vision-language
Research
View on Hugging Face

Muon (reasoning)

Complex planning and chain-of-thought.

Cognition's reasoning-oriented model for complex planning and chain-of-thought. Useful for orchestration and decision pipelines.

Orchestration
Multi-step planning
Decision support
View on Hugging Face

Qwen2.5 14B

14B

Moderate-size orchestration and tool use.

Mid-size Qwen2.5 for reasoning and tool use. A good balance for orchestration when 72B is too large. Strong instruction following and tool-call adherence.

Context: 128K

Agents
APIs
Workflow steps
View on Hugging Face
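
A sketch of exposing one tool through the transformers chat template, which recent versions support via a tools argument. The repo ID and the get_machine_status helper are assumptions, and parsing/executing the returned tool call is left to the orchestrator.

```python
# Hedged sketch: passing a tool definition to a Qwen2.5 instruct model.
# The model replies with a structured tool call that an orchestrator must
# parse and execute; the repo ID and helper function are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-14B-Instruct"  # assumed repo ID

def get_machine_status(machine_id: str) -> str:
    """Return the current health status of a machine.

    Args:
        machine_id: Identifier of the machine to query.
    """
    return "ok"  # placeholder implementation

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Is machine A-17 healthy right now?"}]
prompt = tok.apply_chat_template(
    messages,
    tools=[get_machine_status],  # schema is derived from the signature and docstring
    add_generation_prompt=True,
    tokenize=False,
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```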

Command R

35B

RAG-centric orchestration and grounding.

Cohere's Command R for RAG and long-context orchestration. Tuned for retrieval-augmented generation and enterprise workflows.

Context: 128K

RAG pipelines
Document agents
Enterprise search
View on Hugging Face
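
Retrieval wiring is outside the scope of these model cards, but the common local pattern is simple: retrieve passages, number them, and ask the model to cite them. A model-agnostic sketch; the generate_fn callable is a placeholder for any of the chat models above.

```python
# Hedged sketch: assembling a grounded RAG prompt with numbered sources so the
# model can cite them. `generate_fn` stands in for any local chat model.
from typing import Callable, List

def build_grounded_prompt(question: str, passages: List[str]) -> str:
    """Number retrieved passages and instruct the model to cite them inline."""
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the sources below. "
        "Cite sources inline as [n]; say 'not found' if the sources are insufficient.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question: str, passages: List[str], generate_fn: Callable[[str], str]) -> str:
    """Run the grounded prompt through whichever local model the caller provides."""
    return generate_fn(build_grounded_prompt(question, passages))

if __name__ == "__main__":
    docs = [
        "The retention policy keeps audit logs for 18 months.",
        "Access reviews run quarterly and are owned by the security team.",
    ]
    # Print the assembled prompt; in practice, pass it to a local chat model.
    print(build_grounded_prompt("How long are audit logs kept?", docs))
```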