Model directory
Curated NLP, scientific, engineering, and robotics models from Hugging Face for local and private deployment.
NLP & general language
Chat, summarization, and general-purpose language models for text and dialogue.
Llama 3.2
General chat, RAG, and low-latency on-device use.
Meta's efficient small language models for chat and instruction following. Strong general-purpose performance from the 1B and 3B text variants; the 11B and 90B variants add vision. Optimized for latency and throughput on consumer hardware.
Context: 128K
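A minimal local chat sketch using the transformers text-generation pipeline, assuming a recent transformers install with accelerate and access to the gated meta-llama/Llama-3.2-3B-Instruct checkpoint; the prompt and generation settings are illustrative only.

```python
# Minimal local chat sketch (assumed checkpoint: meta-llama/Llama-3.2-3B-Instruct).
# Requires transformers + accelerate; gated weights need prior access approval.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype="auto",   # use bf16/fp16 when the hardware supports it
    device_map="auto",    # place weights on the available GPU(s)
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Why do small models suit on-device deployment?"},
]

out = chat(messages, max_new_tokens=128)
# Recent pipelines return the whole conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```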
Llama 3.1 8B
RAG backends, chat, and medium-complexity reasoning.
Meta's 8B parameter model with strong instruction following and extended context. Good default for balanced quality and speed on a single GPU.
Context: 128K
Mistral 7B
Fast inference and multilingual chat.
Fast and capable 7B model for chat and instruction. Good balance of speed and quality for on-device or edge deployment. Strong in French and multilingual settings.
Context: 32K
Mixtral 8x7B
High quality at moderate cost; RAG and long documents.
Mistral's mixture-of-experts model: eight 7B experts with roughly 13B parameters active per token. Near-70B quality at much lower compute cost. Strong for long-form and reasoning.
Context: 32K
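For the larger entries in this directory, 4-bit quantization is a common way to fit a single large GPU. A sketch using bitsandbytes via BitsAndBytesConfig, assuming the mistralai/Mixtral-8x7B-Instruct-v0.1 checkpoint; memory requirements remain substantial even at 4-bit.

```python
# Sketch: 4-bit quantized load of a larger MoE model
# (assumed checkpoint: mistralai/Mixtral-8x7B-Instruct-v0.1).
# Requires transformers + accelerate + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
)

# Build a chat-formatted prompt and generate.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Outline a RAG pipeline for long PDF reports."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```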
Qwen2.5
Multilingual, tool use, and scalable sizes.
Alibaba's Qwen2.5 family for chat and long-context tasks. Strong multilingual and reasoning performance in 0.5B–72B sizes. Excellent tool use and instruction adherence.
Context: 32K–128K
Phi-3
Laptops, edge, and low-memory environments.
Microsoft's small language models (3.8B–14B) with strong reasoning and instruction following. Suited for resource-constrained and edge deployment. Good at math and logic.
Context: 4K–128K
Gemma 2
General chat and safe, aligned behavior.
Google's open weights models for chat and instruction. Available in 2B, 9B, and 27B; good for general NLP and RAG backends. Trained with RL and preference data.
Context: 8K
DeepSeek-V3
Heavy reasoning, math, and code when you have GPU capacity.
DeepSeek's large-scale model for complex reasoning, coding, and long-context tasks. Strong in math and code. Mixture-of-experts architecture for efficiency.
Context: 128K
Command R+
Enterprise RAG with citations and grounding.
Cohere's Command R+ for enterprise RAG and long-context tasks. Tuned for citation, grounding, and multilingual enterprise use.
Context: 128K
SOLAR 10.7B
Quality/size tradeoff on one GPU.
Upstage's SOLAR: strong 10.7B model for chat and instruction. Good performance per parameter; suitable for single-GPU deployment.
Context: 4K
SmolLM2
Fast iteration and minimal hardware.
Hugging Face's small, fast model for chat. Designed for low-resource and fast iteration; good for testing and lightweight agents.
Scientific
Models for biomedical, chemistry, and research text understanding and generation.
BioGPT
Biomedical Q&A and entity extraction.
Microsoft's biomedical language model for literature-based discovery, entity recognition, and question answering over scientific text. Trained on PubMed abstracts.
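A small generation sketch following the pattern shown on the model card, assuming the microsoft/biogpt checkpoint; the prompt is illustrative, and outputs are literature-style text rather than verified facts.

```python
# Sketch: biomedical text generation with BioGPT (assumed checkpoint: microsoft/biogpt).
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="microsoft/biogpt")
set_seed(42)  # reproducible sampling for the demo

outputs = generator(
    "COVID-19 is",
    max_length=40,
    num_return_sequences=3,
    do_sample=True,
)
for o in outputs:
    print(o["generated_text"])
```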
BioGPT-Large (v2)
Biomedical text generation and completion.
Biomedical generative model for PubMed-style text generation and downstream tasks in life sciences. Useful for hypothesis generation from literature.
Galactica
Scientific citations and formula-aware generation.
Meta's scientific language model trained on papers, reference material, and knowledge bases. For scientific Q&A and citation-aware generation. Handles formulas and references.
SciBERT
Scientific text embeddings and classification.
BERT pretrained on scientific papers (Semantic Scholar). Best for classification and embedding of scientific text, not generative chat.
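Encoder models like SciBERT are used for embeddings and classification rather than chat. A mean-pooling sketch, assuming the allenai/scibert_scivocab_uncased checkpoint; the pooling strategy is a design choice, not part of the model itself.

```python
# Sketch: mean-pooled sentence embeddings with SciBERT
# (assumed checkpoint: allenai/scibert_scivocab_uncased).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "allenai/scibert_scivocab_uncased"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = [
    "CRISPR-Cas9 enables targeted genome editing.",
    "Graphene exhibits exceptional electrical conductivity.",
]
batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state    # (batch, seq_len, hidden)

mask = batch["attention_mask"].unsqueeze(-1)     # zero out padding positions
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                          # e.g. torch.Size([2, 768])
```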
PubMedBERT
Biomedical NLP tasks (NER, RE, classification).
BERT trained on PubMed and PMC. Strong for biomedical NER, relation extraction, and classification on clinical and biology text.
ChemBERTa
Chemistry and molecular ML.
Domain BERT-style encoder for chemistry, pretrained on molecular string representations (SMILES). Useful for property prediction, reaction outcome prediction, and molecular representation learning.
Engineering & coding
Code generation, completion, and documentation for software and engineering.
Code Llama
Code completion, docs, and multi-language support.
Meta's code-specialized Llama for code completion, generation, and documentation. Supports multiple languages and fill-in-the-middle (FIM). Multiple size variants.
Context: 16K
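Fill-in-the-middle is exposed in transformers through a <FILL_ME> placeholder for the base (non-instruct) Code Llama checkpoints. A sketch assuming codellama/CodeLlama-7b-hf; the snippet being infilled is arbitrary.

```python
# Sketch: fill-in-the-middle with a base Code Llama checkpoint
# (assumed: codellama/CodeLlama-7b-hf; instruct variants do not support <FILL_ME>).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# <FILL_ME> marks the span the model should infill between prefix and suffix.
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result\n'

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, then splice them back into the prompt.
filling = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prompt.replace("<FILL_ME>", filling))
```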
Qwen2.5-Coder
Full-stack and multi-language code generation.
Qwen2.5 fine-tuned for code. Strong at completion, generation, and editing across many programming languages. Good at repo-level context.
Context: 32K
DeepSeek Coder
Code generation and FIM in IDEs.
DeepSeek's code models for generation and fill-in-the-middle. Strong on HumanEval and practical coding tasks. 1.3B to 33B sizes.
Context: 16K
StarCoder2
Code completion and open-weight code models.
BigCode's StarCoder2 for code completion and generation. Trained on The Stack v2; supports many languages. 3B, 7B, and 15B variants.
Codestral
High-quality code generation and editing.
Mistral's code-specialized model for generation, completion, and fill-in-the-middle. Strong on multiple languages and code reasoning.
Context: 32K
Magicoder
Instruction-following code generation.
Instruction-tuned code model trained with OSS-Instruct, instruction data synthesized from open-source code. Strong on diverse programming tasks and instruction following for code.
DeepSeek R1 Coder
Reasoning-heavy coding and debugging.
DeepSeek's reasoning-focused coder for complex programming tasks. Chain-of-thought and plan-then-code style outputs.
Robotics & orchestration
Reasoning, planning, and task orchestration for automation and robotics workflows.
Qwen2.5 72B (reasoning)
Orchestration and multi-step reasoning backends.
Large Qwen2.5 for complex reasoning and planning. Useful as the reasoning backbone for orchestration, workflow planning, and multi-step tasks. Strong tool use.
Context: 128K
Llama 3.1 70B
Heavy reasoning and agent backends.
Large Llama for reasoning and instruction. Suited for orchestration backends and agentic workflows requiring strong reasoning and long context.
Context: 128K
OpenVLA
Embodied AI and robot control from language.
Open vision-language-action model for robotics. Maps images and language to robot actions; for embodied and robotics research.
Muon (reasoning)
Complex planning and chain-of-thought.
Cognition's reasoning-oriented model for complex planning and chain-of-thought. Useful for orchestration and decision pipelines.
Qwen2.5 14B
Moderate-size orchestration and tool use.
Mid-size Qwen2.5 for reasoning and tool use. Good balance for orchestration when 72B is too large. Strong instruction and tool adherence.
Context: 128K
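Recent transformers releases let you pass tool definitions straight to the chat template, so the model can emit a structured tool call for the orchestration layer to execute. A sketch assuming the Qwen/Qwen2.5-14B-Instruct checkpoint; get_temperature is a hypothetical stub, and parsing or executing the emitted call is left to the surrounding runtime.

```python
# Sketch: passing a tool schema through the chat template
# (assumed checkpoint: Qwen/Qwen2.5-14B-Instruct; get_temperature is a hypothetical stub).
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_temperature(location: str) -> float:
    """Get the current temperature in Celsius for a location.

    Args:
        location: City name, e.g. "Berlin".
    """
    return 21.5  # stub; a real tool would call a weather API

model_id = "Qwen/Qwen2.5-14B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How warm is it in Berlin right now?"}]
prompt = tok.apply_chat_template(
    messages,
    tools=[get_temperature],      # docstring + type hints become the tool schema
    add_generation_prompt=True,
    tokenize=False,
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
# The reply should contain a structured call to get_temperature("Berlin").
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```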
Command R
RAG-centric orchestration and grounding.
Cohere's Command R for RAG and long-context orchestration. Tuned for retrieval-augmented generation and enterprise workflows.
Context: 128K
