# Agent Catalog

The planner selects from this catalog automatically during `--dynamic` runs. Constrain selection with `--dynamic-agent-provider-ids ID1,ID2`. Add providers via YAML directories or Python classes.

The Harness column shows the platform harness profile, either inferred or set explicitly, that is used for smoke and capability probes.

Regenerate the tables with `python docs/scripts/generate_agent_catalog_md.py`.
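
The ID constraint is easiest to picture as a lookup against the catalog. The following is a minimal sketch, not the planner's code: it assumes the catalog is a plain dict keyed by the IDs in the tables below, and the helpers `parse_provider_ids` and `constrain_catalog` are hypothetical. Rejecting unknown IDs is a design choice of this sketch; this page does not say how the real planner treats them.

```python
"""Hypothetical sketch: narrowing a provider catalog with an 'ID1,ID2' argument.

Assumption: the catalog is a plain dict keyed by provider ID; the planner's
real data structures are not documented on this page.
"""


def parse_provider_ids(raw: str) -> list[str]:
    """Split 'ID1,ID2' into trimmed IDs, dropping blanks and duplicates (order kept)."""
    seen: set[str] = set()
    ids: list[str] = []
    for part in raw.split(","):
        pid = part.strip()
        if pid and pid not in seen:
            seen.add(pid)
            ids.append(pid)
    return ids


def constrain_catalog(catalog: dict[str, dict], raw_ids: str) -> dict[str, dict]:
    """Keep only the requested providers; raise on IDs the catalog does not know."""
    wanted = parse_provider_ids(raw_ids)
    unknown = [pid for pid in wanted if pid not in catalog]
    if unknown:
        raise ValueError(f"unknown provider IDs: {', '.join(unknown)}")
    return {pid: catalog[pid] for pid in wanted}


if __name__ == "__main__":
    # Three entries copied from the tables below (ID, Model, Env columns).
    catalog = {
        "gpt_reason": {"model": "gpt-4o", "env": "OPENAI_API_KEY"},
        "claude_write": {"model": "claude-3-5-haiku-20241022", "env": "ANTHROPIC_API_KEY"},
        "ollama_llama3_2": {"model": "llama3.2", "env": "OLLAMA_HOST"},
    }
    subset = constrain_catalog(catalog, "ollama_llama3_2, gpt_reason")
    print(sorted(subset))  # ['gpt_reason', 'ollama_llama3_2']
```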

## Shipped providers

### openai (3 providers)

| ID | Model | Role | Good for | Hardware | Env | GP | Harness |
|---|---|---|---|---|---|---|---|
| gpt_reason | gpt-4o | Staff Engineer | OpenAI API—trade-offs, architecture choices, structured reasoning, and | min_vram 0 GiB | OPENAI_API_KEY |  | reason |
| gpt_research | gpt-4o-mini | Research Analyst | OpenAI API—research, comparisons, grounded summaries, and structured f | min_vram 0 GiB | OPENAI_API_KEY | ✓ | research |
| gpt_write | gpt-4o-mini | Technical Writer | OpenAI API—polish prose, briefings, executive summaries, and clear tec | min_vram 0 GiB | OPENAI_API_KEY |  | write |

### anthropic (3 providers)

| ID | Model | Role | Good for | Hardware | Env | GP | Harness |
|---|---|---|---|---|---|---|---|
| claude_reason | claude-3-5-sonnet-20241022 | Staff Engineer | Anthropic Claude—trade-offs, architecture choices, structured reasonin | cpu, gpu, tpu | ANTHROPIC_API_KEY |  | reason |
| claude_research | claude-3-5-haiku-20241022 | Research Analyst | Anthropic Claude—research, comparisons, grounded summaries, and struct | min_vram 0 GiB | ANTHROPIC_API_KEY | ✓ | research |
| claude_write | claude-3-5-haiku-20241022 | Technical Writer | Anthropic Claude—polish prose, briefings, executive summaries, and cle | min_vram 0 GiB | ANTHROPIC_API_KEY |  | write |

### ollama (84 providers)

| ID | Model | Role | Good for | Hardware | Env | GP | Harness |
|---|---|---|---|---|---|---|---|
| ollama_agribot | ayansh03/agribot | Local Crop-Care Assistant | Agriculture chatbot for crop-care, plant disease triage, and irrigatio | min_vram 6 GiB | OLLAMA_HOST |  | general |
| ollama_agrillama | sike_aditya/AgriLlama | Local Agriculture Assistant | Agriculture-tuned Ollama model for crop, soil, and irrigation support. | min_vram 4 GiB | OLLAMA_HOST |  | general |
| ollama_codegemma | codegemma | Software Engineer | CodeGemma — Google code-specialized Gemma. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_codellama | codellama | Software Engineer | Code Llama — Meta code completion and generation. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_codeqwen | codeqwen | Software Engineer | CodeQwen — Qwen code variant. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_codestral | codestral | Software Engineer | Codestral — Mistral code model. | min_vram 12 GiB | OLLAMA_HOST |  | reason |
| ollama_cogito | cogito | Reasoning Assistant | Cogito — reasoning-oriented general model line. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_command_r | command-r | Research-Oriented Assistant | Cohere Command R — long-context, RAG-friendly general assistant. | min_vram 8 GiB | OLLAMA_HOST |  | research |
| ollama_deepcoder | deepcoder | Software Engineer | DeepCoder — code-specialized line on Ollama library. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_deepscaler | deepscaler | Math Assistant | DeepScaler — math/reasoning-tuned line. | min_vram 14 GiB | OLLAMA_HOST |  | general |
| ollama_deepseek_coder | deepseek-coder | Software Engineer | DeepSeek Coder — strong competitive coding and repo-style tasks. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_deepseek_coder_v2 | deepseek-coder-v2 | Senior Software Engineer | DeepSeek Coder V2 — larger coding model for complex patches. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_deepseek_llm | deepseek-llm | General Assistant | DeepSeek LLM — earlier DeepSeek general base. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_deepseek_r1 | deepseek-r1 | Reasoning Specialist | DeepSeek R1 — chain-of-thought style reasoning; math, logic, proofs sk | min_vram 14 GiB | OLLAMA_HOST |  | general |
| ollama_deepseek_v2 | deepseek-v2 | Analyst | DeepSeek V2 — prior DeepSeek generation for general chat. | min_vram 16 GiB | OLLAMA_HOST |  | general |
| ollama_deepseek_v3 | deepseek-v3 | Senior Analyst | DeepSeek V3 — large general+reasoning; heavy but capable. | min_vram 16 GiB | OLLAMA_HOST |  | general |
| ollama_devstral | devstral | Developer Tools Engineer | Devstral — dev-focused Mistral line for tooling workflows. | min_vram 12 GiB | OLLAMA_HOST |  | reason |
| ollama_dolphin3 | dolphin3 | Unrestricted Assistant | Dolphin 3 — uncensored-tuned line; use only where policy allows. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_falcon | falcon | General Assistant | Falcon — earlier TII general line. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_falcon3 | falcon3 | General Assistant | Falcon 3 — TII general instruct family. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_gemma | gemma | General Assistant | Original Gemma — smaller/older; simple tasks and classification-style | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_gemma2 | gemma2 | General Assistant | Gemma 2 — prior Gemma generation; stable general chat. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_gemma3 | gemma3 | General Assistant | Gemma 3 — Google open general model; good instruction following. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_gemma3n | gemma3n | General Assistant | Gemma 3n — efficient Gemma variant for lighter devices. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_glm4 | glm4 | Multilingual Analyst | GLM-4 — general multilingual (strong Chinese/English) chat. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_glm_4_7_flash | glm-4.7-flash | Fast Analyst | GLM 4.7 Flash — fast GLM line for interactive use. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_gpt_oss | gpt-oss | General Assistant | gpt-oss — open-weight models in OpenAI-style families on Ollama. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_granite3_1_moe | granite3.1-moe | Efficient Analyst | Granite 3.1 MoE — efficient MoE general model. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_granite3_2_vision | granite3.2-vision | Vision Analyst | Granite 3.2 Vision — IBM multimodal for enterprise visuals. | min_vram 10 GiB | OLLAMA_HOST |  | vision |
| ollama_granite3_3 | granite3.3 | Enterprise Assistant | IBM Granite 3.3 — enterprise-leaning general instruct. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_granite4 | granite4 | Enterprise Assistant | Granite 4 — newer IBM Granite general line. | min_vram 16 GiB | OLLAMA_HOST |  | general |
| ollama_granite_code | granite-code | Software Engineer | Granite Code — IBM code models for enterprise patterns. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_hermes3 | hermes3 | General Assistant | Hermes 3 — Nous general instruct; tool-use friendly style. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_lfm2 | lfm2 | Efficient Assistant | LFM2 — Liquid AI efficient foundation model. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_lfm2_5_thinking | lfm2.5-thinking | Reasoning Assistant | LFM2.5 Thinking — thinking-augmented efficient model. | min_vram 14 GiB | OLLAMA_HOST |  | general |
| ollama_llama2 | llama2 | General Assistant | Legacy Llama 2 — lighter hardware; ok for simple Q&A and drafts. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_llama3 | llama3 | General Assistant | Llama 3 base family — general-purpose local assistant. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_llama3_1 | llama3.1 | General Assistant | Llama 3.1 family — general chat, longer context than 3.2 for many tags | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_llama3_2 | llama3.2 | General Assistant | Default local generalist; planning, Q&A, summaries, light analysis. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_llama3_2_vision | llama3.2-vision | Vision Analyst | Llama 3.2 Vision — Meta multimodal; images + instructions. | min_vram 10 GiB | OLLAMA_HOST |  | vision |
| ollama_llama3_3 | llama3.3 | Senior Generalist | Stronger Llama 3.3 for harder general reasoning and longer outputs. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_llama4 | llama4 | Senior Generalist | Llama 4 when available — frontier-class local general model (large dow | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_llava | llava | Vision Analyst | LLaVA — image+text; describe screenshots, diagrams, UI. | min_vram 10 GiB | OLLAMA_HOST |  | general |
| ollama_llava_llama3 | llava-llama3 | Vision Analyst | LLaVA Llama 3 — stronger LLaVA backbone for vision QA. | min_vram 10 GiB | OLLAMA_HOST |  | general |
| ollama_magistral | magistral | Reasoning Specialist | Magistral — Mistral reasoning line. | min_vram 16 GiB | OLLAMA_HOST |  | general |
| ollama_minicpm_v | minicpm-v | Efficient Vision Assistant | MiniCPM-V — efficient vision-language for edge. | min_vram 10 GiB | OLLAMA_HOST |  | general |
| ollama_ministral_3 | ministral-3 | General Assistant | Ministral 3 — efficient Mistral line for edge and fast iteration. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_mistral | mistral | General Assistant | Mistral 7B-class general instruct; fast, good default for many subject | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_mistral_large | mistral-large | Senior Generalist | Mistral Large — demanding analysis, writing, and reasoning locally. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_mistral_nemo | mistral-nemo | General Assistant | Mistral Nemo — strong multilingual and general instruct mid-size. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_mistral_small | mistral-small | General Assistant | Mistral Small — balanced speed/quality for everyday tasks. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_mixtral | mixtral | Senior Generalist | Mixtral MoE — stronger general quality when you have RAM/VRAM. | min_vram 16 GiB | OLLAMA_HOST |  | general |
| ollama_moondream | moondream | Lightweight Vision Assistant | Moondream — tiny VLM for quick image Q&A. | min_vram 10 GiB | OLLAMA_HOST |  | general |
| ollama_nous_hermes | nous-hermes | General Assistant | Nous Hermes — earlier Nous instruct line. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_olmo2 | olmo2 | Research Assistant | OLMo 2 — open research LM; general knowledge tasks. | min_vram 8 GiB | OLLAMA_HOST |  | research |
| ollama_openchat | openchat | Conversational Assistant | OpenChat — conversational, assistant-style dialogue. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_openhermes | openhermes | General Assistant | OpenHermes — Mistral-based instruct tuning. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_openthinker | openthinker | Reasoning Specialist | OpenThinker — open reasoning-style assistant. | min_vram 14 GiB | OLLAMA_HOST |  | general |
| ollama_orca_mini | orca-mini | Lightweight Assistant | Orca Mini — tiny model for demos and smoke tests. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_phi | phi | Lightweight Assistant | Legacy Phi — very small; trivial classification and micro-tasks. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_phi3 | phi3 | Efficient Assistant | Phi-3 — Microsoft small model; fast reasoning on modest hardware. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_phi4 | phi4 | Analyst | Phi-4 — stronger small Microsoft model for reasoning and instruction. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_phi4_mini | phi4-mini | Efficient Assistant | Phi-4 mini — smallest Phi-4 line for edge and high throughput. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_phi4_reasoning | phi4-reasoning | Reasoning Assistant | Phi-4 reasoning — Microsoft small reasoning specialist. | min_vram 14 GiB | OLLAMA_HOST |  | general |
| ollama_qwen | qwen | General Assistant | Qwen base — legacy general Qwen family. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_qwen2 | qwen2 | General Analyst | Qwen 2 — earlier Qwen general; still useful for many languages. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_qwen2_5 | qwen2.5 | General Analyst | Qwen 2.5 general — strong multilingual and STEM-friendly chat. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_qwen2_5_coder | qwen2.5-coder | Software Engineer | Primary code model — implementation, refactors, scripts, APIs. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_qwen2_5vl | qwen2.5vl | Vision Analyst | Qwen2.5-VL — strong document and scene understanding. | min_vram 10 GiB | OLLAMA_HOST |  | general |
| ollama_qwen3 | qwen3 | General Analyst | Qwen 3 — newer general Qwen for harder questions and coding-adjacent c | cpu, gpu | OLLAMA_HOST |  | general |
| ollama_qwen3_5 | qwen3.5 | Senior Analyst | Qwen 3.5 — upgraded Qwen line for demanding general tasks. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_qwen3_coder | qwen3-coder | Software Engineer | Qwen3 Coder — newer coding-focused Qwen. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_qwen3_coder_next | qwen3-coder-next | Senior Software Engineer | Qwen3 Coder Next — latest Qwen coding line when available. | min_vram 12 GiB | OLLAMA_HOST |  | reason |
| ollama_qwen3_vl | qwen3-vl | Vision Analyst | Qwen3-VL — newer Qwen vision-language line. | min_vram 10 GiB | OLLAMA_HOST |  | general |
| ollama_qwq | qwq | Reasoning Specialist | QwQ — Qwen reasoning model for hard puzzles and math. | min_vram 16 GiB | OLLAMA_HOST |  | general |
| ollama_smollm | smollm | Lightweight Assistant | SmolLM — first-gen small LM line. | min_vram 4 GiB | OLLAMA_HOST |  | general |
| ollama_smollm2 | smollm2 | Efficient Assistant | SmolLM2 — Hugging Face small LM for edge. | min_vram 4 GiB | OLLAMA_HOST |  | general |
| ollama_starcoder | starcoder | Software Engineer | StarCoder — first-gen BigCode model. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_starcoder2 | starcoder2 | Software Engineer | StarCoder2 — BigCode family for code generation. | min_vram 8 GiB | OLLAMA_HOST |  | reason |
| ollama_tinyllama | tinyllama | Micro Assistant | TinyLlama — very small; prototyping only. | min_vram 4 GiB | OLLAMA_HOST |  | general |
| ollama_translategemma | translategemma | Translator | TranslateGemma — translation-focused Gemma; parallel text, localizatio | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_wizardlm2 | wizardlm2 | General Assistant | WizardLM 2 — complex instruction following. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_yi | yi | Multilingual Assistant | Yi — bilingual EN/ZH capable general models. | min_vram 8 GiB | OLLAMA_HOST |  | general |
| ollama_zephyr | zephyr | General Assistant | Zephyr — alignment-tuned small chat model. | min_vram 8 GiB | OLLAMA_HOST |  | general |
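
To shortlist local models for a given GPU, the Hardware column can be read mechanically. A minimal sketch, assuming the cell format used above (`min_vram N GiB`) and treating that value as a hard cutoff; this is a simplification, since Ollama can also run models partly or fully on CPU. Rows with other Hardware values (such as `cpu, gpu` for `ollama_qwen3`) are skipped rather than guessed. All function names are hypothetical, and `SAMPLE_TABLE` holds three rows excerpted from the table above with the Good for text shortened.

```python
import re
from typing import Dict, List, Optional

# Matches Hardware cells such as "min_vram 8 GiB" (the format used in these tables).
MIN_VRAM_RE = re.compile(r"min_vram\s+(\d+(?:\.\d+)?)\s*GiB")


def min_vram_gib(hardware_cell: str) -> Optional[float]:
    """Return the stated minimum VRAM in GiB, or None for cells like 'cpu, gpu'."""
    match = MIN_VRAM_RE.search(hardware_cell)
    return float(match.group(1)) if match else None


def split_cells(line: str) -> List[str]:
    """Split one pipe-table row into trimmed cells, keeping empty cells."""
    inner = line.strip()
    if inner.startswith("|"):
        inner = inner[1:]
    if inner.endswith("|"):
        inner = inner[:-1]
    return [cell.strip() for cell in inner.split("|")]


def parse_rows(table_md: str) -> List[Dict[str, str]]:
    """Turn a catalog table (header row, |---| separator, data rows) into dicts."""
    lines = [ln for ln in table_md.strip().splitlines() if ln.strip()]
    header = split_cells(lines[0])
    return [dict(zip(header, split_cells(ln))) for ln in lines[2:]]


def providers_that_fit(rows: List[Dict[str, str]], available_gib: float) -> List[str]:
    """IDs whose min_vram fits in available_gib; rows without min_vram are skipped."""
    fitting = []
    for row in rows:
        need = min_vram_gib(row.get("Hardware", ""))
        if need is not None and need <= available_gib:
            fitting.append(row["ID"])
    return fitting


SAMPLE_TABLE = """
| ID | Model | Role | Good for | Hardware | Env | GP | Harness |
|---|---|---|---|---|---|---|---|
| ollama_agrillama | sike_aditya/AgriLlama | Local Agriculture Assistant | Crop support. | min_vram 4 GiB | OLLAMA_HOST |  | general |
| ollama_codestral | codestral | Software Engineer | Codestral. | min_vram 12 GiB | OLLAMA_HOST |  | reason |
| ollama_qwen3 | qwen3 | General Analyst | Qwen 3. | cpu, gpu | OLLAMA_HOST |  | general |
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE_TABLE)
    print(providers_that_fit(rows, 8))   # ['ollama_agrillama']
    print(providers_that_fit(rows, 16))  # ['ollama_agrillama', 'ollama_codestral']
```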

### huggingface (62 providers)

| ID | Model | Role | Good for | Hardware | Env | GP | Harness |
|---|---|---|---|---|---|---|---|
| hf_codellama_7b | codellama/CodeLlama-7b-Instruct-hf | Code Specialist | HF Hub—Code Llama 7B instruct. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_cohere_aya_8b | CohereForAI/aya-expanse-8b | Multilingual Specialist | HF Hub—Cohere Aya 8B multilingual. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_cohere_command_r | CohereLabs/c4ai-command-r7b-12-2024 | RAG-Friendly Assistant | HF Hub—Cohere Command R (7B Dec’24 route via Inference Providers); RAG | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_deepseek_coder_6_7b | deepseek-ai/deepseek-coder-6.7b-instruct | Coder | HF Hub—DeepSeek Coder 6.7B for code-heavy tasks. | min_vram 0 GiB | HF_TOKEN |  | coding |
| hf_deepseek_r1_distill_llama_8b | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | Reasoning Assistant | HF Hub—R1-distilled Llama; good for math-like steps. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_deepseek_r1_distill_qwen_7b | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Reasoning Assistant | HF Hub—R1-distilled Qwen; concise chain-of-thought style. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_deepseek_v25 | deepseek-ai/DeepSeek-V2.5 | General Assistant | HF Hub—DeepSeek V2.5 general/chat; larger context tasks. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_falcon_7b | tiiuae/falcon-7b-instruct | General Chat | HF Hub—Falcon 7B instruct baseline. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_agri_chat_multilingual | mesabo/agri-chat-multilingual | Multilingual Extension-Style Gardening Advisor | Agriculture-focused multilingual chat model for extension-like guidanc | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_agriassist_llm | sikeaditya/AgriAssist_LLM | Applied Agronomy Advisor | Domain-oriented agriculture/gardening fine-tune intended for crop-care | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_agriculture_advisory_8b | Navinaa21/Agriculture-Advisory-LLM-8B | Crop and Vegetation Advisory Specialist | Agriculture advisory LLM for medium-depth guidance on crop management, | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_agriparam | bharatgenai/AgriParam | Agriculture Decision-Support Advisor | Agriculture decision-support assistant tuned for agronomy and farm adv | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_diagnostics_qwen2_vl_7b | Qwen/Qwen2-VL-7B-Instruct | Plant Health Diagnostics Specialist | Vision-capable gardening diagnostics model for leaf/stem/fruit image a | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_fast_triage_gemma3_4b | google/gemma-3-4b-it | Rapid Garden Triage Assistant | Fast first-pass gardening triage for quick follow-up questions, checkl | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_generalist_qwen25_14b | Qwen/Qwen2.5-14B-Instruct | Gardening Planning Advisor | Gardening generalist for practical home-garden guidance (plant selecti | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_irrigation_phi3_mini | YuvrajSingh9886/phi3-mini-fine-tuned-agr | Irrigation Optimization Advisor | Irrigation-focused agriculture Q&A. Use for turf/zone watering **minut | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_leaf_disease_vlm | YuchengShi/LLaVA-v1.5-7B-Plant-Leaf-Dise | Leaf Disease Visual Analyst | Plant leaf disease vision-language specialist fine-tuned for symptom r | min_vram 0 GiB | HF_TOKEN |  | vision |
| hf_garden_multilingual_aya_8b | CohereForAI/aya-expanse-8b | Multilingual Gardening Support Agent | Multilingual gardening advisor for non-English or mixed-language suppo | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_reasoning_r1_qwen7b | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | Garden Root-Cause Analyst | Deep troubleshooting model for complex gardening failures with multipl | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_garden_soil_water_command_r | CohereLabs/c4ai-command-r7b-12-2024 | Soil and Irrigation Planner | Primary pick for turf/lawn watering decisions—including zone watering | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_gemma2_27b | google/gemma-2-27b-it | Reasoning Assistant | HF Hub—Gemma 2 27B; stronger reasoning than 9B line. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_gemma2_9b | google/gemma-2-9b-it | Research Assistant | HF Hub—Gemma 2 9B instruct; Google Gemma chat. | min_vram 0 GiB | HF_TOKEN |  | research |
| hf_gemma3_4b | google/gemma-3-4b-it | Compact Assistant | HF Hub—compact Gemma 3 4B for edge-like cloud calls. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_hermes_llama31_8b | NousResearch/Hermes-3-Llama-3.1-8B | Tool-Aware Assistant | HF Hub—Hermes 3 on Llama 3.1 8B; tool-friendly tendencies. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_ibm_granite_8b | ibm-granite/granite-3.1-8b-instruct | Enterprise Coder-Analyst | HF Hub—IBM Granite 3.1 8B instruct. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_internlm25_7b | internlm/internlm2_5-7b-chat | Research Chat | HF Hub—InternLM2.5 7B chat. | min_vram 0 GiB | HF_TOKEN |  | research |
| hf_llama_3_2_11b_vision | meta-llama/Llama-3.2-11B-Vision-Instruct | Vision-Language Assistant | HF Hub—vision+language; describe images and charts when the crew passe | min_vram 0 GiB | HF_TOKEN |  | vision |
| hf_llama_3_2_3b | meta-llama/Llama-3.2-3B-Instruct | Lightweight Assistant | HF Hub—small fast Llama 3.2 for quick drafts and classification. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_llama_3_3_70b | meta-llama/Llama-3.3-70B-Instruct | Senior Analyst | HF Hub—Llama 3.3 70B; heavier reasoning and long-context tasks. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_magistral_small | mistralai/Magistral-Small-2509 | Reasoning Assistant | HF Hub—Mistral Magistral Small reasoning-oriented line. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_meta_llama_3_1_70b | meta-llama/Meta-Llama-3.1-70B-Instruct | Lead Assistant | HF Hub—Llama 3.1 70B instruct flagship tier. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_meta_llama_3_1_8b | meta-llama/Meta-Llama-3.1-8B-Instruct | Instruction-Following Assistant | HF Hub—Llama 3.1 8B; strong instruction following. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_meta_llama_3_8b | meta-llama/Meta-Llama-3-8B-Instruct | General Assistant | HF Hub—Llama 3 8B instruct; balanced chat and reasoning. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_mistral_7b_v3 | mistralai/Mistral-7B-Instruct-v0.3 | Chat Specialist | HF Hub—Mistral 7B v0.3 general instruct chat. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_mistral_small_24b | mistralai/Mistral-Small-24B-Instruct-250 | Technical Generalist | HF Hub—Mistral Small 24B instruct; good mid-size workhorse. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_mixtral_8x7b | mistralai/Mixtral-8x7B-Instruct-v0.1 | MoE Generalist | HF Hub—Mixtral MoE 8x7B; stronger quality at medium cost. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_nemotron_70b | nvidia/Llama-3.1-Nemotron-70B-Instruct-H | Technical Advisor | HF Hub—Nemotron 70B instruct; NVIDIA-tuned Llama family. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_olmo2_7b | allenai/OLMo-2-1124-7B-Instruct | Open Science Assistant | HF Hub—OLMo 2 7B instruct (Allen AI). | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_openchat_35 | openchat/openchat-3.5-0106 | Conversational Assistant | HF Hub—OpenChat 3.5 conversation model. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_openhermes_25_7b | teknium/OpenHermes-2.5-Mistral-7B | General Instruct | HF Hub—OpenHermes 2.5 on Mistral 7B. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_phi3_medium | microsoft/Phi-3-medium-4k-instruct | Explainer | HF Hub—Phi-3 medium instruct for longer explanations. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_phi3_mini | microsoft/Phi-3-mini-4k-instruct | Analyst | HF Hub—Phi-3 mini; strong small model for reasoning snippets. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_phi4_mini | microsoft/Phi-4-mini-instruct | Assistant | HF Hub—Phi-4 mini; Microsoft small instruct model. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_qwen25_14b | Qwen/Qwen2.5-14B-Instruct | Generalist | HF Hub—Qwen2.5 14B; stronger general instruct. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_qwen25_72b | Qwen/Qwen2.5-72B-Instruct | Research Lead | HF Hub—Qwen2.5 72B; heavy lifting for research-grade answers. | min_vram 0 GiB | HF_TOKEN |  | research |
| hf_qwen25_7b | Qwen/Qwen2.5-7B-Instruct | Multilingual Assistant | HF Hub—Qwen2.5 7B instruct; multilingual general tasks. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_qwen25_coder_32b | Qwen/Qwen2.5-Coder-32B-Instruct | Staff Engineer | HF Hub—Qwen2.5 Coder 32B; larger coding model. | min_vram 0 GiB | HF_TOKEN |  | reason |
| hf_qwen25_coder_7b | Qwen/Qwen2.5-Coder-7B-Instruct | Code Assistant | HF Hub—Qwen2.5 Coder 7B; code completion and refactoring hints. | min_vram 0 GiB | HF_TOKEN |  | coding |
| hf_qwen2_vl_7b | Qwen/Qwen2-VL-7B-Instruct | Vision Assistant | HF Hub—Qwen2-VL multimodal text+image understanding. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_qwen3_8b | Qwen/Qwen3-8B | Assistant | HF Hub—Qwen3 8B family; modern Qwen chat/reasoning. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_sambanova_qwen25_72b | sambanova/Qwen/Qwen2.5-72B-Instruct | Heavyweight Generalist | HF Hub via Sambanova—Qwen2.5 72B instruct. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_smollm2_1_7b | HuggingFaceTB/SmolLM2-1.7B-Instruct | Light Assistant | HF Hub—SmolLM2 tiny instruct for cheap passes. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_snowflake_arctic | snowflake/snowflake-arctic-instruct | Enterprise Assistant | HF Hub—Snowflake Arctic instruct for enterprise-flavored QA. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_solar_10b | upstage/SOLAR-10.7B-Instruct-v1.0 | Instruct Model | HF Hub—SOLAR 10.7B instruct (Upstage). | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_starchat2_15b | HuggingFaceH4/starchat2-15b-v0.1 | Code Chat | HF Hub—StarChat2 code conversation. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_starcoder2_15b | bigcode/starcoder2-15b | Code Generator | HF Hub—StarCoder2 15B for code generation. | min_vram 0 GiB | HF_TOKEN |  | coding |
| hf_tinyllama_1b | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | Micro Assistant | HF Hub—TinyLlama 1.1B for ultra-cheap generations. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_together_deepseek_r1 | together/deepseek-ai/DeepSeek-R1 | Reasoning Model | HF Hub via Together provider route—DeepSeek R1 class reasoning. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_together_llama3_70b | together/meta-llama/Meta-Llama-3-70B-Ins | Large Chat Model | HF Hub via Together—Llama 3 70B instruct. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_vicuna_7b | lmsys/vicuna-7b-v1.5 | Chat Model | HF Hub—Vicuna v1.5 7B chat baseline. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_yi_15_9b | 01-ai/Yi-1.5-9B-Chat-16K | Bilingual Assistant | HF Hub—Yi 1.5 9B chat with 16k flavor. | min_vram 0 GiB | HF_TOKEN |  | general |
| hf_zephyr_7b | HuggingFaceH4/zephyr-7b-beta | Helpful Assistant | HF Hub—Zephyr 7B aligned chat. | min_vram 0 GiB | HF_TOKEN |  | general |
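
Most Model cells in this table are plain Hub repository IDs (`owner/name`). Three rows (`hf_sambanova_qwen25_72b`, `hf_together_deepseek_r1`, `hf_together_llama3_70b`) carry an extra leading segment, and their descriptions say they go through the Sambanova or Together routes. A minimal sketch for telling the two shapes apart, assuming a three-segment ID always means `route/owner/name`; that rule is inferred from these rows rather than documented here, and `split_route` and `HubModel` are hypothetical names.

```python
from typing import NamedTuple, Optional


class HubModel(NamedTuple):
    route: Optional[str]  # provider-route prefix such as "together", or None
    repo_id: str          # Hub repository ID in "owner/name" form


def split_route(model_cell: str) -> HubModel:
    """Separate an optional provider-route prefix from the Hub repo ID.

    Assumption: three slash-separated segments mean "route/owner/name";
    two segments are a plain Hub repo ID. Anything else is rejected.
    """
    model = model_cell.strip()
    parts = model.split("/")
    if len(parts) == 3 and all(parts):
        return HubModel(route=parts[0], repo_id=f"{parts[1]}/{parts[2]}")
    if len(parts) == 2 and all(parts):
        return HubModel(route=None, repo_id=model)
    raise ValueError(f"unexpected model ID shape: {model_cell!r}")


if __name__ == "__main__":
    for cell in ("together/deepseek-ai/DeepSeek-R1", "Qwen/Qwen3-8B"):
        print(split_route(cell))
    # HubModel(route='together', repo_id='deepseek-ai/DeepSeek-R1')
    # HubModel(route=None, repo_id='Qwen/Qwen3-8B')
```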

### vllm (8 providers)

| ID | Model | Role | Good for | Hardware | Env | GP | Harness |
|---|---|---|---|---|---|---|---|
| vllm_tpu_google_gemma_3_27b_it | google/gemma-3-27b-it | TPU Inference Specialist | vLLM TPU recommended model: google/gemma-3-27b-it. | tpu | VLLM_BASE_URL |  | general |
| vllm_tpu_meta_llama_llama_3_1_8b_instruct | meta-llama/Llama-3.1-8B-Instruct | TPU Inference Specialist | vLLM TPU recommended model: meta-llama/Llama-3.1-8B-Instruct. | tpu | VLLM_BASE_URL |  | general |
| vllm_tpu_meta_llama_llama_3_3_70b_instruct | meta-llama/Llama-3.3-70B-Instruct | TPU Inference Specialist | vLLM TPU recommended model: meta-llama/Llama-3.3-70B-Instruct. | tpu | VLLM_BASE_URL |  | general |
| vllm_tpu_meta_llama_llama_guard_4_12b | meta-llama/Llama-Guard-4-12B | TPU Inference Specialist | vLLM TPU recommended model: meta-llama/Llama-Guard-4-12B. | tpu | VLLM_BASE_URL |  | general |
| vllm_tpu_qwen_qwen2_5_vl_7b_instruct | Qwen/Qwen2.5-VL-7B-Instruct | TPU Inference Specialist | vLLM TPU recommended model: Qwen/Qwen2.5-VL-7B-Instruct. | tpu | VLLM_BASE_URL |  | general |
| vllm_tpu_qwen_qwen3_30b_a3b | Qwen/Qwen3-30B-A3B | TPU Inference Specialist | vLLM TPU recommended model: Qwen/Qwen3-30B-A3B. | tpu | VLLM_BASE_URL |  | general |
| vllm_tpu_qwen_qwen3_32b | Qwen/Qwen3-32B | TPU Inference Specialist | vLLM TPU recommended model: Qwen/Qwen3-32B. | tpu | VLLM_BASE_URL |  | general |
| vllm_tpu_qwen_qwen3_4b | Qwen/Qwen3-4B | TPU Inference Specialist | vLLM TPU recommended model: Qwen/Qwen3-4B. | tpu | VLLM_BASE_URL |  | general |

### jetstream (22 providers)

| ID | Model | Role | Good for | Hardware | Env | GP | Harness |
|---|---|---|---|---|---|---|---|
| jetstream_tpu_google_gemma_2b | google/gemma-2b | TPU Inference Specialist | JetStream PyTorch listed model: google/gemma-2b. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_google_gemma_2b_it | google/gemma-2b-it | TPU Inference Specialist | JetStream PyTorch listed model: google/gemma-2b-it. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_google_gemma_7b | google/gemma-7b | TPU Inference Specialist | JetStream PyTorch listed model: google/gemma-7b. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_google_gemma_7b_it | google/gemma-7b-it | TPU Inference Specialist | JetStream PyTorch listed model: google/gemma-7b-it. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_2_13b_chat_hf | meta-llama/Llama-2-13b-chat-hf | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-2-13b-chat-hf. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_2_13b_hf | meta-llama/Llama-2-13b-hf | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-2-13b-hf. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_2_70b_chat_hf | meta-llama/Llama-2-70b-chat-hf | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-2-70b-chat-hf. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_2_70b_hf | meta-llama/Llama-2-70b-hf | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-2-70b-hf. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_2_7b_chat_hf | meta-llama/Llama-2-7b-chat-hf | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-2-7b-chat-hf. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_2_7b_hf | meta-llama/Llama-2-7b-hf | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-2-7b-hf. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_3_1_8b | meta-llama/Llama-3.1-8B | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-3.1-8B. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_3_1_8b_instruct | meta-llama/Llama-3.1-8B-Instruct | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-3.1-8B-Instruct. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_3_2_1b | meta-llama/Llama-3.2-1B | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-3.2-1B. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_3_2_1b_instruct | meta-llama/Llama-3.2-1B-Instruct | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-3.2-1B-Instruct. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_3_3_70b | meta-llama/Llama-3.3-70B | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-3.3-70B. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_llama_3_3_70b_instruct | meta-llama/Llama-3.3-70B-Instruct | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Llama-3.3-70B-Instruct. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_meta_llama_3_70b | meta-llama/Meta-Llama-3-70B | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Meta-Llama-3-70B. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_meta_llama_3_70b_instruct | meta-llama/Meta-Llama-3-70B-Instruct | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Meta-Llama-3-70B-Instruct. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_meta_llama_3_8b | meta-llama/Meta-Llama-3-8B | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Meta-Llama-3-8B. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_meta_llama_meta_llama_3_8b_instruct | meta-llama/Meta-Llama-3-8B-Instruct | TPU Inference Specialist | JetStream PyTorch listed model: meta-llama/Meta-Llama-3-8B-Instruct. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_mistralai_mixtral_8x7b_instruct_v0_1 | mistralai/Mixtral-8x7B-Instruct-v0.1 | TPU Inference Specialist | JetStream PyTorch listed model: mistralai/Mixtral-8x7B-Instruct-v0.1. | tpu | JETSTREAM_BASE_URL |  | general |
| jetstream_tpu_mistralai_mixtral_8x7b_v0_1 | mistralai/Mixtral-8x7B-v0.1 | TPU Inference Specialist | JetStream PyTorch listed model: mistralai/Mixtral-8x7B-v0.1. | tpu | JETSTREAM_BASE_URL |  | general |
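
Every row above names the environment variable its provider depends on (Env column). Before a `--dynamic` run, a quick pre-flight check can list providers whose variable is unset. A minimal sketch with a hypothetical `missing_env` helper; the ID-to-variable mapping is copied from the Env column by hand, and a set variable says nothing about whether the key is valid or the endpoint actually responds.

```python
import os
from typing import Dict, Mapping


def missing_env(required: Mapping[str, str],
                environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return {provider_id: env_var} for providers whose variable is unset or empty."""
    return {pid: var for pid, var in required.items() if not environ.get(var)}


if __name__ == "__main__":
    # Provider ID -> Env column, copied from the vllm and jetstream tables.
    required = {
        "vllm_tpu_qwen_qwen3_4b": "VLLM_BASE_URL",
        "jetstream_tpu_google_gemma_2b": "JETSTREAM_BASE_URL",
    }
    fake_env = {"VLLM_BASE_URL": "http://localhost:8000"}  # hypothetical endpoint
    print(missing_env(required, fake_env))
    # {'jetstream_tpu_google_gemma_2b': 'JETSTREAM_BASE_URL'}
```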