{"slug":"llama-4-scout","id":"llama-4-scout","type":"model","title":"Llama 4 Scout","description":"Meta's efficient open-source MoE model with 109B total parameters (17B active). Features the largest context window of any model at 10M tokens.","last_updated":"2026-04-10","last_verified":null,"verification_status":"unverified","markdown_url":"/content/models/llama-4-scout.md","html_url":"/models/llama-4-scout","api_url":"/api/v1/models/llama-4-scout.json","content_hash":"bd39ae1c5f0d58eac4f04ca023b65a63d4b1957f7f373f2f2c49d159eec1b42c","sha256":"bd39ae1c5f0d58eac4f04ca023b65a63d4b1957f7f373f2f2c49d159eec1b42c","provider":"Meta","pricing":{"input":"Free (self-hosted)","output":"Free (self-hosted)","free":true},"benchmarks":{"reasoning":80,"coding":79,"math":77,"writing":81,"multilingual":79,"speed":88},"tags":["meta","open-source","text","image"],"website":"https://llama.meta.com","release_date":"2025-04","relationships":{"links":[],"related":[{"id":"llama-4-maverick","title":"Llama 4 Maverick","type":"model","html_url":"/models/llama-4-maverick","markdown_url":"/content/models/llama-4-maverick.md","shared_tags":["meta","open-source","text","image"],"score":8},{"id":"gemma-3","title":"Gemma 3","type":"model","html_url":"/models/gemma-3","markdown_url":"/content/models/gemma-3.md","shared_tags":["open-source","text","image"],"score":5},{"id":"gemma-4","title":"Gemma 4","type":"model","html_url":"/models/gemma-4","markdown_url":"/content/models/gemma-4.md","shared_tags":["open-source","text","image"],"score":5},{"id":"mistral-small-4","title":"Mistral Small 4","type":"model","html_url":"/models/mistral-small-4","markdown_url":"/content/models/mistral-small-4.md","shared_tags":["open-source","text","image"],"score":5},{"id":"qwen-3.5","title":"Qwen 3.5 397B-A17B","type":"model","html_url":"/models/qwen-3.5","markdown_url":"/content/models/qwen-3.5.md","shared_tags":["open-source","text","image"],"score":5},{"id":"claude-haiku-4.5","title":"Claude Haiku 
4.5","type":"model","html_url":"/models/claude-haiku-4.5","markdown_url":"/content/models/claude-haiku-4.5.md","shared_tags":["text","image"],"score":4}],"explicit":{}},"metadata":{"title":"Llama 4 Scout","type":"model","id":"llama-4-scout","provider":"Meta","model_type":"open-source","release_date":"2025-04","description":"Meta's efficient open-source MoE model with 109B total parameters (17B active). Features the largest context window of any model at 10M tokens.","last_updated":"2026-04-10","context_window":"10M tokens","website":"https://llama.meta.com","license":"Llama Community License","modality":["text","image"],"tags":["meta","open-source","text","image"],"pricing":{"input":"Free (self-hosted)","output":"Free (self-hosted)","free":true},"benchmarks":{"reasoning":80,"coding":79,"math":77,"writing":81,"multilingual":79,"speed":88},"parameters":"109B total (17B active)","hardware_requirements":"1x A100 80GB (FP16); single RTX 4090 with Q4 quantization","best_for":["Long-context applications","Fine-tuning","Edge deployment","Learning AI development"]},"content_text":"# Llama 4 Scout\n\nThe 10M token context window is the headline, and it's not a gimmick. Scout can ingest entire codebases, full legal document sets, or months of conversation history in a single pass -- no other model comes close. At 109B total parameters with only 17B active, it runs on a single A100 or a quantized RTX 4090.\n\nScout is the practical choice for teams that need long-context processing on their own hardware. The Llama Community License keeps it free for most commercial use, and the lightweight architecture means inference costs stay manageable even at massive context lengths. Speed at 88/100 is strong, making it viable for interactive applications despite the huge context.\n\nThe quality tradeoffs are real. Reasoning (80), coding (79), and math (77) are all a clear step below Maverick and well behind proprietary models. 
Scout is not the model you choose for hard problems -- it's the model you choose for problems that require absorbing enormous amounts of text before answering. Think retrieval-heavy RAG pipelines, long-form document QA, or codebase-wide search.\n\n**When to pick something else:** For anything quality-sensitive that doesn't require extreme context, Maverick is the better Llama. For coding, DeepSeek V3.2 at $0.27/$1.10 per million input/output tokens is both smarter and cheaper via API. Scout's unique value is that 10M context window -- if you don't need it, you're leaving quality on the table by choosing this over stronger models.","content_length":2346,"generated_at":"2026-04-24"}