Open-weight models like Meta's Llama 3, Mistral, and Qwen can be self-hosted using tools like Ollama or deployed via providers like Together.ai and Groq. The tradeoff versus closed models such as GPT-4o and Claude: lower cost and stronger data privacy, but typically lower capability at the frontier. For non-sensitive, high-volume tasks, open-source models can reduce AI inference costs by 70–90%.
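A minimal sketch of what a 70–90% reduction means in practice. The per-million-token prices below are hypothetical placeholders, not quotes from any provider; plug in current rates to get a real estimate.

```python
# Illustrative cost comparison for the 70-90% savings range.
# All prices are hypothetical placeholders, not current provider rates.

def savings_pct(closed_price_per_mtok: float, open_price_per_mtok: float) -> float:
    """Percent cost reduction from switching closed -> open model pricing."""
    return 100 * (1 - open_price_per_mtok / closed_price_per_mtok)

closed_price = 10.00      # assumed $/1M tokens for a frontier closed model
open_price = 1.00         # assumed $/1M tokens for a hosted open model

monthly_tokens_m = 500    # assumed 500M tokens/month of non-sensitive traffic
closed_cost = closed_price * monthly_tokens_m
open_cost = open_price * monthly_tokens_m

print(f"Savings: {savings_pct(closed_price, open_price):.0f}%")   # -> Savings: 90%
print(f"Monthly: ${closed_cost:,.0f} vs ${open_cost:,.0f}")       # -> Monthly: $5,000 vs $500
```

At these placeholder prices the reduction sits at the top of the quoted range; a smaller price gap (e.g. $10 vs $3 per million tokens) lands at the 70% end.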