Llama

Privacy: Run it on your own VPC (AWS Bedrock, Azure, or self-hosted).
Fine-Tuning: It is the default base model for fine-tuning on domain data.
Cost: Inference on Groq/Together AI is significantly cheaper than GPT.

Meta Llama is the king of Open Weights models. Llama 4 (2025) pushes 405B+ parameters, rivaling closed models like GPT-5.

When to Use

Running models at 4-bit or 8-bit precision to fit in VRAM with minimal quality loss (GGUF, EXL2).

Standardized tooling for building agentic apps on Llama.

Do:

Use via API: Groq (LPU) runs Llama Instantaneously (>1000 tok/s).
Fine-Tune 8B: For specific tasks (classification, SQL generation), a fine-tuned 8B beats a generic 70B.

Don't: