Inference is what happens every time you call the OpenAI or Anthropic API. Training is expensive and rare; inference is cheap per call but runs continuously, so it dominates spend at scale. Inference cost optimization — choosing the right model size for each task, caching frequent responses, batching requests — is a core skill for founders building AI products, where API costs can become a meaningful percentage of COGS.
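Of the techniques above, response caching is the easiest to sketch. Below is a minimal in-memory cache keyed on the model name and prompt; `CachedClient` and `fake_llm` are hypothetical names, and a real deployment would wrap an actual API client and likely use a shared store such as Redis with a TTL instead of a Python dict.

```python
import hashlib
import json


def cache_key(model: str, prompt: str) -> str:
    # Deterministic key: hash the model name and prompt together,
    # so identical requests map to the same cache entry.
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class CachedClient:
    """Wraps an LLM call with an in-memory response cache (illustrative sketch)."""

    def __init__(self, llm_call):
        self._llm_call = llm_call  # e.g. a function that hits the real API
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def complete(self, model: str, prompt: str) -> str:
        key = cache_key(model, prompt)
        if key in self._cache:
            self.hits += 1          # cache hit: no API call, no cost
            return self._cache[key]
        self.misses += 1            # cache miss: pay for one API call
        response = self._llm_call(model, prompt)
        self._cache[key] = response
        return response


# Hypothetical stand-in for a real API call, so the sketch runs offline.
def fake_llm(model: str, prompt: str) -> str:
    return f"[{model}] answer to: {prompt}"


client = CachedClient(fake_llm)
first = client.complete("small-model", "What is 2+2?")   # miss: calls the "API"
second = client.complete("small-model", "What is 2+2?")  # hit: served from cache
```

Note that exact-match caching only pays off for genuinely repeated prompts (FAQ-style queries, shared system prompts); near-duplicate questions need a semantic cache, which is a separate technique.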