LLMs Explained for Builders

You don't need to know how a car engine works to drive well. Same with LLMs. But you do need to know what the gas pedal, brakes, and steering wheel do.

This guide covers what matters for people who use AI to build things.

The One-Paragraph Version

Large Language Models (LLMs) predict the next word in a sequence. They've read billions of documents, so they're very good at it. So good that "predicting the next word" looks a lot like "understanding and reasoning." They don't actually think — they pattern-match at superhuman scale.

That's it. Everything else is details.

What You Actually Need to Know

1. Context Windows = Short-Term Memory

Every LLM has a context window — the amount of text it can "see" at once.

  • GPT-4o: ~128K tokens (~300 pages)
  • Claude 3.5: ~200K tokens (~500 pages)
  • Gemini 1.5: ~1M tokens (~2,500 pages)

Why this matters for you: If you're building a product that needs to process long documents, context window size determines which model you choose. If you're doing simple chat, even 8K tokens is fine.
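A quick way to sanity-check fit is the common "~4 characters per token" heuristic. This is a rough sketch — the window sizes mirror the approximate figures above, and a real tokenizer (such as tiktoken) would give exact counts:

```python
# Rough check: will a document fit in a model's context window?
# Window sizes are approximate; ~4 chars/token is a heuristic, not exact.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3.5": 200_000,
    "gemini-1.5": 1_000_000,
}

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // 4

def fits(text: str, model: str, reserve_for_output: int = 4_000) -> bool:
    """True if the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOWS[model]
```

Reserving a few thousand tokens for the model's reply matters: the window covers input and output combined.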

2. Temperature = Creativity vs. Consistency

Temperature controls randomness:

  • 0.0 = Near-deterministic. Same input → (almost always) same output. Good for data extraction, classification.
  • 0.7 = Balanced. Good for most tasks.
  • 1.0+ = Creative. Good for brainstorming, but less reliable.

Why this matters for you: If your product needs consistent, reliable outputs (like a customer support bot), use low temperature. If it needs creative variety (like a writing tool), use higher temperature.
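Under the hood, temperature divides the model's raw scores (logits) before sampling. A toy sketch of the mechanism — not any real model's sampler, but the math is the same shape:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Sample one token index from logits scaled by temperature.

    Lower temperature sharpens the distribution (more consistent);
    higher temperature flattens it (more random). Temperature 0 is
    treated as greedy argmax. Toy illustration only.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy pick
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1
```

Dividing by a small temperature exaggerates the gap between the best token and the rest; dividing by a large one shrinks it, so unlikely tokens get picked more often.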

3. Prompting is Your API

The way you talk to the model IS the programming. This isn't a limitation — it's a feature.

Basic prompt patterns that matter:

// Role assignment
"You are a senior copywriter at a SaaS company..."

// Few-shot examples
"Here are 3 examples of good output: [examples]
Now do the same for: [input]"

// Chain of thought
"Think through this step by step before giving your answer."

// Output formatting
"Respond in JSON with these fields: title, summary, score"
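These patterns compose. A minimal sketch of assembling them into one prompt string — plain string building, nothing model-specific, and the function name and parameters are illustrative:

```python
def build_prompt(role, examples, task, output_fields=None):
    """Combine the patterns above: role assignment, few-shot examples,
    chain of thought, and output formatting, in one prompt string."""
    parts = [f"You are {role}."]
    if examples:
        parts.append("Here are examples of good output:")
        for inp, out in examples:
            parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append("Think through this step by step before giving your answer.")
    if output_fields:
        parts.append("Respond in JSON with these fields: " + ", ".join(output_fields))
    parts.append(f"Now do the same for: {task}")
    return "\n\n".join(parts)
```

The point isn't this particular helper — it's that prompts are just strings, so the same software habits apply: factor out templates, version them, and test them.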

4. Tokens ≠ Words

LLMs process tokens, not words. A token is roughly 3/4 of a word.

  • "Hello" = 1 token
  • "unbelievable" = 3 tokens
  • Code tends to use more tokens per "word"

Why this matters for you: Pricing is per-token. A 1,000-word article is roughly 1,333 tokens. At Claude's pricing (~$3/million input tokens), that's $0.004 to process. Basically free.
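The arithmetic above is worth having as a one-liner. A sketch using the same "~3/4 of a word per token" rule (i.e. ~1.33 tokens per word); the per-million price is illustrative, so check current provider pricing:

```python
def input_cost_usd(word_count, price_per_million_tokens=3.0, tokens_per_word=4 / 3):
    """Estimate input-token cost for a document.

    tokens_per_word ~1.33 matches the 'a token is ~3/4 of a word' rule.
    The default price is an example figure, not a quoted rate.
    """
    tokens = word_count * tokens_per_word
    return tokens * price_per_million_tokens / 1_000_000

# A 1,000-word article at ~$3/M input tokens:
# input_cost_usd(1000) ≈ 0.004
```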

5. Fine-Tuning vs. Prompting vs. RAG

Three ways to customize AI behavior:

  • Prompting — 90% of use cases. Cost: $. Effort: low.
  • RAG (Retrieval) — need current/private data. Cost: $$. Effort: medium.
  • Fine-Tuning — need consistent specialized behavior. Cost: $$$. Effort: high.

Start with prompting. Most people who think they need fine-tuning just need better prompts.
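To make RAG concrete: retrieve the most relevant documents, then stuff them into the prompt. A toy sketch — real systems use embeddings and a vector store instead of word overlap, but the retrieve-then-prompt shape is the same:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: score documents by word overlap with the query
    and return the top k. Stand-in for embedding similarity search."""
    query_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: -len(query_words & set(d.lower().split())))
    return scored[:k]

def rag_prompt(query: str, documents: list[str]) -> str:
    """Build a prompt that grounds the answer in retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

This is why RAG handles current or private data without retraining: the knowledge lives in your documents, and the model only sees what you retrieve per request.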

Common Mistakes

  1. Treating AI like a search engine. It's not looking things up — it's generating responses based on patterns. It can be confidently wrong.

  2. Not giving enough context. The more specific your prompt, the better the output. "Write a blog post" is bad. "Write a 500-word blog post about X for Y audience in Z tone" is good.

  3. Expecting perfection. AI output is a first draft. Always. Build your workflow around editing, not publishing raw output.

  4. Ignoring costs at scale. $0.004 per request seems free until you're doing 1 million requests/month — that's $4,000/month. Model choice matters for production.

The Decision Framework

When choosing a model for your product:

  1. What's the task? (Classification, generation, analysis, conversation)
  2. What's the quality bar? (Perfect accuracy vs. "good enough")
  3. What's the volume? (10 requests/day vs. 10,000/hour)
  4. What's the latency requirement? (Real-time vs. batch processing)

Match these to a model. Don't default to the biggest, most expensive option.
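The four questions above can be sketched as a decision function. The thresholds and tier names here are illustrative placeholders, not vendor recommendations — the point is that the mapping should be explicit, not a default to the biggest model:

```python
def pick_model_tier(volume_per_day: int, latency: str, quality: str) -> str:
    """Map task requirements to a model tier. Thresholds are examples.

    latency: "real-time" or "batch"
    quality: "perfect" or "good enough"
    """
    if quality == "perfect":
        return "large model + human review"  # accuracy bar drives cost
    if latency == "real-time" or volume_per_day > 100_000:
        return "small/fast model"  # speed and volume favor cheap models
    return "mid-tier model"
```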


This guide is part of FOMA's fundamentals series. We explain AI concepts for people who build things, not people who build models.