LLMs Explained for Builders

You don't need to know how a car engine works to drive well. Same with LLMs. But you do need to know what the gas pedal, brakes, and steering wheel do.

This guide covers what matters for people who use AI to build things.

The One-Paragraph Version

Large Language Models (LLMs) predict the next word in a sequence. They've read billions of documents, so they're very good at it. So good that "predicting the next word" looks a lot like "understanding and reasoning." They don't actually think — they pattern-match at superhuman scale.

That's it. Everything else is details.

What You Actually Need to Know

1. Context Windows = Short-Term Memory

Every LLM has a context window — the amount of text it can "see" at once.

  • GPT-4o: ~128K tokens (~300 pages)
  • Claude 3.5: ~200K tokens (~500 pages)
  • Gemini 1.5: ~1M tokens (~2,500 pages)

Why this matters for you: If you're building a product that needs to process long documents, context window size determines which model you choose. If you're doing simple chat, even 8K tokens is fine.
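A quick way to sanity-check fit is the common "~4 characters per token" heuristic. This is a rough sketch — the window sizes mirror the approximate figures above, and a real tokenizer (such as tiktoken) would give exact counts:

```python
# Rough check: will a document fit in a model's context window?
# Window sizes are approximate; ~4 chars/token is a heuristic, not exact.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3.5": 200_000,
    "gemini-1.5": 1_000_000,
}

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // 4

def fits(text: str, model: str, reserve_for_output: int = 4_000) -> bool:
    """True if the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOWS[model]
```

Reserving a few thousand tokens for the model's reply matters: the window covers input and output combined.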

2. Temperature = Creativity vs. Consistency

Temperature controls randomness:

  • 0.0 = Near-deterministic. Same input → (almost always) same output. Good for data extraction, classification.
  • 0.7 = Balanced. Good for most tasks.
  • 1.0+ = Creative. Good for brainstorming, but less reliable.

Why this matters for you: If your product needs consistent, reliable outputs (like a customer support bot), use low temperature. If it needs creative variety (like a writing tool), use higher temperature.
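Under the hood, temperature divides the model's raw scores (logits) before sampling. A toy sketch of the mechanism — not any real model's sampler, but the math is the same shape:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Sample one token index from logits scaled by temperature.

    Lower temperature sharpens the distribution (more consistent);
    higher temperature flattens it (more random). Temperature 0 is
    treated as greedy argmax. Toy illustration only.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])  # greedy pick
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1
```

Dividing by a small temperature exaggerates the gap between the best token and the rest; dividing by a large one shrinks it, so unlikely tokens get picked more often.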

3. Prompting is Your API

The way you talk to the model IS the programming. This isn't a limitation — it's a feature.

Basic prompt patterns that matter:

// Role assignment
"You are a senior copywriter at a SaaS company..."

// Few-shot examples
"Here are 3 examples of good output: [examples]
Now do the same for: [input]"

// Chain of thought
"Think through this step by step before giving your answer."

// Output formatting
"Respond in JSON with these fields: title, summary, score"
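These patterns compose. A minimal sketch of assembling them into one prompt string — plain string building, nothing model-specific, and the function name and parameters are illustrative:

```python
def build_prompt(role, examples, task, output_fields=None):
    """Combine the patterns above: role assignment, few-shot examples,
    chain of thought, and output formatting, in one prompt string."""
    parts = [f"You are {role}."]
    if examples:
        parts.append("Here are examples of good output:")
        for inp, out in examples:
            parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append("Think through this step by step before giving your answer.")
    if output_fields:
        parts.append("Respond in JSON with these fields: " + ", ".join(output_fields))
    parts.append(f"Now do the same for: {task}")
    return "\n\n".join(parts)
```

The point isn't this particular helper — it's that prompts are just strings, so the same software habits apply: factor out templates, version them, and test them.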

4. Tokens ≠ Words

LLMs process tokens, not words. A token is roughly 3/4 of a word.

  • "Hello" = 1 token
  • "unbelievable" = 3 tokens
  • Code tends to use more tokens per "word"

Why this matters for you: Pricing is per-token. A 1,000-word article is roughly 1,333 tokens. At Claude's pricing (~$3/million input tokens), that's $0.004 to process. Basically free.
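The arithmetic above is worth having as a one-liner. A sketch using the same "~3/4 of a word per token" rule (i.e. ~1.33 tokens per word); the per-million price is illustrative, so check current provider pricing:

```python
def input_cost_usd(word_count, price_per_million_tokens=3.0, tokens_per_word=4 / 3):
    """Estimate input-token cost for a document.

    tokens_per_word ~1.33 matches the 'a token is ~3/4 of a word' rule.
    The default price is an example figure, not a quoted rate.
    """
    tokens = word_count * tokens_per_word
    return tokens * price_per_million_tokens / 1_000_000

# A 1,000-word article at ~$3/M input tokens:
# input_cost_usd(1000) ≈ 0.004
```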

5. Fine-Tuning vs. Prompting vs. RAG

Three ways to customize AI behavior:

  • Prompting — 90% of use cases. Cost: $. Effort: low.
  • RAG (Retrieval) — need current/private data. Cost: $$. Effort: medium.
  • Fine-Tuning — need consistent specialized behavior. Cost: $$$. Effort: high.

Start with prompting. Most people who think they need fine-tuning just need better prompts.
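To make RAG concrete: retrieve the most relevant documents, then stuff them into the prompt. A toy sketch — real systems use embeddings and a vector store instead of word overlap, but the retrieve-then-prompt shape is the same:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: score documents by word overlap with the query
    and return the top k. Stand-in for embedding similarity search."""
    query_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: -len(query_words & set(d.lower().split())))
    return scored[:k]

def rag_prompt(query: str, documents: list[str]) -> str:
    """Build a prompt that grounds the answer in retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

This is why RAG handles current or private data without retraining: the knowledge lives in your documents, and the model only sees what you retrieve per request.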

Common Mistakes

  1. Treating AI like a search engine. It's not looking things up — it's generating responses based on patterns. It can be confidently wrong.

  2. Not giving enough context. The more specific your prompt, the better the output. "Write a blog post" is bad. "Write a 500-word blog post about X for Y audience in Z tone" is good.

  3. Expecting perfection. AI output is a first draft. Always. Build your workflow around editing, not publishing raw output.

  4. Ignoring costs at scale. $0.004 per request seems free until you're doing 1 million requests/month — that's $4,000/month. Model choice matters for production.

The Decision Framework

When choosing a model for your product:

  1. What's the task? (Classification, generation, analysis, conversation)
  2. What's the quality bar? (Perfect accuracy vs. "good enough")
  3. What's the volume? (10 requests/day vs. 10,000/hour)
  4. What's the latency requirement? (Real-time vs. batch processing)

Match these to a model. Don't default to the biggest, most expensive option.
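The four questions above can be sketched as a decision function. The thresholds and tier names here are illustrative placeholders, not vendor recommendations — the point is that the mapping should be explicit, not a default to the biggest model:

```python
def pick_model_tier(volume_per_day: int, latency: str, quality: str) -> str:
    """Map task requirements to a model tier. Thresholds are examples.

    latency: "real-time" or "batch"
    quality: "perfect" or "good enough"
    """
    if quality == "perfect":
        return "large model + human review"  # accuracy bar drives cost
    if latency == "real-time" or volume_per_day > 100_000:
        return "small/fast model"  # speed and volume favor cheap models
    return "mid-tier model"
```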


This guide is part of FOMA's fundamentals series. We explain AI concepts for people who build things, not people who build models.