The Weekend Builder's Intensive
Ship a real product with Claude. Build a local AI agent. Two days, two things that actually work.
Before the weekend
~2 hours the evening before, so the weekend is spent building, not installing.
0.1 Setup checklist
For the non-tech learner
Tick each box. If something fails, that's normal: note the error and move on; you'll have buffer time on Saturday morning. A "verify" command just confirms a tool installed correctly: it prints a version number if all is well.
Accounts (all free to start)
Laptop (the machine you code on)
Mac mini (for Sunday)
Pre-reading (15-min skim)
Alternatives (for later)
- Editor: Cursor (AI-first) or Zed instead of VS Code.
- Local model runner: LM Studio (point-and-click, no terminal) or llama.cpp instead of Ollama.
- Node install: nvm, to switch Node versions easily.
0.2 Credentials & keys tracker
For the non-tech learner
Keys are easy to lose. Track where each one lives, but never paste a real secret value into a shared doc or your code. Keep real values in a password manager.
| Item | Where it lives | Notes |
|---|---|---|
| Anthropic API key | Password manager + .env | Starts sk-ant-…; never commit to GitHub |
| GitHub login | Password manager | Enable 2-factor auth |
| Vercel login | "Sign in with GitHub" | No separate password |
| Google Places key | Password manager + .env | Restrict the key in Google Cloud |
Ship a vibe-coded product
Goal: a working web app at a live public URL, version-controlled and auto-deploying.
How a web app is wired
For the non-tech learner
A web app is four parts talking to each other. You don't need to memorize this; just recognize the words when Claude uses them.
flowchart LR
U([User browser]) -->|clicks, types| FE[Frontend<br/>what users see]
FE -->|asks for data| BE[Backend<br/>rules and logic]
BE -->|reads / writes| DB[(Database<br/>stored data)]
BE -->|calls| EXT[External APIs<br/>Claude, Maps]
EXT --> BE --> FE --> U
Frontend = seen by user · Backend = logic · Database = storage · API = the messenger between systems.
The whole Saturday, as one flow
flowchart LR
A[Idea] --> B[PRD / MVP]
B --> C[Build with Claude Code]
C --> D[Test on localhost]
D --> E[Push to GitHub]
E --> F[Deploy to Vercel]
F --> G([Live public URL])
G -.feedback.-> B
1.1 Product vision statement
For the non-tech learner
One paragraph that keeps you honest. When you're tempted to add a 10th feature at 4pm, re-read this. If the new idea doesn't serve it, it waits.
PRODUCT VISION
My app is called: <NAME>
It helps: <WHO: the specific person>
do: <WHAT: the one core job>
so that: <WHY: the benefit they get>
In one sentence: "<NAME> lets <WHO> <DO THE JOB> without <THE OLD PAIN>."
The single thing it must do well by tonight: <ONE CORE FEATURE>
1.2 One-page PRD
For the non-tech learner
A PRD (Product Requirements Document) is just a plan on one page: what you're building and where the edges are. The most important section is "Out of scope": it is what protects your timeline this weekend. Let Claude draft it, then you cut it down.
PRD: <APP NAME>
Date: <DATE>
1. PROBLEM: <who hurts, and how>
2. TARGET USER: <one specific kind of person>
3. CORE VALUE: <the one thing that must work>
4. MVP FEATURES (max 5; be ruthless)
   [ ] F1: <feature> (must have)
   [ ] F2: <feature> (must have)
   [ ] F3: <feature> (nice if time)
5. USER STORIES: As a <user>, I want <action>, so that <benefit>.
6. SCREENS: <Home>: <what's on it>
7. DATA: <what gets stored>
8. OUT OF SCOPE: <login, payments, mobile app...> <- protect this
9. DONE = <one sentence describing the demo>
Alternatives (for later)
- For bigger projects, split into separate spec + roadmap docs, or use Notion / Linear. For now, one file is better.
1.3 CLAUDE.md: Claude Code's memory
For the non-tech learner
This file is Claude Code's "sticky note" about your project; Claude Code reads it automatically at the start of each session. Keep it short. For every line ask: "if I deleted this, would Claude make a mistake?" If no, delete it. Generate a starter with the /init command, then trim.
# Project: <APP NAME>

## What this is
<One line. e.g. "A habit tracker: add habits, tick them daily.">

## Tech stack
- Framework: Next.js (React)
- Hosting: Vercel
- Data: <in-memory / JSON file / Postgres later>

## Commands
- Run locally: npm run dev
- Run tests: npm test
- Build: npm run build

## Conventions
- Keep components small and clearly named.
- Don't add dependencies without telling me first.
- After each feature: run the app, then I review before commit.

## Never
- Never commit secrets or .env files.
- Never delete files without asking.
Alternatives (for later)
- Cursor uses .cursorrules; some teams keep a /docs folder Claude reads. Same idea, different filename.
1.4 Claude Code prompts
For the non-tech learner
Reusable "scripts" for talking to Claude Code. The pattern behind all of them is Explore → Plan → Code → Commit: make Claude look and plan before it writes, so it doesn't run off in the wrong direction.
flowchart TD
E[Explore: Claude reads code] --> P[Plan: propose steps]
P --> R{You approve?}
R -->|No, adjust| P
R -->|Yes| C[Code: make changes]
C --> T[Test / run app]
T --> OK{Works?}
OK -->|No, paste error| C
OK -->|Yes| CM[Commit to Git]
CM --> E
A · Plan Mode kick-off
I'm building <APP NAME>. Here is my PRD: [paste PRD].
Before writing any code, use Plan Mode:
1. Propose a simple file structure.
2. List the steps to build MVP feature F1 only.
3. Flag anything risky or any choice you're making for me.
Wait for my approval before changing files.
B · Build one feature
Let's build feature <F1: NAME> from the PRD, and ONLY that.
- Keep it the simplest version that works.
- Explain each new file in one line as you create it.
- When done, tell me exactly how to run and see it.
C · Debug (when something breaks)
I ran <COMMAND> and got this error:
<PASTE THE FULL ERROR TEXT>
Diagnose the cause, propose the smallest fix, and show me the change before applying it. Don't change anything unrelated.
D · Explain it to me
Explain what <FILE or CONCEPT> does as if I'm new to coding, in 4 sentences. Then tell me the one thing I should understand about it to not break it later.
E · Write a test
Write 2 simple tests for feature <F1> that would fail if it breaks. Then run them and show me the result.
The golden habits
(1) approve plans before code; (2) review changes before committing; (3) when Claude drifts, stop and re-plan rather than piling on instructions; (4) run /clear when you switch to an unrelated task so Claude's memory stays clean.
1.5 Git workflow: your safety net
For the non-tech learner
Git saves snapshots of your work so you can always go back. GitHub stores those snapshots online and triggers your deploy. You only need a handful of commands today. A "commit" = a saved snapshot with a label; a "push" = upload it to GitHub.
flowchart LR
W[Change files] --> ADD[git add .]
ADD --> COMMIT[git commit -m msg]
COMMIT --> PUSH[git push]
PUSH --> DEPLOY[Vercel auto-deploys]
# One-time, at project start
git init
git add .
git commit -m "Initial commit: project setup"
git branch -M main   # name the branch main (Git may default to master)
git remote add origin <YOUR_GITHUB_REPO_URL>
git push -u origin main

# Repeat after every working feature
git add .
git commit -m "Add <feature>: <what it does>"
git push

# Undo uncommitted changes to tracked files (brand-new files are left alone)
git restore .
Alternatives (for later)
- GitHub Desktop (a click-based app, no terminal) or VS Code's Source Control panel do the same thing visually.
1.6 Secrets & .env
For the non-tech learner
Secrets (like your API key) must never go into your code or GitHub. They live in a file called .env that Git ignores, and in your host's settings. You commit a fake version called .env.example, with no real values, so others know what's needed.
# .env.example (SAFE to commit: placeholders only)
ANTHROPIC_API_KEY=your-key-here
DATABASE_URL=your-db-url-here

# .gitignore must contain at least:
.env
.env.local
node_modules/
Rule of thumb
If a value would let a stranger spend your money or read your data, it's a secret: .env + host settings only. Set a spending limit in the Anthropic Console and a usage alert on your host.
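One way to act on this before every git add . is a quick scan for anything shaped like an Anthropic key. This is a minimal sketch, not an official tool. It assumes a key is sk-ant- (the prefix from the keys tracker) followed by at least 20 letters, digits, dashes or underscores. The pattern and the helper names find_secret_like and scan_project are illustrative. Dedicated secret scanners cover far more key formats.

```python
import re
from pathlib import Path

# Assumed key shape: "sk-ant-" followed by 20+ key-like characters (illustrative only).
SECRET_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}")

# Files meant to hold real secrets (already git-ignored) and folders not worth scanning.
SKIP_NAMES = {".env", ".env.local"}
SKIP_DIRS = {".git", "node_modules"}


def find_secret_like(text: str) -> list[str]:
    """Return every substring of text that looks like an Anthropic API key."""
    return SECRET_PATTERN.findall(text)


def scan_project(root: str = ".") -> dict[str, int]:
    """Map each file under root to the number of key-like strings it contains."""
    hits: dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.name in SKIP_NAMES:
            continue
        if SKIP_DIRS.intersection(path.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: nothing to scan
        found = find_secret_like(text)
        if found:
            hits[str(path)] = len(found)
    return hits
```

Run print(scan_project()) from the project folder. It lists any file that needs cleaning before you commit. The .env file itself is skipped because it is supposed to hold the real key.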
1.7 Deploy to production
For the non-tech learner
"Deploying" = publishing your app to a hosting company's computers so anyone can visit it. Production = that live version. The first deploy feels scary; it's mostly clicking "Import" and "Deploy."
Alternatives (for later)
- Netlify: same flow, great for static/frontend sites.
- Render / Railway: when you need an always-on backend or hosted database.
- Custom domain: buy from Namecheap/Cloudflare, point it at Vercel; do this after the app works.
1.8 Post-launch backlog
For the non-tech learner
The moment you launch you'll see things to improve. Don't fix them live in a panic β write them here and tackle them calmly. This is how real products evolve.
BACKLOG: <APP NAME>
NOW (this week)
[ ] <bug or tiny win>
NEXT (this month)
[ ] <feature people asked for>
LATER (someday)
[ ] <bigger idea>
FEEDBACK LOG
- <date> <who> said: <quote> -> action: <what you'll do>
Build a local AI agent
Goal: a pipeline on the Mac mini that turns compliantly sourced place data into confidence-scored POIs, with accuracy evals to prove it works.
The pipeline at a glance
For the non-tech learner
You build a small "assembly line." Raw place data comes in from a source you're allowed to use. A local AI (Gemma, running on your Mac, so nothing leaves the machine) cleans and labels each place. You attach a confidence score (how sure it is, from 0 to 1), and out comes a tidy spreadsheet. Then you prove it's good with evals. (POI = Point of Interest = a place: name + category + location.)
flowchart TD
S[Compliant source<br/>OSM / Foursquare / Overture] --> I[Ingest raw records]
I --> X[Gemma extracts and normalizes]
X --> SC[Assign confidence 0-1]
SC --> TH{Confidence high enough?}
TH -->|Yes| OUT[(Write CSV / JSON)]
TH -->|No| RV[Flag for review<br/>or route to Claude]
RV --> OUT
2.1 Mac mini setup
For the non-tech learner
This gets a local AI running on your Mac. "Local" means it runs on your own hardware: free per use, private, works offline. Ollama downloads and runs the model; it automatically uses your Mac's graphics chip with zero setup.
Which model size for your Mac? (memory decides)
| Mac mini RAM | Comfortable Gemma size | Notes |
|---|---|---|
| 16 GB | 12B (recommended) | good quality, ~6.7 GB |
| 24 GB | 12B or 27B-class | higher quality |
| 48 GB | 27B+ with long context | best |
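The memory column follows from simple arithmetic. The weights take roughly parameters × bits-per-weight ÷ 8 bytes. A 12B model at about 4.5 bits per weight therefore needs about 12 × 4.5 ÷ 8 = 6.75 GB, in line with the ~6.7 GB above. This is a rough sketch under stated assumptions. The 4.5 bits for Q4-style quantization and the 70% "usable RAM" share are illustrative guesses, not Ollama's figures. Long contexts also need memory on top of the weights.

```python
# Back-of-envelope model sizing. Assumptions (not official figures):
#   - a Q4-style quantized model stores roughly 4.5 bits per weight
#   - only about 70% of the Mac's RAM is comfortably free for the model
BITS_PER_WEIGHT_Q4 = 4.5
USABLE_FRACTION = 0.7


def approx_weights_gb(params_billion: float, bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Approximate size of the weights alone, in GB: billions of parameters x bits / 8."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, ram_gb: float) -> bool:
    """True if the weights fit in the assumed usable share of RAM (context needs extra)."""
    return approx_weights_gb(params_billion) <= ram_gb * USABLE_FRACTION


for size in (4, 12, 27):
    verdicts = ", ".join(f"{ram} GB: {'ok' if fits(size, ram) else 'tight'}" for ram in (16, 24, 48))
    print(f"{size}B ~ {approx_weights_gb(size):.1f} GB of weights ({verdicts})")
```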
Alternatives (for later)
- LM Studio: a click-based app to run models without the terminal.
- Smaller model (gemma3:4b) if 12B feels slow: faster, slightly less accurate.
- MLX backend (Apple's): noticeably faster on Apple Silicon; a tuning step for later.
2.2 Compliant data source picker
For the non-tech learner
The most important decision today. We do not scrape Facebook: it breaks their rules and is fragile. We use data we're allowed to use. Rule of thumb: open dataset or official API > scraping. Check three things on any source: License (am I allowed?), Rate limit (how fast may I ask?), robots.txt / ToS (what do they forbid?).
| Source | What it gives | Why it's safe | Cost |
|---|---|---|---|
| OpenStreetMap (Overpass) | POIs by type + area | Open data, query in browser first | Free |
| Foursquare Open Source Places | 100M+ POIs, 22 fields | Apache-2.0, download as files | Free |
| Overture Maps (places) | Tens of millions of POIs | Open license, has a confidence field | Free |
| Google/Foursquare/Mapbox/HERE API | Rich, current POIs | Official API; stay within its terms | Free tier+ |
| Gov / civic open data | Local registries | Public, licensed | Free |
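The robots.txt check can be automated with Python's standard library. Below is a minimal sketch using urllib.robotparser. It assumes you have already downloaded the site's robots.txt as text. The sample rules, the MyPOIBot name and the robots_rules helper are hypothetical. It only answers "may I fetch this path, and how slowly?"; the License and ToS checks still need a human read.

```python
from urllib.robotparser import RobotFileParser


def robots_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text you already downloaded; no network access needed."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


# Hypothetical robots.txt, as a site might publish at https://example.com/robots.txt
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

rules = robots_rules(SAMPLE_ROBOTS)
agent = "MyPOIBot"  # hypothetical name your script identifies itself with
print(rules.can_fetch(agent, "https://example.com/places/123"))   # True
print(rules.can_fetch(agent, "https://example.com/private/list"))  # False
print(rules.crawl_delay(agent))  # 10 (seconds to wait between requests)
```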
Alternatives (for later)
- Meta's official Graph API (within its terms) for pages you own or manage: the compliant way to touch Meta data. Never scrape it.
2.3 Overpass query (POIs from OpenStreetMap)
For the non-tech learner
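Overpass is a free way to ask OpenStreetMap "give me all the X in this rectangle." Paste this into overpass-turbo.eu and press Run. Then use Export to download the raw data as JSON rather than GeoJSON, because the pipeline in 2.7 expects the raw format with its "elements" list. The four numbers are south, west, north, east: the edges of a box on the map.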
[out:json][timeout:25];
// all cafes in a small bounding box (south,west,north,east)
node["amenity"="cafe"](40.700,-74.020,40.730,-73.990);
out body;
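Swap cafe for restaurant, pharmacy, bank, school… Some place types use a different key: hotels are tourism=hotel and supermarkets are shop=supermarket, so change "amenity" too.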
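To rerun the query for other place types or areas from a script, you can build it as a string. In this small sketch, the key and value are separate parameters because not every place type is an amenity in OSM. The fetch helper assumes the public overpass-api.de instance, which accepts the query as a form field named data. Keep requests occasional, since public instances are rate-limited. The helper names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the main public Overpass instance; others exist and all have usage limits.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def overpass_query(key: str, value: str, bbox: tuple, timeout: int = 25) -> str:
    """Build the query above for any OSM tag and (south, west, north, east) box."""
    south, west, north, east = bbox
    if not (south < north and west < east):
        raise ValueError("bbox must be (south, west, north, east)")
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'node["{key}"="{value}"]({south},{west},{north},{east});\n'
        "out body;"
    )


def fetch(query: str) -> bytes:
    """POST the query as the form field 'data' and return the raw JSON bytes (network call)."""
    with urlopen(OVERPASS_URL, data=urlencode({"data": query}).encode()) as resp:
        return resp.read()


q = overpass_query("amenity", "pharmacy", (40.700, -74.020, 40.730, -73.990))
print(q)
# To save it for the pipeline: open("raw_pois.json", "wb").write(fetch(q))
```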
2.4 First Gemma script
For the non-tech learner
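A few lines to prove Gemma answers from your own Python. First install the library (pip install ollama) and download the model (ollama pull gemma3:12b). Save as hello_gemma.py, then run python3 hello_gemma.py.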
import ollama

reply = ollama.chat(
    model="gemma3:12b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(reply["message"]["content"])
2.5 Extraction prompt
For the non-tech learner
The instruction you give Gemma for each place. The tricks that make a local model reliable: be explicit, show examples, ask for JSON only, and ask for a calibrated confidence. "Calibrated" means: if it says 0.9, it should be right about 9 times out of 10. Save as extraction_prompt.txt.
You are a data cleaner for Points of Interest (POIs).
For the raw record below, return ONE JSON object with these fields:
- name: cleaned business name (Title Case, no extra symbols)
- category: one of [cafe, restaurant, shop, pharmacy, bank, hotel, other]
- address: a single tidy line, or "" if unknown
- lat: number or null
- lon: number or null
- confidence: 0.0-1.0, how sure YOU are this is correct and complete.
  Be calibrated: 0.9 means you'd be right ~9 times in 10.
Rules:
- Return JSON only. No commentary, no markdown.
- If a field is unknown, use "" or null. Do not invent data.
EXAMPLE INPUT: {"nm":"joe's COFFEE","type":"coffee shop","addr":"12 main st"}
EXAMPLE OUTPUT: {"name":"Joe's Coffee","category":"cafe","address":"12 Main St","lat":null,"lon":null,"confidence":0.78}
RAW RECORD:
<PASTE ONE RECORD HERE>
2.6 POI schema: force clean output
For the non-tech learner
A "schema" is a strict shape for the data: every POI gets the same fields, so your spreadsheet isn't a mess. Ollama can enforce it when you pass the schema as the format argument, and a temperature of 0 makes answers consistent. Save as poi_schema.py.
from typing import Literal, Optional

import ollama
from pydantic import BaseModel

class POI(BaseModel):
    name: str
    # same category list as extraction_prompt.txt, so the schema rejects anything else
    category: Literal["cafe", "restaurant", "shop", "pharmacy", "bank", "hotel", "other"]
    address: str
    lat: Optional[float]
    lon: Optional[float]
    confidence: float  # the model's own 0.0-1.0 estimate

def extract_one(raw_record: str, prompt_template: str) -> POI:
    resp = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user",
                   "content": prompt_template.replace("<PASTE ONE RECORD HERE>", raw_record)}],
        format=POI.model_json_schema(),  # enforce the shape
        options={"temperature": 0},  # consistent output
    )
    # pydantic raises if the reply doesn't match the schema; the pipeline catches that
    return POI.model_validate_json(resp["message"]["content"])
2.7 Pipeline orchestration
For the non-tech learner
"Orchestration" is the glue that runs each step in order, survives a bad answer, and writes the file. Keep it one simple script. Tip: use Claude Code, as you did on Saturday, to write and debug this. Paste it in with a sample of your data and say "make this run on my file."
import json, csv
from poi_schema import POI, extract_one  # from 2.6

PROMPT = open("extraction_prompt.txt").read()  # from 2.5
THRESHOLD = 0.6  # tune later

def run(input_json_path, output_csv_path):
    with open(input_json_path) as f:
        raw = json.load(f)  # ingest
    # OSM/Overpass exports wrap records in "elements"; a plain list is used as-is
    records = raw["elements"] if isinstance(raw, dict) else raw
    kept, flagged = [], []
    for rec in records:
        try:
            poi = extract_one(json.dumps(rec), PROMPT)  # extract + score
        except Exception as e:
            # broken JSON or a schema mismatch: keep the raw record for review
            flagged.append({"error": str(e), "record": rec})
            continue
        if poi.confidence >= THRESHOLD:
            kept.append(poi.model_dump())
        else:
            flagged.append(poi.model_dump())
    with open(output_csv_path, "w", newline="") as f:
        cols = ["name", "category", "address", "lat", "lon", "confidence"]
        w = csv.DictWriter(f, fieldnames=cols)
        w.writeheader()
        for r in kept:
            w.writerow({k: r.get(k, "") for k in cols})
    # save low-confidence and failed records so they can actually be reviewed
    with open("flagged.json", "w") as f:
        json.dump(flagged, f, indent=2)
    print(f"Kept {len(kept)} POIs, flagged {len(flagged)} for review.")

if __name__ == "__main__":
    run("raw_pois.json", "pois.csv")
Alternatives (for later)
- Agent frameworks like LangChain or LlamaIndex add structure for bigger systems: overkill today. A plain script is the right altitude for learning.
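The orchestration script keeps every POI it is confident about, including the same place listed twice, which is common in open data. Below is one possible de-duplication pass you could run on kept before the CSV is written. It is a sketch. It assumes two POIs are the same place when their trimmed, lower-cased names match and their coordinates agree to three decimal places. Both rules, and the dedupe name, are assumptions to tune later.

```python
def dedupe(pois: list, places: int = 3) -> list:
    """Keep one POI per (name, rounded lat, rounded lon), preferring the higher confidence.

    Assumption: 3 decimal places of lat/lon is roughly 100 m. Two copies that straddle
    a rounding boundary still slip through, which is acceptable for a first pass.
    """
    best = {}
    for poi in pois:
        lat, lon = poi.get("lat"), poi.get("lon")
        key = (
            str(poi.get("name", "")).strip().lower(),
            None if lat is None else round(lat, places),
            None if lon is None else round(lon, places),
        )
        if key not in best or poi.get("confidence", 0) > best[key].get("confidence", 0):
            best[key] = poi
    return list(best.values())


sample_pois = [
    {"name": "Joe's Coffee", "lat": 40.71234, "lon": -74.00012, "confidence": 0.7},
    {"name": "joe's coffee ", "lat": 40.71241, "lon": -74.00019, "confidence": 0.9},
    {"name": "City Pharmacy", "lat": 40.72, "lon": -74.01, "confidence": 0.8},
]
print(dedupe(sample_pois))  # Joe's Coffee appears once, with confidence 0.9
```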
2.8 Golden dataset: your answer key
For the non-tech learner
To know if your pipeline is good, you need known-right answers to compare against. Hand-check 30 to 50 places (confirm name/category/location) and save them as the "golden" truth. This is the most valuable hour of the day; the quality of this file decides everything. Save as golden.csv.
id,name,category,address,lat,lon
1,Joe's Coffee,cafe,12 Main St,40.71,-74.00
2,City Pharmacy,pharmacy,5 Oak Ave,40.72,-74.01
3,...
Include a few tricky cases (odd names, duplicates, missing addresses); they reveal where your pipeline struggles.
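Because everything downstream trusts this file, it is worth a quick automated sanity check before grading against it. This minimal sketch assumes the column layout shown above and the category list from the extraction prompt. The check_golden name and its rules (duplicate ids, empty names, unknown categories, impossible coordinates) are illustrative choices. Missing addresses are deliberately allowed, since they are one of the tricky cases worth keeping.

```python
import csv
import io

REQUIRED = ["id", "name", "category", "address", "lat", "lon"]
# Same category list as the extraction prompt in 2.5
CATEGORIES = {"cafe", "restaurant", "shop", "pharmacy", "bank", "hotel", "other"}


def check_golden(f) -> list:
    """Return human-readable problems found in a golden CSV (any text file object)."""
    reader = csv.DictReader(f)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems, seen_ids = [], set()
    for row in reader:
        rid = row["id"]
        if rid in seen_ids:
            problems.append(f"id {rid}: duplicate id")
        seen_ids.add(rid)
        if not (row["name"] or "").strip():
            problems.append(f"id {rid}: empty name")
        if row["category"] not in CATEGORIES:
            problems.append(f"id {rid}: unknown category {row['category']!r}")
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                problems.append(f"id {rid}: lat/lon out of range")
        except (TypeError, ValueError):
            problems.append(f"id {rid}: lat/lon not numbers")
    return problems


sample = io.StringIO(
    "id,name,category,address,lat,lon\n"
    "1,Joe's Coffee,cafe,12 Main St,40.71,-74.00\n"
    "2,City Pharmacy,drugstore,5 Oak Ave,40.72,-74.01\n"
    "2,,cafe,,abc,-74.0\n"
)
for problem in check_golden(sample):
    print(problem)
# For the real file: with open("golden.csv", newline="") as f: print(check_golden(f))
```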
2.9 Eval grader: prove quality in numbers
For the non-tech learner
An "eval" is a repeatable quality test. Four numbers tell the story: Precision (of what I returned, how much was right?), Recall (of what existed, how much did I catch?), F1 (their balance), Accuracy (overall fraction right). Higher is better; 1.0 is perfect.
flowchart LR
G[Golden dataset<br/>known-correct] --> CMP[Match by name + location]
P[Pipeline output] --> CMP
CMP --> M[Precision / Recall<br/>F1 / Accuracy]
CMP --> CAL[Bucket by confidence<br/>then calibration]
M --> R[Eval report]
CAL --> R
import pandas as pd

def norm(s):
    return str(s).strip().lower()

def evaluate(pred_csv, gold_csv):
    pred = pd.read_csv(pred_csv)
    gold = pd.read_csv(gold_csv)
    # simple version: matches on normalized name only (location is ignored)
    gold_names = set(norm(n) for n in gold["name"])
    pred_names = [norm(n) for n in pred["name"]]
    pred_set = set(pred_names)
    tp = sum(1 for n in pred_names if n in gold_names)
    fp = sum(1 for n in pred_names if n not in gold_names)
    fn = sum(1 for n in gold_names if n not in pred_set)
    precision = tp / (tp + fp) if (tp + fp) else 0
    recall = tp / (tp + fn) if (tp + fn) else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1: {f1:.2f}")
    return precision, recall, f1

if __name__ == "__main__":
    evaluate("pois.csv", "golden.csv")
Alternatives (for later)
- Use the Anthropic Console's Evaluation tool to auto-generate extra test cases and grade them; add error bars so you don't over-read small samples.
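The grader above matches on name alone, while the flowchart matches on name + location, and it does not compute the fourth number, accuracy. This sketch covers both. It assumes a POI counts as found when its normalized name matches a golden row within 100 m; that threshold is a guess to tune. It takes accuracy to mean the share of matched POIs whose category is also right, which is one reasonable definition, not a standard one. It uses plain lists of dicts and the standard library only.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def norm(s):
    return str(s).strip().lower()


def match(pred, gold, max_m=100):
    """Greedy one-to-one (pred index, gold index) pairs: same name and within max_m metres."""
    used, pairs = set(), []
    for i, p in enumerate(pred):
        if None in (p["lat"], p["lon"]):
            continue  # no location, so it cannot be matched by distance
        for j, g in enumerate(gold):
            if j in used or norm(p["name"]) != norm(g["name"]):
                continue
            if haversine_m(p["lat"], p["lon"], g["lat"], g["lon"]) <= max_m:
                used.add(j)
                pairs.append((i, j))
                break
    return pairs


def scores(pred, gold, max_m=100):
    pairs = match(pred, gold, max_m)
    tp = len(pairs)
    fp, fn = len(pred) - tp, len(gold) - tp
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Assumed definition: accuracy = share of matched POIs whose category is also right.
    accuracy = sum(pred[i]["category"] == gold[j]["category"] for i, j in pairs) / tp if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


gold = [
    {"name": "Joe's Coffee", "category": "cafe", "lat": 40.7100, "lon": -74.0000},
    {"name": "City Pharmacy", "category": "pharmacy", "lat": 40.7200, "lon": -74.0100},
]
pred = [
    {"name": "joe's coffee", "category": "cafe", "lat": 40.7101, "lon": -74.0001},
    {"name": "City Pharmacy", "category": "shop", "lat": 40.7200, "lon": -74.0100},
    {"name": "Joe's Coffee", "category": "cafe", "lat": 40.8000, "lon": -74.0000},
]
print(scores(pred, gold))
```

With the grader's DataFrames, pred.to_dict("records") and gold.to_dict("records") give this list-of-dicts shape. Rows with missing coordinates (NaN) simply never match.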
2.10 Confidence calibration check
For the non-tech learner
Does the AI know what it knows? Group POIs by the confidence it gave, then check how often each group was actually right. If the "0.9" group is right ~90% of the time, it's well-calibrated. Big gaps mean the confidence number can't be trusted yet.
CALIBRATION TABLE (fill in after running the grader on each bucket)
Confidence bucket | # POIs | Actually correct | Actual accuracy
0.9 - 1.0         |        |                  |    %
0.7 - 0.9         |        |                  |    %
0.5 - 0.7         |        |                  |    %
below 0.5         |        |                  |    %
Reading it: actual accuracy should roughly MATCH the bucket. If the 0.9 bucket is only 60% correct -> the model is overconfident -> raise the THRESHOLD or improve the prompt.
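Filling the table by hand gets tedious past a few dozen POIs. This minimal sketch does the bucketing. It assumes you already have one (confidence, was_correct) pair per POI, for example by marking a POI correct when it matched a golden row with the right category. How "correct" is decided is your choice, and the sample results below are made up.

```python
BUCKETS = ["below 0.5", "0.5 - 0.7", "0.7 - 0.9", "0.9 - 1.0"]
EDGES = (0.5, 0.7, 0.9)  # inclusive lower edge of every bucket after the first


def calibration_table(results):
    """results: (confidence, was_correct) pairs. Returns (bucket, n, correct, accuracy), top bucket first."""
    counts = {b: [0, 0] for b in BUCKETS}  # bucket -> [n, correct]
    for conf, ok in results:
        bucket = BUCKETS[sum(conf >= e for e in EDGES)]  # how many edges conf has reached
        counts[bucket][0] += 1
        counts[bucket][1] += int(ok)
    rows = []
    for b in reversed(BUCKETS):
        n, correct = counts[b]
        rows.append((b, n, correct, correct / n if n else None))
    return rows


results = [(0.95, True), (0.92, False), (0.91, True), (0.8, True), (0.75, False), (0.3, False)]
for bucket, n, correct, acc in calibration_table(results):
    shown = "n/a" if acc is None else f"{acc:.0%}"
    print(f"{bucket:<10} | {n:>3} | {correct:>3} | {shown}")
```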
2.11 Model routing: Gemma vs Claude
For the non-tech learner
You don't have to pick one AI forever. A smart pattern: do the bulk locally on Gemma (free, private), and send only the hard, low-confidence cases to Claude (smarter, costs a little). This is your one "agentic" decision: the system choosing its own next step.
flowchart TD
POI[POI to process] --> LOC[Gemma local<br/>free, private]
LOC --> CONF{Confidence high?}
CONF -->|Yes| KEEP[Keep result]
CONF -->|No| CLA[Send hard case<br/>to Claude API]
CLA --> KEEP
| | Gemma (local) | Claude (hosted) |
|---|---|---|
| Privacy | data stays on Mac | sent to cloud |
| Cost | $0 per use | pay per token |
| Quality | good | higher |
| Speed | depends on Mac | fast |
| Offline | yes | no |
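The routing rule itself is only a few lines. This sketch keeps the model calls out of it: local and escalate are any functions that take a raw record and return a dict with a confidence field. In practice, local could wrap extract_one from 2.6 (returning poi.model_dump()), and escalate could be a hypothetical helper around the Anthropic SDK. The stand-in functions below are made up so the logic runs on its own. Recording the source also lets you count how often, and how expensively, you escalate.

```python
from typing import Callable

THRESHOLD = 0.6  # same idea as the pipeline's cut-off; tune it with the calibration table


def route(record: str,
          local: Callable[[str], dict],
          escalate: Callable[[str], dict],
          threshold: float = THRESHOLD) -> dict:
    """Try the local model first; escalate only low-confidence or failed cases."""
    try:
        result = local(record)
        if result.get("confidence", 0.0) >= threshold:
            return {**result, "source": "gemma"}
    except Exception:
        pass  # a failed or unparseable local answer is also a reason to escalate
    return {**escalate(record), "source": "claude"}


# Stand-ins so the sketch runs without any model; swap in real calls.
def fake_gemma(record: str) -> dict:
    return {"name": record.title(), "confidence": 0.9 if "easy" in record else 0.3}


def fake_claude(record: str) -> dict:
    return {"name": record.title(), "confidence": 0.95}


print(route("easy cafe", fake_gemma, fake_claude))
print(route("hard case", fake_gemma, fake_claude))
```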
Alternatives (for later)
- Swap which Claude model you escalate to (Haiku = cheapest/fastest, Sonnet = balanced, Opus = strongest). Moving a step from Gemma to Claude mostly means replacing ollama.chat() with the Anthropic SDK's messages.create() call and reading the text from its differently shaped reply.
Reference cards
Keep these open while you work.
Command cheat-sheet
CLAUDE CODE
claude                    start it
/init                     create CLAUDE.md
/clear                    wipe conversation memory (between tasks)
/cost                     see how much you've spent
claude doctor             check your install

GIT
git add .                 stage changes
git commit -m "msg"       save a snapshot
git push                  upload to GitHub
git restore .             undo uncommitted changes

WEB APP (Next.js)
npm run dev               run locally
npm test                  run tests
npm run build             production build check

OLLAMA / GEMMA
ollama pull gemma3:12b    download the model
ollama run gemma3:12b     chat in terminal
ollama ps                 what's loaded + memory
ollama list               downloaded models
Troubleshooting: first things to try
For the non-tech learner
When stuck, copy the exact error and paste it into Claude Code's debug prompt (1.4-C). Most fixes start there.
| Symptom | First thing to try |
|---|---|
| "command not found" | tool isn't installed / restart the terminal |
| app won't load on localhost | is npm run dev still running? right URL/port? |
| secret / key errors | is it in .env AND in the host's settings? |
| Gemma is very slow | try gemma3:4b, or close other heavy apps |
| Gemma returns broken JSON | set temperature 0, keep "JSON only", retry |
| deploy build fails | read the build log top-to-bottom; paste it to Claude |
| git push rejected | run git pull first, then push again |
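When a Gemma reply arrives wrapped in a code fence or with a sentence in front, the JSON inside is often still usable. This sketch uses only the standard library's json.JSONDecoder.raw_decode, which parses one JSON value starting at a given position and ignores whatever follows. first_json_object and with_retries are hypothetical helper names. At temperature 0 an identical retry usually gets an identical reply, so retries help most after you change something, such as rewording the prompt.

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first JSON object in a reply, skipping fences or chatter around it."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not start valid JSON; try the next one
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in the model reply")


def with_retries(ask, attempts: int = 3) -> dict:
    """Call ask() (a function returning the model's text) until a reply parses."""
    last_error = None
    for _ in range(attempts):
        try:
            return first_json_object(ask())
        except ValueError as err:  # no parsable object in this reply
            last_error = err
    raise ValueError(f"no valid JSON after {attempts} attempts") from last_error


reply = 'Sure! Here it is:\n```json\n{"name": "Joe\'s Coffee", "confidence": 0.8}\n```'
print(first_json_object(reply))
```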
Glossary at a glance
- Vibe coding: describe in words, AI writes the code, you review & test
- LLM: the AI (Claude, Gemma)
- IDE: the app you edit code in (VS Code)
- Terminal: the text window for typing commands
- Git / GitHub: version snapshots / online store + deploy trigger
- PRD / MVP: one-page plan / smallest useful version
- Production: the live version users visit
- Deploy: publish the app to a host
- localhost: your own computer as a private server
- .env / secret: values like API keys, kept out of code & GitHub
- Context window: the AI's working memory; clear it between tasks
- Agent / workflow: AI picks its own steps / fixed steps AI fills in
- Open / local model: a model you download & run yourself (Gemma)
- Quantization (Q4): shrink a model to fit memory, minor quality loss
- POI: a place (name + category + location)
- Schema: a strict shape for data (every record same fields)
- Eval / golden set: a repeatable quality test / hand-verified correct answers
- Precision / Recall: right-of-returned / caught-of-existing
- Calibration: does "90% sure" mean right 90% of the time?
- robots.txt / ToS: bot rules / the contract you accept
Go deeper (official first)
A note on accuracy
Model names, prices, and free tiers change month to month. Double-check any figures (Claude pricing, Gemma sizes, Mac configs, host tiers) on the official pages before you budget or buy.