/ A two-day build

The Weekend Builder's Intensive

Ship a real product with Claude. Build a local AI agent. Two days, two things that actually work.

Prep · before the weekend

Saturday · ship a product

Sunday · build an agent

Copy any code with the button, no retyping. Tick boxes to fill your progress bars. Guidance boxes explain every term in plain words. 📌 "For later" notes hold alternatives; skip them for now.

/ Pack 0

Before the weekend

~2 hours the evening before, so the weekend is spent building, not installing.

0.2 Credentials & keys tracker

For the non-tech learner

Keys are easy to lose. Track where each one lives — never paste a real secret value into a shared doc or your code. Keep real values in a password manager.

Item	Where it lives	Notes
Anthropic API key	Password manager + .env	Starts sk-ant-… — never commit to GitHub
GitHub login	Password manager	Enable 2-factor auth
Vercel login	"Sign in with GitHub"	No separate password
Google Places key	Password manager + .env	Restrict the key in Google Cloud
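To make "keep the real value in .env, not in your code" concrete, here is a minimal sketch of reading a key at runtime and logging only a masked prefix. The helper names (parse_env_text, require_key, mask) are hypothetical, the parser handles only plain KEY=value lines, and the variable name ANTHROPIC_API_KEY is an assumption about what your .env calls it.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=value lines; skips blank lines, # comments, and lines without '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def require_key(name: str, env: dict) -> str:
    """Return the secret, or fail loudly without ever printing its value."""
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is missing: add it to .env")
    return value

def mask(secret: str) -> str:
    """Safe form for logs: the first 6 characters only."""
    return secret[:6] + "..."

# Demo with a fake value; a real script would read the text of the .env file instead.
sample = 'ANTHROPIC_API_KEY="sk-ant-example-not-real"\n'
env = parse_env_text(sample)
print(mask(require_key("ANTHROPIC_API_KEY", env)))
```

The point of the design is that the key never appears in the source file or in the terminal output, so neither can leak it into GitHub or a screenshot.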

/ Pack 2 · Sunday

Build a local AI agent

Goal: a Mac-mini pipeline that turns compliant data into confidence-scored POIs, proven with accuracy evals.

The pipeline at a glance

For the non-tech learner

You build a small "assembly line." Raw place data comes in from a source you're allowed to use; a local AI (Gemma, running on your Mac — nothing leaves the machine) cleans and labels each place; you attach a confidence score (how sure it is, 0–1); out comes a tidy spreadsheet. Then you prove it's good with evals. (POI = Point of Interest = a place: name + category + location.)

flowchart TD
  S[Compliant source<br/>OSM / Foursquare / Overture] --> I[Ingest raw records]
  I --> X[Gemma extracts and normalizes]
  X --> SC[Assign confidence 0-1]
  SC --> TH{Confidence high enough?}
  TH -->|Yes| OUT[(Write CSV / JSON)]
  TH -->|No| RV[Flag for review<br/>or route to Claude]
  RV --> OUT

2.1 Mac mini setup

For the non-tech learner

This gets a local AI running on your Mac. "Local" means it runs on your own hardware — free per use, private, works offline. Ollama downloads and runs the model; it automatically uses your Mac's graphics chip with zero setup.

Ollama installed: `ollama --version`
Pull the model: `ollama pull gemma3:12b`
Quick chat test: `ollama run gemma3:12b` (say hi, then /bye)
Confirm it's loaded: `ollama ps`

Which model size for your Mac? (memory decides)

Mac mini RAM	Comfortable Gemma size	Roughly
16 GB	12B (recommended)	good quality, ~6.7 GB
24 GB	12B or 27B-class	higher quality
48 GB	27B+ with long context	best
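The table follows from rough arithmetic: a model's weights take about parameters × bits per weight ÷ 8 bytes. The sketch below assumes about 4.5 bits per weight for a 4-bit quantized model and a flat 6 GB reserve for macOS, other apps, and context. Both numbers are illustrative assumptions rather than Ollama's figures, and real downloads can be larger.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def fits(params_billion: float, ram_gb: float, reserve_gb: float = 6.0) -> bool:
    """True if the weights fit after reserving memory for the system and context."""
    return weights_gb(params_billion) <= ram_gb - reserve_gb

for size in (4, 12, 27):
    print(f"{size}B: ~{weights_gb(size):.1f} GB of weights, fits in 16 GB: {fits(size, 16)}")
```

Under these assumptions a 12B model needs roughly 6.75 GB for weights, which is why it is the comfortable choice on a 16 GB machine while a 27B model is not.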

Alternatives (for later)

LM Studio — a click-based app to run models without the terminal.
Smaller model (gemma3:4b) if 12B feels slow — faster, slightly less accurate.
MLX backend (Apple's) — noticeably faster on Apple Silicon, a tuning step for later.

2.2 Compliant data source picker

For the non-tech learner

The most important decision today. We do not scrape Facebook — it breaks their rules and is fragile. We use data we're allowed to use. Rule of thumb: open dataset or official API > scraping. Check three words on any source: License (am I allowed?), Rate limit (how fast may I ask?), robots.txt / ToS (what do they forbid?).
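The robots.txt part of that check can be automated with Python's standard library. This is a minimal sketch: the sample robots.txt text, the example.com URLs, and the user agent name weekend-poi-bot are made up for illustration. It covers robots.txt only; the license and the terms of service still need a human read.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt content for illustration
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

for url in ("https://example.com/places", "https://example.com/private/list"):
    print(url, "->", is_allowed(SAMPLE_ROBOTS, "weekend-poi-bot", url))
```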

Source	What it gives	Why it's safe	Cost
OpenStreetMap (Overpass)	POIs by type + area	Open data, query in browser first	Free
Foursquare Open Source Places	100M+ POIs, 22 fields	Apache-2.0, download as files	Free
Overture Maps (places)	Tens of millions of POIs	Open license, has a confidence field	Free
Google/Foursquare/Mapbox/HERE API	Rich, current POIs	Official API — stay within terms	Free tier+
Gov / civic open data	Local registries	Public, licensed	Free

Alternatives (for later)

Meta's official Graph API (within its terms) for pages you own or manage — the compliant way to touch Meta data. Never scraping.

Source chosen & compliance checked

2.3 Overpass query (POIs from OpenStreetMap)

For the non-tech learner

Overpass is a free way to ask OpenStreetMap "give me all the X in this rectangle." Paste this into overpass-turbo.eu, press Run, then Export → JSON. The four numbers are south, west, north, east — a box on the map.

[out:json][timeout:25];
// all cafes in a small bounding box (south,west,north,east)
node["amenity"="cafe"](40.700,-74.020,40.730,-73.990);
out body;

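Swap cafe for restaurant, pharmacy, bank, school… Some categories live under a different tag key in OpenStreetMap: hotels are "tourism"="hotel" and supermarkets are "shop"="supermarket", so change the key as well as the value.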
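If you end up swapping categories or boxes often, the query text can be generated instead of hand-edited. The helper below is a hypothetical sketch (build_overpass_query is not part of any library); it produces the same three-line query shown above and assumes a box that does not cross the 180° meridian.

```python
def build_overpass_query(key: str, value: str, south: float, west: float,
                         north: float, east: float, timeout: int = 25) -> str:
    """Return Overpass QL for nodes tagged key=value inside a bounding box."""
    if not (south < north and west < east):
        raise ValueError("bounding box must satisfy south < north and west < east")
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'node["{key}"="{value}"]({bbox});\n'
        "out body;"
    )

print(build_overpass_query("amenity", "pharmacy", 40.700, -74.020, 40.730, -73.990))
```

Paste the printed text into overpass-turbo.eu exactly as you would the hand-written version.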

Mark this step done

2.4 First Gemma script

For the non-tech learner

A few lines to prove Gemma answers from your own Python. Install the client once with pip install ollama, save the script as hello_gemma.py, then run python3 hello_gemma.py.

import ollama

reply = ollama.chat(
    model="gemma3:12b",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(reply["message"]["content"])

Gemma replied locally ✓

2.5 Extraction prompt

For the non-tech learner

The instruction you give Gemma for each place. The tricks that make a local model reliable: be explicit, show examples, ask for JSON only, and ask for a calibrated confidence. "Calibrated" means: if it says 0.9, it should be right about 9 times out of 10. Save as extraction_prompt.txt.

You are a data cleaner for Points of Interest (POIs).
For the raw record below, return ONE JSON object with these fields:
- name: cleaned business name (Title Case, no extra symbols)
- category: one of [cafe, restaurant, shop, pharmacy, bank, hotel, other]
- address: a single tidy line, or "" if unknown
- lat: number or null
- lon: number or null
- confidence: 0.0-1.0 — how sure YOU are this is correct and complete.
  Be calibrated: 0.9 means you'd be right ~9 times in 10.

Rules:
- Return JSON only. No commentary, no markdown.
- If a field is unknown, use "" or null. Do not invent data.

EXAMPLE INPUT:  {"nm":"joe's  COFFEE","type":"coffee shop","addr":"12 main st"}
EXAMPLE OUTPUT: {"name":"Joe's Coffee","category":"cafe","address":"12 Main St","lat":null,"lon":null,"confidence":0.78}

RAW RECORD:
<PASTE ONE RECORD HERE>

Mark this step done

2.6 POI schema — force clean output

For the non-tech learner

A "schema" is a strict shape for the data: every POI gets the same fields, so your spreadsheet isn't a mess. Ollama can enforce it through the format parameter. Setting temperature to 0 makes answers as repeatable as possible. Save as poi_schema.py (needs pip install pydantic).

from pydantic import BaseModel
from typing import Optional
import ollama

class POI(BaseModel):
    name: str
    category: str
    address: str
    lat: Optional[float]
    lon: Optional[float]
    confidence: float

def extract_one(raw_record: str, prompt_template: str) -> POI:
    resp = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user",
                   "content": prompt_template.replace("<PASTE ONE RECORD HERE>", raw_record)}],
        format=POI.model_json_schema(),   # enforce the shape
        options={"temperature": 0},        # consistent output
    )
    return POI.model_validate_json(resp["message"]["content"])
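The schema fixes the shape of each record but not the allowed values: a category outside the list in 2.5 or a confidence of 1.4 would still pass. The standard-library sketch below adds those value checks. The function name check_poi is hypothetical, and the category list is copied from the extraction prompt.

```python
import json

ALLOWED_CATEGORIES = {"cafe", "restaurant", "shop", "pharmacy", "bank", "hotel", "other"}

def _in_range(value, low, high) -> bool:
    """True for a real number (not a bool) between low and high inclusive."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and low <= value <= high

def check_poi(json_text: str) -> list:
    """Return a list of problems in one extracted POI; an empty list means it looks fine."""
    try:
        poi = json.loads(json_text)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e.msg}"]
    if not isinstance(poi, dict):
        return ["expected a JSON object"]
    problems = []
    if poi.get("category") not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {poi.get('category')!r}")
    if not _in_range(poi.get("confidence"), 0.0, 1.0):
        problems.append("confidence must be a number from 0 to 1")
    if poi.get("lat") is not None and not _in_range(poi["lat"], -90, 90):
        problems.append("lat out of range")
    if poi.get("lon") is not None and not _in_range(poi["lon"], -180, 180):
        problems.append("lon out of range")
    return problems
```

A record with any problems can be treated like a low-confidence one and sent to review rather than written to the CSV.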

Mark this step done

2.7 Pipeline orchestration

For the non-tech learner

"Orchestration" is the glue that runs each step in order, survives a bad answer, and writes the file. Keep it one simple script. Tip: use Claude Code from Saturday to write and debug this — paste it with your data sample and say "make this run on my file."

import json, csv
from poi_schema import POI, extract_one   # from 2.6

with open("extraction_prompt.txt", encoding="utf-8") as f:
    PROMPT = f.read()                             # from 2.5
THRESHOLD = 0.6                                   # tune later

def run(input_json_path, output_csv_path):
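    with open(input_json_path, encoding="utf-8") as f:
        raw = json.load(f)                         # ingest
    # Overpass exports wrap records in "elements"; a bare list is used as-is
    records = raw.get("elements", [raw]) if isinstance(raw, dict) else raw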
    kept, flagged = [], []

    for rec in records:
        try:
            poi = extract_one(json.dumps(rec), PROMPT)   # extract + score
        except Exception as e:
            flagged.append({"error": str(e), "record": rec})
            continue
        if poi.confidence >= THRESHOLD:
            kept.append(poi.model_dump())
        else:
            flagged.append(poi.model_dump())

    with open(output_csv_path, "w", newline="") as f:
        cols = ["name","category","address","lat","lon","confidence"]
        w = csv.DictWriter(f, fieldnames=cols)
        w.writeheader()
        for r in kept:
            w.writerow({k: r.get(k, "") for k in cols})

    print(f"Kept {len(kept)} POIs, flagged {len(flagged)} for review.")

if __name__ == "__main__":
    run("raw_pois.json", "pois.csv")
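The script above counts flagged records but does not save them, so there is nothing to review afterwards. One simple option, sketched here, is to write them as JSON Lines (one JSON object per line). The function names and the default file name flagged.jsonl are hypothetical.

```python
import json

def write_flagged(flagged: list, path: str = "flagged.jsonl") -> int:
    """Write one JSON object per line so each flagged record can be reviewed later."""
    with open(path, "w", encoding="utf-8") as f:
        for item in flagged:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
    return len(flagged)

def read_flagged(path: str = "flagged.jsonl") -> list:
    """Load the flagged records back, skipping any blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling write_flagged(flagged) just before the final print in run() would be enough to keep the review queue on disk.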

Alternatives (for later)

Agent frameworks like LangChain or LlamaIndex add structure for bigger systems — overkill today. A plain script is the right altitude for learning.

pois.csv produced end-to-end ✓

2.8 Golden dataset — your answer key

For the non-tech learner

To know if your pipeline is good, you need known-right answers to compare against. Hand-check ~30–50 places (confirm name/category/location) and save as the "golden" truth. This is the most valuable hour of the day — quality here decides everything. Save as golden.csv.

id,name,category,address,lat,lon
1,Joe's Coffee,cafe,12 Main St,40.71,-74.00
2,City Pharmacy,pharmacy,5 Oak Ave,40.72,-74.01
3,...

Include a few tricky cases (odd names, duplicates, missing addresses) — they reveal where your pipeline struggles.

Mark this step done

2.9 Eval grader — prove quality in numbers

For the non-tech learner

An "eval" is a repeatable quality test. Four numbers tell the story: Precision (of what I returned, how much was right?), Recall (of what existed, how much did I catch?), F1 (their balance), Accuracy (overall fraction right). Higher is better; 1.0 is perfect.

flowchart LR
  G[Golden dataset<br/>known-correct] --> CMP[Match by name + location]
  P[Pipeline output] --> CMP
  CMP --> M[Precision / Recall<br/>F1 / Accuracy]
  CMP --> CAL[Bucket by confidence<br/>then calibration]
  M --> R[Eval report]
  CAL --> R

import pandas as pd

def norm(s):
    return str(s).strip().lower()

def evaluate(pred_csv, gold_csv):
    pred = pd.read_csv(pred_csv)
    gold = pd.read_csv(gold_csv)
    # Compare unique, normalized names so duplicates don't inflate the counts
    gold_names = {norm(n) for n in gold["name"]}
    pred_names = {norm(n) for n in pred["name"]}

    tp = len(pred_names & gold_names)   # returned and in the golden set
    fp = len(pred_names - gold_names)   # returned but not in the golden set
    fn = len(gold_names - pred_names)   # in the golden set but missed

    precision = tp / (tp + fp) if (tp + fp) else 0
    recall    = tp / (tp + fn) if (tp + fn) else 0
    f1 = 2*precision*recall/(precision+recall) if (precision+recall) else 0

    print(f"Precision: {precision:.2f}")
    print(f"Recall:    {recall:.2f}")
    print(f"F1:        {f1:.2f}")
    return precision, recall, f1

if __name__ == "__main__":
    evaluate("pois.csv", "golden.csv")
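The grader above matches on name only, while the flowchart calls for name plus location. A minimal sketch of the stricter match is below. The 100-meter tolerance and the fall-back to name-only when coordinates are missing are assumptions to tune, and the rows are plain dicts with None for a missing value (pandas uses NaN instead, which would need converting first).

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_match(pred: dict, gold: dict, max_m: float = 100.0) -> bool:
    """Same normalized name and, when both rows have coordinates, within max_m meters."""
    if str(pred["name"]).strip().lower() != str(gold["name"]).strip().lower():
        return False
    coords = (pred.get("lat"), pred.get("lon"), gold.get("lat"), gold.get("lon"))
    if any(c is None for c in coords):
        return True  # assumption: fall back to name-only when a location is missing
    return haversine_m(*coords) <= max_m
```

This matters for chains: two branches with the same name a kilometer apart are different places, and a name-only match would count one as the other.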

Alternatives (for later)

Use the Anthropic Console's Evaluation tool to auto-generate extra test cases and grade them; add error bars so you don't over-read small samples.
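One common way to put error bars on a score from a small sample is the Wilson score interval for a proportion. The sketch below is a standard-library version of that formula; z = 1.96 corresponds to roughly 95% confidence, and the function name is hypothetical.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Return (low, high) bounds for the true success rate given correct out of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

low, high = wilson_interval(27, 30)
print(f"27 of 30 correct: plausible range {low:.2f} to {high:.2f}")
```

With 30 golden rows, a score of 0.90 is compatible with a true rate anywhere from about 0.74 to 0.97, which is the reason not to celebrate or panic over a two-point change.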

Mark this step done

2.10 Confidence calibration check

For the non-tech learner

Does the AI know what it knows? Group POIs by the confidence it gave, then check how often each group was actually right. If the "0.9" group is right ~90% of the time, it's well-calibrated. Big gaps mean the confidence number can't be trusted yet.

CALIBRATION TABLE  (fill after running the grader per bucket)

Confidence bucket | # POIs | Actually correct | Actual accuracy
0.9 - 1.0         |        |                  |        %
0.7 - 0.9         |        |                  |        %
0.5 - 0.7         |        |                  |        %
below 0.5         |        |                  |        %

Reading it: actual accuracy should roughly MATCH the bucket.
If the 0.9 bucket is only 60% correct -> the model is
overconfident -> raise the THRESHOLD or improve the prompt.
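The table can be filled by a few lines of code once each POI has a confidence and a right/wrong verdict from the grader. This is a minimal sketch: the input format (pairs of confidence and a True/False correctness flag) is an assumption about how you record the verdicts, and the bucket edges mirror the table above.

```python
# (low edge inclusive, high edge exclusive, label); the top edge is nudged so 1.0 is included
BUCKETS = [
    (0.9, 1.0001, "0.9 - 1.0"),
    (0.7, 0.9, "0.7 - 0.9"),
    (0.5, 0.7, "0.5 - 0.7"),
    (0.0, 0.5, "below 0.5"),
]

def calibration_table(results):
    """results: iterable of (confidence, was_correct) pairs.
    Returns rows of (bucket label, count, number correct, accuracy or None if empty)."""
    results = list(results)
    rows = []
    for low, high, label in BUCKETS:
        verdicts = [ok for conf, ok in results if low <= conf < high]
        n = len(verdicts)
        correct = sum(1 for ok in verdicts if ok)
        rows.append((label, n, correct, correct / n if n else None))
    return rows

demo = [(0.95, True), (0.92, False), (1.0, True), (0.75, True), (0.4, False)]
for label, n, correct, acc in calibration_table(demo):
    shown = "n/a" if acc is None else f"{acc:.0%}"
    print(f"{label:<10} | {n:>3} | {correct:>3} | {shown}")
```

Buckets with only a handful of POIs are noisy, so read the percentages together with the counts.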

Mark this step done

2.11 Model routing — Gemma vs Claude

For the non-tech learner

You don't have to pick one AI forever. A smart pattern: do the bulk locally on Gemma (free, private), and send only the hard, low-confidence cases to Claude (smarter, costs a little). This is your one "agentic" decision — the system choosing its own next step.

flowchart TD
  POI[POI to process] --> LOC[Gemma local<br/>free, private]
  LOC --> CONF{Confidence high?}
  CONF -->|Yes| KEEP[Keep result]
  CONF -->|No| CLA[Send hard case<br/>to Claude API]
  CLA --> KEEP

	Gemma (local)	Claude (hosted)
Privacy	data stays on Mac	sent to cloud
Cost	$0 per use	pay per token
Quality	good	higher
Speed	depends on Mac	fast
Offline	yes	no
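The routing decision in the flowchart fits in one small function. In this sketch, local_extract and remote_extract are placeholders for whatever you use to call Gemma and Claude; they are passed in as plain functions so the routing logic can be tested without either model. The 0.6 default mirrors THRESHOLD from 2.7.

```python
def route(record, local_extract, remote_extract, threshold=0.6):
    """Try the local model first; escalate only when it fails or is unsure.

    local_extract and remote_extract are any callables that take a record and
    return a dict containing a "confidence" value.
    Returns (result, which) where which is "local" or "remote".
    """
    try:
        result = local_extract(record)
    except Exception:
        result = None  # a crash or unparseable answer counts as a hard case
    if result is not None and result["confidence"] >= threshold:
        return result, "local"
    return remote_extract(record), "remote"
```

Counting how many records come back tagged "remote" gives a direct estimate of what the hosted calls will cost for a full run.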

Alternatives (for later)

Swap which Claude model you escalate to (Haiku = cheapest/fast, Sonnet = balanced, Opus = strongest). The migration is often just replacing ollama.chat() with the Anthropic SDK call.

Sunday capstone built 🎉

/ Pack 3

Reference cards

Keep these open while you work.

Command cheat-sheet

CLAUDE CODE
  claude              start it
  /init               create CLAUDE.md
  /clear              wipe conversation memory (between tasks)
  /cost               see how much you've spent
  claude doctor       check your install

GIT
  git add .           stage changes
  git commit -m "msg" save a snapshot
  git push            upload to GitHub
  git restore .       undo uncommitted changes

WEB APP (Next.js)
  npm run dev         run locally
  npm test            run tests
  npm run build       production build check

OLLAMA / GEMMA
  ollama pull gemma3:12b   download the model
  ollama run gemma3:12b    chat in terminal
  ollama ps                what's loaded + memory
  ollama list              downloaded models

Troubleshooting — first things to try

For the non-tech learner

When stuck, copy the exact error and paste it into Claude Code's debug prompt (1.4-C). Most fixes start there.

Symptom	First thing to try
"command not found"	tool isn't installed / restart the terminal
app won't load on localhost	is `npm run dev` still running? right URL/port?
secret / key errors	is it in .env AND in the host's settings?
Gemma is very slow	try gemma3:4b, or close other heavy apps
Gemma returns broken JSON	set temperature 0, keep "JSON only", retry
deploy build fails	read the build log top-to-bottom; paste it to Claude
git push rejected	run `git pull` first, then push again
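For the "broken JSON" row, the retry step can be wrapped in a small helper. This is a sketch under one assumption: ask is any zero-argument function you write that sends the prompt to the model and returns its reply as text, so the helper itself stays independent of Ollama.

```python
import json

def parse_with_retry(ask, max_attempts=3):
    """Call ask() until the reply parses as a JSON object, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        text = ask()
        try:
            data = json.loads(text)
        except json.JSONDecodeError as e:
            last_error = e.msg
            continue
        if isinstance(data, dict):
            return data
        last_error = "reply was valid JSON but not an object"
    raise ValueError(f"no valid JSON after {max_attempts} attempts: {last_error}")
```

If every attempt fails, the error is raised rather than swallowed, so the pipeline's existing try/except can flag the record for review.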

Glossary at a glance

Vibe coding: describe in words, AI writes the code, you review & test
LLM: the AI (Claude, Gemma)
IDE: the app you edit code in (VS Code)
Terminal: the text window for typing commands
Git / GitHub: version snapshots / online store + deploy trigger
PRD / MVP: one-page plan / smallest useful version
Production: the live version users visit
Deploy: publish the app to a host
localhost: your own computer as a private server
.env / secret: values like API keys, kept out of code & GitHub
Context window: the AI's working memory; clear it between tasks
Agent / workflow: AI picks its own steps / fixed steps AI fills in
Open / local model: a model you download & run yourself (Gemma)
Quantization (Q4): shrink a model to fit memory, minor quality loss
POI: a place: name + category + location
Schema: a strict shape for data (every record same fields)
Eval / golden set: a repeatable quality test / hand-verified correct answers
Precision / Recall: right-of-returned / caught-of-existing
Calibration: does "90% sure" mean right 90% of the time?
robots.txt / ToS: bot rules / the contract you accept

Go deeper (official first)

Anthropic Academy
Claude Code docs
Building Effective Agents
Prompt engineering tutorial
Ollama docs
Google Gemma docs
GitHub Hello World
Vercel docs
Overpass Turbo
Foursquare OS Places
Overture Maps

A note on accuracy

Model names, prices, and free tiers change month to month. Double-check any figures (Claude pricing, Gemma sizes, Mac configs, host tiers) on the official pages before you budget or buy.
