AI Coding Agents for Economics Research

Claude Code. From Messy Code to AER-Ready Replication Package. Day 2

Alexander Rieber (@AlexRieber) · Dennis Steinle (@DennisSteinle) · Ulm University

2026-04-22

Workshop Overview

Part Topic Duration
1 Recap: what is a coding agent? 5 min
2 Claude Code setup, quick check 10 min
3 The CLAUDE.md briefing 15 min
4 Commands, skills, permissions, hooks 20 min
5 Research best practices 20 min
6 Live demo: messy code to replication package 20 min
7 Hands-on: your turn self-paced

Prerequisite: Docker container running (see Docker Setup How-To). Claude Code authenticated (either claude login on Max, or ANTHROPIC_API_KEY set). If you want to use non-Claude models, the Day 2 tutorial appendix covers Aider + OpenRouter.

Part 1: Recap

The AI Spectrum

💬

Chatbot

You ask questions.
AI answers in text.
You do the work.

🛠️

Copilot

AI suggests code.
You review & accept.
Shared work.

🤖

Agent

AI plans, writes, runs
& debugs code.
AI does the work.

Why Claude Code today

🟣 Claude Code

  • Terminal-native, Anthropic-built
  • CLAUDE.md briefing file
  • First-class plan mode
  • Custom skills, hooks, permission allow-list
  • Subagents for independent subtasks
  • Models: Opus / Sonnet / Haiku

Other options exist

We're going deep on Claude Code because the full feature set (plan / hooks / skills / subagents) is the best fit for long research refactors. Aider + OpenRouter is the flexible alternative when you need other model families. See the tutorial appendix.

Today’s Task

🧹 → 📦 Messy Project → AER Replication Package

7 R scripts, 5 data files (.dta + .csv), no documentation. Use Claude Code to:

✅ Reorganize into clean directories
✅ Write an AEA-compliant README
✅ Create a master script & Makefile
✅ Fix code to use relative paths (see the sketch below)
✅ Add dependency management
✅ Produce all tables & figures

Data: Kessler & Roth (2025), “Increasing Organ Donor Registration” (openICPSR 195641).
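
The relative-paths item, sketched as a before/after in R (the "before" setwd() path is illustrative, not taken from the actual scripts):

# Before: breaks on any machine but the original author's
# setwd("C:/Users/author/Dropbox/organ_donation/")
# df <- read.csv("C:/Users/author/Dropbox/organ_donation/nextofkin_raw.csv")

# After: runs from the project root with a relative path
df <- read.csv("data/raw/nextofkin_raw.csv")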

Part 2: Setup

Into the container, claude ready

$ docker exec -it econ-replication-agent bash
root@container:~# cd /workspace/messy_project

Two authentication paths (Day 1 covers both):

Max subscription
$ claude login
API key
$ export ANTHROPIC_API_KEY=sk-ant-…

Sanity check

$ claude
 /help    # all built-in commands
 /model   # confirm which Claude model is active
 /cost    # tokens on API, or "uses Max"

If /cost reports real numbers (or "uses Max"), billing is wired up.

Part 3: The instruction file

Same file, many names

The agent reads a Markdown file on startup.

Agent File
Claude Code CLAUDE.md
Aider CONVENTIONS.md
Gemini CLI GEMINI.md
Codex CLI AGENTS.md

Same content, different filenames. Draft once, cp it across agents.

Think of it as a briefing, not a config.

Anatomy of a good briefing

🎯 Mission
One paragraph. What is the project? What's the job?
📋 Principles
Numbered non-negotiable rules.
🏗 Structure
Expected directory tree.
⚙️ Technical specs
Packages, output formats, SE specifications, seeds.
🚫 Don'ts
What must never happen. "Never touch data/raw/."
📞 Protocol
What to do when stuck. Where to log progress.

Example: mission & principles

# Replication Package Agent

## Mission
Transform the messy project folder into an AER-compliant replication
package. Reorganize code, write documentation, create a master script,
produce all tables and figures in one reproducible pipeline.

## Principles
1. Never delete or modify original data files. Copy to data/raw/.
2. All code runs from the project root using relative paths.
3. R with tidyverse. Load packages via library() in 00_setup.R only.
4. Tables: modelsummary() saved as .csv and .tex.
5. Figures: ggsave() saved as .pdf and .png at 300 DPI.
6. Standard errors: heteroskedasticity-robust unless otherwise specified.
7. Follow the AEA Social Science Data Editors README template.
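
To make principles 3-6 concrete, here is a minimal sketch of what one refactored script could look like. The variable names (registered, treatment) and the use of fixest for heteroskedasticity-robust standard errors are illustrative assumptions, not the paper's actual specification.

# Sketch of principles 4-6 in one script (illustrative, not the generated code)
# In the real package these library() calls live only in 00_setup.R (principle 3)
library(fixest)
library(modelsummary)
library(ggplot2)

df <- read.csv("data/raw/nextofkin_raw.csv")   # relative path, per principle 2

# Principle 6: heteroskedasticity-robust standard errors
m <- feols(registered ~ treatment, data = df, vcov = "hetero")

# Principle 4: every table saved as both .tex and .csv
modelsummary(m, output = "output/tables/table2_main.tex")
modelsummary(m, output = "output/tables/table2_main.csv")

# Principle 5: every figure saved as both .pdf and .png at 300 DPI
p <- ggplot(df, aes(factor(treatment), registered)) +
  stat_summary(fun = mean, geom = "bar")
ggsave("output/figures/fig1_registration.pdf", p, width = 6, height = 4)
ggsave("output/figures/fig1_registration.png", p, width = 6, height = 4, dpi = 300)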

Example: structure & specs

## Project Structure
replication_package/
├── README.md
├── LICENSE.txt
├── Makefile
├── master.R
├── code/
│   ├── 00_setup.R
│   ├── 01_descriptive.R
│   ├── 02_main_analysis.R
│   ├── 03_nok_analysis.R
│   ├── 04_dmv_analysis.R
│   ├── 05_robustness.R
│   └── 06_figures.R
├── data/
│   ├── raw/
│   └── README_data.md
└── output/
    ├── tables/
    └── figures/

## Don'ts
- Never modify files in data/raw/.
- No install.packages() outside 00_setup.R.
- No hardcoded absolute paths.
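
For orientation, a minimal master.R consistent with the tree above might look like this (a sketch with deliberately simple error handling, not the script the agent will actually generate):

# master.R -- run the full pipeline from the project root
source("code/00_setup.R")   # packages, seed, output directories

scripts <- c(
  "code/01_descriptive.R",
  "code/02_main_analysis.R",
  "code/03_nok_analysis.R",
  "code/04_dmv_analysis.R",
  "code/05_robustness.R",
  "code/06_figures.R"
)

for (s in scripts) {
  message("Running ", s)
  source(s)
}
message("Done. See output/tables/ and output/figures/.")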

/init bootstraps the first draft

$ cd /workspace/messy_project
$ claude
 /init

Scanning repo… reading sample of files…
 Wrote CLAUDE.md (62 lines)

Treat the output as a first draft. Edit to add your domain-specific rules, then commit to git.

Six design rules

1. Specific, not vague
"Clean up" ❌ → "Refactor into code/02_main_analysis.R" ✅
2. Show the expected tree
A directory tree beats 1000 words of description.
3. Define output formats
"Tables as .tex + .csv" / "Figures as PDF at 300 DPI"
4. Pin the conventions
AEA template sections, clustering specs, relative paths only.
5. State what NOT to do
"Never delete original data" / "No install.packages() in scripts"
6. Use an LLM to write it
Draft in claude.ai, then iterate: "add a section on X".

Part 4: Commands, skills, permissions

Slash commands

Command What it does
/plan Enter planning mode. Draft, wait for approval.
/init Auto-draft CLAUDE.md from the repo
/compact Compress a long conversation to save context
/clear Clear history (keep files)
/cost API spend this session (or “uses Max”)
/model Switch opus / sonnet / haiku
/your-skill Run a custom command from .claude/commands/

/plan: think before you act

❌ Without /plan

> Reorganize this project into
  an AER replication package

Agent starts moving files immediately. You find out what it did after.

✅ With /plan

> /plan
> Reorganize this project into
  an AER replication package

Agent reads files, drafts a plan, waits for your approval.

Custom skills

Drop a Markdown file in .claude/commands/. It becomes a slash command:

.claude/commands/check-package.md
Verify the replication package is AER-compliant. Report pass/fail for each:

1. README.md present and follows the AEA template sections.
2. Every script referenced in README.md exists.
3. `Rscript master.R` completes without errors from a clean session.
4. Every table/figure listed in the README is produced.
5. No absolute paths, no `install.packages()` outside `00_setup.R`.
6. Data in `data/raw/` is byte-identical to the inputs.

Type /check-package → the agent runs the checklist. Skills travel with the repo.
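
Check 6 is the easiest one for an agent to hand-wave. One way to make it mechanical, sketched in R (the ../messy_project location of the original files is an assumption about your layout):

# Sketch of check 6: data/raw/ must be byte-identical to the original inputs
orig <- list.files("../messy_project", pattern = "\\.(dta|csv)$", full.names = TRUE)
copy <- file.path("data/raw", basename(orig))

same <- unname(tools::md5sum(orig)) == unname(tools::md5sum(copy))
bad  <- is.na(same) | !same          # mismatched or missing copies
if (!any(bad)) {
  message("PASS: data/raw/ matches the original inputs")
} else {
  message("FAIL: ", paste(basename(orig)[bad], collapse = ", "))
}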

Permissions: three gears

🔒

Default

Asks before
every command

claude
📋

Allow-list

Pre-approve
specific commands

settings.json

⚡

Skip all

Full autonomy
(no prompts)

--dangerously-skip-permissions

Inside Docker, "skip all" is reasonable. The container is your sandbox; the host is untouched.

settings.json allow-list

.claude/settings.json
{
  "permissions": {
    "allow": ["Bash(Rscript:*)", "Bash(R:*)", "Bash(mkdir:*)",
              "Bash(cp:*)", "Bash(mv:*)", "Bash(make:*)",
              "Write(*)", "Read(*)"],
    "deny":  ["Bash(rm -rf /*)", "Bash(git push:*)",
              "Write(data/raw/**)"]
  }
}

Let the agent run R, move files, read and write. Block rm -rf /, git push, and any writes to data/raw/.

Hooks: enforce rules outside the prompt

.claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      { "matcher": { "tool": "Write", "path_glob": "code/*.R" },
        "command": "Rscript -e 'lintr::lint(\"$CLAUDE_FILE_PATH\")'" },
      { "matcher": { "tool": "Write", "path_glob": "data/raw/**" },
        "command": "echo 'BLOCKED: data/raw/ is read-only'; exit 1" }
    ]
  }
}

The harness runs these, not the agent. A rule you write once is enforced every time, even if the agent forgets it.

Part 5: Research best practices

The first rule: verify, don’t trust

Every agent output is a first draft from a capable but junior RA.

✅ Do
  • Read every diff before accepting.
  • Run master.R yourself from a clean R session.
  • Sanity-check results against known priors.
  • Spot-check regression coefficients against a prior version (see the sketch below).
❌ Don't
  • Blindly accept a 300-line diff.
  • Trust agent summaries of what it did.
  • Skip running the code yourself.
  • Let the agent be the sole author of identification decisions.

If you’d check an RA’s first output, you check the agent’s. Same rule.
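
The spot-check doesn't need to be elaborate. A sketch in R, assuming you keep the previous run's table around (the file names are illustrative):

# Compare the new regression table against the one from the previous run
old <- read.csv("output/tables/table2_main_old.csv")
new <- read.csv("output/tables/table2_main.csv")

cmp <- all.equal(old, new)
if (isTRUE(cmp)) {
  message("PASS: regression table unchanged")
} else {
  print(cmp)   # lists exactly which columns and values drifted
}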

Delegate paperwork, keep the research

🤖 Delegate
  • Folder inventory, code audit
  • Bulk rewrites (rename, reformat, port API)
  • Download / parse / clean glue code
  • README and docstrings
  • Running → failing → fixing loops
  • Generating variants for comparison
🧠 Keep for yourself
  • Research question and motivation
  • Identification strategy
  • Choice of estimator and SEs
  • Interpretation of results
  • Final-paper prose (voice matters)
  • Anything involving sensitive data

Agents win on the paperwork around the research, not the research itself.

Data privacy & IRB

🚫 Never put these in front of a cloud agent

  • PII (names, addresses, precise geocodes)
  • IRB-restricted microdata
  • Unpublished results
  • Third-party data under NDA or restricted-use DUA

✅ Practical mitigations

  • Anthropic: training opt-out is default for API; confirm in console settings.
  • Claude Code: sensitive folders in .claudeignore.
  • Docker: mount only what the agent needs; keep the raw extract outside.
  • When in doubt, don't. Run restricted data on a local model on a secure machine.

Session hygiene beats context degradation

📝 Write things down
Decisions live in CLAUDE.md / DECISIONS.md, not the chat buffer.
🧹 Short sessions
Break a full replication into 5 sessions, not one 3-hour one.
🔄 /clear between topics
Context degrades. Fix with hygiene, not a bigger window.
📂 Files as handoff
Next session re-reads files, not the prior chat.

The canonical symptom: the agent forgets a rule that is in CLAUDE.md. The fix: /clear; CLAUDE.md is re-read on the next message.

The “referee 2” protocol

Inspired by Scott Cunningham. Use a second fresh session as an independent reviewer:

# Terminal 1: build the package
$ claude --dangerously-skip-permissions
 (builds replication_package/)

# Terminal 2: fresh session, clean context
$ claude
 You are a referee for the AEA Data Editor. Review
  /workspace/replication_package/ against the AEA template.
  Check: README sections, master.R runs clean, tables produced,
  packages listed with versions. Report pass/fail for each item.

A fresh context has no bias from the construction. Cheapest code review you’ll ever run.

Reproducibility is the point

Git is your safety net
  • Commit frequently. Hit Esc and re-prompt if a diff looks wrong.
  • Commit CLAUDE.md and custom skills. They're part of the repo.
  • A clean git log is the audit trail.
Environment is your truth
  • Docker image pins R + system libs.
  • renv::snapshot() pins package versions (see the sketch below).
  • Always test from a clean R session, not yours.

Cardinal AEA rule: “Reproduce the tables and figures by running the code without manual intervention, starting from the raw data.”
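
The renv side of that is three calls (this assumes renv itself is already installed, e.g. baked into the Docker image):

renv::init()       # one-time: create a project library and renv.lock
renv::snapshot()   # after the pipeline runs clean: record exact package versions
renv::restore()    # what a replicator runs to rebuild the same library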

Prompt patterns that work

Pattern Example
Vague → iterate "I want X. Ask anything unclear, then do it."
Scripts, not commands "Write download_data.R", not "download the data"
Style by reference "Follow Kieran Healy's ggplot2 conventions"
Small iterations Three narrow prompts beat one monster prompt
Context → goal → constraints "This replicates X. Refactor scripts. Relative paths only."
Show, don't tell A directory tree beats paragraphs of description

Model economics

Task Good choice
/plan on a hard refactor Opus
Everyday execution Sonnet (default for today)
Small edits, tight loops Haiku

Use /plan on Opus, then /model sonnet after approving. On Max, this costs nothing extra; on API, it’s the single biggest cost-saver.

Want to compare to a non-Claude model (DeepSeek Reasoner, GPT-5)? See the Aider + OpenRouter appendix in the tutorial.

Part 6: Live demo

Before → after

🧹 Before (messy_project/)

1_experiment_clean.dta    ← Stata
2_nextofkin_clean.dta
3_dmv_quarterly_clean.dta
analysis_v2_FINAL.R       ← which version?
dmv_analysis_v3.R         ← v3 of what?
dmv_figures.R
make figs.R               ← space in name!
nok_analysis.R
notes.txt                 ← scratch notes
old_robustness_checks.R   ← "old"?
quick_look.R              ← exploratory
Table1_descriptive.R      ← CamelCase
nextofkin_raw.csv         ← raw data
dmv_quarterly_raw.csv

📦 After (replication_package/)

README.md
LICENSE.txt
Makefile
master.R
code/
  00_setup.R
  01_descriptive.R
  02_main_analysis.R
  03_nok_analysis.R
  04_dmv_analysis.R
  05_robustness.R
  06_figures.R
data/raw/         ← .dta + .csv
output/tables/    ← .tex + .csv
output/figures/   ← .pdf + .png

Claude Code run

$ cd /workspace/messy_project
$ claude --dangerously-skip-permissions

/plan
Analyze all files. Plan the reorganization
  into an AER-compliant replication package
  per CLAUDE.md.

Planning… reading 7 R scripts, 5 data files…
📋 Plan:
  1. Create dir structure
  2. Copy data → data/raw/
  3. Refactor 6 scripts into code/
  4. Create master.R, 00_setup.R, Makefile
  5. Write README.md per AEA template
  6. Run master.R, debug until clean

What the AER README needs

Required sections

  1. Overview
  2. Data availability & provenance
  3. Computational requirements
  4. Description of programs
  5. Instructions to replicators
  6. Table / figure → program map
  7. References

Common failures

  • Missing README sections
  • Code does not run clean
  • Hardcoded file paths
  • Missing data citations
  • No master script
  • Exploratory files included
  • Missing dependencies

"Reproduce tables and figures by running code without manual intervention, starting from raw data." (AEA Data Editor)

Part 7: Your turn

Exercise, step by step

1
Enter Docker, confirm Claude is authenticated.
2
Draft the briefing: run claude, then /init; edit CLAUDE.md to add the AEA rules.
3
Start with /plan. Review, approve, let it run.
4
Build a /check-package skill and run it when the agent says "done".
5
Verify in a clean shell: Rscript master.R. Read the outputs.
6
Referee 2. Fresh claude session; audit the package against the AEA template.

Quick reference

🚀 Start

$ cd /workspace/messy_project
$ claude --dangerously-skip-permissions
 /init     # draft CLAUDE.md
 /plan     # plan the refactor

💬 Starter prompt

 /plan
 Analyze every file. Plan the
  reorganization per CLAUDE.md.
  Wait for my OK.

🔧 Useful commands

/plan think first, act second
/compact compress long sessions
/cost check API spend
Esc cancel / rewind

✅ Verify

# cd replication_package
# Rscript master.R
# ls output/tables/ output/figures/

Stuck? Paste the exact error back to Claude. Out of ideas? Run the referee-2 workflow on your own build.

Do / don’t

✅ Do

  • Read the messy files before starting
  • Start with /plan and review
  • Iterate: "fix X", "add Y", "change Z"
  • Check /cost between steps
  • Add a /check-package skill
  • Run the code yourself in a clean shell

❌ Don't

  • Give vague instructions
  • Skip the briefing file
  • Argue with the model; just hit Esc and re-prompt
  • Let one session run for three hours
  • Trust the agent's own summary of what it did
  • Put restricted data in front of a cloud model

Bonus challenges

Build with Claude Code, then open a second fresh claude session and have it audit /workspace/replication_package/ against the AEA template. What did it flag that you missed?

Create .claude/commands/ with:

  • /check-package: verify AER compliance
  • /compare-output: summarize generated tables
  • /referee: run the referee 2 checklist

Take one of your half-finished research projects. Draft a CLAUDE.md. Ask the agent to inventory it and propose a reorganization without executing anything. Read the plan. What does it see that you missed?

With the Aider + OpenRouter appendix, re-run today’s task on DeepSeek Reasoner or GPT-5. Compare plan quality, diffs, and cost against your Claude Sonnet run.

Summary

📝
CLAUDE.md is everything. A great briefing = a great agent.
🔍
Plan first, execute second. Use /plan for anything non-trivial.
🔐
Permissions match your trust. Default → allow-list → skip-all, the last only inside Docker.
🛠️
Custom skills travel with the repo. Version-control your checks.
🧠
Delegate the paperwork, keep the research. You verify; the agent types.
🤝
Referee 2. A fresh session is the cheapest code review you'll ever run.

Resources

Claude Code docs code.claude.com/docs
Anthropic Console console.anthropic.com
AEA Data Editor guidance aeadataeditor.github.io
AEA README template social-science-data-editors.github.io
Goldsmith-Pinkham on Claude Code part 1 · part 2
Tutorial appendix (Aider + OpenRouter) Use non-Claude models
Workshop repo AlexRieber/Workshops

Questions?

Now let the agent do the work.