AI Coding Agents for Economics Research

Claude Code. From Messy Code to AER-Ready Replication Package. Day 2

Alexander Rieber (@AlexRieber) · Dennis Steinle (@DennisSteinle) · Ulm University

2026-04-22

Workshop Overview

Part Topic Duration
1 Recap: what is a coding agent? 5 min
2 Claude Code setup, quick check 10 min
3 The CLAUDE.md briefing 15 min
4 Commands, skills, permissions, hooks 20 min
5 Research best practices 20 min
6 Live demo: messy code to replication package 20 min
7 Hands-on: your turn self-paced

Prerequisite: Docker container running (see Docker Setup How-To). Claude Code authenticated (either claude login on Max, or ANTHROPIC_API_KEY set). If you want to use non-Claude models, the Day 2 tutorial appendix covers Aider + OpenRouter.

Part 1: Recap

The AI Spectrum

💬

Chatbot

You ask questions.
AI answers in text.
You do the work.

🛠️

Copilot

AI suggests code.
You review & accept.
Shared work.

🤖

Agent

AI plans, writes, runs
& debugs code.
AI does the work.

Why Claude Code today

🟣 Claude Code

  • Terminal-native, Anthropic-built
  • CLAUDE.md briefing file
  • First-class plan mode
  • Custom skills, hooks, permission allow-list
  • Subagents for independent subtasks
  • Models: Opus / Sonnet / Haiku

Other options exist

We're going deep on Claude Code because the full feature set (plan / hooks / skills / subagents) is the best fit for long research refactors. Aider + OpenRouter is the flexible alternative when you need other model families. See the tutorial appendix.

Today’s Task

🧹 → 📦 Messy Project → AER Replication Package

7 R scripts, 5 data files (.dta + .csv), no documentation. Use Claude Code to:

✅ Reorganize into clean directories
✅ Write an AEA-compliant README
✅ Create a master script & Makefile
✅ Fix code to use relative paths (see the sketch below)
✅ Add dependency management
✅ Produce all tables & figures

Data: Kessler & Roth (2025), “Increasing Organ Donor Registration” (openICPSR 195641).
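
The relative-paths item, sketched as a before/after in R (the "before" setwd() path is illustrative, not taken from the actual scripts):

# Before: breaks on any machine but the original author's
# setwd("C:/Users/author/Dropbox/organ_donation/")
# df <- read.csv("C:/Users/author/Dropbox/organ_donation/nextofkin_raw.csv")

# After: runs from the project root with a relative path
df <- read.csv("data/raw/nextofkin_raw.csv")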

Part 2: Setup

Into the container, claude ready

$ docker exec -it econ-replication-agent bash
root@container:~# cd /workspace/messy_project

Two authentication paths (Day 1 covers both):

Max subscription
$ claude login
API key
$ export ANTHROPIC_API_KEY=sk-ant-…

Sanity check

$ claude
 /help    # all built-in commands
 /model   # confirm which Claude model is active
 /cost    # tokens on API, or "uses Max"

If /cost reports real numbers (or "uses Max"), billing is wired up.

Part 3: The instruction file

Same file, many names

The agent reads a Markdown file on startup.

Agent File
Claude Code CLAUDE.md
Aider CONVENTIONS.md
Gemini CLI GEMINI.md
Codex CLI AGENTS.md

Same content, different filenames. Draft once, cp it across agents.

Think of it as a briefing, not a config.

Anatomy of a good briefing

🎯 Mission
One paragraph. What is the project? What's the job?
📋 Principles
Numbered non-negotiable rules.
🏗 Structure
Expected directory tree.
⚙️ Technical specs
Packages, output formats, SE specifications, seeds.
🚫 Don'ts
What must never happen. "Never touch data/raw/."
📞 Protocol
What to do when stuck. Where to log progress.

Example: mission & principles

# Replication Package Agent

## Mission
Transform the messy project folder into an AER-compliant replication
package. Reorganize code, write documentation, create a master script,
produce all tables and figures in one reproducible pipeline.

## Principles
1. Never delete or modify original data files. Copy to data/raw/.
2. All code runs from the project root using relative paths.
3. R with tidyverse. Load packages via library() in 00_setup.R only.
4. Tables: modelsummary() saved as .csv and .tex.
5. Figures: ggsave() saved as .pdf and .png at 300 DPI.
6. Standard errors: heteroskedasticity-robust unless otherwise specified.
7. Follow the AEA Social Science Data Editors README template.
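
To make principles 3-6 concrete, here is a minimal sketch of what one refactored script could look like. The variable names (registered, treatment) and the use of fixest for heteroskedasticity-robust standard errors are illustrative assumptions, not the paper's actual specification.

# Sketch of principles 4-6 in one script (illustrative, not the generated code)
# In the real package these library() calls live only in 00_setup.R (principle 3)
library(fixest)
library(modelsummary)
library(ggplot2)

df <- read.csv("data/raw/nextofkin_raw.csv")   # relative path, per principle 2

# Principle 6: heteroskedasticity-robust standard errors
m <- feols(registered ~ treatment, data = df, vcov = "hetero")

# Principle 4: every table saved as both .tex and .csv
modelsummary(m, output = "output/tables/table2_main.tex")
modelsummary(m, output = "output/tables/table2_main.csv")

# Principle 5: every figure saved as both .pdf and .png at 300 DPI
p <- ggplot(df, aes(factor(treatment), registered)) +
  stat_summary(fun = mean, geom = "bar")
ggsave("output/figures/fig1_registration.pdf", p, width = 6, height = 4)
ggsave("output/figures/fig1_registration.png", p, width = 6, height = 4, dpi = 300)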

Example: structure & specs

## Project Structure
replication_package/
├── README.md
├── LICENSE.txt
├── Makefile
├── master.R
├── code/
│   ├── 00_setup.R
│   ├── 01_descriptive.R
│   ├── 02_main_analysis.R
│   ├── 03_nok_analysis.R
│   ├── 04_dmv_analysis.R
│   ├── 05_robustness.R
│   └── 06_figures.R
├── data/
│   ├── raw/
│   └── README_data.md
└── output/
    ├── tables/
    └── figures/

## Don'ts
- Never modify files in data/raw/.
- No install.packages() outside 00_setup.R.
- No hardcoded absolute paths.
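
For orientation, a minimal master.R consistent with the tree above might look like this (a sketch with deliberately simple error handling, not the script the agent will actually generate):

# master.R -- run the full pipeline from the project root
source("code/00_setup.R")   # packages, seed, output directories

scripts <- c(
  "code/01_descriptive.R",
  "code/02_main_analysis.R",
  "code/03_nok_analysis.R",
  "code/04_dmv_analysis.R",
  "code/05_robustness.R",
  "code/06_figures.R"
)

for (s in scripts) {
  message("Running ", s)
  source(s)
}
message("Done. See output/tables/ and output/figures/.")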

/init bootstraps the first draft

$ cd /workspace/messy_project
$ claude
 /init

Scanning repo… reading sample of files…
 Wrote CLAUDE.md (62 lines)

Treat the output as a first draft. Edit to add your domain-specific rules, then commit to git.

Six design rules

1. Specific, not vague
"Clean up" ❌ → "Refactor into code/02_main_analysis.R" ✅
2. Show the expected tree
A directory tree beats 1000 words of description.
3. Define output formats
"Tables as .tex + .csv" / "Figures as PDF at 300 DPI"
4. Pin the conventions
AEA template sections, clustering specs, relative paths only.
5. State what NOT to do
"Never delete original data" / "No install.packages() in scripts"
6. Use an LLM to write it
Draft in claude.ai, then iterate: "add a section on X".

Part 4: Commands, skills, permissions

Slash commands

Command What it does
/plan Enter planning mode. Draft, wait for approval.
/init Auto-draft CLAUDE.md from the repo
/compact Compress a long conversation to save context
/clear Clear history (keep files)
/cost API spend this session (or “uses Max”)
/model Switch opus / sonnet / haiku
/your-skill Run a custom command from .claude/commands/

/plan: think before you act

❌ Without /plan

> Reorganize this project into
  an AER replication package

Agent starts moving files immediately. You find out what it did after.

✅ With /plan

> /plan
> Reorganize this project into
  an AER replication package

Agent reads files, drafts a plan, waits for your approval.

Custom skills

Drop a Markdown file in .claude/commands/. It becomes a slash command:

.claude/commands/check-package.md
Verify the replication package is AER-compliant. Report pass/fail for each:

1. README.md present and follows the AEA template sections.
2. Every script referenced in README.md exists.
3. `Rscript master.R` completes without errors from a clean session.
4. Every table/figure listed in the README is produced.
5. No absolute paths, no `install.packages()` outside `00_setup.R`.
6. Data in `data/raw/` is byte-identical to the inputs.

Type /check-package → the agent runs the checklist. Skills travel with the repo.
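
Check 6 is the easiest one for an agent to hand-wave. One way to make it mechanical, sketched in R (the ../messy_project location of the original files is an assumption about your layout):

# Sketch of check 6: data/raw/ must be byte-identical to the original inputs
orig <- list.files("../messy_project", pattern = "\\.(dta|csv)$", full.names = TRUE)
copy <- file.path("data/raw", basename(orig))

same <- unname(tools::md5sum(orig)) == unname(tools::md5sum(copy))
bad  <- is.na(same) | !same          # mismatched or missing copies
if (!any(bad)) {
  message("PASS: data/raw/ matches the original inputs")
} else {
  message("FAIL: ", paste(basename(orig)[bad], collapse = ", "))
}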

Permissions: three gears

🔒

Default

Asks before
every command

claude
📋

Allow-list

Pre-approve
specific commands

settings.json

⚡

Skip all

Full autonomy
(no prompts)

--dangerously-skip-permissions

Inside Docker, "skip all" is reasonable. The container is your sandbox; the host is untouched.

settings.json allow-list

.claude/settings.json
{
  "permissions": {
    "allow": ["Bash(Rscript:*)", "Bash(R:*)", "Bash(mkdir:*)",
              "Bash(cp:*)", "Bash(mv:*)", "Bash(make:*)",
              "Write(*)", "Read(*)"],
    "deny":  ["Bash(rm -rf /*)", "Bash(git push:*)",
              "Write(data/raw/**)"]
  }
}

Let the agent run R, move files, read and write. Block rm -rf /, git push, and any writes to data/raw/.

Hooks: enforce rules outside the prompt

.claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      { "matcher": { "tool": "Write", "path_glob": "code/*.R" },
        "command": "Rscript -e 'lintr::lint(\"$CLAUDE_FILE_PATH\")'" },
      { "matcher": { "tool": "Write", "path_glob": "data/raw/**" },
        "command": "echo 'BLOCKED: data/raw/ is read-only'; exit 1" }
    ]
  }
}

The harness runs these, not the agent. A rule you write once is enforced every time, even if the agent forgets it.

Part 5: Research best practices

The first rule: verify, don’t trust

Every agent output is a first draft from a capable but junior RA.

✅ Do
  • Read every diff before accepting.
  • Run master.R yourself from a clean R session.
  • Sanity-check results against known priors.
  • Spot-check regression coefficients against a prior version (see the sketch below).
❌ Don't
  • Blindly accept a 300-line diff.
  • Trust agent summaries of what it did.
  • Skip running the code yourself.
  • Let the agent be the sole author of identification decisions.

If you’d check an RA’s first output, you check the agent’s. Same rule.
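
The spot-check doesn't need to be elaborate. A sketch in R, assuming you keep the previous run's table around (the file names are illustrative):

# Compare the new regression table against the one from the previous run
old <- read.csv("output/tables/table2_main_old.csv")
new <- read.csv("output/tables/table2_main.csv")

cmp <- all.equal(old, new)
if (isTRUE(cmp)) {
  message("PASS: regression table unchanged")
} else {
  print(cmp)   # lists exactly which columns and values drifted
}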

Delegate paperwork, keep the research

🤖 Delegate
  • Folder inventory, code audit
  • Bulk rewrites (rename, reformat, port API)
  • Download / parse / clean glue code
  • README and docstrings
  • Running → failing → fixing loops
  • Generating variants for comparison
🧠 Keep for yourself
  • Research question and motivation
  • Identification strategy
  • Choice of estimator and SEs
  • Interpretation of results
  • Final-paper prose (voice matters)
  • Anything involving sensitive data

Agents win on the paperwork around the research, not the research itself.

Data privacy & IRB

🚫 Never put these in front of a cloud agent

  • PII (names, addresses, precise geocodes)
  • IRB-restricted microdata
  • Unpublished results
  • Third-party data under NDA or restricted-use DUA

✅ Practical mitigations

  • Anthropic: training opt-out is default for API; confirm in console settings.
  • Claude Code: sensitive folders in .claudeignore.
  • Docker: mount only what the agent needs; keep the raw extract outside.
  • When in doubt, don't. Run restricted data on a local model on a secure machine.

Session hygiene beats context degradation

📝 Write things down
Decisions live in CLAUDE.md / DECISIONS.md, not the chat buffer.
🧹 Short sessions
Break a full replication into 5 sessions, not one 3-hour one.
🔄 /clear between topics
Context degrades. Fix with hygiene, not a bigger window.
📂 Files as handoff
Next session re-reads files, not the prior chat.

The canonical symptom: the agent forgets a rule that is in CLAUDE.md. The fix: /clear; CLAUDE.md is re-read on the next message.

The “referee 2” protocol

Inspired by Scott Cunningham. Use a second fresh session as an independent reviewer:

# Terminal 1: build the package
$ claude --dangerously-skip-permissions
 (builds replication_package/)

# Terminal 2: fresh session, clean context
$ claude
 You are a referee for the AEA Data Editor. Review
  /workspace/replication_package/ against the AEA template.
  Check: README sections, master.R runs clean, tables produced,
  packages listed with versions. Report pass/fail for each item.

A fresh context has no bias from the construction. Cheapest code review you’ll ever run.

Reproducibility is the point

Git is your safety net
  • Commit frequently. Hit Esc and re-prompt if a diff looks wrong.
  • Commit CLAUDE.md and custom skills. They're part of the repo.
  • A clean git log is the audit trail.
Environment is your truth
  • Docker image pins R + system libs.
  • renv::snapshot() pins package versions (see the sketch below).
  • Always test from a clean R session, not yours.

Cardinal AEA rule: “Reproduce the tables and figures by running the code without manual intervention, starting from the raw data.”
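
The renv side of that is three calls (this assumes renv itself is already installed, e.g. baked into the Docker image):

renv::init()       # one-time: create a project library and renv.lock
renv::snapshot()   # after the pipeline runs clean: record exact package versions
renv::restore()    # what a replicator runs to rebuild the same library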

Prompt patterns that work

Pattern Example
Vague → iterate "I want X. Ask anything unclear, then do it."
Scripts, not commands "Write download_data.R", not "download the data"
Style by reference "Follow Kieran Healy's ggplot2 conventions"
Small iterations Three narrow prompts beat one monster prompt
Context → goal → constraints "This replicates X. Refactor scripts. Relative paths only."
Show, don't tell A directory tree beats paragraphs of description

Model economics

Task Good choice
/plan on a hard refactor Opus
Everyday execution Sonnet (default for today)
Small edits, tight loops Haiku

Use /plan on Opus, then /model sonnet after approving. On Max, this costs nothing extra; on API, it’s the single biggest cost-saver.

Want to compare to a non-Claude model (DeepSeek Reasoner, GPT-5)? See the Aider + OpenRouter appendix in the tutorial.

Part 6: Live demo

Before → after

🧹 Before (messy_project/)

1_experiment_clean.dta    ← Stata
2_nextofkin_clean.dta
3_dmv_quarterly_clean.dta
analysis_v2_FINAL.R       ← which version?
dmv_analysis_v3.R         ← v3 of what?
dmv_figures.R
make figs.R               ← space in name!
nok_analysis.R
notes.txt                 ← scratch notes
old_robustness_checks.R   ← "old"?
quick_look.R              ← exploratory
Table1_descriptive.R      ← CamelCase
nextofkin_raw.csv         ← raw data
dmv_quarterly_raw.csv

📦 After (replication_package/)

README.md
LICENSE.txt
Makefile
master.R
code/
  00_setup.R
  01_descriptive.R
  02_main_analysis.R
  03_nok_analysis.R
  04_dmv_analysis.R
  05_robustness.R
  06_figures.R
data/raw/         ← .dta + .csv
output/tables/    ← .tex + .csv
output/figures/   ← .pdf + .png

Claude Code run

$ cd /workspace/messy_project
$ claude --dangerously-skip-permissions

/plan
Analyze all files. Plan the reorganization
  into an AER-compliant replication package
  per CLAUDE.md.

Planning… reading 7 R scripts, 5 data files…
📋 Plan:
  1. Create dir structure
  2. Copy data → data/raw/
  3. Refactor 6 scripts into code/
  4. Create master.R, 00_setup.R, Makefile
  5. Write README.md per AEA template
  6. Run master.R, debug until clean

What the AER README needs

Required sections

  1. Overview
  2. Data availability & provenance
  3. Computational requirements
  4. Description of programs
  5. Instructions to replicators
  6. Table / figure → program map
  7. References

Common failures

  • Missing README sections
  • Code does not run clean
  • Hardcoded file paths
  • Missing data citations
  • No master script
  • Exploratory files included
  • Missing dependencies

"Reproduce tables and figures by running code without manual intervention, starting from raw data." (AEA Data Editor)

Part 7: Your turn

Exercise, step by step

1
Enter Docker, confirm Claude is authenticated.
2
Draft the briefing: run claude, then /init; edit CLAUDE.md to add the AEA rules.
3
Start with /plan. Review, approve, let it run.
4
Build a /check-package skill and run it when the agent says "done".
5
Verify in a clean shell: Rscript master.R. Read the outputs.
6
Referee 2. Fresh claude session; audit the package against the AEA template.

Quick reference

🚀 Start

$ cd /workspace/messy_project
$ claude --dangerously-skip-permissions
 /init     # draft CLAUDE.md
 /plan     # plan the refactor

💬 Starter prompt

 /plan
 Analyze every file. Plan the
  reorganization per CLAUDE.md.
  Wait for my OK.

🔧 Useful commands

/plan think first, act second
/compact compress long sessions
/cost check API spend
Esc cancel / rewind

✅ Verify

# cd replication_package
# Rscript master.R
# ls output/tables/ output/figures/

Stuck? Paste the exact error back to Claude. Out of ideas? Run the referee-2 workflow on your own build.

Do / don’t

✅ Do

  • Read the messy files before starting
  • Start with /plan and review
  • Iterate: "fix X", "add Y", "change Z"
  • Check /cost between steps
  • Add a /check-package skill
  • Run the code yourself in a clean shell

❌ Don't

  • Give vague instructions
  • Skip the briefing file
  • Argue with the model; just hit Esc and re-prompt
  • Let one session run for three hours
  • Trust the agent's own summary of what it did
  • Put restricted data in front of a cloud model

Bonus challenges

Build with Claude Code, then open a second fresh claude session and have it audit /workspace/replication_package/ against the AEA template. What did it flag that you missed?

Create .claude/commands/ with:

  • /check-package: verify AER compliance
  • /compare-output: summarize generated tables
  • /referee: run the referee 2 checklist

Take one of your half-finished research projects. Draft a CLAUDE.md. Ask the agent to inventory it and propose a reorganization without executing anything. Read the plan. What does it see that you missed?

With the Aider + OpenRouter appendix, re-run today’s task on DeepSeek Reasoner or GPT-5. Compare plan quality, diffs, and cost against your Claude Sonnet run.

Summary

📝
CLAUDE.md is everything. A great briefing = a great agent.
🔍
Plan first, execute second. Use /plan for anything non-trivial.
🔐
Permissions match your trust. Default → allow-list → skip-all, the last only inside Docker.
🛠️
Custom skills travel with the repo. Version-control your checks.
🧠
Delegate the paperwork, keep the research. You verify; the agent types.
🤝
Referee 2. A fresh session is the cheapest code review you'll ever run.

Resources

Claude Code docs code.claude.com/docs
Anthropic Console console.anthropic.com
AEA Data Editor guidance aeadataeditor.github.io
AEA README template social-science-data-editors.github.io
Goldsmith-Pinkham on Claude Code part 1 · part 2
Tutorial appendix (Aider + OpenRouter) Use non-Claude models
Workshop repo AlexRieber/Workshops

Questions?

Now let the agent do the work.