AI Coding Agents for Economics Research
Claude Code. From Messy Code to AER-Ready Replication Package. Day 2
Who is this for? Master’s students, PhD students, and post-docs who want to use Claude Code from the terminal to organise, document, and package research code to professional (AER/AEA) standards.
This is Day 2. We go deeper on Claude Code specifically: custom skills, hooks, plan mode, permission allow-lists, research best practices, and the messy-project-to-replication-package exercise. If you skipped Day 1, Part 2 repeats the Claude Code setup so you can follow along.
Prerequisites: Docker container built and running (see Docker Setup How-To). Either a Claude Max subscription, or an Anthropic API key with a few dollars of credit.
Want other model families (GPT-5, DeepSeek, Gemini)? See Appendix A: Using other models via Aider + OpenRouter at the end of this handout. It’s a self-contained setup you can use alongside Claude Code.
1 Part 1: What you will build today
1.1 The task
You have inherited a messy project folder — the kind every researcher has at some point. It contains:
- R scripts with inconsistent names (`analysis_v2_FINAL.R`, `make figs.R`)
- No README, no master script, no documentation
- Data files mixed with code
- Hardcoded file paths and commented-out code
- No dependency management
- Exploratory scripts mixed with the final analysis
Your job: use Claude Code to transform this mess into a publication-ready replication package that meets the standards of the American Economic Association (AEA).
1.2 The data
The project uses the actual replication data from Kessler & Roth (2025), “Increasing Organ Donor Registration as a Means to Increase Transplantation: An Experiment with Actual Organ Donor Registrations” (AEJ: Economic Policy, openICPSR 195641). The experiment tested whether changing how organ-donor registration questions are framed (opt-in vs. active choice) affects real registration decisions.
The messy folder contains three Stata datasets (.dta) and two raw CSV files, mixed with nine R scripts of varying quality and a notes.txt:
| File | Description | Observations |
|---|---|---|
| `1_experiment_clean.dta` | Lab experiment waves 1–3 (real registrations) | 1,043 |
| `2_nextofkin_clean.dta` | Next-of-kin experiment (MTurk) | 803 |
| `3_dmv_quarterly_clean.dta` | State-level DMV registration panel | 1,416 |
| `dmv_quarterly_raw.csv` | Raw quarterly DMV data | — |
| `nextofkin_raw.csv` | Raw next-of-kin Qualtrics data | — |
1.3 What a finished package looks like
By the end, you will have:
replication_package/
├── README.md # Structured per AEA template
├── LICENSE.txt # MIT for code, CC-BY for data
├── Makefile # One command reproduces everything
├── master.R # Runs all scripts in correct order
├── CLAUDE.md # The briefing
├── .claude/
│ ├── settings.json # Permission allow-list + hooks
│ └── commands/
│ ├── check-package.md # Custom skill: AER compliance check
│ └── referee.md # Custom skill: referee-2 audit
├── code/
│ ├── 00_setup.R # Install/load packages
│ ├── 01_descriptive.R # Table 1: summary statistics
│ ├── 02_main_analysis.R # Tables 2–3: treatment effects
│ ├── 03_nok_analysis.R # Table 4: next-of-kin results
│ ├── 04_dmv_analysis.R # Table 5: state-level DiD
│ ├── 05_robustness.R # Appendix robustness checks
│ └── 06_figures.R # All figures
├── data/
│ ├── raw/ # Original CSV/Stata files
│ └── README_data.md # Variable descriptions
└── output/
├── tables/ # LaTeX and CSV tables
└── figures/ # PDF and PNG figures
2 Part 2: Claude Code setup
Skip this part if you did Day 1 and Claude Code is already authenticated. Otherwise, ten minutes gets you from zero to an authenticated session.
Two payment paths.
2.1 Track A. Claude Max (flat subscription)
Good if you expect heavy daily use.
- Subscribe. Open claude.ai/settings/plans, pick Max 5× or Max 20×.
- Enter the container: `docker exec -it econ-replication-agent bash`
- Log in: `claude login`. A browser window opens for OAuth; the token is stored in the container. On a headless server, use `claude login --no-browser` for a paste-in flow.
2.2 Track B. API tokens (pay-as-you-go)
Good if you want to start small.
- Sign up at console.anthropic.com.
- Add $5 credit. Billing → Add credits.
- Create an API key with a spend cap. Workspaces → API Keys → Create. Name it `workshop-day2`, set a hard ceiling (e.g. $5). Copy the key immediately; you cannot view it again.
- Confirm privacy settings (Settings → Privacy). Training opt-out is the default on the API; confirm the toggle.
- Export and verify.
docker exec -it econ-replication-agent bash
export ANTHROPIC_API_KEY=sk-ant-...
claude
▸ /cost
`/cost` should now show tokens. You’re wired up.
2.3 First session: what to check
▸ /help # list all built-in commands
▸ /model # confirm which Claude model is active
▸ /cost # confirm billing is wired up
Expected cost for today: $2–5 on API for a full messy-project build on Sonnet; just session time on Max.
3 Part 3: Claude Code, in one page
The features that matter most for research workflows.
3.1 Slash commands
| Command | What it does |
|---|---|
| `/help` | Full command list |
| `/init` | Auto-draft a `CLAUDE.md` from the repo |
| `/plan` | Enter planning mode — drafts a plan, waits for approval |
| `/compact` | Compress the conversation to save context |
| `/clear` | Clear history (keep files). Agent re-reads `CLAUDE.md` on the next message. |
| `/model <name>` | Switch opus / sonnet / haiku |
| `/cost` | Session cost on API, or “uses Max” |
| `/your-skill` | Run a custom command from `.claude/commands/` |
3.2 Keyboard shortcuts
| Shortcut | Action |
|---|---|
| `Ctrl+O` | Toggle verbose output (show agent thinking) |
| `Ctrl+C` | Cancel the current generation |
| `Esc Esc` | Rewind to a previous point |
3.3 Starting modes
claude # interactive, asks before each bash command
claude --model opus # start with Opus (e.g. for /plan)
claude --dangerously-skip-permissions # full autonomy — reasonable inside Docker
3.4 Useful things you can ask once and reuse
- `/init` — drafts a first-pass `CLAUDE.md` by scanning the repo.
- Custom skills — drop Markdown files in `.claude/commands/`; they become slash commands.
- Hooks — pre/post-tool shell commands in `.claude/settings.json`; the harness enforces them.
- Subagents — for independent subtasks (“refactor file A while you refactor file B”), run in parallel.
4 Part 4: The instruction file — your agent’s brain
4.1 Same file, many names
| Agent | File it reads | Discovery |
|---|---|---|
| Claude Code | `CLAUDE.md` | Project root, then up the tree to `/` |
| Aider | `CONVENTIONS.md` | Via `--read` or `.aider.conf.yml` |
| Gemini CLI | `GEMINI.md` | Project root |
| Codex CLI | `AGENTS.md` | Hierarchical |
The content is identical. cp CLAUDE.md CONVENTIONS.md literally works.
Think of these as a detailed briefing you hand to a new research assistant on their first day. The agent reads them automatically — you never paste them into the chat.
4.2 Why the briefing matters
Without it, the agent will:
- Not know the project’s goals or context.
- Make arbitrary decisions about file organisation.
- Use whatever coding style it prefers (your adviser won’t).
- Not know your field’s conventions (AER formatting, clustering choices, etc.).
With a good briefing, the agent will:
- Follow your exact specifications.
- Use the right packages and methods.
- Produce output in the format you need.
- Handle edge cases the way you want.
4.3 Anatomy of a good briefing
Six sections, in roughly this order:
# Project Name
## Mission
One paragraph. What is this project? What should the agent achieve?
## Principles
Numbered list of non-negotiable rules. Short. Unambiguous.
## Project Structure
Show the expected directory tree.
## Technical Specifications
Packages to use, output formats, coding style, SE specifications.
## Don'ts
What must never happen ("never delete data/raw/", "no install.packages()
outside 00_setup.R", "no hardcoded absolute paths").
## Communication Protocol
What to do when stuck. Where to log decisions for handoff between sessions.
4.4 A full CLAUDE.md for today’s exercise
CLAUDE.md
# Replication Package Agent
## Mission
Transform the messy project folder into an AER-compliant replication
package. Reorganise code, write documentation, create a master script,
and produce all tables and figures in a clean, reproducible pipeline.
## Principles
1. Never delete or modify the original data files. Copy to data/raw/.
2. All code must run from the project root using relative paths.
3. Use R with tidyverse conventions. Load packages with library().
4. Output tables as both .csv and .tex (using modelsummary).
5. Output figures as both .pdf and .png (300 DPI).
6. Follow the AEA Social Science Data Editors README template exactly.
7. Document every non-obvious decision in the README.
8. Use fixest::feols() for regressions, not lm().
9. Standard errors: heteroskedasticity-robust (vcov = "hetero") unless
a script comment says otherwise.
## Project Structure
replication_package/
├── README.md
├── LICENSE.txt
├── Makefile
├── master.R
├── code/
│ ├── 00_setup.R
│ ├── 01_descriptive.R
│ ├── 02_main_analysis.R
│ ├── 03_nok_analysis.R
│ ├── 04_dmv_analysis.R
│ ├── 05_robustness.R
│ └── 06_figures.R
├── data/
│ ├── raw/
│ └── README_data.md
└── output/
├── tables/
└── figures/
## Technical Specifications
- R packages: tidyverse, haven, fixest, modelsummary, ggplot2, marginaleffects
- Tables: modelsummary() with output formats "latex" and "csv"
- Figures: ggsave() with width=8, height=5, dpi=300
- Data files are Stata .dta format — read with haven::read_dta()
- Set seed at the top of every script that randomises anything
(set.seed(20260422))
## AER README Requirements
The README.md must follow the Social Science Data Editors template:
1. Overview
2. Data Availability and Provenance Statements
(Statement about Rights, Summary of Availability, Dataset List)
3. Computational Requirements (software, packages with versions, runtime)
4. Description of Programs (what each script does)
5. Instructions to Replicators (step-by-step)
6. List of Tables and Programs (every output mapped to its script)
7. References
## Don'ts
- Never modify files in data/raw/.
- No install.packages() outside 00_setup.R.
- No hardcoded absolute paths.
- No `setwd()` calls.
- No committing large data files to git — use .gitignore.
## Communication Protocol
- If a script fails: log the error, continue with independent tasks,
summarise all errors at the end.
- Write a DECISIONS.md at the project root for non-obvious choices
("I dropped quick_look.R because it's exploratory and not referenced
anywhere").
- At the end of a session, append a short note to DECISIONS.md so the
next session (or the next agent) can pick up without re-reading the
full chat history.
4.5 Best practices
- Be specific, not vague. “Use `modelsummary()` with `output = 'latex'`” beats “make nice tables.”
- Show the expected directory layout. A tree diagram removes ambiguity.
- State what NOT to do. “Never delete files in `data/raw/`” prevents costly mistakes.
- Use emphasis for critical rules. Bold or ALLCAPS for non-negotiable constraints.
- Keep it concise. Long instruction files dilute important rules. Aim for 50–250 lines.
- Update as you learn. The briefing is a living document.
- Use `/init` to bootstrap, then edit.
- Meta move: draft with a chatbot. Paste a description of your project into claude.ai, ask for the six sections, iterate.
CLAUDE.md
Claude Code discovers CLAUDE.md up the directory tree. A ~/.claude/CLAUDE.md sets global preferences (“always use tidyverse”); a project-level CLAUDE.md adds project-specific rules. Both are loaded and merged. Put taste preferences at the global level; put project conventions in the project. Every Claude Code session in every project then behaves consistently without re-specifying your house style.
5 Part 5: Permissions, planning, skills, hooks
5.1 Permission modes
| Mode | How to activate | What happens |
|---|---|---|
| Default | `claude` | Agent asks before each bash command |
| Allow-list | `.claude/settings.json` | You pre-approve specific patterns |
| Skip all | `claude --dangerously-skip-permissions` | Runs everything without asking |
An allow-list in .claude/settings.json:
.claude/settings.json
{
"permissions": {
"allow": [
"Bash(Rscript:*)",
"Bash(R:*)",
"Bash(mkdir:*)",
"Bash(cp:*)",
"Bash(mv:*)",
"Bash(cat:*)",
"Bash(ls:*)",
"Bash(make:*)",
"Write(*)",
"Read(*)"
],
"deny": [
"Bash(rm -rf /*)",
"Bash(git push:*)",
"Write(data/raw/**)"
]
}
}
The agent can run R, create directories, copy files, and read/write — but can’t `rm -rf`, `git push`, or modify `data/raw/`.
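A typo in the allow-list file is easy to make and cheap to catch: the stdlib JSON parser gives a quick syntax check before a long session. A minimal sketch (the demo file below is written to /tmp so nothing real is touched; point the same check at your project's `.claude/settings.json`):

```shell
# Write an illustrative allow-list to a scratch path, then parse-check it.
demo=/tmp/demo_settings.json
printf '{"permissions":{"allow":["Bash(Rscript:*)"]}}\n' > "$demo"
# json.tool exits non-zero on malformed JSON, so the echo only fires on success.
python3 -m json.tool "$demo" > /dev/null && echo "valid JSON"
```

Run it against the real file with `python3 -m json.tool .claude/settings.json` before trusting the permission rules.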
Inside a Docker container, the agent is sandboxed: host files outside the mounted workspace are out of reach. `claude --dangerously-skip-permissions` is appropriate for today.
5.2 Planning mode
Complex tasks benefit from planning before execution. Without it, the agent might start reorganising files before understanding the full scope, miss dependencies between scripts, or build a structure that needs to be redone.
▸ /plan
Analyse all files in this project folder. Create a detailed plan to
reorganise everything into an AER-compliant replication package.
In planning mode, the agent:
- Reads all files in the project.
- Analyses dependencies between scripts.
- Drafts a step-by-step plan with specific actions.
- Presents the plan for your review.
- Waits for your approval before executing anything.
You can:
- Approve: “Looks good, proceed.”
- Modify: “Skip the robustness checks. Rename Table 5 to Table 4.”
- Reject: “Start over with a different structure.”
/plan
Use it when:
- The task has many steps (like today’s exercise)
- You want to review the approach before execution
- The agent might make structural decisions you want to influence
You do NOT need /plan for:
- Simple, single-file edits (“fix the axis label in Figure 3”)
- Running existing scripts (“run master.R”)
- Questions about the project (“what packages does this use?”)
5.3 Custom skills
Drop a Markdown file into .claude/commands/ and it becomes a project-scoped slash command.
mkdir -p .claude/commands
Example — a `/check-package` command:
.claude/commands/check-package.md
Verify that the replication package is AER-compliant. For each item
below, report pass/fail with a one-line justification.
1. README.md present and follows the AEA Social Science Data Editors
template (7 required sections).
2. Every script referenced in README.md exists on disk.
3. `Rscript master.R` completes without errors from a clean R session.
4. Every table and figure listed in the README is produced in
output/tables/ or output/figures/.
5. No absolute paths in any .R file (`grep -rn '/home\|C:' code/`).
6. 00_setup.R loads every library() used across code/.
7. data/raw/ is byte-identical to the original input (compare SHA-256).
8. LICENSE.txt present.
Finish with a numerical score: X / 8.
Type `/check-package` in Claude Code — the agent runs the checklist. Skills travel with your repo; anyone who clones it gets the same commands.
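Check 7 (byte-identical raw data) is also easy to do by hand with SHA-256 digests. A self-contained sketch using throwaway demo files in a scratch directory (the file names are illustrative; in practice you compare `data/raw/` against wherever the untouched originals live):

```shell
# Work in a scratch directory so nothing in the real package is touched.
tmp=$(mktemp -d)
echo "id,treat,registered" > "$tmp/demo_original.csv"
cp "$tmp/demo_original.csv" "$tmp/demo_copy.csv"
# Compare the two SHA-256 digests; any single changed byte breaks the match.
a=$(sha256sum "$tmp/demo_original.csv" | cut -d' ' -f1)
b=$(sha256sum "$tmp/demo_copy.csv" | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "byte-identical" || echo "MISMATCH"
```

For a whole directory, store digests once with `sha256sum originals/* > checksums.txt` and later verify with `sha256sum -c checksums.txt`.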
Another useful one — /referee:
.claude/commands/referee.md
You are a fresh AEA Data Editor referee. You have NOT seen this project
before. Review the replication package at /workspace/replication_package/.
1. Is the README structured per the AEA template? List any missing
sections.
2. Does `master.R` run without errors from a clean shell? (Run it.)
3. Are all packages listed with versions (`sessionInfo()` or
`renv.lock`)?
4. Are all tables/figures in the paper produced by the code?
5. What would you ask the authors to fix?
Output a structured referee report.
5.4 Hooks
Hooks are shell commands that execute automatically on agent events. Configure in .claude/settings.json:
.claude/settings.json
{
"hooks": {
"PostToolUse": [
{
"matcher": { "tool": "Write", "path_glob": "code/*.R" },
"command": "Rscript -e 'lintr::lint(\"$CLAUDE_FILE_PATH\")'"
},
{
"matcher": { "tool": "Write", "path_glob": "data/raw/**" },
"command": "echo 'BLOCKED: data/raw/ is read-only'; exit 1"
}
],
"Stop": [
{
"command": "git log --oneline -n 5"
}
]
}
}
The harness runs these — the agent cannot skip them. Use cases:
- Lint R code after every file write.
- Prevent writes to `data/raw/`.
- Log all shell commands for reproducibility auditing.
- Show a compact git log when a session ends.
A hook you write once is enforced every time, even if the agent “forgets” a rule in CLAUDE.md.
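The command-logging use case from the list above has no example in this handout, so here is a rough sketch. Treat the `PreToolUse` event name and the `$CLAUDE_TOOL_INPUT` variable as assumptions: check the current hooks documentation for the exact event and variable names your version exposes.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": { "tool": "Bash" },
        "command": "echo \"$(date -Iseconds) $CLAUDE_TOOL_INPUT\" >> .agent_command_log"
      }
    ]
  }
}
```

The resulting `.agent_command_log` is a timestamped record of every shell command the agent ran, which pairs well with the git-log audit trail.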
6 Part 6: The AER replication package standard
6.1 Why standards matter
The American Economic Association requires all published papers to include a replication package deposited on openICPSR. The AEA Data Editor reviews every package before publication. Understanding these standards is essential for any economist submitting to AEA journals — and a good practice for any empirical research you want others to trust.
6.2 The AEA README template
The AEA uses the Social Science Data Editors’ README template. Your README must include:
- Overview: Paper title, authors, one-paragraph summary of what the package contains.
- Data Availability and Provenance Statements:
- Statement about Rights — certify you have permission to use and redistribute the data.
- Summary of Availability — all public / some restricted / none available.
- Dataset List — table mapping data files to their sources.
- Computational Requirements: Software (with versions), packages (with versions), hardware, runtime estimate.
- Description of Programs: What each script does.
- Instructions to Replicators: Numbered steps to reproduce everything.
- List of Tables and Programs: Map every table/figure to the script that produces it.
- References: Full citations for all data sources.
6.2.1 Computational requirements checklist
Your README must specify: the software and its version, every package with its version, the hardware used, and a runtime estimate for the full pipeline.
6.3 Other required files
| File | Purpose |
|---|---|
| `README.md` | Documentation (see above) |
| `LICENSE.txt` | Terms of use (MIT for code, CC-BY 4.0 for data) |
| `master.R` or `Makefile` | Single entry point that runs everything |
| `00_setup.R` | Installs/loads all package dependencies |
“The replication package should reproduce the tables and figures, as well as any in-text numbers, by running code without manual intervention, starting from the raw data.” — AEA Data Editor
6.4 Common compliance failures
| Problem | How to avoid |
|---|---|
| Missing or incomplete README | Use the template, fill every section |
| Code does not run | Test from a clean environment |
| Missing data citations | Cite every dataset in the manuscript’s references |
| No master script | Create master.R that sources all scripts in order |
| Hardcoded paths | Use relative paths from the project root |
| Missing dependencies | List every package in 00_setup.R with version |
| Exploratory files included | Remove scratch scripts, keep only final code |
| Manual steps required | Automate everything |
7 Part 7: Research best practices with AI agents
This is the section that makes the difference between using an agent competently and using one recklessly. Read it before the hands-on exercise.
7.1 1. Verify, don’t trust
Every agent output is a first draft from a capable but junior RA. The agent is fast, confident, and occasionally wrong in ways that look right.
- Read every diff. Before accepting a change, look at it. If you hit “yes” reflexively, you are not doing research — you are hoping.
- Run the code yourself from a clean shell. Don’t rely on “agent says it works.” Open a fresh terminal, `cd replication_package`, `Rscript master.R`, and read every line of output.
- Spot-check against priors. If a regression now says the effect is +0.12 when your prior result was +0.08, stop and figure out why before moving on. Know your magnitudes.
- Don’t trust agent summaries of what it did. They are narratives, not audits. `git log` and a diff are the audit trail.
If you would check an RA’s first output, you check the agent’s.
7.2 2. Delegate the paperwork, keep the research
AI coding agents win on the paperwork around the research, not the research itself.
Delegate to the agent:
- Inventory / audit of an unknown folder.
- Bulk rewrites (rename, reformat, port API).
- Glue code (download, parse, save, plot).
- Documentation drafts (README, script headers, data dictionaries).
- Running → failing → fixing loops.
- Generating variants for comparison (three plot styles, two SE specifications).
Keep for yourself:
- Research question and motivation.
- Identification strategy.
- Choice of estimator and standard errors.
- Interpretation of coefficients.
- Final-paper prose.
- Anything involving sensitive data.
The failure mode is offloading the wrong decisions to the agent — asking it to pick a clustering level or choose whether to weight. Don’t. Make that decision yourself, write it into CLAUDE.md as a rule, and let the agent execute it.
7.3 3. Data privacy and IRB
Cloud agents send your prompt — and everything in context — to a third-party LLM provider.
Never put these in front of a cloud agent:
- Personally identifiable data (names, addresses, precise geocodes, SSNs, NHS numbers).
- IRB-restricted microdata.
- Restricted-use administrative data (Census, tax records, health) under a DUA.
- Unpublished results you don’t want a model provider seeing.
Mitigations:
- Anthropic: training opt-out is default for API traffic. Confirm in console settings.
- `.claudeignore`: add sensitive folders so the repo map doesn’t scan them.
- Docker mount surface: only mount what the agent needs. Keep the raw extract outside the container if you can.
- Local models for the really sensitive stuff: run `ollama` locally; the Aider appendix in this handout shows how.
- When in doubt, don’t. There is no button that makes sending restricted data to a cloud model OK.
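For the `.claudeignore` mitigation, a sketch is below. The folder and file names are hypothetical placeholders for wherever your restricted material actually lives:

```text
# .claudeignore -- keep sensitive material out of the repo map and agent context
data/restricted/
irb_documents/
*.sav
```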
7.4 4. Session hygiene: beat context degradation
Claude’s context window is 200k tokens (or 1M on newer models). It fills up faster than you’d think, especially when you add large files or run verbose shell commands whose output lands back in the chat.
As the conversation grows, performance drops:
- The agent forgets a rule stated 20 turns ago.
- Sloppier diffs — spurious edits to files you didn’t ask about.
- Vague summaries (“I’ve updated the analysis”) in place of concrete ones.
The fix is session hygiene, not a bigger context window:
- `/clear` when switching topics. The files stay; the chat resets. The agent re-reads `CLAUDE.md` on the next message.
- `/compact` when the session is clearly bloated but you want to keep a summary.
- Short sessions over long ones. Break the messy-project task into “reorganise”, “write scripts”, “write README”, “run and debug” — each a separate session.
- Files as handoff state. Decisions go in `DECISIONS.md` or `CLAUDE.md`, not the chat buffer.
The canonical symptom of context degradation is the agent forgetting a rule in CLAUDE.md. That’s your cue to `/clear`: the rule is re-read and honoured on the next message.
7.5 5. The “referee 2” protocol
The single most useful pattern for verifying research code from an agent: a second, fresh session as an independent reviewer.
# Terminal 1: build the package
claude --dangerously-skip-permissions
▸ (builds replication_package/)
# Terminal 2: fresh session, clean context
claude
▸ You are a referee for the AEA Data Editor. Review
/workspace/replication_package/. Check:
1. Does `Rscript master.R` run without errors from a clean session?
2. Does the README follow the AEA template (7 sections)?
3. Are all tables and figures produced?
4. Are all packages listed with versions?
5. Any absolute paths? Any exploratory files left over?
Report pass/fail for each criterion with specific evidence.
Why this works:
- A fresh context has no bias from the construction. It didn’t spend an hour choosing a structure; it’s just reading.
- Running the code from a fresh shell surfaces hardcoded paths and missing dependencies that the building session papered over.
Variation: cross-model referee. Build with Sonnet, review with Opus. Or use the Aider + OpenRouter appendix to audit with GPT-5 or DeepSeek Reasoner. Different brains catch different mistakes.
7.6 6. Reproducibility first
The point of an AER replication package is that someone else, on a different machine, a year later, can reproduce your results. Every agent decision should be evaluated against that bar.
Git is your safety net:
- Commit frequently. If a diff looks wrong, hit Esc and re-prompt before accepting.
- Commit `CLAUDE.md` and custom skills. They’re part of the repo.
- A clean `git log` is the audit trail of the agent’s work.
Environment is your truth:
- The Docker image pins R, Python, system libraries.
- `renv::snapshot()` pins R package versions; `renv::restore()` recreates them elsewhere.
- Always test the final package from a clean R session (`R --vanilla`), not your dev session.
One-command reproducibility:
- `master.R` sources every script in order.
- `Makefile` does the same with dependency tracking. Bonus: `make clean` wipes the outputs so you can verify the build from scratch.
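A minimal Makefile in this spirit might look like the sketch below (target names are conventions, not requirements; recipe lines must be indented with tabs):

```makefile
# Run the whole pipeline through the single entry point.
all:
	Rscript master.R

# Wipe generated outputs so a from-scratch build can be verified.
clean:
	rm -rf output/tables/* output/figures/*

.PHONY: all clean
```

`make` reruns everything; `make clean && make` is the from-scratch verification.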
7.7 7. Prompt patterns that work
| # | Pattern | Example |
|---|---|---|
| 1 | Vague → iterate | “I want X. Ask anything unclear, then do it.” |
| 2 | Scripts, not commands | “Write download_data.R”, not “download the data” |
| 3 | Style by reference | “Kieran Healy’s ggplot2 best practices” beats a 20-line theme() spec |
| 4 | Small iterations | Three narrow prompts beat one 500-word prompt |
| 5 | Context → objective → constraints | “This project replicates X. Refactor scripts. Relative paths only.” |
| 6 | Show, don’t tell | A directory tree beats paragraphs of description |
| 7 | Reference files explicitly | @filename lets Claude Code scope to specific files |
| 8 | Stop arguing — start over | If the agent keeps making the same mistake, Esc, /clear, re-prompt. |
Weak vs. strong prompts:
| Weak | Strong |
|---|---|
| “Clean up this code” | “Refactor analysis_v2_FINAL.R into code/02_main_analysis.R. Replace read.csv with haven::read_dta. Use fixest::feols().” |
| “Make a README” | “Write README.md following the AEA Social Science Data Editors template. Include all 7 required sections. List every R package used with its installed version (use packageVersion()).” |
| “Fix the figures” | “In code/06_figures.R, change Figure 1 to theme_minimal(), save as both PDF and PNG at 300 DPI in output/figures/, filename figure1_main.{pdf,png}.” |
7.8 8. Model economics
A messy-to-replication-package run is 20–60 minutes of agent time.
The rule of thumb for Claude Code:
- Use `/plan` on Opus for the initial architecture.
- `/model sonnet` once you’ve approved the plan — cheaper execution.
- `/model haiku` for tiny subsequent edits.
On Max, switching is free. On API, Opus is ~5× more expensive than Sonnet; use it deliberately.
Want to compare models across providers? The Aider + OpenRouter appendix has the architect + editor split (reasoner plans, cheap editor writes) with typical savings of 40–60% on long refactors.
7.9 9. Keep a human in the loop for methodology
An agent happily writes `feols(outcome ~ treat | wave + state, data = df, vcov = "hetero")` without pausing on whether clustered SEs at the wave level make more sense, or whether state fixed effects absorb the treatment’s identifying variation. Don’t let it decide for you.
- Write the estimator, the fixed effects, and the SE choice into `CLAUDE.md` as rules.
- When the agent proposes a different spec, it is asking you a question — answer it before accepting.
- Review regression output the same way you’d review an RA’s: coefficient signs, magnitudes, sample sizes.
7.10 10. Write progress logs for handoff
Between sessions, the agent re-reads files, not the prior conversation. Use that:
- Append a short note to `DECISIONS.md` whenever you or the agent makes a non-obvious choice.
- At the end of each session, ask the agent to summarise what was done into `DECISIONS.md`. Commit it.
- Next session: Claude Code reads `CLAUDE.md` on startup; point it at `DECISIONS.md` (or reference it from `CLAUDE.md`) and it picks up where you left off with no context loss.
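The append-a-note habit is one shell line. A sketch, run from the project root (the note text is an example):

```shell
# Append a dated handoff note to DECISIONS.md (creates the file if absent).
printf '\n## %s\n- Table 2 switched to clustered SEs; see 02_main_analysis.R.\n' \
  "$(date +%F)" >> DECISIONS.md
# Show the note that was just written.
tail -n 3 DECISIONS.md
```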
7.11 11. Tutor mode is real: learn by doing
If you are new to Claude Code, don’t read about it — run a tutor session. Load a tutor briefing as a read-only file, let the agent walk you through the workshop material by having you do the steps yourself. Progress is stored in a file, not in the chat.
8 Part 8: Hands-on exercise
8.1 Step 1: Get the workshop files
If you haven’t already, clone the workshop repository on your host machine (outside Docker):
git clone https://github.com/AlexRieber/Workshops.git
The messy project files are at `Workshops/AI_Agents/messy_project/`.
8.2 Step 2: Set up your workspace
# Copy the messy project into the running container
docker cp ~/Workshops/AI_Agents/messy_project econ-replication-agent:/workspace/messy_project
# Enter the container
docker exec -it econ-replication-agent bash
cd /workspace/messy_project
ls -la
You should see:
1_experiment_clean.dta analysis_v2_FINAL.R
2_nextofkin_clean.dta dmv_analysis_v3.R
3_dmv_quarterly_clean.dta dmv_figures.R
dmv_quarterly_raw.csv figure8_nok.R
nextofkin_raw.csv make figs.R
nok_analysis.R notes.txt
old_robustness_checks.R quick_look.R
Table1_descriptive.R
Before starting the agent, look at the files yourself. Open a few scripts. Notice the problems: inconsistent naming, no structure, hardcoded paths, commented-out code, exploratory scripts mixed with analysis code. This is what the agent will fix — but you need to know what to verify later.
8.3 Step 3: Write the briefing
Two ways:
8.3.1 A. Bootstrap with /init
claude
▸ /init
Claude Code writes a first-draft `CLAUDE.md`. Edit it to add the AEA-specific rules from Part 4.
8.3.2 B. Paste the example from Part 4
nano CLAUDE.md
# (paste the content from Part 4, save with Ctrl+O, Enter, Ctrl+X)
8.4 Step 4: Start with planning
claude --dangerously-skip-permissions
Then type:
▸ /plan
Analyse all files in this project folder. Create a detailed plan to
reorganise everything into an AER-compliant replication package.
The package should follow the structure in CLAUDE.md. Show me the
plan before you start.
Review the plan. If it looks good:
▸ Approved. Please execute the plan.
8.5 Step 5: Watch the agent work
The agent will typically:
- Read all existing scripts to understand what they do.
- Create the directory structure (`code/`, `data/raw/`, `output/tables/`, `output/figures/`).
- Copy data files to `data/raw/`.
- Refactor each script: clean code, fix paths, add proper output saving.
- Create `00_setup.R` with all package dependencies.
- Create `master.R` that sources all scripts in order.
- Write `README.md` following the AEA template.
- Create `LICENSE.txt` and `Makefile`.
- Test by running `master.R`.
- Fix any errors that come up.
This takes 20–60 minutes depending on the model. Use Ctrl+O to toggle verbose output and watch the thinking in real time. Check /cost every few steps — build cost intuition.
8.6 Step 6: Review the results
After the agent finishes, check the output from a clean shell, not from inside the agent session:
# In a new terminal, inside the container:
cd /workspace/replication_package
# Tree
ls -R
# Clean-session run
R --vanilla -e 'source("master.R")'
# Verify outputs
ls output/tables/
ls output/figures/
# Read the README
head -60 README.md
8.7 Step 7: Iterate and improve
The first pass will not be perfect. Use the agent to fix issues:
▸ The README is missing the Computational Requirements section.
Please add it with the correct R version (run `R --version` to check)
and the package list with versions (run packageVersion(...) for each).
▸ Table 2 should use clustered standard errors at the session level,
not heteroskedasticity-robust. Update 02_main_analysis.R and regenerate.
▸ The Makefile's `clean` target should also remove output/tables/ and
output/figures/. Please update it.
This iterative refinement is where agents excel. Each fix takes seconds, not minutes.
8.8 Step 8: Run the referee-2 audit
# Fresh session, clean context:
claude
▸ You are a fresh referee for the AEA Data Editor. You have NOT seen
this project before. Audit /workspace/replication_package/ against
the AEA Social Science Data Editors README template. Specifically:
1. List any missing README sections.
2. From an empty /tmp/test/ directory, try to reproduce the package:
copy the folder, run master.R, confirm every table/figure in the
README is produced.
3. Are all packages listed with versions?
4. Any absolute paths or setwd() calls?
5. Any exploratory files left over from the original messy folder?
Output a structured referee report with pass/fail for each item.
Read the report. Fix what it catches. This is the habit that separates a good replication package from a great one.
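Item 2 of the referee prompt can also be scripted outside any agent session; a minimal sketch, assuming the handout's paths (the `SKIP` branch fires if you run it outside the container):

```shell
# Clean-room reproduction: copy the package into an empty directory and
# run the pipeline from there, so stray files in the original folder
# can't mask a missing input.
PKG=/workspace/replication_package
DEST=/tmp/test
rm -rf "$DEST" && mkdir -p "$DEST"
if [ -d "$PKG" ]; then
  cp -r "$PKG" "$DEST/"
  (cd "$DEST/replication_package" && R --vanilla -e 'source("master.R")')
else
  echo "SKIP: $PKG not found (run this inside the container)"
fi
```

If `master.R` completes here with no edits, the package survives at least the mechanical half of the audit.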
9 Part 9: Advanced techniques
9.1 Opus for planning, Sonnet for execution
claude --model opus
▸ /plan
▸ Analyse /workspace/messy_project and draft a detailed plan for the
AER replication package, per CLAUDE.md.
After approving:
▸ /model sonnet
▸ Proceed with the plan.
On API, this is the single biggest cost-saver for long refactors. On Max, there’s no token cost — but Sonnet is typically faster.
9.2 Subagents for parallel work
▸ Refactor 02_main_analysis.R while a subagent simultaneously refactors
03_nok_analysis.R. When both complete, integrate both into master.R.
Claude Code spawns a second agent with its own context. The parent agent re-integrates the results. Use this when subtasks are genuinely independent — otherwise you get merge conflicts and no speed-up.
9.3 Hooks for research discipline
Useful hooks for an AER-bound project (.claude/settings.json):
{
"hooks": {
"PostToolUse": [
{
"matcher": { "tool": "Write", "path_glob": "code/*.R" },
"command": "Rscript -e 'lintr::lint(\"$CLAUDE_FILE_PATH\")'"
},
{
"matcher": { "tool": "Write", "path_glob": "data/raw/**" },
"command": "echo 'BLOCKED: data/raw/ is read-only'; exit 1"
},
{
"matcher": { "tool": "Write", "path_glob": "**/*.R" },
"command": "grep -nE '^setwd|\"/home/|\"C:' \"$CLAUDE_FILE_PATH\" && echo 'WARN: absolute path or setwd()'; true"
}
],
"Stop": [
{
"command": "git log --oneline -n 5"
}
]
}
}
- First hook: lint every R file the agent writes.
- Second: hard-block writes into `data/raw/`.
- Third: warn (but don't block) on `setwd()` or absolute paths.
- Fourth: at end of session, show the last 5 commits.
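You can exercise the third hook's grep pattern by hand before wiring it in; a quick sketch against a hypothetical offending file (`/tmp/demo_paths.R` is invented for the demo):

```shell
# Hypothetical R file showing the two smells the hook flags
cat > /tmp/demo_paths.R <<'EOF'
setwd("/home/alice/project")
df <- read.csv(here::here("data", "raw", "donors.csv"))
EOF

# Same pattern as the hook command: a line starting with setwd, or a
# quoted absolute path (Unix home directory or Windows drive)
grep -nE '^setwd|"/home/|"C:' /tmp/demo_paths.R && echo 'WARN: absolute path or setwd()'
```

Only line 1 trips the pattern; the `here::here()` call on line 2 passes, which is exactly the behaviour you want the hook to encourage.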
9.4 Using .claudeignore
Like .gitignore, this file tells Claude Code to skip certain files. Useful for large projects:
.claudeignore
# Skip large data files the agent doesn't need to read
data/raw/*.csv
data/raw/*.dta
# Skip compiled output
output/
# Skip renv library
renv/library/
This speeds up the agent (fewer files in the repo map) and prevents accidental reads of sensitive data.
9.5 Global vs project CLAUDE.md
Global (~/.claude/CLAUDE.md) — house style, cross-project:
# Global preferences
- Prefer tidyverse over base R.
- Use `fixest::feols` for regressions, not `lm`.
- Use `here::here()` for paths.
- Write concise comments only where the "why" is non-obvious.
Project (`CLAUDE.md`) — project-specific rules, loaded on top.
Both are loaded and merged. Use this for consistent cross-project behaviour without re-specifying your house style.
10 Part 10: Troubleshooting
10.1 Common issues
| Problem | Solution |
|---|---|
| Claude Code asks for permission on every command | Set up .claude/settings.json (Part 5) or use --dangerously-skip-permissions inside Docker |
| Agent takes a wrong approach | Ctrl+C to cancel, then “Instead of X, please do Y” — or Esc Esc to rewind |
| Agent runs out of context | /compact to compress, or /clear to reset (re-reads CLAUDE.md) |
| R package not installed | “Install fixest.” Or: Rscript -e 'install.packages("fixest")' |
| Agent creates files in wrong location | Be specific: “Create code/01_descriptive.R in /workspace/replication_package/” |
| Agent modifies original data files | Add to CLAUDE.md: “Never modify files in data/raw/.” Add a deny rule to .claude/settings.json. |
| Agent generates code that doesn’t run | Approve the run command. It sees the error and debugs. |
| Authentication failed | Re-run `claude login` (Max) or re-export `ANTHROPIC_API_KEY` |
| Rate-limit errors | Wait 30 seconds; or consider the Max plan |
| Context window exceeded | /compact or /clear |
10.2 When to intervene
Let the agent work autonomously unless:
- It is about to delete original data files.
- It is taking a fundamentally wrong statistical approach.
- It has been stuck in a loop for more than 3 attempts on the same error.
- It is making network requests you did not expect.
- `/cost` is growing much faster than you expected — stop and look.
10.3 Getting help
- Claude Code docs: code.claude.com/docs
- Anthropic Console: console.anthropic.com
- AEA Data Editor guidance: aeadataeditor.github.io
- AEA README template: social-science-data-editors.github.io/template_README
11 Part 11: Pre-submission checklist
Use this before you claim the package is done. Better yet, ask the agent:
▸ /check-package
11.1 Package structure
11.2 README completeness
11.3 Code quality
11.4 Reproducibility
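Several of these checks can be scripted; a minimal sketch, using a self-contained stand-in tree in `/tmp/pkg_check` — point `PKG` at `/workspace/replication_package` when running it for real:

```shell
# Stand-in package so the sketch runs anywhere; point PKG at the real
# package root when using this on your project.
PKG=/tmp/pkg_check
mkdir -p "$PKG/code" "$PKG/output/tables" "$PKG/output/figures"
printf 'x <- 1\n' > "$PKG/code/01_clean.R"
printf 'source("code/01_clean.R")\n' > "$PKG/master.R"

# Structure: required top-level files present?
for f in README.md master.R LICENSE.txt Makefile; do
  [ -f "$PKG/$f" ] && echo "OK   $f" || echo "MISS $f"
done
# Code quality: no setwd() or hardcoded absolute paths in code/?
if grep -rnE '^setwd|"/home/|"C:' "$PKG/code"; then
  echo "FAIL hardcoded paths"
else
  echo "OK   no hardcoded paths"
fi
```

Anything the script marks `MISS` or `FAIL` goes straight back to the agent as a fix prompt; the subjective items (README completeness, code clarity) still need your eyes.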
12 Appendix A: Using other models via Aider + OpenRouter
Claude Code uses Claude. When you want to try GPT-5, DeepSeek Reasoner, Gemini 2.5, or free-tier models, Aider + OpenRouter is the flexible alternative.
This is a fully self-contained setup. You can use it alongside Claude Code (e.g. Claude Code builds, Aider does the referee-2 review on GPT-5), or switch to it entirely for a specific task.
12.1 Why this combination
Aider is a terminal coding agent similar to Claude Code:
- Open source, Apache 2.0, Python.
- `CONVENTIONS.md` plays the same role as `CLAUDE.md`.
- Git-native: auto-commits every edit, so you can `/undo`.
- Model-agnostic. Bring any API key.
OpenRouter is a single API that routes to ~300 models across providers:
- Anthropic, OpenAI, Google, DeepSeek, Meta, Mistral, Qwen.
- One key, one bill, passthrough pricing.
- Zero Data Retention by default.
- Per-key spend caps (your airbag).
- Free-tier models for warm-ups.
12.2 Setup in ten minutes
12.2.1 1. Create the OpenRouter account
Go to openrouter.ai, sign up, and load $5 under Credits (there is a minimum processing fee of about $0.80).
12.2.2 2. Create a scoped API key
Keys → Create Key. Name it workshop-aider. Set a spend cap ($5 whole key, or $1/day). Copy the key — starts with sk-or-v1-.... Shown once.
12.2.3 3. Privacy settings
- ZDR on (OpenRouter does not store prompts after forwarding).
- Prompt logging off (unless you want the ~1% discount).
12.2.4 4. Enter the container and export
docker exec -it econ-replication-agent bash
export OPENROUTER_API_KEY=sk-or-v1-...
# Sanity check — list available models
aider --list-models openrouter/ | head
12.2.5 5. First run
mkdir -p ~/aider-test && cd ~/aider-test
aider --model openrouter/anthropic/claude-sonnet-4.5
At the Aider prompt:
> Hello. What model are you?
> /cost
> /tokens
/cost should show a fraction of a cent. You’re wired up.
12.3 Model cost reference
Approximate pay-as-you-go prices per 1 million tokens (input / output). Live at openrouter.ai/models.
| Model | Input | Output | Good for |
|---|---|---|---|
| `anthropic/claude-sonnet-4.5` | $3.00 | $15.00 | Same as Claude Code, routed differently |
| `anthropic/claude-haiku-4.5` | $1.00 | $5.00 | Smaller edits, iteration |
| `openai/gpt-5.x` | $2.50 | $10.00 | Alternative to Sonnet |
| `google/gemini-2.5-pro` | $1.25 | $5.00 | Very large context |
| `deepseek/deepseek-reasoner` | $0.55 | $2.19 | Architect-mode reasoning |
| `deepseek/deepseek-chat` | $0.32 | $0.89 | Warm-ups, small fixes |
| `qwen/qwen3-coder-480b:free` | $0.00 | $0.00 | Free, capable for boilerplate |
12.4 CONVENTIONS.md is the same file as CLAUDE.md
# From the messy_project directory:
cp CLAUDE.md CONVENTIONS.md
That’s it. Content is identical; Aider reads CONVENTIONS.md.
12.5 Aider command cheat sheet
# Launch
aider start in current folder
aider file1.R file2.R pre-add files
aider --model openrouter/<prov>/<m> choose model
aider --read CONVENTIONS.md pre-load read-only
aider --chat-mode architect \
--model <reasoner> \
--editor-model <cheap> architect + editor split
aider --message "do X" file.R one-shot, non-interactive
aider --yes-always --message "do X" fully autonomous one-shot
# In chat
/add <file> add editable file
/read <file> add read-only file
/drop <file> remove file
/diff show pending changes
/undo revert last edit plus auto-commit
/run <cmd> run shell command, output returns to chat
/model <name> switch model mid-session
/chat-mode {code|ask|architect|help}
/tokens context usage
/cost session cost
/clear clear conversation, keep files
/reset clear conversation and drop files
12.6 The architect + editor split
Aider’s cost-saver: use a reasoning model to plan, and a cheap model to execute.
aider --chat-mode architect \
--model openrouter/deepseek/deepseek-reasoner \
--editor-model openrouter/anthropic/claude-haiku-4.5 \
--read CONVENTIONS.md
The reasoner thinks about each change; the editor types the diff. Typical savings on a long refactor: 40–60% vs. using Sonnet for everything.
On Claude Code, the equivalent is: /plan with --model opus, then /model sonnet after approval. Architect split doesn’t apply directly.
12.7 Running the messy-project task with Aider
cd /workspace/messy_project
cp CLAUDE.md CONVENTIONS.md
aider --model openrouter/anthropic/claude-sonnet-4.5 \
--read CONVENTIONS.md
Then:
> Read every R script and every data file header. Draft a detailed plan
to reorganise into an AER-compliant replication package per
CONVENTIONS.md. Present the plan as a numbered list and wait for my
approval before modifying any files.
After approving:
> Approved. Execute. Auto-commit each logical step.
The rest of the workflow (verify, iterate, referee-2) is identical to Claude Code. The only difference: use /run Rscript master.R in Aider where Claude Code would ask you to approve a shell command.
12.8 Cross-model referee-2
Build with Claude Code, audit with a different model via Aider:
# Terminal 1: build with Claude Code
claude --dangerously-skip-permissions
# Terminal 2: audit with, e.g., GPT-5 via Aider + OpenRouter
aider --model openrouter/openai/gpt-5 \
--read /workspace/replication_package/README.md
> You are a fresh referee for the AEA Data Editor. Audit the package.
(... full referee prompt from Part 8 Step 8 ...)
Different providers catch different mistakes. This is the most robust verification you can do short of a human referee.
12.9 Local models for sensitive data
If you can’t send data to any cloud model, run locally with ollama:
# Install ollama (one-time)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a capable local model
ollama pull qwen2.5-coder:32b
# Run aider against it
aider --model ollama/qwen2.5-coder:32b --read CONVENTIONS.md
Quality is lower than Sonnet but zero network egress. The right choice for IRB-restricted data on a secure machine.
12.10 Aider vs. Claude Code, at a glance
| | Aider | Claude Code |
|---|---|---|
| Instruction file | `CONVENTIONS.md` | `CLAUDE.md` |
| Licence | Apache 2.0 | Proprietary |
| Models | Any via OpenRouter / direct | Claude (Opus/Sonnet/Haiku) |
| Plan mode | Ask in prompt, or architect mode | First-class `/plan` |
| Custom skills | Load a markdown file per session | `.claude/commands/` (versioned) |
| Hooks | — | `.claude/settings.json` |
| Permission allow-list | — | `.claude/settings.json` |
| Git auto-commits | Yes (default) | Manual, or via hooks |
| Architect + editor split | Yes (built-in) | Use Opus for plan, Sonnet for exec |
Short version: if you need the richest harness and are happy with Claude, use Claude Code. If you need to try specific non-Claude models or want auto-commits and the architect split, use Aider + OpenRouter. Both can coexist in the same project (different config files, different branches).
13 Glossary
| Term | Definition |
|---|---|
| AEA | American Economic Association |
| AER | American Economic Review |
| Agent | AI system that can plan, execute code, and iterate autonomously |
| Aider | Open-source terminal coding agent (Appendix A) |
| Architect mode | Aider mode where a reasoning model plans and a cheap editor model writes |
| CLAUDE.md | Markdown instructions file read by Claude Code on startup |
| Claude Code | Anthropic’s terminal-native AI coding agent |
| `/compact` | Claude Code command to compress a long conversation |
| Context degradation | Drop in agent performance as conversation grows long |
| Container | Running instance of a Docker image |
| Dangerously skip permissions | Claude Code mode that runs all commands without asking |
| Docker | Platform for running isolated containers |
| Hooks | Shell commands in Claude Code run automatically on agent events |
| Makefile | Build automation file — make all runs the full pipeline |
| Master script | Single script (master.R) that runs all others in order |
| openICPSR | Repository where AEA replication packages are deposited |
| OpenRouter | API aggregator: one key, ~300 models, per-key spend caps |
| `/plan` | Claude Code planning mode |
| Referee 2 protocol | Fresh-session audit by a second agent of the built package |
| Replication package | Complete set of data, code, documentation to reproduce a paper |
| Skills | Project-specific slash commands in .claude/commands/ |
| ZDR | Zero Data Retention — OpenRouter policy of not storing prompts |