AI Coding Agents for Economics Research
Claude Code. From Messy Code to AER-Ready Replication Package. Day 2
Who is this for? Master’s students, PhD students, and post-docs who want to use Claude Code from the terminal to organise, document, and package research code to professional (AER/AEA) standards.
This is Day 2. We go deeper on Claude Code specifically: custom skills, hooks, plan mode, permission allow-lists, research best practices, and the messy-project-to-replication-package exercise. If you skipped Day 1, Part 2 repeats the Claude Code setup so you can follow along.
Prerequisites: Docker container built and running (see Docker Setup How-To). Either a Claude Max subscription, or an Anthropic API key with a few dollars of credit.
Want other model families (GPT-5, DeepSeek, Gemini)? See Appendix A: Using other models via Aider + OpenRouter at the end of this handout. It’s a self-contained setup you can use alongside Claude Code.
1 Part 1: What you will build today
1.1 The task
You have inherited a messy project folder — the kind every researcher has at some point. It contains:
- R scripts with inconsistent names (`analysis_v2_FINAL.R`, `make figs.R`)
- No README, no master script, no documentation
- Data files mixed with code
- Hardcoded file paths and commented-out code
- No dependency management
- Exploratory scripts mixed with the final analysis
Your job: use Claude Code to transform this mess into a publication-ready replication package that meets the standards of the American Economic Association (AEA).
1.2 The data
The project uses the actual replication data from Kessler & Roth (2025), “Increasing Organ Donor Registration as a Means to Increase Transplantation: An Experiment with Actual Organ Donor Registrations” (AEJ: Economic Policy, openICPSR 195641). The experiment tested whether changing how organ-donor registration questions are framed (opt-in vs. active choice) affects real registration decisions.
The messy folder contains three Stata datasets (.dta) and two raw CSV files, mixed with nine R scripts of varying quality and a notes.txt:
| File | Description | Observations |
|---|---|---|
| `1_experiment_clean.dta` | Lab experiment waves 1–3 (real registrations) | 1,043 |
| `2_nextofkin_clean.dta` | Next-of-kin experiment (MTurk) | 803 |
| `3_dmv_quarterly_clean.dta` | State-level DMV registration panel | 1,416 |
| `dmv_quarterly_raw.csv` | Raw quarterly DMV data | — |
| `nextofkin_raw.csv` | Raw next-of-kin Qualtrics data | — |
1.3 What a finished package looks like
By the end, you will have:
replication_package/
├── README.md # Structured per AEA template
├── LICENSE.txt # MIT for code, CC-BY for data
├── Makefile # One command reproduces everything
├── master.R # Runs all scripts in correct order
├── CLAUDE.md # The briefing
├── .claude/
│ ├── settings.json # Permission allow-list + hooks
│ └── commands/
│ ├── check-package.md # Custom skill: AER compliance check
│ └── referee.md # Custom skill: referee-2 audit
├── code/
│ ├── 00_setup.R # Install/load packages
│ ├── 01_descriptive.R # Table 1: summary statistics
│ ├── 02_main_analysis.R # Tables 2–3: treatment effects
│ ├── 03_nok_analysis.R # Table 4: next-of-kin results
│ ├── 04_dmv_analysis.R # Table 5: state-level DiD
│ ├── 05_robustness.R # Appendix robustness checks
│ └── 06_figures.R # All figures
├── data/
│ ├── raw/ # Original CSV/Stata files
│ └── README_data.md # Variable descriptions
└── output/
├── tables/ # LaTeX and CSV tables
└── figures/ # PDF and PNG figures
2 Part 2: Claude Code setup
Skip this part if you did Day 1 and Claude Code is already authenticated. Otherwise, ten minutes gets you from zero to an authenticated session.
Two payment paths.
2.1 Track A. Claude Max (flat subscription)
Good if you expect heavy daily use.
- Subscribe. Open claude.ai/settings/plans, pick Max 5× or Max 20×.
- Enter the container: `docker exec -it econ-replication-agent bash`
- Log in: `claude login`. A browser window opens for OAuth; the token is stored in the container. On a headless server, use `claude login --no-browser` for a paste-in flow.
2.2 Track B. API tokens (pay-as-you-go)
Good if you want to start small.
- Sign up at console.anthropic.com.
- Add $5 credit. Billing → Add credits.
- Create an API key with a spend cap. Workspaces → API Keys → Create. Name it `workshop-day2`, set a hard ceiling (e.g. $5). Copy the key immediately; you cannot view it again.
- Confirm privacy settings (Settings → Privacy). Training opt-out is the default on the API; confirm the toggle.
- Export and verify.
docker exec -it econ-replication-agent bash
export ANTHROPIC_API_KEY=sk-ant-...
claude
▸ /cost
`/cost` should now show tokens. You’re wired up.
2.3 First session: what to check
▸ /help # list all built-in commands
▸ /model # confirm which Claude model is active
▸ /cost # confirm billing is wired up
Expected cost for today: $2–5 on API for a full messy-project build on Sonnet; just session time on Max.
3 Part 3: Claude Code, in one page
The features that matter most for research workflows.
3.1 Slash commands
| Command | What it does |
|---|---|
| `/help` | Full command list |
| `/init` | Auto-draft a `CLAUDE.md` from the repo |
| `/plan` | Enter planning mode — drafts a plan, waits for approval |
| `/compact` | Compress the conversation to save context |
| `/clear` | Clear history (keep files). Agent re-reads `CLAUDE.md` on the next message. |
| `/model <name>` | Switch opus / sonnet / haiku |
| `/cost` | Session cost on API, or “uses Max” |
| `/your-skill` | Run a custom command from `.claude/commands/` |
3.2 Keyboard shortcuts
| Shortcut | Action |
|---|---|
| `Ctrl+O` | Toggle verbose output (show agent thinking) |
| `Ctrl+C` | Cancel the current generation |
| `Esc Esc` | Rewind to a previous point |
3.3 Starting modes
claude # interactive, asks before each bash command
claude --model opus # start with Opus (e.g. for /plan)
claude --dangerously-skip-permissions # full autonomy — reasonable inside Docker
3.4 Useful things you can ask once and reuse
- `/init` — drafts a first-pass `CLAUDE.md` by scanning the repo.
- Custom skills — drop Markdown files in `.claude/commands/`; they become slash commands.
- Hooks — pre/post-tool shell commands in `.claude/settings.json`; the harness enforces them.
- Subagents — for independent subtasks (“refactor file A while you refactor file B”), run in parallel.
4 Part 4: The instruction file — your agent’s brain
4.1 Same file, many names
| Agent | File it reads | Discovery |
|---|---|---|
| Claude Code | `CLAUDE.md` | Project root, then up the tree to `/` |
| Aider | `CONVENTIONS.md` | Via `--read` or `.aider.conf.yml` |
| Gemini CLI | `GEMINI.md` | Project root |
| Codex CLI | `AGENTS.md` | Hierarchical |
The content is identical. cp CLAUDE.md CONVENTIONS.md literally works.
Think of these as a detailed briefing you hand to a new research assistant on their first day. The agent reads them automatically — you never paste them into the chat.
4.2 Why the briefing matters
Without it, the agent will:
- Not know the project’s goals or context.
- Make arbitrary decisions about file organisation.
- Use whatever coding style it prefers (your adviser won’t).
- Not know your field’s conventions (AER formatting, clustering choices, etc.).
With a good briefing, the agent will:
- Follow your exact specifications.
- Use the right packages and methods.
- Produce output in the format you need.
- Handle edge cases the way you want.
4.3 Anatomy of a good briefing
Six sections, in roughly this order:
# Project Name
## Mission
One paragraph. What is this project? What should the agent achieve?
## Principles
Numbered list of non-negotiable rules. Short. Unambiguous.
## Project Structure
Show the expected directory tree.
## Technical Specifications
Packages to use, output formats, coding style, SE specifications.
## Don'ts
What must never happen ("never delete data/raw/", "no install.packages()
outside 00_setup.R", "no hardcoded absolute paths").
## Communication Protocol
What to do when stuck. Where to log decisions for handoff between sessions.
4.4 A full CLAUDE.md for today’s exercise
CLAUDE.md
# Replication Package Agent
## Mission
Transform the messy project folder into an AER-compliant replication
package. Reorganise code, write documentation, create a master script,
and produce all tables and figures in a clean, reproducible pipeline.
## Principles
1. Never delete or modify the original data files. Copy to data/raw/.
2. All code must run from the project root using relative paths.
3. Use R with tidyverse conventions. Load packages with library().
4. Output tables as both .csv and .tex (using modelsummary).
5. Output figures as both .pdf and .png (300 DPI).
6. Follow the AEA Social Science Data Editors README template exactly.
7. Document every non-obvious decision in the README.
8. Use fixest::feols() for regressions, not lm().
9. Standard errors: heteroskedasticity-robust (vcov = "hetero") unless
a script comment says otherwise.
## Project Structure
replication_package/
├── README.md
├── LICENSE.txt
├── Makefile
├── master.R
├── code/
│ ├── 00_setup.R
│ ├── 01_descriptive.R
│ ├── 02_main_analysis.R
│ ├── 03_nok_analysis.R
│ ├── 04_dmv_analysis.R
│ ├── 05_robustness.R
│ └── 06_figures.R
├── data/
│ ├── raw/
│ └── README_data.md
└── output/
├── tables/
└── figures/
## Technical Specifications
- R packages: tidyverse, haven, fixest, modelsummary, ggplot2, marginaleffects
- Tables: modelsummary() with output formats "latex" and "csv"
- Figures: ggsave() with width=8, height=5, dpi=300
- Data files are Stata .dta format — read with haven::read_dta()
- Set seed at the top of every script that randomises anything
(set.seed(20260422))
## AER README Requirements
The README.md must follow the Social Science Data Editors template:
1. Overview
2. Data Availability and Provenance Statements
(Statement about Rights, Summary of Availability, Dataset List)
3. Computational Requirements (software, packages with versions, runtime)
4. Description of Programs (what each script does)
5. Instructions to Replicators (step-by-step)
6. List of Tables and Programs (every output mapped to its script)
7. References
## Don'ts
- Never modify files in data/raw/.
- No install.packages() outside 00_setup.R.
- No hardcoded absolute paths.
- No `setwd()` calls.
- No committing large data files to git — use .gitignore.
## Communication Protocol
- If a script fails: log the error, continue with independent tasks,
summarise all errors at the end.
- Write a DECISIONS.md at the project root for non-obvious choices
("I dropped quick_look.R because it's exploratory and not referenced
anywhere").
- At the end of a session, append a short note to DECISIONS.md so the
next session (or the next agent) can pick up without re-reading the
full chat history.
4.5 Best practices
- Be specific, not vague. “Use `modelsummary()` with `output = 'latex'`” beats “make nice tables.”
- Show the expected directory layout. A tree diagram removes ambiguity.
- State what NOT to do. “Never delete files in `data/raw/`” prevents costly mistakes.
- Use emphasis for critical rules. Bold or ALLCAPS for non-negotiable constraints.
- Keep it concise. Long instruction files dilute important rules. Aim for 50–250 lines.
- Update as you learn. The briefing is a living document.
- Use `/init` to bootstrap, then edit.
- Meta move: draft with a chatbot. Paste a description of your project into claude.ai, ask for the six sections, iterate.
CLAUDE.md
Claude Code discovers CLAUDE.md up the directory tree. A ~/.claude/CLAUDE.md sets global preferences (“always use tidyverse”); a project-level CLAUDE.md adds project-specific rules. Both are loaded and merged. Put taste preferences at the global level; put project conventions in the project. Every Claude Code session in every project then behaves consistently without re-specifying your house style.
5 Part 5: Permissions, planning, skills, hooks
5.1 Permission modes
| Mode | How to activate | What happens |
|---|---|---|
| Default | `claude` | Agent asks before each bash command |
| Allow-list | `.claude/settings.json` | You pre-approve specific patterns |
| Skip all | `claude --dangerously-skip-permissions` | Runs everything without asking |
An allow-list in .claude/settings.json:
.claude/settings.json
{
"permissions": {
"allow": [
"Bash(Rscript:*)",
"Bash(R:*)",
"Bash(mkdir:*)",
"Bash(cp:*)",
"Bash(mv:*)",
"Bash(cat:*)",
"Bash(ls:*)",
"Bash(make:*)",
"Write(*)",
"Read(*)"
],
"deny": [
"Bash(rm -rf /*)",
"Bash(git push:*)",
"Write(data/raw/**)"
]
}
}
The agent can run R, create directories, copy files, and read/write — but can’t `rm -rf`, `git push`, or modify `data/raw/`.
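A typo in the allow-list file is easy to make and cheap to catch: the stdlib JSON parser gives a quick syntax check before a long session. A minimal sketch (the demo file below is written to /tmp so nothing real is touched; point the same check at your project's `.claude/settings.json`):

```shell
# Write an illustrative allow-list to a scratch path, then parse-check it.
demo=/tmp/demo_settings.json
printf '{"permissions":{"allow":["Bash(Rscript:*)"]}}\n' > "$demo"
# json.tool exits non-zero on malformed JSON, so the echo only fires on success.
python3 -m json.tool "$demo" > /dev/null && echo "valid JSON"
```

Run it against the real file with `python3 -m json.tool .claude/settings.json` before trusting the permission rules.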
Inside a Docker container, the agent is sandboxed: host files outside the mounted workspace are out of reach. `claude --dangerously-skip-permissions` is appropriate for today.
5.2 Planning mode
Complex tasks benefit from planning before execution. Without it, the agent might start reorganising files before understanding the full scope, miss dependencies between scripts, or build a structure that needs to be redone.
▸ /plan
Analyse all files in this project folder. Create a detailed plan to
reorganise everything into an AER-compliant replication package.
In planning mode, the agent:
- Reads all files in the project.
- Analyses dependencies between scripts.
- Drafts a step-by-step plan with specific actions.
- Presents the plan for your review.
- Waits for your approval before executing anything.
You can:
- Approve: “Looks good, proceed.”
- Modify: “Skip the robustness checks. Rename Table 5 to Table 4.”
- Reject: “Start over with a different structure.”
/plan
Use it when:
- The task has many steps (like today’s exercise)
- You want to review the approach before execution
- The agent might make structural decisions you want to influence
You do NOT need /plan for:
- Simple, single-file edits (“fix the axis label in Figure 3”)
- Running existing scripts (“run master.R”)
- Questions about the project (“what packages does this use?”)
5.3 Custom skills
Drop a Markdown file into .claude/commands/ and it becomes a project-scoped slash command.
mkdir -p .claude/commands
Example — a `/check-package` command:
.claude/commands/check-package.md
Verify that the replication package is AER-compliant. For each item
below, report pass/fail with a one-line justification.
1. README.md present and follows the AEA Social Science Data Editors
template (7 required sections).
2. Every script referenced in README.md exists on disk.
3. `Rscript master.R` completes without errors from a clean R session.
4. Every table and figure listed in the README is produced in
output/tables/ or output/figures/.
5. No absolute paths in any .R file (`grep -rn '/home\|C:' code/`).
6. 00_setup.R loads every library() used across code/.
7. data/raw/ is byte-identical to the original input (compare SHA-256).
8. LICENSE.txt present.
Finish with a numerical score: X / 8.
Type `/check-package` in Claude Code — the agent runs the checklist. Skills travel with your repo; anyone who clones it gets the same commands.
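Check 7 (byte-identical raw data) is also easy to do by hand with SHA-256 digests. A self-contained sketch using throwaway demo files in a scratch directory (the file names are illustrative; in practice you compare `data/raw/` against wherever the untouched originals live):

```shell
# Work in a scratch directory so nothing in the real package is touched.
tmp=$(mktemp -d)
echo "id,treat,registered" > "$tmp/demo_original.csv"
cp "$tmp/demo_original.csv" "$tmp/demo_copy.csv"
# Compare the two SHA-256 digests; any single changed byte breaks the match.
a=$(sha256sum "$tmp/demo_original.csv" | cut -d' ' -f1)
b=$(sha256sum "$tmp/demo_copy.csv" | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "byte-identical" || echo "MISMATCH"
```

For a whole directory, store digests once with `sha256sum originals/* > checksums.txt` and later verify with `sha256sum -c checksums.txt`.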
Another useful one — /referee:
.claude/commands/referee.md
You are a fresh AEA Data Editor referee. You have NOT seen this project
before. Review the replication package at /workspace/replication_package/.
1. Is the README structured per the AEA template? List any missing
sections.
2. Does `master.R` run without errors from a clean shell? (Run it.)
3. Are all packages listed with versions (`sessionInfo()` or
`renv.lock`)?
4. Are all tables/figures in the paper produced by the code?
5. What would you ask the authors to fix?
Output a structured referee report.
5.4 Hooks
Hooks are shell commands that execute automatically on agent events. Configure in .claude/settings.json:
.claude/settings.json
{
"hooks": {
"PostToolUse": [
{
"matcher": { "tool": "Write", "path_glob": "code/*.R" },
"command": "Rscript -e 'lintr::lint(\"$CLAUDE_FILE_PATH\")'"
},
{
"matcher": { "tool": "Write", "path_glob": "data/raw/**" },
"command": "echo 'BLOCKED: data/raw/ is read-only'; exit 1"
}
],
"Stop": [
{
"command": "git log --oneline -n 5"
}
]
}
}
The harness runs these — the agent cannot skip them. Use cases:
- Lint R code after every file write.
- Prevent writes to `data/raw/`.
- Log all shell commands for reproducibility auditing.
- Show a compact git log when a session ends.
A hook you write once is enforced every time, even if the agent “forgets” a rule in CLAUDE.md.
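The command-logging use case from the list above has no example in this handout, so here is a rough sketch. Treat the `PreToolUse` event name and the `$CLAUDE_TOOL_INPUT` variable as assumptions: check the current hooks documentation for the exact event and variable names your version exposes.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": { "tool": "Bash" },
        "command": "echo \"$(date -Iseconds) $CLAUDE_TOOL_INPUT\" >> .agent_command_log"
      }
    ]
  }
}
```

The resulting `.agent_command_log` is a timestamped record of every shell command the agent ran, which pairs well with the git-log audit trail.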
6 Part 6: The AER replication package standard
6.1 Why standards matter
The American Economic Association requires all published papers to include a replication package deposited on openICPSR. The AEA Data Editor reviews every package before publication. Understanding these standards is essential for any economist submitting to AEA journals — and a good practice for any empirical research you want others to trust.
6.2 The AEA README template
The AEA uses the Social Science Data Editors’ README template. Your README must include:
- Overview: Paper title, authors, one-paragraph summary of what the package contains.
- Data Availability and Provenance Statements:
- Statement about Rights — certify you have permission to use and redistribute the data.
- Summary of Availability — all public / some restricted / none available.
- Dataset List — table mapping data files to their sources.
- Computational Requirements: Software (with versions), packages (with versions), hardware, runtime estimate.
- Description of Programs: What each script does.
- Instructions to Replicators: Numbered steps to reproduce everything.
- List of Tables and Programs: Map every table/figure to the script that produces it.
- References: Full citations for all data sources.
6.2.1 Computational requirements checklist
Your README must specify: the software and its version, every package with its version, the hardware used, and a runtime estimate for the full pipeline.
6.3 Other required files
| File | Purpose |
|---|---|
| `README.md` | Documentation (see above) |
| `LICENSE.txt` | Terms of use (MIT for code, CC-BY 4.0 for data) |
| `master.R` or `Makefile` | Single entry point that runs everything |
| `00_setup.R` | Installs/loads all package dependencies |
“The replication package should reproduce the tables and figures, as well as any in-text numbers, by running code without manual intervention, starting from the raw data.” — AEA Data Editor
6.4 Common compliance failures
| Problem | How to avoid |
|---|---|
| Missing or incomplete README | Use the template, fill every section |
| Code does not run | Test from a clean environment |
| Missing data citations | Cite every dataset in the manuscript’s references |
| No master script | Create master.R that sources all scripts in order |
| Hardcoded paths | Use relative paths from the project root |
| Missing dependencies | List every package in 00_setup.R with version |
| Exploratory files included | Remove scratch scripts, keep only final code |
| Manual steps required | Automate everything |
7 Part 7: Research best practices with AI agents
This is the section that makes the difference between using an agent competently and using one recklessly. Read it before the hands-on exercise.
7.1 1. Verify, don’t trust
Every agent output is a first draft from a capable but junior RA. The agent is fast, confident, and occasionally wrong in ways that look right.
- Read every diff. Before accepting a change, look at it. If you hit “yes” reflexively, you are not doing research — you are hoping.
- Run the code yourself from a clean shell. Don’t rely on “agent says it works.” Open a fresh terminal, `cd replication_package`, `Rscript master.R`, and read every line of output.
- Spot-check against priors. If a regression now says the effect is +0.12 when your prior result was +0.08, stop and figure out why before moving on. Know your magnitudes.
- Don’t trust agent summaries of what it did. They are narratives, not audits. `git log` and a diff are the audit trail.
If you would check an RA’s first output, you check the agent’s.
7.2 2. Delegate the paperwork, keep the research
AI coding agents win on the paperwork around the research, not the research itself.
Delegate to the agent:
- Inventory / audit of an unknown folder.
- Bulk rewrites (rename, reformat, port API).
- Glue code (download, parse, save, plot).
- Documentation drafts (README, script headers, data dictionaries).
- Running → failing → fixing loops.
- Generating variants for comparison (three plot styles, two SE specifications).
Keep for yourself:
- Research question and motivation.
- Identification strategy.
- Choice of estimator and standard errors.
- Interpretation of coefficients.
- Final-paper prose.
- Anything involving sensitive data.
The failure mode is offloading the wrong decisions to the agent — asking it to pick a clustering level or choose whether to weight. Don’t. Make that decision yourself, write it into CLAUDE.md as a rule, and let the agent execute it.
7.3 3. Data privacy and IRB
Cloud agents send your prompt — and everything in context — to a third-party LLM provider.
Never put these in front of a cloud agent:
- Personally identifiable data (names, addresses, precise geocodes, SSNs, NHS numbers).
- IRB-restricted microdata.
- Restricted-use administrative data (Census, tax records, health) under a DUA.
- Unpublished results you don’t want a model provider seeing.
Mitigations:
- Anthropic: training opt-out is default for API traffic. Confirm in console settings.
- `.claudeignore`: add sensitive folders so the repo map doesn’t scan them.
- Docker mount surface: only mount what the agent needs. Keep the raw extract outside the container if you can.
- Local models for the really sensitive stuff: run `ollama` locally; the Aider appendix in this handout shows how.
- When in doubt, don’t. There is no button that makes sending restricted data to a cloud model OK.
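For the `.claudeignore` mitigation, a sketch is below. The folder and file names are hypothetical placeholders for wherever your restricted material actually lives:

```text
# .claudeignore -- keep sensitive material out of the repo map and agent context
data/restricted/
irb_documents/
*.sav
```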
7.4 4. Session hygiene: beat context degradation
Claude’s context window is 200k tokens (or 1M on newer models). It fills up faster than you’d think, especially when you add large files or run verbose shell commands whose output lands back in the chat.
As the conversation grows, performance drops:
- The agent forgets a rule stated 20 turns ago.
- Sloppier diffs — spurious edits to files you didn’t ask about.
- Vague summaries (“I’ve updated the analysis”) in place of concrete ones.
The fix is session hygiene, not a bigger context window:
- `/clear` when switching topics. The files stay; the chat resets. The agent re-reads `CLAUDE.md` on the next message.
- `/compact` when the session is clearly bloated but you want to keep a summary.
- Short sessions over long ones. Break the messy-project task into “reorganise”, “write scripts”, “write README”, “run and debug” — each a separate session.
- Files as handoff state. Decisions go in `DECISIONS.md` or `CLAUDE.md`, not the chat buffer.
The canonical symptom of context degradation is the agent forgetting a rule in CLAUDE.md. That’s your cue to `/clear`: the rule is re-read and honoured on the next message.
7.5 5. The “referee 2” protocol
The single most useful pattern for verifying research code from an agent: a second, fresh session as an independent reviewer.
# Terminal 1: build the package
claude --dangerously-skip-permissions
▸ (builds replication_package/)
# Terminal 2: fresh session, clean context
claude
▸ You are a referee for the AEA Data Editor. Review
/workspace/replication_package/. Check:
1. Does `Rscript master.R` run without errors from a clean session?
2. Does the README follow the AEA template (7 sections)?
3. Are all tables and figures produced?
4. Are all packages listed with versions?
5. Any absolute paths? Any exploratory files left over?
Report pass/fail for each criterion with specific evidence.
Why this works:
- A fresh context has no bias from the construction. It didn’t spend an hour choosing a structure; it’s just reading.
- Running the code from a fresh shell surfaces hardcoded paths and missing dependencies that the building session papered over.
Variation: cross-model referee. Build with Sonnet, review with Opus. Or use the Aider + OpenRouter appendix to audit with GPT-5 or DeepSeek Reasoner. Different brains catch different mistakes.
7.6 6. Reproducibility first
The point of an AER replication package is that someone else, on a different machine, a year later, can reproduce your results. Every agent decision should be evaluated against that bar.
Git is your safety net:
- Commit frequently. If a diff looks wrong, hit Esc and re-prompt before accepting.
- Commit `CLAUDE.md` and custom skills. They’re part of the repo.
- A clean `git log` is the audit trail of the agent’s work.
Environment is your truth:
- The Docker image pins R, Python, system libraries.
- `renv::snapshot()` pins R package versions; `renv::restore()` recreates them elsewhere.
- Always test the final package from a clean R session (`R --vanilla`), not your dev session.
One-command reproducibility:
- `master.R` sources every script in order.
- `Makefile` does the same with dependency tracking. Bonus: `make clean` wipes the outputs so you can verify the build from scratch.
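A minimal Makefile in this spirit might look like the sketch below (target names are conventions, not requirements; recipe lines must be indented with tabs):

```makefile
# Run the whole pipeline through the single entry point.
all:
	Rscript master.R

# Wipe generated outputs so a from-scratch build can be verified.
clean:
	rm -rf output/tables/* output/figures/*

.PHONY: all clean
```

`make` reruns everything; `make clean && make` is the from-scratch verification.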
7.7 7. Prompt patterns that work
| # | Pattern | Example |
|---|---|---|
| 1 | Vague → iterate | “I want X. Ask anything unclear, then do it.” |
| 2 | Scripts, not commands | “Write download_data.R”, not “download the data” |
| 3 | Style by reference | “Kieran Healy’s ggplot2 best practices” beats a 20-line theme() spec |
| 4 | Small iterations | Three narrow prompts beat one 500-word prompt |
| 5 | Context → objective → constraints | “This project replicates X. Refactor scripts. Relative paths only.” |
| 6 | Show, don’t tell | A directory tree beats paragraphs of description |
| 7 | Reference files explicitly | @filename lets Claude Code scope to specific files |
| 8 | Stop arguing — start over | If the agent keeps making the same mistake, Esc, /clear, re-prompt. |
Weak vs. strong prompts:
| Weak | Strong |
|---|---|
| “Clean up this code” | “Refactor analysis_v2_FINAL.R into code/02_main_analysis.R. Replace read.csv with haven::read_dta. Use fixest::feols().” |
| “Make a README” | “Write README.md following the AEA Social Science Data Editors template. Include all 7 required sections. List every R package used with its installed version (use packageVersion()).” |
| “Fix the figures” | “In code/06_figures.R, change Figure 1 to theme_minimal(), save as both PDF and PNG at 300 DPI in output/figures/, filename figure1_main.{pdf,png}.” |
7.8 8. Model economics
A messy-to-replication-package run is 20–60 minutes of agent time.
The rule of thumb for Claude Code:
- Use `/plan` on Opus for the initial architecture.
- `/model sonnet` once you’ve approved the plan — cheaper execution.
- `/model haiku` for tiny subsequent edits.
On Max, switching is free. On API, Opus is ~5× more expensive than Sonnet; use it deliberately.
Want to compare models across providers? The Aider + OpenRouter appendix has the architect + editor split (reasoner plans, cheap editor writes) with typical savings of 40–60% on long refactors.
7.9 9. Keep a human in the loop for methodology
An agent happily writes `feols(outcome ~ treat | wave + state, data = df, vcov = "hetero")` without pausing on whether clustered SEs at the wave level make more sense, or whether state fixed effects absorb the treatment’s identifying variation. Don’t let it decide for you.
- Write the estimator, the fixed effects, and the SE choice into `CLAUDE.md` as rules.
- When the agent proposes a different spec, it is asking you a question — answer it before accepting.
- Review regression output the same way you’d review an RA’s: coefficient signs, magnitudes, sample sizes.
7.10 10. Write progress logs for handoff
Between sessions, the agent re-reads files, not the prior conversation. Use that:
- Append a short note to `DECISIONS.md` whenever you or the agent makes a non-obvious choice.
- At the end of each session, ask the agent to summarise what was done into `DECISIONS.md`. Commit it.
- Next session: Claude Code reads `CLAUDE.md` on startup; point it at `DECISIONS.md` (or reference it from `CLAUDE.md`) and it picks up where you left off with no context loss.
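The append-a-note habit is one shell line. A sketch, run from the project root (the note text is an example):

```shell
# Append a dated handoff note to DECISIONS.md (creates the file if absent).
printf '\n## %s\n- Table 2 switched to clustered SEs; see 02_main_analysis.R.\n' \
  "$(date +%F)" >> DECISIONS.md
# Show the note that was just written.
tail -n 3 DECISIONS.md
```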
7.11 11. Tutor mode is real: learn by doing
If you are new to Claude Code, don’t read about it — run a tutor session. Load a tutor briefing as a read-only file, let the agent walk you through the workshop material by having you do the steps yourself. Progress is stored in a file, not in the chat.
8 Part 8: Hands-on exercise
8.1 Step 1: Get the workshop files
If you haven’t already, clone the workshop repository on your host machine (outside Docker):
git clone https://github.com/AlexRieber/Workshops.git
The messy project files are at `Workshops/AI_Agents/messy_project/`.
8.2 Step 2: Set up your workspace
# Copy the messy project into the running container
docker cp ~/Workshops/AI_Agents/messy_project econ-replication-agent:/workspace/messy_project
# Enter the container
docker exec -it econ-replication-agent bash
cd /workspace/messy_project
ls -la
You should see:
1_experiment_clean.dta analysis_v2_FINAL.R
2_nextofkin_clean.dta dmv_analysis_v3.R
3_dmv_quarterly_clean.dta dmv_figures.R
dmv_quarterly_raw.csv figure8_nok.R
nextofkin_raw.csv make figs.R
nok_analysis.R notes.txt
old_robustness_checks.R quick_look.R
Table1_descriptive.R
Before starting the agent, look at the files yourself. Open a few scripts. Notice the problems: inconsistent naming, no structure, hardcoded paths, commented-out code, exploratory scripts mixed with analysis code. This is what the agent will fix — but you need to know what to verify later.
8.3 Step 3: Write the briefing
Two ways:
8.3.1 A. Bootstrap with /init
claude
▸ /init
Claude Code writes a first-draft `CLAUDE.md`. Edit it to add the AEA-specific rules from Part 4.
8.3.2 B. Paste the example from Part 4
nano CLAUDE.md
# (paste the content from Part 4, save with Ctrl+O, Enter, Ctrl+X)
8.4 Step 4: Start with planning
claude --dangerously-skip-permissions
Then type:
▸ /plan
Analyse all files in this project folder. Create a detailed plan to
reorganise everything into an AER-compliant replication package.
The package should follow the structure in CLAUDE.md. Show me the
plan before you start.
Review the plan. If it looks good:
▸ Approved. Please execute the plan.
8.5 Step 5: Watch the agent work
The agent will typically:
- Read all existing scripts to understand what they do.
- Create the directory structure (`code/`, `data/raw/`, `output/tables/`, `output/figures/`).
- Copy data files to `data/raw/`.
- Refactor each script: clean code, fix paths, add proper output saving.
- Create `00_setup.R` with all package dependencies.
- Create `master.R` that sources all scripts in order.
- Write `README.md` following the AEA template.
- Create `LICENSE.txt` and `Makefile`.
- Test by running `master.R`.
- Fix any errors that come up.
This takes 20–60 minutes depending on the model. Use Ctrl+O to toggle verbose output and watch the thinking in real time. Check /cost every few steps — build cost intuition.
8.6 Step 6: Review the results
After the agent finishes, check the output from a clean shell, not from inside the agent session:
# In a new terminal, inside the container:
cd /workspace/replication_package
# Tree
ls -R
# Clean-session run
R --vanilla -e 'source("master.R")'
# Verify outputs
ls output/tables/
ls output/figures/
# Read the README
head -60 README.md
8.7 Step 7: Iterate and improve
The first pass will not be perfect. Use the agent to fix issues:
▸ The README is missing the Computational Requirements section.
Please add it with the correct R version (run `R --version` to check)
and the package list with versions (run packageVersion(...) for each).
▸ Table 2 should use clustered standard errors at the session level,
not heteroskedasticity-robust. Update 02_main_analysis.R and regenerate.
▸ The Makefile's `clean` target should also remove output/tables/ and
output/figures/. Please update it.
This iterative refinement is where agents excel. Each fix takes seconds, not minutes.
8.8 Step 8: Run the referee-2 audit
# Fresh session, clean context:
claude
▸ You are a fresh referee for the AEA Data Editor. You have NOT seen
this project before. Audit /workspace/replication_package/ against
the AEA Social Science Data Editors README template. Specifically:
1. List any missing README sections.
2. From an empty /tmp/test/ directory, try to reproduce the package:
copy the folder, run master.R, confirm every table/figure in the
README is produced.
3. Are all packages listed with versions?
4. Any absolute paths or setwd() calls?
5. Any exploratory files left over from the original messy folder?
Output a structured referee report with pass/fail for each item.
Read the report. Fix what it catches. This is the habit that separates a good replication package from a great one.
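Item 2 of the referee prompt can also be scripted outside any agent session; a minimal sketch, assuming the handout's paths (the `SKIP` branch fires if you run it outside the container):

```shell
# Clean-room reproduction: copy the package into an empty directory and
# run the pipeline from there, so stray files in the original folder
# can't mask a missing input.
PKG=/workspace/replication_package
DEST=/tmp/test
rm -rf "$DEST" && mkdir -p "$DEST"
if [ -d "$PKG" ]; then
  cp -r "$PKG" "$DEST/"
  (cd "$DEST/replication_package" && R --vanilla -e 'source("master.R")')
else
  echo "SKIP: $PKG not found (run this inside the container)"
fi
```

If `master.R` completes here with no edits, the package survives at least the mechanical half of the audit.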
9 Part 9: Advanced techniques
9.1 Opus for planning, Sonnet for execution
claude --model opus
▸ /plan
▸ Analyse /workspace/messy_project and draft a detailed plan for the
AER replication package, per CLAUDE.md.
After approving:
▸ /model sonnet
▸ Proceed with the plan.
On API, this is the single biggest cost-saver for long refactors. On Max, there’s no token cost — but Sonnet is typically faster.
9.2 Subagents for parallel work
▸ Refactor 02_main_analysis.R while a subagent simultaneously refactors
03_nok_analysis.R. When both complete, integrate both into master.R.
Claude Code spawns a second agent with its own context. The parent agent re-integrates the results. Use this when subtasks are genuinely independent — otherwise you get merge conflicts and no speed-up.
9.3 Hooks for research discipline
Useful hooks for an AER-bound project (.claude/settings.json):
{
"hooks": {
"PostToolUse": [
{
"matcher": { "tool": "Write", "path_glob": "code/*.R" },
"command": "Rscript -e 'lintr::lint(\"$CLAUDE_FILE_PATH\")'"
},
{
"matcher": { "tool": "Write", "path_glob": "data/raw/**" },
"command": "echo 'BLOCKED: data/raw/ is read-only'; exit 1"
},
{
"matcher": { "tool": "Write", "path_glob": "**/*.R" },
"command": "grep -nE '^setwd|\"/home/|\"C:' \"$CLAUDE_FILE_PATH\" && echo 'WARN: absolute path or setwd()'; true"
}
],
"Stop": [
{
"command": "git log --oneline -n 5"
}
]
}
}
- First hook: lint every R file the agent writes.
- Second: hard-block writes into `data/raw/`.
- Third: warn (but don't block) on `setwd()` or absolute paths.
- Fourth: at end of session, show the last 5 commits.
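You can exercise the third hook's grep pattern by hand before wiring it in; a quick sketch against a hypothetical offending file (`/tmp/demo_paths.R` is invented for the demo):

```shell
# Hypothetical R file showing the two smells the hook flags
cat > /tmp/demo_paths.R <<'EOF'
setwd("/home/alice/project")
df <- read.csv(here::here("data", "raw", "donors.csv"))
EOF

# Same pattern as the hook command: a line starting with setwd, or a
# quoted absolute path (Unix home directory or Windows drive)
grep -nE '^setwd|"/home/|"C:' /tmp/demo_paths.R && echo 'WARN: absolute path or setwd()'
```

Only line 1 trips the pattern; the `here::here()` call on line 2 passes, which is exactly the behaviour you want the hook to encourage.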
9.4 Using .claudeignore
Like .gitignore, this file tells Claude Code to skip certain files. Useful for large projects:
.claudeignore
# Skip large data files the agent doesn't need to read
data/raw/*.csv
data/raw/*.dta
# Skip compiled output
output/
# Skip renv library
renv/library/
This speeds up the agent (fewer files in the repo map) and prevents accidental reads of sensitive data.
9.5 Global vs project CLAUDE.md
Global (~/.claude/CLAUDE.md) — house style, cross-project:
# Global preferences
- Prefer tidyverse over base R.
- Use `fixest::feols` for regressions, not `lm`.
- Use `here::here()` for paths.
- Write concise comments only where the "why" is non-obvious.
Project (`CLAUDE.md`) — project-specific rules, loaded on top.
Both are loaded and merged. Use this for consistent cross-project behaviour without re-specifying your house style.
10 Part 10: Troubleshooting
10.1 Common issues
| Problem | Solution |
|---|---|
| Claude Code asks for permission on every command | Set up .claude/settings.json (Part 5) or use --dangerously-skip-permissions inside Docker |
| Agent takes a wrong approach | Ctrl+C to cancel, then “Instead of X, please do Y” — or Esc Esc to rewind |
| Agent runs out of context | /compact to compress, or /clear to reset (re-reads CLAUDE.md) |
| R package not installed | “Install fixest.” Or: Rscript -e 'install.packages("fixest")' |
| Agent creates files in wrong location | Be specific: “Create code/01_descriptive.R in /workspace/replication_package/” |
| Agent modifies original data files | Add to CLAUDE.md: “Never modify files in data/raw/.” Add a deny rule to .claude/settings.json. |
| Agent generates code that doesn’t run | Approve the run command. It sees the error and debugs. |
| Authentication failed | Re-run `claude login` (Max) or re-export `ANTHROPIC_API_KEY` |
| Rate-limit errors | Wait 30 seconds; or consider the Max plan |
| Context window exceeded | /compact or /clear |
10.2 When to intervene
Let the agent work autonomously unless:
- It is about to delete original data files.
- It is taking a fundamentally wrong statistical approach.
- It has been stuck in a loop for more than 3 attempts on the same error.
- It is making network requests you did not expect.
- `/cost` is growing much faster than you expected — stop and look.
10.3 Getting help
- Claude Code docs: code.claude.com/docs
- Anthropic Console: console.anthropic.com
- AEA Data Editor guidance: aeadataeditor.github.io
- AEA README template: social-science-data-editors.github.io/template_README
11 Part 11: Pre-submission checklist
Use this before you claim the package is done. Better yet, ask the agent:
▸ /check-package
11.1 Package structure
11.2 README completeness
11.3 Code quality
11.4 Reproducibility
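Several of these checks can be scripted; a minimal sketch, using a self-contained stand-in tree in `/tmp/pkg_check` — point `PKG` at `/workspace/replication_package` when running it for real:

```shell
# Stand-in package so the sketch runs anywhere; point PKG at the real
# package root when using this on your project.
PKG=/tmp/pkg_check
mkdir -p "$PKG/code" "$PKG/output/tables" "$PKG/output/figures"
printf 'x <- 1\n' > "$PKG/code/01_clean.R"
printf 'source("code/01_clean.R")\n' > "$PKG/master.R"

# Structure: required top-level files present?
for f in README.md master.R LICENSE.txt Makefile; do
  [ -f "$PKG/$f" ] && echo "OK   $f" || echo "MISS $f"
done
# Code quality: no setwd() or hardcoded absolute paths in code/?
if grep -rnE '^setwd|"/home/|"C:' "$PKG/code"; then
  echo "FAIL hardcoded paths"
else
  echo "OK   no hardcoded paths"
fi
```

Anything the script marks `MISS` or `FAIL` goes straight back to the agent as a fix prompt; the subjective items (README completeness, code clarity) still need your eyes.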
12 Appendix A: Using other models via Aider + OpenRouter
Claude Code uses Claude. When you want to try GPT-5, DeepSeek Reasoner, Gemini 2.5, or free-tier models, Aider + OpenRouter is the flexible alternative.
This is a fully self-contained setup. You can use it alongside Claude Code (e.g. Claude Code builds, Aider does the referee-2 review on GPT-5), or switch to it entirely for a specific task.
12.1 Why this combination
Aider is a terminal coding agent similar to Claude Code:
- Open source, Apache 2.0, Python.
- `CONVENTIONS.md` plays the same role as `CLAUDE.md`.
- Git-native: auto-commits every edit, so you can `/undo`.
- Model-agnostic. Bring any API key.
OpenRouter is a single API that routes to ~300 models across providers:
- Anthropic, OpenAI, Google, DeepSeek, Meta, Mistral, Qwen.
- One key, one bill, passthrough pricing.
- Zero Data Retention by default.
- Per-key spend caps (your airbag).
- Free-tier models for warm-ups.
12.2 Setup in ten minutes
12.2.1 1. Create the OpenRouter account
Go to openrouter.ai, sign up, and load $5 under Credits (there is a minimum processing fee of about $0.80).
12.2.2 2. Create a scoped API key
Keys → Create Key. Name it workshop-aider. Set a spend cap ($5 whole key, or $1/day). Copy the key — starts with sk-or-v1-.... Shown once.
12.2.3 3. Privacy settings
- ZDR on (OpenRouter does not store prompts after forwarding).
- Prompt logging off (unless you want the ~1% discount).
12.2.4 4. Enter the container and export
docker exec -it econ-replication-agent bash
export OPENROUTER_API_KEY=sk-or-v1-...
# Sanity check — list available models
aider --list-models openrouter/ | head
12.2.5 5. First run
mkdir -p ~/aider-test && cd ~/aider-test
aider --model openrouter/anthropic/claude-sonnet-4.5
At the Aider prompt:
> Hello. What model are you?
> /cost
> /tokens
/cost should show a fraction of a cent. You’re wired up.
12.3 Model cost reference
Approximate pay-as-you-go prices per 1 million tokens (input / output). Live at openrouter.ai/models.
| Model | Input | Output | Good for |
|---|---|---|---|
| `anthropic/claude-sonnet-4.5` | $3.00 | $15.00 | Same as Claude Code, routed differently |
| `anthropic/claude-haiku-4.5` | $1.00 | $5.00 | Smaller edits, iteration |
| `openai/gpt-5.x` | $2.50 | $10.00 | Alternative to Sonnet |
| `google/gemini-2.5-pro` | $1.25 | $5.00 | Very large context |
| `deepseek/deepseek-reasoner` | $0.55 | $2.19 | Architect-mode reasoning |
| `deepseek/deepseek-chat` | $0.32 | $0.89 | Warm-ups, small fixes |
| `qwen/qwen3-coder-480b:free` | $0.00 | $0.00 | Free, capable for boilerplate |
12.4 CONVENTIONS.md is the same file as CLAUDE.md
# From the messy_project directory:
cp CLAUDE.md CONVENTIONS.md
That’s it. Content is identical; Aider reads CONVENTIONS.md.
12.5 Aider command cheat sheet
# Launch
aider start in current folder
aider file1.R file2.R pre-add files
aider --model openrouter/<prov>/<m> choose model
aider --read CONVENTIONS.md pre-load read-only
aider --chat-mode architect \
--model <reasoner> \
--editor-model <cheap> architect + editor split
aider --message "do X" file.R one-shot, non-interactive
aider --yes-always --message "do X" fully autonomous one-shot
# In chat
/add <file> add editable file
/read <file> add read-only file
/drop <file> remove file
/diff show pending changes
/undo revert last edit plus auto-commit
/run <cmd> run shell command, output returns to chat
/model <name> switch model mid-session
/chat-mode {code|ask|architect|help}
/tokens context usage
/cost session cost
/clear clear conversation, keep files
/reset clear conversation and drop files
12.6 The architect + editor split
Aider’s cost-saver: use a reasoning model to plan, and a cheap model to execute.
aider --chat-mode architect \
--model openrouter/deepseek/deepseek-reasoner \
--editor-model openrouter/anthropic/claude-haiku-4.5 \
--read CONVENTIONS.md
The reasoner thinks about each change; the editor types the diff. Typical savings on a long refactor: 40–60% vs. using Sonnet for everything.
On Claude Code, the equivalent is: /plan with --model opus, then /model sonnet after approval. Architect split doesn’t apply directly.
12.7 Running the messy-project task with Aider
cd /workspace/messy_project
cp CLAUDE.md CONVENTIONS.md
aider --model openrouter/anthropic/claude-sonnet-4.5 \
--read CONVENTIONS.md
Then:
> Read every R script and every data file header. Draft a detailed plan
to reorganise into an AER-compliant replication package per
CONVENTIONS.md. Present the plan as a numbered list and wait for my
approval before modifying any files.
After approving:
> Approved. Execute. Auto-commit each logical step.
The rest of the workflow (verify, iterate, referee-2) is identical to Claude Code. The only difference: use /run Rscript master.R in Aider where Claude Code would ask you to approve a shell command.
12.8 Cross-model referee-2
Build with Claude Code, audit with a different model via Aider:
# Terminal 1: build with Claude Code
claude --dangerously-skip-permissions
# Terminal 2: audit with, e.g., GPT-5 via Aider + OpenRouter
aider --model openrouter/openai/gpt-5 \
--read /workspace/replication_package/README.md
> You are a fresh referee for the AEA Data Editor. Audit the package.
(... full referee prompt from Part 8 Step 8 ...)
Different providers catch different mistakes. This is the most robust verification you can do short of a human referee.
12.9 Local models for sensitive data
If you can’t send data to any cloud model, run locally with ollama:
# Install ollama (one-time)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a capable local model
ollama pull qwen2.5-coder:32b
# Run aider against it
aider --model ollama/qwen2.5-coder:32b --read CONVENTIONS.md
Quality is lower than Sonnet but zero network egress. The right choice for IRB-restricted data on a secure machine.
12.10 Aider vs. Claude Code, at a glance
| | Aider | Claude Code |
|---|---|---|
| Instruction file | `CONVENTIONS.md` | `CLAUDE.md` |
| Licence | Apache 2.0 | Proprietary |
| Models | Any via OpenRouter / direct | Claude (Opus/Sonnet/Haiku) |
| Plan mode | Ask in prompt, or architect mode | First-class `/plan` |
| Custom skills | Load a markdown file per session | `.claude/commands/` (versioned) |
| Hooks | — | `.claude/settings.json` |
| Permission allow-list | — | `.claude/settings.json` |
| Git auto-commits | Yes (default) | Manual, or via hooks |
| Architect + editor split | Yes (built-in) | Use Opus for plan, Sonnet for exec |
Short version: if you need the richest harness and are happy with Claude, use Claude Code. If you need to try specific non-Claude models or want auto-commits and the architect split, use Aider + OpenRouter. Both can coexist in the same project (different config files, different branches).
13 Glossary
| Term | Definition |
|---|---|
| AEA | American Economic Association |
| AER | American Economic Review |
| Agent | AI system that can plan, execute code, and iterate autonomously |
| Aider | Open-source terminal coding agent (Appendix A) |
| Architect mode | Aider mode where a reasoning model plans and a cheap editor model writes |
| CLAUDE.md | Markdown instructions file read by Claude Code on startup |
| Claude Code | Anthropic’s terminal-native AI coding agent |
| `/compact` | Claude Code command to compress a long conversation |
| Context degradation | Drop in agent performance as conversation grows long |
| Container | Running instance of a Docker image |
| Dangerously skip permissions | Claude Code mode that runs all commands without asking |
| Docker | Platform for running isolated containers |
| Hooks | Shell commands in Claude Code run automatically on agent events |
| Makefile | Build automation file — make all runs the full pipeline |
| Master script | Single script (master.R) that runs all others in order |
| openICPSR | Repository where AEA replication packages are deposited |
| OpenRouter | API aggregator: one key, ~300 models, per-key spend caps |
| `/plan` | Claude Code planning mode |
| Referee 2 protocol | Fresh-session audit by a second agent of the built package |
| Replication package | Complete set of data, code, documentation to reproduce a paper |
| Skills | Project-specific slash commands in .claude/commands/ |
| ZDR | Zero Data Retention — OpenRouter policy of not storing prompts |