Run

Run a single experiment iteration. Edit the target file, evaluate, keep or discard

Category: development · Source: alirezarezvani/claude-skills

What Is Run?

"Run" is a Claude Code skill designed to facilitate controlled, iterative experimentation within a development workflow. It enables developers and researchers to refine code by running a single experiment iteration—editing a target file, evaluating the outcome, and then deciding whether to keep or discard the change. This skill enforces a disciplined, measurable approach to code improvements by tightly integrating version control, experiment history, and strategy escalation.

The primary command, /ar:run, executes one iteration of the experiment pipeline. It ensures that each change is deliberate, traceable, and informed by past results. By automating the mundane yet crucial steps of experiment management, "Run" streamlines iterative development and research efforts.

Why Use Run?

Iterative experimentation is foundational to effective software development, especially in engineering and research domains where hypotheses must be tested and validated methodically. Uncontrolled changes can quickly lead to confusion, regressions, or duplication of effort. The "Run" skill addresses these challenges by:

  • Enforcing Discipline: Each iteration is limited to a single, purposeful change, reducing the risk of introducing multiple variables at once.
  • Leveraging History: By systematically reviewing experiment results, developers avoid repeating failed approaches and build upon what works.
  • Strategic Escalation: The skill guides users through increasingly sophisticated strategies as the number of iterations grows, ensuring that simple fixes are attempted before more radical modifications.
  • Seamless Version Control: Automatic branch checkouts and commit management ensure that every experiment is isolated and reproducible.

This approach is particularly valuable in contexts such as algorithm optimization, parameter tuning, and exploratory code refactoring, where clear experimental boundaries are essential.

How to Get Started

To begin using the "Run" skill, ensure the Autoresearch Agent and the relevant skill scripts are installed from the alirezarezvani/claude-skills repository.

Running an Experiment Iteration

Run an iteration for a specific experiment:

/ar:run engineering/api-speed

If you omit the experiment name, the skill will prompt you to select one:

/ar:run

This will internally execute:

python {skill_path}/scripts/setup_experiment.py --list

and present available experiments for selection.

Workflow Integration

Ensure your project follows the expected directory structure:

  • .autoresearch/{domain}/{name}/config.cfg — experiment configuration
  • .autoresearch/{domain}/{name}/program.md — strategy and constraints
  • .autoresearch/{domain}/{name}/results.tsv — experiment history

Each experiment is managed in its own Git branch: autoresearch/{domain}/{name}.
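
For illustration, here is a minimal sketch of reading such a configuration with Python's configparser, assuming config.cfg is INI-style; the section and key names (experiment, target_file, eval_command) are hypothetical and depend on how your experiments were set up:

import configparser

# Load the experiment configuration (hypothetical schema)
cfg = configparser.ConfigParser()
cfg.read(".autoresearch/engineering/api-speed/config.cfg")
target_file = cfg["experiment"]["target_file"]    # e.g. api_handler.py
eval_command = cfg["experiment"]["eval_command"]  # script that scores a run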

Key Features

1. Automated Context Loading

Upon invocation, "Run" automatically retrieves the experiment's configuration, prior results, and strategic documentation. It checks out the corresponding Git branch to ensure all changes are tracked in isolation.

cat .autoresearch/engineering/api-speed/config.cfg
cat .autoresearch/engineering/api-speed/program.md
cat .autoresearch/engineering/api-speed/results.tsv
git checkout autoresearch/engineering/api-speed

2. Results-Driven Decision Making

The skill examines the results.tsv file to inform the next change. It identifies successful patterns, avoids repeated failures, and escalates strategies based on iteration count.

Example: Interpreting History

Suppose results.tsv contains:

run_id   change_description    outcome
1        increased timeout    improved
2        added retry loop     no effect
3        switched parser      crash
4        tuned caching        improved

The next iteration might avoid parser changes (due to crash) and focus on further caching or timeout adjustments.
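
This kind of review can be scripted. The sketch below, which assumes the three-column layout shown above, tallies outcomes and flags changes that crashed so they are not retried:

import csv
from collections import Counter

# Read the experiment history (tab-separated, columns as shown above)
with open(".autoresearch/engineering/api-speed/results.tsv") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

outcomes = Counter(row["outcome"] for row in rows)
crashes = [row["change_description"] for row in rows if row["outcome"] == "crash"]

print(f"{len(rows)} runs so far: {dict(outcomes)}")
print("Avoid retrying:", crashes or "nothing yet")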

3. Strategy Escalation

As experiment runs accumulate, "Run" recommends shifting tactics:

  • Runs 1–5: Try obvious, low-risk improvements.
  • Runs 6–15: Systematically explore parameters.
  • Runs 16–30: Attempt more significant structural changes.
  • Runs 31+: Pursue radical or unconventional solutions.
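
The escalation schedule maps naturally to a small lookup. The following sketch mirrors the bands above; it is illustrative only, not the skill's internal logic:

def strategy_for(run_count: int) -> str:
    # Bands mirror the escalation schedule listed above
    if run_count <= 5:
        return "obvious, low-risk improvements"
    if run_count <= 15:
        return "systematic parameter exploration"
    if run_count <= 30:
        return "significant structural changes"
    return "radical or unconventional solutions"

print(strategy_for(7))   # -> systematic parameter exploration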

4. Targeted File Editing

Each iteration is restricted to editing only the target file specified by the experiment configuration. This focused approach prevents uncontrolled sprawl and makes outcomes attributable to a single change.

Example: Editing the Target File

If the experiment targets api_handler.py, you might adjust a timeout parameter:

# Before
timeout = 5

# After
timeout = 10

After editing, the skill commits the change, runs evaluation scripts, and records the result.

5. Keep or Discard Changes

Based on evaluation results, the skill either keeps the change (committing to history) or discards it, ensuring that only beneficial changes persist.
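
One plausible shape for that keep-or-discard step, sketched with plain Git commands via subprocess (an illustration, not the skill's actual implementation):

import subprocess

def keep_or_discard(improved: bool, target_file: str, message: str) -> None:
    if improved:
        # Keep: commit the single-file change on the experiment branch
        subprocess.run(["git", "add", target_file], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)
    else:
        # Discard: restore the target file to its last committed state
        subprocess.run(["git", "checkout", "--", target_file], check=True)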

Best Practices

  • Review Experiment History: Always study results.tsv to avoid duplicating failed experiments and to recognize successful strategies.
  • Make Atomic Changes: Limit each iteration to a single, well-defined modification for clear attribution.
  • Document Rationale: Update program.md with the reasoning behind each change to provide context for future runs.
  • Escalate Thoughtfully: Follow the recommended escalation path—do not jump to radical changes before exhausting simpler options.
  • Clean Up Regularly: Remove stale experiment branches and obsolete configurations to maintain a manageable workspace.

Important Notes

  • Single Change Policy: "Run" enforces a strict one-change-per-iteration rule. Attempting to modify multiple files or introduce sweeping changes will violate the skill’s constraints.
  • Branch Isolation: All changes occur in the experiment’s dedicated Git branch. Merge carefully to prevent conflicts with the main codebase.
  • Result Evaluation: The quality of experiment outcomes depends on the robustness of your evaluation scripts. Ensure these scripts reliably indicate improvement, regression, or neutrality; a minimal sketch of such a script follows this list.
  • Manual Intervention: While much is automated, human judgment remains essential in interpreting results and deciding subsequent strategies.
  • Skill Extensibility: The "Run" skill is designed to be extended or integrated into larger automation pipelines, supporting scalable and reproducible research workflows.
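
As a minimal sketch of the evaluation idea flagged above: time a workload, compare it against a stored baseline, and report one of the outcome labels used in results.tsv. The baseline and tolerance values here are assumptions, and the lambda stands in for whatever exercises your target code:

import time

def evaluate(run, baseline: float = 0.50, tolerance: float = 0.05) -> str:
    # run: zero-argument callable exercising the target code path
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    if elapsed < baseline - tolerance:
        return "improved"
    if elapsed > baseline + tolerance:
        return "regression"
    return "no effect"

# Stand-in workload; a crash would surface as an exception before any verdict
print(evaluate(lambda: sum(range(1_000_000))))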