Autoresearch Agent

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs an evaluation command, and keeps the change only if the metric improves.

What Is Autoresearch Agent?

Autoresearch Agent is an autonomous experiment loop skill for Claude, inspired by Andrej Karpathy’s autoresearch paradigm. It iteratively optimizes any file in a git repository against a user-defined, measurable metric. On each iteration the agent edits the target file, runs an evaluation command that outputs a numeric metric, and compares the result with the best value so far. Improvements are committed to version control; regressions are discarded. The loop can run indefinitely, searching for better solutions without manual intervention.
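
The keep-or-discard logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function run_loop and its parameters propose and evaluate are hypothetical stand-ins for the agent's edit and for the evaluation command, and the "commit" is simulated by keeping the best candidate in memory.

```python
import random
from typing import Callable, List, Tuple


def run_loop(
    initial: str,
    propose: Callable[[str], str],
    evaluate: Callable[[str], float],
    iterations: int,
    minimize: bool = True,
) -> Tuple[str, float, List[float]]:
    """Greedy loop: keep a candidate only if its metric beats the best so far."""
    best, best_score = initial, evaluate(initial)
    history = [best_score]
    for _ in range(iterations):
        candidate = propose(best)    # stands in for the agent's edit
        score = evaluate(candidate)  # stands in for the evaluation command
        improved = score < best_score if minimize else score > best_score
        if improved:
            best, best_score = candidate, score  # analogous to a git commit
        # otherwise the candidate is dropped, analogous to discarding the edit
        history.append(best_score)
    return best, best_score, history


if __name__ == "__main__":
    rng = random.Random(0)

    def shorten_or_grow(text: str) -> str:
        # Toy "edit": randomly remove or append one character.
        return text[:-1] if rng.random() < 0.5 and text else text + "x"

    # Toy metric: the length of the text, to be minimized.
    best, score, history = run_loop("x" * 40, shorten_or_grow, len, iterations=50)
    print(score, history[:5])
```

Because a candidate is kept only when it is strictly better, the recorded best score never gets worse from one iteration to the next, which is the property the commit-or-discard rule is meant to provide.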

This skill is especially suited to scenarios where measurable, incremental improvement is valuable, such as optimizing code performance, reducing bundle or image sizes, increasing test pass rates, refining prompts, or improving content quality. By automating the experimentation loop, Autoresearch Agent allows developers to offload repetitive optimization tasks and focus on higher-level problem-solving.

Why Use Autoresearch Agent?

Manual optimization cycles (tweak, test, evaluate, repeat) are time-consuming and prone to human error and fatigue. Autoresearch Agent addresses this by:

  • Automating repetitive experiments: The agent can run hundreds of iterations while you focus on other work or even while you sleep.
  • Consistent measurement: Every iteration uses the same evaluation metric and process, so results are directly comparable across runs.
  • Version control integration: Improvements are tracked with git commits, making it simple to review or revert changes.
  • Applicable to diverse domains: Whether the goal is to optimize code, assets, prompt wording, or generated content, as long as the improvement is measurable, the agent can help.

For example, suppose you want to minimize the size of a JavaScript bundle. Manually rewriting, building, and measuring size differences is tedious. With Autoresearch Agent, you define the target file and a command to output the bundle size. The agent then conducts a guided search, exploring modifications and retaining only those that decrease the bundle size.

How to Get Started

To use Autoresearch Agent, you need a git repository, the file you want to optimize, and a command-line evaluation that outputs a numeric metric. Here’s a step-by-step example for a project aiming to minimize a JavaScript bundle:

  1. Install and configure the skill in Claude. Follow the setup instructions in the Autoresearch Agent repository.
  2. Prepare your repository:
    git init
    npm install  # Ensure dependencies are installed
  3. Set up your evaluation command. For example, to report bundle size:
    npx webpack --config webpack.config.js
    du -b dist/bundle.js | cut -f1 > bundle_size.txt
    cat bundle_size.txt
    This prints the bundle size in bytes. Note that du -b is a GNU coreutils option; on macOS, wc -c < dist/bundle.js reports the same value.
  4. Activate the skill and initialize an experiment:
    /ar:setup
    Follow the interactive prompts to specify:
    • The target file (e.g., src/index.js)
    • The evaluation command (e.g., the commands above)
    • The optimization objective (e.g., minimize output value)
  5. Start the autonomous loop:
    /ar:loop
    The agent will begin making edits, evaluating them, and committing improvements.
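
If a shell pipeline is inconvenient, the size measurement from step 3 could also be done by a small script. The following is a sketch under assumptions: the helper name file_size_bytes is illustrative, and the path is whatever your build produces (dist/bundle.js in the example above), passed as the first argument.

```python
import os
import sys


def file_size_bytes(path: str) -> int:
    """Return the size of the file at `path` in bytes."""
    return os.path.getsize(path)


# Only act when a path is given on the command line,
# e.g. `python bundle_size.py dist/bundle.js` (hypothetical script name).
if __name__ == "__main__" and len(sys.argv) > 1:
    print(file_size_bytes(sys.argv[1]))
```

The script prints a single integer, which matches the requirement that the evaluation command output one numeric value.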

Key Features

  • Autonomous Experiment Loop: Runs indefinitely, making and evaluating changes without manual intervention.
  • Evaluation-Driven Optimization: Every iteration measures the impact of changes using a custom metric.
  • Git Integration: Commits improvements and resets regressions, ensuring your repository only contains beneficial changes.
  • Flexible Scheduling: Configure the loop to run at intervals: every 10 minutes, hourly, daily, weekly, or monthly.
  • Interactive Setup: The /ar:setup command walks you through experiment configuration.
  • Dashboard and Status Commands: The /ar:status command provides real-time feedback and experiment results.
  • Resume Capability: Pause and later resume experiments with /ar:resume.
  • Broad Applicability: Optimize for speed, size, accuracy, content quality, or any metric you can script.
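
To make the Git Integration feature concrete, here is one way the commit-or-discard step could map onto git commands. This is an assumption for illustration only: the source does not state which git commands the skill runs, and the helper names and commit message format below are hypothetical.

```python
import subprocess
from typing import List


def keep_commands(target: str, metric: float) -> List[List[str]]:
    """Git commands that record an improvement to `target`."""
    # Hypothetical message format; the skill's real format is not documented here.
    message = f"autoresearch: {target} metric={metric}"
    return [["git", "add", "--", target], ["git", "commit", "-m", message]]


def discard_commands(target: str) -> List[List[str]]:
    """Git command that restores `target` to its last committed state."""
    return [["git", "checkout", "--", target]]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, raising CalledProcessError on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

In this sketch, run_all(keep_commands("src/index.js", 48213)) would be called after an improvement and run_all(discard_commands("src/index.js")) after a regression. Building the argument lists separately from running them keeps the commands easy to inspect before anything touches the repository.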

Best Practices

  • Define a Clear Metric: The effectiveness of the agent depends on your evaluation command. Ensure it outputs a single, comparable number. For example, to report the number of passing tests:
    pytest -q --disable-warnings --tb=no | grep -oE '[0-9]+ passed' | awk '{print $1}'
  • Start with Isolated Files: Limit the agent to a single file at first. This reduces complexity and makes the optimization process more manageable.
  • Use Descriptive Commit Messages: The agent commits each improvement. Customizing commit messages can help you track the nature of each change.
  • Monitor Progress: Use /ar:status to check experiment results and ensure the process is moving in the desired direction.
  • Set Sensible Intervals: Depending on evaluation time, set the loop interval to balance resource usage and experimentation speed.
  • Back Up Your Work: Although git integration makes it easy to revert, keep regular backups of your repository.
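
For the test-based metric in the first practice above, the pass count can also be extracted in Python rather than with grep and awk. This is a sketch that assumes pytest's usual summary wording (for example "1 failed, 3 passed in 0.12s"); the function names are illustrative.

```python
import re
import subprocess


def passed_count(pytest_output: str) -> int:
    """Return the number of passed tests in pytest's summary, or 0 if none is reported."""
    matches = re.findall(r"(\d+) passed", pytest_output)
    # Use the last match so that the final summary line wins.
    return int(matches[-1]) if matches else 0


def run_and_count() -> int:
    """Run pytest quietly in the current directory and return the pass count."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return passed_count(result.stdout)
```

Returning 0 when no "passed" entry appears means a run where every test fails still yields a number, so the agent always has a value to compare.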

Important Notes

  • Requires a Measurable Metric: The agent cannot optimize subjective qualities. Your evaluation command must output a numeric value that clearly represents "better" or "worse."
  • Single File Focus: Currently, the agent is designed to optimize one file at a time. Multi-file or project-wide optimization may require future enhancements.
  • Resource Usage: Running experiments indefinitely can consume compute resources. Adjust the loop interval and monitor system load accordingly.
  • Version Control Required: The agent depends on git for tracking changes. Ensure your repository is properly initialized and clean before starting.
  • Human Oversight: While autonomous, the agent’s edits may sometimes be unconventional or undesirable. Regularly review committed changes to ensure code or content quality.
  • Experiment responsibly: Before deploying changes to production, thoroughly review and test the optimized output.
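
Since everything depends on the evaluation command producing a usable number, it can help to validate its output before comparing values. The sketch below assumes a convention that is not taken from the skill itself: the metric is the last non-empty line of the command's output. The name parse_metric is hypothetical.

```python
import math


def parse_metric(stdout: str) -> float:
    """Parse the last non-empty line of `stdout` as a finite float, or raise ValueError."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("evaluation command produced no output")
    value = float(lines[-1])  # raises ValueError for non-numeric text
    if not math.isfinite(value):
        raise ValueError(f"metric is not finite: {lines[-1]!r}")
    return value
```

Rejecting empty, non-numeric, and non-finite output keeps a failed build or a crashed test run from being mistaken for a measurement.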

With Autoresearch Agent, development teams can automate the search for better solutions, freeing time for other work and grounding every retained change in a measured result.