Setup

Set up a new autoresearch experiment interactively or from a single command. Collects the domain, experiment name, target file, eval command, metric, optimization direction, and an optional evaluator.

What Is Setup?

The Setup skill is a foundational component of the Claude Autoresearch Agent suite that standardizes how new autoresearch experiments are configured. It collects every parameter an experiment needs (problem domain, experiment name, target file, evaluation command, metric, optimization direction, and an optional evaluator), either from a single command or through an interactive, prompt-driven session. Because every experiment is defined the same way, configurations stay consistent, reproducible, and compatible with downstream automation tools.

The Setup skill can be invoked with explicit arguments for non-interactive batch operations or without arguments to launch an interactive session. It is particularly useful for teams or individuals seeking to automate the process of experimentation and evaluation in research-driven development projects.

Why Use Setup?

Configuring experiments manually is error-prone and time-consuming, especially when working across multiple domains or with complex evaluation routines. Inconsistent experiment setup can hamper reproducibility, complicate collaboration, and lead to wasted computational resources. The Setup skill addresses these challenges by:

  • Enforcing a standardized process for experiment configuration.
  • Reducing errors through input validation and guided prompts.
  • Enabling seamless integration with other components of the autoresearch agent.
  • Supporting both interactive and automated workflows to suit different use cases.

For research teams running rapid experiments, such as optimizing code performance, testing new algorithms, or benchmarking models, the Setup skill gives every experiment the same well-defined starting point, which makes experiment lifecycles easier to scale and manage.

How to Get Started

To use the Setup skill, you first need to have the Claude autoresearch agent and its dependencies installed. The Setup skill is accessed via the /ar:setup command. There are two primary modes of operation: command-line argument mode and interactive mode.

1. Command-Line Argument Mode

You can pass all required parameters in a single command for a non-interactive setup:

/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower

This command will:

  • Set the domain to engineering
  • Name the experiment api-speed
  • Target the file src/api.py
  • Use pytest bench.py as the evaluation command
  • Track the metric p50_ms
  • Optimize for lower values (a lower p50_ms is better)
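A minimal Python sketch of how the six positional arguments (plus the optional evaluator) could map onto named fields. The ExperimentConfig dataclass, its field names, and parse_setup_args are hypothetical illustrations, not the skill's actual internals. shlex.split is used because it keeps a quoted eval command such as "pytest bench.py" together as a single argument.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


# Hypothetical record of the values /ar:setup collects; field names are illustrative.
@dataclass
class ExperimentConfig:
    domain: str
    name: str
    target: str
    eval_cmd: str
    metric: str
    direction: str
    evaluator: Optional[str] = None


# Positional order as described for command-line mode.
FIELD_ORDER = ["domain", "name", "target", "eval_cmd", "metric", "direction"]


def parse_setup_args(arg_string: str) -> ExperimentConfig:
    """Map positional /ar:setup arguments to named fields (hypothetical sketch)."""
    # shlex honours shell-style quoting, so "pytest bench.py" stays one token.
    tokens = shlex.split(arg_string)
    if len(tokens) not in (6, 7):
        raise ValueError(f"expected 6 or 7 arguments, got {len(tokens)}")
    # zip stops after the six required fields; a seventh token is the evaluator.
    values = dict(zip(FIELD_ORDER, tokens))
    if values["direction"] not in ("lower", "higher"):
        raise ValueError("direction must be 'lower' or 'higher'")
    evaluator = tokens[6] if len(tokens) == 7 else None
    return ExperimentConfig(**values, evaluator=evaluator)


cfg = parse_setup_args('engineering api-speed src/api.py "pytest bench.py" p50_ms lower')
print(cfg)
```

Because the fields are positional, swapping two arguments (for example, metric and direction) is only caught when a value fails a check such as the direction test, which is one reason to keep the documented order exactly.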

2. Interactive Mode

If you run /ar:setup without arguments, the skill will prompt you for each required parameter:

/ar:setup

Example interactive session:

What domain? (engineering, marketing, content, prompts, custom)
> engineering

Experiment name? (e.g., api-speed, blog-titles)
> api-speed

Which file to optimize?
> src/api.py

How to measure it? (e.g., pytest bench.py, python evaluate.py)
> pytest bench.py

What metric does the eval output? (e.g., p50_ms, ctr_score)
> p50_ms

Is lower or higher better?
> lower
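The session above amounts to a prompt-and-validate loop. The following sketch assumes the checks shown (a domain from the listed choices, a non-empty name without spaces, an existing target file, and lower or higher for the direction); the real skill's validation rules may differ, and interactive_setup and ask_until_valid are hypothetical names. Passing the prompt function in as a parameter lets the loop be driven by scripted answers instead of input().

```python
import os
import tempfile
from typing import Callable, Dict

# Domain choices taken from the prompt shown in the example session.
DOMAINS = ("engineering", "marketing", "content", "prompts", "custom")


def ask_until_valid(ask: Callable[[str], str], prompt: str,
                    is_valid: Callable[[str], bool]) -> str:
    """Re-prompt until the answer passes the given check."""
    while True:
        answer = ask(prompt).strip()
        if is_valid(answer):
            return answer
        print(f"Invalid value: {answer!r}")


def interactive_setup(ask: Callable[[str], str] = input) -> Dict[str, str]:
    """Collect the six answers from the session above (hypothetical checks)."""
    # Dict displays are evaluated left to right, so prompts appear in this order.
    return {
        "domain": ask_until_valid(ask, "What domain? ", lambda a: a in DOMAINS),
        "name": ask_until_valid(ask, "Experiment name? ",
                                lambda a: bool(a) and " " not in a),
        "target": ask_until_valid(ask, "Which file to optimize? ", os.path.isfile),
        "eval_cmd": ask_until_valid(ask, "How to measure it? ", bool),
        "metric": ask_until_valid(ask, "What metric does the eval output? ", bool),
        "direction": ask_until_valid(ask, "Is lower or higher better? ",
                                     lambda a: a in ("lower", "higher")),
    }


# Scripted run: a temporary file stands in for src/api.py, and the invalid
# direction "sideways" triggers one re-prompt before "lower" is accepted.
with tempfile.NamedTemporaryFile(suffix=".py") as tmp:
    scripted = iter(["engineering", "api-speed", tmp.name,
                     "pytest bench.py", "p50_ms", "sideways", "lower"])
    config = interactive_setup(lambda prompt: next(scripted))
print(config)
```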

Additional Commands

  • List existing experiments:
    /ar:setup --list
  • Show available evaluators:
    /ar:setup --list-evaluators

Key Features

  • Interactive and Non-Interactive Setup: Flexible input modes accommodate both hands-on and automated workflows.
  • Comprehensive Parameter Collection: Collects domain, experiment name, target file, evaluation command, metric, optimization direction, and optional evaluator.
  • Input Validation: Verifies the existence of the specified target file and provides suggested values for critical fields.
  • Extensible Evaluators: Supports built-in and custom evaluators, facilitating the use of different evaluation strategies.
  • Listing Utilities: Easily query existing experiments or available evaluators for better experiment management.
  • Script Integration: Behind the scenes, Setup passes the gathered parameters to a Python script (setup_experiment.py), which can also be called directly for automation.

Example underlying script invocation:

python skills/setup/scripts/setup_experiment.py \
  --domain engineering --name api-speed \
  --target src/api.py --eval "pytest bench.py" \
  --metric p50_ms --direction lower
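When calling the script from your own tooling, building the argument list in code avoids shell-quoting mistakes. This sketch uses only the flags shown in the example invocation; it passes no evaluator because the example does not show a flag for one. build_setup_argv and run_setup are hypothetical helpers, and sys.executable is used in place of a bare python so the script runs under the current interpreter.

```python
import shlex
import subprocess
import sys
from typing import List

# Relative path from the example; run from the directory that contains skills/.
SETUP_SCRIPT = "skills/setup/scripts/setup_experiment.py"


def build_setup_argv(domain: str, name: str, target: str,
                     eval_cmd: str, metric: str, direction: str) -> List[str]:
    """Assemble the documented flags as a list, one argv item per value."""
    return [
        sys.executable, SETUP_SCRIPT,
        "--domain", domain,
        "--name", name,
        "--target", target,
        "--eval", eval_cmd,  # a single list item, so no shell quoting is needed
        "--metric", metric,
        "--direction", direction,
    ]


def run_setup(argv: List[str]) -> int:
    """Run the script; raises CalledProcessError if it exits non-zero."""
    return subprocess.run(argv, check=True).returncode


argv = build_setup_argv("engineering", "api-speed", "src/api.py",
                        "pytest bench.py", "p50_ms", "lower")
print(shlex.join(argv))  # shell-quoted preview; call run_setup(argv) to execute
```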

Best Practices

  • Choose Descriptive Names: Use clear, descriptive experiment names to facilitate tracking and collaboration.
  • Verify Target Files: Always confirm that the specified target file exists and is the intended subject of optimization or evaluation.
  • Standardize Metrics: Where possible, use widely accepted, unambiguous metrics to ease interpretation and comparison across experiments.
  • Document Evaluation Commands: Record exactly how each evaluation command is run, and make sure it is robust, reproducible, and prints the tracked metric in a consistent, machine-readable format.
  • Leverage Evaluators: Use the --list-evaluators command to explore and select evaluators that match your experiment’s requirements, or implement custom ones as needed.
  • Review Experiment List Regularly: Use the --list command to keep track of active and historical experiments, helping to prevent duplication and maintain organization.
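To illustrate the metric and evaluation-command practices above, the sketch below times a function, prints its median latency as a single name=value line, reads that value back, and applies the lower/higher direction. The name=value output format and the helpers measure_p50_ms, format_metric, parse_metric, and is_improvement are assumptions for illustration, not the autoresearch agent's actual protocol; check which output format your setup expects before adopting it.

```python
import re
import statistics
import time
from typing import Callable, Optional


def measure_p50_ms(fn: Callable[[], object], runs: int = 50) -> float:
    """Time fn() repeatedly and return the median (p50) latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def format_metric(name: str, value: float) -> str:
    """One metric per line as name=value (an assumed, easy-to-parse format)."""
    return f"{name}={value:.3f}"


def parse_metric(output: str, name: str) -> Optional[float]:
    """Return the last name=value reading for the metric, or None if absent."""
    pattern = re.compile(rf"^{re.escape(name)}=([-+]?\d+(?:\.\d+)?)$", re.MULTILINE)
    matches = pattern.findall(output)
    return float(matches[-1]) if matches else None


def is_improvement(new: float, best: float, direction: str) -> bool:
    """Interpret the experiment's direction: 'lower' or 'higher' is better."""
    if direction == "lower":
        return new < best
    if direction == "higher":
        return new > best
    raise ValueError(f"unknown direction: {direction!r}")


line = format_metric("p50_ms", measure_p50_ms(lambda: sum(range(10_000))))
print(line)
```

Printing exactly one line per metric, with a fixed name and number format, keeps the output unambiguous even when the eval command also emits logs or test-runner noise.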

Important Notes

  • Skill Path: The setup script (setup_experiment.py) resides in the skills/setup/scripts directory. The example invocation uses a relative path, so if you run the script manually, run it from the directory that contains skills/ or adjust the path accordingly.
  • Parameter Order: In command-line mode, parameters must be provided in this exact order: domain, name, target, eval command, metric, direction, and optionally evaluator. Quote any value that contains spaces, such as the eval command "pytest bench.py".
  • Evaluator Limitations: If supplying a custom evaluator, verify compatibility with your experiment's evaluation output format.
  • Error Handling: The interactive mode performs only basic validation (such as checking that the target file exists). Depending on your project's complexity, you may need to add further validation and error reporting.
  • Updates: Periodically check the official repository for updates, bug fixes, and new features to ensure compatibility and take advantage of improvements.
  • Integration: The Setup skill is intended to be used as part of a larger autoresearch workflow. Ensure downstream tools and scripts are compatible with the experiment configurations produced by Setup.

By following these guidelines and leveraging the Setup skill’s capabilities, you can accelerate experimentation, improve reproducibility, and maintain a more organized research workflow.