Run

One-shot lifecycle command that chains init → baseline → spawn → eval → merge in a single invocation

What Is Run?

Run is an AgentHub skill for Claude Code, invoked as /hub:run, that executes a complete agent lifecycle in one step. It chains five phases into a single workflow: initializing the environment, capturing a baseline, spawning multiple agent variants, evaluating their outputs, and merging the best result.

Instead of invoking each phase separately (which can be repetitive and error-prone), Run enables developers and machine learning practitioners to orchestrate the entire lifecycle with a single, well-structured command. This capability is particularly beneficial for teams focusing on rapid iteration, agent-based code optimization, or automated software improvement pipelines.
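
The chaining idea can be illustrated with a minimal Python sketch. It is not AgentHub's implementation: the function run_lifecycle and the handler signature are hypothetical, and only the phase names and their order come from the description above. Each phase runs in sequence and shares one context dictionary, and an exception in any phase stops the chain before later phases run.

```python
from typing import Callable, Dict, List

# Phase order taken from the Run description; everything else is illustrative.
PHASES = ["init", "baseline", "spawn", "eval", "merge"]


def run_lifecycle(handlers: Dict[str, Callable[[dict], None]]) -> List[str]:
    """Run each phase in order, sharing one context dict between them.

    An exception raised by a handler propagates, so later phases
    (for example merge) never run after an earlier phase fails.
    """
    context: dict = {}
    completed: List[str] = []
    for name in PHASES:
        handlers[name](context)
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder handlers that only record that they ran.
    def make_handler(name: str) -> Callable[[dict], None]:
        def handler(context: dict) -> None:
            context.setdefault("log", []).append(name)
        return handler

    print(run_lifecycle({name: make_handler(name) for name in PHASES}))
```

Stopping on the first failure is a design choice of this sketch: it guarantees that nothing is merged unless every preceding phase completed.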

Why Use Run?

The Run skill addresses several common pain points in agent-based development and experimentation workflows:

  • Efficiency: Chaining all lifecycle steps (init, baseline, spawn, eval, and merge) saves time and reduces manual intervention.
  • Consistency: Automating the lifecycle ensures that every run executes the same steps in the correct order, which makes runs easier to reproduce.
  • Parallelism: With built-in support for spawning multiple agents, Run facilitates parallel experimentation, comparison, and selection of the best outcome.
  • Flexibility: The command supports a range of use cases, from code optimization and refactoring to test writing and creative generation, all customizable via parameters.
  • Scalability: Whether you are running a single agent or orchestrating dozens, Run abstracts the complexity into a simple, parameterized interface.

In short, Run is designed for those who want to streamline agent-based development workflows, automate evaluation, and select the best result with minimal manual overhead.

How to Get Started

Getting started with the Run skill is straightforward. After installing and configuring AgentHub and its skills, you can invoke /hub:run directly from your preferred terminal or chat interface.

Example 1: Code Optimization

/hub:run --task "Reduce p50 latency" --agents 3 \
  --eval "pytest bench.py --json" --metric p50_ms --direction lower \
  --template optimizer

This command will:

  • Describe the task to all agents ("Reduce p50 latency"),
  • Spawn three agent variants using the optimizer template,
  • Evaluate each agent's solution using a benchmark test,
  • Extract the p50_ms metric,
  • Treat lower values as better,
  • Automatically merge the best-performing agent’s changes.
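
The selection step can be sketched in a few lines of Python. This is an illustration under assumptions, not the skill's code: pick_best, the agent names, and the metric values are hypothetical, and only the lower/higher direction semantics come from the example above.

```python
from typing import Dict


def pick_best(results: Dict[str, float], direction: str) -> str:
    """Return the agent whose metric is best for the given direction."""
    if direction not in ("lower", "higher"):
        raise ValueError(f"direction must be 'lower' or 'higher', got {direction!r}")
    if not results:
        raise ValueError("no agent results to compare")
    # "lower" means the smallest value wins; "higher" means the largest wins.
    chooser = min if direction == "lower" else max
    return chooser(results, key=results.__getitem__)


if __name__ == "__main__":
    # Made-up p50_ms values for three hypothetical agents.
    p50_ms = {"agent-1": 12.5, "agent-2": 9.8, "agent-3": 11.0}
    print(pick_best(p50_ms, "lower"))
```

With these made-up values and the direction set to lower, the sketch selects agent-2; switching the direction to higher would select agent-1.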

Example 2: Automated Refactoring

/hub:run --task "Refactor auth module" --agents 2 --template refactorer

Here, two refactorer agents independently refactor the authentication module. No --eval command is specified, so no metric is extracted and only the refactorer template's own logic applies.

Example 3: Test Coverage Improvement

/hub:run --task "Cover untested utils" --agents 3 \
  --eval "pytest --cov=utils --cov-report=json" --metric coverage_pct --direction higher \
  --template test-writer

Three agents attempt to maximize code coverage in the utils module, and the agent producing the highest coverage percentage is selected.

Example 4: Creative Generation

/hub:run --task "Write 3 email subject lines for spring sale campaign" --agents 3 --judge

For tasks requiring subjective evaluation, such as creative writing, you can use the --judge flag to invoke LLM-based judging.

Key Features

  • One-Shot Lifecycle: Executes initialization, baseline capture, agent spawning, evaluation, and merging in a single invocation.
  • Flexible Agent Templates: Supports various agent roles, including optimizer, refactorer, test-writer, and bug-fixer.
  • Evaluation Integration: Accepts any shell command for automated evaluation, with metrics extracted from command output.
  • Metric Directionality: Specify whether higher or lower metric values are preferred via the --direction flag.
  • Parallel Agent Support: Easily set the number of agents to run in parallel for comparative experimentation.
  • Judge Mode: For tasks without quantitative evaluation, enable LLM-based judging with the --judge flag.
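
How a metric might be pulled from an evaluation command's output can be sketched in Python. The skill's actual extraction logic is not described here, so this sketch assumes the command prints a single JSON object to stdout with the metric as a top-level key; extract_metric and run_eval are hypothetical helper names.

```python
import json
import subprocess


def extract_metric(output: str, metric: str) -> float:
    """Parse JSON text and return the named top-level metric as a float."""
    data = json.loads(output)
    if not isinstance(data, dict):
        raise ValueError("evaluation output must be a JSON object")
    if metric not in data:
        raise KeyError(f"metric {metric!r} not found in evaluation output")
    return float(data[metric])


def run_eval(command: str, metric: str) -> float:
    """Run a shell command and extract the metric from its stdout.

    check=True raises CalledProcessError if the command exits non-zero,
    so a failed evaluation is never mistaken for a valid score.
    """
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return extract_metric(completed.stdout, metric)


if __name__ == "__main__":
    # Hypothetical benchmark output containing two latency metrics.
    print(extract_metric('{"p50_ms": 12.5, "p99_ms": 40.1}', "p50_ms"))
```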

Best Practices

  • Define Clear Tasks: Always provide a concise, unambiguous task description with the --task parameter. This helps agents understand the objective.
  • Choose Appropriate Templates: Select the agent template (--template) that best matches your problem (e.g., use optimizer for performance tuning).
  • Set Meaningful Evaluation: When using --eval, ensure the command outputs the metric you intend to optimize, and specify both --metric and --direction.
  • Leverage Parallelism Thoughtfully: More agents can increase the chance of finding an optimal solution, but also consume more resources.
  • Monitor Output Carefully: Review merged results, especially after creative or refactoring tasks, before deploying to production.

Important Notes

  • Parameter Requirements: --task is always required. If you provide --eval, you must also supply both --metric and --direction. The --agents parameter defaults to 3 if omitted.
  • Evaluation Output: The evaluation command must output the metric in a machine-readable format so the skill can extract and compare results.
  • Judge Mode vs. Eval Mode: Use --judge for subjective tasks. For tasks with a measurable outcome, provide an evaluation command and metric so that agents are compared objectively.
  • Template Extensibility: You can create custom templates to extend agent behaviors, though built-in templates cover most common scenarios.
  • Manual Review: While Run automates merging, always review critical or production code changes post-merge for safety.
  • Skill Source and Updates: The latest version and documentation are available in the AgentHub Run skill repository on GitHub.
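
The rules under Parameter Requirements can be expressed as a small validation sketch. It uses Python's standard argparse module to mirror the documented flags; it is not how /hub:run parses its arguments, and parse_run_args is a hypothetical name.

```python
import argparse
from typing import List


def parse_run_args(argv: List[str]) -> argparse.Namespace:
    """Parse Run-style flags and enforce the documented parameter rules."""
    parser = argparse.ArgumentParser(prog="hub-run-sketch")
    parser.add_argument("--task", required=True)          # always required
    parser.add_argument("--agents", type=int, default=3)  # defaults to 3
    parser.add_argument("--eval", dest="eval_cmd")
    parser.add_argument("--metric")
    parser.add_argument("--direction", choices=["lower", "higher"])
    parser.add_argument("--template")
    parser.add_argument("--judge", action="store_true")
    args = parser.parse_args(argv)

    # --eval is only usable together with both --metric and --direction.
    if args.eval_cmd and not (args.metric and args.direction):
        parser.error("--eval requires both --metric and --direction")
    return args


if __name__ == "__main__":
    args = parse_run_args([
        "--task", "Reduce p50 latency",
        "--eval", "pytest bench.py --json",
        "--metric", "p50_ms",
        "--direction", "lower",
    ])
    print(args.agents, args.metric, args.direction)
```

Checking the combination up front means a misconfigured run fails before any agent is spawned, rather than after the evaluation phase finds no metric to compare.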

By integrating the Run skill into your workflow, you can automate, accelerate, and standardize agent-driven development processes with minimal configuration and maximum flexibility.