Skill Improver

Analyze, test, and refine existing Claude skills for more consistent, efficient, and user-friendly behavior

Skill Improver is a community skill for enhancing and optimizing existing Claude skills, covering prompt refinement, output quality improvement, edge case handling, performance tuning, and user experience optimization.

What Is This?

Overview

Skill Improver provides methods for analyzing and enhancing the quality of existing Claude skills. It covers five areas: prompt refinement, which improves instruction clarity and specificity for more consistent outputs; output quality improvement, which adjusts the formatting, detail level, and accuracy of generated results; edge case handling, which identifies and addresses unusual inputs that cause unexpected behavior; performance tuning, which optimizes response time and token efficiency; and user experience optimization, which aligns skill behavior with user expectations. Together these help developers iterate on existing skills rather than rewriting them from scratch.

Who Should Use This

This skill serves skill authors improving published skills, teams maintaining skill libraries for their organization, and developers debugging skills that produce inconsistent results.

Why Use It?

Problems It Solves

Skills written quickly may produce inconsistent output quality across different inputs. Edge cases cause unexpected behavior that frustrates users. Prompts that are too vague lead to unpredictable results. Skills that use excessive tokens for simple tasks waste compute resources.

Core Highlights

Prompt refiner improves instruction clarity for consistent outputs. Quality optimizer adjusts formatting and detail levels. Edge case handler identifies inputs that cause unexpected behavior. Efficiency tuner reduces token usage without sacrificing quality.

How to Use It?

Basic Usage

from dataclasses import dataclass, field

@dataclass
class TestCase:
    input_text: str
    expected_format: str
    tags: list[str] = field(default_factory=list)

@dataclass
class TestResult:
    test: TestCase
    output: str
    passed: bool
    issues: list[str]

class SkillTester:
    def __init__(self):
        self.cases: list[TestCase] = []
        self.results: list[TestResult] = []

    def add_case(self, input_text: str, expected: str,
                 tags: list[str] | None = None):
        self.cases.append(TestCase(input_text, expected, tags or []))

    def evaluate(self, output: str, case: TestCase) -> TestResult:
        issues = []
        if case.expected_format == 'json':
            if not output.strip().startswith('{'):
                issues.append('Not JSON format')
        passed = len(issues) == 0
        result = TestResult(case, output, passed, issues)
        self.results.append(result)
        return result

    def summary(self) -> dict:
        total = len(self.results)
        passed = sum(1 for r in self.results if r.passed)
        return {'total': total, 'passed': passed,
                'rate': passed / max(total, 1)}

tester = SkillTester()
tester.add_case('Summarize this', 'text', ['basic'])
tester.add_case('Extract data', 'json', ['structured'])
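The tester above only registers cases; scoring actual skill outputs requires a check per expected format. A minimal standalone sketch follows. `check_format` and the sample outputs are illustrative placeholders, not part of Skill Improver, and the JSON check here uses full parsing rather than the prefix check above.

```python
import json

def check_format(output: str, expected_format: str) -> list[str]:
    """Return a list of issues; an empty list means the output passed."""
    issues = []
    if expected_format == 'json':
        try:
            json.loads(output)  # stricter than a '{' prefix check
        except ValueError:
            issues.append('Not valid JSON')
    elif expected_format == 'text' and not output.strip():
        issues.append('Empty output')
    return issues

# Placeholder outputs standing in for real skill responses.
results = {
    'basic': check_format('A short summary.', 'text'),
    'structured': check_format('{"name": "Ada"}', 'json'),
    'broken': check_format('name: Ada', 'json'),
}
for tag, issues in results.items():
    print(tag, 'PASS' if not issues else issues)
```

Full parsing catches outputs that start with `{` but are still malformed, which the prefix check would incorrectly pass.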

Real-World Examples

from dataclasses import dataclass

@dataclass
class PromptVersion:
    version: int
    text: str
    score: float = 0.0

class PromptOptimizer:
    def __init__(self):
        self.versions: list[PromptVersion] = []

    def add_version(self, text: str, score: float):
        v = len(self.versions) + 1
        self.versions.append(PromptVersion(v, text, score))

    def best(self) -> PromptVersion:
        return max(self.versions, key=lambda v: v.score)

    def improvements(self) -> list[dict]:
        changes = []
        for prev, curr in zip(self.versions, self.versions[1:]):
            changes.append({
                'from': prev.version,
                'to': curr.version,
                'delta': curr.score - prev.score,
            })
        return changes

opt = PromptOptimizer()
opt.add_version('Summarize the text', 0.65)
opt.add_version('Summarize in 2 sentences max', 0.82)
opt.add_version('Summarize in 2 sentences. Include key findings only.', 0.91)
best = opt.best()
print(f'Best: v{best.version} ({best.score})')
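The per-version deltas reported by improvements() can reveal which edit contributed the most. As a standalone sketch of that idea (the version numbers and scores are the example values above, not real measurements):

```python
# (version, score) history, mirroring the PromptOptimizer example values.
history = [(1, 0.65), (2, 0.82), (3, 0.91)]

# Pair each version with its predecessor and compute the score delta.
deltas = [
    {'from': prev_v, 'to': curr_v, 'delta': round(curr_s - prev_s, 2)}
    for (prev_v, prev_s), (curr_v, curr_s) in zip(history, history[1:])
]
biggest = max(deltas, key=lambda d: d['delta'])
print(f"Largest gain: v{biggest['from']} -> v{biggest['to']} "
      f"(+{biggest['delta']})")
```

Here the v1 to v2 change (adding a length constraint) yields the larger gain, suggesting constraints on output length were the higher-leverage edit.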

Advanced Tips

Create a test suite covering normal inputs, edge cases, and adversarial inputs to evaluate robustness. Track prompt versions with scores to identify which changes had the most positive impact. Use A/B testing with real users to validate that offline improvements translate to better experience.
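The A/B comparison can be sketched as a pass-rate comparison between two prompt variants, assuming pass/fail outcomes have already been collected for each (the outcome lists below are placeholders, not real data):

```python
# Hypothetical outcomes: one boolean per test input, same inputs for both.
variant_a = [True, True, False, True, False, True]  # current prompt
variant_b = [True, True, True, True, False, True]   # candidate prompt

def pass_rate(results: list[bool]) -> float:
    """Fraction of test inputs that passed."""
    return sum(results) / len(results)

rate_a, rate_b = pass_rate(variant_a), pass_rate(variant_b)
print(f'A: {rate_a:.2f}, B: {rate_b:.2f}, lift: {rate_b - rate_a:+.2f}')
```

With small sample sizes like this, the lift is suggestive at best; collecting enough real user interactions per variant is what makes the comparison trustworthy.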

When to Use It?

Use Cases

Refine a skill prompt that produces inconsistent output formatting across different inputs. Identify edge cases where a skill fails and add handling for those scenarios. Optimize a skill to use fewer tokens while maintaining output quality.

Related Topics

Prompt engineering, skill development, quality assurance, A/B testing, prompt optimization, and user experience.

Important Notes

Requirements

Existing skill to analyze and improve with access to its prompt configuration. Test cases representing typical and edge case inputs for evaluation. Metrics for measuring output quality such as format compliance and content accuracy.
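Format compliance is shown in the Basic Usage example; content accuracy has no single metric, but a crude keyword-coverage proxy could be sketched like this (the keywords and sample output are illustrative assumptions, not a prescribed metric):

```python
def keyword_coverage(output: str, keywords: list[str]) -> float:
    """Fraction of expected keywords present in the output (case-insensitive)."""
    text = output.lower()
    hits = sum(1 for k in keywords if k.lower() in text)
    return hits / max(len(keywords), 1)

score = keyword_coverage(
    'Revenue grew 12% while costs stayed flat.',
    ['revenue', 'costs', 'margin'])
print(f'coverage: {score:.2f}')
```

Substring matching is deliberately simple; it misses paraphrases and can match unintended words, which is one reason automated metrics should be paired with human review.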

Usage Recommendations

Do: test skill changes against a comprehensive set of inputs before deploying updates. Make incremental changes and measure impact rather than rewriting entire prompts at once. Document what each prompt version changed and why for future reference.

Don't: optimize for a single test case, since changes may degrade performance on other inputs. Don't remove constraints from prompts just to reduce tokens if doing so causes quality regression. Don't ship updates without regression testing against the full test suite.

Limitations

Prompt optimization is empirical, and results may vary across model updates or model versions. Automated quality metrics may not capture every aspect of output quality that users care about. Edge case coverage depends on anticipating unusual inputs, which is inherently limited.