Runbook Generator
Generate structured runbooks to streamline operational procedures
Runbook Generator is a community skill for creating operational runbooks, covering incident response procedures, troubleshooting guides, escalation workflows, automated diagnostics, and documentation templates for operations teams.
What Is This?
Overview
Runbook Generator provides tools for creating structured operational runbooks that guide teams through incident response and maintenance tasks. It covers five areas: incident response procedures that document step-by-step actions for common failure scenarios, with diagnostic commands and remediation steps; troubleshooting guides that build decision trees for diagnosing root causes from observed symptoms; escalation workflows that define when and how to escalate issues to specialized teams; automated diagnostics that embed executable commands for gathering system state during incidents; and documentation templates that standardize runbook format across teams. Together, these help operations teams respond to incidents consistently.
Who Should Use This
This skill serves site reliability engineers creating incident response documentation, operations teams standardizing troubleshooting procedures, and engineering managers building on-call reference materials.
Why Use It?
Problems It Solves
On-call engineers encounter unfamiliar systems during incidents and lack documented steps for diagnosis and remediation. Tribal knowledge about system behavior lives in individual memories rather than accessible documentation. Incident response quality varies depending on which engineer is on call. Escalation decisions are delayed when criteria and contact paths are not clearly defined.
Core Highlights
Procedure builder creates step-by-step incident response guides. Decision tree generator builds symptom-based diagnostic flows. Escalation mapper defines criteria and paths for issue handoff. Command embedder adds executable diagnostics to runbook steps.
How to Use It?
Basic Usage
from dataclasses import dataclass, field


@dataclass
class Step:
    """One runbook step: an action, an optional command, and what to expect."""
    action: str
    command: str = ''
    expected: str = ''
    on_failure: str = ''


@dataclass
class Runbook:
    title: str
    service: str
    severity: str
    steps: list = field(default_factory=list)
    escalation: str = ''

    def add_step(self, step: Step):
        self.steps.append(step)

    def to_markdown(self):
        # Header block with the service name and incident severity.
        md = (f'# {self.title}\n\n'
              f'**Service:** {self.service}\n'
              f'**Severity:** {self.severity}\n\n')
        for i, s in enumerate(self.steps, 1):
            md += f'## Step {i}: {s.action}\n\n'
            if s.command:
                md += f'```bash\n{s.command}\n```\n\n'
            if s.expected:
                md += f'Expected: {s.expected}\n\n'
            # Optional fields are rendered only when they are set.
            if s.on_failure:
                md += f'On failure: {s.on_failure}\n\n'
        if self.escalation:
            md += f'## Escalation\n\n{self.escalation}\n'
        return md


rb = Runbook('DB Connection Fix', 'api-server', 'P1')
rb.add_step(Step(
    'Check DB status',
    'pg_isready -h db01',
    'accepting connections'))
# The query is wrapped in psql so the step is a runnable shell command.
rb.add_step(Step(
    'Check connections',
    'psql -h db01 -c "SELECT count(*) FROM pg_stat_activity;"',
    'Under max_connections'))
print(rb.to_markdown())
Real-World Examples
from dataclasses import dataclass


@dataclass
class DiagNode:
    """A yes/no question in a diagnostic decision tree."""
    question: str
    command: str = ''
    yes_action: str = ''
    no_action: str = ''
    yes_next: str = ''  # id of the node to visit after a "yes" answer
    no_next: str = ''   # id of the node to visit after a "no" answer


class DiagTree:
    def __init__(self, name: str):
        self.name = name
        self.nodes = {}

    def add_node(self, node_id: str, node: DiagNode):
        self.nodes[node_id] = node

    def render(self) -> str:
        # One section per node, in insertion order.
        lines = [f'# {self.name}', '']
        for nid, node in self.nodes.items():
            lines.append(f'## {nid}: {node.question}')
            if node.command:
                lines.append(f'Run: `{node.command}`')
            lines.append(f'- Yes: {node.yes_action}')
            lines.append(f'- No: {node.no_action}')
            lines.append('')
        return '\n'.join(lines)


tree = DiagTree('High Latency')
tree.add_node('A', DiagNode(
    'Is CPU above 80%?', 'top -bn1',
    'Scale horizontally', 'Check step B'))
tree.add_node('B', DiagNode(
    'Is memory full?', 'free -h',
    'Restart service', 'Check network'))
print(tree.render())
Advanced Tips
Include specific diagnostic commands in each step so engineers can copy and run them directly during incidents. Add expected output descriptions so responders can verify each step succeeded. Link related runbooks to create a network of procedures that cover complex multi-service incidents.
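The expected-output tip can also be checked mechanically. The following is a minimal sketch, not part of the skill itself: `run_step` is a hypothetical helper that runs a step's command with Python's standard `subprocess` module and reports whether the expected text appeared in the output. Matching on a plain substring is an assumption made for brevity; real checks may need something stricter.

```python
import shlex
import subprocess


def run_step(command: str, expected: str, timeout: float = 10.0) -> dict:
    """Run one diagnostic command and check its output for `expected`.

    Hypothetical helper: `expected` is matched as a plain substring.
    """
    try:
        result = subprocess.run(
            shlex.split(command),  # no shell, so pipes and redirects are not supported
            capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary or a hung command: report failure instead of raising.
        return {'command': command, 'ok': False, 'output': str(exc)}
    output = (result.stdout + result.stderr).strip()
    ok = result.returncode == 0 and expected in output
    return {'command': command, 'ok': ok, 'output': output}
```

A call such as `run_step('pg_isready -h db01', 'accepting connections')` would report `ok` as true only when the command exits with status 0 and the expected text appears in its output.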
When to Use It?
Use Cases
Create an incident response runbook for database connection failures with diagnostic commands. Build a decision tree for diagnosing high latency issues across application tiers. Generate escalation procedures with clear criteria and contact paths.
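For the escalation use case, one possible shape is a list of time-based rules. This is a sketch under stated assumptions: `EscalationRule`, `contacts_due`, and `render_escalation` are hypothetical names, the contacts are placeholders, and "minutes without resolution" is only one of many criteria a team might choose.

```python
from dataclasses import dataclass


@dataclass
class EscalationRule:
    after_minutes: int  # minutes without resolution before this rule applies
    contact: str        # who to page (placeholder names below)
    reason: str         # why this tier is brought in


def contacts_due(rules: list, minutes_elapsed: int) -> list:
    """Return contacts whose threshold has passed, earliest tier first."""
    ordered = sorted(rules, key=lambda r: r.after_minutes)
    return [r.contact for r in ordered if minutes_elapsed >= r.after_minutes]


def render_escalation(rules: list) -> str:
    """Render the escalation path as plain-text lines for a runbook."""
    ordered = sorted(rules, key=lambda r: r.after_minutes)
    return '\n'.join(
        f'After {r.after_minutes} min: page {r.contact} ({r.reason})'
        for r in ordered)


rules = [
    EscalationRule(30, 'database-team', 'connection pool still exhausted'),
    EscalationRule(0, 'primary-on-call', 'initial response'),
    EscalationRule(60, 'incident-commander', 'customer impact continues'),
]
print(render_escalation(rules))
print(contacts_due(rules, 45))  # ['primary-on-call', 'database-team']
```

The rendered text could be passed as the `escalation` field of a runbook, so the criteria and the contact path appear in the same document as the steps.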
Related Topics
Runbooks, incident response, SRE, operations, troubleshooting, on-call procedures, and documentation.
Important Notes
Requirements
Knowledge of system architecture and common failure modes for accurate runbook content. Access to diagnostic commands and monitoring tools referenced in runbook steps. Review process to validate runbook accuracy with subject matter experts.
Usage Recommendations
Do: test runbook steps in a non-production environment to verify commands work as documented. Update runbooks after each incident to incorporate lessons learned. Include rollback steps for any remediation action that modifies system state.
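The rollback recommendation can be enforced with a small check before a runbook is published. The sketch below is illustrative only: `RemediationStep`, its `modifies_state` flag, and `missing_rollbacks` are hypothetical names, and the flag is assumed to be set by the runbook author rather than detected automatically.

```python
from dataclasses import dataclass


@dataclass
class RemediationStep:
    action: str
    command: str = ''
    modifies_state: bool = False  # hypothetical flag set by the runbook author
    rollback: str = ''            # command or instruction that undoes the action


def missing_rollbacks(steps: list) -> list:
    """Return the actions that change system state but document no rollback."""
    return [s.action for s in steps
            if s.modifies_state and not s.rollback.strip()]


steps = [
    RemediationStep('Check DB status', 'pg_isready -h db01'),
    RemediationStep('Restart API service', 'systemctl restart api-server',
                    modifies_state=True),
    RemediationStep('Scale out API', 'kubectl scale deployment api-server --replicas=6',
                    modifies_state=True,
                    rollback='kubectl scale deployment api-server --replicas=3'),
]
print(missing_rollbacks(steps))  # ['Restart API service']
```

Read-only diagnostic steps are skipped, so only state-changing steps without a documented rollback are reported.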
Don't: write runbooks that assume specific expertise, since on-call engineers may be unfamiliar with the service. Don't include credentials or secrets directly in runbook text. Don't let runbooks become stale by skipping reviews after infrastructure changes.
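A simple scan can catch some accidental secrets before a runbook is shared. This is a rough sketch, assuming a hypothetical `find_secrets` helper and two illustrative regular expressions; the patterns are far from exhaustive and will produce both false positives and misses, so they supplement review rather than replace it.

```python
import re

# Illustrative patterns only; a real scanner would need a much broader set.
SECRET_PATTERNS = [
    # Assignments such as PGPASSWORD=..., api_key: ..., token = ...
    re.compile(r'(password|passwd|secret|token|api[_-]?key)\w*\s*[=:]\s*\S+',
               re.IGNORECASE),
    # The general shape of an AWS access key ID.
    re.compile(r'\bAKIA[0-9A-Z]{16}\b'),
]


def find_secrets(runbook_text: str) -> list:
    """Return (line_number, line) pairs that look like they contain a secret."""
    hits = []
    for number, line in enumerate(runbook_text.splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

Flagged lines can then be rewritten to reference an environment variable or a secret store by name instead of embedding the value.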
Limitations
Runbooks cover known failure scenarios and cannot anticipate every possible incident. Complex incidents may require deviating from documented steps based on specific circumstances. Command outputs may differ across environments, requiring engineers to adapt documented steps accordingly.