Senior Prompt Engineer

Senior Prompt Engineer automation and integration for expert prompt crafting

Senior Prompt Engineer is a community skill for designing effective prompts that maximize language model output quality, covering prompt structure, few-shot design, chain-of-thought techniques, system prompt architecture, and evaluation methodologies.

What Is This?

Overview

Senior Prompt Engineer provides advanced patterns for crafting prompts that produce reliable, high-quality outputs from language models. It covers system prompt design with role and constraint specification, few-shot example selection and formatting, chain-of-thought reasoning elicitation, output format control through structured instructions, and prompt evaluation frameworks. The skill enables practitioners to systematically improve model outputs through principled prompt engineering rather than trial and error.

Who Should Use This

This skill serves developers building LLM-powered applications that require consistent output quality, product teams designing conversational AI experiences with specific behavioral requirements, and engineers optimizing prompt performance for cost and latency.

Why Use It?

Problems It Solves

Ad-hoc prompt writing produces inconsistent results that vary unpredictably between similar inputs. Models ignore important constraints when instructions are vague or ambiguous in the prompt. Few-shot examples chosen without systematic criteria lead to biased or narrow model behavior. Without evaluation metrics, prompt improvements cannot be measured objectively across iterations.

Core Highlights

System prompt architecture defines model behavior through structured role, context, and constraint sections. Few-shot example design selects diverse, representative examples that demonstrate desired output patterns. Chain-of-thought templates guide models through explicit reasoning steps before generating answers. Output format specifications enforce consistent structure through schema definitions and parsing instructions.

How to Use It?

Basic Usage

from dataclasses import dataclass, field

@dataclass
class PromptTemplate:
    role: str
    context: str
    instructions: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    output_format: str = ""

    def build_system_prompt(self) -> str:
        parts = [f"You are {self.role}.\n"]
        if self.context:
            parts.append(f"Context: {self.context}\n")
        if self.instructions:
            parts.append("Instructions:")
            for inst in self.instructions:
                parts.append(f"- {inst}")
        if self.constraints:
            parts.append("\nConstraints:")
            for con in self.constraints:
                parts.append(f"- {con}")
        if self.output_format:
            parts.append(f"\nOutput Format: {self.output_format}")
        return "\n".join(parts)

class FewShotManager:
    def __init__(self):
        self.examples: list[dict] = []

    def add_example(self, input_text: str, output_text: str,
                    category: str = ""):
        self.examples.append({"input": input_text,
            "output": output_text, "category": category})

    def select(self, n: int = 3,
              diverse: bool = True) -> list[dict]:
        if not diverse or not self.examples[0].get("category"):
            return self.examples[:n]
        categories = set(e["category"] for e in self.examples)
        selected = []
        for cat in categories:
            if len(selected) >= n:
                break
            match = next(e for e in self.examples
                         if e["category"] == cat)
            selected.append(match)
        return selected[:n]

Real-World Examples

from dataclasses import dataclass, field

@dataclass
class PromptEvaluation:
    prompt_version: str
    test_cases: list[dict] = field(default_factory=list)
    scores: list[float] = field(default_factory=list)

class PromptEvaluator:
    def __init__(self):
        self.history: list[PromptEvaluation] = []

    def evaluate(self, version: str, prompt: str,
                 test_inputs: list[str],
                 score_fn) -> PromptEvaluation:
        evaluation = PromptEvaluation(prompt_version=version)
        for test_input in test_inputs:
            score = score_fn(prompt, test_input)
            evaluation.scores.append(score)
            evaluation.test_cases.append(
                {"input": test_input, "score": score})
        self.history.append(evaluation)
        return evaluation

    def compare_versions(self) -> list[dict]:
        return [{"version": e.prompt_version,
                 "avg_score": round(
                     sum(e.scores) / max(len(e.scores), 1), 4),
                 "num_tests": len(e.scores)}
                for e in self.history]

Advanced Tips

Use delimiters like XML tags or triple backticks to separate user input from instructions in the prompt, preventing injection. Build evaluation datasets that cover edge cases and failure modes specific to the target task. Version prompts alongside application code and track performance metrics for each version in production.

When to Use It?

Use Cases

Design system prompts for a customer-facing chatbot that must follow specific behavioral guidelines. Build an extraction pipeline that reliably outputs structured JSON from unstructured text. Create evaluation harnesses that measure prompt quality across diverse test scenarios before deployment.

Related Topics

Prompt engineering techniques, chain-of-thought reasoning, few-shot learning patterns, LLM application development, and output parsing strategies.

Important Notes

Requirements

Access to a language model API for testing prompt iterations. A set of representative test inputs for evaluating prompt quality. Clear definition of desired output format and behavioral constraints.

Usage Recommendations

Do: write explicit instructions rather than relying on implicit model knowledge. Test prompts on diverse inputs including edge cases before deploying to production. Use structured output formats like JSON when downstream systems need to parse responses.

Don't: assume that a prompt working on one model will transfer perfectly to another model. Include contradictory instructions that force the model to choose between conflicting rules. Rely on prompt engineering alone when fine-tuning would better address the task.

Limitations

Prompt effectiveness varies across model providers and versions. Complex behavioral requirements may exceed what prompt engineering alone can achieve. Long system prompts consume context window tokens that reduce space for user content.