Reflexion

Self-refinement loop that forces the LLM to reflect on previous output and correct itself

Source: NeoLabHQ/context-engineering-kit

What Is This?

Overview

Reflexion is a self-refinement technique that instructs a large language model to evaluate its own previous output and produce an improved version in a subsequent pass. Rather than accepting the first response as final, the pattern introduces a structured feedback loop where the model acts as both generator and critic. This approach draws from the broader field of context engineering, where the shape and content of the prompt context directly influence output quality.

The core mechanism works by appending the model's prior response back into the context along with an explicit instruction to reflect, identify weaknesses, and rewrite. This forces the model to engage in a form of iterative reasoning rather than single-pass generation. The result is often a more accurate, complete, and well-structured output than what the initial prompt alone would produce.

Reflexion is part of the NeoLabHQ context-engineering-kit, a collection of reusable prompt patterns designed to give developers precise control over how language models reason and respond. It is particularly valuable in workflows where output quality is critical and a single generation pass is insufficient.

Who Should Use This

Software engineers building LLM-powered applications that require high-quality, reliable text generation
Prompt engineers looking for systematic techniques to improve model output without fine-tuning
AI product teams integrating language models into production pipelines where accuracy matters
Researchers experimenting with self-correction and iterative reasoning in large language models
Technical writers using LLM assistance who need outputs that meet strict quality standards
Developers working on code generation tools where the first draft often contains bugs or incomplete logic

Why Use It?

Problems It Solves

Single-pass LLM responses frequently contain factual gaps, logical inconsistencies, or incomplete reasoning that go unaddressed without a review step
Manual review of every model output is time-consuming and does not scale in automated pipelines
Standard prompts provide no mechanism for the model to catch its own errors before the response is delivered
Output quality varies significantly across runs, and there is no built-in way to enforce a minimum standard without human intervention
Code generation outputs often compile but fail to handle edge cases, which reflexion can surface through self-critique

Core Highlights

Implements a structured two-pass generation loop: the first pass generates a response, and the second pass critiques and rewrites it with that response included in the context
Requires no external tools, additional services, or fine-tuning; access to the model itself is enough
Compatible with any instruction-following language model
Often produces more complete and accurate outputs in complex generation tasks
Separates the generation phase from the critique phase, which can reduce confirmation bias in the model's self-assessment
Reusable as a modular pattern that can be composed with other context-engineering techniques
Lightweight implementation whose main cost is one extra model call that re-sends the task and the initial response

How to Use It?

Basic Usage

The reflexion pattern follows a three-part prompt structure: initial task, generated response, and reflection instruction.

[TASK]
Write a Python function that validates an email address.

[INITIAL RESPONSE]
{model_output_here}

[REFLECTION]
Review your response above. Identify any logical errors, missing edge cases,
or improvements. Then provide a corrected and improved version.

In practice, you run the model once to get the initial response, then inject that response back into the context and run a second pass with the reflection instruction appended.
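
The two passes described above can be sketched in Python roughly as follows. This is a minimal sketch, not the kit's own implementation: `generate` is a hypothetical callable standing in for whatever sends a prompt to your model and returns its text, and the helper names are chosen here for illustration only.

```python
from typing import Callable

# Reflection instruction taken from the prompt template above.
REFLECTION_INSTRUCTION = (
    "Review your response above. Identify any logical errors, missing edge cases,\n"
    "or improvements. Then provide a corrected and improved version."
)


def build_reflection_prompt(task: str, initial_response: str) -> str:
    """Assemble the three-part prompt: task, initial response, reflection."""
    return (
        f"[TASK]\n{task}\n\n"
        f"[INITIAL RESPONSE]\n{initial_response}\n\n"
        f"[REFLECTION]\n{REFLECTION_INSTRUCTION}"
    )


def reflexion(task: str, generate: Callable[[str], str]) -> str:
    """Run both passes and return the revised response.

    `generate` is an assumed stand-in for your model call; it is not
    part of any specific library.
    """
    initial_response = generate(task)  # pass 1: plain generation
    prompt = build_reflection_prompt(task, initial_response)
    return generate(prompt)  # pass 2: critique and rewrite
```

Passing the model call in as a parameter keeps the pattern independent of any particular API client or local runtime, and makes it easy to substitute a stub when testing.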

Specific Scenarios

Code review loop: Generate a function, then apply reflexion to catch missing error handling, incorrect logic, or style violations before the code reaches a human reviewer.

Document drafting: Produce a first draft of technical documentation, then use reflexion to identify unclear explanations, missing steps, or inconsistent terminology.
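
One way to support both scenarios is to keep a reflection instruction per scenario and select it by name. The sketch below is illustrative only: the scenario keys, the exact instruction wording, and the `reflection_section` helper are assumptions made here, not names defined by the kit.

```python
# Hypothetical scenario names mapped to reflection instructions that
# mirror the checks listed for each scenario above.
REFLECTION_INSTRUCTIONS = {
    "code_review": (
        "Review the code above. Identify missing error handling, incorrect "
        "logic, or style violations. Then provide a corrected version."
    ),
    "document_draft": (
        "Review the draft above. Identify unclear explanations, missing steps, "
        "or inconsistent terminology. Then provide a revised version."
    ),
}


def reflection_section(scenario: str) -> str:
    """Return the [REFLECTION] block for a named scenario."""
    try:
        instruction = REFLECTION_INSTRUCTIONS[scenario]
    except KeyError:
        known = ", ".join(sorted(REFLECTION_INSTRUCTIONS))
        raise ValueError(
            f"unknown scenario {scenario!r}; expected one of: {known}"
        ) from None
    return f"[REFLECTION]\n{instruction}"
```

The returned block would replace the generic reflection instruction at the end of the second-pass prompt.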

Real-World Examples

A developer building an automated code assistant uses reflexion to reduce the number of bugs in generated functions by prompting the model to check its own output for off-by-one errors and unhandled exceptions before returning the result to the user.

A content pipeline for a technical blog uses reflexion to improve draft accuracy by asking the model to verify factual claims and tighten argument structure in a second pass.

Important Notes

Requirements

The model must support instruction-following to respond meaningfully to the reflection prompt
Sufficient context window length is required to hold the original task, the initial response, and the reflection instruction, with room left for the rewritten output
API access or a local model runtime is needed to programmatically inject the first response back into the context
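
The context-window requirement can be checked before the second pass with a rough budget calculation. The sketch below is an approximation under stated assumptions: the ratio of 4 characters per token is a crude heuristic rather than a property of any particular model, and `estimate_tokens`, `fits_context`, and their parameters are hypothetical names. Use your model's own tokenizer where an exact count matters.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; the chars-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1


def fits_context(
    task: str,
    initial_response: str,
    reflection: str,
    context_limit: int,
    reserved_for_output: int,
) -> bool:
    """Check that the second-pass prompt leaves room for the rewritten answer."""
    prompt_tokens = sum(
        estimate_tokens(part) for part in (task, initial_response, reflection)
    )
    return prompt_tokens + reserved_for_output <= context_limit
```

If the check fails, options include shortening the initial response before reinjecting it or using a model with a larger context window.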
