Reflexion

Self-refinement loop that prompts the LLM to reflect on its previous output and correct itself

What Is This?

Overview

Reflexion is a self-refinement technique that instructs a large language model to evaluate its own previous output and produce an improved version in a subsequent pass. Rather than accepting the first response as final, the pattern introduces a structured feedback loop where the model acts as both generator and critic. This approach draws from the broader field of context engineering, where the shape and content of the prompt context directly influence output quality.

The core mechanism works by appending the model's prior response back into the context along with an explicit instruction to reflect, identify weaknesses, and rewrite. This forces the model to engage in a form of iterative reasoning rather than single-pass generation. The result is often a more accurate, complete, and well-structured output than what the initial prompt alone would produce.

Reflexion is part of the NeoLabHQ context-engineering-kit, a collection of reusable prompt patterns designed to give developers precise control over how language models reason and respond. It is particularly valuable in workflows where output quality is critical and a single generation pass is insufficient.

Who Should Use This

  • Software engineers building LLM-powered applications that require high-quality, reliable text generation
  • Prompt engineers looking for systematic techniques to improve model output without fine-tuning
  • AI product teams integrating language models into production pipelines where accuracy matters
  • Researchers experimenting with self-correction and iterative reasoning in large language models
  • Technical writers using LLM assistance who need outputs that meet strict quality standards
  • Developers working on code generation tools where the first draft often contains bugs or incomplete logic

Why Use It?

Problems It Solves

  • Single-pass LLM responses frequently contain factual gaps, logical inconsistencies, or incomplete reasoning that go unaddressed without a review step
  • Manual review of every model output is time-consuming and does not scale in automated pipelines
  • Standard prompts provide no mechanism for the model to catch its own errors before the response is delivered
  • Output quality varies significantly across runs, and there is no built-in way to enforce a minimum standard without human intervention
  • Code generation outputs often compile but fail to handle edge cases, which reflexion can surface through self-critique

Core Highlights

  • Implements a structured two-pass generation loop in which the second pass receives the first response as part of its context
  • Requires no external tools or fine-tuning; the only dependency is the ability to call the model a second time
  • Compatible with any instruction-following language model
  • Often produces more complete and accurate outputs in complex generation tasks
  • Separates the generation phase from the critique phase, which can reduce confirmation bias in the model's self-assessment
  • Reusable as a modular pattern that can be composed with other context-engineering techniques
  • Lightweight implementation that adds only one extra model call, although the second pass must reprocess the task and the initial response

How to Use It?

Basic Usage

The reflexion pattern follows a three-part prompt structure: initial task, generated response, and reflection instruction.

[TASK]
Write a Python function that validates an email address.

[INITIAL RESPONSE]
{model_output_here}

[REFLECTION]
Review your response above. Identify any logical errors, missing edge cases,
or improvements. Then provide a corrected and improved version.

In practice, you run the model once to get the initial response, then inject that response back into the context and run a second pass with the reflection instruction appended.
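
The two-pass flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the kit's actual implementation: `generate` is a hypothetical callable standing in for whatever model client you use, and `build_reflection_prompt` and `reflexion` are names invented for this example.

```python
from typing import Callable

# Reflection instruction taken from the prompt template above.
REFLECTION_INSTRUCTION = (
    "Review your response above. Identify any logical errors, missing edge cases,\n"
    "or improvements. Then provide a corrected and improved version."
)


def build_reflection_prompt(task: str, initial_response: str) -> str:
    """Assemble the three-part prompt: task, initial response, reflection."""
    return (
        f"[TASK]\n{task}\n\n"
        f"[INITIAL RESPONSE]\n{initial_response}\n\n"
        f"[REFLECTION]\n{REFLECTION_INSTRUCTION}"
    )


def reflexion(task: str, generate: Callable[[str], str]) -> str:
    """Run two passes: generate a draft, then ask the model to revise it.

    `generate` is a hypothetical stand-in: any function that takes a prompt
    string and returns the model's text output.
    """
    initial_response = generate(task)  # pass 1: first draft
    prompt = build_reflection_prompt(task, initial_response)
    return generate(prompt)  # pass 2: critique and rewrite
```

Passing the model call in as a plain function keeps the sketch independent of any particular API or local runtime, and makes it easy to substitute a stub when testing the prompt assembly.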

Specific Scenarios

Code review loop: Generate a function, then apply reflexion to catch missing error handling, incorrect logic, or style violations before the code reaches a human reviewer.

Document drafting: Produce a first draft of technical documentation, then use reflexion to identify unclear explanations, missing steps, or inconsistent terminology.
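
One way to adapt the reflection step to these scenarios is to name the specific checks in the instruction itself. The sketch below is illustrative only: the scenario keys, the `REFLECTION_FOCUS` table, and the `reflection_instruction` helper are hypothetical names, and the focus areas are simply the ones listed in the two scenarios above.

```python
# Hypothetical scenario names; the focus areas come from the scenarios above.
REFLECTION_FOCUS = {
    "code_review": [
        "missing error handling",
        "incorrect logic",
        "style violations",
    ],
    "document_drafting": [
        "unclear explanations",
        "missing steps",
        "inconsistent terminology",
    ],
}


def reflection_instruction(scenario: str) -> str:
    """Build a reflection instruction that names the scenario's focus areas."""
    focus = REFLECTION_FOCUS[scenario]  # raises KeyError for unknown scenarios
    checklist = "\n".join(f"- {item}" for item in focus)
    return (
        "Review your response above. Check specifically for:\n"
        f"{checklist}\n"
        "Then provide a corrected and improved version."
    )
```

The returned string would take the place of the generic text in the [REFLECTION] part of the prompt.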

Real-World Examples

A developer building an automated code assistant uses reflexion to reduce the number of bugs in generated functions by prompting the model to check its own output for off-by-one errors and unhandled exceptions before returning the result to the user.

A content pipeline for a technical blog uses reflexion to improve draft accuracy by asking the model to verify factual claims and tighten argument structure in a second pass.

Important Notes

Requirements

  • The model must support instruction-following to respond meaningfully to the reflection prompt
  • Sufficient context window length is required to hold the original task, the initial response, the reflection instruction, and the revised output
  • API access or a local model runtime is needed to programmatically inject the first response back into the context
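
The context window requirement can be checked roughly before the second pass is sent. The sketch below is an approximation under loud assumptions: `fits_context` is a hypothetical helper, and the default of four characters per token is a crude heuristic rather than a real tokenizer, so use your model's own tokenizer when an exact count matters.

```python
import math


def fits_context(
    task: str,
    initial_response: str,
    reflection_instruction: str,
    context_window_tokens: int,
    reserved_output_tokens: int,
    chars_per_token: float = 4.0,  # crude assumption, not a real tokenizer
) -> bool:
    """Estimate whether the second-pass prompt leaves room for the revision."""
    prompt_chars = len(task) + len(initial_response) + len(reflection_instruction)
    estimated_prompt_tokens = math.ceil(prompt_chars / chars_per_token)
    return estimated_prompt_tokens + reserved_output_tokens <= context_window_tokens
```

If the check fails, options include shortening the task description or capping the length of the first response before running the reflection pass.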