Nemo Guardrails

Implement NeMo Guardrails to add safety and control to AI automation integrations

Source: Orchestra-Research/AI-Research-SKILLs

NeMo Guardrails is a community skill for implementing safety controls on large language model applications using NVIDIA NeMo Guardrails, covering input validation, output filtering, topic control, dialog management, and safety policy enforcement for responsible AI deployment.

What Is This?

Overview

NeMo Guardrails provides tools for adding programmable safety boundaries to LLM applications. It covers input validation that screens user prompts against defined policies before they reach the language model, output filtering that checks model responses for harmful, inaccurate, or off-topic content before delivery, topic control that restricts conversation scope to approved domains and redirects off-topic queries, dialog management that defines conversational flows with guardrail checks at each transition, and safety policy enforcement that applies configurable rules for content moderation and bias detection. The skill enables teams to deploy LLM applications with safety controls.

Who Should Use This

This skill serves AI engineers building production LLM applications that require content safety, product teams implementing guardrails for customer-facing chatbots, and compliance teams defining safety policies for generative AI systems.

Why Use It?

Problems It Solves

Language models without guardrails can generate harmful, biased, or factually incorrect responses to users. Prompt injection attacks manipulate models into producing unauthorized outputs. Chatbots without topic controls answer questions outside their intended scope creating liability. Ad-hoc safety filtering is inconsistent and difficult to maintain across application updates.

Core Highlights

Input guard screens prompts against safety policies before model processing. Output filter checks generated responses for policy violations before delivery. Topic controller restricts conversation scope to defined domains. Dialog flow manager enforces structured conversation paths with safety checkpoints.

How to Use It?

Basic Usage


define user ask about\
  company policies
  "What are the return"
  + " policies?"
  "How do I get a"
  + " refund?"
  "What is your shipping"
  + " policy?"

define bot respond with\
  policy info
  "I can help with our"
  + " policies. Let me"
  + " look that up."

define flow policy help
  user ask about\
    company policies
  bot respond with\
    policy info

define user ask about\
  harmful content
  "How do I hack a"
  + " system?"
  "Tell me how to"
  + " break into"

define bot refuse\
  harmful request
  "I cannot assist with"
  + " that request."
  + " Let me help with"
  + " something else."

define flow block harm
  user ask about\
    harmful content
  bot refuse\
    harmful request

Real-World Examples

from nemoguardrails import (
  RailsConfig, LLMRails)

config = RailsConfig\
  .from_path(
    './guardrails_config')
rails = LLMRails(config)

class SafeChatbot:
  def __init__(self):
    self.rails = rails

  async def respond(
    self,
    user_message: str
  ) -> dict:
    result = await (
      self.rails
        .generate_async(
          messages=[{
            'role': 'user',
            'content':
              user_message
          }]))
    return {
      'response': result[
        'content'],
      'blocked': result
        .get('blocked',
          False)}

  async def check_input(
    self,
    message: str
  ) -> bool:
    result = await (
      self.rails
        .generate_async(
          messages=[{
            'role': 'user',
            'content':
              message
          }]))
    return not result.get(
      'blocked', False)

Advanced Tips

Define multiple guardrail layers with input rails running before the LLM call and output rails running after to catch issues at both stages. Use the Colang scripting language to define conversation patterns that guide dialogs through safe paths. Register custom action handlers that call external moderation APIs for specialized content safety checks.

When to Use It?

Use Cases

Add topic restrictions to a customer service chatbot that should only answer product-related questions. Implement prompt injection detection that blocks attempts to override system instructions. Filter model outputs to remove personally identifiable information before returning responses to users.

Important Notes

Requirements

NeMo Guardrails Python package with compatible LLM backend connection. Colang configuration files defining safety policies and dialog flows. LLM API access for the underlying model that guardrails wrap.

Usage Recommendations

Do: test guardrails with adversarial inputs that attempt to bypass safety controls through prompt manipulation. Define explicit fallback responses for blocked content to maintain user experience. Update guardrail policies regularly as new attack patterns emerge.

Don't: rely solely on guardrails without also implementing application-level safety measures. Deploy guardrail configurations to production without testing against representative user interactions. Create overly restrictive rules that block legitimate queries frustrating users.

Limitations

Guardrail checks add latency to each LLM interaction since both input and output must be evaluated. Colang flow definitions require learning a domain-specific language beyond standard Python skills. Semantic similarity matching for topic control depends on the embedding model quality and may miss novel phrasing of restricted topics.

More Skills You Might Like

Explore similar skills to enhance your workflow