
Content Moderation and Safety Checks with NVIDIA NeMo Guardrails


Content moderation has become essential in retrieval-augmented generation (RAG) applications powered by generative AI, given the extensive volume of user-generated content and external data that these systems manage. RAG-based applications use large language models (LLMs) along with real-time information retrieval from various external sources, which can lead to a more dynamic and unpredictable flow of content. 

As these generative AI applications become a part of enterprise communications, moderating content ensures that the LLM responses are safe, reliable, and compliant.

The primary question every generative AI developer should ask when trying to achieve content moderation in RAG applications is how to deploy AI guardrails that monitor and manage content in real time.

Enterprises can enhance their RAG applications with added accuracy and security. NVIDIA NeMo Guardrails provides both a toolkit and a microservice for integrating security layers into production-grade RAG applications. It enforces safety and policy guidelines on LLM outputs and also supports seamless integration with third-party safety models. The security layers are user-customizable, catering to various enterprise-level use cases.

Third-party safety models, when integrated with NeMo Guardrails, serve as additional checkpoints that help evaluate both the retrieved and generated content, preventing unsafe or irrelevant outputs from being delivered to the user. 

For example, before a RAG application responds to a user query, a safety model can scan both the retrieved data and the generated response for offensive language, misinformation, personally identifiable information (PII), or other policy violations. This multi-layered content moderation strategy helps enterprises strike a balance between delivering highly relevant content and real-time responses.

In this post, I give you an easy-to-implement demonstration of how to add safety and content moderation to custom RAG chatbot applications using community models, Meta’s LlamaGuard and AlignScore, integrated with NVIDIA NeMo Guardrails. By the end of this tutorial, you’ll have a guarded RAG pipeline that uses NVIDIA NIM microservices for both the embedding model used in retrieval and the LLM that generates the responses.

Understanding the architectural workflow with a NeMo Guardrails configuration

NVIDIA NeMo Guardrails offers a broad set of customizable guardrails to control and guide LLM inputs and outputs. NeMo Guardrails provides out-of-the-box support for content moderation using Meta’s Llama Guard model. 

Compared to the self-check method, in which an LLM is prompted to check the inputs and outputs itself, Llama Guard delivers significantly better input and output content moderation performance. A secure and safe RAG pipeline also requires the LLM-generated text to be factually consistent with the input information and the knowledge base, which you can achieve with an AlignScore model integration.

Here’s the architecture of a system that integrates these third-party models with the implementation details for the NeMo Guardrails configuration. 

Figure 1. Architectural workflow of a RAG chatbot safeguarded by NeMo Guardrails and integrated with third-party applications 

Set up the NeMo Guardrails configuration

You can build a RAG bot on your own in about 5 minutes. Once you have a bot in place, here’s how to add the safety components that NVIDIA NeMo Guardrails offers:

  • Install NeMo Guardrails as a toolkit or microservice
  • Set up the RAG application
  • Deploy third-party safety models

Install NeMo Guardrails as a toolkit or microservice

One way to set up the guardrails configuration is to use the NeMo Guardrails open-source toolkit. Start by installing the nemoguardrails library from the /NVIDIA/NeMo-Guardrails GitHub repo.

The NeMo Guardrails microservice, available in early access, is a container that lets you add guardrails to NIM endpoints, either deployed locally or through NVIDIA-hosted endpoints at build.nvidia.com. Some of the key features offered by the microservice include the following:

  • OpenAI-compatible API: Integrate guardrails into your applications by replacing the base URL with the NeMo Guardrails microservice URL.
  • Integration with the NVIDIA API Catalog: Use the NVIDIA API Catalog as your LLM provider.
  • Guardrail configurations: Use all the guardrail configurations supported by the NeMo Guardrails open-source toolkit.

To get started with the NeMo Guardrails microservice, apply for the early access program offered by NVIDIA. 
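Because the API is OpenAI-compatible, pointing an existing client at the microservice is mostly a matter of changing the base URL. The sketch below builds a standard chat completions request using only the Python standard library. The microservice address (`http://localhost:7331/v1`) is a hypothetical placeholder, and any extra fields or authentication headers your deployment needs to select a guardrails configuration are not shown; check the early access documentation for those.

```python
import json
import urllib.request

# Hypothetical address of a locally running NeMo Guardrails microservice.
# Replace it with the URL of your own deployment.
GUARDRAILS_BASE_URL = "http://localhost:7331/v1"


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    GUARDRAILS_BASE_URL,
    "meta/llama-3.1-70b-instruct",
    "How do I use NVIDIA AI Enterprise?",
)
# To send it against a running service: urllib.request.urlopen(req)
```

The same request shape works against any OpenAI-compatible endpoint, which is what makes swapping the base URL sufficient.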

Set up the RAG application

NeMo Guardrails offers various safety features. Based on your use case, you can opt for adding one or more of these safety features into your applications:

  • Content moderation
  • Off-topic detection
  • RAG enforcement / hallucination
  • Rail auditor
  • Jailbreak detection
  • PII detection

In this tutorial, you create a RAG chatbot with a chat UI (Figure 1). First, embed a knowledge base into a vector store using the NeMo Retriever Embedding NIM microservice. 
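To make the retrieval step concrete, here is a minimal in-memory vector store that ranks knowledge-base chunks by cosine similarity to the query. The `toy_embed` function is a hypothetical stand-in for the NeMo Retriever Embedding NIM; it builds a letter histogram so the example runs offline, whereas a real pipeline would call the embedding endpoint instead.

```python
import math
from dataclasses import dataclass, field


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a 26-dim letter histogram."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class VectorStore:
    chunks: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, chunk: str) -> None:
        # Embed each knowledge-base chunk once, at indexing time.
        self.chunks.append(chunk)
        self.vectors.append(toy_embed(chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank every chunk by similarity to the query embedding; keep the top k.
        q = toy_embed(query)
        ranked = sorted(
            zip(self.chunks, self.vectors),
            key=lambda cv: cosine(q, cv[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]
```

Swapping `toy_embed` for real embedding calls and the list for a production vector database leaves the ranking logic unchanged.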

Integrating NeMo Guardrails into your RAG application involves two pieces:

  • Retriever call: The retrieval rails within NeMo Guardrails make a retriever call to fetch chunks relevant to the user query and send them to the LLM NIM microservice as context.
  • API endpoint: NeMo Guardrails accesses the LLM NIM microservice through an API endpoint to make LLM calls.

Together, these two features make up the RAG enforcement feature of NeMo Guardrails. 
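One way to hand retrieved chunks to the guardrails is to pass them alongside the user turn. The sketch below builds such a message list, assuming the `context` message role with a `relevant_chunks` key that the NeMo Guardrails documentation uses for RAG examples; the chunk strings are placeholders for whatever your retriever returns. In the toolkit's fact-checking rails, `relevant_chunks` is the evidence that responses are checked against.

```python
def build_rag_messages(question: str, chunks: list[str]) -> list[dict]:
    """Package retrieved chunks as a 'context' message ahead of the user turn.

    The 'context' role and 'relevant_chunks' key follow the pattern in the
    NeMo Guardrails documentation; verify them against your toolkit version.
    """
    return [
        {"role": "context", "content": {"relevant_chunks": "\n".join(chunks)}},
        {"role": "user", "content": question},
    ]


messages = build_rag_messages(
    "How do I use NVIDIA AI Enterprise?",
    ["chunk one", "chunk two"],  # placeholder chunks from your retriever
)
# With the toolkit installed and configured: rails.generate(messages=messages)
```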

Deploy third-party safety models

This tutorial uses two third-party safety models: LlamaGuard-7b and AlignScore.

LlamaGuard-7b for content moderation

LlamaGuard is an input-output safeguard model geared towards human-AI conversation use cases. The model comes with its own safety-risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts and responses. NeMo Guardrails provides out-of-the-box support for content moderation using the LlamaGuard model.
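Because the taxonomy is embedded in the prompt, LlamaGuard answers with a short verdict rather than free text: `safe`, or `unsafe` followed by a line of violated category codes. NeMo Guardrails parses this for you; the standalone sketch below only illustrates the format, and treating empty or unexpected output as unsafe (failing closed) is a design choice of this example, not documented toolkit behavior.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    safe: bool
    categories: list[str]


def parse_llama_guard(output: str) -> Verdict:
    """Parse a LlamaGuard-style completion.

    Assumes the format requested by the LlamaGuard prompt: the first line is
    'safe' or 'unsafe', and an unsafe verdict may be followed by a line of
    comma-separated category codes such as 'O1,O3'.
    """
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if not lines:
        # Empty output: fail closed and treat the message as unsafe.
        return Verdict(False, [])
    if lines[0].lower() == "safe":
        return Verdict(True, [])
    # Anything other than 'safe' is treated as unsafe (fail closed).
    categories = []
    if len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return Verdict(False, categories)
```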

Before you integrate this model into your guardrails configuration, self-host the LlamaGuard-7b model using vLLM.
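A common way to serve the model is vLLM's OpenAI-compatible entrypoint, for example `python -m vllm.entrypoints.openai.api_server --model meta-llama/LlamaGuard-7b --port 5123`; port 5123 matches the configuration used later in this post, and exact flags depend on your vLLM version. Once the server is up, you can smoke-test it with a plain completions request. The sketch below builds one with the standard library; the token limit and temperature are illustrative choices, and a real check should send the full LlamaGuard prompt template rather than a bare question.

```python
import json
import urllib.request

# Matches the openai_api_base used in config.yml in this post.
VLLM_BASE_URL = "http://localhost:5123/v1"


def llama_guard_completion_request(prompt: str) -> urllib.request.Request:
    """Build (not send) a completions request for a vLLM-served LlamaGuard-7b."""
    payload = {
        "model": "meta-llama/LlamaGuard-7b",
        "prompt": prompt,
        "max_tokens": 20,    # illustrative: verdicts are only a few tokens long
        "temperature": 0.0,  # illustrative: deterministic classification
    }
    return urllib.request.Request(
        url=f"{VLLM_BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and reading `choices[0]["text"]` from the JSON reply should give the raw verdict.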

AlignScore for fact checking

A secure and safe RAG pipeline requires the LLM-generated text to be factually consistent with the input information and the knowledge base. Here, factual consistency is verified by checking whether the LLM response is supported by the chunks obtained from the retriever.

AlignScore is a metric developed to assess factual consistency in context-claim pairs. It comes in two checkpoints, base and large, both of which can be integrated with NeMo Guardrails. To use it, first deploy an AlignScore server; its integration into this example configuration is covered in the next section.
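The check itself reduces to scoring one context-claim pair: the retrieved chunks are the evidence and the LLM response is the claim. The sketch below assumes the request and response fields of the reference AlignScore server in the NeMo Guardrails repository (`evidence` and `claim` in, `alignscore` out) and uses a hypothetical 0.5 acceptance threshold; confirm both against your own deployment before relying on them.

```python
import json
import urllib.request

# Endpoint used in config.yml in this post.
ALIGNSCORE_URL = "http://localhost:5000/alignscore_base"
# Hypothetical cut-off; tune it on your own data.
MIN_ALIGNSCORE = 0.5


def alignscore_request(evidence: str, claim: str) -> urllib.request.Request:
    """Build (not send) a scoring request.

    Assumes the body shape of the reference AlignScore server:
    {"evidence": ..., "claim": ...}.
    """
    body = json.dumps({"evidence": evidence, "claim": claim}).encode("utf-8")
    return urllib.request.Request(
        ALIGNSCORE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_grounded(response_json: dict, threshold: float = MIN_ALIGNSCORE) -> bool:
    """Accept the answer only if its consistency score reaches the threshold.

    Assumes the server replies with {"alignscore": <score between 0 and 1>}.
    """
    return float(response_json["alignscore"]) >= threshold
```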

Build the NeMo Guardrails configuration

Once the RAG app, the third-party model API endpoints, and the other prerequisites are in place, you can build the NeMo Guardrails configuration that integrates the third-party safety models and metrics for added LLM security.

Tuning the guardrails configuration helps you understand how it influences the behavior of the RAG chatbot. Start with an overview of the configuration structure. 

├── config
│   ├── config.yml
│   ├── prompts.yml
│   ├── factchecking.co

The config.yml file gives the high-level view of the chatbot’s settings, model configurations, and the guardrails. Here’s an example of each.

Chatbot settings can include sample conversations and instructions on what the bot is about and what it is supposed to answer. 

instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the NVIDIA AI Bot. This bot is designed to answer questions about the NVIDIA AI Enterprise. The bot is knowledgeable about the company policies.

In the model configuration, add your model endpoints, either locally deployed NIM endpoints or NVIDIA-hosted endpoints from the NVIDIA API Catalog. This is also where you add the LlamaGuard endpoint; the AlignScore endpoint is configured in the rails section instead.

models:
  - type: main
    engine: nim
    model: meta/llama-3.1-70b-instruct

  - type: llama_guard
    engine: vllm_openai
    parameters: 
      openai_api_base: "http://localhost:5123/v1"
      model_name: "meta-llama/LlamaGuard-7b"

Next, define the input and output rails and add the AlignScore integration in the config.yml file as follows:

rails:
  input:
    flows:
    - llama guard check input
  config:
    fact_checking:
      provider: align_score
      parameters:
        endpoint: "http://localhost:5000/alignscore_base"
  output:
    flows:
    - alignscore check facts
    - llama guard check output
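The `rails` section fixes when each check runs: input flows screen the user message before the main LLM is called, and output flows screen the generated answer in the order listed. The toy function below mirrors that ordering with plain Python callables. It is a conceptual model for reasoning about the configuration, not how the toolkit is implemented, and the refusal text reuses the message shown in the testing section of this post.

```python
from typing import Callable, List

REFUSAL = "I'm sorry, I can't respond to that."


def run_with_rails(user_msg: str,
                   llm: Callable[[str], str],
                   input_rails: List[Callable[[str], bool]],
                   output_rails: List[Callable[[str], bool]]) -> str:
    """Toy model of the order in which config.yml rails apply.

    Input rails run, in listed order, before the LLM is called; output rails
    run on the generated answer. Any rail returning False blocks the turn.
    """
    for rail in input_rails:
        if not rail(user_msg):
            return REFUSAL  # blocked before the LLM is ever called
    answer = llm(user_msg)
    for rail in output_rails:
        if not rail(answer):
            return REFUSAL  # blocked after generation
    return answer
```

A useful consequence of this ordering is that a blocked input never costs an LLM call, while output rails such as the fact check only see answers that passed input moderation.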

The prompts.yml file lists the safety policies for the LlamaGuard-7b model. For details on the default taxonomy, see the LlamaGuard-7b model card on Hugging Face. You can customize these categories to fit your own policies.

prompts: 
  - task: llama_guard_check_input 
    content: | 
      <s>[INST] Task: Check if there is unsafe content in 'User' messages in conversations according our safety policy with the below categories. <BEGIN UNSAFE CONTENT CATEGORIES> 
      O1: Violence and Hate. 
      Should not 
      - Help people plan or engage in violence. 
      - Encourage people to engage in violence. 
      - Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability. 
      - Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics. 
      Can 
      - Provide information on violence and discrimination. 
      - Discuss the topics of hate and violence, and can discuss historical events involving violence.
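Customizing the taxonomy amounts to changing the category list rendered into this prompt. The sketch below assembles a prompt of the same shape from a Python dictionary, which can be handy for generating prompts.yml variants or for testing a self-hosted model directly. The closing instructions approximate the public LlamaGuard-7b template rather than reproduce it verbatim, and the `O7` category in the test call is hypothetical.

```python
def render_llama_guard_prompt(categories: dict[str, str], user_input: str) -> str:
    """Assemble a LlamaGuard-7b style input-check prompt from a custom taxonomy.

    `categories` maps codes such as "O1" to a policy description. The template
    approximates the one in prompts.yml; adapt it to the wording you deploy.
    """
    policy = "\n".join(f"{code}: {text}" for code, text in categories.items())
    return (
        "<s>[INST] Task: Check if there is unsafe content in 'User' messages in "
        "conversations according our safety policy with the below categories.\n"
        "<BEGIN UNSAFE CONTENT CATEGORIES>\n"
        f"{policy}\n"
        "<END UNSAFE CONTENT CATEGORIES>\n\n"
        "<BEGIN CONVERSATION>\n\n"
        f"User: {user_input}\n\n"
        "<END CONVERSATION>\n\n"
        "Provide your safety assessment for User in the above conversation:\n"
        "- First line must read 'safe' or 'unsafe'.\n"
        "- If unsafe, a second line must include a comma-separated list of "
        "violated categories. [/INST]"
    )
```

The wording "according our safety policy" is kept as it appears in the prompt above, since changing the template text can shift the model's behavior.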

The last step is to add fact-checking flows for the AlignScore model in the factchecking.co file. Define the flows with Colang 1.0. You can also use Colang 2.0 for a more streamlined development of your entire guardrails configuration for chatbot applications.

define flow
  user ask about report
  $check_facts = True
  bot provide report answer
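Setting `$check_facts = True` inside a flow marks that turn's bot message for fact checking; as described in the NeMo Guardrails fact-checking documentation, the fact-checking output rail only scores turns where this flag is set. The toy function below illustrates that gating with a stubbed scorer. The function name, the 0.5 threshold, and the fallback message are illustrative assumptions, not the toolkit's API.

```python
from typing import Callable


def gated_fact_check(answer: str, evidence: str, check_facts: bool,
                     score: Callable[[str, str], float],
                     threshold: float = 0.5) -> str:
    """Toy model of a fact-checking output rail guarded by a $check_facts flag.

    Only turns whose flow set check_facts=True are scored; others pass through
    unchanged. The threshold and the fallback message are illustrative.
    """
    if not check_facts:
        return answer
    if score(evidence, answer) < threshold:
        return "I don't know the answer to that."
    return answer
```

Gating keeps the extra AlignScore call off turns, such as greetings, where there is no retrieved evidence to check against.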

Test the NeMo Guardrails configuration

To check how the third-party safety models integrated with NeMo Guardrails work with the RAG chatbot, take a look at sample queries and their responses.

To test, load the ./config directory created earlier and use it to guardrail the LLM NIM microservice:

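from nemoguardrails import LLMRails, RailsConfig

# Load config.yml, prompts.yml, and factchecking.co from the config directory.
# Assumes credentials for the main model are set in the environment
# (for example, NVIDIA_API_KEY when using NVIDIA-hosted endpoints).
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[{
    "role": "user",
    "content": "How do I use NVIDIA AI Enterprise?"
}])
print(response["content"])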

This query is safe and covered by the knowledge base (KB), so the guardrails let it through and the LLM responds along these lines:

You can use NVIDIA AI Enterprise by first ensuring your system meets the necessary hardware and software requirements. Once you've met the requirements, you can deploy the software suite on your desired environment, such as a virtual machine or container, and then use the NVIDIA AI Enterprise console to manage and monitor your AI workloads. If you need more detailed instructions, I can provide you with a step-by-step guide or point you to our official documentation.

Next, ask an unsafe question to see how the added security layers respond:

response = rails.generate(messages=[{
    "role": "user",
    "content": "Can I use NVIDIA AI Enterprise in theft planning?"
}])
print(response['content'])

Because the guardrails are in place, the bot should refuse to respond. You should see the following response:

I'm sorry, I can't respond to that.
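To turn these two manual checks into a repeatable regression test, you can collect queries with their expected outcome and compare each reply against the refusal message. The harness below is a hypothetical helper, not part of the toolkit. It counts a reply as a refusal only on an exact match with the refusal text shown above, so update `REFUSAL` if you customize that message.

```python
from typing import Callable, List, Tuple

REFUSAL = "I'm sorry, I can't respond to that."


def check_cases(generate: Callable[[str], str],
                cases: List[Tuple[str, bool]]) -> List[str]:
    """Return the queries whose outcome did not match expectations.

    Each case is (query, should_refuse). `generate` wraps your guarded bot,
    for example:
        def ask(q):
            return rails.generate(messages=[{"role": "user", "content": q}])["content"]
    """
    failures = []
    for query, should_refuse in cases:
        refused = generate(query).strip() == REFUSAL
        if refused != should_refuse:
            failures.append(query)
    return failures
```

Because `generate` is injected, the harness can be exercised with a stub in unit tests and with the real rails in an integration run.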

Conclusion

A NIM-powered RAG chatbot integrated with NeMo Guardrails provides a framework for creating safer, more reliable, and contextually accurate generative AI applications. Each component plays a distinct role: Meta’s LlamaGuard-7b moderates user inputs and bot outputs against a configurable safety taxonomy, and AlignScore scores how factually consistent each response is with the retrieved context. Integrating these models with NVIDIA NeMo Guardrails enforces policy and compliance requirements through additional layers of security.

In this post, I discussed how to integrate third-party models into your own generative AI applications with NVIDIA NeMo Guardrails. I also introduced the NeMo Guardrails microservice and the early access program being offered for both beginner and advanced generative AI developers.
