By Santiago Aznarez

Generative AI is undeniably one of the fastest-growing areas of artificial intelligence in recent years, particularly in natural language processing. The number of systems and processes that can benefit from Large Language Models (LLMs) is enormous, ranging from chatbots and content generation to summarization tools and more.

In this blog, we will show you how to use Prompt Flow to design a simple application for chatting with your documents.

What is Prompt Flow?

Microsoft Azure’s Prompt Flow simplifies the end-to-end development process of an LLM application, covering everything from prototyping and development to evaluation, deployment, and monitoring.

Within Prompt Flow we will be able to:

  1. Design a pipeline that connects different tools, including LLMs, prompts, and Python scripts. Inside Prompt Flow, this pipeline is known as a flow.
  2. Debug and iterate on the flow, tracing the interactions between tools.
  3. Evaluate your flows on large datasets, and define different prompt or LLM variants to select the best-performing one for your use case.
  4. Deploy the flow as a functioning endpoint within Azure. Other options like Docker and Kubernetes are also available.

LLMs and the Need for Document Context

To begin with, let’s lay the groundwork for the rest of the discussion so we can understand what we’re doing and what we want to achieve. Large Language Models, like OpenAI’s GPT models, are trained on an immense amount of text data, including books, news articles, web pages, and more. However, these models don’t have the necessary knowledge to answer questions about your specific documents, as they haven’t been trained with them.

You can address this by including the text you want to ask about directly in the prompt. But prompts have a limited token capacity, so how do we query a moderately large document database?

Lucky for us, this problem can be solved with an architecture known as RAG (retrieval-augmented generation). Essentially, it involves splitting the text in the database into chunks, calculating a vector that represents these chunks using embedding models, and storing that vector, along with the text, in a vector database. Before asking the LLM a question, the system performs a vector search on the database, retrieving the text chunks most similar to the user’s input. These text chunks are then added to the prompt. Now, the LLM has the necessary context to answer the question at hand.
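
To make the pattern concrete, below is a minimal sketch of RAG in plain Python, using the OpenAI SDK and an in-memory list in place of a real vector database. The chunk texts are placeholders, and this is only meant to illustrate the steps we will later assemble visually in Prompt Flow.

```python
# Minimal RAG sketch: in-memory "vector database" with placeholder chunks.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

# 1. Split the documents into chunks and store (vector, text) pairs.
chunks = [
    "Hypothetical chunk 1 from an annual filing...",
    "Hypothetical chunk 2 from an annual filing...",
]
store = [(embed(c), c) for c in chunks]

# 2. At question time, embed the question and retrieve the most similar
#    chunks by cosine similarity.
def retrieve(question: str, top_k: int = 3) -> list[str]:
    q = embed(question)
    def sim(v: np.ndarray) -> float:
        return float(np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q)))
    ranked = sorted(store, key=lambda pair: sim(pair[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

# 3. Add the retrieved chunks to the prompt so the LLM has the context
#    it needs to answer the question.
def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```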

Building a RAG-Based Chatbot with Prompt Flow

So, using Prompt Flow, we will develop a RAG-based chatbot to answer questions about a text dataset that OpenAI models have not been trained on. To that end, we will use the Microsoft SEC Annual Filings from 2024 as our document dataset.

Prerequisites:

  • An active Azure subscription with access to OpenAI models through the Azure OpenAI Service (access to them needs to be requested).
  • Some documents to work with; we will be using the Microsoft SEC Annual Filings from 2024.

Step-by-Step Guide to Setting Up Microsoft Azure AI Studio

First of all, we need to create an Azure AI Studio resource:

  1. Sign in to the Azure Portal.
  2. Create a New Resource:
    • Click on the + Create a resource button located in the top left corner of the portal.
  3. Search for Azure AI Studio:
    • In the search bar, type Azure AI Studio and select it from the search results.
  4. Click on ‘Create’:
    • On the Azure AI Studio page, click on the Create button and follow the resource creation process.
  5. Launch Azure AI Studio:
    • Wait for the resource to be created, then launch it.

Deploying OpenAI Models for Your RAG Chatbot

Within Azure AI Studio, we need to create the following components:

  • An OpenAI text generation model, such as GPT-3.5. This model will generate answers based on the user’s question and the retrieved context.
  • An OpenAI embeddings model, such as text-embedding-ada-002 (ADA). This model is in charge of generating the embeddings that represent each chunk of text.
  • An index, where the text chunks will be stored along with the vectors that represent them.

To deploy the OpenAI models, we need to go to Deployments on the left panel of the screen, under Components. Within deployments, click on Deploy and then Deploy base models. We need to deploy the following ones:

  • gpt-35-turbo
  • text-embedding-ada-002

Creating an Index for Document Retrieval

Next, let’s create the index. To do this, we need to go to Indexes on the left panel of the page and follow these steps:

  1. Click on New index.
  2. Select Upload Files on Data Source, and upload your document files. We will be using the Microsoft SEC Annual Filings.
  3. On Index Settings, select your Azure AI Search service where the index will be stored. If you don’t have one, click on Create a new Azure AI Search resource and follow the creation process. Make sure you choose the Free Pricing Tier for now.
  4. On Search Settings, select Add vector search to this search resource, and select the previously deployed embedding model. If you haven’t deployed one yet, it will be deployed automatically at this step.
  5. Review the selected settings and click on Create.
  6. Wait for the index to be created.

Designing the Chat Flow in Prompt Flow

Now that everything is set, it’s time to have some fun! Let’s navigate to Prompt Flow on the left panel of the screen, located under Tools. Next, create a new Chat Flow. Initially, you should see a simple setup with just three nodes: an input node, an output node, and a chat node (the component that connects to the OpenAI GPT models). For now, delete the chat node so we can build the flow from scratch.

Default Chat Flow

A flow consists of multiple nodes that interact with each other. At the time of writing, the following tools exist:

  • LLM tool: Use OpenAI LLMs for text completion or chat.
  • Prompt tool: Craft prompts using Jinja as the templating language.
  • Python tool: Run Python code.
  • Content Safety: Use Azure Content Safety to detect harmful content.
  • Embedding: Use OpenAI’s models to create an embedding vector representing the input text.
  • Open Model LLM: Use an open model from the Azure models catalog, deployed to an online endpoint, for LLM Chat or Completion API calls.
  • Serp API: Use Serp API to obtain search results from specific search engines.
  • Index Lookup: Search an index for relevant text results using one or more text queries.

For this simple application, we will only use the following tools:

  • LLM tool
  • Prompt tool
  • Python tool
  • Index Lookup

First, create an Index Lookup tool to retrieve the text chunks from the vector database, which will serve as context to answer the user’s question.

Index lookup tool promptflow

You will need to configure the connection to the index, select the type of query, and set the number of chunks to retrieve (top_k).

Next, let’s create a Python tool that receives the output from the retrieval tool and pre-processes it to retain only the text. We won’t go into the details of how to do this, but you can use the following script within the tool.

Prompt Flow python script tool settings
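
Since we won’t reproduce the exact script, here is a sketch of what such a pre-processing tool might look like, using Prompt Flow’s @tool decorator. The exact shape of the Index Lookup output can vary between versions; this assumes each retrieved item exposes its chunk under a "text" key.

```python
# Sketch of a Prompt Flow Python tool that keeps only the text of each
# retrieved chunk. The "text" key is an assumption about the Index Lookup
# output shape; adjust it to match what your lookup node actually returns.
from promptflow import tool

@tool
def format_retrieved_context(search_results: list) -> str:
    chunks = [item.get("text", "") for item in search_results]
    return "\n\n".join(chunks)
```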

As the next node, let’s create a Prompt tool. Here we will write the prompt that will be the input to the LLM. In it, besides the system prompt with the behavior guidelines, we’ll include the chat history, the user input, and the context retrieved from the vector database. Initially, we can use a simple prompt like the following:

large language model system prompt
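
In text form, a simple Jinja template along these lines would work. The variable names (question, contexts, chat_history) are the ones we wire up as inputs to the node, and the chat history loop assumes Prompt Flow’s standard inputs/outputs structure.

```jinja
You are an assistant that answers questions about Microsoft's SEC annual filings.
Answer using only the context below. If the answer is not in the context, say you don't know.

Context:
{{contexts}}

Chat history:
{% for item in chat_history %}
user: {{item.inputs.question}}
assistant: {{item.outputs.answer}}
{% endfor %}

Question: {{question}}
```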

As you can see, the prompt takes the question, the context retrieved from the vector database, and the chat history as parameters. Therefore, it’s necessary to provide these variables as inputs to the Prompt node. As a result, the output of this node will be the formatted prompt with the question and context.

Something worth noting about Prompt nodes is their option to create different variants of the prompt. This is very helpful when testing the solution, as it allows you to submit a batch run to the pipeline, compare the results from each prompt, and ultimately choose the one that delivers the best outcomes.

Finally, we need to create an LLM node, which will connect to the OpenAI service, specifically to the gpt-35-turbo text generation model that was previously deployed. Once again, on LLM nodes, multiple variants can be created, tweaking the model’s parameters as well as using different language models to compare the results.

Prompt Flow chat tool settings

The resulting flow should look like this:

Final RAG-Based Flow

Testing Your Chatbot in Prompt Flow

It’s time to test our LLM app. To do this, click on the blue Chat button on the right side of the screen. A chat window will open where we can test the solution.

Testing the RAG-Based chatbot

Thanks to the context provided in the prompt, GPT-3.5 was able to answer questions about Microsoft’s earnings in 2024, even though the data it was trained on only goes up to September 2021. By clicking on View trace below the last response, we can look at the inputs and outputs of each node, as well as the response times and other interesting insights about each one.

Evaluating the Performance of Your LLM Application

To evaluate the quality of the app’s responses, Prompt Flow offers built-in evaluation flows for various metrics such as groundedness, coherence, and relevance, among others. To start the evaluation, click on Evaluate at the top of the screen and upload a .jsonl or .csv file containing the evaluation dataset. The dataset needs to include a set of questions and the app-generated answers; for some metrics, it may also require the context retrieved by the lookup node and the ground-truth answers. You can generate this evaluation dataset with a batch run; one possible line format is sketched below.
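
As an illustration, one line of such a .jsonl file might look like the following. The exact field names depend on the evaluation flow you choose, and the values here are placeholders.

```json
{"question": "What was Microsoft's total revenue in fiscal year 2024?", "answer": "<app-generated answer>", "context": "<chunks retrieved by the lookup node>", "ground_truth": "<reference answer>"}
```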

Deploying and Monitoring Your Document Chatbot

Deploying this app to a production environment is really straightforward. Simply click on the Deploy button at the top of the screen, select the Virtual Machine size, configure settings like the Authentication type, and wait for the endpoint to be ready.

Once deployment is complete, the app will appear under Deployments in the left panel. By clicking on it, you can access key details such as the endpoint URI and Key. Additionally, you’ll find a code snippet for consuming the endpoint in multiple programming languages, a chat interface for testing, and access to logs and monitoring information.
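
For reference, calling the endpoint from Python typically follows the generic pattern for Azure ML managed online endpoints sketched below. The URI, key, and input field names are placeholders; take the real values from the code snippet on the deployment page.

```python
# Sketch of consuming the deployed flow endpoint. ENDPOINT_URI and API_KEY
# are placeholders: copy the real values from the deployment's details page.
import requests

ENDPOINT_URI = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"  # placeholder
API_KEY = "<your-endpoint-key>"  # placeholder

payload = {
    "question": "What was Microsoft's revenue in fiscal year 2024?",
    "chat_history": [],
}
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",  # key-based authentication
}

response = requests.post(ENDPOINT_URI, json=payload, headers=headers)
response.raise_for_status()
print(response.json())
```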

Conclusion: Leveraging Prompt Flow for Developing LLM Apps

Microsoft Azure Prompt Flow allows for the rapid development of LLM applications with minimal coding, offering an intuitive interface to connect tools like LLMs, prompts, and Python scripts. This enables developers to quickly build, test, and refine their applications without needing extensive custom code.

With its built-in evaluation flows and easy deployment options, Prompt Flow makes it simple to test and optimize LLM applications while swiftly moving them into production. However, it does have some limitations, such as less flexibility in text chunking techniques and fewer options for serverless deployment. Despite these constraints, Prompt Flow remains a powerful tool for accelerating LLM-based app development.

 
