Building a RAG-based PDF Question-Answering System with LangChain

Build a robust question-answering app that lets users upload PDFs and ask questions, powered by LangChain and a RAG pipeline with real-time responses.
Kanika Bansal
Last updated: May 29, 2025 | 12 Min Read

This tutorial will help you create a simple PDF question-answering system that lets users get instant, AI-powered answers from their documents. We'll keep it simple but powerful, using LangChain for the RAG pipeline and React for a sleek interface.

Prerequisites

Before we dive in, make sure you have:

  • Python 3.8+
  • Node.js and npm
  • OpenAI API key
  • Some PDFs for testing! 

Understanding RAG and LangChain

What's RAG?

Retrieval-Augmented Generation (RAG) is like giving your AI assistant a perfect memory. Instead of relying only on its training data, RAG lets the AI:

  1. Store and index your documents (Retrieval)
  2. Use this specific information to generate accurate answers (Augmented Generation)

Think of it as the difference between asking someone about a book they read years ago versus letting them look at the book while answering your questions.

How Does RAG Work Under the Hood?

Let's break down how RAG actually processes and retrieves information:

  1. Document to Vectors: First, we convert text into vectors (called embeddings) that AI can understand. Think of embeddings as converting words into coordinates in a huge multi-dimensional space, where similar meanings are closer together. For example, "happy" and "joyful" would be near each other, while "happy" and "table" would be far apart.
  2. Smart Storage: These embeddings are stored in a special database called a vector store (we're using ChromaDB). It's like a smart library that can quickly find similar pieces of text based on their meaning, not just matching exact words.
  3. Question Processing: When you ask a question, we:
    1. Convert your question into an embedding using the same process
    2. Use the vector store to find the most similar text chunks from your documents
    3. Send these relevant chunks along with your question to the LLM for answering
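
To make the retrieval step concrete, here is a tiny, self-contained sketch using made-up three-dimensional vectors (real embeddings come from a model and have hundreds of dimensions, as we'll see later). It assumes numpy is available:

{{qq-border-start}}

# Toy illustration of retrieval: the vectors here are invented, not real embeddings
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pretend each chunk of a document has already been embedded
chunk_vectors = {
    "The invoice total is $420.": np.array([0.9, 0.1, 0.0]),
    "Payment is due within 30 days.": np.array([0.7, 0.3, 0.1]),
    "Our office is closed on Sundays.": np.array([0.0, 0.2, 0.9]),
}

# The question is embedded with the same process, then compared to every chunk
question_vector = np.array([0.8, 0.2, 0.05])  # e.g. "How much do I owe?"

best_chunk = max(chunk_vectors, key=lambda text: cosine_similarity(chunk_vectors[text], question_vector))
print(best_chunk)  # the most similar chunk is what gets sent to the LLM as context

{{qq-border-end}}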

Why LangChain?

LangChain makes building RAG applications super easy by providing:

  • Document loading capabilities
  • Text splitting utilities
  • Vector storage solutions
  • Language model integration
  • And much more!

What We're Building

Our app will:

  • Let users upload up to 5 PDFs
  • Process documents using LangChain and ChromaDB
  • Answer questions about the uploaded documents
  • Show answers with their source references
  • Provide a clean, intuitive interface

Project Structure

We're keeping it super simple with just two main files:

{{qq-border-start}}

pdf-qa-system
├── backend
│   └── app.py
└── frontend
    └── src
        └── PDFQuestionAnswering.jsx

{{qq-border-end}}

Building the Backend

Let's break down our backend implementation into logical steps. First, make sure you have the prerequisites.

Step 1: Setting Up the Environment

{{qq-border-start}}

# Create project directory
mkdir pdf-qa-system
cd pdf-qa-system

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install flask flask-cors python-dotenv langchain langchain-community langchain-openai chromadb sentence-transformers pypdf openai

# Backend (.env)
OPENAI_API_KEY=your-openai-api-key

{{qq-border-end}}

Step 2: Understanding the Backend Components

Our backend consists of three main parts:

  1. PDF Processing: Reading and splitting documents
  2. Embedding Generation: Converting text into vector representations
  3. Question-Answering: Using RAG to generate accurate answers

Let's look at how each part works:

PDF Processing Flow

  1. User uploads PDFs → Temporary storage
  2. PyPDFLoader extracts text
  3. Text gets split into chunks
  4. HuggingFace models create embeddings
  5. ChromaDB stores everything for quick retrieval

Question-Answering Process

  1. User asks a question
  2. System finds relevant document chunks
  3. LLM (GPT) generates an answer using the context
  4. User gets the answer with source references
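
Before we wire this into a web server, here is the same flow condensed into a few lines of plain LangChain. It's a sketch with a placeholder file name ("example.pdf") and assumes OPENAI_API_KEY is set; the full implementation below adds uploads, error handling, and persistence:

{{qq-border-start}}

# Condensed sketch of the whole pipeline (placeholder file name, no Flask yet)
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

documents = PyPDFLoader("example.pdf").load()        # 1. extract text
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(documents)                         # 2. split into chunks
vectorstore = Chroma.from_documents(
    chunks, HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
)                                                    # 3. embed and store
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4o-mini", temperature=0),
    retriever=vectorstore.as_retriever(),
)                                                    # 4. retrieve and answer
print(qa_chain({"query": "What is this document about?"})["result"])

{{qq-border-end}}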

Step 3: Backend Implementation

Let's break down our backend code and understand each component in detail:

{{qq-border-start}}

from flask import Flask, request, jsonify, make_response
from flask_cors import CORS
from werkzeug.utils import secure_filename
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
import tempfile
import os
import logging
from dotenv import load_dotenv

{{qq-border-end}}

Why these imports?

  • Flask and CORS: For creating our web server and handling cross-origin requests
  • LangChain components: For document processing, embeddings, and QA chain
  • Utility imports: For file handling, logging, and environment variables
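
App Setup and Configuration

The snippets in this tutorial focus on the interesting parts, so here is a sketch of the glue code they assume near the top of app.py. The names (MAX_FILES, PERSIST_DIRECTORY, allowed_file, vectorstore) are referenced by the endpoints below; the exact values are reasonable defaults you can adjust:

{{qq-border-start}}

load_dotenv()  # loads OPENAI_API_KEY from the .env file

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__)

# App-wide settings used by the endpoints below
MAX_FILES = 5
ALLOWED_EXTENSIONS = {"pdf"}
PERSIST_DIRECTORY = "./chroma_db"
vectorstore = None  # populated after the first successful upload


def allowed_file(filename):
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS

{{qq-border-end}}

At the bottom of app.py, the usual runner starts the dev server: if __name__ == "__main__": app.run(port=5000, debug=True), which matches the port the frontend's .env points at.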

CORS Configuration

{{qq-border-start}}

CORS(app, resources={
   r"/*": {
       "origins": ["http://localhost:5173", "http://127.0.0.1:5173"],
       "methods": ["GET", "POST", "OPTIONS"],
       "allow_headers": ["Content-Type", "Authorization", "Accept", "Accept-Language",
                        "Connection", "Origin", "Referer", "Sec-Fetch-Dest",
                        "Sec-Fetch-Mode", "Sec-Fetch-Site", "User-Agent",
                        "sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform"],
       "supports_credentials": True,
       "max_age": 3600
   }
})

{{qq-border-end}}

This detailed CORS configuration:

  • Allows requests from our frontend development server
  • Permits necessary HTTP methods
  • Includes all required headers for modern browsers
  • Enables credential support for authenticated requests

PDF Upload Endpoint

{{qq-border-start}}

@app.route('/upload-pdfs', methods=['POST', 'OPTIONS'])
def upload_pdfs():
   if request.method == "OPTIONS":
       response = make_response()
       response.headers.add("Access-Control-Allow-Origin", request.headers.get("Origin", "http://localhost:5173"))
       response.headers.add("Access-Control-Allow-Headers", "*")
       response.headers.add("Access-Control-Allow-Methods", "*")
       response.headers.add("Access-Control-Allow-Credentials", "true")
       return response

   if 'files[]' not in request.files:
       return jsonify({'error': 'No files provided'}), 400
  
   files = request.files.getlist('files[]')
  
   if len(files) > MAX_FILES:
       return jsonify({'error': f'Maximum {MAX_FILES} files allowed'}), 400
  
   try:
       # Initialize text splitter
       text_splitter = RecursiveCharacterTextSplitter(
           chunk_size=1000,
           chunk_overlap=200,
           length_function=len
       )
      
       # Initialize embeddings
       embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
      
       all_documents = []
      
       # Process each PDF file
       for file in files:
            if file and allowed_file(file.filename):
                filename = secure_filename(file.filename)

                # Save the upload to a temporary file, then close the handle
                # before loading so the cleanup below also works on Windows
                with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
                    file.save(tmp_file.name)
                    tmp_path = tmp_file.name

                # Load PDF
                logger.info(f"Processing file: {filename}")
                loader = PyPDFLoader(tmp_path)
                documents = loader.load()

                # Split documents
                split_docs = text_splitter.split_documents(documents)
                all_documents.extend(split_docs)

                # Clean up the temporary file
                os.unlink(tmp_path)
      
       # Create vector store
       global vectorstore
       vectorstore = Chroma.from_documents(
           documents=all_documents,
           embedding=embeddings,
           persist_directory=PERSIST_DIRECTORY
       )
       vectorstore.persist()
      
       response = jsonify({
           'message': f'Successfully processed {len(files)} PDF files',
           'document_chunks': len(all_documents)
       })
       return response
      
   except Exception as e:
       logger.error(f"Error processing files: {str(e)}")
       return jsonify({'error': str(e)}), 500

{{qq-border-end}}

Key implementation details:

  1. File Validation:
    • Checks for proper file extensions
    • Limits number of files (MAX_FILES = 5)
    • Uses secure_filename for safety
  2. Document Processing:

{{qq-border-start}}

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len
)

{{qq-border-end}}

Why these settings?

  • 1000-character chunks balance context size and relevance
  • 200-character overlap prevents context loss at chunk boundaries
  • RecursiveCharacterTextSplitter handles various document structures better than simple splitting
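
If you want to see the splitter in action, here is a quick standalone check; it uses much smaller numbers than the real settings purely so the chunking is visible on a short string:

{{qq-border-start}}

from langchain.text_splitter import RecursiveCharacterTextSplitter

demo_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
chunks = demo_splitter.split_text(
    "RAG systems split long documents into overlapping chunks "
    "so each piece stays small enough to embed and retrieve accurately."
)
for chunk in chunks:
    print(repr(chunk))  # chunks of at most ~50 characters; neighbours may share a few words

{{qq-border-end}}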

  3. Embedding Generation:
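
For reference, this is the line in the upload endpoint that creates the embedding model, followed by an optional sanity check (not part of the app, and assuming numpy is available) that paraphrases really do land close together in vector space:

{{qq-border-start}}

# Initialize embeddings (same line as in the upload endpoint above)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Optional check: semantically similar questions get similar vectors
import numpy as np
v1 = np.array(embeddings.embed_query("What's the weather?"))
v2 = np.array(embeddings.embed_query("How's the temperature outside?"))
v3 = np.array(embeddings.embed_query("The invoice total is $420."))
cosine = lambda a, b: float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cosine(v1, v2), cosine(v1, v3))  # the first pair should score noticeably higher

{{qq-border-end}}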

What's happening here?

  • We use HuggingFace's all-MiniLM-L6-v2 model to create our embeddings
  • Each chunk of text gets converted into a list of numbers (a vector)
  • These vectors capture the meaning of the text in a way computers can understand
  • Similar meanings will have similar vectors, even if they use different words
  • For example, "What's the weather?" and "How's the temperature outside?" would have similar vectors

Why this specific model?

  • Balances performance and resource usage
  • Good at understanding meaning across different phrasings
  • Works well with ChromaDB for quick similarity searches

  4. Vector Store Creation:

{{qq-border-start}}

vectorstore = Chroma.from_documents(
    documents=all_documents,
    embedding=embeddings,
    persist_directory=PERSIST_DIRECTORY
)

{{qq-border-end}}

Why ChromaDB? 

  • Efficiently stores and organizes our embeddings
  • When you ask a question, it quickly finds the most relevant text chunks by comparing vector similarities 
  • Think of it like a smart search engine that understands meaning, not just keywords 
  • Saves everything to disk so we don't need to regenerate embeddings every time
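
Once the store exists, you can also query it directly, which is handy for checking retrieval quality before the LLM gets involved. A quick sketch (the question string is just an example):

{{qq-border-start}}

# Peek at what retrieval returns, independent of the LLM
docs = vectorstore.similarity_search("What is the refund policy?", k=3)
for doc in docs:
    print(doc.metadata.get("source"), doc.page_content[:100])

# Because the store is persisted, a later run can reload it
# without re-processing the PDFs
vectorstore = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embeddings)

{{qq-border-end}}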

Question-Answering Endpoint

{{qq-border-start}}

@app.route('/ask-question', methods=['POST', 'OPTIONS'])
def ask_question():
   if request.method == "OPTIONS":
       return jsonify({"message": "OK"}), 200

   if not vectorstore:
       return jsonify({'error': 'No documents have been uploaded yet'}), 400
  
   try:
       data = request.get_json()
       if not data or 'text' not in data:
           return jsonify({'error': 'No question provided'}), 400
      
       question = data['text']
      
       # Initialize LLM
       llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)
      
       # Create QA chain
       qa_chain = RetrievalQA.from_chain_type(
           llm=llm,
           chain_type="stuff",
           retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
           return_source_documents=True
       )
      
       # Get answer
       custom_prompt = f"Based on the provided contexts from documents and your knowledge, answer concisely:\n\nQuestion: {question}"
       result = qa_chain({"query": custom_prompt})
      
       response = jsonify({
           'answer': result["result"],
           'sources': [doc.page_content for doc in result["source_documents"]]
       })
       return response
      
   except Exception as e:
       logger.error(f"Error processing question: {str(e)}")
       return jsonify({'error': str(e)}), 500

{{qq-border-end}}

Key components:

  1. LLM Setup: 

{{qq-border-start}}

llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

{{qq-border-end}}

Why these settings?

  • temperature=0 for consistent, factual responses
  • gpt-4o-mini offers good comprehension and response quality while staying fast and inexpensive

  2. QA Chain Configuration:

{{qq-border-start}}

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True
)

{{qq-border-end}}

Implementation choices:

  • "stuff" chain type: simply "stuffs" all retrieved chunks into a single prompt; simple and effective for moderate context lengths
  • k=5: Retrieves top 5 most relevant chunks for context
  • return_source_documents: Enables source attribution in responses
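
One optional tweak we don't use in this tutorial: if the retrieved chunks feel repetitive, the retriever can be switched to maximal marginal relevance (MMR), which trades a little raw similarity for diversity:

{{qq-border-start}}

# Optional: MMR retrieval favours diverse chunks over near-duplicates
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5}
)

{{qq-border-end}}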

  3. Custom Prompt:

{{qq-border-start}}

custom_prompt = f"Based on the provided contexts from documents and your knowledge, answer concisely:\n\nQuestion: {question}"

{{qq-border-end}}

Why this format?

  • Clear instruction for concise answers
  • Explicitly mentions context usage
  • Simple structure for consistent responses
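
If you'd rather bake this instruction into the chain instead of prepending it to every query, RetrievalQA also accepts a custom prompt template. A sketch (for the "stuff" chain the template must expose {context} and {question} variables):

{{qq-border-start}}

from langchain.prompts import PromptTemplate

qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Based on the provided context from the documents and your knowledge, "
        "answer concisely.\n\nContext:\n{context}\n\nQuestion: {question}"
    ),
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": qa_prompt}
)

result = qa_chain({"query": question})  # pass the raw question, no manual prefix needed

{{qq-border-end}}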

Running and Testing the Backend

Start the backend server:

{{qq-border-start}}

python app.py

{{qq-border-end}}

You can test the endpoints using tools like Postman or curl:

  • /upload-pdfs: POST request with PDF files
  • /ask-question: POST request with your question
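
For example, with the backend running on its default port (sample.pdf stands in for one of your own files):

{{qq-border-start}}

# Upload one or more PDFs (repeat -F for each file, up to 5)
curl -X POST http://localhost:5000/upload-pdfs \
  -F "files[]=@sample.pdf"

# Ask a question about the uploaded documents
curl -X POST http://localhost:5000/ask-question \
  -H "Content-Type: application/json" \
  -d '{"text": "What is this document about?"}'

{{qq-border-end}}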

Frontend Implementation

Now that we have our backend working, let's create a simple interface to demonstrate the functionality. This part is optional but helpful for testing and demonstration purposes.

Frontend Setup

{{qq-border-start}}

# Create React project
npm create vite@latest frontend -- --template react
cd frontend

# Install dependencies
npm install lucide-react @mui/material @emotion/react @emotion/styled

# Frontend (.env)
VITE_API_URL=http://localhost:5000

{{qq-border-end}}

Frontend Code

Here's our React component for the interface:

{{qq-border-start}}

import React, { useState } from "react";
import { Upload, MessageSquare, FileText, AlertCircle } from "lucide-react";
import { Alert, AlertTitle } from "@mui/material";

const API_URL = import.meta.env.VITE_API_URL;

const PDFQuestionAnswering = () => {
 const [files, setFiles] = useState([]);
 const [isUploading, setIsUploading] = useState(false);
 const [uploadStatus, setUploadStatus] = useState(null);
 const [question, setQuestion] = useState("");
 const [answer, setAnswer] = useState(null);
 const [isLoading, setIsLoading] = useState(false);

 const handleFileUpload = async (e) => {
   const selectedFiles = Array.from(e.target.files);

   if (selectedFiles.length > 5) {
     setUploadStatus({ type: "error", message: "Maximum 5 files allowed" });
     return;
   }

   setIsUploading(true);
   setFiles(selectedFiles);

   const formData = new FormData();
   selectedFiles.forEach((file) => {
     formData.append("files[]", file);
   });

   try {
     const response = await fetch(`${API_URL}/upload-pdfs`, {
       method: "POST",
       body: formData,
       headers: {
         Accept: "application/json", // Allow JSON responses
       },
     });

     if (!response.ok) {
       const errorData = await response.json();
       throw new Error(errorData.error || "Upload failed");
     }

     const data = await response.json();
     setUploadStatus({
       type: "success",
       message: `${data.message} (${data.document_chunks} chunks created)`,
     });
   } catch (error) {
     setUploadStatus({ type: "error", message: error.message });
   } finally {
     setIsUploading(false);
   }
 };

 const handleQuestionSubmit = async (e) => {
   e.preventDefault();
   if (!question.trim()) return;

   setIsLoading(true);

   try {
     const response = await fetch(`${API_URL}/ask-question`, {
       method: "POST",
       headers: {
         "Content-Type": "application/json",
       },
       body: JSON.stringify({ text: question }),
     });

     if (!response.ok) {
       const errorData = await response.json();
       throw new Error(errorData.error || "Failed to get answer");
     }

     const data = await response.json();
     setAnswer(data);
   } catch (error) {
     setUploadStatus({ type: "error", message: error.message });
   } finally {
     setIsLoading(false);
   }
 };

 return (
   <div>
     <div>
       <h1>PDF Question Answering System</h1>

       {/* File Upload Section */}
       <div>
         <div>
           <Upload />
         </div>
         <div>
           <label>
             Choose PDFs
             <input
               type="file"
               multiple
               accept=".pdf"
               onChange={handleFileUpload}
             />
           </label>
         </div>
         <p>Upload up to 5 PDF files</p>

         {files.length > 0 && (
           <div>
             <h3>Uploaded Files:</h3>
             <ul
               style={{
                 display: "flex",
                 justifyContent: "center",
                 alignItems: "center",
                 gap: "40px",
               }}
             >
               {files.map((file, index) => (
                 <li key={index}>
                   <FileText />
                   {file.name}
                 </li>
               ))}
             </ul>
           </div>
         )}
       </div>

       {/* Status Messages */}
       {uploadStatus && (
         <Alert
           severity={uploadStatus.type === "error" ? "error" : "info"}
           sx={{
             display: "flex",
             justifyContent: "center",
             alignItems: "center",
           }}
         >
           <AlertTitle>
             {uploadStatus.type === "error" ? "Error" : "Info"}
           </AlertTitle>
           {uploadStatus.message}
         </Alert>
       )}

       {/* Question Input */}
       <form onSubmit={handleQuestionSubmit}>
         <div>
           <input
             type="text"
             value={question}
             onChange={(e) => setQuestion(e.target.value)}
             placeholder="Ask a question about your PDFs..."
             disabled={isUploading || files.length === 0}
             style={{ width: "500px" }}
           />
           <button
             type="submit"
             disabled={isLoading || isUploading || files.length === 0}
           >
             {isLoading ? "Thinking..." : "Ask"}
           </button>
         </div>
       </form>

       {/* Answer Display */}
       {answer && (
         <div>
           <div>
             <h3>
               <MessageSquare />
               Answer
             </h3>
             <p>{answer.answer}</p>
           </div>

           <div>
             <h4>Sources:</h4>
             <ul>
               {answer.sources.map((source, index) => (
                 <li key={index}>{source}</li>
               ))}
             </ul>
           </div>
         </div>
       )}

       {/* Loading State */}
       {isLoading && (
         <div>
           <div>Processing your question...</div>
         </div>
       )}
     </div>
   </div>
 );
};

export default PDFQuestionAnswering;

{{qq-border-end}}

Frontend Experience

  • Simple multi-file upload (up to 5 PDFs)
  • Simple question input
  • Clean answer display with sources
  • Real-time status updates

Running the Complete Application

Start the frontend:

{{qq-border-start}}

cd frontend
npm run dev

{{qq-border-end}}

Visit http://localhost:5173 to test your application!

Pro Tips

  1. Quality Matters: Clean, well-formatted PDFs work best
  2. Be Specific: Ask clear, focused questions
  3. Check Sources: The system shows you where it got the information
  4. Multiple PDFs: Upload related documents for broader context

Common Issues and Solutions

  • CORS Problems
    • Check the Flask CORS configuration
    • Verify the frontend URL matches the allowed origins
  • Upload Issues
    • Check file size limits
    • Ensure PDFs are not corrupted
    • Verify file permissions
  • No Response
    • Check your OpenAI API key
    • Verify the backend is running
    • Check the console for errors

And you’re done!

The beauty of this system is its simplicity - just two main files handling all the magic. Yet it's powerful enough to:

  • Process multiple PDFs
  • Understand document context
  • Generate relevant answers
  • Provide source references

Whether you're building this for personal use or as a starting point for a larger project, the architecture we've created here is both scalable and adaptable.

Happy coding!

For more such developer resources and content, join us on our free Discord community.


Author’s Profile
Kanika Bansal
Kanika is a Principal Product Manager at Murf AI, specializing in AI-driven voice technology. Having previously worked on Amazon's Alexa AI and Nova, she brings deep expertise in artificial intelligence, speech synthesis, and product innovation. At Murf, Kanika focuses on enhancing AI voice solutions that empower content creators, businesses, and developers, bridging the gap between cutting-edge AI advancements and real-world applications.