Building a RAG-based PDF Question-Answering System with LangChain

Build a robust question-answering app that lets users upload PDFs and ask questions, powered by LangChain and a RAG pipeline with real-time responses.
Kanika Bansal
Last updated: May 29, 2025 | 12 Min Read

This tutorial will help you create a simple PDF question-answering system that lets users get instant, AI-powered answers from their documents. We'll keep it simple but powerful, using LangChain for the RAG pipeline and React for a sleek interface.

Prerequisites

Before we dive in, make sure you have:

  • Python 3.8+
  • Node.js and npm
  • OpenAI API key
  • Some PDFs for testing! 

Understanding RAG and LangChain

What's RAG?

Retrieval-Augmented Generation (RAG) is like giving your AI assistant a perfect memory. Instead of relying only on its training data, RAG lets the AI:

  1. Store and index your documents (Retrieval)
  2. Use this specific information to generate accurate answers (Augmented Generation)

Think of it as the difference between asking someone about a book they read years ago versus letting them look at the book while answering your questions.

How Does RAG Work Under the Hood?

Let's break down how RAG actually processes and retrieves information:

  1. Document to Vectors: First, we convert text into vectors (called embeddings) that AI can understand. Think of embeddings as converting words into coordinates in a huge multi-dimensional space, where similar meanings are closer together. For example, "happy" and "joyful" would be near each other, while "happy" and "table" would be far apart.
  2. Smart Storage: These embeddings are stored in a special database called a vector store (we're using ChromaDB). It's like a smart library that can quickly find similar pieces of text based on their meaning, not just matching exact words.
  3. Question Processing: When you ask a question, we:
    1. Convert your question into an embedding using the same process
    2. Use the vector store to find the most similar text chunks from your documents
    3. Send these relevant chunks along with your question to the LLM for answering
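
To make the retrieval step concrete, here is a tiny, self-contained sketch using made-up three-dimensional vectors (real embeddings come from a model and have hundreds of dimensions, as we'll see later). It assumes numpy is available:

{{qq-border-start}}

# Toy illustration of retrieval: the vectors here are invented, not real embeddings
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pretend each chunk of a document has already been embedded
chunk_vectors = {
    "The invoice total is $420.": np.array([0.9, 0.1, 0.0]),
    "Payment is due within 30 days.": np.array([0.7, 0.3, 0.1]),
    "Our office is closed on Sundays.": np.array([0.0, 0.2, 0.9]),
}

# The question is embedded with the same process, then compared to every chunk
question_vector = np.array([0.8, 0.2, 0.05])  # e.g. "How much do I owe?"

best_chunk = max(chunk_vectors, key=lambda text: cosine_similarity(chunk_vectors[text], question_vector))
print(best_chunk)  # the most similar chunk is what gets sent to the LLM as context

{{qq-border-end}}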

Why LangChain?

LangChain makes building RAG applications super easy by providing:

  • Document loading capabilities
  • Text splitting utilities
  • Vector storage solutions
  • Language model integration
  • And much more!

What We're Building

Our app will:

  • Let users upload up to 5 PDFs
  • Process documents using LangChain and ChromaDB
  • Answer questions about the uploaded documents
  • Show answers with their source references
  • Provide a clean, intuitive interface

Project Structure

We're keeping it super simple with just two main files:

{{qq-border-start}}

pdf-qa-system
├── backend
│   └── app.py
└── frontend
    └── src
        └── PDFQuestionAnswering.jsx

{{qq-border-end}}

Building the Backend

Let's break down our backend implementation into logical steps. First, make sure you have the prerequisites.

Step 1: Setting Up the Environment

{{qq-border-start}}

# Create project directory
mkdir pdf-qa-system
cd pdf-qa-system

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install flask flask-cors python-dotenv langchain langchain-community langchain-openai chromadb sentence-transformers pypdf openai

# Backend (.env)
OPENAI_API_KEY=your-openai-api-key

{{qq-border-end}}

Step 2: Understanding the Backend Components

Our backend consists of three main parts:

  1. PDF Processing: Reading and splitting documents
  2. Embedding Generation: Converting text into vector representations
  3. Question-Answering: Using RAG to generate accurate answers

Let's look at how each part works:

PDF Processing Flow

  1. User uploads PDFs → Temporary storage
  2. PyPDFLoader extracts text
  3. Text gets split into chunks
  4. HuggingFace models create embeddings
  5. ChromaDB stores everything for quick retrieval

Question-Answering Process

  1. User asks a question
  2. System finds relevant document chunks
  3. LLM (GPT) generates an answer using the context
  4. User gets the answer with source references
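
Before we wire this into a web server, here is the same flow condensed into a few lines of plain LangChain. It's a sketch with a placeholder file name ("example.pdf") and assumes OPENAI_API_KEY is set; the full implementation below adds uploads, error handling, and persistence:

{{qq-border-start}}

# Condensed sketch of the whole pipeline (placeholder file name, no Flask yet)
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

documents = PyPDFLoader("example.pdf").load()        # 1. extract text
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(documents)                         # 2. split into chunks
vectorstore = Chroma.from_documents(
    chunks, HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
)                                                    # 3. embed and store
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4o-mini", temperature=0),
    retriever=vectorstore.as_retriever(),
)                                                    # 4. retrieve and answer
print(qa_chain({"query": "What is this document about?"})["result"])

{{qq-border-end}}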

Step 3: Backend Implementation

Let's break down our backend code and understand each component in detail:

{{qq-border-start}}

from flask import Flask, request, jsonify, make_response
from flask_cors import CORS
from werkzeug.utils import secure_filename
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
import tempfile
import os
import logging
from dotenv import load_dotenv

{{qq-border-end}}

Why these imports?

  • Flask and CORS: For creating our web server and handling cross-origin requests
  • LangChain components: For document processing, embeddings, and QA chain
  • Utility imports: For file handling, logging, and environment variables
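
App Setup and Configuration

The snippets in this tutorial focus on the interesting parts, so here is a sketch of the glue code they assume near the top of app.py. The names (MAX_FILES, PERSIST_DIRECTORY, allowed_file, vectorstore) are referenced by the endpoints below; the exact values are reasonable defaults you can adjust:

{{qq-border-start}}

load_dotenv()  # loads OPENAI_API_KEY from the .env file

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__)

# App-wide settings used by the endpoints below
MAX_FILES = 5
ALLOWED_EXTENSIONS = {"pdf"}
PERSIST_DIRECTORY = "./chroma_db"
vectorstore = None  # populated after the first successful upload


def allowed_file(filename):
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS

{{qq-border-end}}

At the bottom of app.py, the usual runner starts the dev server: if __name__ == "__main__": app.run(port=5000, debug=True), which matches the port the frontend's .env points at.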

CORS Configuration

{{qq-border-start}}

CORS(app, resources={
   r"/*": {
       "origins": ["http://localhost:5173", "http://127.0.0.1:5173"],
       "methods": ["GET", "POST", "OPTIONS"],
       "allow_headers": ["Content-Type", "Authorization", "Accept", "Accept-Language",
                        "Connection", "Origin", "Referer", "Sec-Fetch-Dest",
                        "Sec-Fetch-Mode", "Sec-Fetch-Site", "User-Agent",
                        "sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform"],
       "supports_credentials": True,
       "max_age": 3600
   }
})

{{qq-border-end}}

This detailed CORS configuration:

  • Allows requests from our frontend development server
  • Permits necessary HTTP methods
  • Includes all required headers for modern browsers
  • Enables credential support for authenticated requests

PDF Upload Endpoint

{{qq-border-start}}

@app.route('/upload-pdfs', methods=['POST', 'OPTIONS'])
def upload_pdfs():
   if request.method == "OPTIONS":
       response = make_response()
       response.headers.add("Access-Control-Allow-Origin", request.headers.get("Origin", "http://localhost:5173"))
       response.headers.add("Access-Control-Allow-Headers", "*")
       response.headers.add("Access-Control-Allow-Methods", "*")
       response.headers.add("Access-Control-Allow-Credentials", "true")
       return response

   if 'files[]' not in request.files:
       return jsonify({'error': 'No files provided'}), 400
  
   files = request.files.getlist('files[]')
  
   if len(files) > MAX_FILES:
       return jsonify({'error': f'Maximum {MAX_FILES} files allowed'}), 400
  
   try:
       # Initialize text splitter
       text_splitter = RecursiveCharacterTextSplitter(
           chunk_size=1000,
           chunk_overlap=200,
           length_function=len
       )
      
       # Initialize embeddings
       embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
      
       all_documents = []
      
       # Process each PDF file
       for file in files:
            if file and allowed_file(file.filename):
                filename = secure_filename(file.filename)

                # Save the upload to a temporary file, then close the handle
                # before loading so the cleanup below also works on Windows
                with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
                    file.save(tmp_file.name)
                    tmp_path = tmp_file.name

                # Load PDF
                logger.info(f"Processing file: {filename}")
                loader = PyPDFLoader(tmp_path)
                documents = loader.load()

                # Split documents
                split_docs = text_splitter.split_documents(documents)
                all_documents.extend(split_docs)

                # Clean up the temporary file
                os.unlink(tmp_path)
      
       # Create vector store
       global vectorstore
       vectorstore = Chroma.from_documents(
           documents=all_documents,
           embedding=embeddings,
           persist_directory=PERSIST_DIRECTORY
       )
       vectorstore.persist()
      
       response = jsonify({
           'message': f'Successfully processed {len(files)} PDF files',
           'document_chunks': len(all_documents)
       })
       return response
      
   except Exception as e:
       logger.error(f"Error processing files: {str(e)}")
       return jsonify({'error': str(e)}), 500

{{qq-border-end}}

Key implementation details:

  1. File Validation:
    • Checks for proper file extensions
    • Limits number of files (MAX_FILES = 5)
    • Uses secure_filename for safety
  2. Document Processing:

{{qq-border-start}}

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len
)

{{qq-border-end}}

Why these settings?

  • 1000-character chunks balance context size and relevance
  • 200-character overlap prevents context loss at chunk boundaries
  • RecursiveCharacterTextSplitter handles various document structures better than simple splitting
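
If you want to see the splitter in action, here is a quick standalone check; it uses much smaller numbers than the real settings purely so the chunking is visible on a short string:

{{qq-border-start}}

from langchain.text_splitter import RecursiveCharacterTextSplitter

demo_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
chunks = demo_splitter.split_text(
    "RAG systems split long documents into overlapping chunks "
    "so each piece stays small enough to embed and retrieve accurately."
)
for chunk in chunks:
    print(repr(chunk))  # chunks of at most ~50 characters; neighbours may share a few words

{{qq-border-end}}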

  3. Embedding Generation:
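
For reference, this is the line in the upload endpoint that creates the embedding model, followed by an optional sanity check (not part of the app, and assuming numpy is available) that paraphrases really do land close together in vector space:

{{qq-border-start}}

# Initialize embeddings (same line as in the upload endpoint above)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Optional check: semantically similar questions get similar vectors
import numpy as np
v1 = np.array(embeddings.embed_query("What's the weather?"))
v2 = np.array(embeddings.embed_query("How's the temperature outside?"))
v3 = np.array(embeddings.embed_query("The invoice total is $420."))
cosine = lambda a, b: float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cosine(v1, v2), cosine(v1, v3))  # the first pair should score noticeably higher

{{qq-border-end}}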

What's happening here?

  • We use HuggingFace's all-MiniLM-L6-v2 model to create our embeddings
  • Each chunk of text gets converted into a list of numbers (a vector)
  • These vectors capture the meaning of the text in a way computers can understand
  • Similar meanings will have similar vectors, even if they use different words
  • For example, "What's the weather?" and "How's the temperature outside?" would have similar vectors

Why this specific model?

  • Balances performance and resource usage
  • Good at understanding meaning across different phrasings
  • Works well with ChromaDB for quick similarity searches

  4. Vector Store Creation:

{{qq-border-start}}

vectorstore = Chroma.from_documents(
    documents=all_documents,
    embedding=embeddings,
    persist_directory=PERSIST_DIRECTORY
)

{{qq-border-end}}

Why ChromaDB? 

  • Efficiently stores and organizes our embeddings
  • When you ask a question, it quickly finds the most relevant text chunks by comparing vector similarities 
  • Think of it like a smart search engine that understands meaning, not just keywords 
  • Saves everything to disk so we don't need to regenerate embeddings every time
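
Once the store exists, you can also query it directly, which is handy for checking retrieval quality before the LLM gets involved. A quick sketch (the question string is just an example):

{{qq-border-start}}

# Peek at what retrieval returns, independent of the LLM
docs = vectorstore.similarity_search("What is the refund policy?", k=3)
for doc in docs:
    print(doc.metadata.get("source"), doc.page_content[:100])

# Because the store is persisted, a later run can reload it
# without re-processing the PDFs
vectorstore = Chroma(persist_directory=PERSIST_DIRECTORY, embedding_function=embeddings)

{{qq-border-end}}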

Question-Answering Endpoint

{{qq-border-start}}

@app.route('/ask-question', methods=['POST', 'OPTIONS'])
def ask_question():
   if request.method == "OPTIONS":
       return jsonify({"message": "OK"}), 200

   if not vectorstore:
       return jsonify({'error': 'No documents have been uploaded yet'}), 400
  
   try:
       data = request.get_json()
       if not data or 'text' not in data:
           return jsonify({'error': 'No question provided'}), 400
      
       question = data['text']
      
       # Initialize LLM
       llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)
      
       # Create QA chain
       qa_chain = RetrievalQA.from_chain_type(
           llm=llm,
           chain_type="stuff",
           retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
           return_source_documents=True
       )
      
       # Get answer
       custom_prompt = f"Based on the provided contexts from documents and your knowledge, answer concisely:\n\nQuestion: {question}"
       result = qa_chain({"query": custom_prompt})
      
       response = jsonify({
           'answer': result["result"],
           'sources': [doc.page_content for doc in result["source_documents"]]
       })
       return response
      
   except Exception as e:
       logger.error(f"Error processing question: {str(e)}")
       return jsonify({'error': str(e)}), 500

{{qq-border-end}}

Key components:

  1. LLM Setup: 

{{qq-border-start}}

llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

{{qq-border-end}}

Why these settings?

  • temperature=0 for consistent, factual responses
  • gpt-4o-mini offers good comprehension and response quality while staying fast and inexpensive

  2. QA Chain Configuration:

{{qq-border-start}}

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True
)

{{qq-border-end}}

Implementation choices:

  • "stuff" chain type: simply "stuffs" all retrieved chunks into a single prompt; simple and effective for moderate context lengths
  • k=5: Retrieves top 5 most relevant chunks for context
  • return_source_documents: Enables source attribution in responses
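
One optional tweak we don't use in this tutorial: if the retrieved chunks feel repetitive, the retriever can be switched to maximal marginal relevance (MMR), which trades a little raw similarity for diversity:

{{qq-border-start}}

# Optional: MMR retrieval favours diverse chunks over near-duplicates
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5}
)

{{qq-border-end}}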

  3. Custom Prompt:

{{qq-border-start}}

custom_prompt = f"Based on the provided contexts from documents and your knowledge, answer concisely:\n\nQuestion: {question}"

{{qq-border-end}}

Why this format?

  • Clear instruction for concise answers
  • Explicitly mentions context usage
  • Simple structure for consistent responses
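
If you'd rather bake this instruction into the chain instead of prepending it to every query, RetrievalQA also accepts a custom prompt template. A sketch (for the "stuff" chain the template must expose {context} and {question} variables):

{{qq-border-start}}

from langchain.prompts import PromptTemplate

qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Based on the provided context from the documents and your knowledge, "
        "answer concisely.\n\nContext:\n{context}\n\nQuestion: {question}"
    ),
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": qa_prompt}
)

result = qa_chain({"query": question})  # pass the raw question, no manual prefix needed

{{qq-border-end}}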

Running and Testing the Backend

Start the backend server:

{{qq-border-start}}

python app.py

{{qq-border-end}}

You can test the endpoints using tools like Postman or curl:

  • /upload-pdfs: POST request with PDF files
  • /ask-question: POST request with your question
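
For example, with the backend running on its default port (sample.pdf stands in for one of your own files):

{{qq-border-start}}

# Upload one or more PDFs (repeat -F for each file, up to 5)
curl -X POST http://localhost:5000/upload-pdfs \
  -F "files[]=@sample.pdf"

# Ask a question about the uploaded documents
curl -X POST http://localhost:5000/ask-question \
  -H "Content-Type: application/json" \
  -d '{"text": "What is this document about?"}'

{{qq-border-end}}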

Frontend Implementation

Now that we have our backend working, let's create a simple interface to demonstrate the functionality. This part is optional but helpful for testing and demonstration purposes.

Frontend Setup

{{qq-border-start}}

# Create React project
npm create vite@latest frontend -- --template react
cd frontend

# Install dependencies
npm install lucide-react @mui/material @emotion/react @emotion/styled

# Frontend (.env)
VITE_API_URL=http://localhost:5000

{{qq-border-end}}

Frontend Code

Here's our React component for the interface:

{{qq-border-start}}

import React, { useState } from "react";
import { Upload, MessageSquare, FileText, AlertCircle } from "lucide-react";
import { Alert, AlertTitle } from "@mui/material";

const API_URL = import.meta.env.VITE_API_URL;

const PDFQuestionAnswering = () => {
 const [files, setFiles] = useState([]);
 const [isUploading, setIsUploading] = useState(false);
 const [uploadStatus, setUploadStatus] = useState(null);
 const [question, setQuestion] = useState("");
 const [answer, setAnswer] = useState(null);
 const [isLoading, setIsLoading] = useState(false);

 const handleFileUpload = async (e) => {
   const selectedFiles = Array.from(e.target.files);

   if (selectedFiles.length > 5) {
     setUploadStatus({ type: "error", message: "Maximum 5 files allowed" });
     return;
   }

   setIsUploading(true);
   setFiles(selectedFiles);

   const formData = new FormData();
   selectedFiles.forEach((file) => {
     formData.append("files[]", file);
   });

   try {
     const response = await fetch(`${API_URL}/upload-pdfs`, {
       method: "POST",
       body: formData,
       headers: {
         Accept: "application/json", // Allow JSON responses
       },
     });

     if (!response.ok) {
       const errorData = await response.json();
       throw new Error(errorData.error || "Upload failed");
     }

     const data = await response.json();
     setUploadStatus({
       type: "success",
       message: `${data.message} (${data.document_chunks} chunks created)`,
     });
   } catch (error) {
     setUploadStatus({ type: "error", message: error.message });
   } finally {
     setIsUploading(false);
   }
 };

 const handleQuestionSubmit = async (e) => {
   e.preventDefault();
   if (!question.trim()) return;

   setIsLoading(true);

   try {
     const response = await fetch(`${API_URL}/ask-question`, {
       method: "POST",
       headers: {
         "Content-Type": "application/json",
       },
       body: JSON.stringify({ text: question }),
     });

     if (!response.ok) {
       const errorData = await response.json();
       throw new Error(errorData.error || "Failed to get answer");
     }

     const data = await response.json();
     setAnswer(data);
   } catch (error) {
     setUploadStatus({ type: "error", message: error.message });
   } finally {
     setIsLoading(false);
   }
 };

 return (
   <div>
     <div>
       <h1>PDF Question Answering System</h1>

       {/* File Upload Section */}
       <div>
         <div>
           <Upload />
         </div>
         <div>
           <label>
             Choose PDFs
             <input
               type="file"
               multiple
               accept=".pdf"
               onChange={handleFileUpload}
             />
           </label>
         </div>
         <p>Upload up to 5 PDF files</p>

         {files.length > 0 && (
           <div>
             <h3>Uploaded Files:</h3>
             <ul
               style={{
                 display: "flex",
                 justifyContent: "center",
                 alignItems: "center",
                 gap: "40px",
               }}
             >
               {files.map((file, index) => (
                 <li key={index}>
                   <FileText />
                   {file.name}
                 </li>
               ))}
             </ul>
           </div>
         )}
       </div>

       {/* Status Messages */}
       {uploadStatus && (
         <Alert
           severity={uploadStatus.type === "error" ? "error" : "info"}
           sx={{
             display: "flex",
             justifyContent: "center",
             alignItems: "center",
           }}
         >
           <AlertTitle>
             {uploadStatus.type === "error" ? "Error" : "Info"}
           </AlertTitle>
           {uploadStatus.message}
         </Alert>
       )}

       {/* Question Input */}
       <form onSubmit={handleQuestionSubmit}>
         <div>
           <input
             type="text"
             value={question}
             onChange={(e) => setQuestion(e.target.value)}
             placeholder="Ask a question about your PDFs..."
             disabled={isUploading || files.length === 0}
             style={{ width: "500px" }}
           />
           <button
             type="submit"
             disabled={isLoading || isUploading || files.length === 0}
           >
             {isLoading ? "Thinking..." : "Ask"}
           </button>
         </div>
       </form>

       {/* Answer Display */}
       {answer && (
         <div>
           <div>
             <h3>
               <MessageSquare />
               Answer
             </h3>
             <p>{answer.answer}</p>
           </div>

           <div>
             <h4>Sources:</h4>
             <ul>
               {answer.sources.map((source, index) => (
                 <li key={index}>{source}</li>
               ))}
             </ul>
           </div>
         </div>
       )}

       {/* Loading State */}
       {isLoading && (
         <div>
           <div>Processing your question...</div>
         </div>
       )}
     </div>
   </div>
 );
};

export default PDFQuestionAnswering;

{{qq-border-end}}

Frontend Experience

  • Simple multi-file upload (up to 5 PDFs)
  • Simple question input
  • Clean answer display with sources
  • Real-time status updates

Running the Complete Application

Start the frontend:

{{qq-border-start}}

cd frontend
npm run dev

{{qq-border-end}}

Visit http://localhost:5173 to test your application!

Pro Tips

  1. Quality Matters: Clean, well-formatted PDFs work best
  2. Be Specific: Ask clear, focused questions
  3. Check Sources: The system shows you where it got the information
  4. Multiple PDFs: Upload related documents for broader context

Common Issues and Solutions

  • CORS Problems
    • Check the Flask CORS configuration
    • Verify the frontend URL matches the allowed origins
  • Upload Issues
    • Check file size limits
    • Ensure PDFs are not corrupted
    • Verify file permissions
  • No Response
    • Check your OpenAI API key
    • Verify the backend is running
    • Check the console for errors

And you’re done!

The beauty of this system is its simplicity - just two main files handling all the magic. Yet it's powerful enough to:

  • Process multiple PDFs
  • Understand document context
  • Generate relevant answers
  • Provide source references

Whether you're building this for personal use or as a starting point for a larger project, the architecture we've created here is both scalable and adaptable.

Happy coding!

For more such developer resources and content, join us on our free Discord community.


Author’s Profile
Kanika Bansal
Kanika is a Principal Product Manager at Murf AI, specializing in AI-driven voice technology. Having previously worked on Amazon's Alexa AI and Nova, she brings deep expertise in artificial intelligence, speech synthesis, and product innovation. At Murf, Kanika focuses on enhancing AI voice solutions that empower content creators, businesses, and developers, bridging the gap between cutting-edge AI advancements and real-world applications.