
Building a RAG System for Cybersecurity Compliance: A Simple POC with LangChain v1+


Introduction

⚠️ This is a Proof of Concept (POC) - This article demonstrates the fundamentals of building a RAG system using modern tools. It’s intentionally simple and unoptimized to focus on core concepts. Do not use this in production without significant enhancements.

The Challenge

Organizations dealing with cybersecurity compliance face a common problem: regulations like ENS (Spain), NIS2 (EU), GDPR, and DORA span hundreds of pages. Finding specific answers requires manual PDF searches or expensive consultants.

The Solution

Retrieval-Augmented Generation (RAG) solves this by combining:

  1. Semantic search to find relevant document chunks.
  2. LLMs to synthesize natural language answers from those chunks.

This post walks through building a minimal RAG system from scratch using LangChain v1, FAISS, an OpenAI-compatible embedding API, FastAPI, and Streamlit.

📦 Full code available: The complete project with Docker Compose, uv package manager, FastAPI backend, and Streamlit frontend is on GitHub.


System Architecture

Simple three-tier architecture:

System Architecture

Flow:

  1. User asks a question via Streamlit UI
  2. FastAPI backend receives the query
  3. LangChain retrieves relevant chunks from FAISS
  4. LLM generates answer based on retrieved context
  5. Response with sources returned to user

Tech Stack: Python, LangChain v1 (LCEL), FAISS, Azure OpenAI (or any OpenAI-compatible API), FastAPI, and Streamlit, packaged with Docker Compose.

RAG Workflow


How RAG Works (Simplified)

Before diving into code, understand the RAG workflow:

Indexing Phase (run once):

  1. Load PDFs.
  2. Split into chunks (~1000 chars).
  3. Generate embeddings (convert text → vectors) — Learn more about embeddings.
  4. Store in FAISS vector database.

Indexing Phase
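The overlap idea in step 2 can be made concrete with a naive fixed-size splitter (illustration only; the real RecursiveCharacterTextSplitter used below also prefers paragraph and line breaks over mid-sentence cuts):

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size splitter: each chunk repeats the last `overlap`
    characters of the previous one, so sentences at boundaries survive."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

demo = split_with_overlap("x" * 2500)
print([len(c) for c in demo])  # [1000, 1000, 900, 100]
```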

Query Phase (per request):

  1. Convert user question → embedding.
  2. Find top-k most similar chunks in FAISS (cosine similarity).
  3. Pass chunks + question to LLM with prompt.
  4. LLM generates answer based on context.
  5. Return answer + source citations.

Key insight: The LLM never “knows” the regulations; it only sees the retrieved chunks. Grounding answers in retrieved text this way greatly reduces (though does not eliminate) hallucination.
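Step 2’s similarity search can be sketched in a few lines of plain Python (toy 3-dimensional vectors for illustration; real embeddings from text-embedding-3-small have 1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the query vector points roughly the same way
# as the encryption chunk, and away from the unrelated one.
query = [0.9, 0.1, 0.0]
chunks = {
    "encryption clause": [0.8, 0.2, 0.1],
    "unrelated text":    [0.0, 0.1, 0.9],
}
best = max(chunks, key=lambda k: cosine_similarity(query, chunks[k]))
print(best)  # encryption clause
```

FAISS does exactly this ranking, but over hundreds of thousands of vectors with optimized index structures instead of a Python loop.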

💡 New to embeddings? Check out my deep dive on Vector Embeddings and Semantic Search to understand how text becomes searchable vectors.


Building the RAG System

Step 1: Document Processing

Load PDFs and split into chunks:

from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load all PDFs
pdf_files = Path('data/storage').glob('*.pdf')
docs = []
for pdf in pdf_files:
    docs.extend(PyPDFLoader(str(pdf)).load())

# Smart chunking
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # ~250 tokens
    chunk_overlap=200,    # Preserve context at boundaries
    separators=['\n\n', '\n', ' ', '']  # Split on paragraphs first
)
chunks = splitter.split_documents(docs)

Document Processing

Why these settings? A 1000-character chunk (~250 tokens) is large enough to hold a coherent regulatory clause but small enough to keep the retrieved context focused; the 200-character overlap preserves sentences that straddle chunk boundaries; and the separator order makes the splitter prefer paragraph breaks, then line breaks, before cutting mid-sentence.

Step 2: Generate Embeddings & Index

from langchain_openai import AzureOpenAIEmbeddings  # Or OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Any OpenAI-compatible embedding API works
embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dimensions
    # ... API credentials
)

# Create FAISS index
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local('faiss_index')  # Persist to disk

Generate Embeddings

Note: This example uses Azure OpenAI, but you can swap in OpenAIEmbeddings or any other embedding provider with a LangChain integration.

Batch processing tip: For large datasets, process in batches with delays to avoid rate limits:

import time

BATCH_SIZE = 250
# Seed the index with the first batch, then append the rest
vectorstore = FAISS.from_documents(chunks[:BATCH_SIZE], embeddings)
for i in range(BATCH_SIZE, len(chunks), BATCH_SIZE):
    vectorstore.add_documents(chunks[i:i + BATCH_SIZE])
    time.sleep(5)  # Respect rate limits

Embeddings Initialization
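If a batch still trips a rate limit, a fixed sleep may not be enough. A generic exponential-backoff helper (a sketch, not part of the repo) is a common complement to the batching loop:

```python
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying with exponentially growing delays on any exception.
    Re-raises the last exception if all attempts fail."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Hypothetical usage inside the batching loop:
# with_backoff(lambda: vectorstore.add_documents(batch))
```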

Step 3: Build the RAG Chain (LangChain v1 LCEL)

LangChain v1 uses LCEL (LangChain Expression Language) for composable chains:

from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

# Load vector store
# allow_dangerous_deserialization is required by recent langchain_community releases
vectorstore = FAISS.load_local('faiss_index', embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})

# Prompt template
prompt = ChatPromptTemplate.from_template(
    "You are an expert in regulatory compliance. "
    "Answer based EXCLUSIVELY on the provided context. "
    "If the answer isn't in the context, say \"I don't have enough information.\" "
    "Cite the regulation (ENS, NIS2, GDPR, DORA).\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)

# LLM (any OpenAI-compatible API)
llm = AzureChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.2  # Low temp for factual answers
)

# Format documents helper
def format_docs(docs):
    return '\n\n'.join(d.page_content for d in docs)

# Build RAG chain with LCEL
rag_chain = (
    {
        "context": retriever | RunnableLambda(format_docs),
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)

# Use it
answer = rag_chain.invoke("What is ENS?")

LCEL benefits: chains compose with the | operator, every chain gets .invoke(), .stream(), .batch(), and async variants for free, and each component (retriever, prompt, LLM) can be swapped independently.
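To demystify the | operator, here is a toy re-implementation of the composition idea (these are not the real LangChain classes, just the core mechanic):

```python
class MiniRunnable:
    """Toy version of LCEL composition: `a | b` pipes a's output into b."""
    def __init__(self, fn):
        self.fn = fn
    def invoke(self, x):
        return self.fn(x)
    def __or__(self, other):
        # Composing returns a new runnable: run self, feed result to other
        return MiniRunnable(lambda x: other.invoke(self.invoke(x)))

retrieve = MiniRunnable(lambda q: f"[chunks about {q!r}]")
build_prompt = MiniRunnable(lambda ctx: f"Answer using {ctx}")
fake_llm = MiniRunnable(str.upper)  # stand-in for a real model call

chain = retrieve | build_prompt | fake_llm
print(chain.invoke("ENS"))  # ANSWER USING [CHUNKS ABOUT 'ENS']
```

The real Runnable interface does much more (streaming, batching, tracing), but the pipe semantics are exactly this: each stage’s output becomes the next stage’s input.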

Step 4: FastAPI Backend

Wrap the RAG chain in a REST API:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str
    max_sources: int = 3

@app.post("/query")
async def query_rag(request: QueryRequest):
    # Retrieve documents (in LangChain v1, invoke() replaces get_relevant_documents())
    sources = retriever.invoke(request.question)
    
    # Generate answer
    answer = rag_chain.invoke(request.question)
    
    return {
        "answer": answer,
        "sources": [
            {
                "content": doc.page_content[:300],
                "metadata": doc.metadata  # filename, page number
            }
            for doc in sources[:request.max_sources]
        ]
    }

Minimal but functional. Production would add authentication, input validation, rate limiting, structured error handling, and logging/observability.

Step 5: Streamlit Frontend (Optional)

Quick UI for testing:

import streamlit as st
import requests

question = st.text_input("Ask a compliance question:")
if question:
    response = requests.post(
        "http://localhost:8000/query",
        json={"question": question}
    )
    result = response.json()
    
    st.write(result["answer"])
    
    with st.expander("Sources"):
        for i, source in enumerate(result["sources"], 1):
            st.text(f"{i}. {source['metadata']['source']} (p. {source['metadata']['page']})")

Example Query


Prompt Engineering for RAG

The prompt is critical. Here’s what works:

You are an expert in regulatory compliance.
Answer based EXCLUSIVELY on the provided context.
If the answer isn't in the context, say "I don't have enough information."
Cite the specific regulation (ENS, NIS2, GDPR, DORA).

Context:
{context}

Question: {question}

Answer:

Key constraints:

  1. “EXCLUSIVELY on the provided context”: Discourages hallucination.
  2. “If not in context, say so”: Avoids making up answers.
  3. Cite regulation: Helps verification.
  4. Low temperature (0.2): More deterministic, factual responses (a model setting rather than a prompt line, but it works together with the prompt).

Bad prompt example:

Answer this question: {question}

→ LLM will use its training data, not your documents!
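As a plain-string sketch (the chain above uses ChatPromptTemplate, but the assembled text is what matters), the guarded prompt keeps every safeguard literally in front of the model:

```python
GUARDED_TEMPLATE = (
    "You are an expert in regulatory compliance.\n"
    "Answer based EXCLUSIVELY on the provided context.\n"
    "If the answer isn't in the context, say \"I don't have enough information.\"\n"
    "Cite the specific regulation (ENS, NIS2, GDPR, DORA).\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)

prompt_text = GUARDED_TEMPLATE.format(
    context="[retrieved chunks go here]",
    question="What is ENS?",
)

# Every safeguard is present in the final string the LLM receives:
assert "EXCLUSIVELY" in prompt_text
assert "I don't have enough information" in prompt_text
```

The bad prompt, by contrast, contains none of these constraints, so nothing stops the model from answering out of its training data.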

Prompt Engineering


Example Query Flow

Question: “What security measures does ENS require for MEDIUM category?”

1. Retrieval: FAISS finds top-5 chunks:

Chunk 1 (similarity: 0.87): "ENS categorization MEDIUM requires [op.acc.2] authentication controls..."
Chunk 2 (similarity: 0.84): "For MEDIUM systems, [mp.info.3] data encryption at rest..."
...

2. Prompt Construction:

Context:
[Chunk 1 content]
[Chunk 2 content]
...

Question: What security measures does ENS require for MEDIUM category?

3. LLM Response:

For MEDIUM category systems under ENS:
- [op.acc.2] Authentication controls with password policies
- [mp.info.3] Data encryption for sensitive information at rest
- [op.exp.9] Security logging and monitoring
...

4. Add Sources: Return answer + metadata (filename, page numbers) for verification.

FastAPI Swagger


Running the POC

Repository includes: Docker Compose setup, uv-managed dependencies, the FastAPI backend, the Streamlit frontend, and an index initialization script.

Full code on GitHub.

This POC shows how RAG works, not how to run it in production. Treat it as a learning base, not a deployment template.

Quick start:

# 1. Clone repo
git clone https://github.com/manulqwerty/cyber-compliance-rag
cd cyber-compliance-rag

# 2. Add PDFs to backend/data/storage/

# 3. Configure .env with your LLM API credentials

# 4. Initialize FAISS
cd backend
uv run python dev/init_rag.py

# 5. Run services
docker compose up -d --build

Visit http://localhost:8501 to test.


Performance & Costs

For reference (your mileage may vary):

Latency:

Costs (Azure OpenAI example):

For 10K queries/month: ~$7/month
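A back-of-envelope reproduction of that order of magnitude, where every number is an assumption (token counts are guesses and prices change; check current rates before budgeting):

```python
QUERIES = 10_000
INPUT_TOKENS = 1_500        # ~5 chunks of ~250 tokens + prompt + question (assumed)
OUTPUT_TOKENS = 400         # typical answer length (assumed)
PRICE_IN = 0.15 / 1e6       # gpt-4o-mini input, USD per token (assumed list price)
PRICE_OUT = 0.60 / 1e6      # gpt-4o-mini output, USD per token (assumed list price)
PRICE_EMBED = 0.02 / 1e6    # text-embedding-3-small, USD per token (assumed)

llm_cost = QUERIES * (INPUT_TOKENS * PRICE_IN + OUTPUT_TOKENS * PRICE_OUT)
embed_cost = QUERIES * 30 * PRICE_EMBED  # ~30 tokens per query embedding (assumed)
print(f"${llm_cost + embed_cost:.2f}/month")  # a few dollars/month at these assumptions
```

Longer answers or more retrieved chunks push this toward the ~$7 figure; either way, embedding costs are negligible next to generation.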


Challenges & Lessons

1. Rate Limits

2. Source Attribution

3. Context Window

4. Prompt Injection Resistance

Prompt injection test showing the system is not vulnerable


Conclusion

This POC demonstrates the fundamentals of building a RAG system with modern tools:

What we built: a PDF-to-FAISS indexing pipeline, an LCEL retrieval chain, a FastAPI query endpoint, and a Streamlit test UI.

What we learned: chunk size and overlap matter, the prompt must pin the model to the retrieved context, and rate limits shape how you index.

What’s missing: authentication, input validation, evaluation and monitoring, better retrieval (re-ranking, hybrid search), and hardened error handling.

Use this as: a learning base and a starting point for experimentation, not a deployment template.

For production, you’d need weeks of additional work on the items listed under “What’s missing.”

The code is on GitHub.



