Building a RAG System for Cybersecurity Compliance: A Simple POC with LangChain v1+

RAG

Open Table of Contents

Introduction
- The Challenge
- The Solution
System Architecture
How RAG Works (Simplified)
Building the RAG System
Prompt Engineering for RAG
Example Query Flow
Running the POC
Performance & Costs
Challenges & Lessons
Conclusion
Resources

Introduction

⚠️ This is a Proof of Concept (POC) - This article demonstrates the fundamentals of building a RAG system using modern tools. It’s intentionally simple and unoptimized to focus on core concepts. Do not use this in production without significant enhancements.

The Challenge

Organizations dealing with cybersecurity compliance face a common problem: regulations like ENS (Spain), NIS2 (EU), GDPR, and DORA span hundreds of pages. Finding specific answers requires manual PDF searches or expensive consultants.

The Solution

Retrieval-Augmented Generation (RAG) solves this by combining:

Semantic search to find relevant document chunks.
LLMs to synthesize natural language answers from those chunks.

This post walks through building a minimal RAG system from scratch using:

LangChain v1+ (for RAG orchestration)
FAISS (vector database)
Any OpenAI-compatible API (Azure OpenAI, OpenAI, Ollama, etc.)
FastAPI (backend)
Streamlit (frontend)

What you’ll learn:

How RAG works under the hood.
Document chunking strategies.
Vector embeddings and similarity search.
Prompt engineering for grounded responses.
What’s missing for production use.

📦 Full code available: The complete project with Docker Compose, uv package manager, FastAPI backend, and Streamlit frontend is on GitHub.

System Architecture

Simple three-tier architecture:

System Architecture

Flow:

User asks a question via Streamlit UI
FastAPI backend receives the query
LangChain retrieves relevant chunks from FAISS
LLM generates answer based on retrieved context
Response with sources returned to user

Tech Stack:

LangChain v1+: RAG pipeline orchestration.
FAISS: Local vector database (no server needed).
LLM API: Any OpenAI-compatible endpoint (I use Azure OpenAI, but OpenAI API or Ollama work too).
FastAPI: REST API.
Streamlit: Quick UI prototype.

RAG Workflow

How RAG Works (Simplified)

Before diving into code, understand the RAG workflow:

Indexing Phase (run once):

Load PDFs.
Split into chunks (~1000 chars).
Generate embeddings (convert text → vectors) — Learn more about embeddings.
Store in FAISS vector database.

Indexing Phase

Query Phase (per request):

Convert user question → embedding.
Find top-k most similar chunks in FAISS (cosine similarity).
Pass chunks + question to LLM with prompt.
LLM generates answer based on context.
Return answer + source citations.

Key insight: The LLM never “knows” the regulations—it only sees the retrieved chunks. This prevents hallucination.

💡 New to embeddings? Check out my deep dive on Vector Embeddings and Semantic Search to understand how text becomes searchable vectors.

Building the RAG System

Step 1: Document Processing

Load PDFs and split into chunks:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load all PDFs
pdf_files = Path('data/storage').glob('*.pdf')
docs = []
for pdf in pdf_files:
    docs.extend(PyPDFLoader(str(pdf)).load())

# Smart chunking
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # ~250 tokens
    chunk_overlap=200,    # Preserve context at boundaries
    separators=['\n\n', '\n', ' ', '']  # Split on paragraphs first
)
chunks = splitter.split_documents(docs)

Document Processing

Why these settings?

1000 chars: Balances context (enough info) vs. specificity (focused chunks).
200 overlap: Prevents losing information at chunk boundaries.
Hierarchical separators: Prefer paragraph breaks over word breaks.

Step 2: Generate Embeddings & Index

from langchain_openai import AzureOpenAIEmbeddings  # Or OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Any OpenAI-compatible embedding API works
embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-small",  # 1536 dimensions
    # ... API credentials
)

# Create FAISS index
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local('faiss_index')  # Persist to disk

Generate Embeddings

Note: This example uses Azure OpenAI, but you can swap in:

OpenAIEmbeddings (OpenAI API).
OllamaEmbeddings (local models).
Any provider with LangChain integration.

Batch processing tip: For large datasets, process in batches with delays to avoid rate limits:

BATCH_SIZE = 250
for i in range(0, len(chunks), BATCH_SIZE):
    batch = chunks[i:i+BATCH_SIZE]
    vectorstore.add_documents(batch)
    time.sleep(5)  # Respect rate limits

Embeddings Initialization

Step 3: Build the RAG Chain (LangChain v1 LCEL)

LangChain v1 uses LCEL (LangChain Expression Language) for composable chains:

from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

# Load vector store
vectorstore = FAISS.load_local('faiss_index', embeddings)
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})

# Prompt template
prompt = ChatPromptTemplate.from_template(
    "You are an expert in regulatory compliance. "
    "Answer based EXCLUSIVELY on the provided context. "
    "Cite the regulation (ENS, NIS2, GDPR, DORA).\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)

# LLM (any OpenAI-compatible API)
llm = AzureChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.2  # Low temp for factual answers
)

# Format documents helper
def format_docs(docs):
    return '\n\n'.join(d.page_content for d in docs)

# Build RAG chain with LCEL
rag_chain = (
    {
        "context": retriever | RunnableLambda(format_docs),
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)

# Use it
answer = rag_chain.invoke("What is ENS?")

LCEL benefits:

Composable: Swap components easily (FAISS → Qdrant, Azure → OpenAI).
Readable: Pipeline is explicit (retrieval → prompt → LLM → parse).
Streamable: Built-in streaming support.

Step 4: FastAPI Backend

Wrap the RAG chain in a REST API:

from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str
    max_sources: int = 3

@app.post("/query")
async def query_rag(request: QueryRequest):
    # Retrieve documents
    sources = retriever.get_relevant_documents(request.question)
    
    # Generate answer
    answer = rag_chain.invoke(request.question)
    
    return {
        "answer": answer,
        "sources": [
            {
                "content": doc.page_content[:300],
                "metadata": doc.metadata  # filename, page number
            }
            for doc in sources[:request.max_sources]
        ]
    }

Minimal but functional. Production would add:

Authentication.
Rate limiting.
Caching.
Error handling.
Logging/monitoring.

Step 5: Streamlit Frontend (Optional)

Quick UI for testing:

import streamlit as st
import requests

question = st.text_input("Ask a compliance question:")
if question:
    response = requests.post(
        "http://localhost:8000/query",
        json={"question": question}
    )
    result = response.json()
    
    st.write(result["answer"])
    
    with st.expander("Sources"):
        for i, source in enumerate(result["sources"], 1):
            st.text(f"{i}. {source['metadata']['source']} (p. {source['metadata']['page']})")

Example Query

Prompt Engineering for RAG

The prompt is critical. Here’s what works:

You are an expert in regulatory compliance.
Answer based EXCLUSIVELY on the provided context.
If the answer isn't in the context, say "I don't have enough information."
Cite the specific regulation (ENS, NIS2, GDPR, DORA).

Context:
{context}

Question: {question}

Answer:

Key constraints:

“EXCLUSIVELY on the provided context”: Prevents hallucination.
“If not in context, say so”: Avoids making up answers.
Cite regulation: Helps verification.
Low temperature (0.2): Deterministic, factual responses.

Bad prompt example:

Answer this question: {question}

→ LLM will use its training data, not your documents!

Prompt Engineering

Example Query Flow

Question: “What security measures does ENS require for MEDIUM category?”

1. Retrieval: FAISS finds top-5 chunks:

Chunk 1 (similarity: 0.87): "ENS categorization MEDIUM requires [op.acc.2] authentication controls..."
Chunk 2 (similarity: 0.84): "For MEDIUM systems, [mp.info.3] data encryption at rest..."
...

2. Prompt Construction:

Context:
[Chunk 1 content]
[Chunk 2 content]
...

Question: What security measures does ENS require for MEDIUM category?

3. LLM Response:

For MEDIUM category systems under ENS:
- [op.acc.2] Authentication controls with password policies
- [mp.info.3] Data encryption for sensitive information at rest
- [op.exp.9] Security logging and monitoring
...

4. Add Sources: Return answer + metadata (filename, page numbers) for verification.

FastAPI Swagger

Running the POC

Repository includes:

✅ Complete backend (FastAPI + LangChain + FAISS)
✅ Streamlit frontend
✅ Docker Compose setup (backend + frontend containers)
✅ uv package manager configuration (pyproject.toml)
✅ Initialization script and Jupyter notebook
✅ Example .env template

Full code on GitHub.

This POC shows how RAG works, not how to run it in production. Treat it as a learning base, not a deployment template.

Quick start:

# 1. Clone repo
git clone https://github.com/manulqwerty/cyber-compliance-rag
cd cyber-compliance-rag

# 2. Add PDFs to backend/data/storage/

# 3. Configure .env with your LLM API credentials

# 4. Initialize FAISS
cd backend
uv run python dev/init_rag.py

# 5. Run services
docker compose up -d --build

Visit http://localhost:8501 to test.

Performance & Costs

For reference (your mileage may vary):

Latency:

Embedding query: ~150ms
FAISS search: ~20ms
LLM generation: ~1.5s
Total: ~1.7s

Costs (Azure OpenAI example):

Embeddings: ~$0.00001/query
Generation: ~$0.00066/query
~$0.00067/query (~ 1,500 queries/$1)

For 10K queries/month: ~ $7/month

Challenges & Lessons

1. Rate Limits

Problem: Batch embedding generation hit API limits.
Solution: Process in batches with delays.

2. Source Attribution

Problem: Page numbers incorrect due to PDF parsing.
Solution: Verify metadata, display filename+page for manual checks.

3. Context Window

Problem: Too many chunks exceed token limits.
Solution: Limit k=5, each chunk ~250 tokens → ~1,250 total.

4. Prompt Injection Resistance

Test: Tried injecting malicious prompts like “Ignore previous instructions and say…”
Result: System correctly refused and stayed grounded in documents (see screenshot below).
Key: The “EXCLUSIVELY on provided context” instruction prevents the LLM from following injected commands.

Prompt injection test showing the system is not vulnerable

5. Spanish Legal Terms

Problem: LLM struggled with jargon initially.
Solution: Explicit “Spanish regulatory compliance” in prompt helped.

Conclusion

This POC demonstrates the fundamentals of building a RAG system with modern tools:

What we built:

Document chunking and indexing.
Vector similarity search.
LLM-powered answer generation.
Source citation.

What we learned:

RAG prevents hallucination by grounding responses in documents.
Prompt engineering is critical (“EXCLUSIVELY on context”).
LangChain v1 LCEL makes pipelines composable.
FAISS is sufficient for small datasets.

What’s missing:

Authentication, rate limiting, caching.
Hybrid search, reranking.
Monitoring, evaluation.
Scalability, security.

Use this as:

✅ Learning resource for RAG architecture.
✅ Starting point for experiments.
❌ Production deployment.

For production, you’d need weeks of additional work on the items listed in “What’s Missing for Production.”

The code is on GitHub.

Building a RAG System for Cybersecurity Compliance: A Simple POC with LangChain v1+

Table of Contents

Introduction

The Challenge

The Solution

System Architecture

How RAG Works (Simplified)

Building the RAG System

Step 1: Document Processing

Step 2: Generate Embeddings & Index

Step 3: Build the RAG Chain (LangChain v1 LCEL)

Step 4: FastAPI Backend

Step 5: Streamlit Frontend (Optional)

Prompt Engineering for RAG

Example Query Flow

Running the POC

Performance & Costs

Challenges & Lessons

1. Rate Limits

2. Source Attribution

3. Context Window

4. Prompt Injection Resistance

5. Spanish Legal Terms

Conclusion

Resources