Building RedTeam MCP: An AI-Powered Penetration Testing Assistant

Introduction

In my previous post on MCP, I explored the Model Context Protocol and how it bridges the gap between AI and external systems. Now, let’s put that knowledge into practice by building RedTeam MCP - an MCP server that enables AI assistants like Claude to orchestrate penetration testing tools.

This project serves a dual purpose:

Learning TypeScript from a Python background (see my TypeScript for Pythonistas guide)
Building a practical security tool that demonstrates MCP’s power

⚠️ Disclaimer: This tool is for authorized security testing only. Always ensure you have explicit permission before scanning any target.

The Vision

Imagine doing HackTheBox machines with an AI assistant that understands your workflow:

User: "I'm starting a new HTB machine at 10.10.10.123. Help me enumerate it."

Claude: I'll begin reconnaissance.

[Uses port_scan] → Found: 22/SSH, 80/HTTP, 443/HTTPS
[Uses tech_detect] → Apache 2.4.41, PHP 7.4.3, WordPress 5.9.1
[Uses subdomain_enum] → Found: admin.target.htb, dev.target.htb
[Uses vuln_scan] → Critical: CVE-2024-XXXX on admin.target.htb

Recommendation: Research CVE-2024-XXXX for potential exploitation.

The AI handles the tedious enumeration while you focus on the interesting parts: understanding the target and crafting exploits.

Architecture

RedTeam MCP follows a layered architecture:

┌─────────────────────────────────────────────────────────────┐
│                     MCP CLIENT (Agent)                      │
│               (Claude Desktop / LangGraph Agent)            │
└─────────────────────────────────────────────────────────────┘
                              │
                        MCP Protocol
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  MCP SERVER (redteam-mcp)                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │ Task Tools  │  │ Low-Level   │  │  Prompts    │         │
│  │             │  │   Tools     │  │ (Playbooks) │         │
│  │ port_scan   │  │ nmap        │  │             │         │
│  │ vuln_scan   │  │ nuclei      │  │ htb_recon   │         │
│  │ dir_fuzz    │  │ httpx       │  │ web_enum    │         │
│  │ subdomain   │  │ whatweb     │  │ quick_scan  │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Security Guardrails                     │   │
│  │  • Target Whitelisting    • Rate Limiting           │   │
│  │  • Audit Logging          • Input Validation        │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              │
                         CLI Wrappers
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Security Tools                           │
│        nmap  |  nuclei  |  ffuf  |  httpx  |  whatweb       │
└─────────────────────────────────────────────────────────────┘

Design Decisions

Task-Oriented vs Tool-Oriented

A key insight: the AI doesn’t need to know which tool to use, just what task to accomplish.

❌ Tool-Oriented (Bad):

# AI must know ffuf's Host header fuzzing syntax
ffuf -u http://target.htb -H "Host: FUZZ.target.htb" -w wordlist.txt

✅ Task-Oriented (Good):

subdomain_enum --domain target.htb

We built 6 task-oriented tools that abstract the underlying complexity:

Tool	Underlying Implementation
`subdomain_enum`	ffuf with Host header fuzzing
`directory_fuzz`	ffuf with path fuzzing
`vhost_fuzz`	ffuf targeting IP with Host variations
`port_scan`	nmap with version detection
`tech_detect`	httpx + whatweb combination
`vuln_scan`	nuclei with severity filters

For power users, we also expose the low-level tools (nmap, nuclei, httpx, whatweb) with full options.
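
To make the abstraction concrete, here is a minimal sketch of how a task-oriented tool might translate its single input into the ffuf command shown above. The helper name `buildSubdomainEnumArgs`, the `wordlist` parameter, and the hostname check are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical input for the subdomain_enum task.
interface SubdomainEnumInput {
  domain: string; // e.g. "target.htb"
  wordlist: string; // path to a wordlist file (assumed parameter)
}

// Translate the task into ffuf arguments. The flags (-u, -H, -w) mirror the
// ffuf command shown earlier in this post.
function buildSubdomainEnumArgs(input: SubdomainEnumInput): string[] {
  // Accept only plain hostnames so the value cannot smuggle extra flags
  // or header content into the command.
  if (!/^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$/i.test(input.domain)) {
    throw new Error(`Invalid domain: ${input.domain}`);
  }
  return [
    "-u", `http://${input.domain}`,
    "-H", `Host: FUZZ.${input.domain}`,
    "-w", input.wordlist,
  ];
}
```

Returning an argument array rather than a single command string means the wrapper can later pass it to a process-spawning API without going through a shell.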

Safety Guardrails

With great power comes great responsibility. We implemented multiple layers of protection:

1. Target Whitelisting

const ALLOWED_TARGETS = [
  "*.htb", // HackTheBox domains
  "*.thm", // TryHackMe domains
  "10.10.10.*", // HTB IP range
  "10.10.11.*", // HTB IP range
  "*.lab.internal", // Local labs
];

Any target not matching these patterns is rejected before execution.
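
A minimal sketch of how such a check could work, assuming `*` stands for a single octet in IP patterns and for one or more hostname labels in domain patterns. The function names are hypothetical, and the real validation in `validation.ts` may differ.

```typescript
const ALLOWED_TARGETS = ["*.htb", "*.thm", "10.10.10.*", "10.10.11.*", "*.lab.internal"];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Convert one wildcard pattern into an anchored, case-insensitive RegExp.
function patternToRegExp(pattern: string): RegExp {
  const isIpPattern = /^[\d.*]+$/.test(pattern);
  // Keep the wildcard narrow: a loose ".*" would let a hostname such as
  // "10.10.10.5.evil.com" slip through the "10.10.10.*" pattern.
  const wildcard = isIpPattern ? "\\d{1,3}" : "[a-z0-9-]+(?:\\.[a-z0-9-]+)*";
  const body = pattern.split("*").map(escapeRegExp).join(wildcard);
  return new RegExp(`^${body}$`, "i");
}

// True if the target matches at least one allowed pattern.
function isAllowedTarget(target: string, patterns: string[] = ALLOWED_TARGETS): boolean {
  return patterns.some(pattern => patternToRegExp(pattern).test(target));
}
```

Anchoring every pattern with `^` and `$` matters here: without it, a substring match would be enough to pass the check.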

2. Rate Limiting

A token bucket per tool caps how often it can run: each scan consumes one token, and tokens refill at a fixed rate.

const LIMITS = {
  port_scan: { maxTokens: 5, refillRate: 0.1 }, // burst of 5 scans, then 1 token every 10 s
  vuln_scan: { maxTokens: 3, refillRate: 0.05 }, // burst of 3 scans, then 1 token every 20 s
};
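
For readers unfamiliar with the algorithm, here is a minimal token bucket sketch using the same `maxTokens` and `refillRate` (tokens per second) fields. The class and method names are assumptions, and the injectable `now` parameter exists only to make the behaviour easy to test.

```typescript
class TokenBucket {
  private readonly maxTokens: number;
  private readonly refillRate: number; // tokens added per second
  private tokens: number;
  private lastRefill: number; // timestamp in milliseconds

  constructor(maxTokens: number, refillRate: number, now: number = Date.now()) {
    this.maxTokens = maxTokens;
    this.refillRate = refillRate;
    this.tokens = maxTokens; // start with a full bucket
    this.lastRefill = now;
  }

  // Consumes one token and returns true if a token is available,
  // otherwise returns false and leaves the bucket unchanged.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    // Refill for the elapsed time, never exceeding the bucket capacity.
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsedSeconds * this.refillRate);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With the `port_scan` settings, a caller can fire five scans back to back and then has to wait roughly ten seconds for each additional one.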

3. Audit Logging

Every invocation is logged with full context:

{
  "id": "audit-1706000000000-abc123",
  "timestamp": "2025-01-26T12:00:00Z",
  "action": "scan_started",
  "tool": "port_scan",
  "target": "10.10.10.123",
  "result": "success",
  "duration": 45200
}
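
A record like this could be produced by a small helper. The sketch below is an assumption about how that might look: the field names follow the sample entry, while `createAuditEntry`, `toLogLine`, and the random-suffix scheme are hypothetical.

```typescript
interface AuditEntry {
  id: string;
  timestamp: string; // ISO 8601
  action: string;
  tool: string;
  target: string;
  result: string;
  duration: number; // unit not stated in the sample; milliseconds assumed
}

// Build an entry, generating the id and timestamp from the current time.
function createAuditEntry(
  fields: Omit<AuditEntry, "id" | "timestamp">,
  now: Date = new Date()
): AuditEntry {
  // Short random suffix so two entries in the same millisecond stay distinct.
  const suffix = Math.random().toString(36).slice(2, 8).padEnd(6, "0");
  return {
    id: `audit-${now.getTime()}-${suffix}`,
    timestamp: now.toISOString(),
    ...fields,
  };
}

// Serialize as one JSON object per line, which keeps the log easy to grep.
function toLogLine(entry: AuditEntry): string {
  return JSON.stringify(entry);
}
```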

Implementation Highlights

TypeScript for Security Tools

Coming from Python, TypeScript offered surprising benefits for security tooling:

Zod Schemas for input validation:

const NmapInputSchema = z.object({
  target: z.string().describe("Target IP or hostname"),
  scanType: z.enum(["quick", "default", "version", "aggressive"]),
  timing: z.enum(["T0", "T1", "T2", "T3", "T4", "T5"]).default("T4"),
});

Type-safe CLI wrappers:

async function executeCommand(
  command: string,
  options?: { timeout?: number }
): Promise<CommandResult> {
  // Type-checked inputs and outputs
}
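
One way such a wrapper could be filled in is with Node's `execFile`, sketched below. Note that this is a hypothetical variant, not the project's `executeCommand`: it takes the binary and its arguments separately (so no shell is involved), and the `CommandResult` fields and the 60-second default timeout are assumptions.

```typescript
import { execFile } from "node:child_process";

// Assumed result shape; the post does not show the real CommandResult.
interface CommandResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Run a binary with an argument array. Because execFile does not spawn a
// shell, the arguments are passed through as-is and never re-parsed.
function runCommand(
  file: string,
  args: string[],
  options: { timeout?: number } = {}
): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    execFile(
      file,
      args,
      { timeout: options.timeout ?? 60_000, encoding: "utf8" },
      (error, stdout, stderr) => {
        if (error && typeof error.code !== "number") {
          // Spawn failure (e.g. binary not found) or process killed on timeout.
          reject(error);
          return;
        }
        // A numeric code means the process ran and exited non-zero.
        resolve({ stdout, stderr, exitCode: error ? (error.code as number) : 0 });
      }
    );
  });
}
```

A non-zero exit code resolves rather than rejects in this sketch, on the assumption that callers want to inspect a failed tool's output rather than lose it.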

MCP SDK integration:

server.registerTool(
  "port_scan",
  {
    description: "Scan for open ports and services",
    inputSchema: NmapInputSchema.shape, // raw Zod shape from the schema above
  },
  async input => {
    const result = await runNmapScan(input);
    return { content: [{ type: "text", text: formatResult(result) }] };
  }
);

Workflow Prompts

MCP supports prompts - pre-defined workflows the AI can follow:

const PROMPTS = [
  {
    name: "htb_initial_recon",
    description: "Complete HTB machine reconnaissance",
    template: `
    1. Port scan with version detection
    2. Technology detection on web services
    3. Subdomain enumeration if domain found
    4. Vulnerability scan on discovered services
  `,
  },
];
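
When a client requests a prompt, the server answers with a list of messages. The sketch below shows one way a template like the one above could be turned into that shape. `renderPrompt`, the `target` argument, and the local interfaces are assumptions for illustration. The message layout (`role` plus a `text` content item) follows the MCP prompt result format.

```typescript
interface PromptDefinition {
  name: string;
  description: string;
  template: string;
}

interface PromptMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

// Turn a playbook template into a single user message for a given target.
function renderPrompt(
  prompt: PromptDefinition,
  target: string
): { description: string; messages: PromptMessage[] } {
  // Strip the indentation and blank lines left over from the template literal.
  const steps = prompt.template
    .split("\n")
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .join("\n");
  return {
    description: prompt.description,
    messages: [
      { role: "user", content: { type: "text", text: `Target: ${target}\n${steps}` } },
    ],
  };
}
```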

Project Structure

redteam-mcp/
├── src/
│   ├── index.ts              # MCP server (12 tools + 4 prompts)
│   ├── tools/
│   │   ├── fuzzing.ts        # subdomain, directory, vhost fuzzing
│   │   ├── nmap.ts           # Port scanning
│   │   ├── nuclei.ts         # Vulnerability scanning
│   │   └── httpx.ts          # HTTP probing
│   ├── prompts/
│   │   └── playbooks.ts      # Enumeration workflows
│   └── utils/
│       ├── validation.ts     # Target whitelisting
│       ├── rateLimiter.ts    # Token bucket rate limiting
│       └── logger.ts         # Audit logging
├── exercises/                # TypeScript learning exercises
├── docs/
│   └── SECURITY.md          # Security guardrails documentation
└── examples/
    ├── htb-machine-workflow.md  # Demo walkthrough
    └── quick-examples.md        # Quick reference

Testing with MCP Inspector

Before connecting to an AI agent, test your server with MCP Inspector:

cd redteam-mcp
npm run build
npx @modelcontextprotocol/inspector node ./build/index.js

This opens a web UI where you can:

View all registered tools and prompts
Execute tools with custom parameters
Inspect JSON-RPC messages

What’s Next

This post covered the MCP Server — the tool provider. The next phase of the project is the Agent built with LangGraph that:

Connects to the MCP server
Decides which tools to use based on context
Analyzes outputs and plans next steps
Maintains state across the reconnaissance workflow

┌─────────────────────────────────────────────────────────────┐
│                       LangGraph Agent                       │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                   StateGraph                         │   │
│  │   [Recon] → [Enum] → [Analyze] → [Report]           │   │
│  │      ↑                    ↓                          │   │
│  │      └────── [Need More Info] ←──────────────┘      │   │
│  └─────────────────────────────────────────────────────┘   │
│                          ↓                                  │
│                    MCP Client                               │
│                          ↓                                  │
│              RedTeam MCP Server (This Post)                 │
└─────────────────────────────────────────────────────────────┘

Resources

Source Code: github.com/manulqwerty/redteam-mcp
MCP Documentation: modelcontextprotocol.io
TypeScript for Pythonistas: Previous Post
MCP Introduction: Previous Post

Conclusion

RedTeam MCP demonstrates how MCP can bridge AI assistants and specialized security tools. The key takeaways:

Task-oriented design makes tools AI-friendly
Safety guardrails are non-negotiable for security tools
TypeScript + Zod provides excellent input validation
MCP’s architecture cleanly separates concerns

With proper safeguards, AI-assisted penetration testing isn’t just possible - it’s practical. The tedious enumeration phase becomes conversational, letting you focus on the creative aspects of security research.

The next phase of the project will complete the picture with a LangGraph agent that orchestrates this MCP server autonomously.