RAG Chatbot Architecture

Dortha Franecki, Computer Science Student

Walk through the full request lifecycle of a production-ready RAG (Retrieval-Augmented Generation) chatbot, from input sanitization through vector retrieval and LLM inference to response delivery. Designed for developers, system architects, and technical interviewers who need to communicate how a modern AI system handles context, memory, and safety in a single sequence diagram.

How to create a RAG Chatbot Architecture

To create a RAG chatbot architecture, follow these steps:

01. Map the layers first

Identify your core components: UI, safety/guardrails layer, backend API, session cache, vector database, and LLM. Each becomes a participant in the sequence.
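As a starting point, the components listed above might be declared as participants like this. This is a minimal sketch: the short IDs (UI, Guard, API, Cache, VDB, LLM) and the display labels are illustrative assumptions, not names taken from the template.

```mermaid
sequenceDiagram
    %% Participant IDs and labels are illustrative assumptions.
    %% Declaration order sets the left-to-right order in the rendered diagram.
    participant UI as Chat UI
    participant Guard as Safety Layer
    participant API as Backend API
    participant Cache as Session Cache
    participant VDB as Vector Database
    participant LLM as LLM
```

Declaring every participant up front keeps the column order stable as messages are added in later steps.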

02. Start with the safety gate

Model input validation as the first step — before the backend ever sees a prompt. Use alt blocks to show the rejected vs. safe paths.
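One way to sketch the safety gate is an alt block with an else branch. The participant names and message wording here are assumptions chosen for illustration.

```mermaid
sequenceDiagram
    %% Names and message text are illustrative assumptions.
    participant UI as Chat UI
    participant Guard as Safety Layer
    participant API as Backend API

    UI->>Guard: Submit prompt
    alt Prompt rejected
        %% Rejected path: the backend never sees the prompt
        Guard-->>UI: Return refusal message
    else Prompt safe
        %% Safe path: only the sanitized prompt moves on
        Guard->>API: Forward sanitized prompt
    end
```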

03. Add session memory

Show the backend querying a cache (e.g., Redis) to retrieve recent conversation history before calling the LLM. Carrying earlier turns into the prompt is what lets the chatbot resolve follow-up questions and stay coherent across a conversation.
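The memory lookup could be drawn as a simple request and reply pair, as in the sketch below. The participant names and the idea of keying history by a session ID are assumptions for illustration.

```mermaid
sequenceDiagram
    %% Names and the session-ID lookup are illustrative assumptions.
    participant API as Backend API
    participant Cache as Session Cache

    API->>Cache: Get recent history for session ID
    %% Dashed arrow marks the reply
    Cache-->>API: Recent conversation turns
```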

04. Model the RAG step

Insert a vector DB query between the memory lookup and the LLM call — the backend embeds the sanitized prompt and retrieves relevant context.
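A hedged sketch of the retrieval step follows. It assumes the embedding happens inside the backend (drawn as a self-message) and that the vector database returns a ranked set of chunks. A real system may delegate embedding to a separate service.

```mermaid
sequenceDiagram
    %% Names are illustrative. Embedding inside the backend is an assumption.
    participant API as Backend API
    participant VDB as Vector Database

    %% Self-message: the backend turns the sanitized prompt into a vector
    API->>API: Embed sanitized prompt
    API->>VDB: Similarity search with prompt embedding
    VDB-->>API: Most relevant context chunks
```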

05. Build the LLM call

Pass the combination of history, retrieved context, and current prompt to the model. Show the response flowing back through the chain.
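The LLM call and the return path might look like the sketch below. The participant names are assumptions, and the return path is simplified to go straight from the backend to the UI. Route it through the safety layer as well if your design filters outputs.

```mermaid
sequenceDiagram
    %% Names are illustrative. The direct API-to-UI return path is a simplifying assumption.
    participant UI as Chat UI
    participant API as Backend API
    participant LLM as LLM

    API->>LLM: History, retrieved context, current prompt
    LLM-->>API: Generated response
    API-->>UI: Deliver response
```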

06. Use autonumber

Add autonumber at the top of the sequence — it labels every step automatically and makes the diagram easy to reference in documentation.
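For example, placing the autonumber keyword directly under the diagram declaration numbers each message in order. The two messages below are placeholders chosen only to show the numbering.

```mermaid
sequenceDiagram
    %% autonumber prefixes each message with its step number (1, 2, ...)
    autonumber
    participant UI as Chat UI
    participant API as Backend API

    %% Placeholder messages, illustrative only
    UI->>API: Submit prompt
    API-->>UI: Return response
```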

07. Use critical blocks for multi-step processing

Wrap the backend processing steps in a critical block to visually group the core request logic.
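A sketch of the grouping, assuming the backend steps are the memory lookup, retrieval, and LLM call described earlier. The block label, participant names, and message text are illustrative assumptions.

```mermaid
sequenceDiagram
    %% Names, block label, and message text are illustrative assumptions.
    participant API as Backend API
    participant Cache as Session Cache
    participant VDB as Vector Database
    participant LLM as LLM

    critical Process request
        API->>Cache: Get recent history
        Cache-->>API: Recent conversation turns
        API->>VDB: Similarity search with prompt embedding
        VDB-->>API: Most relevant context chunks
        API->>LLM: History, retrieved context, current prompt
        LLM-->>API: Generated response
    end
```

Mermaid also lets a critical block carry option branches for failure cases, such as a cache or model timeout, if the diagram needs to show them.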


Energy Flow Sankey Diagram

Visualize how energy, materials, or resources flow through systems with proportional arrows that show volume at a glance. This template makes it easy to spot where the biggest flows occur, identify losses or inefficiencies, and communicate complex transformations visually. Perfect for sustainability reports, process optimization, or explaining resource allocation to stakeholders.

Mermaid

Agile Workflow Kanban Board

Visualize work items flowing through stages from start to finish. This template organizes tasks into columns showing their current status, making bottlenecks obvious and progress transparent. Perfect for agile teams, sprint planning, workflow management, or any process where you need to see what's being worked on and what's next.

Mermaid

Entity Relationship Diagram

Visualize how your database pieces fit together. This template maps the relationships between different data entities — showing what information each table holds, how tables connect to each other, and the type of relationships that exist. It's essential for anyone building or documenting databases, helping developers understand data structure, identifying missing connections, or planning migrations.

Mermaid

ERD Customer Relationship Management (CRM)

Build the data foundation for tracking customer relationships. This template maps accounts, contacts, leads, opportunities, cases, and campaigns — with keys, attributes, and relationships — so teams can align on how records connect from first touch to closed deal and support.

