
MongoDB + RAG

Building a Local RAG System with MongoDB, Ollama & Streamlit

A step-by-step guide to building a fully local AI-powered notes assistant — no cloud, no API costs.


What I Built

A RAG (Retrieval-Augmented Generation) system that lets you ask questions about your own notes and documents, powered entirely by local AI models.


Tech Stack

  Layer            Tool
  Database         MongoDB 7.0 (Docker)
  Embeddings       nomic-embed-text (Ollama)
  LLM              llama3.2 (Ollama)
  Backend          Python (pymongo, numpy, requests)
  UI               Streamlit
  Infrastructure   Docker + Docker Compose

Step 1 — Set Up MongoDB with Docker

  • Install Docker Desktop from docker.com
  • Pull MongoDB image: docker pull mongo:7.0
  • Run a 3-node Replica Set using Docker Compose
  • Initialize the replica set via rs.initiate() in mongosh

Key point: Use Docker Compose for a clean, reproducible setup. Replica sets are required for transactions and change streams.
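A minimal compose file for the replica set might look like this (service names and host ports are illustrative, not the article's actual file):

```yaml
# docker-compose.yml — sketch of a 3-node replica set
services:
  mongo1:
    image: mongo:7.0
    command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
    ports: ["27017:27017"]
  mongo2:
    image: mongo:7.0
    command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
    ports: ["27018:27017"]
  mongo3:
    image: mongo:7.0
    command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
    ports: ["27019:27017"]
```

After docker-compose up -d, connect to one node with mongosh and run rs.initiate() once, listing all three members so they form the rs0 replica set.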


Step 2 — Install Ollama & Pull Models

  • Download Ollama from ollama.com
  • Pull the LLM: ollama pull llama3.2
  • Pull the embedding model: ollama pull nomic-embed-text

Key point: Ollama runs models fully locally — no API keys, no costs, no data leaving your machine.
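Once the models are pulled, you can sanity-check the embedding endpoint from Python. Ollama's HTTP API listens on localhost:11434 by default; the helper below is a sketch, not code from the project:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default embeddings endpoint

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Return the embedding vector for `text` from the local Ollama server."""
    resp = requests.post(OLLAMA_URL, json={"model": model, "prompt": text}, timeout=60)
    resp.raise_for_status()
    return resp.json()["embedding"]

# With Ollama running:
#   vec = embed("hello world")
#   len(vec)  # 768 for nomic-embed-text
```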


Step 3 — Build the Ingestion Pipeline (ingest.py)

  • Split text into overlapping chunks (500 words, 50-word overlap)
  • Generate vector embeddings for each chunk using nomic-embed-text
  • Store chunks + embeddings into MongoDB

Key point: Chunking with overlap ensures context is not lost at chunk boundaries.
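The chunking step can be sketched as follows (parameter names are mine; the actual ingest.py may differ):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` words, with `overlap` words shared
    between consecutive chunks so sentences spanning a boundary survive."""
    words = text.split()
    step = size - overlap  # advance 450 words per chunk by default
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final chunk already covers the tail of the text
    return chunks
```

Each chunk is then embedded and stored, e.g. collection.insert_one({"text": chunk, "embedding": embed(chunk)}).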


Step 4 — Build the RAG Query Pipeline (rag.py)

  • Embed the user's question using the same model
  • Compute cosine similarity against all stored chunks in MongoDB
  • Retrieve top-K most relevant chunks as context
  • Send context + question to llama3.2 and return the answer

Key point: Always use the same embedding model for both ingestion and querying: mismatched vector dimensions will cause errors.
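The retrieval step is a brute-force similarity scan, which is fine for a personal notes collection. A sketch with numpy (function names are mine):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vec, docs, k: int = 3):
    """Rank stored chunks (dicts with 'text' and 'embedding' keys,
    as fetched from MongoDB) by similarity to the query vector."""
    return sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )[:k]
```

In practice, docs would come from something like collection.find({}, {"text": 1, "embedding": 1}), and the top-k chunk texts are concatenated into the prompt sent to llama3.2.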


Step 5 — Build the Streamlit Chat UI (app.py)

  • Dark-themed chat interface in the browser
  • Sidebar to paste text or upload .txt files
  • Real-time ingestion with chunk counter
  • Chat history with source document chips per answer

Key point: Use st.set_page_config() as the very first Streamlit command in your script.


Architecture

Your Notes → Chunk & Embed → MongoDB
                                 ↓
         Question → Embed → Cosine Search → Top Chunks
                                                 ↓
                                    llama3.2 → Answer

Challenges & Fixes

  • Port conflicts — Stop old containers before starting new ones
  • torch DLL crash on Windows — Replace sentence-transformers with Ollama embeddings
  • Embedding dimension mismatch (384 vs 768) — Clear MongoDB and re-ingest when switching embedding models
  • Windows Long Path error — Enable long paths via PowerShell or Registry
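The dimension-mismatch fix can also be turned into a cheap runtime guard before querying (a sketch; the function name is mine):

```python
def assert_same_dim(stored_vec, query_vec):
    """Fail fast when stored and query embeddings come from different models
    (e.g. 384-d sentence-transformers vectors vs 768-d nomic-embed-text)."""
    if len(stored_vec) != len(query_vec):
        raise ValueError(
            f"Embedding dimension mismatch: stored={len(stored_vec)} "
            f"query={len(query_vec)}; clear the collection and re-ingest."
        )
```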

Run It Yourself

# 1. Start MongoDB
docker-compose up -d

# 2. Ingest your notes
python ingest.py

# 3. Launch the UI
streamlit run app.py
# Open http://localhost:8501

Built entirely locally — your data never leaves your machine. 🍃