Building a Local RAG System with MongoDB, Ollama & Streamlit
A step-by-step guide to building a fully local AI-powered notes assistant — no cloud, no API costs.
What I Built
A RAG (Retrieval-Augmented Generation) system that lets you ask questions about your own notes and documents, powered entirely by local AI models.
Tech Stack
| Layer | Tool |
|---|---|
| Database | MongoDB 7.0 (Docker) |
| Embeddings | nomic-embed-text (Ollama) |
| LLM | llama3.2 (Ollama) |
| Backend | Python (pymongo, numpy, requests) |
| UI | Streamlit |
| Infrastructure | Docker + Docker Compose |
Step 1 — Set Up MongoDB with Docker
- Install Docker Desktop from docker.com
- Pull the MongoDB image: `docker pull mongo:7.0`
- Run a 3-node replica set using Docker Compose
- Initialize the replica set via `rs.initiate()` in mongosh
Key point: Use Docker Compose for a clean, reproducible setup. Replica sets are required for transactions and change streams.
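As an illustrative sketch, one node of the three-node Compose setup might look like this (service names, volume names, and ports are my assumptions, not the original file); `mongo2` and `mongo3` repeat the same pattern on different host ports:

```yaml
services:
  mongo1:
    image: mongo:7.0
    # Join replica set "rs0"; bind on all interfaces so peers can connect
    command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
    ports:
      - "27017:27017"
    volumes:
      - mongo1_data:/data/db
volumes:
  mongo1_data:
```

After `docker-compose up -d`, connect with mongosh and run `rs.initiate()` with a config listing all three members.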
Step 2 — Install Ollama & Pull Models
- Download Ollama from ollama.com
- Pull the LLM: `ollama pull llama3.2`
- Pull the embedding model: `ollama pull nomic-embed-text`
Key point: Ollama runs models fully locally — no API keys, no costs, no data leaving your machine.
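As a sketch, getting an embedding is a plain HTTP POST to Ollama's local `/api/embeddings` endpoint (the helper names below are mine; Ollama listens on port 11434 by default):

```python
import requests

OLLAMA_URL = "http://localhost:11434"

def build_embed_request(text):
    # Payload for POST /api/embeddings; "prompt" carries the input text
    return {"model": "nomic-embed-text", "prompt": text}

def embed(text):
    # Returns the embedding vector as a list of floats
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json=build_embed_request(text), timeout=60)
    resp.raise_for_status()
    return resp.json()["embedding"]
```

The same `embed()` helper can be shared by `ingest.py` and `rag.py`, which also guarantees both sides use the same model.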
Step 3 — Build the Ingestion Pipeline (ingest.py)
- Split text into overlapping chunks (500 words, 50-word overlap)
- Generate vector embeddings for each chunk using `nomic-embed-text`
- Store chunks and embeddings in MongoDB
Key point: Chunking with overlap ensures context is not lost at chunk boundaries.
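The overlapping word-based chunker described above can be sketched like this (the function name and exact slicing are illustrative, not the original `ingest.py`):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks of `chunk_size` words,
    each sharing `overlap` words with the previous chunk."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # advance 450 words per chunk by default
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the text
    return chunks
```

A 1,000-word document yields three chunks, with the last 50 words of each chunk repeated at the start of the next.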
Step 4 — Build the RAG Query Pipeline (rag.py)
- Embed the user's question using the same model
- Compute cosine similarity against all stored chunks in MongoDB
- Retrieve top-K most relevant chunks as context
- Send context + question to llama3.2 and return the answer
Key point: Always use the same embedding model for both ingestion and querying - mismatched dimensions will cause errors.
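The retrieval step above can be sketched with numpy (field names like `"embedding"` and `"text"` are assumptions about the MongoDB document schema):

```python
import numpy as np

def top_k_chunks(question_vec, docs, k=3):
    """Rank stored chunks by cosine similarity to the question embedding
    and return the text of the top-k matches."""
    q = np.asarray(question_vec, dtype=float)
    q = q / np.linalg.norm(q)  # normalize once, reuse for every comparison
    scored = []
    for doc in docs:
        v = np.asarray(doc["embedding"], dtype=float)
        score = float(np.dot(q, v / np.linalg.norm(v)))  # cosine similarity
        scored.append((score, doc["text"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:k]]
```

This is a brute-force scan over all chunks, which is fine for a personal notes collection; at larger scale you would switch to MongoDB Atlas Vector Search or another index.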
Step 5 — Build the Streamlit Chat UI (app.py)
- Dark-themed chat interface in the browser
- Sidebar to paste text or upload `.txt` files
- Real-time ingestion with chunk counter
- Chat history with source document chips per answer
Key point: Use `st.set_page_config()` as the very first Streamlit command in your script.
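A minimal skeleton of the chat UI might look like this (widget labels, session keys, and the point where the RAG pipeline would be called are illustrative assumptions, not the original `app.py`):

```python
import streamlit as st

# Must be the very first Streamlit command in the script
st.set_page_config(page_title="Local Notes RAG", layout="wide")

with st.sidebar:
    uploaded = st.file_uploader("Upload notes", type=["txt"])
    pasted = st.text_area("Or paste text")
    # On submit, run the ingestion pipeline and show a chunk counter here

if "history" not in st.session_state:
    st.session_state.history = []

question = st.chat_input("Ask about your notes")
if question:
    st.session_state.history.append(("user", question))
    # The answer would come from the RAG pipeline in rag.py

for role, msg in st.session_state.history:
    with st.chat_message(role):
        st.write(msg)
```

`st.session_state` keeps the chat history alive across Streamlit's rerun-on-every-interaction model.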
Architecture
Your Notes → Chunk & Embed → MongoDB
↓
Question → Embed → Cosine Search → Top Chunks
↓
llama3.2 → Answer
Challenges & Fixes
- Port conflicts — Stop old containers before starting new ones
- torch DLL crash on Windows — Replace sentence-transformers with Ollama embeddings
- Embedding dimension mismatch (384 vs 768) — Clear MongoDB and re-ingest when switching embedding models
- Windows Long Path error — Enable long paths via PowerShell or Registry
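The dimension-mismatch bug is easiest to catch with an explicit guard before searching (a hypothetical helper, not part of the original code):

```python
def check_dims(query_vec, stored_vec):
    """Fail fast if the query embedding and stored embeddings disagree
    in length (e.g. 384-dim vs 768-dim from different models)."""
    if len(query_vec) != len(stored_vec):
        raise ValueError(
            f"Embedding dimension mismatch: query has {len(query_vec)} dims, "
            f"stored chunks have {len(stored_vec)}. "
            "Clear the collection and re-ingest with the same model."
        )
```

Calling this on the first stored document before the similarity scan turns a cryptic numpy shape error into an actionable message.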
Run It Yourself
```bash
# 1. Start MongoDB
docker-compose up -d

# 2. Ingest your notes
python ingest.py

# 3. Launch the UI
streamlit run app.py

# Open http://localhost:8501
```
Built entirely locally — your data never leaves your machine. 🍃