Upload any document and ask questions in natural language. DocMind uses a self-correcting RAG pipeline that retrieves, verifies, and — when context is insufficient — automatically rewrites the query and falls back to web search before generating an answer.
5-Step RAG Pipeline
Hybrid Search + RRF
SSE Real-Time Stream
pgvector Vector Store
Standard RAG pipelines retrieve documents and generate answers in a single pass. If the retrieved context is wrong or incomplete, the answer is wrong — and you never know. Corrective RAG adds a verification loop that detects bad context and self-corrects before answering.
Orchestrated as a LangGraph state machine with conditional edges. If the grading step detects insufficient context, the pipeline branches into query transformation and web search before generating — this is the corrective loop.
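The conditional branch can be sketched as plain Python, without the LangGraph dependency. The node names, state fields, and thresholds below are illustrative assumptions, not DocMind's actual code; in LangGraph the routing function would be wired in via a conditional edge.

```python
# Minimal sketch of the corrective branch. Threshold values are assumed.
RELEVANCE_THRESHOLD = 0.7   # assumed grading cutoff per document
MIN_RELEVANT_DOCS = 2       # assumed minimum before correction triggers

def grade(state):
    """Grading node: keep only documents scoring above the threshold."""
    kept = [d for d in state["documents"] if d["score"] >= RELEVANCE_THRESHOLD]
    state["documents"] = kept
    # Too little surviving context means the answer would be ungrounded.
    state["needs_correction"] = len(kept) < MIN_RELEVANT_DOCS
    return state

def route_after_grading(state):
    """Mirrors a LangGraph conditional edge: choose the next node from state."""
    return "transform_query" if state["needs_correction"] else "generate"
```

In the real graph, `transform_query` leads into web search and then back to generation, closing the corrective loop.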
Hybrid search: pgvector cosine similarity + PostgreSQL tsvector full-text, merged with Reciprocal Rank Fusion.
Cohere cross-encoder re-scores candidates. Higher precision than bi-encoder similarity alone.
A score threshold filters out irrelevant documents. If too many are filtered, the correction branch is triggered.
The query is rewritten for better retrieval, and Tavily web search provides external context as a fallback.
LLM produces an answer grounded in verified context. Streamed in real-time via Server-Sent Events.
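The fusion in step 1 is small enough to sketch in full. Rankings from the vector and keyword searches are merged by Reciprocal Rank Fusion; `k = 60` is the conventional RRF constant, and equal weighting of the two rankings is an assumption about DocMind's setup.

```python
def rrf_merge(vector_ranking, keyword_ranking, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank).
    Each ranking is a list of document ids, best first. k=60 is the usual
    default; it damps the influence of any single ranking's top positions."""
    scores = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents ranked highly by both searches accumulate the largest scores.
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears near the top of both lists beats one that tops only a single list, which is what makes the merge robust across query types.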
Every component is chosen for a reason. No unnecessary abstractions, no over-engineering — just the right tool for each layer.
The RAG pipeline is a directed graph with conditional edges — not a linear chain. Nodes execute async, edges branch on grading results.
Semantic similarity (pgvector HNSW) and keyword matching (tsvector GIN) merged via Reciprocal Rank Fusion for robust retrieval across query types.
Every vector, document, and query is scoped by user_id — enforced at the SQL WHERE clause level. Complete data isolation between users.
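The isolation guarantee can be sketched with an in-memory SQLite stand-in; the real store is PostgreSQL with pgvector, and the schema below is illustrative only. The point is that `user_id` appears in every WHERE clause, so a matching document owned by another user is simply never returned.

```python
import sqlite3

# Illustrative schema; the production store is PostgreSQL + pgvector.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, user_id TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "alice", "quarterly report"), (2, "bob", "meeting report")],
)

def search(user_id, pattern):
    """Every query is scoped by user_id at the WHERE clause level:
    rows belonging to other users are invisible even if they match."""
    rows = conn.execute(
        "SELECT content FROM documents WHERE user_id = ? AND content LIKE ?",
        (user_id, f"%{pattern}%"),
    ).fetchall()
    return [r[0] for r in rows]
```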
Server-Sent Events deliver LLM tokens in real-time. The frontend renders chunks as they arrive — no polling, no waiting for the full response.
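The wire format is simple enough to show directly. The `event` and `data` field names come from the SSE specification; the JSON payload shape and the `done` sentinel are assumptions about DocMind's protocol, and in FastAPI the generator would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json

def sse_frame(token, event=None):
    """Format one Server-Sent Events frame: `field: value` lines
    terminated by a blank line, per the SSE spec."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps({'token': token})}")
    return "\n".join(lines) + "\n\n"

def stream_answer(tokens):
    """Yield one frame per LLM token, then a terminal event so the
    frontend knows the answer is complete. Payload shape is assumed."""
    for t in tokens:
        yield sse_frame(t)
    yield sse_frame("", event="done")
```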
Redis caches embedding vectors with SHA-256 keys and 1-hour TTL. Repeated queries skip the OpenAI API entirely.
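Key derivation might look like the sketch below; the `emb:` prefix and key layout are assumptions, not DocMind's exact scheme. Hashing the model name together with the text keeps keys short, fixed-length, and distinct across embedding models.

```python
import hashlib

def embedding_cache_key(model, text):
    """Deterministic Redis key from model + input text. Identical inputs
    always hit the same key; the layout here is an illustrative assumption."""
    digest = hashlib.sha256(f"{model}:{text}".encode("utf-8")).hexdigest()
    return f"emb:{model}:{digest}"

# A cached lookup would then be roughly (Redis calls sketched, not run):
#   if (hit := redis.get(key)) is not None: return deserialize(hit)
#   vec = embed(text); redis.setex(key, 3600, serialize(vec))  # 1-hour TTL
```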
Regex-based prompt injection detection catches common attack patterns before they reach the LLM. Zero API cost, sub-millisecond latency.
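A minimal version of such a filter is shown below; the patterns are illustrative examples of common injection phrasings, not DocMind's actual list, which would need to be considerably broader.

```python
import re

# Illustrative patterns only; a production denylist would be broader.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?(system\s+)?prompt", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+in\s+developer\s+mode", re.IGNORECASE),
]

def looks_like_injection(text):
    """Return True if any known attack pattern matches. Pure regex:
    no API call, so the check costs nothing and runs in microseconds."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```

The trade-off is recall: regex catches known phrasings cheaply but misses novel attacks, which is why it sits in front of, not instead of, the grounded-generation step.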
Next.js 14 (Frontend): App Router, RSC
FastAPI (Backend): Async Python
LangGraph (Orchestration): State machine
LangChain (LLM Framework): Chains & prompts
pgvector (Vector Store): HNSW + cosine
PostgreSQL (Database): tsvector + GIN
Redis (Cache): Embeddings + rate limit
Better Auth (Auth): Sessions + cookies
OpenAI (LLM Provider): GPT-4o + embeddings
Cohere (Reranking): Cross-encoder v3.5
PyMuPDF (PDF Processing): Text + images
Docker (Infrastructure): Compose + CI/CD
Upload PDFs with text or visual content. PyMuPDF extracts and chunks text using tiktoken, with optional GPT-4o Vision for charts and diagrams.
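Chunking can be sketched as a sliding token window. In DocMind the encoder would be tiktoken's for the target model; here a whitespace split stands in so the sketch has no external dependency, and the window sizes are illustrative defaults rather than the app's settings.

```python
def chunk_tokens(text, max_tokens=500, overlap=50, encode=str.split):
    """Sliding-window chunking: consecutive windows share `overlap` tokens
    so no sentence is cut off without context. With tiktoken, tokens would
    be integer ids and you would decode each window instead of joining."""
    tokens = encode(text)
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start : start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # last window already reaches the end of the text
    return chunks
```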
Ask anything about your documents. The Corrective RAG pipeline retrieves, verifies, and generates answers streamed in real time via SSE.
Rate every answer with thumbs up/down. Feedback is tracked per user, and an analytics dashboard shows satisfaction rates over time.
Create a free account and start querying in minutes. No credit card required.
Get Started