Build a PDF to Vector RAG System: Mistral OCR, Weaviate Database and MCP Server
A RAG (Retrieval-Augmented Generation) workflow for n8n that extracts text from PDF documents with Mistral OCR, embeds it with Cohere, and stores the vectors in Weaviate so the content can be searched semantically and exposed as an MCP tool.
Features
- PDF Document Processing: Upload and extract text from PDF files using Mistral's OCR capabilities
- Vector Database Storage: Store document embeddings in Weaviate vector database for efficient retrieval
- AI-Powered Search: Search through documents using semantic similarity with Cohere embeddings
- MCP Server Integration: Expose the knowledge base as an AI tool through MCP (Model Context Protocol)
- Document Metadata: Basic document metadata including filename, content, source, and upload timestamp
- Text Chunking: Automatic text splitting for optimal vector storage and retrieval
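To make the text chunking feature concrete, here is a minimal sketch of overlapping character-based splitting. The function name `chunk_text` and the default sizes are assumptions chosen for illustration; the template's own splitter node may use different sizes or split on other boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper: the sizes are illustrative, not the template's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary present in both neighbouring chunks, at the cost of storing some text twice.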
Technologies Used
- Mistral AI: OCR and text extraction from PDF documents
- Weaviate: Vector database for storing and retrieving document embeddings
- Cohere: Multilingual embeddings and reranking for improved search accuracy
- MCP (Model Context Protocol): AI tool integration for external AI workflows
- n8n: Workflow automation and orchestration
Prerequisites
Before using this template, you'll need to set up the following credentials:
- Mistral Cloud API: For PDF text extraction
- Weaviate API: For vector database operations
- Cohere API: For embeddings and reranking
- HTTP Header Auth: For MCP server authentication
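As an illustration of what header-based authentication amounts to, the sketch below checks an incoming request's headers against an expected name and value. n8n performs this check itself once the credential is configured; the function name `header_auth_ok` and the header name in the example are hypothetical.

```python
import hmac


def header_auth_ok(headers: dict[str, str], name: str, expected_value: str) -> bool:
    """Return True if `headers` carries `name` with exactly `expected_value`.

    Hypothetical helper for illustration; not part of n8n or MCP.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    supplied = normalized.get(name.lower())
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(supplied.encode(), expected_value.encode())


# "X-API-Key" is an assumed header name; use whatever your credential defines.
print(header_auth_ok({"X-API-Key": "s3cret"}, "x-api-key", "s3cret"))
```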
Setup Instructions
- Import the template into your n8n instance
- Configure credentials for all required services
- Set up a Weaviate collection named "KnowledgeDocuments"
- Configure the webhook paths for the MCP server and the form trigger
- Test the workflow by uploading a PDF document
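For the collection setup step, the sketch below builds a class definition of the kind Weaviate's schema endpoint accepts. Only the collection name comes from this template. The property names are assumptions derived from the metadata fields listed under Features, and `"vectorizer": "none"` assumes the vectors are computed outside Weaviate (here, by Cohere in the workflow). The n8n Weaviate node may expect or create a different layout, so treat this as a starting point to compare against your instance.

```python
import json


def knowledge_documents_schema() -> dict:
    """Build a Weaviate class definition for the template's collection.

    Property names are assumed for illustration, not taken from the workflow.
    """
    return {
        "class": "KnowledgeDocuments",
        # Vectors are supplied by the client rather than a Weaviate module.
        "vectorizer": "none",
        "properties": [
            {"name": "filename", "dataType": ["text"]},
            {"name": "content", "dataType": ["text"]},
            {"name": "source", "dataType": ["text"]},
            {"name": "uploadedAt", "dataType": ["date"]},
        ],
    }


# Print the JSON body that could be sent to the schema endpoint.
print(json.dumps(knowledge_documents_schema(), indent=2))
```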
Workflow Overview
PDF Upload → Text Extraction → Document Processing → Vector Storage → AI Search
↓ ↓ ↓ ↓ ↓
Form Trigger → Mistral OCR → Prepare Metadata → Weaviate DB → MCP Server
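The "Prepare Metadata" stage can be pictured as wrapping the extracted text in a small record before it is embedded and stored. The sketch below is an assumption about its shape, based on the metadata fields this template lists (filename, content, source, upload timestamp). The function name `prepare_document`, the key `uploadedAt`, and the default source value are hypothetical.

```python
from datetime import datetime, timezone


def prepare_document(filename: str, content: str, source: str = "upload") -> dict:
    """Attach basic metadata to text extracted from a PDF.

    Hypothetical helper; key names are illustrative, not the workflow's own.
    """
    return {
        "filename": filename,
        "content": content,
        "source": source,
        # Timezone-aware ISO 8601 timestamp of when the record was built.
        "uploadedAt": datetime.now(timezone.utc).isoformat(),
    }


doc = prepare_document("handbook.pdf", "Extracted text goes here.")
print(sorted(doc))
```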
Use Cases
- Knowledge Base Management: Create searchable repositories of company documents
- Research Documentation: Process and search through research papers and reports
- Legal Document Search: Index and search through legal documents and contracts
- Technical Documentation: Make technical manuals and guides searchable
- Academic Literature: Process and search through academic papers and publications
Important Notes
- Model Consistency: Use the same embedding model for both storage and retrieval; vectors produced by different models are not comparable
- Collection Management: Ensure the Weaviate collection exists under the name the workflow expects ("KnowledgeDocuments")
- API Limits: Be aware of rate limits for the Mistral, Cohere, and Weaviate APIs
- Document Size: Consider chunking large documents for optimal processing
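One common way to cope with API rate limits is to retry with exponential backoff. The sketch below is a generic pattern, not something this template ships. `RateLimitError` is a hypothetical stand-in for whatever your client raises on an HTTP 429 response, and the delays are illustrative.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call `fn`, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting, for example by recording the requested delays in a list.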
License
This template is provided as-is for educational and commercial use.