LLM RAG with Local GPU Search
Built a Retrieval-Augmented Generation (RAG) system running locally with GPU-accelerated search. Implemented a document ingestion pipeline, FAISS vector indexing, and a chat interface for querying documents with context-aware responses.
Overview
This project implements a complete Retrieval-Augmented Generation (RAG) system that runs locally on GPU-accelerated hardware. The system enables users to upload documents, which are processed and indexed using FAISS for efficient vector similarity search. Users can then query the documents through a chat interface, receiving context-aware responses generated by an LLM from the relevant document chunks retrieved via vector search.
Approaches
Document Ingestion Pipeline
Implemented ingest.py to process and chunk documents, extracting text content and preparing it for vectorization.
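As a rough illustration, the chunking step might look like the sketch below. The chunk size, overlap, and plain-text file handling are assumptions for illustration, not the exact logic in ingest.py.

```python
# Illustrative sketch of a chunking step; chunk_size and overlap
# values are assumptions, not the project's actual settings.
from pathlib import Path

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context survives chunk boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def ingest_directory(doc_dir: str) -> list[str]:
    """Read every .txt file in doc_dir and return all of its chunks."""
    all_chunks = []
    for path in Path(doc_dir).glob("*.txt"):
        all_chunks.extend(chunk_text(path.read_text(encoding="utf-8")))
    return all_chunks
```

Overlapping chunks are a common default in RAG pipelines: they trade a little index size for answers that don't lose sentences split across chunk boundaries.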
FAISS Vector Indexing
Used FAISS (Facebook AI Similarity Search) for efficient vector storage and similarity search, enabling fast retrieval of relevant document chunks.
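A minimal sketch of the indexing step is shown below, assuming a sentence-transformers embedding model; the model name and sample chunks are illustrative, since the source does not specify which embedding model is used.

```python
# Hypothetical indexing step: embed chunks, then build a flat L2 FAISS index.
# The embedding model is an assumption; any sentence-embedding model works.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = [
    "FAISS is a library for efficient similarity search.",
    "RAG combines retrieval with LLM generation.",
]  # stand-in chunks; in practice these come from the ingestion pipeline

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
embeddings = np.asarray(model.encode(chunks), dtype="float32")

index = faiss.IndexFlatL2(embeddings.shape[1])    # exact L2 search over vectors
index.add(embeddings)
faiss.write_index(index, "docs.index")            # persist for later queries
```

A flat index performs exact search, which is the simplest correct baseline; FAISS also offers approximate index types (e.g. IVF, HNSW) when collections grow large.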
GPU-Accelerated Search
Leveraged GPU acceleration for vector operations, significantly improving search speed and system performance for large document collections.
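With the faiss-gpu build, moving an existing index onto the GPU is a one-call conversion. The sketch below reuses the index file name from the previous example; the query vector is a random stand-in.

```python
# Moving a FAISS index to the GPU (requires the faiss-gpu build).
# The index file name and k are carried over from the sketch above.
import faiss
import numpy as np

cpu_index = faiss.read_index("docs.index")
res = faiss.StandardGpuResources()                      # allocate GPU resources
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)   # copy index to device 0

query = np.random.rand(1, cpu_index.d).astype("float32")  # stand-in query vector
distances, indices = gpu_index.search(query, 4)            # top-4 nearest chunks
```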
RAG Chat Interface
Built a web-based chat application (my-chat-app) that allows users to query documents and receive context-aware responses generated by combining retrieved document chunks with LLM generation.
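The query flow behind the interface can be sketched as follows. The function and parameter names are hypothetical, and the llm argument stands in for whatever local model handles generation.

```python
# Hypothetical sketch of the RAG query flow: retrieve the top-k chunks,
# then prompt the LLM with them as context.
from typing import Callable

def answer_query(query: str, embed_model, index, chunks: list[str],
                 llm: Callable[[str], str], k: int = 4) -> str:
    """Retrieve the top-k chunks for the query and prompt the LLM with them."""
    query_vec = embed_model.encode([query]).astype("float32")
    _, ids = index.search(query_vec, k)               # nearest chunk ids
    context = "\n\n".join(chunks[i] for i in ids[0])  # assemble retrieved context
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)  # llm: any text-in/text-out callable (local model, etc.)
```

Keeping the LLM behind a plain callable keeps retrieval and generation decoupled, matching the modular ingestion/indexing/querying split described under Technical Details.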
Results
- Successfully implemented a local RAG system with GPU acceleration
- Created an efficient document ingestion and indexing pipeline
- Developed a user-friendly chat interface for document queries
- Achieved fast vector search performance with FAISS
Technical Details
- Used Python for backend processing and document ingestion
- Implemented FAISS for vector similarity search and indexing
- Built chat interface with JavaScript, HTML, and CSS
- Configured GPU acceleration for enhanced performance
- Created modular architecture separating ingestion, indexing, and querying