LLM & NLP

LLM RAG with Local GPU Search

Built a Retrieval-Augmented Generation (RAG) system that runs locally with GPU-accelerated search. Implemented a document ingestion pipeline, FAISS vector indexing, and a chat interface for querying documents with context-aware responses.

Overview

This project implements a complete Retrieval-Augmented Generation (RAG) system that runs locally on GPU-accelerated hardware. Users upload documents, which are processed, chunked, and indexed with FAISS for efficient vector similarity search. They can then query the documents through a chat interface: vector search retrieves the most relevant document chunks, and an LLM uses them as context to generate the response.

Approaches

Document Ingestion Pipeline

Implemented ingest.py to process and chunk documents, extracting text content and preparing it for vectorization.
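
The chunking step can be illustrated with a minimal sketch. The code in ingest.py is not shown here, so the function name `chunk_text`, the character-based windows, and the `chunk_size` and `overlap` defaults are all assumptions chosen for illustration, not the project's actual settings.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Collapse runs of whitespace so chunk boundaries are not wasted on blanks.
    cleaned = " ".join(text.split())

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(cleaned), step):
        chunks.append(cleaned[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(cleaned):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Overlapping windows are one common way to keep a sentence that straddles a boundary retrievable from at least one chunk; splitting on sentences or tokens instead would be an equally reasonable choice.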

FAISS Vector Indexing

Used FAISS (Facebook AI Similarity Search) for efficient vector storage and similarity search, enabling fast retrieval of relevant document chunks.
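
For intuition about what the index computes, here is a dependency-free sketch of exact top-k similarity search. It deliberately does not use FAISS: it mirrors the brute-force inner-product search that a flat (non-quantized) FAISS index performs, using plain Python lists. The names `normalize` and `top_k` are hypothetical, and the toy 2-dimensional vectors stand in for real embeddings.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, vectors, k=3):
    """Return the k best (index, score) pairs, highest cosine similarity first."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(vectors)
    )
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [(i, score) for score, i in best]


if __name__ == "__main__":
    chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
    for index, score in top_k([1.0, 0.1], chunk_vectors, k=2):
        print(index, round(score, 3))
```

This linear scan costs one dot product per stored vector for every query, which is the work a vector index is meant to make fast on large collections.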

GPU-Accelerated Search

Ran vector operations on the GPU to speed up similarity search, which matters most for large document collections.

RAG Chat Interface

Built a web-based chat application (my-chat-app) that allows users to query documents and receive context-aware responses generated by combining retrieved document chunks with LLM generation.
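
Combining retrieved chunks with the user's question usually comes down to assembling a prompt. The sketch below shows one way to do that; the function name `build_prompt`, the instruction wording, and the `max_chars` budget are assumptions for illustration and are not taken from my-chat-app.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Assemble an LLM prompt from retrieved chunks (hypothetical helper)."""
    context_parts = []
    used = 0
    for n, chunk in enumerate(chunks, start=1):
        entry = f"[{n}] {chunk.strip()}"
        # Stop adding chunks once the context budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)

    context = "\n".join(context_parts) if context_parts else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = ["FAISS is a library for similarity search.", "It supports GPUs."]
    print(build_prompt("What is FAISS?", retrieved))
```

Numbering the chunks lets the model's answer be traced back to its sources, and the character budget keeps the prompt within whatever context limit the chosen LLM has.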

Results

  • Implemented a local RAG system with GPU acceleration
  • Created an efficient document ingestion and indexing pipeline
  • Developed a user-friendly chat interface for document queries
  • Achieved fast vector search with FAISS

Technical Details

  • Used Python for backend processing and document ingestion
  • Integrated FAISS for vector indexing and similarity search
  • Built chat interface with JavaScript, HTML, and CSS
  • Configured GPU acceleration for enhanced performance
  • Created modular architecture separating ingestion, indexing, and querying

Technologies Used

Python, JavaScript, FAISS, LLM, RAG, GPU, Vector Search