Production-Grade RAG Platform

NovusAI

An enterprise-ready retrieval-augmented generation platform that transforms multimodal documents, emails, and legal content into citation-backed intelligence. Built with hybrid vector search, real-time streaming, and GDPR-compliant workflows to compress research cycles and enable data-driven decision-making at scale.


Explore a working example of the platform by logging in with
Username: Sample
Password: sample

Project Overview

AI-First Legal Intelligence Platform

NovusAI ingests heterogeneous data (contracts, scanned forms, email threads, and annotated diagrams) through multimodal pipelines that combine OCR, transformer embeddings, and hybrid BM25+vector retrieval. It returns contextual, citation-backed answers through a natural-language interface.

The platform is built for legal and commercial teams that need to parse high-volume, high-stakes information quickly. It emphasizes compliance, transparency, and scalability with GDPR-aligned storage, auditable queries, and automated summarization workflows. Users can search across diverse sources simultaneously, receiving answers grounded with citations back to original documents.

NovusAI is designed as a set of containerized services that can be deployed and scaled quickly. It supports concurrent user sessions with low-latency retrieval backed by caching and vector search. Its modular design also leaves room for future integrations, such as Drive connectors, fine-tuned domain-specific models, and extended analytics dashboards.

With built-in billing integration, real-time dashboards, and user/organization management, the platform demonstrates how AI-first legal tools can move from prototype to production-grade enterprise readiness. Workflows are covered by automated tests (90% test coverage) and backed by monitoring, disaster recovery, and 12 months of post-launch SLA support.

My Role

Full-Stack Product Engineering

  • Architected Flask microservices backend with blueprint routing & RESTful APIs
  • Implemented JWT+Redis auth with RBAC, audit trails, and usage gating
  • Integrated React 19 frontend with SSE streaming, KaTeX rendering, and PDF/HTML export
  • Built Tesseract+OpenCV OCR pipeline with a 40%+ improvement in text extraction from low-quality scans
  • Developed ingestion for PDFs, images, EML, CSV, and DOCX with auto executive summaries
  • Designed hybrid RAG system (BM25 + vector fusion with grounding checks)
  • Fine-tuned model parameters and retraining schedules based on feedback and metrics
  • Deployed Dockerized stack: Flask, React, Qdrant, PostgreSQL, Redis, Nginx
  • Engineered Gmail OAuth2 sync with 30s polling, sanitization, and deduplication
  • Delivered dashboard with real-time usage, email activity, file storage, and user management
  • Integrated Stripe billing with subscription tracking, top-up credits, and usage meters
  • Implemented GDPR compliance: Right to Erasure, data export, retention limits, and audit logs

Technical Architecture

Production ML Operations Stack

Modular design prioritizing fault tolerance, multi-tenant isolation, and horizontal scalability. Each service communicates via authenticated APIs with Redis-backed caching, Postgres transaction management, and Qdrant collection-per-user vector isolation.

Complete System Architecture

Client Layer

  • React Frontend: SSE streaming, JWT auth, KaTeX
  • Gmail Polling: OAuth2, 30s cycle, 5 msgs/user/cycle
  • File Upload: drag-drop UI; PDF, DOCX, EML

API Gateway

  • Flask API Server (port 5001)
  • Blueprints: auth, chat, rag, upload, gmail, billing, backups
  • @token_required decorator, Redis session cache, CORS

AI Services

  • RAG Engine: hybrid BM25+vector, RRF fusion, LlamaIndex
  • OCR Pipeline: Tesseract+OpenCV, pdf2image, adaptive threshold
  • OpenAI Gateway: GPT-4 inference, embeddings (3072d), usage tracking
  • Memory Service: chat context, session memory, vector storage
  • Billing System: Stripe integration, usage quotas, org subscriptions

Data Layer

  • PostgreSQL: users, emails, documents, sessions; JSONB metadata
  • Qdrant (3072d): per-user collections, cosine similarity, payload filtering
  • Redis Cache: session tokens, usage counters, RAG cache
  • S3 Storage: file uploads, backups, encrypted at rest

Request Flow

  • React sends authenticated POST with JWT Bearer token
  • Flask validates token via Redis cache, extracts user context
  • RAG engine performs hybrid retrieval (BM25+vector RRF)
  • LlamaIndex synthesizes answer with grounding checks
  • Response streams via SSE with token/usage/metadata events

Data Persistence

  • PostgreSQL stores structured metadata (JSONB, TIMESTAMPTZ)
  • Qdrant maintains per-user vector collections with payloads
  • Redis caches session state, usage counters, org subscriptions
  • S3 handles file uploads, automated backups, disaster recovery
🔐

Multi-Tenant Security

JWT+Redis auth with @token_required decorators, per-user Qdrant collections, RBAC with audit trails, and organization-level usage quotas enforced at every API boundary.
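
As a rough illustration of the decorator pattern described above, the following framework-free sketch resolves a bearer token against a session cache before the handler runs. Everything here is an assumption made for illustration: the in-memory `SESSION_CACHE` dict stands in for Redis, the `Unauthorized` exception and the `user_<id>` collection naming are hypothetical, and JWT signature verification is deliberately left out.

```python
import functools

# Stand-in for the Redis session cache (hypothetical contents).
SESSION_CACHE = {"token-abc": {"user_id": 42, "role": "admin"}}


class Unauthorized(Exception):
    """Raised when a request carries no valid bearer token."""


def token_required(handler):
    """Resolve the bearer token to a user context before calling the handler."""
    @functools.wraps(handler)
    def wrapper(headers, *args, **kwargs):
        scheme, _, token = headers.get("Authorization", "").partition(" ")
        if scheme != "Bearer" or not token:
            raise Unauthorized("missing bearer token")
        user = SESSION_CACHE.get(token)
        if user is None:
            raise Unauthorized("unknown or expired token")
        # The handler only ever sees the resolved user, never the raw token.
        return handler(user, *args, **kwargs)
    return wrapper


@token_required
def collection_for_request(user):
    # Hypothetical naming scheme for a per-user vector collection.
    return f"user_{user['user_id']}"


collection = collection_for_request({"Authorization": "Bearer token-abc"})
```

Because the collection name is derived from the authenticated user context rather than from request parameters, a caller cannot ask for another tenant's collection.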

📧

Gmail Auto-Sync

Background polling daemon fetches 5 emails/user every 30 seconds via OAuth2, extracts metadata, sanitizes HTML, generates embeddings, and maintains checkpoint-based resumable processing.

💳

Stripe Integration

Subscription management with PaymentIntents, usage-based billing (15,000 credits/month base), automatic top-ups, and Redis-cached org subscription status for sub-millisecond gating.
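
The credit gating described above can be sketched as a simple balance check. This is a minimal in-memory illustration, not the platform's code: the `OrgUsage` class and `try_consume` function are hypothetical names, and only the 15,000-credit base allowance comes from the text. A Redis-backed version would need the check and the increment to happen atomically.

```python
from dataclasses import dataclass

BASE_MONTHLY_CREDITS = 15_000  # base allowance quoted in the text


@dataclass
class OrgUsage:
    """Per-organization usage for the current billing month (hypothetical shape)."""
    used: int = 0
    top_up: int = 0

    @property
    def remaining(self) -> int:
        return BASE_MONTHLY_CREDITS + self.top_up - self.used


def try_consume(usage: OrgUsage, cost: int) -> bool:
    """Deduct `cost` credits if the organization can afford it; otherwise refuse."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    if cost > usage.remaining:
        return False
    usage.used += cost
    return True


org = OrgUsage(used=14_990)
first_attempt = try_consume(org, 20)   # only 10 credits left, so this is refused
org.top_up += 1_000                    # simulate a top-up purchase
second_attempt = try_consume(org, 20)  # now succeeds
```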

Email Processing Pipeline

Automated Intelligence Extraction

Continuous background polling service with OAuth2 authentication, incremental checkpointing, multipart MIME parsing, HTML sanitization, and dual-storage architecture for structured+semantic access.

Email Ingestion & Embedding Flow

Stage 1: Discovery

  • Gmail API with OAuth2 credentials
  • messages.list() with maxResults=5

Stage 2: Extraction

  • Content parser: RFC 2822 headers, multipart MIME, Base64 decode

Stage 3: Sanitize

  • Sanitizer: HTML → text, strip scripts, link removal

Stage 4: Format

  • Format engine: metadata extraction, subject + body, timestamp normalization

Storage

  • PostgreSQL: INSERT INTO emails (user_id, gmail_msg_id, subject, from_addr, body, labels[]); JSONB metadata, TIMESTAMPTZ
  • OpenAI Embeddings: text-embedding-3-large, 3072 dimensions, $0.00013/1K tokens
  • Qdrant Vector DB: PointStruct(id, vector, payload), cosine distance index, per-user collection

Checkpoint System

  • UPDATE gmail_tokens SET last_seen_email_id = %s WHERE user_id = %s
  • Enables resumable processing, prevents duplicates, handles API failures
  • Incremental sync: only process emails newer than checkpoint

Performance

  • 5 emails/user/30s = 600/hr
  • Multi-tenant isolation: user failures don't cascade

Processing Characteristics

  • Throughput: 150 emails/hour/user with 30s polling interval
  • Deduplication: Gmail Message ID + checkpoint tracking
  • Error Recovery: Per-user isolation with automatic retry
  • Cost Efficiency: Batch embeddings, incremental processing
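
The checkpoint and deduplication behaviour listed above can be sketched as a single polling cycle for one user. This is a simplified model under stated assumptions: `sync_cycle` is a hypothetical function, the mailbox is an oldest-first list of message IDs standing in for the Gmail API listing, and the `state` dict stands in for the `gmail_tokens` row.

```python
def sync_cycle(mailbox_ids, state, process, batch_size=5):
    """Process at most `batch_size` unseen messages and advance the checkpoint.

    mailbox_ids: message IDs ordered oldest-first (stand-in for the API listing).
    state: dict with 'last_seen_email_id' and 'seen_ids' (stand-in for the DB row).
    process: callable invoked once per new message ID.
    """
    last = state.get("last_seen_email_id")
    start = mailbox_ids.index(last) + 1 if last in mailbox_ids else 0
    processed = []
    for msg_id in mailbox_ids[start:]:
        if len(processed) == batch_size:
            break
        if msg_id in state["seen_ids"]:
            continue  # already ingested: skip without reprocessing
        try:
            process(msg_id)
        except Exception:
            break  # stop here; the checkpoint stays at the last success
        state["seen_ids"].add(msg_id)
        state["last_seen_email_id"] = msg_id
        processed.append(msg_id)
    return processed


mailbox = [f"m{i}" for i in range(1, 9)]
state = {"last_seen_email_id": None, "seen_ids": set()}
first = sync_cycle(mailbox, state, process=lambda msg_id: None)
second = sync_cycle(mailbox, state, process=lambda msg_id: None)
```

The checkpoint only advances after a message has been processed, so a failure mid-cycle leaves the next cycle to resume from the last successful message. With a 30-second interval and a batch size of 5, the ceiling for one user is 5 × 120 cycles = 600 messages per hour.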

Data Safety

  • Content Sanitization: XSS prevention, link removal
  • ACID Transactions: Rollback on partial failures
  • Multi-Tenant: Complete user data isolation
  • GDPR Compliance: Right to erasure, audit trails
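
A minimal sketch of the content sanitization step, using only the standard library's `html.parser`. The class and function names are hypothetical, and replacing each link (including its anchor text) with a `[link removed]` placeholder is an assumed policy rather than the platform's confirmed behaviour.

```python
from html.parser import HTMLParser


class EmailTextExtractor(HTMLParser):
    """Collect visible text, dropping script/style content and hyperlinks."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._in_link = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._in_link = True
            self._parts.append("[link removed]")  # assumed placeholder text

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._skip_depth == 0 and not self._in_link:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed markup.
        return " ".join("".join(self._parts).split())


def sanitize_email_html(html):
    parser = EmailTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


clean = sanitize_email_html(
    '<p>Hi <script>alert(1)</script>see <a href="http://x.test">this</a></p>'
)
```

Converting to plain text before storage means no markup reaches the database or the embedding step, which is a simpler guarantee than trying to allow-list safe HTML.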

RAG Query Lifecycle

Hybrid Retrieval Architecture

Combines sparse BM25 lexical matching with dense vector similarity using Reciprocal Rank Fusion. LlamaIndex orchestrates query engines with cached index handles, grounding validation, and citation tracing back to source documents.

Query Processing & Response Generation

Participants: User Query, RAG Service, PostgreSQL, Qdrant, OpenAI, SSE Stream

  • User submits a query, e.g. "Summarize vendor emails"
  • RAG Service → PostgreSQL: SELECT * WHERE body @@ to_tsquery; returns BM25 results (top 20)
  • RAG Service → Qdrant: search(query_vector, top_k=20); returns vector results + cosine scores
  • RRF Fusion: merge BM25+vector, top-k=5 final
  • Context Builder: annotate sources, truncate to 8K chars
  • RAG Service → OpenAI: generate with grounding; returns answer + citations
  • Grounding Check: verify citation IDs
  • SSE Stream: token, usage, meta, done
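
The context-building step (annotate sources, truncate to 8K characters) might look roughly like the sketch below. The `build_context` function, the `[S1]`-style source tags, and the exact limit of 8,000 characters are assumptions made for illustration.

```python
MAX_CONTEXT_CHARS = 8000  # "8K chars" in the flow; the exact value is assumed


def build_context(chunks, max_chars=MAX_CONTEXT_CHARS):
    """Join ranked (source_id, text) pairs into one prompt context.

    Each chunk is prefixed with its source ID so the model can cite it,
    and the combined text is cut off once the character budget is spent.
    """
    parts = []
    used = 0
    for source_id, text in chunks:
        remaining = max_chars - used
        if remaining <= 0:
            break
        block = f"[{source_id}] {text.strip()}\n"
        # Trim the final block instead of dropping it, so the budget is filled.
        block = block[:remaining]
        parts.append(block)
        used += len(block)
    return "".join(parts)


context = build_context([("S1", "a" * 5000), ("S2", "b" * 5000)])
```

Because chunks arrive in ranked order, truncation removes the lowest-ranked material first.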

Retrieval Strategy

  • Lexical (BM25-style) matching via PostgreSQL tsvector full-text search
  • Dense 3072-d OpenAI embeddings
  • Reciprocal Rank Fusion (k=60)
  • User-scoped filtering (multi-tenant)
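
Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below shows the idea with k=60 and a final cut of 5, matching the figures above; the function name and the sample document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several best-first lists of document IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


bm25_hits = ["d3", "d1", "d7"]     # lexical results, best first
vector_hits = ["d1", "d9", "d3"]   # dense results, best first
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be normalized onto a common scale. Documents that appear in both lists (here `d1` and `d3`) rise above those found by only one retriever.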

Answer Quality

  • Citation validation (grounding_ok)
  • Source ID verification
  • Confidence scoring
  • Fallback to broader search
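
Citation validation can be illustrated by checking that every source ID cited in an answer was actually part of the retrieved context. The `[S1]`-style citation format and the `check_grounding` function are assumptions; only the `grounding_ok` flag name comes from the list above.

```python
import re

CITATION_PATTERN = re.compile(r"\[(S\d+)\]")  # hypothetical citation format


def check_grounding(answer, source_ids):
    """Report which cited IDs are known and whether the answer is grounded."""
    cited = set(CITATION_PATTERN.findall(answer))
    unknown = sorted(cited - set(source_ids))
    return {
        # Grounded means at least one citation and none pointing outside the context.
        "grounding_ok": bool(cited) and not unknown,
        "cited": sorted(cited),
        "unknown": unknown,
    }


result = check_grounding(
    "Payment is due in 30 days [S1][S4].",
    source_ids=["S1", "S2", "S3"],
)
```

In this example `S4` was never retrieved, so `grounding_ok` is false. A failed check like this is one plausible trigger for the fallback to a broader search.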

Performance

  • Sub-100ms vector search
  • Cached LlamaIndex engines
  • Redis RAG result cache
  • Parallel BM25+vector queries

Technology Stack

Production Tools & Frameworks

Backend & AI

  • Flask 3.x
  • Python 3.11
  • OpenAI GPT-4
  • LlamaIndex
  • Qdrant 1.15
  • FAISS
  • Tesseract OCR
  • OpenCV
  • pdf2image
  • pytesseract
  • python-docx
  • Playwright

Frontend & Infrastructure

  • React 19.1
  • TypeScript
  • React Router
  • ReactMarkdown
  • KaTeX
  • PostgreSQL 12+
  • Redis 7.x
  • Docker Compose
  • Nginx
  • Stripe API
  • Gmail API
  • OAuth2

Impact & Metrics

Production Deployment Results

Research Acceleration

Hybrid RAG with citation tracing eliminates hours of manual document review. Users locate specific clauses, financial figures, and contract terms in seconds via semantic queries.

📈

Drafting Efficiency

Context-aware generation with automatic citations produces investor briefs, legal memos, and executive summaries in minutes instead of hours, with full source traceability.

🔐

Enterprise Compliance

GDPR-aligned workflows with right to erasure, comprehensive audit trails, RBAC enforcement, and organization-level usage quotas ensure regulatory compliance.

10+ Concurrent Users
40GB Indexed Content
150 Emails/Hr/User
90% Test Coverage
<100ms Vector Search
3072d Embedding Dims
12mo SLA Support
Docker Containerized

Technical Challenges

Engineering Solutions Delivered

Multimodal Document Processing

Built resilient OCR pipeline with Tesseract+OpenCV adaptive thresholding, improving text extraction from low-quality scans by 40%. Implemented semantic chunking with overlap to preserve context across page boundaries, and added vision API integration for image captions and searchable tags.
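
To illustrate why overlap preserves context across boundaries, here is a deliberately simplified sketch that uses fixed-size sliding windows rather than the semantic chunking described above. The function name and the window sizes are assumptions.

```python
def chunk_with_overlap(items, chunk_size=200, overlap=40):
    """Split a sequence into windows that share `overlap` items with their neighbour."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # the last window already reaches the end of the input
    return chunks


words = "the tenant shall pay rent on the first day of each month".split()
chunks = chunk_with_overlap(words, chunk_size=5, overlap=2)
```

Each window repeats the last two words of the previous one, so a clause that straddles a page or chunk boundary still appears intact in at least one chunk.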

Hybrid Retrieval Optimization

Architected Reciprocal Rank Fusion combining BM25 sparse retrieval with 3072-d dense vectors. Tuned re-ranking strategies and achieved 85%+ relevance scores on legal domain queries through iterative testing with client-provided representative datasets.

Scalable Vector Operations

Migrated from in-memory FAISS to Qdrant for distributed vector search with per-user collection isolation. Implemented incremental indexing with LlamaIndex and manual fallback upserts to guarantee payload integrity (user_id, doc_id fields) for multi-tenant security.

Real-Time Streaming Architecture

Designed SSE-based token streaming with parallel usage tracking and citation metadata. Implemented backpressure handling, connection recovery, and structured event types (token, usage, meta, done) for reliable delivery under high load.
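
The structured event types can be sketched as a generator that yields Server-Sent Events frames. The event names (token, usage, meta, done) come from the text; the payload field names and the `stream_answer` function are hypothetical, and backpressure and reconnection are not modelled here.

```python
import json


def sse_event(event, data):
    """Format one SSE frame: an event name, a JSON data line, and a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_answer(tokens, usage, citations):
    """Yield token frames first, then usage, citation metadata, and a final marker."""
    for token in tokens:
        yield sse_event("token", {"text": token})
    yield sse_event("usage", usage)
    yield sse_event("meta", {"citations": citations})
    yield sse_event("done", {})


frames = list(stream_answer(["Hel", "lo"], {"total_tokens": 2}, ["S1"]))
```

Encoding each payload as JSON keeps it on a single `data:` line, since `json.dumps` escapes newlines. The explicit `done` event lets the client distinguish a finished answer from a dropped connection.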

Email Automation at Scale

Engineered continuous polling daemon with OAuth2 refresh, checkpoint-based resumable processing, and per-user error isolation. Processes 150 emails/hour/user with deduplication via Gmail Message IDs and automatic sanitization (XSS prevention, link removal).

Production Operations

Dockerized all services with health checks, automated backups (weekly Qdrant/Postgres snapshots), Redis-backed session cache, Stripe billing integration, and comprehensive audit logging. Deployed with Nginx reverse proxy and SSL termination.