NovusAI — Enterprise RAG Platform

Project Overview

AI-First Legal Intelligence Platform

NovusAI ingests heterogeneous data (contracts, scanned forms, email threads, and annotated diagrams) through multimodal pipelines that combine OCR, transformer embeddings, and hybrid BM25+vector retrieval. It delivers contextual, citation-backed answers through a natural language interface.

The platform is built for legal and commercial teams that need to parse high-volume, high-stakes information quickly. It emphasizes compliance, transparency, and scalability with GDPR-aligned storage, auditable queries, and automated summarization workflows. Users can search across diverse sources simultaneously and receive answers grounded in citations to the original documents.

NovusAI is designed as a containerized service mesh enabling rapid deployment and scaling. It supports concurrent user sessions with low-latency retrieval augmented by caching and vector search. Its modular design also allows for future integrations, such as Drive connectors, fine-tuned domain-specific models, and extended analytics dashboards.

With built-in billing integration, real-time dashboards, and user/organization management, the platform demonstrates how AI-first legal tools can move from prototype to production-grade enterprise readiness. Each workflow is covered by QA testing and backed by monitoring, disaster recovery, and post-launch support guarantees.

My Role

Full-Stack Product Engineering

Architected Flask microservices backend with blueprint routing & RESTful APIs
Implemented JWT+Redis auth with RBAC, audit trails, and usage gating
Integrated React 19 frontend with SSE streaming, KaTeX rendering, and PDF/HTML export
Built Tesseract+OpenCV OCR pipeline with 40%+ improvement in text extraction from low-quality scans
Developed ingestion for PDFs, images, EML, CSV, and DOCX with auto executive summaries
Designed hybrid RAG system (BM25 + vector fusion with grounding checks)
Fine-tuned model parameters and retraining schedules based on feedback and metrics
Deployed Dockerized stack: Flask, React, Qdrant, PostgreSQL, Redis, Nginx
Engineered Gmail OAuth2 sync with 30s polling, sanitization, and deduplication
Delivered Dashboard with real-time usage, email activity, file storage, and user management
Integrated Stripe billing with subscription tracking, top-up credits, and usage meters
Implemented GDPR compliance: Right to Erasure, data export, retention limits, and audit logs

Technical Architecture

Production ML Operations Stack

Modular design prioritizing fault tolerance, multi-tenant isolation, and horizontal scalability. Each service communicates via authenticated APIs with Redis-backed caching, Postgres transaction management, and Qdrant collection-per-user vector isolation.

Complete System Architecture

Request Flow

React sends authenticated POST with JWT Bearer token
Flask validates token via Redis cache, extracts user context
RAG engine performs hybrid retrieval (BM25+vector RRF)
LlamaIndex synthesizes answer with grounding checks
Response streams via SSE with token/usage/metadata events

Data Persistence

PostgreSQL stores structured metadata (JSONB, TIMESTAMPTZ)
Qdrant maintains per-user vector collections with payloads
Redis caches session state, usage counters, org subscriptions
S3 handles file uploads, automated backups, disaster recovery

🔐

Multi-Tenant Security

JWT+Redis auth with @token_required decorators, per-user Qdrant collections, RBAC with audit trails, and organization-level usage quotas enforced at every API boundary.
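
The decorator pattern described above can be sketched roughly as follows. This is a minimal illustration, not the platform's code: a simplified HMAC-signed token stands in for a real JWT, a plain dict stands in for the Redis cache, and `SECRET`, `SESSION_CACHE`, `sign`, and `verify` are hypothetical names. The wrapped handler receives a headers mapping instead of a Flask request so the example runs with the standard library alone.

```python
import functools
import hashlib
import hmac
import time

SECRET = b"demo-secret"   # hypothetical signing key, illustration only
SESSION_CACHE = {}        # in-memory stand-in for the Redis token cache


class AuthError(Exception):
    """Raised when a request carries no valid token."""


def sign(user_id, expires_at):
    """Issue a simplified HMAC-signed token (not a real JWT)."""
    body = f"{user_id}.{expires_at}"
    mac = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{mac}"


def verify(token):
    """Return the user context for a token, consulting the cache first."""
    cached = SESSION_CACHE.get(token)
    if cached is not None and cached["exp"] > time.time():
        return cached
    try:
        user_id, exp, mac = token.rsplit(".", 2)
        expires_at = int(exp)
    except ValueError:
        raise AuthError("malformed token") from None
    expected = hmac.new(SECRET, f"{user_id}.{exp}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        raise AuthError("bad signature")
    if expires_at <= time.time():
        raise AuthError("token expired")
    context = {"user_id": user_id, "exp": expires_at}
    SESSION_CACHE[token] = context  # later requests skip signature verification
    return context


def token_required(handler):
    """Reject calls without a valid Bearer token; pass user context to the handler."""
    @functools.wraps(handler)
    def wrapper(headers, *args, **kwargs):
        auth = headers.get("Authorization", "")
        if not auth.startswith("Bearer "):
            raise AuthError("missing bearer token")
        user = verify(auth[len("Bearer "):])
        return handler(user, *args, **kwargs)
    return wrapper


@token_required
def whoami(user):
    return user["user_id"]
```

Calling `whoami({"Authorization": "Bearer " + sign("u1", int(time.time()) + 60)})` returns the user ID, while a missing or expired token raises `AuthError` before the handler runs.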

📧

Gmail Auto-Sync

Background polling daemon fetches 5 emails/user every 30 seconds via OAuth2, extracts metadata, sanitizes HTML, generates embeddings, and maintains checkpoint-based resumable processing.

💳

Stripe Integration

Subscription management with PaymentIntents, usage-based billing (15,000 credits/month base), automatic top-ups, and Redis-cached org subscription status for sub-millisecond gating.
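
A minimal sketch of credit-based gating, assuming a monthly base allowance of 15,000 credits plus purchased top-ups. The `OrgUsage` record and function names are hypothetical; in the deployed system these counters would live in Redis and the check-and-increment would need to be atomic, which this in-memory version does not attempt.

```python
from dataclasses import dataclass

BASE_MONTHLY_CREDITS = 15_000  # base allowance stated in the text


@dataclass
class OrgUsage:
    """Per-organization counters (hypothetical shape)."""
    used: int = 0     # credits consumed this billing period
    top_up: int = 0   # extra credits purchased this period


def remaining_credits(usage):
    return BASE_MONTHLY_CREDITS + usage.top_up - usage.used


def try_consume(usage, cost):
    """Deduct `cost` credits if the org can afford it; return whether it succeeded."""
    if cost < 0:
        raise ValueError("cost must be non-negative")
    if cost > remaining_credits(usage):
        return False  # gate the request; caller may trigger a top-up flow
    usage.used += cost
    return True
```

A request handler would call `try_consume` before running an expensive query and return a quota error when it yields `False`.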

Email Processing Pipeline

Automated Intelligence Extraction

Continuous background polling service with OAuth2 authentication, incremental checkpointing, multipart MIME parsing, HTML sanitization, and dual-storage architecture for structured+semantic access.

Email Ingestion & Embedding Flow

Processing Characteristics

Throughput: 150 emails/hour/user with 30s polling interval
Deduplication: Gmail Message ID + checkpoint tracking
Error Recovery: Per-user isolation with automatic retry
Cost Efficiency: Batch embeddings, incremental processing
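
The deduplication and checkpointing characteristics above can be illustrated with a small sketch. It assumes each fetched message is a dict with an `id` (the Gmail message ID) and an `internal_date` timestamp; the `UserSyncState` record, the `handle` callback, and the checkpoint format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserSyncState:
    """Per-user sync state (hypothetical shape)."""
    checkpoint: int = 0                          # newest timestamp fully processed
    seen_ids: set = field(default_factory=set)   # message IDs already ingested


def process_batch(state, messages, handle, batch_size=5):
    """Process up to `batch_size` unseen messages, oldest first.

    State advances only after `handle` succeeds for a message, so a failure
    leaves the checkpoint at the last good message and the next poll resumes there.
    """
    pending = [
        m for m in sorted(messages, key=lambda m: m["internal_date"])
        if m["internal_date"] >= state.checkpoint and m["id"] not in state.seen_ids
    ]
    processed = []
    for msg in pending[:batch_size]:
        handle(msg)  # e.g. sanitize, embed, store; may raise
        state.seen_ids.add(msg["id"])
        state.checkpoint = max(state.checkpoint, msg["internal_date"])
        processed.append(msg["id"])
    return processed
```

Because each user has an independent `UserSyncState`, an exception while handling one user's mail does not affect anyone else's checkpoint.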

Data Safety

Content Sanitization: XSS prevention, link removal
ACID Transactions: Rollback on partial failures
Multi-Tenant: Complete user data isolation
GDPR Compliance: Right to erasure, audit trails
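
As an illustration of the content sanitization step, the sketch below reduces an HTML email body to plain text using only the standard library. It is an assumption about the approach, not the production sanitizer: all tags are dropped, so link targets and event-handler attributes disappear, and the contents of `script` and `style` elements are discarded.

```python
from html.parser import HTMLParser


class EmailSanitizer(HTMLParser):
    """Collect visible text only, skipping script and style contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attributes (href, onclick, ...) are never copied to the output.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def sanitize_html(html):
    """Return whitespace-normalized plain text extracted from `html`."""
    parser = EmailSanitizer()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

The result is plain text; if it is later rendered into a web page it still needs to be HTML-escaped at output time.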

RAG Query Lifecycle

Hybrid Retrieval Architecture

Combines sparse BM25 lexical matching with dense vector similarity using Reciprocal Rank Fusion. LlamaIndex orchestrates query engines with cached index handles, grounding validation, and citation tracing back to source documents.

Query Processing & Response Generation

Retrieval Strategy

BM25 via PostgreSQL tsvector
Dense 3072-d OpenAI embeddings
Reciprocal Rank Fusion (k=60)
User-scoped filtering (multi-tenant)
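
Reciprocal Rank Fusion itself is compact enough to show in full. The sketch below uses the standard formula, score(d) = sum over rankings of 1 / (k + rank), with k=60 as stated above; the function name and the alphabetical tie-break are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); documents are returned by descending total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]     # lexical ranking, best first
vector_hits = ["b", "c", "d"]   # dense ranking, best first
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Here `fused` is `["b", "c", "a", "d"]`: documents found by both retrievers outrank those found by only one, regardless of how the raw BM25 and cosine scores are scaled.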

Answer Quality

Citation validation (grounding_ok)
Source ID verification
Confidence scoring
Fallback to broader search
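
A grounding check of this kind can be sketched as a comparison between the source IDs cited in an answer and the IDs actually retrieved. The bracketed `[doc_id]` citation format and the return shape are assumptions made for illustration; only the `grounding_ok` flag name comes from the text.

```python
import re

CITATION = re.compile(r"\[(\w+)\]")  # hypothetical inline citation format: [doc_id]


def check_grounding(answer, retrieved_ids):
    """Report whether every citation in `answer` refers to a retrieved source."""
    cited = set(CITATION.findall(answer))
    unknown = cited - set(retrieved_ids)
    return {
        # An answer with no citations at all is treated as ungrounded.
        "grounding_ok": bool(cited) and not unknown,
        "cited": sorted(cited),
        "unknown": sorted(unknown),
    }
```

When `grounding_ok` is false, the caller could retry with a broader search, matching the fallback listed above.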

Performance

Sub-100ms vector search
Cached LlamaIndex engines
Redis RAG result cache
Parallel BM25+vector queries

Technology Stack

Production Tools & Frameworks

Backend & AI

Flask 3.x
Python 3.11
OpenAI GPT-4
LlamaIndex
Qdrant 1.15
FAISS
Tesseract OCR
OpenCV
pdf2image
pytesseract
python-docx
Playwright

Frontend & Infrastructure

React 19.1
TypeScript
React Router
ReactMarkdown
KaTeX
PostgreSQL 12+
Redis 7.x
Docker Compose
Nginx
Stripe API
Gmail API
OAuth2

Impact & Metrics

Production Deployment Results

⚡

Research Acceleration

Hybrid RAG with citation tracing eliminates hours of manual document review. Users locate specific clauses, financial figures, and contract terms in seconds via semantic queries.

📈

Drafting Efficiency

Context-aware generation with automatic citations produces investor briefs, legal memos, and executive summaries in minutes instead of hours, with full source traceability.

🔐

Enterprise Compliance

GDPR-aligned workflows with right to erasure, comprehensive audit trails, RBAC enforcement, and organization-level usage quotas support regulatory compliance.

10+ Concurrent Users

40GB Indexed Content

150 Emails/Hr/User

90% Test Coverage

<100ms Vector Search

3072-d Embedding Dims

12mo SLA Support

Docker Containerized

Technical Challenges

Engineering Solutions Delivered

Multimodal Document Processing

Built resilient OCR pipeline with Tesseract+OpenCV adaptive thresholding, improving text extraction from low-quality scans by 40%. Implemented semantic chunking with overlap to preserve context across page boundaries, and added vision API integration for image captions and searchable tags.
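
The overlap idea can be shown with a deliberately simplified sketch. It uses fixed-size word windows rather than semantic boundaries, so it illustrates only how overlap carries context from one chunk into the next; the window and overlap sizes are hypothetical defaults.

```python
def chunk_with_overlap(words, size=200, overlap=40):
    """Split a word list into windows of `size` words sharing `overlap` words.

    Feeding the concatenated text of consecutive pages means a sentence that
    straddles a page break appears intact in at least one chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, ten words with `size=4` and `overlap=2` yield four chunks, each repeating the last two words of its predecessor.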

Hybrid Retrieval Optimization

Architected Reciprocal Rank Fusion combining BM25 sparse retrieval with 3072-d dense vectors. Tuned re-ranking strategies and achieved 85%+ relevance scores on legal domain queries through iterative testing with client-provided representative datasets.

Scalable Vector Operations

Migrated from in-memory FAISS to Qdrant for distributed vector search with per-user collection isolation. Implemented incremental indexing with LlamaIndex and manual fallback upserts to guarantee payload integrity (user_id, doc_id fields) for multi-tenant security.
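
One way to picture the payload integrity guarantee is a validation pass that runs before any points are written. The sketch below is an assumption about how such a check might look: points are plain dicts, and the `user_<id>` collection naming scheme is hypothetical.

```python
REQUIRED_FIELDS = ("user_id", "doc_id")  # payload fields named in the text


def collection_name(user_id):
    """Hypothetical naming scheme for a per-user collection."""
    return f"user_{user_id}"


def validate_points(points, user_id):
    """Return `points` unchanged if every payload is complete and owned by `user_id`."""
    for point in points:
        payload = point.get("payload") or {}
        missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
        if missing:
            raise ValueError(f"point {point.get('id')!r} is missing {missing}")
        if payload["user_id"] != user_id:
            raise ValueError(f"point {point.get('id')!r} belongs to another user")
    return points
```

Rejecting the whole batch on the first bad payload keeps a partially tagged document from ever reaching a tenant's collection.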

Real-Time Streaming Architecture

Designed SSE-based token streaming with parallel usage tracking and citation metadata. Implemented backpressure handling, connection recovery, and structured event types (token, usage, meta, done) for reliable delivery under high load.
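
The structured event types can be illustrated with a small generator that emits frames in the Server-Sent Events wire format. The payload fields (`text`, `completion_tokens`, `citations`) are hypothetical; only the four event names come from the description above.

```python
import json


def sse_event(event, data):
    """Format one SSE frame: an event name, a JSON data line, and a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_answer(tokens, citations):
    """Yield token events, then usage and citation metadata, then a done marker."""
    count = 0
    for tok in tokens:
        count += 1
        yield sse_event("token", {"text": tok})
    yield sse_event("usage", {"completion_tokens": count})
    yield sse_event("meta", {"citations": citations})
    yield sse_event("done", {})
```

In a Flask view, a generator like this would typically be wrapped in a `Response` with the `text/event-stream` MIME type; the client closes the connection when it sees the `done` event.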

Email Automation at Scale

Engineered continuous polling daemon with OAuth2 refresh, checkpoint-based resumable processing, and per-user error isolation. Processes 150 emails/hour/user with deduplication via Gmail Message IDs and automatic sanitization (XSS prevention, link removal).

Production Operations

Dockerized all services with health checks, automated backups (weekly Qdrant/Postgres snapshots), Redis-backed session cache, Stripe billing integration, and comprehensive audit logging. Deployed with Nginx reverse proxy and SSL termination.