An enterprise-ready retrieval-augmented generation platform that transforms multimodal documents, emails, and legal content into citation-backed intelligence. Built with hybrid vector search, real-time streaming, and GDPR-compliant workflows to compress research cycles and enable data-driven decision-making at scale.
Explore a working example of the platform by logging in with Username: Sample, Password: sample
Project Overview
AI-First Legal Intelligence Platform
NovusAI ingests heterogeneous data—contracts, scanned forms, email threads, and annotated diagrams—through multimodal pipelines combining OCR, transformer embeddings, and hybrid BM25+vector retrieval to deliver contextual, citation-backed answers via natural language interfaces.
The platform is built for legal and commercial teams that need to parse high-volume, high-stakes information quickly. It emphasizes compliance, transparency, and scalability with GDPR-aligned storage, auditable queries, and automated summarization workflows. Users can search across diverse sources simultaneously, receiving answers grounded with citations back to original documents.
NovusAI is designed as a containerized service mesh enabling rapid deployment and scaling. It supports concurrent user sessions with low-latency retrieval augmented by caching and vector search. Its modular design also allows for future integrations, such as Drive connectors, fine-tuned domain-specific models, and extended analytics dashboards.
With built-in billing integration, real-time dashboards, and user/organization management, the platform demonstrates how AI-first legal tools can move from prototype to production-grade enterprise readiness. Each workflow is tested with robust QA, backed by monitoring, disaster recovery, and post-launch support guarantees.
My Role
Full-Stack Product Engineering
Architected Flask microservices backend with blueprint routing & RESTful APIs
Implemented JWT+Redis auth with RBAC, audit trails, and usage gating
Integrated React 19 frontend with SSE streaming, KaTeX rendering, and PDF/HTML export
Built Tesseract+OpenCV OCR pipeline with 40%+ improvement in text extraction from scanned documents
Developed ingestion for PDFs, images, EML, CSV, and DOCX with auto executive summaries
Designed hybrid RAG system (BM25 + vector fusion with grounding checks)
Fine-tuned model parameters and retraining schedules based on feedback and metrics
Engineered Gmail OAuth2 sync with 30s polling, sanitization, and deduplication
Delivered dashboard with real-time usage, email activity, file storage, and user management
Integrated Stripe billing with subscription tracking, top-up credits, and usage meters
Implemented GDPR compliance: Right to Erasure, data export, retention limits, and audit logs
Technical Architecture
Production ML Operations Stack
Modular design prioritizing fault tolerance, multi-tenant isolation, and horizontal scalability. Each service communicates via authenticated APIs with Redis-backed caching, Postgres transaction management, and Qdrant collection-per-user vector isolation.
Complete System Architecture
Request Flow
React sends authenticated POST with JWT Bearer token
Flask validates token via Redis cache, extracts user context
JWT+Redis auth with @token_required decorators, per-user Qdrant collections, RBAC with audit trails, and organization-level usage quotas enforced at every API boundary.
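The decorator pattern described above can be sketched roughly as follows. This is a framework-free illustration using only the standard library, not the platform's actual code: the `SECRET` key, the `SESSION_CACHE` dict (standing in for Redis), the helper names, and the choice to pass headers explicitly instead of reading Flask's request object are all assumptions.

```python
import base64
import functools
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"   # hypothetical signing key
SESSION_CACHE = {}        # stand-in for Redis: token -> user context


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, role: str, ttl: int = 3600) -> str:
    """Build an HS256-signed JWT and register it in the session cache."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user_id, "role": role, "exp": int(time.time()) + ttl}
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    token = f"{header}.{payload}.{_b64url(sig)}"
    SESSION_CACHE[token] = {"user_id": user_id, "role": role}
    return token


def verify_token(token: str):
    """Return the cached user context, or None if the token is invalid."""
    try:
        header, payload, sig = token.split(".")
        expected = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig)):
            return None
        claims = json.loads(_b64url_decode(payload))
        if claims["exp"] < time.time():
            return None
    except (ValueError, KeyError, TypeError):
        return None
    # Unknown or revoked tokens simply miss the cache.
    return SESSION_CACHE.get(token)


def token_required(*allowed_roles):
    """Reject the call unless a valid bearer token with an allowed role is present."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(headers, *args, **kwargs):
            auth = headers.get("Authorization", "")
            if not auth.startswith("Bearer "):
                return {"error": "missing bearer token"}, 401
            user = verify_token(auth[len("Bearer "):])
            if user is None:
                return {"error": "invalid or expired token"}, 401
            if allowed_roles and user["role"] not in allowed_roles:
                return {"error": "forbidden"}, 403
            return view(user, *args, **kwargs)
        return wrapper
    return decorator


@token_required("admin", "member")
def query_documents(user, question):
    # The user context comes from the cache, never from the request body.
    return {"user_id": user["user_id"], "question": question}, 200
```

Calling `query_documents({"Authorization": f"Bearer {token}"}, "...")` with a token from `issue_token` returns a 200 response; a missing, tampered, or expired token yields 401, and a role outside the allowed set yields 403.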
📧
Gmail Auto-Sync
Background polling daemon fetches 5 emails/user every 30 seconds via OAuth2, extracts metadata, sanitizes HTML, generates embeddings, and maintains checkpoint-based resumable processing.
💳
Stripe Integration
Subscription management with PaymentIntents, usage-based billing (15,000 credits/month base), automatic top-ups, and Redis-cached org subscription status for sub-millisecond gating.
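The credit gating described above might look something like the sketch below. Only the 15,000 credits/month base figure comes from the text; the `OrgUsage` record, its `top_up` field, and `try_consume` are hypothetical names, and a real deployment would read and update this state in Redis and Postgres instead of an in-memory object.

```python
from dataclasses import dataclass

MONTHLY_BASE_CREDITS = 15_000  # base allowance quoted in the text


@dataclass
class OrgUsage:
    used: int = 0     # credits consumed this billing period
    top_up: int = 0   # extra credits purchased (hypothetical field)

    @property
    def remaining(self) -> int:
        return MONTHLY_BASE_CREDITS + self.top_up - self.used


def try_consume(usage: OrgUsage, cost: int) -> bool:
    """Deduct `cost` credits if the org can afford it; otherwise refuse."""
    if cost < 0:
        raise ValueError("cost must be non-negative")
    if cost > usage.remaining:
        return False
    usage.used += cost
    return True
```

A request handler would call `try_consume` before running a query and return a quota error (or trigger a top-up) when it returns False.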
Email Processing Pipeline
Automated Intelligence Extraction
Continuous background polling service with OAuth2 authentication, incremental checkpointing, multipart MIME parsing, HTML sanitization, and dual-storage architecture for structured+semantic access.
Email Ingestion & Embedding Flow
Processing Characteristics
Throughput: 150 emails/hour/user with 30s polling interval
Deduplication: Gmail Message ID + checkpoint tracking
Error Recovery: Per-user isolation with automatic retry
Content Sanitization: XSS prevention, link removal
ACID Transactions: Rollback on partial failures
Multi-Tenant: Complete user data isolation
GDPR Compliance: Right to erasure, audit trails
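As a rough illustration of the deduplication and checkpointing listed above, the sketch below processes one user's poll result in small batches and only advances the checkpoint after a message is handled successfully. The message dict keys (`id`, `internal_date`), the `UserCheckpoint` class, and the default batch size of 5 are assumptions for the example; it does not call the Gmail API.

```python
from dataclasses import dataclass, field


@dataclass
class UserCheckpoint:
    seen_ids: set = field(default_factory=set)  # message IDs already processed
    last_internal_date: int = 0                 # timestamp of newest processed message


def process_batch(checkpoint, messages, handler, batch_size=5):
    """Handle up to `batch_size` unseen messages, oldest first.

    Returns the number of messages processed. If the handler raises, the
    batch stops and the failing message stays unrecorded, so the next poll
    retries it.
    """
    fresh = [m for m in messages if m["id"] not in checkpoint.seen_ids]
    fresh.sort(key=lambda m: m["internal_date"])
    processed = 0
    for msg in fresh[:batch_size]:
        try:
            handler(msg)
        except Exception:
            break  # leave the checkpoint untouched for this message
        checkpoint.seen_ids.add(msg["id"])
        checkpoint.last_internal_date = max(
            checkpoint.last_internal_date, msg["internal_date"]
        )
        processed += 1
    return processed
```

Because each user has a separate checkpoint, a failure while handling one user's mailbox does not block polling for anyone else.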
RAG Query Lifecycle
Hybrid Retrieval Architecture
Combines sparse BM25 lexical matching with dense vector similarity using Reciprocal Rank Fusion. LlamaIndex orchestrates query engines with cached index handles, grounding validation, and citation tracing back to source documents.
Query Processing & Response Generation
Retrieval Strategy
BM25 via PostgreSQL tsvector
Dense 3072-d OpenAI embeddings
Reciprocal Rank Fusion (k=60)
User-scoped filtering (multi-tenant)
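For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the standard formula with k=60: each document scores the sum of 1/(k + rank) over every ranking in which it appears. The function name and the example document IDs are illustrative, and the inputs are assumed to be already filtered to a single user's documents.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["d1", "d2", "d3"]    # lexical ranking (hypothetical IDs)
dense_hits = ["d3", "d1", "d4"]   # vector ranking (hypothetical IDs)
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["d1", "d3", "d2", "d4"]
```

Documents that rank well in both lists (d1 and d3 here) rise above those that appear in only one, without needing to normalize BM25 scores against cosine similarities.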
Answer Quality
Citation validation (grounding_ok)
Source ID verification
Confidence scoring
Fallback to broader search
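One simple way to implement the citation validation above is to check that every source ID cited in an answer was actually among the retrieved documents. The sketch below assumes citations appear as bracketed IDs such as `[doc1]`; that marker format, the regular expression, and the result keys other than `grounding_ok` are assumptions, not the platform's actual format.

```python
import re

# Assumed citation marker: an ID in square brackets, e.g. [doc1]
CITATION_RE = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def check_grounding(answer: str, retrieved_ids: set) -> dict:
    """Verify that the answer cites at least one source and only known ones."""
    cited = CITATION_RE.findall(answer)
    unknown = [c for c in cited if c not in retrieved_ids]
    return {
        "cited": cited,
        "unknown": unknown,
        "grounding_ok": bool(cited) and not unknown,
    }
```

When `grounding_ok` is False, the caller could retry with a broader search or return the answer with a lower confidence flag, in line with the fallback behaviour listed above.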
Performance
Sub-100ms vector search
Cached LlamaIndex engines
Redis RAG result cache
Parallel BM25+vector queries
Technology Stack
Production Tools & Frameworks
Backend & AI
Flask 3.x
Python 3.11
OpenAI GPT-4
LlamaIndex
Qdrant 1.15
FAISS
Tesseract OCR
OpenCV
pdf2image
pytesseract
python-docx
Playwright
Frontend & Infrastructure
React 19.1
TypeScript
React Router
ReactMarkdown
KaTeX
PostgreSQL 12+
Redis 7.x
Docker Compose
Nginx
Stripe API
Gmail API
OAuth2
Impact & Metrics
Production Deployment Results
⚡
Research Acceleration
Hybrid RAG with citation tracing eliminates hours of manual document review. Users locate specific clauses, financial figures, and contract terms in seconds via semantic queries.
📈
Drafting Efficiency
Context-aware generation with automatic citations produces investor briefs, legal memos, and executive summaries in minutes instead of hours, with full source traceability.
🔐
Enterprise Compliance
GDPR-aligned workflows with right to erasure, comprehensive audit trails, RBAC enforcement, and organization-level usage quotas ensure regulatory compliance.
10+ Concurrent Users
40 GB Indexed Content
150 Emails/Hr/User
90% Test Coverage
<100 ms Vector Search
3072-d Embedding Dims
12 mo SLA Support
Docker Containerized
Technical Challenges
Engineering Solutions Delivered
Multimodal Document Processing
Built resilient OCR pipeline with Tesseract+OpenCV adaptive thresholding, improving text extraction from low-quality scans by 40%. Implemented semantic chunking with overlap to preserve context across page boundaries, and added vision API integration for image captions and searchable tags.
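The overlap idea can be illustrated with a simplified sketch. Note that this uses fixed-size word windows, not semantic boundaries, so it is a stand-in for the semantic chunker described above; the function name and the default sizes are assumptions.

```python
def chunk_with_overlap(words, size=200, overlap=40):
    """Split a word list into windows of `size` that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Running the chunker over the concatenated text of consecutive pages means a clause that straddles a page break appears whole in at least one chunk, provided it is shorter than the overlap.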
Hybrid Retrieval Optimization
Architected Reciprocal Rank Fusion combining BM25 sparse retrieval with 3072-d dense vectors. Tuned re-ranking strategies and achieved 85%+ relevance scores on legal domain queries through iterative testing with client-provided representative datasets.
Scalable Vector Operations
Migrated from in-memory FAISS to Qdrant for distributed vector search with per-user collection isolation. Implemented incremental indexing with LlamaIndex and manual fallback upserts to guarantee payload integrity (user_id, doc_id fields) for multi-tenant security.
Real-Time Streaming Architecture
Designed SSE-based token streaming with parallel usage tracking and citation metadata. Implemented backpressure handling, connection recovery, and structured event types (token, usage, meta, done) for reliable delivery under high load.
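The structured event types named above map naturally onto the `event:` and `data:` fields of the Server-Sent Events wire format. The generator below is a minimal sketch: the payload shapes, the ordering (meta first, usage just before done), and the function names are assumptions, and it omits the backpressure and reconnection handling mentioned in the text.

```python
import json


def sse_event(event: str, data: dict) -> str:
    """Serialize one SSE message; json.dumps keeps the data on a single line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_answer(tokens, sources):
    """Yield meta, token, usage, and done events for one answer."""
    yield sse_event("meta", {"sources": sources})
    count = 0
    for tok in tokens:
        count += 1
        yield sse_event("token", {"text": tok})
    yield sse_event("usage", {"completion_tokens": count})
    yield sse_event("done", {})
```

In a Flask view, a generator like this would typically be wrapped in a streaming response with the `text/event-stream` content type, and the client would attach one listener per event name.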
Email Automation at Scale
Engineered continuous polling daemon with OAuth2 refresh, checkpoint-based resumable processing, and per-user error isolation. Processes 150 emails/hour/user with deduplication via Gmail Message IDs and automatic sanitization (XSS prevention, link removal).
Production Operations
Dockerized all services with health checks, automated backups (weekly Qdrant/Postgres snapshots), Redis-backed session cache, Stripe billing integration, and comprehensive audit logging. Deployed with Nginx reverse proxy and SSL termination.