Architecture
Overview
KrishnaVani uses a Django backend with Django Ninja for the API layer, a SvelteKit 2 frontend (Svelte 5, adapter-node), and PostgreSQL with pgvector for semantic search. Caddy serves as a reverse proxy, routing requests to the appropriate service.
Browser (SvelteKit PWA)
|
Caddy (:80 / :443)
|-- /api/* --> Django Ninja :8000
|-- /admin/* --> Django Admin :8000
|-- /* --> SvelteKit :3000
|
Django Backend
|-- Retrieval: embed query -> multi-index pgvector search -> re-rank
|-- Generation: stream answer via configured LLM
|-- Verification: hard checks on citations
|-- Caching: store verified answers for reuse
|
PostgreSQL + pgvector
|-- Verses and Translations (701 verses x 3 translations)
|-- Enrichments with 3 embedding vectors per verse (HNSW indexed)
|-- Cached answers, sessions, feedback
|
NVIDIA API (or any OpenAI-compatible)
|-- LLM for answer generation
|-- Embedding model for semantic search
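The routing in the diagram corresponds to a Caddyfile along these lines (a minimal sketch: the site address is a placeholder and the real configuration may differ):

```
krishnavani.example {
    # API and Django admin go to the Django backend
    handle /api/* {
        reverse_proxy localhost:8000
    }
    handle /admin/* {
        reverse_proxy localhost:8000
    }

    # Everything else is served by the SvelteKit node server
    handle {
        reverse_proxy localhost:3000
    }
}
```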
Components
Retrieval Pipeline
The retrieval pipeline embeds the user’s query using the configured embedding model and performs multi-index pgvector search across three embedding types per verse: raw text, thematic, and situational. Results are re-ranked for relevance before being passed to the generation stage.
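The multi-index search and fusion step can be sketched as follows. This is a hedged illustration, not the project's actual code: the SQL shape, the column names, the weighted-max fusion, and the weights are all assumptions.

```python
# Hypothetical per-index pgvector query (cosine distance, HNSW-accelerated).
# {column} would be one of the assumed text_emb / thematic_emb /
# situational_emb columns; parameters are bound by the DB driver.
SEARCH_SQL = """
    SELECT verse_id, 1 - ({column} <=> %(query_vec)s) AS score
    FROM enrichments
    ORDER BY {column} <=> %(query_vec)s
    LIMIT %(k)s
"""


def rerank(results_per_index, weights, top_k=5):
    """Fuse (verse_id, score) hit lists from several indexes.

    Each verse keeps its best weighted score across the indexes,
    then the fused list is sorted by score.
    """
    fused = {}
    for index_name, hits in results_per_index.items():
        w = weights[index_name]
        for verse_id, score in hits:
            fused[verse_id] = max(fused.get(verse_id, 0.0), w * score)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```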
Generation
The generation component streams answers via a configured LLM (through the NVIDIA API or any OpenAI-compatible endpoint). Answers are delivered in real time using Server-Sent Events (SSE), with tokens appearing as they are generated. The LLM is prompted to respond in Krishna’s voice, grounded strictly in the retrieved Gita verses.
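The SSE delivery can be illustrated with a small frame formatter. This is a sketch only: the real endpoint presumably wraps the LLM token stream in a Django streaming response, and the payload shape and `[DONE]` sentinel here are assumptions.

```python
import json


def sse_frames(token_stream):
    """Wrap LLM tokens in Server-Sent Events wire format.

    Each frame is 'data: <payload>\n\n'; a final sentinel frame
    lets the client close the connection cleanly.
    """
    for token in token_stream:
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"
```

In Django, a generator like this would typically be passed to a `StreamingHttpResponse` with `content_type="text/event-stream"`.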
Verification
After generation, a verification step performs hard checks on all verse citations in the answer. This ensures that every referenced verse actually exists in the database and that the answer stays grounded in the Bhagavad Gita text.
Caching
Verified answers are cached in PostgreSQL for reuse. When a semantically similar question is asked again, the cached answer can be served directly, reducing latency and API costs.
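The semantic-cache lookup can be sketched as a nearest-neighbour check against stored question embeddings. This is a toy in-memory version: in the real system the lookup would presumably be a pgvector query, and the 0.92 threshold is an invented example value, not the project's setting.

```python
import math

SIMILARITY_THRESHOLD = 0.92  # example value, not the project's setting


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cached_answer(query_emb, cache):
    """Return a cached answer if a semantically similar question exists.

    `cache` maps question embeddings (tuples) to verified answers;
    returns None on a cache miss.
    """
    best = max(cache.items(), key=lambda kv: cosine(query_emb, kv[0]), default=None)
    if best and cosine(query_emb, best[0]) >= SIMILARITY_THRESHOLD:
        return best[1]
    return None
```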
Text-to-Speech (TTS)
Answers can be read aloud using Edge TTS with Microsoft neural voices. English uses the Prabhat voice and Hindi uses the Madhur voice. The TTS backend is swappable to other providers such as Google, NVIDIA, or ElevenLabs.
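The language-to-voice mapping described above might look like the following. The Edge TTS voice identifiers match the named Prabhat and Madhur neural voices, but the function, its fallback behaviour, and the dictionary are illustrative assumptions.

```python
# Assumed mapping from answer language to Microsoft neural voice names.
VOICES = {
    "en": "en-IN-PrabhatNeural",  # Prabhat (English)
    "hi": "hi-IN-MadhurNeural",   # Madhur (Hindi)
}


def pick_voice(language: str) -> str:
    """Select the Edge TTS voice for a language, defaulting to English."""
    return VOICES.get(language, VOICES["en"])


# With the edge-tts package installed, synthesis would look roughly like:
#   communicate = edge_tts.Communicate(text, pick_voice("hi"))
#   await communicate.save("answer.mp3")
```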