Indian Legal Assistant: Project Overview

Athuluri Akhil
11 min read

The Indian Legal Assistant is an AI-driven web platform aimed at making Indian legal information accessible. It provides tools for laypersons and legal professionals alike to analyze documents, search case law, and receive AI-generated legal guidance in a user-friendly interface. The project addresses the complexity of Indian statutes and judgments by combining search APIs (e.g. Indian Kanoon) with modern AI. Similar initiatives have integrated Google’s Gemini generative AI and Supabase backends for legal aid (e.g. a “Legal Aid: Know Your Rights” platform). This project’s purpose is to streamline tasks like legal research, argument formulation, and data visualization for the Indian context. By leveraging AI (Gemini) and data services, it offers functions such as voice-based queries and multilingual support, aiming to democratize legal assistance.

🔗 GitHub Repository: https://github.com/akhilathuluri/Indian_Legal_Assistant

🔗 Live Website: https://indian-legal-assistant.vercel.app/login

Architecture and Tech Stack

The system follows a modular architecture (Figure 1). The frontend is built with React (TypeScript + Vite + Tailwind) providing a web UI for user interaction. Behind the scenes, it connects to Supabase (hosted PostgreSQL with Auth) for data storage and authentication. Supabase’s use of Postgres (rather than NoSQL) offers a robust, scalable relational database. React communicates with external services for core AI tasks.

All API keys and credentials are stored securely and never hard-coded. For example, SUPABASE_URL and SUPABASE_KEY are kept in a local secrets file that is excluded from version control, and are configured separately in the deployment environment (Supabase docs recommend git-ignoring these secrets). This keeps the keys out of the repository and out of the shipped source.
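As a minimal sketch of this practice, assume the app reads its settings from an environment-like record (in a Vite frontend this would typically be `import.meta.env`, where only variables prefixed with `VITE_` are exposed to the browser). A small loader can then fail fast when a value is missing. The variable names `VITE_SUPABASE_URL` and `VITE_SUPABASE_ANON_KEY` and the helper `loadSupabaseConfig` are illustrative assumptions, not taken from the project's code.

```typescript
// Hypothetical shape of the settings the frontend needs; names are illustrative.
interface SupabaseConfig {
  url: string;
  anonKey: string;
}

// Reads the two values from an environment-like record and throws early,
// naming every missing variable, instead of failing later at request time.
function loadSupabaseConfig(env: Record<string, string | undefined>): SupabaseConfig {
  const required = ["VITE_SUPABASE_URL", "VITE_SUPABASE_ANON_KEY"];
  const missing = required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  return {
    url: env["VITE_SUPABASE_URL"] as string,
    anonKey: env["VITE_SUPABASE_ANON_KEY"] as string,
  };
}
```

Anything bundled into the frontend is visible to users, so a loader like this only suits values that are safe to publish. Private keys such as the Gemini key belong in server-side configuration instead.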

Tech Stack: Key components include React (TypeScript + Vite + Tailwind) for the UI; Supabase (PostgreSQL, Auth, RLS) as the backend; Google Generative AI (Gemini) for NLP tasks; the Indian Kanoon API for retrieving case law; the World News API for aggregating news; speech libraries (e.g. Google Speech-to-Text/Whisper, gTTS) for audio I/O; and data viz tools (Plotly/Matplotlib) for charts. A comparative overview is in Table 1.

Component / Layer | Technology / Service | Role
Frontend UI | React (TypeScript + Vite + Tailwind) | Web interface; captures user queries, audio
Authentication | Supabase Auth | User signup/login; issues JWT tokens
Database | Supabase (PostgreSQL, RLS) | Stores users, transcripts, queries, etc.
AI/NLP | Google Gemini API | Natural language understanding & generation
Case-Law Search | Indian Kanoon API | Searches Indian court judgments/statutes
News Aggregation | World News API | Fetches global news (filtered by legal topics)
Visualization | Mermaid | Charts for crime stats; flowchart generation

React (TypeScript + Vite + Tailwind) is chosen for rapid prototyping and ease of use. The use of Supabase provides a fully managed database with built-in Row-Level Security (RLS). RLS enforces that users can only access their own records (e.g. transcript logs) by attaching policies based on auth.uid(). This ensures multi-tenant data isolation in a simple way. Supabase’s open-source stack (PostgreSQL, Kong gateway, etc.) allows horizontal scaling and performance tuning.

Core Features

  • Audio Transcription: Users can record or upload audio. The system transcribes spoken queries or content (e.g. case discussion) into text using a speech-to-text engine. For example, it may use Google Speech-to-Text or OpenAI’s Whisper under the hood. The platform supports multiple Indian languages, echoing a similar design that outputs voice in 12+ languages. This enables users who prefer voice interfaces to interact naturally. The transcribed text is then passed to the NLP chain for analysis.

  • Document Analysis: Users can upload legal documents (PDFs, Word, etc.). The backend extracts text and uses the AI model (via Gemini API) to summarize, answer questions, or detect key issues. For instance, the assistant might highlight risk factors in a contract or summarize the clauses in plain language. This builds on concepts of AI-driven legal summarization and compliance checking, allowing non-experts to understand complex text.

  • Case Law Research: Using the Indian Kanoon API, users can input legal queries or case citations. The app queries Kanoon’s database of Supreme Court and High Court judgments. Relevant cases are fetched and optionally ranked. The text of each case (or summary) can be fed into the LLM to generate concise explanations of rulings or precedent applicability. Providing direct integration with a legal search engine ensures access to India’s largest case-law repository.

  • AI Legal Analysis & Arguments: A key feature is an LLM-powered chatbot. Given a legal question, the system formulates answers or arguments. It employs retrieval-augmented generation (RAG) to improve accuracy: retrieved documents (statutes or cases) guide the AI’s response. (RAG is recommended for legal domains to reduce hallucinations.) The platform might present citations or outline reasoning steps. The GenAI model is used not just for raw answers, but for drafting legal arguments or risk analyses (e.g. “AI-Based Legal Arguments” as noted by similar projects). The system ensures explanations reference actual laws or cases to avoid “inventing” facts.

  • Legal Code Lookup: Users can search for sections of statutory codes (e.g. Indian Penal Code, Act sections). The app maintains or fetches a database of bare acts and sections, allowing quick lookup. For example, entering “IPC 302” returns the text and a summary of Section 302 (murder). This is typically implemented via a simple search or indexed dataset of legal statutes.

  • Legal Flowcharts: The assistant includes interactive flowcharts for common legal processes (e.g. filing a complaint, marriage dissolution). These are either pre-generated diagrams (using libraries like Mermaid or Plotly) or dynamically constructed from text. The flowchart engine parses multi-step procedures and displays them graphically. This helps visualize complex procedures (e.g. the stages of a civil lawsuit) in an intuitive way.

  • Crime Data Visualization: The system integrates official crime statistics (e.g. from the National Crime Records Bureau or open datasets; see moreajinkyaraj.com). Using Plotly charts and maps, it shows trends (e.g. crime rates by state or category). For example, bar graphs for “Crime Against Women” by year, or heatmaps for regional crime density. Interactive plots allow users and policymakers to explore correlations or track changes. This mirrors efforts to present NCRB data visually for public insight (moreajinkyaraj.com).

  • Legal News Aggregation: Fresh legal news and updates are provided via the World News API. The app fetches the latest articles from global and Indian media, filtered by legal keywords (e.g. “Supreme Court”, “legislation”). The World News API offers real-time access to thousands of sources. Users can read summaries of recent judgments, new laws, or high-profile cases – keeping legal professionals and lay users informed.
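To make the flowchart feature more concrete, here is a small sketch that assumes a procedure is available as an ordered list of step descriptions. It emits Mermaid `flowchart TD` source that a Mermaid renderer could draw. The function name `toMermaidFlowchart` and the sample steps are hypothetical; the source does not show how the project actually builds its diagrams.

```typescript
// Converts an ordered list of procedure steps into Mermaid flowchart source.
// Node ids (S1, S2, ...) are generated; labels are wrapped in double quotes.
function toMermaidFlowchart(steps: string[]): string {
  const lines: string[] = ["flowchart TD"];
  steps.forEach((step, index) => {
    // A quoted Mermaid label must not contain raw double quotes.
    const label = step.replace(/"/g, "'");
    lines.push(`  S${index + 1}["${label}"]`);
  });
  // Link each step to the one that follows it.
  for (let i = 1; i < steps.length; i++) {
    lines.push(`  S${i} --> S${i + 1}`);
  }
  return lines.join("\n");
}

// Illustrative steps only; not legal guidance.
const complaintFlow = toMermaidFlowchart([
  "Draft the complaint",
  "File it with the appropriate authority",
  "Receive acknowledgement",
]);
```

A linear chain is the simplest case. Procedures with branches (for example, an appeal path) would need an explicit list of edges rather than an ordered array.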

AI & API Integrations

The platform’s intelligence comes from integrated AI and web APIs:

  • Google Generative AI (Gemini): The core NLP and generation engine is Google’s Gemini model (via Google Cloud’s Generative AI API). This LLM handles understanding user questions and producing text answers or summaries. For example, the system might use Gemini-1.5 for its large context window, enabling it to process lengthy statutes or transcripts. In practice, user inputs (queries, doc text) and retrieved law snippets are fed into the Gemini API (e.g. using ChatGoogleGenerativeAI chains). Gemini provides advanced capabilities like context-based reasoning; one example project noted that Gemini’s embeddings and RAG approach allow precise legal Q&A. However, the architecture must also implement safeguards against hallucination. As Stanford researchers note, legal LLMs can fabricate cases if unchecked, so this integration likely uses RAG to ground the answers.

  • Indian Kanoon API: This API provides full-text search over Indian judgments and laws. The assistant makes HTTP requests to the Kanoon API endpoints, supplying keywords or case numbers. The returned JSON includes case titles, citation links, and text snippets. These results are presented to the user and used as context for the LLM. For instance, a query “right to privacy” would return the landmark K.S. Puttaswamy case. While the Kanoon API is not officially documented here, it is known to offer large-scale legal data.

  • World News API: The app uses the World News API (or similar news aggregator) to fetch legal news. This API allows searching global news by keywords, language, date, and source. The assistant periodically calls this API (or on-demand queries) to retrieve recent articles. For example, it might query “Supreme Court India” and present headlines. The World News API’s free tier includes ~500 requests/day. Its coverage of 50+ languages and 150+ countries ensures a broad spectrum of sources. The system likely filters the results for relevance (e.g. focusing on India or law journals) before displaying them.

  • Speech and Voice APIs: For audio features, Google’s Speech-to-Text API or the open Whisper model is used to convert uploads or recordings into text. For voice output, the app may use gTTS (Google Text-to-Speech) to play back LLM answers. These integrations allow a hands-free interface and accessibility for users.
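The grounding step described above, where retrieved law snippets accompany the user's question, can be sketched as a prompt builder. This is an illustration under stated assumptions rather than the project's implementation: the `RetrievedSnippet` shape, the function `buildGroundedPrompt`, and the instruction wording are all hypothetical, and the actual call to the Gemini API is deliberately left out.

```typescript
// Hypothetical shape of a retrieved passage (e.g. from a case-law search).
interface RetrievedSnippet {
  source: string; // citation or title shown to the user
  text: string;   // passage text used as grounding context
}

// Builds a prompt that asks the model to answer only from numbered sources.
// maxCharsPerSnippet bounds each passage so the prompt stays a manageable size.
function buildGroundedPrompt(
  question: string,
  snippets: RetrievedSnippet[],
  maxCharsPerSnippet: number = 500
): string {
  const context = snippets
    .map((s, i) => `[${i + 1}] ${s.source}\n${s.text.slice(0, maxCharsPerSnippet)}`)
    .join("\n\n");
  return [
    "Answer the question using only the numbered sources below.",
    "Cite sources by number, e.g. [1]. If the sources do not cover the question, say so.",
    "",
    "Sources:",
    context.length > 0 ? context : "(none retrieved)",
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Numbering the sources lets the interface map a citation such as [2] in the model's answer back to the judgment or statute it came from, which makes it easier to spot unsupported claims.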

Security and Data Privacy

Security is a priority given the sensitive nature of legal data. Key measures include:

  • Supabase Row-Level Security (RLS): Every database table intended for user data (e.g. transcripts, personal queries) has RLS enabled. As Supabase documentation explains, RLS lets developers write SQL policies that grant row access based on attributes like auth.uid(). For example, a policy can ensure each user sees only rows where owner_id = auth.uid(). This prevents cross-user data leaks. All access to the database uses JWTs issued by Supabase Auth; the backend enforces policies before any query returns data.

  • API Key Protection: No API keys are exposed to the client. The React frontend never includes hard-coded keys. Instead, keys for Supabase, Gemini, Kanoon, and the World News API are stored on the server side or in secret configuration. For deployment, keys are entered into Vercel’s secret manager (per [70]) or environment variables. This follows best practice to “add [secrets.toml] to .gitignore and don’t commit”. If the app uses server-side functions or cloud backends, those endpoints hide the keys entirely from the browser.

  • Data Encryption & Compliance: Supabase encrypts data at rest and in transit. Sensitive information (e.g. user credentials) benefits from Supabase’s security compliance (SOC2, GDPR-ready). All communications to external APIs use HTTPS. No personally identifiable information (beyond email/password) is stored by the app unless explicitly needed. Audio files and documents are typically processed on-the-fly and not persisted indefinitely, reducing data exposure. If long-term storage is required, records can be anonymized.

  • Retrieval Security: When calling third-party APIs (Gemini, Kanoon, News), the system likely uses server-side requests. This avoids exposing, for example, a private Gemini API key in the client. The design ensures that untrusted inputs (like uploaded docs or transcribed text) are sanitized before any database insertion, mitigating injection attacks.

By combining Supabase’s built-in security features with disciplined key management, the platform safeguards both legal data and user privacy.
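The effect of a policy such as owner_id = auth.uid() can be illustrated outside the database. The sketch below is only an in-memory model of that predicate, written in TypeScript to match the frontend; in Supabase the check is enforced by PostgreSQL itself, not by application code. The row shape `OwnedRow` and the function `visibleRows` are hypothetical names.

```typescript
// Hypothetical row shape for a user-owned table such as transcripts.
interface OwnedRow {
  id: string;
  owner_id: string;
  content: string;
}

// Models the predicate `owner_id = auth.uid()`: a row is visible only when
// its owner matches the authenticated user's id. A caller with no session
// (null uid) matches no rows.
function visibleRows(rows: OwnedRow[], authUid: string | null): OwnedRow[] {
  if (authUid === null) {
    return [];
  }
  return rows.filter((row) => row.owner_id === authUid);
}
```

Because the real check runs inside the database, it still applies when a client queries a table directly, which is why the policy belongs in PostgreSQL rather than in frontend code like this.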

Database Schema and Scalability

The backend data model centers on a PostgreSQL schema. Typical tables include:

  • users (user_id, email, hashed_password, profile_info),

  • queries (query_id, user_id, timestamp, query_text, response_text, metadata),

  • documents (doc_id, user_id, filename, content_extracted, analysis_results),

  • cases (cached case_id, title, summary),

  • news (article_id, title, source, link, timestamp), etc.

All tables use UUID primary keys. Foreign keys link user-owned data. Row-Level Security policies tie user_id to auth.uid(). Indexes are set on common search fields (e.g. full-text search indexes on document content and cases). The schema may also include a vector column if embeddings are stored (Supabase supports pgvector), enabling similarity search for RAG.
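As an illustration of what similarity search over stored embeddings computes, the sketch below ranks passages by cosine similarity in memory. With pgvector this ranking would normally run inside PostgreSQL; the types, function names, and the idea of passing plain number arrays are assumptions made for demonstration only.

```typescript
// Hypothetical record pairing a passage id with its embedding vector.
interface EmbeddedPassage {
  id: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the ids of the k passages most similar to the query embedding,
// ordered from most to least similar.
function topKSimilar(query: number[], passages: EmbeddedPassage[], k: number): string[] {
  return passages
    .map((p) => ({ id: p.id, score: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((p) => p.id);
}
```

The ids returned by a search like this would identify the passages to pass to the language model as grounding context.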

Scalability is handled via PostgreSQL’s capabilities. Supabase can scale vertically (bigger instance) or horizontally (read replicas). The app can cache frequent queries to reduce DB load. Since most heavy lifting is done by external APIs (Gemini, Kanoon), the DB mainly stores results, user data, and logs. If usage grows, the architecture could split databases by function (e.g. a separate data warehouse for analytics, while preserving OLTP performance). On the application side, the React frontend can be deployed behind a load balancer or on a serverless host to auto-scale with traffic. Overall, using mature technologies (Postgres, Supabase) helps the system handle many users and growing data volume.
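Caching frequent queries can be as simple as a small in-memory store with an expiry time. The following is a minimal sketch, assuming results are keyed by the query string; the class name `TtlCache` and the injectable clock are illustrative choices, and a multi-instance deployment would more likely use a shared cache.

```typescript
// Minimal in-memory cache with per-entry expiry, keyed by query string.
// The clock is injectable so expiry can be tested without waiting.
class TtlCache<V> {
  private entries: Record<string, { value: V; expiresAt: number }> = Object.create(null);
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if absent or expired.
  get(key: string): V | undefined {
    const entry = this.entries[key];
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      delete this.entries[key]; // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Stores a value that expires ttlMs milliseconds from now.
  set(key: string, value: V): void {
    this.entries[key] = { value, expiresAt: this.now() + this.ttlMs };
  }
}
```

A caller would check `get` before hitting the database or an external API and call `set` with the fresh result on a miss. The right expiry depends on how quickly the underlying data changes, so news results would likely need a shorter one than statute text.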

Academic and Domain Contributions; Use Cases

This platform contributes to both academia and legal practice. Academically, it serves as a case study of applying LLMs to Indian law, a relatively under-researched context. It could generate datasets of legal QA pairs or embeddings for future NLP research. It also embodies a human-centered AI approach: legal experts and students can critique the AI’s answers, enabling evaluation of model performance on Indian law.

In the legal domain, real-world use cases include:

  • Self-help and Legal Aid: Individuals without lawyers (especially in remote areas) can use the assistant to understand their rights and case scenarios. For example, a farmer dealing with a land dispute can ask the AI to explain property laws in simple terms.

  • Paralegals and Lawyers: Professionals can use it to speed up research. Instead of manual LexisNexis searches, a lawyer can type a query and get summarized case law, draft outlines, or flowcharts of procedure. This mirrors global trends: “nearly three quarters of lawyers plan on using generative AI” for research and drafting.

  • Education: Law students can quiz themselves, analyze statutes, and visualize concepts via flowcharts or charts. The tool can serve as an interactive teaching aid.

  • Policy and Crime Analysis: NGOs or government agencies can use the crime data visualizations to identify hotspots and trends, aiding policy decisions.

  • News Monitoring: Media analysts tracking legislative changes or high-profile cases gain a centralized feed of legal news.

By aggregating diverse legal tools into one platform, it lowers the barrier to legal information and supports data-driven legal scholarship.

Strengths, Weaknesses, and Future Enhancements

Strengths:

  • Integration of AI with Local Law: Unlike generic chatbots, this assistant is tailored to Indian law (via Kanoon API and local statutes).

  • Multi-modal Interface: Users can type or speak queries and hear responses (gTTS), improving accessibility.

  • Comprehensive Feature Set: From transcription to data viz, it covers end-to-end legal needs in one app.

  • Modern Stack: Using React (TypeScript + Vite + Tailwind) and Supabase accelerates development and ensures scalability. Security features like RLS enhance trustworthiness.

  • Up-to-Date Information: The news feed and API-backed case search keep content current.

Weaknesses:

  • AI Reliability: Large language models can “hallucinate” or provide misleading legal advice. Without careful RAG and expert review, users might receive incorrect answers.

  • Limited Accuracy of Open-Source Data: If relying on third-party content, there may be gaps (e.g. some local court decisions not in Kanoon’s free API).

  • Scalability of Computation: Heavy API usage (Gemini, speech) can lead to latency. The system may incur costs or hit rate limits (e.g. the World News API’s free tier of ~500 requests/day).

  • UI Constraints: The current React web interface is fine for a prototype, but a professional version might need a more customized UI or a mobile app.

  • Privacy Concerns: Users must trust the platform with legal queries. Even with RLS, transcripts and queries are stored, which could be sensitive. Data governance policies must be clear.
