MCP Server Setup Part 1: Building a Local RAG with DeepSeek for Enhanced Security & Compliance

daybreak

Introduction

In the realm of natural language processing, Retrieval Augmented Generation (RAG) has emerged as a powerful technique to enhance the performance of language models by incorporating external knowledge sources. In this blog post, we will explore how to build a local RAG system using the LightRAG framework and integrate it with the DeepSeek API.

What is LightRAG?

LightRAG is an open-source framework designed for building Retrieval Augmented Generation (RAG) systems. RAG combines retrieval mechanisms with large language models (LLMs): the model can draw on external knowledge sources while generating a response, which improves the accuracy and informativeness of the generated text.

Why LightRAG?

LightRAG is lightweight and it provides a set of tools and interfaces to simplify the process of constructing a RAG system. It offers functions for embedding text, interacting with LLMs, managing knowledge storage, and performing queries. It also supports different types of embedding models and LLMs, allowing users to customize the RAG system according to their specific needs.

Most importantly, setting up the knowledge base locally keeps your documents, index, and embeddings on your own machine, which helps meet security and compliance requirements and limits the exposure of sensitive data to LLMs.

Why DeepSeek?

For testing or learning purposes, some DeepSeek models offer a free tier. In this example I call the API of the “DeepSeek-R1-Distill-Qwen-7B” model, which is free.

Prerequisites

Before diving into the code, make sure you have the following installed:

  • Python 3.8 or higher

  • LightRAG library

  • pdfplumber library for PDF text extraction

  • tenacity library for retrying API calls

  • ollama for embedding generation

You also need to have a valid DEEPSEEK_API_KEY to access the DeepSeek API.
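Before going further, it can help to confirm that the prerequisites are actually in place. The following is a minimal sketch using only the standard library. The helper names (`missing_modules`, `preflight`) are hypothetical, and the list of import names is an assumption based on the import statements used later in this post (`ollama` is assumed to be the Python client that the LightRAG Ollama integration relies on).

```python
import importlib.util
import os
import sys

# Import names assumed from the import statements used later in this post
REQUIRED_MODULES = ["lightrag", "pdfplumber", "tenacity", "ollama"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(modules=REQUIRED_MODULES, env_var="DEEPSEEK_API_KEY"):
    """Collect human-readable problems; an empty list means nothing was found missing."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or higher is required")
    for name in missing_modules(modules):
        problems.append(f"module not installed: {name}")
    if not os.getenv(env_var):
        problems.append(f"environment variable not set: {env_var}")
    return problems
```

Calling `preflight()` and printing each returned string gives a quick checklist of what still needs to be installed or configured.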

Install LightRAG and Download the Example RAG Source

First, you need to install the LightRAG library. You can follow these steps:

#--------------------------Install LightRAG--------------------------#

git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
pip install -e .
cd ..

Then download the example RAG sources: a text file and a PDF (both formats are supported).

#--------------------------Download the RAG source content ---------------#
# text format
curl https://raw.githubusercontent.com/joydesigner/mcp-lightrag/refs/heads/main/book.txt > ./book.txt
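# PDF format (use the raw URL; the github.com/.../blob/... URL returns an HTML page, not the PDF)
curl -L https://raw.githubusercontent.com/joydesigner/mcp-lightrag/refs/heads/main/State_of_EV_2024.pdf > ./State_of_EV_2024.pdf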

The above commands perform the following operations:

  1. Clone the LightRAG repository from GitHub.

  2. Enter the cloned directory.

  3. Install the LightRAG library in editable mode.

  4. Return to the previous directory.

  5. Download the sample text file (book.txt) and the sample PDF (State_of_EV_2024.pdf) from a remote source and save them in the current directory.
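A quick sanity check after downloading is to confirm that the saved file really is a PDF and not, for example, an HTML page saved under a .pdf name. PDF files begin with the bytes `%PDF-`, so a minimal sketch (the helper name `looks_like_pdf` is hypothetical) could be:

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF magic bytes '%PDF-'."""
    with Path(path).open("rb") as f:
        return f.read(5) == b"%PDF-"
```

For instance, `looks_like_pdf("./State_of_EV_2024.pdf")` should return True before you try to extract text from the file.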

Set the API Key

Copy the example environment file to .env, then set the DEEPSEEK_API_KEY variable in it (the code below reads this variable), replacing xxxxxxx with your actual API key.

cp .example.env .env
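The code in this post reads the key with `os.getenv("DEEPSEEK_API_KEY")`, and none of the imports shown load a `.env` file, so the variable has to reach the process environment somehow. One option is to export it in your shell. Another is a small loader like the sketch below, which assumes the `.env` file contains plain `KEY=VALUE` lines. The function name `load_env_file` is hypothetical; a library such as python-dotenv would be an alternative.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a file into os.environ (existing values win)."""
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            # Drop surrounding whitespace and optional quotes around the value
            value = value.strip().strip('"').strip("'")
            loaded[key] = value
            # Do not overwrite a variable that is already set in the environment
            os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` once at the top of the script, before any API call, would make the key visible to `os.getenv`.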

Setup the Project

Reference: GitHub URL for the project
We start by importing all the necessary libraries. asyncio is used for asynchronous programming, LightRAG is the main framework for building the RAG system, and other libraries are used for API calls, embedding generation, and PDF text extraction.

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm.openai import openai_complete_if_cache
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.llm.ollama import ollama_embed
from lightrag.utils import EmbeddingFunc
from tenacity import retry, stop_after_attempt, wait_exponential
import pdfplumber

Setting up the Working Directory and Embedding Dimensions

Create a working directory for the local RAG vector database storage. You can use ollama to pull the text embedding model (nomic-embed-text). The code below creates the working directory for LightRAG if it doesn't exist and sets the embedding dimension to 768, which matches the output dimension of the nomic-embed-text model.

# Define the working directory for lightRAG
WORKING_DIR = "./mybook"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

# Define embedding dimensions
EMBEDDING_DIM = 768  # Updated to match nomic-embed-text's actual output dimension

Integrating with the DeepSeek API

We define an asynchronous function llm_model_func to interact with the DeepSeek API. The @retry decorator from the tenacity library makes up to 3 attempts in total (the first call plus two retries) if the call fails, waiting between attempts with exponential backoff. We also set various parameters such as max_tokens, temperature, top_p, presence_penalty, and frequency_penalty to control the output of the language model.

# integrate with DeepSeek API
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
async def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs) -> str:
    try:
        response = await openai_complete_if_cache(
            model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
            prompt=prompt,
            system_prompt=system_prompt or "You are a prompt engineering expert, you can help me to generate a prompt for a given task.",
            history_messages=history_messages or [],
            api_key=os.getenv("DEEPSEEK_API_KEY"),
            base_url="https://api.siliconflow.cn/v1",
            max_tokens=1024,  # Increased max tokens
            temperature=0.3,  # Lower temperature for more consistent responses
            top_p=0.9,  # Added top_p parameter
            presence_penalty=0.1,  # Added presence penalty
            frequency_penalty=0.1,  # Added frequency penalty
        )

        if not response or response.strip() == "":
            raise ValueError("Empty response received from API")

        return response
    except Exception as e:
        print(f"Error in LLM model function: {str(e)}")
        raise
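To get a feel for what the retry settings above mean in practice, the wait schedule can be worked out by hand. The sketch below is an illustration, not tenacity's implementation: it assumes the wait after failed attempt n is `multiplier * 2**(n - 1)` clamped to the range `[min, max]`, which is my understanding of `wait_exponential` in recent tenacity versions. The function name `backoff_waits` is hypothetical.

```python
def backoff_waits(max_attempts=3, multiplier=1, min_wait=4, max_wait=10, exp_base=2):
    """Seconds slept before each retry under the assumed clamped-exponential rule."""
    waits = []
    # A wait happens after every failed attempt except the last one
    for attempt in range(1, max_attempts):
        raw = multiplier * exp_base ** (attempt - 1)
        # Clamp the raw exponential value into [min_wait, max_wait]
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


print(backoff_waits())                # [4, 4]
print(backoff_waits(max_attempts=6))  # [4, 4, 4, 8, 10]
```

Under that assumption, with 3 attempts and `min=4` the two waits are both 4 seconds, so the exponential growth only becomes visible if you raise the number of attempts.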

Testing the DeepSeek API

async def test_deepseek_api():
    """Test the DeepSeek API connection and response"""
    print("\nTesting DeepSeek API connection...")
    try:
        # Test with a simple prompt
        test_prompt = "Say hello and confirm you are working."
        print(f"Sending test prompt: {test_prompt}")

        response = await llm_model_func(test_prompt)
        print("\nAPI Response:")
        print(response)
        print("\nAPI test successful!")
        return True
    except Exception as e:
        print(f"\nAPI test failed with error: {str(e)}")
        return False

The test_deepseek_api function sends a simple test prompt to the DeepSeek API and checks whether a valid response comes back. It returns True on success and False on failure, so the calling code can stop execution if the test fails.

Custom Embedding Function

The custom_embed function uses the ollama_embed function to generate embeddings for the given texts. It also verifies that the embedding dimensions match the expected value.

async def custom_embed(texts):
    try:
        embeddings = await ollama_embed(
            texts,
            embed_model="nomic-embed-text",
            host="http://localhost:11434"  # Using localhost as default
        )
        # Verify embedding dimensions (len() works whether a list or a NumPy array is returned)
        if embeddings is not None and len(embeddings) > 0 and len(embeddings[0]) != EMBEDDING_DIM:
            raise ValueError(f"Unexpected embedding dimension: {len(embeddings[0])}, expected {EMBEDDING_DIM}")
        return embeddings
    except Exception as e:
        print(f"Error in embedding function: {str(e)}")
        raise
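The check above inspects only the first embedding in the batch. If you prefer to validate every row, a minimal sketch could look like the following. The helper name `validate_embeddings` is hypothetical, and it assumes each embedding is a sequence that supports `len()`.

```python
def validate_embeddings(embeddings, expected_dim):
    """Raise ValueError if any embedding row has the wrong length."""
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"Embedding {index} has dimension {len(vector)}, expected {expected_dim}"
            )
    return embeddings
```

Inside custom_embed this could replace the single-row check, for example `return validate_embeddings(embeddings, EMBEDDING_DIM)`.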

Initializing the LightRAG Instance

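import shutil  # add alongside the other imports at the top of app.py

# Initialize the lightRAG instance
async def initialize_rag():
    try:
        # Clear existing storage to avoid dimension mismatch with previous data
        if os.path.exists(WORKING_DIR):
            shutil.rmtree(WORKING_DIR)
        os.mkdir(WORKING_DIR)

        # Wrap the custom embedding function with its dimension and token limit
        embedding_func = EmbeddingFunc(
            embedding_dim=EMBEDDING_DIM,
            max_token_size=512,
            func=custom_embed,
        )

        rag = LightRAG(
            working_dir=WORKING_DIR,
            llm_model_func=llm_model_func,
            embedding_func=embedding_func,
        )

        # Storages and pipeline status must be initialized before use
        await rag.initialize_storages()
        await initialize_pipeline_status()

        return rag
    except Exception as e:
        print(f"Error initializing LightRAG: {str(e)}")
        raise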
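The rest of app.py is not shown in this post. Judging from the output further down, the remaining steps are: extract the text from the PDF, insert it into the RAG store, and run one query per mode. The sketch below is one way this could look, not the original code. The helper names (`extract_pdf_text`, `index_and_query`) are hypothetical, `ainsert` and `aquery` are assumed to be LightRAG's async insert and query methods, and `param_cls` stands in for `QueryParam`. The questions and modes are the ones that appear in the sample output.

```python
import asyncio

# (question, mode) pairs taken from the sample output in this post
QUERIES = [
    ("What is EV?", "naive"),
    ("How is the market of EV in 2024 in Australia?", "local"),
    ("How is the New Vehicle Efficiency Standards in Australia?", "global"),
    ("How is the regulatory environment of EV in Australia?", "hybrid"),
]


def extract_pdf_text(pdf_path):
    """Concatenate the text of every page in a PDF using pdfplumber."""
    import pdfplumber  # imported lazily so the rest of the sketch has no hard dependency

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


async def index_and_query(rag, text, param_cls, queries=QUERIES):
    """Insert text into the RAG store, then run each query in its mode."""
    await rag.ainsert(text)
    answers = []
    for question, mode in queries:
        print(f"\nQuery: {question}\nMode: {mode}")
        answer = await rag.aquery(question, param=param_cls(mode=mode))
        print(f"Response: {answer}")
        answers.append((mode, answer))
    return answers


# In app.py this might be wired up inside an async main(), using names from the sections above:
#     if await test_deepseek_api():
#         rag = await initialize_rag()
#         text = extract_pdf_text("./State_of_EV_2024.pdf")
#         await index_and_query(rag, text, QueryParam)
# and started with asyncio.run(main()).
```

Passing the parameter class in as an argument keeps the helper easy to try out with a stand-in object before wiring it to a real LightRAG instance.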

Run and Test:

python app.py

The Output

The output below shows the DeepSeek API test followed by the LightRAG processing. It includes the responses to four example queries, one per query mode (“naive”, “local”, “global”, “hybrid”); each answer is retrieved from the RAG storage and then processed further by DeepSeek.


Testing DeepSeek API connection...
Sending test prompt: Say hello and confirm you are working.

API Response:
Hello! I'm DeepSeek-R1, an AI assistant created by DeepSeek. I'm here to help with information, answer questions, and assist with tasks. How can I assist you today?

API test successful!
INFO: Process 70538 Shared-Data created for Single Process
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './mybook/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './mybook/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './mybook/vdb_chunks.json'} 0 data
INFO: Process 70538 initialized updated flags for namespace: [full_docs]
INFO: Process 70538 ready to initialize storage namespace: [full_docs]
INFO: Process 70538 initialized updated flags for namespace: [text_chunks]
INFO: Process 70538 ready to initialize storage namespace: [text_chunks]
INFO: Process 70538 initialized updated flags for namespace: [entities]
INFO: Process 70538 initialized updated flags for namespace: [relationships]
INFO: Process 70538 initialized updated flags for namespace: [chunks]
INFO: Process 70538 initialized updated flags for namespace: [chunk_entity_relation]
INFO: Process 70538 initialized updated flags for namespace: [llm_response_cache]
INFO: Process 70538 ready to initialize storage namespace: [llm_response_cache]
INFO: Process 70538 initialized updated flags for namespace: [doc_status]
INFO: Process 70538 ready to initialize storage namespace: [doc_status]
INFO: Process 70538 storage namespace already initialized: [full_docs]
INFO: Process 70538 storage namespace already initialized: [text_chunks]
INFO: Process 70538 storage namespace already initialized: [llm_response_cache]
INFO: Process 70538 storage namespace already initialized: [doc_status]
INFO: Process 70538 Pipeline namespace initialized

Processing PDF file: ./State_of_EV_2024.pdf
Extracted 129455 characters from PDF
Starting book processing...
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './mybook/vdb_chunks.json'} 0 data
Received empty content from OpenAI API
Received empty content from OpenAI API
Received empty content from OpenAI API
INFO:openai._base_client:Retrying request to /chat/completions in 0.414335 seconds
INFO:openai._base_client:Retrying request to /chat/completions in 0.438990 seconds
INFO:openai._base_client:Retrying request to /chat/completions in 0.854296 seconds
INFO:openai._base_client:Retrying request to /chat/completions in 0.455213 seconds
INFO:openai._base_client:Retrying request to /chat/completions in 0.396082 seconds
Received empty content from OpenAI API
Received empty content from OpenAI API
Received empty content from OpenAI API
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './mybook/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 768, 'metric': 'cosine', 'storage_file': './mybook/vdb_relationships.json'} 0 data
Book processing completed successfully

Query: What is EV?
Mode: naive
Processing query in naive mode...
Response: **Electric Vehicle (EV)**  
An Electric Vehicle (EV) is a type of vehicle that runs primarily or exclusively on electricity generated from renewable sources or stored energy. EVs are powered by batteries, which can be either lithium-ion or lead-ion, and are often paired with regenerative braking systems that capture kinetic energy during braking to recharge the battery.  

### Types of Electric Vehicles  
1. **Battery Electric Vehicle (BEV)**: A BEV is powered entirely by a battery pack. The battery stores the energy required to accelerate the vehicle and power accessories like the onboard computer, radio, lights, and seat heaters. BEVs are typically faster, have greater range, and are more efficient than combustion-powered vehicles.  

2. **Plug-in Hybrid Electric Vehicle (PHEV)**: A PHEV combines a battery with a traditional internal combustion engine. The driver can choose to power the vehicle using either the battery or the engine. The battery is recharged using the kinetic energy generated when braking, which helps extend the range of the vehicle.  

### Market Share and Growth  
The document highlights that Australia’s electric vehicle fleet is growing rapidly. In 2023, the fleet target is to have over 500,000 EVs by the end of 2027, aligning with the EU’s ambitious target of 1 million EVs by 2027. Australia’s market share of electric vehicles has increased significantly over the years, with BEVs and PHEVs becoming the dominant segments.  

### Incentives and Challenges  
The document also discusses the role of government incentives in promoting EV adoption. For example, the Federal Government introduced a $7,500 discount on new vehicle registrations for eligible EVs purchased between 1 January 2024 and 31 December 2024. However, challenges such as high initial costs, range limitations, and charging infrastructure remain barriers to widespread adoption.  

### Conclusion  
EVs represent a significant shift in transportation, offering environmental benefits and reducing reliance on fossil fuels. While challenges persist, ongoing policy support and technological advancements are driving progress toward achieving global EV targets.  

---

**References**  
1. [KG] Document Chunks, Section 4202, Chunk 4202  
2. [KG] Document Chunks, Section 4202, Chunk 4202  
3. [KG] Document Chunks, Section 4202, Chunk 4202  
4. [KG] Document Chunks, Section 4202, Chunk 4202  
5. [KG] Document Chunks, Section 4202, Chunk 4202

Query: How is the market of EV in 2024 in Australia?
Mode: local
Processing query in local mode...
Response: ### Market of Electric Vehicles (EVs) in Australia in 2024

Australia's electric vehicle market in 2024 has shown significant growth, with a strong focus on sustainability and government support. The market is characterized by increasing vehicle availability, expanding charging infrastructure, and a shift towards battery-electric vehicles (BEVs) and plug-in hybrid electric vehicles (PHEVs).

#### Key Highlights:
1. **Market Growth and Policy Support**:
   - Australia aims to achieve net zero emissions by 2050, with a target of 50% new vehicle sales by 2025. This is supported by policies like the **National Electric Vehicle Strategy for 2025** and the **NSW kerbside charging program**, which complements workplace charging initiatives.
   - The **Electric Vehicle Sales** report shows that BEVs dominate the market, with PHEVs gaining traction in specific segments.

2. **Vehicle Availability**:
   - Leading EV models in 2024 include the Tesla Model Y, Model 3, and BYD Sealion, reflecting a mix of premium and accessible options.
   - The **Electric Vehicle Transition** report highlights the increasing competition and the role of government policies in accelerating the shift to EVs.

3. **Charging Infrastructure Expansion**:
   - The **Workplace Charging Infrastructure** report indicates a focus on expanding kerbside and workplace charging networks to support EV adoption.
   - The **Electric Vehicle Charging Infrastructure** report notes investments in destination charging and smart charging technologies to enhance convenience and efficiency.

4. **Regional Diversity**:
   - Regional markets like ACT, NSW, and Victoria lead in EV sales, reflecting Australia's spatial diversity and regional sustainability efforts.

5. **Challenges and Future Outlook**:
   - Challenges include funding and consent for workplace charging, but the market remains buoyant with strong policy support and increasing vehicle availability.

#### Conclusion:
Australia's EV market in 2024 is growing rapidly, supported by government initiatives, expanding charging infrastructure, and a shift towards BEVs and PHEVs. The market is aligned with sustainability goals and reflects a broader trend towards electric mobility.

---

### References
- [Knowledge Graph (KG)](https://knowledge.gov.au/energy/information/...)  
- [Vector Data (DC)](https://www.electric-vehicle.com/...)

Query: How is the New Vehicle Efficiency Standards in Australia?
Mode: global
Processing query in global mode...
Response: ### Answer:

The New Vehicle Efficiency Standards in Australia aim to reduce greenhouse gas emissions and improve fuel efficiency for light vehicles. These standards are part of the broader National Electric Vehicle (EV) Transition, which focuses on promoting the adoption of electric vehicles (EVs) and hybrid technologies. The standards were introduced to support the development of more affordable and efficient electric vehicle models, particularly battery-electric vehicles (BEVs) and plug-in hybrid electric vehicles (PHEVs).

#### Key Aspects of the New Vehicle Efficiency Standards:
1. **Purpose and Targets**:  
   The standards set targets for achieving a 40% reduction in vehicle emissions by 2030 and a 30% reduction in vehicle fleet emissions by 2035. This aligns with Australia's commitment to becoming a net-zero emissions country by 2050.

2. **Vehicle Types**:  
   The standards prioritize the production of battery-electric vehicles (BEVs), which are expected to dominate the market. BEVs are projected to account for over 50% of new vehicle sales by 2024. Plug-in hybrid electric vehicles (PHEVs) are also gaining traction, with Tesla and BYD leading the market.

3. **Market Share**:  
   The ACT continues to lead the country with 25% of EV sales, followed by Queensland (9.6%), New South Wales (9.5%), Victoria (9.4%), Western Australia (9.3%), South Australia (8.2%), Tasmania (8.0%), and the Northern Territory (4.0%). This regional diversity reflects Australia's efforts to achieve net zero emissions through lower transport sector emissions.

4. **Collaboration**:  
   The standards are supported by a partnership between the Department of Infrastructure and the Australian Capital Territory (ACT), reinforcing regional diversity in EV market performance.

For more detailed information, refer to the following sources:

- [EV Transition](https://knowledgebase.kitops.com/energyInfrastructure/...)  
- [State of Electric Infrastructure](https://knowledgebase.kitops.com/...)  
- [EV Market Share](https://knowledgebase.kitops.com/...)  

These sources provide comprehensive insights into the New Vehicle Efficiency Standards and their impact on Australia's EV market.

Query: How is the regulatory environment of EV in Australia?
Mode: hybrid
Processing query in hybrid mode...
Response: The regulatory environment for electric vehicles (EVs) in Australia is comprehensive and multifaceted, encompassing national strategies, state-specific initiatives, technical standards, government incentives, and infrastructure development. Here's an overview of the key regulatory aspects:

### 1. **National Electric Vehicle Strategy**
   - Australia's National Electric Vehicle Strategy, adopted in 2023, outlines a roadmap for EV adoption. It aims to achieve a 50% increase in new vehicle sales by 2030 and a 43% reduction in emissions by the same year. The strategy also emphasizes the integration of EVs into public transport systems and the development of charging infrastructure [1].

### 2. **State-Specific Strategies**
   - Each Australian state has its own Electric Vehicle Strategy, which provides detailed plans for EV adoption. For example:
     - **ACT** focuses on accelerating the EV market and expanding public charging networks [2].
     - **NSW** prioritizes EVs in public transport and aims to reduce emissions across the transport sector [3].
     - **VIC** emphasizes the integration of EVs into public transport and achieving net zero emissions by 2050 [4].

### 3. **Technical Standards and Performance Requirements**
   - The Australian Vehicle Efficiency Standard (AVS) sets technical requirements for electric vehicles, including battery capacity, charging efficiency, and emissions standards. Manufacturers must comply with these standards to produce vehicles that meet regulatory requirements [5].

### 4. **Government Policies and Incentives**
   - The Federal Government plays a crucial role in supporting EV adoption through policies and financial incentives. Key initiatives include:
     - Tax incentives and rebates for EVs and charging infrastructure.
     - The National Green Energy Strategy, which aims for net zero emissions by 2050 [6].
     - Support for public transport fleets through targeted financial incentives [7].

### 5. **Electric Vehicle Infrastructure Development**
   - The development of charging infrastructure is a critical component of the EV regulatory environment. The kerbside charging scheme, mandatory for private vehicles, now has over 202,000 installations as of 2023. Public charging points are also expanding to support public transport and commercial fleets [8].

### 6. **Electric Vehicle Council (EV Council)**
   - The EV Council acts as an advocacy group and knowledge hub, supporting the EV industry by providing resources, facilitating knowledge sharing, and promoting best practices. It collaborates with manufacturers, retailers, and policymakers to align private and public efforts [9].

### 7.

Amazing! In less than 10 minutes, we have set up a local knowledge base for indexing. Combined with the power of LLMs, it lets you ask any question related to the knowledge base. This greatly extends the knowledge available to the LLM and improves compliance and security, because the knowledge base itself stays local instead of being fed into the LLM.


Written by daybreak

I’m Xin, a passionate AI enthusiast and tech-savvy explorer on a mission to demystify the world of artificial intelligence. My journey into AI began with a fascination for how machines can learn and adapt, and it has since grown into a deep dive into the cutting-edge technologies that are shaping our future. On this blog, I aim to share my discoveries, insights, and experiences with fellow AI aficionados and curious minds. Whether you’re a seasoned developer, a tech student, or just someone intrigued by the possibilities of AI, I hope you’ll find something valuable here. From the latest breakthroughs in machine learning to practical applications in everyday life, I strive to make complex concepts accessible and engaging. Join me as we explore the fascinating intersection of AI, technology, and human ingenuity. Feel free to reach out if you have any questions or just want to chat about all things AI. Let’s embark on this exciting journey together!