Knowledge Graphs

Anant Gabhane
4 min read

RAG is based on indexing (ingestion) and retrieval. We need to optimize both indexing and retrieval so that we can perform NLP over large data sets. The approach:

  • If the PDF is huge, then we cannot ingest the whole PDF directly, so we perform chunking on it.

  • We store the chunks in a vector DB and run a similarity search over them, which helps in finding the relevant chunks

  • We perform retrieval using these chunks


  • Vector embeddings do help with similarity search, but they lose out on storing relations

  • Now we’re adding another brain on top of the RAG, a knowledge graph, which provides relational information to the RAG.

  • Knowledge (information) Graph (a data structure made of nodes and edges)

  • A graph is made out of edges (relations) and nodes (entities)

  • Everything can be represented as a graph

  • If the LLM already has this relational information, its efficiency is significantly increased.

  • In the world of RAG, a knowledge graph is a vital thing and is used widely
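The chunking and similarity-search flow above can be sketched in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words "embedding" stands in for a real embedding model, and the function names are my own.

```python
# Toy sketch of the indexing/retrieval flow: chunk a document,
# "embed" each chunk, then rank chunks against a query.
from collections import Counter
import math

def chunk(text: str, size: int = 40) -> list[str]:
    # Split a large document into fixed-size character chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would call an
    # embedding model (e.g. OpenAI or sentence-transformers).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    # Similarity search: rank stored chunks against the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

Swapping `embed` for a real model and the list for a vector DB (e.g. Qdrant) gives the standard RAG indexing path.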

Vector embeddings are good for semantic search, but they lose relations. They can give us the chunks, but the various relations inside those chunks are missing.

There are 2 parts involved in a knowledge graph: construction and retrieval.

  • We need a starting node; that’s why we use a vector database to store the address and meaning of each node. Using the vector DB, we can fetch the relevant nodes and start the traversal from there. When we provide chunks combined with relations, the RAG improves

  • We store node IDs and addresses in a vector database along with the relevant chunks; based on this data, we can find the related nodes in the knowledge graph. When we provide the relevant chunks combined with this relational data to the LLM, we get improved results
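The retrieval idea above — a vector lookup picks an entry node, then graph traversal collects related facts — can be sketched with a plain adjacency map. The graph content and function names here are invented for illustration.

```python
# Toy knowledge graph: node -> list of (relation, neighbour) edges.
graph = {
    "Big Bad Wolf": [("ATE", "Grandma"), ("CHASED", "Red Riding Hood")],
    "Grandma": [("LIVES_IN", "Forest")],
}

def related_facts(start: str, depth: int = 2) -> list[str]:
    # Breadth-first traversal from the entry node (which, in a real
    # system, vector search would have selected for us).
    facts, frontier = [], [start]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for rel, other in graph.get(node, []):
                facts.append(f"{node} -{rel}-> {other}")
                nxt.append(other)
        frontier = nxt
    return facts
```

The returned fact strings are what gets appended to the retrieved chunks before prompting the LLM.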

Use case - Where there are multiple entities and relations in the problem statement, we can use a graph.

Memory

  • ChatGPT stores the memory of the user; likewise, agents should also have access to memory

  • We store this memory in the knowledge graph

  • The issue with an LLM is that after a long conversation it starts dropping older messages from the context window. Instead of providing the whole of each older message as context, we can extract the relations (facts) from those messages and provide those, so we never lose context

  • The agent should have memory

  • Neo4J specializes in graph databases
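The memory idea above — distilling old messages into compact facts instead of keeping the full text — can be sketched as below. A real system would use an LLM for the extraction; this deliberately naive version only recognizes "my name is X" statements, and all names here are my own.

```python
# Turn a chat message into (subject, relation, object) facts so that
# only the facts, not the whole message, need to stay in context.
def extract_facts(message: str, user: str = "user") -> list[tuple[str, str, str]]:
    facts = []
    if "my name is" in message.lower():
        # Naive pattern match; an LLM would handle arbitrary statements.
        name = message.split("is", 1)[1].strip().rstrip(".")
        facts.append((user, "HAS_NAME", name))
    return facts

def facts_to_context(facts: list[tuple[str, str, str]]) -> str:
    # Render stored facts as a short context block for the next prompt.
    return "\n".join(f"{s} {r} {o}" for s, r, o in facts)
```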


Neo4J :

  • Provides native graph support and is not a wrapper or abstract solution. Uses the Cypher query language.

  • Creation of a node: CREATE (c:Company { name: 'Anant'}) RETURN c

  • MATCH (n) RETURN n returns everything

  • ChatGPT prompt to generate Neo4J Cypher queries: “Create Neo4J Cypher queries for all the entities and relations in this PDF file”

  • MATCH p=(:Character{name:'Big Bad Wolf'})-[]->() RETURN p - returns all the relations of the character Big Bad Wolf

  • Relations are described as WHO? TO WHOM? HOW? relate

  • WHO represents → Starting node | TO WHOM → Ending node | HOW → Relation

  • Knowledge graphs are updated by rewriting nodes

  • Return everything from the DB - MATCH p=()-[]->() RETURN p
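The Cypher above can be built and parameterized from Python. The helper names below are hypothetical; in practice you would pass the resulting strings to `session.run()` from the official `neo4j` driver (`GraphDatabase.driver("bolt://localhost:7687", auth=(user, password))` — URI and credentials depend on your install).

```python
# Build parameterized Cypher strings matching the examples above.
# Using a $name parameter (instead of string interpolation of values)
# avoids Cypher injection.
def merge_node(label: str) -> str:
    # MERGE creates the node only if it does not already exist.
    return f"MERGE (c:{label} {{name: $name}}) RETURN c"

def relations_of(label: str) -> str:
    # WHO (start node) -[HOW (relation)]-> TO WHOM (end node).
    return f"MATCH p=(:{label} {{name: $name}})-[]->() RETURN p"
```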


While building a RAG in the Indexing phase, you have to create a knowledge graph of those documents

Now there are 3 ways to do it :

  1. RAW

  2. Langchain

  3. Mem0

    Qdrant - http://localhost:6333/dashboard

    Neo4J -

  1. RAW - Take the input PDF and create chunks out of it

    Ask ChatGPT to find out all the entities(persons, objects), then the relationships between entities

    MERGE inserts an entity only if it does not already exist; if it does, it matches the existing node rather than creating a new copy

    for entity in entities:
        MERGE (e:Entity { name: entity })

    for (source, rel, target) in relations:
        CREATE (source)-[rel]->(target)

    • Raw way to index a PDF into a knowledge graph
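The RAW steps above — chunk the PDF, have the LLM find entities and relations, MERGE them into the graph — can be sketched end to end. The LLM call is stubbed out with a trivial heuristic; in a real pipeline `extract` would prompt a model to return entities and (source, relation, target) triples for each chunk, and both function names are my own.

```python
# Sketch of the "RAW" indexing path: extract entities/relations from a
# chunk, then emit the Cypher needed to upsert them into Neo4J.
def extract(chunk: str) -> tuple[list[str], list[tuple[str, str, str]]]:
    # Stub standing in for an LLM call: pretend the model found the
    # first and last word as entities with one relation between them.
    words = chunk.split()
    if len(words) >= 3:
        return [words[0], words[-1]], [(words[0], "RELATES_TO", words[-1])]
    return [], []

def to_cypher(entities, relations) -> list[str]:
    # MERGE so re-ingesting the same chunk does not duplicate nodes.
    queries = [f"MERGE (:Entity {{name: '{e}'}})" for e in entities]
    for src, rel, dst in relations:
        queries.append(
            f"MATCH (a:Entity {{name: '{src}'}}), (b:Entity {{name: '{dst}'}}) "
            f"MERGE (a)-[:{rel}]->(b)"
        )
    return queries
```

Each query list would then be executed per chunk through the Neo4J driver.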
  2. Langchain:

  3. mem0:

    • https://docs.mem0.ai/overview

    • Run uv pip install mem0ai to install mem0

    • Proof that our RAG does not have memory

      >hello

      BOT: Hello! How can I help you today?
      >my name is anant

      BOT: Hello Anant.
      >what is my name

      BOT: As an AI, I don't have access to personal information about you, including your name, unless you've told me during our current conversation. I don't have memory of past interactions or personal details about users.

    • We want to add memory to our RAG application

    • Feed the initial context to generate a good knowledge graph
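A possible mem0 configuration wiring together the two stores mentioned above — Qdrant as the vector store and Neo4J as the graph store. The exact keys are my reading of mem0's documented config format and the credentials are placeholders; verify against https://docs.mem0.ai/overview before use.

```python
# Hypothetical mem0 config: vector memory in Qdrant, graph memory in Neo4J.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {"host": "localhost", "port": 6333},
    },
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "password",  # placeholder credential
        },
    },
}

# Typical usage (requires running services and an LLM API key):
# from mem0 import Memory
# memory = Memory.from_config(config)
# memory.add("my name is anant", user_id="anant")
# memory.search("what is my name", user_id="anant")
```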


RAG works on trial and error

  • While building something, apply effort and analysis techniques

  • Try to remove libraries after the product matures

  • To avoid running out of the context window, the approach is to create a knowledge graph for each message and keep the last 5 messages in the chat along with all the graph memory [hybrid approach]

  • So the 5th message is a summary of the last 4 messages + all the graph memory = 6 messages as context for the next user message
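The hybrid approach above can be sketched as a simple prompt builder: the context for each new user message is the graph-memory facts plus only the last few raw messages. The function name and the 5-message default are taken from the text; everything else is illustrative.

```python
# Hybrid context: all graph-memory facts + only the most recent messages.
def build_context(history: list[str], graph_facts: list[str], keep_last: int = 5) -> str:
    recent = history[-keep_last:]  # older messages survive only as facts
    return "\n".join(["[memory]"] + graph_facts + ["[recent]"] + recent)
```

This keeps the prompt size roughly constant no matter how long the conversation gets.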
