Knowledge Graphs

RAG has two phases: indexing (ingestion) and retrieval. Both need to be optimized so that we can run NLP over large data sets. The approach:
If the PDF is huge, we cannot ingest the whole PDF directly, so we chunk it first.
We store the chunks in a vector DB and perform a similarity search, which finds the relevant chunks for a query.
We perform retrieval using these chunks.
Vector embeddings help with similarity search, but they lose relational information.
Now we add another brain on top of the RAG pipeline: a knowledge graph, which supplies that relational information.
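The chunk → embed → similarity-search flow above can be sketched with a toy in-memory pipeline. The bag-of-words "embedding" and all data below are illustrative stand-ins for a real embedding model and vector DB, not any specific library's API:

```python
# Toy end-to-end sketch: chunk -> "embed" -> similarity search over chunks.
from collections import Counter
from math import sqrt

def chunk_text(text, size=200):
    """Split a long document into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy 'embedding': a bag-of-words term-count vector (stand-in for a model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, top_k=2):
    """Rank chunks by similarity to the query and return the best ones."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]
```

Note that this retrieval returns chunks only; nothing here captures how entities inside the chunks relate to each other, which is exactly the gap the knowledge graph fills.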
Knowledge (information) Graph (a data structure represented by nodes and edges)
A graph is made of edges (relations) and nodes (entities)
Almost anything can be represented as a graph
If the LLM is already given this relational information, its efficiency increases significantly.
In the world of RAG, a knowledge graph is a vital and widely used component.
Vector embeddings are good for semantic search, but they lose relations: they can give us the relevant chunks, but the relationships inside those chunks are missing.
A knowledge graph involves two parts: construction and retrieval.
We need a starting node for traversal, so we use a vector database to store the ID (address) and meaning of each node. Using the vector DB, we can fetch the relevant nodes and start traversal from there.
When node IDs are stored in the vector database alongside the relevant chunks, we can find the related nodes in the knowledge graph. Providing the LLM with the relevant chunks combined with this relational data gives improved results.
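A minimal in-memory stand-in for this idea: a "vector index" maps a query to an entry-node ID, and a graph traversal then collects the related facts to hand to the LLM. All names and data here are hypothetical illustrations, not a real vector DB or graph store:

```python
# Hypothetical knowledge graph: node id -> list of (relation, target) edges.
graph = {
    "anant": [("WORKS_AT", "acme"), ("LIVES_IN", "pune")],
    "acme":  [("LOCATED_IN", "pune")],
    "pune":  [],
}

# Pretend vector index: maps a query keyword to the most similar node id.
vector_index = {"employer": "anant", "company": "acme"}

def related_facts(query_keyword, depth=2):
    """Find the entry node via the 'vector index', then walk outgoing edges."""
    start = vector_index.get(query_keyword)
    facts, frontier = [], [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel, target in graph.get(node, []):
                facts.append((node, rel, target))
                next_frontier.append(target)
        frontier = next_frontier
    return facts
```

The returned `(node, relation, target)` triples are the relational context that gets appended to the retrieved chunks.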
Use case: wherever the problem statement involves multiple entities and relations, a graph fits well.
Memory
ChatGPT stores memory per user; likewise, agents should also have access to memory.
We store this memory in the knowledge graph
The issue with an LLM is that after a long conversation, older messages start falling out of the context window. Instead of passing whole old messages as context, we can extract their relations (facts) and pass those, so we never lose context.
The agent should have memory
Neo4J specializes in graphs.
Neo4J:
Provides native graph support; it is not a wrapper or abstraction over another store. Uses the Cypher query language.
Creating a node:
CREATE (c:Company {name: 'Anant'}) RETURN c
MATCH (n) RETURN n
- returns everything
ChatGPT prompt to generate a Neo4J Cypher query: "Create a Neo4J graph for all the entities and relations in this PDF file"
MATCH p=(:Character{name:'Big Bad Wolf'})-[]->() RETURN p
- Returns all the relations of the character 'Big Bad Wolf'
Relations are described as: WHO relates TO WHOM, and HOW
WHO → starting node | TO WHOM → ending node | HOW → relation
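The WHO / TO WHOM / HOW shape is just a triple, which can be modeled directly. The class name, example characters, and relation types below are hypothetical illustrations:

```python
from typing import NamedTuple

class Relation(NamedTuple):
    """One edge of a knowledge graph: WHO -> HOW -> TO WHOM."""
    who: str       # starting node
    how: str       # relation type
    to_whom: str   # ending node

# Hypothetical example triples in the WHO / HOW / TO WHOM shape:
edges = [
    Relation("Big Bad Wolf", "CHASES", "Little Pig"),
    Relation("Little Pig", "BUILDS", "Brick House"),
]

def outgoing(node, edges):
    """All relations starting at the given node (what the MATCH query above asks for)."""
    return [e for e in edges if e.who == node]
```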
Knowledge graphs are updated by rewriting nodes.
Return everything from DB -
MATCH p=()-[]->() RETURN p
While building a RAG, you have to create the knowledge graph of the documents during the indexing phase.
Now there are 3 ways to do it :
RAW
Langchain
Mem0
Qdrant - http://localhost:6333/dashboard
Neo4J -
RAW: take the input PDF and create chunks out of it
Ask ChatGPT to find all the entities (persons, objects), then the relationships between those entities
MERGE inserts an entity only if it does not already exist; if it does, it matches the existing node rather than creating a duplicate copy
for entity in entities:
    MERGE (c:Character {name: entity})
for (source, relation, target) in relations:
    MERGE (source)-[:relation]->(target)
- the raw way to index a PDF into a knowledge graph
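That raw loop can be sketched in Python as plain generation of parameterized Cypher strings. The entity and relation lists below are hypothetical (a real pipeline would get them from the LLM), and the queries are only built here, not executed:

```python
# Sketch of the "raw" indexing loop: generate parameterized Cypher MERGE statements.

def merge_entity_query(label):
    # MERGE creates the node only if it does not already exist.
    return f"MERGE (e:{label} {{name: $name}})"

def merge_relation_query(rel_type):
    # Match both endpoints by name, then MERGE the edge between them.
    return (
        "MATCH (a {name: $source}), (b {name: $target}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )

# Hypothetical output of the LLM's entity/relation extraction step:
entities = [("Character", "Big Bad Wolf"), ("Character", "Little Pig")]
relations = [("Big Bad Wolf", "CHASES", "Little Pig")]

queries = [(merge_entity_query(label), {"name": name}) for label, name in entities]
queries += [
    (merge_relation_query(rel), {"source": s, "target": t})
    for s, rel, t in relations
]
```

Each `(query, params)` pair could then be sent to Neo4J, e.g. via the official Python driver's `session.run(query, **params)`.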
Langchain :
https://python.langchain.com/docs/how_to/graph_constructing/
We use LangChain's graph-construction utilities to create knowledge graphs.
mem0 :
uv pip install mem0ai
command to install mem0

Proof that our RAG does not have memory:
> hello
BOT: Hello! How can I help you today?
> my name is anant
BOT: Hello Anant.
> what is my name
BOT: As an AI, I don't have access to personal information about you, including your name, unless you've told me during our current conversation. I don't have memory of past interactions or personal details about users.
We want to add memory to our RAG application
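The fix, conceptually, is to extract facts from each message and recall them later. The class below is a toy illustration of that idea only, not mem0's real API; the "extractor" regex and fact shapes are hypothetical (a real system would use an LLM to extract facts and a graph/vector store to hold them):

```python
import re

class ToyMemory:
    """Toy conversational memory: extract simple facts, recall them by keyword."""

    def __init__(self):
        self.facts = []  # list of (subject, relation, value) triples

    def add(self, user_message):
        # Hypothetical extractor: only understands "my name is X".
        m = re.search(r"my name is (\w+)", user_message, re.IGNORECASE)
        if m:
            self.facts.append(("user", "HAS_NAME", m.group(1)))

    def search(self, query):
        # Return stored facts relevant to the query keyword.
        if "name" in query.lower():
            return [f for f in self.facts if f[1] == "HAS_NAME"]
        return []
```

Feeding `search()` results into the prompt lets the bot answer "what is my name" even after the original message has left the context window.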
Feed the initial context to generate a good knowledge graph
RAG development works by trial and error.
While building something, apply both effort and analysis.
Try to remove libraries after the product matures
To avoid out-of-context-window errors, the hybrid approach is: create a knowledge graph entry for each message, and keep only the last 5 messages in the chat along with all of the graph memory.
So the 5th message is a summary of the last 4 messages; that, plus all the graph memory, gives roughly 6 messages of context for the next user message.
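The hybrid context assembly above can be sketched as: context = all graph-memory facts + the last N raw messages. The window size, message strings, and fact format below are illustrative assumptions:

```python
# Sketch of the hybrid approach: old information survives as graph facts
# instead of raw text, while only the last few raw messages are kept.

def build_context(messages, facts, window=5):
    recent = messages[-window:]  # keep only the last N raw messages
    fact_lines = [f"FACT: {s} {r} {o}" for s, r, o in facts]
    return fact_lines + recent

# Hypothetical 10-message conversation plus one extracted fact:
messages = [f"message {i}" for i in range(1, 11)]
facts = [("user", "HAS_NAME", "anant")]

context = build_context(messages, facts)
```

However long the conversation grows, the context stays bounded: the fact lines carry the old information, and only the most recent messages are passed verbatim.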
