Expanding Knowledge Graphs for Improved Contextual Search

The code for this post lives on the expanding-kg-data branch of the repo.
In a previous post, we covered the fundamentals of Knowledge Graphs and Graph RAG, exploring how they enhance information retrieval and organization. Now, we will take that foundation a step further by expanding the knowledge graph with keyword nodes and relationships.
Knowledge Graph Expansion: Enriching and Extending Data
Expanding a knowledge graph isn’t just about adding more data. It’s about generating insights from existing information and sourcing external data to create a richer, more connected graph. This process improves semantic search, contextual understanding, and knowledge discovery by making relationships between entities more meaningful.
Keyword Extraction
In this post, we extend our existing KG by pulling meaningful keywords from articles and adding them as nodes in the graph database, which creates richer connections between content. This helps with navigation, strengthens semantic relationships, and improves content discovery. In this section, we’ll explore how contextual keywords are extracted, how they integrate into a knowledge graph, and how they enhance data representation.
The Keyword Extraction Process
The process of extracting and integrating keywords into a knowledge graph involves three key steps:
Extracting Keywords from Articles
- A language model (e.g., an OpenAI model) analyzes the article text to identify 5-10 significant keywords. These keywords are selected based on their contextual importance, focusing on named entities, technical terms, and recurring themes.
Adding Keywords as Nodes in the Graph
- Each keyword is stored as a first-class node in the graph database. Relationships are established between the keyword nodes and their corresponding articles using the HAS_KEYWORD relationship.
Generate Keyword Embeddings for Semantic Search
- Keywords are enriched with vector embeddings to capture their semantic meaning. This enables similarity-based matching between keywords and user queries.
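The third step does not appear in the code shown in this post, so here is a minimal sketch of what it could look like. It assumes the vector is stored on an `embedding` property of each Keyword node and that the model produces 1536-dimension vectors; `EmbedFn` and `RunFn` are hypothetical stand-ins for an embeddings client and a Neo4j session, injected so the logic can be exercised without either. Only the index name `keywordVector` is taken from the search query used later in the post.

```typescript
// Hypothetical stand-ins for an embeddings client and a Neo4j session.
type EmbedFn = (texts: string[]) => Promise<number[][]>;
type RunFn = (cypher: string, params?: Record<string, unknown>) => Promise<unknown>;

// Assumption: 1536 dimensions; change this to match your embedding model.
const EMBEDDING_DIMENSIONS = 1536;

// Vector index DDL in the form accepted by recent Neo4j 5.x releases.
const CREATE_KEYWORD_INDEX = `
  CREATE VECTOR INDEX keywordVector IF NOT EXISTS
  FOR (k:Keyword) ON (k.embedding)
  OPTIONS { indexConfig: {
    \`vector.dimensions\`: ${EMBEDDING_DIMENSIONS},
    \`vector.similarity_function\`: 'cosine'
  } }
`;

// Assumption: the vector is stored on a property named `embedding`.
const SET_KEYWORD_EMBEDDING = `
  MATCH (k:Keyword {name: $name})
  SET k.embedding = $embedding
`;

export async function embedKeywords(
  names: string[],
  embed: EmbedFn,
  run: RunFn
): Promise<number> {
  if (names.length === 0) return 0;
  await run(CREATE_KEYWORD_INDEX);
  // Embed all names in one batched call to limit round trips.
  const vectors = await embed(names);
  if (vectors.length !== names.length) {
    throw new Error("Embedding count does not match keyword count");
  }
  for (let i = 0; i < names.length; i++) {
    await run(SET_KEYWORD_EMBEDDING, { name: names[i], embedding: vectors[i] });
  }
  return names.length;
}
```

In a real service, `embed` could wrap the same embeddings client the search code uses and `run` could wrap `session.run`.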
Keyword Extraction and Relationship Mapping Code
This code from src/services/graphOperations.ts illustrates how keywords are extracted from articles and integrated into a Neo4j knowledge graph.
export async function extractAndAddKeywords() {
  const session = driver.session();
  try {
    const result = await session.run(`
      MATCH (article:Article)
      WHERE NOT (article)-[:HAS_KEYWORD]->(:Keyword)
      RETURN article.articleId AS articleId, article.title AS title
    `);
    for (const record of result.records) {
      const articleId = record.get("articleId");
      const title = record.get("title");
      // Get the full text of the article by collecting all chunks
      const articleTextResult = await session.run(
        `
        MATCH (article:Article {articleId: $articleId})-[:FIRST_CHUNK|NEXT_CHUNK*]->(chunk:Chunk)
        RETURN chunk.text AS text
        ORDER BY chunk.chunkSeqId
        `,
        { articleId }
      );
      const fullText = articleTextResult.records
        .map((r) => r.get("text"))
        .join(" ");
      const contextText = title + "\n" + fullText.substring(0, 5000);
      // Extract keywords using OpenAI
      const keywordResponse = await model.invoke([
        {
          role: "system",
          content: `Extract the 5-10 most important and relevant contextual keywords or entities from the article.
          Focus on specific named entities, technical terms, concepts, or themes that best represent the content.
          Return ONLY a JSON array of strings with no explanation or other text. Example: ["Climate Change", "United Nations", "Paris Agreement"]`,
        },
        {
          role: "user",
          content: contextText,
        },
      ]);
      // Parse the response to get the keywords
      let keywords = [];
      try {
        // Get the content from the response
        const contentText = keywordResponse.content.toString();
        // Extract the JSON array from the response
        const jsonMatch = contentText.match(/\[.*\]/s);
        if (jsonMatch) {
          keywords = JSON.parse(jsonMatch[0]);
        }
      } catch (error) {
        console.error("Error parsing keywords:", error);
        continue;
      }
      // Add each keyword and create relationship to article
      for (const keyword of keywords) {
        await session.run(
          `
          MERGE (k:Keyword {name: $keyword})
          WITH k
          MATCH (a:Article {articleId: $articleId})
          MERGE (a)-[:HAS_KEYWORD]->(k)
          `,
          {
            keyword,
            articleId,
          }
        );
      }
      console.log(`Added ${keywords.length} keywords to article: ${title}`);
    }
  } catch (error) {
    console.error("Error extracting keywords:", error);
    throw error;
  } finally {
    await session.close();
    console.log("Keyword extraction completed");
  }
}
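The parsing step above trusts that the model returns a clean JSON array of strings. A slightly more defensive version, sketched below, also drops non-string entries, trims whitespace, removes case-insensitive duplicates within one response, and caps the list. The name `parseKeywords` and the default cap of 10 are assumptions for illustration, not part of the repo.

```typescript
export function parseKeywords(raw: string, max = 10): string[] {
  // Take everything from the first '[' to the last ']' so that any
  // text the model adds around the array is ignored.
  const match = raw.match(/\[[\s\S]*\]/);
  if (!match) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return []; // Malformed JSON: treat as "no keywords".
  }
  if (!Array.isArray(parsed)) return [];

  const seen = new Set<string>();
  const keywords: string[] = [];
  for (const item of parsed) {
    if (typeof item !== "string") continue;
    const name = item.trim();
    const key = name.toLowerCase();
    if (name === "" || seen.has(key)) continue;
    seen.add(key);
    keywords.push(name);
    if (keywords.length >= max) break;
  }
  return keywords;
}
```

Because MERGE matches on the exact `name` string, normalizing before the write is one way to avoid near-duplicate Keyword nodes; whether to also normalize casing across articles is a separate design choice.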
End Result
Articles now have attached Keyword nodes, which can be included in semantic search queries for more comprehensive results.
By adding keyword nodes to the knowledge graph, we make the dataset richer and more connected, improving semantic search and contextual discovery. In the screenshot, you can see an Article node (for an article about Trails in the Sky 1st Chapter) linked to several Keyword nodes. These keywords, like Matthew Mercer, Steam, PlayStation, and Falcom, are important entities or themes taken from the article.
Updated Semantic Search Code
If you compare src/services/semanticSearch.ts on main with the expanding-kg-data branch of the example repo, you’ll see that the semantic search code changed to include Keyword relationships in the query.
const cypherQuery = `
  CALL {
    CALL db.index.vector.queryNodes('articleTitleVector', toInteger($topK), $queryEmbedding)
    YIELD node as article, score
    RETURN article, score
    UNION
    CALL db.index.vector.queryNodes('chunkTextVector', toInteger($topK), $queryEmbedding)
    YIELD node as chunk, score
    MATCH (article:Article)-[:FIRST_CHUNK|NEXT_CHUNK*0..]->(chunk)
    RETURN article, score
    UNION
    CALL db.index.vector.queryNodes('keywordVector', toInteger($topK), $queryEmbedding)
    YIELD node as keyword, score
    MATCH (article:Article)-[:HAS_KEYWORD]->(keyword)
    RETURN article, score
  }
  MATCH (article)-[:FROM_SOURCE]->(source)
  OPTIONAL MATCH (article)-[:FIRST_CHUNK]->(firstChunk)-[:NEXT_CHUNK*0..3]->(relatedChunk)
  OPTIONAL MATCH (article)-[:HAS_KEYWORD]->(keyword)
  RETURN
    score,
    article.title AS title,
    article.link AS link,
    source.name AS sourceName,
    article.published AS publishDate,
    collect(DISTINCT relatedChunk.text) AS relatedContent,
    collect(DISTINCT keyword.name) AS keywords
  ORDER BY score DESC
  LIMIT toInteger($topK)
`;
export async function searchSimilarArticles(query: string, topK: number = 5) {
  const session = driver.session();
  try {
    const [queryEmbedding] = await embeddings.embedDocuments([query]);
    const limit = Math.floor(topK);
    const result = await session.run(cypherQuery, {
      queryEmbedding,
      topK: limit,
    });
    return result.records.map((record) => ({
      score: record.get("score"),
      title: record.get("title"),
      link: record.get("link"),
      source: record.get("sourceName"),
      publishDate: record.get("publishDate"),
      content: record.get("relatedContent").join("\n"),
      keywords: record.get("keywords"),
    }));
  } finally {
    await session.close();
  }
}
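Because the three branches of the UNION can each surface the same article with a different score, the result list may contain one article more than once. If that matters for your use case, one option is to collapse duplicates on the client and keep the best score per link. `dedupeByLink` is a hypothetical helper, not something in the repo, and it only assumes that each result carries a `link` and a numeric `score`.

```typescript
export function dedupeByLink<T extends { link: string; score: number }>(
  results: T[]
): T[] {
  // Keep the highest-scoring entry seen for each link.
  const best = new Map<string, T>();
  for (const r of results) {
    const current = best.get(r.link);
    if (!current || r.score > current.score) {
      best.set(r.link, r);
    }
  }
  // Return the survivors in descending score order, like the query does.
  return Array.from(best.values()).sort((a, b) => b.score - a.score);
}
```

Since the query applies its LIMIT before this step runs, deduplicating afterwards can leave fewer than topK results; requesting a slightly larger topK is one way to compensate.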
Even more specific context for RAG
Now, when we run the RAG code with a prompt:
“Which games feature Matthew Mercer as a voice actor?”
The response should be something like this:
- Facts and Announcements:
  - Matthew Mercer voices Olivier Lenheim in the upcoming remake of "The Legend of Heroes: Trails in the Sky 1st Chapter." Source: RPGFan, 2025-03-13
- Supporting Details and Context:
  - Matthew Mercer replaced Troy Baker as the voice of Olivier Lenheim in the series after the Sky FC and SC entries.
  - The English voice cast for "Trails in the Sky 1st Chapter" is reprising their roles for the remake, ensuring continuity for fans.
- Conflicting or Ambiguous Reports:
  - None identified.
- Key Topics and Entities:
  - Matthew Mercer, Olivier Lenheim, The Legend of Heroes, Trails in the Sky 1st Chapter, Falcom, GungHo Online Entertainment, English voice cast
- Summary of all retrieved information:
  Matthew Mercer is confirmed to voice Olivier Lenheim in the upcoming remake of "The Legend of Heroes: Trails in the Sky 1st Chapter." This announcement is part of a broader update that the English voice cast from previous entries will return for the remake, maintaining continuity for fans. Mercer took over the role of Olivier Lenheim from Troy Baker in subsequent entries after the initial Sky FC and SC games.
- List of sources:
  - RPGFan - 2025-03-13 - [Link](https://www.rpgfan.com/2025/03/13/trails-in-the-sky-1st-chapter-cast-back/)
The similarity search might have found articles mentioning Matthew Mercer even without using keywords. However, by using keywords as entities in the knowledge graph, we now have an extra method to refine the context when necessary.
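As one illustration of that extra method, search results could be narrowed to articles whose keywords are mentioned verbatim in the user's question, falling back to the full list when nothing matches. This is a sketch of one possible refinement, not what the repo does; `refineByKeywords` is a hypothetical name, and it only relies on the `keywords` array that the search results already carry.

```typescript
export function refineByKeywords<T extends { keywords: string[] }>(
  question: string,
  results: T[]
): T[] {
  const q = question.toLowerCase();
  // Keep results that have at least one keyword appearing in the question.
  const matches = results.filter((r) =>
    r.keywords.some((k) => k.trim() !== "" && q.includes(k.toLowerCase()))
  );
  // Fall back to the unfiltered list so a strict filter never empties the context.
  return matches.length > 0 ? matches : results;
}
```

Plain substring matching is deliberately simple here; it misses paraphrases and partial names, which is where the keyword embeddings remain useful.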
Written by Rodolfo Yabut
