How to Implement Custom JSON Utility Procedures With Memgraph MAGE and Python
Introduction
Oftentimes you find yourself unable to come up with the perfect query that fits the problem at hand. Every query language has its disadvantages and Cypher is no exception. But thankfully, there is always the option of writing your own custom procedures.
Memgraph introduces the concept of query modules which are collections of custom Cypher procedures. You can implement them using a Python or C API.
In this tutorial, you will go through the process of implementing a few simple utility procedures to load and export data in a JSON format.
Introducing Memgraph MAGE
MAGE stands for Memgraph Advanced Graph Extensions. It's an open-source project started by Memgraph that encourages developers to share innovative and useful query modules so the whole community can benefit from them.
You can find the MAGE repository on GitHub.
Prerequisites
To complete this tutorial, you will need:
- An installation of Memgraph DB: a native fully distributed in-memory graph database built to handle real-time use-cases at enterprise scale. Follow the Docker Installation instructions on the Quick Start page to get started.
- An installation of Memgraph Lab: an integrated development environment used to import data, develop, debug and profile database queries and visualize query results.
Importing Data from JSON Files
Memgraph doesn't come with the option of handling JSON out of the box. So what are your options if you need this feature in an upcoming project?
Well, there are actually two ways of importing such data:
- Independently from Memgraph,
- Using query modules in Memgraph.
The first option is a pretty straightforward hack. You just parse the needed JSON document and create the appropriate queries for populating your database. This way, Memgraph has no knowledge about the JSON file, you have to handle it completely by yourself and only run the finished queries with the data extracted from the JSON file.
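To make the first approach concrete, here is a minimal sketch of such a standalone converter. It parses a JSON document and emits Cypher `CREATE` statements as plain strings; the `:Person` label and the `json_to_cypher` helper name are assumptions made for illustration, and a real script would then send these queries to Memgraph through a Bolt client.

```python
import json


def json_to_cypher(json_text):
    """Turn a JSON document into Cypher CREATE statements.

    A minimal sketch: the :Person label is an arbitrary choice, and
    property values are serialized with json.dumps, so only
    JSON-compatible types are handled.
    """
    objects = json.loads(json_text)
    if isinstance(objects, dict):
        objects = [objects]
    queries = []
    for obj in objects:
        props = ", ".join(f"{key}: {json.dumps(value)}" for key, value in obj.items())
        queries.append(f"CREATE (:Person {{{props}}});")
    return queries


for query in json_to_cypher('[{"name": "Leslie"}, {"name": "Ron"}]'):
    print(query)
```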
The second option is a bit more elegant and is what you'll be learning in the rest of this tutorial.
Writing Custom Cypher Procedures
First things first. To start working on a new query module, you need to be familiar with the development process. If you are running Memgraph on anything other than Docker then continue with the next paragraph, otherwise, skip to the Developing Custom Query Modules using Docker section.
Upon startup, Memgraph will attempt to load the query modules from all `*.so` and `*.py` files it finds in the default directory (`/usr/lib/memgraph/query_modules`). If you want to change the directory in which Memgraph searches for query modules, just change the `--query-modules-directory` flag in the main configuration file (`/etc/memgraph/memgraph.conf`) or supply it as a command-line parameter (e.g. when using Docker), for example:

```
docker run -p 7687:7687 memgraph --query-modules-directory=/usr/lib/memgraph/new_query_modules
```
If you want to add a new query module, it needs to be placed in this directory. It will automatically load when Memgraph starts, but you can also reload it while the database is running by executing the following query:
```
CALL mg.load("QUERY_MODULE_NAME")
```
Developing Custom Query Modules Using Docker
When using Docker, you don't have direct access to the default query modules directory because it is inside the Docker container. To access the `/usr/lib/memgraph/query_modules` directory, create an empty directory `modules`, then create a volume bound to it by executing the following command:

```
docker volume create --driver local --opt type=none --opt device="$(pwd)/modules" --opt o=bind modules
```
Now, you can start Memgraph and mount the created volume:

```
docker run -it --rm -v modules:/usr/lib/memgraph/query_modules -p 7687:7687 memgraph
```

Everything from the directory `/usr/lib/memgraph/query_modules` will be visible and editable in your mounted `modules` volume and vice versa.
Implementing the JSON Utility Query Module in Python
You will name the query module `json_util.py` because it will contain utility functions that are needed to work with JSON files. For now, let's implement the following three procedures:
- Load JSON from a local file
- Load JSON from a remote address
- Export nodes as JSON document
1. Loading JSON from a Local File
In your `json_util.py` module add the following code:

```python
import json
import mgp
import urllib.request


@mgp.read_proc
def load_from_path(ctx: mgp.ProcCtx,
                   json_path: str) -> mgp.Record(objects=mgp.List[object]):
    with open(json_path) as json_file:
        objects = json.load(json_file)
        if type(objects) is dict:
            objects = [objects]
        return mgp.Record(objects=objects)
```
With this, you have implemented your first procedure. The `@mgp.read_proc` decorator registers the function as a read-only procedure of the current module. The `if` statement makes sure that the procedure returns a `list` even if it's just one element. This will be useful for working with the data later on.
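The list-wrapping behavior can be demonstrated outside Memgraph with plain Python; the `load_objects` helper below is a hypothetical name that mirrors only the core of `load_from_path`:

```python
import json


def load_objects(json_text):
    # Mirrors the core of load_from_path: always return a list,
    # even when the document's top-level value is a single object.
    objects = json.loads(json_text)
    if type(objects) is dict:
        objects = [objects]
    return objects


# A single object becomes a one-element list; a list passes through unchanged.
print(load_objects('{"name": "Leslie"}'))
print(load_objects('[{"name": "Ron"}, {"name": "Donna"}]'))
```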
How do you test this procedure? Let's create a file in the `/usr/lib/memgraph/query_modules` directory and name it `data.txt`. Place the following content in it:

```json
[{"name":"Leslie"}, {"name":"Ron"}, {"name":"Donna"}]
```
Start Memgraph Lab if you haven't done so already and run the following query:
```
CALL json_util.load_from_path("/usr/lib/memgraph/query_modules/data.txt")
YIELD *
RETURN *
```
2. Loading JSON from a Remote Address
While loading data from local files can be helpful, especially when developing your new procedure, there is a bigger need for a procedure that loads data from a remote location via URL. Thankfully, you only have to make a small adjustment to the `load_from_path()` function to achieve this functionality. Let's name this new procedure `load_from_url`:
```python
@mgp.read_proc
def load_from_url(ctx: mgp.ProcCtx,
                  json_path: str) -> mgp.Record(objects=mgp.List[object]):
    with urllib.request.urlopen(json_path) as url:
        objects = json.loads(url.read().decode())
        if type(objects) is dict:
            objects = [objects]
        return mgp.Record(objects=objects)
```
You can test it by running the following query:
```
CALL json_util.load_from_url('ADDRESS')
YIELD objects
UNWIND objects AS o
RETURN o.name
```
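If you don't have a public URL at hand, one way to test `load_from_url` is to serve a local JSON file with Python's built-in HTTP server. The sketch below is an assumption-laden helper (the `serve_directory` name and port 8000 are arbitrary choices); note that a Memgraph Docker container cannot reach `localhost` on the host directly, so you would typically point the procedure at the host's address (e.g. `host.docker.internal` on Docker Desktop).

```python
import http.server
import threading


def serve_directory(port=8000):
    # Serve the current working directory over HTTP in a background
    # thread, so JSON files placed here become reachable via
    # http://<host>:<port>/<filename>.
    handler = http.server.SimpleHTTPRequestHandler
    server = http.server.ThreadingHTTPServer(("", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the server running, `CALL json_util.load_from_url('http://<host-address>:8000/data.txt')` should return the same records as the local-file variant. Call `server.shutdown()` when you are done.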
3. Exporting Nodes as a JSON Document
This procedure will receive a list of nodes and save them in JSON format to a local file.
```python
@mgp.read_proc
def export_nodes(ctx: mgp.ProcCtx,
                 nodes: mgp.List[mgp.Vertex],
                 file_path: str
                 ) -> mgp.Record(success=bool):
    json_nodes_list = []
    for node in nodes:
        json_node = {}
        json_node['labels'] = []
        for label in node.labels:
            json_node['labels'].append(label.name)
        json_node['properties'] = dict(node.properties.items())
        json_nodes_list.append(json_node)
    with open(file_path, 'w') as fp:
        json.dump(json_nodes_list, fp)
    return mgp.Record(success=True)
```
You can test the procedure by running:
```
MATCH (n)
WITH COLLECT(n) AS listn
CALL json_util.export_nodes(listn, "/usr/lib/memgraph/query_modules/data.json")
YIELD success RETURN success
```
The file `data.json` should now be in the `/usr/lib/memgraph/query_modules` directory.
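As a sanity check on the export format, you can round-trip the structure that `export_nodes` writes (a list of objects with `labels` and `properties` keys) through `json.dump` and `json.load`. The node data below is made up for illustration:

```python
import json

# Build the same shape export_nodes produces for each node:
# a dict with a list of labels and a dict of properties.
nodes = [
    {"labels": ["Person"], "properties": {"name": "Leslie"}},
    {"labels": ["Person"], "properties": {"name": "Ron"}},
]

with open("exported.json", "w") as fp:
    json.dump(nodes, fp)

with open("exported.json") as fp:
    restored = json.load(fp)

# The restored document matches what was written.
assert restored == nodes
```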
Conclusion
In this tutorial, you learned how you can easily add additional functionalities to the Cypher query language by writing your own procedures. While importing data from JSON documents is considered more of a utility procedure, query modules can be a powerful tool for writing custom graph algorithms or implementing all kinds of constructs from the realm of graph theory.
If you are working on your own query module and would like to share it, take a look at the contributing guidelines. We would be more than happy to provide feedback and add the module to the MAGE repository.
For a more in-depth explanation of how to create your own custom Cypher procedures take a look at our documentation. If you would like more step-by-step tutorials exploring custom query modules, make sure to read our How to Write Custom Cypher Procedures with NetworkX and Memgraph tutorial.
Written by Memgraph