Building a simple Redis-like data store, "crowRedis", in Python
Why I made this, and what compelled me:
Two reasons. First:
I was tired of learning all that database theory, and there were lots of people who would yap about their great sage-like knowledge of databases. They would use fancy-sounding words and make it look like some hidden big-boy-club knowledge.
And being the ego-maniac I am, I decided to build most of the core functionality myself.
Second: I don't like it when people act like they know more than me and lecture me. You know the theory? Well, I know how to build it.
Challenges:
So I did not just want to store some key-value pairs in an in-memory hashmap and then say "yay, I built a database"; no one would take me seriously lol, I would look like a noob (I totally am).
Nah, I took a paper and wrote down the things I wanted to implement that are core to a Redis-like database:
Basic Operations like Get, Set, Delete
Persistence: Snapshot and AOF
Transaction Support: I wanted to support multiple concurrent transactions as a challenge.
ACID: I know I alone can't make it fully ACID compliant in 5 days, but the current version is quite atomic.
Concurrency: The current code supports concurrent operations, multiple concurrent transactions, and separate threads for most tasks.
TTL (Time to Live): I have implemented that too, but for some reason, when I try to get data that was not set with the TTL flag, I get errors, so I am working on fixing that.
PUB/SUB: Yeah, I implemented that too, so you can subscribe to a channel, and when you push some data to it, other subscribers on the same channel get those messages instantly. But currently, in my client, if I publish a message from one instance, it does not appear in another running instance of the same client subscribed to the same channel. Either it's an architectural issue in my sockets, or I need to implement a RabbitMQ solution for global message sharing.
How I started:
So the first thing I searched on Google was: "Signs you are stupid?"
Ok, what I actually searched was how databases and data stores work, and how Redis works, and then I read some 5-10 blogs and PDFs about databases and their functions.
One of them: "Write your own miniature Redis with Python".
I also read the actual paper by E. F. Codd, that famous IBM paper; I wanted to know the mindset of the people who tackled this problem for the first time and how they thought.
I also looked up how a request works. I know it sounds like basic, stupid stuff, but I am very stupid. It takes me a lot of time to understand even basic things and how they work, and I don't like complex language with big-sounding jargon; even for small things I need like 3 to 4 real examples with different circumstances and parameters to understand.
Slowly and steadily I understand them, but I like learning how things work; I have my own pace and way of dealing with things.
Until I know how something works from the core basics, my mind does not let me do anything.
lol, see the questions I asked ChatGPT:
GitHub:
https://github.com/biohacker0/crowRedis
Architecture
| Component | Description |
| --- | --- |
| Redis Client App(s) | Application(s) that interact with the Redis server over the network. |
| Redis Server (6381) | Redis server instance running on port 6381, written in Python. |
| Data Store (Key-Value) | In-memory storage for key-value pairs. |
| Snapshot (Persistence) | Periodically saves data to a snapshot file for durability. |
| Append-Only File (AOF) | Logs each command for recovery and replication. |
| Snapshot File (txt) | Holds a snapshot of the data for data recovery. |
| Transactions | Support for multi-command transactions using MULTI, EXEC, DISCARD, etc. |
| Transaction Handling | Functionality for handling transaction commands and operations. |
| List Operations (LPUSH, RPUSH) | Commands for adding elements to lists (left and right). |
| Data Retrieval (GET) | Command to retrieve values associated with keys. |
| Data Storage (SET, DEL) | Commands for storing and deleting key-value pairs. |
| Persistence (Snapshot, AOF) | Functions for data persistence, including snapshot and AOF. |
| Network | Communication via TCP/IP between client applications and the server. |
Socket Server
At the core of this crowRedis server is a socket server that listens for incoming connections. It accepts client connections and spawns a new thread to handle each client's requests concurrently. The server listens on a specified host and port.
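To make that concrete, here is a minimal sketch of what such an accept loop can look like in Python. It is not the exact code from the repo; the `serve` and `handle_client` names (and the dispatch step) are just my illustration:

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 6381  # crowRedis listens on port 6381

def handle_client(conn, addr):
    """Read commands from one client until it disconnects (illustrative handler)."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break  # client closed the connection
            command = data.decode().strip()
            # a dispatch(command) step would parse and run SET/GET/DEL/etc. here
            conn.sendall(b"OK\n")

def serve():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((HOST, PORT))
        server.listen()
        while True:
            conn, addr = server.accept()
            # one thread per client, so requests are handled concurrently
            threading.Thread(target=handle_client, args=(conn, addr), daemon=True).start()
```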
Data Store
The server maintains an in-memory data store, a Python dictionary, to store key-value pairs. This data store is the heart of crowRedis, and it supports operations like SET, GET, and DEL.
Persistence
The server supports two forms of data persistence: snapshots and an Append-Only File (AOF).
Snapshots
Periodically, the server creates snapshots of the data store by writing all key-value pairs to a snapshot file. This allows the server to recover its state in case of a crash. Snapshots are created at specified time intervals.
Append-Only File (AOF)
The Append-Only File records all write operations as commands. It ensures durability by replaying these commands in case of server crashes. This feature can be enabled or disabled as needed.
Transactions
Redis supports transactions, a sequence of commands executed as a single atomic operation. Our simplified Redis server implements basic transactional commands: MULTI, EXEC, and DISCARD.
Features
Key-Value Operations
The server supports the following key-value operations:
SET: Set a key to hold a string value.
GET: Get the value of a key.
DEL: Delete a key.
List Operations
Redis can handle lists, and our server supports the following list operations:
LPUSH: Insert values at the beginning of a list.
RPUSH: Insert values at the end of a list.
LPOP: Remove and return the first element of a list.
RPOP: Remove and return the last element of a list.
LRANGE: Get a range of elements from a list.
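Here is a rough sketch of how list commands like these can sit on top of the same dictionary-based store. The function names mirror the command names but are my own illustration, and the real handlers also log to the AOF:

```python
data_store = {}  # lists live in the same dict as plain string values

def handle_lpush(key, *values):
    lst = data_store.setdefault(key, [])
    for v in values:
        lst.insert(0, v)  # each new value goes to the head, like LPUSH
    return len(lst)

def handle_rpush(key, *values):
    lst = data_store.setdefault(key, [])
    lst.extend(values)  # append at the tail
    return len(lst)

def handle_lpop(key):
    lst = data_store.get(key)
    return lst.pop(0) if lst else "nil"

def handle_rpop(key):
    lst = data_store.get(key)
    return lst.pop() if lst else "nil"

def handle_lrange(key, start, stop):
    lst = data_store.get(key, [])
    # Redis-style inclusive stop; -1 means "up to the last element"
    return lst[start:] if stop == -1 else lst[start:stop + 1]
```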
Data Persistence
The server can save its data to a snapshot file and recover from it on startup. It also supports an Append-Only File (AOF) for command logging and recovery.
Transactions
The server implements a basic form of transactions. It allows clients to initiate a transaction, add multiple commands to it, and then either execute or discard the transaction as a whole.
Exploring the Code
Let's dive into the code and understand the functions responsible for these features.
Key-Value Operations
The `handle_set`, `handle_get`, and `handle_del` functions handle SET, GET, and DEL operations, respectively. They interact with the data store and log changes to the AOF.
Data Persistence
The `save_snapshot` function creates snapshots of the data store, while `load_snapshot` and `load_aof` load data from the snapshot and AOF files, respectively.
Transactions
Transaction-related functions include `handle_transaction`, `execute_transaction`, and the functions for list operations. These functions ensure that commands within a transaction are executed atomically.
crowRedis Functions and their workings:
We can divide this into, I guess, basic operations, persistence, and transactions.
1. Key-Value Operations (SET, GET, DEL)
handle_set(key, value)
Input: Accepts a `key` and a `value`.
Output: Responds with "OK" upon success.
Purpose: To store a key-value pair in the data store and log the action.
How it Works: This function takes a key and a value as input and stores them in the data store. It also records this action in the Append-Only File (AOF).
handle_get(key)
Input: Requires a `key`.
Output: Returns the associated value or "nil" if the key is not found.
Purpose: To retrieve the value associated with a key from the data store.
How it Works: This function takes a key as input and looks up the associated value in the data store, returning the value if found, or "nil" if the key is not present.
handle_del(key)
Input: Expects a `key`.
Output: Indicates success with "1", or "0" if the key is not found.
Purpose: To delete a key-value pair from the data store and log the action.
How it Works: This function takes a key as input, removes the corresponding key-value pair from the data store if it exists, and records the deletion in the AOF.
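Here is a minimal sketch of how those three handlers can look when written against a plain dictionary. The `data_store` name and the exact return values follow the descriptions above, but the real functions in the repo may differ slightly:

```python
data_store = {}  # the in-memory key-value store

def append_to_aof(command):
    # minimal stand-in; the fuller version is sketched in the persistence section below
    with open("appendonly.aof", "a") as f:
        f.write(command + "\n")

def handle_set(key, value):
    data_store[key] = value
    append_to_aof(f"SET {key} {value}")
    return "OK"

def handle_get(key):
    # "nil" mirrors the Redis convention for missing keys
    return data_store.get(key, "nil")

def handle_del(key):
    if key in data_store:
        del data_store[key]
        append_to_aof(f"DEL {key}")
        return "1"
    return "0"
```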
2. Data Persistence (SAVE, Snapshot, AOF)
handle_save()
Input: No specific input required.
Output: Confirms with "Data saved to snapshot file."
Purpose: To create a snapshot of the current data and save it for later recovery.
How it Works: This function generates a snapshot of the data and stores it in a snapshot file, ensuring that data is preserved.
save_snapshot()
Input: None; it simply captures the current data state.
Output: Quietly creates a snapshot for future reference.
Purpose: To create a snapshot of the current data state, suitable for recovery.
How it Works: This function iterates through the data and writes key-value pairs to a snapshot file, effectively creating a snapshot of the data.
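A tiny sketch of what a snapshot writer along these lines can look like. The file name, the one-pair-per-line format, and the timer-based periodic loop are my assumptions for illustration:

```python
import threading

SNAPSHOT_FILE = "snapshot.txt"  # assumed file name
data_store = {}                 # same in-memory store as in the earlier sketch

def save_snapshot():
    # write every key-value pair, one per line, so it can be re-read on startup
    with open(SNAPSHOT_FILE, "w") as f:
        for key, value in data_store.items():
            f.write(f"{key} {value}\n")

def snapshot_loop(interval_seconds=60):
    # take a snapshot now, then schedule the next one on a background timer
    save_snapshot()
    threading.Timer(interval_seconds, snapshot_loop, args=(interval_seconds,)).start()
```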
append_to_aof(command)
Input: Accepts a command to append to the AOF.
Output: Appends the command to the AOF file for future recovery.
Purpose: To log commands in the Append-Only File (AOF) for recovery purposes.
How it Works: This function takes a command as input and appends it to the AOF file, ensuring that all commands are recorded for future recovery.
recover_from_aof()
Input: Scans the AOF for stored commands.
Output: Restores data by executing the commands from the AOF.
Purpose: To recover data by reading and executing the commands stored in the AOF.
How it Works: This function reads the AOF file, parses the stored commands, and executes them to reconstruct the data state.
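Putting the two together, here is a rough sketch of an AOF logger and replayer for the SET/DEL commands described above. The file name and the space-separated command format are assumptions; the real parser handles more commands:

```python
AOF_FILE = "appendonly.aof"  # assumed file name
data_store = {}

def append_to_aof(command):
    # append the raw command text so it can be replayed later
    with open(AOF_FILE, "a") as f:
        f.write(command + "\n")

def recover_from_aof():
    # replay each logged command against the in-memory store
    try:
        with open(AOF_FILE) as f:
            for line in f:
                parts = line.strip().split(" ", 2)
                if parts[0] == "SET" and len(parts) == 3:
                    data_store[parts[1]] = parts[2]
                elif parts[0] == "DEL" and len(parts) == 2:
                    data_store.pop(parts[1], None)
    except FileNotFoundError:
        pass  # nothing to recover on a fresh start
```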
3. Transactions (MULTI, EXEC, DISCARD)
handle_transaction()
Input: Initiates a transaction with an empty command list.
Output: Prepares the transaction context for future commands.
Purpose: To start a transaction and prepare a context to collect commands.
How it Works: This function initializes a transaction context and begins collecting commands for execution.
Transaction Commands (e.g., LPUSH, RPUSH)
Input: Listens for various transactional commands.
Output: Adds received commands to the transaction context.
Purpose: To collect and store transactional commands in the context for later execution.
How it Works: This function listens for transactional commands and adds them to the list of commands to be executed within the transaction.
handle_transaction_execute()
Input: Executes the collected transaction commands.
Output: Applies the changes made by the transaction commands.
Purpose: To execute and apply changes made by the collected transaction commands.
How it Works: This function takes the collected transaction commands, executes them sequentially, and applies the changes to the data store.
handle_transaction_discard()
Input: Discards the collected transaction commands.
Output: Clears the transaction context, discarding all collected commands.
Purpose: To discard all collected transaction commands and return to a clean state.
How it Works: This function clears the transaction context, effectively discarding all previously collected commands.
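To tie the three transaction pieces together, here is a minimal sketch of the MULTI/EXEC/DISCARD flow. It keeps a single module-level queue for brevity and takes the command dispatcher as a parameter (`execute_command`); the real server keeps this state per client connection:

```python
# per-client transaction state: None means "not in a transaction"
transaction_queue = None

def handle_transaction():
    """MULTI: start collecting commands instead of running them."""
    global transaction_queue
    transaction_queue = []
    return "OK"

def queue_command(command):
    """Any command sent after MULTI gets queued for later execution."""
    transaction_queue.append(command)
    return "QUEUED"

def handle_transaction_execute(execute_command):
    """EXEC: run every queued command in order, then leave transaction mode."""
    global transaction_queue
    results = [execute_command(cmd) for cmd in transaction_queue]
    transaction_queue = None
    return results

def handle_transaction_discard():
    """DISCARD: throw away the queued commands without running them."""
    global transaction_queue
    transaction_queue = None
    return "OK"
```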
This was the explanation for the crowRedis.py file, but that is just the server; we need a client to interact with it. So where is that?
Great, now let's discuss our client code (client.py).
Client Code explanation:
Why crowRedis and its Client?
crowRedis, like actual Redis, is a versatile, in-memory data store that serves as a key-value database and a high-speed cache. To interact with crowRedis effectively, we need a client: a connector that enables communication between Python and the server. With a client, we can send commands to crowRedis and receive its responses, making our interaction with this database smooth and efficient.
Getting Set Up
Before we start coding, we need to make sure our environment is ready:
Python environment: Ensure Python is installed on your system.
Socket module: We'll be using the `socket` module to establish connections.
Understanding the RedisClient Class
The core of our Redis client is the `Client` class. Here's an overview of how it works:
Initializing the connection parameters: We set the host and port to connect to the Redis server.
Handling connections: Our client establishes connections when needed and closes them when we're done.
Sending and Receiving Commands
The client is responsible for sending commands and receiving responses. Here's a high-level view of how this process works:
Sending commands: We send crowRedis commands to the server. The client ensures they are correctly formatted before transmitting them.
Receiving responses: After sending a command, we listen for and decode responses from the crowRedis server, converting them to a readable format.
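Here is a cut-down sketch of what such a client class can look like. The class and method names are approximations, not copies of client.py:

```python
import socket

class Client:
    """Tiny crowRedis client: one TCP connection, text commands in, text replies out."""

    def __init__(self, host="127.0.0.1", port=6381):
        self.host, self.port = host, port
        self.sock = None

    def connect(self):
        self.sock = socket.create_connection((self.host, self.port))

    def send_command(self, command):
        if self.sock is None:
            self.connect()
        self.sock.sendall((command + "\n").encode())
        return self.sock.recv(1024).decode().strip()

    def close(self):
        if self.sock:
            self.sock.close()
            self.sock = None
```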
Transaction Support
Transactions allow us to bundle multiple crowRedis commands into a single unit of work. Here's how our crowRedis client supports this:
Introduction to transactions: Transactions play a crucial role in ensuring data consistency and reliability.
Using "MULTI": We initiate transactions with the "MULTI" command.
Running transactions: Transactions are executed with "EXEC", and we can cancel them with "DISCARD".
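Using the sketch client above (and assuming the server accepts simple space-separated text commands), a transaction from the caller's side might look like this:

```python
client = Client()
client.connect()

client.send_command("MULTI")               # start the transaction
client.send_command("SET crow awesome")    # queued, not executed yet
client.send_command("SET redis original")  # queued
print(client.send_command("EXEC"))         # run both commands as one unit

print(client.send_command("GET crow"))     # expected: "awesome"
client.close()
```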
Think of the `crowRedis.py` server as the main boss. It has all the functions to do things, but hey, the boss doesn't talk to normies like us. He's got big stuff to do. So he hired a client (client.py) to handle all the requests and talk with the user. We, the users, go to the client and say, "Ayo client, *looks around*, I need to set my stuff in your memory database, okay? Tell the big boss to do it."
The client goes to the boss (server) and hands him our request on a chit that says what we want to do (a query).
All this talk happens via socket protocols.
Benchmark Test (basic)
In this benchmark test, I compared `crowRedis`, `PostgreSQL`, and real `Redis` against each other on the same hardware. Please don't take this too seriously, because I am comparing a relational database with an in-memory database, but I wanted to see how much faster a RAM-based database is than a disk-based one.
Also, real Redis uses a much more complex mechanism for SET, GET, and DEL of data, and mine is way too simplistic; that's why its operations are so fast.
I am also still learning and might make some stupid comparisons, so please forgive me; I will learn what I don't know and improve.
PostgreSQL

| Metric | Value | Database |
| --- | --- | --- |
| INSERT | 0.1802 seconds | PostgreSQL |
| UPDATE | 1.6753 seconds | PostgreSQL |
| DELETE | 0.2250 seconds | PostgreSQL |
| TRANSACTIONS | 0.0680 seconds | PostgreSQL |
| Throughput | 1470.95 transactions per second | PostgreSQL |
| Average response time | 0.0007 seconds | PostgreSQL |
crowRedis
| Metric | Value | Database |
| --- | --- | --- |
| Total time taken | 0.021941661834716797 seconds | crowRedis |
| Throughput | 4557.54 transactions per second | crowRedis |
| Average response time | 0.0002 seconds | crowRedis |
| Benchmark SET | 1000 requests in 0.4349 seconds | crowRedis |
| Benchmark GET | 1000 requests in 0.0271 seconds | crowRedis |
| Benchmark DEL | 1000 requests in 0.0322 seconds | crowRedis |
Redis
| Metric | Value | Database |
| --- | --- | --- |
| Total time taken | 0.016948461532592773 seconds | Redis |
| Throughput | 5900.24 transactions per second | Redis |
| Average response time | 0.0002 seconds | Redis |
| Benchmark SET | 1000 requests in 0.0280 seconds | Redis |
| Benchmark GET | 1000 requests in 0.0320 seconds | Redis |
| Benchmark DEL | 1000 requests in 0.0315 seconds | Redis |
My Troubles:
So building a working database/datastore, even a simple one like mine, is not as easy as it looks in theory, I promise you. Any change you make affects everything; if you change GET and SET, then you need to make sure that transactions don't falter.
Currently, I am facing two issues:
1: TTL functionality: When I added TTL support and then tested setting some data with the TTL flag, it worked fine, and the subsequent GET also worked fine.
But when I then set normal data right after using a TTL operation, the GET fails for some reason.
I traced through the code on actual paper and I don't see any issue, but I will fix it.
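I obviously can't diagnose it from here, but one common cause of exactly this symptom is an expiry check that assumes every key has a TTL entry. A defensive lookup like this sketch (with a hypothetical `expirations` dict) avoids that failure mode:

```python
import time

data_store = {}
expirations = {}  # key -> absolute expiry timestamp, only for keys set with a TTL

def handle_get(key):
    expires_at = expirations.get(key)  # .get() instead of [] so non-TTL keys don't blow up
    if expires_at is not None and time.time() > expires_at:
        data_store.pop(key, None)
        expirations.pop(key, None)
        return "nil"
    return data_store.get(key, "nil")
```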
2: PUB/SUB: So I have my custom client, where if you subscribe to a channel, e.g. Channel-1, and then open another instance of my client in the terminal and subscribe to the same Channel-1,
then if you publish a message to Channel-1, it appears in the terminal of instance 1, but not in instance 2, and vice versa.
I could set my sockets up to handle inter-instance global message sharing, but ChatGPT says it's better to implement RabbitMQ for this. I am going to go with RabbitMQ, because those sockets also mess up my normal SET and GET: if I subscribe and publish some data, and right after that set some data normally, the current code treats it as a published message, so there is a lot of this mix-up.
I don't know if it's due to my client architecture or the main server architecture; every time I implement a new feature, an old thing gets messed up, and I have to re-think how everything should work with the current change in mind and make changes.
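For what it's worth, the socket-only alternative to RabbitMQ is usually a server-side registry of subscriber connections that the PUBLISH handler fans out to, so messages from any client instance reach every subscriber. A rough sketch of that idea (not crowRedis's current code):

```python
import threading

subscribers = {}  # channel name -> list of client sockets
subscribers_lock = threading.Lock()

def handle_subscribe(channel, conn):
    with subscribers_lock:
        subscribers.setdefault(channel, []).append(conn)
    return f"Subscribed to {channel}"

def handle_publish(channel, message):
    # push the message to every socket subscribed to this channel,
    # regardless of which client instance sent it
    with subscribers_lock:
        conns = list(subscribers.get(channel, []))
    for conn in conns:
        try:
            conn.sendall(f"[{channel}] {message}\n".encode())
        except OSError:
            pass  # ignore dead connections in this sketch
    return str(len(conns))
```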
But I like it. Through this, I learned more about how computers work: the RAM, the threads, the OS, and communication protocols;
why Redis is used, when to use it, and when to use a relational database instead; how these things work internally; and much more.
**I am not saying I have become some Jedi coder, but now I don't fear or get anxious when I hear words like concurrency, B-tree, rollback, ACID, sharding, partitions, and whatnot.**
Bye Bye :3