ScyllaDB - Getting started
Recently I read the article below on how Discord migrated its messages cluster from Cassandra to ScyllaDB. The migration reportedly cut message latencies from 200 milliseconds to 5 milliseconds, which got me intrigued enough to explore ScyllaDB.
How Discord Migrated Trillions of Messages to ScyllaDB
Scylla is an open-source distributed NoSQL database that is compatible with Apache Cassandra but is designed for higher throughput and lower latencies. It is written in C++ and built to take advantage of modern hardware such as high-core-count CPUs and fast SSDs. Scylla is also designed to be scalable, fault-tolerant, and highly available.
In this blog post, we will look at the steps to use ScyllaDB, from installation to creating and querying data using the Cassandra Query Language (CQL), which Scylla also speaks.
Prerequisites:
Before getting started with ScyllaDB, ensure that you have the following prerequisites:
• A Linux machine running on the Ubuntu operating system
• JDK 11 or higher installed
• Maven installed
• A basic knowledge of Cassandra Query Language (CQL)
• A text editor of your choice
Steps:
- Install ScyllaDB:
To install ScyllaDB, we need to add the Scylla repository to our Ubuntu system. Then update the package list and finally run the command to install Scylla.
The following commands install ScyllaDB 4.4 on Ubuntu 20.04. Writing to /etc/apt and installing packages requires root privileges, hence the sudo.
$ sudo curl -o /etc/apt/sources.list.d/scylla.list \
https://repositories.scylladb.com/scylla/repo/\
scylladb-4.4-focal.list
$ sudo apt-get update
$ sudo apt-get install scylla
- Start ScyllaDB:
After installing ScyllaDB, we need to start the ScyllaDB service. To start the Scylla service, run the following command:
$ sudo systemctl start scylla-server
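The service can take a little while to begin accepting client connections after it starts. As a rough sketch, the helper below polls the CQL port until a TCP connection succeeds. It assumes the default CQL port 9042 and a node listening on 127.0.0.1; the function name wait_for_cql_port is made up for this illustration.

```python
import socket
import time


def wait_for_cql_port(host="127.0.0.1", port=9042, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect only shows that the port is open; it does
            # not prove that the node has finished starting up.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_cql_port() right after starting the service gives a simple yes/no answer before you try to open cqlsh or connect with a driver.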
- Create a keyspace:
To create a keyspace in Scylla, we can use the CQL command CREATE KEYSPACE. A keyspace is similar to a database in the relational world: it is a logical container for tables, and it also defines how their data is replicated.
CREATE KEYSPACE myKeyspace WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '1'
};
Here, we created a keyspace named "myKeyspace" with a replication factor of "1". The replication class "SimpleStrategy" is used here.
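One detail worth knowing: CQL treats unquoted identifiers as case-insensitive and stores them in lowercase, so the keyspace above ends up named mykeyspace. That is why the Python examples later in this post connect to 'mykeyspace'. The small function below only imitates that folding rule for illustration; fold_cql_identifier is a hypothetical helper, not part of any driver.

```python
def fold_cql_identifier(name: str) -> str:
    """Return the name CQL stores for an identifier as written in a statement."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted identifiers keep their case; a doubled quote is an escaped quote.
        return name[1:-1].replace('""', '"')
    # Unquoted identifiers are case-insensitive and stored in lowercase.
    return name.lower()


print(fold_cql_identifier("myKeyspace"))    # mykeyspace
print(fold_cql_identifier('"myKeyspace"'))  # myKeyspace
```

If you really want a mixed-case name, you have to double-quote it in every statement that refers to it.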
- Create a table:
To create a table, we can use the CQL command CREATE TABLE. A table is like a table in the relational world, which stores data.
CREATE TABLE myKeyspace.users (
user_id uuid PRIMARY KEY,
username text,
email text
);
Here we created a table named "users" with three columns: "user_id," which is the primary key of type UUID, "username," which is of type text, and "email," which is also of type text.
- Insert data:
To insert data into the table, we can use the CQL command INSERT INTO.
INSERT INTO myKeyspace.users
(user_id, username, email)
VALUES (now(), 'john', 'john@example.com');
Here, we inserted a row into the "users" table with a user_id generated by the CQL function now(), which returns a time-based UUID (a timeuuid).
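When inserting from application code, you can also generate the key on the client. The sketch below uses Python's standard uuid module to show the two common flavours: uuid1() is time-based, like the value now() produces, and uuid4() is random, like the value the CQL function uuid() produces. The params tuple is only an illustration of what you might pass to the INSERT above through a driver.

```python
import uuid

time_based = uuid.uuid1()  # version 1, the same kind of value now() produces
random_id = uuid.uuid4()   # version 4, the same kind of value uuid() produces

print(time_based.version)  # 1
print(random_id.version)   # 4

# Hypothetical parameter tuple for the INSERT shown above; the Python driver
# accepts uuid.UUID objects for a uuid column.
params = (random_id, "john", "john@example.com")
```

Generating the key on the client has the side benefit that you already know the user_id without running a SELECT afterwards.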
- Query data:
To query data from the table, we can use the CQL command SELECT.
SELECT * FROM myKeyspace.users;
This command returns all the rows present in the "users" table.
- Update data:
To update any data in the table, we can use the CQL command UPDATE.
UPDATE myKeyspace.users
SET username = 'peter'
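WHERE user_id = d7a57b06-28a7-4eb2-acad-f4fe3a529adf;
Here, we updated the username from "john" to "peter" for the row whose user_id is d7a57b06-28a7-4eb2-acad-f4fe3a529adf. Substitute the user_id returned by your own SELECT, since now() generates a different value on every insert.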
- Delete data:
To delete any data from the table, we can use the CQL command DELETE.
DELETE FROM myKeyspace.users
WHERE user_id = d7a57b06-28a7-4eb2-acad-f4fe3a529adf;
This command deletes the row where the user_id is d7a57b06-28a7-4eb2-acad-f4fe3a529adf.
Sample Code w/ Python Driver:
Now that we've covered the basics of ScyllaDB, let's take a look at some sample code using the Cassandra-compatible Python driver, which also works with ScyllaDB.
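from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
# Connect to the Scylla cluster (the credentials matter only if authentication is enabled)
auth_provider = PlainTextAuthProvider(username='myusername', password='mypassword')
cluster = Cluster(['127.0.0.1'], auth_provider=auth_provider)
session = cluster.connect('mykeyspace')
# Insert a row into the mytable table (assumed to already exist with columns id, name, age)
query = "INSERT INTO mytable (id, name, age) VALUES (%s, %s, %s)"
session.execute(query, (2, 'Bob', 30))
# Select rows from the mytable table; assuming age is not part of the
# primary key, the filter needs ALLOW FILTERING
query = "SELECT * FROM mytable WHERE age > %s ALLOW FILTERING"
rows = session.execute(query, (20,))
for row in rows:
    print(row.id, row.name, row.age)
This code connects to the Scylla cluster and inserts a row into the "mytable" table with an ID of 2, a name of "Bob", and an age of 30. It then selects all rows from the "mytable" table where the age is greater than 20 and prints out the results. The table is assumed to exist already with id as its primary key. Because age is then a regular column, the query has to end with ALLOW FILTERING, which makes the database scan and filter rows and can be slow on large tables.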
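For statements that run many times, the driver also supports prepared statements, which are parsed once by the server and then reused. The following is a minimal sketch rather than a definitive pattern: insert_people is a hypothetical helper, and it assumes the same mytable table with columns id, name, and age as the sample above. Note that prepared statements use ? placeholders instead of %s.

```python
def insert_people(session, people):
    """Insert (id, name, age) tuples using a single prepared statement."""
    # Prepared statements use ? placeholders rather than %s.
    prepared = session.prepare(
        "INSERT INTO mytable (id, name, age) VALUES (?, ?, ?)"
    )
    for person in people:
        # The statement is prepared once and executed once per row.
        session.execute(prepared, person)
    return len(people)
```

With the session from the sample above, a call would look like insert_people(session, [(3, 'Alice', 28), (4, 'Carol', 35)]).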
Creating a Table:
from cassandra.cluster import Cluster
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()
session.execute("""
CREATE KEYSPACE IF NOT EXISTS mykeyspace
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
""")
session.execute("""
CREATE TABLE IF NOT EXISTS mykeyspace.users (
user_id INT PRIMARY KEY,
first_name TEXT,
last_name TEXT,
email TEXT
)
""")
In this example, we first connect to the Scylla cluster using the Cluster object. We then create a new keyspace and table using CQL statements executed through the session object.
Inserting Data:
from cassandra.cluster import Cluster
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')
insert_query = """
INSERT INTO mykeyspace.users (user_id, first_name, last_name, email)
VALUES (%s, %s, %s, %s)
"""
session.execute(insert_query, (1, 'John', 'Doe', 'johndoe@example.com'))
session.execute(insert_query, (2, 'Jane', 'Doe', 'janedoe@example.com'))
In this example, we insert two rows into the "users" table. We use a parameterized query to pass in the values for the user_id, first_name, last_name, and email columns.
Querying Data:
from cassandra.cluster import Cluster
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')
select_query = """
SELECT * FROM mykeyspace.users WHERE user_id = %s
"""
result = session.execute(select_query, (1,))
for row in result:
    print(row.user_id, row.first_name, row.last_name, row.email)
In this example, we query the "users" table for the row with user_id = 1. We use a parameterized query to pass in the value for the user_id column, and then loop through the result set to print out the values for each column in the row.
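Since user_id is the primary key, this query returns at most one row, so a loop is not strictly needed. As a small sketch, the hypothetical helper get_user below uses the result set's one() method, which returns the first row or None, and converts the row to a dict. It assumes the driver's default row factory, which returns rows as named tuples.

```python
def get_user(session, user_id):
    """Return the matching row as a dict, or None when no row exists."""
    result = session.execute(
        "SELECT user_id, first_name, last_name, email "
        "FROM mykeyspace.users WHERE user_id = %s",
        (user_id,),
    )
    row = result.one()  # first row of the result set, or None if it is empty
    # Rows are named tuples under the driver's default row factory.
    return row._asdict() if row is not None else None
```

Returning None for a missing user lets the caller tell "not found" apart from an empty value without catching an exception.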
Updating Data:
from cassandra.cluster import Cluster
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')
update_query = """
UPDATE mykeyspace.users SET email = %s WHERE user_id = %s
"""
session.execute(update_query, ('johndoe_updated@example.com', 1))
In this example, we update the email address for the row with user_id = 1. We use a parameterized query to pass in the new email address and the value for the user_id column.
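One thing to keep in mind is that a CQL UPDATE behaves like an upsert: if no row with that user_id exists, the statement creates one instead of failing. If that is not what you want, a conditional update with IF EXISTS is one option. The sketch below is hedged on two assumptions: that your ScyllaDB version supports lightweight transactions, and that your driver version exposes was_applied on the result set. The helper name update_email_if_exists is made up for this example.

```python
def update_email_if_exists(session, user_id, new_email):
    """Update the email only when the row already exists; return True if applied."""
    result = session.execute(
        "UPDATE mykeyspace.users SET email = %s WHERE user_id = %s IF EXISTS",
        (new_email, user_id),
    )
    # was_applied reports the outcome of a conditional (lightweight transaction) write.
    return bool(result.was_applied)
```

Conditional writes cost more than plain ones because the replicas have to agree on the outcome, so they are best kept for cases where the check really matters.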
Deleting Data:
from cassandra.cluster import Cluster
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')
delete_query = """
DELETE FROM mykeyspace.users WHERE user_id = %s
"""
session.execute(delete_query, (1,))
In this example, we delete the row with user_id = 1 from the "users" table. We use a parameterized query to pass in the value for the user_id column.
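Each snippet above opens a new Cluster and never closes it, which is fine for a quick experiment but leaks connections in a longer-running program. One way to tidy this up, sketched below, is a small context manager that always calls the cluster's shutdown() method. The name open_session is hypothetical; the cluster argument is expected to be a Cluster object like the ones created above.

```python
from contextlib import contextmanager


@contextmanager
def open_session(cluster, keyspace=None):
    """Yield a session from the given cluster and always shut the cluster down."""
    try:
        session = cluster.connect(keyspace)
        yield session
    finally:
        # shutdown() closes the cluster's connections and its sessions.
        cluster.shutdown()
```

Usage would look like: with open_session(Cluster(['127.0.0.1']), 'mykeyspace') as session: followed by the session.execute(...) calls from the examples above. In a real application you would normally create one Cluster and one session at startup and reuse them, rather than reconnecting for every query.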
Conclusion
ScyllaDB is a fast, scalable, and fault-tolerant NoSQL database. In this blog post, we went through the steps to install and use ScyllaDB on Linux. We also looked at the basics of CQL commands to create, query, update and delete data from a table. ScyllaDB has a lot of features that we did not cover in this blog post, such as data modeling, high availability, and performance tuning. In the future, we will cover these topics in more detail.
Written by
Harsh Daiya
Sr. Data Engineer working mostly on Data and Observability problems. Writing mostly about Data and cloud, sometimes productivity and other musings.