PostgreSQL for a rusty MySQL brain

Kevin Naidoo

I come from a PHP background, hence my preference for MySQL, but when you're dealing with Django and other Python projects, PostgreSQL seems to be the most popular and preferred DB server.

I have used PostgreSQL now and then throughout my career, but it still feels weird to me with all the “\” commands. MySQL syntax is more English-like: “show tables; describe table …”, so it’s easy to remember.

Nonetheless, PostgreSQL is a powerful open-source DB server that’s more than capable for any project size. This guide is more of a “cheat sheet” for me (and hopefully for you too!).

⚠️ This article assumes you are using Linux, more specifically Ubuntu. Debian should also work, as should other Linux distributions with a few minor tweaks.

Translating MySQL commands into PostgreSQL

Connecting to a database:

\c db_name; # Postgres
use db_name; # MySQL

List databases:

\list # Postgres
show databases; # MySQL

Show tables:

SELECT tablename FROM pg_catalog.pg_tables;  # Postgres
SHOW tables; # MySQL

It’s worth noting that Postgres is going to show you every table it can see, including the system catalog tables (you can filter with a WHERE clause though), whereas MySQL will just show you the tables in the current database.
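For example, to list only your own tables (assuming they live in the default “public” schema):

SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname = 'public'; # Postgres
\dt # Postgres shortcut: list tables in your search path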

Also, not to be a PostgreSQL hater, but psql pipes long output through a pager, so you see a colon and have to keep pressing enter; MySQL, on the other hand, just lists everything.
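If the pager annoys you, you can switch it off for the current session:

\pset pager off # Postgres: print results directly instead of paging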

Describe a table:

\d tablename; # Postgres
describe tablename; # MySQL

Using the CLI:

mysql -usomeuser -p1234 somedb
psql -U postgres -W somedb

With PostgreSQL, there is a system user called “postgres” that authenticates via peer authentication, so you can also run the following to access the DB with no password prompt:

sudo su - postgres
psql somedb

# OR
sudo -u postgres psql somedb

Where art thou PostgreSQL?

The configs live under /etc/ on Linux, which is very similar to MySQL. There are two main config files to pay attention to:

Client authentication settings

This config controls who has access to your server:

/etc/postgresql/16/main/pg_hba.conf

# "local" is for Unix domain socket connections only
local   all             all                                     peer
host    all             all             192.168.0.1/32          scram-sha-256
host    all             all             10.0.0.1/32             scram-sha-256
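Changes to pg_hba.conf only take effect after a reload:

sudo systemctl reload postgresql

# OR from within psql:
SELECT pg_reload_conf();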

PostgreSQL works a tad differently than MySQL. In MySQL, you just create a user with an IP address and grant them access to whatever database you want, e.g.:

# MySQL grant remote user access
create user 'me'@'192.168.0.1' identified by '1234';

In PostgreSQL, on the other hand, you create the user without an IP and then grant them access by listing their IP in the config file mentioned above.
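So the PostgreSQL side looks something like this (a minimal sketch, reusing the example user and the “somedb” database from earlier):

CREATE USER me WITH PASSWORD '1234';
GRANT ALL PRIVILEGES ON DATABASE somedb TO me;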

⚠️ For both DBs, you still need to bind the server to 0.0.0.0 or some network IP; otherwise, they won’t allow external connections.

Main PostgreSQL config:

/etc/postgresql/16/main/postgresql.conf

This is the main config file where you can set the bind address, data-directory path, and various other settings such as memory settings, worker configs, row security, logging, etc…
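For example, the bind address mentioned earlier is set here; the values below are the stock defaults you would change to allow external connections:

listen_addresses = 'localhost' # change to '*' or a specific IP for external access
port = 5432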

While you can change this in the main config, the default PostgreSQL data directory is located in:

/var/lib/postgresql/
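If in doubt, you can confirm the active data directory from within psql:

SHOW data_directory;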

Backups

On MySQL, Percona Toolkit is my go-to toolset. You can back up, archive data, and do loads of other complex management tasks while the DB server is online and serving requests.

Seems like there is a version for PostgreSQL as well; I briefly read through it, but I haven’t used PostgreSQL comprehensively enough on a large DB to warrant using Percona.

For simple backups, you can simply run the following (similar to mysqldump):

pg_dump -U postgres -d dbname -f backup.sql
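Restoring is then just a matter of replaying the file with psql (assuming the target database already exists):

psql -U postgres -d dbname -f backup.sql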

This will create a SQL dump file. I tend to avoid SQL dump files like the plague, at least with MySQL anyway, because when you restore on a new server, subtle version differences can cause the restore to fail.

Furthermore, each statement has to be evaluated and run, so if you have a 500GB DB, for example, this will be slow even if you have fast disks.

A better option is to use a filesystem backup; these are much faster and less prone to breaking when restoring, since you’re copying an entire PostgreSQL instance’s data directory:

pg_basebackup -D backup -Fp
  • -D: the directory to store the backup in.

  • -F: format, either “p” for plain or “t” for tar. Plain is usually faster but takes up more disk space; the tar option creates a tarball archive, which can be slow depending on how good your disk IO is.
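Note that pg_basebackup connects like a normal client and needs replication privileges, so the simplest approach is to run it as the postgres user (the -P flag for progress reporting is optional):

sudo -u postgres pg_basebackup -D backup -Fp -P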

You can then use the following to restore (the server must be stopped while the files are copied back):

sudo systemctl stop postgresql
rsync -av --progress backup/ /var/lib/postgresql/16/main/
chown -R postgres:postgres /var/lib/postgresql/16/main
chmod 700 /var/lib/postgresql/16/main
sudo systemctl start postgresql

Timescale

The great thing about PostgreSQL is that it allows you to customize the DB via extensions. This opens up a world of possibilities, allowing the community to extend the core feature set in whatever way is needed.

One such impressive extension is Timescale; as the name suggests, it allows you to store and query large time-series datasets. When querying large tables in an RDBMS (hundreds of millions of rows), you will eventually notice queries becoming incredibly slow, even with indexes and optimizations. You would then need to resort to more expensive hardware, sharding, and replication to get around these issues.

Timescale, on the other hand, can query and write to such tables incredibly fast, and you probably won’t even need to shard or set up replication (you’ll be able to run on cheaper hardware too).

To install Timescale on Ubuntu for an existing PostgreSQL server, use the following script (or view the full step-by-step instructions on Timescale’s website):

echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -c -s) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/timescaledb.gpg

sudo apt update
sudo apt install timescaledb-2-postgresql-17 postgresql-client-17
sudo timescaledb-tune

sudo systemctl restart postgresql

sudo -u postgres psql
CREATE EXTENSION IF NOT EXISTS timescaledb;

Once you have Timescale installed, you can set up a “hypertable” as follows:

CREATE TABLE meter_readings (
   time        TIMESTAMPTZ       NOT NULL,
   warehouse   TEXT              NOT NULL,
   unit_code   TEXT              NOT NULL,
   reading     DOUBLE PRECISION  NULL
);


SELECT create_hypertable('meter_readings', by_range('time'));

So basically, 95% of it is the same SQL; the only difference is the “create_hypertable” function, which sets up all the Timescale optimizations on your table.
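From there, it behaves like a normal table. A quick sketch with hypothetical values, plus an aggregate using Timescale’s time_bucket function:

INSERT INTO meter_readings (time, warehouse, unit_code, reading)
VALUES (NOW(), 'warehouse-1', 'unit-7', 42.5);

-- average reading per hour, per warehouse
SELECT time_bucket('1 hour', time) AS hour, warehouse, AVG(reading)
FROM meter_readings
GROUP BY hour, warehouse
ORDER BY hour DESC;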

PGVector

Another great extension is “PGVector”. If you are building a RAG-powered chatbot or need to perform similarity searches, you will need to vectorize your data and store those vector embeddings somewhere. Traditional databases like PostgreSQL and MySQL are not designed for vector embeddings.

Normally, you would need to reach for a DB like Qdrant, which is built for that kind of storage.

Except! One problem: your project runs on PostgreSQL, and it’s such a pain to introduce a new service into your stack for a dataset that’s not going to exceed maybe 10–20k documents.

In such a case, PGVector, while not as fast as some of these vector-optimized storage solutions, is more than sufficient for your use case.

Since it’s just a PostgreSQL extension, installing it is a breeze:

git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install # may need sudo

Next, in the psql shell (CREATE EXTENSION only needs to be run once per database):

CREATE EXTENSION vector;

CREATE TABLE documents (
   id bigserial PRIMARY KEY,
   name VARCHAR(100) NOT NULL,
   embedding vector(1536)
);

I used “1536” because that’s the number of dimensions in the embeddings produced by OpenAI’s small text embedding model.

You can then insert and query data as follows:

INSERT INTO documents (name, embedding) VALUES ('doc1', '[1,2,3...]'), ('doc2', '[4,5,6....]');

SELECT * FROM documents ORDER BY embedding <=> '[3,1,2...]' LIMIT 5;

The above “select” performs a cosine-distance search (the <=> operator) and returns the 5 closest matches. It’s a bit of a schlep to convert those floating-point numbers into PostgreSQL’s format, but still far easier than maintaining another vector DB store.
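Once the table grows, you would probably also want an approximate index to keep those searches fast; a minimal sketch using pgvector’s HNSW index with the cosine operator class:

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);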
