Migrating from Turso (libsql) to self-hosted PostgreSQL 16 & connecting from a Cloudflare Worker (Nov 2024)


Steps I took
Rewrite the Drizzle schema from SQLite to PostgreSQL and generate the migration (*.sql) files.
Set up a connection using the Neon WebSocket proxy, then test the migration and the code.
Attempt various ways to copy the data and settle on COPY via CSV, using stdout and stdin.
Perform the complete migration during a quiet moment.
Why did I do this?
In the early days of my little project for citizens of the world, I made a mistake by optimizing prematurely. I had initially chosen Cloudflare’s D1 database but I quickly outgrew the limits and was forced to migrate to Turso.
I wrote about that journey in this post: https://lukasnotes.dk/migrating-large-d1-to-turso
After using the cloud option and the paid plan for a few months, I was still hitting the limits and seeing my bill grow even though there wasn't that much usage. Without going into details: due to its nature, my app produces a lot of writes, which causes a lot of changes in the WAL file, and that is how libsql tracks changes and syncs them to the local embedded replica. I do that to have zero-latency reads from Laravel.
I had built a simple sync script, should you find that relevant. https://github.com/flexchar/turso-sync
The main Laravel app runs on PostgreSQL, the god-level database, as some call it in the rise of AI. The split made me double my work in certain cases and limited the type of queries I could write, since data lived in two databases: Postgres and SQLite.
After some procrastination, last November I got myself together and began tackling the challenge.
Is Turso bad? Is it not good enough?
No, it is not bad at all. It’s all about picking the right tool for the job.
It is amazing what the guys behind Turso are doing with their SQLite fork, libsql. I had a chance to have a call with the very man behind it, Glauber Costa, as well as attend a few of their weekly meetings on Discord. They're going strong and I believe they will go far.
Not to mention their recent announcement of a complete SQLite rewrite in Rust, a project called Limbo (https://github.com/tursodatabase/limbo).
I think for standard blogs or simple CRUD apps, it's more than enough. Don't hesitate too much: "not how, but start now".
As for me, I had cloned my app from another project where it was all about latency, and I kept driving down that road without realizing it was the wrong path for me.
Making a plan
I had quite a challenge actually doing the "not how, but start now" as I wanted the perfect option. It's such a weakness of mine. The first problem was picking between:
A: create an entirely new database within the same pgsql container, or
B: run the worker's migrations in the same database that Laravel uses and runs its own migrations in.
Neither sounded too nice. Route A would prevent me from making joins across tables from both sources. Route B could potentially cause issues with migrations and backups, with the two sets colliding with each other.
Then came the thought of Route C: use the same database but different schemas. A schema is like a namespace to group tables. It's something Postgres offers that neither MySQL nor SQLite really has.
There's a nice explanation on Reddit: https://www.reddit.com/r/PostgreSQL/comments/qlqmyh/can_someone_help_me_understand_schemas_and_how/. I had never used schemas myself, but Laravel by default uses the public schema. I decided to create a drizzle schema to have a logical separation for the migrations and the data being written from the worker.
It felt fine to me, and in retrospect, I can still say it was a good decision.
Establishing a connection to self-hosted Postgres from Cloudflare Worker
Before copying the data, I updated my worker’s Drizzle config to connect to Postgres. It didn’t come without challenges.
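For context, the drizzle-kit side of that change looks roughly like the sketch below. The paths, the environment variable name and the exact options are assumptions, not a copy of my config.

// drizzle.config.ts - rough sketch of switching the kit config from the
// sqlite/Turso dialect to PostgreSQL. Paths and env variable names are assumptions.
import { defineConfig } from 'drizzle-kit';

export default defineConfig({
    dialect: 'postgresql',            // previously 'sqlite' with the Turso driver
    schema: './src/db/schema.ts',
    out: './drizzle',                 // generated *.sql migration files
    dbCredentials: {
        url: process.env.DATABASE_URL!,
    },
    schemaFilter: ['drizzle'],        // only manage tables inside the "drizzle" schema
});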
I care very much about security and privacy, and I would really like to live my life without ever leaking anyone's data. I also self-host the entire stack, as it's absurdly cheap on Hetzner. Even if someone were to SSH into the server, the database connection is not exposed: there are no open ports or TCP listeners. I use Docker and leverage UNIX sockets for ultimate performance. I absolutely love that!
Anyway, my ideal solution was to use a Cloudflare Tunnel to tap into the container and expose the database on an internal domain (protected by Cloudflare Zero Trust Access) to my worker, which also runs on Cloudflare.
The issue is that the worker's runtime and the postgres-js package did not support connecting directly. I could get some things to work locally but not in the cloud/production, or the other way around.
Fortunately, Neon Tech (the serverless & bottomless-storage Postgres startup) has a wonderful container to proxy requests from serverless environments to any Postgres instance. However, as I mentioned before, I prefer to use UNIX sockets. I got very lucky mixing bits of Go knowledge with Claude Sonnet and ended up with updated code that connects via a socket.
So now I had a Docker Compose stack with pgsql, neon-ws-proxy and cloudflare-tunnel, exposed under a private domain. After tinkering a bit with the Drizzle Neon client, I was able to connect. One catch: I have to close connections myself, as it would otherwise run into issues from time to time.
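The worker-side wiring ended up looking roughly like the sketch below. Treat it as a sketch under assumptions: the proxy path, the env handling and the createDb helper are illustrative, and the neonConfig knobs can differ between @neondatabase/serverless versions.

// Sketch: connecting Drizzle from a Worker to self-hosted Postgres through a
// Neon-style WebSocket proxy exposed on a private, Cloudflare-protected domain.
// The proxy path ("/v1") and the helper name are illustrative assumptions.
import { Pool, neonConfig } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-serverless';

// Send the driver's WebSocket traffic to the self-hosted proxy instead of Neon's cloud.
neonConfig.wsProxy = (host) => `${host}/v1`;
neonConfig.useSecureWebSocket = true; // TLS terminates at the tunnel/proxy
neonConfig.pipelineTLS = false;       // plain Postgres sits behind the proxy
neonConfig.pipelineConnect = false;

export function createDb(connectionString: string) {
    const pool = new Pool({ connectionString });
    const db = drizzle(pool);
    // The pool has to be ended explicitly once the request is done,
    // otherwise connections pile up (see the runtime adjustments below).
    return { db, close: () => pool.end() };
}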
Boom, it felt very nice. I was able to run migrations and got a list of tables in the drizzle schema, right inside my Laravel app's existing database.
Copying the data. Slow start, fast finish.
I have users active almost all day long across the globe. Even though I can afford a bit of data loss given the nature of the data in this table, I really want that to be as close to zero rows as possible. That meant migrating from production to my local PG first and checking that everything ran smoothly; once everything seemed and felt good, I would deploy my worker and start migrating ASAP, while new requests were written to the new database and some temporarily failed due to the not-yet-migrated data.
Via script to read and insert (failed route)
Initially I thought I would copy the data by writing a simple script in TS, using Drizzle to read from the Turso connection and insert into the PostgreSQL connection. In short, I wasted a good day and it was a pure failure. It was too slow.
It seems that, for whatever reason, Turso reads are slow, and when reading a table with bigger rows, I would quickly exceed the maximum 10MB response size. With a table of several million rows, I could only query a few hundred at a time. That went slowly. To put it into perspective, I had around 30GB of data and the migration could have taken an entire week. I still don't know why it was that slow, but this is also one of the reasons I wanted to migrate: sometimes abnormal things would occur with the libsql instance and the Cloudflare layer sitting in front of it.
Since I had been self-hosting libsql for around 4 months before the migration, I established a local connection by forwarding the port from the server to my laptop, then used the Turso CLI, turso db shell http://localhost:8080?token={authTokenHere}, to access the shell.
Via .dump and COPY (success)
I then searched for possible tools to automagically import SQL from SQLite into Postgres. While there are several options, none gave me the magic, and eventually I came across the COPY command and the CSV route.
I dumped the entire database to a datalake.db file using the Turso CLI.
After a bit of trial and error, I found that by dumping a single table to a CSV file, copying it into the pgsql container and then loading it via PG's COPY statement, I could import millions of rows in a matter of 1-3 minutes.
I had a few gotchas.
I had to make sure the table was completely empty before running the copy, or I would run into unique constraint violations and other errors.
The next gotcha was converting the datetime values from SQLite's milliseconds to a proper UTC string that PG could understand.
Finally, my source schema was built over time through a lot of incremental migrations adding new columns, whereas the rewritten PG schema had a single migration with a slightly different column order. I had to make sure the order of columns matched exactly between source and target.
Eventually I nailed the whole process down to this one-liner.
time sqlite3 /turso/datalake.db -header -csv "SELECT col_1,col_2,another_name,DATETIME(created_at/1000, 'unixepoch') as created_at FROM turso_table;" | PGPASSWORD=$DB_PASSWORD psql -U $DB_USERNAME -d $DB_DATABASE -c "\COPY drizzle.turso_table (my_col_1,my_col_2,my_another_col,created_at) FROM STDIN WITH (FORMAT csv, HEADER true)"
If you’re going to do the same, please read this carefully and play around to understand how it works. It gives far more value than a blind copy-paste.
Execution
That brings us to the final step. I had successfully COPY-ied the data into PostgreSQL. I had established a connection from Drizzle in the Worker, and I could also read the data using Laravel Eloquent models by updating the $table property to include the schema prefix.
I’m very happy!
I tweak my note file with copy-paste commands ready for each database, and I make sure I can access my production databases and containers.
I review the usage logs, pick the least busy hour (which happens to be around 7 AM CET) and execute.
It goes like this:
Dump the fresh production database completely into an sqlite file.
Create a fresh migration on production and deploy the worker.
Dump each table as CSV from the sqlite file.
COPY each CSV into its Postgres table, starting with the busiest tables and working down to the least busy.
Rinse & repeat until done.
Deploy the changes to the Laravel app.
Enjoy!
Runtime adjustments.
When using the Turso driver with Drizzle, I would often make a lot of calls inside executionContext.waitUntil(), which essentially awaits promises after the worker has sent the HTTP response.
With the PostgreSQL Neon proxy driver, I quickly began facing an ocean of Too Many Connections errors. I had to rewrite my endpoints to await the work before responding and then explicitly close the connection from the Neon driver.
Since lots of these endpoints are write-only (think analytics-type APIs), it was a rather low-impact change for end users. It did require a bit of a rewrite to get better access to the DB client. I'm using HonoJS, and its c.set('db', databaseClient) is a wonderful way to pass it around.
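The resulting pattern looks roughly like this sketch. The route, the binding names and the createDb helper from the earlier sketch are assumptions, not my exact code.

// Sketch: per-request DB client shared via Hono's context, work awaited before
// the response, and the pool closed explicitly to avoid "too many connections".
// Binding names, the route and ./db are illustrative assumptions.
import { Hono } from 'hono';
import { createDb } from './db'; // the helper sketched earlier

type Env = {
    Bindings: { DATABASE_URL: string };
    Variables: { db: ReturnType<typeof createDb>['db'] };
};

const app = new Hono<Env>();

app.use('*', async (c, next) => {
    const { db, close } = createDb(c.env.DATABASE_URL);
    c.set('db', db);
    try {
        await next();    // await the handler instead of deferring work via waitUntil()
    } finally {
        await close();   // explicitly end the pool once the handler has finished
    }
});

app.post('/track', async (c) => {
    const db = c.get('db');
    // ...await the insert here, before returning the response
    return c.json({ ok: true });
});

export default app;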
Final thoughts. Was it worth it?
Absolutely!
I wish I had done this from the get-go. Although back when I first started, some things weren’t available so I probably would have struggled more.
My greatest wins are:
No more timeouts or errors when SELECT'ing too much data. SQLite cannot combine multiple indices to speed up a query, whereas Postgres handles this much better. (Everyone says SQLite is the fastest at reads, so I wouldn't rule out a skill issue on my side.)
Much less code to maintain!! A very important win. One less database to back up (and restore, should that ever be needed).
I can write more complex and interesting queries. This is especially valuable for analytics. It has been an enormous pleasure.