Running LanceDB on OCI

Tom Moore

LanceDB is an open source vector database that can be used for multimodal data such as text and images. LanceDB comes in a serverless flavor that you can sign up for, or you can run it on your own with a number of back end storage options to suit your needs.

In this blog post I am going to cover how you can get LanceDB set up and running in your OCI account.

LanceDB has multiple options that you can use for your back end storage. For example, you can configure LanceDB to use object storage, or you could configure it to use a filesystem. There is an article here that covers how to select a storage option and consider the tradeoffs. Object storage is great for low cost, pretty much limitless storage without operational overhead. However, it has the downside of much lower performance and higher latency than a traditional filesystem.

Setting up an instance

The first thing I will do is spin up an OCI instance in my tenancy. I’m not going to cover this in the blog post as the process is fairly straightforward. In my case I am using Visual Studio Code’s remote connection feature to connect to my instance, as that option gives me a good text editing experience as well as a terminal connection.

Given that we are looking to interact with the vector database fairly heavily, I have created a decent-sized instance. I spun up an instance with 32 OCPUs and 512 GB of RAM as I intend to use this instance for some testing. You should feel free to scale your instance down as appropriate for your workload.

Once I have the Oracle Linux instance running, I need to install a few items to be ready to move forward.
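The later steps in this post rely on the OCI CLI, the AWS CLI, and the lancedb and pandas Python packages. As a minimal sketch, assuming the two CLIs are on your PATH as `oci` and `aws` (the helper name `check_prerequisites` is just illustrative), a short script can report which of these are still missing:

```python
import importlib.util
import shutil


def check_prerequisites(commands=("oci", "aws"), packages=("lancedb", "pandas")):
    """Return a dict mapping each required tool or package to True if it was found."""
    found = {}
    for command in commands:
        # shutil.which returns None when the executable is not on PATH
        found[command] = shutil.which(command) is not None
    for package in packages:
        # find_spec returns None when a top-level package is not installed
        found[package] = importlib.util.find_spec(package) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Anything reported as missing needs to be installed before the remaining steps will work.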

Setting up access to OCI Object Storage

The next thing you need to do is make sure that you can access OCI Object Storage via the AWS command line (CLI). This is required because LanceDB will talk to OCI via the S3 compatibility layer. My friend Tony Markel did an excellent job of explaining how to configure access to OCI Object Storage via S3 connections here.

Follow Tony’s instructions first. When you run the command to create a customer secret key, you will see output like this:

[opc@lancedb ~]$ oci iam customer-secret-key create --display-name display-name --user-id ocid1.user.oc1..aaaaaaaafhyayxojro2llkejgvsejdr7j754zjeesekh2stai7oau3k6yura
{
  "data": {
    "display-name": "display-name",
    "id": "08062f215XXXXXXXXXXXXXXX",
    "inactive-status": null,
    "key": "2yHCpMul0AnXXXXXXXXXXXXXXXXXX",
    "lifecycle-state": "ACTIVE",
    "time-created": "2025-08-14T19:31:36.588000+00:00",
    "time-expires": null,
    "user-id": "ocid1.user.oc1..aaaaaaaafhyayxojro2XXXXXXXXXXXXXXXXXXXXXXXXX"
  },

The value for “id” becomes your AWS Access Key ID, and the value for “key” becomes your AWS Secret Access Key. You should then run the command:

aws configure

When prompted, enter the values from above as your access key and secret key.
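To make the mapping explicit, here is a small sketch that reads the JSON printed by the OCI CLI and renders it in the INI layout that `aws configure` writes to ~/.aws/credentials. The function names and the sample values are placeholders of my own; the script only prints the result and does not touch your real credentials file.

```python
import configparser
import io
import json


def credentials_from_oci_output(cli_output: str) -> dict:
    """Map the customer secret key JSON onto the AWS credential names."""
    data = json.loads(cli_output)["data"]
    return {
        "aws_access_key_id": data["id"],       # "id" is used as the access key
        "aws_secret_access_key": data["key"],  # "key" is used as the secret key
    }


def render_credentials_file(credentials: dict, profile: str = "default") -> str:
    """Render the credentials in the INI layout of ~/.aws/credentials."""
    parser = configparser.ConfigParser()
    parser[profile] = credentials
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Placeholder values only; paste the real JSON printed by the OCI CLI.
sample_output = """
{
  "data": {
    "display-name": "display-name",
    "id": "EXAMPLE_ACCESS_KEY",
    "key": "EXAMPLE_SECRET_KEY"
  }
}
"""

creds = credentials_from_oci_output(sample_output)
print(render_credentials_file(creds))
```

Keep in mind that the secret key is only shown once when it is created, so store it somewhere safe.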

You should then be able to use the command:

aws --endpoint-url https://{Account Namespace}.compat.objectstorage.{Region}.oraclecloud.com s3 ls

This returns a list of the Object Storage buckets in your OCI account. Magic!
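Since the same endpoint is needed again later for LanceDB, it can help to build it in one place. The following is a rough sketch; the helper names are my own, and "mynamespace" and "us-ashburn-1" are placeholders for your own namespace and region. The `list_buckets` function simply shells out to the AWS CLI command shown above.

```python
import subprocess


def oci_s3_endpoint(namespace: str, region: str) -> str:
    """Build the S3 compatibility endpoint for an Object Storage namespace and region."""
    return f"https://{namespace}.compat.objectstorage.{region}.oraclecloud.com"


def list_buckets_command(namespace: str, region: str) -> list:
    """Return the AWS CLI argument list that lists buckets through the OCI endpoint."""
    return ["aws", "--endpoint-url", oci_s3_endpoint(namespace, region), "s3", "ls"]


def list_buckets(namespace: str, region: str) -> str:
    """Run the AWS CLI (must be installed and configured) and return its output."""
    result = subprocess.run(
        list_buckets_command(namespace, region),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# "mynamespace" and "us-ashburn-1" are placeholders for your own values.
print(oci_s3_endpoint("mynamespace", "us-ashburn-1"))
# prints https://mynamespace.compat.objectstorage.us-ashburn-1.oraclecloud.com
```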

Connecting Lance DB to OCI Object Storage

Interacting with LanceDB is then done through Python, or any other supported language. For my example I am loading a CSV file that has city data into a table. You can see the code below. This code clearly only works for datasets that are small enough to load into memory in one go. For larger data sets you will want to optimize the code, for example by loading the data in batches.

import csv

import lancedb
import pandas as pd

# Read worldcities.csv row by row and collect the data to add to LanceDB.
# csv.reader handles quoted fields, including values that contain commas.

data = []
with open("/home/opc/LanceDB-Test/worldcities.csv", "r", newline="", encoding="utf-8") as file:
    reader = csv.reader(file)
    # Skip the header line
    next(reader)
    # Read each row in the file
    for parts in reader:
        if len(parts) < 7:  # Need the first five columns plus the last two
            continue

        city = parts[0].strip()
        country = parts[4].strip()
        population = parts[-2].strip()  # Population is the second-to-last column
        latitude = parts[2].strip()
        longitude = parts[3].strip()

        if population == "" or latitude == "" or longitude == "":
            # Skip entries with missing population, latitude, or longitude
            print(f"Skipping entry due to missing data: {parts}")
            continue

        try:
            # I don't have code to calculate the vector at the moment,
            # so this code will just act as a temporary placeholder
            vector = [
                float(len(city)),
                float(len(country)),
                float(population),
                float(latitude),
                float(longitude),
            ]
            entry = {
                "vector": vector,
                "city": city,
                "country": country,
                "population": int(float(population)),
                "latitude": float(latitude),
                "longitude": float(longitude),
            }
            data.append(entry)
        except ValueError:
            # Handle the case where conversion fails
            print(f"Skipping entry due to conversion error: {parts}")
# Note: the Python storage_options keys are snake_case. Unrecognized keys are
# likely ignored, in which case the credentials saved by "aws configure" are used.
db = lancedb.connect(
    "s3://lancedb-bucket/",
    storage_options={
        "aws_access_key_id": "XXXXXXXXXXXXXXXXXXXXXXXX",  # Replace with your Oracle Access Key
        "aws_secret_access_key": "XXXXXXXXXXXXXXXXXXXXXXXX",  # Replace with your Oracle Secret Key
        "endpoint": "https://{Namespace}.compat.objectstorage.{Region}.oraclecloud.com",  # Replace with your Oracle Object Storage Endpoint
        "region": "{Region}",  # Specify your OCI region
    }
)

# check to see if a table exists in the database
if "sample_table" in db.table_names():
    # if it exists, drop the table so we can recreate it
    db.drop_table("sample_table")

# create a new table with the provided data
table = db.create_table("sample_table", data=data)
# create a full-text search (FTS) index on the city column
table.create_fts_index("city", replace=True)
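For data sets that do not fit in memory, one option is to feed the rows to LanceDB in batches rather than as one big list. This is only a sketch under my own assumptions: the helper names and the batch size of 10,000 are arbitrary, `db` is a connection returned by lancedb.connect(), and `rows` is any iterable (for example a generator) of dictionaries shaped like the entries built above.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(db, table_name, rows, batch_size=10_000):
    """Create table_name from the first batch, then append the remaining batches.

    Returns the table, or None if rows was empty.
    """
    table = None
    for batch in batched(rows, batch_size):
        if table is None:
            # mode="overwrite" replaces an existing table of the same name
            table = db.create_table(table_name, data=batch, mode="overwrite")
        else:
            table.add(batch)
    return table
```

Passing a generator that yields one dictionary per CSV row means only one batch is held in memory at a time. Each append is a separate write to object storage, so very small batches will be slow.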

Once you run your application, LanceDB will create a sample_table.lance folder in your Object Storage bucket that holds the table's data files and metadata.

Querying sample data

Once your data has been loaded, you can run a Python job to query the data.

tbl = db.open_table("sample_table")
data = tbl.search().limit(100000).select(["city", "country", "population", "latitude", "longitude"]).to_list()
df = pd.DataFrame(data)
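Pulling back up to 100,000 rows is fine for a quick check, but most queries will want to narrow things down. The sketch below shows three variations on the same `search()` call: a filter, a vector search, and a full-text search. The helper names are my own, `tbl` is the table opened above, and the full-text example assumes the FTS index on the city column was created successfully by the load script.

```python
COLUMNS = ["city", "country", "population", "latitude", "longitude"]


def placeholder_vector(city, country, population, latitude, longitude):
    """Build the same kind of placeholder vector that the load script stores."""
    return [
        float(len(city)),
        float(len(country)),
        float(population),
        float(latitude),
        float(longitude),
    ]


def large_cities(tbl, min_population=1_000_000, limit=10):
    """Filter rows with a SQL-style predicate instead of fetching everything."""
    predicate = f"population > {int(min_population)}"
    return tbl.search().where(predicate).limit(limit).select(COLUMNS).to_list()


def nearest_rows(tbl, query_vector, limit=5):
    """Vector search: return the rows whose vectors are closest to query_vector."""
    return tbl.search(query_vector).limit(limit).select(COLUMNS).to_list()


def cities_named(tbl, text, limit=10):
    """Full-text search against the FTS index on the city column."""
    return tbl.search(text, query_type="fts").limit(limit).select(COLUMNS).to_list()
```

For example, `nearest_rows(tbl, placeholder_vector("Tokyo", "Japan", 37732000, 35.6897, 139.6922))` looks for rows close to that placeholder vector. With a placeholder like this the distances are dominated by population, so the results are not meaningful until real embeddings are stored.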

Keep in mind that object storage gives you an easy way to get up and running. However, if your application has high performance requirements, you will want to use another storage option, such as Lustre.
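On the code side, switching storage options mostly comes down to the URI you pass to lancedb.connect(): a local or mounted filesystem path instead of an s3:// URI. Here is a minimal sketch, assuming a hypothetical mount point of /mnt/lustre/lancedb; the path and the helper names are illustrative only.

```python
import os


def choose_lancedb_uri(mount_path, bucket_uri="s3://lancedb-bucket/"):
    """Prefer a mounted filesystem path when it exists, else fall back to Object Storage."""
    if os.path.isdir(mount_path):
        return mount_path
    return bucket_uri


def connect(uri, storage_options=None):
    """Open a LanceDB connection; storage_options are only passed for s3:// URIs."""
    import lancedb  # imported lazily so choose_lancedb_uri works without lancedb installed

    if uri.startswith("s3://"):
        return lancedb.connect(uri, storage_options=storage_options)
    return lancedb.connect(uri)


# /mnt/lustre/lancedb is a hypothetical mount point for a shared file system.
print(choose_lancedb_uri("/mnt/lustre/lancedb"))
```

The rest of the code (creating tables, adding data, querying) stays the same whichever URI is used.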


Written by

Tom Moore

I am a Master Principal Cloud Architect for Oracle Cloud (OCI). I create content designed to help developers with Cloud-Based technologies. This includes AWS, OCI, and occasionally Azure. I'm not excluding GCP, I just haven't had any excuse to dive into GCP yet. As far as development is concerned, my area of focus is predominantly .NET, though these days, I do branch out occasionally into NodeJS and Python. I am available for in-person speaking events as well as virtual sessions. I can usually be found at Boston Code Camp. Opinions expressed on my blog are my own, and should not be considered to be the opinions of my employer.