Quick tip: Selecting from an Oracle table and creating a Parquet datafile

There are many sites on how to connect to an Oracle database with Python and using Pandas to create a parquet format datafile. This example is using the python-oracledb
library to establish the connection. Searches for Python libraries for Oracle database connections included cx_Oracle but this library was obsoleted in 2022 https://oracle.github.io/python-cx_Oracle/ This example required additional features provided in the Oracle Client libraries and is located in the lib_dir directory.
In this example, I am using Visual Studio Code with with Jupyter Notebook on my Mac. Python 3.12.2 with Anaconda is being used as the kernel for the Jupyter notebook functionality. I also set-up a virtual environment for the Python development.
When you're working with large datasets in enterprise environments, it’s often helpful to pull data from an Oracle database and convert it into a Parquet file for analysis, especially with Python and Pandas. Parquet is an efficient, compressed columnar format that plays nicely with big data tools like Spark, DuckDB, and Dask.
Here’s a quick way to do just that using Python.
# Oracle Data Frames Writing Apache Parquet Files
import getpass
import oracledb
import pyarrow
import pyarrow.parquet as pq
oracledb.init_oracle_client(lib_dir="/opt/oracle/instantclient_23_3")
# pw = getpass.getpass("Enter password: ")
connection = oracledb.connect(
user="demopython",
password="***",
dsn="localhost/freepdb1")
FILE_NAME = "sample.parquet"
# Tune the fetch batch size for your query
BATCH_SIZE = 10000
print("Successfully connected to Oracle Database")
# Get a PyArrow table from the query results
sql = "select * from dba_users"
pqwriter = None
for odf in connection.fetch_df_batches(statement=sql, size=BATCH_SIZE):
# Get a PyArrow table from the query results
pyarrow_table = pyarrow.Table.from_arrays(
arrays=odf.column_arrays(), names=odf.column_names()
)
if not pqwriter:
pqwriter = pq.ParquetWriter(FILE_NAME, pyarrow_table.schema)
pqwriter.write_table(pyarrow_table)
pqwriter.close()
Subscribe to my newsletter
Read articles from Christopher Youll directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
