⚠️ How I Diagnosed and Fixed a Silent NumPy Upgrade That Broke My Databricks Pipelines

Background
Recently, one of our critical ADF production pipelines failed unexpectedly with the following error from the Databricks notebook activity:
"Could not reach driver of cluster 0619-120457-7gl1w8rb."
The notebook had been working fine earlier in the day. Upon investigation, I found that this started happening after the cluster auto-terminated and restarted, and suddenly many notebooks attached to the same cluster were failing on import statements related to pandas, pyarrow, and scikit-learn.
Symptom
Inside notebook logs and standard output, I saw this common error:
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.2.6 as it may crash.
...
ImportError: PyArrow >= 4.0.0 must be installed; however, it was not found.
At this point, it was clear there was a version mismatch between NumPy, PyArrow, and Pandas—breaking the runtime.
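A quick way to confirm this kind of mismatch, before touching any cluster settings, is a small diagnostic cell along the lines of the sketch below. This is only an illustration of the check, not the exact cell from the incident:

import importlib
import numpy as np

print("numpy:", np.__version__)

# Try the libraries that were failing; the NumPy 1.x/2.x ABI error surfaces here.
for name in ("pyarrow", "pandas", "sklearn"):
    try:
        mod = importlib.import_module(name)
        print(name, getattr(mod, "__version__", "unknown"), "import OK")
    except Exception as exc:
        print(name, "FAILED to import:", repr(exc))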
Investigation Strategy
Since Databricks installs all libraries listed in the cluster Library UI each time the cluster restarts, I suspected one of these libraries silently installed or upgraded NumPy. Here's how I investigated:
Cloned the cluster.
Opened a new notebook and ran:
%pip show numpy
Result: numpy 2.2.6 was installed!
Uninstalled each custom Python library one by one via the UI and re-tested until I found the culprit.
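The one-by-one approach works, but you can often narrow things down faster by snapshotting the versions of the usual suspects on the cloned cluster and comparing them with a healthy environment. A minimal sketch; the package list here is just an example:

from importlib.metadata import version, PackageNotFoundError

# Packages whose versions I wanted to compare against a known-good cluster.
suspects = ["numpy", "pandas", "pyarrow", "scikit-learn", "btyd"]

for pkg in suspects:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")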
Culprit: btyd
I found that adding the btyd library (without version pinning) was implicitly upgrading NumPy to version 2.2.6. This version is incompatible with most packages compiled for NumPy 1.x, including pyarrow, which is crucial for Spark DataFrame I/O in Databricks.
This led to:
Broken notebook sessions
Inability to import pandas or pyarrow
ADF pipeline failures
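To double-check that it really was btyd pulling NumPy forward, you can also read a package's declared requirements straight from its installed metadata instead of guessing. A minimal sketch; I'm assuming btyd ships a numpy requirement without an upper bound, so verify against the actual output:

from importlib.metadata import requires

# Print the dependency declarations recorded in btyd's package metadata.
for req in requires("btyd") or []:
    print(req)

# A requirement like "numpy" or "numpy>=1.20" with no upper bound lets pip
# resolve to NumPy 2.x whenever the library is installed on a fresh cluster.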
Solution
To resolve the issue without removing btyd (which is still needed), I took the following steps:
Identified numpy==1.26.0 as a stable release from the 1.x line, i.e. from before the NumPy 2.x break.
In a notebook:
%pip install numpy==1.26.0
Restarted the kernel:
dbutils.library.restartPython()
Verified version:
import numpy as np
print(np.__version__)  # 1.26.0
Finally, I added numpy==1.26.0 as a separate entry at the end of the cluster's library list via the Databricks Libraries UI.
This ensures that Databricks installs all other libraries first (including btyd), and then overrides NumPy 2.x with the compatible 1.26.0 version.
Post-Resolution Actions
I tested a few notebooks and confirmed that PyArrow, Pandas, and other dependent libraries worked (a sample smoke test is sketched after this list).
I checked scheduled ADF pipelines that ran during the error window and confirmed some of them failed with the driver error.
These need to be re-run manually now that the cluster is fixed.
We must monitor all downstream pipelines using the cluster to catch any lingering failures.
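For those notebook checks, a small smoke test that pushes data through the Arrow path is enough to prove the fix end to end. A minimal sketch for a Databricks notebook cell; the sample data is made up:

import numpy as np
import pandas as pd
import pyarrow as pa

print("numpy:", np.__version__, "| pandas:", pd.__version__, "| pyarrow:", pa.__version__)

# Round-trip a Spark DataFrame through pandas; toPandas() uses Arrow when
# Arrow optimization is enabled (the Databricks default), which is exactly
# the path the NumPy 2.x upgrade broke.
sdf = spark.createDataFrame(pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]}))
print(sdf.toPandas())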
Key Lessons
Always pin versions of libraries when working in shared production clusters (see the baseline sketch after this list).
Not all PyPI packages are responsible with dependency declarations—btyd is a perfect example.
Avoid installing packages like NumPy blindly, especially on top of environments with native bindings (like Spark and Arrow).
Clone the cluster before testing conflicting packages—this saved me from breaking other workloads.
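One cheap way to make the version-pinning lesson stick is to capture a pinned baseline of the cluster environment and diff it after every restart, so a silent upgrade shows up immediately. A minimal sketch; the DBFS path is just an example:

from importlib.metadata import distributions

# Snapshot every installed package as a pinned requirement so the list can be
# committed or diffed against the cluster after its next restart.
pins = sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())

baseline_path = "/dbfs/tmp/cluster_env_baseline.txt"  # example path
with open(baseline_path, "w") as f:
    f.write("\n".join(pins))

print(f"captured {len(pins)} pinned packages to {baseline_path}")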
Have you experienced similar dependency chaos in Databricks or any other managed environment? Share your experience in the comments!