Harnessing the IP Address Data Type in Apache Druid (2025)

Everything you need to ingest, store, and analyze billions of IPv4 & IPv6 addresses at interactive speed.
Why a native IP data type?
Storing every address as a plain string inflates segment size, hurts bitmap selectivity, and forces costly runtime casts for subnet math.
The complex ipAddress (128‑bit) and ipPrefix (136‑bit) encodings introduced in Druid 30 collapse IPv4 and IPv6 into compact binaries, enabling index‑friendly comparisons, hash‑based GROUP BY, and true subnet pruning down to the segment level.
Enabling the feature
# common.runtime.properties
druid.extensions.loadList=["imply-utility-belt"] # ships the IP complex type
IPv6 support is still beta*—keep an eye on future releases before turning it on in production.*
Designing the ingestion spec
"dimensionsSpec": {
"dimensions": [
"page",
"language",
{ "type": "ipAddress", "name": "remote_address" }, // single host
{ "type": "ipPrefix", "name": "network_prefix" } // CIDR block
]
}
ipAddress stores 128‑bit hosts, while ipPrefix keeps the host bits plus an 8‑bit prefix length so range look‑ups remain constant‑time.
Example Ingestion Spec with Data
An example ingestion spec with inline data is shown below
{
"type": "index_parallel",
"spec": {
"dataSchema": {
"dataSource": "inline_data_ip_address",
"timestampSpec": {
"column": "timestamp",
"format": "iso",
"missingValue": null
},
"dimensionsSpec": {
"dimensions": [
{
"type": "string",
"name": "page",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "string",
"name": "language",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "ipAddress",
"name": "remote_address",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
},
{
"type": "ipPrefix",
"name": "network_prefix",
"multiValueHandling": "SORTED_ARRAY",
"createBitmapIndex": true
}
],
"dimensionExclusions": [
"__time",
"timestamp"
],
"includeAllDimensions": false,
"useSchemaDiscovery": false
},
"metricsSpec": [],
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "DAY",
"queryGranularity": {
"type": "none"
},
"rollup": false,
"intervals": []
},
"transformSpec": {
"filter": null,
"transforms": []
}
},
"ioConfig": {
"type": "index_parallel",
"inputSource": {
"type": "inline",
"data": "timestamp,page,language,remote_address,network_prefix\n2025-05-17T00:00:00,/index.html,en,bc89:60a9:23b8:c1e9:3924:56de:3eb1:3b90,bc89:60a9:23b8::/48\n2025-05-17T00:05:00,/index.html,jp,108.3.17.153,108.3.17.0/24\n2025-05-17T00:10:00,/,en,59.143.170.24,59.143.170.0/24\n2025-05-17T00:15:00,/checkout,jp,50.231.6.41,50.231.6.0/24\n2025-05-17T00:20:00,/checkout,de,150.218.29.172,150.218.29.0/24\n2025-05-17T00:25:00,/login,en,571a:a876:6c30:7511:b2b9:437a:28df:6ec4,571a:a876:6c30::/48\n2025-05-17T00:30:00,/login,es,195.116.89.238,195.116.89.0/24\n2025-05-17T00:35:00,/account,en,24.194.103.151,24.194.103.0/24\n2025-05-17T00:40:00,/account,fr,759c:de66:bacf:b3d0:b1f:9163:ce9f:f57f,759c:de66:bacf::/48\n2025-05-17T00:45:00,/checkout,en,4b0d:bb41:8d52:88f1:142c:3fe8:60e7:a113,4b0d:bb41:8d52::/48\n"
},
"inputFormat": {
"type": "csv",
"findColumnsFromHeader": true
},
"appendToExisting": false,
"dropExisting": false
},
"tuningConfig": {
"type": "index_parallel",
"maxRowsPerSegment": 5000000,
"appendableIndexSpec": {
"type": "onheap",
"preserveExistingMetrics": false
},
"maxRowsInMemory": 1000000,
"maxBytesInMemory": 0,
"skipBytesInMemoryOverheadCheck": false,
"maxTotalRows": null,
"numShards": null,
"splitHintSpec": null,
"partitionsSpec": {
"type": "dynamic",
"maxRowsPerSegment": 5000000,
"maxTotalRows": null
},
"indexSpec": {
"bitmap": {
"type": "roaring"
},
"dimensionCompression": "lz4",
"stringDictionaryEncoding": {
"type": "utf8"
},
"metricCompression": "lz4",
"longEncoding": "longs"
},
"indexSpecForIntermediatePersists": {
"bitmap": {
"type": "roaring"
},
"dimensionCompression": "lz4",
"stringDictionaryEncoding": {
"type": "utf8"
},
"metricCompression": "lz4",
"longEncoding": "longs"
},
"maxPendingPersists": 0,
"forceGuaranteedRollup": false,
"reportParseExceptions": false,
"pushTimeout": 0,
"segmentWriteOutMediumFactory": null,
"maxNumConcurrentSubTasks": 1,
"maxRetry": 3,
"taskStatusCheckPeriodMs": 1000,
"chatHandlerTimeout": "PT10S",
"chatHandlerNumRetries": 5,
"maxNumSegmentsToMerge": 100,
"totalNumMergeTasks": 10,
"logParseExceptions": false,
"maxParseExceptions": 2147483647,
"maxSavedParseExceptions": 0,
"maxColumnsToMerge": -1,
"awaitSegmentAvailabilityTimeoutMillis": 0,
"maxAllowedLockCount": -1,
"numPersistThreads": 1,
"partitionDimensions": []
}
},
"context": {
"forceTimeChunkLock": true,
"useLineageBasedSegmentAllocation": true
},
"dataSource": "inline_data_ip_address"
}
Quick sanity check
SELECT
IPV4_STRINGIFY(remote_address) AS ip,
IPV4_STRINGIFY(network_prefix) AS cidr
FROM traffic LIMIT 5
Querying with IP‑centric functions
Matching & searching
-- Is the client inside a zero‑trust subnet?
SELECT COUNT(*)
FROM traffic
WHERE IPV4_MATCH(remote_address, '108.3.0.0/16')
IPV4_MATCH
needs an exact host or CIDR on the right.
Migrating from STRING → IP
Backfill: Re‑ingest recent partitions with the new dimensionSpec.
Realtime streams: Add a transform when your parser emits the address:
{ "type":"expression", "name":"remote_address",
"expression":"IP_PARSE(remote_address_str)" }
Join strategy: For mixed historical segments, wrap filters in COALESCE(IP_PARSE(addr_str), addr_bin) so queries hit both encodings.
Once old segments age out, drop the legacy string column.
Performance tips & caveats
Tip | Why |
Use ipPrefix for common subnet filters | Bitmap indexes & Bloom filters can prune segments before the broker fan‑out. |
Keep prefix lengths ≤ 48 for IPv6 | Finer masks can explode the prefix space and hurt pruning. |
Disable dictionary encoding on bulk host columns (dimensionCompression: "none") | When cardinality ≈ row‑count, front‑load encoding cost is wasteful. |
Be mindful of JSON serialization | Result sets with ipAddress columns emit the binary as Base64; wrap with IPV4_STRINGIFY for dashboards. |
Feature maturity | IPv6 operations are marked beta—benchmark with real workloads before committing. |
Wrapping up
The native IP data types radically simplify network analytics in Druid:
Smaller segments → lower storage and faster scans.
True subnet intelligence → no more ad‑hoc regex parsing.
Full SQL ergonomics with first‑class GROUP BY and WHERE support.
Whether you’re protecting an edge, feeding a SIEM, or building click‑stream geo dashboards, upgrading to the ipAddress/ipPrefix complex types unlocks millisecond‑level insights at cloud scale.
Happy querying! 🐉✨
Subscribe to my newsletter
Read articles from Nikhil Rao directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Nikhil Rao
Nikhil Rao
Los Angeles