Everything you need to ingest, store, and analyze billions of IPv4 & IPv6 addresses at interactive speed.

Why a native IP data type?

Storing every address as a plain string inflates segment size, hurts bitmap selectivity, and forces costly runtime casts for subnet math.

The complex ipAddress (128‑bit) and ipPrefix (136‑bit) encodings introduced in Druid 30 collapse IPv4 and IPv6 into compact binaries, enabling index‑friendly comparisons, hash‑based GROUP BY, and true subnet pruning down to the segment level.

Enabling the feature

# common.runtime.properties
druid.extensions.loadList=["imply-utility-belt"]   # ships the IP complex type

IPv6 support is still beta*—keep an eye on future releases before turning it on in production.*

Designing the ingestion spec

"dimensionsSpec": {
  "dimensions": [
    "page",
    "language",
    { "type": "ipAddress", "name": "remote_address" },  // single host
    { "type": "ipPrefix",  "name": "network_prefix" }   // CIDR block
  ]
}

ipAddress stores 128‑bit hosts, while ipPrefix keeps the host bits plus an 8‑bit prefix length so range look‑ups remain constant‑time.

Example Ingestion Spec with Data

An example ingestion spec with inline data is shown below

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "inline_data_ip_address",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso",
        "missingValue": null
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "type": "string",
            "name": "page",
            "multiValueHandling": "SORTED_ARRAY",
            "createBitmapIndex": true
          },
          {
            "type": "string",
            "name": "language",
            "multiValueHandling": "SORTED_ARRAY",
            "createBitmapIndex": true
          },
          {
            "type": "ipAddress",
            "name": "remote_address",
            "multiValueHandling": "SORTED_ARRAY",
            "createBitmapIndex": true
          },
          {
            "type": "ipPrefix",
            "name": "network_prefix",
            "multiValueHandling": "SORTED_ARRAY",
            "createBitmapIndex": true
          }
        ],
        "dimensionExclusions": [
          "__time",
          "timestamp"
        ],
        "includeAllDimensions": false,
        "useSchemaDiscovery": false
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": {
          "type": "none"
        },
        "rollup": false,
        "intervals": []
      },
      "transformSpec": {
        "filter": null,
        "transforms": []
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "inline",
        "data": "timestamp,page,language,remote_address,network_prefix\n2025-05-17T00:00:00,/index.html,en,bc89:60a9:23b8:c1e9:3924:56de:3eb1:3b90,bc89:60a9:23b8::/48\n2025-05-17T00:05:00,/index.html,jp,108.3.17.153,108.3.17.0/24\n2025-05-17T00:10:00,/,en,59.143.170.24,59.143.170.0/24\n2025-05-17T00:15:00,/checkout,jp,50.231.6.41,50.231.6.0/24\n2025-05-17T00:20:00,/checkout,de,150.218.29.172,150.218.29.0/24\n2025-05-17T00:25:00,/login,en,571a:a876:6c30:7511:b2b9:437a:28df:6ec4,571a:a876:6c30::/48\n2025-05-17T00:30:00,/login,es,195.116.89.238,195.116.89.0/24\n2025-05-17T00:35:00,/account,en,24.194.103.151,24.194.103.0/24\n2025-05-17T00:40:00,/account,fr,759c:de66:bacf:b3d0:b1f:9163:ce9f:f57f,759c:de66:bacf::/48\n2025-05-17T00:45:00,/checkout,en,4b0d:bb41:8d52:88f1:142c:3fe8:60e7:a113,4b0d:bb41:8d52::/48\n"
      },
      "inputFormat": {
        "type": "csv",
        "findColumnsFromHeader": true
      },
      "appendToExisting": false,
      "dropExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": 5000000,
      "appendableIndexSpec": {
        "type": "onheap",
        "preserveExistingMetrics": false
      },
      "maxRowsInMemory": 1000000,
      "maxBytesInMemory": 0,
      "skipBytesInMemoryOverheadCheck": false,
      "maxTotalRows": null,
      "numShards": null,
      "splitHintSpec": null,
      "partitionsSpec": {
        "type": "dynamic",
        "maxRowsPerSegment": 5000000,
        "maxTotalRows": null
      },
      "indexSpec": {
        "bitmap": {
          "type": "roaring"
        },
        "dimensionCompression": "lz4",
        "stringDictionaryEncoding": {
          "type": "utf8"
        },
        "metricCompression": "lz4",
        "longEncoding": "longs"
      },
      "indexSpecForIntermediatePersists": {
        "bitmap": {
          "type": "roaring"
        },
        "dimensionCompression": "lz4",
        "stringDictionaryEncoding": {
          "type": "utf8"
        },
        "metricCompression": "lz4",
        "longEncoding": "longs"
      },
      "maxPendingPersists": 0,
      "forceGuaranteedRollup": false,
      "reportParseExceptions": false,
      "pushTimeout": 0,
      "segmentWriteOutMediumFactory": null,
      "maxNumConcurrentSubTasks": 1,
      "maxRetry": 3,
      "taskStatusCheckPeriodMs": 1000,
      "chatHandlerTimeout": "PT10S",
      "chatHandlerNumRetries": 5,
      "maxNumSegmentsToMerge": 100,
      "totalNumMergeTasks": 10,
      "logParseExceptions": false,
      "maxParseExceptions": 2147483647,
      "maxSavedParseExceptions": 0,
      "maxColumnsToMerge": -1,
      "awaitSegmentAvailabilityTimeoutMillis": 0,
      "maxAllowedLockCount": -1,
      "numPersistThreads": 1,
      "partitionDimensions": []
    }
  },
  "context": {
    "forceTimeChunkLock": true,
    "useLineageBasedSegmentAllocation": true
  },
  "dataSource": "inline_data_ip_address"
}

Quick sanity check

SELECT 
  IPV4_STRINGIFY(remote_address) AS ip,
  IPV4_STRINGIFY(network_prefix) AS cidr
FROM traffic LIMIT 5

Querying with IP‑centric functions

Matching & searching

-- Is the client inside a zero‑trust subnet?
SELECT COUNT(*) 
FROM traffic 
WHERE IPV4_MATCH(remote_address, '108.3.0.0/16')

IPV4_MATCH needs an exact host or CIDR on the right.

Migrating from STRING → IP

Backfill: Re‑ingest recent partitions with the new dimensionSpec.
Realtime streams: Add a transform when your parser emits the address:

{ "type":"expression", "name":"remote_address",
  "expression":"IP_PARSE(remote_address_str)" }

Join strategy: For mixed historical segments, wrap filters in COALESCE(IP_PARSE(addr_str), addr_bin) so queries hit both encodings.
Once old segments age out, drop the legacy string column.

Performance tips & caveats

Tip	Why
Use ipPrefix for common subnet filters	Bitmap indexes & Bloom filters can prune segments before the broker fan‑out.
Keep prefix lengths ≤ 48 for IPv6	Finer masks can explode the prefix space and hurt pruning.
Disable dictionary encoding on bulk host columns (dimensionCompression: "none")	When cardinality ≈ row‑count, front‑load encoding cost is wasteful.
Be mindful of JSON serialization	Result sets with ipAddress columns emit the binary as Base64; wrap with `IPV4_STRINGIFY` for dashboards.
Feature maturity	IPv6 operations are marked beta—benchmark with real workloads before committing.

Wrapping up

The native IP data types radically simplify network analytics in Druid:

Smaller segments → lower storage and faster scans.
True subnet intelligence → no more ad‑hoc regex parsing.
Full SQL ergonomics with first‑class GROUP BY and WHERE support.

Whether you’re protecting an edge, feeding a SIEM, or building click‑stream geo dashboards, upgrading to the ipAddress/ipPrefix complex types unlocks millisecond‑level insights at cloud scale.

Happy querying! 🐉✨

Harnessing the IP Address  Data Type in Apache Druid (2025)

Table of contents

Why a native IP data type?

Enabling the feature

Designing the ingestion spec

Example Ingestion Spec with Data

Quick sanity check

Querying with IP‑centric functions

Matching & searching

Migrating from STRING → IP

Performance tips & caveats

Wrapping up

Subscribe to my newsletter

Nikhil Rao

Nikhil Rao

Harnessing the IP Address Data Type in Apache Druid (2025)

Table of contents

Why a native IP data type?

Enabling the feature

Designing the ingestion spec

Example Ingestion Spec with Data

Quick sanity check

Querying with IP‑centric functions

Matching & searching

Migrating from STRING → IP

Performance tips & caveats

Wrapping up

Subscribe to my newsletter

Nikhil Rao

Nikhil Rao

Harnessing the IP Address  Data Type in Apache Druid (2025)