Adjusting Data Before Sending it to Kentik NMS

Leon adato
7 min read

In my ongoing exploration of Kentik NMS, I continue to peel back not only the layers of what the product can do but also the layers of information I quietly glossed over in my original example, hoping nobody noticed.

In this blog, I want to both admit to and correct one of the most glaring ones:

Metrics Explorer showing device temperature

If that is the temperature of one of your devices, you should seek immediate assistance. I don’t want to alarm you, but that’s six times hotter than the surface of the sun.

A very hot PC

In reality, the SNMP OID in question reports temperature in mC (millidegrees Celsius), so all we really need to do is divide by 1,000. But this opens the door to plenty of other situations where it’s not only nice but necessary to adjust metrics before sending them to Kentik NMS.
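Here’s the arithmetic as a quick sanity check, in plain Python (which shares Starlark’s integer math). The reading of 33,245 is a made-up example:

```python
# A made-up raw reading from the OID, in millidegrees Celsius (mC).
raw_mc = 33245

# Floored division by 1,000 converts mC to whole degrees Celsius.
degrees_c = raw_mc // 1000

print(degrees_c)  # 33 -- a far more believable CPU temperature
```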

Starlark for the easily distracted

Kentik comes with scripting capabilities courtesy of Starlark (formerly known as Skylark), a Python-like language created by Google.

That last sentence will either set your mind at ease or send you running for the door, and I’m honestly not sure how I feel about it myself.

But, back to the task at hand, Starlark will let you take the values that come in via an OID and then manipulate them.

A script block, which goes in the reports file, must define a function called process with two parameters: the record and the index set. It typically looks like this:

reports:
  /foo/bar/baz:
    script: !starlark |
      def process(n, indexes):
          # (do stuff here)

That’s really all you have to know for now.
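If it helps to see the shape of it outside of YAML: because Starlark is a Python dialect, you can model what a process function does in plain Python. The Field and Record classes below are stand-ins I made up to mimic how Ranger hands your script a record of named fields; they are not Kentik’s actual API:

```python
class Field:
    """Made-up stand-in for a single named field on a record."""
    def __init__(self, value):
        self.value = value

class Record:
    """Made-up stand-in for the record ('n') passed to process()."""
    def __init__(self, fields):
        self._fields = {name: Field(v) for name, v in fields.items()}
    def __getitem__(self, name):
        return self._fields[name]

def process(n, indexes):
    # "Do stuff here": mutate a field's value in place.
    n["CPUTemp"].value = n["CPUTemp"].value // 1000

record = Record({"CPUTemp": 33245})
process(record, indexes=None)
print(record["CPUTemp"].value)  # 33
```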

To review, this is our Metric

If you missed the original post and don’t feel like going back and reading it, here are the essentials:

  • Move to (or create if it doesn’t exist) the dedicated folder on the system where the Kentik agent (kagent) is running:

/opt/kentik/components/ranger/local/config

  • In that directory, create directories for /sources, /reports, and /profiles

  • Create three specific files:

  1. Under /sources, a file that lists the custom OID to be collected

  2. Under /reports, a file that associates the custom OID with the data category it will appear under within the Kentik portal

  3. Under /profiles, a file that describes a type of device (Using the SNMP System Object ID) and the report(s) to be associated with that device type

  • Make sure all of those directories (and the files beneath them) are owned by the Kentik user and group:

sudo chown -R kentik:kentik /opt/kentik/components/ranger/

sources/linux.yml

version: 1
metadata:
  name: local-linux
  kind: sources
sources:
  CPUTemp: !snmp
    value: 1.3.6.1.4.1.2021.13.16.2.1.3.1
    interval: 60s

reports/linux_temps_report.yml

version: 1
metadata:
  name: local-temp
  kind: reports
reports:
  /device/linux/temp:
    fields:
      CPUTemp: !snmp
        value: 1.3.6.1.4.1.2021.13.16.2.1.3.1
        metric: true
    interval: 60s

profiles/local-net-snmp.yml

version: 1
metadata:
  name: local-net-snmp
  kind: profile
profile:
  match:
    sysobjectid:
      - 1.3.6.1.4.1.8072.*
  reports:
    - local-temp
  include:
    - device_name_ip

As I showed earlier in this post, that gives you data that looks like this in Metrics Explorer:

Metrics Explorer showing device temperature

Notice that my temperature readings are up around the 33,000 mark? We gotta do something about that.

A chill PC

This is our metric on Starlark

First, we’ll do the simple math - dividing our output by 1000.

  • sources/linux.yml - stays the same

  • profiles/local-net-snmp.yml - stays the same

Our new reports/linux_temps_report.yml file becomes:

version: 1
metadata:
  name: local-temp
  kind: reports
reports:
  /device/linux/temp:
    script: !starlark |
      def process(n, indexes):
        n['CPUTemp'].value = n['CPUTemp'].value//1000
    fields:
      CPUTemp: !snmp
        value: 1.3.6.1.4.1.2021.13.16.2.1.3.1
        metric: true
    interval: 60s

Let’s take a moment to unpack the changes to this file:

  • Under the category /device/linux/temp, we declare a Starlark script

  • That script defines a process function (the | is YAML’s literal block indicator, which preserves the script’s line breaks) with two parameters:

    • n, the record containing the data

    • indexes, the index set for the record

  • The script re-assigns the CPUTemp value on the record, replacing it with the original value divided by 1,000

    • To dig into the guts of Starlark for a moment, the two slashes ("//") indicate “floored division,” which keeps just the integer portion of the result.
  • The YAML file then goes on to identify the record itself, pulling the value from the OID 1.3.6.1.4.1 (and so on).

Let me re-phrase, because the order the file presents things in is backward from what actually happens:

The script: block declares the process but doesn’t run it. It’s just setting the stage.

The fields: block is the part that identifies the data we’re pulling. Every time a machine returns temperature information (a record set), that process is run, replacing the original CPUTemp value with CPUTemp/1000.

The result is an entirely different set of temperature values:

Metrics Explorer showing revised temperature metrics

When you need a dessert topping AND a floor wax

New Shimmer meme

Sometimes, you need to do the math but also store (and display) the original value. In that case, you just need one small change:

version: 1
metadata:
  name: local-temp
  kind: reports
reports:
  /device/linux/temp:
    script: !starlark |
      def process(n, indexes):
        n.append('CPUTempC', n['CPUTemp'].value//1000, metric=True)
    fields:
      CPUTemp: !snmp
        value: 1.3.6.1.4.1.2021.13.16.2.1.3.1
        metric: true
    interval: 60s
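To make the difference concrete, here’s the same idea modeled in plain Python (again with made-up Field/Record stand-ins, not Kentik’s real API): append adds a new CPUTempC field alongside the original, instead of overwriting it.

```python
class Field:
    def __init__(self, value, metric=True):
        self.value = value
        self.metric = metric

class Record:
    """Made-up stand-in for the record passed to process()."""
    def __init__(self, fields):
        self._fields = {name: Field(v) for name, v in fields.items()}
    def __getitem__(self, name):
        return self._fields[name]
    def append(self, name, value, metric=True):
        # Add a new field rather than overwriting an existing one.
        self._fields[name] = Field(value, metric)

def process(n, indexes):
    n.append("CPUTempC", n["CPUTemp"].value // 1000, metric=True)

record = Record({"CPUTemp": 33245})
process(record, indexes=None)
print(record["CPUTemp"].value)   # 33245 -- the raw mC value survives
print(record["CPUTempC"].value)  # 33    -- the converted value rides along
```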

Updated CPU temperature metrics in Celsius

Making it more mathy!

To build on the previous example, this is what it would look like if you wanted to take that Celsius result and convert it to Fahrenheit:

version: 1
metadata:
  name: local-tempF
  kind: reports
reports:
  /device/linux/tempF:
    script: !starlark |
      def process(n, indexes):
        n.append('CPUTempF', n['CPUTemp'].value//1000*9/5+32, metric=True)
    fields:
      CPUTemp: !snmp
        value: 1.3.6.1.4.1.2021.13.16.2.1.3.1
        metric: true
    interval: 60s
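One detail worth calling out: the expression evaluates left to right, so it computes ((value // 1000) * 9) / 5 + 32, flooring to whole degrees Celsius first and then converting. In plain Python (which matches Starlark here, where / is real division), with a made-up reading:

```python
# A made-up raw reading: 33,000 mC, i.e. 33 degrees Celsius.
raw_mc = 33000

# Left-to-right evaluation: ((raw_mc // 1000) * 9) / 5 + 32
fahrenheit = raw_mc // 1000 * 9 / 5 + 32

print(fahrenheit)  # approximately 91.4 degrees Fahrenheit
```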

Updated CPU temperature metrics in Fahrenheit

Starlark for the un-wholesomely curious

There’s a lot more to say about (and explore with) Starlark, but I want to leave you with just a few tidbits for now:

  • Ranger will call the process function every time the report runs.

  • For table-based reports, the process function will be called once for each row.

  • Beyond simple math, scripts can also:

    • create new records

    • maintain state across calls to process

    • combine data from multiple table rows

  • Scripts can be included in the report (as shown in this blog), or referenced as an external file:

script: !external
  type: starlark
  file: test.star
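As a small taste of the “maintain state across calls to process” idea, here’s a hedged sketch in plain Python. Exactly how Ranger exposes persistent state is specific to its Starlark embedding, so the function signature is simplified and the readings are invented; the point is just that state outside the function survives between calls:

```python
# State defined outside the function persists between calls
# (illustrative only; the readings below are made up).
hottest = {"celsius": 0}

def track_hottest(temp_c):
    # Imagine this body living inside process(), called once per run/row.
    if temp_c > hottest["celsius"]:
        hottest["celsius"] = temp_c

for reading in (31, 35, 33, 34):
    track_hottest(reading)

print(hottest["celsius"])  # 35
```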

Building It Up

In my most recent blog on adding custom OIDs, I showed how to add a table of values instead of just a single item. The specific use case was providing temperatures for each of the CPUs in a system.

The YAML files to do that looked like this:

sources/linux.yml

version: 1
metadata:
  name: local-linux
  kind: sources
sources:
  CPUTemp: !snmp
    table: 1.3.6.1.4.1.2021.13.16.2
    interval: 60s

reports/temp.yml

version: 1
metadata:
  name: local-temp
  kind: reports
reports:
  /device/linux/temp:
    fields:
      name: !snmp
        table: 1.3.6.1.4.1.2021.13.16.2
        value: 1.3.6.1.4.1.2021.13.16.2.1.2
        metric: false
      CPUTemp: !snmp
        table: 1.3.6.1.4.1.2021.13.16.2
        value: 1.3.6.1.4.1.2021.13.16.2.1.3
        metric: true
    interval: 60s

profiles/local-net-snmp.yml

version: 1
metadata:
  name: local-net-snmp
  kind: profile
profile:
  match:
    sysobjectid:
      - 1.3.6.1.4.1.8072.*
  reports:
    - local-temp
  include:
    - device_name_ip

Incorporating what we’ve learned in this post, here are the changes. You’ll note that I’ve renamed a few things mostly to keep these new elements from conflicting with what we created before:

linux_multitemp.yml

version: 1
metadata:
  name: linux_multitemp
  kind: sources
sources:
  CPUTemp_Multi: !snmp
    table: 1.3.6.1.4.1.2021.13.16.2
    interval: 60s

This is effectively the same as the sources/linux.yml file I re-posted from the last post. But again, I renamed the file, the metadata name, and the source name to keep things a little separate from what we’ve done.

linux_multitempsc_reports.yml

version: 1
metadata:
  name: local-multitempC
  kind: reports
reports:
  /device/linux/multitempC:
    script: !starlark |
      def process(n, indexes):
        n['CPUTemp_Multi'].value = n['CPUTemp_Multi'].value//1000
    fields:
      CPUname: !snmp
        table: 1.3.6.1.4.1.2021.13.16.2
        value: 1.3.6.1.4.1.2021.13.16.2.1.2
        metric: false
      CPUTemp_Multi: !snmp
        table: 1.3.6.1.4.1.2021.13.16.2
        value: 1.3.6.1.4.1.2021.13.16.2.1.3
        metric: true
    interval: 60s
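Because this is a table-based report, the process function runs once per row, so each CPU’s reading gets the divide-by-1,000 treatment independently. Modeled in plain Python, with made-up rows standing in for the SNMP table walk:

```python
def process_row(row):
    # Each table row carries one CPU's reading in millidegrees Celsius.
    row["CPUTemp_Multi"] = row["CPUTemp_Multi"] // 1000
    return row

# Made-up rows standing in for the SNMP table walk.
rows = [
    {"CPUname": "Core 0", "CPUTemp_Multi": 33245},
    {"CPUname": "Core 1", "CPUTemp_Multi": 35100},
]

for row in rows:
    process_row(row)

print([row["CPUTemp_Multi"] for row in rows])  # [33, 35]
```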

The major change here is the addition of the script block. The other changes are simply renaming:

local_net_snmp.yml

version: 1
metadata:
  name: local-net-snmp
  kind: profile
profile:
  match:
    sysobjectid:
      - 1.3.6.1.4.1.8072.*
  reports:
    - local-temp
    - local-multitempC
  include:
    - device_name_ip

In this file, the only addition is local-multitempC in the reports section.

The result is a delightful blend of everything we’ve tested out so far. We have temperature values for each of the CPUs on a given system, and those values have been converted from millidegrees Celsius to Celsius.

Updated temperature metrics - millicelsius to Celsius

Why Summarize When We Both Know I’m Not Done?

This post, along with all those that have come before, again highlights the incredible flexibility and capability of Kentik NMS. But there are so many more things to show! How to ingest non-SNMP data, how to add totally new device types, and how to install the NMS in the first place.

Wait… THAT HASN’T BEEN COVERED YET?!?!

Oof. I’d better get started writing the next post.

