Part 12: Keeping Watch

In Part 11, we successfully interacted with our newly built cloud via the Horizon dashboard, launching a VM and validating the core compute, storage, and networking functionality. Our cloud works! But how do we know if it's healthy? How do we track resource usage? How are sensitive secrets managed? This is where the OpenStack Telemetry stack and services like Barbican come in.

Why Monitor Your Cloud? 👀

Operating a cloud without visibility is like flying blind. Monitoring and telemetry are essential for:

  • Performance Tracking: Identify bottlenecks, understand resource utilization trends.

  • Capacity Planning: Forecast future hardware needs based on actual usage.

  • Troubleshooting: Quickly pinpoint issues by correlating events and metrics.

  • Billing/Showback: Track resource consumption per project or user.

  • Automation: Trigger actions (like scaling or alerts) based on specific metric thresholds.

The OpenStack Telemetry Stack: Ceilometer, Gnocchi & Aodh 📊

OpenStack provides a suite of services working together for telemetry:

  1. Ceilometer: The data collection service. It uses agents (ceilometer-agent) running on compute nodes (and potentially other places) to gather metrics (CPU usage, disk I/O, network traffic, etc.) and events. It typically pushes this data onto the RabbitMQ message bus.

  2. Gnocchi: A specialized time-series database. It efficiently stores and indexes the metric data collected by Ceilometer and exposes an API for querying it. In this deployment, Gnocchi uses our Ceph cluster as its storage backend.

  3. Aodh: The alarming service. It defines and evaluates rules against metrics stored in Gnocchi. When a rule's threshold is breached (e.g., CPU > 90% for 5 minutes), Aodh triggers defined actions, such as sending notifications or potentially invoking other OpenStack services like Heat for auto-scaling.
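
To make the "CPU > 90% for 5 minutes" example concrete, here is a minimal Python sketch of the idea behind a threshold alarm: raw samples are averaged into fixed-width buckets (roughly what a time-series store does at a given granularity), and the alarm fires only when every bucket in the evaluation window breaches the threshold. The names ThresholdRule, aggregate, and evaluate are hypothetical and chosen for illustration; this is not Aodh's or Gnocchi's actual implementation, only the three state names mirror Aodh's alarm states.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ThresholdRule:
    """Hypothetical stand-in for an alarm rule; not Aodh's real data model."""
    threshold: float          # e.g. 90.0 (% CPU)
    granularity: int          # seconds per aggregated bucket, e.g. 60
    evaluation_periods: int   # consecutive buckets that must breach, e.g. 5


def aggregate(samples, granularity):
    """Average raw (timestamp, value) samples into fixed-width time buckets."""
    buckets = {}
    for ts, value in samples:
        start = ts - (ts % granularity)
        buckets.setdefault(start, []).append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def evaluate(rule, samples):
    """Return 'alarm', 'ok', or 'insufficient data' for the latest window."""
    points = aggregate(samples, rule.granularity)
    window = points[-rule.evaluation_periods:]
    if len(window) < rule.evaluation_periods:
        return "insufficient data"
    if all(value > rule.threshold for _, value in window):
        return "alarm"
    return "ok"


if __name__ == "__main__":
    rule = ThresholdRule(threshold=90.0, granularity=60, evaluation_periods=5)
    # Two samples per minute for five minutes, all above 90% CPU.
    hot = [(t, 95.0) for t in range(0, 300, 30)]
    print(evaluate(rule, hot))
```

Requiring every bucket in the window to breach, rather than a single sample, is what keeps a short CPU spike from triggering a scaling action.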

graph TD
    subgraph "OpenStack Telemetry Flow"
        direction LR
        NovaCompute(Nova Compute Node);
        Agent(ceilometer-agent);
        MQ(RabbitMQ);
        %% Ceilometer API/Notification Listener
        CeiloAPI(ceilometer);
        Gnocchi(Gnocchi<br>Metrics DB);
        Aodh(Aodh<br>Alarming);
        Ceph(Ceph Cluster);
        User(User/Admin/API Client);

        NovaCompute -- Hosts --> Agent;
        Agent -- Collects Metrics --> MQ;
        MQ -- Publishes Metrics --> CeiloAPI;
        CeiloAPI -- Stores Metrics --> Gnocchi;
        Gnocchi -- Uses Storage --> Ceph;
        Aodh -- Queries Metrics & Defines Alarms --> Gnocchi;
        User -- Queries Metrics --> Gnocchi;
        User -- Manages Alarms --> Aodh;
    end

Deploying Telemetry with Juju ⚙️

Let's deploy these components using Juju, using the placement targets (--to) from our command list where they are available.

  1. Deploy Memcached (Gnocchi Dependency):

     juju deploy --to lxd:6 memcached memcached
    
  2. Deploy Gnocchi (Metrics DB):

     # Deploy Gnocchi API/Metricd services
     # '--config gnocchi.yaml' contains VIP etc.
     juju deploy --to lxd:5 --channel 2023.2/stable --config gnocchi.yaml gnocchi
     juju deploy --channel 8.0/stable mysql-router gnocchi-mysql-router
    
     # Integrate Gnocchi
     juju integrate gnocchi-mysql-router:db-router mysql-innodb-cluster:db-router
     juju integrate gnocchi-mysql-router:shared-db gnocchi:shared-db
     juju integrate gnocchi:identity-service keystone:identity-service
     juju integrate gnocchi:amqp rabbitmq-server:amqp
     juju integrate gnocchi:coordinator-memcached memcached:cache
     juju integrate gnocchi:certificates vault:certificates
    
     # Integrate with Ceph for metric storage backend
     juju integrate gnocchi:storage-ceph ceph-mon:client
    
  3. Deploy Aodh (Alarming):

     # Deploy Aodh API/Evaluator/Notifier services
     # '--config aodh.yaml' contains VIP etc.
     juju deploy --to lxd:7 --channel 2023.2/stable --config aodh.yaml aodh
     juju deploy --channel 8.0/stable mysql-router aodh-mysql-router
    
     # Integrate Aodh
     juju integrate aodh-mysql-router:db-router mysql-innodb-cluster:db-router
     juju integrate aodh-mysql-router:shared-db aodh:shared-db
     juju integrate aodh:identity-service keystone:identity-service
     juju integrate aodh:amqp rabbitmq-server:amqp
     # Aodh typically locates Gnocchi through the Keystone service catalog, so no direct relation is added here
    
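  4. Deploy Ceilometer (Central and Notification Agents):

     # Deploy the Ceilometer central and notification agents
     # (recent Ceilometer releases have no API of their own; metrics are queried through Gnocchi)
     # '--config ceilometer.yaml' contains VIP etc.
     juju deploy --to lxd:7 --channel 2023.2/stable --config ceilometer.yaml ceilometer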
    
     # Integrate Ceilometer
     juju integrate ceilometer:certificates vault:certificates
     juju integrate ceilometer:metric-service gnocchi:metric-service # Link to Gnocchi
     juju integrate ceilometer:amqp rabbitmq-server:amqp
     juju integrate ceilometer:identity-service keystone:identity-service
     juju integrate ceilometer:identity-notifications keystone:identity-notifications # Listen for events
     juju integrate ceilometer:identity-credentials keystone:identity-credentials # Allow auth
    
  5. Deploy Ceilometer Agent (Collectors): Runs alongside nova-compute.

     juju deploy ceilometer-agent --channel 2023.2/stable
    
     # Integrate Agent
     juju integrate ceilometer-agent nova-compute # Deploy agent to compute nodes
     juju integrate ceilometer-agent:amqp rabbitmq-server:amqp
    
     # Link the compute agents to the central Ceilometer service
     juju integrate ceilometer:ceilometer-service ceilometer-agent:ceilometer-service
    
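  6. Run the Ceilometer Upgrade Action: This initializes Ceilometer's resource types in Gnocchi. Run it once after the initial deployment and again after upgrades; until then, the ceilometer application typically reports a blocked status.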

     juju run ceilometer/leader ceilometer-upgrade
    

Watcher: Cloud Optimization Service (Briefly) 🤖

Watcher analyzes cloud usage (using data from Gnocchi) and provides recommendations or even triggers actions to optimize resource allocation (e.g., consolidating VMs onto fewer hosts during low load).
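
As a rough illustration of what "consolidating VMs onto fewer hosts" means, the sketch below packs VMs onto identical hosts with a first-fit decreasing heuristic. This is a generic bin-packing approach swapped in for illustration, not one of Watcher's actual strategies; the function name consolidate, the input shape, and the single vCPU dimension are all assumptions.

```python
def consolidate(vms, host_capacity):
    """Pack VMs onto as few hosts as possible using first-fit decreasing.

    vms: dict mapping VM name -> vCPUs used (hypothetical input shape).
    host_capacity: vCPUs available per host (hosts assumed identical).
    Returns a list of hosts, each a list of VM names.
    """
    hosts = []  # each entry is [free_capacity, [vm names]]
    # Place the largest VMs first; small ones then fill the remaining gaps.
    for name, size in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        if size > host_capacity:
            raise ValueError(f"{name} needs {size} vCPUs, host has {host_capacity}")
        for host in hosts:
            if host[0] >= size:
                host[0] -= size
                host[1].append(name)
                break
        else:
            # No existing host has room, so bring another host into use.
            hosts.append([host_capacity - size, [name]])
    return [names for _, names in hosts]


if __name__ == "__main__":
    placement = consolidate({"a": 4, "b": 3, "c": 2, "d": 1, "e": 2}, host_capacity=6)
    print(len(placement), "hosts needed:", placement)
```

A real optimizer also has to weigh memory, storage, and the cost of each live migration, which is why Watcher produces an action plan that can be reviewed before anything is moved.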

  1. Deployment: (This command is inferred, since Watcher was not in our command list; verify it against the charm documentation.)

     # Deploy Watcher API/Engine services
     # Assume deployment to LXD on machine 7, apply config from watcher.yaml
     juju deploy --to lxd:7 --channel 2023.2/stable --config watcher.yaml watcher
     # Deploy MySQL Router if needed by charm (check charm docs)
    
  2. Configuration: Ensure watcher.yaml specifies Gnocchi as the datasource and includes the desired planner weights (as set in config.yaml).

  3. Integrations: (These relations are inferred from typical charm layouts; confirm the endpoint names against the charm documentation.)

     # juju integrate watcher-mysql... # If needed
     juju integrate watcher:identity-service keystone:identity-service
     juju integrate watcher:amqp rabbitmq-server:amqp
     juju integrate watcher:compute-service nova-cloud-controller:compute-service # To interact with Nova
     # Gnocchi integration is often via config/service catalog lookup
    

Revisiting Barbican: Secure Secrets Management 🔐

We deployed Barbican back in Part 6, integrated with Vault via the barbican-vault subordinate charm. Let's recap its role:

  • Purpose: Barbican provides a secure API for storing and managing secrets like TLS keys/certificates, symmetric keys, and other sensitive data.

  • Importance: Services like Octavia (for TLS termination on load balancers), Cinder (for volume encryption keys), and potentially users/apps need a central, secure place for secrets.

  • Vault Backend: By integrating with Vault, we ensure Barbican's secrets are ultimately stored in our highly secure, initialized Vault instance, rather than just in the OpenStack database.

graph LR
    subgraph "Barbican with Vault Backend"
        APIClient(User / Service e.g., Octavia) -- Interacts via API --> Barbican(Barbican API);
        Barbican -- Stores/Retrieves Secrets via --> BVault(barbican-vault <br> subordinate);
        BVault -- Uses Vault API --> Vault(Vault);
        Barbican -- Uses DB for metadata --> MySQL(MySQL);
        Barbican -- Auth via --> Keystone(Keystone);
    end
    style Vault fill:#d4edda

Verification ✅

Check the status of the newly added telemetry and optimization services:

juju status gnocchi aodh ceilometer ceilometer-agent watcher barbican
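
If you would rather check this programmatically, the Python sketch below scans the JSON form of the status output for applications that are missing or not active. It assumes the layout produced by `juju status --format=json`, with an `applications` map whose entries carry `application-status.current`; treat that layout, and the helper name unhealthy_apps, as assumptions to verify against your Juju version.

```python
import json


def unhealthy_apps(status, wanted):
    """Return {app: state} for wanted apps that are missing or not 'active'."""
    apps = status.get("applications", {})
    problems = {}
    for name in wanted:
        if name not in apps:
            problems[name] = "missing"
            continue
        # Assumed layout: applications.<name>.application-status.current
        state = apps[name].get("application-status", {}).get("current", "unknown")
        if state != "active":
            problems[name] = state
    return problems


if __name__ == "__main__":
    # Hand-written sample standing in for real `juju status --format=json` output.
    sample = json.loads("""
    {"applications": {
        "gnocchi": {"application-status": {"current": "active"}},
        "ceilometer": {"application-status": {"current": "blocked"}}
    }}
    """)
    print(unhealthy_apps(sample, ["gnocchi", "ceilometer", "watcher"]))
```

To use it against a live model, save the JSON status output to a file and pass the parsed result to unhealthy_apps in place of the sample.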

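You can also check the OpenStack service catalog for the new endpoints. Run this from a host that has the OpenStack client installed and admin credentials loaded (in Juju 3.x, juju run only invokes charm actions, so it cannot run an arbitrary command on a unit):

openstack service list
# Look for 'metric' (Gnocchi), 'alarming' (Aodh), 'key-manager' (Barbican) etc.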

Metrics can take some time to populate, but you can try a basic Gnocchi check from a host with the OpenStack client, the Gnocchi client plugin, and admin credentials loaded:

openstack metric list

Conclusion 🔭

Our cloud is now equipped with a comprehensive telemetry stack (Ceilometer, Gnocchi, Aodh) providing vital monitoring and alarming capabilities, along with Watcher for potential optimization. We also revisited Barbican's role in securing secrets, backed by Vault. This significantly enhances the operational visibility and maturity of our platform.

With the infrastructure, core services, networking, and monitoring in place, what's next? In Part 13, we'll demonstrate the real payoff by deploying complex applications, like a Kubernetes cluster, onto our newly built OpenStack cloud using Juju.

Written by

Faiz Ahmed Farooqui
