Automated Solutions for Handling High CPU, Memory, and Disk Usage in Linux VMs

Pratul

MONITORING infrastructure becomes a challenge as the infrastructure domain grows rapidly. Machines need to stay up and running efficiently at all times, without manual intervention. Systems do get repaired and issues resolved, but as infra engineers we often miss the big picture and sacrifice a great deal of C.S.E → Cost, Service time & Efficiency.

This is where AUTOMATION comes into the picture, ensuring that C.S.E → Cost, Service time & Efficiency is planned and optimized for our daily workforce.

Taking inspiration from this, I explored and configured AUTOMATION scripts in a pipeline workflow, with everything kept under version control.


DISCLAIMER


My Lab

I will be using a shell script containing all the Linux commands needed to monitor and diagnose the Linux VM's parameters. After a DRY RUN with EXECUTE permission, I will deploy it through a CI/CD pipeline defined in YAML. All scripts and artifacts are stored in a remote GitHub repository under version control.

  1. First, I set up a Linux VM (Debian- or RHEL-based) on vSphere, Hyper-V, or a cloud platform (Azure, AWS, or GCP). For this lab, I deployed my VM in Azure.

  2. Set up the shell script file 'diagnose_usage.sh' with EXECUTE permission.

    (Create the file with touch or vi/vim → use chmod +x or chmod 700 to grant EXECUTE permission.)

    (Once the script is written → save and exit.)

```shell
#!/bin/bash

# diagnose_usage.sh
# Run this locally, remotely via SSH, or from a YAML pipeline

echo "=== High Usage Diagnostic Report ==="
echo "[+] Hostname: $(hostname)"
echo "[+] Time: $(date)"
echo ""

echo "[CPU Usage:]"
top -bn1 | grep "Cpu(s)"

echo "[Top CPU-Consuming Processes:]"
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head -10

echo "[Memory Usage:]"
free -m

echo "[Disk Usage:]"
df -hT | grep -v tmpfs

echo "[Inode Usage:]"
df -i

echo "[I/O Wait / Load Average:]"
uptime

echo "[Zombie Processes:]"
# STAT is column 8 of 'ps aux'; zombies show as Z, possibly with modifiers such as Z+
ps aux | awk '$8 ~ /^Z/ { print $0 }'

echo "[Dmesg Errors:]"
# Reading the kernel ring buffer usually needs root; -n makes sudo fail instead of prompting
sudo -n dmesg --level=err,warn 2>&1 | tail -20
```
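The script above only prints numbers; it does not decide whether they are "high". A minimal sketch of how it could flag problems is shown below, assuming GNU coreutils df (for --output) and procps free (for the "available" column). The 80%/90% limits and the function names check_threshold, disk_used_pct, mem_used_pct and run_checks are hypothetical, not part of the original script.

```shell
#!/bin/bash
# Hypothetical threshold checks that could be appended to diagnose_usage.sh.
# Limits and function names are illustrative assumptions.

DISK_LIMIT=80   # max % of the root filesystem in use
MEM_LIMIT=90    # max % of RAM in use

# Print an OK/WARN/SKIP line for one metric; return 1 if value exceeds limit.
check_threshold() {
    local name=$1 value=$2 limit=$3
    if ! [[ $value =~ ^[0-9]+$ ]]; then
        echo "[SKIP] $name: could not read a value"
        return 0
    fi
    if (( value > limit )); then
        echo "[WARN] $name at ${value}% (limit ${limit}%)"
        return 1
    fi
    echo "[OK]   $name at ${value}% (limit ${limit}%)"
}

# Percent used on the filesystem holding path $1 (GNU df --output).
disk_used_pct() {
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Percent of RAM in use: (total - available) / total, from 'free -m'.
mem_used_pct() {
    free -m | awk '/^Mem:/ { printf "%d", ($2 - $7) * 100 / $2 }'
}

# Run every check; exit status is non-zero if any metric is over its limit.
run_checks() {
    local status=0
    echo "[Threshold Checks:]"
    check_threshold "Disk /" "$(disk_used_pct /)" "$DISK_LIMIT" || status=1
    check_threshold "Memory" "$(mem_used_pct)" "$MEM_LIMIT" || status=1
    return "$status"
}
```

If run_checks is the last command in the script, the script's exit status reflects the checks. Because ssh returns the exit status of the last remote command, a pipeline step that ends with running the script would then fail whenever a limit is crossed, which turns the report into a simple alert.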
  3. (Dry run) Execute the script locally with ./diagnose_usage.sh.

    OUTPUT

  4. Next, I created a local repository containing the script and pushed it to my remote GitHub account.

    Note: Alternative repository hosts such as Azure Repos work as well.

  5. I used local Git for version control, with branches such as main, release/*, and hotfix.

  6. Once GitHub was set up, I applied branch policies to the default branches, leaving the hotfix branches used for debugging outside the policy. Changes were committed through pull requests.

  7. Finally, I created a new YAML pipeline named "Az-Ubuntu-YAML script.yml" in Azure DevOps and linked it to the GitHub repository containing the shell script 'diagnose_usage.sh'.

    Note: Other CI/CD tools such as Jenkins or GitHub Actions can be used instead.

```yaml
trigger:
  branches:
    include:
      - release/hotfix
    exclude:
      - main

stages:
- stage: DiagnoseHighUsage
  displayName: 'Diagnose High Resource Usage on VM'
  jobs:
  - job: connectToVM
    displayName: 'SSH to VM using key and run diagnostics'
    pool:
      vmImage: 'ubuntu-latest'

    steps:
    - task: DownloadSecureFile@1
      displayName: 'Download SSH key from Azure DevOps Secure Files'
      name: Ubuntukey           ## Uploaded to Azure DevOps -> Pipelines -> Library -> Secure files
      inputs:
        secureFile: AzUbuntu-key.pem

    - script: |
        chmod 600 $(Ubuntukey.secureFilePath)
        ssh -o StrictHostKeyChecking=no -i $(Ubuntukey.secureFilePath) -o GSSAPIAuthentication=no pratul@172.191.228.28 << 'EOF'
        echo ">>> Connection successful"

        echo ">>> Updating & Upgrading of Ubuntu VM"
        sudo apt update -y && sudo apt upgrade -y

        echo ">>> Downloading diagnose_usage.sh from GitHub..."
        curl -o ~/diagnose_usage.sh "https://raw.githubusercontent.com/prak96/motadata-assignment/release/hotfix/Linux%20IOSTAT%20automate/diagnose_usage.sh"

        echo ">>> Setting script permission..."
        chmod +x ~/diagnose_usage.sh

        echo ">>> Running diagnose_usage.sh..."
        bash ~/diagnose_usage.sh

        EOF
      displayName: "Secure SSH connection to UbuntuVM & executed the SCRIPT"
```
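The local Git and branching steps described earlier might look roughly like this on the command line. This is a sketch under assumptions: the GitHub repository already exists, the remote URL is a placeholder, and the function names init_script_repo and push_script_repo are hypothetical. Branch policies and pull requests are configured in the GitHub web UI and are not shown. The branch name release/hotfix matches the include filter in the pipeline trigger above, so a push to it would start a run.

```shell
#!/bin/bash
# Sketch of the local Git flow for the diagnostic script.
# Function names and the remote URL are hypothetical placeholders.

# Create a repo whose default branch is main, commit the script,
# and create release/hotfix (the branch the pipeline trigger watches).
init_script_repo() {
    local repo_dir=$1 script_path=$2
    git init -q "$repo_dir" || return 1
    # Point HEAD at main before the first commit (works on older Git too).
    git -C "$repo_dir" symbolic-ref HEAD refs/heads/main
    cp "$script_path" "$repo_dir/" || return 1
    git -C "$repo_dir" add "$(basename "$script_path")"
    git -C "$repo_dir" commit -q -m "Add diagnostic script" || return 1
    git -C "$repo_dir" branch release/hotfix
}

# Publish both branches to GitHub; needs network access and credentials.
push_script_repo() {
    local repo_dir=$1 remote_url=$2
    git -C "$repo_dir" remote add origin "$remote_url"
    git -C "$repo_dir" push -u origin main release/hotfix
}

# Example (placeholder URL):
# init_script_repo ~/diag-repo ./diagnose_usage.sh
# push_script_repo ~/diag-repo "https://github.com/<user>/<repo>.git"
```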

CHALLENGES #1

Azure DevOps service connection to Azure failed!

My first approach was to integrate the Azure subscription by adding the task below to the stage DiagnoseHighUsage:

- task: AzureCLI@2

Because of an issue with my Azure subscription, the service connection failed with an error.

Alternative:

Therefore, I switched to an SSH connection using the public IP address attached to the Linux VM created in Azure.

Note: For extra security and unattended script automation, I relied on SSH key authentication with an RSA-4096 public/private key pair generated with ssh-keygen.
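A key pair like the one described in the note could be generated with ssh-keygen as sketched below. The key path, the comment string and the function name generate_pipeline_key are hypothetical. The private key is what gets uploaded to Azure DevOps as a secure file (AzUbuntu-key.pem in the pipeline), and the .pub half goes into ~/.ssh/authorized_keys on the VM, for example via ssh-copy-id. An empty passphrase is assumed so the pipeline can use the key unattended, which means the private key file itself must be protected.

```shell
#!/bin/bash
# Generate an RSA 4096-bit key pair for unattended pipeline SSH logins.
# key_path is a hypothetical location, e.g. ~/.ssh/AzUbuntu-key
generate_pipeline_key() {
    local key_path=$1
    # -t rsa -b 4096 : RSA key with a 4096-bit modulus
    # -N ""          : empty passphrase so no prompt blocks the pipeline
    # -C             : comment stored in the public key
    # -q             : quiet (no randomart output)
    ssh-keygen -q -t rsa -b 4096 -N "" -C "azure-devops-pipeline" -f "$key_path" || return 1
    # ssh refuses private keys that other users can read
    chmod 600 "$key_path"
}

# Usage (placeholders):
# generate_pipeline_key ~/.ssh/AzUbuntu-key
# ssh-copy-id -i ~/.ssh/AzUbuntu-key.pub pratul@<vm-public-ip>
```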

CHALLENGES #2

Pipeline stuck after VM connects!

Diagnose:

On inspection, the heredoc marker << 'EOF' was missing from the ssh line. Without it, ssh opened a remote session that sat waiting for input, so my pipeline hung indefinitely with no results shown.

Note:

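EOF is the delimiter word of a here document (also called a heredoc) in shell scripting. It is not a command: any word can serve as the delimiter, and EOF is simply the common convention.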

A here document allows you to pass a block of text or commands to a command like ssh, cat, or bash. Everything between << 'EOF' and the ending EOF is treated as input to the command.

In this SSH command, the heredoc is used to send a series of commands to be executed on the remote server after logging in via SSH.

This will:

  1. SSH into the remote machine.

  2. Run the commands inside the heredoc (echo, apt update, apt upgrade, curl, and the diagnostic script).
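The same mechanism can be tried locally without a VM. In the sketch below, a child bash stands in for the remote shell that ssh would start; the function name run_block is hypothetical. The closing EOF must sit alone at the start of its line. In the pipeline YAML this holds because the literal block scalar strips the common indentation before the agent runs the script.

```shell
#!/bin/bash
# Feed a block of commands to a child bash via a heredoc, the same way
# the pipeline feeds commands to the remote shell through ssh.
# The closing EOF must start at column 0 (or use <<- with tab indentation).
run_block() {
    bash << 'EOF'
echo ">>> step 1: runs inside the child shell"
echo ">>> step 2: still inside the child shell"
EOF
}

# Usage: run_block
```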

Script with the error:

```shell
chmod 600 $(Ubuntukey.secureFilePath)

## Missing << 'EOF' at the end of the ssh line
ssh -o StrictHostKeyChecking=no -i $(Ubuntukey.secureFilePath) -o GSSAPIAuthentication=no pratul@172.191.228.28
```

After some searching online, I appended << 'EOF' to the end of the ssh line as below.

```shell
ssh -o StrictHostKeyChecking=no -i $(Ubuntukey.secureFilePath) -o GSSAPIAuthentication=no pratul@172.191.228.28 << 'EOF'
```

Corrected script:

```shell
chmod 600 $(Ubuntukey.secureFilePath)

## Appended << 'EOF' at the end of the ssh line
ssh -o StrictHostKeyChecking=no -i $(Ubuntukey.secureFilePath) -o GSSAPIAuthentication=no pratul@172.191.228.28 << 'EOF'
echo ">>> Connection successful"
# ...remaining commands from the pipeline...
EOF
```

🔐 Why 'EOF' in Quotes?

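Using 'EOF' (quoted) prevents the local shell on the pipeline agent from performing variable expansion and command substitution inside the heredoc. $VAR or $(command) are not evaluated locally; they are passed to ssh as literal strings, and the remote shell on the VM evaluates them instead.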
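The difference is easy to see locally with cat in place of ssh. In this hypothetical example, TARGET is set only in the calling shell: the unquoted heredoc substitutes its value before cat ever sees the text, while the quoted one passes $TARGET through untouched. Over ssh, that means the remote shell would be the one to expand it.

```shell
#!/bin/bash
# Compare unquoted and quoted heredoc delimiters.
compare_heredocs() {
    local TARGET="pipeline-agent"   # exists only in this (local) shell

    echo "unquoted:"
    cat << EOF
target=$TARGET
EOF

    echo "quoted:"
    cat << 'EOF'
target=$TARGET
EOF
}

compare_heredocs
# Output:
# unquoted:
# target=pipeline-agent
# quoted:
# target=$TARGET
```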

CHALLENGES #3

Reference syntax didn't work!

Error:

Diagnose:

I found two errors in my YAML script:

  1. Using -p: it does not work when appended to sudo yum or sudo apt commands, so it has to be omitted.

  2. Using an incorrect GitHub URL (the github.com page instead of raw.githubusercontent.com) to download the 'diagnose_usage.sh' content with curl.

```shell
echo ">>> Updating & Upgrading of Ubuntu VM"
## '-p' used, which needs to be omitted
sudo apt update -y -p && sudo apt upgrade -y -p

echo ">>> Downloading diagnose_usage.sh from GitHub..."
## Incorrect GitHub URL (https://github.com/prak96/) used to download the content with curl
curl -o ~/diagnose_usage.sh "https://github.com/prak96/motadata-assignment/release/hotfix/Linux%20IOSTAT%20automate/diagnose_usage.sh"
```

Corrected script:

```shell
echo ">>> Updating & Upgrading of Ubuntu VM"
## '-p' omitted
sudo apt update -y && sudo apt upgrade -y

echo ">>> Downloading diagnose_usage.sh from GitHub..."
## Corrected GitHub URL (https://raw.githubusercontent.com/prak96/) used to download the content with curl
curl -o ~/diagnose_usage.sh "https://raw.githubusercontent.com/prak96/motadata-assignment/release/hotfix/Linux%20IOSTAT%20automate/diagnose_usage.sh"
```
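Both fixes can be folded into a small helper. The sketch below (function names to_raw_url and fetch_script are hypothetical) rewrites a github.com file URL of the form .../blob/<branch>/<path> into its raw.githubusercontent.com equivalent, and downloads with curl -f so that an HTTP error such as a 404 makes curl exit non-zero instead of saving the error response as the script. It assumes the URL is copied from GitHub's file view, which includes the /blob/ segment.

```shell
#!/bin/bash
# Convert https://github.com/OWNER/REPO/blob/REF/PATH
# into    https://raw.githubusercontent.com/OWNER/REPO/REF/PATH
# URLs that do not match the pattern are returned unchanged.
to_raw_url() {
    sed -E 's#^https://github\.com/([^/]+)/([^/]+)/blob/#https://raw.githubusercontent.com/\1/\2/#' <<< "$1"
}

# Download a script; -f fails on HTTP errors, -L follows redirects,
# -sS hides the progress meter but still prints errors.
fetch_script() {
    local url=$1 dest=$2
    curl -fsSL -o "$dest" "$url" || { echo "download failed: $url" >&2; return 1; }
    chmod +x "$dest"
}

# Usage:
# url=$(to_raw_url "https://github.com/prak96/motadata-assignment/blob/release/hotfix/Linux%20IOSTAT%20automate/diagnose_usage.sh")
# fetch_script "$url" ~/diagnose_usage.sh && bash ~/diagnose_usage.sh
```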

Once everything was fixed, after several iterations of trial runs and commits in my CI/CD pipeline, I finally got the desired output: the system diagnostics of the Linux VM (hosted in Azure) that are essential for monitoring.


OUTPUT

Output Logs:

2025-08-05T19:54:45.4288596Z ##[section]Starting: Secure SSH conenction to UbuntuVM & executed the SCRIPT
2025-08-05T19:54:45.4296877Z ==============================================================================
2025-08-05T19:54:45.4297427Z Task         : Command line
2025-08-05T19:54:45.4297907Z Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2025-08-05T19:54:45.4298491Z Version      : 2.250.1
2025-08-05T19:54:45.4298837Z Author       : Microsoft Corporation
2025-08-05T19:54:45.4299560Z Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2025-08-05T19:54:45.4300245Z ==============================================================================
2025-08-05T19:54:45.5719919Z Generating script.
2025-08-05T19:54:45.5727973Z ========================== Starting Command Output ===========================
2025-08-05T19:54:45.5741757Z [command]/usr/bin/bash --noprofile --norc /home/vsts/work/_temp/46965a7a-0806-47d7-b1cf-4ea4a13c3070.sh
2025-08-05T19:54:45.6884695Z Pseudo-terminal will not be allocated because stdin is not a terminal.
2025-08-05T19:54:46.6504532Z Warning: Permanently added '172.191.228.28' (ED25519) to the list of known hosts.
2025-08-05T19:54:48.4750518Z Welcome to Ubuntu 24.04.3 LTS (GNU/Linux 6.11.0-1018-azure x86_64)
2025-08-05T19:54:48.4752760Z 
2025-08-05T19:54:48.4753284Z  * Documentation:  https://help.ubuntu.com
2025-08-05T19:54:48.4753958Z  * Management:     https://landscape.canonical.com
2025-08-05T19:54:48.4754385Z  * Support:        https://ubuntu.com/pro
2025-08-05T19:54:48.4754533Z 
2025-08-05T19:54:48.4755081Z  System information as of Tue Aug  5 19:54:47 UTC 2025
2025-08-05T19:54:48.4755806Z 
2025-08-05T19:54:48.4756116Z   System load:  0.08              Processes:             135
2025-08-05T19:54:48.4756448Z   Usage of /:   6.6% of 28.02GB   Users logged in:       1
2025-08-05T19:54:48.4756825Z   Memory usage: 4%                IPv4 address for eth0: 10.0.0.4
2025-08-05T19:54:48.4757266Z   Swap usage:   0%
2025-08-05T19:54:48.4757412Z 
2025-08-05T19:54:48.4757582Z 
2025-08-05T19:54:48.4757899Z Expanded Security Maintenance for Applications is not enabled.
2025-08-05T19:54:48.4758052Z 
2025-08-05T19:54:48.4758367Z 0 updates can be applied immediately.
2025-08-05T19:54:48.4758493Z 
2025-08-05T19:54:48.4759126Z Enable ESM Apps to receive additional future security updates.
2025-08-05T19:54:48.4759616Z See https://ubuntu.com/esm or run: sudo pro status
2025-08-05T19:54:48.4759840Z 
2025-08-05T19:54:48.4759969Z 
2025-08-05T19:54:48.6930757Z >>> Conenction successful
2025-08-05T19:54:48.6932138Z >>> Updating & Upgrading of Ubuntu VM
2025-08-05T19:54:48.7217434Z 
2025-08-05T19:54:48.7219240Z WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
2025-08-05T19:54:48.7219546Z 
2025-08-05T19:54:48.8270387Z Hit:1 http://azure.archive.ubuntu.com/ubuntu noble InRelease
2025-08-05T19:54:48.8271635Z Hit:2 http://azure.archive.ubuntu.com/ubuntu noble-updates InRelease
2025-08-05T19:54:48.8272456Z Hit:3 http://azure.archive.ubuntu.com/ubuntu noble-backports InRelease
2025-08-05T19:54:48.8273021Z Hit:4 http://azure.archive.ubuntu.com/ubuntu noble-security InRelease
2025-08-05T19:54:48.9005976Z Hit:5 https://packages.microsoft.com/repos/microsoft-ubuntu-noble-prod noble InRelease
2025-08-05T19:54:50.4927449Z Reading package lists...
2025-08-05T19:54:50.6980571Z Building dependency tree...
2025-08-05T19:54:50.6983230Z Reading state information...
2025-08-05T19:54:50.7184007Z 3 packages can be upgraded. Run 'apt list --upgradable' to see them.
2025-08-05T19:54:50.7402419Z 
2025-08-05T19:54:50.7403602Z WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
2025-08-05T19:54:50.7404984Z 
2025-08-05T19:54:50.7569963Z Reading package lists...
2025-08-05T19:54:50.9781790Z Building dependency tree...
2025-08-05T19:54:50.9785585Z Reading state information...
2025-08-05T19:54:51.0910066Z Calculating upgrade...
2025-08-05T19:54:51.2822702Z The following upgrades have been deferred due to phasing:
2025-08-05T19:54:51.2825113Z   python-apt-common python3-apt snapd
2025-08-05T19:54:51.2975536Z 0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.
2025-08-05T19:54:51.3180035Z >>> Downloading diagnose_usage.sh from GitHub...
2025-08-05T19:54:51.3244687Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2025-08-05T19:54:51.3245211Z                                  Dload  Upload   Total   Spent    Left  Speed
2025-08-05T19:54:51.3245423Z 
2025-08-05T19:54:51.4260548Z   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
2025-08-05T19:54:51.4262428Z 100   591  100   591    0     0   5824      0 --:--:-- --:--:-- --:--:--  5851
2025-08-05T19:54:51.4286397Z >>> Setting script permission...
2025-08-05T19:54:51.4299237Z >>> Running diagnose_usage.sh...
2025-08-05T19:54:51.4313803Z === High Usage Diagnostic Report ===
2025-08-05T19:54:51.4323787Z [+] Hostname: motadata-ubuntuVM
2025-08-05T19:54:51.4336655Z [+] Time: Tue Aug  5 19:54:51 UTC 2025
2025-08-05T19:54:51.4337115Z 
2025-08-05T19:54:51.4339917Z [CPU Usage:]
2025-08-05T19:54:51.6455344Z %Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
2025-08-05T19:54:51.6456629Z [Top CPU-Consuming Processes:]
2025-08-05T19:54:51.6530431Z     PID    PPID CMD                         %MEM %CPU
2025-08-05T19:54:51.6534580Z    6758       1 /usr/libexec/packagekitd     0.2  0.4
2025-08-05T19:54:51.6535294Z    6440    1133 sshd: pratul [priv]          0.1  0.1
2025-08-05T19:54:51.6535803Z       1       0 /sbin/init                   0.1  0.1
2025-08-05T19:54:51.6536302Z    1018     806 /usr/bin/python3 -u bin/WAL  0.4  0.1
2025-08-05T19:54:51.6537159Z     100       2 [kworker/0:2-events]         0.0  0.1
2025-08-05T19:54:51.6537723Z    2672       1 /usr/libexec/fwupd/fwupd     0.5  0.0
2025-08-05T19:54:51.6538072Z      70       2 [kworker/u8:4-events_power_  0.0  0.0
2025-08-05T19:54:51.6538499Z     889       1 /usr/sbin/chronyd -F 1       0.0  0.0
2025-08-05T19:54:51.6539168Z      60       2 [hwrng]                      0.0  0.0
2025-08-05T19:54:51.6539566Z [Memory Usage:]
2025-08-05T19:54:51.6552771Z                total        used        free      shared  buff/cache   available
2025-08-05T19:54:51.6553300Z Mem:            7893         550        6909           4         686        7342
2025-08-05T19:54:51.6553678Z Swap:              0           0           0
2025-08-05T19:54:51.6556890Z [Disk Usage:]
2025-08-05T19:54:51.6573331Z Filesystem     Type      Size  Used Avail Use% Mounted on
2025-08-05T19:54:51.6575324Z /dev/root      ext4       29G  1.9G   27G   7% /
2025-08-05T19:54:51.6576012Z efivarfs       efivarfs  128K   37K   87K  30% /sys/firmware/efi/efivars
2025-08-05T19:54:51.6576950Z /dev/sda16     ext4      881M   60M  760M   8% /boot
2025-08-05T19:54:51.6577829Z /dev/sda15     vfat      105M  6.2M   99M   6% /boot/efi
2025-08-05T19:54:51.6578186Z /dev/sdb1      ext4       16G   28K   15G   1% /mnt
2025-08-05T19:54:51.6578623Z [Inode Usage:]
2025-08-05T19:54:51.6585201Z Filesystem      Inodes IUsed   IFree IUse% Mounted on
2025-08-05T19:54:51.6587281Z /dev/root      3801088 78649 3722439    3% /
2025-08-05T19:54:51.6588384Z tmpfs          1010366     1 1010365    1% /dev/shm
2025-08-05T19:54:51.6591042Z tmpfs           819200   781  818419    1% /run
2025-08-05T19:54:51.6591435Z tmpfs          1010366     3 1010363    1% /run/lock
2025-08-05T19:54:51.6591761Z efivarfs             0     0       0     - /sys/firmware/efi/efivars
2025-08-05T19:54:51.6592304Z /dev/sda16       58496   600   57896    2% /boot
2025-08-05T19:54:51.6592642Z /dev/sda15           0     0       0     - /boot/efi
2025-08-05T19:54:51.6592970Z /dev/sdb1      1048576    12 1048564    1% /mnt
2025-08-05T19:54:51.6593311Z tmpfs           202073    32  202041    1% /run/user/1000
2025-08-05T19:54:51.6593631Z [I/O Wait / Load Average:]
2025-08-05T19:54:51.6604043Z  19:54:51 up 29 min,  2 users,  load average: 0.08, 0.03, 0.01
2025-08-05T19:54:51.6608066Z [Zombie Processes:]
2025-08-05T19:54:51.6692438Z [Dmesg Errors:]
2025-08-05T19:54:51.6707033Z dmesg: read kernel buffer failed: Operation not permitted
2025-08-05T19:54:51.6738339Z 
2025-08-05T19:54:51.6800752Z ##[section]Finishing: Secure SSH conenction to UbuntuVM & executed the SCRIPT

Written by

Pratul

A passionate L1 Server Engineer with a growing focus on DevOps practices. With experience in server administration, troubleshooting and infrastructure management, I am skilled at optimizing workflows through automation and CI/CD pipelines. Currently working with cloud platforms like AWS & Azure, virtualization technologies, and configuration management tools. Committed to enhancing efficiency and productivity. Through this blog, I will be sharing hands-on insights, tutorials, and practical tips aimed at helping fellow professionals in server engineering and DevOps.