Bash Scripting for Server Health Monitoring: A Beginner’s Guide with A

📜 History

"I built this script after my server crashed mid-game because I ignored a full disk. Now it’s my overzealous guardian angel—it nags me via email before things explode. Turns out, logs are cheaper than therapy."

👤 Author

Name: Nishant Yadav
Background: "Bash scripting hobbyist who treats terminal errors like cryptic poetry. I still panic when I see ‘segmentation fault’, but at least my scripts email me about it now."
Social: GitHub | Email

📖 Summary

A Bash script that monitors CPU, RAM, and disk usage like a paranoid sysadmin. Sends email alerts when thresholds are breached and logs everything (because trust issues).

Key Features:

Configurable thresholds (no code editing required).
Anti-spam logic: Emails go out only once ALERT_THRESHOLD breaches pile up within a single run (the counter starts at zero every time the script launches).
Logs: For when you need receipts to prove your server is dramatic.
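
About the "no code editing" part: the script sources server-health.conf from the current directory if the file exists, so a config file is just a handful of Bash assignments that override the defaults. A minimal sketch, where the numbers and the example.com address are placeholders rather than the author's recommendations:

```bash
# server-health.conf
# Sourced by load_config, so every line must be a valid Bash assignment.
MAX_CPU=80                         # percent; overrides the default of 90
MAX_MEM=75                         # percent; overrides the default of 85
MAX_DISK=85                        # whole number, compared with -gt
ADMIN_EMAIL="admin@example.com"    # placeholder address
ALERT_THRESHOLD=1                  # breaches per run before an email is sent
```

Because CONFIG_FILE is a relative path, the file is only picked up when the script is started from the directory that contains it.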

📝 Notes

The Struggle™:

“Debugging Bash floats is like asking a toaster to do calculus. Thank you, bc -l, for existing.”
“Gmail blocked my alerts until I learned about app passwords. Now my script is basically a tattletale.”
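
The float struggle in a nutshell: Bash's (( )) arithmetic only understands integers, so a comparison like 91.5 > 90 has to be handed to an external tool. The sketch below wraps that in a helper. The name float_gt and the awk fallback are illustrative assumptions, not part of the original script, which calls bc -l inline.

```bash
# Succeeds (exit status 0) when the first number is greater than the second.
float_gt() {
    if command -v bc >/dev/null 2>&1; then
        # bc prints 1 for a true comparison and 0 for a false one.
        [ "$(echo "$1 > $2" | bc -l)" -eq 1 ]
    else
        # Fallback for machines without bc: awk compares the values numerically.
        awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }'
    fi
}

if float_gt 91.5 90; then
    echo "over threshold"
fi
```

Wrapping the comparison in a function keeps each check down to a single readable if line.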

Future Plans:

Discord Alerts: “Replace emails with a bot that posts ‘RIP SERVER’ in #general.”
Panic Mode: “Auto-delete my Minecraft world if disk hits 95%. Priorities.”

🎯 Objectives

Primary: “Prevent 2 AM server meltdowns with passive-aggressive emails.”
Secondary: “Make logs so detailed they could be a Netflix documentary.”

🔌 Dependencies

Tools: mailutils, bc (for math that Bash can’t handle).
Tested On: Ubuntu/Debian. “If it breaks on Arch, blame the AUR gremlins.”
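
On Ubuntu/Debian both tools install with sudo apt-get install mailutils bc. If you would rather have the script complain early than fail halfway through, a preflight check along these lines could sit at the top. This is a sketch: missing_deps is a hypothetical helper name, not something the original script defines.

```bash
# Print the name of every listed command that cannot be found on PATH.
# Prints nothing when all of them are installed.
missing_deps() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Possible use near the top of the monitor script:
#   missing=$(missing_deps top free df awk bc mail)
#   if [ -n "$missing" ]; then
#       echo "Missing tools: $missing" >&2
#       exit 1
#   fi
```

The mail command is what mailutils provides, which is why the check looks for mail rather than the package name.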

🚀 Examples

# Run a health check (like a responsible adult):  
./health-monitor.sh  

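# Trigger a fake alert to test your email setup (chaos mode).
# The script assigns MAX_CPU=90 itself, so a value set on the command line is
# overwritten; lower the threshold in the config file instead, and remove the
# line again once the test email has arrived:
echo 'MAX_CPU=5' >> server-health.conf && ./health-monitor.sh  # Prepare for spam!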

💻 The Code

"Here’s the script—comments included because I forget how my own code works."

#!/bin/bash

# Configuration (because hardcoding is for amateurs)
CONFIG_FILE="server-health.conf"  # Thresholds live here
LOG_FILE="health.log"             # Server's diary
ALERT_FILE="alerts.log"           # Panic log
MAX_CPU=90                        # "Why is my CPU crying?" threshold
MAX_MEM=85                        # RAM's breaking point
MAX_DISK=90                       # When your disk becomes a hoarder
ADMIN_EMAIL="nishantyadav2207@gmail.com"  # Where tears go
ALERT_THRESHOLD=1                 # How many warnings before spamming you

ALERT_COUNT=0                     # Tracks how mad the script is

# Load config (or cry trying)
load_config() {
    [ -f "$CONFIG_FILE" ] && source "$CONFIG_FILE"
}

# Check CPU Usage (spoiler: it's always 100%)
check_cpu() {
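    # Find the idle value by its "id" label instead of a fixed column: on an
    # idle machine top prints "ni,100.0 id", which shifts the field positions.
    cpu_usage=$(LC_ALL=C top -bn1 | awk -F',' '/Cpu\(s\)/ {for (i = 1; i <= NF; i++) if ($i ~ /id/) {gsub(/[^0-9.]/, "", $i); print 100 - $i}}')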
    printf -v cpu_usage "%.1f" "$cpu_usage"
    echo "CPU Usage: $cpu_usage%"  # Debug line for existential crises

    if (( $(echo "$cpu_usage > $MAX_CPU" | bc -l) )); then
        log_alert "CPU" "$cpu_usage" "$MAX_CPU"
    fi
}

# Check Memory Usage (where did all the RAM go?)
check_memory() {
    mem_usage=$(free -m | awk '/Mem:/ {print ($3/$2)*100}')
    printf -v mem_usage "%.1f" "$mem_usage"
    echo "Memory Usage: $mem_usage%"  # Debug line for denial

    if (( $(echo "$mem_usage > $MAX_MEM" | bc -l) )); then
        log_alert "Memory" "$mem_usage" "$MAX_MEM"
    fi
}

# Check Disk Usage (spoiler: it's always /tmp)
check_disk() {
    disk_usage=$(df -P / | awk 'NR==2 {gsub("%", "", $5); print $5}')
    echo "Disk Usage: $disk_usage%"  # Debug line for hoarders

    if [ "$disk_usage" -gt "$MAX_DISK" ]; then
        log_alert "Disk" "$disk_usage" "$MAX_DISK"
    fi
}

# Log alerts and maybe send an email (you've been warned)
log_alert() {
    local metric=$1
    local value=$2
    local max=$3
    local message="$metric usage high: ${value}% (Threshold: ${max}%)"

    echo "$(date) - WARNING: $message" >> "$ALERT_FILE"
    ((ALERT_COUNT++))

    if [ "$ALERT_COUNT" -ge "$ALERT_THRESHOLD" ]; then
        send_notification "$message"
        ALERT_COUNT=0  # Reset counter (no spam zone)
    fi
}

# Send email (requires mailutils setup. Good luck.)
send_notification() {
    local message=$1
    echo "$message" | mail -s "Server Alert" "$ADMIN_EMAIL"
}

# Generate a report nobody will read (but it's pretty)
generate_report() {
    # Group the lines so the log file is opened once per report
    {
        echo "----- Server Health Report -----"
        echo "Timestamp: $(date)"
        echo "CPU: ${cpu_usage}% (Max: ${MAX_CPU}%)"
        echo "Memory: ${mem_usage}% (Max: ${MAX_MEM}%)"
        echo "Disk: ${disk_usage}% (Max: ${MAX_DISK}%)"
        echo "--------------------------------"
    } >> "$LOG_FILE"
}

# Main function (where the magic happens)
main() {
    load_config
    check_cpu
    check_memory
    check_disk
    generate_report
}

# Let's roll!
main
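
One thing to know about the anti-spam counter: ALERT_COUNT lives in memory, so it starts at 0 on every run and can reach at most 3 (one breach each for CPU, memory, and disk). If you want "email me only after N bad runs in a row", the count has to survive between runs. A minimal sketch of that idea, assuming a state file whose path (STATE_FILE) and helper names (read_count, record_breach) are made up for illustration and do not exist in the script above:

```bash
# Hypothetical state file; adjust the path to taste.
STATE_FILE="${STATE_FILE:-/tmp/health-monitor.count}"
ALERT_THRESHOLD="${ALERT_THRESHOLD:-3}"

# Print the breach count saved by earlier runs (0 if none or unreadable).
read_count() {
    local count=0
    [ -f "$STATE_FILE" ] && count=$(cat "$STATE_FILE")
    # Treat an empty or non-numeric file as zero.
    case "$count" in
        ''|*[!0-9]*) count=0 ;;
    esac
    echo "$count"
}

# Record one breach. Succeeds (exit status 0) and resets the counter when the
# threshold is reached; otherwise saves the new count and returns 1.
record_breach() {
    local count
    count=$(( $(read_count) + 1 ))
    if [ "$count" -ge "$ALERT_THRESHOLD" ]; then
        echo 0 > "$STATE_FILE"
        return 0
    fi
    echo "$count" > "$STATE_FILE"
    return 1
}
```

Inside log_alert, the in-memory counter check could then become record_breach && send_notification "$message". This sketch never clears the count when the server recovers, so breaches on non-consecutive runs still add up.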

💬 Human Touch

“This script is like a grumpy roommate who texts you ‘CLEAN YOUR DISK’ at 3 AM. It’s not perfect, but it works—kinda like my sleep schedule. Star the repo if you’ve ever cried over a segmentation fault!”

Latest Code: GitHub Repo
PS: If it breaks, blame the gremlins. Or capitalism.

Hashnode Documentation: Server Health Monitor