Linux Commands Cheatsheet: Top 50 Monitoring & Troubleshooting Commands (2024)

As a DevOps professional, having a solid understanding of Linux commands for monitoring and troubleshooting is crucial. Linux offers a vast array of powerful tools that can help you effectively manage and maintain your systems. In this comprehensive guide, we'll explore the top 50 most important Linux commands that every DevOps engineer should have in their arsenal, complete with real-time examples to help you better understand and apply them.

uptime

The uptime command displays the current time, how long the system has been running, the number of logged-in users, and the load averages. This information is useful for a quick health check and for spotting sustained load.

$ uptime
 01:23:45 up 12 days, 18:43,  1 user,  load average: 0.08, 0.02, 0.05

In this example, the system has been running for 12 days, 18 hours, and 43 minutes, with one user logged in. The load averages for the past 1, 5, and 15 minutes are 0.08, 0.02, and 0.05, respectively.
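A load average is easier to interpret relative to the number of CPUs. The following is a minimal sketch, assuming a Linux system that exposes /proc/loadavg and has nproc installed; the function name load_per_cpu is made up for illustration.

```shell
# Normalize a load average by the number of CPUs.
# Usage: load_per_cpu LOAD CPUS   (load_per_cpu is a hypothetical helper name)
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Read the 1-minute load average and the CPU count when both are available.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 _ < /proc/loadavg
    echo "1-minute load per CPU: $(load_per_cpu "$load1" "$(nproc)")"
fi
```

A value that stays well above 1.00 means more tasks are runnable (or blocked on I/O) than there are CPUs to serve them.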

top

The top command provides a real-time view of running processes and of CPU and memory usage, allowing you to monitor system resources and identify potential bottlenecks or resource-intensive processes.

top - 01:25:03 up 12 days, 18:45,  1 user,  load average: 0.08, 0.02, 0.05
Tasks: 279 total,   1 running, 278 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.2 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  7981.8 total,   776.4 free,  4943.8 used,  2261.6 buff/cache
MiB Swap:  2048.0 total,  1976.0 free,    72.0 used.  2526.8 avail Mem

This output shows the summary area: system load, task counts, and CPU and memory usage. Below it, top lists the running processes, sorted by CPU utilization by default.

htop

htop is an interactive process viewer with advanced features and a user-friendly interface. It provides a more visually appealing and customizable way to monitor processes and system resources.

  1  [||||||||||||||||||||||||||||||||||||||||||||||||100.0%]   
  2  [|||                                              12.5%]    
  3  [||                                                8.3%]    
  4  [|                                                 4.2%]    
  5  [                                                  0.0%]

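In this example, htop displays one usage meter per CPU core (five are shown here), not per process. The first core is fully loaded while the others are nearly idle, which points to a single-threaded, CPU-bound task; the process list below the meters shows which process is responsible.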

ps

The ps command lists running processes on the system. It provides various options to filter and format the output based on your needs.

$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.3  37412  6852 ?        Ss   Apr25   0:01 /sbin/init
root         2  0.0  0.0      0     0 ?        S    Apr25   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   Apr25   0:00 [rcu_gp]

This command shows all running processes, along with their user, process ID (PID), CPU and memory usage, and the command that started the process.
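To find the heaviest memory consumers in that listing, the output can be sorted on the %MEM column. This is a small sketch rather than a definitive recipe; top_mem is a made-up helper name, and it assumes the column layout shown above.

```shell
# Print PID, %MEM and command of the N processes using the most memory.
# Reads `ps aux` output on stdin. Usage: ps aux | top_mem N
# (top_mem is a hypothetical helper name.)
top_mem() {
    # Drop the header, sort numerically on column 4 (%MEM), highest first.
    tail -n +2 | LC_ALL=C sort -k4,4nr | head -n "${1:-5}" |
        awk '{ print $2, $4, $11 }'
}

if command -v ps >/dev/null 2>&1; then
    ps aux | top_mem 3
fi
```

Only the first word of the command line is printed, which keeps the output compact.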

kill

The kill command sends a signal to a process identified by its process ID (PID). By default it sends SIGTERM, which asks the process to terminate. This is useful for stopping unresponsive or misbehaving processes.

$ ps aux | grep firefox
user     12345  3.4  2.1 1234567 543216 ?      Sl   23:12   0:34 /usr/lib/firefox/firefox

$ kill 12345

In this example, we first locate the PID of the Firefox process using ps aux and grep. Then, we use kill with the PID to terminate the process.
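A process can ignore or delay handling the default SIGTERM signal, so a common pattern is to wait briefly and fall back to SIGKILL. The sketch below is written for bash; stop_gracefully and its timeout parameter are illustrative names, not part of any standard tool.

```shell
# Ask a process to exit with SIGTERM, then force it with SIGKILL if it is
# still alive after TIMEOUT seconds (default 5).
# Usage: stop_gracefully PID [TIMEOUT]   (hypothetical helper name)
stop_gracefully() {
    local pid=$1 timeout=${2:-5} waited=0
    # Fails if the process does not exist or we lack permission to signal it.
    kill -TERM "$pid" 2>/dev/null || return 1
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For the example above, `stop_gracefully 12345 10` would give Firefox ten seconds to shut down cleanly before it is killed.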

free

The free command displays the amount of free and used memory, both physical and swap.

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        4.6G        726M        212M        2.5G        2.8G
Swap:          2.0G         60M        1.9G

The -h option displays the output in human-readable units (e.g., gigabytes, megabytes). This can help you quickly assess the system's memory usage and identify potential memory bottlenecks.
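The figures shown by free are read from /proc/meminfo, so the same numbers can be used in scripts. A minimal sketch, assuming a kernel that reports MemAvailable; mem_available_pct is a hypothetical helper name.

```shell
# Print MemAvailable as a percentage of MemTotal.
# Reads /proc/meminfo-formatted text on stdin.
# (mem_available_pct is a hypothetical helper name.)
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", avail * 100 / total }'
}

if [ -r /proc/meminfo ]; then
    echo "Available memory: $(mem_available_pct < /proc/meminfo)%"
fi
```

The available column is usually a better indicator of memory pressure than free, because it accounts for cache that the kernel can reclaim.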

vmstat

The vmstat command reports virtual memory statistics, along with process, I/O, and CPU activity.

$ vmstat 2 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1382616 160404 1972332    0    0    53    18   48   48  6  1 93  0  0
 0  0      0 1382376 160424 1972464    0    0     0     0   47   63  0  0 100  0  0
 0  0      0 1382364 160440 1972496    0    0     0     0   44   44  0  0 100  0  0
 0  0      0 1382356 160440 1972520    0    0     0    24   45   57  0  0 100  0  0
 0  0      0 1382348 160440 1972544    0    0     0     0   43   43  0  0 100  0  0

In this example, vmstat is invoked with the arguments 2 (delay between updates in seconds) and 5 (number of updates). The first line of output reports averages since boot; each following line covers one 2-second interval. The columns show process counts, memory usage, swap activity, block I/O, and CPU utilization.

df

The df command shows the file system disk space usage, allowing you to monitor available disk space and identify potential issues.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2   50G   12G   36G  26% /
devtmpfs        3.8G     0  3.8G   0% /dev
tmpfs           3.9G  1.1M  3.9G   1% /dev/shm
tmpfs           3.9G   13M  3.8G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1  511M  7.8M  504M   2% /boot/efi
tmpfs           783M   13k  783M   1% /run/user/1000

The -h option displays disk space in human-readable units. This output shows the file system, total size, used space, available space, usage percentage, and mount point for each file system.
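For alerting, the Use% column can be compared against a threshold. The following sketch parses the POSIX output format (df -P), which keeps each file system on one line; fs_over_threshold is an illustrative name, and mount points containing spaces are not handled.

```shell
# Print mount point and usage for file systems at or above LIMIT percent.
# Reads `df -P` output on stdin. Usage: df -P | fs_over_threshold LIMIT
# (fs_over_threshold is a hypothetical helper name.)
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

if command -v df >/dev/null 2>&1; then
    df -P | fs_over_threshold 80
fi
```

With the sample output above nothing would be printed, since the fullest file system is at 26%.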

du

The du command estimates and summarizes file and directory space usage, helping you identify space hogs and optimize disk usage.

$ du -shc /var/log/*
8.0K  /var/log/alternatives.log
56K   /var/log/apt
20K   /var/log/auth.log
...
72M   /var/log/syslog
84M   total

In this example, we use the -s (summarize), -h (human-readable), and -c (grand total) options to display the disk space used by each file and directory within /var/log/, followed by a total. This can help you identify and manage large log files or directories that may be consuming excessive disk space.

iostat

The iostat command reports CPU and input/output (I/O) statistics for devices and partitions, allowing you to monitor disk activity and identify potential I/O bottlenecks.

$ iostat -d -x 2 3
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz     rareq-sz     wareq-sz
nvme0n1         2.50    1.00     64.00     64.00      0.00      0.00   0.00   0.00    1.60   18.00     0.01        25.60        64.00
nvme1n1         0.00    0.00      0.00      0.00      0.00      0.00   0.00   0.00    0.00    0.00     0.00         0.00         0.00

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz     rareq-sz     wareq-sz
nvme0n1         0.00    0.00      0.00      0.00      0.00      0.00   0.00   0.00    0.00    0.00     0.00         0.00         0.00
nvme1n1         0.00    0.00      0.00      0.00      0.00      0.00   0.00   0.00    0.00    0.00     0.00         0.00         0.00

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz     rareq-sz     wareq-sz
nvme0n1         0.00    0.00      0.00      0.00      0.00      0.00   0.00   0.00    0.00    0.00     0.00         0.00         0.00
nvme1n1         0.00    0.00      0.00      0.00      0.00      0.00   0.00   0.00    0.00    0.00     0.00         0.00         0.00

In this example, we use iostat with the -d (display device statistics), -x (display extended statistics), 2 (delay between updates in seconds), and 3 (number of updates) options. The output shows various I/O statistics for each device, including read/write operations, data transfer rates, and request queue sizes.

iotop

The iotop command monitors I/O usage by processes or threads, helping you identify and troubleshoot I/O-intensive operations. It typically requires root privileges.

Total DISK READ: 16.62 K/s | Total DISK WRITE: 4.00 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 4092 be/4 root        0.00 B/s    4.00 K/s  0.00 %  0.00 % [jbd2/nvme0n1p2]
 1180 be/3 root        8.81 K/s    0.00 B/s  0.00 %  0.04 % /usr/bin/pulseaudio --daemonize=no
    1 be/4 root        4.82 K/s    0.00 B/s  0.00 %  0.00 % /sbin/init
 2522 be/4 systemd+    0.00 B/s    0.00 B/s  0.00 %  0.00 % /lib/systemd/systemd-oomd

In this output, iotop displays the total disk read and write rates, along with a list of processes sorted by their I/O usage. This information can help you quickly identify and address I/O-intensive processes that may be causing performance issues.

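netstat

The netstat command displays network connections, routing tables, and network interface statistics, allowing you to monitor and troubleshoot network-related issues. It belongs to the net-tools package, which many current distributions no longer install by default; ss is its modern replacement.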

$ netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1234/sshd
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      567/systemd-resolv
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      7890/nginx
tcp6       0      0 :::22                   :::*                    LISTEN      1234/sshd

In this example, we use netstat with the -t (TCP), -u (UDP), -n (numeric), -l (listening), and -p (program) options to display listening TCP and UDP sockets, including the local and foreign addresses, socket state, and the associated process ID and program name. Seeing the PID/Program name of processes owned by other users requires root privileges.

ss

The ss command is another utility for investigating network connections, similar to netstat. It provides more advanced filtering and formatting capabilities.

$ ss -tln
State      Recv-Q Send-Q                                      Local Address:Port                                                      Peer Address:Port
LISTEN     0      128                                                    *:22                                                                     *:*
LISTEN     0      5                                            127.0.0.53%lo:53                                                                   *:*
LISTEN     0      128                                                   *:80                                                                      *:*
LISTEN     0      128                                                  :::22                                                                     :::*

In this example, we use ss with the -t (TCP), -l (listening), and -n (numeric) options to display listening TCP sockets. The output shows the connection state, receive and send queue sizes, local and foreign addresses, and ports.
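A frequent troubleshooting question is simply whether anything is listening on a given port. The sketch below answers it by parsing `ss -tln` output; port_is_listening is a made-up helper name, and it assumes the local address is the fourth column, as in the sample above.

```shell
# Succeed if the `ss -tln` output on stdin contains a listener on PORT.
# Usage: ss -tln | port_is_listening PORT   (hypothetical helper name)
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")    # the port is the last ":" field
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

if command -v ss >/dev/null 2>&1; then
    if ss -tln | port_is_listening 22; then
        echo "something is listening on TCP port 22"
    else
        echo "nothing is listening on TCP port 22"
    fi
fi
```

Splitting on the last colon lets the same check work for IPv4 entries such as *:22 and IPv6 entries such as :::22.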

iftop

The iftop command monitors network traffic on an interface and shows bandwidth usage per connection (pair of hosts), helping you identify bandwidth hogs and troubleshoot network-related issues.

                   TX       RX         TOTAL
                Cumm   Rate   Cumm   Rate   Cumm   Rate
vnet0      =>   0.00      0     0.00      0     0.00      0
eno1       =>  56.6M   320b  28.9M   412b  85.5M   732b
                N/A   464b   0.00      0   0.00      0
lo         =>   0.00      0   0.00      0     0.00      0
--------------------------------------------------------------
TX:pps=0.0 /s, rx_pps=0.0 /s
======================================================================

In this output, iftop displays the cumulative and current transfer rates for each network interface, both for transmitted (TX) and received (RX) data. Additionally, it shows the combined total rates. This information can help you identify interfaces and connections that may be causing network congestion or excessive bandwidth usage.

nmap

$ sudo nmap -sS -O 192.168.1.100

Starting Nmap 7.80 ( https://nmap.org ) at 2023-05-21 20:25 EDT
Nmap scan report for 192.168.1.100
Host is up (0.00018s latency).
Not shown: 998 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
80/tcp   open  http
Device type: general purpose
Running: Linux 3.X|4.X
OS CPE: cpe:/o:linux:linux_kernel:3 cpe:/o:linux:linux_kernel:4
OS details: Linux 3.2 - 4.9

The nmap command is a powerful network exploration and security scanning tool. In this example, we use nmap with the -sS (TCP SYN scan) and -O (OS detection) options, both of which require root privileges, and a target IP address to scan the host and detect open ports and the operating system.

lsof
```
$ lsof /var/log/syslog
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
rsyslogd 1234 root    4w   REG  259,1   541811 1234 /var/log/syslog
```
The lsof (list open files) command lists open files and their associated processes. In this example, we use lsof to find the process that has the /var/log/syslog file open, along with the process ID (PID), user, file descriptor (FD), and other details.

strace

$ strace -c -p 1234
Process 1234 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.99    0.000000           0         1           read
  0.00    0.000000           0         3           write
  0.00    0.000000           0         1           close
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                     5

The strace command traces system calls and signals for a given process. In this example, we use strace with the -c (summary by system call) and -p (attach to process) options to trace the system calls made by a specific process (PID 1234). The output shows the time spent, number of calls, and errors for each system call.

ltrace

$ ltrace -c -p 1234
Missed 1 call(s) at the beginning of the executable (address range: 0x7f123456 - 0x7f789012).
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 66.67    0.000123          41         3 strlen
 33.33    0.000062          62         1 printf
------ ----------- ----------- --------- --------------------
        0.000185                    4 total

The ltrace command traces library calls made by a process. In this example, we use ltrace with the -c (display a summary) and -p (attach to process) options to trace the library calls made by a specific process (PID 1234). The output displays the time spent, number of calls, and the library functions called.

dmesg

$ dmesg | tail
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-124-generic root=UUID=e6e0af74-8df6-4d32-b8f3-c524ee2711e6 ro
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-124-generic root=UUID=e6e0af74-8df6-4d32-b8f3-c524ee2711e6 ro
[    0.869321] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
[    1.425983] systemd[1]: Inserted module 'autofs4'
[    1.897923] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[    1.899651] systemd[1]: Listening on LVM2 metadata daemon socket.
[    2.057626] lvm [622]: Monitoring 2 remotely-mirrored RAID1s
[    2.205048] systemd[1]: Started LVM2 metadata daemon.
[    2.508877] systemd[1]: Starting LVM2 metadata integrity pause...
[    2.509400] systemd[1]: Started LVM2 metadata integrity pause.

The dmesg command prints or controls the kernel ring buffer. In this example, we use dmesg with the tail command to display the last 10 lines of the kernel ring buffer, which can provide useful information about the boot process, loaded modules, and other system events.
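When the ring buffer is long, filtering for typical trouble indicators is often more useful than reading the tail. A rough sketch follows; the keyword list and the name kernel_problems are assumptions chosen for illustration and are far from exhaustive.

```shell
# Keep only kernel log lines that mention common trouble indicators.
# Reads dmesg output on stdin. Usage: dmesg | kernel_problems
# (kernel_problems and its keyword list are illustrative assumptions.)
kernel_problems() {
    grep -Ei 'out of memory|oom-killer|i/o error|segfault|call trace'
}

# dmesg may require root on systems that restrict access to the kernel log.
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | kernel_problems || echo "no matching kernel messages"
fi
```

The fallback message also appears when dmesg itself is not permitted to read the log, so an empty result is not proof of a healthy kernel.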

journalctl

$ journalctl -u nginx.service --since "2023-05-21 18:00:00"
-- Logs begin at Sun 2023-05-21 18:00:16 EDT, end at Sun 2023-05-21 20:30:21 EDT. --
May 21 18:00:16 host1 systemd[1]: Starting A high performance web server and a reverse proxy server...
May 21 18:00:16 host1 nginx[1234]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
May 21 18:00:16 host1 nginx[1234]: nginx: configuration file /etc/nginx/nginx.conf test is successful
May 21 18:00:16 host1 systemd[1]: Started A high performance web server and a reverse proxy server.

The journalctl command queries and displays the systemd journal, which is the logging system used by modern Linux distributions. In this example, we use journalctl with the -u (unit) option to filter logs for the nginx.service, and the --since option to retrieve logs since a specific date and time.

sysdig

$ sysdig -c topprocs_file
Warning: No /usr/src/... kernel headers found for 4.15.0-20-generic
Capturing OS data
Press Ctrl+C to terminate
07:26:52
09:26:56  |SYSTEM
09:26:56  |ERRORS
09:26:56  |FILE_NAME      PARTIAL_NAME %BYTES.FILE %BYTES.TOTAL        BYTES OPENS
09:26:56  |/dev/pts/0                        0.0%         0.0%            16     4
09:26:56  |/usr/bin/sudo                100.0%        83.3%         45056     1
09:26:56  |/usr/bin/sudo                 0.0%        16.7%          9024     0

The sysdig command captures system state and activity from a kernel interface. In this example, we use sysdig with the -c option to specify the topprocs_file chisel (a built-in sysdig capture view), which displays the top processes by file I/O activity.

sar

$ sar -u 2 3
Linux 5.4.0-124-generic (host1)   05/21/2023      _x86_64_        (2 CPU)

07:28:15 PM     CPU     %user %nice   %system %iowait  %steal   %idle
07:28:17 PM     all      0.50   0.00     0.50    0.00    0.00   99.00
07:28:19 PM     all      0.50   0.00     0.50    0.00    0.00   99.00
07:28:21 PM     all      0.50   0.00     0.50    0.00    0.00   99.00
Average:        all      0.50   0.00     0.50    0.00    0.00   99.00

The sar (System Activity Reporter) command collects and reports system activity data. In this example, we use sar with the -u (CPU utilization) option, along with a delay of 2 seconds and a count of 3 iterations, to display CPU usage statistics over a short period.

pidstat

$ pidstat -u 1 2
Linux 5.4.0-124-generic (host1)   05/21/2023  _x86_64_    (2 CPU)

07:29:55 PM   UID       PID    %usr %system  %guest   %CPU  CPU  Command
07:29:56 PM     0         1    0.00    0.00    0.00   0.00     0  /sbin/init
07:29:56 PM     0      1234    0.00    1.00    0.00   1.00     1  /usr/sbin/sshd

07:29:57 PM   UID       PID    %usr %system  %guest   %CPU  CPU  Command
07:29:57 PM     0         1    0.00    0.00    0.00   0.00     0  /sbin/init
07:29:57 PM     0      1234    0.00    0.00    0.00   0.00     1  /usr/sbin/sshd

The pidstat command monitors process statistics, including CPU, memory, and I/O usage. In this example, we use pidstat with the -u (CPU utilization) option, along with a delay of 1 second and a count of 2 iterations, to display CPU usage statistics for individual processes.

mpstat

$ mpstat -P ALL 2 3
Linux 5.4.0-124-generic (host1)   05/21/2023  _x86_64_    (2 CPU)

07:31:02 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:31:04 PM    0   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00  100.00
07:31:04 PM    1   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00  100.00

07:31:04 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:31:06 PM    0   0.50   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00   99.50
07:31:06 PM    1   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00  100.00

07:31:06 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:31:08 PM    0   0.50   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00   99.50
07:31:08 PM    1   0.00   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00  100.00

The mpstat command reports processor-related statistics. In this example, we use mpstat with the -P ALL option to display statistics for all processors, along with a delay of 2 seconds and a count of 3 iterations. The output shows the CPU utilization breakdown, including user, system, I/O wait, interrupt, and idle times.
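These percentages are derived from the cumulative tick counters in /proc/stat: a tool samples the counters twice and divides the change in each category by the change in the total. The sketch below reproduces the overall busy percentage under that assumption; cpu_busy_pct is a made-up name, and idle time is taken as the idle plus iowait counters.

```shell
# Compute the busy CPU percentage between two "cpu ..." lines from /proc/stat.
# Usage: cpu_busy_pct "FIRST_LINE" "SECOND_LINE"   (hypothetical helper name)
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i    # user through steal
            idle = $5 + $6                          # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle } else { t1 = total; i1 = idle }
        }
        END {
            dt = t1 - t0
            if (dt > 0) printf "%.1f\n", (dt - (i1 - i0)) * 100 / dt
        }'
}

if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    echo "CPU busy over 1 second: $(cpu_busy_pct "$first" "$second")%"
fi
```

The guest counters are left out of the total because the kernel already includes guest time in the user counter.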

nicstat

$ nicstat 1 3
    Time        Int    rKb/s    rPk/s    rAvs    rMax    rMin    %Util    tKb/s    tPk/s    tAvs    tMax    tMin  Unit
19:32:49.787    eno1    13.64    14.00    1024     1088      28      0.00     8.88     9.00    1024     1088      28  eth0
19:32:50.787    eno1    17.12    18.00    1024     1088      28      0.00    11.04    12.00    1024     1088      28  eth0
19:32:51.787    eno1    13.64    14.00    1024     1088      28      0.00     8.88     9.00    1024     1088      28  eth0

The nicstat command prints network traffic statistics for specified interfaces at specified intervals. In this example, we use nicstat with a delay of 1 second and a count of 3 iterations to display network traffic statistics for the eno1 interface, including received (rKb/s, rPk/s) and transmitted (tKb/s, tPk/s) rates, packet sizes (rAvs, tAvs), and utilization percentage (%Util).

iptraf

iptraf is an interactive, colorful IP traffic monitor that provides detailed statistics about network traffic, including source and destination IP addresses, ports, protocols, and more. It's particularly useful for monitoring and analyzing network activity in real-time.

To use iptraf, run the `iptraf` command with root privileges (on most current distributions it is packaged as `iptraf-ng`). The application launches a full-screen, menu-driven interface that you navigate with the keyboard.

Because iptraf is interactive, it's hard to show a concise example. Once launched, it displays a real-time view of network traffic and lets you drill down into specific details and filter the output based on your needs.

atop

$ atop
ATOP - fwatto@host1            2023/05/21  19:38:16             --------- Active Processes

PID    USER   SYSCPU   PMEM    PCPU    CMDLINE
6789   user    0.00s   0.1%    0.0%     /usr/bin/python /usr/bin/glances --export-csv /tmp/glances.csv
1234   root    0.00s   0.0%    0.0%     /usr/sbin/sshd -D
2345   user    0.01s   0.0%    0.0%     /usr/bin/python3 /usr/bin/glances --export-csv /tmp/glances.csv

PID   TID    CPU       STATE    SYSCPU    USERCU   VSIZE      RSS      PMEM   WCHAN   CMD-START-TIME
6789  6789    1        sleep    0.00s     0.00s    55840    2824K    0.1%    poll_sc   19:37:28

  PID:  6789    /usr/bin/python /usr/bin/glances --export-csv /tmp/glances.csv
SAMPLES: 4       TOTAL-CPU-USAGE = 0.00%   NTOTALCPU = 2
               PMEM = 0.1%   VSIZE = 55840KB   RSS = 2824KB
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0      CPU%   0.1  0.1  0.1  0.1

The atop command monitors system resources and process activity in real-time.


glances

$ glances
Glances - 3.2.1 (Python 3.8.10) - Linux 5.4.0-124-generic (host1) - Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

CPU  [22.0%]  Load:1.21  Uptime: 1d 01:31:42         
CPU_123456 [22.0%]  Mem: 7.9G/15.6G [50.9%]          
Swap:0.0M/0.0M [0.0%]                                
Temp: 51°C                                           

PROCESS LIST          CPU%  MEM%  VIRT  RES  STATUS  STARTED      TIME+  USER
 python3  6789        22.0  0.4  847M  68M  running  19:37:28     3:28.82      user
 sshd     1234         0.0  0.1  4.1M  1.4M  running  17:56:36     0:00.00      root  

DISK I/O       READ    WRITE                  FILE SYSTEM                     
nvme0n1             1.7K     0.0K            /dev/nvme0n1                              
                    42.8K     0.0K           /boot                                      
                    58.6M    12.8M           /                                           

NETWORK        RX    RX/s     TX     TX/s                                 
eno1 (eth0)    557K  0.0      468K   0.0                                  
lo            8962K  0.0     8962K   0.0

glances is a cross-platform monitoring tool that provides a comprehensive view of system usage, including CPU, memory, disk I/O, network, and processes. In this example, glances displays real-time system statistics, including CPU load, memory usage, swap usage, temperature, process list, disk I/O, and network traffic.

stress
```
$ stress --cpu 2 --timeout 10s
stress: info: [1] dispatching hogs: 2 cpu
stress: info: [1] successful run completed in 10s
```
The stress command imposes a configurable amount of CPU, memory, I/O, and disk stress on a system. In this example, we use stress with the --cpu option to spawn 2 CPU-bound workers, and the --timeout option to run the stress test for 10 seconds. This command can be useful for simulating high system load and testing system performance under stress.

sysctl
```
$ sysctl -a | grep tcp_fin
net.ipv4.tcp_fin_timeout = 60

$ sudo sysctl -w net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_fin_timeout = 30
```
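The sysctl command modifies and retrieves kernel parameters at runtime. In the first example, we use sysctl with the -a option to list all available kernel parameters, and then grep to filter for tcp_fin parameters. In the second example, we use sysctl with the -w option, which requires root privileges, to write a new value (30) to the net.ipv4.tcp_fin_timeout parameter. A value set this way lasts only until the next reboot; to make it persistent, add it to /etc/sysctl.conf or to a file under /etc/sysctl.d/.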
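Each sysctl key corresponds to a file under /proc/sys, with the dots in the key becoming directory separators. A minimal sketch of that mapping follows; sysctl_path is a hypothetical helper name, and the simple translation breaks for keys whose components themselves contain dots (for example some VLAN interface names).

```shell
# Translate a sysctl key into its path under /proc/sys.
# Usage: sysctl_path KEY   (sysctl_path is a hypothetical helper name)
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Read the current value straight from the file, if it exists.
key=net.ipv4.tcp_fin_timeout
path=$(sysctl_path "$key")
if [ -r "$path" ]; then
    echo "$key = $(cat "$path")"
fi
```

Reading the file needs no special privileges, which makes this handy in scripts that run without the sysctl binary.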

ulimit

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63488
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63488
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The ulimit command, a shell builtin, displays and sets resource limits for the current shell and the processes it starts. In this example, we use ulimit with the -a option to display the current resource limits for the user shell, including limits for core file size, data segment size, file size, open files, stack size, and more.
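Because limits are inherited by child processes, a limit can be changed for a single command by setting it in a subshell first. The following bash sketch illustrates this for the open-files limit; with_nofile_limit is an illustrative name, and the value 256 is arbitrary.

```shell
# Run a command with a different soft limit on open files, without
# affecting the calling shell.
# Usage: with_nofile_limit LIMIT COMMAND [ARGS...]   (hypothetical helper name)
with_nofile_limit() {
    local limit=$1
    shift
    (
        # -S changes only the soft limit; it cannot exceed the hard limit.
        ulimit -S -n "$limit" || exit 1
        "$@"
    )
}

with_nofile_limit 256 sh -c 'echo "soft open-files limit: $(ulimit -n)"'
```

Running the command in a subshell keeps the change from leaking into the interactive session.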

pmap

$ pmap -x 1234
1234:   /usr/sbin/sshd -D
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000    1048       32       0 r-x-- sshd
0000000000612000      16       12       8 rw---   
0000000000615000       4        4        4 rw---   
00000000006b9000      32        0        0 rw---   
00007f65d5da7000     132      132       0 rw---   
00007f65d5dcf000      52       52       52 rw---     [ anon ]
00007f65d5df2000      56        0        0 -----   
...

The pmap command reports the memory map of a process. In this example, we use pmap with the -x option to display an extended memory map for the process with PID 1234 (/usr/sbin/sshd). The output shows the memory mappings, including the address, size, RSS (Resident Set Size), dirty pages, mode, and mapping details.

mtr

$ mtr --report example.com
Start: 2023-05-21 19:45:00
HOST: host1                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. router.local             0.0%    10    0.4   0.5   0.3   1.6   0.3
  2. 192.168.1.1              0.0%    10    1.0   1.1   0.9   1.8   0.2
  3. 10.0.0.1                 0.0%    10   10.2  10.7   9.8  16.1   1.9
  4. 172.16.0.1               0.0%    10   11.0  11.3  10.6  13.2   0.8
  5. 100.64.0.1               0.0%    10   11.5  11.3  10.8  12.6   0.5
  6. 192.205.32.49            0.0%    10   12.3  14.4  11.6  28.3   5.4
  7. 192.205.33.22            0.0%    10   12.9  13.7  12.1  20.1   2.4
  8. example.com              0.0%    10   13.6  14.2  12.9  20.0   2.0

The mtr command combines the functionality of ping and traceroute for network diagnostics. In this example, we use mtr with the --report option and a target domain (example.com). Report mode sends a fixed number of probes to each hop (10 here, as the Snt column shows) and prints a summary instead of opening the interactive display. The output shows the host, packet loss percentage, sent packets, and latency statistics (last, average, best, worst, and standard deviation) for each hop along the path.

dig

$ dig example.com

; <<>> DiG 9.16.1-Ubuntu <<>> example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14971
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;example.com.                   IN      A

;; ANSWER SECTION:
example.com.            3600    IN      A       93.184.216.34

;; Query time: 16 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun May 21 19:46:08 EDT 2023
;; MSG SIZE  rcvd: 59

The dig command queries DNS servers and retrieves information about domains. In this example, we use dig with the domain example.com to perform a DNS lookup for the A (IPv4 address) record. The output displays the query details, including the server used, query time, and the IPv4 address associated with the domain.
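In scripts, only the addresses are usually needed. The sketch below extracts them from dig's answer records; a_records is a hypothetical helper name, and it assumes the record layout shown in the ANSWER SECTION above (name, TTL, class, type, data).

```shell
# Print the address of every A record found in dig output on stdin.
# (a_records is a hypothetical helper name.)
a_records() {
    awk '$3 == "IN" && $4 == "A" { print $5 }'
}

# Example (needs network access and the dig utility):
#   dig +noall +answer example.com A | a_records
```

The +noall +answer options limit dig's output to the answer section; for a quick interactive check, `dig +short example.com` prints just the record data.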

nslookup
```
$ nslookup example.com
Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   example.com
Address: 93.184.216.34
```
The nslookup command is another tool for querying Internet domain name servers. In this example, we use nslookup with the domain example.com to perform a DNS lookup and retrieve the associated IPv4 address. The output displays the DNS server used and the IP address for the specified domain.

host
```
$ host example.com
example.com has address 93.184.216.34
example.com mail is handled by 10 mailcluster.loopia.se.
```
The host command is a simple utility for performing DNS lookups for resource records. In this example, we use host with the domain example.com to retrieve the IPv4 address and mail server information associated with the domain.

ping

$ ping example.com
PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=1 ttl=53 time=14.8 ms
64 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=2 ttl=53 time=14.5 ms
64 bytes from 93.184.216.34 (93.184.216.34): icmp_seq=3 ttl=53 time=14.6 ms
^C
--- example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 14.572/14.675/14.848/0.165 ms

The ping command tests the network connectivity to a remote host by sending ICMP echo request packets and measuring the round-trip time. In this example, we use ping with the domain example.com to test the connectivity and measure the latency. The output displays the round-trip time for each packet, along with a summary of the ping statistics, including packet loss and latency values.
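For automated checks, the summary line is the part worth parsing. A minimal sketch, assuming the summary format shown above; packet_loss_pct is a made-up helper name.

```shell
# Extract the packet loss percentage from ping's summary line.
# Reads ping output on stdin. (packet_loss_pct is a hypothetical helper name.)
packet_loss_pct() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Example (needs network access); -c 3 stops after three echo requests:
#   ping -c 3 example.com | packet_loss_pct
```

Using -c makes ping exit on its own, so the pipeline does not need to be interrupted with Ctrl+C.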

traceroute

$ traceroute example.com
traceroute to example.com (93.184.216.34), 30 hops max, 60 byte packets
 1  router.local (192.168.1.1)  1.075 ms  1.132 ms  1.198 ms
 2  10.0.0.1 (10.0.0.1)  10.999 ms  11.081 ms  11.151 ms
 3  172.16.0.1 (172.16.0.1)  12.019 ms  12.101 ms  12.169 ms
 4  100.64.0.1 (100.64.0.1)  12.623 ms  12.697 ms  12.769 ms
 5  192.205.32.49 (192.205.32.49)  13.671 ms  13.744 ms  13.814 ms
 6  192.205.33.22 (192.205.33.22)  14.681 ms  14.752 ms  14.823 ms
 7  example.com (93.184.216.34)  14.989 ms  15.060 ms  15.131 ms

The traceroute command traces the route taken by packets across an IP network to a specified destination. In this example, we use traceroute with the domain example.com to trace the network path and display the IP addresses and round-trip times for each hop along the way. The output shows the hops, IP addresses (or hostnames if available), and the latency for each hop.

ifconfig

$ ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.100  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::20c:29ff:fe16:d1a6  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:16:d1:a6  txqueuelen 1000  (Ethernet)
        RX packets 5678  bytes 456789 (456.7 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3456  bytes 456789 (456.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 123  bytes 9876 (9.8 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 123  bytes 9876 (9.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The ifconfig command displays and configures network interfaces. Run without arguments, as in this example, it shows the configuration and statistics for every active interface: IP addresses, netmasks, broadcast addresses, MAC addresses, and packet and error counters. ifconfig belongs to the legacy net-tools package and is no longer installed by default on many modern distributions, where the ip command replaces it.

ip

$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:16:d1:a6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.100/24 brd 192.168.1.255 scope global dynamic eno1
       valid_lft 86398sec preferred_lft 86398sec
    inet6 fe80::20c:29ff:fe16:d1a6/64 scope link
       valid_lft forever preferred_lft forever

The ip command is a powerful utility for displaying and configuring network interfaces, addresses, and routing tables. In this example, we use ip addr show to display the IP addresses, prefix lengths, and other details for every network interface on the system, including interfaces that are down.
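For scripting, the multi-line output above is awkward to parse. A minimal sketch, assuming the one-record-per-line format that `ip -o -4 addr show` produces (`-o` prints each address on a single line, `-4` restricts the output to IPv4); `ipv4_by_interface` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and
# prints "<interface> <address/prefix>" for each IPv4 address.
ipv4_by_interface() {
  awk '$3 == "inet" { print $2, $4 }'
}

# Typical use on a live system:
#   ip -o -4 addr show | ipv4_by_interface
```

On the machine from the example above, this would list lo with 127.0.0.1/8 and eno1 with 192.168.1.100/24.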

iptables
```
$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
```
The iptables command administers the Linux kernel's packet filtering rules. In this example, we use sudo iptables -L to list the rules in the default filter table, which contains the INPUT, FORWARD, and OUTPUT chains. Here every chain has a policy of ACCEPT and no rules, so all traffic is allowed.
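When auditing several hosts, a one-line summary per chain is often easier to compare than the full listing. The sketch below assumes the output of `iptables -S`, which prints the rules in command syntax (`-P CHAIN POLICY` for policies, `-A CHAIN ...` for appended rules); `chain_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `iptables -S` output on stdin and prints
# "<chain> <default policy> <number of appended rules>" per built-in chain.
chain_summary() {
  awk '
    $1 == "-P" { policy[$2] = $3; order[++n] = $2 }   # default policy lines
    $1 == "-A" { rules[$2]++ }                        # one line per rule
    END {
      for (i = 1; i <= n; i++)
        print order[i], policy[order[i]], rules[order[i]] + 0
    }
  '
}

# Typical use (needs root):
#   sudo iptables -S | chain_summary
```

A chain reported as ACCEPT with 0 rules, as in the listing above, performs no filtering at all.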

ufw

$ sudo ufw status
Status: inactive

$ sudo ufw allow 22/tcp
Rule added
Rule added (v6)

$ sudo ufw enable
Command may disrupt existing ssh connections. Proceed with operation (y|n)? y
Firewall is active and enabled on system startup

$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
22/tcp (v6)                ALLOW       Anywhere (v6)

The ufw (Uncomplicated Firewall) command is a user-friendly frontend for managing iptables rules. In this example, we first check the status of ufw, which is initially inactive. We then use sudo ufw allow 22/tcp to allow incoming TCP traffic on port 22 (SSH). After enabling the firewall with sudo ufw enable, we can verify that the new rule is active by running sudo ufw status.

netcat
```
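# Terminal 1 (server): listen on port 8080; received data is printed here
$ netcat -l -p 8080
Hello, world!

# Terminal 2 (client): connect, then type a line and press Enter
$ netcat localhost 8080
Hello, world!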
```
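The netcat (often abbreviated as nc) command is a versatile utility for reading and writing data across network connections. In this example, we start a netcat server listening on port 8080 with the -l (listen) and -p (port) options. Then, in a separate terminal, we connect to the server using netcat localhost 8080. Any line typed in the client terminal is sent to the server, which prints it in its own terminal; that is why Hello, world! appears twice, once where it was typed and once where it was received. netcat does not echo data back to the sender. The -l -p form is the traditional netcat syntax; some variants, such as OpenBSD netcat, take the port directly (nc -l 8080).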
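Beyond ad hoc chat sessions, netcat is commonly used to check whether a TCP port is reachable. A minimal sketch, assuming a netcat variant that supports `-z` (connect without sending data) and `-w` (timeout in seconds), which the common variants do; `port_open` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# -z: only test the connection, send no data; -w 2: give up after 2 seconds.
port_open() {
  nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Check the port used by the netcat server in the example above.
if port_open localhost 8080; then
  echo "port 8080 is open"
else
  echo "port 8080 is closed or unreachable"
fi
```

Because the helper only returns an exit status, it slots directly into shell conditionals and retry loops.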

wget

$ wget https://example.com/file.zip
--2023-05-21 20:10:18--  https://example.com/file.zip
Resolving example.com (example.com)... 93.184.216.34
Connecting to example.com (example.com)|93.184.216.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1024000 (1000K) [application/zip]
Saving to: 'file.zip'

file.zip           100%[===================>]   1000K  --.-KB/s    in 0.02s

2023-05-21 20:10:18 (50.0 MB/s) - 'file.zip' saved [1024000/1024000]

The wget command is a command-line utility for retrieving files from the web. In this example, we use wget with a URL (https://example.com/file.zip) to download the specified file (file.zip). The output displays the progress of the download, including the URL being resolved, connection details, HTTP response code, file size, download rate, and the final confirmation that the file was saved successfully.

curl

$ curl https://example.com
<!doctype html>
<html>
<head>
    <title>Example Domain</title>
    ...
</head>
<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    ...
</div>
</body>
</html>

The curl command is a tool for transferring data to or from a server using various protocols, including HTTP, FTP, and more. In this example, we use curl with a URL (https://example.com) to retrieve the contents of the specified web page. The output displays the HTML source code of the web page.
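For health checks, the HTTP status code is usually more useful than the page body. The sketch below assumes curl's `-w '%{http_code}'` write-out variable together with `-o /dev/null` to discard the body; `http_status` and `status_class` are hypothetical helper names.

```shell
# Hypothetical helper: prints only the HTTP status code for a URL.
# -s: no progress meter; -S: still show errors; --max-time: overall timeout.
http_status() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Hypothetical helper: maps a status code to a coarse category.
status_class() {
  case "$1" in
    2??) echo ok ;;
    3??) echo redirect ;;
    4??) echo client_error ;;
    5??) echo server_error ;;
    *)   echo unknown ;;      # curl reports 000 when no response was received
  esac
}

# Typical use on a host with network access:
#   status_class "$(http_status https://example.com)"
```

Keeping the classification separate from the request makes it easy to reuse with status codes gathered elsewhere, for example from access logs.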

rsync
```
# Push: copy from the local machine to the remote host
$ rsync -avz /path/to/source/files user@remote_host:/path/to/destination

# Pull: copy from the remote host to the local machine
$ rsync -avz user@remote_host:/path/to/source/files /path/to/destination
```
The rsync command synchronizes files and directories from one location to another, either locally or remotely over a network. In the first example, we use rsync with the -a (archive), -v (verbose), and -z (compress) options to copy files from a local source directory to a remote destination directory. In the second example, we do the reverse by copying files from a remote source directory to a local destination directory.
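Two details commonly trip people up with rsync: previewing a transfer before running it, and the meaning of a trailing slash on the source path. The sketch below demonstrates both locally, assuming rsync is installed; the scratch directory and file names are made up for the demonstration.

```shell
# Scratch directories created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub" "$demo/dst_a" "$demo/dst_b"
echo "hello" > "$demo/src/sub/file.txt"

if command -v rsync >/dev/null 2>&1; then
  # Preview only: -n (--dry-run) lists what would be transferred, copies nothing.
  rsync -avn "$demo/src/" "$demo/dst_a/"

  # Trailing slash on the source: copy the CONTENTS of src into dst_a.
  rsync -a "$demo/src/" "$demo/dst_a/"      # creates dst_a/sub/file.txt

  # No trailing slash: copy the directory ITSELF into dst_b.
  rsync -a "$demo/src" "$demo/dst_b/"       # creates dst_b/src/sub/file.txt
else
  echo "rsync is not installed; skipping the demonstration"
fi

# Remove the scratch directory when finished: rm -rf "$demo"
```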
scp
```
# Copy a file from local to remote
$ scp /path/to/local/file.txt user@remote_host:/path/to/destination

# Copy a directory from remote to local
$ scp -r user@remote_host:/path/to/remote/directory /path/to/local/destination
```
The scp (Secure Copy) command securely copies files between hosts using the SSH protocol. In the first example, we use scp to copy a local file (file.txt) to a remote host's destination directory. In the second example, we use scp with the -r (recursive) option to copy an entire directory from a remote host to a local destination.
ssh
```
$ ssh user@remote_host
user@remote_host's password: 
Last login: Sun May 21 20:15:34 2023 from 192.168.1.100

[user@remote_host ~]$
```
The ssh (Secure Shell) command establishes a secure, encrypted connection to a remote host for remote login and command execution. In this example, we use ssh with a username (user) and hostname (remote_host) to initiate an SSH connection. After providing the user's password, we are logged in to the remote host's shell, where we can execute commands as if we were logged in locally.
screen or tmux
```
# Start a new screen session
$ screen

# Detach from the current screen session
[Press Ctrl+A, D]

# List running screen sessions
$ screen -ls

# Reattach to a detached screen session
$ screen -r <session_id>
```
screen and tmux are terminal multiplexers that enable multiple virtual terminal sessions. These tools are particularly useful when you need to run long-running processes or maintain persistent sessions on remote servers. In this example, we use screen to start a new session, detach from it, list running sessions, and reattach to a detached session using its session ID.
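The example above only covers screen. As a rough tmux counterpart, the sketch below lists the interactive equivalents in comments and then runs a non-interactive round trip, assuming tmux is installed; the session name `cheatsheet_demo` is made up for the demonstration.

```shell
# Interactive equivalents of the screen commands above:
#   tmux new -s work        # start a new session named "work"
#   [Press Ctrl+B, then D]  # detach from the current session
#   tmux ls                 # list running sessions
#   tmux attach -t work     # reattach to the session named "work"

# Non-interactive round trip with a throwaway session.
session="cheatsheet_demo"   # hypothetical session name
if command -v tmux >/dev/null 2>&1; then
  # -d starts the session detached; it runs "sleep 30" instead of a shell.
  if tmux new-session -d -s "$session" 'sleep 30'; then
    tmux ls                                  # the new session appears here
    tmux kill-session -t "$session"          # clean up
  fi
else
  echo "tmux is not installed; skipping the demonstration"
fi
```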
tcpdump The tcpdump command captures and analyzes network traffic, allowing you to inspect packets and troubleshoot network-related issues.
```
$ sudo tcpdump -i eno1 -n host 192.168.1.100
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eno1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:23:45.123456 IP 192.168.1.10.54321 > 192.168.1.100.80: Flags [P.], seq 1234567890:1234567900, ack 1234567891, win 1024, options [nop,nop,TS val 1234567 ecr 1234568], length 10
20:23:45.123487 IP 192.168.1.100.80 > 192.168.1.10.54321: Flags [.], ack 1234567900, win 2048, options [
```
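Live output scrolls past quickly, so captures are often narrowed with a filter and saved to a file for later inspection. The sketch below assumes tcpdump's `-c` (packet count), `-w` (write to file), and `-r` (read from file) options; `capture_filter` and the file name `web.pcap` are hypothetical.

```shell
# Hypothetical helper: builds a capture filter for one host and one TCP port.
capture_filter() {
  printf 'host %s and tcp port %s' "$1" "$2"
}

filter=$(capture_filter 192.168.1.100 80)
echo "$filter"

# Capturing needs root and a real interface, so these are left commented out:
#   sudo tcpdump -i eno1 -n -c 100 -w web.pcap "$filter"   # save 100 packets
#   tcpdump -n -r web.pcap                                  # read them back
```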
This comprehensive guide covers the top 50 essential Linux commands for monitoring and troubleshooting in a DevOps environment. While the examples provided give you a taste of each command's functionality, it's important to explore their various options and advanced features by consulting their respective man pages or online documentation. Mastering these commands will empower you to effectively manage, monitor, and troubleshoot your Linux systems, ensuring optimal performance and uptime.

Suraj Kumar
