Day 05: Linux Pro Commands

Prashant Gohel

awk

Why Learn This?

When working as a DevOps engineer, you’ll deal with large log files, CSV data, or system outputs. You often need to extract specific parts, count things, or filter content quickly.

Now you have two choices:

  • 📜 Write a shell script

  • ⚙️ Use a powerful one-liner tool like awk

What is awk?

awk is a text-processing command in Linux. It’s like a mini programming language for working with structured text files like .log, .csv, .tsv, etc.

✨ Key Use Cases:

  • Extract specific columns from a file

  • Filter lines based on keywords (like INFO, ERROR)

  • Apply conditions and loops

  • Count things (like how many times something occurred)

🍽️ Analogy: Restaurant Waiter vs awk

Just like a waiter serves you specific items from a full menu...

awk serves you specific parts from a full text file 🍲


Basic awk syntax:

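awk 'pattern { action }' filename
  • pattern → Optional condition; without it, the action runs on every line

  • $1, $2, $3 → The first, second, and third columns (fields) of the current line; $0 is the whole line

  • NR → Current row (line) number

  • print → Displays output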
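To see these pieces together, here is a minimal sketch on a made-up three-line log. The file contents and column layout are assumptions for illustration, not the format of any particular application. It also uses NF, a built-in awk variable holding the number of fields on the current line.

```bash
# Create a small sample log in a temporary directory (contents are hypothetical)
tmp=$(mktemp -d)
cat > "$tmp/app.log" <<'EOF'
2024-05-01 08:52:58 INFO auth Login ok
2024-05-01 08:53:10 ERROR db Timeout
2024-05-01 08:53:42 INFO auth Logout
EOF

# NR = current row number, NF = number of fields on that row, $3 = third column
fields=$(awk '{print NR, NF, $3}' "$tmp/app.log")
echo "$fields"
```

Each output line shows the row number, how many whitespace-separated fields awk found, and the log level from column 3.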


Common Examples

Print the whole content of a file:

awk '{print}' app.log

Print the first column:

awk '{print $1}' app.log

Print multiple columns (1st, 2nd, 3rd, 5th):

awk '{print $1, $2, $3, $5}' app.log

Print only lines that contain the word INFO:

awk '/INFO/ {print $1, $2, $3, $5}' app.log

Save the filtered INFO lines to a new file:

awk '/INFO/ {print $1, $2, $3, $5}' app.log > only_info.log

Count Occurrences

Count how many lines contain INFO:

awk '/INFO/ {count++} END {print count}' app.log

With a custom message:

awk '/INFO/ {count++} END {print "The count of INFO is", count}' app.log
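The same counting idea extends to one counter per log level. Below is a small sketch, assuming a hypothetical log where the level sits in column 3. It also shows the count+0 trick, which makes awk print 0 rather than an empty line when nothing matched.

```bash
tmp=$(mktemp -d)
cat > "$tmp/app.log" <<'EOF'
2024-05-01 08:52:58 INFO auth Login ok
2024-05-01 08:53:10 ERROR db Timeout
2024-05-01 08:53:42 INFO auth Logout
2024-05-01 08:54:05 WARN disk Usage high
EOF

# count[$3]++ keeps one counter per distinct value in column 3.
# The order of "for (lvl in count)" is unspecified, so sort makes the output stable.
levels=$(awk '{count[$3]++} END {for (lvl in count) print lvl, count[lvl]}' "$tmp/app.log" | sort)
echo "$levels"

# count+0 forces a number, so a pattern with no matches prints 0 instead of a blank line
debug=$(awk '/DEBUG/ {count++} END {print count+0}' "$tmp/app.log")
echo "$debug"
```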

Filter by Time (e.g., Between 08:53:00 and 08:53:59)

Let’s say your log’s second column is a zero-padded HH:MM:SS timestamp, so a plain string comparison orders it correctly:

awk '$2 >= "08:53:00" && $2 <= "08:53:59" {print $3}' app.log
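Here is a quick sketch of that filter on sample data. The four log lines are invented for illustration, with the time in column 2 and the level in column 3.

```bash
tmp=$(mktemp -d)
cat > "$tmp/app.log" <<'EOF'
2024-05-01 08:52:58 INFO auth Login ok
2024-05-01 08:53:10 ERROR db Timeout
2024-05-01 08:53:42 INFO auth Logout
2024-05-01 08:54:05 WARN disk Usage high
EOF

# The comparison is done on strings; it works because HH:MM:SS is
# fixed-width and zero-padded, so text order equals time order.
window=$(awk '$2 >= "08:53:00" && $2 <= "08:53:59" {print $3}' "$tmp/app.log")
echo "$window"
```

Only the two rows stamped inside the 08:53 minute pass the condition. Timestamps without zero padding (for example 8:53:00) would not compare reliably this way.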

Print Lines Between Line Numbers 2 to 10

awk 'NR >= 2 && NR <= 10 {print}' app.log

To print only the row numbers instead of the lines:

awk 'NR >= 2 && NR <= 10 {print NR}' app.log

Why Is awk Powerful?

  • Supports conditions, loops, ranges, and custom logic

  • Can act like a mini-programming language for text

  • Works great with formatted data like .csv, .tsv, .log

⚠️ Works best with structured/column-based files
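For .csv files, awk needs to be told that the separator is a comma, which is what the -F option does. A minimal sketch, assuming a hypothetical deploys.csv with a header row and simple values (no quoted fields containing commas, which plain -F',' does not handle):

```bash
tmp=$(mktemp -d)
cat > "$tmp/deploys.csv" <<'EOF'
service,env,duration_sec
api,prod,42
web,prod,17
api,staging,38
EOF

# -F',' sets the field separator to a comma; NR > 1 skips the header row.
# Sum the third column for rows whose second column is exactly "prod".
total=$(awk -F',' 'NR > 1 && $2 == "prod" {sum += $3} END {print sum+0}' "$tmp/deploys.csv")
echo "$total"
```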


Difference between awk and sed

| Feature | awk | sed |
| --- | --- | --- |
| Purpose | Programming-style text processor | Stream editor |
| Best Used For | Parsing and processing structured files (logs, CSV) | Find & replace, line edits, simple filters |
| Limitation | Needs structured formats | Not good for complex logic |

Summary:

  • Use awk when you want columns, conditions, counts, loops

  • Use sed when you want quick replacements or line changes


Real-World DevOps Use Cases

  • Parse logs to extract IPs, timestamps, or error types

  • Generate quick reports from .csv files

  • Filter logs based on date/time or event type

  • Count occurrences of issues in system logs

What is sed?

sed stands for Stream Editor. It reads input line by line (as a stream), performs operations (like search, replace, delete), and outputs the result.

💡 Think of sed as a robot that edits your file while reading it — no need to open it in an editor.


awk vs sed

| Feature | awk | sed |
| --- | --- | --- |
| Use-case | Extracting, printing, filtering structured data | Editing and transforming text |
| Syntax style | Mini-programming language with pattern { action } blocks | Short editing commands such as s, p, and d |
| Data structure | Works with columns (CSV/TSV/logs) | Works line-by-line, unstructured |
| Ideal for | Reports, CSVs, log filtering | Search & replace, quick edits |

Basic Syntax:

sed 'expression' filename

Let's compare it with awk:

awk '/INFO/' app.log
sed -n '/INFO/p' app.log

By default sed prints every line it reads. To show only the matched lines, use the -n flag to turn that off and the p command to print the matches:

sed -n '/INFO/p' app.log
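The effect of -n is easiest to see side by side. This sketch uses a three-line sample file invented for the demo:

```bash
tmp=$(mktemp -d)
printf 'INFO start\nERROR fail\nINFO done\n' > "$tmp/app.log"

# Without -n, sed prints every line automatically and p prints the matches again
without_n=$(sed '/INFO/p' "$tmp/app.log")
# With -n, only the lines printed explicitly by p appear
with_n=$(sed -n '/INFO/p' "$tmp/app.log")

echo "$without_n"
echo "---"
echo "$with_n"
```

Without -n the two INFO lines show up twice and the ERROR line once; with -n only the two INFO lines remain.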

Search & Replace

Replace all instances of INFO with LOG in the entire file:

```bash
sed 's/INFO/LOG/g' app.log
# - s = substitute
# - INFO = the pattern to search
# - LOG = replacement
# - g = globally (all matches on the line)
```
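To see what the g flag changes, here is a small sketch on a single made-up line that contains INFO twice:

```bash
line='INFO: retry INFO: ok'

# Without g, only the first match on each line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/INFO/LOG/')
# With g, every match on the line is replaced
all_matches=$(printf '%s\n' "$line" | sed 's/INFO/LOG/g')

echo "$first_only"
echo "$all_matches"
```

In both cases sed writes the result to standard output; the input itself is left unchanged.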

Show Line Numbers with a Match

Show only the line numbers where INFO appears:

sed -n -e '/INFO/=' app.log

# Explanation:
# -n: suppress automatic printing
# -e: add an expression (script) to run
# /INFO/=: print the line number of each matching line

Show both line numbers and matching lines:

sed -n -e '/INFO/=' -e '/INFO/p' app.log
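A quick sketch of what this output looks like, using a three-line sample file invented for the demo. Note that the = command prints each line number on its own line, before the matching line.

```bash
tmp=$(mktemp -d)
printf 'INFO start\nERROR fail\nINFO done\n' > "$tmp/app.log"

# Line numbers only
numbers=$(sed -n -e '/INFO/=' "$tmp/app.log")
# Line number, then the matching line, for every match
both=$(sed -n -e '/INFO/=' -e '/INFO/p' "$tmp/app.log")

echo "$numbers"
echo "---"
echo "$both"
```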

Replace Only in a Specific Line Range

Replace INFO with LOG only in the first 10 lines:

sed '1,10 s/INFO/LOG/g' app.log
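The same address syntax works for any range. As a small sketch, here it is with the range 1,2 on a three-line sample (invented for the demo) so the boundary is easy to spot:

```bash
tmp=$(mktemp -d)
printf 'INFO one\nINFO two\nINFO three\n' > "$tmp/app.log"

# The 1,2 address limits the substitution to lines 1 and 2;
# line 3 is still printed, just unchanged
ranged=$(sed '1,2 s/INFO/LOG/g' "$tmp/app.log")
echo "$ranged"
```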

Print Only the First 10 Lines, with Replacements Applied

sed -n '1,10 s/INFO/LOG/g; 1,10p; 10q' app.log

# 🧠 What’s going on here?
# -n → Suppress automatic printing, so each line appears only once
# 1,10 s/INFO/LOG/g → Replace within lines 1 to 10
# 1,10p → Print lines 1 to 10
# 10q → Quit after line 10, so the rest of the file is never read
# ; → Separates multiple commands
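How -n and q interact is easy to get wrong, so here is a sketch on a generated 12-line file (sample data, made up for the demo) comparing two variants: one without -n that quits at line 11, and one with -n that quits at line 10.

```bash
tmp=$(mktemp -d)
# 12 numbered sample lines: "INFO 1" ... "INFO 12"
seq 1 12 | sed 's/^/INFO /' > "$tmp/app.log"

# Without -n: autoprint plus p doubles lines 1-10, and q prints line 11 before exiting
no_n_count=$(sed '1,10 s/INFO/LOG/g; 1,10p; 11q' "$tmp/app.log" | awk 'END {print NR}')
# With -n and 10q: each of the first 10 lines is printed exactly once, then sed stops
with_n=$(sed -n '1,10 s/INFO/LOG/g; 1,10p; 10q' "$tmp/app.log")
with_n_count=$(printf '%s\n' "$with_n" | awk 'END {print NR}')

echo "without -n: $no_n_count lines"
echo "with -n:    $with_n_count lines"
```

The first variant produces 21 lines of output, the second produces 10.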

Real-World DevOps Use Cases

| Use Case | Example |
| --- | --- |
| Replace config variables | Replace localhost with IP |
| Clean up logs | Remove or mask sensitive data |
| Automation | Update version tags or comments in scripts |
| Pipeline | Use in Dockerfile, CI/CD shell, bash scripts |
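The first two rows can be sketched in one command. The config file, its key names, and the IP address 10.0.0.5 below are all hypothetical, chosen only to make the example runnable:

```bash
tmp=$(mktemp -d)
cat > "$tmp/app.conf" <<'EOF'
db_host=localhost
db_password=s3cret
EOF

# First expression: swap localhost for a (made-up) IP address.
# Second expression: keep the "db_password=" prefix (\1) and mask the value.
masked=$(sed -e 's/localhost/10.0.0.5/' -e 's/^\(db_password=\).*/\1****/' "$tmp/app.conf")
echo "$masked"
```

This prints the edited text and leaves app.conf untouched. GNU sed's -i flag writes the changes back to the file, so it is worth checking the output like this first.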

Summary Cheat Sheet

| Task | sed Command |
| --- | --- |
| Match lines | sed -n '/INFO/p' app.log |
| Replace globally | sed 's/INFO/LOG/g' app.log |
| Replace in range | sed '1,10 s/INFO/LOG/g' app.log |
| Show line numbers with match | sed -n -e '/INFO/=' app.log |
| Show both line & content | sed -n -e '/INFO/=' -e '/INFO/p' app.log |
| Replace & print only 10 lines | sed -n '1,10 s/INFO/LOG/g; 1,10p; 10q' app.log |

What is grep?

grep stands for Global Regular Expression Print.

It’s like a powerful filter — it scans a file or stream, searches for a pattern, and prints only those lines that match.

Real-Life Analogy

Imagine you’re reading a 200-page book 📖, but you’re only interested in the pages that talk about "AWS". Instead of reading the whole book, you just search for the word "AWS" and mark those pages.

That’s what grep does — it finds and highlights the relevant lines in big files like logs, scripts, or processes.


Basic Usage

Find lines containing the word INFO:

grep INFO app.log
grep -i info app.log
# -i makes the search case-insensitive

Count how many matches:

grep -i -c info app.log
# -c stands for count of matching lines
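Because -c counts lines rather than individual matches, a line with two hits still counts once. Here is a sketch on made-up sample data. The -o flag used for the second count (print each match on its own line) is available in GNU and BSD grep but is not part of POSIX.

```bash
tmp=$(mktemp -d)
printf 'INFO a INFO b\nerror\ninfo c\n' > "$tmp/app.log"

# -c counts matching lines: line 1 and line 3
matching_lines=$(grep -i -c info "$tmp/app.log")
# -o prints every match separately, so counting those lines counts occurrences
occurrences=$(grep -i -o info "$tmp/app.log" | awk 'END {print NR}')

echo "lines: $matching_lines, occurrences: $occurrences"
```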

Same Thing in awk?

You can get a similar count in awk (this pattern is case-sensitive, unlike grep -i):

awk '/INFO/ {count++} END {print count}' app.log

So why do we need both?
Because each tool has different strengths.


awk vs sed vs grep – What's the Difference?

| Tool | Best For | Structure Needed | Real Power |
| --- | --- | --- | --- |
| grep | Simple search and match | No structure needed | Fast filtering |
| awk | Column-based logic | Needs structured data | Logic, conditions, counters |
| sed | Inline editing | Works line-by-line | Search & replace, line edit |

DevOps Use Case Example

See all running processes:

ps aux

Filter the lines that mention ubuntu (mostly processes run by the ubuntu user):

ps aux | grep ubuntu

Now get only the 2nd column (PID):

ps aux | grep ubuntu | awk '{print $2}'
# ps aux shows all processes
# grep ubuntu keeps every line containing "ubuntu", which can include the grep command itself
# awk '{print $2}' shows the process ID
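Since grep matches the word anywhere on the line, awk can do the user filter more precisely by comparing column 1. The sketch below runs against a trimmed, hypothetical ps-style sample saved to a file so the result is repeatable; real ps aux output has more columns, but USER and PID are still the first two.

```bash
tmp=$(mktemp -d)
# Trimmed sample in the style of ps aux output (hypothetical processes)
cat > "$tmp/ps.txt" <<'EOF'
USER       PID %CPU %MEM COMMAND
ubuntu    1201  0.0  0.1 sshd: ubuntu@pts/0
root      1350  0.0  0.0 tail -f /home/ubuntu/app.log
ubuntu    1422  0.1  0.3 python3 app.py
EOF

# grep matches "ubuntu" anywhere on the line, so root's tail process slips in
by_grep=$(grep ubuntu "$tmp/ps.txt" | awk '{print $2}')
# Comparing column 1 exactly keeps only processes owned by ubuntu
by_user=$(awk '$1 == "ubuntu" {print $2}' "$tmp/ps.txt")

echo "$by_grep"
echo "---"
echo "$by_user"
```

On a live system the equivalent would be ps aux | awk '$1 == "ubuntu" {print $2}'.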

Bonus: Powerful Grep Flags

| Flag | Meaning | Example |
| --- | --- | --- |
| -i | Ignore case | grep -i info app.log |
| -c | Count matching lines | grep -c INFO app.log |
| -v | Invert match (exclude pattern) | grep -v INFO app.log |
| -n | Show line numbers | grep -n INFO app.log |
| -r | Recursive search in folders | grep -r "ERROR" /var/log/ |
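A short sketch of -v and -n in action, on a three-line sample file invented for the demo:

```bash
tmp=$(mktemp -d)
printf 'INFO start\nERROR fail\nINFO done\n' > "$tmp/app.log"

# -v keeps only the lines that do NOT match
not_info=$(grep -v INFO "$tmp/app.log")
# -n prefixes each matching line with its line number and a colon
numbered=$(grep -n ERROR "$tmp/app.log")

echo "$not_info"
echo "$numbered"
```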

Summary: Choose the Right Tool

| Task | Tool |
| --- | --- |
| Quick search | grep |
| Column-based filtering or counting | awk |
| Inline find & replace | sed |
| Complex report from structured logs | awk |
| Editing config files or scripts in place | sed |
