Text Processing Tools

Swati Verma

1. grep

  • The grep command is used for searching and filtering text in files based on patterns (regular expressions) or simple strings.

  • grep stands for "Global Regular Expression Print"

  • grep can be used almost anywhere: on one or more files, across directories, and on the output of other commands.

    Syntax

      grep [OPTION]... PATTERN [FILE]...
    

16 Cases of the grep command

  1. To ignore upper and lower case while searching

    By default grep is case sensitive, so a keyword typed in the wrong case returns no results.

    Use -i (ignore case):

     grep -i <keyword> <file-name>
    

    The keyword can also be a pattern (regular expression).

  2. To print every line except those that match the given pattern/keyword:

     grep -v <keyword> <file-name>
    

  3. To count the lines in a file that contain the given keyword

     grep -c <keyword> <file-name>
    

    Counts the number of lines in <file-name> that contain <keyword> and prints the count.
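
As a minimal sketch of cases 1 to 3, the snippet below builds a small throwaway file and compares a plain search with -i, -v and -c. The file contents and the variable names are assumptions made for illustration; they are not the file used in this article.

```shell
# Hypothetical sample file; not the file used in the article.
tmp=$(mktemp)
printf '%s\n' 'Linux is free' 'linux is fast' 'Windows is common' > "$tmp"

# Case-sensitive search: "LINUX" does not occur, so grep prints nothing.
grep 'LINUX' "$tmp" || echo 'no match (search is case sensitive)'

ignore_case=$(grep -i 'LINUX' "$tmp")   # lines containing linux in any case
inverted=$(grep -v -i 'linux' "$tmp")   # lines that do NOT contain linux
count=$(grep -c -i 'linux' "$tmp")      # number of matching lines, not occurrences

printf '%s\n' "$ignore_case" "$inverted" "$count"
rm -f "$tmp"
```

Note that -c reports matching lines, so a line that contains the keyword twice still counts once.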

  4. To match the given keyword only as a whole word in a file

     grep -w <keyword> <file-name>
    

  5. To print the line number of each match of the given keyword in a file

     grep -n <keyword> <file-name>
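
To see what -w and -n change, here is a small sketch on a made-up three-line file (the contents are an assumption for illustration). A plain search treats the keyword as a substring, while -w only accepts it as a separate word.

```shell
# Hypothetical sample file.
tmp=$(mktemp)
printf '%s\n' 'cat' 'category' 'my cat sleeps' > "$tmp"

substring=$(grep -c 'cat' "$tmp")      # counts all three lines, "category" included
whole_word=$(grep -c -w 'cat' "$tmp")  # counts only lines where cat is a separate word
numbered=$(grep -n -w 'cat' "$tmp")    # each match is prefixed with its line number

printf '%s\n' "$substring" "$whole_word" "$numbered"
rm -f "$tmp"
```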
    

  6. To search a given keyword in multiple files

     grep -n <keyword> <file-name1> <file-name2>
    

  7. To suppress file names while searching for a given keyword in multiple files

     grep -h <keyword> <file-name1> <file-name2>
    

  8. To search multiple keywords in a file

     grep -e <keyword1> -e <keyword2> <file-name>
    

  9. To search multiple keywords in multiple files

     grep -e <keyword1> -e <keyword2> <file-name1> <file-name2>
    

  10. To print only the names of the files that contain the given keyword

     grep -l <keyword> <file-name1> <file-name2>
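
The multi-file options are easier to compare side by side. The sketch below assumes two made-up log files, a.log and b.log, created in a temporary directory; neither the names nor the contents come from this article.

```shell
# Hypothetical log files in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'error: disk full' 'info: started' > "$dir/a.log"
printf '%s\n' 'warning: low memory' 'info: stopped' > "$dir/b.log"

with_names=$(cd "$dir" && grep 'info' a.log b.log)               # default: file name prefix
without_names=$(cd "$dir" && grep -h 'info' a.log b.log)         # -h drops the prefix
either=$(cd "$dir" && grep -h -e 'error' -e 'warning' a.log b.log)  # lines matching any -e pattern
files_only=$(cd "$dir" && grep -l 'error' a.log b.log)           # only names of matching files

printf '%s\n' "$with_names" "$without_names" "$either" "$files_only"
rm -rf "$dir"
```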
    

  11. To read the keywords/patterns from one file and match them against another file

    The -f option lets you specify a file that contains the patterns to search for, one per line.

    If you want to search for many keywords, create a file with those keywords and pass it with -f instead of typing them all on the terminal.

    Syntax

     grep -f pattern_file target_file
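
A short sketch of -f, assuming a hypothetical patterns.txt with two keywords and a hypothetical menu.txt to search in:

```shell
# Hypothetical pattern file and target file.
dir=$(mktemp -d)
printf '%s\n' 'apple' 'mango' > "$dir/patterns.txt"
printf '%s\n' 'apple pie' 'banana bread' 'mango shake' > "$dir/menu.txt"

# Every line of patterns.txt is treated as one pattern.
matches=$(grep -f "$dir/patterns.txt" "$dir/menu.txt")

printf '%s\n' "$matches"
rm -rf "$dir"
```

One thing to watch for: an empty line in the pattern file is an empty pattern, which matches every line of the target file.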

  12. To print the lines that start with the given keyword

     grep "^keyword" file

    “^” → caret, anchors the match to the start of the line

  13. To print the lines that end with the given keyword

     grep "keyword$" file
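
The two anchors can be tried on a small made-up file (contents are an assumption for illustration). Single quotes are used here so the shell never tries to interpret the $ sign.

```shell
# Hypothetical sample file.
tmp=$(mktemp)
printf '%s\n' 'error at start' 'ends with error' 'no match here' > "$tmp"

starts=$(grep '^error' "$tmp")   # keyword must be at the beginning of the line
ends=$(grep 'error$' "$tmp")     # keyword must be at the end of the line

printf '%s\n' "$starts" "$ends"
rm -f "$tmp"
```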

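  14. To search all the files in a directory (say, 100 files) recursively

     grep -R -f <pattern-file> <directory-name>/

    Here -R searches the directory recursively and -f reads the patterns from <pattern-file>. A single keyword can be searched the same way by replacing -f <pattern-file> with the keyword.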

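  15. To search multiple keywords with egrep

     egrep "key1|key2|key3" <file-name>

    egrep is equivalent to grep -E (extended regular expressions). Recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer spelling.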

  16. To search without printing anything on the terminal

     grep -q "keyword" <file-name>

     echo $?

    $? holds the exit status of the previously executed command or script. For grep:

    0 - at least one line matched

    1 - no line matched

    2 (or higher) - an error occurred, such as a missing or unreadable file
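
In scripts, the exit status is usually tested directly instead of being printed with echo. The sketch below assumes a made-up one-line file; the variable names are illustrative only.

```shell
# Hypothetical sample file.
tmp=$(mktemp)
printf '%s\n' 'status: ok' > "$tmp"

# grep -q prints nothing; "if" reacts to its exit status.
if grep -q 'ok' "$tmp"; then found=yes; else found=no; fi

# Capture the exit status without aborting a script that runs under "set -e".
grep -q 'failed' "$tmp" && no_match=0 || no_match=$?
grep -q -s 'ok' "$tmp.missing" && missing=0 || missing=$?   # -s hides the error message

echo "found=$found no_match=$no_match missing=$missing"
rm -f "$tmp"
```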

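To suppress error messages about nonexistent or unreadable files, use the -s option.

  • grep can also filter the output of another command through a pipe: ls | grep filename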


2. awk

  • awk is particularly useful for processing structured text data, such as tabular data, log files, CSV, or space-separated files.

  • It is designed to process text line by line and allows you to perform various tasks.

  • Syntax

      awk 'pattern {action}' file
    
  • pattern → Defines a condition (optional).

  • action → Specifies what to do when the pattern matches.

  • file → Input file to process.

Sample file: data

Commands

  1. Print Entire File

     awk '{print}' <file-name>
    

  2. Print a Specific Column

     awk '{print $1}' <file-name>
    

  3. To print multiple columns

     awk '{print $1, $3}' <file-name>
    

    Use case: print only the last column of the ls -l output (the file names, as long as they contain no spaces)

     ls -l | awk '{print $NF}'
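
A small sketch of column printing. The sample rows (name, role, degree, salary) are an assumption for illustration and not the data file from this article. It also shows why the comma in print matters.

```shell
# Hypothetical space-separated data: name role degree salary
tmp=$(mktemp)
printf '%s\n' 'Asha pilot BTech 700000' 'Ravi engineer MTech 550000' > "$tmp"

first=$(awk '{print $1}' "$tmp")        # first column
spaced=$(awk '{print $1, $3}' "$tmp")   # comma: fields are joined with a space
joined=$(awk '{print $1 $3}' "$tmp")    # no comma: fields are concatenated
last=$(awk '{print $NF}' "$tmp")        # last column

printf '%s\n' "$first" "$spaced" "$joined" "$last"
rm -f "$tmp"
```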
    

  4. Filter Lines Matching a Pattern

     awk '/pattern/ {print}' <file-name>
    

  5. Use Conditions (if statements)

     awk '{if($4 >= 600000) print $0}' <file-name>
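
The if statement can also be written as a pattern in front of the action, which is the more common awk style. The sketch below uses made-up rows (an assumption for illustration) and checks that both spellings select the same lines.

```shell
# Hypothetical data: name role degree salary
tmp=$(mktemp)
printf '%s\n' 'Asha pilot BTech 700000' 'Ravi engineer MTech 550000' 'Meena pilot MBA 600000' > "$tmp"

with_if=$(awk '{if($4 >= 600000) print $0}' "$tmp")
with_pattern=$(awk '$4 >= 600000' "$tmp")                 # no action: matching lines are printed
pilots=$(awk '/pilot/ && $4 > 600000 {print $1}' "$tmp")  # regex and comparison combined

printf '%s\n' "$with_if" "$with_pattern" "$pilots"
rm -f "$tmp"
```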
    

  6. Count Lines in a File

     awk 'END {print NR}' <file-name>
    

    NR (Number of Records) → A built-in awk variable that holds the number of lines read so far, that is, the current line number. In the END block it equals the total number of lines.

    END → Ensures the action is performed after all lines of the file have been read.

  7. To get last column

     awk '{print $NF}' <file-name>
    

    NF is a built-in awk variable that stands for Number of Fields in a line. It represents the total number of columns (fields) in the current line.

    $NF → Refers to the last field of each line.

  8. Using a custom delimiter in the output

      awk '{print $2 ":" $NF}' <file-name>
    

  9. Sum a Column

     awk '{sum += $4} END {print "Total salary:", sum}' <file-name>
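
Here is the same sum on made-up rows, plus an average as a small extension. The data and the salary column position are assumptions for illustration.

```shell
# Hypothetical data: name role degree salary
tmp=$(mktemp)
printf '%s\n' 'Asha pilot BTech 700000' 'Ravi engineer MTech 550000' 'Meena pilot MBA 600000' > "$tmp"

# sum starts out empty and is treated as 0 the first time it is used.
total=$(awk '{sum += $4} END {print "Total salary:", sum}' "$tmp")

# NR holds the number of lines read; the guard avoids dividing by zero on an empty file.
average=$(awk '{sum += $4} END {if (NR > 0) printf "%.2f\n", sum / NR}' "$tmp")

printf '%s\n' "$total" "$average"
rm -f "$tmp"
```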
    

  10. To get a specific row

    awk 'NR==3 {print}' <file-name>
    

  11. Range of lines

    awk 'NR==4, NR==5 {print}' <file-name>
    

  12. Print the line numbers of empty lines.

    awk 'NF==0 {print NR}' file_name
    

    NF==0 → Checks if the line has zero fields (i.e., an empty line).
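
NR and NF can be tried together on a made-up file with two empty lines (contents are an assumption for illustration).

```shell
# Hypothetical file: lines 2 and 4 are empty.
tmp=$(mktemp)
printf '%s\n' 'one two' '' 'three four five' '' > "$tmp"

total=$(awk 'END {print NR}' "$tmp")       # total number of lines
empty=$(awk 'NF==0 {print NR}' "$tmp")     # line numbers of lines with no fields
third=$(awk 'NR==3 {print $NF}' "$tmp")    # last field of line 3

printf '%s\n' "$total" "$empty" "$third"
rm -f "$tmp"
```

With the default field separator, a line that contains only spaces or tabs also has NF equal to 0.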

  13. Updating a file

    The changes made by the awk command are not permanent. awk reads the file, applies the modifications in memory, and prints the result to standard output; it does not alter the original file.
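
If the change should be kept, one common approach is to redirect the output to a temporary file and then move it over the original. This is a sketch under assumptions: the file name data, its two columns, and the +1000 adjustment are all hypothetical.

```shell
# Hypothetical two-column file: name salary
dir=$(mktemp -d)
printf '%s\n' 'Asha 700000' 'Ravi 550000' > "$dir/data"

# Write the modified lines to a temporary file, then replace the original
# only if awk succeeded.
awk '{$2 = $2 + 1000; print}' "$dir/data" > "$dir/data.tmp" && mv "$dir/data.tmp" "$dir/data"

updated=$(cat "$dir/data")
printf '%s\n' "$updated"
rm -rf "$dir"
```

Redirecting straight back to the input file (awk ... data > data) would empty the file before awk reads it, which is why the temporary file is needed.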

  14. -F option

    The -F option in the awk command is used to specify the field separator (delimiter) in the input data. It tells awk how to split each line of input into fields.

    awk -F, '{print $2 " " $5}' <file-name>
    

    -F, means you're specifying a comma (,) as the field separator. This is often used when working with CSV (Comma-Separated Values) files, where each field is separated by a comma
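
A quick sketch of -F on a made-up CSV file with a header row. The column layout is an assumption for illustration.

```shell
# Hypothetical CSV file with a header line.
tmp=$(mktemp)
printf '%s\n' 'id,name,role,degree,salary' '1,Asha,pilot,BTech,700000' > "$tmp"

picked=$(awk -F, '{print $2 " " $5}' "$tmp")              # columns 2 and 5 of every line
no_header=$(awk -F, 'NR > 1 {print $2 " " $5}' "$tmp")    # same, skipping the header

printf '%s\n' "$picked" "$no_header"
rm -f "$tmp"
```

This simple split does not understand quoted CSV fields, so a value such as "Verma, Swati" would be cut at the comma.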

  15. Using a for loop in awk

    awk 'BEGIN {for(i=0;i<=10;i++) print i;}'
    

    The loop runs in the BEGIN block, so no input file is needed. It prints the numbers 0 through 10.
    

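  16. Using a while loop in awk

    awk 'BEGIN {while(i<10){ i++; print "Num is " i;}}'
    

    An uninitialised awk variable counts as 0 in a numeric comparison, so i starts at 0 and the loop prints "Num is 1" through "Num is 10".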
    


3. sed

The sed (Stream Editor) command is used for searching, replacing, deleting, and modifying text in a file or input stream.

  1. To print a specific line

     sed -n '3p' file-name
    

  2. To print the last line

     sed -n '$p' file-name
    

    '$p': Prints only the last ($) line of the file.

  3. To print a range of lines.

      sed -n '1,4p' file-name
    

  4. To print the lines that contain a specific word or pattern

     sed -n '/pattern/p' file-name
    

  5. To print only the specified lines.

     sed -n -e '3p' -e  '5p' file-name
    

  6. To print only the lines that contain specified patterns

     sed -n -e'/pilot/p' -e'/BTech/p' file-name
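
The printing commands above can be checked on a made-up five-line file (contents are an assumption for illustration). The last two commands show a detail worth knowing: with two separate -e expressions, a line that matches both patterns is printed twice.

```shell
# Hypothetical sample file.
tmp=$(mktemp)
printf '%s\n' 'Asha pilot BTech' 'Ravi engineer MTech' 'Meena pilot MBA' \
  'John doctor MBBS' 'Sara teacher BTech' > "$tmp"

line3=$(sed -n '3p' "$tmp")        # only line 3
last=$(sed -n '$p' "$tmp")         # only the last line
range=$(sed -n '1,2p' "$tmp")      # lines 1 to 2

# Line 1 matches both patterns, so it appears twice in the output.
twice=$(sed -n -e '/pilot/p' -e '/BTech/p' "$tmp")

# One combined pattern prints each matching line once
# (assumes a sed that supports -E, such as GNU or BSD sed).
once=$(sed -n -E '/pilot|BTech/p' "$tmp")

printf '%s\n' "$line3" "$last" "$range" "$twice" "$once"
rm -f "$tmp"
```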
    

  7. To print a specified line and the next few lines from the given file (the addr,+N form is a GNU sed extension).

     sed -n '3,+3p' file-name
    

  8. To print every second line starting from the given line (the first~step form is a GNU sed extension).

     sed -n '3~2p' file-name
    

  9. To read the expressions from an external file.

    Here ext.txt is a file that contains the sed expressions, one per line.

     sed -n -f ext.txt file-name
    

  10. To replace every occurrence of a word and print the result (the file itself is not modified)

    sed 's/old-string/new-string/g' file-name
    

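  11. To replace all occurrences of a string only on the specified line of the file, leaving other lines unchanged.

    sed '3s/BTech/MTech/g' file-name
    

    To replace on every line except the specified one, put ! after the line number. Here BTech becomes MBA on every line except line 7 of the file data:

    sed '7!s/BTech/MBA/g' data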
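
The effect of the g flag, a line address, and the ! negation is easiest to see on a tiny made-up file (contents are an assumption for illustration).

```shell
# Hypothetical file: line 1 contains the word twice.
tmp=$(mktemp)
printf '%s\n' 'BTech BTech' 'BTech' 'BTech' > "$tmp"

first_only=$(sed 's/BTech/MTech/' "$tmp")    # without g: first occurrence per line
all=$(sed 's/BTech/MTech/g' "$tmp")          # with g: every occurrence
line2=$(sed '2s/BTech/MTech/g' "$tmp")       # only on line 2
not_line2=$(sed '2!s/BTech/MBA/g' "$tmp")    # on every line except line 2

printf '%s\n' "$first_only" "$all" "$line2" "$not_line2"
rm -f "$tmp"
```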
    

  12. To delete a line

    sed '1d' file-name
    

    Deleting a range of lines

    sed '1,3d' file-name
    

