Python File Analysis: Counting Lines and Words

Omini Okoi
5 min read

Introduction

Python's standard library makes it straightforward to read and analyze text files. In this article, we'll walk through four short Python scripts that count lines and words in text files, plus a setup script that creates the sample files. For each script, we'll show the code and explain step by step how it works.

Let's start by creating the folder and files we will be working with. The following script creates a directory named training and an empty .txt file inside it for each title in the list:

Script 0: Create Files in Folder

#!/usr/bin/python3
# create_files.py
import os

# Specify the directory path
directory_path = 'training'

# List of file names
file_names = [
    'lightyear.txt',
    'mavka.txt',
    'moana.txt',
    'mummies.txt',
    'puss-in-boots.txt',
    'alita.txt',
    'encanto.txt',
    'seabeasts.txt',
    'spider-man.txt',
    'the-magicians-elephant.txt',
    'vivo.txt'
]

# Create the directory if it doesn't exist
os.makedirs(directory_path, exist_ok=True)

# Create the files in the directory
for file_name in file_names:
    file_path = os.path.join(directory_path, file_name)
    open(file_path, 'w').close()

print(f'Files created in the "{directory_path}" directory.')
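One thing to watch: Script 0 opens each file in 'w' mode, which empties any file that already exists, so re-running it after you have added text would wipe that text. A minimal pathlib-based sketch that avoids this is shown below. The function name create_empty_files is hypothetical; Path.mkdir() and Path.touch() are standard pathlib methods.

```python
from pathlib import Path


def create_empty_files(directory, file_names):
    """Create `directory` (and any missing parents) plus an empty file per name.

    Hypothetical helper; returns the list of Path objects it created or found.
    """
    folder = Path(directory)
    # exist_ok=True makes this safe to re-run: no error if the folder exists
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for name in file_names:
        path = folder / name
        # touch() creates the file if it is missing; unlike open(path, 'w'),
        # it does not truncate a file that already has content
        path.touch()
        created.append(path)
    return created
```

For example, create_empty_files('training', ['vivo.txt', 'moana.txt']) creates the folder and both files, and calling it again leaves any text you have added in place.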

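Once the files exist, populate them with text of your choice, such as soundtrack theme lyrics or cast information. Scripts 1 to 4 open files relative to the current directory ('.'), so run them from inside the training folder, or change '.' to 'training' in each path.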

Script 1: Counting Lines in a File

#!/usr/bin/python3
import os

file = os.path.join('.', 'vivo.txt')
with open(file, 'r') as f:
    lines = f.readlines()
    line_count = len(lines)

print(line_count)

This script is designed to count the number of lines in a specified text file. Here's how it works:

  1. It imports the os module, whose os.path.join() function is used to build the file path.

  2. It defines the file variable, indicating the path to the target text file.

  3. It uses a context manager (with open(file, 'r') as f) to open the file in read mode and create a file object f.

  4. It reads all the lines from the file using f.readlines() and stores them in the lines list.

  5. It calculates the line count by finding the length of the lines list using len(lines).

  6. Finally, it prints the line count to the console.
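Because f.readlines() loads every line into memory at once, a minimal alternative for large files is to iterate over the file object and count lines as they stream past. This sketch assumes UTF-8 text (the original script uses the platform's default encoding), and count_lines is a hypothetical helper name.

```python
def count_lines(path):
    """Count lines without loading the whole file into memory (hypothetical helper)."""
    with open(path, 'r', encoding='utf-8') as f:
        # Iterating over a file object yields one line at a time,
        # so memory use stays small regardless of file size
        return sum(1 for _ in f)
```

This gives the same result as len(f.readlines()): a final line without a trailing newline still counts, and an empty file gives 0. It can differ by one from the Unix wc -l command, which counts newline characters rather than lines.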

Script 2: Counting Occurrences of Words in a File

#!/usr/bin/python3
import os
import re
from collections import Counter

file = os.path.join('.', 'seabeasts.txt')

with open(file, 'r') as f:
    text = f.read()
    words = re.findall(r'\b\w+\b', text.lower())
    word_counts = Counter(words)
    for word, count in word_counts.items():
        print(f'{word}: {count}')

This script counts how many times each distinct word appears in a specified text file. Here's a breakdown of how it works:

  1. It imports the os module to work with files, the re module for regular expressions, and the Counter class from the collections module.

  2. It defines the file variable, indicating the path to the target text file.

  3. Using a context manager, it opens the file in read mode ('r') and creates a file object f.

  4. It reads the entire text content from the file using f.read() and converts it to lowercase to ensure case-insensitive counting.

  5. It uses the re.findall() function with the pattern \b\w+\b to extract every word, where a word is a run of letters, digits, or underscores.

  6. It creates a word count dictionary using the Counter class to count the occurrences of each word.

  7. Finally, it iterates through the dictionary and prints each word along with its count to the console.
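The counting logic above can be wrapped in a small reusable sketch that also uses Counter.most_common() to print only the most frequent words. The names word_frequencies and print_top_words are hypothetical. Note that \w does not match apostrophes, so this pattern (the same one Script 2 uses) splits "don't" into "don" and "t".

```python
import re
from collections import Counter

# Same pattern as Script 2: runs of word characters (letters, digits, underscore)
WORD_PATTERN = re.compile(r'\b\w+\b')


def word_frequencies(text):
    """Return a Counter of the lowercase words found in `text` (hypothetical helper)."""
    return Counter(WORD_PATTERN.findall(text.lower()))


def print_top_words(counts, n=10):
    """Print the `n` most frequent words, highest count first."""
    for word, count in counts.most_common(n):
        print(f'{word}: {count}')
```

For example, print_top_words(word_frequencies(text), n=5) prints the five most common words in text instead of every word in the file.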

Script 3: Counting Words in a File

#!/usr/bin/python3
import os

file = os.path.join('.', 'puss-in-boots.txt')

with open(file, 'r') as f:
    text = f.read()

# split() with no arguments splits on any run of whitespace
word_count = len(text.split())
print(word_count)

This script counts the total number of words in a specified text file. Here's how it works:

  1. It imports the os module, whose os.path.join() function is used to build the file path.

  2. It defines the file variable, indicating the path to the target text file.

  3. Using a context manager, it opens the file in read mode ('r') and creates a file object f.

  4. It reads the entire text content from the file using f.read().

  5. It splits the text into words using text.split(), which breaks on any run of whitespace, and calculates the word count as the length of the resulting list.

  6. It prints the word count to the console.
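Because text.split() counts whitespace-separated tokens, punctuation stays attached to words and hyphenated or contracted words count once, so Script 3 and the regular expression from Script 2 can report different totals for the same text. The sketch below puts the two approaches side by side; both function names are hypothetical.

```python
import re


def count_words_split(text):
    # str.split() with no argument splits on any run of whitespace
    # and ignores leading and trailing whitespace
    return len(text.split())


def count_words_regex(text):
    # Counts runs of word characters, so apostrophes, hyphens, and
    # other punctuation act as separators
    return len(re.findall(r'\b\w+\b', text))
```

For the sentence "It's a well-known tale.", count_words_split returns 4 while count_words_regex returns 6, because the regex counts "It", "s", "well", and "known" separately. Neither is wrong; pick the definition of "word" that suits your analysis.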

Script 4: Counting Words in Multiple Files

#!/usr/bin/python3
import os
import re
from collections import Counter

word_counts = Counter()

for file in os.listdir('.'):
    if file.endswith('.txt'):
        filename = os.path.join('.', file)
        with open(filename, 'r') as f:
            text = f.read()
        words = re.findall(r'\b\w+\b', text.lower())
        # Add this file's words to the running totals
        word_counts.update(words)

# Print the combined counts once, after every file has been processed
for word, count in word_counts.items():
    print(f'{word}: {count}')

This script counts the occurrences of words across all .txt files in the current directory and prints the combined totals. Here's how it works:

  1. It imports the os module to work with files, the re module for regular expressions, and the Counter class from the collections module.

  2. It initializes an empty Counter object, word_counts, to hold the running totals.

  3. It uses a for loop to iterate through all entries in the current directory (os.listdir('.')).

  4. For each file whose name ends with ".txt", it constructs the full file path using os.path.join() and opens the file in read mode ('r') using a context manager.

  5. It reads the entire text content from the file, converts it to lowercase, and uses a regular expression to extract all words.

  6. It passes each file's words to word_counts.update(words), which adds them to the existing totals instead of replacing them.

  7. Finally, once every file has been processed, it iterates through the combined counts and prints each word along with its count to the console.
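Two details are worth knowing about this approach: os.listdir() returns entries in arbitrary order, and a single combined Counter loses track of which file each word came from. A minimal pathlib-based sketch that processes files in sorted order and keeps per-file counts alongside the combined total is shown below. It assumes UTF-8 files, and count_words_in_directory is a hypothetical name.

```python
import re
from collections import Counter
from pathlib import Path

# Same word pattern as Scripts 2 and 4
WORD_PATTERN = re.compile(r'\b\w+\b')


def count_words_in_directory(directory='.'):
    """Return (combined Counter, {file name: Counter}) for each .txt file in `directory`."""
    combined = Counter()
    per_file = {}
    # glob() yields paths in arbitrary order; sorted() makes the order stable
    for path in sorted(Path(directory).glob('*.txt')):
        if not path.is_file():
            continue  # skip anything that is not a regular file
        text = path.read_text(encoding='utf-8')
        counts = Counter(WORD_PATTERN.findall(text.lower()))
        per_file[path.name] = counts
        # update() with a Counter adds counts rather than replacing them
        combined.update(counts)
    return combined, per_file
```

For example, combined, per_file = count_words_in_directory('training') followed by a loop over combined.most_common(10) prints the ten most frequent words across all files, while per_file['vivo.txt'] holds the counts for that file alone.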

Conclusion

These Python scripts show simple ways to analyze text files by counting lines and words. Whether you need the number of lines in a file, the frequency of each word, the total word count, or combined word counts across several files, they demonstrate how a few lines of standard-library Python can handle the task. By understanding and adapting them, you can add this kind of file analysis to your own Python projects.

Feel free to clone the Git repository containing this code, available at the following GitHub URL


Written by

Omini Okoi

I am a web developer with experience in back-end engineering and cloud computing. I enjoy mentoring and training others in coding and technology. I’m always open to collaboration and building stuff that can make a positive impact.