Building your own wc tool in Node.js

Today, we are going to build a simple but extremely useful CLI application known as wc. But what is this 'wc'?

According to Wikipedia:

"wc (short for word count) is a command in Unix, Plan 9, Inferno, and Unix-like operating systems. The program reads either standard input or a list of computer files and generates one or more of the following statistics: newline count, word count, and byte count. If a list of files is provided, both individual file and total statistics follow."

I plan to call my implementation rwc. The features that we aim to implement are:

  1. Printing the number of bytes in the file

  2. Printing the number of lines in the file

  3. Printing the number of words in the file

  4. Printing the number of characters in the file

So, without further ado, let's get started!

Step 1: The setup

  • Create a new directory for the project and navigate to the directory in your terminal. My folder name is rwc.

      cd rwc
    
  • Initialize a Node.js project in the folder.

      npm init -y
    
  • Go to package.json and add another field, type.

      "type" : "module",
    

    This enables us to use import and export statements in our project.

  • Add an entry point for our program by adding another field main.

      "main" : "bin/index.js",
    
  • Create a directory bin and create a new file index.js, where we will write our code.

  • Create a new text file test.txt to test our CLI on. Fill it with whatever you like, but make sure it spans multiple lines, for proper test coverage.

With that, we have our furniture set up. Now, we need to breathe life into our instrument.

Step 2: Counting bytes

To ensure that our code is still understandable two days after we have written it, we split it into small, well-defined pieces. Each type of count gets its own function. First, we count the bytes:

const countBytes = () => {
    const bytes = Buffer.byteLength(data, "utf-8");
    output += `${bytes} bytes \t`;
};

We are using the Buffer.byteLength() method to count the number of bytes in the string, and then add a formatted string to the output.

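The output variable is declared above all the other functions to aggregate the output, together with a data variable that will hold the text being counted. Finally, we intend to console.log the output at the end.

let output = "";  // add at the beginning
let data = "";    // filled in later from the file or from standard input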
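To see why the byte count is not simply the length of the string, here is a small sketch comparing the two. The sample text is made up for illustration; Buffer is a Node.js global, so no import is needed.

```javascript
// "h\u00e9llo" is "héllo": the é takes two bytes in UTF-8.
const text = "h\u00e9llo";

console.log(text.length);                     // 5
console.log(Buffer.byteLength(text, "utf8")); // 6
```

For plain ASCII text the two numbers match, which is why the difference is easy to miss.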

Step 3: Counting lines

We create another function to count the number of lines in the file:

const countLines = () => {
    const lines = data.split(/\r?\n/).length;
    output += `${lines} lines \t`;
};

The above code splits the data into an array at each newline, and then we store the length of the array. We append a formatted string containing the count to the output. One thing to note in the regex of the split function: the optional \r in \r?\n is there to accommodate Windows systems, since Windows marks a new line with a carriage return followed by a newline. Also note that a file ending in a newline yields one extra, empty element at the end of the array, so this count can be one higher than what wc -l reports.
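Unix wc counts newline characters rather than the pieces between them. If you want rwc to agree with it, one possible approach is sketched below. The helper name newlineCount and the sample text are assumptions for illustration, not part of the code above.

```javascript
// Count newline characters, which is what wc -l reports.
const newlineCount = (text) => (text.match(/\n/g) ?? []).length;

const sample = "first\nsecond\n";

console.log(sample.split(/\r?\n/).length); // 3
console.log(newlineCount(sample));         // 2
```

Because \r\n also contains a \n, the same helper works for Windows line endings.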

Step 4: Count words

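The countWords() function is very similar to the countLines() function, except for the regex used in the split. Here, we split on every run of whitespace. Text that starts or ends with whitespace (a trailing newline, for example) produces empty strings at the edges of the array, so we filter those out before counting.

const countWords = () => {
    // drop the empty strings that leading or trailing whitespace leaves behind
    const words = data.split(/\s+/).filter(Boolean).length;
    output += `${words} words \t`;
};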
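As a quick sanity check on word counting, the sketch below shows what a bare split on whitespace returns for text with a trailing newline, next to an alternative that counts runs of non-whitespace directly. The helper name wordCount is hypothetical.

```javascript
// A word is a maximal run of non-whitespace characters.
const wordCount = (text) => (text.match(/\S+/g) ?? []).length;

console.log("hello world\n".split(/\s+/).length); // 3 (the trailing "" is counted)
console.log(wordCount("hello world\n"));          // 2
console.log(wordCount(""));                       // 0
```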

Step 5: Count characters

This function simply counts the number of characters in the file (including whitespace and newline characters). It is implemented as follows:

const countCharacters = () => {
    const chars = data.length;
    output += `${chars} characters \t`;
};
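One caveat worth knowing: a JavaScript string's length is measured in UTF-16 code units, so characters outside the Basic Multilingual Plane (most emoji, for example) count as two. If that matters for your files, spreading the string into an array counts Unicode code points instead. A small sketch, with helper names chosen only for illustration:

```javascript
const codeUnits = (text) => text.length;        // UTF-16 code units
const codePoints = (text) => [...text].length;  // Unicode code points

const accented = "h\u00e9llo";  // "héllo"
const emoji = "\u{1F600}";      // a single emoji character

console.log(codeUnits(accented), codePoints(accented)); // 5 5
console.log(codeUnits(emoji), codePoints(emoji));       // 2 1
```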

Step 6: Configure flags

Now, we need to call the appropriate functions according to the flags provided. Also, keep in mind that when no flags are provided, rwc prints all 4 counts, i.e., it assumes all flags are true.

To take care of the case where no flags are provided, we do an if check:

if(Object.keys(argv).length==2) {
    argv.c = true;
    argv.l = true;
    argv.w = true;
    argv.m = true;
}

Here, argv is the object that yargs produces after parsing the command line (the yargs package has to be installed and imported for this to work; that setup is not shown here). Object.keys(argv).length returns the number of keys in that object. Since yargs provides 2 keys by default (_ and $0), if Object.keys(argv).length returns 2, we can conclude that the user has not provided any flags. Thus, we set all the flags to true, to print out all the counts.

Finally, we create a count function, which calls the appropriate counting functions according to the status of the flags and then logs the accumulated output.
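Counting keys works for a bare yargs setup, but it can break once the parser is configured further (yargs adds an extra key for every alias, for instance). A slightly more defensive sketch checks the four flags directly. The helper withDefaultFlags is an assumption for illustration and takes any argv-shaped object.

```javascript
// The four flags rwc understands (names taken from the article).
const FLAGS = ["c", "l", "w", "m"];

// Return argv unchanged if any flag was given, otherwise a copy with all flags on.
const withDefaultFlags = (argv) => {
    const anyFlagGiven = FLAGS.some((flag) => argv[flag]);
    if (anyFlagGiven) return argv;
    return { ...argv, c: true, l: true, w: true, m: true };
};

console.log(withDefaultFlags({ _: ["test.txt"], $0: "rwc" }));
console.log(withDefaultFlags({ _: ["test.txt"], $0: "rwc", l: true }));
```

The first call turns every flag on; the second leaves the object as it was, with only l set.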

const count = () => {
    if(argv.c) countBytes();
    if(argv.l) countLines();
    if(argv.w) countWords();
    if(argv.m) countCharacters();
    console.log(output);
};

Step 7: Configure the input sources

The last major step in our program is to settle the input sources that our CLI will have. The main source of input will be the file that the user provides when running the command. However, if the user does not provide any file path, our CLI will take input from standard input.

We'll account for the two cases using a simple if check.

if (argv._.length == 0) {
    // code for getting data from standard input
} else {
    // code for getting data from file
}

argv._ is an array which contains all the unnamed arguments provided by the user (basically our file path(s)).

Let's configure the standard input first. It consists of mostly boilerplate code and looks something like this:

import readline from "readline"; // add at the top

// if (argv._.length == 0) {
    const rl = readline.createInterface({
        input: process.stdin,
        output: process.stdout
    });

    function listen() {
        rl.question('', input => {
            // typing quit() ends the session
            if (input === 'quit()')
                return rl.close();

            data = input;
            count();
            output = "";   // reset for the next line
            listen();      // wait for the next line
        });
    }

    listen(); // start waiting for the first line
// } else {

We are using Node.js's built-in readline module to accept input from standard input. We create a readline interface and point its input and output at the standard input and standard output.

We define a function listen() which waits for an input from the user. It takes each line of input, calls the count() function on the data received, and then recursively calls itself again, effectively waiting for input again. This happens until the user quits the interface by giving the input quit().

Remember to set the value of output to an empty string after each iteration, to ensure the results of the current iteration are not appended to the results of the previous iteration.
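The line-by-line loop above counts each typed line separately. Unix wc instead reads standard input until end of file and reports a single set of counts, which is what makes piping (cat test.txt | rwc) work. A minimal sketch of that alternative follows; the helper readAll is hypothetical and not part of the article's code.

```javascript
// Collect an entire stream (or any async iterable of chunks) into one string.
async function readAll(stream) {
    const chunks = [];
    for await (const chunk of stream) {
        // Chunks arrive as Buffers unless an encoding was set on the stream.
        chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    }
    return Buffer.concat(chunks).toString("utf8");
}

// Possible usage inside the standard-input branch:
// readAll(process.stdin).then((text) => {
//     data = text;
//     count();
// });
```

With this approach there is no need for a quit() command: the input ends when the pipe closes or when the user presses Ctrl+D.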

Now, onto the else part. We basically want to read the data from a file as provided by the user and call our count() method on that. It should be a simple bit of code similar to:

import fs from "fs" // add to top of file

// } else { 
    const filePath = argv._[0];
    try {
        data = fs.readFileSync(filePath, 'utf8');
    } catch (err) {
        console.error('Error reading file. Make sure you have given the correct file path.');
        process.exit(1);
    }

    count();
}

We read the file inside a try-catch block to ensure that if our perfect little application fails, it does so gracefully, with our own helpful error message. The file path is the first element in the argv._ array, and it is passed to the fs.readFileSync function, which synchronously reads the data.
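If you would like the error message to say what actually went wrong, the error thrown by fs.readFileSync carries a code property such as ENOENT (file not found) or EISDIR (path is a directory). The sketch below is one way to use it; the helper readInput and the message format are assumptions for illustration.

```javascript
import fs from "fs";

// Read a file as UTF-8 text; return either the data or a readable error message.
const readInput = (filePath) => {
    try {
        return { ok: true, data: fs.readFileSync(filePath, "utf8") };
    } catch (err) {
        const reason =
            err.code === "ENOENT" ? "No such file or directory"
            : err.code === "EISDIR" ? "Is a directory"
            : err.message;
        return { ok: false, message: `rwc: ${filePath}: ${reason}` };
    }
};

const result = readInput("rwc-definitely-missing.txt");
if (!result.ok) console.error(result.message);
```

Returning a result object instead of calling process.exit inside the helper keeps it easy to reuse, for example when looping over several file paths in argv._.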

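And that's it! Our application should work now. To run it, navigate to the project's root directory in the terminal and use:

node bin/index.js -m test.txt

This should print the number of characters in test.txt.

To test all the flags at once, run the same command without any flags.

Finally, we want to check whether our application works without any file path and takes data from standard input. Run node bin/index.js with no arguments, type a line, and the counts for that line should be printed.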

Fortunately, our expedition seems to have been a success. Congratulations!

Bonus Step: Make the application a global CLI

So far, we have always had to prefix our command with node to execute the application. The Unix wc does not have that drawback. Thus, a handy addition would be to install our CLI globally, so that it can be run from any directory with just the command rwc.

  • Navigate to package.json and add a key bin.

      "bin": {
          "rwc": "./bin/index.js"
      },
    

    Here, rwc is the command that will trigger the CLI to execute, similar to wc in Unix. For the command to run under Node, bin/index.js should start with the line #!/usr/bin/env node.

  • Finally, go to the terminal, navigate to the root folder of the project, and run:

      npm install -g .
    

And that's a wrap!

Congratulations on making it this far. If you had fun, I am delighted. If you hated every minute of it... well, that's life, I guess. Anyway, this is simply my inexperienced implementation of a popular tool, and I'm open to suggestions on how to make it cleaner and better.

I would love to hear your thoughts in the comments below! Until then, have a good one!

Tata!

Written by

Reeju Bhattacharya

I am a Software Developer from Kolkata who is trying his very best to center a div.