Building our own zsh_stats command line app
In the previous post we saw how zsh has a nice built-in function, zsh_stats, to get a summarized list of the most commonly used terminal commands.
This got me wondering, can we replicate this result ourselves? 🤔
Let’s find out.
Language of Choice
We will go with F# to do this.
I’ve been checking it out recently and found it to be a really nice concise language (like most other functional languages), with a really helpful type system and an amazing development experience. I love the ability to interactively evaluate expressions right inside the editor. Reminds me a lot of Clojure.
The article "Why is F# code so robust and reliable?" does a great job of explaining its features better than I could, so please go and check it out.
I like it so far and want to use it more, so I’m taking every chance I get to build more projects with it 👷🏾
Understanding the history file
The first thing we need to do is find out where these historical commands are written, and try to read the file ourselves.
In my case (using zsh and oh-my-zsh) I found that the path to the history file is stored in the $HISTFILE variable. This is nice since we can just pass its value to our program as an argument.
Let’s see what it looks like with cat $HISTFILE
We can see in the small sample here that there are some blank lines mixed in with others that contain a command, along with some other content. Let’s figure it out.
It looks like the lines with the commands start with a colon (:) and a space, then a number that looks like a Unix timestamp, then another colon followed by 0. Then we have a semicolon (;), and finally the actual command.
According to ChatGPT, this is the meaning of each part of the line:
: <epoch_timestamp>
- This is a Unix epoch timestamp (the number of seconds since January 1, 1970). It represents when the command was executed.
: <elapsed_time_in_seconds>
- This is the amount of time (in seconds) that the command took to execute. It's the difference between the time the command started and when it finished.
;command
- After the semicolon (;), the actual command that was executed in the shell appears. This is the command as the user typed it.
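To make that layout concrete, here is a small sketch that splits one line into its three parts. It assumes every entry has exactly the shape `: <epoch>:<elapsed>;<command>`. The HistoryEntry record and the tryParseEntry function are names made up for this illustration; they are not part of the program we build below.

```fsharp
open System

// Hypothetical record, only for illustration
type HistoryEntry =
    { Timestamp: DateTimeOffset
      ElapsedSeconds: int64
      Command: string }

// Assumes the shape ": <epoch>:<elapsed>;<command>".
// Anything that does not match that shape yields None.
let tryParseEntry (line: string) : HistoryEntry option =
    let semi = line.IndexOf(';')

    if line.StartsWith(": ") && semi > 2 then
        // The metadata sits between the leading ": " and the first ';'
        match line.Substring(2, semi - 2).Split(':') with
        | [| epochText; elapsedText |] ->
            let okEpoch, epoch = Int64.TryParse(epochText)
            let okElapsed, elapsed = Int64.TryParse(elapsedText)

            if okEpoch && okElapsed then
                Some
                    { Timestamp = DateTimeOffset.FromUnixTimeSeconds(epoch)
                      ElapsedSeconds = elapsed
                      Command = line.Substring(semi + 1) }
            else
                None
        | _ -> None
    else
        None

let sample = tryParseEntry ": 1726414019:0;brew info mongod"

match sample with
| Some entry -> printfn "'%s' ran at %O and took %d s" entry.Command entry.Timestamp entry.ElapsedSeconds
| None -> printfn "Not a history entry"
```

For the stats we only need the command part, so the real program can get away with something much simpler than this.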
Now that we understand what we have to deal with, let’s set up the project.
Project Setup
If you want to follow along, ensure you have .NET installed. As of the time of writing, the most current version is .NET 8.
Once everything is set up you should have the dotnet command
❯ dotnet --version
8.0.104
Let’s use the dotnet CLI to create a new console app and call it HistoryStats. We also need to specify that we want to use F# as the language.
dotnet new console -lang "F#" -o HistoryStats
It generates a console app with this directory structure
❯ tree
.
├── HistoryStats.fsproj
└── Program.fs
1 directory, 2 files
We will add all the subsequent code to the Program.fs file.
To make sure everything is working, run the app with dotnet run and you should see the following message printed from the default program
"Hello from F#"
Now that we have the environment ready, let’s get started.
The eventual goal is to run dotnet run $HISTFILE and have it give us a ranked list of the most commonly used commands in our terminal.
Reading the History File
This is the first iteration of a program to read and print out each line from the history file, which is provided as a command line argument
open System
open System.IO

let commandsByFrequency historyFile =
    File.ReadLines(historyFile)
    |> Seq.iter (fun line -> printfn "%s" line)

[<EntryPoint>]
let main argv =
    // Unlike argv, this array also includes the program itself as the first element
    let args = Environment.GetCommandLineArgs()
    // Optional: Print all the arguments
    printfn "Command-line arguments: %A" args

    match args with
    | [| _app; historyFile |] ->
        printfn "First argument: %s" historyFile
        commandsByFrequency historyFile
    | _ ->
        printfn "Usage: dotnet run <historyFile>"

    0
The main part to focus on is the commandsByFrequency function, which reads the provided file line by line. We then use Seq.iter to iterate and print out each line in the file. We will add most of the functionality to this function.
In the main function, which is the entry point, we also have some pattern matching to ensure we only try to process the file if the program has been run with exactly one argument, which should be the history file.
If the file path is not provided, we print a message showing that we require an argument and how to provide it.
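Here is the same check pulled out into a tiny function, so we can try it without running the whole app. This is only a sketch: tryGetHistoryFile is a hypothetical helper, and the program name and file path below are made-up sample values.

```fsharp
// Hypothetical helper: returns the history file path when exactly one
// argument follows the program name, and None otherwise.
let tryGetHistoryFile (args: string[]) =
    match args with
    | [| _app; historyFile |] -> Some historyFile
    | _ -> None

// Made-up sample inputs: one valid call and one with the argument missing
let samples =
    [ [| "HistoryStats"; "/home/me/.zsh_history" |]
      [| "HistoryStats" |] ]

for args in samples do
    match tryGetHistoryFile args with
    | Some path -> printfn "History file: %s" path
    | None -> printfn "Usage: dotnet run <historyFile>"
```

The array pattern only matches arrays with exactly two elements, so both a missing argument and extra arguments fall through to the usage message.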
Now we can run our application, providing $HISTFILE as the argument, and it should print out each line of the history file.
dotnet run $HISTFILE
Parsing the History File
On my system, printing out all the lines in the history file is quite noisy, so let’s start processing those lines to get to the interesting part.
We will focus on the commandsByFrequency function to do all the processing we need.
First, we need to skip the blank lines before continuing with the processing
let commandsByFrequency filePath =
    // Returns an enumerable over the lines in the file
    File.ReadLines(filePath)
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    |> Seq.iter (fun line -> printfn "%s" line)
If you run the app now you should see that we no longer print out the blank lines in the file.
We will make heavy use of the Seq module, which is roughly similar to the Enumerable module in Ruby. It has a ton of useful functions for processing collections of data. It is lazy by default and only processes individual sequence elements as required, making it a nice tool for our needs.
Seq.choose deserves a special mention since it makes it so easy to separate the lines we want to keep from the ones to discard. It takes a function that should return None if we want to skip the item, or Some x if we want to keep the item x.
I found it to be quite elegant and we will use it again in the next section.
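As a quick illustration away from the history file, here is Seq.choose filtering and converting in a single step. The tryParseInt helper and the sample strings are made up for this sketch.

```fsharp
// Hypothetical helper: Some number when the text is an integer, None otherwise
let tryParseInt (text: string) =
    match System.Int32.TryParse(text) with
    | true, value -> Some value
    | false, _ -> None

let numbers =
    [ "1"; "two"; "3"; ""; "42" ]
    // Keeps the values wrapped in Some and drops every None
    |> Seq.choose tryParseInt
    |> Seq.toList

printfn "%A" numbers // [1; 3; 42]
```

Notice that the result holds plain integers rather than options: Seq.choose unwraps the Some values for us.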
Extracting the Command from a History Line
Now that we have established a pattern for iterating over the lines in the file, let’s start parsing them to extract the info we need.
Just a reminder that this is how a line in the history file looks:
: 1726414019:0;brew info mongod
To extract the command from a non-blank line, let’s add a new function that should do 2 things:
1. Get the command, which starts immediately after the first semicolon, i.e. ;
2. Extract the first part of the command, without the arguments
let parseHistoryLine (line: string) =
    if line.StartsWith(":") then
        let semiColonIndex = line.IndexOf(';')
        // 1. Get the command i.e. everything after the semicolon
        let fullCommand = line.Substring(semiColonIndex + 1)
        // 2. Split by space and get the first part of the command
        let command = fullCommand.Split([| ' ' |]) |> Array.head
        Some command
    else
        None
We will use the same pattern as before: a function that returns an option to determine which lines to keep or discard, which we then pass to Seq.choose.
We return Some command to indicate that we want to keep the line, and None to indicate that we want to discard it.
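Before wiring it into the pipeline, we can sanity check the function on a few lines by hand. The function below is the same parseHistoryLine, repeated so the snippet runs on its own (for example in F# Interactive). The sample lines other than the brew one are made up.

```fsharp
let parseHistoryLine (line: string) =
    if line.StartsWith(":") then
        let semiColonIndex = line.IndexOf(';')
        let fullCommand = line.Substring(semiColonIndex + 1)
        let command = fullCommand.Split([| ' ' |]) |> Array.head
        Some command
    else
        None

// Made-up sample lines, apart from the first one
let samples =
    [ ": 1726414019:0;brew info mongod"
      ": 1726414020:0;ls"
      "a line without the leading colon" ]

for line in samples do
    match parseHistoryLine line with
    | Some command -> printfn "keep: %s" command
    | None -> printfn "skip: %s" line
```

The first two lines are kept as brew and ls, while the last one is skipped because it does not start with a colon.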
By now we already know how to use this function in our processing pipeline:
let commandsByFrequency historyFile =
    File.ReadLines(historyFile)
    // Take only the non-blank lines
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    // Extract the command from the line
    |> Seq.choose (fun line -> (parseHistoryLine line))
    // Optional: Print out the commands we will process
    |> Seq.iter (fun command -> printfn "%s" command)
Now that we have this pattern established, let’s take advantage of it to finally get the most frequently used commands.
More Seq Magic
All that’s left for us to do is to find out how many times each command occurs. Once we have that, we can sort the sequence of commands in descending order, then get the top commands based on the frequency of occurrence.
Let’s write out the whole function then go through the important parts
let commandsByFrequency count historyFile =
    // Returns an enumerable over the lines in the file
    File.ReadLines(historyFile)
    // Take only the non-blank lines
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    // Extract the command from the line
    |> Seq.choose (fun line -> (parseHistoryLine line))
    // Group by command -> (command, seq of commands)
    |> Seq.groupBy id
    // Count occurrences
    |> Seq.map (fun (command, occurrences) -> (command, Seq.length occurrences))
    // Sort by the count in descending order
    |> Seq.sortByDescending snd
    // Take the top `count` elements
    |> Seq.take count
The first change is the addition of a new count parameter, which represents the number of commands we want to return.
The next addition is Seq.groupBy, which is described as follows:
"Applies a key-generating function to each element of a sequence and yields a sequence of unique keys. Each unique key contains a sequence of all elements that match to this key."
The key-generating function we provide is the id function, which simply returns whatever is provided to it as an input.
id 12345 // Evaluates to 12345
id "command" // Evaluates to "command"
In our case, what it does is group the commands by the command itself. It returns a sequence of tuples, one per unique command, where the first item of each tuple is the command itself and the second is a sequence holding every occurrence of that command.
This is the sample output we would get if we printed out the result at this point in the program
("omz_history", seq ["omz_history"])
("gds", seq ["gds"])
("ggpush", seq ["ggpush"])
("zsh_stats", seq ["zsh_stats"; "zsh_stats"; "zsh_stats"])
Now that we have the command and a sequence of all the occurrences, we can count the occurrences of each command. This is what happens in the next step:
// Count occurrences
|> Seq.map (fun (command, occurrences) -> (command, Seq.length occurrences))
This takes in each tuple returned from the groupBy and transforms it into a new tuple containing the command and the frequency, i.e. the number of times it occurs.
So it would transform the previous output into something like this:
("omz_history", 1)
("gds", 1)
("ggpush", 1)
("zsh_stats", 3)
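As a side note, FSharp.Core also has Seq.countBy, which does the grouping and the counting in one step. This sketch compares it with the two-step version on the sample commands from above. It is just an alternative to be aware of, and the program in this post sticks with groupBy followed by map.

```fsharp
let commands =
    [ "omz_history"; "gds"; "ggpush"; "zsh_stats"; "zsh_stats"; "zsh_stats" ]

// Two-step version: group by the command, then count each group
let grouped =
    commands
    |> Seq.groupBy id
    |> Seq.map (fun (command, occurrences) -> (command, Seq.length occurrences))
    |> Seq.toList

// One-step version using Seq.countBy
let counted = commands |> Seq.countBy id |> Seq.toList

printfn "%A" grouped
printfn "%A" counted
// Compare after sorting so the check does not depend on the order of the groups
printfn "Same counts: %b" (List.sort grouped = List.sort counted)
```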
And now that we have the counts of each command, we can sort the commands in descending order by the count, which is the second element of the tuple.
// Sort by the count in descending order
|> Seq.sortByDescending snd
snd is a function that returns the second element of a tuple, and we use it here to get the count and use it to sort the commands.
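Here is a small sketch of snd and Seq.sortByDescending working together, using made-up counts similar to the sample output above.

```fsharp
// Made-up counts for illustration
let counts = [ ("omz_history", 1); ("zsh_stats", 3); ("gds", 1) ]

// snd picks the second element of a pair
printfn "%d" (snd ("zsh_stats", 3)) // 3

// Sort by the count, largest first
let sorted = counts |> Seq.sortByDescending snd |> Seq.toList

printfn "%A" sorted
```

The pair with the highest count, zsh_stats, ends up at the front of the list.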
And now that we have the sorted commands, the final part is to return the top commands by the count.
// Take the top `count` elements
|> Seq.take count
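One thing worth knowing here: Seq.take expects the sequence to contain at least count elements, and throws once the input runs out. That is not a problem with a long history file, but a short one with fewer unique commands would trip it. The sketch below uses a made-up two-item list to show the difference with Seq.truncate, which returns at most count elements. Swapping it in is a suggestion, not something the original program does.

```fsharp
// Made-up ranked list with fewer than 10 entries
let ranked = [ ("ls", 3); ("git", 2) ]

// Seq.truncate returns at most 10 items and is fine with fewer
let safeTop = ranked |> Seq.truncate 10 |> Seq.toList
printfn "%A" safeTop

// Seq.take wants 10 items and throws when the input runs out
let takeFailed =
    try
        ranked |> Seq.take 10 |> Seq.toList |> ignore
        false
    with :? System.InvalidOperationException ->
        true

printfn "Seq.take threw: %b" takeFailed // Seq.take threw: true
```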
And just like that, we have our very own custom zsh_stats clone.
Wrapping up
Now we can take a look at the whole program
open System
open System.IO

let parseHistoryLine (line: string) =
    if line.StartsWith(":") then
        let semiColonIndex = line.IndexOf(';')
        // 1. Get the command i.e. everything after the semicolon
        let fullCommand = line.Substring(semiColonIndex + 1)
        // 2. Split by space to get the first part of the command
        let command = fullCommand.Split([| ' ' |]) |> Array.head
        Some command
    else
        None

let commandsByFrequency count historyFile =
    // Returns an enumerable over the lines in the file
    File.ReadLines(historyFile)
    // Take only the non-blank lines
    |> Seq.choose (fun line -> if String.IsNullOrWhiteSpace line then None else Some line)
    // Extract the command from the line
    |> Seq.choose (fun line -> (parseHistoryLine line))
    // Group by command -> (command, seq of commands)
    |> Seq.groupBy id
    // Count occurrences
    |> Seq.map (fun (command, occurrences) -> (command, Seq.length occurrences))
    // Sort by the count in descending order
    |> Seq.sortByDescending snd
    // Take the top `count` elements
    |> Seq.take count

[<EntryPoint>]
let main argv =
    let args = Environment.GetCommandLineArgs()

    match args with
    | [| _app; historyFile |] ->
        printfn "History file: %s" historyFile
        let result = commandsByFrequency 10 historyFile

        for (cmd: string, count) in result do
            printfn "Command: %s \tCount: %d" cmd count
    | _ -> printfn "Usage: dotnet run <historyFile>"

    0
The only addition is that we now print out the results of the commandsByFrequency function. We will default to getting the top 10 results for now.
And now for the moment of truth, let’s compare our program’s output and the zsh_stats command output
There’s an off-by-one difference in the ls result, which is 166 in our program vs 165 in zsh_stats, but the results are otherwise very similar and it gets the job done.
Conclusion
In summary, we have gone through replicating the zsh function zsh_stats by writing a custom program in F#.
It has proven to be capable enough to replicate the original function, and more importantly, has helped us learn more about operating on sequences of data in F#.
PS: If you are interested in how the original zsh_stats function works, here you go
❯ which zsh_stats
zsh_stats () {
fc -l 1 | awk '{ CMD[$2]++; count++; } END { for (a in CMD) print CMD[a] " " CMD[a]*100/count "% " a }' | grep -v "./" | sort -nr | head -n 20 | column -c3 -s " " -t | nl
}
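The awk part of that pipeline also prints each command's share of the total as a percentage, which our clone leaves out. Here is a rough sketch of how that could look in F#, using a made-up list of commands rather than a real history file. The output layout is my own and does not try to match zsh_stats exactly.

```fsharp
// Made-up commands standing in for the parsed history
let commands = [ "ls"; "git"; "ls"; "cd"; "ls"; "git" ]

let total = List.length commands

let stats =
    commands
    |> Seq.countBy id
    |> Seq.sortByDescending snd
    // Add the share of the total, like CMD[a]*100/count in the awk script
    |> Seq.map (fun (command, count) -> (command, count, 100.0 * float count / float total))
    |> Seq.toList

for (command, count, percent) in stats do
    printfn "%d\t%.1f%%\t%s" count percent command
```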
You can access the full example in the GitHub repository
Written by
Kevin Gathuku
I'm a web-focused developer with a passion for exploring new ideas and hopefully sharing more of them through this blog 😃