Ruby/Crystal/Rust Advent of Code 2022 - Day 6

Kirk HainesKirk Haines
26 min read

Day 6 is a fun one. The task itself is very simple, but I took it as an opportunity to do some performance benchmarking and explore some of the nuances around how implementation affects performance. Read on for more details!

The Repository

All of the code for this day's solutions, as well as the data input file, can be found at:

https://github.com/wyhaines/advent-of-code-2022/tree/main/day_6

Part 1

To see the full details of the challenge, visit the Advent of Code 2022 - Day 6 page. However, the tl;dr version of the challenge is as follows:

You and the Elves are on your way to the star fruit grove, and one of the Elves gives you a handheld device. The device has many fancy features, but the most important one to set up right now is the communication system. Unfortunately, the device is malfunctioning. However, the Elf is confident that you, with your significant experience dealing with signal-based systems, will have no problem fixing it. Just then, the device emits a few colorful sparks.

To fix the communication system, you need to add a subroutine to the device that detects a start-of-packet marker in the data stream. In the protocol being used by the Elves, the start of a packet is indicated by a sequence of four characters that are all different. The device will send your subroutine a datastream buffer (your puzzle input); your subroutine needs to identify the first position where the four most recently received characters were all different. Specifically, it needs to report the number of characters from the beginning of the buffer to the end of the first such four-character marker.

For example, suppose you receive the following datastream buffer:

mjqjpqmgbljsphdztnvjfqwrcgsmlb

After the first three characters (mjq) have been received, but there haven't been enough characters received yet to find the marker. The first time a marker could occur is after the fourth character is received, making the most recent four characters mjqj. Because j is repeated, this isn't a marker. The first time a marker appears is after the seventh character arrives. Once it does, the last four characters received are jpqm, which are all different. In this case, your subroutine should report the value 7, because the first start-of-packet marker is complete after 7 characters have been processed.

Essentially, the task is to put a 4-character sliding window over the string of characters and then check if all four characters in that sliding window are unique. If they are, then return the index of the first character after that 4-character block.

The Approach

  1. Slurp the input into a string. There is no additional processing this time!

  2. Iterate over a sliding window of 4 characters, passing those characters into another method that will determine if the characters are unique. Return the index +4 of that window when a unique set is found.

  3. Write a method to determine if a given 4 characters are unique.

Ruby solution

Ruby makes so much about programming so easy.

# frozen_string_literal: true

class TuningTrouble
  def initialize(filename)
    @data = parse_packet_data(filename)
  end

  def run
    puts "The start of packet marker is at character ##{find_start_of_packet}"
  end

  def parse_packet_data(filename)
    File.read(filename)
  end

  def find_start_of_packet
    idx = 0
    @data.chars.each_cons(4) do |chunk|
      return idx + 4 if chunk.uniq == chunk
      idx += 1
    end

    'No packet found'
  end
end

TuningTrouble.new(ARGV[0] || 'input.txt').run

So, caveat emptor. This version isn't the finished version. Rather, it is the first version. The simplest, most concise version. It's also the slowest. If you look at the code in the repository, you will see all of the variations that are discussed below, commented out. You can try any of them just by uncommenting the relevant methods.

In a celebration of simplicity, today's task has the simplest data parsing possible. i.e. the data in the data file is already ready to use without any other changes, so there is nothing to talk about regarding the parsing, so let's look at the solution:

def find_start_of_packet
  idx = 0
  @data.chars.each_cons(4) do |chunk|
    return idx + 4 if chunk.uniq == chunk
    idx += 1
  end

  'No packet found'
end

This is nicely concise. Ruby has a method, String#chars that can be used to get an array of characters from a string. There is also a method, Enumerable#each_cons, which is mixed into Array, which returns an enumerator that iterates over a sliding window of elements. So, in this example, each_cons(4) returns an iterator that will iterate over all 4 character windows of the original string. Because those windows are arrays, the uniq method can be used to eliminate any duplicated characters. If the result of calling that method equals the original chunk, then the start-of-packet marker has been found.

This version is safe for the full UTF-8 character set, but it is quite slow, taking about 17.7 seconds on my laptop, using Ruby 3.2.0 +YJIT. If @data.chars.each_cons is changed to @data.bytes.each_cons, making the code safe only for ASCII inputs, performance jumps to about 7.2 seconds.

Another variation that is nearly as concise, but faster, is this one.

def parse_packet_data(filename)
  File.read(filename)
end

def no_duplicate_characters?(chars)
  chars.split('').uniq.size == chars.size
end

def find_start_of_packet
  (@data.size - 4).times do |idx|
    return idx + 4 if no_duplicate_characters?(@data[idx, 4])
  end
  'No packet found'
end

Because the sliding window is 4 characters long, it can take no more than @data.size - 4 iterations to find the starting packet if one exists. Inside that loop, return the idx +4 if no duplicate characters are found in the sliding window that is composed of the 4 characters starting at idx. This version eliminates the use of each_cons, which also eliminates manually tracking the actual position of the sliding window, and just directly implements it with a loop and a string slide.

For ease of reading, I moved the check for duplicate characters into a separate method. This doesn't notably effect the Ruby solution execution times, as other things have so much more effect, and it makes them more readable.

def no_duplicate_characters?(chars)
  chars.split('').uniq.size == chars.size
end

This code splits the character string into an array of individual characters, then runs it through uniq to eliminate any duplicated characters, and compares to see if the result is the same size as the original.

For the data set that this task is working with, speed is largely irrelevant, but since the solution for this day is so much simpler than in past days, I'm going to take a closer look at performance.

I created an input file that has 4 megabytes of guaranteed repetitions in every 4-character window, followed by the contents of the original input file. On that set, this original solution finishes in about 19 seconds on my laptop.

Since our data set is composed only of ASCII characters, this can be improved.

def no_duplicate_characters?(chars)
  chars.bytes.uniq.size == chars.size
end

The String#bytes method returns an array of the bytes that make up the string. Since the input data is all ASCII, this is safe (ASCII characters are all single-byte characters). Because the implementation doesn't have to worry about multi-byte characters, this is much faster to execute than the split('') call, and in truth, this is pretty close to as fast as you can make this in Ruby.

The second version (chars.bytes.uniq.size == chars.size) finished in about 8.7 seconds on my laptop.

Can that be improved? What if the method that checks for duplicate characters is just a big boolean statement that explicitly checks the options?

def no_duplicate_characters?(chars)
  !( chars[0] == chars[1] ||
     chars[0] == chars[2] ||
     chars[0] == chars[3] ||
     chars[1] == chars[2] ||
     chars[1] == chars[3] ||
     chars[2] == chars[3] )
end

That version runs in about 8.4 seconds, and it works with all UTF-8 characters, too!

Can it be faster? Yes. If parse_packet_data is revisited, and it is modified to return an array of bytes instead of a string, the speedup is substantial.

def parse_packet_data(filename)
  File.read(filename).bytes
end

This version is what you will find in the repo (with comments explaining the other version). It runs in about 2.2 seconds on my laptop.

Ruby's Fastest Version

There is one final improvement that can be made to the Ruby version. All of the previous versions created new strings or arrays with every call into no_duplicate_characters?. Apart from the obvious performance hit from duplicating data that already exists somewhere else, object creation and cleanup are both expensive in Ruby, so creating large numbers of redundant, short-lived objects impacts performance. However, because all of the data that is required is directly accessible in the @data instance variable, this can be dealt with by just passing an index into no_duplicate_characters? instead of a string or array of bytes.

  def no_duplicate_characters?(idx)
    !(@data[idx] == @data[idx + 1] ||
       @data[idx] == @data[idx + 2] ||
       @data[idx] == @data[idx + 3] ||
       @data[idx + 1] == @data[idx + 2] ||
       @data[idx + 1] == @data[idx + 3] ||
       @data[idx + 2] == @data[idx + 3])
  end

  def find_start_of_packet
    (@data.size - 4).times do |idx|
      return idx + 4 if no_duplicate_characters?(idx)
    end
    'No packet found'
  end

This version is running in about 1.32 seconds on my laptop.

So, how do the other languages stack up?

Crystal solution (winner for most concise)

Part of Crystal's awesomeness is that it has the same sort of productive, expressive syntax as Ruby, but it provides a fantastic type system and creates fast-executing code, as well. Consider:

class TuningTrouble
  @data : String

  def initialize(filename)
    @data = parse_packet_data(filename)
  end

  def run
    puts "The start of packet marker is at character ##{find_start_of_packet}"
  end

  def parse_packet_data(filename)
    File.read(filename)
  end

  def find_start_of_packet
    @data.chars.each_cons(4, reuse: true).index! {|chunk| chunk.uniq == chunk} + 4
  end
end

TuningTrouble.new(ARGV[0]? || "input.txt").run

Crystal wins the concise code battle. This is spiritually idential to the original Ruby solution before we started trying to find faster versions, but it is even more terse! The definition of Array in Crystal mixes in Indexable, which has a method, Indexable#index that returns the index of the first object for which the block returns a truthy value. The index! version just returns an exception if no index is found. This eliminates manual counting of that index and makes this a very terse solution. Now, recall that the Ruby version of the above code takes about 17.7 seconds.

Crystal does it in about 1.27 seconds, and it is safe for the full UTF-8 character set. In fact, changing chars to bytes, in this example, offers no benefits. The next few examples show other fairly terse versions, but none are as fast as this until we get to the more complicated versions.

  @data : String

  def parse_packet_data(filename)
    File.read(filename)
  end

  def no_duplicate_characters?(chars)
    chars.split("").uniq.size == chars.size
  end

  def find_start_of_packet
    (@data.size - 4).times do |idx|
      return idx + 4 if no_duplicate_characters?(@data[idx, 4])
    end

   "No packet found"
  end

This one does manual window management. It is easy to understand, but it is slower than the more concise version, at about 5.5 seconds.

Now look at this one. It turns out that a solution that I didn't even mention in the Ruby section because it is so slow (43 seconds on the 4mb input file) is faster in Crystal:

def no_duplicate_characters?(chars)
  chars.chars.to_set.size == chars.size
end

This is tremendously slow in Ruby, but in Crystal, it takes Ruby's 43 seconds down to about 3.9 seconds, and it's pretty terse, too!

What about this change?

def no_duplicate_characters?(chars)
  chars.bytes.uniq.size == chars.size
end

This works on Crystal just like in Ruby, and performance increases to about 2.4 seconds.

Of course, the fastest Ruby versions use that big boolean statement. So what happens if Crystal is changed accordingly?

  def no_duplicate_characters?(chars)
    !( chars[0] == chars[1] ||
       chars[0] == chars[2] ||
       chars[0] == chars[3] ||
       chars[1] == chars[2] ||
       chars[1] == chars[3] ||
       chars[2] == chars[3] )
  end

This version runs in about 0.92 seconds on Crystal and is UTF-8-safe.

Recall that the fastest Ruby version got its final speedup by indexing directly into the main buffer to avoid the creation of large numbers of small objects. What happens with the Crystal version if that same change is made?

  def no_duplicate_characters?(idx)
    !( @data[idx] == @data[idx + 1] ||
       @data[idx] == @data[idx + 2] ||
       @data[idx] == @data[idx + 3] ||
       @data[idx + 1] == @data[idx + 2] ||
       @data[idx + 1] == @data[idx + 3] ||
       @data[idx + 2] == @data[idx + 3] )
  end

  def find_start_of_packet
    (@data.size - 4).times do |idx|
      return idx + 4 if no_duplicate_characters?(idx)
    end

   "No packet found"
  end

Execution time drops to around 0.465 seconds.

Now, what happens if we make changes so that it only works with ASCII input data? This will require changing the type of the @data instance variable, and a small change to the parse_packet_data method.

  @data : Bytes

  def parse_packet_data(filename)
    File.read(filename).to_slice
  end

  def no_duplicate_characters?(idx)
    !( @data[idx] == @data[idx + 1] ||
       @data[idx] == @data[idx + 2] ||
       @data[idx] == @data[idx + 3] ||
       @data[idx + 1] == @data[idx + 2] ||
       @data[idx + 1] == @data[idx + 3] ||
       @data[idx + 2] == @data[idx + 3] )
  end

  def find_start_of_packet
    (@data.size - 4).times do |idx|
      return idx + 4 if no_duplicate_characters?(idx)
    end

   "No packet found"
  end

The String#to_slice method returns a read-only Slice of bytes, which Crystal provides an alias for, Bytes, which provides direct access to the memory buffer which contains the String data.

This one change takes the execution time down to about 0.0585 seconds. The decimal point is in the right place. That is about 23x faster than the fastest Ruby version, with code that isn't significantly different from what Ruby is doing. Certainly, that is as fast as Crystal can get, right?

As it turns out, no. Crystal has a few more tricks.

  def no_duplicate_characters?(idx)
    seen = StaticArray(UInt8, 4).new(@data[idx])
    step = 1
    while step < 4
      if seen.includes?(@data[idx + step])
        return false
      end
      seen[step] = @data[idx + step]
      step += 1
    end

    true
  end

  def find_start_of_packet
    (@data.size - 4).times do |idx|
      return idx + 4 if no_duplicate_characters?(idx)
    end

    "No packet found"
  end

This version might take a little explanation. A StaticArray is an array that is allocated on the stack, instead of on the heap. The effect of this is that the allocation and deallocation of the memory that it uses are very fast. In Crystal, one can use a StaticArray pretty much anywhere where the size of the array is fixed, and where that size is known at compile time. In this case, we know the maximum amount of previously-seen characters, so we can use a StaticArray to very quickly allocate that space. Additionally, because the first byte to be checked can't match anything in the otherwise empty seen array, the code can save the work of checking that first byte by seeding the seen array with its value, and starting checks with the second byte.

It then iterates through the remaining three, checking to see if any of those characters are in the seen array and it adds them if they are not.

This version is the fastest. It averages about 0.049 seconds on my laptop, to process the 4 MB benchmarking file.

Crystal has one more tool that developers can exploit, though. The libraries that make up the Crystal standard library are all open. This means that if there is something that a person wants to add or change, even if it is part of the core libraries, one can.

Crystal's Fastest Version

One of the challenges to making the Crystal solution faster is that while the data that is needed is all available in memory, every solution preceding the next one is forced to copy pieces of it to another area of memory. Copying data takes more time than not copying data. If there were a way to check a section of a Slice to see if any of its elements match an argument, without copying anything or creating any new objects, the algorithm should be even faster.

struct Slice(T)
  @[AlwaysInline]
  def unsafe_includes_in_window?(offset, size, value)
    size.times do |idx|
      return true if self.unsafe_fetch(offset + idx) == value
    end
    false
  end
end

And then, given that, here is the fastest Crystal version for this puzzle.

  @data : Bytes

  def parse_packet_data(filename)
    File.read(filename).to_slice
  end

  def no_duplicate_characters?(idx)
    step = 1
    while step < 4
      if @data.unsafe_includes_in_window?(idx, step, @data.unsafe_fetch(idx + step))
        return false
      end
      step += 1
    end

    true
  end

  def find_start_of_packet
    (@data.size - 4).times do |idx|
      return idx + 4 if no_duplicate_characters?(idx)
    end

    "No packet found"
  end

A few quick notes about this solution. The @[AlwaysInline] directive is a hint to the compiler that it should inline uses of this method into the generated code. For something that will be called a lot, this can improve speed, and it seems to have a small but measurable effect with this code. Also, the Slice#unsafe_fetch method is like Slice#[], but it doesn't check to make sure that the offset being fetched is within the limits of the data allocated to the Slice before trying to access the data, and it is also defined with @[AlwaysInline].

This version runs in a lightning-fast 0.0355 seconds, on average.

So how does Rust compare? (tl;dr -- it's neck and neck with Crystal, but the fastest Crystal version ends up just edging out the fastest Rust version).

Rust solution

In another first, the Rust solution is, overall, almost-ish as concise as the Ruby/Crystal versions this time!

use std::env;
use std::collections::HashSet;

struct TuningTrouble {
    data: String,
}

impl TuningTrouble {
    fn new(filename: &str) -> TuningTrouble {
        let data = Self::parse_packet_data(filename).unwrap();

        TuningTrouble { data }
    }

    fn parse_packet_data(filename: &str) -> Option<String> {
        let data = std::fs::read_to_string(filename).ok()?;        
        Some(data)
    }

    fn run(&mut self) {
        println!("The start of packet marker is at character #{}", self.find_start_of_packet())
    }

    fn no_duplicate_characters(&self, s: &[u8]) -> bool {
        let mut set = HashSet::new();
        for c in s {
            set.insert(c);
        }
        set.len() == s.len()
    }

    fn find_start_of_packet(&mut self) -> usize {
        self.data
            .as_bytes()
            .windows(4)
            .position(|chunk| self.no_duplicate_characters(chunk))
            .unwrap()
            + 4
    }

}

fn main() {
    let args: Vec<String> = env::args().collect();
    let mut filename = "input.txt";
    if let Some(arg) = args.get(1) {
        filename = arg;
    }

    TuningTrouble::new(filename).run();
}

So, sneak preview, but just like with the two previous languages, this version is the slowest. Read on or peek at the code in the repository for the fastest version.

The details of this first version of the solution are found in the same two methods as in the other languages:

    fn no_duplicate_characters(&self, s: &[u8]) -> bool {
        let mut set = HashSet::new();
        for c in s {
            set.insert(c);
        }
        set.len() == s.len()
    }

    fn find_start_of_packet(&mut self) -> usize {
        self.data
            .as_bytes()
            .windows(4)
            .position(|chunk| self.no_duplicate_characters(chunk))
            .unwrap()
            + 4
    }

The find_start_of_packet() method is really interesting. For reasons that are explained a little later, Rust doesn't support indexing into strings, which makes the creation of a sliding window of string data more tricky. There's work there for the developer. This solution gets around that, though, by calling as_bytes() on the string, which returns an array of bytes.

That data structure does support creating sliding windows, via the windows() method. And then, much like the index method in Crystal, position() iterates through each of those windows, and when the associated block returns a true value, it returns the position in the iterator at which that true value occurred.

This solution to detecting duplicate characters in a string uses a HashSet. In Rust, there is no method equivalent to Crystal or Ruby's uniq function, which allows for a concise solution in those languages. To achieve a similar result in Rust, the method iterates over the characters in the string and inserts them into a HashSet. Since a HashSet only stores unique values, any duplicates will be automatically discarded. After all of the characters have been processed, the length of the HashSet is compared to the length of the original string. Equality returns a true value as it indicates that no duplicated characters were discarded from the HashSet.

This is concise but it does a lot of extra work in populating a fairly complex data structure (HashSet) only to check its length. Thus, as the Rust solutions go, it is the slowest, at about 2.00 seconds. Can it be improved?

fn no_duplicate_characters(&self, s: &str) -> bool {
    let mut chars = s.chars();
    let mut seen = vec![];
    while let Some(chr) = chars.next() {
        if seen.contains(&chr) {
            return false;
        }
        seen.push(chr);
    }
    true
}

This version builds a simple vec of seen characters, similar to the fastest Crystal solution. The code is fairly clear. It uses a while let loop to iterate through the characters, checking to see if they have been seen before, and returning false if they have been. Alternatively, if a given character has not been seen, it is pushed into the seen vector, and the next one is checked.

Recall, though, that the vec type is a dynamic array that is allocated on the heap. This would be better if it were built with a data structure that is faster to work with than a vec. Also, there is another interesting consideration with Rust.

In both Ruby and in Crystal, indexing into a string operates at the character level. In either of those languages, if you have a string like ๐Ÿฅ‘๐Ÿ’Ž๐Ÿฆ€ you can correctly index into it:

tag = "๐Ÿฅ‘๐Ÿ’Ž๐Ÿฆ€"
puts tag[1..2]

This will result in an output of ๐Ÿ’Ž๐Ÿฆ€. If you try that in Rust, though, you will get a runtime error.

fn main() {
  let tag = String::from("๐Ÿฅ‘๐Ÿ’Ž๐Ÿฆ€");

  println!("{:?}", &tag[1..2]);
}
thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside '๐Ÿฅ‘' (bytes 0..4) of `๐Ÿฅ‘๐Ÿ’Ž๐Ÿฆ€`', junk.rs:4:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This is because, while all three languages use UTF-8 strings, Rust doesn't permit one to index into them. Attempting to access a single index instead of a range isn't even allowed at the compilation level. A line like println!("{:?}", &tag[2]); would fail to even compile.

This poses interesting challenges. In Ruby and in Crystal, it is safe, and will not error with any UTF-8 characters, to do something like call_method_foo(data[idx..idx+4]). In Rust, though, that fails. So, if one wants to implement a solution that can handle all UTF-8 strings, what is one to do? Is it solvable to handle that issue and make our solution more like the fastest Crystal one, all at the same time?

fn sliding_windows<'a>(&self, data: &'a str, size: usize) -> impl Iterator<Item = &'a str> {
    data.char_indices().flat_map(move |(from, _)| {
        data[from..]
            .char_indices()
            .skip(size - 1)
            .next()
            .map(|(to, chr)| &data[from..from + to + chr.len_utf8()])
    })
}

fn no_duplicate_characters(&self, s: &str) -> bool {
    let chars = s.chars();
    let mut seen = ['\x00', '\x00', '\x00', '\x00'];
    let mut step = 0;
    for char in chars {
        if seen.contains(&char) {
            return false;
        }
        seen[step] = char;
        step += 1;
    }

    true
}

fn find_start_of_packet(&mut self) -> usize {
    let mut idx = 0;
    for window in self.sliding_windows(&self.data, 4) {
        if self.no_duplicate_characters(window) {
            return idx + 4;
        }
        idx += 1;
    }
    0
}

Yes. The tricky bit is that Rust makes the developer come up with their own iterator implementation for sliding windows that honor character boundaries. The only way that I could figure out to do that was to use char_indices() to do the dirty work. If there is a better way, I would love to know about it.

Leveraging a sliding window iterator, the other changes can be seen in no_duplicate_characters(). That method now uses an array instead of a vec for the seen buffer, which is much more lightweight.

This version is UTF-8 safe, and also much faster, finding the correct answer from our 4 MB input file in about 0.2 seconds.

This performance is still substantially slower than the fastest Crystal version, but there are a couple of tricks still available for this Rust solution. Recall that in the Ruby and the Crystal versions, eliminating data copying by just passing an index to refer to the original data resulted in a significant speedup. Let's do that in the Rust version (and get rid of the laboriously created UTF-8-safe sliding windows code) and also try to use the boolean block approach.

fn no_duplicate_characters(&self, idx: usize, bytes: &[u8]) -> bool {
    !(bytes[0] == bytes[1]
        || bytes[0] == bytes[2]
        || bytes[0] == bytes[3]
        || bytes[1] == bytes[2]
        || bytes[1] == bytes[3]
        || bytes[2] == bytes[3])
}

fn find_start_of_packet(&mut self) -> usize {
    self.data
        .as_bytes()
        .windows(4)
        .position(|window| self.no_duplicate_characters(0, window))
        .unwrap()
        + 4
}

This is very similar to one of the fastest Crystal versions. The differences in performance are obscured by the background noise of my laptop's operation, but benchmarking put this code at about 0.049 seconds, which is just a bit faster than Crystal's 0.0585 seconds for a similar implementation.

So what happens if I go back to the version that builds an array of seen characters, but have it work on bytes instead of characters?

 fn no_duplicate_characters(&self, bytes: &[u8]) -> ool {
    let mut seen = [bytes[idx], 0, 0, 0];
    let mut step = 1;
    while step < 4 {
        if seen.contains(&bytes[idx + step]) {
            return false;
        }
        seen[step] = bytes[idx + step];
        step += 1;
    }

    true
}

fn find_start_of_packet(&mut self) -> usize {
    self.data
        .as_bytes()
        .windows(4)
        .position(|window| self.no_duplicate_characters(window))
        .unwrap()
        + 4
}

This version chews through data impressively quickly, benchmarking at 0.0427 seconds for the 4mb test file.

Rust's Fastest Version

Believe it or not, the best Rust solution is a little faster, and bucking the trend, is also actually Rust's most concise solution.

fn find_start_of_packet(&mut self) -> usize {
    self.data
        .as_bytes()
        .windows(4)
        .position(|window| {
            window
                .iter()
                .enumerate()
                .all(|(idx, c)| !window[..idx].contains(c))
        })
        .unwrap()
        + 4
}

This one might need some explanation of the code that determines if the characters are unique. An iterator is created from the sliding window, and then it is enumerated, which means that each of the characters in the window will be iterated over accompanied by their index in the window. The invocation of all() means that the block that follows must return true for every iteration of its closure. That closure returns true if none of the unseen characters in the window match the current character being iterated. If none match, then all() will return true, position() will report the position, and when 4 is added to it, the start-of-packet marker has been found.

This version runs in about 0.0356 seconds, which basically means that with the 4 byte window, the performance of Crystal vs Rust is indistinguishable.

Whew! That was a lot of performance optimization with those different versions!

Part 2

It seems that the communication system on your device is correctly detecting packets, but it still isn't working. It looks like it needs to also search for messages. Start-of-message markers are similar to start-of-packet markers, but they consist of 14 distinct characters instead of 4.

The good thing here is that we aren't doing anything different in this second part of the solutions. All that is necessary is to identify a much larger window -- 14 characters -- instead of the much smaller 4-character window. For all of the fastest solutions, this change is as simple as changing a few numbers in the implementation, however. Let's see how they hold up.

The Approach

The approach is fundamentally no different than in part 1.

Ruby solution

# frozen_string_literal: true

class TuningTrouble
  def initialize(filename)
    @data = parse_packet_data(filename)
  end

  def run
    puts "The start of packet marker is at character ##{find_start_of_packet}"
  end

  def parse_packet_data(filename)
    File.read(filename).bytes
  end

  def no_duplicate_characters?(idx, window = 14)
    pos = 0
    while pos < window - 1
      comp_pos = pos + 1
      while comp_pos < window
        return false if @data[idx + pos] == @data[idx + comp_pos]

        comp_pos += 1
      end
      pos += 1
    end
    true
  end

  def find_start_of_packet
    (@data.size - 14).times do |idx|
      return idx + 14 if no_duplicate_characters?(idx)
    end

    'No packet found'
  end
end

TuningTrouble.new(ARGV[0] || 'input.txt').run

This is the fastest Ruby solution. It completes the task on the large file in about 1.45 seconds.

Instead of having a single large boolean statement, it works by doing single comparisons inside of two tight loops. The outer loop iterates through the first through the 13th byte and checks if any of the characters that follow it are the same. If any are, then there are duplicated characters, so the method should return false.

def no_duplicate_characters?(idx, window = 14)
  pos = 0
  while pos < window - 1
    comp_pos = pos + 1
    while comp_pos < window
      return false if @data[idx + pos] == @data[idx + comp_pos]

      comp_pos += 1
    end
    pos += 1
  end
  true
end

If it completes the loops, then there are no duplicates, and true can be returned. The nice thing about this solution is that it is essentially free to make it generic. If a window size is passed into no_duplicate_characters?, it will be used to determine the size of the search.

Crystal solution

struct Slice(T)
  @[AlwaysInline]
  def unsafe_includes_in_window?(offset, size, value)
    size.times do |idx|
      return true if self.unsafe_fetch(offset + idx) == value
    end
    false
  end
end

class TuningTrouble
  def initialize(filename)
    @data = parse_packet_data(filename)
  end

  def run
    puts "The start of packet marker is at character ##{find_start_of_packet}"
  end

  @data : Bytes

  def parse_packet_data(filename)
    File.read(filename).to_slice
  end

  def no_duplicate_characters?(idx)
    step = 1
    while step < 14
      if @data.unsafe_includes_in_window?(idx, step, @data.unsafe_fetch(idx + step))
        return false
      end
      step += 1
    end

    true
  end

  def find_start_of_packet
    (@data.size - 14).times do |idx|
      return idx + 14 if no_duplicate_characters?(idx)
    end

    "No packet found"
  end
end

TuningTrouble.new(ARGV[0]? || "input.txt").run

The only changes are to change the 4 to 14 in a couple of places.

So, how fast is it when looking for a 14 character window instead of a 4 character window?

Truthfully, I can't measure any consistent difference. It tends to find the answer in the 4mb benchmarking file in about 0.0355 seconds.

So, what about Rust?

Rust solution

use std::env;

struct TuningTrouble {
    data: String,
}

impl TuningTrouble {
    fn new(filename: &str) -> TuningTrouble {
        let data = Self::parse_packet_data(filename).unwrap();
        TuningTrouble { data }
    }

    fn parse_packet_data(filename: &str) -> Option<String> {
        let data = std::fs::read_to_string(filename).ok()?;
        Some(data)
    }

    fn run(&mut self) {
        println!(
            "The start of packet marker is at character #{}",
            self.find_start_of_packet()
        )
    }

    fn no_duplicate_characters(&self, idx: usize, bytes: &[u8], seen: &mut [u8; 14]) -> bool {
        seen[0] = bytes[idx];
        for i in 1..14 {
            seen[i] = 0;
        }
        let mut step = 1;
        while step < 14 {
            if seen.contains(&bytes[idx + step]) {
                return false;
            }
            seen[step] = bytes[idx + step];
            step += 1;
        }

        true
    }

    fn find_start_of_packet(&mut self) -> usize {
        let bytes = self.data.as_bytes();
        let mut seen = [0_u8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
        for idx in 0..(self.data.len() - 14) {
            if self.no_duplicate_characters(idx, &bytes, &mut seen) {
                return idx + 14;
            }
        }

        0
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let mut filename = "input.txt";
    if let Some(arg) = args.get(1) {
        filename = arg;
    }

    TuningTrouble::new(filename).run();
}

Et voila! The Rust version follows a very similar pattern to the Crystal version, creating a reusable buffer that is cleared on every use instead of allocating a new one and expanding the buffer to 14 bytes.

This version is vastly slower than the Crystal one, clocking in at about 0.081 seconds, on average.

Conclusion

Whew! This was an interesting deep dive into how implementation affects performance, as well as some techniques that can be used to achieve better performance. The results of Ruby vs Rust should surprise nobody. The Ruby solutions were very programmer-friendly to write, but the best-performing Ruby solution was still vastly slower than the best-performing Rust solution.

What I think may have come as a surprise to some people is the Crystal performance. The Crystal solutions that were essentially identical to the Ruby versions were always much faster than the equivalent in Ruby, and sometimes vastly faster. However, the nature of Crystal means that it also offers other options that aren't available in Ruby. By paying careful attention to the implementation and to the way that memory was used, the fastest Crystal versions delivered shockingly fast performance.

TL;DR -- Jump here for a quick summary

if you jumped straight down here, I tested these with an artificially bloated input file that contained 4 MB of data from the portion of the puzzle input that didn't contain the solution, followed by the full puzzle input again.

Versions used:

Ubuntu 22.04 on an Intel i9-12900HK based laptop (2022 Dell XPS 15)
Ruby 3.2.0 +YJIT
Crystal 1.7.0-dev [292052bb2] (2023-01-04)
rustc 1.68.0-nightly (388538fc9 2023-01-05)

In Part 1:

  • Ruby's best-performing version took 1.32 seconds on average.

  • Crystal's best-performing version took 0.0355 seconds, on average.
    This seems to be a measurement limitation with the shell time and the minimum amount of time necessary to simply execute a Crystal program. Internal timing shows around a 0.015 second runtime for the actual solution....

  • Rust's best-performing version took 0.0356 seconds, on average.
    This is likely also a result of the minimum amount of time necessary to simply start an executable on my system, which is why it was essentially identical to the Crystal timings.

  • Crystal's most concise version was shorter than even Ruby's most concise version.

In Part 2:

  • Ruby's best-performing version took 1.45 seconds on average.

  • Crystal's best-performing version took 0.0355 seconds on average.
    This seems to be a measurement limitation with the shell time and the minimum amount of time necessary to simply execute a Crystal program. Internal timing shows around a 0.015 second runtime for the actual solution....

  • Rust's best-performing version took 0.081 seconds on average.

Rust is a justifiably exciting language, offering a phenomenally expressive language for a system programming langauge while at the same time providing mechanisms that make it pretty difficult to do truly hazardous things with memory management, and wrapping it all up with a compiler that produces very fast-executing code.

However, Rust's expressiveness can not match Crystal's. Even the easiest things are generally a lot more finicky to build with Rust than with Crystal. A complaint that I have heard in some circles is how expensive Rust development can be, because of this. This is not a knock on Rust, but rather, is speculation that there are many people who are using Rust because its compiled code runs quickly, but that Rust may not be the right tool for all of those things that it is being used for.

A very expressive language like Crystal, which is very programmer friendly along with providing robust type safety guarantees and the flexibility to tightly tune performance where it is needed could be an ideal tool to for many projects. If Crystal has thus far flown under your radar, maybe it's time to check it out?

0
Subscribe to my newsletter

Read articles from Kirk Haines directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Kirk Haines
Kirk Haines