my rust attempt at the 1 billion row challenge

My attempt and lessons learned from 1 billion row count optimzations in rust.

The task

https://github.com/gunnarmorling/1brc read max, min, mean temperatures from a 1 billion row file.

Pre-empting whats slow

When considering possible optimizations I'm trying to pre-empt what memory intensive operations might be. Maybe things like:

big O notation e.g. reading line parts more than once
memory e.g. streaming with buffers and clearing buffers for reuse
data structures e.g. hashmaps for fast lookups. key-value pairs are always best when revisiting data.
unnecessary type conversions for example str, float, int. Integers are FAST.
how to calculate floats using integers. Integers are FAST.

Attempt One:

Took longer than 20 minutes so I gave up waiting.

Attempt Two:

Took 17 mins! https://github.com/ryukyi/1brc/commit/90610eca291309590eebcb430e8ef9852423265b

Using line reader with strings and string hashmap:

rust attempt 2
use std::io::{BufRead, BufReader};
use std::collections::HashMap;

struct CityTemp {
    acc_temp: f64,
    count_temp: i64,
    max_temp: f64,
    min_temp: f64,
    mean_temp: f64,
}
pub fn read_1billion_rows(path: &str) -> Result<(), std::io::Error> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);

    let mut city_temps: HashMap<String, CityTemp> = HashMap::new();

    for line in reader.lines() {
        let line = line?;
        let parts: Vec<&str> = line.split(';').collect();
        let city = parts[0].to_string();
        if let Ok(temp) = parts[1].parse::<f64>() {

            let city_temp = city_temps.entry(city).or_insert(CityTemp {
                acc_temp: 0.0,
                count_temp: 0,
                max_temp: -999f64,
                min_temp: 999f64,
                mean_temp: 0.0,
            });
            city_temp.acc_temp += temp;
            city_temp.count_temp += 1;
            city_temp.max_temp = city_temp.max_temp.max(temp);
            city_temp.min_temp = city_temp.min_temp.min(temp);
        }
    }

    // second pass for mean
    for (city, city_temp) in city_temps.iter_mut() {
        city_temp.mean_temp = city_temp.acc_temp / city_temp.count_temp as f64;
        println!("City: {}, Min: {}, Max: {}, Mean: {}", city, city_temp.min_temp, city_temp.max_temp, city_temp.mean_temp);
    }
    Ok(())
}

Attempt Three:

3mins and 9seconds! https://github.com/ryukyi/1brc/commit/9aec0ddcd04bbf485941d831a5ad821278028e7f

rust attempt 3

  // storing hashmap with bye key removes string conversions
  let mut city_temps: HashMap<Vec<u8>, CityTemp> = HashMap::new();
  // ...
  let city = parts[0].as_bytes().to_vec();

Attempt Four:

2mins and 9 seconds! https://github.com/ryukyi/1brc/commit/437c9910aea7587368ba7195f7cf0f643825840e

rust attempt 4

struct CityTemp {
    acc_temp: f64,
    count_temp: i64,
    max_temp: f64,
    min_temp: f64,
}

// remove mean from CityTemp since only need this at the final report:
struct CityTempReport {
    min_temp: f64,
    max_temp: f64,
    mean_temp: f64,
}
    // ...
    // read bytes instead of string and split manually
    let mut line = Vec::new();

    while reader.read_until(b'\n', &mut line)? > 0 {

        let parts = line.split(|&x| x == b';').collect::<Vec<&[u8]>>();
        // ...
          // still using temp string and removing line end
          if let Ok(temp_str) = std::str::from_utf8(temp_bytes) {
                      if let Ok(temp) = temp_str.trim_end().parse::<f64>() {
                        // ...
                        // Maximum and min don't both need to be checked
                        if temp > city_temp.max_temp {
                          city_temp.max_temp = temp;
                        } else if temp < city_temp.min_temp {
                            city_temp.min_temp = temp;
                        }

Attempt Five:

1min and 35 seconds! Ok so by now I'm out of optimization ideas so I started browsing open source solutions to learn how to:

profile code which I've never done before in rust
manipulate using integers instead of floats
optimize the hashmap

This had answers to all of my questions: https://curiouscoding.nl/posts/1brc/. What a powerful solution effectively shaving off 30%!

rust attempt 5

fn parse_float_as_ints(mut s: &[u8]) -> V {
    // efficent use of integers using base 10 to dynamically "bump out the decimal"
    // Only one float conversion required for final report. All credit to RagnarGrootKoerkamp!
    // https://github.com/RagnarGrootKoerkamp/1brc/blob/1fd779a2ae175b733793ca10ec94c73b769fee5e/src/main.rs

    // positive is most common so minimise handling negatives
    let is_pos = s[0] != b'-';
    if !is_pos {
        s = &s[1..];
    }

    // s = abc.d
    let (a, b, c, d) = match s {
        [c, b'.', d] => (0, 0, c - b'0', d - b'0'),
        [b, c, b'.', d] => (0, b - b'0', c - b'0', d - b'0'),
        [a, b, c, b'.', d] => (a - b'0', b - b'0', c - b'0', d - b'0'),
        [c] => (0, 0, 0, c - b'0'),
        [b, c] => (0, b - b'0', c - b'0', 0),
        [a, b, c] => (a - b'0', b - b'0', c - b'0', 0),
        _ => panic!("Unknown pattern: {:?}", std::str::from_utf8(s).unwrap()),
    };

    let v = a as V * 1000 + b as V * 100 + c as V * 10 + d as V;
    if is_pos {
        v
    } else {
        -v
    }
}

Attempt Six

After running synchronous and inspecting flamegraph, I'm out of ideas for how to optimize so onwards to async maximising available cores and threads.