my rust attempt at the 1 billion row challenge
My attempt and lessons learned from 1 billion row count optimzations in rust.
The task
https://github.com/gunnarmorling/1brc read max, min, mean temperatures from a 1 billion row file.
Pre-empting whats slow
When considering possible optimizations I'm trying to pre-empt what memory intensive operations might be. Maybe things like:
- big O notation e.g. reading line parts more than once
- memory e.g. streaming with buffers and clearing buffers for reuse
- data structures e.g. hashmaps for fast lookups. key-value pairs are always best when revisiting data.
- unnecessary type conversions for example str, float, int. Integers are FAST.
- how to calculate floats using integers. Integers are FAST.
Attempt One:
Took longer than 20 minutes so I gave up waiting.
Attempt Two:
Took 17 mins! https://github.com/ryukyi/1brc/commit/90610eca291309590eebcb430e8ef9852423265b
Using line reader with strings and string hashmap:
Attempt Three:
3mins and 9seconds! https://github.com/ryukyi/1brc/commit/9aec0ddcd04bbf485941d831a5ad821278028e7f
// storing hashmap with bye key removes string conversions
let mut city_temps: HashMap<Vec<u8>, CityTemp> = HashMap::new();
// ...
let city = parts[0].as_bytes().to_vec();
Attempt Four:
2mins and 9 seconds! https://github.com/ryukyi/1brc/commit/437c9910aea7587368ba7195f7cf0f643825840e
struct CityTemp {
acc_temp: f64,
count_temp: i64,
max_temp: f64,
min_temp: f64,
}
// remove mean from CityTemp since only need this at the final report:
struct CityTempReport {
min_temp: f64,
max_temp: f64,
mean_temp: f64,
}
// ...
// read bytes instead of string and split manually
let mut line = Vec::new();
while reader.read_until(b'\n', &mut line)? > 0 {
let parts = line.split(|&x| x == b';').collect::<Vec<&[u8]>>();
// ...
// still using temp string and removing line end
if let Ok(temp_str) = std::str::from_utf8(temp_bytes) {
if let Ok(temp) = temp_str.trim_end().parse::<f64>() {
// ...
// Maximum and min don't both need to be checked
if temp > city_temp.max_temp {
city_temp.max_temp = temp;
} else if temp < city_temp.min_temp {
city_temp.min_temp = temp;
}
Attempt Five:
1min and 35 seconds! Ok so by now I'm out of optimization ideas so I started browsing open source solutions to learn how to:
- profile code which I've never done before in rust
- manipulate using integers instead of floats
- optimize the hashmap
This had answers to all of my questions: https://curiouscoding.nl/posts/1brc/. What a powerful solution effectively shaving off 30%!
fn parse_float_as_ints(mut s: &[u8]) -> V {
// efficent use of integers using base 10 to dynamically "bump out the decimal"
// Only one float conversion required for final report. All credit to RagnarGrootKoerkamp!
// https://github.com/RagnarGrootKoerkamp/1brc/blob/1fd779a2ae175b733793ca10ec94c73b769fee5e/src/main.rs
// positive is most common so minimise handling negatives
let is_pos = s[0] != b'-';
if !is_pos {
s = &s[1..];
}
// s = abc.d
let (a, b, c, d) = match s {
[c, b'.', d] => (0, 0, c - b'0', d - b'0'),
[b, c, b'.', d] => (0, b - b'0', c - b'0', d - b'0'),
[a, b, c, b'.', d] => (a - b'0', b - b'0', c - b'0', d - b'0'),
[c] => (0, 0, 0, c - b'0'),
[b, c] => (0, b - b'0', c - b'0', 0),
[a, b, c] => (a - b'0', b - b'0', c - b'0', 0),
_ => panic!("Unknown pattern: {:?}", std::str::from_utf8(s).unwrap()),
};
let v = a as V * 1000 + b as V * 100 + c as V * 10 + d as V;
if is_pos {
v
} else {
-v
}
}
Attempt Six
After running synchronous and inspecting flamegraph, I'm out of ideas for how to optimize so onwards to async maximising available cores and threads.