benjamin.computer

Rust reverses research ruin

31-07-2019

I've been learning how to use Rust. It's quite the hot-hot thing in the world of programming at the moment, and I think I can see why: the safety that comes from its interesting borrowing mechanics, and the ease of installing libraries and writing tests. It's all there. I'm a big fan of compiled languages too, despite using Python an awful lot. I'm a bit old-school - I like C++ and C. Hell! I've even used Fortran! But I want to see where the modern edge of compiled languages is, and I think Rust is it. So much so that I now use it in my research, and I'd like to write a little about why.

Artificial Intelligence and Data

I'm a PhD student at the moment, working with Artificial Intelligence in a biology setting. That pretty much means either MATLAB (which is terrible - fight me internet!) or Python, which is now a little boring and getting a little too bloated for my liking. I'll admit, I couldn't do my work without TensorFlow or PyTorch (sad as that is, and yes, I know I can use other languages with TensorFlow), but before we get to the AI part, we need to work with the data.

AI needs lots of data. Sometimes there is enough data available, but it needs processing, and that takes time and computing power. Sometimes the data doesn't exist at all and you need to make it up. This is where my previous life as a Research Software Engineer comes in. Working with big iron is one of the things I like to do, and in such scenarios C++ is king (Fortran and Python occasionally get a look-in). If I want to write super-fast, close-to-the-metal code that runs over several threads, uses GPU compute or uses MPI across different nodes, I need something powerful.

What is so good about Rust?

I know, I know. Rust is all trendy and new: their website uses funky webfonts, they have a cute animal mascot and all the rest. Is it just hype, like any number of JS variants? Well, it could be, but then Microsoft announced they were looking at it, and I've heard of a few other large-scale projects using it (some of which I can't talk about because it's going to be a big surprise!).

To me, Rust seems to be trying to make low-level languages safe. I believe it goes about this by making you, the programmer, very aware of which block of code owns which variables. This is called Ownership, and I think it's the hardest and coolest concept Rust brings in (at least so far; I've not seen everything the language has to offer yet). Sometimes this can get very confusing and difficult to follow. I've occasionally found myself getting angry at not being able to add to or remove from a vector, or at passing a string to a function and modifying it, but eventually it clicks. Guaranteeing memory safety without a garbage collector doing all the cleanup is quite a feat. I've found this forces me to write code more slowly but better. I think that's important in any setting, but especially for research code.
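To make the ownership rules a bit more concrete, here is a minimal sketch of the three ways a String can be handed to a function: moved, borrowed mutably, or borrowed read-only. The function names are hypothetical, not from my simulator, and the commented-out line is the kind of thing the compiler rejects.

```rust
// Taking `String` by value moves ownership into the function;
// the caller can't use the original variable afterwards.
fn consume(s: String) -> usize {
    s.len()
}

// Taking `&mut String` lends the string temporarily so it can be modified in place.
fn add_suffix(s: &mut String) {
    s.push_str("!");
}

// Taking `&str` lends it read-only.
fn count_words(s: &str) -> usize {
    s.split_whitespace().count()
}

fn main() {
    let mut label = String::from("CEP152 image");
    add_suffix(&mut label); // the mutable borrow ends when the call returns
    println!("{}", label); // prints "CEP152 image!"
    println!("{}", count_words(&label)); // prints 2

    let mut v = vec![1, 2, 3];
    let first = v[0]; // copying an i32 out is fine; `let first = &v[0];` would
                      // block the push below while `first` is still in use
    v.push(4);
    println!("{} {:?}", first, v); // prints "1 [1, 2, 3, 4]"

    let n = consume(label); // ownership of `label` moves into `consume`
    // println!("{}", label); // error: borrow of moved value: `label`
    println!("{}", n); // prints 13
}
```

The `&mut` version is usually what you want when "passing a string to a function and modifying it": the caller keeps ownership and just lends it out for the duration of the call.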

The second thing I like about Rust is Cargo, its package manager and build tool. It's a little bit like the go command in Go, in so far as it uses a defined directory structure to look after your build variants, dependencies and tests. You don't have to use it, but the fact that it's there is a big win. I do really like CMake, but for now I can use Cargo and it all just works.

Speaking of tests, Rust and Cargo make it really easy to add them. Here's one I added earlier:


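#[cfg(test)]
mod tests {
    use super::*; // bring log_normal and normal into scope

    #[test]
    fn test_distribs() {
        // 0.7978846 is the log-normal density at x = 1 with mu = 0, sigma = 0.5
        let n = log_normal(0.0, 0.5, 1.0);
        println!("Sample {}.", n);
        assert!((n - 0.7978846).abs() < 0.001);

        // 0.3989 is the standard normal density at x = 0, i.e. 1/sqrt(2*pi)
        let p = normal(0.0, 1.0, 0.0);
        println!("Sample {}.", p);
        assert!((p - 0.3989).abs() < 0.001);
    }
}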

All you need to do is add it at the end of the file in question (typically main.rs) and then run:

cargo test

Job done! I hate writing tests, so making it easy (and dare-I-say-it, fun?) is a really good idea.
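The test above calls `normal` and `log_normal`, whose bodies I haven't shown. For anyone wanting to try it, here is a hypothetical sketch of probability density functions that would pass it. The argument order (mu, sigma, x) is an assumption inferred from the expected values, not necessarily how my simulator defines them.

```rust
use std::f64::consts::PI;

/// Normal (Gaussian) probability density at `x`.
/// Assumed argument order: mean `mu`, standard deviation `sigma`, point `x`.
fn normal(mu: f64, sigma: f64, x: f64) -> f64 {
    let z = (x - mu) / sigma;
    (-0.5 * z * z).exp() / (sigma * (2.0 * PI).sqrt())
}

/// Log-normal probability density at `x`: ln(x) is normal with mean `mu`
/// and standard deviation `sigma`. The density is zero for x <= 0.
fn log_normal(mu: f64, sigma: f64, x: f64) -> f64 {
    if x <= 0.0 {
        return 0.0;
    }
    // Change of variables: f(x) = normal density of ln(x), divided by x
    normal(mu, sigma, x.ln()) / x
}

fn main() {
    println!("{}", normal(0.0, 1.0, 0.0)); // 1/sqrt(2*pi), about 0.3989
    println!("{}", log_normal(0.0, 0.5, 1.0)); // about 0.7979
}
```

Placed in the same file as the test, `cargo test` should then pass; writing the log-normal in terms of the normal keeps the two consistent with each other.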

Thirdly, the options for parallel programming are quite good. Threads are still around, but we also get channels, mutexes and the Send and Sync marker traits out of the box, which let the compiler check that data is shared between threads safely.
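As a small illustration, here is a sketch that splits a sum across threads and collects the result two ways: partial sums sent down a channel, and a running total behind an `Arc<Mutex<_>>`. The function and its workload are made up for the example. If you tried to share an `Rc` instead of an `Arc`, the compiler would reject it, because `Rc` isn't `Send`.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Sums 0..n using `workers` threads (workers must be at least 1).
/// Returns the total computed two ways: via a channel and via a shared Mutex.
fn parallel_sum(n: u64, workers: u64) -> (u64, u64) {
    let (tx, rx) = mpsc::channel();
    // Arc<Mutex<u64>> is Send + Sync, so it can be shared across threads.
    let total = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();

    for w in 0..workers {
        let tx = tx.clone();
        let total = Arc::clone(&total);
        handles.push(thread::spawn(move || {
            // Worker w handles w, w + workers, w + 2*workers, ...
            let partial: u64 = (w..n).step_by(workers as usize).sum();
            tx.send(partial).expect("receiver is still alive");
            *total.lock().expect("mutex not poisoned") += partial;
        }));
    }
    // Drop the original sender so the receiver's iterator ends
    // once every worker's clone has been dropped.
    drop(tx);

    let via_channel: u64 = rx.iter().sum();
    for h in handles {
        h.join().expect("worker thread panicked");
    }
    let via_mutex = *total.lock().expect("mutex not poisoned");
    (via_channel, via_mutex)
}

fn main() {
    let (a, b) = parallel_sum(1_000_000, 8);
    println!("channel: {}, mutex: {}", a, b);
}
```

In practice you'd pick one of the two mechanisms; channels tend to be easier to reason about, since each worker owns its partial result until it sends it.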

Fourthly, there are some nice modern features that have made their way into the language on the ground floor. Closures are perhaps the most obvious, but there are also for...in loops, traits and iterators. It's nice to see these features make it over to a language closer to the metal. They can make for succinct and tight code. I like that.
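Here is a minimal sketch of closures, iterators and a for...in loop working on a flat buffer of pixel values; the function names and the "brightness threshold" idea are hypothetical, chosen just to have something image-shaped.

```rust
/// Indices of pixels brighter than `threshold`, using an iterator chain.
fn bright_indices(pixels: &[f32], threshold: f32) -> Vec<usize> {
    // A closure that captures `threshold` from the surrounding scope
    let is_bright = |v: f32| v > threshold;
    pixels
        .iter()
        .enumerate() // pairs of (index, &value)
        .filter(|&(_, &v)| is_bright(v)) // keep the bright ones
        .map(|(i, _)| i) // keep just the index
        .collect()
}

/// Mean pixel value with a plain for...in loop (NaN for an empty slice).
fn mean(pixels: &[f32]) -> f32 {
    let mut total = 0.0;
    for &v in pixels {
        total += v;
    }
    total / pixels.len() as f32
}

fn main() {
    let pixels = [0.1_f32, 0.8, 0.3, 0.9, 0.5];
    println!("{:?}", bright_indices(&pixels, 0.4)); // prints [1, 3, 4]
    println!("{}", mean(&pixels));
}
```

The iterator chain compiles down to a loop much like the one in `mean`, so the succinct version doesn't have to cost anything at run time.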

Fifthly, I think the documentation is pretty darn good! The beginners' Rust book is great, the standard library documentation is good, and even the various crates I've looked at have good documentation too! Excellent!

Finally, I can't speak generally about the speed of the language, but I've found it produces small, compact programs that run fast. I'll need to do more benchmarking in the future, but so far I like what I'm seeing.

Figure: CEP152 images, some of the images I'm trying to generate with Rust.

So what is bad?

There are a few things that I found tricky, and though I'm sure I'll get around them, I think they might cause a few issues for folks new to the language.

Firstly, the syntax can get dense and hard to follow quite quickly. Apostrophes (lifetimes such as 'a), double full stops (ranges such as 0..10), question marks (the ? error-propagation operator) and all these other little punctuation marks can be quite confusing. I'm not sure how I feel about that, but I'm sure I'll cope.
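For anyone else tripping over the punctuation, here is a small sketch with each of those marks in context: an apostrophe for a lifetime, double full stops for a range, and a question mark for early error return. The function names are hypothetical; parsing one number per line is a stand-in for reading values from a text file.

```rust
use std::num::ParseFloatError;

// `'a` (apostrophe) is a lifetime: the returned slice borrows from `values`
// and can't outlive it.
fn first_half<'a>(values: &'a [f64]) -> &'a [f64] {
    let mid = values.len() / 2;
    &values[..mid] // `..mid` (double full stop) is a range: indices 0 up to, not including, mid
}

// `?` (question mark) returns the error early if a line fails to parse,
// otherwise it unwraps the successful value.
fn parse_all(text: &str) -> Result<Vec<f64>, ParseFloatError> {
    let mut out = Vec::new();
    for line in text.lines() {
        let v: f64 = line.trim().parse()?;
        out.push(v);
    }
    Ok(out)
}

fn main() {
    let values = parse_all("0.1\n0.2\n0.3\n0.4").expect("all lines are numbers");
    println!("{:?}", first_half(&values)); // prints [0.1, 0.2]
    for i in 0..values.len() {
        // `0..values.len()` is another range, used for looping
        println!("{}: {}", i, values[i]);
    }
    println!("{}", parse_all("0.1\nnot a number").is_err()); // prints true
}
```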

Secondly, I had a lot of trouble with the mod keyword, the directory structure and how to divide up my code. Partly this is my fault, coming from a Python and C++ background, but I couldn't get my head around it initially, and I still don't think I've fully sorted it out. I think it's an area where the designers have admitted they are changing things around (the 2018 edition reworked how module paths work), so perhaps it's early days.
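Here is a minimal single-file sketch of how mod, pub, super and use fit together. The module and function names are hypothetical; in a real project each module would normally live in its own file, as the first comment describes.

```rust
// In a multi-file project, `mod distributions;` in src/main.rs tells the
// compiler to look for src/distributions.rs (or src/distributions/mod.rs).
// Here the modules are written inline so the sketch fits in one file.
mod distributions {
    // `pub` makes an item visible outside its module.
    pub fn uniform_mean(a: f64, b: f64) -> f64 {
        (a + b) / 2.0
    }

    // Private: only `distributions` and its child modules can call this.
    fn clamp01(x: f64) -> f64 {
        x.max(0.0).min(1.0)
    }

    pub mod sampling {
        // `super` refers to the parent module, `distributions`.
        pub fn clamped(x: f64) -> f64 {
            super::clamp01(x)
        }
    }
}

// `use` brings items into scope so they can be called without the full path.
use distributions::sampling::clamped;
use distributions::uniform_mean;

fn main() {
    println!("{}", uniform_mean(0.0, 1.0)); // prints 0.5
    println!("{}", clamped(1.7)); // prints 1
}
```

The key idea, coming from C++, is that a module is declared by its parent with `mod`, rather than being picked up automatically from whatever files happen to be in the directory.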

Finally, I think the borrowing and ownership rules can result in really verbose or hard-to-read code. I suspect there are idiomatic ways of doing things, much like writing Python Pythonically. So often I get the dreaded "cannot move out of borrowed content" error when all I want to do is make a little modification to a variable - just a little one. No! Wrist slap! I understand why the rule exists, but until I get really good, I have to write lots of lines where previously one would do.
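Here is a sketch of where that error comes from and three common ways around it. The `Sample` struct is hypothetical, standing in for some simulation state, and `std::mem::take` needs Rust 1.40 or later.

```rust
use std::mem;

// A hypothetical struct standing in for some simulation state.
struct Sample {
    name: String,
    values: Vec<f64>,
}

// With only `&Sample`, writing `let n = s.name;` fails with error E0507
// ("cannot move out of borrowed content" in 2019-era compilers).
// Fix 1: borrow the field instead of moving it.
fn name_len(s: &Sample) -> usize {
    let n: &String = &s.name;
    n.len()
}

// Fix 2: clone when you genuinely need an owned copy.
fn owned_name(s: &Sample) -> String {
    s.name.clone()
}

// Fix 3: with `&mut`, swap the field out for an empty Vec (its Default).
fn take_values(s: &mut Sample) -> Vec<f64> {
    mem::take(&mut s.values)
}

fn main() {
    let mut s = Sample { name: String::from("cep152"), values: vec![1.0, 2.0] };
    println!("{} {}", name_len(&s), owned_name(&s));
    let taken = take_values(&mut s);
    println!("{:?} left behind: {:?}", taken, s.values); // prints "[1.0, 2.0] left behind: []"
}
```

Cloning is the quickest fix but copies data; borrowing or swapping with `mem::take` avoids the copy, which matters more in a tight simulation loop.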

Some of the code in the various libraries I have no idea about. I mean, what does this code really do? I can't even parse it properly in my brain!


#[inline(always)]
pub fn err(&mut self, msg: &'static str) -> St {
    self.err = Some(types::CodecError { upto: self.pos, cause: msg.into_maybe_owned() });
    Default::default()
}

impl $dec {
    pub fn new() -> Box { box $dec { st: $stmod::$inist } as Box }
}

I think it's fair to say Rust code can be quite verbose, and to counter that, the designers use macros and shorthands to keep the absolute number of lines and characters down. That makes the code denser, however. Maybe I'll get better at reading it with time.
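For anyone equally puzzled, here is one reading of those fragments as a self-contained sketch in modern Rust; it is not the library's actual code. The `$dec`, `$stmod` and `$inist` names are macro placeholders, so the second fragment lives inside a `macro_rules!` definition, and `box` is an allocation syntax that isn't available in stable Rust. Every type and name below (`Decoder`, `St`, `Ctx`, `make_decoder!`, `AsciiDecoder`) is hypothetical.

```rust
// Every "decoder" the macro generates implements this (hypothetical) trait.
trait Decoder {
    fn name(&self) -> &'static str;
    fn state(&self) -> u8;
}

// `#[derive(Default)]` gives `St::default()`, here St(0).
#[derive(Default, Debug, PartialEq)]
struct St(u8);

struct CodecError {
    upto: usize,
    cause: String,
}

struct Ctx {
    pos: usize,
    err: Option<CodecError>,
}

impl Ctx {
    // Ask the compiler to inline this method at every call site.
    #[inline(always)]
    // `&'static str` is a string slice that lives for the whole program, e.g. a literal.
    pub fn err(&mut self, msg: &'static str) -> St {
        // Record where the error happened and why...
        self.err = Some(CodecError { upto: self.pos, cause: msg.to_string() });
        // ...then return the default value of the return type, St(0).
        Default::default()
    }
}

// `$dec` and `$label` are macro placeholders: each use of the macro stamps out
// a new struct, a trait impl and a `new()` constructor.
macro_rules! make_decoder {
    ($dec:ident, $label:expr) => {
        struct $dec {
            st: St,
        }

        impl Decoder for $dec {
            fn name(&self) -> &'static str {
                $label
            }
            fn state(&self) -> u8 {
                self.st.0
            }
        }

        impl $dec {
            // Returns a heap-allocated trait object; modern Rust spells this
            // `Box<dyn Decoder>` and `Box::new(...)` rather than `box`.
            pub fn new() -> Box<dyn Decoder> {
                Box::new($dec { st: St::default() })
            }
        }
    };
}

make_decoder!(AsciiDecoder, "ascii");

fn main() {
    let mut ctx = Ctx { pos: 7, err: None };
    let st = ctx.err("unexpected byte");
    if let Some(e) = &ctx.err {
        println!("error at {}: {} (returned {:?})", e.upto, e.cause, st);
    }
    let d = AsciiDecoder::new();
    println!("{} decoder in state {}", d.name(), d.state());
}
```

Note that a struct can have a field and a method with the same name (`err` here, as in the original fragment); `ctx.err(...)` calls the method and `ctx.err` reads the field.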

All of these are surmountable though, with enough time and proper training I think.

Simulators and supercomputers

In my current project I need to simulate a particular protein, CEP152. A simulator already exists, written by a chap called Christian Sieben. It's a cool little program, but it's written in MATLAB and not designed for high-throughput image generation. I figured, why not write this in Rust? It'll be tighter, faster and easier to parallelise.

The meat of this simulator is a set of distributions and matrix multiplications. So where do we start? We need a good linear algebra library, some useful maths functions and image manipulation. Here are some of the crates (Rust's name for libraries, kinda) I've used:

- nalgebra: a really nice matrix and vector library that I think will be useful for a lot of folks.
- scoped_threadpool: I'd recommend this one, as it made dealing with ownership and threads a little bit easier and tighter for me.
- argparse: essential for, well, parsing command line arguments.
- pbr: nice if you like progress bars, and why not?
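The appeal of scoped threads is that they can borrow data from the enclosing function. Newer versions of the standard library (Rust 1.63 and later) offer the same idea via `std::thread::scope`, so this sketch uses that rather than the scoped_threadpool crate. The `render` function and its gradient "pixels" are hypothetical placeholders for real simulation work.

```rust
use std::thread;

/// Fills a `width` x `height` image (row-major) using up to `threads` scoped threads.
/// Each thread gets a disjoint block of rows via `chunks_mut`, so no locking is needed.
/// Assumes width, height and threads are all at least 1.
fn render(width: usize, height: usize, threads: usize) -> Vec<f32> {
    let mut image = vec![0.0_f32; width * height];
    let rows_per_thread = (height + threads - 1) / threads; // ceiling division
    let chunk_len = rows_per_thread * width;

    thread::scope(|s| {
        for (chunk_idx, chunk) in image.chunks_mut(chunk_len).enumerate() {
            // `chunk` is a mutable borrow of part of `image`; the scope guarantees
            // the thread finishes before `image` can be used again.
            s.spawn(move || {
                for (i, px) in chunk.iter_mut().enumerate() {
                    let idx = chunk_idx * chunk_len + i;
                    let (x, y) = (idx % width, idx / width);
                    // Placeholder "rendering": a simple gradient
                    *px = (x + y) as f32;
                }
            });
        }
    }); // all threads spawned in the scope are joined here

    image
}

fn main() {
    let img = render(640, 480, 24);
    println!("{} pixels, last = {}", img.len(), img[img.len() - 1]);
}
```

Because each thread owns a disjoint `&mut` slice, the borrow checker is happy without any `Arc` or `Mutex`, which is exactly the "easier and tighter" feeling scoped threads give.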

Many of the functions in MATLAB don't exist in Rust. Inverse cumulative distribution functions, for instance, don't quite exist yet, so I wrote my own! It took me a while to find the actual algorithms, but it was fun. Sadly, I couldn't manage some of them, so I had to generate a set of values from the MATLAB functions, save them as a text file, and use them inside Rust. It's not the best solution, but it works for now.
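I won't reproduce the specific algorithms here, but a generic fallback is easy to sketch: any continuous, increasing CDF can be inverted numerically by bisection. This is a swapped-in technique, not what my simulator does, and the exponential distribution is used only because its inverse has a closed form to check against. Feeding the inverse a uniform random number in (0, 1) gives inverse transform sampling.

```rust
/// Numerically inverts a continuous, non-decreasing CDF by bisection:
/// returns x in [lo, hi] with cdf(x) approximately equal to p.
/// Assumes cdf(lo) <= p <= cdf(hi).
fn inverse_cdf<F: Fn(f64) -> f64>(cdf: F, p: f64, mut lo: f64, mut hi: f64) -> f64 {
    // 100 halvings is far more than f64 precision needs for sensible brackets
    for _ in 0..100 {
        let mid = 0.5 * (lo + hi);
        if cdf(mid) < p {
            lo = mid; // the answer lies to the right of mid
        } else {
            hi = mid; // the answer lies at or to the left of mid
        }
    }
    0.5 * (lo + hi)
}

/// CDF of the exponential distribution with rate `lambda`.
fn exp_cdf(lambda: f64, x: f64) -> f64 {
    if x < 0.0 { 0.0 } else { 1.0 - (-lambda * x).exp() }
}

fn main() {
    let lambda = 2.0;
    let p = 0.75;
    let approx = inverse_cdf(|x| exp_cdf(lambda, x), p, 0.0, 100.0);
    let exact = -(1.0_f64 - p).ln() / lambda; // closed-form inverse for comparison
    println!("bisection: {}, closed form: {}", approx, exact);
}
```

Bisection is slow compared to a dedicated approximation, but it only needs the CDF, so it can stand in while a faster algorithm (or a lookup table) is sorted out.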

fitrs deserves a special mention. I didn't know this, but FITS (Flexible Image Transport System) is an image format from astronomy, used heavily by NASA, that has the nice ability of saving floating point values for pixels. That's great, especially when you are dealing with data that needs a little scaling and normalising.
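Floating point pixels matter because you can rescale without first rounding everything to 8-bit integers. Here is a minimal sketch of min-max normalisation on a flat `f32` image buffer; it doesn't touch fitrs itself, and the function name and sample values are hypothetical.

```rust
/// Linearly rescales pixel values in place so the minimum maps to 0.0
/// and the maximum to 1.0. Constant (or empty) images are left unchanged.
fn normalise(pixels: &mut [f32]) {
    let min = pixels.iter().copied().fold(f32::INFINITY, f32::min);
    let max = pixels.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    if range > 0.0 {
        for p in pixels.iter_mut() {
            *p = (*p - min) / range;
        }
    }
}

fn main() {
    // Raw intensities from a hypothetical simulation, with arbitrary scale
    let mut image = vec![120.5_f32, 300.0, 80.25, 410.75];
    normalise(&mut image);
    println!("{:?}", image);
}
```

Keeping the result as `f32` means the full dynamic range survives until the very last step, when you choose how (or whether) to quantise it.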

After rewriting my simulator in Rust, I get an insane speedup over the MATLAB version, although my computer sounds like a hurricane when all 24 cores max out. I get 100,000 images generated in a few hours instead of several days.

In conclusion...

I like Rust. It's being used in embedded systems, which is exciting, but I can see a lot of potential for HPC, graphics, research and game engines! I've even seen a few folks, like Logicoma, use it in demoscene work. According to Wikipedia and the Stack Overflow developer survey, Rust is the most loved language of 2019, and I think I can see why. It's tight, complex and dense, with batteries included. I'll be using it as my go-to language from now on, I reckon.

