The other day I was going through some old backups and ran into an archive of a software project I worked on 15 years ago. It was a C project whose goal was to parallelize a very popular and widely used bioinformatics tool called clustalw. At that time bioinformatics was what “big data” is today (or should I say it is the other way around). Anyone could join the field; there were not yet any schools offering advanced degrees in bioinformatics. I joined a group at the Plant Biotechnology Institute with the National Research Council of Canada as an intern (this was when I was a CS student at the University of Saskatchewan) just by asking if they needed anyone (ah, those times of opportunity)!
Clustalw was then, and remains today, the most popular tool for aligning multiple genetic sequences. Without going into the reason why you would want to “align sequences” (it has to do with finding their evolutionary history), clustalw was a sequential app written in C by people who were biology PhDs by training (shudder…). In the year 2000 the “multi-core thing” did not exist. If you wanted to run things in parallel, “big iron” was the answer - SGI, Sun, HP/Compaq, IBM etc. had their own multi-processor machines that cost quite a lot to buy, ran proprietary versions of Unix and were very costly to keep going. SGI had a version of clustalw, but you could only get the binaries and you could only run them on SGI hardware/software. Hence, the folks at PBI thought they could help people along with an open-source version of clustalw, since it was so popular. Enter “lil’ old me”.
After a few months of working on this, I came up with a pthreads-based version of clustalw that scaled on multi-CPU architectures. It was mostly the same (ugly) code base as the sequential original; however, I had identified two portions of the software that could be parallelized. For testing I had access to various multi-processor installations, most of them government computers somewhere far away in Ottawa (things like a 104-processor Sun Solaris machine in some governmental lab or a 12-CPU SGI Irix machine or…). At the same time, my group had hand-built one of the first Beowulf clusters in the country - it was a 16-node cluster running Linux, where each node had 2 Intel x86 32-bit processors. At that time it was a screwdriver-in-hand, build-from-scratch kind of effort and the group was very proud of what it had done.
Enter the reason for this post. I ran into a comment I had put into the code regarding a thread function I had written. In those times (I had discovered early on), the best way to run things was to fire up a pool of worker threads and then stack the work for them in a FIFO queue. The workers' only task was to take work off this queue until it was exhausted. Once it was exhausted, the worker threads would idle. There was usually one more worker thread than physical CPUs on the system. Various OSs differed in their level of sophistication when it came to giving the programmer the ability to “pin a thread to a CPU” - Linux at that time did not have that functionality. Obviously, keeping a thread on a CPU sometimes has its advantages, especially if the thread is long-running - a scheduler that has to move the thread around CPUs will spend some time doing that instead of allowing the thread to do the work it was designed to do.
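For anyone who has not seen the pattern, here is a minimal sketch of a worker pool draining a FIFO queue with pthreads. It is not the original clustalw code: `work_queue`, `worker` and `run_pool` are names I made up for illustration, the "work" is just squaring integers, and the workers exit when the queue is empty instead of idling (idling would need a condition variable on top of this).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical illustration only; none of these names come from clustalw. */
typedef struct {
    const int      *items;  /* work items, handed out front to back (FIFO) */
    size_t          count;  /* total number of items */
    size_t          next;   /* index of the next unclaimed item */
    long            total;  /* combined result, guarded by lock */
    pthread_mutex_t lock;   /* protects next and total */
} work_queue;

/* Each worker claims one item at a time until the queue is exhausted. */
static void *worker(void *arg)
{
    work_queue *q = arg;
    long local = 0;              /* thread-local partial result */

    for (;;) {
        pthread_mutex_lock(&q->lock);
        if (q->next == q->count) {
            q->total += local;   /* publish the partial result, then exit */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        int item = q->items[q->next++];
        pthread_mutex_unlock(&q->lock);

        local += (long)item * item;   /* stand-in for the real work */
    }
}

/* Starts nthreads workers over items and returns the sum of squares. */
long run_pool(const int *items, size_t count, int nthreads)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    int started = 0;
    work_queue q;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    q.items = items;
    q.count = count;
    q.next = 0;
    q.total = 0;
    pthread_mutex_init(&q.lock, NULL);

    /* Start the pool; stop early if a thread cannot be created. */
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &q) == 0)
        started++;

    /* Wait for every worker that actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&q.lock);
    return q.total;
}
```

Calling `run_pool(items, n, ncpus + 1)` would follow the "one more worker than CPUs" rule of thumb from above. Handing out one item per lock acquisition keeps the sketch short; with very small work items the lock itself would become the bottleneck.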
Anyways, one of the comments said that on Linux, the multi-threaded version produced higher-precision results than the original sequential version of the software. So, if the original sequential result was, e.g., 0.12345, the multi-threaded version would sometimes produce 0.12346. But only on Linux/x86. Not on Linux/ia64 (anyone remember Itanium?), not on SGI Irix, not on Sun Solaris running on Sun hardware, not on Digital Unix running on Alpha architecture.
Turns out the fix I found was to declare that particular variable “volatile”. What volatile did, in reality, was flag this variable for the compiler (as per the C standard) as one that could change at any time (for lack of a better description). This meant the compiler could not cache the variable in a register; every read and write had to go to memory. The variable was a floating-point value that occupied 64 bits in memory (a double, in C terms, on these platforms). However, it turns out that gcc (the C compiler at the time) liked to keep variables in the x87 floating-point registers (which were 80-bit), at a higher precision. What volatile effectively did to fix this “problem” was force gcc to generate code that kept the variable at the lower precision in memory. Obviously this was not a great solution, since the thread was forced to go to memory on every access to the variable even though there was no need to do so (the variable was local to the thread and not visible to any other thread, so no race condition could ever exist on it).
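To make the effect concrete, here is a small sketch of the same computation with and without a volatile accumulator. The function names are mine, not from the original code, and whether the two versions actually disagree depends entirely on the compiler and target: on a modern x86-64 build that uses SSE registers they should agree, while on 32-bit x87 code the plain version may carry 80-bit intermediates. The standard `FLT_EVAL_METHOD` macro from `<float.h>` tells you which situation you are in.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical illustration; not the original clustalw code. */

/* The accumulator may stay in a register for the whole loop. On x87 that
   can mean partial sums are carried at 80-bit extended precision. */
double sum_plain(const double *xs, size_t n)
{
    double acc = 0.0;

    assert(xs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = acc + xs[i];
    return acc;
}

/* volatile forces a load and a store of acc on every iteration, so each
   partial sum is rounded to a 64-bit double in memory. */
double sum_volatile(const double *xs, size_t n)
{
    volatile double acc = 0.0;

    assert(xs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = acc + xs[i];
    return acc;
}

/* Reports how floating-point expressions are evaluated:
   0 = in the operand's own type, 1 = float as double,
   2 = everything as long double, -1 = indeterminate. */
int eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}
```

If I were fixing this today I would probably reach for gcc's `-ffloat-store` option or an SSE build before sprinkling volatile around, but I have not gone back to try either against the old code.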
Why this post? It brought back memories of the kind of problems we used to solve just 15 years ago! Today I work with Scala and Akka and concurrency is a whole new ball game. At some level Akka will let me pin an actor to a thread but in a multi-node actor system I may not even know what machine an actor lives on. Things like Spark or Ignite make this even more high-level - you have some kind of an abstraction to send a closure around the cluster, it magically gets executed and you get the results back. The framework worries about atomicity, data affinity/colocation etc.
I wonder though (and worry about it too!), how many software engineers or programmers today know what to do (and how to chase down a low-level bug) when things go wrong ;). So many layers of abstraction, so many things to worry about: the compiler for Scala, the library for Akka, the underlying JVM implementation of it all, then the increasingly complex operating system, the networking layer, the fact that it all runs in the cloud where someone else makes up the rules. We have come increasingly to rely on standards and even more than that, promises. Promises that things will work as someone swears they will, as someone exposes the functionality to do so. However, when things fail, well, topic for another post.
P.S. Nothing wrong with promises - it is how we make progress.
Ref: a mailing list discussion about the issue I mentioned. The dates are from 2004 but the actual discussion happened some time in 2000 or 2001 (I believe 2004 is the timestamp when it was all archived): Link to archive