I'd like to introduce the first Parallel Haskell Digest, a newsletter aiming to show off all the work that's going on using parallelism and concurrency in the Haskell community.
The digest is a bit of an experiment. For the community in general, I hope to help you catch up with a monthly recap of news, interesting blog posts and discussions about parallelism in Haskell. For people (like me) who are new to parallelism and concurrency in Haskell, or maybe just have a passing interest, I hope to give you little nibbles, with regular features like the Word of Month, Featured Code and Parallel Puzzlers.
Finally, as a bit of disclosure, I'm writing this digest as part of my work to promote the Parallel GHC Project. So don't be surprised if I give it a little special emphasis :-)
Word of the Month
This very first edition of the PH digest is brought to you by the word spark. You may have heard of sparks and threads in the same sentence. What's the difference?
A Haskell thread is a thread of execution for IO code. Multiple Haskell threads can execute IO code concurrently and they can communicate using shared mutable variables and channels.
Sparks are specific to parallel Haskell. Abstractly, a spark is a pure
computation which may be evaluated in parallel. Sparks are introduced
with the par combinator; the expression (x par
y) "sparks off" x,
telling the runtime that it may evaluate the value of x in parallel to
other work. Whether or not a spark is evaluated in parallel with other
computations, or other Haskell IO threads, depends on what your hardware
supports and on how your program is written. Sparks are put in a work
queue and when a CPU core is idle, it can execute a spark by taking one
from the work queue and evaluating it.
On a multi-core machine, both threads and sparks can be used to achieve parallelism. Threads give you concurrent, non-deterministic parallelism, while sparks give you pure deterministic parallelism.
Haskell threads are ideal for applications like network servers where you need to do lots of I/O and using concurrency fits the nature of the problem. Sparks are ideal for speeding up pure calculations where adding non-deterministic concurrency would just make things more complicated.
Parallel GHC project news
The Parallel GHC Project is an MSR-funded project to push the real-world use of parallel Haskell. Part of this project involves effort by Well-Typed to provide tools for use by the general community:
Work shall soon begin working on making the "Modified Additive Lagged Fibonacci" and perhaps other random number generators from the SPRNG library available in Haskell. Current plans are to implement the algorithms directly in Haskell, and to expose them as instances of System.Random.RandomGen. These generators are attractive for use in Monte Carlo simulations because they are splittable and have good statistical quality, while providing high performance.
Work is underway on extending the GHC EventLog and associated tools (ghc-events, ThreadScope). The aim is to support profiling of multi-process or distributed Haskell systems such as client/server or MPI programs. This involves incorporate some changes made in related projects (Eden, EdenTV). This work may have some benefits even for profiling single-process programs. It should also allow comparative profiling where multiple runs of the same program (e.g. different inputs or slightly different code) are viewed on the same timeline.
For more information on the Parallel GHC project, see the Haskell wiki page.
Featured Code
The feature code for this month is hulk, an IRC server in 1250 lines of code. Hulk was coded up by Chris Done in one evening, and it was used the next morning. (Chris has done a bit of cleaning up since).
Here's what Chris had to say about his experience writing Hulk:
Haskell's lightweight threads make it natural (and guilt-free!) to choose the design of one-thread-per-client. With a single MVar containing a pure value for the whole server-state, and the help of a monad stack of ReaderT(connection information), WriterT (replies) and StateT (user details), it was trivial to make the bulk of the code completely pure. LineBuffering on the (Handle-wrapped) sockets tripped me up; as opposed to NoBuffering, this did not behave as expected: many threads would block when only one should have. Overall, it was textbook Haskell; keep the main code pure, and only the truly impure in the outer shell.
It's up on hackage so you can install it with a quick
cabal install hulk
You can also fork his code on GitHub.
Got a cool use of Parallel Haskell or an interesting problem? Nominate it for the PH digest featured code/parallel puzzler!
Blog Posts
- Parallelism is not Concurrency - Bob Harper (17 March)
- Petri Net Concurrency - Edward Z. Yang (3 March)
Parallel-Haskell and Haskell Cafe
- How does the Yesod web framework deal with concurrency? Ertugrul Soeylemez is using concurrent Haskell to serve up two related sites and spawn off some background workers would pose any problems with Yesod and WAI. No problem, according to Michael Snoyman
- Concurrency best practices? Wren Ng Thorton is using STM to "run a lot of things in parallel without the headaches of locks." It works great, but then what if you want IO? How would you enforce atomic IO operations, eg. printing log messages in two threads without getting garbage on screen? Some suggestions were the twighlight-stm package, and using STM.TChan with potential performance penalty.
- Possible bug in Control.Concurrent. Krzysztof Skrzętnicki encountered an unexpected deadlock using Control.Concurrent. Was it a normal consequence of repeatedly reading from channel with only a single element put into it, or a bug?
- Faster timeout but is it correct? Bas van Dijk proposed increasingly faster potential replacements for System.Timeout, including some that use the new GHC event manager. After much scrutiny, it turns out that one version is not faster and the events based one had a bug. Maybe next time!
- Unable to get parallelism using
par
. C K Kashyap followed A tutorial on Parallel and Concurrent programming in Haskell but couldn't get a speed-up. Others had better luck with different sets of inputs.
- SMP parallelism gains inferior than expected.. José Pedro Magalhães also tried using SMP parallelism without getting performance gains he was hoping for. Simon Marlow suggested analysing with Threadscope, checking for too many/few sparks, and checking to see if a lot of time was being spent in GC.
- Auto elimination of MVars using a monad or monad transformer. Chris Dew asked if GHC eliminates MVars during compilation, or if there was a way to eliminate them automatically. Ryan Ingram suggested looking into the Adaptive package and Functional Reactive Programming.
Stack Overflow
- Parallelism on divide & conquer algorithm (7 Feb)
- How to best synchronize game engine and network server in Haskell? (11 Feb)
- Haskell concurrency and Handles (27 Feb)
Help and feedback
Got something for the digest? Send me an email at eric@well-typed.com. I'm particularly interested in short parallelism or concurrency puzzles, cool projects for featured code. Other comments and feedback (criticism and corrections especially!) would be welcome too.