What is a good way to design/structure large functional programs, especially in Haskell?

I've been through a bunch of the tutorials (Write Yourself a Scheme being my favorite, with Real World Haskell a close second) - but most of the programs are relatively small, and single-purpose. Additionally, I don't consider some of them to be particularly elegant (for example, the vast lookup tables in WYAS).

I'm now wanting to write larger programs, with more moving parts - acquiring data from a variety of different sources, cleaning it, processing it in various ways, displaying it in user interfaces, persisting it, communicating over networks, etc. How could one best structure such code to be legible, maintainable, and adaptable to changing requirements?

There is quite a large literature addressing these questions for large object-oriented imperative programs. Ideas like MVC, design patterns, etc. are decent prescriptions for realizing broad goals like separation of concerns and reusability in an OO style. Additionally, newer imperative languages lend themselves to a 'design as you grow' style of refactoring to which, in my novice opinion, Haskell appears less well-suited.

Is there an equivalent literature for Haskell? How is the zoo of exotic control structures available in functional programming (monads, arrows, applicative, etc.) best employed for this purpose? What best practices could you recommend?

Thanks!

EDIT (this is a follow-up to Don Stewart's answer):

@dons mentioned: "Monads capture key architectural designs in types."

I guess my question is: how should one think about key architectural designs in a pure functional language?

Consider the example of several data streams, and several processing steps. I can write modular parsers for the data streams to a set of data structures, and I can implement each processing step as a pure function. The processing steps required for one piece of data will depend on its value and others'. Some of the steps should be followed by side-effects like GUI updates or database queries.

What's the 'Right' way to tie the data and the parsing steps in a nice way? One could write a big function which does the right thing for the various data types. Or one could use a monad to keep track of what's been processed so far and have each processing step get whatever it needs next from the monad state. Or one could write largely separate programs and send messages around (I don't much like this option).

The slides he linked have a Things we Need bullet: "Idioms for mapping design onto types/functions/classes/monads". What are the idioms? :)

9 upvote
  flag
I think the core idea when writing large programs in a functional language is small, specialized, and stateless modules communicating via message passing. Of course you have to pretend a bit because a true program needs state. I think this is where F# shines over Haskell. – ChaosPandion
18 upvote
  flag
@Chaos but only Haskell enforces statelessness by default.You have no choice, and have to work hard to introduce state (to break compositionality) in Haskell :-) – Don Stewart
upvote
  flag
@Don - Yeah I know, but I am one of this best of both worlds type guys. – ChaosPandion
upvote
  flag
@caffeine - At first it seems like a lot of work but you would be surprised at how awesome it is to have largely separate programs sending messages. – ChaosPandion
7 upvote
  flag
@ChaosPandion: I don't disagree, in theory. Certainly, in an imperative language (or a functional one designed around message-passing), that might very well be what I'd do. But Haskell has other ways to deal with state, and perhaps they let me keep more of the 'pure' benefits. – Dan
upvote
  flag
I wrote a bit about this under "Design Guidelines" in this document: community.haskell.org/~ndm/downloads/… – Neil Mitchell
upvote
  flag
"What is a good way to design/structure large functional programs, especially in Haskell?". In his presentation "Engineering Large Projects in Haskell", Don Stewart says Galois have developed up to 0.2MLOC Haskell programs which is tiny by modern standards (many of the world's largest code bases are now over 10MLOC!). Given that Galois are pretty much the only major industrial user of Haskell in the world I'd say that nobody has ever developed a large Haskell program so nobody will know how to structure one. – Jon Harrop
upvote
  flag
@ChaosPandion Note that you can do state in haskell with infinite recursion or monads for example. – fread2281
4 upvote
  flag
@JonHarrop let's not forget that while MLOC is a good metric when you compare projects in similar languages, it doesn't make much sense for cross-language comparison, especially with languages like Haskell, where code reuse and modularity is much more easy & safe compared to some languages out there. – Tair
1 upvote
  flag
@tair: "languages like Haskell, where code reuse and modularity is much more easy & safe". I find that statement odd when Haskell has only a vestigial module system compared to ML. Why are there at least two >1MLOC OCaml code bases in industry (Jane St. and Citrix) but no Haskell ones? – Jon Harrop
upvote
  flag
@JonHarrop I didn't mean 'modularity' like in Java, it was more about not needing much glue code or optimizing data structures when using libraries. May be 'modularity' is not the right word here. I didn't have BIG projects in haskell, but I always was amazed how little code I end up with when I know that a library most probably handles its tasks in most efficient way. I think, most of this comes from lazyness of lang. Is OCaml lazy? – Tair
upvote
  flag
@tair: OCaml is not lazy. I studied several OSS Haskell code bases and didn't think laziness seemed particularly advantageous. – Jon Harrop
1 upvote
  flag
Can you elaborate on how "monads capture key architectural designs in types"? – user3594595

8 Answers 11

up vote 484 down vote accepted

I talk a bit about this in Engineering Large Projects in Haskell and in the Design and Implementation of XMonad. Engineering in the large is about managing complexity. The primary code structuring mechanisms in Haskell for managing complexity are :

The type system

  • Use the type system to enforce abstractions, simplifying interactions.
  • Enforce key invariants via types
    • (e.g. that certain values cannot escape some scope)
    • That certain code does no IO, does not touch the disk
  • Enforce safety: checked exceptions (Maybe/Either), avoid mixing concepts (Word,Int,Address)
  • Good data structures (like zippers) can make some classes of testing needless, as they rule out e.g. out of bounds errors statically.

The profiler

  • Provide objective evidence of your programs heap and time profiles.
  • Heap profiling, in particular, is the best way to ensure no unnecessary memory use.

Purity

  • Reduce complexity dramatically by removing state. Purely functional code scales, because it is compositional. All you need is the type to determine how to use some code -- it won't mysteriously break when you change some other part of the program.
  • Use lots of "model/view/controller" style programming: parse external data as soon as possible into purely functional data structures, operate on those structures, then once all work is done, render/flush/serialize out. Keeps most of your code pure

Testing

  • QuickCheck + Haskell Code Coverage, to ensure you are testing the things you can't check with types.
  • GHC +RTS is great for seeing if you're spending too much time doing GC.
  • QuickCheck can also help you identify clean, orthogonal APIs for your modules. If the properties of your code are difficult to state, they're probably too complex. Keep refactoring until you have a clean set of properties that can test your code, that compose well. Then the code is probably well designed too.

Monads for Structuring

  • Monads capture key architectural designs in types (this code accesses hardware, this code is a single-user session, etc .)
  • E.g. the X monad in xmonad, captures precisely the design for what state is visible to what components of the system.

Type classes and existential types

  • Use type classes to provide abstraction: hide implementations behind polymorphic interfaces.

Concurrency and parallelism

  • Sneak par into your program to beat the competition with easy, composable parallelism.

Refactor

  • You can refactor in Haskell a lot. The types ensure your large scale changes will be safe, if you're using types wisely. This will help your codebase scale. Make sure that your refactorings will cause type errors until complete.

Use the FFI wisely

  • The FFI makes it easier to play with foreign code, but that foreign code can be dangerous.
  • Be very careful in assumptions about the shape of data returned.

Meta programming

  • A bit of Template Haskell or generics can remove boilerplate.

Packaging and distribution

  • Use Cabal. Don't roll your own build system. (EDIT: Actually you probably want to use Stack now for getting started.).
  • Use Haddock for good API docs
  • Tools like graphmod can show your module structures.
  • Rely on the Haskell Platform versions of libraries and tools, if at all possible. It is a stable base. (EDIT: Again, these days you likely want to use Stack for getting a stable base up and running.)

Warnings

  • Use -Wall to keep your code clean of smells. You might also look at Agda, Isabelle or Catch for more assurance. For lint-like checking, see the great hlint, which will suggest improvements.

With all these tools you can keep a handle on complexity, removing as many interactions between components as possible. Ideally, you have a very large base of pure code, which is really easy to maintain, since it is compositional. That's not always possible, but it is worth aiming for.

In general: decompose the logical units of your system into the smallest referentially transparent components possible, then implement them in modules. Global or local environments for sets of components (or inside components) might be mapped to monads. Use algebraic data types to describe core data structures. Share those definitions widely.

7 upvote
  flag
Thanks Don, your answer is excellent - these are all valuable guidelines and I will refer to them regularly. I guess my question occurs a step before one would need all this, though. What I'd really like to know are the "Idioms for mapping design onto types/functions/classes/monads" ... I could try to invent my own, but I was hoping there might be a set of best practices distilled somewhere - or if not, recommendations for well-structured code to read of a large-ish system (as opposed to, say, a focused library). I edited my post to ask this same question more directly. – Dan
5 upvote
  flag
I've added some text on decomposition of design to modules. Your goal is to identify logically related functions into modules that have referentially transparent interfaces with other parts of the system, and to use purely functional data types as soon as possible, as much as possible, to model the outside world safely. The xmonad design document covers a lot of this: xmonad.wordpress.com/2009/09/09/… – Don Stewart
upvote
  flag
Thanks again! The xmonad design document is just what I was looking for. Time to read some code... – Dan
upvote
  flag
One thing that I think could be added to this question is an explanation of how Haskell helps you avoid giant switch/case statements. In an OOP language whenever you see one it's a sign you should be using polymorphism and virtual functions, and it's one of the basic mechanisms through which OOP helps code scale, even just by helping physical layout -- all those case handlers when redone as virtual functions are going to belong in different classes with their own files. But how do you avoid the giant switch/case in Haskell? I think this is what the OP is getting at as regards the WYAS tables. – Joseph Garvin
upvote
  flag
Hmm. Giant switches? Haven't seen those in Haskell. Don't we just dispatch to type class methods? – Don Stewart
upvote
  flag
I'm no Haskell expert but I thought type class methods couldn't be virtual? The giant switches you see in C like languages are for runtime polymorphism, and usually the better alternative is to use virtual methods. If I understand Haskell type classes correctly, at a given call site which version of a type class method will be called is fixed at compile time. – Joseph Garvin
1 upvote
  flag
Sometimes the type isn't known statically, so a dictionary of methods is supplied as a parameter (in GHC's dictionary-passing implementation). – Don Stewart
3 upvote
  flag
I tried to download the slides from the Engineering Large Projects in Haskell talk, but the link appeared to be broken. Here's a working one: galois.com/~dons/talks/dons-londonhug-decade.pdf – mik01aj
upvote
  flag
The dowload link at xmonad.wordpress.com/2009/09/09/… for the slides does not work :( – Riccardo T.
3 upvote
  flag
I managed to found this new download link: pau-za.cz/data/2/sprava.pdf – Riccardo T.
1 upvote
  flag
I find this comment interesting: "Make sure that your refactorings will cause type errors until complete." – jameshfisher
upvote
  flag
This answer gives many good pieces of advice, but most are fairly obvious, like "use purity", "use the type system", "be careful with the FFI". I agree with Dan that this topic "Monads capture key architectural designs in types" should have its own page long answer. – Lii
1 upvote
  flag
@Riccardo now it's 404 too :( – Cynede
3 upvote
  flag
@Heather Even though the download link at the page I mentioned in the comment right before does not work, it looks like the slides can still be viewed on scribd: scribd.com/doc/19503176/The-Design-and-Implementation-of-xmo‌​nad – Riccardo T.

Designing large programs in Haskell is not that different from doing it in other languages. Programming in the large is about breaking your problem into manageable pieces, and how to fit those together; the implementation language is less important.

That said, in a large design it's nice to try and leverage the type system to make sure you can only fit your pieces together in a way that is correct. This might involve newtype or phantom types to make things that appear to have the same type be different.

When it comes to refactoring the code as you go along, purity is a great boon, so try to keep as much of the code as possible pure. Pure code is easy to refactor, because it has no hidden interaction with other parts of your program.

13 upvote
  flag
I actually found that refactoring is quite frustrating, if the data types need to change. It requires tediously modifying the arity of lots of constructors and pattern-matches. (I agree that refactoring pure functions into other pure functions of the same type is easy - as long as one doesn't touch the data types) – Dan
2 upvote
  flag
@Dan You can get away completely free with smaller changes (like just adding a field) when you use records. Some may want to make records a habit (I'm one of them ^^"). – MasterMastic
1 upvote
  flag
@Dan I mean if you change the data type of a function in any language don't you have to do the same? I don't see how a language such as Java or C++ would help you in this regard. If you say that you can use some sort of common interface that both types obey then you should have been doing that with Typeclasses in Haskell. – semicolon
1 upvote
  flag
@semicon the difference for languages like Java is the existence of mature, well-tested and fully-automated tools for refactoring. Generally these tools have fantastic editor integration, and take away a huge amount of the tedious work associated with refactoring. Haskell gives us a brilliant type system with which to detect things that must be changed in a refactoring, but the tools to actually carry out that refactoring are (at the present moment) very limited, especially compared to what has already been available in the Java ecosystem for more than 10 years. – jsk

Don gave you most of the details above, but here's my two cents from doing really nitty-gritty stateful programs like system daemons in Haskell.

  1. In the end, you live in a monad transformer stack. At the bottom is IO. Above that, every major module (in the abstract sense, not the module-in-a-file sense) maps its necessary state into a layer in that stack. So if you have your database connection code hidden in a module, you write it all to be over a type MonadReader Connection m => ... -> m ... and then your database functions can always get their connection without functions from other modules having to be aware of its existence. You might end up with one layer carrying your database connection, another your configuration, a third your various semaphores and mvars for the resolution of parallelism and synchronization, another your log file handles, etc.

  2. Figure out your error handling first. The greatest weakness at the moment for Haskell in larger systems is the plethora of error handling methods, including lousy ones like Maybe (which is wrong because you can't return any information on what went wrong; always use Either instead of Maybe unless you really just mean missing values). Figure out how you're going to do it first, and set up adapters from the various error handling mechanisms your libraries and other code uses into your final one. This will save you a world of grief later.

Addendum (extracted from comments; thanks to Lii & liminalisht) —
more discussion about different ways to slice a large program into monads in a stack:

Ben Kolera gives a great practical intro to this topic, and Brian Hurt discusses solutions to the problem of lifting monadic actions into your custom monad. George Wilson shows how to use mtl to write code that works with any monad that implements the required typeclasses, rather than your custom monad kind. Carlo Hamalainen has written some short, useful notes summarizing George's talk.

5 upvote
  flag
Two good points! This answer has the merit of being reasonably concrete, something that the other ones are not. It would be interesting to read more discussion about different ways to slice a large program into monads in a stack. Please post links to such articles if you have any! – Lii
4 upvote
  flag
@Lii Ben Kolera gives a great practical intro to this topic, and Brian Hurt discusses solutions to the problem of lifting monadic actions into your custom monad. George Wilson shows how to use mtl to write code that works with any monad that implements the required typeclasses, rather than your custom monad kind. Carlo Hamalainen has written some short, useful notes summarizing George's talk. – liminalisht

I did learn structured functional programming the first time with this book. It may not be exactly what you are looking for, but for beginners in functional programming, this may be one of the best first steps to learn to structure functional programs - independant of the scale. On all abstraction levels, the design should always have clearly arranged structures.

The Craft of Functional Programming

The Craft of Functional Programming

http://www.cs.kent.ac.uk/people/staff/sjt/craft2e/

11 upvote
  flag
As great as the Craft of FP is -- I learned Haskell from it -- it is an introductory text for beginner programmers, not for the design of large systems in Haskell. – Don Stewart
3 upvote
  flag
Well, It is the best book I know about designing APIs and hiding implementation details. With this book, I became a better programmer in C++ — just because I learned better ways to organize my code. Well, your experience (and answer) is surely better than this book, but Dan might probably still be a beginner in Haskell. (where beginner=do write $ tutorials `about` Monads) – comonad

Perhaps you have to go an step back and think of how to translate the description of the problem to a design in the first place. Since Haskell is so high level, it can capture the description of the problem in the form of data structures , the actions as procedures and the pure transformation as functions. Then you have a design. The development start when you compile this code and find concrete errors about missing fields, missing instances and missing monadic transformers in your code, because for example you perform a database Access from a library that need a certain state monad within an IO procedure. And voila, there is the program. The compiler feed your mental sketches and gives coherence to the design and the development.

In such a way you benefit from the help of Haskell since the beginning, and the coding is natural. I would not care to do something "functional" or "pure" or enough general if what you have in mind is a concrete ordinary problem. I think that over-engineering is the most dangerous thing in IT. Things are different when the problem is to create a library that abstract a set of related problems.

Gabriel's blog post Scalable program architectures might be worth a mention.

Haskell design patterns differ from mainstream design patterns in one important way:

  • Conventional architecture: Combine a several components together of type A to generate a "network" or "topology" of type B

  • Haskell architecture: Combine several components together of type A to generate a new component of the same type A, indistinguishable in character from its substituent parts

It often strikes me that an apparently elegant architecture often tends to fall out of libraries that exhibit this nice sense of homogeneity, in a bottom-up sort of way. In Haskell this is especially apparent - patterns that would traditionally be considered "top-down architecture" tend to be captured in libraries like mvc, Netwire and Cloud Haskell. That is to say, I hope this answer will not be interpreted as an attempt replace any of the others in this thread, just that structural choices can and should ideally be abstracted away in libraries by domain experts. The real difficulty in building large systems, in my opinion, is evaluating these libraries on their architectural "goodness" versus all of your pragmatic concerns.

As liminalisht mentions in the comments, The category design pattern is another post by Gabriel on the topic, in a similar vein.

3 upvote
  flag
I would mention another post by Gabriel Gonzalez on the category design pattern. His basic argument is that what we functional programmers think of as "good architecture" is really "compositional architecture" - it's designing programs using items that are guaranteed to compose. Since the category laws guarantee that identity and associativity are preserved under composition, a compositional architecture is achieved through using abstractions for which we have a category - e.g. pure functions, monadic actions, pipes, etc. – liminalisht

I have found the paper "Teaching Software Architecture Using Haskell" (pdf) by Alejandro Serrano useful for thinking about large-scale structure in Haskell.

I'm currently writing a book with the title "Functional Design and Architecture". It provides you with a complete set of techniques how to build a big application using pure functional approach. It describes many functional patterns and ideas while building an SCADA-like application 'Andromeda' for controlling spaceships from scratch. My primary language is Haskell. The book covers:

  • Approaches to architecture modelling using diagrams;
  • Requirements analysis;
  • Embedded DSL domain modelling;
  • External DSL design and implementation;
  • Monads as subsystems with effects;
  • Free monads as functional interfaces;
  • Arrowised eDSLs;
  • Inversion of Control using Free monadic eDSLs;
  • Software Transactional Memory;
  • Lenses;
  • State, Reader, Writer, RWS, ST monads;
  • Impure state: IORef, MVar, STM;
  • Multithreading and concurrent domain modelling;
  • GUI;
  • Applicability of mainstream techniques and approaches such as UML, SOLID, GRASP;
  • Interaction with impure subsystems.

You may get familiar with the code for the book here, and the 'Andromeda' project code.

I expect to finish this book at the end of 2017. Until that happens, you may read my article "Design and Architecture in Functional Programming" (Rus) here.

UPDATE

I shared my book online (first 5 chapters). See post on Reddit

upvote
  flag
Alexander, could you kindly update this note when you're book is complete, so we could follow it. Cheers. – Max
4 upvote
  flag
Sure! For now I finished a half of the text, but it's a 1/3 of the overall work. So, keep your interest, this inspires me a lot! – GAS
2 upvote
  flag
Hi! I shared my book online (only first 5 chapters). See post on Reddit: reddit.com/r/haskell/comments/6ck72h/… – GAS
upvote
  flag
thanks for sharing and work! – Max

Not the answer you're looking for? Browse other questions tagged or ask your own question.