Tuesday, September 19, 2006

Exception Handling in BPM

Bruce Silver has an interesting post on the role of exception handling in BPM. He points out that BPMN supports exception handling (although many vendors' tools don't). BPEL supports it too, and WebSphere Integration Developer does a particularly good job of letting exceptions and unexpected situations be handled in a natural manner.

However, exception handling does seem to be a good example of the gulf between IT and the business.

Business modeling tools usually encourage a "flow of control" methodology: "do this... then do that... did it work? Great... now do this... did that work? No... OK, do this instead." That is, business modeling software (and its users) tend to deal with errors by asking "did the thing I just tried to do actually work?"

The exception-handling approach of BPEL (and BPMN, it would seem) works the other way: "try this, and if it doesn't work, here's what you should do to recover."

The difference sounds subtle, but it mirrors the difference between C and Java: C encourages a "test your return codes" approach to programming, while Java encourages a "catch your exceptions" approach.

I believe the lessons we learned from C and Java have a direct analogue in this space. The industry is littered with the fallout from programmers who failed to check their return codes: there's always a case you fail to deal with, and dealing with every case inline just gets messy and encourages dangerous shortcuts. Humans, it turns out, just aren't very good at getting this right; our brains presumably don't work this way. Exception handling has its problems too, of course (once you've caught an exception, you still have to handle it), but at least you're forced to do something, or to make a conscious decision to do nothing.
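To make the C/Java contrast concrete, here is a minimal sketch written entirely in Java so the two styles can be compared side by side. The order steps (`reserveStock`, `chargeCard`), the status codes and `StepFailedException` are all hypothetical, invented for illustration. The first method follows the "test your return codes" style; the second follows the "catch your exceptions" style.

```java
public class ReturnCodesVsExceptions {
    // --- "Test your return codes" style (hypothetical status codes) ---
    static final int OK = 0;
    static final int OUT_OF_STOCK = 1;
    static final int CARD_DECLINED = 2;

    static int reserveStockCode(int stock) {
        return stock > 0 ? OK : OUT_OF_STOCK;
    }

    static int chargeCardCode(boolean cardValid) {
        return cardValid ? OK : CARD_DECLINED;
    }

    // Every call is followed by an inline check; forgetting one still compiles.
    static String placeOrderWithCodes(int stock, boolean cardValid) {
        int rc = reserveStockCode(stock);
        if (rc != OK) {
            return "failed: out of stock";
        }
        rc = chargeCardCode(cardValid);
        if (rc != OK) {
            return "failed: card declined";
        }
        return "ok";
    }

    // --- "Catch your exceptions" style (hypothetical checked exception) ---
    static class StepFailedException extends Exception {
        StepFailedException(String message) {
            super(message);
        }
    }

    static void reserveStock(int stock) throws StepFailedException {
        if (stock <= 0) {
            throw new StepFailedException("out of stock");
        }
    }

    static void chargeCard(boolean cardValid) throws StepFailedException {
        if (!cardValid) {
            throw new StepFailedException("card declined");
        }
    }

    // The happy path reads straight through; recovery lives in one handler.
    static String placeOrderWithExceptions(int stock, boolean cardValid) {
        try {
            reserveStock(stock);
            chargeCard(cardValid);
            return "ok";
        } catch (StepFailedException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(placeOrderWithCodes(0, true));       // failed: out of stock
        System.out.println(placeOrderWithExceptions(5, false)); // failed: card declined
    }
}
```

Note what happens when a check is forgotten: delete an `if (rc != OK)` test and the code still compiles, silently carrying on after a failure. Because `StepFailedException` is a checked exception, removing the `try`/`catch` from the second version is a compile error unless the method declares `throws StepFailedException`, which is exactly the "conscious decision" described above.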

In the BPM space, the problem with the "test your return codes" approach is similar: as models are refined, they get utterly swamped with error-handling cases. Worse, this proliferation of cases masks the fact that many cases have not been considered at all; you just can't easily tell.

This becomes a real problem when the models are handed over to IT. If they have been developed in this manner, it is tempting for IT to "augment" the model with the special cases that only the IT staff know about. This adds even more mess to the diagram and fosters the belief that the model changes beyond recognition once IT get their hands on it.

An exception-handling methodology, by contrast, removes several of these problems: business-relevant error-handling logic stays in the diagram, while less important cases and purely technical cases are handled as exceptions, out of sight of the main diagram.
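A minimal sketch of this split, again in Java and entirely hypothetical (`checkCredit`, `ServiceUnavailableException` and the credit threshold of 600 are invented for illustration): a rejected credit check is a business outcome, so it appears as an explicit branch in the main flow, while a service timeout is a technical failure, thrown as an exception and handled once at the enclosing scope.

```java
public class ProcessWithScopeHandler {
    // Business-relevant outcome: modeled explicitly in the main flow.
    enum CreditDecision { APPROVED, REJECTED }

    // Purely technical failure: deliberately not drawn in the main flow.
    static class ServiceUnavailableException extends RuntimeException {
        ServiceUnavailableException(String message) {
            super(message);
        }
    }

    // Hypothetical service call; serviceUp simulates the infrastructure state.
    static CreditDecision checkCredit(int score, boolean serviceUp) {
        if (!serviceUp) {
            throw new ServiceUnavailableException("credit service timed out");
        }
        return score >= 600 ? CreditDecision.APPROVED : CreditDecision.REJECTED;
    }

    // Main flow: reads like the business diagram, with only the business branch.
    static String mainFlow(int score, boolean serviceUp) {
        CreditDecision decision = checkCredit(score, serviceUp);
        if (decision == CreditDecision.REJECTED) {
            return "send rejection letter";
        }
        return "open account";
    }

    // Scope-level handler: technical failures are dealt with here,
    // out of sight of the main flow.
    static String runProcess(int score, boolean serviceUp) {
        try {
            return mainFlow(score, serviceUp);
        } catch (ServiceUnavailableException e) {
            return "route to operations queue: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runProcess(700, true));  // open account
        System.out.println(runProcess(500, true));  // send rejection letter
        System.out.println(runProcess(700, false)); // route to operations queue: credit service timed out
    }
}
```

Nothing in `mainFlow` mentions timeouts. Making `ServiceUnavailableException` unchecked is a design choice here, so that intermediate steps don't have to declare it; a stricter design could make it checked.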

I think it was James Governor who claimed that there's nothing new in IT, and he's right: the lessons of C and Java are just as relevant here.


Bruce Silver said...

Thanks for adding an interesting dimension I had not thought of. But you neglect one very important feature of BPMN intermediate events (and BPEL fault handlers), which is that they can be attached to a subprocess (BPEL scope) and interrupt or abort that subprocess, including any parallel threads, before handling the exception. So that's a bit different from waiting till the end and checking return codes.

Richard Brown said...

Hi Bruce,

Thanks for the comment.

I completely agree about not having to "wait till the end before handling the exception". Indeed, that was one of the points I was trying to make (but didn't make very well).

The ability to handle problems somewhere far away from where they actually happened is a feature of the exception-handling approach that I particularly like (though, as any participant in the C/Java holy wars will testify, it doesn't enjoy universal approval).

However, "including any parallel threads" was a key point that I had missed. Thanks for that... perhaps my analogy doesn't work perfectly :-)
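Bruce's point about interrupting parallel threads can be loosely sketched in Java with `ExecutorCompletionService`: as soon as any branch of a (hypothetical) subprocess fails, every other branch is cancelled with `Future.cancel(true)`, which interrupts it, and only then does the fault handler run. This is a rough analogy for a BPMN boundary event or a BPEL fault handler on a scope, not a description of how any BPM engine implements them; `runSubprocess` and the branch bodies are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InterruptingScope {
    // Runs the branches of a hypothetical subprocess in parallel (at least one branch).
    // On the first failure, all branches are cancelled before the handler runs.
    static String runSubprocess(List<Callable<String>> branches) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(branches.size());
        CompletionService<String> completion = new ExecutorCompletionService<>(pool);
        List<Future<String>> futures = new ArrayList<>();
        try {
            for (Callable<String> branch : branches) {
                futures.add(completion.submit(branch));
            }
            for (int i = 0; i < branches.size(); i++) {
                Future<String> done = completion.take(); // next branch to finish, in completion order
                try {
                    done.get();
                } catch (ExecutionException e) {
                    // Abort the whole scope: interrupt every parallel branch.
                    for (Future<String> f : futures) {
                        f.cancel(true);
                    }
                    // The fault handler runs only after the scope has been aborted.
                    return "handled: " + e.getCause().getMessage();
                }
            }
            return "completed";
        } finally {
            pool.shutdownNow(); // cleanup of the worker threads
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<String>> branches = List.of(
            () -> { Thread.sleep(10_000); return "slow branch"; }, // 10 s unless interrupted
            () -> { throw new IllegalStateException("payment fault"); }
        );
        long start = System.currentTimeMillis();
        System.out.println(runSubprocess(branches)); // handled: payment fault
        System.out.println("elapsed ms: " + (System.currentTimeMillis() - start));
    }
}
```

The explicit `cancel(true)` loop is what makes the abort happen before the handler runs; the `shutdownNow()` call in the `finally` block is only cleanup.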