Persisting business data in MS Workflows

At my current job we have built an application that supports long-running processes (up to about 8 months) using Windows Workflow Foundation. So far we have made about 15 releases in 2 years, with an ever-changing process definition. In this post I will explain one of the most important lessons we have learnt during this time.

Background

Suppose we have a workflow service that handles job applications. Not very imaginative, but it has all the components we need. The process is started by a call to the workflow service, at which point it initializes some data. One piece of that data is an instance of the Person class, defined by our company’s canonical schema:

namespace ExternalTypes
{
    public class Person
    {
        public string Name { get; set; }
        public string Phone { get; set; }

        public override string ToString()
        {
            return Name + "; " + Phone;
        }
    }
}

All of our activities have an InArgument<Person> on which they operate. This Person variable is written to the persistence store along with the rest of the workflow state, so that when the workflow is put on hold for a week, it won’t need to stay in memory.

Problem description

After using our application for a while, we decide that a single string for the Name property is not granular enough. We want to be able to do background checks, and our external background checking service requires initials and last name in separate fields.

No problem, right? We just modify our Person class a little:

namespace ExternalTypes
{
    public class Person
    {
        public Name Name { get; set; }
        public string Phone { get; set; }

        public override string ToString()
        {
            return Name.Initials + " " + Name.LastName + "; " + Phone;
        }
    }

    public class Name
    {
        public string Initials { get; set; }
        public string LastName { get; set; }
    }
}

Problem solved. Or is it? New workflows will accept this new Person class without any issue. Existing workflows, however, prove problematic: as soon as they call ToString(), they throw a NullReferenceException, because the Name property was never initialized in the Person object that we persisted before the upgrade.
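The failure can be reproduced without a workflow host. The sketch below assumes that the reloaded Person ends up with Phone filled in and Name left null, and builds such an object by hand as a stand-in for the persisted instance. The Program class and the phone number are made up for illustration.

```csharp
using System;
using ExternalTypes; // the new Person and Name classes shown above

public static class Program
{
    public static void Main()
    {
        // Stand-in for an instance persisted before the upgrade:
        // Phone survived, but the new Name object was never created.
        var reloaded = new Person { Phone = "555-0100" };

        try
        {
            // ToString() dereferences Name.Initials, and Name is null.
            Console.WriteLine(reloaded.ToString());
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("Name was never initialized on the persisted Person.");
        }
    }
}
```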

While this example is somewhat contrived, similar things happen all the time in our application. Extra fields are added, fields become obsolete, types change. All of these things need to be handled.

Our solution

Of course, there are several solutions to this problem. We could have versioned our Person class, then have multiple versions of the activities that operate on this class and use the new versions of the activity in the newer versions of our workflow (we are versioning our workflows, right? Right?).

We opted for a simpler solution which is, in our minds, more maintainable. Instead of persisting entire domain objects, such as this Person, with our workflow, we persist only an identifier. The domain object itself is then retrieved when it is needed (from a database, or in our case from a Business Service). This has the advantage of always giving us an instance of the latest version of the class. Of course, splitting a Name field into two fields called Initials and LastName still requires some kind of conversion, but that can be done once, in a data migration script.
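A minimal sketch of what such an activity might look like. It assumes the identifier is a Guid and that the business service is made available to activities as a workflow extension. IPersonService, GetPerson and RunBackgroundCheck are hypothetical names used only for illustration.

```csharp
using System;
using System.Activities;
using ExternalTypes; // the Person class shown earlier

// Hypothetical abstraction over the business service (or database) that owns Person data.
public interface IPersonService
{
    Person GetPerson(Guid personId);
}

// Illustrative activity: only the identifier is a workflow argument,
// so only the identifier ends up in the persistence store.
public sealed class RunBackgroundCheck : CodeActivity<string>
{
    [RequiredArgument]
    public InArgument<Guid> PersonId { get; set; }

    protected override string Execute(CodeActivityContext context)
    {
        // Assumption: the host has registered an IPersonService as a workflow extension.
        IPersonService service = context.GetExtension<IPersonService>();
        if (service == null)
        {
            throw new InvalidOperationException("No IPersonService extension is registered.");
        }

        // Fetch the data at execution time. The Person lives in a local variable,
        // not in a workflow variable, so it is never persisted with the workflow.
        Person person = service.GetPerson(PersonId.Get(context));

        return person.ToString();
    }
}
```

With a self-hosted WorkflowApplication the extension could be registered through its Extensions collection; a WorkflowServiceHost offers a WorkflowExtensions collection for the same purpose. How the service is actually wired up will depend on your hosting setup.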

Additionally, our solution lets us run data updates in SQL without having to somehow get the new data into running workflows. Updating data behind the application's back is of course a no-no, but it happens in practice. Say some process crashed due to a bug: we can now fix the data in the database and resume the workflow from its last persist point.

Conclusion

We learned this lesson the hard way. Initially we just persisted entire domain objects, thinking this would be more efficient. And if you never change your data types, it probably is: you don’t have to retrieve the data for every activity you execute. However, this is a tradeoff we’ll gladly make. Workflow services generally run asynchronously from user input: a user clicks something on a screen and the workflow starts its work without the user having to wait for it. We decided that a small performance hit was worth the easier maintenance.

Making this change halfway through your application’s life cycle, while workflows are already running, requires a lot more work than doing it from the start. Even though we only discovered this after a few months of development, we decided to refactor everything to work in our preferred manner. Now we have to wait a few months for the workflows with the old persistence structure to expire.

TL;DR

Don’t persist domain objects in your workflow; save references instead. Retrieve the data in your code activities.
