Tuesday, November 11, 2014

SOA and Referential Data Integrity

One of the issues that tends to pop up in SOA is how to maintain referential integrity between services. In this post, I'd like to share my experience of how enforcing referential data integrity between services can ruin your SOA.

Database Constraints Between Services Break Boundaries

Two of the tenets of service orientation are "boundaries are explicit" and "services are autonomous". The first implies that internal (private) implementation details should not leak outside a service boundary. The second implies that services are not subservient to other services (or other pieces of code).

For purposes of discussion, let's say we have the following:

  • a "customer" service that provides customer-related business capability and persists data in a database
  • an "order" service that provides order-related business capability and persists data in the same database
  • foreign key constraints between customer-related entities/tables and order-related entities/tables. More specifically, the "order" table contains the unique ID of the customer that placed the order, and this ID needs to exist in the "customer" table.

Do the above services (customer and order) follow the "boundaries are explicit" tenet? Are the services autonomous? Let's examine further.

The way the "customer" service persists data in a database is a (private) implementation detail that is internal to it. Likewise, the way the "order" service persists data in a database is internal to it. But how would you consider the foreign key constraints between their database tables? Isn't this an internal implementation detail leaking outside of a service boundary (i.e. leaking outside the "customer" service boundary and into the "order" service boundary)?

If the "order"-related database undergoes some schema changes, will it not affect the "customer"-related database schema? When deploying the schema changes to the "order" service, will it not require the "customer" service to be temporarily unavailable? (e.g. due to database restart) If "services are autonomous", how come the "customer" and "order" services are inter-dependent, such that a change in one requires a restart (or a redeploy) on the other?
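
To make the coupling concrete, here is a minimal sketch of the shared-database setup, using Python's built-in sqlite3 module. The table names, column names, and the customer_service_delete helper are my own assumptions for illustration; they are not from any real system.

```python
import sqlite3


def build_shared_database() -> sqlite3.Connection:
    """One database shared by both services (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Table owned by the "customer" service.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Table owned by the "order" service, reaching into the customer table.
    conn.execute(
        "CREATE TABLE customer_order ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id))"
    )
    return conn


def customer_service_delete(conn: sqlite3.Connection, customer_id: int) -> bool:
    """The customer service tries to remove one of its own rows."""
    try:
        conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        # Blocked by a constraint declared on the order service's table.
        return False


conn = build_shared_database()
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO customer_order (id, customer_id) VALUES (100, 1)")

deleted = customer_service_delete(conn, 1)
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Here the delete is rejected and the customer row stays in place. The customer service cannot manage its own data without the order service's table having a say, which is the kind of leak across boundaries described above.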

Split Service, Split Database

A better approach would be to split the databases of the two services, and do away with foreign key constraints. That would allow for explicit boundaries, and autonomy. But this might be unacceptable to some people at first.

A monolithic application (left) split into services in a service-oriented architecture (right).

How could one ensure that enough customer information is received before orders are placed by that customer? In other words, how can a developer ensure that orders are placed by known customers (an existing customer ID)? Well, in SOA, services don't have to! It is not the responsibility of services to maintain this referential integrity. The responsibility of ensuring that an order is placed by a known customer lies with the process of placing orders (not in the service). It is the orchestration layer's responsibility to maintain this.
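
As a rough sketch of what that could look like, the following keeps the two services ignorant of each other and puts the "known customer" check in the order-placing process. All class and function names here (CustomerService, OrderService, place_order, UnknownCustomerError) are hypothetical, and the in-memory dictionaries simply stand in for each service's own database.

```python
class UnknownCustomerError(Exception):
    """Raised by the process when the customer ID is not known."""


class CustomerService:
    """Owns customer data in its own store; knows nothing about orders."""

    def __init__(self) -> None:
        self._customers = {}

    def register(self, customer_id: str, name: str) -> None:
        self._customers[customer_id] = name

    def exists(self, customer_id: str) -> bool:
        return customer_id in self._customers


class OrderService:
    """Owns order data in its own store; customer_id is just an opaque value."""

    def __init__(self) -> None:
        self._orders = {}
        self._next_id = 1

    def create(self, customer_id: str, items: list) -> int:
        order_id = self._next_id
        self._next_id += 1
        self._orders[order_id] = {"customer_id": customer_id, "items": list(items)}
        return order_id


def place_order(customers: CustomerService, orders: OrderService,
                customer_id: str, items: list) -> int:
    """The order-placing process: the only place where the two services meet."""
    if not customers.exists(customer_id):
        raise UnknownCustomerError(customer_id)
    return orders.create(customer_id, items)


customers = CustomerService()
orders = OrderService()
customers.register("c-1", "Alice")
first_order_id = place_order(customers, orders, "c-1", ["root beer"])
```

Note that OrderService.create never looks up the customer. If another process needs different rules (say, guest checkout), it can call the same order service without the check.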

What if some customer information is modified (e.g. the billing address)? Shouldn't the related orders be affected? Again, the process, or orchestration layer, can become responsible for this. Is this really a change in the customer? Or is it just a change in the placed order? One possible way is to define the process so that it copies the customer's billing information and attaches that copy (i.e. a duplicate) to the placed order. This would mean that the order's bill-to (billing) address is fixed at the time the order was placed. Another possible way is to leave the bill-to address undetermined until the order is shipped, at which point the billing address as provided by the customer service is used.
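
The two options can be sketched side by side. This is only an illustration under my own assumptions: the Address and Order types, and the function names place_order_with_snapshot, place_order_deferred and ship, are all made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    street: str
    city: str


@dataclass
class Order:
    order_id: int
    customer_id: str
    bill_to: Optional[Address] = None  # None means "not determined yet"


class CustomerService:
    """Hypothetical customer service holding the current billing address."""

    def __init__(self) -> None:
        self._billing = {}

    def set_billing_address(self, customer_id: str, address: Address) -> None:
        self._billing[customer_id] = address

    def billing_address(self, customer_id: str) -> Address:
        return self._billing[customer_id]


def place_order_with_snapshot(customers: CustomerService,
                              order_id: int, customer_id: str) -> Order:
    # Option 1: copy the billing address onto the order when it is placed.
    return Order(order_id, customer_id, bill_to=customers.billing_address(customer_id))


def place_order_deferred(order_id: int, customer_id: str) -> Order:
    # Option 2: leave the bill-to address undetermined until shipping.
    return Order(order_id, customer_id)


def ship(customers: CustomerService, order: Order) -> Order:
    # At shipping time, fill in the address only if none was attached earlier.
    if order.bill_to is None:
        order.bill_to = customers.billing_address(order.customer_id)
    return order


old = Address("1 Old Street", "Manila")
new = Address("2 New Avenue", "Cebu")

customers = CustomerService()
customers.set_billing_address("c-1", old)
snapshot_order = place_order_with_snapshot(customers, 1, "c-1")
deferred_order = place_order_deferred(2, "c-1")

customers.set_billing_address("c-1", new)  # the customer moves before shipping
ship(customers, snapshot_order)
ship(customers, deferred_order)
```

After the address change, the snapshot order still bills to the old address, while the deferred order picks up the new one. Which behavior is right is a business decision that belongs to the process definition, not to either service.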

Correlation Between Services and Context Boundaries

A useful way of thinking about this is the Domain-Driven Design notion of Bounded Context. DDD splits a complex domain into multiple bounded contexts and maps out the relationships between them. And this results in multiple databases.

Re-usable Services

Services (in SOA) are meant to be reusable. In the example, if we reconsider what the customer service contains, we can probably design it in such a way that it doesn't have to be for the purposes of order taking. It can be designed to be reusable and allow for any party's information (e.g. persons, organizations) and not just customers. It could be re-used to store employees, since employees can become customers in the future. It could be re-used to store suppliers (since the business may need to track the suppliers of products being produced and ordered).
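
One way to picture such a reusable service is a generic "party" store where customer, employee, and supplier are just roles attached to a party. The sketch below is an assumption on my part: the names PartyKind, Party, and PartyService, and the idea of modeling roles as plain strings, are chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class PartyKind(Enum):
    PERSON = "person"
    ORGANIZATION = "organization"


@dataclass
class Party:
    party_id: str
    kind: PartyKind
    name: str
    roles: set = field(default_factory=set)  # e.g. {"employee", "customer"}


class PartyService:
    """Hypothetical service that stores any party, not just customers."""

    def __init__(self) -> None:
        self._parties = {}

    def register(self, party_id: str, kind: PartyKind, name: str) -> Party:
        party = Party(party_id, kind, name)
        self._parties[party_id] = party
        return party

    def add_role(self, party_id: str, role: str) -> None:
        self._parties[party_id].roles.add(role)

    def with_role(self, role: str) -> list:
        return [p for p in self._parties.values() if role in p.roles]


parties = PartyService()
parties.register("p-1", PartyKind.PERSON, "Alice")
parties.register("p-2", PartyKind.ORGANIZATION, "Acme Supplies")
parties.add_role("p-1", "employee")
parties.add_role("p-2", "supplier")

# Later on, the employee also becomes a customer; no new record is needed.
parties.add_role("p-1", "customer")
customer_names = [p.name for p in parties.with_role("customer")]
```

The order-taking process would then only ask for parties holding the "customer" role, while HR or procurement processes could reuse the same service for their own roles.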

The (business) goals of SOA (or any IT initiative) are:

  • increase agility (e.g. support new/changing business processes/models, reduce time to solution)
  • reduce cost (e.g. re-use business processes and/or applications, improve utility of existing/legacy application)

These goals are further translated as (technical goals):

  • increase usability (i.e. re-usability and accessibility across different applications)
  • improve maintainability
  • reduce redundancy

When we make the boundaries of services explicit and make the services autonomous, we can better achieve the goals of increasing re-usability and reducing redundancy.

Some other services that come to mind, which can become re-usable when properly designed and split, are:

  • Authentication and authorization - if this were (re-)written for each application, it would be a huge cost.
  • Billing (or invoicing)
  • Product Catalog

Microservices

In a previous post on SOA, I mentioned that I find the term Microservices to be misleading. Although the information I found on the web was good, I found it to still be unclear about implementation. Nonetheless, I did find that they do add an exciting twist if you consider that, Microservices:

Closing Thoughts

Finally, when communicating with business people, don't let reuse become the primary measure. They probably won't understand it. Tell them that it helps save development and maintenance costs. Tell them that it provides better time to market, reduces days of inventory, reduces employee turnover, etc.

Nice root beer from Virgil's Sodas. Is this available here in the Philippines? I'm not affiliated with the product in any way. I just saw their ad online and, being thirsty, thought of having one.

After this rather long post, I think it's time for a nice cold drink. Root beer anyone?

2 comments:

  1. Nice... Helped me in my thought process.

  2. It's nice that separate databases happen to make sense and happen to work for your use-case and current requirements, but... systems like these tend to evolve. Even if this satisfies the use-cases known today, tomorrow the client could ask for any number of new features that require joins between these databases.

    Reporting features are typically the kind of late requirements that pop up after the system is in production - and reporting features usually require joins between many (sometimes all) of the tables in the application.

    In my experience, for something as integrated and organic as e-commerce and B2B systems, a monolith is both simpler, easier and faster to maintain. Or perhaps just larger-than-micro services, e.g. don't try to enforce explicit boundaries between every entity - in some systems, Customer and Order domains may simply be too closely related to meaningfully break down into isolated sub-domains.

    I'm looking into CQRS and event-sourcing to see if maybe that can mitigate these practical problems created by narrow, explicit boundaries in highly-integrated architectures - on its own, I don't feel like it's been very useful (or cost-effective) in practice, at all.
