Sr. Director, Product Research, Epicor Software

Software Architecture

This blog is not associated with my employer.

Thursday, June 21, 2007

Metadata Pathways (Rewrite)

I got a comment from Mark Little a couple of weeks ago about my 2005 blog entry Patterns for SOA 2.0 saying (nicely) that he couldn’t figure out what I was talking about. I completely agree – it’s like I took a quick discussion about metadata and the last-place entry in a Faux Faulkner contest and shoved them into that machine from The Fly.

I’ve been meaning to re-write the whole thing for a while. And earlier this week, I saw a comment from Jörg Schäfer here, which relates to what I was trying to say back in 2005:

Philosophical remark: Maybe what we need is a programming model that takes one of the few mathematically sound models of CS, namely relational theory, seriously (the other mathematically sound foundation is lambda calculus of course leading to functional programming). So rather than continuing with OO we need a proper programming abstraction for programming with relational algebra (entities, relations, RFD…).

I also saw David Ing asking Microsoft to please get on with whatever building 42 has cooking. I don’t know anything about that except to keep sunscreen with a high UV rating handy during the unveiling. The point is that all 3 examples (and I’m just guessing on the Microsoft one) are rethinking the abstractions used to define systems. Obviously, any benefits from improving the abstractions have great leverage for the rest of the system.

So, here goes another try:

If you take a typical approach to programming a data-driven application, you have data access classes, business logic classes, (and now) service classes, client-side adapters, and UI classes. If it’s a large system you start breaking down common areas and define a framework. But in any case, you can wind up with a mass of interfaces, components, serialization, data binding, and event management.

It works initially – and you get great Intellisense support. But anyone who sells enterprise apps for a living knows that almost any deviation from the initial functionality requires a rebuild of at least some of the application. The better solutions (like those from my employer :) ) go through significant hell trying to put the flexibility users want back into systems built on classic interface/component architectures and programming models. But that means fighting the toolkits and building lots of framework code. And on top of that, it turns out the classic interface-based programming model might not be the best way to go in the first place.

My point back in 2005 was this: Systems need to tolerate changing data models and changing behavior without downtime. Systems must accept varying message formats for a given business intent. Callers should be able to invoke an arbitrary set of functions and specify the unit(s) of work. All data must be addressable by at least one human-readable identifier. Callers should not have to generate code or otherwise build proxies strongly typed to my application domain – but they can if they like (that’s a recent addition to the list).

I want to build systems around engines and metadata as much as possible. That doesn’t mean eliminating any work by a competent programmer. But I would like the work a programmer produces to stick to the explicit functional task at hand and be exquisitely reusable. I’m also not talking about DSLs or software factories (which I’m pretty much over with). But I like the idea of a system factory, where engines manage messages, data, and execution workflow – the primary colors of an enterprise system. The factory is a runtime that adapts the system as information about the application changes. The robustness of that system thus depends on the quality of its metadata and, crucially, the quality of coupling between metadata and the processes that consume metadata.

To get there, the industry needs to begin improving how abstractions are expressed through metadata – which is an architectural activity. Metadata now often spawns other metadata. Say I add a field to the customer entity in my data model (metadata). That implies a change to the message formats for my customer service – described in, say, XML Schema (metadata). My data model change also initiates a change to the physical data table (handled by an engine). The side-effects of metadata consumers are changes to system behavior, the generation of derived metadata, and many other possible actions.

Some attributes in my data model metadata affect message formats but not the physical tables (or vice-versa). In other words, data model attributes have aspects that drive processes that consume the metadata. For example, a concurrency model aspect tells the DDL engine to emit specialized columns to manage, say, record versioning. That same aspect also causes the message format description to include a required version value. The process that receives messages also sees this aspect and then knows to validate that messages actually carry the version value and to check whether it is current.
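As a rough illustration of one aspect driving several consumers, here is a minimal Python sketch. Everything in it is hypothetical: the `Entity` shape, the aspect key `concurrency`, its value `versioned`, and the column name `row_version` are assumptions made for the example, not part of any real product or toolkit.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    fields: dict                                  # field name -> SQL type
    aspects: dict = field(default_factory=dict)   # aspect name -> value

def is_versioned(entity):
    # Hypothetical aspect: "concurrency" set to "versioned".
    return entity.aspects.get("concurrency") == "versioned"

def emit_ddl(entity):
    """DDL engine: the aspect adds a specialized version column."""
    cols = [f"{n} {t}" for n, t in entity.fields.items()]
    if is_versioned(entity):
        cols.append("row_version INTEGER NOT NULL")
    return f"CREATE TABLE {entity.name} ({', '.join(cols)})"

def message_required_fields(entity):
    """Message description: the same aspect adds a required version value."""
    required = list(entity.fields)
    if is_versioned(entity):
        required.append("row_version")
    return required

def validate_message(entity, message, current_version):
    """Message processor: checks the version is present and current."""
    missing = [f for f in message_required_fields(entity) if f not in message]
    if missing:
        return f"missing: {', '.join(missing)}"
    if is_versioned(entity) and message["row_version"] != current_version:
        return "stale version"
    return "ok"

customer = Entity("customer", {"id": "INTEGER", "name": "TEXT"},
                  {"concurrency": "versioned"})
```

Removing the aspect from `customer` changes all three consumers at once, with no change to the consumers themselves, which is the kind of coupling between metadata and engines the paragraph above describes.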

Aspects might not just be name-value pairs, but complete metadata categories on their own. A good example is a life-cycle chart defined using a state sequence. As I’ve mentioned, I think you can derive all behavior (in my problem domain – ERP) by mapping CRUD constraints to a state chart – all of which can be invoked by a uniform interface. Putting my entity life-cycles into metadata and the understanding into (for example) the DDL and message description/processing engines is a pretty powerful approach because I can revise them to match business requirements as needed.
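A life-cycle held as metadata might look something like the following Python sketch. The state names, the `LIFECYCLE` layout, and the `invoke` function are all invented for illustration; the only idea taken from the text is that CRUD constraints are mapped onto states and reached through one uniform entry point. Create is left out for brevity.

```python
# Hypothetical life-cycle metadata: each state lists the operations it
# permits and the states it may move to.
LIFECYCLE = {
    "draft":    {"allowed": {"read", "update", "delete"}, "next": {"approved"}},
    "approved": {"allowed": {"read", "update"},           "next": {"closed"}},
    "closed":   {"allowed": {"read"},                     "next": set()},
}

def invoke(record, operation, changes=None, lifecycle=LIFECYCLE):
    """Uniform interface: every call is checked against the state chart."""
    state = lifecycle[record["state"]]
    changes = changes or {}
    if operation == "transition":
        target = changes.get("state")
        if target not in state["next"]:
            raise PermissionError(f"{record['state']} -> {target} is not allowed")
        return {**record, "state": target}
    if operation not in state["allowed"]:
        raise PermissionError(
            f"{operation} is not allowed in state {record['state']}")
    if operation == "read":
        return dict(record)
    if operation == "update":
        if "state" in changes:
            raise ValueError("use 'transition' to change state")
        return {**record, **changes}
    return None  # delete: the record no longer exists
```

Because `invoke` reads the chart on every call, revising `LIFECYCLE` (or passing a different chart) changes the permitted behavior without touching the function.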

Aspects should be composable. For example, a Pat Helland aspect in my data model metadata with values like resource, reference, and activity might combine the life-cycle and versioning aspects to make sure reference data stays read-only (affects the service behavior) and that resource data is versioned (affects message format, service behavior, and the physical data model).
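Composition of aspects could be sketched in Python as small functions over a specification, with a higher-level aspect defined as a list of them. The spec keys, the function names, and the `DATA_KINDS` table are assumptions for this example. The text does not say what the activity value implies, so the sketch leaves it unchanged.

```python
# Hypothetical entity specification that aspects refine.
BASE_SPEC = {
    "extra_columns": [],
    "required_fields": [],
    "allowed_operations": {"create", "read", "update", "delete"},
}

def versioned(spec):
    """Versioning aspect: affects the physical model and the message format."""
    return {
        **spec,
        "extra_columns": spec["extra_columns"] + ["row_version"],
        "required_fields": spec["required_fields"] + ["row_version"],
    }

def read_only(spec):
    """Life-cycle aspect: affects service behavior only."""
    return {**spec,
            "allowed_operations": spec["allowed_operations"] & {"read"}}

# The composite aspect is defined purely in terms of simpler aspects.
DATA_KINDS = {
    "reference": [read_only],
    "resource": [versioned],
    "activity": [],   # assumption: not specified in the text, left as-is
}

def apply_data_kind(kind, spec=BASE_SPEC):
    for aspect in DATA_KINDS[kind]:
        spec = aspect(spec)   # each aspect returns a new spec
    return spec
```

Here `apply_data_kind("reference")` yields a spec whose only allowed operation is read, while `apply_data_kind("resource")` adds the version column and required field.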

Getting back to Jörg’s comment, let’s say that each attribute in a given metadata type represents a category A. You can then think of each aspect f as a morphism connecting A to another category B, where B is a consumer of the metadata type (f: A->B). As I mentioned, the target for a given aspect can be other metadata (like the message description), a running processor or engine (like the service layer), or anything else (like SQL DDL). So, “B” in this case is a collection of objects (like other metadata) or a behavior specification. I can also put an aspect g on category “B” to create a category “C” (g: B->C). That implies a compositional ability because for each element a in “A”, you can get to a corresponding element in “C” by way of g(f(a)). If I have an engine that relies on metadata “C” and an element a in “A” is changed, the engine knows it needs to re-apply the transforms f and then g to get the revised member of “C”.
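Treated loosely as plain function composition (not as a rigorous category-theoretic construction), the pathway A -> B -> C can be sketched in Python. In this hypothetical example, f turns a data-model attribute into a message field description and g turns that description into a validator; the dictionary shapes and the `Engine` class are assumptions for illustration only.

```python
def compose(g, f):
    """Return the composite a -> g(f(a))."""
    return lambda a: g(f(a))

def f(attribute):
    """A -> B: data-model attribute to message field description."""
    return {
        "element": attribute["name"],
        "type": attribute["type"],
        "required": not attribute["nullable"],
    }

def g(field_desc):
    """B -> C: message field description to a validator function."""
    py_type = {"string": str, "integer": int}[field_desc["type"]]

    def validator(message):
        if field_desc["element"] not in message:
            return not field_desc["required"]
        return isinstance(message[field_desc["element"]], py_type)

    return validator

class Engine:
    """Relies on C (validators) and re-derives a member when A changes."""

    def __init__(self, attributes):
        self.pathway = compose(g, f)
        self.validators = {a["name"]: self.pathway(a) for a in attributes}

    def attribute_changed(self, attribute):
        # Re-run f and then g for the changed element only.
        self.validators[attribute["name"]] = self.pathway(attribute)

    def accepts(self, message):
        return all(v(message) for v in self.validators.values())
```

For example, an engine built from a nullable `email` attribute accepts an empty message; after `attribute_changed` is called with the same attribute marked non-nullable, the empty message is rejected.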

So, I have 2 questions for the world in looking to develop these lines of thought. The first is about dealing with the pervasive metadata pathways in complex applications. If we look at what all that metadata represents and how it affects a system, I think we could start to categorize metadata-oriented scenarios. In other words, we can identify software architectural patterns in the use of metadata – in both its content and the effects on systems. Does anyone agree?

The second question is about applying some sort of mathematical rigor to metadata architecture. The categories and morphisms I mentioned come from category theory, which may or may not have any promise as a basis for the formalism I’m looking for. I was an SDSU music major – what do I know? Category theory has been tried as a basis for programming languages without a lot of commercial success. But it also was helpful in connecting Cartesian Closed Categories and lambda-calculus (which is some of the math behind LINQ and F#). So, if metadata and metadata consumers are used to define systems, and we can cast these artifacts as collections of categories connected by morphisms, can some part of a mathematical discipline like category theory be employed to help formalize the way we define systems?

That was what I was trying to articulate in 2005 without success. I hope this comes off a little clearer.


Mike Parsons said...

"In other words, we can identify software architectural patterns in the use of metadata – in both its content and the effects on systems. Does anyone agree?"

Wasn't this the attempt of UML?

Erik J. said...

Mike -- it's hard to tell because UML has been stretched into modeling everything (including itself). I think UML was originally envisioned for modeling object-oriented development.

I wasn't thinking of modeling as much as just naming patterns like Martin Fowler and Gregor Hohpe have done with application and integration architectures, respectively. In fact Martin has identified a metadata mapping pattern in his book, though it's specific to object-relational mapping.