
Middleware Tension

Many people want to break up their entire application into functionally decomposed pieces of WSGI middleware.

Purpose

Recently, I've been asked more than once to explain a position I have concerning the decomposition of application functionality into WSGI middleware. This blog post actually started out as an email, but since I'll probably need to make this case again, I figure I should just write it up more generally so I can point to it later instead of rethinking and retyping it. Although I mention Repoze below, this is not a Repoze-specific issue; it applies to any WSGI-based system.

Details

It's my opinion that the current Paste "pipeline" exposed under Repoze should be advertised as the domain of integrators, who should feel free to remove and add middleware that tweaks behavior in ways that don't "significantly change" the result of visiting most resources. In the common case, I would define "significantly change" as a situation where both the content body and the content-type of "most" resources returned by the stack will be different when the middleware is absent than when the middleware is present.
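To make the "significantly change" test concrete, here is a minimal sketch that calls an application with and without a piece of middleware and compares the results. The helper names (`call_app`, `significantly_changes`) and the reading of "most" as "more than half of the sampled paths" are assumptions made for illustration, not part of Repoze or Paste.

```python
from wsgiref.util import setup_testing_defaults


def call_app(app, path):
    """Call a WSGI app for one path; return (content_type, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers
        return lambda data: None  # legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        close = getattr(result, "close", None)
        if close is not None:
            close()
    headers = dict((k.lower(), v) for k, v in captured["headers"])
    return headers.get("content-type"), body


def significantly_changes(app, middleware, paths, threshold=0.5):
    """True if both body and content-type differ for "most" sampled paths.

    "Most" is taken to mean more than `threshold` of the paths, which is
    an assumption; the post leaves the term deliberately loose.
    """
    wrapped = middleware(app)
    changed = 0
    for path in paths:
        plain_type, plain_body = call_app(app, path)
        wrapped_type, wrapped_body = call_app(wrapped, path)
        if plain_type != wrapped_type and plain_body != wrapped_body:
            changed += 1
    return changed > threshold * len(paths)
```

Middleware that only adds a response header would fail this test and so belongs to integrators; middleware that turns XML into HTML for every resource would pass it.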

This assertion is in tension with ideas that various folks have expressed about using middleware filters to "theme" otherwise presentation-free semantic XML. For instance, folks have proposed that it would be beneficial to write WSGI applications that tend to return XML something like (only an example):

    <?xml version="1.0"?>
    <resource id="foo" type="form">
      <element name="State" type="select" source="states"/>
      <element name="City" type="text"/>
      <target href="http://www.example.com/formtarget"/>
    </resource>

    <resource id="states" type="enum">
      <value id="NJ">New Jersey</value>
      <value id="MA">Massachusetts</value>
    </resource>

    <context>
      <user id="fred">Fred</user>
      <session id="abc123"/>
    </context>

Within an "upstream" WSGI middleware filter, the semantics of this XML would be divined, and code in the middleware would be able to produce some styled HTML that represented a form. While producing the form, a developer could select elements from the semantic XML and use them to make decisions about what to display and how to display it. He would then pass the HTML he created back upstream.
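A minimal sketch of what such a "transform" filter could look like. It assumes the application returns a single well-formed `<resource>` document with an `application/xml` content type, and it renders every `<element>` as a plain text input; `ThemeMiddleware` and `render_form` are hypothetical names, and a real theming layer would do far more.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_form(xml_bytes):
    """Turn <element name="..."/> children into a bare HTML form (sketch)."""
    resource = ET.fromstring(xml_bytes)
    fields = []
    for element in resource.findall("element"):
        name = escape(element.get("name", ""), quote=True)
        fields.append('<label>%s <input name="%s"></label>' % (name, name))
    return ("<form>%s</form>" % "".join(fields)).encode("utf-8")


class ThemeMiddleware:
    """Buffer the downstream response and replace XML bodies with HTML."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = list(headers)
            return lambda data: None  # legacy write() is ignored in this sketch

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        ctype = dict((k.lower(), v) for k, v in headers).get("content-type", "")
        if ctype.split(";")[0].strip() == "application/xml":
            body = render_form(body)
            # Both the body and the content-type change here.
            headers = [(k, v) for k, v in headers
                       if k.lower() not in ("content-type", "content-length")]
            headers.append(("Content-Type", "text/html; charset=utf-8"))
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]
```

Note that removing this filter from the stack leaves the browser with raw XML, which is exactly the kind of change discussed next.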

I tend to disagree with using WSGI for this purpose. The reason this is in tension with my "don't significantly change" assertion is that inserting or pulling the "transform" middleware out of the WSGI middleware stack has the potential to completely change the result of calling in to the application for most resources. At least in the arguments I've heard about this, most folks who want to do this want to do it for every resource. They intend to create an application which generates semantic XML on the right hand side, and create middleware which embeds application policy to interpret and transform that XML into something renderable by a browser. In effect, they want fundamental behavior of the "application" to live in two places: in the WSGI application, and within one or more pieces of WSGI middleware.

This is understandable. People want to do this using WSGI because they sense that a Paste "pipeline" is a form of functional composition where the output of one function becomes the input of another, and programming using functional composition fits their brains. It is often easier to understand and debug programs that are written this way because often you only need to understand one small piece of code in the pipeline in order to get the results you want, and the lines of demarcation between pipeline elements are consistent and explicit. Arbitrary applications don't have such well-defined lines of demarcation, so it's often difficult to find out where to jam in some code to get the result you want. Fans of the functional composition development model also like the ability to divide the problem space along lines of responsibility with a well-defined interface between them. In a functionally decomposed application like this, on a project divided between developers along "back end" and "front end" lines it's clear who's at fault if the semantic XML is broken (the "back end folks"), and it's likewise just as clear who's at fault if the rendering isn't right (the "front-end folks"). In some cases, this is a strategic benefit on a larger project, because it allows for clearer divisions of responsibility.

ASIDE: It's often tempting to say "WSGI pipeline" (indeed I do it all the time) but there actually is no such thing as a "WSGI pipeline". Paste calls it a pipeline, perhaps mistakenly. In any case, WSGI is actually composed of two pipelines: the ingress pipeline and the egress pipeline. On ingress, the request is passed through middleware until it gets to "an application". On egress, a response is passed back up the same set of middleware until it gets to "a server", where it is sent back to the requestor's browser. Sorry, I thought this would be an appropriate place to mention this.
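The two directions can be seen in a small sketch. The `tracing` factory and the `log` list are hypothetical names for illustration; each filter records when the request passes through it on the way in and when the response status and headers pass through it on the way out.

```python
def tracing(name, log):
    """Build a middleware factory that records ingress and egress order."""
    def factory(app):
        def middleware(environ, start_response):
            log.append("ingress " + name)  # request heading toward the app

            def traced_start_response(status, headers, exc_info=None):
                log.append("egress " + name)  # response heading to the server
                return start_response(status, headers, exc_info)

            return app(environ, traced_start_response)
        return middleware
    return factory


def app(environ, start_response):
    log.append("app")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


log = []
stack = tracing("outer", log)(tracing("inner", log)(app))
body = b"".join(stack({}, lambda status, headers, exc_info=None: None))
# log now lists the ingress steps outermost-first, then the application,
# then the egress steps in the reverse order.
```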

But despite the benefits of using functional composition as a development model, I posit that if a WSGI application which returns semantic XML (or some other serialization of structured data) doesn't already exist, there is no purpose in making the rendering part of the application into WSGI middleware. Instead, it should just live in the application itself, because it actually is part of the application. Likewise, if a piece of WSGI middleware can't be dropped into an arbitrary Paste pipeline and potentially do something useful without requiring radical changes to the application being served, it's not actually "middleware". It's just a piece of code dropped into the call chain because it was convenient to put it there.

Although it's reasonable to be attracted to functional composition as a development model, and you may think of a stack of WSGI middleware as a call chain, my assertion is that the particular call chain exposed by WSGI isn't always the appropriate place to do application development.

All that said, this explanation makes a lot of people scratch their heads and wonder what I'm smoking, because the allure of programming within a functional call chain is so strong.

Potential Resolution

Given that people really like the idea of using functional composition as a development model because it so closely fits their brains, we've been considering adding an additional set of pipelines to Repoze. The existing Paste pipeline would not need to be controlled by application developers. Instead, sysadmins and integrators would be free to add and remove WSGI middleware, change server port numbers, etc. there to their heart's content.

The additional pipelines would be the domain of programmers. Programmers would plug code in to some point within a separate functional call chain. The composition would be fixed for a particular application deployment, and would not be meant to be changed by integrators. The composition may not be a Paste "pipeline", because WSGI 1.0 is not a great model for straight functional composition: it's engineered for various HTTP corner cases. Indeed, it may not even have a declarative composition syntax like Paste provides for WSGI 1.0. Instead, the protocol itself might be WSGI 2.0 or perhaps something even simpler which exposes separate ingress and egress filter chains, and uses plain-old-Python to compose each.
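As a rough sketch of the "something even simpler" option, the following composes separate ingress and egress chains in plain Python. Everything here is an assumption made for illustration: requests and responses are plain dicts, filters are one-argument functions, and `compose` is a hypothetical helper rather than an existing Repoze API.

```python
def compose(ingress, app, egress):
    """Chain ingress filters, the application, then egress filters."""
    def composed(request):
        for ingress_filter in ingress:
            request = ingress_filter(request)
        response = app(request)
        for egress_filter in egress:
            response = egress_filter(response)
        return response
    return composed


# Hypothetical filters and application, each a plain function.
def add_user(request):
    return dict(request, user="fred")


def greet(request):
    return {"body": "hello %s" % request["user"]}


def shout(response):
    return dict(response, body=response["body"].upper())


handler = compose([add_user], greet, [shout])
result = handler({"path": "/"})
```

Because each stage takes one value and returns one value, there is no `start_response` callable or streaming iterator to account for, which is much of what makes WSGI 1.0 awkward for this style.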

Applications with candidate functions that could be injected into the plug points might be packaged as plain old Python eggs. These eggs would have entry points which essentially stated that they had functions which were "pipeline candidates", e.g.:

    [repoze.callchain.ingress]
    myingress = foo.bar:baz

    [repoze.callchain.egress]
    myegress = foo.bar:buz

Then, in a separate bit of configuration (perhaps in a separate config file), you'd compose all of these composables into pipelines:

    [pipeline:ingress]
    pipeline = egg#mypackage:myingress egg#anotherpackage:thatingress

    [pipeline:egress]
    pipeline = egg#mypackage:myegress egg#anotherpackage:thategress

When Repoze (or whatever contained this framework) started up, it would look for that config file, and arrange the call chains. You might package up that configuration in a top-level egg and call that "an application", because that's exactly what it would be.
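A minimal sketch of the startup step, assuming the config file uses the INI syntax shown above. The `parse_spec`, `parse_pipelines` and `arrange` helpers are hypothetical, and the `registry` dict stands in for a real entry point lookup, which this sketch does not attempt.

```python
import configparser

CONFIG = """\
[pipeline:ingress]
pipeline = egg#mypackage:myingress egg#anotherpackage:thatingress

[pipeline:egress]
pipeline = egg#mypackage:myegress egg#anotherpackage:thategress
"""


def parse_spec(spec):
    """Split 'egg#package:name' into (package, name)."""
    scheme, _, rest = spec.partition("#")
    package, _, name = rest.partition(":")
    if scheme != "egg" or not package or not name:
        raise ValueError("unrecognized pipeline spec: %r" % spec)
    return package, name


def parse_pipelines(text):
    """Return {'ingress': [(package, name), ...], 'egress': [...]}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    pipelines = {}
    for section in parser.sections():
        if section.startswith("pipeline:"):
            specs = parser.get(section, "pipeline").split()
            pipelines[section[len("pipeline:"):]] = [parse_spec(s) for s in specs]
    return pipelines


def arrange(pipelines, registry):
    """Resolve each (package, name) pair to a callable via the registry.

    `registry` is a stand-in for looking up the advertised entry points.
    """
    return {kind: [registry[key] for key in keys]
            for kind, keys in pipelines.items()}


pipelines = parse_pipelines(CONFIG)
```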

Another mechanism that could be used for the same configuration purpose is ZCML and the Zope component architecture. Insert cheer or jeer here depending on which way you swing. ;-)

RFC

I'd love to hear any countervailing opinions. Likewise, if you agree strongly, I'd appreciate it if you said why. I'd be particularly interested in what Ian and Phillip have to say on the topic {hint, hint}.

Created by chrism
Last modified 2007-11-21 05:13 PM

Agree with half of this

I agree that this itch (making HTML UIs in WSGI) shouldn't be done as middleware.

I disagree with the conclusion that it should, thus, live in each of the N application frameworks out there. From your perspective, it's a feature to do it in some application framework. However, from my perspective, there are some sharp edges to that alternative.

I would like to at least see if the itch can be scratched in new ways. If nobody wants it, then it will die of its own accord. If people *do* like the idea though, more than the alternative, then it was a good idea. [wink]

I guess your proposal is a way to keep the chocolate out of the peanut butter, regarding the flaky WSGI UI ideas?

egress and ingress?

Pipeline might not be quite the right term. And maybe middleware isn't (PJE disagrees with how I use the term). And maybe filter isn't the right term. But eh. The WSGI Onion, if you will. I think it's applicable here.

Some kinds of content-type changing middleware are pretty uncontroversial, I think. For instance, there's a middleware somewhere out there that translates XHTML to HTML for clients that don't accept XHTML. The output is basically isomorphic to the original content, but the type is different. Maybe truly isomorphic content isn't that interesting or useful, but at least that case seems reasonable and possibly useful (if I ever in my life found myself caring about XHTML it might seem more useful). That actually isn't best as *just* an egress transformation, because it would be nice to add XHTML to the Accept header on the way in. If the Accept header in the wild wasn't full of complete bullshit, maybe this example would be more than academic even.
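A rough sketch of that shape of middleware, to show why it wants both an ingress and an egress half. This is not the middleware mentioned above: `xhtml_to_html` is a hypothetical name, the Accept check is a naive substring test that ignores q-values, and the body is passed through untouched where a real translator would rewrite it.

```python
XHTML = "application/xhtml+xml"


def xhtml_to_html(app):
    """Relabel XHTML responses as HTML for clients that don't accept XHTML."""
    def middleware(environ, start_response):
        client_accept = environ.get("HTTP_ACCEPT", "")
        client_takes_xhtml = XHTML in client_accept  # naive check (assumption)

        if not client_takes_xhtml:
            # Ingress half: tell the application that XHTML is acceptable,
            # since this filter will deal with it on the way out.
            parts = [client_accept] if client_accept else []
            environ["HTTP_ACCEPT"] = ", ".join(parts + [XHTML])

        def relabel(status, headers, exc_info=None):
            # Egress half: change only the declared type in this sketch.
            if not client_takes_xhtml:
                headers = [
                    (k, v.replace(XHTML, "text/html"))
                    if k.lower() == "content-type" else (k, v)
                    for k, v in headers
                ]
            return start_response(status, headers, exc_info)

        return app(environ, relabel)
    return middleware
```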

Realistically, I find that ingress and egress filters aren't very distinct. At least, any output-transforming filter seems to eventually grow *some* ingress transformation, though the opposite isn't always true. I don't see a reason to distinguish between the two; keeping them together makes the HTTP message and interaction (largely) correct and complete at each step. (The exception being if middleware adds mutable structures to the environment, some bad stuff can happen.) Having a singular notion of middleware, that can be either input and/or output modifying, but where it always goes together, avoids a *lot* of errors in the configuration and composition of the system.

The thing you lose using WSGI for this, is that you can only communicate in HTTP terms, with some small extensions in the request. You don't have shared data structures. Your transformations have to be pretty completely defined in terms of HTTP. If you transform XML, all you get is XML -- there is no underlying data structure you can refer to. Personally I find that a benefit; sharing data structures across applications SUCKS. You end up slowly recreating Acquisition, and we all know how that turns out. Using WSGI gives you an inspectable and simple structure, that is very easy to test, and where reuse is highly valued (since you can express the API in terms of a document format or conventions built on existing document formats, as opposed to code).

Now maybe rewriting semantic XML into HTML is too much. I don't know; XSLT seems like it was *built* for this kind of design pattern. XSLT also kind of sucks, so maybe that's part of the problem. Maybe you just can't get enough information in there; though I would *hope* that you don't restrict yourself to a one-to-one match of an HTML page and an XML document. And maybe it makes sense to actually compose the application as a truly separate application at a different URL, that just happens to feed off XML documents elsewhere in the system. You have to worry about the notion of "here" a bit, but it's not too horrible if you have detectable links.

So... wandering about, that's my feelings. As to actually *writing* the middleware, WebOb's req.get_response() method makes it *tons* easier, basically using a WSGI 2.0 model at the loss of some of the overly complex features of WSGI 1.0 (it works with any WSGI 1.0 application, but will sometimes exhaust streams of data even if it's not necessary).

Using the timemachine...

Paul and I actually worked together on a package to do functional
composition of applications back at the 2005 EuroPython sprint.
You can get it in its current form here:

http://palladion.com/home/tseaver/software/pipelines

Maybe we should work on a story for such an application development
environment atop that code.

for the sake of clarity in jargon

Instead of endware, how about (albeit more boring and less apocalyptic-sounding) "appware"? And maybe for things like servers, "hostware"? Hostware, appware and middleware cover most everything I can imagine creating a component for in WSGI, and also seem to divide the configuration concerns nicely. I could see passing around hostware, appware and middleware stacks as metacomponents, i.e. I run my appstack cluebin with my theming middleware on my GAE hostware. Obviously this is somewhat simplified (GAE could influence certain structures in the appware, especially persistence), but I think it works generally.