A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

It's pretty common in Python 2 apps to see code like this:

    if not hasattr(thing, '__iter__'):
        thing = [thing]

This sort of code is used when input is expected to be either a thing or a sequence of those things. The single-thing form is usually supported as an API convenience. Often the "thing" being checked for the absence of __iter__ is a string.

Here's an example of using this pattern in a function that checks that a user has permission to perform an action based on an ACL:

    def check(acl, username, permission):
        for ace in acl:
            ace_action, ace_username, ace_permissions = ace
            if username == ace_username:
                if not hasattr(ace_permissions, '__iter__'):
                    ace_permissions = [ace_permissions]
                if permission in ace_permissions:
                    return ace_action == 'allow'
        return False

Let's pretend you've got an existing Python-2-only codebase that contains the above function, and it's been working for a long time. Now you want the same code to also run on Python 3. To your delight, as you make your codebase cross-Py2/Py3 compatible, you need to make no changes to the above function! It "just works". Your existing tests pass. You move on.

But there's a problem:

    acl = [
       ('allow', 'fred', 'edit_pictures'),
       ('allow', 'bob', ['view_pictures', 'delete_pictures']),
       ]

    check(acl, 'fred', 'edit')

On Python 2, the above call to check will return False. This is correct, because fred doesn't actually possess the edit permission. He possesses the edit_pictures permission, but not the edit permission.

On Python 3, however, the above call to check will incorrectly return True. Why? Because the if not hasattr(ace_permissions, "__iter__") check will evaluate to False. Why? In Python 3, instances of str have an __iter__ attribute, unlike instances of str in Python 2. So the string is never wrapped in a list, and the subsequent line if permission in ace_permissions boils down to if "edit" in "edit_pictures", which evaluates to True because in performs substring matching on strings.
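The two steps of that failure can be reproduced in isolation. The following is a minimal sketch meant to be run on Python 3; the variable names are made up for illustration.

```python
permissions = 'edit_pictures'  # the single-string form from fred's ACE

# On Python 3, str has __iter__, so the guard decides not to wrap the string.
would_wrap = not hasattr(permissions, '__iter__')

# Without wrapping, `in` does substring matching instead of list membership.
substring_match = 'edit' in permissions
list_match = 'edit' in [permissions]

print(would_wrap, substring_match, list_match)
```

On Python 2 the guard would wrap the string in a list, so only the list-membership comparison would ever run.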

If such a bug makes it into a production release, it'll be a pretty embarrassing security hole, at least on Python 3. The current solution for cross-compatible code is to define a compatibility function like so:

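    import sys

    PY3 = sys.version_info[0] >= 3

    if PY3:
        def is_nonstr_iter(v):
            # Python 3 str has __iter__, so rule strings out explicitly.
            if isinstance(v, str):
                return False
            return hasattr(v, '__iter__')
    else:
        def is_nonstr_iter(v):
            # Python 2 str and unicode have no __iter__.
            return hasattr(v, '__iter__')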

Then use it wherever you previously used if not hasattr(ace_permissions, '__iter__'):

    def check(acl, username, permission):
        for ace in acl:
            ace_action, ace_username, ace_permissions = ace
            if username == ace_username:
                if not is_nonstr_iter(ace_permissions):
                    ace_permissions = [ace_permissions]
                if permission in ace_permissions:
                    return ace_action == 'allow'
        return False

Bugs caused by this minor incompatibility can remain latent for long periods of time. You cannot rely on statement, branch, or condition coverage to uncover them, and 2to3 won't help at all. Your test suite won't have an explicit test case for substring matching in the single-string case. Why would it?
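A regression test of the kind that is usually missing might look like the sketch below. It repeats check and a condensed is_nonstr_iter from above so that it runs on its own; the test function name is hypothetical.

```python
import sys

PY3 = sys.version_info[0] >= 3


def is_nonstr_iter(v):
    # Condensed form of the compatibility function shown earlier.
    if PY3 and isinstance(v, str):
        return False
    return hasattr(v, '__iter__')


def check(acl, username, permission):
    for ace in acl:
        ace_action, ace_username, ace_permissions = ace
        if username == ace_username:
            if not is_nonstr_iter(ace_permissions):
                ace_permissions = [ace_permissions]
            if permission in ace_permissions:
                return ace_action == 'allow'
    return False


def test_single_string_permission_is_not_substring_matched():
    acl = [('allow', 'fred', 'edit_pictures')]
    # The exact permission name is granted.
    assert check(acl, 'fred', 'edit_pictures') is True
    # Substrings of the permission name must not be granted.
    assert check(acl, 'fred', 'edit') is False
    assert check(acl, 'fred', 'pictures') is False


test_single_string_permission_is_not_substring_matched()
```

Run against the original hasattr-based check on Python 3, the two substring assertions would fail, which is the signal the existing test suite never gave.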

If you're a porter, what can you do to avoid getting embarrassed by a bug caused by this backwards incompatibility? You'll want to grep your codebases for __iter__, ensuring in each usage that you don't use its presence to test if the value you're being passed is not a string. You'll need to do this "by eye"; there's no automation for it.
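A script cannot decide whether a given check is being used to exclude strings, but it can narrow the grep results to actual hasattr(..., '__iter__') calls for review. The sketch below is one assumed approach using the standard-library ast module on Python 3.8 or later; find_iter_checks is a hypothetical helper, not an existing tool.

```python
import ast


def find_iter_checks(source):
    """Return the line numbers of hasattr(<expr>, '__iter__') calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == 'hasattr'
                and len(node.args) == 2
                and isinstance(node.args[1], ast.Constant)
                and node.args[1].value == '__iter__'):
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "def f(thing):\n"
    "    if not hasattr(thing, '__iter__'):\n"
    "        thing = [thing]\n"
    "    return thing\n"
)
lines = find_iter_checks(sample)
print(lines)
```

Because it parses with the running interpreter's grammar, this only works on files that are already valid Python 3, and every reported line still needs a human to judge the intent of the check.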

It would have been better in general if Python 3 str instances continued to have no __iter__, matching its absence in Python 2. If that meant that you couldn't do for c in "abcdef", that would have been fine by me, and even preferable; I've seen enough ["s", "t", "r", "i", "n", "g"] results in buggy code to know that the feature is already a bug magnet. An explicit "to_iter" method on strings to produce an iterable object for folks who really do want to iterate character by character would have sufficed.

Created by chrism
Last modified 2012-03-03 06:57 PM

interesting. I've always done...

... isinstance(foo, basestring) in py2 code. Well, at least since basestring came along.
What's the origin of your hasattr idiom?

hasattr __iter__

The example provided above happens to use strings as the input values, but the API doesn't presume that the permission values must be strings. Accordingly, in some cases, you don't really want to check if the thing is a string, you just want to check if it's not already some sort of iterable and turn it into one if not.

The origin is years of history, mostly.