Shadowing in Python gave me an UnboundLocalError

6 points by carlana


dutc

I mentioned some of these things in a talk I gave at PyData Warsaw in 2017: “Top→Down; Left→Right”

Basically, in CPython, other than files (modules), there are only three statements that can create scopes: class, def, and (sort of) except. Scoping is determined statically by the parser, and, as the post notes, scoping cannot generally change within a block (except in class!).

CPython provides five opcodes for loading values onto the stack: LOAD_CONST, LOAD_FAST, LOAD_DEREF, LOAD_GLOBAL, and LOAD_NAME.

Which of these to emit can be determined statically by first identifying constant values, then looking for static patterns that can create name bindings (x = ..., for x in ..., &c.), then performing the same in closed-over scopes, then defaulting to global scope if no such binding can be found.
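If you want to see that static classification without reading bytecode, the stdlib symtable module exposes it. This is only a rough sketch: the function names in SOURCE and the classify helper are made up for illustration, and nothing in SOURCE is ever executed.

```python
import symtable

# Hypothetical source text; it is only analysed, never run.
SOURCE = '''
x = ...

def reads_global():
    return x

def assigns_local():
    if False:
        x = ...
    return x

def outer(y):
    def inner():
        return y
    return inner
'''

def classify(func_table, name):
    """Report how the compiler's symbol table classifies `name` in a function."""
    sym = func_table.lookup(name)
    if sym.is_free():
        return 'free'      # closed-over name (LOAD_DEREF territory)
    if sym.is_local():
        return 'local'     # bound somewhere in this block (LOAD_FAST territory)
    if sym.is_global():
        return 'global'    # no binding found, fall back to global lookup
    return 'unknown'

top = symtable.symtable(SOURCE, '<example>', 'exec')
tables = {child.get_name(): child for child in top.get_children()}

reads_global_scope = classify(tables['reads_global'], 'x')
assigns_local_scope = classify(tables['assigns_local'], 'x')
inner_scope = classify(tables['outer'].get_children()[0], 'y')
```

Note that assigns_local is classified purely from the text: the binding inside the dead branch is enough to make x local.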

Consider the below:

from dis import get_instructions

def f():
    x = ...

assert all(inst.opname == 'LOAD_CONST' for inst in get_instructions(f.__code__) if inst.argval is ...)

def f(x):
    return x

assert all(inst.opname == 'LOAD_FAST' for inst in get_instructions(f.__code__) if inst.argrepr == 'x')

def f():
    x = ...
    return x

assert all(inst.opname in {'LOAD_FAST', 'STORE_FAST'} for inst in get_instructions(f.__code__) if inst.argrepr == 'x')

def g(x):
    def f():
        return x
    return f

assert all(inst.opname in {'LOAD_DEREF'} for inst in get_instructions(g(...).__code__) if inst.argrepr == 'x')

def f():
    return x

assert all(inst.opname in {'LOAD_GLOBAL'} for inst in get_instructions(f.__code__) if inst.argrepr == 'x')

This is why the following raise UnboundLocalError or NameError: the parser either cannot statically determine where x comes from, or it guesses wrong from the static information it has (it has no ability to dynamically run code paths)! (Where the parser would guess incorrectly, global or nonlocal may be able to fix the issue by hinting to the parser what the correct answer is.)

from dis import get_instructions

def f():
    locals()['x'] = ...
    return x

assert all(inst.opname in {'LOAD_GLOBAL'} for inst in get_instructions(f.__code__) if inst.argrepr == 'x')
try: f()
except NameError: pass
else: assert False

x = ...,
def f():
    x += ...,
    return x

assert all(inst.opname.startswith(('LOAD_FAST', 'STORE_FAST')) for inst in get_instructions(f.__code__) if inst.argrepr == 'x')
try: f()
except UnboundLocalError: pass
else: assert False

x = ...
def f():
    if False:
        x = ...
    return x

assert all(inst.opname.startswith('LOAD_FAST') for inst in get_instructions(f.__code__) if inst.argrepr == 'x')
try: f()
except UnboundLocalError: pass
else: assert False

The wrong guess in the second case is because the parser tries only to statically analyse the left-hand side of the x += ..., line, presumably since the right-hand side could be arbitrarily dynamic! The wrong guess in the third case is quite interesting, since the if False dead branch is actually elided from the compiled bytecode! (The elision clearly happens after scope determination.) Thus, we have this oddity:

from dis import get_instructions

x = ...
def f():
    if __debug__:
        x = ...
    return x

if __debug__:
    assert all(inst.opname in {'STORE_FAST', 'LOAD_FAST'} for inst in get_instructions(f.__code__) if inst.argrepr == 'x')
    assert f() is ...
else:
    assert all(inst.opname.startswith('LOAD_FAST') for inst in get_instructions(f.__code__) if inst.argrepr == 'x')
    try: f()
    except UnboundLocalError: pass
    else: assert False

We will see starkly different behaviour depending on whether we are running with or without optimisations (i.e., python vs python -O)!
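Coming back to the earlier remark that global can hint the correct answer: here is a small sketch of the augmented-assignment case with and without the declaration. The names counter, bump_broken, and bump_fixed are invented for the example.

```python
from dis import get_instructions

counter = (...,)

def bump_broken():
    counter += (...,)   # augmented assignment is a binding, so counter is local here
    return counter

def bump_fixed():
    global counter      # tells the compiler the binding lives at module level
    counter += (...,)
    return counter

try:
    bump_broken()
except UnboundLocalError:
    broken_raised = True
else:
    broken_raised = False

# Opcodes that touch the name 'counter' in the fixed version.
fixed_ops = {inst.opname for inst in get_instructions(bump_fixed) if inst.argval == 'counter'}
fixed_result = bump_fixed()
```

With the declaration in place, the name is loaded and stored through the global opcodes rather than the fast-local ones, so the call succeeds.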

Most people assume that LOAD_NAME is how all variable access works in Python, but that’s simply not the case. It used to be quite easy to generate this back when from … import * was allowed in function bodies. Clearly, in this case, we can’t statically determine whether the name is available in the local or global scope without knowing the contents of the module (which cannot easily be determined statically!).

def f():
  from module import *
  return x

The easiest way to generate a LOAD_NAME now is in a class body, which leads us to this oddity (and the only case that I am aware of where a variable can belong to multiple scopes within the same block.)

from dis import get_instructions

x = ...
def f():
    class T:
        x = x
    return T

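# the class body's code object; its index in co_consts can vary between CPython versions
class_code = next(c for c in f.__code__.co_consts if hasattr(c, 'co_code'))
assert all(inst.opname in {'LOAD_NAME', 'STORE_NAME'} for inst in get_instructions(class_code) if inst.argrepr == 'x')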
obj = f()()
assert obj.x is ...

If we look at the bytecode of a try/except, we can see a DELETE_FAST removing the captured exception value.

from dis import get_instructions

def f():
    try: pass
    except Exception as e: pass
    return e

assert any(inst.opname in {'DELETE_FAST'} for inst in get_instructions(f.__code__) if inst.argrepr == 'e')
try: f()
except UnboundLocalError: pass
else: assert False

I suppose we could say that this is a scope: without an additional name binding that creates another reference to e, we cannot access it later.
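To make the "additional name binding" concrete, here is a minimal sketch. The functions lost and kept and the name saved are hypothetical; the only point is that a second reference survives the implicit deletion of e.

```python
def lost():
    try:
        raise ValueError('boom')
    except ValueError as e:
        pass
    return e  # e was deleted when the except block ended

def kept():
    try:
        raise ValueError('boom')
    except ValueError as e:
        saved = e  # extra binding, not deleted on leaving the block
    return saved

try:
    lost()
except UnboundLocalError:
    lost_raised = True
else:
    lost_raised = False

result = kept()
```

Keeping the exception alive this way also keeps its traceback (and the frames it references) alive, which is the usual reason given for the automatic deletion.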

I believe in the talk referenced above, I mention that the CPython parser is quite simplistic. It’s very easy to overthink how Python works (e.g., to assume that def and class are “definitions” separate from executable code, that mechanisms like “hoisting” might exist, &c.)

By the way, while CPython is gaining more optimisations, I think it’s still currently true that the CPython parser/compiler does only three moderately interesting things:

It’s really quite a simple execution model, which I think has contributed a lot to the success of the language!

carlana

This is a pet peeve of mine about Python. I wish it had been fixed when they moved to Python 3. Now := is wasted as an expression, so it’s even less fixable than before.

Here is a minimal example showing the same problem:

>>> def outer():
...     x = 1
...     def inner():
...         print(x)
...         x = 2
...     inner()
...
>>> outer()
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    outer()
    ~~~~~^^
  File "<python-input-0>", line 6, in outer
    inner()
    ~~~~~^^
  File "<python-input-0>", line 4, in inner
    print(x)
          ^
UnboundLocalError: cannot access local variable 'x' where it is not associated with a value

Doing for x in [1, 2, 3] would have the same problem because the for is an implicit =.
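A quick sketch of the for variant, plus the workaround that exists today (nonlocal), assuming the intent really is to rebind the outer x. The function names are made up for the example.

```python
def outer_broken():
    x = 1
    def inner():
        seen = x              # fails: the for below makes x local to inner
        for x in [1, 2, 3]:
            pass
        return seen
    return inner()

def outer_fixed():
    x = 1
    def inner():
        nonlocal x            # x now refers to outer_fixed's variable
        seen = x
        for x in [1, 2, 3]:   # the loop rebinds the outer x
            pass
        return seen
    return inner(), x

try:
    outer_broken()
except UnboundLocalError:
    broken_raised = True
else:
    broken_raised = False

fixed_result = outer_fixed()
```

The catch with nonlocal is that the loop now clobbers the enclosing variable, which may or may not be what you wanted; renaming the loop variable is the other way out.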