Jun 15, 2023
CEL RUST: VARIABLE SHADOWING AND SCOPES
Implementing .map() and .filter(). Variable shadowing, scopes and why we need them for map and filter expressions.
WHAT IS SHADOWING?

Variable shadowing in programming is a phenomenon where a variable declared within a certain scope has the same name as a variable in an outer scope. When this happens, the inner variable “shadows” the outer one. Within its scope, the inner variable takes precedence and the outer variable is temporarily obscured.

Scopes in programming define the region or section of the code where a variable or a function is visible and can be accessed. In Rust, the scope is usually defined by a pair of curly braces {}.

Here’s an example of variable shadowing in Rust:

fn main() {
    let x = 5;
    {
        let x = 10;
        println!("{}", x);
    }
    println!("{}", x);
}

// Output:
// 10
// 5

In this example, there are two variables named x. The outer x is declared first with the value 5. Then, inside a new scope (denoted by the curly braces), an inner x is declared with the value 10. When we print x inside the inner scope, it refers to the inner x and prints 10, because it’s shadowing the outer x. But once we’re out of the inner scope, the outer x becomes accessible again and it prints 5. This is how variable shadowing works with scopes in Rust.

HOW IS IT APPLICABLE TO CEL?

If you look at the syntax for a CEL map expression, you’ll notice that it’s one of the few built-in functions that allow you to declare a variable and then operate on that new variable.

//           | variable declaration
//           ↓  ↓ variable reference on each iteration
array.filter(x, x > 5)

Let’s assume that our CEL program has a few variables initialized

NAMEVALUE
nameJohn
array[1, 2, 3]

Let’s also assume that we didn’t implement variable shadowing and scopes. When I originally forked the cel-rust project, the expressions were evaluated in a global scope. How do you imagine this expression would be evaluated?

array.filter(name, name > 5) && name == "John"

With global scoping, this expression would evaluate to false. While the variable name starts with the value “John”, that value gets overwritten three times, once for each element in the array. The last value in the array is 3 so 3 == "John" evaluates to false.

CURRENT IMPLEMENTATION

The global scope for variables in CEL is maintained in a Rust type called the Context. This Context is passed around during expression evaluation so that any variable references can be resolved by looking up the value in the Context.

pub struct Context {
    pub variables: HashMap<String, CelType>,
    pub functions: HashMap<String, Box<ContextFunction>>,
}

The map and filter functions are unique in that they allow the user to declare a variable that refers to each item in the array as we iterate over the array. This means that on each iteration of the loop, the function implementations need to update the Context’s variables field with the new variable value.

As I mentioned, we can’t just variables.insert(...) otherwise we run the risk of overwriting an existing variable with the same name.

My next thought was, “let’s just clone the Context and pass that through each iteration of the array”. While this would work, in addition to cloning a copy of all the variables, which we want to do, it also requires us to clone all the functions in the Context as well. After some head-against-wall banging with traits and parent/child context types, I opted to try a different approach.

THE SOLUTION

I needed to come up with a way to encode two separate behaviors into the system:

  1. Create new scopes that had copies of the variables in the parent
  2. A child scope should store a reference to a parent scope, allowing it to access the parent’s variables and functions.

As always, my stroke of inspiration came while I was not working. I realized I could use an enum to address both of these needs. One variant would represent the root context — the context that gets created when we start evaluating a program. The other variant would represent a child context and would maintain its own copy of all the variables currently in the program.

pub enum Context<'a> {
    Root {
        functions: HashMap<String, Box<ContextFunction>>,
        variables: HashMap<String, CelType>,
    },
    Child {
        parent: &'a Context<'a>,
        variables: HashMap<String, CelType>,
    },
}

This effectively gives us a tree of Contexts where each child can access the variables in its parent but can also shadow them with its own variables. During variable resolution, we just need to make sure to traverse up the tree until we find the variable we’re looking for and honestly, with the new data structure, this was a piece of cake.

When a caller needs to access a variable, we check what type of Context we are. If we’re a Context::Root, then we try and access the variables directly. If we’re a Context::Child, then we try and access the variables in the child first and if that fails, we try and access the variables in the parent, effectively giving us the shadowing behavior we want and implicitly handling tree traversal.

impl Context {
    pub fn get_variable<S>(&self, name: S) -> Result<CelType, ExecutionError>
    where
        S: Into<String>,
    {
        let name = name.into();
        match self {
            Context::Child { variables, parent } => variables
                .get(&name)
                .cloned()
                .or_else(|| parent.get_variable(&name).ok())
                .ok_or(ExecutionError::UndeclaredReference(name.into())),
            Context::Root { variables, .. } => variables
                .get(&name)
                .cloned()
                .ok_or(ExecutionError::UndeclaredReference(name.into())),
        }
    }
}
TREE TRAVERSAL

Implicitly handling tree-traversal you say? How so? When &self is the Context::Child variant and when the variable does not exist in the child’s variables field, we call parent.get_variable(&name) which goes to the context’s parent and follows the same procedure. If our parent is a Context::Child as well (if we’re in nested scopes) then we just rinse and repeat, calling parent.get_variable(&name) until we get back to the Context::Root.

To construct the tree, we would have a function like this:

impl Context {
    fn clone(&self) -> Context {
        Context::Child {
            parent: self,
            variables: Default::default(),
        }
    }
}

And then callers can just incrementally call clone() to create a new child context.

let root = Context::default();
let child1 = root.clone();
let child2 = child1.clone();
NESTED SCOPES

The interesting thing about this approach is that it also implicitly supports nested scopes as well. Let’s assume that we had an array of arrays like this.

NAMEVALUE
array[[1, 2], [3, 4]]

What if we wanted to multiply each element in the array by 2? We could do that with this expression. For clarity’s sake you probably wouldn’t want to name your temporary variables in this way, but the point is that nested scopes are implicitly supported and prevent any unexpected behavior caused by a lack of variable shadowing.

array.map(x, x.map(x, x * 2))
SUMMARY

If you stuck around this long, thanks for enjoying my journey getting better at Rust. Stay tuned for more posts about this project.