Jun 30, 2023
CEL RUST: PARSING DURATION STRINGS WITH NOM
Learning how to use nom by adding support for the duration type to the CEL Rust implementation.
Duration strings

In CEL, duration strings let us write expressions like the one below. To support them, we need to parse strings like “24h” into a meaningful representation that our interpreter can use to perform comparisons and other mathematical operations.

request.time - resource.age < duration("24h")

The CEL specification states that “Duration strings should support the following suffixes: ‘h’ (hour), ‘m’ (minute), ‘s’ (second), ‘ms’ (millisecond), ‘us’ (microsecond), and ‘ns’ (nanosecond). Duration strings may be zero, negative, fractional, and/or compound. Examples: ‘0’, ‘-1.5h’, ‘1m6s’”.

What is nom?

nom is a parser combinator library with a focus on safe parsing, streaming patterns, and, as much as possible, zero copy. I’m no expert in the library or the concepts, but I know enough to parse duration strings with it, so here we go.

nom parsers are functions that take an input (a &str in our case) and return an IResult. On success, that result holds a tuple containing the remaining input that was not parsed and the parsed value; on failure, it holds an error. Let’s look at a basic example:

use nom::{bytes::complete::tag, IResult};

// Matches the literal "http" at the start of the input.
fn parse_http(i: &str) -> IResult<&str, &str> {
    tag("http")(i)
}

#[test]
fn test_parse_http() {
    let (leftover, parsed) = parse_http("http foo").unwrap();
    println!("leftover: '{}'", leftover);
    println!("parsed: '{}'", parsed);
}

// Outputs:
// leftover: ' foo'
// parsed: 'http'
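
To make the "remaining input plus parsed value" contract concrete, here is a minimal sketch of what a tag-like parser does conceptually, using only the standard library. The name tag_sketch and its String error type are made up for illustration; this is not how nom implements tag, just the same idea in miniature.

```rust
// Hypothetical stand-in for a tag-like parser; NOT nom's implementation.
// On success it returns (remaining input, matched slice); on failure an error.
fn tag_sketch<'a>(expected: &str, input: &'a str) -> Result<(&'a str, &'a str), String> {
    match input.strip_prefix(expected) {
        // The prefix matched: hand back the rest and the part we consumed.
        Some(rest) => Ok((rest, &input[..expected.len()])),
        // No match: nothing is consumed and the caller gets an error.
        None => Err(format!("expected '{}' at '{}'", expected, input)),
    }
}

fn main() {
    // Matching input: "http" is consumed and " foo" is left over.
    assert_eq!(tag_sketch("http", "http foo"), Ok((" foo", "http")));
    // Non-matching input: the parser fails instead of returning a value.
    assert!(tag_sketch("http", "ftp foo").is_err());
}
```

The failure case is what makes combinators possible: a parser that fails without consuming anything lets the caller try something else on the same input.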

Notice how we built our own parser on top of an existing parser called tag. nom ships with tons of byte parsers, character parsers, number parsers, and more, which we can combine to form our own more specific parsers.

You can also combine many parsers together using the built-in combinators to form more complex parsers. We’ll explore some examples of this in a moment.

Parsing units

One of the core principles of nom is starting with the most basic parser possible and working outwards from there. So, let’s start with a parser that can parse the duration units (hours, minutes, seconds, etc.). Ideally we’d map the units that we parse to an enum so that we have a more structured representation of the duration string.

enum Unit {
    Hour,
    Minute,
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

Now we can write a parser that will parse “h” to Unit::Hour, “m” to Unit::Minute, etc. We’ll start by just making sure we can parse the “h” unit. This parser is going to be a little bit different than the last parser example we looked at. Instead of returning the &str that we parsed, we want to map the “h” to the Unit::Hour variant. We can do this using the map combinator:

use nom::{character::complete::char, combinator::map};

fn parse_unit(i: &str) -> IResult<&str, Unit> {
    map(char('h'), |_| Unit::Hour)(i)
}

Great! Now what if we want to parse and map the rest of the units? nom provides another combinator called alt, which tries multiple parsers in order until one succeeds. The ordering matters: if the ‘m’ branch came before the ‘ms’ branch, “ms” would always parse as minutes with a stray “s” left over, even when we’re supposed to be parsing milliseconds. So we order our parsers from most specific to most generic (“ms” is more specific than “m”). We can use this to parse all the units:

use nom::branch::alt;

fn parse_unit(i: &str) -> IResult<&str, Unit> {
    // Branches are tried in order, so the two-letter suffixes come first.
    alt((
        map(tag("ms"), |_| Unit::Millisecond),
        map(tag("us"), |_| Unit::Microsecond),
        map(tag("ns"), |_| Unit::Nanosecond),
        map(char('h'), |_| Unit::Hour),
        map(char('m'), |_| Unit::Minute),
        map(char('s'), |_| Unit::Second),
    ))(i)
}
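
The ordering pitfall is easy to reproduce without nom at all. The sketch below uses a made-up helper, first_match, that tries candidate suffixes strictly in the order given, which is the same "first match wins" behaviour described for alt. It is only an illustration of the ordering rule, not a replacement for the parser above.

```rust
// Hypothetical helper, not part of nom: return the first candidate that the
// input starts with, trying candidates strictly in the given order.
fn first_match<'a>(candidates: &[&'a str], input: &str) -> Option<&'a str> {
    candidates.iter().copied().find(|c| input.starts_with(*c))
}

fn main() {
    // Generic "m" listed first: it matches the start of "ms" and wins.
    assert_eq!(first_match(&["m", "ms"], "ms"), Some("m"));
    // Specific "ms" listed first: milliseconds are recognised correctly.
    assert_eq!(first_match(&["ms", "m"], "ms"), Some("ms"));
    // Plain minutes still work with the specific-first ordering.
    assert_eq!(first_match(&["ms", "m"], "m"), Some("m"));
}
```

With the wrong ordering nothing fails loudly: the minute branch succeeds and the leftover “s” only causes trouble later, which is why it is worth getting the order right up front.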

Parsing numbers

We also want to parse the numbers associated with the units and then convert that combination into a chrono::Duration. The general idea here is that we want to turn “1h” into Duration::hours(1) and “1h30m” into Duration::hours(1) + Duration::minutes(30), etc. Nanoseconds are the smallest unit the spec asks for, so we’ll convert whatever we parse into nanoseconds and build the Duration from that. Let’s implement a new method on the Unit enum to give us what we need to make this conversion:

impl Unit {
    fn nanos(&self) -> i64 {
        match self {
            Unit::Nanosecond => 1,
            Unit::Microsecond => 1_000,
            Unit::Millisecond => 1_000_000,
            Unit::Second => 1_000_000_000,
            Unit::Minute => 60 * 1_000_000_000,
            Unit::Hour => 60 * 60 * 1_000_000_000,
        }
    }
}

Now let’s implement a function that can handle our conversion. We want to be able to take a number like 1.5 (an f64) together with Unit::Second and turn that into a chrono::Duration. This function converts a number into the appropriate duration based on the unit. It truncates any precision finer than a nanosecond, which is perfectly fine for our use case since the spec doesn’t require sub-nanosecond precision.

use chrono::Duration;

fn to_duration(num: f64, unit: Unit) -> Duration {
    Duration::nanoseconds((num * unit.nanos() as f64).trunc() as i64)
}
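
To check the arithmetic by hand, here is the same multiply-and-truncate step written against plain i64 nanoseconds so that it runs without chrono. The function name to_nanos and the two constants are assumptions made for this sketch; the numbers mirror the values in Unit::nanos.

```rust
// Nanoseconds per unit, mirroring the values used in Unit::nanos.
const NANOS_PER_SECOND: i64 = 1_000_000_000;
const NANOS_PER_HOUR: i64 = 60 * 60 * NANOS_PER_SECOND;

// Same arithmetic as to_duration, but returning raw nanoseconds so the
// sketch needs no external crates (hypothetical helper for illustration).
fn to_nanos(num: f64, nanos_per_unit: i64) -> i64 {
    (num * nanos_per_unit as f64).trunc() as i64
}

fn main() {
    // 1.5 hours is 5400 seconds.
    assert_eq!(to_nanos(1.5, NANOS_PER_HOUR), 5_400 * NANOS_PER_SECOND);
    // Anything finer than one nanosecond is dropped.
    assert_eq!(to_nanos(2.75, 1), 2);
    // trunc rounds toward zero, so negative values lose the fraction too.
    assert_eq!(to_nanos(-2.75, 1), -2);
}
```

One thing to keep in mind with this approach is that the multiplication happens in f64, so very large inputs can lose precision before the cast back to i64.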

Parsing duration strings

Let’s combine what we’ve built so far into a single function that can parse the duration string “1.5h”. First we expect to see a number, and nom already has built-in support for parsing numbers. Since we support fractional numbers we’ll need to use the double parser. Once we’ve parsed the number, we’ll want to parse the unit, then convert the number and unit into a duration.

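use nom::number::complete::double;

fn parse_number_unit(i: &str) -> IResult<&str, Duration> {
    // Each step returns the remaining input, which feeds the next parser.
    let (i, num) = double(i)?;
    let (i, unit) = parse_unit(i)?;
    let duration = to_duration(num, unit);
    Ok((i, duration))
}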

And to test it:

#[test]
fn test_parse_number_unit() {
    let (_, duration) = parse_number_unit("1.5h").unwrap();
    println!("{:?}", duration);
}

// Outputs:
// Duration { secs: 5400, nanos: 0 }

Compound durations

We’re doing pretty well so far! The spec also says that we need to support compound durations like “1h30m”. We can do this by parsing multiple number-unit pairs and then summing them together. nom’s many1 combinator repeats a parser one or more times and produces a vector of the parse results.

use nom::multi::many1;

pub fn parse_duration(i: &str) -> IResult<&str, Duration> {
    let (i, durations) = many1(parse_number_unit)(i)?;
    let duration = durations
        .iter()
        .fold(Duration::zero(), |acc, next| acc + *next);
    Ok((i, duration))
}

There’s a bit to digest here, so let’s break it down.

  1. We start by parsing one or more number-unit pairs just like we were doing before, but this time instead of getting back a Duration, we get a Vec<Duration>.
  2. Our function signature for parse_duration says we need to return a Duration, so we reduce (fold, in Rust terms) the vector into a single Duration by adding its elements together. For “1h30m”, the Vec<Duration> looks like [Duration::hours(1), Duration::minutes(30)], which folds into a single Duration of 90 minutes.
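
The fold step on its own can be sketched as follows. To keep it runnable without extra crates, this version swaps in std::time::Duration for chrono::Duration (the standard library type is unsigned, so it only covers the non-negative case), and total_duration is a name invented for the example.

```rust
use std::time::Duration;

// Hypothetical helper: add up the per-pair durations, starting from zero.
// Uses std::time::Duration as a stand-in for chrono::Duration.
fn total_duration(parts: &[Duration]) -> Duration {
    parts.iter().fold(Duration::ZERO, |acc, next| acc + *next)
}

fn main() {
    // Stand-in for the parsed pairs of "1h30m": one hour and thirty minutes.
    let parts = vec![Duration::from_secs(60 * 60), Duration::from_secs(30 * 60)];

    let total = total_duration(&parts);
    assert_eq!(total, Duration::from_secs(5400));

    // The standard library type also implements Sum, which reads a bit shorter.
    let summed: Duration = parts.iter().sum();
    assert_eq!(summed, total);
}
```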

Negatives and zeroes

According to the spec, we also need to support duration strings like “-1h” or “0”. For “0”, I figured it would be easiest to check whether the remaining input is exactly “0” and short-circuit the other parsers; we’ll see how far that gets us. For negative numbers, we can create a new parser and wrap it in the opt combinator to indicate that the negative sign is optional: parse it if it’s there, but don’t fail if it’s not. The opt combinator returns an Option. If it’s Some, we multiply our duration by -1; otherwise we return the duration as-is.

fn parse_negative(i: &str) -> IResult<&str, ()> {
    let (i, _): (&str, char) = char('-')(i)?;
    Ok((i, ()))
}

use nom::combinator::opt;

pub fn parse_duration(i: &str) -> IResult<&str, Duration> {
    let (i, neg) = opt(parse_negative)(i)?;
    if i == "0" {
        return Ok(("", Duration::zero()));
    }
    let (i, durations) = many1(parse_number_unit)(i)?;
    let duration = durations
        .iter()
        .fold(Duration::zero(), |acc, next| acc + *next);
    Ok((i, duration * if neg.is_some() { -1 } else { 1 }))
}
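
As a sanity check against the spec’s examples (‘0’, ‘-1.5h’, ‘1m6s’), here is a dependency-free sketch of the same control flow: optional sign, zero short-circuit, one or more number-unit pairs, then negate. It is a simplified stand-in rather than the nom parser above: parse_nanos and UNITS are names invented for the example, it returns raw nanoseconds instead of a chrono::Duration, and its number handling only accepts digits and dots.

```rust
// Nanoseconds per unit, with two-letter suffixes listed before one-letter
// ones so that "ms" is never mistaken for "m".
const UNITS: [(&str, i64); 6] = [
    ("ms", 1_000_000),
    ("us", 1_000),
    ("ns", 1),
    ("h", 3_600_000_000_000),
    ("m", 60_000_000_000),
    ("s", 1_000_000_000),
];

// Hypothetical std-only parser: signed nanoseconds, or None on bad input.
fn parse_nanos(input: &str) -> Option<i64> {
    // Optional leading '-', playing the role of opt(parse_negative).
    let (sign, mut rest) = match input.strip_prefix('-') {
        Some(r) => (-1, r),
        None => (1, input),
    };
    // A bare "0" is valid even though it carries no unit.
    if rest == "0" {
        return Some(0);
    }
    let mut total = 0i64;
    let mut pairs = 0;
    while !rest.is_empty() {
        // Number: the longest leading run of digits and dots.
        let end = rest
            .find(|c: char| !c.is_ascii_digit() && c != '.')
            .unwrap_or(rest.len());
        let num: f64 = rest[..end].parse().ok()?;
        rest = &rest[end..];
        // Unit: first suffix in UNITS that the remaining input starts with.
        let (suffix, nanos) = UNITS.iter().find(|(s, _)| rest.starts_with(*s))?;
        total += (num * *nanos as f64).trunc() as i64;
        rest = &rest[suffix.len()..];
        pairs += 1;
    }
    // Require at least one number-unit pair, like many1.
    if pairs == 0 {
        return None;
    }
    Some(sign * total)
}

fn main() {
    assert_eq!(parse_nanos("0"), Some(0));
    assert_eq!(parse_nanos("-1.5h"), Some(-5_400_000_000_000));
    assert_eq!(parse_nanos("1m6s"), Some(66_000_000_000));
    assert_eq!(parse_nanos("1x"), None);
}
```

Because the sign is stripped before the zero check, “-0” also comes back as zero here, which matches the order of operations in the nom version.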

Summary

And that’s it! Duration parsing is now supported in cel-rust. In addition to parsing, I also had to add support for the type in the interpreter and configure the mathematical operator implementations which (thanks to Rust) was surprisingly easy. But I’ll save this for another post.