In CEL, duration strings let us write expressions like the one below. To support them, we need to parse strings like “24h” into a representation that our interpreter can use for comparisons and other arithmetic.
request.time - resource.age < duration("24h")
The CEL specification states that, “Duration strings should support the following suffixes: ‘h’ (hour), ‘m’ (minute), ‘s’ (second), ‘ms’ (millisecond), ‘us’ (microsecond), and ‘ns’ (nanosecond). Duration strings may be zero, negative, fractional, and/or compound. Examples: ‘0’, ‘-1.5h’, ‘1m6s’”.
nom is a parser combinator library with a focus on safe parsing, streaming patterns, and, as much as possible, zero copy. I’m no expert in the library or the concepts, but I know enough to parse duration strings with it, so here we go. The nom parsers in this post are functions that take a &str as input and return an IResult, which on success holds a tuple containing the remaining input that was not parsed and the parsed value. Let’s look at a basic example:
use nom::{bytes::complete::tag, IResult};

// Match the literal "http" at the start of the input.
fn parse_http(i: &str) -> IResult<&str, &str> {
    tag("http")(i)
}

#[test]
fn test_parse_http() {
    let (leftover, parsed) = parse_http("http foo").unwrap();
    println!("leftover: '{}'", leftover);
    println!("parsed: '{}'", parsed);
}
// Outputs:
// leftover: ' foo'
// parsed: 'http'
Notice how we used a parser called tag inside our own parser. nom ships with tons of byte parsers, character parsers, number parsers, and more, which we can combine to form our own more specific parsers.
You can also combine many parsers together using the built-in combinators to form more complex parsers. We’ll explore some examples of this in a moment.
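To make the "remaining input plus parsed value" idea concrete, here is a rough std-only sketch of what a tag-like parser does. This is not how nom implements tag; tag_sketch is a hypothetical name, and it returns an Option instead of nom's IResult to keep the example small.

```rust
/// Hypothetical std-only stand-in for a `tag`-like parser (not nom's code).
/// On success it returns (remaining input, matched text); otherwise None.
fn tag_sketch<'a>(literal: &str, input: &'a str) -> Option<(&'a str, &'a str)> {
    // `strip_prefix` yields the rest of the input if it starts with `literal`.
    let rest = input.strip_prefix(literal)?;
    // The matched part is whatever precedes `rest` in the original input.
    let matched = &input[..input.len() - rest.len()];
    Some((rest, matched))
}
```

Calling tag_sketch("http", "http foo") hands back " foo" as the leftover and "http" as the parsed value, which mirrors the shape of the nom example above.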
One of the core principles of nom is starting with the most basic parser possible and working outwards from there. So, let’s start with a parser that can parse the duration units (hours, minutes, seconds, etc.). Ideally, we’d map the units that we parse to an enum so that we have a more structured representation of the duration string.
enum Unit {
    Hour,
    Minute,
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}
Now we can write a parser that will parse “h” to Unit::Hour, “m” to Unit::Minute, etc. We’ll start by just making sure we can parse the “h” unit. This parser is going to be a little bit different than the last parser example we looked at. Instead of returning the &str that we parsed, we want to map the “h” to the Unit::Hour variant. We can do this using the map combinator:
use nom::{character::complete::char, combinator::map, IResult};

fn parse_unit(i: &str) -> IResult<&str, Unit> {
    map(char('h'), |_| Unit::Hour)(i)
}
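If the idea of a combinator feels abstract, it may help to see one written by hand. The following is a std-only sketch, not nom's implementation: Parsed, char_sketch, map_sketch, and parse_hour are all hypothetical names, and the enum is trimmed to a single variant for brevity.

```rust
/// Hypothetical parser result for this sketch: remaining input plus a value.
type Parsed<'a, T> = Option<(&'a str, T)>;

#[derive(Debug, PartialEq)]
enum Unit {
    Hour,
}

/// Matches a single expected character at the start of the input.
fn char_sketch<'a>(expected: char, input: &'a str) -> Parsed<'a, char> {
    let rest = input.strip_prefix(expected)?;
    Some((rest, expected))
}

/// Runs `parser`, then transforms its value with `f`.
/// The remaining input is passed through untouched.
fn map_sketch<'a, A, B>(
    parser: impl Fn(&'a str) -> Parsed<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> Parsed<'a, B> {
    move |input| parser(input).map(|(rest, value)| (rest, f(value)))
}

/// Parses "h" and maps it to `Unit::Hour`, ignoring the matched character.
fn parse_hour<'a>(input: &'a str) -> Parsed<'a, Unit> {
    map_sketch(|i| char_sketch('h', i), |_| Unit::Hour)(input)
}
```

The point is only that a combinator is a function that takes a parser and returns a new parser, so the mapping closure never has to care about the leftover input.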
Great! Now what if we want to parse and map the rest of the units? nom provides another combinator called alt, which tries multiple parsers in order until one succeeds. The ordering is important to keep in mind here. For example, if we had the ‘m’ branch before the ‘ms’ branch, then we’d always parse minutes even when we’re supposed to be parsing milliseconds. We want to order our parsers from most specific to most generic (“ms” is more specific than “m”). We can use this to parse all the units:
use nom::branch::alt;

fn parse_unit(i: &str) -> IResult<&str, Unit> {
    // Two-character suffixes come first so "ms" is not read as "m".
    alt((
        map(tag("ms"), |_| Unit::Millisecond),
        map(tag("us"), |_| Unit::Microsecond),
        map(tag("ns"), |_| Unit::Nanosecond),
        map(char('h'), |_| Unit::Hour),
        map(char('m'), |_| Unit::Minute),
        map(char('s'), |_| Unit::Second),
    ))(i)
}
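The ordering pitfall is easy to reproduce without nom. Below is a std-only sketch of "try each alternative in order and take the first match"; first_match, SPECIFIC_FIRST, and GENERIC_FIRST are hypothetical names used only for this illustration.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Unit {
    Hour,
    Minute,
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Tries each (suffix, unit) pair in order and returns the first that matches,
/// together with the remaining input.
fn first_match<'a>(table: &[(&str, Unit)], input: &'a str) -> Option<(&'a str, Unit)> {
    table
        .iter()
        .find_map(|&(suffix, unit)| input.strip_prefix(suffix).map(|rest| (rest, unit)))
}

/// Most specific suffixes first, matching the order used in `parse_unit`.
const SPECIFIC_FIRST: [(&str, Unit); 6] = [
    ("ms", Unit::Millisecond),
    ("us", Unit::Microsecond),
    ("ns", Unit::Nanosecond),
    ("h", Unit::Hour),
    ("m", Unit::Minute),
    ("s", Unit::Second),
];

/// Deliberately wrong order: "m" shadows "ms".
const GENERIC_FIRST: [(&str, Unit); 2] = [("m", Unit::Minute), ("ms", Unit::Millisecond)];
```

With SPECIFIC_FIRST, the input "ms" is consumed whole as milliseconds. With GENERIC_FIRST, the same input matches minutes and leaves a stray "s" behind, which is exactly the bug the ordering avoids.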
We also want to parse the numbers associated with the units and then convert that combination into a chrono::Duration. The general idea here is that we want to turn “1h” into Duration::hours(1) and “1h30m” into Duration::hours(1) + Duration::minutes(30), etc. A chrono::Duration can be built from a count of nanoseconds, so we need to convert whatever we parse into nanoseconds. Let’s implement a new method on the Unit enum to give us what we need to make this conversion:
impl Unit {
    fn nanos(&self) -> i64 {
        match self {
            Unit::Nanosecond => 1,
            Unit::Microsecond => 1_000,
            Unit::Millisecond => 1_000_000,
            Unit::Second => 1_000_000_000,
            Unit::Minute => 60 * 1_000_000_000,
            Unit::Hour => 60 * 60 * 1_000_000_000,
        }
    }
}
Now let’s implement a function that can handle our conversion. We want to be able to take a number such as 1.5 (an f64) together with Unit::Second and turn that into a chrono::Duration. This function converts a number into the appropriate duration based on the unit. It truncates any precision more granular than nanoseconds, which is perfectly fine for our use case since the spec doesn’t require sub-nanosecond precision.
use chrono::Duration;

fn to_duration(num: f64, unit: Unit) -> Duration {
    Duration::nanoseconds((num * unit.nanos() as f64).trunc() as i64)
}
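To see the arithmetic on its own, here is a std-only sketch that performs the same multiply-and-truncate step but returns the raw nanosecond count rather than a chrono::Duration. The name to_nanos and the two constants are assumptions made for this illustration.

```rust
const NANOS_PER_SECOND: i64 = 1_000_000_000;
const NANOS_PER_HOUR: i64 = 60 * 60 * NANOS_PER_SECOND;

/// Same arithmetic as `to_duration`, but returning plain nanoseconds
/// (an assumption made here to avoid depending on chrono).
fn to_nanos(num: f64, nanos_per_unit: i64) -> i64 {
    // Scale to nanoseconds, then drop anything smaller than one nanosecond.
    (num * nanos_per_unit as f64).trunc() as i64
}
```

For example, 1.5 hours works out to 1.5 × 3,600,000,000,000 = 5,400,000,000,000 nanoseconds (5400 seconds), while half a nanosecond truncates to 0.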
Let’s combine what we’ve built so far into a single function that can parse the duration string “1.5h”. First we expect to see a number, and nom already has built-in support for parsing numbers. Since we support fractional numbers, we’ll need to use the double parser. Once we’ve parsed the number, we’ll want to parse the unit, then convert the number and unit into a duration.
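use nom::number::complete::double;

fn parse_number_unit(i: &str) -> IResult<&str, Duration> {
    // Each step returns the remaining input, which feeds the next parser.
    let (i, num) = double(i)?;
    let (i, unit) = parse_unit(i)?;
    let duration = to_duration(num, unit);
    Ok((i, duration))
}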
And to test it:
#[test]
fn test_parse_number_unit() {
    let (_, duration) = parse_number_unit("1.5h").unwrap();
    println!("{:?}", duration);
}
// Outputs:
// Duration { secs: 5400, nanos: 0 }
We’re doing pretty good so far! The spec also says that we need to support compound durations like “1h30m”. We can do this by parsing multiple number-unit pairs and then summing them together. We can use nom’s many1 parser which will repeat a parser one or more times, producing a vector of parse results.
use nom::multi::many1;

pub fn parse_duration(i: &str) -> IResult<&str, Duration> {
    let (i, durations) = many1(parse_number_unit)(i)?;
    // Sum every parsed number-unit pair into one Duration.
    let duration = durations
        .iter()
        .fold(Duration::zero(), |acc, next| acc + *next);
    Ok((i, duration))
}
There’s a bit to digest here, so let’s break it down.
Because many1 repeats parse_number_unit, instead of getting a single Duration, we get Vec<Duration>.
The signature of parse_duration says we need to return Duration, so we need to turn Vec<Duration> into Duration, which we can do by reducing (or folding, in Rust) the vector into a single Duration.
We wanted to be able to parse “1h30m”. For that input we’re going to get a Vec<Duration> that looks like [Duration::hours(1), Duration::minutes(30)], whose elements all need to be added together.
According to the spec, we also need to support duration strings like “-1h” or “0”. For “0”, I figured it would be easier to just check if the string was “0” and short-circuit the parsers, and we’ll see how far that gets us. For negative numbers, we can create a new parser and use the opt combinator to indicate that the negative sign is optional: parse it if it’s there, but don’t fail if it’s not. The opt combinator returns an Option; if it’s Some, then we multiply our duration by -1, otherwise we just return the duration as-is.
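use nom::combinator::opt;

// Consume a leading '-' and discard it; callers only care whether it was there.
fn parse_negative(i: &str) -> IResult<&str, ()> {
    let (i, _): (&str, char) = char('-')(i)?;
    Ok((i, ()))
}

pub fn parse_durations(i: &str) -> IResult<&str, Duration> {
    let (i, neg) = opt(parse_negative)(i)?;
    // Short-circuit on a bare "0", which carries no unit.
    if i == "0" {
        return Ok(("", Duration::zero()));
    }
    let (i, durations) = many1(parse_number_unit)(i)?;
    let duration = durations
        .iter()
        .fold(Duration::zero(), |acc, next| acc + *next);
    // Flip the sign of the whole compound duration if a '-' was parsed.
    Ok((i, duration * if neg.is_some() { -1 } else { 1 }))
}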
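For readers who want to poke at the overall flow without pulling in nom or chrono, here is a std-only sketch of the same steps: optional sign, the “0” short-circuit, one or more number-unit pairs, and a running sum. It is an illustration under assumptions, not the cel-rust code: number_unit, duration_nanos, and UNITS are hypothetical names, the result is signed nanoseconds instead of a chrono::Duration, the number is limited to digits and dots, and, unlike the nom version, trailing input is rejected.

```rust
/// Nanoseconds per unit, most specific suffix first ("ms" before "m").
const UNITS: [(&str, i64); 6] = [
    ("ms", 1_000_000),
    ("us", 1_000),
    ("ns", 1),
    ("h", 3_600_000_000_000),
    ("m", 60_000_000_000),
    ("s", 1_000_000_000),
];

/// Parses one "<number><unit>" pair, returning (remaining input, nanoseconds).
fn number_unit<'a>(input: &'a str) -> Option<(&'a str, i64)> {
    // Assumption: the number is the leading run of ASCII digits and dots.
    let end = input
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(input.len());
    let num: f64 = input[..end].parse().ok()?;
    let rest = &input[end..];
    let (rest, nanos_per_unit) = UNITS
        .iter()
        .find_map(|&(suffix, nanos)| rest.strip_prefix(suffix).map(|r| (r, nanos)))?;
    Some((rest, (num * nanos_per_unit as f64).trunc() as i64))
}

/// Parses a whole duration string into signed nanoseconds.
fn duration_nanos(input: &str) -> Option<i64> {
    // Optional leading minus sign, in the spirit of `opt(parse_negative)`.
    let (negative, mut rest) = match input.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, input),
    };
    if rest == "0" {
        return Some(0);
    }
    // One or more pairs, in the spirit of `many1(parse_number_unit)`.
    let mut total = 0i64;
    let mut pairs = 0;
    while let Some((r, nanos)) = number_unit(rest) {
        total += nanos;
        rest = r;
        pairs += 1;
    }
    // Stricter than the nom version: leftover input is treated as an error.
    if pairs == 0 || !rest.is_empty() {
        return None;
    }
    Some(if negative { -total } else { total })
}
```

Running the spec’s own examples through it gives 0 for “0”, -5,400,000,000,000 for “-1.5h”, and 66,000,000,000 for “1m6s”.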
And that’s it! Duration parsing is now supported in cel-rust. In addition to parsing, I also had to add support for the type in the interpreter and configure the mathematical operator implementations which (thanks to Rust) was surprisingly easy. But I’ll save this for another post.