Evaluating the Use of Rust (PyO3) in a Python Project

Camilo MATAJIRA

In this post, I describe how I evaluated adopting Rust (via PyO3) in one of my personal projects, SysadminDB.
The idea was to move the main log-parsing regex from Python’s re module to Rust’s regex crate.
I benchmarked the current implementation and the proposed Rust version using pytest-benchmark.
The results showed that the “pure” Python implementation was faster than the Rust one (~1.4x).
After several rounds of trial and error and further benchmarking, I concluded that, for this specific regex (full of capturing groups), Python’s re is faster than Rust’s regex crate.
Hence, regardless of the FFI boundary overhead, the Python implementation will always be faster for this workload.

About SysadminDB

SysadminDB stores logs that can be queried using Unix/Linux power tools such as grep, sed and awk.
The project is written in Python and contains a TCPServer that receives the logs and an HTTPServer for querying them.
The database is SQLite.

The main task of the TCPServer is to receive, parse and insert logs into the database.
The objective of this experiment was to improve the performance of this server by switching the regex parsing from Python to Rust.
This was chosen somewhat ad hoc, under the assumption that the migration would be relatively easy and provide a good ROI.
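To make that hot path concrete, here is a minimal sketch of a receive → parse → insert loop built on the standard library's socketserver and sqlite3 modules. It is not SysadminDB's actual code: the `logs` table and its columns, the `LogHandler`, `init_db` and `ingest_line` names, and the simplified `LINE_RE` pattern are all hypothetical.

```python
import re
import socketserver
import sqlite3

# Hypothetical, simplified pattern for RFC 5424-style lines such as the
# benchmark line; SysadminDB's real parser is the Log class in the next section.
LINE_RE = re.compile(
    r"<(?P<prival>[0-9]+)>(?P<version>[0-9])?\s(?P<date>\S+)\s(?P<hostname>[\w.]+)\s(?P<rest>.*)"
)


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real SysadminDB tables are not shown in this post.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs "
        "(prival INTEGER, date TEXT, hostname TEXT, rest TEXT, original TEXT)"
    )


def ingest_line(conn: sqlite3.Connection, line: str) -> bool:
    """Parse one log line and insert it; return False if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return False
    conn.execute(
        "INSERT INTO logs (prival, date, hostname, rest, original) VALUES (?, ?, ?, ?, ?)",
        (int(m["prival"]), m["date"], m["hostname"], m["rest"], line),
    )
    return True


class LogHandler(socketserver.StreamRequestHandler):
    db_path = "logs.db"  # hypothetical database location

    def handle(self) -> None:
        conn = sqlite3.connect(self.db_path)
        try:
            init_db(conn)
            for raw in self.rfile:  # one newline-terminated log line per iteration
                ingest_line(conn, raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            conn.commit()
        finally:
            conn.close()


# Usage (hypothetical port):
# socketserver.TCPServer(("127.0.0.1", 5140), LogHandler).serve_forever()
```

Every received line goes through the regex before it reaches SQLite, which is why the parsing step looked like a natural optimization target.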

Code

This is the Python implementation:

import re


class Log:
    def __init__(self, message: str):
        # Raw strings keep backslashes such as \s, \w and \[ intact without
        # triggering invalid-escape warnings.
        pattern = r"\<(?P<prival>[0-9]+)\>(?P<version>[0-9])?\s?"
        pattern += r"(?P<date>([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+(Z|[+-][0-9]{2}:[0-9]{2})|\w{3}\s[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}))\s"
        pattern += r"(?P<hostname>[\w.]+)\s"
        pattern += r"(?P<appname>[\w.]+)\s?"
        pattern += r"\[?(?P<procid>[0-9-]+)?\]?\:?\s?"
        pattern += r"(?P<msgid>(-|\w{2}[0-9]{2}))?\s?"
        pattern += r"(?P<structureddata>(\[.+\]|-))?\s?(BOM)?"
        pattern += r"(?P<msg>.+)?"
        # re.match() anchors at the start of the string and reuses the re
        # module's internal cache of compiled patterns.
        match = re.match(pattern, message)
        if match is None:
            raise ValueError("Cannot parse")
        # Optional groups that did not participate in the match return None.
        version = match.group("version")
        self.version = int(version) if version is not None else None
        self.prival = int(match.group("prival"))
        self.date = match.group("date")
        self.hostname = match.group("hostname")
        self.appname = match.group("appname")
        self.procid = match.group("procid")
        self.msgid = match.group("msgid")
        self.structureddata = match.group("structureddata")
        self.msg = match.group("msg") or ""
        self.original_msg = message
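To see what each named group actually captures, the same pattern can be compiled once and its named groups read with groupdict(). The `LOG_RE` and `parse` names below are hypothetical; the pattern is the one from Log.__init__, written as raw strings.

```python
import re

# Same pattern as in Log.__init__, compiled once.
LOG_RE = re.compile(
    r"\<(?P<prival>[0-9]+)\>(?P<version>[0-9])?\s?"
    r"(?P<date>([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+(Z|[+-][0-9]{2}:[0-9]{2})|\w{3}\s[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}))\s"
    r"(?P<hostname>[\w.]+)\s"
    r"(?P<appname>[\w.]+)\s?"
    r"\[?(?P<procid>[0-9-]+)?\]?\:?\s?"
    r"(?P<msgid>(-|\w{2}[0-9]{2}))?\s?"
    r"(?P<structureddata>(\[.+\]|-))?\s?(BOM)?"
    r"(?P<msg>.+)?"
)


def parse(line: str) -> dict:
    """Return only the named groups; unnamed helper groups are skipped."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("Cannot parse")
    return m.groupdict()


line = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8"
for name, value in parse(line).items():
    print(f"{name:15} {value!r}")
```

For the benchmark line every optional group participates: procid captures "-" (the syslog nil value), msgid captures "ID47", structureddata captures "-", and msg captures the text after BOM.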

This is my final equivalent code in Rust (after several iterations):

use pyo3::prelude::*;

/// A Python module implemented in Rust.
#[pymodule]
mod sysadmindb_rs {
    use pyo3::exceptions::PyValueError;
    use pyo3::prelude::*;

    use regex::Regex;
    use std::sync::OnceLock;

    fn log_pattern() -> &'static Regex {
        static RE: OnceLock<Regex> = OnceLock::new();
        RE.get_or_init(|| Regex::new(r"<(?<prival>[0-9]+)>(?<version>[0-9])?\s?(?<date>([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+(Z|[+-][0-9]{2}:[0-9]{2})|\w{3}\s[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}))\s(?<hostname>[\w.]+)\s(?<appname>[\w.]+)\s?\[?(?<procid>[0-9-]+)?\]?\:?\s?(?<msgid>(-|\w{2}[0-9]{2}))?\s?(?<structureddata>(\[.+\]|-))?\s?(BOM)?(?<msg>.+)?").unwrap())
    }

    #[pyclass]
    struct Log {
        #[pyo3(get)]
        version: Option<u32>,
        #[pyo3(get)]
        prival: u32,
        #[pyo3(get)]
        date: String,
        #[pyo3(get)]
        hostname: String,
        #[pyo3(get)]
        appname: String,
        // Optional groups are Option<String>, exposed to Python as str or None,
        // mirroring match.group() in the Python class.
        #[pyo3(get)]
        procid: Option<String>,
        #[pyo3(get)]
        msgid: Option<String>,
        #[pyo3(get)]
        structureddata: Option<String>,
        #[pyo3(get)]
        msg: String,
    }

    #[pymethods]
    impl Log {
        #[new]
        fn new(line: &str) -> PyResult<Self> {
            // Surface parse failures as ValueError, like the Python version.
            parse_log(line).map_err(|e| PyValueError::new_err(format!("Cannot parse: {e}")))
        }
    }

    fn parse_log(line: &str) -> Result<Log, String> {
        let Some(caps) = log_pattern().captures(line) else {
            return Err("line does not match the log pattern".to_string());
        };
        // caps["name"] panics when a group did not participate in the match,
        // so optional groups go through caps.name() instead.
        let optional = |name: &str| caps.name(name).map(|m| m.as_str().to_owned());

        Ok(Log {
            prival: caps["prival"]
                .parse::<u32>()
                .map_err(|e| format!("invalid prival: {e}"))?,
            version: caps.name("version").and_then(|m| m.as_str().parse().ok()),
            date: caps["date"].to_owned(),
            hostname: caps["hostname"].to_owned(),
            appname: caps["appname"].to_owned(),
            procid: optional("procid"),
            msgid: optional("msgid"),
            structureddata: optional("structureddata"),
            msg: optional("msg").unwrap_or_default(),
        })
    }
}

The Benchmark

The benchmarks use pytest-benchmark.
Each test parses a single log line.

import log  # assumption: the module that defines the Python Log class


class TestMessage:
    def test_init_benchmark(self, benchmark):
        original_message = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8"
        benchmark(lambda: log.Log(original_message))

    def test_init_benchmark_rs(self, benchmark):
        import sysadmindb_rs

        original_message = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8"
        benchmark(lambda: sysadmindb_rs.Log(original_message))
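pytest-benchmark handles calibration and statistics, but a rough cross-check with the standard library's timeit can confirm the order of magnitude. The sketch below takes the minimum over several repeats, as the timeit documentation suggests; `per_call_ns` is a hypothetical helper, and the regex stand-in only exists so the block runs on its own.

```python
import re
import timeit
from typing import Callable

LINE = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8"


def per_call_ns(make_log: Callable[[str], object], line: str = LINE,
                number: int = 10_000, repeat: int = 5) -> float:
    """Best-of-`repeat` average time per call, in nanoseconds.

    Like the pytest-benchmark tests, this times a lambda wrapper, so the
    wrapper's call overhead is included in every measurement.
    """
    timer = timeit.Timer(lambda: make_log(line))
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e9


# Stand-in parser so the sketch runs on its own; in the project this would be
# per_call_ns(log.Log) versus per_call_ns(sysadmindb_rs.Log).
stand_in = re.compile(r"<(?P<prival>[0-9]+)>").match
print(f"stand-in: {per_call_ns(stand_in):.0f} ns per call")
```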

Initial Results

I made several optimization attempts, during which I applied these corrections:

  • Building the Rust library in release mode instead of debug mode.
  • Removing print and println! statements.
  • Storing the compiled regex in a static variable in Rust, mimicking Python’s re.compile().
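On the Python side, re.compile() plays the role of that static. Module-level functions such as re.match() also keep an internal cache of compiled patterns, so Log.__init__ does not recompile its regex on every call; precompiling mainly saves the cache lookup and, in Log.__init__, the string concatenation. A sketch comparing both call styles, where PATTERN and LINE are short illustrative stand-ins rather than the full log pattern:

```python
import re
import timeit

# Short illustrative pattern (not the full log pattern).
PATTERN = r"<(?P<prival>[0-9]+)>(?P<version>[0-9])?\s?"
COMPILED = re.compile(PATTERN)  # compiled once, like the OnceLock static in Rust
LINE = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su"


def via_module_function():
    # re.match() looks the pattern up in the re module's internal cache.
    return re.match(PATTERN, LINE)


def via_compiled():
    # The precompiled object skips that lookup.
    return COMPILED.match(LINE)


for fn in (via_module_function, via_compiled):
    best = min(timeit.repeat(fn, number=20_000, repeat=3))
    print(f"{fn.__name__}: {best / 20_000 * 1e9:.0f} ns per call")
```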

After these corrections, Rust was still ~1.33x slower than Python (mean 2.49 us vs. 1.87 us):

---------------------------------------------------------------------------------------------------------------------------------
Name (time in us)                Min                   Max                  Mean              StdDev                Median       
---------------------------------------------------------------------------------------------------------------------------------
test_init_benchmark           1.4840 (1.0)        118.6160 (4.60)         1.8705 (1.0)        0.7506 (1.53)         1.7690 (1.0) 
test_init_benchmark_rs        2.0340 (1.37)        25.7780 (1.0)          2.4947 (1.33)       0.4906 (1.0)          2.3780 (1.34)
---------------------------------------------------------------------------------------------------------------------------------

These results were unexpected, so I experimented with additional optimizations:

  • Replacing String’s .to_owned() with borrowed &str string slices.
  • Modifying the regex to use fewer capture groups.
  • Changing the return types of the Rust function to reduce the PyO3 conversion overhead.

Yet none of these optimizations had a meaningful impact.

Later, following advice I received on Reddit, I started benchmarking the system more granularly to understand where the overhead originated.
I benchmarked:

  • The time to instantiate the Log class and the Log struct with fake data.
  • The time to parse a line and create a Log struct without returning it to Python (no second FFI crossing).
  • The matching (and capturing) in Python and Rust independently.

To benchmark the Rust code in isolation, I used Criterion; for the other benchmarks, I continued using pytest-benchmark.
The Criterion code is the following:

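use criterion::{criterion_group, criterion_main, Criterion};
// std::hint::black_box (stable since Rust 1.66) is the standard-library
// equivalent of criterion::black_box.
use std::hint::black_box;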
use sysadmindb_rs::sysadmindb_rs::match_only;

fn benchmark_parse(c: &mut Criterion) {
    let line = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8";

    c.bench_function("match_only", |b| b.iter(|| match_only(black_box(line))));
}

criterion_group!(benches, benchmark_parse);
criterion_main!(benches);

The results are the following:

  ┌───────────────────────────┬───────────┬─────────────────────────────────────────┐
  │ Benchmark                 │ Mean      │ What it measures                        │
  ├───────────────────────────┼───────────┼─────────────────────────────────────────┤
  │ fake_log_rs               │ ~145 ns   │ #[pyclass] struct creation only         │
  ├───────────────────────────┼───────────┼─────────────────────────────────────────┤
  │ match_only_rs (criterion) │ ~1,855 ns │ Pure Rust regex captures(), no FFI      │
  ├───────────────────────────┼───────────┼─────────────────────────────────────────┤
  │ match_only_python         │ ~1,371 ns │ Python re.match()                       │
  ├───────────────────────────┼───────────┼─────────────────────────────────────────┤
  │ match_only_rs (pytest)    │ ~2,129 ns │ Rust regex + single FFI crossing        │
  ├───────────────────────────┼───────────┼─────────────────────────────────────────┤
  │ test_init_benchmark       │ ~2,431 ns │ Full Python parse                       │
  ├───────────────────────────┼───────────┼─────────────────────────────────────────┤
  │ parse_no_return_rs        │ ~2,563 ns │ Rust regex + struct, no FFI marshalling │
  ├───────────────────────────┼───────────┼─────────────────────────────────────────┤
  │ test_init_benchmark_rs    │ ~4,233 ns │ Full Rust parse + FFI                   │
  └───────────────────────────┴───────────┴─────────────────────────────────────────┘

Analysis

For the following analysis, let’s assume that the results from Criterion and pytest-benchmark are accurate and directly comparable.
Let’s also assume that the differences between the estimates are statistically significant.
This disclaimer matters because the benchmark durations are extremely small and can therefore be heavily influenced by background activity on the machine running the tests.
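The significance assumption could be checked if the raw per-round timings are exported, not just the summary statistics. A minimal sketch using Welch's t statistic, built only on the standard library (the `welch_t` name is hypothetical):

```python
import math
import statistics
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err
```

With thousands of rounds per benchmark, |t| well above 2 roughly corresponds to p < 0.05 under a normal approximation. Timing distributions are long-tailed (compare the Max and Mean columns in the first table), so comparing medians or bootstrapping is a more robust alternative.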

  • Rust’s regex (match_only_rs (criterion)) took ~1.4x as long as Python’s re (match_only_python) to match and extract the capture groups (1,855 ns vs. 1,371 ns).
  • This alone suggests that rewriting this regex in Rust is not beneficial: if the Rust regex implementation itself is slower, adding FFI overhead can only make the results worse.
  • The cost of a single FFI crossing can be estimated at ~274 ns (match_only_rs (pytest) - match_only_rs (criterion)).
  • The cost of returning the struct and converting it into Python objects is ~1,670 ns (test_init_benchmark_rs - parse_no_return_rs).
    This cost could potentially be reduced by optimizing the data structures and converting the struct fields manually. Yet, as shown above, this is not worthwhile.
  • The cost of creating the Log struct in Rust is ~145 ns (fake_log_rs).
  • The total performance difference between the full Python implementation and the Rust implementation is ~1,802 ns (test_init_benchmark_rs - test_init_benchmark).
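The derived figures above are plain differences and ratios of the table's means; reproducing them in a few lines makes the arithmetic easy to check. The dictionary keys are simplified versions of the table labels.

```python
# Mean times in nanoseconds, copied from the results table.
means_ns = {
    "fake_log_rs": 145,
    "match_only_rs_criterion": 1855,
    "match_only_python": 1371,
    "match_only_rs_pytest": 2129,
    "test_init_benchmark": 2431,
    "parse_no_return_rs": 2563,
    "test_init_benchmark_rs": 4233,
}

regex_ratio = means_ns["match_only_rs_criterion"] / means_ns["match_only_python"]
ffi_crossing = means_ns["match_only_rs_pytest"] - means_ns["match_only_rs_criterion"]
conversion = means_ns["test_init_benchmark_rs"] - means_ns["parse_no_return_rs"]
total_gap = means_ns["test_init_benchmark_rs"] - means_ns["test_init_benchmark"]
end_to_end_ratio = means_ns["test_init_benchmark_rs"] / means_ns["test_init_benchmark"]

print(f"Rust regex / Python re:  {regex_ratio:.2f}x")       # 1.35x
print(f"single FFI crossing:     {ffi_crossing} ns")        # 274 ns
print(f"struct -> Python:        {conversion} ns")          # 1670 ns
print(f"full Rust - full Python: {total_gap} ns")           # 1802 ns
print(f"full Rust / full Python: {end_to_end_ratio:.2f}x")  # 1.74x
```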

Conclusion

This project demonstrated that “not everything that glitters is Rust”.
Under certain circumstances, Rust’s regex crate can be slower than Python’s re module.
In scenarios like this, regardless of the effort spent optimizing data structures or minimizing FFI overhead, if the core rewritten logic is not faster, the rest of the optimization work becomes irrelevant.
My recommendation for future Python/Rust integration projects is to benchmark both implementations in isolation first.
If the Rust implementation is fast enough to amortize the FFI overhead with sufficient margin left over, then it makes sense to continue building the glue code between Python and Rust.
Otherwise, it is better to search for a more suitable optimization target.
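The recommendation can be phrased as a simple budget: the Rust path only pays off if its core time plus FFI and conversion costs stays below the Python time divided by the speedup you require. A sketch of that rule; the function names and the default `min_speedup=1.5` margin are arbitrary assumptions, not values from this post.

```python
def ffi_budget_ns(python_ns: float, rust_core_ns: float, min_speedup: float = 1.5) -> float:
    """FFI/conversion overhead (ns) the Rust path can afford while still being
    `min_speedup` times faster than Python. A negative budget means none."""
    return python_ns / min_speedup - rust_core_ns


def worth_porting(python_ns: float, rust_core_ns: float, ffi_ns: float,
                  min_speedup: float = 1.5) -> bool:
    """True if core time plus FFI overhead still meets the required speedup."""
    return ffi_ns <= ffi_budget_ns(python_ns, rust_core_ns, min_speedup)
```

With this post's numbers, ffi_budget_ns(1371, 1855, 1.0) is -484 ns: even a free FFI boundary could not make the Rust regex win.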
