mirai 2.5.0

  mirai, parallelism

  Charlie Gao

We’re excited to announce mirai 2.5.0, bringing production-grade async computing to R!

This milestone release delivers enhanced observability through OpenTelemetry, reproducible parallel RNG, and key user interface improvements. We’ve also packed in twice as many changes as usual, with a round of quality-of-life fixes to make your use of mirai even smoother!

You can install it from CRAN with:

install.packages("mirai")

Introduction to mirai

mirai (Japanese for ‘future’) provides a clean, modern approach to parallel computing in R. Built on current communication technologies, it delivers extreme performance through professional-grade scheduling and an event-driven architecture.

It continues to evolve as the foundation for asynchronous and parallel computing across the R ecosystem, powering everything from async Shiny applications to parallel map in purrr to hyperparameter tuning in tidymodels.

library(mirai)

# Set up persistent background processes
daemons(4)

# Async evaluation - non-blocking
m <- mirai({
  Sys.sleep(1)
  100 + 42
})
m
#> < mirai [] >

# Results are available when ready
m[]
#> [1] 142

# Shut down persistent background processes
daemons(0)

A unique design philosophy

Modern foundation: mirai builds on nanonext, the R binding to Nanomsg Next Generation, a high-performance messaging library designed for distributed systems. This means that it’s using the very latest technologies, and supports the most optimal connections out of the box: IPC (inter-process communications), TCP or secure TLS. It also extends base R’s serialization mechanism to support custom serialization of newer cross-language data formats such as safetensors, Arrow and Polars.
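As an illustrative sketch of custom serialization (assuming the torch package is installed; the registration pattern follows mirai’s torch example), tensors can be made to cross the process boundary transparently:

```r
library(mirai)

# Register how "torch_tensor" objects are converted to and from raw
# vectors when crossing the process boundary (sketch, assumes torch)
cfg <- serial_config(
  "torch_tensor",
  torch::torch_serialize,  # object -> raw vector
  torch::torch_load        # raw vector -> object
)

daemons(1, serial = cfg)

m <- mirai(torch::torch_rand(3))
m[]  # a torch tensor, reconstructed on the host side

daemons(0)
```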

Extreme performance: as a consequence of its solid technological foundation, mirai has the proven capacity to scale to millions of concurrent tasks over thousands of connections. Moreover, it delivers up to 1,000x the efficiency and responsiveness of alternative frameworks. A key innovation is the implementation of event-driven promises that react with zero latency, giving an extra edge to real-time applications such as live inference or Shiny apps.
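For example, a mirai plugs directly into the promises framework (a sketch; promise resolution is driven by the event loop in Shiny or the interactive session):

```r
library(mirai)
library(promises)

daemons(2)

# The promise pipe fires the instant the result arrives; no polling
mirai({
  Sys.sleep(1)
  42
}) %...>%
  print()
```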

Production first: mirai provides a clear mental model for parallel computation, with a clean separation between a user’s current environment and the one in which a mirai is evaluated. This explicitness and simplicity helps avoid common pitfalls of parallel processing, such as capturing incorrect or extraneous variables. Transparency and robustness are key to mirai’s design, achieved by minimizing complexity and eliminating hidden state, with no reliance on options or environment variables. Finally, its integration with OpenTelemetry provides production-grade observability.

Deploy everywhere: daemon processes are deployed through a consistent interface across local, remote (SSH), and HPC environments (Slurm, SGE, PBS, LSF). Compute profiles are independently-managed sets of daemons settings, so you can be connected to all three resource types simultaneously. You are then free to distribute workload to the most appropriate resource for any given task, which is especially important when tasks have differing requirements such as GPU compute.
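The sketch below (the Slurm options and `heavy_task()` are placeholders) shows two profiles in use side by side via the `.compute` argument:

```r
library(mirai)

# Local daemons under the default profile
daemons(4)

# A separate "slurm" profile on an HPC cluster (illustrative options)
daemons(
  n = 8,
  url = host_url(),
  remote = cluster_config(command = "sbatch", options = "#SBATCH --mem=10G"),
  .compute = "slurm"
)

# Route each task to the appropriate resource
local_result <- mirai(sum(1:10))
hpc_result <- mirai(heavy_task(), .compute = "slurm")
```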

OpenTelemetry integration

New in mirai 2.5.0: complete observability of mirai requests through OpenTelemetry traces. This is a core feature that completes the final pillar in mirai’s ‘production first’ design philosophy.

When tracing is enabled via the otel and otelsdk packages, you can monitor the entire lifecycle of your async computations, from creation through to evaluation, making it easier to debug and optimize performance in production environments. This is especially powerful when used in conjunction with other otel-enabled packages (such as an upcoming Shiny release), providing end-to-end observability across your entire application stack.
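Configuration follows the standard OpenTelemetry environment variables; the exporter and service name below are illustrative, not prescriptive:

```r
# Illustrative: point traces at an OTLP collector before loading mirai
Sys.setenv(
  OTEL_TRACES_EXPORTER = "http",
  OTEL_SERVICE_NAME = "my-mirai-app"
)

library(mirai)  # spans are then recorded automatically

daemons(2)
m <- mirai(Sys.sleep(1))
m[]
daemons(0)
```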

Illustrative OpenTelemetry span structure shown in a Jaeger collector UI

Reproducible parallel RNG

Introduced in mirai 2.4.1: reproducible parallel random number generation. Developed in consultation with our tidymodels colleagues and core members of the mlr team, this is a great example of the R community pulling together to solve a common problem. It addresses a long-standing challenge in parallel computing in R, important for reproducible science.

mirai has, since its early days, used L’Ecuyer-CMRG streams for statistically-sound parallel RNG. Streams cut into the RNG’s period (a very long sequence of pseudo-random numbers) at intervals so far apart that they do not in practice overlap. This ensures that statistical results obtained from parallel computations remain correct and valid.
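The mechanism can be seen in base R, where `parallel::nextRNGStream()` advances a L’Ecuyer-CMRG seed to the next widely-spaced stream:

```r
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

s1 <- .Random.seed                 # stream for daemon 1
s2 <- parallel::nextRNGStream(s1)  # far-apart stream for daemon 2
s3 <- parallel::nextRNGStream(s2)  # ...and daemon 3

# Each stream yields an independent, non-overlapping sequence
.Random.seed <- s2
rnorm(2)
```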

Previously, we only offered the following option, matching the behaviour of base R’s parallel package:

Default behaviour daemons(seed = NULL): creates independent streams for each daemon. This ensures statistical validity but not numerical reproducibility between runs.

Now, we also offer the following option:

Reproducible mode daemons(seed = integer): creates a stream for each mirai() call rather than each daemon. This guarantees identical results across runs, regardless of the number of daemons used.

# Always provides identical results:

with(
  daemons(3, seed = 1234L),
  mirai_map(1:3, rnorm, .args = list(mean = 20, sd = 2))[]
)
#> [[1]]
#> [1] 19.86409
#> 
#> [[2]]
#> [1] 19.55834 22.30159
#> 
#> [[3]]
#> [1] 20.62193 23.06144 19.61896

User interface improvements

Compute profile helper functions

with_daemons() and local_daemons() make working with compute profiles much more convenient by allowing contexts to be switched temporarily. Developers can continue to write mirai code without worrying about the resources on which it eventually runs, while end-users can now dynamically redirect any mirai computation using one of these scoped helpers.

# Work with specific compute profiles
with_daemons("gpu", {
  result <- mirai(gpu_intensive_task())
})

# Local version for use inside functions
async_gpu_intensive_task <- function() {
  local_daemons("gpu")
  mirai(gpu_intensive_task())
}

Re-designed daemons()

Creating new daemons is now more ergonomic, as it automatically resets existing ones. This provides for more convenient use in contexts such as notebooks, where cells may be run out of order. Manual daemons(0) calls are no longer required to reset daemons.

# Old approach
daemons(0)  # Had to reset first
daemons(4)

# New approach - automatic reset
daemons(4)  # Just works, resets if needed

New info() function

info() provides a more succinct alternative to status() for reporting key statistics. It is optimized for performance and is now a supported interface for programmatic use.

info()
#> connections  cumulative    awaiting   executing   completed 
#>           4           4           8           4           2

Acknowledgements

We extend our gratitude to the R community for their continued feedback and contributions. Special thanks to all contributors who helped shape this release through feature requests, bug reports, and code contributions: @agilly, @D3SL, @DavZim, @dipterix, @eliocamp, @erydit, @karangattu, @louisaslett, @mikkmart, @sebffischer, @shikokuchuo, and @wlandau.