fastbioclim: Scalable Derivation of Climate Variables

fastbioclim is an R package for efficiently deriving
standard bioclimatic and custom summary variables from large-scale
climate raster data. It is designed to overcome the memory limitations
of traditional approaches by switching automatically between processing
backends.

Working with large climate datasets often presents a major challenge:
the data are too large to fit into memory. fastbioclim addresses this
with a single, unified interface built on a dual-backend architecture:
- "terra": For datasets that fit comfortably in RAM, fastbioclim uses the
  highly optimized terra package for maximum speed.
- "tiled": For datasets that exceed available RAM, it automatically
  switches to a memory-safe tiled workflow. This backend uses
  exactextractr and Rfast to process the data chunk by chunk, so that
  even continent-scale analyses can run on a standard computer.

The core of the package is a pair of "smart" wrapper functions,
derive_bioclim() and derive_statistics(), that automatically select the
appropriate backend, so the same call works for small and large inputs.

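The selection logic itself is not spelled out in this README. As a rough mental model only, the choice can be pictured as comparing an estimated in-memory footprint against the available RAM. The sketch below uses plain base R; the function name `choose_backend`, the 8-bytes-per-value estimate, and the `safety` factor are all assumptions for illustration and are not part of the fastbioclim API.

```r
# Hypothetical helper, NOT part of fastbioclim: pick a backend label from a
# rough size estimate. Assumes every cell value is stored as an 8-byte double.
choose_backend <- function(n_cells, n_layers, free_bytes, safety = 0.5) {
  needed_bytes <- n_cells * n_layers * 8
  # Only go in-memory if the data fit within a fraction of the free RAM
  if (needed_bytes <= free_bytes * safety) "terra" else "tiled"
}

# 1,000 x 1,000 cells, 36 monthly layers (tmin, tmax, prcp), 8 GB free
small_case <- choose_backend(1e6, 36, free_bytes = 8e9)

# 40,000 x 20,000 cells with the same layers and the same free memory
large_case <- choose_backend(8e8, 36, free_bytes = 8e9)

print(c(small = small_case, large = large_case))
```

In practice the wrappers make this decision for you; the sketch is only meant to show why the same call can end up on different code paths.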
- auto Method: Automatically detects data size and chooses the
  appropriate processing workflow (in-memory vs. tiled).
- Use the derive_statistics() function to compute custom summaries
  (mean, max, min, standard deviation, etc.) for any climate variable.

You can install the development version of fastbioclim from GitHub with:

# install.packages("remotes")
remotes::install_github("gepinillab/fastbioclim")
# Install to get the package example data
remotes::install_github("gepinillab/egdata.fastbioclim")

The package provides two primary, user-facing functions:

- derive_bioclim(): For calculating the standard set of 19-35
  bioclimatic variables.
- derive_statistics(): For calculating custom summary statistics on any
  variable.

This example demonstrates the core functionality using the small Ecuador
example dataset shipped in the egdata.fastbioclim package.

library(fastbioclim)
library(terra)
library(future.apply)
library(progressr)
# Load the monthly example rasters for Ecuador
tmin_ecu <- system.file("extdata/ecuador/", package = "egdata.fastbioclim") |>
  list.files("tmin", full.names = TRUE) |> rast()
tmax_ecu <- system.file("extdata/ecuador/", package = "egdata.fastbioclim") |>
  list.files("tmax", full.names = TRUE) |> rast()
prcp_ecu <- system.file("extdata/ecuador/", package = "egdata.fastbioclim") |>
  list.files("prcp", full.names = TRUE) |> rast()
# The function will automatically use the fast "terra" method for this small dataset
output_dir_bioclim <- file.path(tempdir(), "bioclim_ecuador")

bioclim_vars <- derive_bioclim(
  bios = 1:19,
  tmin = tmin_ecu,
  tmax = tmax_ecu,
  prcp = prcp_ecu,
  output_dir = output_dir_bioclim,
  overwrite = TRUE
)
plot(bioclim_vars[[c("bio01", "bio12")]])
# Derive custom summary statistics for a different variable (e.g., wind speed)
wind_rast <- system.file("extdata/ecuador/", package = "egdata.fastbioclim") |>
  list.files("wind", full.names = TRUE) |> rast()
output_dir_custom <- file.path(tempdir(), "wind_ecuador")

custom_stats <- derive_statistics(
  variable = wind_rast,
  stats = c("mean", "max", "stdev"),
  prefix_variable = "wind",
  output_dir = output_dir_custom,
  overwrite = TRUE
)
plot(custom_stats)
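
To make the outputs above more concrete, the following base-R sketch computes the same kind of quantities by hand for a single grid cell. The monthly values are made up. It assumes the conventional definitions of bio01 (annual mean temperature, with each monthly mean approximated as the midpoint of tmin and tmax) and bio12 (annual total precipitation); whether fastbioclim's "stdev" is the sample or population standard deviation is not stated here, so the use of `sd()` is also an assumption.

```r
# Made-up monthly values for ONE grid cell (12 months)
tmin <- c(10, 10, 11, 12, 13, 14, 14, 14, 13, 12, 11, 10)
tmax <- c(20, 20, 21, 22, 23, 24, 24, 24, 23, 22, 21, 20)
prcp <- c(100, 80, 60, 40, 20, 10, 10, 20, 40, 60, 80, 100)

# Monthly mean temperature approximated as the tmin/tmax midpoint
tavg <- (tmin + tmax) / 2

bio01 <- mean(tavg)  # annual mean temperature
bio12 <- sum(prcp)   # annual precipitation

# Custom summaries of another variable, analogous to
# stats = c("mean", "max", "stdev") in derive_statistics()
wind <- c(2, 4, 4, 4, 5, 5, 7, 9, 6, 5, 4, 3)
wind_stats <- c(mean = mean(wind), max = max(wind), stdev = sd(wind))

print(c(bio01 = bio01, bio12 = bio12))
print(wind_stats)
```

The package applies summaries of this kind to every cell of the input rasters; the sketch only illustrates the per-cell arithmetic.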
fastbioclim is most useful with large datasets. The method = "auto"
setting in derive_bioclim() and derive_statistics() handles them
automatically: when the wrapper function detects that the input rasters
are too large to fit in memory, it switches to the tiled workflow.

Important Requirement: For the tiled workflow to function, your input
SpatRaster objects must point to files on disk, not be held entirely in
memory.

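One way to check this requirement before calling the wrappers is to inspect the raster with terra's `inMemory()` and `sources()`, and to write an in-memory raster to disk with `writeRaster()` if needed. The sketch below uses a small synthetic raster and a temporary file name chosen for illustration; it is a minimal example, not a step required by fastbioclim.

```r
library(terra)

# Small synthetic raster with 12 layers, created and filled in memory
r <- rast(nrows = 10, ncols = 10, nlyrs = 12)
values(r) <- runif(ncell(r) * nlyr(r))
in_mem_before <- all(inMemory(r))  # values are held in RAM

# Write to a GeoTIFF; writeRaster() returns a SpatRaster backed by the file
f <- file.path(tempdir(), "example_monthly.tif")
r_disk <- writeRaster(r, f, overwrite = TRUE)
in_mem_after <- any(inMemory(r_disk))

print(c(before = in_mem_before, after = in_mem_after))
print(sources(r_disk))  # path of the file behind the raster
```

Rasters created with rast() from file paths, as in the examples in this README, are already file-backed.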
# Conceptual example for large, file-based rasters
tmin_neo <- system.file("extdata/neotropics/", package = "egdata.fastbioclim") |>
  list.files("tmin", full.names = TRUE) |> rast()
tmax_neo <- system.file("extdata/neotropics/", package = "egdata.fastbioclim") |>
  list.files("tmax", full.names = TRUE) |> rast()
prcp_neo <- system.file("extdata/neotropics/", package = "egdata.fastbioclim") |>
  list.files("prcp", full.names = TRUE) |> rast()
output_dir_bios <- file.path(tempdir(), "bioclim_neotropics")

# Optional: activate progress bar
progressr::handlers(global = TRUE)
# Optional: define parallel plan for even faster processing
future::plan("multisession", workers = 4)

# The call is identical. `derive_bioclim` will detect the large file size
# and automatically use the memory-safe tiled method.
large_scale_vars <- derive_bioclim(
  bios = 1:19,
  tmin = tmin_neo,
  tmax = tmax_neo,
  prcp = prcp_neo,
  output_dir = output_dir_bios,
  tile_degrees = 20,
  overwrite = TRUE
)
print(large_scale_vars)
plot(large_scale_vars[["bio11"]])

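To give a feel for what a setting like tile_degrees = 20 implies, the base-R sketch below cuts a longitude/latitude bounding box into square tiles of a given side length. This is an illustration only: the helper `make_tiles`, the bounding box values, and the interpretation of tile_degrees as the tile side length in degrees are assumptions, and the package's actual tiling scheme may differ.

```r
# Hypothetical helper, NOT part of fastbioclim: split a lon/lat bounding
# box into square tiles, clipping the last row/column to the box edge.
make_tiles <- function(xmin, xmax, ymin, ymax, tile_degrees) {
  # Lower-left corners; the small offset avoids a zero-width final tile
  x0 <- seq(xmin, xmax - 1e-9, by = tile_degrees)
  y0 <- seq(ymin, ymax - 1e-9, by = tile_degrees)
  grid <- expand.grid(xmin = x0, ymin = y0)
  grid$xmax <- pmin(grid$xmin + tile_degrees, xmax)
  grid$ymax <- pmin(grid$ymin + tile_degrees, ymax)
  grid[, c("xmin", "xmax", "ymin", "ymax")]
}

# Made-up bounding box, roughly continental in size
tiles <- make_tiles(-120, -30, -60, 35, tile_degrees = 20)
print(nrow(tiles))
print(head(tiles, 3))
```

Smaller tiles lower the peak memory needed per chunk at the cost of more chunks to process, which is the usual trade-off when tuning a tile size.
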
This R package is currently under active development. While it is functional, it may contain bugs or undergo changes to the API.
Contributions, bug reports, and feature requests are highly encouraged. Please open an issue on our GitHub repository to provide feedback.