RAPIDS Introduces GPU Polars Streaming, Unified GNN API, and Zero-Code ML Speedups

RAPIDS, NVIDIA's collection of open-source libraries for GPU-accelerated Python data science, has released version 25.06. The headline additions are a Polars GPU streaming engine, a unified API for graph neural networks (GNNs), and zero-code-change acceleration for support vector machines (SVMs). Let's take a closer look at these new features.

The Polars GPU engine receives significant improvements in this release, the most notable being streaming execution. Streaming lets queries run on datasets larger than available GPU memory (VRAM) by partitioning the data and processing the partitions in parallel. To use the new streaming executor, users pass a suitably configured GPUEngine object to the Polars collect() call.
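The snippet below is a minimal sketch of that pattern. It assumes the streaming executor is selected by name through the GPUEngine configuration; the exact option names may differ between releases, so treat them as assumptions and check the cudf-polars documentation. The dataset path and column names are illustrative.

```python
import polars as pl

# Assumption: the streaming executor is selected via the GPUEngine config;
# consult the cudf-polars docs for the exact option names in your release.
engine = pl.GPUEngine(executor="streaming")

lazy = (
    pl.scan_parquet("events/*.parquet")   # hypothetical dataset, may exceed VRAM
    .filter(pl.col("status") == "ok")
    .group_by("user_id")
    .agg(pl.col("latency_ms").mean().alias("avg_latency_ms"))
)

# The streaming executor partitions the data and processes it in chunks
# instead of materializing the full dataset in GPU memory at once.
result = lazy.collect(engine=engine)
```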

The new streaming mode also lets users scale data-processing workflows across multiple GPUs, targeting analytics on datasets ranging from hundreds of gigabytes to terabytes. Operations that require data movement between partitions are handled by a new shuffle mechanism, while multi-GPU execution is orchestrated by the Dask distributed scheduler.
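A hedged sketch of the multi-GPU variant follows: it stands up a Dask-CUDA cluster with one worker per visible GPU and asks the streaming executor to use the distributed scheduler. The executor_options key shown here is an assumption based on the documented pattern; the dataset path and query are illustrative.

```python
import polars as pl
from dask_cuda import LocalCUDACluster   # one Dask worker per visible GPU
from distributed import Client

if __name__ == "__main__":
    # The Dask distributed scheduler coordinates partitions and the shuffle
    # mechanism across workers/GPUs.
    client = Client(LocalCUDACluster())

    # Assumption: the scheduler choice is passed through executor_options.
    engine = pl.GPUEngine(
        executor="streaming",
        executor_options={"scheduler": "distributed"},
    )

    result = (
        pl.scan_parquet("transactions/*.parquet")   # hypothetical multi-hundred-GB dataset
        .group_by("account_id")
        .agg(pl.col("amount").sum().alias("total_amount"))
        .collect(engine=engine)
    )
```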

Beyond streaming, the latest release of the Polars GPU engine adds support for rolling aggregations and additional column manipulations. Users can now perform .rolling() operations, creating rolling groups keyed on another column of the DataFrame, which is especially useful for time-series datasets.
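As a minimal illustration with made-up data, the example below builds a one-hour, minute-level series and computes a 10-minute rolling mean keyed on the timestamp column, collecting the result with the GPU engine.

```python
import polars as pl
from datetime import datetime

df = pl.DataFrame({
    "ts": pl.datetime_range(
        datetime(2024, 1, 1), datetime(2024, 1, 1, 0, 59), interval="1m", eager=True
    ),
    "price": [float(i) for i in range(60)],
})

# Rolling groups keyed on the "ts" column: a 10-minute window ending at each row.
result = (
    df.lazy()
    .rolling(index_column="ts", period="10m")
    .agg(pl.col("price").mean().alias("price_10m_mean"))
    .collect(engine=pl.GPUEngine())
)
print(result.head())
```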

Furthermore, the GPU engine now handles a wider range of expressions for manipulating datetime columns, including .strftime() and .cast_time_unit(), with more planned for future releases to expand overall API coverage.
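The short example below exercises both expressions through the GPU engine; raise_on_fail=True asks the engine to raise an error instead of silently falling back to the CPU, which makes it easy to confirm the query really ran on the GPU. The sample timestamps are illustrative.

```python
import polars as pl
from datetime import datetime

ldf = pl.LazyFrame({
    "ts": [datetime(2024, 1, 1, 12, 30, 15), datetime(2024, 6, 15, 8, 45, 59)],
})

result = ldf.select(
    pl.col("ts").dt.strftime("%Y-%m-%d %H:%M").alias("formatted"),  # datetime -> string
    pl.col("ts").dt.cast_time_unit("ms").alias("ts_ms"),            # ns -> ms precision
).collect(engine=pl.GPUEngine(raise_on_fail=True))
```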

NVIDIA cuGraph-PyG now integrates WholeGraph for accelerated feature fetching, resulting in a unified API for GNN training. With it, users can adopt WholeGraph's accelerated feature storage in single-GPU workflows without having to modify their scripts for multi-GPU or multi-node execution: the same GNN training script used for prototyping on a single GPU runs unchanged on a single node with multiple GPUs and across multiple nodes.

cuML, meanwhile, extends its zero-code-change interface to support vector machines. Both the classification (SVC) and regression (SVR) estimators can see significant speedups when executed on the GPU, so existing scikit-learn workflows that use SVMs can now be accelerated without any modifications.
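A minimal sketch of the zero-code-change path is shown below: the scikit-learn script itself is left untouched, and acceleration comes from launching it through the cuml.accel module (or loading the cuml.accel extension in a notebook). The synthetic dataset and file name are illustrative.

```python
# svm_example.py -- unmodified scikit-learn code. Run it as
#   python -m cuml.accel svm_example.py
# (or use `%load_ext cuml.accel` in a notebook) so that cuML's zero-code-change
# layer dispatches SVC to its GPU implementation when possible.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print("train accuracy:", clf.score(X, y))
```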

The release also improves scikit-learn compatibility and upgrades cuML's Random Forest estimators to use the Forest Inference Library (FIL). The integration delivers higher inference performance and better memory management while maintaining backward compatibility, although users should note that some API knobs from the previous implementation are now deprecated.
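For context, a typical cuML random forest workflow looks like the sketch below; in recent releases the prediction path is served by FIL under the hood, with no change to the estimator-level code. The synthetic data and hyperparameters are illustrative.

```python
import cupy as cp
from cuml.ensemble import RandomForestClassifier

# Synthetic data kept on the GPU as CuPy arrays.
X = cp.random.random((50_000, 16), dtype=cp.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(cp.int32)

rf = RandomForestClassifier(n_estimators=100, max_depth=12, random_state=0)
rf.fit(X, y)

# Inference on cuML forests is backed by FIL in recent releases.
preds = rf.predict(X)
print("accuracy:", float((preds == y).mean()))
```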

Together, these features in RAPIDS 25.06 advance GPU acceleration and data-processing capabilities for Python data scientists and machine-learning practitioners. More details and examples are available in the latest RAPIDS documentation.