Scale your most complex AI and Python workloads with Ray, a simple yet powerful Parallel and distributed computing framework.
Can you imagine the pain of training complex machine learning models that take days or even months depending on the amount of data you have? What if you can train those models within minutes to a maximum of a few hours? Impressive, right? Who does not want that?
But the question is how?
This is where Python Ray comes to your rescue and helps you train models with great efficiency. Ray is a superb tool for effective distributed Python to speed up data processing and Machine Learning workflows. It leverages several CPUs and machines that process the code parallelly and process all the data at lightening fast speed.
This comprehensive Python Ray guide will help you understand its potential usage and how it can help ML platforms to work efficiently.
Let’s get you started.
What is Ray?
Ray is an open-source framework designed to scale AI and Python applications, including machine learning. It simplifies the process of parallel processing, eliminating the need for expertise in distributed systems. Ray gained immense popularity in quick time.
Do you know that top companies are leveraging Ray? Prominent companies such as Uber, Shopify, and Instacart utilize Ray.
Spotify Leveraging Ray
Ray helps Spotify’s data scientists and engineers access a wide range of Python-based libraries to manage their ML workload.
Image Credit: Anyscale
Understanding Ray Architecture
- The head node in a Ray cluster has additional components compared to worker nodes.
- The Global Control Store (GCS) stores cluster-wide information, including object tables, task tables, function tables, and event logs. It is used for web UI, error diagnostics, debugging, and profiling tools.
- The Autoscaler is responsible for launching and terminating worker nodes to ensure sufficient resources for workloads while minimizing idle resources.
- The head node serves as a master that manages the entire cluster through the Autoscaler. However, the head node is a single point of failure. If it is lost, the cluster needs to be re-created, and existing worker nodes may become orphans and require manual removal.
- Each Ray node contains a Raylet, which consists of two main components: the Object Store and the Scheduler.
- The Object Store connects all object stores together, similar to a distributed cache like Memcached.
- The Scheduler within each Ray node functions as a local scheduler that communicates with other nodes, creating a unified distributed scheduler for the cluster.
In a Ray cluster, nodes refer to logical nodes based on Docker images rather than physical machines. A physical machine can run one or more logical nodes when mapping to the physical infrastructure.
Ray Framework
It is possible with the help of the following low-level and high-level layers. Ray framework lets you scale AI and Python apps. It comes with a core distributed runtime and set of libraries (Ray AIR) that simplifies ML computations.
Image Credits: Ray
- Scale ML workloads (Ray AI Runtime)- Ray provides ready-to-use libraries for common machine learning tasks such as data preprocessing, distributed training, hyperparameter tuning, reinforcement learning, and model serving.
- Build Distributing Apps (Ray Core)- It offers user-friendly tools for parallelizing and scaling Python applications, making it easy to distribute workloads across multiple nodes and GPUs.
- Deploy large-scale workloads (Ray Cluster)- Ray clusters consist of multiple worker nodes that are connected to a central Ray head node. These clusters can be configured to have a fixed size or can dynamically scale up or down based on the resource requirements of the applications running on the cluster. Ray seamlessly integrates with existing tools and infrastructure like Kubernetes, AWS, GCP, and Azure, enabling the smooth deployment of Ray clusters.
Ray and Data Science Workflow and Libraries
The concept of “data science” has evolved in recent years and can have different definitions. In simple terms, data science is about using data to gain insights and create practical applications. If we consider ML, then it involves a series of steps.
Data Processing–
Preparing the data for machine learning, if applicable. This step involves selecting and transforming the data to make it compatible with the machine learning model. Reliable tools can assist with this process.
Model Training-
Training machine learning algorithms using the processed data. Choosing the right algorithm for the task is crucial. Having a range of algorithm options can be beneficial.
Hyperparameter Tuning–
Fine-tuning parameters and hyperparameters during the model training process to optimize performance. Proper adjustment of these settings can significantly impact the effectiveness of the final model. Tools are available to assist with this optimization process.
Model Serving–
Deploying trained models to make them accessible for users who need them. This step involves making the models available through various means, such as using HTTP servers or specialized software packages designed for serving machine learning models.
Ray has developed specialized libraries for each of the four machine-learning steps mentioned earlier. These libraries are designed to work seamlessly with Ray and include the following.
Ray Datasets-
This library facilitates data processing tasks, allowing you to efficiently handle and manipulate datasets. It supports different file formats and store data as blocks rather than a single block. Best used for data processing transformation.
Run the following command to install this library.
pip install ‘ray[data]’
Ray Train-
Designed for distributed model training, this library enables you to train your machine-learning models across multiple nodes, improving efficiency and speed. Best used for model training.
Image Credits: Projectpro
Run the following command to install this library.
pip install ‘ray[train]’
Ray RLlib–
Specifically built for reinforcement learning workloads, this library provides tools and algorithms to develop and train RL models.
Ray Tune–
If you’re looking to optimize your model’s performance, Ray Tune is the library for efficient hyperparameter tuning. It helps you find the best combination of parameters to enhance your model’s accuracy.
Ray tune can parallelize and leverage multiple cores of GPU and multiple CPU cores. It optimizes the hyperparameter tuning cost by providing optimization algorithms. Best used for Model hyperparameter tuning.
Run the following command to install this library.
pip install ‘ray[tune]’
Ray Serve–
Once your models are trained, Ray Serve comes into play. It allows you to easily serve your models, making them accessible for predictions or other applications.
Run the following command to install this library.
pip install ‘ray[serve]’
Ray benefits Data Engineers and Scientists
Ray has made it easier for data scientists and machine learning practitioners to scale apps without having in-depth knowledge of infrastructure. It helps them in
- Parallelizing and distributing workloads- You can efficiently distribute your tasks across multiple nodes and GPUs, maximizing the utilization of computational resources.
- Easy access to cloud computing resources- Ray simplifies the configuration and utilization of cloud-based computing power, ensuring quick and convenient access.
- Native and extensible integrations- Ray seamlessly integrates with the machine learning ecosystem, providing you with a wide range of compatible tools and options for customization.
For distributed systems engineers, Ray handles critical processes automatically, including-
- Orchestration- Ray manages the various components of a distributed system, ensuring they work together seamlessly.
- Scheduling- It coordinates the execution of tasks, determining when and where they should be performed.
- Fault tolerance- Ray ensures that tasks are completed successfully, even in the face of failures or errors.
- Auto-scaling- It adjusts the allocation of resources based on dynamic demand, optimizing performance and efficiency.
In simple terms, Ray empowers data scientists and machine learning practitioners to scale their work without needing deep infrastructure knowledge, while offering distributed systems engineers automated management of crucial processes.
The Ray Ecosystem
Image Credits: Thenewstack
Ray’s universal framework acts as a bridge between the hardware you use (such as your laptop or a cloud service provider) and the programming libraries commonly used by data scientists. These libraries can include popular ones like PyTorch, Dask, Transformers (HuggingFace), XGBoost, or even Ray’s own built-in libraries like Ray Serve and Ray Tune.
Ray occupies a distinct position that addresses multiple problem areas.
The first problem Ray tackles is scaling Python code by efficiently managing resources such as servers, threads, or GPUs. It accomplishes this through essential components: a scheduler, distributed data storage, and an actor system. Ray’s scheduler is versatile and capable of handling not only traditional scalability challenges but also simple workflows. The actor system in Ray provides a straightforward method for managing a resilient distributed execution state. By combining these features, Ray operates as a responsive system, where its various components can adapt and respond to the surrounding environment.
Reasons Top Companies Are Looking For Python Ray
Below are significant reasons why companies working on ML platforms are using Ray.
A powerful tool supporting Distributed Computing Efficiently
With Ray, developers can easily define their app’s logic in Python. Ray’s flexibility lies in its support for both stateless computations (Tasks) and stateful computations (Actors). A shared Object Store simplifies inter-node communication.
You may like to know: Ruby Vs Python: Which One to Embrace in 2024?
This allows Ray to implement distributed patterns that are way beyond the concept of simple data parallelism, which involves running the same function on different parts of a dataset simultaneously. In case of the machine learning applications, Ray supports more complex patterns.
Image Credits: Anyscale
These capabilities allow developers to tackle a wide range of distributed computing challenges in machine learning applications using Ray.
An example that demonstrates the flexibility of Ray is the project called Alpa, developed by researchers from Google, AWS, UC Berkeley, Duke, and CMU for simplifying large deep-learning model training.
Sometimes a large model cannot fit on the same device like a GPU, this type of scaling requires partitioning a computation graph across multiple devices distributed on different servers. These devices perform different types of computations. This parallelism involves two types: inter-operator parallelism (assigning different operators to different devices) and intra-operator parallelism (splitting the same operator across multiple devices).
Image Credits: Anyscale
Alpa brings together different ways of doing multiple tasks at once by figuring out and doing the best ways to split up and do things both within and between steps. It does this automatically for really big deep-learning models that need lots of computing power.
To make all this work smoothly, the creators of Alpa picked Ray as the tool for spreading out the work across many computers. They went with Ray because of its capability to handle different ways of doing things at once and make sure the right tasks are done on the right computers. Ray is the perfect fit for Alpa because it helps it run big and complex deep-learning models efficiently and effectively across many computers.
Few lines of code for complex deployments
Ray Serve, also known as “Serve,” is a library designed to enable scalable model inference. It facilitates complex deployment scenarios including deploying multiple models simultaneously. This capability is becoming increasingly crucial as machine learning models are integrated into different apps and systems.
With Ray Serve, you can orchestrate multiple Ray actors, each responsible for providing inference for different models. It offers support for both batch inference, where predictions are made for multiple inputs at once, and online inference, where predictions are made in real time.
Ray Serve is capable of scaling to handle thousands of models in production, making it a reliable solution for large-scale inference deployments. It simplifies the process of deploying and managing models, allowing organizations to efficiently serve predictions for a wide range of applications and systems.
Efficiently scaling Diverse Workload
Ray’s scalability is a notable characteristic that brings significant benefits to organizations. A prime example is Instacart, which leverages Ray to drive its ML pipeline for large-scale completion. Ray empowers Instacart’s ML modelers by providing a user-friendly, efficient, and productive environment to harness the capabilities of expansive clusters.
With Ray, Instacart’s modelers can tap into the immense computational resources offered by large clusters effortlessly. Ray considers the entire cluster as a single pool of resources and handles the optimal mapping of computing tasks and actors to this pool. As a result, Ray effectively removes non-scalable elements from the system, such as rigidly partitioned task queues prevalent in Instacart’s legacy architecture.
By utilizing Ray, Instacart’s modelers can focus on running models on extensive datasets without needing to dive into the intricate details of managing computations across numerous machines. Ray simplifies the process, enabling them to scale their ML workflows seamlessly while handling the complexities behind the scenes.
Another biggest example is OpenAI.
Scaling Complex Computations
Ray is not only useful for distributed training, but it also appeals to users because it can handle various types of computations that are important for machine learning applications.
- Graph Computations: Ray has proven to be effective in large-scale graph computations. Companies like Bytedance and Ant Group have used Ray for projects involving knowledge graphs in different industries.
- Reinforcement Learning: Ray is widely used for reinforcement learning tasks in various domains such as recommender systems, industrial applications, and gaming, among others.
- Processing New Data Types: Ray is utilized by several companies to create customized tools for processing and managing new types of data, including images, video, and text. While existing data processing tools mostly focus on structured or semi-structured data, there is an increasing need for efficient solutions to handle unstructured data like text, images, video, and audio.
Supporting Heterogeneous Hardware
As machine learning (ML) and data processing tasks continue to grow rapidly, and the advancements in computer hardware are slowing down, hardware manufacturers are introducing more specialized hardware accelerators. This means that when we want to scale up our workloads, we need to develop distributed applications that can work with different types of hardware.
One of the great features of Ray is its ability to seamlessly support different hardware types. Developers can specify the hardware requirements for each task or actor they create. For example, they can say that one task needs 1 CPU, while an actor needs 2 CPUs and 1 Nvidia A100 GPU, all within the same application.
Uber provides an example of how this works in practice. They improved their deep learning pipeline’s performance by 50% by using a combination of 8 GPU nodes and 9 CPU nodes with various hardware configurations, compared to their previous setup that used 16 GPU nodes. This not only made their pipeline more efficient but also resulted in significant cost savings.
Image Credits: Anyscale
Use Cases of Ray
Below is the list of popular use cases of Ray for scaling machine learning.
Batch Interface
Batch inference involves making predictions with a machine learning model on a large amount of input data all at once. Ray for batch inference is compatible with any cloud provider and machine learning framework. It is designed to be fast and cost-effective for modern deep-learning applications. Whether you are using a single machine or a large cluster, Ray can scale your batch inference tasks with minimal code modifications. Ray is a Python-centric framework, making it simple to express and interactively develop your inference workloads.
Many Model Training
In machine learning scenarios like time series forecasting, it is often necessary to train multiple models on different subsets of the dataset. This approach is called “many model training.” Instead of training a single model on the entire dataset, many models are trained on smaller batches of data that correspond to different locations, products, or other factors.
When each individual model can fit on a single GPU, Ray can handle the training process efficiently. It assigns each training run to a separate task in Ray. This means that all the available workers can be utilized to run independent training sessions simultaneously, rather than having one worker process the jobs sequentially. This parallel approach helps to speed up the training process and make the most of the available computing resources.
Below is the data parallelism pattern for distributed training on large and complex datasets.
Image Credits: Ray
Model Serving
Ray Serve is a great tool for combining multiple machine-learning models and business logic to create a sophisticated inference service. You can use Python code to build this service, which makes it flexible and easy to work with.
Ray Serve supports advanced deployment patterns where you need to coordinate multiple Ray actors. These actors are responsible for performing inference on different models. Whether you need to handle batch processing or real-time inference, Ray Serve has got you covered. It is designed to handle large-scale production environments with thousands of models.
In simpler terms, Ray Serve allows you to create a powerful service that combines multiple machine-learning models and other code in Python. It can handle various types of inference tasks, and you can scale it to handle a large number of models in a production environment.
Hyperparameter Tuning
The Ray Tune library allows you to apply hyperparameter tuning algorithms to any parallel workload in Ray.
Hyperparameter tuning often involves running multiple experiments, and each experiment can be treated as an independent task. This makes it a suitable scenario for distributed computing. Ray Tune simplifies the process of distributing the optimization of hyperparameters across multiple resources. It provides useful features like saving the best results, optimizing the scheduling of experiments, and specifying different search patterns.
In simpler terms, Ray Tune helps you optimize the parameters of your machine-learning models by running multiple experiments in parallel. It takes care of distributing the workload efficiently and offers helpful features like saving the best results and managing the experiment schedule.
Distributed Training
The Ray Train library brings together various distributed training frameworks into a unified Trainer API, making it easier to manage and coordinate distributed training.
When it comes to training many models, a technique called model parallelism is used. It involves dividing a large model into smaller parts and training them on different machines simultaneously. Ray Train simplifies this process by providing convenient tools for distributing these model shards across multiple machines and running the training process in parallel.
Reinforcement Learning
RLlib is a free and open-source library designed for reinforcement learning (RL). It is specifically built to handle large-scale RL workloads in production environments. RLlib provides a unified and straightforward interface that can be used across a wide range of industries.
Many leading companies in various fields, such as climate control, industrial control, manufacturing and logistics, finance, gaming, automobile, robotics, boat design, and more, rely on RLlib for their RL applications. RLlib’s versatility makes it a popular choice for implementing RL algorithms in different domains.
In simpler terms, the Ray Train library makes it simple to manage distributed training by combining different frameworks into one easy-to-use interface. It also supports training multiple models at once by dividing the models into smaller parts and training them simultaneously on different machines.
Experience Blazing-fast Python Distributed Computing with Ray
Ray’s powerful capabilities in distributed computing and parallelization revolutionize the way applications are built. With Ray, you can leverage the speed and scalability of distributed computing to develop high-performance Python applications with ease.
OnGraph, a leading technology company, brings its expertise and dedication to help you make the most of Ray’s potential. OnGraph enables you to develop cutting-edge applications that deliver unparalleled performance and user experiences.
With OnGraph, you can confidently embark on a journey toward creating transformative applications that shape the future of technology.