Scalable Reinforcement Learning in Production

Finding success with reinforcement learning (RL) is not easy. RL tooling has not historically kept pace with the demands and constraints of the teams that want to use it. Even with ready-made frameworks, projects commonly fail on the way to production because of framework rigidity, slow performance, limited ecosystems, and operational overhead.



Anyscale helps you go beyond existing reinforcement learning limitations with Ray and RLlib, an open source, easy-to-use, distributed reinforcement learning library for Python that:

Includes over 25 state-of-the-art algorithms that can run with both TensorFlow and PyTorch

Covers subcategories including model-based, model-free, and offline RL

Supports multi-agent learning in almost all algorithms


Can handle complex, heterogeneous applications

Develop on your laptop and then scale the same Python code elastically across hundreds of nodes or GPUs on any cloud — with no changes. 




RLlib Is the Best Way to Do Reinforcement Learning

An Integrated Ecosystem

Existing RL solutions force developers to switch frameworks or awkwardly glue RL systems to other tools. Avoid that with the Ray ecosystem: find the right set of hyperparameters with Ray Tune, or serve your trained model in a massively parallel way with Ray Serve.
Effortlessly scale every workload, from data loading to training, hyperparameter tuning, reinforcement learning, and model serving. Learn more about all capabilities in the Ray AI Runtime (AIR).
Organizations around the globe use Ray and Anyscale for solutions as diverse as recommendation systems, supply-chain logistics optimization, pricing optimization, virtual environment simulations, and more.

Speed & Efficiency

Experience fast training and policy evaluation with lower overhead than most other RL libraries.

Production Readiness

Iterate quickly without rewriting your code to move to production or scale out to a large cluster.

Distributed RL, Simplified

RLlib algorithm implementations (such as our “APPO” or “APEX”) allow you to run workloads on hundreds of CPUs, GPUs, or nodes in parallel to speed up learning.

Environments to Meet Your Needs

RLlib works with several types of environments, including OpenAI Gym, user-defined, multi-agent, and batched environments.
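For the multi-agent case, a configuration sketch might look like this (the env name, policy ids, and mapping rule are hypothetical placeholders; assumes Ray 2.x and a multi-agent env registered under that name):

```python
# Multi-agent configuration sketch (illustrative only; "my_multi_agent_env",
# the policy ids, and the agent-id prefix convention are all assumptions).
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("my_multi_agent_env")
    .multi_agent(
        policies={"defender", "attacker"},  # two separately trained policies
        policy_mapping_fn=lambda agent_id, *args, **kwargs: (
            "defender" if agent_id.startswith("def") else "attacker"
        ),
    )
)
```

Each agent id emitted by the environment is routed to one of the named policies by the mapping function, so agents can share a policy or train their own.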

External Simulators

RLlib supports an external environment API and comes with a pluggable, off-the-shelf client/server setup to run hundreds of independent simulators on the “outside,” connecting to a central RLlib policy server that learns and serves actions.

Offline RL and Imitation Learning / Behavior Cloning

RLlib comes with several offline RL algorithms (e.g., CQL, MARWIL, and DQfD), allowing you to either purely behavior-clone your existing system or learn how to further improve upon it.
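A configuration sketch for the behavior-cloning end of that spectrum, using MARWIL (the input path is a hypothetical placeholder; assumes Ray 2.x and experiences recorded in RLlib's JSON offline format):

```python
# Offline RL / behavior cloning sketch with MARWIL (illustrative only;
# "/data/historic_rollouts" is an assumed path to recorded experiences).
from ray.rllib.algorithms.marwil import MARWILConfig

config = (
    MARWILConfig()
    .environment("CartPole-v1")
    .offline_data(input_="/data/historic_rollouts")  # previously recorded rollouts
    .training(beta=0.0)  # beta=0.0: pure behavior cloning; beta>0: weight by advantage
)
```

The `beta` knob is what lets one algorithm cover both ends: clone the existing system exactly, or upweight the actions that worked better than average.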

Unmatched Algorithm Selection

With more than double the number of algorithms of any other library, RLlib lets teams quickly iterate on and test SOTA algorithms, so you can get to the best option faster without having to build and maintain your own implementations.

TESTIMONIALS

What Users are Saying About Ray and Anyscale

Emiliano Castro | Principal Data Scientist

"Ray and Anyscale have enabled us to quickly develop, test and deploy a new in-game offer recommendation engine based on reinforcement learning, and subsequently serve those offers 3X faster in production. This resulted in revenue lift and a better gaming experience."

Greg Brockman | Co-founder, Chairman, and President, OpenAI

"At OpenAI, we are tackling some of the world’s most complex and demanding computational problems. Ray powers our solutions to the thorniest of these problems and allows us to iterate at scale much faster than we could before. As an example, we use Ray to train our largest models, including ChatGPT."

Trusted by Leading Organizations

Leading organizations today are already using reinforcement learning to create next-gen recommendation systems, build better gaming experiences, optimize industrial environments, and more, thanks to RLlib and Anyscale.

Iterate and move to production fast with RLlib and Anyscale

© Anyscale, Inc 2023