Metadata-Version: 2.1
Name: sbx-rl
Version: 0.10.0
Summary: Jax version of Stable Baselines, implementations of reinforcement learning algorithms.
Home-page: https://github.com/araffin/sbx
Author: Antonin Raffin
Author-email: antonin.raffin@dlr.de
License: MIT
Keywords: reinforcement-learning-algorithms reinforcement-learning machine-learning gym openai stable baselines toolbox python data-science
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: stable_baselines3>=2.3.0a1
Requires-Dist: jax
Requires-Dist: jaxlib
Requires-Dist: flax
Requires-Dist: optax; python_version >= "3.9.0"
Requires-Dist: optax<0.1.8; python_version < "3.9.0"
Requires-Dist: tqdm
Requires-Dist: rich
Requires-Dist: tensorflow_probability
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Requires-Dist: pytest-env; extra == "tests"
Requires-Dist: pytest-xdist; extra == "tests"
Requires-Dist: mypy; extra == "tests"
Requires-Dist: ruff; extra == "tests"
Requires-Dist: black; extra == "tests"



# Stable Baselines Jax (SB3 + JAX = SBX)

See https://github.com/araffin/sbx

Proof of concept version of [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) in Jax.

Implemented algorithms:
- [Soft Actor-Critic (SAC)](https://arxiv.org/abs/1801.01290) and [SAC-N](https://arxiv.org/abs/2110.01548)
- [Truncated Quantile Critics (TQC)](https://arxiv.org/abs/2005.04269)
- [Dropout Q-Functions for Doubly Efficient Reinforcement Learning (DroQ)](https://openreview.net/forum?id=xCVJMsPv3RT)
- [Proximal Policy Optimization (PPO)](https://arxiv.org/abs/1707.06347)
- [Deep Q Network (DQN)](https://arxiv.org/abs/1312.5602)
- [Twin Delayed DDPG (TD3)](https://arxiv.org/abs/1802.09477)
- [Deep Deterministic Policy Gradient (DDPG)](https://arxiv.org/abs/1509.02971)

## Example

```python
from sbx import TQC, DroQ, SAC, DQN, PPO, TD3, DDPG

model = TQC("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=10_000, progress_bar=True)

