Metadata-Version: 2.1
Name: pyforce-rl
Version: 0.0.10
Summary: PyForce - A simple reinforcement learning library
Home-page: UNKNOWN
Author: Ole Meyer
Author-email: dev@olemeyer.com
License: UNKNOWN
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: tqdm
Requires-Dist: gym
Requires-Dist: numpy
Requires-Dist: torch


<p  align="center"><img  src="https://docs.google.com/drawings/d/e/2PACX-1vQTqWkqIzSlT3zldytD8L0kj6MZVpE_5ZslrDAvMhLEG-anipK2UPJuHm3ImGhVVTVYiZrsbTlKf3Yo/pub?w=756&h=265"  height="200px"  /></p>



[![Status](https://img.shields.io/badge/status-active-success.svg)]()











---









## 🧐 About <a name = "about"></a>

A simple and modular reinforcement learning library based on PyTorch.


## 🏁 Getting Started <a name = "getting_started"></a>





    pip install pyforce-rl




## 🎈 Usage <a name="usage"></a>



```python
from pyforce.env import wrap_openai_gym
from pyforce.nn import default_network_components
from pyforce.agents import PPOAgent
import gym
import torch

device="cuda:0" if torch.cuda.is_available() else "cpu"

env=wrap_openai_gym(gym.make("LunarLanderContinuous-v2"))

observation_processor,hidden_layers,action_mapper=default_network_components(env)

agent=PPOAgent(
    observation_processor,
    hidden_layers,
    action_mapper,
    save_path="./evals/ppo_example",
    value_lr=5e-4,
    policy_lr=5e-4
).to(device)

agent.train(
    env,
    episodes=1000,
    train_freq=2048,
    eval_freq=50,
    render=True,
    batch_size=128,
    gamma=.99,
    tau=.95,
    clip=.2,
    n_steps=32,
    entropy_coef=.01
)
```
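
The `gamma` and `tau` arguments above are not documented in this README. A common convention in PPO implementations is that `gamma` is the discount factor and `tau` is the smoothing parameter of generalized advantage estimation (GAE). Assuming that reading holds here, the following dependency-free sketch shows how the two values would interact. The function `gae_advantages` is a hypothetical helper for illustration and is not part of PyForce.

```python
def gae_advantages(rewards, values, dones, last_value, gamma=0.99, tau=0.95):
    """Generalized advantage estimation over one rollout (illustrative only).

    Assumption: gamma is the discount factor and tau is the GAE smoothing
    parameter. This is NOT taken from the PyForce source.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0            # discounted sum of TD errors seen so far
    next_value = last_value  # value estimate of the state after step t

    # Walk the rollout backwards so each step can reuse the later result.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * tau * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae_advantages(
    rewards=[1.0, 1.0],
    values=[0.0, 0.0],
    dones=[False, True],
    last_value=0.0,
    gamma=0.5,
    tau=0.5,
)
print(adv)
```

Under this reading, a `tau` close to 1 weights long-horizon returns more heavily (higher variance), while a smaller `tau` leans on the value estimates (higher bias).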




## 🚀 Implement custom RL Agents <a name = "deployment"></a>



```python
from pyforce.agents.base import BaseAgent
from torch import nn

class MyAgent(BaseAgent):

    def __init__(self, observationprocessor, hiddenlayers, actionmapper, save_path=None):
        super().__init__(save_path)

        # Policy network: observation processor -> hidden layers -> action mapper
        self.policy_net = nn.Sequential(observationprocessor, hiddenlayers, actionmapper)
        self.value_net = ...

    def forward(self, state):
        return self.policy_net(state)

    def get_action(self, state, eval, args):
        # Return the action plus any additional information to be stored in the memory
        return self(state).sample(), {}

    def after_step(self, done, eval, args):
        if not eval:
            # Train every `train_freq` environment steps, once the memory holds data
            if self.env_steps % args["train_freq"] == 0 and len(self.memory) > 0:
                pass  # do training

        if done and eval:
            pass  # do evaluation
```
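
The training condition in `after_step` can be tried out in isolation. The sketch below mirrors the check `self.env_steps % args["train_freq"] == 0 and len(self.memory) > 0` using plain integers. The helpers `should_train` and `training_steps` are hypothetical names for illustration, and clearing the memory after each update is an assumption, not something this README states.

```python
def should_train(env_steps, memory_size, train_freq):
    """Same check as in after_step, written against plain integers."""
    return env_steps % train_freq == 0 and memory_size > 0


def training_steps(total_steps, train_freq):
    """Return the environment steps at which training would be triggered."""
    triggered = []
    memory_size = 0
    for env_steps in range(1, total_steps + 1):
        memory_size += 1  # one transition stored per environment step
        if should_train(env_steps, memory_size, train_freq):
            triggered.append(env_steps)
            memory_size = 0  # assumption: memory is cleared after an update
    return triggered


print(training_steps(10, 4))   # steps at which an update would run
print(should_train(0, 0, 4))   # step 0 is a multiple of 4, but memory is empty
```

The second check shows why the memory-size guard matters: without it, step 0 would satisfy the modulo test and trigger an update on an empty memory.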


## ⛏️ Built Using <a name = "built_using"></a>





-  [PyTorch](https://pytorch.org/) - ML Framework



-  [OpenAI Gym](https://gym.openai.com/) - Environment API



-  [NumPy](https://numpy.org/) - Numerical calculations outside PyTorch



<!--



## ✍️ Authors <a name = "authors"></a>





-  [@olemeyer](https://github.com/olemeyer)





See also the list of [contributors](https://github.com/kylelobo/The-Documentation-Compendium/contributors) who participated in this project.





## 🎉 Acknowledgements <a name = "acknowledgement"></a>





-  [Cherry-RL](http://cherry-rl.net/) & [Keras-RL](https://keras-rl.readthedocs.io/en/latest/) for Inspiration

-->


