TODO: Look into device management with wrappers

TODO: Fix episode_len_min and episode_reward_min to output total reward in the absence of finished episodes
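
A rough sketch of the intended fallback, for reference only. The helper name `min_or_running_total` and both of its arguments are hypothetical and do not exist in the codebase; it assumes the reward of the still-running episode is tracked separately from the list of finished-episode rewards.

```python
from typing import Sequence


def min_or_running_total(finished: Sequence[float], running_total: float) -> float:
    """Minimum over finished episodes, or the running total if none finished.

    `finished` holds the total reward of each completed episode (assumed).
    `running_total` is the reward accumulated so far in the episode that is
    still in progress (assumed).
    """
    if len(finished) > 0:
        return min(finished)
    # No episode has finished yet: report the partial total instead.
    return running_total
```

The same shape would presumably apply to episode_len_min, with step counts in place of rewards.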


TODO: Implement ugae that works for arbitrary episode lengths

TODO: Integrate Action/Observation abstractions with gym.spaces

TODO: Add a test for the trainer with tensorboard. Maybe a "full training" test?