ray.rllib.env.single_agent_episode.SingleAgentEpisode.add_env_step
- SingleAgentEpisode.add_env_step(observation: gymnasium.core.ObsType, action: gymnasium.core.ActType, reward: SupportsFloat, infos: Dict[str, Any] | None = None, *, terminated: bool = False, truncated: bool = False, extra_model_outputs: Dict[str, Any] | None = None) → None [source]
Adds results of an env.step() call (including the action) to this episode. This data consists of an observation and info dict, an action, a reward, terminated/truncated flags, and extra model outputs (e.g. action probabilities or RNN internal state outputs).
- Parameters:
  - observation – The next observation received from the environment after(!) taking action.
  - action – The last action used by the agent during the call to env.step().
  - reward – The last reward received by the agent after taking action.
  - infos – The last info dict received from the environment after taking action.
  - terminated – A boolean indicating whether the environment was terminated (after taking action).
  - truncated – A boolean indicating whether the environment was truncated (after taking action).
  - extra_model_outputs – The last timestep's model-specific outputs. These are normally outputs of an RLModule that were computed along with action, e.g. action_logp or action_dist_inputs.
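Below is a minimal usage sketch (not part of the API reference itself). It assumes a Gymnasium CartPole-v1 environment, uses random actions as a stand-in for an RLModule's forward pass, and relies on the companion method SingleAgentEpisode.add_env_reset to log the env.reset() result before the first add_env_step call.

```python
import gymnasium as gym

from ray.rllib.env.single_agent_episode import SingleAgentEpisode

env = gym.make("CartPole-v1")
episode = SingleAgentEpisode()

# Log the initial observation and infos from env.reset() first; the episode
# expects the reset observation to be recorded before any env steps.
obs, infos = env.reset()
episode.add_env_reset(observation=obs, infos=infos)

terminated = truncated = False
while not (terminated or truncated):
    # Random action as a stand-in for an RLModule's computed action.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, infos = env.step(action)
    # Record the full env.step() result, plus the action that produced it.
    episode.add_env_step(
        observation=obs,
        action=action,
        reward=reward,
        infos=infos,
        terminated=terminated,
        truncated=truncated,
        # extra_model_outputs could carry e.g. {"action_logp": ...} when
        # the action comes from an RLModule.
    )
```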