ray.rllib.env.single_agent_episode.SingleAgentEpisode.add_env_step

SingleAgentEpisode.add_env_step(observation: gymnasium.core.ObsType, action: gymnasium.core.ActType, reward: SupportsFloat, infos: Dict[str, Any] | None = None, *, terminated: bool = False, truncated: bool = False, extra_model_outputs: Dict[str, Any] | None = None) -> None

Adds results of an env.step() call (including the action) to this episode.

This data consists of an observation and info dict, an action, a reward, terminated/truncated flags, and extra model outputs (e.g. action probabilities or RNN internal state outputs).

Parameters:
  • observation – The next observation received from the environment after(!) taking action.

  • action – The last action sent to the environment via the env.step() call.

  • reward – The last reward received by the agent after taking action.

  • infos – The last info dict received from the environment after taking action.

  • terminated – A boolean indicating whether the episode has been terminated (after taking action).

  • truncated – A boolean indicating whether the episode has been truncated (after taking action).

  • extra_model_outputs – Model outputs specific to the last timestep. These are normally outputs of an RLModule that were computed together with action, e.g. action_logp or action_dist_inputs.
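
For illustration, a minimal usage sketch with a Gymnasium environment. The CartPole env and the randomly sampled action are placeholders (in practice the action would come from an RLModule); the episode is assumed to have been started via the companion add_env_reset() method:

    import gymnasium as gym
    from ray.rllib.env.single_agent_episode import SingleAgentEpisode

    env = gym.make("CartPole-v1")
    episode = SingleAgentEpisode()

    # Log the reset data first, before any env steps are added.
    obs, infos = env.reset()
    episode.add_env_reset(observation=obs, infos=infos)

    # Step the env once and record the full transition in the episode.
    action = env.action_space.sample()  # placeholder for an RLModule's action
    obs, reward, terminated, truncated, infos = env.step(action)
    episode.add_env_step(
        observation=obs,
        action=action,
        reward=reward,
        infos=infos,
        terminated=terminated,
        truncated=truncated,
    )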