ray.rllib.env.multi_agent_episode.MultiAgentEpisode.add_env_step#

MultiAgentEpisode.add_env_step(observations: Dict[Any, Any], actions: Dict[Any, Any], rewards: Dict[Any, Any], infos: Dict[Any, Any] | None = None, *, terminateds: Dict[Any, Any] | None = None, truncateds: Dict[Any, Any] | None = None, extra_model_outputs: Dict[Any, Any] | None = None) → None[source]#

Adds a timestep to the episode.

Parameters:
  • observations – A dictionary mapping agent IDs to their corresponding next observations. Note that some agents may not have stepped at this timestep.

  • actions – Mandatory. A dictionary mapping agent IDs to their corresponding actions. Note that some agents may not have stepped at this timestep.

  • rewards – Mandatory. A dictionary mapping agent IDs to their corresponding rewards. Note that some agents may not have stepped at this timestep.

  • infos – A dictionary mapping agent IDs to their corresponding info. Note that some agents may not have stepped at this timestep.

  • terminateds – A dictionary mapping agent IDs to their terminated flags, indicating whether the environment has been terminated for them. A special __all__ key indicates that the episode is terminated for all agent IDs.

  • truncateds – A dictionary mapping agent IDs to their truncated flags, indicating whether the environment has been truncated for them. A special __all__ key indicates that the episode is truncated for all agent IDs.

  • extra_model_outputs – A dictionary mapping agent IDs to their corresponding specific model outputs (also in a dictionary; e.g. vf_preds for PPO).
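The parameters above can be sketched with a minimal example. The agent IDs (`agent_0`, `agent_1`), observation/action values, and the `MultiAgentEpisode` construction shown in the comments are illustrative assumptions; only the argument shapes and the special `__all__` key follow from the signature above.

```python
# Hypothetical env step for a two-agent environment. Only agent_0 acted
# at this timestep; agent_1 did not step, so it is absent from the dicts.
observations = {"agent_0": 1.0, "agent_1": 2.0}  # next observations per agent
actions = {"agent_0": 0}                         # agents that actually stepped
rewards = {"agent_0": 0.5}
infos = {"agent_0": {}}
# Per-agent done flags; "__all__" applies to every agent ID at once.
terminateds = {"agent_0": False, "__all__": False}
truncateds = {"__all__": False}
extra_model_outputs = {"agent_0": {"vf_preds": 0.1}}  # e.g. value preds for PPO

# With Ray RLlib installed, the call would look roughly like (sketch only):
# from ray.rllib.env.multi_agent_episode import MultiAgentEpisode
# episode = MultiAgentEpisode()
# episode.add_env_step(
#     observations, actions, rewards, infos,
#     terminateds=terminateds, truncateds=truncateds,
#     extra_model_outputs=extra_model_outputs,
# )

# The "__all__" convention: the episode ends only if all agents are done.
episode_done = terminateds.get("__all__", False) or truncateds.get("__all__", False)
print(episode_done)
```

Note that `observations` may contain agents (here `agent_1`) that do not appear in `actions` or `rewards`, since not every agent steps at every timestep.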