ray.rllib.env.multi_agent_episode.MultiAgentEpisode.add_env_step
- MultiAgentEpisode.add_env_step(observations: Dict[Any, Any], actions: Dict[Any, Any], rewards: Dict[Any, Any], infos: Dict[Any, Any] | None = None, *, terminateds: Dict[Any, Any] | None = None, truncateds: Dict[Any, Any] | None = None, extra_model_outputs: Dict[Any, Any] | None = None) -> None
Adds a timestep to the episode.
- Parameters:
observations – A dictionary mapping agent IDs to their corresponding next observations. Note that some agents may not have stepped at this timestep.
actions – Mandatory. A dictionary mapping agent IDs to their corresponding actions. Note that some agents may not have stepped at this timestep.
rewards – Mandatory. A dictionary mapping agent IDs to their corresponding rewards. Note that some agents may not have stepped at this timestep.
infos – A dictionary mapping agent IDs to their corresponding info. Note that some agents may not have stepped at this timestep.
terminateds – A dictionary mapping agent IDs to their terminated flags, indicating whether the environment has been terminated for them. A special __all__ key indicates that the episode is terminated for all agent IDs.
truncateds – A dictionary mapping agent IDs to their truncated flags, indicating whether the environment has been truncated for them. A special __all__ key indicates that the episode is truncated for all agent IDs.
extra_model_outputs – A dictionary mapping agent IDs to their corresponding specific model outputs (also in a dictionary; e.g. vf_preds for PPO).
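
A minimal usage sketch under two assumptions: that a MultiAgentEpisode can be constructed without arguments, and that add_env_reset(observations=..., infos=...) records the reset observations, mirroring the single-agent episode API. The agent IDs, observation values, and the vf_preds entry are illustrative only.

```python
from ray.rllib.env.multi_agent_episode import MultiAgentEpisode

# Start a fresh episode and log the reset observations
# (add_env_reset is assumed to mirror the single-agent episode API).
episode = MultiAgentEpisode()
episode.add_env_reset(
    observations={"agent_0": 0, "agent_1": 0},
    infos={"agent_0": {}, "agent_1": {}},
)

# One env step in which only agent_0 acted; agent_1 is simply absent
# from all per-agent dicts for this timestep.
episode.add_env_step(
    observations={"agent_0": 1},
    actions={"agent_0": 0},
    rewards={"agent_0": 1.0},
    infos={"agent_0": {}},
    terminateds={"__all__": False},
    truncateds={"__all__": False},
    # Per-agent model outputs are themselves dicts, e.g. vf_preds for PPO.
    extra_model_outputs={"agent_0": {"vf_preds": 0.5}},
)
```

Note how the second call omits agent_1 entirely rather than passing placeholder values, matching the "some agents may not have stepped at this timestep" behavior described above.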