ray.rllib.env.multi_agent_episode.MultiAgentEpisode.get_extra_model_outputs
- MultiAgentEpisode.get_extra_model_outputs(key: str | None = None, indices: int | slice | List[int] | None = None, agent_ids: Collection[Any] | Any | None = None, *, env_steps: bool = True, neg_index_as_lookback: bool = False, fill: Any | None = None, return_list: bool = False) → Dict[Any, Any] | List[Dict[Any, Any]]
Returns agents’ extra model outputs or batched ranges thereof from this episode.
- Parameters:
key – The key within each agent’s extra_model_outputs dict to extract data for. If None, returns data for all extra model output keys.
indices – A single int is interpreted as an index from which to return the individual extra model outputs stored at that index. A list of ints is interpreted as a list of indices from which to gather individual extra model outputs in a batch of size len(indices). A slice object is interpreted as a range of extra model outputs to return. By default, negative indices are interpreted as “before the end”, unless neg_index_as_lookback=True is set, in which case they are interpreted as “before ts=0”, meaning going back into the lookback buffer. If None, returns all extra model outputs (from ts=0 to the end).
agent_ids – An optional collection of AgentIDs or a single AgentID to get extra model outputs for. If None, returns extra model outputs for all agents in this episode.
env_steps – Whether indices should be interpreted as environment time steps (True) or per-agent timesteps (False).
neg_index_as_lookback – If True, negative values in indices are interpreted as “before ts=0”, meaning going back into the lookback buffer. For example, an episode with agent A’s actions [4, 5, 6, 7, 8, 9], where [4, 5, 6] is the lookback buffer range (the ts=0 item is 7), will respond to get_actions(-1, agent_ids=[A], neg_index_as_lookback=True) with {A: 6} and to get_actions(slice(-2, 1), agent_ids=[A], neg_index_as_lookback=True) with {A: [5, 6, 7]}.
fill – An optional value used to fill up the returned results at the boundaries. Filling only happens if the start/stop boundaries of the requested index range exceed the episode’s boundaries (including the lookback buffer on the left side). This is useful if users don’t want to worry about reaching such boundaries and want to zero-pad instead. For example, an episode with agent A’s actions [10, 11, 12, 13, 14] and a lookback buffer size of 2 (meaning actions 10 and 11 are part of the lookback buffer) will respond to get_actions(slice(-7, -2), agent_ids=[A], fill=0.0) with {A: [0.0, 0.0, 10, 11, 12]}.
one_hot_discrete – If True, returns one-hot vectors (instead of int values) for those sub-components of a (possibly complex) observation space that are Discrete or MultiDiscrete. Note that if fill=0 and the requested indices are out of the range of the data, the returned one-hot vectors will actually be zero-hot (all slots zero).
return_list – Whether to return a list of multi-agent dicts (instead of a single multi-agent dict of lists/structs). False by default. This option can only be used when env_steps is True, because such a list can only be interpreted as one env step per list item (it would not work with agent steps).
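The index rules described for indices, neg_index_as_lookback, and fill can be sketched in plain Python. This is a minimal illustration for a single agent and does not call RLlib: the helper get_with_lookback and its arguments are hypothetical, the data is assumed to be a flat list whose first lookback items form the lookback buffer, and only step-1 slices are handled. It reproduces the two documented examples above.

```python
from typing import Any, List, Optional, Union


def _to_position(i: int, n_total: int, lookback: int, neg_as_lookback: bool) -> int:
    """Map a user-facing index to a position in the full (lookback + episode) list."""
    if i < 0 and not neg_as_lookback:
        # Default: negative indices count back from the end of the data.
        return n_total + i
    # Otherwise the index is relative to ts=0, which sits at position `lookback`.
    return lookback + i


def get_with_lookback(
    data: List[Any],
    lookback: int,
    indices: Union[int, slice, List[int], None] = None,
    *,
    neg_index_as_lookback: bool = False,
    fill: Optional[Any] = None,
) -> Any:
    """Hypothetical helper mimicking the documented indexing rules."""
    n = len(data)

    def pick(pos: int) -> Any:
        if 0 <= pos < n:
            return data[pos]
        if fill is None:
            raise IndexError(f"position {pos} is outside the stored data")
        return fill  # Pad out-of-range positions with the fill value.

    if indices is None:
        # Everything from ts=0 to the end (lookback buffer excluded).
        return list(data[lookback:])
    if isinstance(indices, int):
        return pick(_to_position(indices, n, lookback, neg_index_as_lookback))
    if isinstance(indices, slice):
        if indices.step not in (None, 1):
            raise ValueError("this sketch only supports step-1 slices")
        start = lookback if indices.start is None else _to_position(
            indices.start, n, lookback, neg_index_as_lookback)
        stop = n if indices.stop is None else _to_position(
            indices.stop, n, lookback, neg_index_as_lookback)
        if fill is None:
            # Assumption of this sketch: without fill, clip to the stored data.
            start, stop = max(start, 0), min(stop, n)
        return [pick(p) for p in range(start, stop)]
    # A list of ints gathers one item per index.
    return [pick(_to_position(i, n, lookback, neg_index_as_lookback)) for i in indices]


# First documented example: [4, 5, 6] is the lookback buffer, ts=0 item is 7.
a = [4, 5, 6, 7, 8, 9]
print(get_with_lookback(a, 3, -1, neg_index_as_lookback=True))
print(get_with_lookback(a, 3, slice(-2, 1), neg_index_as_lookback=True))

# Second documented example: lookback size 2, zero-padding on the left.
b = [10, 11, 12, 13, 14]
print(get_with_lookback(b, 2, slice(-7, -2), fill=0.0))
```

In this sketch the only difference between the two modes is the reference point for negative indices: the end of the data by default, or ts=0 when neg_index_as_lookback=True.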
- Returns:
A dictionary mapping agent IDs to extra model outputs (at the given indices). If env_steps is True, only agents that have stepped (were ready) at the given env step indices are returned (i.e. not all agent IDs are necessarily in the keys). If return_list is True, returns a list of MultiAgentDicts (mapping agent IDs to extra_model_outputs) instead.
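The two return shapes can be contrasted with a small plain-Python sketch. It does not use RLlib: the helper steps_to_agent_dict, the agent IDs, and the values are all hypothetical, and each per-step entry is assumed to hold a single value per agent (as when one key is requested). The list form has one dict per env step and omits agents that did not step; the dict form collects each agent's values into one list.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


def steps_to_agent_dict(
    step_list: List[Dict[Hashable, Any]],
) -> Dict[Hashable, List[Any]]:
    """Convert a list of per-env-step dicts into one dict of per-agent lists."""
    out: Dict[Hashable, List[Any]] = defaultdict(list)
    for step in step_list:
        for agent_id, value in step.items():
            out[agent_id].append(value)
    return dict(out)


# Hypothetical list-style result: agent_1 did not step at the second env step,
# so it is missing from that item.
per_env_step = [
    {"agent_0": -0.11, "agent_1": -0.52},
    {"agent_0": -0.37},
    {"agent_0": -0.20, "agent_1": -0.05},
]

by_agent = steps_to_agent_dict(per_env_step)
print(by_agent)
```

Note that the per-agent lists no longer show which env step each value came from, whereas each list item in the first form corresponds to exactly one env step.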