ray.rllib.env.multi_agent_episode.MultiAgentEpisode.slice
- MultiAgentEpisode.slice(slice_: slice, *, len_lookback_buffer: int | None = None) → MultiAgentEpisode
Returns a slice of this episode with the given slice object.
Works analogously to slice(). However, the important differences are:
- slice_ is provided in (global) env steps, not agent steps.
- In case slice_ ends, for a certain agent, in an env step where that particular agent does not have an observation, the previous observation will be included, but the next action and the sum of rewards up to this point will be stored in the agent's hanging-values caches of the returned MultiAgentEpisode slice.

```python
from ray.rllib.env.multi_agent_episode import MultiAgentEpisode
from ray.rllib.utils.test_utils import check

# Generate a simple multi-agent episode.
observations = [
    {"a0": 0, "a1": 0},  # 0
    {"a1": 1},           # 1
    {"a1": 2},           # 2
    {"a0": 3, "a1": 3},  # 3
    {"a0": 4},           # 4
]
# Actions are the same as observations (except for last obs, which doesn't
# have an action).
actions = observations[:-1]
# Make up a reward for each action.
rewards = [
    {aid: r / 10 + 0.1 for aid, r in o.items()}
    for o in observations
]
episode = MultiAgentEpisode(
    observations=observations,
    actions=actions,
    rewards=rewards,
    len_lookback_buffer=0,
)

# Slice the episode and check results.
slice = episode[1:3]
a0 = slice.agent_episodes["a0"]
a1 = slice.agent_episodes["a1"]
check((a0.observations, a1.observations), ([3], [1, 2, 3]))
check((a0.actions, a1.actions), ([], [1, 2]))
check((a0.rewards, a1.rewards), ([], [0.2, 0.3]))
check((a0.is_done, a1.is_done), (False, False))

# If a slice ends in a "gap" for an agent, expect actions and rewards to be
# cached for this agent.
slice = episode[:2]
a0 = slice.agent_episodes["a0"]
check(a0.observations, [0])
check(a0.actions, [])
check(a0.rewards, [])
check(slice._hanging_actions_end["a0"], 0)
check(slice._hanging_rewards_end["a0"], 0.1)
```
- Parameters:
slice_ – The slice object to use for slicing. This should exclude the lookback buffer, which will be prepended automatically to the returned slice.
len_lookback_buffer – If not None, forces the returned slice to try to have this number of timesteps in its lookback buffer (if available). If None (default), tries to make the returned slice's lookback as large as the current lookback buffer of this episode (self).
- Returns:
The new MultiAgentEpisode representing the requested slice.
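The hanging-cache behavior described above can be modeled in plain Python. The following is a simplified sketch, not RLlib's actual implementation; the helper slice_multi_agent and its return layout are made up for illustration. It reproduces the semantics of the docstring example: when a slice ends at an env step where an agent has no observation, that agent's last action and accumulated reward are cached ("hang") rather than becoming part of its trajectory.

```python
# A made-up model of MultiAgentEpisode.slice's hanging-cache semantics
# (illustration only; not RLlib code).

def slice_multi_agent(observations, actions, rewards, stop):
    """Slice env steps [0, stop) and return per-agent data.

    If an agent acted inside the slice but receives no further observation
    up to env step `stop`, its last action and summed reward are returned
    as "hanging" caches instead of being appended to its trajectory.
    """
    agent_obs, agent_actions, agent_rewards = {}, {}, {}
    hanging_actions, hanging_rewards = {}, {}

    # Observations up to and including env step `stop` (the obs at `stop`
    # is the first observation of the slice's final env step boundary).
    for obs in observations[: stop + 1]:
        for aid, o in obs.items():
            agent_obs.setdefault(aid, []).append(o)

    # Actions and rewards belong to env steps [0, stop).
    for t in range(stop):
        for aid, a in actions[t].items():
            # An action is "complete" only if the agent observes again at
            # some env step <= stop; otherwise it hangs.
            if any(aid in observations[u] for u in range(t + 1, stop + 1)):
                agent_actions.setdefault(aid, []).append(a)
                agent_rewards.setdefault(aid, []).append(rewards[t][aid])
            else:
                hanging_actions[aid] = a
                hanging_rewards[aid] = (
                    hanging_rewards.get(aid, 0.0) + rewards[t][aid]
                )
    return agent_obs, agent_actions, agent_rewards, hanging_actions, hanging_rewards


# Same data as in the docstring example above.
observations = [
    {"a0": 0, "a1": 0},  # env step 0
    {"a1": 1},           # env step 1
    {"a1": 2},           # env step 2
    {"a0": 3, "a1": 3},  # env step 3
    {"a0": 4},           # env step 4
]
actions = observations[:-1]
rewards = [{aid: r / 10 + 0.1 for aid, r in o.items()} for o in observations]

# Slicing [0, 2): "a0" ends in a gap, so its action/reward hang.
obs, acts, rews, h_act, h_rew = slice_multi_agent(observations, actions, rewards, 2)
```

Running the sketch on the episode data yields the same result as the `episode[:2]` slice in the example: "a0" keeps only its first observation, while its action 0 and reward 0.1 land in the hanging caches, and "a1" retains a normal trajectory of three observations and two actions.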