ray.rllib.env.single_agent_episode.SingleAgentEpisode.set_actions

SingleAgentEpisode.set_actions(*, new_data, at_indices: int | slice | List[int] | None = None, neg_index_as_lookback: bool = False) → None

Overwrites all or some of this Episode’s actions with the provided data.

Note that an episode’s action data cannot be written to directly because it is managed by an InfiniteLookbackBuffer object. Normally, individual, current actions are added to the episode either by calling self.add_env_step or, more directly (and manually), via self.actions.append() or self.actions.extend(). However, certain postprocessing steps may require rewriting all (or a slice) of an episode’s actions, which is when self.set_actions() should be used.

Parameters:
  • new_data – The new action data to overwrite existing data with. If this episode has not been numpy’ized yet, this may be a list of individual actions. If this episode has already been numpy’ized, this should be a (possibly complex) struct matching the action space, whose leaves have a batch size exactly equal to the size of the to-be-overwritten slice or segment (given by at_indices).

  • at_indices – A single int is interpreted as one index to overwrite with new_data (which is expected to be a single action). A list of ints is interpreted as a list of indices, all of which are overwritten with new_data (which is expected to have the same length as at_indices). A slice object is interpreted as a range of indices to be overwritten with new_data (which is expected to be of the same size as the provided slice). By default, negative indices are interpreted as “before the end”. If neg_index_as_lookback=True, negative indices are instead interpreted as “before ts=0”, meaning going back into the lookback buffer.

  • neg_index_as_lookback – If True, negative values in at_indices are interpreted as “before ts=0”, meaning going back into the lookback buffer. For example, an episode with actions = [4, 5, 6, 7, 8, 9], where [4, 5, 6] is the lookback buffer range (the ts=0 item is 7), will handle a call to set_actions(new_data=individual_action, at_indices=-1, neg_index_as_lookback=True) by overwriting the value 6 in the actions buffer with the provided individual_action.
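The two interpretations of a negative index described above can be illustrated with a small pure-Python model. This is only a sketch of the documented semantics, not RLlib's implementation: the helper `resolve_position` and the constant `LOOKBACK` are hypothetical names introduced for illustration, and the plain list stands in for the InfiniteLookbackBuffer. It also does not try to model how RLlib treats negative indices that reach past ts=0 in the default mode.

```python
def resolve_position(
    index: int,
    lookback: int,
    total_len: int,
    neg_index_as_lookback: bool = False,
) -> int:
    """Map a timestep index to a position in the underlying data list.

    Hypothetical helper: `lookback` is the number of leading items that
    belong to the lookback buffer, `total_len` is the length of the whole
    list (lookback items included).
    """
    if index >= 0 or neg_index_as_lookback:
        # Counted relative to ts=0, which sits right after the lookback items.
        position = lookback + index
    else:
        # Default for negative indices: counted back from the end.
        position = total_len + index
    if not 0 <= position < total_len:
        raise IndexError(f"index {index} is out of range")
    return position


# Same data as in the parameter description: [4, 5, 6] is the lookback range.
actions = [4, 5, 6, 7, 8, 9]
LOOKBACK = 3

# Default: -1 means "one before the end", so the 9 is replaced.
default_case = list(actions)
default_case[resolve_position(-1, LOOKBACK, len(default_case))] = 99

# neg_index_as_lookback=True: -1 means "one before ts=0", so the 6 is replaced.
lookback_case = list(actions)
lookback_case[
    resolve_position(-1, LOOKBACK, len(lookback_case), neg_index_as_lookback=True)
] = 99

print(default_case)   # [4, 5, 6, 7, 8, 99]
print(lookback_case)  # [4, 5, 99, 7, 8, 9]
```

In both modes a non-negative index is counted from ts=0, so index 0 refers to the 7 in this data.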

Raises:

IndexError – If the provided at_indices do not match the size of new_data.
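The size requirement behind this IndexError can be sketched with a toy overwrite function on a plain list. This is an assumption-laden illustration, not RLlib code: `overwrite` is a hypothetical helper, it ignores the lookback buffer and the at_indices=None case, and the exact checks RLlib performs may differ.

```python
from typing import Any, List, Union


def overwrite(
    data: List[Any],
    new_data: Any,
    at_indices: Union[int, slice, List[int]],
) -> None:
    """Overwrite items of `data` in place (hypothetical stand-in)."""
    if isinstance(at_indices, int):
        # A single index expects a single action.
        data[at_indices] = new_data
        return
    if isinstance(at_indices, slice):
        # Expand the slice into the concrete positions it covers.
        positions = list(range(*at_indices.indices(len(data))))
    else:
        positions = list(at_indices)
    # new_data must provide exactly one item per position to overwrite.
    if len(new_data) != len(positions):
        raise IndexError(
            f"at_indices covers {len(positions)} items, "
            f"but new_data has {len(new_data)}"
        )
    for position, value in zip(positions, new_data):
        data[position] = value


actions = [7, 8, 9]
overwrite(actions, 1, 0)                  # single int, single action
overwrite(actions, [20, 30], [1, 2])      # list of indices, same-length data
overwrite(actions, [0, 0], slice(0, 2))   # slice, data of the same size

mismatch_raised = False
try:
    overwrite(actions, [5], slice(0, 3))  # 3 positions, but only 1 item
except IndexError:
    mismatch_raised = True

print(actions)          # [0, 0, 30]
print(mismatch_raised)  # True
```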