ray.rllib.env.single_agent_episode.SingleAgentEpisode.__init__
- SingleAgentEpisode.__init__(id_: str | None = None, *, observations: List[gymnasium.core.ObsType] | InfiniteLookbackBuffer | None = None, observation_space: gymnasium.Space | None = None, infos: List[Dict] | InfiniteLookbackBuffer | None = None, actions: List[gymnasium.core.ActType] | InfiniteLookbackBuffer | None = None, action_space: gymnasium.Space | None = None, rewards: List[SupportsFloat] | InfiniteLookbackBuffer | None = None, terminated: bool = False, truncated: bool = False, extra_model_outputs: Dict[str, Any] | None = None, t_started: int | None = None, len_lookback_buffer: int | str = 'auto', agent_id: Any | None = None, module_id: str | None = None, multi_agent_episode_id: int | None = None)
Initializes a SingleAgentEpisode instance.
This constructor can be called with or without already sampled data, part of which might then go into the lookback buffer.
- Parameters:
id_ – Unique identifier for this episode. If no ID is provided, the constructor generates a unique hexadecimal code for it.
observations – Either a list of individual observations from a sampling or an already instantiated InfiniteLookbackBuffer object (possibly with observation data in it). If a list, the buffer is constructed automatically (given the data and the len_lookback_buffer argument).
observation_space – An optional gym.Space that all individual observations should conform to. If not None and this SingleAgentEpisode is numpy’ized (via the self.to_numpy() method), new data is checked for correctness whenever it is appended or set.
infos – Either a list of individual info dicts from a sampling or an already instantiated InfiniteLookbackBuffer object (possibly with info dicts in it). If a list, the buffer is constructed automatically (given the data and the len_lookback_buffer argument).
actions – Either a list of individual actions from a sampling or an already instantiated InfiniteLookbackBuffer object (possibly with action data in it). If a list, the buffer is constructed automatically (given the data and the len_lookback_buffer argument).
action_space – An optional gym.Space that all individual actions should conform to. If not None and this SingleAgentEpisode is numpy’ized (via the self.to_numpy() method), new data is checked for correctness whenever it is appended or set.
rewards – Either a list of individual rewards from a sampling or an already instantiated InfiniteLookbackBuffer object (possibly with reward data in it). If a list, the buffer is constructed automatically (given the data and the len_lookback_buffer argument).
extra_model_outputs – A dict mapping string keys to either lists of individual extra model output tensors (e.g. action_logp or state_outs) from a sampling or to already instantiated InfiniteLookbackBuffer objects (possibly with extra model output data in them). If the mapping is to lists, the buffers are constructed automatically (given the data and the len_lookback_buffer argument).
terminated – A boolean indicating whether the episode is already terminated.
truncated – A boolean indicating whether the episode has been truncated.
t_started – Optional. The starting timestep of the episode. The default is zero. If data is provided, the starting point is from the last observation onwards (i.e. t_started = len(observations) - 1). If this parameter is provided, the episode starts at the provided value.
len_lookback_buffer – The size of the (optional) lookback buffers to keep in front of this episode for each type of data (observations, actions, etc.). If larger than 0, the first len_lookback_buffer items in each type of data are interpreted as NOT part of this actual episode chunk, but instead serve as a “historical” record that may be viewed and used to derive new data from. For example, a lookback buffer of four might be necessary if you want to do observation frame stacking, your episode has been cut, and you are now operating on a new chunk (continuing from the cut one). Then, for the first 3 items, you would have to be able to look back into the old chunk’s data. If len_lookback_buffer is “auto” (default), all data provided in the constructor is interpreted as part of the lookback buffers.
agent_id – An optional AgentID indicating which agent this episode belongs to. This information is stored under self.agent_id and only serves reference purposes.
module_id – An optional ModuleID indicating which RLModule this episode belongs to. Normally, this information is obtained by querying an agent_to_module_mapping_fn with a given agent ID. This information is stored under self.module_id and only serves reference purposes.
multi_agent_episode_id – An optional EpisodeID of the encapsulating MultiAgentEpisode that this SingleAgentEpisode belongs to.
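The len_lookback_buffer rule described above can be illustrated with a small, library-free sketch. It is only a model of the documented behavior, not RLlib's implementation: the helpers split_lookback and stack_frames are hypothetical names invented for this example, and clamping the lookback size to the amount of data provided is an assumption.

```python
from typing import List, Sequence, Tuple, Union


def split_lookback(
    data: Sequence, len_lookback_buffer: Union[int, str] = "auto"
) -> Tuple[List, List]:
    """Hypothetical helper: split constructor data into (lookback, chunk)."""
    if len_lookback_buffer == "auto":
        # "auto": all data passed in is treated as history.
        n = len(data)
    else:
        # Assumption: the lookback can never exceed the data provided.
        n = max(0, min(int(len_lookback_buffer), len(data)))
    return list(data[:n]), list(data[n:])


def stack_frames(
    lookback: Sequence, chunk: Sequence, index: int, num_frames: int = 4
) -> List:
    """Hypothetical helper: the num_frames items ending at chunk[index].

    Early items of the chunk reach back into the lookback (the old chunk).
    """
    combined = list(lookback) + list(chunk)
    end = len(lookback) + index + 1
    start = end - num_frames
    if start < 0:
        raise ValueError("lookback buffer too short for this frame stack")
    return combined[start:end]


# Ten observations, of which the first three belong to the previous chunk.
observations = list(range(10))
lookback, chunk = split_lookback(observations, len_lookback_buffer=3)

# With the default "auto", everything is history and the chunk is empty.
auto_lookback, auto_chunk = split_lookback(observations)

# Stacking 4 frames at the first item of the new chunk needs 3 old items.
first_stack = stack_frames(lookback, chunk, index=0, num_frames=4)
```

Here lookback holds the three "historical" items and chunk holds the remaining seven, while first_stack combines the three lookback items with the first item of the new chunk. This is why stacking four frames needs the three preceding items from the old chunk, as the parameter description notes.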