ray.rllib.algorithms.algorithm_config.AlgorithmConfig.offline_data#
- AlgorithmConfig.offline_data(*, input_: str | ~typing.Callable[[~ray.rllib.offline.io_context.IOContext], ~ray.rllib.offline.input_reader.InputReader] | None = <ray.rllib.utils.from_config._NotProvided object>, offline_data_class: ~typing.Type | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_method: str | ~typing.Callable | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_method_kwargs: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_schema: ~typing.Dict[str, str] | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_episodes: bool | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_sample_batches: bool | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_batch_size: int | None = <ray.rllib.utils.from_config._NotProvided object>, input_filesystem: str | None = <ray.rllib.utils.from_config._NotProvided object>, input_filesystem_kwargs: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, input_compress_columns: ~typing.List[str] | None = <ray.rllib.utils.from_config._NotProvided object>, materialize_data: bool | None = <ray.rllib.utils.from_config._NotProvided object>, materialize_mapped_data: bool | None = <ray.rllib.utils.from_config._NotProvided object>, map_batches_kwargs: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, iter_batches_kwargs: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, prelearner_class: ~typing.Type | None = <ray.rllib.utils.from_config._NotProvided object>, prelearner_buffer_class: ~typing.Type | None = <ray.rllib.utils.from_config._NotProvided object>, prelearner_buffer_kwargs: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, prelearner_module_synch_period: int | None = <ray.rllib.utils.from_config._NotProvided object>, dataset_num_iters_per_learner: int | None = <ray.rllib.utils.from_config._NotProvided object>, input_config: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, actions_in_input_normalized: bool | None = <ray.rllib.utils.from_config._NotProvided object>, postprocess_inputs: bool | None = <ray.rllib.utils.from_config._NotProvided object>, shuffle_buffer_size: int | None = <ray.rllib.utils.from_config._NotProvided object>, output: str | None = <ray.rllib.utils.from_config._NotProvided object>, output_config: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, output_compress_columns: ~typing.List[str] | None = <ray.rllib.utils.from_config._NotProvided object>, output_max_file_size: float | None = <ray.rllib.utils.from_config._NotProvided object>, output_max_rows_per_file: int | None = <ray.rllib.utils.from_config._NotProvided object>, output_write_remaining_data: bool | None = <ray.rllib.utils.from_config._NotProvided object>, output_write_method: str | None = <ray.rllib.utils.from_config._NotProvided object>, output_write_method_kwargs: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, output_filesystem: str | None = <ray.rllib.utils.from_config._NotProvided object>, output_filesystem_kwargs: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, output_write_episodes: bool | None = <ray.rllib.utils.from_config._NotProvided object>, offline_sampling: str | None = <ray.rllib.utils.from_config._NotProvided object>) AlgorithmConfig[source]#
- Sets the config’s offline data settings. - Parameters:
- input – Specify how to generate experiences: - “sampler”: Generate experiences via online (env) simulation (default). - A local directory or file glob expression (e.g., “/tmp/.json”). - A list of individual file paths/URIs (e.g., [“/tmp/1.json”, “s3://bucket/2.json”]). - A dict with string keys and sampling probabilities as values (e.g., {“sampler”: 0.4, “/tmp/.json”: 0.4, “s3://bucket/expert.json”: 0.2}). - A callable that takes an - IOContextobject as only arg and returns a- ray.rllib.offline.InputReader. - A string key that indexes a callable with- tune.registry.register_input
- offline_data_class – An optional - OfflineDataclass that is used to define the offline data pipeline, including the dataset and the sampling methodology. Override the- OfflineDataclass and pass your derived class here, if you need some primer transformations specific to your data or your loss. Usually overriding the- OfflinePreLearnerand using the resulting customization via- prelearner_classsuffices for most cases. The default is- Nonewhich uses the base- OfflineDatadefined in- ray.rllib.offline.offline_data.OfflineData.
- input_read_method – Read method for the - ray.data.Datasetto read in the offline data from- input_. The default is- read_parquetfor Parquet files. See https://docs.ray.io/en/latest/data/api/input_output.html for more info about available read methods in- ray.data.
- input_read_method_kwargs – Keyword args for - input_read_method. These are passed by RLlib into the read method without checking. Use these keyword args together with- map_batches_kwargsand- iter_batches_kwargsto tune the performance of the data pipeline. It is strongly recommended to rely on Ray Data’s automatic read performance tuning.
- input_read_schema – Table schema for converting offline data to episodes. This schema maps the offline data columns to ray.rllib.core.columns.Columns: - {Columns.OBS: 'o_t', Columns.ACTIONS: 'a_t', ...}. Columns in the data set that are not mapped via this schema are sorted into episodes’- extra_model_outputs. If no schema is passed in the default schema used is- ray.rllib.offline.offline_data.SCHEMA. If your data set contains already the names in this schema, no- input_read_schemais needed. The same applies if the data is in RLlib’s- EpisodeTypeor its old- SampleBatchformat.
- input_read_episodes – Whether offline data is already stored in RLlib’s - EpisodeTypeformat, i.e.- ray.rllib.env.SingleAgentEpisode(multi -agent is planned but not supported, yet). Reading episodes directly avoids additional transform steps and is usually faster and therefore the recommended format when your application remains fully inside of RLlib’s schema. The other format is a columnar format and is agnostic to the RL framework used. Use the latter format, if you are unsure when to use the data or in which RL framework. The default is to read column data, for example,- False.- input_read_episodes, and- input_read_sample_batchescan’t be- Trueat the same time. See also- output_write_episodesto define the output data format when recording.
- input_read_sample_batches – Whether offline data is stored in RLlib’s old stack - SampleBatchtype. This is usually the case for older data recorded with RLlib in JSON line format. Reading in- SampleBatchdata needs extra transforms and might not concatenate episode chunks contained in different- SampleBatch`es in the data. If possible avoid to read `SampleBatch`es and convert them in a controlled form into RLlib's `EpisodeType(i.e.- SingleAgentEpisode). The default is- False.- input_read_episodes, and- input_read_sample_batchescan’t be- Trueat the same time.
- input_read_batch_size – Batch size to pull from the data set. This could differ from the - train_batch_size_per_learner, if a dataset holds- EpisodeType(i.e.,- SingleAgentEpisode) or- SampleBatch, or any other data type that contains multiple timesteps in a single row of the dataset. In such cases a single batch of size- train_batch_size_per_learnerwill potentially pull a multiple of- train_batch_size_per_learnertimesteps from the offline dataset. The default is- Nonein which the- train_batch_size_per_learneris pulled.
- input_filesystem – A cloud filesystem to handle access to cloud storage when reading experiences. Can be either “gcs” for Google Cloud Storage, “s3” for AWS S3 buckets, “abs” for Azure Blob Storage, or any filesystem supported by PyArrow. In general the file path is sufficient for accessing data from public or local storage systems. See https://arrow.apache.org/docs/python/filesystems.html for details. 
- input_filesystem_kwargs – A dictionary holding the kwargs for the filesystem given by - input_filesystem. See- gcsfs.GCSFilesystemfor GCS,- pyarrow.fs.S3FileSystem, for S3, and- ablfs.AzureBlobFilesystemfor ABS filesystem arguments.
- input_compress_columns – What input columns are compressed with LZ4 in the input data. If data is stored in RLlib’s - SingleAgentEpisode(- MultiAgentEpisodenot supported, yet). Note the providing- rllib.core.columns.Columns.OBSalso tries to decompress- rllib.core.columns.Columns.NEXT_OBS.
- materialize_data – Whether the raw data should be materialized in memory. This boosts performance, but requires enough memory to avoid an OOM, so make sure that your cluster has the resources available. For very large data you might want to switch to streaming mode by setting this to - False(default). If your algorithm does not need the RLModule in the Learner connector pipeline or all (learner) connectors are stateless you should consider setting- materialize_mapped_datato- Trueinstead (and set- materialize_datato- False). If your data does not fit into memory and your Learner connector pipeline requires an RLModule or is stateful, set both- materialize_dataand- materialize_mapped_datato- False.
- materialize_mapped_data – Whether the data should be materialized after running it through the Learner connector pipeline (i.e. after running the - OfflinePreLearner). This improves performance, but should only be used in case the (learner) connector pipeline does not require an RLModule and the (learner) connector pipeline is stateless. For example, MARWIL’s Learner connector pipeline requires the RLModule for value function predictions and training batches would become stale after some iterations causing learning degradation or divergence. Also ensure that your cluster has enough memory available to avoid an OOM. If set to- True(True), make sure that- materialize_datais set to- Falseto avoid materialization of two datasets. If your data does not fit into memory and your Learner connector pipeline requires an RLModule or is stateful, set both- materialize_dataand- materialize_mapped_datato- False.
- map_batches_kwargs – Keyword args for the - map_batchesmethod. These are passed into the- ray.data.Dataset.map_batchesmethod when sampling without checking. If no arguments passed in the default arguments- {'concurrency': max(2, num_learners), 'zero_copy_batch': True}is used. Use these keyword args together with- input_read_method_kwargsand- iter_batches_kwargsto tune the performance of the data pipeline.
- iter_batches_kwargs – Keyword args for the - iter_batchesmethod. These are passed into the- ray.data.Dataset.iter_batchesmethod when sampling without checking. If no arguments are passed in, the default argument- {'prefetch_batches': 2}is used. Use these keyword args together with- input_read_method_kwargsand- map_batches_kwargsto tune the performance of the data pipeline.
- prelearner_class – An optional - OfflinePreLearnerclass that is used to transform data batches in- ray.data.map_batchesused in the- OfflineDataclass to transform data from columns to batches that can be used in the- Learner.update...()methods. Override the- OfflinePreLearnerclass and pass your derived class in here, if you need to make some further transformations specific for your data or loss. The default is None which uses the base- OfflinePreLearnerdefined in- ray.rllib.offline.offline_prelearner.
- prelearner_buffer_class – An optional - EpisodeReplayBufferclass that RLlib uses to buffer experiences when data is in- EpisodeTypeor RLlib’s previous- SampleBatchtype format. In this case, a single data row may contain multiple timesteps and the buffer serves two purposes: (a) to store intermediate data in memory, and (b) to ensure that RLlib samples exactly- train_batch_size_per_learnerexperiences per batch. The default is RLlib’s- EpisodeReplayBuffer.
- prelearner_buffer_kwargs – Optional keyword arguments for intializing the - EpisodeReplayBuffer. In most cases this value is simply the- capacityfor the default buffer that RLlib uses (- EpisodeReplayBuffer), but it may differ if the- prelearner_buffer_classuses a custom buffer.
- prelearner_module_synch_period – The period (number of batches converted) after which the - RLModuleheld by the- PreLearnershould sync weights. The- PreLearneris used to preprocess batches for the learners. The higher this value, the more off-policy the- PreLearner’s module is. Values too small force the- PreLearnerto sync more frequently and thus might slow down the data pipeline. The default value chosen by the- OfflinePreLearneris 10.
- dataset_num_iters_per_learner – Number of updates to run in each learner during a single training iteration. If None, each learner runs a complete epoch over its data block (the dataset is partitioned into at least as many blocks as there are learners). The default is - None. This value must be set to- 1, if RLlib uses a single (local) learner.
- input_config – Arguments that describe the settings for reading the input. If input is “sample”, this is the environment configuration, e.g. - env_nameand- env_config, etc. See- EnvContextfor more info. If the input is “dataset”, this contains e.g.- format,- path.
- actions_in_input_normalized – True, if the actions in a given offline “input” are already normalized (between -1.0 and 1.0). This is usually the case when the offline file has been generated by another RLlib algorithm (e.g. PPO or SAC), while “normalize_actions” was set to True. 
- postprocess_inputs – Whether to run postprocess_trajectory() on the trajectory fragments from offline inputs. Note that postprocessing is done using the current policy, not the behavior policy, which is typically undesirable for on-policy algorithms. 
- shuffle_buffer_size – If positive, input batches are shuffled via a sliding window buffer of this number of batches. Use this if the input data is not in random enough order. Input is delayed until the shuffle buffer is filled. 
- output – Specify where experiences should be saved: - None: don’t save any experiences - “logdir” to save to the agent log dir - a path/URI to save to a custom output directory (e.g., “s3://bckt/”) - a function that returns a rllib.offline.OutputWriter 
- output_config – Arguments accessible from the IOContext for configuring custom output. 
- output_compress_columns – What sample batch columns to LZ4 compress in the output data. Note that providing - rllib.core.columns.Columns.OBSalso compresses- rllib.core.columns.Columns.NEXT_OBS.
- output_max_file_size – Max output file size (in bytes) before rolling over to a new file. 
- output_max_rows_per_file – Max output row numbers before rolling over to a new file. 
- output_write_remaining_data – Determines whether any remaining data in the recording buffers should be stored to disk. It is only applicable if - output_max_rows_per_fileis defined. When sampling data, it is buffered until the threshold specified by- output_max_rows_per_fileis reached. Only complete multiples of- output_max_rows_per_fileare written to disk, while any leftover data remains in the buffers. If a recording session is stopped, residual data may still reside in these buffers. Setting- output_write_remaining_datato- Trueensures this data is flushed to disk. By default, this attribute is set to- False.
- output_write_method – Write method for the - ray.data.Datasetto write the offline data to- output. The default is- read_parquetfor Parquet files. See https://docs.ray.io/en/latest/data/api/input_output.html for more info about available read methods in- ray.data.
- output_write_method_kwargs – - kwargsfor the- output_write_method. These are passed into the write method without checking.
- output_filesystem – A cloud filesystem to handle access to cloud storage when writing experiences. Should be either “gcs” for Google Cloud Storage, “s3” for AWS S3 buckets, or “abs” for Azure Blob Storage. 
- output_filesystem_kwargs – A dictionary holding the kwargs for the filesystem given by - output_filesystem. See- gcsfs.GCSFilesystemfor GCS,- pyarrow.fs.S3FileSystem, for S3, and- ablfs.AzureBlobFilesystemfor ABS filesystem arguments.
- output_write_episodes – If RLlib should record data in its RLlib’s - EpisodeTypeformat (that is,- SingleAgentEpisodeobjects). Use this format, if you need RLlib to order data in time and directly group by episodes for example to train stateful modules or if you plan to use recordings exclusively in RLlib. Otherwise RLlib records data in tabular (columnar) format. Default is- True.
- offline_sampling – Whether sampling for the Algorithm happens via reading from offline data. If True, EnvRunners don’t limit the number of collected batches within the same - sample()call based on the number of sub-environments within the worker (no sub-environments present).
 
- Returns:
- This updated AlgorithmConfig object.