Tune Internals#
TunerInternal#
- class ray.tune.impl.tuner_internal.TunerInternal(restore_path: str = None, storage_filesystem: pyarrow.fs.FileSystem | None = None, resume_config: ResumeConfig | None = None, trainable: str | Callable | Type[Trainable] | BaseTrainer | None = None, param_space: Dict[str, Any] | None = None, tune_config: TuneConfig | None = None, run_config: RunConfig | None = None, _tuner_kwargs: Dict | None = None, _entrypoint: AirEntrypoint = AirEntrypoint.TUNER)[source]#
- The real implementation behind the external-facing Tuner. The external-facing Tuner multiplexes between a local Tuner and a remote Tuner depending on whether it is running in Ray client mode. In Ray client mode, the external Tuner wraps TunerInternal into a remote actor, which is guaranteed to be placed on the head node. TunerInternal can be constructed from scratch, in which case trainable needs to be provided, together with the optional param_space, tune_config and run_config. It can also be restored from a previous failed run (given restore_path); see the usage sketch after the parameter list. - Parameters:
- restore_path – The path from which the Tuner can be restored. If provided, none of the remaining arguments are needed. 
- resume_config – Resume config to configure which trials to continue. 
- trainable – The trainable to be tuned. 
- param_space – Search space of the tuning job. Note that both the preprocessor and the dataset can be tuned here. 
- tune_config – Tuning algorithm specific configs. Refer to ray.tune.tune_config.TuneConfig for more info. 
- run_config – Runtime configuration that is specific to individual trials. If passed, this will overwrite the run config passed to the Trainer, if applicable. Refer to ray.tune.RunConfig for more info. 
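A minimal usage sketch, assuming the public Tuner API (which constructs a TunerInternal under the hood); the objective function, search space, and experiment name below are hypothetical placeholders.

```python
from ray import tune

# Hypothetical objective standing in for any trainable.
def objective(config):
    return {"score": (config["x"] - 1) ** 2}

# Fresh construction: a trainable plus the optional param_space, tune_config, and run_config.
tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(0.0, 2.0)},
    tune_config=tune.TuneConfig(num_samples=4),
    run_config=tune.RunConfig(name="tuner_internal_demo"),
)
results = tuner.fit()

# Restoring from a previous (possibly failed) run, analogous to constructing
# TunerInternal with a restore_path.
restored = tune.Tuner.restore(results.experiment_path, trainable=objective)
```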
 
 
Trial#
- class ray.tune.experiment.trial.Trial(trainable_name: str, *, config: Dict | None = None, trial_id: str | None = None, storage: StorageContext | None = None, evaluated_params: Dict | None = None, experiment_tag: str = '', placement_group_factory: PlacementGroupFactory | None = None, stopping_criterion: Dict[str, float] | None = None, checkpoint_config: CheckpointConfig | None = None, export_formats: List[str] | None = None, restore_path: str | None = None, trial_name_creator: Callable[[Trial], str] | None = None, trial_dirname_creator: Callable[[Trial], str] | None = None, log_to_file: str | None | Tuple[str | None, str | None] = None, max_failures: int = 0, stub: bool = False, _setup_default_resource: bool = True)[source]#
- A trial object holds the state for one model training run. Trials are themselves managed by the TrialRunner class, which implements the event loop for submitting trial runs to a Ray cluster. Trials start in the PENDING state and transition to RUNNING once started; on error, a trial transitions to ERROR, otherwise to TERMINATED on success. Resources are allocated to each trial and should be specified using a PlacementGroupFactory (see the resource sketch below). - trainable_name#
- Name of the trainable object to be executed. 
 - config#
- Provided configuration dictionary with evaluated params. 
 - trial_id#
- Unique identifier for the trial. 
 - path#
- Path where results for this trial are stored. Can be on the local node or on cloud storage. 
 - local_path#
- Path on the local disk where results are stored. 
 - remote_path#
- Path on cloud storage where results are stored, or None if not set. 
 - relative_logdir#
- Directory of the trial relative to its experiment directory. 
 - evaluated_params#
- Parameters evaluated by the search algorithm. 
 - experiment_tag#
- Identifying trial name to show in the console. 
 - status#
- One of PENDING, RUNNING, PAUSED, TERMINATED, or ERROR. 
 - error_file#
- Path to the file containing errors that this trial has raised. 
 - DeveloperAPI: This API may change across minor Ray releases. - create_placement_group_factory()[source]#
- Compute placement group factory if needed. - Note: this must be called after all the placeholders in self.config are resolved. 
 - property local_dir#
- Warning - DEPRECATED: This API is deprecated and may be removed in future Ray releases. 
 - property logdir: str | None#
- Warning - DEPRECATED: This API is deprecated and may be removed in future Ray releases. 
 - property checkpoint: Checkpoint | None#
- Returns the most recent checkpoint if one has been saved. 
 - init_logdir()[source]#
- Warning - DEPRECATED: This API is deprecated and may be removed in future Ray releases. 
 - update_resources(resources: dict | PlacementGroupFactory)[source]#
- EXPERIMENTAL: Updates the resource requirements. - Should only be called when the trial is not running. - Raises:
- ValueError – If the trial status is RUNNING. 
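A minimal sketch of how per-trial resources are typically expressed as a PlacementGroupFactory before being attached to a trainable; the bundle shapes below are hypothetical, and tune.with_resources is used here as the public entry point.

```python
from ray import tune
from ray.tune.execution.placement_groups import PlacementGroupFactory

def objective(config):
    return {"score": config["x"]}

# Hypothetical resource request: one bundle for the trainable itself plus one
# extra CPU-only bundle, e.g. for a worker process the trial might launch.
pgf = PlacementGroupFactory([{"CPU": 1, "GPU": 0}, {"CPU": 1}])

tuner = tune.Tuner(
    tune.with_resources(objective, pgf),
    param_space={"x": tune.uniform(0.0, 1.0)},
)
```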
 
 - set_storage(new_storage: StorageContext)[source]#
- Updates the storage context of the trial. - If the storage_path or experiment_dir_name has changed, then this setter also updates the paths of all checkpoints tracked by the checkpoint manager. This enables restoration from a checkpoint if the user moves the directory.
 - get_pickled_error() Exception | None[source]#
- Returns the pickled error object if it exists in storage. - This is a pickled version of the latest error that the trial encountered. 
 - get_error() TuneError | None[source]#
- Returns the error text file trace as a TuneError object if it exists in storage. - This is a text trace of the latest error that the trial encountered, which is used in the case that the error is not picklable. 
 - on_checkpoint(checkpoint_result: _TrainingResult)[source]#
- Hook for handling checkpoints taken by the Trainable. - Parameters:
- checkpoint_result – Checkpoint taken. 
 
 - should_recover()[source]#
- Returns whether the trial qualifies for retrying. - num_failures should represent the number of times the trial has failed up to the moment this method is called. If we’ve failed 5 times and max_failures=5, then we should recover, since we only pass the limit on the 6th failure. - Note that this may return True even when there is no checkpoint, either because self.checkpoint_freq is 0 or because the trial failed before a checkpoint has been made.
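A minimal sketch of how the retry limit that should_recover checks is usually configured from the public API; the exact import path of FailureConfig has moved across Ray versions, so treat its location here as an assumption.

```python
from ray import tune

def flaky_objective(config):
    # Hypothetical trainable that may fail intermittently.
    return {"score": config["x"]}

tuner = tune.Tuner(
    flaky_objective,
    param_space={"x": tune.uniform(0.0, 1.0)},
    run_config=tune.RunConfig(
        # With max_failures=5, a trial is retried until its 6th failure,
        # matching the should_recover() semantics described above.
        failure_config=tune.FailureConfig(max_failures=5),
    ),
)
```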
 
FunctionTrainable#
- class ray.tune.trainable.function_trainable.FunctionTrainable(config: Dict[str, Any] = None, logger_creator: Callable[[Dict[str, Any]], Logger] = None, storage: StorageContext | None = None)[source]#
- Trainable that runs a user function reporting results. - This mode of execution does not support checkpoint/restore. - DeveloperAPI: This API may change across minor Ray releases. 
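A minimal sketch of the function-trainable path, assuming the public Tuner API: passing a plain Python function wraps it in a FunctionTrainable under the hood. The objective below is hypothetical and returns a final metrics dict to avoid version-specific reporting APIs.

```python
from ray import tune

def train_fn(config):
    # Hypothetical objective; returning a dict reports final metrics for the trial.
    loss = (config["lr"] - 0.1) ** 2
    return {"loss": loss}

# The function is wrapped in a FunctionTrainable when handed to the Tuner.
tuner = tune.Tuner(train_fn, param_space={"lr": tune.loguniform(1e-4, 1e-1)})
results = tuner.fit()
```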
Registry#
- ray.tune.register_trainable(name: str, trainable: Callable | Type, warn: bool = True)[source]#
- Register a trainable function or class. - This enables a class or function to be accessed on every Ray process in the cluster. - Parameters:
- name – Name to register. 
- trainable – Function or tune.Trainable class. Functions must take (config, status_reporter) as arguments and will be automatically converted into a class during registration. 
 
 - DeveloperAPI: This API may change across minor Ray releases. 
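A minimal sketch of registering a trainable under a string name and then referring to it by that name; the function and the registered name below are hypothetical.

```python
from ray import tune
from ray.tune import register_trainable

def my_objective(config):
    # Hypothetical objective registered under a string name.
    return {"mean_loss": (config["x"] - 2) ** 2}

register_trainable("my_objective", my_objective)

# Any Ray process in the cluster can now refer to the trainable by its name.
tuner = tune.Tuner("my_objective", param_space={"x": tune.uniform(0.0, 4.0)})
results = tuner.fit()
```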
- ray.tune.register_env(name: str, env_creator: Callable)[source]#
- Register a custom environment for use with RLlib. - This enables the environment to be accessed on every Ray process in the cluster. - Parameters:
- name – Name to register. 
- env_creator – Callable that creates an env. 
 
 - DeveloperAPI: This API may change across minor Ray releases. 
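A minimal sketch of registering a custom environment for RLlib; the Gymnasium environment and the registered name are hypothetical placeholders.

```python
import gymnasium as gym
from ray.tune import register_env

def env_creator(env_config):
    # Hypothetical creator wrapping a built-in Gymnasium environment.
    return gym.make("CartPole-v1")

register_env("my_cartpole", env_creator)

# RLlib algorithms can then reference the environment by its registered name,
# for example via AlgorithmConfig.environment("my_cartpole").
```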
Output#
- class ray.tune.experimental.output.ProgressReporter(verbosity: AirVerbosity, progress_metrics: List[str] | List[Dict[str, str]] | None = None)[source]#
- Periodically prints out status updates. 
- class ray.tune.experimental.output.TrainReporter(verbosity: AirVerbosity, progress_metrics: List[str] | List[Dict[str, str]] | None = None)[source]#