ray.rllib.algorithms.algorithm.Algorithm.training_step
- Algorithm.training_step() -> None
Default single iteration logic of an algorithm.
- Collect on-policy samples (SampleBatches) in parallel using the Algorithm's EnvRunners (@ray.remote).
- Concatenate collected SampleBatches into one train batch. 
- Note that we may have more than one policy in the multi-agent case: Call the different policies' `learn_on_batch` (simple optimizer) OR `load_batch_into_buffer` + `learn_on_loaded_batch` (multi-GPU optimizer) methods to calculate loss and update the model(s).
- Return all collected metrics for the iteration. 
Returns:
For the new API stack, returns None. Results are compiled and extracted automatically through a single `self.metrics.reduce()` call at the very end of an iteration (which might contain more than one call to `training_step()`). This way, we make sure that we account for all results generated by each individual `training_step()` call. For the old API stack, returns the results dict from executing the training step.
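The steps above are the pattern to follow when overriding `training_step()` in a custom Algorithm subclass. Below is a minimal sketch of such an override on the old API stack, assuming RLlib's `synchronous_parallel_sample` and `train_one_step` helpers and the `self.workers` worker set; exact helper names, attributes, and signatures vary between RLlib versions, so treat this as an illustration of the sample-concatenate-update-return flow rather than the built-in implementation.

```python
# Sketch only: helper names/signatures are assumptions based on the old API
# stack and may differ in your installed RLlib version.
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step


class MyAlgorithm(PPO):
    def training_step(self):
        # 1) Collect on-policy SampleBatches in parallel from the remote
        #    EnvRunners/workers and concatenate them into one train batch.
        train_batch = synchronous_parallel_sample(worker_set=self.workers)
        train_batch = train_batch.as_multi_agent()

        # 2) Update the model(s) on the collected batch; with the simple
        #    optimizer this calls each policy's learn_on_batch under the hood.
        train_results = train_one_step(self, train_batch)

        # 3) Push the updated weights back to the remote workers so the next
        #    round of sampling stays on-policy.
        self.workers.sync_weights()

        # 4) Return all collected metrics for the iteration
        #    (old API stack behavior; the new stack would return None and
        #    log into self.metrics instead).
        return train_results
```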