bagua.torch_api.algorithms.base
Module Contents
- class bagua.torch_api.algorithms.base.Algorithm
This is the base class that all Bagua algorithms inherit from.
It provides methods that can be overridden to implement different kinds of distributed algorithms.
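Most of the methods below are hook factories: each receives the BaguaModule and returns a function that is invoked at a specific point of a training step. The following is a pure-Python sketch with no Bagua import. `LoggingAlgorithm` and `simulate_step` are hypothetical names, plain values stand in for tensors, and the driver only calls each returned hook in the order these docs describe (forward pre hook, per-parameter backward hooks, post-backward hook, post-optimizer-step hook).

```python
# Hypothetical sketch: mirrors the documented hook-factory methods without importing Bagua.
from typing import Any, Callable, Dict, List


class LoggingAlgorithm:
    """Records every hook invocation so the call order can be inspected."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def init_forward_pre_hook(self, bagua_module) -> Callable[[Any], None]:
        def hook(inputs: Any) -> None:
            self.events.append("forward_pre")
        return hook

    def init_backward_hook(self, bagua_module) -> Callable[[str, Any], None]:
        def hook(parameter_name: str, parameter: Any) -> None:
            self.events.append(f"backward:{parameter_name}")
        return hook

    def init_post_backward_hook(self, bagua_module) -> Callable[[], None]:
        def hook() -> None:
            self.events.append("post_backward")
        return hook

    def init_post_optimizer_step_hook(self, bagua_module) -> Callable[[Any], None]:
        def hook(optimizer: Any) -> None:
            self.events.append("post_optimizer_step")
        return hook


def simulate_step(algorithm, bagua_module, named_params: Dict[str, Any], batch: Any, optimizer: Any) -> None:
    """Hypothetical driver calling the hooks in the documented order.

    A real integration would create the hooks once and reuse them every step.
    """
    forward_pre = algorithm.init_forward_pre_hook(bagua_module)
    backward = algorithm.init_backward_hook(bagua_module)
    post_backward = algorithm.init_post_backward_hook(bagua_module)
    post_step = algorithm.init_post_optimizer_step_hook(bagua_module)

    forward_pre(batch)  # before the forward pass
    # Simplification: gradients "complete" in reverse registration order.
    for name, param in reversed(list(named_params.items())):
        backward(name, param)
    post_backward()  # after the whole backward pass
    post_step(optimizer)  # after optimizer.step() returns


algo = LoggingAlgorithm()
simulate_step(algo, None, {"weight": 1.0, "bias": 0.0}, [1, 2], None)
print(algo.events)
# ['forward_pre', 'backward:bias', 'backward:weight', 'post_backward', 'post_optimizer_step']
```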
- init_backward_hook(self, bagua_module)
Given a BaguaModule, return a hook function that will be executed each time the gradient computation of a parameter completes.
- Parameters
bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.
- Returns
A function that takes the name of a parameter (as in torch.nn.Module.named_parameters()) and the parameter itself.
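One reason a per-parameter hook is useful is that an algorithm can react as soon as a group of gradients is ready instead of waiting for the full backward pass. The sketch below is hypothetical and does not use Bagua: `BucketReadinessTracker` is an illustrative name, a bucket is just a list of parameter names, and the hook has the same `(parameter_name, parameter)` shape as the one returned by init_backward_hook. It records a bucket's index once every parameter in that bucket has reported its gradient.

```python
from typing import Any, Callable, List, Set


class BucketReadinessTracker:
    """Hypothetical helper: reports a bucket once every parameter in it has a gradient."""

    def __init__(self, buckets: List[List[str]]) -> None:
        self.buckets = buckets
        self.ready: Set[str] = set()
        self.ready_buckets: List[int] = []

    def make_backward_hook(self) -> Callable[[str, Any], None]:
        # Same shape as the hook returned by init_backward_hook.
        def hook(parameter_name: str, parameter: Any) -> None:
            self.ready.add(parameter_name)
            for idx, names in enumerate(self.buckets):
                if idx not in self.ready_buckets and all(n in self.ready for n in names):
                    # A real algorithm could launch this bucket's communication here.
                    self.ready_buckets.append(idx)
        return hook


tracker = BucketReadinessTracker([["fc1.weight", "fc1.bias"], ["fc2.weight", "fc2.bias"]])
hook = tracker.make_backward_hook()
for name in ["fc2.bias", "fc2.weight", "fc1.bias", "fc1.weight"]:
    hook(name, None)
print(tracker.ready_buckets)  # [1, 0]
```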
- init_forward_pre_hook(self, bagua_module)
Given a BaguaModule, return a hook function that will be executed before the forward pass.
- Parameters
bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.
- Returns
A function that takes the model’s input.
- init_operations(self, bagua_module, bucket)
Given a BaguaModule and a Bagua bucket, register the operations to be executed on that bucket.
- Parameters
bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.
bucket (bagua.torch_api.bucket.BaguaBucket) – A single bucket on which to register operations.
- init_post_backward_hook(self, bagua_module)
Given a BaguaModule, return a hook function that will be executed when the backward pass is done.
- Parameters
bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.
- Returns
A function that takes no argument.
- init_post_optimizer_step_hook(self, bagua_module)
Given a BaguaModule, return a hook function that will be executed when optimizer.step() is done.
- Parameters
bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.
- Returns
A function that takes the optimizer on which step() was called.
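Because this hook receives the optimizer, it can read optimizer state such as the learning rate, which torch.optim optimizers expose through `param_groups`. The sketch below is hypothetical: `make_post_step_hook` and `sync_interval` are illustrative names, a `SimpleNamespace` stands in for a real optimizer, and the hook only counts steps and flags every `sync_interval`-th one, the kind of schedule a periodic synchronization algorithm might use.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Tuple


def make_post_step_hook(sync_interval: int, log: List[Tuple[int, float, bool]]) -> Callable[[Any], None]:
    """Hypothetical post-optimizer-step hook: flags every `sync_interval`-th step."""
    step = 0

    def hook(optimizer: Any) -> None:
        nonlocal step
        step += 1
        lr = optimizer.param_groups[0]["lr"]  # torch.optim-style hyperparameter access
        should_sync = step % sync_interval == 0
        log.append((step, lr, should_sync))

    return hook


# Stand-in for a torch.optim optimizer: only `param_groups` is used.
opt = SimpleNamespace(param_groups=[{"lr": 0.1}])
log: List[Tuple[int, float, bool]] = []
hook = make_post_step_hook(sync_interval=2, log=log)
for _ in range(3):
    hook(opt)
print(log)  # [(1, 0.1, False), (2, 0.1, True), (3, 0.1, False)]
```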
- init_tensors(self, bagua_module)
Given a BaguaModule, return the Bagua tensors that Bagua will use in later operations.
- Parameters
bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.
- Returns
A list of Bagua tensors for communication.
- Return type
List[bagua.torch_api.tensor.BaguaTensor]
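An algorithm that communicates gradients might build this list by walking the module's named parameters and keeping only the trainable ones. The sketch below is hypothetical: `select_communication_tensors` is an illustrative name, `wrap` stands in for converting a tensor into a Bagua tensor (it is not a Bagua API), and `SimpleNamespace` objects stand in for parameters.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Tuple


def select_communication_tensors(
    named_parameters: Iterable[Tuple[str, Any]],
    wrap: Callable[[str, Any], Any],
) -> List[Any]:
    """Hypothetical core of an init_tensors override: wrap each trainable parameter's gradient."""
    tensors = []
    for name, param in named_parameters:
        if param.requires_grad:  # frozen parameters have no gradient to communicate
            tensors.append(wrap(name, param.grad))
    return tensors


params = [
    ("embed.weight", SimpleNamespace(requires_grad=False, grad=None)),
    ("fc.weight", SimpleNamespace(requires_grad=True, grad=[0.5])),
    ("fc.bias", SimpleNamespace(requires_grad=True, grad=[0.1])),
]
result = select_communication_tensors(params, wrap=lambda name, grad: (name, grad))
print(result)  # [('fc.weight', [0.5]), ('fc.bias', [0.1])]
```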
- need_reset(self)
- Returns
True if all initialization methods of the current algorithm should be called again. This is useful for algorithms that have multiple stages, where each stage needs different initializations.
- Return type
bool
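A two-stage algorithm, for example a warm-up phase followed by a main phase, could count steps in its post-optimizer-step hook and return True from need_reset exactly once, at the stage boundary. The sketch below is hypothetical: `TwoStageAlgorithm`, `warmup_steps`, and `stage` are illustrative names, and the loop that checks need_reset after each step stands in for wherever the framework actually performs that check.

```python
from typing import Any, Callable, List, Tuple


class TwoStageAlgorithm:
    """Hypothetical algorithm with a warm-up stage followed by a main stage."""

    def __init__(self, warmup_steps: int) -> None:
        self.warmup_steps = warmup_steps
        self.step_id = 0

    @property
    def stage(self) -> str:
        return "warmup" if self.step_id < self.warmup_steps else "main"

    def init_post_optimizer_step_hook(self, bagua_module) -> Callable[[Any], None]:
        def hook(optimizer: Any) -> None:
            self.step_id += 1
        return hook

    def need_reset(self) -> bool:
        # True only when entering the main stage, so the init_* methods run
        # again and can set up the stage-specific tensors and operations.
        return self.step_id == self.warmup_steps


algo = TwoStageAlgorithm(warmup_steps=2)
post_step = algo.init_post_optimizer_step_hook(bagua_module=None)
resets: List[Tuple[int, str]] = []
for _ in range(4):
    post_step(None)
    if algo.need_reset():  # hypothetical check point between steps
        resets.append((algo.step_id, algo.stage))
print(resets)  # [(2, 'main')]
```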
- tensors_to_buckets(self, tensors)
Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation follows the suggestion to do the bucketing.
- Parameters
tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped in different lists, representing Bagua’s suggestion on how to bucket the tensors.
- Returns
A list of Bagua buckets.
- Return type