bagua.torch_api.algorithms.base¶
Module Contents¶
- class bagua.torch_api.algorithms.base.Algorithm¶
This is the base class that all Bagua algorithms inherit.
- classmethod init(cls, name, **kwargs)¶
Helper method to initialize a registered Bagua algorithm.
- Parameters
name – Name of the registered Bagua algorithm.
kwargs – Arguments to initialize the registered Bagua algorithm.
- Returns
An instance of a registered Bagua algorithm.
- Return type
Algorithm
- Example:
>>> from bagua.torch_api.algorithms import Algorithm
>>> algorithm = Algorithm.init("gradient_allreduce", hierarchical=True)
Note
Call str(bagua.torch_api.algorithms.GlobalAlgorithmRegistry) to see all registered Bagua algorithms.
- reify(self, process_group)¶
Create an algorithm implementation instance. See AlgorithmImpl.
- Parameters
process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.
- Returns
An instance of Bagua algorithm implementation.
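The split between a configuration object (Algorithm) and its per-group implementation (AlgorithmImpl) can be sketched as below. The classes MyAlgorithm and MyAlgorithmImpl are illustrative stand-ins, not part of bagua; the real reify() receives a bagua.torch_api.communication.BaguaProcessGroup rather than a string.

```python
# Stand-ins for the real bagua base classes, for illustration only.
class MyAlgorithmImpl:
    def __init__(self, process_group, hierarchical=False):
        self.process_group = process_group
        self.hierarchical = hierarchical

class MyAlgorithm:
    def __init__(self, hierarchical=False):
        self.hierarchical = hierarchical

    def reify(self, process_group):
        # Bind this algorithm's configuration to a concrete process group.
        return MyAlgorithmImpl(process_group, hierarchical=self.hierarchical)

impl = MyAlgorithm(hierarchical=True).reify(process_group="group-0")
```

Keeping the configuration separate from the implementation lets one Algorithm object be reified against several process groups.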
- class bagua.torch_api.algorithms.base.AlgorithmImpl(process_group)¶
This is the base class that all Bagua algorithm implementations inherit.
It provides methods that can be overridden to implement different kinds of distributed algorithms.
- Parameters
process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.
- init_backward_hook(self, bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed when each parameter's gradient computation completes.
- Parameters
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns
A function that takes the name of a parameter (as in torch.nn.Module.named_parameters) and the parameter itself.
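A minimal sketch of the hook shape this method returns, with a plain list standing in for the work a real algorithm would do (the bagua_ddp argument and the parameter objects are stubbed out here):

```python
gradient_ready = []  # records which parameters have finished their gradients

def init_backward_hook(bagua_ddp):
    # bagua_ddp would be a BaguaDistributedDataParallel; unused in this sketch.
    def hook(parameter_name, parameter):
        # Called once per parameter when its gradient computation completes;
        # a real algorithm would mark the tensor ready for communication here.
        gradient_ready.append(parameter_name)
    return hook

hook = init_backward_hook(bagua_ddp=None)
hook("layer1.weight", None)
hook("layer1.bias", None)
```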
- init_forward_pre_hook(self, bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed before the forward pass.
- Parameters
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns
A function that takes the model’s input.
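The returned hook receives the model's input before each forward pass. A sketch with the BaguaDistributedDataParallel argument stubbed out:

```python
seen_inputs = []  # records the inputs seen before each forward pass

def init_forward_pre_hook(bagua_ddp):
    def hook(inputs):
        # Runs once before each forward pass; `inputs` is the model's input.
        # A real algorithm might synchronize state or validate inputs here.
        seen_inputs.append(inputs)
    return hook

hook = init_forward_pre_hook(bagua_ddp=None)
hook(("batch-0",))
```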
- init_operations(self, bagua_ddp, bucket)¶
Given a BaguaDistributedDataParallel and a BaguaBucket, register operations to be executed on the bucket.
- Parameters
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
bucket (bagua.torch_api.bucket.BaguaBucket) – A single bucket on which to register operations.
- init_post_backward_hook(self, bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed when the backward pass is done.
- Parameters
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns
A function that takes no argument.
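Unlike the per-parameter backward hook, this hook takes no argument and fires once per backward pass. A sketch with the BaguaDistributedDataParallel argument stubbed out:

```python
events = []  # records that the backward pass finished

def init_post_backward_hook(bagua_ddp):
    def hook():
        # Takes no argument; runs once after the whole backward pass.
        # A typical algorithm waits for outstanding communication here.
        events.append("backward_done")
    return hook

init_post_backward_hook(bagua_ddp=None)()
```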
- init_post_optimizer_step_hook(self, bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed when optimizer.step() is done.
- Parameters
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns
A function that gets called after an optimizer's step() method is called. The function takes the optimizer as its argument.
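A sketch of the hook shape, with FakeOptimizer standing in for a real torch.optim.Optimizer:

```python
stepped = []  # records which optimizer just finished step()

def init_post_optimizer_step_hook(bagua_ddp):
    def hook(optimizer):
        # Receives the optimizer whose step() just finished; an algorithm
        # could e.g. synchronize or quantize updated parameters here.
        stepped.append(type(optimizer).__name__)
    return hook

class FakeOptimizer:
    """Stand-in for a torch.optim.Optimizer, for illustration."""

init_post_optimizer_step_hook(bagua_ddp=None)(FakeOptimizer())
```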
- init_tensors(self, bagua_ddp)¶
Given a BaguaDistributedDataParallel, return Bagua tensors to be used by Bagua for later operations.
- Parameters
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns
A list of Bagua tensors for communication.
- Return type
List[bagua.torch_api.tensor.BaguaTensor]
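The selection logic can be sketched as below. The real method takes a BaguaDistributedDataParallel and returns BaguaTensor objects; here plain (name, value) pairs stand in for the module's named parameters, and names stand in for tensors.

```python
def init_tensors(named_parameters):
    # Sketch: select every parameter for later communication. A real
    # algorithm might instead pick a subset (e.g. only gradients, or only
    # tensors above a size threshold).
    return [name for name, _ in named_parameters]

tensors = init_tensors([("layer1.weight", None), ("layer1.bias", None)])
```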
- need_reset(self)¶
- Returns
True if all initialization methods of the current algorithm should be called again. This is useful for algorithms with multiple stages, where each stage needs a different initialization.
- Return type
bool
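A sketch of the multi-stage pattern the description refers to: a hypothetical two-stage algorithm that requests re-initialization exactly once, when its warmup stage ends (the class and its warmup_steps parameter are illustrative, not part of bagua).

```python
class TwoStageAlgorithmImpl:
    """Sketch of a two-stage algorithm: re-initialize once warmup ends."""
    def __init__(self, warmup_steps=3):
        self.warmup_steps = warmup_steps
        self.step = 0

    def need_reset(self):
        # Return True exactly once, on the step that ends the warmup stage,
        # so all init_* methods run again with the second stage's settings.
        self.step += 1
        return self.step == self.warmup_steps

impl = TwoStageAlgorithmImpl(warmup_steps=3)
flags = [impl.need_reset() for _ in range(4)]
```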
- tensors_to_buckets(self, tensors, do_flatten)¶
Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation follows the suggestion to do the bucketing.
- Parameters
tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped in different lists, representing Bagua's suggestion on how to bucket the tensors.
do_flatten (bool) – Whether to flatten the Bagua buckets.
- Returns
A list of Bagua buckets.
- Return type
List[bagua.torch_api.bucket.BaguaBucket]
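The default behavior described above, one bucket per suggested group, can be sketched as follows. FakeBucket is a stand-in for BaguaBucket, and strings stand in for BaguaTensor objects:

```python
class FakeBucket:
    """Stand-in for bagua.torch_api.bucket.BaguaBucket, for illustration."""
    def __init__(self, tensors, flatten):
        self.tensors = tensors
        self.flatten = flatten

def tensors_to_buckets(tensors, do_flatten):
    # Follow Bagua's suggestion: one bucket per suggested group, in order.
    # An override could instead merge or split groups to trade off latency
    # against overlap of communication and computation.
    return [FakeBucket(group, do_flatten) for group in tensors]

buckets = tensors_to_buckets([["w1", "b1"], ["w2"]], do_flatten=True)
```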