bagua.torch_api.algorithms¶
Submodules¶
Package Contents¶
- class bagua.torch_api.algorithms.Algorithm¶
This is the base class that all Bagua algorithms inherit.
- classmethod init(name, **kwargs)¶
Helper class to initialize a registered Bagua algorithm.
- Parameters:
name – Name of the registered Bagua algorithm.
kwargs – Arguments to initialize the registered Bagua algorithm.
- Returns:
An instance of a registered Bagua algorithm.
- Return type:
Algorithm
- Example::
>>> from bagua.torch_api.algorithms import Algorithm
>>> algorithm = Algorithm.init("gradient_allreduce", hierarchical=True)
Note
Call str(bagua.torch_api.algorithms.GlobalAlgorithmRegistry) to see all registered Bagua algorithms.
- reify(process_group)¶
Create an algorithm implementation instance. See AlgorithmImpl.
- Parameters:
process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.
- Returns:
An instance of Bagua algorithm implementation.
- class bagua.torch_api.algorithms.AlgorithmImpl(process_group)¶
This is the base class that all Bagua algorithm implementations inherit.
It provides methods that can be overridden to implement different kinds of distributed algorithms.
- Parameters:
process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.
- init_backward_hook(bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed on every parameter’s gradient computation completion.
- Parameters:
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns:
A function that takes the name of a parameter (as in torch.nn.Module.named_parameters) and the parameter itself.
- init_forward_pre_hook(bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed before the forward pass.
- Parameters:
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns:
A function that takes the model’s input.
- init_operations(bagua_ddp, bucket)¶
Given a BaguaDistributedDataParallel and a BaguaBucket, register operations to be executed on the bucket.
- Parameters:
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
bucket (bagua.torch_api.bucket.BaguaBucket) – A single bucket on which to register operations.
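A typical init_operations body clears any previously registered operations on the bucket, then appends the communication primitive the algorithm needs. The toy bucket below is a schematic stand-in: clear_ops mirrors the real BaguaBucket.clear_ops, while append_op stands in for Bagua's op-appending methods (the real bucket exposes specific methods such as append_centralized_synchronous_op rather than a generic append_op).

```python
# Schematic stand-in for registering operations on a bucket (illustrative;
# a real implementation calls the corresponding BaguaBucket methods).

class ToyBucket:
    def __init__(self):
        self.ops = []

    def clear_ops(self):
        # Drop operations left over from a previous initialization.
        self.ops = []
        return self

    def append_op(self, name, **options):
        # Record an operation to run on this bucket during communication.
        self.ops.append((name, options))
        return self


def init_operations(bagua_ddp, bucket):
    # Start from a clean slate, then register one allreduce per bucket.
    bucket.clear_ops()
    bucket.append_op("centralized_synchronous", hierarchical=True, average=True)


bucket = ToyBucket()
init_operations(bagua_ddp=None, bucket=bucket)
```

Clearing first matters for multi-stage algorithms: when need_reset triggers re-initialization, init_operations runs again on the same buckets.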
- init_post_backward_hook(bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed when the backward pass is done.
- Parameters:
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns:
A function that takes no arguments.
- init_post_optimizer_step_hook(bagua_ddp)¶
Given a BaguaDistributedDataParallel, return a hook function that will be executed when optimizer.step() is done.
- Parameters:
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns:
A function that gets called after an optimizer’s step() method is called. The function takes the optimizer as its argument.
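Taken together, the four hooks above fire in a fixed order within one training iteration: the forward pre-hook, then the per-parameter backward hook as each gradient finishes, then the post-backward hook, then the post-optimizer-step hook. The trace below is a schematic simulation of that ordering with stub hooks; real hooks are produced by an AlgorithmImpl and receive real modules, parameters, and optimizers.

```python
# Schematic trace of the hook call order in one training iteration
# (illustrative; the hooks here only record that they were called).

calls = []

def forward_pre_hook(inputs):
    # Returned by init_forward_pre_hook: takes the model's input.
    calls.append("forward_pre")

def backward_hook(name, parameter):
    # Returned by init_backward_hook: takes a parameter name and the parameter.
    calls.append("backward:" + name)

def post_backward_hook():
    # Returned by init_post_backward_hook: takes no arguments.
    calls.append("post_backward")

def post_optimizer_step_hook(optimizer):
    # Returned by init_post_optimizer_step_hook: takes the optimizer.
    calls.append("post_optimizer_step")

# One simulated iteration over a model with two parameters.
forward_pre_hook(inputs=("x",))        # before the forward pass
for name in ["fc.weight", "fc.bias"]:  # as each gradient completes
    backward_hook(name, parameter=None)
post_backward_hook()                   # backward pass fully done
post_optimizer_step_hook(optimizer=None)  # after optimizer.step()
```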
- init_tensors(bagua_ddp)¶
Given a BaguaDistributedDataParallel, return the Bagua tensors to be used by Bagua in later operations.
- Parameters:
bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – bagua.torch_api.data_parallel.BaguaDistributedDataParallel.
- Returns:
A list of Bagua tensors for communication.
- Return type:
List[bagua.torch_api.tensor.BaguaTensor]
- need_reset()¶
- Returns:
True if all initialization methods of the current algorithm should be called again. This is useful for algorithms that have multiple stages, where each stage needs different initializations.
- Return type:
bool
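A common use of need_reset is a warmup stage followed by a main stage (for example, communicating exact gradients for the first steps before switching to a cheaper scheme). The sketch below is illustrative only: a real implementation would subclass AlgorithmImpl, and the reset would cause Bagua to rerun the init_* methods with stage-appropriate behavior.

```python
# Sketch of a two-stage algorithm driven by need_reset (illustrative;
# a real AlgorithmImpl subclass would re-register tensors, buckets, and
# operations when Bagua observes need_reset() returning True).

class TwoStageImpl:
    def __init__(self, warmup_steps=3):
        self.warmup_steps = warmup_steps
        self.step_count = 0
        self.stage = "warmup"

    def need_reset(self):
        # Return True exactly once, at the boundary between the warmup
        # stage and the main stage, to trigger re-initialization.
        if self.stage == "warmup" and self.step_count >= self.warmup_steps:
            self.stage = "main"
            return True
        return False


impl = TwoStageImpl(warmup_steps=3)
resets = []
for _ in range(6):
    impl.step_count += 1
    resets.append(impl.need_reset())
# resets records a single True at the stage boundary (step 3).
```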
- tensors_to_buckets(tensors, do_flatten)¶
Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation follows the suggestion as given.
- Parameters:
tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped in different lists, representing Bagua’s suggestion on how to bucket the tensors.
do_flatten (bool) – Whether to flatten the Bagua buckets.
- Returns:
A list of Bagua buckets.
- Return type:
List[bagua.torch_api.bucket.BaguaBucket]
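The default behavior can be sketched as a one-to-one mapping from each suggested group to one bucket. The toy classes and tensor names below are illustrative stand-ins; the real method constructs bagua.torch_api.bucket.BaguaBucket objects from BaguaTensor instances.

```python
# Sketch of the default tensors_to_buckets behavior: follow Bagua's
# grouping suggestion, turning each inner list into one bucket
# (illustrative stand-in, not the real BaguaBucket class).

class ToyBucket:
    def __init__(self, tensors, flatten):
        self.tensors = tensors
        # With flatten=True, a real BaguaBucket copies its tensors into one
        # contiguous memory region so they can be communicated in a single op.
        self.flatten = flatten


def tensors_to_buckets(tensors, do_flatten):
    # One bucket per suggested group, preserving the suggested order.
    return [ToyBucket(group, do_flatten) for group in tensors]


# Bagua's grouping hint: two buckets, one per layer (names are made up).
suggestion = [["fc1.weight", "fc1.bias"], ["fc2.weight"]]
buckets = tensors_to_buckets(suggestion, do_flatten=True)
```

Overriding this method lets an algorithm ignore the suggestion entirely, e.g. to put all tensors into a single bucket for one fused communication call.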
- bagua.torch_api.algorithms.GlobalAlgorithmRegistry¶