bagua.torch_api.algorithms.base

Module Contents

class bagua.torch_api.algorithms.base.Algorithm

This is the base class that all Bagua algorithms inherit.

classmethod init(name, **kwargs)

Helper classmethod to initialize a registered Bagua algorithm by name.

Parameters
  • name – Name of the registered Bagua algorithm.

  • kwargs – Arguments to initialize the registered Bagua algorithm.

Returns

An instance of a registered Bagua algorithm.

Return type

Algorithm

Example:
>>> from bagua.torch_api.algorithms import Algorithm
>>> algorithm = Algorithm.init("gradient_allreduce", hierarchical=True)

Note

Call str(bagua.torch_api.algorithms.GlobalAlgorithmRegistry) to see all registered Bagua algorithms.
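The name-based lookup behind Algorithm.init can be sketched with a toy registry. Everything below (ToyRegistry, ToyGradientAllReduce, the module-level init function) is a hypothetical stand-in, not Bagua's actual registry API:

```python
# A minimal sketch of a name-based algorithm registry, mimicking the
# dispatch pattern behind Algorithm.init. All names are hypothetical.

class ToyRegistry:
    def __init__(self):
        self._algorithms = {}

    def register(self, name, cls):
        self._algorithms[name] = cls

    def get(self, name):
        return self._algorithms[name]

    def __str__(self):
        # Analogous to str(GlobalAlgorithmRegistry): list what is registered.
        return "registered algorithms: " + ", ".join(sorted(self._algorithms))


REGISTRY = ToyRegistry()


class ToyGradientAllReduce:
    def __init__(self, hierarchical=False):
        self.hierarchical = hierarchical


REGISTRY.register("gradient_allreduce", ToyGradientAllReduce)


def init(name, **kwargs):
    # Look up the registered class by name and instantiate it with kwargs,
    # which is the shape of what Algorithm.init does for real algorithms.
    return REGISTRY.get(name)(**kwargs)


algo = init("gradient_allreduce", hierarchical=True)
```

The point of the indirection is that callers select an algorithm by string name (e.g. from a config file) without importing its class directly.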

reify(process_group)

Create an algorithm implementation instance. See AlgorithmImpl.

Parameters

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.

Returns

An instance of Bagua algorithm implementation.
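The split between an Algorithm (configuration only) and its AlgorithmImpl (a concrete instance bound to a process group) can be sketched with toy classes; the names here are hypothetical stand-ins, not Bagua's internals:

```python
# Sketch of the two-phase Algorithm -> AlgorithmImpl pattern.
# ToyAlgorithm and ToyAlgorithmImpl are hypothetical stand-ins.

class ToyAlgorithmImpl:
    # Bound to a concrete process group; this is where runtime state lives.
    def __init__(self, process_group):
        self.process_group = process_group


class ToyAlgorithm:
    # Holds only configuration; reify() binds it to a process group,
    # producing a fresh implementation instance each time.
    def reify(self, process_group):
        return ToyAlgorithmImpl(process_group)


impl = ToyAlgorithm().reify(process_group="group-0")
```

Keeping configuration separate from the bound implementation lets one Algorithm object be reified against different process groups.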

class bagua.torch_api.algorithms.base.AlgorithmImpl(process_group)

This is the base class that all Bagua algorithm implementations inherit.

It provides methods that can be overridden to implement different kinds of distributed algorithms.

Parameters

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.

init_backward_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed each time a parameter’s gradient computation completes.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that takes the name of a parameter (as in torch.nn.Module.named_parameters) and the parameter itself.
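The shape of the returned hook — a callable taking a parameter’s name (as yielded by torch.nn.Module.named_parameters) and the parameter itself — can be illustrated without Bagua. The factory below is a hypothetical stand-in, not the real init_backward_hook:

```python
# Sketch of a backward hook with the (name, parameter) signature
# described above; make_backward_hook is a hypothetical stand-in.

def make_backward_hook(seen):
    def hook(name, parameter):
        # In a real algorithm this is where per-parameter communication
        # (e.g. marking the gradient ready) would be scheduled.
        seen.append(name)
    return hook


seen = []
hook = make_backward_hook(seen)

# Simulate gradient-completion events for two named parameters.
hook("layer1.weight", object())
hook("layer1.bias", object())
```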

init_forward_pre_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed before the forward process.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that takes the model’s input.

init_operations(bagua_ddp, bucket)

Given a BaguaDistributedDataParallel, and a BaguaBucket, register operations to be executed on the bucket.

Parameters
  • bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

  • bucket (bagua.torch_api.bucket.BaguaBucket) – A single bucket to register operations.

init_post_backward_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed when the backward pass is done.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that takes no argument.

init_post_optimizer_step_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed when the optimizer.step() is done.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that gets called after an optimizer’s step() method is called. The function takes the optimizer as its argument.
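How such a hook would be invoked around an optimizer step can be sketched with a toy optimizer (all names hypothetical; the real framework, not user code, calls the hook):

```python
# Sketch of a post-optimizer-step hook. ToyOptimizer and
# post_step_hook are hypothetical stand-ins.

class ToyOptimizer:
    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def post_step_hook(optimizer):
    # Receives the optimizer as its argument, as described above.
    # A real algorithm might synchronize or rescale state here.
    optimizer.stepped_flag = True


opt = ToyOptimizer()
opt.step()
post_step_hook(opt)  # the framework would invoke this after step()
```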

init_tensors(bagua_ddp)

Given a BaguaDistributedDataParallel, return Bagua tensors to be used in Bagua for later operations.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance whose tensors will be registered with Bagua.

Returns

A list of Bagua tensors for communication.

Return type

List[bagua.torch_api.tensor.BaguaTensor]

need_reset()

Returns

True if all initialization methods of the current algorithm should be called again. This is useful for algorithms that have multiple stages where each stage needs different initializations.

Return type

bool
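A multi-stage algorithm might, for example, switch behavior after a warm-up period and signal a reset exactly once at the stage boundary. A toy sketch (class and fields are hypothetical, not Bagua's):

```python
# Sketch of need_reset() for a hypothetical two-stage algorithm:
# stage 0 is warm-up, stage 1 is the main phase. need_reset()
# returns True exactly once, when the stage changes, so that all
# initialization methods run again for the new stage.

class ToyTwoStageImpl:
    def __init__(self, warmup_steps):
        self.warmup_steps = warmup_steps
        self.step_count = 0
        self.last_stage = 0

    def current_stage(self):
        return 0 if self.step_count < self.warmup_steps else 1

    def need_reset(self):
        stage = self.current_stage()
        if stage != self.last_stage:
            self.last_stage = stage
            return True
        return False


impl = ToyTwoStageImpl(warmup_steps=2)
resets = []
for _ in range(4):
    impl.step_count += 1
    resets.append(impl.need_reset())
# resets is [False, True, False, False]: one reset at the boundary.
```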

tensors_to_buckets(tensors, do_flatten)

Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation simply follows the suggestion.

Parameters
  • tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped in different lists, representing Bagua’s suggestion on how to bucket the tensors.

  • do_flatten (bool) – Whether to flatten the Bagua buckets.

Returns

A list of Bagua buckets.

Return type

List[bagua.torch_api.bucket.BaguaBucket]
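The default behavior — turn each suggested group of tensors into one bucket — can be sketched with plain lists standing in for BaguaTensor and with a hypothetical ToyBucket in place of BaguaBucket:

```python
# Sketch of the default tensors_to_buckets behavior: one bucket per
# suggested group. ToyBucket is a hypothetical stand-in for BaguaBucket.

class ToyBucket:
    def __init__(self, tensors, flatten):
        self.tensors = list(tensors)
        self.flatten = flatten


def tensors_to_buckets(tensors, do_flatten):
    # Follow the suggestion as-is: each inner list becomes a bucket.
    # An overriding algorithm could instead merge or split the groups.
    return [ToyBucket(group, do_flatten) for group in tensors]


suggestion = [["w1", "b1"], ["w2"]]  # strings stand in for tensors
buckets = tensors_to_buckets(suggestion, do_flatten=True)
```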