bagua.torch_api.algorithms

Package Contents

class bagua.torch_api.algorithms.Algorithm

This is the base class that all Bagua algorithms inherit.

reify(self, process_group)

Create an algorithm instance.

Parameters

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.
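As a rough illustration of how reify might be used, the sketch below assumes that it returns an AlgorithmImpl bound to the given process group, which the AlgorithmImpl(process_group) constructor suggests. FakeProcessGroup, MyAlgorithm, and MyAlgorithmImpl are hypothetical names, and the Algorithm and AlgorithmImpl classes here are stand-ins so the example runs without Bagua installed.

```python
# Stand-in classes only: they mimic the shape of the API described here
# and are not the real Bagua implementations.

class FakeProcessGroup:
    """Hypothetical placeholder for BaguaProcessGroup."""

    def __init__(self, ranks):
        self.ranks = ranks


class AlgorithmImpl:
    """Stand-in for the implementation base class."""

    def __init__(self, process_group):
        self.process_group = process_group


class Algorithm:
    """Stand-in for the algorithm base class."""

    def reify(self, process_group):
        raise NotImplementedError


class MyAlgorithmImpl(AlgorithmImpl):
    """Hypothetical implementation that carries one user option."""

    def __init__(self, process_group, average):
        super().__init__(process_group)
        self.average = average


class MyAlgorithm(Algorithm):
    """Holds user-facing options; reify() binds them to a process group."""

    def __init__(self, average=True):
        self.average = average

    def reify(self, process_group):
        return MyAlgorithmImpl(process_group, average=self.average)


group = FakeProcessGroup(ranks=[0, 1])
impl = MyAlgorithm(average=False).reify(group)
```

With this split, the user-facing object only stores options, and everything that depends on a process group lives in the implementation object.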

class bagua.torch_api.algorithms.AlgorithmImpl(process_group)

This is the base class that all Bagua algorithm implementations inherit.

It provides methods that can be overridden to implement different kinds of distributed algorithms.

Parameters

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.

init_backward_hook(self, bagua_module)

Given a BaguaModule, return a hook function that will be executed each time a parameter’s gradient computation completes.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua method.

Returns

A function that takes the name of a parameter (as in torch.nn.Module.named_parameters) and the parameter itself.
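The hook-factory pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Bagua's implementation: FakeParameter, FakeModule, and RecordingImpl are hypothetical stand-ins, and the loop at the end only simulates gradients becoming ready in reverse parameter order.

```python
class FakeParameter:
    """Hypothetical stand-in for a torch parameter."""

    def __init__(self, grad):
        self.grad = grad


class FakeModule:
    """Hypothetical stand-in for a BaguaModule."""

    def __init__(self, params):
        self._params = params

    def named_parameters(self):
        return list(self._params.items())


class RecordingImpl:
    """Records the order in which parameter gradients become ready."""

    def __init__(self):
        self.ready = []

    def init_backward_hook(self, bagua_module):
        # The returned closure receives the parameter name and the parameter.
        def hook(parameter_name, parameter):
            self.ready.append(parameter_name)

        return hook


module = FakeModule({"fc.weight": FakeParameter(0.5), "fc.bias": FakeParameter(0.1)})
impl = RecordingImpl()
hook = impl.init_backward_hook(module)

# Simulate a backward pass in which gradients finish in reverse order.
for name, param in reversed(module.named_parameters()):
    hook(name, param)
```

Returning a closure lets the hook keep a reference to the implementation object, so state such as the list of ready parameters persists across calls.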

init_forward_pre_hook(self, bagua_module)

Given a BaguaModule, return a hook function that will be executed before the forward process.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua method.

Returns

A function that takes the model’s input.

init_operations(self, bagua_module, bucket)

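Given a BaguaModule and a BaguaBucket, register operations to be executed on the bucket.

Parameters

  • bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua method.

  • bucket (bagua.torch_api.bucket.BaguaBucket) – The bucket on which to register operations.

init_post_backward_hook(self, bagua_module)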

Given a BaguaModule, return a hook function that will be executed when the backward pass is done.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua method.

Returns

A function that takes no argument.

init_post_optimizer_step_hook(self, bagua_module)

Given a BaguaModule, return a hook function that will be executed after optimizer.step() is done.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua method.

Returns

A function that gets called after an optimizer’s step() method is called. The function takes the optimizer as its argument.
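To show how the four hook factories relate to one training step, here is a small simulation. It assumes the order implied by their descriptions (before forward, per-parameter gradient completion, end of backward, after optimizer.step()). TracingImpl and simulate_training_step are hypothetical; in practice Bagua schedules the hooks itself.

```python
class TracingImpl:
    """Hypothetical implementation that logs when each hook fires."""

    def __init__(self):
        self.trace = []

    def init_forward_pre_hook(self, bagua_module):
        def hook(model_input):
            self.trace.append("forward_pre")

        return hook

    def init_backward_hook(self, bagua_module):
        def hook(parameter_name, parameter):
            self.trace.append("backward:" + parameter_name)

        return hook

    def init_post_backward_hook(self, bagua_module):
        def hook():
            self.trace.append("post_backward")

        return hook

    def init_post_optimizer_step_hook(self, bagua_module):
        def hook(optimizer):
            self.trace.append("post_optimizer_step")

        return hook


def simulate_training_step(impl, module, parameter_names, optimizer):
    """Hypothetical driver that calls the hooks in the assumed order."""
    forward_pre = impl.init_forward_pre_hook(module)
    backward = impl.init_backward_hook(module)
    post_backward = impl.init_post_backward_hook(module)
    post_step = impl.init_post_optimizer_step_hook(module)

    forward_pre(("batch",))       # before the forward pass
    for name in parameter_names:  # once per parameter gradient
        backward(name, None)
    post_backward()               # backward pass finished
    post_step(optimizer)          # optimizer.step() finished


impl = TracingImpl()
simulate_training_step(
    impl, module=None, parameter_names=["w", "b"], optimizer=object()
)
```

Each factory is called once to build its hook, and the hooks are then invoked with the argument shapes documented above: the model input, a name and parameter pair, no argument, and the optimizer.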

init_tensors(self, bagua_module)

Given a BaguaModule, return Bagua tensors to be used in Bagua for later operations.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua method.

Returns

A list of Bagua tensors for communication.

Return type

List[bagua.torch_api.tensor.BaguaTensor]

need_reset(self)

Returns

True if all initialization methods of the current algorithm should be called again. This is useful for algorithms that have multiple stages where each stage needs different initializations.

Return type

bool
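A multi-stage algorithm could use need_reset roughly as sketched below. TwoStageImpl, its step counter, and the run driver are hypothetical. The sketch only assumes that returning True causes the initialization methods (represented here by init_tensors) to be called again.

```python
class TwoStageImpl:
    """Hypothetical algorithm with a warm-up stage and a main stage."""

    def __init__(self, warmup_steps):
        self.warmup_steps = warmup_steps
        self.step = 0
        self.init_calls = 0

    def need_reset(self):
        # Ask for re-initialization exactly once, at the stage boundary.
        return self.step == self.warmup_steps

    def init_tensors(self, bagua_module):
        # Stands in for all initialization methods of the algorithm.
        self.init_calls += 1
        return []


def run(impl, num_steps):
    """Hypothetical driver that re-initializes whenever need_reset() is True."""
    impl.init_tensors(None)  # initial setup
    for _ in range(num_steps):
        if impl.need_reset():
            impl.init_tensors(None)
        impl.step += 1


impl = TwoStageImpl(warmup_steps=3)
run(impl, num_steps=5)
```

Here the initialization runs twice in total: once at startup and once when the step counter reaches the end of the warm-up stage.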

tensors_to_buckets(self, tensors, do_flatten)

Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation follows the suggestion to do the bucketing.

Parameters

  • tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped in different lists, representing Bagua’s suggestion on how to bucket the tensors.

  • do_flatten (bool) – Whether to flatten the Bagua buckets.

Returns

A list of Bagua buckets.

Return type

List[bagua.torch_api.bucket.BaguaBucket]
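The difference between following the bucketing suggestion and overriding it can be sketched with plain lists. FakeBucket and its constructor arguments are hypothetical stand-ins for BaguaBucket, and strings stand in for Bagua tensors. The first function mirrors the documented default behavior (one bucket per suggested group), and the second shows one possible override that merges everything into a single bucket.

```python
class FakeBucket:
    """Hypothetical stand-in for BaguaBucket."""

    def __init__(self, tensors, name, flatten):
        self.tensors = tensors
        self.name = name
        self.flatten = flatten


def follow_suggestion(tensors, do_flatten):
    """One bucket per suggested group, as the default is described to do."""
    return [
        FakeBucket(group, name=str(index), flatten=do_flatten)
        for index, group in enumerate(tensors)
    ]


def single_bucket(tensors, do_flatten):
    """Alternative strategy: ignore the grouping and use one bucket."""
    all_tensors = [tensor for group in tensors for tensor in group]
    return [FakeBucket(all_tensors, name="all", flatten=do_flatten)]


# Strings stand in for Bagua tensors in this sketch.
suggestion = [["w1", "b1"], ["w2"]]
default_buckets = follow_suggestion(suggestion, do_flatten=True)
merged_buckets = single_bucket(suggestion, do_flatten=True)
```

An algorithm that communicates all parameters at once might prefer the single-bucket variant, while the default keeps the grouping that Bagua suggests.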