bagua.torch_api.algorithms.base

Module Contents

class bagua.torch_api.algorithms.base.Algorithm

This is the base class that all Bagua algorithms inherit.

It provides methods that can be overridden to implement different kinds of distributed algorithms.

init_backward_hook(self, bagua_module)

Given a BaguaModule, return a hook function that will be executed each time a parameter’s gradient computation completes.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.

Returns

A function that takes the name of a parameter (as in torch.nn.Module.named_parameters()) and the parameter itself.
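The hook-factory pattern described above can be sketched without Bagua installed. This is a minimal illustration, not Bagua's implementation: `RecordingAlgorithm` is a hypothetical stand-in class, and the loop at the bottom only simulates the framework invoking the hook.

```python
from typing import Callable, List


class RecordingAlgorithm:
    """Hypothetical stand-in for an Algorithm subclass (does not import Bagua)."""

    def __init__(self) -> None:
        # Names of parameters whose gradients have been reported as ready.
        self.ready: List[str] = []

    def init_backward_hook(self, bagua_module: object) -> Callable[[str, object], None]:
        def hook(parameter_name: str, parameter: object) -> None:
            # Invoked once per parameter, after that parameter's gradient is computed.
            self.ready.append(parameter_name)

        return hook


algorithm = RecordingAlgorithm()
# A real call would receive a BaguaModule; None is enough for this simulation.
hook = algorithm.init_backward_hook(bagua_module=None)
for name in ["fc.weight", "fc.bias"]:
    hook(name, object())  # simulate the framework firing the hook
```

The returned closure keeps a reference to the algorithm instance, so per-parameter state can accumulate across calls within one backward pass.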

init_forward_pre_hook(self, bagua_module)

Given a BaguaModule, return a hook function that will be executed before the forward process.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.

Returns

A function that takes the model’s input.

init_operations(self, bagua_module, bucket)

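Given a BaguaModule and a Bagua bucket, register operations to be executed on the bucket.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.

bucket (bagua.torch_api.bucket.BaguaBucket) – The bucket on which to register operations.

init_post_backward_hook(self, bagua_module)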

Given a BaguaModule, return a hook function that will be executed when the backward pass is done.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.

Returns

A function that takes no argument.
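Because the post-backward hook takes no arguments, any state it needs has to live on the algorithm object. The sketch below assumes a hypothetical algorithm that acts only on every k-th backward pass; `EveryKBackwardPasses` and its attributes are illustrative names, not part of Bagua.

```python
from typing import Callable, List


class EveryKBackwardPasses:
    """Hypothetical stand-in: records a sync point on every k-th backward pass."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.passes = 0
        self.sync_points: List[int] = []

    def init_post_backward_hook(self, bagua_module: object) -> Callable[[], None]:
        def hook() -> None:
            # Invoked once after each complete backward pass.
            self.passes += 1
            if self.passes % self.k == 0:
                self.sync_points.append(self.passes)

        return hook


algorithm = EveryKBackwardPasses(k=2)
hook = algorithm.init_post_backward_hook(bagua_module=None)
for _ in range(5):
    hook()  # simulate five finished backward passes
```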

init_post_optimizer_step_hook(self, bagua_module)

Given a BaguaModule, return a hook function that will be executed after optimizer.step() completes.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.

Returns

A function that takes the optimizer on which step() was called.

init_tensors(self, bagua_module)

Given a BaguaModule, return Bagua tensors to be used in Bagua for later operations.

Parameters

bagua_module (bagua.torch_api.distributed.BaguaModule) – A PyTorch module initialized by the with_bagua(...) method.

Returns

A list of Bagua tensors for communication.

Return type

List[bagua.torch_api.tensor.BaguaTensor]

need_reset(self)

Returns

True if all initialization methods of the current algorithm should be called again. This is useful for algorithms that have multiple stages where each stage needs different initializations.

Return type

bool
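A multi-stage algorithm might signal a reset at the step where it switches stages. The sketch below is a simulation under stated assumptions: `TwoStageAlgorithm` and `warmup_steps` are hypothetical, and the loop assumes need_reset() is polled once per training step, which this section does not specify.

```python
from typing import Callable, List


class TwoStageAlgorithm:
    """Hypothetical stand-in: a warm-up stage followed by a second stage."""

    def __init__(self, warmup_steps: int) -> None:
        self.warmup_steps = warmup_steps
        self.step = 0

    def need_reset(self) -> bool:
        # Ask for re-initialization exactly once, at the stage boundary.
        return self.step == self.warmup_steps

    def init_post_optimizer_step_hook(self, bagua_module: object) -> Callable[[object], None]:
        def hook(optimizer: object) -> None:
            # Count completed optimizer steps to track the current stage.
            self.step += 1

        return hook


algorithm = TwoStageAlgorithm(warmup_steps=2)
hook = algorithm.init_post_optimizer_step_hook(bagua_module=None)
resets: List[bool] = []
for _ in range(4):
    resets.append(algorithm.need_reset())  # assumed: polled once per step
    hook(None)  # simulate optimizer.step() finishing
```

Here the reset is requested only on the third iteration, after two warm-up steps have completed.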

tensors_to_buckets(self, tensors)

Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation follows the suggestion to do the bucketing.

Parameters

tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped in different lists, representing Bagua’s suggestion on how to bucket the tensors.

Returns

A list of Bagua buckets.

Return type

List[bagua.torch_api.bucket.BaguaBucket]
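The difference between following the suggestion and overriding it can be illustrated with plain Python. In this sketch `FakeBucket` is a hypothetical stand-in for a Bagua bucket, tensors are represented by their names, and neither function reflects Bagua's actual bucket constructor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeBucket:
    """Hypothetical stand-in for a Bagua bucket: a named group of tensors."""

    name: str
    tensors: List[str]


def follow_suggestion(tensors: List[List[str]]) -> List[FakeBucket]:
    # One bucket per suggested group, mirroring the default behavior described above.
    return [FakeBucket(name=str(i), tensors=list(group)) for i, group in enumerate(tensors)]


def single_bucket(tensors: List[List[str]]) -> List[FakeBucket]:
    # An override may ignore the suggestion, for example by merging every group.
    merged = [tensor for group in tensors for tensor in group]
    return [FakeBucket(name="all", tensors=merged)]


suggestion = [["conv.weight", "conv.bias"], ["fc.weight"]]
default_buckets = follow_suggestion(suggestion)
merged_buckets = single_bucket(suggestion)
```

Either function keeps every tensor in exactly one bucket; only the grouping changes.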