bagua.torch_api.algorithms.base

Module Contents

class bagua.torch_api.algorithms.base.Algorithm

This is the base class that all Bagua algorithms inherit.

classmethod init(name, **kwargs)

Helper classmethod to initialize a registered Bagua algorithm by name.

Parameters
  • name – Name of the registered Bagua algorithm.

  • kwargs – Arguments to initialize the registered Bagua algorithm.

Returns

An instance of a registered Bagua algorithm.

Return type

Algorithm

Example:
>>> from bagua.torch_api.algorithms import Algorithm
>>> algorithm = Algorithm.init("gradient_allreduce", hierarchical=True)

Note

Call str(bagua.torch_api.algorithms.GlobalAlgorithmRegistry) to see all registered Bagua algorithms.
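The name-based lookup behind Algorithm.init can be sketched with a toy registry. Everything below (ToyRegistry, ToyGradientAllReduce, the module-level init function) is a hypothetical stand-in, not Bagua's actual registry API:

```python
# A minimal sketch of a name-based algorithm registry, mimicking the
# dispatch pattern behind Algorithm.init. All names are hypothetical.

class ToyRegistry:
    def __init__(self):
        self._algorithms = {}

    def register(self, name, cls):
        self._algorithms[name] = cls

    def get(self, name):
        return self._algorithms[name]

    def __str__(self):
        # Analogous to str(GlobalAlgorithmRegistry): list what is registered.
        return "registered algorithms: " + ", ".join(sorted(self._algorithms))


REGISTRY = ToyRegistry()


class ToyGradientAllReduce:
    def __init__(self, hierarchical=False):
        self.hierarchical = hierarchical


REGISTRY.register("gradient_allreduce", ToyGradientAllReduce)


def init(name, **kwargs):
    # Look up the registered class by name and instantiate it with kwargs,
    # which is the shape of what Algorithm.init does for real algorithms.
    return REGISTRY.get(name)(**kwargs)


algo = init("gradient_allreduce", hierarchical=True)
```

The point of the indirection is that callers select an algorithm by string name (e.g. from a config file) without importing its class directly.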

reify(process_group)

Create an algorithm implementation instance. See AlgorithmImpl.

Parameters

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.

Returns

An instance of Bagua algorithm implementation.
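The split between an Algorithm (configuration only) and its AlgorithmImpl (a concrete instance bound to a process group) can be sketched with toy classes; the names here are hypothetical stand-ins, not Bagua's internals:

```python
# Sketch of the two-phase Algorithm -> AlgorithmImpl pattern.
# ToyAlgorithm and ToyAlgorithmImpl are hypothetical stand-ins.

class ToyAlgorithmImpl:
    # Bound to a concrete process group; this is where runtime state lives.
    def __init__(self, process_group):
        self.process_group = process_group


class ToyAlgorithm:
    # Holds only configuration; reify() binds it to a process group,
    # producing a fresh implementation instance each time.
    def reify(self, process_group):
        return ToyAlgorithmImpl(process_group)


impl = ToyAlgorithm().reify(process_group="group-0")
```

Keeping configuration separate from the bound implementation lets one Algorithm object be reified against different process groups.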

class bagua.torch_api.algorithms.base.AlgorithmImpl(process_group)

This is the base class that all Bagua algorithm implementations inherit.

It provides methods that can be overridden to implement different kinds of distributed algorithms.

Parameters

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.

init_backward_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed each time a parameter’s gradient computation completes.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that takes the name of a parameter (as in torch.nn.Module.named_parameters) and the parameter itself.
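The shape of the returned hook — a callable taking a parameter’s name (as yielded by torch.nn.Module.named_parameters) and the parameter itself — can be illustrated without Bagua. The factory below is a hypothetical stand-in, not the real init_backward_hook:

```python
# Sketch of a backward hook with the (name, parameter) signature
# described above; make_backward_hook is a hypothetical stand-in.

def make_backward_hook(seen):
    def hook(name, parameter):
        # In a real algorithm this is where per-parameter communication
        # (e.g. marking the gradient ready) would be scheduled.
        seen.append(name)
    return hook


seen = []
hook = make_backward_hook(seen)

# Simulate gradient-completion events for two named parameters.
hook("layer1.weight", object())
hook("layer1.bias", object())
```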

init_forward_pre_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed before the forward process.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that takes the model’s input.

init_operations(bagua_ddp, bucket)

Given a BaguaDistributedDataParallel, and a BaguaBucket, register operations to be executed on the bucket.

Parameters
  • bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

  • bucket (bagua.torch_api.bucket.BaguaBucket) – A single bucket to register operations.

init_post_backward_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed when the backward pass is done.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that takes no argument.

init_post_optimizer_step_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed when the optimizer.step() is done.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to hook into.

Returns

A function that gets called after an optimizer’s step() method is called. The function takes the optimizer as its argument.
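How such a hook would be invoked around an optimizer step can be sketched with a toy optimizer (all names hypothetical; the real framework, not user code, calls the hook):

```python
# Sketch of a post-optimizer-step hook. ToyOptimizer and
# post_step_hook are hypothetical stand-ins.

class ToyOptimizer:
    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def post_step_hook(optimizer):
    # Receives the optimizer as its argument, as described above.
    # A real algorithm might synchronize or rescale state here.
    optimizer.stepped_flag = True


opt = ToyOptimizer()
opt.step()
post_step_hook(opt)  # the framework would invoke this after step()
```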

init_tensors(bagua_ddp)

Given a BaguaDistributedDataParallel, return Bagua tensors to be used in Bagua for later operations.

Parameters

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance whose tensors will be registered with Bagua.

Returns

A list of Bagua tensors for communication.

Return type

List[bagua.torch_api.tensor.BaguaTensor]

need_reset()

Returns

True if all initialization methods of the current algorithm should be called again. This is useful for algorithms that have multiple stages where each stage needs different initializations.

Return type

bool
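A multi-stage algorithm might, for example, switch behavior after a warm-up period and signal a reset exactly once at the stage boundary. A toy sketch (class and fields are hypothetical, not Bagua's):

```python
# Sketch of need_reset() for a hypothetical two-stage algorithm:
# stage 0 is warm-up, stage 1 is the main phase. need_reset()
# returns True exactly once, when the stage changes, so that all
# initialization methods run again for the new stage.

class ToyTwoStageImpl:
    def __init__(self, warmup_steps):
        self.warmup_steps = warmup_steps
        self.step_count = 0
        self.last_stage = 0

    def current_stage(self):
        return 0 if self.step_count < self.warmup_steps else 1

    def need_reset(self):
        stage = self.current_stage()
        if stage != self.last_stage:
            self.last_stage = stage
            return True
        return False


impl = ToyTwoStageImpl(warmup_steps=2)
resets = []
for _ in range(4):
    impl.step_count += 1
    resets.append(impl.need_reset())
# resets is [False, True, False, False]: one reset at the boundary.
```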

tensors_to_buckets(tensors, do_flatten)

Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation simply follows the suggestion.

Parameters
  • tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped in different lists, representing Bagua’s suggestion on how to bucket the tensors.

  • do_flatten (bool) – Whether to flatten the Bagua buckets.

Returns

A list of Bagua buckets.

Return type

List[bagua.torch_api.bucket.BaguaBucket]
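The default behavior — turn each suggested group of tensors into one bucket — can be sketched with plain lists standing in for BaguaTensor and with a hypothetical ToyBucket in place of BaguaBucket:

```python
# Sketch of the default tensors_to_buckets behavior: one bucket per
# suggested group. ToyBucket is a hypothetical stand-in for BaguaBucket.

class ToyBucket:
    def __init__(self, tensors, flatten):
        self.tensors = list(tensors)
        self.flatten = flatten


def tensors_to_buckets(tensors, do_flatten):
    # Follow the suggestion as-is: each inner list becomes a bucket.
    # An overriding algorithm could instead merge or split the groups.
    return [ToyBucket(group, do_flatten) for group in tensors]


suggestion = [["w1", "b1"], ["w2"]]  # strings stand in for tensors
buckets = tensors_to_buckets(suggestion, do_flatten=True)
```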