bagua.torch_api.algorithms

Package Contents

class bagua.torch_api.algorithms.Algorithm

This is the base class that all Bagua algorithms inherit.

classmethod init(name, **kwargs)

Helper class method to initialize a registered Bagua algorithm.

Parameters:
  • name – Name of the registered Bagua algorithm.

  • kwargs – Arguments to initialize the registered Bagua algorithm.

Returns:

An instance of a registered Bagua algorithm.

Return type:

Algorithm

Example::
>>> from bagua.torch_api.algorithms import Algorithm
>>> algorithm = Algorithm.init("gradient_allreduce", hierarchical=True)

Note

Call str(bagua.torch_api.algorithms.GlobalAlgorithmRegistry) to see all registered Bagua algorithms.

reify(process_group)

Create an algorithm implementation instance. See AlgorithmImpl.

Parameters:

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.

Returns:

An instance of Bagua algorithm implementation.
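A subclass typically overrides reify to bind its user-facing configuration to a process group and return the implementation instance. A minimal sketch of that pattern, using hypothetical stand-in classes (MyAlgorithm, MyAlgorithmImpl) rather than the real Bagua types:

```python
# Illustrative sketch only: MyAlgorithm / MyAlgorithmImpl are hypothetical
# stand-ins for real Algorithm / AlgorithmImpl subclasses.

class MyAlgorithmImpl:
    """Does the actual work; created once a process group is known."""
    def __init__(self, process_group, hierarchical):
        self.process_group = process_group
        self.hierarchical = hierarchical

class MyAlgorithm:
    """Holds user-facing configuration until reify() binds it to a group."""
    def __init__(self, hierarchical=False):
        self.hierarchical = hierarchical

    def reify(self, process_group):
        # Create the implementation instance for this process group.
        return MyAlgorithmImpl(process_group, self.hierarchical)

algorithm = MyAlgorithm(hierarchical=True)
impl = algorithm.reify(process_group="hypothetical-group")
```

In real code, process_group would be a bagua.torch_api.communication.BaguaProcessGroup and the returned object an AlgorithmImpl subclass.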

class bagua.torch_api.algorithms.AlgorithmImpl(process_group)

This is the base class that all Bagua algorithm implementations inherit.

It provides methods that can be overridden to implement different kinds of distributed algorithms.

Parameters:

process_group (bagua.torch_api.communication.BaguaProcessGroup) – The process group to work on.

init_backward_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed when each parameter’s gradient computation completes.

Parameters:

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to work on.

Returns:

A function that takes the name of a parameter (as in torch.nn.Module.named_parameters) and the parameter itself.
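An override usually returns a closure over bagua_ddp, which the runtime then calls once per parameter as gradients become ready. A plain-Python sketch of that shape (the names and the recording list are illustrative, not real Bagua APIs):

```python
# Illustrative sketch: the returned hook takes (parameter_name, parameter),
# matching the names yielded by torch.nn.Module.named_parameters.
ready = []

def init_backward_hook(bagua_ddp):
    def hook(parameter_name, parameter):
        # Record which parameter's gradient is ready; a real algorithm
        # would mark the tensor ready for bucket communication here.
        ready.append(parameter_name)
    return hook

hook = init_backward_hook(bagua_ddp=None)  # bagua_ddp unused in this sketch
hook("layer1.weight", object())
hook("layer1.bias", object())
```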

init_forward_pre_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed before the forward process.

Parameters:

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to work on.

Returns:

A function that takes the model’s input.

init_operations(bagua_ddp, bucket)

Given a BaguaDistributedDataParallel, and a BaguaBucket, register operations to be executed on the bucket.

Parameters:
  • bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to work on.

  • bucket (bagua.torch_api.bucket.BaguaBucket) – A single bucket to register operations.

init_post_backward_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed when the backward pass is done.

Parameters:

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to work on.

Returns:

A function that takes no argument.

init_post_optimizer_step_hook(bagua_ddp)

Given a BaguaDistributedDataParallel, return a hook function that will be executed after optimizer.step() finishes.

Parameters:

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to work on.

Returns:

A function that gets called after an optimizer’s step() method is called. The function takes the optimizer as its argument.

init_tensors(bagua_ddp)

Given a BaguaDistributedDataParallel, return the Bagua tensors to be used for later operations.

Parameters:

bagua_ddp (bagua.torch_api.data_parallel.bagua_distributed.BaguaDistributedDataParallel) – The BaguaDistributedDataParallel instance to work on.

Returns:

A list of Bagua tensors for communication.

Return type:

List[bagua.torch_api.tensor.BaguaTensor]

need_reset()

Returns:

True if all initialization methods of the current algorithms should be called again. This is useful for algorithms that have multiple stages where each stage needs different initializations.

Return type:

bool
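A multi-stage algorithm can flip need_reset when it crosses a stage boundary, prompting Bagua to re-run the initialization methods. A hedged sketch with a hypothetical warm-up stage (TwoStageImpl and its counters are illustrative, not a real Bagua class):

```python
class TwoStageImpl:
    """Illustrative: request re-initialization once warm-up ends."""
    def __init__(self, warmup_steps=100):
        self.warmup_steps = warmup_steps
        self.step_count = 0
        self.stage = "warmup"

    def need_reset(self):
        # Return True exactly once, at the warm-up -> main transition,
        # so all init_* methods get called again for the new stage.
        if self.stage == "warmup" and self.step_count >= self.warmup_steps:
            self.stage = "main"
            return True
        return False

impl = TwoStageImpl(warmup_steps=2)
resets = []
for _ in range(4):
    impl.step_count += 1
    resets.append(impl.need_reset())
```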

tensors_to_buckets(tensors, do_flatten)

Given the bucketing suggestion from Bagua, return the actual Bagua buckets. The default implementation follows the suggestion unchanged.

Parameters:
  • tensors (List[List[bagua.torch_api.tensor.BaguaTensor]]) – Bagua tensors grouped into lists, representing Bagua’s suggestion on how to bucket the tensors.

  • do_flatten (bool) – Whether to flatten the Bagua buckets.

Returns:

A list of Bagua buckets.

Return type:

List[bagua.torch_api.bucket.BaguaBucket]
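An override may depart from the suggestion, for example by fusing every suggested group into a single bucket. A sketch of that reshaping on plain nested lists standing in for List[List[BaguaTensor]] (real code would build BaguaBucket objects from the merged group):

```python
def merge_suggestion(tensor_groups):
    # tensor_groups mimics Bagua's List[List[BaguaTensor]] suggestion;
    # here we fuse all groups into one bucket's worth of tensors.
    merged = [t for group in tensor_groups for t in group]
    return [merged]

# Strings stand in for BaguaTensor objects in this sketch.
suggested = [["t0", "t1"], ["t2"], ["t3", "t4"]]
buckets = merge_suggestion(suggested)
```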

bagua.torch_api.algorithms.GlobalAlgorithmRegistry