Module Contents

class bagua.torch_api.contrib.fused_optimizer.FusedOptimizer(optimizer, do_flatten=False)

Bases: torch.optim.Optimizer

Convert any optimizer into a fused optimizer.

This fused optimizer fuses multiple module parameter update kernel launches into one or a few, by flattening parameter tensors into one or more contiguous buckets.
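The flattening idea can be sketched in plain PyTorch (this is an illustrative toy, not Bagua's internal implementation): concatenating parameter tensors into one contiguous buffer lets a single elementwise kernel update all of them at once, instead of one launch per tensor.

```python
import torch

# Two parameter tensors of different shapes (hypothetical example).
p1 = torch.randn(3, 4)
p2 = torch.randn(5)

# Flatten into one contiguous bucket: a single elementwise kernel can
# then update every parameter in one launch.
bucket = torch.cat([p1.reshape(-1), p2.reshape(-1)])
bucket.add_(torch.ones_like(bucket), alpha=-0.1)  # one fused update

# Views into the bucket recover the original parameter shapes.
new_p1 = bucket[: p1.numel()].view_as(p1)
new_p2 = bucket[p1.numel() :].view_as(p2)
```

In the real optimizer the original parameters are made views into such buckets, so subsequent updates touch the contiguous storage directly.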

It can be used in conjunction with the with_bagua method, in which case Bagua performs the fusion automatically; otherwise, you need to set do_flatten=True explicitly.

Parameters:

  • optimizer (torch.optim.Optimizer) – Any PyTorch optimizer.

  • do_flatten (bool) – Whether to flatten the parameters. Default: False.

Returns:

Fused optimizer.


To use in conjunction with the with_bagua method:

>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer)
>>> model = model.with_bagua([optimizer], GradientAllReduceAlgorithm())

To use alone or with torch.nn.parallel.DistributedDataParallel, set do_flatten=True:

>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer, do_flatten=True)

step(self, closure=None)

Performs a single optimization step (parameter update).


Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.


Unless otherwise specified, this function should not modify the .grad field of the parameters.
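The closure contract is inherited from torch.optim.Optimizer, so it works the same as for any PyTorch optimizer. A minimal sketch (plain torch.optim.SGD is used as a stand-in wrapped model/optimizer, so it runs without Bagua installed; model, data, and hyperparameters are hypothetical):

```python
import torch

# Toy model and data, for illustration only.
model = torch.nn.Linear(4, 1)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)
loss_fn = torch.nn.MSELoss()

# FusedOptimizer wraps any torch.optim.Optimizer; in real use this
# would be bagua.torch_api.contrib.FusedOptimizer(torch.optim.SGD(...)).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def closure():
    # Re-evaluate the model and return the loss. step() calls this
    # before updating the parameters.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    return loss

# step() returns the loss computed by the closure.
loss = optimizer.step(closure)
```

Passing a closure is only required by optimizers that re-evaluate the loss internally (e.g. LBFGS); for most optimizers `optimizer.step()` alone suffices.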