bagua.torch_api.contrib.fused_optimizer
Module Contents
- class bagua.torch_api.contrib.fused_optimizer.FusedOptimizer(optimizer, do_flatten=False)
Bases: torch.optim.Optimizer
Convert any optimizer into a fused optimizer.
This fused optimizer fuses multiple module parameter update kernel launches into one or a few, by flattening parameter tensors into one or more contiguous buckets.
It can be used in conjunction with the with_bagua method, in which case Bagua will do the fusions automatically; otherwise, you need to explicitly set do_flatten=True. (A rough sketch of the flattening idea follows the examples below.)
- Parameters
optimizer (torch.optim.Optimizer) – Any PyTorch optimizer.
do_flatten (bool) – Whether to flatten the parameters. Default: False.
- Returns
Fused optimizer.
- Example:
To use in conjunction with the with_bagua method:
>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer)
>>> model = model.with_bagua([optimizer], GradientAllReduceAlgorithm())
To use alone or with torch.nn.parallel.DistributedDataParallel, set do_flatten=True:
>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer, do_flatten=True)
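The flattening idea behind the fusion can be illustrated with plain PyTorch. The snippet below is a rough sketch for intuition only; the bucket layout, update rule, and copy-back are illustrative assumptions, not Bagua's actual implementation::

    import torch

    params = [torch.randn(4, 4, requires_grad=True), torch.randn(16, requires_grad=True)]

    # Unfused: one update kernel launch per parameter tensor.
    with torch.no_grad():
        for p in params:
            p.add_(torch.ones_like(p), alpha=-0.1)

    # Fused: gather the parameters into one contiguous bucket, update the bucket
    # with a single kernel launch, then copy the results back. A real
    # implementation can avoid the copy-back by keeping the parameters as views
    # into the bucket.
    with torch.no_grad():
        flat = torch.cat([p.view(-1) for p in params])
        flat.add_(torch.ones_like(flat), alpha=-0.1)
        offset = 0
        for p in params:
            p.copy_(flat[offset:offset + p.numel()].view_as(p))
            offset += p.numel()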
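For a fuller picture of the standalone case, the following sketch runs a few training steps; the model, data, learning loop, and loss are placeholders invented for illustration::

    import torch
    from bagua.torch_api.contrib import FusedOptimizer

    # Toy model and data, made up for this example.
    model = torch.nn.Linear(10, 2)
    inner = torch.optim.Adadelta(model.parameters())
    optimizer = FusedOptimizer(inner, do_flatten=True)  # flatten explicitly since Bagua is not doing it

    x = torch.randn(8, 10)
    y = torch.randn(8, 2)
    for _ in range(3):
        model.zero_grad()                                   # clear gradients
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()                                    # fused parameter update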
- step(self, closure=None)
Performs a single optimization step (parameter update).
- Parameters
closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
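As an illustration of the closure argument, the sketch below follows the standard PyTorch closure pattern; the model and data are placeholders, and it assumes step forwards the closure to the wrapped optimizer as ordinary PyTorch optimizers do::

    import torch
    from bagua.torch_api.contrib import FusedOptimizer

    # Placeholder model and data.
    model = torch.nn.Linear(4, 1)
    optimizer = FusedOptimizer(torch.optim.Adadelta(model.parameters()), do_flatten=True)
    x, y = torch.randn(2, 4), torch.randn(2, 1)

    def closure():
        # Re-evaluate the model and return the loss, as expected by step(closure).
        model.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        return loss

    optimizer.step(closure)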