bagua.torch_api.contrib.fused_optimizer

Module Contents

class bagua.torch_api.contrib.fused_optimizer.FusedOptimizer(optimizer, do_flatten=False)

Bases: torch.optim.Optimizer

Convert any optimizer into a fused optimizer.

This fused optimizer fuses the kernel launches of multiple module parameter updates into one or a few, by flattening the parameter tensors into one or more contiguous buckets.
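
For intuition only, here is a minimal, hypothetical sketch of the flattening idea (not Bagua's actual implementation): several tensors are copied into one contiguous buffer, and a single in-place kernel then touches all of them through views into that buffer.

>>> import torch
>>> tensors = [torch.randn(3, 4), torch.randn(10)]        # two logically separate tensors
>>> flat = torch.cat([t.reshape(-1) for t in tensors])    # copy both into one contiguous bucket
>>> views, offset = [], 0
>>> for t in tensors:
...     n = t.numel()
...     views.append(flat[offset:offset + n].view_as(t))  # original shapes as views into the bucket
...     offset += n
>>> flat.mul_(0.9)                                        # one in-place kernel updates every view at once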

It can be used in conjunction with bagua.torch_api.distributed.BaguaModule.with_bagua. In that case, Bagua will do the flattening automatically; otherwise, you need to explicitly pass do_flatten=True.

Parameters
  • optimizer (torch.optim.Optimizer) – Any PyTorch optimizer.

  • do_flatten (bool) – Whether to flatten the parameters. Default: False.

Returns

Fused optimizer.

Example::

To use in conjunction with bagua.torch_api.distributed.BaguaModule.with_bagua:

>>> optimizer = torch.optim.Adadelta(model.parameters(), ....)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer)
>>> model = model.with_bagua([optimizer], GradientAllReduceAlgorithm())

To use it alone or with torch.nn.parallel.DistributedDataParallel, set do_flatten=True:

>>> optimizer = torch.optim.Adadelta(model.parameters(), ....)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer, do_flatten=True)
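
For illustration, a hypothetical training loop with a torch.nn.parallel.DistributedDataParallel wrapped module might look as follows (model and loader are assumed to already exist and are not part of this API):

>>> model = torch.nn.parallel.DistributedDataParallel(model)
>>> optimizer = torch.optim.Adadelta(model.parameters())
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer, do_flatten=True)
>>> for data, target in loader:
...     optimizer.zero_grad()
...     loss = torch.nn.functional.cross_entropy(model(data), target)
...     loss.backward()
...     optimizer.step()        # fused parameter update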

step(self, closure=None)

Performs a single optimization step (parameter update).

Parameters

closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
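
The closure argument follows the standard torch.optim.Optimizer convention. A minimal sketch (assuming model, data, and target are already defined):

>>> def closure():
...     optimizer.zero_grad()
...     loss = torch.nn.functional.mse_loss(model(data), target)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)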