bagua.torch_api.contrib.fused_optimizer¶
Module Contents¶
- class bagua.torch_api.contrib.fused_optimizer.FusedOptimizer(optimizer, do_flatten=False)¶
Bases: torch.optim.Optimizer
Convert any optimizer into a fused optimizer.
This fused optimizer fuses multiple module parameter update kernel launches into one or a few, by flattening parameter tensors into one or more contiguous buckets.
It can be used in conjunction with bagua.torch_api.bagua_init. In that case, Bagua will do the fusion automatically; otherwise, you need to explicitly pass do_flatten=True.
- Parameters
optimizer (torch.optim.Optimizer) – Any PyTorch optimizer.
do_flatten (bool) – Whether to flatten the parameters. Default: False.
- Returns
Fused optimizer.
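The bucket-flattening idea above can be sketched in plain Python. This is a conceptual illustration only, not Bagua's implementation: `flatten` and `fused_sgd_step` are hypothetical names, and plain lists stand in for tensors. The point is that N per-parameter updates collapse into a single pass over one contiguous buffer.

```python
def flatten(params):
    """Pack parameter lists into one contiguous bucket.

    Returns the bucket plus (offset, length) views so each original
    parameter can still be read back out of the shared buffer.
    """
    bucket, views = [], []
    for p in params:
        views.append((len(bucket), len(p)))
        bucket.extend(p)
    return bucket, views


def fused_sgd_step(bucket, grad_bucket, lr):
    """One fused SGD update over the whole bucket.

    With real tensors this would be a single kernel launch instead of
    one launch per parameter tensor.
    """
    for i in range(len(bucket)):
        bucket[i] -= lr * grad_bucket[i]


# Two "parameter tensors" and their gradients, each flattened once.
params = [[1.0, 2.0], [3.0]]
grads = [[0.5, 0.5], [1.0]]
p_bucket, views = flatten(params)
g_bucket, _ = flatten(grads)

fused_sgd_step(p_bucket, g_bucket, lr=0.1)

# Read each parameter back through its (offset, length) view.
updated = [p_bucket[o:o + n] for o, n in views]
print(updated)  # → [[0.95, 1.95], [2.9]]
```

With `do_flatten=True`, FusedOptimizer performs this kind of packing for you; under `bagua_init`, Bagua chooses the buckets itself.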
- Example::
To use in conjunction with bagua.torch_api.bagua_init:
>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer)
>>> model = model.with_bagua([optimizer], GradientAllReduceAlgorithm())
To use alone or with torch.nn.parallel.DistributedDataParallel, set do_flatten to be True:
>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer, do_flatten=True)
- step(self, closure=None)¶
Performs a single optimization step (parameter update).
- Parameters
closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
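The closure protocol that `step(closure)` follows can be illustrated with a minimal stand-in optimizer. This is a hedged sketch: `ToyOptimizer` is hypothetical and not part of Bagua or PyTorch; it only shows the contract that the optimizer may call the closure to re-evaluate the loss before updating parameters, and returns that loss.

```python
class ToyOptimizer:
    """Minimal stand-in illustrating the step(closure) contract."""

    def __init__(self, params, lr=0.1):
        self.params = params  # list of [value, grad] pairs
        self.lr = lr

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()  # re-evaluate the model, refreshing gradients
        for p in self.params:
            p[0] -= self.lr * p[1]  # update values; the grad field is not modified
        return loss


# One scalar parameter at 2.0 with loss = p**2, so grad = 2*p.
params = [[2.0, 0.0]]
opt = ToyOptimizer(params, lr=0.1)


def closure():
    p = params[0]
    p[1] = 2 * p[0]   # gradient of p**2
    return p[0] ** 2  # the loss


loss = opt.step(closure)
print(loss)  # → 4.0; the parameter moves from 2.0 to about 1.6
```

Optimizers like LBFGS require the closure (to re-evaluate the loss during line search); most others, as the parameter description says, accept `closure=None`.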