bagua.torch_api.contrib.fused_optimizer
Module Contents
- class bagua.torch_api.contrib.fused_optimizer.FusedOptimizer(optimizer, do_flatten=False)
Bases: torch.optim.Optimizer
Convert any optimizer into a fused optimizer.
This fused optimizer fuses multiple module parameter update kernel launches into one or a few, by flattening parameter tensors into one or more contiguous buckets.
It can be used in conjunction with the with_bagua method, in which case Bagua will do the fusions automatically; otherwise, you need to explicitly set do_flatten=True. (A rough sketch of the flattening idea follows the examples below.)
- Parameters
optimizer (torch.optim.Optimizer) – Any PyTorch optimizer.
do_flatten (bool) – Whether to flatten the parameters. Default: False.
- Returns
Fused optimizer.
- Example:
To use in conjunction with the with_bagua method:
>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer)
>>> model = model.with_bagua([optimizer], GradientAllReduceAlgorithm())
To use alone or with torch.nn.parallel.DistributedDataParallel, set do_flatten=True:
>>> optimizer = torch.optim.Adadelta(model.parameters(), ...)
>>> optimizer = bagua.torch_api.contrib.FusedOptimizer(optimizer, do_flatten=True)
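The flattening idea behind the fusion can be illustrated with plain PyTorch. The snippet below is a rough sketch for intuition only; the bucket layout, update rule, and copy-back are illustrative assumptions, not Bagua's actual implementation::

    import torch

    params = [torch.randn(4, 4, requires_grad=True), torch.randn(16, requires_grad=True)]

    # Unfused: one update kernel launch per parameter tensor.
    with torch.no_grad():
        for p in params:
            p.add_(torch.ones_like(p), alpha=-0.1)

    # Fused: gather the parameters into one contiguous bucket, update the bucket
    # with a single kernel launch, then copy the results back. A real
    # implementation can avoid the copy-back by keeping the parameters as views
    # into the bucket.
    with torch.no_grad():
        flat = torch.cat([p.view(-1) for p in params])
        flat.add_(torch.ones_like(flat), alpha=-0.1)
        offset = 0
        for p in params:
            p.copy_(flat[offset:offset + p.numel()].view_as(p))
            offset += p.numel()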
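For a fuller picture of the standalone case, the following sketch runs a few training steps; the model, data, learning loop, and loss are placeholders invented for illustration::

    import torch
    from bagua.torch_api.contrib import FusedOptimizer

    # Toy model and data, made up for this example.
    model = torch.nn.Linear(10, 2)
    inner = torch.optim.Adadelta(model.parameters())
    optimizer = FusedOptimizer(inner, do_flatten=True)  # flatten explicitly since Bagua is not doing it

    x = torch.randn(8, 10)
    y = torch.randn(8, 2)
    for _ in range(3):
        model.zero_grad()                                   # clear gradients
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()                                    # fused parameter update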
- step(self, closure=None)
Performs a single optimization step (parameter update).
- Parameters
closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
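As an illustration of the closure argument, the sketch below follows the standard PyTorch closure pattern; the model and data are placeholders, and it assumes step forwards the closure to the wrapped optimizer as ordinary PyTorch optimizers do::

    import torch
    from bagua.torch_api.contrib import FusedOptimizer

    # Placeholder model and data.
    model = torch.nn.Linear(4, 1)
    optimizer = FusedOptimizer(torch.optim.Adadelta(model.parameters()), do_flatten=True)
    x, y = torch.randn(2, 4), torch.randn(2, 1)

    def closure():
        # Re-evaluate the model and return the loss, as expected by step(closure).
        model.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        return loss

    optimizer.step(closure)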