bagua.torch_api.contrib.fuse.optimizer¶
Module Contents¶
- bagua.torch_api.contrib.fuse.optimizer.fuse_optimizer(optimizer, do_flatten=True, check_flatten=True)¶
Convert any optimizer into a fused optimizer.
A fused optimizer can fuse multiple parameter updates into one or a few updates. To achieve this, users need to:

1) flatten multiple parameters in the same group into a fused parameter by setting do_flatten=True, which is also the default behavior of a fused optimizer;
2) perform a fused parameter update by calling fuse_step.

This fused optimizer is implemented for general use. It can be used in conjunction with a BaguaModule as well as a torch.nn.parallel.DistributedDataParallel wrapped module, among other cases.
- Parameters:
optimizer (torch.optim.Optimizer) – Any PyTorch optimizer.
do_flatten (bool) – Whether to flatten the parameters. The flatten operation will reset data pointers of parameter tensors so that they can be fused together. Default: True.
check_flatten (bool) – When set to True, enables the fused optimizer to automatically check whether parameter tensors remain contiguous as flattened. Can only work with do_flatten=True. Default: True.
- Returns:
A fused optimizer.
- Example::
>>> optimizer = torch.optim.Adadelta(model.parameters(), ....)
>>> optimizer = bagua.torch_api.contrib.fuse_optimizer(optimizer, do_flatten=True)
>>>
>>> optimizer.fuse_step()
When used in conjunction with a BaguaModule, set do_flatten=False in with_bagua explicitly:
>>> optimizer = bagua.torch_api.contrib.fuse_optimizer(optimizer, do_flatten=True)
>>> model = model.with_bagua([optimizer], GradientAllReduceAlgorithm(), do_flatten=False)
>>>
>>> optimizer.fuse_step()
Note
This function and the with_bagua method both reset data pointers of module parameters by default. To perform a more effective fused parameter update, users need to disable bucket flattening in with_bagua by setting its do_flatten to False.

Note

A fused optimizer does not change the original behaviors of optimizer, but enables it to perform a fused parameter update through fuse_step. Users can still perform a normal parameter update through step.
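The flattening described above can be sketched in plain PyTorch. This is a simplified illustration under assumed mechanics, not Bagua's actual implementation: parameters in a group are copied into one flat buffer, and each parameter's data pointer is reset to a view into that buffer, so a single update on the buffer updates every parameter at once.

```python
import torch

# Hypothetical parameter group (stand-ins for model parameters).
params = [torch.randn(2, 3), torch.randn(4)]
originals = [p.clone() for p in params]

# Allocate one flat buffer large enough for the whole group.
total = sum(p.numel() for p in params)
flat = torch.empty(total)

# Copy each parameter into the buffer, then repoint its data at a
# view of the buffer (this is the "reset data pointers" step).
offset = 0
for p in params:
    n = p.numel()
    flat[offset:offset + n].copy_(p.reshape(-1))
    p.data = flat[offset:offset + n].view_as(p)
    offset += n

# Values are preserved, and all parameters now share flat's storage.
assert torch.equal(params[0], originals[0])
assert params[0].data_ptr() == flat.data_ptr()
assert params[1].data_ptr() == flat.data_ptr() + 6 * flat.element_size()

# One fused update on the buffer updates both parameters.
flat.add_(1.0)
assert torch.equal(params[0], originals[0] + 1.0)
assert torch.equal(params[1], originals[1] + 1.0)
```

After such a flatten, a single elementwise operation on the flat buffer stands in for per-parameter updates, which is what makes a fused step cheaper than iterating over many small tensors.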
- bagua.torch_api.contrib.fuse.optimizer.fuse_step(optimizer, closure=None)¶
Perform a fused parameter update.
This operation fuses multiple contiguous parameters into a fused parameter by creating a tensor view that shares their underlying storage, and then performs the parameter update on the fused parameters. If none of the parameter tensors are contiguous, this operation is equivalent to step.
- Parameters:
optimizer (torch.optim.Optimizer) – A fused optimizer.
closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
This function will not modify the storage of parameter tensors.
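The note above can be illustrated with a minimal PyTorch sketch of the view mechanism (assumed, simplified mechanics; not Bagua's code): when two tensors are contiguous in one storage, a single view can cover both, so one update reaches both without modifying or copying any storage.

```python
import torch

# Two "parameters" carved contiguously out of one storage.
base = torch.zeros(6)
p1 = base[:2]               # shape (2,)
p2 = base[2:6].view(2, 2)   # shape (2, 2)

# Contiguity check, similar in spirit to what check_flatten verifies:
# p2 must start in memory exactly where p1 ends.
end_of_p1 = p1.data_ptr() + p1.numel() * p1.element_size()
assert end_of_p1 == p2.data_ptr()

# Build one fused view over both parameters; no storage is copied.
fused = base[:6]

# A single update on the fused view updates both parameters.
fused.add_(1.0)
assert torch.all(p1 == 1.0).item()
assert torch.all(p2 == 1.0).item()
```

If the contiguity check failed (for example, because a tensor was reallocated elsewhere), no fused view could be formed and the update would have to fall back to per-parameter steps, mirroring the documented fallback to step.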
- bagua.torch_api.contrib.fuse.optimizer.is_fused_optimizer(optimizer)¶
Check whether optimizer is a fused optimizer.
- Parameters:
optimizer (torch.optim.Optimizer) –