bagua.torch_api.contrib.fuse.optimizer

Module Contents

bagua.torch_api.contrib.fuse.optimizer.fuse_optimizer(optimizer, do_flatten=True, check_flatten=True)

Convert any optimizer into a fused optimizer.

A fused optimizer can fuse multiple parameter updates into one or a few updates. To achieve this, users need to:

1) flatten multiple parameters in the same group into a fused parameter by setting do_flatten=True, which is the default behavior of a fused optimizer;
2) perform a fused parameter update by calling fuse_step.

This fused optimizer is implemented for general use. It can be used in conjunction with a BaguaModule, with a torch.nn.parallel.DistributedDataParallel wrapped module, or in other cases (not listed here).
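For instance, a fused optimizer can be used with a torch.nn.parallel.DistributedDataParallel module. The sketch below is illustrative only: it assumes the distributed process group has already been initialized, and the model and input data are hypothetical placeholders.

>>> model = torch.nn.Linear(10, 10).cuda()
>>> model = torch.nn.parallel.DistributedDataParallel(model)
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
>>> optimizer = bagua.torch_api.contrib.fuse_optimizer(optimizer, do_flatten=True)
>>>
>>> optimizer.zero_grad()
>>> model(torch.randn(8, 10).cuda()).sum().backward()
>>> optimizer.fuse_step()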

Parameters
  • optimizer (torch.optim.Optimizer) – Any PyTorch optimizer.

  • do_flatten (bool) – Whether to flatten the parameters. The flatten operation will reset data pointers of parameter tensors so that they can be fused together. Default: True.

  • check_flatten (bool) – When set to True, the fused optimizer automatically checks whether parameter tensors are still contiguous as flattened. Only effective when do_flatten=True. Default: True.

Returns

A fused optimizer.

Example::
>>> optimizer = torch.optim.Adadelta(model.parameters(), ....)
>>> optimizer = bagua.torch_api.contrib.fuse_optimizer(optimizer, do_flatten=True)
>>>
>>> optimizer.fuse_step()

When used in conjunction with a BaguaModule, set do_flatten=False in with_bagua explicitly:

>>> optimizer = bagua.torch_api.contrib.fuse_optimizer(optimizer, do_flatten=True)
>>> model = model.with_bagua([optimizer], GradientAllReduceAlgorithm(), do_flatten=False)
>>>
>>> optimizer.fuse_step()

Note

Both this function and the with_bagua method reset data pointers of module parameters by default. To perform a more effective fused parameter update, users need to disable bucket flattening in with_bagua by setting its do_flatten to False.

Note

A fused optimizer does not change the original behaviors of the optimizer; it additionally enables a fused parameter update through fuse_step. Users can still perform a normal parameter update through step.
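For example (a minimal sketch; model, data, target, and loss_fn are placeholders, and optimizer has already been wrapped by fuse_optimizer), the two update styles can be mixed freely:

>>> optimizer.zero_grad()
>>> loss_fn(model(data), target).backward()
>>> optimizer.fuse_step()  # fused parameter update
>>>
>>> optimizer.zero_grad()
>>> loss_fn(model(data), target).backward()
>>> optimizer.step()       # the original, non-fused update still works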

bagua.torch_api.contrib.fuse.optimizer.fuse_step(optimizer, closure=None)

Perform a fused parameter update.

This operation fuses multiple contiguous parameters into a fused parameter by creating a tensor view that shares the same underlying storage with them, and then performs the parameter update on the fused parameters. If none of the parameter tensors are contiguous, this operation is equivalent to step.

Parameters
  • optimizer (torch.optim.Optimizer) – A fused optimizer.

  • closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

This function will not modify the storage of parameter tensors.
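As with step, a closure can be passed when the underlying optimizer requires one (e.g. torch.optim.LBFGS). A minimal sketch, with model, data, target, and loss_fn as placeholders:

>>> def closure():
...     optimizer.zero_grad()
...     loss = loss_fn(model(data), target)
...     loss.backward()
...     return loss
>>> optimizer.fuse_step(closure)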

bagua.torch_api.contrib.fuse.optimizer.is_fused_optimizer(optimizer)

Check whether optimizer is a fused optimizer.

Parameters

optimizer (torch.optim.Optimizer) – The optimizer to check.
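A usage sketch, assuming the function is imported from the module path shown above:

>>> from bagua.torch_api.contrib.fuse.optimizer import is_fused_optimizer
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
>>> is_fused_optimizer(optimizer)
False
>>> optimizer = bagua.torch_api.contrib.fuse_optimizer(optimizer)
>>> is_fused_optimizer(optimizer)
True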