bagua.torch_api.tensor

Module Contents

class bagua.torch_api.tensor.BaguaTensor

This class patches torch.Tensor with additional methods.

A Bagua tensor is required to use Bagua’s communication algorithms. Users can convert a PyTorch tensor to a Bagua tensor with ensure_bagua_tensor.

A Bagua tensor uses a proxy structure: the tensor that the backend actually operates on is accessed through a “Proxy Tensor”. The proxy tensor is registered with Bagua. Whenever the Bagua backend needs a tensor (for example, to communicate it), it calls bagua_getter_closure on the proxy tensor to obtain the tensor that is actually worked on. We call this tensor the “Effective Tensor”. bagua_setter_closure is also provided to replace the effective tensor at runtime, so that customized workflows can substitute a different effective tensor.

Their relation can be seen in the following diagram:

https://user-images.githubusercontent.com/18649508/139179394-51d0c0f5-e233-4ada-8e5e-0e70a889540d.png

For example, in the gradient allreduce algorithm, the effective tensor that needs to be exchanged between machines is the gradient. In this case, we register the model parameters as proxy tensors and set bagua_getter_closure to lambda proxy_tensor: proxy_tensor.grad. This way, even if the gradient tensor is recreated or changed at runtime, Bagua can still identify the correct tensor and use it for communication, because the proxy tensor serves as the root for access and is never replaced.
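The proxy/effective relationship can be illustrated without Bagua itself. The following stdlib-only sketch has some stand-ins. `Param` is a hypothetical substitute for a model parameter. Plain lists substitute for tensors. `registry`, `register` and `effective` are hypothetical substitutes for Bagua's internal bookkeeping. The sketch shows why registering the parameter, rather than its gradient, keeps the lookup valid after the gradient object is replaced.

```python
from typing import Callable, Dict, List, Optional


class Param:
    """Hypothetical stand-in for a model parameter (lists play the role of tensors)."""

    def __init__(self, data: List[float]) -> None:
        self.data = data
        self.grad: Optional[List[float]] = None


# Hypothetical registry: maps a name to (proxy object, getter closure).
registry: Dict[str, tuple] = {}


def register(name: str, proxy: Param, getter: Callable[[Param], object]) -> None:
    registry[name] = (proxy, getter)


def effective(name: str) -> object:
    # The effective tensor is never cached; it is re-resolved through the
    # getter closure each time it is needed.
    proxy, getter = registry[name]
    return getter(proxy)


w = Param([1.0, 2.0])
register("w", w, lambda proxy_tensor: proxy_tensor.grad)

w.grad = [0.1, 0.2]   # first backward pass creates a gradient
first = effective("w")
w.grad = [0.3, 0.4]   # the gradient object is later recreated
second = effective("w")
print(first, second)  # [0.1, 0.2] [0.3, 0.4]
```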

bagua_backend_tensor(self)
Returns

The raw Bagua backend tensor.

Return type

bagua_core.BaguaTensorPy

bagua_ensure_grad(self)

Create a zero gradient tensor for the current parameter if one does not already exist.

Returns

The original tensor.

Return type

torch.Tensor
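In PyTorch terms, this behavior roughly corresponds to filling in `param.grad` with zeros of the parameter's shape when it is `None`. The sketch below reproduces only that logic. It uses a hypothetical list-backed `Param` class and a hypothetical `ensure_grad` function so that it runs without PyTorch. It is not Bagua's implementation.

```python
from typing import List, Optional


class Param:
    """Hypothetical list-backed stand-in for a parameter with a .grad slot."""

    def __init__(self, data: List[float]) -> None:
        self.data = data
        self.grad: Optional[List[float]] = None


def ensure_grad(param: Param) -> Param:
    # Allocate a zero gradient only when none exists; an existing one is kept.
    if param.grad is None:
        param.grad = [0.0] * len(param.data)
    return param  # the original object is returned, not a copy


p = Param([3.0, 4.0, 5.0])
assert ensure_grad(p) is p
print(p.grad)  # [0.0, 0.0, 0.0]
p.grad[0] = 1.5
ensure_grad(p)  # no-op: the gradient already exists
print(p.grad)  # [1.5, 0.0, 0.0]
```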

bagua_getter_closure(self)

Returns the tensor that will be used in runtime.

Return type

torch.Tensor

bagua_mark_communication_ready(self)

Mark a Bagua tensor as ready for the execution of scheduled operations.

bagua_mark_communication_ready_without_synchronization(self)

Mark a Bagua tensor as ready immediately, without CUDA event synchronization.
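One way to picture "ready for scheduled operations" is a bucket of tensors whose queued operation runs only after every member has been marked ready. The toy model below is an assumption made for illustration. `Bucket`, `mark_ready` and the callback are hypothetical names, not Bagua's scheduler. The model has no notion of CUDA events, so it does not capture the difference between the two methods above.

```python
from typing import Callable, List, Set


class Bucket:
    """Hypothetical toy scheduler: runs `op` once all member tensors are ready."""

    def __init__(self, names: List[str], op: Callable[[List[str]], None]) -> None:
        self.names = list(names)
        self.ready: Set[str] = set()
        self.op = op

    def mark_ready(self, name: str) -> None:
        if name not in self.names:
            raise KeyError(f"{name} is not in this bucket")
        self.ready.add(name)
        # Fire the scheduled operation exactly when the last member arrives.
        if len(self.ready) == len(self.names):
            self.op(list(self.names))
            self.ready.clear()  # reset for the next iteration


log: List[List[str]] = []
bucket = Bucket(["layer1.weight", "layer1.bias"], log.append)
bucket.mark_ready("layer1.bias")
print(log)  # []
bucket.mark_ready("layer1.weight")
print(log)  # [['layer1.weight', 'layer1.bias']]
```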

bagua_set_storage(self, storage, storage_offset=0)

Sets the underlying storage of the effective tensor returned by bagua_getter_closure to an existing torch.Storage.

Parameters
  • storage (torch.Storage) – The storage to use.

  • storage_offset (int) – The offset in the storage.
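A common reason to point a tensor at existing storage with an offset is to pack several tensors into one contiguous buffer, so they can be processed in a single operation. This section does not say whether Bagua uses bagua_set_storage exactly this way. The sketch below illustrates only the storage-plus-offset idea with the standard library. It uses an `array` as the flat storage and `memoryview` slices as hypothetical stand-ins for tensors that alias it.

```python
from array import array
from typing import Dict

# Flat storage large enough for two "tensors" of sizes 3 and 2.
sizes: Dict[str, int] = {"weight": 3, "bias": 2}
storage = array("d", [0.0] * sum(sizes.values()))
flat = memoryview(storage)

# Each view plays the role of a tensor backed by `storage`,
# starting `offset` elements into it (the storage_offset).
views: Dict[str, memoryview] = {}
offset = 0
for name, n in sizes.items():
    views[name] = flat[offset:offset + n]
    offset += n

# Writing through a view changes the shared flat buffer ...
views["weight"][0] = 1.0
views["bias"][1] = 2.0
print(storage.tolist())  # [1.0, 0.0, 0.0, 0.0, 2.0]

# ... and an in-place update of the flat buffer is visible through every view.
for i in range(len(storage)):
    storage[i] *= 10
print(views["bias"].tolist())  # [0.0, 20.0]
```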

bagua_setter_closure(self, tensor)

Sets the effective tensor used at runtime to the new PyTorch tensor given by tensor.

Parameters

tensor (torch.Tensor) – The new effective tensor.

ensure_bagua_tensor(self, name=None, module_name=None, getter_closure=None, setter_closure=None)

Convert a PyTorch tensor or parameter to Bagua tensor inplace and return it. A Bagua tensor is required to use Bagua’s communication algorithms.

This operation registers self as a proxy tensor with the Bagua backend. getter_closure takes the proxy tensor as input and returns a PyTorch tensor. Whenever the Bagua tensor is used, getter_closure is called and returns the effective tensor, which is then used for communication and other operations. For example, if one of a model’s parameters, param, is registered as a proxy tensor and getter_closure is lambda x: x.grad, the parameter's gradient will be used at runtime.

setter_closure takes the proxy tensor and another tensor as inputs and returns nothing. It is mainly used to change the effective tensor used at runtime. For example, when one of a model’s parameters, param, is registered as a proxy tensor and getter_closure is lambda x: x.grad, setter_closure can be lambda param, new_grad_tensor: setattr(param, "grad", new_grad_tensor). After setter_closure is called, later operations use new_grad_tensor as the effective tensor.
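The getter/setter pair from the paragraph above can be exercised on its own. In the sketch below, `Param` is a hypothetical stand-in for a model parameter and lists stand in for tensors. The two lambdas are the ones given in the text.

```python
from typing import Callable, List, Optional


class Param:
    """Hypothetical stand-in for a model parameter with a .grad attribute."""

    def __init__(self) -> None:
        self.grad: Optional[List[float]] = None


getter_closure: Callable[[Param], Optional[List[float]]] = lambda x: x.grad
setter_closure: Callable[[Param, List[float]], None] = (
    lambda param, new_grad_tensor: setattr(param, "grad", new_grad_tensor)
)

param = Param()
param.grad = [1.0, 1.0]
print(getter_closure(param))  # [1.0, 1.0]

new_grad_tensor = [0.5, 0.5]
setter_closure(param, new_grad_tensor)  # returns None; mutates the proxy
# Every later lookup through the getter now yields the replacement.
assert getter_closure(param) is new_grad_tensor
```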

Parameters
  • name (Optional[str]) – The unique name of the tensor.

  • module_name (Optional[str]) – The name of the model to which the tensor belongs. The model name can be obtained with model.bagua_module_name. This is required to call the bagua_mark_communication_ready related methods.

  • getter_closure (Optional[Callable[[torch.Tensor], torch.Tensor]]) – A function that accepts a PyTorch tensor as its input and returns a PyTorch tensor as its output. Could be None, which means the identity mapping lambda x: x is used. Default: None.

  • setter_closure (Optional[Callable[[torch.Tensor, torch.Tensor], None]]) – A function that accepts two PyTorch tensors as its inputs and returns nothing. Could be None, in which case it is a no-op. Default: None.
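The documented defaults, an identity getter and a no-op setter, can be expressed as a small resolution step. `resolve_closures` below is a hypothetical helper written for illustration. It is not part of Bagua's API.

```python
from typing import Any, Callable, Optional, Tuple

Getter = Callable[[Any], Any]
Setter = Callable[[Any, Any], None]


def resolve_closures(
    getter_closure: Optional[Getter] = None,
    setter_closure: Optional[Setter] = None,
) -> Tuple[Getter, Setter]:
    # None for the getter: the proxy tensor is itself the effective tensor.
    if getter_closure is None:
        getter_closure = lambda x: x
    # None for the setter: replacing the effective tensor does nothing.
    if setter_closure is None:
        setter_closure = lambda proxy, new: None
    return getter_closure, setter_closure


get, set_ = resolve_closures()
proxy = object()
assert get(proxy) is proxy             # identity mapping
assert set_(proxy, object()) is None   # no-op
```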

Returns

The original tensor with Bagua tensor attributes initialized.

is_bagua_tensor(self)

Check whether this is a Bagua tensor.

Return type

bool

to_bagua_tensor(self, name=None, module_name=None, getter_closure=None, setter_closure=None)

Create a new Bagua tensor from a PyTorch tensor or parameter and return it. The new Bagua tensor shares the same storage as the input PyTorch tensor. A Bagua tensor is required to use Bagua’s communication algorithms. See ensure_bagua_tensor for more information.

Caveat: if the original tensor is later switched to a different storage, for example with torch.Tensor.set_(...), the new Bagua tensor will still use the old storage.
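The caveat can be modeled with plain Python objects. In this sketch, `FakeTensor` is a hypothetical class whose `storage` attribute is a shared list. `to_bagua` mimics creating a second tensor over the same storage. `FakeTensor.set_` mimics rebinding the original tensor to a different storage, loosely following torch.Tensor.set_. None of these are Bagua or PyTorch APIs.

```python
from typing import List


class FakeTensor:
    """Hypothetical tensor: just a handle to a (possibly shared) storage list."""

    def __init__(self, storage: List[float]) -> None:
        self.storage = storage

    def set_(self, storage: List[float]) -> "FakeTensor":
        # Rebind this handle to a different storage, loosely like torch.Tensor.set_.
        self.storage = storage
        return self


def to_bagua(t: FakeTensor) -> FakeTensor:
    # New handle, same underlying storage object.
    return FakeTensor(t.storage)


original = FakeTensor([1.0, 2.0])
bagua_view = to_bagua(original)

original.storage[0] = 9.0   # in-place write: visible through both handles
print(bagua_view.storage)   # [9.0, 2.0]

original.set_([5.0, 6.0])   # the original switches to a new storage ...
original.storage[0] = 7.0
print(bagua_view.storage)   # [9.0, 2.0]  ... the Bagua handle does not follow
```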

Parameters
  • name (Optional[str]) – The unique name of the tensor.

  • module_name (Optional[str]) – The name of the model to which the tensor belongs. The model name can be obtained with model.bagua_module_name. This is required to call the bagua_mark_communication_ready related methods.

  • getter_closure (Optional[Callable[[torch.Tensor], torch.Tensor]]) – A function that accepts a PyTorch tensor as its input and returns a PyTorch tensor as its output. See ensure_bagua_tensor.

  • setter_closure (Optional[Callable[[torch.Tensor, torch.Tensor], None]]) – A function that accepts two PyTorch tensors as its inputs and returns nothing. See ensure_bagua_tensor.

Returns

The new Bagua tensor, which shares the same storage as the original tensor.