bagua.torch_api.tensor
Module Contents
- class bagua.torch_api.tensor.BaguaTensor
This class patches torch.Tensor with additional methods.
A Bagua tensor is required to use Bagua’s communication algorithms. Users can convert a PyTorch tensor to a Bagua tensor with ensure_bagua_tensor.
A Bagua tensor features a proxy structure, where the actual tensor used by the backend is accessed via a “Proxy Tensor”. The proxy tensor is registered in Bagua. Whenever the Bagua backend needs a tensor (for example, to use it for communication), it calls bagua_getter_closure on the proxy tensor to get the tensor that is actually worked on. We call this tensor the “Effective Tensor”. bagua_setter_closure is also provided to replace the effective tensor during runtime. It is intended for replacing the effective tensor in customized workflows.
Their relation can be seen in the following diagram:
For example, in the gradient allreduce algorithm, the effective tensor that needs to be exchanged between machines is the gradient. In this case, we register the model parameters as proxy tensors and set bagua_getter_closure to lambda proxy_tensor: proxy_tensor.grad. In this way, even if the gradient tensor is recreated or changed during runtime, Bagua can still identify the correct tensor and use it for communication, since the proxy tensor serves as the root for access and is never replaced.
- bagua_backend_tensor()
- Returns:
The raw Bagua backend tensor.
- Return type:
bagua_core.BaguaTensorPy
- bagua_ensure_grad()
Creates a zero gradient tensor for the current parameter if one does not already exist.
- Returns:
The original tensor.
- Return type:
torch.Tensor
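The "create if missing" behavior described for bagua_ensure_grad can be sketched in plain PyTorch. This is only an illustration of the documented contract, not Bagua's implementation; the helper name ensure_grad is hypothetical.

```python
import torch


def ensure_grad(param: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper mirroring the documented behavior (not Bagua's code)."""
    if param.grad is None:
        # No gradient yet: allocate zeros matching the parameter's shape, dtype and device.
        param.grad = torch.zeros_like(param)
    # An existing gradient is left untouched; the same tensor object is returned.
    return param


p = torch.nn.Parameter(torch.ones(3))
same = ensure_grad(p)   # p.grad is now a zero tensor
p.grad.fill_(2.0)       # pretend a backward pass wrote into the gradient
ensure_grad(p)          # a second call must not reset the existing gradient
```

Returning the original tensor, as the documentation states, allows the call to be chained with other tensor methods.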
- bagua_getter_closure()
Returns the tensor that will be used at runtime.
- Return type:
torch.Tensor
- bagua_mark_communication_ready()
Marks a Bagua tensor as ready for the execution of scheduled operations.
- bagua_mark_communication_ready_without_synchronization()
Marks a Bagua tensor as ready immediately, without CUDA event synchronization.
- bagua_set_storage(storage, storage_offset=0)
Sets the underlying storage of the effective tensor returned by bagua_getter_closure to an existing torch.Storage.
- Parameters:
storage (torch.Storage) – The storage to use.
storage_offset (int) – The offset in the storage.
- bagua_setter_closure(tensor)
Sets the tensor that will be used at runtime to a new PyTorch tensor, tensor.
- Parameters:
tensor (torch.Tensor) – The new tensor to set.
- ensure_bagua_tensor(name=None, module_name=None, getter_closure=None, setter_closure=None)
Convert a PyTorch tensor or parameter to a Bagua tensor in place and return it. A Bagua tensor is required to use Bagua’s communication algorithms.
This operation registers self as a proxy tensor with the Bagua backend.
getter_closure takes the proxy tensor as input and returns a PyTorch tensor. When the Bagua tensor is used, getter_closure is called and returns the effective tensor, which is used for communication and other operations. For example, if one of a model’s parameters, param, is registered as a proxy tensor and getter_closure is lambda x: x.grad, its gradient will be used at runtime.
setter_closure takes the proxy tensor and another tensor as inputs and returns nothing. It is mainly used for changing the effective tensor used at runtime. For example, when one of a model’s parameters, param, is registered as a proxy tensor and getter_closure is lambda x: x.grad, setter_closure can be lambda param, new_grad_tensor: setattr(param, "grad", new_grad_tensor). When setter_closure is called, the effective tensor used in later operations is changed to new_grad_tensor.
- Parameters:
name (Optional[str]) – The unique name of the tensor.
module_name (Optional[str]) – The name of the model to which the tensor belongs. The model name can be acquired using model.bagua_module_name. This is required to call bagua_mark_communication_ready and related methods.
getter_closure (Optional[Callable[[torch.Tensor], torch.Tensor]]) – A function that accepts a PyTorch tensor as its input and returns a PyTorch tensor as its output. Can be None, in which case the identity mapping lambda x: x is used. Default: None.
setter_closure (Optional[Callable[[torch.Tensor, torch.Tensor], None]]) – A function that accepts two PyTorch tensors as its inputs and returns nothing. Can be None, in which case it is a no-op. Default: None.
- Returns:
The original tensor with Bagua tensor attributes initialized.
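The interplay of proxy tensor, getter_closure and setter_closure described above can be modeled in a few lines of plain Python. The sketch below is not Bagua's implementation and does not call the Bagua API. FakeParam and ProxyEntry are hypothetical stand-ins, and Python lists play the role of gradient tensors, so the example runs without a GPU or an initialized Bagua backend.

```python
class FakeParam:
    """Hypothetical stand-in for a model parameter; it only carries a .grad slot."""

    def __init__(self, name):
        self.name = name
        self.grad = None


class ProxyEntry:
    """Hypothetical model of a registered proxy tensor (not a Bagua class)."""

    def __init__(self, proxy, getter_closure=None, setter_closure=None):
        self.proxy = proxy
        # None means the identity mapping, as the parameter description states.
        self._getter = (lambda x: x) if getter_closure is None else getter_closure
        # None means a no-op, as the parameter description states.
        self._setter = (lambda x, new: None) if setter_closure is None else setter_closure

    def effective(self):
        # Resolve the effective tensor through the proxy on every access.
        return self._getter(self.proxy)

    def replace_effective(self, new_tensor):
        # Swap the effective tensor without replacing the proxy itself.
        self._setter(self.proxy, new_tensor)


param = FakeParam("fc.weight")
entry = ProxyEntry(
    param,
    getter_closure=lambda x: x.grad,
    setter_closure=lambda p, new_grad: setattr(p, "grad", new_grad),
)

param.grad = [0.1, 0.2]
first = entry.effective()

param.grad = [0.3, 0.4]        # the gradient object is recreated
second = entry.effective()     # still found, because lookup starts at the proxy

entry.replace_effective([9.0])
third = entry.effective()

identity = ProxyEntry(param).effective()  # default getter returns the proxy itself
```

Because the lookup always starts from the proxy, the registration survives the gradient being reassigned, which is the property the gradient allreduce example relies on.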
- is_bagua_tensor()
Checks whether this is a Bagua tensor.
- Return type:
bool
- to_bagua_tensor(name=None, module_name=None, getter_closure=None, setter_closure=None)
Create a new Bagua tensor from a PyTorch tensor or parameter and return it. The new Bagua tensor shares the same storage with the input PyTorch tensor. A Bagua tensor is required to use Bagua’s communication algorithms. See ensure_bagua_tensor for more information.
Caveat: If the original tensor later switches to a different storage, for example via torch.Tensor.set_(...), the new Bagua tensor will still use the old storage.
- Parameters:
name (Optional[str]) – The unique name of the tensor.
module_name (Optional[str]) – The name of the model to which the tensor belongs. The model name can be acquired using model.bagua_module_name. This is required to call bagua_mark_communication_ready and related methods.
getter_closure (Optional[Callable[[torch.Tensor], torch.Tensor]]) – A function that accepts a PyTorch tensor as its input and returns a PyTorch tensor as its output. See ensure_bagua_tensor.
setter_closure (Optional[Callable[[torch.Tensor, torch.Tensor], None]]) – A function that accepts two PyTorch tensors as its inputs and returns nothing. See ensure_bagua_tensor.
- Returns:
The new Bagua tensor sharing the same storage with the original tensor.
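The storage caveat can be reproduced with plain PyTorch tensors. In the sketch below, alias is an ordinary tensor standing in for the new Bagua tensor. How Bagua creates the shared-storage tensor internally is not shown here and is not assumed. Only the aliasing behavior of torch.Tensor.set_ is demonstrated.

```python
import torch

original = torch.zeros(4)

# Stand-in for the new tensor: a separate tensor object sharing original's storage.
alias = torch.empty(0)
alias.set_(original)
shares_before = alias.data_ptr() == original.data_ptr()

# The original tensor switches to a different storage.
original.set_(torch.ones(4))
shares_after = alias.data_ptr() == original.data_ptr()

# alias still reads the old storage; later writes to original no longer reach it.
alias_values = alias.tolist()
original_values = original.tolist()
```

If the two tensors must stay in sync, the stand-in has to be pointed at the new storage again after such a change.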