bagua.torch_api.bucket
Module Contents
- class bagua.torch_api.bucket.BaguaBucket(tensors, name, flatten, alignment=1)
Create a Bagua bucket with a list of Bagua tensors.
- Parameters
tensors – A list of Bagua tensors to be put in the bucket.
name – The unique name of the bucket.
flatten – If True, flatten the input tensors so that they are contiguous in memory.
alignment – If alignment > 1, Bagua adds a padding tensor to the bucket so that the total number of elements in the bucket is a multiple of the given alignment.
- name
The bucket’s name.
- tensors
The tensors contained within the bucket.
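The alignment parameter can be illustrated with a little arithmetic. The following is a minimal sketch, assuming the padding tensor exists to round the bucket's element count up to the next multiple of alignment; the helper name padding_elements is hypothetical and not part of Bagua.

```python
def padding_elements(total_elements: int, alignment: int = 1) -> int:
    """Return how many padding elements round total_elements up to a
    multiple of alignment (hypothetical helper, not a Bagua function)."""
    if alignment < 1:
        raise ValueError("alignment must be a positive integer")
    # Python's modulo of a negative number is non-negative, so this is
    # the distance from total_elements up to the next multiple.
    return (-total_elements) % alignment


# A bucket holding 10 elements with alignment=4 would need 2 padding
# elements to reach 12; with alignment=1 no padding is needed.
print(padding_elements(10, 4))
print(padding_elements(10, 1))
```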
- append_centralized_synchronous_op(self, hierarchical=False, average=True, scattergather=False, compression=None)
Append a centralized synchronous operation to a bucket. It will sum or average the tensors in the bucket for all workers.
The operations will be executed by the Bagua backend in the order they are appended when all the tensors within the bucket are marked ready.
- Parameters
hierarchical (bool) – Enable hierarchical communication: the GPUs on the same machine communicate with each other first, and the machines then communicate with one another (inter-node). This can boost performance when the inter-node communication cost is high.
average (bool) – If True, the gradients on each worker are averaged. Otherwise, they are summed.
scattergather (bool) – If True, the communication between workers is done with scatter-gather instead of allreduce. This is required for using compression.
compression (Optional[str]) – If not None, the tensors will be compressed for communication. Currently “MinMaxUInt8” is supported.
- Returns
The bucket itself.
- Return type
BaguaBucket
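The effect of the average and hierarchical flags can be sketched without any GPUs. The code below is a plain-Python simulation, not Bagua's implementation: each "tensor" is a list of floats, and the function names allreduce and hierarchical_allreduce are illustrative assumptions. It shows that the hierarchical variant (reduce within each machine, then across machines) should produce the same values as a flat reduction.

```python
from typing import List

Tensor = List[float]


def allreduce(worker_tensors: List[Tensor], average: bool = True) -> List[Tensor]:
    """Element-wise sum (or mean) across workers; every worker ends up
    with the same result. Illustrative only."""
    n = len(worker_tensors)
    reduced = [sum(vals) for vals in zip(*worker_tensors)]
    if average:
        reduced = [v / n for v in reduced]
    return [list(reduced) for _ in range(n)]


def hierarchical_allreduce(nodes: List[List[Tensor]], average: bool = True) -> List[List[Tensor]]:
    """nodes[i] holds the tensors of the GPUs on machine i."""
    # Stage 1: GPUs on the same machine reduce among themselves.
    node_sums = [[sum(vals) for vals in zip(*gpus)] for gpus in nodes]
    # Stage 2: machines reduce the per-machine results (inter-node step).
    total = [sum(vals) for vals in zip(*node_sums)]
    if average:
        world_size = sum(len(gpus) for gpus in nodes)
        total = [v / world_size for v in total]
    # Stage 3: every GPU receives a copy of the final result.
    return [[list(total) for _ in gpus] for gpus in nodes]


workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
flat = allreduce(workers, average=True)
two_machines = hierarchical_allreduce([workers[:2], workers[2:]], average=True)
print(flat[0], two_machines[0][0])
```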
- append_decentralized_synchronous_op(self, hierarchical=True, peer_selection_mode='all', communication_interval=1)
Append a decentralized synchronous operation to a bucket. It will do gossip-style model averaging among workers.
The operations will be executed by the Bagua backend in the order they are appended when all the tensors within the bucket are marked ready.
- Parameters
hierarchical (bool) – Enable hierarchical communication: the GPUs on the same machine communicate with each other first, and the machines then communicate with one another (inter-node). This can boost performance when the inter-node communication cost is high.
peer_selection_mode (str) – Can be “all” or “shift_one”. “all” means all workers’ weights are averaged in each communication step. “shift_one” means each worker selects a different peer to average weights with in each communication step.
communication_interval (int) – Number of iterations between two communication steps.
- Returns
The bucket itself.
- Return type
BaguaBucket
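The two peer selection modes and the communication interval can be sketched as a toy simulation. This is not Bagua's code: each worker's model is reduced to a single float, and the pairing schedule in shift_one_peer is an assumption chosen for illustration (the documentation above does not specify which peer is selected in each step). It assumes an even number of workers.

```python
from typing import List


def shift_one_peer(rank: int, world_size: int, step: int) -> int:
    """Hypothetical pairing schedule: the first half of the ranks is
    matched with the second half, and the matching rotates each step."""
    half = world_size // 2
    if rank < half:
        return half + (rank + step) % half
    return (rank - half - step) % half


def decentralized_step(weights: List[float], step: int,
                       peer_selection_mode: str = "all",
                       communication_interval: int = 1) -> List[float]:
    """Return the workers' weights after one training iteration's
    communication (or unchanged, if this iteration is skipped)."""
    if step % communication_interval != 0:
        return list(weights)  # no communication in this iteration
    n = len(weights)
    if peer_selection_mode == "all":
        mean = sum(weights) / n
        return [mean] * n
    if peer_selection_mode == "shift_one":
        # Each worker averages only with its current peer.
        return [(weights[r] + weights[shift_one_peer(r, n, step)]) / 2
                for r in range(n)]
    raise ValueError(f"unknown peer_selection_mode: {peer_selection_mode!r}")


weights = [0.0, 2.0, 4.0, 6.0]
print(decentralized_step(weights, step=0, peer_selection_mode="all"))
print(decentralized_step(weights, step=0, peer_selection_mode="shift_one"))
```

With "shift_one" each worker exchanges data with one peer per step instead of all of them, and the pairwise averaging leaves the global sum of the weights unchanged.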
- append_python_op(self, python_function)
Append a Python operation to a bucket. A Python operation is a Python function that takes the bucket’s name and returns None. It can do arbitrary things within the function body.
The operations will be executed by the Bagua backend in the order they are appended when all the tensors within the bucket are marked ready.
- Parameters
python_function (Callable[[str], None]) – The Python operation function.
- Returns
The bucket itself.
- Return type
BaguaBucket
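The execution model described for appended operations (run in append order once every tensor in the bucket is marked ready) can be mimicked with a small stand-in class. ToyBucket and its mark_ready method are hypothetical and exist only for this sketch; they are not Bagua's BaguaBucket, which hands the operations to the Bagua backend.

```python
from typing import Callable, List, Set


class ToyBucket:
    """Stand-in that imitates the append/ready behavior described above."""

    def __init__(self, name: str, tensor_names: List[str]) -> None:
        self.name = name
        self._tensor_names: Set[str] = set(tensor_names)
        self._ready: Set[str] = set()
        self._ops: List[Callable[[str], None]] = []

    def append_python_op(self, python_function: Callable[[str], None]) -> "ToyBucket":
        self._ops.append(python_function)
        return self  # returning the bucket allows chained appends

    def clear_ops(self) -> None:
        self._ops.clear()

    def mark_ready(self, tensor_name: str) -> None:
        self._ready.add(tensor_name)
        if self._ready == self._tensor_names:
            # All tensors are ready: run the ops in the order appended.
            for op in self._ops:
                op(self.name)
            self._ready.clear()  # reset for the next round (toy behavior)


log = []
bucket = ToyBucket("bucket_0", ["weight", "bias"])
bucket.append_python_op(lambda name: log.append(("first", name))) \
      .append_python_op(lambda name: log.append(("second", name)))
bucket.mark_ready("weight")  # nothing runs yet, "bias" is still pending
bucket.mark_ready("bias")    # both ops now run, in append order
print(log)
```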
- check_flatten(self)
- Returns
True if the bucket’s tensors are contiguous in memory.
- Return type
bool
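What "contiguous in memory" means for a list of tensors can be sketched as follows. This is an illustrative model rather than Bagua's actual check: each tensor is described by a hypothetical (start_address, num_elements, element_size) tuple, and the tensors count as flattened when each one begins exactly where the previous one ends.

```python
from typing import List, Tuple

# (start address in bytes, number of elements, bytes per element)
TensorLayout = Tuple[int, int, int]


def is_flattened(tensors: List[TensorLayout]) -> bool:
    """Return True if the tensors sit back-to-back in memory, in order."""
    for (addr, numel, size), (next_addr, _, _) in zip(tensors, tensors[1:]):
        if addr + numel * size != next_addr:
            return False
    return True


# Three float32 tensors packed back-to-back, then two with a gap between them.
print(is_flattened([(0, 4, 4), (16, 2, 4), (24, 3, 4)]))
print(is_flattened([(0, 4, 4), (32, 2, 4)]))
```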
- clear_ops(self)
Clear the previously appended operations.
- Return type