Safemotion Lib
Functions
    int get_world_size ()
    int get_rank ()
    int get_local_rank ()
    int get_local_size ()
    bool is_main_process ()
    synchronize ()
    _get_global_gloo_group ()
    _serialize_to_tensor (data, group)
    _pad_to_largest_tensor (tensor, group)
    all_gather (data, group=None)
    gather (data, dst=0, group=None)
    shared_random_seed ()
    reduce_dict (input_dict, average=True)

Variables
    _LOCAL_PROCESS_GROUP = None
This file contains primitives for multi-GPU communication, which is useful for distributed training.
fastreid.utils.comm._get_global_gloo_group () [protected]
Return a process group based on the gloo backend, containing all the ranks. The result is cached.
Definition at line 82 of file comm.py.
fastreid.utils.comm._pad_to_largest_tensor (tensor, group) [protected]
Returns:
    list[int]: size of the tensor, on each rank
    Tensor: padded tensor that has the max size
Definition at line 111 of file comm.py.
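To make the padding contract concrete, here is a plain-Python sketch using byte strings in place of torch tensors. `pad_to_largest` is a hypothetical stand-in, not the library's implementation, which operates on CUDA/gloo tensors:

```python
import pickle

def pad_to_largest(payloads):
    """Sketch of the _pad_to_largest_tensor contract with plain bytes:
    return the original size on each rank plus zero-padded payloads,
    all grown to the largest size."""
    sizes = [len(p) for p in payloads]                      # size of the "tensor" on each rank
    max_size = max(sizes)
    padded = [p + b"\x00" * (max_size - len(p)) for p in payloads]
    return sizes, padded

# Each rank serializes a different picklable object, so sizes differ.
payloads = [pickle.dumps(obj) for obj in ("a", [1, 2, 3], {"k": 4.0})]
sizes, padded = pad_to_largest(payloads)
assert all(len(p) == max(sizes) for p in padded)
```

Padding to a common size is what lets the subsequent all_gather use fixed-size buffers; the recorded sizes let each receiver strip the padding again.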
fastreid.utils.comm._serialize_to_tensor (data, group) [protected]
Definition at line 93 of file comm.py.
fastreid.utils.comm.all_gather (data, group=None)
Run all_gather on arbitrary picklable data (not necessarily tensors).
Args:
data: any picklable object
group: a torch process group. By default, will use a group which
contains all ranks on gloo backend.
Returns:
list[data]: list of data gathered from each rank
Definition at line 138 of file comm.py.
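The contract above can be illustrated with a single-process sketch, where a list stands in for the per-rank processes (a simulation of the semantics, not the real collective call):

```python
import pickle

def simulated_all_gather(per_rank_data):
    """Single-process sketch of all_gather semantics: every rank ends up
    with the full list of (picklable) objects from all ranks, in rank order."""
    blobs = [pickle.dumps(d) for d in per_rank_data]   # each rank serializes its object
    gathered = [pickle.loads(b) for b in blobs]        # every rank deserializes all blobs
    return [gathered for _ in per_rank_data]           # one identical result list per rank

results = simulated_all_gather([{"rank": 0}, {"rank": 1}])
assert results[0] == results[1] == [{"rank": 0}, {"rank": 1}]
```

Because the data is pickled rather than passed as tensors, arbitrary Python objects (metrics dicts, file lists, etc.) can be exchanged.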
fastreid.utils.comm.gather (data, dst=0, group=None)
Run gather on arbitrary picklable data (not necessarily tensors).
Args:
data: any picklable object
dst (int): destination rank
group: a torch process group. By default, will use a group which
contains all ranks on gloo backend.
Returns:
list[data]: on dst, a list of data gathered from each rank. Otherwise,
an empty list.
Definition at line 174 of file comm.py.
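The asymmetry between the destination rank and all other ranks can be sketched in plain Python (a simulation of the documented return values, not the actual collective):

```python
def simulated_gather(per_rank_data, dst=0):
    """Sketch of gather semantics: only the destination rank receives the
    full list of objects; every other rank gets an empty list."""
    return [list(per_rank_data) if rank == dst else []
            for rank in range(len(per_rank_data))]

out = simulated_gather(["r0", "r1", "r2"], dst=1)
assert out[1] == ["r0", "r1", "r2"]
assert out[0] == [] and out[2] == []
```

Use gather instead of all_gather when only one process (e.g. the one doing evaluation or logging) needs the combined data, which halves the communication volume.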
int fastreid.utils.comm.get_local_rank ()
Returns:
The rank of the current process within the local (per-machine) process group.
Definition at line 36 of file comm.py.
int fastreid.utils.comm.get_local_size ()
Returns:
The size of the per-machine process group,
i.e. the number of processes per machine.
Definition at line 49 of file comm.py.
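Typical launchers assign contiguous global ranks per machine, in which case global and local ranks relate as sketched below. Note this layout is an assumption about the launcher, not a guarantee made by comm.py, and `global_rank` is a hypothetical helper for illustration:

```python
def global_rank(machine_idx, local_rank, local_size):
    """Hypothetical mapping for launchers that assign contiguous
    global ranks machine by machine (an assumption, not a comm.py API)."""
    return machine_idx * local_size + local_rank

# 2 machines x 4 GPUs each: the GPU with local rank 2 on the
# second machine (index 1) has global rank 6.
assert global_rank(machine_idx=1, local_rank=2, local_size=4) == 6
```

In practice the local rank is what you bind devices to (e.g. selecting which GPU a process uses), while the global rank identifies the process across the whole job.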
int fastreid.utils.comm.get_rank ()

int fastreid.utils.comm.get_world_size ()

bool fastreid.utils.comm.is_main_process ()
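A common convention, which this kind of helper typically follows, is to treat rank 0 as the main process and guard side effects (logging, checkpointing) behind it. The sketch below assumes that rank-0 convention:

```python
def is_main_process_sketch(rank):
    """Sketch assuming the usual convention: the main process is rank 0."""
    return rank == 0

# Typical guard pattern: only the main process writes logs or checkpoints,
# so multi-process runs do not clobber each other's output files.
messages = []
for rank in range(4):
    if is_main_process_sketch(rank):
        messages.append(f"rank {rank}: saving checkpoint")
assert messages == ["rank 0: saving checkpoint"]
```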
fastreid.utils.comm.reduce_dict (input_dict, average=True)
Reduce the values in the dictionary from all processes so that process with rank
0 has the reduced results.
Args:
    input_dict (dict): inputs to be reduced. All the values must be scalar CUDA tensors.
    average (bool): whether to average the values across processes or just sum them
Returns:
a dict with the same keys as input_dict, after reduction.
Definition at line 228 of file comm.py.
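The reduction semantics can be sketched in plain Python, with a list of dicts standing in for the per-rank inputs (floats replace the scalar CUDA tensors of the real function):

```python
def simulated_reduce_dict(dicts_per_rank, average=True):
    """Sketch of reduce_dict semantics: key-wise sum of scalar values
    across ranks, optionally divided by the world size."""
    keys = sorted(dicts_per_rank[0])       # sorting keys keeps all ranks in agreement
    world_size = len(dicts_per_rank)
    reduced = {k: sum(d[k] for d in dicts_per_rank) for k in keys}
    if average:
        reduced = {k: v / world_size for k, v in reduced.items()}
    return reduced

# Two ranks report per-rank losses; rank 0 ends up with the mean.
losses = [{"loss_cls": 1.0, "loss_box": 2.0},
          {"loss_cls": 3.0, "loss_box": 4.0}]
assert simulated_reduce_dict(losses) == {"loss_cls": 2.0, "loss_box": 3.0}
```

This is the standard way to log training losses that are comparable across runs with different numbers of GPUs: each process computes its local loss dict, and the averaged dict is logged on rank 0.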
fastreid.utils.comm.shared_random_seed ()
Returns:
int: a random number that is the same across all workers.
If workers need a shared RNG, they can use this shared seed to
create one.
All workers must call this function, otherwise it will deadlock.
Definition at line 215 of file comm.py.
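Once every worker holds the same seed, seeding a local RNG with it yields identical random sequences everywhere. The sketch below simulates two workers in one process; the literal seed value stands in for whatever shared_random_seed() would return:

```python
import random

# A fixed value standing in for the seed that shared_random_seed()
# would broadcast to all workers.
shared_seed = 2023

# Two "workers" each build their own RNG from the shared seed ...
rng_worker_a = random.Random(shared_seed)
rng_worker_b = random.Random(shared_seed)

# ... and therefore draw exactly the same sequence of random numbers.
assert [rng_worker_a.random() for _ in range(5)] == \
       [rng_worker_b.random() for _ in range(5)]
```

This matters for things like data augmentation or shuffling that must agree across workers without any further communication.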
fastreid.utils.comm.synchronize ()
Helper function to synchronize (barrier) all processes when using distributed training.
Definition at line 66 of file comm.py.
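Barrier semantics can be demonstrated with threads standing in for processes: no worker passes the barrier until every worker has reached it. This uses `threading.Barrier` purely as an in-process analogue of the distributed barrier:

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
events = []
lock = threading.Lock()

def worker(rank):
    with lock:
        events.append(("before", rank))   # work done before the sync point
    barrier.wait()                        # analogous to comm.synchronize()
    with lock:
        events.append(("after", rank))    # work done after all workers arrived

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" event precedes every "after" event, regardless of scheduling.
assert all(tag == "before" for tag, _ in events[:NUM_WORKERS])
assert all(tag == "after" for tag, _ in events[NUM_WORKERS:])
```

A typical use is having the main process write a checkpoint while the others wait at the barrier before all of them read it back.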