Refers to the state of being spread across multiple devices. In machine learning, distributed training typically uses multiple GPUs, often on separate physical machines, to train a single model. This speeds up training by parallelizing the computational workload across the available processing units.
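
As a rough illustration, the sketch below uses PyTorch's `DistributedDataParallel` (one of several frameworks that implement this idea) to replicate a toy model across GPUs and average gradients between processes. The model, data, and hyperparameters are placeholders, not part of the original definition.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Each process (typically one per GPU) joins the same process group.
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for us.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Tiny placeholder model; any nn.Module works the same way.
    model = nn.Linear(32, 1).cuda(local_rank)

    # DDP keeps a replica of the model in every process and averages
    # gradients across processes after each backward pass.
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Dummy batch; in practice a DistributedSampler would shard the dataset
    # so each process trains on a different slice of the data.
    x = torch.randn(16, 32, device=local_rank)
    y = torch.randn(16, 1, device=local_rank)

    optimizer.zero_grad()
    loss = loss_fn(ddp_model(x), y)
    loss.backward()  # gradients are all-reduced across processes here
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Such a script would typically be launched with something like `torchrun --nproc_per_node=4 train.py`, which starts one process per GPU and wires up the process group.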