The Impact of Network Topology on Distributed Deep Learning Systems

Network Topology

Posted by Ezra on June 2, 2018

Collective Communication

- E. Chan et al.: Collective Communication: Theory, Practice, and Experience. CCPE'07
- T. Hoefler and D. Moor: Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations. JSFI'14

Hierarchical Parameter Server

- S. Gupta et al.: Model Accuracy and Runtime Tradeoff in Distributed Deep Learning: A Systematic Study. ICDM'16

Adaptive Minibatch Size

- S. L. Smith et al.: Don't Decay the Learning Rate, Increase the Batch Size. arXiv 2017
- P. H. Jin et al.: "How to Scale Distributed Deep Learning?" NIPS MLSystems 2016
- S. Zhang et al.: Deep Learning with Elastic Averaging SGD. NIPS'15
- T. G. Dietterich: Ensemble Methods in Machine Learning. MCS 2000

References

[1] S. Gupta et al.: Deep Learning with Limited Numerical Precision. ICML'15
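The collective-communication papers above analyze operations such as allreduce, whose ring variant sends each gradient chunk around a ring of workers in a reduce-scatter phase followed by an allgather phase. A minimal single-process simulation of that pattern (names and structure are illustrative, not from any library):

```python
def ring_allreduce(data):
    """Simulate ring allreduce. data[p][c] is the value of chunk c
    initially held by worker p (one chunk per worker). Returns the
    buffers after the allreduce: every worker ends up holding the
    sum of each chunk across all workers."""
    P = len(data)  # number of workers = number of chunks
    buf = [list(chunks) for chunks in data]

    # Reduce-scatter: in step s, worker p sends chunk (p - s) mod P to its
    # ring neighbor (p + 1) mod P, which accumulates it. After P - 1 steps,
    # worker p holds the complete sum of chunk (p + 1) mod P.
    for s in range(P - 1):
        sends = [(p, (p - s) % P, buf[p][(p - s) % P]) for p in range(P)]
        for p, c, val in sends:
            buf[(p + 1) % P][c] += val

    # Allgather: circulate the fully reduced chunks around the ring for
    # another P - 1 steps so every worker receives all P reduced chunks.
    for s in range(P - 1):
        sends = [(p, (p + 1 - s) % P, buf[p][(p + 1 - s) % P]) for p in range(P)]
        for p, c, val in sends:
            buf[(p + 1) % P][c] = val
    return buf
```

Each worker sends one chunk per step, so per-worker traffic is 2(P-1) chunks, roughly twice the gradient size regardless of worker count; this bandwidth-optimality is why the ring topology features so prominently in the literature cited above.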

[2] F. Li and B. Liu: Ternary Weight Networks. arXiv 2016

[3] F. Seide et al.: 1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs. Interspeech 2014

[4] C. Renggli et al.: SparCML: High-Performance Sparse Communication for Machine Learning. arXiv 2018
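References [1]–[4] all reduce communication volume by compressing gradients. A sketch in the spirit of the 1-bit SGD of Seide et al. [3], which keeps only the sign of each gradient element plus a per-bucket scale, and carries the quantization error forward as feedback into the next step (function and variable names are illustrative):

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize `grad` to one bit per element: a sign, plus one shared
    reconstruction magnitude per sign bucket. The quantization error left
    over from the previous step (`residual`) is added in first, and the
    new error is returned so the caller can carry it into the next step."""
    corrected = grad + residual
    positive = corrected >= 0
    # One reconstruction value per bucket: the mean of that bucket.
    pos_scale = corrected[positive].mean() if positive.any() else 0.0
    neg_scale = corrected[~positive].mean() if (~positive).any() else 0.0
    quantized = np.where(positive, pos_scale, neg_scale)
    return quantized, corrected - quantized
```

On the wire, only the sign bits and the two scalars would be transmitted, about a 32x reduction versus float32 gradients; the error-feedback residual is what lets such aggressive quantization converge in practice.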