Speeding Up Distributed Machine Learning Using Codes

IEEE Transactions on Information Theory, vol. 64, no. 3 (2018): 1514–1529


Abstract

Distributed machine learning algorithms that are widely run on modern large-scale computing platforms face several types of randomness, uncertainty and system “noise.” These include stragglers, system failures, maintenance outages, and communication bottlenecks. In this work, we view distributed machine learning algorithms through a coding-theoretic lens […]

Introduction
  • The computational paradigm for large-scale machine learning and data analytics has shifted towards massively large distributed systems, comprising individually small and unreliable computational nodes.
  • The workflow of distributed machine learning algorithms in a large-scale system can be decomposed into three functional phases: a storage, a communication, and a computation phase, as shown in Fig. 1.
  • In order to develop and deploy sophisticated solutions and tackle large-scale problems in machine learning, science, engineering, and commerce, it is important to understand and optimize novel and complex trade-offs across the multiple dimensions of computation, communication, storage, and the accuracy of results.
  • Codes have begun to transform the storage layer of distributed systems in modern data centers under the umbrella of regenerating and locally repairable codes for distributed storage [7]–[22], which are having a major impact on industry [23]–[26].
Highlights
  • We show how erasure codes can be applied to distributed computation to mitigate the straggler problem (see the coded-multiplication sketch after this list)
  • We describe one simple way of parallelizing the algorithm, which is implemented in many open-source machine learning libraries, including Spark MLlib [83] (a generic data-parallel sketch also follows this list)
  • We have explored the power of coding in order to make distributed algorithms robust to a variety of sources of “system noise” such as stragglers and communication bottlenecks
  • We propose Coded Shuffling that can significantly reduce the heavy price of data-shuffling, which is required for achieving high statistical efficiency in distributed machine learning algorithms
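
To make the first highlight concrete, the following is a minimal sketch of coded computation under a (3, 2) MDS (single-parity) code, with a simulated master and three workers. All names here are illustrative, not the paper's implementation: the data matrix is split into two row blocks plus one parity block, so the master can recover the full product from whichever two workers respond first, tolerating one straggler.

```python
import numpy as np

# (n, k) = (3, 2) MDS-coded matrix-vector multiplication.
# A is split into row blocks A1, A2; a third "parity" worker computes
# (A1 + A2) x, so A x is recoverable from ANY 2 of the 3 worker results.

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

A1, A2 = A[:2], A[2:]        # systematic row blocks
tasks = [A1, A2, A1 + A2]    # worker 2 holds the parity block

results = {i: Ai @ x for i, Ai in enumerate(tasks)}
finished = set(int(i) for i in rng.permutation(3)[:2])  # first 2 responders

# Decode A x from whichever two results arrived; the straggler is ignored.
if finished == {0, 1}:
    y = np.concatenate([results[0], results[1]])
elif finished == {0, 2}:
    y = np.concatenate([results[0], results[2] - results[0]])
else:  # finished == {1, 2}
    y = np.concatenate([results[2] - results[1], results[1]])

assert np.allclose(y, A @ x)  # exact result despite a missing worker
```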
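The data-parallel scheme in the second highlight can be sketched in the same spirit: each worker computes the gradient on its own data partition, and a master averages the partial gradients before taking a step. This is a generic least-squares illustration, not MLlib code.

```python
import numpy as np

# Data-parallel batch gradient descent for least squares:
# M "workers" each compute a local gradient on their own partition,
# and the master averages them and updates the model.

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 5))
y = X @ rng.standard_normal(5)             # noiseless planted model

M = 4
parts = np.array_split(np.arange(400), M)  # row indices per worker
w = np.zeros(5)
lr = 0.1

for _ in range(200):
    grads = [X[p].T @ (X[p] @ w - y[p]) / len(p) for p in parts]  # per worker
    w -= lr * np.mean(grads, axis=0)                              # master step

print(np.linalg.norm(X @ w - y))  # close to zero after convergence
```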
Results
  • The authors show that the runtime of the proposed algorithm can be significantly reduced compared to that of uncoded algorithms.
  • The authors propose to use coding opportunities to significantly reduce the communication cost of some distributed learning algorithms that require data shuffling.
  • The authors propose Coded Shuffling, which can significantly reduce the heavy price of data shuffling required for achieving high statistical efficiency in distributed machine learning algorithms (a toy illustration follows this list).
  • The authors' preliminary experimental results validate the power of the proposed schemes in effectively curtailing the negative effects of system bottlenecks, attaining speedups of up to 40% over current state-of-the-art methods.
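
As a toy illustration of the broadcast saving behind Coded Shuffling (schematic only; the paper's scheme handles general worker counts and cache contents): when each of two workers already caches the block the other one needs next, one coded broadcast replaces two unicast transmissions.

```python
import numpy as np

# Worker 1 caches block A and needs B; worker 2 caches B and needs A.
# Uncoded shuffling sends A and B separately; the coded broadcast sends
# the single XOR A ^ B, and each worker decodes using its cached block.

rng = np.random.default_rng(2)
A = rng.integers(0, 256, size=8, dtype=np.uint8)  # cached at worker 1
B = rng.integers(0, 256, size=8, dtype=np.uint8)  # cached at worker 2

coded = A ^ B              # one broadcast instead of two unicasts

B_at_w1 = coded ^ A        # worker 1 recovers B from its cache
A_at_w2 = coded ^ B        # worker 2 recovers A from its cache

assert np.array_equal(B_at_w1, B) and np.array_equal(A_at_w2, A)
```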
Conclusion
  • The authors have explored the power of coding in order to make distributed algorithms robust to a variety of sources of “system noise” such as stragglers and communication bottlenecks.
  • Since matrix multiplication is one of the most basic computational blocks in many analytics workloads, it would be interesting to leverage coding for a broader class of distributed algorithms
Related work
  • A. Coded Computation and Straggler Mitigation

    The straggler problem has been widely observed in distributed computing clusters. The authors of [6] show that running a computational task at a computing node often involves unpredictable latency due to several factors such as network latency, shared resources, maintenance activities, and power limits. Further, they argue that stragglers cannot be completely removed from a distributed computing cluster. The authors of [27] characterize the impact and causes of stragglers that arise due to resource contention, disk failures, varying network conditions, and imbalanced workload.

    One approach to mitigating the adverse effects of stragglers is efficient straggler detection. For instance, the default Hadoop scheduler constantly monitors for stragglers while running computational tasks; whenever it detects one, it relaunches the affected task on another available node. In [28], Zaharia et al. propose a modification to the existing straggler detection algorithm and show that it effectively reduces the completion time of MapReduce tasks. In [27], Ananthanarayanan et al. propose a system that detects stragglers in real time from task progress and cancels them, further reducing the runtime of MapReduce tasks.
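
A bare-bones version of this detect-and-relaunch policy can be sketched as follows: launch a backup copy of a task if the primary copy misses a deadline, and take whichever copy finishes first. The deadline and task function are illustrative; this is not Hadoop's actual scheduler logic.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_speculatively(task, pool, deadline_s=1.0):
    """Relaunch `task` on another worker if it misses the deadline."""
    primary = pool.submit(task)
    done, _ = wait([primary], timeout=deadline_s)
    if done:
        return primary.result()
    backup = pool.submit(task)  # suspected straggler: launch a clone
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    return done.pop().result()  # first finisher wins

def task():
    time.sleep(random.choice([0.1, 5.0]))  # occasionally a straggler
    return "ok"

with ThreadPoolExecutor(max_workers=4) as pool:
    print(run_speculatively(task, pool))
```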
References
  • [1] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Speeding up distributed machine learning using codes,” IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514–1529, 2018.
  • [2] ——, “Speeding up distributed machine learning using codes,” presented at the 2015 Neural Information Processing Systems (NIPS) Workshop on Machine Learning Systems, Dec. 2015.
  • [3] ——, “Speeding up distributed machine learning using codes,” in Proc. IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 1143–1147.
  • [4] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proc. 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2010. [Online]. Available: https://www.usenix.org/conference/hotcloud-10/spark-cluster-computing-working-sets
  • [5] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” in Proc. 6th Symposium on Operating System Design and Implementation (OSDI), 2004, pp. 137–150. [Online]. Available: http://www.usenix.org/events/osdi04/tech/dean.html
  • [6] J. Dean and L. A. Barroso, “The tail at scale,” Commun. ACM, vol. 56, no. 2, pp. 74–80, 2013. [Online]. Available: http://doi.acm.org/10.1145/2408776.2408794
  • [7] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
  • [8] K. V. Rashmi, N. B. Shah, and P. V. Kumar, “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction,” IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 5227–5239, 2011.
  • [9] C. Suh and K. Ramchandran, “Exact-repair MDS code construction using interference alignment,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1425–1442, 2011.
  • [10] I. Tamo, Z. Wang, and J. Bruck, “MDS array codes with optimal rebuilding,” in Proc. IEEE International Symposium on Information Theory (ISIT), 2011, pp. 1240–1244.
  • [11] V. R. Cadambe, C. Huang, S. A. Jafar, and J. Li, “Optimal repair of MDS codes in distributed storage via subspace interference alignment,” arXiv preprint arXiv:1106.1250, 2011.
  • [12] D. S. Papailiopoulos, A. G. Dimakis, and V. R. Cadambe, “Repair optimal erasure codes through Hadamard designs,” in Proc. 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2011, pp. 1382–1389.
  • [13] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality of codeword symbols,” IEEE Transactions on Information Theory, vol. 58, no. 11, pp. 6925–6934, 2012.
  • [14] F. Oggier and A. Datta, “Self-repairing homomorphic codes for distributed storage systems,” in Proc. IEEE INFOCOM, 2011, pp. 1215–1223.
  • [15] D. S. Papailiopoulos, J. Luo, A. G. Dimakis, C. Huang, and J. Li, “Simple regenerating codes: Network coding for cloud storage,” in Proc. IEEE INFOCOM, 2012, pp. 2801–2805.
  • [16] J. Han and L. A. Lastras-Montano, “Reliable memories with subline accesses,” in Proc. IEEE International Symposium on Information Theory (ISIT), 2007, pp. 2531–2535.
  • [17] C. Huang, M. Chen, and J. Li, “Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems,” in Proc. 6th IEEE International Symposium on Network Computing and Applications (NCA), 2007, pp. 79–86.
  • [18] D. S. Papailiopoulos and A. G. Dimakis, “Locally repairable codes,” in Proc. IEEE International Symposium on Information Theory (ISIT), 2012, pp. 2771–2775.
  • [19] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with local regeneration,” arXiv preprint arXiv:1211.1932, 2012.
  • [20] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath, “Optimal locally repairable and secure codes for distributed storage systems,” IEEE Transactions on Information Theory, vol. 60, no. 1, pp. 212–236, Jan. 2014.
  • [21] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar, “Optimal linear codes with a local-error-correction property,” in Proc. IEEE International Symposium on Information Theory (ISIT), 2012, pp. 2776–2780.
  • [22] N. Silberstein, A. S. Rawat, and S. Vishwanath, “Error resilience in distributed storage via rank-metric codes,” CoRR, vol. abs/1202.0800, 2012.
  • [23] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin, “Erasure coding in Windows Azure Storage,” in Proc. USENIX Annual Technical Conference (ATC), June 2012.
  • [24] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, “XORing elephants: Novel erasure codes for big data,” Proc. VLDB Endowment, vol. 6, no. 5, pp. 325–336, 2013.
  • [25] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran, “A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster,” in Proc. USENIX HotStorage, June 2013.
  • [26] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran, “A hitchhiker’s guide to fast and efficient data reconstruction in erasure-coded data centers,” in Proc. ACM SIGCOMM, 2014, pp. 331–342.
  • [27] G. Ananthanarayanan, S. Kandula, A. G. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris, “Reining in the outliers in Map-Reduce clusters using Mantri,” in Proc. 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2010, pp. 265–278. [Online]. Available: http://www.usenix.org/events/osdi10/tech/full_papers/Ananthanarayanan.pdf
  • [28] M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica, “Improving MapReduce performance in heterogeneous environments,” in Proc. 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2008, pp. 29–42. [Online]. Available: http://www.usenix.org/events/osdi08/tech/full_papers/zaharia/zaharia.pdf
  • [29] A. Agarwal and J. C. Duchi, “Distributed delayed stochastic optimization,” in Proc. 25th Annual Conference on Neural Information Processing Systems (NIPS), 2011, pp. 873–881. [Online]. Available: http://papers.nips.cc/paper/4247-distributed-delayed-stochastic-optimization
  • [30] B. Recht, C. Re, S. Wright, and F. Niu, “Hogwild: A lock-free approach to parallelizing stochastic gradient descent,” in Proc. 25th Annual Conference on Neural Information Processing Systems (NIPS), 2011, pp. 693–701.
  • [31] G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica, “Effective straggler mitigation: Attack of the clones,” in Proc. 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013, pp. 185–198. [Online]. Available: https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/ananthanarayanan
  • [32] N. B. Shah, K. Lee, and K. Ramchandran, “When do redundant requests reduce latency?” in Proc. 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2013, pp. 731–738. [Online]. Available: http://dx.doi.org/10.1109/Allerton.2013.6736597
  • [33] D. Wang, G. Joshi, and G. W. Wornell, “Efficient task replication for fast response times in parallel computation,” in Proc. ACM SIGMETRICS, 2014, pp. 599–600.
  • [34] K. Gardner, S. Zbarsky, S. Doroudi, M. Harchol-Balter, and E. Hyytia, “Reducing latency via redundant requests: Exact analysis,” in Proc. ACM SIGMETRICS, 2015, pp. 347–360.
  • [35] M. Chaubey and E. Saule, “Replicated data placement for uncertain scheduling,” in Proc. IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015, pp. 464–472. [Online]. Available: http://dx.doi.org/10.1109/IPDPSW.2015.50
  • [36] K. Lee, R. Pedarsani, and K. Ramchandran, “On scheduling redundant requests with cancellation overheads,” in Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Oct. 2015.
  • [37] G. Joshi, E. Soljanin, and G. Wornell, “Efficient redundancy techniques for latency reduction in cloud systems,” ACM Trans. Model. Perform. Eval. Comput. Syst., vol. 2, no. 2, pp. 12:1–12:30, Apr. 2017. [Online]. Available: http://doi.acm.org/10.1145/3055281
  • [38] L. Huang, S. Pawar, H. Zhang, and K. Ramchandran, “Codes can reduce queueing delay in data centers,” in Proc. IEEE International Symposium on Information Theory (ISIT), July 2012, pp. 2766–2770.
  • [39] K. Lee, N. B. Shah, L. Huang, and K. Ramchandran, “The MDS queue: Analysing the latency performance of erasure codes,” IEEE Transactions on Information Theory, vol. 63, no. 5, pp. 2822–2842, May 2017.
  • [40] G. Joshi, Y. Liu, and E. Soljanin, “On the delay-storage trade-off in content download from coded distributed storage systems,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 5, pp. 989–997, 2014.
  • [41] Y. Sun, Z. Zheng, C. E. Koksal, K.-H. Kim, and N. B. Shroff, “Provably delay efficient data retrieving in storage clouds,” arXiv preprint arXiv:1501.01661, 2015.
  • [42] S. Kadhe, E. Soljanin, and A. Sprintson, “When do the availability codes make the stored data more available?” in Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sept. 2015, pp. 956–963.
  • [43] ——, “Analyzing the download time of availability codes,” in Proc. IEEE International Symposium on Information Theory (ISIT), June 2015.
  • [44] N. Ferdinand and S. Draper, “Anytime coding for distributed computation,” presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2016.
  • [45] S. Dutta, V. Cadambe, and P. Grover, “Short-Dot: Computing large linear transforms distributedly using coded short dot products,” in Advances in Neural Information Processing Systems, 2016, pp. 2092–2100.
  • [46] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding,” arXiv preprint arXiv:1612.03301, 2016.
  • [47] R. Bitar, P. Parag, and S. E. Rouayheb, “Minimizing latency for secure distributed computing,” arXiv preprint arXiv:1703.01504, 2017.
  • [48] K. Lee, C. Suh, and K. Ramchandran, “High-dimensional coded matrix multiplication,” in Proc. IEEE International Symposium on Information Theory (ISIT), June 2017.
  • [49] A. Reisizadehmobarakeh, S. Prakash, R. Pedarsani, and S. Avestimehr, “Coded computation over heterogeneous clusters,” arXiv preprint arXiv:1701.05973, 2017.
  • [50] K. Lee, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, “Coded computation for multicore setups,” in Proc. IEEE International Symposium on Information Theory (ISIT), June 2017.
  • [51] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
  • [52] A. Nedic and A. E. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009. [Online]. Available: http://dx.doi.org/10.1109/TAC.2008.2009515
  • [53] S. P. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011. [Online]. Available: http://dx.doi.org/10.1561/2200000016
  • [54] R. Bekkerman, M. Bilenko, and J. Langford, Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2011.
  • [55] J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual averaging for distributed optimization: Convergence analysis and network scaling,” IEEE Transactions on Automatic Control, vol. 57, no. 3, pp. 592–606, 2012. [Online]. Available: http://dx.doi.org/10.1109/TAC.2011.2161027
  • [56] J. Chen and A. H. Sayed, “Diffusion adaptation strategies for distributed optimization and learning over networks,” IEEE Transactions on Signal Processing, vol. 60, no. 8, pp. 4289–4305, 2012. [Online]. Available: http://dx.doi.org/10.1109/TSP.2012.2198470
  • [57] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. W. Senior, P. A. Tucker, K. Yang, and A. Y. Ng, “Large scale distributed deep networks,” in Proc. 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012, pp. 1232–1240. [Online]. Available: http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks
  • [58] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, “Distributed GraphLab: A framework for machine learning and data mining in the cloud,” Proc. VLDB Endowment, vol. 5, no. 8, pp. 716–727, 2012.
  • [59] T. Kraska, A. Talwalkar, J. C. Duchi, R. Griffith, M. J. Franklin, and M. I. Jordan, “MLbase: A distributed machine-learning system,” in Proc. 6th Biennial Conference on Innovative Data Systems Research (CIDR), 2013. [Online]. Available: http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper118.pdf
  • [60] E. R. Sparks, A. Talwalkar, V. Smith, J. Kottalam, X. Pan, J. E. Gonzalez, M. J. Franklin, M. I. Jordan, and T. Kraska, “MLI: An API for distributed machine learning,” in Proc. IEEE 13th International Conference on Data Mining (ICDM), 2013, pp. 1187–1192. [Online]. Available: http://dx.doi.org/10.1109/ICDM.2013.158
  • [61] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B. Su, “Scaling distributed machine learning with the parameter server,” in Proc. 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014, pp. 583–598. [Online]. Available: https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu
  • [62] B. Recht and C. Re, “Parallel stochastic gradient algorithms for large-scale matrix completion,” Mathematical Programming Computation, vol. 5, no. 2, pp. 201–226, 2013.
  • [63] L. Bottou, “Stochastic gradient descent tricks,” in Neural Networks: Tricks of the Trade, 2nd ed., 2012, pp. 421–436. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-35289-8_25
  • [64] C. Zhang and C. Re, “DimmWitted: A study of main-memory statistical analytics,” Proc. VLDB Endowment, vol. 7, no. 12, pp. 1283–1294, 2014.
  • [65] M. Gurbuzbalaban, A. Ozdaglar, and P. Parrilo, “Why random reshuffling beats stochastic gradient descent,” arXiv preprint arXiv:1510.08560, 2015.
  • [66] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
  • [67] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Transactions on Information Theory, vol. 60, no. 5, pp. 2856–2867, 2014.
  • [68] M. A. Maddah-Ali and U. Niesen, “Decentralized coded caching attains order-optimal memory-rate tradeoff,” IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1029–1040, 2015. [Online]. Available: http://dx.doi.org/10.1109/TNET.2014.2317316
  • [69] R. Pedarsani, M. A. Maddah-Ali, and U. Niesen, “Online coded caching,” in Proc. IEEE International Conference on Communications (ICC), 2014, pp. 1878–1883. [Online]. Available: http://dx.doi.org/10.1109/ICC.2014.6883597
  • [70] N. Karamchandani, U. Niesen, M. A. Maddah-Ali, and S. Diggavi, “Hierarchical coded caching,” in Proc. IEEE International Symposium on Information Theory (ISIT), 2014, pp. 2142–2146.
  • [71] M. Ji, G. Caire, and A. F. Molisch, “Fundamental limits of distributed caching in D2D wireless networks,” in Proc. IEEE Information Theory Workshop (ITW), 2013, pp. 1–5. [Online]. Available: http://dx.doi.org/10.1109/ITW.2013.6691247
  • [72] S. Li, M. A. Maddah-Ali, and S. Avestimehr, “Coded MapReduce,” presented at the 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2015.
  • [73] Y. Birk and T. Kol, “Coding on demand by an informed source (ISCOD) for efficient broadcast of different supplemental data to caching clients,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2825–2830, June 2006.
  • [74] Z. Bar-Yossef, Y. Birk, T. S. Jayram, and T. Kol, “Index coding with side information,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1479–1494, Mar. 2011.
  • [75] M. A. Attia and R. Tandon, “Information theoretic limits of data shuffling for distributed learning,” in Proc. IEEE Global Communications Conference (GLOBECOM), Dec. 2016, pp. 1–6.
  • [76] ——, “On the worst-case communication overhead for distributed data shuffling,” in Proc. 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Sept. 2016, pp. 961–968.
  • [77] L. Song and C. Fragouli, “A pliable index coding approach to data shuffling,” arXiv preprint arXiv:1701.05540, 2017.
  • [78] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “A unified coding framework for distributed computing with straggling servers,” CoRR, vol. abs/1609.01690, 2016. [Online]. Available: http://arxiv.org/abs/1609.01690
  • [79] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
  • [80] D. Costello and S. Lin, Error Control Coding. Upper Saddle River, NJ: Prentice Hall, 2004.
  • [81] G. Liang and U. C. Kozat, “TOFEC: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes,” in Proc. IEEE Conference on Computer Communications (INFOCOM), 2014, pp. 826–834. [Online]. Available: http://dx.doi.org/10.1109/INFOCOM.2014.6848010
  • [82] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
  • [83] X. Meng, J. K. Bradley, B. Yavuz, E. R. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. B. Tsai, M. Amde, S. Owen, D. Xin, R. Xin, M. J. Franklin, R. Zadeh, M. Zaharia, and A. Talwalkar, “MLlib: Machine learning in Apache Spark,” CoRR, vol. abs/1505.06807, 2015. [Online]. Available: http://arxiv.org/abs/1505.06807
  • [84] “Open MPI: Open source high performance computing,” http://www.open-mpi.org, accessed 2015-11-25.
  • [85] “StarCluster,” http://star.mit.edu/cluster/, accessed 2015-11-25.
  • [86] “BLAS (Basic Linear Algebra Subprograms),” http://www.netlib.org/blas/, accessed 2015-11-25.
  • [87] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Fundamental tradeoff between computation and communication in distributed computing,” in Proc. IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 1814–1818.
  • [88] D. Halperin, S. Kandula, J. Padhye, P. Bahl, and D. Wetherall, “Augmenting data center networks with multi-gigabit wireless links,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 38–49, 2011.
  • [89] Y. Zhu, X. Zhou, Z. Zhang, L. Zhou, A. Vahdat, B. Y. Zhao, and H. Zheng, “Cutting the cord: A robust wireless facilities network for data centers,” in Proc. 20th Annual International Conference on Mobile Computing and Networking (MobiCom), 2014, pp. 581–592.
  • [90] M. Y. Arslan, I. Singh, S. Singh, H. V. Madhyastha, K. Sundaresan, and S. V. Krishnamurthy, “Computing while charging: Building a distributed computing infrastructure using smartphones,” in Proc. 8th International Conference on Emerging Networking Experiments and Technologies (CoNEXT), 2012, pp. 193–204.