The power of big models: Unleashing opportunities for cloud computing

  • Yanying Lin, Shenzhen Institute of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen 518000, China
Keywords: Big model; Cloud computing; Deep learning; Model inference.

Abstract

The proliferation of deep models with very large parameter counts has spurred intense research interest in AI systems. These new workloads raise many new challenges for cloud computing, including cost, performance, elasticity, and the tradeoffs among them.
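
To make the cost-performance tradeoff concrete, consider the back-of-envelope sketch below (in Python). It is not from the article: every constant, the ServingConfig fields, and the linear latency model are illustrative assumptions. It estimates the minimum number of GPUs needed just to hold a model's weights, and the resulting dollar cost per thousand requests as the serving batch size grows.

# Back-of-envelope model of the cost/performance tradeoff in big-model
# serving. All constants are illustrative assumptions, not measurements
# from the article or from any real cloud price list.
import math
from dataclasses import dataclass

@dataclass
class ServingConfig:
    params_billions: float      # model size in billions of parameters
    bytes_per_param: int        # 2 for fp16, 4 for fp32
    gpu_mem_gb: int             # usable memory per GPU (assumed)
    gpu_hourly_usd: float       # assumed on-demand price per GPU-hour
    batch_size: int             # requests batched per forward pass
    latency_per_batch_s: float  # assumed time for one batched forward pass

def min_gpus(cfg: ServingConfig) -> int:
    """GPUs needed just to hold the weights (ignores activations and KV cache)."""
    weight_gb = cfg.params_billions * cfg.bytes_per_param  # 1e9 params * bytes/param ~= GB
    return max(1, math.ceil(weight_gb / cfg.gpu_mem_gb))

def cost_per_1k_requests(cfg: ServingConfig) -> float:
    """Dollar cost to serve 1000 requests at the configured batch size."""
    throughput_rps = cfg.batch_size / cfg.latency_per_batch_s
    hours_per_1k = (1000 / throughput_rps) / 3600
    return min_gpus(cfg) * cfg.gpu_hourly_usd * hours_per_1k

if __name__ == "__main__":
    # Larger batches raise throughput (cheaper per request) but also raise
    # per-request latency: the cost/performance tension noted in the abstract.
    for batch in (1, 8, 32):
        cfg = ServingConfig(params_billions=70, bytes_per_param=2,
                            gpu_mem_gb=80, gpu_hourly_usd=4.0,
                            batch_size=batch,
                            latency_per_batch_s=0.5 + 0.05 * batch)
        print(f"batch={batch:3d}  gpus={min_gpus(cfg)}  "
              f"latency={cfg.latency_per_batch_s:.2f}s  "
              f"cost/1k req=${cost_per_1k_requests(cfg):.2f}")

Under these toy numbers, a batch size of 32 roughly halves the per-request cost relative to a batch size of 1 while quadrupling per-batch latency; serving systems such as Orca [14] and AlpaServe [7] are built to navigate exactly this kind of tradeoff.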

References

1. Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. arXiv 2020; arXiv:2005.14165. doi: 10.48550/arXiv.2005.14165.

2. Bubeck S, Chandrasekaran V, Eldan R, et al. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv 2023; arXiv:2303.12712. doi: 10.48550/arXiv.2303.12712.

3. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2019; arXiv:1810.04805. doi: 10.48550/arXiv.1810.04805.

4. Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and efficient foundation language models. arXiv 2023; arXiv:2302.13971. doi: 10.48550/arXiv.2302.13971.

5. Jain P, Kumar S, Wooders S, et al. Skyplane: Optimizing transfer cost and throughput using cloud-aware overlays. In: Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023); 17–19 April 2023; Boston, MA, USA. pp. 1375–1389.

6. Li C, Yao Z, Wu X, et al. DeepSpeed data efficiency: Improving deep learning model quality and training efficiency via efficient data sampling and routing. arXiv 2023; arXiv:2212.03597. doi: 10.48550/arXiv.2212.03597.

7. Li Z, Zheng L, Zhong Y, et al. AlpaServe: Statistical multiplexing with model parallelism for deep learning serving. arXiv 2023; arXiv:2302.11665. doi: 10.48550/arXiv.2302.11665.

8. Yu M, Cao T, Wang W, Chen R. Following the data, not the function: Rethinking function orchestration in serverless computing. In: Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023); 17–19 April 2023; Boston, MA, USA. pp. 1489–1504.

9. Zhang H, Tang Y, Khandelwal A, Stoica I. SHEPHERD: Serving DNNs in the wild. In: Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023); 17–19 April 2023; Boston, MA, USA. pp. 787–808.

10. Bai Z, Zhang Z, Zhu Y, Jin X. PipeSwitch: Fast pipelined context switching for deep learning applications. In: Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020); 4–6 November 2020; Virtual Event. pp. 499–514.

11. Gujarati A, Karimi R, Alzayat S, et al. Serving DNNs like clockwork: Performance predictability from the bottom up. In: Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020); 4–6 November 2020; Virtual Event. pp. 443–462.

12. Huang Y, Cheng Y, Bapna A, et al. GPipe: Efficient training of giant neural networks using pipeline parallelism. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019); 8–14 December 2019; Vancouver, Canada. pp. 103–112.

13. Narayanan D, Harlap A, Phanishayee A, et al. PipeDream: Generalized pipeline parallelism for DNN training. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP 2019); 27–30 October 2019; Huntsville, ON, Canada. pp. 1–15.

14. Yu GI, Jeong JS, Kim GW, et al. Orca: A distributed serving system for transformer-based generative models. In: Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2022); 11–13 July 2022; Carlsbad, CA, USA. pp. 521–538.

15. Zheng L, Li Z, Zhang H, et al. Alpa: Automating inter- and intra-operator parallelism for distributed deep learning. In: Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2022); 11–13 July 2022; Carlsbad, CA, USA. pp. 559–578.

16. Zhou Z, Wei X, Zhang J, Sun G. PetS: A unified framework for parameter-efficient transformers serving. In: Proceedings of the 2022 USENIX Annual Technical Conference (USENIX ATC 2022); 11–13 July 2022; Carlsbad, CA, USA. pp. 489–504.

17. Bhattacharjee A, Chhokra AD, Kang Z, et al. BARISTA: Efficient and scalable serverless serving system for deep learning prediction services. In: Proceedings of the 2019 IEEE International Conference on Cloud Engineering (IC2E); 24–27 June 2019; Prague, Czech Republic. pp. 23–33.

18. Choi S, Lee S, Kim Y, et al. Serving heterogeneous machine learning models on multi-GPU servers with spatio-temporal sharing. In: Proceedings of the 2022 USENIX Annual Technical Conference (USENIX ATC 2022); 11–13 July 2022; Carlsbad, CA, USA. pp. 199–216.

19. Kosaian J, Rashmi KV, Venkataraman S. Parity models: Erasure-coded resilience for prediction serving systems. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP 2019); 27–30 October 2019; Huntsville, ON, Canada. pp. 30–46.

20. Li J, Zhao L, Yang Y, et al. Tetris: Memory-efficient serverless inference through tensor sharing. In: Proceedings of the 2022 USENIX Annual Technical Conference (USENIX ATC 2022); 11–13 July 2022; Carlsbad, CA, USA.

21. Romero F, Li Q, Yadwadkar NJ, Kozyrakis C. INFaaS: Automated model-less inference serving. In: Proceedings of the 2021 USENIX Annual Technical Conference (USENIX ATC 2021); 14–16 July 2021; Virtual Event. pp. 397–411.
