An Exploratory Study of Deep Learning Supply Chain

Abstract

Deep learning becomes the driving force behind many contemporary technologies and has been successfully applied in many fields. Through software dependencies, a multi-layer supply chain (SC) with a deep learning framework as the core and substantial down-stream projects as the periphery has gradually formed and is constantly developing. However, basic knowledge about the structure and characteristics of the SC is lacking, which hinders effective support for its sustainable development. Previous studies on software SC usually focus on the packages in different registries without paying attention to the SCs derived from a single project. We present an empirical study on two deep learning SCs: TensorFlow and PyTorch SCs. By constructing and analyzing their SCs, we aim to understand their structure, application domains, and evolutionary factors. We find that both SCs exhibit a short and sparse hierarchy structure. Overall, the relative growth of new projects increases month by month. Projects have a tendency to attract downstream projects shortly after the release of their packages, later the growth becomes faster and tends to stabilize. We propose three criteria to identify vulnerabilities and identify 51 types of packages and 26 types of projects involved in the two SCs. A comparison reveals their similarities and differences, e.g., TensorFlow SC provides a wealth of packages in experiment result analysis, while PyTorch SC contains more specific framework packages. By fitting the GAM model, we find that the number of dependent packages is significantly negatively associated with the number of downstream projects, but the relationship with the number of authors is nonlinear. Our findings can help further open the “black box” of deep learning SCs and provide insights for their healthy and sustainable development.

Publication
In The 44th International Conference on Software Engineering
Kai Gao
Kai Gao
Ph.D. Candidate