Large-Scale Optimization for Deep Neural Network Architecture: A Dynamical System Theory Perspective