Transfer Learning and Optimization Theory for Large-Scale Models