Training Dynamics Of Neural Language Models