Beyond Memorization: Evaluating Length Generalization in Transformer-Based Language Models