Exploring Visual Perception With Transformers And World Model Representation