Experiment Design for Hypotheses About How NLP Models Work