Understanding and Mitigating Biases in Evaluation