Mitigating Data Scarcity for Neural Language Models