Publications

  1. Cendol: Open Instruction-Tuned Generative Large Language Models for Indonesian Languages (ACL)
  2. LLMs Are Few-Shot In-Context Low-Resource Language Learners (NAACL)
  3. An Empirical Study of Multilingual Reasoning Distillation for Question Answering (EMNLP)
  4. Efficient Overshadowed Entity Disambiguation by Mitigating Shortcut Learning (EMNLP)
  5. McCrolin: Multi-Consistency Cross-Lingual Training for Retrieval Question Answering (EMNLP)
  6. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (EMNLP)
  7. Kalahi: A Handcrafted, Grassroots Cultural LLM Evaluation Suite for Filipino (PACLIC)
  8. Batayan: A Filipino NLP Benchmark for Evaluating Large Language Models (ACL)
  9. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia (ACL)
  10. Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation (ACL)
  11. Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments (ACL)
  12. SEA-HELM: Southeast Asian Holistic Evaluation of Language Models (ACL)
  13. ThaiInstruct: An Instruction-Following Dataset for Culturally-Aware, Multitask, and Multi-Domain Evaluation in Thai (EMNLP)
  14. Shortcut Learning in Safety: The Impact of Keyword Bias in Safeguards (LLMSEC)
  15. WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines (NAACL)
  16. Language Surgery in Multilingual Large Language Models (MRL)
  17. SEA-LION: Southeast Asian Languages in One Network (AACL)