Hao (Peter) Yu
McGill, Mila, Tencent, SmartPrep | Montreal, QC, Canada | hao.yu2[at]mail.mcgill.ca

New York City, 2023.12
Hello, I’m Peter Yu, an MSc (Thesis) student at McGill University and Mila, supervised by Prof. David Ifeoluwa Adelani and working on multilingual language processing and low-resource languages. I also collaborate with Shiwei Tong at Tencent on RAG and diffusion models for time series. During my undergraduate studies, I was supervised by Prof. Reihaneh Rabbany on misinformation detection with RAG, and I continue to work on that line of research as a collaborator.
Currently, my focus is on advancing retrieval systems that adapt to human feedback. This research addresses critical challenges in current AI systems, specifically model staleness and knowledge conflicts, through unified knowledge embedding and preference-optimized knowledge distillation. Looking ahead, I aspire to develop AI systems that continuously learn and evolve by integrating human preferences and expertise, drawing inspiration from systems like Google Search, which leverages user engagement as a quality signal.
Furthermore, I aim to expand beyond textual knowledge to encompass action spaces and emotional speech, transitioning from learning from humans to augmenting human capabilities. Ultimately, my goal is to develop meaningful, useful, and industry-ready products that create lasting impact.
Actively seeking Ph.D./industry opportunities in AI/NLP/ML.
Poster and Slide
- RAG
- Evaluation of Retrieval-Augmented Generation: A Survey [Poster] CCF BigData 2024
- Web Retrieval Agents for Evidence-Based Misinformation Detection [Poster] COLM 2024
- Double Decomposition with Web-Augmented Verification for Misinformation Detection [Poster]
- Multilingual
- INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages Under Review, 2025
- Ensembling Enhances Effectiveness of Multilingual Small LMs [Oral] EMNLP 2024 MRL, Winning NER
- Weak Supervision
- SWEET - Weakly Supervised Person Name Extraction for Fighting Human Trafficking [Poster] EMNLP 2023
- Data Science
- How to Unlock Time Series Editing? A Diffusion-Driven Approach with Multi-Grained Control, Under Review, 2025
- Paper Share
- Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages [Slide, 2025 Winter]
- GlotLID: Language Identification for Low-Resource Languages [Slide, 2025 Winter]
- Constitutional AI and Collective Constitutional AI: Aligning a Language Model with Public Input (CCAI) [Slide, 2024 Fall]
🏸🏓⛰️📷
Resume: PDF
Motto: 脚踏实地 行稳致远 (Stay grounded and steady, and you will go far)