CUHK launches Cantonese evaluation platform so people can learn with AI's help

2025-10-31

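Cantonese has features such as characters with several pronunciations, so artificial intelligence (AI) tools such as chatbots may not easily understand written Cantonese. A team at The Chinese University of Hong Kong (CUHK) yesterday launched "CLEVA-Cantonese", the world's first Cantonese evaluation platform for large language models (LLMs). It helps assess existing LLMs on Jyutping transcription, slang comprehension, Cantonese-Mandarin translation and other abilities, with the aim of advancing Cantonese LLM development.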
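Helen Meng, director of CUHK's InnoHK Centre for Perceptual and Interactive Intelligence, said a single Cantonese character can have several pronunciations. In 行街 (haang4, "to stroll the streets"), 银行 (hong4, "bank"), 唱片行 (hong2, "record shop") and 实行 (hang4, "to carry out"), the character 行 is read differently each time, differing in vowel, tone or both, so AI may not be accurate when converting text into Cantonese romanisation. Film titles, translated names of football stars and place names used in a Cantonese context may also differ from those used in a Mandarin context, and Cantonese has its own slang such as 食水深 (literally "deep draught", meaning to take a hefty cut). She said a standardised evaluation tool can help developers and the industry identify the strengths and weaknesses of AI models.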
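The 行 example can be made concrete with a small word-level lookup. The Python sketch below is illustrative only and is not taken from CLEVA-Cantonese: the lexicon holds just the four readings quoted above, and the function name `reading_in_context` is hypothetical. It shows why a character-by-character table cannot work and the surrounding word has to be consulted.

```python
# Jyutping readings of 行 in the four words cited in the article.
# This tiny lexicon is an illustrative assumption, not platform data.
HANG_READINGS = {
    "行街": "haang4",
    "银行": "hong4",
    "唱片行": "hong2",
    "实行": "hang4",
}


def reading_in_context(text, char="行", lexicon=HANG_READINGS):
    """Return the reading of `char` from the longest lexicon word found in `text`.

    Returns None when no known word containing `char` appears in the text.
    """
    # Try longer words first so that a longer match wins over a shorter one.
    for word in sorted(lexicon, key=len, reverse=True):
        if char in word and word in text:
            return lexicon[word]
    return None


if __name__ == "__main__":
    for sentence in ["我去银行", "佢喺唱片行做嘢", "今日去行街"]:
        print(sentence, reading_in_context(sentence))
```

A real transcription system would need a far larger lexicon and proper word segmentation; the point here is only that the same character maps to different Jyutping depending on the word it sits in.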
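The "CLEVA-Cantonese" platform can evaluate an LLM's Cantonese proficiency from several angles, including Jyutping transcription, Cantonese-Mandarin translation, translation of sentences that mix Chinese and English, offensive-language detection and understanding of proper nouns. Meng said the system checks romanisation against the Linguistic Society of Hong Kong's Cantonese Romanisation Scheme (Jyutping), and that Phoenix TV has provided Cantonese data. After data has been imported and filtered, the platform can generate tasks, such as sets of multiple-choice questions, to assess an LLM's ability in a chosen area, and it produces scores and feedback.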
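How multiple-choice items might be turned into a score can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the platform's actual implementation: the `Item` class, the `accuracy` function and the two sample questions (built from the 行 readings quoted in the article) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One multiple-choice question; `answer` is the index of the correct choice."""
    question: str
    choices: list[str]
    answer: int


def accuracy(items, predictions):
    """Fraction of items for which the predicted choice index equals the answer."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return correct / len(items)


# Hypothetical sample items based on the readings of 行 cited in the article.
CHOICES = ["haang4", "hong4", "hong2", "hang4"]
ITEMS = [
    Item("Which Jyutping reading does 行 take in 银行?", CHOICES, 1),
    Item("Which Jyutping reading does 行 take in 行街?", CHOICES, 0),
]

if __name__ == "__main__":
    model_choices = [1, 2]  # stand-in for choice indices returned by an LLM
    print(f"accuracy = {accuracy(ITEMS, model_choices):.2f}")
```

Plain accuracy is only one possible metric; tasks such as translation or offensive-language detection would presumably need different scoring.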
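Meng said that although Cantonese speakers make up a smaller share of the population than Mandarin speakers, from the standpoint of Chinese culture Cantonese is "very distinctive and especially important to Hong Kong", and improving the Cantonese ability of generative AI is necessary for cultural preservation and transmission. Wang Liwei, another lead scholar on the project and head of CUHK's Language and Vision Lab, said that if models understand Cantonese in a way closer to how people express themselves in everyday life and culture, people will be better able to use AI to help with learning, work and daily life.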



Source: Ming Pao, "Into the Greater Bay Area" (明报走进大湾区)
