We have the pleasure of welcoming Vageesh Saxena, who will present co-authored research on multilingual information retrieval.
About the talk
Existing neural retrievers focus on high-resource languages like English, which prevents them from being applied to retrieval scenarios in other languages. Current approaches circumvent the lack of high-quality labeled data in non-English languages by leveraging multilingual pre-trained language models. However, these models require substantial fine-tuning in multiple languages, have a proven tendency to underperform in low-resource languages (those with little available data), and do not easily extend to new languages once trained. In this work, the authors present a novel modular dense retrieval model that learns from the rich data of a high-resource language and effectively adapts to a broad spectrum of languages. The model, called ColBERT-XM, demonstrates competitive performance against existing state-of-the-art multilingual retrievers trained on more extensive datasets in various languages.
About the presenter
Vageesh Saxena is a machine learning enthusiast in the final year of his Ph.D. at Maastricht University's Law & Tech Lab in the Netherlands, where his research focuses on Natural Language Processing and Computer Vision. His work explores multimodal representation learning techniques for authorship identification, with the aim of linking illegal vendors by analyzing patterns in the writing and photometric styles of online market advertisements.
About the lunch talk
Everyone is welcome to join this talk, which will be held in English only. Please bring your own lunch.