Pre-translation vs. direct inference in multilingual LLM applications

Large language models (LLMs) are becoming ubiquitous tools for solving a wide range of problems. However, their effectiveness across diverse languages is hampered by training data that is often skewed towards English. To address this, pre-translation, in which inputs are translated to English before being fed to the LLM, has become standard practice.

Previous research has demonstrated that pre-translation is needed for optimal LLM performance with GPT-3/3.5/4, ChatGPT, PaLM, and other models. While pre-translation helps address the language bias issue, it introduces complexity and inefficiency, and it may lead to information loss. With the introduction of new, powerful LLMs trained on massive multilingual datasets, it is time to revisit the assumed necessity of pre-translation.
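
To make the comparison concrete, the sketch below contrasts the two settings in Python. The translation and model-call functions are hypothetical placeholders standing in for whatever MT system and LLM endpoint an application actually uses; they are not part of the paper or of any specific API.

```python
# Minimal sketch of the two inference settings, assuming hypothetical helpers.
# `translate_to_english` and `llm_generate` are placeholders, not real APIs.

def translate_to_english(text: str, source_lang: str) -> str:
    """Placeholder for a machine-translation call."""
    raise NotImplementedError

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to the LLM (e.g., a PaLM 2 model)."""
    raise NotImplementedError

def pre_translation_inference(prompt: str, source_lang: str) -> str:
    # Pre-translation: translate the input to English, then prompt the LLM.
    english_prompt = translate_to_english(prompt, source_lang)
    return llm_generate(english_prompt)

def direct_inference(prompt: str) -> str:
    # Direct inference: prompt the LLM in the source language, no translation step.
    return llm_generate(prompt)
```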

In our recent work “Breaking the Language Barrier: Can Direct Inference Outperform Pre-Translation in Multilingual LLM Applications?”, to be presented at NAACL’24, we re-evaluate the need for pre-translation using PaLM2, which has been established as highly performant on multilingual tasks. Our findings challenge the pre-translation paradigm established in prior research and highlight the advantages of direct inference with PaLM2. Specifically, we demonstrate that direct inference with PaLM2-L outperforms pre-translation in 94 out of 108 languages, offering a more efficient and effective approach for multilingual applications while preserving linguistic authenticity and alleviating the limitations of pre-translation.
