Preliminary evaluation of ChatGPT as a machine translation engine and as an automatic post-editor of raw machine translation output from other machine translation engines
Abstract
This preliminary study comprised two experiments. The first aimed to gauge the translation quality of the free-plan version of ChatGPT against the free versions of DeepL Translator and Google Translate through human evaluation; the second used the free-plan version of ChatGPT as an automatic post-editor of raw output from the paid version of DeepL Translator, in both monolingual and bilingual full machine translation post-editing (MTPE). The experiments were limited to a single language pair (English to Italian) and a single text genre (Wikipedia articles). In the first experiment, DeepL Translator was judged to have performed best, Google Translate second, and ChatGPT last. In the second experiment, the free-plan version of ChatGPT matched average human translation (HT) levels of lexical variety in automatic monolingual MTPE and exceeded them in automatic bilingual MTPE. However, only one machine translation (MT) marker was considered, and the post-edited output was not quality-assessed for other features that distinguish MTPE from HT. It would therefore be inadvisable to generalize these findings at present. The author intends to carry out new translation experiments during the next academic year with ChatGPT Plus, rather than the free-plan version, both as an MT engine and as an automatic post-editor, and to continue evaluating the results manually rather than automatically.
Published in
Proceedings of the International Conference on Human-Informed Translation and Interpreting Technology (HiT-IT 2023), Naples, Italy, 7–9 July 2023.