Neural Machine Translation (NMT) is known for its exceptionally high-quality translation. As the latest in the line of machine translation systems, NMT outperforms its predecessors and is poised to revolutionise the way the translation industry functions.
Like all systems, however, Neural Machine Translation requires ongoing feedback and corrections to ensure the continued production of high-quality results. The most common evaluation method used for NMT today is Bilingual Evaluation Understudy (BLEU). BLEU is an outdated method that was also applied to some of the older machine translation systems. Though it made sense back then, NMT is simply too advanced for BLEU.
Quality Translation Systems Need Quality Evaluation Methods
Neural Machine Translation is in dire need of a better evaluation approach. This matters because NMT is gradually becoming the industry standard and changing the way translation processes are carried out. Like any majorly disruptive technology, machine translation affects technically trained staff, sales, marketing and project management.
The “outdated” BLEU system became the standard because it worked well with previous machine translation systems and was widely available. With NMT, however, the developments (especially in design) have been significant, rendering BLEU ineffective at quantifying the quality of output.
BLEU is what the industry refers to as an “n-gram-based metric”: it scores a translation by counting the word sequences it shares with a reference translation. The metric shows its limits when it comes to assessing the capabilities of NMT against other machine translation systems, such as statistical machine translation (SMT) or rule-based machine translation (RBMT). Current research shows that NMT, despite possessing greater capabilities than earlier machine translation systems, manages to receive only two BLEU points more.
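To make the n-gram idea concrete, here is a minimal sketch of a BLEU-style score for a single sentence pair. It is a simplified illustration, not a reference implementation: it assumes whitespace tokenisation, one reference translation and no smoothing, and the function names (`ngrams`, `modified_precision`, `sentence_bleu`) are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clipping: an n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU-style score: geometric mean of precisions times brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: candidates shorter than the reference are penalised.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_avg)


reference = "the cat sat on the mat".split()
paraphrase = "the cat is sitting on the mat".split()

print(sentence_bleu(reference, reference))
print(sentence_bleu(paraphrase, reference))
```

In this sketch the paraphrase scores 0.0 even though it is an acceptable translation, because it shares no four-word sequence with the reference. Production implementations work at corpus level and apply smoothing, but the sensitivity to exact word sequences remains.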
NMT systems often operate at the character or subword level, so evaluation software working at the same level is required. The chrF evaluation approach, proposed by Maja Popović, is a suitable option.
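For comparison, chrF is an F-score computed over character n-grams rather than word n-grams. The sketch below shows the general idea under stated assumptions: it strips spaces, averages precision and recall over n-gram lengths 1 to 6, and uses a recall weight `beta` of 2. The names `char_ngrams` and `chrf` and these defaults are choices made for this illustration, and published implementations differ in details such as whitespace handling.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(candidate, reference, max_n=6, beta=2.0):
    """Character n-gram F-score averaged over n-gram lengths 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        cand = char_ngrams(candidate, n)
        ref = char_ngrams(reference, n)
        if not cand or not ref:
            continue  # one of the strings is shorter than n characters
        # Counter intersection keeps the minimum count of each shared n-gram.
        overlap = sum((cand & ref).values())
        precisions.append(overlap / sum(cand.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


reference = "the cat sat on the mat"
paraphrase = "the cat is sitting on the mat"

print(chrf(reference, reference))
print(chrf(paraphrase, reference))
```

Because the paraphrase shares most of its characters with the reference, it receives partial credit here, whereas an unsmoothed word-level n-gram score can drop to zero for the same pair.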
Over time, however, NMT quality assurance may become more fragmented rather than more standardised, similar to what we currently see in the demand for NMT. Practitioners will soon begin to come up with proprietary evaluation methods that are specific and relevant to their own needs. Metrics based on named entities, machine learning and custom QA systems may soon become available. It is also believed that multiple evaluation methods may be used in combination. NMT is the new “state-of-the-art,” and eventually, evaluation systems designed to handle the paradigm shift will show up.