What are the "BLEU/COMET scores" in translation?
BLEU and COMET are automatic metrics used to score the quality of machine-translated text by comparing it to human reference translations.
BLEU (Bilingual Evaluation Understudy)
- BLEU measures how many words and short phrases (n-grams) in the machine translation also appear in the reference translation, giving higher weight to longer matching sequences.
- It outputs a score between 0 and 1 (often shown as 0–100); higher scores mean the MT output is more similar to the reference.
- BLEU is fast and widely used, but it captures only surface overlap and is less sensitive to meaning when the wording differs but is still correct.
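The clipped n-gram matching and brevity penalty described above can be sketched in a few lines of Python. This is a toy sentence-level version for illustration; real toolkits (e.g., sacrebleu) add smoothing, standardized tokenization, and corpus-level aggregation, and the function name `toy_bleu` is ours, not a library API:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def toy_bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each n-gram's count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:  # unsmoothed BLEU is 0 if any precision is 0
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

print(toy_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(round(toy_bleu("the cat is on the mat",
                     "the cat sat on the mat", max_n=2), 3))  # 0.707
```

Note how a single substitution ("is" for "sat") already drops the score well below 1, even though both sentences are fluent English; this is the surface-overlap sensitivity mentioned above.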
COMET (Crosslingual Optimized Metric for Evaluation of Translation)
- COMET is a newer, neural-network-based metric that uses pretrained language models to judge similarity in meaning between MT output and reference (and often also considers the source sentence).
- Instead of just counting overlapping words, it embeds sentences in a semantic space and predicts a quality score that correlates more closely with human judgments, especially for strong modern systems where BLEU differences are small.
- COMET scores are real numbers whose range depends on the model version (older models produce scores roughly between −1 and 1; newer ones such as COMET-22 output values near 0–1); higher values indicate better translations in terms of adequacy and fluency as estimated by the model.
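The overall shape of this approach, embed the sentences, compare them in vector space, and combine the comparisons into one score, can be sketched as below. Everything here is a stand-in for illustration: real COMET uses a pretrained multilingual encoder (XLM-R) and a regressor trained on human quality ratings, whereas `embed` here is just a deterministic pseudo-random vector and the weighted sum replaces the learned regressor:

```python
import hashlib
import math

def embed(sentence, dim=8):
    """Stand-in for a pretrained encoder: a deterministic
    pseudo-random unit vector per sentence (illustration only,
    carries no actual semantics)."""
    vec = []
    for i in range(dim):
        h = hashlib.sha256(f"{i}:{sentence}".encode()).digest()
        vec.append(int.from_bytes(h[:4], "big") / 2 ** 32 - 0.5)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def comet_style_score(src, mt, ref):
    """COMET-style scoring sketch: embed source, hypothesis, and
    reference, compare them in the embedding space, and combine the
    comparisons into a single quality score. Real COMET learns this
    combination from human judgments; the weights here are arbitrary."""
    e_src, e_mt, e_ref = embed(src), embed(mt), embed(ref)
    return 0.7 * cosine(e_mt, e_ref) + 0.3 * cosine(e_mt, e_src)

score = comet_style_score("Der Hund bellt.", "The dog barks.",
                          "The dog is barking.")
print(-1.0 <= score <= 1.0)  # True
```

The key design point the sketch illustrates is that the hypothesis is compared against the source as well as the reference, which is why COMET can partially credit correct translations whose wording diverges from the reference.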
In practice, researchers report both: BLEU for comparability with older work, and COMET (or similar neural metrics) for a more meaning-aware view of translation quality.