What are the "BLEU/COMET scores" in translation?

BLEU and COMET are automatic metrics used to score the quality of machine-translated text by comparing it to human reference translations.

BLEU (Bilingual Evaluation Understudy)

  • BLEU measures how many words and short phrases (n‑grams, typically up to length 4) in the machine translation also appear in the reference translation, combining the clipped n‑gram precisions with a geometric mean and applying a brevity penalty to outputs shorter than the reference.
  • It outputs a score between 0 and 1 (often rescaled to 0–100); higher scores mean the MT output overlaps more with the reference.
  • BLEU is fast, cheap, and widely used, but it only captures surface overlap: a translation that is correct but worded differently from the reference is penalized.
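The mechanics above can be sketched in a few lines. This is a minimal single-reference BLEU, a simplification of the real metric (no tokenization rules, no sentence-level smoothing, so very short or fully non-overlapping outputs score 0):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions (n = 1..max_n),
    geometric mean, and a brevity penalty for short candidates."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a matching word cannot inflate the score.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        if overlap == 0:
            return 0.0  # any zero precision drives the geometric mean to 0
        log_prec_sum += math.log(overlap / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec_sum / max_n)

reference = "the quick brown fox jumped over the lazy dog"
print(bleu(reference, reference))                                       # perfect match
print(bleu("the quick brown fox jumps over the lazy dog", reference))   # one word differs
```

Changing a single word ("jumps" vs. "jumped") breaks several 2-, 3-, and 4-gram matches at once, which is why the second score drops well below 1 even though the meaning is nearly identical; this is exactly the surface-overlap sensitivity noted above.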

COMET

  • COMET is a newer, neural-network–based metric that uses pretrained language models to judge similarity in meaning between MT output and reference (and often also considers the source sentence).
  • Instead of just counting overlapping words, it embeds sentences in a semantic space and predicts a quality score that correlates more closely with human judgments, especially for strong modern systems where BLEU differences are small.
  • COMET scores are real numbers whose range depends on the model: older checkpoints produce unbounded regression scores (often roughly −1 to 1), while newer ones tend to fall between 0 and 1; in all cases, higher values indicate better adequacy and fluency as estimated by the model.
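Because COMET is a trained model rather than a formula, it is used through a library. Below is a hedged sketch using the open-source `unbabel-comet` package (`pip install unbabel-comet`); the model name is one published checkpoint chosen as an example, and the import is guarded since the download is large:

```python
# Each segment is a dict with the source, the MT output, and the reference,
# reflecting that COMET also looks at the source sentence, not just the reference.
data = [
    {
        "src": "Der Hund bellt.",      # source sentence (example)
        "mt": "The dog is barking.",   # machine translation to score
        "ref": "The dog barks.",       # human reference translation
    }
]

try:
    from comet import download_model, load_from_checkpoint

    # Downloads the checkpoint on first use (large file, needs network access).
    model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    output = model.predict(data, batch_size=8, gpus=0)
    print(output.system_score)  # corpus-level score; output.scores holds per-segment values
except ImportError:
    # Library not installed; the input format above is the main point of the sketch.
    print("unbabel-comet not installed")
```

Note the contrast with BLEU: nothing here counts word overlap; the neural model scores the triple (source, MT, reference) directly.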

In practice, researchers report both: BLEU for comparability with older work and COMET (or similar neural metrics) for a more meaning-aware view of translation quality.

 
