What are the "BLEU/COMET scores" in translation?
BLEU and COMET are automatic metrics used to score the quality of machine-translated text by comparing it to human reference translations.
BLEU (Bilingual Evaluation Understudy)
- BLEU measures how many words and short phrases (n-grams) in the machine translation also appear in the reference translation, giving higher weight to longer matching sequences.
- It outputs a score between 0 and 1 (often shown as 0–100); higher scores mean the MT output is more similar to the reference.
- BLEU is fast and widely used, but it captures only surface overlap and is less sensitive to meaning when the wording differs but is still correct.
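The clipped n-gram matching and brevity penalty described above can be sketched in a few lines of Python. This is a toy sentence-level version for illustration; real toolkits (e.g., sacrebleu) add smoothing, standardized tokenization, and corpus-level aggregation, and the function name `toy_bleu` is ours, not a library API:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def toy_bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clip each n-gram's count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:  # unsmoothed BLEU is 0 if any precision is 0
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

print(toy_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(round(toy_bleu("the cat is on the mat",
                     "the cat sat on the mat", max_n=2), 3))  # 0.707
```

Note how a single substitution ("is" for "sat") already drops the score well below 1, even though both sentences are fluent English; this is the surface-overlap sensitivity mentioned above.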
COMET (Crosslingual Optimized Metric for Evaluation of Translation)
- COMET is a newer, neural-network-based metric that uses pretrained language models to judge similarity in meaning between MT output and reference (and often also considers the source sentence).
- Instead of just counting overlapping words, it embeds sentences in a semantic space and predicts a quality score that correlates more closely with human judgments, especially for strong modern systems where BLEU differences are small.
- COMET scores are real numbers whose range depends on the model version (older models produce scores roughly between −1 and 1; newer ones such as COMET-22 output values near 0–1); higher values indicate better translations in terms of adequacy and fluency as estimated by the model.
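The overall shape of this approach, embed the sentences, compare them in vector space, and combine the comparisons into one score, can be sketched as below. Everything here is a stand-in for illustration: real COMET uses a pretrained multilingual encoder (XLM-R) and a regressor trained on human quality ratings, whereas `embed` here is just a deterministic pseudo-random vector and the weighted sum replaces the learned regressor:

```python
import hashlib
import math

def embed(sentence, dim=8):
    """Stand-in for a pretrained encoder: a deterministic
    pseudo-random unit vector per sentence (illustration only,
    carries no actual semantics)."""
    vec = []
    for i in range(dim):
        h = hashlib.sha256(f"{i}:{sentence}".encode()).digest()
        vec.append(int.from_bytes(h[:4], "big") / 2 ** 32 - 0.5)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def comet_style_score(src, mt, ref):
    """COMET-style scoring sketch: embed source, hypothesis, and
    reference, compare them in the embedding space, and combine the
    comparisons into a single quality score. Real COMET learns this
    combination from human judgments; the weights here are arbitrary."""
    e_src, e_mt, e_ref = embed(src), embed(mt), embed(ref)
    return 0.7 * cosine(e_mt, e_ref) + 0.3 * cosine(e_mt, e_src)

score = comet_style_score("Der Hund bellt.", "The dog barks.",
                          "The dog is barking.")
print(-1.0 <= score <= 1.0)  # True
```

The key design point the sketch illustrates is that the hypothesis is compared against the source as well as the reference, which is why COMET can partially credit correct translations whose wording diverges from the reference.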
In practice, researchers report both: BLEU for comparability with older work, and COMET (or similar neural metrics) for a more meaning-aware view of translation quality.