Idiomatic Expression Detection
Given context, choose the correct figurative reading from a 4-option MCQ. Tests whether the model recognises that a literal reading is wrong.
Metric — MCQ Accuracy · Source — MAGPIE / EPIE
Every task lives at the intersection of a phrase category (idiom, collocation, noun compound, verbal MWE) and a semantic operation (detection, extraction, categorization, interpretation). Below: the canonical examples used in the paper, the prompt skeleton, and what each metric actually rewards.
Spot the idiom span in a sentence that may or may not contain one. Open-ended — no choice list.
Metric — Exact Match (span)
Generate a paraphrase of the idiom that fits the surrounding context — the hardest of the three.
Metric — ROUGE-L · BERTScore-F1 · METEOR · BLEU
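ROUGE-L, the first of the interpretation metrics listed above, rewards the longest common subsequence (LCS) shared by a generated paraphrase and its reference. The sketch below is only an illustration: it assumes lowercasing, whitespace tokenization and a balanced F-measure, none of which are confirmed as the paper's settings, so an official scoring package may give slightly different numbers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Balanced ROUGE-L F-measure (assumed: lowercase, whitespace tokens)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the score depends on token order and overlap rather than meaning, a correct paraphrase that shares few words with the reference still scores low, which is one reason to report it alongside an embedding-based metric such as BERTScore.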
Tag the collocation with its lexical-function semantic role. We evaluate at 1-, 2-, 4-, 8- and 16-category granularities.
Metric — Accuracy · Macro / Micro / Weighted F1 · Taxonomies — 1 / 2 / 4 / 8 / 16
Identify the collocation span in running text. Used downstream as the first step of sequential tasks.
Metric — Exact Match
Paraphrase the collocation in context — converting a lexical preference into a transparent gloss.
Metric — ROUGE-L · BERTScore-F1 · METEOR · BLEU
Given a base word and a desired semantic function, retrieve the appropriate collocate. Probes lexical preference directly.
Metric — Exact Match
Binary check on a candidate pair — is it a conventionalised collocation, or just two co-occurring words?
Metric — Accuracy
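The macro, micro and weighted F1 variants reported for collocation categorization reward different things, which matters more as the taxonomy gets finer. A minimal sketch, assuming single-label classification and placeholder label names (`A`, `B`, `C` are hypothetical, not the benchmark's categories):

```python
from collections import Counter


def f1_report(gold, pred):
    """Macro, micro and weighted F1 for single-label classification."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal length")
    # Labels seen in either list are scored (assumption of this sketch).
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    per_label = {lab: f1(tp[lab], fp[lab], fn[lab]) for lab in labels}
    support = Counter(gold)
    return {
        # Every label counts equally, so rare categories weigh heavily.
        "macro": sum(per_label.values()) / len(labels),
        # Pooled counts; equals accuracy in the single-label setting.
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # Per-label F1 weighted by how often the label occurs in gold.
        "weighted": sum(per_label[lab] * support[lab] for lab in labels) / len(gold),
    }


scores = f1_report(["A", "A", "B", "C"], ["A", "B", "B", "A"])
```

In this toy run the never-predicted label `C` pulls macro F1 below micro F1, the same pattern to expect when a model ignores rare categories at the 16-category granularity.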
Decide whether the compound is fully compositional, partly so, or non-compositional.
Metric — MCQ Accuracy · Source — NCTTI
Pick out the [N1 N2] span in a sentence — the compound boundary problem in the wild.
Metric — Exact Match
Produce a free-form paraphrase that preserves the modifier-head relation.
Metric — ROUGE-L · BERTScore-F1 · METEOR · BLEU
Drawn from PARSEME 1.1. Identifies six verbal-MWE subclasses (LVCs, IRVs, VIDs, …) — the only task with deeper internal sub-structure.
Metric — Exact Match · Source — PARSEME 1.1 (English)
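Exact Match, used by the extraction-style tasks above, gives credit only when the predicted span equals the gold span. The normalization shown here (case-folding, trimming outer quotes and punctuation, collapsing whitespace) is an assumption for illustration; the benchmark's actual rules may be stricter or looser. An empty string is used, also as an assumption, to mean "no phrase present".

```python
import re


def normalize(span):
    """Case-fold, trim outer quotes/punctuation, collapse whitespace."""
    span = span.strip().strip("\"'.,;:!?").casefold()
    return re.sub(r"\s+", " ", span).strip()


def exact_match(pred, gold):
    """True only if the normalized spans are identical."""
    return normalize(pred) == normalize(gold)


def exact_match_score(preds, golds):
    """Fraction of predictions that exactly match their gold span."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    return sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)
```

Note that a partially correct span such as "the ice" for "break the ice" earns nothing, so the metric rewards getting the phrase boundary exactly right.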
Six tasks chain extraction with a downstream operation. The model must extract a phrase and then categorize or interpret it in a single response, a setting that surfaces cascade failures invisible to single-step evaluation.
Extract an idiom from a sentence, then classify whether the use is figurative.
Extract the idiom, then produce a context-faithful paraphrase. Errors in step 1 propagate.
Extract the collocate pair, then judge whether it is a true collocation.
Extract the collocation, then gloss it. The most cascade-sensitive of the six.
Extract the compound, then judge its compositionality grade.
Extract the compound, then paraphrase its modifier-head relation.
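One way to make cascade failure visible in the six sequential tasks above is to report step-level, joint and conditional accuracy side by side. This is a hypothetical scoring sketch, not the paper's protocol: it assumes each response has already been parsed and judged, giving one boolean per step (`step1_ok`, `step2_ok` are illustrative field names).

```python
def cascade_report(records):
    """Summarize two-step results from dicts with 'step1_ok' / 'step2_ok'."""
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")
    step1 = sum(bool(r["step1_ok"]) for r in records)
    step2 = sum(bool(r["step2_ok"]) for r in records)
    joint = sum(bool(r["step1_ok"] and r["step2_ok"]) for r in records)
    return {
        "step1_acc": step1 / n,
        "step2_acc": step2 / n,
        # Both steps right in the same response.
        "joint_acc": joint / n,
        # Step-2 accuracy restricted to responses with a correct extraction.
        "step2_given_step1": joint / step1 if step1 else 0.0,
    }


report = cascade_report([
    {"step1_ok": True, "step2_ok": True},
    {"step1_ok": True, "step2_ok": False},
    {"step1_ok": False, "step2_ok": True},
    {"step1_ok": False, "step2_ok": False},
])
```

A large gap between `step2_given_step1` and `joint_acc` suggests that most of the loss comes from the extraction step rather than from the downstream operation.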
Prompts are deliberately minimal: a one-line task statement, an optional handful of examples, then the input. The intent is to vary the task, not the prompt.
Task: {task description}
Input: {phrase + context}
Output: {expected form}
— example 1 —
Input: "She tried to break the ice at the meeting."
Output: "make people feel less awkward"
All prompts (zero-shot and few-shot variants for every task) ship with the codebase.
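As an illustration of the skeleton, the sketch below assembles a zero-shot or few-shot prompt in the order described: task statement, optional examples, then the input. The function name `build_prompt`, the example separator and the task wording are assumptions made here; the prompts shipped with the codebase remain the reference.

```python
def build_prompt(task, query, examples=()):
    """Assemble a minimal prompt; `examples` is a sequence of (input, output)."""
    lines = [f"Task: {task}"]
    for i, (ex_in, ex_out) in enumerate(examples, start=1):
        # Separator format is an assumption, not the shipped one.
        lines += [f"--- example {i} ---", f"Input: {ex_in}", f"Output: {ex_out}"]
    # The query comes last, with the output slot left open for the model.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


few_shot = build_prompt(
    "Paraphrase the idiom in context.",  # hypothetical task wording
    '"He finally spilled the beans."',   # hypothetical query
    examples=[(
        '"She tried to break the ice at the meeting."',
        '"make people feel less awkward"',
    )],
)
```

Passing no examples yields the zero-shot variant, so the same function covers both settings and only the task line changes from task to task.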