MACE - Multi-Annotator Competence Estimation

Overview

MACE (Multi-Annotator Competence Estimation) is an implementation of an item-response model that lets you evaluate redundant annotations of categorical data. It provides competence estimates for the individual annotators and the most likely answer to each item.

If we have 10 annotators answer a question, and five answer with 'yes' and five with 'no' (a surprisingly frequent event), we would normally have to flip a coin to decide what the right answer is. If we knew, however, that one of the people who answered 'yes' is an expert on the question, while one of the others just always selects 'no', we would take this information into account to weight their answers. MACE does exactly that: it tries to find out which annotators are more trustworthy and upweights their answers. All you need to provide is a CSV file with one item per line.
In tests, MACE's trust estimates correlated highly with the annotators' true competence, and it achieved accuracies of over 0.9 on several test sets. MACE can take annotated items into account if they are available. This helps to guide the training and improves accuracy.
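The weighting idea can be sketched as a competence-weighted vote. This is a toy illustration with made-up competence values; MACE estimates these itself from the data, so the function and numbers below are not part of MACE:

```python
from collections import defaultdict

def weighted_vote(answers, competences):
    """Pick the label with the highest total annotator competence.

    answers: dict annotator -> label; competences: dict annotator -> weight.
    Both inputs are hypothetical here; MACE infers the weights.
    """
    scores = defaultdict(float)
    for annotator, label in answers.items():
        scores[label] += competences[annotator]
    return max(scores, key=scores.get)

# Five 'yes' vs. five 'no': a plain majority vote is a tie, but if
# annotator a0 is known to be highly competent, 'yes' wins.
answers = {f"a{i}": ("yes" if i < 5 else "no") for i in range(10)}
competences = {f"a{i}": (0.9 if i == 0 else 0.5) for i in range(10)}
print(weighted_vote(answers, competences))  # → yes
```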

You may find the latest version of MACE here

Examples

Input

The input file has to be a comma-separated file, where each line represents an item and each column represents an annotator. MACE can handle empty lines in the input, which is convenient for annotations of sequential data (e.g., POS tags). Empty values represent no annotation by that annotator on that item. Make sure the last line ends with a line break.
Examples:

  1. no,yes,,,,yes,no,no
    ,,yes,yes,,no,no,yes
    yes,no,no,yes,,yes,,no
  2. NOUN,,,NOUN,PRON
    VERB,VERB,,VERB,

    ADJ,,ADJ,,ADV
    ,,VERB,,VERB,ADV
    NOUN,,,NOUN,PRON
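If your annotations live in per-item records rather than a matrix, a small helper can produce the CSV above. This is a sketch; the function name and signature are our own, not part of MACE:

```python
import csv

def write_mace_csv(items, annotators, path):
    """Write sparse annotations as a MACE input CSV.

    items: one dict per item, mapping annotator id -> label.
    Annotators who did not label an item become empty cells, and the
    last line keeps its trailing line break, as MACE expects.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for item in items:
            writer.writerow([item.get(a, "") for a in annotators])

# The first example above, reconstructed from per-annotator dicts:
annotators = [f"a{i}" for i in range(8)]
items = [
    {"a0": "no", "a1": "yes", "a5": "yes", "a6": "no", "a7": "no"},
    {"a2": "yes", "a3": "yes", "a5": "no", "a6": "no", "a7": "yes"},
    {"a0": "yes", "a1": "no", "a2": "no", "a3": "yes", "a5": "yes", "a7": "no"},
]
write_mace_csv(items, annotators, "example.csv")
```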
Additionally, MACE can take
  1. a test file. Each line corresponds to one item in the CSV file, so the number of lines must match. If a test file is supplied, MACE outputs the accuracy of the predictions to STDOUT.
  2. a file with control items. Each line corresponds to one item, so the number of lines MUST match the input CSV file. However, not every line has to be filled. Control items serve as semi-supervised input. Controls usually improve accuracy, because they make it easier to find bad annotators.
  3. a file with label priors. Each line contains one label and its weight, tab-separated. The file must include all labels in the data file. Weights will be automatically normalized. Priors can help improve accuracy.
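A priors file for the POS example above could be generated like this. The label set matches the second input example, but the weights are made up for illustration; MACE normalizes them, so they need not sum to 1:

```python
# Hypothetical label priors: one label and its weight per line,
# tab-separated, covering every label that appears in the data file.
priors = {"NOUN": 3.0, "VERB": 2.0, "ADJ": 1.0, "ADV": 1.0, "PRON": 1.0}
with open("example.weights", "w") as f:
    for label, weight in priors.items():
        f.write(f"{label}\t{weight}\n")
```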

Output

MACE provides two output files:

  1. the most likely answer for each item, prefix.prediction. This file has the same number of lines as the input file.
    If you set --distribution, each line contains the distribution over answer values sorted by entropy.
  2. the competence estimate for each annotator, prefix.competence. This file has one line with tab-separated values, one per annotator.

In addition, you can output the entropy of each item by setting --entropies. This will output a file with the same number of lines as the input file, named [prefix.]entropy.
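A minimal sketch of reading the two output files back into Python (the default file names from the Usage section below; prepend your --prefix if you set one):

```python
def read_mace_output(pred_path, comp_path):
    """Parse MACE's prediction and competence output files.

    Returns one predicted label per input item (possibly '' for items
    skipped under --threshold) and one competence value per annotator.
    """
    with open(pred_path) as f:
        predictions = [line.rstrip("\n") for line in f]
    with open(comp_path) as f:
        competences = [float(v) for v in f.readline().split("\t")]
    return predictions, competences
```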

Usage

java -jar MACE.jar example.csv
Evaluate the file example.csv and write the output to competence and prediction.

java -jar MACE.jar --prefix out example.csv
Evaluate the file example.csv and write the output to out.competence and out.prediction.

java -jar MACE.jar --test example.key example.csv
Evaluate the file example.csv against the true answers in example.key. Write the output to competence and prediction and print the accuracy to STDOUT.

java -jar MACE.jar --threshold 0.9 example.csv
Evaluate the file example.csv. Return predictions only for the 90% of items the model is most confident in. Write the output to competence and prediction. The latter will have blank lines for ignored items.

java -jar MACE.jar --controls example.controls example.csv
Evaluate the file example.csv. Use the instances in file example.controls to guide training. This improves accuracy. Write the output to competence and prediction.

java -jar MACE.jar --priors example.weights example.csv
Evaluate the file example.csv. Use the label priors in file example.weights to guide training. This improves accuracy. Write the output to competence and prediction.

java -jar MACE.jar --distribution example.csv
Evaluate the file example.csv. Instead of showing the most likely label for each instance, show the sorted probability distribution over all labels. Write the output to competence and prediction.

java -jar MACE.jar --entropies example.csv
Evaluate the file example.csv. Compute the entropy for each instance. Write the output to competence, entropies and prediction.

Publication

Dirk Hovy, Taylor Berg-Kirkpatrick, Ashish Vaswani, and Eduard Hovy (2013): Learning Whom to Trust with MACE. In: Proceedings of NAACL-HLT 2013.