MACE (Multi-Annotator Competence Estimation) is an implementation of an item-response model that lets you evaluate redundant annotations of categorical data. It provides competence estimates of the individual annotators and the most likely answer to each item.
If we have 10 annotators answer a question, and five answer with 'yes' and five with 'no' (a surprisingly frequent event), we would normally have to flip a coin to decide what the right answer is. If we knew, however, that one of the people who answered 'yes' is an expert on the question, while one of the others just always selects 'no', we would take this information into account to weight their answers. MACE does exactly that. It tries to find out which annotators are more trustworthy and upweights their answers. All you need to provide is a CSV file with one item per line. In tests, MACE's trust estimates correlated highly with the annotators' true competence, and it achieved accuracies of over 0.9 on several test sets. MACE can also take annotated items into account, if they are available. This helps to guide the training and improves accuracy.
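To make the tie-breaking intuition concrete, here is a minimal sketch of a weighted vote. It is not MACE's actual model (MACE learns the trust values itself); the function name `weighted_vote`, the annotator ids, and the weights below are made up for illustration.

```python
from collections import defaultdict

def weighted_vote(answers, weights=None):
    """Return the label with the highest total weight, or None on a tie.

    answers: dict mapping annotator id -> label
    weights: optional dict mapping annotator id -> trust (default 1.0 each)
    """
    totals = defaultdict(float)
    for annotator, label in answers.items():
        totals[label] += 1.0 if weights is None else weights.get(annotator, 1.0)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # a tie: this is the "flip a coin" case
    return ranked[0][0]

# Ten annotators, five 'yes' and five 'no'.
answers = {f"a{i}": "yes" for i in range(5)}
answers.update({f"a{i}": "no" for i in range(5, 10)})

# Hypothetical trust values: a0 is an expert, a9 always answers 'no'.
weights = {"a0": 2.0, "a9": 0.1}

plain = weighted_vote(answers)              # unweighted vote ends in a tie
informed = weighted_vote(answers, weights)  # the expert's answer wins
```

With equal weights the vote is tied; once the expert is upweighted and the constant 'no' voter is downweighted, 'yes' wins.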
You may find the latest version of MACE here
The input file has to be a comma-separated file, where each line represents an item, and each column represents an annotator. MACE can handle empty lines in the input, which is convenient for annotations of sequential data (e.g., POS tags).
Empty values represent no annotation by the specific annotator on that item. Make sure the last line has a line break.
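As an illustration of the format described above, the following sketch builds such a file from in-memory annotations. The helper name `to_mace_csv` and the labels are assumptions for the example, not part of MACE.

```python
import csv
import io

def to_mace_csv(items):
    """Serialize annotations as CSV text: one item per line, one column per annotator.

    items: list of rows; each row has one entry per annotator,
           with None where that annotator did not label the item.
    """
    buffer = io.StringIO()
    # lineterminator="\n" also guarantees the last line ends with a line break.
    writer = csv.writer(buffer, lineterminator="\n")
    for row in items:
        writer.writerow(["" if label is None else label for label in row])
    return buffer.getvalue()

# Two items, three annotators; missing annotations become empty values.
rows = [["NOUN", "VERB", None], [None, "NOUN", "NOUN"]]
text = to_mace_csv(rows)
```

Here `text` holds two lines, `NOUN,VERB,` and `,NOUN,NOUN`, each terminated by a line break.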
Examples:
MACE provides two output files:
In addition, you can output the entropy of each item by setting --entropies. This will output a file with the same number of lines as the input file, named [prefix.]entropy.
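For readers unfamiliar with the measure, here is a small sketch of the entropy of a label distribution. It assumes the natural logarithm; the base MACE uses is not stated here, and the function name `label_entropy` is hypothetical.

```python
import math

def label_entropy(distribution):
    """Shannon entropy (natural log) of a dict mapping label -> probability."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0.0)

# A confident item has low entropy, an undecided one has high entropy.
confident = label_entropy({"yes": 0.99, "no": 0.01})
uncertain = label_entropy({"yes": 0.5, "no": 0.5})
```

Higher entropy indicates that the model is less certain about the item's label.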
java -jar MACE.jar example.csv
Evaluate the file example.csv and write the output to competence and prediction.
java -jar MACE.jar --prefix out example.csv
Evaluate the file example.csv and write the output to out.competence and out.prediction.
java -jar MACE.jar --test example.key example.csv
Evaluate the file example.csv against the true answers in example.key.
Write the output to competence and prediction, and print the accuracy to STDOUT.
java -jar MACE.jar --threshold 0.9 example.csv
Evaluate the file example.csv. Return predictions only for the 90% of items the model is most confident in.
Write the output to competence and prediction. The latter will have blank lines for ignored items.
java -jar MACE.jar --controls example.controls example.csv
Evaluate the file example.csv. Use the instances in file example.controls to guide training. This improves accuracy.
Write the output to competence and prediction.
java -jar MACE.jar --priors example.weights example.csv
Evaluate the file example.csv. Use the label priors in file example.weights to guide training. This improves accuracy.
Write the output to competence and prediction.
java -jar MACE.jar --distribution example.csv
Evaluate the file example.csv. Instead of showing the most likely label for each instance, show the sorted probability distribution over all labels.
Write the output to competence and prediction.
java -jar MACE.jar --entropies example.csv
Evaluate the file example.csv. Compute the entropy for each instance.
Write the output to competence, entropies and prediction.
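If you want to check predictions against a key yourself, something along these lines may serve as a starting point. It is only a sketch: it assumes that both the prediction file and the key file contain one label per line, aligned with the input, and it skips items with a blank prediction (as produced by --threshold) or a blank key. Whether MACE's own --test computes accuracy in exactly this way is not confirmed here, and the function name `accuracy` is hypothetical.

```python
def accuracy(predicted_lines, key_lines):
    """Fraction of items whose predicted label matches the key.

    Items with a blank prediction or a blank key are skipped (an assumption).
    """
    pairs = [(p.strip(), k.strip()) for p, k in zip(predicted_lines, key_lines)]
    scored = [(p, k) for p, k in pairs if p and k]
    if not scored:
        return 0.0
    return sum(p == k for p, k in scored) / len(scored)

# Toy data; in practice, read the lines from the prediction and key files.
predictions = ["NOUN\n", "VERB\n", "\n", "ADJ\n"]
key = ["NOUN\n", "NOUN\n", "VERB\n", "ADJ\n"]
score = accuracy(predictions, key)
```

In this toy example the third item is skipped because its prediction is blank, and two of the remaining three predictions match the key.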
Dirk Hovy, Taylor Berg-Kirkpatrick, Ashish Vaswani, and Eduard Hovy (2013): Learning Whom to Trust with MACE. In: Proceedings of NAACL-HLT 2013.