Multi-Annotator Competence Estimation
MACE (Multi-Annotator Competence Estimation) is an implementation of an item-response model, written by Taylor Berg-Kirkpatrick, Dirk Hovy, and Ashish Vaswani at USC/ISI. It lets you evaluate redundant annotations of categorical data (for example from Amazon's Mechanical Turk), and provides competence estimates of the individual annotators and the most likely answer to each item.
If we have 10 annotators answer a question, and five answer with 'yes' and five with 'no' (a surprisingly frequent event), we would normally have to flip a coin to decide what the right answer is. If we knew, however, that one of the people who answered 'yes' is an expert on the question, while one of the others just always selects 'no', we could take this information into account when weighting their answers. MACE does exactly that: it tries to find out which annotators are more trustworthy and upweights their answers accordingly. All you need to provide is a CSV file with one item per line. In tests, MACE's competence estimates correlated highly with the annotators' true competence, and it achieved accuracies of over 0.9 on several test sets. If items with known answers (control items) are available, MACE can also take them into account; this guides training and improves accuracy.
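The intuition can be sketched as competence-weighted voting. This is a simplification of MACE's actual model (which learns competences with EM rather than taking them as given); the function name and the competence scores below are illustrative assumptions:

```python
from collections import defaultdict

def weighted_vote(answers, competences):
    """Pick the label with the highest total annotator competence.

    answers: dict mapping annotator name -> label
    competences: dict mapping annotator name -> competence in [0, 1]
    """
    scores = defaultdict(float)
    for annotator, label in answers.items():
        scores[label] += competences[annotator]
    return max(scores, key=scores.get)

# Five 'yes' and five 'no' votes: a plain majority vote would tie,
# but weighting by competence breaks the tie toward the expert's answer.
answers = {f"a{i}": "yes" for i in range(5)}
answers.update({f"b{i}": "no" for i in range(5)})
competences = {name: 0.5 for name in answers}
competences["a0"] = 0.95  # trusted expert who answered 'yes'
competences["b0"] = 0.05  # spammer who always answers 'no'
print(weighted_vote(answers, competences))  # -> yes
```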
To download and run the software, please read the license agreement, fill out the form below, download the file, and follow these steps:
- the downloaded file will be named mace.pl. Rename it to mace.zip:
mv mace.pl mace.zip
- unzip mace.zip:
unzip mace.zip
This will create a directory called MACE/
- navigate into that directory:
cd MACE
- you can run the software using:
java -jar MACE.jar
The input file has to be a comma-separated file where each line represents an item and each column represents an annotator. Files should be encoded in UTF-8 to avoid problems with newline characters.
Empty values represent no annotation by that annotator on that item. Make sure the last line ends with a line break.
Example for 3 items with 5 annotations each (8 individual annotators), e.g.:
0,1,,1,,1,0,
,,1,1,,0,1,1
1,0,,1,1,,,0
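A short sketch of producing such an input file from Python (the file name and labels here are illustrative):

```python
import csv

# Illustrative data: item index -> {annotator index -> label}.
# 8 annotators, 5 annotations per item; missing keys mean "did not annotate".
annotations = {
    0: {0: "0", 1: "1", 3: "1", 5: "1", 6: "0"},
    1: {2: "1", 3: "1", 5: "0", 6: "1", 7: "1"},
    2: {0: "1", 1: "0", 3: "1", 4: "1", 7: "0"},
}
num_annotators = 8

with open("example.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for item in sorted(annotations):
        # Empty string = no annotation by that annotator on this item.
        writer.writerow([annotations[item].get(a, "") for a in range(num_annotators)])
# csv.writer terminates every row, so the last line ends with a line break.
```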
MACE produces two output files:
- the most likely answer for each item, [prefix.]prediction. This file has the same number of lines as the input file
- the competence estimate for each annotator, [prefix.]competence. This file has one line with tab separated values
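Once MACE has run, the default output files can be parsed with a few lines of Python. The helper names below are just for illustration; blank prediction lines (produced by --threshold) are mapped to None:

```python
def read_predictions(path="prediction"):
    """One line per input item; blank lines (ignored items) become None."""
    with open(path) as f:
        return [line.strip() or None for line in f]

def read_competences(path="competence"):
    """A single line of tab-separated floats, one per annotator."""
    with open(path) as f:
        return [float(value) for value in f.readline().split("\t")]
```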
java -jar MACE.jar example.csv
Evaluate the file example.csv and write the output to competence and prediction
java -jar MACE.jar --alpha 0.5 example.csv
Evaluate the file example.csv. Use Variational Bayes training with a Beta(0.5,0.5) prior. Usually results in slightly better accuracy than regular EM training
java -jar MACE.jar --prefix out example.csv
Evaluate the file example.csv and write the output to out.competence and out.prediction
java -jar MACE.jar --test example.key example.csv
Evaluate the file example.csv against the true answers in example.key. Write the output to competence and prediction and print the accuracy to STDOUT
java -jar MACE.jar --threshold 0.9 example.csv
Evaluate the file example.csv. Return predictions only for the 90% of items the model is most confident in. Write the output to competence and prediction. The latter will have blank lines for ignored items.
java -jar MACE.jar --controls example.controls example.csv
Evaluate the file example.csv. Use the instances in file example.controls to guide training. This improves accuracy. Write the output to competence and prediction.
Dirk Hovy, Taylor Berg-Kirkpatrick, Ashish Vaswani, and Eduard Hovy (2013): Learning Whom to Trust with MACE. In: Proceedings of NAACL-HLT 2013.