SciKnowMine

Triage System Demo

This is an open demonstration for public use, provided so that the interface can be examined as part of the 'Interactive Text Mining' task at BioCreative IV. This deployment is designed only as a sandbox for trying out simple functionality and seeing how the interface works in its current form (including its weaknesses as well as its strengths).

Click on the image below to be taken to the demo.

The idea is to select both (A) the target corpus (at the top left), which denotes the category into which you are classifying articles, and (B) the triage corpus ('john' at the middle left), which denotes the stack of papers a given curator has to work through. The system then lists all records in the triage corpus according to their inclusion classification with respect to the target corpus ('in', 'out' or 'unclassified'). As preliminary support for the curator, the system also provides a score from 0.0 to 1.0 indicating the likelihood that the document is classified 'in'. This score is derived from a document classifier trained using very rudimentary machine learning.

Note that this score is likely to be quite inaccurate, since we have not yet optimized the document classifier. This is an area for improvement, and we ask you to be patient with obvious errors in this part of the system when you evaluate our work.
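To make the listing described above more concrete, here is a minimal sketch in Python of how records might be grouped by classification and shown with their score. It is not the actual SciKnowMine code: the names `TriageRecord` and `group_by_status` are hypothetical, and ordering each group by descending score is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The three classification states named in the text.
STATUSES = ("in", "out", "unclassified")


@dataclass
class TriageRecord:
    """Hypothetical record: one paper in a curator's triage corpus."""
    doc_id: str
    status: str = "unclassified"
    # Classifier score in [0.0, 1.0]; None if no score is available.
    score: Optional[float] = None


def group_by_status(records: List[TriageRecord]) -> Dict[str, List[TriageRecord]]:
    """Group records by status; within a group, highest score first.

    Sorting by score is an assumption, not documented behaviour.
    """
    groups: Dict[str, List[TriageRecord]] = {s: [] for s in STATUSES}
    for record in records:
        groups[record.status].append(record)
    for status in groups:
        groups[status].sort(
            key=lambda r: r.score if r.score is not None else -1.0,
            reverse=True,
        )
    return groups
```

With a structure like this, a curator would see the most likely 'in' candidates among the unclassified papers first, which is one way the score could support triage even while the classifier remains rough.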

Importantly, as you scroll through the list, you can use the left and right cursor keys to change the classification of each document to either 'in' or 'out'. If you'd like to remove the classification, just hit the delete key to return the document to an 'unclassified' state. The interface is designed to make it easy for a curator to step through a list of documents and make a decision about each one based on the view that they see.

As shown, we use the FlexPaper web application to display the PDF document (note that we will be removing this dependency in favor of our own PDF viewer, which provides highlighting and annotation capabilities; see the next iteration of the system). All papers presented here are taken from PubMed Central's 'Open Access Subset'. We will be regenerating the original build of the database automatically every night, so no changes will be stored in the system and you should feel free to play with it as much as you like.
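The keyboard controls described above amount to a small state transition on each document. The Python sketch below illustrates that logic only; it is not the interface's real code. The key names follow the browser `KeyboardEvent.key` convention, and the mapping of left to 'in' and right to 'out' is an assumption, since the text does not say which arrow key selects which class.

```python
# Assumed mapping from key name to the classification it sets.
# Which arrow key means 'in' and which means 'out' is a guess.
KEY_TO_STATUS = {
    "ArrowLeft": "in",
    "ArrowRight": "out",
    "Delete": "unclassified",
}


def apply_key(current_status: str, key: str) -> str:
    """Return the document's status after a key press.

    Keys that are not mapped leave the status unchanged.
    """
    return KEY_TO_STATUS.get(key, current_status)


if __name__ == "__main__":
    status = "unclassified"
    for key in ["ArrowLeft", "ArrowRight", "Delete"]:
        status = apply_key(status, key)
        print(key, "->", status)
```

Because every mapped key sets the status outright rather than toggling it, pressing the same key twice has no further effect, which keeps stepping through a long list predictable.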

The main reason for providing this demonstration system is to ask the question:

"What information do you think we should highlight to help with the classification task for a curator?"

Should we present highlighted words in the text pertaining to the curation task? Should we run a cluster analysis over the document set and showcase the relationships between documents? Please tell us what you think.

If you are participating in the BioCreative IV IAT task, you should use their procedures to provide feedback, but feel free to send email with thoughts and suggestions to gully@usc.edu.
