X-Sender: gil@nitro.isi.edu
Date: Wed, 31 Jan 2001 14:33:21 -0800
To: rkf-tkcp@AI.SRI.COM
From: Yolanda Gil
Subject: [rkf-tkcp] Plans for user pre-tests and evaluations
X-archive-position: 82
Sender: rkf-tkcp-bounce@AI.SRI.COM
X-original-sender: gil@ISI.EDU
Reply-to: gil@ISI.EDU

We agreed last week that it would be good to start user pre-tests and evaluations immediately and to conduct them periodically up until the IET evaluations in May. Since I agreed to coordinate our efforts on this topic, here is the plan I suggest we follow. Immediate action items are summarized at the end of this message.

GOALS OF PRE-TESTS

I assume that we are not trying to test any specific claims or hypotheses about our system. Instead, our goals are:

- to get focused feedback on the capabilities of the tool and on its usability by SMEs
- to prepare ourselves for the summer evaluations in terms of what we should expect to happen in them
- to provide informed input to IET on how to conduct the evaluations this summer, based on our first-hand experience evaluating the tool with SMEs

We have found that user evaluations are A LOT of work, but they are also an invaluable source of feedback and an incredibly useful reality check. I would not be surprised if the feedback obtained from these evaluations were to significantly drive the team's work in the upcoming months.

PROPOSED SCHEDULE

I suggest that we plan on first testing ourselves as users (T0) and then on four cycles of SME tests (T1 through T4), each followed by system extensions/upgrades. We can adjust this and do two or three rounds with SMEs instead of four, depending on how our initial tests go and how fast we can improve and extend our tools based on their feedback.
Here is my proposed schedule:

- 1/29-2/4: Prepare SHAKEN 0.2 to get it ready for initial user tests
- 2/5-2/11: Pre-test by team members as users (T0)
- 2/12-2/25: Improve system based on pre-test
- 2/26: First pre-test with SMEs (T1)
- 3/5: System extensions/upgrades
- 3/12: System extensions/upgrades continue
- 3/19: Pre-test with SMEs (T2)
- 3/26: System extensions/upgrades
- 4/2: System extensions/upgrades continue
- 4/9: Pre-test with SMEs (T3)
- 4/16: System extensions/upgrades
- 4/23: System extensions/upgrades continue
- 4/30: Pre-test with SMEs (T4)
- 5/7: System extensions/upgrades; installation of system at IET's lab
- 5/14: IET's SME testing starts

I would suggest that we test two SMEs in each of the T1-T4 tests.

SELECTING SMEs

I believe the plan is that Kristien and Vinay will find us SMEs from SRI that we can use in pre-tests. In our experience, it takes only a few minutes of interacting with a potential participant in an experiment to tell whether they are going to be useful. Here are my three main criteria for what makes a good subject for our tests:

- TEACHING ABILITIES: Some people are good at explaining things and others are just too cryptic. The former make better SMEs for our purposes. To find this out, I casually ask them to explain something a bit involved, on whatever topic comes up in conversation.
- CURIOSITY ABOUT COMPUTER TECHNOLOGY: Some people are interested in new technology and others do not seem to be. One good indicator is whether they have written formulas in Excel spreadsheets, put together Word macros, used something like MacinTax, played around with HTML, or whatever.
- MOTIVATION: What can I say. You'd be surprised.

An SME who does not satisfy these criteria typically gives bad data. Let's not shoot ourselves in the foot.

TRAINING AND TESTING MATERIAL

Pat, Bruce, and I will put together something in time for T1 (the first SME tests on 2/26), and will run the material by Art and Kristien.
I suggest that we keep it as brief as possible, using slides that highlight the main points instead of text with manual-style details. Here is what we will need:

- a short intro to our goals for the test and to the approach of our tool
- a brief tour of the component library
- an overview of the tool using a training example
- hands-on practice with the tool, first with the same training example and then with a second training example
- a 1-page write-up of the instructions and tasks they need to do in the test
- a short questionnaire for additional feedback and suggestions from participants

For the first round of experiments, I suggest we show users only the parts of the component library that are relevant to the tests, since we still won't have implemented things such as the SME dictionary. I suggest that the test task be to enter the RNA transcription scenario. For T0, I assume we will not need any training materials, since we are testing ourselves; we will use the "virus invades cell" scenario as practice. Bruce and I will work on coming up with another training scenario, more closely aligned with Chapter 7, in time for T1.

DATA COLLECTION

- SHAKEN needs to be instrumented to record all kinds of data about what the user is doing: tracking which axioms are added, keeping versions of the KB, etc. I believe SRI has been taking care of this. It would be useful if SRI could distribute to the team a description of the data being collected and a sample log.
- A KE will be present and taking notes on what the participant is doing.
- We will videotape the sessions in case anyone wants to analyze the tapes. I personally have never needed them, and people who do user studies for a living will tell you that it takes 5 hours to turn 1 hour of tape into workable material.
- We are looking into software that captures screen activity. If anyone has direct experience or suggestions in this respect, let us know.
GROUND RULES AND GUIDELINES

A KE will always be present, and we have to figure out what he/she needs to do when the participant gets stuck or wants to ask a question. As much as possible, the KE will formulate any answers in terms of future capabilities of the tool. This is much in the spirit of a Wizard of Oz experiment, where a human pretends that a certain tool or capability is implemented and interacts with the user as if he were that tool. That way, we will know that by adding capability X, we have addressed the problem that user Y had when she got stuck with Z.

LEGALITIES

As Murray said, we want to make sure that we comply with federal law regarding human subject testing. It is just a matter of paperwork. My group submitted a "claim of exemption" to USC that was approved, and we use it as a blanket to cover specific tests. I am attaching a draft that I put together based on our own description; it took me many back-and-forths with the admin guys until they found it specific enough, understandable enough, and inoffensive enough. SRI should take it from here and submit it for approval through their administration.

FEEDBACK AND BUG REPORTS

I suggest that SRI devise a mechanism for people to provide feedback and bug reports to the team. For example, the T0 pre-tests next week will produce a few bug reports, suggestions for improvements, feedback, etc. The easiest way may be to set up an email address to which everyone sends such things. The email account can be managed by SRI, which would forward messages to specific members of the team and assign priorities to different suggestions. There are probably fancier ways to manage this through a Web page that we can all look at.
It may also help to design a standard form that captures the relevant information, with items such as:

- type of report: bug or suggestion
- severity of problem: red (it really got in the way), orange (it was bad but there are ways to get around it), yellow (it is not so bad), white (just thought things could be made better or easier)
- relevant modules: which modules within the system this report is relevant to (interface, component library, KM, KANAL, etc.)
- detailed description

IMMEDIATE ACTION ITEMS

- 2/4 (?): SRI to release to the team a version of SHAKEN ready for the T0 pre-test. Jihie and Jerome have been in touch over the phone going through improvements that we can make to the tool to prepare it for the initial tests. I believe Jerome has already made a lot of improvements to the tool since we all saw the demo last week. The version to be released needs to have been tested to support a user in entering the virus invasion and the RNA transcription scenarios.
- 2/4 (?): SRI to establish a mechanism for reporting bugs and feedback.
- 2/5-2/11: All team members take the pre-test. I would be inclined to include Art and Kristien in this pre-test; they definitely need to go through it before the T1 round of SMEs anyway.

Attachment converted: Macintosh HD:HumanSub.doc (WDBN/MSWD) (0003EA60)

-- 
Yolanda Gil, USC/ISI
(310) 448-8794