Week 03 Colloquium Summary
David Banks

Experimental Performance Evaluation in Computer Vision
Kevin Bowyer
University of South Florida

Dr. Bowyer described a tool he has developed for benchmarking the accuracy of algorithms in medical imaging. Suppose 20 different research groups devise software for determining whether a tumor is present in an image and whether the tumor is benign or malignant. Whose algorithm is best? Is it even worth the trouble to read about algorithm number 21?

The ground truth in benchmarking mammograms is supplied by human experts: the physicians who actually classify the images. If an algorithm makes the same decisions that the human expert does, it must be a good algorithm. Bowyer developed an online tool that gives anyone, anywhere, the chance to try out a favorite tumor-classification algorithm on a set of mammograms and test it against the experts. The outcome is an ROC (Receiver Operating Characteristic) curve, which plots the true-positive rate against the false-positive rate as the algorithm's decision threshold varies.

He contends that in the future, any novel computer vision technique for cancer detection will demonstrate its superiority through the shape of its ROC curve, an objective measure that will impartially determine which algorithm is best. He recruited a few groups to try the tool, establishing target ROC curves that may one day be outperformed.
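To make the ROC idea concrete, here is a minimal Python sketch. It is not Bowyer's tool. It assumes each algorithm outputs a suspicion score for every mammogram and that the experts' classification is recorded as 1 (tumor present) or 0 (absent). The function names `roc_points` and `auc` and the eight example cases are hypothetical. Lowering a threshold from the highest score downward traces the curve from (0, 0) to (1, 1). The area under the curve (AUC) is one common single-number summary of its shape and is included here only as an illustration.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs for an ROC curve.

    scores: higher means the algorithm is more confident a tumor is present.
    labels: expert ground truth, 1 if a tumor is present, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    # Rank cases from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every case tied at this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Hypothetical scores for eight mammograms and the experts' labels.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    print(auc(roc_points(scores, labels)))  # → 0.75
```

Tied scores are processed together so that no single threshold separates cases the algorithm cannot distinguish. A curve that bows toward the upper-left corner (high true-positive rate at a low false-positive rate) corresponds to an AUC near 1, while a curve along the diagonal corresponds to chance-level performance.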