Abstract: Most existing theory of structured prediction assumes exact inference, which is often intractable in many practical problems. This leads to the routine use of approximate inference such as beam search but there is not much theory behind it. Based on the structured perceptron, we propose a general framework of "violation-fixing" perceptrons for inexact search with a theoretical guarantee for convergence under new separability conditions. This framework subsumes and justifies the popular heuristic "early-update" for perceptron with beam search (Collins and Roark, 2004). We also propose several new update methods within this framework, among which the "max-violation" method dramatically reduces training time (by 3 fold as compared to early-update) on state-of-the-art part-of-speech tagging and incremental parsing systems.
]]>
Abstract:
This pilot study explores the hypothesis that customer reviews of cars can be used to create and/or fine tune a recommendation system that offers a list of ranked top-N matches for a given vehicle. Our main premise is that positive or negative reviews invariably focus on the features relevant to the car being reviewed and hence can be used to uncover subtle similarities among various car models, as well as discover macro-types of cars (e.g. family cars, luxury, high performance sports etc). To discover similar models based on reviews we propose a Weighted Dice Coefficient which weighs each shared or non-shared word token by its tf-idf score. Closest top five cars are then discovered for each of the 226 reviewed car models. We also show that integrating tf-idf scores into the similarity metric improves the accuracy of the top five picks, as compared to the standard Dice Coefficient.