A few years ago, Robert Hodgson, a retired oceanographer and now wine grower in California, wrote a devastating paper on the concordance of judgments in 13 wine competitions that took place in California. Hodgson (2009) concludes that “there is almost no consensus among the 13 competitions regarding wine quality; for wines receiving a Gold medal in one or more competitions, it is very likely that the same wine received no award at another; and the likelihood of receiving a Gold medal can be statistically explained by chance alone”.
Hodgson is not the only critic of wine-tasting competitions. Davy Derbyshire (2013) wrote an article in the Guardian entitled ‘Wine tasting: It's junk science’ that describes several similar results.
Our point of departure is that wine tasting consists of two main building blocks: judges and a rating method. Hence, perhaps it is not the judges who are ‘bad’; the problem may lie in the rating method.
The problem lies with how we rate wine
Rating methods almost invariably have the same structure: M judges sit in front of N glasses of wine. They taste them (without any information), and have to either rate them (say, on a scale from zero to 20) or rank-order them (first, second, and so on). The ratings or ranks are then added, and the totals yield a rating or ranking of the wines.
Some judges are generous, others are less so, and the ranges of marks used by the judges may vary as well. As noted by Ashenfelter and Quandt (1999), this “may give greater weight to judges who put a great deal of scatter in their numerical scores and thus express strong preferences by numerical differences”. Borda (1781) had already pointed to similar problems. Therefore, the ratings given by each judge should always be transformed into ranks before adding. Note that adding ratings or ranks does not necessarily lead to a ‘social’ ordering in the absence of a dictator.
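The scatter problem can be illustrated with a small numerical sketch. The judges and ratings below are invented for illustration: one judge uses a narrow band of marks, the other spreads her marks widely, and summing raw ratings lets the second judge decide the outcome, while summing ranks does not.

```python
def to_ranks(ratings):
    # Rank 1 = highest rating; this toy example has no ties.
    order = sorted(range(len(ratings)), key=lambda i: -ratings[i])
    r = [0] * len(ratings)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

wines = ["A", "B", "C"]            # hypothetical wines
cautious = [14, 13, 15]            # judge using a narrow band of marks
scattered = [20, 5, 8]             # judge spreading marks widely

# Summing raw ratings: A = 34, B = 18, C = 23 -> A wins outright,
# driven entirely by the scattered judge's large numerical spread.
raw_sum = [a + b for a, b in zip(cautious, scattered)]

# Summing ranks (lower is better): A = 3, B = 6, C = 3 -> A and C tie,
# because the cautious judge's preference for C now carries equal weight.
rank_sum = [a + b for a, b in zip(to_ranks(cautious), to_ranks(scattered))]
```

The point is not which aggregate is ‘right’, but that the raw-rating outcome is determined by how widely a judge spreads her marks rather than by the preferences themselves.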
But still, rating, as well as rank-ordering, is complicated. Those who have had, or still have, to grade students know this well (and there is not even a glass of wine to grab).
A new way to rank wines
We suggest a new game theory-based rating and ranking method for wines, in which the Shapley value of each wine is computed, and wines are ranked according to their Shapley values. Judges should find it simpler to use, since they are not required to rank-order or grade all the wines, but merely to choose the group of those that they find ‘worthy’ (of a medal). This ranking method is based on the set of reasonable axioms that determine the Shapley value as the unique solution of an underlying cooperative game. Unlike in the general case, where computing the Shapley value can become very complex, here the Shapley value, and hence the final ranking, are straightforward to compute.
We assume that each judge has just one vote, and that she can vote for any (sub)group of the N wines. By voting for such a group, she indicates that she favours any wine belonging to this group over wines excluded from the group, and that, as far as she is concerned, every wine chosen is a candidate for the first place or a medal, while non-chosen wines are not. Note that a group can consist of a single wine, or of all N wines; it can also be empty (no vote). When a judge votes for a group, her single vote is split equally among the wines in the group. If she voted, say, for five wines, then each of them gets a score of one fifth. Judges vote simultaneously, so that none is aware of the others’ choices, and no judge can vote twice for the same wine. The final score of each wine is the sum of the scores it received from all judges. This turns out to be each wine's Shapley value.1 Wines are then ranked (and rated) by decreasing Shapley value – the higher the better – and the wine with the highest Shapley value wins the gold medal.
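The scoring rule described above is simple enough to sketch in a few lines. The ballots below are hypothetical; the function splits each judge's single vote equally among the wines she chose and sums the shares per wine:

```python
from collections import defaultdict

def shapley_scores(votes, wines):
    """votes: one set of chosen wines per judge (a subset of `wines`).
    Each judge's single vote is split equally among the wines she chose."""
    scores = defaultdict(float)
    for chosen in votes:
        if not chosen:              # an empty ballot (no vote) contributes nothing
            continue
        share = 1.0 / len(chosen)   # the judge's one vote, split equally
        for w in chosen:
            scores[w] += share
    return {w: scores[w] for w in wines}

wines = ["A", "B", "C", "D"]
# Four hypothetical ballots: two judges back A, one backs three wines, one abstains.
votes = [{"A", "B"}, {"A"}, {"B", "C", "D"}, set()]

s = shapley_scores(votes, wines)                 # A: 1.5, B: 5/6, C: 1/3, D: 1/3
ranking = sorted(wines, key=lambda w: -s[w])     # A first, then B, then C and D tied
```

Note how the judge who voted for three wines contributes only a third of a point to each, while the judge who voted for A alone gives it a full point – the dilution discussed below.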
This is very similar to some versions of approval voting, where a voter can vote for as many candidates as she wants. Still, one might be concerned that a judge who votes for a large number of candidate wines exercises more power or influence than a judge who votes for, say, one wine only. This poses no problem here, as each judge is endowed with one vote only. If she chooses many wines as being worthy, her single vote is divided by the number of wines chosen, and thus becomes diluted, while the judge who votes for one wine only assigns to it a full score of one.
It should be clear that such a method is considerably easier to apply than traditional wine-ranking or rating approaches, since each judge has only to point to the wines she finds worthy (any number between zero and N), and ignore the others. This may well make judges feel more comfortable, and make their choices easier and more consistent. Of course, there is a need for some experimentation with the various approaches (ranking, rating, and Shapley ranking), if only to convince ourselves that Shapley ranking leads to results comparable to those of other methods.
Ashenfelter, Orley and Richard Quandt (1999), "Analyzing a wine tasting statistically", Chance 12.
Borda, Jean-Charles de (1781), "Mémoire sur les élections au scrutin", in Mémoires de l’Académie des Sciences, 657-664.
Derbyshire, Davy (2013), “Wine tasting: It's junk science”, Guardian, 23 June.
Ginsburgh, Victor and Israel Zang (2003), “The museum pass game and its value”, Games and Economic Behavior 43, 322-325.
Ginsburgh, Victor and Israel Zang (2012), “Shapley ranking of wines”, Journal of Wine Economics 7, 169-180.
Hodgson, Robert (2009), “An analysis of concordance among 13 US wine competitions”, Journal of Wine Economics 4, 1-9.
Shapley, Lloyd (1953), “A value for n-person games”, in H W Kuhn and A W Tucker (eds.), Contributions to the Theory of Games, Vol. II, Princeton, NJ: Princeton University Press, 307-317.
1 See Shapley (1953) for definition, theory and properties of the value, Ginsburgh and Zang (2003) for a formal proof that the described procedure actually yields the Shapley value, and Ginsburgh and Zang (2012) for an application to wine competitions.