Wine Club Data Analysis - Part 2: Tasting Order


In a perfect world, the order in which we taste the wines wouldn’t affect their scores. In this post I’ll look at how tasting order affects our scoring.

import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams.update({'savefig.dpi' : 100})
sns.set_context("paper")
sns.set_style("darkgrid")

# Load the databases
wines = pd.read_csv('wines.csv', index_col=0)
scores = pd.read_csv('scores.csv', index_col=0)
nights = pd.read_csv('nights.csv')
# Box plot of scores grouped by tasting position
ax = sns.boxplot(x='order', y='Score', data=scores)

[Figure: box plot of scores by tasting order]

Scores look more or less uniform across positions. Position 6 looks like the best spot, and scores deteriorate quickly after that.
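The box plot only gives a visual impression. As a minimal sketch of putting numbers next to it, the snippet below tabulates the count, mean, and median score at each position. It assumes the same `order` and `Score` columns as `scores`; the toy values are invented purely to keep the example self-contained and are not the club's data.

```python
import pandas as pd

# Made-up scores for illustration only; not the wine club's data.
toy_scores = pd.DataFrame({
    'order': [1, 1, 2, 2, 3, 3],
    'Score': [7.0, 8.0, 6.0, 6.5, 8.0, 9.0],
})

# Count, mean, and median score at each tasting position.
by_order = toy_scores.groupby('order').Score.agg(['count', 'mean', 'median'])
print(by_order)
```

Swapping `toy_scores` for the real `scores` table should give one row per tasting position, which also shows how few scores the late positions have.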

Person-by-person analysis

Here we’ve assumed that everyone’s scores vary with tasting order in the same way, which might not be the case. Let’s look person by person to see whose scores shift the most with tasting order. I’m only looking at (person, order) pairs with more than 10 data points, so we’re not skewed by that 9th wine or the very occasional member.

# Mean score and number of scores for each (person, order) pair
means = scores.groupby(['Name', 'order']).Score.mean()
count = scores.groupby(['Name', 'order']).Score.count()

# Keep only pairs with more than `threshold` scores
threshold = 10
r_names = count[count > threshold].reset_index().Name.unique()
means = means[count > threshold]
n_names = len(r_names)
colors = sns.color_palette('hls', n_names)
# Plot each person's deviation from their own average across positions
for i, iname in enumerate(r_names):
    plt.plot(
        means[iname].index,
        means[iname] - means[iname].mean(),
        color=colors[i], label=iname)
plt.legend(ncol=6)

[Figure: each person's mean score by tasting order, relative to their own average]
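The filtering and centering steps are easier to follow on a tiny table. The sketch below uses invented data for two hypothetical tasters, `Ann` and `Bob`, and lowers the threshold to 1 so the toy data has something to drop. It also does the centering with `groupby(...).transform` instead of a loop, which is just one possible way to write it.

```python
import pandas as pd

# Invented data: 'Ann' and 'Bob' are hypothetical tasters.
toy = pd.DataFrame({
    'Name':  ['Ann'] * 5 + ['Bob'] * 3,
    'order': [1, 1, 1, 2, 2, 1, 2, 2],
    'Score': [6.0, 7.0, 8.0, 9.0, 9.0, 5.0, 7.0, 9.0],
})

# Mean and count for each (person, order) pair in one pass.
stats = toy.groupby(['Name', 'order']).Score.agg(['mean', 'count'])

# Keep only (person, order) pairs with more than `threshold` scores.
threshold = 1  # the post uses 10; lowered to suit the tiny toy dataset
kept = stats.loc[stats['count'] > threshold, 'mean']

# Subtract each person's own average so the curves are comparable.
centered = kept - kept.groupby(level='Name').transform('mean')
print(centered)
```

Here Bob's single score at position 1 is dropped, and each remaining value is that person's mean at a position minus their average over the positions that survived the filter.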

So there seems to be a general trend: we start off strong, get tired around wines 3-4, then pick back up for wines 5-6.

means['Martijn'].plot(
    figsize=(3,2),
    title='It just takes Martijn a while to get warmed up',
    label='Martijn')

[Figure: Martijn's mean score by tasting order]

I’m not sure this is the best way to do this, but we could likely define a “consistency metric” showing how well we’re able to resist the effects of wine order. For now I’ll just use the standard deviation of each person’s per-position mean scores: the lower it is, the less that person’s scores move with tasting order.

# Standard deviation of each person's per-position means, sorted ascending
a = means.groupby(level=0).std().sort_values()
ax = sns.barplot(x=list(a.index), y=a.values)
ax.tick_params(axis='x', labelrotation=90)

[Figure: bar plot of the standard deviation of per-position mean scores for each person]
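To make the metric concrete, here is a small sketch with invented per-position means for two hypothetical tasters. `Ann` scores the same at every position and `Bob` drifts upward, so Ann should come out as the more consistent of the two.

```python
import pandas as pd

# Invented per-position mean scores for two hypothetical tasters.
index = pd.MultiIndex.from_product(
    [['Ann', 'Bob'], [1, 2, 3]], names=['Name', 'order'])
toy_means = pd.Series([7.0, 7.0, 7.0, 5.0, 7.0, 9.0], index=index)

# Spread of each person's per-position means: smaller means more consistent.
consistency = toy_means.groupby(level='Name').std().sort_values()
print(consistency)
```

One caveat with this metric is that it ignores how many scores sit behind each mean, so a person with few tastings per position can look noisier than they really are.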

Looks like Erin is the most consistent? Grace, get your act together :)

means['Erin'].plot(figsize=(3,2), label='Erin')
means['Grace'].plot(figsize=(3,2), label='Grace')
plt.legend()

[Figure: mean score by tasting order for Erin and Grace]
