# Wine Club Data Analysis - Part 2: Tasting Order

1 minute read

In a perfect world, the order in which we taste the wines wouldn’t have an effect on their score. In this page I’ll take a look to see how the tasting order affects our scoring

``````import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams.update({'savefig.dpi' : 100})
sns.set_context("paper")
sns.set_style("darkgrid")

# Load the databases
wines = pd.read_csv('wines.csv', index_col=0)
scores = pd.read_csv('scores.csv', index_col=0)
nights = pd.read_csv('nights.csv')
``````
``````ax = sns.boxplot(vals=scores.Score, groupby=scores.order)
`````` It looks like things are more or less uniform, with position 6 looking like the best spot and deteriorating quickly after that

## Person-by-person analysis

Here we’ve assumed that we all vary the same in our rankings vs order, which might not be the case. Lets look on a person-by-person basis, to see which of us deviate the most in our tasting ability. I’m only looking at (person, order) pairs with more than 10 datapoints to make sure we’re not skewed by that 9th wine or the very occasional member.

``````means = scores.groupby(['Name', 'order']).Score.mean()
count = scores.groupby(['Name', 'order']).Score.count()

threshold = 10
r_names = count[count > threshold].reset_index().Name.unique()
means = means[count > threshold]
n_names = len(r_names)
colors = sns.color_palette('hls', n_names)
for i, iname in enumerate(r_names):
plt.plot(
means[iname].index,
means[iname] - means[iname].mean(),
color=colors[i], label=iname)
plt.legend(ncol=6)
`````` So it looks like there’s more or less a general trend - We start off strong, get tired around wines 3-4, then pick back up for wines 5-6.

``````means['Martijn'].plot(
figsize=(3,2),
title='It just takes Martijn a while to get warmed up',
label='Martijn')
`````` I’m not sure this is the best way to do this, but we could likely define a “consistency metric” showing how well we’re able to resist the effects of wine order. For now I’ll just use standard deviation of the means.

``````a = means.std(level=0).copy()
a.sort()
ax = sns.barplot(np.arange(len(a)), a)
labels = ax.set_xticklabels(a.index, rotation=90)
`````` Looks like Erin is the most accurate? Grace, get your act together :)

``````means['Erin'].plot(figsize=(3,2), label='Erin')
means['Grace'].plot(figsize=(3,2), label='Grace')
plt.legend()
`````` View this as an IPython notebook

Tags:

Updated: