# Evaluation metrics » quadratic weighted kappa

The kappa coefficient is a chance-adjusted index of agreement. In machine learning it can be used to quantify the agreement between an algorithm's predictions and trusted labels for the same objects. Kappa starts with accuracy: the proportion of objects that the algorithm and the trusted labels assign to the same category or class. It then adjusts for the probability of the algorithm and the trusted labels assigning items to the same category by chance. The metric typically ranges from 0 (agreement no better than chance) to 1 (complete agreement); when there is less agreement between the raters than expected by chance, it can fall below 0.
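In formula terms, kappa compares the observed agreement p_o with the chance agreement p_e: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of the unweighted version (the function name and toy data are illustrative):

```python
import numpy as np

def cohen_kappa(actuals, preds, n_classes):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    actuals = np.asarray(actuals)
    preds = np.asarray(preds)
    # observed agreement: fraction of exact matches
    p_o = np.mean(actuals == preds)
    # chance agreement: sum over classes of P(actual = c) * P(pred = c)
    p_e = sum(np.mean(actuals == c) * np.mean(preds == c)
              for c in range(n_classes))
    return (p_o - p_e) / (1 - p_e)

print(cohen_kappa([0, 1, 2, 2], [0, 1, 2, 1], 3))
```

Here 3 of 4 labels match (p_o = 0.75) but roughly a third of that agreement is expected by chance (p_e = 0.3125), so kappa lands well below the raw accuracy.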

## How Quadratic Weighted Kappa works

The quadratic weighted kappa is calculated as follows.

1. First, an N x N histogram matrix O is constructed, such that O_i,j corresponds to the number of records with an actual rating of i that received a predicted rating of j.
2. An N-by-N matrix of weights, w, is calculated from the squared difference between actual and predicted rating scores: w_i,j = ((i - j)^2) / ((N - 1)^2).
3. An N-by-N histogram matrix of expected ratings, E, is calculated, assuming that there is no correlation between rating scores. This is calculated as the outer product between the actual rating's histogram vector of ratings and the predicted rating's histogram vector of ratings, normalized such that E and O have the same sum.
4. Weighted Kappa = 1 - sum(w_i,j O_i,j) / sum(w_i,j E_i,j), with O and E normalized as in step 3.
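The four steps above can be condensed into a single NumPy sketch (the function name is illustrative; the step-by-step cells below are the article's actual walkthrough):

```python
import numpy as np

def quadratic_weighted_kappa(actuals, preds, n_classes):
    # Step 1: observed matrix O as a joint histogram of (actual, predicted) pairs
    O = np.zeros((n_classes, n_classes))
    for a, p in zip(actuals, preds):
        O[a, p] += 1
    # Step 2: quadratic weights w_ij = (i - j)^2 / (N - 1)^2
    idx = np.arange(n_classes)
    w = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Step 3: expected matrix E as the outer product of the marginal histograms
    E = np.outer(O.sum(axis=1), O.sum(axis=0))
    # normalize so O and E have the same sum
    O, E = O / O.sum(), E / E.sum()
    # Step 4: kappa = 1 - sum(w * O) / sum(w * E)
    return 1 - (w * O).sum() / (w * E).sum()
```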

### Example

In [23]:
import numpy as np
actuals = np.array([4, 4, 3, 4, 4, 0, 1, 1, 2, 1])
preds   = np.array([0, 4, 1, 0, 4, 0, 1, 1, 2, 1])


#### Step 1 - construct the confusion matrix

In [24]:
from sklearn.metrics import confusion_matrix
O = confusion_matrix(actuals, preds)
print('Matrix O:')
print(O)

Matrix O:
[[1 0 0 0 0]
[0 3 0 0 0]
[0 0 1 0 0]
[0 1 0 0 0]
[2 0 0 0 2]]


#### Step 2 - construct weights matrix

In [25]:
w = np.zeros((5, 5))
for i in range(len(w)):
    for j in range(len(w)):
        w[i][j] = ((i - j) ** 2) / 16  # (i - j)^2 / (N - 1)^2 with N = 5

print('weights matrix:')
print(w)

weights matrix:
[[0.     0.0625 0.25   0.5625 1.    ]
[0.0625 0.     0.0625 0.25   0.5625]
[0.25   0.0625 0.     0.0625 0.25  ]
[0.5625 0.25   0.0625 0.     0.0625]
[1.     0.5625 0.25   0.0625 0.    ]]


#### Step 3 - construct matrix of expected ratings

In [26]:
N = 5
act_hist = np.zeros([N])
for item in actuals:
    act_hist[item] += 1

pred_hist = np.zeros([N])
for item in preds:
    pred_hist[item] += 1

E = np.outer(act_hist, pred_hist)

print('Expected matrix:')
print(E)

Expected matrix:
[[ 3.  4.  1.  0.  2.]
[ 9. 12.  3.  0.  6.]
[ 3.  4.  1.  0.  2.]
[ 3.  4.  1.  0.  2.]
[12. 16.  4.  0.  8.]]


#### Step 4 - Compute QWK

In [27]:
E = E/E.sum()  # normalize E
O = O/O.sum()  # normalize O

num = 0
den = 0
for i in range(len(w)):
    for j in range(len(w)):
        num += w[i][j]*O[i][j]
        den += w[i][j]*E[i][j]

weighted_kappa = 1 - num/den
print('weighted kappa:')
weighted_kappa

weighted kappa:

Out[27]:
0.31818181818181823
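As a sanity check, the same value can be obtained directly from scikit-learn's cohen_kappa_score with quadratic weights:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

actuals = np.array([4, 4, 3, 4, 4, 0, 1, 1, 2, 1])
preds   = np.array([0, 4, 1, 0, 4, 0, 1, 1, 2, 1])

# weights='quadratic' applies the (i - j)^2 weighting used above
print(cohen_kappa_score(actuals, preds, weights='quadratic'))
```

This agrees with the ~0.318 computed step by step above.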

## How to implement it

The above is a naive implementation of QWK and will be slow on large datasets. Use one of the following options instead:

### Sklearn cohen_kappa_score function

In [67]:
from sklearn.metrics import cohen_kappa_score, confusion_matrix
import numpy as np
from time import time

#dataset
np.random.seed(2020)
actuals = np.random.randint(0, 4, 10000)
preds = np.random.randint(0, 4, 10000)

In [68]:
# QWK
start_time = time()
qwk = cohen_kappa_score(actuals, preds, weights='quadratic')
print(f'qwk = {qwk} (runtime: {time()-start_time:0.3} seconds)')

qwk = 0.010146537647530596 (runtime: 0.007 seconds)


### afajohn method

In [64]:
# https://www.kaggle.com/afajohn/quadratic-weighted-kappa-with-numpy-flavor
# Wrapped in a function so the snippet is runnable; the defaults match the
# 0-3 rating range of the dataset above.
def qwk_numpy(act, pred, n=4, hist_range=(0, 3)):
    O = confusion_matrix(act, pred)
    O = np.divide(O, np.sum(O))

    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            W[i][j] = ((i - j) ** 2) / ((n - 1) ** 2)

    act_hist = np.histogram(act, bins=n, range=hist_range)[0]
    prd_hist = np.histogram(pred, bins=n, range=hist_range)[0]

    E = np.outer(act_hist, prd_hist)
    E = np.divide(E, np.sum(E))

    num = np.sum(np.multiply(W, O))
    den = np.sum(np.multiply(W, E))

    return 1 - np.divide(num, den)

In [66]:
# QWK
start_time = time()
qwk = qwk_numpy(actuals, preds)
print(f'qwk = {qwk} (runtime: {time()-start_time:0.3} seconds)')

qwk = 0.010146537647530707 (runtime: 0.00899 seconds)


### cpmp method

In [63]:
#https://www.kaggle.com/c/data-science-bowl-2019/discussion/114133
from numba import jit
import warnings
warnings.filterwarnings('ignore')

@jit
def qwk3(a1, a2, max_rat=3):
    assert len(a1) == len(a2)
    a1 = np.asarray(a1, dtype=int)
    a2 = np.asarray(a2, dtype=int)

    hist1 = np.zeros((max_rat + 1, ))
    hist2 = np.zeros((max_rat + 1, ))

    o = 0
    for k in range(a1.shape[0]):
        i, j = a1[k], a2[k]
        hist1[i] += 1
        hist2[j] += 1
        o += (i - j) * (i - j)

    e = 0
    for i in range(max_rat + 1):
        for j in range(max_rat + 1):
            e += hist1[i] * hist2[j] * (i - j) * (i - j)

    e = e / a1.shape[0]

    return 1 - o / e

In [62]:
# QWK
start_time = time()
qwk = qwk3(actuals, preds)
print(f'qwk = {qwk} (runtime: {time()-start_time:0.3} seconds)')

qwk = 0.010146537647530596 (runtime: 0.001 seconds)