Kaggle Winning Solutions   |   How to win a kaggle solution


Quadratic Weighted Kappa

The Kappa coefficient is a chance-adjusted index of agreement. In machine learning it can be used to quantify the agreement between an algorithm's predictions and a set of trusted labels for the same objects. Kappa starts from accuracy, the proportion of objects that both the algorithm and the trusted labels assign to the same class, and then corrects for the probability of the algorithm and the trusted labels agreeing "by chance." The metric typically varies from 0 (random agreement between raters) to 1 (complete agreement between raters). If there is less agreement between the raters than expected by chance, the metric can go below 0.
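The chance correction is easiest to see with plain (unweighted) Cohen's kappa: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected from each rater's marginal class frequencies. A minimal sketch with illustrative data:

```python
import numpy as np

# Illustrative labels: two raters scoring the same 8 items into 2 classes
rater_a = np.array([0, 0, 1, 1, 1, 0, 1, 0])
rater_b = np.array([0, 1, 1, 1, 0, 0, 1, 0])

p_o = np.mean(rater_a == rater_b)  # observed agreement (accuracy)

# Chance agreement: product of the raters' marginal rates, summed over classes
classes = np.union1d(rater_a, rater_b)
p_e = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in classes)

kappa = (p_o - p_e) / (1 - p_e)
print(kappa)  # 0.5: 75% raw agreement, but 50% was expected by chance
```

Here accuracy alone (0.75) overstates the agreement; kappa discounts the half of it that two raters with these marginals would produce at random.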

How Quadratic Weighted Kappa works

The quadratic weighted kappa is calculated as follows.

  1. First, an N x N histogram matrix O is constructed, such that O_i,j corresponds to the number of records that have an actual rating of i and a predicted rating of j.
  2. An N-by-N matrix of weights, w, is calculated based on the difference between actual and predicted rating scores: w_i,j = (i - j)^2 / (N - 1)^2.
  3. An N-by-N histogram matrix of expected ratings, E, is calculated, assuming that there is no correlation between rating scores. This is calculated as the outer product between the actual rating's histogram vector of ratings and the predicted rating's histogram vector of ratings, normalized such that E and O have the same sum.
  4. From these three matrices, Weighted Kappa = 1 - sum_i,j(w_i,j * O_i,j) / sum_i,j(w_i,j * E_i,j).

Example

In [23]:
import numpy as np
actuals = np.array([4, 4, 3, 4, 4, 0, 1, 1, 2, 1])
preds   = np.array([0, 4, 1, 0, 4, 0, 1, 1, 2, 1])

Step 1 - construct the confusion matrix

In [24]:
from sklearn.metrics import confusion_matrix
O = confusion_matrix(actuals, preds)
print('Matrix O:')
print(O)
Matrix O:
[[1 0 0 0 0]
 [0 3 0 0 0]
 [0 0 1 0 0]
 [0 1 0 0 0]
 [2 0 0 0 2]]

Step 2 - construct weights matrix

In [25]:
w = np.zeros((5, 5))
for i in range(len(w)):
    for j in range(len(w)):
        w[i][j] = ((i - j) ** 2) / 16  # 16 = (N - 1)^2 for N = 5 rating levels
        
print('weights matrix:')
print(w)
weights matrix:
[[0.     0.0625 0.25   0.5625 1.    ]
 [0.0625 0.     0.0625 0.25   0.5625]
 [0.25   0.0625 0.     0.0625 0.25  ]
 [0.5625 0.25   0.0625 0.     0.0625]
 [1.     0.5625 0.25   0.0625 0.    ]]
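The double loop can be collapsed into a vectorized one-liner: np.subtract.outer builds the full grid of i - j differences in a single call. A sketch, equivalent to the loop above for N = 5:

```python
import numpy as np

N = 5
ratings = np.arange(N)
# (i - j)^2 for every pair of rating levels, scaled by (N - 1)^2
w = np.subtract.outer(ratings, ratings) ** 2 / (N - 1) ** 2
print(w)  # same matrix as the loop version
```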

Step 3 - construct matrix of expected ratings

In [26]:
N = 5
act_hist = np.zeros([N])
for item in actuals:  # count occurrences of each actual rating
    act_hist[item] += 1

pred_hist = np.zeros([N])
for item in preds:  # count occurrences of each predicted rating
    pred_hist[item] += 1

E = np.outer(act_hist, pred_hist)

print('Expected matrix:')
print(E)
Expected matrix:
[[ 3.  4.  1.  0.  2.]
 [ 9. 12.  3.  0.  6.]
 [ 3.  4.  1.  0.  2.]
 [ 3.  4.  1.  0.  2.]
 [12. 16.  4.  0.  8.]]
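The histogram loops can likewise be replaced with np.bincount, which counts occurrences of each integer value directly. A sketch using the same actuals and preds arrays as above:

```python
import numpy as np

N = 5
actuals = np.array([4, 4, 3, 4, 4, 0, 1, 1, 2, 1])
preds   = np.array([0, 4, 1, 0, 4, 0, 1, 1, 2, 1])

act_hist  = np.bincount(actuals, minlength=N)  # counts of each actual rating
pred_hist = np.bincount(preds, minlength=N)    # counts of each predicted rating
E = np.outer(act_hist, pred_hist)              # expected matrix before normalization
print(E)
```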

Step 4 - Compute QWK

In [27]:
E = E/E.sum() # normalize E
O = O/O.sum() # normalize O

num = 0
den = 0
for i in range(len(w)):
    for j in range(len(w)):
        num += w[i][j] * O[i][j]
        den += w[i][j] * E[i][j]

weighted_kappa = 1 - num / den
print('weighted kappa:')
weighted_kappa
weighted kappa:
Out[27]:
0.31818181818181823
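As a sanity check, scikit-learn's cohen_kappa_score with weights="quadratic" reproduces the hand-computed value on the same ten ratings:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

actuals = np.array([4, 4, 3, 4, 4, 0, 1, 1, 2, 1])
preds   = np.array([0, 4, 1, 0, 4, 0, 1, 1, 2, 1])

qwk = cohen_kappa_score(actuals, preds, weights="quadratic")
print(qwk)  # ~0.3182, matching the step-by-step result above
```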

How to implement it

The above is a very naive implementation of QWK and will be slow on large datasets. In practice, use one of the following options:

Scikit-learn cohen_kappa_score function

In [67]:
from sklearn.metrics import cohen_kappa_score, confusion_matrix
import numpy as np
from time import time

#dataset
np.random.seed(2020)
actuals = np.random.randint(0, 4, 10000)
preds = np.random.randint(0, 4, 10000)
In [68]:
# QWK
start_time = time()
qwk = cohen_kappa_score(actuals, preds, weights="quadratic")
print(f'qwk = {qwk} (runtime: {time()-start_time:0.3} seconds)')
qwk = 0.010146537647530596 (runtime: 0.007 seconds)

afajohn method

In [64]:
# https://www.kaggle.com/afajohn/quadratic-weighted-kappa-with-numpy-flavor
def quadKappa(act, pred, n=4, hist_range=(0, 3)):
    # labels ensures O is n x n even when a rating level is absent from the data
    O = confusion_matrix(act, pred, labels=list(range(n)))
    O = O / np.sum(O)

    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            W[i][j] = ((i - j) ** 2) / ((n - 1) ** 2)

    act_hist = np.histogram(act, bins=n, range=hist_range)[0]
    prd_hist = np.histogram(pred, bins=n, range=hist_range)[0]

    E = np.outer(act_hist, prd_hist)
    E = E / np.sum(E)

    num = np.sum(W * O)
    den = np.sum(W * E)

    return 1 - num / den
In [66]:
# QWK
start_time = time()
qwk = quadKappa(actuals, preds)
print(f'qwk = {qwk} (runtime: {time()-start_time:0.3} seconds)')
qwk = 0.010146537647530707 (runtime: 0.00899 seconds)

CPMP method

In [63]:
#https://www.kaggle.com/c/data-science-bowl-2019/discussion/114133
from numba import jit 
import warnings 
warnings.filterwarnings('ignore')

@jit
def qwk3(a1, a2, max_rat=3):
    assert(len(a1) == len(a2))
    a1 = np.asarray(a1, dtype=int)
    a2 = np.asarray(a2, dtype=int)

    hist1 = np.zeros((max_rat + 1, ))
    hist2 = np.zeros((max_rat + 1, ))

    o = 0
    for k in range(a1.shape[0]):
        i, j = a1[k], a2[k]
        hist1[i] += 1
        hist2[j] += 1
        o +=  (i - j) * (i - j)

    e = 0
    for i in range(max_rat + 1):
        for j in range(max_rat + 1):
            e += hist1[i] * hist2[j] * (i - j) * (i - j)

    e = e / a1.shape[0]

    return 1 - o / e
In [62]:
# QWK
start_time = time()
qwk = qwk3(actuals, preds)
print(f'qwk = {qwk} (runtime: {time()-start_time:0.3} seconds)')
qwk = 0.010146537647530596 (runtime: 0.001 seconds)