Scoring Models¶
In the last notebook, we learned how to select the best model for a given model architecture, but often you may have multiple versions of the architecture in mind that you would like to test. A common approach is to use probabilistic statistical measures that quantify both the model's performance on the training dataset and the complexity of the model. The scores most often used are the Akaike and Bayesian Information Criteria (AIC and BIC respectively). Both evaluate the model's fit on the training data while adding penalties for more complex models, as these tend to overfit the dataset. The scores therefore favour the model that best generalises to the dataset.
For both criteria, the lower the score, the better. The scores are relative, which means they can only be compared between models trained on the same dataset and in the same way. AIC and BIC are calculated in very similar ways, with minor differences in their formulas, so their results will often agree.
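For reference, the standard formulas (which hmmlearn implements for you) are:

$$\mathrm{AIC} = 2k - 2\ln(\hat{L}) \qquad \mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})$$

where $k$ is the number of free parameters in the model, $n$ is the number of observations and $\hat{L}$ is the maximised likelihood on the training data. The only difference is the penalty term: because BIC's penalty grows with $\ln(n)$, it punishes extra parameters more harshly than AIC on large datasets.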
It is important to understand the limitations of probabilistic scores when viewing the results. By design, both AIC and BIC prioritise the simplest model that best fits the dataset; they have no knowledge of the model's uncertainty or of any biological relevance. It is therefore up to you to decide which model is best, using both the scores and your prior knowledge of the system.
For this example, we will load in the whole dataset and, for the sake of ease, work with fully open HMMs (i.e., every state can transition to every other state, and every state can emit every observable).
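As a quick sketch of what "fully open" means in hmmlearn terms: we fix nothing in advance, so .fit() initialises all the probability matrices itself. The n_iter and random_state values below are illustrative assumptions, not the settings used to train the saved models.

# A fully open categorical HMM: no transition or emission structure is fixed,
# so hmmlearn initialises every probability matrix itself when .fit() is called
# (n_iter and random_state here are illustrative values only)
from hmmlearn.hmm import CategoricalHMM
open_model = CategoricalHMM(n_components=4, n_iter=100, random_state=42)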
# We'll load in the packages we need
import pandas as pd
import numpy as np
import pickle
from hmmlearn.hmm import CategoricalHMM
# Load in your cleaned dataset
df = pd.read_pickle('/USERS_PATH/ReCoDE-HMMs-for-the-discovery-of-behavioural-states/admin/cleaned_data.pkl')
# List the observables
observables = ['immobile', 'micro', 'walking']
# Transform all the data to the right shape
ar_data = df.groupby('id')['hmm'].apply(np.array)
ar_data = np.array(ar_data, dtype = object) # dtype=object as the sequences can differ in length
len_seq_all = [len(ar) for ar in ar_data] # hmmlearn needs the length of each individual sequence
seq_all = np.concatenate(ar_data, axis = 0) # ...as well as all the sequences joined end to end
seq_all = seq_all.reshape(-1, 1) # hmmlearn expects a 2D array of shape (n_samples, 1)
seq = ar_data[0] # A single specimen's sequence
seq = seq.reshape(-1, 1) # It also needs to be reshaped for decoding

# Call the .decode() method with the sequence inside the brackets
# The method returns two parts: the log likelihood for the sequence and the decoded sequence
# (as in the previous notebook, this assumes a trained model called `model` is in scope)
log_prob, decoded_array = model.decode(seq)
All hmmlearn hidden Markov models have built-in methods that give you the AIC and BIC scores, as well as the .score() method we've used previously, which gives the log likelihood. We'll briefly run through how to get these scores before creating models with varying numbers of hidden states.
# Let's load the 2 state and 4 state models you trained previously into a list called models
models = []
# replace the paths below with the ones in your repo
models.append(pickle.load(open('.../ReCoDE-HMMs-for-the-discovery-of-behavioural-states/data/2_state_model.pkl', "rb")))
models.append(pickle.load(open('.../ReCoDE-HMMs-for-the-discovery-of-behavioural-states/data/4_state_model.pkl', "rb")))
# We'll create some empty lists to append the scores
aic = []
bic = []
lls = []
for hmm in models:
    aic.append(hmm.aic(seq_all, len_seq_all)) # get the AIC score with .aic()
    bic.append(hmm.bic(seq_all, len_seq_all)) # get the BIC score with .bic()
    lls.append(hmm.score(seq_all, len_seq_all)) # get the log likelihood with .score()
We can now use Matplotlib to plot the data. If you've used Python before, you've probably come across Matplotlib; if not, it's a library for visualising data in Python. Click the embedded link for more information.
# This is the way to load matplotlib
import matplotlib.pyplot as plt
# Labels for the x-axis
model_names = ['2 states', '4 states']
# Create the plot
fig, ax = plt.subplots()
# Plot AIC and BIC on the first y-axis
ln1 = ax.plot(model_names, aic, label="AIC", color="blue", marker="s")
ln2 = ax.plot(model_names, bic, label="BIC", color="green", marker="D")
# Create a second y-axis for the log likelihood, as it's on a different scale
ax2 = ax.twinx()
ln3 = ax2.plot(model_names, lls, label="LL", color="orange", marker="o")
# Join the lines from both axes into one legend
ax.legend(handles=ln1 + ln2 + ln3)
ax.set_title("Using AIC/BIC for Model Selection")
ax.set_ylabel("Criterion Value (lower is better)")
ax2.set_ylabel("LL (higher is better)")
ax.set_xlabel("HMM type")
fig.tight_layout()
# Print the plot to screen
plt.show()
The lower the AIC and BIC the better, and the higher the log likelihood the better. From this we can see that, despite the additional complexity of the four-state model, it performs better on the dataset across all three scores.
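To see where the complexity penalty comes from, it helps to count the free parameters yourself. In a fully open categorical HMM with $n$ hidden states and $m$ observables, each row of the start, transition and emission matrices must sum to one, giving $(n - 1) + n(n - 1) + n(m - 1)$ free parameters. A quick back-of-envelope sketch (the helper function is ours, not part of hmmlearn):

# Back-of-envelope count of the free parameters in a fully open categorical HMM:
# (n - 1) start probabilities, n * (n - 1) transition probabilities and
# n * (m - 1) emission probabilities, since each row must sum to one
def n_free_params(n_states, n_observables = 3):
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_observables - 1)

for n in [2, 4]:
    print(f'{n} states: {n_free_params(n)} free parameters')

With our three observables, the two-state model has 7 free parameters and the four-state model has 23, so the four-state model only comes out ahead because its fit improves the log likelihood by more than enough to offset those 16 extra parameters.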
Task¶
Create a loop below that evaluates models with varying numbers of hidden states.
# Finish the loop below using what you created in the previous notebook
# hint: what we did for storing the best model
aic = []
bic = []
lls = []
# Create models of size 2, 4, 6, 8
n_states = [2, 4, 6, 8]
for n in n_states:
    # Write the code to train and score each model here
    ...

# Then write the code to plot the scores here
Head to notebook_answers for an example of what the graph should look like, as well as some commentary on how to interpret it.
|
├── data
├── docs
├── notebooks
├── src
| ├── notebook_answers.ipynb <-----
| └── ...
Extra task¶
Try training models with varying numbers of hidden states that are true to the biology (i.e., the sleep states only emit the immobile observable, and the sleep stages are sequential).
Compare the scores for each model.
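If you are unsure where to start, one way to impose this kind of structure in hmmlearn is to set the emission matrix yourself and tell .fit() not to re-estimate it, by leaving 'e' out of both init_params and params. The sketch below only illustrates the mechanism; the state labels and the non-zero probabilities are assumptions, not values taken from the trained models.

# A sketch of a biologically constrained 4-state model: the two sleep states
# can only ever emit 'immobile' (column 0 of the emission matrix)
import numpy as np
from hmmlearn.hmm import CategoricalHMM

constrained = CategoricalHMM(n_components=4, n_iter=100, random_state=42,
                             init_params='st', # only initialise startprob and transmat...
                             params='st')      # ...and only re-estimate those during fitting
constrained.emissionprob_ = np.array([
    [1.0, 0.0, 0.0], # deep sleep: immobile only
    [1.0, 0.0, 0.0], # light sleep: immobile only
    [0.4, 0.4, 0.2], # quiet awake: any observable (illustrative values)
    [0.1, 0.2, 0.7], # active awake: any observable (illustrative values)
])

Sequential sleep stages can be handled in a similar spirit: initialise transmat_ yourself with zeros for the forbidden transitions (and remove 't' from init_params so hmmlearn doesn't overwrite it); the EM updates keep those structural zeros at zero while still refining the allowed transitions.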