Visualising the model output¶
For this section we'll need to import some plotting functions from elsewhere in the repository.
import sys
# The plotting functions are in the HMM folder
sys.path.append('../src/HMM')
# We'll look at four plots in this notebook
from hmm_plot_functions import plot_hmm_overtime, plot_hmm_quantify, plot_hmm_quantify_length, plot_hmm_raw
from misc import hmm_display
For this notebook we'll use a pre-trained HMM, as the dataset we used earlier was deliberately reduced in size to meet the github size limit and to minimise the time spent training. Due to this, there's a good chance the trained model might HAVE some odd quirks, so to be safe we'll use one trained on the whole dataste.
# we'll need to load in the HMM from the data folder, it should be called example_hmm.pkl
import pickle
path = '.../ReCoDE-HMMs-for-the-discovery-of-behavioural-states/data/example_hmm.pkl'
h = pickle.load(open(path, "rb"))
We'll also need to set the labels for the decoded states and the colours we want to represent it.
hmm_labels = ['deep sleep', 'light sleep', 'quiet awake', 'active awake']
emission_labels = observables = ['immobile', 'micro', 'walking']
colours = ['darkblue', 'lightblue', 'orange', 'red']
# We can also view the pre-trained model
hmm_display(h, hmm_labels, emission_labels)
Starting probabilty table: | | deep sleep | light sleep | quiet awake | active awake | |----|--------------|---------------|---------------|----------------| | 0 | 0.0165855 | 6.08132e-05 | 0.160616 | 0.822738 | Transition probabilty table: | | deep sleep | light sleep | quiet awake | active awake | |--------------|--------------|---------------|---------------|----------------| | deep sleep | 0.834628 | 0.0101301 | 0.155242 | 0 | | light sleep | 0.112523 | 0.672489 | 0.214988 | 0 | | quiet awake | 0 | 0.25632 | 0.731348 | 0.0123321 | | active awake | 0 | 0 | 0.0206226 | 0.979377 | Emission probabilty table: | | immobile | micro | walking | |--------------|------------|-----------|-----------| | deep sleep | 1 | 0 | 0 | | light sleep | 1 | 0 | 0 | | quiet awake | 0.16186 | 0.418199 | 0.419942 | | active awake | 0.0115003 | 0.0185279 | 0.969972 |
# Now we can load in the data we used to train the model earlier, i.e. 'cleaned_data.pkl'
import pandas as pd
df = pd.read_pickle('')
For plotting the data we'll be using matplotlib and seaborn to create the plots. Matplotlib was introduced in the last notebook and seaborn acts ontop of matplotlib and alongside pandas, allowing users to quickly generate plots from a dataframe with just 1 line of code. Head to the websites linked to read more about their uses. For now we'll just be calling functions that utilise them in the backend, so they're won't be any explination to how the plots were generated.
Plots Overtime¶
One of the first things you want to look at with any time series analysis is how your output changes overtime. Below we'll generate a graph that plots the prevalence of each state over a 24 hour window.
# The function below takes the data and the column to decode as the first 2 arguments
# wrapped takes the decoded data and transforms it to only be over 24 hours, rather than several days
# tbin is the time difference of the data that was used to train the model (here its 60). Make sure you remember the time difference when training future models as it can affect the output.
# avg_window is the amount of values that are included in a moving average window that smoothes the data for plotting
plot_hmm_overtime(data = df, variable = 'hmm', hmm = h, labels = hmm_labels, colours = colours, wrapped = True, tbin = 60, avg_window = 30)
Each plotted line for each state has the 95% confidence intervals shown in a semi-transparent colour around the line. Those sections of each line that have no overlapping confidence intervals we can be fairly sure that they are statistically significant from each other (although we would need to back this up with tests, but at a quick glance its good)
Comparing this plot to a regular sleep plot that uses the 5 minute rule to calculate sleep (the Drosophila standard, > 5 minutes of immobility equates sleep) we can see that the peaks in deep sleep follows directly the sleep pattern:
We can therefore be fairly happy that our HMM is replicating real life (or at least what we think is a correct interpretation of real life).
Now that we can see how the states change in prevelance overtime we'll want to quantify the values to see their proportions.
Quantifying plots¶
With quantification plots bar charts are often not the best way to go as they lack description of the underlying data, just showing the mean or median. It is common now to plot all the data points, with an overlay of a boxplot or violin plot. Seaborn has built in plotting functions to plot boxplots and swarm plots together.
# We'll call the same arguments as before
# This plot will find the proportion of time each fly spends in each state
# The writtern values is the mean, whereas the line in the box is the median
plot_hmm_quantify(data = df, hmm = h, variable = 'hmm', labels = hmm_labels, colours = colours, tbin = 60)
We can see that both light sleep and quiet awake are the most prevelant states, which if we look back at the overtime plot makes sense as they are consistently present throughout the day and night with deep sleep and active awake fluctuating throughout the day.
Quantifying length¶
We also want to look at the rough length of each state to gain an idea if the states are being transitioned into a lot or just have long instances a few amount of times.
# Same again
plot_hmm_quantify_length(data = df, hmm = h, variable = 'hmm', labels = hmm_labels, colours = colours, tbin = 60)
/home/lab/Desktop/ReCoDE-HMMs-for-the-discovery-of-behavioural-states/.venv/lib/python3.11/site-packages/seaborn/categorical.py:3370: UserWarning: 52.6% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot. warnings.warn(msg, UserWarning) /home/lab/Desktop/ReCoDE-HMMs-for-the-discovery-of-behavioural-states/.venv/lib/python3.11/site-packages/seaborn/categorical.py:3370: UserWarning: 25.2% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot. warnings.warn(msg, UserWarning)
From this plot we can tell that although both light sleep and quiet awake are the most prevelant, they are both short bouts, indicating they are transitioned into frequently, but for short amount of time. Addtionally, this HMM would predict sleep to be shorter than the tradtional 5 minute rule, an assumption that would need to tested by an experiment.
Both deep sleep and active awake are much longer, running for long periods without interuption, which is something you'd expect.
Plotting raw decoded data¶
Sometimes its good to visualise the raw decoded data to do some exploratory analysis. Looking at the raw data can give you an idea how the states interact with each other, then with any assumptions you make you can devise new analysis techniques to prov your assumptions.
# Once again the same...
plot_hmm_raw(data = df, hmm = h, variable = 'hmm', colours = colours, tbin = 60)
With the above plots you can get a general idea about how the model maps to the real dataset and better understand how the transition rates translate in reality. Using these insights you can then create an assay that will test the models output. For example, to test the predicted sleep states I random delivered attractive odours to individual flies and observed if the woke up after (started moving), this data was then decoded and the response mapped to what the state was previously. See the plot below.
(a) The % of time spent in each state over 24 hours (b) The response rate to a puff of 10% Apple cideo vinegar per state in a 4-state model. In grey is the spontaneuous movement, where a fake puff is recored and then movement post fake puff is recorded.
I hope now you are familiar with HMM's, their use, how to create your own, and finally how to understand what's produced and how it maps to reality.