# How Predictable: Evaluating Song Lyrics with Language Models

I was briefly nerd-sniped this morning by the following tweet:

Can we quantify how ‘predictable’ a set of lyrics are?

## Language Models and Token Probabilities

A language model is a neural network trained to predict the next token in a sequence. Specifically, given an input sequence it outputs a probability for each token in its vocabulary. So, given the phrase “Today is a nice ” the model outputs one value for every token, and we can look up the probability associated with the token for “day” – which will likely be fairly high (~0.5 in my tests).

We can look at the probabilities predicted for each successive word in a set of lyrics, and take the average as a measure of ‘predictability’. Here’s the full code I used:

``````import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2", return_dict_in_generate=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

lyrics = """
And my thoughts stay runnin', runnin' (Runnin')
The heartbreaks keep comin', comin' (Comin')
Oh, somebody tell me that I'll be okay
"""
input_ids = tokenizer(lyrics, return_tensors="pt").input_ids
word_probs = []
min_length = 5 # How much do we give to start with

for i in range(min_length, len(input_ids)-1):
ids = input_ids[:,:i]
generated_outputs = gpt2.generate(ids[:,:-1], do_sample=True, output_scores=True,
max_new_tokens=1,
scores = generated_outputs.scores
probs = scores.softmax(-1)
word_probs.append(probs[ids[-1]])

torch.mean(torch.tensor(word_probs))``````

My starting point was this post by Patrick Von Platen showing how to generate probabilities per token with GPT-2.

## Results

The first test: ‘Remind Me’ by Megan Trainor. The mean probability given by the model for the next word given the lyrics up to that point: 0.58!

Trying a few other songs I could think of with less repetitive lyrics:

• ‘Levitate’ (21 Pilots): 0.34
• ‘Mom’s Spaghetti’ (MNM): 0.35
• The code example above: 0.45
• I’m Gonna Be (500 Miles)’ (The Proclaimers): 0.59

There is a caveat worth making which is that anything written before 2019 might be in the model’s training data, and so it might ‘know’ the lyrics already making the measure less informative.

## Historical Trends

EDIT: Someone (me) didn’t preview their data well enough, the lyrics I used for this were either badly scraped or very processed, so these scores won’t compare well to the previous section and I need to re-do this with a proper dataset before we can say anything concrete about trends!