Song Lyrics 7: Rhyme Time

28th October 2019 / Andrew Gustar

Previously, we looked at repetition in our dataset of song lyrics. This seventh article in the series considers a related issue – rhyming patterns. We are only interested here in the last word of each line – i.e. the string of characters between the last space and the end-of-line character \n.
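Here is a minimal Python sketch of that extraction. It assumes the lyrics of one song are held in a single string. The function name `last_words` is hypothetical. It takes everything after the final space on each line. It also makes two assumptions that are not from the article. First, blank lines are skipped. Second, trailing whitespace is stripped before splitting, so a line ending in a space does not produce an empty "word". Punctuation is left attached to the word.

```python
from typing import List


def last_words(lyrics: str) -> List[str]:
    """Return the last word of each non-blank line of a song's lyrics.

    Hypothetical helper: a "word" is whatever follows the final space
    on the line, so punctuation such as "mat," stays attached.
    """
    words = []
    for line in lyrics.split("\n"):
        line = line.rstrip()  # assumption: ignore trailing whitespace (including a stray \r)
        if not line:
            continue  # skip blank lines between verses
        # rsplit with maxsplit=1 splits only at the last space;
        # a one-word line comes back whole as the only element.
        words.append(line.rsplit(" ", 1)[-1])
    return words


print(last_words("The cat sat on the mat\nand wore a hat\n\nnow\n"))
# ['mat', 'hat', 'now']
```

Using `rsplit(" ", 1)` rather than a full `split()` avoids breaking the whole line into words when only the final one is needed.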

Song Lyrics 6: Principal Components

9th October 2019 / Andrew Gustar

This is the sixth in a series of articles looking at different ways of analysing a dataset of song lyrics. In this article we will be venturing into hyperspace to explore the differences and similarities between artists, in terms of the words they use in their songs.

Song Lyrics 5: Sunday in New York with Mary and John

29th August 2019 / Andrew Gustar

In a later post in this series of articles analysing a dataset of song lyrics, I will consider the more general question of identifying parts of speech (nouns, verbs, adjectives, etc.), which can greatly expand what can be learned from statistical text analysis. However, in this article, I will focus on a particular part of speech: proper nouns.

Song Lyrics 4: Sentiment Analysis

19th August 2019 / Andrew Gustar

In this fourth article in the series looking at our song lyrics dataset, we will begin to consider the meaning of the lyrics, rather than just treating the words as abstract objects. A simple technique for quantifying the meaning of texts is known as ‘sentiment analysis’.

Song Lyrics 3: Repetition and Compression

9th August 2019 / Andrew Gustar

We all know that a good song depends on repetition – both of the tune and the lyrics. Too much repetition, and it is just boring; too little, and it can lack structure. This article looks at different aspects of repetition in song lyrics.

Song Lyrics 2: n-grams

3rd August 2019 / Andrew Gustar

In the previous article in this series we looked at counting the frequency of words in a dataset of song lyrics. This time we will look at sequences of adjacent words, known as n-grams.
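Here is a minimal Python sketch of what an n-gram is. It assumes the lyrics have already been split into a list of words. The function name `ngrams` is hypothetical. This version gives no special treatment to line or song boundaries, so an n-gram can run across them. Counting the results with `collections.Counter` applies the same frequency-counting idea as the previous article, now to word sequences rather than single words.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive words, in order.

    Hypothetical helper: a list of k words yields k - n + 1 n-grams,
    or none at all if n > k.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "love me do love me".split()
bigrams = ngrams(words, 2)
print(bigrams)
# [('love', 'me'), ('me', 'do'), ('do', 'love'), ('love', 'me')]
print(Counter(bigrams).most_common(1))
# [(('love', 'me'), 2)]
```

The n-grams are stored as tuples rather than lists because tuples are hashable, which `Counter` requires for its keys.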

Song Lyrics 1: Counting Words

27th July 2019 / Andrew Gustar

This is the first of a series of articles about analysing text data. The statistical music historian might be interested in many sorts of text – from lists and catalogues through to complex ‘free format’ writing in tweets, record reviews, composer biographies, or encyclopedias. For these articles I will consider a dataset of song lyrics, taken from the LyricWiki website [Since I wrote this post, LyricWiki has disappeared, although there are several other sources of song lyrics that could be used.]

Time at the top: classical vs popular music

7th May 2019 / Andrew Gustar

One of the things that seems to distinguish ‘classical’ from ‘popular’ music is the fact that the same classical composers and works can remain at the top for very long periods of time – decades, even centuries – whereas popular songs and artists can reach the top of the charts, sell millions of records, and disappear within a matter of months. But is this difference real?

Concert Programming at the New York Philharmonic

19th March 2019 / Andrew Gustar

The New York Philharmonic has an excellent online archive of all of its concerts since 1842. This article uses the archive to investigate which composers tend to be programmed together.

Why it pays to perform last

9th February 2019 / Andrew Gustar

One of the annoying things about TV talent shows is the fact that the winning act very often seems to be the one that performed last. I thought I would check whether this was actually the case, using the Wikipedia data detailing all 15 series of the UK ‘X Factor’.