Song Lyrics 1: Counting Words

27th July 2019 / Andrew Gustar

This is the first of a series of articles about analysing text data. The statistical music historian might be interested in many sorts of text – from lists and catalogues through to complex ‘free format’ writing in tweets, record reviews, composer biographies, or encyclopedias. For these articles I will consider a dataset of song lyrics, taken from the LyricWiki website [since I wrote this post, LyricWiki has disappeared, although there are several other sources of song lyrics that could be used].

Continue reading →

Concert Programming at the New York Philharmonic

19th March 2019 / Andrew Gustar

The New York Philharmonic has an excellent online archive of all of its concerts since 1842. This article uses the archive to investigate which composers tend to be programmed together.

Continue reading →

What’s in a name?

6th November 2018 / Andrew Gustar

Many datasets of composers tell us relatively little about them, so we sometimes have to guess details from the information available – such as the composer’s name. Forenames, for example, are often a good indicator of gender, as described in this previous article. Titles – associated with the church, aristocracy or royalty – can also reveal gender, and tell us about occupation or social class. This article looks at what names can tell us about nationality – based on a recent attempt to identify Italian composers among the many obscure and unknown names listed in the British Library’s music catalogue.

Continue reading →

Deduplication

24th March 2018 / Andrew Gustar

Deduplication is an important, though often messy and time-consuming, part of many statistical investigations. It is usually required when data comes from several different sources, to identify all of the records that actually refer to the same thing. For example, I have recently been deduplicating the names appearing in the ‘women composers’ sources listed in this previous article. Deduplication may also be needed where several publications of the same work are described in different ways in a library catalogue.

Continue reading →

Reading a scanned book

26th February 2018 / Andrew Gustar

I have recently been working on extracting data on women composers from the various sources listed in this previous article. The first source on that list is a scanned copy of a French translation of a book – Les femmes compositeurs de musique – compiled in 1910 by Otto Ebel. It is available at archive.org. I have not had great success in the past in extracting usable data from scanned books, but this appears to be a reasonably tidy scan, and Ebel looks like a useful source on women composers, so I thought I would give it a go.

Continue reading →

Triangulation

24th January 2018 / Andrew Gustar

Triangulation is a research technique that involves looking at the same thing from two different perspectives. In surveying, it enables positions and distances to be calculated by measuring angles from two locations. In the social sciences, it can increase the reliability of conclusions if they are found by two (or more) different methods. And in statistical historical musicology, looking for the same works or composers in two or more datasets can tell us a lot about the characteristics of the datasets, and about the works’ patterns of survival or dissemination.

Continue reading →

Lies, Damned Lies, and Composers’ Star Signs

31st December 2017 / Andrew Gustar

On the classical.net website there is a list of 715 composers and their dates of birth. It is straightforward to use this data to identify each composer’s star sign, which produces this interesting chart:

Continue reading →

Pick a composer, any composer

13th November 2017 / Andrew Gustar

Often in statistical analysis we need to select things at random. For example, if it is impractical to work with a complete dataset, the only option might be to use a random sample. The science of statistics tells us how to analyse a sample in order to reach conclusions about the entire dataset, and gives us ways to calculate margins of error based on the size of the sample. But I digress.

So, how might we pick a random composer?

Continue reading →

The Semantic Web vs Tidy Data

15th September 2017 / Andrew Gustar

I have recently been trying to collect data from the Listening Experience Database (LED) in order to put together a proposal for a conference paper. The LED is a nicely constructed database using linked open data and a structure based on something called the ‘Semantic Web’. Unlike traditional databases, which have a hierarchical ‘tree’ structure, the Semantic Web concept is a true ‘network’, where anything can be linked to anything else. The LED, for example, includes links to data in a number of other databases. Have a look at the LED and follow a few links and you will see what this means – a very rich and flexible means of linking data together.

Continue reading →

Types of Investigation

7th August 2017 / Andrew Gustar

In what ways can statistical techniques be used to investigate topics in historical musicology? I think there are four main approaches – hypothesis testing, quantification, modelling and exploration. Their use depends on the topic, the data, and the type of question you are trying to answer.

These four types often overlap. It is hard to do modelling without some exploration and quantification, for example. Also, after you have spent so long collecting the data, cleaning it, and getting it into a form for statistical analysis, why not squeeze the most out of it and do some general exploration after testing your hypotheses?

Continue reading →

Statistics in Historical Musicology

Category: Techniques
