The Semantic Web vs Tidy Data

I have recently been trying to collect data from the Listening Experience Database (LED) in order to put together a proposal for a conference paper. The LED is a nicely constructed database using linked open data and a structure based on something called the ‘Semantic Web’. Unlike traditional databases, which have a hierarchical ‘tree’ structure, the Semantic Web concept is a true ‘network’, where anything can be linked to anything else. The LED, for example, includes links to data on a number of other databases. Have a look at the LED and follow a few links and you will see what this means – a very rich and flexible means of linking data together.

Collecting Data

Finding a great dataset is all very well, but the next step is working out how to get the data onto your computer so that you can start playing with it. Datasets come in many forms, and there are different ways of collecting the data. In this article I will use some examples from the list of datasets in this previous article on women composers.

There are three main approaches to collecting data: read it and type it in, download it, or ‘scrape’ it.

Types of Investigation

In what ways can statistical techniques be used to investigate topics in historical musicology? I think there are four main approaches – hypothesis testing, quantification, modelling and exploration. Their use depends on the topic, the data, and the type of question you are trying to answer.

These four types often overlap. It is hard to do modelling without some exploration and quantification, for example. Also, after you have spent so long collecting the data, cleaning it, and getting it into a form for statistical analysis, why not squeeze the most out of it and do some general exploration after testing your hypotheses?

Women Composers: Sources and Bias

Louise Farrenc (1804-1875)

There is a lot of interest at the moment in women composers. Until recently, women were a small minority of the composing population, but in working with large datasets, I encounter a surprisingly large number of female names (although it is often frustratingly difficult to find out any details about them). In the nineteenth century, for example, perhaps 1-2% of published music was written by women.1 Whilst that is an embarrassingly small proportion, it still equates to a substantial body of music by many hundreds of women composers – most of whom have since sunk into obscurity. There are of course many more from the twentieth and twenty-first centuries.2

Why Quantify Music History?

There is a shocking absence of statistics in books on music history. Generations of music historians have shown little interest in using statistical analysis to quantify their subject.

But why should it be considered outrageous that music historians have not embraced the tools and techniques that would enable them to quantify music history? After all, there are plenty of excellent accounts of the history of music, all based on thorough and rigorous scholarship and a deep knowledge of the subject. Is this not enough?

This is not a book!

This website is about how to use statistical techniques to study music history. It is based on my PhD thesis, and on more recent work developing the techniques, investigating various topics in music history, and discovering new datasets and ways of understanding them.

It is perhaps more common to develop a PhD thesis into a book, and I have considered this option. But a website seems a more sensible way to go, for three main reasons…