In what ways can statistical techniques be used to investigate topics in historical musicology? I think there are four main approaches – hypothesis testing, quantification, modelling and exploration. Their use depends on the topic, the data, and the type of question you are trying to answer.
These four types often overlap. It is hard to do modelling without some exploration and quantification, for example. Also, after you have spent so long collecting the data, cleaning it, and getting it into a form for statistical analysis, why not squeeze the most out of it and do some general exploration after testing your hypotheses?
If you have a hypothesis about a particular topic, you might want to find quantitative evidence to indicate whether or not it might be true. Hypotheses could include statements like these…
- the key signatures of orchestral works became ‘flatter’ during the nineteenth century, as brass instruments became more common;
- female composers before the First World War were less likely than male composers to write large scale works;
- Polish composers were more likely than Italian composers to have spent time in nineteenth-century Paris.
Hypothesis testing is a classic application of statistics, and there are standard techniques for a variety of situations. These can often be used for musicological questions, although sometimes it is less straightforward: it can be difficult to state hypotheses about music history in precise or easily measurable terms, or it might be difficult or impossible to find suitable data.
One case where standard tests were possible was my analysis of whether well-known pieces of piano music tend to be in ‘sharper’ keys than lesser-known works. The question could be clearly defined, and piano works could be easily sampled from the Dictionary of Musical Themes[1] (which is biased towards well-known tunes) and from IMSLP (which has a high proportion of obscure and ‘domestic’ pieces). Averaging the number of sharps or flats in the key signatures, and using a standard statistical ‘t-test’, revealed that there is indeed a statistically significant difference of about 1 to 1½ sharps between works from those two sources.[2]
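As a rough illustration of this kind of comparison, here is a minimal Python sketch of a two-sample t-test on key signatures. It uses Welch's variant (which does not assume equal variances); the original analysis may well have used a different variant. The two lists of key signatures are invented for illustration and are not the data from the analysis described above.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# HYPOTHETICAL data: key signatures coded as number of sharps,
# with flats counted as negative sharps.
well_known = [2, 1, 0, 3, -1, 4, 1, 2, 0, 1]
lesser_known = [-1, 0, -2, 1, -3, 0, -1, 2, -2, 0]

t_stat, dof = welch_t(well_known, lesser_known)
mean_diff = statistics.mean(well_known) - statistics.mean(lesser_known)
print(f"mean difference = {mean_diff:.2f} sharps, t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the t statistic into a p-value needs the t distribution, which is not in the Python standard library. In practice a statistics package would do the whole calculation, for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`.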
A more difficult case was testing Carl Dahlhaus’s claim that there was a “dead era” for symphonies between Schumann’s last symphony in 1851 and Brahms’s first in 1876.[3] One problem is that it is not entirely clear what ‘dead era’ means. A second difficulty is that there is no obvious single source listing all symphonies, although it was possible (but time-consuming) to use data gathered from multiple sources.[4]
In this case a visual check of the data was more appropriate than trying to apply a standard statistical test.
This chart shows that, if there was a ‘dead era’ in symphonic composition, it was not in the period that Dahlhaus claims, but rather during the first half of the century, in the wake of the decline of the almost mass-produced ‘classical’ symphonies of Haydn, Mozart and their contemporaries. From mid-century onwards, the rate of composition of symphonies increased reasonably steadily. This doesn’t, of course, tell us anything about the quality of the works, which might be what Dahlhaus was referring to.
Quantification is where you want an answer to a ‘how much?’ or ‘how many?’ sort of question. How many British composers published music in the German market in the nineteenth century? How many works are listed by Pazdirek in his ‘Universal Handbook’?[5] What proportion of nineteenth-century operatic arias are in triple time?
Sometimes it is straightforward to come up with estimates simply by collecting data and counting it. At other times, though, it can be more difficult. Counting all of the works or composers mentioned in the nineteen 600-page small-print volumes of Pazdirek’s Handbook is impractical, but it is possible to make an estimate by taking a random sample of pages, calculating the average number of composers and works per page, and then multiplying by the total number of pages. This needs to be done carefully in order to control for sampling bias, and will only give an answer within a likely range (depending on how many pages are sampled).[6]
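The page-sampling estimate can be sketched in a few lines of Python. This is only an outline under stated assumptions: the page total is taken as nineteen volumes of 600 pages, the per-page counts are simulated stand-ins for counts made by hand, and the interval uses a simple normal approximation that ignores refinements such as a finite-population correction.

```python
import math
import random
import statistics


def estimate_total(sample_counts, total_pages, z=1.96):
    """Scale a per-page sample mean up to a total, with an approximate 95% margin."""
    n = len(sample_counts)
    mean = statistics.mean(sample_counts)
    std_err = statistics.stdev(sample_counts) / math.sqrt(n)
    estimate = mean * total_pages
    margin = z * std_err * total_pages
    return estimate, margin


TOTAL_PAGES = 19 * 600  # assumption: nineteen volumes of about 600 pages each

rng = random.Random(1)
# Step 1: pick 100 distinct page numbers at random, to avoid selection bias.
sampled_pages = rng.sample(range(1, TOTAL_PAGES + 1), 100)
# Step 2: count the works on each sampled page. These counts are SIMULATED
# placeholders; real counts would come from reading the pages.
works_per_page = [rng.randint(40, 90) for _ in sampled_pages]

estimate, margin = estimate_total(works_per_page, TOTAL_PAGES)
print(f"about {estimate:,.0f} works, +/- {margin:,.0f} ({margin / estimate:.0%})")
```

Because the standard error shrinks with the square root of the sample size, halving the margin requires roughly four times as many sampled pages.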
Musical datasets often have very long-tailed distributions. There are a small number of very prolific ‘premier league’ composers, for example, with a rather larger number of ‘second division’ ones, and as the level of obscurity increases, so the numbers rise rapidly.[7] In Pazdirek’s Handbook, over a third of composers had only a single work in print.[8] Working with this sort of distribution presents some methodological challenges, and results tend to have high margins of uncertainty.
Estimating the size of musical populations (of works, composers, etc.) can also draw on approaches from other fields. ‘Capture-Recapture’ techniques, for example, are used to estimate populations of animals based on how frequently the same individuals are caught in traps. Versions of this idea could be applied (in some circumstances) to estimate the total population of works or composers depending on how often they are ‘captured’ by appearing in different datasets.[9]
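The simplest version of this idea is the two-sample Lincoln-Petersen estimator: if one dataset ‘captures’ n1 individuals, a second captures n2, and m appear in both, the population is estimated as n1 × n2 / m. The sketch below also shows Chapman's small-sample adjustment. The two catalogues are made-up sets of composer identifiers, and the calculation assumes the datasets are independent, which real musical sources often are not.

```python
def lincoln_petersen(first, second):
    """Estimate population size from two 'captures': n1 * n2 / overlap."""
    first, second = set(first), set(second)
    overlap = len(first & second)
    if overlap == 0:
        raise ValueError("no overlap between datasets; estimate is undefined")
    return len(first) * len(second) / overlap


def chapman(first, second):
    """Chapman's variant, less biased for small samples and defined at zero overlap."""
    first, second = set(first), set(second)
    overlap = len(first & second)
    return (len(first) + 1) * (len(second) + 1) / (overlap + 1) - 1


# HYPOTHETICAL catalogues: 60 and 80 composers, 20 of whom appear in both.
catalogue_a = {f"composer_{i}" for i in range(0, 60)}
catalogue_b = {f"composer_{i}" for i in range(40, 120)}

print(f"Lincoln-Petersen estimate: {lincoln_petersen(catalogue_a, catalogue_b):.0f}")
print(f"Chapman estimate: {chapman(catalogue_a, catalogue_b):.0f}")
```

The smaller the overlap relative to the sizes of the two datasets, the larger the estimated population, and the more sensitive the estimate becomes to a handful of matches.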
Modelling applies when we are trying to find out how musicological processes work. What are the important factors? What characteristics seem to be linked, or not? Are there ‘clusters’ that behave differently? The purpose of such an analysis might be to enable the creation and calibration of a simplified but realistic model that can be used to test other theories.
Questions might include this sort of thing…
- what factors influence the republication rates of sheet music?
- how do works or composers achieve ‘canonic’ status or slide into obscurity?
- which regions or cities have been net importers or exporters of composers, and why?
Such questions are usually difficult to answer, and typically require careful analysis of multiple datasets, a certain amount of trial and error, and checking against qualitative research and historical context. Breaking the question into smaller pieces can often help.
An examination of republication rates, for example, might usefully start by trying to identify clusters of works that share similar publication patterns. If such clusters are found, you can then look for common features of the works falling into each cluster. And you can compare the typical behaviour of a small number of clusters.
When I did this sort of analysis on piano music composed around the year 1837, I found three clusters – a group of works P1 (the majority) that were published once and fell out of fashion fairly quickly; a group P3 (a small minority) that quickly became established and have since been republished regularly; and a middle group P2 that was republished at a declining rate, reducing to zero over a period of about a century.
Sometimes there is no specific question other than ‘is there anything interesting to be found here?’ You might have come across a new dataset, or be interested in a particular musicological topic (women composers, nineteenth-century symphonies, early jazz recordings), and simply want to explore it to see whether you can find any interesting trends or patterns. This is often a valuable first step in getting to grips with a new topic or dataset.
Data exploration can be a productive and useful activity, with a good chance of revealing something novel and interesting. There are lots of numerical techniques that can be useful – descriptive statistics, cross-tabulations, correlations, clustering, regression – and various forms of visualisation (graphs, charts, diagrams, maps, etc) can be particularly helpful.
When you are not looking for anything in particular, there is a good chance that you will stumble across something that looks interesting. Humans are good at spotting patterns – perhaps too good, as we can often see completely spurious patterns in random data. This can be a particular problem with visualisations. Pictures of data are great for revealing patterns, but they do not usually tell us much about how significant the patterns are (in statistical terms). This typically requires doing some calculations.
Any unexpected patterns should if possible be verified or tested with independent data. It is no good using the same data both to make discoveries and then to test their significance. Verification should ideally be done with a different dataset. If, as is often the case, alternative data is not available, you could test with another random sample from the same dataset. If this is also impractical, there are circumstances where techniques such as ‘bootstrapping’ can be used with the same sample, to give an indication of the robustness of a result. There are also softer verification methods, such as referring to similar analyses in other fields, or to qualitative research. It is important to attempt some sort of verification before shouting too loudly about your exciting new discovery!
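For readers unfamiliar with bootstrapping, the basic idea is to resample the existing sample many times (with replacement) and see how much the statistic of interest varies. Here is a minimal Python sketch of a percentile bootstrap interval for a mean. The data are invented counts of works in print per composer, and the function name and defaults are illustrative choices rather than a standard API.

```python
import random
import statistics


def bootstrap_interval(data, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of a single sample."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    tail = (1 - level) / 2
    lower = estimates[round(tail * n_resamples)]
    upper = estimates[round((1 - tail) * n_resamples) - 1]
    return lower, upper


# HYPOTHETICAL long-tailed sample: works in print for 20 composers.
works_in_print = [1, 1, 1, 2, 1, 3, 1, 1, 5, 1, 2, 1, 8, 1, 1, 2, 1, 1, 14, 3]

low, high = bootstrap_interval(works_in_print)
print(f"sample mean = {statistics.mean(works_in_print):.2f}, "
      f"95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

A wide interval is a warning that the result depends heavily on a few observations, which is common with long-tailed data. Bootstrapping does not replace verification against genuinely independent data; it only indicates how stable the result is within the sample you already have.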
1. Barlow, H. & Morgenstern, S. 1948. A Dictionary of Musical Themes. New York: Crown.
2. For these purposes, a ‘flat’ counts as a negative ‘sharp’, so lesser-known works are ‘flatter’ than well-known ones. Further details of this analysis are in my thesis.
3. Dahlhaus, Carl. 1989. Nineteenth-Century Music, trans. J. B. Robinson. Berkeley: University of California Press, p. 78.
4. This will be discussed in a future post.
5. See reference on the Printed Music Datasets page.
6. My estimates were that Pazdirek lists roughly 730,000 works (±8%, with 95% confidence) by around 90,000 composers (±20%), based on a sample of 100 pages.
7. The particular distribution that is often found in these cases is known in various guises as the ‘Zipf’, ‘Pareto’ or ‘Power Law’ distribution.
8. See the chart in this article.
9. In practice this is often difficult since few datasets are independent – they are frequently drawn from similar sources and from each other.