The New York Philharmonic has an excellent online archive of all of its concerts since 1842. This article uses the archive to investigate which composers tend to be programmed together.
It is straightforward to download the concert archive as a single XML file, and to extract a table of dates, programme IDs, composer names and work titles.[1] The table contains almost 103,000 entries, listing over 21,400 concerts (i.e. date/ID combinations), including over 11,300 different works by almost 2,800 composers.[2]
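The extraction step can be sketched as follows. Note that the tag and attribute names used here (`<program>`, `<work>`, `composer`, and so on) are illustrative assumptions, not the archive's actual XML schema, which may well differ.

```python
# A sketch of flattening the concert archive XML into a table of rows.
# The element and attribute names below are assumptions for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """\
<programs>
  <program id="P1">
    <date>1842-12-07</date>
    <work composer="Beethoven, Ludwig van">Symphony No. 5 in C minor, Op. 67</work>
    <work composer="Weber, Carl Maria von">Oberon: Overture</work>
  </program>
</programs>"""

def extract_rows(xml_text):
    """Flatten nested XML into (date, programme_id, composer, title) rows."""
    rows = []
    for program in ET.fromstring(xml_text).iter("program"):
        date = program.findtext("date")
        for work in program.iter("work"):
            rows.append((date, program.get("id"), work.get("composer"), work.text))
    return rows
```

Each work becomes one row, so a concert with several works appears several times — which is what gives the ~103,000 entries from ~21,400 concerts.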
On average there were 3.5 composers represented at each concert. Just under 3,000 (about 14%) of the concerts were dedicated to a single composer. Half of these were accounted for by six names – Beethoven (16%), Wagner (8%), Mahler (8%), Tchaikovsky (7%), Brahms (6%) and Mozart (6%) – although there were another 120 or so composers with at least one single-composer concert. For the rest of this analysis of composer combinations I will ignore these single-composer concerts.
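The single-composer filter can be sketched on toy rows. The tuples mirror the extracted table's columns; the data here is invented purely for illustration.

```python
# Identify single-composer concerts so they can be excluded from the
# pairing analysis. A "concert" is a (date, programme_id) combination.
from collections import defaultdict

rows = [  # (date, programme_id, composer, title) -- invented examples
    ("1842-12-07", "P1", "Beethoven, Ludwig van", "Symphony No. 5"),
    ("1842-12-07", "P1", "Weber, Carl Maria von", "Oberon: Overture"),
    ("1843-01-10", "P2", "Beethoven, Ludwig van", "Symphony No. 3"),
]

composers_by_concert = defaultdict(set)
for date, prog_id, composer, title in rows:
    composers_by_concert[(date, prog_id)].add(composer)

single_composer = {concert for concert, names in composers_by_concert.items()
                   if len(names) == 1}
# Concerts like P2 above would be dropped before analysing combinations.
```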
Amazingly, more than one in four non-single-composer concerts included a work by Beethoven! Bizet appeared about one tenth as many times as Beethoven, Rameau a tenth as frequently as Bizet, and Amy Beach only a tenth as often as Rameau. With such a large range, how do we test whether Beethoven’s 77 appearances on the same programme as Bizet are more or less significant than, say, Rameau’s ten appearances alongside Ravel?
One approach is to compare the actual number of concerts including both composers with the expected number if they appeared independently of each other. For example, Beethoven appeared in 26.38% of non-single-composer concerts and Bizet in 2.58%. There were 18,463 concerts in total, so we would expect 0.2638 * 0.0258 * 18463 = 125.7, say 126 of them, to include both Beethoven and Bizet. The fact that there were just 77 implies that this combination is less likely than we would expect.
How can we compare different combinations of composers? One way would be to look at the ratio Actual / Expected. For Beethoven and Bizet this would be 77/126, or about 0.6. If this ratio is less than one, then the combination is less likely than expected: a ratio greater than one means it is more likely.
The combination of de Falla and Musorgsky also has a ratio of about 0.6: six concerts included both composers, compared to an expected number of ten. But we cannot really compare these ratios: if you expect something to happen 126 times and it only occurs 77 times, that is statistically MUCH more significant than getting six when you expect ten.[3]
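The independence calculation can be sketched directly. The shares below are the rounded figures quoted above, so the result differs slightly from what unrounded data would give.

```python
# Expected number of concerts featuring both composers, assuming they
# appear independently of each other.
n_concerts = 18463       # non-single-composer concerts
p_beethoven = 0.2638     # share of concerts including Beethoven
p_bizet = 0.0258         # share of concerts including Bizet
actual = 77              # concerts that actually featured both

expected = p_beethoven * p_bizet * n_concerts   # about 126
ratio = actual / expected                       # about 0.6, i.e. less than expected
```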
So the ratio Actual / Expected is too simple to properly capture what is going on here. A better measure is Actual * log(Actual / Expected), using the natural logarithm. This is less than zero if the actual number of concerts is less than expected, and also takes account of the statistical significance associated with the size of the numbers involved. On this measure, Beethoven & Bizet score -37.8, whereas de Falla & Musorgsky score a much less impressive -3.1.[4]
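This measure is a one-line function. The natural logarithm is assumed here, consistent with the G-test mentioned in the footnotes; the small difference from the -37.8 quoted above comes from using the rounded percentage shares.

```python
# Sketch of the Actual * log(Actual / Expected) score: negative when a
# pairing occurs less often than independence predicts, and scaled by the
# size of the counts involved.
from math import log

def pair_score(actual, expected):
    """Signed measure of how surprising the observed pairing count is."""
    return actual * log(actual / expected)

beethoven_bizet = pair_score(77, 0.2638 * 0.0258 * 18463)   # about -37.7
falla_musorgsky = pair_score(6, 10)                          # about -3.1
```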
Using this measure we can see which composer combinations produce the biggest positive or negative scores. The biggest negative scores (i.e. least likely to appear together) are for any two of Beethoven, Brahms, Mozart and Tchaikovsky, followed by Beethoven/Dvorak, Mozart/Wagner, Beethoven/Haydn and Beethoven/Ravel. The biggest positive scores are for combinations of the major American composers (Bernstein, Copland, Gershwin), for the Viennese waltz composers (Johann Strauss I & II, Lehár, etc), and for Puccini/Verdi and Liszt/Wagner. So the rules of concert programming seem to include a) don't put the really big names together, and b) national themes are good.
A more interesting perspective is to treat the Actual * log(Actual / Expected) score as a measure of the similarity between composers, and to use it to cluster them together. One algorithm for doing this is:
1. find the pair of composers with the highest similarity score
2. merge them into a group
3. recalculate the similarities between the other composers and the new group
4. repeat, treating each new group as a single composer, until all composers are accounted for
There are various ways of doing the recalculation in step 3. A common approach is to take the average of the similarity scores between the new group's members and each other composer. For the NY Phil data, clustering all 2,800 composers gives a rather complicated picture, but restricting it to composers appearing in 2% or more of concerts produces the following clustering:
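The merge loop described above can be sketched on a toy similarity table. Frozenset keys stand in for unordered composer pairs, and the scores here are invented; the real ones would be the Actual * log(Actual / Expected) values.

```python
# Minimal sketch of average-linkage agglomerative clustering, as described
# in the numbered steps above. Toy data only.
from itertools import combinations

def group_similarity(g1, g2, sim):
    """Average pairwise similarity between the members of two groups."""
    pairs = [(a, b) for a in g1 for b in g2]
    return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

def cluster(items, sim):
    """Repeatedly merge the most similar pair of groups; return the merges."""
    groups = [frozenset([i]) for i in items]
    merges = []
    while len(groups) > 1:
        g1, g2 = max(combinations(groups, 2),
                     key=lambda p: group_similarity(p[0], p[1], sim))
        merges.append((set(g1), set(g2), group_similarity(g1, g2, sim)))
        groups = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
    return merges

# Invented scores: A and B pair strongly, D avoids everyone.
sim = {frozenset(p): s for p, s in [
    (("A", "B"), 5), (("A", "C"), 1), (("B", "C"), 2),
    (("A", "D"), -3), (("B", "D"), -1), (("C", "D"), 0),
]}
```

Running `cluster(["A", "B", "C", "D"], sim)` merges A and B first (score 5), then attaches C to the {A, B} group (average score 1.5), and finally D — the merge order and heights are exactly what a dendrogram draws.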
This sort of chart is known as a dendrogram. The composers have been grouped as described in the algorithm above. The further to the left they are connected, the stronger the similarity (i.e. the more likely they are to be programmed together). As previously noted, the big orchestral names – Beethoven, Mozart, Brahms, Tchaikovsky, Dvorak, Ravel, and the combination of Haydn & Mahler – are in separate groups that only meet to the right of the blue zero-similarity line. The strongly connected groups largely follow geographical lines: Americans, Russians, French, Italians, Scandinavians, Austrians. Wagner/Liszt/Beethoven and Bach/Brahms are also common combinations. Handel, in the middle of the chart, does not link positively with any of the other groups on this list: examination of the data confirms that his strongest connections are with more obscure names such as Franz Gruber, Giovanni Gabrieli, Adolphe Adam and Arcangelo Corelli.
Much more could be done with this data, such as analysing trends over time, or using other features of the concert (location, month, number of performances) or of the composers (dates, nationalities). Or we could look at combinations of three composers rather than two. Or combinations of individual works rather than composers. Or compare the New York Phil with other sources, such as the BBC Proms Archive. Or look at chamber music or piano music. We could use different measures, and alternative ways of clustering. Or we could draw a network graph of links between programme-sharing composers, which would open up a whole new line of enquiry. But all that is for another day!
1. The programme ID identifies a single concert programme, which might have been performed more than once. In this analysis I count each performance as a separate concert.
2. The archive is updated from time to time. This analysis was based on the data as at 8th February 2019. The last concert included was on 16th December 2017.
3. Consider tossing a coin three times and getting two heads, and compare that with tossing it 300 times and getting 200 heads. In the latter case you would certainly suspect that the coin was biased!
4. Statistically, the Actual * log(Actual / Expected) measure forms the basis of the G-test, which is closely related to the chi-squared test.