I recently stumbled across this page on Wikipedia, listing music students and their teachers. This is an ideal dataset to explore as a network diagram, or “graph”, in which a set of points (or “nodes”) are connected by lines (or “edges”). Here, the nodes are individuals, and there is an edge between them if one taught the other.
R has several powerful packages that make it easy to analyse and visualise graphs. The main workhorse is igraph, and two useful packages built on it are ggraph, for drawing graphs, and tidygraph, for simplifying the analysis using regular tabular (tidy) data.[1]
Downloading the data was straightforward, as these Wikipedia pages are formatted consistently (with one or two minor exceptions), do not require deduplication, and link to further data such as dates of birth and nationality. After gathering the data and tidying it, I had two tables: a “node list” of individuals with their birth years and nationalities, and an “edge list” consisting of two columns – “teacher” and “pupil” – with one entry for each teacher-pupil pair. In total there were 6,383 names on the node list, and 8,717 connections on the edge list.[2]
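As a rough illustration of the two tables described above, here is a minimal sketch in plain Python (not the R code used for this post). The rows are hypothetical placeholders rather than real entries from the dataset, and the variable names are my own.

```python
from collections import defaultdict

# Hypothetical rows standing in for the real tables (placeholder names).
nodes = [
    {"name": "A", "born": 1700, "nationality": "Italian"},
    {"name": "B", "born": 1730, "nationality": "German"},
    {"name": "C", "born": 1760, "nationality": "German"},
    {"name": "D", "born": 1765, "nationality": "French"},
]
edges = [("A", "B"), ("B", "C"), ("B", "D")]  # (teacher, pupil) pairs

# Directed adjacency in both directions: teacher -> pupils, pupil -> teachers.
pupils_of = defaultdict(set)
teachers_of = defaultdict(set)
for teacher, pupil in edges:
    pupils_of[teacher].add(pupil)
    teachers_of[pupil].add(teacher)

# Node attributes keyed by name, e.g. for colouring nodes by nationality.
attrs = {row["name"]: row for row in nodes}

# Number of pupils per person (the out-degree of each node).
n_pupils = {name: len(pupils_of[name]) for name in attrs}
print(n_pupils)  # {'A': 1, 'B': 2, 'C': 0, 'D': 0}
```

Keeping the attributes in the node list and the relationships in the edge list means that either table can be filtered or extended without touching the other.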
Let’s start by drawing the entire network…
The nodes here are blue points (with 40% opacity, to allow for some overlap to be discerned), and the edges are black lines (10% opacity).
There are a few problems here. Firstly, there are many small disconnected groups of nodes scattered around the periphery. Whilst these are potentially useful for some topics, I am most interested in the large connected network in the centre of the diagram, so it would be sensible to ignore these disconnected groups.
Secondly, the nodes are linked by straight lines, and it is impossible to tell which end is the pupil and which is the teacher. In a “directed graph” such as this (where the edges have a particular direction to them), this can be addressed with arrows, or by using curved edges, where the direction can be inferred if an edge always veers, say, to the right on its path from teacher to pupil.
Thirdly, we are not making the most of the information available. For example, we could colour the nodes (by nationality, perhaps), or the edges (by date). We could vary the size of nodes, perhaps based on how many students a teacher has. It is also possible to vary the shapes of nodes or the thickness of edges, or to add text labels, etc.
Fourthly, there is the question of layout. The graph above was generated by telling igraph to lay it out nicely, which leaves the software to choose something suitable. It selected a layout algorithm called the “Distributed Recursive Layout” or DRL.[3] There are many different ways of laying out graphs – at least a dozen in igraph that make sense with this data – including arranging the nodes along a line or circle, or various techniques based on forces along the edges or between the nodes, which tend to attract closely connected nodes together to reveal structures within the network. Choosing a suitable layout is something of a dark art that can significantly affect the appearance of a graph and the story that it tells.
Addressing the above four points results in something like the following…
To reduce the size and remove the disconnected outliers, I have only considered students who were also teachers, and only kept nodes in the largest connected component. The edges now curve to the right on their way from teacher to pupil, and are coloured from red (early) to blue (late) according to the year of birth of the student (although I admit this is rather hard to see). The nodes are coloured according to nationality, and their size reflects the number of students. The layout is an algorithm called “Fruchterman-Reingold”.[4]
The same graph, using a “linear” layout, with the nodes sorted by nationality, looks very different, with the emphasis more on the patterns of the edges, rather than the arrangement of the nodes…
Of course, all of these charts should have legends to show how colours correspond to nationalities, node sizes represent the number of pupils, and edge colours indicate dates. I have removed the legends in order to focus on the graph itself.
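The filtering step used for these charts (dropping students who never taught, then keeping only the largest connected component) can be sketched as follows. This is a toy version in plain Python with placeholder names, not the code behind the charts. It assumes that “connected” ignores edge direction, and that “students who were also teachers” means dropping edges whose pupil never taught anyone.

```python
from collections import defaultdict, deque

# Placeholder (teacher, pupil) pairs forming two separate groups.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("X", "Y"), ("Y", "Z")]

# Keep an edge only if its pupil also appears somewhere as a teacher.
teachers = {t for t, _ in edges}
kept = [(t, p) for t, p in edges if p in teachers]

# Undirected adjacency, since connectivity here ignores edge direction.
neighbours = defaultdict(set)
for t, p in kept:
    neighbours[t].add(p)
    neighbours[p].add(t)

def components(neighbours):
    """Return the connected components as a list of node sets (BFS)."""
    seen, result = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        group, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    group.add(other)
                    queue.append(other)
        result.append(group)
    return result

largest = max(components(neighbours), key=len)
core_edges = [(t, p) for t, p in kept if t in largest]
print(sorted(largest), core_edges)
```

In this toy data the X, Y group survives the first filter but is discarded by the second, because it is smaller than the A, B, C group.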
These charts are pretty, but we’ve not yet discovered very much. To get further, it is helpful to delve into some of the analysis that packages such as igraph can do. We can, for example, find the “diameter” of the graph, which is the length of the longest shortest path between two nodes – i.e. look at all pairs of nodes, find the shortest path (if any) between each pair (respecting the direction of edges), and take the longest such path. For the graph shown above, the diameter is 15, meaning that there is a chain of 16 individuals, each of whom taught the next (there are thus 15 edges connecting them). One such series is… Andrea Gabrieli → Giovanni Gabrieli → Heinrich Schutz → Johann Schelle → Johann Friedrich Fasch → Carl Friedrich Christian Fasch → Carl Friedrich Zelter → Felix Mendelssohn → Camille-Marie Stamaty → Camille Saint-Saëns → Gabriel Fauré → Nadia Boulanger → Marion Bauer → Milton Babbitt → Lejaren Hiller → David Rosenboom. Andrea Gabrieli (1532-1585) and David Rosenboom (b.1947) are the end points of the diameter, and this chain is a shortest path between them.
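To make the “longest shortest path” idea concrete, here is a minimal sketch in plain Python on a toy directed graph (placeholder node names, not the real data). It runs a breadth-first search from every node and keeps the longest of the shortest paths found, ignoring pairs with no path between them. It is an illustration of the definition rather than the igraph implementation.

```python
from collections import deque

def bfs_paths(graph, source):
    """Return {node: one shortest path from source}, following edge direction."""
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in paths:          # first visit is via a shortest path
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    return paths

def diameter_path(graph):
    """Return one longest shortest path; unreachable pairs are ignored."""
    nodes = set(graph) | {p for pupils in graph.values() for p in pupils}
    best = []
    for source in sorted(nodes):
        for path in bfs_paths(graph, source).values():
            if len(path) > len(best):
                best = path
    return best

# Toy teacher -> pupils graph: D can be reached from A via either B or C.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
path = diameter_path(graph)
print(len(path) - 1, path)  # 3 ['A', 'B', 'D', 'E']
```

The diameter is the number of edges, so a path through four nodes gives a diameter of three, just as the 16-person chain above gives a diameter of 15.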
The diameter is a characteristic of the graph as a whole, but we can also look at the features of individual nodes. One example is the “betweenness centrality”, a measure of the number of shortest paths that pass through that node.[5] It is straightforward to ask igraph to calculate the betweenness score for all nodes. The one with the highest score turns out to be Nadia Boulanger (1887-1979). She was the product of a long line of influential teachers, and had a large number of students over her long life, several of whom also became prolific teachers.[6]
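For readers curious about what a betweenness calculation involves, the sketch below uses Brandes' algorithm, a standard method for unweighted graphs, written in plain Python on a toy directed graph. I am not claiming this is what igraph does internally. The node names are placeholders, and the scores are unnormalised, matching the definition as a sum over ordered pairs of the proportion of shortest paths through each node.

```python
from collections import deque

def betweenness(graph):
    """Unnormalised betweenness for a directed, unweighted graph (Brandes)."""
    nodes = sorted(set(graph) | {w for ws in graph.values() for w in ws})
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Forward pass: BFS from s, counting shortest paths (sigma)
        # and recording each node's predecessors on those paths.
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies from the farthest nodes inwards.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
scores = betweenness(graph)
print(scores)  # {'A': 0.0, 'B': 1.0, 'C': 1.0, 'D': 3.0, 'E': 0.0}
```

Node D scores highest because every path into E has to pass through it, whereas B and C each carry only half of the two equally short routes out of A.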
Here is a graph of Nadia Boulanger’s chain of teachers (starting with Andrea Gabrieli and Orlande de Lassus on the far left) and students (ending with John Luther Adams and Larry Polanski on the bottom right).[7] Nodes are coloured by nationality, and the layout is Fruchterman-Reingold…
Boulanger herself is the large blue node in the centre of the main cluster. She had four immediate teachers (Fauré, Gedalge, Vierne and Widor), but her extended teacher network has a total of 78 members, many of whom were French (in blue) or (further back) Italian (purple). On the student side, Boulanger had 45 students of her own, with her extended student network including an impressive 143 names. Most of these were American (dark pink).
If we restrict the graph to people born before 1900, Carl Czerny (1791-1857) has the highest betweenness score. Interestingly, no teacher-student path exists from Czerny to Boulanger, so there was no continuous link between the most influential teacher in the first half of the nineteenth century and that of a century later. The “last common teacher” of Czerny and Boulanger (i.e. the last person to appear on both of their extended teacher networks) was Antonio Salieri (1750-1825). Aaron Copland (1900-1990) and George Antheil (1900-1959) were the “first common students” where the Czerny and Boulanger branches rejoined.
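The extended teacher and student networks used here are simply the sets of nodes reachable backwards or forwards along the edges. Here is a minimal sketch in plain Python with placeholder names (X and Y stand in for two teachers with no path between them). One assumption to flag: I take a “last” common teacher to be one none of whose own pupils is also a common teacher, and a “first” common student analogously. That is my reading of the terms rather than a standard definition.

```python
from collections import deque

def reachable(adjacency, start):
    """All nodes reachable from start by following adjacency (start excluded)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Placeholder (teacher, pupil) pairs: two branches that split at R and rejoin at S.
edges = [("G", "R"), ("R", "P"), ("P", "X"), ("R", "Q"), ("Q", "Y"),
         ("X", "S"), ("Y", "S"), ("S", "T")]
pupils_of, teachers_of = {}, {}
for t, p in edges:
    pupils_of.setdefault(t, []).append(p)
    teachers_of.setdefault(p, []).append(t)

x_teachers = reachable(teachers_of, "X")   # extended teacher network of X
y_teachers = reachable(teachers_of, "Y")
x_students = reachable(pupils_of, "X")     # extended student network of X
y_students = reachable(pupils_of, "Y")

path_exists = "Y" in x_students            # is there a teacher-student path X -> Y?
common_teachers = x_teachers & y_teachers
common_students = x_students & y_students

# "Last" common teachers: no pupil of theirs is also a common teacher.
last_common = {t for t in common_teachers
               if not set(pupils_of.get(t, ())) & common_teachers}
# "First" common students: no teacher of theirs is also a common student.
first_common = {s for s in common_students
                if not set(teachers_of.get(s, ())) & common_students}
print(path_exists, last_common, first_common)  # False {'R'} {'S'}
```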
Graphs can also be used to find clusters – groups of nodes that are strongly linked to each other, but are less well connected to other clusters. There are several different approaches, but here is the result of applying “Louvain” clustering, reflected in the colour of the nodes (and using the DRL layout)…
The Louvain algorithm here produces 22 coloured clusters which, as expected, are grouped close together in the above chart. The largest cluster (in pink, near the centre of the graph) has 124 members, including Czerny, Liszt, and Beethoven. Boulanger is in the sixth-largest cluster (a brownish shade, mainly in the dense arc to the top right of the graph), with 70 members including Gedalge, Sibelius, Honegger and several 20th century American composers.
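The Louvain method works by greedily moving nodes between groups to increase a quantity called “modularity”, which compares the number of edges inside each cluster with the number expected if edges were placed at random. The sketch below, in plain Python, does not implement Louvain itself. It only computes the modularity of a given partition, on a toy graph of two triangles joined by a single edge. It treats edges as undirected, which is an assumption about how a directed teacher-pupil graph would be prepared for clustering.

```python
from collections import Counter

def modularity(edges, community):
    """Modularity of an undirected, unweighted graph for a given partition.

    Q = sum over communities c of  L_c / m - (d_c / 2m)^2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = Counter()   # edges with both ends in the same community
    degree = Counter()     # summed degree of each community
    for a, b in edges:
        degree[community[a]] += 1
        degree[community[b]] += 1
        if community[a] == community[b]:
            internal[community[a]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
lumped = {node: "a" for node in split}

print(round(modularity(edges, split), 3))   # 0.357
print(round(modularity(edges, lumped), 3))  # 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of improvement a modularity-based method searches for.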
It is interesting to investigate the extent to which these clusters (determined post hoc from the network of teacher-student connections) correspond to nationalities, which we might expect to be an important factor influencing the development of the network. This data can also be visualised as a graph…
This is a “bipartite” graph, where there are two types of node (nationalities on the left and clusters on the right) which only ever connect to a node of the other type. The size of the nodes represents the number of members of each nationality or cluster.[8] The edges are coloured by nationality, with the width proportional to the number of teachers with that combination of nationality and cluster. The bipartite layout algorithm used here attempts to minimise the number of edge crossings.
We can see that there is some evidence of clusters reflecting nationalities. Most nationalities seem to be mainly associated with a single cluster, with links to a few other less significant ones, although all clusters have members from several nationalities. The Danes (at the top), for example, are mostly in the largest cluster “A”, with a few in “R”, but none elsewhere. Germans contribute to several clusters, but the bulk of them are in “N” and “A” with a few in “B”, “Q” and elsewhere. Nadia Boulanger’s cluster “F”, as previously observed, consists largely of Americans. Most of her French compatriots are actually in cluster “C”.[9]
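The edges and node sizes of a nationality-to-cluster graph like this boil down to simple counts. As a minimal sketch in plain Python, with hypothetical rows rather than the real cluster assignments:

```python
from collections import Counter

# Hypothetical (person, nationality, cluster) rows; not the real assignments.
people = [
    ("p1", "Danish", "A"), ("p2", "Danish", "A"), ("p3", "Danish", "R"),
    ("p4", "German", "N"), ("p5", "German", "A"), ("p6", "German", "N"),
]

# Edge weight: number of people with each nationality/cluster combination.
weights = Counter((nat, cl) for _, nat, cl in people)

# Node size: number of members of each nationality and of each cluster.
nat_size = Counter(nat for _, nat, _ in people)
cluster_size = Counter(cl for _, _, cl in people)

# Dominant cluster for each nationality (first one found wins any tie).
top = {}
for (nat, cl), w in weights.items():
    if w > top.get(nat, ("", 0))[1]:
        top[nat] = (cl, w)

print(top)  # {'Danish': ('A', 2), 'German': ('N', 2)}
```

Dividing each dominant count by the nationality's total (here 2 of 3 in both cases) gives a rough measure of how strongly a nationality is concentrated in one cluster.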
The graph of links between nationalities and clusters is an example of a graph derived from our original teacher-student network. There is another type of derived graph that is worth mentioning here. This is the “shared-teacher” graph, where two nodes are linked if they had at least one teacher in common. Whereas the teacher-student graph shows the progression of musical influence through time, the shared-teacher graph represents the connections between contemporaries, specifically in how they were influenced by the same teachers.
The shared-teacher graph is shown below. Nodes are linked if they had a common teacher. Node colour indicates nationality, and node size is proportional to the number of nodes they are linked to. Edges are coloured by date (red = early, blue = late), with the line thickness reflecting the number of teachers in common (a few pairs of nodes have four teachers in common). The layout algorithm is Fruchterman-Reingold.
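Deriving the shared-teacher graph from the edge list amounts to linking every pair of pupils who appear under the same teacher, and counting how often each pair occurs. Here is a minimal sketch in plain Python with placeholder names (not the code used to produce the chart):

```python
from collections import Counter, defaultdict
from itertools import combinations

# Placeholder (teacher, pupil) pairs.
edges = [("T1", "a"), ("T1", "b"), ("T1", "c"),
         ("T2", "a"), ("T2", "b"), ("T3", "d")]

pupils_of = defaultdict(set)
for teacher, pupil in edges:
    pupils_of[teacher].add(pupil)

# Undirected edge weight: number of teachers a pair of pupils has in common.
shared = Counter()
for pupils in pupils_of.values():
    for pair in combinations(sorted(pupils), 2):
        shared[pair] += 1

# Node size: number of other nodes each pupil is linked to.
links = Counter()
for a, b in shared:
    links[a] += 1
    links[b] += 1

print(dict(shared))  # {('a', 'b'): 2, ('a', 'c'): 1, ('b', 'c'): 1}
```

Pupil d has no links at all, because the only teacher listed for d taught nobody else. Swapping the roles of teacher and pupil in the same code would link teachers who taught the same student instead.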
As we might expect, there are many small disconnected groups around the edge of the graph, where a few composers shared a teacher, but the chain does not extend very far.[10] What is surprising, however, is that most nodes belong to a single large connected component that spans multiple nationalities and several centuries. This component has 730 members, ranging in date from Ercole Bernabei (1622-1687) up to the present day, and including almost all of the big names. Here, for example, is a shortest path from Bernabei to Magnus Lindberg (b.1958), in which each adjacent pair of composers had at least one teacher in common… Ercole Bernabei – Gaetano Carpani – Johann Adolph Hasse – Venanzio Rauzzini – Ignaz Moscheles – Franz Liszt – Cesar Franck – Albert Lavignac – Vincent d’Indy – Andre Pirro – Olivier Messiaen – Yvonne Loriod – Colin McPhee – Michael Tenzer – Magnus Lindberg.
We can also examine the “shared-student” graph, derived in much the same way, with edges representing the number of students taught by both teachers. The shared-student graph looks much like the shared-teacher graph, with many small disconnected groups and one large connected central component comprising 591 teachers spanning many years and nationalities, and including most of the big names.
This article has aimed to illustrate some issues related to the use of network graphs in the study of music history. They can certainly be a useful way of presenting and analysing certain data, and can reveal patterns that might be impossible to uncover using other methods. However, graphs are often difficult to interpret, and many of the algorithms used in their analysis and presentation are complex and can be sensitive to small differences in the data or parameters. As an investigative tool, graphs can be very enlightening, but any analysis really needs to be supported with closer investigation before firm conclusions can be drawn.
1. I used ggraph to draw a network graph in this previous post on pairs of words in song lyrics.
2. The data was gathered in April 2020. There are of course plenty of caveats about the dataset. It is dependent on the information that happens to be available, and that somebody has chosen to add to Wikipedia. It seems to be biased towards US composers and the twentieth century (especially the last fifty years or so). Any conclusions from this analysis should take account of these potential biases.
3. Martin, S., Brown, W.M., Klavans, R. and Boyack, K.W. (2008). DrL: Distributed Recursive (Graph) Layout. SAND Reports, 2936:1-10.
4. Fruchterman, T.M.J. and Reingold, E.M. (1991). Graph Drawing by Force-directed Placement. Software – Practice and Experience, 21(11):1129-1164.
5. The betweenness centrality of node A is defined as the sum, over all pairs of other nodes, of the proportion of shortest paths that pass through A.
6. Other names that scored highly on betweenness were Carl Czerny, Gabriel Fauré, Carl Friedrich Zelter, Camille Saint-Saëns, Carl Friedrich Christian Fasch, Camille Marie Stamaty, Felix Mendelssohn, Ludwig van Beethoven, Franz Liszt and Roger Sessions.
7. Interestingly, David Rosenboom is one step closer to Boulanger than Adams and Polanski, so there must be a shorter path from them to Gabrieli which bypasses Boulanger, otherwise they would be on the diameter.
8. I have only included nationalities with at least ten members. The nationalities are as described in the Wikipedia articles, so I have not attempted to merge English and British, or Czech and Bohemian.
9. We could improve this analysis by also considering the effect of time, as the relationships represented in clusters will tend to be close chronologically, irrespective of any geographical effects.
10. The largest of these disconnected groups has just ten members.