Women vs John

As part of my research into women composers, I have been playing around with first names – partly as a way of identifying genders among general lists of composers. The most common first name for female composers overall is Mary / Marie / Maria, followed by Anne, Florence, Alice, Dorothy, Elizabeth, Louise and Margaret.1

I thought it would be interesting to compare this with male composers, whose most popular first name is John / Johann / Johannes / Jean / Giovanni. Which are there more of: women or Johns?

Using the Harant dataset of almost 48,000 composers, and identifying men and women by the likely gender of their first names,2 a quick bit of programming produces the following interesting graph…3

This shows the proportion of composers called John (or Johann, Johannes, Jean or Giovanni), in blue, and the proportion of composers who were women, in red, among all of the composers listed in Harant who were born before the years indicated along the x-axis. So among composers born before 1800, about 3½% were women, and about 18% were called John – a ratio of about five Johns for every woman. It is not until the birth-year cut-off reaches the early twentieth century that the cumulative number of female composers overtakes the number called John.4
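The calculation behind a chart like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the article: the function name `cumulative_shares`, the record layout (birth year, first name, estimated gender) and the handful of sample rows are all invented for the example, and the choice to count only male composers as "Johns" is an assumption too.

```python
from bisect import bisect_left

# Name variants treated as "John", following the list given in the article.
JOHN_VARIANTS = {"john", "johann", "johannes", "jean", "giovanni"}


def cumulative_shares(composers, cutoffs):
    """For each cutoff year, return (cutoff, share_john, share_women)
    among composers born strictly before that year.

    `composers` is an iterable of (birth_year, first_name, gender) tuples,
    a hypothetical layout chosen for this sketch.
    """
    rows = sorted(composers, key=lambda c: c[0])
    years = [c[0] for c in rows]

    # Running totals: entry i counts matches among the i earliest-born composers.
    john_prefix, women_prefix = [0], [0]
    for _, name, gender in rows:
        # Assumption: only male composers are counted as Johns.
        is_john = gender == "male" and name.lower() in JOHN_VARIANTS
        john_prefix.append(john_prefix[-1] + is_john)
        women_prefix.append(women_prefix[-1] + (gender == "female"))

    results = []
    for cutoff in cutoffs:
        n = bisect_left(years, cutoff)  # number born before the cutoff
        if n == 0:
            results.append((cutoff, None, None))  # nobody born that early
        else:
            results.append((cutoff, john_prefix[n] / n, women_prefix[n] / n))
    return results


# Made-up sample rows, purely to show the shape of the input.
composers = [
    (1685, "Johann", "male"),
    (1710, "Giovanni", "male"),
    (1750, "Maria", "female"),
    (1797, "Franz", "male"),
    (1805, "Fanny", "female"),
    (1819, "Clara", "female"),
    (1833, "Johannes", "male"),
    (1857, "Cecile", "female"),
]

for cutoff, johns, women in cumulative_shares(composers, [1800, 1900]):
    print(f"born before {cutoff}: {johns:.1%} John, {women:.1%} women")
```

With these toy rows the Johns lead before 1800 and the women overtake them by 1900, which is the kind of cross-over the chart shows. On the real data the two shares would be plotted against a fine grid of cut-off years.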

Cite this article as: Gustar, A.J. 'Women vs John' in Statistics in Historical Musicology, 27th April 2018, https://www.musichistorystats.com/women-vs-john/.
  1. This list is based on a combined dataset of the various sources listed in this previous article. Other sources give more or less the same result.
  2. The gender package for R makes this straightforward. It is, however, only approximate with international lists of names such as this. A manual adjustment was needed to avoid a rather large number of French women composers called Jean, for example! It is also not very good at identifying some of the very early names – it misses ‘Hildegard’, for example.
  3. A CSV file with the data for this chart can be downloaded here. This is the data from Harant, with genders estimated from first names as above, omitting any composers lacking either an estimated gender or a year of birth.
  4. Using a different dataset (the combined data gathered and deduplicated from multiple sources as part of the work for this article) gave slightly different answers, with the 1800 ratio being about three Johns to each woman, and a cross-over around 1935.
