Franz Pazdírek was a Viennese music publisher who, in the first decade of the twentieth century, compiled a ‘Universal Handbook of Music Literature’ – a composite catalogue of all sheet music then in print, worldwide. This ambitious undertaking (which, perhaps not surprisingly, was never repeated) was published over six years, and resulted in nineteen 600-page volumes listing music publications by 1,400 publishers covering every continent except Antarctica.

Unfortunately, Pazdírek’s Handbook is not available in a form that makes it easy to analyse automatically, although it is available online at the Internet Archive (*www.archive.org*).^{1} By taking a random sample of 100 pages and counting the number of composers and works per page, it is possible to estimate that the Handbook lists around 750,000 works by about 90,000 composers. The chart below was created by counting the number of works for the sampled composers, arranging them in ascending order, and plotting the cumulative proportion of works against the cumulative proportion of composers. Notice the logarithmic vertical axis, where each ‘tick’ represents an increase by a factor of two in the proportion of works (unlike the non-logarithmic horizontal axis, where each tick represents an additional 5% of composers).
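The two steps just described (scaling up from sampled pages, and building the cumulative chart) can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually used for the analysis: the function names and all the toy figures are hypothetical, and the total of 11,400 pages simply assumes nineteen volumes of 600 pages each.

```python
# Assumption: nineteen volumes of about 600 pages each.
TOTAL_PAGES = 19 * 600


def estimate_total(per_page_counts, total_pages=TOTAL_PAGES):
    """Scale the mean count per sampled page up to the whole Handbook."""
    mean_per_page = sum(per_page_counts) / len(per_page_counts)
    return mean_per_page * total_pages


def cumulative_shares(works_per_composer):
    """Return (share of composers, share of works) points for the chart.

    Composers are arranged in ascending order of works, as in the text.
    """
    ordered = sorted(works_per_composer)
    total_works = sum(ordered)
    n_composers = len(ordered)
    points = []
    running_works = 0
    for i, works in enumerate(ordered, start=1):
        running_works += works
        points.append((i / n_composers, running_works / total_works))
    return points


# Hypothetical toy data, not taken from the Handbook.
toy_works_per_page = [60, 72, 65, 66]
toy_works_per_composer = [1, 1, 1, 2, 2, 3, 5, 8, 20, 57]

estimated_works = estimate_total(toy_works_per_page)
points = cumulative_shares(toy_works_per_composer)
```

In the toy data the less productive half of the composers accounts for 7 of the 100 works, so the fifth point is (0.5, 0.07). Plotting the points with a logarithmic vertical axis would give a chart of the kind described.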

The chart shows that half of all composers account for just 8% of published works (with the more productive half thus producing the remaining 92%) and that the most productive 8% of composers were responsible for half of the works in print. Even this 8% amounts to perhaps 7,000 composers, which is far more than the number that are well-known today. The few hundred most prolific composers are squeezed into a tiny section of the top right corner of the graph. Of course the measure of productivity used here is not the same as being well-known, although there is some correlation. There are many very prolific names listed by Pazdírek who are now largely unknown, such as Rodolfo Mattiozzi (1832-1875), who has 163 mandolin works listed, mainly arrangements. Similarly there are several well-known composers who are famous for just one or two works (think of Max Bruch or Johann Pachelbel).

The early years of the twentieth century predate the commercial development and widespread popularity of recording and broadcasting, and the ‘popular music’ of the day took the form of sheet music to be played at home. In fact, as Pazdírek mentions in his introduction, around two thirds of the works listed in the Handbook are popular songs and short piano pieces aimed firmly at the domestic market. Many of these piano pieces (and a healthy number of those for other instruments) are arrangements, variations, and fantasies on tunes from operas and other well-known works. The composers credited for these derivatives are often not the composers of the works on which they are based. It is also apparent, even from the analysis of a fairly small sample, that there are a few cases of composers working under pseudonyms.

There is therefore probably a degree of overestimation in the number of composers. Several of the most popular works, too, are counted more than once, because they have different titles in different languages, they are included in albums, or they are longer works (operas, symphonies, song cycles) whose individual movements have been published separately. However, given that relatively few works enjoy even a repeat publication, let alone publication in several languages or inclusion in an album, this only affects a small proportion of the total.

Offsetting these arguments for overestimation are those for what is left out. We do not know, for example, how many publishers failed to respond to Pazdírek’s request for information. There are also many published works that had gone out of print by the first decade of the century, even though copies might still have been available second-hand or through libraries. It is possible to estimate the amount of out-of-print material by looking at the sequences of opus numbers in the Handbook. Of those composers whose works have opus numbers (less than a third), the highest opus number mentioned in the Handbook is, on average, about five times the number of opus-numbered works actually listed, suggesting that only about one fifth of their works were still in print. Opus numbers were most used by the more prolific composers, often from earlier in the nineteenth century, so this figure is not necessarily representative and must be treated with caution. Although Pazdírek does not include dates for either the composers or works listed, of those works in the sample that could be dated, over half were less than 25 years old when the Handbook was compiled.
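The opus-number argument can be illustrated with a short Python sketch. It assumes, as a simplification, that opus numbers run sequentially from 1 and that each number denotes one published work; the composer labels and opus lists below are hypothetical toy data, not figures from the Handbook.

```python
# Hypothetical opus numbers found in the Handbook for three composers.
listed_opus_numbers = {
    "Composer A": [4, 10],
    "Composer B": [3, 12, 18],
    "Composer C": [5, 9, 12],
}


def opus_ratio(opus_numbers):
    """Highest opus number divided by the number of distinct opus numbers listed."""
    distinct = set(opus_numbers)
    return max(distinct) / len(distinct)


ratios = [opus_ratio(nums) for nums in listed_opus_numbers.values()]
mean_ratio = sum(ratios) / len(ratios)

# If the highest opus number is a fair guide to total output, the
# reciprocal of the mean ratio approximates the share still in print.
approx_in_print_share = 1 / mean_ratio
```

With these toy lists the ratios are 5, 6 and 4, giving a mean of 5 and an in-print share of about one fifth, mirroring the reasoning in the text.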

The distribution of works per composer is a tricky one to handle statistically. Not only is it highly skewed, with a large number of single-work composers and steadily fewer as the number of works increases, but the ‘tail’ of large composers falls away relatively slowly. Many distributions, such as the Normal distribution, have tails that rapidly disappear almost to zero, so the amount that they contribute to statistical calculations such as averages is usually insignificant. This distribution of works per composer, however, has a tail that declines slowly enough to have a significant effect on calculations such as the mean and standard deviation, and can render such figures meaningless or irrelevant. The most prolific composers, though rare, are nevertheless common enough to make a substantial difference to the figures: the mean number of works per composer is about eight, for example, but, based on the chart, this does not appear to be a particularly useful or representative statistic.
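A small Python sketch shows why the mean can mislead for data of this shape. The ten counts are hypothetical toy values, chosen only so that their mean is eight; they are not drawn from the Handbook.

```python
import statistics

# Hypothetical works-per-composer counts for ten composers (toy data).
toy_works = [1, 1, 1, 1, 2, 2, 3, 4, 10, 55]

mean_works = statistics.mean(toy_works)      # pulled upwards by one large value
median_works = statistics.median(toy_works)  # closer to the 'typical' composer

# How many composers sit at or below the mean?
at_or_below_mean = sum(1 for works in toy_works if works <= mean_works)

# The mean after removing the single most prolific composer.
mean_without_largest = statistics.mean(sorted(toy_works)[:-1])
```

Here the mean is 8 but the median is 2, eight of the ten composers fall at or below the mean, and dropping the single most prolific composer cuts the mean to under 3.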

This sort of distribution actually occurs quite a lot in the humanities and is one that will reappear several times on this site. Mathematically it is close to a shape known as the ‘Pareto’, ‘Zipf’, or ‘Power Law’ distribution, where the probabilities decline according to a small power of the variable (1/x^{s}, for a small number s), rather than exponentially or faster (1/a^{x}, for some value a) as is found in many well-behaved distributions such as the Normal. ‘Pareto’ is the continuous-variable version of the distribution, and ‘Zipf’, as we have here, is for a discrete variable (i.e. the number of works is always a whole number). It is named after linguist George Kingsley Zipf, who first observed it in a study of the frequency of common words: the second most common word was found to be about half as frequent as the most common word, the next appeared one third as frequently, and so on.^{2} The Zipf distribution has also been found to describe the size of cities, publication rates of academic papers, the number of connections in social networks, and other things in various fields.
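The difference between the two kinds of decline can be checked numerically. The sketch below is a generic illustration with arbitrarily chosen values (s = 1 for the rank example, s = 2 and a = 2 for the tail comparison); none of these parameters is fitted to the Handbook data, and the function name is hypothetical.

```python
def zipf_probabilities(n_ranks, s=1.0):
    """Probabilities proportional to 1 / rank**s for ranks 1..n_ranks."""
    weights = [1.0 / rank ** s for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


# With s = 1 the second item is half as frequent as the first,
# the third is one third as frequent, and so on.
probs = zipf_probabilities(1000, s=1.0)
second_to_first = probs[1] / probs[0]
third_to_first = probs[2] / probs[0]

# Compare the two kinds of tail far from the origin.
x = 50
power_law_tail = 1.0 / x ** 2     # 1/x^s with s = 2
exponential_tail = 1.0 / 2 ** x   # 1/a^x with a = 2
```

At x = 50 the power-law term is still 1/2500, while the exponential term has fallen below one part in a thousand million million, which is why the tail matters so much more in the first case.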

The shape of this distribution (specifically the effect on the standard deviation of the thick tail) results in quite wide margins of error in statistical calculations. For example, the estimate of 90,000 composers in Pazdírek’s Handbook is the centre of a 95% confidence interval that varies by ±20% (based on a sample of 100 random pages). The 95% confidence interval for the estimate of the number of works is a much more reasonable ±8%, as the number of works per page is not affected by the Zipf-like distribution of works per composer.
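A margin of error of this kind can be approximated from the per-page counts. The sketch below assumes a simple random sample of pages and a Normal approximation (the 1.96 multiplier), and ignores any finite-population correction; it is not necessarily the calculation behind the figures quoted above. The function name and both sets of page counts are hypothetical.

```python
import math
import statistics


def relative_margin_95(per_page_counts):
    """Half-width of an approximate 95% confidence interval, as a share of the mean."""
    n_pages = len(per_page_counts)
    mean = statistics.mean(per_page_counts)
    std_err = statistics.stdev(per_page_counts) / math.sqrt(n_pages)
    return 1.96 * std_err / mean


# Hypothetical counts per sampled page (toy data).
steady_pages = [62, 70, 66, 64, 68, 66]   # e.g. works per page
uneven_pages = [1, 2, 30, 4, 1, 10]       # e.g. composers per page

steady_margin = relative_margin_95(steady_pages)
uneven_margin = relative_margin_95(uneven_pages)
```

Because the total is just the per-page mean scaled by a fixed number of pages, the same relative margin applies to the estimated total. The uneven counts give a far wider margin than the steady ones for the same number of sampled pages.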

A particular feature when extracting a random sample from sources such as books is ‘length bias’.^{3} This is a tendency for the longest entries to be randomly selected more frequently than the shorter ones simply because they take up the most space. Whilst it is straightforward to select random *works* from Pazdírek’s Handbook without length bias being a problem, since the work entries are all relatively short and typically comprise just a few lines of text, the space occupied by a *composer* may vary from a couple of lines to many tens of pages. Selecting the composer whose entry is in progress at the start of a randomly selected page will therefore tend to favour the composers with the longest entries. However, a simple modification to the sampling strategy can avoid this problem. Selecting the next or (better) the third or fifth composer to be mentioned *after* the start of the random page will produce a sample where large and small composers are all equally represented. A gap (such as three or five composers) reduces the risk of *autocorrelation* – where the lengths of successive composers’ entries are correlated, usually because they are members of a dynasty such as the Bach family. Autocorrelation is rarely a problem in sources that are ordered alphabetically, but it is worth bearing in mind.
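Length bias, and the effect of the gap strategy, can be demonstrated with a small simulation. The sketch below makes several simplifying assumptions: the ‘book’ is a hypothetical list of entry lengths in lines, shuffled so that neighbouring lengths are unrelated; a random line stands in for the start of a random page; and the list wraps around at the end. All names and figures are illustrative.

```python
import bisect
import itertools
import random
import statistics


def sample_entry_lengths(entry_lengths, n_draws, gap, rng):
    """Record entry lengths under two sampling strategies.

    Returns (in_progress, after_gap): the length of the entry in progress
    at each random position, and the length of the entry `gap` composers
    further on.
    """
    # ends[i] is the line at which composer i's entry finishes.
    ends = list(itertools.accumulate(entry_lengths))
    total_lines = ends[-1]
    n_entries = len(entry_lengths)
    in_progress, after_gap = [], []
    for _ in range(n_draws):
        line = rng.randrange(total_lines)
        current = bisect.bisect_right(ends, line)  # entry containing this line
        in_progress.append(entry_lengths[current])
        after_gap.append(entry_lengths[(current + gap) % n_entries])
    return in_progress, after_gap


rng = random.Random(2018)

# Hypothetical book: many short entries and a few very long ones.
lengths = [2] * 900 + [20] * 90 + [200] * 10
rng.shuffle(lengths)

in_progress, after_gap = sample_entry_lengths(lengths, n_draws=10_000, gap=3, rng=rng)
true_mean = statistics.mean(lengths)
biased_mean = statistics.mean(in_progress)
gap_mean = statistics.mean(after_gap)
```

The in-progress strategy picks each composer with probability proportional to entry length, so its expected mean is the sum of squared lengths divided by the sum of lengths: 78.5 lines for this toy book, against a true mean of 5.6. The gap strategy removes that systematic bias only if the lengths of nearby entries are unrelated, which is the autocorrelation caveat mentioned above.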

The bottom left of the chart represents the largely forgotten single published works of the many thousands of minor composers who, by and large, failed to make a living from publishing their music, but who nevertheless probably represent the bulk of everyday musical activity. Many would have been amateur musicians, or made a living as performers or teachers, with composition being a peripheral activity. The number of single-work composers in Pazdírek’s Handbook is over a third of the total – at least 30,000, and that omits all of the minor names from previous generations whose works had already gone out of print by the turn of the twentieth century (as most would have done). Most musical datasets display the same pattern: as composers get smaller and smaller (by number of works, number of recordings, or whatever measure), their numbers increase. What happens to these numbers once the unit of measurement becomes too small? How many composers were there with zero works in print in the first decade of the twentieth century? We cannot know, but extrapolation would suggest that the number is large.

An interesting by-product of looking statistically at sources such as Pazdírek’s Handbook is that the process inevitably throws up a lot of unfamiliar music by obscure names. Many of these are impossible to pursue because the works are lost and nothing is known about the composer, but there is a tantalising layer of works and composers with whom some progress can be made with a little research. As an example, take Carlotta Cortopassi, who appeared in the sample taken for this analysis. She is mentioned as the composer of a single work: a piano piece called *Desolazione*, published by Venturini of Florence. A little research reveals that the Italian National Library has a copy, catalogued as an undated *notturnino per Pianoforte* of five pages. It also has two other works by her not listed by Pazdírek: an undated *Melodia religiosa per Pianoforte*, and *Non ti scordar: polka per Pianoforte*, which it dates to 1869-1873. It also lists a 32-page monograph cataloguing the works of the Cortopassi family: Alemanno (1838-1909), Domenico (1875-1961), Marcello (1900-1975), Carlotta, and Massimo. Based on the dates, Carlotta would presumably have been of Alemanno’s generation, perhaps a sister, probably born in or around Lucca. Interestingly, the website of the Ellis Island Foundation records the arrival in New York of (presumably a different) Carlotta Cortopassi, married, aged 41, from San Gimignano in Tuscany, aboard the *Campania*, which sailed from Genoa in 1908. She was accompanied by her three children: Pietro (11), Mario (9), and Giuseppe (8). This Carlotta, born Gabrielli, was married to a Luigi Cortopassi who might have been a member of the composing family. It would perhaps be possible to follow up these connections to see if there is a link. This might be enough information from which Carlotta’s story could be researched more fully. She is just one among many tens of thousands of almost-unknown names who have played a part in the history of music.
Analysis of these historical datasets can give such composers a voice, albeit a small collective one, in music history.

*Statistics in Historical Musicology*, 13th April 2018, https://www.musichistorystats.com/franz-pazdireks-universal-handbook/.

- See the Printed Music Datasets page for details.
- Zipf, G. K. 1935. *The Psychobiology of Language*. Houghton-Mifflin.
- Length bias has been described in more detail in this article.