‘Big data’ is a big issue these days – from analysing voters’ views on election policies to evidence-based medicine, crunching the numbers on huge collections of information often yields useful insights (even if only ‘we need a new computer’).
It’s no surprise, then, that this sort of data analysis has been applied to the world of genealogy. Every time one of the leading data websites sends forth a press release on how many of our ancestors were named after music hall stars, say, or how many worked in domestic service, it is sharing gleanings from analysing its wealth of data. But there are more academic studies, too, often revolving around data gathered from DNA tests.
For example, a team led by members of the New York Genome Center and the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, published a paper in 2017 entitled ‘Quantitative analysis of population-scale family trees with millions of relatives’. In their abstract, they observe that ‘the collection of extended family trees is tedious’ – do they not know how to have fun like family historians do? – and describe how they ‘collected 86 million profiles from publicly available online data from genealogy enthusiasts’. The data specifically came from the work of three million contributors to Geni.com, now owned by MyHeritage.
And what did they do with all this info?