Two years ago, BYU computer science graduate student Stanley Fujimoto had just run out of funding for his bioinformatics project and needed a job. At the same time, economics professor Dr. Joseph Price needed a programmer to work in his family history lab. Although Fujimoto wasn’t especially interested in family history, his advisor, Dr. Mark Clement, thought the job would be a good match for his skills.
Fujimoto joined Price in the BYU Record Linking Lab to work on a project carried out in collaboration with the computer science department and FamilySearch, a nonprofit organization and website offering genealogical records. The project builds on work done by Clement and Dr. Bill Barrett.
When he joined the project, Fujimoto simply expected to do some programming. But because of the COVID-19 pandemic, his work on developing a dataset from death certificates for the 1918 influenza pandemic has taken on greater significance. The dataset his team is developing offers a finer-grained view of the 1918 outbreak than has previously been available. By comparing the trajectory of the earlier pandemic with that of COVID-19, researchers may better understand which strategies best combat viral spread over time.
“Doing this as a student has been really eye-opening because you start to see how an economist or a sociologist can transform the computer science work into something meaningful for their field,” said Fujimoto.
Another student, Eric Burdett, agreed that applying his programming skills to the COVID-19 response is gratifying.
“Sometimes in school we are focused so much on theory, and I love that I can see that the things we’ve been learning can actually make a difference,” said Burdett, who joined the team more recently and replaced Fujimoto as the student project leader when the latter graduated this past summer.
It all started with Dr. Price’s idea for using data on the cause of death from death certificates provided by FamilySearch. In the past, researchers could plot the curve of 1918 pandemic deaths based on how many people died in a given place at a given time, but they couldn’t break those numbers down into more detailed patterns. Price realized that they could generate more specific curves by linking each individual’s cause of death to other previously indexed attributes on the certificates.
“You can actually plot the curve by gender, age and race and analyze the data alongside the 1918 city- and county-level interventions to see which were most successful,” Price explained. “So when you look at a policy about when local governments chose to close the schools, you can look at the curve of deaths for school-age children to determine whether the closures helped.”
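The kind of breakdown Price describes can be illustrated with a minimal Python sketch. It is not the lab's code: the records, the age cutoffs and the function names are all hypothetical, chosen only to show how individual death records might be grouped into a weekly curve for each age group.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical records (date of death, age in years); not real data.
records = [
    (date(1918, 10, 1), 8),
    (date(1918, 10, 3), 34),
    (date(1918, 10, 9), 12),
    (date(1918, 10, 10), 67),
    (date(1918, 10, 11), 9),
]

def age_group(age):
    """Assign an illustrative age group; the cutoffs are assumptions."""
    if age < 5:
        return "under 5"
    if age < 18:
        return "school-age"
    return "adult"

def week_start(day):
    """Return the Monday of the week that contains the given date."""
    return day - timedelta(days=day.weekday())

def weekly_curve(rows):
    """Count deaths for each (week start, age group) pair."""
    counts = Counter()
    for died_on, age in rows:
        counts[(week_start(died_on), age_group(age))] += 1
    return counts

curve = weekly_curve(records)
for (week, group), deaths in sorted(curve.items()):
    print(week, group, deaths)
```

Filtering the result to the school-age rows would give the curve a researcher could compare against the date a city closed its schools.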
Their preliminary research suggests that the death rate for the 1918 outbreak was about twice as high in U.S. cities that chose not to implement any interventions, compared to those that did.
The original project, which began under a grant from the National Institutes of Health, was to gather data from Ohio only. When COVID-19 emerged, Price saw the work’s greater potential. Coordinating with FamilySearch, the team decided to expand the dataset to all states and accelerate the pace of the project in order to make the data publicly available.
“We will be creating the first-ever dataset of each individual who died in the pandemic,” Price said.
Ten students, including Fujimoto and Burdett, have been instrumental to the project’s success. To create the dataset, students identified and retrieved hundreds of thousands of relevant images from FamilySearch.
“That’s been quite a process because their collections are just massive,” said Burdett, who wrote computer code to interface with FamilySearch’s system. “We have access to millions and millions of records from FamilySearch, resources a lot of researchers haven’t had before.”
To teach the computer to extract relevant entries from certificates with varying layouts, Fujimoto modified and trained object detection algorithms typically used to identify people or cars in images. The students in the lab transcribe causes of death using a state-of-the-art handwriting recognition algorithm created by former BYU graduate student Curtis Wigington. Once they obtain the transcriptions, students assign a diagnosis code to each certificate to standardize the differing ways coroners described the same cause of death. The automated process has allowed them to transcribe over 100,000 death records in under two hours, compared with the weeks or months of labor that human-generated transcriptions require.
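The standardization step can be pictured with a short Python sketch. It assumes a simple keyword lookup, which is not necessarily how the lab assigns codes; the keyword list and the labels ("FLU", "PNEUMONIA", "UNKNOWN") are hypothetical placeholders rather than real diagnosis codes.

```python
import re

# Hypothetical keyword-to-label table, checked in order (first match wins).
# The labels are illustrative placeholders, not real diagnosis codes.
KEYWORD_TO_LABEL = [
    ("influenza", "FLU"),
    ("grippe", "FLU"),
    ("pneumonia", "PNEUMONIA"),
]

def clean(text):
    """Lowercase the text and replace anything but letters with single spaces."""
    return " ".join(re.sub(r"[^a-z]+", " ", text.lower()).split())

def assign_label(transcription):
    """Map a transcribed cause of death to one illustrative label."""
    cleaned = clean(transcription)
    for keyword, label in KEYWORD_TO_LABEL:
        if keyword in cleaned:
            return label
    return "UNKNOWN"

for cause in ["Spanish Influenza", "La Grippe", "Lobar pneumonia.", "Drowning"]:
    print(cause, "->", assign_label(cause))
```

A lookup like this collapses different wordings of the same cause into one label, so that deaths recorded as "Spanish Influenza" and "La Grippe" are counted together.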
For many, involvement in the project will shape their professional futures.
“This project is giving us the skills to be able to function in jobs in big fields in computer science like machine learning and artificial intelligence,” Burdett said.
As for Fujimoto—despite his past indifference to genealogy—seeing cutting-edge computer science and machine learning applied to family history has inspired him to take a full-time position as a data scientist with Ancestry.com.
The first dataset is now available at pandemic.familytech.byu.edu.