Northwestern University

05/06/2026 | Press release | Distributed by Public on 05/06/2026 13:20

Digital archive reveals how NIH built the field of genomics

Study details how taxpayer-funded agencies choose what science to support, previously a 'black box'

Media Information

  • Release Date: May 6, 2026

Media Contacts

Kristin Samuelson

Journal: Nature Communications

  • Northwestern-developed software enabled analysis of Human Genome Project archive
  • NIH supported academic scientists, developed shared research infrastructure, study found
  • Continuity of NIH leadership preserved expertise across decades-long projects
  • 'Funding agencies matter beyond the funds they distribute'

CHICAGO - Research funding agencies supported by taxpayer dollars do more than write checks; they help build entire scientific fields, according to a new Northwestern University study recently published in Nature Communications.

The study details how Northwestern scientists and National Institutes of Health (NIH) historians developed software that extracts and connects information across thousands of documents in a publicly accessible digital archive of the Human Genome Project (HGP). The National Human Genome Research Institute (NHGRI) created the archive in 2023. The HGP was a landmark international research effort that ran from 1990 to 2003 and successfully mapped and sequenced the entire human genetic code (roughly 3 billion DNA base pairs).

Using the new software, the scientists revealed how federal funding agencies did more than distribute money. They helped guide the development of genomics by coordinating scientific communities, supporting the scientific workforce, developing shared research infrastructure and helping resolve technical challenges that no single laboratory could manage alone.

The archive documents the evolution of genomics across model organism sequencing, human variation research and genetic epidemiology, which ultimately enabled transformative advances in medicine, biotechnology and evolutionary biology. None of this would have been possible, the study shows, without leadership at the NIH and NHGRI playing a hands-on role over decades.

NIH leaders were directly involved in solving technical problems, coordinating large-scale collaborations and ensuring continuity of expertise across successive projects. Continuity of personnel within NIH was also important for preserving expertise between projects, said co-corresponding author Thomas Stoeger.

"Funding agencies matter beyond the funds they distribute," said Stoeger, assistant professor of medicine in the division of pulmonary and critical care at Northwestern University Feinberg School of Medicine. "We show that in early genomics, the leadership of a funding agency was directly involved in resolving technical problems, bringing together scholars and allocating money toward shared resources."

Funding decision-making a 'black box' until now

Precisely how these funding agencies work with academic communities and decide to support certain scientific projects over others has long remained a "black box" to the scientific community.

"Why do certain projects never succeed? Why do they fail? Why do some never get funded? These are all questions we're dying to know and are at the very core to understanding how science works, but no one has really been able to study at scale how that's done," said lead author Spencer Hong, a former graduate student at the McCormick School of Engineering.

To address that gap, the research team analyzed the NHGRI's digital archive and traced how NHGRI responded to emerging scientific needs, supporting the development of genome-wide association studies, coordinating complex collaborations and jointly deciding with external experts which non-human organisms' genomes to sequence.

'Science takes time and … foresight'

The scientists developed a custom legal and computational framework that allows scholars and AI to safely access and analyze internal government documents. This enables scientists to quantify not only research outputs, such as publications and grants, but also the processes that make discovery possible, the authors said.

"This is an incredible reminder of the brilliant, revolutionary work that the NIH has done across a dozen fields," said co-corresponding author Christopher Donohue, a former historian at the NHGRI at the NIH and a researcher at the Institute for Clinical and Translational Science at the University of California, Irvine. "In the context of AI, the power of AI is clearly in this paper but also the importance of ethical, responsible and responsive AI research."

Furthermore, the archive is a reminder of the importance of state-funded research agencies supporting the early stages of contemporary scientific fields, such as genomics and emerging technologies, the study authors said.

"Science takes time, and science takes foresight," Hong said. "What we see throughout the paper is that a lot of really beneficial and impactful technology was in development and was in need of support years before it ever made it to publication or for the scientific community to use and, therefore, downstream affect us, the public."

Other Northwestern authors include Mohammad Hosseini, assistant professor of preventive medicine in the division of biostatistics and informatics, and Kristi Holmes, associate dean for Knowledge Management and Strategy and director of the Galter Health Sciences Library.

This work was supported in part by grants from the National Science Foundation (DMS-2235451) and the Simons Foundation (MP-TMPS-00005320) to the NSF-Simons National Institute for Theory and Mathematics in Biology (NITMB), and by NIH grants UM1TR005121, U24LM013751, 1OT2DB000013-01 and R00AG068544.

Northwestern University published this content on May 06, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on May 06, 2026 at 19:20 UTC.