Improving Public Health Decision-Making Through Alternative Data Sources: Part One
The relationship between the environment and public health is well established, which creates an opportunity to address public health issues through fields outside the public health realm. For example, traffic rerouting can help lower pollution levels and repaving with new types of asphalt can sequester carbon dioxide. While many public officials are interested in addressing issues like asthma, heart disease, and adverse birth outcomes through interventions in the infrastructure, transit, and education spaces, the data necessary to drive these decisions is not always readily available.
The Community Data Health Initiative (CDHI) exists to help bridge this gap. A Data-Smart City Solutions project, CDHI helps cities take on environmental challenges that directly affect people’s health by working with mayors and city teams to turn local data, community insights, and cross-sector partnerships into targeted solutions. These solutions focus on neighborhoods where residents face the highest risk of poor public health outcomes, often low-income areas, formerly redlined areas, or historically marginalized communities.
This article, part one of a two-part series, features work done by a team of University of North Carolina at Chapel Hill students for the Institute for Technology and Global Health (ITGH), a Harvard-affiliated research initiative. The student team spent a semester researching the challenges of accessing and using public health data in local decision-making, focusing on environmentally impacted respiratory and cardiovascular health. Part one focuses on the challenges of accessing, sharing, and evaluating data, and explores alternative data sources as a supplement. The students compiled a database tool to help local officials find data that links environmental factors to respiratory illnesses and other related health outcomes.
The Problem
Policymakers and officials must have adequate access to public health data to address environmental impacts on public health. Yet, when attempting to address local environmental issues, they often face challenges trying to source, analyze, and share reliable public health data. Due to issues like inadequate data governance, limited availability and accessibility, and lack of modern data infrastructure, many local policies are not made through a public health lens.
Below are several reasons why local officials have issues with accessing or sharing public health data.
Data from health systems are often missing demographic indicators, are ambiguously defined, or are inconsistently coded due to current sharing practices. Such a fragile system leads to underdeveloped aggregation of data.
Public health infrastructure lacks proper technological mechanisms, data gathering, and disease surveillance measures. For example, in cases of drug overdoses, disease outbreaks, and environmental illness, the lack of data modernization at the local level hinders public health officials from identifying patterns within their communities.
Local data governance strategies can block population-based data sharing, despite the implementation of disease surveillance systems. Competing public health priorities and concerns about data authority and privacy contribute to failures in data governance. At the same time, with the emergence of “Big Data” (large and complex datasets), health data is becoming increasingly privatized, because it can be treated as a biddable commodity.
Primary researchers and patients have to consent to sharing data and making it available; often, to acquire data from clinical studies, all principal investigators must agree to share their findings. This can be especially difficult in multi-center studies with multiple principal investigators, due to the challenge of requesting “consent for secondary data analysis after a trial is complete,” according to research published in BioMed Central.
Some localities and public health departments still house and transfer sensitive data in paper form. They require formal record requisitions and certification of rights to that data, along with in-person analysis of datasets. According to research published in The New England Journal of Medicine, “The growing interest in data sharing will not translate into progress without the development of supportive infrastructure and policies that enable data to be shared effectively and responsibly.”
To achieve sufficient public data sharing between stakeholders, accessibility of public health data must expand and current mechanisms must be integrated with alternative data sources.
Traditional Versus Alternative Sources
Comparing traditional data sources (like medical and academic databases) with alternative sources (such as social media, satellite data, and Google Mobility Reports) helps illuminate the landscape of available data and opens the possibility of supplementing inaccessible traditional sources. Traditional sources are typically more reliable and carefully managed. Alternative sources offer fast, real-time data and wider coverage, but they sometimes lack consistency and can be biased.
To weigh the benefit of giving policymakers increased data access against the drawback of using potentially unreliable data, the team used three main criteria (accessibility, feasibility, and user interface) to grade how well different data sources support public health work. These criteria were selected to represent realistic challenges faced by practitioners, researchers, and policymakers when navigating public health data systems. The team conducted a structured analysis of 59 traditional and alternative data sources, evaluated the advantages and disadvantages of existing data-sharing systems, and compiled the results into the publicly available Capstone Database.
Data Source Collection and Evaluation Criteria
First, the team gathered a wide range of publicly available databases and private data sources. These included government health records, academic data sources, de-identified medical records (such as Electronic Health Records), and modern tools like social media analytics, satellite imaging, and Google Mobility Reports. The goal of this analysis was to understand how well each source, in quality and strength, connects environmental factors with respiratory illness and cardiovascular health outcomes in order to inform policy.
Each dataset was assessed using the following criteria, which were divided into sub-variables and scored on a five-point scale. Cumulative scores were averaged and ranked for overall usability and relevance to policymakers. The Database includes qualitative commentary on each dataset’s strengths, weaknesses, and unique findings.
Accessibility: how easily users can locate, access, and retrieve data. Databases were evaluated based on their searchability, availability of download options, and the number of steps required to locate key datasets.
Feasibility: how well the data aligned with project scope and how likely it was to be used by researchers based on its quality and reliability. The team assessed whether each data source contained information on environmental and/or health factors and whether the data were current and applicable.
User interface: how clearly the database platform is structured and designed. Databases were evaluated based on the organization of datasets, the descriptions of variables, mobile friendliness, and interactive tools. The quality of the user interface directly impacted the ability of technical and non-technical users to extract insights from the data.
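The scoring approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's actual method: the article does not specify how sub-variable scores were combined, so the sketch assumes each criterion is the mean of its sub-variables and the overall score is the mean of the three criteria. The source names, sub-variable names, and scores below are all hypothetical and do not come from the Capstone Database.

```python
from statistics import mean

# Hypothetical sub-variable scores on a 1-5 scale. Source names, sub-variable
# names, and numbers are invented for illustration only.
SCORES = {
    "Example Government Portal": {
        "accessibility": {"searchability": 4, "download_options": 5, "steps_to_locate": 3},
        "feasibility": {"topic_coverage": 5, "currency": 4},
        "user_interface": {"organization": 4, "variable_descriptions": 3,
                           "mobile_friendliness": 2, "interactive_tools": 3},
    },
    "Example Satellite Feed": {
        "accessibility": {"searchability": 3, "download_options": 3, "steps_to_locate": 3},
        "feasibility": {"topic_coverage": 4, "currency": 4},
        "user_interface": {"organization": 2, "variable_descriptions": 2,
                           "mobile_friendliness": 2, "interactive_tools": 2},
    },
}


def validate(criteria):
    """Raise ValueError if any sub-variable score falls outside the 1-5 scale."""
    for criterion, subs in criteria.items():
        for name, score in subs.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion}.{name} = {score} is outside 1-5")


def criterion_averages(criteria):
    """Average the sub-variable scores within each criterion (an assumption)."""
    return {criterion: mean(subs.values()) for criterion, subs in criteria.items()}


def overall_score(criteria):
    """Average the criterion-level means into one score (an assumption)."""
    validate(criteria)
    return mean(criterion_averages(criteria).values())


def rank_sources(scores):
    """Return (source, overall score) pairs sorted from highest to lowest."""
    ranked = [(name, round(overall_score(c), 2)) for name, c in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for position, (name, score) in enumerate(rank_sources(SCORES), start=1):
        print(f"{position}. {name}: {score}")
```

Averaging within each criterion first keeps a criterion with many sub-variables, such as user interface, from outweighing one with fewer. A different weighting scheme would be equally easy to substitute if the codebook specifies one.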
To support scaling up this analysis, the Capstone Database includes a codebook for interpreting variables. The codebook is designed for use with data analytics software, such as RStudio and Microsoft Excel. Additionally, a generalized table delineates dataset collection by region, respondent groups, and aggregation levels. The database has been thoroughly cleaned and is built to be filtered or modified so that it can serve as a framework for future research.
Following data collection, evaluation, and database curation, the team developed policy recommendations for local officials to mitigate the outlined issues. Three policy alternatives, which focus on improving access, usability, and collaboration across stakeholders, will be discussed in the second article in this series, with the goal of increasing the use of public health data across a variety of local government decisions.
About the Author
ITGH Data Team
The Institute for Technology and Global Health (ITGH) Data Team is a spring 2025 capstone group composed of students Elizabeth Leigh Zillioux, Tom Joe Dominic, Jynessa June King-Garcia, Iman Abdi, Ahmed Adam El-Halabi, and Mary Margaret Gilbert. Recent graduates of the University of North Carolina at Chapel Hill, these six students spent a semester working with ITGH on a research project centered on understanding barriers and improving access to public health data.