Important: Read this!
The description below was last updated on March 10, 2015. Click over to our GitHub project and check out the README and other docs there to see the current state of the project.
We have (sometimes) used Huboard to track tasks and progress, and we (sometimes) communicate with each other on our Hackpad.
Mission Statement
Philly Open Health increases access to and availability of public health data for Philadelphia and surrounding counties. A huge amount of population health data is publicly available for the greater Philadelphia area, but much of it isn't technically "open data." While some of the data on Philly Open Health will be open (machine-readable) data, such as population demographics, other data will be sourced from PDFs, siloed databases, and filtered federal data sets. Combining these data sources and types will provide a more comprehensive look at population health in the greater Philly area, even though open data in the public health arena is limited.
Starting Off
The jumping-off point for Philly Open Health is the Office of HIV Planning (OHP)'s annual epidemiologic profile (or "epi" profile), which covers Philadelphia and eight surrounding counties in two states. This document includes over 200 tables and 100 figures on everything from race/ethnicity to drug use to poverty to HIV/AIDS. The profile has always been designed for print, which limits how much data can be included. OHP would like to provide the public with the data used in developing the epi profile in one centralized location.
Problem
The data being uploaded to Philly Open Health comes from dozens of sources and has no standardized format. Many of the source files are PDFs or other files that cannot be easily manipulated, and they can be difficult to track down. In other cases, regionally relevant data can only be made available by filtering data sets with statistical analysis software that the general public may not have access to or knowledge of. While some of this data is selected for presentation in OHP's epi profile, the PDF format is limiting. For example, a nonprofit organization looking for data about a particular population in order to write a grant proposal might have to flip through the 400+ page epi profile by hand to find a table that might be relevant to them, and then manually copy it.
Stakeholders
People who will/might use the repository include:
- Public health nonprofits
- Grant writers
- Community planners
- Social service organizations
- Health departments
- Students
- OHP staff
- The general public
The Solution
The solution is a website built using Ruby on Rails, where users can search for data that matches criteria such as geography and demographics (see below for a list of proposed metadata). Each document will have a set of metadata as well as links to one or more files that contain the data. Access to the data is public; the ability to upload or alter database contents will be limited to trusted users. In addition, each document should have a unique permanent URL that can be referenced in documents such as the epi profile, so that below a given table, for example, there might be "Original data available at phillyopenhealth.org/datarepo/987654321/."
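To make the idea concrete, here is a minimal sketch in plain Ruby of a document record that carries metadata, links to files, and a permanent URL, plus a simple metadata search. It is an illustration only, not the project's actual Rails models: the names `DataDocument` and `Repository`, the metadata fields (`geography`, `demographics`), and the sample records are all assumptions made up for this example. Only the URL shape is taken from the description above.

```ruby
# Hypothetical sketch: class names, field names, and sample records are
# assumptions for illustration, not the project's real schema.
DataDocument = Struct.new(:id, :title, :geography, :demographics, :file_urls,
                          keyword_init: true) do
  # Permanent URL in the shape quoted in the project description.
  def permalink
    "phillyopenhealth.org/datarepo/#{id}/"
  end

  # True when every search criterion matches this document's metadata.
  # List-valued metadata (e.g. demographics) matches if it contains the value.
  def matches?(criteria)
    criteria.all? do |field, wanted|
      value = public_send(field)
      value.is_a?(Array) ? value.include?(wanted) : value == wanted
    end
  end
end

# In-memory stand-in for the database behind the website.
class Repository
  def initialize
    @documents = []
  end

  def add(document)
    @documents << document
    self
  end

  # Returns all documents whose metadata matches every given criterion.
  def search(**criteria)
    @documents.select { |doc| doc.matches?(criteria) }
  end
end

repo = Repository.new
repo.add(DataDocument.new(id: 987654321, title: "Example table A",
                          geography: "Philadelphia",
                          demographics: ["age", "race/ethnicity"],
                          file_urls: ["example_a.pdf", "example_a.csv"]))
repo.add(DataDocument.new(id: 123, title: "Example table B",
                          geography: "Bucks County",
                          demographics: ["age"],
                          file_urls: ["example_b.pdf"]))

# Print the permanent URL of each document matching both criteria.
repo.search(geography: "Philadelphia", demographics: "age").each do |doc|
  puts doc.permalink
end
```

In a real Rails application the record would more likely be a database-backed model and the search a database query. The point here is only the shape: one document, several files, a small set of searchable metadata, and a stable URL derived from the document's ID.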
Known Limitations (i.e., things we’re not worrying about right now)
The files we’re trying to make available come in many different formats and cover many different types of data. Some are raw survey data; some are heavily processed statistics.
In addition, the presentation of each file is usually optimized for human readability, so we’d need some pretty complex processing to ‘extract’ the information from each one, and that logic might not carry over to the next file, which may be formatted very differently. It would be cool if we could start turning some of these files into standardized formats for easier machine processing, but that problem gets messy fast because the data here is so heterogeneous in every way, so we’re not going to worry about that… yet. First we make the data available, then we can worry about making it easier to work with.