select a location on the map to get started...
each dot represents a location that was mentioned in a New York City local news article.
- ● purple dots have more recent articles
- ● pale blue dots haven't been covered in a while
- ⬤ larger dots have had a lot of articles written in the last week
- · smaller dots represent one article or so
or, click a recently popular location below
about this map
Nick DeMarchis developed this page as a proof-of-concept to map New York locations mentioned in local news articles in near-realtime.
While the map functions as intended, there are clear areas for improvement. Check the issue tracker for a list of outstanding bugs. If you’ve found a new bug or have feature requests, you can contact me using the information found at the bottom of the about page.
why
Local journalism is in crisis. Subscription revenue is in freefall and outlets can barely cover basic community issues with the resources they have. According to Pew, “at a time when many local news outlets are struggling and Americans’ trust in the news media has waned, the vast majority of U.S. adults (85%) say local news outlets are at least somewhat important to the well-being of their local community.”
This crisis has several downstream effects, including “news deserts”: areas of the country that receive no coverage because all of their outlets have closed. Northwestern University’s Medill School “Local News Initiative” tracks local outlets and releases an annual report analyzing local news coverage county-by-county across the US. According to the initiative, “There are 206 counties in the United States with no news outlets,” and 1,558 counties have only one.
This data has been invaluable to understanding the state of local news coverage, and the consequences of news deserts are well-documented. However, I wanted to look beyond the physical spread of local outlets and bring attention to more than the counties that have just one or two.
My theory is that not all communities are treated equally when coverage is low and resources are spread thin. Could a county-level newspaper, for instance, focus a significant amount of coverage on the county seat, but miss happenings in other population centers spread miles away? If so, how could we know?
This project aims to examine this problem by mapping the locations mentioned in local news articles in one region: New York City. Why? It’s where I live right now, but more importantly, it’s one of the most journalism-heavy environments in the US. Moreover, many of the edge cases that would arise when expanding this data to cover the rest of the country can be identified and addressed here. For instance, New Yorkers have their choice of corporate media (both print and TV), nonprofit and worker-owned outlets, non-English print-only publications and hyper-local, small-circulation dailies. We have duplicate addresses within coverage areas (Fifth Avenue in Brooklyn versus Manhattan), ambiguous locations like universities and hospitals, local names and landmarks, outlets with overlapping coverage areas, and more. That’s to say, the work needed to expand this project more broadly is clear.
Others have undertaken similar projects; read more about them below. I will soon be creating a blog page and posting updates to this project there, as well as on my personal X and Bluesky. And please feel free to reach out with any suggestions or comments!
technical details
overview
- frontend: Next.js using App Router. MapLibre GL map. GeoJSON served with Next API route.
- data collection: Python script with the newspaper3k library.
- database: Supabase SQL db with three tables to store articles, locations, and their relations.
details
The Python script runs on a schedule in a GitHub Action. It checks a predetermined set of RSS feeds and determines whether any new articles have been added since the last check. If so, it pulls relevant information from those articles and temporarily stores it. The script runs more frequently during the middle of the day and less frequently overnight.
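To make the "anything new since the last check?" step concrete, here is a minimal sketch. It is not the project's actual script: it parses the feed with Python's standard library instead of newspaper3k, and the function name `new_items`, the sample feed, and the example.com URLs are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical feed content; a real run would download this from an outlet's RSS URL.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example Metro Desk</title>
<item><title>Old story</title><link>https://example.com/old</link>
<pubDate>Mon, 30 Dec 2024 09:00:00 GMT</pubDate></item>
<item><title>New story</title><link>https://example.com/new</link>
<pubDate>Tue, 31 Dec 2024 15:30:00 GMT</pubDate></item>
</channel></rss>"""


def new_items(rss_xml: str, last_checked: datetime) -> list[dict]:
    """Return the RSS items published after `last_checked` (an aware datetime)."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        link = item.findtext("link")
        if not pub or not link:
            continue  # skip items we cannot date or link to
        published = parsedate_to_datetime(pub)
        if published.tzinfo is None:
            # Assumption: treat dates without a timezone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published > last_checked:
            fresh.append({
                "title": item.findtext("title", default=""),
                "link": link.strip(),
                "published": published,
            })
    return fresh


last_run = datetime(2024, 12, 31, 12, 0, tzinfo=timezone.utc)
for entry in new_items(SAMPLE_FEED, last_run):
    print(entry["published"].isoformat(), entry["link"])
```

Comparing publication dates against a stored timestamp is only one option. Remembering the links already seen would also work, and it is more forgiving of feeds with missing or unreliable dates.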
It then uses OpenAI’s gpt-4o-mini-2024-07-18 model with the following prompt to extract the physical locations mentioned in the text of each article.
Your goal is to extract all points of interest and street addresses from the text provided.
Once the relevant locations are extracted, we use the Google Maps Geocoding API to determine their coordinates. Any locations that are too generic to be mapped at this proof-of-concept stage (for instance, general county or state mentions) are removed.
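One way to drop the too-generic results is to look at the `types` list that the Geocoding API returns with each result. The sketch below assumes that approach. The exact set of types to reject, the helper names, and the sample results (hand-written dicts shaped like API responses, not real API output) are illustrative guesses, not the project's actual rules.

```python
# Assumption: results tagged with any of these types are too broad to map.
GENERIC_TYPES = {
    "country",
    "administrative_area_level_1",  # states
    "administrative_area_level_2",  # counties
}

# Hand-written examples shaped like Geocoding API results.
SAMPLE_RESULTS = [
    {
        "formatted_address": "New York, USA",
        "types": ["administrative_area_level_1", "political"],
        "geometry": {"location": {"lat": 43.0, "lng": -75.0}},
    },
    {
        "formatted_address": "Kings County, Brooklyn, NY, USA",
        "types": ["administrative_area_level_2", "political"],
        "geometry": {"location": {"lat": 40.65, "lng": -73.95}},
    },
    {
        "formatted_address": "Prospect Park, Brooklyn, NY, USA",
        "types": ["park", "point_of_interest", "establishment"],
        "geometry": {"location": {"lat": 40.66, "lng": -73.97}},
    },
]


def is_too_generic(result: dict) -> bool:
    """True if the result carries any of the broad administrative types."""
    return bool(GENERIC_TYPES & set(result.get("types", [])))


def mappable_points(results: list[dict]) -> list[dict]:
    """Keep only specific places, reduced to a name and coordinates."""
    points = []
    for result in results:
        if is_too_generic(result):
            continue
        location = result["geometry"]["location"]
        points.append({
            "name": result.get("formatted_address", ""),
            "lat": location["lat"],
            "lng": location["lng"],
        })
    return points


for point in mappable_points(SAMPLE_RESULTS):
    print(point)
```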
Our database then receives information about the locations, the articles, and the relationship between the locations and articles.
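For a rough picture of the three-table layout, here is a sketch using Python's built-in `sqlite3` as a stand-in for the Supabase database. The table names, column names, and helper functions are hypothetical. Only the idea of articles, locations, and a table relating the two comes from the project.

```python
import sqlite3

# Hypothetical schema: two entity tables plus a join table for the relations.
SCHEMA = """
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    published TEXT NOT NULL
);
CREATE TABLE locations (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    lat REAL NOT NULL,
    lng REAL NOT NULL
);
CREATE TABLE article_locations (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    location_id INTEGER NOT NULL REFERENCES locations(id),
    PRIMARY KEY (article_id, location_id)
);
"""


def store_mention(conn: sqlite3.Connection, article: dict, location: dict) -> None:
    """Record that `article` mentions `location`, reusing rows that already exist."""
    conn.execute(
        "INSERT OR IGNORE INTO articles (url, title, published) VALUES (?, ?, ?)",
        (article["url"], article["title"], article["published"]),
    )
    conn.execute(
        "INSERT OR IGNORE INTO locations (name, lat, lng) VALUES (?, ?, ?)",
        (location["name"], location["lat"], location["lng"]),
    )
    article_id = conn.execute(
        "SELECT id FROM articles WHERE url = ?", (article["url"],)
    ).fetchone()[0]
    location_id = conn.execute(
        "SELECT id FROM locations WHERE name = ?", (location["name"],)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO article_locations (article_id, location_id) VALUES (?, ?)",
        (article_id, location_id),
    )


def articles_for_location(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Return (title, url) pairs for a location, newest first."""
    return conn.execute(
        """
        SELECT a.title, a.url
        FROM articles a
        JOIN article_locations al ON al.article_id = a.id
        JOIN locations l ON l.id = al.location_id
        WHERE l.name = ?
        ORDER BY a.published DESC
        """,
        (name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
park = {"name": "Prospect Park", "lat": 40.66, "lng": -73.97}
store_mention(conn, {"url": "https://example.com/a", "title": "Story A",
                     "published": "2024-12-30T09:00:00Z"}, park)
store_mention(conn, {"url": "https://example.com/b", "title": "Story B",
                     "published": "2024-12-31T09:00:00Z"}, park)
print(articles_for_location(conn, "Prospect Park"))
```

Keeping the relation in its own table lets one article point at many locations and one location collect many articles without duplicating either row. The same join is the kind of lookup a click on a map dot would need.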
On the frontend, whenever someone loads the page we use a Next.js Route Handler to reformat our database information as GeoJSON. We then serve that GeoJSON to a MapLibre GL map.
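The reshaping itself is simple. The real handler is a Next.js route, but the logic is sketched here in Python to match the collection script. The row fields (`name`, `lat`, `lng`, `article_count`) and the function name are assumptions for illustration. The one detail that is not an assumption is that GeoJSON stores coordinates as longitude first, then latitude.

```python
import json


def to_feature_collection(rows: list[dict]) -> dict:
    """Turn location rows into a GeoJSON FeatureCollection of points."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [row["lng"], row["lat"]],
            },
            # Hypothetical properties a map layer could style dots with.
            "properties": {
                "name": row["name"],
                "article_count": row["article_count"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


rows = [{"name": "Prospect Park", "lat": 40.66, "lng": -73.97, "article_count": 3}]
print(json.dumps(to_feature_collection(rows), indent=2))
```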
When a user clicks a location, we use another route to handle database requests and pull the information of the articles that match the selected location.
blind spots
There are known issues. Here are some:
- The OpenAI model isn’t amazing at pulling out non-obvious locations, and sometimes is either too verbose or too general.
- At this moment, we can only use publications that have working RSS feeds. Most still do, but not all.
- We did our best to filter feeds by their metro/local reporting, when relevant. That could leave state- or federal-level news stories that have local impact off the map.
next steps
There’s a lot of work left to do to address the problems listed in the issue tracker. Read more above to understand why we're building this.
further reading
- Z. Metzger, “The State of Local News,” Local News Initiative. Accessed: Dec. 31, 2024. [Online]. Available: localnewsinitiative.northwestern.edu
- G. Ariyarathne and A. C. Nwala, “3DLNews: A Three-decade Dataset of US Local News Articles,” in Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM ’24), New York, NY, USA: ACM, 2024, pp. 1–5. doi: 10.1145/3627673.3679165.
- Media Cloud