Automating Data Collection to Support Conflict Analysis: Scraping the Internet for Monitoring Hourly Conflict in Sudan

Authors

  • Yahya Masri, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, 22030, USA
  • Anusha Srirenganathan Malarvizhi, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, 22030, USA
  • Samir Ahmed, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, 22030, USA
  • Tayven Stover, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, 22030, USA
  • Zifu Wang, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, 22030, USA
  • Daniel Rothbart, Jimmy and Rosalynn Carter School for Peace and Conflict Resolution, George Mason University, Arlington, VA, 22201, USA
  • Mathieu Bere, Jimmy and Rosalynn Carter School for Peace and Conflict Resolution, George Mason University, Arlington, VA, 22201, USA
  • David Wong, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, 22030, USA
  • Dieter Pfoser, Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, 22030, USA
  • Chaowei Yang, NSF Spatiotemporal Innovation Center, George Mason University, Fairfax, VA, 22030, USA

DOI:

https://doi.org/10.37256/ccds.7120268226

Keywords:

web scraping, conflict data collection, Sudan conflict, data automation, news mining, spatiotemporal data

Abstract

The ongoing conflicts in Sudan have escalated rapidly, highlighting the critical need for timely and accurate data to inform humanitarian responses, policy decisions, and research. While existing datasets such as the Armed Conflict Location & Event Data Project (ACLED) and the Uppsala Conflict Data Program Georeferenced Event Dataset (UCDP GED) provide valuable insights into conflicts, they suffer from update delays and lack source transparency, which hinders timely incident reporting and comprehensive analysis. To address these limitations, we developed a web scraping toolset that collects data from the Internet hourly and deployed it to support analysis of the Sudan conflict. The scraped data were used to build an open-access database that housed 6,946 articles as of October 25, 2024, drawn from national, regional, and international sources, offering a transparent and easily accessible resource for further analysis. A case study of the siege of Sinjah demonstrates the scraper’s practical application, showing that it captured the spatial and temporal details of events within a conflict zone. The scraped data outperformed the UCDP GED in capturing incidents but missed some smaller-scale incidents recorded by ACLED, indicating that a broader range of sources would improve coverage. Overall, the scraper shows strong potential for improving conflict monitoring and could be further enhanced by incorporating additional sources and automation techniques.

Published

2025-12-15

How to Cite

Yahya Masri, Anusha Srirenganathan Malarvizhi, Samir Ahmed, Tayven Stover, Zifu Wang, Daniel Rothbart, Mathieu Bere, David Wong, Dieter Pfoser, Chaowei Yang. Automating Data Collection to Support Conflict Analysis: Scraping the Internet for Monitoring Hourly Conflict in Sudan. Cloud Computing and Data Science [Internet]. 2025 Dec. 15 [cited 2026 Jan. 8];7(1):63-84. Available from: https://ojs.wiserpub.com/index.php/CCDS/article/view/8226