Automating Data Collection to Support Conflict Analysis: Scraping the Internet for Monitoring Hourly Conflict in Sudan
DOI:
https://doi.org/10.37256/ccds.7120268226
Keywords:
web scraping, conflict data collection, Sudan conflict, data automation, news mining, spatiotemporal data
Abstract
The ongoing conflicts in Sudan have escalated rapidly, highlighting the critical need for timely and accurate data to inform humanitarian responses, policy decisions, and research. While existing datasets such as the Armed Conflict Location & Event Data Project (ACLED) and the Uppsala Conflict Data Program Georeferenced Event Dataset (UCDP GED) provide valuable insights into conflicts, they suffer from update delays and a lack of source transparency, both of which hinder timely incident reporting and comprehensive analysis. To address these limitations, we developed a web scraping toolset that collects data from the Internet every hour and deployed it to support analysis of the Sudan conflict. The scraped data was used to build an open-access database that, as of October 25, 2024, houses 6,946 articles from national, regional, and international sources, offering a transparent and easily accessible resource for further analysis. A case study of the siege of Sinjah demonstrates the scraper's practical application, showing that it captures the spatial and temporal progression of events within a conflict zone. The scraped data outperformed the UCDP GED in capturing incidents but missed some smaller-scale incidents recorded by ACLED, which points to expanding source diversity as an area for improvement. Overall, the scraper shows considerable potential for improving conflict monitoring and could be further enhanced by incorporating additional sources and automation techniques.
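The collect-and-store step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' toolset: the table layout, the function names (`extract_title`, `open_store`, `store_article`), and the choice of SQLite are assumptions made for illustration. Fetching pages and triggering the run every hour (for example with an external scheduler) are left to the caller, so the sketch covers only parsing an article page and recording it once per URL.

```python
import sqlite3
from datetime import datetime, timezone
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def open_store(path=":memory:"):
    """Open (or create) a hypothetical article table keyed by URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, source TEXT, title TEXT, scraped_at TEXT)"
    )
    return conn


def store_article(conn, url, source, html, scraped_at=None):
    """Insert one article; return True if it is new, False if the URL exists."""
    scraped_at = scraped_at or datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?)",
        (url, source, extract_title(html), scraped_at.isoformat()),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when the insert was ignored
    return cur.rowcount == 1
```

Keying the table on the URL means that repeated hourly runs over the same source pages do not create duplicate records, and storing the source name and scrape time alongside each article keeps the provenance of every record visible.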
License
Copyright (c) 2025 Yahya Masri, Anusha Srirenganathan Malarvizhi, Samir Ahmed, Tayven Stover, Zifu Wang, Daniel Rothbart, Mathieu Bere, David Wong, Dieter Pfoser, Chaowei Yang

This work is licensed under a Creative Commons Attribution 4.0 International License.
