High-resolution population estimates have become essential in fields like healthcare, disaster response, urban planning, and environmental management. These datasets provide a detailed view of population distribution, allowing decision-makers to target resources and interventions more effectively. Recent advances in integrating satellite data with census information have further improved the accuracy and timeliness of these population maps, making them powerful tools for both research and practical applications.
Two widely used high-resolution population datasets are WorldPop and Meta’s High Resolution Settlement Layer (HRSL).
In this post, we will compare the methodologies of these two datasets, exploring their assumptions, modeling approaches, and limitations, to help guide users in choosing the most suitable product for their needs.
WorldPop
WorldPop employs a Random Forest model combined with a variety of geospatial inputs to perform dasymetric redistribution, aiming to capture the spatial heterogeneity of population distribution at the pixel level within census units. The model uses recent census-based population counts, redistributing them across grid cells by analyzing the relationships between population densities and diverse covariate layers, such as land use, elevation, and infrastructure. This approach assumes that the Random Forest model’s capability to incorporate multiple variables enhances the accuracy of population estimates.
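The redistribution step itself can be illustrated with a minimal sketch. It assumes a model (a Random Forest in WorldPop's case) has already produced a non-negative density weight for every grid cell. The function name, unit IDs, and weights below are hypothetical and are not taken from WorldPop's code.

```python
from collections import defaultdict


def dasymetric_redistribute(census_totals, cell_units, cell_weights):
    """Split each census unit's count across its grid cells in proportion to weights.

    census_totals: dict mapping unit id -> census population count
    cell_units:    list giving the census unit id of each grid cell
    cell_weights:  list of non-negative weights, e.g. model-predicted densities
    """
    # Sum the weights of all cells that fall inside each census unit.
    weight_sums = defaultdict(float)
    for unit, weight in zip(cell_units, cell_weights):
        weight_sums[unit] += weight

    estimates = []
    for unit, weight in zip(cell_units, cell_weights):
        total_weight = weight_sums[unit]
        if total_weight == 0:
            # No usable weight in this unit: fall back to a uniform split.
            n_cells = cell_units.count(unit)
            estimates.append(census_totals[unit] / n_cells)
        else:
            estimates.append(census_totals[unit] * weight / total_weight)
    return estimates


# Hypothetical example: two census units covering five grid cells.
census_totals = {"A": 1000, "B": 300}
cell_units = ["A", "A", "A", "B", "B"]
cell_weights = [5.0, 3.0, 2.0, 1.0, 2.0]  # stand-in for predicted densities

cell_population = dasymetric_redistribute(census_totals, cell_units, cell_weights)
```

Two properties follow from this design: the cell estimates within a unit always sum back to the census count, and every cell with a positive weight receives some population.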
HRSL, on the other hand, bases its methodology on high-resolution building detection, operating on the assumption that buildings serve as a strong indicator of population presence, especially in sparsely populated rural areas. Meta employs deep learning architectures, primarily Convolutional Neural Networks (CNNs) such as SegNet and FeedbackNet, to detect human-made structures in high-resolution satellite imagery. Once buildings are identified, the census population of each unit is distributed proportionally across the buildings within it.
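The building-based allocation can be sketched in a few lines. This is a simplified illustration, not Meta's pipeline: the CNN detection step is replaced by a ready-made boolean mask, each settled cell is treated as one equal share, and the function name and numbers are hypothetical.

```python
def allocate_to_settled_cells(census_total, settled_mask):
    """Spread one census unit's population evenly over cells flagged as built-up.

    settled_mask: list of booleans, True where a building was detected.
    Cells without detected buildings receive exactly zero.
    """
    n_settled = sum(settled_mask)
    if n_settled == 0:
        raise ValueError("no settled cells detected in this census unit")
    share = census_total / n_settled
    return [share if settled else 0.0 for settled in settled_mask]


# Hypothetical census unit of 600 people covering six grid cells,
# three of which contain detected buildings.
settled_mask = [True, False, False, True, True, False]
population = allocate_to_settled_cells(600, settled_mask)
```

Unlike a weight-based redistribution, cells without detected buildings stay at zero, so any detection error (a missed or spurious building) carries straight through to the population estimate.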
WorldPop integrates traditional factors associated with population distribution—such as land cover, digital elevation models, and slope estimates—alongside supplementary geospatial data layers that may indicate human presence, including road networks, waterways, settlement boundaries, protected areas, night lights, and facility locations. Together, these data sources enhance the model’s flexibility and accuracy. Meta’s model, on the other hand, uses high-resolution (50 cm) satellite imagery from DigitalGlobe, primarily collected between 2010 and 2015, to identify and map individual buildings.
WorldPop evaluates model performance by comparing aggregated grid cell predictions to census data, and by benchmarking against other population mapping products, such as GPW, GRUMP, and Afri/AsiaPop. WorldPop quantifies accuracy using root mean square error (RMSE), percent RMSE (%RMSE), and mean absolute error (MAE). HRSL’s evaluation, however, is more focused on the accuracy of its building detection process. It separates statistical errors, which stem from random variation, from systematic errors, which result from misclassification (e.g., false positives). Meta uses precision and recall to assess building identification accuracy and standard error to evaluate population redistribution across detected settled areas.
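The metrics named above are straightforward to compute. The sketch below uses made-up numbers, and it assumes that %RMSE means RMSE expressed as a percentage of the mean observed count; that definition is an assumption, so check it against the product documentation before relying on it.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted and observed unit counts."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def percent_rmse(predicted, observed):
    """RMSE as a percentage of the mean observed count (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return 100 * rmse(predicted, observed) / mean_observed


def mae(predicted, observed):
    """Mean absolute error between predicted and observed unit counts."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def precision_recall(predicted_mask, true_mask):
    """Precision and recall for a binary building-detection mask."""
    pairs = list(zip(predicted_mask, true_mask))
    true_pos = sum(1 for p, t in pairs if p and t)
    false_pos = sum(1 for p, t in pairs if p and not t)
    false_neg = sum(1 for p, t in pairs if not p and t)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical aggregated predictions versus census counts for four units.
predicted_counts = [110, 190, 310, 390]
census_counts = [100, 200, 300, 400]
errors = (
    rmse(predicted_counts, census_counts),
    percent_rmse(predicted_counts, census_counts),
    mae(predicted_counts, census_counts),
)

# Hypothetical building detections versus ground truth for five pixels.
detected = [True, True, False, True, False]
actual = [True, False, False, True, True]
precision, recall = precision_recall(detected, actual)
```

The first group of metrics scores population counts after aggregation, while precision and recall score the detection mask itself, which mirrors the different evaluation focus of the two products.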
When comparing high-resolution population models, particularly Meta’s High Resolution Settlement Layer (HRSL) and WorldPop, it’s crucial to understand each model’s strengths, limitations, and assumptions. Both models strive to estimate where people live, but their differing approaches mean that each model has unique challenges and trade-offs, especially when applied across diverse regions. Here’s a breakdown of the main limitations of each approach:
WorldPop
Input Layers and Spatial Resolution: WorldPop’s model incorporates multiple layers of environmental and ancillary data to improve contextual accuracy. However, this layered approach limits spatial resolution to that of the coarsest data layer, which can reduce overall precision. Additionally, the quality of these input layers varies by region and collection period, relying heavily on local data collection standards. This variability introduces potential biases and inconsistencies when data is aggregated across different countries or regions.
Modeling Technique: WorldPop uses a random forest machine learning model, which is based on decision trees. While flexible and powerful, this model has limitations in accounting for complex spatial variations across diverse geographic regions. Moreover, the decision-making process within random forests can be opaque, making it challenging to understand how specific inputs shape final estimates. This lack of interpretability may impact user confidence in the model’s outputs.
Non-zero Allocated Output: WorldPop’s unconstrained product does not use building footprint data, so it assigns a population estimate to every land grid cell, even in areas without residents. This can lead to misallocations, placing population estimates in regions that are likely uninhabited.
Meta (HRSL)
Building on previous chapters, this section takes a higher-level perspective on modeling strategies, shifting the focus from technical and model-specific details to structural design. Here, we outline two primary approaches: the top-down and bottom-up methods.
Both WorldPop and HRSL employ the Top-Down Approach, which starts with broad, aggregate data sources, such as national census counts. These aggregate counts, usually at high administrative levels, are then redistributed into smaller, consistent grids, using supporting data like land use, satellite imagery, and other geospatial inputs. In contrast, the Bottom-Up Approach begins with highly detailed, localized data—such as household surveys—and scales up to create gridded population estimates across larger regions. Each approach offers unique strengths and trade-offs, which shape population mapping outcomes.
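To make the contrast concrete, here is a deliberately simplified bottom-up sketch. It assumes a household survey has counted people and buildings in a few sampled areas, and that building counts are available for every grid cell. Real bottom-up models are considerably richer; the function name and all numbers here are hypothetical.

```python
def bottom_up_estimate(survey_samples, building_counts):
    """Scale a surveyed people-per-building rate up to all grid cells.

    survey_samples:  list of (people_counted, buildings_counted) per sampled area
    building_counts: number of buildings in each grid cell of the full region
    """
    total_people = sum(people for people, _ in survey_samples)
    total_buildings = sum(buildings for _, buildings in survey_samples)
    if total_buildings == 0:
        raise ValueError("survey contains no buildings")
    people_per_building = total_people / total_buildings
    return [people_per_building * count for count in building_counts]


# Hypothetical survey: 180 people in 40 buildings, i.e. 4.5 people per building.
survey_samples = [(48, 10), (90, 20), (42, 10)]
building_counts = [0, 4, 12, 30]

gridded_estimate = bottom_up_estimate(survey_samples, building_counts)
```

No census total appears anywhere in this calculation, which is the defining difference from the top-down approach: the regional total is an output of the model rather than an input to it.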
Top-Down Approach
Bottom-Up Approach
In this section, we summarize key characteristics of the main gridded population products, comparing them across dimensions such as methodology, long-term monitoring capabilities, regional and global usability, and practical considerations. The focus includes WorldPop and HRSL, along with two other widely used products: Gridded Population of the World (GPW) and LandScan.
Characteristics | WorldPop | HRSL | GPW | LandScan |
---|---|---|---|---|
Top-down/Bottom-up | Top-down | Top-down | Top-down | Top-down for global version, bottom-up for HD version |
Allocation Method | Dasymetric | Proportional/uniform | Proportional/uniform | Dasymetric |
Model | Random forest | Convolutional Neural Network (CNN) | Statistical mapping | Probabilistic weighting (global) |
Input Data | Census data, land cover, elevation data, road networks, waterways, settlement boundaries, protected areas, night lights, and facility locations | Census data, DigitalGlobe satellite images | Census data, administrative data, water mask | Census data, land cover, elevation data, road networks, night lights (global); ORNL Population Density Tables, land use, infrastructure data, and points of interest (POI) (HD) |
Characteristics | WorldPop | HRSL | GPW | LandScan |
---|---|---|---|---|
Latest available year | 2020 | 2020 public version, 2023 AWS version | 2020 | 2022 from GEE, 2023 from Explorer |
Last Update Date | 2020 | 2020 (public), 2023 (AWS) | 2018 | 2023 |
Update Frequency | Yearly for unconstrained; no updates for constrained | Quarterly | Every five years | Yearly |
Resolution | 100m, 1km | 30m | 1km | 1km (global), 100m (HD) |
Available Period | 2000-2020 (unconstrained); 2020 only (constrained) | 2020 (public); at least 2020-2024 (AWS) | 2000-2020 | 2000-2022 (global) |
Characteristics | WorldPop | HRSL | GPW | LandScan |
---|---|---|---|---|
Input comparability | Mixed input layers from various periods | Yes, global building identification | Yes, as only population count and masks are considered | Mixed input layers from various periods |
Coverage | Global | Global | Global | Global (global); selected countries (HD) |
UN Adjustment | Yes, for both constrained and unconstrained | Yes | Yes | No, but differences are minor |
Characteristics | WorldPop | HRSL | GPW | LandScan |
---|---|---|---|---|
Input for building identification | No | Yes | No | No (global); Yes (HD) |
Zero Allocation | Yes (constrained) | Yes | Yes | Yes |
Characteristics | WorldPop | HRSL | GPW | LandScan |
---|---|---|---|---|
Relationship to Census | Census disaggregation | Census disaggregation | Census disaggregation | Census disaggregation (global); no census (HD) |
Age and Sex Breakdown | Yes | Yes | Yes | ADM1 level only |
GEE Accessibility | Yes | Yes, 2020 only | Yes | Yes |
When selecting a gridded population product, the decision involves balancing the available data with the specific needs of the use case. The choice depends on a variety of factors, starting with the most critical requirements for the application. Below is a proposed decision-making process to help determine the best gridded population product for customized use cases.
Single Country, Non-Monitoring Settings
If you are working in a single country with no long-term monitoring goals, and you need the latest available data at a finer resolution, we recommend investing in bottom-up modeling based on recent surveys (such as DHS). Alternatively, consider LandScan HD if high-resolution age and sex breakdowns are not critical, or HRSL if you have access to AWS services.
Single Country, Long-Term Monitoring
For long-term monitoring in a single country, WorldPop, GPW, and LandScan are ideal, as they provide population rasters dating back to 2000 and ensure historical consistency. It’s important to note that GPW updates every five years, while WorldPop and LandScan update annually. The choice between these datasets will depend on the presence of sparsely populated or undeveloped regions. In such areas, a zero-allocation method with building identification is preferred over non-zero allocation (as shown in the maps below). Even when land cover data is used, some datasets may overlook sparse settlements and categorize these areas into larger land types like grassland or cropland. Another key consideration is resolution—WorldPop and HRSL are better suited when a finer resolution is needed.
Global/Regional Setting
When working on a global or regional scale, additional considerations beyond spatial and temporal needs become relevant, such as the ability to adjust data across countries after redistribution. This is viable on the user’s side if consistent datasets like UN population estimates are available. However, the population product should ideally offer global coverage, making LandScan HD less suitable for this purpose.
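A user-side adjustment of this kind usually amounts to a proportional rescaling. The sketch below assumes the gridded values for one country are held in a flat list and that a target national total (for example, a UN estimate) is known. The function name and figures are hypothetical.

```python
def rescale_to_national_total(cell_population, target_total):
    """Scale all cells by one factor so that they sum to the target total."""
    current_total = sum(cell_population)
    if current_total == 0:
        raise ValueError("cannot rescale a raster whose total is zero")
    factor = target_total / current_total
    return [value * factor for value in cell_population]


# Hypothetical country raster summing to 500, adjusted to a target of 550.
cells = [120.0, 0.0, 80.0, 300.0]
adjusted = rescale_to_national_total(cells, 550.0)
```

Because every cell is multiplied by the same factor, the spatial pattern within the country is unchanged and zero cells stay at zero; only the national total shifts.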
Practical Considerations
There are several practical factors that can influence the final decision. In humanitarian contexts, where age and sex breakdowns are crucial for analyzing vulnerable populations, WorldPop, HRSL, and GPW are ideal options. For countries with recent, reliable population data—particularly for policy planning, demographic analysis, and resource allocation—aligning to census data is an essential consideration. Lastly, the availability and accessibility of these datasets via platforms like Google Earth Engine (GEE) may also impact the decision.
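The decision process can also be expressed as a simple filter over a few attributes from the comparison tables. This is only a sketch: the attribute names and the `shortlist` helper are hypothetical, LandScan HD is left out, and LandScan's "ADM1 level only" breakdown is treated here as not meeting a gridded age and sex requirement.

```python
# Attributes transcribed from the comparison tables in this post.
# "LandScan" refers to the global version only.
PRODUCTS = {
    "WorldPop": {"resolution_m": 100, "history_since_2000": True, "age_sex": True},
    "HRSL": {"resolution_m": 30, "history_since_2000": False, "age_sex": True},
    "GPW": {"resolution_m": 1000, "history_since_2000": True, "age_sex": True},
    "LandScan": {"resolution_m": 1000, "history_since_2000": True, "age_sex": False},
}


def shortlist(max_resolution_m=None, needs_history=False, needs_age_sex=False):
    """Return the products that satisfy every stated requirement."""
    matches = []
    for name, attrs in PRODUCTS.items():
        if max_resolution_m is not None and attrs["resolution_m"] > max_resolution_m:
            continue
        if needs_history and not attrs["history_since_2000"]:
            continue
        if needs_age_sex and not attrs["age_sex"]:
            continue
        matches.append(name)
    return matches


# Long-term monitoring with age and sex breakdowns.
monitoring = shortlist(needs_history=True, needs_age_sex=True)

# Any product at 100 m resolution or finer.
fine_scale = shortlist(max_resolution_m=100)
```

A filter like this only narrows the field. Factors such as census alignment, platform access, and the presence of sparsely settled areas still need a case-by-case judgment.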