aeolus.portals

Functions for working with global data portals (OpenAQ, PurpleAir).

Functions

find_sites(portal, **filters)

Search for monitoring locations in a portal.

Portals are global data platforms that aggregate data from multiple sources and can hold millions of locations, so every search must include at least one filter.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| portal | str | Portal name ("OPENAQ", "PURPLEAIR", etc.) | required |
| **filters | | Portal-specific search filters. At least one is required: calling with none raises ValueError. | {} |

Common filters:

- country="GB": filter by country code
- bbox=(min_lon, min_lat, max_lon, max_lat): bounding box (GeoJSON/shapely order)
- city="London": city name (OpenAQ)
- sensor_type="SDS011": sensor type (Sensor.Community)
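Because the bbox filter expects longitude first, it can help to check a tuple before passing it on. The following is a minimal sketch, not part of aeolus: `validate_bbox` is a hypothetical helper, and it deliberately rejects boxes that cross the antimeridian.

```python
def validate_bbox(bbox):
    """Check a (min_lon, min_lat, max_lon, max_lat) tuple and return it as floats.

    Hypothetical helper for illustration; aeolus may validate differently.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox)
    if not (-180.0 <= min_lon <= 180.0 and -180.0 <= max_lon <= 180.0):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90.0 <= min_lat <= 90.0 and -90.0 <= max_lat <= 90.0):
        raise ValueError("latitudes must lie in [-90, 90]")
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("min values must be smaller than max values")
    return (min_lon, min_lat, max_lon, max_lat)


# Greater London, using the same values as the documented example
london_bbox = validate_bbox((-0.51, 51.28, 0.34, 51.69))
```

A range check like this only catches a latitude/longitude swap when the longitude falls outside [-90, 90]; for a box near the prime meridian a swapped tuple still passes, so the order needs to be checked by eye as well.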

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with location metadata |

The metadata includes:

- site_code: Unique location identifier (use for download)
- site_name: Human-readable name
- latitude: Location latitude
- longitude: Location longitude
- source_network: Original data source

Raises:

| Type | Description |
| --- | --- |
| ValueError | If portal is unknown, is not a portal type, is called with no filters, or does not support location search |

Examples:

>>> # Find OpenAQ locations in UK
>>> locations = aeolus.portals.find_sites("OPENAQ", country="GB")
>>>
>>> # Find locations in bounding box (min_lon, min_lat, max_lon, max_lat)
>>> locations = aeolus.portals.find_sites(
...     "OPENAQ",
...     bbox=(-0.51, 51.28, 0.34, 51.69)
... )
>>>
>>> # Extract site codes for download
>>> site_codes = locations["site_code"].tolist()
Source code in src/aeolus/portals/api.py
def find_sites(portal: str, **filters) -> pd.DataFrame:
    """
    Search for monitoring locations in a portal.

    Portals are global data platforms aggregating from multiple sources with
    potentially millions of locations. They require filters to narrow searches.

    Args:
        portal: Portal name ("OPENAQ", "PURPLEAIR", etc.)
        **filters: Portal-specific search filters (REQUIRED)
            Common filters:
                - country="GB" - Filter by country code
                - bbox=(min_lon, min_lat, max_lon, max_lat) - Bounding box
                  (GeoJSON/shapely format)
                - city="London" - City name (OpenAQ)
                - sensor_type="SDS011" - Sensor type (Sensor.Community)

    Returns:
        DataFrame with location metadata including:
            - site_code: Unique location identifier (use for download)
            - site_name: Human-readable name
            - latitude: Location latitude
            - longitude: Location longitude
            - source_network: Original data source

    Raises:
        ValueError: If portal is unknown, not a portal type, or no filters provided

    Examples:
        >>> # Find OpenAQ locations in UK
        >>> locations = aeolus.portals.find_sites("OPENAQ", country="GB")
        >>>
        >>> # Find locations in bounding box (min_lon, min_lat, max_lon, max_lat)
        >>> locations = aeolus.portals.find_sites(
        ...     "OPENAQ",
        ...     bbox=(-0.51, 51.28, 0.34, 51.69)
        ... )
        >>>
        >>> # Extract site codes for download
        >>> site_codes = locations["site_code"].tolist()
    """
    source_spec = get_source(portal)

    if not source_spec:
        raise ValueError(f"Unknown portal: {portal}")

    # Verify it's a portal
    source_type = source_spec.get("type", "portal")
    if source_type != "portal":
        raise ValueError(
            f"{portal} is a {source_type}, not a portal.\n"
            f"Use aeolus.networks.get_metadata() for networks."
        )

    # Portals require filters
    if not filters:
        raise ValueError(
            f"{portal} requires search filters.\n\n"
            f"Examples:\n"
            f"  aeolus.portals.find_sites('{portal}', country='GB')\n"
            f"  aeolus.portals.find_sites('{portal}', bbox=(min_lon, min_lat, max_lon, max_lat))\n\n"
            f"See documentation for available filters."
        )

    # Get search function
    search_func = source_spec.get("search") or source_spec.get("fetch_metadata")
    if not search_func:
        raise ValueError(f"Portal {portal} does not support location search")

    return search_func(**filters)
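The order of the checks above determines which ValueError a caller sees. The sketch below replays that order against a stand-in registry so it can run without aeolus installed. `DEMO_SOURCES`, `_demo_search`, `find_sites_sketch` and `outcome` are all hypothetical names, and the error messages are shortened.

```python
def _demo_search(**filters):
    # Hypothetical search backend: reports which filters it received.
    return [{"site_code": "demo-1", "matched_on": sorted(filters)}]


# Hypothetical stand-in for the registry; not the real aeolus SOURCES.
DEMO_SOURCES = {
    "DEMO_PORTAL": {"type": "portal", "search": _demo_search},
    "DEMO_NETWORK": {"type": "network"},
    "DEMO_NO_SEARCH": {"type": "portal"},
}


def find_sites_sketch(portal, **filters):
    """Same sequence of checks as find_sites, against the demo registry."""
    spec = DEMO_SOURCES.get(portal)
    if not spec:
        raise ValueError(f"Unknown portal: {portal}")
    source_type = spec.get("type", "portal")
    if source_type != "portal":
        raise ValueError(f"{portal} is a {source_type}, not a portal.")
    if not filters:
        raise ValueError(f"{portal} requires search filters.")
    search = spec.get("search") or spec.get("fetch_metadata")
    if not search:
        raise ValueError(f"Portal {portal} does not support location search")
    return search(**filters)


def outcome(portal, **filters):
    """Return the result, or the error text if the call is rejected."""
    try:
        return find_sites_sketch(portal, **filters)
    except ValueError as exc:
        return f"ValueError: {exc}"
```

Note that the filter check runs before the search-capability check, so `outcome("DEMO_NO_SEARCH")` reports missing filters rather than missing search support.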

download(portal, sites, start_date, end_date)

Download air quality data from a portal.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| portal | str | Portal name ("OPENAQ", "PURPLEAIR", etc.) | required |
| sites | list[str] | List of site codes (obtained from find_sites()) | required |
| start_date | datetime | Start of date range (inclusive) | required |
| end_date | datetime | End of date range (inclusive) | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | DataFrame with standardised schema |

The schema columns are:

- site_code: Location identifier
- date_time: Measurement timestamp
- measurand: Pollutant measured (e.g., "NO2", "PM2.5")
- value: Measured value
- units: Units of measurement
- source_network: Original data source
- ratification: Data quality flag
- created_at: When record was fetched
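The standardised schema is a long format: one row per site, timestamp and measurand. The sketch below shows one way to regroup such rows so each site and timestamp carries all of its pollutants. It uses plain dicts so it runs without pandas; the `to_wide` helper is hypothetical and the sample records are made-up illustrative values, not real measurements.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows using a subset of the documented columns.
records = [
    {"site_code": "demo-1", "date_time": datetime(2024, 1, 1, 0),
     "measurand": "NO2", "value": 21.0, "units": "ug/m3"},
    {"site_code": "demo-1", "date_time": datetime(2024, 1, 1, 0),
     "measurand": "PM2.5", "value": 8.5, "units": "ug/m3"},
    {"site_code": "demo-1", "date_time": datetime(2024, 1, 1, 1),
     "measurand": "NO2", "value": 19.0, "units": "ug/m3"},
]


def to_wide(rows):
    """Group long-format rows into {(site_code, date_time): {measurand: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["site_code"], row["date_time"])
        wide[key][row["measurand"]] = row["value"]
    return dict(wide)


wide = to_wide(records)
```

With a real DataFrame, `data.to_dict("records")` produces rows in this shape, and pandas' `pivot_table` with `columns="measurand"` is a more direct route to the same layout.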

Raises:

| Type | Description |
| --- | --- |
| ValueError | If portal is unknown, is not a portal type, or does not support data downloading |

Examples:

>>> from datetime import datetime
>>>
>>> # Step 1: Find locations
>>> locations = aeolus.portals.find_sites("OPENAQ", country="GB")
>>> site_codes = locations["site_code"].head(5).tolist()
>>>
>>> # Step 2: Download data
>>> data = aeolus.portals.download(
...     "OPENAQ",
...     site_codes,
...     datetime(2024, 1, 1),
...     datetime(2024, 1, 31)
... )
Source code in src/aeolus/portals/api.py
def download(
    portal: str,
    sites: list[str],
    start_date: datetime,
    end_date: datetime,
) -> pd.DataFrame:
    """
    Download air quality data from a portal.

    Args:
        portal: Portal name ("OPENAQ", "PURPLEAIR", etc.)
        sites: List of site codes (obtained from find_sites())
        start_date: Start of date range (inclusive)
        end_date: End of date range (inclusive)

    Returns:
        DataFrame with standardised schema:
            - site_code: Location identifier
            - date_time: Measurement timestamp
            - measurand: Pollutant measured (e.g., "NO2", "PM2.5")
            - value: Measured value
            - units: Units of measurement
            - source_network: Original data source
            - ratification: Data quality flag
            - created_at: When record was fetched

    Raises:
        ValueError: If portal is unknown or not a portal type

    Examples:
        >>> from datetime import datetime
        >>>
        >>> # Step 1: Find locations
        >>> locations = aeolus.portals.find_sites("OPENAQ", country="GB")
        >>> site_codes = locations["site_code"].head(5).tolist()
        >>>
        >>> # Step 2: Download data
        >>> data = aeolus.portals.download(
        ...     "OPENAQ",
        ...     site_codes,
        ...     datetime(2024, 1, 1),
        ...     datetime(2024, 1, 31)
        ... )
    """
    source_spec = get_source(portal)

    if not source_spec:
        raise ValueError(f"Unknown portal: {portal}")

    # Verify it's a portal
    source_type = source_spec.get("type", "portal")
    if source_type != "portal":
        raise ValueError(
            f"{portal} is a {source_type}, not a portal.\n"
            f"Use aeolus.networks.download() for networks."
        )

    # Get data fetcher
    fetcher = source_spec.get("fetch_data")
    if not fetcher:
        raise ValueError(f"Portal {portal} does not support data downloading")

    return fetcher(sites, start_date, end_date)

list_portals()

List all available portals.

Returns:

| Type | Description |
| --- | --- |
| list[str] | List of portal names |

Examples:

>>> portals = aeolus.portals.list_portals()
>>> print(portals)
['OPENAQ', 'PURPLEAIR']
Source code in src/aeolus/portals/api.py
def list_portals() -> list[str]:
    """
    List all available portals.

    Returns:
        List of portal names

    Examples:
        >>> portals = aeolus.portals.list_portals()
        >>> print(portals)
        ['OPENAQ', 'PURPLEAIR']
    """
    from ..registry import SOURCES

    return [name for name, spec in SOURCES.items() if spec.get("type") == "portal"]
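One detail in the source shown on this page: `list_portals` compares `spec.get("type")` with `"portal"`, while `find_sites` and `download` use `spec.get("type", "portal")`. The sketch below illustrates what that difference would mean for a registry entry without a `type` key. `DEMO_REGISTRY` and both helper functions are hypothetical, and whether any real entry lacks the key is not shown here.

```python
# Hypothetical registry; the "UNTYPED" entry has no "type" key.
DEMO_REGISTRY = {
    "TYPED_PORTAL": {"type": "portal"},
    "TYPED_NETWORK": {"type": "network"},
    "UNTYPED": {"description": "entry without a type key"},
}


def listed_as_portal(name):
    """Mirrors the test in list_portals: a missing type is not a portal."""
    return DEMO_REGISTRY[name].get("type") == "portal"


def accepted_as_portal(name):
    """Mirrors the test in find_sites/download: a missing type defaults to portal."""
    return DEMO_REGISTRY[name].get("type", "portal") == "portal"


listed = [name for name in DEMO_REGISTRY if listed_as_portal(name)]
accepted = [name for name in DEMO_REGISTRY if accepted_as_portal(name)]
```

Under these assumptions an untyped entry would be usable through `find_sites` yet absent from the `list_portals` output.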

Usage Examples

Find Sites

import aeolus

# Find OpenAQ sites in a country
uk_sites = aeolus.portals.find_sites("OPENAQ", country="GB")

# Find sites within a bounding box
# bbox format: (min_lon, min_lat, max_lon, max_lat) - same as GeoJSON/shapely
london_sites = aeolus.portals.find_sites(
    "OPENAQ",
    bbox=(-0.51, 51.28, 0.34, 51.69)
)

# Find PurpleAir sensors in an area (using standard bbox format)
purpleair_sites = aeolus.portals.find_sites(
    "PURPLEAIR",
    bbox=(-0.5, 51.3, 0.3, 51.7)
)
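When only a centre point is known, a bounding box in the same (min_lon, min_lat, max_lon, max_lat) order can be derived from a radius. This is a rough sketch, not an aeolus function: `bbox_around` is a hypothetical helper that uses a flat approximation of about 111.32 km per degree of latitude, and it is not suitable near the poles or across the antimeridian.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def bbox_around(lat, lon, radius_km):
    """Return (min_lon, min_lat, max_lon, max_lat) roughly enclosing a circle.

    Hypothetical helper for illustration only.
    """
    delta_lat = radius_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    delta_lon = radius_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lon - delta_lon, lat - delta_lat, lon + delta_lon, lat + delta_lat)


# Roughly 10 km around central London
central_london = bbox_around(51.5, -0.12, 10)
```

The resulting tuple can be passed as the `bbox` filter in the calls above.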

Download Portal Data

import aeolus
from datetime import datetime

# First get site codes from find_sites
locations = aeolus.portals.find_sites("OPENAQ", country="GB")
site_codes = locations["site_code"].tolist()[:5]

# Download using sites parameter (consistent with networks API)
data = aeolus.portals.download(
    portal="OPENAQ",
    sites=site_codes,
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 1, 31)
)
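For longer periods it may be practical to request data in smaller windows. The sketch below splits an inclusive date range into consecutive, non-overlapping windows. `date_windows` is a hypothetical helper, not part of aeolus, and it assumes timestamps are no finer than one second, since each window ends one second before the next begins.

```python
from datetime import datetime, timedelta


def date_windows(start, end, days=7):
    """Yield (window_start, window_end) pairs covering [start, end] inclusive."""
    if days < 1:
        raise ValueError("days must be at least 1")
    one_second = timedelta(seconds=1)
    cursor = start
    while cursor <= end:
        # End each window one second before the next one starts.
        window_end = min(cursor + timedelta(days=days) - one_second, end)
        yield cursor, window_end
        cursor = window_end + one_second


windows = list(date_windows(datetime(2024, 1, 1), datetime(2024, 1, 31), days=10))

# Each pair could then be passed to the download call shown above, e.g.:
# for window_start, window_end in windows:
#     aeolus.portals.download("OPENAQ", site_codes, window_start, window_end)
```

Whether a portal benefits from windowed requests depends on its rate limits and page sizes, which are not covered here.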

List Available Portals

portals = aeolus.portals.list_portals()
print(portals)
# ['OPENAQ', 'PURPLEAIR']

Supported Portals

| Portal | Description | API Key |
| --- | --- | --- |
| OPENAQ | Global air quality data platform | Yes (OPENAQ_API_KEY) |
| PURPLEAIR | Global low-cost sensor network (30,000+) | Yes (PURPLEAIR_API_KEY) |
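Since both portals need an API key, a quick check before the first request can save a failed call. The sketch below assumes the keys are read from environment variables with the names in the table. That lookup mechanism is an assumption, and `REQUIRED_KEYS` and `missing_api_keys` are hypothetical names.

```python
import os

# Key names taken from the table above; reading them from the environment
# is an assumption about how aeolus is configured.
REQUIRED_KEYS = {
    "OPENAQ": "OPENAQ_API_KEY",
    "PURPLEAIR": "PURPLEAIR_API_KEY",
}


def missing_api_keys(portals, environ=None):
    """Return the key names that are unset or empty for the given portals."""
    env = os.environ if environ is None else environ
    return [
        REQUIRED_KEYS[portal]
        for portal in portals
        if portal in REQUIRED_KEYS and not env.get(REQUIRED_KEYS[portal])
    ]


# Example with an explicit mapping instead of the real environment
example_missing = missing_api_keys(
    ["OPENAQ", "PURPLEAIR"],
    environ={"OPENAQ_API_KEY": "placeholder-value"},
)
```

Calling `missing_api_keys(["OPENAQ"])` without the `environ` argument checks the current process environment instead.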