Astrape/docs/weather-source-data.md

# Weather Source Data

## Goal

This subsystem aggregates external weather forecasts and stores them in a clean database-ready shape.

Terminology:
- **forecast**: data from an external weather source, such as Open-Meteo
- **resolved truth**: observed weather for a time that has already happened
- **prediction**: an internal estimate produced by a future Astrape/Gibil model

This module should not produce predictions or confidence scores. A later `weather_predictor.py` subsystem can use this clean forecast database to produce predictions and confidence.

## Subsystem Boundary

Initial classes should stay narrowly scoped:

- `OpenMeteoClient`: fetch raw hourly forecast payloads
- `OpenMeteoParser`: convert API payloads into external forecast runs and points
- `WeatherBuilder`: normalize and select clean forecast records for database use
- `WeatherStore`: persist forecast points and resolved truth

These classes communicate through data models like `WeatherForecastRun`, `WeatherForecastPoint`, and `WeatherResolvedTruth`.
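As a rough sketch of what these models could look like, the dataclasses below borrow their field names from the storage fields listed later in this document. The exact shape is an assumption: the optional defaults, the `points` tuple on the run, and the use of frozen dataclasses are illustrative choices, not settled design.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class WeatherForecastPoint:
    """One external forecast for a single target hour."""
    issued_at: datetime
    target_at: datetime
    horizon_hours: int
    source: str
    temperature_c: Optional[float] = None
    shortwave_radiation_w_m2: Optional[float] = None
    cloud_cover_pct: Optional[float] = None


@dataclass(frozen=True)
class WeatherForecastRun:
    """One API pull: the fetch time plus the points it produced (assumed shape)."""
    issued_at: datetime
    source: str
    points: Tuple[WeatherForecastPoint, ...] = ()


@dataclass(frozen=True)
class WeatherResolvedTruth:
    """Observed weather for an hour that has already happened."""
    resolved_at: datetime
    source: str
    temperature_c: Optional[float] = None
    shortwave_radiation_w_m2: Optional[float] = None
```

Keeping the models immutable means the parser, builder, and store can pass them along without worrying about one stage mutating another stage's data.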

## Core Data Shape

Every weather API pull is a forecast run.

```text
issued_at = when the external forecast was fetched
target_at = the hour being forecast
horizon_hours = target_at - issued_at, expressed in hours
forecast_value = external forecast value for that target hour
```
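A minimal sketch of the horizon calculation follows. It assumes `issued_at` is floored to the start of its hour before subtracting, so that a fetch at 10:17 for a 12:00 target counts as a 2-hour horizon. That flooring rule and the helper names are assumptions, not something this document specifies.

```python
from datetime import datetime, timezone


def floor_to_hour(moment: datetime) -> datetime:
    """Drop minutes, seconds, and microseconds."""
    return moment.replace(minute=0, second=0, microsecond=0)


def horizon_hours(issued_at: datetime, target_at: datetime) -> int:
    """Whole hours between the floored fetch time and the target hour."""
    delta = target_at - floor_to_hour(issued_at)
    return int(delta.total_seconds() // 3600)


issued = datetime(2024, 6, 1, 10, 17, tzinfo=timezone.utc)
target = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
example_horizon = horizon_hours(issued, target)  # 2
```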

Later, when `target_at` is in the past, Astrape can attach resolved truth:

```text
resolved_at = the past hour the observation covers (matches a forecast's target_at)
truth = observed temperature / observed solar radiation
```

That creates rows future modules can use:

```text
target_at | resolved_truth | forecast_1h | forecast_2h | ... | forecast_48h
```

The future predictor can learn from those rows without needing to know anything about Open-Meteo payloads.
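The wide row shape above could be assembled from individual forecast points roughly as follows. This is a sketch for a single variable: the function name and the `(target_at, horizon_hours, value)` tuple input are hypothetical, and if two points share a target and horizon the later one simply overwrites the earlier one.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


def build_wide_rows(
    forecasts: Iterable[Tuple[datetime, int, float]],
    truth_by_hour: Dict[datetime, float],
) -> List[dict]:
    """Pivot (target_at, horizon_hours, value) points into one row per target hour."""
    by_target: Dict[datetime, Dict[str, float]] = defaultdict(dict)
    for target_at, horizon, value in forecasts:
        by_target[target_at][f"forecast_{horizon}h"] = value

    rows = []
    for target_at in sorted(by_target):
        # resolved_truth stays None until truth has been attached for that hour.
        row = {"target_at": target_at, "resolved_truth": truth_by_hour.get(target_at)}
        row.update(by_target[target_at])
        rows.append(row)
    return rows
```

Hours without resolved truth are kept with `resolved_truth` set to `None`, so a consumer can decide whether to drop or wait on them.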

## First Variables

Use Open-Meteo hourly forecast fields:

- `temperature_2m`
- `shortwave_radiation`
- `cloud_cover`

Open-Meteo documents `shortwave_radiation` as the average incoming solar radiation at the surface over the preceding hour, equivalent to global horizontal irradiance (GHI), measured in W/m². That makes it the right starting solar forecast variable for Astrape.
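The sketch below shows one way to request these three fields and map them onto Astrape column names. It assumes the public forecast endpoint at `https://api.open-meteo.com/v1/forecast`, the default units (°C, W/m², %), and a response whose `hourly` object holds parallel arrays keyed by field name alongside a `time` array. `FIELD_MAP`, `forecast_url`, and `parse_hourly` are hypothetical names, and no network call is made here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Open-Meteo hourly field -> Astrape column name (column names from Storage Shape).
FIELD_MAP = {
    "temperature_2m": "temperature_c",
    "shortwave_radiation": "shortwave_radiation_w_m2",
    "cloud_cover": "cloud_cover_pct",
}


def forecast_url(latitude: float, longitude: float) -> str:
    """Build the request URL; timestamps are requested in GMT."""
    query = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(FIELD_MAP),
        "timezone": "GMT",
    }
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(query, safe=",")


def parse_hourly(payload: dict) -> list:
    """Turn the parallel hourly arrays into one dict per target hour."""
    hourly = payload["hourly"]
    rows = []
    for i, stamp in enumerate(hourly["time"]):
        # Timestamps arrive without an offset; GMT was requested, so tag them as UTC.
        row = {"target_at": datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)}
        for api_name, column in FIELD_MAP.items():
            row[column] = hourly[api_name][i]
        rows.append(row)
    return rows
```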

## Storage Shape

Forecast points should be stored as individual rows.

Core forecast point fields:
- `issued_at`
- `target_at`
- `horizon_hours`
- `source`
- `temperature_c`
- `shortwave_radiation_w_m2`
- `cloud_cover_pct`

Resolved truth should be stored separately. For now, resolved truth comes from the Open-Meteo historical archive API.

Until archive data is available, Astrape can also store the current 0-hour Open-Meteo forecast as provisional truth with `source = open_meteo_zero_hour`. This gives the UI and future joins a near-real-time truth line. Archive truth remains separate with `source = open_meteo_archive`, so later modules can choose whether to prefer archive actuals over provisional 0-hour values.
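One way a later module might choose between the two truth sources is a simple priority list, sketched below. The source strings come from the paragraph above; the function name, the dict-based row shape, and the "archive wins" ordering are assumptions about how a consumer could behave.

```python
from typing import Dict, Iterable

# Earlier entries win when both sources cover the same hour.
SOURCE_PRIORITY = ("open_meteo_archive", "open_meteo_zero_hour")


def pick_truth(rows: Iterable[dict]) -> Dict[str, dict]:
    """Return one truth row per resolved_at, preferring archive over provisional."""
    rank = {source: i for i, source in enumerate(SOURCE_PRIORITY)}
    best: Dict[str, dict] = {}
    for row in rows:
        if row["source"] not in rank:
            continue  # ignore sources this sketch does not know about
        current = best.get(row["resolved_at"])
        if current is None or rank[row["source"]] < rank[current["source"]]:
            best[row["resolved_at"]] = row
    return best
```

Because both sources stay in storage, swapping the priority order or adding a new source only changes this selection step, not the stored rows.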

Core resolved truth fields:
- `resolved_at`
- `source`
- `temperature_c`
- `shortwave_radiation_w_m2`

The future predictor can join forecast points to truth by `target_at = resolved_at`.
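The two tables and the join can be sketched with SQLite as a stand-in. This document does not name a database engine, so SQLite, the table names, the ISO-8601 text timestamps, and the primary keys are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weather_forecast_point (
    issued_at TEXT NOT NULL,
    target_at TEXT NOT NULL,
    horizon_hours INTEGER NOT NULL,
    source TEXT NOT NULL,
    temperature_c REAL,
    shortwave_radiation_w_m2 REAL,
    cloud_cover_pct REAL,
    PRIMARY KEY (source, issued_at, target_at)
);
CREATE TABLE weather_resolved_truth (
    resolved_at TEXT NOT NULL,
    source TEXT NOT NULL,
    temperature_c REAL,
    shortwave_radiation_w_m2 REAL,
    PRIMARY KEY (source, resolved_at)
);
""")

# Illustrative values only, not real observations.
conn.execute(
    "INSERT INTO weather_forecast_point VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("2024-06-01T10:00:00+00:00", "2024-06-01T12:00:00+00:00", 2, "open_meteo", 21.4, 610.0, 20.0),
)
conn.execute(
    "INSERT INTO weather_resolved_truth VALUES (?, ?, ?, ?)",
    ("2024-06-01T12:00:00+00:00", "open_meteo_archive", 22.1, 655.0),
)

# Join forecast points to truth on target_at = resolved_at.
JOIN_SQL = """
SELECT f.target_at, f.horizon_hours, f.temperature_c AS forecast_c, t.temperature_c AS truth_c
FROM weather_forecast_point AS f
JOIN weather_resolved_truth AS t ON t.resolved_at = f.target_at
WHERE t.source = ?
ORDER BY f.target_at, f.horizon_hours
"""
joined = conn.execute(JOIN_SQL, ("open_meteo_archive",)).fetchall()
```

Filtering on `t.source` keeps provisional and archive truth from producing duplicate joined rows for the same hour.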

Open-Meteo archive data can lag behind current time depending on model availability, so the database daemon backfills a configurable historical window instead of assuming the last completed hour is immediately available.
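A backfill window could be computed along these lines. The 7-day default, the function names, and the choice to skip rows whose values are all missing are assumptions. The date strings use `YYYY-MM-DD` on the assumption that they would feed the archive API's `start_date` and `end_date` parameters.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

TRUTH_COLUMNS = ("temperature_c", "shortwave_radiation_w_m2")


def backfill_window(now: datetime, window_days: int = 7) -> Tuple[str, str]:
    """Return (start_date, end_date) covering the last window_days days up to today."""
    end = now.date()
    start = end - timedelta(days=window_days)
    return start.isoformat(), end.isoformat()


def usable_truth_rows(rows: Iterable[dict]) -> List[dict]:
    """Keep rows that carry at least one observed value (defensive filter)."""
    return [row for row in rows if any(row.get(col) is not None for col in TRUTH_COLUMNS)]
```

Re-requesting the whole window on each run and upserting lets hours that were missing on an earlier pass fill in once the archive catches up.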

## Visual Explorer

We should build a small web view for inspecting forecast history.

Useful first view:
- select a weather variable, such as temperature or shortwave radiation
- select forecast horizons, such as 2h and 4h
- overlay those horizon-specific external forecasts against resolved truth
- plot by `target_at`

Example:

```text
target_at on x-axis
temperature_c on y-axis
line 1: Open-Meteo forecast made 2 hours before target_at
line 2: Open-Meteo forecast made 4 hours before target_at
line 3: resolved truth
```
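The data behind that chart could be prepared as below, leaving the actual plotting to whatever front end is chosen. The function name and the dict-based rows are hypothetical; the column names follow the storage fields in this document.

```python
from typing import Dict, Iterable, List, Tuple


def explorer_series(
    forecast_rows: Iterable[dict],
    truth_rows: Iterable[dict],
    variable: str,
    horizons: Iterable[int],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (target_at, value) pairs into one line per horizon plus a truth line."""
    series: Dict[str, List[Tuple[str, float]]] = {f"forecast_{h}h": [] for h in horizons}
    for row in forecast_rows:
        key = f"forecast_{row['horizon_hours']}h"
        if key in series and row.get(variable) is not None:
            series[key].append((row["target_at"], row[variable]))

    series["resolved_truth"] = [
        (row["resolved_at"], row[variable])
        for row in truth_rows
        if row.get(variable) is not None
    ]
    for points in series.values():
        points.sort()  # order each line along the x-axis
    return series
```

Each returned list maps directly onto one line in the example above, with `target_at` on the x-axis.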

This visual layer should read from the cleaned weather database. It should not be part of the Open-Meteo client or parser.

## First Implementation Slice

1. Fetch one Open-Meteo-style hourly forecast run.
2. Parse it into forecast points.
3. Normalize the run through `WeatherBuilder`.
4. Store forecast points through `WeatherStore`.
5. Add resolved truth rows once `target_at` has passed, from the Open-Meteo archive or from provisional 0-hour values.
6. Build the visual explorer after forecast/truth storage exists.
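
Steps 1 through 4 could be wired together roughly as follows. The class names in the docstrings come from this document, but the method names (`fetch`, `parse`, `build`, `save_forecast_points`) and the Protocol-based interfaces are hypothetical placeholders for whatever the real classes expose.

```python
from typing import List, Protocol


class ForecastClient(Protocol):
    def fetch(self) -> dict: ...


class ForecastParser(Protocol):
    def parse(self, payload: dict) -> List[dict]: ...


class ForecastBuilder(Protocol):
    def build(self, points: List[dict]) -> List[dict]: ...


class ForecastStore(Protocol):
    def save_forecast_points(self, points: List[dict]) -> None: ...


def ingest_once(
    client: ForecastClient,
    parser: ForecastParser,
    builder: ForecastBuilder,
    store: ForecastStore,
) -> int:
    """Run one fetch -> parse -> normalize -> store pass; return the points stored."""
    payload = client.fetch()         # step 1: OpenMeteoClient
    points = parser.parse(payload)   # step 2: OpenMeteoParser
    clean = builder.build(points)    # step 3: WeatherBuilder
    store.save_forecast_points(clean)  # step 4: WeatherStore
    return len(clean)
```

Passing the four collaborators in as arguments keeps each one replaceable with a fake in tests, which matches the narrow scoping described under Subsystem Boundary.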