Weather Variables
The six registered weather variables: implementation details, statistical distributions, bias correction methods, thresholds, and backtest results.
The engine supports six registered weather variables through its polymorphic WeatherVariable architecture. Each variable has a dedicated statistical distribution, bias correction method, set of data sources, and threshold configuration.
Variable Summary
| Variable | Registry Name | Distribution | Bias Method | Exceedance | Unit | Status |
|---|---|---|---|---|---|---|
| Daily High Temp | temperature_high | Gaussian | Additive | above | °F | Live (Phase 2+) |
| Nighttime Low Temp | temperature_low | Gaussian | Additive | below | °F | Live (Phase 2+) |
| Rainfall | rainfall | Gamma (zero-inflated) | Multiplicative | above | inches | Backtested (Phase 4) |
| Wind Speed | wind_speed | Weibull | Multiplicative | above | mph | Backtested (Phase 4) |
| Wind Gust | wind_gust | Weibull | Multiplicative | above | mph | Backtested (Phase 4) |
| Irradiance | irradiance | Beta | Multiplicative | below | MJ/m² | Backtested (Phase 4) |
Temperature — Daily High (temperature_high)
The primary variable — used for ForecastEx validation and the foundation of the calibration proof.
Implementation
@register_variable("temperature_high")
class TemperatureHigh(WeatherVariable):
    config = VariableConfig(
        name="temperature_high",
        display_name="Daily High Temperature",
        unit="°F",
        distribution_type=DistributionType.GAUSSIAN,
        bias_method=BiasMethod.ADDITIVE,
        open_meteo_variable="temperature_2m_max",
        forecast_col="forecast_max_f",
        observed_col="max_temp_f",
        corrected_col="corrected_forecast_f",
        data_dir_name="",  # uses BACKTEST_DATA_DIR root
        threshold_integer=True,
        exceedance_semantic="above",
    )
Aliases: "high" resolves to "temperature_high" via the registry.
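A minimal sketch of how a decorator-based registry with alias resolution could work. Only `register_variable` and the aliases `"high"` and `"low"` come from this page; `_REGISTRY`, `_ALIASES`, and `get_variable` are hypothetical names, and the engine's actual implementation may differ.

```python
from typing import Callable, Dict

# Hypothetical module-level stores; the engine's real registry may differ.
_REGISTRY: Dict[str, type] = {}
_ALIASES: Dict[str, str] = {"high": "temperature_high", "low": "temperature_low"}


def register_variable(name: str) -> Callable[[type], type]:
    """Class decorator that records a variable class under its registry name."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"variable {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_variable(name: str) -> type:
    """Resolve an alias (if any) and return the registered class."""
    canonical = _ALIASES.get(name, name)
    try:
        return _REGISTRY[canonical]
    except KeyError:
        raise KeyError(f"unknown weather variable: {name!r}") from None


@register_variable("temperature_high")
class TemperatureHigh:
    """Stand-in class used only to exercise the registry sketch."""
```

With this shape, `get_variable("high")` and `get_variable("temperature_high")` return the same class, so callers never need to know which spelling was used.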
Data Sources
| Source | API | Variable | Notes |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | temperature_2m_max | Deterministic GFS; °C → °F conversion |
| Ensemble | Open-Meteo Ensemble API | temperature_2m_max | GFS 31-member + ECMWF IFS 51-member |
| Observed | IEM ASOS Daily API | max_temp_f | State-specific networks (e.g., NY_ASOS) |
| Settlement | NWS Climatological Report | Max observed temperature | Official ForecastEx settlement |
Bias Correction — Additive
bias = mean(observed - forecast) per (station, month)
corrected_forecast = raw_forecast + bias
Trained on Jan–Feb 2026; validated on March 2026. Fallback hierarchy: (station, month) → station overall → global.
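The additive correction and its fallback hierarchy can be sketched with the standard library alone. The function names, the tuple layout of `rows`, and the station IDs in the usage note are assumptions made for illustration, not the engine's API.

```python
from collections import defaultdict
from statistics import fmean


def train_additive_bias(rows):
    """Fit mean(observed - forecast) at three levels of granularity.

    rows: iterable of (station, month, forecast, observed) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_station_month = defaultdict(list)
    by_station = defaultdict(list)
    all_errors = []
    for station, month, forecast, observed in rows:
        error = observed - forecast
        by_station_month[(station, month)].append(error)
        by_station[station].append(error)
        all_errors.append(error)
    return {
        "station_month": {k: fmean(v) for k, v in by_station_month.items()},
        "station": {k: fmean(v) for k, v in by_station.items()},
        "global": fmean(all_errors) if all_errors else 0.0,
    }


def get_additive_bias(params, station, month):
    """Fallback: (station, month) -> station overall -> global."""
    if (station, month) in params["station_month"]:
        return params["station_month"][(station, month)]
    if station in params["station"]:
        return params["station"][station]
    return params["global"]


def bias_correct(params, station, month, raw_forecast):
    """corrected_forecast = raw_forecast + bias"""
    return raw_forecast + get_additive_bias(params, station, month)
```

For example, a station trained only on January and February (say `"KNYC"`) falls back to its station-wide bias when corrected for March, and an unseen station falls back to the global bias.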
Probability — Gaussian CDF
Backtest: P(T > threshold) = 1 - Phi((threshold - mu_corrected) / sigma), where Phi is the standard normal CDF and sigma is the per-city, per-month standard deviation of forecast errors.
Live: Ensemble member counting: P(T > threshold) = count(members > threshold) / n_members, clamped to [0.02, 0.98].
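Both probability paths are short enough to sketch directly. The function names below are illustrative and are not taken from the engine; the clamp bounds are the [0.02, 0.98] values stated above.

```python
import math


def gaussian_exceedance(threshold, mu_corrected, sigma):
    """Backtest path: P(T > threshold) = 1 - Phi((threshold - mu) / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (threshold - mu_corrected) / sigma
    # 1 - Phi(z) equals 0.5 * erfc(z / sqrt(2)) for the standard normal CDF.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def ensemble_exceedance(members, threshold, lo=0.02, hi=0.98):
    """Live path: fraction of members above the threshold, clamped to [lo, hi]."""
    if not members:
        raise ValueError("need at least one ensemble member")
    raw = sum(1 for m in members if m > threshold) / len(members)
    return min(hi, max(lo, raw))
```

The clamp keeps a unanimous ensemble from producing a probability of exactly 0 or 1, which would otherwise make any downstream log-odds or pricing step degenerate.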
Thresholds
Integer °F values: mean - STRIKES_PER_SIDE to mean + STRIKES_PER_SIDE (default: ±4 strikes = 9 total thresholds per city-day). Configured via THRESHOLD_RANGE_OFFSET = 10 for backtesting.
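As a small illustration of the ±4 strike ladder, the sketch below centres the thresholds on the rounded mean. The rounding step and the function name are assumptions; this page does not say how the engine derives the integer centre.

```python
from typing import List

STRIKES_PER_SIDE = 4  # default stated above


def generate_integer_thresholds(mean_f: float,
                                strikes_per_side: int = STRIKES_PER_SIDE) -> List[int]:
    """Return integer thresholds from centre - n to centre + n inclusive."""
    # Assumption: centre on the rounded mean (Python rounds halves to even).
    center = round(mean_f)
    return list(range(center - strikes_per_side, center + strikes_per_side + 1))
```

With the default of 4 strikes per side this yields 2 * 4 + 1 = 9 thresholds per city-day, matching the count given above.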
Backtest Results (Phase 0/1)
Baseline (Jan–Mar 2026, 900 city-days):
| Metric | Value |
|---|---|
| MAE | 1.24°F |
| Bias | -0.74°F (cold) |
| RMSE | 1.60°F |
Bias-corrected (March test set, 310 city-days):
| Metric | Raw | Corrected |
|---|---|---|
| MAE | 1.26°F | 1.06°F (-16%) |
| Bias | -0.74°F | 0.00°F (eliminated) |
| Brier (raw) | 0.0374 | — |
| Brier (calibrated) | — | 0.0349 |
Key sensitivities for product pricing:
| Region | Impact per +1°C | Source |
|---|---|---|
| Singapore | +3–4% electricity demand | Ang, Wang & Ma (2017) |
| India (above 30°C) | +11% power demand | Harish, Singh & Tongia (2020) |
| Shanghai (above 25°C) | +14.5% electricity use | Li, Pizer & Wu (2018) |
Temperature — Nighttime Low (temperature_low)
Implementation
@register_variable("temperature_low")
class TemperatureLow(WeatherVariable):
    config = VariableConfig(
        name="temperature_low",
        display_name="Nighttime Low Temperature",
        unit="°F",
        distribution_type=DistributionType.GAUSSIAN,
        bias_method=BiasMethod.ADDITIVE,
        open_meteo_variable="temperature_2m_min",
        forecast_col="forecast_min_f",
        observed_col="min_temp_f",
        corrected_col="corrected_forecast_f",
        data_dir_name="low",
        threshold_integer=True,
        exceedance_semantic="below",  # NLL: risk is cold events
    )
Aliases: "low" resolves to "temperature_low".
Data Sources
| Source | API | Variable |
|---|---|---|
| Forecast | Open-Meteo Historical Forecast | temperature_2m_min |
| Ensemble | Open-Meteo Ensemble API | temperature_2m_min |
| Observed | IEM ASOS Daily API | min_temp_f |
Exceedance Semantic — Below
For NLL contracts, the relevant question is P(T < threshold). The engine's exceedance semantic is "below" — meaning observed_exceeded = 1 when observed < threshold.
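A minimal sketch of how the observed outcome could be derived from the exceedance semantic. The `"below"` branch follows the definition above; the `"above"` branch (strictly greater than) is an assumption mirroring it, and the function name is hypothetical.

```python
def observed_exceeded(observed: float, threshold: float,
                      exceedance_semantic: str) -> int:
    """Return 1 if the event occurred under the given semantic, else 0."""
    if exceedance_semantic == "below":
        return int(observed < threshold)
    if exceedance_semantic == "above":
        # Assumption: strict inequality, mirroring the "below" case.
        return int(observed > threshold)
    raise ValueError(f"unknown exceedance semantic: {exceedance_semantic!r}")
```

An observation exactly equal to the threshold counts as not exceeded under both semantics in this sketch.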
Rainfall (rainfall)
The highest-impact variable for infrastructure and agriculture — but the most difficult to forecast accurately due to its zero-inflated, non-Gaussian distribution.
Implementation
@register_variable("rainfall")
class Rainfall(WeatherVariable):
    config = VariableConfig(
        name="rainfall",
        display_name="Daily Rainfall",
        unit="inches",
        distribution_type=DistributionType.GAMMA,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="precipitation_sum",
        forecast_col="forecast_precip_in",
        observed_col="precip_inches",
        corrected_col="corrected_precip_in",
        data_dir_name="rainfall",
        thresholds=[0.01, 0.1, 0.25, 0.5, 1.0, 2.0],
        threshold_integer=False,
        clamp_min=0.0,
        exceedance_semantic="above",
    )
Data Sources
| Source | API | Variable | Conversion |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | precipitation_sum | mm → inches (* 0.0393701) |
| Ensemble | Open-Meteo Ensemble API | precipitation_sum | mm → inches |
| Observed | IEM ASOS Daily API | precip_in | Already in inches |
IEM returns ALL columns regardless of the vars parameter. The actual column name for precipitation is precip_in (not precip). The engine renames this to precip_inches on ingestion.
Bias Correction — Multiplicative
Additive correction can produce negative rainfall (physically impossible). The engine uses multiplicative bias correction:
ratio = mean(observed_wet) / mean(forecast_wet) per (station, month)
corrected = raw * ratio, clamped >= 0
where "wet" = values > RAINFALL_ZERO_THRESHOLD (0.005 inches)
The multiplicative method also stores dry-day frequencies: p_zero_obs and p_zero_fct per (station, month) for use in the zero-inflated Gamma distribution.
Fallback hierarchy: (station, month) → station overall → global. If no valid ratio exists, falls back to 1.0 (no correction).
Implementation: train_multiplicative_bias() and get_multiplicative_ratio() in src/models/bias_correction.py.
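The sketch below illustrates the ratio, the dry-day frequency, and the fallback to 1.0. It is not the code in src/models/bias_correction.py: the helper names are hypothetical, and filtering forecast and observed wet values independently is an assumption.

```python
import math
from statistics import fmean

RAINFALL_ZERO_THRESHOLD = 0.005  # inches, as stated above


def wet_ratio(forecasts, observations, zero_threshold=RAINFALL_ZERO_THRESHOLD):
    """Return mean(observed_wet) / mean(forecast_wet), or None if undefined."""
    wet_fct = [f for f in forecasts if f > zero_threshold]
    wet_obs = [o for o in observations if o > zero_threshold]
    if not wet_fct or not wet_obs:
        return None
    return fmean(wet_obs) / fmean(wet_fct)


def dry_fraction(values, zero_threshold=RAINFALL_ZERO_THRESHOLD):
    """Share of days at or below the wet threshold (p_zero_obs or p_zero_fct)."""
    return sum(1 for v in values if v <= zero_threshold) / len(values)


def resolve_ratio(candidates):
    """Pick the first valid ratio in fallback order, else 1.0 (no correction).

    candidates: ratios ordered (station, month) -> station overall -> global.
    """
    for ratio in candidates:
        if ratio is not None and math.isfinite(ratio) and ratio > 0:
            return ratio
    return 1.0


def apply_multiplicative(raw, ratio):
    """corrected = raw * ratio, clamped at zero."""
    return max(0.0, raw * ratio)
```

Because the correction scales rather than shifts, a dry forecast of 0.0 inches stays at 0.0 regardless of the ratio.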
Probability — Zero-Inflated Gamma
Rainfall has a mixed discrete-continuous distribution: a point mass at zero (dry days) and a continuous Gamma distribution for wet days.
P(precip > T) = (1 - p_zero) * P(Gamma(alpha, beta) > T)
Where:
- p_zero = probability of zero rainfall (estimated from training data)
- alpha (shape) and beta (scale) = Gamma distribution parameters fitted via MLE
Live: Ensemble member counting (same as temperature): P(exceed) = count(members > threshold) / n_members.
Implementation: gamma_exceedance() in src/core/distribution.py, with fit_gamma_zero_inflated() for parameter estimation.
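To make the zero-inflated formula concrete, here is a standard-library sketch that evaluates the Gamma survival function with a power series for the regularized incomplete gamma function. It is not the engine's `gamma_exceedance()`; the function names are hypothetical, and in practice `scipy.stats.gamma.sf` would be the usual choice.

```python
import math


def regularized_lower_gamma(a: float, x: float, tol: float = 1e-12) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series.

    Adequate for the moderate x = T / beta values typical of daily rainfall;
    very large x would need a continued-fraction form instead.
    """
    if a <= 0:
        raise ValueError("shape must be positive")
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, 10_000):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return min(1.0, total * math.exp(log_prefactor))


def zero_inflated_gamma_exceedance(threshold, p_zero, alpha, beta):
    """P(precip > T) = (1 - p_zero) * P(Gamma(alpha, scale=beta) > T)."""
    if not 0.0 <= p_zero <= 1.0:
        raise ValueError("p_zero must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("scale must be positive")
    wet_survival = 1.0 - regularized_lower_gamma(alpha, threshold / beta)
    return (1.0 - p_zero) * wet_survival
```

With alpha = 1 the Gamma reduces to an exponential distribution, which gives a convenient closed-form check: the result equals (1 - p_zero) * exp(-T / beta).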
Thresholds
Fixed thresholds (not generated dynamically):
| Threshold | Inches | Description |
|---|---|---|
| Trace | 0.01 | Any measurable precipitation |
| Light | 0.10 | Light rain |
| Moderate | 0.25 | NWS "measurable" rain boundary |
| Heavy | 0.50 | Construction delay trigger |
| Very Heavy | 1.00 | Significant accumulation |
| Extreme | 2.00 | Flood risk |
Backtest Results (Phase 4)
Period: Jan–Mar 2026, 10 cities, GFS precipitation_sum vs IEM precip_in.
| Metric | Value |
|---|---|
| MAE | 0.036 inches |
| Bias | -0.0018 inches |
| Brier (raw) | 0.0403 |
| Brier (calibrated) | 0.0292 |
| Bias method | Multiplicative |
| Distribution | Gamma (zero-inflated) |
Wind Speed (wind_speed)
Implementation
@register_variable("wind_speed")
class WindSpeed(WeatherVariable):
    config = VariableConfig(
        name="wind_speed",
        display_name="Daily Max Wind Speed",
        unit="mph",
        distribution_type=DistributionType.WEIBULL,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="wind_speed_10m_max",
        forecast_col="forecast_wind_mph",
        observed_col="max_wind_mph",
        corrected_col="corrected_wind_mph",
        data_dir_name="wind",
        thresholds=[15, 20, 25, 30, 35, 40, 50],
        threshold_integer=True,
        clamp_min=0.0,
        exceedance_semantic="above",
    )
Data Sources
| Source | API | Variable | Conversion |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | wind_speed_10m_max | km/h → mph (* 0.621371) |
| Ensemble | Open-Meteo Ensemble API | wind_speed_10m_max | km/h → mph |
| Observed | IEM ASOS Daily API | max_wind_speed_kts | knots → mph (* 1.15078) |
IEM returns the column as max_wind_speed_kts (not max_sknt). The engine converts knots to mph on ingestion and renames to max_wind_mph.
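The unit conversions used on ingestion are simple scalings. The sketch below collects the factors quoted on this page; the function names are illustrative only.

```python
# Conversion factors as quoted in the data-source tables on this page.
MM_TO_INCHES = 0.0393701
KMH_TO_MPH = 0.621371
KNOTS_TO_MPH = 1.15078


def mm_to_inches(mm: float) -> float:
    """Open-Meteo precipitation_sum (mm) to inches."""
    return mm * MM_TO_INCHES


def kmh_to_mph(kmh: float) -> float:
    """Open-Meteo wind speed or gust (km/h) to mph."""
    return kmh * KMH_TO_MPH


def knots_to_mph(knots: float) -> float:
    """IEM max_wind_speed_kts or max_wind_gust_kts (knots) to mph."""
    return knots * KNOTS_TO_MPH
```

Converting both the forecast and the observation to the same unit before merging matters here, since any missed conversion shows up as an apparent systematic bias.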
Bias Correction — Multiplicative
Same as rainfall: corrected = raw * ratio, clamped >= 0.
Important: GFS 10m grid-average wind differs systematically from IEM station-level sustained wind observations. The backtest shows a +7.09 mph positive bias — GFS significantly overforecasts peak wind compared to station observations. The multiplicative correction ratio of ~1.61x addresses this systematic scale mismatch.
Probability — Weibull
Wind speed follows a Weibull distribution (bounded below at zero, positive skew):
P(wind > T) = 1 - CDF_Weibull(T, k, lambda)
Where k (shape) and lambda (scale) are fitted via scipy.stats.weibull_min.fit() with a moment-based fallback.
Implementation: weibull_exceedance() in src/core/distribution.py.
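For a two-parameter Weibull the exceedance probability has a closed form, exp(-(T / lambda)^k), so it can be sketched without any fitting library. The function below is illustrative and is not the engine's `weibull_exceedance()`; the parameters would come from a prior fit.

```python
import math


def weibull_exceedance_sketch(threshold: float, k: float, lam: float) -> float:
    """P(wind > T) = 1 - CDF_Weibull(T; k, lambda) = exp(-(T / lambda) ** k)."""
    if k <= 0 or lam <= 0:
        raise ValueError("shape k and scale lambda must be positive")
    if threshold <= 0:
        # The distribution is bounded below at zero, so exceedance is certain.
        return 1.0
    return math.exp(-((threshold / lam) ** k))
```

At T equal to the scale parameter the result is exp(-1), roughly 0.368, for any shape k, which makes a quick sanity check on fitted parameters.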
Thresholds
Fixed operational thresholds:
| Threshold (mph) | Description | Risk Application |
|---|---|---|
| 15 | Moderate wind | Begin monitoring |
| 20 | Fresh breeze | Secure loose materials |
| 25 | Strong wind | Curtail some outdoor work |
| 30 | High wind | Curtail crane operations |
| 35 | Very high wind | Construction site shutdown |
| 40 | Gale-force | Infrastructure risk |
| 50 | Storm-force | Emergency protocols |
Backtest Results (Phase 4)
| Metric | Value |
|---|---|
| MAE | 7.16 mph |
| Bias | +7.09 mph (GFS overforecasts station-level wind) |
| Brier (raw) | 0.0514 |
| Brier (calibrated) | 0.0462 |
| Bias method | Multiplicative |
| Distribution | Weibull |
The large positive bias reflects the fundamental mismatch between GFS 10m grid-average wind and IEM station-level sustained wind. Multiplicative correction significantly improves calibration.
Wind Gust (wind_gust)
Implementation
@register_variable("wind_gust")
class WindGust(WeatherVariable):
    config = VariableConfig(
        name="wind_gust",
        display_name="Daily Max Wind Gust",
        unit="mph",
        distribution_type=DistributionType.WEIBULL,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="wind_gusts_10m_max",
        forecast_col="forecast_gust_mph",
        observed_col="max_gust_mph",
        corrected_col="corrected_gust_mph",
        data_dir_name="wind_gust",
        thresholds=[25, 30, 40, 50, 60],
        threshold_integer=True,
        clamp_min=0.0,
        exceedance_semantic="above",
    )
Data Sources
| Source | Variable | Conversion |
|---|---|---|
| Forecast (Open-Meteo) | wind_gusts_10m_max | km/h → mph |
| Observed (IEM) | max_wind_gust_kts | knots → mph (* 1.15078) |
Thresholds
Higher thresholds than sustained wind: [25, 30, 40, 50, 60] mph. Gusts are short-lived peak values, so the daily maximum gust is at least as high as the daily maximum sustained wind.
Irradiance (irradiance)
Critical for solar energy — determines PPA performance and generation revenue. Uses a clear-sky index (CSI) approach where the observed irradiance is normalised by theoretical clear-sky irradiance to produce a value on [0, 1].
Implementation
@register_variable("irradiance")
class Irradiance(WeatherVariable):
    config = VariableConfig(
        name="irradiance",
        display_name="Daily Irradiance",
        unit="MJ/m²",
        distribution_type=DistributionType.BETA,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="shortwave_radiation_sum",
        forecast_col="forecast_irradiance_mj",
        observed_col="irradiance_mj_m2",
        corrected_col="corrected_irradiance_mj",
        data_dir_name="irradiance",
        thresholds=[],  # dynamic, based on CSI bins
        threshold_integer=False,
        clamp_min=0.0,
        exceedance_semantic="below",  # risk is LOW irradiance
    )
Data Sources
| Source | API | Variable | Notes |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | shortwave_radiation_sum | MJ/m² |
| Ensemble | Open-Meteo Ensemble API | shortwave_radiation_sum | MJ/m² |
| Observed | Open-Meteo ERA5 Archive API | shortwave_radiation_sum | ERA5 reanalysis — independent of GFS/ECMWF forecasts |
IEM has no solar radiation data. The engine uses ERA5 reanalysis (archive-api.open-meteo.com/v1/archive) as independent ground truth for irradiance. ERA5 is a global atmospheric reanalysis that assimilates millions of observations — its irradiance values are independent of the GFS/ECMWF forecasts being evaluated.
Clear-Sky Index (CSI)
Rather than working with raw irradiance values (which vary by latitude, season, and day length), the engine normalises to a clear-sky index:
CSI = actual_irradiance / clear_sky_irradiance
Where clear_sky_irradiance is computed from solar geometry using the Angstrom-Prescott model:
- Compute solar declination from day of year
- Compute sunset hour angle from latitude and declination
- Compute extraterrestrial radiation (Ra) using solar constant and orbit eccentricity
- Apply Angstrom-Prescott coefficients (a=0.25, b=0.50) for clear-sky GHI
Implementation: clear_sky_ghi_daily() in src/variables/irradiance/clear_sky.py.
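The four steps above can be sketched with the daily extraterrestrial radiation formulas from FAO-56. This is an illustration, not the code in clear_sky.py: the function names are hypothetical, the FAO-56 constants are an assumption about what the engine uses, and applying the coefficients as (a + b) * Ra is the FAO-56 clear-sky convention.

```python
import math

GSC = 0.0820  # solar constant in MJ m^-2 min^-1 (FAO-56 value, assumed here)


def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra in MJ/m^2/day."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)       # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)   # solar declination (rad)
    # Clamp so polar day and polar night do not break acos.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(cos_ws)                   # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def clear_sky_ghi_daily_sketch(lat_deg: float, day_of_year: int,
                               a: float = 0.25, b: float = 0.50) -> float:
    """Clear-sky daily GHI as (a + b) * Ra, in MJ/m^2/day."""
    return (a + b) * extraterrestrial_radiation(lat_deg, day_of_year)


def clear_sky_index(actual_mj: float, clear_sky_mj: float) -> float:
    """CSI = actual / clear-sky, clamped to [0, 1] (the clamp is an assumption)."""
    if clear_sky_mj <= 0:
        return 0.0
    return min(1.0, max(0.0, actual_mj / clear_sky_mj))
```

As a check against the worked example in FAO-56, latitude 20°S on day 246 gives Ra of about 32.2 MJ/m²/day, so the clear-sky value with a + b = 0.75 is about 24.1 MJ/m²/day.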
CSI ∈ [0, 1]:
- CSI ≈ 1.0 → clear sky, maximum solar generation
- CSI ≈ 0.5 → partly cloudy
- CSI ≈ 0.2 → overcast, minimal generation
Exceedance Semantic — Below
For irradiance, the relevant risk question is P(CSI < threshold) — the probability that irradiance falls below a given level. Low CSI means poor solar generation. The engine's exceedance semantic is "below".
Bias Correction — Multiplicative
Same as rainfall and wind: corrected = raw * ratio, clamped >= 0. Applied to raw MJ/m² values.
Probability — Beta Distribution
The clear-sky index is naturally bounded on [0, 1], making the Beta distribution the appropriate choice:
P(CSI < T) = CDF_Beta(T, alpha, beta)
Where alpha and beta are fitted from training data via scipy.stats.beta.fit() with a moment-based fallback.
Implementation: beta_exceedance() in src/core/distribution.py.
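The moment-based fallback mentioned above can be illustrated with the standard method-of-moments estimator for the Beta distribution. This is a sketch of that textbook estimator, not necessarily the engine's fallback; the function name is hypothetical.

```python
from statistics import fmean, variance


def beta_method_of_moments(csi_values):
    """Estimate Beta(alpha, beta) from the sample mean and variance.

    Uses alpha = m * c and beta = (1 - m) * c with c = m * (1 - m) / v - 1,
    which is valid only when 0 < v < m * (1 - m).
    """
    m = fmean(csi_values)
    v = variance(csi_values)  # sample variance; needs at least two values
    if not 0.0 < m < 1.0:
        raise ValueError("mean CSI must lie strictly between 0 and 1")
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("variance is incompatible with a Beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common
```

The resulting pair can then be passed to a Beta CDF (for example `scipy.stats.beta.cdf(T, alpha, beta)`) to evaluate P(CSI < T).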
Thresholds
Dynamic thresholds based on clear-sky index bins:
| CSI Threshold | Description |
|---|---|
| 0.2 | Very low generation (overcast) |
| 0.4 | Significant shortfall |
| 0.6 | Below-average generation |
| 0.8 | Near-clear conditions |
| 1.0 | Clear sky (theoretical max) |
Backtest Results (Phase 4)
| Metric | Value |
|---|---|
| MAE | 1.49 MJ/m² |
| Bias | -0.13 MJ/m² |
| Brier (raw) | 0.0530 |
| Brier (calibrated) | 0.0498 |
| Bias method | Multiplicative |
| Distribution | Beta (on CSI) |
Generic Backtest Pipeline
All variables use a unified backtest pipeline via scripts/run_phase_generic.py:
python scripts/run_phase_generic.py --variable rainfall --phase 0+1
python scripts/run_phase_generic.py --variable wind_speed --phase 0+1
python scripts/run_phase_generic.py --variable irradiance --phase 0+1
Phase 0 (Baseline):
- Download historical forecasts via var.ingest_historical() (Open-Meteo)
- Download observations via var.observe() (IEM or ERA5)
- Merge on [station, date]
- Compute MAE, bias, RMSE
Phase 1 (Calibration):
- Split train/test (Jan–Feb / March)
- Train bias parameters (additive or multiplicative based on var.config.bias_method)
- Generate backtest probabilities via generate_backtest_probabilities_generic():
  - Apply bias correction: var.bias_correct()
  - Generate thresholds: var.generate_thresholds()
  - Compute exceedance probability: var.exceedance_probability()
  - Determine observed outcome based on var.config.exceedance_semantic
- Train isotonic calibration
- Compute Brier scores (raw vs calibrated)
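The last step of Phase 1 scores probabilities against binary outcomes. A minimal sketch of the Brier score follows; the function name is illustrative, and the same function would be called once with raw and once with calibrated probabilities.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probability and 0/1 outcome."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)
```

Lower is better: a forecast that always says 0.5 scores 0.25, and a perfect forecast scores 0.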
The generic pipeline dispatches everything to the variable's methods — no variable-specific code exists in the orchestrator.
Sigma Computation for Non-Gaussian Variables
For temperature (Gaussian), sigma comes from the trained bias parameters. For non-Gaussian variables (rainfall, wind, irradiance), the generic backtest pipeline pre-computes a residual sigma per (station, month):
for (station, month), sub in df.groupby(["station", "month"]):
    residuals = sub[obs_col] - sub[fct_col]
    sigma = residuals.std(ddof=1)
This sigma is used as the spread parameter in the distribution-specific CDF functions. Cached in _sigma_cache for efficiency.
Variable Priority & Product Readiness
| Variable | Engine Status | Product Readiness | ForecastEx | Risk API |
|---|---|---|---|---|
| Temperature (DH) | Live — Phase 2 signal generation active | Tier 1–3 ready | Yes (DH contracts) | N/A |
| Temperature (NLL) | Live — NLL pipeline active alongside DH | Tier 1–3 ready | Yes (NLL contracts) | N/A |
| Rainfall | Backtested — Phase 4 calibration complete | Tier 1–2 near-term | No contracts exist | Construction delay |
| Wind Speed | Backtested — Phase 4 calibration complete | Tier 1 first | No contracts exist | Operational risk |
| Wind Gust | Backtested — Phase 4 calibration complete | Tier 1 first | No contracts exist | Operational risk |
| Irradiance | Backtested — Phase 4 calibration complete | Solar-specific Tier 1 | No contracts exist | Solar shortfall |