Weather Variables

The engine supports six registered weather variables through its polymorphic WeatherVariable architecture. Each variable has a dedicated statistical distribution, bias correction method, set of data sources, and threshold configuration.

Variable Summary

Variable	Registry Name	Distribution	Bias Method	Exceedance	Unit	Status
Daily High Temp	`temperature_high`	Gaussian	Additive	above	°F	Live (Phase 2+)
Nighttime Low Temp	`temperature_low`	Gaussian	Additive	below	°F	Live (Phase 2+)
Rainfall	`rainfall`	Gamma (zero-inflated)	Multiplicative	above	inches	Backtested (Phase 4)
Wind Speed	`wind_speed`	Weibull	Multiplicative	above	mph	Backtested (Phase 4)
Wind Gust	`wind_gust`	Weibull	Multiplicative	above	mph	Backtested (Phase 4)
Irradiance	`irradiance`	Beta	Multiplicative	below	MJ/m²	Backtested (Phase 4)

Temperature — Daily High (`temperature_high`)

Daily high temperature is the primary variable: it is used for ForecastEx validation and is the foundation of the calibration proof.

Implementation

@register_variable("temperature_high")
class TemperatureHigh(WeatherVariable):
    config = VariableConfig(
        name="temperature_high",
        display_name="Daily High Temperature",
        unit="°F",
        distribution_type=DistributionType.GAUSSIAN,
        bias_method=BiasMethod.ADDITIVE,
        open_meteo_variable="temperature_2m_max",
        forecast_col="forecast_max_f",
        observed_col="max_temp_f",
        corrected_col="corrected_forecast_f",
        data_dir_name="",  # uses BACKTEST_DATA_DIR root
        threshold_integer=True,
        exceedance_semantic="above",
    )

Aliases: "high" resolves to "temperature_high" via the registry.
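The registry internals are not shown here. As a rough illustration only, a decorator-based registry with alias resolution could look like the sketch below. The names `_REGISTRY`, `_ALIASES`, and `resolve_variable` are hypothetical, and the stand-in class omits the real `WeatherVariable` base and `VariableConfig`.

```python
from typing import Callable, Dict

# Hypothetical module-level storage; the engine's real registry may differ.
_REGISTRY: Dict[str, type] = {}
_ALIASES: Dict[str, str] = {"high": "temperature_high", "low": "temperature_low"}


def register_variable(name: str) -> Callable[[type], type]:
    """Class decorator that records a variable class under its registry name."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"variable {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def resolve_variable(name: str) -> type:
    """Look up a variable class by registry name or alias (hypothetical helper)."""
    canonical = _ALIASES.get(name, name)
    try:
        return _REGISTRY[canonical]
    except KeyError:
        raise KeyError(f"unknown weather variable: {name!r}") from None


@register_variable("temperature_high")
class TemperatureHigh:
    """Stand-in for the real WeatherVariable subclass."""
```

With this shape, `resolve_variable("high")` and `resolve_variable("temperature_high")` return the same class.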

Data Sources

Source	API	Variable	Notes
Forecast	Open-Meteo Historical Forecast	`temperature_2m_max`	Deterministic GFS; °C → °F conversion
Ensemble	Open-Meteo Ensemble API	`temperature_2m_max`	GFS 31-member + ECMWF IFS 51-member
Observed	IEM ASOS Daily API	`max_temp_f`	State-specific networks (e.g., `NY_ASOS`)
Settlement	NWS Climatological Report	Max observed temperature	Official ForecastEx settlement

Bias Correction — Additive

bias = mean(observed - forecast)   per (station, month)
corrected_forecast = raw_forecast + bias

Trained on Jan–Feb 2026; validated on March 2026. Fallback hierarchy: (station, month) → station overall → global.
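A minimal sketch of the additive scheme and its fallback hierarchy, assuming a pandas DataFrame with `station` and `month` columns. The function names `train_additive_bias` and `get_additive_bias` are illustrative and are not taken from the engine.

```python
import pandas as pd


def train_additive_bias(df: pd.DataFrame, fct_col: str, obs_col: str) -> dict:
    """Mean (observed - forecast) at three levels of granularity."""
    err = df[obs_col] - df[fct_col]
    return {
        "station_month": err.groupby([df["station"], df["month"]]).mean().to_dict(),
        "station": err.groupby(df["station"]).mean().to_dict(),
        "global": float(err.mean()),
    }


def get_additive_bias(params: dict, station: str, month: int) -> float:
    """Fallback order: (station, month), then station overall, then global."""
    if (station, month) in params["station_month"]:
        return float(params["station_month"][(station, month)])
    if station in params["station"]:
        return float(params["station"][station])
    return params["global"]


train = pd.DataFrame({
    "station": ["NYC", "NYC", "ORD"],
    "month": [1, 2, 1],
    "forecast_max_f": [40.0, 42.0, 30.0],
    "max_temp_f": [41.0, 42.5, 31.5],
})
params = train_additive_bias(train, "forecast_max_f", "max_temp_f")

# March is absent from the training months, so NYC falls back to its
# station-level bias before the correction is added to the raw forecast.
corrected = 45.0 + get_additive_bias(params, "NYC", 3)
```

Because training covers Jan–Feb and validation covers March, the (station, month) level never matches on the test set in this setup, which is why the station-level fallback matters.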

Probability — Gaussian CDF

Backtest: P(T > threshold) = 1 - Phi((threshold - mu_corrected) / sigma) where sigma is per-city, per-month standard deviation of forecast errors.

Live: Ensemble member counting: P(T > threshold) = count(members > threshold) / n_members, clamped to [0.02, 0.98].
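Both probability paths can be sketched with the standard library alone. This is an illustration of the two formulas above, not the engine's code; the function names and the default clamp arguments are assumptions that mirror the stated [0.02, 0.98] bounds.

```python
import math
from typing import Sequence


def gaussian_exceedance(threshold: float, mu_corrected: float, sigma: float) -> float:
    """Backtest path: P(T > threshold) = 1 - Phi((threshold - mu) / sigma)."""
    z = (threshold - mu_corrected) / sigma
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 1.0 - phi


def ensemble_exceedance(members: Sequence[float], threshold: float,
                        lo: float = 0.02, hi: float = 0.98) -> float:
    """Live path: fraction of members above the threshold, clamped to [lo, hi]."""
    raw = sum(m > threshold for m in members) / len(members)
    return min(hi, max(lo, raw))
```

The clamp keeps a unanimous ensemble from emitting a probability of exactly 0 or 1, which a finite ensemble cannot justify.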

Thresholds

Integer °F values: mean - STRIKES_PER_SIDE to mean + STRIKES_PER_SIDE (default: ±4 strikes = 9 total thresholds per city-day). Configured via THRESHOLD_RANGE_OFFSET = 10 for backtesting.
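As a sketch of the strike ladder, assuming "mean" refers to the forecast mean rounded to the nearest integer (the text does not say which mean is used), the default of 4 strikes per side yields 9 thresholds. The function name is hypothetical.

```python
from typing import List

STRIKES_PER_SIDE = 4  # default quoted above


def generate_integer_thresholds(center_f: float,
                                strikes_per_side: int = STRIKES_PER_SIDE) -> List[int]:
    """Integer degree-F thresholds centred on the rounded forecast mean."""
    # Note: Python's round() uses banker's rounding for exact .5 values.
    center = round(center_f)
    return list(range(center - strikes_per_side, center + strikes_per_side + 1))
```

For a forecast mean of 71.6°F this gives the integers 68 through 76.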

Backtest Results (Phase 0/1)

Baseline (Jan–Mar 2026, 900 city-days):

Metric	Value
MAE	1.24°F
Bias	-0.74°F (cold)
RMSE	1.60°F

Bias-corrected (March test set, 310 city-days):

Metric	Raw	Corrected
MAE	1.26°F	1.06°F (-16%)
Bias	-0.74°F	0.00°F (eliminated)
Brier (raw)	0.0374	—
Brier (calibrated)	—	0.0349

Key sensitivities for product pricing:

Region	Impact per +1°C	Source
Singapore	+3–4% electricity demand	Ang, Wang & Ma (2017)
India (above 30°C)	+11% power demand	Harish, Singh & Tongia (2020)
Shanghai (above 25°C)	+14.5% electricity use	Li, Pizer & Wu (2018)

Temperature — Nighttime Low (`temperature_low`)

Implementation

@register_variable("temperature_low")
class TemperatureLow(WeatherVariable):
    config = VariableConfig(
        name="temperature_low",
        display_name="Nighttime Low Temperature",
        unit="°F",
        distribution_type=DistributionType.GAUSSIAN,
        bias_method=BiasMethod.ADDITIVE,
        open_meteo_variable="temperature_2m_min",
        forecast_col="forecast_min_f",
        observed_col="min_temp_f",
        corrected_col="corrected_forecast_f",
        data_dir_name="low",
        threshold_integer=True,
        exceedance_semantic="below",  # NLL: risk is cold events
    )

Aliases: "low" resolves to "temperature_low".

Data Sources

Source	API	Variable
Forecast	Open-Meteo Historical Forecast	`temperature_2m_min`
Ensemble	Open-Meteo Ensemble API	`temperature_2m_min`
Observed	IEM ASOS Daily API	`min_temp_f`

Exceedance Semantic — Below

For NLL contracts, the relevant question is P(T < threshold). The engine's exceedance semantic is "below" — meaning observed_exceeded = 1 when observed < threshold.
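The effect of the semantic on both the observed outcome and the Gaussian probability can be sketched as follows. The function names are illustrative, and the strict inequality on the "above" branch is an assumption (the text only states the "below" rule).

```python
import math


def observed_exceeded(observed: float, threshold: float, semantic: str) -> int:
    """Return 1 when the observation lands on the risk side of the threshold."""
    if semantic == "below":
        return int(observed < threshold)
    if semantic == "above":
        return int(observed > threshold)  # strict ">" is an assumption
    raise ValueError(f"unknown exceedance semantic: {semantic!r}")


def gaussian_below(threshold: float, mu_corrected: float, sigma: float) -> float:
    """P(T < threshold) = Phi((threshold - mu) / sigma)."""
    z = (threshold - mu_corrected) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For a continuous Gaussian, the "below" probability is simply the complement of the "above" probability at the same threshold.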

Rainfall (`rainfall`)

Rainfall is the highest-impact variable for infrastructure and agriculture, but also the most difficult to forecast accurately because its distribution is zero-inflated and non-Gaussian.

Implementation

@register_variable("rainfall")
class Rainfall(WeatherVariable):
    config = VariableConfig(
        name="rainfall",
        display_name="Daily Rainfall",
        unit="inches",
        distribution_type=DistributionType.GAMMA,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="precipitation_sum",
        forecast_col="forecast_precip_in",
        observed_col="precip_inches",
        corrected_col="corrected_precip_in",
        data_dir_name="rainfall",
        thresholds=[0.01, 0.1, 0.25, 0.5, 1.0, 2.0],
        threshold_integer=False,
        clamp_min=0.0,
        exceedance_semantic="above",
    )

Data Sources

Source	API	Variable	Conversion
Forecast	Open-Meteo Historical Forecast	`precipitation_sum`	mm → inches (* 0.0393701)
Ensemble	Open-Meteo Ensemble API	`precipitation_sum`	mm → inches
Observed	IEM ASOS Daily API	`precip_in`	Already in inches

IEM returns ALL columns regardless of the vars parameter. The actual column name for precipitation is precip_in (not precip). The engine renames this to precip_inches on ingestion.

Bias Correction — Multiplicative

Additive correction can produce negative rainfall (physically impossible). The engine uses multiplicative bias correction:

ratio = mean(observed_wet) / mean(forecast_wet)   per (station, month)
corrected = raw * ratio,  clamped >= 0

where "wet" = values > RAINFALL_ZERO_THRESHOLD (0.005 inches)

The multiplicative method also stores dry-day frequencies: p_zero_obs and p_zero_fct per (station, month) for use in the zero-inflated Gamma distribution.

Fallback hierarchy: (station, month) → station overall → global. If no valid ratio exists, falls back to 1.0 (no correction).

Implementation: train_multiplicative_bias() and get_multiplicative_ratio() in src/models/bias_correction.py.
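For illustration, the per-group computation might look like the sketch below. It is not the body of train_multiplicative_bias(); the helper names are hypothetical, and it handles a single (station, month) group, so the station and global fallback levels are omitted and a missing ratio goes straight to 1.0.

```python
import pandas as pd

RAINFALL_ZERO_THRESHOLD = 0.005  # inches; value quoted in the text above


def wet_ratio_and_dry_freq(sub: pd.DataFrame, fct_col: str, obs_col: str) -> dict:
    """Wet-value ratio plus dry-day frequencies for one (station, month) group."""
    obs_wet = sub.loc[sub[obs_col] > RAINFALL_ZERO_THRESHOLD, obs_col]
    fct_wet = sub.loc[sub[fct_col] > RAINFALL_ZERO_THRESHOLD, fct_col]
    if obs_wet.empty or fct_wet.empty:
        ratio = 1.0  # no valid ratio: no correction
    else:
        ratio = float(obs_wet.mean() / fct_wet.mean())
    return {
        "ratio": ratio,
        "p_zero_obs": float((sub[obs_col] <= RAINFALL_ZERO_THRESHOLD).mean()),
        "p_zero_fct": float((sub[fct_col] <= RAINFALL_ZERO_THRESHOLD).mean()),
    }


def apply_ratio(raw: float, ratio: float) -> float:
    """corrected = raw * ratio, clamped at zero."""
    return max(0.0, raw * ratio)


group = pd.DataFrame({
    "forecast_precip_in": [0.0, 0.1, 0.3, 0.2],
    "precip_inches": [0.0, 0.2, 0.4, 0.0],
})
stats_for_group = wet_ratio_and_dry_freq(group, "forecast_precip_in", "precip_inches")
```

In this toy group the forecast is wet on three days and the observation on two, so the dry-day frequencies differ even though both series share the same dates.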

Probability — Zero-Inflated Gamma

Rainfall has a mixed discrete-continuous distribution: a point mass at zero (dry days) and a continuous Gamma distribution for wet days.

P(precip > T) = (1 - p_zero) * P(Gamma(alpha, beta) > T)

Where:

p_zero = probability of zero rainfall (estimated from training data)
alpha (shape) and beta (scale) = Gamma distribution parameters fitted via MLE

Live: Ensemble member counting (same as temperature): P(exceed) = count(members > threshold) / n_members.

Implementation: gamma_exceedance() in src/core/distribution.py, with fit_gamma_zero_inflated() for parameter estimation.
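A sketch of the zero-inflated Gamma formula using SciPy, under the assumption that the location parameter is pinned at zero during the MLE fit. The function names here are illustrative and do not reproduce gamma_exceedance() or fit_gamma_zero_inflated().

```python
import numpy as np
from scipy import stats


def fit_zero_inflated_gamma(values, zero_threshold: float = 0.005):
    """Estimate p_zero and the Gamma shape/scale of the wet values."""
    values = np.asarray(values, dtype=float)
    wet = values[values > zero_threshold]
    p_zero = 1.0 - wet.size / values.size
    # floc=0 fixes the location so only shape and scale are estimated by MLE.
    alpha, _loc, scale = stats.gamma.fit(wet, floc=0)
    return p_zero, alpha, scale


def zero_inflated_gamma_exceedance(threshold: float, p_zero: float,
                                   alpha: float, scale: float) -> float:
    """P(precip > T) = (1 - p_zero) * P(Gamma(alpha, scale) > T)."""
    # sf is the survival function, i.e. 1 - CDF.
    return float((1.0 - p_zero) * stats.gamma.sf(threshold, a=alpha, scale=scale))
```

At a threshold of zero the exceedance probability reduces to 1 - p_zero, the wet-day frequency, which is a quick sanity check on fitted parameters.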

Thresholds

Fixed thresholds (not generated dynamically):

Threshold	Inches	Description
Trace	0.01	Any measurable precipitation
Light	0.10	Light rain
Moderate	0.25	NWS "measurable" rain boundary
Heavy	0.50	Construction delay trigger
Very Heavy	1.00	Significant accumulation
Extreme	2.00	Flood risk

Backtest Results (Phase 4)

Period: Jan–Mar 2026, 10 cities, GFS precipitation_sum vs IEM precip_in.

Metric	Value
MAE	0.036 inches
Bias	-0.0018 inches
Brier (raw)	0.0403
Brier (calibrated)	0.0292
Bias method	Multiplicative
Distribution	Gamma (zero-inflated)

Wind Speed (`wind_speed`)

Implementation

@register_variable("wind_speed")
class WindSpeed(WeatherVariable):
    config = VariableConfig(
        name="wind_speed",
        display_name="Daily Max Wind Speed",
        unit="mph",
        distribution_type=DistributionType.WEIBULL,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="wind_speed_10m_max",
        forecast_col="forecast_wind_mph",
        observed_col="max_wind_mph",
        corrected_col="corrected_wind_mph",
        data_dir_name="wind",
        thresholds=[15, 20, 25, 30, 35, 40, 50],
        threshold_integer=True,
        clamp_min=0.0,
        exceedance_semantic="above",
    )

Data Sources

Source	API	Variable	Conversion
Forecast	Open-Meteo Historical Forecast	`wind_speed_10m_max`	km/h → mph (* 0.621371)
Ensemble	Open-Meteo Ensemble API	`wind_speed_10m_max`	km/h → mph
Observed	IEM ASOS Daily API	`max_wind_speed_kts`	knots → mph (* 1.15078)

IEM returns the column as max_wind_speed_kts (not max_sknt). The engine converts knots to mph on ingestion and renames to max_wind_mph.

Bias Correction — Multiplicative

Same as rainfall: corrected = raw * ratio, clamped >= 0.

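Important: GFS 10m grid-average wind differs systematically from IEM station-level sustained wind observations. The backtest shows a +7.09 mph bias (forecast minus observed), meaning GFS substantially overforecasts peak wind relative to station observations. The multiplicative correction addresses this systematic scale mismatch of roughly 1.61x. Note that with ratio = mean(observed) / mean(forecast), an overforecast implies a correction ratio below 1, so the ~1.61x figure is best read as the forecast-to-observed scale factor.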

Probability — Weibull

Wind speed follows a Weibull distribution (bounded below at zero, positive skew):

P(wind > T) = 1 - CDF_Weibull(T, k, lambda)

Where k (shape) and lambda (scale) are fitted via scipy.stats.weibull_min.fit() with a moment-based fallback.

Implementation: weibull_exceedance() in src/core/distribution.py.
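A sketch of the Weibull fit and exceedance using `scipy.stats.weibull_min`, assuming the location is fixed at zero. The moment-based fallback mentioned above is not shown, and the function names are illustrative rather than the engine's.

```python
import numpy as np
from scipy import stats


def fit_weibull(values):
    """Fit shape k and scale lambda to strictly positive wind speeds."""
    values = np.asarray(values, dtype=float)
    values = values[values > 0]  # the Weibull support excludes zero
    k, _loc, lam = stats.weibull_min.fit(values, floc=0)
    return k, lam


def weibull_sf(threshold: float, k: float, lam: float) -> float:
    """P(wind > T) = 1 - CDF_Weibull(T; k, lambda) = exp(-(T / lambda) ** k)."""
    return float(stats.weibull_min.sf(threshold, k, scale=lam))
```

When the threshold equals the scale parameter, the exceedance probability is exp(-1), about 0.37, regardless of the shape.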

Thresholds

Fixed operational thresholds:

Threshold (mph)	Description	Risk Application
15	Moderate wind	Begin monitoring
20	Fresh breeze	Secure loose materials
25	Strong wind	Curtail some outdoor work
30	High wind	Curtail crane operations
35	Very high wind	Construction site shutdown
40	Gale-force	Infrastructure risk
50	Storm-force	Emergency protocols

Backtest Results (Phase 4)

Metric	Value
MAE	7.16 mph
Bias	+7.09 mph (GFS overforecasts station-level wind)
Brier (raw)	0.0514
Brier (calibrated)	0.0462
Bias method	Multiplicative
Distribution	Weibull

The large positive bias reflects the fundamental mismatch between GFS 10m grid-average wind and IEM station-level sustained wind. After correction and calibration, the Brier score improves from 0.0514 to 0.0462 (about 10%).

Wind Gust (`wind_gust`)

Implementation

@register_variable("wind_gust")
class WindGust(WeatherVariable):
    config = VariableConfig(
        name="wind_gust",
        display_name="Daily Max Wind Gust",
        unit="mph",
        distribution_type=DistributionType.WEIBULL,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="wind_gusts_10m_max",
        forecast_col="forecast_gust_mph",
        observed_col="max_gust_mph",
        corrected_col="corrected_gust_mph",
        data_dir_name="wind_gust",
        thresholds=[25, 30, 40, 50, 60],
        threshold_integer=True,
        clamp_min=0.0,
        exceedance_semantic="above",
    )

Data Sources

Source	Variable	Conversion
Forecast (Open-Meteo)	`wind_gusts_10m_max`	km/h → mph
Observed (IEM)	`max_wind_gust_kts`	knots → mph (* 1.15078)

Thresholds

Higher thresholds than sustained wind: [25, 30, 40, 50, 60] mph. Gusts are short-lived peak values, so the daily maximum gust is at least as high as the daily maximum sustained wind.

Irradiance (`irradiance`)

Irradiance is critical for solar energy because it determines PPA performance and generation revenue. The engine uses a clear-sky index (CSI) approach in which observed irradiance is normalised by theoretical clear-sky irradiance to produce a value on [0, 1].

Implementation

@register_variable("irradiance")
class Irradiance(WeatherVariable):
    config = VariableConfig(
        name="irradiance",
        display_name="Daily Irradiance",
        unit="MJ/m²",
        distribution_type=DistributionType.BETA,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="shortwave_radiation_sum",
        forecast_col="forecast_irradiance_mj",
        observed_col="irradiance_mj_m2",
        corrected_col="corrected_irradiance_mj",
        data_dir_name="irradiance",
        thresholds=[],  # dynamic, based on CSI bins
        threshold_integer=False,
        clamp_min=0.0,
        exceedance_semantic="below",  # risk is LOW irradiance
    )

Data Sources

Source	API	Variable	Notes
Forecast	Open-Meteo Historical Forecast	`shortwave_radiation_sum`	MJ/m²
Ensemble	Open-Meteo Ensemble API	`shortwave_radiation_sum`	MJ/m²
Observed	Open-Meteo ERA5 Archive API	`shortwave_radiation_sum`	ERA5 reanalysis — independent of GFS/ECMWF forecasts

IEM has no solar radiation data. The engine uses ERA5 reanalysis (archive-api.open-meteo.com/v1/archive) as independent ground truth for irradiance. ERA5 is a global atmospheric reanalysis that assimilates millions of observations — its irradiance values are independent of the GFS/ECMWF forecasts being evaluated.

Clear-Sky Index (CSI)

Rather than working with raw irradiance values (which vary by latitude, season, and day length), the engine normalises to a clear-sky index:

CSI = actual_irradiance / clear_sky_irradiance

Where clear_sky_irradiance is computed from solar geometry using the Angstrom-Prescott model:

Compute solar declination from day of year
Compute sunset hour angle from latitude and declination
Compute extraterrestrial radiation (Ra) using solar constant and orbit eccentricity
Apply Angstrom-Prescott coefficients (a=0.25, b=0.50) for clear-sky GHI

Implementation: clear_sky_ghi_daily() in src/variables/irradiance/clear_sky.py.
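The four steps can be sketched with the standard FAO-56 formulas for declination, sunset hour angle, and extraterrestrial radiation. This assumes the engine follows that formulation, which the text does not confirm; the function names, the clamp to [0, 1], and the polar-night guard are also assumptions.

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m^-2 min^-1 (FAO-56 value)


def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra in MJ/m^2."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    decl = 0.409 * math.sin(angle - 1.39)   # solar declination (rad)
    d_r = 1.0 + 0.033 * math.cos(angle)     # orbit eccentricity correction
    # Clamp so polar day and polar night stay inside acos's domain.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl)))
    w_s = math.acos(cos_ws)                 # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * d_r * (
        w_s * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(w_s)
    )


def clear_sky_ghi(lat_deg: float, day_of_year: int,
                  a: float = 0.25, b: float = 0.50) -> float:
    """Angstrom-Prescott with full sunshine (n/N = 1): Rso = (a + b) * Ra."""
    return (a + b) * extraterrestrial_radiation(lat_deg, day_of_year)


def clear_sky_index(actual_mj: float, lat_deg: float, day_of_year: int) -> float:
    """CSI = actual / clear-sky, clamped to [0, 1] (the clamp is an assumption)."""
    clear = clear_sky_ghi(lat_deg, day_of_year)
    if clear <= 0.0:
        return 0.0  # polar night: no clear-sky reference available
    return max(0.0, min(1.0, actual_mj / clear))
```

With a = 0.25 and b = 0.50, clear-sky irradiance is 75% of the extraterrestrial value. For latitude 20°S on day 246 this gives Ra of about 32.2 MJ/m² and a clear-sky value of about 24.1 MJ/m².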

CSI ∈ [0, 1]:

CSI ≈ 1.0 → clear sky, maximum solar generation
CSI ≈ 0.5 → partly cloudy
CSI ≈ 0.2 → overcast, minimal generation

Exceedance Semantic — Below

For irradiance, the relevant risk question is P(CSI < threshold) — the probability that irradiance falls below a given level. Low CSI means poor solar generation. The engine's exceedance semantic is "below".

Bias Correction — Multiplicative

Same as rainfall and wind: corrected = raw * ratio, clamped >= 0. Applied to raw MJ/m² values.

Probability — Beta Distribution

The clear-sky index is naturally bounded on [0, 1], making the Beta distribution the appropriate choice:

P(CSI < T) = CDF_Beta(T, alpha, beta)

Where alpha and beta are fitted from training data via scipy.stats.beta.fit() with a moment-based fallback.

Implementation: beta_exceedance() in src/core/distribution.py.
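A sketch of the "below" probability together with a method-of-moments fit, which is one plausible form of the moment-based fallback mentioned above. The function names and the clipping bounds are assumptions, and this does not reproduce beta_exceedance().

```python
import numpy as np
from scipy import stats


def fit_beta_moments(csi):
    """Method-of-moments estimate of Beta(alpha, beta) for CSI values."""
    # Clip away exact 0 and 1 so the sample stays inside the open interval.
    csi = np.clip(np.asarray(csi, dtype=float), 1e-6, 1.0 - 1e-6)
    m = csi.mean()
    v = csi.var(ddof=1)
    common = m * (1.0 - m) / v - 1.0  # valid only when v < m * (1 - m)
    if common <= 0.0:
        raise ValueError("sample variance too large for a Beta moment fit")
    return m * common, (1.0 - m) * common


def beta_below(threshold: float, alpha: float, beta: float) -> float:
    """P(CSI < T) = CDF_Beta(T; alpha, beta)."""
    return float(stats.beta.cdf(threshold, alpha, beta))
```

Because the semantic is "below", the probability is the CDF itself rather than its complement.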

Thresholds

Dynamic thresholds based on clear-sky index bins:

CSI Threshold	Description
0.2	Very low generation (overcast)
0.4	Significant shortfall
0.6	Below-average generation
0.8	Near-clear conditions
1.0	Clear sky (theoretical max)

Backtest Results (Phase 4)

Metric	Value
MAE	1.49 MJ/m²
Bias	-0.13 MJ/m²
Brier (raw)	0.0530
Brier (calibrated)	0.0498
Bias method	Multiplicative
Distribution	Beta (on CSI)

Generic Backtest Pipeline

All variables use a unified backtest pipeline via scripts/run_phase_generic.py:

python scripts/run_phase_generic.py --variable rainfall --phase 0+1
python scripts/run_phase_generic.py --variable wind_speed --phase 0+1
python scripts/run_phase_generic.py --variable irradiance --phase 0+1

Phase 0 (Baseline):

Download historical forecasts via var.ingest_historical() (Open-Meteo)
Download observations via var.observe() (IEM or ERA5)
Merge on [station, date]
Compute MAE, bias, RMSE

Phase 1 (Calibration):

Split train/test (Jan–Feb / March)
Train bias parameters (additive or multiplicative based on var.config.bias_method)
Generate backtest probabilities via generate_backtest_probabilities_generic():
- Apply bias correction: var.bias_correct()
- Generate thresholds: var.generate_thresholds()
- Compute exceedance probability: var.exceedance_probability()
- Determine observed outcome based on var.config.exceedance_semantic
Train isotonic calibration
Compute Brier scores (raw vs calibrated)

The generic pipeline dispatches everything to the variable's methods — no variable-specific code exists in the orchestrator.
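The final scoring step can be illustrated with a plain Brier score: the mean squared difference between each predicted probability and the 0/1 observed outcome. The function name is hypothetical, and the isotonic calibration step is not shown.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

Lower is better: a perfect forecaster scores 0, and always predicting 0.5 scores 0.25.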

Sigma Computation for Non-Gaussian Variables

For temperature (Gaussian), sigma comes from the trained bias parameters. For non-Gaussian variables (rainfall, wind, irradiance), the generic backtest pipeline pre-computes a residual sigma per (station, month):

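for (station, month), sub in df.groupby(["station", "month"]):
    # Residual = observed minus forecast for this station-month
    residuals = sub[obs_col] - sub[fct_col]
    # Sample standard deviation (ddof=1) of the forecast errors
    sigma = residuals.std(ddof=1)
    _sigma_cache[(station, month)] = sigma

This sigma is used as the spread parameter in the distribution-specific CDF functions. It is cached in _sigma_cache so that it is computed only once per (station, month).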

Variable Priority & Product Readiness

Variable	Engine Status	Product Readiness	ForecastEx	Risk API
Temperature (DH)	Live — Phase 2 signal generation active	Tier 1–3 ready	Yes (DH contracts)	N/A
Temperature (NLL)	Live — NLL pipeline active alongside DH	Tier 1–3 ready	Yes (NLL contracts)	N/A
Rainfall	Backtested — Phase 4 calibration complete	Tier 1–2 near-term	No contracts exist	Construction delay
Wind Speed	Backtested — Phase 4 calibration complete	Tier 1 first	No contracts exist	Operational risk
Wind Gust	Backtested — Phase 4 calibration complete	Tier 1 first	No contracts exist	Operational risk
Irradiance	Backtested — Phase 4 calibration complete	Solar-specific Tier 1	No contracts exist	Solar shortfall
