
Weather Variables

The six registered weather variables: implementation details, statistical distributions, bias correction methods, thresholds, and backtest results.

The engine supports six registered weather variables through its polymorphic WeatherVariable architecture. Each variable has a dedicated statistical distribution, bias correction method, set of data sources, and threshold configuration.

Variable Summary

| Variable | Registry Name | Distribution | Bias Method | Exceedance | Unit | Status |
|---|---|---|---|---|---|---|
| Daily High Temp | temperature_high | Gaussian | Additive | above | °F | Live (Phase 2+) |
| Nighttime Low Temp | temperature_low | Gaussian | Additive | below | °F | Live (Phase 2+) |
| Rainfall | rainfall | Gamma (zero-inflated) | Multiplicative | above | inches | Backtested (Phase 4) |
| Wind Speed | wind_speed | Weibull | Multiplicative | above | mph | Backtested (Phase 4) |
| Wind Gust | wind_gust | Weibull | Multiplicative | above | mph | Backtested (Phase 4) |
| Irradiance | irradiance | Beta | Multiplicative | below | MJ/m² | Backtested (Phase 4) |

Temperature — Daily High (temperature_high)

The primary variable — used for ForecastEx validation and the foundation of the calibration proof.

Implementation

@register_variable("temperature_high")
class TemperatureHigh(WeatherVariable):
    config = VariableConfig(
        name="temperature_high",
        display_name="Daily High Temperature",
        unit="°F",
        distribution_type=DistributionType.GAUSSIAN,
        bias_method=BiasMethod.ADDITIVE,
        open_meteo_variable="temperature_2m_max",
        forecast_col="forecast_max_f",
        observed_col="max_temp_f",
        corrected_col="corrected_forecast_f",
        data_dir_name="",  # uses BACKTEST_DATA_DIR root
        threshold_integer=True,
        exceedance_semantic="above",
    )

Aliases: "high" resolves to "temperature_high" via the registry.

Data Sources

| Source | API | Variable | Notes |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | temperature_2m_max | Deterministic GFS; °C → °F conversion |
| Ensemble | Open-Meteo Ensemble API | temperature_2m_max | GFS 31-member + ECMWF IFS 51-member |
| Observed | IEM ASOS Daily API | max_temp_f | State-specific networks (e.g., NY_ASOS) |
| Settlement | NWS Climatological Report | Max observed temperature | Official ForecastEx settlement |

Bias Correction — Additive

bias = mean(observed - forecast)   per (station, month)
corrected_forecast = raw_forecast + bias

Trained on Jan–Feb 2026; validated on March 2026. Fallback hierarchy: (station, month) → station overall → global.
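The additive correction and its fallback hierarchy can be sketched as follows. This is a minimal illustration using plain dictionaries, not the engine's implementation: the function names (`train_additive_bias`, `lookup_bias`, `correct_forecast`), the row layout, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def train_additive_bias(rows):
    """Fit mean(observed - forecast) at three levels of granularity.

    `rows` is an iterable of dicts with keys station, month, observed,
    forecast (a hypothetical layout chosen for this sketch).
    """
    by_station_month = defaultdict(list)
    by_station = defaultdict(list)
    all_errors = []
    for r in rows:
        err = r["observed"] - r["forecast"]
        by_station_month[(r["station"], r["month"])].append(err)
        by_station[r["station"]].append(err)
        all_errors.append(err)
    return {
        "station_month": {k: mean(v) for k, v in by_station_month.items()},
        "station": {k: mean(v) for k, v in by_station.items()},
        "global": mean(all_errors) if all_errors else 0.0,
    }


def lookup_bias(params, station, month):
    """Fallback hierarchy: (station, month) -> station overall -> global."""
    if (station, month) in params["station_month"]:
        return params["station_month"][(station, month)]
    if station in params["station"]:
        return params["station"][station]
    return params["global"]


def correct_forecast(params, station, month, raw_forecast):
    """corrected_forecast = raw_forecast + bias."""
    return raw_forecast + lookup_bias(params, station, month)


# Illustrative training rows (made-up values, not backtest data).
training_rows = [
    {"station": "NYC", "month": 1, "observed": 40.0, "forecast": 41.0},
    {"station": "NYC", "month": 1, "observed": 38.0, "forecast": 38.5},
    {"station": "NYC", "month": 2, "observed": 45.0, "forecast": 45.0},
    {"station": "BOS", "month": 1, "observed": 30.0, "forecast": 32.0},
]
params = train_additive_bias(training_rows)
```

A March lookup for a station trained only on January and February falls through to the station-level bias, which matches the train/validate split described above.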

Probability — Gaussian CDF

Backtest: P(T > threshold) = 1 - Phi((threshold - mu_corrected) / sigma) where sigma is per-city, per-month standard deviation of forecast errors.

Live: Ensemble member counting: P(T > threshold) = count(members > threshold) / n_members, clamped to [0.02, 0.98].
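Both probability paths are short enough to sketch with the standard library. The function names below are hypothetical; only the formulas and the [0.02, 0.98] clamp come from the text above.

```python
from statistics import NormalDist

P_MIN, P_MAX = 0.02, 0.98  # clamp bounds for the live (ensemble) path


def gaussian_exceedance(threshold, mu_corrected, sigma):
    """Backtest path: P(T > threshold) = 1 - Phi((threshold - mu) / sigma)."""
    return 1.0 - NormalDist(mu_corrected, sigma).cdf(threshold)


def ensemble_exceedance(members, threshold, p_min=P_MIN, p_max=P_MAX):
    """Live path: fraction of ensemble members above the threshold, clamped."""
    if not members:
        raise ValueError("ensemble must contain at least one member")
    raw = sum(1 for m in members if m > threshold) / len(members)
    return min(max(raw, p_min), p_max)
```

The clamp keeps the live probability away from exactly 0 or 1 even when every member agrees.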

Thresholds

Integer °F values: mean - STRIKES_PER_SIDE to mean + STRIKES_PER_SIDE (default: ±4 strikes = 9 total thresholds per city-day). Configured via THRESHOLD_RANGE_OFFSET = 10 for backtesting.
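A possible sketch of the strike grid, assuming the mean forecast is rounded to the nearest integer before the offsets are applied (the rounding rule and the function name are assumptions, not taken from the engine):

```python
STRIKES_PER_SIDE = 4  # default from the text: 4 strikes each side, 9 thresholds


def generate_integer_thresholds(mean_forecast_f, strikes_per_side=STRIKES_PER_SIDE):
    """Integer °F strikes centred on the rounded mean forecast."""
    center = round(mean_forecast_f)  # assumed rounding rule
    return list(range(center - strikes_per_side, center + strikes_per_side + 1))
```

Passing `strikes_per_side=10` gives a wider grid of 21 thresholds.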

Backtest Results (Phase 0/1)

Baseline (Jan–Mar 2026, 900 city-days):

| Metric | Value |
|---|---|
| MAE | 1.24°F |
| Bias | -0.74°F (cold) |
| RMSE | 1.60°F |

Bias-corrected (March test set, 310 city-days):

| Metric | Raw | Corrected |
|---|---|---|
| MAE | 1.26°F | 1.06°F (-16%) |
| Bias | -0.74°F | 0.00°F (eliminated) |

| Metric | Value |
|---|---|
| Brier (raw) | 0.0374 |
| Brier (calibrated) | 0.0349 |

Key sensitivities for product pricing:

| Region | Impact per +1°C | Source |
|---|---|---|
| Singapore | +3–4% electricity demand | Ang, Wang & Ma (2017) |
| India (above 30°C) | +11% power demand | Harish, Singh & Tongia (2020) |
| Shanghai (above 25°C) | +14.5% electricity use | Li, Pizer & Wu (2018) |

Temperature — Nighttime Low (temperature_low)

Implementation

@register_variable("temperature_low")
class TemperatureLow(WeatherVariable):
    config = VariableConfig(
        name="temperature_low",
        display_name="Nighttime Low Temperature",
        unit="°F",
        distribution_type=DistributionType.GAUSSIAN,
        bias_method=BiasMethod.ADDITIVE,
        open_meteo_variable="temperature_2m_min",
        forecast_col="forecast_min_f",
        observed_col="min_temp_f",
        corrected_col="corrected_forecast_f",
        data_dir_name="low",
        threshold_integer=True,
        exceedance_semantic="below",  # NLL: risk is cold events
    )

Aliases: "low" resolves to "temperature_low".

Data Sources

| Source | API | Variable |
|---|---|---|
| Forecast | Open-Meteo Historical Forecast | temperature_2m_min |
| Ensemble | Open-Meteo Ensemble API | temperature_2m_min |
| Observed | IEM ASOS Daily API | min_temp_f |

Exceedance Semantic — Below

For NLL contracts, the relevant question is P(T < threshold). The engine's exceedance semantic is "below", meaning observed_exceeded = 1 when observed < threshold.
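The outcome labelling implied by the two semantics can be sketched as a small helper. The function name and the strict comparison for "above" are assumptions, the latter mirroring the P(T > threshold) form used for the daily high.

```python
def observed_exceeded(observed, threshold, exceedance_semantic):
    """Return 1 if the observed value triggers the event, else 0."""
    if exceedance_semantic == "above":
        return int(observed > threshold)
    if exceedance_semantic == "below":
        return int(observed < threshold)
    raise ValueError(f"unknown exceedance semantic: {exceedance_semantic!r}")
```

For example, an observed low of 28°F against a 32°F threshold yields 1 under "below" and 0 under "above".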


Rainfall (rainfall)

The highest-impact variable for infrastructure and agriculture — but the most difficult to forecast accurately due to its zero-inflated, non-Gaussian distribution.

Implementation

@register_variable("rainfall")
class Rainfall(WeatherVariable):
    config = VariableConfig(
        name="rainfall",
        display_name="Daily Rainfall",
        unit="inches",
        distribution_type=DistributionType.GAMMA,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="precipitation_sum",
        forecast_col="forecast_precip_in",
        observed_col="precip_inches",
        corrected_col="corrected_precip_in",
        data_dir_name="rainfall",
        thresholds=[0.01, 0.1, 0.25, 0.5, 1.0, 2.0],
        threshold_integer=False,
        clamp_min=0.0,
        exceedance_semantic="above",
    )

Data Sources

| Source | API | Variable | Conversion |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | precipitation_sum | mm → inches (* 0.0393701) |
| Ensemble | Open-Meteo Ensemble API | precipitation_sum | mm → inches |
| Observed | IEM ASOS Daily API | precip_in | Already in inches |

IEM returns ALL columns regardless of the vars parameter. The actual column name for precipitation is precip_in (not precip). The engine renames this to precip_inches on ingestion.

Bias Correction — Multiplicative

Additive correction can produce negative rainfall (physically impossible). The engine uses multiplicative bias correction:

ratio = mean(observed_wet) / mean(forecast_wet)   per (station, month)
corrected = raw * ratio,  clamped >= 0

where "wet" = values > RAINFALL_ZERO_THRESHOLD (0.005 inches)

The multiplicative method also stores dry-day frequencies: p_zero_obs and p_zero_fct per (station, month) for use in the zero-inflated Gamma distribution.

Fallback hierarchy: (station, month) → station overall → global. If no valid ratio exists, falls back to 1.0 (no correction).

Implementation: train_multiplicative_bias() and get_multiplicative_ratio() in src/models/bias_correction.py.
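As an illustration of the logic described above (not the code in src/models/bias_correction.py), the ratio training, dry-day frequencies, and fallback can be sketched with plain dictionaries. The helper names, the row layout, and the choice to filter observed and forecast wet values independently are assumptions.

```python
from collections import defaultdict
from statistics import mean

RAINFALL_ZERO_THRESHOLD = 0.005  # inches; values above this count as "wet"


def _wet_ratio(observed, forecast, zero_threshold):
    """mean(observed_wet) / mean(forecast_wet), or None if either is empty."""
    obs_wet = [v for v in observed if v > zero_threshold]
    fct_wet = [v for v in forecast if v > zero_threshold]
    if not obs_wet or not fct_wet:
        return None
    return mean(obs_wet) / mean(fct_wet)


def fit_ratio_table(rows, zero_threshold=RAINFALL_ZERO_THRESHOLD):
    """Fit ratios and dry-day frequencies at all three fallback levels."""
    groups = defaultdict(lambda: ([], []))
    for r in rows:
        # Each row feeds its (station, month), station-level and global group.
        for key in ((r["station"], r["month"]), (r["station"], None), (None, None)):
            obs, fct = groups[key]
            obs.append(r["observed"])
            fct.append(r["forecast"])
    return {
        key: {
            "ratio": _wet_ratio(obs, fct, zero_threshold),
            "p_zero_obs": sum(v <= zero_threshold for v in obs) / len(obs),
            "p_zero_fct": sum(v <= zero_threshold for v in fct) / len(fct),
        }
        for key, (obs, fct) in groups.items()
    }


def lookup_ratio(table, station, month):
    """(station, month) -> station overall -> global -> 1.0 (no correction)."""
    for key in ((station, month), (station, None), (None, None)):
        entry = table.get(key)
        if entry is not None and entry["ratio"] is not None:
            return entry["ratio"]
    return 1.0


def correct(table, station, month, raw):
    """corrected = raw * ratio, clamped at zero."""
    return max(raw * lookup_ratio(table, station, month), 0.0)


# Illustrative rows (made-up values, not backtest data).
rows = [
    {"station": "SEA", "month": 1, "observed": 0.30, "forecast": 0.20},
    {"station": "SEA", "month": 1, "observed": 0.00, "forecast": 0.00},
    {"station": "SEA", "month": 1, "observed": 0.30, "forecast": 0.20},
    {"station": "SEA", "month": 2, "observed": 0.00, "forecast": 0.00},
]
table = fit_ratio_table(rows)
```

In this toy data February is entirely dry, so its ratio is undefined and the lookup falls back to the station-level ratio.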

Probability — Zero-Inflated Gamma

Rainfall has a mixed discrete-continuous distribution: a point mass at zero (dry days) and a continuous Gamma distribution for wet days.

P(precip > T) = (1 - p_zero) * P(Gamma(alpha, beta) > T)

Where:

  • p_zero = probability of zero rainfall (estimated from training data)
  • alpha (shape) and beta (scale) = Gamma distribution parameters fitted via MLE

Live: Ensemble member counting (same as temperature): P(exceed) = count(members > threshold) / n_members.

Implementation: gamma_exceedance() in src/core/distribution.py, with fit_gamma_zero_inflated() for parameter estimation.
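A minimal sketch of the zero-inflated Gamma formula with SciPy follows. It is not the code in src/core/distribution.py: the function names are hypothetical, and fixing the Gamma location at zero and reusing the 0.005 inch wet-day cutoff are assumptions.

```python
from scipy import stats


def fit_zero_inflated_gamma(values, zero_threshold=0.005):
    """Estimate p_zero and Gamma (shape, scale) from daily rainfall totals."""
    wet = [v for v in values if v > zero_threshold]
    p_zero = 1.0 - len(wet) / len(values)
    # MLE on wet days only; location pinned at 0 so the support is (0, inf).
    alpha, _loc, scale = stats.gamma.fit(wet, floc=0)
    return p_zero, float(alpha), float(scale)


def zero_inflated_gamma_exceedance(threshold, p_zero, alpha, scale):
    """P(precip > T) = (1 - p_zero) * P(Gamma(alpha, scale) > T)."""
    return (1.0 - p_zero) * float(stats.gamma.sf(threshold, a=alpha, scale=scale))


# Illustrative sample (made-up values): two dry days, four wet days.
sample = [0.0, 0.0, 0.2, 0.4, 0.1, 0.3]
p_zero, alpha, scale = fit_zero_inflated_gamma(sample)
```

At a threshold of zero the exceedance reduces to the wet-day probability, 1 - p_zero, which is a quick sanity check on any fitted parameters.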

Thresholds

Fixed thresholds (not generated dynamically):

| Threshold | Inches | Description |
|---|---|---|
| Trace | 0.01 | Any measurable precipitation |
| Light | 0.10 | Light rain |
| Moderate | 0.25 | NWS "measurable" rain boundary |
| Heavy | 0.50 | Construction delay trigger |
| Very Heavy | 1.00 | Significant accumulation |
| Extreme | 2.00 | Flood risk |

Backtest Results (Phase 4)

Period: Jan–Mar 2026, 10 cities, GFS precipitation_sum vs IEM precip_in.

| Metric | Value |
|---|---|
| MAE | 0.036 inches |
| Bias | -0.0018 inches |
| Brier (raw) | 0.0403 |
| Brier (calibrated) | 0.0292 |
| Bias method | Multiplicative |
| Distribution | Gamma (zero-inflated) |

Wind Speed (wind_speed)

Implementation

@register_variable("wind_speed")
class WindSpeed(WeatherVariable):
    config = VariableConfig(
        name="wind_speed",
        display_name="Daily Max Wind Speed",
        unit="mph",
        distribution_type=DistributionType.WEIBULL,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="wind_speed_10m_max",
        forecast_col="forecast_wind_mph",
        observed_col="max_wind_mph",
        corrected_col="corrected_wind_mph",
        data_dir_name="wind",
        thresholds=[15, 20, 25, 30, 35, 40, 50],
        threshold_integer=True,
        clamp_min=0.0,
        exceedance_semantic="above",
    )

Data Sources

| Source | API | Variable | Conversion |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | wind_speed_10m_max | km/h → mph (* 0.621371) |
| Ensemble | Open-Meteo Ensemble API | wind_speed_10m_max | km/h → mph |
| Observed | IEM ASOS Daily API | max_wind_speed_kts | knots → mph (* 1.15078) |

IEM returns the column as max_wind_speed_kts (not max_sknt). The engine converts knots to mph on ingestion and renames to max_wind_mph.
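The conversion and rename step might look like the sketch below. The record-as-dict layout and the function names are assumptions made for illustration; only the column names and conversion factors come from the tables above.

```python
KTS_TO_MPH = 1.15078   # knots -> mph
KMH_TO_MPH = 0.621371  # km/h -> mph


def normalise_iem_wind(record):
    """Convert max_wind_speed_kts to mph and rename it to max_wind_mph."""
    out = dict(record)  # leave the caller's record untouched
    kts = out.pop("max_wind_speed_kts")
    # Preserve missing observations rather than converting them.
    out["max_wind_mph"] = None if kts is None else kts * KTS_TO_MPH
    return out


def forecast_kmh_to_mph(wind_kmh):
    """Convert an Open-Meteo wind_speed_10m_max value from km/h to mph."""
    return wind_kmh * KMH_TO_MPH
```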

Bias Correction — Multiplicative

Same as rainfall: corrected = raw * ratio, clamped >= 0.

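Important: GFS 10m grid-average wind differs systematically from IEM station-level sustained wind observations. The backtest shows a bias of +7.09 mph, meaning GFS significantly overforecasts peak wind compared to station observations. The quoted factor of ~1.61x describes this systematic scale mismatch. Because the correction ratio is defined as mean(observed) / mean(forecast), an overforecast implies a ratio below 1, so ~1.61x is presumably the forecast-to-observed factor (a correction ratio of roughly 1 / 1.61 ≈ 0.62).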

Probability — Weibull

Wind speed follows a Weibull distribution (bounded below at zero, positive skew):

P(wind > T) = 1 - CDF_Weibull(T, k, lambda)

Where k (shape) and lambda (scale) are fitted via scipy.stats.weibull_min.fit() with a moment-based fallback.

Implementation: weibull_exceedance() in src/core/distribution.py.
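The Weibull survival function has a closed form, so the exceedance can be sketched without SciPy. The text does not say which moment-based fallback the engine uses; the one below is a common empirical approximation, k ≈ (std / mean)^-1.086, shown purely as an example. Function names are hypothetical.

```python
import math


def weibull_sf(threshold, k, lam):
    """P(wind > T) = exp(-(T / lam)^k) for a two-parameter Weibull."""
    if threshold <= 0:
        return 1.0  # the distribution is bounded below at zero
    return math.exp(-((threshold / lam) ** k))


def weibull_moment_fit(values):
    """Approximate (k, lam) from the sample mean and standard deviation.

    Assumed fallback: k = (std / mean)^-1.086, lam = mean / Gamma(1 + 1/k).
    """
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    k = (math.sqrt(var) / mu) ** -1.086
    lam = mu / math.gamma(1.0 + 1.0 / k)
    return k, lam


# Illustrative daily max wind speeds in mph (made-up values).
k_hat, lam_hat = weibull_moment_fit([8.0, 12.0, 15.0, 10.0, 20.0])
```

By construction the fitted scale reproduces the sample mean, since the Weibull mean is lam * Gamma(1 + 1/k).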

Thresholds

Fixed operational thresholds:

| Threshold (mph) | Description | Risk Application |
|---|---|---|
| 15 | Moderate wind | Begin monitoring |
| 20 | Fresh breeze | Secure loose materials |
| 25 | Strong wind | Curtail some outdoor work |
| 30 | High wind | Curtail crane operations |
| 35 | Very high wind | Construction site shutdown |
| 40 | Gale-force | Infrastructure risk |
| 50 | Storm-force | Emergency protocols |

Backtest Results (Phase 4)

| Metric | Value |
|---|---|
| MAE | 7.16 mph |
| Bias | +7.09 mph (GFS overforecasts station-level wind) |
| Brier (raw) | 0.0514 |
| Brier (calibrated) | 0.0462 |
| Bias method | Multiplicative |
| Distribution | Weibull |

The large positive bias reflects the fundamental mismatch between GFS 10m grid-average wind and IEM station-level sustained wind. Multiplicative correction significantly improves calibration.


Wind Gust (wind_gust)

Implementation

@register_variable("wind_gust")
class WindGust(WeatherVariable):
    config = VariableConfig(
        name="wind_gust",
        display_name="Daily Max Wind Gust",
        unit="mph",
        distribution_type=DistributionType.WEIBULL,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="wind_gusts_10m_max",
        forecast_col="forecast_gust_mph",
        observed_col="max_gust_mph",
        corrected_col="corrected_gust_mph",
        data_dir_name="wind_gust",
        thresholds=[25, 30, 40, 50, 60],
        threshold_integer=True,
        clamp_min=0.0,
        exceedance_semantic="above",
    )

Data Sources

| Source | Variable | Conversion |
|---|---|---|
| Forecast (Open-Meteo) | wind_gusts_10m_max | km/h → mph |
| Observed (IEM) | max_wind_gust_kts | knots → mph (* 1.15078) |

Thresholds

Higher thresholds than sustained wind: [25, 30, 40, 50, 60] mph. Gusts are short-duration peak values, so the daily maximum gust is at least as high as the daily maximum sustained wind.


Irradiance (irradiance)

Critical for solar energy — determines PPA performance and generation revenue. Uses a clear-sky index (CSI) approach where the observed irradiance is normalised by theoretical clear-sky irradiance to produce a value on [0, 1].

Implementation

@register_variable("irradiance")
class Irradiance(WeatherVariable):
    config = VariableConfig(
        name="irradiance",
        display_name="Daily Irradiance",
        unit="MJ/m²",
        distribution_type=DistributionType.BETA,
        bias_method=BiasMethod.MULTIPLICATIVE,
        open_meteo_variable="shortwave_radiation_sum",
        forecast_col="forecast_irradiance_mj",
        observed_col="irradiance_mj_m2",
        corrected_col="corrected_irradiance_mj",
        data_dir_name="irradiance",
        thresholds=[],  # dynamic, based on CSI bins
        threshold_integer=False,
        clamp_min=0.0,
        exceedance_semantic="below",  # risk is LOW irradiance
    )

Data Sources

| Source | API | Variable | Notes |
|---|---|---|---|
| Forecast | Open-Meteo Historical Forecast | shortwave_radiation_sum | MJ/m² |
| Ensemble | Open-Meteo Ensemble API | shortwave_radiation_sum | MJ/m² |
| Observed | Open-Meteo ERA5 Archive API | shortwave_radiation_sum | ERA5 reanalysis, independent of GFS/ECMWF forecasts |

IEM has no solar radiation data. The engine uses ERA5 reanalysis (archive-api.open-meteo.com/v1/archive) as independent ground truth for irradiance. ERA5 is a global atmospheric reanalysis that assimilates millions of observations — its irradiance values are independent of the GFS/ECMWF forecasts being evaluated.

Clear-Sky Index (CSI)

Rather than working with raw irradiance values (which vary by latitude, season, and day length), the engine normalises to a clear-sky index:

CSI = actual_irradiance / clear_sky_irradiance

Where clear_sky_irradiance is computed from solar geometry using the Angstrom-Prescott model:

  1. Compute solar declination from day of year
  2. Compute sunset hour angle from latitude and declination
  3. Compute extraterrestrial radiation (Ra) using solar constant and orbit eccentricity
  4. Apply Angstrom-Prescott coefficients (a=0.25, b=0.50) for clear-sky GHI

Implementation: clear_sky_ghi_daily() in src/variables/irradiance/clear_sky.py.
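The four steps above can be sketched with the standard FAO-56 solar geometry formulas. This is an illustration, not the contents of clear_sky.py: the function names are hypothetical, the clear-sky value is assumed to be (a + b) * Ra (the Angstrom-Prescott relation with full sunshine), and the clipping of CSI to [0, 1] is an assumption.

```python
import math

SOLAR_CONSTANT = 0.0820        # MJ m^-2 min^-1
A_COEFF, B_COEFF = 0.25, 0.50  # Angstrom-Prescott coefficients from the text


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra in MJ/m^2/day."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    d_r = 1.0 + 0.033 * math.cos(angle)      # orbit eccentricity correction
    delta = 0.409 * math.sin(angle - 1.39)   # step 1: solar declination
    # Step 2: sunset hour angle; clamp handles polar day and polar night.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    omega_s = math.acos(x)
    # Step 3: integrate the solar constant over the daylight period.
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * d_r * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )


def clear_sky_daily(lat_deg, day_of_year):
    """Step 4: clear-sky GHI, assumed to be (a + b) * Ra."""
    return (A_COEFF + B_COEFF) * extraterrestrial_radiation(lat_deg, day_of_year)


def clear_sky_index(actual_mj, lat_deg, day_of_year):
    """CSI = actual / clear-sky, clipped to [0, 1] (clipping is an assumption)."""
    clear = clear_sky_daily(lat_deg, day_of_year)
    if clear <= 0.0:
        return 0.0  # no sun above the horizon on this day
    return min(max(actual_mj / clear, 0.0), 1.0)
```

For latitude 20°S on day 246 this gives Ra of about 32.2 MJ/m²/day, and therefore a clear-sky value of about 24.1 MJ/m²/day.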

CSI ∈ [0, 1]:

  • CSI ≈ 1.0 → clear sky, maximum solar generation
  • CSI ≈ 0.5 → partly cloudy
  • CSI ≈ 0.2 → overcast, minimal generation

Exceedance Semantic — Below

For irradiance, the relevant risk question is P(CSI < threshold) — the probability that irradiance falls below a given level. Low CSI means poor solar generation. The engine's exceedance semantic is "below".

Bias Correction — Multiplicative

Same as rainfall and wind: corrected = raw * ratio, clamped >= 0. Applied to raw MJ/m² values.

Probability — Beta Distribution

The clear-sky index is naturally bounded on [0, 1], making the Beta distribution the appropriate choice:

P(CSI < T) = CDF_Beta(T, alpha, beta)

Where alpha and beta are fitted from training data via scipy.stats.beta.fit() with a moment-based fallback.

Implementation: beta_exceedance() in src/core/distribution.py.
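A sketch of the Beta probability with SciPy, plus one possible moment-based fallback (method of moments on the sample mean and variance). The function names are hypothetical and the fallback shown may differ from the one in src/core/distribution.py.

```python
from scipy import stats


def beta_moment_fit(csi_values):
    """Method-of-moments estimate of (alpha, beta) for values in (0, 1)."""
    n = len(csi_values)
    m = sum(csi_values) / n
    v = sum((x - m) ** 2 for x in csi_values) / (n - 1)
    common = m * (1.0 - m) / v - 1.0
    if common <= 0.0:
        raise ValueError("sample variance too large for a Beta distribution")
    return m * common, (1.0 - m) * common


def beta_below_probability(threshold, alpha, beta):
    """P(CSI < T) = CDF_Beta(T; alpha, beta)."""
    return float(stats.beta.cdf(threshold, alpha, beta))


# Illustrative CSI sample (made-up values).
alpha_hat, beta_hat = beta_moment_fit([0.2, 0.5, 0.8, 0.6, 0.9])
```

Because the semantic is "below", the CDF is used directly rather than the survival function used for the "above" variables.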

Thresholds

Dynamic thresholds based on clear-sky index bins:

| CSI Threshold | Description |
|---|---|
| 0.2 | Very low generation (overcast) |
| 0.4 | Significant shortfall |
| 0.6 | Below-average generation |
| 0.8 | Near-clear conditions |
| 1.0 | Clear sky (theoretical max) |

Backtest Results (Phase 4)

| Metric | Value |
|---|---|
| MAE | 1.49 MJ/m² |
| Bias | -0.13 MJ/m² |
| Brier (raw) | 0.0530 |
| Brier (calibrated) | 0.0498 |
| Bias method | Multiplicative |
| Distribution | Beta (on CSI) |

Generic Backtest Pipeline

All variables use a unified backtest pipeline via scripts/run_phase_generic.py:

python scripts/run_phase_generic.py --variable rainfall --phase 0+1
python scripts/run_phase_generic.py --variable wind_speed --phase 0+1
python scripts/run_phase_generic.py --variable irradiance --phase 0+1

Phase 0 (Baseline):

  1. Download historical forecasts via var.ingest_historical() (Open-Meteo)
  2. Download observations via var.observe() (IEM or ERA5)
  3. Merge on [station, date]
  4. Compute MAE, bias, RMSE

Phase 1 (Calibration):

  1. Split train/test (Jan–Feb / March)
  2. Train bias parameters (additive or multiplicative based on var.config.bias_method)
  3. Generate backtest probabilities via generate_backtest_probabilities_generic():
    • Apply bias correction: var.bias_correct()
    • Generate thresholds: var.generate_thresholds()
    • Compute exceedance probability: var.exceedance_probability()
    • Determine observed outcome based on var.config.exceedance_semantic
  4. Train isotonic calibration
  5. Compute Brier scores (raw vs calibrated)

The generic pipeline dispatches everything to the variable's methods — no variable-specific code exists in the orchestrator.
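The Brier score computed in the final calibration step is the mean squared difference between each forecast probability and its binary outcome. A minimal sketch, with a hypothetical function name:

```python
def brier_score(probabilities, outcomes):
    """Mean of (p - o)^2 over forecast probabilities p and 0/1 outcomes o."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)
```

Lower is better: a perfect forecaster scores 0, and always forecasting 0.5 scores 0.25 regardless of outcomes.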

Sigma Computation for Non-Gaussian Variables

For temperature (Gaussian), sigma comes from the trained bias parameters. For non-Gaussian variables (rainfall, wind, irradiance), the generic backtest pipeline pre-computes a residual sigma per (station, month):

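for (station, month), sub in df.groupby(["station", "month"]):
    # Residual = observed minus forecast within this (station, month) group
    residuals = sub[obs_col] - sub[fct_col]
    # Sample standard deviation (ddof=1) serves as the spread parameter
    sigma = residuals.std(ddof=1)
    _sigma_cache[(station, month)] = sigma  # assumes a dict keyed by (station, month)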

This sigma is used as the spread parameter in the distribution-specific CDF functions. Cached in _sigma_cache for efficiency.


Variable Priority & Product Readiness

| Variable | Engine Status | Product Readiness | ForecastEx | Risk API |
|---|---|---|---|---|
| Temperature (DH) | Live: Phase 2 signal generation active | Tier 1–3 ready | Yes (DH contracts) | N/A |
| Temperature (NLL) | Live: NLL pipeline active alongside DH | Tier 1–3 ready | Yes (NLL contracts) | N/A |
| Rainfall | Backtested: Phase 4 calibration complete | Tier 1–2 near-term | No contracts exist | Construction delay |
| Wind Speed | Backtested: Phase 4 calibration complete | Tier 1 first | No contracts exist | Operational risk |
| Wind Gust | Backtested: Phase 4 calibration complete | Tier 1 first | No contracts exist | Operational risk |
| Irradiance | Backtested: Phase 4 calibration complete | Solar-specific Tier 1 | No contracts exist | Solar shortfall |