Engine Overview
Architecture of the Cliff Horizon Weather Intelligence Engine — from data ingest to calibrated probability output and risk scoring.
The Cliff Horizon engine is a calibrated probability pipeline that transforms raw weather model output into risk-grade probability distributions. It is not a weather model — it is a post-processing and calibration layer that sits on top of existing NWP models and adds proprietary data layers.
Architecture
Development Status
Phase 4 — Multi-Variable Expansion (Complete). The engine now supports five weather variables via a polymorphic WeatherVariable architecture: temperature daily high, temperature nighttime low, rainfall, wind speed, and irradiance. Each variable has its own statistical distribution, bias correction method, and probability computation. A FastAPI risk scoring API provides commercial risk assessments for construction delay (rainfall), solar shortfall (irradiance), and wind operations. A read-only System Status API and dashboard tab provide visibility into the variable registry, pipeline health, data freshness, and engine configuration.
| Phase | Status | Description |
|---|---|---|
| Phase 0 | Complete | GFS baseline accuracy — MAE/bias/RMSE across 10 cities, 900 city-days |
| Phase 1 | Complete | Bias correction + isotonic calibration + edge calculator + backtest P&L |
| Phase 2 | Complete | Live signal generation, NLL pipeline, daily automation, dashboard integration |
| Phase 3 | Complete | Multi-model ECMWF + GFS ensemble, intraday hardening (lead-time sigma, D+0 METAR blending, divergence monitor) |
| Phase 4 | Complete | Multi-variable expansion (rainfall, irradiance, wind), WeatherVariable ABC, registry pattern, risk API, generic backtest pipeline |
Core Variable Architecture (Phase 4)
The engine uses a registry pattern to support multiple weather variables through a single unified pipeline. Each variable is a subclass of WeatherVariable with its own configuration, data ingestion, bias correction, and probability computation.
WeatherVariable ABC (src/core/variable.py)
Every weather variable implements this interface:
class WeatherVariable(ABC):
    config: VariableConfig
    def ingest_historical(self, cities, start, end, model) -> DataFrame: ...  # backtest forecasts
    def ingest_ensemble(self, cities, forecast_days, model) -> DataFrame: ...  # live ensemble
    def observe(self, cities, start, end) -> DataFrame: ...  # ground truth
    def bias_correct(self, raw, bias_params, station, month) -> float: ...  # single-value correction
    def bias_correct_ensemble(self, df, bias_params) -> DataFrame: ...  # bulk correction
    def exceedance_probability(self, value, sigma, threshold) -> float: ...  # CDF-based
    def ensemble_exceedance_probability(self, members, threshold) -> float: ...  # member counting
    def generate_thresholds(self, mean) -> list[float]: ...  # contract strikes
VariableConfig (src/core/variable.py)
Each variable's behaviour is controlled by a configuration dataclass:
@dataclass
class VariableConfig:
    name: str                            # "rainfall"
    display_name: str                    # "Daily Rainfall"
    unit: str                            # "inches"
    distribution_type: DistributionType  # GAUSSIAN, GAMMA, BETA, WEIBULL
    bias_method: BiasMethod              # ADDITIVE or MULTIPLICATIVE
    open_meteo_variable: str             # "precipitation_sum"
    forecast_col: str                    # "forecast_precip_in"
    observed_col: str                    # "precip_inches"
    corrected_col: str                   # "corrected_precip_in"
    data_dir_name: str                   # "rainfall"
    thresholds: list[float]              # [0.01, 0.1, 0.25, 0.5, 1.0, 2.0]
    threshold_integer: bool              # False for rainfall, True for temperature
    clamp_min: float | None              # 0.0 for non-negative variables
    exceedance_semantic: str             # "above" or "below"
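To illustrate how bias_method and clamp_min could drive a correction, here is a minimal sketch. The apply_bias helper and the local BiasMethod enum are hypothetical stand-ins, not the engine's own code; the additive and multiplicative forms follow the pipeline description (raw + bias, or raw * ratio).

```python
from enum import Enum
from typing import Optional


class BiasMethod(Enum):
    # Local stand-in for the engine's BiasMethod (assumed to have these two members).
    ADDITIVE = "additive"
    MULTIPLICATIVE = "multiplicative"


def apply_bias(raw: float, param: float, method: BiasMethod,
               clamp_min: Optional[float] = None) -> float:
    """Correct one forecast value.

    `param` is an offset for ADDITIVE and a ratio for MULTIPLICATIVE.
    `clamp_min`, if given, floors the corrected value (e.g. 0.0 for rainfall).
    """
    if method is BiasMethod.ADDITIVE:
        corrected = raw + param
    else:
        corrected = raw * param
    if clamp_min is not None:
        corrected = max(corrected, clamp_min)
    return corrected


# Temperature: additive offset, no clamp.
temp = apply_bias(70.0, -1.5, BiasMethod.ADDITIVE)
# Rainfall: multiplicative ratio, floored at zero.
rain = apply_bias(0.4, 1.25, BiasMethod.MULTIPLICATIVE, clamp_min=0.0)
```

A multiplicative ratio keeps a zero forecast at zero, which is one reason it suits non-negative variables such as rainfall and wind.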
Variable Registry (src/core/registry.py)
Variables are registered at import time via a decorator:
@register_variable("rainfall")
class Rainfall(WeatherVariable):
    ...
Lookup uses get_variable("rainfall") or aliases like get_variable("high") which resolves to "temperature_high".
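A decorator-based registry with alias resolution can be sketched as follows. This is an assumption-laden illustration rather than the contents of src/core/registry.py: the _REGISTRY and _ALIASES dictionaries, the choice to return a fresh instance, and the DailyHigh placeholder class are all hypothetical.

```python
from typing import Callable, Dict

_REGISTRY: Dict[str, type] = {}
# Alias table (assumed shape); maps short names to canonical registry names.
_ALIASES: Dict[str, str] = {
    "high": "temperature_high",
    "low": "temperature_low",
}


def register_variable(name: str) -> Callable[[type], type]:
    """Class decorator that records the class under `name` at import time."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_variable(name: str):
    """Resolve an alias or canonical name and return a new instance."""
    canonical = _ALIASES.get(name, name)
    if canonical not in _REGISTRY:
        raise KeyError(f"unknown weather variable: {name!r}")
    return _REGISTRY[canonical]()


@register_variable("temperature_high")
class DailyHigh:
    """Placeholder for a WeatherVariable subclass."""


variable = get_variable("high")  # resolves to the "temperature_high" entry
```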
Registered variables:
| Registry Name | Aliases | Distribution | Bias Method | Exceedance |
|---|---|---|---|---|
| temperature_high | high | Gaussian | Additive | above (P(T > threshold)) |
| temperature_low | low | Gaussian | Additive | below (P(T < threshold)) |
| rainfall | none | Gamma (zero-inflated) | Multiplicative | above (P(precip > threshold)) |
| wind_speed | none | Weibull | Multiplicative | above (P(wind > threshold)) |
| wind_gust | none | Weibull | Multiplicative | above (P(gust > threshold)) |
| irradiance | none | Beta | Multiplicative | below (P(CSI < threshold)) |
Distribution Functions (src/core/distribution.py)
Each distribution type has its own exceedance probability function:
| Distribution | Variable | Formula | Implementation |
|---|---|---|---|
| Gaussian | Temperature | P(X > T) = 1 - Phi((T - mu) / sigma) | gaussian_exceedance() |
| Gamma | Rainfall | P(X > T) = (1 - p_zero) * P(Gamma(a,b) > T) | gamma_exceedance() — zero-inflated for dry days |
| Weibull | Wind | P(X > T) = 1 - CDF_Weibull(T, k, lambda) | weibull_exceedance() |
| Beta | Irradiance | P(CSI > T) = 1 - CDF_Beta(T, alpha, beta) | beta_exceedance() — on clear-sky index [0,1] |
Distribution parameters are fitted from training data:
- fit_gamma_zero_inflated(values): MLE with method-of-moments fallback
- fit_weibull(values): scipy weibull_min.fit with moment-based fallback
- fit_beta(values): Beta fit with moment-based fallback
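Two of the exceedance formulas have closed forms that need only the standard library, and the zero-inflation step is a simple scaling. The sketch below uses hypothetical names (gaussian_sf, weibull_sf, zero_inflated_sf) and signatures chosen for illustration; they are not the functions in src/core/distribution.py. The Gamma and Beta tails need an incomplete gamma or beta function (for example from SciPy) and are left out.

```python
import math


def gaussian_sf(mu: float, sigma: float, threshold: float) -> float:
    """P(X > T) = 1 - Phi((T - mu) / sigma), via the complementary error function."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def weibull_sf(k: float, lam: float, threshold: float) -> float:
    """P(X > T) = exp(-(T / lambda)^k) for shape k and scale lambda."""
    if threshold <= 0.0:
        return 1.0  # a non-negative variable always exceeds a non-positive threshold
    return math.exp(-((threshold / lam) ** k))


def zero_inflated_sf(p_zero: float, wet_sf: float) -> float:
    """P(X > T) = (1 - p_zero) * P(wet amount > T), for T > 0.

    `wet_sf` is the survival probability of the wet-day distribution,
    computed elsewhere (e.g. from a fitted Gamma).
    """
    return (1.0 - p_zero) * wet_sf


p_temp = gaussian_sf(mu=80.0, sigma=3.0, threshold=83.0)   # one sigma above the mean
p_wind = weibull_sf(k=2.0, lam=20.0, threshold=20.0)       # threshold at the scale parameter
p_rain = zero_inflated_sf(p_zero=0.6, wet_sf=0.5)
```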
Engine Pipeline
Live Signal Generation (src/engine/live_signal_generator.py)
The LiveSignalGenerator class orchestrates the full pipeline from ensemble fetch to trade signal:
1. load_models()
   Load bias_params.json[model] for each NWP model
   Load calibration_model.pkl (isotonic regression)
   Extract prob_sigma = sqrt(brier_calibrated)
2. fetch_ensemble(target_date)
   Open-Meteo Ensemble API
   → [station, date, member, forecast_value]
3. bias_correct_ensemble()
   For each member: corrected = raw + get_bias(params, station, month)   [additive]
                 or corrected = raw * get_ratio(params, station, month)  [multiplicative]
   → [station, date, member, corrected_value]
4. compute_probabilities()
   Generate thresholds via var.generate_thresholds() or integer +/- STRIKES_PER_SIDE
   For each threshold: P(exceed) = count(members > threshold) / n_members
   → [station, date, threshold, raw_probability]
5. calibrate()
   calibrated_prob = isotonic_model.predict(raw_probability)
   → [station, date, threshold, calibrated_probability]
6. merge_market_prices() [optional]
   Left-join with IBKR quotes
   → [station, date, threshold, calibrated_probability, market_mid]
7. generate_trade_signals()
   edge = |calibrated_prob - market_price|
   direction = "YES" if calibrated > market, "NO" else
   zscore = edge / sqrt(prob_sigma^2 + market_noise^2)
   kelly = (p*b - q) / b * KELLY_FRACTION
   passes = (confidence >= 70) AND (zscore >= 1.5)
   → TradeSignal rows
Constructor signature:
LiveSignalGenerator(variable="high", models=["gfs_seamless"])
The variable parameter accepts strings ("high", "low", "rainfall", etc.) which resolve via the registry, or a WeatherVariable instance directly. When a WeatherVariable is provided, the generator delegates to its methods for ensemble fetching, bias correction, and probability computation.
Multi-model support: When multiple models are specified (e.g., ["gfs_seamless", "ecmwf_ifs025"]), each model has separate bias parameters trained from its own historical forecast-vs-observed data. The ensemble members from all models are pooled for member counting.
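Pooled member counting can be sketched as below. The function name pooled_exceedance and its input shape (a dict of bias-corrected members per model) are hypothetical, and applying PROB_CLAMP_ENSEMBLE directly after counting is an assumption about where the clamp sits in the pipeline.

```python
from typing import Dict, Sequence, Tuple

PROB_CLAMP_ENSEMBLE: Tuple[float, float] = (0.02, 0.98)  # live-mode bounds from config


def pooled_exceedance(members_by_model: Dict[str, Sequence[float]],
                      threshold: float,
                      above: bool = True,
                      clamp: Tuple[float, float] = PROB_CLAMP_ENSEMBLE) -> float:
    """Fraction of pooled, bias-corrected members beyond `threshold`, clamped."""
    pooled = [m for members in members_by_model.values() for m in members]
    if not pooled:
        raise ValueError("no ensemble members supplied")
    if above:
        hits = sum(1 for m in pooled if m > threshold)
    else:
        hits = sum(1 for m in pooled if m < threshold)
    raw = hits / len(pooled)
    lo, hi = clamp
    return min(max(raw, lo), hi)


members = {
    "gfs_seamless": [70.0, 72.0, 74.0, 76.0],
    "ecmwf_ifs025": [71.0, 73.0, 75.0, 77.0],
}
p_above_73 = pooled_exceedance(members, threshold=73.0)  # 4 of 8 members exceed 73
```

The clamp keeps a unanimous ensemble from emitting a probability of exactly 0 or 1, which a finite member count cannot justify.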
Edge Calculator (src/engine/edge_calculator.py)
Converts calibrated probabilities into sized trade signals:
Edge: edge = |calibrated_prob - market_price|
Z-Score: z = edge / sqrt(prob_sigma^2 + MARKET_NOISE_SIGMA^2)
Where prob_sigma = sqrt(Brier_calibrated) from Phase 1 training, and MARKET_NOISE_SIGMA = 0.05 (assumed pricing noise).
Kelly Criterion:
f* = (p * b - q) / b
where p = win_prob, q = 1-p, b = (1 - entry_price) / entry_price
position = min(f* * KELLY_FRACTION, MAX_POSITION_PCT)
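The edge, z-score, and Kelly formulas above can be combined into one worked example. This is a sketch under stated assumptions: size_trade is a hypothetical helper, prob_sigma = 0.08 is an illustrative value, the NO side is assumed to be entered at 1 - market_price with win probability 1 - calibrated_prob, and negative Kelly fractions are floored at zero.

```python
import math

KELLY_FRACTION = 0.25
MAX_POSITION_PCT = 0.05
MARKET_NOISE_SIGMA = 0.05


def size_trade(calibrated_prob: float, market_price: float, prob_sigma: float) -> dict:
    """Compute edge, direction, z-score and capped fractional-Kelly position.

    Assumes 0 < market_price < 1.
    """
    edge = abs(calibrated_prob - market_price)
    direction = "YES" if calibrated_prob > market_price else "NO"
    zscore = edge / math.sqrt(prob_sigma ** 2 + MARKET_NOISE_SIGMA ** 2)

    # Assumption: a NO trade buys the complementary contract.
    if direction == "YES":
        win_prob, entry_price = calibrated_prob, market_price
    else:
        win_prob, entry_price = 1.0 - calibrated_prob, 1.0 - market_price

    b = (1.0 - entry_price) / entry_price          # net odds per unit staked
    kelly = (win_prob * b - (1.0 - win_prob)) / b  # f* = (p*b - q) / b
    position = min(max(kelly, 0.0) * KELLY_FRACTION, MAX_POSITION_PCT)
    return {"edge": edge, "direction": direction, "zscore": zscore,
            "kelly": kelly, "position": position}


signal = size_trade(calibrated_prob=0.70, market_price=0.55, prob_sigma=0.08)
```

With these inputs the edge is 0.15, the z-score is about 1.59, and full Kelly is 1/3; quarter-Kelly gives roughly 8.3% of bankroll, so the 5% cap binds.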
Lead-time sigma scaling (Phase 3):
| Lead Time | Sigma Multiplier | Rationale |
|---|---|---|
| D+0 | 0.7x | Observations available; highest confidence |
| D+1 | 1.0x | Baseline calibration window |
| D+2 | 1.3x | Moderate uncertainty increase |
| D+3 | 1.6x | Approaching ensemble skill horizon |
| D+4+ | 2.0x | Large edge required to pass filters |
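One way to read the table is as a multiplier on prob_sigma before the z-score is computed. That placement is an assumption, as are the helper names scaled_sigma and lead_time_zscore; the multipliers themselves come from the table.

```python
import math

LEAD_TIME_SIGMA = {0: 0.7, 1: 1.0, 2: 1.3, 3: 1.6}
LEAD_TIME_SIGMA_MAX = 2.0  # applies to D+4 and beyond
MARKET_NOISE_SIGMA = 0.05


def scaled_sigma(prob_sigma: float, lead_days: int) -> float:
    """Scale the calibration sigma by the lead-time multiplier."""
    if lead_days < 0:
        raise ValueError("lead_days must be non-negative")
    return prob_sigma * LEAD_TIME_SIGMA.get(lead_days, LEAD_TIME_SIGMA_MAX)


def lead_time_zscore(edge: float, prob_sigma: float, lead_days: int) -> float:
    """z = edge / sqrt(scaled_sigma^2 + MARKET_NOISE_SIGMA^2)."""
    sigma = scaled_sigma(prob_sigma, lead_days)
    return edge / math.sqrt(sigma ** 2 + MARKET_NOISE_SIGMA ** 2)


z_d1 = lead_time_zscore(edge=0.15, prob_sigma=0.08, lead_days=1)
z_d4 = lead_time_zscore(edge=0.15, prob_sigma=0.08, lead_days=4)
```

Under these assumptions the same 0.15 edge scores about 1.59 at D+1 but only about 0.89 at D+4, so it would fall below the 1.5 z-score filter at the longer lead.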
Risk Scorer (src/engine/risk_scorer.py)
Commercial risk assessment engine for non-temperature variables. Converts probability curves into actionable risk levels (LOW, MODERATE, HIGH, EXTREME) with industry-specific labels.
Rainfall — Construction Delay Risk:
| Condition | Risk Level | Label |
|---|---|---|
| P(>0.5") >= 0.70 | EXTREME | Halt outdoor work |
| P(>0.25") >= 0.60 | HIGH | Delay concrete/paving |
| P(>0.10") >= 0.50 | MODERATE | Light rain, monitor |
| else | LOW | Dry conditions |
Irradiance — Solar Shortfall Risk:
| Condition | Risk Level | Label |
|---|---|---|
| P(CSI < 0.2) >= 0.60 | EXTREME | Very low generation |
| P(CSI < 0.4) >= 0.50 | HIGH | Significant shortfall |
| P(CSI < 0.6) >= 0.50 | MODERATE | Below-average generation |
| else | LOW | Normal/above-average |
Wind — Operational Risk:
| Condition | Risk Level | Label |
|---|---|---|
| P(>50 mph) >= 0.30 | EXTREME | Halt crane operations |
| P(>35 mph) >= 0.40 | HIGH | Curtail operations |
| P(>25 mph) >= 0.50 | MODERATE | Secure loose items |
| else | LOW | Light winds |
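The rule tables above can be read as ordered first-match rules, checked from most to least severe. A minimal sketch for the rainfall table follows; score_rainfall and the shape of the probability curve (a dict from threshold in inches to exceedance probability) are assumptions, not the risk_scorer.py interface.

```python
from typing import Dict, List, Optional, Tuple

# (threshold_inches, min_probability, risk_level, risk_label), most severe first.
RAINFALL_RULES: List[Tuple[float, float, str, str]] = [
    (0.5, 0.70, "EXTREME", "Halt outdoor work"),
    (0.25, 0.60, "HIGH", "Delay concrete/paving"),
    (0.10, 0.50, "MODERATE", "Light rain, monitor"),
]


def score_rainfall(prob_curve: Dict[float, float]) -> Dict[str, Optional[object]]:
    """Return the first rule whose probability condition is met, else LOW."""
    for threshold, min_prob, level, label in RAINFALL_RULES:
        p = prob_curve.get(threshold)
        if p is not None and p >= min_prob:
            return {"risk_level": level, "risk_label": label,
                    "trigger_threshold": threshold, "trigger_probability": p}
    return {"risk_level": "LOW", "risk_label": "Dry conditions",
            "trigger_threshold": None, "trigger_probability": None}


curve = {0.1: 0.90, 0.25: 0.65, 0.5: 0.30}
result = score_rainfall(curve)  # the 0.5" rule fails (0.30 < 0.70); the 0.25" rule fires
```

Ordering the rules by severity means a day that satisfies several conditions is reported at its highest level.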
Divergence Monitor (src/engine/divergence_monitor.py)
Detects forecast drift between the morning baseline and afternoon ensemble refresh:
DivergenceMonitor(threshold_pp=10) # 10 percentage-point default
check_divergence(baseline_signals, current_probs) -> DataFrame
# For each (station, date, threshold):
# shift_pp = (current_prob - morning_prob) * 100
# Action: REVERSE (direction flipped), FLAG (|shift| >= threshold), HOLD (normal)
Output is saved to data/signals/divergence_YYYY-MM-DD.csv and logged to the dashboard.
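The per-row classification can be sketched with plain floats. This is illustrative only: classify_shift is a hypothetical helper, and it assumes "direction" means the side of the market price the probability sits on, as in the trade-signal step.

```python
from typing import Tuple


def classify_shift(morning_prob: float, current_prob: float,
                   market_price: float, threshold_pp: float = 10.0) -> Tuple[str, float]:
    """Return (action, shift_pp) for one (station, date, threshold) row."""
    shift_pp = (current_prob - morning_prob) * 100.0

    # Assumption: direction is YES when the probability is above the market price.
    morning_dir = "YES" if morning_prob > market_price else "NO"
    current_dir = "YES" if current_prob > market_price else "NO"

    if morning_dir != current_dir:
        action = "REVERSE"
    elif abs(shift_pp) >= threshold_pp:
        action = "FLAG"
    else:
        action = "HOLD"
    return action, shift_pp


action, shift = classify_shift(morning_prob=0.70, current_prob=0.50, market_price=0.55)
```

REVERSE is checked first so that a flipped direction is reported even when the shift is smaller than the flag threshold.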
D+0 Intraday Blending (src/signals/d0_signal_generator.py)
On settlement day, the engine blends live hourly METAR observations with the morning ensemble forecast:
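Observation weight ramp: weight = min(hour / 18, 0.9), rising linearly from 0 at midnight and capped at 0.9. With the formula as written, the cap is reached at hour 16.2 (0.9 × 18), slightly before 6 PM.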
Blended probability: blended = obs_weight * metar_prob + (1 - obs_weight) * ensemble_prob
Hard overrides:
- DH: If running max already exceeds threshold, probability set to ~0.95+ (outcome near-certain)
- NLL: If running min already below threshold, probability set to ~0.95+
METAR-implied probability (DH):
    running_max > threshold        → 0.98 (already exceeded)
    hour >= 18 AND not exceeded    → 0.02 (too late)
    gap = threshold - running_max
    gap > 10°F                     → ~0.05 (unlikely to reach)
    gap <= 0°F                     → 0.95 (nearly certain)
    gap in [0, 10°F]               → interpolated based on gap and remaining hours
METAR data is fetched via src/signals/metar_live.py which parses IEM ASOS hourly data (state-specific networks, 3-letter station codes, tmpf field).
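The weight ramp and blend formulas translate directly into code. In this sketch the function names are hypothetical and `hour` is assumed to be the local hour of day (0 to 23, fractional values allowed).

```python
def observation_weight(hour: float) -> float:
    """weight = min(hour / 18, 0.9): linear ramp from midnight, capped at 0.9."""
    return min(hour / 18.0, 0.9)


def blend_probability(metar_prob: float, ensemble_prob: float, hour: float) -> float:
    """blended = w * metar_prob + (1 - w) * ensemble_prob."""
    w = observation_weight(hour)
    return w * metar_prob + (1.0 - w) * ensemble_prob


# At 09:00 the weight is 0.5, so observations and ensemble count equally.
blended = blend_probability(metar_prob=0.98, ensemble_prob=0.60, hour=9)
```

Because the weight never reaches 1.0, the morning ensemble always retains at least a 10% share of the blended probability.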
Design Principles
Calibration over accuracy. The engine optimises for calibrated probability (Brier score, reliability diagram) rather than point forecast accuracy (RMSE, MAE). A perfectly calibrated engine that says "70% chance" and is right 70% of the time is more valuable for risk products than a model with lower RMSE but unknown calibration.
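As a concrete illustration of the calibration metric, the Brier score is the mean squared difference between forecast probabilities and binary outcomes. The sketch below uses hypothetical helper names and made-up data for the "70% chance, right 70% of the time" case.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of (forecast probability - outcome)^2, with outcomes in {0, 1}."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def observed_frequency(outcomes: Sequence[int]) -> float:
    """Fraction of events that occurred; compare against the forecast probability."""
    return sum(outcomes) / len(outcomes)


# Ten forecasts of 70%, of which seven verified (illustrative data).
probs = [0.7] * 10
outcomes = [1] * 7 + [0] * 3

score = brier_score(probs, outcomes)    # (7 * 0.09 + 3 * 0.49) / 10
freq = observed_frequency(outcomes)     # matches the 0.7 forecast: well calibrated
prob_sigma = math.sqrt(score)           # the quantity the edge calculator uses
```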
Synthesis over data. The engine doesn't own proprietary weather stations or run its own NWP model. Its value is in combining public NWP output with proprietary satellite data (SatSure) and behavioural signals (grid load) — then calibrating the result to produce reliable probabilities.
Adapter, not rewrite. Temperature implementations delegate to existing functions (Phase 0-3). New variables (rainfall, wind, irradiance) plug in via the WeatherVariable interface without touching the existing codebase. Zero code duplication.
Audit trail by default. Every prediction stores its full provenance: timestamp, NWP sources ingested, bias correction parameters, ensemble weights, raw ensemble probability, and calibrated output. This is the chain of custody required for Tier 2 warranty verification.
Processing Cycle
The engine runs four times daily, aligned to NWP model runs:
| Run | NWP Cycle | Ingest Time | Output Available |
|---|---|---|---|
| 1 | 00Z | ~03:30 UTC | ~04:00 UTC |
| 2 | 06Z | ~09:30 UTC | ~10:00 UTC |
| 3 | 12Z | ~15:30 UTC | ~16:00 UTC |
| 4 | 18Z | ~21:30 UTC | ~22:00 UTC |
Each cycle processes the latest NWP ensemble data, applies bias corrections, recalculates ensemble weights based on recent verification performance, and produces updated calibrated probabilities for all monitored locations and variables.
Daily automation pipeline (scripts/daily_pipeline.sh, 8 steps):
| Step | Script | Description |
|---|---|---|
| 1a | daily_scoring.py --variable high | Score yesterday's DH predictions against IEM observed temps |
| 1b | daily_scoring.py --variable low | Score yesterday's NLL predictions against IEM observed lows |
| 2a | run_phase2.py --signal-only --variable high --models gfs_seamless ecmwf_ifs025 | Generate tomorrow's DH signals (multi-model) |
| 2b | run_phase2.py --signal-only --variable low --models gfs_seamless ecmwf_ifs025 | Generate tomorrow's NLL signals |
| 3 | run_d0_update.py | D+0 METAR blending for settlement-day contracts |
| 4 | run_variable_signals.py --variable rainfall --risk-scores | Rainfall signals + construction delay risk scores |
| 5 | run_variable_signals.py --variable wind_speed --risk-scores | Wind signals + operational risk scores |
| 6 | run_variable_signals.py --variable irradiance --risk-scores | Irradiance signals + solar shortfall risk scores |
| 7 | run_divergence.py | Forecast drift detection against morning baseline |
| 8 | export_dashboard_data.py | Export all signals, scores, and calibration data to dashboard JSON |
Midday pipeline (scripts/midday_pipeline.sh, triggered via LaunchAgent at noon local time):
| Step | Script | Description |
|---|---|---|
| 1 | run_d0_update.py --variable high | Bayesian-blend METAR observations with morning DH forecast |
| 2 | run_d0_update.py --variable low | Same for NLL contracts |
| 3 | run_divergence.py --variable high | Flag >10pp shifts or direction reversals |
| 4 | run_divergence.py --variable low | Same for NLL contracts |
Configuration Constants (config/settings.py)
Trading Parameters
| Constant | Value | Purpose |
|---|---|---|
| CONFIDENCE_THRESHOLD | 70 | Minimum confidence (edge * 100) to pass filters |
| KELLY_FRACTION | 0.25 | Fractional Kelly multiplier (quarter-Kelly) |
| MAX_POSITION_PCT | 0.05 | Maximum 5% of bankroll per trade |
| ZSCORE_MINIMUM | 1.5 | Minimum z-score to pass filters |
| MARKET_NOISE_SIGMA | 0.05 | Assumed market pricing noise |
Probability Bounds
| Constant | Value | Context |
|---|---|---|
PROB_CLAMP_GAUSSIAN | (0.001, 0.999) | Backtest mode (Gaussian CDF) |
PROB_CLAMP_ENSEMBLE | (0.02, 0.98) | Live mode (ensemble member counting) |
Variable-Specific Thresholds
| Variable | Thresholds | Unit |
|---|---|---|
| Temperature | Integer +/- STRIKES_PER_SIDE (4) around mean | °F |
| Rainfall | [0.01, 0.1, 0.25, 0.5, 1.0, 2.0] | inches |
| Wind Speed | [15, 20, 25, 30, 35, 40, 50] | mph |
| Wind Gust | [25, 30, 40, 50, 60] | mph |
| Irradiance | [0.2, 0.4, 0.6, 0.8, 1.0] | Clear-sky index |
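For temperature, "integer +/- STRIKES_PER_SIDE around mean" could be implemented as below. This is a guess at the behaviour, not the engine's generate_thresholds: the function name, the rounding rule, and the inclusion of the centre strike are all assumptions.

```python
from typing import List

STRIKES_PER_SIDE = 4


def integer_thresholds(mean: float, strikes_per_side: int = STRIKES_PER_SIDE) -> List[float]:
    """Integer strikes centred on the rounded ensemble mean.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    center = round(mean)
    return [float(center + k) for k in range(-strikes_per_side, strikes_per_side + 1)]


strikes = integer_thresholds(72.4)  # nine strikes centred on 72
```

Variables with threshold_integer set to False would instead use the fixed lists in the table.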
Risk API (src/api/risk_api.py)
FastAPI service providing commercial risk endpoints:
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/health | GET | Health check |
| /api/v1/risk/{city}/{date} | GET | All variable risks for a city/date |
| /api/v1/risk/{city}/{date}/{variable} | GET | Specific variable risk |
| /api/v1/probability/{variable}/{city}/{date} | GET | Full probability curve |
Response schema:
{
  "city": "KJFK",
  "date": "2026-04-05",
  "risks": [
    {
      "variable": "rainfall",
      "risk_level": "HIGH",
      "risk_label": "Delay concrete/paving",
      "trigger_threshold": 0.25,
      "trigger_probability": 0.65,
      "detail": "P(>0.25 inches) = 65.0%"
    }
  ]
}
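A client might parse this response into typed records as follows. The VariableRisk dataclass and parse_risk_response helper are hypothetical client-side code, with field names taken from the schema above; the payload is the sample response, not live data.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VariableRisk:
    variable: str
    risk_level: str
    risk_label: str
    trigger_threshold: float
    trigger_probability: float
    detail: str


def parse_risk_response(payload: str) -> List[VariableRisk]:
    """Decode the JSON body and build one VariableRisk per entry in "risks"."""
    body = json.loads(payload)
    return [VariableRisk(**item) for item in body["risks"]]


sample = """
{
  "city": "KJFK",
  "date": "2026-04-05",
  "risks": [
    {
      "variable": "rainfall",
      "risk_level": "HIGH",
      "risk_label": "Delay concrete/paving",
      "trigger_threshold": 0.25,
      "trigger_probability": 0.65,
      "detail": "P(>0.25 inches) = 65.0%"
    }
  ]
}
"""

risks = parse_risk_response(sample)
high_or_worse = [r for r in risks if r.risk_level in ("HIGH", "EXTREME")]
```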
System Status API (src/api/system_api.py)
Read-only endpoints for dashboard visibility into engine state. Mounted on the same FastAPI app.
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/system/variables | GET | Variable registry with training status, Brier scores, last trained dates |
| /api/v1/system/pipeline | GET | Parse latest daily_*.log into step statuses (success/failed/not_run) |
| /api/v1/system/config | GET | Read-only dump of trading, settlement, backtest, and lead-time sigma settings |
The /system/variables endpoint checks Phase 1 artifacts (bias_params.json, calibration_model.pkl, phase1_summary.json) for each registered variable to determine training status. The /system/pipeline endpoint parses the most recent pipeline log file using regex pattern matching on the [timestamp] Step N: message format.
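Parsing the [timestamp] Step N: message format might look like the sketch below. The exact regex, the sample log lines, and the rule that a message containing "fail" marks a step as failed are all assumptions made for illustration; the real log wording is not shown in this document.

```python
import re
from typing import Dict, Iterable

# Matches lines such as "[2026-04-05 06:00:01] Step 2a: message".
STEP_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+Step\s+(?P<step>\d+[a-z]?):\s*(?P<msg>.*)$"
)


def parse_pipeline_log(text: str, expected_steps: Iterable[str]) -> Dict[str, str]:
    """Map each expected step to success, failed, or not_run."""
    statuses = {step: "not_run" for step in expected_steps}
    for line in text.splitlines():
        match = STEP_LINE.match(line.strip())
        if match is None:
            continue  # ignore lines that are not step markers
        # Assumption: failures are identifiable by the word "fail" in the message.
        failed = "fail" in match.group("msg").lower()
        statuses[match.group("step")] = "failed" if failed else "success"
    return statuses


log_text = """\
[2026-04-05 06:00:01] Step 1a: scoring complete
[2026-04-05 06:02:10] Step 2a: FAILED - upstream timeout
unrelated diagnostic output
"""

statuses = parse_pipeline_log(log_text, expected_steps=["1a", "2a", "3"])
```

Seeding the result with not_run means steps that never logged a line are still reported.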
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Ingest service | Python | Pulls NWP data via Open-Meteo API, SatSure API, IBKR API |
| Time-series store | TimescaleDB | Native time-series PostgreSQL extension |
| Engine pipeline | Python | Bias correction, ensemble weighting, calibration, risk scoring |
| API layer | FastAPI | REST endpoints for dashboard, risk API, probability curves |
| Dashboard | React + Tailwind | Client-facing interface (10 tabs incl. System status) |
| Deployment | Docker | Containerised on single VPS (Kubernetes deferred) |
Source Code Map
| Directory | Contents |
|---|---|
| src/core/ | variable.py (ABC + config), distribution.py (Gaussian/Gamma/Weibull/Beta), registry.py (decorator + factory) |
| src/variables/temperature/ | daily_high.py, nighttime_low.py: Gaussian, additive bias |
| src/variables/rainfall/ | rainfall.py: zero-inflated Gamma, multiplicative bias, mm-to-inches conversion |
| src/variables/wind/ | wind.py, wind_gust.py: Weibull, multiplicative bias, knots-to-mph conversion |
| src/variables/irradiance/ | irradiance.py, clear_sky.py: Beta on CSI, ERA5 ground truth, solar geometry model |
| src/engine/ | live_signal_generator.py, edge_calculator.py, risk_scorer.py, divergence_monitor.py |
| src/signals/ | d0_signal_generator.py (METAR blending), metar_live.py (IEM hourly fetch) |
| src/models/ | bias_correction.py (additive + multiplicative), calibration.py (isotonic), ensemble_probability.py (generic backtest) |
| src/data_ingestion/ | open_meteo.py (NWP forecasts), iem_observed.py (IEM highs/lows/precip/wind), open_meteo_reanalysis.py (ERA5 irradiance) |
| src/api/ | risk_api.py (FastAPI risk endpoints), system_api.py (system status: variables, pipeline, config) |
| scripts/ | daily_pipeline.sh, run_phase2.py, run_d0_update.py, run_divergence.py, run_variable_signals.py, run_phase_generic.py |
| config/ | settings.py (all constants, paths, thresholds), cities.json (10-city roster) |