Competitive Benchmarking
How Cliff Horizon's architecture compares to existing weather prediction market tools
Approach
Three categories of competitor exist in the weather prediction space: proprietary NWP model companies, long-range ML forecasters, and emerging prediction market tools. Cliff Horizon benchmarks against all three — but the prediction market tools are the most relevant comparison for the ForecastEx proving ground, because they validate the core methodology and reveal the specific gaps the engine exploits.
Prediction Market Bot & Tool Landscape
Updated 5 April 2026 — comprehensive competitor research.
Several open-source and commercial tools trade weather prediction markets using NWP ensemble data. Their techniques validate the approach (NWP ensemble → probabilistic edge → Kelly sizing) but they are all trading-as-product — profit is the objective, not calibration accuracy. None extend to multi-variable risk quantification, medium-range forecasting, or commercial weather risk protection.
Weather Intelligence Platforms
| Tool | Technique | Key Feature |
|---|---|---|
| ClimateSight | Custom ML model trained on co-located ASOS station data | EV calculators, 10-year backtesting, rounding simulator, multi-model comparison (HRRR, GFS, NAM). Most sophisticated of the group. |
| Wethr.net | 16+ data layers (METAR, extremes, CLI), 16+ model comparison | Real-time analytics for Kalshi/Polymarket. Freemium (3-min delay free, real-time paid). |
| minuteTemp | 60-second updates across GFS/HRRR/ECMWF/NAM/NBM/ICON | Speed play — competitors lag ~20 min. REST + WebSocket APIs for bots. Launched March 2026. |
| DailyDewpoint | Auto-logs every NWS issuance, ranks accuracy per issuance time | Audit trail play — answers "which forecast issuance is most accurate?" $5/month. |
Trading Bots & AI Agent Platforms
| Tool | Technique | Key Feature |
|---|---|---|
| GFS ensemble bots | 31-member GFS ensemble via Open-Meteo, member counting for probability | Kelly criterion sizing, 8% edge threshold. Paper trading only ($1.8k simulated). |
| Simmer / SpartanLabs | Unified Python SDK for AI agent trading on Polymarket/Kalshi | Pre-built weather trader skill (gopfan2-style). Safety rails, conflict detection. |
| gopfan2 (benchmark) | Buy YES < $0.15, NO > $0.45. Exploits NWS update latency vs market price. | Reportedly $2M+ profit from weather markets. Not a forecasting engine — a market microstructure trader. |
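The ensemble-bot recipe in the table above (member counting for probability, a minimum edge threshold, Kelly sizing) can be sketched in a few lines. This is a minimal illustration, not any specific bot's code: the function names, the half-Kelly scaling, and the binary-payout assumption are ours.

```python
def member_probability(members, threshold):
    """Fraction of ensemble members whose forecast exceeds the contract threshold."""
    return sum(1 for m in members if m > threshold) / len(members)

def kelly_fraction(p, price):
    """Full-Kelly stake fraction for a binary contract bought at `price`
    (payout 1 if the event occurs, 0 otherwise)."""
    b = (1.0 - price) / price      # net odds received per unit staked
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)

def size_position(members, threshold, price, min_edge=0.08, kelly_scale=0.5):
    """Half-Kelly stake, or 0 when the edge is below the minimum threshold."""
    p = member_probability(members, threshold)
    if p - price < min_edge:       # the 8% edge threshold described above
        return 0.0
    return kelly_scale * kelly_fraction(p, price)
```

With a 31-member ensemble where 20 members clear the strike and a market price of $0.50, the model probability is roughly 0.65, the edge clears 8%, and the bot stakes about half of the 29% full-Kelly fraction.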
Critical assessment: Every platform operates exclusively at the daily horizon for US city temperature. None track reliability diagrams or Brier scores. None produce calibrated probability distributions. None operate at the 7–30 day horizon. None have a commercial path beyond trading PnL or subscriptions.
Upstream AI Weather Models (Open Source)
NOAA deployed operational AI weather models (fine-tuned GraphCast) in December 2025. GenCast (Google DeepMind) produces calibrated 15-day ensemble forecasts. ECMWF AIFS operational since February 2025. FuXi-S2S produces 42-day subseasonal forecasts. These are inputs to the engine — better upstream models make the calibration layer more valuable, not less. See phase-0-reference.md Section 7.10 for the full model inventory.
Where Cliff Horizon Exceeds Existing Tools
| Gap in Existing Tools | Cliff Horizon's Answer |
|---|---|
| No bias correction — raw ensemble counting or learned sigma only, no directional correction | City × month × regime bias correction from rolling 90-day calibration window; corrects both mean and sigma |
| No ensemble weighting — single model or equal-weight only | Inverse-error weighted multi-model ensemble (GFS + ECMWF + HRRR); regime-dependent weights |
| No behavioural signals — none use grid load or any non-NWP signal | ISO grid load anomaly as temperature modifier for cities with ISO coverage |
| No satellite ground truth — all calibrate against NWS observations only | SatSure Sparta API as Layer 1 — proprietary satellite-derived ground truth for bias correction |
| No position correlation management — each market treated independently | City-level exposure caps; maximum concurrent positions across correlated markets |
| No multi-variable capability — temperature only, no path to rainfall/irradiance/wind | WeatherVariable base class generalises across all variables; same pipeline, different distributions |
| No commercial product path — trading P&L is the end goal | Trading is the credential; the product is risk protection via Ensuro parametric derivatives |
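The first two rows above (bias correction plus inverse-error weighting) combine into one post-processing step. A hypothetical sketch, with weights proportional to 1/RMSE²; in practice the per-model RMSE figures and the mean-bias term would come from the rolling 90-day calibration window, and the numbers here are placeholders:

```python
def inverse_error_weights(rmse_by_model):
    """Normalised weights proportional to 1/RMSE^2: lower-error models dominate."""
    inv = {m: 1.0 / (e ** 2) for m, e in rmse_by_model.items()}
    total = sum(inv.values())
    return {m: w / total for m, w in inv.items()}

def blended_mean(forecasts, rmse_by_model, mean_bias=0.0):
    """Inverse-error weighted multi-model mean, shifted by a
    city x month mean-bias correction (positive bias = model runs warm)."""
    w = inverse_error_weights(rmse_by_model)
    return sum(w[m] * forecasts[m] for m in forecasts) - mean_bias
```

Regime-dependent weighting would swap in a different `rmse_by_model` table per regime; the sigma correction mentioned in the table would rescale the ensemble spread in the same pass.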
Techniques Absorbed
The engine incorporates validated techniques from competitor analysis:
Brier score calibration loop — track (model_probability, outcome) per bucket, recalibrate weekly. This is the core metric that proves calibration quality and feeds the reliability diagram.
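A minimal sketch of such a loop, assuming records are stored as (model_probability, outcome) pairs; the bucket width and helper names are illustrative:

```python
from collections import defaultdict

def brier_score(records):
    """Mean squared error between forecast probability and binary outcome."""
    return sum((p - o) ** 2 for p, o in records) / len(records)

def reliability_buckets(records, width=0.1):
    """Group (probability, outcome) records into probability buckets.
    Each bucket maps to (mean forecast probability, observed frequency, count);
    a well-calibrated model has the first two roughly equal per bucket."""
    buckets = defaultdict(list)
    n_buckets = int(1 / width)
    for p, o in records:
        buckets[min(int(p / width), n_buckets - 1)].append((p, o))
    out = {}
    for k, recs in sorted(buckets.items()):
        mean_p = sum(p for p, _ in recs) / len(recs)
        obs_freq = sum(o for _, o in recs) / len(recs)
        out[k] = (mean_p, obs_freq, len(recs))
    return out
```

The bucket table is exactly the data behind a reliability diagram; the weekly recalibration step would adjust each bucket's probabilities toward its observed frequency.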
Z-score significance testing — edge magnitude alone is insufficient. A 10% edge with σ = 15% is noise; a 10% edge with σ = 3% is a signal. Minimum z-score threshold of 1.5 filters false positive trades.
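The filter itself is direct. In this sketch, `sigma` stands for the estimated standard error of the edge; the function names are illustrative:

```python
def edge_z_score(model_p, market_price, sigma):
    """Edge expressed in units of its own uncertainty."""
    return (model_p - market_price) / sigma

def passes_significance(model_p, market_price, sigma, min_z=1.5):
    """Trade only when the edge is at least min_z sigmas away from zero."""
    return abs(edge_z_score(model_p, market_price, sigma)) >= min_z
```

Running the worked example from the paragraph above: a 10% edge with sigma 15% gives z of about 0.67 and is rejected, while the same edge with sigma 3% gives z of about 3.3 and passes.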
RMSE-by-lead-time confidence scaling — wider sigma for longer lead forecasts. Lead time is a first-order driver of forecast uncertainty. The engine scales confidence by D+0 through D+7 lead time, not just city and month.
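One simple way to encode this is a multiplicative per-day widening of sigma. The 12% daily growth rate below is a placeholder for illustration, not a calibrated value; a production version would look up empirical RMSE per lead day instead:

```python
def lead_time_sigma(base_sigma, lead_days, daily_growth=0.12):
    """Widen the D+0 sigma multiplicatively per day of lead time:
    D+0 returns base_sigma unchanged, D+7 is substantially wider."""
    return base_sigma * (1.0 + daily_growth) ** lead_days
```

The same scaled sigma then feeds the z-score filter above, so longer-lead trades need a proportionally larger raw edge to qualify.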
METAR blending for D+0 — on the settlement day itself, live surface observations progressively shift the probability distribution toward observed reality. This enables mid-day position adjustments when the market is still liquid.
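A toy version of the idea, using a linear ramp from forecast to observation as settlement approaches. The ramp shape and parameter names are ours, and a real implementation for a daily-high market would additionally treat the observed running max as a hard lower bound on the final settlement; this sketch does not:

```python
def blend_d0(forecast_mu, forecast_sigma, obs_running_max, hours_remaining,
             day_length=24.0):
    """On settlement day, shift the distribution toward live observations.
    As hours_remaining shrinks, the observation weight grows linearly
    and sigma tightens toward zero."""
    w_obs = 1.0 - hours_remaining / day_length
    mu = w_obs * obs_running_max + (1.0 - w_obs) * forecast_mu
    sigma = (1.0 - w_obs) * forecast_sigma
    return mu, sigma
```

At the start of the day the forecast dominates; by the final observation the distribution has collapsed onto the observed value, which is what makes mid-day repricing against a still-liquid market possible.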
Smart source selection — HRRR/NAM for near-term US forecasts (higher resolution, shorter horizon), ECMWF for longer range. Absorbed into the ensemble weighting framework as a pre-filter before inverse-error weighting.
Forecast-divergence monitoring — if the latest NWP run shifts the engine probability by more than 10 percentage points from entry, the position is flagged for review.
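The check itself is a one-liner; the 10-percentage-point threshold comes from the paragraph above, and the names are illustrative:

```python
def flag_divergence(entry_p, latest_p, threshold=0.10):
    """Flag a position for review when the newest NWP run moves the
    engine probability more than `threshold` from the entry probability."""
    return abs(latest_p - entry_p) > threshold
```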
Incumbent Comparison
Beyond prediction market tools, the engine is positioned against established weather intelligence companies:
| Capability | Climavision | Meteomatics | Xweather (Vaisala) | DTN | Salient | Cliff Horizon |
|---|---|---|---|---|---|---|
| Core forecasting asset | 0.67–2 km proprietary NWP | 1 km / 15-min proprietary NWP | Settlement data | ProphetX | S2S ML | Post-processing layer |
| Multi-variable risk integration | No | No | No | No | No | Yes — temperature + rainfall + irradiance + wind |
| Prediction market validation | No | No | No | No | No | Yes — ForecastEx live track record |
| Satellite data layer | No | No | No | No | No | Yes — SatSure Sparta |
| Emerging market focus | CONUS | Europe/US | Global stations | US/Europe | Global | SEA, India, Middle East, Africa |
| Parametric derivative pricing | No | No | Index provision | No | No | Yes — Ensuro Risk Module |
| Cash-paying warranty | No | No | No | No | No | Yes — Tier 2 warranted analytics |
What Is Not an Edge
Cliff Horizon does not compete on short-term US temperature forecasting accuracy against Climavision (0.67 km resolution) or Meteomatics (1 km, 15-minute temporal). The ForecastEx proving ground requires the engine to be better calibrated than the market price — which is set by retail traders, simple GFS bots, and informed participants. The bar is "beat the crowd," not "beat Climavision."
Similarly, Xweather has 100,000 ground stations and DTN has the largest private meteorological team. The engine's value is in synthesis — combining public NWP + SatSure + behavioural signals into calibrated probability — not in owning the data.