Competitive Benchmarking
How Cliff Horizon's architecture compares to existing weather prediction market tools
Approach
Three categories of competitor exist in the weather prediction space: proprietary NWP model companies, long-range ML forecasters, and emerging prediction market tools. Cliff Horizon benchmarks against all three — but the prediction market tools are the most relevant comparison for the ForecastEx proving ground, because they validate the core methodology and reveal the specific gaps the engine exploits.
Prediction Market Bot & Tool Landscape
Updated 5 April 2026 — comprehensive competitor research.
Several open-source and commercial tools trade weather prediction markets using NWP ensemble data. Their techniques validate the approach (NWP ensemble → probabilistic edge → Kelly sizing) but they are all trading-as-product — profit is the objective, not calibration accuracy. None extend to multi-variable risk quantification, medium-range forecasting, or commercial weather risk protection.
Weather Intelligence Platforms
| Tool | Technique | Key Feature |
|---|---|---|
| ClimateSight | Custom ML model trained on co-located ASOS station data | EV calculators, 10-year backtesting, rounding simulator, multi-model comparison (HRRR, GFS, NAM). Most sophisticated of the group. |
| Wethr.net | 16+ data layers (METAR, extremes, CLI), 16+ model comparison | Real-time analytics for Kalshi/Polymarket. Freemium (3-min delay free, real-time paid). |
| minuteTemp | 60-second updates across GFS/HRRR/ECMWF/NAM/NBM/ICON | Speed play — competitors lag ~20 min. REST + WebSocket APIs for bots. Launched March 2026. |
| DailyDewpoint | Auto-logs every NWS issuance, ranks accuracy per issuance time | Audit trail play — answers "which forecast issuance is most accurate?" $5/month. |
Trading Bots & AI Agent Platforms
| Tool | Technique | Key Feature |
|---|---|---|
| GFS ensemble bots | 31-member GFS ensemble via Open-Meteo, member counting for probability | Kelly criterion sizing, 8% edge threshold. Paper trading only ($1.8k simulated). |
| Simmer / SpartanLabs | Unified Python SDK for AI agent trading on Polymarket/Kalshi | Pre-built weather trader skill (gopfan2-style). Safety rails, conflict detection. |
| gopfan2 (benchmark) | Buy YES < $0.15, NO > $0.45. Exploits NWS update latency vs market price. | Reportedly $2M+ profit from weather markets. Not a forecasting engine — a market microstructure trader. |
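The ensemble-bot recipe in the table above (member counting for probability, a minimum edge threshold, Kelly sizing) can be sketched in a few lines. This is a minimal illustration, not any specific bot's code: the function names, the half-Kelly scaling, and the binary-payout assumption are ours.

```python
def member_probability(members, threshold):
    """Fraction of ensemble members whose forecast exceeds the contract threshold."""
    return sum(1 for m in members if m > threshold) / len(members)

def kelly_fraction(p, price):
    """Full-Kelly stake fraction for a binary contract bought at `price`
    (payout 1 if the event occurs, 0 otherwise)."""
    b = (1.0 - price) / price      # net odds received per unit staked
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)

def size_position(members, threshold, price, min_edge=0.08, kelly_scale=0.5):
    """Half-Kelly stake, or 0 when the edge is below the minimum threshold."""
    p = member_probability(members, threshold)
    if p - price < min_edge:       # the 8% edge threshold described above
        return 0.0
    return kelly_scale * kelly_fraction(p, price)
```

With a 31-member ensemble where 20 members clear the strike and a market price of $0.50, the model probability is roughly 0.65, the edge clears 8%, and the bot stakes about half of the 29% full-Kelly fraction.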
Critical assessment: Every platform operates exclusively at the daily horizon for US city temperature. None track reliability diagrams or Brier scores. None produce calibrated probability distributions. None operate at the 7–30 day horizon. None have a commercial path beyond trading PnL or subscriptions.
Upstream AI Weather Models (Open Source)
NOAA deployed operational AI weather models (fine-tuned GraphCast) in December 2025. GenCast (Google DeepMind) produces calibrated 15-day ensemble forecasts. ECMWF AIFS operational since February 2025. FuXi-S2S produces 42-day subseasonal forecasts. These are inputs to the engine — better upstream models make the calibration layer more valuable, not less. See phase-0-reference.md Section 7.10 for the full model inventory.
Where Cliff Horizon Exceeds Existing Tools
| Gap in Existing Tools | Cliff Horizon's Answer |
|---|---|
| No bias correction — raw ensemble counting or learned sigma only, no directional correction | City × month × regime bias correction from rolling 90-day calibration window; corrects both mean and sigma |
| No ensemble weighting — single model or equal-weight only | Inverse-error weighted multi-model ensemble (GFS + ECMWF + HRRR); regime-dependent weights |
| No behavioural signals — none use grid load or any non-NWP signal | ISO grid load anomaly as temperature modifier for cities with ISO coverage |
| No satellite ground truth — all calibrate against NWS observations only | SatSure Sparta API as Layer 1 — proprietary satellite-derived ground truth for bias correction |
| No position correlation management — each market treated independently | City-level exposure caps; maximum concurrent positions across correlated markets |
| No multi-variable capability — temperature only, no path to rainfall/irradiance/wind | WeatherVariable base class generalises across all variables; same pipeline, different distributions |
| No commercial product path — trading P&L is the end goal | Trading is the credential; the product is risk protection via Ensuro parametric derivatives |
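The first two rows above (bias correction plus inverse-error weighting) combine into one post-processing step. A hypothetical sketch, with weights proportional to 1/RMSE²; in practice the per-model RMSE figures and the mean-bias term would come from the rolling 90-day calibration window, and the numbers here are placeholders:

```python
def inverse_error_weights(rmse_by_model):
    """Normalised weights proportional to 1/RMSE^2: lower-error models dominate."""
    inv = {m: 1.0 / (e ** 2) for m, e in rmse_by_model.items()}
    total = sum(inv.values())
    return {m: w / total for m, w in inv.items()}

def blended_mean(forecasts, rmse_by_model, mean_bias=0.0):
    """Inverse-error weighted multi-model mean, shifted by a
    city x month mean-bias correction (positive bias = model runs warm)."""
    w = inverse_error_weights(rmse_by_model)
    return sum(w[m] * forecasts[m] for m in forecasts) - mean_bias
```

Regime-dependent weighting would swap in a different `rmse_by_model` table per regime; the sigma correction mentioned in the table would rescale the ensemble spread in the same pass.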
Techniques Absorbed
The engine incorporates validated techniques from competitor analysis:
Brier score calibration loop — track (model_probability, outcome) per bucket, recalibrate weekly. This is the core metric that proves calibration quality and feeds the reliability diagram.
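A minimal sketch of such a loop, assuming records are stored as (model_probability, outcome) pairs; the bucket width and helper names are illustrative:

```python
from collections import defaultdict

def brier_score(records):
    """Mean squared error between forecast probability and binary outcome."""
    return sum((p - o) ** 2 for p, o in records) / len(records)

def reliability_buckets(records, width=0.1):
    """Group (probability, outcome) records into probability buckets.
    Each bucket maps to (mean forecast probability, observed frequency, count);
    a well-calibrated model has the first two roughly equal per bucket."""
    buckets = defaultdict(list)
    n_buckets = int(1 / width)
    for p, o in records:
        buckets[min(int(p / width), n_buckets - 1)].append((p, o))
    out = {}
    for k, recs in sorted(buckets.items()):
        mean_p = sum(p for p, _ in recs) / len(recs)
        obs_freq = sum(o for _, o in recs) / len(recs)
        out[k] = (mean_p, obs_freq, len(recs))
    return out
```

The bucket table is exactly the data behind a reliability diagram; the weekly recalibration step would adjust each bucket's probabilities toward its observed frequency.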
Z-score significance testing — edge magnitude alone is insufficient. A 10% edge with σ = 15% is noise; a 10% edge with σ = 3% is a signal. Minimum z-score threshold of 1.5 filters false positive trades.
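The filter itself is direct. In this sketch, `sigma` stands for the estimated standard error of the edge; the function names are illustrative:

```python
def edge_z_score(model_p, market_price, sigma):
    """Edge expressed in units of its own uncertainty."""
    return (model_p - market_price) / sigma

def passes_significance(model_p, market_price, sigma, min_z=1.5):
    """Trade only when the edge is at least min_z sigmas away from zero."""
    return abs(edge_z_score(model_p, market_price, sigma)) >= min_z
```

Running the worked example from the paragraph above: a 10% edge with sigma 15% gives z of about 0.67 and is rejected, while the same edge with sigma 3% gives z of about 3.3 and passes.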
RMSE-by-lead-time confidence scaling — wider sigma for longer lead forecasts. Lead time is a first-order driver of forecast uncertainty. The engine scales confidence by D+0 through D+7 lead time, not just city and month.
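One simple way to encode this is a multiplicative per-day widening of sigma. The 12% daily growth rate below is a placeholder for illustration, not a calibrated value; a production version would look up empirical RMSE per lead day instead:

```python
def lead_time_sigma(base_sigma, lead_days, daily_growth=0.12):
    """Widen the D+0 sigma multiplicatively per day of lead time:
    D+0 returns base_sigma unchanged, D+7 is substantially wider."""
    return base_sigma * (1.0 + daily_growth) ** lead_days
```

The same scaled sigma then feeds the z-score filter above, so longer-lead trades need a proportionally larger raw edge to qualify.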
METAR blending for D+0 — on the settlement day itself, live surface observations progressively shift the probability distribution toward observed reality. This enables mid-day position adjustments when the market is still liquid.
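A toy version of the idea, using a linear ramp from forecast to observation as settlement approaches. The ramp shape and parameter names are ours, and a real implementation for a daily-high market would additionally treat the observed running max as a hard lower bound on the final settlement; this sketch does not:

```python
def blend_d0(forecast_mu, forecast_sigma, obs_running_max, hours_remaining,
             day_length=24.0):
    """On settlement day, shift the distribution toward live observations.
    As hours_remaining shrinks, the observation weight grows linearly
    and sigma tightens toward zero."""
    w_obs = 1.0 - hours_remaining / day_length
    mu = w_obs * obs_running_max + (1.0 - w_obs) * forecast_mu
    sigma = (1.0 - w_obs) * forecast_sigma
    return mu, sigma
```

At the start of the day the forecast dominates; by the final observation the distribution has collapsed onto the observed value, which is what makes mid-day repricing against a still-liquid market possible.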
Smart source selection — HRRR/NAM for near-term US forecasts (higher resolution, shorter horizon), ECMWF for longer range. Absorbed into the ensemble weighting framework as a pre-filter before inverse-error weighting.
Forecast-divergence monitoring — if the latest NWP run shifts the engine probability by more than 10 percentage points from entry, the position is flagged for review.
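The check itself is a one-liner; the 10-percentage-point threshold comes from the paragraph above, and the names are illustrative:

```python
def flag_divergence(entry_p, latest_p, threshold=0.10):
    """Flag a position for review when the newest NWP run moves the
    engine probability more than `threshold` from the entry probability."""
    return abs(latest_p - entry_p) > threshold
```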
Incumbent Comparison
Beyond prediction market tools, the engine is positioned against established weather intelligence companies:
| Capability | Climavision | Meteomatics | Xweather (Vaisala) | DTN | Salient | Cliff Horizon |
|---|---|---|---|---|---|---|
| Core forecasting asset | 0.67–2 km proprietary NWP | 1 km / 15-min proprietary NWP | Settlement data | ProphetX | S2S ML | Post-processing layer |
| Multi-variable risk integration | No | No | No | No | No | Yes — temperature + rainfall + irradiance + wind |
| Prediction market validation | No | No | No | No | No | Yes — ForecastEx live track record |
| Satellite data layer | No | No | No | No | No | Yes — SatSure Sparta |
| Emerging market focus | CONUS | Europe/US | Global stations | US/Europe | Global | SEA, India, Middle East, Africa |
| Parametric derivative pricing | No | No | Index provision | No | No | Yes — Ensuro Risk Module |
| Cash-paying warranty | No | No | No | No | No | Yes — Tier 2 warranted analytics |
What Is Not an Edge
Cliff Horizon does not compete on short-term US temperature forecasting accuracy against Climavision (0.67 km resolution) or Meteomatics (1 km, 15-minute temporal). The ForecastEx proving ground requires the engine to be better calibrated than the market price — which is set by retail traders, simple GFS bots, and informed participants. The bar is "beat the crowd," not "beat Climavision."
Similarly, Xweather has 100,000 ground stations and DTN has the largest private meteorological team. The engine's value is in synthesis — combining public NWP + SatSure + behavioural signals into calibrated probability — not in owning the data.