Cliff Horizon logo

Competitive Benchmarking

How Cliff Horizon's architecture compares to existing weather prediction market tools

Approach

Three categories of competitor exist in the weather prediction space: proprietary NWP model companies, long-range ML forecasters, and emerging prediction market tools. Cliff Horizon benchmarks against all three — but the prediction market tools are the most relevant comparison for the ForecastEx proving ground, because they validate the core methodology and reveal the specific gaps the engine exploits.


Prediction Market Bot & Tool Landscape

Updated 5 April 2026 — comprehensive competitor research.

Several open-source and commercial tools trade weather prediction markets using NWP ensemble data. Their techniques validate the approach (NWP ensemble → probabilistic edge → Kelly sizing) but they are all trading-as-product — profit is the objective, not calibration accuracy. None extend to multi-variable risk quantification, medium-range forecasting, or commercial weather risk protection.

Weather Intelligence Platforms

ToolTechniqueKey Feature
ClimateSightCustom ML model trained on co-located ASOS station dataEV calculators, 10-year backtesting, rounding simulator, multi-model comparison (HRRR, GFS, NAM). Most sophisticated of the group.
Wethr.net16+ data layers (METAR, extremes, CLI), 16+ model comparisonReal-time analytics for Kalshi/Polymarket. Freemium (3-min delay free, real-time paid).
minuteTemp60-second updates across GFS/HRRR/ECMWF/NAM/NBM/ICONSpeed play — competitors lag ~20 min. REST + WebSocket APIs for bots. Launched March 2026.
DailyDewpointAuto-logs every NWS issuance, ranks accuracy per issuance timeAudit trail play — answers "which forecast issuance is most accurate?" $5/month.

Trading Bots & AI Agent Platforms

ToolTechniqueKey Feature
GFS ensemble bots31-member GFS ensemble via Open-Meteo, member counting for probabilityKelly criterion sizing, 8% edge threshold. Paper trading only ($1.8k simulated).
Simmer / SpartanLabsUnified Python SDK for AI agent trading on Polymarket/KalshiPre-built weather trader skill (gopfan2-style). Safety rails, conflict detection.
gopfan2 (benchmark)Buy YES < $0.15, NO > $0.45. Exploits NWS update latency vs market price.Reportedly $2M+ profit from weather markets. Not a forecasting engine — a market microstructure trader.

Critical assessment: Every platform operates exclusively at the daily horizon for US city temperature. None track reliability diagrams or Brier scores. None produce calibrated probability distributions. None operate at the 7–30 day horizon. None have a commercial path beyond trading PnL or subscriptions.

Upstream AI Weather Models (Open Source)

NOAA deployed operational AI weather models (fine-tuned GraphCast) in December 2025. GenCast (Google DeepMind) produces calibrated 15-day ensemble forecasts. ECMWF AIFS operational since February 2025. FuXi-S2S produces 42-day subseasonal forecasts. These are inputs to the engine — better upstream models make the calibration layer more valuable, not less. See phase-0-reference.md Section 7.10 for the full model inventory.


Where Cliff Horizon Exceeds Existing Tools

Gap in Existing ToolsCliff Horizon's Answer
No bias correction — raw ensemble counting or learned sigma only, no directional correctionCity × month × regime bias correction from rolling 90-day calibration window; corrects both mean and sigma
No ensemble weighting — single model or equal-weight onlyInverse-error weighted multi-model ensemble (GFS + ECMWF + HRRR); regime-dependent weights
No behavioural signals — none use grid load or any non-NWP signalISO grid load anomaly as temperature modifier for cities with ISO coverage
No satellite ground truth — all calibrate against NWS observations onlySatSure Sparta API as Layer 1 — proprietary satellite-derived ground truth for bias correction
No position correlation management — each market treated independentlyCity-level exposure caps; maximum concurrent positions across correlated markets
No multi-variable capability — temperature only, no path to rainfall/irradiance/windWeatherVariable base class generalises across all variables; same pipeline, different distributions
No commercial product path — trading P&L is the end goalTrading is the credential; the product is risk protection via Ensuro parametric derivatives

Techniques Absorbed

The engine incorporates validated techniques from competitor analysis:

Brier score calibration loop — track (model_probability, outcome) per bucket, recalibrate weekly. This is the core metric that proves calibration quality and feeds the reliability diagram.

Z-score significance testing — edge magnitude alone is insufficient. A 10% edge with σ = 15% is noise; a 10% edge with σ = 3% is a signal. Minimum z-score threshold of 1.5 filters false positive trades.

RMSE-by-lead-time confidence scaling — wider sigma for longer lead forecasts. Lead time is a first-order driver of forecast uncertainty. The engine scales confidence by D+0 through D+7 lead time, not just city and month.

METAR blending for D+0 — on the settlement day itself, live surface observations progressively shift the probability distribution toward observed reality. This enables mid-day position adjustments when the market is still liquid.

Smart source selection — HRRR/NAM for near-term US forecasts (higher resolution, shorter horizon), ECMWF for longer range. Absorbed into the ensemble weighting framework as a pre-filter before inverse-error weighting.

Forecast-divergence monitoring — if the latest NWP run shifts the engine probability by more than 10 percentage points from entry, the position is flagged for review.


Incumbent Comparison

Beyond prediction market tools, the engine is positioned against established weather intelligence companies:

CapabilityClimavisionMeteomaticsXweather (Vaisala)DTNSalientCliff Horizon
Proprietary NWP model0.67–2 km1 km, 15-minSettlement dataProphetXS2S MLPost-processing layer
Multi-variable risk integrationNoNoNoNoNoYes — temperature + rainfall + irradiance + wind
Prediction market validationNoNoNoNoNoYes — ForecastEx live track record
Satellite data layerNoNoNoNoNoYes — SatSure Sparta
Emerging market focusCONUSEurope/USGlobal stationsUS/EuropeGlobalSEA, India, Middle East, Africa
Parametric derivative pricingNoNoIndex provisionNoNoYes — Ensuro Risk Module
Cash-paying warrantyNoNoNoNoNoYes — Tier 2 warranted analytics

What Is Not an Edge

Cliff Horizon does not compete on short-term US temperature forecasting accuracy against Climavision (0.67 km resolution) or Meteomatics (1 km, 15-minute temporal). The ForecastEx proving ground requires the engine to be better calibrated than the market price — which is set by retail traders, simple GFS bots, and informed participants. The bar is "beat the crowd," not "beat Climavision."

Similarly, Xweather has 100,000 ground stations and DTN has the largest private meteorological team. The engine's value is in synthesis — combining public NWP + SatSure + behavioural signals into calibrated probability — not in owning the data.