
ForecastEx Proving Ground

How Cliff Horizon validates calibration accuracy via ForecastEx temperature contracts on IBKR ForecastTrader

Why a Proving Ground

Calibrated probability is only valuable if it is demonstrably accurate. Backtests are necessary but insufficient — they are retrospective, cherry-pickable, and carry no financial conviction. Cliff Horizon validates its engine on ForecastEx temperature contracts: a regulated, daily-settling weather prediction market where real capital is at risk and every position resolves against independently published data.

The proving ground serves one purpose: to demonstrate that when the engine says 70%, the outcome happens about 70% of the time. The reliability diagram and Brier scores generated from live trading become the credential that unlocks Tier 2 warranted analytics and Tier 3 parametric derivatives via Ensuro.
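
As a minimal sketch of how those two credentials could be computed from settled trades, the Python below calculates a Brier score and the rows of a reliability diagram. The function names, the equal-width binning, and the ten-bin default are illustrative assumptions, not a description of the engine's actual code.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    Returns one (mean_forecast, observed_frequency, count) tuple per
    non-empty bin, in ascending bin order.
    """
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        bins[idx].append((p, o))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        rows.append((mean_p, freq, len(pairs)))
    return rows
```

A well-calibrated engine produces rows where the observed frequency tracks the mean forecast; a lower Brier score indicates better probabilistic accuracy.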


Platform — IBKR ForecastTrader

ForecastEx weather markets are accessible via Interactive Brokers. Cliff Horizon (Singapore-domiciled) accesses ForecastTrader via IBKR Singapore Pte. Ltd., a confirmed eligible entity.

Feature | Detail
Commission | Zero on Forecast Contracts
Incentive coupon | Minimum rate: (EFFR − 50 bps) / 2, accruing daily on open position value
API access | Full TWS API and Web API for automated order placement
Audit trail | Complete trade history with timestamps, entry prices, and settlement outcomes
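
The incentive coupon formula can be illustrated with a short calculation. This is a sketch only: the 4.50% EFFR figure is a made-up input, and the 365-day accrual convention is an assumption, since the source states only that the coupon accrues daily.

```python
def min_coupon_rate(effr: float) -> float:
    """Annualised minimum coupon rate: (EFFR - 50 bps) / 2, as decimals."""
    return (effr - 0.0050) / 2


def daily_coupon(position_value: float, effr: float, days_per_year: int = 365) -> float:
    """One day's accrual on open position value (day count is an assumption)."""
    return position_value * min_coupon_rate(effr) / days_per_year


# Illustrative inputs: EFFR of 4.50% and $10,000 of open position value.
rate = min_coupon_rate(0.045)          # about 2.00% per year
accrual = daily_coupon(10_000, 0.045)  # about $0.55 per day
```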

Contract Structure

ForecastEx temperature contracts are binary YES/NO contracts priced between $0.01 and $1.00.

Event question: "Will the daily high in [region] exceed [##]°F on [date]?"

Key specifications:

  • "Exceed" means strictly greater than — if the threshold is 75°F and the observed high is exactly 75°F, the outcome is NO. This distinction matters at boundary strikes where distribution mass is concentrated.
  • Product code: DH + three-letter NWS region code (e.g. DHNYC, DHORD)
  • YES pays $1.00 if daily high exceeds the threshold; NO pays $1.00 if it does not
  • Maximum loss: premium paid per contract
  • Minimum tick: $0.01
  • Thresholds (strikes): Listed at ForecastEx's discretion
  • Last trading time: 23:59 local time on the contract date
  • Position accountability level: 250,000 contracts per Forecast Market
  • Order type: Limit orders only — no market orders permitted (Rule 403)
  • Collateralisation: Full collateralisation required at all times (Rule 608)
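
The payoff rules above, including the strict "exceed" comparison, can be sketched as a small function. The name `settle` and its signature are hypothetical, and the sketch ignores the pair creation fee and the incentive coupon.

```python
def settle(observed_high_f: float, strike_f: float, side: str, entry_price: float):
    """Return (payout, pnl) in dollars for one contract.

    YES wins only if the observed high is strictly greater than the strike.
    """
    if side not in ("YES", "NO"):
        raise ValueError("side must be 'YES' or 'NO'")
    yes_wins = observed_high_f > strike_f  # exactly equal resolves NO
    wins = yes_wins if side == "YES" else not yes_wins
    payout = 1.00 if wins else 0.00
    return payout, round(payout - entry_price, 2)
```

With a 75°F strike and an observed high of exactly 75°F, a YES position bought at $0.65 loses its full premium, while a NO position bought at $0.36 pays $1.00.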

Settlement Source

Settlement uses the NWS Climatological Report (Daily) — published the morning following the contract date. The settlement value is the "Maximum Observed Value" in the Temperature table of the report titled "The [region] CLIMATE SUMMARY FOR [date]..."

Fallback Hierarchy

If the Climatological Report conflicts with intraday data or is never finalised, the following fallback hierarchy applies:

  1. Normal case: Settlement uses the daily Climatological Report maximum temperature
  2. Discrepancy case: If the daily report maximum is less than the intraday report value, settlement is delayed until 10:00 AM CT to allow for a revised version
  3. No revision by 10:00 AM CT: Settles on the daily report value as published
  4. No final version published: Resolves to the greater of (a) the intraday report maximum, or (b) the highest METAR observation
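
The four steps above can be sketched as a single resolver. This is an illustrative reading of the hierarchy, not exchange code: the function name and parameters are hypothetical, and the caller is assumed to decide when a report counts as never published (signalled here by passing `None` for the daily value).

```python
from typing import Optional


def resolve_settlement_value(
    daily_max: Optional[float],     # latest daily Climatological Report maximum, or None
    intraday_max: Optional[float],  # intraday report maximum
    metar_max: Optional[float],     # highest METAR observation
    after_deadline: bool,           # True once 10:00 AM CT has passed
) -> Optional[float]:
    """Return the settlement temperature, or None if settlement must wait."""
    if daily_max is None:
        # Step 4: no final version, so take the greater of intraday and METAR.
        candidates = [v for v in (intraday_max, metar_max) if v is not None]
        return max(candidates) if candidates else None
    if intraday_max is not None and daily_max < intraday_max and not after_deadline:
        # Step 2: discrepancy, wait for a possible revision.
        return None
    # Step 1 (no discrepancy) or step 3 (deadline passed): use the daily report.
    return daily_max
```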

Settlement Timing

  • Resolution time: When the NWS Climatological Report is released for the relevant region
  • Settlement: 13:00 CT on day of resolution if resolution occurs before 12:00 CT; otherwise 13:00 CT the following day
  • Settlement is irrevocable and unconditional (Rule 603)
  • Temperature contracts are not subject to early resolution
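
The 12:00 CT cut-off can be expressed as a short helper. This is a sketch under stated assumptions: the input is a naive datetime already expressed in Central Time, and weekends and bank holidays are ignored because the source does not say how they are handled.

```python
from datetime import datetime, time, timedelta


def settlement_time(resolution_ct: datetime) -> datetime:
    """Settlement is 13:00 CT the same day if resolved before 12:00 CT,
    otherwise 13:00 CT the following day."""
    day = resolution_ct.date()
    if resolution_ct.time() >= time(12, 0):
        day += timedelta(days=1)
    return datetime.combine(day, time(13, 0))
```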

The engine monitors both the Climatological Report and live METAR data. When the report is delayed or contested, knowing the METAR observations provides an information edge on the likely resolution outcome.


Exchange Economics

Contract pair creation fee: YES + NO must collectively cost $1.01 for a pair to be created (Rule 602(c)). The $0.01 excess is the exchange fee.
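
On one reading of this rule, a YES order at a given price can only be paired with a NO order at the complementary price out of $1.01. The sketch below works in integer cents to avoid floating-point rounding; the helper name is hypothetical.

```python
PAIR_COST_CENTS = 101  # Rule 602(c): YES + NO must total $1.01
PAYOUT_CENTS = 100     # the winning side receives $1.00


def matching_price_cents(price_cents: int) -> int:
    """Price the opposite side must pay for a new pair to be created."""
    if not 1 <= price_cents <= 100:
        raise ValueError("contract prices range from 1 to 100 cents")
    return PAIR_COST_CENTS - price_cents


# A YES order at $0.65 pairs with a NO order at $0.36.
exchange_fee_cents = PAIR_COST_CENTS - PAYOUT_CENTS
```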

Offsetting positions: A trader cannot hold YES and NO in the same Forecast Market at the same time. Offsetting positions acquired before resolution are netted at the daily 16:00 CT deadline: both positions are cancelled and $1.00 is credited per pair (Rule 604).

Collateral rebalancing: Collateral is rebalanced at least once per Settlement Bank Business Day at 16:00 CT. Positions are marked to market, but no P&L is realised until settlement (Rule 609(f)).


Edge Calculation

The engine's edge on any given contract is the difference between its calibrated probability and the market-implied probability:

edge = engine_probability − market_price

For a YES contract priced at $0.65, the market implies P(YES) = 0.65. If the engine calculates P(YES) = 0.80, the edge is +$0.15.
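
The same calculation as a minimal Python sketch. The function name is illustrative, and the NO-side branch (using 1 − P(YES) against the NO price) is an assumed extension of the formula rather than something the text above states.

```python
def edge(engine_p_yes: float, price: float, side: str = "YES") -> float:
    """Engine probability of winning minus the price paid for that side."""
    if side not in ("YES", "NO"):
        raise ValueError("side must be 'YES' or 'NO'")
    p_win = engine_p_yes if side == "YES" else 1.0 - engine_p_yes
    return p_win - price


yes_edge = edge(0.80, 0.65)  # about +0.15, matching the example above
```

Because a winning contract pays $1.00, the edge is also the expected profit per contract before fees: p × (1 − price) − (1 − p) × price = p − price.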

Trade Guardrails

Guardrail | Value
Category lock | Weather/temperature markets only (hard-coded)
Confidence threshold | No trade below 70% confidence
Position cap | Maximum 5% of account per trade
Daily loss limit | Halt if daily drawdown exceeds 10%
Maximum concurrent positions | 5
Position sizing | Fractional Kelly (0.25×)
Z-score minimum | 1.5 (edge must be statistically significant, not just large)
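
A rough sketch of how these guardrails might combine into a position size. The class and function names are hypothetical. The confidence score and z-score are taken as inputs because this document does not define how they are computed, and the category lock is omitted. The Kelly fraction uses the standard result for a binary contract bought at price c that pays $1.00: f* = (p − c) / (1 − c).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    min_confidence: float = 70.0     # confidence score, 0-100
    max_position_frac: float = 0.05  # 5% of account per trade
    daily_loss_limit: float = 0.10   # halt above 10% daily drawdown
    max_open_positions: int = 5
    kelly_multiplier: float = 0.25   # fractional Kelly
    min_z_score: float = 1.5


def position_fraction(p_win, price, confidence, z_score,
                      open_positions, daily_drawdown, g=Guardrails()):
    """Fraction of the account to stake; 0.0 if any guardrail blocks the trade."""
    if confidence < g.min_confidence or z_score < g.min_z_score:
        return 0.0
    if open_positions >= g.max_open_positions or daily_drawdown > g.daily_loss_limit:
        return 0.0
    if not 0.0 < price < 1.0:
        return 0.0
    kelly = (p_win - price) / (1.0 - price)  # full-Kelly stake fraction
    if kelly <= 0.0:
        return 0.0  # no positive edge
    return min(g.kelly_multiplier * kelly, g.max_position_frac)
```

For the example above (P(YES) = 0.80 at a price of $0.65), full Kelly is about 0.43 and quarter Kelly about 0.11, so the 5% position cap is the binding constraint.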

Track Record Schema

Every position is logged with full layer attribution:

Field | Description
trade_id | Unique identifier
date | Entry date
city | Target city
strike | ForecastEx strike temperature
direction | YES / NO
ch_probability | Engine-implied probability
market_probability | ForecastEx consensus at entry
edge | CH probability minus market probability
confidence_score | 0–100
entry_price | Price paid per contract
settlement_outcome | Actual observed temperature
pnl | Profit / loss
model_correct | Boolean: did the engine beat NWS?

Layer-specific attribution fields (physical, behavioural, meteorological) are recorded for every trade to measure each layer's contribution to accuracy over time.
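
One possible in-code representation of this schema is sketched below. The field names come from the table; the Python types, the optional post-settlement fields, and the free-form `layer_attribution` dictionary are assumptions for illustration.

```python
import datetime
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TradeRecord:
    trade_id: str
    date: datetime.date            # entry date
    city: str
    strike: float                  # strike temperature, degrees F
    direction: str                 # "YES" or "NO"
    ch_probability: float          # engine-implied probability
    market_probability: float      # ForecastEx consensus at entry
    confidence_score: float        # 0-100
    entry_price: float             # price paid per contract
    # Populated after settlement:
    settlement_outcome: Optional[float] = None  # observed temperature
    pnl: Optional[float] = None
    model_correct: Optional[bool] = None
    # Assumed container for the physical/behavioural/meteorological fields:
    layer_attribution: dict = field(default_factory=dict)

    @property
    def edge(self) -> float:
        """CH probability minus market probability."""
        return self.ch_probability - self.market_probability
```

Deriving `edge` from the two stored probabilities, rather than storing it separately, keeps the three values consistent by construction.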

Monthly Report Metrics

Each monthly report covers win rate by city, accuracy against the NWS-LAMP baseline, mean absolute error, layer-by-layer contribution, Sharpe ratio, maximum drawdown, and return on capital.
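
Three of these metrics can be sketched with the standard library alone. The function names are illustrative, the Sharpe ratio assumes a zero risk-free rate, and the annualisation factor is a caller-supplied assumption because the source does not specify one.

```python
import math
import statistics


def win_rate(pnls):
    """Share of settled trades with positive P&L."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def sharpe_ratio(returns, periods_per_year=1):
    """Mean return over sample standard deviation, scaled by sqrt(periods)."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, an equity curve of 100, 120, 90, 110 has a maximum drawdown of 25%, measured from the 120 peak to the 90 trough.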


From Proving Ground to Commercial Product

The proving ground is not the business — it is the credential. The 90-day reliability diagram and Brier scores from ForecastEx trading become the evidence presented to:

  • Tier 2 clients — justifying the cash-paying performance warranty
  • Ensuro — qualifying as a Risk Module partner with demonstrated calibration accuracy
  • Institutional prospects — differentiating Cliff Horizon from competitors who rely on backtests alone

ForecastEx is the only regulated, daily-settling weather market with transparent pricing. No equivalent exists for rainfall, irradiance, or wind. The temperature track record bridges to the multi-variable product: "We proved our calibration methodology works where you can verify it. Trust us to apply the same methodology to the variables where you can't."


Multi-Horizon Proving Ground

ForecastEx daily temperature is the primary proving ground, but the engine must demonstrate calibration at every horizon where it will price derivatives. A construction delay derivative that triggers on 14-day accumulated rainfall, or a PPA shortfall derivative that triggers on 30-day irradiance deficit, requires proven calibration at those timescales.

Horizon | Market / Benchmark | Variable | Purpose
Daily (1–3 days) | ForecastEx + Kalshi daily high/rain/snow | Temperature, precipitation | Primary credential. Methodology proof.
8–14 days | CPC 6–10 and 8–14 day outlooks | Temperature + precipitation terciles | Free public benchmark at the horizon where commercial products sit. Outperformance vs CPC = communicable credential.
Monthly | Kalshi monthly snowfall/rainfall totals | Accumulated precipitation | Closest market to real parametric derivative triggers.
Seasonal | CME HDD/CDD degree day futures | Cumulative temperature deviation | Portfolio-level risk validation.

The further into the future the horizon extends, the larger the delta between naive historical-data-based predictions and a well-calibrated engine — and the more valuable the derivative. Daily proves the method. Medium-range is where the commercial product lives and where nobody else operates.