ForecastEx Proving Ground
How Cliff Horizon validates calibration accuracy via ForecastEx temperature contracts on IBKR ForecastTrader
Why a Proving Ground
Calibrated probability is only valuable if it is demonstrably accurate. Backtests are necessary but insufficient — they are retrospective, cherry-pickable, and carry no financial conviction. Cliff Horizon validates its engine on ForecastEx temperature contracts: a regulated, daily-settling weather prediction market where real capital is at risk and every position resolves against independently published data.
The proving ground serves one purpose: to demonstrate that when the engine says 70%, the outcome happens about 70% of the time. The reliability diagram and Brier scores generated from live trading become the credential that unlocks Tier 2 warranted analytics and Tier 3 parametric derivatives via Ensuro.
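As a rough illustration of what "calibrated" means here, the sketch below computes a Brier score and the rows of a reliability diagram from a list of forecast probabilities and 0/1 outcomes. The function names and the equal-width binning are assumptions made for illustration, not the engine's actual scoring code.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def reliability_table(probs, outcomes, n_bins=10):
    """Group forecasts into equal-width bins (an illustrative choice).

    Returns a list of (mean_forecast, observed_frequency, count) tuples,
    one per non-empty bin, ordered by bin.
    """
    bins = defaultdict(list)
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, o))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        n = len(pairs)
        mean_forecast = sum(p for p, _ in pairs) / n
        observed_freq = sum(o for _, o in pairs) / n
        rows.append((mean_forecast, observed_freq, n))
    return rows
```

A perfectly calibrated engine produces rows where the mean forecast and the observed frequency agree in every bin; a lower Brier score is better.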
Platform — IBKR ForecastTrader
ForecastEx weather markets are accessible via Interactive Brokers. Cliff Horizon (Singapore-domiciled) accesses ForecastTrader via IBKR Singapore Pte. Ltd., a confirmed eligible entity.
| Feature | Detail |
|---|---|
| Commission | Zero on Forecast Contracts |
| Incentive coupon | Minimum rate: (EFFR − 50 bps) / 2, accruing daily on open position value |
| API access | Full TWS API and Web API for automated order placement |
| Audit trail | Complete trade history with timestamps, entry prices, and settlement outcomes |
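The incentive coupon formula in the table can be sketched as below. The 365-day count and the absence of a zero floor are assumptions, since the table states neither; the function names are illustrative.

```python
def min_incentive_rate(effr_annual):
    """Minimum annual coupon rate from the table: (EFFR - 50 bps) / 2.

    effr_annual is a decimal, e.g. 0.045 for 4.5%.
    """
    return (effr_annual - 0.0050) / 2


def daily_accrual(position_value, effr_annual, day_count=365):
    """One day's coupon on an open position.

    day_count=365 is an assumption; the convention is not stated in the source.
    """
    return position_value * min_incentive_rate(effr_annual) / day_count
```

With an EFFR of 4.5%, the minimum rate under this formula is 2.0% a year.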
Contract Structure
ForecastEx temperature contracts are binary YES/NO contracts priced between $0.01 and $1.00.
Event question: "Will the daily high in [region] exceed [##]°F on [date]?"
Key specifications:
- "Exceed" means strictly greater than — if the threshold is 75°F and the observed high is exactly 75°F, the outcome is NO. This distinction matters at boundary strikes where distribution mass is concentrated.
- Product code: DH + three-letter NWS region code (e.g. DHNYC, DHORD)
- YES pays $1.00 if daily high exceeds the threshold; NO pays $1.00 if it does not
- Maximum loss: premium paid per contract
- Minimum tick: $0.01
- Thresholds (strikes): Listed at ForecastEx's discretion
- Last trading time: 23:59 local time on the contract date
- Position accountability level: 250,000 contracts per Forecast Market
- Order type: Limit orders only — no market orders permitted (Rule 403)
- Collateralisation: Full collateralisation required at all times (Rule 608)
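A minimal sketch of the resolution rule and per-contract payoff described above, working in integer cents to avoid floating-point rounding. The function names are illustrative, and the pair creation fee is ignored here.

```python
def resolves_yes(observed_high_f, threshold_f):
    """'Exceed' is strict: a high equal to the threshold resolves NO."""
    return observed_high_f > threshold_f


def pnl_cents(side, entry_price_cents, observed_high_f, threshold_f):
    """Profit or loss per contract in cents for a position held to settlement."""
    if side not in ("YES", "NO"):
        raise ValueError("side must be 'YES' or 'NO'")
    yes = resolves_yes(observed_high_f, threshold_f)
    wins = yes if side == "YES" else not yes
    payout = 100 if wins else 0  # the winning side pays $1.00
    return payout - entry_price_cents
```

For a 75°F strike and an observed high of exactly 75°F, `resolves_yes(75, 75)` is `False`, so a YES bought at 65 cents loses the full premium.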
Settlement Source
Settlement uses the NWS Climatological Report (Daily) — published the morning following the contract date. The settlement value is the "Maximum Observed Value" in the Temperature table of the report titled "The [region] CLIMATE SUMMARY FOR [date]..."
Fallback Hierarchy
If the Climatological Report is inconsistent with the intraday report, or no final version is published, the following fallback hierarchy applies:
- Normal case: Settlement uses the daily Climatological Report maximum temperature
- Discrepancy case: If the daily report maximum is less than the intraday report value, settlement is delayed until 10:00 AM CT to allow for a revised version
- No revision by 10:00 AM CT: Settles on the daily report value as published
- No final version published: Resolves to the greater of (a) the intraday report maximum, or (b) the highest METAR observation
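The hierarchy above can be sketched as a single decision function. This is an illustration under stated assumptions: a missing final report is represented by `daily_max=None`, a revised report is assumed to supersede the original once it appears, and all parameter names are hypothetical.

```python
from typing import Optional


def settlement_value(
    daily_max: Optional[int],
    intraday_max: int,
    metar_max: int,
    revised_daily_max: Optional[int] = None,
    past_revision_deadline: bool = False,
) -> Optional[int]:
    """Return the settlement temperature, or None while settlement is delayed."""
    # No final version published: greater of intraday max and highest METAR.
    if daily_max is None:
        return max(intraday_max, metar_max)
    # Normal case: the daily report is not below the intraday report.
    if daily_max >= intraday_max:
        return daily_max
    # Discrepancy case: use a revision if one has been published (assumption).
    if revised_daily_max is not None:
        return revised_daily_max
    # No revision by 10:00 AM CT: settle on the daily value as published.
    if past_revision_deadline:
        return daily_max
    # Still inside the revision window: settlement is delayed.
    return None
```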
Settlement Timing
- Resolution time: When the NWS Climatological Report is released for the relevant region
- Settlement: 13:00 CT on day of resolution if resolution occurs before 12:00 CT; otherwise 13:00 CT the following day
- Settlement is irrevocable and unconditional (Rule 603)
- Temperature contracts are not subject to early resolution
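The settlement timing rule can be expressed as a small helper. The sketch assumes the input is a naive `datetime` already in Central Time and ignores weekends and bank holidays, neither of which the rule above addresses; the function name is illustrative.

```python
from datetime import datetime, time, timedelta


def settlement_time(resolution_ct: datetime) -> datetime:
    """Settlement timestamp (CT) for a resolution timestamp given in CT.

    Resolution strictly before 12:00 CT settles at 13:00 CT the same day;
    anything at or after 12:00 CT settles at 13:00 CT the following day.
    """
    day = resolution_ct.date()
    if resolution_ct.time() >= time(12, 0):
        day += timedelta(days=1)
    return datetime.combine(day, time(13, 0))
```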
The engine monitors both the Climatological Report and live METAR data. In cases where the report is delayed or contested, knowing the METAR ground truth provides an information edge on the likely resolution outcome.
Exchange Economics
Contract pair creation fee: YES + NO must collectively cost $1.01 for a pair to be created (Rule 602(c)). The $0.01 excess is the exchange fee.
Offsetting positions: A trader cannot simultaneously hold YES and NO for the same Forecast Market. Offsetting positions acquired before resolution are netted at the daily 16:00 CT deadline: both positions are cancelled and $1.00 is credited per pair (Rule 604).
Collateral rebalancing: Collateral is marked to market at least once per Settlement Bank Business Day, at 16:00 CT, but no P&L is realised until settlement (Rule 609(f)).
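The pair creation rule implies that a resting YES bid needs a NO bid at the complementary price before a contract pair comes into existence. A minimal sketch in integer cents follows; it treats the $1.01 total as an exact match, which is a simplifying assumption, and the names are illustrative.

```python
PAIR_COST_CENTS = 101  # $1.00 payout plus the $0.01 exchange fee


def complementary_price_cents(bid_cents):
    """Price the opposite side must bid for a pair to be created."""
    if not 1 <= bid_cents <= 100:
        raise ValueError("contract prices run from 1 to 100 cents")
    return PAIR_COST_CENTS - bid_cents


def pair_can_be_created(yes_bid_cents, no_bid_cents):
    """True when the two bids together cost exactly $1.01 (assumed exact match)."""
    return yes_bid_cents + no_bid_cents == PAIR_COST_CENTS
```

A YES bid at 65 cents therefore pairs with a NO bid at 36 cents, not 35.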
Edge Calculation
The engine's edge on any given contract is the difference between its calibrated probability and the market-implied probability:
edge = engine_probability − market_price
For a YES contract priced at $0.65, the market implies P(YES) = 0.65. If the engine calculates P(YES) = 0.80, the edge is +$0.15.
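The same calculation as a short sketch. The NO-side version is an assumption derived by symmetry, using 1 − P(YES) as the engine's probability for NO; the function names are illustrative.

```python
def edge_yes(engine_p_yes, yes_price):
    """Edge on a YES contract: engine probability minus market price."""
    return engine_p_yes - yes_price


def edge_no(engine_p_yes, no_price):
    """Edge on a NO contract, assuming P(NO) = 1 - P(YES)."""
    return (1.0 - engine_p_yes) - no_price
```

With the figures above, `edge_yes(0.80, 0.65)` is approximately 0.15, while a NO contract at $0.36 would carry a negative edge of about −0.16.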
Trade Guardrails
| Guardrail | Value |
|---|---|
| Category lock | Weather/temperature markets only — hard-coded |
| Confidence threshold | No trade below 70% confidence |
| Position cap | Maximum 5% of account per trade |
| Daily loss limit | Halt if daily drawdown exceeds 10% |
| Maximum concurrent positions | 5 |
| Position sizing | Fractional Kelly (0.25×) |
| Z-score minimum | 1.5 — edge must be statistically significant, not just large |
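One way the guardrails could combine with fractional Kelly sizing is sketched below. The Kelly fraction for a binary contract bought at price c with win probability p is (p − c) / (1 − c). The z-score is taken as an input because its definition is not given here, the category lock is omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    min_confidence: float = 70.0      # no trade below 70% confidence
    max_position_frac: float = 0.05   # max 5% of account per trade
    max_daily_drawdown: float = 0.10  # halt if drawdown exceeds 10%
    max_concurrent: int = 5
    kelly_multiplier: float = 0.25
    min_z_score: float = 1.5


def kelly_fraction(p_win, price):
    """Full-Kelly bankroll fraction for a $1-payout contract bought at `price`."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max(0.0, (p_win - price) / (1.0 - price))


def position_fraction(p_win, price, confidence, z_score,
                      open_positions, daily_drawdown, g=Guardrails()):
    """Fraction of the account to commit, or 0.0 if any guardrail blocks the trade."""
    blocked = (
        confidence < g.min_confidence
        or z_score < g.min_z_score
        or open_positions >= g.max_concurrent
        or daily_drawdown > g.max_daily_drawdown
    )
    if blocked:
        return 0.0
    sized = g.kelly_multiplier * kelly_fraction(p_win, price)
    return min(sized, g.max_position_frac)
```

For the $0.65 contract with an engine probability of 0.80, full Kelly is about 43% of the account, quarter Kelly about 10.7%, and the position cap brings the final size down to 5%.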
Track Record Schema
Every position is logged with full layer attribution:
| Field | Description |
|---|---|
| trade_id | Unique identifier |
| date | Entry date |
| city | Target city |
| strike | ForecastEx strike temperature |
| direction | YES / NO |
| ch_probability | Engine-implied probability |
| market_probability | ForecastEx consensus at entry |
| edge | CH probability minus market probability |
| confidence_score | 0–100 |
| entry_price | Price paid per contract |
| settlement_outcome | Actual observed temperature |
| pnl | Profit / loss |
| model_correct | Boolean: did the engine beat NWS? |
Layer-specific attribution fields (physical, behavioural, meteorological) are recorded for every trade to measure each layer's contribution to accuracy over time.
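The schema maps naturally onto a record type. The sketch below mirrors the field names in the table; the Python types, the choice to derive `edge` automatically, and the optional settlement fields are assumptions. The layer attribution fields are left out because their names are not listed here.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TradeRecord:
    trade_id: str
    date: dt.date                 # entry date
    city: str
    strike: int                   # strike temperature in °F (assumed integer)
    direction: str                # "YES" or "NO"
    ch_probability: float         # engine-implied probability
    market_probability: float     # ForecastEx consensus at entry
    edge: float = field(init=False)  # derived in __post_init__
    confidence_score: float       # 0-100
    entry_price: float            # price paid per contract
    # Unknown until the contract settles, so optional here (assumption).
    settlement_outcome: Optional[int] = None
    pnl: Optional[float] = None
    model_correct: Optional[bool] = None

    def __post_init__(self):
        if self.direction not in ("YES", "NO"):
            raise ValueError("direction must be 'YES' or 'NO'")
        self.edge = self.ch_probability - self.market_probability
```

Deriving `edge` from the two probabilities keeps the logged value consistent with its definition in the table.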
Monthly Report Metrics
Each monthly report covers win rate by city, accuracy against the NWS-LAMP baseline, mean absolute error, layer-by-layer contribution, Sharpe ratio, maximum drawdown, and return on capital.
From Proving Ground to Commercial Product
The proving ground is not the business — it is the credential. The 90-day reliability diagram and Brier scores from ForecastEx trading become the evidence presented to:
- Tier 2 clients — justifying the cash-paying performance warranty
- Ensuro — qualifying as a Risk Module partner with demonstrated calibration accuracy
- Institutional prospects — differentiating Cliff Horizon from competitors who rely on backtests alone
ForecastEx is the only regulated, daily-settling weather market with transparent pricing. No equivalent exists for rainfall, irradiance, or wind. The temperature track record bridges to the multi-variable product: "We proved our calibration methodology works where you can verify it. Trust us to apply the same methodology to the variables where you can't."
Multi-Horizon Proving Ground
ForecastEx daily temperature is the primary proving ground, but the engine must demonstrate calibration at every horizon where it will price derivatives. A construction delay derivative that triggers on 14-day accumulated rainfall, or a PPA shortfall derivative that triggers on 30-day irradiance deficit, requires proven calibration at those timescales.
| Horizon | Market / Benchmark | Variable | Purpose |
|---|---|---|---|
| Daily (1–3 days) | ForecastEx + Kalshi daily high/rain/snow | Temperature, precipitation | Primary credential. Methodology proof. |
| 8–14 days | CPC 6–10 and 8–14 day outlooks | Temperature + precipitation terciles | Free public benchmark at the horizon where commercial products sit. Outperformance vs CPC = communicable credential. |
| Monthly | Kalshi monthly snowfall/rainfall totals | Accumulated precipitation | Closest market to real parametric derivative triggers. |
| Seasonal | CME HDD/CDD degree day futures | Cumulative temperature deviation | Portfolio-level risk validation. |
The further into the future the horizon extends, the larger the delta between naive historical-data-based predictions and a well-calibrated engine — and the more valuable the derivative. Daily proves the method. Medium-range is where the commercial product lives and where nobody else operates.