
Earthly Machine Learning

Probabilistic Measures for Fair AI and NWP Model Comparison

07 Nov 2025

Description

Probabilistic measures afford fair comparisons of AIWP and NWP model output (Tilmann Gneiting, Tobias Biegert, Kristof Kraus, Eva-Maria Walz, Alexander I. Jordan, Sebastian Lerch, June 10, 2025)

Introduction of a New Fair Comparison Metric: The paper introduces the Potential Continuous Ranked Probability Score (PC), a new measure designed to allow fair and meaningful comparisons between single-valued output from data-driven Artificial Intelligence based Weather Prediction (AIWP) models and physics-based Numerical Weather Prediction (NWP) models. This approach addresses concerns that traditional loss functions (like RMSE) may unfairly favor AIWP models, which are often trained by optimizing these same metrics.

Methodology Based on Probabilistic Postprocessing: PC is calculated by applying the same statistical postprocessing technique, Isotonic Distributional Regression (IDR), also known as Easy Uncertainty Quantification (EasyUQ), to the deterministic output of both AIWP and NWP models. PC is then defined as the mean Continuous Ranked Probability Score (CRPS) of the resulting probabilistic forecasts.

Measure of Potential Skill and Invariance: PC quantifies potential predictive performance. A key property of PC is that it is invariant under strictly increasing transformations of the model output. It therefore treats both forecasts equally and facilitates comparisons in cases where pre-specifying a loss function might otherwise place competitors on unequal footing.

AIWP Outperformance and Operational Proxy: When applied to WeatherBench 2 data, the PC measure showed that the data-driven GraphCast model outperforms the leading physics-based ECMWF high-resolution (HRES) model. Furthermore, the PC measure for the HRES model was found to align exceptionally well with the mean CRPS of the operational ECMWF ensemble, which supports PC as a reliable proxy for the performance of real-time operational probabilistic products.
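To make the recipe concrete, here is a minimal pure-Python sketch of a PC-style score. It is not the authors' implementation. It assumes that IDR with a single real-valued forecast can be fitted threshold by threshold, as a non-increasing least-squares regression of the indicator 1{y <= z} on the forecast value, and that the score is evaluated in-sample on the same forecast-observation pairs. The function names `pava_decreasing` and `potential_crps` are hypothetical.

```python
def pava_decreasing(values, weights):
    """Weighted least-squares fit that is non-increasing along the index
    (pool-adjacent-violators algorithm)."""
    blocks = []  # each block holds [mean, weight, number of pooled entries]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbours while an earlier block lies below a later one.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def potential_crps(x, y):
    """In-sample mean CRPS of IDR-style forecasts fitted to pairs (x_i, y_i).

    x: single-valued model output, y: matching observations.
    """
    xs = sorted(set(x))  # distinct forecast values, ascending
    zs = sorted(set(y))  # thresholds where the fitted CDFs can jump
    groups = {u: [yi for xi, yi in zip(x, y) if xi == u] for u in xs}
    weights = [len(groups[u]) for u in xs]
    total = 0.0
    for k in range(len(zs) - 1):
        z, width = zs[k], zs[k + 1] - zs[k]
        # Share of outcomes <= z at each distinct forecast value.
        frac = [sum(yi <= z for yi in groups[u]) / len(groups[u]) for u in xs]
        # A larger forecast should mean a smaller probability of y <= z.
        cdf = dict(zip(xs, pava_decreasing(frac, weights)))
        # CRPS integrand is constant on [z_k, z_{k+1}); add its contribution.
        total += width * sum((cdf[xi] - (yi <= z)) ** 2 for xi, yi in zip(x, y))
    return total / len(x)


if __name__ == "__main__":
    obs = [0.3, 1.9, 2.2, 3.5, 2.0]   # made-up observations
    fcst = [1.0, 2.5, 2.0, 4.0, 3.0]  # made-up single-valued forecasts
    print(potential_crps(fcst, obs))
    # A strictly increasing transform of the forecasts leaves the score unchanged.
    print(potential_crps([v ** 3 for v in fcst], obs))
```

Because the fit depends on the forecasts only through their ordering and ties, any strictly increasing transformation of `x` yields the same value, which mirrors the invariance property described above. In this sketch, a forecast that orders the observations perfectly scores 0, and a constant forecast scores the mean CRPS of the pooled empirical distribution.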

