
Dos and Don’ts for SCS Modeling: How to Identify and Prevent Data Bias 

When it comes to severe convective storms (SCS), models are only as good as the data they’re built on. That’s both the strength and the Achilles’ heel of the industry’s current approach. 

In the U.S. and abroad, SCS data is riddled with blind spots. Urban bias, reporting inconsistencies, and misuse of radar data all contribute to models that appear usable at first glance but collapse under scrutiny. The result? Underwriters and reinsurers who are forced to mistrust model outputs that don’t reflect the true risk. 

Here’s what you need to know. 

What Data Exists for SCS Modeling? 

SCS Data in the U.S. 

Most SCS models are built on NOAA’s local storm reports (LSRs) and Doppler radar (WSR-88D) data. These sources track tornadoes, hail, and damaging winds. On paper, this seems robust. In practice, both have serious flaws:

  • LSRs: Heavily biased toward populated areas. A hailstorm over rural Kansas might never be logged, while a single golf-ball-sized hailstone hitting Dallas leaves a trail of reports. This creates an urban reporting bias. Many LSRs are also, at best, only an estimate of hazard severity rather than a measurement. For example, hail sizes that match familiar household objects are overreported because reporters typically eyeball hail size instead of using a proper measuring tool (like a ruler!). Additionally, when hailstones are reported, it’s impossible to know how long the stone had been on the ground and how much it had already melted. Finally, when multiple severe weather hazards occur together, such as hail and tornadoes, the less extreme hazard is often underreported as attention shifts to the most severely affected areas.
  • Doppler radar: Useful, but only up to a point. Radar detects precipitation and can be used to estimate hail size, though these estimates are often inaccurate depending on local conditions. While Doppler radar can infer wind speed, the radar beam is too high to measure the surface wind gusts associated with tornadoes or straight-line wind events. Even seasoned forecasters have a hard time distinguishing between storms that do and do not produce a tornado using Doppler radar data alone.

Put simply: use radar data sparingly, and never as the backbone of your model. 
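
The object-size bias in LSRs noted above can be illustrated with a toy sketch. It assumes a hypothetical reporter who always picks the nearest familiar object, and it uses synthetic "true" hail diameters drawn from a uniform distribution, not real observations. The reference diameters follow the commonly used object-size comparisons (quarter 1.00 in, golf ball 1.75 in, baseball 2.75 in). The function name and the uniform distribution are assumptions made for illustration only.

```python
import random
from collections import Counter

# Reference diameters in inches for a few familiar comparison objects.
OBJECT_SIZES_IN = {"quarter": 1.00, "golf ball": 1.75, "baseball": 2.75}


def eyeball_report(true_size_in):
    """Return the familiar-object diameter closest to the true diameter.

    Hypothetical reporter behavior: no ruler, just the nearest object.
    """
    return min(OBJECT_SIZES_IN.values(), key=lambda s: abs(s - true_size_in))


rng = random.Random(0)

# Synthetic "true" hail diameters (illustrative only, not observed data).
true_sizes = [rng.uniform(0.75, 3.0) for _ in range(10_000)]

# Every continuous diameter collapses onto one of three reported values.
reported = Counter(eyeball_report(s) for s in true_sizes)

for size_in, count in sorted(reported.items()):
    print(f"reported {size_in:.2f} in: {count} of {len(true_sizes)} stones")
```

In this sketch, thousands of distinct true sizes become spikes at three reported values, which is the pattern a model inherits if it treats reported sizes as measurements.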

SCS Data Abroad 

Outside the U.S., the challenges multiply.  Radar coverage is sparse.  Further, the most common radar system outside the U.S. degrades in hailstorms due to its design, making comprehensive detection impossible. 

  • Europe: Patchwork reporting systems that vary between countries. SCS events are increasingly documented in some countries (e.g., Germany, Poland), but historical completeness remains poor. 
  • Australia: Data is strong for metro regions (Sydney, Melbourne), but rural coverage is thin. 
  • Latin America & Asia: Records are sparse, with only the most catastrophic events logged.
  • Africa: Virtually no structured datasets, leaving insurers blind to real hazard frequency. 

In short, global SCS data is fragmented, inconsistent, and often incomplete. 

What Approach Has Been Used to Model Severe Convective Storms in the U.S.?

Most U.S. SCS models rely heavily on NOAA records and storm reports, tuning them to reproduce historical loss events. That’s why many models seem to “fit” when back-tested, but fitting history is not the same as predicting risk. 

The problem is that these models inherit all the biases baked into the observations: 

  • Overrepresentation of cities and underrepresentation of rural impacts 
  • Inaccurate estimates of hazard severity 
  • Poor understanding of hazard spatial extent due to poor observational coverage 
  • An assumption that the past predicts the future, even as buildings and neighborhoods spread beyond traditional city boundaries into areas with no claims history

The Approach SCS Modeling Demands 

Whether in the U.S. or abroad, the lesson is the same: historical reports and radar alone can’t support a reliable model. They’re too fragmented, too biased, and too incomplete to capture the true behavior of severe convective storms. 

The better approach is to start from the physics of the peril, not the patchwork of past reports. That means: 

  • Understanding how atmospheric conditions affect the potential for each severe weather hazard
  • Using stochastic simulations to generate millions of plausible storm tracks, intensities, and footprints, not just replaying history 
  • Building hazard fields that capture tornado, hail, and straight-line wind as they interact in the atmosphere 
  • Stress-testing outputs at different resolutions to ensure the model converges and holds its shape under scrutiny 
  • Incorporating climate signals and exposure growth to reflect where risk is going, not just where it’s been 
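
To make the stochastic-simulation idea above concrete, here is a minimal single-site sketch. It is not KatRisk’s method and leaves out tracks, footprints, and peril interactions entirely. It assumes hail events at one location arrive as a Poisson process and that hail sizes follow an exponential distribution. Both are hypothetical choices, as are the function names and parameter values. Because this toy setup has a closed-form answer, the simulated exceedance probability can be checked against it.

```python
import math
import random


def simulate_annual_max_hail(n_years, annual_rate, mean_size_cm, seed=0):
    """Simulate the largest hailstone per year at a single site.

    Assumptions (hypothetical): Poisson event arrivals with `annual_rate`
    events per year, and exponentially distributed hail sizes.
    """
    rng = random.Random(seed)
    annual_max = []
    for _ in range(n_years):
        # Exponential waiting times inside a one-year window give a
        # Poisson-distributed number of events for that year.
        t = rng.expovariate(annual_rate)
        year_max = 0.0
        while t < 1.0:
            size = rng.expovariate(1.0 / mean_size_cm)
            year_max = max(year_max, size)
            t += rng.expovariate(annual_rate)
        annual_max.append(year_max)
    return annual_max


def exceedance_probability(annual_max, threshold_cm):
    """Fraction of simulated years whose largest stone reached the threshold."""
    return sum(m >= threshold_cm for m in annual_max) / len(annual_max)


def analytic_exceedance(annual_rate, mean_size_cm, threshold_cm):
    """Closed-form annual exceedance probability for this toy model."""
    rate_above = annual_rate * math.exp(-threshold_cm / mean_size_cm)
    return 1.0 - math.exp(-rate_above)


catalog = simulate_annual_max_hail(n_years=50_000, annual_rate=2.0, mean_size_cm=2.0)
simulated = exceedance_probability(catalog, threshold_cm=5.0)
expected = analytic_exceedance(annual_rate=2.0, mean_size_cm=2.0, threshold_cm=5.0)
print(f"simulated: {simulated:.4f}  analytic: {expected:.4f}")
```

A long synthetic catalog like this pins down the tail far more tightly than a few decades of reports could, but only to the extent that the assumed frequency and severity distributions are physically defensible.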

This approach moves the industry from fitting the past to anticipating the full spectrum of possible futures, and it is how KatRisk has approached the development of our own SCS model.

What Does Data Bias Look Like in a Risk Map? 

Imagine a risk map that only reflects what’s been reported. High-risk zones cluster around major metros: Dallas, Denver, Chicago. Rural regions appear “safe,” not because they are, but because no one was there to file a report. 

This is what a biased model produces: a distorted picture of risk that mirrors population density, not storm behavior. 

Now imagine using that map for underwriting. You’d underprice rural exposures, overestimate city clustering, and misjudge reinsurance needs. That’s the cost of ignoring data bias. 
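
The distortion described above can be reproduced with a toy simulation. In this sketch every grid cell has the same true annual storm probability, and only the chance of a storm being reported depends on population. The detection model (each resident independently reports with probability 0.001) and all names and values are hypothetical, chosen for illustration only.

```python
import random


def simulate_reports(populations, true_rate, years, seed=0):
    """Count reported storm-years per cell under identical true hazard.

    Hypothetical detection model: an event is reported if at least one
    resident reports it, each independently with probability 0.001.
    """
    rng = random.Random(seed)
    reported = []
    for pop in populations:
        p_report = 1.0 - (1.0 - 0.001) ** pop
        count = 0
        for _ in range(years):
            storm_occurred = rng.random() < true_rate
            if storm_occurred and rng.random() < p_report:
                count += 1
        reported.append(count)
    return reported


# Four cells with the same true hazard but very different populations.
populations = [50, 500, 5_000, 500_000]
reports = simulate_reports(populations, true_rate=0.3, years=2000)

for pop, n in zip(populations, reports):
    print(f"population {pop:>7}: {n} reported storm-years out of 2000")
```

Although the underlying hazard is identical in every cell, the reported counts rise with population. A map built from those counts would show the sparsely populated cells as low risk.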

What This Means for Your SCS Risk Decisions 

When evaluating an SCS model, the question isn’t just “what data does it use?” but “how does it overcome the limitations of that data?” 

If your provider is building upon biased inputs without accounting for those flaws, their outputs can’t be trusted. 

Here are six uncomfortable but necessary questions you should be asking your modeling provider:

  1. What resolution is your model built at, does the output change as resolution increases, and is the resolution sufficient to resolve the physical gradients of the underlying peril?
  2. Do you use radar data for tornadoes or wind, and if so, why and how?
  3. How do you account for underreporting in rural regions?
  4. Are peril interactions (hail + tornado + wind) explicitly modeled or treated independently?
  5. Can you show side-by-side results with and without raw historical storm reports?
  6. How does your model adapt to international regions with limited data?

If the answers aren’t clear, you’re looking at a model that may be built on sand. 

Severe convective storms demand more than backward-looking catalogues and biased storm reports. They demand models that replicate real-world peril behavior and simulate what hasn’t happened yet, but could. That’s why KatRisk’s SCS model is built to overcome data bias, not reinforce it. 
