
Claims Correlation Analysis Methodology

v2.0 · March 2026

A comprehensive methodology for analyzing the statistical relationship between cybersecurity risk scores and observed cyber incident frequency across an insurance portfolio. This analysis enables underwriters to validate score-based pricing and identify risk concentrations.

1. Executive Summary

The Claims Correlation Analysis module quantifies the statistical relationship between Rankiteo cybersecurity scores and observed cyber incident frequency for companies within an insurance portfolio. By computing Pearson correlation coefficients and analyzing incident rates across score bands, underwriters gain empirical evidence that lower-scored companies experience disproportionately more cyber incidents.

Key objectives:

  • Validate the predictive power of cybersecurity scores against real-world incident data
  • Identify score bands with elevated incident rates for pricing adjustments
  • Surface the most common incident types across the portfolio
  • Flag highest-risk companies that may require additional underwriting scrutiny

The Rankiteo AI Cyber Underwriter Platform combines real-time threat intelligence, proprietary scoring algorithms, and actuarial-grade analytics in a single integrated solution.

2. Data Collection

The analysis ingests data from two primary sources within the Rankiteo platform:

2.1 Cybersecurity Scores

Company scores are retrieved from Rankiteo's company security scoring engine. Each record contains a numeric score (0–1000) representing the company's overall cyber risk posture, where a higher score indicates a better posture.

2.2 Incident Data

Cyber incidents are sourced from Rankiteo's cyber incident intelligence feed. Each incident record contains a company identifier field that associates the incident with one or more portfolio companies. This identifier field can be either:

  • A string — linking the incident to a single company
  • An array of strings — linking the incident to multiple companies (e.g., supply-chain events)

This dual-type handling is critical for accurate incident attribution and is addressed via array normalization in the data processing pipeline.

2.3 Portfolio Scope

The analysis is scoped to the current user's portfolio, which is a curated list of companies tracked for underwriting purposes. Only companies present in the portfolio are included in correlation calculations.

3. Score-Incident Correlation

For each company in the portfolio, the system fetches the current cybersecurity score and counts the incidents linked to that company. This yields one paired observation (score_i, incidents_i) per company i.
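As a minimal sketch of how the paired observations might be assembled: the function name and the two dictionary inputs below are hypothetical, and the sketch assumes that a portfolio company with no linked incidents contributes an incident count of 0 rather than being dropped.

```python
def build_observations(scores_by_company, incident_counts):
    """Pair each company's score with its incident count.

    scores_by_company: dict of company id -> score (hypothetical input shape)
    incident_counts:   dict of company id -> number of linked incidents
    Assumption: companies missing from incident_counts have 0 incidents.
    """
    return [
        (score, incident_counts.get(company_id, 0))
        for company_id, score in scores_by_company.items()
    ]


# Hypothetical example data
scores_by_company = {"acme": 910, "globex": 640, "initech": 780}
incident_counts = {"globex": 4, "initech": 1}

observations = build_observations(scores_by_company, incident_counts)
```

Keeping zero-incident companies in the set matters: dropping them would remove exactly the observations that a well-scored portfolio is expected to contain.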

3.1 Scatter Plot

The paired data is rendered as an interactive scatter plot where the X-axis represents the cybersecurity score (0–1000) and the Y-axis represents incident count. Each point represents a single company in the portfolio.

3.2 Pearson Correlation Coefficient

The strength and direction of the linear relationship is quantified using the Pearson product-moment correlation coefficient:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Where:

  • $x_i$ = cybersecurity score for company i
  • $y_i$ = incident count for company i
  • $\bar{x}$ = mean score across all portfolio companies
  • $\bar{y}$ = mean incident count across all portfolio companies
  • $n$ = number of companies in the portfolio
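The coefficient can be computed directly from its definition. The sketch below is illustrative only: the function name is hypothetical, and returning None when the coefficient is undefined (fewer than two companies, or no variation in scores or in incident counts) is an assumption, not documented platform behavior.

```python
import math


def pearson_r(scores, incident_counts):
    """Pearson correlation between scores (x) and incident counts (y)."""
    n = len(scores)
    if n != len(incident_counts):
        raise ValueError("scores and incident_counts must have the same length")
    if n < 2:
        return None  # undefined for fewer than two companies

    x_mean = sum(scores) / n
    y_mean = sum(incident_counts) / n

    # Numerator: sum of products of deviations from the means
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(scores, incident_counts))
    # Denominator: square root of the product of the summed squared deviations
    ss_x = sum((x - x_mean) ** 2 for x in scores)
    ss_y = sum((y - y_mean) ** 2 for y in incident_counts)
    denom = math.sqrt(ss_x * ss_y)

    if denom == 0:
        return None  # all scores (or all counts) are identical
    return cov / denom


# Hypothetical portfolio: higher scores paired with fewer incidents
scores = [920, 860, 810, 760, 640, 540]
incidents = [0, 0, 1, 2, 4, 6]
r = pearson_r(scores, incidents)
print(f"r = {r:.3f}")
```

For this made-up data the result is strongly negative, which is the direction the methodology expects.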

Interpretation guidelines:

| r Value | Interpretation | Implication |
|---|---|---|
| -1.0 to -0.7 | Strong negative correlation | Higher scores strongly associated with fewer incidents |
| -0.7 to -0.4 | Moderate negative correlation | Scores are a meaningful predictor of incident frequency |
| -0.4 to -0.2 | Weak negative correlation | Some predictive signal exists but is limited |
| -0.2 to 0.2 | No meaningful correlation | Scores and incidents appear independent |
| 0.2 to 1.0 | Positive correlation (unexpected) | Warrants investigation; possible data quality issue |

A negative correlation is expected: companies with higher (better) cybersecurity scores should exhibit fewer incidents. The magnitude of r indicates how reliably scores predict incident frequency.
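The interpretation guidelines can be expressed as a small lookup. This is a sketch, not the platform's implementation: the function name is hypothetical, and because adjacent ranges in the guidelines share their endpoints, the code assumes a boundary value (for example exactly -0.7) belongs to the range listed first.

```python
def interpret_r(r):
    """Map a Pearson r value to the interpretation label from the guidelines.

    Assumption: a value on a shared boundary falls into the earlier-listed
    (more negative) range.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r <= -0.7:
        return "Strong negative correlation"
    if r <= -0.4:
        return "Moderate negative correlation"
    if r <= -0.2:
        return "Weak negative correlation"
    if r <= 0.2:
        return "No meaningful correlation"
    return "Positive correlation (unexpected)"


print(interpret_r(-0.55))  # Moderate negative correlation
```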

4. Band Correlation Analysis

Companies are grouped into Rankiteo score bands to analyze incident patterns at the categorical level. This provides a more intuitive view for underwriters who think in terms of risk tiers rather than continuous scores.

4.1 Score Band Definitions

| Band | Score Range | Risk Level |
|---|---|---|
| Aaa | 900 – 1000 | Minimal Risk |
| Aa | 850 – 899 | Very Low Risk |
| A | 800 – 849 | Low Risk |
| Baa | 750 – 799 | Moderate Risk |
| Ba | 700 – 749 | Elevated Risk |
| B | 650 – 699 | High Risk |
| Caa | 600 – 649 | Very High Risk |
| Ca | 550 – 599 | Near Default |
| C | 0 – 549 | Distressed |
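A score can be mapped to its band by checking lower bounds from the top band downward. The sketch below uses the band boundaries defined in this section; the names SCORE_BANDS and score_to_band are hypothetical, and treating a fractional score such as 899.5 as belonging to the lower band is an assumption, since the ranges are given as whole numbers.

```python
# (lower bound, band, risk level), ordered from best band to worst
SCORE_BANDS = [
    (900, "Aaa", "Minimal Risk"),
    (850, "Aa", "Very Low Risk"),
    (800, "A", "Low Risk"),
    (750, "Baa", "Moderate Risk"),
    (700, "Ba", "Elevated Risk"),
    (650, "B", "High Risk"),
    (600, "Caa", "Very High Risk"),
    (550, "Ca", "Near Default"),
    (0, "C", "Distressed"),
]


def score_to_band(score):
    """Return (band, risk_level) for a score in the 0-1000 range."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be between 0 and 1000")
    for lower_bound, band, risk_level in SCORE_BANDS:
        if score >= lower_bound:
            return band, risk_level


print(score_to_band(765))  # ('Baa', 'Moderate Risk')
```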

4.2 Per-Band Metrics

For each score band, the following metrics are computed:

| Metric | Formula | Description |
|---|---|---|
| company_count | COUNT(companies in band) | Number of portfolio companies in this score band |
| total_incidents | SUM(incidents) for all companies in band | Aggregate incident count across the band |
| with_incidents_count | COUNT(companies WHERE incidents > 0) | Number of companies that have experienced at least one incident |
| avg_incidents | total_incidents / company_count | Average number of incidents per company in the band |
| incident_rate | (with_incidents_count / company_count) × 100 | Percentage of companies in the band that have incidents |
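The five per-band metrics follow mechanically from a list of (band, incident count) pairs. The sketch below is one possible way to compute them; the function name and input shape are hypothetical. Bands with no companies simply do not appear in the result, which avoids dividing by zero.

```python
from collections import defaultdict


def band_metrics(companies):
    """Compute per-band metrics.

    companies: iterable of (band, incident_count) pairs, one per company
               (hypothetical input shape).
    """
    counts_by_band = defaultdict(list)
    for band, incident_count in companies:
        counts_by_band[band].append(incident_count)

    metrics = {}
    for band, counts in counts_by_band.items():
        company_count = len(counts)
        total_incidents = sum(counts)
        with_incidents_count = sum(1 for c in counts if c > 0)
        metrics[band] = {
            "company_count": company_count,
            "total_incidents": total_incidents,
            "with_incidents_count": with_incidents_count,
            "avg_incidents": total_incidents / company_count,
            "incident_rate": with_incidents_count / company_count * 100,
        }
    return metrics


# Hypothetical portfolio: two Aaa companies and four C companies
sample = [("Aaa", 0), ("Aaa", 1), ("C", 3), ("C", 5), ("C", 0), ("C", 0)]
result = band_metrics(sample)
```

In this sample both bands have an incident_rate of 50%, but avg_incidents differs (0.5 for Aaa versus 2.0 for C), which shows why the two metrics are reported side by side.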

4.3 Expected Pattern

Under a well-calibrated scoring model, lower-rated bands (C, Ca, Caa) should exhibit significantly higher avg_incidents and incident_rate values compared to higher-rated bands (Aaa, Aa, A). A monotonic decrease in incident rate from C to Aaa provides strong validation of the scoring methodology.
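Whether the incident rate falls monotonically from C to Aaa can be checked programmatically. The sketch below is illustrative: the names are hypothetical, bands absent from the portfolio are skipped, and ties between adjacent bands are accepted as monotonic, which is an assumption about how strictly the pattern should be enforced.

```python
# Bands ordered from worst to best
BAND_ORDER = ["C", "Ca", "Caa", "B", "Ba", "Baa", "A", "Aa", "Aaa"]


def is_monotonic_decreasing(rate_by_band):
    """True if incident rate never increases when moving from C toward Aaa.

    rate_by_band: dict of band -> incident_rate; missing bands are skipped.
    Assumption: equal rates in adjacent bands still count as monotonic.
    """
    rates = [rate_by_band[band] for band in BAND_ORDER if band in rate_by_band]
    return all(worse >= better for worse, better in zip(rates, rates[1:]))


print(is_monotonic_decreasing({"C": 62.0, "B": 40.0, "A": 12.5, "Aaa": 3.0}))  # True
```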

5. Incident Type Distribution

The system aggregates all incidents linked to portfolio companies and groups them by incident type. The top 10 incident types by frequency are displayed, providing underwriters with a portfolio-level view of threat composition.

Common incident types include:

| Incident Type | Description | Typical Severity |
|---|---|---|
| Ransomware | Encryption-based extortion attacks | Critical |
| Data Breach | Unauthorized access to sensitive data | High |
| Phishing | Social engineering attacks via email | Medium |
| DDoS | Distributed denial-of-service attacks | Medium |
| Malware | Malicious software infections | High |
| Insider Threat | Malicious or negligent internal actors | High |
| Supply Chain | Compromise via third-party vendors | Critical |
| Vulnerability Exploit | Exploitation of known CVEs | Medium–High |
| Business Email Compromise | Email account takeover for fraud | High |
| Credential Theft | Stolen login credentials | Medium |

This distribution helps underwriters understand which threat vectors dominate the portfolio and may inform coverage-specific pricing adjustments.
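Grouping incidents by type and keeping the ten most frequent is a straightforward counting exercise. The sketch below assumes each incident record is a dictionary with a "type" key; that field name and the function name are hypothetical.

```python
from collections import Counter


def top_incident_types(incidents, limit=10):
    """Return the most frequent incident types as (type, count) pairs."""
    type_counts = Counter(incident["type"] for incident in incidents)
    return type_counts.most_common(limit)


# Hypothetical incident records, already restricted to portfolio companies
sample_incidents = (
    [{"type": "Ransomware"}] * 3
    + [{"type": "Phishing"}] * 2
    + [{"type": "DDoS"}]
)

print(top_incident_types(sample_incidents))
# [('Ransomware', 3), ('Phishing', 2), ('DDoS', 1)]
```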

6. Risk Identification

The module identifies the “worst” companies in the portfolio by ranking them in descending order of incident count. This surfaces companies that may require:

  • Increased premium loading — to reflect elevated loss potential
  • Exclusion consideration — if incident history is extreme
  • Active risk management engagement — to reduce future exposure
  • Enhanced monitoring — more frequent score re-evaluation

For each flagged company, the output includes the company name, current score, score band, and total incident count, enabling underwriters to cross-reference with the score-incident scatter plot.
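The ranking step might look like the following sketch. The record keys, the function name, and the top_n cutoff are hypothetical; breaking ties by the lower score is also an assumption, since the methodology only specifies descending incident count.

```python
def highest_risk_companies(companies, top_n=10):
    """Rank companies by incident count, highest first.

    companies: list of dicts with "name", "score", "band", "incidents" keys
               (hypothetical record shape).
    Assumption: ties on incident count are broken by the lower score.
    """
    ranked = sorted(companies, key=lambda c: (-c["incidents"], c["score"]))
    return ranked[:top_n]


# Hypothetical portfolio records
portfolio = [
    {"name": "Acme", "score": 910, "band": "Aaa", "incidents": 0},
    {"name": "Globex", "score": 640, "band": "Caa", "incidents": 4},
    {"name": "Initech", "score": 530, "band": "C", "incidents": 4},
    {"name": "Umbrella", "score": 780, "band": "Baa", "incidents": 1},
]

flagged = highest_risk_companies(portfolio, top_n=3)
print([c["name"] for c in flagged])  # ['Initech', 'Globex', 'Umbrella']
```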

7. Incident Aggregation Pipeline

Incident counts per company are computed using a multi-step data processing pipeline that handles the dual-type company identifier field (single value or list of values). The pipeline normalizes list-valued identifiers into individual records before grouping.

# Incident aggregation pipeline (pseudocode)

Step 1: Filter incidents linked to portfolio companies
    Select incidents where company_identifier is in portfolio_company_ids

Step 2: Normalize multi-company incidents
    If company_identifier is a list, expand into one record per company
    If company_identifier is a single value, keep as-is

Step 3: Re-filter after normalization
    Ensure only portfolio company identifiers remain

Step 4: Group by company and count incidents
    For each company_identifier:
        count total incidents
        collect unique incident types

Step 5: Sort by incident count (highest first)

Why normalization is necessary: A single incident may affect multiple companies (e.g., a supply-chain breach). The company identifier field stores these as a list. Without normalization, the incident would only be counted once and attributed to the list as a whole rather than to each individual company.
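The pipeline steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the platform's actual pipeline: the record keys "company_identifier" and "type", the function name, and the output shape are all hypothetical. The per-identifier portfolio check inside the loop covers both the initial filter and the re-filter after normalization.

```python
from collections import defaultdict


def aggregate_incidents(incidents, portfolio_company_ids):
    """Count incidents per portfolio company, handling str or list identifiers."""
    portfolio = set(portfolio_company_ids)
    per_company = defaultdict(lambda: {"count": 0, "types": set()})

    for incident in incidents:
        identifier = incident["company_identifier"]
        # Step 2: normalize a single value into a one-element list
        company_ids = identifier if isinstance(identifier, list) else [identifier]
        for company_id in company_ids:
            # Steps 1 and 3: keep only identifiers that are in the portfolio
            if company_id not in portfolio:
                continue
            # Step 4: count incidents and collect unique incident types
            per_company[company_id]["count"] += 1
            per_company[company_id]["types"].add(incident["type"])

    rows = [
        {
            "company_identifier": company_id,
            "incident_count": data["count"],
            "incident_types": sorted(data["types"]),
        }
        for company_id, data in per_company.items()
    ]
    # Step 5: sort by incident count, highest first
    rows.sort(key=lambda row: row["incident_count"], reverse=True)
    return rows


# Hypothetical data: one supply-chain incident touches two portfolio companies
incidents = [
    {"company_identifier": "acme", "type": "Ransomware"},
    {"company_identifier": ["acme", "globex", "outsider"], "type": "Supply Chain"},
    {"company_identifier": "outsider", "type": "Phishing"},
]
rows = aggregate_incidents(incidents, ["acme", "globex"])
```

Here the supply-chain incident is attributed to both "acme" and "globex", while "outsider" is dropped because it is not in the portfolio.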

8. Data Sources

| Data Source | Purpose | Key Information |
|---|---|---|
| Company security scoring engine | Cybersecurity risk scores | Company identifier, score, band |
| Cyber incident intelligence feed | Cyber incident records | Company identifier, type, date, severity |
| Portfolio management system | User portfolio membership | User identifier, company identifiers |

9. Glossary

| Term | Definition |
|---|---|
| Pearson r | A measure of linear correlation between two variables, ranging from -1 (perfect negative) to +1 (perfect positive). |
| Score Band | A categorical grouping (Aaa through C) derived from the continuous 0–1000 cybersecurity score. |
| Incident Rate | The percentage of companies within a score band that have experienced at least one cyber incident. |
| Company Identifier | A field in the incident data that associates an incident with one or more companies. Can be a single value or a list of values. |
| Array Normalization | A data processing step that expands list-valued fields into individual records, enabling accurate per-company attribution. |
| Portfolio | A user-curated set of companies tracked for insurance underwriting purposes. |
| Scatter Plot | A chart plotting individual companies as points with score on the X-axis and incident count on the Y-axis. |
| avg_incidents | The mean number of incidents per company within a given score band. |

This methodology document is maintained by the Rankiteo Analytics team. For questions or feedback, contact [email protected]. Last updated March 2026.