Claims Correlation Analysis Methodology

v2.0 · March 2026

A comprehensive methodology for analyzing the statistical relationship between cybersecurity risk scores and observed cyber incident frequency across an insurance portfolio. This analysis enables underwriters to validate score-based pricing and identify risk concentrations.

1. Executive Summary

The Claims Correlation Analysis module quantifies the statistical relationship between Rankiteo cybersecurity scores and observed cyber incident frequency for companies within an insurance portfolio. By computing the Pearson correlation coefficient and analyzing incident rates across score bands, underwriters can test empirically whether lower-scored companies experience disproportionately more cyber incidents.

Key objectives:

Validate the predictive power of cybersecurity scores against real-world incident data
Identify score bands with elevated incident rates for pricing adjustments
Surface the most common incident types across the portfolio
Flag highest-risk companies that may require additional underwriting scrutiny

The module is part of the Rankiteo AI Cyber Underwriting Platform, which combines real-time threat intelligence, proprietary scoring algorithms, and actuarial-grade analytics in a single integrated solution.

2. Data Collection

The analysis ingests data from two primary sources within the Rankiteo platform:

2.1 Cybersecurity Scores

Company scores are retrieved from Rankiteo's company security scoring engine. Each record contains a numeric score (0–1000) representing the company's overall cyber risk posture, where a higher score indicates a better posture.

2.2 Incident Data

Cyber incidents are sourced from Rankiteo's cyber incident intelligence feed. Each incident record contains a company identifier field that associates the incident with one or more portfolio companies. This identifier field can be either:

A string — linking the incident to a single company
An array of strings — linking the incident to multiple companies (e.g., supply-chain events)

Handling both types correctly is critical for accurate incident attribution. It is addressed via array normalization in the incident aggregation pipeline (Section 7).

2.3 Portfolio Scope

The analysis is scoped to the current user's portfolio, which is a curated list of companies tracked for underwriting purposes. Only companies present in the portfolio are included in correlation calculations.

3. Score-Incident Correlation

For each company in the portfolio, the system fetches the current cybersecurity score and counts the total number of linked incidents. This produces a set of paired observations (score_i, incidents_i) for each company i.

3.1 Scatter Plot

The paired data is rendered as an interactive scatter plot where the X-axis represents the cybersecurity score (0–1000) and the Y-axis represents incident count. Each point represents a single company in the portfolio.

3.2 Pearson Correlation Coefficient

The strength and direction of the linear relationship is quantified using the Pearson product-moment correlation coefficient:

r = SUM((x_i - x_mean)(y_i - y_mean)) / sqrt(SUM((x_i - x_mean)^2) * SUM((y_i - y_mean)^2))

Where:
x_i = cybersecurity score for company i
y_i = incident count for company i
x_mean = mean score across all portfolio companies
y_mean = mean incident count across all portfolio companies
n = number of companies in the portfolio

Interpretation guidelines:

r Value	Interpretation	Implication
-1.0 to -0.7	Strong negative correlation	Higher scores strongly associated with fewer incidents
-0.7 to -0.4	Moderate negative correlation	Scores are a meaningful predictor of incident frequency
-0.4 to -0.2	Weak negative correlation	Some predictive signal exists but is limited
-0.2 to 0.2	No meaningful correlation	Scores and incidents appear independent
0.2 to 1.0	Positive correlation (unexpected)	Warrants investigation — possible data quality issue

A negative correlation is expected: companies with higher (better) cybersecurity scores should exhibit fewer incidents. The magnitude of r indicates how strongly scores and incident frequency are linearly associated. It does not by itself establish that one causes the other.
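The coefficient defined in Section 3.2 can be computed directly from the paired observations. The following is a minimal sketch in plain Python, not the platform's implementation. The function name `pearson_r`, the choice to return `None` when r is undefined (fewer than two companies, or no variation in scores or incident counts), and the sample values are all assumptions made for illustration.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson_r(scores: Sequence[float], incidents: Sequence[float]) -> Optional[float]:
    """Pearson correlation between scores (x) and incident counts (y).

    Returns None when r is undefined: fewer than two observations,
    or zero variance in either variable (assumed convention).
    """
    n = len(scores)
    if n != len(incidents):
        raise ValueError("scores and incidents must have the same length")
    if n < 2:
        return None

    x_mean = sum(scores) / n
    y_mean = sum(incidents) / n

    # Numerator: sum of products of deviations from the two means.
    cross = sum((x - x_mean) * (y - y_mean) for x, y in zip(scores, incidents))
    # Denominator terms: sums of squared deviations.
    ss_x = sum((x - x_mean) ** 2 for x in scores)
    ss_y = sum((y - y_mean) ** 2 for y in incidents)

    if ss_x == 0 or ss_y == 0:
        return None
    return cross / sqrt(ss_x * ss_y)


# Hypothetical five-company portfolio (illustrative values only).
example_scores = [920, 810, 760, 640, 520]
example_incidents = [0, 1, 1, 3, 5]
r = pearson_r(example_scores, example_incidents)  # strongly negative for this toy data
```

Returning `None` rather than 0 for degenerate input keeps "undefined" distinct from "no meaningful correlation" in the interpretation table.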

4. Band Correlation Analysis

Companies are grouped into Rankiteo score bands to analyze incident patterns at the categorical level. This provides a more intuitive view for underwriters who think in terms of risk tiers rather than continuous scores.

4.1 Score Band Definitions

Band	Score Range	Risk Level
Aaa	900 – 1000	Minimal Risk
Aa	850 – 899	Very Low Risk
A	800 – 849	Low Risk
Baa	750 – 799	Moderate Risk
Ba	700 – 749	Elevated Risk
B	650 – 699	High Risk
Caa	600 – 649	Very High Risk
Ca	550 – 599	Near Default
C	0 – 549	Distressed
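The band table above maps naturally onto a lookup over lower bounds. The sketch below is illustrative only: the names `BAND_FLOORS` and `score_to_band` are hypothetical, and it assumes that a fractional score (for example 899.5) falls into the band whose lower bound it has reached, which the table does not specify.

```python
from typing import List, Tuple

# (lower bound, band) pairs from the score band table, highest band first.
BAND_FLOORS: List[Tuple[int, str]] = [
    (900, "Aaa"), (850, "Aa"), (800, "A"),
    (750, "Baa"), (700, "Ba"), (650, "B"),
    (600, "Caa"), (550, "Ca"), (0, "C"),
]


def score_to_band(score: float) -> str:
    """Return the score band for a 0-1000 cybersecurity score."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    for floor, band in BAND_FLOORS:
        if score >= floor:
            return band
    raise AssertionError("unreachable: the lowest floor is 0")
```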

4.2 Per-Band Metrics

For each score band, the following metrics are computed:

Metric	Formula	Description
`company_count`	COUNT(companies in band)	Number of portfolio companies in this score band
`total_incidents`	SUM(incidents) for all companies in band	Aggregate incident count across the band
`with_incidents_count`	COUNT(companies WHERE incidents > 0)	Number of companies that have experienced at least one incident
`avg_incidents`	total_incidents / company_count	Average number of incidents per company in the band
`incident_rate`	(with_incidents_count / company_count) × 100	Percentage of companies in the band that have incidents
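As a worked illustration of the per-band formulas above, the sketch below computes all five metrics from a list of (band, incident count) pairs, one pair per portfolio company. The function name `band_metrics`, the input shape, and the sample data are assumptions. Bands with no companies are simply absent from the result, which avoids dividing by zero.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def band_metrics(companies: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """Compute per-band metrics from (band, incident_count) pairs."""
    counts_by_band: Dict[str, List[int]] = defaultdict(list)
    for band, incidents in companies:
        counts_by_band[band].append(incidents)

    metrics: Dict[str, Dict[str, float]] = {}
    for band, counts in counts_by_band.items():
        company_count = len(counts)
        total_incidents = sum(counts)
        with_incidents_count = sum(1 for c in counts if c > 0)
        metrics[band] = {
            "company_count": company_count,
            "total_incidents": total_incidents,
            "with_incidents_count": with_incidents_count,
            "avg_incidents": total_incidents / company_count,
            "incident_rate": with_incidents_count / company_count * 100,
        }
    return metrics


# Hypothetical sample: two Aaa companies and four C companies.
sample = [("Aaa", 0), ("Aaa", 1), ("C", 4), ("C", 2), ("C", 0), ("C", 0)]
result = band_metrics(sample)
# For band C: 4 companies, 6 incidents, 2 with incidents,
# avg_incidents = 1.5, incident_rate = 50.0
```

In this sample both bands have the same incident_rate (50%) while avg_incidents differs (0.5 for Aaa against 1.5 for C), which shows why the two metrics are reported side by side.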

4.3 Expected Pattern

Under a well-calibrated scoring model, lower-rated bands (C, Ca, Caa) should exhibit significantly higher avg_incidents and incident_rate values compared to higher-rated bands (Aaa, Aa, A). A monotonic decrease in incident rate from C to Aaa provides strong validation of the scoring methodology.
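The monotonicity check described above can be expressed in a few lines. This is a sketch under stated assumptions: `is_monotonic_decreasing` is a hypothetical helper, ties between adjacent bands are treated as acceptable (a non-strict decrease), and bands with no companies are skipped rather than treated as failures.

```python
from typing import Dict, List

# Bands ordered from lowest-rated to highest-rated.
BAND_ORDER: List[str] = ["C", "Ca", "Caa", "B", "Ba", "Baa", "A", "Aa", "Aaa"]


def is_monotonic_decreasing(incident_rate_by_band: Dict[str, float]) -> bool:
    """True if incident rate never increases when moving from C up to Aaa.

    Bands missing from the input (no companies in the band) are skipped.
    """
    rates = [
        incident_rate_by_band[band]
        for band in BAND_ORDER
        if band in incident_rate_by_band
    ]
    return all(lower >= higher for lower, higher in zip(rates, rates[1:]))
```

With few companies per band, a single incident can break monotonicity, so a failed check on a small portfolio is weak evidence against the scoring model.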

5. Incident Type Distribution

The system aggregates all incidents linked to portfolio companies and groups them by incident type. The top 10 incident types by frequency are displayed, providing underwriters with a portfolio-level view of threat composition.

Common incident types include:

Incident Type	Description	Typical Severity
Ransomware	Encryption-based extortion attacks	Critical
Data Breach	Unauthorized access to sensitive data	High
Phishing	Social engineering attacks via email	Medium
DDoS	Distributed denial-of-service attacks	Medium
Malware	Malicious software infections	High
Insider Threat	Malicious or negligent internal actors	High
Supply Chain	Compromise via third-party vendors	Critical
Vulnerability Exploit	Exploitation of known CVEs	Medium–High
Business Email Compromise	Email account takeover for fraud	High
Credential Theft	Stolen login credentials	Medium

This distribution helps underwriters understand which threat vectors dominate the portfolio and may inform coverage-specific pricing adjustments.
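The top-10 grouping described in this section amounts to a frequency count. A minimal sketch using the standard library follows. The function name `top_incident_types` and the assumption that each incident carries a single type label are illustrative, not taken from the platform.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_incident_types(incident_types: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Return the most frequent incident types as (type, count) pairs."""
    return Counter(incident_types).most_common(limit)


# Hypothetical type labels for six portfolio incidents.
observed = ["Ransomware", "Phishing", "Ransomware", "DDoS", "Phishing", "Ransomware"]
top_two = top_incident_types(observed, limit=2)
# top_two is [("Ransomware", 3), ("Phishing", 2)]
```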

6. Risk Identification

The module identifies the “worst” companies in the portfolio by ranking them in descending order of incident count. This surfaces companies that may require:

Increased premium loading — to reflect elevated loss potential
Exclusion consideration — if incident history is extreme
Active risk management engagement — to reduce future exposure
Enhanced monitoring — more frequent score re-evaluation

For each flagged company, the output includes the company name, current score, score band, and total incident count, enabling underwriters to cross-reference with the score-incident scatter plot.
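The ranking described in this section can be sketched as a sort on incident count. The `CompanyRisk` record and `rank_highest_risk` function are hypothetical names. The cut-off of 10 companies and the tie-break (lower score first when incident counts are equal) are assumptions, since the methodology does not specify either.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompanyRisk:
    name: str
    score: int
    band: str
    incident_count: int


def rank_highest_risk(companies: List[CompanyRisk], limit: int = 10) -> List[CompanyRisk]:
    """Rank companies by incident count, highest first.

    Ties are broken by score, lowest first (assumed convention).
    """
    ranked = sorted(companies, key=lambda c: (-c.incident_count, c.score))
    return ranked[:limit]


# Hypothetical portfolio entries.
portfolio = [
    CompanyRisk("Alpha Ltd", 910, "Aaa", 0),
    CompanyRisk("Beta Inc", 540, "C", 4),
    CompanyRisk("Gamma LLC", 700, "Ba", 4),
    CompanyRisk("Delta Co", 660, "B", 2),
]
flagged = rank_highest_risk(portfolio, limit=3)
```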

7. Incident Aggregation Pipeline

Incident counts per company are computed using a multi-step data processing pipeline that handles the dual-type company identifier field (single value or list of values). The pipeline normalizes list-valued identifiers into individual records before grouping.

# Incident aggregation pipeline (pseudocode)

Step 1: Filter incidents linked to portfolio companies
    Select incidents where company_identifier is in portfolio_company_ids

Step 2: Normalize multi-company incidents
    If company_identifier is a list, expand into one record per company
    If company_identifier is a single value, keep as-is

Step 3: Re-filter after normalization
    Ensure only portfolio company identifiers remain

Step 4: Group by company and count incidents
    For each company_identifier:
        count total incidents
        collect unique incident types

Step 5: Sort by incident count (highest first)

Why normalization is necessary: A single incident may affect multiple companies (e.g., a supply-chain breach). The company identifier field stores these as a list. Without normalization, the incident would only be counted once and attributed to the list as a whole rather than to each individual company.
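The pipeline steps in this section can be approximated in memory as follows. This is a sketch, not the platform's implementation. The dictionary keys `company_identifier` and `type`, the function name `aggregate_incidents`, and the output shape are assumptions. It also assumes that an identifier repeated within one incident's list should count only once for that incident.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def aggregate_incidents(incidents: Iterable[dict], portfolio_ids: Set[str]) -> List[dict]:
    """Count incidents and collect incident types per portfolio company."""
    counts: Dict[str, int] = defaultdict(int)
    types: Dict[str, Set[str]] = defaultdict(set)

    for incident in incidents:
        raw = incident["company_identifier"]
        # Step 2: normalize a single string or a list into a list of identifiers.
        company_ids = [raw] if isinstance(raw, str) else list(raw)
        # Steps 1 and 3: keep only identifiers that belong to the portfolio.
        for company_id in set(company_ids) & portfolio_ids:
            # Step 4: group by company, count incidents, collect unique types.
            counts[company_id] += 1
            types[company_id].add(incident["type"])

    rows = [
        {
            "company_identifier": company_id,
            "incident_count": counts[company_id],
            "incident_types": sorted(types[company_id]),
        }
        for company_id in counts
    ]
    # Step 5: highest incident count first; identifier as a stable tie-break.
    rows.sort(key=lambda row: (-row["incident_count"], row["company_identifier"]))
    return rows


# Hypothetical incidents: one single-company, one multi-company, one outside the portfolio.
sample_incidents = [
    {"company_identifier": "acme", "type": "Ransomware"},
    {"company_identifier": ["acme", "globex", "other"], "type": "Supply Chain"},
    {"company_identifier": "other", "type": "DDoS"},
]
summary = aggregate_incidents(sample_incidents, {"acme", "globex"})
```

Here the supply-chain incident is attributed to both `acme` and `globex`, while `other` is dropped because it is not in the portfolio.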

8. Data Sources

Data Source	Purpose	Key Information
Company security scoring engine	Cybersecurity risk scores	Company identifier, score, band
Cyber incident intelligence feed	Cyber incident records	Company identifier, type, date, severity
Portfolio management system	User portfolio membership	User identifier, company identifiers

9. Glossary

Term	Definition
Pearson r	A measure of linear correlation between two variables, ranging from -1 (perfect negative) to +1 (perfect positive).
Score Band	A categorical grouping (Aaa through C) derived from the continuous 0–1000 cybersecurity score.
Incident Rate	The percentage of companies within a score band that have experienced at least one cyber incident.
Company Identifier	A field in the incident data that associates an incident with one or more companies. Can be a single value or a list of values.
Array Normalization	A data processing step that expands list-valued fields into individual records, enabling accurate per-company attribution.
Portfolio	A user-curated set of companies tracked for insurance underwriting purposes.
Scatter Plot	A chart plotting individual companies as points with score on the X-axis and incident count on the Y-axis.
avg_incidents	The mean number of incidents per company within a given score band.

This methodology document is maintained by the Rankiteo Analytics team, which can be contacted with questions or feedback. Last updated March 2026.