Claims Correlation Analysis Methodology
A comprehensive methodology for analyzing the statistical relationship between cybersecurity risk scores and observed cyber incident frequency across an insurance portfolio. This analysis enables underwriters to validate score-based pricing and identify risk concentrations.
1. Executive Summary
The Claims Correlation Analysis module quantifies the statistical relationship between Rankiteo cybersecurity scores and observed cyber incident frequency for companies within an insurance portfolio. By computing the Pearson correlation coefficient and analyzing incident rates across score bands, underwriters can test empirically whether lower-scored companies experience disproportionately more cyber incidents.
Key objectives:
- Validate the predictive power of cybersecurity scores against real-world incident data
- Identify score bands with elevated incident rates for pricing adjustments
- Surface the most common incident types across the portfolio
- Flag highest-risk companies that may require additional underwriting scrutiny
2. Data Collection
The analysis ingests data from two primary sources within the Rankiteo platform:
2.1 Cybersecurity Scores
Company scores are retrieved from Rankiteo's company security scoring engine. Each record contains a numeric score (0–1000) representing the company's overall cyber risk posture; higher scores indicate a stronger posture.
2.2 Incident Data
Cyber incidents are sourced from Rankiteo's cyber incident intelligence feed. Each incident record contains a company identifier field that associates the incident with one or more portfolio companies. This identifier field can be either:
- A string — linking the incident to a single company
- An array of strings — linking the incident to multiple companies (e.g., supply-chain events)
This dual-type handling is critical for accurate incident attribution and is addressed via array normalization in the data processing pipeline.
2.3 Portfolio Scope
The analysis is scoped to the current user's portfolio, which is a curated list of companies tracked for underwriting purposes. Only companies present in the portfolio are included in correlation calculations.
3. Score-Incident Correlation
For each company in the portfolio, the system fetches the current cybersecurity score and counts the total number of linked incidents. This produces a set of paired observations (score_i, incidents_i) for each company i.
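As an illustration of how the paired observations might be assembled, the sketch below joins a score lookup with an incident-count lookup. The function name `build_pairs`, the dictionary shapes, and the sample company identifiers are assumptions made for this example, not part of the platform.

```python
from typing import Dict, List, Tuple


def build_pairs(
    scores: Dict[str, float],
    incident_counts: Dict[str, int],
) -> List[Tuple[str, float, int]]:
    """Return (company_id, score, incidents) for every scored company.

    Hypothetical helper: `scores` maps company ID to current score and
    `incident_counts` maps company ID to the number of linked incidents.
    """
    pairs = []
    for company_id, score in scores.items():
        # A company with no linked incidents is still a valid observation,
        # so it is paired with a count of 0 rather than dropped.
        pairs.append((company_id, score, incident_counts.get(company_id, 0)))
    return pairs


# Illustrative data only.
scores = {"acme": 820.0, "globex": 610.0, "initech": 905.0}
incident_counts = {"acme": 1, "globex": 4}
pairs = build_pairs(scores, incident_counts)
```

Keeping zero-incident companies in the set matters: dropping them would bias both the correlation and the per-band incident rates upward.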
3.1 Scatter Plot
The paired data is rendered as an interactive scatter plot where the X-axis represents the cybersecurity score (0–1000) and the Y-axis represents incident count. Each point represents a single company in the portfolio.
3.2 Pearson Correlation Coefficient
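The strength and direction of the linear relationship are quantified using the Pearson product-moment correlation coefficient:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

where x_i is the score of company i (score_i), y_i is its incident count (incidents_i), x̄ and ȳ are the corresponding portfolio means, and n is the number of companies in the portfolio.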
Interpretation guidelines:
| r Value | Interpretation | Implication |
|---|---|---|
| -1.0 to -0.7 | Strong negative correlation | Higher scores strongly associated with fewer incidents |
| -0.7 to -0.4 | Moderate negative correlation | Scores are a meaningful predictor of incident frequency |
| -0.4 to -0.2 | Weak negative correlation | Some predictive signal exists but is limited |
| -0.2 to 0.2 | No meaningful correlation | Scores and incidents appear independent |
| 0.2 to 1.0 | Positive correlation (unexpected) | Warrants investigation — possible data quality issue |
A negative correlation is expected: companies with higher (better) cybersecurity scores should exhibit fewer incidents. The magnitude of r indicates how reliably scores predict incident frequency.
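A minimal sketch of the Pearson computation over the paired observations, using only the Python standard library. The function name `pearson_r` and the choice to return `None` for degenerate input (fewer than two companies, or no variation in either variable) are assumptions of this example.

```python
import math
from typing import Optional, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        return None  # correlation is undefined for fewer than two points
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        return None  # a constant variable has no defined correlation
    return cross / math.sqrt(ss_x * ss_y)


# Illustrative data: incidents fall linearly as the score rises.
example_scores = [900, 800, 700, 600]
example_incidents = [0, 1, 2, 3]
r = pearson_r(example_scores, example_incidents)
```

With this perfectly linear sample, r equals -1, which falls in the "strong negative correlation" row of the interpretation table.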
4. Band Correlation Analysis
Companies are grouped into Rankiteo score bands to analyze incident patterns at the categorical level. This provides a more intuitive view for underwriters who think in terms of risk tiers rather than continuous scores.
4.1 Score Band Definitions
| Band | Score Range | Risk Level |
|---|---|---|
| Aaa | 900 – 1000 | Minimal Risk |
| Aa | 850 – 899 | Very Low Risk |
| A | 800 – 849 | Low Risk |
| Baa | 750 – 799 | Moderate Risk |
| Ba | 700 – 749 | Elevated Risk |
| B | 650 – 699 | High Risk |
| Caa | 600 – 649 | Very High Risk |
| Ca | 550 – 599 | Near Default |
| C | 0 – 549 | Distressed |
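The band table can be expressed as a simple threshold lookup. The sketch below is one possible encoding; the name `score_to_band` is hypothetical, and treating each lower bound as an inclusive threshold (so that a fractional score such as 899.5 maps to Aa) is an assumption, since the table only lists integer ranges.

```python
from typing import List, Tuple

# (band, inclusive lower bound), ordered from best to worst band.
BAND_FLOORS: List[Tuple[str, int]] = [
    ("Aaa", 900),
    ("Aa", 850),
    ("A", 800),
    ("Baa", 750),
    ("Ba", 700),
    ("B", 650),
    ("Caa", 600),
    ("Ca", 550),
    ("C", 0),
]


def score_to_band(score: float) -> str:
    """Map a 0-1000 score to its band using the first floor it meets."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    for band, floor in BAND_FLOORS:
        if score >= floor:
            return band
    raise AssertionError("unreachable: the C band has a floor of 0")
```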
4.2 Per-Band Metrics
For each score band, the following metrics are computed:
| Metric | Formula | Description |
|---|---|---|
| company_count | COUNT(companies in band) | Number of portfolio companies in this score band |
| total_incidents | SUM(incidents) for all companies in band | Aggregate incident count across the band |
| with_incidents_count | COUNT(companies WHERE incidents > 0) | Number of companies that have experienced at least one incident |
| avg_incidents | total_incidents / company_count | Average number of incidents per company in the band |
| incident_rate | (with_incidents_count / company_count) × 100 | Percentage of companies in the band that have incidents |
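The five per-band metrics follow directly from the formulas in the table. The sketch below computes them from a list of (band, incident count) pairs, one per company; the function name `band_metrics` and the input shape are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def band_metrics(
    observations: Iterable[Tuple[str, int]],
) -> Dict[str, Dict[str, float]]:
    """Compute per-band metrics from (band, incidents) pairs, one per company."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for band, incidents in observations:
        grouped[band].append(incidents)

    metrics: Dict[str, Dict[str, float]] = {}
    for band, counts in grouped.items():
        # Every band present here has at least one company, so the
        # divisions below cannot divide by zero.
        company_count = len(counts)
        total_incidents = sum(counts)
        with_incidents_count = sum(1 for c in counts if c > 0)
        metrics[band] = {
            "company_count": company_count,
            "total_incidents": total_incidents,
            "with_incidents_count": with_incidents_count,
            "avg_incidents": total_incidents / company_count,
            "incident_rate": with_incidents_count / company_count * 100,
        }
    return metrics


# Illustrative data: four companies in band C, two in band Aaa.
sample = [("C", 3), ("C", 0), ("C", 1), ("C", 4), ("Aaa", 0), ("Aaa", 1)]
result = band_metrics(sample)
```

For the sample, band C has 8 incidents across 4 companies (avg_incidents 2.0) and 3 of 4 companies affected (incident_rate 75.0). Bands with no companies are simply absent from the result.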
4.3 Expected Pattern
Under a well-calibrated scoring model, lower-rated bands (C, Ca, Caa) should exhibit significantly higher avg_incidents and incident_rate values compared to higher-rated bands (Aaa, Aa, A). A monotonic decrease in incident rate from C to Aaa provides strong validation of the scoring methodology.
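One way to check the expected pattern programmatically is to walk the bands from C to Aaa and confirm the incident rate never rises. In the sketch below, the name `is_monotonic_decreasing`, the decision to allow ties between adjacent bands, and the decision to skip bands with no companies are all assumptions of this example.

```python
from typing import Dict, List

# Bands ordered from lowest-rated to highest-rated.
BAND_ORDER: List[str] = ["C", "Ca", "Caa", "B", "Ba", "Baa", "A", "Aa", "Aaa"]


def is_monotonic_decreasing(incident_rate_by_band: Dict[str, float]) -> bool:
    """True if incident rate never increases when moving from C toward Aaa.

    Bands missing from the input (no companies) are skipped, and equal
    rates in adjacent bands are accepted.
    """
    rates = [
        incident_rate_by_band[band]
        for band in BAND_ORDER
        if band in incident_rate_by_band
    ]
    return all(lower >= higher for lower, higher in zip(rates, rates[1:]))


# Illustrative rates only.
calibrated = {"C": 80.0, "Caa": 60.0, "A": 20.0, "Aaa": 5.0}
miscalibrated = {"C": 40.0, "B": 55.0, "Aaa": 5.0}
```

Here `is_monotonic_decreasing(calibrated)` holds, while `miscalibrated` fails because band B shows a higher rate than band C. With few companies per band, a single violation may reflect noise rather than a scoring problem.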
5. Incident Type Distribution
The system aggregates all incidents linked to portfolio companies and groups them by incident type. The top 10 incident types by frequency are displayed, providing underwriters with a portfolio-level view of threat composition.
Common incident types include:
| Incident Type | Description | Typical Severity |
|---|---|---|
| Ransomware | Encryption-based extortion attacks | Critical |
| Data Breach | Unauthorized access to sensitive data | High |
| Phishing | Social engineering attacks via email | Medium |
| DDoS | Distributed denial-of-service attacks | Medium |
| Malware | Malicious software infections | High |
| Insider Threat | Malicious or negligent internal actors | High |
| Supply Chain | Compromise via third-party vendors | Critical |
| Vulnerability Exploit | Exploitation of known CVEs | Medium–High |
| Business Email Compromise | Email account takeover for fraud | High |
| Credential Theft | Stolen login credentials | Medium |
This distribution helps underwriters understand which threat vectors dominate the portfolio and may inform coverage-specific pricing adjustments.
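The top-10 aggregation can be sketched with a frequency counter. The field name `type`, the function name `top_incident_types`, and the "Unknown" bucket for records with no type are assumptions for illustration; the actual incident schema is not specified here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_incident_types(
    incidents: Iterable[Dict[str, object]],
    limit: int = 10,
) -> List[Tuple[str, int]]:
    """Return the `limit` most frequent incident types with their counts."""
    counts: Counter = Counter(
        # Records without a type are grouped under "Unknown" (assumption).
        str(incident.get("type") or "Unknown")
        for incident in incidents
    )
    return counts.most_common(limit)


# Illustrative data only.
sample_incidents = (
    [{"type": "Ransomware"}] * 3
    + [{"type": "Phishing"}] * 2
    + [{"type": "DDoS"}]
)
top_two = top_incident_types(sample_incidents, limit=2)
```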
6. Risk Identification
The module identifies the “worst” companies in the portfolio by ranking them in descending order of incident count. This surfaces companies that may require:
- Increased premium loading — to reflect elevated loss potential
- Exclusion consideration — if incident history is extreme
- Active risk management engagement — to reduce future exposure
- Enhanced monitoring — more frequent score re-evaluation
For each flagged company, the output includes the company name, current score, score band, and total incident count, enabling underwriters to cross-reference with the score-incident scatter plot.
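The ranking itself is a sort by incident count, descending. The sketch below assumes each row is a dictionary with `name`, `score`, `band`, and `incidents` keys (hypothetical names mirroring the output fields listed above). Breaking ties by listing the lower-scored company first is an assumption of this example; the text does not specify a tie-break.

```python
from typing import Dict, List


def worst_companies(rows: List[Dict[str, object]], limit: int = 5) -> List[Dict[str, object]]:
    """Rank companies by incident count (descending) and keep the top `limit`.

    Ties on incident count are broken by score, lowest first (assumption).
    """
    ranked = sorted(rows, key=lambda row: (-row["incidents"], row["score"]))
    return ranked[:limit]


# Illustrative data only.
portfolio_rows = [
    {"name": "acme", "score": 820, "band": "A", "incidents": 1},
    {"name": "globex", "score": 610, "band": "Caa", "incidents": 4},
    {"name": "initech", "score": 905, "band": "Aaa", "incidents": 0},
    {"name": "umbrella", "score": 560, "band": "Ca", "incidents": 4},
]
flagged = worst_companies(portfolio_rows, limit=2)
```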
7. Incident Aggregation Pipeline
Incident counts per company are computed using a multi-step data processing pipeline that handles the dual-type company identifier field (single value or list of values). The pipeline normalizes list-valued identifiers into individual records before grouping.
Why normalization is necessary: A single incident may affect multiple companies (e.g., a supply-chain breach). The company identifier field stores these as a list. Without normalization, the incident would only be counted once and attributed to the list as a whole rather than to each individual company.
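The normalization step can be sketched as follows: each incident's identifier field is coerced to a list, and the incident is then counted once for every portfolio company it names. The field name `company_id`, the function names, and the choice to count a company only once per incident even if its identifier is repeated in the list are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Union

CompanyRef = Union[str, List[str], None]


def normalize_company_ids(value: CompanyRef) -> List[str]:
    """Coerce a single identifier or a list of identifiers into a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def count_incidents(
    incidents: Iterable[Dict[str, CompanyRef]],
    portfolio: Set[str],
) -> Counter:
    """Count incidents per portfolio company after array normalization."""
    counts: Counter = Counter()
    for incident in incidents:
        # set() guards against the same company being listed twice on one
        # incident (assumption: such duplicates count as a single incident).
        for company_id in set(normalize_company_ids(incident.get("company_id"))):
            if company_id in portfolio:  # portfolio scoping
                counts[company_id] += 1
    return counts


# Illustrative data: one single-company incident, one supply-chain event.
raw_incidents = [
    {"company_id": "acme"},
    {"company_id": ["acme", "globex", "outsider"]},
    {"company_id": ["globex", "globex"]},
    {"company_id": None},
]
per_company = count_incidents(raw_incidents, portfolio={"acme", "globex"})
```

In the sample, the supply-chain incident is attributed to both `acme` and `globex`, while `outsider` is ignored because it is not in the portfolio.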
8. Data Sources
| Data Source | Purpose | Key Information |
|---|---|---|
| Company security scoring engine | Cybersecurity risk scores | Company identifier, score, band |
| Cyber incident intelligence feed | Cyber incident records | Company identifier, type, date, severity |
| Portfolio management system | User portfolio membership | User identifier, company identifiers |
9. Glossary
| Term | Definition |
|---|---|
| Pearson r | A measure of linear correlation between two variables, ranging from -1 (perfect negative) to +1 (perfect positive). |
| Score Band | A categorical grouping (Aaa through C) derived from the continuous 0–1000 cybersecurity score. |
| Incident Rate | The percentage of companies within a score band that have experienced at least one cyber incident. |
| Company Identifier | A field in the incident data that associates an incident with one or more companies. Can be a single value or a list of values. |
| Array Normalization | A data processing step that expands list-valued fields into individual records, enabling accurate per-company attribution. |
| Portfolio | A user-curated set of companies tracked for insurance underwriting purposes. |
| Scatter Plot | A chart plotting individual companies as points with score on the X-axis and incident count on the Y-axis. |
| avg_incidents | The mean number of incidents per company within a given score band. |