Section 3 Compulsory third party liability policies
In this section we construct the claims frequency data for the compulsory third party liability (CTPL) policies. We discuss the data cleaning and give some preliminary summary analysis. We establish the claims frequency of the CTPL policies from the policy data (6032 policies on 2298 cars) and the claim data (1204 claims). Every CTPL policy has a single coverage, third party liability, with the same amount of insurance (AOI) of CNY 122000. In 2021 all Chinese insurance companies increased the CTPL AOI to CNY 200000.
3.1 Data cleaning
There are 26 variables for each CTPL policy. We describe these variables and the pre-processing steps as follows.
3.1.1 Identification variables
Variables 1-2 are the identification variables.
Device_ID is used to match the frequency data with the telematics data.
Policy_Code identifies each policy. One car may have several policies, e.g., renewal policies over several years.
3.1.2 Time
Variables 3-5 are the time variables.
KINDSTARTDATE is the policy effective date, ranging from 2014-01-01 to 2017-09-25.
DUEENDDATE is the policy expiry date, ranging from 2014-09-02 to 2018-09-24.
UNDERWRITEENDATE is the underwriting date, ranging from 2013-10-19 to 2017-06-30.
3.1.3 Policy duration
Variable 6 is the policy duration.
YEARS is the policy duration from KINDSTARTDATE to DUEENDDATE. There are 6 policies with a duration of less than 1 year (no claims were made on these policies). We remove them, so that all remaining policies have a duration of 1 year.
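The duration filter can be sketched as follows. This is a minimal illustration, not the procedure actually used: the toy records are hypothetical, only the column names come from the text, and the rule "at least 365 covered days, counting both end points" is an assumption about how a 1-year policy is recognised.

```python
from datetime import date

# Hypothetical toy records; only the column names are taken from the text.
policies = [
    {"Policy_Code": "P1", "KINDSTARTDATE": date(2014, 1, 1), "DUEENDDATE": date(2014, 12, 31)},
    {"Policy_Code": "P2", "KINDSTARTDATE": date(2014, 1, 1), "DUEENDDATE": date(2014, 9, 2)},
    {"Policy_Code": "P3", "KINDSTARTDATE": date(2017, 9, 25), "DUEENDDATE": date(2018, 9, 24)},
]

def covered_days(policy):
    """Days covered by the policy, counting both start and end date (assumption)."""
    return (policy["DUEENDDATE"] - policy["KINDSTARTDATE"]).days + 1

def is_full_year(policy, min_days=365):
    """Treat a policy as a 1-year policy if it covers at least min_days days (assumption)."""
    return covered_days(policy) >= min_days

# Split into policies to keep and short policies to remove.
full_year = [p for p in policies if is_full_year(p)]
short = [p for p in policies if not is_full_year(p)]

print([p["Policy_Code"] for p in full_year])
print([(p["Policy_Code"], covered_days(p)) for p in short])
```

A day-count threshold is simple but ignores leap-year edge cases; comparing DUEENDDATE with the first anniversary of KINDSTARTDATE would be a stricter alternative.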
3.1.4 Car features
Variables 7-11 are the car variables.
PURCHASEPRICENOTAX is the car purchase price, ranging from CNY 25000 to CNY 990000.
SEATCOUNT ranges from 2 to 8, and 96% of the policies have 5 seats.
CARBRAND contains 67 car brands.
CARSERIESNAME contains 299 car series.
USEYEARS ranges from 0 to 14 years.
3.1.5 Driver features
Variables 12-14 are the driver variables.
BRANCHNAME is the name of the branch selling the policy. It indicates the main area where the car is driven. There are 27 different branches.
AGE is the driver’s age. After correcting 6 drivers’ ages, the driver’s age ranges from 19 to 77 years.
SEX is the driver’s gender. Males bought nearly twice as many policies as females.
3.1.6 Experience rating factors
Variables 15-16 are the experience rating factors.
LASTCLAIMCOUNT is the number of claims in the previous year. It ranges from 0 to 4, with -3 indicating a new car and -4 indicating a new policy. We keep the 8 missing values since we will use the no-claim discount (NCD) factor instead.
NCD_Compulsory is the no-claim discount (NCD) factor for the compulsory policies. It takes values in {0.7, 0.8, 0.9, 1, 1.1, 1.3}. The relationship between LASTCLAIMCOUNT and NCD_Compulsory is shown in Figure 3.1.
3.1.7 Other policy features
Variable 17 is the policy type. Variable 18 is the coverage type. Variable 19 is the amount of insurance (AOI). Variable 20 is the non-deductible indicator. Variables 21-22 are the premiums.
RISKCODE indicates the type of policy, either the CTPL policy or the commercial policy.
KINDCODE indicates the coverages of the policy. For the CTPL policy, there is only one coverage, third party liability.
AMOUNTNEW is the amount of insurance (the coverage limit). We correct two data errors. All the CTPL policies have the same coverage limit of CNY 122000.
FLAG is the non-deductible indicator. We correct one data error. All the CTPL policies are non-deductible.
PREMIUM is the premium for each coverage, ranging from CNY 285 to CNY 1850.
PREMIUM_Total is the total premium for all coverages. For the CTPL policies, it equals PREMIUM. Figure 3.2 shows that the premiums generally increase with the NCD factor.
3.1.8 Claims features
Variables 23-26 are the claim variables extracted from the claim data.
Claim_Code contains all the claim codes on this policy. Note that several claims may be made on one policy.
Claim_Coverage contains the incurred coverages of all the claims. For the CTPL policy, it is always the third party liability coverage.
Claim_Amount contains the incurred claim amount of each coverage of each claim.
Claim_Count is the number of claims made on this policy. This variable is the response variable in the claims frequency modelling.
We show the distribution of the last reporting dates (measured from policy inception) in Figure 3.3, which indicates that the last reporting dates are distributed uniformly over the policy duration. After the data cleaning, we have 6026 policies on 2296 cars, of which 1031 policies on 829 cars account for the 1204 claims.
Considering the reporting delay shown in Table 2.1, we set the observed exposure period from 2014-01-01 to 2017-06-25. There are 122 policies outside this period, and 1614 policies are partially exposed. The total claims count is 1204 and the total exposure is 5262 years, giving a claims frequency of 0.23 on the remaining 5904 policies of 2296 cars. The claims frequency data of the CTPL policies is saved as policy_compulsory.csv.
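The exposure calculation can be illustrated with a short sketch. It assumes that the exposure of a policy is the share of its period [KINDSTARTDATE, DUEENDDATE] that falls inside the observation window, which equals earned years because all policies last 1 year; the function name earned_years and the example dates are hypothetical.

```python
from datetime import date

# Observation window as stated in the text.
WINDOW_START = date(2014, 1, 1)
WINDOW_END = date(2017, 6, 25)

def earned_years(start, end, window_start=WINDOW_START, window_end=WINDOW_END):
    """Share of the policy period [start, end] inside the window (assumed definition)."""
    total_days = (end - start).days + 1           # both end points counted
    lo = max(start, window_start)
    hi = min(end, window_end)
    observed_days = max((hi - lo).days + 1, 0)    # 0 if the policy lies outside the window
    return observed_days / total_days

# Hypothetical policies: fully exposed, partially exposed, outside the window.
print(earned_years(date(2015, 3, 1), date(2016, 2, 29)))
print(earned_years(date(2017, 1, 1), date(2017, 12, 31)))
print(earned_years(date(2017, 7, 1), date(2018, 6, 30)))

# Claims frequency = total claims / total exposure, using the totals reported in the text.
print(round(1204 / 5262, 2))
```

Policies with zero earned years correspond to the policies outside the window, and values strictly between 0 and 1 correspond to the partially exposed policies.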
3.2 Policies aggregation w.r.t cars
As we discussed before, a large proportion of the policies (\(26\%\)) are missing the telematics data. We aggregate the policies by car, identified by Device_ID. We need to check for cars with multiple drivers and clean the data accordingly. Driver changes can be detected from the region, the age, and the gender.
There are 8 cars whose region changes, 82 cars whose driver’s gender changes, and 87 cars whose driver’s age changes more times than the number of policies. We remove these cars. For drivers with multiple policies, we set the age to the median of the recorded ages.
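A minimal sketch of such a driver-change check is given below. The toy records and branch labels are hypothetical, and the age rule (flag a car when the spread of recorded ages exceeds the number of policies) is only one possible reading of the criterion above, not necessarily the rule that was applied.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy records; only the column names are taken from the text.
policies = [
    {"Device_ID": "A", "BRANCHNAME": "Branch_1", "SEX": "M", "AGE": 40},
    {"Device_ID": "A", "BRANCHNAME": "Branch_1", "SEX": "M", "AGE": 41},
    {"Device_ID": "B", "BRANCHNAME": "Branch_1", "SEX": "M", "AGE": 35},
    {"Device_ID": "B", "BRANCHNAME": "Branch_1", "SEX": "F", "AGE": 36},
    {"Device_ID": "C", "BRANCHNAME": "Branch_2", "SEX": "F", "AGE": 30},
    {"Device_ID": "C", "BRANCHNAME": "Branch_2", "SEX": "F", "AGE": 52},
]

# Group the policies by car.
by_car = defaultdict(list)
for p in policies:
    by_car[p["Device_ID"]].append(p)

def driver_change_reasons(rows):
    """Return the list of indicators suggesting a driver change for one car."""
    reasons = []
    if len({r["BRANCHNAME"] for r in rows}) > 1:
        reasons.append("region")
    if len({r["SEX"] for r in rows}) > 1:
        reasons.append("sex")
    ages = [r["AGE"] for r in rows]
    # Assumed rule: age should grow by about one year per yearly policy.
    if max(ages) - min(ages) > len(rows):
        reasons.append("age")
    return reasons

flagged = {car: driver_change_reasons(rows) for car, rows in by_car.items()}
kept = {car: rows for car, rows in by_car.items() if not flagged[car]}

# Median age per remaining car.
median_age = {car: median([r["AGE"] for r in rows]) for car, rows in kept.items()}
print(flagged)
print(median_age)
```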
The car price PURCHASEPRICENOTAX is the market value, which changes yearly, and we set it to the median over the policies of the car. We set the use-years USEYEARS and the no-claim discount factor NCD_Compulsory to their means. We calculate the total premium PREMIUM_Total, the total exposure Earned_Years, and the total claim count Claim_Count for each car.
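The aggregation rules above can be sketched as follows. The toy values are hypothetical and the helper aggregate_car is introduced here for illustration only; the column names follow the text.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy records; only the column names are taken from the text.
policies = [
    {"Device_ID": "A", "PURCHASEPRICENOTAX": 100000, "USEYEARS": 1,
     "NCD_Compulsory": 1.0, "PREMIUM_Total": 950, "Earned_Years": 1.0, "Claim_Count": 1},
    {"Device_ID": "A", "PURCHASEPRICENOTAX": 90000, "USEYEARS": 2,
     "NCD_Compulsory": 1.1, "PREMIUM_Total": 1045, "Earned_Years": 0.5, "Claim_Count": 0},
    {"Device_ID": "B", "PURCHASEPRICENOTAX": 250000, "USEYEARS": 0,
     "NCD_Compulsory": 1.0, "PREMIUM_Total": 950, "Earned_Years": 1.0, "Claim_Count": 0},
]

def aggregate_car(rows):
    """Collapse all policies of one car into a single record."""
    col = lambda name: [r[name] for r in rows]
    return {
        "PURCHASEPRICENOTAX": median(col("PURCHASEPRICENOTAX")),  # median price
        "USEYEARS": mean(col("USEYEARS")),                        # mean use-years
        "NCD_Compulsory": mean(col("NCD_Compulsory")),            # mean NCD factor
        "PREMIUM_Total": sum(col("PREMIUM_Total")),               # total premium
        "Earned_Years": sum(col("Earned_Years")),                 # total exposure
        "Claim_Count": sum(col("Claim_Count")),                   # total claims
    }

by_car = defaultdict(list)
for p in policies:
    by_car[p["Device_ID"]].append(p)

cars = {car: aggregate_car(rows) for car, rows in by_car.items()}

# Empirical claims frequency over all cars: total claims / total exposure.
total_claims = sum(c["Claim_Count"] for c in cars.values())
total_exposure = sum(c["Earned_Years"] for c in cars.values())
print(cars["A"])
print(total_claims / total_exposure)
```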
In total we have removed 177 cars. The aggregated data contains 2119 cars with 4804 years-at-risk and 1059 claims. The empirical claims frequency is 0.22. The claims frequency data of the cars is saved as car_compulsory.csv.
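As a check on the reported figures, the empirical claims frequency can be written as total claims over total exposure. The symbols \(N_i\) (claim count of car \(i\)) and \(e_i\) (years-at-risk of car \(i\)) are notation introduced here for illustration and do not appear in the original text.

\[
\hat{\lambda} \;=\; \frac{\sum_{i} N_i}{\sum_{i} e_i} \;=\; \frac{1059}{4804} \;\approx\; 0.22 .
\]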
Remarks: The driver’s information may have been changed in order to optimize the insurance premium from the policyholder’s perspective. In a second stage one may analyze whether such a change of policyholder is caused by an accident, i.e., whether there was an accident immediately before the change. In a third stage one may test whether the driving behavior changes, i.e., whether this is really a change of car driver or only a change on paper (in the policy contract).