Enhancing Risk Management Strategies: GAM Analysis of Health Insurance Claim Determinants

Health insurance plays a crucial role in providing financial protection and ensuring access to necessary healthcare services. Awareness of the importance of health insurance continues to grow in Indonesian society, as evidenced by a 22% increase in premium income according to AAJI data as of March 2023. Despite the benefits of health insurance, a growing number of insurance participants raises the risk borne by insurance companies. Generalized Additive Models (GAM) with P-Splines can address this problem by modelling the non-linear relationship between claim amount and age, body mass index, and blood pressure. The GAM fitted with PIRLS does not give an explicit expression for the relationship between the variables, but the relationship can be read from the shape of the function of each predictor in conjunction with the link function used.


Introduction
Health insurance plays a crucial role in providing financial protection and ensuring access to necessary healthcare services for policyholders. According to research conducted by Inventure Indonesia in November 2020, 78.7% of respondents agreed that, as a result of the pandemic, having insurance, whether life insurance or health insurance, has become a necessity (Fajrini et al., 2023). In Indonesia, public awareness of the importance of health insurance continues to grow. Based on data from the Indonesian Life Insurance Association (AAJI), as of March 2023, there has been a 22% increase in premium income compared to the 2022 period.
The more participants in health insurance, the greater the risk faced by insurance companies, because the likelihood of health insurance claims rises with the number of participants. Therefore, insurance companies must set the right premiums to manage the risks they face and ensure they have sufficient reserves to cover potential health insurance claims. In determining premiums, insurance companies must consider factors such as age, gender, health history, and the lifestyle of health insurance participants. By setting the right premiums, insurance companies can manage risks more effectively and provide better protection to their customers.
In this study, the relationship between participant data such as BMI, age, gender, lifestyle, and others and the amount of health insurance claims is analyzed. A better understanding of this relationship can help insurance companies identify and manage health risks more effectively. Additionally, this research has the potential to improve the accuracy of risk assessment and premium determination, thereby enhancing the financial sustainability of health insurance companies.
This research uses a generalized additive model (GAM) as a suitable statistical method for formulating the complex relationship between participant data variables and the amount of health insurance claims. GAM has advantages in handling nonlinear patterns, complex interactions between variables, and flexibility in modeling heteroskedasticity. In this case, the use of GAM allows more information to be extracted from the data and reveals patterns that may be overlooked by more traditional linear approaches.
By understanding these relationships, insurance companies can adjust premiums accurately, optimize marketing strategies for more targeted market segments, and provide better protection for policyholders.

Monte Carlo
The Monte Carlo method is an algorithm for solving problems that utilizes random processes or randomization in the form of probability simulation. This approach employs probability by generating random numbers for estimation. The Monte Carlo simulation process is carried out through a series of iterations, and the number of iterations is determined by generating as many random numbers as needed for the simulation (Mukhti et al., 2018).
The following are the steps of the Monte Carlo simulation (Hasugian et al., 2022):
a) Grouping of the data
b) Calculation of the relative frequency distribution
c) Calculation of the cumulative relative frequency distribution
d) Determination of random number intervals
e) Calculation and generation of random numbers
f) Monte Carlo simulation
The Monte Carlo method utilizes existing sampled data (historical data) with known data distributions. This approach relies on the strong law of large numbers, which indicates that the more random draws are used, the closer the approximation will be to the exact value (Qurani et al., 2020).
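The steps above can be illustrated with a minimal Python sketch. The function name `monte_carlo_from_history`, the grouped values, and the counts are hypothetical and chosen only for illustration; they are not the data used in this study.

```python
import numpy as np

def monte_carlo_from_history(values, counts, n_draws, seed=0):
    """Simulate draws from grouped historical data (hypothetical helper)."""
    values = np.asarray(values)
    counts = np.asarray(counts, dtype=float)      # a) grouped data
    rel_freq = counts / counts.sum()              # b) relative frequencies
    cum_freq = np.cumsum(rel_freq)                # c) cumulative relative frequencies
    cum_freq[-1] = 1.0                            # guard against rounding error
    rng = np.random.default_rng(seed)
    u = rng.random(n_draws)                       # e) uniform random numbers in [0, 1)
    # d) each group owns the interval [cum_freq[i-1], cum_freq[i])
    idx = np.searchsorted(cum_freq, u, side="right")
    return values[idx], rel_freq, cum_freq        # f) simulated observations

# Illustrative grouped data: three age groups with assumed counts.
sims, rel_freq, cum_freq = monte_carlo_from_history([25, 40, 60], [20, 50, 30], 10_000)
print(rel_freq, cum_freq, sims[:10])
```

With a large number of draws, the simulated relative frequencies approach the historical ones, which is the law-of-large-numbers behaviour described above.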

Generalized Additive Models (GAM)
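Generalized Additive Models (GAM) drop the assumption of a specific functional form in order to obtain a model that describes the relationship between the response variable and its predictor variables (Fernanda et al., 2023). The general form of the GAM model can be formulated as follows:

$$g(E[y_i]) = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) \qquad (1)$$

The equation can be rewritten as the model below:

$$y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i \qquad (2)$$

with
$y_i$ : Response variable
$\beta_0$ : Intercept coefficient
$x_{ij}$ : Predictor variables for $j = 1, 2, \ldots, p$
$p$ : Number of predictor variables
$f_j$ : Smoothing function for predictor variables
$g$ : Link function
$\varepsilon_i$ : Error term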
Parameter estimation in GAM can be done with Penalized Iteratively Re-weighted Least Squares (PIRLS). In addition, the smoothing functions are estimated using cross-validation criteria, such as Generalized Cross Validation (GCV) (Wood, 2006).

Penalized Spline B-Spline Basis
Splines are segmented polynomials in which the segments are joined together at knots. This segmented nature gives splines good flexibility (Sihombing & Famalika, 2022). B-Splines are polynomial functions that are segmented on the intervals formed by the knots and estimated for each segment at a certain polynomial degree (de Boor, 2001). The $j$-th B-Spline basis function of degree $d$, $B_{j,d}(x)$, is built on the knots $t_j, t_{j+1}, \ldots, t_{j+d+1}$. The basis can be defined recursively as follows:

$$B_{j,d}(x) = \frac{x - t_j}{t_{j+d} - t_j} B_{j,d-1}(x) + \frac{t_{j+d+1} - x}{t_{j+d+1} - t_{j+1}} B_{j+1,d-1}(x)$$

with

$$B_{j,0}(x) = \begin{cases} 1, & t_j \le x < t_{j+1} \\ 0, & \text{otherwise} \end{cases}$$

A B-Spline function that uses too many knots tends to overfit and requires a penalty on the adjacent coefficients of the B-Spline (Eilers & Marx, 1996). A P-Spline is a smoother consisting of several B-Spline basis functions, no more in number than the observations, whose regression coefficients are penalized.
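As an illustration of the recursive definition and of the penalty on adjacent coefficients, the following Python sketch evaluates a B-Spline basis with the Cox-de Boor recursion and builds a difference penalty in the spirit of Eilers & Marx (1996). The function name `bspline_basis`, the equally spaced knots, the number of segments, and the second-order difference are assumptions made for the example, not settings taken from this study.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-Spline basis functions of the given degree at x."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of the half-open interval [t_j, t_{j+1})
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Raise the degree one step at a time with the recursion
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            left = (x - t[j]) / left_den * B[:, j] if left_den > 0 else 0.0
            right = (t[j + d + 1] - x) / right_den * B[:, j + 1] if right_den > 0 else 0.0
            B_new[:, j] = left + right
        B = B_new
    return B

# Assumed setup: degree 2, 5 equal segments on [0, 1], knots extended past both ends.
degree, n_seg = 2, 5
h = 1.0 / n_seg
knots = h * np.arange(-degree, n_seg + degree + 1)
x = np.linspace(0.0, 1.0, 50)
B = bspline_basis(x, knots, degree)          # shape (50, n_seg + degree)

# Second-order difference matrix; D.T @ D penalizes adjacent coefficients.
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
penalty = D.T @ D
print(B.shape, penalty.shape)
```

Each row of `B` sums to one on the data range, and the penalty leaves coefficient sequences that are linear in the index unpenalized, so only roughness in the coefficients is punished.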

Generalized Additive Models P-Splines
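Equation (1) can be rewritten as follows:

$$g(\boldsymbol{\mu}) = \beta_0 \mathbf{1} + \sum_{j=1}^{p} \mathbf{B}_j \boldsymbol{\beta}_j = \mathbf{B}\boldsymbol{\beta}$$

where $\mathbf{B}_j$ is the B-Spline regressor matrix of the $j$-th predictor, $\mathbf{B} = [\mathbf{1}, \mathbf{B}_1, \ldots, \mathbf{B}_p]$, and
$\boldsymbol{\beta}_j$ : The coefficient vector of the $j$-th predictor

The unknown parameter estimates in GAM can be obtained by maximizing the penalized likelihood function as follows:

$$l_p(\boldsymbol{\beta}) = l(\boldsymbol{\beta}) - \frac{1}{2} \boldsymbol{\beta}^T \mathbf{S} \boldsymbol{\beta}$$

with $\mathbf{S} = \sum_{j} \lambda_j \mathbf{S}_j$, where $\mathbf{S}_j$ is the penalty matrix of the $j$-th smoothing function and $\lambda_j$ is the smoothing parameter whose value is known (Wood, 2006). According to Marx & Eilers (1998), the estimation form can be obtained as follows:

$$\hat{\boldsymbol{\beta}} = (\mathbf{B}^T \mathbf{W} \mathbf{B} + \mathbf{S})^{-1} \mathbf{B}^T \mathbf{W} \mathbf{z}$$

where $\mathbf{W}$ is the weight matrix and $\mathbf{z}$ is the pseudodata vector.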
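A penalized estimate of this kind can be sketched in Python for the simplest case, a Gaussian response with identity link, where the weights are all one and the pseudodata equal the response. Everything below is an assumption made for illustration: the helper names, the degree-1 (hat function) B-Spline basis used for brevity, the second-order difference penalty, the smoothing parameters, and the simulated data. The penalized least squares problem is solved as an augmented least squares problem so that the intercept and the spline blocks can coexist without extra identifiability constraints.

```python
import numpy as np

def hat_basis(x, n_knots):
    """Degree-1 B-Spline (hat function) basis on equally spaced knots (assumed)."""
    knots = np.linspace(x.min(), x.max(), n_knots)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

def fit_additive_pspline(X, y, n_knots=8, lams=(1.0, 1.0, 1.0)):
    """Minimize ||y - B beta||^2 + beta' S beta for an additive P-Spline model."""
    n, p = X.shape
    # Design matrix B = [1 | B_1 | ... | B_p]
    B = np.hstack([np.ones((n, 1))] + [hat_basis(X[:, j], n_knots) for j in range(p)])
    D = np.diff(np.eye(n_knots), n=2, axis=0)       # second-order differences
    # R stacks sqrt(lambda_j) * D in the columns of predictor j, so S = R'R
    R = np.zeros((p * D.shape[0], B.shape[1]))
    for j in range(p):
        r0, c0 = j * D.shape[0], 1 + j * n_knots
        R[r0:r0 + D.shape[0], c0:c0 + n_knots] = np.sqrt(lams[j]) * D
    S = R.T @ R
    # Augmented least squares: [B; R] beta ~ [y; 0] (minimum-norm solution)
    target = np.concatenate([y, np.zeros(R.shape[0])])
    beta, *_ = np.linalg.lstsq(np.vstack([B, R]), target, rcond=None)
    return B, S, beta

# Simulated data with three predictors, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))
y = 1.0 + np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=100)
B, S, beta = fit_additive_pspline(X, y)
fitted = B @ beta
print(B.shape, float(np.mean((y - fitted) ** 2)))
```

The intercept column carries no penalty, and each predictor has its own block in `S` scaled by its own smoothing parameter, which mirrors the sum of per-predictor penalties in the penalized likelihood.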

Smoothing Parameter Selection
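The selection of smoothing parameters can be done using Generalized Cross Validation (GCV). According to Eubank (1999), the GCV value is obtained from the following formula:

$$GCV(\lambda) = \frac{MSE(\lambda)}{\left( n^{-1} \, tr[\mathbf{I} - \mathbf{A}(\lambda)] \right)^2}$$

with

$$MSE(\lambda) = n^{-1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$y_i$ : The actual value of the $i$-th observation of the response variable
$\hat{y}_i$ : Predicted value of the $i$-th observation of the response variable
$n$ : Number of observations
$tr[\mathbf{I} - \mathbf{A}(\lambda)]$ : The sum of the main diagonal elements of the matrix $\mathbf{I} - \mathbf{A}(\lambda)$, where $\mathbf{A}(\lambda)$ is the smoother matrix satisfying $\hat{\mathbf{y}} = \mathbf{A}(\lambda)\mathbf{y}$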
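A small Python sketch of GCV-based selection follows. It assumes a single predictor, a Gaussian response, a degree-1 (hat function) B-Spline basis, a second-order difference penalty, and a coarse grid of candidate smoothing parameters. None of these settings, names, or data come from this study.

```python
import numpy as np

def hat_basis(x, n_knots):
    """Degree-1 B-Spline (hat function) basis on equally spaced knots (assumed)."""
    knots = np.linspace(x.min(), x.max(), n_knots)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

def gcv_score(B, y, S):
    """GCV = MSE / (tr(I - A) / n)^2 with smoother matrix A = B (B'B + S)^{-1} B'."""
    n = len(y)
    A = B @ np.linalg.solve(B.T @ B + S, B.T)
    mse = np.mean((y - A @ y) ** 2)
    return mse / (np.trace(np.eye(n) - A) / n) ** 2

def select_lambda(B, y, D, grid):
    """Return the candidate smoothing parameter with the smallest GCV score."""
    scores = [gcv_score(B, y, lam * D.T @ D) for lam in grid]
    return grid[int(np.argmin(scores))], scores

# Simulated data for illustration only.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
B = hat_basis(x, 10)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, scores = select_lambda(B, y, D, grid)
print(best_lam)
```

In practice the grid would be finer, and in an additive model one smoothing parameter per predictor would be searched, but the score being minimized is the same.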

Materials
The data used in this research are simulated data obtained from a Monte Carlo simulation. There are 100 data points consisting of three independent variables: age, body mass index, and blood pressure. The dependent variable used is the claim amount.

Methods
In additive models, the parameters are estimated by penalized least squares. However, in Generalized Additive Models P-Splines, the parameters can be estimated using Penalized Iteratively Re-weighted Least Squares (PIRLS) to achieve optimal results. In general, the maximum penalized likelihood estimation can be done through an iteration process indexed by the increment $k$, which ends with the following step:
g) Repeat the steps until convergence, i.e. until the change between successive iterations is smaller than $\varepsilon$, where $\varepsilon$ is a very small number (Wood, 2006).
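The iteration can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the procedure used in this study: a Poisson response with log link is assumed so that the re-weighting is visible, the basis is a degree-1 (hat function) B-Spline basis, the penalty is a second-order difference penalty with an arbitrary smoothing parameter, and convergence is checked on the change in the coefficient vector. All function and variable names are hypothetical.

```python
import numpy as np

def pirls_poisson(B, y, S, beta0, tol=1e-8, max_iter=100):
    """PIRLS sketch for a Poisson response with log link (assumed family)."""
    beta = np.asarray(beta0, dtype=float)    # initial coefficients
    eta = B @ beta                           # initial linear predictor
    mu = np.exp(eta)                         # initial mean via the inverse link
    for k in range(max_iter):
        w = mu                               # weights 1 / (V(mu) g'(mu)^2) = mu
        z = eta + (y - mu) / mu              # pseudodata
        BtW = B.T * w                        # B' W without forming diag(w)
        beta_new = np.linalg.solve(BtW @ B + S, BtW @ z)
        eta = B @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:                        # stop once the update is negligible
            break
    return beta, mu, k + 1

def hat_basis(x, n_knots):
    """Degree-1 B-Spline (hat function) basis on equally spaced knots (assumed)."""
    knots = np.linspace(x.min(), x.max(), n_knots)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

# Simulated count data for illustration only.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 100)
y = rng.poisson(np.exp(1.0 + np.sin(2 * np.pi * x))).astype(float)
B = hat_basis(x, 8)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
S = 1.0 * D.T @ D
beta0 = np.full(B.shape[1], np.log(y.mean()))   # constant start at the log of the mean
beta, mu, n_iter = pirls_poisson(B, y, S, beta0)
print(n_iter)
```

At convergence the penalized score equations hold, that is, `B.T @ (y - mu)` equals `S @ beta` up to numerical error, which gives a simple check that the loop has reached the penalized likelihood maximum.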

Results and Discussion
In this study, there are 100 data points consisting of three independent variables: age, body mass index, and blood pressure. The dependent variable used is the claim amount. Before modeling the claim amount against each predictor variable, the relationship pattern between the response variable and each predictor is first identified using a scatterplot. Figure 3 shows that the relationship between claim amount and blood pressure is randomly distributed. The relationship pattern cannot be specified parametrically.
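Therefore, the amount of claims and its influencing factors are modelled with a nonparametric approach, such as Generalized Additive Models P-Spline. The number of knots and the degree used are both 2, which determines the number of basis functions for each predictor. The equation can be written in the following form:

$$g(E[y \mid x_1, x_2, x_3]) = \beta_0 + f_1(x_1) + f_2(x_2) + f_3(x_3)$$

where $x_1$ is age, $x_2$ is body mass index, $x_3$ is blood pressure, and each $f_j$ is a P-Spline smoothing function.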
The next step is carried out with PIRLS, using the optimal smoothing parameters obtained for each predictor variable, to obtain the estimated values of the model parameters. The relationship between the variables in the model above can be seen from the shape of the function of each predictor in conjunction with the link function used.

Conclusion
Based on the results of the discussion, it can be concluded that the relationship between the amount of claims and age, BMI, and blood pressure can be modelled with a nonparametric approach, such as GAM. This can be seen from the scatterplot between each predictor variable and the response variable, which does not follow the shape of any particular curve. The fitted Generalized Additive Models P-Spline model does not give an explicit expression for the effect of an increase in each predictor.
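Generalized Additive Models P-Splines are GAMs that contain additive functions using P-Spline as the smoothing function. Model (2) can be written as follows:

$$y_i = \beta_0 + \sum_{k=1}^{K} \beta_{1k} B_{1k}(x_{i1}) + \sum_{k=1}^{K} \beta_{2k} B_{2k}(x_{i2}) + \cdots + \sum_{k=1}^{K} \beta_{pk} B_{pk}(x_{ip}) + \varepsilon_i$$

The model can be rewritten as follows:

$$\mathbf{y} = \mathbf{B}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

with
$\mathbf{B}$ : Regressor matrix for the B-Spline basis
$\boldsymbol{\beta}$ : Coefficient vector
$\mathbf{B}_j$ : Regressor matrix for the $j$-th predictor B-Spline basis

The iteration process in the Methods section consists of the following steps:
a) Determine the initial estimated value of $\boldsymbol{\beta}^{[0]}$.
b) Determine the initial estimated value of $\boldsymbol{\eta}^{[0]} = \mathbf{B}\boldsymbol{\beta}^{[0]}$.
c) Determine the initial estimated value of $\boldsymbol{\mu}^{[0]} = g^{-1}(\boldsymbol{\eta}^{[0]})$.
d) Calculate the weight matrix $\mathbf{W}^{[k]}$.
e) Calculate the pseudodata $\mathbf{z}^{[k]}$.
f) Calculate the value of $\boldsymbol{\beta}^{[k+1]}$ as follows:

$$\boldsymbol{\beta}^{[k+1]} = (\mathbf{B}^T \mathbf{W}^{[k]} \mathbf{B} + \mathbf{S})^{-1} \mathbf{B}^T \mathbf{W}^{[k]} \mathbf{z}^{[k]}$$

where $\mathbf{S}$ is the penalty matrix.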

Figure 1: Scatterplot between claim amount and age
Figure 1 shows that the relationship between claim amount and age is randomly distributed. The relationship pattern cannot be specified parametrically.

Figure 2: Scatterplot between claim amount and body mass index
Figure 2 shows that the relationship between claim amount and body mass index is randomly distributed. The relationship pattern cannot be specified parametrically.

Figure 3: Scatterplot between claim amount and blood pressure

Table 1: Results of estimated parameter model
The parameter estimation values of the model are as follows:

Variable    Estimated Value of Parameter
Intercept

Thus, the Generalized Additive Models P-Spline model (4) is obtained as below: