A generalized linear mixed model for understanding determinant factors of student's interest in pursuing bachelor's degree at Universitas Syiah Kuala

A Generalized Linear Mixed Model (GLMM) is a framework that has a response variable, fixed effects, and random effects. The response variable comes from the exponential family, whereas the random effects are normally distributed. The parameters can be estimated by maximum likelihood using either the Laplace approach or the Gauss-Hermite Quadrature (GHQ) approach. The purpose of this study was to use both techniques to identify the factors that trigger students' interest in continuing their studies at Universitas Syiah Kuala (USK). The GLMM suits the data because the response variable has a Bernoulli distribution and the random effects are assumed to be normally distributed. The model also helps identify the relationship between the dependent variable and the predictors. This study uses data from six high schools in Banda Aceh city drawn with a two-stage sampling technique. In stage 1, we randomly chose six of the sixteen public senior high schools in Banda Aceh. In stage 2, we selected students from four classes in each school, covering two different majors. The GLMM includes one binary response variable, five numerical fixed effects, and two random effects. The response variable is high school students' interest in continuing their studies at USK (yes or no). The five fixed effects are the scores for collaboration (C), action (A), emotion (E), purpose (P), and hope (H). The random effects are school (S) and major (M). In this study, the Laplace and GHQ techniques produce identical results. The predictors that explain student interest are A, E, and H, all of which have a positive effect. The random effects of school and major are not significantly different from zero. The model with the three significant predictors fits better than the model with all predictors.


INTRODUCTION
The general linear mixed model is a valuable framework for comparing how several variables affect continuous outcome variables [1]. The Generalized Linear Mixed Model (GLMM) was developed as a combination of the Linear Mixed Model (LMM) and the Generalized Linear Model (GLM) [2]. The LMM has a Gaussian response variable and includes random effects among its predictors, whereas the GLM has a response variable from the exponential family. As a result, the GLMM contains a response variable from the exponential family, a fixed-effects component, and Gaussian random effects [3]. Examples of exponential family distributions are the Bernoulli, binomial, Poisson, normal, and exponential distributions [4]. Therefore, the GLMM can accommodate a wider range of response variables than the linear model, which is restricted to a Gaussian response.
Estimation of parameters for the GLMM can be done using Pseudo-Likelihood (PL) [5], hierarchical likelihood (h-likelihood) [3], maximum likelihood based on the Laplace approximation [6], and maximum likelihood based on Adaptive Quadrature [7]. The PL technique uses a Taylor series expansion to optimize the model. The h-likelihood is useful for predicting random effects, fixed effects, and dispersion parameters [8]. The integral in the h-likelihood sometimes has no closed-form solution, so a numerical approximation is needed; one such method is the Laplace approximation [3]. Some studies report that the Laplace method can be used with a smaller sample than Adaptive Quadrature. Moreover, the Laplace method can handle interactions of random effects, which the quadrature approach cannot. Laplace is also more efficient than the quadrature algorithm with one node. GLMM has been applied to real data since 1989. A GLMM for a salamander mating experiment with three fixed effects and two random effects was developed in [9]; the model was known as a logistic linear model with random effects. The response takes the values 0 and 1, where 0 means the salamanders mated and 1 means they did not, so the response has a Bernoulli distribution. The fixed effects are Whiteside female (WSf), Whiteside male (WSm), and the interaction of WSf and WSm. The random effects are the male and female effects. The h-likelihood performed well in estimating the parameters of the model. Next, a model for seizure patients with three covariates was fitted; at that time it was known as a Poisson log-linear mixed model [10]. The response variable was the seizure count of each patient, which has a Poisson distribution. The fixed effect was the treatment, and the random effects were the kth visit and age. The model also exhibited overdispersion.
The model coefficients were estimated by h-likelihood, Laplace, and quadrature with one node and with twenty-five nodes [3].
*Corresponding Author: khairil@apps.ipb.ac.id
GLMM has since been applied to several other specific cases, including a model for time-series data in 2011 using natural cubic splines (NS) [11]. GLMM has also been applied to identify triggers for resettlement in Australia using data with repeated measurements; that study used Generalized Estimating Equations (GEEs) to estimate the model parameters [12]. In 2017, GLMM estimation by Penalized Quasi-Likelihood and by maximum likelihood with the Laplace and Adaptive Gaussian Quadrature (AGQ) approximations was applied to high-dimensional simulated data, and the Laplace approach gave better results [13]. A GLMM without random effects was also applied to identify factors that explain students' interest; in that study, networks, goals, and expectations were significant [14].
Universitas Syiah Kuala (USK) has been an A-accredited university since 2015. It is a public university that has held Public Service Agency status since 2018, and it was awarded best university in Aceh by the Ministry of Education and Culture. USK currently has 1,558 lecturers: 39 percent hold doctoral degrees, 60 percent hold master's degrees, and 5 percent (81 people) are professors. About 33,000 students are spread over 13 faculties. USK aims to produce qualified graduates who can enter the workforce or create jobs [15]. According to [16], the higher the education level of a student's supervisor, the faster the student finishes the thesis.
Many factors influence students' interest in pursuing a bachelor's degree at USK. In this study, we included five fixed effects and two random effects as covariates. Prospective students usually hope to make many friends at university, so collaboration, or network development, is the first fixed effect. The second is the ability to act, which concerns how students collect information about USK and may influence their interest. The third fixed effect is the emotional factor, which measures the degree of willingness to study at USK. The fourth is the purpose factor, which consists of several statements: (1) USK is my favorite campus; (2) USK can help me achieve my goals; (3) USK can get me the job I want. The last fixed effect is the expectation factor, which includes: (1) I hope USK has complete facilities; (2) I hope the tuition fees are affordable; (3) I hope USK has good-quality lecturers; (4) I hope USK is a comfortable place to study; (5) I hope USK offers many scholarships. The GLMM also involves two random effects, namely school origin and major. School origin was treated as a random effect because the six schools were randomly selected from the sixteen schools in Banda Aceh. Likewise, major was treated as a random effect because Natural Science and Social Science were selected from the three majors offered in senior high school.
USK currently applies a single tuition fee system that is intended to be fair: the fee is based on the student's financial capacity. In practice, however, many students and their parents feel that the costs are too high. The fee level is determined by the parents' type of occupation, the family's assets, the monthly electricity payment, and the number of family dependents. In addition, students admitted through the Joint Entrance Examination are charged a development fee at the beginning of their studies, which differs by study program. As a result, several students did not re-register after being accepted because of the high costs, and some chose to attend other public or private universities in Aceh.
This research was needed for two reasons: first, to find an accurate model for data with a binary response variable, several fixed effects, and several random effects; second, to apply that model to identify the factors that influence high school students' interest in continuing their studies at USK. The results are expected to give USK a basis for policy. The right policies can reduce the number of admitted students who do not re-register because of tuition fees and help students be satisfied with their studies at USK.
Based on this background, the purpose of this study is to apply the GLMM with the Laplace and one-node Gauss-Hermite Quadrature approaches to identify factors that can trigger the interest of prospective students in continuing their studies at USK. Furthermore, the model is refined to obtain a model that contains only the factors that significantly influence the response and that has a better fit.

METHODOLOGY
Data
Data collection was funded by USK through the Senior Lector Research Grant Program. The data were collected at the State Senior High Schools in Banda Aceh City (abbreviated SMAN in Indonesian, for Sekolah Menengah Atas Negeri) by student enumerators from the Statistics Study Program, USK. Data were collected from April to May 2019 as part of the 2019 research grant activity at USK.
The Education and Culture Office of Aceh Province categorizes the schools in Banda Aceh into three groups: favorite, middle, and ordinary. The groups contain five, five, and six schools, respectively, giving 16 schools in total with 5,366 students (see Table 1).
The respondents were sampled in two stages. In the first stage, six schools were drawn from the sixteen schools using stratified random sampling. The two selected favorite schools are SMAN 3 and SMAN 4 Banda Aceh, the two middle schools are SMAN 5 Banda Aceh and SMAN 8 Banda Aceh, and the two ordinary schools are SMAN 14 Banda Aceh and SMAN 16 Banda Aceh.
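The first sampling stage can be sketched as follows. This is a minimal illustration, not the procedure actually used: the stratum sizes (5, 5, and 6 schools) come from the text, but the school labels are hypothetical placeholders, and simple random sampling without replacement within each stratum via Python's `random.Random.sample` is an assumption.

```python
import random

# Hypothetical sampling frame: placeholder labels, stratum sizes taken from the text.
STRATA = {
    "favorite": [f"favorite_{i}" for i in range(1, 6)],  # 5 schools
    "middle": [f"middle_{i}" for i in range(1, 6)],      # 5 schools
    "ordinary": [f"ordinary_{i}" for i in range(1, 7)],  # 6 schools
}


def stage_one(strata, per_stratum=2, seed=2019):
    """Draw `per_stratum` schools without replacement from every stratum."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return {name: rng.sample(schools, per_stratum) for name, schools in strata.items()}


selected = stage_one(STRATA)
for stratum, schools in selected.items():
    print(stratum, schools)
```

Drawing two schools per stratum yields the six schools of stage 1; stage 2 would then sample classes within each selected school in the same way.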
In the second stage, one class XI and one class XII of Natural Science, and one class XI and one class XII of Social Science, were randomly selected from each school. These classes were chosen because they contain students who are potential candidates for university study. All students in the selected classes served as respondents. SMAN 3 and SMAN 4 contributed 111 and 120 students, respectively, so the favorite schools provided 231 respondents. SMAN 5 Banda Aceh and SMAN 8 Banda Aceh contributed 101 and 89 students, respectively, so the middle schools provided 190 respondents. The ordinary schools provided 95 respondents. The total number of respondents is therefore 516 students (see Table 2).
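The respondent counts above can be checked with a short tally. The counts are those reported in the text; because the split of the 95 ordinary-school respondents between SMAN 14 and SMAN 16 is not given, the sketch keeps them as a single combined entry (the combined label is ours).

```python
# Respondent counts reported in the text, grouped by stratum.
respondents = {
    "favorite": {"SMAN 3": 111, "SMAN 4": 120},
    "middle": {"SMAN 5": 101, "SMAN 8": 89},
    "ordinary": {"SMAN 14 + SMAN 16": 95},  # per-school split not reported
}

stratum_totals = {s: sum(counts.values()) for s, counts in respondents.items()}
grand_total = sum(stratum_totals.values())
# Percentage share of each stratum in the full sample, rounded to one decimal.
shares = {s: round(100 * t / grand_total, 1) for s, t in stratum_totals.items()}

print(stratum_totals)
print(grand_total)
print(shares)
```

The favorite, middle, and ordinary strata therefore account for roughly 44.8, 36.8, and 18.4 percent of the 516 respondents.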

Method
Steps of the research are:
1. collecting data;
2. identifying influential outliers;
3. building the GLMM models;
4. estimating the parameters, standard errors (SE), t values, and p-values by maximum likelihood using the Laplace approach and the Gauss-Hermite Quadrature approach [11];
5. evaluating model fit through -2 Log Likelihood, AIC, AICC, BIC, HQIC, and CAIC;
6. determining the factors that affect the interest of high school students in continuing their studies at USK;
7. based on the result of the fifth step, building a model involving only the influential predictors and evaluating its fit.
Steps two through four were carried out with SAS 9.4.

Generalized Linear Mixed Model (GLMM)
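Two key elements in the GLMM [6]:
• observations are conditionally independent given the random effects;
• the conditional distribution of the random variable Y_ik belongs to the exponential family, with probability density function

$$f(y_{ik} \mid \boldsymbol{\gamma}) = \exp\left\{ \frac{y_{ik}\,\theta_{ik} - b(\theta_{ik})}{a(\phi)} + c(y_{ik}, \phi) \right\} \qquad (1)$$

where y_ik is the score of the ith object in the kth cluster; a(.), b(.), and c(.) are functions; φ is the dispersion parameter, which may or may not be known; θ_ik is related to the conditional mean μ_ik = E(Y_ik | γ); and γ is a random-effect vector. For example, the Bernoulli distribution belongs to the exponential family because it can be written in the form of equation (1):

$$f(y_{ik} \mid \boldsymbol{\gamma}) = \mu_{ik}^{y_{ik}}(1-\mu_{ik})^{1-y_{ik}} = \exp\left\{ y_{ik}\log\frac{\mu_{ik}}{1-\mu_{ik}} + \log(1-\mu_{ik}) \right\}$$

It can be shown that θ_ik = log[μ_ik/(1 − μ_ik)], b(θ_ik) = log(1 + e^{θ_ik}), a(φ) = 1, and c(y_ik, φ) = 0, so that μ_ik = b'(θ_ik) = e^{θ_ik}/(1 + e^{θ_ik}).

The GLMM model can be written as [10], [17]:

$$\mathbf{Y} = g^{-1}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}) + \boldsymbol{\varepsilon} \qquad (2)$$

where g(.) is a link function, X and Z are design matrices, β is a fixed-effect vector, γ is a vector of random effects, and ε is the error vector of the model. If y_ik is the ith object in the kth cluster, i = 1, 2, …, n and k = 1, …, K, μ_ik = E(Y_ik | x_ik, z_ik, γ_k), b_l is the estimator of the random effects, and V(Y_ik | γ_k) = φ v(μ_ik), then the GLMM model can be written as [18]:

$$g(\mu_{ik}) = \mathbf{x}_{ik}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ik}^{\top}\boldsymbol{\gamma}_k \qquad (3)$$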
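The Bernoulli case can be checked numerically. The sketch below, using only the Python standard library, confirms that the direct probability mass function and the exponential-family form exp{yθ − b(θ)} with θ = logit(μ) and b(θ) = log(1 + e^θ) agree, and that finite-difference derivatives of b(θ) recover the mean μ and the variance function μ(1 − μ). The function names are illustrative choices, not taken from the paper.

```python
import math


def bernoulli_pmf(y, mu):
    """Bernoulli probability mass function, written directly."""
    return mu**y * (1 - mu) ** (1 - y)


def theta(mu):
    """Canonical parameter: the logit of the mean."""
    return math.log(mu / (1 - mu))


def b(t):
    """Cumulant function b(theta) = log(1 + e^theta); here a(phi) = 1 and c(y, phi) = 0."""
    return math.log1p(math.exp(t))


def exp_family_pmf(y, mu):
    """The same pmf in exponential-family form exp{y * theta - b(theta)}."""
    t = theta(mu)
    return math.exp(y * t - b(t))


def mean_and_variance(mu, h=1e-4):
    """Central-difference b'(theta) and b''(theta); these should equal mu and mu(1 - mu)."""
    t = theta(mu)
    first = (b(t + h) - b(t - h)) / (2 * h)
    second = (b(t + h) - 2 * b(t) + b(t - h)) / h**2
    return first, second


for mu in (0.2, 0.5, 0.9):
    print(mu, [round(exp_family_pmf(y, mu), 6) for y in (0, 1)], mean_and_variance(mu))
```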

Laplace's approach
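The parameters of the GLMM in equation (2) can be estimated by maximizing the Laplace approximation to the marginal likelihood, in which the integral over the random effects is approximated by expanding the log-integrand around its mode.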
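To illustrate the idea behind the Laplace approach, the sketch below applies the standard Laplace approximation, log ∫ exp{h(u)} du ≈ h(û) + ½ log{2π / (−h''(û))} with û the mode of h, to a single cluster of a random-intercept logistic model. This is a simplified sketch, not the SAS implementation: it assumes one scalar random intercept u ~ N(0, σ²), treats the fixed-effect linear predictors `eta` as known, and uses hypothetical data. A brute-force trapezoidal integral is included only as a reference value.

```python
import math


def log1pexp(x):
    """Numerically stable log(1 + e^x)."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def h(u, y, eta, sigma):
    """Log-integrand: Bernoulli log-likelihood given u plus the N(0, sigma^2) log-density of u."""
    loglik = sum(yi * (e + u) - log1pexp(e + u) for yi, e in zip(y, eta))
    return loglik - u * u / (2 * sigma**2) - 0.5 * math.log(2 * math.pi * sigma**2)


def grad_hess(u, y, eta, sigma):
    """First and second derivatives of h with respect to u."""
    p = [1.0 / (1.0 + math.exp(-(e + u))) for e in eta]
    grad = sum(yi - pi for yi, pi in zip(y, p)) - u / sigma**2
    hess = -sum(pi * (1 - pi) for pi in p) - 1.0 / sigma**2
    return grad, hess


def laplace_loglik(y, eta, sigma, tol=1e-10, max_iter=50):
    """Laplace approximation to the log of the integral of exp{h(u)} over u for one cluster."""
    u = 0.0
    for _ in range(max_iter):  # Newton-Raphson search for the mode of h
        g, hess = grad_hess(u, y, eta, sigma)
        step = g / hess
        u -= step
        if abs(step) < tol:
            break
    _, hess = grad_hess(u, y, eta, sigma)
    return h(u, y, eta, sigma) + 0.5 * math.log(2 * math.pi / -hess)


def brute_force_loglik(y, eta, sigma, width=12.0, n=24001):
    """Reference value: trapezoidal rule on a wide, fine grid."""
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n - 1)
    vals = [math.exp(h(lo + k * step, y, eta, sigma)) for k in range(n)]
    return math.log(step * (sum(vals) - 0.5 * (vals[0] + vals[-1])))


# Hypothetical cluster: eight binary responses and their fixed-effect linear predictors.
y = [1, 0, 1, 1, 0, 1, 1, 0]
eta = [0.3, -0.2, 0.5, 0.1, -0.4, 0.8, 0.2, -0.1]
print(laplace_loglik(y, eta, sigma=0.5), brute_force_loglik(y, eta, sigma=0.5))
```

In a full fit, the sum of these per-cluster log-likelihoods would be maximized over the fixed effects and σ; the sketch only shows the approximation step.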

Gauss-Hermite Quadrature (GHQ)
The parameter estimators with the GHQ approach are obtained by minimizing an objective function in which the integral over the random effects is approximated by Gauss-Hermite quadrature [8].
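The quadrature idea can be sketched for a single cluster of a random-intercept logistic model with a hypothetical intercept u ~ N(0, σ²) and hypothetical data. Assumptions: the quadrature is adaptive, meaning the nodes are centred at the mode û of the log-integrand h and scaled by ŝ = (−h''(û))^(−1/2), and the nodes and weights come from NumPy's `numpy.polynomial.hermite.hermgauss`. With one node (x = 0, weight √π) the rule collapses to exp{h(û)}·√(2π)·ŝ, which is exactly the Laplace approximation; this is a general property of adaptive GHQ and is consistent with the identical one-node GHQ and Laplace results reported in this study.

```python
import math

import numpy as np


def log1pexp(x):
    """Numerically stable log(1 + e^x)."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def h(u, y, eta, sigma):
    """Log-integrand: Bernoulli log-likelihood given u plus the N(0, sigma^2) log-density of u."""
    loglik = sum(yi * (e + u) - log1pexp(e + u) for yi, e in zip(y, eta))
    return loglik - u * u / (2 * sigma**2) - 0.5 * math.log(2 * math.pi * sigma**2)


def grad_hess(u, y, eta, sigma):
    """First and second derivatives of h with respect to u."""
    p = [1.0 / (1.0 + math.exp(-(e + u))) for e in eta]
    grad = sum(yi - pi for yi, pi in zip(y, p)) - u / sigma**2
    hess = -sum(pi * (1 - pi) for pi in p) - 1.0 / sigma**2
    return grad, hess


def mode_and_scale(y, eta, sigma, tol=1e-10, max_iter=50):
    """Mode u_hat of h (Newton-Raphson) and scale s_hat = (-h''(u_hat))^(-1/2)."""
    u = 0.0
    for _ in range(max_iter):
        g, hess = grad_hess(u, y, eta, sigma)
        step = g / hess
        u -= step
        if abs(step) < tol:
            break
    _, hess = grad_hess(u, y, eta, sigma)
    return u, 1.0 / math.sqrt(-hess)


def agq_loglik(y, eta, sigma, n_nodes):
    """Adaptive Gauss-Hermite approximation to the log of the integral of exp{h(u)} du."""
    u_hat, s_hat = mode_and_scale(y, eta, sigma)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)  # weight function exp(-x^2)
    total = 0.0
    for x, w in zip(nodes, weights):
        u = u_hat + math.sqrt(2.0) * s_hat * x  # change of variables centred at the mode
        total += w * math.exp(h(u, y, eta, sigma) + x * x)
    return math.log(math.sqrt(2.0) * s_hat * total)


def laplace_loglik(y, eta, sigma):
    """Laplace approximation, for comparison with the one-node rule."""
    u_hat, s_hat = mode_and_scale(y, eta, sigma)
    return h(u_hat, y, eta, sigma) + math.log(math.sqrt(2.0 * math.pi) * s_hat)


# Hypothetical cluster data.
y = [1, 0, 1, 1, 0, 1, 1, 0]
eta = [0.3, -0.2, 0.5, 0.1, -0.4, 0.8, 0.2, -0.1]
for q in (1, 5, 25):
    print(q, agq_loglik(y, eta, 0.5, q))
print("Laplace", laplace_loglik(y, eta, 0.5))
```

Increasing the number of nodes refines the approximation; the one-node value matches the Laplace value up to floating-point rounding.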

Hypothesis test
The hypotheses for each fixed effect are [16]: H0: β_j = 0 versus H1: β_j ≠ 0, j = 1, …, 5. After the hypotheses are stated, the estimated coefficients, their standard errors, t values, and p-values are calculated and presented in Table 5.
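The t value and p-value for each coefficient follow the usual Wald construction, t = estimate / SE. The sketch below is a simplification under an explicit assumption: it refers the statistic to a standard normal distribution, whereas SAS reports p-values from a t distribution whose degrees of freedom depend on the chosen method. With 516 respondents the two are close but not identical. The coefficient names and values are hypothetical, not the results in Table 5.

```python
import math


def wald_test(estimate, se):
    """Two-sided Wald test of H0: beta_j = 0 against H1: beta_j != 0.

    Uses a standard normal reference distribution (large-sample assumption).
    """
    stat = estimate / se
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


# Hypothetical coefficients (estimate, SE), for illustration only.
hypothetical = {"X_a": (0.30, 0.10), "X_b": (0.05, 0.12)}
alpha = 0.05
for name, (est, se) in hypothetical.items():
    stat, p = wald_test(est, se)
    verdict = "reject H0" if p < alpha else "do not reject H0"
    print(f"{name}: t = {stat:.2f}, p = {p:.4f}, {verdict}")
```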

Statistic Fits
Model fit can be assessed with several criteria (see Table 6). For each criterion, a smaller value indicates a better model.
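The criteria listed in the method can all be computed from the −2 log likelihood (−2ℓ), the number of estimated parameters d, and a sample size n. The sketch below assumes the forms used in SAS PROC GLIMMIX: AIC = −2ℓ + 2d, AICC = −2ℓ + 2dn/(n − d − 1), HQIC = −2ℓ + 2d log log n, BIC = −2ℓ + d log n, and CAIC = −2ℓ + d(log n + 1). Which n SAS uses depends on the model, so n = 516 here is an assumption, and the −2ℓ values are hypothetical, not the study's results.

```python
import math


def information_criteria(minus2ll, d, n):
    """Information criteria from -2 log likelihood (forms assumed to match SAS PROC GLIMMIX).

    minus2ll: -2 log likelihood of the fitted model
    d: number of estimated parameters (fixed effects plus covariance parameters)
    n: sample size used by the criteria
    """
    return {
        "AIC": minus2ll + 2 * d,
        "AICC": minus2ll + 2 * d * n / (n - d - 1),
        "HQIC": minus2ll + 2 * d * math.log(math.log(n)),
        "BIC": minus2ll + d * math.log(n),
        "CAIC": minus2ll + d * (math.log(n) + 1),
    }


# Hypothetical comparison: full model (intercept, 5 slopes, 2 variances) vs a 3-predictor model.
full = information_criteria(minus2ll=250.0, d=8, n=516)
reduced = information_criteria(minus2ll=252.0, d=6, n=516)
for name in full:
    better = "reduced" if reduced[name] < full[name] else "full"
    print(f"{name}: full = {full[name]:.3f}, reduced = {reduced[name]:.3f} -> {better}")
```

Because every criterion adds a penalty that grows with d, a smaller model can win even when its −2ℓ is slightly larger, which is the trade-off these criteria are designed to capture.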

Modelling the triggers for interest of students to continue study in USK
Model of the research can be written as:

$$\eta_{ijk} = \log\frac{\pi_{ijk}}{1-\pi_{ijk}} = \beta_0 + \beta_1 C_{ik} + \beta_2 A_{ik} + \beta_3 E_{ik} + \beta_4 P_{ik} + \beta_5 H_{ik} + \gamma_1 S_{ik} + \gamma_2 MS_{ij(k)} + \varepsilon_{ik}$$

Note: i = 1, 2, …, n_k; j = 1, 2; k = 1, 2, 3, where i is the index of the respondent, j is the index of the major, and k is the index of the school (random effect). η_ijk is the linear predictor; it is linked to the probability π_ijk that the respondent is interested in continuing at USK through the inverse logistic function [12]. The remaining symbols are:
C_ik = cooperation score of the ith respondent at the kth school,
A_ik = action score of the ith respondent at the kth school,
E_ik = emotional attitude score of the ith respondent at the kth school,
P_ik = purpose (objectives) score of the ith respondent at the kth school,
H_ik = expectation score of the ith respondent at the kth school,
S_ik = the kth school of the ith respondent,
MS_ij(k) = major of the ith respondent at the kth school (m.1(.) = 1 for Natural Science, m.2(.) = 2 for Social Science),
ε_ik = random error for the ith respondent in the kth school.
β0 is the intercept, β1, …, β5 are the fixed-effect coefficients, and γ1 and γ2 are the random-effect coefficients.
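The linear predictor and its inverse logit can be sketched for one respondent. This is an illustration under stated assumptions: the coefficients and scores are hypothetical, and the school and major-within-school effects are entered as additive random intercepts on the logit scale, which is how S and M(S) variance components are usually specified, rather than through the γ1 S_ik and γ2 MS_ij(k) products written above.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def interest_probability(scores, beta, school_effect=0.0, major_effect=0.0):
    """Probability that one respondent is interested in continuing at USK.

    scores: dict with keys C, A, E, P, H
    beta: dict with the intercept and one slope per score
    school_effect, major_effect: realised random intercepts for the respondent's
    school and major-within-school (assumed additive on the logit scale)
    """
    eta = beta["intercept"] + sum(beta[k] * scores[k] for k in ("C", "A", "E", "P", "H"))
    return inverse_logit(eta + school_effect + major_effect)


# Hypothetical coefficients and scores, for illustration only.
beta = {"intercept": -6.0, "C": 0.05, "A": 0.3, "E": 0.25, "P": 0.02, "H": 0.2}
student = {"C": 7, "A": 8, "E": 6, "P": 7, "H": 8}
print(round(interest_probability(student, beta), 3))
print(round(interest_probability(dict(student, A=9), beta), 3))  # one point more on A
```

Because the slope for A is positive in this hypothetical setup, raising A by one point increases the predicted probability, mirroring the sign interpretation used in the results.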

RESULTS AND DISCUSSION
Outliers can strongly affect the results of a statistical analysis, although not every outlier does. Outliers can be detected with a boxplot. In this study, the data contained several outliers, but evaluation showed that they did not affect the results of the analysis. Figure 1 shows boxplots of the cleaned data. The symbols X1, X2, X3, X4, and X5 correspond to C, A, E, P, and H, respectively.
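The boxplot screening can be sketched with Tukey's fences, Q1 − 1.5·IQR and Q3 + 1.5·IQR, which standard boxplots use to mark outlying points. This is a sketch under assumptions: the scores are hypothetical, the text does not state which quartile definition was used (Python's `statistics.quantiles` with `method="inclusive"` is chosen here), and the boxplot only flags candidates, so whether a flagged point is influential must still be judged separately, for example by refitting without it.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Flag points outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)


# Hypothetical scores for one predictor; not the study's data.
scores = [7, 8, 8, 9, 9, 9, 10, 10, 11, 30]
flagged, fences = boxplot_outliers(scores)
print(flagged, fences)
```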
The optimum parameter estimates were obtained at the 14th iteration. The value of the objective function changed very little after the 11th iteration (see Table 7), which means that the objective function converged to 256.492. This can also be seen from the third column of the table, whose values are close to zero. In this case, the objective function for the Laplace approximation is given by equation (5). The Laplace and one-node Gauss-Hermite Quadrature approaches to maximum likelihood produce the same output, as seen in the iteration history, the information criteria, and the fixed- and random-effect solutions. School (S) and major (M) have no influence on student interest (Y): the estimates for S and M(S) are 0.144 and 0.03474, respectively, and are not significantly different from zero (see Table 9). This implies that the school a student attends in Banda Aceh makes no difference to the student's interest in continuing at USK, and neither does the student's major at SMAN. Three fixed effects affect the response variable (see Table 10). If the p-value is smaller than α, the variable has a significant effect on the response variable at level α. Therefore, action (A), emotion (E), and expectation (H) have significant effects at α = 0.05 (see Table 10), whereas collaboration (C) and purpose (P) have no significant effect. A, E, and H have positive effects: the higher a student's action, emotion, and expectation scores, the more likely the student is to be interested in continuing at USK. See Table 13.
Handayani's research shows that the Laplace approach performs better than quadrature with several points [13]. This differs from the present study, in which the Laplace results were the same as those of the GHQ approach with one point. The odds ratio for a coefficient β_j is computed as exp(β_j). The odds ratio for A is 1.340 (see Table 14), which means that a one-point increase in the A score multiplies the odds of being interested in continuing at USK by 1.340, with the other variables held fixed. For example, the odds of interest for a student who scores 9 on A are 1.340 times the odds for an otherwise similar student who scores 8. The odds ratios for E and H are interpreted in the same way.
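The odds-ratio arithmetic can be made concrete. The sketch uses the odds ratio for A reported in Table 14 (1.340) to recover the implied logit coefficient, the multiplier for a larger score difference, and the effect on a probability; the baseline probability of 0.5 is a hypothetical value chosen for illustration, and the coefficient is only as precise as the rounded odds ratio.

```python
import math

ODDS_RATIO_A = 1.340             # odds ratio for A reported in Table 14
BETA_A = math.log(ODDS_RATIO_A)  # implied logit coefficient (from the rounded odds ratio)


def odds_multiplier(delta, beta=BETA_A):
    """Factor by which the odds change when the score rises by `delta` points."""
    return math.exp(beta * delta)


def probability_after(p0, delta, beta=BETA_A):
    """Probability after a `delta`-point change, starting from baseline probability p0."""
    odds = p0 / (1 - p0) * odds_multiplier(delta, beta)
    return odds / (1 + odds)


print(round(BETA_A, 4))
print(round(odds_multiplier(1), 3), round(odds_multiplier(2), 4))
print(round(probability_after(0.5, 1), 3))  # hypothetical baseline p0 = 0.5
```

Starting from a hypothetical 50 percent probability, a one-point increase in A raises the probability to about 0.573, not to 1.340 × 0.5 = 0.67, which is why the odds ratio is read as a ratio of odds rather than of interest levels. A two-point difference multiplies the odds by 1.340² ≈ 1.796.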

CONCLUSION
The GLMM can be applied to identify triggers for students' interest in pursuing a bachelor's degree at USK. The response has a Bernoulli distribution; the fixed effects are cooperation (C), activities (A), emotions (E), purposes (P), and expectations (H); and the random effects are school origin (S) and major (M). The maximum likelihood estimates obtained with the Laplace approach are as good as those obtained with the one-node Gauss-Hermite Quadrature approach: the information criteria have the same values, and the other outputs are also the same. This study produced two models: a full model with seven predictors and a model containing only the three predictors that significantly affect the response variable, namely A, E, and H. These factors have positive effects on students' interest in continuing their studies at USK. The random effects of school and major are not significantly different from zero. This implies that students from different schools and majors have the same level of interest in continuing their studies at USK. The values of AIC, AICC, BIC, CAIC, and HQIC for the three-predictor model are smaller than those for