Factors that influence the recovery of TB patients using Cox proportional hazard regression

Tuberculosis (TB) is an infectious disease and one of the biggest health problems in the world, including in Indonesia. The government, through the National Tuberculosis Control program, has made various efforts to control tuberculosis. However, the problem has been exacerbated by a dramatic increase in the incidence of tuberculosis. This study aimed to determine the Cox proportional hazard regression model of the cure rate of TB patients and the factors that affect it. We used medical record data of TB inpatients at dr. Zainoel Abidin Hospital for the period July-December 2017. The results showed that, at α = 0.1, the factors that influenced the recovery of TB patients were type of cough, coughing up blood, and night sweats. Of the patients, 33.93% were not working; this category included students, domestic helpers, and people who were not working when they developed tuberculosis and were treated at dr. Zainoel Abidin Hospital. The hazard ratios (failure ratios) showed that the cure rate for TB patients without cough symptoms was 70% greater than for patients with a cough with phlegm. The cure rate for TB patients who coughed up blood was 53% greater than for patients without this symptom, and the cure rate for TB patients with night sweats was 54% greater than for patients without night sweats.


INTRODUCTION
The World Health Organization (WHO) has designated Indonesia a "high burden country" for tuberculosis (TB). The government, through the National Tuberculosis Control program, has made various efforts to tackle tuberculosis. As health institutions in direct contact with the community, public hospitals also play a role in tackling TB cases. The medical records of TB patients held by hospitals are a good source of information for identifying factors related to the recovery rate of TB. However, inadequate and inconsistent treatment regimens are creating a persistent pool of sputum-positive cases.
Several studies have investigated factors that influence the recovery rate of TB patients. Budiarti and Astutik found that patient age affects the recovery rate of pulmonary tuberculosis [1]. Another study found that gender, shortness of breath, and chest pain affect the recovery rate of pulmonary tuberculosis patients [2]. The present study used a larger set of variables, namely age, gender, shortness of breath, chest pain, smoking habits, educational history, type of work, cough symptoms, coughing up blood, night sweats, and history of tuberculosis, to obtain a Cox proportional hazard regression model and to identify the factors that influence the recovery rate of TB patients.
Survival analysis is the term used to describe the analysis of data in the form of times from a well-defined time origin until the occurrence of some particular event or end-point. In medical research, the time origin often corresponds to the recruitment of an individual into an experimental study, such as a clinical trial comparing two or more treatments [2]. Survival analysis is an effective way of identifying and evaluating disease-related patterns and estimating the likelihood of death due to specific health conditions, including TB. It has also been widely used to model patients' recovery rates. Because a patient's recovery rate is affected by several risk factors, it can be modeled using Cox proportional hazard regression, one of the most popular methods in survival analysis [3].

*Corresponding Author: zurnila@unsyiah.ac.id

Tuberculosis (TB)
Tuberculosis (TB) is a directly transmitted infectious disease caused by the bacterium Mycobacterium tuberculosis, which mostly (80%) attacks the lungs but can also affect other organs [4]. Mycobacterium tuberculosis was first discovered by Robert Koch in 1882 [5]. It is a rod-shaped, Gram-positive bacillus whose cell wall contains lipid-glycolipid complexes and waxes that are difficult for chemicals to penetrate. Because the bacterium resists decolorization by acid during staining, it is called an acid-fast bacillus (AFB), and this property is used to identify it microscopically in sputum. Mycobacterium tuberculosis dies quickly when exposed to direct sunlight but can survive for several years in a dark and humid place [6].

METHODOLOGY
The data were medical records of pulmonary tuberculosis inpatients at dr. Zainoel Abidin Hospital Banda Aceh for the period July to December 2017. The data set comprised 112 patients, of whom 15 were censored and 97 were uncensored. A limitation of the data set was that the medical records did not describe each recorded variable in detail. For example, for the smoking history variable, only whether the patient was an active smoker was known; no information was available on the type of cigarette, the number of cigarettes smoked per day, or how long the patient had smoked.
Data analysis and modeling were carried out in R i386 3.4.2. The stages of analysis were:
1. A descriptive analysis was performed to obtain an overview of the characteristics of the TB patient data.
2. An initial Cox proportional hazard regression model of the cure rate for TB patients was determined and then refined until the best model was identified, using the methods below.

Survival Function
According to Lee and Wang (2003), the survival function is the probability that an individual survives beyond a certain time t, that is, does not experience the event of interest before t [7]. It is denoted by S(t) and formulated as

$$S(t) = P(\text{surviving beyond time } t) = P(T > t),$$

where T is the survival time. If T is the time to death, S(t) is the probability that a subject is still alive at time t.
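The text does not state how S(t) was estimated from the hospital records. One standard nonparametric estimator for right-censored data is the Kaplan-Meier (product-limit) estimator; the sketch below applies it to a small hypothetical data set, where an event value of 1 would correspond to recovery in this study and 0 to a censored observation. The function name and the data are illustrative assumptions.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(T > t) for right-censored data.

    times  : observed times (e.g. days in hospital)
    events : 1 if the event of interest was observed, 0 if censored
    Returns (t, S(t)) pairs at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    s = 1.0
    curve = []
    for t in sorted(event_counts):
        at_risk = sum(1 for u in times if u >= t)  # still observed just before t
        s *= 1.0 - event_counts[t] / at_risk
        curve.append((t, s))
    return curve

# Hypothetical toy data for six patients; two observations are censored.
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

For these toy values the estimates are S(2) = 0.833, S(3) = 0.667, S(5) = 0.444 and S(8) = 0.000; the patient censored on day 3 still counts as at risk on day 3.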

Hazard Function
The hazard function, also known as the failure function or hazard rate, is denoted by h(t). According to [7], it is the rate at which an individual fails in the interval from t to t + ∆t, given that the individual has survived until time t, in the limit as ∆t becomes small. Mathematically, the hazard function is

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}.$$
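As a numerical check of the limit definition, the sketch below approximates h(t) by [S(t) − S(t + ∆t)] / [∆t S(t)], which equals P(t ≤ T < t + ∆t | T ≥ t)/∆t for a continuous survival time, and compares it with the closed-form hazard of a Weibull distribution. The Weibull choice and its parameters are illustrative assumptions, not part of the study.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)**shape) for a Weibull survival time."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """Closed-form Weibull hazard (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard_numeric(survival, t, dt=1e-6):
    """Approximate h(t) = P(t <= T < t + dt | T >= t) / dt for a small dt."""
    s_t = survival(t)
    return (s_t - survival(t + dt)) / (dt * s_t)

shape, scale = 1.5, 10.0  # hypothetical parameters, time in days
for t in (1.0, 5.0, 10.0):
    approx = hazard_numeric(lambda u: weibull_survival(u, shape, scale), t)
    exact = weibull_hazard(t, shape, scale)
    print(f"t = {t}: numeric {approx:.6f}, closed form {exact:.6f}")
```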

Cox Proportional Hazard Regression Model
According to [8], the Cox proportional hazard regression model is used to determine the relationship between the survival time of an individual (the dependent variable) and the independent variables. The general form of the Cox proportional hazard regression model is

$$h_i(t) = h_0(t)\exp\left(\beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi}\right),$$

where
$h_i(t)$ = hazard function (failure function) of the i-th individual,
$h_0(t)$ = baseline hazard function,
$X_{ji}$ = value of the j-th variable for the i-th individual, with j = 1, 2, ..., p and i = 1, 2, ..., n,
$\beta_j$ = the j-th regression coefficient, with j = 1, 2, ..., p.
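The sketch below evaluates the model for two individuals using a hypothetical baseline hazard h0(t) and hypothetical coefficients, to show what "proportional" means: the ratio of their hazards does not depend on t, because h0(t) cancels.

```python
import math

def relative_hazard(beta, x):
    """exp(beta_1*x_1 + ... + beta_p*x_p), the factor that multiplies h0(t)."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))

def cox_hazard(t, baseline, beta, x):
    """h_i(t) = h0(t) * exp(beta' x_i) for one individual with covariates x."""
    return baseline(t) * relative_hazard(beta, x)

def baseline(t):
    """Hypothetical baseline hazard h0(t); any positive function would do."""
    return 0.05 + 0.01 * t

beta = [0.4, -0.2]         # hypothetical coefficients for two covariates
x_a, x_b = [1, 0], [0, 1]  # covariate values of two hypothetical individuals

for t in (1.0, 5.0, 20.0):
    ratio = cox_hazard(t, baseline, beta, x_a) / cox_hazard(t, baseline, beta, x_b)
    print(f"t = {t:>4}: h_a(t) / h_b(t) = {ratio:.4f}")
```

Each line prints the same ratio, exp(0.4 + 0.2) = exp(0.6) ≈ 1.8221, whatever the shape of h0(t).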

Factors that influence the recovery of TB patients using Cox proportional hazard regression (Zurnila Marli Kesuma, Hizir, Latifah Rahayu, Wardatul Jannah)

Parameter Estimation
To determine the model, the coefficients $\beta_1, \beta_2, \ldots, \beta_p$ of the variables $X_1, X_2, \ldots, X_p$ are needed. These coefficients can be estimated by the maximum (partial) likelihood method. Suppose there are n individuals, of whom r are uncensored and n − r are censored, and denote the r ordered failure times by $t_{(1)} < t_{(2)} < \cdots < t_{(r)}$, so that $t_{(j)}$ is the j-th ordered failure time. According to Cox (1972) in [8], the likelihood function for the Cox proportional hazard regression model is

$$L(\boldsymbol{\beta}) = \prod_{j=1}^{r} \frac{\exp\left(\boldsymbol{\beta}^{\top}\mathbf{x}_{(j)}\right)}{\sum_{l \in R(t_{(j)})} \exp\left(\boldsymbol{\beta}^{\top}\mathbf{x}_{l}\right)},$$

where $\mathbf{x}_{(j)}$ is the covariate vector of the individual who fails at time $t_{(j)}$ and $R(t_{(j)})$ is the set of individuals at risk of failing at time $t_{(j)}$.
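The partial likelihood above has no closed-form maximizer, so it is maximized numerically. The sketch below does this for a single covariate with Newton-Raphson iterations, halving the step whenever the log-likelihood would decrease, and assumes no tied failure times as in the text. The data are hypothetical, and the solver is an illustration of the technique, not the routine used in the study.

```python
import math

def _risk_sums(beta, times, x, t_j):
    """Sums over the risk set R(t_j) = {l : t_l >= t_j} of w_l, w_l*x_l, w_l*x_l**2."""
    s0 = s1 = s2 = 0.0
    for t_l, x_l in zip(times, x):
        if t_l >= t_j:
            w = math.exp(beta * x_l)
            s0 += w
            s1 += w * x_l
            s2 += w * x_l * x_l
    return s0, s1, s2

def log_partial_likelihood(beta, times, events, x):
    """ln L(beta) = sum over failures j of [beta*x_(j) - ln sum_{l in R(t_(j))} exp(beta*x_l)]."""
    ll = 0.0
    for t_j, e_j, x_j in zip(times, events, x):
        if e_j == 1:
            s0, _, _ = _risk_sums(beta, times, x, t_j)
            ll += beta * x_j - math.log(s0)
    return ll

def score_and_information(beta, times, events, x):
    """Score U = d lnL/d beta and information I = -d^2 lnL/d beta^2."""
    u = info = 0.0
    for t_j, e_j, x_j in zip(times, events, x):
        if e_j == 1:
            s0, s1, s2 = _risk_sums(beta, times, x, t_j)
            mean = s1 / s0
            u += x_j - mean
            info += s2 / s0 - mean ** 2
    return u, info

def fit_cox_1d(times, events, x, tol=1e-10, max_iter=100):
    """Maximize ln L(beta) by Newton-Raphson, halving the step if ln L would drop."""
    beta = 0.0
    ll = log_partial_likelihood(beta, times, events, x)
    for _ in range(max_iter):
        u, info = score_and_information(beta, times, events, x)
        step = u / info
        new_ll = log_partial_likelihood(beta + step, times, events, x)
        while new_ll < ll and abs(step) > tol:
            step /= 2.0
            new_ll = log_partial_likelihood(beta + step, times, events, x)
        beta, ll = beta + step, new_ll
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: time, event indicator (1 = event, 0 = censored), binary covariate.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]
beta_hat = fit_cox_1d(times, events, x)
print(f"beta_hat = {beta_hat:.4f}, HR = {math.exp(beta_hat):.4f}")
```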

Simultaneous Testing
Simultaneous testing can be done with the partial likelihood ratio test, whose statistic is denoted by G [9]:

$$G = -2\left[\ln L(\mathbf{0}) - \ln L(\hat{\boldsymbol{\beta}})\right].$$

The null hypothesis is that the independent variables have no effect on survival time ($H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$), and it is rejected if $G > \chi^2_{\alpha;p}$ or p-value < α.
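The p-value of G requires the upper tail of the chi-square distribution. The sketch below computes it with only the Python standard library, using the series expansion of the regularized lower incomplete gamma function. The degrees of freedom are not reported in the text; taking df = 4 for model VIII (assumed here to be two indicator variables for type of cough plus X7 and X8) is an assumption, and under it the reported G = 12.3 gives a p-value of about 0.0153, consistent with the reported 0.015.

```python
import math

def chi2_sf(x, df, max_terms=10_000, rel_tol=1e-15):
    """P(X > x) for X ~ chi-square(df).

    Uses the regularized lower incomplete gamma function
    P(a, z) = z**a * exp(-z) / Gamma(a) * sum_{n>=0} z**n / (a (a+1) ... (a+n)),
    with a = df/2 and z = x/2, and returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < rel_tol * total:
            break
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))

def likelihood_ratio_test(loglik_null, loglik_model, df):
    """G = -2[ln L(0) - ln L(beta_hat)] and its chi-square(df) p-value."""
    g = -2.0 * (loglik_null - loglik_model)
    return g, chi2_sf(g, df)

# Reported statistic for model VIII; df = 4 is an assumption (see lead-in).
print(f"p-value for G = 12.3 with df = 4: {chi2_sf(12.3, 4):.4f}")
```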

Partial Testing
Partial testing can be done with the Wald test, whose statistic is denoted by Z [9]:

$$Z = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}.$$

The null hypothesis is that the j-th independent variable has no effect on survival time ($H_0: \beta_j = 0$ for $j = 1, 2, \ldots, p$), and it is rejected if $|Z| > Z_{\alpha/2}$ or p-value < α.
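The Wald test turns each coefficient into a standard-normal statistic. The sketch below computes Z and its two-sided p-value with the standard library's complementary error function; at α = 0.1 the rule |Z| > Z_{α/2} is |Z| > 1.645. The standard error used in the example is hypothetical, since the standard errors from Table 6 are not reproduced in the text.

```python
import math

def wald_test(beta_hat, se, alpha=0.1):
    """Wald statistic Z = beta_hat / SE(beta_hat) with a two-sided normal p-value."""
    z = beta_hat / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value, p_value < alpha

# 0.528 is the no-cough coefficient reported for model VIII;
# the standard error 0.25 is hypothetical.
z, p, reject = wald_test(0.528, 0.25)
print(f"Z = {z:.3f}, p = {p:.4f}, reject H0 at alpha = 0.1: {reject}")
```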

Hazard Ratio
According to [9], the hazard ratio (failure ratio) is the hazard for one group of individuals divided by the hazard for another group. It can be expressed as

$$\widehat{HR} = \frac{\hat{h}(t, X^{*})}{\hat{h}(t, X)} = \exp\left[\hat{\beta}\left(X^{*} - X\right)\right],$$

where $X^{*}$ is the value of the independent variable for one group of individuals, $X$ is its value for the other group, and $\hat{\beta}$ is the estimated regression coefficient.

Table 2 shows that the average length of stay of TB patients treated at dr. Zainoel Abidin Hospital was 6 days, with a minimum of 1 day and a maximum of 23 days. The average age of these patients was 46.57 years; the youngest was 18 and the oldest 83 years old.

Cox Proportional Hazard Regression Model
The Cox proportional hazard model was used to examine the effect of the independent variables on the dependent variable, the length of time patients were treated at dr. Zainoel Abidin Hospital. Each observation is either censored or uncensored. In this study, an observation was censored when the patient was discharged not because of recovery but because of death, since death was not the event of interest. The following equation is the Cox proportional hazard regression base model of the recovery rate of TB patients at dr. Zainoel Abidin Hospital.

Testing the Proportional Hazard Assumption for the Base Model
Based on the base model in equation (1), the proportional hazard assumption was checked by calculating Schoenfeld residuals for each individual on each independent variable. The correlations between the Schoenfeld residuals and the rank of survival time for each independent variable are given in Table 3.
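To make this check concrete, the sketch below computes Schoenfeld residuals for one covariate under a given coefficient, then the Pearson correlation between the residuals and the rank of the failure times. The data and coefficient are hypothetical, and the t statistic with n − 2 degrees of freedom used to judge the correlation is an assumption; the text does not state which test produced the p-values in Table 3.

```python
import math

def schoenfeld_residuals(beta, times, events, x):
    """Schoenfeld residuals for one covariate, as (t_i, residual) pairs.

    For each uncensored individual i: x_i minus the exp(beta * x)-weighted
    mean of x over the risk set R(t_i) = {l : t_l >= t_i}.
    """
    out = []
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i == 1:
            risk = [x_l for t_l, x_l in zip(times, x) if t_l >= t_i]
            w = [math.exp(beta * x_l) for x_l in risk]
            weighted_mean = sum(w_l * x_l for w_l, x_l in zip(w, risk)) / sum(w)
            out.append((t_i, x_i - weighted_mean))
    return out

def ranks(values):
    """Ranks 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical data: time, event indicator (1 = event, 0 = censored), binary covariate.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]
beta = 0.78  # hypothetical fitted coefficient

res = schoenfeld_residuals(beta, times, events, x)
r = pearson(ranks([t for t, _ in res]), [s for _, s in res])
n = len(res)
t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))  # assumed test: t with n - 2 df
print(f"r = {r:.3f}, t = {t_stat:.3f} on {n - 2} df")
```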
From Table 3, the correlation test between survival time and Schoenfeld residuals gave a p-value below α = 0.05 for the junior high school category of the educational background variable (X4), with a p-value of 0.028. Under the null hypothesis of no correlation between survival time and Schoenfeld residuals, there was enough evidence to reject H0. This meant that the proportional hazard assumption was not fulfilled for educational background (X4), whereas it was fulfilled for variables X1, X2, X3, X5, X6, X7, X8, X9, X10, and X11. In the next stage, the educational background variable (X4) was therefore removed and no longer used in modeling.

Table 4 presents the Cox proportional hazard regression model for the recovery rate of TB patients after removing the educational background variable (X4). To ascertain whether the factors in the study jointly affected the cure rate for TB patients at dr. Zainoel Abidin Hospital, a simultaneous test was carried out. The partial likelihood ratio statistic was G = 20.7 with a p-value of 0.0798. With α = 0.1, the p-value < α, so H0 was rejected. This meant that the factors in the study jointly influenced the cure rate for TB patients at dr. Zainoel Abidin Hospital.

Cox Proportional Hazard Regression Model after Removing a Variable
Based on the Wald (Z) values in Table 4, the independent variables with p-values below 0.1 were type of cough (X6) in the no-cough category, coughing up blood (X7), and night sweats (X8). With α = 0.1, H0 was rejected for these variables. Thus, the variables with a significant effect on the recovery rate of TB patients at dr. Zainoel Abidin Hospital were type of cough (X6), coughing up blood (X7), and night sweats (X8).

Testing the Proportional Hazard Assumption after Removing a Variable
After the educational background variable (X4) was removed, the proportional hazard assumption was tested again on the model. All p-values were greater than α = 0.05, so all variables met the proportional hazard assumption and no further variables needed to be removed. In particular, the three significant variables, type of cough (X6), coughing up blood (X7), and night sweats (X8), fulfilled the proportional hazard assumption, meaning that their hazard ratios were constant over time.

Selection of the Best Model
The best Cox proportional hazard regression model for the recovery rate of TB patients at dr. Zainoel Abidin Hospital was selected by backward elimination, with the smallest Akaike information criterion (AIC) value indicating the best model (Table 5). The smallest AIC value, 709.67, belonged to model VIII. Therefore, model VIII was the best model; it contained three independent variables: type of cough (X6), coughing up blood (X7), and night sweats (X8).
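Backward elimination by AIC can be sketched as follows, with AIC = −2 ln L + 2k, where ln L is the maximized log partial likelihood and k the number of coefficients (a categorical variable contributes one coefficient per indicator). The fitting routine is passed in as a function; the log-likelihood values in the example are made up so that the loop can be followed by hand, and they are not the values behind Table 5.

```python
def aic(loglik, k):
    """AIC = -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k

def backward_elimination(variables, n_coef, fit_loglik):
    """Repeatedly drop the variable whose removal gives the lowest AIC,
    stopping when no removal lowers the AIC of the current model.

    n_coef[v]      : number of coefficients for variable v
    fit_loglik(vs) : maximized log partial likelihood of a model with variables vs
    """
    def model_aic(vs):
        return aic(fit_loglik(vs), sum(n_coef[v] for v in vs))

    current = list(variables)
    best = model_aic(current)
    while len(current) > 1:  # this sketch never considers the empty model
        candidates = [[v for v in current if v != drop] for drop in current]
        cand_aic, cand = min(((model_aic(c), c) for c in candidates), key=lambda s: s[0])
        if cand_aic >= best:
            break
        current, best = cand, cand_aic
    return current, best

# Hypothetical inputs: each variable adds a fixed amount to ln L.
gain = {"X1": 0.3, "X2": 0.2, "X6": 4.0, "X7": 2.5, "X8": 2.6}
n_coef = {"X1": 1, "X2": 1, "X6": 2, "X7": 1, "X8": 1}

def toy_loglik(vs):
    return -360.0 + sum(gain[v] for v in vs)

selected, best_aic = backward_elimination(["X1", "X2", "X6", "X7", "X8"], n_coef, toy_loglik)
print(selected, round(best_aic, 2))
```

With these made-up values the loop removes X2, then X1, and stops at ['X6', 'X7', 'X8'] with an AIC of 709.8.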
Since the proportional hazard assumption had been fulfilled, we then tested the significance of the parameters. The partial likelihood ratio statistic G = 12.3, with p-value = 0.015, indicated that the variables in model VIII jointly influenced the recovery rate of TB patients at dr. Zainoel Abidin Hospital. The partial tests for model VIII are presented in Table 6. Based on the Wald (Z) values in Table 6, the independent variables with p-values below 0.1 were type of cough (X6) in the no-cough category, coughing up blood (X7), and night sweats (X8). With α = 0.1, H0 was rejected for these variables, which meant that they had a significant effect on the recovery rate of TB patients at dr. Zainoel Abidin Hospital.
From the model, the hazard ratio for recovery of TB patients without cough symptoms was $e^{0.528} \approx 1.70$. This meant that patients without cough symptoms had a 70% higher recovery rate than patients with a cough with phlegm.
For coughing up blood, the hazard ratio for recovery was $e^{0.425} \approx 1.53$. This meant that patients who coughed up blood had a 53% higher recovery rate than patients who did not.
For night sweats, the hazard ratio for recovery was $e^{0.432} \approx 1.54$. This meant that patients with night sweats had a 54% higher recovery rate than patients without night sweats.
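These hazard ratios follow from HR = exp[β̂(X* − X)] with X* − X = 1, assuming each category is coded as a 0/1 indicator. The sketch below reproduces them from the three coefficients reported for model VIII.

```python
import math

# Coefficients reported for model VIII in the text.
coefficients = {
    "no cough vs. cough with phlegm (X6)": 0.528,
    "coughing up blood vs. none (X7)": 0.425,
    "night sweats vs. none (X8)": 0.432,
}

for label, beta in coefficients.items():
    hr = math.exp(beta)          # HR = exp(beta * (x* - x)) with x* - x = 1
    change = 100.0 * (hr - 1.0)  # percentage difference in the recovery rate
    print(f"{label}: HR = {hr:.2f}, {change:.0f}% higher recovery rate")
```

This prints HR = 1.70, 1.53 and 1.54, that is, 70%, 53% and 54% higher recovery rates.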
Zainoel Abidin Hospital Banda Aceh should therefore pay more attention to TB patients who do not have symptoms of coughing up blood or night sweats in order to optimize the recovery rate. In addition, medical records for TB patients should be made more detailed and complete so that they provide adequate information.

CONCLUSION
Based on the results of the data analysis, the factors that affected the recovery rate of TB patients at dr. Zainoel Abidin Hospital in the period July-December 2017 were type of cough (specifically the no-cough category), coughing up blood, and night sweats.