APPLICATION OF COX PROPORTIONAL HAZARDS MODEL IN TIME TO EVENT ANALYSIS OF HIV / AIDS PATIENTS

The Human Immunodeficiency Virus (HIV) and Acquired Immunodeficiency Syndrome (AIDS) remain a public health crisis that has contributed to a large share of the deaths recorded in the past decade in Nigeria and other countries of the world, and the virus has become drug-resistant in some patients. This study aimed to estimate the effects of covariates on the survival time of HIV/AIDS patients using the Cox proportional hazards (PH) model. The Kaplan-Meier (KM) results indicated that 91 patients were male, of whom 31 experienced the event of interest and 60 (65.9%) were censored, while 209 were female, of whom 65 died due to AIDS and 144 (68.9%) were censored. The results of the Cox PH model indicated that sex, age, and patients' health state are positively associated with death due to AIDS, and hence with shorter survival for HIV/AIDS patients, with HR (1.149, 1.235, 1.887, and 1.306) respectively. The study concluded that CD4 cell count is the only covariate that showed a lower risk of death due to AIDS: patients with high CD4 cell counts have a lower risk of death due to AIDS and a longer survival time when other factors are considered. The study therefore recommends that survival analysis should be used to assess the various risk factors and the confounding effects associated with them, and stresses that patients' lifestyles should be improved so that they remain healthy as they age.


INTRODUCTION
The Human Immunodeficiency Virus (HIV) and Acquired Immunodeficiency Syndrome (AIDS) remain one of the greatest public health challenges facing Nigeria and the entire world, as the virus has become drug-resistant in some patients (Agada et al., 2019). The Joint United Nations Programme on HIV/AIDS (UNAIDS, 2014) stated that an estimated 36.9 million people were living with HIV/AIDS worldwide, of whom 3.2 million were estimated to be in Nigeria, the second largest burden in the world after South Africa.
HIV/AIDS also reduces the rate of growth, particularly in countries seriously affected by the disease (Agada et al., 2019). Hence, modelling the time until a patient dies from AIDS or an HIV-related cause after a confirmed positive result is of paramount importance to policymakers, government, medical researchers, and public health practitioners. This is because the only available therapeutic option for the treatment of HIV is antiretroviral therapy (ART).
Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs (Kleinbaum and Klein, 2012). The time variable in survival analysis is usually referred to as the survival time because it gives the time that an individual has survived over some follow-up period, here the period after contracting the virus. The event is usually referred to as a failure because the event of interest is typically death, disease, discharge, relapse, incidence, or another negative individual experience (Kleinbaum and Klein, 2012). The modelling of time to event data has many applications in diverse areas including, but not limited to, medicine, sociology, marketing, and economics (Emmert-Streib and Dehmer, 2019).
To analyse time to event data, researchers need survival techniques such as the Kaplan-Meier estimator, the log-rank test, the Cox proportional hazards (PH) model, and parametric models such as the exponential and Weibull models (Khanal et al., 2019). The Cox PH model is one of the most widely used regression models for the analysis of time to event data because the baseline hazard function embedded in the model is not required to follow any known probability distribution (Khanal et al., 2019). This implies that no probability distribution has to be specified in order to quantify the effects of the independent variables. However, the application of the Cox PH model to epidemiological studies frequently involves heterogeneous and time-varying exposures, many years of follow-up, and confounding by numerous measured and unmeasured risk factors (Moolgavkar et al., 2018).
The hallmark of HIV infection is the progressive depletion of a class of lymphocytes called CD4+ cells, which play a very important regulatory role in the immune response to infections and tumours (Dessie, 2014). The state of an HIV patient's immune system is reflected in the CD4 cell count in his/her bloodstream, which in turn provides a marker for classifying the clinical stages of HIV patients. As strategies to reduce the burden of HIV-associated tuberculosis (TB), both intensified case finding (ICF) and antiretroviral therapy were applied to HIV-positive individuals with CD4 counts > 350 cells/μL in South Africa (Kufa et al., 2016). The results showed that 80.9% were female and 57.9% were on ART, with a median CD4 count of 562 cells/μL. The study concluded that TB incidence was high and associated with low body mass index (BMI), and that intensified case finding for TB should be strengthened for all HIV-positive individuals regardless of their CD4 counts or ART status.
In describing the effects of ART on the risk of severe bacterial infections in people with high CD4 cell counts, Lancet (2017) stated that immediate ART reduces the risk of severe bacterial infections in HIV-positive people with high CD4 cell counts.

MATERIALS AND METHODS
Data:
The study was a retrospective cohort study based on secondary data collected from patients' follow-up records. A total of 300 HIV/AIDS patients receiving treatment every six months at the HIV Counselling and Testing (HCT) unit of the Federal Medical Centre (FMC), Makurdi, were followed from January 2013 to December 2018.

Variables:
The primary outcome variable of interest was survival time after a confirmed diagnosis of HIV. Because subjects entered the study at different times over the study period, the maximum follow-up time differs for each participant. The response variable is time: the follow-up time is the number of months between the entry date and the end date. The other recorded variables are the age of subjects at the start of follow-up (in years) and Censor, which indicates vital status at the end of the study (1 = death due to AIDS, 0 = lost to follow-up or alive). The predictor variables assumed to influence the survival of HIV patients in the model are sex, age, CD4 count, and patients' health state.
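As an illustration of how the response variable could be derived from follow-up records, the following minimal Python sketch computes follow-up time in whole months and pairs it with the censor indicator. The dates, the `months_between` helper, and the rule of dropping partial months are assumptions made for illustration only; the study does not state how partial months were handled.

```python
from datetime import date


def months_between(entry: date, end: date) -> int:
    """Whole months elapsed from entry to end (partial months are dropped)."""
    months = (end.year - entry.year) * 12 + (end.month - entry.month)
    if end.day < entry.day:
        months -= 1
    return months


# Hypothetical records: (entry date, end date, censor), where censor = 1 means
# death due to AIDS and censor = 0 means alive or lost to follow-up.
records = [
    (date(2013, 1, 15), date(2018, 12, 31), 0),  # alive at the end of the study
    (date(2014, 6, 1), date(2016, 3, 20), 1),    # died during follow-up
    (date(2016, 9, 10), date(2017, 9, 9), 0),    # lost to follow-up
]

follow_up = [(months_between(entry, end), censor) for entry, end, censor in records]
print(follow_up)  # [(71, 0), (21, 1), (11, 0)]
```

Each pair (time in months, censor) is the minimal input required by the Kaplan-Meier estimator and the Cox PH model.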

Methods of Data Analysis
Survival analysis: Since the focus of this study was time to event (time to death due to AIDS), the appropriate method for the study was survival analysis. The study employed both the Kaplan-Meier (KM) estimator and the Cox Proportional Hazards (PH) model for the analysis. The log-rank test was used to compare survival functions. In this study, KM was used to study the survival pattern of HIV/AIDS patients, using KM plots to indicate the shape of the survival distribution. The survival function $S(t)$ gives the probability that a patient survives longer than some specified time $t$; that is, $S(t)$ gives the probability that the random variable $T$ exceeds the specified time $t$ (Kleinbaum and Klein, 2012). It is defined in equation (1) below.
$$S(t) = P(T > t) \qquad (1)$$
Kleinbaum and Klein (2012) further stated that the survival function is fundamental to survival analysis because obtaining survival probabilities for different values of $t$ provides unique summary information from survival data.

The Kaplan Meier (KM) Estimator
The KM method, according to Stevenson (2007), is based on individual survival times and assumes that censoring is independent of survival time (that is, the reason an observation is censored is unrelated to the cause of failure). The KM estimator of survival at time $t$ is shown in equation (2).
$$\hat{S}(t) = \prod_{j:\, t_j \le t} \frac{r_j - d_j}{r_j}, \qquad 0 \le t \le t^{+} \qquad (2)$$
In the equation, $t_j$, $j = 1, 2, \dots, n$, is the total set of failure times recorded (with $t^{+}$ as the maximum failure time), $d_j$ is the number of failures at time $t_j$, and $r_j$ is the number of individuals at risk at time $t_j$.
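To make the product-limit calculation concrete, here is a minimal Python sketch of the KM estimator. It is not the SPSS procedure used in the study: the function name `kaplan_meier` and the eight toy observations are hypothetical, and subjects censored at a failure time are assumed to be still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(t_j, S(t_j))] for each distinct failure time t_j.

    times  -- follow-up time of each subject
    events -- 1 if the subject failed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)          # number of individuals at risk
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0               # number of failures at time t
        removed = 0              # failures plus censored subjects leaving at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical follow-up times in months; 1 = death, 0 = censored.
times = [6, 12, 12, 18, 24, 30, 36, 36]
events = [1, 1, 0, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:2d} months, S(t) = {s:.3f}")
```

The estimate only drops at failure times; censored subjects reduce the number at risk without changing the curve at the time they leave.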

Log-Rank Test
A log-rank test was employed to evaluate whether the KM curves for two or more groups are statistically equivalent. The log-rank test is a chi-square test that provides an overall comparison of the KM curves being compared (Kleinbaum and Klein, 2012). They further stated that the log-rank test makes use of observed versus expected cell counts over categories of outcomes.
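The observed-versus-expected idea can be sketched for the two-group case as follows. This is an illustrative Python implementation under stated assumptions, not the SPSS routine used in the study: the function name `logrank_test` and the toy data are hypothetical, groups are coded 0 and 1, and the p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    rows = list(zip(times, events, groups))
    failure_times = sorted({t for t, e, _ in rows if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for tj in failure_times:
        n = sum(1 for t, _, _ in rows if t >= tj)               # at risk, both groups
        n1 = sum(1 for t, _, g in rows if t >= tj and g == 1)   # at risk, group 1
        d = sum(e for t, e, _ in rows if t == tj)               # failures at tj
        d1 = sum(e for t, e, g in rows if t == tj and g == 1)   # failures in group 1
        obs_minus_exp += d1 - d * n1 / n                        # observed minus expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical data: follow-up in months, 1 = death, group 0 versus group 1.
times = [6, 12, 18, 24, 20, 30, 36, 36]
events = [1, 1, 1, 0, 1, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

chi2, p_value = logrank_test(times, events, groups)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

A small p-value would suggest that the two survival curves differ; with more than two groups the statistic generalises to a chi-square with (number of groups minus 1) degrees of freedom, which this sketch does not cover.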

The Cox Proportional Hazards (CPH) Model
The Cox PH model is considered one of the most popular models used for analysing survival data. It is a semi-parametric model that makes a parametric assumption about the effect of the predictors on the hazard function but makes no assumption regarding the nature of the hazard function $h(t)$ itself (Harrell, 2001). The Cox PH model assumes that predictors act multiplicatively on the hazard function but does not assume that the hazard function is constant (Harrell, 2001; Stel et al., 2011). Hence, the regression portion of the model is fully parametric; that is, the regressors are linearly related to the log hazard or log cumulative hazard.
However, in many situations the form of the true hazard function is either unknown or complex, which makes the Cox PH model advantageous. Also, one is usually more interested in the effect of the predictors than in the shape of $h(t)$, and the Cox PH model allows the researcher to essentially ignore $h(t)$, which is often not of primary interest. The Cox PH model is usually stated in terms of the hazard function as shown in equation (3).
$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p) \qquad (3)$$
where $h(t, X)$ is the expected hazard at time $t$, $\beta_1, \beta_2, \dots, \beta_p$ are regression coefficients, and $h_0(t)$ is the baseline hazard, which represents the hazard when all the predictors (independent variables $X_1, X_2, \dots, X_p$) are equal to zero. Note that the predicted hazard $h(t, X)$, or the rate of suffering the event of interest in the next instant, is the product of the baseline hazard $h_0(t)$ and the exponential function of the linear combination of the predictors. Thus, the predictors have a multiplicative or proportional effect on the predicted hazard. The formula for the Cox PH model can therefore be written compactly as
$$h(t, X) = h_0(t)\exp\left(\sum_{i=1}^{p} \beta_i X_i\right)$$
where $\beta_i$ represents the regression coefficient of the $i$-th predictor.
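Because the baseline hazard cancels out of the partial likelihood, the regression coefficients can be estimated without specifying it. The following Python sketch illustrates this for a single covariate using Newton-Raphson iteration. It is a simplified illustration, not the SPSS algorithm used in the study: the function names, the toy data, and the use of the Breslow approximation for tied failure times are all assumptions.

```python
import math


def score_and_info(beta, times, events, x):
    """Score (first derivative) and information (negative second derivative)
    of the Cox partial log-likelihood for one covariate."""
    score = 0.0
    info = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [k for k, t_k in enumerate(times) if t_k >= t_i]
        w = [math.exp(beta * x[k]) for k in risk]
        s0 = sum(w)
        s1 = sum(w_k * x[k] for w_k, k in zip(w, risk))
        s2 = sum(w_k * x[k] ** 2 for w_k, k in zip(w, risk))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Return (beta, standard error) maximising the partial likelihood."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: months of follow-up, 1 = death, x = binary covariate.
times = [2, 4, 6, 8, 10, 12, 14, 16]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]

beta, se = fit_cox(times, events, x)
print(f"beta = {beta:.3f}, SE = {se:.3f}, HR = {math.exp(beta):.3f}")
```

The hazard ratio is obtained as exp(beta): a positive coefficient gives HR > 1 (higher hazard), and a negative coefficient gives HR < 1 (lower hazard).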

RESULTS AND DISCUSSION
The Statistical Package for the Social Sciences (SPSS, version 23) was used for the data analysis. Summary Statistics: Summary statistics for the total of 300 registered HIV/AIDS patients are given in Table 1a. The age group 38 years and above accounted for the largest share, with 202 patients (67.3%), compared to other age groups, as shown in Table 1b. Table 2a showed that out of the total 300 registered HIV/AIDS patients receiving treatment at the HCT unit of FMC Makurdi, 91 were male (31 experienced the event of interest and 60 were censored, accounting for 65.9%) and 209 were female (65 died due to AIDS and 144 were censored, a censoring percentage of 68.9%). Overall, 96 HIV patients (32%) experienced the event of interest (death due to AIDS) and 204 (68%) were censored during the period under review. The age group 38 years and above had the largest number of deaths due to AIDS, with 69 deaths (65.8% censored), more than any other age group (Table 2b). See also Table 2c.

KM descriptive survival analysis
Based on the results in Table 2d, the log-rank test indicates a statistically significant difference in survival time between the sexes during the period under review. Figures 1 and 2 showed differences among the survival curves for age and sex respectively.

Results of the Cox proportional hazards model (CPHM)
A Cox proportional hazards model was fitted to relate the covariates sex, age, and CD4 cell count of patients to the time of death due to AIDS; Tables 3a-3c summarise the variables used in the analysis and the estimated parameters. The omnibus test of model coefficients gave a chi-square value of 3.126 on 5 degrees of freedom (p = 0.681) (Table 3b). Since this p-value exceeds 0.05, the test does not indicate that the covariates jointly improve on the null model at the 5% level. The regression coefficients: The results of the model are shown in Table 3c and are also interpreted in terms of hazard ratios (HR). From the results, the CD4 cell count of patients has a negative regression coefficient (-0.001; HR = 0.999), which indicates a reduction in the hazard (lower risk of death). All other covariates have positive regression coefficients, which indicates that the hazard (risk of death) due to AIDS is higher. This implies that patients with higher CD4 cell counts have a lower risk of death due to AIDS and an increased length of survival. The cumulative survival function at the mean of covariates is shown in Figure 3.
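The two reported quantities can be checked by hand. The short Python sketch below converts the reported CD4 coefficient into a hazard ratio and recomputes the p-value of the omnibus chi-square statistic. The helper `chi2_sf_5df` is a hypothetical name; it uses the closed-form upper tail of the chi-square distribution, which is valid only for 5 degrees of freedom.

```python
import math


def chi2_sf_5df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 5 df."""
    tail = math.erfc(math.sqrt(x / 2))
    correction = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)
    return tail + correction


beta_cd4 = -0.001                 # reported regression coefficient for CD4 count
hazard_ratio = math.exp(beta_cd4)  # HR = exp(beta)

chi_square = 3.126                # reported omnibus statistic on 5 df
p_value = chi2_sf_5df(chi_square)

print(round(hazard_ratio, 3), round(p_value, 3))  # 0.999 0.681
```

Both values agree with the figures reported from Table 3b and Table 3c.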

Confidence intervals of the hazard ratios:
The 95% confidence intervals (CI) of the hazard ratios were (0.799, 1.969) for sex of patients, (0.998, 1.001) for CD4 cell count, and (0.984, 1.024) for age of patients. Each of these intervals contains 1, so none of these hazard ratios differs significantly from 1 at the 5% level.
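A confidence interval for a hazard ratio is commonly formed on the log scale as exp(beta ± 1.96 × SE). Assuming the reported intervals are Wald intervals of this kind (an assumption, since the study does not state the method), the Python sketch below recovers the implied standard error of each log hazard ratio and checks whether the interval includes 1. The helper name `implied_se` is hypothetical.

```python
import math

# Reported 95% confidence limits for the hazard ratios.
reported_ci = {
    "sex": (0.799, 1.969),
    "CD4 cell count": (0.998, 1.001),
    "age": (0.984, 1.024),
}


def implied_se(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a Wald interval (lower, upper)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for name, (lower, upper) in reported_ci.items():
    contains_one = lower <= 1.0 <= upper
    print(f"{name}: SE(log HR) = {implied_se(lower, upper):.4f}, "
          f"interval contains 1: {contains_one}")
```

An interval that contains 1 is compatible with no difference in hazard at the 5% level.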

CONCLUSION
The study concluded that CD4 cell count is the only covariate that showed a lower risk of death due to AIDS. Sex, age, and patients' health state had higher hazards (risks of death) due to AIDS; that is, they were positively associated with death due to AIDS and hence with shorter survival time for HIV/AIDS patients. Patients with high CD4 cell counts have a lower risk of death due to AIDS and a longer survival time when other factors are considered. The KM results showed that 96 patients (32.0%) experienced the event of interest (death due to AIDS) and 204 (68.0%) were censored.

RECOMMENDATIONS
The study recommends that survival analysis should be used to assess the various risk factors and the confounding effects associated with them. In addition, patients should be supported to improve their lifestyle so that they remain healthy as they age.