A Nomogram for End-Stage Renal Disease Prediction in Patients with Type 2 Diabetes Mellitus: A Nationwide Cohort Study in Korea
Abstract
Background
Despite the rising incidence of end-stage renal disease (ESRD) among individuals with type 2 diabetes mellitus (T2DM) in Korea, no predictive model or nomogram has been developed using a nationwide cohort. In this study, we developed a nomogram to predict the long-term risk of ESRD in patients with T2DM using a large-scale, population-based Korean database.
Methods
Using the Korean National Health Insurance Database, patients with T2DM who underwent health examinations between 2015 and 2016 were assigned to a development cohort (n=1,744,277) or a validation cohort (n=747,407). New ESRD cases were identified using codes for renal replacement therapy. A Cox proportional hazards regression model was used to derive a risk-scoring system, and 13 variables were selected. A risk score nomogram was then created to estimate the risk of ESRD.
Results
In the development cohort, 8,631 patients with T2DM developed ESRD during a follow-up period of 4.8±0.9 years. After multivariable adjustment, significant predictors of ESRD included male sex, current smoking, physical inactivity, low income, low body mass index, hypertension, low-density lipoprotein cholesterol ≥160 mg/dL, chronic kidney disease, insulin use, and longer duration of T2DM. A final nomogram incorporating 13 variables was developed to estimate the individual probability of ESRD. The concordance index for ESRD prediction in the validation cohort was 0.906 (95% confidence interval, 0.900 to 0.912).
Conclusion
This 13-variable nomogram provides a simple tool for identifying patients with T2DM at high risk of ESRD and may aid in early intervention.
INTRODUCTION
End-stage renal disease (ESRD) is an irreversible condition that requires renal replacement therapy (RRT) or kidney transplantation (KT). Globally, the epidemiological burden of ESRD varies significantly, with Korea ranking fourth in ESRD prevalence and sixth in incidence rates (IRs) worldwide [1]. In particular, the rapidly increasing prevalence of type 2 diabetes mellitus in Korea has contributed substantially to the escalating incidence of ESRD, with the number of cases doubling in the decade since 2010 [2]. According to the 2023 annual report by the United States Renal Data System, South Korea exhibited the highest average annual increase in the incidence of diabetes-related ESRD worldwide between 2011 and 2021, with an average increase of 9.4 cases per million population per year [3]. Given this alarming trend, identifying patients with type 2 diabetes mellitus who are at high risk of developing ESRD is crucial for enhancing health outcomes and establishing effective health policies.
Age, proteinuria, hypertension, hyperuricemia, and a decreased or rapidly declining estimated glomerular filtration rate (eGFR) have been identified as risk factors for ESRD [4,5]. Subsequently, these risk factors have been used to develop several risk prediction models for chronic kidney disease (CKD) and ESRD [6-9]. However, the incidence and prevalence of ESRD, as well as its associated risk factors, may vary according to racial background, genetic predisposition, and lifestyle factors, such as dietary habits. Despite the rapidly increasing number of ESRD cases in Korea, no predictive model or nomogram using a nationwide population-based cohort has been developed specifically for Korean individuals with type 2 diabetes mellitus. To address this gap, we aimed to develop a predictive model for ESRD risk in Korean patients with type 2 diabetes mellitus using data from the Korean National Health Insurance Service (NHIS). By leveraging this extensive, population-based dataset, our study aimed to develop a more accurate and region-specific risk assessment tool that can aid in the early identification and prevention of ESRD in the Korean population with type 2 diabetes mellitus.
METHODS
Source of data and study population
We extracted data from the Korean NHIS and Health Insurance Review & Assessment Service (HIRA) database. The Korean NHIS includes an eligibility database (including data on age, sex, socioeconomic status, type of eligibility, and income level), medical treatment database (based on claims submitted by healthcare providers for reimbursement), health examination database (comprising results from general health examinations and lifestyle and behavioral questionnaires), and medical care institution database (including information on facility type, location, available equipment, and number of physicians). Together, these components form a comprehensive repository of health information for 50 million Koreans. Detailed descriptions of the NHIS and HIRA databases are provided in previous studies [10,11]. For this study, we used general health examination data and NHIS healthcare use data, including inpatient (diagnoses and procedures received) and outpatient records. This study was performed in accordance with the principles of the Declaration of Helsinki and was approved by the Institutional Review Board (IRB) of Soongsil University (IRB number: SSU-202202-HR-411-1). The requirement for informed consent was waived because the study was observational and used anonymized public databases.
Study design
We selected Korean individuals with type 2 diabetes mellitus, identified by International Classification of Diseases, 10th Revision (ICD-10) codes of E11–E14 with either a prescription of oral hypoglycemic agents (OHAs) or fasting blood glucose (FBG) ≥126 mg/dL, who underwent at least one general health checkup between January 1, 2015, and December 31, 2016. The following individuals were excluded: (1) those <20 years of age (n=322); (2) those with any missing values from the general health checkup (n=28,030); and (3) those previously diagnosed with ESRD, identified using special medical aid codes for RRT (V001 for hemodialysis, V003 for peritoneal dialysis, and V005 for KT) (n=13,313). A total of 2,571,361 patients were included in the final analysis.
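The selection rules above can be read as a single eligibility predicate. The following is a minimal sketch under stated assumptions: the `Examinee` record and all of its field names are hypothetical and do not mirror NHIS column names, a single missing FBG value stands in for "any missing checkup value", and the diagnosis rule is read as "E11–E14 code AND (OHA prescription OR FBG ≥126 mg/dL)".

```python
from dataclasses import dataclass
from typing import Optional, Sequence

T2DM_PREFIXES = ("E11", "E12", "E13", "E14")  # ICD-10 codes named in the text


@dataclass
class Examinee:
    # All field names are hypothetical; they are not NHIS column names.
    age: int
    icd10_codes: Sequence[str]
    on_oha: bool                 # prescribed an oral hypoglycemic agent
    fbg_mg_dl: Optional[float]   # None stands in for a missing checkup value
    prior_esrd: bool             # V001/V003/V005 claim before the checkup


def has_t2dm(p: Examinee) -> bool:
    """ICD-10 E11-E14 plus either an OHA prescription or FBG >= 126 mg/dL."""
    coded = any(c.upper().startswith(T2DM_PREFIXES) for c in p.icd10_codes)
    high_fbg = p.fbg_mg_dl is not None and p.fbg_mg_dl >= 126
    return coded and (p.on_oha or high_fbg)


def is_eligible(p: Examinee) -> bool:
    """Confirm T2DM, then apply the three exclusion criteria."""
    if not has_t2dm(p):
        return False
    if p.age < 20:               # exclusion 1: younger than 20 years
        return False
    if p.fbg_mg_dl is None:      # exclusion 2 (simplified to one field)
        return False
    return not p.prior_esrd      # exclusion 3: ESRD before the checkup
```

For instance, `is_eligible(Examinee(55, ["E11.9"], True, 110.0, False))` evaluates to `True`, because the OHA prescription satisfies the diagnosis rule even though FBG is below 126 mg/dL.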
Study outcomes and follow-up
ESRD was defined by the special medical aid codes V001, V003, and V005. Patients who were diagnosed with ESRD before the date of their general health examination were excluded. The claims database was followed until the date of ESRD diagnosis or December 31, 2020, whichever came first.
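The follow-up rule (first ESRD code or December 31, 2020, whichever came first) translates into a time-to-event pair per patient. This is a sketch only: the function name and the 365.25-day year are assumptions, and censoring for reasons other than the study end date (for example, death) is not modeled because the text does not describe it.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2020, 12, 31)  # administrative censoring date from the text


def follow_up(exam_date: date, esrd_date: Optional[date]) -> Tuple[float, bool]:
    """Return (years of follow-up, ESRD event indicator) for one patient."""
    # An ESRD date after the study end is treated as censored at STUDY_END.
    event = esrd_date is not None and esrd_date <= STUDY_END
    end = esrd_date if event else STUDY_END
    years = (end - exam_date).days / 365.25
    return years, event
```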
Predictor variables
All available variables were extracted from the NHIS database. We identified risk factors for ESRD in individuals with type 2 diabetes mellitus based on existing literature and selected variables with high predictive potential [6,8,9,12,13]. We considered the following 13 predictors: sex, FBG level, smoking, alcohol consumption, regular exercise, household income, body mass index (BMI), systolic blood pressure (SBP), low-density lipoprotein cholesterol (LDL-C) level, CKD, use of multiple OHAs (≥3), insulin use, and duration of type 2 diabetes mellitus. Health-related behaviors, such as smoking, alcohol consumption, and regular exercise, were obtained from self-reported questionnaires. Heavy alcohol consumption was defined as an intake of more than 30 g of alcohol per day for males and more than 20 g per day for females. Regular exercise was defined as engaging in vigorous physical activity on more than 3 days per week or moderate-intensity activity on more than 5 days per week. BMI was calculated as weight in kilograms divided by the square of height in meters (kg/m²). Blood pressure was measured twice in a seated position, and the average of the two readings was recorded.
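The behavioral and anthropometric definitions in this paragraph are simple threshold rules. The sketch below encodes them literally as worded ("more than" is implemented as a strict inequality); the function and parameter names are illustrative assumptions, not identifiers from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def is_heavy_drinker(grams_per_day: float, male: bool) -> bool:
    """More than 30 g/day for males, more than 20 g/day for females."""
    return grams_per_day > (30 if male else 20)


def exercises_regularly(vigorous_days: int, moderate_days: int) -> bool:
    """Vigorous activity on >3 days/week or moderate activity on >5 days/week."""
    return vigorous_days > 3 or moderate_days > 5


def recorded_sbp(first_mm_hg: float, second_mm_hg: float) -> float:
    """Average of the two seated blood pressure readings."""
    return (first_mm_hg + second_mm_hg) / 2
```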
Venous blood samples were collected after a 12-hour overnight fast to measure FBG, eGFR, total cholesterol, high-density lipoprotein cholesterol, LDL-C, and triglyceride (TG) levels. The eGFR was calculated using the Modification of Diet in Renal Disease formula: eGFR = 175 × (serum creatinine)^−1.154 × (age)^−0.203 × 0.742 (if female) [14]. CKD was defined as an eGFR <60 mL/min/1.73 m². BMI was categorized into five groups: <18.5, 18.5–<23, 23–<25, 25–<30, and ≥30 kg/m². Duration of diabetes was categorized as newly diagnosed, <5 years, 5–<10 years, and ≥10 years based on the date of the first claim for diabetes mellitus and corresponding International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes for diabetes. FBG levels were classified into eight groups: <90, 90–100, 100–110, 110–126, 126–140, 140–150, 150–160, and ≥160 mg/dL. SBP was classified into five groups: <100, 100–<120, 120–<140, 140–<160, and ≥160 mm Hg. Household income status was categorized annually into four tiers based on monthly health insurance premiums, ranging from the first quartile (lowest) to the fourth quartile (highest). Individuals in the first quartile were classified as having low-income status.
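As an illustration of the eGFR equation and the category cut-offs described above, the following sketch computes eGFR, flags CKD, and bins BMI and SBP. It assumes serum creatinine is expressed in mg/dL (the text does not state the unit), and the helper names and category labels are hypothetical. FBG binning is omitted because the text does not say which group a boundary value such as 100 mg/dL belongs to.

```python
from bisect import bisect_right
from typing import Sequence


def egfr_mdrd(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """MDRD eGFR in mL/min/1.73 m^2; creatinine unit (mg/dL) is an assumption."""
    value = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    return value * 0.742 if female else value


def has_ckd(egfr: float) -> bool:
    """CKD as defined in the text: eGFR below 60 mL/min/1.73 m^2."""
    return egfr < 60.0


BMI_CUTS = [18.5, 23, 25, 30]
BMI_LABELS = ["<18.5", "18.5-<23", "23-<25", "25-<30", ">=30"]
SBP_CUTS = [100, 120, 140, 160]
SBP_LABELS = ["<100", "100-<120", "120-<140", "140-<160", ">=160"]


def categorize(value: float, cuts: Sequence[float], labels: Sequence[str]) -> str:
    """Pick the label whose interval contains value (lower bound inclusive)."""
    return labels[bisect_right(cuts, value)]
```

With these helpers, `categorize(18.3, BMI_CUTS, BMI_LABELS)` falls in the `"<18.5"` group and `categorize(98, SBP_CUTS, SBP_LABELS)` in the `"<100"` group, matching the categories used for the example patient later in the article.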
Statistical analysis
Seventy percent of the participants were randomly assigned to the development cohort for constructing the risk prediction models, whereas the remaining 30% constituted the validation cohort. Continuous variables are presented as mean±standard deviation and categorical variables as percentages. Chi-squared tests were used to assess differences in categorical variables, and independent Student’s t tests were used to compare means of continuous variables. IRs are expressed as events per 1,000 person-years. Cox proportional hazards regression was used to estimate hazard ratios (HRs) and their corresponding 95% confidence intervals (CIs) for predicting the risk of ESRD.
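The 70/30 random assignment and the incidence rate calculation can be sketched as follows. The seed, the shuffling approach, and the function names are assumptions made for reproducibility of the example; the text does not describe how the randomization was implemented.

```python
import random
from typing import Iterable, List, Tuple


def split_cohort(ids: Iterable[int], dev_fraction: float = 0.7,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split patient IDs into development and validation sets."""
    rng = random.Random(seed)   # fixed seed is an assumption, for repeatability
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def incidence_rate_per_1000(events: int, person_years: float) -> float:
    """Events per 1,000 person-years of follow-up."""
    return 1000.0 * events / person_years
```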
To build the risk prediction model, variable selection was performed using a multivariable Cox proportional hazards model. Risk scores were assigned according to the HRs of each predictor included in the final model. The point scores in the nomogram were derived from the linear predictor (LP) of the Cox model. For each variable category, the difference in LP relative to the reference category was calculated. The variable with the largest effect range was scaled to 100 points, and the scores for all other variables were linearly rescaled. Consequently, the point value of each category reflects the relative increase in risk compared with the reference category, and the sum of points across all predictors corresponds linearly to the LP. The total score was then mapped to survival probabilities, resulting in a clinically interpretable nomogram. The nomogram was generated using the ‘nomogram’ function in the ‘rms’ package of R software (R Foundation for Statistical Computing, Vienna, Austria), which applies this scaling procedure to convert regression coefficients into clinically applicable point scores.
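The point-scaling step described above can be illustrated with a small stand-alone sketch. It is not the `rms` implementation: it assumes each variable's lowest-risk category anchors 0 points and that the variable with the widest range on the LP scale spans 100 points. For illustration, the inputs are the logarithms of two adjusted HRs reported in the Results (23.37 for CKD; 0.46 for females versus males); the published point values come from the full 13-variable model and need not match exactly.

```python
import math
from typing import Dict


def nomogram_points(log_hr: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Rescale per-category log hazard ratios to a 0-100 point scale."""
    # Effect range of each variable on the linear-predictor scale.
    ranges = {var: max(cats.values()) - min(cats.values())
              for var, cats in log_hr.items()}
    widest = max(ranges.values())
    points = {}
    for var, cats in log_hr.items():
        low = min(cats.values())  # lowest-risk category gets 0 points
        points[var] = {cat: 100.0 * (beta - low) / widest
                       for cat, beta in cats.items()}
    return points


# Illustrative two-variable input; reference categories have log HR = 0.
coefs = {
    "ckd": {"no": 0.0, "yes": math.log(23.37)},
    "sex": {"male": 0.0, "female": math.log(0.46)},
}
pts = nomogram_points(coefs)
```

In this reduced example CKD spans the full 100 points and male sex lands between 24 and 25 points, which is close to the 24 points assigned to male sex in the published nomogram.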
The discriminative performance of the model was assessed using the concordance index (C-index), which is analogous to the area under the receiver operating characteristic curve. All P values were two-tailed, with P<0.05 considered statistically significant. All statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) and R version 4.3.0.
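For readers unfamiliar with the C-index, the sketch below shows one common formulation (Harrell's C) for right-censored data. It is a plain quadratic-time illustration, not the routine used in the study, and would be impractical at the scale of the validation cohort; the function and argument names are assumptions.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # each pair is anchored on a subject who had the event
        for j in range(n):
            if times[i] < times[j]:       # j was still event-free at times[i]
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5     # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance, and 1.0 to perfect ranking of event times by predicted risk.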
Data and resource availability
All data generated and analyzed in the current study are managed by the National Health Insurance Data Sharing Service (accessible at https://nhiss.nhis.or.kr/bd/ab/bdaba000eng.do), and these customized data are not available to the public. Researchers were required to submit their research proposals to the IRB and customized data request forms to the HIRA committees. After the final approval from the HIRA committee, only authorized researchers could access the customized database through desktops supplied by the HIRA, and only de-identified data were available for a limited period.
RESULTS
Baseline characteristics
The baseline characteristics of the development and validation cohorts are presented in Table 1. During the follow-up period of up to 6 years (mean±standard deviation, 4.8±0.9), 8,631 patients with type 2 diabetes mellitus in the development cohort and 3,774 patients in the validation cohort developed ESRD. The mean age of individuals who developed ESRD was significantly higher than that of those who did not (64.8 years vs. 59.6 years). Patients who developed ESRD were more likely to be male and have CKD with lower eGFR, hypertension, dyslipidemia, and higher TG levels, whereas their BMI was slightly lower. A higher percentage of patients with ESRD were on insulin and treated with three or more OHAs. Additionally, individuals with a longer duration of type 2 diabetes mellitus were more likely to develop ESRD. ESRD prevalence was higher in both the lowest (Q1) and highest (Q4) income quartiles.
HRs for the development of ESRD
Univariate HRs from model 1 and multivariable-adjusted HRs based on 13 covariates in model 2 for ESRD are presented in Table 2. The IR of ESRD increased with age, from 0.30 per 1,000 person-years in the 20–29 age group to 2.20 per 1,000 person-years among those aged 80 and above. After multivariable adjustment in model 2, significant predictors of ESRD in patients with type 2 diabetes mellitus included male sex, current smoking, physical inactivity, low income, underweight status (BMI <18.5 kg/m²), hypertension, LDL-C ≥160 mg/dL, CKD, insulin use, and longer duration of type 2 diabetes mellitus. Very low or very high FBG levels (<90 and ≥160 mg/dL, respectively) were associated with a higher incidence of ESRD and showed a reverse J-shaped relationship in the multivariable-adjusted HRs. Using FBG <90 mg/dL as the reference, the lowest HR was observed for FBG levels between 140 and 150 mg/dL. However, for FBG levels ≥160 mg/dL, the HR increased to 0.72 (95% CI, 0.67 to 0.78). We observed a higher IR in males than in females (1.16 per 1,000 person-years vs. 0.81 per 1,000 person-years), and the adjusted HR for ESRD was correspondingly lower in females (HR, 0.46; 95% CI, 0.43 to 0.48). The presence of CKD dramatically increased both the IR and adjusted risk of ESRD (HR, 23.37; 95% CI, 22.21 to 24.59). A clear trend was observed in which a longer duration of type 2 diabetes mellitus was associated with higher IR and HR for ESRD, with the highest rates seen in individuals with diabetes for more than 10 years.
Development of ESRD prediction score and nomogram
Among the 13 variables examined, current smoking, physical inactivity, low income, underweight status, elevated SBP, higher LDL-C, presence of CKD, insulin use, and longer duration of type 2 diabetes mellitus were the primary contributors to ESRD development in patients with type 2 diabetes mellitus. A nomogram based on the risk prediction model was created to estimate the risk of ESRD (Fig. 1). Each variable was assigned a corresponding point value (Supplemental Table S1). The total score, ranging from 0 to 357, is calculated by summing the individual points from all variables and is then mapped onto the predicted value scale at the bottom of the nomogram. The predicted value represents (1 − ESRD risk), such that lower predicted values indicate a higher probability of developing ESRD.
Fig. 1. A risk score nomogram. This nomogram was developed using a multivariable Cox proportional hazards model to estimate the 5-year probability of developing end-stage renal disease (ESRD). For use, locate each patient’s value on the corresponding axis, draw a vertical line upward to determine the points for each variable, and sum these to obtain the total points. The total score corresponds to the predicted 5-year probability of ESRD shown at the bottom of the nomogram, allowing intuitive individual risk assessment. FBG, fasting blood glucose; BMI, body mass index; SBP, systolic blood pressure; LDL, low-density lipoprotein; CKD, chronic kidney disease; OHA, oral hypoglycemic agent; DM, diabetes mellitus.
For example, consider a male patient (24 points) with an FBG level of 74 mg/dL (24 points) who was a non-smoker (0 points), did not consume alcohol (12 points), did not engage in regular exercise (6 points), had a household income in the lowest 25% (12 points), had a BMI of 18.3 kg/m² (25 points), an SBP of 98 mm Hg (0 points), and an LDL-C level of 68 mg/dL (0 points), had CKD (100 points), was taking two OHAs (5 points), was on insulin treatment (29 points), and had had diabetes for 4 years (8 points). According to the nomogram (Fig. 1), this patient's total score of 245 corresponds to an estimated 15% probability of developing ESRD within 5 years, as depicted in Fig. 2.
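The arithmetic of this worked example can be checked directly. The point values below are copied from the example in the text; the dictionary keys are descriptive labels chosen for this sketch.

```python
# Points for the example patient, as listed in the text.
example_points = {
    "male sex": 24,
    "FBG 74 mg/dL": 24,
    "non-smoker": 0,
    "no alcohol consumption": 12,
    "no regular exercise": 6,
    "income in lowest 25%": 12,
    "BMI 18.3 kg/m2": 25,
    "SBP 98 mm Hg": 0,
    "LDL-C 68 mg/dL": 0,
    "CKD": 100,
    "two OHAs": 5,
    "insulin treatment": 29,
    "diabetes for 4 years": 8,
}

total_score = sum(example_points.values())
print(total_score)  # 245
```

CKD alone contributes 100 of the 245 points, which reflects its role as the variable with the widest effect range in the model.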
Fig. 2. The incidence probability of end-stage renal disease (ESRD) according to the total risk score. This figure shows the predicted 5-year probability of developing ESRD across the range of total risk scores calculated using the developed prediction model. The plot visualizes the relationship between total risk score and estimated ESRD risk for intuitive interpretation.
Validation of the ESRD prediction model
The performance of the ESRD prediction model was evaluated in the validation cohort using the C-index (Supplemental Table S2). The overall C-index was 0.906 (95% CI, 0.900 to 0.912), indicating excellent discrimination. Age-stratified analysis showed that the C-index was the highest in individuals aged <40 years, followed by those aged 40–64 years. In sex-specific analysis, the C-index was slightly higher in males than in females. The IRs of ESRD observed in the validation cohort closely matched those predicted by the model across deciles of the total risk score (Supplemental Fig. S1). Notably, as the total risk score increased, the IR of ESRD rose consistently in both the development and validation cohorts, with a particularly pronounced increase in the highest decile (total score >141 points). This alignment between predicted and observed IRs across all risk deciles supports the reliability and generalizability of this model. Furthermore, the calibration plot demonstrated that the model accurately predicts the risk of ESRD (Supplemental Fig. S2). Collectively, these findings confirm the robust predictive performance and practical utility of this ESRD prediction model.
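The decile comparison described above amounts to ranking patients by total score, cutting the ranking into ten equal groups, and computing an incidence rate in each. The sketch below shows that computation for observed rates only; it assumes at least ten patients with positive follow-up time, breaks ties in score arbitrarily, and uses hypothetical function and argument names.

```python
from typing import List, Sequence


def incidence_by_decile(scores: Sequence[float], events: Sequence[int],
                        person_years: Sequence[float]) -> List[float]:
    """Observed ESRD incidence per 1,000 person-years in each score decile."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rates = []
    for d in range(10):
        group = order[d * n // 10:(d + 1) * n // 10]  # d-th tenth of the ranking
        n_events = sum(events[i] for i in group)
        follow_up_years = sum(person_years[i] for i in group)
        rates.append(1000.0 * n_events / follow_up_years)
    return rates
```

Running the same function on the development and validation cohorts and plotting the two lists side by side would give a comparison of the kind shown in Supplemental Fig. S1.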
DISCUSSION
In this study, we developed a risk prediction model and nomogram to estimate the risk of developing ESRD in patients with type 2 diabetes mellitus. Using this model, we identified key contributors to ESRD progression, including current smoking, physical inactivity, low income, underweight status, elevated SBP, higher LDL-C, presence of CKD, insulin use, and longer duration of type 2 diabetes mellitus. Many of these predictors are well-established risk factors for CKD progression, and several existing models have been proposed using similar variables. One such model is the Kidney Failure Risk Equation (KFRE), a widely used tool for predicting ESRD risk in patients with CKD [15]. Although KFRE is highly practical, incorporating only four or eight routinely collected clinical variables, its applicability may be limited in populations outside Canada, where it was originally developed and validated.
To address the need for a population-specific model, the KoreaN cohort study for Outcome in patients with Chronic Kidney Disease (KNOW-CKD) was established to track the natural history of CKD in Korea over >10 years and to develop prediction models for long-term outcomes of CKD, including ESRD [16]. However, because the cohort consists of patients recruited from nine hospitals rather than a nationwide sample, its generalizability remains uncertain. Furthermore, an ESRD risk prediction model based on the KNOW-CKD cohort has not yet been published.
Thus, our novel ESRD prediction model and nomogram have significant value, as they use nationwide Korean data to estimate the risk of ESRD in patients with type 2 diabetes mellitus. Given the limitations of previous models, including the population-specific nature of KFRE and the relatively small cohort sizes, our study provides a more comprehensive and generalizable approach to ESRD risk assessment in Korean patients with type 2 diabetes mellitus. Unlike KFRE, which relies solely on age, sex, and laboratory test results, our model incorporates modifiable lifestyle factors, including smoking, physical activity, and alcohol consumption. This inclusion enables healthcare providers to offer personalized lifestyle recommendations, facilitating targeted interventions to mitigate ESRD risk and slow disease progression. Beyond risk estimation, the nomogram may serve as a practical tool for early intervention in patients with type 2 diabetes mellitus who are at high risk of developing ESRD. By integrating both traditional risk factors and modifiable lifestyle factors, healthcare providers can identify individuals at high risk and implement proactive measures, such as optimizing glycemic control, managing hypertension, and promoting healthy behaviors. Its intuitive visual representation enhances patient engagement, encouraging active participation in disease management. This personalized approach may help slow the progression of diabetic kidney disease and ultimately reduce ESRD incidence in the Korean population with type 2 diabetes mellitus.
Our findings on alcohol consumption offer insights that diverge from conventional expectations. When individuals who self-identified as non-drinkers were used as the reference group, a noticeable trend emerged: the HR for developing ESRD appeared to decrease with increasing alcohol consumption. This finding aligns with a prospective cohort study of 65,601 Chinese males, which found that alcohol consumption was inversely associated with the risk of ESRD [17]. A meta-analysis of 25 prospective cohort studies revealed that light (<12 g/day), moderate (12–24 g/day), and heavy (>24 g/day) alcohol consumption were associated with a protective effect against CKD [18]. Several mechanisms may explain the beneficial effects of alcohol consumption, such as reduced platelet aggregation, enhanced fibrinolytic activity, and antioxidant properties attributed to polyphenols [19,20]. However, these findings should be interpreted with caution, as individuals who report alcohol consumption may represent a relatively healthier population compared to non-drinkers, potentially resulting in selection bias.
Socioeconomic disparities are widely recognized as key contributors to CKD and ESRD [21-23]; however, research incorporating household income into ESRD risk prediction models remains limited. Given that low income is closely linked to dietary habits, health-related behaviors, and healthcare use patterns, the absence of socioeconomic factors in existing ESRD risk models represents a critical research gap. Our model uniquely integrates household income by leveraging nationwide NHIS data, enhancing its capacity to capture broader socioeconomic determinants of ESRD risk.
The relationship between BMI and ESRD risk is well-documented, with multiple epidemiological studies reporting a positive association between higher BMI and an increased risk of ESRD [24,25]. However, studies conducted in Asian populations have reported an inverse association, where lower BMI is linked to a higher risk of ESRD [26-28]. Our model supports this finding, as being underweight (BMI <18.5 kg/m²) was associated with an increased risk of ESRD, reinforcing the importance of population-specific risk stratification.
Beyond incorporating alcohol consumption, household income, and BMI, our model has several notable strengths. Specifically tailored for Korean patients with type 2 diabetes mellitus, the model includes 13 variables, allowing a comprehensive and population-specific risk assessment. Additionally, our nomogram provides an intuitive and accessible tool for estimating ESRD risk, thereby enhancing its clinical utility. Furthermore, because our model is derived from nationwide clinical and socioeconomic data, it has the potential to inform public health planning and policy development. Nonetheless, this study has certain limitations. First, the model’s generalizability to other ethnic groups is uncertain, as external validation in multi-ethnic cohorts has not yet been performed. Second, although diabetes is the leading cause of RRT initiation in Korea, our outcome measure may include ESRD cases caused by non-diabetic conditions, such as systemic lupus erythematosus and chronic glomerulonephritis. Third, key clinical markers, including urine albumin-creatinine ratio and glycated hemoglobin, were unavailable in the national health examination database, which limited the ability to directly assess glycemic control. To account for this, the duration of type 2 diabetes mellitus, insulin use, and the use of multiple OHAs were included as surrogate markers of glycemic control. Fourth, lifestyle-related factors such as smoking, alcohol consumption, and physical activity may vary over time, potentially limiting the long-term predictive validity of the nomogram. To address this, we also developed a simplified version excluding these modifiable variables, which is provided in Supplemental Fig. S3. This simplified model may enhance the generalizability and clinical utility of the tool.
Using this prediction model, we identified major contributors to the development of ESRD, such as current smoking, physical inactivity, low income, underweight status, elevated SBP, higher LDL-C, presence of CKD, insulin use, and duration of type 2 diabetes mellitus. It is crucial that physicians closely monitor patients, particularly those with multiple risk factors for ESRD or high predictive scores. This 13-variable risk prediction nomogram may serve as a valuable tool for identifying patients with type 2 diabetes mellitus who are at high risk of developing ESRD.
Supplementary Material
Supplemental Table S1.
Scores for Each Risk Factor Category
Supplemental Table S2.
The Concordance Index for ESRD Prediction in the Validation Cohort
Supplemental Fig. S1.
Comparison of end-stage renal disease (ESRD) incidence rates (per 1,000 person-years) by decile groups of the total risk score. Incidence rate (per 1,000 person-years) based on the decile groups of the total risk score in the development and validation cohorts. The numbers on the x-axis represent the range of the total risk score according to each decile group.
Supplemental Fig. S2.
Calibration plot.
Supplemental Fig. S3.
Simplified version of the risk score nomogram. CKD, chronic kidney disease; OHA, oral hypoglycemic agent; DM, diabetes mellitus; BMI, body mass index; SBP, systolic blood pressure; FBG, fasting blood glucose; LDL, low-density lipoprotein cholesterol.
Notes
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
ACKNOWLEDGMENTS
This work was performed using data from the Korean National Health Insurance Service (NHIS). The authors would like to thank the NHIS for their cooperation. We used the National Health Information Database constructed by the Korean NHIS, and the study results do not necessarily represent the opinion of the Korean NHIS. This work was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (NRF-2019M3E5D3073102 and NRF-2023R1A2C2003479, to Nan Hee Kim), and by a National IT Industry Promotion Agency (NIPA) grant funded by MSIT (No. S0252-21-1001, Development of AI Precision Medical Solution (Doctor Answer 2.0), to Nan Hee Kim). It was also supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: H123C 0679, to Nan Hee Kim). Additionally, this work was supported by a grant from the Korea University Ansan Hospital (O2515231, to Inha Jung).
AUTHOR CONTRIBUTIONS
Conception or design: I.J., K.D.H., N.H.K. Acquisition, analysis, or interpretation of data: I.J., B.S.K., S.Y.P., D.Y.L., J.H.Y., J.A.S., K.D.H., N.H.K. Drafting the work or revising: I.J., B.S.K., S.Y.P., D.Y.L., J.H.Y., J.A.S., K.D.H., N.H.K. Final approval of the manuscript: I.J., B.S.K., S.Y.P., D.Y.L., J.H.Y., J.A.S., K.D.H., N.H.K.
