Comprehensive Proteomics and Machine Learning Analysis to Distinguish Follicular Adenoma and Follicular Thyroid Carcinoma from Indeterminate Thyroid Nodules
Article information
Abstract
Background
The preoperative diagnosis of follicular thyroid carcinoma (FTC) is challenging because it cannot be readily distinguished from follicular adenoma (FA) or benign follicular nodular disease (FND) using the sonographic and cytological features typically employed in clinical practice.
Methods
We employed comprehensive proteomics and machine learning (ML) models to identify novel diagnostic biomarkers capable of classifying three subtypes: FTC, FA, and FND. Bottom-up proteomics techniques were applied to quantify proteins in formalin-fixed, paraffin-embedded (FFPE) thyroid tissues. In total, 202 FFPE tissue samples, comprising 62 FNDs, 72 FAs, and 68 FTCs, were analyzed.
Results
Close spectrum-spectrum matching quantified 6,332 proteins, with approximately 9% (780 proteins) differentially expressed among the groups. When applying an ML model to the proteomics data from samples with preoperative indeterminate cytopathology (n=183), we identified distinct protein panels: five proteins (CNDP2, DNAAF5, DYNC1H1, FARSB, and PDCD4) for the FND prediction model, six proteins (DNAAF5, FAM149B1, RPS9, TAGLN2, UPF1, and UQCRC1) for the FA model, and seven proteins (ACTN4, DSTN, MACROH2A1, NUCB1, SPTAN1, TAGLN, and XRCC5) for the FTC model. The classifiers’ performance, evaluated by the median area under the curve values of the random forest models, was 0.832 (95% confidence interval [CI], 0.824 to 0.839) for FND, 0.826 (95% CI, 0.817 to 0.835) for FA, and 0.870 (95% CI, 0.863 to 0.877) for FTC.
Conclusion
Quantitative proteome analysis combined with an ML model yielded an optimized multi‐protein panel that can distinguish FTC from benign subtypes. Our findings indicate that a proteomic approach holds promise for the differential diagnosis of FTC.
INTRODUCTION
Follicular thyroid carcinoma (FTC) is the second most common thyroid cancer, accounting for 10% to 15% of thyroid malignancies [1,2]. The diagnostic hallmark of FTC is the capsular or vascular invasion of tumor cells, a feature that cannot be discerned through cytological examination alone. Fine needle aspiration (FNA) of a thyroid nodule cannot reliably distinguish FTC from follicular adenoma (FA) or benign follicular nodular disease (FND) because these conditions share overlapping cytological features [3]. According to the Bethesda system for reporting thyroid cytopathology, FTC is often categorized as atypia of undetermined significance (AUS) or follicular neoplasm (FN) on FNA [4]. These cytologic categories correspond to an actual malignancy risk of 13%–30% for AUS and 23%–34% for FN, meaning that the majority of patients undergo unnecessary surgery.
Due to the limited diagnostic accuracy of FNA in the preoperative setting, significant efforts have focused on discovering novel biomarkers to differentiate FTC from FA and FND. Such biomarkers encompass genetic mutations [5,6], gene expression profiles [7,8], and protein expression patterns [9-11]; however, each approach has limitations that hinder widespread clinical application. Currently, molecular diagnostic methods remain largely unavailable in routine clinical practice outside the United States, primarily due to high costs.
Machine learning (ML) applied to omics data offers enhanced accuracy in diagnostic prediction models. Several diagnostic trials using genomic data have been reported, with some approaches already being commercialized [12-15]. The ML approach entails feature selection, modeling based on the selected features, and subsequent validation to assess the model’s performance.
Proteomics, the quantitative analysis of all proteins expressed in tissue samples, is a powerful tool for identifying novel molecular biomarkers. It enables the detection of molecular signatures that reflect the involvement of multiple biological pathways in the management of thyroid nodules. Moreover, mass spectrometry (MS)-based proteomics, which can analyze post-translational modifications (PTMs) like iodination, phosphorylation, and acetylation, offers a more comprehensive description of thyroid tumors. Furthermore, tissue proteome analysis can overcome the limitations of existing biomarkers by enabling faster processing times (<2 days) and requiring only minimal tissue samples (less than 100 μg) [16,17]. Although previous studies reported several potential protein biomarkers for FTC [9-11,18-20], large-scale validation is lacking, and few reliable biomarkers have been established to date. Various ML algorithms—including supervised, unsupervised, and reinforcement learning—have been developed and applied to proteomics data [21,22]. In this study, we applied an area under the curve (AUC)-based random forest (RF) backward-elimination process [23] for feature selection and subsequently used RF for model building.
The aim of the present study was to identify potential biomarkers for predicting FTC through global proteome analysis of formalin-fixed, paraffin-embedded (FFPE) tissues from patients with FNs. Additionally, we sought to validate these biomarker candidates using an ML model to differentiate FTC from benign nodules.
METHODS
Subjects
A total of 202 FFPE tissue samples, comprising FNDs (n=62), FAs (n=72), and FTCs (n=68), were analyzed. These tissues were obtained from surgical specimens collected after thyroidectomy, performed between 2008 and 2018 at Asan Medical Center, Seoul, Korea. All specimens were reviewed by an experienced endocrine pathologist (Dong Eun Song). All FTC cases were of the minimally invasive subtype. Only patients who provided informed consent during follow-up were included, and the Institutional Review Board of Asan Medical Center approved all data collection and subsequent analyses (IRB No. 2020-1219).
FFPE tissue proteomic analysis by liquid chromatography-mass spectrometry, statistical analysis, and ML
FFPE tissue-digested peptide samples were analyzed by tandem mass spectrometry (MS/MS) using a Q Exactive Plus Hybrid Quadrupole-Orbitrap mass spectrometer equipped with a nanoelectrospray ionization source, coupled with a Dionex UltiMate 3000 RSLCnano system (Thermo Fisher Scientific, Pittsburgh, PA, USA). Raw data files were searched using both open and closed methods for peptide-spectrum matches (PSMs; MSFragger version 2.3 [24]) (https://msfragger.nesvilab.org) and spectrum-spectrum matches (SSMs; Approximate Nearest Neighbor Spectral Library [ANN-SoLo] [25]) against the human protein sequence database (accessed on February 21, 2020 from UniProt [26]; https://www.uniprot.org). Data analysis, including normalization and missing data imputation, was conducted using Perseus version 1.6.15.0 (https://mybiosoftware.com/perseus-shotgun-proteomics-data-analysis.html) [27]. Feature selection and prediction model generation were performed using an RF-based backward-elimination process [23] in R version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria). The Student’s t test was employed to compare continuous variables of protein quantification, as depicted in volcano plots. All tests were two-sided, with P values <0.05 considered statistically significant. Adjusted P values were calculated using the Benjamini-Hochberg procedure to control for multiple comparisons. Differences in continuous singscore [28] variables were assessed using either the Wilcoxon rank-sum test or the Kruskal-Wallis analysis of variance (ANOVA) test. Additional software packages used included rapid integration of term annotation and network (RITAN) for functional analysis, ggplot2 for generating scatter plots, boxplots, and volcano plots, and pheatmap for creating heatmaps. Detailed materials and methods are provided in the Supplemental Methods.
RESULTS
Baseline characteristics
The characteristics of the 202 patients, stratified by histologic subtype, are summarized in Table 1. The overall mean age was 50 years; however, significant age differences were observed among subtypes, with FTC patients being the youngest (mean, 43 years) and FND patients the oldest (mean, 52 years; P=0.025). Most patients were female (82.7%), and the mean tumor size was 3.0 cm (interquartile range, 2.2 to 3.8). According to the Korean Thyroid Imaging Reporting and Data System (K-TIRADS), most tumors were classified as ‘low suspicion’ (67.8%), followed by ‘intermediate suspicion’ (25.7%), a pattern consistent across all three subtypes. Preoperative FNA diagnoses exhibited significant differences among the histologic subtypes. In the FTC group (n=68), 32 cases (47.1%) were diagnosed as AUS and 31 cases (45.6%) as FN. Similarly, among FA patients (n=72), FNA diagnoses were nearly evenly split between AUS (n=36, 50%) and FN (n=35, 48.6%). Conversely, most FND patients had preoperative AUS diagnoses (n=53, 85.5%), with only five patients (8.1%) being classified as benign by FNA.
Comparative proteomics
The workflow for the comparative proteomic analysis of FFPE thyroid tissues is depicted in Supplemental Fig. S1. The global proteomes of 202 samples were analyzed using liquid chromatography-tandem mass spectrometry (LC-MS/MS). For MS spectrum analysis, both closed and open searches were conducted against a protein database using MSFragger [24]. Subsequently, a spectral library was generated based on the open search results, and protein analysis was performed using the ANN-SoLo program for SSM [29]. Within the entire MS/MS (MS2) spectrum, the coverage for FTC, FA and FND samples was as follows: open SSMs covered 70.54%, 69.88%, and 66.60%, respectively; open PSMs covered 49.45%, 45.54%, and 38.99%; closed SSMs covered 19.79%, 20.70%, and 15.35%; and closed PSMs covered 22.00%, 18.51%, and 14.30% (Fig. 1A). Using these four spectral matching methods, 4,848 proteins were commonly quantified out of a total of 8,554 (Fig. 1B). Notably, open SSMs quantified 22% more proteins than open PSMs, and closed SSMs quantified 42.7% more proteins than closed PSMs. In particular, open SSMs matched and quantified the largest number of proteins compared to the other methods, excluding proteins that constituted less than 2.55% of the total. Overall, 4,100,733 PSMs were identified, of which 46% were classified as non-modified PSMs in closed mode and 11% were identified as FFPE artifact modified PSMs (Supplemental Fig. S2).

Liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomic results of 202 thyroid formalin-fixed, paraffin-embedded (FFPE) samples. (A) The number of identified MS spectra in the 202 thyroid FFPE tissues: MS/MS (MS2) spectra (magenta), open search spectrum-spectrum matches (open SSMs; green), open search peptide-spectrum matches (open PSMs; orange), closed search SSMs (red), and closed search PSMs (blue). (B) Venn diagram showing the number of proteins identified by open SSM and PSM searches compared with closed SSM and PSM searches. (C) Distribution of normalized protein abundances based on label-free quantification from the open SSM search, with thyroid gland-elevated proteins highlighted in red. (D) Scatter plot of total protein count versus the percentage of thyroglobulin (TG) spectra matched; open search results from MSFragger (black circles) and closed search results from Approximate Nearest Neighbor Spectral Library (ANN-SoLo) (blue circles). The inset shows the number of proteins identified by PSMs and SSMs. (E) Boxplots displaying the number of identified proteins and the percentages of three post-translation modifications (PTMs) (methylation, oxidation/hydroxylation, and iodination) in samples stored short-term (1–5 years; n=38) versus long-term (6–11 years; n=164). (F) Scatter plot comparing protein quantification between the two storage time groups, with a Pearson’s correlation coefficient of 0.93.
Comparisons with the Human Protein Atlas (https://www.proteinatlas.org/) [30] revealed that, among the 8,341 proteins quantified using open SSMs, 85 were found to be elevated in thyroid tissues (Fig. 1C). Thyroglobulin (TG) emerged as the most abundant protein in thyroid tissue, with a quantification level more than five times that of the third most abundant protein, albumin. This observation underscores the challenge of detecting low-abundance proteins using liquid chromatographymass spectrometry (LC-MS). Across 202 samples, a negative correlation was observed between the total number of proteins identified by open SSMs and PSMs and the proportion of TG spectra detected by open SSMs and PSMs, respectively (R2=0.525, P<2.0×10–16 for open SSMs [black circles], and R2=0.162, P=1.77×10–9 for open PSMs [blue circles]) (Fig. 1D). This indicates that PSM was prone to overrepresenting the highabundance protein TG (2,768 amino acids) due to target-decoy analysis, which limits the detection of low-abundance proteins. In contrast, SSM—employing spectral library matching—reduces database dependency and false positives, thereby enhancing sensitivity for low-abundance proteins.
When comparing the number of proteins identified by two methods—peptide spectrum matching in MSFragger and spectrum-to-spectrum search in ANN-SoLo—a higher number of proteins was identified using the spectrum-to-spectrum search. The number of proteins identified by both methods showed a positive correlation (R2 =0.589, P<2.0×10–16) (inner rectangle in Fig. 1D), indicating that most proteins identified in MSFragger were also identified in ANN-SoLo.
Impact of tissue archival time on the proteome
Proteomic results were analyzed with respect to tissue archival time, a factor of particular importance in designing large discovery-based studies to avoid potential sampling biases. The tissues were collected and archived between 2008 and 2018, with a median archival time of 7 years and a maximum of 11 years. Samples were categorized into two groups based on archival time: 6–11 years (n=164) and 1–5 years (n=38). A comparison of the total number of quantified proteins between these groups revealed that the short-term group (1–5 years) had approximately 9.48% more quantified proteins than the long-term group (6–11 years) (Fig. 1E). We investigated whether the lower identification rates were associated with increased levels of methionine oxidation, lysine methylation, formaldehyde adduct formation (common variable peptide modifications in FFPE tissue), and tyrosine iodination (a variable modification involved in thyroid hormone synthesis by TG) over time. We observed a general trend of higher peptide modification rates in the long-term storage samples, indicating progressive protein modification over time [31]. However, the difference, at approximately 1%, was insufficient to account for the 16% reduction in peptide identifications observed in the oldest samples. A previous study reported that tissues stored for over 20 years exhibited lower identification rates, attributed to compromised retrieval of low-abundance proteins [32]. Nevertheless, the impact on global protein quantification was minor, as evidenced by the high proteome correlations between the different archival age groups (r=0.93) (Fig. 1F).
Distinguishing three subtypes based on major pathways
We quantified 6,332 proteins across all 202 thyroid FFPE samples using close SSMs, leveraging the advantages of closed SSM-based label-free quantification to enhance consistency and reproducibility across different experiments and conditions. Partial least squares-discriminant analysis (PLS-DA) moderately separated the FTC, FA, and FND groups into two components: component 1 (explaining 9% of variance) distinguished FND from FA and FTC, while component 2 (2% of variance) differentiated FA from FTC and FND (Fig. 2A, Supplemental Fig. S3A). Proteomic results revealed that FA and FTC exhibited an expression signature consistent with the Warburg effect compared to FND. This signature included increased expression of glycolytic and gluconeogenic proteins such as hexokinase-1 (HK1), phosphoglucomutase 1 (PGM1), glucose-6-phosphate isomerase (GPI), fructose-1,6-bisphosphatase 1 (FBP1), phosphofructokinase (PFKM), aldolase A (ALDOA), triosephosphate isomerase 1 (TPI1), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), phosphoglycerate kinase 1 (PGK1), enolase 1 (ENO1), pyruvate kinase L/R (PKLR), pyruvate dehydrogenase beta subunit (PDHB), alcohol dehydrogenase 5 (ADH5), aldo-keto reductase family 1 member A1 (AKR1A1), lactate dehydrogenase A (LDHA), dihydrolipoamide S-acetyltransferase (DLAT), and phosphoenolpyruvate carboxykinase 2 (PCK2) (Fig. 2B). Notably, LDHA expression was elevated in FA and FTC, thereby accelerating the conversion of pyruvate to lactic acid. Using 50 hallmark gene sets from the Molecular Signatures Database (MSigDB) [33], enrichment scores for each sample were calculated using the singscore method. Correlation plots for 22 pathways, which exhibited significant differences in at least two groups as determined by one-way ANOVA, are presented (Fig. 2C). Specifically, MYC targets V1 and glycolysis pathways were upregulated in FA and FTC, whereas the inflammatory response pathway was downregulated in FTC and phosphoinositide 3-kinase (PI3K)/protein kinase B (AKT)/mammalian target of rapamycin (mTOR) signaling was downregulated in FA (Fig. 2D). The singscores for the remaining 18 pathways across samples are provided in Supplemental Fig. S3B.

Partial least squares–discriminant analysis (PLS-DA) and functional pathway analysis of 202 thyroid formalin-fixed, paraffin-embedded (FFPE) tissue proteome. (A) Two-dimensional score plot of the three thyroid subtypes: follicular nodular disease (FND; n=62, cyan), follicular adenoma (FA; n=72, green), and follicular thyroid carcinoma (FTC; n=68, red) based on PLS-DA. (B) Heatmap showing the relative quantitative values of proteins mapped to the glycolysis/gluconeogenesis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway for the three subtypes; color scale: –1 (green), 0 (gray), 1 (red). (C) Hierarchical Spearman’s rank correlation clustering of 22 hallmark gene sets (P<0.05 by one-way analysis of variance [ANOVA]) out of 50 hallmark gene sets. (D) Boxplots of four representative hallmark signal pathways using single-sample scores generated by the singscore algorithm. P values calculated using the Mann-Whitney U test. NS, not significant. aP<0.05; bP<0.01; cP<0.001.
Quantitative analysis between subtypes
We examined the differentially expressed proteins (DEPs) among the three groups. The first volcano plot, which plots log2 fold changes versus minus log10-adjusted P values, identified 28 proteins upregulated in FND and 379 proteins upregulated in FA (|log2 fold-change| >1 and adjusted P<0.05) (Fig. 3A). P values were adjusted using the Benjamini-Hochberg method to control the false discovery rate. The second volcano plot identified 43 proteins upregulated in FND and 482 proteins upregulated in FTC (|log2 fold-change| >1 and adjusted P<0.05) (Fig. 3B). The third volcano plot identified 59 proteins upregulated in FA and 59 proteins upregulated in FTC (|log2 fold-change| >1 and adjusted P<0.05) (Fig. 3C). Among the DEPs, analysis of the upregulated proteins in FND for both FA and FTC revealed two sets comprising 12 common proteins, 16 FA-specific proteins, and 31 FTC-specific proteins (Fig. 3D). Furthermore, the overlap between FA-upregulated DEPs for FND and FTC-upregulated DEPs for FND included 186 common proteins. By combining the exclusively upregulated and downregulated proteins from each comparison, we identified 224 FA-specific proteins (with subsets of 165, 28, and 31) and 311 FTC-specific proteins (with subsets of 252, 44, and 15). These proteins were significantly enriched in various Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Fig. 3E). FA-specific proteins were involved in enriched KEGG pathways such as the ribosome, proteasome, and tricarboxylic acid (TCA) cycle, as well as GO terms related to translation and protein targeting to the ER or membrane. Conversely, FTC-specific proteins were linked to KEGG pathways such as focal adhesion and spliceosome, and to GO terms associated with neutrophil immunity and platelet aggregation. The 186 common proteins were associated with energy metabolism pathways, including glycolysis, the TCA cycle, and mitochondrial functions. Furthermore, two specific protein lists for FA and FTC were analyzed using Metascape (Supplemental Fig. S4) [34]. The enrichment signal was stronger for FTC-specific proteins compared to FA-specific proteins for the most significant functional terms. Typically, the pathways enriched by proteins specific to both tissues include translation, cytokine signaling within the immune system, and mitochondrial translation. FTC-specific proteins were primarily associated with biological processes such as neutrophil degranulation, vascular endothelial growth factor A (VEGFA)-VEGFR2 signaling, amino acid metabolism, and regulation of DNA metabolic processes, whereas FA-specific proteins were predominantly involved in carbon metabolism (Supplemental Fig. S4A). Using the molecular complex detection algorithm in Metascape, we identified eight protein–protein interaction (PPI) network modules in FTC-specific proteins and nine modules in FA-specific proteins. Detailed results from the Metascape analysis are provided in Supplemental Fig. S4B, C.

Volcano plots and functional annotation of differentially expressed proteins (DEPs) among three thyroid subtypes: benign follicular nodular disease (FND; n=62, cyan), follicular adenoma (FA; n=72, green), and follicular thyroid carcinoma (FTC; n=68, red). (A) Volcano plot displaying log2 fold changes versus log10-adjusted P values for proteins comparing FND and FA; proteins upregulated by more than 2-fold (adjusted P<0.05) are highlighted. (B) Volcano plot comparing FND and FTC. (C) Volcano plot comparing FA and FTC. Cyan, green, and red circles represent upregulated proteins in FND, FA, and FTC, respectively, while gray circles indicate non-significant differences. (D) Venn diagram showing the overlap of DEPs among the three subtypes; the left side represents DEPs in FND, while the right side represents DEPs in FA and FTC. (E) Bar graphs visualizing significant Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO): biological process (BP) functional terms (with false discovery rate [FDR] q values) for FA and FTC compared to FND; functional terms in the Venn diagram (D) are color-coded accordingly (green, orange, and red). TCA, tricarboxylic acid.
Clinical unmet needs in predicting subtypes using proteomic-based ML models
The clinical objective of this study was to improve the accuracy of FTC screening through a novel proteomic-based method. To ensure statistical reliability, we focused on 592 proteins that were quantified in more than 70% of the thyroid tissue samples [35]. A total of 183 samples (90.6%) that were classified as Bethesda categories III or IV were subjected to further analysis. Univariate receiver operating characteristic (ROC) analysis was conducted on these 592 proteins (class 1; details in Supplemental Methods) for each of the three groups (Supplemental Fig. S5). MacroH2A.1 histone (MACROH2A1) (area under the ROC curve [AUROC]=0.720) emerged as the best discriminating protein for FTC, ubiquinol-cytochrome c reductase core protein 1 (UQCRC1) (AUROC=0.736) for FA, and dynein axonemal assembly factor 5 (DNAAF5) (AUROC=0.773) for FND.
To improve predictive performance and identify a meaningful combination of proteins to differentiate among the three subtypes, we generated three clinical models based on RF [36]. Using an AUC-based RF backward-elimination process [23], the FND prediction model selected five proteins (carnosine dipeptidase 2 [CNDP2], DNAAF5, dynein cytoplasmic 1 heavy chain 1 [DYNC1H1], phenylalanyl-tRNA synthetase subunit beta [FARSB], and programmed cell death 4 [PDCD4]; probability ≥ 0.65), the FA prediction model selected six proteins (DNAAF5, family with sequence similarity 149 member B1 [FAM149B1], ribosomal protein S9 [RPS9], transgelin 2 [TAGLN2], UPF1, and UQCRC1; probability ≥0.9), and the FTC prediction model selected seven proteins (actinin alpha 4 [ACTN4], destrin, actin depolymerizing factor [DSTN], MACROH2A1, nucleobindin 1 [NUCB1], spectrin alpha, non-erythrocytic 1 [SPTAN1], TAGLN, and X-ray repair cross complementing 5 [XRCC5]; probability ≥0.75).
RF models were constructed based on the features selected for each model (Supplemental Fig. S6). To assess the robustness of the ML models, the samples were randomly divided into thirds 100 times, with the model built using two-thirds of the samples and validated on the remaining third. The performance evaluation of the FND, FA, and FTC classifiers revealed median AUC values of 0.832 (95% confidence interval [CI], 0.824 to 0.839), 0.826 (95% CI, 0.817 to 0.835), and 0.870 (95% CI, 0.863 to 0.877), respectively (Fig. 4A-C). Furthermore, each sample was assigned a disease classification based on the highest of the three probability scores derived from the prediction models (Fig. 4D). This approach correctly classified 141 out of 183 subjects, yielding an accuracy of 77.05%. For FND samples (n=55), the accuracy was 87.98%, with a sensitivity of 80% and a specificity of 74.19%. For FA samples (n=66), accuracy reached 85.25%, with a sensitivity of 77.27% and a specificity of 89.74%. For FTC samples (n=62), accuracy was 80.87%, with a sensitivity of 74.19% and a specificity of 84.3%.

Receiver operating characteristic (ROC) curves of three subtype-specific random forest classifiers: follicular nodular disease (FND)-machine learning (ML), follicular adenoma (FA)-ML, and follicular thyroid carcinoma (FTC)-ML and prediction scores by the classifiers of the entire sample. ROC curves were generated using 100 repeats of threefold cross-validation, plotting the 25th, 50th, and 75th percentiles of sensitivities for each 1-specificity value. (A) ROC curve for the FND-ML model based on five proteins, evaluated in a set of 183 samples (55 FND and 128 others; area under the ROC curve [AUROC], 0.832; 95% confidence interval [CI], 0.824 to 0.839). (B) ROC curve for the FA-ML model based on six proteins, evaluated in 183 samples (66 FA and 117 others; AUROC, 0.826; 95% CI, 0.817 to 0.835). (C) ROC curve for the FTC-ML model based on seven proteins, evaluated in 183 samples (62 FTC and 121 others; AUROC, 0.870; 95% CI, 0.863 to 0.877). (D) Disease prediction scores for the entire sample, with each sample assigned a classification based on the highest three probability scores from the classifiers.
Iodinated protein-protein networks in clinical thyroid tissue samples
Open search proteomic results revealed that FFPE artifact modifications, including lysine or N-terminal methylation and formaldehyde adducts, accounted for 11% of the total PTMs (Supplemental Fig. S2). Iodine-related PTMs, constituting 0.4% of TG modifications, play an important role in thyroid hormone synthesis [37]. In addition to TG, 13 proteins were found to be iodinated, with most of these proteins directly or indirectly interacting with TG, suggesting a role in iodine transport (Fig. 5A). Proteins with the same iodination site—observed in five or more samples—were used to construct a PPI network via StringDB [38]. This network includes modifications at sites such as BTD-Y126, HBB-Y36, PKHD-Y756, CLU-H402, PDIA3-Y479, HBA1-Y43, H46, H51, H4C1-Y89, ALB-Y356, Y358, ALB-Y425, in addition to multiple TG sites (categorical chi-square test, P<0.05). Naturally, iodination can occur not only on tyrosine but also on histidine residues through chemical reactions [39]. Among the 66 potential TG tyrosine sites and 39 histidine sites, seven tyrosine sites (Y234, Y382, Y2194, Y2478, Y2540, Y2573, and Y2587) and two histidine sites (H1384 and H2487) exhibited significant differences in normalized spectral counts between thyroid tumor tissues (Fig. 5B).

Qualitative and quantitative differences in peptides of proteins with iodine modifications among three thyroid tumors: benign follicular nodular disease (FND; n=62, cyan), follicular adenoma (FA; n=72, green), and follicular thyroid carcinoma (FTC; n=68, red). (A) Protein-protein interaction network of 13 proteins with iodine modifications. The number of iodinated sites per protein for each tissue type is displayed in a bar plot or pie chart; P values were calculated using a categorical chi-square test. (B) Normalized spectral counts for nine significant iodinated sites on thyroglobulin (Y234, Y382, H1384, Y2194, Y2478, H2487, Y2540, Y2573, and Y2587), with P values calculated using the Mann-Whitney U test. An asterisk (*) indicates modifications that showed statistically significant differences among the three groups. aP<0.05; bP<0.01.
DISCUSSION
Approximately 15% to 30% of thyroid nodules are diagnosed as cytologically indeterminate by FNA, including categories such as AUS or FN (Bethesda class IV) [40,41]. Numerous genomic-based approaches have been explored to improve the presurgical diagnosis of these indeterminate nodules, with the aim of differentiating malignant from benign nodules. The most recent advancement in this field is molecular diagnostics [12-15]. Despite promising results, these tests are not widely used outside the United States, largely due to high costs, logistical challenges in sample shipping, and insurance coverage limitations [42,43]. Highly accurate MS-based clinical tests for diagnosing indeterminate nodules are greatly needed. Although proteomic techniques are not yet available clinically, our study suggests that they may hold significant promise.
Past studies have employed proteomic techniques such as two-dimensional electrophoresis [9,20], recombinant phages, synthetic peptide immunohistochemistry [18], and antibody arrays [19]. These methods have limitations, including lengthy multistep procedures that can result in batch effects and low reproducibility in quantification. The present study employed a high-resolution LC-MS/MS method to overcome these limitations, offering increased sensitivity and accuracy. Using this method, we identified a relatively large number of proteins even in tissue samples collected as early as 2008. For comparison, previous LC-MS/MS studies on thyroid tissues extracted 1,629 proteins from 32 samples [44], 2,865 proteins from 202 samples [11], and 4,107 proteins from 52 samples [45]. A comparison between the two methods—peptide spectrum matching via MSFragger and spectrum-to-spectrum matching via ANN-SoLo—revealed that the spectrum-to-spectrum search identified a greater number of proteins, as illustrated by the Venn diagram (Fig. 1B). For measuring a wide range of proteins using LC-MS, SSM-based analysis, although it cannot quantify proteins absent from the spectral library, is more effective than PSM-based analysis for quantifying low-abundance proteins in the presence of high-abundance proteins. Moreover, open SSM analysis enabled the specific examination of iodine-related PTMs in thyroid tumor tissues.
To discover diagnostic biomarkers that distinguish FTCs from benign thyroid neoplasms, we performed a comprehensive proteomic analysis using LC-MS/MS on 202 thyroid tissues (FND, FA, and FTC), identifying 6,332 proteins via closed SSMs and 8,341 via open SSMs. Volcano plot analysis of protein quantification revealed distinct features across the three subtypes, with several molecular pathways displaying differential expression. Application of an ML model to our proteomic data showed that a combination of just 17 proteins could distinguish the three subtypes with high sensitivity and specificity. Among the pathways that were differentially expressed, glycolysis and MYC targets V1 were upregulated in FA and FTC compared to FND. As shown in Fig. 2B, a hallmark of malignant tumors is the metabolic shift from oxidative metabolism to glycolysis, known as the Warburg effect [46]. Glucose transporters, key to glycolysis, are upregulated in the plasma membrane and can be detected using fluorine-18-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) [47]. Previous studies have reported elevated glucose metabolism in thyroid cancers [48] and have evaluated the role of 18F-FDG PET/CT in FTCs [49], which supports the upregulation of the glycolysis pathway observed in FTCs in our study. cMYC, a proto-oncogene located on chromosome 8q24.1, encodes a nuclear phosphoprotein that functions as both a transcription factor and a growth promoter [50]. It regulates approximately 15% of human genes and is implicated in about 20% of human malignancies [51]. Although limited, several studies have reported cMYC expression in FTCs [19,50,52-54].
One limitation of this study is the exclusion of other thyroid neoplasm types, such as follicular-variant papillary thyroid carcinoma, non-invasive follicular thyroid neoplasm with papillary-like nuclear features, or oncocytic tumors. Validation of the identified biomarkers in a clinical setting is essential for future research. Furthermore, validation in an independent external cohort and generalization of the ML model using these biomarkers are necessary to ensure robustness and clinical applicability.
In conclusion, our study identified potential diagnostic biomarkers for FTC and FA through comprehensive proteomic and ML analyses. The expression profile of 17 selected proteins effectively distinguished FTC, FA, and FND with high sensitivity and specificity. Further validation studies and the development of commercialized diagnostic kits could potentially usher in a new era in the presurgical diagnosis of FTC.
Supplementary Material
Supplemental Methods.
Supplemental Fig. S1.
Workflow of sample preparation of formalin-fixed, paraffin-embedded (FFPE) thyroid tissues and raw data processing to determine protein abundance. (A) FFPE tissue protein S-Trap digestion was performed after deparaffinization, protein extraction, and acetone precipitation. The 202 tissues consisted of three thyroid subtypes: benign follicular nodular disease (FND; n=62), follicular adenoma (FA; n=72), and follicular thyroid carcinoma (FTC; n=68). Digested peptide mixtures were detected using liquid chromatography–tandem mass spectrometry (LC-MS/MS). (B) Raw MS spectra were analyzed using MSFragger with the human Swiss-Prot protein sequence database and Approximate Nearest Neighbor Spectral Library (ANN-SoLo) with the spectrum library created from the output of the MSFragger software. SDS, sodium dodecyl sulfate.
Supplemental Fig. S2.
Numbers and percentages of identified peptide-spectrum matches (PSMs) at 1% false discovery rate. Spectra identified by MSFragger were verified with post-translational modification (PTM)-Shepherd. FFPE, formalin-fixed, paraffin-embedded.
Supplemental Fig. S3.
(A) Variable importance plot showing the top 20 most significant variables of component #1 and component #2 to classifying the three thyroid subtypes, benign follicular nodular disease (FND; n=62; cyan), follicular adenoma (FA; n=72; green), and follicular thyroid carcinoma (FTC; n=68), by partial least squares–discriminant analysis (PLS-DA). (B) The 18 ‘hallmark’ gene sets with P values less than 0.05 in one-way analysis of variance (ANOVA) tests are shown as boxplots of the three subtypes using single-sample scores generated by the singscore algorithm. The P values were calculated using the nonparametric Mann-Whitney U test. aP<0.05; bP<0.01, cP<0.001.
Supplemental Fig. S4.
Metascape analysis of tissue specific proteins (TSPs) of follicular adenoma (FA) (n=224) and follicular thyroid carcinoma (FTC) (n=311). (A) Hierarchical clustering of significant functional terms in the Gene Ontology (GO), Reactome, Kyoto Encyclopedia of Genes and Genomes (KEGG), and WikiPathway. (B) Eight protein–protein interaction modules and functional annotation of the corresponding modules in FA-TSPs by Metascape. (C) Nine protein–protein interaction modules and functional annotation of the corresponding modules in FTC-TSPs by Metascape.
Supplemental Fig. S5.
(A) Histogram of 592 absolute area under the curve values in univariate receiver operating characteristic (ROC) analysis comparing benign follicular nodular disease (FND; n=55) and other cases (n=128). (B) Area under the ROC curve (AUROC) table showing the top 10 highest AUROC values in benign. (C) Histogram of 592 absolute area under the curve values in univariate ROC analysis between benign follicular adenoma (FA; n=66) and other (n=117). (D) AUROC table showing the top 10 highest AUROC values in FA. (E) Histogram of 592 absolute area under the curve values in univariate ROC analysis comparing benign follicular thyroid carcinoma (FTC; n=62) and other cases (n=121). (F) AUROC table showing the 10 highest AUROC values in FTC.
Supplemental Fig. S6.
Feature selection process for identifying the optimal subset of 592 proteins for classifying binary outcome groups. (A) In benign follicular nodular disease (FND; n=55) vs. other cases (follicular adenoma [FA]+follicular thyroid carcinoma [FTC]; n=128), six proteins were selected. (B) In FA (n=66) vs. other cases (FND+FTC; n=117), seven proteins were selected. (C) In FTC (n=62) vs. other cases (FND+FA; n=121), seven proteins were selected. AUC, area under the curve; AUROC, area under the receiver operating characteristic curve.
Notes
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
ACKNOWLEDGMENTS
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2022-NR069128 and RS-2024-00454407) and by a grant (2024IP0054) from the Asan Institute for Life Sciences, Asan Medical Center, Seoul, Korea.
We thank Interface (Seoul, Korea) for technical support with AFA in FFPE sample preparation and all the patients who participated in this study.
Data sharing: All MS data have been deposited in the PRIDE archive (www.ebi.ac.uk/pride/archive/) PDX043154.
AUTHOR CONTRIBUTIONS
Conception or design: K.K., W.G.K. Acquisition, analysis, or interpretation of data: H.S.A., C.A.K., M.J.J., Y.M.L., T.Y.S., D. E.S., J.Y., J.M.S., Y.S.C., K.K., W.G.K. Drafting the work or revising: H.S.A., E.S., M.J.J., K.K., W.G.K. Final approval of the manuscript: H.S.A., E.S., C.A.K., M.J.J., Y.M.L., T.Y.S., D.E.S., J.Y., J.M.S., Y.S.C., K.K., W.G.K.