Alternative Polyadenylation in Human Diseases
Abstract
Messenger RNA (mRNA) 3′-untranslated regions of varying length are generated by the alternative usage of polyadenylation sites during pre-mRNA processing. This phenomenon is prevalent across eukaryotes and has emerged as a key mechanism for controlling gene expression. Alternative polyadenylation (APA) plays an important role in cell growth, proliferation, and differentiation. In this review, we discuss the functions of APA in relation to various physiological conditions, including cellular metabolism, mRNA processing, and protein diversity, in a variety of disease models. We also discuss the molecular mechanisms underlying APA regulation, such as variations in the concentration of mRNA processing factors and RNA-binding proteins, as well as global transcriptome changes under cellular signaling pathways.
INTRODUCTION
Eukaryotic pre-messenger RNA (mRNA) processing comprises three major steps: 5′-capping, 3′-end cleavage/polyadenylation, and RNA splicing. Among them, polyadenylation completes the maturation of almost all eukaryotic mRNAs [1]. It is a two-step reaction consisting of an endonucleolytic cleavage of the pre-mRNA followed by the addition of untemplated adenosines, the poly(A) tail. The poly(A) tail plays a key role in mRNA stability, nuclear export, and translational control [2]. The main machinery responsible for designating a poly(A) site includes several recognition, cleavage, and polyadenylation factors, such as cleavage stimulation factor (CSTF), cleavage and polyadenylation specificity factor (CPSF), cleavage factors I and II (CFI and CFII), and poly(A) polymerase (PAP); this machinery generally recognizes and acts on an AAUAAA hexamer or its variants in the pre-mRNA (Fig. 1A) [1,2,3,4]. This mechanism requires the coordinated activities of many RNA-binding proteins (RBPs) as well as specific sequence elements in the pre-mRNA, which guide the trans-acting factors to form 3′-end processing complexes at the cleavage site. Malfunctions of this mechanism can cause abnormal gene expression and may lead to severe diseases [5]. Numerous genes generate multiple mRNA isoforms as a result of alternative polyadenylation (APA) in the 3′-untranslated region (UTR) or in intronic regions. The varying length of the 3′-UTR created by APA is a detectable target for differential regulation and affects the fate of the transcript, ultimately modulating the expression of the gene (Fig. 1B) [6,7]. Recently, many studies have focused on identifying the role of APA in gene expression and its impact on a variety of biological conditions as well as on several diseases [5,6,7]. Those studies determined that abnormalities in the 3′-end processing mechanisms represent a common characteristic among many endocrine, hematological, oncological, immunological, and neurological diseases.
In this review, we discuss the molecular features of APA in different cellular states and diseases.
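As an illustration of the hexamer recognition described in this section, the following minimal Python sketch scans an RNA sequence for candidate polyadenylation signals. The function name, the toy sequence, and the short list of variant hexamers (AUUAAA and AGUAAA alongside the canonical AAUAAA) are assumptions made for illustration only; real poly(A) site prediction also depends on upstream and downstream sequence elements and on the processing factors discussed above.

```python
import re

# Canonical hexamer plus two commonly cited variants. This short list is an
# illustrative assumption, not an exhaustive catalogue of PAS variants.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA", "AGUAAA")


def find_pas_hexamers(rna, hexamers=PAS_HEXAMERS):
    """Return sorted (0-based position, hexamer) pairs for every occurrence."""
    # Accept DNA or RNA input in either case by normalizing to upper-case RNA.
    rna = rna.upper().replace("T", "U")
    hits = []
    for hexamer in hexamers:
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer(f"(?={hexamer})", rna):
            hits.append((match.start(), hexamer))
    return sorted(hits)


# Hypothetical toy 3'-UTR fragment containing two candidate signals.
toy_utr = "GCCAUUAAAGGCUUCCGGAAUAAAGCUUGC"
for position, hexamer in find_pas_hexamers(toy_utr):
    print(position, hexamer)
```

A match found this way is only a candidate signal; whether it is used depends on the cellular context reviewed in the following sections.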
ENDOCRINE DISEASES
With the development of high-end sequencing techniques, it has been revealed that APA is responsible for several endocrine diseases. One example involves the steroidogenic acute regulatory (StAR) gene, whose encoded protein mediates the key step of cholesterol delivery to the mitochondrial inner membrane in steroidogenic tissues [8,9,10]. Two StAR transcripts of different lengths, which arise from APA in the 3′-UTR, are expressed at the same level in adrenal cells in the normal state [8]. However, upon activation by a cyclic AMP analog, Br-cAMP (bromoadenosine 3′,5′-cyclic monophosphate), which can activate cholesterol metabolism, the longer and less stable form of the StAR transcript is preferentially produced using the distal polyadenylation signal (PAS) (Table 1) [9]. Thus, regulation of StAR at the mRNA level seems to control the level of this critical endocrine regulator in an acute manner. Although congenital adrenal hyperplasia caused by mutations in StAR has already been reported [10,11], these findings indicate that impairment of its RNA processing mechanism can also induce alterations in cholesterol metabolism.
In hyperglycemia, the high-glucose-regulated gene 14 (HGRG-14) is differentially expressed through changes in its APA pattern at different glucose levels. Under normal conditions, cells express only the short isoform of HGRG-14. However, after incubation in hyperglycemic media for 2 hours, the longer mRNA isoform begins to be expressed. The lengthening of HGRG-14 mRNA under hyperglycemic conditions leads to rapid degradation of the mRNA, which might be due to the presence of five AU-rich elements (AREs) in the 3′-UTR (Table 1) [12]. Taken together, these results indicate that cells possess a tolerance system against hyperglycemic conditions that operates by controlling the 3′-UTR length of HGRG-14 mRNA [12].
Another example of gene regulation by APA is transcription factor 7-like 2 (TCF7L2), a member of the T-cell factor/lymphoid enhancer factor (TCF/LEF) family [13,14]. TCF7L2 has been highly correlated with type 2 diabetes. Unlike the case of HGRG-14, a truncated TCF7L2 mRNA is produced by intronic APA and increases to amounts comparable to those of the full-length transcript in human tissues with type 2 diabetes. Similar observations made in other studies indicate that the loss of function of TCF7L2 caused by intronic APA might underlie the predisposition to diabetes [15,16,17].
HEMATOLOGICAL DISORDERS
Alterations in the usage of PASs in the 3′-UTR have been linked to numerous cases of hematological diseases. Most of the cases were initially linked to thalassemia, a group of inherited autosomal recessive hematological disorders mostly caused by deficient synthesis of one or more of the hemoglobin chains. These studies showed that mutations in the PAS severely affect globin gene expression and are responsible for the hematological disease.
Several decades ago, a single point mutation in the canonical PAS sequence (AATAAA to AACAAA) of the human β-globin gene was reported in β-thalassemia patients. This mutation produced an elongated mRNA isoform of β-globin by using a canonical PAS located 900 nucleotides downstream of the mutation site [18]. Later, two other mutations in the β-globin PAS, a deletion (AATAAA to A—) and a point mutation (AATAAA to AATAAG), were also found. The point mutation leads to the production of four different elongated mRNA isoforms and causes a significant decrease in the expression of β-globin mRNA (Table 1) [19].
In the case of α-thalassemia, a single nucleotide replacement (AATAAA to AATAAG) in the PAS of α2-globin was identified. In this case, it was reported that the expression of the downstream α1-globin gene was deactivated by unknown mechanisms [20]. Follow-up studies revealed that another mutation in the PAS of the α2-globin gene, a deletion of two nucleotides (AATAAA to AATA–), causes downregulation of the α2-globin gene. Because this mutation impairs RNA polymerase II transcription termination, the expression of the α1-globin gene was also affected. These studies identified the major role of the PAS in the expression of the α2-globin gene and showed how it in turn affects the expression of neighboring genes such as α1-globin [21].
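The point mutations described in this section can be located by comparing each variant hexamer with the canonical AATAAA sequence. The short Python sketch below does this for the two substitutions quoted above; the function name is hypothetical, and the deletions are deliberately left out because their exact positions are not specified in the text.

```python
CANONICAL_PAS = "AATAAA"  # DNA form of the AAUAAA hexamer


def pas_substitutions(variant, reference=CANONICAL_PAS):
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(variant) != len(reference):
        # Deletions and insertions need an alignment step that is out of scope here.
        raise ValueError("only same-length (substitution) variants are handled")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(
            zip(reference, variant.upper()), start=1
        )
        if ref_base != alt_base
    ]


# The two substitutions quoted in the text for the globin PAS.
for variant in ("AACAAA", "AATAAG"):
    print(variant, pas_substitutions(variant))
```

The sketch only reports where a variant differs from the canonical hexamer; it says nothing about how strongly a given substitution impairs 3′-end processing, which has to be determined experimentally.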
PROLIFERATION AND TUMORIGENIC CONDITIONS
Recently, several studies have shown that mRNAs in highly proliferative or tumorigenic cells have shorter 3′-UTRs, leading to a potential escape from microRNA (miRNA)- or RBP-mediated repression of protein synthesis. Therefore, in many cases 3′-UTR shortening can increase the protein levels of oncogenes, such as cyclin D1 (CCND1), without increasing their mRNA levels [7,22].
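The escape from miRNA-mediated repression can be pictured as a loss of binding sites when the proximal PAS is used. The sketch below counts occurrences of a site in the short and long isoforms of a toy 3′-UTR. The sequence, the cleavage position, and the 7-nucleotide motif standing in for a miRNA seed match are all hypothetical, and real target prediction involves far more than exact motif matching.

```python
def count_sites(utr, site):
    """Count (possibly overlapping) occurrences of a site in a 3'-UTR sequence."""
    count = 0
    start = utr.find(site)
    while start != -1:
        count += 1
        start = utr.find(site, start + 1)
    return count


def sites_by_isoform(long_utr, proximal_cleavage, site):
    """Compare site counts in the short (proximal PAS) and long (distal PAS) isoforms."""
    # The short isoform ends at the proximal cleavage position.
    short_utr = long_utr[:proximal_cleavage]
    return {
        "short": count_sites(short_utr, site),
        "long": count_sites(long_utr, site),
    }


# Hypothetical toy 3'-UTR with a proximal and a distal AAUAAA hexamer,
# and an arbitrary 7-nt motif standing in for a miRNA seed match.
toy_long_utr = "GGCAUAAUAAAGCCUUGCUACCUCAGGAUCUACCUCAUUGAAUAAAGC"
toy_site = "CUACCUC"
print(sites_by_isoform(toy_long_utr, proximal_cleavage=14, site=toy_site))
```

In this toy case both copies of the motif lie downstream of the proximal cleavage position, so only the long isoform retains them.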
It has been shown that the alterations in the expression of 3′-end processing factors in various cellular conditions affect the APA pattern. A recent study on glioblastoma uncovered a strong correlation between the cleavage factor CPSF5 and tumor development. In this study, the authors showed that the downregulation of CPSF5 leads to transcriptome-wide 3′-UTR shortening, indicating that CPSF5 functions as a repressor of proximal PAS selection. Moreover, the study also found that CPSF5 downregulation-mediated 3′-UTR shortening leads to the upregulation of several tumor-related genes like CCND1. On the other hand, the upregulation of CPSF5 impaired cell growth and inhibited cellular invasion. These results suggest a role for CPSF5 as a potential tumor suppressor through the control of APA [23].
Furthermore, a recent study showed that the hyper-activation of mammalian target of rapamycin complex 1 (mTORC1) in cells induces global 3′-UTR shortening in the transcriptome. Interestingly, many genes related to the ubiquitin-mediated proteolysis pathway showed 3′-UTR shortening and increased protein synthesis without changes in the level of the corresponding mRNAs in our data analysis. Further polysome profiling experiments indicated that the increase in protein levels is indeed mediated by 3′-UTR shortening. An interesting implication of this finding is that mTORC1-mediated 3′-UTR shortening may foster a tumorigenic environment, because many of the genes showing 3′-UTR shortening in the ubiquitin-mediated proteolysis pathway are specific E2 or E3 ligases that selectively target tumor suppressors or cell cycle regulators [24].
During colorectal cancer development, APA events in a distinct group of genes, including dermokine (DMKN), pyridoxal kinase (PDXK), and peptidylprolyl isomerase E (PPIE), were identified at different stages such as normal mucosa, adenoma, and carcinoma. The DMKN and PPIE genes produced mRNAs with shortened 3′-UTRs by using the proximal PAS during the transformation of normal mucosa into adenoma or carcinoma. The PDXK and PPIE genes preferentially use the proximal PAS rather than the distal ones during the development of carcinoma from adenoma (Table 1) [25].
In contrast to cell proliferation, cell differentiation and embryonic development are associated with increased expression of mRNA isoforms with long 3′-UTRs, which are generated by using distal PASs. Interestingly, the long 3′-UTR mRNA isoforms upregulated during cell differentiation and embryonic development generally share the molecular signature of a strong distal PAS and a weak proximal PAS, implying that PAS strength is an important determinant of PAS selection during differentiation and development [26,27].
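The 3′-UTR shortening and lengthening discussed in this section are often summarized as a change in the fraction of transcripts that use the proximal PAS. The following Python sketch computes that fraction from read counts at the two sites. The function names and the read counts are hypothetical, and published methods differ in how they define and normalize such an index.

```python
def proximal_usage(proximal_reads, distal_reads):
    """Fraction of 3'-end reads mapping to the proximal PAS (between 0 and 1)."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("no reads at either site")
    return proximal_reads / total


def usage_shift(control, condition):
    """Change in proximal usage; positive values indicate 3'-UTR shortening."""
    return proximal_usage(*condition) - proximal_usage(*control)


# Hypothetical (proximal, distal) read counts for a single gene.
control = (20, 80)
proliferating = (60, 40)
print(usage_shift(control, proliferating))
```

A positive shift corresponds to the shortening seen in proliferating or tumorigenic cells, whereas a negative shift corresponds to the lengthening seen during differentiation and development.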
INFECTION AND IMMUNOLOGICAL CONDITIONS
Immune responses are critical for controlling immunological and infectious diseases through the induction of inflammatory reactions. This requires coordinated and timely cellular responses involving the modulation of the expression of various genes. Any deregulation of this process would affect the magnitude and duration of the inflammatory reactions and can lead to pathological conditions.
During B-cell differentiation, the immunoglobulin M (IgM) heavy-chain mRNA (µ) produces two distinct transcripts, generated by APA, that encode either a membrane-bound or a secreted form of the antibody. Pre-B or B-cells splice out the PAS-containing intron during IgM heavy-chain pre-mRNA processing and use the distal PAS for polyadenylation, generating a full-length mRNA that encodes the µ membrane receptor region. In contrast, differentiated plasma cells use the proximal PAS, located in the intron upstream of the exons encoding the membrane receptor region, during the processing of IgM heavy-chain pre-mRNA. This results in a truncated IgM heavy-chain mRNA that lacks the membrane receptor region and thus encodes the secreted µ form (Table 1) [28,29]. The mRNA isoforms of nuclear factor of activated T-cells 1 (NF-ATC1) are differentially controlled during T-cell activation. Three different isoforms of NF-ATC1 are generated by the coordination of splicing and APA events. Generally, two longer isoforms (NF-ATC1/B and NF-ATC1/C) are synthesized in naive T-cells, whereas in effector T-cells the shortest mRNA isoform (NF-ATC1/A) is produced by using the proximal PAS [30].
Systemic lupus erythematosus (SLE), also known as lupus, is a multisystem autoimmune disease that co-segregates in families with other autoimmune disorders. Inhibition of apoptosis of regulatory T-cells has been associated with autoimmunity and is thought to contribute to the lymphopenia in SLE [31,32]. Recently, GTPase, IMAP family member 5 (GIMAP5) was identified as a key player in lymphopenia and primary T-cell apoptosis, and a single nucleotide polymorphism (SNP) in the first proximal PAS of the GIMAP5 3′-UTR was found in most SLE patients. This particular SNP produces a long 3′-UTR isoform of GIMAP5 mRNA through inefficient transcription termination, suggesting that the long GIMAP5 transcript is highly correlated with susceptibility to SLE (Table 1) [33,34].
The IPEX syndrome (immune dysregulation, polyendocrinopathy, enteropathy, X-linked) is an autoimmune disease related to the hyperactivation of T-cells caused by a mutation in the forkhead box P3 gene (FOXP3) [35,36]. Analysis of the FOXP3 3′-UTR showed that an A to G transition in the canonical PAS after the stop codon (AATAAA to AATGAA) appears in patient samples but not in unaffected control groups. This transition mutation caused downregulation of FOXP3 in affected individuals (Table 1) [37]. Therefore, defective FOXP3 pre-mRNA processing resulting from the PAS mutation was suggested as a cause of IPEX.
NEUROLOGICAL DISEASES
The development of the nervous system is a very complicated process that involves a highly regulated network of gene expression. APA events that result in longer 3′-UTRs have been identified in embryonic development [38], neuronal differentiation [26], and the development of the central nervous system [39,40]. This development- or brain-specific 3′-UTR lengthening increases the complexity of the regulatory networks that could be imposed by the abundance of miRNAs or RBPs interacting with 3′-UTRs [41].
Mutations in the αSyn gene have been correlated with Parkinson disease (PD) [42,43,44]. It has been demonstrated that the αSyn mRNA isoform with an extended 3′-UTR, αSynL, is upregulated in brain tissues of PD patients and leads to αSyn protein aggregation and to a change in the subcellular localization of αSyn protein from synaptic terminals to mitochondria, consistent with PD pathology (Table 1). Interestingly, it was also reported that αSynL mRNA levels are increased by dopamine treatment in dopaminergic neurons, implying that this neurotransmitter could act as a modulator of αSyn APA [45].
Oculopharyngeal muscular dystrophy (OPMD) is an autosomal dominant neuromuscular disorder characterized by progressive eyelid drooping (ptosis), difficulty in swallowing (dysphagia), proximal limb weakness, and inclusions in muscle fibers [46,47]. In dominant OPMD, a trinucleotide repeat expansion mutation in the polyA-binding protein nuclear 1 (PABPN1) gene results in an expanded N-terminal polyalanine tract in PABPN1 (Table 1) [48]. Recent studies showed that PABPN1 binds close to proximal PASs in 3′-UTRs and introns and blocks the usage of the nearby PAS for cleavage and polyadenylation. These findings suggest that PABPN1 acts as an APA suppressor. Therefore, loss of function of PABPN1 by downregulation or by an OPMD mutation (PABPN1 with a 17-alanine tract) leads to genome-wide usage of proximal PASs and results in 3′-UTR shortening in the transcriptome [49].
Huntington's disease (HD) is a neurodegenerative disorder caused by the extension of the polyglutamine tract in huntingtin (HTT) exon 1 (Table 1) [50]. APA events can produce three isoforms of HTT mRNA. The shortest isoform can only be found in HD patients and mouse models, indicating that it could be associated with HD. The shorter 3′-UTR isoform prevails in growing cells, while the longest isoform is found in nondividing cells. Interestingly, each HTT isoform has a different half-life, localization pattern, and set of RBP- and microRNA-binding sites. Moreover, knockdown of the RBP CNOT6 (CCR4-NOT transcription complex subunit 6) leads to HTT mRNA isoform changes similar to those found in the HD motor cortex. These results indicate that the expression of different 3′-UTR isoforms from a distinct group of genes, including HTT, is one of the molecular features of HD (Table 1) [51].
MECHANISMS OF APA
It has been reported that the process of APA is tightly controlled by the strength and availability of cis-acting sequence elements, as well as by the concentration and activity of trans-acting proteins [7]. The differential expression of CSTF2 during B-cell activation provided one of the first mechanistic insights into APA [52]. When B-cells are activated, the CSTF2 level is highly upregulated, which leads to preferential usage of a weaker PAS in many genes. Also, a transcriptome-wide study revealed that the expression level of CSTF2 is positively correlated with the tendency toward global mRNA 3′-UTR shortening [30,53]. In this case, CSTF2 can bind more effectively to downstream sequence elements near the proximal PAS and thereby facilitate the usage of a weak proximal PAS (Fig. 1B) [54]. It was also reported that the expression level of CSTF2 is upregulated upon T-cell receptor (TCR) stimulation. In naive T-cells, CSTF2 expression is lower and thus proximal PASs are generally not used efficiently. During TCR stimulation, the CSTF2 level increases and the usage of the proximal PAS increases accordingly [30].
CPSF5 and CPSF6 have also been identified as APA factors. It has been reported that the loss of function or knockdown of CPSF5 or CPSF6 can lead to global usage of proximal PASs [55,56,57]. Transcriptome-wide localization of CPSF5 and CPSF6 suggested the following mechanism of APA regulation by these proteins: high-affinity binding sites for CPSF5 or CPSF6 reside in the upstream sequence elements (USEs) of PASs, and the association of CPSF5 and CPSF6 sterically hinders the access of other 3′-end processing enzymes (Fig. 1B). As such, the downregulation of CPSF5 and CPSF6 exposes the USEs to 3′-end processing complexes and promotes the usage of proximal PASs [56].
Other 3′-end processing factors and RBPs have also been shown to be involved in the regulation of APA. For example, PABPN1, as mentioned above, functions as a suppressor of 3′-UTR shortening [49]. U2AF2 (U2 small nuclear RNA auxiliary factor 2), a splicing factor that binds the polypyrimidine tract near the 3′-splice site, has been shown to interact with CFI to facilitate 3′-end formation near polypyrimidine tracts [58]. U1 small nuclear ribonucleoprotein (snRNP), the splicing factor that defines 5′ splice sites, was reported to inhibit PAS usage via direct interaction with PAPα. Interestingly, U1 snRNP also suppresses premature termination of transcription at PASs located in introns, and this activity is linked to the genome-wide determination of mRNA length. Therefore, in addition to its role in splicing, U1 snRNP blocks premature cleavage and polyadenylation at proximal PASs in introns [59,60].
CONCLUSIONS
As a post-transcriptional regulatory mechanism, APA is one of the critical events regulating various molecular aspects of mRNA metabolism. The number of transcripts known to undergo APA has increased notably owing to advances in technologies for genome-wide studies. However, the biological roles of global APA events and the mechanisms underlying them are still unclear. To better understand the functional roles and significance of APA events in particular cellular processes, biological model systems as well as comprehensive analysis tools for APA are needed. Understanding the upstream cellular pathways that regulate APA events, and how these molecular events are connected, would also be of interest.
ACKNOWLEDGMENTS
This work was supported by the National Institutes of Health (1R01GM113952-01A1) and the Department of Defense–Congressionally Directed Medical Research Programs (W81XWH-16-1-0135) to Jeongsik Yong.
Notes
CONFLICTS OF INTEREST: No potential conflict of interest relevant to this article was reported.