Transcriptome diversity assessment of Gossypium arboreum (FDH228) leaves under control, drought and whitefly infestation using PacBio long reads

Farooq, Muhammad; Zahra Naqvi, Rubab; Amin, Imran; Ur Rehman, Atiq; Asif, Muhammad; Mansoor, Shahid


Alternative splicing (AS) and alternative polyadenylation (APA) are common mechanisms by which eukaryotes increase the complexity of their transcriptomes and, subsequently, their proteomes. Analysis of long-read transcriptomic data can reveal novel transcripts, splice sites, and AS or APA events. Gossypium arboreum is an important cultivated cotton species and a putative contributor of the A sub-genome to modern tetraploid cotton, and it is inherently tolerant to several biotic and abiotic stresses. In particular, its variety ‘FDH228’ is considered an important source of resistance. In this study, we sequenced the G. arboreum (var. FDH228) transcriptome using PacBio IsoSeq and Illumina short-read sequencing under three conditions, i.e., untreated/healthy, biotic stress through whitefly infestation, and abiotic stress via water deprivation, to discover and survey canonical and non-canonical AS, APA, and transcript fusion events. We obtained 15,419 unique transcripts from all samples, representing 11,343 genes, out of which 10,832 were annotated and 520 were novel with respect to the published reference genome. These transcripts were grouped into structural categories: 60 antisense, 11,959 full-splice match, 999 incomplete-splice match, 30 fusion, 177 genic, 479 intergenic, 771 novel in catalog, and 944 novel not in catalog. Subsequently, randomly selected candidate transcripts were experimentally validated using qRT-PCR. Our comprehensive identification of canonical and non-canonical splicing events, and of novel and fusion transcripts, aids in understanding the resistance mechanisms of this specific germplasm.