10th International Virology Summit
University of Szeged, Hungary
Title: Developing a bioinformatic package for fast identification of viral transcripts using long-read sequencing methods
Biography: Attila Szucs
The Oxford Nanopore Technologies MinION and the Pacific Biosciences Sequel and RSII long-read sequencing platforms are capable of sequencing full-length transcripts, although several limitations remain. Sequencing depth is much lower than with short-read sequencing methods. One of the main problems is distinguishing degraded mRNA molecules from full-length transcripts. This problem can be solved relatively easily for abundant genes, but it is difficult for low-abundance ones. Furthermore, it is not a trivial task to distinguish between sequences that contain false introns and those that contain true ones. False introns are generated by the strand-switching effect of reverse transcriptase. False priming sites in the genome are another source of error. Currently available mapping software does not always provide the most significant matches. To circumvent these problems, we have written several routines. These scripts are able to correct the alignments at the ends of mapped reads and remove extra false exons. Our scripts add extra parameters to the BAM files, which contain information about the positions and similarity of the adaptor sequences. The routines use statistical methods to distinguish between real and false transcription start and end sites (TSSs and TESs). The program removes potential TSSs and TESs derived from false priming. Moreover, reads that contain false introns are also removed by our routines. Finally, our programs are able to collect the “real” transcripts and their abundance.
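Two of the filters described above, removing 3' ends that arise from false priming and removing reads with false introns, can be illustrated with a minimal Python sketch. This is not the package presented here. It assumes two commonly used heuristics that the abstract does not specify: an A-rich genomic stretch immediately downstream of a read end suggests internal oligo(dT) priming, and an intron that lacks canonical splice dinucleotides but is flanked by a direct repeat suggests strand switching. All function names, coordinates (0-based, forward strand) and thresholds are hypothetical.

```python
# Hypothetical heuristics for illustration only; names and thresholds are assumptions.

CANONICAL_SPLICE_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def is_internal_priming(genome, read_end, window=20, min_a_fraction=0.7):
    """Return True if the genomic window just downstream of the read's 3' end is A-rich."""
    downstream = genome[read_end:read_end + window].upper()
    if not downstream:
        return False
    return downstream.count("A") / len(downstream) >= min_a_fraction


def has_canonical_splice_sites(genome, intron_start, intron_end):
    """Check donor/acceptor dinucleotides of an intron given as a half-open interval."""
    donor = genome[intron_start:intron_start + 2].upper()
    acceptor = genome[intron_end - 2:intron_end].upper()
    return (donor, acceptor) in CANONICAL_SPLICE_PAIRS


def flanking_repeat_length(genome, intron_start, intron_end, max_len=12):
    """Length of the direct repeat shared by the intron start and the sequence after the intron."""
    seq = genome.upper()
    n = 0
    while (n < max_len and intron_end + n < len(seq)
           and seq[intron_start + n] == seq[intron_end + n]):
        n += 1
    return n


def is_suspected_false_intron(genome, intron_start, intron_end, min_repeat=6):
    """Flag an intron that is non-canonical and flanked by a direct repeat."""
    if has_canonical_splice_sites(genome, intron_start, intron_end):
        return False
    return flanking_repeat_length(genome, intron_start, intron_end) >= min_repeat


if __name__ == "__main__":
    spliced = "CCGGCC" + "GTTTCCAG" + "TTGGCC"          # intron at 6..14 with GT...AG
    switched = "CC" + "ACGTACGT" + "TTTTCC" + "ACGTACGT" + "GG"  # "intron" at 2..16
    primed = "CCGGCCGG" + "A" * 15 + "CGCGC"            # read ends at 8, before an A-run
    print(is_suspected_false_intron(spliced, 6, 14))
    print(is_suspected_false_intron(switched, 2, 16))
    print(is_internal_priming(primed, 8))
```

In practice the intron coordinates and read ends would be taken from the alignments in the BAM file, and the thresholds would need to be tuned to the genome and library preparation being analysed.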