
Comparison of SNP-calling software combinations based on tobacco genome resequencing data

Assessment of SNP-calling pipelines using tobacco genome resequencing data

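  • Abstract: To select a software combination suited to analyzing large-scale tobacco genome resequencing data, differences among commonly used genome resequencing tools were compared. Whole-genome sequencing data of the tobacco cultivar K326 were filtered with three raw-read filtering tools (NGS QC Toolkit, Trimmomatic and ngsShoRT), and the filtered reads were then aligned to the reference genome of Honghuadajinyuan with two aligners (BWA and Bowtie2). Variants were called from all alignments with SAMtools, and the BWA alignments were additionally processed with GATK for variant calling, giving nine software combinations in total. Comparison of the nine combinations showed clear differences among their results, with the credible probability of each combination's results ranging from 55% to 71%. Compared with the other combinations, Trimmomatic_BWA_SAMtools took less analysis time, was simpler to operate and was more accurate, making it suitable for the preliminary processing of large-scale whole-genome resequencing data with high sequencing depth.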

     

    Abstract: To select a suitable software pipeline for analyzing large-scale tobacco genome resequencing data, nine SNP-calling pipelines were compared. Three standalone read-filtering packages (NGS QC Toolkit, Trimmomatic and ngsShoRT) were used to filter whole-genome sequencing data of the cultivar K326. The quality-filtered reads were mapped to the Hongda reference genome with two sequence aligners, BWA and Bowtie2. SAMtools was then used to call variants from all alignments, and GATK was additionally used to call variants from the BWA alignments. In total, nine independent VCF files containing SNPs and InDels were obtained. The outputs of the nine pipelines differed markedly, and the credible probabilities of the nine SNP-calling pipelines ranged from 55% to 71%. The Trimmomatic_BWA_SAMtools pipeline featured higher efficiency, easier operation and higher precision, and was therefore considered suitable for preprocessing large-scale genomic resequencing data.
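The count of nine pipelines follows from the design described in the abstract: three filters crossed with two aligners and SAMtools give six pipelines, and the three BWA alignments passed to GATK give three more. The short Python sketch below only enumerates these combinations to make the arithmetic explicit. The function name `enumerate_pipelines` and the underscore naming scheme (modelled on "Trimmomatic_BWA_SAMtools") are illustrative assumptions, not the authors' scripts, and no tool is actually invoked.

```python
from itertools import product

# Tools named in the abstract.
FILTERS = ["NGS QC Toolkit", "Trimmomatic", "ngsShoRT"]
ALIGNERS = ["BWA", "Bowtie2"]


def enumerate_pipelines():
    """Return (filter, aligner, caller) tuples for the nine pipelines.

    Hypothetical helper for illustration: it lists combinations only
    and does not run any of the tools.
    """
    pipelines = []
    # SAMtools calls variants from every filter/aligner combination: 3 x 2 = 6.
    for filt, aligner in product(FILTERS, ALIGNERS):
        pipelines.append((filt, aligner, "SAMtools"))
    # GATK is applied only to the BWA alignments: 3 more.
    for filt in FILTERS:
        pipelines.append((filt, "BWA", "GATK"))
    return pipelines


pipelines = enumerate_pipelines()
for filt, aligner, caller in pipelines:
    print("_".join((filt, aligner, caller)))
```

Listing the combinations this way makes it easy to check that each pipeline is distinct and that Bowtie2 alignments are never paired with GATK, as the abstract states.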

     
