Sequencing Analysis Support Core (SASC)

Thanks to the rapid development of Next Generation Sequencing (NGS) technologies, the cost of sequencing genomes has dropped drastically over the past decade. For less than 1000 euros, it is now possible to obtain a high-quality personal genome. More targeted approaches such as Whole Exome Sequencing (WES) or RNA sequencing (RNAseq) cost just a few hundred euros per sample. NGS is now commonly used in research and clinical projects to provide an unprecedented view of the underlying genomic or transcriptomic landscape. However, the large volume of NGS data generated, at the gigabyte to terabyte scale, poses a major challenge for researchers and clinicians who want to work with NGS efficiently.

To tackle this challenge and build a structural solution at LUMC, the Sequencing Analysis Support Core (SASC) has been established as an expert team within the MOLEPI department to support NGS data analysis. The SASC team consists of experienced bioinformaticians who have a deep understanding of various biological and medical research domains and ample knowledge of data analysis algorithms, software development, and Linux-based high performance computing infrastructure. Most standard support provided by SASC is free of charge. For projects requiring customized development of tools and pipelines, alternative collaboration modes are also available. Questions and inquiries can be sent to sasc@lumc.nl.

Reproducible pipelines

The SASC team benchmarks and integrates state-of-the-art algorithms and tools developed by the bioinformatics community to build user-friendly, reproducible, and flexible data analysis pipelines. The current collection of NGS pipelines is implemented in the Workflow Description Language (WDL) and is available as the Open Source BIOWDL project (https://github.com/biowdl). These pipelines support a large variety of NGS-based data analyses, such as RNAseq, WES, WGS, microbial sequencing, and long read sequencing using PacBio or Nanopore.

Detection and annotation of genetic variants

Identification of single nucleotide variants (SNVs) and structural variants (SVs) using WES, WGS and RNAseq data is an essential step in many NGS-based research projects. However, due to sequencing errors and alignment bias, such analyses often suffer from a high rate of false positives. At SASC, we have integrated a number of tools to normalize, annotate, filter and intersect raw variants in order to generate a more confident short list of events suitable for functional follow-up. For SNVs, we mainly use the GATK best practice workflow, VEP, and VT tools, and integrate public databases such as SIFT, PolyPhen, CADD, and gnomAD. For SVs, we leverage several proven SV detection programs such as Delly, Clever, and Manta, together with the breakpoint-based SV merging program SURVIVOR.
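To illustrate the idea of normalizing and intersecting raw variants, here is a minimal Python sketch. It is not the SASC implementation: the function names `trim_alleles` and `consensus_variants`, the `min_support` parameter, and the caller labels are all hypothetical, and the allele trimming shown is only a simplified stand-in for full normalization, which also requires left-alignment against a reference genome.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A variant is identified by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def trim_alleles(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim shared trailing, then shared leading bases (hypothetical helper).

    At least one base is kept in each allele. This does NOT left-align
    indels, which would need the reference sequence.
    """
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1  # dropping a leading base shifts the start position
    return pos, ref, alt


def consensus_variants(
    calls: Dict[str, Iterable[Variant]], min_support: int = 2
) -> List[Variant]:
    """Return variants reported by at least `min_support` distinct callers."""
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for caller, variants in calls.items():
        for chrom, pos, ref, alt in variants:
            pos, ref, alt = trim_alleles(pos, ref.upper(), alt.upper())
            support[(chrom, pos, ref, alt)].add(caller)
    return sorted(v for v, callers in support.items() if len(callers) >= min_support)


if __name__ == "__main__":
    # Toy call sets; the caller names are placeholders, not real tool output.
    example_calls = {
        "caller_a": [("chr1", 100, "A", "G"), ("chr1", 200, "CTT", "CT"), ("chr2", 50, "G", "T")],
        "caller_b": [("chr1", 100, "a", "g"), ("chr1", 200, "CT", "C")],
        "caller_c": [("chr2", 60, "T", "C")],
    }
    print(consensus_variants(example_calls))
    # [('chr1', 100, 'A', 'G'), ('chr1', 200, 'CT', 'C')]
```

In this toy example, the deletion reported as CTT>CT by one caller and CT>C by another is recognized as the same event only after trimming, which is why normalization comes before intersection.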

Interactive transcriptomics analysis

RNAseq is a powerful tool for many applications, e.g., differential gene expression and pathway analysis, alternative splicing analysis, and fusion gene detection in cancer genomics. To provide an effective way of performing quality control and expression-based analysis, we have built an interactive analysis program using R-Shiny (https://github.com/LUMC/dgeAnalysis). It is a free and Open Source program that can be used by anyone who works with RNAseq data. Please feel free to contact the SASC team for support in getting started with this user-friendly tool.

Multiomics data management solution

Integration of multi-level omics data (genomics, transcriptomics, epigenomics, proteomics, and metabolomics) is a commonly used approach in large scale research projects across multiple biobanks. SASC has played an essential role in the data management work of several such projects, including BBMRI-BIOS, BBMRI-omics, and NCDC. We have developed databases and tools to support multiomics-based research. The data management effort in the BBMRI-omics project, which was mainly led by SASC, was awarded the Dutch Dataprize 2018.

Single cell analysis and cell type prediction

In collaboration with Prof. Susana Chuva de Sousa Lopes’ group, we have built data analysis pipelines for single cell transcriptomics using open source packages including Seurat and Monocle. Here, our research interests lie in Machine Learning based automatic cell type classification to unravel early embryonic development. Furthermore, we are keen to build atlas-based resources (e.g., http://www.keygenes.nl/) to publish and share our research data with the broader research community.