The CompOmics group, headed by Prof. Dr. Lennart Martens, is part of the Department of Biomolecular Medicine of the Faculty of Medicine and Health Sciences of Ghent University, and the VIB-UGent Center for Medical Biotechnology of VIB, both in Ghent, Belgium.
The group has its roots in Ghent, but has active members all over Europe, and specializes in the management, analysis and integration of high-throughput Omics data with an aim towards establishing solid data stores, processing methods and tools to enable downstream systems biology research.
The CompOmics team is always looking for talented people. Go to the jobs section on the VIB website to look for open positions.
The following web applications are developed and hosted by the group.
Here is a selection of our free and open-source tools. A full list can be found on GitHub.
Do you want to learn about Proteomics and Proteomics data analysis? Have a look at our CompOmics tutorials:
Molecular & Cellular Proteomics, 2021
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.Read article
JACS AU, 2021
Rising population density and global mobility are among the reasons why pathogens such as SARS-CoV-2, the virus that causes COVID-19, spread so rapidly across the globe. The policy response to such pandemics will always have to include accurate monitoring of the spread, as this provides one of the few alternatives to total lockdown. However, COVID-19 diagnosis is currently performed almost exclusively by reverse transcription polymerase chain reaction (RT-PCR). Although this is efficient, automatable, and acceptably cheap, reliance on one type of technology comes with serious caveats, as illustrated by recurring reagent and test shortages. We therefore developed an alternative diagnostic test that detects proteolytically digested SARS-CoV-2 proteins using mass spectrometry (MS). We established the Cov-MS consortium, consisting of 15 academic laboratories and several industrial partners to increase applicability, accessibility, sensitivity, and robustness of this kind of SARS-CoV-2 detection. This, in turn, gave rise to the Cov-MS Digital Incubator that allows other laboratories to join the effort, navigate, and share their optimizations and translate the assay into their clinic. As this test relies on viral proteins instead of RNA, it provides an orthogonal and complementary approach to RT-PCR using other reagents that are relatively inexpensive and widely available, as well as orthogonally skilled personnel and different instruments. Data are available via ProteomeXchange with identifier PXD022550.Read article
Nature Communications, 2021
Read article The placenta is the interface between mother and fetus and inadequate function contributes to short and long-term ill-health. The placenta is absent from most large-scale RNA-Seq datasets. We therefore analyze long and small RNAs (~101 and 20 million reads per sample respectively) from 302 human placentas, including 94 cases of preeclampsia (PE) and 56 cases of fetal growth restriction (FGR). The placental transcriptome has the seventh lowest complexity of 50 human tissues: 271 genes account for 50% of all reads. We identify multiple circular RNAs and validate 6 of these by Sanger sequencing across the back-splice junction. Using large-scale mass spectrometry datasets, we find strong evidence of peptides produced by translation of two circular RNAs. We also identify novel piRNAs which are clustered on Chr1 and Chr14. PE and FGR are associated with multiple and overlapping differences in mRNA, lincRNA and circRNA but fewer consistent differences in small RNAs. Of the three protein coding genes differentially expressed in both PE and FGR, one encodes a secreted protein FSTL3 (follistatin-like 3). Elevated serum levels of FSTL3 in pregnant women are predictive of subsequent PE and FGR. To aid visualization of our placenta transcriptome data, we develop a web application (
The placenta is the interface between mother and fetus and inadequate function contributes to short and long-term ill-health. The placenta is absent from most large-scale RNA-Seq datasets. We therefore analyze long and small RNAs (~101 and 20 million reads per sample respectively) from 302 human placentas, including 94 cases of preeclampsia (PE) and 56 cases of fetal growth restriction (FGR). The placental transcriptome has the seventh lowest complexity of 50 human tissues: 271 genes account for 50% of all reads. We identify multiple circular RNAs and validate 6 of these by Sanger sequencing across the back-splice junction. Using large-scale mass spectrometry datasets, we find strong evidence of peptides produced by translation of two circular RNAs. We also identify novel piRNAs which are clustered on Chr1 and Chr14. PE and FGR are associated with multiple and overlapping differences in mRNA, lincRNA and circRNA but fewer consistent differences in small RNAs. Of the three protein coding genes differentially expressed in both PE and FGR, one encodes a secreted protein FSTL3 (follistatin-like 3). Elevated serum levels of FSTL3 in pregnant women are predictive of subsequent PE and FGR. To aid visualization of our placenta transcriptome data, we develop a web application (
Journal of Proteome Research, 2021Read article
JOURNAL OF PROTEOME RESEARCH, 2021
The study of microbiomes has gained in importance over the past few years and has led to the emergence of the fields of metagenomics, metatranscriptomics, and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute the functional similarity between multiple data sets. MegaGO is high performing: Each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.Read article
JOURNAL OF PROTEOME RESEARCH, 2021
Metaproteomics has become an important research tool to study microbial systems, which has resulted in increased metaproteomics data generation. However, efficient tools for processing the acquired data have lagged behind. One widely used tool for metaproteomics data interpretation is Unipept, a web-based tool that provides, amongst others, interactive and insightful visualizations. Due to its web-based implementation, however, the Unipept web application is limited in the amount of data that can be analyzed. In this manuscript we therefore present Unipept Desktop, a desktop application version of Unipept that is designed to drastically increase the throughput and capacity of metaproteomics data analysis. Moreover, it provides a novel comparative analysis pipeline and improves the organization of experimental data into projects, thus addressing the growing need for more performant and versatile analysis tools for metaproteomics data.Read article
For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file-formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.Read article
ANALYTICAL CHEMISTRY, 2020
Missing values are a major issue in quantitative data-dependent mass spectrometry-based proteomics. We therefore present an innovative solution to this key issue by introducing a hurdle model, which is a mixture between a binomial peptide count and a peptide intensity-based model component. It enables dramatically enhanced quantification of proteins with many missing values without having to resort to harmful assumptions for missingness. We demonstrate the superior performance of our method by comparing it with state-of-the-art methods in the field.Read article
ACS OMEGA, 2020
Despite its growing popularity and use, bottom-up proteomics remains a complex analytical methodology. Its general workflow consists of three main steps: sample preparation, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), and computational data analysis. Quality assessment of the different steps and components of this workflow is instrumental to identify technical flaws and avoid loss of precious measurement time and sample material. However, assessment of the extent of sample losses along with the sample preparation protocol, in particular, after proteolytic digestion, is not yet routinely implemented because of the lack of an accurate and straightforward method to quantify peptides. Here, we report on the use of a microfluidic UV/visible spectrophotometer to quantify MS-ready peptides directly in the MS-loading solvent, consuming only 2 mu L of sample. We compared the performance of the microfluidic spectrophotometer with a standard device and determined the optimal sample amount for LC-MS/MS analysis on a Q Exactive HF mass spectrometer using a dilution series of a commercial K562 cell digest. A careful evaluation of selected LC and MS parameters allowed us to define 3 mu g as an optimal peptide amount to be injected into this particular LC-MS/MS system. Finally, using tryptic digests from human HEK293T cells and showing that injecting equal peptide amounts, rather than approximate ones, result in less variable LC-MS/MS and protein quantification data. The obtained quality improvement together with easy implementation of the approach makes it possible to routinely quantify MS-ready peptides as a next step in daily proteomics quality control.Read article
MOLECULAR & CELLULAR PROTEOMICS, 2020
Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis because of peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored toward their specific applications.Read article
lennart [dot] martens [AT] UGent.be