The CompOmics group, headed by Prof. Dr. Lennart Martens, is part of the Department of Biomolecular Medicine of the Faculty of Medicine and Health Sciences of Ghent University, and the VIB-UGent Center for Medical Biotechnology of VIB, both in Ghent, Belgium.
The group has its roots in Ghent, but has active members all over Europe, and specializes in the management, analysis and integration of high-throughput Omics data with an aim towards establishing solid data stores, processing methods and tools to enable downstream systems biology research.
The CompOmics team is always looking for talented people. Go to the jobs section on the VIB website to look for open positions.
The following web applications are developed and hosted by the group.
Here is a selection of our free and open-source tools. A full list can be found on GitHub.
Do you want to learn about Proteomics and Proteomics data analysis? Have a look at our CompOmics tutorials:
Nature Communications, 2021
Read article The placenta is the interface between mother and fetus and inadequate function contributes to short and long-term ill-health. The placenta is absent from most large-scale RNA-Seq datasets. We therefore analyze long and small RNAs (~101 and 20 million reads per sample respectively) from 302 human placentas, including 94 cases of preeclampsia (PE) and 56 cases of fetal growth restriction (FGR). The placental transcriptome has the seventh lowest complexity of 50 human tissues: 271 genes account for 50% of all reads. We identify multiple circular RNAs and validate 6 of these by Sanger sequencing across the back-splice junction. Using large-scale mass spectrometry datasets, we find strong evidence of peptides produced by translation of two circular RNAs. We also identify novel piRNAs which are clustered on Chr1 and Chr14. PE and FGR are associated with multiple and overlapping differences in mRNA, lincRNA and circRNA but fewer consistent differences in small RNAs. Of the three protein coding genes differentially expressed in both PE and FGR, one encodes a secreted protein FSTL3 (follistatin-like 3). Elevated serum levels of FSTL3 in pregnant women are predictive of subsequent PE and FGR. To aid visualization of our placenta transcriptome data, we develop a web application (
The placenta is the interface between mother and fetus and inadequate function contributes to short and long-term ill-health. The placenta is absent from most large-scale RNA-Seq datasets. We therefore analyze long and small RNAs (~101 and 20 million reads per sample respectively) from 302 human placentas, including 94 cases of preeclampsia (PE) and 56 cases of fetal growth restriction (FGR). The placental transcriptome has the seventh lowest complexity of 50 human tissues: 271 genes account for 50% of all reads. We identify multiple circular RNAs and validate 6 of these by Sanger sequencing across the back-splice junction. Using large-scale mass spectrometry datasets, we find strong evidence of peptides produced by translation of two circular RNAs. We also identify novel piRNAs which are clustered on Chr1 and Chr14. PE and FGR are associated with multiple and overlapping differences in mRNA, lincRNA and circRNA but fewer consistent differences in small RNAs. Of the three protein coding genes differentially expressed in both PE and FGR, one encodes a secreted protein FSTL3 (follistatin-like 3). Elevated serum levels of FSTL3 in pregnant women are predictive of subsequent PE and FGR. To aid visualization of our placenta transcriptome data, we develop a web application (
Journal of Proteome Research, 2021Read article
JOURNAL OF PROTEOME RESEARCH, 2021
The study of microbiomes has gained in importance over the past few years and has led to the emergence of the fields of metagenomics, metatranscriptomics, and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute the functional similarity between multiple data sets. MegaGO is high performing: Each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.Read article
JOURNAL OF PROTEOME RESEARCH, 2021
Metaproteomics has become an important research tool to study microbial systems, which has resulted in increased metaproteomics data generation. However, efficient tools for processing the acquired data have lagged behind. One widely used tool for metaproteomics data interpretation is Unipept, a web-based tool that provides, amongst others, interactive and insightful visualizations. Due to its web-based implementation, however, the Unipept web application is limited in the amount of data that can be analyzed. In this manuscript we therefore present Unipept Desktop, a desktop application version of Unipept that is designed to drastically increase the throughput and capacity of metaproteomics data analysis. Moreover, it provides a novel comparative analysis pipeline and improves the organization of experimental data into projects, thus addressing the growing need for more performant and versatile analysis tools for metaproteomics data.Read article
ANALYTICAL CHEMISTRY, 2020
Missing values are a major issue in quantitative data-dependent mass spectrometry-based proteomics. We therefore present an innovative solution to this key issue by introducing a hurdle model, which is a mixture between a binomial peptide count and a peptide intensity-based model component. It enables dramatically enhanced quantification of proteins with many missing values without having to resort to harmful assumptions for missingness. We demonstrate the superior performance of our method by comparing it with state-of-the-art methods in the field.Read article
ACS OMEGA, 2020
Despite its growing popularity and use, bottom-up proteomics remains a complex analytical methodology. Its general workflow consists of three main steps: sample preparation, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), and computational data analysis. Quality assessment of the different steps and components of this workflow is instrumental to identify technical flaws and avoid loss of precious measurement time and sample material. However, assessment of the extent of sample losses along with the sample preparation protocol, in particular, after proteolytic digestion, is not yet routinely implemented because of the lack of an accurate and straightforward method to quantify peptides. Here, we report on the use of a microfluidic UV/visible spectrophotometer to quantify MS-ready peptides directly in the MS-loading solvent, consuming only 2 mu L of sample. We compared the performance of the microfluidic spectrophotometer with a standard device and determined the optimal sample amount for LC-MS/MS analysis on a Q Exactive HF mass spectrometer using a dilution series of a commercial K562 cell digest. A careful evaluation of selected LC and MS parameters allowed us to define 3 mu g as an optimal peptide amount to be injected into this particular LC-MS/MS system. Finally, using tryptic digests from human HEK293T cells and showing that injecting equal peptide amounts, rather than approximate ones, result in less variable LC-MS/MS and protein quantification data. The obtained quality improvement together with easy implementation of the approach makes it possible to routinely quantify MS-ready peptides as a next step in daily proteomics quality control.Read article
MOLECULAR & CELLULAR PROTEOMICS, 2020
Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis because of peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored toward their specific applications.Read article
JOURNAL OF PROTEOME RESEARCH, 2020
Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly because of low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also creates a threshold to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool supporting two scoring functions. COSS also includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with established spectral searching tools and a sequence database search tool. Our comparison showed that COSS more reliably identifies spectra, is capable of handling large data sets and libraries, and is an easy to use tool that can run on low computer specifications. COSS binaries and source code can be freely downloaded from https://github.com/compomics/COSS.Read article
JOURNAL OF PROTEOME RESEARCH, 2020
Protein phosphorylation is a key post-translational modification in many biological processes and is associated to human diseases such as cancer and metabolic disorders. The accurate identification, annotation, and functional analysis of phosphosites are therefore crucial to understand their various roles. Phosphosites are mainly analyzed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to the protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites. Furthermore, these sites are put into biophysical context by annotating each phosphoprotein with per-residue structural propensity, solvent accessibility, disordered probability, and early folding information. Scop3P, available at https://iomics.ugent.be/scop3p, presents a unique resource for visualization and analysis of phosphosites and for understanding of phosphosite structure–function relationships.Read article
Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.Read article
lennart [dot] martens [AT] UGent.be