The CompOmics group, headed by Prof. Dr. Lennart Martens, is part of the Department of Biomolecular Medicine of the Faculty of Medicine and Health Sciences of Ghent University, and the VIB-UGent Center for Medical Biotechnology of VIB, both in Ghent, Belgium.
The group has its roots in Ghent, but has active members all over Europe, and specializes in the management, analysis and integration of high-throughput Omics data with an aim towards establishing solid data stores, processing methods and tools to enable downstream systems biology research.
The CompOmics team is always looking for talented people. Go to the jobs section on the VIB website to look for open positions.
The following web applications are developed and hosted by the group.
Here is a selection of our free and open-source tools. A full list can be found on GitHub.
Do you want to learn about Proteomics and Proteomics data analysis? Have a look at our CompOmics tutorials:
ANALYTICAL CHEMISTRY, 2020
Missing values are a major issue in quantitative data-dependent mass spectrometry-based proteomics. We therefore present an innovative solution to this key issue by introducing a hurdle model, which is a mixture between a binomial peptide count and a peptide intensity-based model component. It enables dramatically enhanced quantification of proteins with many missing values without having to resort to harmful assumptions for missingness. We demonstrate the superior performance of our method by comparing it with state-of-the-art methods in the field.
ACS OMEGA, 2020
Despite its growing popularity and use, bottom-up proteomics remains a complex analytical methodology. Its general workflow consists of three main steps: sample preparation, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), and computational data analysis. Quality assessment of the different steps and components of this workflow is instrumental to identify technical flaws and avoid loss of precious measurement time and sample material. However, assessment of the extent of sample losses along the sample preparation protocol, in particular after proteolytic digestion, is not yet routinely implemented because of the lack of an accurate and straightforward method to quantify peptides. Here, we report on the use of a microfluidic UV/visible spectrophotometer to quantify MS-ready peptides directly in the MS-loading solvent, consuming only 2 µL of sample. We compared the performance of the microfluidic spectrophotometer with a standard device and determined the optimal sample amount for LC-MS/MS analysis on a Q Exactive HF mass spectrometer using a dilution series of a commercial K562 cell digest. A careful evaluation of selected LC and MS parameters allowed us to define 3 µg as an optimal peptide amount to be injected into this particular LC-MS/MS system. Finally, using tryptic digests from human HEK293T cells, we show that injecting equal peptide amounts, rather than approximate ones, results in less variable LC-MS/MS and protein quantification data. The obtained quality improvement, together with easy implementation of the approach, makes it possible to routinely quantify MS-ready peptides as a next step in daily proteomics quality control.
MOLECULAR & CELLULAR PROTEOMICS, 2020
Label-free quantitative mass spectrometry-based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis because of peptide-specific effects and context-dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods, which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these peptide-based methods are computationally expensive, often hard to understand for the non-specialized end-user, and do not provide protein summaries, which are important for visualization or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared with the state-of-the-art peptide-based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRobSum, which estimates MSqRob's model parameters in a two-stage procedure, circumventing the drawbacks of peptide-based workflows. MSqRobSum maintains MSqRob's superior performance, while providing useful protein expression summaries for plotting and downstream analysis. Summarizing peptide to protein intensities considerably reduces the computational complexity, the memory footprint and the model complexity, and makes it easier to disseminate DE inferred on protein summaries. Moreover, MSqRobSum provides a highly modular analysis framework, which provides researchers with full flexibility to develop data analysis workflows tailored toward their specific applications.
JOURNAL OF PROTEOME RESEARCH, 2020
Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly because of low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also creates a threshold to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool supporting two scoring functions. COSS also includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with established spectral searching tools and a sequence database search tool. Our comparison showed that COSS more reliably identifies spectra, is capable of handling large data sets and libraries, and is an easy-to-use tool that can run on computers with low specifications. COSS binaries and source code can be freely downloaded from https://github.com/compomics/COSS.
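To make the idea of spectral similarity scoring concrete, here is a minimal sketch of one common score, the cosine similarity between two binned spectra. The abstract does not name the two scoring functions COSS supports, so this is an illustrative assumption rather than a description of COSS itself; the function names, the `(m/z, intensity)` peak representation and the 1.0 m/z bin width are all hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    `peaks` is an iterable of (mz, intensity) pairs (assumed representation).
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two spectra after binning (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library, bin_width=1.0):
    """Return (name, score) of the highest-scoring library spectrum."""
    scored = (
        (name, cosine_similarity(query, peaks, bin_width))
        for name, peaks in library.items()
    )
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy spectra, invented purely for illustration.
    query = [(100.1, 50.0), (200.2, 100.0), (300.3, 25.0)]
    library = {
        "spectrum_1": [(100.2, 48.0), (200.1, 95.0), (300.4, 30.0)],
        "spectrum_2": [(150.0, 80.0), (250.0, 60.0)],
    }
    print(best_library_match(query, library))
```

In a real search, the same score would also be computed against decoy spectra so that a score threshold can be chosen for result validation, as the abstract describes.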
JOURNAL OF PROTEOME RESEARCH, 2020
Protein phosphorylation is a key post-translational modification in many biological processes and is associated with human diseases such as cancer and metabolic disorders. The accurate identification, annotation, and functional analysis of phosphosites are therefore crucial to understand their various roles. Phosphosites are mainly analyzed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to the protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites. Furthermore, these sites are put into biophysical context by annotating each phosphoprotein with per-residue structural propensity, solvent accessibility, disordered probability, and early folding information. Scop3P, available at https://iomics.ugent.be/scop3p, presents a unique resource for visualization and analysis of phosphosites and for understanding of phosphosite structure–function relationships.
Cell migration research has become a high-content field. However, the quantitative information encapsulated in these complex and high-dimensional datasets is not fully exploited owing to the diversity of experimental protocols and non-standardized output formats. In addition, typically the datasets are not open for reuse. Making the data open and Findable, Accessible, Interoperable, and Reusable (FAIR) will enable meta-analysis, data integration, and data mining. Standardized data formats and controlled vocabularies are essential for building a suitable infrastructure for that purpose but are not available in the cell migration domain. We here present standardization efforts by the Cell Migration Standardisation Organisation (CMSO), an open community-driven organization to facilitate the development of standards for cell migration data. This work will foster the development of improved algorithms and tools and enable secondary analysis of public datasets, ultimately unlocking new knowledge of the complex biological process of cell migration.
The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex LC-MS identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open modification searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We therefore present DeepLC, a novel deep learning peptide retention time predictor utilizing a new peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides, and, more importantly, accurately predicts retention times for modifications not seen during training. DeepLC is available under the permissive Apache 2.0 open source license and comes with a user-friendly graphical user interface, as well as a Python package on PyPI, Bioconda, and BioContainers for effortless workflow integration.
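As a rough illustration of why an encoding based on atomic composition can generalize to unseen modifications, the sketch below turns each residue of a peptide into a vector of atom counts and simply adds the atoms contributed by a modification. This is a simplified assumption for illustration only, not DeepLC's actual feature encoding or API; the names `encode_peptide`, `RESIDUE_COMPOSITION` and `MODIFICATION_DELTAS`, and the position-to-name modification format, are hypothetical.

```python
from collections import Counter

# Order of atoms in each encoded vector (illustrative choice).
ATOMS = ("C", "H", "N", "O", "P", "S")

# Elemental composition of amino acid residues (amino acid minus H2O).
RESIDUE_COMPOSITION = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "H": {"C": 6, "H": 7, "N": 3, "O": 1},
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "W": {"C": 11, "H": 10, "N": 2, "O": 1},
}

# Atoms added by a few common modifications.
MODIFICATION_DELTAS = {
    "Phospho": {"H": 1, "O": 3, "P": 1},
    "Oxidation": {"O": 1},
    "Carbamidomethyl": {"C": 2, "H": 3, "N": 1, "O": 1},
}


def encode_peptide(sequence, modifications=None):
    """Encode a peptide as one atom-count vector per residue.

    `modifications` maps a 0-based residue position to a modification name
    (hypothetical input format chosen for this sketch).
    """
    modifications = modifications or {}
    rows = []
    for position, residue in enumerate(sequence):
        counts = Counter(RESIDUE_COMPOSITION[residue])
        if position in modifications:
            # A modification only changes atom counts, so no new feature
            # is needed for a modification the model has never seen.
            counts.update(MODIFICATION_DELTAS[modifications[position]])
        rows.append([counts[atom] for atom in ATOMS])
    return rows


if __name__ == "__main__":
    for row in encode_peptide("GSM", {1: "Phospho", 2: "Oxidation"}):
        print(row)
```

The design point is that a modified residue and an unmodified residue live in the same feature space, so a model trained on atom counts can, in principle, extrapolate to modifications absent from its training data.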
ANALYTICAL CHEMISTRY, 2020
Accurate prediction of liquid chromatographic retention times from small-molecule structures is useful for reducing experimental measurements and for improved identification in targeted and untargeted MS. However, different experimental setups (e.g., differences in columns, gradients, solvents, or stationary phase) have given rise to a multitude of prediction models that only predict accurate retention times for a specific experimental setup. In practice this typically results in the fitting of a new predictive model for each specific type of setup, which is not only inefficient but also requires substantial prior data to be accumulated on each such setup. Here we introduce the concept of generalized calibration, which is capable of the straightforward mapping of retention time models between different experimental setups. This concept builds on the database-controlled calibration approach implemented in PredRet and fits calibration curves on predicted retention times instead of only on observed retention times. We show that this approach results in substantially higher accuracy of elution-peak prediction than is achieved by setup-specific models.
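The core idea of fitting a calibration curve on predicted retention times can be sketched as follows: take the retention times that an existing model predicts for a handful of calibration analytes, pair them with the times actually observed on the new setup, fit a curve, and apply it to all other predictions. The abstract does not specify the curve type, so the ordinary least-squares straight line used here is a deliberately simple stand-in and an assumption; real calibration curves may well be non-linear. All function names are hypothetical.

```python
def fit_linear_calibration(predicted, observed):
    """Fit observed = slope * predicted + intercept by least squares.

    `predicted`: retention times from a model trained on another setup.
    `observed`:  retention times of the same analytes on the new setup.
    """
    n = len(predicted)
    if n < 2 or n != len(observed):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(predicted_time, slope, intercept):
    """Map a prediction from the source setup onto the new setup."""
    return slope * predicted_time + intercept


if __name__ == "__main__":
    # Toy calibration points in minutes, invented for illustration.
    predicted = [2.0, 5.0, 9.0, 14.0]
    observed = [3.1, 6.9, 12.2, 18.8]
    slope, intercept = fit_linear_calibration(predicted, observed)
    print(calibrate(7.5, slope, intercept))
```

Because the curve is fitted on predictions rather than only on observations, any analyte the source model can predict can be mapped to the new setup, even if it was never measured there.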
JOURNAL OF PROTEOME RESEARCH, 2020
Although metaproteomics, the study of the collective proteome of microbial communities, has become increasingly powerful and popular over the past few years, the field has lagged behind on the availability of user-friendly, end-to-end pipelines for data analysis. We therefore describe the connection from two commonly used metaproteomics data processing tools in the field, MetaProteomeAnalyzer and PeptideShaker, to Unipept for downstream analysis. Through these connections, direct end-to-end pipelines are built from database searching to taxonomic and functional annotation.
Unipept is an ecosystem of tools developed for fast metaproteomics data analysis consisting of a web application, a set of web services (application programming interface, API) and a command-line interface (CLI). After the successful introduction of version 4 of the Unipept web application, we here introduce version 2.0 of the API and CLI. Next to the existing taxonomic analysis, version 2.0 of the API and CLI provides access to Unipept’s powerful functional analysis for metaproteomics samples. The functional analysis pipeline supports retrieval of Enzyme Commission numbers, Gene Ontology terms and InterPro entries for the individual peptides in a metaproteomics sample. This paves the way for other applications and developers to integrate these new information sources into their data processing pipelines, which greatly increases insight into the functions performed by the organisms in a specific environment. Both the API and CLI have also been expanded with the ability to render interactive visualizations from a list of taxon IDs. These visualizations are automatically made available on a dedicated website and can easily be shared by users.
lennart [dot] martens [AT] UGent.be