A web platform that facilitates the identification of leading genes from sequencing data.




Single-cell RNA sequencing (scRNA-seq) and RNA sequencing (RNA-seq) have transformed biomedical research, but the size and complexity of the data they produce make computational analysis difficult, and gene selection in particular remains a critical hurdle. Our platform, Bioinn, identifies key genes in both scRNA-seq and RNA-seq datasets using more than 15 specialized feature selection techniques. A distinctive aspect of Bioinn is its ensemble approach, which lets users build a personalized method by selecting one technique from each category. The tool analyzes the prominent genes it finds, evaluating their predictive accuracy and their relevance to biological processes and drug interactions. It also provides visual representations, such as KEGG pathways and PPI networks, to give users a fuller view of gene function. Together, these features make Bioinn a resource for discovering and interpreting transcriptional markers of complex diseases from scRNA-seq and RNA-seq data.






⚠ The parameters chosen for each method in the previous tabs are carried over unchanged to the Ensemble approach!

Bioinn

Bioinn is a web-based R/Shiny application for processing and feature selection of RNA-seq and/or scRNA-seq data. It combines a range of established and advanced feature selection algorithms with frequently used visualization tools in a single platform. Bioinn simplifies the handling of single-cell data through a user-friendly, intuitive interface.

Introduction

The app is a versatile platform for identifying candidate disease biomarkers through the selection of dominant genes. It offers three categories of established gene selection methods: statistical techniques, machine learning techniques, and recent feature selection techniques designed specifically for scRNA-seq data. In total, 20 feature selection modes are available. The gene list generated by the app can be analyzed further with the Enrichr tool, which lets users examine enrichment in biological and pharmacological features such as pathway terms, gene ontology (GO) terms, disease terms, and drug substances. Users can also export PNG snapshots of KEGG pathway maps in which the exported genes and biomarkers are highlighted.

Data upload

In this tab, users can upload the data that they wish to analyze. Datasets can be uploaded in either .rds or .csv format. Users can also specify the separator and quote characters, and choose whether the head (first few lines) or the whole dataset is displayed once it has been uploaded. Note that it is essential to specify the correct organism for the data (human or mouse).
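The effect of the separator, quote, and head/whole options can be illustrated with a small sketch. This is not Bioinn's own code (the app is written in R/Shiny). It is a Python illustration using only the standard library, and the function name `read_table` and its parameters are hypothetical.

```python
import csv
import io

def read_table(text, sep=",", quote='"', head_only=True, n_head=5):
    """Parse delimited text; return (header, rows).

    sep and quote mirror the separator and quote options of the upload tab.
    head_only / n_head (hypothetical names) mimic showing only the first lines.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar=quote)
    header = next(reader)          # first line holds the column names
    rows = [row for row in reader]
    if head_only:
        rows = rows[:n_head]       # keep only the first n_head data rows
    return header, rows

# Semicolon-separated example with quoted gene names (made-up values)
example = 'gene;cell_1;cell_2\n"TP53";3;0\n"GAPDH";120;98\n'
header, rows = read_table(example, sep=";", quote='"', head_only=True, n_head=1)
print(header)  # ['gene', 'cell_1', 'cell_2']
print(rows)    # [['TP53', '3', '0']]
```

Choosing the wrong separator would leave every line as a single column, which is why the tab exposes these options before analysis starts.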

Feature selection

In the “Run Analysis” tab, users can perform feature selection on the dataset uploaded in the previous tab. They can choose from a wide selection of well-established feature selection methods. Two data filtering methods are available to preprocess the data before analysis: users can either remove features with low variance or keep features with a high degree of variability (both thresholds can be specified by the user). Data normalization is also available as an option. The “Number of genes” slider sets how many of the selected features are displayed.
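The two filtering options described above can be sketched as follows. This is an illustrative Python example, not the app's implementation. The function `filter_by_variance`, its parameters, and the use of population variance as the variability measure are all assumptions.

```python
from statistics import pvariance

def filter_by_variance(expr, min_variance=None, top_n=None):
    """Rank genes by variance, most variable first.

    expr: dict mapping gene name -> list of expression values across cells.
    min_variance: drop genes whose variance is below this value (assumed option).
    top_n: keep only the top_n most variable genes (assumed option).
    """
    variances = {gene: pvariance(values) for gene, values in expr.items()}
    if min_variance is not None:
        variances = {g: v for g, v in variances.items() if v >= min_variance}
    # Sort by decreasing variance; break ties alphabetically for stable output
    ranked = sorted(variances, key=lambda g: (-variances[g], g))
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked

# Toy expression values (made up): A is constant, C varies the most
expr = {"A": [1, 1, 1, 1], "B": [0, 2, 0, 2], "C": [0, 10, 0, 10]}
print(filter_by_variance(expr, min_variance=0.5))  # ['C', 'B']
print(filter_by_variance(expr, top_n=1))           # ['C']
```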

Users can also combine several feature selection algorithms into an ensemble.
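One common way to combine several gene rankings is mean-rank aggregation, sketched below in Python. The source does not state how Bioinn merges the methods in its ensemble, so the aggregation rule, the function name `ensemble_rank`, and the example gene lists are assumptions made for illustration only.

```python
def ensemble_rank(rankings):
    """Combine ranked gene lists (best gene first) by mean rank.

    A gene missing from a list receives that list's worst rank plus one.
    This is an assumed aggregation rule, not necessarily the one Bioinn uses.
    """
    genes = set().union(*rankings)

    def mean_rank(gene):
        ranks = [r.index(gene) + 1 if gene in r else len(r) + 1 for r in rankings]
        return sum(ranks) / len(ranks)

    # Lower mean rank is better; ties are broken alphabetically
    return sorted(genes, key=lambda g: (mean_rank(g), g))

# One hypothetical ranking per method category
statistical = ["TP53", "MYC", "EGFR"]
machine_learning = ["MYC", "TP53", "KRAS"]
single_cell = ["MYC", "EGFR", "TP53"]
print(ensemble_rank([statistical, machine_learning, single_cell]))
# ['MYC', 'TP53', 'EGFR', 'KRAS']
```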

Results are visualized through a bar plot and a confusion matrix. The bar plot shows the importance of each selected gene/feature. The confusion matrix displays the performance of a k-nearest neighbors (kNN) classifier on the filtered version of the dataset. Users can also display the results as a heatmap, which uses the cell type or state labels to organize and display the data.
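To make the confusion matrix concrete, the following Python sketch classifies a few toy samples with a plain kNN rule and tabulates true against predicted labels. It is a minimal illustration under stated assumptions: Euclidean distance, k = 3, and made-up data and labels. It does not reproduce the app's classifier settings.

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def confusion_matrix(true_labels, predicted_labels):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    matrix = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

# Toy data: two selected genes per sample, two well-separated groups
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
train_y = ["healthy"] * 3 + ["disease"] * 3
test_x = [(0.1, 0.1), (5.0, 5.0), (0.3, 0.2)]
test_y = ["healthy", "disease", "healthy"]

predicted = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
print(confusion_matrix(test_y, predicted))
```

Off-diagonal counts in the matrix are misclassified samples, so a matrix with zeros off the diagonal indicates that the selected genes separate the labels well on the evaluated data.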

Enrichment analysis

In this tab, users can inspect the genes that were highlighted by the analysis in the previous tab. They can look at all of the genes or only a subset, such as the 50 highest-scoring genes. Users can also select from 18 different pathway datasets, grouped into three categories: Biological Pathways, Biological Ontologies, and Diseases-Drugs.

They can choose to analyze one specific pathway, a combination of three ontology terms, or all available ontology terms. This analysis is done using the Enrichr database, which helps users understand the biological significance of their gene sets.
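The idea behind this kind of enrichment analysis is over-representation: does the selected gene list overlap a term's gene set more than chance would predict? The Python sketch below computes a one-sided hypergeometric p-value, which is equivalent to a one-sided Fisher exact test. It only illustrates the statistic. The app itself queries Enrichr, and the function name, gene identifiers, and background size here are hypothetical.

```python
from math import comb

def enrichment_p_value(gene_list, term_genes, background_size):
    """P(overlap >= observed) when drawing len(gene_list) genes at random.

    gene_list: selected genes; term_genes: genes annotated to one term;
    background_size: total number of genes that could have been selected.
    """
    gene_list, term_genes = set(gene_list), set(term_genes)
    k = len(gene_list & term_genes)   # observed overlap
    n = len(gene_list)                # number of selected genes
    K = len(term_genes)               # genes annotated to the term
    N = background_size
    # Sum the hypergeometric probabilities for overlaps of k or more
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

# Hypothetical example: 20 background genes, a 5-gene term, 4 selected genes
term = ["G1", "G2", "G3", "G4", "G5"]
selected = ["G1", "G2", "G3", "G9"]
p = enrichment_p_value(selected, term, background_size=20)
print(round(p, 4))
```

A small p-value means the overlap is unlikely under random selection. In practice, p-values for many terms are also corrected for multiple testing.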

Graph analysis

In this tab, users can analyze protein-protein interaction (PPI) networks and similarity graphs. In the PPI analysis, users choose a score threshold so that only sufficiently high-scoring interactions from the STRING database (accessed through STRINGdb) are included. Similarity graph analysis helps users find molecular modules by comparing gene interaction profiles. Users can adjust the required level of similarity by setting a Pearson correlation threshold.
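The thresholding step of the similarity graph can be sketched in Python as follows. This is an assumption-laden illustration rather than the app's code: the profiles are made-up numeric vectors, `similarity_edges` is a hypothetical name, and an edge is kept when the Pearson correlation is at least the threshold.

```python
from itertools import combinations
from statistics import mean
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_edges(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with correlation >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

# Toy profiles: A and B rise together, C falls as A rises
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 3, 2, 1]}
for edge in similarity_edges(profiles, threshold=0.8):
    print(edge)
```

Raising the threshold yields a sparser graph with tighter modules, while lowering it connects more genes at the cost of weaker relationships.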


Contact info:
Tel: +302411416175, +302411811971
Email: info@pytheia.gr
Address:   Iroon Polytechniou 15, 41222, Larisa, Greece