PerSEveML: A Web-Based Tool to Identify Persistent Biomarker Structure for Rare Events Using Integrative Machine Learning Approach

Step 1: Input file and options

You can upload your own file in .csv, tab-delimited, Excel, or RDS format, or use one of the example data sets: Nilsson rare, Mosmann rare, and SIN3 network.
Select your categorical response variable (formatted as 0 and 1) from “Choose binary outcome”.
Select the variables that you want to exclude from your data analysis from the “Select unwanted variables” option.
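
To check a file before uploading it, the selections above can be mimicked offline. The following is a minimal Python sketch, not the app's own code: the function name prepare_table, the column names, and the toy data are all hypothetical. It assumes a .csv input with numeric features and an outcome column coded as 0 and 1.

```python
import csv
import io


def prepare_table(csv_text, outcome, unwanted=()):
    """Split CSV text into feature rows and a 0/1 outcome vector."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        label = int(row[outcome])
        if label not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1, got {label}")
        labels.append(label)
        # Keep every column except the outcome and the unwanted variables.
        features.append({
            name: float(value)
            for name, value in row.items()
            if name != outcome and name not in unwanted
        })
    return features, labels


# Hypothetical toy data: "cell_id" is an identifier, not a biomarker.
SAMPLE = "cell_id,CD4,CD8,rare\n1,0.2,1.5,0\n2,0.9,0.1,1\n3,0.4,1.1,0\n"
X, y = prepare_table(SAMPLE, outcome="rare", unwanted={"cell_id"})
```

If the outcome column contains anything other than 0 and 1, the sketch raises an error, which is a quick way to catch a mis-coded response variable before running the app.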

Step 2: Resampling Methods

You have the option to select one of the two resampling techniques for class imbalance from the “Resampling Methods” option:

SMOTE
ADASYN
None

Please select “None” when you don’t want to use resampling methods in your data analysis.
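
For intuition, the idea behind SMOTE can be sketched in a few lines of Python: each synthetic point is placed on the line segment between a minority-class point and one of its nearest minority-class neighbours. This is an illustrative sketch only. The function name smote_sketch, the default k, and the toy data are assumptions, and the app's actual resampling implementation may differ.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Hypothetical minority-class points with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_points = smote_sketch(minority, n_new=4)
```

ADASYN follows the same interpolation idea but generates more synthetic points for minority samples that are harder to learn; it is not shown here.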

Step 3: Normalization

You have the option to select one of the following normalization techniques from the “Normalization Technique” option:

Min-max
Log Scaling
Standard Scaling
Arcsine
TopS
Percentage Row
No Normalization

Please select “No Normalization” when you don’t want to normalize your data or have already performed normalization outside PerSEveML.
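
As a rough guide to what some of these options do, the sketch below gives common textbook definitions in Python. These are assumptions rather than the app's exact formulas: log scaling is taken as log(1 + x), standard scaling uses the population standard deviation, and percentage row divides each value by its row total. Arcsine and TopS are left out because their exact definitions are not given here.

```python
import math
import statistics


def min_max(values):
    """Rescale a column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_scale(values):
    """log(1 + x), so that zeros stay finite (the offset is an assumption)."""
    return [math.log1p(v) for v in values]


def standard_scale(values):
    """Centre a column at 0 with unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def percentage_row(row):
    """Express each value as a percentage of its row total."""
    total = sum(row)
    return [100.0 * v / total for v in row]


column = [2.0, 4.0, 6.0]
scaled = min_max(column)
standardized = standard_scale(column)
row_percent = percentage_row([1.0, 1.0, 2.0])
```

Min-max and standard scaling act on one column (feature) at a time, while percentage row acts on one row (sample) at a time. A constant column would cause a division by zero in this sketch.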

Step 4: Machine Learning Models

You can select one or more machine learning (ML) algorithms based on your research question from the “Choose preferred algorithms” option. The options include:

Tree-based algorithms:

Decision Tree
Random Forest
XgBoost
AdaBoost

Non-tree based algorithms:

Naive Bayes
Linear SVM
Non-linear SVM
Polynomial SVM

Linear classifiers:

LDA
Logistic Regression
Lasso Regression
Ridge Regression

Step 5: Select the train-test percentage

The user can enter or select any percentage between 0 and 100 to split the normalized/raw data into training and test sets using the “Insert train-test ratio” tab. Note that the test set must contain at least some data points for the app to run successfully.
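
The split can be pictured with the following Python sketch. It assumes that the percentage refers to the training share and that rows are shuffled before splitting. Both are assumptions, and the function name train_test_split is used here only for illustration.

```python
import random


def train_test_split(rows, train_percent, seed=0):
    """Shuffle rows and split them by the given training percentage."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    # Both sets need at least one row for training and evaluation to work.
    if n_train == 0 or n_train == len(shuffled):
        raise ValueError("both sets need at least one row")
    return shuffled[:n_train], shuffled[n_train:]


rows = list(range(10))
train, test = train_test_split(rows, train_percent=80)
```

With 10 rows and a value of 80, the sketch puts 8 rows in the training set and 2 in the test set. Values at or very near 0 or 100 leave one of the sets empty, which is why they are rejected here.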

Step 6: Select the value of k for cross-validation

This app relies heavily on the performance of the ML methods. Because the algorithm tunes hyper-parameters using grid search or cross-validation, choosing a suitable value of k is crucial. The app allows the user to select a value of k between 1 and 10 in the “Value of k for cross validation” tab.
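
To illustrate what k controls, the Python sketch below divides the training rows into k folds; each fold serves once as the validation set while the remaining folds are used for fitting. The function name k_fold_indices is hypothetical, and the sketch requires k of at least 2, which is its own assumption and not a statement about how the app treats k = 1.

```python
def k_fold_indices(n_rows, k):
    """Return (training, validation) index lists for each of the k folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Fold sizes differ by at most one row.
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        validation = list(range(start, start + size))
        training = [i for i in range(n_rows) if not start <= i < start + size]
        folds.append((training, validation))
        start += size
    return folds


folds = k_fold_indices(n_rows=7, k=3)
```

A larger k leaves more rows for fitting in each round but increases the running time, because every candidate hyper-parameter setting is fitted k times.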

Step 7: Select the threshold for cutpoint analysis

Based on the user-defined cut-point in the “Insert cut-point analysis cutoff” option, this app will formulate the persistent biomarker (or feature) structure. A user can try different combinations of normalizations, ML methods, and thresholds to arrive at the most persistent structure.
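
The exact rule that the app applies is not described here, so the following Python sketch shows only one possible reading of a cut-point: a feature is treated as persistent when it ranks within the top cutoff percentage of importance scores for every selected model. The function name persistent_features, the marker names, and the importance scores are all hypothetical.

```python
def persistent_features(importances, cutoff_percent):
    """Features ranked in the top cutoff_percent by every model (assumed rule)."""
    persistent = None
    for scores in importances.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        n_keep = max(1, round(len(ranked) * cutoff_percent / 100))
        top = set(ranked[:n_keep])
        # Keep only the features that every model so far agrees on.
        persistent = top if persistent is None else persistent & top
    return sorted(persistent or [])


# Hypothetical importance scores from two models.
importances = {
    "random_forest": {"CD4": 0.5, "CD8": 0.3, "CD19": 0.1, "CD45": 0.1},
    "lasso": {"CD4": 0.4, "CD8": 0.1, "CD19": 0.3, "CD45": 0.2},
}
result = persistent_features(importances, cutoff_percent=50)
```

Under this reading, a stricter cutoff keeps fewer features, and only features that remain highly ranked across models survive, which is why comparing several thresholds can be informative.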

Step 8: Running the app

After successfully selecting the preferred input options, the user can click the green “Submit” button in the upper left corner.