Train GitHub Model

Ensemble: RF + XGBoost + HGB + LightGBM + SVM with 5-fold CV

What gets trained?
Three models — Random Forest, XGBoost, and SVM — are combined into a soft-voting ensemble. Feature importance and SHAP explanations are auto-generated after training.
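As a rough illustration of what soft voting means here: each model outputs class probabilities, the probabilities are averaged per class, and the class with the highest average wins. The sketch below is pure Python and is not the application's actual training code; the function name `soft_vote` and the probability values are made up for illustration, and equal model weights are an assumption.

```python
def soft_vote(prob_by_model):
    """Average per-class probabilities across models and pick the top class.

    prob_by_model maps a model name to a list of rows, one row per sample,
    each row being [P(legit), P(fake)]. Equal weights are assumed.
    """
    models = list(prob_by_model.values())
    n_samples = len(models[0])
    results = []
    for i in range(n_samples):
        # Mean probability of each class over all models for sample i.
        avg = [sum(m[i][c] for m in models) / len(models) for c in range(2)]
        # Predicted label is the index of the largest averaged probability.
        results.append((avg.index(max(avg)), avg))
    return results


# Hypothetical probabilities for two samples (illustrative values only).
probs = {
    "random_forest": [[0.9, 0.1], [0.4, 0.6]],
    "xgboost": [[0.8, 0.2], [0.3, 0.7]],
    "svm": [[0.6, 0.4], [0.6, 0.4]],
}
votes = soft_vote(probs)
labels = [label for label, _ in votes]  # 0 = Legit, 1 = Fake
```

Unlike hard (majority) voting, a confident model can outweigh two uncertain ones, because the averaged probabilities decide the outcome rather than the count of votes.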
Required: the CSV must have a label column named one of: label, is_fake, fake, target, is_bot, bot (0 = Legit, 1 = Fake).
Select multiple files to merge. Max 20 MB per file.
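The upload rules above can be sketched as a small pre-check: find the label column, enforce the per-file size limit, and merge the rows. This is a minimal sketch using only the standard library, not the application's real upload handler. The helper names, the case-insensitive header match, the priority order of the accepted names, and treating 20 MB as 20 * 1024 * 1024 bytes are all assumptions.

```python
import csv
import io

# Accepted label column names, taken from the requirement text.
LABEL_COLUMNS = ("label", "is_fake", "fake", "target", "is_bot", "bot")
# Assumption: 1 MB is counted as 1024 * 1024 bytes.
MAX_BYTES = 20 * 1024 * 1024


def find_label_column(fieldnames):
    """Return the first accepted label column present in the header, or None."""
    # Assumption: header names are matched case-insensitively.
    lowered = {name.strip().lower(): name for name in fieldnames}
    for candidate in LABEL_COLUMNS:
        if candidate in lowered:
            return lowered[candidate]
    return None


def merge_csv_texts(texts):
    """Merge several CSV documents into one list of rows with a 'label' key."""
    rows = []
    for text in texts:
        if len(text.encode("utf-8")) > MAX_BYTES:
            raise ValueError("file exceeds the 20 MB limit")
        reader = csv.DictReader(io.StringIO(text))
        label_col = find_label_column(reader.fieldnames or [])
        if label_col is None:
            raise ValueError("no label column found")
        for row in reader:
            # Normalise whichever label column was found to a single name.
            row["label"] = int(row.pop(label_col))
            rows.append(row)
    return rows


# Two hypothetical uploads that use different label column names.
file_a = "followers,is_bot\n10,1\n200,0\n"
file_b = "followers,label\n5,1\n"
merged = merge_csv_texts([file_a, file_b])
```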

Balances the class distribution to improve recall on the minority class
More powerful but slower to train
Trains on older data and validates on newer data for a more realistic evaluation
Drops low-importance and highly correlated features automatically
Automatically finds the best RF/XGB hyperparameters (adds ~2–3 min)
More trials give better tuning but a longer wait
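Two of the options above, the time-based split and class balancing, can be illustrated together. The page does not say how balancing is implemented, so the sketch below assumes simple random oversampling of the minority class, applied to the training portion only so that duplicated rows never reach the validation set. The function names, the `created` field, the 80/20 split, and the sample data are hypothetical.

```python
import random


def time_split(rows, time_key, train_fraction=0.8):
    """Train on the oldest rows and validate on the newest ones."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def oversample_minority(rows, label_key="label", seed=0):
    """Duplicate random minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical accounts: ISO dates sort correctly as plain strings.
rows = [
    {"created": f"2024-01-{day:02d}", "label": 1 if day % 4 == 0 else 0}
    for day in (7, 2, 10, 4, 1, 9, 3, 8, 6, 5)
]
train, valid = time_split(rows, "created")
train_balanced = oversample_minority(train)
counts = {
    label: sum(1 for r in train_balanced if r["label"] == label)
    for label in (0, 1)
}
```

Splitting before balancing matters: if rows were oversampled first, copies of the same account could land on both sides of the split and inflate the validation score.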

Cross-validated metrics
Feature importance chart
Distribution analysis