A paper published in the journal PLOS ONE presented machine learning pronto (MLpronto), a user-friendly system tailored for ML analysis. This platform aimed to democratize machine learning by offering accessibility without the necessity for programming skills or prior machine learning knowledge. Utilizing a web interface, MLpronto processed data files swiftly, executing prevalent supervised machine learning algorithms and promptly presenting the analysis outcomes.
Notably, the system also generated the programming code corresponding to each machine learning analysis, catering to varied user preferences. It served as a no-code solution for citizen data scientists, an educational tool for beginners in machine learning, and a starting point for users who prefer to work with code, expediting the development of machine learning projects.
Background
The growing availability of large datasets makes extracting insights a challenge, one that machine learning's analysis tools are well suited to meet. Many of these tools, however, demand expertise and computing resources. Efforts to democratize machine learning have introduced accessible tools such as Copilot, ChatGPT, and Ghostwriter for code generation, alongside platforms such as OpenML and Kaggle for sharing datasets and experiments. Automated ML (AutoML) systems streamline end-to-end analysis and offer low-code approaches. MLpronto furthers this accessibility through a user-friendly interface that requires neither programming nor machine learning expertise.
MLpronto Workflow and Benchmark Evaluation
MLpronto, developed in Python utilizing the scikit-learn library, executes supervised machine learning algorithms for classification and regression. Its workflow involves:
- Taking structured data files.
- Offering several parameters for customization.
- Providing 16 algorithm options for analysis.
The output encompasses various analyses for training and testing data, such as principal component projections, feature relationships, and comprehensive metrics like accuracy, F1 score, R2 score, and more. Additionally, MLpronto generates Python code and a Jupyter notebook specific to the user's input, facilitating reproducibility and customization of analyses for evolving machine learning projects.
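The workflow described above can be illustrated with a minimal scikit-learn sketch. This is not the code that MLpronto generates. The function name `run_classification_sketch`, the built-in breast cancer dataset standing in for an uploaded file, the 80/20 split, and the weighted F1 average are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def run_classification_sketch(test_size=0.2, seed=0):
    """Fit a default gradient boosting classifier and report test metrics."""
    # Assumption: a built-in dataset stands in for a user's structured data file
    # (rows are samples, columns are features, one column is the target).
    X, y = load_breast_cancer(return_X_y=True)

    # Hold out part of the data so metrics are reported on unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Gradient boosting with scikit-learn's default hyperparameters.
    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Project the test data onto the first two principal components
    # learned from the training data.
    pca = PCA(n_components=2).fit(X_train)
    projection = pca.transform(X_test)

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="weighted"),
        "projection_shape": projection.shape,
    }


if __name__ == "__main__":
    print(run_classification_sketch())
```

For a regression problem, the same outline would apply with `GradientBoostingRegressor` and `r2_score` in place of the classifier and the classification metrics.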
Evaluating MLpronto's performance involved testing it alongside five other machine learning tools on datasets from the Penn Machine Learning Benchmarks (PMLB) collection. MLpronto, operating with default settings and employing gradient boosting across all datasets, was compared against Weka, PyTorch, Auto-Weka, Auto-PyTorch, and Auto-Sklearn. Weka and PyTorch represented single-model tools, while the other three were AutoML tools that construct model ensembles. The evaluation spanned 286 benchmark datasets, with settings and execution times specific to each tool; the AutoML tools continuously explored the search space to improve performance within specified time constraints.
MLpronto: User-friendly, Swift, and Competitive
MLpronto, aimed at democratizing machine learning, operates through a user-friendly web interface, eliminating the need for coding and ensuring rapid execution. To gauge its predictive accuracy, researchers compared MLpronto with five advanced ML tools: Weka, PyTorch, Auto-Weka, Auto-PyTorch, and Auto-Sklearn. Weka and PyTorch, like MLpronto, are single-model tools in which users choose a specific model. PyTorch is a Python framework for deep learning models, while Weka is built on Java.
Conversely, Auto-Weka, Auto-PyTorch, and Auto-Sklearn are AutoML systems that use meta-learning and Bayesian optimization to select learning algorithms and tune hyperparameters across their search spaces. MLpronto's emphasis on user-friendliness contrasts with the objective of AutoML systems, which strive to maximize predictive accuracy through sophisticated model ensembles.
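The idea of searching a hyperparameter space until a time limit expires can be sketched as follows. This sketch deliberately swaps in plain random search, not the meta-learning or Bayesian optimization that the AutoML systems use, and it builds no ensemble. The function `budgeted_random_search`, the candidate values, and the time budget are hypothetical choices for illustration only.

```python
import random
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score


def budgeted_random_search(X, y, time_budget_s=5.0, seed=0):
    """Sample hyperparameters at random until the time budget runs out."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_score, best_params, n_trials = float("-inf"), None, 0

    # Always evaluate at least one configuration, then stop at the deadline.
    while n_trials == 0 or time.monotonic() < deadline:
        # Hypothetical search space; real AutoML tools search far larger ones.
        params = {
            "n_estimators": rng.choice([25, 50, 100]),
            "learning_rate": rng.choice([0.01, 0.05, 0.1, 0.2]),
            "max_depth": rng.choice([2, 3, 4]),
        }
        model = GradientBoostingClassifier(random_state=seed, **params)
        # Mean weighted F1 over 3-fold cross-validation.
        score = cross_val_score(model, X, y, cv=3, scoring="f1_weighted").mean()
        n_trials += 1
        if score > best_score:
            best_score, best_params = score, params

    return best_params, best_score, n_trials


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
    print(budgeted_random_search(X_demo, y_demo, time_budget_s=2.0))
```

A longer budget allows more configurations to be tried, which mirrors the trade-off between runtime and accuracy that the comparison examines.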
In evaluating MLpronto's performance against these tools, researchers conducted assessments across 286 benchmark datasets encompassing 165 classification and 121 regression problems. For classification datasets, median F1 scores revealed that MLpronto performed comparably to Weka and PyTorch, achieving a median F1 score of 0.86 compared to 0.84 and 0.82, respectively. The runtime analysis indicated generally faster operations for Weka and MLpronto. In regression datasets, MLpronto exhibited significantly higher median R2 scores (0.87) compared to Weka (0.66) and PyTorch (0.79), showcasing superior performance. MLpronto also demonstrated faster runtimes in this domain.
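The aggregation behind such figures, one score per dataset summarized by the median, can be sketched as below. Small synthetic datasets stand in for the PMLB benchmarks, so the resulting numbers are not comparable to those reported in the paper. The function names, dataset sizes, 75/25 split, and weighted F1 average are assumptions.

```python
from statistics import median

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.metrics import f1_score, r2_score
from sklearn.model_selection import train_test_split


def score_classification(seed):
    """Test-set F1 of default gradient boosting on one synthetic dataset."""
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=5, random_state=seed
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    return f1_score(y_te, model.predict(X_te), average="weighted")


def score_regression(seed):
    """Test-set R2 of default gradient boosting on one synthetic dataset."""
    X, y = make_regression(
        n_samples=300, n_features=10, n_informative=5, noise=10.0, random_state=seed
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = GradientBoostingRegressor(random_state=seed).fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))


def median_scores(seeds=(0, 1, 2, 3, 4)):
    """Summarize per-dataset scores by their median, one dataset per seed."""
    return {
        "median_f1": median(score_classification(s) for s in seeds),
        "median_r2": median(score_regression(s) for s in seeds),
    }


if __name__ == "__main__":
    print(median_scores())
```

The median is less sensitive than the mean to a few datasets on which a tool performs unusually poorly, which matters for R2 because it has no lower bound.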
Comparisons with AutoML tools for classification problems revealed similar median F1 scores across various runtimes. MLpronto maintained a median F1 score of 0.86, in line with the scores achieved by Auto-Weka, Auto-PyTorch, and Auto-Sklearn. In regression problems, MLpronto exhibited a median R2 score of 0.87, surpassing Auto-Weka and Auto-PyTorch at shorter runtimes. Auto-Sklearn scored slightly higher than MLpronto, although the difference was not statistically significant. Notably, MLpronto consistently ran faster in both the classification and regression domains.
These findings indicate that, despite prioritizing simplicity and user-friendliness, MLpronto achieves performance competitive with sophisticated AutoML systems across a diverse range of datasets. While AutoML systems may offer enhanced accuracy with longer runtimes, MLpronto stands out for its swift execution and no-code approach, making it an appealing choice for users seeking efficient and accessible machine learning tools. The close alignment between its results and those of AutoML systems also challenges the perception that usability must be traded off against predictive accuracy in machine learning tools.
Conclusion
To sum up, MLpronto is a user-friendly ML tool designed to democratize the field and make it accessible to everyone. However, it does have some limitations. It currently focuses on supervised learning methods and does not support unsupervised methods, hyperparameter tuning, or data acquisition and wrangling.
Additionally, the web server has a file size limit of 100 megabytes. MLpronto only accepts structured data in text or spreadsheet format and does not support images, audio files, or unstructured text. Despite these limitations, the developers plan to expand its capabilities by supporting different types of input and optimizing the hyperparameter search space. They also intend to output code in multiple programming languages. Overall, MLpronto aims to make ML accessible and usable for all.