In a paper published in the journal Minerals, researchers investigated whether selenium (Se) and tellurium (Te) can distinguish different genetic types of ore deposits. They collected trace element data from pyrite, sphalerite, and chalcopyrite in Se-Te bearing deposits and used principal component analysis (PCA) and the silhouette coefficient method to identify the key distinguishing elements.
Because two-dimensional (binary) discriminant diagrams built from these elements showed low accuracy, random forest (RF) and support vector machine (SVM) models were developed for better classification. The RF model for pyrite achieved the higher accuracy of the two, with Se being crucial for distinguishing volcanogenic massive sulfide (VMS) and epithermal deposits and Te being significant for Carlin-type deposits. This study highlights the effectiveness of machine learning (ML) in improving ore genetic type classification and aiding mineral resource exploration.
Trace Element Analysis
Over 10,424 trace element data points for pyrite, sphalerite, and chalcopyrite from deposits worldwide, all measured by laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS), were systematically collected and analyzed. The dataset covered pyrite from seven genetic types, sphalerite from six, and chalcopyrite from five, with 12 to 15 elements analyzed.
Significant differences in element contents, such as higher Se in VMS and epithermal deposits and lower Se in sedimentary exhalative (SEDEX) deposits, were observed, aiding in distinguishing ore genetic types. These findings highlight the need for advanced methods to improve classification accuracy.
Study Framework Overview
The general framework of this study consists of four key parts: data collection, preprocessing, two-dimensional discriminant diagram plotting, and high-dimensional classifier model construction. Initially, the team established the trace elemental dataset and conducted data preprocessing to address missing values and normalize the data.
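The preprocessing step can be illustrated with a minimal sketch. The article does not say how missing values were filled or which normalization was applied, so the choices here (median imputation, a log10 transform, then z-scoring each column) are assumptions, and the Se and Te values are hypothetical.

```python
import math
import statistics

def preprocess(rows):
    """Impute gaps with the column median, log10-transform, then z-score each column.

    Assumed pipeline, not the one confirmed by the paper. Values must be
    positive, so below-detection-limit zeros would need separate handling.
    """
    n_cols = len(rows[0])
    columns = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        median = statistics.median(observed)
        # Fill missing entries, then log-transform because concentrations
        # typically span several orders of magnitude.
        logged = [math.log10(row[j] if row[j] is not None else median) for row in rows]
        mean = statistics.fmean(logged)
        std = statistics.pstdev(logged)
        # Guard against a constant column, which has zero spread.
        columns.append([(v - mean) / std if std > 0 else 0.0 for v in logged])
    # Transpose back to one list per sample.
    return [list(sample) for sample in zip(*columns)]

# Hypothetical (Se, Te) concentrations in ppm; None marks a missing value.
raw = [
    [12.0, 0.5],
    [150.0, None],
    [48.0, 3.2],
    [None, 20.0],
]
scaled = preprocess(raw)
for sample in scaled:
    print([round(v, 3) for v in sample])
```

After this step every column has zero mean and unit standard deviation, which keeps high-concentration elements from dominating distance-based methods such as PCA and SVM.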
Two critical operations followed: generating two-dimensional discriminant diagrams for visually identifying different deposit geneses, and constructing high-dimensional classifier models to enhance classification accuracy. To generate the discriminant diagrams, PCA was used to select the elements with the highest contributions and lowest correlations, especially Se and Te; diagrams built from these elements were then evaluated with silhouette coefficients. The high-dimensional classifier models, specifically SVM and RF, were trained on 80% of the dataset with hyperparameters optimized for best performance, and evaluated on the remaining 20%. Feature importance was analyzed using the SHapley Additive exPlanations (SHAP) algorithm to interpret the models and identify the most significant elements.
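The 80/20 training workflow described above might look roughly like the following scikit-learn sketch. The data are synthetic, and the three classes, five features, and hyperparameter grids are illustrative assumptions; the article does not report the grids or search strategy the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for preprocessed trace element data:
# 3 hypothetical deposit classes, 5 hypothetical element features.
n_per_class, n_features = 60, 5
centers = rng.normal(0.0, 3.0, size=(3, n_features))
X = np.vstack([c + rng.normal(0.0, 1.0, size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(3), n_per_class)

# Hold out 20% for the final evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Small illustrative grids (assumed), tuned with 5-fold cross-validation.
searches = {
    "RF": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=5,
    ),
    "SVM": GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]},
        cv=5,
    ),
}

test_accuracy = {}
for name, search in searches.items():
    search.fit(X_train, y_train)  # tuning sees only the training split
    test_accuracy[name] = search.score(X_test, y_test)  # accuracy on held-out 20%
    print(name, search.best_params_, round(test_accuracy[name], 3))
```

Tuning only on the training split keeps the held-out 20% untouched, so the reported test accuracy is not inflated by the hyperparameter search.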
ML methods such as PCA, silhouette coefficient, RF, and SVM played crucial roles in this study. PCA was employed to reduce dimensionality and identify relationships among trace elements and ore genetic types. The silhouette coefficient quantified the effectiveness of clustering in the discriminant diagrams. RF and SVM were chosen for their proven effectiveness in geological data processing.
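To make the silhouette coefficient concrete, here is a small pure-Python sketch of the standard definition, s = (b - a) / max(a, b), where a is a point's mean distance to its own group and b is its mean distance to the nearest other group. The points and deposit labels are hypothetical, and Euclidean distance is assumed.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton groups
            continue
        # a: mean distance to the other members of the point's own group
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical positions of six samples on a two-element discriminant diagram.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated = ["VMS", "VMS", "VMS", "Carlin", "Carlin", "Carlin"]
mixed = ["VMS", "Carlin", "VMS", "Carlin", "VMS", "Carlin"]

score_separated = silhouette_score(points, separated)
score_mixed = silhouette_score(points, mixed)
print(round(score_separated, 3), round(score_mixed, 3))
```

Scores near 1 indicate that deposit types form tight, well-separated groups on the diagram, while scores near 0 or below indicate overlap. In practice, `sklearn.metrics.silhouette_score` computes the same quantity.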
Model evaluation was performed using cross-validation, confusion matrices, and statistical metrics such as accuracy, precision, recall, and F1 score. Feature importance analysis with the SHAP algorithm enhanced model interpretability by quantifying the contribution of each trace element to ore genetic type identification. This work led to a preliminary software application for predicting Se-Te bearing deposits based on pyrite trace elements.
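The evaluation metrics named above all derive from the confusion matrix. The following sketch shows the arithmetic on a handful of hypothetical predictions; the labels and values are invented for illustration and are not results from the paper.

```python
from collections import Counter

def classification_report(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    # Confusion matrix stored as counts keyed by (true label, predicted label).
    confusion = Counter(zip(y_true, y_pred))
    accuracy = sum(confusion[(c, c)] for c in labels) / len(y_true)

    per_class = {}
    for c in labels:
        tp = confusion[(c, c)]
        predicted = sum(confusion[(t, c)] for t in labels)  # column total
        actual = sum(confusion[(c, p)] for p in labels)     # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class

# Hypothetical true and predicted deposit types for eight pyrite samples.
y_true = ["VMS", "VMS", "VMS", "Carlin", "Carlin", "porphyry", "porphyry", "porphyry"]
y_pred = ["VMS", "VMS", "Carlin", "Carlin", "Carlin", "porphyry", "VMS", "porphyry"]

accuracy, per_class = classification_report(y_true, y_pred)
print("accuracy:", accuracy)
for label, metrics in per_class.items():
    print(label, {k: round(v, 3) for k, v in metrics.items()})
```

Per-class precision and recall matter here because overall accuracy can hide a deposit type that the model confuses systematically with another.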
Dimension Reduction Analysis
For the pyrite data, PCA reduced the dimensionality of the dataset: PC1 and PC2 captured 33.2% and 19.8% of the variance, respectively, together explaining just over 50% (53.0%) of the data variability. The resulting biplot showed a clear demarcation for magmatic sulfide, Carlin, and porphyry deposits. Despite limitations and overlaps among the other ore genetic types, the discriminant diagrams showed potential for differentiating specific deposit types, especially when built from elements such as Se and Te that exhibit low or negative correlations.
For sphalerite, PC1 and PC2 captured 30.1% and 18.1% of the variance, respectively (48.2% combined), revealing a clear distinction for porphyry deposits but significant overlap among the others. Strong negative correlations between elements such as Se and Te aided class differentiation, and optimal end-element combinations, such as Te and Se/Cd, showed high silhouette coefficient scores.
Similarly, for chalcopyrite, PC1 and PC2 together did not explain more than 50% of the data variability, indicating complex geochemical processes. End-element combinations such as Te and Zn/Se were identified, but the complexity of these processes calls for caution when interpreting the results.
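The percentages quoted for PC1 and PC2 are explained-variance ratios. The sketch below shows one common way to obtain them, PCA via singular value decomposition with NumPy, on synthetic data (the 200 samples and six element columns are assumptions), and then sums the two figures reported in this section for pyrite and sphalerite.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in: 200 samples x 6 hypothetical element columns
# driven by two latent factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.5 * rng.normal(size=(200, 6))

# Standardize each column so that PCA works on the correlation structure.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: squared singular values are proportional to the variance
# captured by each principal component.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained_ratio)

# Rows of Vt are the principal axes; large absolute loadings mark the
# elements that contribute most to a component.
loadings_pc1 = Vt[0]
correlations = np.corrcoef(Z, rowvar=False)  # pairwise element correlations

print("explained variance ratio:", np.round(explained_ratio, 3))
print("cumulative:", np.round(cumulative, 3))

# Combined PC1 + PC2 share from the percentages reported in the text.
reported = {"pyrite": (33.2, 19.8), "sphalerite": (30.1, 18.1)}
reported_totals = {mineral: round(pc1 + pc2, 1) for mineral, (pc1, pc2) in reported.items()}
print(reported_totals)
```

When the first two components explain only about half of the variance, as reported here, a two-dimensional plot necessarily discards much of the structure, which is consistent with the overlaps seen among deposit types.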
Conclusion
In summary, the analysis found that Se content generally exceeded Te content in the collected metal sulfides due to Se's higher sulfur affinity. However, the two-dimensional discriminant diagrams distinguished ore genetic types with only limited effectiveness across the pyrite, sphalerite, and chalcopyrite datasets. By contrast, the pyrite dataset performed strongly in both the RF and SVM models, achieving accuracies of 94.5% and 93.8%, respectively, making it the most suitable for training classifiers.
Feature importance analysis highlighted Se's significance in VMS and epithermal deposits and Te's role in distinguishing Carlin-type deposits. It enhanced the interpretability of the classification model and provided crucial insights for rapid regional-scale ore exploration. Additionally, a web application was deployed to aid geologists in accessing predictions and facilitating ore exploration efforts.