In an article recently published in the journal Nature Communications, researchers proposed, for the first time, a deep learning (DL)-based approach for the label-free identification of neurodegenerative disease (NDD)-associated aggregates (LINA).
Background
NDDs, such as Huntington’s disease (HD) and Alzheimer’s disease (AD), are major health concerns that affect millions of patients around the world and are currently incurable. Protein misfolding and the aggregation of misfolded proteins play crucial roles in the pathogenesis of different NDDs.
However, the role and formation mechanisms of misfolded protein aggregates in disease pathogenesis remain incompletely understood, owing to a lack of methods and tools that enable direct monitoring of the various phases of protein misfolding and aggregation.
Fluorescent proteins (FPs), such as mCherry, YFP, and GFP, are typically fused to the N- or C-terminus of the protein of interest to monitor protein aggregation in living cells. However, fluorescently tagged proteins demonstrate altered cellular, biophysical, or biochemical properties.
In the context of NDDs, the addition of FPs to proteins of various sizes can alter the aggregation kinetics, the final aggregate size, the aggregate interactome, and the ultrastructural organization of aggregates in the final inclusions that accumulate in brain tissues/cells.
Multiple NDD-associated proteins have shown this discrepancy, including exon1 of the Huntingtin (Htt) protein (Httex1), one of the key intracellular protein aggregate components found in HD post-mortem brains. Httex1 overexpression in several animal and cellular models of HD recapitulates multiple key HD human pathology features, including Htt aggregation and inclusion formation, brain atrophy, and neurodegeneration.
The DL-based approach
In this study, researchers proposed an approach for LINA based on DL to detect unaltered and unlabeled Httex1 aggregates in living cells from transmitted-light images without fluorescent labeling.
Researchers applied the virtual labeling concept to a well-characterized cellular model of mutant Httex1 aggregation and inclusion formation. To generate the LINA models, they trained a neural network on data acquired with common imaging modalities, including brightfield/quantitative phase imaging (QPI) and widefield fluorescence.
Most imaging was performed using a custom-built microscope equipped with a carbon dioxide and temperature-controlled incubator for live cell imaging. A dataset of more than 1000 eight-plane fluorescence and brightfield image pairs of fixed HEK 293 (HEK) cells overexpressing a mutant Httex1 construct with 72 polyQ repeat length fused to GFP (Httex1-72Q-GFP) was created using the microscope.
The dataset was processed to obtain pixel-registered, eight-plane phase (QPI) and fluorescence images. Ten percent of the dataset was used as the test set, while the rest was split into validation and training sets, with 20% being used for validation. The training set was utilized to train a deep convolutional neural network (CNN) and generate LINA models for both pixel classification and pixel regression. To reduce the mapping complexity, every ground truth image was produced as the maximum z-projection of the corresponding eight-plane image stack.
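The two data-preparation steps described above, the maximum z-projection and the test/validation/training split, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: the function names, the random shuffling, the seed, and the reading of "20%" as 20% of the non-test remainder are all assumptions.

```python
import numpy as np


def max_z_projection(stack):
    """Collapse a (planes, height, width) stack into one 2D image
    by taking the per-pixel maximum along the z (plane) axis."""
    return np.asarray(stack).max(axis=0)


def split_indices(n_items, test_frac=0.10, val_frac=0.20, seed=0):
    """Return (train, val, test) index arrays.

    Assumption: test_frac is taken from the whole dataset, and
    val_frac is taken from what remains after the test set is removed.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_test = int(round(n_items * test_frac))
    test, rest = order[:n_test], order[n_test:]
    n_val = int(round(len(rest) * val_frac))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# Hypothetical eight-plane stack of 2x3 images, for illustration only.
stack = np.arange(8 * 2 * 3).reshape(8, 2, 3)
projection = max_z_projection(stack)
train_idx, val_idx, test_idx = split_indices(1000)
```

Under these assumptions, a dataset of 1000 image pairs would yield 100 test, 180 validation, and 720 training pairs.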
Significance of the study
Researchers used the ‘pearsonr’ function in the SciPy library (v1.7.3) to measure the Pearson correlation coefficient (r) between the ground truth and the network output to evaluate the reliability of the LINA pixel-regression model.
The correlation was consistently high throughout the test set, indicating the model's high reliability. Researchers also assessed the model’s quantitative performance by measuring the normalized mean squared error (NMSE) alongside r. The correlation was very high, and the errors were consistently low.
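The two image-level metrics mentioned above can be illustrated with a short NumPy sketch. It computes r with np.corrcoef to stay dependency-light; scipy.stats.pearsonr returns the same coefficient as its first result. The NMSE normalization used here (squared error divided by the summed squared ground truth) is one common convention and is an assumption, since the article does not state which definition was used.

```python
import numpy as np


def pearson_r(truth, prediction):
    """Pearson correlation between two images, compared pixel by pixel."""
    t = np.asarray(truth, dtype=float).ravel()
    p = np.asarray(prediction, dtype=float).ravel()
    return float(np.corrcoef(t, p)[0, 1])


def nmse(truth, prediction):
    """Normalized mean squared error.

    Assumption: normalized by the summed squared ground truth;
    other normalizations exist and may differ from the paper's.
    """
    t = np.asarray(truth, dtype=float)
    p = np.asarray(prediction, dtype=float)
    return float(np.sum((t - p) ** 2) / np.sum(t ** 2))


# Toy 2x2 "images", for illustration only.
truth = np.array([[1.0, 2.0], [3.0, 4.0]])
scaled = 2.0 * truth + 1.0  # perfectly correlated, but not equal
r_scaled = pearson_r(truth, scaled)
nmse_scaled = nmse(truth, scaled)
```

The toy example shows why the two metrics complement each other: a prediction that is a linear rescaling of the ground truth still has r close to 1, while its NMSE is far from 0.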
Although the pixel classification model demonstrated good performance with a mean Jaccard index of 0.78, the pixel-regression model was more accurate with a mean Jaccard index of 0.81 when segmentations were produced from its regression predictions.
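For readers unfamiliar with the Jaccard index, the sketch below shows how it can be computed for two binary masks, and how a mask might be derived from a regression output by thresholding. The threshold value, the function names, and the convention for two empty masks are assumptions made for illustration; the article does not describe how the segmentations were produced from the regression predictions.

```python
import numpy as np


def segment(regression_image, threshold):
    """Binary mask of pixels whose predicted intensity exceeds a threshold
    (hypothetical segmentation rule)."""
    return np.asarray(regression_image) > threshold


def jaccard_index(mask_a, mask_b):
    """Intersection over union of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


# Toy data, for illustration only.
truth_mask = np.array([[1, 1, 0], [0, 0, 0]], dtype=bool)
predicted = np.array([[0.9, 0.2, 0.8], [0.0, 0.0, 0.0]])
score = jaccard_index(truth_mask, segment(predicted, threshold=0.5))
```

In this toy case the masks share one pixel out of three covered by either mask, so the index is 1/3; a value of 1 means identical masks.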
Researchers used r to compare the total intensity in the aggregates in the prediction images with the ground truth. The results showed a very high correlation, with an r value of 0.91. To determine the effectiveness of the models on unlabeled protein aggregates and their generalizability to other constructs, they validated the network on various label-free mutant Httex1 constructs: unlabeled mutant Httex1 with 72 polyQ repeat length and a truncated Nt17 domain (Httex1-ΔNt17-72Q), unlabeled mutant Httex1 with 72 polyQ repeat length (Httex1-72Q), and unlabeled mutant Httex1 with 39 polyQ repeat length (Httex1-39Q).
Although the network was only trained on GFP-labeled protein aggregates, it accurately identified the aggregates for all three constructs, which indicated that the model was generalizable to different label-free Httex1 constructs. The LINA model can precisely identify unlabeled mutant Httex1 aggregates with areas of around 3 µm² and at 3 ms exposure times.
Moreover, applying the proposed neural network models to identify aggregates formed by various label-free mutant Httex1 constructs effectively enabled the measurement and comparison of the aggregates’ dry masses. The models could also identify aggregates from live-cell imaging data and measure the area and dry mass of aggregates during their formation.
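As a rough illustration of how area and dry mass can be derived from a predicted aggregate mask and a quantitative phase image, the sketch below uses the standard QPI relation in which dry mass is proportional to the phase integrated over the object. It is not taken from the paper: the pixel size, the wavelength, and the specific refractive increment (about 0.2 µm³/pg is a commonly quoted value for proteins) are all assumed values, and the function names are hypothetical.

```python
import numpy as np


def aggregate_area_um2(mask, pixel_size_um):
    """Area of the masked region in square micrometers."""
    return float(np.count_nonzero(mask) * pixel_size_um ** 2)


def dry_mass_pg(phase_rad, mask, pixel_size_um,
                wavelength_um=0.55, alpha_um3_per_pg=0.2):
    """Dry mass in picograms from a phase image (radians).

    Uses m = wavelength / (2 * pi * alpha) * sum(phase) * pixel_area.
    wavelength_um and alpha_um3_per_pg are assumed defaults,
    not values reported in the article.
    """
    phase = np.asarray(phase_rad, dtype=float)
    inside = np.asarray(mask, dtype=bool)
    integrated_phase = phase[inside].sum() * pixel_size_um ** 2  # rad * µm²
    return float(wavelength_um / (2 * np.pi * alpha_um3_per_pg) * integrated_phase)


# Toy 4x4 phase image with a uniform phase of 2*pi, for illustration only.
phase = np.full((4, 4), 2 * np.pi)
mask = np.ones((4, 4), dtype=bool)
area = aggregate_area_um2(mask, pixel_size_um=0.5)
mass = dry_mass_pg(phase, mask, pixel_size_um=0.5)
```

Applying the same two functions to each frame of a time-lapse would give area and dry mass as functions of time, which is the kind of measurement during aggregate formation that the article describes.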
To summarize, the findings of this study demonstrated that the proposed DL-based approach can feasibly be used for the virtual labeling of unaltered and unlabeled protein aggregates in living cells with rapid speed, high accuracy, and relative simplicity.
Journal reference:
- Ibrahim, K. A., Grußmayer, K. S., Riguet, N., Feletti, L., Lashuel, H. A., Radenovic, A. (2023). Label-free identification of protein aggregates using deep learning. Nature Communications, 14(1), 1-11. https://doi.org/10.1038/s41467-023-43440-7, https://www.nature.com/articles/s41467-023-43440-7