In an article recently published in the journal Scientific Reports, researchers proposed an intelligent English composition grading method integrating deep learning (DLIECG) for scoring English compositions in higher vocational colleges.
Background
English is a compulsory course in higher vocational colleges and in compulsory education in countries such as China. Exam-oriented English education in China relies primarily on written tests, which holistically evaluate students' English proficiency based on their writing ability, reading comprehension, and vocabulary.
Among these tests, the English composition examination comprehensively assesses students' language ability, from long and difficult sentences, vocabulary, and grammar to their overall ability to express ideas in text. However, comprehensively and efficiently assessing every student and providing timely feedback on their writing problems can be extremely challenging for teachers.
Additionally, composition grading can be affected by teachers' subjective factors. Thus, the long grading time, slow feedback, and subjectivity of conventional English essay grading must be addressed by developing an effective intelligent English composition scoring method.
The proposed deep learning-based approach
In this study, researchers proposed a topic decision (TD) model to determine the topic relevance score of English compositions based on topic richness. They then proposed the DLIECG method, which combines deep learning with artificial feature extraction on top of a score of relevance based on topic richness (TRSR) calculation method, to address the limitations of conventional automatic English composition grading methods and topic relevance feature extraction methods.
The TRSR was designed to capture the topic richness dimension of English compositions and thereby achieve efficient and objective intelligent scoring. The preparation stage of the intelligent English composition scoring method comprised recurrent neural networks (RNN), pre-trained word vectors (PWV), text segmentation (TS), and transfer learning (TL).
PWV encode semantic and syntactic information into dense vectors, which resolves the curse of dimensionality caused by conventional one-hot encoding. RNNs are used extensively in natural language processing; they address the inability of convolutional neural networks to extract global semantics, the large storage footprint of conventional language models, and the lack of input ordering in feedforward neural networks.
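As a rough illustration of the contrast between one-hot encoding and dense pre-trained word vectors fed into a recurrent encoder, the following Python sketch may help. It is not from the paper; the vocabulary size, embedding dimension, and the GRU encoder are illustrative assumptions.

```python
# Illustrative sketch only; the paper does not publish its implementation.
# Vocabulary size, embedding dimension, and the GRU choice are assumptions.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 300, 64

# One-hot encoding: each token becomes a sparse vocab_size-dimensional vector.
token_ids = torch.tensor([[12, 845, 3, 977]])            # one toy sentence
one_hot = nn.functional.one_hot(token_ids, vocab_size).float()
print(one_hot.shape)                                      # (1, 4, 10000): high-dimensional and sparse

# Dense word vectors: the same tokens map to compact vectors that encode
# semantic/syntactic similarity. In practice the weight matrix would be
# initialized from pre-trained vectors rather than learned from scratch.
embedding = nn.Embedding(vocab_size, embed_dim)
dense = embedding(token_ids)
print(dense.shape)                                        # (1, 4, 300): low-dimensional and dense

# A recurrent encoder reads the sequence in order, so sentence-level
# semantics accumulate in the final hidden state.
rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
outputs, last_hidden = rnn(dense)
print(last_hidden.shape)                                  # (1, 1, 64)
```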
TL resolves the contradiction between personalized needs and general models, and between scarce labels and big data. TL techniques can be categorized as model-based, relationship-based, instance-based, and feature-based. Model-based TL trains a model on a substantial amount of source-domain data and then reuses it for prediction in the target domain.
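A minimal sketch of model-based transfer learning in this spirit is shown below. The architecture, layer names, and freezing strategy are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of model-based transfer learning; not the paper's code.
import torch
import torch.nn as nn

class EssayScorer(nn.Module):
    """Hypothetical scorer: a shared text encoder followed by a task-specific head."""
    def __init__(self, input_dim=768, hidden_dim=64, num_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        return self.head(self.encoder(x))

# 1) Train on abundant source-domain data, then save the learned weights.
source_model = EssayScorer()
# ... train source_model on the large source-domain dataset ...
torch.save(source_model.state_dict(), "source_model.pt")

# 2) Reuse the trained model in the target domain: load the source weights,
#    freeze the shared encoder, and fine-tune only the head on the much
#    smaller target-domain dataset.
target_model = EssayScorer()
target_model.load_state_dict(torch.load("source_model.pt"))
for p in target_model.encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(target_model.head.parameters(), lr=1e-3)
# ... fine-tune target_model.head on the target-domain dataset ...
```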
TS separates written text into meaningful units. Based on segmentation granularity, these tasks are divided into topic segmentation and basic discourse unit segmentation. Topic segmentation divides a passage of text using topic-level semantic information, with each topic forming a continuous span.
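To make the idea of topic segmentation concrete, here is a hedged sketch of a simple similarity-based segmenter (a TextTiling-style heuristic). The lexical-overlap similarity, the threshold, and the sample sentences are assumptions, not the method used in the paper.

```python
# Toy topic-segmentation sketch (TextTiling-style heuristic); not the paper's method.
def similarity(a: str, b: str) -> float:
    """Toy lexical similarity; a real segmenter would compare dense sentence embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def topic_segments(sentences: list[str], threshold: float = 0.1) -> list[list[str]]:
    """Group consecutive sentences into continuous topic spans, splitting at low-similarity boundaries."""
    segments, current = [], [sentences[0]]
    for prev, curr in zip(sentences, sentences[1:]):
        if similarity(prev, curr) < threshold:   # adjacent sentences diverge -> new topic
            segments.append(current)
            current = [curr]
        else:
            current.append(curr)
    segments.append(current)
    return segments

essay = [
    "my hometown is a quiet town with a river",
    "the river in my town is long and clean",
    "english writing is difficult for many students",
    "students improve their writing through daily practice",
]
print(topic_segments(essay))   # -> two continuous topic spans
```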
Evaluation and findings
The F1 value, recall rate, and accuracy rate were used as evaluation indicators to verify the proposed TD model. Researchers used the W and P datasets for the experiments; the data in both were divided into training, validation, and testing sets at an 8:1:1 ratio.
The training set was used to learn the patterns and features of the task, while the validation set was used to tune the model's hyperparameters and monitor underfitting/overfitting. Finally, the testing set was used to assess the trained model's performance. The model depth, hidden layer size, and input dimension were set to 2, 64, and 768, respectively.
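For reference, the sketch below shows one way to realize the reported 8:1:1 split and model configuration. Only the split ratio and the depth/hidden-size/input-dimension values come from the article; the framework, dummy data, random seed, and GRU layer are assumptions.

```python
# Sketch of the reported data split and model configuration; framework details are assumed.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the W or P dataset: 768-dimensional inputs, binary topic labels.
features = torch.randn(1000, 768)
labels = torch.randint(0, 2, (1000,))
dataset = TensorDataset(features, labels)

# Training : validation : testing = 8 : 1 : 1, as reported in the article.
n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(0),
)

# Reported hyperparameters: depth 2, hidden size 64, input dimension 768.
model = nn.GRU(input_size=768, hidden_size=64, num_layers=2, batch_first=True)
```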
The results showed that the TD model achieved its best performance after 80 training iterations, with a corresponding F1 value, recall, and accuracy of 0.95, 0.93, and 0.97, respectively. Additionally, the training loss eventually stabilized at 0.03. A comparative analysis between the TD model and an RNN model was performed to verify the TD model's performance.
The results showed that the proposed TD model achieved higher accuracy and F1 values than the RNN model; for instance, its F1 value was 4.69% higher and its accuracy 5.67% higher.
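The evaluation indicators above can be computed from model predictions as in the following scikit-learn sketch. The label and prediction arrays are made-up placeholders, not the study's outputs, and the relative-gain helper is only one possible way to express the percentage gains quoted above.

```python
# Sketch of computing the evaluation indicators; the arrays below are placeholders.
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]        # gold topic-relevance labels (made up)
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]        # model predictions (made up)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}, F1={f1:.2f}")

# One way to express the improvement of one model over another as a percentage.
def relative_gain(new: float, baseline: float) -> float:
    return (new - baseline) / baseline * 100
```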
Moreover, dataset P and the correction network's machine evaluation dataset were used as experimental data to evaluate the effectiveness of the DLIECG method. The proposed method was also compared with a machine learning-based intelligent scoring algorithm for English writing quality (MLIS), an RNN model, and a BiLSTM model.
Results showed that the DLIECG method performed significantly better than both the BiLSTM and RNN models. It also outperformed online machine evaluation on dataset P, achieving a maximum score of 0.980.
To summarize, the findings of this study demonstrated the effectiveness and reliability of the proposed DLIECG method for intelligent English composition grading.