In an article published in the journal Ampersand, researchers investigated the influence of first languages (L1) on the grammatical errors made by English as a second language (ESL) learners, using the First Certificate in English (FCE) corpus.
By analyzing three error types, the authors demonstrated statistically significant relationships between errors and linguistic characteristics of learners' L1s, confirming both positive and negative language transfer. The findings aligned with second language acquisition (SLA) literature and validated the use of grammatical error correction (GEC) corpora for crosslinguistic influence (CLI) analysis.
Background
CLI, also known as language transfer, refers to how a person's knowledge of one language affects their learning or use of another language. This phenomenon has been a significant focus within SLA research, with early studies dating back to the nineteenth century. CLI encompasses various aspects such as syntax, vocabulary, and concepts.
Although early CLI studies often faced challenges due to small sample sizes and confounding factors, recent advancements have led to the development of large SLA corpora like the International Corpus of Learner English (ICLE) and the Norwegian learner corpus Norsk andrespråkskorpus (ASK), enhancing the robustness of CLI research.
Parallel to SLA, natural language processing (NLP) has made significant strides, particularly in GEC, which involves detecting and correcting grammatical errors in learner texts. Key GEC corpora include the National University of Singapore Corpus of Learner English (NUCLE) and the Lang-8 Learner Corpora. However, these datasets often lack detailed annotations and balanced L1 distributions, limiting their utility for CLI analysis.
This paper sought to address these gaps by leveraging the FCE corpus, which included balanced L1 distributions and comprehensive error annotations. By analyzing the relationship between error types and L1s, this study aimed to validate the FCE corpus for CLI research, offering insights into how linguistic characteristics of L1s influence ESL learners' error patterns. This approach not only aligned with SLA methodologies but also enhanced the empirical foundation for studying CLI.
Analyzing Error Types in ESL Writing Based on L1 Characteristics
The authors used a subset of the FCE corpus, excluding texts from two L1 backgrounds due to their minimal representation. The ERRor ANnotation Toolkit (ERRANT), which is commonly used in GEC system evaluations such as the Conference on Computational Natural Language Learning (CoNLL) 2013 and 2014 shared tasks, identified and categorized the error types within this corpus. The analysis investigated the relationship between specific error types and the linguistic properties of various L1s compared to English. Error proportions were compared across L1s using the Wilcoxon rank sum test, with the significance level set at 0.05.
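As a rough illustration of the statistical comparison described above, the following sketch implements a two-sided Wilcoxon rank sum test with the normal approximation, using only the Python standard library. The function name rank_sum_test and the sample values are hypothetical and invented for illustration; they are not the authors' code or data. In practice, a library routine such as scipy.stats.ranksums would likely be used instead.

```python
import math

ALPHA = 0.05  # significance level reported in the article


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Ties receive average ranks; the variance is not tie-corrected.
    Returns the z statistic and the two-sided p-value.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Invented per-text error proportions for two hypothetical L1 groups.
group_a = [0.21, 0.19, 0.24, 0.22, 0.20, 0.23]
group_b = [0.10, 0.12, 0.09, 0.11, 0.13, 0.08]

z_stat, p_value = rank_sum_test(group_a, group_b)
significant = p_value < ALPHA
print(f"z = {z_stat:.3f}, p = {p_value:.4f}, significant = {significant}")
```

The normal approximation is reasonable for moderately sized samples; for very small groups an exact test would be more appropriate.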
Three error types were analyzed in detail: articles and determiners (DET), prepositions (PREP), and spelling (SPELL). These were chosen due to their high frequency in ESL writing and the varying complexity of their correction. DET errors had a limited set of corrections, PREP errors had a moderate range, and SPELL errors presented an unbounded set of possibilities. Additionally, extensive SLA research has focused on these error types.
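To make the notion of an error proportion concrete, the sketch below counts DET, PREP, and SPELL edits in ERRANT-style M2 annotation lines. It assumes the annotations are available in M2 format and that a proportion is the share of all edits in a text; the article does not specify either detail. The function name error_proportions and the sample sentence are hypothetical.

```python
from collections import Counter

TARGET_TYPES = ("DET", "PREP", "SPELL")


def error_proportions(m2_lines):
    """Return the share of edits falling into each target error type.

    Expects M2 lines, where edit lines look like
    'A start end|||TYPE|||correction|||REQUIRED|||-NONE-|||annotator'.
    """
    counts = Counter()
    total = 0
    for line in m2_lines:
        if not line.startswith("A "):
            continue  # skip sentence lines ('S ...') and blank lines
        error_type = line.split("|||")[1]
        if error_type == "noop":
            continue  # 'noop' marks a sentence with no edits
        total += 1
        # Drop the operation prefix (M, R, U): 'R:DET' -> 'DET'.
        category = error_type.split(":", 1)[-1]
        counts[category] += 1
    return {t: (counts[t] / total if total else 0.0) for t in TARGET_TYPES}


# Invented example annotation, not taken from the FCE corpus.
sample_m2 = [
    "S She bought car in the Monday and recieved it .",
    "A 2 2|||M:DET|||a|||REQUIRED|||-NONE-|||0",
    "A 3 4|||R:PREP|||on|||REQUIRED|||-NONE-|||0",
    "A 4 5|||U:DET||||||REQUIRED|||-NONE-|||0",
    "A 7 8|||R:SPELL|||received|||REQUIRED|||-NONE-|||0",
]

proportions = error_proportions(sample_m2)
print(proportions)
```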
L1s were grouped based on linguistic characteristics using the World Atlas of Language Structures (WALS). For DET errors, L1s were divided into those with and without article systems similar to English. PREP errors were grouped by the similarity of prepositional systems to English. For SPELL errors, L1s were categorized by whether they use the Roman writing system or belong to the Indo-European language family, though the latter grouping was noted as potentially less intuitive. This approach allowed the researchers to assess CLI on error types, providing insights into how L1 characteristics impact English language acquisition.
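The grouping step can be pictured as a lookup from each L1 to a binary linguistic feature, followed by pooling error proportions per group. The sketch below is a minimal illustration under stated assumptions: the feature table covers only a few L1s named in this article, it is hand-written rather than read from WALS, and every numeric value is a placeholder invented for the example. The names HAS_ARTICLES and group_by_feature are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hand-written feature table (assumption): whether the L1 has an
# article system similar to English. Not read from WALS.
HAS_ARTICLES = {
    "Russian": False,
    "Polish": False,
    "Korean": False,
    "Spanish": True,
}

# Placeholder (L1, DET error proportion) records, invented for illustration.
det_records = [
    ("Russian", 0.20),
    ("Polish", 0.18),
    ("Korean", 0.22),
    ("Spanish", 0.08),
    ("Spanish", 0.10),
]


def group_by_feature(records, feature):
    """Pool values by the feature value of each record's L1."""
    groups = defaultdict(list)
    for l1, value in records:
        groups[feature[l1]].append(value)
    return dict(groups)


groups = group_by_feature(det_records, HAS_ARTICLES)
summary = {flag: round(mean(values), 3) for flag, values in groups.items()}
print(summary)
```

The two pooled samples in groups are what a rank-based significance test would then compare.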
Results and Comprehensive Error Analysis
The analysis of error types in ESL writing based on L1 characteristics revealed significant variations across different language backgrounds. DET errors were more prevalent among L1s lacking an article system, such as Russian, Polish, and Korean, suggesting a challenge in acquiring English article usage due to a lack of positive transfer.
Conversely, L1s with linguistic structures similar to English, like Spanish, demonstrated fewer DET errors. PREP errors, in turn, showed patterns of negative transfer: L1s with prepositional systems, like Portuguese and Catalan, exhibited higher error rates than L1s grouped as lacking such systems, like Polish and Russian.
SPELL errors also displayed intriguing trends, with Indo-European L1s generally exhibiting higher error rates. This result contradicted earlier studies, suggesting complex interactions between L1 characteristics and spelling accuracy. Negative transfer was evident in misspellings that reflected similarities between L1 and English words. Additionally, underuse and overuse phenomena contributed to SPELL errors, particularly among learners whose L1s were more closely related to English.
These findings underscored the intricate interplay between L1 characteristics and ESL error patterns. Understanding these dynamics could inform language instruction tailored to learners' specific linguistic backgrounds, facilitating more effective language acquisition and error reduction. Further research on how L1 characteristics influence ESL writing errors could offer valuable insights for language educators and curriculum developers.
Conclusion
In conclusion, the researchers analyzed the FCE corpus to investigate the influence of L1 on grammatical errors made by ESL learners, demonstrating significant relationships between errors and L1 characteristics. The findings aligned with SLA literature, confirming both positive and negative language transfer. This research highlighted the potential of using GEC corpora for CLI analysis and suggested applications for enhancing language learning strategies and improving GEC models by incorporating L1 information. Future work could explore new CLI types and integrate L1 insights to refine GEC approaches.