Date of Award

12-2024

Document Type

Open Access Thesis

Degree Name

Master of Arts (MA)

Department

Linguistics, Applied

First Advisor

Charles F. Meyer

Second Advisor

George Mikros

Abstract

Phonics instruction is widely regarded as the most efficacious method for teaching children to read English. To learn to read, young learners must connect orthography to phonology. An obstacle to English reading acquisition is the language's opaque orthography: its grapheme-to-phoneme correspondence is not 1:1. Instead, a range of multi-letter strings, or graphotactics, correspond to various phonemic sequences. This network of graphophonemic correspondences renders English reading acquisition an arduous undertaking for young learners. Phonics curricula like Letters and Sounds (Department for Education and Skills, 2007) in the U.K. and UFLI Foundations (Lane et al., 2022) in the U.S. help young learners acquire these graphophonemic correspondences by drawing their attention to the regular connections between graphotactics and phonemic sequences. These curricula provide a scope and sequence for teaching children these correspondences, but the metric for prioritizing the correspondences is unclear. Aside from a few shared correspondences near the beginning of both curricula, their pedagogic trajectories are quite different. Presumably, phonics instruction aims to give a young learner expedient access to authentic English text. Knowing the frequency distribution of graphophonemics in such text is therefore important, if not essential, information for curriculum design. Unfortunately, while previous research measures the frequency distribution of graphotactics (Venezky, 1999), phoneme distribution in speech (Mines et al., 1978), or conditional probabilities of graphemic-phonemic correspondences (Berndt et al., 1987), it appears that no corpus linguistics analysis has identified the frequency distribution of graphemic-phonemic correspondences in English children's literature, the text children are learning to read.
Given that English vowel phonemic sequences are the most subject to sound change, and thus pose greater difficulty in phonics instruction than consonants, this corpus linguistics study examined the graphophonemic frequency distribution of American English vowel sounds in the new Corpus of English Children's Literature (COECL) coded for Graphophonemic Analysis (COECL-GPA). COECL includes 7,310,441 words from 178 complete texts of Newbery winners/nominees and critically acclaimed English children's literature published between 1900 and 2010. COECL-GPA identifies 8,442,993 vowel appearances with 131 graphotactics, 85 phonemic sequences, and 272 graphemic-phonemic correspondences. This analysis identified the frequency distribution of each of these categories and the proportional frequency breakdowns of each graphotactic and phonemic sequence into its constituent correspondences. Results revealed that vowel graphophonemes have a more compressed distribution than the traditional 80-20 Pareto Principle and a steeper slope than a Zipfian distribution, but their distribution is compatible with Zipf's Law of Abbreviation. Additionally, in a contrastive analysis of these findings with the pedagogic trajectories of Letters and Sounds (Department for Education and Skills, 2007) and UFLI Foundations (Lane et al., 2022), it appears that neither curriculum's sequence of vowel lessons respects the frequency distribution of vowel graphophonemics in authentic children's literature. This is likely because, at present, no software can sufficiently account for all the sublexical features that modulate English graphophonemics, meaning that such corpus linguistic analysis must be conducted manually, which can be prohibitively time-consuming.
Fortunately, these findings afford a more expedient route to modifying phonics and reading instruction, because the vowel graphotactics can be retuned to the phonetics of any English variety, which would be of great service to bilingual young learners in the Peripheral Anglophone.
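
As a rough illustration of the kind of measurements the abstract describes (ranked frequency shares, proportional breakdowns of a phonemic sequence into its correspondences, a Pareto-style coverage check, and a rank-frequency slope), the following Python sketch may help. It is not the thesis's method or data: the token counts are invented toy values, and the function names `ranked_shares`, `breakdown_by_phoneme`, `types_needed`, and `loglog_slope` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy tokens of (graphotactic, phonemic sequence) pairs.
# These counts are invented for illustration and are NOT from COECL-GPA.
tokens = (
    [("e", "ɛ")] * 50
    + [("a", "æ")] * 25
    + [("ee", "i")] * 15
    + [("ea", "i")] * 7
    + [("ough", "oʊ")] * 3
)


def ranked_shares(pairs):
    """Return (correspondence, share of all tokens), most frequent first."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return [(pair, n / total) for pair, n in counts.most_common()]


def breakdown_by_phoneme(pairs):
    """For each phonemic sequence, the proportion spelled by each graphotactic."""
    per_phoneme = defaultdict(Counter)
    for grapheme, phoneme in pairs:
        per_phoneme[phoneme][grapheme] += 1
    return {
        phoneme: {g: n / sum(c.values()) for g, n in c.items()}
        for phoneme, c in per_phoneme.items()
    }


def types_needed(shares, coverage=0.8):
    """Number of top-ranked correspondence types needed to cover `coverage` of tokens."""
    running = 0.0
    for k, (_, share) in enumerate(shares, start=1):
        running += share
        if running >= coverage:
            return k
    return len(shares)


def loglog_slope(shares):
    """Least-squares slope of log(share) against log(rank).

    A classic Zipfian distribution has a slope near -1; a more negative
    value indicates a steeper fall-off.
    """
    xs = [math.log(rank) for rank in range(1, len(shares) + 1)]
    ys = [math.log(share) for _, share in shares]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


shares = ranked_shares(tokens)
breakdown = breakdown_by_phoneme(tokens)
needed = types_needed(shares, coverage=0.8)
slope = loglog_slope(shares)

print(shares)
print(breakdown["i"])
print(needed, "of", len(shares), "types cover 80% of tokens")
print("log-log slope:", round(slope, 2))
```

On real data, the same functions could be applied to the full list of coded vowel tokens; comparing `types_needed` against 20% of the type count and `loglog_slope` against -1 gives a simple way to relate a distribution to the Pareto and Zipfian benchmarks mentioned above.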

Comments

Free and open access to this Campus Access Thesis is made available to the UMass Boston community by ScholarWorks at UMass Boston. Those not on campus and those without a UMass Boston campus username and password may gain access to this thesis through Interlibrary Loan. If you have a UMass Boston campus username and password and would like to download this work from off-campus, click on "Off-Campus UMass Boston Users."
