Reading Trajectories in Elementary Grades: A Longitudinal Analysis

Abstract: Research shows that children's reading proficiency levels in the early grades positively correlate with students' future academic achievement. This study provides the first analysis of reading achievement trajectories for a cohort of students in Grades 3 to 5 in 2014-17 in Hawaii schools. Hawaii serves a diverse student population whose characteristics differ in ways often overlooked by standard US racial and ethnic classifications. Our analysis shows that Native Hawaiian and Pacific Islander students not only start at a lower reading proficiency than their peers in Grade 3 but also fall further behind as they move from Grade 3 to Grade 5. Moreover, we find a strong association between students' third-grade performance and reading achievement growth rate above and beyond all other factors in our longitudinal model. The difference in performance patterns between student subgroups across the elementary grades can serve as an accurate baseline for yearly monitoring. In light of our findings, we discuss implications for policy and practice.


Introduction
The facility with which we learn, how academically successful we are, the extent to which we behave or misbehave, whether we need special education services, how easily and well we learn English, whether we persist in school, graduate high school, enroll and succeed in college, and how much we earn and accomplish professionally are among the multiple outcomes that positively correlate with reading proficiency as early as the third grade (Fiester, 2010; Hein et al., 2013; Hernandez, 2011; Lesnick et al., 2010; Yoshikawa et al., 2013). An achievement gap in Grade 3 is consequently a significant source of concern.
Examining the state standardized assessment results of Native Hawaiian (NH) students and their higher-performing peers reveals a statistically significant and increasing achievement gap starting in Grade 3 (Singh, 2011; Singh et al., 2014). NH students experience circumstances that profoundly hinder their educational, economic, and social progress, resulting in performance that lags behind that of White students, even those with the same socioeconomic or special education status.
average-based differences between groups and implementing fine-grained comparative analyses of how students traverse through the achievement distribution over time" (Quintana & Correnti, 2020, p. 1630). Cross-sectional comparisons alone cannot adequately explain differences in student performance and growth over time and the contribution of the resources available to them (Pfost et al., 2014). This paper demonstrates how school districts can effectively and realistically use state-level longitudinal data to understand how student academic growth trajectories differ by demographic characteristics and initial performance (Morgan et al., 2011). Districts can leverage these analyses to identify student challenges and their underlying root causes and then select appropriate evidence-based strategies to meet the needs of every student and close achievement gaps. Investigating children's early educational experience when they are "maximally sensitive to home, and school influences" is an opportunity for timely intervention, and students' "cognitive growth rates (are) higher in the first few grades than they are later on" (Entwisle & Alexander, 1992, p. 73).

Research Questions
Our study answers two research questions. First, how do students' reading trajectories vary during the elementary school years? Second, how are growth rates related to student demographic characteristics and their initial reading status at Grade 3? The findings will equip practitioners with a research-based method to support tailored programmatic decisions based on longitudinal, student-level reading performance.
1. What is the difference in the average trajectory of actual change associated with student demographic characteristics in Hawaii's elementary grades?
2. Accounting for student demographic characteristics, is there a relationship between a student's initial status (Grade 3 performance) and growth rate?
Beyond observing a positive correlation between children's initial and ensuing performance, Rigney (2010) posited that the Matthew effect could be absolute or relative. An absolute Matthew effect describes a developmental pattern in which initially high levels of student performance increase over time, whereas initially low levels of student performance decrease over time (Rigney, 2010). For example, in a longitudinal study that followed students from Grade 1 to Grade 5, Entwisle et al. (1997) determined that an achievement disparity between disadvantaged and advantaged students can increase from less than a year in Grade 1 to nearly three years at the end of Grade 5. A relative Matthew effect likewise assumes that students who demonstrate high initial reading skills will improve over time. However, it suggests that students who demonstrate low initial reading skills will make little or no gains (Rigney, 2010).
The second type of developmental pattern is referred to as either the compensatory effect (Aarnoutse & van Leeuwe, 2000; Parrila et al., 2005; Phillips et al., 2002) or the developmental-lag model (Francis et al., 1996; Parrila et al., 2005; Rourke, 1976). This model argues that children who enter school with lower reading proficiency levels demonstrate faster growth rates than students who start school with higher reading proficiency levels (Morgan et al., 2011; Salaschek et al., 2014). Hence, students' initial inter-individual differences will decrease over time rather than broaden or remain constant. Namely, students with low pretest results will demonstrate larger gains over time than their peers who had higher pretest results. The third model suggests that the performance gap and growth rates will remain stable over time among initially low- and high-performing children (Pfost et al., 2014).

Closing Achievement Gaps
Modeling and examining children's reading progress can yield insights that can guide policy and educational practice, especially in closing achievement gaps (Kim et al., 2010; Shanley, 2016). The literature on closing achievement gaps suggests two prevailing approaches. The preventative view assumes that children enter school on a level playing field and argues that it is incumbent on schools to ensure that disadvantaged children keep pace with their peers (Johnson, 2002).
Based on evidence supporting school readiness practices, the reparative perspective contends that disadvantaged children enter school with lower proficiency than their more advantaged peers. Thus, schools are responsible for providing underprivileged students with the necessary support to accelerate learning and close the learning gaps (Shin et al., 2013). In either case, the achievement gap between student groups is an enduring issue in American education. All efforts to close the gaps should therefore be informed by methods designed to monitor students' performance outcomes and longitudinally analyze their developmental trajectories across multiple years.

Importance and Relevance
Transitioning from elementary to middle school is critical in a student's educational pathway. Middle-school teachers expect incoming students to be grade-level proficient in reading and prepared to meet the academic demands of middle school. However, there are racial or ethnic differences in reading performance during the elementary grades (Camera, 2016; Hemphill & Vanneman, 2011; The Nation's Report Card, 2019; Vanneman et al., 2009). These differences are prominent starting from Grade 3 (Singh et al., 2014).
To investigate the growth rates of different groups of students, we can employ longitudinal statistical procedures to explicitly model student performance change over time. While this has occurred for some racial or ethnic groups (Chatterji, 2006; Lesaux et al., 2007; Lubienski & Crane, 2010), no longitudinal study has investigated the relationships between predictor variables and student growth in Hawaii's public school system. The likely reason is that previous assessments, such as the Hawaii State Assessment (HSA) I, HSA II, and HSA III, were not vertically linked. Vertically scaled assessments, which support the measurement of student growth over time, require comparative information about results from assessments of different difficulty levels. The challenges of vertical scaling have existed since standards-based assessments were first used to measure individual growth (Patz, 2007).
In 2013, the Hawaii Department of Education (HIDOE) implemented the English language arts common core state standards (CCSS) based on explicit specifications for the reading domain's learning continuum. Moreover, newly developed assessments were aligned to the cross-grade CCSS. Thus, the new assessments created a research-based reference frame for individuals' and groups' locations on a learning continuum, which allowed for the proper use of their test scores as indicators of student growth.

Research Design
The new assessment implemented in 2013 allowed us to employ individual growth curve modeling techniques to analyze our longitudinal data with three data waves, corresponding to Grades 3, 4, and 5. The scores range from 2,000 to 3,000, a span covering sufficient content to capture increasing performance across grades. We kept the outcome measure on the same scale as Hawaii's state reporting protocol.
We followed Singer and Willett's (2003) recommendation for a linear-change individual growth model within a multilevel framework with three data waves. A visual inspection of the data also supports our choice of a linear-change individual growth model as the population model most likely to have generated our sample data (Figure 1).

Figure 1. Spaghetti plots by ethnicity. There is variation in initial and growth patterns among different racial/ethnic student groups.
The individual growth model included three time-invariant demographic predictor variables: gender (Male, coded one if male and zero if female); free or reduced-price lunch status (FRPL), an indicator of poverty (coded one if the student was classified as FRPL in any of the elementary grades and zero otherwise); and special education status (SPED) (coded one if the student had an individualized educational plan (IEP) in any of the elementary grades and zero otherwise).
We grouped the pupils into six major racial and ethnic groups: Asian (Chinese, Korean, and Japanese students), Filipino, Hawaiian, Pacific Islander (PI) (Micronesian, Samoan, and Tongan students), White, and "Other" (for example, American Indian, Black, Guamanian, Hispanic, Portuguese, and multiple race/ethnicities). Moreover, we used the White student group as a baseline; therefore, we created five dummy-coded indicators for the student groups. In total, there were eight individual-level predictors in the model.
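The coding scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example student are hypothetical, and the group labels are shorthand for the six categories named in the text, with White as the all-zeros reference group.

```python
# Illustrative sketch only: labels and function names are assumptions,
# not the authors' actual data-preparation code.
GROUPS = ["Asian", "Filipino", "NH", "Other", "PI"]  # White is the reference


def dummy_code(group):
    """Return five 0/1 indicators; a White student gets all zeros."""
    if group != "White" and group not in GROUPS:
        raise ValueError(f"unknown group: {group}")
    return {g: int(group == g) for g in GROUPS}


def student_predictors(male, frpl, sped, group):
    """Combine the three binary predictors with the five group indicators."""
    row = {"Male": int(male), "FRPL": int(frpl), "SPED": int(sped)}
    row.update(dummy_code(group))
    return row


# Hypothetical student: male, not FRPL, no IEP, Native Hawaiian.
example = student_predictors(male=True, frpl=False, sped=False, group="NH")
print(example)
print(len(example))  # number of individual-level predictors
```

Because White students are represented by zeros on all five indicators, each group coefficient in the model is read as a difference from the White group.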

Sample and Data Collection
The analytic sample is the 2014-15 third-grade cohort of students with reading scores in all three grades, around 85% of the population. For the complete-case analysis, we removed students who did not have scores in Grade 4, Grade 5, or both. We also removed students who had repeated a grade (n = 28), resulting in a sample size of 12,139 students.
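The complete-case selection rule can be expressed as a simple filter. The sketch below uses invented toy records and a hypothetical record layout (`scores`, `repeated_grade`); it only illustrates the rule stated above, not the actual HIDOE data structure.

```python
def complete_cases(records, waves=(3, 4, 5)):
    """Keep students with a score in every wave who never repeated a grade."""
    kept = []
    for rec in records:
        has_all_waves = all(rec["scores"].get(g) is not None for g in waves)
        if has_all_waves and not rec["repeated_grade"]:
            kept.append(rec)
    return kept


# Toy records, invented for illustration.
students = [
    {"id": "a", "scores": {3: 2400, 4: 2450, 5: 2490}, "repeated_grade": False},
    {"id": "b", "scores": {3: 2380, 4: None, 5: 2470}, "repeated_grade": False},
    {"id": "c", "scores": {3: 2300, 4: 2340}, "repeated_grade": False},
    {"id": "d", "scores": {3: 2350, 4: 2390, 5: 2420}, "repeated_grade": True},
]

sample = complete_cases(students)
print([s["id"] for s in sample])
```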

Data Analysis
We included the time variable (grade level) and centered it at Grade 3 (initial status) to explicitly model the within-individual change via the time slope parameter. We included only the time-invariant predictors' effects to explain variation in the intercept and time slope outcomes. We added cross-level interactions between the first-level time predictor and the second-level student predictors, which allow the linear time slope to differ by student characteristics.
School fixed effects were included as indicator variables to adjust for any clustering effect. We excluded English language status as a predictor because almost half of the PI students were English language learners; adding the variable would therefore lead to a multicollinearity issue between the two variables. We used the student growth model in which the rate is a function of time (Grade_ij), and we write the first-level model as follows.

First-level model
$$Y_{ij} = \pi_{0i} + \pi_{1i}(\text{Grade}_{ij} - 3) + \varepsilon_{ij}$$
Here $i$ and $j$ index pupils and time (i.e., waves): $i$ runs from 1 through 12,139 (the sample size), and $j$ runs from 1 through 3 (for the three waves).
$Y_{ij}$ represents the reading scale score for student $i$ at time $j$, and $\pi_{0i}$ and $\pi_{1i}$ are the individual growth parameters of the $i$th student in the population. Because the time variable is centered as $\text{Grade}_{ij} - 3$, the intercept $\pi_{0i}$ represents student $i$'s "true" reading performance at Grade 3, and $\pi_{1i}$ represents student $i$'s "true" annual (or per-grade) rate of change in reading score. We assume that a straight line adequately represents each student's "true" change over grades and that any departures from linearity arise from random error ($\varepsilon_{ij}$).
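To make the two first-level parameters concrete, the sketch below fits an ordinary least-squares line to one hypothetical student's three scores, with time centered at Grade 3. This is a per-student exploratory fit, not the multilevel estimator used in the analysis, and the scores are invented.

```python
def individual_line(scores_by_grade, center=3):
    """OLS intercept and slope for one student, with time centered at Grade 3."""
    xs = [grade - center for grade in scores_by_grade]
    ys = list(scores_by_grade.values())
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # analogue of the per-grade rate of change
    intercept = y_bar - slope * x_bar   # analogue of the Grade 3 status
    return intercept, slope


# Hypothetical student with scores in Grades 3, 4, and 5.
intercept, slope = individual_line({3: 2400, 4: 2450, 5: 2490})
print(round(intercept, 2), round(slope, 2))
```

Centering at Grade 3 is what lets the intercept be read as the fitted Grade 3 score rather than an extrapolated score at "Grade 0".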
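The second-level model represents the differences in student growth parameters across students as a function of their time-invariant characteristics. It features one equation for each first-level growth parameter, which defines the association between that parameter and the other factors as follows:

Second-level model
$$
\begin{aligned}
\pi_{0i} &= \beta_{00} + \beta_{01}\text{Male} + \beta_{02}\text{FRPL} + \beta_{03}\text{SPED} + \beta_{04}\text{Asian} + \beta_{05}\text{Filipino} + \beta_{06}\text{NH} + \beta_{07}\text{Other} + \beta_{08}\text{PI} + \sum_{h}\text{School}_{h} + r_{0i} \\
\pi_{1i} &= \beta_{10} + \beta_{11}\text{Male} + \beta_{12}\text{FRPL} + \beta_{13}\text{SPED} + \beta_{14}\text{Asian} + \beta_{15}\text{Filipino} + \beta_{16}\text{NH} + \beta_{17}\text{Other} + \beta_{18}\text{PI} + \sum_{h}\text{School}_{h} + r_{1i}
\end{aligned}
$$
The linear-change individual growth model estimates each student's change trajectory by specifying a rate of change with two components: one constant across students and another allowed to differ across them. Thus, we allow the intercept and slope to vary among students and to covary. The final specified model is as follows:

Final model
$$
\begin{aligned}
Y_{ij} ={}& \beta_{00} + \beta_{01}\text{Male} + \beta_{02}\text{FRPL} + \beta_{03}\text{SPED} + \beta_{04}\text{Asian} + \beta_{05}\text{Filipino} + \beta_{06}\text{NH} + \beta_{07}\text{Other} + \beta_{08}\text{PI} + \sum_{h}\text{School}_{h} \\
&+ \Big(\beta_{10} + \beta_{11}\text{Male} + \beta_{12}\text{FRPL} + \beta_{13}\text{SPED} + \beta_{14}\text{Asian} + \beta_{15}\text{Filipino} + \beta_{16}\text{NH} + \beta_{17}\text{Other} + \beta_{18}\text{PI} + \sum_{h}\text{School}_{h}\Big)(\text{Grade}_{ij} - 3) \\
&+ r_{0i} + r_{1i}(\text{Grade}_{ij} - 3) + \varepsilon_{ij}
\end{aligned}
$$
Here the sums over $h$ stand for the school fixed-effect terms (school indicator variables), and $r_{0i}$ and $r_{1i}$ are the student-level random effects for the intercept and slope.

We used the free R software and packages (Bates et al., 2015; R Core Team, 2017) for all analyses. We report unbiased estimates of the variance and covariance parameters using the restricted maximum likelihood procedure.
We also explored other specifications by including interactions among the student-level predictors. However, these models either failed to converge or yielded results that were not statistically significant. Therefore, we settled on the more parsimonious model, in which only the time variable interacts with the student-level predictors to capture the different student groups' differential growth patterns. This model allows for a straightforward interpretation of the results.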

Sensitivity Analyses
For sensitivity tests, we reanalyzed the data in two ways. In the first test, we kept all students with incomplete scores in subsequent waves (Grade 4 or 5), as the multilevel framework allows for incomplete or unbalanced data over time. In the second, we imputed missing data using multiple imputation with chained equations. The procedure, developed by Enders et al. (2018), employs a conditional specification and a Bayesian framework to obtain the posterior distributions (Gelfand, 2000). The method allows missing data to be imputed using a specification identical to the final analysis model. We created 20 multiply imputed samples. The results from the imputed samples are then combined to provide unbiased results (Tables 1A).
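The text does not state how the estimates from the imputed samples were combined. A common choice is Rubin's rules, sketched below as an assumption rather than as the authors' documented procedure; the three estimates and standard errors are invented toy values (the study itself used 20 imputed samples).

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total)


# Toy values for a single coefficient from three imputed samples.
pooled_estimate, pooled_se = pool_rubin([-2.0, -1.8, -2.2], [1.0, 1.0, 1.0])
print(round(pooled_estimate, 3), round(pooled_se, 3))
```

The between-imputation term is why pooled standard errors are at least as large as the average within-imputation standard error.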

Findings / Results
We report descriptive statistics for the analytic sample and present the results disaggregated by racial/ethnic student groups. On average, students' scores will be higher in later grades than in earlier grades in vertically scaled assessments. The variance in scores also becomes slightly larger in later grades, as shown in the boxplots below (Figure 2). We also report additional descriptive results in the appendix section for the sample with incomplete waves of data. We report the pooled estimates from the 20 multiply imputed samples.

Figure 2. Reading scaled score distribution in Grades 3, 4, and 5. As expected with vertically scaled assessments, scores tend to be higher as students move up the grades, thus justifying the growth model's use. The diamond represents the mean score, while the box's midline represents the median.
The disaggregated performance by student group shows that Asian students, on average, scored the highest in all three elementary grades. In contrast, NH and PI students scored the lowest (Table 1). The descriptive results only illustrate the differences among student groups and do not consider confounding factors. We can only ascertain the unique relationship between a predictor and the outcome through multivariate statistical analyses.

Growth Modeling Results
By comparing the model with and without predictors in the level-2 specification, we estimated the residual variance reduction in the random coefficients (intercept and linear-time slope).
Statistically controlling for other factors in the model, male students, on average, had an adjusted score 16.25 points below that of their female peers in third grade (95% CL: -18.72, -13.78). Because their growth rate in reading lagged behind girls' by 2.35 points per grade, this disadvantage increased as boys progressed to Grades 4 and 5. In other words, the model estimates that, on average, boys underperformed girls by 16.25 points in Grade 3, and by the end of Grade 5, this gap had increased to 20.95 points.
With other factors held constant, students eligible for FRPL start with an adjusted score 32.79 points lower than that of students who are not eligible (95% CL: -35.72, -29.86). The FRPL-eligible students had an average growth rate 1.31 points per grade (95% CL: -2.51, -0.11) lower than that of non-eligible students. FRPL-eligible students would therefore score 34.10 points lower in Grade 4 and 35.41 points lower in Grade 5 than their non-eligible peers, with the gap increasing from one grade to the next.
The scores of students with IEPs were estimated to be 92.55 points (95% CL: -96.87, -88.23) lower than those of their counterparts without IEPs in Grade 3, holding other factors constant. This gap increased in later grades. Their growth rate was estimated to be 7.31 points per grade (95% CL: -9.09, -5.54) lower than that of students without an IEP, leading to a gap expected to widen from 92.55 points in third grade to 99.86 in fourth and 107.17 in fifth grade.
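Under the linear-change model, the adjusted gap for a predictor at a given grade is its Grade 3 difference plus its growth-rate difference multiplied by the number of grades elapsed. The short sketch below applies that arithmetic to the point estimates reported above; the function and variable names are illustrative.

```python
def gap_at_grade(initial_gap, rate_gap, grade, baseline_grade=3):
    """Adjusted gap at a grade: intercept difference + slope difference * time."""
    return initial_gap + rate_gap * (grade - baseline_grade)


# (Grade 3 difference, per-grade growth-rate difference) from the reported results.
estimates = {
    "Male": (-16.25, -2.35),
    "FRPL": (-32.79, -1.31),
    "SPED": (-92.55, -7.31),
}

for name, (b0, b1) in estimates.items():
    gaps = [round(gap_at_grade(b0, b1, g), 2) for g in (3, 4, 5)]
    print(name, gaps)
```

These projections use point estimates only and ignore the uncertainty expressed by the confidence limits.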
Keeping other demographic factors constant and using White students as the reference group, the model showed that Asian and White students exhibited no statistically significant differences in their Grade 3 adjusted performance and growth rates. Filipino students, however, had an adjusted score of 14.45 points (95% CL: -18.97, -9.93) lower than their White peers in Grade 3. This gap seems to have remained stable in Grades 4 and 5 (as the difference in growth rates between them and their White peers is not statistically significant).
Statistically, there was no significant difference in adjusted means between the Other and White students. To illustrate the trends visually, we plot the adjusted average trends for male students who are neither eligible for FRPL nor have an IEP for the White, NH, and PI groups (Figure 3).

Table note (N = 12,139): n.s., not significant, p > 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001.

Figure 3. Estimated adjusted trends for White, NH, and PI male students. We expect gaps between White students and the NH or PI groups to widen as they progress from third to fifth grade.

To assess the appropriateness of the model, we graphed the residuals against fitted values. The plot showed no obvious pattern of residual heteroscedasticity (Figure 4, left side). The quantile-quantile plot showed an approximately normal residual distribution except in the lower quantiles, which may imply a heavier tail than usual (Figure 4, right side). Overall, the residuals do not appear to seriously violate the homoscedasticity and normality assumptions.

Sensitivity Analyses
The sensitivity analyses show that the estimates from the whole sample with incomplete outcome data and from multiple imputation were almost identical to those obtained from the complete-case analysis. The different analyses did not affect our conclusions except for the Other versus White comparison (-2.50, p < 0.05 versus -1.94, p = 0.09; see Tables 3 and 4). Typically, the pooled estimates' confidence intervals are wider in the multiple imputation method because they account for the additional variation among the 20 imputed samples.

The linear association between the intercept and time-slope residual deviations showed a correlation of 0.58. Therefore, Grade 3 reading (initial status) remains a strong predictor of a student's trajectory even after accounting for demographic characteristics and school fixed effects. Specifically, Grade 3 reading performance accounts for 33.7 percent of the variance in student growth differences above and beyond other demographic and school-level factors. Thus, the finding is consistent with the literature that Grade 3 reading proficiency is a critical milestone predictive of children's future success. The result also conforms to the Matthew effect, characterized by a strong association between a student's beginning reading achievement and subsequent learning rates.
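As a rough check, the share of growth-rate variance associated with initial status can be approximated by squaring the reported correlation. With the rounded value of 0.58 this gives about 33.6 percent; the reported 33.7 percent presumably reflects the unrounded correlation, which is an assumption here because the calculation is not shown.

$$r^2 = 0.58^2 = 0.3364 \approx 33.6\%$$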

Discussion
Our analysis and findings are an essential first step in truly understanding various student groups' comparative academic growth patterns. We illustrate it with data from the Hawaii public schools, and school districts can replicate it with their data.
Our study confirms the need to intervene early, starting in elementary grades (Hernandez, 2011; Singh, 2011; Singh et al., 2014) and possibly even earlier than Grade 3 (Adams, 1994; Coyne et al., 2004; Fiester, 2013; Foster & Miller, 2007; Fuchs et al., 2004; Mastropieri et al., 1999; McCoach et al., 2006; McNamara et al., 2011). A promising foundation for future success depends on making high-quality early childhood education broadly accessible. Expanding offerings to include health and readiness, including cognitive and social-emotional development, can contribute to system-level change (Fiester, 2010, 2013; Hernandez, 2011). We also need to build a seamless transition from early to later grades. This systemic vision goes well beyond the school system and calls for an integrated and well-funded cross-sector approach.
Any system-level change will occur in the neighborhoods where students and their families live and within their circumstances, including poverty, mobility, insecurity, and access to healthcare (Fiester, 2013). Student achievement is also closely connected to families, communities, and peers (Coleman, 1966). Closing achievement gaps is the responsibility of a wide range of stakeholders, including but certainly not limited to those who serve in the education field. Moreover, engaging caregivers and equipping them to support their children's learning and well-being is critical (Fiester, 2010; Lonigan & Shanahan, 2009). Leveraging local culture and asset framing is another opportunity. Over time, the educational experiences of NH children have contributed to their low academic performance. Since Protestant missionaries arrived in Hawaii in the early 1800s, NH children have increasingly been educated in centers away from their families and communities, with little attempt to leverage and integrate the local culture, beliefs, and priorities (Grace & Serna, 2012). Exploring culturally appropriate early childhood (and beyond) educational offerings and instructional strategies may support children's and youth's cognitive and social-emotional development and chip away at the gap between NH students and their peers (Benner, 2011; Wurdeman-Thurston & Kaomea, 2015).
Ensuring student reading proficiency in Grade 3 and beyond will benefit from a diverse body of well-prepared and well-supported teachers and leaders who can effectively implement the differentiated, culturally responsive instructional practices that will help students learn and demonstrate their learning (Foorman et al., 2016; Gersten et al., 2008; Kamil et al., 2008; Lonigan & Shanahan, 2009; Rodriguez et al., 2004; Shanahan et al., 2010; Tomlinson, 2017).
School districts can implement interventions to increase attendance so students take full advantage of the effective educators who serve them (Fiester, 2010). Additional strategies to address other occasions for interrupted learning, such as summer loss and the consequences of schooling during COVID-19, which disproportionately affected traditionally underserved students, are warranted (Fiester, 2010; Kuhfeld et al., 2021; Kwakye & Kibort-Crocker, 2021).
This body of connected efforts will help level the playing field between traditionally underserved students and their peers. Continuing to collect data will enable further research to deepen our understanding of the challenges, measure progress, and support accountability (Hein et al., 2013).

Conclusion
Our study shows that the reading performance disparity between NH students and their peers is already present in Grade 3 and widens in the later elementary grades. This early gap is therefore a crucial policy issue in education in the United States (Slavin et al., 2009) and may be a unique opportunity for improvement. Investigations at the macro (district) and micro (schools and classrooms) levels would yield invaluable insights into students' inequitable outcomes and allow researchers, educators, and policymakers to improve these outcomes in the long and short term. Our findings support the need for further research, including qualitative studies. We must critically investigate factors such as classroom cultures and students' experiences that could be affecting the learning trajectories of students of color and other traditionally underserved students during and before elementary grades.

Recommendations
While improving literacy achievement in the primary grades holds promise for future student outcomes, it is not an "inoculation to future literacy success" (Lai et al., 2014, p. 305). Student performance in middle school is another predictor of future academic success (Balfanz et al., 2007; Kieffer & Marinell, 2012). Addressing low performance and gaps in middle school may reduce the need to do so in high school, by which time patterns of low performance are entrenched. Hence, the research community has called for substantial interventions to prevent high school dropout (Mac Iver, 2010). We intend to extend the longitudinal analyses from this study to the middle grades when the data become available.
Researchers need to answer essential questions to understand NH students' circumstances and serve them better: Do NH families have the same access to appropriate early childhood education and healthcare? To what extent do they use those services, and why? Are the early health outcomes of NH children different from those of their non-NH peers? Do they live in equally safe neighborhoods? Are they more mobile? What resources do they have available in school, at home, and in the community? How is their learning supported at home? What of their social-emotional well-being? What do their schools need? What do their teachers need? What expectations do the adults in their lives have for them? Answers to these questions may unearth rich opportunities for success and equitable outcomes for NH children.

Limitations
Although our findings are consistent with the Matthew effect, the data do not allow us to make causal statements. Other variables, such as parental education, might have increased the model's precision, but HIDOE does not collect such data. Our results generalize only to the student population in Hawaii. However, other states and districts can use similar methods to check for similar patterns in their data.