What makes a good reader? Worldwide insights from PIRLS 2016

Using hierarchical linear models, this study examines the student, family, teacher, and school variables that can explain the variation in Progress in International Reading Literacy Study (PIRLS) 2016 results. Students' confidence in reading, early literacy tasks, and parents' expectations are the strongest explanatory variables of reading literacy. Among teacher variables, teachers' perception of class instruction being limited by students' needs is the strongest explanatory variable of PIRLS achievement, although this was not consistently verified among all countries. No teaching strategies or other related variables emerged consistently as explanatory variables in every country. A similar result was observed for schools, where the percentage of economically disadvantaged students was the most consistent explanatory variable of PIRLS results. The present analysis shows that although student variables are the most consistent explanatory variables among participating countries, a general conclusion of what makes a good reader worldwide must consider all student, teacher, and school variables conjointly, acknowledging the existence of between-country variation.


Introduction
In today's information-driven world, being able to read and comprehend what was read is a key skill for full citizenship. Becoming a proficient reader at an early age is fundamental not only for the development and intellectual maturation of children, but also for advancing successfully in school, in the workplace, and in contemporary societies.
Aware of the importance of reading in modern-day societies, the International Association for the Evaluation of Educational Achievement (IEA), a cooperative of governmental and non-governmental institutes and agencies that conduct research in education, has been promoting the Progress in International Reading Literacy Study (PIRLS) every 5 years since 2001. PIRLS is a collaborative research project between the participating countries and the IEA, under the direction of the TIMSS & PIRLS International Study Center at Boston College. The study evaluates reading literacy in children with 4 years of formal schooling (excluding the kindergarten or pre-primary education years) by sampling a large number of students in each participating country.
The IEA defines reading literacy as "the ability to understand and use those written language forms required by society and/or valued by the individual. Readers can construct meaning from texts in a variety of forms. They read to learn, to participate in communities of readers in school and everyday life, and for enjoyment" (Mullis, Martin, & Sainsbury, 2015, p. 12). The PIRLS framework emphasizes reading for two main purposes: (1) literary experience and (2) acquisition and use of information. The age of the students evaluated by the PIRLS assessments (9-10 years) reinforces the importance of reading literacy in modern educational systems: until this age, children learn to read; from then on, they read to learn. It is therefore fundamental that children are competent readers by the time they leave primary school.
Data gathered by PIRLS provide comparative information on how well a child reads as assessed by a comprehensive test of reading literacy. The test focuses on four broad-based reading comprehension processes employed by fourth-graders: (1) focusing on and retrieval of explicitly stated information; (2) making straightforward inferences; (3) interpreting and integrating ideas and information; and (4) evaluating and criticizing content and textual elements. PIRLS also collects considerable background information through a series of questionnaires aimed at the participating countries' educational systems, schools, teachers, students, and students' families (Hooper, Mullis, & Martin, 2015). This information allows for the characterization of reading opportunities, strategies, and contexts, as well as for the identification of variables that can influence learning. In assessing reading skills and the contexts in which they develop, PIRLS opens the way for the diagnosis of educational, socioeconomic, and cultural areas where education systems can invest to make reading an effective tool for acquiring knowledge and active citizenship.
The development of the PIRLS framework, tests, and questionnaires is the result of collaborative work between groups of IEA specialists and the PIRLS national coordinators. National coordinators compile data on the educational systems of each of the participants for the Encyclopedia of PIRLS to characterize and frame the educational systems of their countries (Mullis, Martin, Goh, & Prendergast, 2017).
The collaborative nature of PIRLS, together with a range of technical validations imposed on the process, ensures that the results in each edition of the study present transcultural validity, concurrent validity, and reliability. The technical validations, which cover different areas of the assessment, include procedures for translation, adaptation, and delivery of the tests and questionnaires, sampling, and literacy estimation methods that are anchored in the results of previous PIRLS editions.
In 2016, fifty countries, 12,000 schools, 16,000 teachers, 310,000 parents, and 319,000 students participated in the fourth edition of PIRLS (Mullis, Martin, Foy, & Hooper, 2017a). Fourth graders from Russia (M = 581, SE = 2.2) and Singapore (M = 576, SE = 3.2) had the highest mean reading achievement. At the opposite end of the scale, fourth graders from Morocco (M = 358, SE = 3.9) and South Africa (M = 320, SE = 4.4) had the lowest country mean achievements. The gap between top- and low-performing countries is more than two PIRLS standard deviations (200 points), equivalent to a four-year gap in the reading literacy of 10-year-old students from high- versus low-performing countries. More girls than boys are good readers, and good readers have an early start in learning and home environments that support learning. Good readers attend schools that put a high priority on reading instruction, are academically oriented, and are well resourced (Mullis, Martin, Foy, & Hooper, 2017c). Although the TIMSS and PIRLS International Study Center has released an overall outlook on what makes a good reader, the subject of explanatory variables of reading attainment within participating countries has been conspicuously absent from most of the recent studies undertaken by the educational research community. Many models of reading comprehension have been proposed (see, e.g., Harker, 1972; Joshi & Aaron, 2000), but those are limited in the number of explanatory variables that can explain reading literacy. For example, in Joshi and Aaron's (2000) model, reading comprehension is simply estimated as the product of Decoding and Listening Comprehension plus the Speed of Processing.
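The verbal description of the Joshi and Aaron (2000) model given above can be written compactly. The symbols RC, D, LC, and S are shorthand introduced here for illustration and are not taken from the original authors' notation:

```latex
% RC = reading comprehension, D = decoding,
% LC = listening comprehension, S = speed of processing (shorthand assumed here)
RC = (D \times LC) + S
```

Written this way, it is easy to see why such a model is limited: only three explanatory terms enter the prediction, with no psychological or environmental component.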
More recent models, such as the Component Model of Reading (Aaron, Joshi, Gooden, & Bentum, 2008), explain children's reading literacy not only by cognitive factors (e.g., word recognition and reading fluency) but also by psychological (e.g., reading self-motivation and enjoyment; teachers' expectations) and environmental factors (e.g., home environment and parents' engagement; schools' resources for reading). The PIRLS framework closely follows the Component Model of Reading's wide and comprehensive network of possible explanatory variables of reading literacy, including students (cognitive and psychological domains) and families, teachers, and schools (environmental domain). Initially developed for the 2001 edition of PIRLS, but updated for each subsequent assessment cycle, the PIRLS framework's emphasis has been shifting from "demonstrating fluency and basic comprehension to demonstrating the ability to apply what is read to new situations or projects" (Mullis et al., 2015, p. 11). The framework explicitly recognizes self (e.g., self-confidence in reading), families (e.g., home resources for learning or parents' education), teachers (e.g., strategies used for teaching reading), and schools (e.g., reading resources like libraries or schools' emphasis on academic success) as complex explanatory variables of reading achievement at early ages.
There is a long-standing debate around the role of student and family background and school quality in shaping learning. Most efforts to address this question using international large-scale student assessments (ILSA) are based on TIMSS (mathematics and science literacies) and PISA (mathematics, reading, and science literacies) data at the country or regional level (see, e.g., Karakolidis, Pitsia, & Emvalotis, 2016; Thien & Ong, 2015). A few PIRLS reports on reading literacy have surfaced, but once again limited to single-country or few-country analyses (Araújo & Costa, 2015; Costa & Araújo, 2017; Marôco, 2018; Park, 2008; Tse, Lam, Lam, Loh, & Westwood, 2005). Furthermore, a 2015 literature review on PIRLS data usage found that, despite PIRLS being a reading literacy study, its data are hardly used for reading research (Lenkeit, Chan, Hopfenbeck, & Baird, 2015). Using data from 25 countries that took part in the 2001 PIRLS edition, Park (2008) found that home literacy resources were consistent explanatory variables of reading achievement, although effect sizes varied substantially across countries. A small number of generalized multi-country analyses have been produced recently, yet again focusing mainly on mathematics and science and their correlates measured at a single level (either student/family or school) (see, e.g., Guo, Marsh, Parker, & Dicke, 2018; Schmidt, Burroughs, Zoido, & Houang, 2015). It is not clear how student/family-level variables versus school-level variables relate to student achievement overall or how these patterns replicate across different countries. Using data from the 1970s gathered by IEA on mathematics and science literacy from a series of countries with different gross national products, Stephen Heyneman and William Loxley proposed that in developing nations, school variables are more important than family socioeconomic status in determining academic achievement (Heyneman & Loxley, 1982).
However, the 'Heyneman-Loxley' hypothesis did not hold when data from TIMSS on the relationship between family background and mathematics and science achievement gathered in the mid-1990s were reassessed. Baker, Goesling, and Letendre (2002) found that the relationship between family and school variables and student achievement was similar across nations, regardless of national income. These authors suggested that the spread of mass schooling has reduced the 'Heyneman-Loxley' effect. A study of PIRLS 2001 literacy by Park (2008) uncovered evidence that partially supports the presence of the 'Heyneman-Loxley' effect in some countries, but not in others. The effects of early home literacy activities, parental attitudes towards learning, and home resources for learning varied between countries according to their level of economic development (Park, 2008).

Research questions
In this paper, looking at PIRLS data collected in the 48 countries that took part in the 2016 edition of the study and have a complete dataset, I address the following three research questions: (Q1) How does between-schools' variation relate to country performance in PIRLS? (Q2) Which school, teacher, family, and student variables, as set by the cognitive, psychological, and environmental domains of the Component Model of Reading, can explain the within-country variation in reading achievement? (Q3) Are the PIRLS literacy explanatory variables for top-achievers substantially different from those of low-achievers?
Within-school variation may correlate negatively with country mean scores, i.e., countries where schools are more heterogeneous have lower PIRLS scores than countries where schools are more homogeneous. If this is the case, then student/family-level variables will correlate less with student achievement than school-related variables, thus confirming the 'Heyneman-Loxley' effect. If not, then I expect that the stronger explanatory variables of student reading achievement will not differ between low- and top-performing countries, as demonstrated by Baker et al. (2002) with data from the mid-1990s TIMSS editions.
An exploratory hierarchical joint analysis of literacy regressed on student, family, teacher, and school-level organizational, socioeconomic and cultural variables, taking into account the complex sample design of PIRLS and the countries' within schools' variation, can provide useful insights for policy recommendations and education practices aimed at improving reading literacy.

Methods
PIRLS 2016 aims at assessing the reading achievement of fourth graders in 50 participating countries using a set of standardized reading test forms and context questionnaires to characterize education systems, schools, teachers, students, and their families. These are briefly described in the following sections. A detailed description of the PIRLS methods and experimental design can be found in the PIRLS 2016 Methods and Procedures manual.

Participants
Three hundred and seventeen thousand students of both sexes (mean age = 10.2, SD = 0.4) attending the 4th year of schooling during the 2015/2016 academic year in 50 countries were enrolled in PIRLS 2016. The 11 benchmark regions were not included in this study, and the two countries that did not apply the student's family questionnaire were removed from the overall analysis. A minimum sample size of 4000 students per country (mean = 5874, SD = 3175) was selected by a multistage probabilistic sampling procedure as defined by the IEA-Boston College consortium responsible for PIRLS 2016 (LaRoche, Joncas, & Foy, 2017). In the first sampling stage, countries were stratified into regions defined by a set of local stratum variables. In the second stage, around 200 schools per country (mean = 208, SD = 138) were selected by systematic sampling proportional to size. Finally, in each selected school, one or two grade four (from primary education) classes were randomly sampled according to the number and size of the classes in the selected schools. All students in the selected classes whose participation was authorized by their parents and who met the eligibility criteria for PIRLS (did not have special educational needs, and were native speakers of the test language or students whose mother tongue was not the test language but had more than 1 year of language learning) were assessed (overall weighted mean participation rate was 95%, SD = 3%). School, class, and student sampling weights were derived from the multi-stage sampling, accounting for any disproportional sampling of sub-groups and non-participation (LaRoche et al., 2017). Figure 1 shows the geographical distribution of PIRLS 2016 participants.
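The second sampling stage described above (systematic sampling with probability proportional to size) can be illustrated with a minimal sketch. This is not the IEA's sampling software: the function names, the enrollment figures, and the seed are hypothetical, and stratification, class sampling, and non-response adjustments are deliberately left out.

```python
import random


def pps_systematic_sample(sizes, n_sample, seed=0):
    """Pick n_sample unit indices with probability proportional to size.

    sizes: measure of size per unit (e.g., hypothetical school enrollments).
    A unit larger than the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    interval = total / n_sample            # step on the cumulative-size scale
    start = rng.uniform(0, interval)       # random start in the first interval
    targets = [start + k * interval for k in range(n_sample)]

    selected = []
    cumulative = 0.0
    idx = 0
    for t in targets:
        # Advance until the unit whose cumulative range contains the target.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= t:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


def base_weight(size, total_size, n_sample):
    """Inverse of the PPS selection probability n_sample * size / total_size."""
    return total_size / (n_sample * size)


# Hypothetical enrollments for six schools; three are sampled.
enrollments = [10, 50, 20, 120, 40, 60]
sample = pps_systematic_sample(enrollments, n_sample=3)
```

Larger schools are more likely to be drawn, so their base weight is smaller; this is the kind of disproportional selection that the sampling weights are meant to undo.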

The PIRLS test
The PIRLS 2016 test was composed of a set of 16 test forms (booklets). Each booklet comprised a literary text and an informative text followed by a set of multiple-choice and constructed-response items that evaluated reading literacy and its dimensions (reading purposes and comprehension processes). Each student responded to one of the 16 test forms, according to a matrix-sampling booklet design with items planned to be missing by design, and to a sociodemographic questionnaire. The translation and adaptation of the tests, from the original English version, was carried out by the National PIRLS research centers. The validity of the translations and adaptations was verified by an independent translation agency subcontracted by IEA (Ebbs & Wry, 2017).

The PIRLS questionnaires
In parallel to the tests, the PIRLS study deploys a series of questionnaires to school principals, teachers of the selected classes, parents, and students. Based on the questionnaire responses, the IEA-Boston College consortium produced indexes and psychometric scales that allow for the characterization of the educational stakeholders' opinions and perceptions of the educational, professional, and sociodemographic contexts of the school community. Amongst the several scales and indices reported by PIRLS, and following the three domains of reading literacy predictors of the Component Model of Reading, the ones with a larger explanatory power in the HLM models are described briefly in the following section.

Student and family-level variables
Home Resources for Learning (Home Res for Learning) is a scale based on students' and parents' responses regarding the home possessions that may facilitate reading literacy (e.g., books at home; the highest level of education of parents; highest parent's occupational status). Higher scores indicate higher availability of resources. Reported Cronbach's α for this scale ranged from as low as .51 (Saudi Arabia) to .81 (Hungary), with most countries displaying values around .6.
Student Bullying (Stud Bullying) is a scale based on students' responses to how often they experienced bullying behaviors (e.g., making fun or calling names; spreading lies; threatening). Higher scores of the scale indicate a lower frequency of bullying. Reported Cronbach's α ranged from .77 (Kuwait) to .86 (Australia), with most countries displaying values greater than .7.
Students Confident in Reading (Stud Conf Reading) is a scale based on students' degree of agreement with statements about their confidence in and liking of reading (e.g., I usually do well in reading; reading is easy for me; reading is harder for me than any other subject). Higher scores indicate higher confidence. Reported Cronbach's α for this scale ranged from as low as .53 (Saudi Arabia) to .83 (Belgium (Flemish)), with most countries displaying values above .7.
Students Like Reading (Stud Like Read) is a scale based on students' responses to items that evaluate the enjoyment and liking of reading (e.g., I enjoy reading; I like talking about what I read with other people; reading is boring). Higher scores indicate higher liking. Reported Cronbach's α for this scale ranged from as low as .71 (Iran) to .90 (Poland), with most countries displaying values above .8.
Students Sense of School Belonging (Stud Sense of Sch Belong) is a scale based on students' degree of agreement with statements like "I like being in school"; "I feel safe when I am at school"; or "Teachers are fair to me". Higher scores indicate a higher sense of belonging. Reported Cronbach's α for this scale ranged from as low as .59 (Morocco) to .82 (Australia or Qatar), with most countries displaying values above .7.
Expected level of education of Child (Exp Level Ed Child) is an index based on the parents' response to the level of education that they expect their child to reach.
Early Literacy Activities Before Beginning Primary School (Early Lit Actv Befor Sch) is a scale based on parents' reports on the frequency with which their children did literacy activities before entering school (e.g., read books; tell stories; write letters or words). Higher scale scores indicate more frequent activities. Reported Cronbach's α for this scale ranged from as low as .70 (Kazakhstan, Oman,…) to .85 (Bulgaria and Egypt), with most countries displaying values above .7.
Could Do Early Literacy Tasks When Beginning Primary School (Early Lit Tasks) is a scale derived from parents' responses to how well their children could do tasks like "Read some Words"; "Write some letters"; "Write sentences" before entering primary school. Higher scores indicate higher ability. Cronbach's α values were reported for this scale.
Parents Like Reading (Parents Like Read) is a scale that measures how students' parents feel about reading as derived from their agreement with sentences like "I read only if I have to"; "I enjoy reading" or "Reading is one of my favorite hobbies". Higher scores indicate higher liking. Reported Cronbach's α for this scale ranged from .72 (Kazakhstan) to .90 (e.g., Austria or Hungary), with most countries displaying values above .8.
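The Cronbach's α values quoted for the scales above summarize internal consistency. As a reminder of what the coefficient measures, the following is a minimal sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The function name and the toy responses are hypothetical; the reported values come from the PIRLS documentation, not from this code.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per item, each holding the scores of the
    same respondents in the same order.
    """
    k = len(items)
    item_variance_sum = sum(variance(scores) for scores in items)
    # Total scale score per respondent (sum across the k items).
    totals = [sum(respondent) for respondent in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of four respondents to three perfectly consistent items.
toy_items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
alpha = cronbach_alpha(toy_items)
```

With perfectly consistent items, as in the toy data, α reaches its maximum of 1; the values around .6 to .9 reported above indicate moderate to high consistency.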

Teachers and schools-level variables
Classroom Instruction Limited by Student Attributes (Class Instr Lim by Stud) is a scale derived from the teachers' reports on the extent to which their classroom instruction in reading was limited by students' preparedness and readiness to learn (e.g., lacking skills; sleep-deprived; poor nutrition; disruptive students or with learning impairments). Higher scale scores indicate lower limitations on instruction. Reported Cronbach's α for this scale ranged from .61 (Italy) to .83 (e.g., Australia), with most countries displaying values above .7.
Safe and Orderly School (Safe and Ord Sch) is a scale derived from teachers' degree of agreement with statements such as "I feel safe at this school"; "This school has clear rules about student conduct" or "The students are respectful to teachers". Higher scores on the scale indicate safer and more orderly schools. Reported Cronbach's α for this scale ranged from .61 (Georgia) to .90 (United States of America), with most countries displaying values above .8.
Time spent by teacher in Reading Instruction (Time Spent Read Instr) is a measure derived from principals' reports of total instruction time per year and teachers' reports on language instruction and reading times.
Teacher asks Students to Read Silently (Read Silently) is a measure of how often teachers ask their students to read silently (from "never or almost never" to "every day or almost every day").
Teachers Teaching Students strategies for Decoding Words and Sounds (Decoding Words) is a measure of how often teachers teach strategies for decoding words and sounds (from "never or almost never" to "every day or almost every day").
Teachers Teaching Students how to Summarize the main ideas of a Text (Summarize Main Ideas) is a measure of how often teachers teach students how to summarize the main ideas of the text (from "never or almost never" to "every day or almost every day").
Teachers asking Students to Locate Information in the Text (Locate Info) is a measure of how often teachers ask students how to locate the main ideas of the text (from "never or almost never" to "every day or almost every day").
Percentage of Economically Disadvantaged Students (Economic Disad) is a report by principals on the percentage of students coming from economically disadvantaged homes (from "0 to 10%" to "more than 50%").
Shortage of Instructional Materials (Short Instruct Materials) is a report by principals on how much the school's capacity to provide instruction is affected by the shortage of instructional materials (from "not at all" to "a lot").
Percentage of Students Entering with Literacy Skills (Stud Enter Lit Skills) is a measure derived from principals' answers to 6 questions regarding the percentage of students that enter schools with reading literacy skills, e.g., "read some words", "recognize most of the letters of the alphabet", "read sentences" or "write letters of the alphabet" using a ranking from "less than 25%" to "more than 75%".
Instruction Affected by Reading Materials Shortage (Instr Aff by Read Short) is a scale derived from principals' reports on how reading instruction is affected by shortages in 12 school and classroom resources (e.g., "instructional material (e.g., textbooks)", "Teachers with a specialization in reading", or "Computer technology for teaching and learning"), reported on a rating scale from "not at all" to "a lot". Higher scale scores indicate less shortage of resources. Reported Cronbach's α for this scale ranged from .77 (The Netherlands) to .96 (United Arab Emirates), with most countries displaying values above .8.
Parents Expectations (Parent Expect) is a measure reported by principals of their perception of parents' expectations for student achievement (from "very high" to "very low").
School Discipline (Sch Discipl) is a scale derived from principals' answers regarding ten potential school discipline problems (e.g., "Classroom disturbance", "Cheating" or "Vandalism"), reported on a scale from "Not a problem" to "Serious problem". Higher scale scores indicate more disciplinary problems. Reported Cronbach's α for this scale ranged from .73 (Macao SAR) to .97 (Georgia), with most countries displaying values above .8.
School Emphasis in Academic Success (Sch Enph Acad Success) is a scale derived from principals' answers to 13 questions regarding aspects of school emphasis on academic success (e.g., "Teacher's understanding the school's curricular goals", "Parental expectations for student achievement" or "Students' desire to do well in school"), with higher scores indicating higher emphasis. Reported Cronbach's α for this scale ranged from .84 (Germany) to .93 (Australia), with most countries displaying values around .85.

Testing and coding procedures
The administration of PIRLS tests and questionnaires in the participating countries followed a standardized procedure defined by IEA. The duration of the test was 80 min, with an interval of no more than 30 min at the end of the first 40 min. A student sociodemographic questionnaire (20 min) was applied at the end of the test (Johansone, 2017). The school, teacher, and family surveys were conducted before the testing sessions with no time limit to answer the questionnaires. Answers to multiple-choice and constructed-response test items were coded by a panel of national coders previously trained by the National Centers according to the coding scripts produced by the IEA. The inter-coder reliability was controlled by the IEA.

Estimation of students' reading literacy
Student reading literacy (proficiency) scores were estimated by the psychometrics team of the IEA-Boston College consortium using Item Response Theory and imputation of missing values by regression on latent (conditioning) variables. The score estimates, presented in the form of five plausible values, are stored in the PIRLS database that was made available to the public in December 2017 (TIMSS & PIRLS-PIRLS 2016 International Database, n.d.). The plausible values are estimates of the score that a student is expected to obtain in reading, considering his/her latent aptitude in the evaluated reading purposes/processes and the context variables that characterize the student. Standardized student performance estimates were then converted into the PIRLS scale, established in the first edition of the study, to vary between 0 and 1000 points, with an average of 500 points and a standard deviation of 100 points.
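Because each student has five plausible values rather than a single score, any statistic has to be computed once per plausible value and the results pooled. A minimal sketch of the usual multiple-imputation pooling rules (Rubin's rules) is given below. It assumes that the per-plausible-value estimates and their sampling variances have already been obtained; the function name and the numbers are hypothetical and do not come from the PIRLS database.

```python
from math import sqrt
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Pool one statistic computed separately on each plausible value.

    estimates: the statistic obtained with each plausible value.
    sampling_variances: the squared standard error of each estimate.
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)                 # average over plausible values
    within = mean(sampling_variances)        # average sampling variance
    between = variance(estimates)            # variance across plausible values
    total_variance = within + (1 + 1 / m) * between
    return pooled, sqrt(total_variance)


# Hypothetical country means from five plausible values, each with SE = 2.
pv_means = [500.0, 502.0, 498.0, 501.0, 499.0]
pv_variances = [4.0] * 5
pooled_mean, pooled_se = pool_plausible_values(pv_means, pv_variances)
```

The pooled standard error is larger than any single-run standard error because it also carries the uncertainty about each student's latent proficiency.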

Statistical analysis
The identification of variables, scales, and indexes that were able to explain the variation of the reading literacy scores was made through a hierarchical linear regression of the five plausible values for reading literacy on student and family (level 1) and teachers and schools (level 2) explanatory variables. The level 1 (within) model for student i within school j is:

$$Y_{ij} = \beta_{0j} + \beta_{1j} W_{ij} + \varepsilon_{ij} \tag{1}$$

where $Y_{ij}$ is the reading literacy imputed from the five plausible values, $\beta_{0j}$ are school random effects associated with the model intercept, $\beta_{1j}$ are the fixed effects for the $W_{ij}$ level 1 (within) explanatory variables ($\beta_{1j} = \gamma_{10}$), and $\varepsilon_{ij}$ are the random errors for student i within school j. The level 2 (between) model is:

$$\beta_{0j} = \gamma_{00} + \gamma_{01} B_{j} + u_{0j} \tag{2}$$

where $\gamma_{00}$ is the common intercept for schools, $\gamma_{01}$ are the fixed effects for the $B_{j}$ level 2 (between) explanatory variables, and $u_{0j}$ are schools' random effects associated with the model intercept. The final mixed model, obtained by combining Eqs. (1) and (2), is:

$$Y_{ij} = \gamma_{00} + \gamma_{10} W_{ij} + \gamma_{01} B_{j} + u_{0j} + \varepsilon_{ij} \tag{3}$$

where $\varepsilon_{ij} \sim N(0, \sigma)$. Only explanatory variables that could be predictors of reading literacy according to the Component Model of Reading (Aaron et al., 2008) were included in the models. Furthermore, variables displaying strong multicollinearity (VIF > 5) were removed before the HLM analysis (see, e.g., Montgomery & Peck, 1982). Thus, the models presented in this study do not include all the variables, indexes, and psychometric scales (more than 100) that can be found in the international report (for a full description of all the scales and indices see Hooper et al., 2015); Table 1 lists the variables present in the HLM models for each country. After pairwise missing data deletion, the proportion of non-missing data in all variables was larger than 0.99 for all countries.
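The multicollinearity screen mentioned above (VIF > 5) can be sketched as follows. This is an illustration only, assuming the predictors are available as a plain numeric matrix; the function names, the toy data, and the use of NumPy are assumptions, and the study itself does not say which tool computed the VIFs. The sketch relies on the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # VIF_j is the j-th diagonal element of the inverse correlation matrix.
    return np.diag(np.linalg.inv(corr))


def flag_collinear(X, names, threshold=5.0):
    """List the predictors whose VIF exceeds the threshold."""
    vifs = variance_inflation_factors(X)
    return [name for name, vif in zip(names, vifs) if vif > threshold]


# Toy data: columns "a" and "b" are nearly identical, "c" is unrelated.
toy_X = [
    [1, 1.1, 1],
    [2, 1.9, -1],
    [3, 3.2, 0],
    [4, 3.9, -1],
    [5, 5.1, 1],
]
flagged = flag_collinear(toy_X, ["a", "b", "c"])
```

In the toy data both members of the near-duplicate pair are flagged; which of them to drop is a substantive decision that the sketch does not make.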
In the first step of the analysis, the intraclass correlation coefficient (ICC) for schools within countries was estimated from a basal model with no explanatory variables ($Y_{ij} = \gamma_{00} + u_{0j} + \varepsilon_{ij}$), with the corresponding design effect estimated as $1 + (\bar{n} - 1) \times \mathrm{ICC}$, where $\bar{n}$ is the average cluster size (Stapleton, 2013). An ICC greater than or equal to 0.1 and a design effect larger than 2 were considered indicators of cluster effects that must be considered to obtain efficient estimates with the HLM model (Muthén & Satorra, 1995; Stapleton, 2013). In the second stage of the analysis, hierarchical linear models with random intercepts and homogeneous slopes between schools were analyzed for each of the PIRLS 2016 participating countries. Level 1 explanatory variables (students and family) were centered by the grand mean of the country, while level 2 explanatory variables (teachers and schools) were centered by the group mean. To account for the 2-stage sampling designs (PPS and cluster sampling), student (level 1) weights and teacher/school (level 2) weights were recalculated from the information in the PIRLS 2016 international database according to procedures implemented in the IEA Data Analysis center (Stein-Planck, 2017, pers. comm.). Teacher variables were aggregated within schools by the mean, for interval-scaled variables, or the median, for ordinal-scaled variables. The validation of the model assumptions, namely the normal distribution of the residuals, was safeguarded by the large sample size and consequent application of the central limit theorem. All HLM analyses were performed on the five plausible values for reading literacy using the imputation procedure implemented in the Mplus software (v 7.2, Muthén & Muthén, Los Angeles) with robust maximum likelihood estimation (MLR). A diagonal covariance structure for the null model and an unstructured one for the 2-level model were assumed. No other covariance structures were tested.
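The ICC and design-effect rule described above can be made concrete with a short sketch. The ICC is computed here from the null model's variance components in the usual way (between-school variance over total variance); the function names and the variance figures are hypothetical and serve only to illustrate the thresholds stated in the text.

```python
def intraclass_correlation(between_variance, within_variance):
    """Share of total variance that lies between schools (null model)."""
    return between_variance / (between_variance + within_variance)


def design_effect(icc, average_cluster_size):
    """Design effect 1 + (n - 1) * ICC for an average cluster size n."""
    return 1 + (average_cluster_size - 1) * icc


def cluster_effects_present(icc, deff):
    """Rule used in the text: ICC >= 0.1 and design effect > 2."""
    return icc >= 0.1 and deff > 2


# Hypothetical variance components and an average of 21 students per school.
icc = intraclass_correlation(between_variance=2500, within_variance=7500)
deff = design_effect(icc, average_cluster_size=21)
use_multilevel = cluster_effects_present(icc, deff)
```

In this hypothetical case a quarter of the score variance lies between schools and the design effect is 6, so ignoring the clustering would understate standard errors considerably.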
Model fit was evaluated from the model's R² for level 1 and level 2 variables, which applies to both within and between levels (Liu, Zumbo, & Wu, 2014). The preparation of Mplus input files and the retrieval of the results of the analyses were optimized through the R software (R Core Team, 2017) package MplusAutomation (Hallquist & Wiley, 2018). No missing data imputation was performed during the HLM analysis (pairwise deletion). The HLM models were fitted in three stages to account for model complexity and possible algorithmic convergence issues. In the first stage, only W variables were added, and the statistically significant ones were selected; in the second, B variables were probed, and the statistically significant ones were selected. Finally, in the third stage, the full two-level HLM model was fitted using the level 1 and level 2 variables selected in the two previous stages. Summary measures were obtained with the skimr package for R (McNamara et al., 2018). Student's t-tests were used to probe the significance of the overall mean standardized regression coefficients. Effects with p < .05 were considered statistically significant. Standardized regression coefficients below 0.10 (β < 0.10), even if statistically significant, were considered irrelevant effect sizes at this level of analysis.
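The two decision rules stated above (p < .05 for significance, standardized β of at least 0.10 for relevance) can be sketched as a simple filter. The function name, the variable labels' coefficients, and the p-values below are invented for illustration only, and comparing the absolute value of β with the threshold is an assumption made here so that negative effects are treated symmetrically.

```python
def relevant_effects(results, alpha=0.05, min_beta=0.10):
    """Keep effects that are both significant and of relevant size.

    results: dict mapping a variable name to (standardized beta, p-value).
    Assumption: the size threshold is applied to the absolute value of beta.
    """
    return {
        name: beta
        for name, (beta, p_value) in results.items()
        if p_value < alpha and abs(beta) >= min_beta
    }


# Made-up coefficients, not results from the study.
toy_results = {
    "Stud Conf Reading": (0.30, 0.001),   # significant and relevant
    "Stud Like Read": (0.04, 0.010),      # significant but too small
    "Economic Disad": (-0.15, 0.020),     # significant, relevant, negative
    "Read Silently": (0.12, 0.200),       # large enough but not significant
}
kept = relevant_effects(toy_results)
```

Only the first and third toy entries survive both filters, which mirrors how a statistically significant but tiny coefficient is set aside in the analysis.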

Results
In the PIRLS 2016 edition, the Russian Federation (M = 581, SE = 2.2) followed by Singapore (M = 576, SE = 3.2), Hong Kong SAR (M = 569, SE = 2.7), Ireland (M = 567, SE = 2.5), and Finland (M = 566, SE = 1.8) were the top-5 performers (scores at or greater than the scale mean + 0.5 SD; green-colored in Fig. 2). Looking at the PIRLS average achievement for participating countries, a strong linear relationship emerges between the mean PIRLS score and the between-schools' variation as probed by the Intraclass Correlation Coefficient (ICC) for schools within countries (r = −.624, p < .001). Countries with lower PIRLS scores show a larger variation between schools than countries with higher scores, which display more homogeneous schools (see Fig. 3a). The support for the 'Heyneman-Loxley' effect is, overall, weak. Only 19% of the ICC variation can be explained by the GDP per capita (r = −.436, p = .012) (Fig. 3b). However, there are quite striking differences between countries, suggesting that specific, within-country effects must be in place.
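As a quick arithmetic check, the shares of variance implied by the two correlations reported above follow from squaring r; the second line is an added illustration of the same calculation and is not a figure quoted in the text:

```latex
% Shared variance between ICC and GDP per capita (the 19% quoted above)
r^2 = (-0.436)^2 \approx 0.190
% Shared variance between ICC and mean PIRLS score (same calculation)
r^2 = (-0.624)^2 \approx 0.389
```

The contrast between roughly 19% and roughly 39% shared variance is what underlies the reading that national wealth accounts only weakly for between-school heterogeneity.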

Teachers and schools-level variables
At the teacher level, there was a large variation between countries in teachers' strategies and attitudes associated with better PIRLS results. For the Russian Federation, giving students time to read silently (varying from every day to never or almost never) was the teaching strategy associated with better student results. However, the amount of time spent on reading instruction was the strongest, and a negative, explanatory variable of Singapore's students' performance in PIRLS. In some mid-performing countries, like Portugal, locating information was the strongest explanatory variable of PIRLS results, while for others, like Slovakia, word-decoding strategies were the strongest explanatory variable. In some low-performers, like South Africa or the United Arab Emirates, time spent by students reading silently and decoding words were the teachers' strategies most strongly associated with better PIRLS results. Problems with discipline and students' difficulties limiting classroom instruction (an inverted scale derived from students lacking prerequisite knowledge or skills, disruptive students, uninterested students, etc.) were common explanatory variables in low-performers (e.g., the United Arab Emirates and Morocco), mid-performers (e.g., Poland), and top-performers (e.g., Singapore, Northern Ireland) alike.
As was the case for teacher-level factors, considerable differences between countries in terms of school-level reading attainment explanatory variables were also observed. A higher percentage of economically disadvantaged students was associated with lower performance in PIRLS in countries like South Africa (low-performer), Flemish Belgium (mid-performer), or Singapore and Northern Ireland (top-performers). School emphasis on academic success was identified as an important explanatory variable in low-achieving countries like Oman, mid-achieving countries like Portugal, and also top achievers like Taiwan. An exploration of Table 1 reveals a common pattern for student and family-level variables associated with PIRLS results, but a strong variation in teacher and school-related variables. A summary of all the explanatory variables in all PIRLS countries with data available for all variables studied is given in Table 2.
[Table note: Bold predictors have a mean value greater than or equal to 0.10 (lower limit for a medium effect size).]
The analysis of Table 2 reveals that confidence in reading, parents' higher expectations about their child's education, and early literacy tasks before entering primary education were the strongest explanatory variables of PIRLS 2016 reading literacy scores at the student and family level (p < .001). However, the effect on average PIRLS attainment described by these explanatory variables varied considerably between countries. On average, student and family variables accounted for 28% of the within-school PIRLS variation. At the teacher level, classroom instruction limited by students' needs was the most common, and negative, explanatory variable of PIRLS results across countries (p < .001), with no effective teaching strategies proving to be common to most countries. Finally, at the school level, the percentage of economically disadvantaged students was the negative explanatory variable of PIRLS results that emerged consistently among participating countries (p < .001). School emphasis on academic success emerged as an important and strong explanatory variable in some countries, among low, mid, and top-performers alike, but the overall common effect across the 48 countries analyzed was quite weak (mean standardized effect of .09), although statistically significant (p = .001). On average, teacher and school variables accounted for 48% of the variation of PIRLS results between schools. It is worthwhile to note that a country's performance in PIRLS was negatively associated with the between-schools variation (p < .05).
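The p-values attached to the overall mean effects come from Student's t-tests on the mean standardized coefficients across countries. A minimal sketch of such a test follows, assuming a one-sample test of the cross-country mean against zero (the text does not specify the exact variant). The function `one_sample_t` and the coefficient values are hypothetical illustrations, not data from Table 1 or Table 2.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)           # sample SD (n - 1 in the denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))  # mean divided by its standard error
    return t, n - 1


# Made-up per-country standardized coefficients for one predictor.
hypothetical_betas = [0.10, 0.20, 0.30]
t_stat, df = one_sample_t(hypothetical_betas)
```

The resulting t statistic would then be compared with the t distribution on `df` degrees of freedom to obtain the p-value. A small mean effect can still be statistically significant when the coefficients vary little across many countries, as with the .09 mean effect of school emphasis on academic success.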

Discussion
PIRLS 2016 differences between top-performers and low-achievers were around 200 points, or two standard deviations on the PIRLS scale (see Fig. 2). This means that 4th-grade students in low-performing countries were about four school years behind students from top-performing countries in the same grade as far as reading literacy is concerned. Analysis of within versus between-schools' variation reveals that low-performers have higher variation due to school-level factors when compared to top-performers. For example, in Morocco, ca. 45% of the variation in PIRLS results is attributable to schools, while for top performers like Finland less than 10% of the variation in PIRLS results is due to school differences. A strong negative linear relationship emerges between performance in PIRLS and school variation within countries (r = − .62; p < .05).
Analysis of student and family-level variables, at the within-school level, revealed some variation, but a set of explanatory variables common to low-performing, mid-performing, and top-performing countries emerged. Student confidence in reading (how much the student thinks he/she is a good reader), parents' expectations about their children's future education, and early literacy tasks before entering primary education were consistent explanatory variables of PIRLS scores. More than 50% of the participating countries in the PIRLS 2016 edition showed that student confidence in reading had a standardized effect of at least .29 (P50 = .29). That is, a change of one standard deviation in student confidence in reading results in an increase of at least 29 points on the PIRLS score scale for half of the participating countries. A related construct, reading self-concept, was identified as a significant explanatory variable of reading literacy in PIRLS 2001 for Hong Kong SAR students (Tse et al., 2005), Australian students, and 14 other countries that participated in PIRLS 2011 (Guo et al., 2018). Similar effects have been observed in several European countries taking part in previous editions of PIRLS (Araújo & Costa, 2015; Costa & Araújo, 2017; Netten, Voeten, Droop, & Verhoeven, 2014). Parents' expectations about their children's future education had a more modest effect across countries. An increase of one standard deviation in the level of expectation resulted in an increase of at least 13 points in the PIRLS score for half of the participating countries (P50 = .13). The same result was seen in the early literacy tasks scale. Similar results, as well as expectations and practices of early literacy activities before entering primary education of Canadian children, were reported by Martini and Sénéchal (2012) in their home literacy model, and for a few countries in Park's (2008) study of 25 countries that took part in the PIRLS 2001 study.
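The conversion from a standardized effect to PIRLS points used above can be written out as a short sketch. It assumes an outcome standard deviation of 100 points, in line with the statement that 200 points correspond to two standard deviations on the PIRLS scale; the function name and constant are hypothetical, and within-country standard deviations may differ from 100.

```python
PIRLS_SCALE_SD = 100.0  # assumed SD of the PIRLS scale (200 points = 2 SD in the text)


def beta_to_points(beta: float, predictor_sd_change: float = 1.0,
                   outcome_sd: float = PIRLS_SCALE_SD) -> float:
    """Expected change in PIRLS points for a change of `predictor_sd_change`
    standard deviations in the predictor, given a standardized coefficient."""
    return beta * predictor_sd_change * outcome_sd


confidence_points = beta_to_points(0.29)   # median effect of confidence in reading
expectation_points = beta_to_points(0.13)  # median effect of parents' expectations
```

Under this assumption, the median effects of .29 and .13 correspond to roughly 29 and 13 points per standard deviation of the predictor, which matches the figures given in the paragraph.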
In 27 out of the 48 countries that took the full PIRLS test in 2016, girls significantly outperformed boys (see Table 1). However, the average gender effect was around 6 points on the PIRLS scale (see Table 2). When considering the effect of other student and family-level variables on the students' PIRLS scores, the gender effect was not statistically significant for the other 21 participating countries. This contrasts with the gender gap information published by the PIRLS consortium, which reported girls scoring significantly higher than boys in 48 countries out of the 50 PIRLS and PIRLS Literacy participants that took the test in 2016 (Mullis et al., 2017a). The results presented in this paper, obtained from the analysis of gender differences after considering other student and family variables, contradict the previous emphasis on the reading gender gap. Nonetheless, when considering single-country studies, there are countries where the gender gap is nonexistent (e.g., Portugal and Macao SAR, see Marôco, 2018), mid-sized (e.g., Ireland, see Eivers, Gilleece, & Delaney, 2017), or relatively large (e.g., Saudi Arabia or South Africa, see Spaull, 2017).
Another result that is at odds with previously published research on both PIRLS and other International Large-Scale Student Assessments (ILSA) like TIMSS (see, e.g., Ólafsson et al., 2014) or PISA (see, e.g., Karakolidis et al., 2016) concerns the positive effect of family socioeconomic status on literacy, which has been reported with a strong heterogeneity among students and countries (Lagravinese, Liberati, & Resce, 2019; Park, 2008). For example, the PISA's Economic and Socio-Cultural Status (ESCS) was a strong explanatory variable of mathematics achievement in Malaysia but not Singapore (Thien & Ong, 2015). In this study, when taken together with other student and family-level variables like early literacy tasks, socioeconomic-related variables like home resources for learning did not show a consistent effect across countries. Indeed, the mean effect for this explanatory variable over the 48 analyzed countries was .091 (P50 = .098). For about half of the participating countries, an increase of one unit in the home resources for learning scale results in a modest increase of nine points on the PIRLS scale. On the other hand, this study confirms the results of an earlier analysis of PIRLS 2011 data from four European countries where student early literacy skills, home literacy practices, resources, and behaviors were strongly associated with PIRLS scores (Costa & Araújo, 2017). The association of home resources for learning and early literacy tasks (a toddler can only read or play with a book if a book is available at home, or preschool…) may result in possible suppression effects, although this was not a testable hypothesis in this study. Consistent with this hypothesis, Park (2008) observed a U-shape effect of the number of books at home with the country's economic level on reading literacy.
When considering the influence of teacher-related variables on students' PIRLS achievement, no teacher-related variables (e.g., teaching strategies or teachers' professional experience) emerged consistently in all the 48 countries analyzed other than students' difficulties hindering learning. Analyses from other PIRLS editions have pointed out the importance of learning-oriented teaching strategies. Cheung et al. (2017) noted that the results of Hong Kong students in PIRLS 2006 were significantly correlated with teaching strategies and activities that promoted silent reading. However, the positive effect of teacher strategies on student literacy does not appear to be universal. For example, when analyzing PIRLS 2006 data, Shiel and Eivers (2009) found contradictory relationships between student performance and different types of strategies and resources used by teachers from different countries. In this study, time spent reading silently explained PIRLS results in only a few countries, whether low-performers like South Africa, mid-performers like Slovakia and Spain, or top-performers like the Russian Federation. Overall, the mean standardized effect was .045 (P50 = .045). There is no common denominator for all countries and education systems that can easily identify the resources and strategies that teachers can use to improve reading literacy.
At the level of school-related variables, school emphasis on the students' academic success has been pinpointed as a relevant explanatory variable in the PIRLS 2016 edition in several individual countries. This was the case in mid-performers like Portugal (Marôco, 2018), several low performers like Oman, and high performers like Taiwan (Mullis et al., 2017a). But again, this effect was not consistent across countries. Furthermore, some countries displayed effects contrary to expectation, such as the negative effect of the emphasis on academic success on PIRLS results in the USA (see the USA in Table 1). The most consistent school-level explanatory variable of (negative) PIRLS achievement across countries was the percentage of disadvantaged students. Schools with a higher proportion of economically disadvantaged students had consistently lower scores, somehow masking the effect of the type of school governance (public or state-funded vs. privately owned). This effect was observed in low-performing (e.g., South Africa's β = − 0.68, see Table 1), mid-performing (e.g., Poland's β = − 0.348, see Table 1), and top-performing countries (e.g., Taiwan's β = − 0.68, see Table 1) alike.
Overall, these results on reading literacy using PIRLS 2016 data do not support a comprehensive 'Heyneman-Loxley' effect. Student and family-related variables that emerged consistently among the top, mid, and low-performing countries were quite similar, while school-related variables, like the percentage of economically disadvantaged students, although accounting for a larger fraction of the within-country achievement variance (r² = .48 vs. r² = .28), were not stronger explanatory variables in the low-achieving countries than in mid or top-performing countries. Taking the percentage of economically disadvantaged students at school as a proxy for a country's economic health, no stronger and consistent effect of this variable was observed between countries with the higher gross national product (see Fig. 3) like Hong Kong SAR (β = 0.024), Singapore (β = − 0.287), or Taiwan (β = − 0.627) versus low-performing and less economically advantaged countries like Oman (β = − 0.043), Morocco (β = − 0.312), or South Africa (β = − 0.68). These observations are in line with the previous conclusions from Baker et al. (2002) on the attenuation of the socioeconomic and national economic development effects on student achievement across developing and developed countries.

Conclusions and Recommendations
Results from this study (see Table 2) demonstrate that models for reading literacy achievement are complex and involve different hierarchical variables (students and families, teachers, and schools). Although the PIRLS framework and the Aaron et al. Component Model of Reading (Aaron et al., 2008; Joshi & Aaron, 2000) set the stage for the cognitive, psychological, and environmental explanatory variables of reading literacy, PIRLS data show strong variation in the explanatory variables of reading literacy between countries and even within countries. No single, universal set of explanatory variables functions homogeneously for all children and countries.
The analysis of 48 countries that took part in the PIRLS 2016 assessment and provided student, family, and teacher-level data, as well as school questionnaires, shows that countries with lower PIRLS scores displayed higher between-schools variation, thus confirming the hypothesis set by research question 1. On average, between-schools variation accounted for 25% of the variation in the countries' PIRLS scores. The percentage of disadvantaged students surfaced as the most consistent school variable explaining PIRLS results. Regarding research question 2, at the student and family level, student confidence in reading, early literacy tasks, and parent expectations were the strongest explanatory variables of reading literacy in PIRLS 2016. Although profusely reported before, gender and home resources for learning were less important explanatory variables of reading literacy when the effects of other student and family-related variables were accounted for. At the school and teacher level, teacher perception of class instruction being limited by students' needs and weaknesses was the strongest explanatory variable of PIRLS achievement. No common teaching strategies or other school-related variables emerged consistently in the analyzed countries, disproving the hypothesis proposed by research question 3. These data also disprove the hypothesis that school-related variables are more important explanatory variables of student achievement than student-related variables for low-performing countries when compared with mid or top-performing countries. No consistent pattern was observed between economically healthier versus poorer countries, corroborating the hypothesis that the Heyneman-Loxley effect is no longer in place, as suggested by others (see Baker et al., 2002).
Although school and teacher-related variables account for a larger amount of student reading literacy variance than student and family-related variables, there was no clear pattern for the overall relevance of school and teachers' variables across countries in all levels of country performance.
An increase in the number of formal schooling years across countries, as well as a mass schooling investment focused on school and teaching quality, backed by nationwide uniformization of curricula, learning goals, and teacher training, may be the cause of lower effects of school and teacher-related variables on student achievement. On the other hand, parental expectations about children's education and families' home resources for learning can overcome schools' limitations for learning. This does not imply that within-country school variation does not occur. As the data show, low-performing countries do have larger between-schools variation than top-performing countries. However, no common effects of school and teacher-related variables can be generalized to a one-size-fits-all recommendation.
The results presented in this study are significant for both evidence-based policy recommendations and education practices aimed at improving reading literacy. First, the results show that many student and family-related variables are strong explanatory variables of reading literacy variation within countries. A set of variables common to most of the countries includes early literacy tasks, parents' expectations, and student confidence in reading. Results presented in this study suggest that investment in the development of pre-school/kindergarten systems can give children a leading advantage for their schooling years. Also, improving parents' expectations about their children's education, through, e.g., adult-education programs reinforcing the value of education, skills, and literacy competencies, can lead to better student achievement. Finally, improving student confidence in reading, through, e.g., programs to encourage reading for both literary and informational purposes, can improve student academic achievement, not only in reading but also in all other disciplines, since competence in reading is a must for learning. At the teacher level, policies oriented at reducing the students' limitations hindering learning (e.g., remedial classes, or recovery strategies for students showing learning difficulties) are, according to the PIRLS 2016 data, the most rewarding strategy. The multi-level analysis considering the complex sampling features of PIRLS-a methodological approach that should be chosen more often with ILSA data (Jerrim & Lopez-Agudo, 2017)-in this study draws attention to the fact that individual explanatory variable effects (like gender) may be over or underestimated when considered individually. This corroborates the findings of other studies that analyzed ILSA data other than PIRLS.
[Figure note: Countries are represented by 3-letter ISO codes. GDP data was retrieved from the World Bank (https://data.worldbank.org/indicator/NY.GDP.PCAP.CD).]
Since explanatory variables are not fully independent from each other, their effects should be considered together and not individually, as is frequently the case in the international reports of ILSA and the secondary data analysis research that generally follows. Results gathered in this study show that there is a large between-countries variation in both reading achievement and its explanatory variables, recommending against the generalized adoption of policies and practices from top-performing countries as a short-term fix to improve reading literacy in mid or low-performing countries. Even within low-performing countries, there are top-achieving students and schools whose policies and education practices are possibly better suited to the countries' socioeconomic and cultural landscape than those from top-performing countries.
Despite the importance of these conclusions, PIRLS has limitations that must be considered. PIRLS and similar ILSA studies are sample-based studies, with non-participation and biases in the selection of students possibly hindering the generalization of conclusions. Cross-cultural differences between countries and regions may also hinder the transcultural validity of both achievement tests and psychometric scales. Lack of strong measurement invariance, as well as different coverage of the students' abilities by the test items, may render the between-countries mean score comparisons untruthful. Additionally, PIRLS, like all other ILSA, is a correlational study by nature, and thus causal inference from significant correlations and regression models' coefficients may not be more than a form of statistical fantasy. Proper care must be exerted before the acceptance of causal effects.