What are multilevel models?

Multilevel Models

Multilevel models go by various names. In sociology they are called multilevel linear models; in biometry, mixed-effects models or random-effects models; in economics, random-coefficient regression models; and in statistics, covariance components models. Today they are most commonly referred to as hierarchical linear models or multilevel models. Lindley and Smith (1972) proposed a model for analyzing hierarchical data with complex error structures and named it the hierarchical linear model, but at the time there was no general way to estimate the covariance components for unbalanced data. Dempster, Laird, and Rubin (1977) later developed the EM algorithm, which made it possible to estimate covariance components, and Dempster (1981) demonstrated that it could be applied to hierarchical data structures. This led to the widespread use of hierarchical linear models in various academic fields, including education.

Hierarchical Structure of Educational Data

In education, data typically have a nested structure: students are nested within classrooms, classrooms within schools, and schools within school districts. Multilevel models are the most suitable models for analyzing such data. If the hierarchical structure is ignored and the data are analyzed as if they had a single level, the results of hypothesis tests will be invalid. For example, suppose students nested within schools are analyzed with ordinary regression. Students from the same school resemble one another, so their observations are not independent, which violates the independence assumption of regression analysis. As a result, the standard errors of the regression coefficients are underestimated, which increases the likelihood of Type I errors.
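How much the standard error is understated can be illustrated with the Kish design effect, a standard approximation that is not part of the discussion above and is brought in here only as a sketch. It assumes equal-sized schools and applies to the standard error of a sample mean (or of a school-level predictor); the function names and the example numbers are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


def se_inflation(cluster_size: float, icc: float) -> float:
    """Factor by which the naive (independence-based) SE must be multiplied."""
    return math.sqrt(design_effect(cluster_size, icc))


# Hypothetical example: 2,000 students in 50 schools of 40 students each,
# with an assumed intraclass correlation of 0.2.
deff = design_effect(cluster_size=40, icc=0.2)
n_eff = effective_sample_size(n_total=2000, cluster_size=40, icc=0.2)
inflation = se_inflation(cluster_size=40, icc=0.2)
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.1f}")
print(f"naive SE should be multiplied by: {inflation:.2f}")
```

Under these assumed values, the 2,000 clustered observations carry roughly the information of a few hundred independent ones, which is why a test that treats them as independent rejects too often.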

In hierarchical data where students are nested within schools, students from the same school tend to share more similar characteristics than students from different schools. Therefore, when the academic achievement scores of students from different schools are analyzed with a multilevel model, the variance in students’ scores can be divided into two components: variance due to differences between schools (school variance) and variance due to differences among students within the same school (student variance).
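The split into school variance and student variance can be sketched in a few lines of code. The example below does not fit a multilevel model with maximum likelihood, as dedicated software would; it uses the simpler one-way ANOVA (method-of-moments) estimator, which only works for balanced data, on simulated scores. All names and parameter values (200 schools, 30 students per school, a school standard deviation of 30, a student standard deviation of 80) are assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_scores(n_schools, n_students, grand_mean, school_sd, student_sd, seed=0):
    """Simulate score = grand mean + school effect + student residual."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_schools):
        school_effect = rng.gauss(0.0, school_sd)
        data.append([grand_mean + school_effect + rng.gauss(0.0, student_sd)
                     for _ in range(n_students)])
    return data


def variance_components(data):
    """ANOVA estimates of (school variance, student variance) for balanced data."""
    n_schools = len(data)
    n = len(data[0])
    if n_schools < 2 or n < 2:
        raise ValueError("need at least 2 schools and 2 students per school")
    if any(len(school) != n for school in data):
        raise ValueError("this estimator assumes equal school sizes")
    school_means = [mean(school) for school in data]
    grand = mean(school_means)
    # Mean square between schools and mean square within schools.
    ms_between = n * sum((m - grand) ** 2 for m in school_means) / (n_schools - 1)
    ms_within = sum((y - m) ** 2
                    for school, m in zip(data, school_means)
                    for y in school) / (n_schools * (n - 1))
    student_var = ms_within
    # Truncate at zero because the moment estimator can go negative.
    school_var = max((ms_between - ms_within) / n, 0.0)
    return school_var, student_var


scores = simulate_scores(n_schools=200, n_students=30, grand_mean=500.0,
                         school_sd=30.0, student_sd=80.0, seed=42)
school_var, student_var = variance_components(scores)
icc = school_var / (school_var + student_var)
print(f"school variance:  {school_var:.1f}")
print(f"student variance: {student_var:.1f}")
print(f"share of variance between schools (ICC): {icc:.3f}")
```

The ratio printed on the last line is the intraclass correlation: the proportion of the total variance in scores that lies between schools rather than among students within a school.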

Complex Sampling Design

Large-scale data are now commonly collected using complex sampling designs. For example, the OECD's PISA assesses the cognitive and non-cognitive outcomes of 15-year-old students in both member and non-member countries. To do this, each country selects schools by considering factors such as regional scale and school type (middle schools, high schools, general high schools, vocational high schools, specialized high schools, etc.), and then randomly selects students within those schools. In other words, the data collected through PISA consist of both student-level and school-level data, and this structure is determined by PISA’s sampling method.

To gather these data, each country compiles a list of all schools attended by 15-year-olds, selects schools from this list, and then randomly samples students within the selected schools. This type of sampling differs from the simple random sampling assumed in experimental designs, because students are reached only through the schools that were selected first. Given these characteristics of PISA data, multilevel analysis, which can accommodate such a complex sampling design, is the most suitable statistical method.
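The two-stage procedure described above can be sketched as follows. This is a simplified illustration, not PISA's operational sampling procedure: it assumes schools are grouped by school type only, that the same number of schools is drawn at random from each group, and that a fixed number of students is drawn at random from each selected school. The school list, field names, and sample sizes are all hypothetical.

```python
import random
from collections import defaultdict


def two_stage_sample(school_frame, schools_per_stratum, students_per_school, seed=0):
    """Stage 1: sample schools within each school type. Stage 2: sample students."""
    rng = random.Random(seed)

    # Group the list of schools (the sampling frame) by school type.
    by_stratum = defaultdict(list)
    for school in school_frame:
        by_stratum[school["type"]].append(school)

    sample = []
    for stratum in sorted(by_stratum):
        schools = by_stratum[stratum]
        chosen = rng.sample(schools, min(schools_per_stratum, len(schools)))
        for school in chosen:
            k = min(students_per_school, len(school["students"]))
            for student_id in rng.sample(school["students"], k):
                # Each record keeps its school identifier, so the nesting
                # of students within schools is preserved in the data.
                sample.append({"school_id": school["id"],
                               "type": stratum,
                               "student_id": student_id})
    return sample


# Hypothetical frame: 20 schools of two types, 50 eligible students each.
frame = [
    {"id": f"S{i:02d}",
     "type": "general" if i % 2 == 0 else "vocational",
     "students": [f"S{i:02d}-{j:03d}" for j in range(50)]}
    for i in range(20)
]

sample = two_stage_sample(frame, schools_per_stratum=3, students_per_school=10, seed=1)
print(len(sample), "students from",
      len({row["school_id"] for row in sample}), "schools")
```

Because every sampled student carries a school identifier, the resulting data set has exactly the student-within-school structure that a multilevel model is designed to analyze.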
