Summary of the project
Standards-based education requires the development of standards. A standard setting procedure is a subjective activity (Glass, 1978) in which teachers and other educational experts indicate their expectations of student performance. These expectations are captured in (performance) standards and used to adapt and direct curriculum and instruction and to formulate learning goals. Because the standards directly influence educational practice, it is important that they are set in a valid and reliable way. However, it is not easy to define the reliability and validity of standard setting procedures (McGinty, 2005). Reliability is often established by checking whether teachers are consistent in their own judgments and consistent with one another, although teachers and experts from different backgrounds can of course hold different expectations without being unreliable. Validity is even harder to establish and is mostly addressed in relation to the value of the standard (e.g., does this standard distinguish well between children just passing and just failing?) and the type of procedure used (e.g., do method A and method B lead to similar standards?). Only recently have researchers started to study the process of the standard setting procedure itself in order to measure validity.
Recent studies indicate that the level of the standards may be influenced by various factors, such as the background and attitudes of the teachers (McGinty, 2005), their understanding of the standard setting task (Giraud, Impara, & Plake, 2005; Hein & Skaggs, 2009; Skorupski & Hambleton, 2005), and the age group and domain for which standards are being set (Black & Wiliam, 2004; Hattie & Brown, 2003). These factors are mostly studied using questionnaires (Dawber & Lewis, 2004; Skorupski & Hambleton, 2005), interviews (Ferdous & Plake, 2005; Hein & Skaggs, 2009) or think-aloud protocols (Dawber & Lewis, 2004). Detailed analyses of the content of the group discussions are rarely made. These group discussions, however, do influence the level of the standards, since standards are adjusted on the basis of the arguments for lowering or raising them. This study describes how participants adjust their standards in response to the small group discussion and thereby contributes to insight into the factors that influence standard setting. The leading research question is: what is the relation between the standards that participants set and their discussions during the standard setting task?
To answer the research question, the small group discussions during two standard setting tasks were audio recorded and analyzed. In September 2010, standards for reading comprehension were set for grades 2 and 3. In March 2011, the same group of teachers set standards for math for grades 1, 2 and 3. Approximately 35 teachers, remedial teachers and school directors participated in the standard setting procedures. They set standards at four levels (Minimum, Fundamental, Proficient and Advanced) and discussed these in small groups, after which they could adjust their standards. Standards were formulated on the national proficiency scale of Cito, the Dutch testing and assessment centre.
The results of the standard setting are analyzed in two steps: first, the levels of the initial individual standards are compared to the levels of the standards after small group discussion. Second, the contributions of the participants during the group discussions are analyzed. These two types of data are then merged to see how adjustments to the standards relate to the group discussions. All discussions are analyzed using a coding scheme (as in the studies of Mercer, 1995; Snow & Kurland, 1996) in which, for every 20-second interval, it is recorded who is speaking, whether the conversation concerned the learning content (reading or math), and which role the speaker took in the group dynamics. Group dynamics concerns what participants ‘do’ in conversation: asking questions, giving an opinion, summarizing, or (dis)agreeing. Group discussions in which many questions are asked, many opinions are given, and participants respond critically to each other are expected to lead to standards that are more thoroughly grounded, especially when the discussion is mostly about the content of reading or math.
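The merging of the two data types described above could be sketched as follows. This is a minimal illustration only: the participant IDs, codes, and sample values are hypothetical, not the study's actual dataset or analysis software.

```python
# Hypothetical sketch: linking per-participant standard adjustments to
# 20-second interval codes from the group discussions. All names and
# sample data below are illustrative assumptions.

# Each discussion is coded in 20-second intervals: who speaks, whether the
# talk concerns learning content (reading/math), and the speaker's role.
intervals = [
    {"group": 1, "speaker": "P01", "content": True,  "role": "question"},
    {"group": 1, "speaker": "P02", "content": True,  "role": "opinion"},
    {"group": 1, "speaker": "P01", "content": False, "role": "agree"},
]

# Initial and post-discussion standards per participant,
# in proficiency-scale points.
standards = {
    "P01": {"initial": 42, "adjusted": 40},
    "P02": {"initial": 38, "adjusted": 39},
}

def adjustment(pid):
    """Signed change in a participant's standard after discussion."""
    s = standards[pid]
    return s["adjusted"] - s["initial"]

def content_share(pid):
    """Share of a participant's coded intervals spent on learning content."""
    own = [iv for iv in intervals if iv["speaker"] == pid]
    return sum(iv["content"] for iv in own) / len(own) if own else 0.0

for pid in standards:
    print(pid, adjustment(pid), round(content_share(pid), 2))
```

Pairing each participant's adjustment with summaries of their coded contributions in this way would allow the relation between discussion characteristics and standard adjustments to be examined directly.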
Initial analyses of the standard setting task for reading comprehension show that the average level of the four standards is hardly adjusted after group discussion: Minimum is raised 0.73 (95% confidence interval: -0.46 to 1.91), Fundamental is lowered 1.12 (-2.18 to -0.06), Proficient is lowered 2.30 (-3.89 to -0.71) and Advanced is lowered 1.24 (-3.96 to 1.48). However, there are major differences in the adjustments of individual participants, and these differences cannot be explained by participants moving towards each other within the small groups. Almost one third of the participants hardly adjusted their standards (0-2 points), while 9 participants adjusted their standards by at least 20 points. The group discussions will be analyzed to understand why some participants make major adjustments to their standards after group discussion and others do not.
Current status of the project: data analysis of the standard setting procedures for reading comprehension and math is currently being conducted.
Black, P., & Wiliam, D. (2004). The Formative Purpose: Assessment Must First Promote Learning. In M. Wilson (Ed.), Towards Coherence Between Classroom Assessment and Accountability (pp. 20-50). Chicago: NSSE.
Giraud, G., Impara, J. C., & Plake, B. S. (2005). Teachers' conceptions of the target examinee in Angoff standard setting. Applied Measurement in Education, 18(3), 223-232.
Hattie, J. A., & Brown, G. T. L. (2003). Standard setting for asTTle reading: A comparison of methods (asTTle Technical Report No. 21). Auckland: University of Auckland/Ministry of Education.
Hein, S. F., & Skaggs, G. E. (2009). A Qualitative Investigation of Panelists' Experiences of Standard Setting Using Two Variations of the Bookmark Method. Applied Measurement in Education, 22(3), 207-228.
McGinty, D. (2005). Illuminating the "Black Box" of Standard Setting: An Exploratory Qualitative Study. Applied Measurement in Education, 18(3), 269-287.
Mercer, N. (1995). The Guided Construction of Knowledge. Clevedon: Multilingual Matters.
Skorupski, W. P., & Hambleton, R. K. (2005). What Are Panelists Thinking When They Participate in Standard-Setting Studies? Applied Measurement in Education, 18(3), 233-256.
Snow, C. E., & Kurland, B. F. (1996). Sticking to the point: talk about magnets as a context for engaging in scientific discourse. In D. Hicks (Ed.), Discourse, learning, and schooling (pp. 189-220). Cambridge: Cambridge University Press.