EduInsights: Nutmeg (or Nutty) Reasoning

In my previous post I used a standard on the Praxis II test of mathematics content knowledge that Connecticut had developed prior to NCLB, in order to give the scaled scores some perspective. Some may have noticed that the pass score used by Connecticut — 137 — is less than this initial standard of 141. This reduction was recommended after the pass rates were known:

[Recommendation] Adjust the passing standard on the Praxis II Mathematics: Content Knowledge test from 141 to 137 and apply the adjusted standard to all Connecticut candidates who have taken or will take this test (July 1, 1997, to present). In 1997, when this test was reviewed by a representative panel of mathematics teachers, they followed the modified Tucker/Angoff method for standard setting and recommended a score of 141. The standard practice of adjusting the recommended score by one-half of the standard error of measurement (SEM) (See page 4 for explanation) was not done for the mathematics test. Since there were no national or state data available for this newly developed test, the Advisory Committee’s recommended passing score was presented to the Board for adoption with the intent that the passing rate would be monitored and a recommendation would be made to the Board for an adjustment, if warranted. Using the unadjusted passing score of 141 resulted in a comparably lower first-time and final pass rate for mathematics than the other Praxis II tests. The initial pass rate for mathematics is 51% and final pass rate is 70%, which is the lowest of all the Praxis II tests. Adjusting the score to 137 is expected to produce a final pass rate of approximately 76% which is more in alignment with the pass rates of other Praxis II tests, does not significantly lower the mathematics knowledge and skill required for passing the exam or for teaching, and would move Connecticut from the third to the seventh highest passing score of the 20 states using this exam. ...

Connecticut's passing standards were established for each test using a modified Tucker/Angoff method for the multiple-choice tests and a holistic method for the constructed-response tests. The standards were set by Connecticut educators following a process that consisted of: establishing a preliminary standard using expert judgment and analyzing the results; and presenting the standard for Board adoption with a statistical adjustment downward of one-half a standard error of measurement (SEM) [Except for the Mathematics Praxis II Test]. The SEM is used to describe the reliability of the scores of a group of examinees. For example, if a large group of examinees takes a test for which the SEM is eight, then it is expected that about two-thirds of the examinees would receive scores that are within eight points of their true score (plus four or minus four). An examinee’s true score can be thought of as the average of the examinee’s observed scores obtained over an infinite number of repeated testings using the same test (Crocker & Algina, 1986).

Crocker, L. M., & Algina, J. (1986). Introduction to classical and modern test theory. Orlando, FL: Holt, Rinehart and Winston.

The underlining in the passages above was added by me. Let's parse the edu-speak to see if we can gain some insight as to what is really going on here.

Point 1: "with the intent that the passing rate would be monitored and a recommendation would be made to the Board for an adjustment, if warranted." Translation: We determined a minimal ability level, but if we do not get enough teacher candidates who meet this standard, we will lower the standard until we do.

Point 2: "Adjusting the score to 137 is expected to produce a final pass rate of approximately 76% which is more in alignment with the pass rates of other Praxis II tests." Translation: We don't actually have a reason to expect that the passing rates on different Praxis II tests should be the same. They test different subjects and draw from different pools of candidates. But this way we can always adequately staff our schools, because the criterion becomes the pass rate instead of some objective standard of competence.

Point 3: "establishing a preliminary standard using expert judgment and analyzing the results; and presenting the standard for Board adoption with a statistical adjustment downward of one-half a standard error of measurement." Translation: We know standardized tests are used to measure some intrinsic ability level. That measurement may be wrong. The statistics are such that ETS can estimate the error bars on the measurement. As explained above, this tells us that about two-thirds of the time the actual ability level should be within ±4 points of the measurement. A person with an ability of 141 might score a 137. We should let him pass.
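The "about two-thirds" figure can be checked with a few lines of Python. This is only a sketch: it assumes measurement errors are roughly normally distributed around the true score, which is an assumption and not something the Connecticut document states. The function name `fraction_within` is made up for illustration.

```python
import math

def fraction_within(k_sems):
    """Fraction of observed scores expected to fall within k SEMs of the
    true score, ASSUMING normally distributed measurement error."""
    return math.erf(k_sems / math.sqrt(2.0))

# One full SEM on either side of the true score.
print(f"within 1 SEM:   {fraction_within(1.0):.3f}")
# Half an SEM, the size of the adjustment Connecticut applies.
print(f"within 1/2 SEM: {fraction_within(0.5):.3f}")
```

Under that assumption, roughly 68% of scores land within one SEM of the true score, which is where "about two-thirds" comes from.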

The stuff about SEMs is correct, but the proposed adjustment is exactly backwards (unless, of course, the real purpose is just to increase the final pass rate by 6 percentage points, from 70% to 76%, so that you can avoid a shortage).

Think about it. The minimal ability level was estimated at 141. Connecticut is saying they should adjust the passing score so that this minimal-ability person will pass on the first try, even if they are having a moderately bad day (scoring a 137). But this means that a person whose "real" ability level is 133 can now pass if they are having a moderately good day (scoring 4 points above their "real" level). They can take this test an unlimited number of times. Eventually they will have a good day. You have all but guaranteed that teachers with intrinsic abilities 8 points below your minimal standard (more if the examinee has a very good day) will eventually pass.
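The retake argument can be made concrete with a rough calculation. The sketch below rests on several assumptions that are not in the Connecticut document: measurement error is normally distributed, attempts are independent, the candidate's true score does not change between attempts, and the SEM is 4 points (the reading of the error bars used in this post). The function names and the default `sem=4.0` are hypothetical and can be changed.

```python
import math

def pass_probability(true_score, cut_score, sem=4.0):
    """Chance that one observed score lands at or above the cut score,
    ASSUMING normal measurement error with standard deviation `sem`."""
    z = (cut_score - true_score) / sem
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

def pass_within(attempts, true_score, cut_score, sem=4.0):
    """Chance of passing at least once in `attempts` independent tries."""
    p = pass_probability(true_score, cut_score, sem)
    return 1.0 - (1.0 - p) ** attempts

# A candidate whose true score is 133, against the old and new cut scores.
for cut in (141, 137):
    for attempts in (1, 3, 5, 10):
        chance = pass_within(attempts, 133, cut)
        print(f"cut {cut}, {attempts:2d} attempt(s): {chance:.1%}")
```

Under these assumptions, a candidate with a true score of 133 clears a cut score of 137 about 16% of the time on any single attempt, and about 58% of the time within five attempts. With unlimited retakes the chance keeps climbing toward certainty.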


Sunday, July 30, 2006
