Wednesday, August 02, 2006

History Lesson II

In the previous post we saw that states have an incentive to skew their data on student testing. There is a similar dynamic at play in teacher testing. The NCLB requires that teachers demonstrate competence “as defined by the state, in each core academic subject he or she teaches.”

States have free rein. They can use their own tests, and most of the big states do. Even on ETS tests, they set their own passing scores. Most states require their new teachers to pass some sort of subject-matter competency test, but veteran teachers can opt to bypass any direct measure of competence by jumping through a few additional hoops called HOUSSE (High Objective Uniform State Standard of Evaluation).

Such a system creates lots of paperwork headaches for lots of educrats but has little chance of actually accomplishing the goal of improving teacher quality. It pressures states to simply define low standards that assure their own success, rather than make politically difficult changes to improve the quality of the teaching force. The federal government seems to care only whether the states are meeting their self-defined standards. It is a wonder that any state ever comes up short. Understanding in detail what the state standards really are is a laborious task, and I know of only one significant attempt.

In their 1999 report Not Good Enough, the Education Trust examined the content and passing criteria for a large number of such tests. They came close to catching the states in full-fledged deception mode. But for one major oversight (explained shortly), they might have revealed one of the states’ clever tricks for obscuring performance. Unfortunately, their report didn’t get the attention it deserved, and the deception has continued into the NCLB era.

On the test of secondary mathematics content knowledge (the Educational Testing Service’s Praxis II 0061 test), the Education Trust reported that two states set passing standards below 50%. Fifty percent seems to be a psychologically important threshold, so this finding was highlighted in several subsequent studies. For example, the following passage appears in the 2000 report Preparing and Supporting New Teachers, prepared by SRI researchers for the U.S. Department of Education:
Critics argue that the teacher tests are too easy and that the passing scores are benchmarked very low in most states. For example, on the Praxis II mathematics content tests, teacher candidates in Pennsylvania and Georgia can pass with fewer than 50 percent of the items answered correctly (Education Trust, 1999).
This is on a test that Not Good Enough told us was largely at the high school level and could be passed by a B+ high school student.

This low-pass-score problem got some attention, but it just wasn’t a big enough issue. After all, only two of the thirteen states set pass scores this low, and both were almost at 50%. Besides, these were standards that defined the minimally able beginning teacher, not the “highly qualified” teacher of today. In addition, the problem was left unquantified; that is, we didn't know how many of these barely passing teachers were actually teaching.

Another problem with Not Good Enough was that the Education Trust’s policy recommendations were so unrealistic. In their 2000 report Generalizations in Teacher Education: Seductive and Misleading, Gitomer and Latham state:
Finally, there is increasing policy debate concerning the raising of passing standards for teacher licensure tests. Organizations like the Education Trust (1999) have proposed deceptively simple solutions, such as “raising the bar” for teachers by requiring them to meet far more stringent testing guidelines than are currently in place in order to earn a license to practice. This myopic perspective, however, fails to acknowledge the complexity of the issues embedded in teacher reform. While higher passing standards would elevate the academic profile of those who pass by reducing the pool of candidates and selectively removing a group of individuals with lower mean SAT scores, higher passing standards would also limit the supply substantially. If the highest passing scores currently used in any one state were implemented across all states, fewer than half the candidates would pass Praxis I, and fewer than two thirds would pass Praxis II. Without other interventions the supply of minority candidates would be hit the most severely. For example, only 17% of the African-American candidates would pass Praxis I, and just one third would pass Praxis II. The dramatic effects that would be brought about by raising passing standards require careful policy analysis.
So what educrat would want to raise standards if these would precipitate a crisis of quantity and diversity in the teacher workforce?

Unfortunately, the Education Trust’s data, while technically accurate, was misleading. In a previous post, “The Highly Qualified Math Teacher”, I showed how the pass scores used by the Education Trust grossly overstate examinees’ knowledge, because the Praxis II tests allow guessing without penalty. Under these conditions an examinee with zero content knowledge still gets, on average, 25% of the questions right. The knowledge represented by that 46% raw score shrinks considerably once you realize it sits on a scale where zero knowledge earns a 25% raw score.
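The 25% floor is easy to see in a quick simulation. This is a minimal sketch under two assumptions not stated by ETS: every item has four answer choices (which is what the 25% figure implies), and an examinee either knows an item outright or guesses uniformly at random. The function name `simulate_raw_score` is hypothetical.

```python
import random


def simulate_raw_score(knowledge: float, n_items: int = 100_000,
                       n_choices: int = 4, seed: int = 0) -> float:
    """Fraction of items answered correctly under an assumed know-or-guess model.

    knowledge: assumed fraction of items the examinee truly knows (0.0 to 1.0).
    Items the examinee does not know are answered by a uniform random guess.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    correct = 0
    for _ in range(n_items):
        if rng.random() < knowledge:
            correct += 1  # knows the answer outright
        elif rng.randrange(n_choices) == 0:
            correct += 1  # lucky guess: 1 chance in n_choices
    return correct / n_items


if __name__ == "__main__":
    for k in (0.0, 0.28, 0.5, 1.0):
        print(f"knowledge {k:.0%} -> simulated raw score {simulate_raw_score(k):.1%}")
```

Under this model a zero-knowledge examinee lands near 25%, and an examinee who truly knows only 28% of the items lands near 46%.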

The Education Trust’s numbers can be adjusted to account for this condition. With this adjustment zero content knowledge maps into the expected zero percent. In the table below I reproduce the Education Trust’s table, but add a column with this adjustment.
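The adjustment is not written out as a formula in the post; one standard way to express it, assuming four answer choices per item and an examinee who either knows an item or guesses blindly, is the linear rescaling below. Here R is the raw fraction correct, g is the chance of a lucky guess, and A is the adjusted fraction; these symbols are introduced only for illustration.

```latex
\begin{aligned}
A &= \frac{R - g}{1 - g}, \qquad g = \tfrac{1}{4} \\
A\big|_{R = 0.25} &= 0, \qquad A\big|_{R = 1} = 1 \\
A\big|_{R = 0.46} &= \frac{0.46 - 0.25}{0.75} = 0.28
\end{aligned}
```

This sends a pure-guessing score of 25% to 0 and a perfect score to 100%, and it matches the adjusted figures in the table below (for example, a 53% raw score becomes about 37%).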

The following table, taken from Not Good Enough, shows the 1999 passing standards for teachers taking the 0061 exam. The second column gives the 1999 pass score (or cut score) for each state. The third column is the percentage of correct answers that corresponds to the pass score. The fourth column is an adjustment to the third column that corrects for the “free guessing” effect. The last row is also added.

Praxis II (0061) cut scores by state (1999)

State            Passing score (1999)   Estimated % correct to pass   Adjusted % correct to pass
North Carolina   133                    53                            37
West Virginia    133                    53                            37
New Jersey       130                    51                            35
Knows Nothing    100                    25                            0

Table 1. Table from Not Good Enough. The fourth column and last row are added.
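The fourth column can be checked with a few lines of code. This is a minimal sketch assuming four answer choices per item, so blind guessing averages 25%; the function name `adjusted_percent` and the dictionary of raw cut scores are illustrative, with the numbers copied from Table 1.

```python
# Assumed: four answer choices per item, so pure guessing earns 25% on average.
GUESS_RATE = 0.25


def adjusted_percent(raw_percent: float, guess_rate: float = GUESS_RATE) -> float:
    """Rescale a raw percent-correct so pure guessing maps to 0 and a perfect score to 100."""
    raw = raw_percent / 100
    return 100 * (raw - guess_rate) / (1 - guess_rate)


# Raw % correct needed to pass, copied from Table 1.
raw_cut_percent = {
    "North Carolina": 53,
    "West Virginia": 53,
    "New Jersey": 51,
    "Knows Nothing": 25,
}

if __name__ == "__main__":
    for state, raw in raw_cut_percent.items():
        print(f"{state:15s} raw {raw:3d}% -> adjusted {round(adjusted_percent(raw)):3d}%")
```

Rounded to whole percents, this gives 37, 37, 35 and 0, the same values as the added column.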

Somehow, for all their diligence in analyzing this test and compiling this data, the Education Trust missed this important correction. They did not mention that the Praxis II allows free guessing. They did not tell their readers that 25% would represent zero content knowledge. So no one reading their report could even infer that such a correction was needed.

What if they had reported that 12 of 13 states set passing scores at a level corresponding to knowing less than 50% (several under 40%) of this high-school-level material? The issue would have received far more serious attention. At some point the standards are so low and so widespread that they just cry out for attention.
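Running the adjustment in reverse shows which raw cut scores those thresholds correspond to. Under the same assumptions (four answer choices, and an examinee who either knows an item or guesses blindly), an examinee who truly knows a fraction k of the items expects a raw score of k + (1 - k)/4. The helper `raw_needed` below is hypothetical.

```python
# Assumed: four answer choices per item, so the zero-knowledge raw score is 25%.
GUESS_RATE = 0.25


def raw_needed(knowledge: float, guess_rate: float = GUESS_RATE) -> float:
    """Expected raw fraction correct for an examinee who truly knows `knowledge` of the items.

    Known items are answered correctly; the rest are guessed, and guess_rate of those are right.
    """
    return knowledge + (1 - knowledge) * guess_rate


if __name__ == "__main__":
    for k in (0.40, 0.50):
        print(f"{k:.0%} knowledge corresponds to a raw score of {raw_needed(k):.1%}")
```

Knowing 50% of the material corresponds to a raw score of 62.5%, and knowing 40% corresponds to 55%, so any raw cut score below 62.5% passes examinees who know less than half of the material.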



Dan said...

Hi. I just took the 0061 this morning. I'm graduating with a BS in math and have a 3.2 GPA in my math courses.
The math content is very challenging, harder than the exit exam I needed to take as a math major at college. The scores might be low because the test is so rigorous: a single question often takes three steps and a good deal of insight. Also, partial credit does not exist on the multiple-choice questions. I know I was on the verge of a couple of solutions but could not get an exact answer to bubble in. In an open-ended format I would most likely have earned 2/3 of the points on those, whereas here I most likely got 0% on some of them. I guess you could argue that guessing is kind of like partial credit, but (1) the answer choices are written so that on most problems you either have the answer or you are guessing blindly, and (2) 1/4 is a lot lower than 2/3.

And of course there is some bias from trying to draw in any teachers at all. Most people who are competent at math go into the private sector for the money, not into teaching for moral validation. I do feel that the test needs to be hard in order to judge whether someone knows the material or not. I'm not sure this test does that; I think it can only be assessed in an open-ended format (questions where you show your work and give the answer).

By the way, I enjoyed reading your blog!