Recently Arne Duncan granted NCLB waiver states a one-year reprieve from using test scores in teacher evaluations (http://blogs.edweek.org/edweek/campaign-k-12/2014/09/which_nclb_waiver_states_may_d.html?cmp=ENL-EU-NEWS3). This only makes sense, since this is the first year the new Common Core-aligned assessments will be administered. But for Florida teachers, it’s full steam ahead with VAM! Like “The Little Engine that Could,” the Florida VAM is no quitter. The FLDOE is determined to VAM Florida teachers by any means necessary.
Not one to shy away from sending emails to high-level officials, I sent the Florida Commissioner of Education the following email last week:
“I was very disappointed to see that Florida will still be using student test scores in teacher evaluations this year even though Arne Duncan granted a one year reprieve from test based evaluations to the NCLB waiver states.
I am utterly perplexed as to how the state plans on calculating VAM scores for teachers in the 2014-15 school year. Almost all of the tests our students will be taking this year are new exams that are still in the process of being created. Students have not been given a baseline assessment. How can the VAM model possibly predict growth for a test that a student has never taken before?
Perhaps you can direct me to the appropriate statistician at the Florida Department of Education who can better explain to me how an algorithm is capable of predicting student learning gains on a test that doesn’t even exist yet?”
One week later, I received an email from someone (or something) called “ARM”:
“The VAM model calculates an expected score for an individual student based what similar students scored on the test during the same year. It does this using a series of covariates that have been shown to be related to student learning growth. Prior test score is the most significant of these predictors. The expected score is based on the actual performance of students on the assessment during the year, and these expectations are set individually using information from the covariates contained in the model. It is not dependent on alignment of scales between the prior assessment used as a covariate (FCAT) and the current test score expectation being calculated (FSA). The fact that the FSA is new does not impact the way the model functions or its ability to perform the expected score calculation. To take a simple example, height and age can be used to predict weight using a simple linear regression. An expected score for weight is computed and compared to actual weight to determine the fixed effects beta coefficients based on how it is measured. If the observed weight data is measured in pounds, the predicted values calculated using these fixed effects will also be in pounds. If they are measured in kilograms, the predicted values will be in kilograms. In the case of VAM, student scores on the FSA will be predicted based, in part, on prior performance on the related FCAT 2.0 assessment and how students who performed similarly in the prior year on the FCAT 2.0 scored in 14-15 on the FSA.”
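For readers who want to see the email's height, age and weight example in action, here is a minimal sketch in Python. The numbers are made up purely for illustration, the function name `fit_and_predict` is my own, and this is plain least-squares regression, not the FLDOE's actual VAM model. It only demonstrates the narrow point the email makes: the predictions come out in whatever units the outcome was measured in.

```python
import numpy as np

# Made-up illustrative data (assumption): six children, not real measurements.
height_cm = np.array([120.0, 130.0, 140.0, 150.0, 160.0, 170.0])
age_yr = np.array([7.0, 9.0, 10.0, 12.0, 14.0, 16.0])
weight_kg = np.array([23.0, 28.0, 33.0, 41.0, 50.0, 60.0])

KG_TO_LB = 2.20462  # conversion factor, kilograms to pounds


def fit_and_predict(predictors, outcome):
    """Fit outcome ~ intercept + predictors by least squares.

    Returns the fitted coefficients and the predicted outcome values.
    """
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta, X @ beta


# Same predictors, same children; only the units of the outcome differ.
beta_kg, pred_kg = fit_and_predict([height_cm, age_yr], weight_kg)
beta_lb, pred_lb = fit_and_predict([height_cm, age_yr], weight_kg * KG_TO_LB)

# The pound predictions are just the kilogram predictions converted to pounds.
units_follow_outcome = np.allclose(pred_lb, pred_kg * KG_TO_LB)
print("Predictions follow the outcome's units:", units_follow_outcome)
```

Note what this does and does not show. The arithmetic works no matter what scale the outcome is on. Whether the predictors are a sensible basis for judging anyone is a separate question that the regression itself cannot answer.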
So…translated into my social studies teacher vernacular with only two statistics courses under my belt, I take the statement “Prior test score is the most significant of these predictors” to mean that a student who scores low on one standardized test is predicted to reliably score low on another standardized test. If this is the case, then are standardized test scores measuring what students know or how well they will do on any standardized test?
“The fact that the FSA is new does not impact the way the model functions or its ability to perform the expected score calculation.”
This is perhaps the scariest statement of them all. Of course any algorithm can function if you plug in a numerical value for a given variable. And that is exactly what the FLDOE will be doing this year when it calculates teacher VAMs. They are just going to plug in any damned test score. This further proves my first point. They are so confident that past test performance is the best indicator of future test performance that they can plug in any standardized test score to create a VAM ranking for a completely different test.
“In the case of VAM, student scores on the FSA will be predicted based, in part, on prior performance on the related FCAT 2.0 assessment and how students who performed similarly in the prior year on the FCAT 2.0 scored in 14-15 on the FSA.” There you have it, Florida teachers. The state does indeed plan to use FCAT 2.0 test scores to predict student growth on the supposedly much more rigorous FSA exam.
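Here is a rough sketch, in Python, of what "predicting this year's score from last year's score" can look like. Everything in it is an assumption for illustration: the scores are invented, the two teachers "A" and "B" are hypothetical, and the single-predictor regression is far simpler than whatever covariate model the state actually runs. It fits an expected current-year score from the prior-year score, subtracts expected from actual for each student, and averages those differences by teacher.

```python
import numpy as np

# Invented scores (assumption): prior-year test and current-year test
# are on different scales, as with two different exams.
prior = np.array([210.0, 225.0, 240.0, 255.0, 215.0, 230.0, 245.0, 260.0])
current = np.array([290.0, 300.0, 322.0, 331.0, 285.0, 306.0, 315.0, 340.0])
teacher = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])  # hypothetical

# Expected current score = intercept + slope * prior score,
# estimated from how all students scored this year.
X = np.column_stack([np.ones(len(prior)), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
expected = X @ beta

# Each student's "growth" is actual minus expected.
residual = current - expected

# A teacher's score in this toy version is the average residual
# of that teacher's students.
teacher_avg = {
    str(t): float(residual[teacher == t].mean()) for t in np.unique(teacher)
}
print(teacher_avg)
```

One thing worth noticing in this toy version: because the expectation is built from the same year's results, the residuals across all students sum to zero. One teacher can only come out above expectation if another comes out below it.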
Not one to ever feel accountable for its role in enforcing teacher accountability models, the FLDOE has passed the burden of creating end-of-course exams for every course in Florida on to individual school districts. Districts can’t even give teachers a straight answer as to which test data will be used for 50% of their evaluations this year. I contacted the head of statistics and research for my county to ask whether they would be using a VAM model or proficiency rates for the district-created EOCs. Here was her response:
“We will be using many models. Some will only have proficiency ( like AP courses) and some will have growth. The details have not all been worked out.”
Umm….let’s tackle the first absurdity. Judging AP teachers on proficiency rates is a terrible idea. Some teachers in my district have the ability to pick and choose who gets to take their course. Other AP teachers have schools that treat their classes like dumping grounds for students who can’t fit in the core classes that are covered by the class size amendment. My school has the AVID program, which places nontraditional AP students in the AP program. The AVID students rarely pass the AP exams. That doesn’t mean they don’t benefit from taking the course. But it does mean that any AP teacher at a school with the AVID program will have significantly lower proficiency rates than an AP teacher at a school without it.
So…the district does plan on calculating VAM scores for the other district-created EOCs. Only no baseline assessments have been administered as of the end of September. What data are they going to use to predict growth? See the FLDOE email. They will just stick whatever test score they have into the algorithm, because the model will function even if they use a math test score to predict learning gains on a world history EOC.
Finally, I asked which test data will be used for a teacher who teaches two sections of regular world history, two sections of AP world history and two sections of US history EOC courses. That teacher will potentially have four different sets of data to choose from (one VAM for FSA reading, one VAM for the World History EOC, one VAM for the US History EOC and a separate ranking for their AP pass rate). I would be willing to bet money that the same teacher would have vastly different test score rankings in each subject. An AP teacher with gifted students might have a decent pass rate on AP exams while simultaneously having a low VAM for the FSA reading exam. So which test scores does the county use in the teacher’s evaluation? Needless to say, I did not get an answer to that question. I’m sure they wish they could respond to my emails the same way one of my little darlings reacted when I reprimanded him for talking off topic while doing group work: “Why don’t you just mind your own business!”