• Tuesday, May 29, 2012

The Question of Assessment, Part 2

August 29, 2011, 11:22 am

I received great feedback from my last post, much of it articulating the difficulty of coming up with a fair way of assessing all institutions of higher learning—public, private, and proprietary—and I would like to incorporate much of it into this post. My first reaction when this topic came up (as I read responses to a post in which I singled out proprietary colleges as most in need of accountability) was that we certainly should not adopt a model such as the No Child Left Behind Act, signed into law by George W. Bush in 2002. Though the Act initially received overwhelming bipartisan support, it has been the object of much controversy ever since. The mandate was grossly underfunded and thus could not be adequately executed. More substantively, it seemed to create a culture of “teaching to the test.” Teachers, knowing that their school’s funding might ultimately rest on the annual test scores of its students, would focus on teaching the content of the tests, which would in turn limit the creativity and independence of their pedagogy. And that in turn created a decade’s worth of grade-school and high-school students more knowledgeable about test-taking strategies than the content that those tests ostensibly reflected.

No one seems eager to replicate that model on the collegiate level, although even before NCLB, there was apparently just such an undertaking. Commenter “emwhite” had this to say: “I think the last time this was broached was about 15 years ago in relation to the Goals 2000 initiative, sponsored by the US department of education, the first president Bush, and Bill Clinton. Several conferences were held and commissioned papers were published in the Journal of General Education. Nothing resulted, though there was much talk of a national test of college graduates in the six areas of the proposal. Some reasons: those involved could not agree on just who would be defined as a college graduate, no money was to be provided for what would be a massive test development and administration enterprise, the great diversity of American post-secondary education militated against any single testing device, and—most obvious of all—nobody could come up with a way to induce college graduates to sit still for a test.”

He then wonders if others more centrally involved with that rather bizarre enterprise could provide more information: “Actually, it would provide good material for a CHE story.” Indeed, he or she is right: The story of how to hold America’s colleges accountable for the achievements of their students is a bizarre enterprise with a fairly long and complex history—perhaps better suited to an article than to my reflections here.

“emwhite’s” recollections of the Goals 2000 initiative raise questions that are completely unresolved today. How do we define a college graduate? Or do we devise different tests for community college graduates than for graduates of four-year baccalaureate programs? Even if we did so, that would not do justice to the vast variety of U.S. colleges and universities.

“did18” elaborates: “there is no ‘one size fits all’ approach to assessment. In fact, it would be . . . impossible to use the same single measure for Pennsylvania State University, Pennsylvania College of Technology, the University of Pennsylvania, and Central Pennsylvania College. Each institution purports to promote student learning in different areas and in different ways.” The same claim could legitimately be made about every state university system in the country, and that doesn’t even approach the question of how to compare graduates of Bob Jones University to graduates of Harvard: private universities have widely disparate missions too.

And then, what would the tests cover (and I think it’s obvious in light of the above comments that there would have to be several), since students begin to specialize from the moment they enter college, if not earlier? As a proto-English major in high school, I had lost the quadratic formula by 15. Would I have had to relearn it? Would every animal husbandry major at Ohio State be forced to read Robinson Crusoe? (I actually taught such a student once. The thesis of her paper about Defoe’s masterpiece was: Robinson Crusoe mistreats his goat. It’s actually true, but hardly literary criticism. How would one score such a response?)

Finally, while many Americans would like to know whether our colleges are being responsible in educating their students, they would balk at the price tag of a nationwide accountability rubric. The first time an assessment tool like that was implemented on a grand scale was when the U.S. entered World War I: 1.7 million inductees were given I.Q. tests (the War Department picked up the bill). The results were initially inconclusive, in the sense that the Army didn’t really make use of them, and in the postwar era the tests were put to nefarious uses by eugenicists to prove that some races and ethnic groups were more intelligent than others (yet another problem with accountability testing).

In an article published in the Chronicle on February 23, 2011, Theodore C. Wagenaar, professor of sociology and a faculty associate in the Center for the Enhancement of Learning, Teaching, and University Assessment at Miami University of Ohio, strongly supported assessment, concluding: “Let’s not do assessment just because it is mandated. Let’s not do it to make accreditation agencies happy or because everyone else is doing it. Let’s do it to improve learning.” I’m all in favor of his sentiment. I just can’t figure out the how.

  • betterschool

    Sophomoric. I would invite the author to seek the counsel of competent measurement scientists instead of blowing phlogiston into straw men.

  • jeff_winger

    betterschool, your ad hominem reply, with its assurance that there is such a thing as “competence” and that science is the ultimate authentication of truth, misses the point in favor of the ideology that measurement science is more than it is.
    There is NO way to definitively measure a human being’s [insert what you will] capacity. Such things are the definition of intangible. Even physical prowess, which we would like to think is measurable, is not always so. There is something intangible in Lance Armstrong, Roger Federer, LeBron James, etc. No metric will ever quantify what makes them so different.
    Further, accurately and completely measuring the addition, the increase, the education of a human’s capacity is also simply impossible. Sure, we can create methods to ensure that something good has happened, that a change in human capacity has been created, but that is all we can do.
    You measurement folks would reduce the infinite complexity of humanity, of the social interaction that is education, into a homogeneous measurable event that is generalizable across a mass of humanity whose members are indistinct from each other.
    Such measurement would make people numbers in a ledger, data points, and not human. This is why such measurements fail utterly.
    This is why, in an era of standardized testing, high schools are graduating the worst-educated generation of Americans since there was mandatory public education in this country.
    Because there is no standard.
    There is only infinite, or nearly infinite, variation, and it is in these zones, invisible to the rubric, that education happens, that learning occurs.

    This measurement kick would almost be funny if so many people didn’t believe it so much.
    This is the madness that wants to believe that out there in the world there is a “mass” of humanity, a great blob that is essentially one. But we know that it isn’t so. There is a multitude of humanity, a great grouping of unique individuals often similar, of a type, but NEVER the same.

    Claiming to measure them, measure the differences between them, measure their capacity and its increase or decrease is sheer madness. It is the height of scientific hubris to say that it is possible when it is so clearly not.
    We all know this is true. Scientific efforts to measure such things are like two hands thrust through the doorway of a darkened room: sure, we can detect some things, but the great bulk of the room is beyond our capacity to know.

    The other thing about measurement of education is the suggestion that the educated have gained something, become better. What if they haven’t? What if “education” is really just about changing a person’s knowledge in a lateral way, rather than vertical or hierarchical way? Maybe this change makes them more or less valuable economically. Maybe this change makes them less knowledgeable about whatever things they would have known had they not gotten an education, and maybe that education just supplanted the knowledge they would’ve had with another path in life.

    Maybe, for adults, there is no such thing as learning — that is, the gaining of knowledge — only the controlling of which knowledge and skills get added.

    Perhaps a college student would’ve become a welder had they not gone off to college. As a welder they would’ve learned x volumes of things. Going to college didn’t change the rough volume (however unmeasurable) of their knowledge, only the content, and it doesn’t make them more or less “brilliant” than they would’ve otherwise been.

    Maybe I am wrong, but it seems to me that this last possibility is not considered with enough seriousness.

    What if there is no teaching, only guiding? It looks like teaching, but what it really is is just controlling the knowledge a student learns by blocking some information and skills and replacing them with others. That the chosen knowledge is privileged by those in power has given it the aura of importance, but is it really more important than any other knowledge?

  • betterschool

    In addition to the epistemologically false dichotomies you invoke with respect to “capacity” (your term), your comments suggest a lack of awareness of the learning sciences (do you know how many there are?) as well as evaluation science. Humans do not learn in the fashion that your logical reconstruction and dead spatial metaphors suggest. On the main point, I suggest that you begin with the work of Michael Scriven, author of more than 400 books and refereed articles, chaired or served on more than 40 editorial boards, creator of the logic of modern evaluation science, president of AERA, president of AEA, served on more national evaluation science panels than I can recall, mentor to many of the best and brightest behavioral and education measurement scientists working today. See: http://michaelscriven.info/images/MS.CV.BIG.2008..4web.pdf. This link will take you to a short version of Scriven’s CV (40 pages) which, itself, will demonstrate longstanding work on the part of many scholars in areas of which you appear to be unaware. From there, you can follow your own path to becoming familiar with the discipline that you seem to think so ill-formed that it has yet to consider your simplistic challenges.

    In addition to being embarrassing to our profession, I find your comments intellectually treasonous. True academics would not contemplate stepping outside their field of expertise (say, biology) to offer professional judgment in another discipline (say, physics). Yet, you (presumably an academic) think nothing of doing exactly that when it comes to teaching and learning and the sciences behind them. Somehow in your mind, the fact that you teach and that your students learn qualifies you as an expert.

    Pointing out that you and the author lack the expertise to comment on this topic is not an ad hominem. I’m speaking to your lack of knowledge of facts and theory, not to your person. Other readers who are not learning and measurement scientists have the right to know that you are unqualified.

  • http://twitter.com/MakeCollegePay MakingCollegePay

    “That the chosen knowledge is privileged by those in power has given it the aura of importance, but is it really more important than any other knowledge?”   

    Philosophically, you could argue that all knowledge has value; however, in the marketplace, some knowledge is rewarded and other knowledge is not.  Given the dual role of universities in the current economic system, this often seems to be a point of tension.

    Treating universities as conduits for economic development and workforce preparation (as many public universities are expected to be) increases the pressure to focus on knowledge that is valuable in the economic market.  Accountability and assessment are derived directly from this perspective. Otherwise, why would it matter?

    As a professor, I am seriously engaged in assessment associated with my School. These activities are associated with our accreditation.  One thing that I have noticed in our development and implementation of these efforts is that focus on “content-learned” is easily measurable, but not very satisfying as an indicator of quality.  Focusing on “skills” or “mastery” or “competence” is more difficult to assess across large quantities of students, but is closer to what is desired.  More vague elements such as the quality of “critical thinking” and “communication” are even more difficult and desirable.  I agree with Mr. Donoghue and with Mr. Winger that assessment is frustrating, difficult, and potentially not meaningful.

    However, I have learned, through my adventures in assessment, that one element that needs more attention is FACULTY best practices.  Personally, as someone who really loves to teach and values my “craft” in the classroom, I am constantly doing personal and classroom assessment…both formally and informally.  This is not something that always gets reported, but it is something that is constantly done.  However, I see many of my colleagues who do not represent anything close to “best practices”.  Faculty who lecture by reciting the book via PowerPoint (or don’t even bother to create their own classroom PowerPoints). Faculty who are using the same old, yellow transparencies.  Faculty who do not know how to use everyday software to enhance their teaching or do their grading.  Faculty who don’t answer email and who never utilize Blackboard. Faculty who rely too heavily on secretaries and TAs.  Faculty who focus more on what they want to teach than what students need to learn….and then complain about the poor quality of students.  Faculty who use technology to avoid engagement and even grading.  I could go on and on…

    Perhaps the first step in assessment and accountability is not student learning but assessing and expecting compliance with best practice (and 21st century performance) in teaching.  Academic freedom is no excuse for poor teaching craftsmanship.

  • betterschool

    “Focusing on ‘skills’ or ‘mastery’ or ‘competence’ is more difficult to assess across large quantities of students, but is closer to what is desired.  More vague elements such as the quality of ‘critical thinking’ and ‘communication’ are even more difficult and desirable.”

    Correct. If you are not already doing so, you might engage the creative minds in each department to define capstone products that demonstrate integrative mastery (i.e., knowledge of relevant facts and generalizations and the ability to synthesize them into a professionally meaningful whole that possesses both convergent and discriminant validity). A very simple example can be offered in the field of accounting. Instead of focusing on the myriad of individual elements of accounting (DR, CR, etc.), the faculty creates a full-blown financial analysis that cannot be performed without mastery of the many individual facts, etc. Thus the functional challenge is at a high level that is authentic with respect to the profession but the product contains mastery of individual elements as well. The rubric can score at all levels (facts, models, laws, principles, analysis, synthesis, etc.).

    Unfortunately the research on critical thinking does not, as you observed, lead to as much valid guidance. Most tests lack true validity and it turns out that the construct itself is family resemblance and also polymorphic. This is too complicated to explain here but consider that “critical thinking” for a nurse is quite different (structure and decision criteria) than critical thinking for an internal medicine specialist. Either profession can get sued for adopting the other’s mode of critical thinking, even or especially for best case critical thinking. The best undifferentiated CT assessment to date is a critical writing assignment in which a well defined rubric is combined with a panel review process to ensure consensual validation. Unfortunately, doing it right has a comparatively high per-unit cost. However, the cost can be worth it in terms of diagnostic and predictive value. A more common, and illogical, decision is to pay $5 per head for any one of the many scientifically invalid assessments on the market.

    An aside: at the beginning of the 20th century, this kind of writing assessment was administered to college-level nuns. Many decades later, it turned out that performance on this assessment (in the nuns’ early 20s, I believe) was highly predictive of the development of Alzheimer’s in later life. The research is worth reading and can be found easily (mid-1990s) although I don’t have the reference on hand.

    This brings me to my last point. There is more than a little evidence that critical thinking can be enhanced by teaching tools of disciplined thought (i.e., philosophy, law courses, etc.) but that, fundamentally, CT is more like IQ in that it is not largely modifiable. This question is still open.

  • missoularedhead

    betterschool, while I appreciate the idea of a capstone, this is something in place in many institutions (although not at the community college level, for several reasons).  And yes, a great deal of what you say relates well to the 4 year schools.  But as a community college teacher, the student needs/wants are extremely disparate.  In a history class, I may have someone who wants to go on to do a BA in history, someone who is taking the class for personal enlightenment, someone else who is taking it to fulfill a requirement in general education, and someone who is taking it because they needed a class, any class, to fill their load.
    These students all have different desired outcomes.  While the first may want to master the information, the second really doesn’t care about anything except (insert area of personal interest…mostly WWII). The third student may have had horrible experiences in the past, or is someone who hates the idea of there not being any ‘right’ answer, while the last may gain interest, or may just be there to pass — barely.  Do I have to create different assessments for each student, based on what they need or want out of my class? And if so, how do I go about that without making it impossible to be an effective teacher, since it would seem I would spend all my time figuring out how to assess these assessments?  Do I make the assumption that all of my students will transfer to a 4 year school to pursue a BA in history? In something else?  And in the end, if some of the students don’t get what they ‘wanted’ out of my class, then does that reflect on me, or on them?  I can tell you what my course outcomes, as determined by committee, are, and I can tell you how I try to reach those outcomes, but those outcomes may be very different from what the students want out of the class.  Does that make those outcomes irrelevant?  
    These are the questions that have yet to be answered in any meaningful way for me, and all of them complicate the idea that there’s a single way to measure all teaching, or all learning.

  • emwhitephd

    I am pleased to see my earlier comment quoted in FD’s new article. Sometimes it seems as if commenting here is like throwing ideas down a well. It is good to be taken seriously. I am a bit disappointed that no other participants in that exercise in attempting a national assessment of college graduates joined in the discussion. “Betterschool” echoes my commissioned paper (published in the JGE) in saying that the best measure would be some kind of writing assignment; I proposed a portfolio assessment, carefully prepared and intelligently scored. What an enterprise that would have been! I am, btw, a “he,” FD, Edward M. White, now a visiting scholar at the University of Arizona, emwhite@u.arizona.edu.

  • betterschool

    Dr. White, Do you have a link? I would like to look at what you are doing or have done. I have heard that some nursing programs are doing well with writing assessments as evidence of CT. The problem is that their model wants to see pre/post gains and that is problematic not only with respect to the possibility of valid gains but explaining them if you get them. As Cronbach said a long time ago, explaining change scores is always difficult. As you can probably tell, I’m slightly more in the camp of CT being a trait cum ability that is a superset of being logical but with synthetic and divergent components. Such abilities do have a modifiable component but within limits generally determined by the practiced application of learned tools of thought (e.g., Coomb’s conceptual analysis can teach individuals how to discern common and excluded elements in model and contrary cases, moving to borderline cases; this makes them more “critical thinkers” even though their fundamental cognitive abilities have not changed). 

  • nlasla

    In terms of defining a college graduate, are you familiar with the Lumina Foundation’s Degree Qualifications Profile?

  • betterschool

    missoularedhead, I considered using another term rather than ‘capstone’ but left it alone for its communications value. The real focus of my response was on authenticity and hierarchical integration in activities and assessments. Authenticity is the easiest: don’t have them doing things they would not be doing in the workplace, role, profession, etc. Of course this immediately rules out writing 40-page papers (they would write a great deal but to defined professional functions) and multiple choice tests (no employer gives his employees such tests as a routine part of the job). Comprehensive integration requires more work. Even with the great diversity of inputs in the community and career colleges that trend toward the lower ends of key attributes, integration is still possible. As educators, we have a longstanding habit of measuring knowledge of individual elements of proficiency rather than proficiency itself. The reasons for this are beyond scope except to note that the culture is deep. Picking up on my earlier example, when I have worked with accountants, they keep asking how they can assess students without asking multiple choice and calculation questions about debits and credits, even though they have already acknowledged that the comprehensive integrated analysis requires that specific knowledge and that, if the knowledge is faulty, they can determine where it is faulty by analyzing the nature of the errors. Habit! Once faculty get the hang of comprehensive integrated activities and assessment, they seldom want to return to the 1906 model.