Tuesday, February 18, 2014

Academia, theories and the real world

Yesterday was the first day of my workshop on culture, evolution and collective rituals, organized by the Kurt Lewin Institute and hosted by Groningen University. I gave a talk on cultural change and how evolutionary processes can help us understand how, when and why cultures might be changing. It was more of a big-picture kind of talk, with an overview of what I would consider exciting theories in the social and biological sciences about cultural dynamics. In the afternoon, Nina Hansen and I gave the students a task: working in small groups, they had to imagine that they had been asked to provide advice to UN Women about addressing gender inequality in developing countries. Their task was to use the theories from their own research to come up with interventions.
The interventions were really fascinating, typically addressing issues of arranged marriages and/or education of women. I loved how the students focused on concrete examples and target populations and considered trade-offs between costs and benefits that keep practices that disempower women in place. This was fantastic stuff. Some of the proposed interventions were also really innovative and creative - big ups for that.
It was also great to see that people were really aware of cultural sensitivity, issues of trust and ethics. They showed amazing sensitivity to potential problems.
At the same time, what was really striking was the lack of psychological theory. These students are PhD students at leading Dutch universities. Their research is in social psychology, on issues around status and intergroup relations. I would have thought this makes their theories and research immediately applicable to this specific intergroup context of gender relations. Yet, beyond simple nods towards social influence and contact theories (without actually using these theories to develop or guide their intervention), there was a void of theory. Lots of common sense reasoning, but nothing that an educated individual could not come up with... I was struck by this.
Are our psychological theories irrelevant to addressing pressing global issues?

What could account for this observation?
One option might be that psychologists are really good at tackling the problem itself, using a problem-focused mindset. So the students turned to the problem and identified and addressed its key issues. This is great critical thinking - but then... what is the use of studying psychology if critical thinking is the key?
Even when pressed by me, the students came back to classic theories of intergroup relations. Even though they work on cutting edge stuff at world leading universities, nobody really stepped up with an example of recent psychological theorizing. What is the practical relevance of these theories?
Another option might be that our instructions led them astray. We asked them to address the problem. Yet again, invoking Kurt Lewin's old mantra: There is nothing as practical as a good theory. Good theories should lend themselves to addressing problems.
Another option is that the students were aware of all the pitfalls and problems and exceptions and boundary conditions of their theories and therefore were reluctant to apply them here. Great - but then, what is the relevance of a theory if it cannot explain key mechanisms of human behaviour? What is the usefulness of a theory if by virtue of its boundary conditions and equivocal findings it becomes inapplicable to real world contexts?
Or are psychologists just too shy or unwilling to convert their theoretical stuff into useful real-world applications? So the problem would be one of translating science and research into interventions.

I am not blaming the students; I think it reflects a deeper problem in our field. Our work is increasingly specialized and focused on minute details (which can and should be important), but it misses the link to real world applicability. A great opinion piece by Nicholas Kristof makes a similar point.

One great quote from this opinion piece:
A basic challenge is that Ph.D. programs have fostered a culture that glorifies arcane unintelligibility while disdaining impact and audience. This culture of exclusivity is then transmitted to the next generation through the publish-or-perish tenure process. Rebels are too often crushed or driven away.
Side note on that piece: The one thing that I do not agree with there is its take on quantitative work. I believe we need more (but useful) quantitative thinking and research to tackle social problems. The issue for me is translating it into language that is understandable by the general public (see Statistics as Principled Argument).

Back to my main points: I can already hear some colleagues complaining about the importance of basic research. I love basic research. I think we need a better understanding of human behaviour and all the intricacies of it. But we are not doing physics; most of what we are studying is based on real world observations. So our insights should also provide some better understanding of how things could be changed (see also my earlier post on practical psychology).

We had a great discussion yesterday. I think this is the starting point. We need to reflect on our practices. Maybe (and I actually believe this) our theories in psychology are relevant. But psychologists are too reluctant to take their own research seriously, tweak it to make it relevant and, most importantly, communicate it in a useful way to help address social issues.

Monday, February 3, 2014

Philosophy of measurement, functional equivalence, DSM V... or how did I get here?

Here is a very raw and unfinished "trying to wrap my head around some rather confusing issues" post. I have been thinking about levels of equivalence or invariance in cross-cultural measurement. I have been a wee bit unhappy with a couple of conceptual problems in the framework, but particularly the most general or abstract level of 'functional equivalence' has intrigued me for a while. Traditionally, it is more of a philosophical or theoretical statement of the similarity of functions of a psychological construct in different cultural groups. In other words, a particular behaviour serves the same functions in two or more cultural contexts.

I have been following some of the discussions on IOnet and the posts by Paul Barrett as well as the more biologically oriented personality literature. Following a few of these leads, I recently started reading some more conceptual and philosophical papers on the philosophy of measurement in psychology. More specifically, I just finished reading Joel Michell's "Quantitative science and the definition of measurement in psychology" and Michael D. Maraun's "Measurement as a Normative Practice". These papers are superbly well written (as far as you can say that about these kinds of papers) and express quite a few of my growing concerns about psychological research in very clear terms. I started off wondering about functional equivalence, but now have much bigger issues to chew on.

Michell's main logical argument is as follows (from his very concise reply to a number of commentaries, p. 401):

Premise 1. All measurement is of quantitative attributes.
Premise 2. Quantitative attributes are distinguished from non-quantitative attributes by the possession of additive structure.
Premise 3. The issue of whether or not any attribute possesses additive structure is an empirical one.
Conclusion 1. The issue of whether or not any attribute is measurable is an empirical one.
Premise 4. With respect to any empirical hypothesis, the scientific task is to test it relative to the evidence.
Premise 5. Quantitative psychologists have hypothesized that some psychological attributes are measurable.
Final thesis. The scientific task for quantitative psychologists is to test the hypothesis that their hypothesized attributes are measurable (i.e. that they possess additive structure).
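
To make "testing for additive structure" a bit more concrete, here is a minimal sketch of one check I know from the conjoint measurement literature (Luce and Tukey's double cancellation condition). This is my own illustration, not something taken from Michell's argument above, and the function name and both data tables are made up. The idea: if an attribute really is an additive function of a row factor and a column factor, then certain order relations among the cells of the table must hold. Passing the check is only a necessary condition for additivity, not proof of it.

```python
from itertools import product


def satisfies_double_cancellation(table):
    """Return True if a two-way table passes the double cancellation check.

    table[i][j] is the observed level of the attribute for row condition i
    and column condition j. For all rows a, b, c and columns x, y, z:
    if table[a][y] >= table[b][x] and table[b][z] >= table[c][y],
    then table[a][z] >= table[c][x] must also hold.
    Only the ordering of the cells is used, never their differences.
    """
    rows = range(len(table))
    cols = range(len(table[0]))
    for a, b, c in product(rows, repeat=3):
        for x, y, z in product(cols, repeat=3):
            premises = table[a][y] >= table[b][x] and table[b][z] >= table[c][y]
            if premises and not table[a][z] >= table[c][x]:
                return False
    return True


# Hypothetical data: each cell is a row effect plus a column effect,
# so the table is additive by construction.
row_effects = [0, 1, 3]
col_effects = [0, 2, 5]
additive_table = [[r + c for c in col_effects] for r in row_effects]

# Hypothetical data that violates the condition:
# 5 >= 4 and 7 >= 6 hold, but 2 >= 6 does not.
non_additive_table = [
    [1, 5, 2],
    [4, 3, 7],
    [6, 6, 9],
]

print(satisfies_double_cancellation(additive_table))      # True
print(satisfies_double_cancellation(non_additive_table))  # False
```

The point of the sketch is only that "does this attribute have additive structure?" can be turned into a check that data could fail, which is what Premise 3 asks for.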

The major task for psychology is to show empirically that the attributes we claim to measure actually have a quantitative structure. Much of his review takes to task the legacy of Fechner and especially Stevens (for those of you who ever suffered through some advanced methods classes... these names should be painfully familiar). It was an eye opener to see the larger context and the re-interpretation of stuff that I just took for granted as a student and never really questioned later on in my professional life. Fechner's legacy, which led to a so-called quantitative imperative (e.g., Spearman, Cattell, Thorndike), was challenged in the early to middle part of the last century (the so-called Ferguson Committee), but Stevens became the most successful defender of this empiricist tradition. He argued in a representational theory of measurement that measurement is the numerical representation of empirical relations. There is a 'kind of isomorphism between (1) empirical relations among objects and events and (2) the properties of...' numerical systems (Stevens, 1951, p. 1). 'From this starting point he developed his theory of the four possible types of measurement scales (nominal, ordinal, interval and ratio)' (Michell, p. 370). This is the foundation of any scale development in psychology. In a second argument beautifully laid out by Michell, it then becomes clear that these numerical representations, because of their assumed isomorphism, end up both defining the relations they represent and representing them. Given this operationism, 'any rule for assigning numerals to objects or events could be taken as providing a numerical representation of at least the equivalence relation operationally defined by the rule itself.' (Michell, p. 371).

And this loop is where we are stuck. We take a few items or questions, administer them to a bunch of people, factor analyze them to get a simple structure and voilà... we have measured depression, anxiety, dominance, identity... you name it. Or take implicit measures... you present a number of stimuli with no inherent coherent meaning to individuals and record their accuracy or reaction speed or whatever you want. Take the score and you have some measure of implicit bias, cognitive interference, etc. There is no longer any relation between the empirical reality and its numerical representation as scores. The question of whether the phenomenon of interest can be quantified has disappeared.
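
Here is a small toy example of why this matters, using made-up responses on a 1-to-5 item. It is only a sketch under my own assumptions: the two groups, the two codings and the function names are all hypothetical. If we have never established that the response categories are equally spaced (or quantitative at all), then any order-preserving coding of the categories is as defensible as the usual 1, 2, 3, 4, 5. Yet the conclusion about which group scores "higher on average" can flip depending on the coding.

```python
from statistics import mean

# Hypothetical responses of two groups on a single 1-to-5 item.
group_a = [3, 3, 3]
group_b = [2, 2, 4]

# Two codings of the same five ordered response categories.
# Both preserve the order of the categories; they differ only in spacing.
equal_steps = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 10, 5: 11}


def recode(responses, coding):
    """Map each response category to the number the coding assigns to it."""
    return [coding[r] for r in responses]


def higher_scoring_group(coding):
    """Return 'A' or 'B' depending on which group has the larger mean score."""
    mean_a = mean(recode(group_a, coding))
    mean_b = mean(recode(group_b, coding))
    return "A" if mean_a > mean_b else "B"


print(higher_scoring_group(equal_steps))    # A
print(higher_scoring_group(stretched_top))  # B
```

Nothing about the respondents changed between the two runs, only the numbers we chose to assign. Without an empirical argument for one spacing over another, the comparison of means is a statement about our coding rule, not about the people.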

How does the DSM V fit in here? Well, it could be seen as just the latest installment of the same confusion. We don't know what exactly we are measuring (see for example this article on grief as a case in point).

The issue is that we need to test whether psychological constructs can actually be quantified. As simple or complex as that. As much as I agree, I can't stop scratching my head and wondering how the heck we are going to do that. How would you be able to examine whether any psychological construct (which is basically just an idea in our beautiful minds that we try to use and build some kind of professional convention around) is actually quantifiable or not? The responses by a number of eminent psychometricians to this challenge suggested that nobody was able to come up with an example showing that this has worked in a wider context within mainstream psychology.

Enter the second paper. Approaching the problem using Wittgenstein's philosophy of measurement as normative practice (comparing it to the logical structure of language), Maraun argues that measurement needs to be rule-based or normative. You need to start with a definition that then leads to a specific set of rules or norms of how to measure this particular phenomenon just defined. The definition and the set of rules are the most basic form of expression. There is nothing simpler or more basic than this. Once these norms are established, any other person should be able to arrive at a similar result, which, even if based on a different metric, should still be convertible (e.g., from meters to feet). In psychology, in contrast, we have no rules. We have a test or an experiment that is being conducted and the results are examined against another set of empirical observations to claim that the results are valid. According to the practice of measurement in physics, empirically based arguments are not relevant for claiming that something has been measured. Measuring a number of items that factor together and then correlating the result with some other instrument similarly derived does not mean that anything meaningful has been measured. Observing some kind of empirical pattern in an experiment does not constitute measurement if it is then validated or compared to a different set of empirical observations. The issue is that the concept is not defined precisely enough to lead to a set of rules that govern its measurement.
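
The meters-to-feet example can be spelled out in a few lines. This is just a sketch to show what "convertible" buys you; the lengths are made up and the function name is mine. The conversion factor is fixed by definition (one foot is exactly 0.3048 meters), not estimated from data, and statements about ratios come out the same whichever unit you use.

```python
# One international foot is defined as exactly 0.3048 meters.
METERS_PER_FOOT = 0.3048


def meters_to_feet(meters):
    """Convert a length in meters to feet using the defined factor."""
    return meters / METERS_PER_FOOT


# Hypothetical lengths measured by one person in meters.
lengths_m = [1.5, 3.0, 6.0]
# The same lengths as a second person working in feet would report them.
lengths_ft = [meters_to_feet(m) for m in lengths_m]

# "The third object is four times as long as the first" holds in both units.
ratio_m = lengths_m[2] / lengths_m[0]
ratio_ft = lengths_ft[2] / lengths_ft[0]

print(ratio_m)
print(round(ratio_ft, 9))
```

There is no comparable rule that converts a score on one depression questionnaire into a score on another. We can only correlate the two, which is exactly the empirical move Maraun says does not establish measurement.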

There are a number of other points in that paper around validity, nomological networks, covariance structure and the like. Again, I keep scratching my head. These guys have a point... but how do we get out of it? Maraun is very pessimistic. He argues:
Simply put, measurement requires a formalization which does not seem well suited to what Wittgenstein calls the 'messy' grammars of psychological concepts, grammars that evolved in an organic fashion through the 'grafting of language onto natural ("animal") behaviour' (Baker & Hacker, 1982). One aspect of this mismatch arises from the flexibility in the grounds of instantiation of many psychological concepts, the property that Baker and Hacker (1982) call an open-circumstance relativity (see also Gergen, Hepburn, & Comer Fisher, 1986, for a similar point). Take, for example, the concept dominance. Given the appropriate background conditions, practically any 'raw' behaviour could instantiate the concept. Hence, Joe's standing with his back to Sue could, in certain instances, be correctly conceptualized as a dominant action. On the other hand, Bob's ordering of someone to get off the phone is not a dominant action if closer scrutiny reveals the motivation for his behaviour to be a medical emergency which necessitated an immediate call for an ambulance. The possibility for the broadening of background conditions to defeat the application of a psychological concept is known as the defeasibility of criteria (Baker & Hacker, 1982). Together, open-circumstance relativity and the defeasibility of criteria suggest that psychological concepts are simply not organized around finite sets of behaviours which jointly provide necessary and sufficient conditions for their instantiation (Baker & Hacker, 1982). Yet, this is precisely the kind of formalization required if a concept is to play a role in measurement. (p. 457-458).
Maybe what we are studying is just the social construction of meanings of psychological concepts as expressed in the heads of individuals? Is this a feasible reconciliation? From a researcher perspective it might be a worthwhile endeavor (think of discourse analysts embracing factor analysis... the thought is actually quite amusing). However, this approach leaves our search for a) latent variables and b) measurement invariance completely meaningless.

The reading continues. Some random thoughts at 1am while I am writing these notes:
a) The search for quantitative latent constructs in psychology probably should (?) or could (?) start from basic biological principles. In essence, we assume that there is something 'latent' out there if we use EFA or CFA or any of the typical covariance structure tests. If there are biological mechanisms that lead to certain psychological phenomena, we can study the biological principles and their interaction with the social environment that lead to psychological realities. Then we could get around the quantification problem. Problem... what biological principles and at what level of specificity?
b) The use of covariance analyses provides simple structures of language concerning folk concepts. This may be useful and meaningful for understanding how people in a specific context interpret items or questions. It is probably more of a sociological analysis of meaning conventions than a psychological analysis. This could be useful or interesting for research purposes, but it is not quite how we commonly understand or interpret the results when we are using these kinds of techniques.

Or am I missing something? How can this measurement paradox be tackled?