Thursday, October 11, 2012

How to run a Conditional ANOVA

Today is a wee bit heavier on the stats side again. If you are interested in Differential Item Functioning and how to do it with an easy to use tool, this is for you...

Aim: Identify differential item functioning in numerical scores across groups in order to decide whether the items are unbiased and can be used for cross-cultural comparisons.

General approach: Van de Vijver and Leung (1997) describe a conditional technique which can be used if you use Likert-type scales. It uses traditional ANOVA techniques. The independent variables are (1) the groups to be compared and (2) score levels on the total score (across all items) as an indicator of the true observed or ‘latent’ trait (please note that technically it is not a latent variable). The dependent variable is the score for each individual item. Since we are using the total score (divided into score levels) as an IV, the analysis is called ‘conditional’.

Advantages of Conditional ANOVA: It can be easily run in standard programmes such as SPSS. It is simple. It highlights some key issues and principles of differential item functioning. One particular advantage is that working through these procedures, you can easily find out whether score distributions are similar or different (e.g., is an item bias analysis warranted and even possible?).

Disadvantages of Conditional ANOVA: There are many arbitrary choices in splitting variables and score groups (see below) that can make big differences. It is not very elegant. Better approaches that circumvent some of these problems and that can be implemented in SPSS and other standard programmes include Logistic Regression. Check out Bruno Zumbo’s website and manual. I will also try and put up some notes on this soon.

What do we look for? There are three effects that we look for.
First, a significant main effect of score level would indicate that individuals with a low overall score also show a lower score on the respective item. This would be expected and therefore is generally not of theoretical interest (think of it as equivalent to a significant factor loading of the item on the ‘latent’ factor).
Second, a significant main effect of country or sample would indicate that scores on this item for at least one group are significantly higher or lower, independent of the true variable score. This indicates ‘uniform DIF’. (Note: this type of item bias can NOT be detected in Exploratory Factor Analysis with Procrustean Rotation).
Third, a significant interaction between country and score level on the item mean indicates that the item discriminates differently across groups. This indicates ‘non-uniform DIF’. The item is differently related to the true ‘latent’ variable across groups. For example, think of an item of extroversion. In one group (let’s say New Yorkers), ‘being the centre of attention at a cocktail party’ is a good indicator of extroversion, whereas for a group of Muslim youth from Mogadishu in Somalia it is not a relevant item of extroversion (since they are not allowed to drink alcohol and probably have never been at a cocktail party, for obvious reasons).
Note: Such biases MAY be detected through Procrustean Rotation, if examining differentially loading items.

Important: What is our criterion for deciding whether an item shows DIF or not?

Statistical Procedure:

The procedure requires in most cases at least four steps.
Step 1: Calculate the sum score of your variable. For example, if you have an extraversion scale with ten items measured on a scale from 1 to 5, you should create the total sum. This can obviously vary between 10 and 50 for any individual. Use the syntax used in class.

For example:

Compute extroversion=sum(extraversion1, extraversion2,…,extraversion10).

Step 2: You need to create score levels. You would like to group equal numbers of individuals into groups according to their overall extroversion score.
Van de Vijver and Leung (1997) recommend having at least 50 individuals per score group and sample. For example, if you have 100 individuals in each sample, you can form at most 2 score levels. If you have 5,000 individuals in each of your cultural samples, you could theoretically form up to 100 score levels (well actually not, because you would have only 40 meaningful groups in this example since the difference between maximum and minimum possible score is 40). Therefore, it is up to you how many score levels you create. Having more levels will obviously allow more fine-grained analyses (you can make finer distinctions between extroversion levels in both groups) and is probably more powerful (you are more likely to detect DIF). However, because you have fewer people in each score level, the results might also be less stable. Hence, there is a clear trade-off, but don’t despair. If an item is strongly biased, it should show up in your analysis regardless of whether you have fewer or more score levels. If the bias is less severe, results might change across different options.
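To make the arithmetic above concrete, here is a minimal sketch in Python (not SPSS). The function name and arguments are my own invention for illustration. It simply takes the smaller of two limits: the number of groups of 50 that fit into one sample, and the range of the sum score as used in the example above.

```python
def max_score_levels(n_per_sample, min_score, max_score, min_per_level=50):
    """Upper bound on the number of score levels for one sample.

    Hypothetical helper: applies the 'at least 50 per score group and sample'
    guideline and caps the result at the range of the sum score.
    A result of 0 means the sample is too small for the guideline.
    """
    by_sample_size = n_per_sample // min_per_level  # groups of 50 that fit
    by_score_range = max_score - min_score          # cap used in the text
    return min(by_sample_size, by_score_range)


print(max_score_levels(100, 10, 50))   # 2
print(max_score_levels(5000, 10, 50))  # 40
```

This is only an upper bound. Whether you actually get 50 people in every level of every sample depends on how the scores are distributed.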

One issue is that if you have fewer than 50 people in each score group and cultural sample, the results might become quite unstable and you may find interactions that are hard to interpret. In any case, it is important to consider both statistical significance and effect sizes when interpreting item bias.

A simple way of getting the desired number of equal groups is to use the rank cases option. You find this under ‘Transform’ -> ‘Rank cases’. Transfer your sum score into the variables box. Click on ‘Rank types’. First, unclick ‘Rank’ (it will rank your sample, but this is something that you do not need). Second, click on ‘Ntiles’ and specify the number of groups you want to create. For example, if you have 200 individuals, you could create 4 groups. If you have larger samples, the discussion from above applies (you have to decide about the number of levels, facing the aforementioned trade-off between power and stability).
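If you want to see what such a grouping does, here is a rough sketch in Python. It is my own simplified version and not SPSS's algorithm, so results may differ slightly from the ‘Ntiles’ output, especially around ties. Tied scores get their mean rank, so people with the same sum score always end up in the same score level.

```python
import math


def ntile_groups(scores, n_groups):
    """Assign each score to one of n_groups score levels (1 = lowest).

    Simplified illustration: rank the scores (ties share their mean rank),
    then cut the ranks into n_groups roughly equal-sized bands.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # extend the run while the next sorted score is tied with this one
        while end + 1 < n and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # ranks are 1-based
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return [min(n_groups, max(1, math.ceil(r * n_groups / n))) for r in ranks]


sum_scores = [12, 45, 30, 22, 38, 17, 50, 27]
print(ntile_groups(sum_scores, 4))  # [1, 4, 3, 2, 3, 1, 4, 2]
```

After grouping, it is worth counting how many people fall into each score level within each sample, because heavily tied scores can make the groups unequal.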

As discussed above, it is strongly advisable to interpret effect sizes (how big is the effect) in addition to statistical significance levels. This is particularly important if you have large sample sizes in which often minute differences can become significant. SPSS gives you partial eta-squared values routinely (if you click on ‘effect sizes’ under the ‘options’). Cohen (1988) differentiated between small (0.01), medium (0.06), and large effect size (0.14) for eta-squared. Please note that SPSS gives you partial eta-squared values (the variance due to the effect as a proportion of that effect plus error variance, that is, after setting aside the variance explained by the other effects), whereas traditional eta-squared expresses the effect as a proportion of the total variance. Partial eta-squared values are therefore often larger than the traditional eta-squared values (overestimating the effect), but at the same time there is much to be recommended for using partial instead of traditional eta-squared values (see Pierce, Block & Aguinis, 2004, in Educational and Psychological Measurement).
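The difference between the two effect sizes is easiest to see with numbers. The sketch below is in Python and the sums of squares are made up for illustration. The ‘negligible’ label for values below 0.01 is my own addition, and applying Cohen's eta-squared benchmarks to partial values is a convenience rather than something Cohen prescribed.

```python
def eta_squared(ss_effect, ss_total):
    """Effect variance as a proportion of the total variance."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Effect variance as a proportion of effect plus error variance."""
    return ss_effect / (ss_effect + ss_error)


def cohen_label(value):
    """Rough verbal label using the 0.01 / 0.06 / 0.14 benchmarks."""
    if value >= 0.14:
        return "large"
    if value >= 0.06:
        return "medium"
    if value >= 0.01:
        return "small"
    return "negligible"  # label assumed here, not one of Cohen's


# Made-up sums of squares for one item
ss_country, ss_level, ss_interaction, ss_error = 5.0, 40.0, 5.0, 50.0
ss_total = ss_country + ss_level + ss_interaction + ss_error

eta = eta_squared(ss_country, ss_total)              # 5 / 100
partial = partial_eta_squared(ss_country, ss_error)  # 5 / 55
print(round(eta, 3), cohen_label(eta))
print(round(partial, 3), cohen_label(partial))
```

In this example the same country effect would be called small on traditional eta-squared and medium on partial eta-squared, which is why it matters to say which one you report.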

Step 3: Run your ANOVA for each item separately. The IVs are country/sample and score level (the variable created using ranking procedures). Transfer your IVs into the ‘Fixed Factor’ boxes. As described above, the important things to look out for are a significant main effect of country/sample (indicating uniform DIF) and/or a significant interaction between country/sample x score level (indicating non-uniform DIF). You can use plots produced by SPSS to identify the nature and direction of the bias (under plots, transfer your score level to the ‘horizontal axis’ and the country/sample to ‘separate lines’, click ‘add’ and then ‘continue’). Van de Vijver and Leung (box 4.3) describe a different way of plotting the results. However, the results are the same; only the way of visualising the main effect and/or interaction differs.
This little figure for example shows evidence of both uniform and nonuniform bias. The item is overall easier for the East German sample and it does not discriminate equally well across all score levels. Among higher score levels, it does not differentiate well for the UK sample. 
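For those who like to see what the ANOVA in step 3 is doing under the hood, here is a bare-bones sketch in Python. It is not what SPSS runs: it only handles balanced designs (the same number of people in every sample x score level cell), it gives F ratios and partial eta-squared but no p-values, and the data layout and names are my own assumptions.

```python
from statistics import mean


def conditional_anova(cells):
    """Two-way ANOVA for one item; cells maps (group, level) -> list of scores.

    Balanced designs only, with at least 2 scores per cell and some
    within-cell variance (otherwise the error term is zero).
    """
    groups = sorted({g for g, _ in cells})
    levels = sorted({l for _, l in cells})
    n = len(next(iter(cells.values())))
    if any(len(cells.get((g, l), [])) != n for g in groups for l in levels):
        raise ValueError("this sketch only handles balanced designs")

    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    group_mean = {g: mean(cell_mean[(g, l)] for l in levels) for g in groups}
    level_mean = {l: mean(cell_mean[(g, l)] for g in groups) for l in levels}

    ss = {
        # main effect of sample: uniform DIF
        "group": n * len(levels) * sum((group_mean[g] - grand) ** 2 for g in groups),
        # main effect of score level: expected, not of interest
        "level": n * len(groups) * sum((level_mean[l] - grand) ** 2 for l in levels),
        # interaction: non-uniform DIF
        "group x level": n * sum(
            (cell_mean[(g, l)] - group_mean[g] - level_mean[l] + grand) ** 2
            for g in groups for l in levels
        ),
    }
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    df = {
        "group": len(groups) - 1,
        "level": len(levels) - 1,
        "group x level": (len(groups) - 1) * (len(levels) - 1),
    }
    ms_error = ss_error / (len(groups) * len(levels) * (n - 1))
    return {
        name: {
            "F": (ss[name] / df[name]) / ms_error,
            "partial_eta_sq": ss[name] / (ss[name] + ss_error),
        }
        for name in ss
    }


# Toy item: sample B scores one point higher at every score level
toy = {("A", 1): [1, 2], ("A", 2): [3, 4], ("B", 1): [2, 3], ("B", 2): [4, 5]}
for effect, stats in conditional_anova(toy).items():
    print(effect, stats)
```

In the toy data the lines for the two samples are parallel, so the interaction is zero and only the main effect of sample shows up, which is the pattern of uniform DIF. Real data are rarely balanced, so for actual analyses stick with SPSS or another package that handles unequal cell sizes.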

Step 4: Ideally, you would not like to have DIF. However, it is likely that you will encounter some biased items. I would run all analyses first and identify the most biased items. If all items are biased, you are in trouble (well, unless you are a cultural psychologist, in which case you rejoice and party). In this case, there is probably little you can do at this point except trying to understand the mechanisms underlying the processes (how do people understand these questions, what does this say about the culture of both groups, etc.).
If you have only a few biased items, remove them (you can either remove the item with the strongest partial eta-squared or all of the DIF items in a single swoop – I would recommend the former procedure though) and recompute the sum score (step 1). Go through steps 2 and 3 again to see whether your scale is working better now. You may need to repeat this analysis several times, since different items may show up as biased at each iteration of your analysis.
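The one-item-at-a-time procedure can be written down as a small loop. This is a sketch in Python under my own assumptions: `dif_effect` is a hypothetical function you would have to supply yourself (it should redo steps 1 to 3 for the remaining items and return the DIF effect size for one item), and the cut-off of 0.06 is just an example, since the criterion for calling an item biased is a judgement call.

```python
def purify_scale(items, dif_effect, threshold=0.06):
    """Remove the most biased item, re-run the analysis, and repeat.

    dif_effect(item, remaining_items) is a hypothetical caller-supplied
    function returning the DIF effect size (e.g. partial eta-squared) of
    `item` when the sum score is built from `remaining_items`.
    """
    remaining = list(items)
    removed = []
    while len(remaining) > 1:
        effects = {item: dif_effect(item, remaining) for item in remaining}
        worst = max(effects, key=effects.get)
        if effects[worst] < threshold:
            break  # nothing left above the cut-off
        remaining.remove(worst)
        removed.append(worst)
    return remaining, removed


# Stand-in effect sizes, fixed for illustration; real ones change each round
fake_effects = {"item1": 0.02, "item2": 0.20, "item3": 0.08, "item4": 0.01}
kept, dropped = purify_scale(fake_effects, lambda item, rest: fake_effects[item])
print(kept, dropped)  # ['item1', 'item4'] ['item2', 'item3']
```

With real data the effect sizes are recomputed at every round because the sum score changes, which is exactly why different items can show up as biased from one iteration to the next.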


My factor analysis showed that one factor is not working in at least one sample: In this case, there is no point in running the conditional ANOVA with that sample included. You are interested in identifying those items that are problematic in measuring the latent score. You therefore assume that the factor is working in all groups included in the analysis.

My overall latent scores do not overlap: This will lead to situations where the latent scores are so dramatically different that you cannot find score levels with at least 50 participants in each sample. In this case, your attempt to identify Differential ITEM functioning is problematic, since something else is happening. One option is to use broader score levels (fewer levels with larger groups – obviously this involves a loss of sensitivity and power to detect effects). Sometimes, even this might not be possible.
At a theoretical level, it could be that you have a situation where you have generalized uniform item bias in at least one sample (for example because one group gives acquiescent answers that are consistently higher or lower). It also might indicate method bias (for example, translation problems that make all items significantly easier in one group compared to the others) or construct bias (for example, you might have tapped into some religious or cultural practices that are more common in one group than in another – in this case your items might load on the intended factor but conceptually the factor is measuring something different across cultural groups). Of course, it can also indicate true differences. Any number of explanations (construct or method bias or substantive effects that lead to different cultural scores) could be possible.

What happens if most items are biased and only a few unbiased items remain? In this situation you run into the paradox that you cannot actually determine whether your biased items are actually unbiased or your unbiased items are biased. This type of analysis only functions properly if you have a small number of biased items, up to probably half the number of items in your latent variable. Once you move beyond this, it means that there is a problem with your construct. If you mainly find uniform bias, but no interactions, you can still compare correlations or patterns of scores (since your instrument most likely satisfies metric equivalence). If you have interactions, you do not satisfy metric equivalence and you may need to investigate the structure and function of your theoretical and/or operationalized construct (functional and structural equivalence).

Any questions? Email me ;) 

Thursday, September 6, 2012

Tales from the field: The last day on the African coast

Time is definitely a rare commodity. I touched down in Cape Town, South Africa on Friday the 13th. Although it feels like it was only yesterday, I have one more day and then I am on my way back to Europe. It has been a wild and tumultuous two months, stopping in 4 countries and covering thousands of kilometres while hammering away on my laptop in many random places. I plunged straight into the IACCP winter school business, met 37 bright and eager young minds and had a pretty full-on 4 days of workshop discussions, debates, laughter and a few hours out and about Stellenbosch with this new crew of cross-cultural researchers. The winter school experience was amazing, mind-blowing and exceptionally tiring, mainly because of the little organizational details that cropped up here and there and everywhere. But it was very cool and emotional to see the presentations of the groups at the end. Even those that had not gone as far as some other projects showed clear signs of research in progress and those intellectual struggles that good research is all about. Sometimes the process is more educational than the polished outcome. So it was a brilliant couple of days. One point though for next time: I am not sure I would have really needed the fridge – sorry, my super luxuriously priced room in the brilliantly spartan student hostel, but hey, it was good to have a place to change your clothes…

After the winter school, there was one short day to breathe, which I spent on an excursion to the amazing Cape Peninsula. The weather was stunningly beautiful (as were the prices we were charged, though we only found out later what kind of ride we had been taken for). Then we went straight on with the IACCP conference. One lesson that I still have not learned is how to say ‘NO’. As a consequence of this minor learning deficiency of mine, I was giving non-stop talks most days or was in sessions to hear friends’, colleagues’ or students’ work, and still missed out on so many other interesting looking papers. I am a conference junkie, I openly admit and plead happily guilty on this charge, I had a ball and an intellectual feast to last me a while. But my sleep deprivation was also quite severe, so at the end of it I was a little nervous wreck. I apologise to everyone who may have thought I was a strange weirdo (well… ). All in all, the conference passed way too quickly and I wished I had more time to talk to friends and colleagues that I had not seen in a long time.

From there things got a bit more complicated. The whole itinerary of the trip was abruptly changed shortly after it started. My plan was to make it from South Africa to Kenya overland through Mozambique and Tanzania. Unfortunately, a few days into this adventure my right eye decided to develop a fairly painful inflammation. Finding an ophthalmologist in semi-rural Mozambique turned into an anthropological field study on the health care system in one of the poorest countries in the world. We finally found a doctor who had some adequate equipment. The original diagnosis was scary enough for me to change my schedule and return to South Africa. When I got back there 4 days later, the local eye specialist could not find any evidence of the ulcer that was supposedly and permanently threatening my eyesight. It was still to be treated with care, but nowhere near as serious as it seemed originally. I was devastated. I do not like to lose, especially not against my own body. The medical retreat to Kenya via airplane and a necessary stopover in Dar es Salaam were psychologically painful.

Instead of exploring one of the last frontiers of Africa, I pretty immediately got my nose into the work we were supposed to be doing here in Kenya. Yet, things never quite work out as you had planned them. In addition to all the usual chaos in places like this, we also had some minor political upheavals (luckily I was always a bit far from the action) and the teacher strike right now made a big cross through our field work plans. Instead I was given some sessions on research on field staff and students and beyond that was stuck behind my laptop crunching tons of health related data that we had collected already. Once you get beyond the abstractness of numbers and start connecting the immaterial statistical facts to the images of poverty and inequality that surround you on a daily basis, it can become a pretty heart-breaking and depressing reality. The colourful cloths of women carrying huge loads on their head and little kids on their back, the exotic mud houses with their dark single inner room and picturesque little villages wane when you get closer.

Poverty is only beautiful from a distance.

I am not sure how much of this reality is really being noticed by the European and North American tourists in their flash resorts. At some stage, I could no longer stand some of the comments and conversations I overheard from fellow Western travellers when I had breakfast at some of the hotels I was staying at (yes, I am damn lucky, this time I was mainly staying in pretty amazing and flash places – thanks to sugar mammi Amina ;). This disparity between what you see outside the bubble of western consumerism in the villages and the streets of the Kenyan coast and what you perceive when being at your tourist hotel, with limited excursions into the harsh reality through well-organized and expensive safaris, was probably equally depressing.

So it was a lot of work, full-on days and long nights, but with widely ranging schedules. Often I would start working around 8am and only finish at 10 or 11pm, sometimes it could also be 2am before the computer was switched off. Internet was mainly through USB modems and access could be described as patchy at best. I was amazed though how smoothly the Skype connection worked for the Postgrad Committee meeting. There would be breaks for lunch or dinner (well, first it was Ramadan, so food and lunch breaks were fairly random and hard to come by). Sometimes we would go to a fancy hotel (thanks Jackie), more often though to one of the few local eateries. It was really funny to see how quickly you adapt. A colleague from Zambia is just visiting and he was commenting on the things that I do not notice anymore, how dirty some of these places were, with lots of flies and mosquitos and a dark, cramped atmosphere. A funny incident happened at lunch today, when a big mama sat down behind me and squeezed me into the table that I was sitting at. As this opulent woman also had a very animated eating and conversation style, my chair also bounced up and down with each wiggle of her impressive body. As I was firmly squeezed into the chair and pressed against the table, I moved in unison with my chair directed by that lady. The extraction process – this is me trying to get up and out – was equally funny. The waitress had to intervene and help the lady move so that I could get out.

A trip full of experiences, smells, colours, impressions and learnings. Some of the things I learned are pretty random, like that you should avoid crossing land borders after nightfall, even if it is friendly territory. We again had to oil the wheels of informal markets dressed in neat uniforms – well, this is actually a bit of a lie, the main recipients of unaccounted amounts of cash were well-fed plain clothes officers in nice leather jackets with big plastic ID cards hanging around their neck. Or that most of the staples on the coast (cassava, pineapple, maize) were all introduced from South America by the Portuguese. Or that the East African coast had sophisticated civilizations and extensive trade connections all the way to India and China when Europeans had only a vague idea what was behind the horizon of the Mediterranean. African ivory was a prized possession in China and was carved into the most exquisite sculptures. Chinese porcelain needed African raw materials. In turn, the traders and town folk among the Swahili were sipping their spice tea from elegant Chinese tea cups.

Globalization was here long before we knew it…

I also gained a new appreciation for washing machines and street lights. The former because it would spare me from washing my clothes by hand every second day or so. I got some good instructions from one of the cleaning ladies and demonstrated them by publicly washing my underwear, to the amusement of the department. The latter because it is pitch black without the moon as a street light substitute. Try walking on a path when you are not a cat or forgot to bring your magic battery-powered torch. A few days ago I wandered straight into a half-metre-deep hole and nose-dived into the sand. Just at that moment a guy had to pass and witness my blind struggles with the African soil. Speaking of moving on roads, I will definitely miss the traffic here and its fluid interpretation of a few basic road rules. I have the perception that people consensually agree to drive on the left (being obedient ex-colonial subjects). Speed limits are not a major worry because of the state of the roads – as a digression, I actually wonder whether speed bumps are the more regular inverse of random pot holes. But this mathematical law may only apply to the rare occurrence of paved roads. However, beyond that it is a country of infinite freedom of driving. This is refreshing after our experiences in Mozambique where we had to contribute to African development (a.k.a. police officers’ salaries) on an unpleasantly regular basis. But this state of ultimate freedom (and lack of well-maintained roads) also meant for example that an excursion along the Kenyan coast that was supposed to last about 30 minutes turned into a 4-hour odyssey, with the car stuck in sand, random smooth-talking politicians hitching rides and us ditching the car in the end because it got stuck on some huge slab of coral stone in the middle of the well-worn sand track.

I should be packing now, but instead I am procrastinating, writing down these random observations as I am trying to squeeze too much stuff into my little bag. Looking back on two eventful and busy months, I wish I had slept less, gone out more and met more people. But then hey… can somebody genetically engineer my body to not feel tired? We got lots of projects done, but still there is so much more to do.

I will miss the amazing tropical fruit, the heat, the sun and the fresh breeze off the Indian Ocean. I will miss the sleepy hectic, the tranquil everyday chaos of cars, motorbikes and people, smells, noises and shy gazes towards that strange muzungu. But I also look forward to seeing familiar friends and family. Next stop is Germany, will be nice to see my folks and friends I have not seen in ages. Will be good to have a clean table, clean clothes, home-cooked food and a clean house with no power cuts and no dust and sand covering your body in the evening.

I think I need a proper holiday… But work is not stopping, I need to finish our book with Peter, Viv and Michael and need to write that chapter for the Advances series. Also look forward to the talks and workshop in Bremen and seeing Diana and Katja and their new academic environment.

May the adventure continue….

Saturday, July 28, 2012

Pinker, evolution and a lot of confusion

Steven Pinker recently created a bit of a debate when he attacked group selection models in an essay entitled: THE FALSE ALLURE OF GROUP SELECTION
His argument is probably best summarized in his final words:
The idea of Group Selection has a superficial appeal because humans are indisputably adapted to group living and because some groups are indisputably larger, longer-lived, and more influential than others. This makes it easy to conclude that properties of human groups, or properties of the human mind, have been shaped by a process that is akin to natural selection acting on genes. Despite this allure, I have argued that the concept of Group Selection has no useful role to play in psychology or social science. It refers to too many things, most of which are not alternatives to the theory of gene-level selection but loose allusions to the importance of groups in human evolution. And when the concept is made more precise, it is torn by a dilemma. If it is meant to explain the cultural traits of successful groups, it adds nothing to conventional history and makes no precise use of the actual mechanism of natural selection. But if it is meant to explain the psychology of individuals, particularly an inclination for unconditional self-sacrifice to benefit a group of nonrelatives, it is dubious both in theory (since it is hard to see how it could evolve given the built-in advantage of protecting the self and one's kin) and in practice (since there is no evidence that humans have such a trait).
It is a thought-provoking but superficial critique of an important issue, namely, at what level selection operates. I just want to highlight three issues that I particularly grappled with, but if interested, you can find lots more here.

My first problem is the very narrow definition of variables. His definition of natural selection only allows genes as carriers of information, therefore, by definition any other source that may influence selection is already ruled out. Once he defined selection like this, there is no further point arguing. Case closed and many interesting social phenomena and their evolution remain unexplained.

The second problem is the variables that he is considering. The main focus is on altruism at the individual level. Of course, if you want to explain altruistic behaviour of individuals, natural selection is without doubt important. However, group selection works on different variables and at a different level. Group selection concerns the survival of groups over a long period. What is important is how information within a group is passed on that allows the survival of the group (and the survival of the actual traits). I am traveling in Africa right now and I would be hard pressed to survive by my genetic information alone if I had no cultural tools that help me survive. Of course, I have modern technology that helps me navigate the hostile environment, but once my car runs out of fuel I am done and my credit card is not going to help me find water or food in the desert. Kalahari bushmen or Masai have very different technologies that have helped them survive for thousands of years. Jared Diamond in Collapse provides excellent examples of how cultural technology is important for group survival and how changing environmental conditions can push groups over the edge. Peter Richerson and Robert Boyd in Not by Genes Alone provide a more scientific account of these cultural processes. What is important here is not that it is a historical analysis, but that these accounts use formal models to understand which groups with which technologies are more likely to survive.

Finally, the description of natural selection used by Pinker is simplistic and does not reflect the emerging complexities of biological processes. The area that interests me most is that of epigenetics: cultural and social variables interact with biological mechanisms. Such epigenetic processes switch genes on and off in ways that conflict with classic genetic ideas, and the whole idea of a simple translation of genetic information to phenotypes is not tenable anymore. To understand natural selection of complex social traits, looking at genes alone does not cut it.

I really enjoyed reading the essay and the many comments around it. Yet, the argument actually strengthened my conviction that we need to study group processes to better understand human evolution. The learning continues...

Sunday, May 27, 2012

How do computers change cultures?

In a study to be published soon in Social Psychology, Nina Hansen and colleagues examined whether and how giving laptops to children in developing countries changes their cultural beliefs and values. It is a fascinating and important study because it looks at how Western technology affects cultural systems in very basic but profound ways. They gave a group of Ethiopian school children laptops. One year later, they did a follow up and compared those children whose laptop was still functioning with a control group as well as a group of students whose laptop had broken down. They note that after this year the children with functioning laptops had become more independent and endorsed more individualistic values. This change was apparently mediated by a change in self-construal, that is the children had changed their values because of changes in how they saw themselves as individuals. The authors also stated that traditional cultural beliefs and norms were not changed, that is collectivistic values and interdependent self-construals were not affected by laptop use. A fascinating finding, but I believe only a first step towards understanding how technology use affects traditional cultural systems.

Here are some of my questions that appeared while reading their article.
They proposed three different mechanisms of how technology affects culture change:
1. Operation of a modern laptop requires a set of complicated actions which children would have to master completely independently of their elders (a self-efficacy explanation)
2. Social usage of technology can change social relations and self-perceptions, e.g., through changes in interaction patterns (email) and well-being. This was one mechanism that I did not quite understand, and a bit more explanation of how this may play out in an African context would have been great.
3. Even a cheap and basic laptop of the kind given to children is an immensely valuable object by local standards, increasing their local status vis-à-vis their elders and peers (a social status explanation).

For me, these mechanisms are actually quite important and could have been tested more directly and effectively. The implications of these mechanisms are certainly quite different from a pragmatic and practical perspective (think of interventions and education programmes).

One important aspect for future studies is a more comprehensive test of these mechanisms. For example, the first mechanism requires a change in cognitive structures and abilities. Therefore, a test of cognitive abilities and knowledge pre-post would have been very useful. This mechanism fits in with a number of sociological theories that argue that cultural change happens via changes in values that then lead to different cognitive strategies. Alternatively, you could argue that greater knowledge and self-efficacy in operating technology provides access to different perspectives which then allows children to develop new ideas and values. It would have been a fascinating opportunity to test some of these bigger theories and ideas in a realistic experiment like this one.

The third mechanism could have been tested directly, as their measure of independent values included both self-direction values (which capture self-directed values focused on exploring and experimenting with new ideas) and power and achievement values (values focused on dominating others, concerns with status and one's position in the social hierarchy according to social norms and expectations). Hence, it would have been very easy to test whether the status explanation is a plausible explanation or not. I think it is important to differentiate whether these changes are driven by knowledge and self-efficacy or by status processes. If the former is true, then providing schools with laptops and sharing resources is a viable mechanism for empowerment, whereas this would not be effective if it was driven by status mechanisms.

Some of their reported data also did not quite gel with the overall message. The usage pattern suggested increasing sharing of the laptop – this suggests more collectivistic behaviour rather than more individualistic behaviour in the classical sense. Hence, the more individualistic changes were accompanied by more collectivistic behaviours!

The design is also problematic in some respects. For example, the criteria for selection of students into the two groups are not discussed. Was allocation to groups random? Were the two groups matched on any other important characteristic?
The authors used single items to measure self-construals. On one hand it is difficult to use standard scales in traditional non-Western communities. On the other hand, given the unknown properties of such Western imposed items, it is hard to understand what the item actually measured, and validity and reliability cannot be assessed.
It was also curious to see that this item of self-construals only mediated effects of condition on independent values in the laptop condition. In the no laptop condition, there was no correlation between self-construal and values. This may indicate validity problems with the item. Alternatively, it could also mean that our Western findings of relatively close links between the self and values are in fact culturally mediated - that is these links are formed by the use of our cultural products.

Overall, a great field study, but it is not quite answering some of the important questions that are posed in the article.

Hansen, N., Postmes, T., van der Vinne, N., & van Thiel, W. (in press). Information and communication technology and cultural change. How ICT usage changes self-construal and values. Social Psychology.

Monday, April 16, 2012

Applied Cross-Cultural Psychology: Some ideas for a meaningful science

I just spent the last 72 hours in 3 different countries. Lots of random thoughts raced through my mind while spending time in small eateries, big airports and on roads wide and narrow. How can cross-cultural research contribute to the development and well-being of societies? What are the tools that psychologists interested in culture can use to inform politicians and political decision-making? How can we make cross-cultural research relevant to everyday actions and events, considering the massive challenges that humanity faces through globalization, climate change and increasing interdependencies at a global level?

I think there are three different paths that may address these broad questions of policy relevance and societal development. For lack of better words, I will call them culturally sensitive understanding, culturally sensitive change and culturally sensitive evaluation of change. In other words: a) an examination of processes that are of societal importance and relevance, b) development and application of culturally sensitive change programs and c) a culture-sensitive evaluation of existing intervention programs so that the needs of communities are better met. Engaging with the bigger questions and practical problems entailed in these three approaches can help sharpen our basic research questions and theories as well as contribute to understanding and managing global issues.

Culturally sensitive understanding of societal level problems

The first option is a focus on a better understanding of psychological processes related to important societal outcomes. There are many debates about how society can be made more humane, healthy and prosperous. What are the psychological processes that are associated with these outcomes? Here, the strength of cross-cultural psychology is the quasi-experimental nature of culture. Societies differ along a number of important outcomes and potential antecedents; cross-cultural psychologists can take this variability and study which variables are most likely implicated in the different outcomes across societies. An open but critical mind about potential contributing factors is important. Once certain variables have been identified as potentially important, more controlled experiments to test causality may be conducted, although not all variables can be manipulated in experimental settings (just think of the difficulty of manipulating national histories or seasonal patterns). This option is probably closest to standard psychological research. The main difference is a closer alignment between scientific research topics and questions of practical and societal relevance.

My own focus has been more along the multi-country, sociological level of inquiry. One example is the work by Seini O'Connor. Corruption and political transparency have been on the minds of politicians, philosophers and political scientists for millennia. One of the major unaddressed questions, though, is what variables might be implicated in changes of corruption levels over time. There are many theories and ideas of what makes societies more or less transparent. Seini's honours project addressed these ideas through an innovative longitudinal method and produced some pretty surprising findings (see, the actual study can be found here:

Implementing culturally sensitive change programs

Second, cross-cultural psychologists can engage in developing and running culturally sensitive interventions that address practical problems. Psychologists interested in culture have been relatively successful in developing and running intercultural training programs. At the same time, programs that focus on developing and changing behaviours of individuals and groups have largely been left to general psychologists or other disciplines (e.g., developmental workers, economists, sociologists, political scientists). Only a few programs have taken a culturally sensitive approach when trying to change behaviours (for a cool example, have a look at this project: There is much scope for innovative and important work to be done.

Evaluating interventions in culturally sensitive ways

Third, cross-cultural psychologists could get more involved in assessing existing change programs as they are applied and implemented in diverse cultures around the world. For example, micro-crediting (that is, the provision of small loans to individuals or groups) has been used in many disadvantaged communities to fight poverty and contribute to economic growth. Yet, we know relatively little about the effectiveness of these initiatives, especially about how they fit in with the larger cultural norms, beliefs and practices. One interesting study in this regard was reported in Science last year ( . Karlan and colleagues demonstrated that micro-crediting in the Philippines led to down-sizing of enterprises and higher stress among recipients, which is contrary to common expectations about the effectiveness of micro-crediting. This study was conducted by economists who have little interest in examining the cultural (or even psychological) processes. Cross-cultural psychologists could significantly contribute to such research and help in evaluating programs so that they better meet the needs of the communities.

Saturday, April 7, 2012

Tales from the field: the dawn of day 3

It is a refreshing morning, the birds are chirping in the trees, a gentle breeze is playing with the banana leaves and the village roosters are advertising the blood red sun over the sea. 

Today is the day that turned yesterday into a strange and nearly frustrating experience. The local world does not play by the rules of the minds of the Western educated, science-oriented aliens that descended upon this little island to study their strange customs. A pre-test a few days ago revealed that the main measure is likely to be contaminated – a beautiful word for saying that somebody had worked out what the main dependent measure of the field study was and is likely to have instructed people how to answer it. A major debacle for the motley crew of international researchers hoping to study a fascinating religious ritual, with high hopes of helping humanity understand why people engage in seemingly insane and dangerous things (think of getting pierced, walking 4 to 6 hours in the tropical heat to finish off the day with a nice stroll over some gentle burning fire – who in their right Western mind would want to do something like this?).

However, the one thing that should have sealed the study, the brilliantly devised and simple variable to measure how truly connected people feel to their religion and their religious fellows, may not work anymore. The frustration turned into a heated debate about behavioural economics, a field of science that most villagers will probably never encounter in their whole lives. Hours passed debating the pros and cons of games with appealing names like the dictator game or the prisoner's dilemma.

It is fascinating to see the research work and weeks of preparation descend into an abyss of confusion, personal convictions, Western bias and scientific despair. One thing that I am wondering is, we don’t understand what these economic games are measuring with well-educated Western participants, despite nearly a century of research. What will it show us in a group that has problems understanding our humble attempts to ask them ‘how do you feel right now’? It makes me wonder how some famous studies published (like the famous series of studies by Joseph Henrich and others, see managed to explain complex games that take a page to describe in their widely cited publications to nomadic hunter and gatherer groups in the African bush. The appeal of our measure was its elegant simplicity and meaningfulness in a local community context. Yet, it might have been too easy and too transparent for the smart minds of some local people.

Now it is the dawn of day 3. A new day and a gentle breeze that calms the jetlag and insomnia. The debate was settled in the end late last night over some dinner and beer: we are going to use a similarly simple design, focusing on an unknown local entity. It is a potential Mead’esque faux pas, but the best that can be done within the time constraints of the study and better than other measures. It will be an exciting study nonetheless.

The meeting last night hammering out the details, nine curious minds bent on making it work, 70 heart rate monitors to be connected to people participating in the ritual, a pre-post design with control groups and a multi-method design to study a fascinating ritual. And best of all – despite over 12 hours of tormenting debates and tiring preparations – the sun is shining, it is nice and warm and the sea is just meters away. 

And most importantly, it will be a fascinating day following newly won local friends in their religious quests. The true beauty of field work.

Monday, April 2, 2012

How to do Procrustean Factor Rotation with more than 2 groups

Today, I am continuing the torture with a bit more detail on options for comparing factor loadings across three or more groups within SPSS. This is a crucial issue for cross-cultural research and is becoming increasingly important, because researchers are starting to study more than two groups. More complex designs are more powerful in uncovering processes that can explain emerging behavioural differences, so this research should be strongly encouraged!

Aim: Compare the factor structure when you have more than two cultural groups, get an estimate of factor similarity

Why are we concerned with Procrustean Rotation? Factor rotation is arbitrary, therefore apparently dissimilar factor structures might be more similar than we think; procrustean rotation is necessary to judge structural and metric equivalence

Statistical Procedure:

The same syntax as for the two group case (see previous post: can be run with SPSS, but the greater number of countries adds additional problems. You have various options:

  1. Run all pairwise comparisons. However, this will lead to a substantial number of comparisons (especially if you have many samples). This also leads to a number of statistical problems (remember the family-wise error rate and increased Type I errors).
  2. Select one country as your target group. For example, if an instrument was developed in the US, you may want to compare each group to the US.
  3. Compute the average correlation matrix and use it for your factor analysis. The average is sometimes called the pooled-within matrix. You would then compare each sample with the average structure across all samples (this can be done via discriminant function analysis in SPSS; you can then read the resulting correlation matrix into SPSS and use it as the input for your factor analysis - see my discussion of how to do this here). This is highly appealing if you have many samples. This procedure of computing the average correlation matrix as input to the factor analysis can be simplified if (a) your samples are of similar size (so that no sample dominates the others; e.g., if you have one sample of 10,000 and three samples of 50 participants each, the large sample drives the factor structure) and (b) you mean centre each item within each sample prior to the overall factor analysis. The centring is necessary to account for any group mean differences that might obscure relationships if the samples are pooled. See below for a graphical explanation of why this might be a problem. As you can see, the relationship within each sample is negative: more sleep problems within each sample are associated with less laughter by participants. However, one group is consistently higher for both the reported sleep problems and laughing. There may be reasons why this is the case (I will come back to this example when talking about multilevel analysis), but for our analysis, combining the two samples would mean that we have a positive relationship across both samples combined (compared to negative relationships within both samples separately). This effect is due to the mean differences across both groups (I will post something soon on the beautiful complexity of these multi-level problems in psychology - very fascinating stuff).
As a consequence of this confounding of group differences with individual differences, we need to take any such mean differences into account before we can combine the samples. This can easily be done using the z-transformation option in SPSS (‘Save standardized values as variables’ under ‘Analyze’ -> ‘Descriptive Statistics’ -> ‘Descriptives’), applied separately within each sample (for example, with Split File switched on) so that the scores are standardized within samples.
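To make the mean-difference problem concrete, here is a minimal sketch in Python (not SPSS). The scores are made up purely for illustration and are not data from any study; the variable names (sleep_a, laugh_a and so on) are hypothetical. It only mean centres the scores, which is enough to show the sign flip; the z-transformation described above additionally equalizes the variances.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def centre(values):
    """Subtract the sample mean from every score (mean centring)."""
    m = mean(values)
    return [v - m for v in values]

# Made-up scores: sample B is higher than sample A on BOTH variables.
sleep_a, laugh_a = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
sleep_b, laugh_b = [6, 7, 8, 9, 10], [10, 9, 8, 7, 6]

r_a = pearson(sleep_a, laugh_a)  # negative within sample A
r_b = pearson(sleep_b, laugh_b)  # negative within sample B

# Pooling raw scores mixes group mean differences with individual differences.
r_pooled_raw = pearson(sleep_a + sleep_b, laugh_a + laugh_b)

# Centring within each sample first removes the group mean differences.
r_pooled_centred = pearson(centre(sleep_a) + centre(sleep_b),
                           centre(laugh_a) + centre(laugh_b))

print(r_a, r_b, r_pooled_raw, r_pooled_centred)
```

With these toy numbers the two within-sample correlations are negative, the pooled raw correlation turns positive, and the pooled correlation after centring is negative again.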

I believe the last option is the most appealing with large data sets.
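To get a feel for why running all pairwise comparisons (option 1) gets out of hand so quickly, here is a rough sketch in Python. It assumes independent tests at an alpha of .05, which is a simplification, because pairwise comparisons that share samples are not independent. Treat the numbers as a ballpark only; the function names are my own.

```python
from math import comb

def pairwise_comparisons(k):
    """Number of distinct pairs among k samples: k * (k - 1) / 2."""
    return comb(k, 2)

def familywise_error(m, alpha=0.05):
    """Chance of at least one Type I error across m independent tests."""
    return 1 - (1 - alpha) ** m

for k in (3, 5, 10, 20):
    m = pairwise_comparisons(k)
    print(f"{k} samples -> {m} comparisons, "
          f"family-wise error of about {familywise_error(m):.2f}")
```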

However, cross-cultural psych never stops being complicated. What happens if you find that some samples show good factor congruence with the average factor structure and others do not? Ideally, you would exclude those samples from the average factor structure and re-run the analysis. Proceed iteratively until no sample shows any problems with factor similarity anymore.
If you have lots of cultural samples, you are really curious (and stats savvy) and want to find out what is happening in the strange worlds of culture, you may want to run cluster analysis on the congruence coefficients to identify clusters of samples that show greater similarity with each other. This might provide some interesting insights from a cross-cultural perspective. However, it is computationally demanding and relies on purely statistical criteria. There is a neat paper discussing various options and strategies, written by Welkenhuysen-Gybels and van de Vijver (2001, published in the Proceedings of the Annual Meeting of the American Statistical Association – I think this gives you an idea about what level of analysis we are talking about[1]). You can also download a SAS macro (the link is in the paper) that does much of the computational work for you. I have never worked with SAS, it seems a parallel universe to me and I am fascinated, but scared of it. But there are people who think it is easy. Conceptually, it is a nice tool.  
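As a toy illustration of the clustering idea (and emphatically not the SAS macro or the procedures discussed in that paper), the sketch below groups samples by single linkage: two samples end up in the same group if they are connected, directly or through other samples, by congruence coefficients at or above a cut-off. The sample names, the congruence values and the .90 cut-off are all made up for illustration.

```python
def single_linkage_groups(names, phi, cutoff=0.90):
    """Group samples that are linked by congruence values >= cutoff."""
    groups = []
    unassigned = list(range(len(names)))
    while unassigned:
        stack = [unassigned.pop(0)]   # start a new group
        members = []
        while stack:
            i = stack.pop()
            members.append(i)
            # Pull in every unassigned sample that is similar enough to i.
            linked = [j for j in unassigned if phi[i][j] >= cutoff]
            for j in linked:
                unassigned.remove(j)
            stack.extend(linked)
        groups.append(sorted(names[i] for i in members))
    return groups

# Hypothetical pairwise congruence coefficients between four samples.
names = ["A", "B", "C", "D"]
phi = [
    [1.00, 0.95, 0.70, 0.65],
    [0.95, 1.00, 0.72, 0.68],
    [0.70, 0.72, 1.00, 0.93],
    [0.65, 0.68, 0.93, 1.00],
]

groups = single_linkage_groups(names, phi)
print(groups)
```

Single linkage is only one of many possible clustering rules; I picked it here because it fits in a few lines.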

[1] You can download the paper at:

Wednesday, March 28, 2012

How to do Procrustean Factor Rotation

Procrustean Factor Rotation
 Today, it is a little bit less light-hearted, but hopefully a bit more practical. 

Aim: To make factor structures maximally comparable & provide a statistical estimate of factor similarity

Why are we concerned with Procrustean Rotation? Factor rotation is arbitrary, therefore apparently dissimilar factor structures might be more similar than we think; procrustean rotation is necessary to judge structural and metric equivalence

Statistical Procedure:

An SPSS routine to carry out target rotation needs to be run (adapted from van de Vijver & Leung, 1997).

The following routine can be used to carry out a target rotation and evaluate the similarity between the original and the target-rotated factor loadings. One cultural group is assigned as the source and the second group is the target group. The varimax rotated (or unrotated) factor loadings for at least two factors obtained in two groups need to be inserted. The loadings need to be inserted, separated by commas, and each line is ended with a semicolon. The last line is not to end with a semicolon, but with a ‘}’. Failure to pay attention to this will result in an error message and no rotation will be carried out. To use an example, Fischer and Smith (2006) measured self-reported extra-role behaviour in British and East German samples. Extra-role behaviour is related to citizenship behaviour: voluntary and discretionary behaviour that goes beyond what is expected of employees, but helps the larger organization to survive and prosper. These items were supposed to measure a more passive component (factor 1) and a more proactive component (factor 2). The selection of the target solution is arbitrary; in this case we rotated the East German data towards the UK matrix.

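Table 1. Items and varimax-rotated loadings in each sample separately

Item | UK Factor 1 | UK Factor 2 | East Germany Factor 1 | East Germany Factor 2
I am always punctual. | .783 | -.163 | .778 | -.066
I do not take extra breaks. | .811 | .202 | .875 | .081
I follow work rules and instructions with extreme care. | .724 | .209 | .751 | .079
I never take long lunches or breaks. | .850 | .064 | .739 | .092
I search for causes for something that did not function properly. | -.031 | .592 | .195 | .574
I often motivate others to express their ideas and opinions. | -.028 | .723 | -.030 | .807
During the last year I changed something in my work.... | .388 | .434 | -.135 | .717
I encourage others to speak up at meetings. | .141 | .808 | .125 | .738
I continuously try to submit suggestions to improve my work. | .215 | .709 | .060 | .691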

This cannot be done using the windows interface within SPSS. You should run a factor analysis in each sample separately first. Use Varimax (orthogonal) rotation. Then insert the loadings in the loadings and norm matrices in the SPSS syntax described in Fischer and Fontaine (2011, in Matsumoto and Van de Vijver’s Cross-Cultural Research Methods in Psychology). I can also email this syntax to you (contact me at
The start of the syntax is printed below. Be careful to separate the loadings with a ‘,’ and to follow the last loading for each item with a ‘;’. The very last loading in each matrix should be followed by ‘}’ instead.

compute LOADINGS={
.778,    -.066;  
.875,    .081;   
.751,    .079;   
.739,    .092;   
.195,    .574;   
-.030,   .807;   
-.135,   .717;   
.125,    .738;   
.060,    .691     }.

compute NORMs={
.783,    -.163;  
.811,    .202;   
.724,    .209;   
.850,    .064;   
-.031,   .592;   
-.028,   .723;   
.388,    .434;   
.141,    .808;   
.215,    .709}.
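
If you want to check the logic outside SPSS, here is a minimal sketch in Python. It is not the SPSS routine itself, only my own illustration of an orthogonal target rotation for the two-factor case, followed by Tucker's phi per factor. The function names are hypothetical and the closed-form angle solution only works for exactly two factors. LOADINGS is the East German matrix and NORMS the UK target, copied from the syntax above.

```python
from math import atan2, cos, sin, hypot, sqrt

# Varimax loadings from the syntax above (rows = items, columns = factors).
LOADINGS = [  # East German sample, to be rotated
    (.778, -.066), (.875, .081), (.751, .079), (.739, .092), (.195, .574),
    (-.030, .807), (-.135, .717), (.125, .738), (.060, .691),
]
NORMS = [  # UK sample, used as the target
    (.783, -.163), (.811, .202), (.724, .209), (.850, .064), (-.031, .592),
    (-.028, .723), (.388, .434), (.141, .808), (.215, .709),
]

def procrustes_2d(source, target):
    """Orthogonal 2x2 matrix T minimising the squared distance
    between source * T and target (two factors only)."""
    # Cross-product matrix M = source' * target.
    m11 = sum(s[0] * t[0] for s, t in zip(source, target))
    m12 = sum(s[0] * t[1] for s, t in zip(source, target))
    m21 = sum(s[1] * t[0] for s, t in zip(source, target))
    m22 = sum(s[1] * t[1] for s, t in zip(source, target))
    # Compare the best proper rotation with the best reflection.
    fit_rotation = hypot(m11 + m22, m21 - m12)
    fit_reflection = hypot(m11 - m22, m12 + m21)
    if fit_rotation >= fit_reflection:
        a = atan2(m21 - m12, m11 + m22)
        return ((cos(a), -sin(a)), (sin(a), cos(a)))
    a = atan2(m12 + m21, m11 - m22)
    return ((cos(a), sin(a)), (sin(a), -cos(a)))

def rotate(source, t):
    """Post-multiply every row of loadings by the 2x2 matrix t."""
    return [(x * t[0][0] + y * t[1][0], x * t[0][1] + y * t[1][1])
            for x, y in source]

def tucker_phi(x, y):
    """Tucker's phi (proportionality coefficient) for two loading columns."""
    return sum(a * b for a, b in zip(x, y)) / sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))

T = procrustes_2d(LOADINGS, NORMS)
ROTATED = rotate(LOADINGS, T)
PHI = [tucker_phi([r[f] for r in ROTATED], [n[f] for n in NORMS])
       for f in (0, 1)]

for row in ROTATED:
    print(f"{row[0]:6.2f} {row[1]:6.2f}")
print("Tucker's phi per factor:", [round(p, 2) for p in PHI])
```

The rotated loadings and the two phi values should land close to the first matrix and the proportionality coefficients in the SPSS output shown further down.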

Output and Interpretation:

The edited output for this example is shown below. It shows the rotated matrix of the group (East Germany in our case) that was rotated to maximal similarity:

Run MATRIX procedure:

   .77  -.10
   .88   .04
   .75   .05
   .74   .06
   .22   .57
   .00   .81
  -.10   .72
   .16   .73
   .09   .69

  -.01   .06
   .07  -.16
   .03  -.16
  -.11   .00
   .25  -.03
   .03   .08
  -.49   .29
   .02  -.08
  -.13  -.02

Square Root of the Mean Squared Difference per Variable (Item)

Square Root of the Mean Squared Difference per Factor
   .19   .13

   .94   .97

   .86   .92

   .94   .97

   .86   .93

------ END MATRIX -----

The output shows the factor loadings following rotation, the difference in loadings between the original structure and the rotated structure as well as the differences of each loading squared and then averaged across all factors (square root of the mean squared difference per variable column).
The first matrix could be pasted in a new table, showing the rotated loadings (instead of using the loadings from the original analysis as reported above in the table). The second matrix shows the differences after rotation. You should look for large values, because they indicate that some items are problematic. A low value would indicate good correspondence.
The column of values entitled: Square Root of the Mean Squared Difference per Variable (Item) gives you information about each item. The larger the value, the more problematic is an individual item. The next row (Square Root of the Mean Squared Difference per Factor) shows the same information per factor. Again, smaller values are better, larger values indicate trouble for a particular factor. There are no hard and fast criteria for any of these indices above; you should look at the relative values and particularly discrepant values.
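These two summary indices are easy to recompute by hand. The sketch below (Python, my own illustration rather than part of the SPSS routine) takes the difference matrix exactly as printed in the output above, already rounded to two decimals, and computes the root mean squared difference per item and per factor. Because the input is rounded, the results can differ slightly in the last decimal from what SPSS prints.

```python
from math import sqrt

# Differences in loadings after target rotation, as printed in the output
# (rows = items in the order of Table 1, columns = factor 1 and factor 2).
DIFF = [
    (-.01, .06), (.07, -.16), (.03, -.16), (-.11, .00), (.25, -.03),
    (.03, .08), (-.49, .29), (.02, -.08), (-.13, -.02),
]

# Per item: square each difference, average across factors, take the root.
per_item = [sqrt(sum(d * d for d in row) / len(row)) for row in DIFF]

# Per factor: square each difference, average across items, take the root.
per_factor = [sqrt(sum(row[f] ** 2 for row in DIFF) / len(DIFF))
              for f in range(2)]

worst_item = per_item.index(max(per_item)) + 1  # 1-based item number

print([round(v, 2) for v in per_item])
print([round(v, 2) for v in per_factor])
print("Most discrepant item:", worst_item)
```

The item with the largest value is the seventh one (During the last year I changed something in my work....).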
The most important information is reported in the last four lines, namely the various agreement coefficients. As can be seen there, the values are all above .85 and generally are beyond the commonly accepted value of .90. The most common indicator is Tucker’s Phi which is called Proportionality coefficient here.
It is also worth noting that the first factor shows lower congruence and that the estimates vary across indicators. An examination of the differences between the loadings shows that one item (During the last year I changed something in my work....) in particular shows somewhat different loadings. In the British sample, it loads moderately on both factors, whereas it loads highly on the proactivity factor in the German sample. Therefore, among the British participants, making some changes in their workplace is a relatively routine and passive task, whereas for German participants this is a behaviour that is associated more with proactivity and initiative (e.g., Frese et al., 1996). We might want to exclude this item and re-run the analyses. Overall, we could cautiously conclude that our scales meet structural equivalence and most items might even meet metric equivalence (although this syntax routine does not provide a statistical test for this higher level of equivalence).

Good on ya... if you made it to this point ; ) Hope your eyes are looking slightly better than those of a Tarsier...