Monday, May 1, 2017

Cross-cultural structural invariance testing: How to run the procrustean factor rotation magic in R

It has been a while since I last posted some stats related material. Today I am getting back to this amazing topic and focus on how we can compare factor structures across cultural samples. I have done this previously with SPSS. Today I am focusing on R, which is way cooler.

In cross-cultural psychology, we often use factor analysis (or principal component analysis) to examine the factor structure of an instrument. But how can we tell whether the factors that we find are comparable? And how similar are they to each other? In order to do this, we need to make the factor structures maximally comparable with each other and then get an overall estimate of factor similarity. This is what Procrustean Rotation and indices such as Tucker's Phi are all about.

You may ask: Why do we need rotations which such weird Greek mythological names (if you wonder about the history of the name, look up the  mighty evil rogue Procrustes  on google)? The problem is that simply speaking any factor rotation is arbitrary and there are infinite possible solutions that can be mathematically fitted to any factor structure. Which means that there is a good chance that sample specific fluctuations will make factors look quite different. Apparently dissimilar factor structures might be more similar than we think; procrustean rotation is necessary to judge how similar they are.

Hence, I will cover the magic of how to do this in R, a free and awesome statistics program. Assuming that you are new to R, I will cover the basics of how to set your path and get your data in. If you know what you are doing, you can skip forward to the latter section.

Step 1. Set your working directory

You need to set a working directory. This step is important because it will allow you to call your data file later on repeatedly without listing the whole path of where it is saved. For example, I saved the file that I am working with on my USB drive.
I need to type this command:


If I had saved all the data on my dropbox folder in a folder called 'Stats' that is in my 'PDF' folder, then I would need to type this command:


Two important points:
a) for some strange reason you need double \\ to set your directory paths with windows. You could also use / instead of \\ (e.g., setwd("C:/Users/Ron/Dropbox/PDF/Stats")).  This is just to confuse you... But R is still awesome.

b) make sure that there are no spaces in any of your file or directory paths. R does not like it and will throw a tantrum if you have a space somewhere.

Step 2. Read your data into R

The most convenient way to read data into R is using .csv files. Any programme like SPSS or Excel will allow you to save your data as a .csv file.

You need to type:
ocb=read.csv("ocb_efa.csv", header=TRUE)

R is an object oriented language, which means we will constantly create objects by calling on functions: object <- function. This may seem weird at first, but will allow you to do lots of cool stuff in a very efficient way.

I am using a data that tested an organizational citizenship behavior scale, so I am calling my object that contains the data 'ocb'. Just as a bit of background, I am using data from Fischer and Smith (2006). They measured self-reported work behaviour in British and East German samples, which they called extra-role behaviour. Extra-role behavior is pretty much the same as citizenship behaviour, voluntary and discretationary behaviour that goes beyond what is expected of employees, but helps the larger organization to survive and prosper. These items were supposed to measure a more passive component (factor 1) and a more proactive component (factor 2). We will need this info on the expected factors below...

The command header=TRUE (or you could make it sure and just type T) tells R that the variable names are included.

Step 3. Preparing your data (dealing with missing data, checking your data, etc.)

R does not like missing data. We will need to define which values are missing. I previously coded all missing data as -999 in SPSS or EXCEl. Now I have to declare that these annoying -999s should be treated as missing values.

If you type:
You will see that the minimum value is -999. The simplest and straightforward option is to define the missing values is to write this short command that converts all these offending values into NA - the R form of missing data.


Note the square brackets and double ==. If you want to treat only a selected variable, you could write:

ocb$ocb1[ocb$ocb1==-999] <- NA

This tells R that you want only the the first variable in the dataframe ocb to be treated in this way.

To check that all worked well, type:


You should see something like this:

If all went well, now your minimum and maximum values are within the bounds of your original data and you have a row of NA's a the bottom of each variable column.

As you can see, we have a variable called country with 1's and 2's. This is not that useful, because last time I checked, these are not good names for countries and might be a bit confusing.

The best option is to convert this variable in what is called a factor in R (don't confuse it with factor analysis). Basically, it becomes a dummy variable and we can give it labels. In my case, I have data from British and German employees, so I am using UK and German as labels.

You can type:
ocb$country<-factor(ocb$country,   #specifies the variable to be recoded
                    levels = c(1,2), #specifies the numeric values
                    labels = c("UK", "German")) #specifies the labels assigned to each numeric value

If you wonder, the # allows me to add annotations to each command line, that tell me (and you) what is going on, but R is ignoring these sections. 

If you type, summary(ocb) again, you should now see that there 130 responses from the UK and 184 from Germany. 

There is one more thing we need to do. In our analyses, we want to compare the factor analysis results of the two samples. Therefore, we need to create two data sets for each sample that include only the variables that we need for our factor analysis. This can be achieved with the subset command, which creates a new object with only the data that we need for each analysis. At the same time, we can also use this command to select only the relevant variables for our factor analysis. 

To create the UK data set, you can type: 

ocb.UK<-subset(ocb, #creates a new data frame using the original ocb data frame
               country=="UK", #this is the variable that is used for subsetting, note the double ==
               select=c(2:10)) #we only need the continuous variables which were in column 2 to 10

To see whether it worked, type: 
summary(ocb.UK) #check that it worked
nrow(ocb.UK) #check that it worked, this command will give you the number of rows

Then repeat the procedure to create the German data set:
                   select=c(-1)) #if you wonder, this is an alternative way of selecting the variables, by dropping the first column which had the country dummy factor

To check, you know the drill (summary or nrow). 

Step 4. Installing and loading the analysis packages for your analysis

R is a very powerful tool because it is constantly expanding. Researchers from around the world are uploading tools and packages that allow you to run fancy new stats all the time. However, the base installation of R does not include them. So we need to tell R which packages we want to use.
For the type of measurement invariance tests that I am talking about today, we will need these two: psych (written by William Revelle, an amazing package, check out some of the awesome stuff can do with this package here) and GPArotation.

Write this code to download and install the packages on your machine:

install.packages(c("psych", "GPArotation"))

Make sure you have good internet connectivity and you are not blocked by an institutional firewall. I had some problems recently trying to download R packages when accessing it from a university campus with a strong firewall. 

Once all packages are downloaded, you need to call them before you can run any analyses:



Important: You need to call these packages each time that you want to run some analyses, if you have restarted R or RStudio. Now we should be ready to start our analyses. 

Step 5. Run the analysis in each sample

I have used the name factor analysis so far. Technically, I am going to use principal component analysis (PCA). There is a lot of debate whether factor analysis or principal component analysis are better... I touched upon this in class, but will not repeat it here. Let's just stick with PCA for the time being and be happy. I will also continue to use the term 'factors', even though this is factually incorrect (they are principal components) and I am likely to burn in statistical hell. I am happy to brave this risk...

To run the PCA, we need to type a short command line. Let's break it down. is the name that I gave the new object that R will create. The name is pretty much up to you, I called it pca (because I am running a PCA) with 2 factors (hence 2f) based on the British data (voila, this is what uk stands for). The command 'principal' tells R what to do:  run a principal component analysis. After the open brackets, I first specify the data object (ocb.UK), then how many factors I want to extract (nfactors=2), followed by the type of rotation (I decided to go with varimax rotation, which is a form of orthogonal rotation that assumes independence of factors). So this is what I write:<-principal(ocb.UK, 

If you run it, nothing will happen. We just created an object that contains the PCA results. To actually see it, we can either call all the output by typing:

Or we could sort the factor loadings by size and suppress small factor loadings (for example, factor loadings smaller than .3). To get this, write:

print.psych(, cut=0.3, sort = T)

Now you should see some output like this:
As you can see, the first item loaded on both factors. However, overall there seems to be a pretty neat two-factor structure.

Now you need to do the same thing for the German data set. This is not rocket science and I hope you would have come up with the same code like this:

print.psych(pca_2f.german, cut=0.3, sort = T)

The output looks like this:
The first item loads much more clearly on factor 2 in this German data set compared to the British data set. But what can say about this difference? We can't really compare to the two factor results, because there might be arbitrary changes due to sample fluctuations or other funny jazz (this is a highly technical term).  Now we get to the crux of this whole issue, because we need to do Procrustean rotation. Procrustean rotation (have you looked up Procrustes yet?) does what the name says, it rotates and fits one solution to the other, making them directly comparable.
Before we get there, take a deep breath and have a look at this picture...

Feeling more relaxed and calmer now? Let's move on to the real stuff!

Step 5. Run the Procrustean rotation 

For those of you who have done the procrustean rotation stuff in SPSS (for a reminder, have a look here), you might have braced yourself for a massive typing exercise with lots of random error messages and annoying missing commas, semi-colons and winged brackets. Fear not - R is making it much easier.

To run the actual procrustean rotation, we need to type one little command line. To break it down again, we create a new object that contains our rotated factor loadings. I called it ''. We tell R what to do (run a Target Rotation... hence, called 'TargetQ'), specify what factor loadings we want to rotate and what we want it to rotate it to - our target. I used the German sample as the target. This is a pretty arbitrary choice, but I decided to use it because a) the German sample is larger and b) the German sample had a slightly cleaner initial structure. 

Here is the command:<-TargetQ($loadings, Target=list(pca_2f.german$loadings))

If we now call the object (just type the name of the object), we should see something like this:
The first first item still does show up as loading on both factors, but the loading on the first factor is somewhat reduced. We could now start a bit of a tea leaf reading exercise and look at all the little changes that have happened after rotation. This can be informative and if you have your own data sets, this is probably a good thing to do. Yet, these impressions do not allow us to get a sense of how statistically similar the two factor solutions are. Do these differences matter? 

Hence, the final step for today... We need to calculate the overall similarity.

Step 6. Compute Factor Congruence Coefficients

There are a number of different ways to calculate factor congruence or factor similarity. The most common one is Tucker's Phi. You can read up more about it in a chapter that I have written together with Johnny Fontaine. Send me a message if you want a copy. 

To get Tucker's Phi, we again have to write a single command line. The command is simple: 'factor.congruence' and all we need to specify is which loadings from what analyses we want to analyze. In our case, we want to compare the original German factor loadings with the procrustean rotated British loadings. Hence, we write:


We will see a 2 x 2 matrix, which has Tucker's Phi on the diagonal. As you should see, the similarity for factor 1 is .94 and for factor 2 is .97. If you compare it with the standards that we discuss in the book chapter, this is pretty good similarity. The small changes that we see across the two samples do not matter that much. 

If you want another indicator, we could compute the correlation between the two factor structures. This again is relatively straightforward. Without creating a new object, we could just type (note that we use the same structure as for the factor.congruence statement):


The correlation matrix shows us on the diagonal that the correlation for factor 1 is .87 and for factor 2 is .93. Therefore, the correlation coefficient suggests that factor 2 is pretty similar. However, factor 1 is not doing that great. Maybe item 1 is a big dodgy after all. 

As we discuss in the chapter, it can be useful to compare the different indices. If they agree - you are sweat and you can happily go your way comparing the factor structures. If they diverge (as they do a wee bit in this case), you may want to explore further. In our case, it might make sense to remove the first item and redo the analyses. If we do this and re-run all the steps after excluding ocb1 (see the subsetting command at step 3), we will find the two structures are now beautifully similar. Nearly like identical twins... Who would have thought that of ze Germans and ze Brits...

I hope you have enjoyed this little excursion into R and procrustean rotation. I am a big fan of the capabilities of R and what you can do with it for cross-cultural analyses. I hope I got you inspired too.
Any questions or comments, please get in touch and comment :)

Now... rotate and relax :)

Thursday, February 11, 2016

Culture & Psychology Summer School in Nagoya, Japan

We are super-excited to announce all the details for the next IACCP Culture & Psychology Summer School. The Culture & Psychology School is open to students at PhD and MSc level and is sponsored by the International Association for Cross-Cultural Psychology (IACCP). The goal is to provide specialized training by experts in topics of importance and relevance for studying psychology and culture in context.  In addition to its educational benefits, the programme is designed to facilitate cross-cultural contact and understanding among future academic leaders and to broaden their academic vision.  We really look forward in bringing bright minds from all corners of the world together and help them develop new research ideas and collaborations. The Culture & Psychology School is run in association with the 23rd IACCP conference to take place in Nagoya, Japan. 
A project presentation during the 2012 Stellenbosch Culture & Psychology School

There have been numerous changes based on the feedback and suggestions that we received after the last one in Reims, France. We are super-excited about the line-up and new programme. Here is an overview of the new programme, the stream leaders and content.

The new programme

We have received a lot of feedback and we have remodeled how we plan to run it. The major difference this year is that we have more methods. You will be able to choose both a content stream and a method stream. On the first day after some introduction and overview, you will be working with people in your content stream. On the second day, you can then choose one of three method streams and work hands-on under the guidance of experts. On the third day, you rejoin your content stream and you will integrate your new methods learned to your content area of study.

To make this work, we will expect that you do some prep work before coming to Nagoya. The stream leaders will provide some reading lists and tasks for you to complete before you arrive in Nagoya to get everyone up to speed with basics. Think of it as a mini-online course to help you get familiar with some of the material to make the most of your learning experience. We will facilitate this as best as possible and we are confident that you will enjoy this opportunity to interact online with your stream colleagues.  

Here are the stream leaders and their content and methods sections:

Cristine Legare

Cristine is a cognitive scientist who studies the ontogeny of cultural learning. She examines the interplay of the universal human mind and the variations of human culture to address questions about cognitive and cultural evolution. Her research and training reflect her commitment to an interdisciplinary approach to the study of cognitive development. Cristine draws on insights from cognitive, cultural, developmental, educational, and evolutionary psychology as well as cognitive and evolutionary anthropology and philosophy, with the aim of facilitating cross-fertilization within and across these disciplines. Her website can be found here

Content Stream: Cognition in Cultural Context

Human cross-cultural variation is unique among all animals in both its extent and structural complexity. Cultural variability is one of our species’ most distinctive features, yet the vast majority of psychological research continues to examine a population that is unrepresentative of human culture globally and historically—those from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) backgrounds. Cristine proposes that cultural diversity is inextricably tied to childhood. The human capacity for cultural variability within and between groups must be ontologically prepared by a set of characteristics that enable, structure, and stabilize group-specific cultural information much beyond anything that has been observed in other primates. She will discuss how the capacity to learn, create, and transmit culture increases our understanding of the cognitive and cultural evolution of our species. Cristine will describe how my experimental and ethnographic research integrates theory and methodology from cognitive and evolutionary anthropology, psychology, and philosophy to examine the co-construction of cognition and culture. She will also provide an overview of research conducted at field sites in southern Africa, the U.S., Brazil, and Vanuatu (a Melanesian archipelago).

Methods Stream: From Ethnography to Experiments and Back Again

Cristine conducts mixed-methodological, cross-cultural research to examine cognition in context. She will discuss how she “mines ethnography” to inform her experimental research and the ways in which experimental research can be used to test hypotheses about the cognitive psychological underpinnings of cultural beliefs and practices. Cristine will also discuss best practice for elevating the state-of-the-science in cross-cultural research as well as strategies for publishing interdisciplinary research.

Matt Easterbrook

Matt is part of the Social and Applied Psychology Research Group at the University of Sussex, UK, where he researches and lectures on the psychology of inequality.  His research investigates how selves and identities are influenced by different social structures, cultural orientations, and group memberships, and the consequences of these things for personal well-being, trust, motivation, and socio-political outcomes.  His research often uses multilevel and longitudinal study designs and advanced statistical analyses to investigate these issues. His website is here.

Content stream: The self and social inequality 

Against a backdrop of unprecedented and rising levels of inequality across the world, this stream will cover contemporary social psychological theories of inequality and social class.  We will begin by reviewing the broad consequences of inequality for nations and individuals, before discussing the pivotal role of the self as the explanatory nexus between structural inequality and individual characteristics.

Method stream: Multilevel modelling

This stream will begin with a discussion of the research designs that give rise to multilevel data, and why multilevel modelling of nested data is important and useful.  We will then cover how to manage and set up multilevel data in SPSS, and how to import, run and understand different multilevel models using HLM.  

Nicolas Geeraert

Nicolas is a senior lecturer in Psychology at the University of Essex (UK). He trained as a social psychologist (PhD, 2004, Louvain-la-Neuve, Belgium) looking at stereotypes and attribution theory. His current research interests are in cross-cultural psychology and acculturation. He has extensive experience in conducting longitudinal projects which he has used in a number of acculturation projects. His website is here.

This is Nicolas one or two weeks ago ;)

Content stream: Acculturation & intercultural contact

Intercultural contact leads to challenges and changes. This stream will explore how acculturation unfolds as a process requiring the acculturating individual to copes with cultural stressors and cognitively organize their heritage and settlement cultures. We will discuss how acculturation takes place within the ecological context of families, institutions, and society.

Methods stream: Longitudinal methods

Longitudinal research is an increasingly popular tool for cross-cultural researchers. This workshop will explore the strengths and advantages of longitudinal research and how to practical set-up a longitudinal study in terms of design, participant management, data preparation, etc. Different methods to analyse longitudinal data will also be discussed. 

Endorsement by Previous Participants

Ceren Gunsoy

I attended the IACCP summer school in Reims, France, as a third year PhD student at Iowa State University. It was a great experience! Not only the topic stream that I was part of but also the talks, discussions, and social activities were very informative, thought-provoking, and fun. On top of that I met great people and am still in touch with them. I strongly recommend this program!

Colin Scott

The 2014 Summer School in Reims, France, was a fantastic opportunity to connect with researchers interested in a range of topics in cross-cultural psychology.  Seminars with leaders in the field offered a hands-on opportunity to build new collaborations while getting critical and constructive feedback from faculty and students on our own work.


The cost for the summer school will be 200 Euro for participants from high-income countries (as per IACCP fee structure) and 150 Euro for participants from low income countries. The fee includes accommodation, welcome dinner, lunches and coffee breaks. This is pretty damn good value for a three full day workshop with world leaders in the field of psychology and culture, providing you with cutting edge skills and material. 

The Schedule

March 20: Deadline for applications
April 3: Decisions on applications
April 4: Work in online study groups commences
June 20: Submitting initial research ideas to stream leaders

July 26- 30: Culture & Psychology School in Nagoya Hill
July 26 evening – arrival at Nagoya hill station, general welcome and get to know each other
July 27– Introduction & work in content streams
July 28– work in methods streams
July 29 morning – we bring content and methods back together, discussion of research ideas and plans
July 29 afternoon – sharing experiences by stream leaders on how to publish cultural research
July 30 – transfer to Nagoya.

July 30 – Aug 3: IACCP conference in Nagoya

Accommodation & Logistics

The school will take place in a mountain retreat two hours away from Nagoya. It will be a fascinating cultural experience since we are going to be in a small community off the beaten track in a more traditional Japanese environment. All sleeping places are shared and we will sleep on futons. Please bring your own toiletries and towels be prepared to share a communal space. There will be NO towels available and only shared shower facilities. It is a traditional setting and we only have one hour per day in the evening when you can take a shower. I realize that this may be unusual or inconvenient, but I really hope you will be able to use this as a cultural learning experience and enjoy this cultural challenge. 

We have your dietary requirements and will try our best to accommodate them. The food will be in traditional Bento style. Feel free to buy additional food of your preference on the way, there is some (limited) refrigerator space where it can be stored. Here is the link to the place (use google translate to get it in English).

Application and Further Info

The application form is now available. Check here for any updates. A poster to share with colleagues and friends is herePrint it and spread the word! If you have any questions about the programme, the stream leaders or the general procedure, please do not hesitate to contact me or Yasin.  

Happy to answer your questions and look forward to seeing you all in Nagoya in a few months!

Monday, March 2, 2015

A gentle intro to cross-cultural equivalence - or how can we measure across cultures?

Psychology is the study of human behaviour and mental processes through scientific methods. The claim of psychology is often to be universal, that is applicable to all of humanity. Using scientific methods, we psychologists rely on a systematic and objective process of proposing and testing hypotheses and making predictions about the state of human nature.  Ever since the beginning of psychology as an academic discipline, the scientific quest to quantify natural occurrences to better understand and predict them in the future became one of the ultimate goals. Of course, this requires often extensive qualitative research, but ultimately the hope was and is that we can understand a behaviour or mental process so precisely that we can quantitatively measure it and also change it.

The application of such quantitative methods are now often taken for granted, even though the levels of quantification may vary. For example, we may want to select the most able person for a particular job, refer a child with learning problems to a specialist or we may wish to help a person with mental health problems to fully function in society again. Even though all these problems can be phrased in qualitative terms (a good person for the job, a child that has problems learning, a person who is not well), these are essentially quantitative problems because they always have some reference to implicit or explicit standards. A person might be BETTER qualified than another to take up a job or a person may have GREATER problems understanding concepts or material than 75% of the children of her age. Therefore, in many day-to-day situations we make implicit and intuitive quantitative statements.

If we want to make quantitative statements about a scientific concept, we run into one of the central problems in psychology. This is namely WHAT do we want to make a comparison about? Or in other words, how do we define a psychological construct so that we can measure it? A geographer, chemist or physicist is unlikely to phase the problems that psychologists have… after all, we can easily measure distances (e.g., how far is Auckland from Wellington), we have ways of dating the age of a piece of rock or we can measure the energy of particles when we collide them at the near speed of light. Psychologists on the other hand are dealing with intangible concepts that are difficult to specify. Most of you are familiar with concepts such as intelligence, attitudes, personality traits, depression or identity. However, if we were to ask you to pinpoint any of these concepts in the real world, we would be unable to do so. Our psychological terminology refer to unobserved mental constructs that we create in our community of fellow psychologists to indicate a particular set of problems, describe a particular set of behaviours or mental representations. I would argue that underlying many of these psychological terms are assumptions about relative coherence, stability, generalizability and potentially even some general biological foundations that lead to the emergence of such a syndrome. Therefore, we don’t just invent these terms on a whim, but we think that there is something meaningful to them that we think is important enough to look into and tell other people about.

Therefore, the first issue in any psychological study, even though it may not seem obvious anymore, is to clearly and unambiguously define and specify what we want to study. What is our construct or process of interest? It is at this point, that culture will throw the first curve ball at any psychologist attempting to address this question. How can we make sure that our definition or mental construct of our psychological term or process is actually valid or does have some meaning in another cultural context? How does our upbringing in a highly developed Western society influence how we think about psychological constructs? Can we assume that identity is a concept that is meaningful in a village in the lowland Amazon basin? Is our definition of depression applicable to refugees coming from Syria or Iraq? Is conscientiousness a useful term to screen out applicants for jobs in an international organization? Therefore, the first problem in any psychological study is to unambiguously define and describe the psychological process for all the populations that we are interested in. We could think of this as a mental bubble that we draw around some problem or process. Does this bubble ‘exist’ in all the different cultures that we want to include in our study? How can we find out whether this bubble is meaningful and has some value or relevance for all the local populations? We will discuss this as the question of functional equivalence.

If we are confident that there is some value to this mental bubble of ours (let’s say, depression, personality or identity) and that the terms are meaningful in two or more cultures, then we need to find good indicators for it. In psychological terms, this is called operationalization. How can we empirically say that one person has more of this latent category quality that we just created with our mental bubble compared to another person? What would be a good indicator to tell us that one person is better for a job compared to another person or that one person is a better learner than another, who in turn may need some help? Here again, culture will throw lots of beautiful little challenges at us. We need to find indicators that are meaningful and relevant in each cultural context, but obviously we would still need to be able to compare the results across contexts. Therefore, we can’t have indicators that are relevant and meaningful in each context, but cannot be compared across cultures. We want to aim for some level of comparability. For example, is staying late at your desk a good indicator of being conscientious? Or could it be seen as being disorganized and incompetent? What if people are unfamiliar with office jobs? Is the number of items that you circled the temple this morning before going to work a better indicator of your conscientiousness? Is the ability to track animals over long distances and varied terrain a good indicator of concentration?  Or should we give people lots of d’s and b’s and p’s and q’s and then ask them to count how many p and q’s were together in each line? Should we measure intelligence by asking people to name as many types of medicinal plans for diarrhoea? Or give them complex questions about history and philosophy? This problem of identifying good measurement indicators will be called structural equivalence. Obviously, how we define and how we operationalize a construct is very much dependent on each other. For this reason, some researchers lump the two terms together as construct equivalence. For reasons that we will discuss later, I prefer to keep them separate.

So, we now have a mental bubble and we have a number of indicators that give us some clue about the latent bubble. However, we don’t actually know how good each of these indicators is in representing that latent bubble. We need to find a way to show us how well each indicator works in each of our cultures. In other words, is the same indicator better in capturing a key aspect of our construct in one culture compared to another? For example, is going to parties and having lots of friends a good indicator of extraversion? Is having many wives a good indicator of social status in all cultures? Is staying late at work to finish a good indicator in all cultures for high conscientiousness? This problems is called metric equivalence. It is the question about the relative strength of the indicator-latent variable relationship. In technical terms, we are concerned with the equivalence of factor loadings or item slopes in classic test theory or the item discriminability in item response theory.

Finally, we may be convinced that our indicators work equally well in all contexts. Each questionnaire or test items is really giving us a good and reliable insight into the construct. But there may be still problems. Some items, even though they have the same relationship with the latent construct in all cultures, may still be a bit more difficult or easier in one context compared to another.  If I would ask you to name the capital of Benin, most of you would probably struggle finding the correct answer. Benin is a country that is quite far from our thoughts and most of us will never set foot in this place or may not have heard about it in the media. However, if I would ask you about the capital city of one of your neighbouring countries, you would probably quite easily be able to name it. Therefore, asking about the capital of Benin would be easier for somebody living in Togo or Nigeria compared to somebody living in NZ or Denmark. This is the issue of full score or scalar equivalence. Technically, we would look at the invariance of item intercepts (in a multi-group CFA) or the differential item difficulty (in IRT).

In summary, measuring psychological attributes or processes across cultural contexts is quite difficult. I gave some relatively superficial and easy examples to make this a relatively non-technical and easy intro to the problem. We need to define our construct – draw our mental bubble around what we want to study. The first step in any cultural study then is to make sure that this construct or mental bubble is meaningful and functional in all cultures that we want to study. Once we think this is the case, we need to find good indicators that are observable and give us some insight into the position or state of an individual in relation to our mental bubble. We then need to discuss whether the indicators are equally good in all contexts or whether some are better in telling us something about a person or process in one cultural context compared to another. Finally, we need to find out whether all indicators are equally easy or difficult. Only once we have fulfilled this last criterion can we actually make any comparisons between individuals or groups across cultures. This is a tough task and unfortunately, most studies that you will see in the literature do fall well short of it. But this is the challenge that we really need to meet in order to develop a meaningful and universal psychological science. 

Sunday, December 21, 2014

Cross-cultural measurement invariance testing in R in 5 simple steps

Here I am presenting a quick crash course to run invariance tests for cross-cultural research in R. R is a free programme and has an expanding list of awesome features that should be of interest to people doing cross-cultural work. 

I am working with RStudio, but there are other options for running R. Here are the basic steps to get you started and run a cross-cultural equivalence test. 

1. Set your working directory

This step is important because it will allow you to call your data file later on repeatedly without listing the whole path of where it is saved. For example, I saved the file that I am working with on my dropbox folder in a folder called 'Stats' that is in my 'PDF' folder. 
I need to type this command:


Two important points: 
a) for some strange reason you need double \\ to set your directory paths with windows.
b) make sure that there are no spaces in any of your file or directory paths. R does not like it and will throw a tantrum if you have a space somewhere. 

2. Read your data into R

The most convenient way to read data into R is using .csv files. Any programme like SPSS or Excel will allow you to save your data as a .csv file. There are a few more things that we need to discuss about saving your data, but I will discuss this below. 

justice <- read.csv("justice.csv", header = TRUE)

R is an object oriented language, which means we will constantly create objects by calling on functions: object <- function. This may seem weird at first, but will allow you to do lots of cool stuff in a very efficient way.

I am using a data that tested a justice scale, so I am calling my object 'justice'. 

3. Deal with missing data

If you have absolutely no missing data in your data file, skip this step. However most mortal researcher souls will have some missing data in their spreadsheet. R is very temperamental with missing data and we need to tell it what missing data is and how to deal with it. Some people (including myself) who are used to SPSS typically leave missing data as a blank cell in the spreadsheet. This will create problems. The best option is to write a little syntax command in spss to recode all blank cells to a constant number. 
For example, I could write something like this in SPSS:

recode variable1 to variable12
(sysmis=9999) (else=copy) into
pj1 pj2 pj3 pj4 ij1 ij2 ij3 ij4 dj1 dj2 dj3 dj4. 

Now I can save those new variables as a .csv file and read into R using the step above.

Once you have executed step 2, you need to define these annoying 9999s as missing values.
The simplest and straightforward option is to write this short command that converts all these offending values into NA - the R form of missing data.

justice[justice==9999] <- NA

Note the square brackets and double ==. If you want to treat only a selected variable, you could write:

justice$pj1[justice$pj1==9999] <- NA

This tells R that you want only the pj1 variable in the dataframe justice to be treated in this way. 

To check that all worked well, type:


You should see something like this:

If all went well, now your minimum and maximum values are within the bounds of your original data and you have a row of NA's a the bottom of each variable column.

4. Loading the analysis packages

R is a very powerful tool because it is constantly expanding. Researchers from around the world are uploading tools and packages that allow you to run fancy new stats all the time. However, the base installation of R does not include them. So we need to tell R which packages we want to use.
For measurement invariance tests, these three are particularly useful: lavaan (the key one) and semTools (important for the invariance tests).

Write this code to download and install the packages on your machine:

install.packages(c("semTools", "lavaan"))

Make sure you have good internet connectivity and you are not blocked by an institutional firewall. I had some problems recently trying to download R packages when accessing it from a university campus with a strong firewall. 

Once all packages are downloaded, you need to call them before you can run any analyses:



Important: You need to call these packages each time that you want to run some analyses, if you have restarted R or RStudio.

5. Running the analyses

R is an object oriented language, as I mentioned before. The analysis can be executed in a couple of steps. First we need to specify the model that we run by creating a new object that contains all the information. Then we tell R what to do with that model. Finally, we have various options for obtaining the results, that is the fit indices and parameter estimates as well as other diagnostic information. 

Let's create the model first:

This creates the model that we can then work with. The '=~' denotes that the items are loading on the latent factor. This is what it looks like:

The next set of commands sets specifies what should be done with the model. In the case of a simple CFA, we can call this function: <-cfa(cfa.justice, data=justice)

To identify the model, lavaan sets the loading of the first item on each latent variable to 1. This is convenient, but may be problematic if the item is not a good indicator. An alternative strategy is to set the variance of the latent variable to 1. This can be done by adding to the fit statement. <-cfa(cfa.justice, data=justice,

This statement runs the analysis, but we still need to request the output.

The simplest way is to use summary again. Here is an option that prints both the fit indices and the standardized parameters.

summary(, fit.measures= TRUE, standardized=TRUE)

Here is a truncated and annotated version of the output:

lavaan (0.5-17) converged normally after  39 iterations
                                                                  Used       Total
  Number of observations                          2518        2634
# The following shows the estimator and the Chi square stats. As can be seen, we have 51 df's, but the model does not fit that well.
  Estimator                                                  ML 
Minimum Function Test Statistic              686.657 
Degrees of freedom                                   51 
P-value (Chi-square)                                 0.000

Model test baseline model:
  Minimum Function Test Statistic            20601.993 
Degrees of freedom                                 66 
P-value                                                     0.000
#The incremental fit indices are in contrast quite good. The CFI and TLI should be ideally be above .95 (or at least .90). So this model does look good.
User model versus baseline model:
  Comparative Fit Index (CFI)                    0.969 
Tucker-Lewis Index (TLI)                       0.960

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)             -44312.277 
Loglikelihood unrestricted model (H1)     -43968.949
  Number of free parameters                         27
#The AIC and BIC are useful for comparing models, especially non-nested models. Not the case right now.
  Akaike (AIC)                               88678.555 
Bayesian (BIC)                             88835.998 
Sample-size adjusted Bayesian (BIC)        88750.212
#The RMSEA should be small. Values smaller than .08 are deemed acceptable, below .05 are good. We are doing ok-ish with this one here.
Root Mean Square Error of Approximation:
  RMSEA                                          0.070 
90 Percent Confidence Interval          0.066  0.075  
P-value RMSEA <= 0.05                          0.000
#Another useful lack of fit index. The SRMR should be below .05 if possible. This is looking good.
Standardized Root Mean Square Residual:
  SRMR                                           0.037
#Now we have a print out of all the parameter estimates, including the standardized loadings, variances and  covariances. Here it is important to check whether loadings are relatively even and strong, and that the variances and covariances are reasonable (e.g., we want to avoid very high correlations between latent variables). It is looking ok overall. Item ij4 may need some careful attention.
Parameter estimates:
  Information             Expected  Standard Errors                             Standard
                   Estimate  Std.err  Z-value  P(>|z|)  Std.all
Latent variables: 
pj =~    pj1   1.000                                           1.202    0.789   
pj2               1.009    0.027   38.005    0.000    1.213    0.790   
pj3               0.769    0.025   31.003    0.000    0.924    0.643   
pj4               0.802    0.024   33.244    0.000    0.963    0.687 
ij =~    ij1   1.000                                             1.256    0.879   
ij2               1.064    0.014   75.039    0.000    1.335    0.957   
ij3               1.015    0.015   68.090    0.000    1.274    0.910   
ij4               0.808    0.022   37.412    0.000    1.014    0.644 
dj =~    dj1 1.000                                              1.211    0.821   
dj2               1.013    0.019   51.973    0.000    1.227    0.871   
dj3               1.056    0.020   52.509    0.000    1.279    0.877   
dj4               1.014    0.021   48.848    0.000    1.228    0.834
pj ~~    ij    0.828    0.041   20.435    0.000    0.549    0.549   
dj                0.817    0.040   20.187    0.000    0.562    0.562 
ij ~~    dj    0.711    0.037   19.005    0.000    0.467    0.467
pj1               0.874    0.036                      0.874    0.377   
pj2               0.889    0.036                      0.889    0.377 
pj3               1.213    0.039                      1.213    0.587   
pj4               1.040    0.035                      1.040    0.528   
ij1               0.464    0.016                      0.464    0.227   
ij2               0.164    0.011                      0.164    0.084   
ij3               0.336    0.013                      0.336    0.172   
ij4               1.448    0.042                      1.448    0.585   
dj1               0.709    0.025                      0.709    0.326   
dj2               0.479    0.019                      0.479    0.242   
dj3               0.489    0.020                      0.489    0.230   
dj4               0.662    0.024                      0.662    0.305   
pj                1.444    0.066                      1.000    1.000   
ij                1.576    0.057                      1.000    1.000   
dj                1.466    0.060                      1.000    1.000

However, we want to do an invariance analysis. Right now we collapsed the samples and ran an analysis across all groups. This can create problems, especially if the samples have different means (see my earlier blogpost for an explanation of this problem). 

The grouping variable should be a factor, that is a string variable that has the labels. You can also use continuous variables, but then you will need to remember what each number means. In this example, I have data from three samples:

> summary(justice$nation)
     Brazil        NZ         Philippines 

        794        1146         694 

It would be informative to see whether the item loadings are similar in each group. To do this, we only need to add group="nation" to our cfa statement. <-cfa(cfa.justice, data=justice, group="nation")

We can then print the results by using the summary statement again (remember that we have to call the new object for this analysis):

summary(, fit.measures= TRUE, standardized=TRUE) 

I am not printing the output.

Of course, this is not giving us the info that we want, namely whether the model really fits. In addition, we could ask for equal loadings, intercepts, unique variances, etc. I can't go into details about the theory and importance of each of these parameters. I hope to find some time soon to describe this. In the meantime, have a look at this earlier article. 

In R, running these analyses is really straightforward and easy. A single command line will give us all the relevant stats. Pretty amazing!!!!!

To run a full-blown invariance analysis, all you need is to type this simple command:


You can write it as a single line. I just put it on separate lines to show what it actually entails. First, we call the model that we specified above, then we link it to the data that we want to analyze. After that, we specify the grouping variable (nation). The final line requests strict invariance, that is we want to get estimates for a model where loadings, intercepts and unique variances are constrained as well as a model in which we constrain the latent means to be equal. If we don't specify the last line, we will not get the constraints in unique variances. 

Here is the output, but without the strict invariance lines:

Measurement invariance tests:
Model 1: configural invariance:
    chisq        df    pvalue       cfi     rmsea       bic
  992.438   153.000     0.000     0.962     0.081 86921.364
Model 2: weak invariance (equal loadings):
    chisq        df    pvalue       cfi     rmsea       bic
 1094.436   171.000     0.000     0.958     0.080 86882.400
[Model 1 versus model 2]
  delta.chisq      delta.df delta.p.value     delta.cfi
      101.998        18.000         0.000         0.004
Model 3: strong invariance (equal loadings + intercepts):
    chisq        df    pvalue       cfi     rmsea       bic
 1253.943   189.000     0.000     0.952     0.082 86900.945
[Model 1 versus model 3]
  delta.chisq      delta.df delta.p.value     delta.cfi
      261.505        36.000         0.000         0.010
[Model 2 versus model 3]
  delta.chisq      delta.df delta.p.value     delta.cfi
      159.507        18.000         0.000         0.006
Model 4: equal loadings + intercepts + means:
    chisq        df    pvalue       cfi     rmsea       bic
 1467.119   195.000     0.000     0.942     0.088 87067.134
[Model 1 versus model 4]
  delta.chisq      delta.df delta.p.value     delta.cfi
      474.681        42.000         0.000         0.020
[Model 3 versus model 4]
  delta.chisq      delta.df delta.p.value     delta.cfi
      213.176         6.000         0.000         0.009

How do we make sense of this?

Model 1 is the most lenient model, no constraints are imposed on the model and separate CFA's are estimated in each group. The CFI is pretty decent. The RMSEA is borderline. 
Model 2 constraints the factor loadings to be equal. The CFI is still pretty decent, the RMSEA actually improves slightly. This is due to the fact that we have now more df's and RMSEA punishes models with lots of free parameters. The important info comes in the line entitled Model 1 versus model 2. Here we find the difference stats. The X2 difference test is significant and we would need to reject model 2 as significant worse. However, due to the problems with the X2 difference test, many researchers treat this index with caution and examine other fit indices. One commonly examined fit index of the difference is Delta CFI, that is the difference in CFI fit from one model to the next. It should not be larger than .01. In our case, it is borderline - the delta CFI is .01.
We can then compare the other models. The next model constraints both loadings and intercepts (strong invariance). The model fit is pretty decent, we can probably assume that both loadings and intercepts are invariant across these three groups. 
In contrast, constraining the latent means shows some larger problems. The latent means are likely to be different. 

6. Further statistics 

In this particular case, the model fits pretty well. However, often we run into problems. If there is misfit, we either trim the parameter (drop parameters or variables from the model) or we can add parameters. To see which parameters would be useful to add, we can request modification indices. 
This can be done using this command:
mi <- modificationIndices(

The second line (mi) will print the modification indices. It gives you the expected drop in X2 as well as what the parameter estimates would be like if they were freed.

If we want to print only those modification indices above a certain threshold, let's say 10, we could add the following line:
subset (mi, mi>10)

This will give us modification indices for the overall model. If we want to see modification indices for any of the constrained models, we can request them after estimating the respective model.

For example, if we want to see the modification indices after constraining the loadings to be similar, we can run the following line:

metric <-cfa(cfa.justice, 

This will now give us the modification indices for this particular model. 

There are more options for running constrained models. For example, this line gives the scalar invariant model: 

scalar <-cfa(cfa1,
                   group.equal=c("loadings", "intercepts"))

As you can see, these models are replicating the models implied in the overall analysis that we got with the measurementInvariance command above. 


I hope I have convinced you that measurement invariance in R using lavaan and semTools is a piece of cake. It is an awesome resource, allows you to run lots of models in no time whatsoever and of course it is free!!!!! Once you get into R, you can do even more fancy stuff and run everything from simple stats to complex SEM and ML models in a single programme. 

More info on lavaan can be found here (including a pdf tutorial). 

I am still in the process of learning how to navigate this awesome programme. If you have some suggestions for simplifying any of the steps or if you spot some mistakes or have any other suggestions... please get in touch and let me know :)
If there are some issues that are unclear or confusing, let me know too and I will try and clarify!
Look forward to hearing from you and hope you find this useful!