Sunday, December 21, 2014

Cross-cultural measurement invariance testing in R in 5 simple steps

Here I am presenting a quick crash course to run invariance tests for cross-cultural research in R. R is a free programme and has an expanding list of awesome features that should be of interest to people doing cross-cultural work. 

I am working with RStudio, but there are other options for running R. Here are the basic steps to get you started and run a cross-cultural equivalence test. 

1. Set your working directory

This step is important because it allows you to call your data file repeatedly later on without listing the whole path to where it is saved. For example, I saved the file that I am working with in my Dropbox folder, in a folder called 'Stats' inside my 'PDF' folder.
I need to type this command:


Two important points: 
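a) on Windows, you need double \\ to set your directory paths. The backslash is R's escape character inside strings, so \\ stands for a single backslash. Forward slashes (/) work as well.
b) avoid spaces in your file and directory paths if you can. Quoted paths with spaces generally work, but spaces make typos easy, and R will throw a tantrum if a path is not exactly right.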
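Why the doubled backslash? Inside an R string, the backslash starts an escape sequence (like \n for a new line), so you type \\ to get one literal backslash. The short sketch below illustrates this; the Dropbox path is a made-up example, and the setwd() call is commented out because that folder will not exist on your machine.

```r
# A made-up Windows path, written two equivalent ways.
win_path <- "C:\\Users\\me\\Dropbox\\PDF\\Stats"  # doubled backslashes
fwd_path <- "C:/Users/me/Dropbox/PDF/Stats"       # forward slashes also work on Windows

# "\\" in the source code is one single character in the actual string.
nchar("\\")

# Swapping each backslash for a forward slash gives the same path.
converted <- gsub("\\", "/", win_path, fixed = TRUE)
identical(converted, fwd_path)

# file.path() joins the pieces with "/" on every platform.
built <- file.path("C:", "Users", "me", "Dropbox", "PDF", "Stats")
identical(built, fwd_path)

# setwd(fwd_path)   # uncomment and adapt to your own folder
# getwd()           # shows the current working directory
```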

2. Read your data into R

The most convenient way to read data into R is from .csv files. Any programme like SPSS or Excel will allow you to save your data as a .csv file. There are a few more things to consider when saving your data, which I discuss below.

justice <- read.csv("justice.csv", header = TRUE)

R is an object-oriented language: we constantly create objects by assigning the result of a function to a name, object <- function(...). This may seem weird at first, but it will allow you to do lots of cool stuff in a very efficient way.

I am using a data set from a study that tested a justice scale, so I am calling my object 'justice'.
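To see what read.csv() and the object <- function() pattern produce without a real file, here is a small sketch that reads an invented three-row CSV from a string via the text argument. The values are made up and only stand in for justice.csv.

```r
# A tiny, invented CSV standing in for justice.csv.
csv_text <- "nation,pj1,pj2,ij1
NZ,5,4,6
Brazil,3,9999,5
Philippines,6,5,9999"

# header = TRUE uses the first row as column names; the result is a data frame object.
toy <- read.csv(text = csv_text, header = TRUE)

str(toy)     # column types: nation is text, the items are integers
dim(toy)     # rows and columns
names(toy)   # column names taken from the header row
```

With a real file you pass the file name instead of text, exactly as in the read.csv() line above.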

3. Deal with missing data

If you have absolutely no missing data in your data file, skip this step. However, most mortal researcher souls will have some missing data in their spreadsheet. R is very temperamental with missing data, and we need to tell it what counts as missing and how to deal with it. Some people (including myself) who are used to SPSS typically leave missing data as a blank cell in the spreadsheet. This can create problems. The best option is to write a little syntax command in SPSS to recode all blank cells to a constant number.
For example, I could write something like this in SPSS:

recode variable1 to variable12
(sysmis=9999) (else=copy) into
pj1 pj2 pj3 pj4 ij1 ij2 ij3 ij4 dj1 dj2 dj3 dj4. 

Now I can save those new variables as a .csv file and read them into R using the step above.

Once you have executed step 2, you need to define these annoying 9999s as missing values.
The simplest option is to write this short command, which converts all these offending values into NA, the R code for missing data.

justice[justice==9999] <- NA

Note the square brackets and the double ==: == tests whether two values are equal, whereas <- assigns a value. If you want to treat only a selected variable, you could write:

justice$pj1[justice$pj1==9999] <- NA

This tells R that you want only the pj1 variable in the dataframe justice to be treated in this way. 
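The recoding commands can be tried on a small made-up data frame first. This sketch (toy values, not the justice data) shows the whole-data-frame replacement, the single-column version and, as an alternative worth knowing about, the na.strings argument of read.csv(), which treats blank cells and the 9999 placeholder as missing while the file is being read.

```r
# Invented data with 9999 as the missing-value placeholder.
toy <- data.frame(
  nation = c("NZ", "Brazil", "Philippines"),
  pj1    = c(5, 3, 9999),
  pj2    = c(4, 9999, 5)
)

# Version 1: replace every 9999 anywhere in the data frame.
# toy == 9999 is a logical matrix; the text column simply compares as FALSE.
all_cols <- toy
all_cols[all_cols == 9999] <- NA
colSums(is.na(all_cols))

# Version 2: replace 9999 in one column only; pj2 keeps its 9999.
one_col <- toy
one_col$pj1[one_col$pj1 == 9999] <- NA
sum(is.na(one_col$pj1))
sum(one_col$pj2 == 9999)

# Alternative: declare blanks and the placeholder as missing while reading.
csv_text <- "nation,pj1,pj2
NZ,5,4
Brazil,3,9999
Philippines,,5"
read_in <- read.csv(text = csv_text, na.strings = c("", "NA", "9999"))
colSums(is.na(read_in))
```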

To check that all worked well, type:

summary(justice)

You should see something like this:

If all went well, your minimum and maximum values are now within the bounds of your original data, and you have a row of NA's at the bottom of each variable column.
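The visual check with summary() can also be written as explicit tests. A minimal sketch on an invented, already recoded data frame; with the real data you would run it on the numeric item columns of justice.

```r
# Invented data after recoding.
toy <- data.frame(pj1 = c(5, 3, NA), pj2 = c(4, NA, 5))

# TRUE if no 9999 placeholder survived; NA cells are ignored.
no_placeholders <- !any(toy == 9999, na.rm = TRUE)

# Per-column minimum (row 1) and maximum (row 2), ignoring NA.
# These should lie within the response scale of the original items.
ranges <- sapply(toy, range, na.rm = TRUE)

no_placeholders
ranges
```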

4. Loading the analysis packages

R is a very powerful tool because it is constantly expanding. Researchers from around the world are uploading tools and packages that allow you to run fancy new stats all the time. However, the base installation of R does not include them. So we need to tell R which packages we want to use.
For measurement invariance tests, these two are particularly useful: lavaan (the key one) and semTools (important for the invariance tests).

Write this code to download and install the packages on your machine:

install.packages(c("semTools", "lavaan"))

Make sure you have good internet connectivity and that you are not blocked by an institutional firewall. I recently had some problems downloading R packages on a university campus with a strong firewall.

Once all packages are installed, you need to load them before you can run any analyses:

library(lavaan)
library(semTools)


Important: if you have restarted R or RStudio, you need to load these packages again before running any analyses.
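If you work on several machines, a small helper can install only the packages that are missing and then load everything. This is a convenience sketch: load_or_install() is a hypothetical name, not a function from any package, and it still needs internet access whenever something has to be installed.

```r
# Hypothetical helper: install any missing packages, then attach all of them.
load_or_install <- function(pkgs) {
  # requireNamespace() quietly returns FALSE when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # character.only = TRUE lets library() take package names stored as strings
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}

# For this tutorial:
# load_or_install(c("lavaan", "semTools"))
```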

5. Running the analyses

R is an object-oriented language, as I mentioned before, so the analysis is executed in a couple of steps. First, we specify the model by creating a new object that contains all the information. Then we tell R what to do with that model. Finally, we have various options for obtaining the results, that is, the fit indices and parameter estimates as well as other diagnostic information.

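Let's create the model first:

cfa.justice <- '
pj =~ pj1 + pj2 + pj3 + pj4
ij =~ ij1 + ij2 + ij3 + ij4
dj =~ dj1 + dj2 + dj3 + dj4
'
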
This creates the model that we can then work with. The '=~' operator means that the items on the right load on the latent factor on the left. This is what it looks like:
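lavaan model syntax is simply a character string with one line per latent factor: the factor name, =~, and the items joined by +. As a sketch, a hypothetical helper make_cfa_syntax() can build that string for the three justice factors from the item names; typing the string by hand works just as well.

```r
# Hypothetical helper: turn a named list of item vectors into lavaan CFA syntax.
make_cfa_syntax <- function(factors) {
  lines <- vapply(names(factors), function(f) {
    paste(f, "=~", paste(factors[[f]], collapse = " + "))
  }, character(1))
  paste(lines, collapse = "\n")
}

justice_items <- list(
  pj = paste0("pj", 1:4),   # pj1 ... pj4
  ij = paste0("ij", 1:4),   # ij1 ... ij4
  dj = paste0("dj", 1:4)    # dj1 ... dj4
)

cfa.justice <- make_cfa_syntax(justice_items)
cat(cfa.justice)
```

The resulting string can be passed to cfa() as the model, just like a hand-typed one.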

The next command specifies what should be done with the model. In the case of a simple CFA, we can call the cfa() function:

<-cfa(cfa.justice, data=justice)

To identify the model, lavaan by default fixes the loading of the first item on each latent variable to 1. This is convenient, but may be problematic if that item is not a good indicator. An alternative strategy is to fix the variance of each latent variable to 1. In lavaan, this is done by adding the argument std.lv = TRUE to the fit statement:

<-cfa(cfa.justice, data=justice,
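Since the justice data are not public, the sketch below uses the HolzingerSwineford1939 example data that ship with lavaan (three factors, nine items) to compare the two identification strategies. Both describe the same model, so the chi-square and the fit indices should be identical; only the scaling of the loadings and latent variances changes.

```r
library(lavaan)

# Example data shipped with lavaan.
data(HolzingerSwineford1939, package = "lavaan")

hs.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Default: first loading of each factor fixed to 1 (marker variable).
fit.marker <- cfa(hs.model, data = HolzingerSwineford1939)

# Alternative: latent variances fixed to 1, all loadings estimated.
fit.stdlv <- cfa(hs.model, data = HolzingerSwineford1939, std.lv = TRUE)

# Same model, so the fit is the same.
fitMeasures(fit.marker, c("chisq", "df", "cfi"))
fitMeasures(fit.stdlv,  c("chisq", "df", "cfi"))

# The first loading is fixed to 1 in one solution and estimated in the other.
parameterEstimates(fit.marker)[1, c("lhs", "op", "rhs", "est", "se")]
parameterEstimates(fit.stdlv)[1, c("lhs", "op", "rhs", "est", "se")]
```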

This statement runs the analysis, but we still need to request the output.

The simplest way is to use summary again. Here is an option that prints both the fit indices and the standardized parameters.

summary(, fit.measures= TRUE, standardized=TRUE)
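When you only need a few numbers, fitMeasures() and standardizedSolution() return them as objects you can work with further, instead of the full printout. Again a sketch on lavaan's HolzingerSwineford1939 example data; the .50 cut-off for flagging weaker loadings is only an illustrative choice.

```r
library(lavaan)

data(HolzingerSwineford1939, package = "lavaan")
hs.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(hs.model, data = HolzingerSwineford1939)

# Just the indices discussed in this post.
fit_idx <- fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
round(fit_idx, 3)

# Standardized loadings (the Std.all column of summary()) as a data frame.
std <- standardizedSolution(fit)
loadings <- std[std$op == "=~", c("lhs", "rhs", "est.std")]
loadings

# Items with standardized loadings below .50 (illustrative cut-off).
loadings[loadings$est.std < 0.50, ]
```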

Here is a truncated and annotated version of the output:

lavaan (0.5-17) converged normally after  39 iterations
                                                  Used       Total
  Number of observations                          2518        2634
# The following shows the estimator and the chi-square statistics. As can be seen, we have 51 df, but the model does not fit that well.
  Estimator                                         ML
  Minimum Function Test Statistic              686.657
  Degrees of freedom                                51
  P-value (Chi-square)                           0.000

Model test baseline model:
  Minimum Function Test Statistic            20601.993
  Degrees of freedom                                66
  P-value                                        0.000
# The incremental fit indices are, in contrast, quite good. The CFI and TLI should ideally be above .95 (or at least .90). So this model does look good.
User model versus baseline model:
  Comparative Fit Index (CFI)                    0.969
  Tucker-Lewis Index (TLI)                       0.960

Loglikelihood and Information Criteria:
  Loglikelihood user model (H0)             -44312.277
  Loglikelihood unrestricted model (H1)     -43968.949
  Number of free parameters                         27
# The AIC and BIC are useful for comparing models, especially non-nested models. Not the case right now.
  Akaike (AIC)                               88678.555
  Bayesian (BIC)                             88835.998
  Sample-size adjusted Bayesian (BIC)        88750.212
# The RMSEA should be small. Values below .08 are deemed acceptable, values below .05 good. We are doing ok-ish with this one here.
Root Mean Square Error of Approximation:
  RMSEA                                          0.070
  90 Percent Confidence Interval          0.066  0.075
  P-value RMSEA <= 0.05                          0.000
# Another useful lack-of-fit index. The SRMR should be below .05 if possible. This is looking good.
Standardized Root Mean Square Residual:
  SRMR                                           0.037
# Now we have a printout of all the parameter estimates, including the standardized loadings, variances and covariances. Here it is important to check whether loadings are relatively even and strong, and that the variances and covariances are reasonable (e.g., we want to avoid very high correlations between latent variables). It is looking ok overall. Item ij4 may need some careful attention.
Parameter estimates:
  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Latent variables:
  pj =~
    pj1               1.000                               1.202    0.789
    pj2               1.009    0.027   38.005    0.000    1.213    0.790
    pj3               0.769    0.025   31.003    0.000    0.924    0.643
    pj4               0.802    0.024   33.244    0.000    0.963    0.687
  ij =~
    ij1               1.000                               1.256    0.879
    ij2               1.064    0.014   75.039    0.000    1.335    0.957
    ij3               1.015    0.015   68.090    0.000    1.274    0.910
    ij4               0.808    0.022   37.412    0.000    1.014    0.644
  dj =~
    dj1               1.000                               1.211    0.821
    dj2               1.013    0.019   51.973    0.000    1.227    0.871
    dj3               1.056    0.020   52.509    0.000    1.279    0.877
    dj4               1.014    0.021   48.848    0.000    1.228    0.834

Covariances:
  pj ~~
    ij                0.828    0.041   20.435    0.000    0.549    0.549
    dj                0.817    0.040   20.187    0.000    0.562    0.562
  ij ~~
    dj                0.711    0.037   19.005    0.000    0.467    0.467

Variances:
    pj1               0.874    0.036                      0.874    0.377
    pj2               0.889    0.036                      0.889    0.377
    pj3               1.213    0.039                      1.213    0.587
    pj4               1.040    0.035                      1.040    0.528
    ij1               0.464    0.016                      0.464    0.227
    ij2               0.164    0.011                      0.164    0.084
    ij3               0.336    0.013                      0.336    0.172
    ij4               1.448    0.042                      1.448    0.585
    dj1               0.709    0.025                      0.709    0.326
    dj2               0.479    0.019                      0.479    0.242
    dj3               0.489    0.020                      0.489    0.230
    dj4               0.662    0.024                      0.662    0.305
    pj                1.444    0.066                      1.000    1.000
    ij                1.576    0.057                      1.000    1.000
    dj                1.466    0.060                      1.000    1.000
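The indices in this output hang together through a few standard formulas, and recomputing them from the printed numbers is a good way to see what each one measures. A minimal sketch using the textbook definitions; programs differ in small details (some use N - 1 instead of N for the RMSEA), which do not show at three decimals with 2518 cases.

```r
# Numbers copied from the summary() output above.
N       <- 2518        # observations used
chisq_m <- 686.657     # chi-square of the user model
df_m    <- 51
chisq_b <- 20601.993   # chi-square of the baseline (independence) model
df_b    <- 66
ll_h0   <- -44312.277  # log-likelihood of the user model
ll_h1   <- -43968.949  # log-likelihood of the unrestricted model
k       <- 27          # number of free parameters

# The model chi-square is twice the log-likelihood gap between H1 and H0.
chisq_from_ll <- 2 * (ll_h1 - ll_h0)

# Information criteria penalise the number of free parameters.
aic   <- -2 * ll_h0 + 2 * k
bic   <- -2 * ll_h0 + log(N) * k
sabic <- -2 * ll_h0 + log((N + 2) / 24) * k   # sample-size adjusted BIC

# CFI: misfit of the user model relative to misfit of the baseline model.
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)

# TLI: compares the chi-square/df ratios of the baseline and user models.
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# RMSEA: misfit per degree of freedom and per observation.
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * N))

round(c(chisq = chisq_from_ll, AIC = aic, BIC = bic, SABIC = sabic,
        CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
```

Up to rounding in the last decimal, this reproduces the chi-square, AIC, BIC, sample-size adjusted BIC, CFI, TLI and RMSEA printed above.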

However, we want to do an invariance analysis. So far we have collapsed the samples and run an analysis across all groups. This can create problems, especially if the samples have different means (see my earlier blogpost for an explanation of this problem).

The grouping variable should be a factor, that is, a variable that stores the group labels as text. You can also use numeric codes, but then you will need to remember what each number means. In this example, I have data from three samples:

> summary(justice$nation)
     Brazil          NZ Philippines 
        794        1146         694 
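If your grouping variable comes out of SPSS as numeric codes, factor() can attach the labels. A sketch with invented codes; the mapping 1 = Brazil, 2 = NZ, 3 = Philippines is an assumption for illustration only.

```r
# Invented numeric group codes, as they might arrive from SPSS.
nation_code <- c(2, 1, 3, 2, 2, 1)

# Attach labels so output shows country names instead of numbers.
# The code-to-label mapping here is assumed for this sketch.
nation <- factor(nation_code, levels = 1:3,
                 labels = c("Brazil", "NZ", "Philippines"))

levels(nation)
table(nation)   # the same counts that summary() shows for a factor
```

In the real data this would become justice$nation <- factor(justice$nation, ...), using the codes from your own SPSS file.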

It would be informative to see whether the item loadings are similar in each group. To do this, we only need to add group="nation" to our cfa statement:

<-cfa(cfa.justice, data=justice, group="nation")

We can then print the results by using the summary statement again (remember that we have to call the new object for this analysis):

summary(, fit.measures= TRUE, standardized=TRUE) 

I am not printing the output.

Of course, this is not giving us the info that we want, namely whether the model really fits. In addition, we could ask for equal loadings, intercepts, unique variances, etc. I can't go into details about the theory and importance of each of these parameters. I hope to find some time soon to describe this. In the meantime, have a look at this earlier article. 

In R, running these analyses is really straightforward and easy. A single command line will give us all the relevant stats. Pretty amazing!!!!!

To run a full-blown invariance analysis, all you need to do is type this simple command:

measurementInvariance(cfa.justice,
                      data = justice,
                      group = "nation",
                      strict = TRUE)

You can write it as a single line. I just put it on separate lines to show what it actually entails. First, we call the model that we specified above, then we link it to the data that we want to analyze. After that, we specify the grouping variable (nation). The final line requests strict invariance, that is we want to get estimates for a model where loadings, intercepts and unique variances are constrained as well as a model in which we constrain the latent means to be equal. If we don't specify the last line, we will not get the constraints in unique variances. 
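Under the hood, measurementInvariance() fits a sequence of multi-group models with increasing equality constraints. As far as I know, recent semTools releases mark measurementInvariance() as deprecated (pointing to measEq.syntax() instead), so it is useful to know how to fit the same sequence directly in lavaan with group.equal. A sketch on lavaan's HolzingerSwineford1939 data, with school (two groups) as the grouping variable.

```r
library(lavaan)

# Example data shipped with lavaan; 'school' has two groups.
data(HolzingerSwineford1939, package = "lavaan")
d <- HolzingerSwineford1939

hs.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Each step adds equality constraints across the groups.
configural <- cfa(hs.model, data = d, group = "school")
weak       <- cfa(hs.model, data = d, group = "school",
                  group.equal = "loadings")
strong     <- cfa(hs.model, data = d, group = "school",
                  group.equal = c("loadings", "intercepts"))
strict     <- cfa(hs.model, data = d, group = "school",
                  group.equal = c("loadings", "intercepts", "residuals"))
lv_means   <- cfa(hs.model, data = d, group = "school",
                  group.equal = c("loadings", "intercepts", "residuals", "means"))

# Chi-square difference tests between the nested models.
lavTestLRT(configural, weak, strong, strict, lv_means)

# Fit indices per step, for comparing CFI and RMSEA by hand.
fits <- sapply(list(configural = configural, weak = weak, strong = strong,
                    strict = strict, means = lv_means),
               fitMeasures, fit.measures = c("chisq", "df", "cfi", "rmsea"))
round(fits, 3)
```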

Here is the output, but without the strict invariance lines:

Measurement invariance tests:
Model 1: configural invariance:
    chisq        df    pvalue       cfi     rmsea       bic
  992.438   153.000     0.000     0.962     0.081 86921.364
Model 2: weak invariance (equal loadings):
    chisq        df    pvalue       cfi     rmsea       bic
 1094.436   171.000     0.000     0.958     0.080 86882.400
[Model 1 versus model 2]
  delta.chisq      delta.df delta.p.value     delta.cfi
      101.998        18.000         0.000         0.004
Model 3: strong invariance (equal loadings + intercepts):
    chisq        df    pvalue       cfi     rmsea       bic
 1253.943   189.000     0.000     0.952     0.082 86900.945
[Model 1 versus model 3]
  delta.chisq      delta.df delta.p.value     delta.cfi
      261.505        36.000         0.000         0.010
[Model 2 versus model 3]
  delta.chisq      delta.df delta.p.value     delta.cfi
      159.507        18.000         0.000         0.006
Model 4: equal loadings + intercepts + means:
    chisq        df    pvalue       cfi     rmsea       bic
 1467.119   195.000     0.000     0.942     0.088 87067.134
[Model 1 versus model 4]
  delta.chisq      delta.df delta.p.value     delta.cfi
      474.681        42.000         0.000         0.020
[Model 3 versus model 4]
  delta.chisq      delta.df delta.p.value     delta.cfi
      213.176         6.000         0.000         0.009

How do we make sense of this?

Model 1 is the most lenient model: no equality constraints are imposed, and the factor structure is estimated separately in each group. The CFI is pretty decent. The RMSEA is borderline.
Model 2 constrains the factor loadings to be equal. The CFI is still pretty decent, and the RMSEA actually improves slightly. This is because we now have more df, and the RMSEA favours parsimonious models, that is, models with fewer free parameters. The important information comes in the line entitled Model 1 versus model 2. Here we find the difference statistics. The X2 difference test is significant, so strictly speaking we would need to reject model 2 as fitting significantly worse. However, because the X2 difference test is very sensitive to sample size, many researchers treat it with caution and examine other fit indices. One commonly examined index is Delta CFI, that is, the difference in CFI from one model to the next. It should not be larger than .01. In our case, the delta CFI is .004, well within this limit.
We can then compare the other models. The next model constrains both loadings and intercepts (strong invariance). The CFI drops by .006 compared with Model 2 and by .010 compared with Model 1, which is borderline but still acceptable, so we can probably assume that both loadings and intercepts are invariant across these three groups.
In contrast, constraining the latent means shows some larger problems: compared with Model 1, the CFI drops by .020 and the RMSEA rises to .088, and the X2 difference relative to Model 3 (213.176 on just 6 df) is large. The latent means are likely to be different.
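The difference statistics can be recomputed from the printed output, which makes the decision rules explicit. This sketch only uses the numbers shown above; the .01 cut-off for delta CFI is the rule of thumb from the text, not a hard criterion.

```r
# Difference statistics copied from the measurementInvariance() output above.
comparisons <- data.frame(
  comparison  = c("1 vs 2", "1 vs 3", "2 vs 3", "1 vs 4", "3 vs 4"),
  delta_chisq = c(101.998, 261.505, 159.507, 474.681, 213.176),
  delta_df    = c(18, 36, 18, 42, 6),
  delta_cfi   = c(0.004, 0.010, 0.006, 0.020, 0.009)
)

# Chi-square difference test: upper-tail probability of the chi-square distribution.
comparisons$p_value <- pchisq(comparisons$delta_chisq, comparisons$delta_df,
                              lower.tail = FALSE)

# Rule of thumb: a CFI drop larger than .01 signals non-invariance.
comparisons$cfi_ok <- comparisons$delta_cfi <= 0.01

comparisons
```

The chi-square difference tests are significant at p < .001 for every comparison, whereas delta CFI exceeds .01 only for Model 1 versus Model 4.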

6. Further statistics 

In this particular case, the model fits pretty well. However, we often run into problems. If there is misfit, we can either trim the model (drop parameters or variables) or add parameters. To see which parameters would be useful to add, we can request modification indices.
This can be done using this command:
mi <- modificationIndices(

The second line (mi) will print the modification indices. It gives you the expected drop in X2 as well as what the parameter estimates would be like if they were freed.

If we want to print only those modification indices above a certain threshold, let's say 10, we could add the following line:
subset (mi, mi>10)

This will give us modification indices for the overall model. If we want to see modification indices for any of the constrained models, we can request them after estimating the respective model.

For example, if we want to see the modification indices after constraining the loadings to be equal, we can run the following:

metric <- cfa(cfa.justice, data = justice, group = "nation",
              group.equal = "loadings")
modificationIndices(metric)

This will now give us the modification indices for this particular model. 
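The modindices() function (modificationIndices() is the same function under a longer name) has arguments for sorting and for a minimum value, which saves the separate subset() step. One caveat worth checking in the lavaan documentation: as far as I know, modification indices are only computed for parameters fixed to zero, not for equality constraints; releasing an equality constraint, such as an equal loading across groups, is tested with lavTestScore(). A sketch on lavaan's HolzingerSwineford1939 data.

```r
library(lavaan)

data(HolzingerSwineford1939, package = "lavaan")
hs.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(hs.model, data = HolzingerSwineford1939)

# Modification indices of at least 10, largest first.
mi <- modindices(fit, sort. = TRUE, minimum.value = 10)
mi[, c("lhs", "op", "rhs", "mi", "epc")]

# For equality constraints across groups, use the score test instead.
metric <- cfa(hs.model, data = HolzingerSwineford1939, group = "school",
              group.equal = "loadings")
lavTestScore(metric)$uni
```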

There are more options for running constrained models. For example, this line gives the scalar invariant model: 

scalar <- cfa(cfa.justice, data = justice, group = "nation",
              group.equal = c("loadings", "intercepts"))

As you can see, these models replicate the models estimated by the measurementInvariance command above.
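When a fully constrained model does not hold, a common next step is partial invariance: keep most equality constraints and free only the offending ones with the group.partial argument. A sketch on lavaan's HolzingerSwineford1939 data; freeing the intercept of x3 is an arbitrary choice for illustration, and in practice you would base it on score tests or theory.

```r
library(lavaan)

data(HolzingerSwineford1939, package = "lavaan")
d <- HolzingerSwineford1939
hs.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

scalar <- cfa(hs.model, data = d, group = "school",
              group.equal = c("loadings", "intercepts"))

# Partial scalar invariance: all constraints kept except the intercept of x3,
# which is estimated separately in each group.
scalar.partial <- cfa(hs.model, data = d, group = "school",
                      group.equal = c("loadings", "intercepts"),
                      group.partial = c("x3 ~ 1"))

# Freeing one intercept gives back one degree of freedom.
lavTestLRT(scalar.partial, scalar)
```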


I hope I have convinced you that measurement invariance in R using lavaan and semTools is a piece of cake. It is an awesome resource, allows you to run lots of models in no time whatsoever and of course it is free!!!!! Once you get into R, you can do even more fancy stuff and run everything from simple stats to complex SEM and ML models in a single programme. 

More info on lavaan can be found here (including a pdf tutorial). 

I am still in the process of learning how to navigate this awesome programme. If you have some suggestions for simplifying any of the steps or if you spot some mistakes or have any other suggestions... please get in touch and let me know :)
If there are some issues that are unclear or confusing, let me know too and I will try and clarify!
Look forward to hearing from you and hope you find this useful!

Wednesday, December 17, 2014

The beauty of black and white

Marcel Cesar, a friend of mine, nominated me for a five-day black and white challenge on FB. The task was to post one picture per day for five days. He posted some amazing shots, and after seeing those pictures by him and others, I thought I would try to match their brilliance.
It had been ages since I last took BW photos. I had been trained to do all the processing in a darkroom when I was a kid, but after I switched to digital, I hardly used BW settings or converted pictures to BW afterwards. However, when experimenting in the evenings with some shots that I liked or had taken a while ago, I started to remember what had drawn me to photography as a kid. Pictures in BW have a timeless quality: simple shots become art and captured history at the same time. Moments that would pass in a flash suddenly seem to carry a bigger significance and are transformed into instant classics.

I just wanted to give a bit of background to the five pictures that I selected. For me, good photography (and good photos) tells stories. I want to capture a moment that appears significant or beautiful or both. I have memories and thoughts attached to most photos that I take, and I want to share a bit of how I see the world. When I see photos taken by others, I also try to find the stories that they capture or the thoughts that the photographer may have had when taking the shot. We probably all see different stories unfold in these frozen moments, but that is the fun: deciphering somebody else's mind and seeing the world through their eyes.

This is a portrait of Gi, a great friend of mine. We always catch up with each other when I happen to pass through Brazil, and we have done a few photo shoots over the years. This was on a hot Saturday afternoon in Lapa, the bohemian centre of Rio de Janeiro. Lapa, though dizzyingly crowded with party-goers at night and with people hurrying about their business on weekdays, is eerily empty on weekends in the heat of the midday sun. We were just crossing the sun-bleached square in front of the arches when I snapped this picture. For me, the smile and the sun capture two of the most iconic features of Rio.

This was the finishing moment of one of the dance performances during the 2014 Diwali celebration in Wellington. Diwali in NZ has turned into a celebration of Indian culture, with dance and music performances by Indian school and community groups, arts and crafts shops and lots of food stalls. It is a secular public ritual and has quite a different vibe and atmosphere compared to Diwali in India. I like this shot because of the ephemeral grace and humility in the gestures of the dancers.

The sky over Kaikoura was breathtaking. We were preparing for a weekend tramp in the mountains around Kaikoura and walked out to this spot by the beach to enjoy the sun. Some seals were playing in the water and lots of people were out and about on this sunny autumn day. I had a Canon 10D and was playing around with different tone options at that time. Shot in sepia, this picture, for me, captures the simplicity and timeless beauty of NZ.

This is a deeply spiritual picture. I snapped this moment at the end of the street procession during the Vegetarian Festival, a Taoist celebration in Thailand. A spirit medium (ma song) is about to awaken from trance, and the priest (huat kua) is performing the rites to bring the human back. The incense has a deep spiritual significance here, because the central god takes the form of incense smoke.

This photo was originally shot in BW. It is a scene from a market on the outskirts of the old city in Damascus. I was teaching in Beirut, Lebanon in 2005 and went to Damascus for a weekend. Meandering aimlessly through the maze of 5,000-year-old streets, I came out onto this little square where neighbors had organized a small fair with very simple carousels and stalls. These kids happily showed me the goldfish that they had just bought. Simple happiness and pleasures. It is sad to know that these peaceful places have disappeared in the bitter brutality of a civil war.

Wednesday, August 27, 2014

National identity vs genetics: Are haplogroups becoming the new race?

I just got back from the Africa to Aotearoa Project Presentation at the Governor General Residence in Wellington. It is part of Genographic, the National Geographic sponsored project on mapping human ancestry. It was a fascinating evening on several levels. For one, the lack of security at the residence of the de facto head of state of NZ and the legal representative of the Queen in NZ was pretty amazing. A police officer wanted to see our invitation card at the gate and then just waved us through. No security anywhere. Compare that with a single trip to any US or UK embassy...
The Governor General gave a brief speech, followed by a great overview presentation by Lisa Matisoo-Smith, one of the NZ leaders of the project. She gave a nice and easily accessible overview of genetic diversity and the shifts that have occurred since we moved out of Africa. The glacial period and the neolithic transition (especially the invention of agriculture) were periods of major change in our genes. As people moved around the world, further mutations occurred, and it is now possible to track the genetic heritage of groups of individuals. Groups who share a common ancestor, that is, who share similar mutations, form a so-called haplogroup. The haplogroups studied in the Genographic Project are associated with mitochondrial DNA (passed on by mothers) and the Y-chromosome (passed on by fathers). The amazing fact is that this information allows a pretty accurate placement of individuals in terms of their genetic ancestry.

Straightforward genetics and highly fascinating. The current Governor General Sir Jerry Mateparae, the former Governor General Sir Anand Satyanand and Gisborne Mayor Meng Foon all provided their DNA, and a fascinating review of their genetic ancestry was displayed.

However, what cast the whole event in a slightly less positive light was the frequent mention of national identity. New Zealand is one of those places where national identity is highly contested and politically sensitive. Maori as the first settlers have been around for about 750 years, followed by the European colonization project that started about 200 years ago. Both the Governor General and various speakers after him referred to this project as helping to find or determine a national identity for NZ. Some reference was made to the ethnic mixing of people; after all, the haplogroups show how much mingling there has been between individuals and groups on the giant trek out of Africa all the way to the end of the world in the Pacific. However, the labels that were applied to individuals (haplogroup R, M, U, etc.) created little tribes of related individuals. Photographs were taken of the 'families'.

There has been a long and controversial tradition of linking identity to race. Social and biological scientists concerned with identity have been battling the common conception that race is a biologically meaningful concept. The good news is that old-school linkages between race and genetics seem to be waning. But the event tonight seemed to replace this old idea of race with the more 'scientific' and empirical evidence of haplogroups. Despite all the efforts by speakers to stress that we are all mixed, people may start identifying and separating themselves via their ancestral haplogroups. This is the slightly worrying thought for me: that simplified and stereotyped haplogroups become the new race in the definition of group identities. What about genetic testing in the future to determine whether you belong to us or not? Which haplogroup can be a true New Zealander?

Why do we need to link a highly fascinating project on our genetic ancestry to national identity? I thought it was a great evening, with some worrying undertones...

Sunday, August 24, 2014

The motivational basis of personality & why threat is important

How do we describe what other people are like? What are the major characteristics that we can use to describe the personality of friends and strangers? As far as we know, these questions have been discussed since the emergence of ancient civilizations in Greece, India and China. In modern psychology, starting with Gordon Allport, a more or less sharp distinction has been made between values (motivational goals that are important for people in their lives) and personality traits (behavioural consistency across situations). The study of values has flourished in social psychology and the study of personality traits has been a core area of research in personality psychology, with little discussion of the overlap or convergence between the two approaches.

Diana Boer and I were intrigued to test the similarities of these two approaches. We were inspired by newly developed neural network models of personality that conceptualize personality as an expression of basic motivational goal systems. If personality traits are expressions of motivational goal systems, we should see some systematic relationships with values. Furthermore, we were aware of a number of studies that have found quite variable associations between values and personality traits in the Big Five tradition. If there is some systematic association, as we presumed, why should there be such variability in empirical studies?

We set out to address these two broad sets of questions in a paper that is appearing in the Journal of Personality.

We collected all the studies that have reported correlations between any set of Big Five instruments with the circular value theory described by Shalom Schwartz and conducted a meta-analysis to identify the overall patterns that might be obscured in individual studies. Let us first quickly review personality traits and values. 

Overview of Personality & Values

Here is a summary of the Big Five traits from Wikipedia:
Openness to experience: (inventive/curious vs. consistent/cautious). Appreciation for art, emotion, adventure, unusual ideas, curiosity, and variety of experience. Openness reflects the degree of intellectual curiosity, creativity and a preference for novelty and variety a person has. It is also described as the extent to which a person is imaginative or independent, and depicts a personal preference for a variety of activities over a strict routine. Some disagreement remains about how to interpret the openness factor, which is sometimes called "intellect" rather than openness to experience.
Conscientiousness: (efficient/organized vs. easy-going/careless). A tendency to be organized and dependable, show self-discipline, act dutifully, aim for achievement, and prefer planned rather than spontaneous behavior.
Extraversion: (outgoing/energetic vs. solitary/reserved). Energy, positive emotions, surgency, assertiveness, sociability and the tendency to seek stimulation in the company of others, and talkativeness.
Agreeableness: (friendly/compassionate vs. analytical/detached). A tendency to be compassionate and cooperative rather than suspicious and antagonistic towards others. It is also a measure of one's trusting and helpful nature, and whether a person is generally well tempered or not.
Neuroticism: (sensitive/nervous vs. secure/confident). The tendency to experience unpleasant emotions easily, such as anger, anxiety, depression, and vulnerability. Neuroticism also refers to the degree of emotional stability and impulse control and is sometimes referred to by its low pole, "emotional stability".

Shalom Schwartz' theory of values differentiates at least 10 value types that can be organized into two major higher order dimensions. These two major dimensions are openness to change (individualistic) versus conservative (collectivistic) values on the one hand, and self-enhancing (dominance) versus self-transcendence (altruistic) values on the other. Individual values can be ordered in a circular structure along the two dimensions. Moving around the circle, Power (PO) captures the goals of striving towards social status and prestige, controlling or dominating over people and resources. Achievement (AC) emphasises personal success through socially approved standards of competence. Hedonism (HE) values focus on pleasure and sensuous gratification for oneself. Stimulation (ST) captures excitement, novelty and pursuing challenging goals in one’s life. Self-direction (SD) entails valuing independent thought and action. Universalism (UN) values refer to the motivation to understand, appreciate, tolerate and protect the welfare of all people and nature. Benevolence (BE), in contrast, has a narrower focus on preserving and enhancing the welfare of people close to oneself (family and close friends). Tradition (TR) values are focused on respecting, accepting and committing to the customs and ideas of the traditional culture and religion. Conformity (CO) refers to restraining actions or impulses that may upset or harm others and violate social expectations and norms. Finally, security (SE) emphasises values around safety, harmony and stability of society, social relationships and the self.

Values and Personality Traits are systematically linked

Our basic argument was that values as motivational goals and personality traits as behavioural consistencies should be systematically linked. Using a new method that we developed earlier (see our previous article), which tracks the systematic relation of personality traits to the underlying structure of values, we were able to examine the overall relationship between the two constructs.

Schematically, the overall association can be described like in this figure:

Agreeableness and Self-Transcendence (positively) versus Self-Enhancement (negatively) values are strongly and consistently related. Agreeable individuals are also very benevolent, and they tend to care about others, close and distant alike. Similarly, Openness personality traits were strongly associated with Openness to Change (positively) versus Conservatism (negatively) values. Open individuals also strongly value stimulation and self-direction values. The correlations for these two traits with values were substantial and not significantly different from correlations between different personality instruments measuring the same trait. Hence, for these two personality traits, the relationship with values is strong, and values and personality are highly convergent.
Extraversion is more weakly related to Openness to Change values, and also weakly related to Self-Enhancement values (achievement and power). Conscientiousness is related to Conservatism values, but also shows some correlations with Self-Enhancement values (in particular achievement). These two personality traits correlated significantly more weakly with values. They also sometimes showed notable secondary associations with the other main axis of values. This suggests that these two traits are more complex in their basic motivational structure. 
Finally, Neuroticism is a stability-oriented personality trait and, not surprisingly, shows some weak associations with values that ensure stability (Conservatism values). 

This was a reassuring and strong pattern overall. By taking a holistic approach to values and personality across a large number of studies, we were able to show the overall systematic relationships between these two constructs. 

Threat undermines the value-personality trait relationship

Moving to the second main question: how can we make sense of the variation in these correlations across different studies? 
Here we first need to briefly consider how values and personality traits might be causally related. First, values may provide the motivational structure for humans, which is then expressed in behaviour (personality traits). Following this logic, people who value conformity will follow rules and orders with great care. Values come first and actions follow. Second, and alternatively, classic self-perception theory suggests that people might behave in relatively consistent ways. They then interpret these behaviours in terms of overarching goals and re-interpret them as their stable values. For example: I am always conscientious and follow rules and orders in a consistent way; therefore, I am probably a person who values conformity and tradition. Here, actions come first and values are inferred from these actions in a second step. 

We thought about what may weaken this link between values and traits, and concluded that environmental threats should play a major role. Highly threatening environments are marked by poverty, harsh climates such as very cold winters and very hot summers, and many diseases (you can easily get infected and die). They also offer little available food, lots of violence and no personal freedom. In such environments people are probably quite restricted in their choices and behaviours. Hence, their personal values may not be expressed in behaviour, because the environment determines their actions more than their personal orientations and motivations. Equally, since the environment strongly influences people's behaviour, individuals may not interpret their own behaviour as reflecting underlying values. Hence, the overall link between values and personality traits should be weaker, regardless of how values and personality are causally related. 

We tested this hypothesis using a good number of different indicators of threat. Amazingly, threat turned out to be a strong and consistent moderator of most of the value-personality associations. Hence, environmental threat in a broad sense can explain why previous studies found so much variability in these associations. 

Exploring Personality Systems

It could also explain why researchers have not integrated personality research with value research in a more systematic way. Given the substantial variability, researchers might have thought that the links are too weak and too inconsistent to be worth pursuing. However, we believe we have shown that both value and personality research can move forward. That requires taking a broader view, examining the value-personality link more systematically and identifying the conditions under which the links are stronger or weaker. 

In my view, values and personality traits are expressions of underlying motivational systems that are encoded in similar and overlapping language. It is more of an accident of history that these two systems have emerged in different research fields. I hope this study will help to bring the fields back together and allow a more sophisticated examination of how values and traits are both expressions of human personality. 

Sunday, July 20, 2014

A crisis in cultural psychology? Lack of replications, bias & publication pressures

Social psychology is facing an existential crisis. Ype Poortinga and I took the opportunity to examine how cross-cultural psychology fares in comparison. What is the background? A collective drive to present novel, sexy and sensational findings has propelled social psychology into a minefield of public mistrust and claims of being a pseudoscience. The list of sins in the eyes of the public is long. Central methods at the core of the discipline, such as priming, have been challenged. The drive to find significant differences has led to a neglect of the meaningfulness of psychological findings. Publication pressures have opened the door to unscientific data massaging. Most notoriously, glamorous stars of the discipline have been found to fabricate their data. Since the now infamous Stapel affair, there has rarely been a month when the field was not in the spotlight of public and internal scrutiny. This series of events has led to some agonizing soul-searching among psychologists.

Addressing methodological vulnerabilities in research on behavior and culture

Ype Poortinga and I used the opportunity of the 22nd International Conference of Cross-Cultural Psychology (organized by IACCP) to critically examine how cross-cultural psychology, as a sister discipline of social psychology, is faring. We assembled an A-list of leading cross-cultural psychologists and former editors of the flagship journal for research on culture and psychology (Journal of Cross-Cultural Psychology). Our instructions were simple: we asked them to critically evaluate the methods of our field and to comment on how our field may move forward. Ype and I also provided a summary of our own concerns about the state of the field. The session was exceptionally well attended, and the panel created a lively debate and exchange of views with each other and with the audience. This was particularly remarkable given the technical challenges and the double booking of the room. There was also the incredible heat, the lack of seats and the lack of oxygen in the late afternoon (it felt like a 2-hour sauna session). I have received quite a few requests for our slides, so I am summarizing some key points from our introductory presentation and from the talks by Peter Smith, Johnny Fontaine and David Matsumoto. I also summarize the discussion that followed the presentations. Finally, I will outline some ideas for the next steps that we are considering.

Poortinga and Fischer: Why questionable null-hypotheses and convergent search for evidence erode research on behavior and culture

Null hypothesis significance testing is the modus operandi for conducting research in psychology overall. At the same time, it has come under increasing pressure and scrutiny. A few quotes from recent papers illustrate the various problems with the state of psychology:

Ioannidis (2005): “[A] research finding is less likely to be true when … when effect sizes are smaller; when there is … lesser preselection of tested relationships; … greater flexibility in designs, definitions, outcomes, and analytical modes; … and when more teams are involved in a scientific field in chase of statistical significance”
Vul et al. (2009) report on “voodoo correlations” in fMRI: “We show how … nonindependent analysis [of voxels] inflates correlations while yielding reassuring-looking scattergrams”
Simmons et al. (2011) on “false-positive psychology”: “… flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not”.

The experimental research paradigm, with its emphasis on null-hypothesis significance testing, is particularly problematic in cross-cultural psychology. Some of the basic assumptions of experimental design are violated by default:

a) There is no random assignment of respondents to conditions and

b) The experimenter has little control over conditions and ambient events.

This figure shows these problems in a nice way and clearly highlights that cross-cultural studies do not even meet the conditions for good quasi-experimental designs and have significant

Further challenging current experimental practices, Simmons, Nelson and Simonsohn (2011) eloquently exposed the problems of researcher degrees of freedom. They showed the impact of innocent-looking research practices on significance levels. They demonstrated how a logically impossible hypothesis (listening to songs about age will decrease the age of listeners) can be empirically supported. In other words, by using questionable research practices they discovered the psychological equivalent of the proverbial fountain of youth. The following figure shows the outcomes of their simulation study and the impact of four researcher degrees of freedom on significance levels. We highlighted the relevance of these conditions for cross-cultural research. 

First, if culture is by definition a shared meaning system, any two cultural variables will be correlated to a significant degree. The very nature of the phenomena under investigation makes finding significant differences more likely. This non-independence and its negative impact on significance testing are well recognized in methods circles, but not well understood in general cross-cultural research circles.

Second, a researcher may add 10 more observations or cases to the study if a first examination did not reveal any significant differences. This is probably a more common practice in cultural priming studies, but may be less of an issue in comparative survey studies.

The third questionable practice is controlling for third variables, especially if their impact is not theoretically grounded. In their case study, Simmons et al. used gender as an example. In cross-cultural psychology, the covariates are often GDP at the country level or some demographic variables at the individual level. This is a double bind for cross-cultural psychology. On the one hand, we need to control for other variables that may explain any differences between samples. On the other hand, these simulations demonstrate that such practices have a sizable impact on significance levels.

The last questionable practice is to drop (or not drop) one of the conditions. The equivalent in cross-cultural psychology is to omit samples that do not fit the expected pattern (outlier removal). Judging from conversations with other researchers, this seems to be a common practice.

Each of these practices on its own increases the likelihood of finding significant results only modestly, but in combination they substantially inflate the rate of significant results. In Simmons et al.'s simulations, combining all four practices produced a significant result at the magical .05 level about 60% of the time, even though there was no true effect. Based on conversations with colleagues and observations of publication trends, these practices are common in cross-cultural psychology. This means that we probably need to question a good number of published empirical findings!
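The inflation described above can be reproduced in miniature. The sketch below is a simplified Monte Carlo simulation of only two of the four practices: choosing among two correlated outcome measures, and adding 10 observations per group when the first test is not significant. It is not the simulation code used by Simmons et al. Several settings are assumptions chosen to loosely resemble their setup: 20 cases per group to start, 10 extra cases, and a correlation of .5 between the outcomes. The sketch also uses a z-test with known standard deviation in place of a t-test, to stay within the Python standard library. Both groups are drawn from the same population, so every significant result is a false positive.

```python
import random
from math import erfc, sqrt
from statistics import fmean

R = 0.5  # assumed correlation between the two outcome measures


def p_value(a, b, sd):
    """Two-sided z-test for a mean difference with known SD (a simplification of the t-test)."""
    z = (fmean(a) - fmean(b)) / (sd * sqrt(2 / len(a)))
    return erfc(abs(z) / sqrt(2))


def draw(rng, n):
    """n participants with two outcomes correlated at R; no true group difference."""
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((z1, R * z1 + sqrt(1 - R * R) * z2))
    return rows


def any_significant(g1, g2, alpha):
    """Flexible analysis: test outcome 1, outcome 2 and their average; succeed if any p < alpha."""
    avg_sd = sqrt((1 + R) / 2)  # SD of the average of two unit-variance outcomes
    tests = [
        p_value([r[0] for r in g1], [r[0] for r in g2], 1.0),
        p_value([r[1] for r in g1], [r[1] for r in g2], 1.0),
        p_value([(r[0] + r[1]) / 2 for r in g1], [(r[0] + r[1]) / 2 for r in g2], avg_sd),
    ]
    return min(tests) < alpha


def simulate(n_sims=10_000, n_start=20, n_extra=10, alpha=0.05, seed=1):
    rng = random.Random(seed)
    planned = flexible = 0
    for _ in range(n_sims):
        g1, g2 = draw(rng, n_start), draw(rng, n_start)
        # Pre-planned analysis: one outcome, fixed sample size
        planned += p_value([r[0] for r in g1], [r[0] for r in g2], 1.0) < alpha
        # Flexible analysis: pick among outcomes, add participants if nothing is significant
        hit = any_significant(g1, g2, alpha)
        if not hit:
            g1 += draw(rng, n_extra)
            g2 += draw(rng, n_extra)
            hit = any_significant(g1, g2, alpha)
        flexible += hit
    return planned / n_sims, flexible / n_sims


planned_rate, flexible_rate = simulate()
print(f"single planned test: {planned_rate:.3f}, flexible analysis: {flexible_rate:.3f}")
```

With these settings the single planned test should stay close to the nominal 5%, while the flexible analysis lands noticeably above it. Adding the other two practices (covariates, dropping conditions) would push the rate up further.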

A further issue is that the null hypothesis of no difference is likely to be rejected if there is a difference on any third variable that is related to the dependent variable. In such instances, there is a high rate of Type 1 errors (false positive results). One pressing issue is method bias. In questionnaire studies, response biases such as acquiescence or yes-saying are particularly salient.

The next figure shows the probability of finding a significant result as a function of sample size and the size of the bias. Each line represents a different level of bias, expressed as a fraction of the standard deviation. If the bias is small (e.g., 1/16th of the standard deviation on the DV), increasing the sample size does not increase the probability of finding a significant effect by much. However, when the bias approaches a quarter of a standard deviation, the probability of finding a significant effect in a sample of 100 participants approaches 60%. You may argue that ¼ of a standard deviation is large. However, it is not an unrealistic scenario given the prevalence and extent of response styles in questionnaire research. See, for example, our earlier research showing that response styles produce effects larger than those found in one third of theoretically important research studies.
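The logic behind this figure can be approximated analytically. The sketch below makes several assumptions: two equally sized groups, normally distributed scores with a known standard deviation of 1, and no true difference. A constant method bias (e.g., acquiescence) shifts one group's mean by `bias_sd` standard deviations. The sketch then computes how often a z-test would declare the groups different. The exact percentages depend on choices the text does not specify, such as one- vs two-sided tests and whether n counts participants per group or in total. The numbers are therefore not meant to reproduce the figure.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def prob_significant(bias_sd, n_per_group, alpha=0.05, two_sided=True):
    """Probability that a two-group z-test rejects 'no difference' when the only
    difference between groups is a method bias of `bias_sd` standard deviations."""
    # Expected z statistic: bias divided by the standard error of the mean difference
    shift = bias_sd * sqrt(n_per_group / 2)
    if two_sided:
        z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(shift - z_crit)


for bias in (1 / 16, 1 / 8, 1 / 4):
    row = [f"{prob_significant(bias, n):.2f}" for n in (25, 50, 100, 200, 500)]
    print(f"bias = {bias:.4f} SD: " + "  ".join(row))
```

Under these assumptions, a bias of 1/4 SD with 100 participants per group gives about a 42% chance of a significant two-sided result (about 55% one-sided). A bias of 1/16 SD stays below 10% even at that sample size.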

These two simulation studies suggest that cross-cultural differences might be spurious and driven by method effects. In addition, our field seems to be driven by differences and places undue emphasis on them, without questioning their validity. The next figure shows the emphasis on differences and the lack of studies hypothesizing and finding similarities. This graph is adapted from a review by Brouwers and others, published in JCCP in 2004. As can be seen, the majority of studies expected differences only (N = 55), and only 25 studies expected both differences and similarities. At the same time, 57 studies found both. Most importantly, given the laws of probability, we should also see studies that expect and report only similarities. Yet Brouwers and colleagues did not find a single study that either hypothesized or reported similarities only. Where are these studies?

The points raised so far should not be understood as a challenge to the experimental methods underlying comparative research. Rather, we urge our colleagues to critically question some of our designs and analytic procedures. In the larger experimental literature, a number of strategies have been proposed, including:

- stricter designs (larger n for more power; Button et al., 2013)
- stricter analysis (p < .005; Johnson, 2013)
- prevention of experimenter bias (O. Klein et al., 2012)
- more transparency (e.g., pre-registration of hypotheses)
- replication across multiple researchers and labs (R. A. Klein et al., 2014)

We see it as a good sign that replication studies have achieved new status. For example, our lab once tried to replicate Schwartz's culture-level value structure using data from the Rokeach Value Survey, and we faced a real uphill battle in getting it published. What finally got it published seemed to be the appearance of a new value type that was not evident in the earlier Schwartz circle (a replication of this new value type is still outstanding). In my opinion, the new emphasis on replication is a major achievement. The first findings of this new wave of replications are coming in. For example, the following graph shows the replication success of a number of studies in the ‘many labs’ replication report. Some of the older studies hold up well to scrutiny, but many of the newer findings, in particular priming studies, do not replicate.

It is also noteworthy that the original dissemination of these findings emphasized the lack of cross-cultural differences in the patterns. Some commentators were quick to jump on that and suggested that careful and experimentally strict replications will do away with cross-cultural differences. We may want to challenge such an assumption, but these comments clearly demonstrate that we as cross-cultural and cultural psychologists need to engage with the replication debate. We cannot sit back and pretend that the replication crisis does not affect us!

Replications vary along an underlying dimension, with exact replications at one end and conceptual replications at the other. The conventional experimental wisdom is to prioritize exact replications, or to stick as closely as possible to the original designs (close replications). Such replications should use large sample sizes to have high power to detect effects. Of course, exact replication in a cross-cultural context is problematic because participants live under different cultural conditions.

However, an even more important point for us is that the presence of bias (e.g., response styles, speed-accuracy trade-offs) challenges the validity of exact or close replications. A replication of a biased study is a replication of a biased study.

In addition, if we have two samples and we define one sample as belonging to X culture and the other sample as belonging to a Y culture (this could be anything: collectivistic vs individualistic; independent vs interdependent self-construals, honour vs dignity; holistic vs analytic thinking), then any difference on whatever variable will be statistically related to the presumed X-Y difference. Therefore, replications in cross-cultural psychology need to be positioned towards the conceptual replication end and require additional methodological safeguards.

We suggest that cross-cultural replications need to:

- ensure the validity of procedures in the local context
- check the postulated antecedent empirically (identify the theoretical process that is likely to drive the expected patterns, and test this process empirically)
- include manipulation checks (including a “no-difference” condition: on what variable or set of variables do we NOT expect a difference?)
- control for likely alternative explanations (e.g., response styles).

In summary, cross-cultural psychology suffers from many of the same shortcomings that have created the crisis in social psychology. This cartoon offers a somewhat humorous account that borrows from Dante’s version of hell (by Neuroskeptic, published in Perspectives on Psychological Science). Our research culture emphasizes differences instead of similarities, and this leads to a state of limbo, overselling and post-hoc story-telling. Our narrow orientation towards ghost-in-the-machine variables (such as collectivism, self-construals and values) leads to overselling, post-hoc story-telling and p-value fishing. Everything needs to be explicable by single dimensions, typically of personal relations or self-construals. In my experience of publishing cross-cultural research, nearly any difference can, with a bit of theoretical creativity, be related back to individualism-collectivism, self-construals or any of the other fashionable constructs these days. These biases in orientation then combine with researcher practices and researcher degrees of freedom, and the result is p-value fishing and creative outlier utilization. Of course, the absence of no-difference studies suggests a significant file drawer problem.

Our suggestions are therefore:

Better designs (including efforts to reduce bias, testing of alternative theoretical processes, etc.)

Planned replications

Depositing hypotheses and methods in a public archive prior to data collection

Peter Smith: To understand cultural variation let’s sample cultural variations

Peter Smith and colleagues suggested a rather straightforward approach for addressing some of the concerns. Their recommendation was to go beyond two-culture comparisons and to sample cultural variation more broadly. For example, they proposed studying multiple Asian and non-Asian samples that are typically lumped together as collectivist, interdependent, holistic, etc. In addition, Peter and colleagues included more diverse instruments capturing conceptually similar constructs, so that they could examine variability in the intended constructs across a broader range of instruments. Peter presented some preliminary data that supported the usefulness of this approach. However, he also acknowledged that the current study has some important limitations. It studies students, it does not yet have enough samples to properly examine effects (e.g., through multi-level modeling), and it places a high demand on participants (e.g., completing long sets of questionnaires).

Johnny Fontaine: A plea for domain representation

Johnny presented a more technical account of domain representation that examined the meaning of constructs across a larger number of languages and cultural contexts. Using examples from the emotion domain, he showed that we can avoid confusion and biases in meaning through the use of sophisticated non-metric statistical methods in combination with elaborate designs that allow separating situational and personal characteristics. His approach demands a theoretical analysis of possibly important variables that need to be incorporated into the research design. Johnny really got the methods guns blazing in his presentation and I have to admit that the heat of the room by that time had fried my brain. As a consequence, I was not able to follow all the intricate steps in the procedure and not having a seat did not allow me to take good notes (but the graphs looked very convincing). He is working on a manuscript detailing the procedures and I am certainly looking forward to reading it when it is ready.

David Matsumoto: Random thoughts about methodological vulnerabilities in research on behavior and culture.

David widened the scope of the symposium by focusing on the broader research climate in culture and psychology. Most people in the overheated room will have appreciated his first request: before he started talking, he asked everyone to stand up from their seats. Beyond bringing some oxygen into our brains, this also became a beautiful point of reference for his short and sharp presentation. Here are his three main arguments (my paraphrasing):

Point 1: Study behavior

Point 2: Respect the literature

Point 3: The current pressures on young academics make following recommendations 1 and 2 challenging

The first point is obvious: our discipline confuses self-, other- and peer-reports of behavior with behavior. I do not have hard stats here, but from memory, I cannot remember a single cross-cultural social psychology study in JCCP in the last year or two that studied actual behavior. He pointed out that everyone had stood up when he asked at the beginning of his presentation: a success rate of 100%. In contrast, people could instead have been asked whether they would stand up in a seminar room when asked by the presenter (e.g., on a scale from 1-7). Their answers would have shown significant variability, and the mean would definitely have been lower than 100%. Drawing upon his own research on emotion display, he argued that triangulation of research methods is necessary.

The second point highlights the importance of getting to know previous research. In the current research environment, researchers need to present novel results and theories. There is no incentive for reading (and no penalty for not reading) older research that may have been conducted 10 or 40 years ago. Journal editors are keen to get citations to recent papers to increase the journal’s Impact Factor. Yet, this leads to impoverished and non-cumulative research.

The last point highlights the constraints that pre-tenure researchers face: more publications in less time. Studies of behavior are time-consuming and therefore less appealing. Reading relevant literature in one’s own field or in neighboring disciplines also takes time away from writing articles and funding applications. David emphasized that IACCP has a richer intellectual tradition than that of many mainstream researchers who have discovered culture and now publish in high-impact journals.

Some of the discussion points

The discussion returned repeatedly to a number of points. I will try to summarize some of the key ones that stood out for me.

Representativeness of samples: One key concern that came up repeatedly was that studying students is not appropriate for making claims about cultural processes. Students are not good representatives of the larger population.

Studying nations: One early comment that drew spontaneous applause from the audience was that psychology has failed in studying culture. Instead, psychologists are studying nations. Yet, nations are highly diverse and potentially consist of many subcultures. Various other commentators picked up similar themes throughout the discussion. A related issue was the relative emphasis on between-country/culture differences and the lack of attention to within-country/culture differences. Both Geert Hofstede and Shalom Schwartz were in the audience, but they remained silent. It would have been nice to hear their responses to some of these comments (both have done some interesting work that would have been informative in this debate).

Lack of strong theory: Peter Richerson argued that psychologists lack strong theory and recommended looking to neighboring disciplines such as biology for inspiration. David Matsumoto defended psychology in his response, suggesting that psychology has some good theories. But he also added that we need truly exploratory work that can understand phenomena on their own terms. My thought on this is that we do not have enough strong theory (from a philosophy of science perspective). Exploratory research that attends to various alternative explanations may bring us closer to developing stronger theories of culture. This could include allowing for the possibility of no differences, and attending to alternative processes beyond the usual suspects in current psychological thinking on culture.

Validity of findings: One point that occurred in various disguises in a number of comments was the importance of validity of findings in the local context. Amina Abubakar was the first to get this point across in the debate: To what extent can cultural psychology and cross-cultural research as a method of choice yield insights into the minds and behaviours of people in a specific context? How applicable and relevant is cross-cultural research for people around the world? This is a major question and needs some serious contemplation as we face a rapidly changing world and need to collectively respond to multiple pressing challenges (e.g., increasing intergroup conflict, climate change, decreasing natural resources).

Next steps

An immediate opportunity following this debate arose the day after the round-table discussion. Ype challenged the assembly, arguing that methods issues need more attention. In response, Walt Lonner, the founding editor of JCCP, suggested a methods-oriented special issue for JCCP. We had a discussion during the coffee break and he invited us to write a proposal for a special issue. Any thoughts on topics and contributors for such a special issue addressing the methods challenges are very welcome (please flick me an email or respond below; I would love to hear from you).

Looking at some other associations (APS comes to mind here), we could adopt some of their criteria for publication – there have been some interesting suggestions and changes in policies recently. Even JPSP now publishes replications (hooray!!!!!!)!

Overall, I think that the change in research climate is promising. There has never been a better time to discuss how we collectively do research. There is much promise of change in the air, and I strongly believe that collectively we can make a positive change. Without this conviction, we would not have had the symposium. Nor would such a large crowd have braved tropical temperatures and horrible conditions in the late afternoon to debate a topic so passionately. I felt humbled by the enthusiasm of the audience and the positive comments that we received over the next couple of days. I look forward to continuing this debate and hearing your opinions and suggestions!