Monday, April 14, 2014

How to do a pancultural factor analysis - a simple option

I am going to demonstrate a simple way of running, in SPSS, what the cross-cultural literature often calls a pan-cultural or culture-free factor analysis (even though I do not like those terms). In the methods literature, this is also sometimes called a pooled-within analysis.

The basic problem is: How can you analyze the data from a large number of samples in an efficient way without giving priority to any data set? This is particularly interesting when you deal with data from lots of different cultures and you would like to find a solution that is averaged across all samples or 'culture-free' - capturing the average human being.

Such a solution could be interesting in its own right. It can also be useful as a reference structure for further Procrustean analyses (see my earlier blog post here).

Let's work with an example. I took the 1995 World Values Survey scores for Morally Debatable Behaviour (see a published analysis of the data here).

You will need to create the average (pooled within-groups) correlation matrix first. The simplest way in SPSS is via Discriminant Function Analysis. Go to 'Classify' (under 'Analyze') and select 'Discriminant'. Transfer the variables that you want to analyze into the variables box (labelled 'Independents'). Then transfer your cluster or independent variable (your samples from different countries or cultures) into the 'Grouping Variable' box. Click 'Define Range' to tell SPSS the range of your country/sample codes. In this case, the first sample is 1 (France) and the last sample in the data set is 101 (Bosnian Serb sample).




To request the average correlation, click on 'Statistics'. There, under 'Matrices', tick 'Within-groups correlation' (this is the pooled-within correlation). We do not need much else right now, so click 'Continue' and 'OK'. In the output, you will see the table with the pooled within-groups correlation matrix right after the lengthy group statistics.
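
If you prefer syntax to menus, the steps above can be sketched roughly as follows. This is a minimal sketch, not the pasted output of the dialog: the grouping variable name country is an assumption (use whatever your sample code variable is called), and the variable names are assumed to match the ones used in the rest of this post. The CORR keyword on STATISTICS requests the pooled within-groups correlation matrix.

* Sketch only: 'country' is an assumed name for the sample code variable.
DISCRIMINANT
  /GROUPS=country(1 101)
  /VARIABLES=benefits publictransport tax stolengoods bribe homosexual prostitution abortion divorce euthanasia suicide
  /ANALYSIS ALL
  /STATISTICS=CORR.

The discriminant functions themselves are of no interest here; the command is only a convenient way to get SPSS to print the pooled matrix.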

There are two options now. Either way, you need to get the correlation matrix from your output into an SPSS matrix data file.
One option is to open a syntax file in SPSS and type the following command, including the correlation matrix from your output (lower triangle with the diagonal) as well as the overall N:

MATRIX DATA VARIABLES=benefits publictransport  tax stolengoods bribe homosexual prostitution abortion divorce euthanasia suicide
/contents=corr
/N=84887.
BEGIN DATA.
1.000
.434 1.000
.422 .516 1.000
.329 .429 .427 1.000
.338 .410 .428 .482 1.000
.232 .232 .244 .239 .267 1.000
.216 .249 .247 .274 .282 .544 1.000
.218 .238 .248 .266 .256 .334 .424 1.000
.204 .259 .252 .268 .273 .286 .355 .492 1.000
.220 .216 .235 .220 .233 .308 .295 .315 .327 1.000
.180 .210 .213 .239 .231 .275 .323 .315 .314 .430 1.000
END DATA.
EXECUTE.

Once you have it all typed out (or copied from SPSS), highlight it all and press the Play button (or 'Ctrl' + 'R').
A new SPSS window will open (it is probably best to save this new data file with a proper name). As you can see in this picture, it looks a bit different from your average SPSS data spreadsheet.



The first two columns are special string variables (ROWTYPE_ and VARNAME_). The first row contains the sample size.

If you don't want to use the syntax, the other option is to create this SPSS data file directly. The first variable in the SPSS matrix file is called ROWTYPE_ (specify it as a string variable) and identifies the content of each row of the file (N for the sample size row and CORR for the correlation rows in this example). The second variable is called VARNAME_ (again, specify it as a string variable) and contains the variable name corresponding to each row of the matrix. The FACTOR procedure also expects a row of sample size (N) values preceding the correlation matrix rows. Then type or copy the full correlation matrix.


We are nearly ready for the analysis. Unfortunately, SPSS does not let you factor-analyze a matrix file via the graphical interface, so you need to use syntax (again) to run the analysis.

Type the following command into the same syntax window (it will run a standard PCA on the matrix in the active data file, retain components with eigenvalues greater than 1, apply Varimax rotation, draw the scree plot, sort the factor loadings and suppress loadings smaller than .3):

FACTOR MATRIX=IN(COR=*)
  /PRINT INITIAL EXTRACTION ROTATION
  /FORMAT SORT BLANK(.3)
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION VARIMAX
  /METHOD=CORRELATION.


Again, highlight the whole FACTOR command and hit the Play button (or 'Ctrl' + 'R'). You should see the output of the factor analysis based on the average correlation matrix. As you can see in the output, there are two factors that correspond to the 'socio-sexual' and the 'dishonest-illegal' factors. The scree plot and Kaiser's eigenvalue > 1 criterion both support a two-factor solution.


Now you can either interpret this factor structure in your report or use it as a reference for further comparisons against each of the samples.
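
For the second use, you need a factor solution for each individual sample to compare against the reference. One possible way to get these, run on the original raw data file rather than on the matrix file, is sketched below. It is only a sketch under assumptions: country is again an assumed name for the sample code variable, and the number of factors is fixed at two to match the pooled solution. It does not perform the Procrustean rotation itself, and with this many samples the output will be long.

* Sketch only: run on the raw data; 'country' is an assumed variable name.
SORT CASES BY country.
SPLIT FILE SEPARATE BY country.
FACTOR
  /VARIABLES benefits publictransport tax stolengoods bribe homosexual prostitution abortion divorce euthanasia suicide
  /PRINT EXTRACTION ROTATION
  /CRITERIA FACTORS(2) ITERATE(25)
  /EXTRACTION PC
  /ROTATION VARIMAX.
SPLIT FILE OFF.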

Voila!



