FAQ/restwb - CBU statistics Wiki
Self: FAQ/restwb

The VARSTOCASES and CASESTOVARS commands

The below can also be performed using the RESTRUCTURE command under the DATA menu in SPSS. For the example below we wish to stack columns x and y to make a single column called time. This is particularly useful for converting data from a format suitable for repeated measures to one suitable for multilevel modelling in SPSS. Notice we need to sort the cases before reformatting the data.

DATA LIST FREE / idno x y z a . 
BEGIN DATA 
001 34 56 23 45
002 23 45 43 22
003 39 41 11 22
004 33 11 22 55
005 87 57 66 56
006 99 36 67 78
END DATA. 

SORT CASES BY IDNO.
VARSTOCASES 
 /MAKE time FROM x y z a
 /INDEX = newfact "new var label" (time) 
 /KEEP = idno.

RECODE newfact (CONVERT) ('x'=1) ('y'=2) ('z'=1) ('a'=2) INTO wit.
RECODE newfact (CONVERT) ('x'=1) ('y'=1) ('z'=2) ('a'=2) INTO cond.
EXECUTE.

Use the /KEEP or /DROP subcommands to specify which variables (not created by the restructuring) are to be saved into the new file. The default is to keep all the variables in the original data file ie drop none. The RECODE is needed in the above to convert the within subjects factor, newfact, which has a string format into one or more numeric factors, in this example these are wit and cond.

To add id number (e.g. for performing Tukey's HSD with UNIANOVA) you need to use the syntax below. In this example the subject number is placed in the column called sub assuming there are 3 subjects (vary depending on the number of subjects and put equal to nsub in the below).

compute nsub=3.
COMPUTE SUB = trunc(1+($CASENUM - mod($casenum-0.01,nsub))/nsub).
exe.

To do the reverse operation we need to use the CASESTOVARS command in conjunction with SORT CASES as below:

DATA LIST LIST/sub score.
BEGIN DATA                
1 34                    
1 56                    
2 23                    
2 45                    
3 39                    
3 41                    
4 33                   
4 11                    
5 87               
5 57
6 99
6 36    
END DATA.                 
LIST.                     

COMPUTE num=1.                       
IF sub = LAG(sub) num = LAG(num)+1.
LIST.  
                                                                   
SORT CASES BY sub num.             
CASESTOVARS                          
 /ID = sub                           
 /INDEX = num
 /AUTOFIX=NO.                                      
LIST.                                

By default all variables are saved into the new file. The /DROP subcommand may be used to remove variables which are not required after restructuring. There is no /KEEP subcommand available for CASESTOVARS. You may need to average over nuisance factors prior to using CASESTOVARS using the AGGREGATE command.

NB: The CASESTOVARS command can omit reformatting columns with a large number of missing values or columns which immediately follow a column or columns with a lot of missing values. The AUTOFIX subcommand used above corrects this and produces reformatting of all columns including those omitted due to containing large numbers of missing values or immediately preceding columns with large numbers of missing values.

You can produce more columns indexed by all possible combinations of two ro more factors (num and num1) by just adding these to the SORT CASES and /INDEX subcommand as in the example below which produces four columns for the four combinations of factors num and num1.

DATA LIST /sub 1 score 3-4 num 6 (a) num1 8 (a).
BEGIN DATA                
1 34 a a                    
1 56 a b                  
1 23 b a                  
1 45 b b                  
2 39 a a                  
2 41 a b                  
2 33 b a                 
2 11 b b                      
END DATA.                 
LIST.              

SORT CASES BY sub num num1.             
CASESTOVARS                          
 /ID = sub                           
 /INDEX = num num1
 /AUTOFIX=NO.                                      
LIST.

It is assumed when using the CASESTOVARS command that there are no repetitions of each combination within each subject. If this is the case, and repetition is not of interest, one can, for example, use the AGGREGATE command (as below) to take an average of these two combinations aggregating over num and num1 and then remove duplicate entries (see syntax following the AGGREGATE example).

DATA LIST /sub 1 score 3-4 num 6 (a) num1 8 (a).
BEGIN DATA                
1 34 a a                    
1 56 a b                  
1 23 a b                  
1 45 b b                  
2 39 a a                  
2 41 a b                  
2 33 b a                 
2 11 b b                      
END DATA.                 
LIST.       

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=num num1 sub
  /score_mean=MEAN(score).

Then going to Data>Identify Duplicate cases and entering 'score_mean' in the 'Define Matching Cases by' box will then create a indicator variable as a column flagging duplicate rows as a '0'. These extra rows can the be removed using 'Select cases' to select cases whose values are non-zero. The data thus obtained can then be formatted into repeated measures format using CASESTOVARS as earlier.

More examples of using VARSTOCASES and CASESTOVARS are given here.

None: FAQ/restwb (last edited 2015-08-11 08:38:08 by PeterWatson)