SAS macro for computing polychoric correlations
The following macro, which is downloadable from the SAS website, may also be copied and pasted into a SAS file of form *.sas which can be included for use in a factor analysis using SAS using syntax as below.
PROC IMPORT OUT= WORK.WHEATON
DATAFILE= "C:\Documents and Settings\peterw\Desktop\My Documents\EQS FILES STAI\TRAIB_CC.TXT"
DBMS=TAB REPLACE;
GETNAMES=YES;
DATAROW=2;
%inc 'C:\Documents and Settings\peterw\Desktop\My Documents\My Documents2\JOE HERBERT\POLYCHORIC MACRO.sas';
%polychor(var=T_Q_21 T_Q_22 T_Q_23 T_Q_24 T_Q_25,out=dist)
proc print; run; The polychoric correlation macro is below - which is assumed in the above to be put into a file called polychoric macro.sas.
/********************************************************************** |
* ** |
| |
| %POLYCHOR macro |
| Version 1.2 |
| |
| DISCLAIMER: |
| THIS INFORMATION IS PROVIDED BY SAS INSTITUTE INC. AS A SERVICE |
| TO |
| ITS USERS. IT IS PROVIDED "AS IS". THERE ARE NO WARRANTIES, |
| EXPRESSED OR IMPLIED, AS TO MERCHANTABILITY OR FITNESS FOR A |
| PARTICULAR PURPOSE REGARDING THE ACCURACY OF THE MATERIALS OR |
| CODE |
| CONTAINED HEREIN. |
| |
| PURPOSE: |
| The POLYCHOR macro creates a SAS data set containing a |
| correlation |
| matrix of polychoric correlations or a distance matrix based on |
| polychoric correlations. |
| |
| REQUIRES: |
| %POLYCHOR requires only Version 6.07 or later of base SAS |
| Software. |
| |
| USAGE: |
| Before calling the POLYCHOR macro, you must first define the |
| macro in |
| your current SAS session. You can do this either by copying this |
| file |
| into the SAS program editor and submitting it, or by using a |
| %INCLUDE |
| statement containing the path and filename of this file on your |
| system. |
| |
| Once the macro is defined, call the macro using the desired |
| options. |
| See the section below for an example. |
| |
| The options and allowable values are: |
| |
| DATA= SAS data set to be analyzed. If the DATA= option is |
| not |
| supplied, the most recently created SAS data set is |
| used. |
| |
| VAR= Polychoric or tetrachoric correlations will be |
| computed |
| for every pair of variables listed in the VAR= option. |
| Individual variable names, separated by blanks, must |
| be |
| specified. By default, all numeric variables found in |
| the data set will be used. See LIMITATIONS below for |
| time considerations. |
| |
| OUT= Specifies the name of the output data set that will |
| contain the correlation or distance matrix. By |
| default, |
| the output data set is named _PLCORR. |
| |
| TYPE= Specifies the type of matrix to be created. If |
| TYPE=CORR (the default), then a correlation matrix is |
| computed and the output data set is assigned a data |
| set |
| type of CORR. If TYPE=DISTANCE, then a distance |
| matrix |
| is computed and the output dat set is assigned a data |
| set type of DISTANCE. |
| |
| PRINTED OUTPUT: |
| No printed output is generated by the %POLYCHOR macro. |
| |
| DETAILS: |
| The PLCORR option in the FREQ procedure is used iteratively to |
| compute the polychoric correlation for each pair of variables. |
| If |
| both variables in a pair are binary (that is, they take on only |
| two |
| distinct values), then the correlation computed by the PLCORR |
| option is usually referred to as the tetrachoric correlation. |
| |
| The individual correlation coefficients are then assembled into |
| either a TYPE=CORR data set containing a matrix of polychoric |
| correlations, or a TYPE=DISTANCE data set containing a matrix of |
| dissimilarity values. The dissimilarity value used is computed |
| as: |
| |
| 1 - plcorr**2 |
| |
| where plcorr is the polychoric correlation. |
| |
| The resulting data set can be used for descriptive analyses only |
| in |
| either the FACTOR or the CALIS procedure (specify METHOD=ULS in |
| either procedure) if the correlation matrix is computed. If the |
| maximum likelihood method (METHOD=ML) is used, note that none of |
| the hypothesis tests will be valid, and the polychoric |
| correlation |
| matrix may be indefinite with small samples. The distance matrix |
| can be used in the CLUSTER procedure (however, the CCC value is |
| not |
| valid) or the MDS procedure. |
| |
| See the Appendix, "Special SAS Data Sets" in the SAS/STAT User's |
| Guide for a description of TYPE=CORR and DISTANCE data sets. |
| |
| MISSING VALUES: |
| Observations with missing values are omitted from the computation |
| |
| of correlations. However, when computing the polychoric |
| correlation between two variables, if an observation's values for |
| |
| these two variables are not missing, then the observation is used |
| regardless of any missing values the observation may have on |
| other |
| variables. |
| |
| LIMITATIONS: |
| LIMITED ERROR CHECKING IS DONE. If the DATA= option is |
| specified, |
| be sure the named data set exists. If DATA= is not specified, a |
| data set must have been created previously in the current SAS |
| session. Be sure that the variables specified in the VAR= option |
| exist on that data set. Running PROC CONTENTS on the data set |
| prior to using this macro is recommended for verifying the data |
| set |
| name and the names of variables. |
| |
| The time required to compute the correlation or distance matrix |
| increases quadratically as the number of variables increases. Up |
| |
| to 999 variables are allowed, but the time required for more than |
| |
| 100 variables may be exorbitant. |
| |
| EXAMPLE: |
| |
| data ordinal; |
| array x{5} x1-x5; |
| do n=1 to 20; |
| do i=1 to 5; |
| x{i}=rantbl(238423,.1,.2,.4,.2,.1); |
| end; |
| keep x1-x5; |
| output; |
| end; |
| run; |
| |
| * If not already defined in your current SAS session, define |
| the |
| * POLYCHOR macro before calling it by putting the path and |
| * filename of your copy of this file in the %INCLUDE statement |
| * below. Example: %inc 'c:\mysasfiles\polychor.sas'; |
| |
| *****************************************************************; |
| |
| %inc ''; |
| |
| * Create and print a TYPE=CORR data set named _PLCORR |
| containing |
| * a matrix of polychoric correlations among all variables in |
| the |
| * data set ORDINAL. |
| |
| *****************************************************************; |
| |
| %polychor() |
| proc print; run; |
| |
| * Create and print a TYPE=DISTANCE data set named DIST |
| containing |
| * a dissimilarity matrix using variables X1, X2, and X5. |
| |
| *****************************************************************; |
| |
| %polychor(data=ordinal,var=x1 x2 x5,out=dist,type=distance) |
| proc print; run; |
| |
| ********************************************************************** |
| **/
%macro polychor(
data=_last_,
var=_numeric_,
out=_plcorr,
type=corr
);
options nonotes nostimer;
%if &data=_last_ %then %let data=&syslast;
/* Verify that TYPE=CORR or DISTANCE */
%if %upcase(&type) ne CORR and %upcase(&type) ne DISTANCE %then %do;
%put ERROR: TYPE= must be CORR or DISTANCE.;
%goto exit;
%end;
data _null_;
set &data;
array x{*} &var;
length name $8.;
if _n_=1 then
do i=1 to dim(x);
call vname(x{i} , name);
call symput('_v'||trim(left(put(i,4.))) , name);
end;
p=dim(x);
call symput('_p',trim(left(put(p,4.))));
run;
%do _i=1 %to &_p;
%do _j=&_i+1 %to &_p;
proc freq data=&data noprint;
tables &&_v&_i * &&_v&_j / plcorr;
output out=_tmp plcorr;
run;
data _null_;
set _tmp;
value= %if %upcase(&type)=CORR %then _plcorr_;
%if %upcase(&type)=DISTANCE %then 1-_plcorr_**2;
;
call symput("p&_i._&_j" , value);
run;
%end;
%end;
data &out
%if %upcase(&type)=CORR %then %do;
;
_type_='CORR';
length _name_ $8.;
%end;
%if %upcase(&type)=DISTANCE %then %str( (type=distance); );
/* Create matrix */
array x{*} %do i=1 %to &_p;
&&_v&i
%end;
;
do i=1 to dim(x);
do j=1 to i;
/* Set diagonal values */
if i=j then x{j}= %if %upcase(&type)=CORR %then 1;
%if %upcase(&type)=DISTANCE %then 0;
;
/* Set lower triangular values */
else
x{j}=symget("p"||trim(left(put(j,4.)))||"_"
||trim(left(put(i,4.))));
end;
/* Create _NAME_ variable for CORR data sets */
%if %upcase(&type)=CORR %then
%str( _name_=symget("_v"||trim(left(put(i,4.)))); );
drop i j;
output;
end;
run;
/* Add _TYPE_=MEAN, STD and N observations to CORR data sets */
%if %upcase(&type)=CORR %then %do;
proc summary data=&data;
var &var;
output out=_simple (drop=_type_ _freq_ rename=(_stat_=_type_));
run;
data &out (type=corr);
set _simple (where=(_type_ in ('MEAN','STD','N'))) &out;
run;
%end;
%if &syserr=0 %then
%if %upcase(&type)=CORR %then %do;
%put;
%put POLYCHOR: Polychoric correlation matrix was output to data set
%upcase(&out).;
%put;
%end;
%else %do;
%put;
%put POLYCHOR: Distance matrix based on polychoric correlations was
output;
%put %str( to data set %upcase(&out).);
%put;
%end;
%exit:
options notes stimer;
%mend polychor;
