How do I handle messy data in SPSS to produce duration times from dates and frequencies from strings?
Supposing we have a list of individual dates of form below
Date of Birth (DOB) |
Date on System |
Date Left study |
||
17.02.40 |
12.05.00 |
|
||
02.08.73 |
18.01.05 |
24.03.08 |
Patient data is often entered in the form of dates. For example, date of birth and date that a particular person started a study and, if applicable, left it. We may wish to work out how long a person was a participant in a study. To do this we first of all need a cut-off date which is either the date when the study ended (studyEdate) for continuing participants or the date of drop out (DateFinished) for people who left the study.
There are various commands for telling SPSS what the number in a column represents a date. The DATE.DMY command defines a date in the spreadsheet.
COMPUTE studyEdate = DATE.DMY(30/11/2010). EXE.
Existing columns of data which represent dates may be flagged to SPSS using various formats (found by clicking the variable view tab and 'Type' next to the column of interest. Data may also be defined as dates using syntax.
FORMATS dob DateFinished DateonSystem studyEdate (DATE8). EXECUTE.
We can work out duration time in the study using the DATEDIF command which works out the difference in days or months between two dates. We, firstly though, need to tell SPSS to use the study end date, for people who did not drop out, or the date they finished, if they did. Since these two dates are entered in separate columns we can do this using the SYSMIS command which tells SPSS if there is no date of finish to use the study end date as the cut-off date. We can now compute the number of days in the study for dropouts and non-dropouts.
COMPUTE DFMISS = SYSMIS(DATEFINISHED). IF (DFMISS EQ 1) usertime = DATEDIF(studyEdate, DateonSystem, "months")/12. IF (DFMISS EQ 1) usertime2 = DATEDIF(studyEdate, DateonSystem, "days"). IF (DFMISS EQ 0) usfintime = DATEDIF(DateFinished, DateonSystem, "months")/12. IF (DFMISS EQ 0) usfintime2 = DATEDIF(DateFinished, DateonSystem, "days").
We can also work out the ages of the participants when they entered the study in a similar way using the TRUNC function to round down.
COMPUTE age = TRUNC(DATEDIF(DateonSystem, DOB, "days")/365.25). EXE.
We can now obtain summary statistics for durations of (non-)dropouts and age.
DESCRIPTIVES VARIABLES=usertime2 USFINTIME2 AGE /STATISTICS=MEAN STDDEV MIN MAX .