0 users online | 0 Guests and 0 Registered

Is there an equivalent for the COALESCE function to find the first non-missing value in a column, rather than a row?


The COALESCE function allows for the first non-missing argument to be returned from a list of variables reading across a row: non_miss = coalesce(arg1,arg2,arg3) the only requirement being that the three variables arg1-arg3 are of the same type.

Sometimes data may be generated where individual rows contain a missing or non-missing value for the same variable, and it is desirable to consolidate these rows to retain only the non-missing values.

Submitting the code:

data mults ;
infile datalines dsd ;
input id char1 $ num1 char2 $ ;
datalines ;
1,Fred,.,Smith
1,,100,Smith
2,,200,Jones
2,Bert,.,
;
run ;

which generates a table:

1 Fred . Smith
1   100 Smith
2   200 Jones
2 Bert .  

 

with a desired output of:

1 Fred 100 Smith
2 Bert 200 Jones

 

This can be achieved using a summary statistic in PROC SQL.  The default action for summary statistics with a single argument in PROC SQL, is to calculate a value 'down' the column.  Whilst it is usual to do this with numeric values, it is also possible with certain functions, to undertake the summarization with character variables.

The solution for this particular problem uses the MAX function in conjunction with the GROUP BY clause to create classification groups:

proc sql ;
create table flats as
select  id
       ,max(char1) as forename
       ,max(num1)  as value
       ,max(char2) as surname
from mults
group by id
;
quit ;
Author:
Alan D Rudland
Revision:
1.4
Average rating: 4 (1 Vote)

You cannot comment on this entry

Chuck Norris has counted to infinity. Twice.

Records in this category

Tags