Hello. I would like to create a program something like this:
Proc sql;
create table TARGET as
Select
case when column1 exists then get the value of column 1. Otherwise null end as COL1,
case when column2 exists then get the value of column 2. Otherwise null end as COL2,
.....
From SOURCE;
quit;
May I know how I can convert this to sas/proc sql code? Thanks.
@iSAS
wrote:
Hello KurtBremser. My problem is the first suggestion you've mentioned.
I'm receiving a source file and its column/s changes consistently. Example, today I will receive the file with Col1,Col2, Col3, Col4 and tomorrow I will only receive Col1,Col2, and Col4. Thus I would like to create a program that:
If a certain column exists on the current source file, it will get the data of that column. Otherwise, it will just put null.
See this example, using sashelp.class as a guinea pig:
/* list of columns needed */
data columns_needed;
input name :$32. type :$1.;
name = upcase(name);
cards;
weight n
age n
sex c
proc sort data=columns_needed;
by name;
/* create some example data */
data have;
set sashelp.class (drop=age);
/* retrieve metadata */
proc sql;
create table columns as
select upcase(name) as name
from dictionary.columns
where libname = 'WORK' and memname = 'HAVE'
order by name;
quit;
/* compare existing columns with needed, and dynamically create a data step to create missing columns */
data _null_;
merge
columns (in=have)
columns_needed (in=need)
end=eof
by name;
if _n_ = 1
then call execute('data want; set have;');
if need and not have
then do;
if type = 'n'
then call execute(name !! " = .;");
else call execute(name !! " = '';");
if eof then call execute('run;');
Since the existence of a column will be known before the step, I'd retrieve that information from dictionary.columns and create code selectively.
If, OTOH, you are looking for missing values, the solution would be different. Which of the two is it?
Hello KurtBremser. My problem is the first suggestion you've mentioned.
I'm receiving a source file and its column/s changes consistently. Example, today I will receive the file with Col1,Col2, Col3, Col4 and tomorrow I will only receive Col1,Col2, and Col4. Thus I would like to create a program that:
If a certain column exists on the current source file, it will get the data of that column. Otherwise, it will just put null.
@iSAS wrote:
Hello KurtBremser. My problem is the first suggestion you've mentioned.
I'm receiving a source file and its column/s changes consistently. Example, today I will receive the file with Col1,Col2, Col3, Col4 and tomorrow I will only receive Col1,Col2, and Col4. Thus I would like to create a program that:
If a certain column exists on the current source file, it will get the data of that column. Otherwise, it will just put null.
See this example, using sashelp.class as a guinea pig:
/* list of columns needed */
data columns_needed;
input name :$32. type :$1.;
name = upcase(name);
cards;
weight n
age n
sex c
proc sort data=columns_needed;
by name;
/* create some example data */
data have;
set sashelp.class (drop=age);
/* retrieve metadata */
proc sql;
create table columns as
select upcase(name) as name
from dictionary.columns
where libname = 'WORK' and memname = 'HAVE'
order by name;
quit;
/* compare existing columns with needed, and dynamically create a data step to create missing columns */
data _null_;
merge
columns (in=have)
columns_needed (in=need)
end=eof
by name;
if _n_ = 1
then call execute('data want; set have;');
if need and not have
then do;
if type = 'n'
then call execute(name !! " = .;");
else call execute(name !! " = '';");
if eof then call execute('run;');
Now you should heed @RW9's advice. If there's actually data hidden in the column names, then a proper modeling of the input data will make all this unnecessary.
Or you simply do a transpose of your dataset:
data in;
input id col_1 col_2 col_4;
cards;
1 3 5 6
2 7 2 5
proc transpose
data=in
out=out (rename=(_name_=colname col1=value))
by ID; /* add other variables that never change */
var col_:;
and you have normalized dataset that needs no correction in code at all.
When you now merge that by colname to your dataset of needed columns (or any other dataset that has the required structure), the missing values will appear on their own.
Really a good idea to post test data in the form of a datastep and what you want out. Now I am guessing that you don't mean, then the column is present or not, but when the data within the column is present or not? If so your code is nearly there (note how I use the code window - its the {i} above post are):
proc sql;
create table target as
select case when col1 ne "" then col1 else "" end as col1,
case when col2 ne "" then col2 else "" end as col2
from have;
quit;
Note this assumes the variables are character, numeric would be ne . However there doesn't seem to be any point to this code. Simply:
proc sql;
create table target as
select col1,
from have;
quit;
Will achieve exactly the same thing, you don't need to set missing's explicitly. Maybe you meant something else, but I cant' tell from what you posted.
Hello RW9. Thank you for the response.
My issue is somewhat different: Example, today I will receive the file with Col1,Col2, Col3, Col4 and tomorrow I will only receive Col1,Col2, and Col4. Thus I would like to create a program that:
If a certain column exists on the current source file, it will get the data of that column. Otherwise, it will just put null.
Then your problem is not with coding, but understanding the data and the modelling, and not having proper documentation on the data transfer - all common issues. Why does the data structure change like this? If indeed these columns change each time, then they should not be columns - they should be rows of data. Columns are a fixed structure which can be agreed up front, and programmed against, rows, expand and shrink as the data needs. This is basic data modelling. So the sender should identify the data which changes and apply this in rows not columns, then you and the sender can agree a data transfer document which details the fixed structure - they can program the export and you the import based on the consistent agreed document.
The alternative is that you try to fix this time, then next time they do something else, maybe they wont call them col next time, maybe they wont number them like that, thus you repeat this exercise every import. A slight change nullifies this:
Col1 Col2 Col3 Col4
a b c d
Col Result
1 a
2 b
3 c
4 d
This one can have 3 removed without any change to the import program or your coding which uses this data.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Register now!
Classroom Training Available!
Select SAS Training centers are offering in-person courses. View upcoming courses for:
Arlington, VA
Cary, NC
View all other training opportunities.