Learning SAS
Introduction
This document is a compilation of what is essential to know about SAS programming. The are two notions that are fondamental to understand before diving into the SaS programming language. These are the data step and the proc step.
The data step consists to create a dataset and to manipulate it. The proc step is used to analyze the data.
Before starting, let’s talk about the libname
.
Libname
The libname
statement is used to assign a library reference name to a physical location. The example below shows how to assign the reference name Inputs
and outputs
to the physical location /home/u63691422/EPG1V2/data
and /home/u63691422/EPG1V2/Jumbong_Training/Outputs
respectively.
Tlibname Inputs "/home/u63691422/EPG1V2/data" ; libname Outputs "/home/u63691422/EPG1V2/Jumbong_Training/Outputs";
Data Step
The syntax of the data step is as follows:
DATA Outputs.mydata;
SET Inputs.mydata; RUN;
Operator summary table
Table of comparison operators
Operator | Symbol | Meaning |
---|---|---|
LT | < | Less than |
GT | > | Greater than |
LE | <= | Less than or equal to |
GE | >= | Greater than or equal to |
EQ | = | Equal to |
NE | ^= | Not equal to |
IN | IN | In a list of values |
Logical operators table
Operator | Symbol | Meaning |
---|---|---|
AND | & | Logical AND |
OR | ! | Logical OR |
NOT | ~ | Logical NOT |
Arithmetic operators table
Operator | Symbol | Meaning |
---|---|---|
ADD | + | Addition |
SUB | - | Subtraction |
MUL | * | Multiplication |
DIV | / | Division |
POW | ** | Exponentiation |
MOD | % | Modulus |
Max | <> | Maximum |
Min | >< | Minimum |
Operator of concatenation
The concatenation operator is used to concatenate two or more character strings. The concatenation operator is represented by two vertical bars (||).
Variables Selection
In order to select only some variables from the data source, the keep
statement is used.
DATA Outputs.mydata;
SET Inputs.mydata;
KEEP var1 var2 var3; RUN;
Observation filtration
- If it is important to extract only observations m and n from the data source, the
obs
statement is used.
DATA Outputs.mydata;
SET Inputs.mydata (firstobs=m obs=n); RUN;
- In order to select observations that meet certain conditions, the
WHERE
statement is used.
DATA Outputs.mydata;
SET Inputs.mydata;
WHERE CONDITION; RUN;
Compute the cumulative sum by group of a variable.
proc sort data=INPUTS.class_update out=class_sorted;
by Sex;
run;
data output;
set class_sorted(keep=Sex Age);
by Sex;
retain s_age;
if first.Sex then s_age = Age; /* Réinitialiser pour chaque groupe */
else s_age + Age; /* Cumul des âges */
if last.Sex then output;
run;
RUN;
This code compute the cumulative sum of the variable Age by group of the variable and then take the end last element of each group which corresponds to the cumulative sum of the variable Age by group.
## Conactenation and control concatenation
```default
DATA Outputs.mydata;
SET table1 table2; RUN;
DATA Outputs.mydata;
SET table1 table2;
BY ID; RUN;
Compute the frequency of a variable in a dataset and the frequency of missing values of the variable.
PROC FREQ data = Outputs.mydata ;
table variable/missing; RUN;
Compute the frequency of a variable in a dataset and export results to a excel sheet.
/* Module : Macro
Input :
- data : dataset
- list_var : macro variable containing the list of variables to compute the frequency
- path : path to save the excel file
Output :
- Excel file containing the frequency of the variables
*/
%macro export_freq(data, list_var, path);
%let nbvar = %sysfunc(countw(&list_var));
%do i = 1 %to &nbvar;
%let var = %scan(&list_var, &i);
PROC FREQ data = &data ;
tables &var /out = freq&var missing;
RUN;
PROC EXPORT data = freq&var
outfile = "&path"
dbms = xlsx replace;
sheet = "&var";
RUN;
%end;
%mend export_freq;
%export_freq(Outputs.mydata, var1 var2 var3, /home/u63691422/EPG1V2/Jumbong_Training/Outputs/freq.xlsx);