Learning SAS

Author

Jumbong Junior

Introduction

This document is a compilation of what is essential to know about SAS programming. The are two notions that are fondamental to understand before diving into the SaS programming language. These are the data step and the proc step.

The data step consists to create a dataset and to manipulate it. The proc step is used to analyze the data.

Before starting, let’s talk about the libname.

Libname

The libname statement is used to assign a library reference name to a physical location. The example below shows how to assign the reference name Inputs and outputs to the physical location /home/u63691422/EPG1V2/data and /home/u63691422/EPG1V2/Jumbong_Training/Outputs respectively.

Tlibname Inputs "/home/u63691422/EPG1V2/data" ;
libname Outputs "/home/u63691422/EPG1V2/Jumbong_Training/Outputs";

Data Step

The syntax of the data step is as follows:

DATA Outputs.mydata;
    SET Inputs.mydata;
RUN;

Operator summary table

Table of comparison operators

Operator Symbol Meaning
LT < Less than
GT > Greater than
LE <= Less than or equal to
GE >= Greater than or equal to
EQ = Equal to
NE ^= Not equal to
IN IN In a list of values

Logical operators table

Operator Symbol Meaning
AND & Logical AND
OR ! Logical OR
NOT ~ Logical NOT

Arithmetic operators table

Operator Symbol Meaning
ADD + Addition
SUB - Subtraction
MUL * Multiplication
DIV / Division
POW ** Exponentiation
MOD % Modulus
Max <> Maximum
Min >< Minimum

Operator of concatenation

The concatenation operator is used to concatenate two or more character strings. The concatenation operator is represented by two vertical bars (||).

Variables Selection

In order to select only some variables from the data source, the keep statement is used.

DATA Outputs.mydata;
    SET Inputs.mydata;
    KEEP var1 var2 var3;
RUN;

Observation filtration

  • If it is important to extract only observations m and n from the data source, the obs statement is used.
DATA Outputs.mydata;
    SET Inputs.mydata (firstobs=m obs=n);
RUN;
  • In order to select observations that meet certain conditions, the WHERE statement is used.
DATA Outputs.mydata;
    SET Inputs.mydata;
    WHERE CONDITION;
RUN;

Compute the cumulative sum by group of a variable.


proc sort data=INPUTS.class_update out=class_sorted;
   by Sex;
run;

data output;
   set class_sorted(keep=Sex Age);
   by Sex;
   retain s_age;
   if first.Sex then s_age = Age; /* Réinitialiser pour chaque groupe */
   else s_age + Age; /* Cumul des âges */
  
   if last.Sex then output;
run;

RUN;

This code compute the cumulative sum of the variable Age by group of the variable and then take the end last element of each group which corresponds to the cumulative sum of the variable Age by group.

## Conactenation and control concatenation

```default
DATA Outputs.mydata; 
    SET table1 table2;
RUN;
DATA Outputs.mydata; 
    SET table1 table2;
    BY ID;
RUN;

Compute the frequency of a variable in a dataset and the frequency of missing values of the variable.

PROC FREQ data = Outputs.mydata ;
    table variable/missing;
RUN;

Compute the frequency of a variable in a dataset and export results to a excel sheet.


/* Module : Macro 
Input : 
    - data : dataset
    - list_var : macro variable containing the list of variables to compute the frequency
    - path : path to save the excel file
Output :
    - Excel file containing the frequency of the variables
*/

%macro export_freq(data, list_var, path);
    %let nbvar = %sysfunc(countw(&list_var));

    %do i = 1 %to &nbvar;
        %let var = %scan(&list_var, &i);
        PROC FREQ data = &data ;
            tables &var /out = freq&var missing;
        RUN;

        PROC EXPORT data = freq&var
            outfile = "&path"
            dbms = xlsx replace;
            sheet = "&var";
        RUN;
    %end;
%mend export_freq;

%export_freq(Outputs.mydata, var1 var2 var3, /home/u63691422/EPG1V2/Jumbong_Training/Outputs/freq.xlsx);
Back to top