-
Notifications
You must be signed in to change notification settings - Fork 2
/
Solution_File_C-SASPy_Data_Round_Trip.py
255 lines (212 loc) · 10.3 KB
/
Solution_File_C-SASPy_Data_Round_Trip.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
# Everything is better with friends: Executing SAS® code in Python scripts with
# SASPy, and turbocharging your SAS programming with open-source tooling
#
# Half-day class, Western Users of SAS Software (WUSS) 2019
###############################################################################
# Exercises 6-9: SASPy Data Round Trip #
###############################################################################
# Lines 12-13 load modules needed for exercises and should be left as-is
from class_setup import print_with_title
from saspy import SASsession
###############################################################################
# #
# Exercise 6. [Python w/ saspy] Connect to a SAS kernel #
# #
# Instructions: Uncomment the code immediately below, and then execute #
# #
###############################################################################
sas = SASsession()
print_with_title(type(sas), 'The type of SAS session object sas:')
# Notes:
#
# 1. A SASsession object named sas is created, and the following are printed:
# * confirmation a SAS session has been established
# * the type of object sas (which is saspy.sasbase.SASsession)
#
# 2. As with the DataFrame object type above, SASsession is not built into
# Python, so we had to import its definition from the saspy module at the
# beginning of this file.
#
# 3. All subsequent exercises in this file will assume the object sas exists,
# so please don't comment out the line creating it.
###############################################################################
# #
# Exercise 7. [Python w/ pandas & saspy] Load a SAS dataset into a DataFrame #
# #
# Instructions: Uncomment the code immediately below, and then execute #
# #
###############################################################################
# Original Version
fish_df = sas.sasdata2dataframe(table='fish', libref='sashelp')
print_with_title(fish_df, 'The value of fish_df:')
print_with_title(
fish_df.describe(),
'The Python equivalent of PROC MEANS using fish_df:'
)
print_with_title(fish_df.head(), 'The first five rows of fish_df:')
# Pass a numerical parameter to the head method
print(fish_df.head(4))
# Change the head method to tail
print(fish_df.tail())
# View other portions of fish_df
print(fish_df.iloc[0:2, 1:4])
# Notes:
#
# 1. A DataFrame object named fish_df with dimensions 159x7 (159 rows and 7
# columns) is created from the SAS dataset fish in the sashelp library, and
# the following are printed:
# * the type of object fish_df (which is
# <class 'pandas.core.frame.DataFrame'>)
# * the first five rows of fish_df, which are at row indices 0 through 4
# since Python uses zero-based indexing
# * summary information about the 6 numerical columns of fish_df, which is
# obtained by fish_df calling its describe method (the pandas equivalent
# of the SAS MEANS procedure)
#
# 2. The sas object represents a connection to a SAS session and was created
# in a previous exercise. Here, sas calls its sasdata2dataframe method to
# access the SAS library sashelp defined within this SAS session and to load
# the entire contents of SAS dataset sashelp.fish into the DataFrame
# fish_df.
#
# 3. All subsequent exercises in this file will assume the object fish_df
# exists, so please don't comment out the line creating it.
#
# 4. For additional practice, try any or all of the following:
# * Pass a numerical parameter to the head method to see a different number
# of rows (e.g., fish_df.head(4)).
# * Change the head method to tail to see a different part of the dataset.
# * To view other portions of fish_df, explore the more advanced indexing
# methods loc and iloc explained at
# https://brohrer.github.io/dataframe_indexing.html.
###############################################################################
# #
# Exercise 8. [Python w/ pandas] Manipulate a DataFrame #
# #
# Instructions: Uncomment the code immediately below, and then execute #
# #
###############################################################################
# Original Version
fish_df_g = fish_df.groupby('Species')
fish_df_gs = fish_df_g['Weight']
fish_df_gsa = fish_df_gs.agg(['count', 'std', 'mean', 'min', 'max'])
print_with_title(
fish_df_gsa,
'The Python equivalent of PROC MEANS with CLASS and VAR statements:'
)
# Move around and/or remove functions used for aggregation
fish_df_gsa = fish_df_gs.agg(['min', 'mean', 'max'])
print(fish_df_gsa)
# Change the variable whose values are summarized to 'Width'
fish_df_gs = fish_df_g['Width']
fish_df_gsa = fish_df_gs.agg(['count', 'std', 'mean', 'min', 'max'])
print(fish_df_gsa)
# Print out the results of using the one-liner version
print(
fish_df.groupby('Species')['Weight'].agg(
['count', 'std', 'mean', 'min', 'max']
)
)
# Notes:
#
# 1. The DataFrame fish_df, which was created in an exercise above from the SAS
# dataset sashelp.fish, is manipulated, and the following is printed:
# * a table giving the number of rows, standard deviation, mean, min, and
# max of Weight in fish_df when aggregated by Species
#
# 2. This is accomplished by creating a series of new DataFrames:
# * The DataFrame fish_df_g is created from fish_df using the groupby method
# to group rows by values in column 'Species'.
# * The DataFrame fish_df_gs is created from fish_df_g by extracting the
# 'Weight' column using bracket notation.
# * The DataFrame fish_df_gsa is created from fish_df_gs using the agg
# method to aggregate by the functions in the list ['count', 'std',
# 'mean', 'min', 'max'].
#
# 3. Identical results could be obtained using the following SAS code:
# proc means data=sashelp.fish std mean min max;
# class species;
# var Weight;
# run;
# However, while PROC MEANS operates on SAS datasets row-by-row from disk,
# DataFrames are stored entirely in main memory. This allows any number of
# DataFrame operations to be combined for on-the-fly reshaping using "method
# chaining." In other words, fish_df_gsa could have instead been created
# with the following one-liner, which avoids the need for intermediate
# DataFrames (and thus executes much more quickly):
# fish_df_gsa = fish_df.groupby('Species')['Weight'].agg(
# ['count', 'std', 'mean', 'min', 'max']
# )
#
# 3. All subsequent exercises in this file will assume the object fish_df_gsa
# exists, so please don't comment out the line(s) creating it.
#
# 4. For additional practice, try any or all of the following:
# * Move around and/or remove functions used for aggregation, and see how
# the output changes.
# * Change the variable whose values are summarized to 'Width'.
# * Print out the results of using the one-liner version.
###############################################################################
# #
# Exercise 9. [Python w/ pandas & saspy] Load a DataFrame into a SAS dataset #
# #
# Instructions: Uncomment the code immediately below, and then execute #
# #
###############################################################################
# Original Version
sas.dataframe2sasdata(fish_df_gsa, table="fish_sds_gsa", libref="Work")
sas_submit_return_value = sas.submit(
'''
PROC PRINT DATA=fish_sds_gsa;
RUN;
''',
results='TEXT'
)
sas_submit_results = sas_submit_return_value['LST']
print_with_title(
sas_submit_results,
'SAS results from PROC PRINT applies to new SAS dataset Work.fish_sds_gsa:'
)
# Print out the SAS log
print(sas_submit_return_value['LOG'])
# Change the SAS procedure
print(
sas.submit(
'''
PROC CONTENTS DATA=fish_sds_gsa;
RUN;
''',
results='TEXT'
)['LST']
)
# Notes:
#
# 1. The DataFrame fish_df_gsa, which was created in an exercise above from the
# SAS dataset sashelp.fish, is used to create the new SAS dataset
# Work.fish_sds_gsa. The SAS PRINT procedure is then called, and the
# following is printed:
# * the output returned by PROC PRINT
#
# 2. The sas object, which was created in a cell above, is a persistent
# connection to a SAS session, and two of its methods are used as follows:
# * The dataframe2sasdata method writes the contents of the DataFrame
# fish_df_gsa to the SAS dataset fish_sds_gsa stored in the Work library.
# (Note: The row indexes of the DataFrame fish_df_gsa are lost when the
# SAS dataset fish_sds_gsa is created.)
# * The submit method is used to submit the PROC PRINT step to the SAS
# kernel, and a dictionary is returned with the following two key-value
# pairs:
# - sas_submit_return_value['LST'] is a string comprising the results from
# executing PROC PRINT, which will be in plain text because the
# results='TEXT' was used
# - sas_submit_return_value['LOG'] is a string comprising the plain-text
# log resulting from executing PROC PRINT
#
# 3. Python strings surrounded by single quotes (e.g., 'Hello, World!') cannot
# be written across multiple lines of code, whereas strings surrounded by
# triple quotes (e.g., the argument to the submit method) can.
#
# 4. For additional practice, try any or all of the following:
# * Print out the SAS log.
# * Change the SAS procedure used to interact with SAS dataset
# Work.fish_sds_gsa (e.g., try PROC CONTENTS).