This project aims to replicate the following paper: Clemens, J., & Wither, M. (2019). The minimum wage and the Great Recession: Evidence of effects on the employment and income trajectories of low-skilled workers. Journal of Public Economics, 170, 53-67.
- I first download the data from the following website: https://www.nber.org/research/data/survey-income-and-program-participation-sipp
Because the sample period runs from 2008.07 to 2012.07, I need to download SIPP waves 1 through 12.
After downloading the 2008 SIPP data, I use Stata to convert the data into CSV files. The code is in sippl08puw2 yijia.do and export_Csv_do_file.do.
I keep only the columns I need, namely the following variables: ssuid tage tpyrate1 tfipsst euectyp5 eawop rmhrswk ehrsall ehrsbs1 ehrsbs2 thearn thtotinc rhpov rfpov rfownkid eeducate. The dictionary explaining these variables is infile dictionary.docx.
Next, I use Python to process the data. I first merge the 12 files into one file, then I compute descriptive statistics for the data. The notebook is precessing the sipp data.ipynb.
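The merge step described above can be sketched roughly as follows. This is only an illustration, not the code in the notebook: the function name `merge_waves`, the `wave` helper column, and the tiny in-memory inputs are all made up for the example, and only a subset of the variables listed above is kept.

```python
import io
import pandas as pd

# A subset of the variables listed above; extend as needed.
KEEP = ["ssuid", "tage", "tpyrate1", "tfipsst", "thearn", "thtotinc"]

def merge_waves(sources, keep=KEEP):
    """Read one CSV per wave and stack them, tagging each row with its wave."""
    frames = []
    for wave, src in enumerate(sources, start=1):
        # Read ssuid as text so leading zeros in the identifier survive.
        df = pd.read_csv(src, usecols=keep, dtype={"ssuid": str})
        df["wave"] = wave  # hypothetical helper column, not a SIPP variable
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Tiny in-memory stand-ins for two wave files (made-up values).
header = "ssuid,tage,tpyrate1,tfipsst,thearn,thtotinc\n"
wave1 = io.StringIO(header + "001,30,7.0,12,2000,2500\n")
wave2 = io.StringIO(header + "001,30,7.25,12,2100,2600\n")

panel = merge_waves([wave1, wave2])
print(panel.describe())  # descriptive summary of the numeric columns
```

With the real data, `sources` would be the list of the 12 exported CSV paths. Reading only the needed columns with `usecols` keeps memory use manageable given the roughly one thousand variables in each file.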
- To replicate Figure 1, I need the minimum wage by state in 2008. I searched for and obtained the minimum wage data from: https://www.kaggle.com/datasets/lislejoem/us-minimum-wage-by-state-from-1968-to-2017?resource=download
The downloaded file is: 1_Minimum Wage Data by state.csv
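A minimal sketch of how the yearly file could be used to flag states whose own minimum wage was below a cutoff in 2008. Several things here are assumptions and should be checked: the column names `Year`, `State` and `State.Minimum.Wage` are my guess at the CSV headers, the function name `bound_states` is hypothetical, and the default cutoff of 6.55 (the federal minimum that took effect in July 2008) is only a placeholder for whatever definition of "bound" the paper actually uses.

```python
import pandas as pd

def bound_states(min_wage, year=2008, threshold=6.55):
    """Return the states whose state minimum wage in `year` is below `threshold`.

    Column names and the default threshold are assumptions; verify both
    against the downloaded CSV and the paper's definition.
    """
    rows = min_wage[min_wage["Year"] == year]
    below = rows["State.Minimum.Wage"] < threshold  # missing values compare as False
    return sorted(rows.loc[below, "State"])

# Made-up rows standing in for the downloaded file.
demo = pd.DataFrame({
    "Year": [2008, 2008, 2008, 2007],
    "State": ["A", "B", "C", "A"],
    "State.Minimum.Wage": [5.85, 8.00, 6.55, 5.15],
})
print(bound_states(demo))
```

With the real file, `min_wage` would come from `pd.read_csv("1_Minimum Wage Data by state.csv")`. States with no state minimum may be recorded as 0 or as missing, and those two cases are treated differently by the comparison above, so they need a deliberate decision.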
The original Figure 1 is as follows:
The figure I drew is as follows:
I can only draw panel A. I cannot draw panel B because I cannot find monthly minimum wage data; I can only find the data by year. Moreover, the minimum wage data is not in SIPP at all, so I had to take it from another source (i.e., https://www.kaggle.com/datasets/lislejoem/us-minimum-wage-by-state-from-1968-to-2017?resource=download).
The original Table 1 is as follows:
The Table 1 I produced is as follows:
My table is very different from the authors'. I think the reason is that I use different variables from the authors.
There are 1022 variables in the SIPP data set, and the column names are not intuitive. For example, the variable thearn means total household earned income, while the variable thtotinc means total household income. I do not fully understand the difference between the two variables. In Table 1, the authors only use the label "income", so it is challenging for me to figure out which SIPP variable they used.
Meanwhile, some variables are missing, and I don't know which variable is the right one. For example, Table 1 has a variable named "no earnings", but I cannot find such a variable in the dataset, and I could not work out how to calculate it myself.
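One possible way to build such a variable, offered only as a sketch: if "no earnings" simply means that a person's earnings in a month are zero (an assumption about the paper's definition), it can be derived as an indicator from an earnings column. The column name `earnings` below is a placeholder; which SIPP variable is appropriate (person-level versus household-level earnings) is exactly the open question raised above.

```python
import numpy as np
import pandas as pd

def add_no_earnings(df, earnings_col="earnings"):
    """Add a 0/1 indicator for zero earnings; rows with missing earnings stay missing."""
    out = df.copy()
    earn = out[earnings_col]
    out["no_earnings"] = (earn == 0).astype(float).where(earn.notna())
    return out

# Made-up values; `earnings` is a placeholder column name.
demo = pd.DataFrame({"earnings": [0.0, 1500.0, np.nan, 0.0]})
result = add_no_earnings(demo)
print(result["no_earnings"].mean())  # share with no earnings among non-missing rows
```

The mean of the indicator is then the kind of share a summary table would report, computed over rows where earnings are observed.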
Moreover, I cannot split the sample into the 6 columns based on the data. I cannot find the average baseline wage information, and I cannot find a wage variable in the SIPP dataset. As an alternative, I use the regular hourly pay rate (i.e., tpyrate1 in the dataset) to represent the wage. I classify the data into three groups: 1) tpyrate1<$7.5, 2) $7.5<tpyrate1<$8.49, 3) $8.50<tpyrate1<$9.99. However, the first variable is $5.15<wage<$7.25. In my classification, all observations with $5.15<wage<$7.25 belong to the first group (i.e., tpyrate1<$7.5). So I can only get the summary statistics for column 1 and column 2 of Table 1.
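The grouping could be sketched as below. It assumes, without having confirmed it against the paper, that the six columns are the three wage bands crossed with two groups of states. The set `bound_fips` is hypothetical and would have to be built from the minimum wage data. Half-open bands are used so that wages of exactly $7.50 or between $8.49 and $8.50 are not dropped. The paper works with an average baseline wage, which this sketch does not compute.

```python
import numpy as np
import pandas as pd

BINS = [0.0, 7.50, 8.50, 10.00]              # band edges in dollars
LABELS = ["<7.50", "7.50-8.49", "8.50-9.99"]

def table1_cells(df, bound_fips):
    """Count rows in each wage band x state group cell."""
    out = df.copy()
    # right=False gives [0, 7.5), [7.5, 8.5), [8.5, 10); wages >= 10 become missing.
    # Rows with tpyrate1 == 0 land in the first band; whether those are real
    # wages should be checked against the data dictionary.
    out["wage_band"] = pd.cut(out["tpyrate1"], bins=BINS, labels=LABELS, right=False)
    out["state_group"] = np.where(out["tfipsst"].isin(bound_fips), "bound", "unbound")
    out = out.dropna(subset=["wage_band"])
    return out.groupby(["wage_band", "state_group"], observed=True).size()

# Made-up rows; the state codes and the bound set are illustrative only.
demo = pd.DataFrame({
    "tpyrate1": [6.0, 7.5, 9.0, 12.0, 7.0],
    "tfipsst": [1, 1, 6, 6, 6],
})
cells = table1_cells(demo, bound_fips={1})
print(cells)
```

Replacing `.size()` with `.agg(...)` on the variables of interest would give per-cell summary statistics instead of counts.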
Not surprisingly, my numbers of observations and individuals are larger than those in the paper, because I include all the observations that have a wage between $5.15 and $7.25 in the first two columns. However, it is confusing to me that my number of observations is also very different from the sum across all six columns of Table 1.
The original Figure 2 Panel A is as follows:
The Figure 2 Panel A I drew is as follows:
My figure is very different from the one in the paper. The reason may be that the y-axis in my figure is the wage, whereas the authors use the affected wage. I failed to construct the affected wage variable, and this causes the difference.
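If "affected wage" refers to the share of people whose wage lies in the $5.15 to $7.25 range mentioned earlier (an assumption that should be checked against the paper), the series for the figure could be built as an indicator averaged by month. The `month` column and the function name `affected_share` are placeholders, and the strict inequalities follow the range as written in these notes.

```python
import pandas as pd

def affected_share(df, low=5.15, high=7.25):
    """Share of rows per month with low < tpyrate1 < high."""
    out = df.copy()
    out["affected"] = (out["tpyrate1"] > low) & (out["tpyrate1"] < high)
    return out.groupby("month")["affected"].mean()

# Made-up rows; `month` is a placeholder for the survey month identifier.
demo = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "tpyrate1": [6.0, 8.0, 7.0, 7.0],
})
shares = affected_share(demo)
print(shares)
```

Grouping by an additional state group column would give one line per group, which is closer to how such a figure is usually drawn.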
The original Figure 4 Panel C is as follows:
The Figure 4 Panel C I drew is as follows:
The figures differ. I use total household earned income to draw the figure because I cannot find a variable named family income. I suppose the authors and I use different variables.
The original Table 2 Column 1 is as follows:
The Table 2 Column 1 I produced is as follows:
I run the following equation, but I fail to include individual fixed effects in the regression: Stata returns an error that I could not fix. The error is code 103, "too many variables specified", which indicates that too many factor variables were requested. The log file for the regression is replication_Ayla_1125.log. So I can only use time and state fixed effects, not individual fixed effects, which would require about 6000 dummy variables.
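Individual fixed effects do not have to be entered as thousands of dummy variables. The within transformation (subtracting each individual's mean from every variable) gives the same slope estimates, and in Stata commands such as `xtreg ..., fe` or `areg ..., absorb()` absorb the effects in this way. Below is a minimal Python sketch of the idea on simulated data. It is not the paper's specification: the names `within_ols`, `id`, `t`, `x` and `y` are made up, and standard errors (which need a degrees-of-freedom correction and probably clustering) are not computed.

```python
import numpy as np
import pandas as pd

def within_ols(df, y, x_cols, unit, time):
    """Slopes from OLS with unit fixed effects (by demeaning) and time dummies."""
    time_dummies = pd.get_dummies(df[time], prefix="t", drop_first=True, dtype=float)
    X = pd.concat([df[x_cols].astype(float), time_dummies], axis=1)
    data = pd.concat([df[[y]].astype(float), X], axis=1)
    # Within transformation: subtract each unit's own mean from every column.
    demeaned = data - data.groupby(df[unit]).transform("mean")
    beta, *_ = np.linalg.lstsq(
        demeaned[X.columns].to_numpy(), demeaned[y].to_numpy(), rcond=None
    )
    return pd.Series(beta, index=X.columns)[x_cols]

# Simulated balanced panel with a known slope of 2 and no noise.
rng = np.random.default_rng(0)
n_units, n_periods = 50, 6
panel = pd.DataFrame({
    "id": np.repeat(np.arange(n_units), n_periods),
    "t": np.tile(np.arange(n_periods), n_units),
})
panel["x"] = rng.normal(size=len(panel))
unit_effect = rng.normal(size=n_units)[panel["id"].to_numpy()]
time_effect = rng.normal(size=n_periods)[panel["t"].to_numpy()]
panel["y"] = 2.0 * panel["x"] + unit_effect + time_effect

estimate = within_ols(panel, "y", ["x"], "id", "t")
print(estimate)
```

Only the handful of time dummies is built explicitly, so the number of regressors stays small no matter how many individuals are in the sample.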
Overall, I find it very challenging to replicate a paper and obtain the same results. The original data is available, but it is hard to identify exactly which variables the authors used, especially because the data include many similar variables.
99 For my own reference
I use ArcGIS to visualize panel A of Figure 1. The output file is figure1_minimum wage bounded map.pdf. The working path is "D:\1_yogafile\3_FSU\2022fallCourse\appliedECO\minimumWage".