@@ -0,0 +1,4 @@
# textbook
Textbook for Econ 148: Data Science for Economists at UC Berkeley

Content is stored in the `content` folder. The order of the textbook's chapters can be changed in the `_toc.yml` file.
@@ -0,0 +1,31 @@
# Summary of Differences

While Python and Stata can both be used effectively for data analysis, they have a few key differences, summarized below.

## General Purpose vs Specialized

Python is a versatile, general-purpose programming language that can be used for a wide variety of computing tasks, including web development and artificial intelligence. Its general-purpose nature makes it suitable for many tasks beyond statistical analysis, allowing users to build end-to-end data science solutions and integrate with a variety of other technologies. Stata, on the other hand, is a specialized language designed specifically for statistical analysis and data management. This specialization makes Stata efficient and easy to use for statistical analyses, but it comes at the expense of general versatility.

## Syntax

Python has a general-purpose syntax that supports multiple programming paradigms, including procedural, object-oriented, and functional programming. This flexibility allows users to adopt different coding styles and adapt Python to various application domains. In contrast, Stata has a command-driven syntax that is specialized for statistical analysis. This syntax is designed to be intuitive and user-friendly, allowing its users to focus more on the results.

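The following small illustration of these paradigms is a sketch added for this comparison and is not taken from the course materials. It computes the same mean wage in three different ways. The function and class names and the wage values are all made up.

```python
from functools import reduce


# Procedural style: an explicit loop that updates a running total.
def mean_procedural(values):
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Object-oriented style: the data and the behavior live together in a class.
class Sample:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


# Functional style: fold the list with reduce instead of mutating a variable.
def mean_functional(values):
    return reduce(lambda acc, v: acc + v, values, 0.0) / len(values)


wages = [12.5, 15.0, 20.0, 22.5]
print(mean_procedural(wages))  # 17.5
print(Sample(wages).mean())    # 17.5
print(mean_functional(wages))  # 17.5
```

In Stata, you would usually get the same number from a single command such as `summarize wage`, which reflects the command-driven style described above.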
## Packages/Libraries

Python has a large ecosystem of packages that can be easily installed and used. Many of these packages complement each other, which gives you a lot of flexibility in how you conduct your analysis. Stata, on the other hand, relies more on its built-in commands and functions. User-written packages do exist for Stata, but they lack the breadth of Python's package ecosystem.

## Data Management

Both languages can be used for datasets of various sizes, and Stata is known for being efficient when working with large datasets. However, Python is more flexible in handling diverse data structures beyond simple tabular datasets.

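The following minimal sketch shows what "diverse data structures" can mean in practice. It assumes `pandas` is installed. Plain Python lists and dictionaries can hold nested records directly, and `pd.json_normalize` can flatten those records into a table when you need one. The survey records below are invented for illustration.

```python
import pandas as pd

# Invented survey records: each respondent has a nested "income" dictionary
# and a list of jobs whose length varies from person to person.
records = [
    {"id": 1, "income": {"wage": 50000, "bonus": 2000}, "jobs": ["teacher"]},
    {"id": 2, "income": {"wage": 65000, "bonus": 0}, "jobs": ["analyst", "tutor"]},
]

# Plain Python structures hold the nested data as-is.
print(len(records[1]["jobs"]))  # 2

# pandas flattens the nested dictionaries into dot-separated columns;
# the list-valued "jobs" field is kept as a single column of lists.
df = pd.json_normalize(records)
print(sorted(df.columns))  # ['id', 'income.bonus', 'income.wage', 'jobs']
print(df["income.wage"].mean())  # 57500.0
```

A Stata dataset, by contrast, is a single rectangular table. Data like this would therefore typically need to be reshaped into rows and columns before Stata can work with it.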
## Learning Curve

Stata's syntax is designed to be intuitive and easy to use. This keeps its learning curve gentle and lets users focus more on the results. In contrast, Python's learning curve may be steeper because it is a general-purpose programming language.

## Cost

Python is free and open-source, so anyone can use it. Stata, on the other hand, requires a paid license, with different pricing tiers depending on what you need. You can learn more about Stata's pricing structure [here](https://www.stata.com/order/).

## Community

Python has a large and active community, with many tutorials, resources, and sources of debugging help. Stata also has an active community, but it is not as large as Python's, which can make debugging and learning new techniques harder.
@@ -0,0 +1,13 @@
# History of the Languages

This subchapter will discuss the history of how Python and Stata were created, as well as their intended uses. In doing so, it will shed further light on how and why the differences between Python and Stata came to be.

###### <span style="color:red"> To be updated </span>

<!-- ## History of Python
HI
## History of Stata
Stata was initially released in 1985, back when PCs were just being introduced to the market. The first few releases did not have many features. Things began to change with the release of the `program` command in Stata 1.3, which allowed users to add their own commands. -->
@@ -0,0 +1,5 @@
# Python vs Stata

This chapter gives you a brief overview of the differences between Python and Stata. We will cover the history of the two languages, how they work, and the key differences in how they interact with data.

<!-- For more information, please look at [Python's official page](https://www.python.org/) and [Stata's official page](https://www.stata.com/) respectively. -->
@@ -0,0 +1,31 @@
# Installation and General Usage

## Installation

### Python

To install Python, download the installer for the appropriate version from [www.python.org/downloads/](https://www.python.org/downloads/) and run it. Once the installer has finished, enter `python --version` in your terminal/command line to verify that the installation worked. On many systems, particularly macOS and Linux, the command is `python3 --version` instead.

After that, it is recommended that you also install several packages to help you work with data. More information on how to use these packages can be found in the Syntax section, but for now I will just discuss how to install them.

For this online book, you are only required to install `pandas`, `numpy`, `matplotlib`, `regex`, `seaborn` and `scikit-learn`. It is also a good idea to explore other packages like `scipy` and `plotly` if you wish. To install any of these packages, type `pip install` followed by the package name in your terminal. For example, to install pandas, you would type `pip install pandas`. If you're interested, you can find more information about `pip` [here](https://pypi.org/project/pip/).

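After running the `pip install` commands, you may want to confirm that everything can be imported. The sketch below shows one way to do this using only the standard library's `importlib.util.find_spec`. The `REQUIRED` mapping and the `missing_packages` helper are illustrative names and are not part of any package. Note that the name you import can differ from the name you install: `scikit-learn` is imported as `sklearn`.

```python
import importlib.util

# Map each pip package name to the name used in an `import` statement.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "regex": "regex",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Run: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

You can run this in a notebook or with `python` from the terminal. It prints either a confirmation or the `pip install` command needed for any packages that are still missing.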
I also recommend installing JupyterLab, as it provides an interactive environment where you can easily conduct data analysis and see what your data looks like. To do so, type `pip install jupyterlab` as above. Once you have installed JupyterLab, you can start it by typing `jupyter lab` in your terminal. This should open a JupyterLab environment in your default browser. From there, you can view and open various files and create new notebooks for data analysis. You can learn more about this on the Jupyter [website](https://jupyter.org/).

### Stata

To install Stata, you need to purchase a license from the Stata [website](https://www.stata.com/order/). After that, you will receive an activation key along with further instructions on how to install the software.

## General Usage

### Python

In Python, data analysis is typically carried out using libraries like pandas and NumPy, and the approach to data storage and analysis differs from Stata's. Because Python is a general-purpose programming language, you are not required to load any dataset or import any package. However, once you have imported the relevant libraries, you can easily load datasets using pandas. The primary pandas data structure for data manipulation and analysis is the DataFrame. With JupyterLab notebooks, you can view different datasets in different cells, which makes it easy to work on multiple datasets at once.

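The following minimal sketch illustrates that workflow, assuming `pandas` is installed. Two small, made-up DataFrames sit in memory side by side under their own names, and you can merge them at any point. In practice you would usually load data from a file (for example with `pd.read_csv`) rather than typing it in.

```python
import pandas as pd

# Two small, made-up datasets held in memory at the same time.
wages = pd.DataFrame({"person_id": [1, 2, 3], "wage": [15.0, 22.5, 18.0]})
education = pd.DataFrame({"person_id": [1, 2, 3], "years_school": [12, 16, 14]})

# Each DataFrame stays available under its own name, so both can be
# inspected or modified independently.
print(wages.shape, education.shape)  # (3, 2) (3, 2)

# They can also be combined whenever needed, without unloading either one.
combined = wages.merge(education, on="person_id")
print(combined)
```

In Stata, a comparable workflow usually means saving one dataset to disk. You would then bring it into the dataset in memory with a command such as `merge 1:1 person_id using` followed by the file name.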
However, because Python is a general-purpose language that relies on packages for data analysis, users often have to combine multiple packages to conduct different types of analyses. For instance, pandas is used extensively for data manipulation and basic statistical analyses. Additional libraries like NumPy are used for numerical operations, and SciPy might be used for more advanced statistical techniques. Visualization can be handled with libraries such as Matplotlib or Seaborn, and machine learning tasks might involve libraries like scikit-learn, TensorFlow, or PyTorch. This approach involves more work from the user, but it also allows a greater level of customization and adaptability to what the user needs.

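The following small sketch shows this division of labor, assuming `pandas` and `numpy` are installed. NumPy computes a log transformation while pandas handles grouping and averaging on the same DataFrame. The wage data are invented.

```python
import numpy as np
import pandas as pd

# Invented data: hourly wages for workers in two sectors.
df = pd.DataFrame({
    "sector": ["public", "private", "public", "private"],
    "wage": [20.0, 25.0, 30.0, 35.0],
})

# NumPy: a vectorized numerical transformation (natural log of wages).
df["log_wage"] = np.log(df["wage"])

# pandas: split by sector and average wages within each group.
summary = df.groupby("sector")["wage"].mean()
print(summary["private"], summary["public"])  # 30.0 25.0
```

Plotting `summary` would bring in a third library. For example, pandas' `summary.plot(kind="bar")` uses Matplotlib behind the scenes.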
### Stata

When you're working in Stata, you have one dataset loaded into memory at a time. This dataset is the focus of any operations or analyses you perform. Stata's memory structure is designed around this single-dataset paradigm, which means that you cannot directly view or manipulate multiple datasets concurrently. Stata 16 and later offer *frames*, which relax this restriction, but the single-dataset workflow remains the default.

However, you do not need to worry about installing additional packages, because Stata offers a comprehensive set of built-in commands. These commands cover a wide range of statistical analyses and data manipulation techniques, including descriptive statistics, regression analysis, hypothesis testing, and data cleaning. In Python, you might need to combine multiple packages to conduct the same analyses.
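The following sketch makes the contrast concrete by showing rough Python counterparts to two Stata built-ins, assuming `pandas` and `numpy` are installed. `DataFrame.describe()` stands in for `summarize`. A NumPy least-squares fit stands in for the coefficient estimates from `regress wage years_school`. The data are invented so that the true line is `wage = 5 + 2 * years_school`. Unlike Stata's `regress`, this sketch returns only the coefficients, not standard errors or other statistics.

```python
import numpy as np
import pandas as pd

# Invented data constructed so that wage = 5 + 2 * years_school exactly.
df = pd.DataFrame({
    "years_school": [10, 12, 14, 16, 18],
    "wage": [25.0, 29.0, 33.0, 37.0, 41.0],
})

# Rough analogue of Stata's `summarize`: count, mean, std, min, quartiles, max.
print(df.describe())

# Rough analogue of the coefficients reported by `regress wage years_school`:
# stack a constant column next to the regressor and solve least squares.
X = np.column_stack([np.ones(len(df)), df["years_school"].to_numpy()])
y = df["wage"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
print(round(intercept, 6), round(slope, 6))  # 5.0 2.0
```

To get regression tables closer to Stata's output, a further package such as `statsmodels` is commonly used. This is another instance of the package-combining pattern described above.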
@@ -0,0 +1,5 @@
# Syntax

This subchapter will go over the key syntactic differences between data analysis in Stata and Python. Where possible, it will draw parallels between how Stata and Python conduct data analysis, pointing out which commands are similar and in what ways. It will also include a tool for typing in Stata commands and receiving the equivalent Python output.

###### <span style="color:red"> To be updated</span>
@@ -0,0 +1,5 @@
# Prerequisites

This chapter aims to familiarize you with the tools we will be using throughout this textbook. It is not a substitute for practicing with Python: use it as an aid, but remember to practice.

<!-- For more information, please look at [Python's official page](https://www.python.org/) and [Stata's official page](https://www.stata.com/) respectively. -->
@@ -0,0 +1,138 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "73656491-d0c1-49b0-8b32-ab92f0f3de56",
   "metadata": {},
   "source": [
    "## Jupyter Notebooks\n",
    "\n",
    "Throughout this textbook, we will be using [Jupyter notebooks](https://jupyter.org/) to work with data. You can use `pandas` without Jupyter notebooks, but using notebooks is highly encouraged. Notebooks let you view datasets interactively and easily see the results of many small iterations of data processing. If you do not already have Jupyter installed, you can find installation instructions [here](../01-python_v_stata/install.md)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62c2ba61-d3b9-40f1-8c83-fdc6a0e452aa",
   "metadata": {},
   "source": [
    "## Basic Layout\n",
    "\n",
    "The basic layout of a Jupyter notebook is fairly intuitive. A notebook is made up of cells, and you can use different cells for different tasks. Each cell can be a code cell, a markdown cell (often used for writing text, like the cell containing this paragraph), or a raw cell (don't worry about these for now). You can run each code cell individually, and you can also run the entire notebook in one go via the menu bar at the top. Feel free to explore the menu bar to get a better idea of everything you can do!\n",
    "\n",
    "Each time you open a Jupyter notebook, you are assigned a kernel, which is the computational engine that executes the code contained in the notebook. Each kernel has a RAM limit, so be careful not to exceed it! If you do, your code will stop working. You can restart your kernel from the menu bar.\n",
    "\n",
    "If you're interested, you can find a detailed guide [here](https://www.youtube.com/watch?v=HW29067qVWk&ab_channel=CoreySchafer)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ea83480-29aa-443b-a8e5-63a335821d7f",
   "metadata": {},
   "source": [
    "## Tips and Tricks"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d510689f-6773-4b91-b687-ac952743a6e4",
   "metadata": {},
   "source": [
    "The rest of this subchapter focuses on tips and tricks that make life easier when working with Jupyter notebooks."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff7a62d2-1aa8-44a6-ae09-04c3dbbc46b0",
   "metadata": {},
   "source": [
    "### Viewing Documentation\n",
    "\n",
    "You can use Jupyter to view the documentation of functions inside your notebook. For this to work, the function must already be defined in the kernel (i.e., the cell defining the function must have already been run).\n",
    "\n",
    "Click anywhere on `print` in the cell below and press `Shift` + `Tab` to view the function's documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "132998cf-cdc8-4e22-97a4-4d3a6c58064f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I hope you have a great day :)\n"
     ]
    }
   ],
   "source": [
    "print('I hope you have a great day :)')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c31df35-6455-4cfa-bbef-0a6b43eaaebf",
   "metadata": {},
   "source": [
    "### Importing Libraries\n",
    "\n",
    "In Jupyter notebooks, we often work with functions from multiple libraries. It is good practice to import all the libraries you will be using at the top of the notebook. We have followed this practice in every chapter/subchapter of the course. Since we do not need any libraries for this subchapter, we have not imported anything."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01e8c708-b691-4cc3-ae92-9a8b96a040ec",
   "metadata": {},
   "source": [
    "### Keyboard Shortcuts\n",
    "\n",
    "Even if you are familiar with Jupyter, we strongly encourage you to become proficient with keyboard shortcuts, as this will save you time in the future. To learn about keyboard shortcuts, go to **Help --> Keyboard Shortcuts** in the menu above.\n",
    "\n",
    "Here are a few that we like:\n",
    "1. `Ctrl`/`Cmd` + `Return`: *Evaluate the current cell*\n",
    "1. `Shift` + `Return`: *Evaluate the current cell and move to the next*\n",
    "1. `Ctrl`/`Cmd` + `+` or `-`: *Zoom in or out*\n",
    "1. `Ctrl`/`Cmd` + `/`: *Comment or uncomment the selected code at once*\n",
    "1. `ESC` : *command mode* (press before using any of the commands below)\n",
    "    1. `a` : *create a cell above*\n",
    "    1. `b` : *create a cell below*\n",
    "    1. `dd` : *delete a cell*\n",
    "    1. `z` : *undo the last cell operation*\n",
    "    1. `m` : *convert a cell to markdown*\n",
    "    1. `y` : *convert a cell to code*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5ad236a-e2b4-45d6-b2e6-694b2da63f9e",
   "metadata": {},
   "source": [
    "### Running Cells\n",
    "\n",
    "Aside from keyboard shortcuts (specifically `Shift` + `Return`), you can also run a single cell by clicking the `Run` button in the menu bar at the top of your notebook. If you hover over the button, you will also find other options that let you run multiple cells. Note that restarting your kernel clears all saved variables and frees up memory. The `Restart Kernel and Run up to Selected Cell` option is particularly useful when you believe your code is correct but you're running into strange memory issues."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:sklearn-env]",
   "language": "python",
   "name": "conda-env-sklearn-env-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}