Prepared data is availble as Apache Parquet file. To read parquet file, first install:
pip install pyarrow
Then use pd.read_parquet()
:
df_ohlcv = pd.read_parquet("sec_ohlcv.parquet")
pandas_datareader
is a python package to download data from multiple sources, e.g. yahoo finance, Eurostat and OECD. As the package's name suggests, the data will be directly pulled into a pandas dataframe. The documentation for this package is limited, therefore you might need use shift + tab
in jupyter notebooks or dig into the source code to understand the available parameters.
To install pandas_datareader
:
pip install pandas_datareader
The following code pulls BMW historical (split and dividend) adjusted price between 1/1/2001 and 1/2/2021 from yahoo finance:
import pandas_datareader as pdr
pdr.get_data_yahoo("BMW.DE", start="2001-01-01", end="2021-02-01")
By specifying adjust_price=True
, pandas_datareader
adjusts all prices (Open
, High
, Low
, Close
) based on Adj Close
price. It also adds Adj_Ratio
column and drops Adj Close
.
pdr.get_data_yahoo("BMW.DE", start="2001-01-01", end="2021-02-01", adjust_price=True)
You may also download multiple stock data at the same time, though you need some pandas kung-fu to transform the shape of the dataframe:
data = pdr.get_data_yahoo(["ADS.DE", "BMW.DE"], start="2001-01-01", end="2021-02-01", adjust_price=True)
df = data.stack(level=1).reset_index(level=1)
Making the same request repeatedly can use a lot of bandwidth, slow down your code and may result in your IP being banned. pandas-datareader
allows you to cache queries into a sqlite
database using requests_cache
. First make sure you have installed requests_cache
:
pip install requests_cache
The following function can download and cache your requests for BMW historical price between 1/1/2001 and today from yahoo finance. web.DataReader()
exposes limited parameters, therefore we can only get adjusted closing price:
from datetime import timedelta, date
import pandas_datareader.data as web
import requests_cache
def get_data(symbols, start="1/1/2001", end=date.today()):
session = requests_cache.CachedSession(
cache_name="cache", backend="sqlite", expire_after=timedelta(days=1)
)
data = web.DataReader(symbols, "yahoo", start, end, session=session)
return data
get_data("BMW.DE")
Wharton Research Data Services (WRDS) is online platform that provides access to analytics and historical financial and accounting data for corporations and banks, historical economic data, and tutorials on how to access them. We will count on WRDS to pull security price, fundamental, and analyst data.
After finishing initial setup, you can initialize a new connection by using the following code:
db = wrds.Connection(wrds_username='WRDS_USERNAME')
You can use this connection object (db
) to query WRDS database:
db.raw_sql(
"""
SELECT COL_1, COL_2 FROM SCHEMA.TABLE
""",
date_cols=["DATECOLUMN"])
Or directly pull the table:
db.get_table(library="SCHEMA", table="TABLE", columns=["COL_1", "COL_2"], obs=10, index_col="COL_INDEX", date_cols="DATECOLUMN")
Always disconnect using close()
when you exit your Python environment or complete your data query download step and want to move onto another:
db.close()
You can also use with
statement which automatically closes the connection when the query is finished:
def get_data_wrds():
with wrds.Connection(wrds_username="WRDS_USERNAME") as db:
data = db.raw_sql(
"""
SELECT COL_1, COL_2 FROM SCHEMA.TABLE
""",
date_cols=["DATECOLUMN"],
)
return data
You can find company specific code using WRDS company code lookup. Note that the most standard identifiers, such as companies’ tickers and CUSIPs, tend to change over time. Therefore, using GVKEY (Global Company Key) which is a unique number assigned to each company in the Compustat is always prefered. To pull the company information of BMW from Compustat Global Daily library:
SELECT *
FROM comp_global_daily.g_company
WHERE gvkey = '100022';
The following code queries against the daily prices of BMW since 1/1/2001, note that the adjusted price is PRICE/AJEXDI:
SELECT
datadate,
prcod / ajexdi AS adj_prcod,
prchd / ajexdi AS adj_prchd,
prcld / ajexdi AS adj_prcld,
prccd / ajexdi AS adj_prccd,
cshtrd
FROM
comp_global_daily.g_sec_dprc
WHERE
datadate >= '2001-01-01'::date
AND gvkey = '100022'
AND iid = '01W'
ORDER BY
date;
TRFD includes cash equivalent distributions, reinvestment of dividends and the compounding effect of dividends paid on reinvested dividends. To compute returns, the dividend ajusted price is PRICE/AJEXDI×TRFD:
SELECT
datadate,
prcod / ajexdi * trfd AS r_prcod,
prchd / ajexdi * trfd AS r_prchd,
prcld / ajexdi * trfd AS r_prcld,
prccd / ajexdi * trfd AS r_prccd,
cshtrd
FROM
comp_global_daily.g_sec_dprc
WHERE
datadate >= '2001-01-01'::date
AND gvkey = '100022'
AND iid = '01W'
ORDER BY
date;
We will limit our scope to major European exchanges in Austria, Belgium, Denmark, Finland, France, Germany, Ireland, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom. Here are their codes in WRDS:
exchg | exchgdesc |
---|---|
104 | NYSE Euronext Amsterdam |
132 | NYSE Euronext Brussels |
151 | Swiss Exchange |
154 | Deutsche Boerse AG |
171 | XETRA |
172 | Irish Stock Exchange |
192 | NYSE Euronext Lisbon |
194 | London Stock Exchange |
201 | Bolsa De Madrid |
209 | Borsa Italiana |
228 | Oslo Bors ASA |
256 | NASDAQ OMX Nordic |
257 | Boerse Stuttgart |
273 | Wiener Börse |
286 | NYSE Euronext Paris |
If you are looking for other exchanges' codes, run the following queries:
SELECT
*
FROM
comp_global_daily.r_ex_codes;
To look up companies' listed stock on major European exchanges using EUR, run the following querry:
SELECT
sec.gvkey,
sec.iid,
excntry
FROM
comp_global_daily.g_security AS sec
JOIN comp_global_daily.g_funda AS fund ON sec.gvkey = fund.gvkey
AND sec.iid = fund.iid
WHERE
sec.exchg = ANY (ARRAY [104, 107, 132, 151, 154, 171, 192, 194, 201, 209, 256, 257, 273, 276, 286]);
Global companies like BMW list their shares on multiple exchanges. We want to find the reference listing which also has available fundamental data. To find European company's stock issues which links to their fundamental data, run the following query:
SELECT
*
FROM
comp_global_daily.g_security
WHERE
exchg = ANY (ARRAY[104,132, 171, 151, 192, 194, 201, 209, 256, 286])
AND ibtic IS NOT NULL;
Using the query below, we can find the daily stock prices for companies on major European exchanges using EUR:
SELECT
datadate,
price.gvkey,
curcdd,
prcod / ajexdi AS prcod,
prchd / ajexdi AS prchd,
prcld / ajexdi AS prcld,
prccd / ajexdi AS prccd,
cshtrd
FROM
comp_global_daily.g_secd AS price,
( SELECT DISTINCT
gvkey,
iid
FROM
comp_global_daily.g_funda) AS eu
WHERE
datadate >= '2000-01-01'::date
AND exchg = ANY (ARRAY [104, 132, 151, 154, 171, 172, 192, 194, 201, 209, 228, 256, 257, 273, 286])
AND price.cshtrd IS NOT NULL
AND price.gvkey = eu.gvkey
AND price.iid = eu.iid;
Fundamenatal data is available on WRDS as a table called g_fundq
. The columns' definition can be found here. The following code queries yearly fundamental data of Siemens AG in 2005:
SELECT
datadate,
conm,
-- currency code
curcd,
-- current assets
act,
-- cash and cash qquivalents at end of year
chee,
-- short-term investments
ivst,
-- cash and short-term investments, che = chee + ivst
che,
-- net accounts receivable
coalesce(rectr, rectrfs) AS rectr,
-- other current receivables
coalesce(recco, reccofs) AS recco,
-- total receivables, rect = rectr + recco
coalesce(rect, artfs) AS rect,
-- inventories
invt,
-- other current assets
coalesce(aco, acox, acofs, acoxfs) AS aco,
-- total current assets
act,
-- net property, plant and equipment
ppent,
-- total intangible assets
intan,
-- other assets
ao,
-- total assets
at,
-- accounts payable
ap,
-- total current debt
dlc,
-- total current liabilities
lct,
-- long-term debt
dltt,
-- total liabilities
lt,
-- common stock
cstk,
-- retained earnings, sometimes 0, need to be adjusted
re,
-- total common equity
ceq,
-- total equity
coalesce(teq, lse - lt) AS teq,
-- total liabilities and equity
lse,
-- total debt
dlc + dltt AS total_debt,
-- net debt
dlc + dltt - che AS net_debt,
-- working capital
wcap,
-- revenue
revt,
-- costs of goods sold
cogs,
-- research and development expense
xrd,
-- interests expense
xint,
-- income tax
txt,
-- net income from continuing operations
nicon,
-- net income from extraordinary items and discontinued operations
xido,
-- net income
nicon + xido as net_income,
-- ebit
ebit,
-- ebitda
ebitda,
-- eps excluding extraordinary items
epsexcon,
-- eps including extraordinary items
epsincon,
-- Invested capital, icapt = teq + dltt
icapt,
-- cash from operations from continuing and discontinued operations
oancf,
-- CAPEX
capx
FROM
comp_global_daily.g_funda
WHERE
gvkey = '019349'
AND datadate = '2005-09-30'::date;
It is also possible to include the estimates made by analysts to improve the model performance. The surprise summary data can be queried using the following code:
SELECT
*
FROM
ibes.surpsum AS surprise,
(
SELECT
*
FROM
comp_global_daily.g_security
WHERE
exchg = ANY (ARRAY [104,132, 171, 151, 192, 194, 201, 209, 256, 286])
AND ibtic IS NOT NULL) AS security_eu
WHERE
surprise.ticker = security_eu.ibtic
AND surprise.anndats >= '2001-01-01'::date
ORDER BY
surprise.anndats;
SELECT
curr.datadate,
curr.tocurd,
curr.exratd / eur.exratd AS exratd
FROM (
SELECT
datadate,
tocurd,
exratd
FROM
comp.g_exrt_dly
WHERE
tocurd = ANY (ARRAY ['ATS', 'AUD', 'BBD', 'BEF', 'BWP', 'CAD', 'CHF', 'CNY', 'CZK', 'DEM', 'DKK', 'EEK', 'EGP', 'ESP', 'FIM', 'FRF', 'GBP', 'GEL', 'GRD', 'HKD', 'HUF', 'IEP', 'ILS', 'INR', 'ISK', 'ITL', 'JPY', 'LTL', 'MXN', 'MYR', 'NLG', 'NOK', 'NZD', 'PGK', 'PLN', 'PTE', 'RUB', 'SAR', 'SEK', 'SGD', 'SKK', 'TRY', 'UAH', 'USD', 'XAF', 'XOF', 'ZAR', 'ZMK', 'ZMW'])) AS curr
JOIN (
SELECT
datadate, exratd
FROM
comp.g_exrt_dly
WHERE
tocurd = 'EUR') AS eur ON curr.datadate = eur.datadate;