Skip to content

Latest commit

 

History

History
38 lines (28 loc) · 1.51 KB

NewLinkWebscrapingHints.md

File metadata and controls

38 lines (28 loc) · 1.51 KB

Tips for Webscraping Updated Table in Week3 Peer Graded Assignment

After retreiving the URL and creating a Beautiful soup object

Firstly create a list

Later after finding the table and table data create a dictionary called cell having 3 keys PostalCode, Borough and Neighborhood.

As postal code contains upto 3 characters extract that using tablerow.p.text

Next use split ,strip and replace functions for getting Borough and Neighborhood information..

Append to the list

Create a dataframe with list

Sample code

table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})