This repository contains a series of functions that can be used to scrape AZLyrics.com and download lyrics in txt format.
Lyrics can be used for many Natural Language Processing tasks: for example, you can read the report of my Sentiment Analysis of Bruce Springsteen's songs on my website.
save_file(path, text, replace=False)
This is an auxiliary function used to save a given text in a txt file.
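The replace flag decides what happens when a file with the same name already exists. The scrape_artist parameters describe replace=False as keeping same-named songs in separate files and replace=True as keeping only the latest one. Below is a minimal sketch of that behaviour, assuming that "separate files" means adding a numeric suffix such as "title (1).txt". The name save_file_sketch and the suffix format are hypothetical and not taken from the repository.

```python
import os


def save_file_sketch(path, text, replace=False):
    """Hypothetical sketch of a save_file-style helper (not the repository's code).

    Writes `text` to `path`. If replace is False and `path` already exists,
    a numeric suffix is added instead of overwriting: "song.txt" -> "song (1).txt".
    Returns the path that was actually written.
    """
    # Make sure the destination folder exists.
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)

    if not replace:
        base, ext = os.path.splitext(path)
        counter = 1
        # Find the first free filename so earlier songs with the same title survive.
        while os.path.exists(path):
            path = f"{base} ({counter}){ext}"
            counter += 1

    # With replace=True an existing file is simply overwritten.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```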
get_lyrics(song_url, save=True, by_decade=False, replace=False, folder="songs")
This function can be used to download the lyrics of a single song, given its page url.
Function Parameters:
song_url: the AZLyrics url of the page containing the song.
save: if True (default), the lyrics are saved in a txt file named after the song title. If False, the function just returns the song title, lyrics and year as a 3-element tuple.
by_decade: if True, and save=True, the lyrics are saved in a folder named after the decade in which the song was produced. If False (default), and save=True, the lyrics are just stored in a generic folder.
folder: the name of the folder where the txt lyrics will be saved.
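With by_decade=True, the song's year determines a sub-folder. The sketch below shows one way to map a year to its decade folder, assuming folder names of the form "1970s" and file names equal to the song title. The helper lyrics_path and the naming scheme are illustrative assumptions, not the repository's actual layout.

```python
import os


def lyrics_path(title, year, by_decade=False, folder="songs"):
    """Hypothetical sketch: build the txt path for a song (not the repository's code).

    With by_decade=True the file goes into a sub-folder named after the decade,
    e.g. year 1977 -> "songs/1970s/<title>.txt" (the "1970s" naming is an assumption).
    """
    if by_decade:
        decade = (int(year) // 10) * 10  # 1977 -> 1970
        folder = os.path.join(folder, f"{decade}s")
    return os.path.join(folder, f"{title}.txt")
```

For example, lyrics_path("The Chain", 1977, by_decade=True) gives songs/1970s/The Chain.txt on a POSIX system.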
scrape_artist(az_url, sleep="random", by_decade=True, replace=False, folder="songs")
This function downloads all the lyrics of a given artist, starting from their page url.
Function Parameters:
az_url: the artist's main page url on AZLyrics.
sleep: the sleeping time (in seconds) between iterations. By default it is set to "random", which means that the sleeping time is randomly selected at each iteration between 5 and 15 seconds. This has been tested and it avoids being recognized as a bot, which would get your IP temporarily banned. You can also set this to a custom time, but it is recommended to keep it around 10 seconds, as shorter intervals may be problematic.
by_decade: if True (default), the lyrics are saved in a folder named after the decade in which the song was produced. If False, the txt lyrics are just stored in a generic folder.
replace: if False (default) and two or more songs have the same name, all lyrics are saved in separate files. If True, only the latest one gets saved.
folder: the name of the folder where the txt lyrics will be saved.
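The sleep parameter controls the pause between consecutive requests. Here is a minimal sketch of such a polite crawl loop, assuming one request per song url. The names pick_sleep and crawl are hypothetical, and the fetch and sleeper functions are injected so the loop can be tried without network access; this is not the repository's implementation.

```python
import random
import time


def pick_sleep(sleep="random"):
    """Return the pause (in seconds) before the next request."""
    if sleep == "random":
        # Uniformly random pause between 5 and 15 seconds.
        return random.uniform(5, 15)
    return float(sleep)


def crawl(urls, fetch, sleep="random", sleeper=time.sleep):
    """Hypothetical sketch of a scrape_artist-style loop (not the repository's code).

    Calls fetch(url) for every url and pauses between consecutive requests.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleeper(pick_sleep(sleep))  # no pause before the very first request
        results.append(fetch(url))
    return results
```

Randomizing the interval, rather than using a fixed one, makes the request pattern look less mechanical, which is the rationale the sleep description gives for the "random" default.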
get_artists(letter, home="https://www.azlyrics.com/")
Another auxiliary function, which returns the urls and names of all the artists whose names start with a given letter.
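To illustrate the idea, the sketch below extracts (url, name) pairs from an already-downloaded letter page using only the standard library's html.parser. The repository may well use a different parser. It assumes that artist links are relative hrefs such as "s/springsteen.html", matching the shape of the artist url used in the examples. That link pattern, and the names parse_artists and ArtistLinkParser, are assumptions; the sketch does not fetch anything.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: artist links look like "s/springsteen.html"
# (one character, a slash and an .html file name).
ARTIST_HREF = re.compile(r"^[a-z0-9]/[^/]+\.html$")


class ArtistLinkParser(HTMLParser):
    """Collect (url, name) pairs for <a> tags whose href matches ARTIST_HREF."""

    def __init__(self, home):
        super().__init__()
        self.home = home
        self.artists = []
        self._href = None  # href of the currently open artist link, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if ARTIST_HREF.match(href):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            # Turn the relative href into an absolute url.
            self.artists.append((urljoin(self.home, self._href), name))
            self._href = None


def parse_artists(html, home="https://www.azlyrics.com/"):
    """Return a list of (artist_url, artist_name) tuples found in `html`."""
    parser = ArtistLinkParser(home)
    parser.feed(html)
    return parser.artists
```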
scrape_all(letters="all", sleep="random", by_decade=True, replace=False, folder="songs")
This function downloads all the lyrics of all artists whose names start with the given letters. Note: I have estimated that, in order to download every song of every artist on AZLyrics, the function would have to run non-stop for roughly 27 weeks, given an average sleep time of 10 seconds between iterations. So this is just for fun, and not meant for actual use.
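Working the estimate backwards shows its scale: 27 weeks at about 10 seconds per iteration corresponds to roughly 1.6 million song pages. This assumes one request per song and that the sleep time dominates each iteration. It is only arithmetic on the numbers above, not a measured count of the songs on AZLyrics.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def implied_requests(weeks, avg_sleep_seconds=10):
    """Number of iterations that fit in `weeks` of non-stop running,
    assuming each iteration costs about `avg_sleep_seconds`."""
    return weeks * SECONDS_PER_WEEK // avg_sleep_seconds


print(implied_requests(27))  # 1632960
```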
Function Parameters:
letters: a list containing all the letters whose corresponding artists' lyrics should be downloaded. By default, it is set to "all", which means that every lyric on AZLyrics is downloaded.
sleep: the sleeping time (in seconds) between iterations. By default it is set to "random", which means that the sleeping time is randomly selected at each iteration between 5 and 15 seconds. This has been tested and it avoids being recognized as a bot, which would get your IP temporarily banned. You can also set this to a custom time, but it is recommended to keep it around 10 seconds, as shorter intervals may be problematic.
by_decade: if True (default), the lyrics are saved in a folder named after the decade in which the song was produced. If False, the txt lyrics are just stored in a generic folder.
replace: if False (default) and two or more songs have the same name, all lyrics are saved in separate files. If True, only the latest one gets saved.
folder: the name of the folder where the txt lyrics will be saved. If set to "names", each artist's lyrics are downloaded into a separate folder named after the artist. Otherwise, all lyrics are collected in the same folder.
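The letters="all" default and the special folder="names" value can be sketched as two small helpers. This assumes "all" means the 26 letters a to z (any separate numeric category on AZLyrics is ignored here) and that artist names are made filesystem-safe by replacing characters such as "/" (as in "AC/DC"). The names resolve_letters, artist_folder and plan are hypothetical.

```python
import re
import string


def resolve_letters(letters="all"):
    """Expand the letters argument; "all" is assumed to mean a-z only."""
    if letters == "all":
        return list(string.ascii_lowercase)
    return [letter.lower() for letter in letters]


def artist_folder(folder, artist_name):
    """folder="names" gives each artist its own folder; any other value is shared."""
    if folder != "names":
        return folder
    # Replace characters that are not allowed in file names on common systems.
    return re.sub(r'[\\/:*?"<>|]', "_", artist_name).strip()


def plan(letters, artists_by_letter, folder="songs"):
    """Yield (letter, artist, target_folder) for every artist to be scraped."""
    for letter in resolve_letters(letters):
        for name in artists_by_letter.get(letter, []):
            yield letter, name, artist_folder(folder, name)
```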
Downloading all lyrics on AZLyrics.com (see the warning above!):
scrape_all(letters="all")
Downloading all lyrics of every artist starting with "a", each artist in a separate folder: (ETA = 1 week)
letter_list = ["a"]
scrape_all(letter_list, folder="names")
Downloading all lyrics of a given artist:
bruce = "https://www.azlyrics.com/s/springsteen.html"
scrape_artist(bruce, folder="bruce")
Downloading the lyrics of a single song:
the_chain = "https://www.azlyrics.com/lyrics/fleetwoodmac/thechain.html"
get_lyrics(the_chain, folder="fwm")