This project is made with the help of Selenium drive and the Python. This project is capable of removing followers and following from your profile and it can even download the media file such as images and videos of the post.
- Making a Login Script
- Accessing the timeline feeds and downloading the media files
- Managing the followers and followings
- How to run our code
- Chrome webdriver(same version as chrome browser)
- Requests library
- Selenium library
- Urllib3
For Chrome Web driver You can download it from here
If you are unable to install then you can follow this link Youtube
Installing libraries
pip install requests
pip install urllib3
pip install selenium
A CSS Selector is a combination of an element selector and a value which identifies the web element within a web page. The choice of locator depends largely on your Application Under Test
Id An element’s id in XPATH is defined using: “[@id='example']” and in CSS using: “#” - ID's must be unique within the DOM. Examples:
XPath: //div[@id='example']
CSS: #example
Element Type The previous example showed //div in the xpath. That is the element type, which could be input for a text box or button, img for an image, or "a" for a link.
Xpath: //input or
Css: =input
Direct Child HTML pages are structured like XML, with children nested inside of parents. If you can locate, for example, the first link within a div, you can construct a string to reach it. A direct child in XPATH is defined by the use of a “/“, while on CSS, it’s defined using “>”. Examples:
XPath: //div/a
CSS: div > a
Child or Sub-Child Writing nested divs can get tiring - and result in code that is brittle. Sometimes you expect the code to change, or want to skip layers. If an element could be inside another or one of its children, it’s defined in XPATH using “//” and in CSS just by a whitespace. Examples:
XPath: //div//a
CSS: div a
Class For classes, things are pretty similar in XPATH: “[@class='example']” while in CSS it’s just “.” Examples:
XPath: //div[@class='example']
CSS: .example
Xpath in Selenium is an XML path used for navigation through the HTML structure of the page. It is a syntax or language for finding any element on a web page using XML path expression. XPath can be used for both HTML and XML documents to find the location of any element on a webpage using HTML DOM structure.
Syntax for XPath selenium:
XPath contains the path of the element situated at the web page. Standard XPath syntax for creating XPath is.
Xpath=//tagname[@attribute='value']
- // : Select current node.
- Tagname: Tagname of the particular node.
- @: Select attribute.
- Attribute: Attribute name of the node.
- Value: Value of the attribute.
We will make a script which will take the username and password from the config file and login to the user account so that we can perform other activites First make a config file containing username and password.
USERNAME = ""
PASSWORD = ""
Writing script
First initiate the Chrome webdriver, base path and user url
BASE_DIR = os.path.dirname(__file__)
browser = webdriver.Chrome()
user_url = f"https://www.instagram.com/{USERNAME}"
Login Function
We will use the css selectors to locating the username field and password field. After entering the username and password at correct places, we click the submit button by location it by button[type='submit']
# login in the instagram account
def login(browser=browser, username=USERNAME, password=PASSWORD):
browser.get("https://www.instagram.com/")
time.sleep(2)
username_field = browser.find_element_by_name("username")
username_field.send_keys(username)
password_field = browser.find_element_by_name("password")
password_field.send_keys(password)
btn = browser.find_element_by_css_selector("button[type='submit']")
time.sleep(2)
btn.click()
It will open the user timeline and downloads media files depending upon the input passed.
def get_media(username=USERNAME, media_type=None, count=3, browser=browser)
.....
It will take 4 keyword argument.
- count It refers to the number of post it should search to download.
- username Don't pass any username parameter if you want to download the self account post otherwise mention the username of other people.
- media_type It can be anyone either img or video but you need to pass the parameter in lower case. Don't pass any parameter to download both media type.
- browser It is just an instance of web driver
def get_media(username=USERNAME, media_type=None, count=3, browser=browser):
person_url = f"https://www.instagram.com/{username}/"
browser.get(person_url)
# creating user media directory
src_path = os.path.join(BASE_DIR, f"{username}")
os.makedirs(src_path, exist_ok=True)
# getting media links from the function
media_elements = getting_all_post(media_type, count)
print(media_elements)
time.sleep(1.2)
...
...
It will first fetch the url. Then initiate the base dir for downloading media files.
Then It will call the getting_all_post function to retrieve all the media urls.
getting_all_post Function It will take two arguments
- media_type It can be anyone either img or video but you need to pass the parameter in lower case. Don't pass any parameter to download both media type.
- count It refers to the number of post it should search to download.
media nested array is intialized. The zero index will contains all the images in the post while the first index will contains all the video links
# returning src url from all post
def getting_all_post(media_type, count):
media = [[], []]
post_xpath = f"//a[contains(@href,'/p/')]"
post_media = browser.find_elements_by_xpath(post_xpath)
search_upto = len(post_media)
if count and count > search_upto:
print("only", search_upto, "media files alvailable")
print("downloading---", search_upto, "files only")
search_upto = count
elif count:
search_upto = count
for post in post_media[0:search_upto]:
browser.execute_script("arguments[0].click()", post)
time.sleep(2.5)
media[0] += get_src_from_post()
try:
media[1] += get_src_from_post(post_type="video")
except NoSuchElementException:
pass
close_btn()
# only returning the video url from all media url
if media_type == 'video':
return media[1]
elif media_type == 'img':
return media[0]
else:
return (media[0] + media[1])
It will get the src url of video and img using the get_src_from_post function so that we can download later.
get_src_from_post Function It will take the source url from a particular post
# returning src of a particular post
def get_src_from_post(post_type="img"):
links = []
file_path = "//div[@role='button']//img[@style]"
if post_type == "video":
file_path = "//video[@type='video/mp4']"
print(post_type)
files = browser.find_elements_by_xpath(file_path)
links += [file.get_attribute('src') for file in files]
return links
Once we get all the urls, we will now download all the media files required.
Continuing with the get_media function.
def get_media(username=USERNAME, media_type=None, count=3, browser=browser):
person_url = f"https://www.instagram.com/{username}/"
browser.get(person_url)
# creating user media directory
src_path = os.path.join(BASE_DIR, f"{username}")
os.makedirs(src_path, exist_ok=True)
# getting media links from the function
media_elements = getting_all_post(media_type, count)
print(media_elements)
time.sleep(1.2)
# looping through the media links
for element_url in media_elements:
# parsing the correct file name
base_url = urlparse(element_url).path
filename = os.path.basename(base_url)
print(base_url, "-------", filename)
# creating the get request to download the media files
with requests.get(element_url, stream=True) as r:
if r.status_code not in range(200, 299):
print("downloading fails, trying next")
filepath = os.path.join(src_path, f'{username}{filename}')
if os.path.exists(filepath):
print("file already exists")
continue
# open created file to write as binary
with open(filepath, 'wb') as f:
for chunk in r.iter_content():
if chunk:
f.write(chunk)
print(element_url)
print("-----------Done--------")
We will now iterate over the list of all media urls. We will be using the urlparse from the urllib3 library to name our particular media file.
Now we will request that url using the Requests library to download it. If the status_code is 200 (or response is ok) then we will write that file in the disk at the preferred location.
We can follow other suggested users or unfollow the ones that are not following us.
It will remove a particular no of following How does this code works ?
- Gets the user profile/timeline page
- Open the following section
- Then checks if count of following is greater then 12 then it will scroll and find others following as first page only contains 12 followings
- It will remove following until input no of followings are not removed
# removing a particular no of following
def remove_following(remove=0, browser=browser):
browser.get(user_url)
time.sleep(1.2)
open_following_section()
following_removed = 0
if remove > 12:
time.sleep(1.3)
scroll_to_bottom("((//nav)[3]//following::div)[1]")
time.sleep(1)
following_path = "//a[@title]"
following_sel_list = browser.find_elements_by_xpath(following_path)
while following_removed ≷ remove:
try:
user_name = following_sel_list[following_removed].text
following_x_path = f"//a[@title='{user_name}']//following::button"
following_btn = browser.find_element_by_xpath(following_x_path)
following_btn.click()
# print("removed----", user_name)
following_removed += 1
time.sleep(1.7)
unfollow_btn()
# print("un follow", following_removed)
except:
break
else:
print("successfully removed", following_removed, "followers")
close_btn()
It will use a open_following_section Function to follow the following section. open_following_section Function
def open_following_section():
follow_section_x_path = "//a[contains(@href,'following')]"
follow_section = browser.find_element_by_xpath(follow_section_x_path)
follow_section.click()
For scrolling bottom , we have used the scroll_to_bottom Function using the execute javascript function
# it scrolls a particular section of the site
def scroll_to_bottom(scroll_xpath):
scroll_time = 1.3
scroll_element = browser.find_element_by_xpath(scroll_xpath)
last_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
while True:
browser.execute_script("arguments[0].scrollTo(0, arguments[1]);", scroll_element, last_height)
time.sleep(scroll_time)
new_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
if new_height == last_height:
break
last_height = new_height
We have also used the unfollow_btn function to click the unfollow button.
# finds the unfollow btn and click it.
def unfollow_btn():
un_path = "//button[contains(text(),'Unfollow')]"
un_follow = browser.find_element_by_xpath(un_path)
un_follow.click()
Once our task is achieved, we closed that section using the close_btn.
def close_btn():
close_btn_path = "//*[name()='svg' and @aria-label='Close']"
close = browser.find_element_by_xpath(close_btn_path)
close.click()
It will help us to unfollow the users that are not following us How does this code works
- Gets the following page
- Then opens the following section
- Gets all the users so that we can compare later with the followers
- Then open the followers page
- Gets all the followers and compare with the following
- Remove the ones which are not following us
# it removes the following that are not your follower
def remove_following_not_followers(count=None, browser=browser):
browser.get(user_url)
time.sleep(1.3)
open_following_section()
time.sleep(1.3)
scroll_to_bottom("((//nav)[3]//following::div)[1]")
time.sleep(1)
following_path = "//a[@title]"
following_sel_list = browser.find_elements_by_xpath(following_path)
following_list = {x.text for x in following_sel_list}
print("followings are ", len(following_list), "--------------", following_list)
close_btn()
time.sleep(1)
open_follower_section()
time.sleep(1)
scroll_to_bottom("((//div[@role='dialog']//div)[1]//div)[5]")
followers_path = "(((//div[@role='dialog']//div)[1]//div)[5]//ul)[1]//a[@title]"
followers_sel_list = browser.find_elements_by_xpath(followers_path)
followers_list = {x.text for x in followers_sel_list}
close_btn()
print("followings are ", len(followers_list), "-------------------", followers_list)
unwanted_follower = following_list - followers_list
no_of_unwanted_follower = len(unwanted_follower)
print("length of unwanted follower", no_of_unwanted_follower)
time.sleep(1)
open_following_section()
time.sleep(1)
scroll_to_bottom("((//nav)[3]//following::div)[1]")
if count and count > no_of_unwanted_follower:
print("the count exceeded only", no_of_unwanted_follower, "unwanted followers are there")
no_of_unwanted_follower = count
elif count:
no_of_unwanted_follower = count
for following in unwanted_follower[0:no_of_unwanted_follower]:
print("removing---------", following)
following_x_path = f"//a[@title='{following}']//following::button"
following_btn = browser.find_element_by_xpath(following_x_path)
following_btn.click()
close_btn()
we have already discussed about the open_following_section , unfollow_btn, and close_btn Function. Lets discuss the other remained function used in this program.
open_follower_section Function
def open_follower_section():
follow_section_x_path = "//a[contains(@href,'followers')]"
follow_section = browser.find_element_by_xpath(follow_section_x_path)
follow_section.click()
import os
import requests
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from urllib.parse import urlparse
from config1 import USERNAME, PASSWORD
BASE_DIR = os.path.dirname(__file__)
browser = webdriver.Chrome()
user_url = f"https://www.instagram.com/{USERNAME}"
# login in the instagram account
def login(browser=browser, username=USERNAME, password=PASSWORD):
browser.get("https://www.instagram.com/")
time.sleep(2)
username_field = browser.find_element_by_name("username")
username_field.send_keys(username)
password_field = browser.find_element_by_name("password")
password_field.send_keys(password)
btn = browser.find_element_by_css_selector("button[type='submit']")
time.sleep(2)
btn.click()
# returning src of a particular post
def get_src_from_post(post_type="img"):
links = []
file_path = "//div[@role='button']//img[@style]"
if post_type == "video":
file_path = "//video[@type='video/mp4']"
print(post_type)
files = browser.find_elements_by_xpath(file_path)
links += [file.get_attribute('src') for file in files]
return links
# returning src url from all post
def getting_all_post(media_type, count):
media = [[], []]
post_xpath = f"//a[contains(@href,'/p/')]"
post_media = browser.find_elements_by_xpath(post_xpath)
search_upto = len(post_media)
if count and count > search_upto:
print("only", search_upto, "media files alvailable")
print("downloading---", search_upto, "files only")
search_upto = count
elif count:
search_upto = count
for post in post_media[0:search_upto]:
browser.execute_script("arguments[0].click()", post)
time.sleep(2.5)
media[0] += get_src_from_post()
try:
media[1] += get_src_from_post(post_type="video")
except NoSuchElementException:
pass
close_btn()
# only returning the video url from all media url
if media_type == 'video':
return media[1]
elif media_type == 'img':
return media[0]
else:
return (media[0] + media[1])
def get_media(username=USERNAME, media_type=None, count=3, browser=browser):
person_url = f"https://www.instagram.com/{username}/"
browser.get(person_url)
# creating user media directory
src_path = os.path.join(BASE_DIR, f"{username}")
os.makedirs(src_path, exist_ok=True)
# getting media links from the function
media_elements = getting_all_post(media_type, count)
print(media_elements)
time.sleep(1.2)
# looping through the media links
for element_url in media_elements:
# parsing the correct file name
base_url = urlparse(element_url).path
filename = os.path.basename(base_url)
print(base_url, "-------", filename)
# creating the get request to download the media files
with requests.get(element_url, stream=True) as r:
if r.status_code not in range(200, 299):
print("downloading fails, trying next")
filepath = os.path.join(src_path, f'{username}{filename}')
if os.path.exists(filepath):
print("file already exists")
continue
# open created file to write as binary
with open(filepath, 'wb') as f:
for chunk in r.iter_content():
if chunk:
f.write(chunk)
print(element_url)
print("-----------Done--------")
# removing a particular no of following
def remove_following(remove=0, browser=browser):
browser.get(user_url)
time.sleep(1.2)
open_following_section()
following_removed = 0
if remove > 12:
time.sleep(1.3)
scroll_to_bottom("((//nav)[3]//following::div)[1]")
time.sleep(1)
following_path = "//a[@title]"
following_sel_list = browser.find_elements_by_xpath(following_path)
while following_removed < remove:
try:
user_name = following_sel_list[following_removed].text
following_x_path = f"//a[@title='{user_name}']//following::button"
following_btn = browser.find_element_by_xpath(following_x_path)
following_btn.click()
# print("removed----", user_name)
following_removed += 1
time.sleep(1.7)
unfollow_btn()
# print("un follow", following_removed)
except:
break
else:
print("successfully removed", following_removed, "followers")
close_btn()
# finds the unfollow btn and click it.
def unfollow_btn():
un_path = "//button[contains(text(),'Unfollow')]"
un_follow = browser.find_element_by_xpath(un_path)
un_follow.click()
def open_following_section():
follow_section_x_path = "//a[contains(@href,'following')]"
follow_section = browser.find_element_by_xpath(follow_section_x_path)
follow_section.click()
def open_follower_section():
follow_section_x_path = "//a[contains(@href,'followers')]"
follow_section = browser.find_element_by_xpath(follow_section_x_path)
follow_section.click()
def close_btn():
close_btn_path = "//*[name()='svg' and @aria-label='Close']"
close = browser.find_element_by_xpath(close_btn_path)
close.click()
# it scrolls a particlar section of the site
def scroll_to_bottom(scroll_xpath):
scroll_time = 1.3
scroll_element = browser.find_element_by_xpath(scroll_xpath)
last_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
while True:
browser.execute_script("arguments[0].scrollTo(0, arguments[1]);", scroll_element, last_height)
time.sleep(scroll_time)
new_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
if new_height == last_height:
break
last_height = new_height
# it removes the following that are not your follower
def remove_following_not_followers(count=None, browser=browser):
browser.get(user_url)
time.sleep(1.3)
open_following_section()
time.sleep(1.3)
scroll_to_bottom("((//nav)[3]//following::div)[1]")
time.sleep(1)
following_path = "//a[@title]"
following_sel_list = browser.find_elements_by_xpath(following_path)
following_list = {x.text for x in following_sel_list}
print("followings are ", len(following_list), "--------------", following_list)
close_btn()
time.sleep(1)
open_follower_section()
time.sleep(1)
scroll_to_bottom("((//div[@role='dialog']//div)[1]//div)[5]")
followers_path = "(((//div[@role='dialog']//div)[1]//div)[5]//ul)[1]//a[@title]"
followers_sel_list = browser.find_elements_by_xpath(followers_path)
followers_list = {x.text for x in followers_sel_list}
close_btn()
print("followings are ", len(followers_list), "-------------------", followers_list)
unwanted_follower = following_list - followers_list
no_of_unwanted_follower = len(unwanted_follower)
print("length of unwanted follower", no_of_unwanted_follower)
time.sleep(1)
open_following_section()
time.sleep(1)
scroll_to_bottom("((//nav)[3]//following::div)[1]")
if count and count > no_of_unwanted_follower:
print("the count exceeded only", no_of_unwanted_follower, "unwanted followers are there")
no_of_unwanted_follower = count
elif count:
no_of_unwanted_follower = count
for following in unwanted_follower[0:no_of_unwanted_follower]:
print("removing---------", following)
following_x_path = f"//a[@title='{following}']//following::button"
following_btn = browser.find_element_by_xpath(following_x_path)
following_btn.click()
close_btn()
if __name__ == '__main__':
login(browser)
-
First of all, install all the dependencies by
python3 install -r requirements.txt
in the cmd. -
Then you need to install the selenium web driver in your system. You can follow this link Youtube
-
Now you need to change the username and password in the config.py folder or you can give it on later during the login.
-
There are three features that our project provide :- A. Remove Following anyone. B. Remove Following that are not your follower. C. Download Media.
-
First you need to login to your download. Run the program in interactive shell. If you have already made changes in the config file. Then you can login by simple command:-
login()
or If you have not entered the credentials then you can login by:-login(username="ENTER YOUR USERNAME", password="ENTER YOUR SECRET KEY")
-
(A). Remove follower. Command:-
remove_following(remove=Enter the no of followers you want to unfollow)
Sample :-remove_following(remove=6)
-
(B). Remove following that are not your follower. Basic syntax :-
remove_following_not_followers(count=Enter the number of followers to remove)
Sample code :-remove_following_not_followers(count=5)
-
(C). Download Media. It can help you to download any media whether IGTV, Video Or Image. Basic Syntax :- get_media(username=USERNAME, media_type=img or video, count=3) Sample :-
get_media(username="therock", media_type=img, count=3)
orget_media(count=3)
It will download all types of media file where : -count: It refers to the number of post it should search to download.
username: Don't pass any username parameter if you want to download the self account post otherwise mention the username of other people.
media_type: It can be anyone either img or video but you need to pass the parameter in lower case. Don't pass any parameter to download both media type.
Note:
-
The download files will be in same directory with the name of the user. The download media files function has a automatic function which will prevent the duplication of the downloaded files.
-
Don't use the opened browser as it will hinder the working of the code.
-
If you find that the download is not working. There might be two reason :-
- Instagram might have changed the element position or element attribute
-
Your internet might be too slow
-
While downloading the media file. If the media file is video then it will take some more time depending on the internet speed.
All the issues and PR are most welcomed. Please make a branch before committing any changes.