Hello.
I've been trying out various things while studying web crawling for the first time.
This time, I'd like to apply it to Instagram, one of the most popular social networks these days.
To make it interesting, I'm going to use a little bit of human psychology and do the following.
How about it? Don't you want to try it? 🙂
Sometimes people stop following me. TT
Now, let's do it step by step.
Import the required modules
import time
import sys
from selenium import webdriver
from bs4 import BeautifulSoup
The webdriver module of the selenium package launches a web browser and lets you drive it with script commands. The BeautifulSoup module in the bs4 package makes it easy to extract the contents you want from HTML DOM data.
If the selenium and bs4 packages are not installed, install them by entering the following in the command window.
pip install bs4
pip install selenium
And the Chrome webdriver can be downloaded from the link below.
https://sites.google.com/a/chromium.org/chromedriver/downloads
The rest are basic modules, so you can import and use them right away.
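If you want to quickly check that the setup works before writing the real script, a minimal sketch like the one below (assuming chromedriver sits in the current folder) just opens a page and prints its title:

from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Chrome('./chromedriver')  # assumes chromedriver is in the current folder
browser.get('https://www.instagram.com/')
soup = BeautifulSoup(browser.page_source, 'html.parser')
print(soup.title.get_text())  # should print something like "Instagram"
browser.quit()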
Log in to Instagram
Now open the Chrome browser and try to log in by accessing the Instagram address.
Let's take input directly from the command window.
For example, suppose you type the following into the command window.
python crawling_instagram.py sangminem 123456
Here sys.argv[0] becomes crawling_instagram.py, sys.argv[1] becomes sangminem, and sys.argv[2] becomes 123456.
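Since missing arguments would cause an IndexError later, it doesn't hurt to add a small guard like this (a sketch I added, not part of the original script):

import sys

# a small guard I added; not part of the original script
if len(sys.argv) < 3:
    print('Usage: python crawling_instagram.py <username> <password>')
    sys.exit(1)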
Let's use this to write code to log in to Instagram as shown below.
# open Chrome with chromedriver and go to the user's Instagram page
browser = webdriver.Chrome('./chromedriver')
browser.get('https://www.instagram.com/'+sys.argv[1])
# click the followers button, which opens the login window when not logged in
browser.execute_script("document.querySelectorAll('.-nal3')[1].click();")
time.sleep(2)
# fill in the username and password and submit the login form
browser.find_element_by_name('username').send_keys(sys.argv[1])
browser.find_element_by_name('password').send_keys(sys.argv[2])
browser.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[3]/button').submit()
time.sleep(5)
# dismiss the dialog that appears after logging in
browser.find_element_by_xpath('//*[@id="react-root"]/section/main/div/div/div/div/button').click()
I opened the Chrome web browser using chromedriver and connected by combining the Instagram address and user name.
I clicked the follower button using browser.execute_script to open the login window.
(If you click when you are not logged in, the login window is displayed.)
After that, I added a 2-second wait in case loading takes longer.
Next, I located the username and password input fields by their name attribute and entered the values with the send_keys method.
Then I found the login button inside the form with an XPath and called the submit method.
You can get an XPath by pressing F12 in Chrome, right-clicking the element you want in the Elements tab of the developer tools, and selecting Copy > Copy XPath.
Then I wait 5 seconds for the login to finish.
The last line is the code that clicks the 'Do it later' button that pops up after logging in, again using an XPath.
Since this step just dismisses a dialog, I'll skip the detailed explanation.
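For reference, fixed time.sleep calls can be flaky on a slow connection. Selenium also offers explicit waits; a sketch of the same login step using WebDriverWait would look roughly like this (the locators are the ones used above and may change on Instagram's side):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)  # wait up to 10 seconds for each condition
username_input = wait.until(EC.presence_of_element_located((By.NAME, 'username')))
username_input.send_keys(sys.argv[1])
browser.find_element_by_name('password').send_keys(sys.argv[2])
wait.until(EC.element_to_be_clickable(
    (By.XPATH, '//*[@id="loginForm"]/div[1]/div[3]/button'))).click()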
Get follower list
Now that the login is complete, let's implement the logic to get the number of followers.
time.sleep(2)
# open the follower list dialog
browser.execute_script("document.querySelectorAll('.-nal3')[1].click();")
time.sleep(1)
oldHeight = -1
newHeight = -2
# keep scrolling until the list height stops changing
while oldHeight != newHeight:
    oldHeight = newHeight
    newHeight = browser.execute_script("return document.querySelectorAll('._aano')[0].scrollHeight")
    browser.execute_script("document.querySelectorAll('.isgrP')[0].scrollTo(0,document.querySelectorAll('._aano')[0].scrollHeight)")
    time.sleep(0.5)
# parse the fully loaded list and collect the usernames
soup = BeautifulSoup(browser.page_source, 'html.parser')
followers = soup.findAll('a',['FPmhX','notranslate','_0imsa'])
followers_text = []
for follower in followers:
    followers_text.append(follower.get_text())
print("Number of followers: " + str(len(followers_text)))
After waiting 2 seconds again to prevent malfunction, I clicked the follower button again.
Then, after waiting 1 second, this is the part that actually collects the follower usernames.
Since all the followers need to be loaded first, I used a while loop that keeps scrolling the list down until everyone is loaded.
If the old scroll height and the new scroll height are different, there is still more to load, so the loop repeats until the two heights are the same.
The class names passed to querySelectorAll are values I looked up directly in the Elements tab of developer mode.
Once everything is loaded, the HTML is parsed with the BeautifulSoup module, the tags and classes containing the usernames are found, and all of them are extracted into a list.
Then I printed the length of that list, which is the number of followers.
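Since the exact same scroll-and-collect routine is used again in the next step for the following list, it could also be wrapped in a small helper function. This is just a sketch of that idea, reusing the modules imported above and the same class names (which may change at any time):

def collect_usernames(browser):
    # keep scrolling the dialog until its height stops growing
    old_height, new_height = -1, -2
    while old_height != new_height:
        old_height = new_height
        new_height = browser.execute_script(
            "return document.querySelectorAll('._aano')[0].scrollHeight")
        browser.execute_script(
            "document.querySelectorAll('.isgrP')[0].scrollTo(0,"
            "document.querySelectorAll('._aano')[0].scrollHeight)")
        time.sleep(0.5)
    # parse the fully loaded list and pull out every username link
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    links = soup.findAll('a', ['FPmhX', 'notranslate', '_0imsa'])
    return [link.get_text() for link in links]

# usage: call it once for the followers and once for the followings,
# right after clicking the corresponding button
followers_text = collect_usernames(browser)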
Get your following list
Next, let's get the number of accounts I'm following.
# close the follower dialog
browser.find_element_by_xpath('/html/body/div[4]/div/div/div[1]/div/div[2]/button').click()
time.sleep(0.5)
# open the following list dialog
browser.execute_script("document.querySelectorAll('.-nal3')[2].click();")
time.sleep(1)
oldHeight = -1
newHeight = -2
# keep scrolling until the list height stops changing
while oldHeight != newHeight:
    oldHeight = newHeight
    newHeight = browser.execute_script("return document.querySelectorAll('._aano')[0].scrollHeight")
    browser.execute_script("document.querySelectorAll('.isgrP')[0].scrollTo(0,document.querySelectorAll('._aano')[0].scrollHeight)")
    time.sleep(0.5)
# parse the fully loaded list and collect the usernames
soup = BeautifulSoup(browser.page_source, 'html.parser')
followings = soup.findAll('a',['FPmhX','notranslate','_0imsa'])
followings_text = []
for following in followings:
    followings_text.append(following.get_text())
print("Number of followings: " + str(len(followings_text)))
I clicked the close button using xpath to close the follower window.
Then I waited half a second and clicked the following button.
Then I waited another second and collected the usernames of the accounts I'm following.
The pattern is almost the same as for the follower usernames, so I won't go over it again.
Get the people who don't follow you back
Finally, let's compare the list of follower usernames with the list of following usernames to find the people who don't follow back.
result = []
for following in followings_text:
    cnt = 0
    # check whether this following username appears in the follower list
    for follower in followers_text:
        if following == follower:
            cnt += 1
            break
    # not found among the followers, so this person does not follow back
    if cnt == 0:
        result.append(following)
print('List of people who did not F4F: '+str(result))
For each following username, the code scans the whole list of follower usernames one by one; when a match is found, it increments the counter and breaks out of the inner loop.
If the counter is still 0, the person is someone I follow who is not in my follower list, so they did not follow back (no F4F), and I add them to the result array.
Finally, printing the resulting array achieves the desired purpose.
Just like this.
I followed you, but you didn't follow back? ㅠ
For reference, I only follow my acquaintances, so there aren't many people who don't follow me.
It simply served its purpose.
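By the way, the same comparison can be written in one line with Python sets, which also scales better when the lists are long (just a sketch, assuming the order of the result doesn't matter):

result = list(set(followings_text) - set(followers_text))
print('List of people who did not F4F: ' + str(result))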
Sharing the full source
In case anyone wants it, here is the full source.
import time
import sys
from selenium import webdriver
from bs4 import BeautifulSoup

username = sys.argv[1]

# open Chrome and log in
browser = webdriver.Chrome('./chromedriver')
browser.get('https://www.instagram.com/'+username)
browser.execute_script("document.querySelectorAll('.-nal3')[1].click();")
time.sleep(2)
browser.find_element_by_name('username').send_keys(sys.argv[1])
browser.find_element_by_name('password').send_keys(sys.argv[2])
browser.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[3]/button').submit()
time.sleep(5)
browser.find_element_by_xpath('//*[@id="react-root"]/section/main/div/div/div/div/button').click()
time.sleep(5)

# an optional third argument lets you check a different account
if len(sys.argv) > 3:
    username = sys.argv[3]
print('Account: ' + username)
browser.get('https://www.instagram.com/'+username)
time.sleep(2)

# collect the follower usernames
browser.execute_script("document.querySelectorAll('.-nal3')[1].click();")
time.sleep(1)
oldHeight = -1
newHeight = -2
while oldHeight != newHeight:
    oldHeight = newHeight
    newHeight = browser.execute_script("return document.querySelectorAll('._aano')[0].scrollHeight")
    browser.execute_script("document.querySelectorAll('.isgrP')[0].scrollTo(0,document.querySelectorAll('._aano')[0].scrollHeight)")
    time.sleep(0.5)
soup = BeautifulSoup(browser.page_source, 'html.parser')
followers = soup.findAll('a',['FPmhX','notranslate','_0imsa'])
followers_text = []
for follower in followers:
    followers_text.append(follower.get_text())
print("Number of followers: " + str(len(followers_text)))

# collect the following usernames
browser.find_element_by_xpath('/html/body/div[4]/div/div/div[1]/div/div[2]/button').click()
time.sleep(0.5)
browser.execute_script("document.querySelectorAll('.-nal3')[2].click();")
time.sleep(1)
oldHeight = -1
newHeight = -2
while oldHeight != newHeight:
    oldHeight = newHeight
    newHeight = browser.execute_script("return document.querySelectorAll('._aano')[0].scrollHeight")
    browser.execute_script("document.querySelectorAll('.isgrP')[0].scrollTo(0,document.querySelectorAll('._aano')[0].scrollHeight)")
    time.sleep(0.5)
soup = BeautifulSoup(browser.page_source, 'html.parser')
followings = soup.findAll('a',['FPmhX','notranslate','_0imsa'])
followings_text = []
for following in followings:
    followings_text.append(following.get_text())
print("Number of followings: " + str(len(followings_text)))

# compare the two lists to find people who did not follow back
result = []
for following in followings_text:
    cnt = 0
    for follower in followers_text:
        if following == follower:
            cnt += 1
            break
    if cnt == 0:
        result.append(following)
print('List of people who did not F4F: '+str(result))
See you next time with another topic. 🙂
(Caution) The class names can change from time to time, so if the program stops working, you may have to inspect Instagram's tag structure yourself and update the selectors.
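One way to be a little more resilient is to rely on attributes that change less often than the obfuscated class names. For example, the follower link can usually be reached by its href, which seems to follow the /<username>/followers/ pattern. This is only an assumption about Instagram's markup, so verify it in the Elements tab before relying on it:

# hypothetical alternative to the '.-nal3' class selector; check the href pattern yourself
followers_link = browser.find_element_by_css_selector(
    'a[href="/' + username + '/followers/"]')
followers_link.click()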