Suppose I want to scrape some text from my Facebook page. How can I do that if I am logged in from my browser? I have a few more questions about this if you have the time to chat.
Can I scrape things that are only available after logging in to that website?
Hi @divij26,
It is difficult to scrape the Facebook news feed because that data is very dynamic. It is different for every logged-in user, and even if you refresh your browser every minute you will see different content. But if you have the link to one particular image on Facebook, you can scrape the comments on that image, because every time you open that link you get the same image. That makes the data much easier to retrieve.
Also, to scrape data from such websites you will need to use Selenium,
which is a browser automation framework that you can drive from Python…
Let’s discuss where this is useful…
Suppose you want to scrape your Facebook news feed. You go to www.facebook.com
(assuming you are already logged in), and you can see a lot of news feed posts, right?
But now suppose you fetch this same link from a script and hand the result to BeautifulSoup to scrape the news feed. Unfortunately, it won’t work. Why? Think about what happens when you open Facebook in incognito mode: it just shows the Facebook login/home page, right? But when you open Facebook in a browser where you are logged in, you see your personalized news feed.
So, when a scraping script sends a request to the link (Facebook), it looks like a request from a brand-new browser with no login cookies. Facebook will assume you are not logged in, so it will return the home/login page…
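To make the logged-in vs. not-logged-in difference concrete, here is a minimal sketch using only the Python standard library. It does not touch Facebook: it starts a tiny local toy site that behaves the same way, returning a login page unless the request carries a session cookie. The names `FakeSiteHandler`, `SESSION_COOKIE`, `fetch`, and `demo` are made up for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical cookie value standing in for a real logged-in session.
SESSION_COOKIE = "session=abc123"


class FakeSiteHandler(BaseHTTPRequestHandler):
    """Toy site: personalized feed with the session cookie, login page otherwise."""

    def do_GET(self):
        cookie = self.headers.get("Cookie", "")
        if SESSION_COOKIE in cookie:
            body = b"<h1>Your news feed</h1>"
        else:
            body = b"<h1>Please log in</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url, cookie=None):
    """Fetch a page, optionally sending a Cookie header, and return the HTML."""
    request = urllib.request.Request(url)
    if cookie:
        request.add_header("Cookie", cookie)
    # An empty ProxyHandler makes sure the local request bypasses any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.read().decode("utf-8")


def demo():
    """Serve the toy site on a free local port; fetch it without and with the cookie."""
    server = HTTPServer(("127.0.0.1", 0), FakeSiteHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        return fetch(url), fetch(url, cookie=SESSION_COOKIE)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    anonymous_html, logged_in_html = demo()
    print("Without cookie:", anonymous_html)
    print("With cookie:   ", logged_in_html)
```

The first fetch is what a plain scraping script does: no cookie, so it gets the login page. The second fetch sends the cookie, the way your logged-in browser does, and gets the feed.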
But Selenium gives you the power to do this: it drives a real browser, so your script can log in automatically and then fetch the data automatically…
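As a rough sketch of that login flow, something like the following could work. Treat it as an outline under assumptions, not a tested Facebook scraper: the locators (`email`, `pass`, `login`) are guesses at the login form's field names and must be checked against the real page, and the function names and URLs are made up for the example. The locator strategy strings `"id"` and `"name"` are the same values as Selenium's `By.ID` and `By.NAME`.

```python
import time

# Assumed locators for the login form; inspect the real page and adjust them.
EMAIL_FIELD = ("id", "email")      # same as (By.ID, "email")
PASSWORD_FIELD = ("id", "pass")    # same as (By.ID, "pass")
LOGIN_BUTTON = ("name", "login")   # same as (By.NAME, "login")


def log_in_and_get_html(driver, login_url, target_url, email, password, timeout=15):
    """Log in through the browser, then return the rendered HTML of target_url."""
    driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
    driver.get(login_url)
    start_url = driver.current_url

    driver.find_element(*EMAIL_FIELD).send_keys(email)
    driver.find_element(*PASSWORD_FIELD).send_keys(password)
    driver.find_element(*LOGIN_BUTTON).click()

    # Give the login a moment to finish: poll until the browser leaves the login page.
    deadline = time.monotonic() + timeout
    while driver.current_url == start_url and time.monotonic() < deadline:
        time.sleep(0.5)

    # The browser keeps the session cookies, so this page loads as the logged-in user.
    driver.get(target_url)
    return driver.page_source


def scrape_with_chrome(login_url, target_url, email, password):
    """Run the flow in a real Chrome window (needs the selenium package and Chrome)."""
    from selenium import webdriver  # imported here so the helper above stays importable

    driver = webdriver.Chrome()
    try:
        return log_in_and_get_html(driver, login_url, target_url, email, password)
    finally:
        driver.quit()
```

You would call `scrape_with_chrome(...)` with the login page URL, the page you want, and your credentials, then pass the returned HTML to BeautifulSoup for parsing.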
If you’re interested, try reading more about this from online resources.
I’ve built a scraping script that gets the Facebook comments on an image, given its link.
So, if you want, I can share that with you for reference.
I hope the difference between scraping a static page and scraping a dynamic page is clearer now.
I hope I’ve cleared your doubt. I ask you to please rate your experience here
Your feedback is very important. It helps us improve our platform and hence provide you
the learning experience you deserve.
On the off chance that you still have questions or do not find the answers satisfactory, you may reopen
the doubt.