Problem in Scraping Website

Rashmi007 · June 27, 2020, 9:44pm

When I try to scrape this URL:
‘https://issues.apache.org/jira/projects/DERBY/issues/DERBY-7013?filter=allopenissues.json’

I get back an HTML page that I can't make sense of, because it looks different from what I see when I open the page in the browser and use Inspect.

prashant_ml · June 28, 2020, 3:07am

Yes @Rashmi007,
it really is a large and complex website.
Can you tell me exactly what you want from it?

Thank you.

Rashmi007 · June 28, 2020, 6:16am

Actually, I want to work on bug prediction, so I am trying to collect data from different repositories.
From this page, I want to collect 5 fields:
Issue key, Issue Type, Summary, Description, Text
URL: https://issues.apache.org/jira/projects/DERBY/issues/DERBY-7013?filter=allopenissues

prashant_ml · June 28, 2020, 7:47am

Hey @Rashmi007,
what you want to do on this URL will not be easy with BeautifulSoup, most likely because much of the page is built by JavaScript in the browser, so the HTML you download is not what you see in Inspect.
If you look, there is an Export option on the right side. It does not let you export JSON, but you can get an XML file from there and easily parse it to extract the information you need.
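A minimal sketch of the parsing step, using only Python's standard library. It assumes the XML export uses Jira's RSS-style layout, with one `<item>` per issue containing `<key>`, `<type>`, `<summary>` and `<description>` elements; `SAMPLE_XML` is a hypothetical, trimmed-down stand-in for a real export, and `parse_jira_xml` and `strip_tags` are illustrative names. The "Text" field is not mapped here because it is not clear which export element it corresponds to.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample in the RSS-style layout the XML export is
# assumed to use. A real export has many more elements per <item>.
SAMPLE_XML = """<rss version="0.92">
  <channel>
    <item>
      <key id="1">EXAMPLE-1</key>
      <type id="1">Bug</type>
      <summary>Example summary</summary>
      <description>&lt;p&gt;Example description&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


def strip_tags(text):
    """Crudely remove HTML tags; the description is stored as escaped HTML."""
    return re.sub(r"<[^>]+>", "", text).strip()


def _child_text(item, tag):
    """Return the stripped text of item's first <tag> child, or '' if missing."""
    el = item.find(tag)
    return (el.text or "").strip() if el is not None else ""


def parse_jira_xml(xml_text):
    """Return one dict per <item> with the key, type, summary and description."""
    root = ET.fromstring(xml_text)
    issues = []
    for item in root.iter("item"):
        issues.append({
            "key": _child_text(item, "key"),
            "type": _child_text(item, "type"),
            "summary": _child_text(item, "summary"),
            "description": strip_tags(_child_text(item, "description")),
        })
    return issues
```

For a downloaded file, `parse_jira_xml(open("DERBY-7013.xml", encoding="utf-8").read())` would give a list of dicts that can go straight into a CSV or a pandas DataFrame. The regex-based `strip_tags` is deliberately simple; an HTML parser would handle edge cases better.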

But I would suggest first learning Selenium and using it to extract only the issue keys.
Once you have those keys, go to their respective pages, export their data, and work with that.
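The two steps above (collect keys with Selenium, then export each issue) can be sketched as follows. This is a sketch under several assumptions: Chrome and Selenium 4 are installed; issue keys are pulled from the rendered page with a regex (`PROJECT-<number>`) rather than a specific CSS selector, which may also pick up keys of linked issues; the sidebar may only list the first page of results; and `XML_EXPORT_TEMPLATE` is an assumed Jira Server URL pattern that should be checked against the link behind Export > XML on a real issue page. All function names are hypothetical.

```python
import re
import time
import urllib.request

# Assumed URL pattern for a single issue's XML export; verify it against the
# Export > XML link on an actual issue page before relying on it.
XML_EXPORT_TEMPLATE = (
    "https://issues.apache.org/jira/si/jira.issueviews:issue-xml/{key}/{key}.xml"
)


def extract_issue_keys(html, project="DERBY"):
    """Return keys like 'DERBY-7013' found in html, de-duplicated in first-seen order."""
    pattern = rf"\b{re.escape(project)}-\d+\b"
    return list(dict.fromkeys(re.findall(pattern, html)))


def xml_export_url(key):
    """Build the (assumed) XML export URL for one issue key."""
    return XML_EXPORT_TEMPLATE.format(key=key)


def collect_issue_keys(page_url, project="DERBY", min_keys=2, timeout=30):
    """Load page_url in a real browser so its JavaScript runs, then read keys from the DOM."""
    # Imported here so the pure helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    def enough_keys(driver):
        # The opened issue's own key appears early, so wait for more than one
        # key as a rough signal that the issue list has rendered.
        keys = extract_issue_keys(driver.page_source, project)
        return keys if len(keys) >= min_keys else False

    driver = webdriver.Chrome()  # assumes Chrome; Selenium 4.6+ finds a driver itself
    try:
        driver.get(page_url)
        # until() returns the first truthy value from enough_keys (the key list)
        # or raises TimeoutException after `timeout` seconds.
        return WebDriverWait(driver, timeout).until(enough_keys)
    finally:
        driver.quit()


def download_exports(keys, delay=1.0):
    """Save each issue's XML export as <KEY>.xml, pausing between requests."""
    for key in keys:
        with urllib.request.urlopen(xml_export_url(key)) as resp:
            data = resp.read()
        with open(f"{key}.xml", "wb") as f:
            f.write(data)
        time.sleep(delay)  # be polite to the public server
```

Usage would look like `keys = collect_issue_keys("https://issues.apache.org/jira/projects/DERBY/issues/DERBY-7013?filter=allopenissues")` followed by `download_exports(keys)`. Using Selenium only to discover keys and plain HTTP for the exports keeps the browser session short.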