java - Crawling data from the web when the page loads data dynamically at the end of the page


I want to crawl data from a website with Java, but I have found that the page loads more data when scrolling reaches the end of the page. I'm not a web developer and do not know what technology is used to load data when the scroll reaches the end of the page.

Can you give me some hints? What technology do they use? How can I read the data without using a browser? (I wrote some Java code that uses URLConnection to read data from the site.)

The site is something like this.

Thank you.

This is a common "problem" for web crawler bots: some pages have dynamic content that is added from sources referenced in the page. It can be loaded when the page loads or by a trigger (as in your example, by scrolling down). When you download the target page and scrape its DOM structure, in most cases the HTML elements for that external data are not included.

What I suggest is to identify the source path of this data, which is hard-coded in the DOM, and to call it directly as a secondary source; it contains all the data you need.
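As a rough sketch of calling such a secondary source from Java without a browser, something like the following could work. It assumes the data URL answers a plain GET request; the class and method names are made up for illustration, and the two request headers are only a guess at what a script-driven endpoint might expect.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class SecondarySourceFetcher {

    // Reads the whole stream as UTF-8 text and closes it.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                body.append(buf, 0, n);
            }
        }
        return body.toString();
    }

    // Requests the data URL directly, instead of the HTML page that embeds it.
    public static String fetch(String dataUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(dataUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        // Assumption: headers that mimic a script request; they may not be needed.
        conn.setRequestProperty("Accept", "application/json");
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```

The string returned by `fetch` is the raw response body, which you would then hand to whatever JSON or HTML parser you already use.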

Edit:

For the example you linked, it's easy:

  - Install Firebug.
  - Scroll down the page while watching the script requests that Firebug logs.
  - Now you can see the link and variables used to add the dynamic content.

www.healthtap.com/#topics/Women%27s%20health:

Dynamic scroll feed link:

?extended_categories=1&auth_token=false&per_page=8&page=7&per_page=8&auth_token=false&gener_token=true

As you can see, there are some parameters you can play with:

  1. /topic/ + name of the topic + json?
  2. per_page=num -> how many results to return
  3. gener_token=true -> presumably a security value, but I changed it to false and it still works fine.
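To experiment with these parameters from Java, a small helper along these lines might be handy. It is only a sketch: `FeedUrlBuilder` and its methods are hypothetical names, and the `endpoint` argument stands for whatever request path you actually see in Firebug, since I am not spelling out the full path here. Only `per_page`, `page` and `gener_token` are taken from the link above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FeedUrlBuilder {

    // Percent-encodes a topic name for use inside a URL path.
    // URLEncoder targets form data and writes spaces as '+', so swap those for %20.
    public static String encodePathSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Appends the paging parameters to an endpoint copied from the network panel.
    // Assumption: the endpoint has no query string of its own yet.
    public static String buildFeedUrl(String endpoint, int page, int perPage,
                                      boolean generToken) {
        return endpoint
                + "?per_page=" + perPage
                + "&page=" + page
                + "&gener_token=" + generToken;
    }
}
```

For instance, `encodePathSegment("Women's health")` gives `Women%27s%20health`, which matches the topic part of the page address above.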

Now you can play with this link and load all the data. You can merge it with the data from the main page and crawl it.

Try it out!
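Loading all the data then comes down to requesting page after page until nothing more comes back. Here is a minimal sketch of that loop. `FeedPager` is a hypothetical name, the page fetcher is passed in so you can plug in your own URLConnection code, and both the stop condition (an empty body, `[]` or `{}`) and the assumption that pages start at 1 are guesses about how the site behaves that you should check against real responses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class FeedPager {

    // Assumption: an exhausted feed answers with an empty body, "[]" or "{}".
    public static boolean isEmptyPage(String body) {
        if (body == null) {
            return true;
        }
        String trimmed = body.trim();
        return trimmed.isEmpty() || trimmed.equals("[]") || trimmed.equals("{}");
    }

    // Calls fetchPage for page 1, 2, 3, ... and collects the raw bodies
    // until an empty page arrives or maxPages requests have been made.
    public static List<String> fetchAllPages(IntFunction<String> fetchPage, int maxPages) {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            String body = fetchPage.apply(page);
            if (isEmptyPage(body)) {
                break;
            }
            pages.add(body);
        }
        return pages;
    }
}
```

The `maxPages` cap is there so that a wrong guess about the stop condition cannot turn into an endless loop against the site.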
