I’ve written before about using CasperJS for doing headless testing, but it’s also very useful as a web scraper.

Before we start, a few caveats: firstly, be sure that you have permission to scrape and use the content you’re after; secondly, be a good citizen and space your requests out so as not to overload the server; thirdly, if the site you’re scraping doesn’t use much (or any) JavaScript for navigation, you’re likely to get a faster result by using tools which just grab the HTML of a page, such as WWW::Mechanize, Mechanize, HtmlAgilityPack, Beautiful Soup or HtmlUnit.

I find the most compelling use case for CasperJS scraping is when a site relies on a lot of JavaScript to navigate through the content. A recent project was a perfect example, as the site uses AngularJS, loads all the content asynchronously and uses infinite scrolling instead of pagination. I’ll walk you through some of the challenges I overcame during this project.

If you’re not familiar with AngularJS, it’s a JavaScript framework that allows you to build single-page web applications. It handles a lot of the plumbing, data access and binding for you. What that means is that, in a lot of cases, what comes back from the server is just an empty HTML skeleton with lots of JavaScript to fetch the data and display it.
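To make the infinite-scrolling problem concrete, here is a minimal sketch of the general idea: keep scrolling until the number of loaded items stops growing. It is plain JavaScript rather than a CasperJS script, and everything in it is an assumption for illustration: `scrollUntilExhausted`, `makeFakePage` and the `page` adapter with its `countItems` and `scrollToBottom` methods are hypothetical names, not part of CasperJS or of the project described here.

```javascript
// Sketch only: scroll until no new items appear, or until maxRounds is hit.
// `page` is a hypothetical adapter (not a CasperJS object) with two methods:
//   countItems()     -> number of items currently rendered
//   scrollToBottom() -> triggers the next load and returns once it has settled
function scrollUntilExhausted(page, maxRounds) {
  var previous = -1;
  var current = page.countItems();
  var rounds = 0;
  while (current > previous && rounds < maxRounds) {
    previous = current;
    page.scrollToBottom();
    current = page.countItems();
    rounds += 1;
  }
  return { items: current, rounds: rounds };
}

// In-memory stand-in for a page that reveals `batch` more items per scroll,
// up to `total` items, so the loop can be tried without a browser.
function makeFakePage(total, batch) {
  var loaded = Math.min(batch, total);
  return {
    countItems: function () { return loaded; },
    scrollToBottom: function () { loaded = Math.min(loaded + batch, total); }
  };
}

var result = scrollUntilExhausted(makeFakePage(35, 10), 50);
console.log(result.items + ' items after ' + result.rounds + ' scrolls');
```

The `maxRounds` cap is there so a feed that never ends cannot keep the scraper running forever. In a real headless browser each load is asynchronous, so the same stop condition would have to be checked between wait steps rather than inside a synchronous `while` loop.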