Update: Benchmarking PhantomJS Against cURL

Yesterday I wrote a post detailing some benchmarking results I got while comparing cURL to PhantomJS. The results were pretty good, considering the amount of extra processing that needs to be done, but were not quite good enough for me to use in production. Ariya, who is apparently the developer of PhantomJS, left me a note in the comments telling me that the results might be a little skewed because PhantomJS was loading images as well as JavaScript. He was correct. Loading images is actually a pretty crucial part of my specific use case, but I realize now that those results were misleading because I didn't describe that scope in the post.

So I've redone all the tests with the --load-images=no flag (new gist), and have gotten drastically different results. My original intention was to just update the previous post with the new values, but they are so different that I think they deserve their own post (and an update to the old one better describing the scope of the test). The new results are below:

Gawker 10 Request Benchmark

| Trial # | PhantomJS Load Time (ms) | cURL Load Time (ms) |
|---------|--------------------------|---------------------|
| 1       | 41446                    | 2881                |
| 2       | 67852                    | 38782               |
| 3       | 26975                    | 2977                |
| Average | 22879.333                | 36537               |

On average, every PhantomJS request took 0.626 times as long as a regular cURL request!

WegnerDesign 10 Request Benchmark

| Trial # | PhantomJS Load Time (ms) | cURL Load Time (ms) |
|---------|--------------------------|---------------------|
| 1       | 15760                    | 9061                |
| 2       | 15063                    | 13812               |
| 3       | 16703                    | 9939                |
| Average | 15842                    | 10937.333           |

On average, every PhantomJS request took 1.448 times as long as a regular cURL request.
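For anyone who wants to check the math, here is a minimal sketch of how the "Average" row and the ratio line are derived, using the WegnerDesign trial values from the table above. The function and variable names are my own for illustration and are not taken from the gist.

```python
def mean(values):
    """Arithmetic mean of a list of load times (ms)."""
    return sum(values) / len(values)


def slowdown(phantomjs_ms, curl_ms):
    """Ratio of the average PhantomJS time to the average cURL time."""
    return mean(phantomjs_ms) / mean(curl_ms)


# Trial values copied from the WegnerDesign table (milliseconds).
wegner_phantomjs = [15760, 15063, 16703]
wegner_curl = [9061, 13812, 9939]

print(round(mean(wegner_phantomjs), 3))                   # 15842.0
print(round(mean(wegner_curl), 3))                        # 10937.333
print(round(slowdown(wegner_phantomjs, wegner_curl), 3))  # 1.448
```

A ratio of 1.448 means PhantomJS took about 1.448 times as long as cURL, which works out to roughly 45% slower.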

Google 10 Request Benchmark

| Trial # | PhantomJS Load Time (ms) | cURL Load Time (ms) |
|---------|--------------------------|---------------------|
| 1       | 6948                     | 2072                |
| 2       | 5092                     | 2044                |
| 3       | 5108                     | 2078                |
| Average | 5716                     | 2064.667            |

On average, every PhantomJS request took 2.768 times as long as a regular cURL request.

As you can see, these results are incredible. All of the PhantomJS averages dropped by at least half, and the first test (Gawker, the one with all content loaded via AJAX) actually came out faster than cURL. That doesn't really make one bit of sense to me, and is probably a result of only having 3 trials, but regardless it shows that PhantomJS is really freaking fast. If you're doing web scraping (and don't need images), I can't think of a single good reason why you wouldn't jump on the PhantomJS bandwagon.

One thing that I'm still confused about, though (and Ariya, if you're reading this I'd love to hear from you in the comments), is that the tests that don't load any content via JavaScript (WegnerDesign, Google) still come out significantly slower relative to cURL than the largely AJAX-based Gawker test. Logically it would make sense that low-JavaScript pages would perform better than high-JavaScript pages, but I haven't seen proof of that in either this set of tests or the previous one.
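For reference, the kind of measurement behind these tables can be sketched roughly as follows. This is not the script from the gist, just a minimal illustration under a few assumptions: each trial is taken to be the total wall-clock time of 10 sequential requests, `load.js` is a hypothetical PhantomJS script that opens the given URL and exits, and both `phantomjs` and `curl` are on the PATH.

```python
import os
import subprocess
import time


def time_command(cmd, runs=10):
    """Run cmd `runs` times in sequence; return total wall-clock time in ms."""
    start = time.perf_counter()
    for _ in range(runs):
        # Discard output so terminal I/O doesn't affect the timing.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    return (time.perf_counter() - start) * 1000.0


def phantomjs_cmd(url, script="load.js"):
    # "load.js" is a hypothetical script name (assumption, not from the post).
    # --load-images=no is the flag used for the tests in this post.
    return ["phantomjs", "--load-images=no", script, url]


def curl_cmd(url):
    # -s silences progress output; -o discards the response body.
    return ["curl", "-s", "-o", os.devnull, url]
```

One trial would then be something like `time_command(phantomjs_cmd(url))` next to `time_command(curl_cmd(url))` for the same URL. Note that this also counts process startup time for every request, which may or may not match how the original tests were run.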