Yesterday I wrote a post detailing some benchmarking results I got while comparing cURL to PhantomJS. The results were pretty good, considering the amount of extra processing that needs to be done, but were not quite good enough for me to use in production.
Ariya, who is apparently the developer of PhantomJS, left me a note in the comments telling me that the results might be a little skewed because PhantomJS was loading images as well as JavaScript. He was correct. Loading images is actually a pretty crucial part of my specific use case, but I realize now that those results were misleading because I didn't describe that scope in the post.
So I've redone all the tests with the --load-images=no flag (new gist), and have gotten drastically different results. My original intention was to just update the previous post with the new values, but they are so incredibly different that I think they deserve an entirely different post (and an update to the old one better describing the scope of the test). The new results are below:
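For readers who want to reproduce this kind of comparison, here is a minimal sketch of a timing harness. It is not the code from the gist. It assumes each trial is the total wall-clock time for 10 sequential requests, that `phantomjs` and `curl` are on the PATH, and that the PhantomJS page-loading script is passed in by name (the `load.js` used below is a hypothetical file name).

```python
import os
import subprocess
import time


def build_phantomjs_cmd(script, url, load_images=False):
    """Build a PhantomJS command line; options go before the script name."""
    flag = "--load-images=" + ("yes" if load_images else "no")
    return ["phantomjs", flag, script, url]


def build_curl_cmd(url):
    """Build a cURL command that fetches the page and discards the body."""
    return ["curl", "-s", "-o", os.devnull, url]


def time_requests(cmd, requests=10):
    """Run cmd `requests` times in sequence and return the total time in ms."""
    start = time.perf_counter()
    for _ in range(requests):
        # Output is discarded; only the elapsed wall-clock time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) * 1000.0
```

Under those assumptions, `time_requests(build_phantomjs_cmd("load.js", "http://example.com"))` would give one PhantomJS cell of a table, and `time_requests(build_curl_cmd("http://example.com"))` the matching cURL cell.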
Gawker 10 Request Benchmark
Trial # | PhantomJS Load Time (ms) | cURL Load Time (ms)
1 | 41446 | 2881
2 | 67852 | 38782
3 | 26975 | 2977
Average | 22879.333 | 36537
On average, every PhantomJS request took 0.626 times as long as a regular cURL request!
WegnerDesign 10 Request Benchmark
Trial # | PhantomJS Load Time (ms) | cURL Load Time (ms)
1 | 15760 | 9061
2 | 15063 | 13812
3 | 16703 | 9939
Average | 15842 | 10937.333
On average, every PhantomJS request took 1.448 times as long as a regular cURL request.
Google 10 Request Benchmark
Trial # | PhantomJS Load Time (ms) | cURL Load Time (ms)
1 | 6948 | 2072
2 | 5092 | 2044
3 | 5108 | 2078
Average | 5716 | 2064.666
On average, every PhantomJS request took 2.768 times as long as a regular cURL request.
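The "Average" rows and the ratio lines are just a mean per column and a division of one mean by the other. As a quick sketch of that arithmetic, here it is for the WegnerDesign and Google trials. The function and variable names are mine, not from the gist.

```python
def average(values):
    """Arithmetic mean of a list of load times in ms."""
    return sum(values) / len(values)


def slowdown(phantomjs_trials, curl_trials):
    """How many times as long the average PhantomJS trial took vs. cURL."""
    return average(phantomjs_trials) / average(curl_trials)


# Trial values copied from the WegnerDesign and Google tables (ms).
wegner_phantomjs = [15760, 15063, 16703]
wegner_curl = [9061, 13812, 9939]
google_phantomjs = [6948, 5092, 5108]
google_curl = [2072, 2044, 2078]

wegner_ratio = slowdown(wegner_phantomjs, wegner_curl)
google_ratio = slowdown(google_phantomjs, google_curl)

print(f"WegnerDesign: {wegner_ratio:.3f}x")
print(f"Google: {google_ratio:.3f}x")
```

A ratio above 1 means PhantomJS was slower than cURL, and a ratio below 1 means it was faster.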
As you can see, these results are incredible. All of the PhantomJS averages dropped by at least half, and the first test - the one with all content loaded via AJAX - was actually faster than cURL. Now that doesn't really make one bit of sense to me, and is probably a result of only having 3 trials, but regardless it shows that PhantomJS is really freaking fast. If you're doing web scraping (and don't need images), I can't think of a single good reason why you wouldn't jump on the PhantomJS wagon.
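To put a number on how noisy 3 trials can be, one option is to look at the sample standard deviation relative to the mean. This is a rough sketch using the WegnerDesign cURL trials from the table above. The helper name is made up for illustration, and with only 3 samples the result is itself just a loose estimate.

```python
import statistics


def relative_spread(trials):
    """Sample standard deviation divided by the mean (coefficient of variation)."""
    return statistics.stdev(trials) / statistics.mean(trials)


# cURL trials from the WegnerDesign table (ms).
wegner_curl = [9061, 13812, 9939]

spread = relative_spread(wegner_curl)
print(f"Standard deviation: {statistics.stdev(wegner_curl):.0f} ms")
print(f"Relative spread: {spread:.1%}")
```

When the spread is a sizable fraction of the mean, a difference between two averages of similar size could easily come from run-to-run noise, so more trials would be needed before reading much into it.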
One thing that I'm still confused about, though - and Ariya, if you're reading this I'd love to hear from you in the comments - is that the tests that don't load any content via JavaScript (WegnerDesign, Google) still load significantly slower relative to cURL than the largely AJAX-based Gawker test. Logically it would make sense that low-JavaScript pages would perform better than high-JavaScript pages, but I haven't seen proof of that in either this set of tests or the previous one.