Trying out CasperJS

CasperJS is a navigation scripting and testing utility for PhantomJS and SlimerJS, written in JavaScript.
As you may know, PhantomJS and SlimerJS are headless browsers.
Some years ago, I used Selenium for web scraping because it has Python bindings and is easy to use.
Today, I tried CasperJS as a test.
Installation is very easy: just use Homebrew (for Mac users) or npm (PhantomJS needs to be installed first). ;-)
I wrote a simple script that searches for patents on Google Patents and echoes each result link.
First, create a casper object. Then add each subsequent action with ‘casper.then( function() { /* your function */ } );’.
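To see why actions are written as a chain of then() calls, here is a minimal sketch in plain JavaScript of a step queue: then() only registers a function, and nothing happens until run() is called. MiniCasper is a hypothetical toy object made up for illustration. It is not how CasperJS is actually implemented, and it ignores the asynchronous page loading that the real library takes care of.

```javascript
// Toy illustration only: MiniCasper is a hypothetical stand-in, not the CasperJS API.
function MiniCasper() {
    this.steps = [];   // functions queued by then()
    this.log = [];     // messages recorded by echo()
}

// Queue a step and return the object so that calls can be chained.
MiniCasper.prototype.then = function (step) {
    this.steps.push(step);
    return this;
};

// Record a message and print it.
MiniCasper.prototype.echo = function (message) {
    this.log.push(message);
    console.log(message);
};

// Execute the queued steps in order, with `this` bound to the object.
MiniCasper.prototype.run = function () {
    for (var i = 0; i < this.steps.length; i++) {
        this.steps[i].call(this);
    }
};

var mini = new MiniCasper();
mini.then(function () { this.echo('step 1: open page'); })
    .then(function () { this.echo('step 2: fill form'); });

// At this point both steps are queued, but neither has run yet.
console.log('queued steps: ' + mini.steps.length);
mini.run();
```

The same idea explains why a CasperJS script needs a final casper.run() call: without it, the queued steps are never executed.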
The fill function is useful for form input: when its third argument is true, it also submits the form, so no separate button click is needed.
The following code accesses Google Patents and searches for patents about JAK3.
Then it echoes the URLs.

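var casper = require( 'casper' ).create();

// Runs inside the page context: collect the href of every result link.
function getLinks() {
        var links = [];
        var list = document.querySelectorAll( 'article > a' );

        for ( var i = 0; i < list.length; i++ ){
            var a = list[i];
            links.push( a.href );
        }
        return links;
}

casper.start().viewport( 1600,1000 );

// The URL string is empty in this listing; put the Google Patents address here.
casper.thenOpen( '', function(){
        this.echo( this.getTitle() );
        this.capture('top.png');
});

casper.then( function(){
        // Passing true as the third argument submits the form after filling it.
        this.fill("form", { q : "JAK3" }, true);
});

// Give the result page five seconds to load, then take a screenshot.
casper.wait( 5000, function(){
        this.capture('res.png');
});

casper.then( function(){
        var links = this.evaluate( getLinks );
        this.echo( links.length + 'patents found' );
        for ( var i = 0; i < links.length; i++ ){
            this.echo( links[i] );
        }
});

// Nothing is executed until run() is called.
casper.run();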
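The list returned by getLinks may contain the same URL more than once, or links that are not web addresses at all. As a small sketch of how the result could be tidied before echoing it, the helper below keeps only unique http(s) links in their original order. The name cleanLinks and the sample URLs are made up for illustration; they are not part of CasperJS or of the original script.

```javascript
// Hypothetical helper (not part of CasperJS): keep unique http(s) links, in order.
function cleanLinks(links) {
    links = links || [];   // evaluate() can return null if the page script fails
    var seen = {};
    var result = [];
    for (var i = 0; i < links.length; i++) {
        var url = links[i];
        // Skip non-strings and non-web schemes such as javascript: or mailto:
        if (typeof url !== 'string' || !/^https?:\/\//.test(url)) {
            continue;
        }
        // Keep only the first occurrence of each URL.
        if (!Object.prototype.hasOwnProperty.call(seen, url)) {
            seen[url] = true;
            result.push(url);
        }
    }
    return result;
}

// Made-up sample input for illustration.
var sample = [
    'https://example.com/patent/A',
    'https://example.com/patent/A',
    'javascript:void(0)',
    'https://example.com/patent/B'
];
var cleaned = cleanLinks(sample);
console.log(cleaned.length + ' unique links');
```

In the script, this would be applied to the value returned by this.evaluate( getLinks ) before the loop that echoes each link.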

To run the code, just type casperjs yourscript.js.

 iwatobipen$ casperjs googlepat.js 
Google Patents
10patents found

It worked fine, and I got a screenshot of the results page.
CasperJS has many more functions for scraping. I’ll read the API documentation as soon as possible.
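One of those functions is waitForSelector, which could replace the fixed five-second wait in the script above: it continues as soon as an element matching the selector appears, and calls a second function if the timeout passes first. The sketch below is untested against the live site. The helper name addSearchSteps, the 10-second timeout, and the timeout message are my own assumptions; the selector 'article > a' is the one already used in getLinks.

```javascript
// Sketch: queue the search steps on a casper object passed in by the caller.
// addSearchSteps is a hypothetical helper name, not part of CasperJS.
function addSearchSteps(casper, term) {
    casper.then(function () {
        // Fill the search box and submit the form (third argument true).
        this.fill('form', { q: term }, true);
    });

    // Continue as soon as a result link exists, instead of always waiting 5 s.
    casper.waitForSelector('article > a', function () {
        this.capture('res.png');
    }, function () {
        // Called only if no matching element appears within the timeout.
        this.echo('No results appeared before the timeout');
    }, 10000);   // timeout in milliseconds (assumed value)
}
```

In a CasperJS script, this would be called as addSearchSteps( casper, 'JAK3' ) after opening the page and before casper.run().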




Published by iwatobipen

I'm a medicinal chemist at a mid-sized pharmaceutical company. I love cheminformatics, coding, organic synthesis, and my family.
