It's all about the data: August 2011

Wednesday, August 17, 2011

Blogs on Trade and the Environment

http://environment.yale.edu/envirocenter/

This blog on the Yale Center for Environmental Law & Policy site discusses issues arising from our recent study of the linkages between trade and the environment.

Tuesday, August 16, 2011

Fantasy Football 2011

It's that time of year again! Yesterday I scraped some ranking and points projection data from http://fftoolbox.com.

I was interested in how projected points decline with rank across the player positions. The plot, below, helps explain why, for example, running backs are selected ahead of wide receivers: production declines much more gradually with rank for wide receivers than for running backs. You get hurt less (in expectation) by taking lower-ranked wide receivers than you do by taking lower-ranked running backs. What I'd really like to do is integrate weekly variation into the analysis... but this requires a more substantial data scrape than I had time for.
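To make the "steepness of decline" comparison concrete, here is a minimal Python sketch that computes the average drop in projected points from one rank to the next for each position. The numbers are hypothetical placeholders chosen only to illustrate the calculation; they are not the projections scraped from fftoolbox.com, and the helper name `mean_drop_per_rank` is made up for this example.

```python
def mean_drop_per_rank(points):
    """Average decrease in projected points from one rank to the next.

    `points` must be ordered by rank (best player first) and contain
    at least two values.
    """
    drops = [higher - lower for higher, lower in zip(points, points[1:])]
    return sum(drops) / len(drops)


# HYPOTHETICAL projections, ordered by rank. These are illustrative
# placeholders, not the scraped data.
projections = {
    "RB": [300, 270, 245, 225, 210],
    "WR": [250, 240, 232, 225, 220],
}

for position, points in projections.items():
    print(position, mean_drop_per_rank(points))
```

A larger value means each step down the rankings costs more projected points, which is the pattern that would favor drafting that position earlier.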

Monday, August 15, 2011

Using "Google Docs" to scrape HTML tables from web pages

One of my students suggested I try this... so I did. In Google Docs, create a new spreadsheet. In the first cell, type something of the form:

=ImportHtml("http://the-url-goes-here", "table", 0)

My first attempt was scraping some fantasy football points projections:

=ImportHtml("http://www.fftoolbox.com/football/2011/cheatsheets.cfm?player_pos=QB", "table", 0)

Bingo. At least, it worked for me on the 8 pages I tried. The third argument selects which table on the page to import; I used 0 because some web page recommended it.

I could see using this for data scrapes when only a small number of pages is involved, but for more advanced scrapes that require automation I'll continue to use R.
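For a rough idea of what an automated table scrape involves, here is a minimal sketch in Python (not the R code I actually use), built only on the standard library. The names `TableExtractor`, `extract_table`, and `cheatsheet_url` are hypothetical helpers invented for this example, and the sample HTML is a made-up stand-in for a real page. The sketch assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in a page, in document order.

    Assumes tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def extract_table(html, index=0):
    """Return the rows of one table; `index` is a 0-based Python list index."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[index]


def cheatsheet_url(position):
    """Build a URL following the pattern of the QB page used above."""
    return ("http://www.fftoolbox.com/football/2011/cheatsheets.cfm"
            "?player_pos=" + position)


# Made-up sample page, so the sketch runs without touching the network.
SAMPLE_HTML = """
<table>
<tr><th>Rank</th><th>Player</th><th>Points</th></tr>
<tr><td>1</td><td>Player A</td><td>300</td></tr>
<tr><td>2</td><td>Player B</td><td>280</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML, 0)
for row in rows:
    print(row)
```

To automate a multi-page scrape, one could loop over position codes, fetch each `cheatsheet_url(...)` with `urllib.request.urlopen`, and pass the decoded page to `extract_table`. Only the QB code appears in the URL above; any other position codes would be assumptions to check against the site.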