Command-line Tools can be 235x Faster than your Hadoop Cluster - Adam Drake

URL: https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
Type: article
Domain: adamdrake.com

Excerpt (Introduction): As I was browsing the web and catching up on some sites I visit periodically, I found a cool article from Tom Hayden about using Amazon Elastic MapReduce (EMR) and mrjob to compute some statistics on win/loss ratios for chess games he downloaded from the millionbase archive, and to generally have fun with EMR. Since the data volume was only about 1.75 GB, containing around 2 million chess games, I was skeptical of using Hadoop for the task, but I can understand his goal of learning and having fun with mrjob and EMR.