Monday, November 14, 2011

How to debug a Hadoop program?

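Hadoop can run in three modes:
  • Local (Standalone)
  • Pseudo-Distributed
  • Fully Distributed
When developing a Hadoop program, standalone mode gives you a quick and easy way to debug it. How to use it? With its default configuration, Hadoop runs in a non-distributed mode, as a single Java process. The commands below, run from the Hadoop installation directory, copy the configuration files into an input directory, run the bundled grep example over them, and print the result: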

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
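
When debugging, it can help to have a reference result to compare the job output against. The following is only a sketch using standard Unix tools, not part of Hadoop: the function name local_grep_count is made up for illustration, and it assumes the example job reports each matched string together with how often it occurs.

```shell
# Hypothetical helper (not a Hadoop command): count every match of a
# regular expression across the files in a directory.
# Usage: local_grep_count DIR REGEX
local_grep_count() {
    dir=$1
    pattern=$2
    # grep -o prints each match on its own line, -h hides file names,
    # -E enables extended regular expressions (needed for "+").
    grep -ohE "$pattern" "$dir"/* |
        sort |                        # group identical matches together
        uniq -c |                     # count each group
        sort -rn |                    # most frequent match first
        awk '{print $1 "\t" $2}'      # print as: count<TAB>match
}
```

$ local_grep_count input 'dfs[a-z.]+'

If the two results differ, the input directory or the regular expression is the first place to look. Note also that Hadoop refuses to start a job whose output directory already exists, so remove output before running the example again.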

What is the Disco project?

Disco is an open-source distributed computing framework based on the MapReduce paradigm. It was developed by Nokia Research Center to solve real problems in handling massive amounts of data.
See details: http://discoproject.org/

How to install Mercurial/hg on Ubuntu

You need four steps:
  1. sudo add-apt-repository ppa:tortoisehg-ppa/releases
  2. sudo add-apt-repository ppa:mercurial-ppa/releases
  3. sudo apt-get update
  4. sudo apt-get install mercurial python-nautilus tortoisehg
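
After the four steps above, a quick check that the hg command is actually on your PATH can save some confusion. This is a minimal sketch; check_installed is a made-up helper name, not something the packages provide.

```shell
# Hypothetical helper: report whether a command is available on PATH.
# Usage: check_installed COMMAND
check_installed() {
    if command -v "$1" >/dev/null 2>&1; then
        # command -v prints the resolved location of the command
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found"
        return 1
    fi
}
```

$ check_installed hg && hg --version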