Archive

Posts Tagged ‘hadoop’

Getting Data from RestFB and Creating Sequence File > Hadoop

July 16, 2012 Leave a comment

Here is some quick code to get data from Facebook using the RestFB API, create a Sequence file, and dump your data into a Hadoop cluster.

Requirements:

  • Hadoop 1.0.3, installed as a standalone or multi-node cluster.
  • Eclipse IDE for development.
  • Hadoop and Apache Commons jars.
  • RestFB API jar.

Steps to create the Eclipse project:

  • Create a new Java project.
  • Add the jars to the project (Apache Commons and hadoop-core-1.0.3.jar), and add the RestFB jar.
  • You will find all the Commons and Hadoop jars under the Hadoop directory.

Sequence File Content Format.

  • Key – <facebook_id, facebook_name, timestamp>
  • Value – <batch_me, batch_me_friends, batch_me_likes>

Add the code below to get data from Facebook and generate the Sequence file. Before you start, you need to update the AccessToken in the code with your own access token from Facebook. You might want to look here.
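The full code is behind the "Read more" link; as a rough sketch of the approach (not the post's exact code), the key/value format above maps naturally onto a Text/Text SequenceFile. The class name, the HDFS output path, and using a single "me" fetch to stand in for the me/friends/likes batches are my own illustrative assumptions:

```java
import com.restfb.DefaultFacebookClient;
import com.restfb.FacebookClient;
import com.restfb.json.JsonObject;
import com.restfb.types.User;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class FacebookToSequenceFile {

    // Replace with your own access token from Facebook.
    private static final String ACCESS_TOKEN = "REPLACE_WITH_YOUR_ACCESS_TOKEN";

    public static void main(String[] args) throws Exception {
        FacebookClient client = new DefaultFacebookClient(ACCESS_TOKEN);

        // Fetch the profile; friends and likes would be fetched similarly,
        // e.g. via fetchConnection("me/friends", User.class).
        User me = client.fetchObject("me", User.class);

        // Key: <facebook_id, facebook_name, timestamp>
        Text key = new Text(me.getId() + "," + me.getName() + ","
                + System.currentTimeMillis());
        // Value: the fetched batches; here just the raw profile JSON for brevity.
        JsonObject rawMe = client.fetchObject("me", JsonObject.class);
        Text value = new Text(rawMe.toString());

        // Write one key/value record into a SequenceFile on the cluster
        // (the path "facebook/fb_data.seq" is an illustrative choice).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("facebook/fb_data.seq");
        SequenceFile.Writer writer =
                SequenceFile.createWriter(fs, conf, out, Text.class, Text.class);
        try {
            writer.append(key, value);
        } finally {
            writer.close();
        }
    }
}
```

With a standalone install this writes to the local filesystem; against a running cluster it writes into HDFS under the user's home directory.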

Read more…

Installing Hadoop 1.0.3 on Ubuntu Single Node Cluster using shell script

July 14, 2012 Leave a comment

I was working on setting up Hadoop on Ubuntu as a single node cluster.

I came across a very nice blog about it here (a must-read for setting up your single node cluster).
While I was at it, I found myself installing Hadoop multiple times on different systems, so I thought I'd create a script of my own, based on the blog above.

Here is the link to my script on GitHub; anyone interested can check it out and enhance it.

https://github.com/zubayr/hadoopscript/blob/master/initScriptHadoop.sh

README : https://github.com/zubayr/hadoopscript/blob/master/README.txt

Requirements:

1. Hadoop 1.0.3

2. Ubuntu 10.04 or above (tested on 11.04, 11.10 and 12.04, 32-bit platform)

Here are the details on how to install Hadoop using the script, from the README:

- Hadoop script to set up a Single Node Cluster - for Hadoop 1.0.3 only.

- Tested on Ubuntu 11.10, 12.04 - fresh install.

- The script assumes nothing is installed for Hadoop and installs the components required for Hadoop to run.

- This script was created using the installation guide by Michael Noll.

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

 

Steps for executing the script: currently the script only takes a single option at a time :(

-----------------------------------------------------------------------

Execute Help

]$ sudo ./initScriptHadoop.sh --help

usage: ./initScriptHadoop.sh <single-parameter>

  Optional parameters:
     --install-init, -i       Initialization script to install Hadoop as a Single Node Cluster.

     Use the options below once you are logged in as the Hadoop user 'hduser' created in the -i init script above.

     --install-ssh, -s        Install ssh-keygen -t rsa -P
     --install-bashrc, -b     Update '.bashrc' with JAVA_HOME, HADOOP_HOME.
     --ipv6-disable, -v       Disable IPv6 support. [Might not be required.
                              Updates 'conf/hadoop-env.sh' with the
                              'HADOOP_OPTS=-Djava.net.preferIPv4Stack=true' option, as in -e.]
     --hostname-update, -u    Update the hostname for the system.
     --config-update, -c      Update the configuration with default values
                              (Single Node) in core-site.xml, mapred-site.xml, hdfs-site.xml.
     --update-hadoop-env, -e  Update the Hadoop env script with JAVA_HOME.
     --help, -h               Display this message.
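Since the script takes one option per run, a full install is a sequence of invocations. The sequence below is a sketch of one plausible order based on the help text above (run -i first, then the per-user steps as 'hduser'); it is my own illustrative ordering, not taken from the README:

```shell
# First, as a sudo-capable user: create the 'hduser' account and install Hadoop.
sudo ./initScriptHadoop.sh --install-init

# Then log in as the Hadoop user created above and run the remaining
# steps, one option at a time.
su - hduser
sudo ./initScriptHadoop.sh --install-ssh        # generate the ssh key for hduser
sudo ./initScriptHadoop.sh --install-bashrc     # add JAVA_HOME, HADOOP_HOME to .bashrc
sudo ./initScriptHadoop.sh --config-update      # default single-node *-site.xml values
sudo ./initScriptHadoop.sh --update-hadoop-env  # set JAVA_HOME in hadoop-env.sh
```

The --hostname-update and --ipv6-disable steps are optional, depending on your system.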

 

Read more…