Archive

Archive for July, 2012

Getting Data from RestFB and Creating Sequence File > Hadoop

July 16, 2012
Here is some quick code to get data from Facebook using the RestFB API, write it to a sequence file, and dump the data into a Hadoop cluster.

Requirement:

  • Hadoop 1.0.3 installed as a stand-alone or multi-node cluster.
  • Eclipse IDE for development.
  • Hadoop and Apache Commons jars.
  • RestFB APIs.

Steps to Create the Eclipse Project.

  • Create a new Java project.
  • Add the jars to the project (Apache Commons, hadoop-core-1.0.3.jar and the RestFB jar).
  • You will find all the Commons and Hadoop jars under the Hadoop directory.

Sequence File Content Format.

  • Key – <facebook_id, facebook_name, timestamp>
  • Value – <batch_me, batch_me_friends, batch_me_likes>
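For illustration, a single record in the generated file would look roughly like this (values invented; the value is simply the toString() of the list of batch responses):

Key:   {"facebookId":"100000123456789","facebookName":"Jane Doe","timestamp":"2012_07_16_10_30_00"}
Value: [BatchResponse[code=200 body={...me...}], BatchResponse[code=200 body={...friends...}], BatchResponse[code=200 body={...likes...}]]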



Add the code below to get the data from Facebook and generate the sequence file. Before you start, you need to update the AccessToken in the code with your own access token from Facebook. Take a look here before you proceed.
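If you want to sanity-check the token before wiring up Hadoop, a minimal sketch like this should do (the class name TokenCheck is mine, not part of the original post; it only needs the RestFB jar):

import com.restfb.DefaultFacebookClient;
import com.restfb.FacebookClient;
import com.restfb.types.User;

public class TokenCheck {
    public static void main(String[] args) {
        // Replace with your access token from the Graph API Explorer.
        FacebookClient client = new DefaultFacebookClient("ACCESS_TOKEN_STRING");
        // Fetch the current user; an OAuth exception here means a bad or expired token.
        User me = client.fetchObject("me", User.class);
        System.out.println("Token is valid; logged in as " + me.getName());
    }
}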

/*
* Sequence File to HDFS Filesystem
*
* We take information from Facebook as Batch
* and then store them in Sequence file in Hadoop Distributed File System.
*
* */

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import static java.lang.String.format;
import static java.lang.System.currentTimeMillis;
import static java.lang.System.out;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.restfb.Connection;
import com.restfb.DefaultFacebookClient;
import com.restfb.DefaultJsonMapper;
import com.restfb.Facebook;
import com.restfb.FacebookClient;
import com.restfb.JsonMapper;
import com.restfb.Parameter;
import com.restfb.batch.BatchRequest;
import com.restfb.batch.BatchRequest.BatchRequestBuilder;
import com.restfb.batch.BatchResponse;
import com.restfb.json.JsonArray;
import com.restfb.json.JsonObject;
import com.restfb.types.Page;
import com.restfb.types.Post;
import com.restfb.types.Url;
import com.restfb.types.User;

public class SequenceFileToHDFS {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws IOException
    {
        String accessToken = "ACCESS_TOKEN_STRING";

        // Get the current date and time; used in the file name and the sequence key.
        DateFormat dateFormat = new SimpleDateFormat("yyyy_MM_dd_HH_mm_ss");
        Date date = new Date();

        String uri = "sequence_file_" + dateFormat.format(date) + ".seq";
        Configuration conf = new Configuration();

        /*
         * Uncomment the 2 lines below to get the configuration from the XML files.
         * Make sure the PATH is set right to pick up the configuration,
         * which will dump the sequence file into HADOOP.
         */
        //conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));
        //conf.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));

        /* Comment the line below and uncomment the 2 lines above to write the data into Hadoop */
        LocalFileSystem fs = FileSystem.getLocal(conf); // for the local file system
        /* Local sequence file setup ends here */

        /* Uncomment the line below to make it work with the configuration files above */
        //FileSystem fs = FileSystem.get(URI.create(uri), conf);

        Path path = new Path(uri);

        /*
         * Starting Facebook retrieval
         */

        DefaultFacebookClient facebookClient = new DefaultFacebookClient(accessToken);
        User user = facebookClient.fetchObject("me", User.class);

        /*
         * Building the batch requests to send to Facebook
         */

        BatchRequest meRequest = new BatchRequestBuilder("me").build();
        BatchRequest meFriendRequest = new BatchRequestBuilder("me/friends").build();
        BatchRequest meLikeRequest = new BatchRequestBuilder("me/likes").parameters(Parameter.with("limit", 5)).build();

        /* Executing the batch request */
        /* The responses will be our sequence value */
        List<BatchResponse> batchResponses =
            facebookClient.executeBatch(meRequest, meFriendRequest, meLikeRequest);

        /*
         * Based on the response from Facebook
         * we create the sequence file.
         */

        if (batchResponses.get(0).getCode() == 200)
        {
            /* Creating the sequence key */
            JsonObject sequencekeyMapUser = new JsonObject();
            sequencekeyMapUser.put("facebookId", user.getId());
            sequencekeyMapUser.put("facebookName", user.getName());
            sequencekeyMapUser.put("timestamp", dateFormat.format(date));

            Text key = new Text();
            Text value = new Text();
            SequenceFile.Writer writer = null;
            try
            {
                writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
                key.set(sequencekeyMapUser.toString());
                value.set(batchResponses.toString());
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
            finally
            {
                IOUtils.closeStream(writer);
            }
        }
        else
        {
            System.out.printf("Access Token Expired\n");
        }
    }
}
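
To verify the dump, here is a quick reader sketch that prints the records back out (the class name SequenceFileDump and the argument handling are mine, not part of the original post; it assumes the same local file system as the writer above):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDump {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf); // same local FS the writer used
        Path path = new Path(args[0]);             // e.g. sequence_file_2012_07_16_10_30_00.seq

        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(fs, path, conf);
            Text key = new Text();
            Text value = new Text();
            // Iterate over every key/value pair in the sequence file.
            while (reader.next(key, value)) {
                System.out.printf("%s\t%s\n", key, value);
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}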

Categories: BigData, HOWTOs

Getting Batch Data from Facebook using restFB APIs

July 16, 2012

Here is some quick sample code to get data from the Facebook Batch API.

Download the jar from here – http://code.google.com/p/restfb/downloads/detail?name=restfb-1.6.9.zip

Put it in your library path and execute the code below.

Go to this link and log in to Facebook to get your access token: https://developers.facebook.com/tools/explorer

Change the code to pass your access token directly: DefaultFacebookClient facebookClient = new DefaultFacebookClient("<<<ACCESSTOKEN HERE>>>");

 

import static java.lang.String.format;
import static java.lang.System.currentTimeMillis;
import static java.lang.System.out;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.restfb.Connection;
import com.restfb.DefaultFacebookClient;
import com.restfb.DefaultJsonMapper;
import com.restfb.Facebook;
import com.restfb.FacebookClient;
import com.restfb.JsonMapper;
import com.restfb.Parameter;
import com.restfb.batch.BatchRequest;
import com.restfb.batch.BatchRequest.BatchRequestBuilder;
import com.restfb.batch.BatchResponse;
import com.restfb.json.JsonArray;
import com.restfb.json.JsonObject;
import com.restfb.types.Page;
import com.restfb.types.Post;
import com.restfb.types.Url;
import com.restfb.types.User;

public class GraphReaderHadoopFB {

    public static void main(String[] args) {

        DefaultFacebookClient facebookClient = new DefaultFacebookClient("<<<ACCESSTOKEN HERE>>>");

        out.println("Starting BATCH \n");

        // Building the batch requests to send to Facebook
        out.println("Creating BATCH \n");
        BatchRequest meRequest = new BatchRequestBuilder("me").build();
        BatchRequest meFriendRequest = new BatchRequestBuilder("me/friends").build();
        BatchRequest meLikeRequest = new BatchRequestBuilder("me/likes").parameters(Parameter.with("limit", 5)).build();

        // Creating a POST request - not working yet - moved to GET
        out.println("Posting Request \n");
        BatchRequest postRequest = new BatchRequestBuilder("me").method("GET")
                .body(Parameter.with("message", "Info!!!")).build();

        // Executing the batch request
        out.println("Complete Batch Response \n");
        List<BatchResponse> batchResponses =
                facebookClient.executeBatch(meRequest, meFriendRequest, meLikeRequest, postRequest);

        // Got the responses; we can use this information for further processing.
        out.println("\n Response \n");
        BatchResponse meResponse = batchResponses.get(0);
        BatchResponse meFriendResponse = batchResponses.get(1);
        BatchResponse meLikeResponse = batchResponses.get(2);
        BatchResponse postResponse = batchResponses.get(3);

        out.println("\n *********** Individual Response ************* \n");

        out.println("\n meResponse \n");
        out.println(meResponse.getBody());

        out.println("\n meFriendResponse \n");
        out.println(meFriendResponse.getBody());

        out.println("\n meLikeResponse Getting 5 (LIMITED) \n");
        out.println(meLikeResponse.getBody());

        out.println("\n postResponse \n");
        out.println(postResponse.getBody());
    }
}
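
If you want typed objects instead of raw JSON strings, something along these lines should work right after the code above (a sketch using the DefaultJsonMapper and JsonObject classes already imported in this post):

        // Map the first batch response body back to a RestFB User object.
        JsonMapper mapper = new DefaultJsonMapper();
        User me = mapper.toJavaObject(meResponse.getBody(), User.class);
        out.println("Parsed name: " + me.getName());

        // Or poke at the raw JSON directly.
        JsonObject meJson = new JsonObject(meResponse.getBody());
        out.println("Raw id: " + meJson.getString("id"));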

Categories: BigData, HOWTOs

Installing Hadoop 1.0.3 on Ubuntu Single Node Cluster using shell script

July 16, 2012

I was working on setting up Hadoop on Ubuntu as a single-node cluster.

I came across a very nice blog about it here (a must-read to set up your single-node cluster).
While I was at it, I found myself installing Hadoop multiple times on different systems, so I thought I would create a script of my own, based on the blog above.

Here is the link to my script on GitHub; anyone interested can check it out and enhance it.

https://github.com/zubayr/hadoopscript/blob/master/initScriptHadoop.sh

README : https://github.com/zubayr/hadoopscript/blob/master/README.txt

Requirements:

1. Hadoop 1.0.3

2. Ubuntu 10.04 or above (tested on 11.04, 11.10 and 12.04, 32-bit platform)

Here are the details on how to install Hadoop using the script.

From the README:

- Hadoop script to set up a Single Node Cluster - for Hadoop 1.0.3 only.

- Tested on Ubuntu 11.10, 12.04 - fresh install.

- The script assumes nothing Hadoop-related is installed and installs the required components for Hadoop to run.

- This script was created using the installation guide by Michael Noll:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

 

Steps for executing the script (currently the script only takes a single option at a time :( ):

-----------------------------------------------------------------------

Execute help:

]$ sudo ./initScriptHadoop.sh --help

usage: ./initScriptHadoop.sh <single-parameter>

  Optional parameters:
     --install-init, -i       Initialization script to install Hadoop as a Single Node Cluster.

     Use the options below once you are logged in as the Hadoop user 'hduser' created by the -i init script above.

     --install-ssh, -s        Install ssh-keygen -t rsa -P
     --install-bashrc, -b     Update '.bashrc' with JAVA_HOME, HADOOP_HOME.
     --ipv6-disable, -v       Disable IPv6 support. [Might not be required;
                              updates 'conf/hadoop-env.sh' with the
                              'HADOOP_OPTS=-Djava.net.preferIPv4Stack=true' option in -e.]
     --hostname-update, -u    Update the hostname for the system.
     --config-update, -c      Update the configuration with default values
                              (Single Node) in core-site.xml, mapred-site.xml, hdfs-site.xml.
     --update-hadoop-env, -e  Update the Hadoop env script with JAVA_HOME.
     --help, -h               Display this message.

 

1. First install the prerequisites using the -i option:

     ahmed@ahmed-on-Edge:~$ ./initScriptHadoop.sh -i

      Welcome to the Preconfiguration for Hadoop single node setup wizard

      Would you like to install Java 1.6? (y/n) y
      Would you like to set up user 'hduser' and group 'hadoop'? (y/n) y
      Would you like to download Hadoop 1.0.3 and extract it to /usr/local? (y/n) y
      Would you like to make 'hduser' the owner of the /usr/local/hadoop/ directory? (y/n) y
      Would you like to log in as 'hduser' once done? (y/n) y

      Review your choices:

      Install Java 1.6        : y
      Setup 'hduser' user     : y
      Download Hadoop 1.0.3   : y
      Setup 'hduser' as Owner : y
      Login to 'hduser'       : y

      Proceed with setup? (y/n) y

   During the installation it will ask for a password for hduser.

2. Log in as 'hduser' (created by the -i option above).

3. Execute the options -s, -b, -c and -e, one at a time:

   ./initScriptHadoop.sh -s
   ./initScriptHadoop.sh -b
   ./initScriptHadoop.sh -c
   ./initScriptHadoop.sh -e
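
For reference, the -c option writes the standard single-node defaults from Michael Noll's guide into the three config files; they look roughly like this (values taken from that guide, not re-verified against the script):

<!-- core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>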

 

Once you are done installing, log in as hduser and format the NameNode first:

hduser@ubuntu:~$ cd /usr/local/hadoop
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format

Formatting the NameNode will generate output similar to this:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/..
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage . . . has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$

Next, let's start the single-node cluster and then check that all is well with the "jps" command.

hduser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
hduser@ubuntu:/usr/local/hadoop$

hduser@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode

This shows that all the processes are running fine. Now we are ready to run some good old word-count MapReduce code.
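
For example, following Michael Noll's guide (the /tmp/gutenberg input directory is an assumption — any plain-text files will do):

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000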

Categories: BigData, HOWTOs
