Tuesday, December 17, 2013

SQOOP Installation on HADOOP to Connect with ORACLE and Pull the data and its structures .


11.    log into the terminal/Putty , go to downloads folder 

   

                   Download the sqoop .tar file using the below 
           
             Please note that if you do not get this version available , download any other stable version from                    Apache's mirror http://apache.mirrors.hoobly.com/sqoop/


   2.    Unzip the file using below:
                           tar -xzvf sqoop-1.4.4.bin__hadoop-1.0.0.tar.gz

                              
a 3.     Move the unzipped file to the install folder :

      o

  
   4.    Log into the Oracle's website below 
               Download the ojdbc6.jar by accepting the terms and conditions . 
               Please note that you will have to sign up with oracle to download the file .    


5.    Copy and paste the  ojdbc6.jar to the "~/install/sqoop/lib" folder as given below 



   6.    Go to "/home/hadoop/lab/install/sqoop-1.4.3.bin__hadoop-1.0.0/conf 


          open the sqoop-env.sh  using a VI editor 
          Or do the below 
          cp sqoop-env-template.sh sqoop-env.sh ( This will create the file if it does not exist )
          
          vi sqoop-env.sh

          Edit the below path as you have in your machine .


   7.    Start Hadoop by going to 
           
          hadoop@ubuntu-server64:~/lab/install/hadoop-1.0.4/bin$ ./start-all.sh


         "jps" should list all the processes running as given above .
          For Hadoop Installation , please check my blog on Hadoop installation .

   8.    Go to "/home/hadoop/lab/install/sqoop-1.4.3.bin__hadoop-1.0.0/bin " , Test the sqoop to be working or not ( This is for oracle for others RDBMS please revisit #4 above and download the appropriate jar)

          ./sqoop list-databases --connect jdbc:oracle:thin:@192.168.10.100:1521:PROD  --username SYSTEM --password manager 

   If the connection is working fine , it is time to go to the next step . Else There can be couple of issues 
                        1.   Correct .JAR is not placed in the sqoop/lib folder 
                        2.   User/pass using JDBC doesn't have access to the database .

   This can be one type of error 
================ERROR=====================================================

    hadoop@ubuntu-server64:~/lab/install/sqoop-1.4.3.bin__hadoop-1.0.0/bin$ ./sqoop list-databases --connect jdbc:oracle:thin:@192.168.0.41:1521:PROD  --username SPENDOMETER -P 
   Warning: /usr/lib/hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Enter password:
    13/12/16 14:48:08 INFO manager.SqlManager: Using default fetchSize of 1000
    13/12/16 14:48:09 INFO manager.OracleManager: Time zone has been set to GMT
    13/12/16 14:48:09 ERROR manager.OracleManager: The catalog view DBA_USERS was not found. This may happen if the user does not have DBA privileges. Please check privileges and try again.

================ERROR=====================================================

   9.   Execute the below command to pull the data and load into Hive .

    ./sqoop import  --connect jdbc:oracle:thin:@192.168.0.41:1521:PROD --table TEMP_FIRST_NAME --username SPENDOMETER --password spendometer --hive-import --split-by first_name --target-dir /user/hive/warehouse/name
   
    TEMP_FIRST_NAME  -> Name of the table 
    username -> This must always be CAPITIAL LETTER (SPENDOMETER )
    password -> This must always be small letter (spendometer )
    
   10.   To check if the file has been created or not , use the below command 
           Hadoop fs -ls /user/hive/warehouse/temp_first_name



       This should contain the folders inside which the data will be available 

   11.   Start the hive with command as given below .



    If  the hive starts and you still don't see the table , you are pointing to the wrong metastore_DB from hive .






   12.  start the hive using "bash" command as below   

         hadoop@ubuntu-server64:~/lab/install/sqoop-1.4.3.bin__hadoop-1.0.0/bin$ bash            /home/hadoop/lab/install/hive-0.9.0/bin/hive


    This should now fetch you data as below 


1 comment:

  1. what should we do with the error?

    When I try to list Databases it gave me the DBA users was not found error too.

    ERROR manager.OracleManager: The catalog view DBA_USERS was not found. This may happen if the user does not have DBA privileges. Please check privileges and try again.

    ReplyDelete