Friday, July 17, 2015

Stream Twitter data into HDFS using Flume

Install Flume:
# wget http://www.gtlib.gatech.edu/pub/apache/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
# tar -xzvf apache-flume-1.6.0-bin.tar.gz
# cd apache-flume-1.6.0-bin
# cd conf
# cp flume-env.sh.template flume-env.sh
# vi flume-env.sh   (add the lines below)

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

# Note that the Flume conf directory is always included in the classpath.
FLUME_CLASSPATH="/home/hduser/apache-flume-1.6.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"
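
The FLUME_CLASSPATH line above points at flume-sources-1.0-SNAPSHOT.jar, which does not ship with Flume itself; it is the Twitter source jar, typically built or downloaded separately (for example from Cloudera's cdh-twitter-example project, an assumption since the original does not say where it came from). A minimal sketch of putting it in place:

# copy the Twitter source jar into Flume's lib directory (source path and filename assumed)
cp ~/flume-sources-1.0-SNAPSHOT.jar /home/hduser/apache-flume-1.6.0-bin/lib/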

# cp flume-conf.properties.template flume.conf
*************
Twitter application setup

Create a Twitter application at https://apps.twitter.com/ and generate the consumer key/secret and access token/secret that are used in the agent configuration below. The author's app page:

https://apps.twitter.com/app/3389049/show

# vi flume.conf   (add the following agent configuration)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ZR0iLmZXu1QM1ZvX0K3VlPglE
TwitterAgent.sources.Twitter.consumerSecret = CNKjEE9j4iT4Hev6P6joq7iWSIAPx0hRaRKJwGeew9gg1SRoms
TwitterAgent.sources.Twitter.accessToken = 3280478912-ieuY8LQEA3fbgbKkb92aDNTKrmxiNn43ZtsexjF
TwitterAgent.sources.Twitter.accessTokenSecret =  n5Rti4gQy4DxyGp7EFr83hx0CFwWBm4hSlkJ5vOkWfOyC
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:54310/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100


#cd bin/

# ./flume-ng agent -n TwitterAgent -c conf -f /home/hduser/apache-flume-1.6.0-bin/conf/flume.conf 
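
While testing, it can help to run the agent with console logging so source and sink activity is visible; a sketch, assuming the same agent name and config file as above:

# ./flume-ng agent -n TwitterAgent -c conf -f /home/hduser/apache-flume-1.6.0-bin/conf/flume.conf -Dflume.root.logger=INFO,console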


Browse the NameNode and HDFS file system:
http://localhost:50070/dfshealth.jsp
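
The same data can also be checked from the command line; a quick sketch, assuming the hdfs.path configured in flume.conf above:

# list the bucketed tweet files written by the HDFS sink
hadoop fs -ls -R /user/flume/tweets/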




Error
15/07/18 20:45:51 WARN hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: Callable timed out after 15000 ms on file: hdfs://localhost:54310/user/flume/tweets/2015/07/18/20//FlumeData.1437277535793.tmp
at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:693)
at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:514)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)

Fixed: increased the HDFS sink timeout parameter (hdfs.callTimeout) beyond the 15000 ms shown in the error; a sketch of the change follows.
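
A minimal sketch of the change, added to flume.conf (the exact value the author used is not recorded; 60000 ms is an assumption):

# raise the HDFS sink call timeout so slow open/append calls do not abort the batch
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 60000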


Sqoop and MySQL Integration

Sqoop installation:

1. Download the Sqoop tar file:
wget http://apache.cs.utah.edu/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-0.23.tar.gz


2. Update the .bashrc file with the SQOOP_HOME environment variable (see the sketch below).
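
A minimal sketch of that step, assuming the archive was extracted under /home/hduser (the same path that appears in the logs below):

# extract the downloaded archive
tar -xzvf sqoop-1.4.6.bin__hadoop-0.23.tar.gz

# lines added to ~/.bashrc
export SQOOP_HOME=/home/hduser/sqoop-1.4.6.bin__hadoop-0.23
export PATH=$PATH:$SQOOP_HOME/bin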



3. Download the MySQL JDBC connector (Connector/J):

http://dev.mysql.com/downloads/connector/j
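
The connector jar has to be visible to Sqoop; a sketch, assuming a Connector/J 5.1.x jar was downloaded to the home directory (the exact version and filename are assumptions):

# copy the MySQL JDBC driver into Sqoop's lib directory
cp ~/mysql-connector-java-5.1.36-bin.jar /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/lib/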

4. MySQL installation:
Follow the Ubuntu help document to install MySQL on Ubuntu (check the relevant Ubuntu OS version):
https://help.ubuntu.com/lts/serverguide/mysql.html

# Check the IP address of your local machine
ifconfig
# Update the following file with the bind address
# vi /etc/mysql/my.cnf
bind-address = 192.168.107.128   # your machine's IP address
# Run the following command in the MySQL shell:
mysql> grant all privileges on *.* to root@192.168.107.128 IDENTIFIED BY 'root' WITH GRANT OPTION;
Query OK, 0 rows affected (0.02 sec)

root@ubuntu:~# mysql -hlocalhost -uroot -p
mysql> 
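
The import below expects a sqoop_test database containing an Employee table, but the original does not show its creation. A minimal sketch, with the two-column schema and rows inferred from the data imported and exported later in this post:

mysql> CREATE DATABASE sqoop_test;
mysql> USE sqoop_test;
mysql> -- two columns, matching the EmpExport table created for the export later
mysql> CREATE TABLE Employee (name VARCHAR(40), lname VARCHAR(30));
mysql> INSERT INTO Employee VALUES ('Bhupendra','mishra'), ('shreyash','mishra'), ('shreya','mishra');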


# Download paranamer.jar from the following location (see the sketch after the link):
http://grepcode.com/snapshot/repo1.maven.org/maven2/com.thoughtworks.paranamer/paranamer/2.3
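
The jar needs to end up in Sqoop's lib directory, which is the path the FileNotFoundException in the error below refers to; a sketch, assuming it was downloaded to the home directory:

# copy paranamer-2.3.jar into Sqoop's lib directory
cp ~/paranamer-2.3.jar /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/lib/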

Error while executing the Sqoop import command:
root@ubuntu:/home/hduser/sqoop-1.4.6.bin__hadoop-0.23/bin# ./sqoop import --connect jdbc:mysql://192.168.107.128/sqoop_test --table Employee -username root -P --target-dir hdfs://localhost:54310/user/hduser/input1 -m 1
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/07/17 11:26:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
15/07/17 11:26:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/17 11:26:12 INFO tool.CodeGenTool: Beginning code generation
15/07/17 11:26:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Employee` AS t LIMIT 1
15/07/17 11:26:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Employee` AS t LIMIT 1
15/07/17 11:26:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.6.0
Note: /tmp/sqoop-root/compile/0243c39d019b0e2b147a4733f9b43305/Employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/17 11:26:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/0243c39d019b0e2b147a4733f9b43305/Employee.jar
15/07/17 11:26:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/07/17 11:26:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/07/17 11:26:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/07/17 11:26:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/07/17 11:26:17 INFO mapreduce.ImportJobBase: Beginning import of Employee
15/07/17 11:26:17 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/17 11:26:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/17 11:26:20 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/17 11:26:20 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/17 11:26:20 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/07/17 11:26:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/07/17 11:26:21 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/usr/local/hadoop-2.6.0/tmp/mapred/staging/root1479571726/.staging/job_local1479571726_0001
15/07/17 11:26:21 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/home/hduser/sqoop-1.4.6.bin__hadoop-0.23/lib/paranamer-2.3.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)



Fixed:
1. Edit mapred-site.xml in /usr/local/hadoop-2.6.0/etc/hadoop and add the following property:
 <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
 </property>

2. Restart the Hadoop daemons using the stop-all.sh and start-all.sh scripts under /usr/local/hadoop-2.6.0/sbin (a sketch follows), then execute the Sqoop import command again.
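
A minimal sketch of the restart (stop-all.sh and start-all.sh are deprecated wrappers in Hadoop 2.6.0 but are still present under sbin):

# stop and then start the HDFS and YARN daemons
/usr/local/hadoop-2.6.0/sbin/stop-all.sh
/usr/local/hadoop-2.6.0/sbin/start-all.sh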

Import from MySQL to HDFS using Sqoop
hduser@ubuntu:~/sqoop-1.4.6.bin__hadoop-0.23/bin$ ./sqoop import --connect jdbc:mysql://192.168.107.128/sqoop_test --table Employee -username root -P --target-dir hdfs://localhost:54310/user/hduser/input1 -m 1
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/07/17 11:32:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
15/07/17 11:32:20 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/17 11:32:20 INFO tool.CodeGenTool: Beginning code generation
15/07/17 11:32:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Employee` AS t LIMIT 1
15/07/17 11:32:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Employee` AS t LIMIT 1
15/07/17 11:32:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.6.0
Note: /tmp/sqoop-hduser/compile/965ff3242704eeefead296b62f34429d/Employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/17 11:32:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/965ff3242704eeefead296b62f34429d/Employee.jar
15/07/17 11:32:24 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/07/17 11:32:24 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/07/17 11:32:24 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/07/17 11:32:24 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/07/17 11:32:24 INFO mapreduce.ImportJobBase: Beginning import of Employee
15/07/17 11:32:24 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/17 11:32:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/17 11:32:27 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/17 11:32:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/17 11:32:33 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/17 11:32:33 INFO mapreduce.JobSubmitter: number of splits:1
15/07/17 11:32:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437157914144_0001
15/07/17 11:32:35 INFO impl.YarnClientImpl: Submitted application application_1437157914144_0001
15/07/17 11:32:35 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1437157914144_0001/
15/07/17 11:32:35 INFO mapreduce.Job: Running job: job_1437157914144_0001
15/07/17 11:32:59 INFO mapreduce.Job: Job job_1437157914144_0001 running in uber mode : false
15/07/17 11:32:59 INFO mapreduce.Job:  map 0% reduce 0%
15/07/17 11:33:11 INFO mapreduce.Job:  map 100% reduce 0%
15/07/17 11:33:12 INFO mapreduce.Job: Job job_1437157914144_0001 completed successfully
15/07/17 11:33:12 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=124533
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=47
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=10398
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10398
Total vcore-seconds taken by all map tasks=10398
Total megabyte-seconds taken by all map tasks=10647552
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=193
CPU time spent (ms)=1390
Physical memory (bytes) snapshot=96030720
Virtual memory (bytes) snapshot=668921856
Total committed heap usage (bytes)=16093184
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=47
15/07/17 11:33:13 INFO mapreduce.ImportJobBase: Transferred 47 bytes in 45.7892 seconds (1.0264 bytes/sec)
15/07/17 11:33:13 INFO mapreduce.ImportJobBase: Retrieved 3 records.
hduser@ubuntu:~/sqoop-1.4.6.bin__hadoop-0.23/bin$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hduser supergroup          0 2015-07-17 10:42 input
drwxr-xr-x   - hduser supergroup          0 2015-07-17 11:33 input1
hduser@ubuntu:~/sqoop-1.4.6.bin__hadoop-0.23/bin$ hdfs dfs -lsr
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - hduser supergroup          0 2015-07-17 10:42 input
drwxr-xr-x   - hduser supergroup          0 2015-07-17 11:33 input1
-rw-r--r--   1 hduser supergroup          0 2015-07-17 11:33 input1/_SUCCESS
-rw-r--r--   1 hduser supergroup         47 2015-07-17 11:33 input1/part-m-00000

hduser@ubuntu:~/sqoop-1.4.6.bin__hadoop-0.23/bin$ hadoop fs -cat input1/part-m-00000
Bhupendra,mishra
shreyash,mishra
shreya,mishra
hduser@ubuntu:~/sqoop-1.4.6.bin__hadoop-0.23/bin$

Export from HDFS to a MySQL table:

root@ubuntu:~# mysql -hlocalhost -uroot -p
mysql>

1. Create a table with the relevant columns:
mysql> use sqoop_test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table EmpExport(name varchar(40), lname varchar(30));
Query OK, 0 rows affected (0.07 sec)


2. Run the Sqoop export command as follows:
hduser@ubuntu:~/sqoop-1.4.6.bin__hadoop-0.23/bin$ ./sqoop export --connect jdbc:mysql://192.168.107.128/sqoop_test --table EmpExport -username root -P --export-dir input1/part-m-00000
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hduser/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/07/17 12:07:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password:
15/07/17 12:08:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/17 12:08:02 INFO tool.CodeGenTool: Beginning code generation
15/07/17 12:08:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `EmpExport` AS t LIMIT 1
15/07/17 12:08:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `EmpExport` AS t LIMIT 1
15/07/17 12:08:03 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.6.0
Note: /tmp/sqoop-hduser/compile/20e181133cc5b1b7fba470120c9ef534/EmpExport.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/17 12:08:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hduser/compile/20e181133cc5b1b7fba470120c9ef534/EmpExport.jar
15/07/17 12:08:06 INFO mapreduce.ExportJobBase: Beginning export of EmpExport
15/07/17 12:08:06 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/17 12:08:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/17 12:08:09 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/07/17 12:08:09 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/07/17 12:08:09 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/17 12:08:09 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/17 12:08:14 INFO input.FileInputFormat: Total input paths to process : 1
15/07/17 12:08:14 INFO input.FileInputFormat: Total input paths to process : 1
15/07/17 12:08:14 INFO mapreduce.JobSubmitter: number of splits:4
15/07/17 12:08:14 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/07/17 12:08:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437157914144_0002
15/07/17 12:08:15 INFO impl.YarnClientImpl: Submitted application application_1437157914144_0002
15/07/17 12:08:15 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1437157914144_0002/
15/07/17 12:08:15 INFO mapreduce.Job: Running job: job_1437157914144_0002
15/07/17 12:08:31 INFO mapreduce.Job: Job job_1437157914144_0002 running in uber mode : false
15/07/17 12:08:31 INFO mapreduce.Job:  map 0% reduce 0%
15/07/17 12:09:12 INFO mapreduce.Job:  map 100% reduce 0%
15/07/17 12:09:13 INFO mapreduce.Job: Job job_1437157914144_0002 completed successfully
15/07/17 12:09:14 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=497480
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=755
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=149420
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=149420
Total vcore-seconds taken by all map tasks=149420
Total megabyte-seconds taken by all map tasks=153006080
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=611
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=2303
CPU time spent (ms)=4780
Physical memory (bytes) snapshot=348635136
Virtual memory (bytes) snapshot=2666725376
Total committed heap usage (bytes)=64946176
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/07/17 12:09:14 INFO mapreduce.ExportJobBase: Transferred 755 bytes in 64.8425 seconds (11.6436 bytes/sec)
15/07/17 12:09:14 INFO mapreduce.ExportJobBase: Exported 3 records.
hduser@ubuntu:~/sqoop-1.4.6.bin__hadoop-0.23/bin$

3. Check the table to verify the data:
mysql> select * from EmpExport
    -> ;
+-----------+--------+
| name      | lname  |
+-----------+--------+
| shreyash  | mishra |
| Bhupendra | mishra |
| shreya    | mishra |
+-----------+--------+
3 rows in set (0.00 sec)

http://localhost:8088/cluster


*********************************************************************************
Hortonworks Sandbox

[root@sandbox ~]# sqoop import --connect jdbc:mysql://localhost/sqoop_test  -username=root --table Employee -m 1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/07/16 05:23:41 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.1.0-385
15/07/16 05:23:41 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/16 05:23:41 INFO tool.CodeGenTool: Beginning code generation
15/07/16 05:23:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Employee` AS t LIMIT 1
15/07/16 05:23:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Employee` AS t LIMIT 1
15/07/16 05:23:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/c6e94bdaca90ad5015aefbcaaaf1d04e/Employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/16 05:23:46 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/c6e94bdaca90ad5015aefbcaaaf1d04e/Employee.jar
15/07/16 05:23:46 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/07/16 05:23:46 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/07/16 05:23:46 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/07/16 05:23:46 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/07/16 05:23:46 INFO mapreduce.ImportJobBase: Beginning import of Employee
15/07/16 05:23:46 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/16 05:23:48 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/16 05:23:48 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
15/07/16 05:23:51 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/16 05:23:52 INFO mapreduce.JobSubmitter: number of splits:1
15/07/16 05:23:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437048929231_0001
15/07/16 05:23:53 INFO impl.YarnClientImpl: Submitted application application_1437048929231_0001
15/07/16 05:23:53 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1437048929231_0001/
15/07/16 05:23:53 INFO mapreduce.Job: Running job: job_1437048929231_0001
15/07/16 05:24:08 INFO mapreduce.Job: Job job_1437048929231_0001 running in uber mode : false
15/07/16 05:24:08 INFO mapreduce.Job:  map 0% reduce 0%
15/07/16 05:24:18 INFO mapreduce.Job:  map 100% reduce 0%
15/07/16 05:24:19 INFO mapreduce.Job: Job job_1437048929231_0001 completed successfully
15/07/16 05:24:19 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=106914
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=12
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=7907
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=7907
                Total vcore-seconds taken by all map tasks=7907
                Total megabyte-seconds taken by all map tasks=1976750
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=60
                CPU time spent (ms)=1950
                Physical memory (bytes) snapshot=153657344
                Virtual memory (bytes) snapshot=895991808
                Total committed heap usage (bytes)=75497472
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=12
15/07/16 05:24:19 INFO mapreduce.ImportJobBase: Transferred 12 bytes in 30.8798 seconds (0.3886 bytes/sec)
15/07/16 05:24:19 INFO mapreduce.ImportJobBase: Retrieved 1 records.

[root@sandbox ~]# hadoop fs -ls
Found 2 items
drwx------   - root root          0 2015-07-16 05:24 .staging
drwxr-xr-x   - root root          0 2015-07-16 05:24 Employee
[root@sandbox ~]#  hdfs dfs -lsr
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx------   - root root          0 2015-07-16 05:24 .staging
drwxr-xr-x   - root root          0 2015-07-16 05:24 Employee
-rw-r--r--   1 root root          0 2015-07-16 05:24 Employee/_SUCCESS
-rw-r--r--   1 root root         12 2015-07-16 05:24 Employee/part-m-00000
[root@sandbox ~]# hadoop fs -cat Employee/_SUCCESS
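
_SUCCESS is just an empty marker file, so the cat above prints nothing; the imported record itself is in the part file (a sketch):

[root@sandbox ~]# hadoop fs -cat Employee/part-m-00000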


MapReduce WordCount Job

hduser@ubuntu:/usr/local/hadoop-2.6.0/sbin$ hadoop jar WordCount.jar input1/part-m-00000 /usr/local/hadoop-2.6.0/output1

15/07/17 22:07:00 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/17 22:07:00 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/17 22:07:06 INFO mapred.FileInputFormat: Total input paths to process : 1
15/07/17 22:07:06 INFO mapreduce.JobSubmitter: number of splits:2
15/07/17 22:07:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437194421125_0001
15/07/17 22:07:09 INFO impl.YarnClientImpl: Submitted application application_1437194421125_0001
15/07/17 22:07:09 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1437194421125_0001/
15/07/17 22:07:09 INFO mapreduce.Job: Running job: job_1437194421125_0001
15/07/17 22:07:42 INFO mapreduce.Job: Job job_1437194421125_0001 running in uber mode : false
15/07/17 22:07:42 INFO mapreduce.Job:  map 0% reduce 0%
15/07/17 22:08:20 INFO mapreduce.Job:  map 100% reduce 0%
15/07/17 22:08:48 INFO mapreduce.Job:  map 100% reduce 100%
15/07/17 22:08:49 INFO mapreduce.Job: Job job_1437194421125_0001 completed successfully
15/07/17 22:08:49 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=71
FILE: Number of bytes written=316866
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=283
HDFS: Number of bytes written=53
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=72004
Total time spent by all reduces in occupied slots (ms)=24103
Total time spent by all map tasks (ms)=72004
Total time spent by all reduce tasks (ms)=24103
Total vcore-seconds taken by all map tasks=72004
Total vcore-seconds taken by all reduce tasks=24103
Total megabyte-seconds taken by all map tasks=73732096
Total megabyte-seconds taken by all reduce tasks=24681472
Map-Reduce Framework
Map input records=3
Map output records=3
Map output bytes=59
Map output materialized bytes=77
Input split bytes=212
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=77
Reduce input records=3
Reduce output records=3
Spilled Records=6
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=475
CPU time spent (ms)=2790
Physical memory (bytes) snapshot=456896512
Virtual memory (bytes) snapshot=2006376448
Total committed heap usage (bytes)=256843776
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=71
File Output Format Counters
Bytes Written=53


The output can be viewed at the URL below:
http://localhost:50075/browseBlock.jsp?blockId=1073742969&blockSize=53&genstamp=2145&filename=%2Fusr%2Flocal%2Fhadoop-2.6.0%2Foutput1%2Fpart-00000&datanodePort=50010&namenodeInfoPort=50070&nnaddr=127.0.0.1:54310

or

hduser@ubuntu:/usr/local/hadoop-2.6.0/sbin$ hdfs dfs -cat hdfs:/usr/local/hadoop-2.6.0/output1/part-00000
Bhupendra,mishra 1
shreya,mishra 1
shreyash,mishra 1
hduser@ubuntu:/usr/local/hadoop-2.6.0/sbin$