Friday 16 February 2018

Apache Hadoop Commands and Outputs

HADOOP

1. version

Print the Hadoop version

hadoopuser@ub16043lts00:~$ hadoop version
 

Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /usr/local/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar

 


2. checknative

hadoopuser@ub16043lts00:~$ hadoop checknative

2018-02-17 11:45:24,249 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
2018-02-17 11:45:24,264 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2018-02-17 11:45:24,308 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
Native library checking:
hadoop:  true /usr/local/hadoop-3.0.0/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
zstd  :  false
snappy:  true /usr/lib/x86_64-linux-gnu/libsnappy.so.1
lz4:     true revision:10301
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so
ISA-L:   false libhadoop was built without ISA-L support
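
The -a flag is also worth knowing: it makes checknative check all libraries and return a non-zero exit code if any are unavailable, which is handy in setup scripts:

hadoop checknative -a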




3. jar

pi (Hadoop PI Estimation example)

Usage: pi <num_maps> <num_samples>

hadoopuser@ub16043lts00:~$ hadoop jar /usr/local/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar pi 10 20
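
Here 10 is the number of map tasks and 20 is the number of samples each map draws; the pi estimate sharpens as the total sample count grows. A longer, purely illustrative run:

hadoop jar /usr/local/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar pi 16 100000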



teragen (Hadoop Teragen example to generate data for the terasort)

Usage: teragen <number of 100-byte rows> <output dir>

To clear things up, each row that TeraGen writes has the following 100-byte format:

<10 bytes key><10 bytes rowid><78 bytes filler>\r\n

where:

The keys are random characters from the set ' ' .. '~'.
The rowid is the right-justified row id as an int.

The filler consists of 7 runs of 10 characters from 'A' to 'Z'.

hadoopuser@ub16043lts00:~$ hadoop jar /usr/local/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar teragen 10000000 /teragen_output_directory
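
Since each row is exactly 100 bytes (10 + 10 + 78 + 2), one way to eyeball the format is to print the first two rows of a generated part file; part-m-00000 is assumed here as the usual name of the first map's output, so adjust if yours differs:

hdfs dfs -cat /teragen_output_directory/part-m-00000 | head -c 200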



terasort (Hadoop Terasort example to run the terasort)

Usage: terasort <input dir> <output dir>

hadoopuser@ub16043lts00:~$ hadoop jar /usr/local/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar terasort /teragen_output_directory /terasort_output_directory



teravalidate (Hadoop Teravalidate example to check the results of terasort)

Usage: teravalidate <terasort output dir (= input data)> <teravalidate output dir>

hadoopuser@ub16043lts00:~$ hadoop jar /usr/local/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar teravalidate /terasort_output_directory /teravalidate_output_directory
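
TeraValidate writes its verdict into the output directory; if the data is correctly sorted the report is essentially just a checksum, otherwise misordered keys are listed as errors. Assuming the usual single reduce output file, it can be read with:

hdfs dfs -cat /teravalidate_output_directory/part-r-00000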



wordcount (Hadoop Word Count example)

Usage: wordcount <input_file> <output_dir>

hadoopuser@ub16043lts00:~$ hadoop jar /usr/local/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar wordcount /gauravb/inputData/hadoop_installation /gauravb/outputData/wordCountOutput01
 

2017-09-27 13:45:50,135 INFO client.RMProxy: Connecting to ResourceManager at ub16043lts00/10.0.1.1:8032
2017-09-27 13:45:51,726 INFO input.FileInputFormat: Total input files to process : 1
2017-09-27 13:45:52,003 INFO mapreduce.JobSubmitter: number of splits:1
2017-09-27 13:45:52,224 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2017-09-27 13:45:52,517 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1506498574236_0003
2017-09-27 13:45:53,254 INFO impl.YarnClientImpl: Submitted application application_1506498574236_0003
2017-09-27 13:45:53,373 INFO mapreduce.Job: The url to track the job: http://ub16043lts00:8088/proxy/application_1506498574236_0003/
2017-09-27 13:45:53,374 INFO mapreduce.Job: Running job: job_1506498574236_0003
2017-09-27 13:46:04,815 INFO mapreduce.Job: Job job_1506498574236_0003 running in uber mode : false
2017-09-27 13:46:04,816 INFO mapreduce.Job:  map 0% reduce 0%
2017-09-27 13:46:31,512 INFO mapreduce.Job:  map 100% reduce 0%
2017-09-27 13:46:54,959 INFO mapreduce.Job:  map 100% reduce 100%
2017-09-27 13:46:56,990 INFO mapreduce.Job: Job job_1506498574236_0003 completed successfully
2017-09-27 13:46:57,353 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=12215
        FILE: Number of bytes written=405925
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=16392
        HDFS: Number of bytes written=9392
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=47810
        Total time spent by all reduces in occupied slots (ms)=52560
        Total time spent by all map tasks (ms)=23905
        Total time spent by all reduce tasks (ms)=17520
        Total vcore-milliseconds taken by all map tasks=23905
        Total vcore-milliseconds taken by all reduce tasks=17520
        Total megabyte-milliseconds taken by all map tasks=48957440
        Total megabyte-milliseconds taken by all reduce tasks=53821440
    Map-Reduce Framework
        Map input records=365
        Map output records=1701
        Map output bytes=22630
        Map output materialized bytes=12215
        Input split bytes=127
        Combine input records=1701
        Combine output records=708
        Reduce input groups=708
        Reduce shuffle bytes=12215
        Reduce input records=708
        Reduce output records=708
        Spilled Records=1416
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=264
        CPU time spent (ms)=3620
        Physical memory (bytes) snapshot=778432512
        Virtual memory (bytes) snapshot=7249113088
        Total committed heap usage (bytes)=593760256
        Peak Map Physical memory (bytes)=641290240
        Peak Map Virtual memory (bytes)=2796630016
        Peak Reduce Physical memory (bytes)=137142272
        Peak Reduce Virtual memory (bytes)=4452483072
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=16265
    File Output Format Counters
        Bytes Written=9392
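
With a single reducer (Launched reduce tasks=1 above), all 708 output records land in one file. Assuming the standard part-r-00000 naming, the first few word counts can be inspected with:

hdfs dfs -cat /gauravb/outputData/wordCountOutput01/part-r-00000 | head -20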





MAPRED


1. version

hadoopuser@ub16043lts00:~$ mapred version
 

Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /usr/local/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar




2. classpath

hadoopuser@ub16043lts00:~$ mapred classpath

/usr/local/hadoop-3.0.0/etc/hadoop:/usr/local/hadoop-3.0.0/share/hadoop/common/lib/*:/usr/local/hadoop-3.0.0/share/hadoop/common/*:/usr/local/hadoop-3.0.0/share/hadoop/hdfs:/usr/local/hadoop-3.0.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.0.0/share/hadoop/hdfs/*:/usr/local/hadoop-3.0.0/share/hadoop/mapreduce/*:/usr/local/hadoop-3.0.0/share/hadoop/yarn:/usr/local/hadoop-3.0.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.0.0/share/hadoop/yarn/*




3. envvars

hadoopuser@ub16043lts00:~$ mapred envvars

JAVA_HOME='/usr/lib/jvm/java-8-openjdk-amd64/jre/'
HADOOP_MAPRED_HOME='/usr/local/hadoop-3.0.0'
MAPRED_DIR='share/hadoop/mapreduce'
MAPRED_LIB_JARS_DIR='share/hadoop/mapreduce/lib'
HADOOP_CONF_DIR='/usr/local/hadoop-3.0.0/etc/hadoop'
HADOOP_TOOLS_HOME='/usr/local/hadoop-3.0.0'
HADOOP_TOOLS_DIR='share/hadoop/tools'
HADOOP_TOOLS_LIB_JARS_DIR='share/hadoop/tools/lib'



4. historyserver

hadoopuser@ub16043lts00:~$ mapred historyserver

STARTUP_MSG: Starting JobHistoryServer
STARTUP_MSG:   host = ub16043lts00/10.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.0.0
STARTUP_MSG:   classpath = /usr/local/hadoop-3.0.0/etc/hadoop:/usr/local/hadoop-3.0.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop-3.0.0/share/hadoop/common/lib/jackson-core-2.7.8.jar:/usr/local/hadoop-3.0.0/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/usr/local/hadoop-3.0.0/share/hadoop/common/lib/jetty-servlet-9.3.19.v20170502.jar:/usr/local/hadoop-3.0.0/share/hadoop/common/lib/accessors-smart-1.2.jar
:/usr/local/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.0.0.jar:/usr/local/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.0.0.jar:/usr/local/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-timelineservice-hbase-3.0.0.jar
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533; compiled by 'andrew' on 2017-12-08T19:16Z
STARTUP_MSG:   java = 1.8.0_151
************************************************************/
2018-02-18 21:13:00,901 INFO hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT]
2018-02-18 21:13:02,855 INFO beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
2018-02-18 21:13:02,954 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-02-18 21:13:03,207 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-02-18 21:13:03,208 INFO impl.MetricsSystemImpl: JobHistoryServer metrics system started
2018-02-18 21:13:03,236 INFO hs.JobHistory: JobHistory Init
2018-02-18 21:13:05,189 INFO jobhistory.JobHistoryUtils: Default file system [hdfs://ub16043lts00:9820]
2018-02-18 21:13:05,946 INFO hs.HistoryFileManager: Perms after creating 504, Expected: 504
2018-02-18 21:13:05,968 INFO jobhistory.JobHistoryUtils: Default file system [hdfs://ub16043lts00:9820]
2018-02-18 21:13:06,017 INFO hs.HistoryFileManager: Initializing Existing Jobs...
2018-02-18 21:13:06,047 INFO hs.HistoryFileManager: Found 0 directories to load
2018-02-18 21:13:06,047 INFO hs.HistoryFileManager: Existing job initialization finished. 0.0% of cache is occupied.
2018-02-18 21:13:06,051 INFO hs.CachedHistoryStorage: CachedHistoryStorage Init
2018-02-18 21:13:06,192 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-02-18 21:13:06,231 INFO ipc.Server: Starting Socket Reader #1 for port 10033
2018-02-18 21:13:06,653 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-02-18 21:13:06,656 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2018-02-18 21:13:06,656 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2018-02-18 21:13:06,877 INFO util.log: Logging initialized @6980ms
2018-02-18 21:13:07,127 INFO server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2018-02-18 21:13:07,132 INFO http.HttpRequestLog: Http request log for http.requests.jobhistory is not defined
2018-02-18 21:13:07,172 INFO http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2018-02-18 21:13:07,181 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context jobhistory
2018-02-18 21:13:07,181 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2018-02-18 21:13:07,181 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2018-02-18 21:13:07,202 INFO http.HttpServer2: adding path spec: /jobhistory/*
2018-02-18 21:13:07,202 INFO http.HttpServer2: adding path spec: /ws/*
2018-02-18 21:13:08,306 INFO webapp.WebApps: Registered webapp guice modules
2018-02-18 21:13:08,307 INFO http.HttpServer2: Jetty bound to port 19888
2018-02-18 21:13:08,309 INFO server.Server: jetty-9.3.19.v20170502
2018-02-18 21:13:08,488 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@780ec4a5{/logs,file:///usr/local/hadoop-3.0.0/logs/,AVAILABLE}
2018-02-18 21:13:08,489 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@5aabbb29{/static,jar:file:/usr/local/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-common-3.0.0.jar!/webapps/static,AVAILABLE}
Feb 18, 2018 9:13:08 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices as a root resource class
Feb 18, 2018 9:13:08 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.hs.webapp.JAXBContextResolver as a provider class
Feb 18, 2018 9:13:08 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Feb 18, 2018 9:13:08 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
Feb 18, 2018 9:13:09 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.hs.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Feb 18, 2018 9:13:09 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Feb 18, 2018 9:13:10 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
2018-02-18 21:13:10,714 INFO handler.ContextHandler: Started o.e.j.w.WebAppContext@4f0f7849{/,file:///tmp/jetty-ub16043lts00-19888-jobhistory-_-any-4369259852477421612.dir/webapp/,AVAILABLE}{/jobhistory}
2018-02-18 21:13:10,724 INFO server.AbstractConnector: Started ServerConnector@716a412{HTTP/1.1,[http/1.1]}{ub16043lts00:19888}
2018-02-18 21:13:10,725 INFO server.Server: Started @10830ms
2018-02-18 21:13:10,725 INFO webapp.WebApps: Web app jobhistory started at 19888
2018-02-18 21:13:10,773 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 1000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2018-02-18 21:13:10,796 INFO ipc.Server: Starting Socket Reader #1 for port 10020
2018-02-18 21:13:10,867 INFO pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.mapreduce.v2.api.HSClientProtocolPB to the server
2018-02-18 21:13:10,867 INFO ipc.Server: IPC Server Responder: starting
2018-02-18 21:13:10,870 INFO ipc.Server: IPC Server listener on 10020: starting
2018-02-18 21:13:10,872 INFO hs.HistoryClientService: Instantiated HistoryClientService at ub16043lts00/10.0.1.1:10020
2018-02-18 21:13:10,873 INFO ipc.Server: IPC Server Responder: starting
2018-02-18 21:13:10,873 INFO ipc.Server: IPC Server listener on 10033: starting
2018-02-18 21:13:10,882 INFO util.JvmPauseMonitor: Starting JVM pause monitor
2018-02-18 21:13:36,659 INFO hs.JobHistory: History Cleaner started
2018-02-18 21:13:36,666 INFO hs.JobHistory: History Cleaner complete
2018-02-18 21:13:51,847 INFO webapp.View: Getting list of all Jobs.
2018-02-18 21:13:52,704 INFO jobhistory.JobSummary: jobId=job_1518966178519_0001,submitTime=1518966449827,launchTime=1518966473634,firstMapTaskLaunchTime=1518966476862,firstReduceTaskLaunchTime=1518966687322,finishTime=1518966722432,resourcesPerMap=1536,resourcesPerReduce=3072,numMaps=1,numReduces=1,succededMaps=1,succeededReduces=1,failedMaps=0,failedReduces=0,killedMaps=0,killedReduces=0,user=hadoopuser,queue=default,status=SUCCEEDED,mapSlotSeconds=278,reduceSlotSeconds=101,jobName=word count
2018-02-18 21:13:52,705 INFO hs.HistoryFileManager: Deleting JobSummary file: [hdfs://ub16043lts00:9820/mr-history/tmp/hadoopuser/job_1518966178519_0001.summary]
2018-02-18 21:13:52,798 INFO hs.HistoryFileManager: Perms after creating 504, Expected: 504
2018-02-18 21:13:52,799 INFO hs.HistoryFileManager: Moving hdfs://ub16043lts00:9820/mr-history/tmp/hadoopuser/job_1518966178519_0001-1518966449827-hadoopuser-word+count-1518966722432-1-1-SUCCEEDED-default-1518966473634.jhist to hdfs://ub16043lts00:9820/mr-history/done/2018/02/18/000000/job_1518966178519_0001-1518966449827-hadoopuser-word+count-1518966722432-1-1-SUCCEEDED-default-1518966473634.jhist
2018-02-18 21:13:53,144 INFO hs.HistoryFileManager: Moving hdfs://ub16043lts00:9820/mr-history/tmp/hadoopuser/job_1518966178519_0001_conf.xml to hdfs://ub16043lts00:9820/mr-history/done/2018/02/18/000000/job_1518966178519_0001_conf.xml
2018-02-18 21:14:35,630 INFO hs.CompletedJob: Loading job: job_1518966178519_0001 from file: hdfs://ub16043lts00:9820/mr-history/done/2018/02/18/000000/job_1518966178519_0001-1518966449827-hadoopuser-word+count-1518966722432-1-1-SUCCEEDED-default-1518966473634.jhist
2018-02-18 21:14:35,630 INFO hs.CompletedJob: Loading history file: [hdfs://ub16043lts00:9820/mr-history/done/2018/02/18/000000/job_1518966178519_0001-1518966449827-hadoopuser-word+count-1518966722432-1-1-SUCCEEDED-default-1518966473634.jhist]


JobHistory 
(http://machine_hostname:port) -- Default port is 19888
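
Besides the web UI, the JobHistoryServer exposes a REST API under /ws/ (note the "adding path spec: /ws/*" line in the startup log above). For example, finished jobs can be listed as JSON with:

curl http://ub16043lts00:19888/ws/v1/history/mapreduce/jobs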

MapReduce Job Output

HDFS

1. jmxget

hadoopuser@ub16043lts00:~$ hdfs jmxget

init: server=localhost;port=;service=NameNode;localVMUrl=null

Domains:
    Domain = JMImplementation
    Domain = com.sun.management
    Domain = java.lang
    Domain = java.nio
    Domain = java.util.logging

MBeanServer default domain = DefaultDomain

MBean count = 22

Query MBeanServer MBeans:
List of all the available keys:
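
Run without arguments, jmxget queries the NameNode's MBeans with default connection settings (see the init line above). It also accepts -server, -port and -service options matching that init line; for example, to query a NameNode whose JMX remote port has been enabled, assuming a hypothetical port 8004 configured via com.sun.management.jmxremote.port:

hdfs jmxget -server localhost -port 8004 -service NameNode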




2. getconf -namenodes

hadoopuser@ub16043lts00:~$ hdfs getconf -namenodes
 

ub16043lts00



3. getconf -secondaryNameNodes

hadoopuser@ub16043lts00:~$ hdfs getconf -secondaryNameNodes
 

0.0.0.0



4. getconf -backupNodes

hadoopuser@ub16043lts00:~$ hdfs getconf -backupNodes
 

0.0.0.0



5. getconf -nnRpcAddresses

hadoopuser@ub16043lts00:~$ hdfs getconf -nnRpcAddresses
 

ub16043lts00:9820
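
getconf can also print any single configuration value with -confKey; for example, to read the default block replication:

hdfs getconf -confKey dfs.replication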



6. groups

hadoopuser@ub16043lts00:~$ hdfs groups
 

hadoopuser : hadoopgroup sudo



7. balancer
Run a cluster balancing utility

hadoopuser@ub16043lts00:~$ hdfs balancer
 

2017-08-31 14:08:10,602 INFO balancer.Balancer: namenodes  = [hdfs://10.0.1.1]
2017-08-31 14:08:10,623 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
2017-08-31 14:08:10,623 INFO balancer.Balancer: included nodes = []
2017-08-31 14:08:10,623 INFO balancer.Balancer: excluded nodes = []
2017-08-31 14:08:10,623 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
2017-08-31 14:08:17,754 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
2017-08-31 14:08:17,754 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
2017-08-31 14:08:17,754 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
2017-08-31 14:08:17,754 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
2017-08-31 14:08:17,754 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
2017-08-31 14:08:17,754 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
2017-08-31 14:08:17,829 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
2017-08-31 14:08:17,829 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
2017-08-31 14:08:17,910 INFO net.NetworkTopology: Adding a new node: /default-rack/10.0.1.2:9866
2017-08-31 14:08:17,911 INFO net.NetworkTopology: Adding a new node: /default-rack/10.0.1.1:9866
2017-08-31 14:08:17,915 INFO balancer.Balancer: 0 over-utilized: []
2017-08-31 14:08:17,915 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
31 Aug, 2017 2:08:17 PM           0                  0 B                 0 B                0 B
31 Aug, 2017 2:08:18 PM  Balancing took 8.444 seconds
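
By default the balancer considers the cluster balanced once every DataNode's utilization is within 10% of the cluster average (threshold = 10.0 in the parameters above). A tighter rebalance can be requested by lowering the threshold:

hdfs balancer -threshold 5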



8. dfsadmin -printTopology

hadoopuser@ub16043lts00:~$ hdfs dfsadmin -printTopology
 

Rack: /default-rack
   127.0.0.1:50010 (localhost)


9. dfsadmin -report

hadoopuser@ub16043lts00:~$ hdfs dfsadmin -report
 

Configured Capacity: 142186881024 (132.42 GB)
Present Capacity: 123893272576 (115.38 GB)
DFS Remaining: 123893207040 (115.38 GB)
DFS Used: 65536 (64 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.0.1.1:9866 (ub16043lts00)
Hostname: ub16043lts00
Decommission Status : Normal
Configured Capacity: 95145664512 (88.61 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 5825781760 (5.43 GB)
DFS Remaining: 84463058944 (78.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Aug 31 14:09:22 IST 2017
Last Block Report: Thu Aug 31 13:52:52 IST 2017


Name: 10.0.1.2:9866 (ub16043lts01)
Hostname: ub16043lts01
Decommission Status : Normal
Configured Capacity: 47041216512 (43.81 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 5222047744 (4.86 GB)
DFS Remaining: 39430148096 (36.72 GB)
DFS Used%: 0.00%
DFS Remaining%: 83.82%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Aug 31 14:09:23 IST 2017
Last Block Report: Thu Aug 31 13:52:52 IST 2017



10. dfs -mkdir
 
hadoopuser@ub16043lts00:~$ hdfs dfs -mkdir /user_gauravb
hadoopuser@ub16043lts00:~$ hdfs dfs -mkdir /user_hadoop3x
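
Like its Unix counterpart, -mkdir accepts -p to create parent directories as needed; the nested path below is purely illustrative:

hdfs dfs -mkdir -p /user_gauravb/data/2018/02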



11. dfs -ls

List the contents of the root directory in HDFS

hadoopuser@ub16043lts00:~$ hdfs dfs -ls / 

Found 4 items
drwxr-xr-x   - hadoopuser supergroup          0 2017-08-31 14:08 /system
drwxr-xr-x   - hadoopuser supergroup          0 2017-08-31 14:32 /user
drwxr-xr-x   - hadoopuser supergroup          0 2017-08-31 14:32 /user_gauravb
drwxr-xr-x   - hadoopuser supergroup          0 2017-08-31 14:32 /user_hadoop3x



12. dfs -ls -R

Behaves like -ls, but recursively displays entries in all subdirectories of path.

hadoopuser@ub16043lts00:~$ hdfs dfs -ls -R / 

Found 20 items
drwxr-xr-x   - hadoopuser    supergroup          0 2014-05-08 13:00 /user/hdfs
-rw-r--r--   1 hadoopuser    supergroup          0 2014-05-02 11:17 /user/hdfs/_SUCCESS
-rw-r--r--  10 hadoopuser    supergroup          0 2014-05-02 11:14 /user/hdfs/_partition.lst
-rw-r--r--   1 hadoopuser    supergroup  100000000 2014-05-02 11:17 /user/hdfs/part-r-00000
drwxr-xr-x   - hadoopuser    supergroup          0 2014-04-17 17:03 /user/hdfs/terasort-input
drwxr-xr-x   - hadoopuser    supergroup          0 2014-04-22 15:51 /user/hdfs/terasort-input01
-rw-r--r--   1 hadoopuser    supergroup          0 2014-04-22 15:51 /user/hdfs/terasort-input01/_SUCCESS
-rw-r--r--   1 hadoopuser    supergroup      50000 2014-04-22 15:50 /user/hdfs/terasort-input01/part-m-00000
-rw-r--r--   1 hadoopuser    supergroup      50000 2014-04-22 15:50 /user/hdfs/terasort-input01/part-m-00001
drwxr-xr-x   - hadoopuser    supergroup          0 2014-05-02 11:08 /user/hdfs/terasort-input02
-rw-r--r--   1 hadoopuser    supergroup          0 2014-05-02 11:08 /user/hdfs/terasort-input02/_SUCCESS
-rw-r--r--   1 hadoopuser    supergroup   50000000 2014-05-02 11:07 /user/hdfs/terasort-input02/part-m-00000
-rw-r--r--   1 hadoopuser    supergroup   50000000 2014-05-02 11:07 /user/hdfs/terasort-input02/part-m-00001
drwxr-xr-x   - hadoopuser    supergroup          0 2014-05-02 11:28 /user/hdfs/terasort-output02
-rw-r--r--   1 hadoopuser    supergroup          0 2014-05-02 11:28 /user/hdfs/terasort-output02/_SUCCESS
-rw-r--r--  10 hadoopuser    supergroup          0 2014-05-02 11:25 /user/hdfs/terasort-output02/_partition.lst
-rw-r--r--   1 hadoopuser    supergroup  100000000 2014-05-02 11:28 /user/hdfs/terasort-output02/part-r-00000
drwxr-xr-x   - hadoopuser    supergroup          0 2014-05-02 11:30 /user/hdfs/teravalidate-output02
-rw-r--r--   1 hadoopuser    supergroup          0 2014-05-02 11:30 /user/hdfs/teravalidate-output02/_SUCCESS
-rw-r--r--   1 hadoopuser    supergroup         23 2014-05-02 11:30 /user/hdfs/teravalidate-output02/part-r-00000


13. dfs -copyFromLocal

copyFromLocal is similar to the put command, except that the source is restricted to a local file reference. In other words, everything copyFromLocal can do, put can do as well, but not vice versa.

hadoopuser@ub16043lts00:~$ hdfs dfs -copyFromLocal /home/hadoopuser/Documents/hadoop_installation /user_gauravb 


14. dfs -put

hadoopuser@ub16043lts00:~$ hdfs dfs -put /home/hadoopuser/Documents/hadoop_installation /user_gauravb

hadoopuser@ub16043lts00:~$ hdfs dfs -put /home/gb/pagecounts-20081001-000000.gz /user/HiveInputData

hadoopuser@ub16043lts00:~$ hdfs dfs -put /home/gb/hadoop/tables.ddl /user/HiveInputData

Note: files with different extensions, such as .gz and .ddl, are added the same way.


Difference between "copyFromLocal" and "put" 

Suppose HDFS contains the path /tmp/files/file_name.txt and your local disk contains the same path. The HDFS API then cannot know which one you mean unless you specify a scheme such as file:// or hdfs://. put accepts a general URI as its source, whereas copyFromLocal always resolves the source against the local filesystem.
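
To make the intent explicit in such a case, spell out both schemes; the example below uses this cluster's NameNode address (ub16043lts00:9820, as reported by getconf -nnRpcAddresses above):

hdfs dfs -put file:///tmp/files/file_name.txt hdfs://ub16043lts00:9820/tmp/files/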


15. chown

hadoopuser@ub16043lts00:~$ hdfs dfs -chown root:supergroup /user/HDFSInputData


16. chmod

hadoopuser@ub16043lts00:~$ hdfs dfs -chmod -R 777 /user/HDFSInputData


17. fsck

hadoopuser@ub16043lts00:~$ hdfs fsck /user/hive/warehouse/

....
....
/user/hive/warehouse/movielens.db/users/users.txt:  Under replicated BP-1678988347-10.0.1.2-1524751491718:blk_1073741944_1120. Target Replicas is 2 but found 1 replica(s).
Status: HEALTHY
 Total size:    41460685 B
 Total dirs:    12
 Total files:    9
 Total symlinks:        0
 Total blocks (validated):    9 (avg. block size 4606742 B)
 Minimally replicated blocks:    9 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    9 (100.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    2
 Average block replication:    1.0
 Corrupt blocks:        0
 Missing replicas:        9 (50.0 %)
 Number of data-nodes:        1
 Number of racks:        1
FSCK ended at Tue May 08 13:06:03 IST 2018 in 4 milliseconds


The filesystem under path '/user/hive/warehouse' is HEALTHY
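
All nine blocks are reported under-replicated here because the target replication factor is 2 while only one DataNode is available. On such a single-DataNode setup, one way to clear the warnings is to lower the replication factor of the existing files to match (see setrep below):

hdfs dfs -setrep -R 1 /user/hive/warehouse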
 


18. fsck

hadoopuser@ub16043lts00:~$ hdfs fsck /user/hive/warehouse/ -locations -blocks -files

Connecting to namenode via http://localhost:50070/fsck?ugi=bigdatauser&locations=1&blocks=1&files=1&path=%2Fuser%2Fhive%2Fwarehouse
FSCK started by bigdatauser (auth:SIMPLE) from /127.0.0.1 for path /user/hive/warehouse at Tue May 08 13:09:09 IST 2018
/user/hive/warehouse <dir>
/user/hive/warehouse/movielens.db <dir>
/user/hive/warehouse/movielens.db/movies <dir>
/user/hive/warehouse/movielens.db/movies/movies.txt 163542 bytes, 1 block(s):  Under replicated BP-1678988347-10.0.1.2-1524751491718:blk_1073741946_1122. Target Replicas is 2 but found 1 replica(s).
0. BP-1678988347-10.0.1.2-1524751491718:blk_1073741946_1122 len=163542 repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-0842bbd5-3c1f-4ad2-a792-afc01460a63d,DISK]]

/user/hive/warehouse/movielens.db/occupations <dir>
/user/hive/warehouse/movielens.db/occupations/occupations.txt 345 bytes, 1 block(s):  Under replicated BP-1678988347-10.0.1.2-1524751491718:blk_1073741945_1121. Target Replicas is 2 but found 1 replica(s).
0. BP-1678988347-10.0.1.2-1524751491718:blk_1073741945_1121 len=345 repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-0842bbd5-3c1f-4ad2-a792-afc01460a63d,DISK]]


/user/hive/warehouse/movielens.db/users <dir>
/user/hive/warehouse/movielens.db/users/users.txt 110208 bytes, 1 block(s):  Under replicated BP-1678988347-10.0.1.2-1524751491718:blk_1073741944_1120. Target Replicas is 2 but found 1 replica(s).
0. BP-1678988347-10.0.1.2-1524751491718:blk_1073741944_1120 len=110208 repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-0842bbd5-3c1f-4ad2-a792-afc01460a63d,DISK]]

Status: HEALTHY
 Total size:    41460685 B
 Total dirs:    12
 Total files:    9
 Total symlinks:        0
 Total blocks (validated):    9 (avg. block size 4606742 B)
 Minimally replicated blocks:    9 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    9 (100.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    2
 Average block replication:    1.0
 Corrupt blocks:        0
 Missing replicas:        9 (50.0 %)
 Number of data-nodes:        1
 Number of racks:        1
FSCK ended at Tue May 08 13:09:09 IST 2018 in 41 milliseconds


The filesystem under path '/user/hive/warehouse' is HEALTHY




19. fsck

hadoopuser@ub16043lts00:~$ hdfs fsck -list-corruptfileblocks

Connecting to namenode via http://localhost:50070/fsck?ugi=bigdatauser&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files

 

 
20. setrep [-w (wait for replication to complete), -R (recursive)]

hadoopuser@ub16043lts00:~$ hdfs dfs -setrep 4 /user/hive/warehouse/movielens.db/users/users.txt

Replication 4 set: /user/hive/warehouse/movielens.db/users/users.txt

hadoopuser@ub16043lts00:~$ hdfs dfs -setrep -w 4 /user/hive/warehouse/movielens.db/users/users.txt

hadoopuser@ub16043lts00:~$ hdfs dfs -setrep -R 4 /user/hive/warehouse/movielens.db/users/users.txt
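
The effective replication factor can be verified afterwards with -stat, whose %r format prints the replication:

hdfs dfs -stat "%r" /user/hive/warehouse/movielens.db/users/users.txt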


21. du

hadoopuser@ub16043lts00:~$ hdfs dfs -du -h /

1.3 K    /README.txt
3.9 M    /cleaned_user_train.csv
1.3 K    /g
1.3 K    /gauravb
1.3 K    /gauravb1
3.3 M    /mr-history
575      /pig_data
223      /pig_output
223.7 M  /spark-jars
181.9 M  /tmp
39.5 M   /user


22. df

hadoopuser@ub16043lts00:~$ hdfs dfs -df -h /

Filesystem               Size     Used  Available  Use%
hdfs://localhost:8020  43.8 G  457.5 M      3.4 G    1%


23. storagepolicies

hadoopuser@ub16043lts00:~$ hdfs storagepolicies -listPolicies

Block Storage Policies:
    BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
    BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
    BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
    BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
    BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
    BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
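
Policies are assigned and inspected per path with -setStoragePolicy and -getStoragePolicy. The example below uses the /spark-jars directory from the du listing purely as an illustration:

hdfs storagepolicies -setStoragePolicy -path /spark-jars -policy ONE_SSD
hdfs storagepolicies -getStoragePolicy -path /spark-jars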


24. fetchImage

hadoopuser@ub16043lts00:~$ hdfs dfsadmin -fetchImage /home/bigdatauser/hadoop_image_08_may_2018

18/05/08 13:32:05 INFO namenode.TransferFsImage: Opening connection to http://localhost:50070/imagetransfer?getimage=1&txid=latest
18/05/08 13:32:05 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
18/05/08 13:32:05 INFO namenode.TransferFsImage: Transfer took 0.14s at 240.88 KB/s
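
The fetched fsimage is a binary file; it can be decoded with the Offline Image Viewer (hdfs oiv), for example into XML:

hdfs oiv -p XML -i /home/bigdatauser/hadoop_image_08_may_2018 -o /home/bigdatauser/fsimage.xml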







YARN


1. version

hadoopuser@ub16043lts00:~$ yarn version
 

Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /usr/local/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar

 


2. node -list

hadoopuser@ub16043lts00:~$ yarn node -list
 

2017-08-31 14:14:12,233 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Total Nodes:1
         Node-Id         Node-State    Node-Http-Address    Number-of-Running-Containers
ub16043lts00:35581            RUNNING    ub16043lts00:8042                               0
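
By default only RUNNING nodes are listed; -all shows nodes in every state, and -status prints the details of a single node (using the Node-Id from the listing above):

yarn node -list -all
yarn node -status ub16043lts00:35581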

 


3. classpath

hadoopuser@ub16043lts00:~$ yarn classpath
 

/usr/local/hadoop-3.0.0/etc/hadoop:/usr/local/hadoop-3.0.0/share/hadoop/common/lib/*:/usr/local/hadoop-3.0.0/share/hadoop/common/*:/usr/local/hadoop-3.0.0/share/hadoop/hdfs:/usr/local/hadoop-3.0.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.0.0/share/hadoop/hdfs/*:/usr/local/hadoop-3.0.0/share/hadoop/mapreduce/*:/usr/local/hadoop-3.0.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.0.0/share/hadoop/yarn/*



4.