2013-09-29 11:40:09,693 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode datanode1/192.168.3.11:8020 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000msec Initial delay: 0msec; heartBeatInterval=3000
2013-09-29 11:40:09,731 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block pool BP-1224883789-192.168.3.10-1378444984820 (storage id DS-621562718-192.168.3.15-50010-1378445052913) service to datanode1/192.168.3.11:8020 trying to claim ACTIVE state with txid=50455
2013-09-29 11:40:09,731 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-1224883789-192.168.3.10-1378444984820 (storage id DS-621562718-192.168.3.15-50010-1378445052913) service to datanode1/192.168.3.11:8020
2013-09-29 11:40:09,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 707 blocks took 3 msec to generate and 36 msecs for RPC and NN processing
2013-09-29 11:40:09,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:null
2013-09-29 11:40:09,772 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 707 blocks took 2 msec to generate and 39 msecs for RPC and NN processing
2013-09-29 11:40:09,772 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@71b493c6
2013-09-29 11:40:09,774 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-1224883789-192.168.3.10-1378444984820.
2013-09-29 11:40:09,784 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-1224883789-192.168.3.10-1378444984820 to blockPoolScannerMap, new size=1
2013-09-29 11:40:09,002 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: bdpha
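If the steps above do not restore the service within 30 minutes, start the cluster active/standby switchover described in the emergency plan. For the detailed procedure, see the HDQS-AM-004 Historical Data Query System Emergency Handling Manual.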
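2.4.5 JournalNode service failure alert
Symptom: syslog alert
Confirmation: log in to the failed node and check the service status with jps. If the service marked in red does not appear in the output, the service has failed.
Resolution:
Step 1: Log in to the failed node, change to the JournalNode log directory, and review the log. Commands:
cd /opt/hadoop/logs // change to the log directory
tail -200f hadoop-hadoop-journalnode-TY101-M01.log // show the last 200 log lines and keep following
Step 2: Restart the JournalNode service. Commands:
/opt/hadoop/sbin/hadoop-daemon.sh stop journalnode // stop the JournalNode service
/opt/hadoop/sbin/hadoop-daemon.sh start journalnode // start the JournalNode service
Then review the startup log:
tail -200f hadoop-hadoop-journalnode-TY101-M01.log // show the last 200 log lines and keep following
Step 3: Check the service status with jps.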
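The confirmation and restart steps for a daemon can be combined into a small check. The following is a minimal sketch, not part of the original procedure: service_in_jps is a hypothetical helper name, and it assumes jps prints one "<pid> <ClassName>" pair per line.

```shell
#!/usr/bin/env bash
# Sketch only: service_in_jps is a hypothetical helper, not a Hadoop tool.

# Reads jps output on stdin ("<pid> <ClassName>" per line) and succeeds
# only if the second field of some line equals the given class name.
service_in_jps() {
  local name="$1"
  awk -v n="$name" '$2 == n { found = 1 } END { exit found ? 0 : 1 }'
}

# Possible use on the failed node (paths taken from the steps above):
#   if ! jps | service_in_jps JournalNode; then
#     /opt/hadoop/sbin/hadoop-daemon.sh start journalnode
#   fi
```

Matching the whole second field avoids false positives from class names that merely contain the service name.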
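2.4.6 JobTracker service failure
Symptom: syslog alert accompanied by a business interruption. Syslog alert log entry (its Chinese fields read "Sending", "check JobTracker service status", and "JobTracker service failure on TY101-M01"):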
[2013-10-20 10:55:02 INFO ] [com.cms.web.syslog.SyslogUtil:100] - 发送:
CEB-HDQS|+|CEB-HDQS|+|1001|+|1001|+|NA|+|TY101-M01|+|检测JobTracker服务状态|+|TY101-M01|+|dead|+|APP|+|HDQS|+|JobTracker|+|1|+|TY101-M01上JobTracker服务故障|+|1382237640|+|xiaoxu|+|13810466464
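The alert payload is a single line of fields separated by the literal string |+|. The following is a minimal sketch for pulling one field out by position. alert_field is a hypothetical helper name. The source does not document what each position means, so reading position 6 as the host, 9 as the state, and 12 as the service is an assumption based only on the sample above.

```shell
#!/usr/bin/env bash
# Sketch only: alert_field is a hypothetical helper name.

# Prints the field at the given 1-based position of a |+|-separated payload.
# The bracket expressions match the literal characters | + | in sequence.
alert_field() {
  local payload="$1" index="$2"
  printf '%s\n' "$payload" | awk -F'[|][+][|]' -v i="$index" '{ print $i }'
}

# Possible use (field positions are assumptions, see the note above):
#   alert_field "$payload" 6    # host in the sample, TY101-M01
#   alert_field "$payload" 9    # state in the sample, dead
#   alert_field "$payload" 12   # service in the sample, JobTracker
```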
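Confirmation: log in to the failed node and check the service status with jps; the service marked in red does not appear in the output. The BDP monitoring platform shows the same information under Home > Management Console > Cluster Monitoring > Cluster Service Monitoring.
Resolution:
Step 1: Review the JobTracker service log.
cd /opt/hadoop-mr1/logs // change to the log directory
tail -200f hadoop-hadoop-jobtracker-TY101-M01.log // view the log
Step 2: Start the JobTracker service. Command:
/opt/hadoop-mr1/bin/hadoop-daemon.sh start jobtracker // start the JobTracker service
Then review the startup log:
tail -200f hadoop-hadoop-jobtracker-TY101-M01.log // view the log
Alternatively, restart the whole MR service. First stop it:
/opt/hadoop-mr1/bin/stop-mapred.sh
Then start it:
/opt/hadoop-mr1/bin/start-mapred.sh
Finally, review the log:
tail -200f /opt/hadoop-mr1/logs/hadoop-hadoop-jobtracker-TY101-M01.log
Step 3: If the log shows no errors, check the service status with jps. The log output looks like this: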
2013-09-29 14:34:42,984 INFO org.apache.hadoop.mapred.JobTracker: Recovery done! Recoverd 0 of 0 jobs.
2013-09-29 14:34:42,984 INFO org.apache.hadoop.mapred.JobTracker: Recovery Duration (ms):1
2013-09-29 14:34:42,984 INFO org.apache.hadoop.mapred.JobTracker: Refreshing hosts information
2013-09-29 14:34:42,996 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to
2013-09-29 14:34:42,996 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to
2013-09-29 14:34:42,996 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2013-09-29 14:34:42,996 INFO org.apache.hadoop.mapred.JobTracker: Decommissioning 0 nodes
2013-09-29 14:34:42,997 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-09-29 14:34:42,997 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9001: starting
2013-09-29 14:34:42,999 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
2013-09-29 14:34:43,013 WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_datanode1:localhost/127.0.0.1:34514'; reinitializing the tasktracker
2013-09-29 14:34:43,017 WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_datanode5:localhost/127.0.0.1:50273'; reinitializing the tasktracker
2013-09-29 14:34:43,017 WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_datanode2:localhost/127.0.0.1:47020'; reinitializing the tasktracker
2013-09-29 14:34:43,017 WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_datanode4:localhost/127.0.0.1:58744'; reinitializing the tasktracker
2013-09-29 14:34:43,145 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/datanode1
2013-09-29 14:34:43,147 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_datanode1:localhost/127.0.0.1:57061 to host datanode1
2013-09-29 14:34:43,150 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/datanode5
2013-09-29 14:34:43,150 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_datanode5:localhost/127.0.0.1:42437 to host datanode5
2013-09-29 14:34:43,158 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/datanode4
2013-09-29 14:34:43,158 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_datanode4:localhost/127.0.0.1:33865 to host datanode4
2013-09-29 14:34:43,171 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/datanode2
2013-09-29 14:34:43,171 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_datanode2:localhost/127.0.0.1:34548 to host datanode2
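Reading the JobTracker log by eye can be supplemented with two quick checks: whether the "Starting RUNNING" line appeared, and how many distinct hosts re-registered through "Adding tracker ... to host <host>" lines. The following is a minimal sketch that relies only on the message texts shown in the log above. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and read a JobTracker log on stdin.

# Succeeds if the log contains the "Starting RUNNING" message.
jobtracker_started() {
  grep -q 'JobTracker: Starting RUNNING'
}

# Prints the number of distinct hosts named at the end of
# "Adding tracker ... to host <host>" lines.
count_registered_hosts() {
  awk '/JobTracker: Adding tracker / { hosts[$NF] = 1 }
       END { n = 0; for (h in hosts) n++; print n }'
}

# Possible use (log path taken from the steps above):
#   log=/opt/hadoop-mr1/logs/hadoop-hadoop-jobtracker-TY101-M01.log
#   tail -200 "$log" | jobtracker_started && echo "JobTracker reached RUNNING"
#   tail -200 "$log" | count_registered_hosts
```

The usage lines call tail without -f so that the pipeline terminates instead of following the file.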
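2.4.7 TaskTracker service failure
Symptom: syslog alert
Confirmation: log in to the failed node and check the service status with jps; the service marked in red does not appear in the output.
Resolution:
Step 1: Review the TaskTracker service log.
cd /opt/hadoop-mr1/logs // change to the log directory
tail -200f hadoop-hadoop-tasktracker-TY101-001.log // view the log
Step 2: Start the TaskTracker service. Command:
/opt/hadoop-mr1/bin/hadoop-daemon.sh start tasktracker // start the TaskTracker service
Then review the startup log:
tail -200f hadoop-hadoop-tasktracker-TY101-001.log // view the log
Step 3: If the log shows no errors, check the service status with jps. Log output from a normal TaskTracker startup: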
2013-09-29 14:28:10,534 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-09-29 14:28:10,550 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as demo
2013-09-29 14:28:10,550 WARN org.apache.hadoop.conf.Configuration: slave.host.name is deprecated. Instead, use dfs.datanode.hostname
2013-09-29 14:28:10,551 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /hadoop/tmp/hadoop-demo/mapred/local
2013-09-29 14:28:10,581 WARN org.apache.hadoop.conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
2013-09-29 14:28:10,581 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
2013-09-29 14:28:10,628 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 50273
2013-09-29 14:28:10,652 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-09-29 14:28:10,652 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50273: starting
2013-09-29 14:28:10,654 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:50273
2013-09-29 14:28:10,654 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_datanode5:localhost/127.0.0.1:50273
2013-09-29 14:28:10,672 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_datanode5:localhost/127.0.0.1:50273
2013-09-29 14:28:10,678 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-09-29 14:28:10,681 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4cc5aa00
2013-09-29 14:28:10,682 WARN org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks is -1 and reserved physical memory is not configured. TaskMemoryManager is disabled.
2013-09-29 14:28:10,683 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2013-09-29 14:28:10,690 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
2013-09-29 14:28:10,690 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2013-09-29 14:28:10,875 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50060
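Step 3 depends on judging whether the log contains errors. One way to make that check repeatable is to count entries per log level. The following is a minimal sketch, assuming every entry follows the "<date> <time> <LEVEL> <class>: <message>" layout seen above, so that the level is the third whitespace-separated field. count_level is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: count_level is a hypothetical helper name.

# Reads a Hadoop log on stdin and prints how many entries carry the given
# level in the third field (e.g. INFO, WARN, ERROR).
count_level() {
  awk -v lvl="$1" '$3 == lvl { n++ } END { print n + 0 }'
}

# Possible use (log file name taken from the steps above):
#   cd /opt/hadoop-mr1/logs
#   tail -200 hadoop-hadoop-tasktracker-TY101-001.log | count_level ERROR
#   tail -200 hadoop-hadoop-tasktracker-TY101-001.log | count_level WARN
```

A nonzero ERROR count is a prompt to read the entries themselves; WARN entries such as the deprecation notices above also occur during a normal startup.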