Fujitsu Dual-Node Cluster Software: Failure Recovery Procedure 20140404

2020-06-07 14:51


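While analyzing the Fujitsu dual-node cluster software, tracing showed that syslog was producing no output on either of the XX platform's two servers. The last log entries on each server dated from the time it was last rebooted, and the syslog daemon was not running. The working hypothesis was that the syslog failure was keeping the PCL software from working properly: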

# ps -ef | grep syslog

root 7656 6142 0 11:18:53 pts/4 0:00 grep syslog
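The check above can be scripted so that the `grep` process itself is never mistaken for the daemon. The following is a minimal sketch, assuming `ps -ef` output in the layout shown; `has_syslogd` is a hypothetical helper name, not a Solaris utility.

```shell
#!/bin/sh
# has_syslogd (hypothetical helper): reads `ps -ef` output on stdin and
# succeeds only if some line names a syslogd executable and is not itself
# a grep invocation.
has_syslogd() {
    awk '{
        is_grep = 0; hit = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /(^|\/)grep$/)    is_grep = 1   # the grep process itself
            if ($i ~ /(^|\/)syslogd$/) hit = 1       # e.g. /usr/sbin/syslogd
        }
        if (hit && !is_grep) found = 1
    }
    END { exit(found ? 0 : 1) }'
}
```

Typical use would be `ps -ef | has_syslogd && echo "syslogd running" || echo "syslogd not running"`. Where `pgrep` is available, `pgrep -x syslogd` answers the same question without any text parsing.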

Tracing syslog through the Solaris SMF (Service Management Facility) revealed the problem: syslog had not started because two of the services it depends on were disabled.

# svcs -l svc:/system/system-log:default
fmri         svc:/system/system-log:default
name         system log
enabled      true
state        offline
next_state   none
state_time   Thu Apr 03 16:48:48 2014
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/milestone/sysconfig (online)
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/system/filesystem/autofs (disabled)
dependency   require_all/none svc:/milestone/name-services (disabled)
dependency   require_all/none svc:/system/fjsvmadm-evhandsd (online)
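Picking the non-online dependencies out of `svcs -l` output by eye is error-prone when the list is long. The sketch below filters them, assuming each `dependency` line is laid out as a grouping followed by FMRI/state pairs; `blocked_deps` is a hypothetical helper name.

```shell
#!/bin/sh
# blocked_deps (hypothetical helper): reads `svcs -l` output on stdin and
# prints "<grouping> <fmri> <state>" for every dependency not in state online.
blocked_deps() {
    awk '$1 == "dependency" {
        # Fields from the third onward are taken as pairs: FMRI, then "(state)".
        for (i = 3; i < NF; i += 2) {
            state = $(i + 1)
            gsub(/[()]/, "", state)
            if (state != "online") print $2, $i, state
        }
    }'
}
```

On the host this would be run as `svcs -l svc:/system/system-log:default | blocked_deps`; `svcs -x` gives a similar explanation directly. According to smf(5), a disabled `optional_all` dependency does not by itself hold a service offline, so of the two disabled services the `require_all` one (name-services) is the more likely blocker. That is an observation added here, not a finding from the original session.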

Enable the disabled services:

# svcadm enable svc:/system/filesystem/autofs
# svcadm enable svc:/milestone/name-services

Start the syslog service again:

# svcadm enable svc:/system/system-log:default

Check the processes again; syslogd is now running:

# ps -ef | grep syslog
root 655 1 0 10:50:58 ? 0:01 /usr/sbin/syslogd
root 7656 6142 0 11:18:53 pts/4 0:00 grep syslog
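Since the same steps have to be repeated on the second server, they can be collected into one small script. This is a sketch under stated assumptions, not the procedure as originally run: the `SVCADM` variable is a dry-run switch invented for illustration, and the `-s` option (which asks `svcadm enable` to wait for the instance to come online) was not used in the session above.

```shell
#!/bin/sh
# SVCADM is an assumption for illustration: by default the script only
# prints the commands. Set SVCADM=svcadm on the real host to execute them.
SVCADM=${SVCADM:-"echo svcadm"}

# recover_syslog (hypothetical helper): enable the two disabled
# dependencies first, then the syslog service itself.
recover_syslog() {
    for fmri in svc:/system/filesystem/autofs svc:/milestone/name-services; do
        $SVCADM enable -s "$fmri" || return 1
    done
    $SVCADM enable -s svc:/system/system-log:default
}
```

Calling `recover_syslog` with the default setting lists the three `svcadm enable` commands in order, which makes it easy to review them before running anything as root.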

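The cluster software then started automatically. The same steps were repeated on server 132, where the cluster software also started normally again. The status is as follows: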

# XXX.XXX.XXX.XXX

# XXX.XXX.XXX.XXX

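Because the state of node 2 is not visible from 132, a switchover was attempted, but it failed. This may be because the program was originally started by hand as root or under some other account. Server 132 should be rebooted at a suitable time.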

Log entries showing abnormal PCL behavior were also observed and need follow-up:

main(1): Got SIGALRM

writemsg(2): Logging msg 'Apr 3 16:07:24 hanet: [ID 361421 user.error] WARNING: 87500: standby interface failed. (sha0)' to CONSOLE /dev/sysmsg

writemsg(9): Logging msg 'Apr 3 16:07:24 hanet: [ID 361421 user.error] WARNING: 87500: standby interface failed. (sha0)' to FILE /var/opt/FJSVmadm/evh/evh_pipe

writemsg(3): Logging msg 'Apr 3 16:07:24 hanet: [ID 361421 user.error] WARNING: 87500: standby interface failed. (sha0)' to FILE /var/adm/messages

writemsg(2): Logging msg 'Apr 3 16:07:24 hanet: [ID 960721 user.error] INFO: 88500: standby interface recovered. (sha0)' to CONSOLE /dev/sysmsg

writemsg(9): Logging msg 'Apr 3 16:07:24 hanet: [ID 960721 user.error] INFO: 88500: standby interface recovered. (sha0)' to FILE /var/opt/FJSVmadm/evh/evh_pipe

writemsg(3): Logging msg 'Apr 3 16:07:24 hanet: [ID 960721 user.error] INFO: 88500: standby interface recovered. (sha0)' to FILE /var/adm/messages
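Each hanet message above appears once per destination (console, event pipe, messages file), which makes it hard to see how many distinct events occurred. The sketch below counts lines per message number and interface, assuming the message layout shown in these log lines; `hanet_summary` is a hypothetical helper name.

```shell
#!/bin/sh
# hanet_summary (hypothetical helper): reads log lines on stdin and prints
# "<message-number> <interface> <line-count>" for each hanet message kind.
hanet_summary() {
    # Keep only hanet lines and reduce them to "<number> <interface>",
    # then count identical pairs and sort the result for stable output.
    sed -n 's/.*hanet: \[ID [0-9]* [a-z.]*] [A-Z]*: \([0-9]*\): .*(\([a-z0-9]*\)).*/\1 \2/p' |
        awk '{ n[$0]++ } END { for (k in n) print k, n[k] }' |
        sort
}
```

Run against `/var/adm/messages` (for example `hanet_summary < /var/adm/messages`), this shows whether 87500 (standby interface failed) and 88500 (standby interface recovered) occur in matching numbers for the same interface.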

# ifconfig -a

lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g1: flags=1000863 mtu 1500 index 5
inet XXX.XXX.XXX.XXX netmask ffffff80 broadcast 10.235.156.255
ether 0:21:28:13:65:2b

# /opt/FJSVhanet/usr/sbin/dsphanet
[IPv4,Patrol]

 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+
 sha1       Inactive  d   ON   e1000g1(ON),e1000g0(OFF)
 sha0       Active    p   OFF  sha1(ON)

[IPv6]

 Name       Status   Mode CL  Device
+----------+--------+----+----+------------------------------------------------+
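For repeated checks while following up on the interface warnings, the `dsphanet` table can be reduced to just the virtual interfaces that are not active. This is a minimal sketch, assuming the column layout shown above and virtual interface names of the form `shaN`; `idle_vifs` is a hypothetical helper name.

```shell
#!/bin/sh
# idle_vifs (hypothetical helper): reads `dsphanet` output on stdin and
# prints "<name> <status> <devices>" for each shaN row whose Status column
# is anything other than Active.
idle_vifs() {
    awk '$1 ~ /^sha[0-9]+$/ && $2 != "Active" { print $1, $2, $NF }'
}
```

On the host this would be run as `/opt/FJSVhanet/usr/sbin/dsphanet | idle_vifs`; empty output means every listed virtual interface is active.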

WARNING: 87500: standby interface failed. (sha0)

References:

http://docs.oracle.com/cd/E19424-01/820-4809/log_syslog/index.html

http://unix.ittoolbox.com/groups/technical-functional/solaris-l/how-to-run-the-syslogd-server-on-solaris-10-2351469

http://unix.derkeiler.com/Newsgroups/comp.unix.solaris/2006-04/msg01071.html

http://www.oracle.com/technetwork/articles/servers-storage-admin/intro-smf-basics-s11-1729181.html

https://community.oracle.com/thread/1921656?tstart=0

http://www.fujitsu.com/global/services/computing/server/primequest/documents/pcl-manuals.html

http://software.fujitsu.com/jp/manual/manualfiles/m120009/j2uz7781/03enz201/j7781-f-03-02.html

