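:: The network link repeatedly goes up and down, and messages like the following are logged continuously.
The downtime follows no particular pattern. Reloading the network module after the log appears clears the symptom, but the same symptom returns before long.
A large number of dropped packets also accumulates.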
...........................................................
...........................................................
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] Tx Queue <2>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] TDH, TDT <74>, <8e>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] next_to_use <8e>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] next_to_clean <74>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] time_stamp <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] jiffies <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] Tx Queue <7>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] TDH, TDT <c9>, <e4>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] next_to_use <e4>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] next_to_clean <c9>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] time_stamp <100d400c1>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] jiffies <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] Tx Queue <4>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] TDH, TDT <181>, <190>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] next_to_use <190>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] next_to_clean <181>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] time_stamp <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] jiffies <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] Tx Queue <3>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] TDH, TDT <3b>, <57>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] next_to_use <57>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] next_to_clean <3b>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] time_stamp <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] jiffies <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] Tx Queue <0>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] TDH, TDT <191>, <1a8>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] next_to_use <1a8>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] next_to_clean <191>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] time_stamp <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] jiffies <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564924] ixgbe 0000:2e:00.0 eth2: tx hang 1 detected on queue 7, resetting adapter
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564926] ixgbe 0000:2e:00.0 eth2: tx hang 1 detected on queue 4, resetting adapter
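A note on reading the dump above: the gap between TDH and TDT is roughly the number of descriptors that were queued but not yet completed by the hardware, and jiffies minus time_stamp is how long the oldest unfinished buffer had been waiting. The sketch below works both out for queue 2. The helper names are made up for illustration, ring wrap-around is ignored (TDT is ahead of TDH in every queue shown), and a tick rate of 250 Hz is an assumption about this kernel, not something the log states.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical, not part of ixgbe or ethtool.

# Difference between two hex values given without the 0x prefix: hex_diff FROM TO
# Assumes TO >= FROM (no ring wrap-around).
hex_diff() {
    echo $(( 0x$2 - 0x$1 ))
}

# Convert jiffies to milliseconds: jiffies_to_ms JIFFIES [HZ]
# HZ defaults to 250, which is an assumption; check CONFIG_HZ on the real system.
jiffies_to_ms() {
    echo $(( $1 * 1000 / ${2:-250} ))
}

# Queue 2 from the log: TDH <74>, TDT <8e>, time_stamp <100d400ce>, jiffies <100d405df>
hex_diff 74 8e                          # pending descriptors
stalled=$(hex_diff 100d400ce 100d405df)
echo "$stalled"                         # jiffies since the buffer was queued
jiffies_to_ms "$stalled"                # same interval in ms, assuming HZ=250
```

For queue 2 this gives 26 pending descriptors and 1297 jiffies, which would be about 5.2 seconds if the kernel really runs at 250 Hz.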
Additional notes
This system bonds two network devices and uses an MTU of 9000.
I suspected this configuration first, but the symptom was identical in bonding mode, on a single device, and with MTU 1500.
1. Changing network device options
:: I toggled the options below ON/OFF and checked the result, but the symptom persisted.
tso => tcp-segmentation-offload
gso => generic-segmentation-offload
gro => generic-receive-offload
sg => scatter-gather
ufo => udp-fragmentation-offload (Cannot change)
lro => large-receive-offload (Cannot change)
# ethtool -K eth2 gro off lro off
# ethtool -k eth2 | grep large-receive-offload
large-receive-offload: off
# ethtool -K eth2 gro on lro on
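The ON/OFF trials in this step could be scripted so that each feature is flipped the same way every time. This is a minimal sketch, assuming the short feature names accepted by ethtool -K (tso, gso, gro, sg). The names set_offloads and DRY_RUN are invented here. While DRY_RUN is set, the commands are only printed, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: set_offloads IFACE on|off FEATURE...
# Prints the ethtool commands when DRY_RUN is non-empty, runs them otherwise.
set_offloads() {
    iface=$1
    state=$2
    shift 2
    for feature in "$@"; do
        # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
        ${DRY_RUN:+echo} ethtool -K "$iface" "$feature" "$state"
    done
}

# Preview only; unset DRY_RUN to actually apply the settings.
DRY_RUN=1
set_offloads eth2 off tso gso gro sg
```

Features the driver reports as fixed (ufo and lro in the list above) are left out of the example call on purpose.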
2. Changing the network device driver (4.2.1-k --> 5.6.3)
:: The symptom persisted after the driver change as well.
# ethtool -i eth2
driver: ixgbe
version: 4.2.1-k
firmware-version: 0x2b2c0001
expansion-rom-version:
bus-info: 0000:2e:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
# modinfo ixgbe
filename: /lib/modules/4.4.0-62-generic/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version: 4.2.1-k
license: GPL
description: Intel(R) 10 Gigabit PCI Express Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: F5568BA52A50F97CB589A09
alias: pci:v00008086d000015ACsv*sd*bc*sc*i*
alias: pci:v00008086d000015ADsv*sd*bc*sc*i*
alias: pci:v00008086d000015ABsv*sd*bc*sc*i*
alias: pci:v00008086d000015AAsv*sd*bc*sc*i*
alias: pci:v00008086d00001563sv*sd*bc*sc*i*
alias: pci:v00008086d00001560sv*sd*bc*sc*i*
alias: pci:v00008086d0000154Asv*sd*bc*sc*i*
alias: pci:v00008086d00001557sv*sd*bc*sc*i*
alias: pci:v00008086d00001558sv*sd*bc*sc*i*
alias: pci:v00008086d0000154Fsv*sd*bc*sc*i*
alias: pci:v00008086d0000154Dsv*sd*bc*sc*i*
alias: pci:v00008086d00001528sv*sd*bc*sc*i*
alias: pci:v00008086d000010F8sv*sd*bc*sc*i*
alias: pci:v00008086d0000151Csv*sd*bc*sc*i*
alias: pci:v00008086d00001529sv*sd*bc*sc*i*
alias: pci:v00008086d0000152Asv*sd*bc*sc*i*
alias: pci:v00008086d000010F9sv*sd*bc*sc*i*
alias: pci:v00008086d00001514sv*sd*bc*sc*i*
alias: pci:v00008086d00001507sv*sd*bc*sc*i*
alias: pci:v00008086d000010FBsv*sd*bc*sc*i*
alias: pci:v00008086d00001517sv*sd*bc*sc*i*
alias: pci:v00008086d000010FCsv*sd*bc*sc*i*
alias: pci:v00008086d000010F7sv*sd*bc*sc*i*
alias: pci:v00008086d00001508sv*sd*bc*sc*i*
alias: pci:v00008086d000010DBsv*sd*bc*sc*i*
alias: pci:v00008086d000010F4sv*sd*bc*sc*i*
alias: pci:v00008086d000010E1sv*sd*bc*sc*i*
alias: pci:v00008086d000010F1sv*sd*bc*sc*i*
alias: pci:v00008086d000010ECsv*sd*bc*sc*i*
alias: pci:v00008086d000010DDsv*sd*bc*sc*i*
alias: pci:v00008086d0000150Bsv*sd*bc*sc*i*
alias: pci:v00008086d000010C8sv*sd*bc*sc*i*
alias: pci:v00008086d000010C7sv*sd*bc*sc*i*
alias: pci:v00008086d000010C6sv*sd*bc*sc*i*
alias: pci:v00008086d000010B6sv*sd*bc*sc*i*
depends: mdio,ptp,dca,vxlan
intree: Y
vermagic: 4.4.0-62-generic SMP mod_unload modversions
parm: max_vfs:Maximum number of virtual functions to allocate per physical function - default is zero and maximum value is 63. (Deprecated) (uint)
parm: allow_unsupported_sfp:Allow unsupported and untested SFP+ modules on 82599-based adapters (uint)
parm: debug:Debug level (0=none,...,16=all) (int)
Installing the latest driver as of today (2019-09-25)
# wget https://downloadmirror.intel.com/14687/eng/ixgbe-5.6.3.tar.gz
# tar zxvf ixgbe-5.6.3.tar.gz
# cd ixgbe-5.6.3/src/
# make install
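Note that make install only places the new module on disk. The running kernel keeps using the old ixgbe until the module is reloaded. The sketch below is one way to confirm which version is active. driver_version is a hypothetical helper that reads ethtool -i output on stdin, and the commented-out commands assume the new module reports itself as 5.6.3.

```shell
#!/bin/sh
# Hypothetical helper: print the "version:" field from `ethtool -i` output on stdin.
driver_version() {
    awk -F': *' '$1 == "version" { print $2 }'
}

# Sample input copied from the ethtool -i output earlier in this note.
printf 'driver: ixgbe\nversion: 4.2.1-k\nfirmware-version: 0x2b2c0001\n' | driver_version

# On the real host (as root; the link drops while the module reloads):
#   rmmod ixgbe && modprobe ixgbe
#   [ "$(ethtool -i eth2 | driver_version)" = "5.6.3" ] && echo "new driver loaded"
```

Matching the field name exactly keeps the firmware-version and expansion-rom-version lines from being picked up by mistake.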