[Network Card Err] Detected Tx Unit Hang
Author : Site administrator   Date : 2019-09-25 (Wed) 16:53
Environment
OS : Ubuntu 16.04 LTS
Kernel : 4.4.0-164-generic
Network Card : Intel Corporation 82599ES 10-Gigabit SFI/SFP+, dual-port / ixgbe


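Symptoms and messages
:: The network link repeatedly goes up and down, and messages like the ones below are logged continuously.
The outages follow no particular pattern. Reloading the network module after the log appears clears the symptom, but the same symptom returns shortly afterwards.
A large number of dropped packets also accumulates.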
...........................................................
...........................................................
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889]   Tx Queue             <2>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889]   TDH, TDT             <74>, <8e>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889]   next_to_use          <8e>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889]   next_to_clean        <74>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889]   time_stamp           <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564889]   jiffies              <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894]   Tx Queue             <7>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894]   TDH, TDT             <c9>, <e4>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894]   next_to_use          <e4>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894]   next_to_clean        <c9>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894]   time_stamp           <100d400c1>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564894]   jiffies              <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901]   Tx Queue             <4>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901]   TDH, TDT             <181>, <190>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901]   next_to_use          <190>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901]   next_to_clean        <181>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901]   time_stamp           <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564901]   jiffies              <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907]   Tx Queue             <3>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907]   TDH, TDT             <3b>, <57>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907]   next_to_use          <57>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907]   next_to_clean        <3b>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907]   time_stamp           <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564907]   jiffies              <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] ixgbe 0000:2e:00.0 eth2: Detected Tx Unit Hang
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912]   Tx Queue             <0>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912]   TDH, TDT             <191>, <1a8>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912]   next_to_use          <1a8>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912]   next_to_clean        <191>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912] tx_buffer_info[next_to_clean]
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912]   time_stamp           <100d400ce>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564912]   jiffies              <100d405df>
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564924] ixgbe 0000:2e:00.0 eth2: tx hang 1 detected on queue 7, resetting adapter
Sep 25 06:54:26 XXXXXXXX kernel: [55884.564926] ixgbe 0000:2e:00.0 eth2: tx hang 1 detected on queue 4, resetting adapter
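
The hex fields in each report can be read arithmetically. As a rough sketch: `jiffies - time_stamp` is how long the descriptor at `next_to_clean` has been waiting, and `TDT - TDH` is how many descriptors the hardware has not yet processed. The helper name `hex_delta` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper for reading the hex fields of a "Detected Tx Unit Hang" report.

# hex_delta A B: print B - A, both given as hex strings without the 0x prefix
hex_delta() {
    echo $(( 0x$2 - 0x$1 ))
}

# Tx queue 2 in the log above
hex_delta 100d400ce 100d405df   # jiffies the oldest pending descriptor has waited
hex_delta 74 8e                 # descriptors between TDH and TDT (no ring wrap assumed)
```

For queue 2 this prints 1297 and 26. Assuming a tick rate of 250 jiffies per second (the post does not state the kernel's HZ value), 1297 jiffies is roughly 5.2 seconds of no transmit progress on that queue.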





Additional notes
This system bonds two network devices and uses an MTU of 9000.
That was the first thing checked, but the symptom was the same regardless of bonding mode, with a single unbonded device, and with MTU 1500.
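
When ruling out bonding and jumbo frames, it helps to confirm which mode and MTU are actually in effect. The sketch below reads both from the files the kernel exposes. The bond name `bond0` is an assumption (the post does not give it), and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; "bond0" is an assumed bond name.

# bond_mode FILE: print the mode from a /proc/net/bonding/<bond> style file
bond_mode() {
    awk -F': ' '/^Bonding Mode:/ { print $2 }' "$1"
}

# iface_mtu DIR: print the MTU from a /sys/class/net/<iface> style directory
iface_mtu() {
    cat "$1/mtu"
}

# Only query the live system when the assumed bond actually exists
if [ -r /proc/net/bonding/bond0 ]; then
    bond_mode /proc/net/bonding/bond0
    iface_mtu /sys/class/net/bond0
fi
```

For a temporary test at the standard frame size, `ip link set dev bond0 mtu 1500` changes the MTU until the next reboot or network restart.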


1. Changing network device options
:: The options below were toggled ON/OFF and the state checked each time, but the same symptom persisted.
tso => tcp-segmentation-offload
gso => generic-segmentation-offload
gro => generic-receive-offload
sg => scatter-gather
ufo => udp-fragmentation-offload (Cannot change)
lro => large-receive-offload (Cannot change)

# ethtool -K eth2 gro off lro off

# ethtool -k eth2 | grep large-receive-offload
large-receive-offload: off

# ethtool -K eth2 gro on lro on
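
Toggling the offloads one at a time makes it easier to see whether any single feature affects the hang. A minimal sketch that only prints the commands, so they can be reviewed before being run as root. The function name `offload_cmds` is hypothetical, and `ufo` and `lro` are left out because the list above marks them as not changeable.

```shell
#!/bin/sh
# Print (not run) the ethtool commands that switch each changeable offload.
# Usage: offload_cmds IFACE on|off
offload_cmds() {
    for feature in tso gso gro sg; do
        echo "ethtool -K $1 $feature $2"
    done
}

offload_cmds eth2 off
```

Piping the output to a shell (`offload_cmds eth2 off | sh`) applies the settings, and `ethtool -k eth2` shows the resulting state.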


2. Changing the network device driver (4.2.1-k --> 5.6.3)
:: The same symptom persisted after the driver change.
# ethtool -i eth2
driver: ixgbe
version: 4.2.1-k
firmware-version: 0x2b2c0001
expansion-rom-version: 
bus-info: 0000:2e:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# modinfo ixgbe
filename:       /lib/modules/4.4.0-62-generic/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version:        4.2.1-k
license:        GPL
description:    Intel(R) 10 Gigabit PCI Express Network Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     F5568BA52A50F97CB589A09
alias:          pci:v00008086d000015ACsv*sd*bc*sc*i*
alias:          pci:v00008086d000015ADsv*sd*bc*sc*i*
alias:          pci:v00008086d000015ABsv*sd*bc*sc*i*
alias:          pci:v00008086d000015AAsv*sd*bc*sc*i*
alias:          pci:v00008086d00001563sv*sd*bc*sc*i*
alias:          pci:v00008086d00001560sv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Asv*sd*bc*sc*i*
alias:          pci:v00008086d00001557sv*sd*bc*sc*i*
alias:          pci:v00008086d00001558sv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Fsv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Dsv*sd*bc*sc*i*
alias:          pci:v00008086d00001528sv*sd*bc*sc*i*
alias:          pci:v00008086d000010F8sv*sd*bc*sc*i*
alias:          pci:v00008086d0000151Csv*sd*bc*sc*i*
alias:          pci:v00008086d00001529sv*sd*bc*sc*i*
alias:          pci:v00008086d0000152Asv*sd*bc*sc*i*
alias:          pci:v00008086d000010F9sv*sd*bc*sc*i*
alias:          pci:v00008086d00001514sv*sd*bc*sc*i*
alias:          pci:v00008086d00001507sv*sd*bc*sc*i*
alias:          pci:v00008086d000010FBsv*sd*bc*sc*i*
alias:          pci:v00008086d00001517sv*sd*bc*sc*i*
alias:          pci:v00008086d000010FCsv*sd*bc*sc*i*
alias:          pci:v00008086d000010F7sv*sd*bc*sc*i*
alias:          pci:v00008086d00001508sv*sd*bc*sc*i*
alias:          pci:v00008086d000010DBsv*sd*bc*sc*i*
alias:          pci:v00008086d000010F4sv*sd*bc*sc*i*
alias:          pci:v00008086d000010E1sv*sd*bc*sc*i*
alias:          pci:v00008086d000010F1sv*sd*bc*sc*i*
alias:          pci:v00008086d000010ECsv*sd*bc*sc*i*
alias:          pci:v00008086d000010DDsv*sd*bc*sc*i*
alias:          pci:v00008086d0000150Bsv*sd*bc*sc*i*
alias:          pci:v00008086d000010C8sv*sd*bc*sc*i*
alias:          pci:v00008086d000010C7sv*sd*bc*sc*i*
alias:          pci:v00008086d000010C6sv*sd*bc*sc*i*
alias:          pci:v00008086d000010B6sv*sd*bc*sc*i*
depends:        mdio,ptp,dca,vxlan
intree:         Y
vermagic:       4.4.0-62-generic SMP mod_unload modversions 
parm:           max_vfs:Maximum number of virtual functions to allocate per physical function - default is zero and maximum value is 63. (Deprecated) (uint)
parm:           allow_unsupported_sfp:Allow unsupported and untested SFP+ modules on 82599-based adapters (uint)
parm:           debug:Debug level (0=none,...,16=all) (int)


Installing the latest driver as of today (2019-09-25)
# wget https://downloadmirror.intel.com/14687/eng/ixgbe-5.6.3.tar.gz
# tar zxvf ixgbe-5.6.3.tar.gz
# cd ixgbe-5.6.3/src/
# make install 
# rmmod ixgbe ; modprobe ixgbe RSS=8

# ethtool -i eth2
driver: ixgbe
version: 5.6.3
firmware-version: 0x2b2c0001, 1.1197.0
expansion-rom-version: 
bus-info: 0000:2e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
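
After swapping the module it is worth confirming that the running driver is really the new build, since an older copy can still be loaded from the initramfs at boot. A small sketch that extracts the version from `ethtool -i` output. The function name `driver_version` is hypothetical.

```shell
#!/bin/sh
# driver_version: read `ethtool -i` output on stdin and print the "version:" field
driver_version() {
    awk -F': ' '$1 == "version" { print $2 }'
}

# Compare the loaded driver with the module file on disk (only where eth2 exists)
if command -v ethtool >/dev/null 2>&1 && [ -e /sys/class/net/eth2 ]; then
    ethtool -i eth2 | driver_version
    modinfo -F version ixgbe
fi
```

The `RSS=8` parameter passed to `modprobe` applies only to that load. A line such as `options ixgbe RSS=8` in a file under `/etc/modprobe.d/` is the usual way to keep it across reboots, and on Ubuntu `update-initramfs -u` refreshes the copy used at boot.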


3. Changing the kernel
:: Kernel >= 4.18.0
The newest kernel package available through apt on Ubuntu 16.04 is 4.15.0-64, so the kernel was compiled from source.

# apt-get install -y build-essential libncurses5 libncurses5-dev bin86 kernel-package libssl-dev bison flex libelf-dev
# cd /usr/local/src
# wget https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/linux-4.18.19.tar.gz
# tar zxvf linux-4.18.19.tar.gz 
# mv linux-4.18.19 /usr/src
# cd /usr/src/linux-4.18.19
# cp /boot/config-4.4.0-161-generic .config
# make menuconfig
# make-kpkg --jobs 8 --initrd --revision=1.0 kernel_image
# dpkg -i ../linux-image-4.18.19_1.0_amd64.deb
# reboot 
# uname -r
4.18.19

Fix applied from kernel 4.18 onward
 - ESP (Encapsulating Security Payload)
According to an upstream thread about this problem and an Arch Linux forum post, setting
CONFIG_INET_ESP_OFFLOAD=n
CONFIG_INET6_ESP_OFFLOAD=n
fixes the problem. I have built a kernel with these unset and verified that these changes work.

# uname -r
4.18.19

# grep "INET[A-Za-z0-9]*_ESP" .config
CONFIG_INET_ESP=m
# CONFIG_INET_ESP_OFFLOAD is not set
CONFIG_INET6_ESP=m
# CONFIG_INET6_ESP_OFFLOAD is not set
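
The same check can be scripted against any kernel config file, which is handy when comparing several builds. A minimal sketch with a hypothetical function name. It assumes the config of the running kernel is installed under `/boot`, as Ubuntu and make-kpkg packages normally do.

```shell
#!/bin/sh
# esp_offload_disabled CONFIG: succeed only if neither ESP offload option is enabled
# (the caller is expected to pass a readable file)
esp_offload_disabled() {
    ! grep -Eq '^CONFIG_INET6?_ESP_OFFLOAD=[ym]' "$1"
}

# Check the config of the running kernel
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    if esp_offload_disabled "$cfg"; then
        echo "ESP offload disabled"
    else
        echo "ESP offload still enabled"
    fi
fi
```

Inside the kernel source tree, `scripts/config --disable INET_ESP_OFFLOAD --disable INET6_ESP_OFFLOAD` followed by `make olddefconfig` is a non-interactive alternative to unsetting the options in `make menuconfig`.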

:: ESP-related network options that did not exist before now appear.
# ethtool -k eth2 |grep -i esp
Cannot get device udp-fragmentation-offload settings: Operation not supported
tx-esp-segmentation: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]

# ethtool -i eth2
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x00012b2c, 1.1197.0
expansion-rom-version: 
bus-info: 0000:2e:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


※ The dropped packets that had been accumulating on the network card were also resolved.
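
One way to confirm that drops have stopped is to sample the interface counters twice and compare. A sketch under the assumption that the interface is still named `eth2`. The function name `dropped_total` is hypothetical, and the 10-second interval is arbitrary.

```shell
#!/bin/sh
# dropped_total DIR: sum rx_dropped and tx_dropped from a
# /sys/class/net/<iface>/statistics style directory
dropped_total() {
    rx=$(cat "$1/rx_dropped")
    tx=$(cat "$1/tx_dropped")
    echo $(( rx + tx ))
}

# Sample the counters twice; an unchanged total suggests drops have stopped
stats=/sys/class/net/eth2/statistics
if [ -d "$stats" ]; then
    before=$(dropped_total "$stats")
    sleep 10
    after=$(dropped_total "$stats")
    echo "dropped in the last 10s: $(( after - before ))"
fi
```

`ip -s link show eth2` reports the same counters in a human-readable form.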

Site : 모지리네 | Owner : 이경현 | Personal community : 랭키닷컴 Operating Systems (OS) | Bundang-gu, Seongnam-si, Gyeonggi-do | Email : mojily (at) chonnom.com Copyright ⓒ www.chonnom.com www.kyunghyun.net www.mojily.net. All rights reserved.