[ZFS] Disk recovery (DEGRADED -> ONLINE)
Author : 최고관리자 (site administrator) Date : 2014-05-20 (Tue) 15:38 Views : 7877

One disk is in the REMOVED state.
※    Device Failure and Recovery
DEGRADED    
One or more component devices are offline.
A disk has failed but the pool is still functioning; this state appears in a mirror or other redundant configuration.
One or more top-level vdevs is in the degraded state because one or more component devices are offline. Sufficient replicas exist to continue functioning.
One or more component devices is in the degraded or faulted state, but sufficient replicas exist to continue functioning. The underlying conditions are as follows:

  o The number of checksum errors exceeds acceptable levels and the device is degraded as an indication that something may be wrong. ZFS continues to use the device as necessary.
  o The number of I/O errors exceeds acceptable levels. The device could not be marked as faulted because there are insufficient replicas to continue functioning.

FAULTED     
Indicates that the pool's fault tolerance may be compromised: the disk cannot be accessed and has failed.
One or more top-level vdevs is in the faulted state because one or more component devices are offline. Insufficient replicas exist to continue functioning.
One or more component devices is in the faulted state, and insufficient replicas exist to continue functioning. The underlying conditions are as follows:

  o The device could be opened, but the contents did not match expected values.
  o The number of I/O errors exceeds acceptable levels and the device is faulted to prevent further use of the device.

OFFLINE     The device was explicitly taken offline by the "zpool offline" command.
Applies when the disk was put into an inactive state before being removed.

ONLINE      The device is online and functioning.

REMOVED     
The device was physically removed while the system was running. Device removal detection is hardware-dependent and may not be supported on all platforms.

UNAVAIL     
The device could not be opened. If a pool is imported when a device was unavailable, then the device will be identified by a unique identifier instead of its path since the path was never correct in the first place.

If a device is removed and later re-attached to the system, ZFS attempts to put the device online automatically. Device attach detection is hardware-dependent and might not be supported on all platforms.
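
To spot the states described above without reading the whole table, the config section of `zpool status` can be filtered. This is a minimal sketch, not part of the original post: the function name `non_online_devices` is hypothetical, and it assumes the column layout shown in the outputs in this post (NAME, STATE, READ, WRITE, CKSUM).

```shell
# Print "name state" for every row of the config table whose STATE
# column is not ONLINE. Reads `zpool status` output on stdin.
# The function name is hypothetical (not a ZFS command).
non_online_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        in_config && NF == 0          { in_config = 0 }         # blank line ends the table
        in_config && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Usage on a host with ZFS:
#   zpool status TEST_IMG | non_online_devices
```

The pool and mirror rows are printed as well when they are DEGRADED, which shows which mirror group an affected disk belongs to.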


REMOVED state found on one disk
# zpool status
  pool: TEST_IMG
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub in progress since Tue May 27 09:19:44 2014
    3.33G scanned out of 71.0G at 89.8M/s, 0h12m to go
    0 repaired, 4.69% done
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
          mirror-3  DEGRADED     0     0     0
            sdi     ONLINE       0     0     0
            sdh     REMOVED      0     0     0
          mirror-4  ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
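
The action line in the output above suggests `zpool online`. As a hedged sketch (the post does not show this command being run), the following prints the matching `zpool online` command for each REMOVED device instead of executing it. The function name `removed_online_cmds` is hypothetical.

```shell
# Print a `zpool online POOL DEVICE` command for every device reported
# as REMOVED. Reads `zpool status` output on stdin; $1 is the pool name.
# Dry run only: nothing is executed. The function name is hypothetical.
removed_online_cmds() {
    pool=$1
    awk -v pool="$pool" '$2 == "REMOVED" { print "zpool online", pool, $1 }'
}

# Usage on a host with ZFS:
#   zpool status TEST_IMG | removed_online_cmds TEST_IMG
```

Printing the commands first leaves the decision between `zpool online` and `zpool replace` to the administrator.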


Physically removing the disk
The system log records the following, while the zpool status output stays unchanged.
May 27 09:36:04 222-122-15-69 kernel: [923467.661104] ata8: exception Emask 0x10 SAct 0x0 SErr 0x190002 action 0xe frozen
May 27 09:36:04 222-122-15-69 kernel: [923467.704326] ata8: irq_stat 0x80400000, PHY RDY changed
May 27 09:36:04 222-122-15-69 kernel: [923467.749221] ata8: SError: { RecovComm PHYRdyChg 10B8B Dispar }
May 27 09:36:04 222-122-15-69 kernel: [923467.794725] ata8: hard resetting link
May 27 09:36:05 222-122-15-69 kernel: [923468.518079] ata8: SATA link down (SStatus 0 SControl 300)
May 27 09:36:05 222-122-15-69 kernel: [923468.518099] ata8: EH complete
May 27 09:36:05 222-122-15-69 kernel: [923468.518113] ata8.00: detaching (SCSI 7:0:0:0)
May 27 09:36:05 222-122-15-69 kernel: [923468.520713] sd 7:0:0:0: [sdh] Stopping disk
May 27 09:36:05 222-122-15-69 kernel: [923468.520749] sd 7:0:0:0: [sdh] START_STOP FAILED
May 27 09:36:05 222-122-15-69 kernel: [923468.520753] sd 7:0:0:0: [sdh]
May 27 09:36:05 222-122-15-69 kernel: [923468.520756] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


Re-inserting the same disk
May 27 09:38:15 222-122-15-69 kernel: [923598.613664] ata8: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
May 27 09:38:15 222-122-15-69 kernel: [923598.652243] ata8: irq_stat 0x80400040, connection status changed
May 27 09:38:15 222-122-15-69 kernel: [923598.692468] ata8: SError: { PHYRdyChg CommWake DevExch }
May 27 09:38:15 222-122-15-69 kernel: [923598.733490] ata8: hard resetting link
May 27 09:38:25 222-122-15-69 kernel: [923608.740222] ata8: softreset failed (1st FIS failed)
May 27 09:38:25 222-122-15-69 kernel: [923608.782047] ata8: hard resetting link
May 27 09:38:27 222-122-15-69 kernel: [923610.833680] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 27 09:38:27 222-122-15-69 kernel: [923610.834363] ata8.00: ATA-8: ST3000DM001-9YN166, CC82, max UDMA/133
May 27 09:38:27 222-122-15-69 kernel: [923610.834369] ata8.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
May 27 09:38:27 222-122-15-69 kernel: [923610.835024] ata8.00: configured for UDMA/133
May 27 09:38:27 222-122-15-69 kernel: [923610.835035] ata8: EH complete
May 27 09:38:27 222-122-15-69 kernel: [923610.835173] scsi 7:0:0:0: Direct-Access     ATA      ST3000DM001-9YN1 CC82 PQ: 0 ANSI: 5
May 27 09:38:27 222-122-15-69 kernel: [923610.835537] sd 7:0:0:0: [sdh] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
May 27 09:38:27 222-122-15-69 kernel: [923610.835548] sd 7:0:0:0: [sdh] 4096-byte physical blocks
May 27 09:38:27 222-122-15-69 kernel: [923610.835851] sd 7:0:0:0: [sdh] Write Protect is off
May 27 09:38:27 222-122-15-69 kernel: [923610.835958] sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 27 09:38:27 222-122-15-69 kernel: [923610.836557] sd 7:0:0:0: Attached scsi generic sg7 type 0
May 27 09:38:27 222-122-15-69 kernel: [923610.890396]  sdh: sdh1 sdh9
May 27 09:38:27 222-122-15-69 kernel: [923610.891214] sd 7:0:0:0: [sdh] Attached SCSI disk
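
The removal and re-insertion logs above each contain one decisive line ("SATA link down" and "SATA link up"). A small filter can reduce a kernel log to just those events. This is a sketch under the assumption that the log uses the libata message format shown above; the function name `sata_link_events` is hypothetical.

```shell
# Reduce kernel log lines on stdin to "<port> <up|down>" for SATA link
# state changes. The function name is hypothetical.
sata_link_events() {
    grep -E 'ata[0-9]+: SATA link (up|down)' |
        sed -E 's/^.*(ata[0-9]+): SATA link (up|down).*$/\1 \2/'
}

# Usage (the log path depends on the distribution):
#   sata_link_events < /var/log/messages
```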


Integrity check (scrub)
# zpool scrub TEST_IMG
# zpool status
  pool: TEST_IMG
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Tue May 27 09:39:20 2014
    37.3G scanned out of 71.0G at 159M/s, 0h3m to go
    10.8M repaired, 52.47% done
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdh     ONLINE       0     0 2.63K  (repairing)
          mirror-4  ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
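
While a scrub or resilver is running, the scan section reports progress as "NN.NN% done", as in the output above. The sketch below extracts that number, for example to log it periodically. The function name `scan_percent_done` is hypothetical, and the pattern assumes the progress wording shown in this post.

```shell
# Print the "% done" figure from `zpool status` output on stdin.
# Prints nothing when no scan is in progress.
# The function name is hypothetical.
scan_percent_done() {
    sed -n -E 's/^.* ([0-9.]+)% done.*$/\1/p'
}

# Usage on a host with ZFS:
#   zpool status TEST_IMG | scan_percent_done
```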


Recovery complete
# zpool status TEST_IMG
  pool: TEST_IMG
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 11.2M in 0h8m with 0 errors on Tue May 27 09:48:14 2014
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdh     ONLINE       0     0 2.69K
          mirror-4  ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
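
After the scrub, sdh is ONLINE but still carries a CKSUM count (2.69K above). A sketch for listing only the rows with non-zero error counters follows. The function name `devices_with_errors` is hypothetical, and it assumes the five-column table layout shown in this post.

```shell
# Print every config row whose READ, WRITE or CKSUM counter is not 0.
# Reads `zpool status` output on stdin. The function name is hypothetical.
devices_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        in_config && NF == 0          { in_config = 0 }         # blank line ends the table
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage on a host with ZFS:
#   zpool status TEST_IMG | devices_with_errors
```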


Disk errors on one disk in each mirror group
# zpool status TEST_IMG
  pool: TEST_IMG
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 11.2M in 0h8m with 0 errors on Tue May 27 09:48:14 2014
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            sdb     ONLINE       0     0     0
            sdc     UNAVAIL      4   112     0  corrupted data
          mirror-1  DEGRADED     0     0     0
            sdd     ONLINE       0     0     0
            sde     UNAVAIL      4    13     0  corrupted data
          mirror-2  DEGRADED     0     0     0
            sdf     UNAVAIL      3    24     0  corrupted data
            sdg     ONLINE       0     0     0
          mirror-3  DEGRADED     0     0     0
            sdi     ONLINE       0     0     0
            sdh     UNAVAIL      0     0     0
          mirror-4  DEGRADED     0     0     0
            sdj     ONLINE       0     0     0
            sdk     UNAVAIL      0     0     0


Re-inserting the disks
# zpool status TEST_IMG
  pool: TEST_IMG
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 1.35M in 0h0m with 0 errors on Tue May 27 09:51:45 2014
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  DEGRADED     0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       4   112   148
          mirror-1  DEGRADED     0     0     0
            sdd     ONLINE       0     0     0
            sde     FAULTED      4    13     0  corrupted data
          mirror-2  DEGRADED     0     0     0
            sdf     FAULTED      3    24     0  corrupted data
            sdg     ONLINE       0     0     0
          mirror-3  DEGRADED     0     0     0
            sdi     ONLINE       0     0     0
            sdh     FAULTED      0     0     0  corrupted data
          mirror-4  DEGRADED     0     0     0
            sdj     ONLINE       0     0     0
            sdk     FAULTED      0     0     0  corrupted data

Some disks report a "corrupted data" message, but the data itself currently shows no problems.
The symptoms in the system log disappeared after the reboot.

Reboot
# zpool status
  pool: TEST_IMG
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue May 27 20:03:02 2014
    9.73G scanned out of 71.0G at 69.7M/s, 0h15m to go
    7.79G resilvered, 13.70% done
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     2  (resilvering)
          mirror-2  ONLINE       0     0     0
            sdf     ONLINE       0     0     3  (resilvering)
            sdg     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdh     ONLINE       0     0     0  (resilvering)
          mirror-4  ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0  (resilvering)

# zpool status TEST_IMG
  pool: TEST_IMG
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 56.7G in 0h4m with 0 errors on Tue May 27 20:07:55 2014
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     2
          mirror-2  ONLINE       0     0     0
            sdf     ONLINE       0     0     3
            sdg     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
          mirror-4  ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0


Clearing the disk errors
# zpool clear TEST_IMG
# zpool status TEST_IMG
  pool: TEST_IMG
 state: ONLINE
  scan: resilvered 56.7G in 0h4m with 0 errors on Tue May 27 20:07:55 2014
config:

        NAME        STATE     READ WRITE CKSUM
        TEST_IMG  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
          mirror-4  ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
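
A guard that could precede the `zpool clear` step above: clear the counters only when the last scrub or resilver finished with 0 errors. This is a sketch, not something the original post does. The function name `last_scan_clean` is hypothetical, and the pattern assumes the "scan:" wording shown in this post.

```shell
# Succeed (exit 0) if the scan line in `zpool status` output on stdin
# reports a finished scrub or resilver "with 0 errors".
# The function name is hypothetical.
last_scan_clean() {
    grep -E -q '^ *scan: .*(repaired|resilvered) .* with 0 errors'
}

# Usage on a host with ZFS:
#   zpool status TEST_IMG | last_scan_clean && zpool clear TEST_IMG
```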


