Disk recovery (DEGRADED -> ONLINE)
One disk is in the REMOVED state
※ Device Failure and Recovery
DEGRADED
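One or more component devices are offline.
A disk has failed but the pool keeps functioning. This state appears when the pool has redundancy, as in a mirror configuration.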
One or more top-level vdevs is in the degraded state because one or more component devices are offline. Sufficient replicas exist to continue functioning.
One or more component devices is in the degraded or faulted state, but sufficient replicas exist to continue functioning. The underlying conditions are as follows:
o The number of checksum errors exceeds acceptable levels and the device is degraded as an indication that something may be wrong. ZFS continues to use the device as necessary.
o The number of I/O errors exceeds acceptable levels. The device could not be marked as faulted because there are insufficient replicas to continue functioning.
FAULTED
Indicates that the pool's fault tolerance may have been compromised: the disk cannot be accessed and has failed.
One or more top-level vdevs is in the faulted state because one or more component devices are offline. Insufficient replicas exist to continue functioning.
One or more component devices is in the faulted state, and insufficient replicas exist to continue functioning. The underlying conditions are as follows:
o The device could be opened, but the contents did not match expected values.
o The number of I/O errors exceeds acceptable levels and the device is faulted to prevent further use of the device.
OFFLINE The device was explicitly taken offline by the "zpool offline" command.
This applies when a disk is removed after being made inactive.
ONLINE The device is online and functioning.
REMOVED
The disk was physically removed. Removal detection depends on the hardware and may not be supported.
The device was physically removed while the system was running. Device removal detection is hardware-dependent and may not be supported on all platforms.
UNAVAIL
When a device is physically removed and later reconnected, ZFS attempts to bring it back online.
Device attach detection depends on the hardware and may not be supported.
The device could not be opened. If a pool is imported when a device was unavailable, then the device will be identified by a unique identifier instead of its path since the path was never correct in the first place.
If a device is removed and later re-attached to the system, ZFS attempts to put the device online automatically. Device attach detection is hardware-dependent and might not be supported on all platforms.
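To spot the states described above quickly, the config table of `zpool status` can be filtered for anything that is not ONLINE. The following is a minimal sketch, not part of the original procedure: the function name `list_unhealthy` is hypothetical, and it assumes the column layout shown in the outputs of this article (NAME, STATE, READ, WRITE, CKSUM).

```shell
#!/bin/sh
# list_unhealthy (hypothetical helper): read `zpool status` text on stdin and
# print "name state" for every config row whose STATE is not ONLINE.
list_unhealthy() {
    awk '
        # A new pool section or the trailing "errors:" line ends the table.
        $1 == "pool:" || $1 == "errors:" { in_config = 0 }
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # Report rows in one of the non-ONLINE states listed above.
        in_config && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ {
            print $1, $2
        }
    '
}
```

Typical use would be `zpool status TEST_IMG | list_unhealthy`, which prints nothing when every row is ONLINE.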
REMOVED state found on one disk
# zpool status
pool: TEST_IMG
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub in progress since Tue May 27 09:19:44 2014
3.33G scanned out of 71.0G at 89.8M/s, 0h12m to go
0 repaired, 4.69% done
config:
NAME STATE READ WRITE CKSUM
TEST_IMG DEGRADED 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
mirror-3 DEGRADED 0 0 0
sdi ONLINE 0 0 0
sdh REMOVED 0 0 0
mirror-4 ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
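The action line above suggests `zpool online` for a removed device. This walkthrough does not run that command itself, so the following is only a dry-run sketch: the hypothetical helper `suggest_online` reads `zpool status` text and prints the `zpool online <pool> <device>` command for each REMOVED device without executing anything.

```shell
#!/bin/sh
# suggest_online (hypothetical helper): read `zpool status` text on stdin and
# print, without running it, a `zpool online` command per REMOVED device.
suggest_online() {
    awk '
        # Remember the pool name and leave any previous config table.
        $1 == "pool:" { pool = $2; in_config = 0 }
        $1 == "errors:" { in_config = 0 }
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # Only print the command; the administrator decides whether to run it.
        in_config && $2 == "REMOVED" { print "zpool online " pool " " $1 }
    '
}
```

For the status shown above, `zpool status | suggest_online` would print a single command for sdh. Review the output before running it, since a disk that is actually failing should be replaced rather than brought back online.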
Physically remove the disk
The system log shows the entries below, while the zpool status output is unchanged.
May 27 09:36:04 222-122-15-69 kernel: [923467.661104] ata8: exception Emask 0x10 SAct 0x0 SErr 0x190002 action 0xe frozen
May 27 09:36:04 222-122-15-69 kernel: [923467.704326] ata8: irq_stat 0x80400000, PHY RDY changed
May 27 09:36:04 222-122-15-69 kernel: [923467.749221] ata8: SError: { RecovComm PHYRdyChg 10B8B Dispar }
May 27 09:36:04 222-122-15-69 kernel: [923467.794725] ata8: hard resetting link
May 27 09:36:05 222-122-15-69 kernel: [923468.518079] ata8: SATA link down (SStatus 0 SControl 300)
May 27 09:36:05 222-122-15-69 kernel: [923468.518099] ata8: EH complete
May 27 09:36:05 222-122-15-69 kernel: [923468.518113] ata8.00: detaching (SCSI 7:0:0:0)
May 27 09:36:05 222-122-15-69 kernel: [923468.520713] sd 7:0:0:0: [sdh] Stopping disk
May 27 09:36:05 222-122-15-69 kernel: [923468.520749] sd 7:0:0:0: [sdh] START_STOP FAILED
May 27 09:36:05 222-122-15-69 kernel: [923468.520753] sd 7:0:0:0: [sdh]
May 27 09:36:05 222-122-15-69 kernel: [923468.520756] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Reinsert the same disk
May 27 09:38:15 222-122-15-69 kernel: [923598.613664] ata8: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
May 27 09:38:15 222-122-15-69 kernel: [923598.652243] ata8: irq_stat 0x80400040, connection status changed
May 27 09:38:15 222-122-15-69 kernel: [923598.692468] ata8: SError: { PHYRdyChg CommWake DevExch }
May 27 09:38:15 222-122-15-69 kernel: [923598.733490] ata8: hard resetting link
May 27 09:38:25 222-122-15-69 kernel: [923608.740222] ata8: softreset failed (1st FIS failed)
May 27 09:38:25 222-122-15-69 kernel: [923608.782047] ata8: hard resetting link
May 27 09:38:27 222-122-15-69 kernel: [923610.833680] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 27 09:38:27 222-122-15-69 kernel: [923610.834363] ata8.00: ATA-8: ST3000DM001-9YN166, CC82, max UDMA/133
May 27 09:38:27 222-122-15-69 kernel: [923610.834369] ata8.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
May 27 09:38:27 222-122-15-69 kernel: [923610.835024] ata8.00: configured for UDMA/133
May 27 09:38:27 222-122-15-69 kernel: [923610.835035] ata8: EH complete
May 27 09:38:27 222-122-15-69 kernel: [923610.835173] scsi 7:0:0:0: Direct-Access ATA ST3000DM001-9YN1 CC82 PQ: 0 ANSI: 5
May 27 09:38:27 222-122-15-69 kernel: [923610.835537] sd 7:0:0:0: [sdh] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
May 27 09:38:27 222-122-15-69 kernel: [923610.835548] sd 7:0:0:0: [sdh] 4096-byte physical blocks
May 27 09:38:27 222-122-15-69 kernel: [923610.835851] sd 7:0:0:0: [sdh] Write Protect is off
May 27 09:38:27 222-122-15-69 kernel: [923610.835958] sd 7:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 27 09:38:27 222-122-15-69 kernel: [923610.836557] sd 7:0:0:0: Attached scsi generic sg7 type 0
May 27 09:38:27 222-122-15-69 kernel: [923610.890396] sdh: sdh1 sdh9
May 27 09:38:27 222-122-15-69 kernel: [923610.891214] sd 7:0:0:0: [sdh] Attached SCSI disk
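The two log excerpts above are long, but only a few lines mark the actual removal and reattachment. As a rough sketch, the hypothetical helper `link_events` below reduces such a log to a short timeline. It assumes the syslog layout shown here (month, day, time as the first three fields) and matches only the three message texts that appear in these excerpts.

```shell
#!/bin/sh
# link_events (hypothetical helper): read kernel log lines on stdin and print
# "time event" for the SATA link and disk attach messages seen above.
link_events() {
    awk '
        # Field 3 is the time of day in the syslog prefix "May 27 09:36:05".
        /SATA link down/     { print $3, "link-down" }
        /SATA link up/       { print $3, "link-up" }
        /Attached SCSI disk/ { print $3, "disk-attached" }
    '
}
```

The log file path varies by distribution, so the input is left to the caller, for example `dmesg -T | link_events` or a `grep ata8` over the system log piped into the function.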
Integrity check (scrub)
# zpool scrub TEST_IMG
# zpool status
pool: TEST_IMG
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub in progress since Tue May 27 09:39:20 2014
37.3G scanned out of 71.0G at 159M/s, 0h3m to go
10.8M repaired, 52.47% done
config:
NAME STATE READ WRITE CKSUM
TEST_IMG ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sdi ONLINE 0 0 0
sdh ONLINE 0 0 2.63K (repairing)
mirror-4 ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
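While a scrub or resilver is running, the scan section reports a percentage such as `52.47% done`. The sketch below extracts that number so progress can be polled from a script. The helper name `scan_progress` is hypothetical, and the parsing assumes the `NN.NN% done` wording seen in this output; other ZFS versions may word the line differently.

```shell
#!/bin/sh
# scan_progress (hypothetical helper): read `zpool status` text on stdin and
# print the scan percentage without the "%" sign. Prints nothing when no
# scrub or resilver is in progress.
scan_progress() {
    awk '
        {
            # Look for a field ending in "%" that is followed by "done".
            for (i = 1; i < NF; i++) {
                if ($(i + 1) ~ /^done,?$/ && $i ~ /%$/) {
                    p = $i
                    sub(/%$/, "", p)
                    print p
                    exit
                }
            }
        }
    '
}
```

Because the function prints nothing once the scan has finished, a loop such as `while zpool status TEST_IMG | scan_progress | grep -q .; do sleep 60; done` could be used to wait for completion.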
Recovery complete
# zpool status TEST_IMG
pool: TEST_IMG
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: scrub repaired 11.2M in 0h8m with 0 errors on Tue May 27 09:48:14 2014
config:
NAME STATE READ WRITE CKSUM
TEST_IMG ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sdi ONLINE 0 0 0
sdh ONLINE 0 0 2.69K
mirror-4 ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
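Even with the pool back ONLINE, sdh still carries a CKSUM count from the repair, which is why the status line asks whether the device needs replacing. As an illustrative sketch, the hypothetical helper `devices_with_errors` lists every config row whose READ, WRITE or CKSUM counter is not zero. It assumes the five-column table shown above and treats the counters as text, so values like `2.69K` are reported as-is.

```shell
#!/bin/sh
# devices_with_errors (hypothetical helper): read `zpool status` text on stdin
# and print rows whose READ, WRITE or CKSUM counter is not "0".
devices_with_errors() {
    awk '
        # A new pool section or the trailing "errors:" line ends the table.
        $1 == "pool:" || $1 == "errors:" { in_config = 0 }
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # Counters are compared as strings so that "2.69K" counts as non-zero.
        in_config && NF >= 5 && $3 ~ /^[0-9]/ &&
            ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}
```

Running `zpool status TEST_IMG | devices_with_errors` before `zpool clear` gives a record of which disks reported errors, which is lost once the counters are reset.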
Disk errors on one disk in each mirror group
# zpool status TEST_IMG
pool: TEST_IMG
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 11.2M in 0h8m with 0 errors on Tue May 27 09:48:14 2014
config:
NAME STATE READ WRITE CKSUM
TEST_IMG DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
sdb ONLINE 0 0 0
sdc UNAVAIL 4 112 0 corrupted data
mirror-1 DEGRADED 0 0 0
sdd ONLINE 0 0 0
sde UNAVAIL 4 13 0 corrupted data
mirror-2 DEGRADED 0 0 0
sdf UNAVAIL 3 24 0 corrupted data
sdg ONLINE 0 0 0
mirror-3 DEGRADED 0 0 0
sdi ONLINE 0 0 0
sdh UNAVAIL 0 0 0
mirror-4 DEGRADED 0 0 0
sdj ONLINE 0 0 0
sdk UNAVAIL 0 0 0
Reinsert the disks
# zpool status TEST_IMG
pool: TEST_IMG
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 1.35M in 0h0m with 0 errors on Tue May 27 09:51:45 2014
config:
NAME STATE READ WRITE CKSUM
TEST_IMG DEGRADED 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 4 112 148
mirror-1 DEGRADED 0 0 0
sdd ONLINE 0 0 0
sde FAULTED 4 13 0 corrupted data
mirror-2 DEGRADED 0 0 0
sdf FAULTED 3 24 0 corrupted data
sdg ONLINE 0 0 0
mirror-3 DEGRADED 0 0 0
sdi ONLINE 0 0 0
sdh FAULTED 0 0 0 corrupted data
mirror-4 DEGRADED 0 0 0
sdj ONLINE 0 0 0
sdk FAULTED 0 0 0 corrupted data
Some disks reported a "corrupted data" message, but the data itself currently shows no problems.
After a reboot, the symptoms no longer appear in the system log.
Reboot
# zpool status
pool: TEST_IMG
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue May 27 20:03:02 2014
9.73G scanned out of 71.0G at 69.7M/s, 0h15m to go
7.79G resilvered, 13.70% done
config:
NAME STATE READ WRITE CKSUM
TEST_IMG ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 2 (resilvering)
mirror-2 ONLINE 0 0 0
sdf ONLINE 0 0 3 (resilvering)
sdg ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sdi ONLINE 0 0 0
sdh ONLINE 0 0 0 (resilvering)
mirror-4 ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0 (resilvering)
# zpool status TEST_IMG
pool: TEST_IMG
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 56.7G in 0h4m with 0 errors on Tue May 27 20:07:55 2014
config:
NAME STATE READ WRITE CKSUM
TEST_IMG ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 2
mirror-2 ONLINE 0 0 0
sdf ONLINE 0 0 3
sdg ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sdi ONLINE 0 0 0
sdh ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
Clear the disk errors
# zpool clear TEST_IMG
# zpool status TEST_IMG
pool: TEST_IMG
state: ONLINE
scan: resilvered 56.7G in 0h4m with 0 errors on Tue May 27 20:07:55 2014
config:
NAME STATE READ WRITE CKSUM
TEST_IMG ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sdi ONLINE 0 0 0
sdh ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
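The final state above (pool ONLINE, every row ONLINE, all counters 0) can also be checked from a script. The following is a sketch under the same layout assumptions as the outputs in this article. The helper name `pool_is_clean` is hypothetical, and it returns exit status 0 only when nothing in the status text looks wrong.

```shell
#!/bin/sh
# pool_is_clean (hypothetical helper): read `zpool status` text on stdin.
# Exit status 0 if the pool state and every config row are ONLINE with all
# READ/WRITE/CKSUM counters at "0"; exit status 1 otherwise.
pool_is_clean() {
    awk '
        # The overall pool state must be ONLINE.
        $1 == "state:" && $2 != "ONLINE" { bad = 1 }
        # A new pool section or the trailing "errors:" line ends the table.
        $1 == "pool:" || $1 == "errors:" { in_config = 0 }
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # Any non-ONLINE row or non-zero counter marks the pool as not clean.
        in_config && NF >= 5 &&
            ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0") { bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `zpool status TEST_IMG | pool_is_clean && echo "TEST_IMG is clean"` would print the message for the output above. The built-in `zpool status -x` gives a similar health summary, though it does not check the counters in the same way.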