ansible/docs/guides/nas-sp00-smart-audit-2026-05-21.md
ilia de49b34cdc
Add homelab monitoring, portfolio site, and vault tooling.
Document pve10 static IPs, monitoring stack, and site LXCs; add portfolio
to inventory; Mailcow mailbox automation; vault import/export scripts;
security audit guides and UniFi DHCP reference.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-22 16:25:07 -04:00


NAS.SP00 SMART audit

Date: 2026-05-21
Host: PVENAS (Proxmox VE) — 10.0.10.10
Pool: ZFS NAS.SP00
Related: nas-sp00-drive-failure-report.md


Executive summary

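| Serial | Device | Capacity | ZFS (mirror) | SMART health |
| --- | --- | --- | --- | --- |
| W4J0L0BA | sda | 5.00 TB | mirror-0 ONLINE | PASSED |
| W4J0L3PY | sdb | 137 GB (reported; 5.00 TB expected) | mirror-0 UNAVAIL | UNKNOWN (read fails) |
| W4J0K9V7 | sdc | 5.00 TB | mirror-1 ONLINE | PASSED |
| W4J0LKCD | sdd | 5.00 TB | mirror-1 ONLINE | PASSED |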

Pool state at audit time: DEGRADED. The failed leg is W4J0L3PY (/dev/sdb) in mirror-0, so mirror-0 has no redundancy until the drive is replaced. ZFS reports no known data errors. The three healthy drives show no reallocated, pending, or uncorrectable sectors.


ZFS pool status

  pool: NAS.SP00
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
  scan: resilvered 0B in 00:00:01 with 0 errors on Thu May 21 21:27:54 2026

        NAME                                 STATE     READ WRITE CKSUM
        NAS.SP00                             DEGRADED     0     0     0
          mirror-0                           DEGRADED     0     0     0
            ata-ST5000DM000-1FK178_W4J0L0BA  ONLINE       0     0     0
            11449632222283419591             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST5000DM000-1FK178_W4J0L3PY-part1
          mirror-1                           ONLINE       0     0     0
            ata-ST5000DM000-1FK178_W4J0LKCD  ONLINE       0     0     0
            ata-ST5000DM000-1FK178_W4J0K9V7  ONLINE       0     0     0

errors: No known data errors
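
For routine checks, the device table in this output can be filtered for any entry whose state is not ONLINE. The following is a minimal sketch, assuming the `NAME STATE READ WRITE CKSUM` layout shown above; the function name `list_unhealthy` is hypothetical and not part of any ZFS tooling.

```shell
# list_unhealthy: read `zpool status` text on stdin and print "NAME STATE"
# for every pool, vdev, or disk whose state is not ONLINE.
# Assumption: the device table starts at the "NAME STATE ..." header row
# and ends at the next blank line, as in the output in this report.
list_unhealthy() {
  awk '
    $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # header row opens the table
    in_table && NF == 0           { in_table = 0 }        # blank line closes it
    in_table && $2 != "ONLINE"    { print $1, $2 }
  '
}
```

Usage on the host would be `zpool status NAS.SP00 | list_unhealthy`. Against the output above it would list the pool, mirror-0, and the UNAVAIL member identified by its GUID.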

Block devices (lsblk)

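| NAME | SIZE | MODEL | SERIAL | ROTA |
| --- | --- | --- | --- | --- |
| sda | 4.5T | ST5000DM000-1FK178 | W4J0L0BA | 1 |
| sdb | 3.9G | ST5000DM000 | W4J0L3PY | 1 |
| sdc | 4.5T | ST5000DM000-1FK178 | W4J0K9V7 | 1 |
| sdd | 4.5T | ST5000DM000-1FK178 | W4J0LKCD | 1 |

Note that lsblk sizes sdb at 3.9G while smartctl reports a user capacity of 137 GB. Both are far below the 5.00 TB the drive should present.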

Healthy drives — key metrics

| Metric | sda (W4J0L0BA) | sdc (W4J0K9V7) | sdd (W4J0LKCD) |
| --- | --- | --- | --- |
| Model | ST5000DM000-1FK178 | ST5000DM000-1FK178 | ST5000DM000-1FK178 |
| Firmware | CC48 | CC48 | CC48 |
| WWN | 5000c500082c02f61 | 5000c500082c7e2ce | 5000c500082d84c45 |
| Rotation | 5980 rpm | 5980 rpm | 5980 rpm |
| SATA | 3.1 @ 6.0 Gb/s | 3.1 @ 6.0 Gb/s | 3.1 @ 6.0 Gb/s |
| Power-on hours | 52,481 (~6.0 y) | 53,087 (~6.1 y) | 45,580 (~5.2 y) |
| Temperature | 27 °C | 30 °C | 30 °C |
| Reallocated sectors | 0 | 0 | 0 |
| Current pending sectors | 0 | 0 | 0 |
| Offline uncorrectable | 0 | 0 | 0 |
| UDMA CRC errors | 0 | 0 | 0 |
| Start/stop count | 350 | 367 | 310 |
| Load cycle count | 348,974 | 340,961 | 184,891 |
| Power cycle count | 345 | 363 | 309 |

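A high Load_Cycle_Count is common on the Seagate Desktop HDD.15 because of head parking. The normalized value has dropped to 001 on sda and sdc (008 on sdd), but the attribute's threshold is 000, so it cannot trip a SMART failure. It is not alarming while the reallocated and pending sector counts remain zero.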
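
To put the load cycle counts in context, they can be divided by power-on hours to give a parking rate per drive. This is a small arithmetic sketch using only the figures from the table above; the helper name `cycles_per_hour` is hypothetical.

```shell
# cycles_per_hour LOAD_CYCLES POWER_ON_HOURS
# Prints load cycles per power-on hour, rounded to two decimals.
# Returns 1 if POWER_ON_HOURS is zero.
cycles_per_hour() {
  awk -v c="$1" -v h="$2" 'BEGIN {
    if (h + 0 == 0) exit 1          # avoid division by zero
    printf "%.2f\n", c / h
  }'
}
```

Calling `cycles_per_hour 348974 52481` gives 6.65 for sda; the same calculation gives 6.42 for sdc and 4.06 for sdd, so all three drives have parked their heads several times per hour over their lifetime.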


Failed drive — /dev/sdb (W4J0L3PY)

Identity

| Field | Value |
| --- | --- |
| Device Model | ST5000DM000 (truncated; missing the full -1FK178 suffix) |
| Serial | W4J0L3PY |
| WWN | 5000c500082cc8bbb |
| Firmware | CC48 |
| User capacity | 137,438,952,960 bytes [137 GB] |
| Expected capacity | 5,000,981,078,016 bytes [5.00 TB] |
| Rotation | 7200 rpm (reported) |
| SATA | 3.0, 6.0 Gb/s |

SMART

Read SMART Data failed: scsi error aborted command
SMART Status command failed: scsi error aborted command
SMART overall-health self-assessment test result: UNKNOWN!
SMART Status, Attributes and Thresholds cannot be read.

Action: Replace drive; see nas-sp00-drive-failure-report.md.
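
The capacity mismatch above is a useful early warning that can be checked without reading any SMART attributes. The sketch below parses the `User Capacity` line from `smartctl -i` output and compares it with an expected byte count. It assumes comma thousands separators, as in the values quoted in this report; the function names `capacity_bytes` and `capacity_ok` are hypothetical.

```shell
# capacity_bytes: read `smartctl -i` output on stdin and print the
# "User Capacity" value in bytes with the thousands separators removed.
capacity_bytes() {
  sed -n 's/^User Capacity:[[:space:]]*\([0-9,]*\) bytes.*/\1/p' | tr -d ','
}

# capacity_ok EXPECTED_BYTES: read `smartctl -i` output on stdin and
# succeed only if the reported capacity is at least EXPECTED_BYTES.
capacity_ok() {
  local expected=$1 actual
  actual=$(capacity_bytes)
  [ -n "$actual" ] && [ "$actual" -ge "$expected" ]
}
```

For example, `smartctl -i /dev/sdb | capacity_ok 5000981078016` would fail for the drive described here, since it reports 137,438,952,960 bytes.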


Full SMART attributes (healthy drives)

/dev/sda — W4J0L0BA (mirror-0, ONLINE)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          VALUE WORST THRESH TYPE      RAW_VALUE
  1 Raw_Read_Error_Rate     119   100   006    Pre-fail  211189952
  3 Spin_Up_Time            092   091   000    Pre-fail  0
  4 Start_Stop_Count        100   100   020    Old_age   350
  5 Reallocated_Sector_Ct   100   100   010    Pre-fail  0
  7 Seek_Error_Rate         080   060   030    Pre-fail  43979429424
  9 Power_On_Hours          041   041   000    Old_age   52481
 10 Spin_Retry_Count        100   100   097    Pre-fail  0
 12 Power_Cycle_Count       100   100   020    Old_age   345
183 Runtime_Bad_Block       100   100   000    Old_age   0
184 End-to-End_Error        100   100   099    Old_age   0
187 Reported_Uncorrect      100   100   000    Old_age   0
188 Command_Timeout         100   099   000    Old_age   3 3 3
189 High_Fly_Writes         100   100   000    Old_age   0
190 Airflow_Temperature_Cel 073   058   045    Old_age   27 (Min/Max 27/28)
191 G-Sense_Error_Rate      100   100   000    Old_age   0
192 Power-Off_Retract_Count 100   100   000    Old_age   0
193 Load_Cycle_Count        001   001   000    Old_age   348974
194 Temperature_Celsius     027   042   000    Old_age   27
195 Hardware_ECC_Recovered  119   100   000    Old_age   211189952
197 Current_Pending_Sector  100   100   000    Old_age   0
198 Offline_Uncorrectable   100   100   000    Old_age   0
199 UDMA_CRC_Error_Count    200   200   000    Old_age   0
240 Head_Flying_Hours       100   253   000    Old_age   15140h+51m+12.276s
241 Total_LBAs_Written      100   253   000    Old_age   57665101118
242 Total_LBAs_Read         100   253   000    Old_age   160962549062

/dev/sdc — W4J0K9V7 (mirror-1, ONLINE)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          VALUE WORST THRESH TYPE      RAW_VALUE
  1 Raw_Read_Error_Rate     117   100   006    Pre-fail  136042192
  3 Spin_Up_Time            092   091   000    Pre-fail  0
  4 Start_Stop_Count        100   100   020    Old_age   367
  5 Reallocated_Sector_Ct   100   100   010    Pre-fail  0
  7 Seek_Error_Rate         083   060   030    Pre-fail  22512744055
  9 Power_On_Hours          040   040   000    Old_age   53087
 10 Spin_Retry_Count        100   100   097    Pre-fail  0
 12 Power_Cycle_Count       100   100   020    Old_age   363
183 Runtime_Bad_Block       100   100   000    Old_age   0
184 End-to-End_Error        100   100   099    Old_age   0
187 Reported_Uncorrect      100   100   000    Old_age   0
188 Command_Timeout         100   099   000    Old_age   6 6 12
189 High_Fly_Writes         096   096   000    Old_age   4
190 Airflow_Temperature_Cel 070   060   045    Old_age   30 (Min/Max 28/30)
191 G-Sense_Error_Rate      100   100   000    Old_age   0
192 Power-Off_Retract_Count 100   100   000    Old_age   0
193 Load_Cycle_Count        001   001   000    Old_age   340961
194 Temperature_Celsius     030   040   000    Old_age   30
195 Hardware_ECC_Recovered  117   100   000    Old_age   136042192
197 Current_Pending_Sector  100   100   000    Old_age   0
198 Offline_Uncorrectable   100   100   000    Old_age   0
199 UDMA_CRC_Error_Count    200   200   000    Old_age   0
240 Head_Flying_Hours       100   253   000    Old_age   15859h+53m+20.869s
241 Total_LBAs_Written      100   253   000    Old_age   57609506493
242 Total_LBAs_Read         100   253   000    Old_age   152392393081

/dev/sdd — W4J0LKCD (mirror-1, ONLINE)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          VALUE WORST THRESH TYPE      RAW_VALUE
  1 Raw_Read_Error_Rate     116   090   006    Pre-fail  108217848
  3 Spin_Up_Time            092   091   000    Pre-fail  0
  4 Start_Stop_Count        100   100   020    Old_age   310
  5 Reallocated_Sector_Ct   100   100   010    Pre-fail  0
  7 Seek_Error_Rate         073   051   030    Pre-fail  185584998742
  9 Power_On_Hours          048   048   000    Old_age   45580
 10 Spin_Retry_Count        100   100   097    Pre-fail  0
 12 Power_Cycle_Count       100   100   020    Old_age   309
183 Runtime_Bad_Block       100   100   000    Old_age   0
184 End-to-End_Error        100   100   099    Old_age   0
187 Reported_Uncorrect      100   100   000    Old_age   0
188 Command_Timeout         100   099   000    Old_age   8 8 14
189 High_Fly_Writes         098   098   000    Old_age   2
190 Airflow_Temperature_Cel 070   050   045    Old_age   30 (Min/Max 29/30)
191 G-Sense_Error_Rate      100   100   000    Old_age   0
192 Power-Off_Retract_Count 100   100   000    Old_age   0
193 Load_Cycle_Count        008   008   000    Old_age   184891
194 Temperature_Celsius     030   050   000    Old_age   30
195 Hardware_ECC_Recovered  116   100   000    Old_age   108217848
197 Current_Pending_Sector  100   091   000    Old_age   0
198 Offline_Uncorrectable   100   091   000    Old_age   0
199 UDMA_CRC_Error_Count    200   200   000    Old_age   0
240 Head_Flying_Hours       100   253   000    Old_age   11604h+15m+50.842s
241 Total_LBAs_Written      100   253   000    Old_age   72962800596
242 Total_LBAs_Read         100   253   000    Old_age   167268621195
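
The attribute tables above can also be checked mechanically for the four counters this audit treats as critical (IDs 5, 197, 198, and 199). This is a minimal sketch, assuming the raw value is the last field on each line, which holds both for the trimmed tables here and for standard `smartctl -A` output of these attributes; the function name `check_critical` is hypothetical.

```shell
# check_critical: read `smartctl -A` output on stdin and print NAME=RAW for
# any of attributes 5, 197, 198, 199 whose raw value is non-zero.
# Returns 1 if at least one such attribute was found, 0 otherwise.
check_critical() {
  awk '
    $1 ~ /^(5|197|198|199)$/ && $NF + 0 != 0 { print $2 "=" $NF; bad = 1 }
    END { exit bad + 0 }
  '
}
```

Usage would be `smartctl -A /dev/sda | check_critical`. For the three healthy drives in this report it would print nothing and return 0.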

How this audit was collected

On PVENAS as root:

zpool status NAS.SP00
lsblk -d -o NAME,SIZE,MODEL,SERIAL,ROTA,STATE /dev/sd{a,b,c,d}
for d in sda sdb sdc sdd; do smartctl -i -H -A /dev/$d; done

Audit timestamp (host local): Thu May 21 22:13:58 2026 EDT.
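
To make the audit repeatable, the same three commands can be wrapped so that their output is saved to files. This is a sketch rather than the procedure actually used for this report; the function name `collect_audit` and the output file names are assumptions.

```shell
# collect_audit OUTDIR: save the output of the three audit commands into
# OUTDIR, one file per command and one file per disk. Run as root.
collect_audit() {
  local outdir=$1 d
  mkdir -p "$outdir" || return 1
  zpool status NAS.SP00 > "$outdir/zpool-status.txt"
  lsblk -d -o NAME,SIZE,MODEL,SERIAL,ROTA,STATE \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd > "$outdir/lsblk.txt"
  for d in sda sdb sdc sdd; do
    # smartctl exits non-zero when a disk is failing or unreadable;
    # keep going so that every disk is still recorded.
    smartctl -i -H -A "/dev/$d" > "$outdir/smart-$d.txt" 2>&1 || true
  done
}
```

A call such as `collect_audit "/root/smart-audit-$(date +%F)"` (the path is only an example) would leave a dated directory that can be compared against the next audit.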


Next steps

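  1. Replace W4J0L3PY with an HDD of at least 5 TB, preferably NAS-class. The new disk must be no smaller than the surviving mirror-0 member (ST5000DM000-1FK178, 5,000,981,078,016 bytes).
  2. Run zpool replace NAS.SP00 11449632222283419591 /dev/disk/by-id/<new-disk>, where the GUID is the one zpool status shows for the UNAVAIL member.
  3. Monitor the resilver with zpool status NAS.SP00; once the pool is ONLINE, run zpool scrub NAS.SP00.
  4. Re-run the SMART audit after replacement to establish a clean baseline.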
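
The replacement steps above can be sketched as a single function. It defaults to a dry run that only prints the commands, and executes them only when `RUN=1` is set. The function name `replace_failed_leg` and the `RUN` switch are hypothetical, the new disk path must be supplied by the operator, and `zpool wait` is assumed to be available (OpenZFS 2.0 or later).

```shell
# replace_failed_leg NEW_DISK_BY_ID
# Prints (or, with RUN=1, executes) the replace, resilver wait, and scrub.
# The GUID is the UNAVAIL member from the zpool status output in this report.
replace_failed_leg() {
  local pool="NAS.SP00" old="11449632222283419591" new=$1
  local run=echo                        # default: dry run, print only
  [ -n "$new" ] || { echo "usage: replace_failed_leg /dev/disk/by-id/NEW" >&2; return 2; }
  [ "${RUN:-0}" = "1" ] && run=""       # RUN=1 executes the commands for real
  $run zpool replace "$pool" "$old" "$new" || return 1
  $run zpool wait -t resilver "$pool" || return 1   # blocks until the resilver finishes
  $run zpool scrub "$pool"
}
```

Running it first without `RUN=1` gives a chance to confirm the GUID and the by-id path before anything touches the pool.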