
Too much disk IO on sda in RAID10 setup - part 2

Ingo Jürgensmann - 8 January 2019 - 20:07

Some days ago I blogged about my issue with one of the disks in my server having high utilization and latency. There have been several ideas and guesses about what the reason might be, but I think I found the root cause today:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     50113         60067424
# 2  Short offline       Completed without error       00%      3346   
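
For reference, that table is simply the output of smartctl's self-test log query:

# smartctl -l selftest /dev/sda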

After the RAID-Sync-Sunday last weekend I removed sda from my RAIDs today and started a "smartctl -t long /dev/sda". This test was quickly aborted because it ran into a read error after just a few minutes. Currently I'm still running a "badblocks -w" test, and this is the result so far:

# badblocks -s -c 65536 -w /dev/sda
Testing with pattern 0xaa: done
Reading and comparing: 42731372 done, 4:57:27 elapsed. (0/0/0 errors)
42731373 done, 4:57:30 elapsed. (1/0/0 errors)
42731374 done, 4:57:33 elapsed. (2/0/0 errors)
42731375 done, 4:57:36 elapsed. (3/0/0 errors)
 46.82% done, 6:44:41 elapsed. (4/0/0 errors)
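
For completeness, taking the disk out of the arrays before running those tests is the usual mdadm procedure. The array and partition names below are only an example and not copied from my setup:

# mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
# mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
# cat /proc/mdstat     # check that the arrays keep running, now degraded, without sda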

Long story short: I already ordered a replacement disk!

But what's also interesting is this:

I removed the disk today at approx. 12:00 and you can see the immediate effect on the other disks/LVs (the high blue graph for sda shows the badblocks test), even though the RAID10 is now running in degraded mode. It is interesting what effect (currently) 4 defective blocks can have on RAID10 performance without smartctl taking notice of it. Smartctl only reported an issue after I issued the self-test. It's also strange that the latency and high utilization increased slowly over time, over the course of roughly six months.
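
Once the replacement disk is installed, it should go back into the arrays the same way in reverse. Again just a sketch with example device names; the partition table has to be copied from one of the remaining disks first:

# sfdisk -d /dev/sdb | sfdisk /dev/sda     # replicate the partition layout of a healthy disk
# mdadm /dev/md0 --add /dev/sda1
# mdadm /dev/md1 --add /dev/sda2
# cat /proc/mdstat                         # watch the resync progress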

Category: Debian | Tags: Debian, Server, Hardware, Performance

Too much disk IO on sda in RAID10 setup

Ingo Jürgensmann - 5 January 2019 - 17:37

I have a RAID10 setup with 4x 2 TB WD Red disks in my server. Although the setup works fairly well and has enough throughput, there is one strange issue: /dev/sda has more utilization/load than the other 3 disks. See the blue line in the following graph, which represents the weekly utilization of sda:

As you can see from the graphs and from the numbers below, sda has a 2-3 times higher utilization than sdb, sdc or sdd, especially when looking at the disk latency graph from Munin:

Although the graphs are a little confusing, you can easily spot the big difference from the values below. And it's not only Munin showing this strange behaviour of sda, but also atop:

Here you can see that sda is 94% busy although its writes are a little lower than on the other disks. The atop screenshot was taken before I moved MySQL/MariaDB to my NVMe disk 4 weeks ago. You can also see that sda is slowing down the RAID10.
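
If you want to double-check such numbers without Munin or atop, iostat from the sysstat package reports the same per-disk values on the command line (%util and await roughly correspond to the utilization and latency graphs). A quick sketch:

# apt install sysstat
# iostat -dx 5 sda sdb sdc sdd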

So the big question is: why is the utilization and latency of sda that high? It's the same disk model as the other disks. All disks are connected to a Supermicro X9SRi-F mainboard. The first two SATA ports are 6 Gbit/s, the other 4 ports are 3 Gbit/s ports:

sda:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301414725
LU WWN Device Id: 5 0014ee 65887fe2c
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jan  5 17:24:58 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

sdb:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301414372
LU WWN Device Id: 5 0014ee 65887f374
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jan  5 17:27:01 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

sdc:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301411045
LU WWN Device Id: 5 0014ee 603329112
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jan  5 17:30:15 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

sdd:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD20EFRX-68AX9N0
Serial Number:    WD-WMC301391344
LU WWN Device Id: 5 0014ee 60332a8aa
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jan  5 17:30:26 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The disks even run the same firmware version. Usually I would have expected slower disk IO from the disks on the 3 Gbit/s ports (sdc & sdd), but not from sda. All disks are configured in the BIOS to use AHCI.
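
To compare the negotiated link speeds of all four disks at a glance, a small loop around smartctl does the job; this is just a convenience wrapper around the output shown above:

# for d in sda sdb sdc sdd; do echo -n "$d: "; smartctl -i /dev/$d | grep 'SATA Version'; done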

I cannot explain why sda has higher latency and more utilization than the other disks. Any ideas are welcome and appreciated. You can also reach me in the Fediverse (Friendica: https://nerdica.net/profile/ij, Mastodon: @ij on https://nerdculture.de/) or via XMPP at ij@nerdica.net.

Category: Debian | Tags: Debian, Server, RAID, Performance

Adding NVMe to Server

Ingo Jürgensmann - 15 December 2018 - 18:45

My server runs on a RAID10 of 4x 2 TB WD Red disks. Basically those disks are fast enough to cope with the disk load of the virtual machines (VMs). But since many users have moved away from Facebook and Google, my Friendica installation on Nerdica.Net has a growing user count, putting a large disk I/O load with many small reads & writes on the disks and slowing down the general disk I/O for all the VMs and the server itself. On mdraid-sync-Sunday this month the server needed two full days to sync its RAID10.

So the idea was to move the high disk I/O load off the rotational disks onto something different. For that reason I bought a Samsung 970 Pro 512 GB NVMe disk and a matching PCIe 3.0 adapter card for my server in the colocation. On Thursday the Samsung was installed by the rrbone staff at the colocation. I then moved the PostgreSQL and MySQL databases from the RAID10 to the NVMe disk and restarted the services.
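
Roughly, such a move boils down to the following steps. The mount point and the use of bind mounts are just one way of doing it, not necessarily exactly what I did on the server:

# systemctl stop mariadb postgresql
# mount /dev/nvme0n1p1 /mnt/nvme                          # assumes the NVMe is already partitioned and formatted
# rsync -a /var/lib/mysql/ /mnt/nvme/mysql/
# rsync -a /var/lib/postgresql/ /mnt/nvme/postgresql/
# mount --bind /mnt/nvme/mysql /var/lib/mysql             # or point datadir/data_directory at the new location instead
# mount --bind /mnt/nvme/postgresql /var/lib/postgresql
# systemctl start mariadb postgresql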

Here are some results from Munin monitoring: 

Disk Utilization

Here you can see how the disk utilization dropped after the NVMe installation. The red coloured bar marks the average utilization of the RAID10 disks before the move, and the green bar the same RAID10 after the databases were moved to the NVMe disk. There's roughly 20% less utilization now, which is good.

Disk Latency

Here you can see the same coloured bars for the disk latency. The latency has dropped by about a third.

CPU I/O wait

Maybe the most significant graph is the CPU graph, where you could previously see a large portion of CPU time spent in iowait. That is no longer the case: there is apparently no significant iowait anymore, thanks to the low latency and high IOPS of SSD/NVMe storage.

Overall I cannot confirm that adding the NVMe disk results in significantly faster page loads for Friendica or Mastodon, maybe because other measures like Redis/Memcached or pgbouncer already helped a lot before the NVMe disk arrived. But it helps a lot with the general disk I/O load and improves disk speeds inside the VMs, for example for my regular backups and such.

Ah, one thing to report: in a quick test pgbench now reports >2200 tps on the NVMe. That at least is a real speed improvement, maybe by an order of magnitude or so.
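
For reference, a pgbench run of that kind looks roughly like this; the scale factor and client count are assumptions, not the exact values from my quick test, so treat the >2200 tps only as a ballpark figure:

# su - postgres
$ createdb pgbench
$ pgbench -i -s 100 pgbench     # initialize the test tables
$ pgbench -c 10 -T 60 pgbench   # 10 clients for 60 seconds, reports tps at the end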

Category: Debian | Tags: Debian, Server, Hardware