Toshiba Canvio Alu - Beware of use in mdadm RAID or OS configurations, it goes to sleep

2016-02-01 20:37:42

Like many others who have bought a Toshiba Canvio external USB disk, I have stumbled upon the sleep mode problem this disk family has.

I bought it to use in a Linux software RAID1 setup on Debian Jessie 8.3, and I connected it to a USB 3 port with a USB 3 cable in USB 3 mode. Setup was a breeze: Debian accepted the disk without any problems, and adding the disk to the RAID1 with mdadm went through without any unusual events. Speed was good too, up to about 100 MB/s according to hdparm -tT.

mdadm completed its recovery process according to /proc/mdstat and the RAID1 looked fine. 
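
A quick way to keep an eye on this state is to scan /proc/mdstat for arrays with a missing member. The following is a minimal sketch, not part of the original setup: the function name md_degraded is made up for illustration, and it assumes the usual mdstat layout where a degraded RAID1 shows a status such as [U_].

```shell
#!/bin/sh
# Print the name of every md array whose member status contains "_",
# for example [U_] or [_U]. Reads /proc/mdstat unless a file is given.
# md_degraded is a hypothetical helper name.
md_degraded() {
    awk '
        /^md[0-9a-z_]* :/   { name = $1 }   # remember the current array
        /\[[U_]*_[U_]*\]/   { print name }  # status with a missing member
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments it prints nothing while all arrays are complete. An array that is still recovering also counts as degraded, since its status shows a missing member until the rebuild has finished.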

Then, after only 5 minutes of idle time, it happened. The disk went into such a deep sleep that it disconnected from the system. Here is an example from dmesg -T:

[fre jan 29 13:28:40 2016] usb 2-1: Disable of device-initiated U1 failed.
[fre jan 29 13:28:45 2016] usb 2-1: Disable of device-initiated U2 failed.
[fre jan 29 13:28:49 2016] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
[fre jan 29 13:28:53 2016] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
[fre jan 29 13:28:57 2016] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
[fre jan 29 13:29:01 2016] usb usb2-port1: Cannot enable. Maybe the USB cable is bad?
[fre jan 29 13:29:01 2016] usb 2-1: USB disconnect, device number 2
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb] Unhandled error code
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb]  
[fre jan 29 13:29:01 2016] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb] CDB: 
[fre jan 29 13:29:01 2016] Write(10): 2a 00 6c 7a fa 00 00 00 80 00
[fre jan 29 13:29:01 2016] end_request: I/O error, dev sdb, sector 1819998720
[fre jan 29 13:29:01 2016] end_request: I/O error, dev sdb, sector 2056
[fre jan 29 13:29:01 2016] md: super_written gets error=-5, uptodate=0
[fre jan 29 13:29:01 2016] md/raid1:md0: Disk failure on sdb1, disabling device.
md/raid1:md0: Operation continuing on 1 devices.
[fre jan 29 13:29:01 2016] end_request: I/O error, dev sdb, sector 11718664
[fre jan 29 13:29:01 2016] md: super_written gets error=-5, uptodate=0
[fre jan 29 13:29:01 2016] md/raid1:md2: Disk failure on sdb3, disabling device.
md/raid1:md2: Operation continuing on 1 devices.
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb] Unhandled error code
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb]  
[fre jan 29 13:29:01 2016] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb] CDB: 
[fre jan 29 13:29:01 2016] Write(10): 2a 00 6c 7a fa 80 00 00 80 00
[fre jan 29 13:29:01 2016] end_request: I/O error, dev sdb, sector 1819998848
[fre jan 29 13:29:01 2016] sd 2:0:0:0: [sdb]  
[fre jan 29 13:29:01 2016] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[fre jan 29 13:29:01 2016] md: md2: recovery interrupted.
[fre jan 29 13:29:01 2016] RAID1 conf printout:
[fre jan 29 13:29:01 2016]  --- wd:1 rd:2
[fre jan 29 13:29:01 2016]  disk 0, wo:0, o:1, dev:sda1
[fre jan 29 13:29:01 2016]  disk 1, wo:1, o:0, dev:sdb1
[fre jan 29 13:29:01 2016] RAID1 conf printout:
[fre jan 29 13:29:01 2016]  --- wd:1 rd:2
[fre jan 29 13:29:01 2016]  disk 0, wo:0, o:1, dev:sda1
[fre jan 29 13:29:02 2016] md: recovery of RAID array md1
[fre jan 29 13:29:02 2016] md: minimum _guaranteed_  speed: 50000 KB/sec/disk.
[fre jan 29 13:29:02 2016] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[fre jan 29 13:29:02 2016] md: using 128k window, over a total of 976320k.
[fre jan 29 13:29:02 2016] md: super_written gets error=-19, uptodate=0
[fre jan 29 13:29:02 2016] md/raid1:md1: Disk failure on sdb2, disabling device.
md/raid1:md1: Operation continuing on 1 devices.
[fre jan 29 13:29:02 2016] md: super_written gets error=-19, uptodate=0
[fre jan 29 13:29:02 2016] md: md1: recovery interrupted.
[fre jan 29 13:29:02 2016] usb 2-1: new SuperSpeed USB device number 3 using xhci_hcd
[fre jan 29 13:29:02 2016] usb 2-1: New USB device found, idVendor=0480, idProduct=a100
[fre jan 29 13:29:02 2016] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[fre jan 29 13:29:02 2016] usb 2-1: Product: External USB 3.0
[fre jan 29 13:29:02 2016] usb 2-1: Manufacturer: TOSHIBA
[fre jan 29 13:29:02 2016] usb 2-1: SerialNumber: ---------------
[fre jan 29 13:29:02 2016] usb-storage 2-1:1.0: USB Mass Storage device detected
[fre jan 29 13:29:02 2016] scsi3 : usb-storage 2-1:1.0
[fre jan 29 13:29:02 2016] RAID1 conf printout:
[fre jan 29 13:29:02 2016]  --- wd:1 rd:2
[fre jan 29 13:29:02 2016]  disk 0, wo:0, o:1, dev:sda3
[fre jan 29 13:29:02 2016]  disk 1, wo:1, o:0, dev:sdb3
[fre jan 29 13:29:02 2016] RAID1 conf printout:
[fre jan 29 13:29:02 2016]  --- wd:1 rd:2
[fre jan 29 13:29:02 2016]  disk 0, wo:0, o:1, dev:sda3
[fre jan 29 13:29:02 2016] RAID1 conf printout:
[fre jan 29 13:29:02 2016]  --- wd:1 rd:2
[fre jan 29 13:29:02 2016]  disk 0, wo:0, o:1, dev:sda2
[fre jan 29 13:29:02 2016]  disk 1, wo:1, o:0, dev:sdb2
[fre jan 29 13:29:02 2016] RAID1 conf printout:
[fre jan 29 13:29:02 2016]  --- wd:1 rd:2
[fre jan 29 13:29:02 2016]  disk 0, wo:0, o:1, dev:sda2
[fre jan 29 13:29:03 2016] scsi 3:0:0:0: Direct-Access     TOSHIBA  External USB 3.0 0    PQ: 0 ANSI: 6
[fre jan 29 13:29:03 2016] sd 3:0:0:0: Attached scsi generic sg1 type 0
[fre jan 29 13:29:03 2016] sd 3:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[fre jan 29 13:29:03 2016] sd 3:0:0:0: [sdc] Write Protect is off
[fre jan 29 13:29:03 2016] sd 3:0:0:0: [sdc] Mode Sense: 43 00 00 00
[fre jan 29 13:29:03 2016] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[fre jan 29 13:29:03 2016]  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 >
[fre jan 29 13:29:03 2016] sd 3:0:0:0: [sdc] Attached SCSI disk

Note the "Disable of device-initiated U<X> failed" messages at the beginning. U1 and U2 are USB 3.0 link power management states. The kernel tells the drive to stay awake, but it does not listen.

This observation matches well with reports online, where people complain that the disk does not obey the configuration programs they throw at it in Windows.

Also note that the disk comes back under another letter: it goes from /dev/sdb to /dev/sdc. Sometimes when it went into sleep mode it did not come back at all. It was gone, as if someone had pulled the cord. lsusb did not see it, nor did dmesg report anything about a reconnection. The only way to get the correct letter back was to reboot. None of this makes mdadm happy, and every sleep attempt resulted in the need for a RAID recovery process.
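
Because the kernel name can change after every reconnect, it may help to address the disk through a stable symlink instead of a drive letter. This is only a sketch: resolve_part is a made-up helper, the by-id path in the usage note is a placeholder, and it assumes udev's usual -partN suffix for partition symlinks under /dev/disk/by-id.

```shell
#!/bin/sh
# Resolve partition N of a disk, given the disk's stable symlink
# (for example an entry under /dev/disk/by-id), to its current kernel
# device node. resolve_part is a hypothetical helper name.
resolve_part() {
    # $1: stable symlink of the whole disk, $2: partition number
    readlink -f "$1-part$2"
}
```

With that, something like mdadm /dev/md0 --re-add "$(resolve_part /dev/disk/by-id/<id of the toshiba disk> 1)" could put the partition back into its array whether the disk came back as sdb or sdc. This is untested on the Canvio, and the failed member may have to be removed from the array first.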

Some people have contacted Toshiba and were told in reply that this very aggressive power saving mode cannot be tamed. The only well-known way around the problem is to have something read from, and maybe also write to, the disk periodically, with a period of at most 5 minutes. Yes, five minutes, not even ten. After five minutes it goes to sleep.

In Windows there are programs like NoSleepHD and KeepAliveHD that write a small amount of data to the disk every X minutes.

Of course this workaround can also be achieved in Linux. Here are two suggestions to put into /etc/crontab in Debian (use only one of them):

# read from beginning of disk every third minute
# (cron runs commands through /bin/sh, so use "> /dev/null 2>&1" rather than the bash-only "&>")
*/3 * * * * root dd if=/dev/sd<letter for the toshiba disk> of=/dev/null count=1 bs=1 > /dev/null 2>&1

# make an empty file every third minute
*/3 * * * * root /usr/bin/touch /mounted/partition/on/toshiba/disk/keepalive > /dev/null 2>&1

The first one seemed to last for almost a night before the disk disconnected itself. My guess is that the computer got busy and stretched the 3 minutes to 5 minutes. Or maybe you need to write as well to get it to stay awake.
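
One more possible explanation, offered as an assumption rather than something verified on this drive: a repeated read of the same first byte can be answered from the Linux page cache, so the request may never reach the disk at all. The sketch below shows two variants that try to force real I/O. The function names and the .keepalive file name are made up. iflag=direct is GNU dd's flag for bypassing the page cache. Whether either variant actually resets the Canvio's idle timer is untested.

```shell
#!/bin/sh
# Two hypothetical keep-alive helpers meant to be called from cron.

# Read one 4 KiB block straight from the device, bypassing the page
# cache (O_DIRECT needs a block-aligned size, hence bs=4096, not bs=1).
keepalive_read() {
    dd if="$1" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null
}

# Write a timestamp to a file on a mounted partition and flush it,
# so the write is not left sitting in memory.
keepalive_write() {
    date +%s > "$1/.keepalive" && sync
}
```

keepalive_read takes the device node, for example /dev/sd<letter for the toshiba disk>, and keepalive_write takes the mount point of a partition on the disk.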

This workaround of keeping the disk busy adds unnecessary wear to it and also costs money on the electricity bill. It is quite ironic how an aggressive power saving feature actually makes the end user consume more power in order to fight the power saving feature. Good job, Toshiba.

Connecting the disk to a USB 2 port did not show these problems; it stayed awake. Sadly it reduced the speed from 100 MB/s to around 30 MB/s, which makes it no fun to add to the RAID1 anyway if you have normal SATA disks too, as having such a slow device attached probably slows down write operations.

A more sophisticated way to approach the problem is through parameters: to actually tell the disk firmware to stop its sleepyhead behaviour. That's where hdparm in Linux enters the scene.

hdparm -S0 /dev/sd<X> should disable standby spindowns completely. The drive does not complain when the command is submitted, but it does not listen either. After a few minutes of idle time, hdparm -C /dev/sd<X> reported that it was in standby mode.

hdparm -B254 /dev/sd<X> should set advanced power management (APM) to its least aggressive level, the highest value that still leaves APM enabled and one that should not permit spin-down. (hdparm's longest fixed standby timeout, about 5.5 hours, is set through -S251 rather than -B.) But the disk does not listen to that either. Maybe you get some more time, but it disconnected anyway after just one night's run with a lot of disk-intensive cron jobs.

hdparm -B255 /dev/sd<X>, which should disable APM completely, does not work either. It almost looked as if it worked, but no, it disconnected anyway. Another thing to note is that issuing the command results in a rush of load cycles, that is, the disk parking its heads. You can check the count with smartctl -A -d sat /dev/sd<X>. For some reason this drive is very aggressive with its head parking mechanism when APM is disabled.
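
To follow the counter over time, the raw value can be picked out of the smartctl attribute table. A small sketch, assuming the usual ten-column layout of smartctl -A, where the attribute name is the second column and the raw value the tenth. load_cycle_count is a made-up function name.

```shell
#!/bin/sh
# Print the raw Load_Cycle_Count from "smartctl -A" output on stdin.
# load_cycle_count is a hypothetical helper name.
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10 }'
}
```

Usage would be along the lines of smartctl -A -d sat /dev/sd<X> | load_cycle_count, for instance from a cron job that appends the number to a log file.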

When the drive had been on for 91 hours it had a load cycle count of 1550.
Another internal 2.5" disk from Hitachi had been on for 5288 hours and had a load cycle count of 601.

1550 / 91 = 17 load cycles per hour, versus 601 / 5288 = 0.11 load cycles per hour, or 5288 / 601 = 8.8 hours between load cycles.

The number of load cycles the disk can sustain is unknown, so let's say 600 000.

600 000 cycles / 17 per hour = 35 294 hours
35 294 hours / 24 hours per day = 1 470.6 days
1 470.6 days / 365 days per year = about 4 years of disk life
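
The same estimate as a small calculation, so other drives can be plugged in. This is a sketch with a made-up function name. The 600 000 figure remains the guess from above, not a rated value.

```shell
#!/bin/sh
# Estimate load cycles per hour and the years until an assumed cycle
# budget is used up. load_cycle_life is a hypothetical helper name.
# $1: load cycle count, $2: power-on hours, $3: assumed cycle budget
load_cycle_life() {
    LC_ALL=C awk -v c="$1" -v h="$2" -v max="$3" 'BEGIN {
        rate  = c / h                    # load cycles per hour
        years = max / rate / 24 / 365    # hours -> days -> years
        printf "%.1f cycles/hour, %.1f years\n", rate, years
    }'
}

load_cycle_life 1550 91 600000   # prints "17.0 cycles/hour, 4.0 years"
```

The first two arguments are the Load_Cycle_Count and Power_On_Hours raw values reported by smartctl.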

Here is some more interesting dmesg output from this disk during a sleepyhead disconnection runaround:

[mån feb  1 16:21:14 2016] usb 2-1: new SuperSpeed USB device number 3 using xhci_hcd
[mån feb  1 16:21:14 2016] usb 2-1: New USB device found, idVendor=0480, idProduct=a100
[mån feb  1 16:21:14 2016] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[mån feb  1 16:21:14 2016] usb 2-1: Product: External USB 3.0
[mån feb  1 16:21:14 2016] usb 2-1: Manufacturer: TOSHIBA
[mån feb  1 16:21:14 2016] usb 2-1: SerialNumber: --------------------
[mån feb  1 16:21:14 2016] usb-storage 2-1:1.0: USB Mass Storage device detected
[mån feb  1 16:21:14 2016] scsi3 : usb-storage 2-1:1.0
[mån feb  1 16:21:15 2016] scsi 3:0:0:0: Direct-Access     TOSHIBA  External USB 3.0 0    PQ: 0 ANSI: 6
[mån feb  1 16:21:15 2016] sd 3:0:0:0: Attached scsi generic sg1 type 0
[mån feb  1 16:21:15 2016] sd 3:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[mån feb  1 16:21:15 2016] sd 3:0:0:0: [sdc] Write Protect is off
[mån feb  1 16:21:15 2016] sd 3:0:0:0: [sdc] Mode Sense: 43 00 00 00
[mån feb  1 16:21:15 2016] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[mån feb  1 16:21:15 2016]  sdc: sdc1 sdc2 sdc3
[mån feb  1 16:21:15 2016] sd 3:0:0:0: [sdc] Attached SCSI disk
[mån feb  1 16:29:28 2016] xhci_hcd 0000:00:14.0: remove, state 1
[mån feb  1 16:29:28 2016] usb usb2: USB disconnect, device number 1
[mån feb  1 16:29:28 2016] usb 2-1: USB disconnect, device number 3
[mån feb  1 16:29:28 2016] usb 2-1: Failed to set U1 timeout to 0x0,error code -19
[mån feb  1 16:29:28 2016] usb 2-1: Failed to set U1 timeout to 0x32,error code -19
[mån feb  1 16:29:28 2016] usb 2-1: Failed to set U2 timeout to 0x28,error code -19
[mån feb  1 16:29:28 2016] sd 3:0:0:0: [sdc] Synchronizing SCSI cache
[mån feb  1 16:29:28 2016] sd 3:0:0:0: [sdc]  
[mån feb  1 16:29:28 2016] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[mån feb  1 16:29:28 2016] usb 2-1: Failed to set U1 timeout to 0x0,error code -19
[mån feb  1 16:29:28 2016] usb 2-1: Failed to set U1 timeout to 0x1e,error code -19
[mån feb  1 16:29:28 2016] usb 2-1: Failed to set U2 timeout to 0x28,error code -19
[mån feb  1 16:29:28 2016] xhci_hcd 0000:00:14.0: USB bus 2 deregistered
[mån feb  1 16:29:28 2016] xhci_hcd 0000:00:14.0: remove, state 1
[mån feb  1 16:29:28 2016] usb usb1: USB disconnect, device number 1
[mån feb  1 16:29:28 2016] usb 1-2: USB disconnect, device number 2
[mån feb  1 16:29:28 2016] xhci_hcd 0000:00:14.0: USB bus 1 deregistered

"Failed to set U1 timeout to xxx,error code -19". Error code 19 is ENODEV, "No such device", according to:
http://www.virtsync.com/c-error-codes-include-errno

Definitions of U1 and U2 states are found here:
http://www.eightforums.com/tutorials/50276-power-options-add-remove-usb-3-link-power-mangement.html

U1 is standby with fast exit and U2 is standby with slower exit. A correctly implemented disk should first try U1 and then U2, and it should obey when the operating system asks it to go back to U0, which the operating system does. My guess is that it falls through U1 to U2 within a mad limit of five minutes of idle time.

The problem is quite clear: the firmware in Toshiba Canvio disks does not listen to requests to turn off USB 3.0 power management, and the power management it implements does NOT follow the USB 3.0 link power management specification.

More on this and some possible solutions:
http://forums.toshiba.com/t5/Computer-Accessories/Canvio-sleep-function/td-p/347620
http://chrisjrob.com/2015/11/27/intermittent-usb3-drive-mount-continued/

More on link power management in USB 3.0:
https://www.kernel.org/doc/Documentation/usb/power-management.txt
https://msdn.microsoft.com/en-us/library/windows/hardware/dn379332(v=vs.85).aspx

Stay away from the Toshiba Canvio Alu if you are planning to use it for a RAID setup or as an operating system disk in Debian 8.3 over USB 3 connections. I also suggest you stay away from the whole Toshiba Canvio series, as this sleeping problem seems to cover the whole disk family.

Update 2019-06-09 13:48:00

I can confirm that the Western Digital Elements Portable 2TB works well as a replacement. It does not spin down and has been used in RAID for 3 years now. One thing to note is that this drive also parks its heads excessively, and it does not listen to hdparm.

What does work, however, is managing the APM setting through smartctl, although I only managed to turn it off.

Get the APM setting: smartctl -g apm /dev/sdX
Set the APM setting: smartctl -s apm,off /dev/sdX (replace off with 1-127 to allow spin down or 128-254 to not allow)
This is a personal note. Last updated: 2019-06-09 14:06:00.