Zepper wrote:
Using chkdsk now... ;)
CHKDSK will not fix this problem. All that's going to do is potentially make your situation worse depending on how important the data is on the drive right now. *sighs* The only option you have given this drive's bad condition is to open up an RMA with Seagate and have the product replaced. The enclosure and the drive are
a single product. Seagate, assuming the product is still under warranty (their site can determine this for you), will replace the entire product free of charge.
As for the individual attributes that are of concern -- and I'm having to go off of what HD Tune Pro shows rather than smartmontools, so my ability to reliably decode this is somewhat limited -- they are decoded below.
Please ignore the "warning"/yellow labels in HD Tune Pro -- the author of this software does not fully understand that a non-zero value in some attributes DOES NOT indicate trouble (which furthers my point about people needing to know how to decode the data properly). Also be aware that the assumption that SMART attributes are all zeroed from the factory on new drives is false.
Also be aware that even the descriptions
of SMART attributes on Wikipedia are wrong. For example, attribute 197 says "If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased" -- that is
completely false for many models of drives (ex. all Western Digital drives I've ever used and analysed, including present-day ones). So you can't entirely trust that either.
Attribute 1 (0x01) -- with Seagate drives sometimes this attribute is vendor-encoded and other times it's a mix between a "rate" and a counter. Therefore, sometimes a non-zero value here can indicate repeated re-read attempts done by the drive itself (the storage layer has no idea what's going on under the hood) where eventually the drive is successful in reading data from a physical sector. Whether or not that's the case on this specific model I do not know (I'm not familiar with this exact model).
Attribute 5 (0x05) -- indicates there have been 88 successful LBA remaps during its power-on lifetime. An LBA is simply an arbitrary number that acts as a pointer to a physical sector on the disk. Disks made in the past 15-20 years use LBA addressing, thus a computer accesses data on a drive via an LBA, not a sector (the OS has no concept of what physical sector an LBA points to). So what this counter indicates is that there have been 88 events where an LBA now points to a spare sector. More on what that actually "means" down below.
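To make the LBA-vs-sector distinction concrete, here's a tiny sketch (Python; the function name is mine) of the only arithmetic the host side ever performs -- everything below this is the drive's private business:

```python
def lba_to_byte_offset(lba, sector_size=512):
    """Byte offset on the device for a given LBA. 512 is the classic
    logical sector size; many newer drives expose 4096 instead. The
    host never knows which physical sector this actually lands on."""
    return lba * sector_size

# e.g. LBA 2048 on a 512-byte-sector drive starts 1 MiB into the device:
print(lba_to_byte_offset(2048))  # 1048576
```

After a remap, that same offset silently resolves to a spare physical sector inside the drive -- the number the host uses never changes.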
Attribute 9 (0x09) -- indicates the number of power-on hours of the drive. On this model, the counter represents hours, thus 1613 hours =~ roughly 67 days of power-on time. This is a fairly new, or at least fairly unused drive.
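The attribute 9 arithmetic, spelled out (trivial, but it trips people up on models where the raw value is minutes or seconds instead of hours -- on this model it's hours):

```python
poh_hours = 1613            # raw value of attribute 9 on this drive
poh_days = poh_hours / 24   # this model counts hours, so just divide
print(round(poh_days, 1))   # ~67.2 days of power-on time
```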
Attribute 11 (0x0B) -- indicates the drive has had 5 physical recalibration events during its power-on lifetime. Because this is a portable drive, this is more common/more likely than if it was a drive in a stationary system (ex. desktop). However, I would classify 5 full recalibrations during such a short power-on lifetime as an indicator of something physical going on within the drive. It may have been dropped, jostled or incorrectly assembled at the factory (despite QC/QA). All are possibilities, and all are speculative.
Attribute 191 (0xBF) -- indicates the drive has had 5 shock events during its power-on lifetime. This number correlates with Attribute 11 above. "Shock events", or "G-shock", are indicators that the drive itself was dropped or jostled while it was on. (The drive cannot count these types of events when power is off). One of the problems with portable drives is that their G-shock sensors are extremely sensitive; I've seen these attributes increment on 2.5" Western Digital drives in laptops simply by someone picking the laptop up and putting it back down on a flat desk. However, these kinds of physical movements can in fact jostle the actuator arm and heads to the point where misalignment can take place. Remember: hard disk R/W heads [url]are literally floating a few
nanometres above the surface of the platters[/url]. Whether or not these physical events caused damage to the platters, inducing LBA remaps, is impossible for me to tell (especially with HD Tune Pro; I might have a better idea with smartmontools).
Attribute 194 (0xC2) -- this value is vendor-encoded on Seagate drives and cannot be decoded using HD Tune Pro. I believe smartmontools can decode this. This value should be ignored for this analysis.
Attribute 195 (0xC3) -- most Seagate drives have this attribute as non-zero, and it is vendor-encoded. This is the first time I have seen a Seagate drive show a 0 value for this attribute. I'm noting it here because it's a good indicator of how each drive model and firmware version changes in behaviour vs. comparable models. Normally this attribute indicates a count or possibly a rate of how often sector-level ECC has to be used to autocorrect data read from a sector (each actual sector on a hard disk contains an ECC region, alongside data and some metadata).
Attribute 196 (0xC4) -- Relates to attribute 5, indicating that there were 88 reallocation event counts. Note that this number does not necessarily have to equal that of attribute 5; this is an "event count", which does not necessarily guarantee an actual LBA remap. HD Tune Pro mislabels this attribute, sadly.
Attribute 197 (0xC5) -- indicates there are 64 "suspect" LBAs that are pending analysis. This explanation is long, so get some coffee or whatever.
During a read operation, a drive can have problems reading data from an LBA (which points to a physical sector); dust on the platters, head misalignment, spindle motor problems, actuator arm is slightly off, the list is endless. The drive internally (OS has no idea) will attempt to re-read the LBA an arbitrary number of times (varies per firmware implementation), and once reaching a retry threshold, will mark the LBA "suspect" and move on.
"Suspect" means the LBA can no longer be read by anything -- including the OS. You'll receive an I/O error when attempting to read from it.
It does not mean the physical sector the LBA points to is bad/unusable, it just means that at that moment in time the drive could not read data from that LBA (and the drive will no longer let anything read that LBA).
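As a sketch of what the storage layer actually sees when it hits a "suspect" LBA (Python; the function name and device path are mine, and opening a raw device needs root -- on a healthy LBA this simply returns data):

```python
import errno
import os

def try_read_lba(fd, lba, sector_size=512):
    """Attempt to read one logical sector via its LBA. A 'suspect' LBA
    produces an unrecoverable I/O error (EIO) instead of data."""
    try:
        return os.pread(fd, sector_size, lba * sector_size)
    except OSError as e:
        if e.errno == errno.EIO:
            return None  # drive refused the read; data at this LBA is inaccessible
        raise

# Hypothetical usage (/dev/sdb is a placeholder, needs root):
#   fd = os.open("/dev/sdb", os.O_RDONLY)
#   if try_read_lba(fd, 123456789) is None:
#       print("suspect/unreadable LBA")
```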
The data at that LBA is effectively lost. You cannot get that data back, aside from one possibility: taking the drive to a data recovery company (specifically one who does physical data recovery, as in moves the platters to a donor drive or takes physical repair action). This requires that you have issued absolutely NO WRITES to the drive. That's very hard to guarantee too, because Windows writes crap to a disk under the hood all the time; you have no idea what it's doing. And I'll explain why the "DO NOT WRITE TO THE DRIVE!" part matters:
A "suspect" LBA is only re-analysed (to determine if the sector the LBA points to is actually good/usable or not, or if the LBA should be mapped to a spare sector)
on a write. So in some cases, yes, you can literally write to all the "suspect" LBAs on a drive and the number shown in attribute 197 will decrease as each sector at that LBA is deemed usable. Of course, because you're writing data to the LBA, if successful, the data you just wrote will (naturally) overwrite whatever was there, but at least there wasn't an LBA remap. (Also note: whether or not an LBA remap occurs, attribute 196 will not decrement, hence why it's an "event counter" rather than a "remap counter").
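A sketch of that destructive re-analysis trigger (Python, function name is mine; assumes you've already identified the suspect LBA and opened the raw device, and that data recovery is off the table -- this overwrites whatever was there):

```python
import os

def rewrite_lba_with_zeros(fd, lba, sector_size=512):
    """Overwrite one logical sector with zeros. On a 'suspect' LBA the
    drive re-analyses the underlying physical sector during this write:
    attribute 197 decrements, and either the sector is kept (no remap)
    or the LBA is pointed at a spare (attribute 5 increments).
    DESTRUCTIVE: any data previously at this LBA is gone."""
    written = os.pwrite(fd, b"\x00" * sector_size, lba * sector_size)
    if written != sector_size:
        raise IOError("short write at LBA %d" % lba)
```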
A common technique I use to "test" drives in states where there are a very low (say 1 or 2) number of "suspect" LBAs is to simply write zeros to the LBAs which cannot be read (figuring out which LBA numbers to use is something the drive itself can do, believe it or not -- a SMART selective test, and on some drives a SMART short or SMART long test, can be used to get LBA numbers per results in the SMART self-test log -- HD Tune Pro does not support this, and it's a tedious/complex operation I won't describe here, but I use it regularly to do data recovery for people).
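As a rough sketch of the "get LBA numbers from the drive itself" part (Python; the sample text below is fabricated, but the column layout matches what `smartctl -l selftest` from smartmontools prints -- failed self-tests report the first LBA that errored):

```python
def failed_lbas_from_selftest_log(log_text):
    """Extract LBA_of_first_error values from 'smartctl -l selftest'
    output. Tests that completed without error show '-' in that column."""
    lbas = []
    for line in log_text.splitlines():
        if not line.lstrip().startswith("#"):
            continue  # only numbered log entries carry results
        last_field = line.split()[-1]
        if last_field != "-":
            lbas.append(int(last_field))
    return lbas

# Fabricated example log:
sample = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      1613         123456789
# 2  Extended offline    Completed without error       00%      1500         -
"""
print(failed_lbas_from_selftest_log(sample))  # [123456789]
```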
Now you see why ANY writes to the drive can potentially mean data loss if you are in fact wanting data recovery, particularly if the write hits a "suspect" LBA.
I'll use this opportunity to point something out: LBA numbers shown in OSes/within software (particularly on Windows) do not always map 1:1 with the LBA numbers used by a drive. They SHOULD map 1:1, but I have personally experienced many occasions where the OS has claimed LBA xyz is unreadable when in fact the LBA is some arbitrary number lower or higher than what the OS claims. I believe this is caused by storage drivers (ex. SATA/AHCI drivers) which use NCQ to report the incorrect LBA number (i.e. a driver bug). This is why I prefer to use the drive's own analysis tools (at the SMART level) to give me numbers.
And one more thing, more relevant I think: determining what file on a filesystem uses what LBA number is extremely painful on Windows (on Linux and FreeBSD it's a bit easier, but it depends on the filesystem (ext3 vs. reiserfs vs. UFS/FFS vs. ZFS)). Windows is a complete dick about this. There are speculations that fsutil can provide this on Vista or 7 or 8 (not XP), but the few times I've used it, the numbers it's given are wrong / don't match reality. So I think it might actually be giving an NTFS cluster offset/number, which IS NOT the same as an LBA.
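For illustration only (Python; the function name, 4K-cluster assumption, and example numbers are all mine), the missing offset and scaling are exactly why a raw cluster number is not an LBA:

```python
def ntfs_cluster_to_lba(cluster, partition_start_lba,
                        cluster_size=4096, sector_size=512):
    """Rough conversion: NTFS cluster numbers are volume-relative, so you
    must scale by sectors-per-cluster AND add the partition's starting
    LBA. Feed a raw cluster number to the drive and you're looking at
    the wrong place entirely."""
    return partition_start_lba + cluster * (cluster_size // sector_size)

# e.g. cluster 10 on a volume starting at LBA 2048, with 4K clusters:
print(ntfs_cluster_to_lba(10, 2048))  # 2128, not 10
```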
The best thing to do to find out what files are impacted by "suspect" LBAs is to use a file copy utility (not a filesystem or partition or disk copy utility). Files read which return I/O errors are obviously impacted, and the utility should obviously give you the filename.
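A sketch of what I mean by a file copy utility for this purpose (Python; a simplistic walk-and-copy that just records which files throw I/O errors instead of dying):

```python
import os
import shutil

def copy_tree_logging_errors(src_dir, dst_dir):
    """Copy every file individually; return the relative paths of files
    whose reads failed (those are the files sitting on unreadable LBAs)."""
    bad_files = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            try:
                shutil.copyfile(src, dst)
            except OSError:
                bad_files.append(rel)  # I/O error mid-read; treat as lost
    return bad_files
```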
I hope this explains why using verification utilities when writing data to the drive is now questionable -- meaning: sure, you can write a 300MByte file to a drive successfully, but that doesn't mean you're necessarily going to successfully read all that data back (yes, it's true: a write can succeed where a subsequent read of the same LBA fails. My above explanation about "suspect" LBAs and how to re-analyse them should explain why/how that's possible).
Attribute 198 (0xC6) -- indicates how many failed LBA remaps there have been. This is particularly common if a drive has undergone extensive LBA remapping and has run out of spare sectors (uncommon but does happen, especially on 4K / 4096-byte sector drives). This value is 0 so that's good, it just means there haven't been any failed remaps.
Attribute 200 (0xC8) -- write version of attribute 1. Won't go into this for the same reasons as described in attribute 1.
Attribute 223 (0xDF) and attribute 225 (0xE1) -- I'm tired and am opting out of explaining these... sorry.
There is also one more situation that people have speculated about:
bit rot. I've never personally or professionally encountered this (usually I can explain sudden checksum failures when using ZFS, for example; I can correlate them to SATA or SCSI or SAS events), but I do believe it's a strong possibility given how magnetic media works. But
do not be inclined to believe SSDs are somehow better in this regard -- SSDs have their own sets of major problems that MHDDs don't. I don't want to get into a talk about that, but search Slashdot for "SSD" sometime and read the analysis done by some folks. Also, don't trust things you read on "enthusiast" websites (i.e. gamer-fuelled hardware review sites) -- these guys often have no idea what the fuck they're talking about, and that includes occasionally dudes like Anand Lal Shimpi (guy who runs Anandtech -- note the guy started the site when he was 14 years old... yeah, great, a 14 year old doing hardware reviews... I know he isn't 14 now, duh, but still...). Proper reviews and analysis have to be done by actual engineers; "enthusiast" sites often "talk tech" but have no fucking idea how something
actually works under the hood. When it comes to hard disks, particularly IDE/PATA/ATA/SATA/AHCI, I'm one of the few who does. (What I don't fully understand are the physical characteristics, because I'm not a hardware engineer)
I'm certain this very long explanation will induce a billion questions of all sorts (I can see Tepples writing up some enormous 200-page inline response), just know that I can't/won't really answer a lot of them because drives are very complex and it's a tedious process for me to explain it all with text/typing. I've done software-level data recovery for a long time (I speak/read ATA protocol and have worked on some of the FreeBSD ATA and AHCI subsystem drivers, although not at very deep levels) so that's where I get the education on this -- and a lot of people who do the same thing do it wrong/badly because they don't understand
how drives work (or the fact that different models and vendors of drives behave differently; ex. WD drives do not behave the same with regards to many of the above as Seagate drives (I have more experience with WD drives)).
Anyway... the reason you're getting back anomalies during verifications after writing data to the drive is because while your writes worked fine, reads of the same LBAs you just wrote to fail / those LBAs are marked "suspect" by the drive.
So, long story short: don't bother copying anything to that drive any longer. Any data you have on it right now which you want to keep, copy off to another drive/somewhere else (and any I/O errors when reading that file means that you should ignore that file -- it's been lost, I hope you have other backups of it :) ), then do an RMA with Seagate to get a replacement product. That's the simple answer. There is no point in trying to "save" this drive given the behaviour/description of the problem -- it will perpetually be like this forever.
If you want to see the (probably hundreds by now?) examples of me assisting in drive problems, Google "koitsu dslreports drive" or "koitsu dslreport disk" and sift through them all. I even
tell stories of data recovery, with data if you're into that sort of thing.