
Real world XLS format

Franco Corbelli edited this page Sep 2, 2023 · 1 revision

One of the fundamental reasons zpaqfranz exists is to handle XLS files: it may seem trivial, but it is not.

Excel in all versions (at least until 2019) has a peculiar behavior: when you open an XLS file and then close it (even without making any changes), it changes some bytes inside the file (metadata) WITHOUT updating the file's modification date and time.

So if the file was last modified on (let's say) 17-07-2021 @ 17:47, the filesystem will still report 17-07-2021 @ 17:47, even though its binary content has changed:

17/07/2021  17:47            25.600 test.ok
17/07/2021  17:47            25.600 test.xls

Programs that rely on the date and time to decide whether to make a new copy (including robocopy, rsync, zpaq, zpaqfranz) cannot detect that the file has changed, as in this example:

Z:\>c:\nz\sha1deep64 test.*
500f2a718a4bb1babf185e748ecef1febf83ae78  Z:\test.ok
393e24455d50ddd8c695d6ecdbfc0156dc2c0e6b  Z:\test.xls

A binary or hash comparison shows that the copied file does NOT match the current one: the check FAILS.
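The mismatch above can be reproduced with a few lines of Python. This is only a minimal sketch, not what zpaqfranz does internally: the helper names `sha1_of` and `silently_changed` are made up for illustration, and the modification time is compared at one-second resolution as an assumption.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def silently_changed(original, copy):
    """True when size and modification time match but the content does not.

    Hypothetical helper: this is the situation in which a date-based
    copy tool would wrongly consider the two files identical.
    """
    a, b = os.stat(original), os.stat(copy)
    same_metadata = (
        a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)
    )
    return same_metadata and sha1_of(original) != sha1_of(copy)
```

Calling `silently_changed("test.xls", "test.ok")` on a pair like the one listed above would be expected to return `True`: same size, same timestamp, different hash.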


This is extremely bad for a storage manager: copy verification can fail depending on whether the original XLS files are opened and closed without any changes!

With other programs (rsync, robocopy) a DOUBLE pass is required: the first copies the files "intelligently" (based on the modification date and time), the second is "stupid" (it copies ALL XLS files). This is obviously a big hassle, and it slows things down considerably with large amounts of data.

zpaqfranz, by default, always adds XLS files to archives, regardless of their modification date and time (this can be disabled with -xls).
The r (robocopy) command also treats XLS files specially: they are carefully checked to make sure the copy is correct.
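The "always take XLS files" policy can be sketched as a simple decision function. This is an illustration under stated assumptions, not zpaqfranz's actual code: `needs_copy` and `ALWAYS_COPY` are hypothetical names, and the date check uses one-second resolution.

```python
import os

# Assumption: extensions whose modification time cannot be trusted.
ALWAYS_COPY = {".xls"}


def needs_copy(source, destination, always_copy=ALWAYS_COPY):
    """Decide whether source must be copied over destination."""
    if not os.path.exists(destination):
        return True  # nothing to compare against

    extension = os.path.splitext(source)[1].lower()
    if extension in always_copy:
        return True  # ignore date and size: the content may have changed anyway

    # Everything else: the usual "intelligent" check on size and date.
    src, dst = os.stat(source), os.stat(destination)
    return (
        src.st_size != dst.st_size
        or int(src.st_mtime) != int(dst.st_mtime)
    )
```

Folding the XLS rule into the same decision avoids the separate second pass: every file is visited once, and only XLS files pay the cost of being copied unconditionally.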

Did you know about this peculiarity of Office? It took me a lot of work to find out, really a lot.
