[File Roller] Supporting Unicode Enabled ZIP Archive When Using Info-ZIP Stack



Related bug (not reported by me):
https://bugzilla.gnome.org/show_bug.cgi?id=648673

There are basically two kinds of ZIP archives: those whose file names
are stored in an arbitrary, undeclared encoding (not Unicode enabled),
and those whose file names are stored as UTF-8 with the proper
metadata set (Unicode enabled).

UnZip 6.0 from Info-ZIP (the latest released version) can extract
Unicode enabled archives correctly. However, its listing feature
shows every non-ASCII character in a file name as '?', even for
Unicode enabled archives. This affects File Roller as well, which is
why we have the bug mentioned above.

Fortunately, UnZip has a -U option. When dealing with Unicode enabled
archives, it escapes each non-ASCII character as #UXXXX or #LYYYYYY.
I have already written a working patch that makes File Roller use
this:
https://gist.github.com/4057999
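
To illustrate the escape format, here is a minimal Python sketch (not
the code from the patch above) that turns escaped names back into
Unicode. It assumes XXXX is a 4-digit and YYYYYY a 6-digit hexadecimal
code point, and it accepts both upper and lower case digits because I
have not checked which one UnZip prints. The name decode_escapes is
made up for this example.

```python
import re

# Assumed escape syntax: "#U" + 4 hex digits or "#L" + 6 hex digits.
ESCAPE = re.compile(r"#(?:U([0-9A-Fa-f]{4})|L([0-9A-Fa-f]{6}))")


def decode_escapes(name: str) -> str:
    """Replace every #UXXXX / #LYYYYYY sequence with its character."""

    def repl(match):
        code = int(match.group(1) or match.group(2), 16)
        if code > 0x10FFFF:
            # Not a valid code point, keep the text unchanged.
            return match.group(0)
        return chr(code)

    return ESCAPE.sub(repl, name)
```

Note that this blindly decodes anything that looks like an escape, so
a file literally named "#U0041" would come out as "A".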

Unfortunately, #UXXXX and #LYYYYYY are also legitimate file names in
ZIP archives, and UnZip's -U option does not currently escape a
literal #. I am already trying to contact upstream about this:
http://www.info-zip.org/phpBB3/viewtopic.php?f=4&t=405

On the File Roller side, we could list the archive twice, once without
-U and once with -U, and compare the two listings to determine which #
is literal and which # starts an escape. There is one more annoying
detail worth noting here: vanilla UnZip shows exactly one ? per
Unicode character, while patched UnZip (found in at least Arch and
Ubuntu) shows several ? per Unicode character (as many as the
character has UTF-8 bytes).
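
The double-listing idea could be sketched roughly like this in Python
(File Roller itself is C, so this only shows the logic). The function
names are hypothetical, and the sketch assumes both listings give the
same entry with every ASCII character unchanged. A # in the -U listing
is treated as an escape only if the plain listing has a ? at the same
position. Both ? conventions are tried, vanilla first.

```python
import re
from typing import Optional

# Assumed escape syntax: "#U" + 4 hex digits or "#L" + 6 hex digits.
ESCAPE = re.compile(r"#(?:U([0-9A-Fa-f]{4})|L([0-9A-Fa-f]{6}))")


def _merge(plain: str, escaped: str, per_byte: bool) -> Optional[str]:
    """Align one entry from both listings; None if they do not line up."""
    out = []
    i = j = 0  # i walks the plain listing, j walks the -U listing
    while j < len(escaped):
        m = ESCAPE.match(escaped, j)
        if m and i < len(plain) and plain[i] == "?":
            code = int(m.group(1) or m.group(2), 16)
            if code > 0x10FFFF:
                return None
            ch = chr(code)
            # Patched UnZip: one ? per UTF-8 byte; vanilla: one ? per char.
            width = len(ch.encode("utf-8", "surrogatepass")) if per_byte else 1
            if plain[i:i + width] != "?" * width:
                return None
            out.append(ch)
            i += width
            j = m.end()
        else:
            # Ordinary character (including a literal #): must be identical.
            if i >= len(plain) or plain[i] != escaped[j]:
                return None
            out.append(escaped[j])
            i += 1
            j += 1
    return "".join(out) if i == len(plain) else None


def reconcile(plain: str, escaped: str) -> Optional[str]:
    """Recover the real name from the listings without and with -U."""
    for per_byte in (False, True):
        result = _merge(plain, escaped, per_byte)
        if result is not None:
            return result
    return None
```

For example, reconcile("#U0041-?.txt", "#U0041-#U00e9.txt") keeps the
first #U0041 as literal text and decodes only the second escape. Names
that mix literal ? characters with escapes can probably still be
ambiguous under the per-byte convention, so this is not a complete
solution.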

What do you think?

