[File Roller] Supporting Unicode Enabled ZIP Archive When Using Info-ZIP Stack
- From: Ma Xiaojun <damage3025 gmail com>
- To: desktop-devel-list gnome org
- Subject: [File Roller] Supporting Unicode Enabled ZIP Archive When Using Info-ZIP Stack
- Date: Mon, 12 Nov 2012 01:44:35 -0600
Related bug (not reported by me):
https://bugzilla.gnome.org/show_bug.cgi?id=648673
There are basically two kinds of ZIP archives: those whose file names
are in an arbitrary, undeclared encoding (not Unicode enabled), and
those whose file names are UTF-8 encoded with the proper metadata set
(Unicode enabled).
UnZip 6.0 (the latest released version) from Info-ZIP can extract
Unicode enabled archives correctly. However, its listing feature
displays every non-ASCII character in a file name as '?', even for
Unicode enabled archives. This affects File Roller as well, hence the
bug mentioned above.
Fortunately, UnZip has a -U option. When dealing with Unicode enabled
archives, it escapes each non-ASCII character as #UXXXX or #LYYYYYY. I
have already made a working patch for File Roller that makes use of
this.
https://gist.github.com/4057999
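To illustrate the idea, here is a minimal sketch of decoding the
escaped names, written in Python for brevity. It is not the patch
linked above (File Roller itself is written in C); the function name
decode_unzip_escapes is made up for this example, and the sketch
assumes #U is followed by exactly four hex digits and #L by exactly
six.

```python
import re

# Assumed escape shapes: "#U" + 4 hex digits, or "#L" + 6 hex digits.
_ESCAPE = re.compile(r"#U([0-9A-Fa-f]{4})|#L([0-9A-Fa-f]{6})")


def decode_unzip_escapes(name: str) -> str:
    """Replace #UXXXX / #LYYYYYY escapes with the characters they encode.

    Hypothetical helper for illustration only. It cannot tell an escape
    from a literal "#U4e2d" that was already part of the file name.
    """

    def _replace(match: re.Match) -> str:
        code = int(match.group(1) or match.group(2), 16)
        # Leave anything that is not a valid Unicode scalar value untouched.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)

    return _ESCAPE.sub(_replace, name)
```

For example, decode_unzip_escapes("#U4e2d#U6587.txt") yields "中文.txt".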
Unfortunately, #UXXXX and #LYYYYYY are also legitimate file names in
ZIP archives, and UnZip's -U option currently does not escape a
literal #. I am already trying to contact upstream.
http://www.info-zip.org/phpBB3/viewtopic.php?f=4&t=405
On the File Roller side, we could list the archive twice, once without
-U and once with -U. Then we can determine which # is literal and
which # introduces an escape. There is another annoying detail worth
noting here: vanilla UnZip shows exactly one ? per Unicode character,
while patched UnZip (found in at least Arch and Ubuntu) shows several
? per Unicode character (the number of ? equals the number of UTF-8
bytes).
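The two-listing comparison could be sketched roughly as follows, again
in Python and only as an illustration. The names resolve_name and
_align are hypothetical. The sketch assumes both listings report the
same entry, and that an escape in the -U listing always lines up with
one or more ? in the plain listing, while a literal # appears
unchanged in both.

```python
import re
from typing import Optional

# Assumed escape shapes: "#U" + 4 hex digits, or "#L" + 6 hex digits.
_ESCAPE = re.compile(r"#U([0-9A-Fa-f]{4})|#L([0-9A-Fa-f]{6})")


def _align(plain: str, escaped: str, per_byte: bool) -> Optional[str]:
    """Walk both listings in step; return the decoded name or None."""
    out = []
    i = j = 0  # i indexes the -U listing, j the plain listing
    while i < len(escaped):
        m = _ESCAPE.match(escaped, i)
        if m and j < len(plain) and plain[j] == "?":
            # The plain listing shows '?', so this '#' starts an escape.
            code = int(m.group(1) or m.group(2), 16)
            if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                return None
            ch = chr(code)
            # Vanilla UnZip: one '?'; patched UnZip: one '?' per UTF-8 byte.
            width = len(ch.encode("utf-8")) if per_byte else 1
            if plain[j:j + width] != "?" * width:
                return None
            out.append(ch)
            i, j = m.end(), j + width
        elif j < len(plain) and plain[j] == escaped[i]:
            # Same character in both listings: literal, including a literal '#'.
            out.append(escaped[i])
            i += 1
            j += 1
        else:
            return None
    return "".join(out) if j == len(plain) else None


def resolve_name(plain: str, escaped: str) -> Optional[str]:
    """Try the vanilla '?' convention first, then the patched one."""
    for per_byte in (False, True):
        name = _align(plain, escaped, per_byte)
        if name is not None:
            return name
    return None
```

With this, resolve_name("#U0041-?.txt", "#U0041-#U6587.txt") keeps the
first #U0041 as literal text and decodes only the second escape. One
remaining ambiguity: a name containing literal ? characters right
after a non-ASCII character can match both conventions, so knowing
which UnZip build is installed would still be preferable to guessing.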
What do you think?