Server not writing unicode names correctly (b/c of utf8?)

Puritan
504 Command not implemented
Posts: 6
Joined: 2006-05-07 05:32

#1 Post by Puritan » 2006-05-07 05:34

I'm using Filezilla client and Filezilla server to transfer files between Windows computers, including a number of files with Unicode foreign characters in their filenames. In previous versions, the files would transfer with no problems, but now in the newest versions of Filezilla, Unicode filenames come out as garbled ASCII garbage on the other side. I don't know what's causing this, but I'm guessing that it's the fact that Filezilla now defaults to using UTF-8 to transfer Unicode file names, whereas I believe Windows uses UTF-16 (?). Does anyone know how I can disable UTF-8, force UTF-16, or do whatever it takes to restore the functionality of older versions where I could transfer Unicode filenames from one Windows computer to another with no problems? Any help you can offer is appreciated, thanks.

botg
Site Admin
Posts: 35509
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#2 Post by botg » 2006-05-07 09:58

Please make sure you're using FileZilla Server 0.9.16c and FileZilla 2.2.22.

Previously, the local system encoding was used. This was nonstandard behaviour, and clients had to guess the encoding. Now UTF-8 is used, in compliance with RFC 2640.
If you are using the Site Manager, make sure the UTF-8 option in the advanced site settings is set to either "Force" or "Auto".

Another question: do the files appear properly in Explorer on the server machine?

Puritan
504 Command not implemented
Posts: 6
Joined: 2006-05-07 05:32

#3 Post by Puritan » 2006-05-07 21:27

Yeah, I'm using 2.2.22 and 0.9.16c.

If I upload a unicode filename from my client to the server, in the Filezilla client window, the filename shows up correctly in the server pane, but in the Explorer window on the server side, it comes up as high-order ASCII garbage (presumably UTF-8, which Windows is not recognizing as unicode). If I rename the files on the server such that they're readable in Explorer, when I connect to them with the client, the client lists the filenames as "?????" and cannot download them or open any unicode-named directories.

UTF-8 is the new default, but is there any way to force it to use Windows' UTF-16 instead?

botg
Site Admin
Posts: 35509
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#4 Post by botg » 2006-05-07 22:16

Internally, FileZilla Server uses wide characters, and for sending over / receiving from the network it converts to / from UTF-8. Since the filenames appear correctly in FileZilla, this conversion works.
FileZilla Server uses the wide-character versions of the file manipulation functions of the Windows API, so there's no additional conversion necessary.
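
As a rough illustration of the conversion described above (a Python sketch, not FileZilla Server's actual code): the same filename is held as UTF-16 code units for the wide-character Windows API and as UTF-8 bytes on the wire. The filename and variable names are made up for the example.

```python
# Sample Korean filename; illustration only.
name = "마당.jpg"

# Wide-character form, as passed to the Windows "W" file functions
# (UTF-16, little endian).
wide = name.encode("utf-16-le")

# Form sent over the FTP control connection when UTF-8 is used.
wire = wide.decode("utf-16-le").encode("utf-8")

# The receiving side reverses the conversion and recovers the same name.
restored = wire.decode("utf-8")

print(len(wide), len(wire), restored == name)
```

Both forms encode the same Unicode characters, so converting between them loses nothing; only the byte layout differs.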

This basically leaves two possible reasons:
- FileZilla 2 does not convert between local charset and UTF-8 properly (as FZ2 uses local charset internally).
- Something on the server system itself is not working properly.
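
To see why a client that works in the local charset can end up with "?????" entries, here is a small Python sketch. It assumes cp1252 as the local ANSI code page of a U.S. English Windows system and cp949 as a Korean one; whether FileZilla 2 takes exactly this path is not confirmed in this thread.

```python
name = "마당.jpg"

# Assumption: cp1252 stands in for the local code page on U.S. English Windows.
# Hangul has no representation there, so each syllable is replaced by "?".
western = name.encode("cp1252", errors="replace")

# Assumption: cp949 stands in for the local code page on Korean Windows.
# Here the name survives the round trip.
korean = name.encode("cp949")

print(western)
print(korean.decode("cp949") == name)
```

Once a character has been replaced by "?", the original name cannot be recovered, which would also explain why such files cannot be downloaded by name.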

To further analyze this, I'll need more information:
- Which Windows version do you have?
- What's the filesystem type of the directory you're uploading to?
- Which language are you using in the filenames?
- Does the latest nightly of FileZilla 3 (http://filezilla-project.org/nightly.php) handle this properly?

Puritan
504 Command not implemented
Posts: 6
Joined: 2006-05-07 05:32

#5 Post by Puritan » 2006-05-07 22:29

I'm using Windows XP with NTFS (on both the client and the server). The filenames are in Korean (although the underlying operating systems are the U.S. English versions).

Hmm, actually, testing it with the nightly build of Filezilla 3 (2006-05-07), the transfer works properly; uploaded files come up correctly in the server's Explorer, and the client can access all of the server's unicode files and directories and downloads them properly. I guess that makes this a client problem, not a server problem, huh? It doesn't work with Filezilla 2.2.22, regardless of whether you set UTF-8 to force, auto, or never (unless there's another setting that can affect this?).

botg
Site Admin
Posts: 35509
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#6 Post by botg » 2006-05-07 23:46

I'll release a fully Unicode-enabled version of FileZilla in the next few days.

Puritan
504 Command not implemented
Posts: 6
Joined: 2006-05-07 05:32

#7 Post by Puritan » 2006-05-08 01:42

Thanks so much for all your help!

Puritan
504 Command not implemented
Posts: 6
Joined: 2006-05-07 05:32

#8 Post by Puritan » 2006-05-20 21:52

Did you change the UTF support in the newest stable version of Filezilla (2.2.23)? The behavior is slightly different from 2.2.22... instead of unicode filenames on the server being shown as "???????", they're now being shown as high-order ASCII by the client (e.g. "마당.jpg"). I can now access unicode directories notwithstanding the ASCII gibberish (under 2.2.22, I couldn't access the directories at all), but I still cannot download files; the file is created on the client machine with the ASCII name, but the client says there is a "critical transfer error" and only creates an empty file of 0 bytes. I also can no longer upload files at all; whereas under 2.2.22, I could upload a file but it would end up being named ASCII gibberish on the server, now the client cannot upload the file at all, giving me a "too many retries" error.
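
The "high-order ASCII" symptom is what you would expect if UTF-8 bytes are displayed as a single-byte Western code page. A minimal Python sketch, assuming the client interprets the bytes as Windows-1252 (cp1252); the actual code path inside FileZilla 2.2.23 is not shown in this thread:

```python
name = "마당.jpg"
utf8_bytes = name.encode("utf-8")

# Assumption: the bytes are misread as Windows-1252, one character per byte.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# The damage is reversible as long as every byte is preserved.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == name)
```

Each Hangul syllable takes three bytes in UTF-8, so a two-syllable name turns into six accented letters and symbols followed by the unchanged ".jpg".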

Was 2.2.23 meant to address this issue, or is this still something you're working on for a future release?

Thanks again so much.

starkwong
504 Command not implemented
Posts: 7
Joined: 2006-05-21 06:16

#9 Post by starkwong » 2006-05-21 06:26

The new version of the server has compatibility issues with old clients that don't use UTF-8 encoding.

The server now enables UTF-8 by default (with no way to disable it). In clients that don't use UTF-8 encoding, filenames with non-English characters show up completely wrong and the files are inaccessible.

Also, the welcome message should not be UTF-8 encoded, because the client has not issued OPTS UTF8 ON yet.

botg
Site Admin
Posts: 35509
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#10 Post by botg » 2006-05-21 09:21

Puritan, I've just released 2.2.23a which fixes a problem with UTF-8 detection. Please try this version. If it still won't work, would it be possible to provide some logs or even better, provide a temporary account on your server so I can test myself?


starkwong, the OPTS UTF8 command is for broken clients who use a different UTF-8 specification (there are two, the one using OPTS UTF8 is the inferior one).

starkwong
504 Command not implemented
Posts: 7
Joined: 2006-05-21 06:16

#11 Post by starkwong » 2006-05-22 02:09

Let me make it clear by showing a screenshot:
Image

ALL Chinese characters become unreadable after using new versions of FileZilla Server.

twu2
425 Can't open data connection
Posts: 45
Joined: 2005-02-26 16:54

#12 Post by twu2 » 2006-05-22 07:15

Does FileZilla Server now always send UTF-8 encoded strings? (I checked the CVS code, and it looks like it always converts strings to UTF-8 before sending them to the client.)

That will cause problems when the client does not support UTF-8 (and usually such a client doesn't know it needs to send OPTS UTF8 OFF).

For compatibility, it should stay in non-UTF-8 mode by default (in this mode, a Unicode program needs to convert from wide strings to ANSI/local strings) and enable UTF-8 only after receiving OPTS UTF8 ON, just like other FTP daemons that support UTF-8.
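
The opt-in behaviour proposed here could be sketched roughly as follows. This is a hypothetical Python model of per-connection state, not FileZilla Server code; the class name, the reply texts, and the choice of cp949 as the local code page are all assumptions made for illustration.

```python
class FtpSession:
    """Hypothetical per-connection state for an opt-in UTF-8 server."""

    def __init__(self, local_encoding="cp949"):
        self.local_encoding = local_encoding
        # Legacy default: local code page until the client opts in.
        self.encoding = local_encoding

    def handle(self, line):
        command = line.strip().upper()
        if command == "OPTS UTF8 ON":
            self.encoding = "utf-8"
            return "200 UTF8 mode enabled"      # reply text is made up
        if command == "OPTS UTF8 OFF":
            self.encoding = self.local_encoding
            return "200 UTF8 mode disabled"     # reply text is made up
        return "502 Command not implemented"

    def encode_name(self, name):
        # Names are Unicode internally; convert only at the network boundary.
        return name.encode(self.encoding, errors="replace")


session = FtpSession()
legacy = session.encode_name("마당.jpg")    # local code page bytes
reply = session.handle("OPTS UTF8 ON")
modern = session.encode_name("마당.jpg")    # UTF-8 bytes
print(reply, legacy != modern)
```

The trade-off is the one debated in this thread: the default mode keeps old clients working on a matching code page, but any client on a different code page still has to guess the encoding.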

botg
Site Admin
Posts: 35509
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#13 Post by botg » 2006-05-22 09:14

FTP was designed with 7-bit ASCII in mind. Before UTF-8 support, clients had to guess which encoding to use.
Since RFC 2640, UTF-8 is used. It is backwards compatible with 7-bit ASCII and can represent all Unicode characters.

Please use a client that is UTF-8 aware, or don't use non-ASCII characters.
twu2 wrote: For compatibility, it should stay in non-UTF-8 mode by default (in this mode, a Unicode program needs to convert from wide strings to ANSI/local strings) and enable UTF-8 only after receiving OPTS UTF8 ON, just like other FTP daemons that support UTF-8.
The OPTS UTF8 command comes from a conflicting and inferior UTF-8 specification. FileZilla uses RFC 2640, which is fully backwards compatible.
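
The "guessing" problem can be shown in a few lines of Python. The sketch assumes a server whose local code page is GBK and a client that guesses Windows-1252; both choices and the sample filename are illustrative only.

```python
# Sample name as sent by a server using its local code page (assumed GBK).
raw = "中文.txt".encode("gbk")

# One client guesses a Western European code page, another guesses GBK.
as_western = raw.decode("cp1252")
as_chinese = raw.decode("gbk")

print(as_western)
print(as_chinese)
```

The bytes on the wire are identical in both cases. Without an agreed encoding, the client has no reliable way to tell which interpretation is the right one.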

starkwong
504 Command not implemented
Posts: 7
Joined: 2006-05-21 06:16

#14 Post by starkwong » 2006-05-22 16:29

Actually, there are servers that handle ANSI by default and UTF-8 when the client asks for it; this does not cause problems for either kind of client. Telling users to use a UTF-8 compatible client is a problem, because there are too many clients that don't support it, and users will always assume it is a server problem.

If there is no way to disable or change this behavior because of strict adherence to the RFC, my only choice is to give up using this great server.

botg
Site Admin
Posts: 35509
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#15 Post by botg » 2006-05-22 18:16

starkwong wrote: Actually, there are servers that handle ANSI by default and UTF-8 when the client asks for it; this does not cause problems for either kind of client.
Which ANSI? If a client from Germany connects to a Chinese server, for example, it tries to interpret the Chinese characters as German. FTP has only been specified for 7-bit US ASCII.
It's nonstandardized encoding guessing versus specified and universal UTF-8.
starkwong wrote: Telling users to use a UTF-8 compatible client is a problem, because there are too many clients that don't support it, and users will always assume it is a server problem.
Time to update the clients, or to stop using filenames that do not comply with the old standard (7-bit US ASCII).
starkwong wrote: If there is no way to disable or change this behavior because of strict adherence to the RFC, my only choice is to give up using this great server.
The new RFC for UTF-8 support strictly follows the old RFC: the old RFC only specifies 7-bit US ASCII, and UTF-8 is fully backwards compatible with that (the lowest 128 characters are the same as in 7-bit ASCII).
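
The backwards-compatibility claim is easy to check. A short Python sketch (the sample filenames are made up):

```python
ascii_name = "readme.txt"

# A pure ASCII filename produces exactly the same bytes in both encodings.
same_bytes = ascii_name.encode("ascii") == ascii_name.encode("utf-8")

# Each of the 128 ASCII code points is one identical byte in UTF-8.
all_match = all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))

# Non-ASCII characters use only bytes >= 0x80, so they never collide
# with ASCII bytes such as "/" or CR/LF.
high_only = all(b >= 0x80 for b in "마당".encode("utf-8"))

print(same_bytes, all_match, high_only)
```

This only covers ASCII filenames. For names with non-ASCII characters, a client that is not UTF-8 aware still sees different bytes than it did with the old local-charset behaviour, which is the compatibility concern raised earlier in the thread.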
