Server not writing unicode names correctly (b/c of utf8?)

Need help with FileZilla Server? Something does not work as expected? In this forum you may find an answer.

Moderator: Project members

Puritan
504 Command not implemented
Posts: 6
Joined: 2006-05-07 05:32

#16 Post by Puritan » 2006-05-23 02:10

botg wrote:Puritan, I've just released 2.2.23a which fixes a problem with UTF-8 detection. Please try this version. If it still won't work, would it be possible to provide some logs or even better, provide a temporary account on your server so I can test myself?
Yes, 2.2.23a works perfectly! Thanks so much.

starkwong
504 Command not implemented
Posts: 7
Joined: 2006-05-21 06:16

#17 Post by starkwong » 2006-05-23 05:16

botg wrote: Which ANSI? If a client from Germany connects to a Chinese server, for example, it tries to interpret the Chinese characters as German. FTP has only been specified for 7-bit US-ASCII.
Nonstandardized encoding guessing versus specified and universal UTF-8.
The problem is that although German clients can't see what the ASCII means, they can still enter the directory by using the correct ASCII name (and with a Chinese viewing tool they can see the correct name).

However, under the current implementation, even when Chinese clients connect to a Chinese server, they can't see the names correctly and can't enter the directories. This is the problem.

Moreover, this breaks support for all download accelerators such as FlashGet, because none of these clients use UTF-8 to handle FTP requests (the filenames contain ? characters).

botg
Site Admin
Posts: 35538
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#18 Post by botg » 2006-05-23 06:45

If a client does not support UTF-8, it should still be possible to browse the server. Unless, of course, the client has some more bugs, like trying to interpret the characters > 127.
starkwong wrote:Moreover, this breaks support for all download accelarators such as FlashGet, because none of these clients use UTF-8 to handle FTP requests (Filename contains ? characters).
If these so-called "download accelerators" replace high-ASCII characters with question marks, they are incredibly broken. Feel free to mail them RFC 959 and RFC 2640.

starkwong
504 Command not implemented
Posts: 7
Joined: 2006-05-21 06:16

#19 Post by starkwong » 2006-05-23 12:12

botg wrote:If a client does not support UTF-8, it should still be possible to browse the server. Unless of course the client has some more bugs like trying to interpret the characters > 127.
No, you don't understand. The problem is that when a client does not support UTF-8, FileZilla Server still sends UTF-8 filenames to it. As a result, when clients interpret the filenames, they will have certain ASCII characters that are <127 as the first byte (on a double-byte OS, messing up the first byte destroys two bytes and results in a ?). This may not be a bug in the clients, but most likely a limitation of the double-byte OS (or the DBCS architecture).
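
A small sketch of the damage described above, not taken from FileZilla or any client. It assumes Python, a made-up directory name, and cp950 (Big5) as a stand-in for the client's double-byte code page; a real client's behaviour may differ in detail.

```python
# Hypothetical directory name; cp950 (Big5) stands in for the client's DBCS code page.
name = "中文目錄"

# What a UTF-8 server puts on the wire.
utf8_bytes = name.encode("utf-8")

# What a client sees when it reads those bytes as cp950.
# Byte pairs that are not valid cp950 become U+FFFD.
seen = utf8_bytes.decode("cp950", errors="replace")

# What the client would send back (for example in a CWD command)
# after re-encoding the damaged string. U+FFFD turns into "?".
sent_back = seen.encode("cp950", errors="replace")

print(repr(seen))
print("round trip intact:", sent_back == utf8_bytes)
```

Once a replacement character appears, the original bytes are gone, so the name sent back can no longer match the name on the server.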

starkwong
504 Command not implemented
Posts: 7
Joined: 2006-05-21 06:16

#20 Post by starkwong » 2006-05-23 13:09

The RFC 2640 has a misleading paragraph:

Code: Select all

   The character set used to store files SHALL remain a local decision
   and MAY depend on the capability of local operating systems. Prior to
   the exchange of pathnames they SHOULD be converted into a ISO/IEC
   10646 format and UTF-8 encoded. This approach, while allowing
   international exchange of pathnames, will still allow backward
   compatibility with older systems because [b]the code set positions for
   ASCII characters are identical to the one byte sequence in UTF-8[/b].
The code set is identical ONLY for 7-bit ASCII and UTF-8. For double-byte characters the code points are completely different, so sending UTF-8 non-English filenames directly to older clients DOES break compatibility.
- Servers which support this specification, when presented a pathname
from an old client (one which does not support this specification),
can nearly always tell whether the pathname is in UTF-8 (see B.1)
or in some other code set. In order to support these older clients,
servers may wish to default to a non UTF-8 code set. However, how a
server supports non UTF-8 is outside the scope of this
specification.
If the server sends UTF-8 names to old clients, it is impossible for the clients to send the correct names back to the server on a DBCS OS, because the string is already damaged.
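
The RFC passage quoted above says a server can nearly always tell whether an incoming pathname is UTF-8. One common way to do that, sketched here in Python as an assumption rather than as what RFC 2640 B.1 or FileZilla Server actually does, is to try a strict UTF-8 decode and fall back to a local code page. The function name `decode_pathname` and the cp950 default are made up for illustration.

```python
from typing import Tuple


def decode_pathname(raw: bytes, legacy_codec: str = "cp950") -> Tuple[str, str]:
    """Return (text, codec_used) for a pathname received from a client.

    Strict UTF-8 decoding rejects most byte strings produced by legacy
    double-byte encodings, so a failure is a strong hint that the client
    is not sending UTF-8.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: the server knows which local code page to fall back to.
        return raw.decode(legacy_codec, errors="replace"), legacy_codec


print(decode_pathname("中文".encode("utf-8")))
print(decode_pathname("中文".encode("cp950")))
print(decode_pathname(b"readme.txt"))
```

Pure ASCII names decode the same way under both codecs, which is the backward compatibility the RFC paragraph refers to. The heuristic only covers names arriving from the client; it does not help with names the server sends out.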

boco
Contributor
Posts: 26930
Joined: 2006-05-01 03:28
Location: Germany

#21 Post by boco » 2006-05-23 13:28

Most other servers do not send filenames in UTF-8 until OPTS UTF8 ON is issued. With FileZilla Server, I had to install it twice on one computer to support older clients.

boco

botg
Site Admin
Posts: 35538
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#22 Post by botg » 2006-05-23 17:28

RFC 959 for FTP:
control connection

The communication path between the USER-PI and SERVER-PI for
the exchange of commands and replies. This connection follows
the Telnet Protocol.
RFC 854 for Telnet:
1. When a TELNET connection is first established, each end is
assumed to originate and terminate at a "Network Virtual Terminal",
or NVT.
THE NETWORK VIRTUAL TERMINAL
[...] The code
set is seven-bit USASCII in an eight-bit field, except as modified
herein. Any code conversion and timing considerations are local
problems and do not affect the NVT.
So UTF-8 support does not break any compatibility, as servers using other non-ASCII charsets are not covered in the specification.

starkwong, that part of RFC 2640 refers to compatibility with the old specified charset, which was US-ASCII.

starkwong
504 Command not implemented
Posts: 7
Joined: 2006-05-21 06:16

#23 Post by starkwong » 2006-05-24 09:07

botg wrote:
starkwong, the part of RFC2640 refers to compatibility to the old specified charset which was US-Ascii.
Yes, so since the RFC does not cover non-US-ASCII, you just don't want to deal with non-US-ASCII compatibility, right?

You don't understand. The situation is that FileZilla Server's compatibility with non-UTF-8 clients IS ALREADY BROKEN on DBCS versions of Windows.

OK, I give up. You win. I'll just look for other FTP server software.

botg
Site Admin
Posts: 35538
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#24 Post by botg » 2006-05-24 11:09

I could boldly claim that all servers not using the German ANSI character set are broken because I can't see filenames on, for example, Chinese servers. There has never been compatibility for non-US-ASCII character sets; it was pure luck that it worked when server and client were using the same local character set.

How should a client know a server is using DBCS (and vice versa)? The only thing a client knows (and, according to the specs, can rely on) is that a server is supposed to use 7-bit US-ASCII. Unless the UTF-8 specification is used, every other character set is unspecified, nonstandard behaviour and broken by definition.
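
For the UTF-8 case there is a specified way for a client to find out: RFC 2640 has servers list UTF8 in their reply to the FEAT command. A minimal Python sketch of checking such a reply follows; the helper name `supports_utf8` and the sample reply text are made up, and it assumes feature lines start with a space as in RFC 2389.

```python
def supports_utf8(feat_reply: str) -> bool:
    """Return True if a multi-line FEAT reply lists the UTF8 feature."""
    for line in feat_reply.splitlines():
        # Feature lines are indented by a space; the 211 framing lines are not.
        if line.startswith(" ") and line.strip().upper() == "UTF8":
            return True
    return False


# Hypothetical replies, shaped like RFC 2389 examples.
with_utf8 = "211-Features:\n MDTM\n UTF8\n SIZE\n211 End"
without_utf8 = "211-Features:\n MDTM\n SIZE\n211 End"

print(supports_utf8(with_utf8))
print(supports_utf8(without_utf8))
```

With Python's `ftplib`, the reply string could come from `ftp.sendcmd("FEAT")`. No comparable command announces a legacy code page, which is the point being made above.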

boco
Contributor
Posts: 26930
Joined: 2006-05-01 03:28
Location: Germany

#25 Post by boco » 2006-05-25 02:43

Yes, you're right about the RFC thingy, and I understand your reasons.

But what you need to understand, too, is that apparently the Local Encoding method worked very well for most people, including me.

I think what people want is not 100% compliance with the RFC; they want it to work.

Only very few clients I found on the net support UTF8, so having a pure UTF8 server is not a good idea in this case.

I can understand the other people, too. You can't tell everyone: "Hey, your client is broken, update it or get another one!" What they'll do is leave your server and never return...

I don't want to offend anyone here, but all I get since UTF8 is problems. And I guess I'm not the only one...

boco

sxlderek
500 Command not understood
Posts: 1
Joined: 2006-05-27 17:24

#26 Post by sxlderek » 2006-05-27 18:05

I just upgraded from 0.9.2 to 0.9.17b today and got the same issue.

I totally agree with starkwong that you should only enable UTF8 after the client says it wants it, or have an option on the server to default to the traditional encoding scheme.

I have been using it for more than a year and I love it. However, I can say FileZilla Server is totally useless in a DBCS society unless we have the above option.

Can you clearly let us know whether there will be such an option in a future release?

Thank you.
--
Derek

boco
Contributor
Posts: 26930
Joined: 2006-05-01 03:28
Location: Germany

#27 Post by boco » 2006-05-28 06:27

Should be an option with three choices like:

Code: Select all

[ ] Pure Unicode mode                             <-- currently, the server works in this mode
[ ] Use Local Encoding until OPTS UTF8 ON is sent <-- this would be my choice
[ ] Use Local Encoding only                       <-- 0.9.14a and below
Don't know if it's possible though...

boco
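
The three choices proposed above could be modelled roughly as follows. This is only a Python sketch under assumptions: the names `NameEncoding` and `Session`, the cp950 default, and the reply texts are all hypothetical, and nothing here reflects how FileZilla Server is actually implemented.

```python
from enum import Enum


class NameEncoding(Enum):
    PURE_UNICODE = 1      # always UTF-8
    LOCAL_UNTIL_OPTS = 2  # local encoding until OPTS UTF8 ON is sent
    LOCAL_ONLY = 3        # local encoding only


class Session:
    """Per-connection state deciding how filenames are encoded."""

    def __init__(self, mode: NameEncoding, local_codec: str = "cp950"):
        self.mode = mode
        self.local_codec = local_codec
        # Only the pure Unicode mode starts out in UTF-8.
        self.utf8 = mode is NameEncoding.PURE_UNICODE

    def handle_opts(self, argument: str) -> str:
        """Handle the argument of an OPTS command and return a reply line."""
        arg = " ".join(argument.upper().split())
        if arg == "UTF8 ON" and self.mode is not NameEncoding.LOCAL_ONLY:
            self.utf8 = True
            return "200 UTF8 mode enabled"
        if arg == "UTF8 OFF" and self.mode is not NameEncoding.PURE_UNICODE:
            self.utf8 = False
            return "200 UTF8 mode disabled"
        return "501 Option not supported in this mode"

    def encode_name(self, name: str) -> bytes:
        """Encode a filename for a directory listing."""
        codec = "utf-8" if self.utf8 else self.local_codec
        return name.encode(codec, errors="replace")


session = Session(NameEncoding.LOCAL_UNTIL_OPTS)
print(session.encode_name("中文"))      # local code page bytes
print(session.handle_opts("UTF8 ON"))
print(session.encode_name("中文"))      # UTF-8 bytes
```

In this sketch the second mode keeps old clients working by default, while clients that ask for UTF-8 get it for the rest of the connection.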

botg
Site Admin
Posts: 35538
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#28 Post by botg » 2006-05-30 13:06

I'm well aware this is causing some problems for users relying on unspecified behaviour.
As a workaround, I'll implement the OPTS UTF8 OFF command in the next version of FileZilla Server.

marco
500 Command not understood
Posts: 1
Joined: 2004-07-06 07:59

#29 Post by marco » 2006-06-21 08:38

Hi,

I'm having trouble with FileZilla Server 0.9.18, because of UTF-8 I think.
I realized that with an old FileZilla client I can issue OPTS UTF8 OFF and then the server sends the right filenames, BUT I have to deal with browsers, too.

When I use an FTP link containing accented (Italian) chars with Mozilla Firefox 1.5.0.4, I still get an error about a directory not found: it shows my filename WITHOUT the accented chars, and of course THAT filename doesn't exist...
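
To illustrate why an accented name in a link can miss the file on the server: the same character becomes different bytes depending on the encoding used for the URL. The Python sketch below uses a made-up filename; which encoding Firefox 1.5 actually used is not stated in this thread, so both variants are shown as assumptions.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical filename with an Italian accented character.
name = "perché.txt"

utf8_path = quote(name, encoding="utf-8")
latin1_path = quote(name, encoding="latin-1")

print(utf8_path)    # perch%C3%A9.txt
print(latin1_path)  # perch%E9.txt

# The bytes that would reach the server differ, so a server expecting
# UTF-8 cannot match the Latin-1 form to the stored name.
print(unquote_to_bytes(utf8_path) == unquote_to_bytes(latin1_path))  # False
```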

I think we need a server option too (maybe an XML one) or a default of OFF.
Until then, I have to downgrade my FileZilla Server or give up on it.

Which is the last version that doesn't comply so strictly with the RFCs?

botg
Site Admin
Posts: 35538
Joined: 2004-02-23 20:49
First name: Tim
Last name: Kosse

#30 Post by botg » 2006-06-21 09:00

You have two options:
a) Don't use non-US-ascii characters at all
b) Use a UTF-8 enabled client. You might want to submit a bug report to the Firefox developers

And by using UTF-8, FileZilla does in fact strictly comply with the RFCs.

Locked