Alright, I tested both versions in two realistic environments. One from a web server to a web server (fast connection) and another one from my laptop to a web server. In each test, the results between the reverse sorted (original) vs the unsorted (proposed fix) were basically identical (queuing and overall download time each within a couple of seconds of each other over a runtime of about 5 hours (server-server)/11 hours (client-server)). So I guess, I am performance monitoring the wrong thing here.
So what exactly should I performance monitoring and suspect to have any significant different result?
What to test:
- the time it takes to queue all items to the transfer queue
- the time it takes to queue and download all items
How to test:
- counting in network time
- ignoring network time
Because I already tested for 1a and 2a I guess it has to be one of the ignoring network time variants, unless the quality of the test method I used was not good enough (file structure).
Since millions of directories with millions of files will take a long time to test, what is the minimal concrete environment here that should display a significant different test result? And for how much performance impact am I looking for?
Reading through your answers so far I guess the performance scenario you are taking about is 1b (or 1a with the least possible amount of network time) with many files in less directories being more important than less files in more directories, to test for the actual time the code itself takes to run.
Is this the scenario I should test for?
Despite the testing here which I am happy to do, I have to admit that I am not sure about seeing any real world scenario being addressed here so far. Performance optimization in general is of cause always a goal for any application, but regarding the millions of empty files in millions of directories (with tests still remaining of cause) I assume that we are talking about the icing on the cake here and not something a lot of users will notice at all.
On the other hand, the real world scenario of adding a directory (or directory structure) containing large files to the queue and needing them to be downloaded in some order to be able to start using some of them while others may still take a while to be downloaded is a common use case scenario. And adding large quantities of large files manually to the queue has a performance impact (in user time) that can be easily measured in seconds to minutes (depending on the complexity of the file structure). It is caused by browsing directories, manually creating local directories to mirror remote directories, sorting files in the remote directory and then adding them to the queue.
Sorting files of a traversed directory in the order the current directory is sorted, is by no doubt the actual user expected behavior for recursively adding files to the queue (which would of cause have a minor performance impact, because of the sorting).
I know that adding files to the queue in the order they are returned by the server is not the same as explicitly sorting them and is in theory arbitrary. But practically speaking, the file order returned by ftp servers in general is usually implicitly in an ascending order, which also happens to be the most commonly expected sorting order for recursively adding files to the queue.
So by changing just one line of code, this most common use case scenario can be implemented with little to no impact on the rest of the application and performance (test results pending).
In the long run, explicit sorting should be implemented (there could be an option for the method to use in the preferences (current order (default option), server default, performance optimized), though i would consider this unnecessary), but for now, using the server returned order would make a huge difference and is the simplest possible solution.
Please also test deletion of directories. It's an operation that updates cached listings which is sensitive to order.
I would postpone that particular test until the performance impact of queuing for downloading is clear to not further complicate things.