Ah, I believe it is merely a matter of how s3cmd words its feedback.
When I removed --no-check-md5, the INFO gathering phase took significantly longer, which indicated to me that it was then indeed calculating MD5 values.
So with --no-check-md5 it is only doing the stat() work, even though the INFO message still mentions MD5 values.
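To illustrate why the gathering phase gets so much slower with MD5 checking enabled, here is a minimal, hypothetical Python sketch of compiling a local file list. It is not s3cmd's code; `compile_local_list` and `md5_of_file` are names made up for illustration. The stat() path only reads file metadata, while the MD5 path has to read every byte of every file.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Read the whole file in chunks and return its hex MD5 digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compile_local_list(root, check_md5=True):
    """Walk `root` and return {relative_path: {"size", "mtime", "md5"}}.

    Hypothetical illustration: with check_md5=False only os.stat()
    metadata is collected, no file contents are read, and "md5" is None.
    """
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)  # cheap: metadata only
            entries[os.path.relpath(full, root)] = {
                "size": st.st_size,
                "mtime": int(st.st_mtime),
                # expensive: reads the entire file when enabled
                "md5": md5_of_file(full) if check_md5 else None,
            }
    return entries
```

On a tree of ~21,000 files, the `check_md5=False` path does 21,000 cheap metadata lookups, whereas the `check_md5=True` path additionally streams the full contents of every file through the hash.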
Post by WagnerOne
Hi Matt,
Sorry for the delay in testing and responding. I appreciate the effort you've put into this so far.
s3cmd sync --no-preserve --no-check-md5 --verbose --progress /blah/ s3://blah/
INFO: Compiling list of local files...
INFO: Running stat() and reading/calculating MD5 values on 21726 files, this may take some time...
INFO: [1000/21726]
INFO: [2000/21726]
INFO: [3000/21726]
INFO: [4000/21726]
INFO: [5000/21726]
INFO: [6000/21726]
INFO: [7000/21726]
INFO: [8000/21726]
INFO: [9000/21726]
INFO: [10000/21726]
INFO: [11000/21726]
INFO: [12000/21726]
INFO: [13000/21726]
INFO: [14000/21726]
I am not a GitHub expert, but I believe I pulled down the correct s3cmd version.
I did "Download Zip" from here: https://github.com/s3tools/s3cmd/tree/master
I unzipped it and ran:
python setup.py install
s3cmd version 1.5.0-beta1
I took the diff of the latest commit on the GitHub s3cmd master branch and compared it against the corresponding file from the downloaded zip, and it matches, so unless I'm mistaken I should be using the version with this patch incorporated.
Mike
Try the upstream master branch now with --no-check-md5. This should disable all md5 calculations, thus also disable hardlinking and remote copying.
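Why disabling MD5 calculation also disables remote copying can be seen in a small, hypothetical sketch: if duplicate content is detected by comparing MD5 values, a file with no computed MD5 can never be matched, so it has to be uploaded. `plan_transfers` is an illustrative name, not part of s3cmd, and the real implementation may also match against objects already in the bucket.

```python
def plan_transfers(local_md5s):
    """Split files into uploads and server-side copy candidates.

    local_md5s maps path -> MD5 hex string, or None when MD5 was not
    computed (e.g. MD5 checking disabled). Files sharing an MD5 need only
    one upload; the others could be copied remotely from that first object.
    Files without an MD5 cannot be matched, so every one is uploaded.
    """
    uploads = []
    copies = []  # (destination_path, source_path) pairs
    first_seen = {}  # md5 -> first path uploaded with that content
    for path in sorted(local_md5s):
        md5 = local_md5s[path]
        if md5 is None:
            uploads.append(path)
        elif md5 in first_seen:
            copies.append((path, first_seen[md5]))
        else:
            first_seen[md5] = path
            uploads.append(path)
    return uploads, copies
```

For example, `plan_transfers({"a": "x", "b": "x", "c": "y"})` uploads `a` and `c` and plans a remote copy of `b` from `a`, while `plan_transfers({"a": None, "b": None})` uploads both and plans no copies.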
Thanks,
Matt
There's a bug I see (and that I created) in current upstream master: --no-check-md5 still does the file I/O on local files to get their md5sums, precisely so it can decide whether it can do remote copying. That's annoying.
This bug also means that --no-check-md5 won't disable remote copying, as you might expect it to. Since no one had asked to be able to disable remote copying, I never coded for it.
I'll think about this a bit. There's probably a cleaner way to solve both problems.
While this feature is fantastic, I can't find much detail on it in general. How can I disable it?
During initial uploads, at least, our DirectConnect link seems to copy the files themselves faster than s3cmd can tell S3 to "remote copy" objects.
Would that simply be the s3cmd --no-check-md5 switch?
Would this also be likely to reduce the RAM required to enumerate the source files?
Thanks,
Mike
------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
S3tools-general mailing list
https://lists.sourceforge.net/lists/listinfo/s3tools-general
--
"Every generation laughs at the old fashions, but follows religiously the new."-Thoreau