Discussion:
[S3tools-general] how to disable "remote copy" feature
WagnerOne
2014-03-12 22:14:39 UTC
While this feature is fantastic, I can't find a lot of detail on it in general. I wonder how to disable it?

During initial uploads at least, our DirectConnect link seems to be faster in copying the files themselves than s3cmd is at telling S3 to "remote copy" objects.

Would that simply be a matter of using the s3cmd switch --no-check-md5?

Would that also be likely to reduce the RAM required to enumerate the source files?

Thanks,
Mike
Matt Domsch
2014-03-13 02:12:59 UTC
There's a bug I see (and I created) in current upstream master, where
--no-check-md5 will still do the file I/O on local files to get the md5sums
for them, exactly to decide if it can do remote copying. That's annoying.

This bug also means --no-check-md5 won't, as you might expect, disable
remote copying. As no one has asked to be able to disable remote copying,
I never coded for it.

I'll think about this a bit. There's probably a cleaner way to solve both
problems.
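
For readers unfamiliar with the feature, here is a minimal Python sketch of how an md5-driven "remote copy" decision could work, and why skipping md5 calculation also disables it. This is not s3cmd's actual code: plan_transfers, md5_of and the remote_md5_to_key mapping are hypothetical names invented for the illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_transfers(local_root, remote_md5_to_key, check_md5=True):
    """Split local files into plain uploads and server-side ("remote") copies.

    remote_md5_to_key is a hypothetical dict: md5 hex digest -> existing remote key.
    With check_md5=False no file content is read, so no remote copy can be found.
    """
    uploads, remote_copies = [], []
    for path in sorted(Path(local_root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(local_root).as_posix()
        if check_md5:
            source_key = remote_md5_to_key.get(md5_of(path))
            if source_key == rel:
                continue  # identical object already at this key, nothing to do
            if source_key is not None:
                remote_copies.append((source_key, rel))
                continue
        uploads.append(rel)
    return uploads, remote_copies
```

In this sketch the md5 lookup is the only way a remote copy is ever chosen, so turning off md5 checks falls back to uploading every file, which matches the behaviour asked for above.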
------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
S3tools-general mailing list
https://lists.sourceforge.net/lists/listinfo/s3tools-general
Matt Domsch
2014-03-13 23:32:20 UTC
Try the upstream master branch now with --no-check-md5. This should
disable all md5 calculations, and thus also disable hardlinking and remote
copying.

Thanks,
Matt
WagnerOne
2014-04-03 18:14:59 UTC
Hi Matt,

Sorry for the delay in testing and responding. I appreciate the effort you have put into this to date.

When I run an s3cmd sync like so:

s3cmd sync --no-preserve --no-check-md5 --verbose --progress /blah/ s3://blah/

I see this in the output, which seems to indicate it is still generating md5 values:

INFO: Compiling list of local files...
INFO: Running stat() and reading/calculating MD5 values on 21726 files, this may take some time...
INFO: [1000/21726]
INFO: [2000/21726]
INFO: [3000/21726]
INFO: [4000/21726]
INFO: [5000/21726]
INFO: [6000/21726]
INFO: [7000/21726]
INFO: [8000/21726]
INFO: [9000/21726]
INFO: [10000/21726]
INFO: [11000/21726]
INFO: [12000/21726]
INFO: [13000/21726]
INFO: [14000/21726]

I am not a github expert, but I believe I pulled down the correct s3cmd version.

I did "Download Zip" from here: https://github.com/s3tools/s3cmd/tree/master

Unzipped it and did

python setup.py install

The s3cmd --version output is:
s3cmd version 1.5.0-beta1

I grabbed the latest github s3cmd master branch commit diff and compared it to the corresponding file from the downloaded zip, and it matches. So unless I am mistaken, I think I am using the version with this patch incorporated.

Mike
--
***@wagnerone.com
"Every generation laughs at the old fashions, but follows religiously the new."-Thoreau
WagnerOne
2014-04-03 18:30:11 UTC
Ah, I believe it is merely a matter of how s3cmd words its feedback.

When I removed --no-check-md5, the delay during the INFO file-gathering phase was significantly longer (indicating to me that it was indeed then calculating md5 values).

So, with --no-check-md5 it is just doing the stat() work.
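
The difference between the two runs can be pictured with a small Python sketch. It is an illustration under assumptions, not s3cmd's implementation: enumerate_files and its check_md5 parameter are hypothetical. The stat() call only touches file metadata, while the md5 pass has to read every byte of every file, which is where the extra time goes.

```python
import hashlib
import os


def enumerate_files(root, check_md5=True):
    """Return {relative_path: info}. info always holds size and mtime from
    stat(); an "md5" entry is added only when check_md5 is True."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)  # cheap: metadata only
            info = {"size": st.st_size, "mtime": st.st_mtime}
            if check_md5:
                # expensive: reads the whole file from disk
                digest = hashlib.md5()
                with open(full, "rb") as handle:
                    for chunk in iter(lambda: handle.read(1 << 20), b""):
                        digest.update(chunk)
                info["md5"] = digest.hexdigest()
            result[os.path.relpath(full, root)] = info
    return result
```

Calling enumerate_files(path, check_md5=False) on a large tree and comparing the wall-clock time against the default call should show the same kind of gap described above.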

BTW, I really appreciate the "INFO: " addition!

Mike
--
***@wagnerone.com
"A bad Dead show is better than a good day at work."
WagnerOne
2014-04-03 19:14:38 UTC
Hi,

I believe I may have uncovered a bug regarding using an IAM role vs. a user key/secret key combination.

The instance I'm using s3cmd on has an IAM role allowing it write access to an s3 bucket.

At some point after having that IAM role assigned, it was deprecated in favor of using a user account's keys.

I noticed s3cmd was still using the IAM role and went to --configure it again.

When running --configure, s3cmd grabbed what I am guessing are the automatically generated IAM role keys as the defaults. I overwrote those with the given user's account keys.

When it came time in the --configure process to test the config:

Test access with supplied credentials? [Y/n] y
Please wait, attempting to list all buckets...
ERROR: Test failed: 400 (InvalidToken): The provided token is malformed or otherwise invalid.

I saved the config anyway and was able to do s3 operations with the account keys.

The access_token in my old config file (used with the IAM role) and in the new config file (with the account keys) is the same. So it would appear that access_token is still being generated from the IAM role even when I enter the user keys at s3cmd --configure time.
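
A quick way to check for this condition is to compare the two config files directly. The following is a minimal Python sketch, assuming both files use the INI layout s3cmd writes (a [default] section with access_key, secret_key and access_token entries); read_credentials and stale_token are hypothetical helper names, and the file paths in the usage note are placeholders.

```python
import configparser

FIELDS = ("access_key", "secret_key", "access_token")


def read_credentials(path, section="default"):
    """Read the credential fields from an INI-style config file.
    Missing fields come back as empty strings."""
    # interpolation=None so '%' characters in secrets are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as handle:
        parser.read_file(handle)
    return {name: parser.get(section, name, fallback="") for name in FIELDS}


def stale_token(old_cfg, new_cfg):
    """True when access_key changed but a non-empty access_token was
    carried over unchanged from the old config."""
    old = read_credentials(old_cfg)
    new = read_credentials(new_cfg)
    return (
        bool(new["access_token"])
        and old["access_token"] == new["access_token"]
        and old["access_key"] != new["access_key"]
    )
```

For example, stale_token("s3cfg.old", "s3cfg.new") returning True would match the situation described here: new user keys paired with a token left over from the IAM role.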

Mike
