Discussion:
[S3tools-general] Files from and meaning of requests in pricing.
Russell Gadd
2015-03-22 20:58:57 UTC
I'm wondering if someone could help explain:

1. Can you tell me if --files-from is an available option for the ls
command? I've experimented to find out, but without success. (Example: s3cmd
-r --files-from=testlist.txt ls s3://xyztestbucket.) Probably not, but I
just wanted to check; it's not clear in the documentation, although I
suspect most people haven't got a use for it. So please confirm
that --files-from doesn't apply to ls, or else tell me how to specify the
command and the list of files (i.e. is s3://bucket-name required at the
front of each file?). In my proposed usage it would be useful, as it would
verify the existence of specific files. If it's not available I will have to
issue one command per file, unless I list the whole lot, since I'm not using
folders.

2. I'm not sure of the meaning of "requests" in the pricing of get or list
requests, which for EU-West is $0.004 / 1,000 requests.
Does this mean $0.004 for a request which returns 1,000 file names, or
literally 1,000 lists, each of which could return any number of filenames?
Actually it's probably small beer for my usage, but it would be nice to know.

Russell
Jeremy Wadsack
2015-03-22 21:10:49 UTC
Requests are the latter, sort of. An 'ls' command is one request regardless
of how many results it returns (up to 1,000). A 'get' request is a download
of a single file, although a large file may be downloaded in several parts,
in which case each part is a request. To get a better idea, you can read the
details of how the S3 API works:
http://docs.aws.amazon.com/AmazonS3/latest/API/APIRest.html

I don't know about your other question; I'll leave that for Matt or someone
more familiar with the code.

Jeremy Wadsack
------------------------------------------------------------------------------
Dive into the World of Parallel Programming: The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
S3tools-general mailing list
https://lists.sourceforge.net/lists/listinfo/s3tools-general
Matt Domsch
2015-03-22 21:22:55 UTC
[ls] doesn't honor the --files-from option. [ls] simply asks S3 for all
the files in a bucket, possibly recursively, starting from a given prefix.

Jeremy is correct that it doesn't matter whether a request returns 0 bytes
or a list of 1,000 objects; it's counted as one request. Most operations
have a limit on the number of items they can operate on (e.g. list bucket
and multiple-object delete have a limit of 1,000 objects per
operation/request). If, though, given a list of 1,000 objects, we do a
metadata HEAD request for each object, then you'll have made 1,001
requests. (We don't get metadata for every object any more, though, only
when we need it.)
Russell Gadd
2015-03-22 23:39:32 UTC
Thanks Matt. I suspected --files-from wouldn't work with ls.

But your mention of a limit of 1,000 on list operations worries me, as my
plan was to put about 25,000 files into one folder, each named with just
its 32-hex-character MD5 and no subfolder hierarchy. It would seem that I
could then not get a list of all the objects. I was totally unaware of this
arbitrary limit.

(I'm thinking aloud now.)
One method would perhaps be to issue 256 requests, one for each
two-character prefix from 00 to ff. I have about 30,000 files, so this
would give me an average of about 120 files per subset, although there
could be an outlier two-character prefix with more than 1,000 files. In
that case would I get just 1,000 results? And I don't think there'd be an
obvious way to get the remaining files. However, if they came in
alphabetical order, perhaps I'd only have to issue a few more requests to
get files beginning, say, xyd to xyf, assuming I already had all xy0 to
xyc. Sounds like an exercise in recursive algorithms. Actually, if done
right, it will be much less than 256 requests.

Maybe I need to rethink, but I won't be looking for a complete list very
often, so perhaps it might be OK.
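The 256-prefix idea can be sanity-checked locally: group MD5-style names by their first two hex characters and see how the subsets fall out. The names below are synthetic stand-ins (hashes of a counter), not real object keys:

```python
from collections import Counter
import hashlib

# Synthetic stand-ins for ~30,000 MD5-named files.
names = [hashlib.md5(str(i).encode()).hexdigest() for i in range(30000)]

# One listing request per two-hex-character prefix, 00 through ff.
prefixes = [f"{i:02x}" for i in range(256)]
per_prefix = Counter(name[:2] for name in names)

average = len(names) / len(prefixes)  # about 117 names per prefix
print(f"{len(prefixes)} prefixes, ~{average:.0f} names each on average, "
      f"largest subset: {max(per_prefix.values())}")
```

With an even hash distribution, no two-character prefix gets anywhere near 1,000 names at this scale, so each prefix would fit in a single listing response.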

Russell
Matt Domsch
2015-03-22 23:45:45 UTC
The 1,000-per-request limit doesn't limit the number of objects in a bucket
or their names. It's solely there so the REST API doesn't get bogged down
trying to return a million objects in one list. When a bucket has more than
1,000 files, S3 returns the first 1,000 along with an <IsTruncated/> tag
and a marker indicating where a subsequent call should resume; you just
have to issue further calls starting from the marker.
s3cmd does this automatically for 'ls' and most list operations. It is
broken when listing the parts of a multipart upload with more than 1,000
parts for a given object; there it doesn't issue the subsequent calls.
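The truncation-and-marker loop described above can be sketched with a toy stand-in for the list-bucket call. The function below only simulates S3's behavior over a sorted key list; it is not the real API:

```python
def fake_list_bucket(all_keys, marker="", max_keys=1000):
    """Toy stand-in for S3 list-bucket: return up to max_keys keys
    lexicographically after `marker`, plus an is_truncated flag."""
    remaining = [k for k in sorted(all_keys) if k > marker]
    page = remaining[:max_keys]
    return page, len(remaining) > max_keys

def list_all(all_keys, max_keys=1000):
    """Follow markers until is_truncated is false, as s3cmd does for 'ls'."""
    keys, marker, truncated = [], "", True
    while truncated:
        page, truncated = fake_list_bucket(all_keys, marker, max_keys)
        keys.extend(page)
        if page:
            marker = page[-1]  # the next call resumes after the last key seen
    return keys

demo = [f"{i:05d}" for i in range(2500)]
assert list_all(demo) == sorted(demo)  # 2,500 keys retrieved in 3 requests
```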
Russell Gadd
2015-03-23 16:57:59 UTC
OK, so it appears I don't have a problem with the 1,000-item limit of the
REST API's list facility if I use s3cmd, since s3cmd isn't restricted by
it. Regarding multipart upload, I won't be needing more than 1,000 parts,
as I will leave the 15 MB chunk size alone and I don't have any files
anywhere near as big as 15 GB.
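A quick check of that arithmetic (decimal units, chunk size as stated above):

```python
# With the 15 MB multipart chunk size left alone, a file only exceeds
# the 1,000-part listing limit past this size ceiling.
chunk_mb = 15
max_parts = 1000
ceiling_gb = chunk_mb * max_parts / 1000  # MB -> GB
print(f"files up to ~{ceiling_gb:.0f} GB stay within {max_parts} parts")
```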

Thanks for the responses. There's a remaining question about MD5s which you
might be able to answer - please see my other thread (with the missing
subject line).

Russell