Hi, how do I explain this? Using S3QL with a local:// storage URL it creates a folder structure on the filesystem, but using S3QL with swift:// it just creates files under one folder: all the data, metadata and passphrase objects go into the same folder.
The problem is that I'm using it in the cloud via Swift, and there is a limit of 50,000 files per folder; at least the web page says it's better not to go over that (I think it would still work, but what if it breaks?). I'm using containers of 50M, but it grows fast and I'm going to hit that limit very quickly! My space is more than 10 TB, so it is growing FAST. I have created 3 separate filesystems, but I can't do more; the cache is on my VPS server (whose disk space is running out), since I need to keep the cache over 1 GB.
It works nicely, no problems at all, but is there any possibility to change that to folders? I know I would need to re-do everything afterwards, and moving the files around will take a week even on a 150 Mbit connection... but better now than too late. It would also be OK if I had to create the folders manually; with one folder each I could store more than 50,000 files.
Thanks in advance, and happy Xmas! :)
Hi Isaac, Hi Riku,
Are you sure that this limit affects you? Did you try creating more than 50,000 data blocks?
AFAIK S3QL does not do directory/bucket/container listings on the Swift backend. So it does not matter if there are thousands or millions of data blocks on your Swift storage.
[...] I think you guys should add the directory feature to the filesystem backend files, or at least add a parameter to --backend-options, specific to OpenStack, that allows creating directories automatically, just to work around the 50K files limit. [...]
There is no such thing as a folder/directory in Swift terminology. Swift is (the same as S3 and all other object storages) a flat filesystem. You can have many buckets (in Swift they are called containers), but in each bucket you can only store files (or objects, in object storage terminology), so a recursive directory structure is not possible. Swift clients can (and most of them do) opt to show you a virtual folder structure underneath a container. If an object is named “folder/structure/file.name”, the Swift client can show you a virtual folder structure with one top level folder “folder” that has a sub-folder “structure” and a file “file.name” in that sub-folder.
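For illustration (the endpoint, account and container names here are made up, and $TOKEN stands for a valid auth token), you can see that virtual structure yourself by listing a container with a delimiter, e.g. with curl:

curl -s -H "X-Auth-Token: $TOKEN" \
  "https://swift.example.com/v1/AUTH_myaccount/mycontainer?delimiter=/&format=json"
# An object named "folder/structure/file.name" then only shows up as a
# pseudo-directory entry {"subdir": "folder/"}; repeating the request with
# &prefix=folder/ "descends" one level into that virtual folder.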
I suspect the 50,000 files per directory limit is not a limit of the hubic backend itself but of one or more of the hubic clients (probably the web app or the sync client). If you use that hubic account for nothing else but S3QL filesystems, you should be OK. Otherwise you might want to experiment with the “prefix” option of S3QL to put all S3QL files into a virtual folder of your hubic store, and then not touch this “folder” with any other application that accesses your hubic account.
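For example (the hostname and container name below are made up; check the S3QL documentation for the exact storage URL syntax of your backend), the prefix is simply appended to the storage URL:

# everything this filesystem stores ends up under the virtual folder
# "s3ql-data/" inside the container "mycontainer"
mkfs.s3ql swift://auth.example.com/mycontainer/s3ql-data/
mount.s3ql swift://auth.example.com/mycontainer/s3ql-data/ /mnt/s3ql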
I have had an S3QL file system on OVH Object Storage (the “pro” version of hubic – https://www.ovh.co.uk/cloud/storage/object-storage.xml) with approx. 3 million data blocks / Swift objects in the container (but only approx. 400 GB of storage; many small and very well compressible files). The Swift backend did not mind that many objects in a single container. I’m currently deleting the filesystem – that involves doing a listing of all objects and issuing an HTTP DELETE request for each of them. This takes many hours and is not finished yet, but otherwise it works OK.
I opted not to use that filesystem anymore because S3QL and the backup application that filled it up (Burp, http://burp.grke.org/) are not a good fit. In the end the filesystem had 16 million directory entries and 1.5 million inodes (Burp uses hard links excessively), and the SQLite database that S3QL uses to store the filesystem structure was 1.2 GB uncompressed. An s3qlstat or df on that filesystem took several seconds because of the huge database size. Also, S3QL does not scale very well with parallel file accesses, but Burp does a ton of those. (The SQLite database is not thread safe, and thus every read/write access to the database gets serialized by S3QL.)
Now I use Bareos and 4 (possibly more in the future) different S3QL mounts (on 4 different containers, but you could do the same on one container with a different prefix for every filesystem). Bareos distributes the read/write access over the 4 S3QL mounts, and backing up the same data as before, the combined uncompressed database size of all 4 S3QL filesystems is only 10 MB.
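The single-container variant could look roughly like this (all names are made up; each prefix holds a completely independent S3QL filesystem):

for n in 1 2 3 4; do
    mount.s3ql "swift://auth.example.com/backups/bareos$n/" "/mnt/bareos$n"
done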
Hi Isaac,
(I will reply on the list since I suspect your answer was supposed to go there, too?)
Isaac Aymerich wrote:
No, I didn't try, but in the hubic documentation it is defined as a limitation. Anyway, I will try to create 100K 1 KB files this afternoon to find out whether it really is a limitation or only a recommendation.
Keep in mind: S3QL does data de-duplication on block level. Creating 100K 1KB files with the same content will only create one single object in the Swift backend. You need to create 100K files with different contents.
Something simple like this should do it:
cd /path/to/your/S3QL/mountpoint
# every file gets its own number as content, so S3QL's de-duplication
# cannot collapse them into a single object
for i in {1..1000000}; do echo $i > $i; done
And about the file listing... I suppose then S3QL has a little database with the names of the binary data files?
Yes, have a look at http://www.rath.org/s3ql-docs/impl_details.html where Nikolaus explains some details about S3QL’s inner workings.
Well, what about the reason that it doesn't work well with S3QL (which I believe prompted you to start this thread)?
Nikolaus Rath wrote:
On Dec 09 2015, Daniel Jagszent <dan...@jagszent.de> wrote:
> AFAIK S3QL does not do directory/bucket/container listings on the Swift
> backend. So it does not matter if there are thousands or millions of
> data blocks on your Swift storage.
Well, fsck.s3ql does such listings. But they are paginated, so there should be no issues.
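Roughly, such a paginated container listing looks like this at the Swift HTTP level (endpoint and container names are made up, $TOKEN stands for a valid auth token): the server returns at most a fixed number of names per request (typically 10,000), and the client passes the last returned name back as the marker to fetch the next page.

curl -s -H "X-Auth-Token: $TOKEN" \
  "https://swift.example.com/v1/AUTH_myaccount/mycontainer?limit=10000"
# ...then repeat with &marker=<last object name of the previous page>
# until the response comes back empty.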
I can confirm that. Due to stupidity (wanting to increase the nofile limit with ulimit -n but actually setting a hard limit on the file size with ulimit -f) I once got the SQLite database corrupted for that big file system. After re-creating the database (with the sqlite command line tool) I naturally needed to run fsck.s3ql on the file system. It took some time but worked flawlessly.
> [...] In the end the filesystem had 16 million directory entries and
> 1.5 million inodes (Burp uses hard links excessively), and the SQLite
> database that S3QL uses to store the filesystem structure was 1.2 GB
> uncompressed.
This is not unreasonable though. Note that ext4 would require at least 5 GB of metadata as well - just to store the inodes (assuming 4096 bytes inode size). That's not yet counting directory entry *names*.
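For reference, the arithmetic behind that estimate: 1.5 million inodes × 4096 bytes per inode ≈ 6.1 GB (about 5.7 GiB), so the inode table alone already exceeds 5 GB.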
Sure. The size of the SQLite database is reasonable for that many inodes/directory entries. But I suspect that ext4 will scale better in terms of execution time for normal operations, e.g. file system stats (df). S3QL needs to do several full table scans for that, and this takes its time for tables that big (in my case approx. 10 seconds).
> Also, S3QL does not scale very well with parallel file accesses, but Burp
> does a ton of those. (The SQLite database is not thread safe, and thus
> every read/write access to the database gets serialized by S3QL.)
Both are true, but one is not the cause of the other. Most reads/writes don't require access to the database and could run in parallel. However, S3QL itself is mostly single-threaded at the moment, so the requests are indeed serialized.
Thanks for the clarification!
However, I have plans in the drawer to fix this at some point. The idea is to handle reads/writes for blocks that are already cached entirely at the C level. This will allow concurrency *and* boost single-threaded performance at the same time. Just need to find the time...
That sounds great! (More performance always does :) )
Am I right in assuming that this will speed up read/write syscalls, but not operations that solely work on the database (like opendir or the attr and xattr calls)?
Trying to upload 100K files I'm getting data corruption in S3QL; at some point S3QL crashes because hubic closes the connections, and I lose all the data since the last metadata backup :/ This is part of the log: