[Feature Request] Firebase Storage: Get list of files in a StorageReference

Kshitij Aggarwal

Jul 6, 2016, 2:43:41 PM
to Firebase Google Group
Can we get an API endpoint to fetch the list of all files under a particular storage reference? It would make it very easy to access files in bulk.

Mike Mcdonald

Jul 6, 2016, 10:02:54 PM
to Firebase Google Group
Hi Kshitij,

Thanks for the feature request. We considered this (and actually built it), but decided not to ship it for a few reasons:
  • Listing files returns "files" and "folders"--folders don't really have any meaning in our system. We could offer all files *at* a location, or we could return all files *at and below* a location.
    • For example, if you have files at the references: "users/mike/profile.png" and "users/mike/photos/001.png" through "users/mike/photos/999.png", would you expect a list to return "profile.png" and "photos/", or would you expect "profile.png" and "photos/001.png" through "photos/999.png"? Or maybe just "profile.png"?
    • These have radically different performance characteristics: one is effectively a recursive search through all objects, while the other is more straightforward.
  • How do we handle pagination? Traditionally, Firebase has eschewed pagination since it makes APIs hard to use, but if we don't provide pagination, it's highly likely that if you try to read a long list of files, you'll OOM your device (since we'd want to return a nice array of files to you). The other option here is to make it fire once for every file, like the database "on('child_added')", which is why we recommend just using the DB to store this information.
  • Files returned contain file metadata, which contains (among other things) public download URLs, which would give anyone able to list files the ability to download them.
    • We could return listed files without this URL, but then (in order to download the files), one would have to perform N network operations to get download URLs, then perform N downloads, vs just the N downloads.
    • We could also disallow lists if a user didn't have permission to view the files, or we could return a filtered list of files the developer has access to--depending on the use case, developers might expect different results. Which would you expect? In the above example, if you listed at "users/mike" and "profile.png" was public, but "photos/XXX.png" were all private, would you expect to see: 1) a 403 (unauthorized), 2) "profile.png", 3) "profile.png" and "photos/", or 4) "profile.png" and "photos/XXX.png" but with no download URLs? In the latter two cases, do those leak sensitive information (for instance, what if someone named their file something personally identifiable)?
  • List wouldn't be realtime: a file added during a list may or may not be included in the list. Plus list is eventually consistent, so it might not contain a file you just uploaded.
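
To make the "at" versus "at and below" question concrete, here is a minimal sketch over a flat set of object names. The function names `listShallow` and `listDeep` are made up for illustration and are not part of any Firebase SDK, and the photo list is trimmed to three entries. It only shows the shape of the two results; both functions scan an in-memory array, so it says nothing about the real performance difference.

```javascript
// Flat object names from the example above (photos trimmed to three for brevity).
const objectNames = [
  "users/mike/profile.png",
  "users/mike/photos/001.png",
  "users/mike/photos/002.png",
  "users/mike/photos/999.png",
];

// Shallow list: files directly at `prefix`, plus one entry per "folder" below it.
function listShallow(names, prefix, delimiter = "/") {
  const items = [];
  const prefixes = new Set();
  for (const name of names) {
    if (!name.startsWith(prefix)) continue;
    const rest = name.slice(prefix.length);
    const cut = rest.indexOf(delimiter);
    if (cut === -1) {
      items.push(rest); // no further delimiter: a file at this level
    } else {
      prefixes.add(rest.slice(0, cut + delimiter.length)); // collapse into a "folder"
    }
  }
  return { items, prefixes: [...prefixes] };
}

// Deep list: every object at or below `prefix`.
function listDeep(names, prefix) {
  return names
    .filter((name) => name.startsWith(prefix))
    .map((name) => name.slice(prefix.length));
}

// items: ["profile.png"], prefixes: ["photos/"]
console.log(listShallow(objectNames, "users/mike/"));
// ["profile.png", "photos/001.png", "photos/002.png", "photos/999.png"]
console.log(listDeep(objectNames, "users/mike/"));
```

The shallow form keeps the response size bounded by the number of direct children, while the deep form grows with everything stored under the prefix.
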
In general, we found that the primary use case for list semantics was getting an array of images and putting them in a list/table view--which we found to be far easier to do by storing and syncing file metadata (such as the download URL, size, etc.) in the Realtime Database. In my opinion, the only downside of this system is that the Storage <-> Database metadata synchronization is not automatic--at least, not yet. This can already be done one way via Google Cloud Functions, where you can trigger events (such as writes to the Realtime Database) when a file is uploaded, updated, or deleted. I have an example that takes an uploaded file, runs it through the Cloud Vision API, and writes the result to the Database, which then displays the file and the Cloud Vision results on a webpage.

All of that said, I would love to use this thread as an area for feedback on the API design: how would people expect this feature to work? What are the use cases where you can't use the Realtime Database (or another database) to store this information? What is the main impetus for downloading files in bulk? Would you instead want to compress them into a single archive and store that?

Thanks,
--Mike

P.S. And if you *really* want this API, you can use the Google Cloud Storage List API from the server side--one of the advantages of using a GCS bucket :)

James Spivey

Jul 29, 2016, 3:11:58 PM
to Firebase Google Group
My use case is that I have a company that has an image, and the company has stores, each with its own image. Currently, to keep the edit system simple, the image reference is stored on the company and on each store respectively. One thing that is nice is that if one deletes a company from the company list (fb.company.listOfCompanies), it becomes very easy to delete all the stores by simply deleting the UID of the company from the stores list (fb.stores.companyUid.listOfStores). The data is structured in much the same way for the images: it is all nested under the parent company UID in a "folder" (images.companyId).

Instead of having to loop through each store in the list of stores (or, if restructuring the data, getting a list of references stored separately from the store (fb.listOfImageRefs)) and then looping over that list deleting each item individually, it would be great to simply pass the reference to the parent "folder" and have it delete everything nested below that. Part of the love of Firebase is not needing any middleware, but currently it seems the only really efficient way to delete a company with, say, 1000 stores would be to send a request off to a middle tier that could do those deletes with some fault tolerance, because I would not have faith in a client doing that sort of looping and execution of commands.

Is there a way to structure the data, or a command that I am missing, that would make that process easier? This is my first project at real scale with Firebase, so it feels like it should be easier than looping over IDs endlessly. I do understand the point that the data isn't actually stored in folders, but what intrigued me was that your Angular-backed console was able to do this high-level delete. Does that function off a middle tier of sorts, or is it all client driven? Magic only sort of works for me! Thanks again for responding on Twitter and for the help!

I also vote for a simpler way to list and manipulate files in bulk. That is a central tenet of most file storage needs I've ever had.

~James

Samuel Hubbard

Sep 9, 2016, 4:30:29 PM
to Firebase Google Group
I too would love to see an easier way to pull groups of image URIs or paths, as it would make GridView population so much easier... or even cross-device file syncing (which is what I'm currently trying to figure out how to do).

Mike Mcdonald

Sep 9, 2016, 5:43:00 PM
to Firebase Google Group
There are two issues with building cross-device file syncing: 1) the built-in LIST API is eventually consistent (and wouldn't scale if it weren't), and 2) sync isn't a reasonable default for multi-GB (or TB) files.

1) means that you can't use it for a sync system (at least, not easily ;). The integration with the Realtime Database is a perfect example of how you can easily sync files across devices. It's a few extra lines of code, but far more flexible than anything we could easily and clearly provide in a file storage SDK.

2) in general, sync is a reasonable default behavior for small bits of JSON data, but I can't tell you how many times I turn off Drive/Dropbox/etc. sync because it's trying to move GB files around from one place to another--it's not a great raw model for files. We tried to build file sync into the Realtime Database (I did it the first time), but you run into all sorts of issues that the current file implementation doesn't (are files copied by value or reference, how do you garbage collect, how do you handle someone's concept of an upload vs a sync).

From SO, how to sync files using the Realtime DB. In this case, LIST/sync is just a DB "observe ChildAdded". Magic :)


Shared:

import UIKit
import Firebase

// Firebase services
var database: FIRDatabase!
var storage: FIRStorage!
...
// Initialize Database and Storage
database = FIRDatabase.database()
storage = FIRStorage.storage()
...
// Initialize an array for your pictures
var picArray = [UIImage]()

Upload:

let fileData = NSData() // get data...
let storageRef = storage.reference().child("myFiles/myFile")
storageRef.putData(fileData).observeStatus(.Success) { (snapshot) in
  // When the image has successfully uploaded, we get its download URL
  let downloadURL = snapshot.metadata?.downloadURL()?.absoluteString
  // Write the download URL to the Realtime Database
  let dbRef = database.reference().child("myFiles/myFile")
  dbRef.setValue(downloadURL)
}

Download:

let dbRef = database.reference().child("myFiles")
dbRef.observeEventType(.ChildAdded, withBlock: { (snapshot) in
  // Get download URL from snapshot (skip children that aren't strings)
  guard let downloadURL = snapshot.value as? String else { return }
  // Create a storage reference from the URL
  let storageRef = storage.referenceForURL(downloadURL)
  // Download the data, assuming a max size of 1MB (you can change this as necessary)
  storageRef.dataWithMaxSize(1 * 1024 * 1024) { (data, error) -> Void in
    // Bail out if the download failed or the data isn't a valid image
    guard let data = data else { return }
    guard let pic = UIImage(data: data) else { return }
    // Add the UIImage to the array
    picArray.append(pic)
  }
})

Thanks,
--Mike

Thaina Yu

Jul 30, 2017, 9:59:05 AM
to Firebase Google Group
This is a serious flaw in the API design. Lacking a way to list what is inside Storage should be considered a bug in the API, and having to rely on the database for data that already exists is hard to understand.

That said, someone more clever figured out that Firebase uses a Google Cloud bucket, so you can use the Google Cloud API to getFiles.

If you want an API, you should just add the API that Google already uses to the Firebase SDK. It's as easy as that, and not as stupid as this.

Firebase has too many flaws, to the point that I can't recommend it to my company. I can only use it for my own small projects.

Kato Richardson

Jul 31, 2017, 1:38:02 PM
to Firebase Google Group
Hey Thaina,

Thanks for the feedback. We do love hearing from our community. Particularly when we fail to meet your needs.

In this particular case, I'm surprised to learn that this is such a blocker for you, given that it can be achieved using the GCS SDK as you already mentioned. I'd expect that to be a minor annoyance rather than a reason to flame the product as a whole. I'll definitely take that feedback to the engineers and ensure your thoughts are heard and discussed. Thanks again for this.

If you have other ideas of where we could do better, do submit feedback; this is a large part of what has made Firebase a successful platform.

☼, Kato

--

Kato Richardson | Developer Programs Eng | kato...@google.com | 775-235-8398

Mike McDonald

Jul 31, 2017, 3:03:53 PM
to Firebase Google Group
Thaina (and other interested folks),

We've heard your feedback on this (it's the most viewed Storage question on StackOverflow) and it's something we're working on. There are a few issues to be solved here:
  • How do we properly secure listed files? List provides file metadata, which in turn contains download URLs.
    • This means that List implies Read access, which might not be intuitive to some developers.
    • Does this mean that we have to ensure that a developer has access to read every single file the list covers? If a user doesn't have permission to read one of the files, do we return everything, nothing, or a filtered subset? This could potentially be very expensive (computationally as well as financially). 
    • At present "allow read" would provide "allow get, list", which would potentially allow listing files at a location that developers don't desire. This means we're either migrating Rules or dealing with semantics that are different across products.
  • Do we provide an array of metadata, or an iterator (like the Realtime Database)? Do we offer pagination or keep everything in memory? Pagination sucks for beginning developers, but we've got to support developers with thousands/millions/billions of files.
  • Is list recursive? This has the same performance and security implications as above, but taken to the next level.
Current thought on 1 is that we cleanly separate listing files from getting files, and return everything if you have list permission (even if getting an individual file would fail):

service firebase.storage {
  match /b/{bucket}/o {
    match /users/{userId}/{fileId} {
      // allows a user to fetch a particular file
      allow get: if request.auth.uid == userId;
      // allows a user to list all files/folders prefixed with /users/$(request.auth.uid)
      allow list: if request.auth.uid == userId;
    }
  }
}

We think this will suit the common use case well (e.g. get all of a user's files), though I'd love to hear a use case that this doesn't work well with.
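
As a rough illustration of what "list and get are checked separately" could mean in practice, the sketch below models the idea in plain JavaScript. Everything here is hypothetical: the in-memory `files` object, the `list`/`get` helpers, and the `openListing` rule set (which deliberately loosens the list rule compared with the rules above) exist only to show that a caller with list permission sees every name even when fetching an individual file would fail.

```javascript
// Hypothetical in-memory bucket; names and contents are illustrative only.
const files = {
  "users/mike/profile.png": "mike's avatar bytes",
  "users/mike/resume.pdf": "mike's resume bytes",
  "users/kato/profile.png": "kato's avatar bytes",
};

// Extract {userId} from a path shaped like users/{userId}/...
function userIdOf(path) {
  const match = /^users\/([^/]+)\//.exec(path);
  return match ? match[1] : null;
}

const isOwner = (auth, userId) => auth !== null && auth.uid === userId;
const isSignedIn = (auth) => auth !== null;

// Listing checks only the `list` rule; it never consults `get` per file.
function list(rules, auth, prefix) {
  const userId = userIdOf(prefix);
  if (userId === null || !rules.list(auth, userId)) {
    throw new Error("403: list denied");
  }
  return Object.keys(files).filter((name) => name.startsWith(prefix));
}

// Fetching checks only the `get` rule.
function get(rules, auth, path) {
  const userId = userIdOf(path);
  if (userId === null || !rules.get(auth, userId)) {
    throw new Error("403: get denied");
  }
  return files[path];
}

const ownerOnly = { list: isOwner, get: isOwner };      // mirrors the rules above
const openListing = { list: isSignedIn, get: isOwner }; // assumption: looser list rule

const kato = { uid: "kato" };
// With the looser list rule, kato can see the names under users/mike/ ...
console.log(list(openListing, kato, "users/mike/"));
// ... but still cannot fetch any of them.
try {
  get(openListing, kato, "users/mike/profile.png");
} catch (err) {
  console.log(err.message); // 403: get denied
}
```

With `ownerOnly`, both calls fail for kato, which matches the common case of users seeing only their own files.
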

On the second, it's likely we'll provide an array instead of the iterator implementation (as the latter can easily be built on the former):

// currently favored design
// array style return
// (a Promise resolves with a single value, so the page is shown as one object)
ref.list(optionalPageToken).then(({metadatas, nextPageToken}) => {
  // metadatas = [{name: "file1.txt", ...}, {name: "file2.txt", ...}];
});

// currently not favored
// "iterator" style list listener
ref.list().onFile(metadata => {
  // metadata = {name: "file1.txt", ...}
});

Again, we think the first design is easier for people to understand and work with, though I'd appreciate feedback one way or the other.
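
To show why the iterator style can be layered on the array style, here is a small self-contained sketch. It is not the proposed SDK: `listPage` and `listEach` are hypothetical names, the data is an in-memory array of 25 fake file names, the page token is simply a start index encoded as a string, and everything is synchronous for brevity (a real client would return a Promise per page).

```javascript
// Hypothetical backing data: file01.txt through file25.txt.
const allNames = Array.from(
  { length: 25 },
  (_, i) => `file${String(i + 1).padStart(2, "0")}.txt`
);

// Array-style page fetch. Assumption: the page token is a start index as a string.
function listPage(pageToken, pageSize = 10) {
  const start = pageToken === undefined ? 0 : Number(pageToken);
  const metadatas = allNames
    .slice(start, start + pageSize)
    .map((name) => ({ name }));
  const next = start + pageSize;
  return {
    metadatas,
    nextPageToken: next < allNames.length ? String(next) : undefined,
  };
}

// Iterator-style listing built on top of listPage: only one page is held at a time.
function* listEach(pageSize = 10) {
  let pageToken;
  do {
    const page = listPage(pageToken, pageSize);
    yield* page.metadatas;
    pageToken = page.nextPageToken;
  } while (pageToken !== undefined);
}

let count = 0;
for (const metadata of listEach()) {
  count += 1; // each metadata looks like {name: "file01.txt"}
}
console.log(count); // 25
```

Going the other way (building pages out of a per-file listener) would require buffering, which is one reason to treat the array form as the primitive.
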

On recursive: the answer will likely be no. File system `ls` isn't recursive (though commands like `tree` are), so this matches general expectations (even though, to be clear, Storage is *not* a filesystem).
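
For anyone who does need a recursive walk, the `ls`/`tree` analogy suggests it can be built client-side from the non-recursive call. The sketch below assumes a hypothetical `ls(prefix)` that returns direct files plus immediate "folder" prefixes; the object names and the `lsCalls` counter are illustrative only. It also shows the cost: one list call per "folder".

```javascript
// Hypothetical flat object names with a few levels of nesting.
const names = [
  "users/mike/profile.png",
  "users/mike/photos/001.png",
  "users/mike/photos/albums/2016/a.png",
];

let lsCalls = 0;

// Non-recursive listing: files directly at `prefix` and immediate "folder" prefixes.
function ls(prefix) {
  lsCalls += 1;
  const items = [];
  const prefixes = new Set();
  for (const name of names) {
    if (!name.startsWith(prefix)) continue;
    const cut = name.indexOf("/", prefix.length);
    if (cut === -1) {
      items.push(name);
    } else {
      prefixes.add(name.slice(0, cut + 1));
    }
  }
  return { items, prefixes: [...prefixes] };
}

// Recursive walk built on ls: list this level, then descend into each prefix.
function tree(prefix) {
  const { items, prefixes } = ls(prefix);
  return items.concat(...prefixes.map((child) => tree(child)));
}

console.log(tree("users/mike/")); // all three object names
console.log(lsCalls); // 4 (one call per "folder" visited)
```
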

For developers who want to query files (e.g. get me all files owned by a certain user), we still recommend storing file references in the Realtime Database (or some other database), as this feature will definitely not include additional query semantics (they're unsupported by GCS).

If you're interested in providing developer feedback on the API or testing this feature out let me know and we'll get you in our EAP. I believe you can apply for the Alpha program here, which will get you in the pipeline.

Thanks,
--Mike

Thaina Yu

Aug 1, 2017, 1:53:54 PM
to Firebase Google Group
First, I appreciate the open, detailed answer instead of a closed "we're taking feedback" reply, so we can consider it and hope for the future.

On security, I think Storage rules already cope with that. Read access on the folder itself should cover listing, but the download link should be governed by the permission on the file itself. Listing only tells you what exists; it does not mean you can read it, so it isn't really related to read access. The metadata should just state that: there is a reference here, you should not overwrite it, and you can read it if you have permission.

I mean, even if you don't let users know which files exist at a location, if they have permission on that folder they can still try to write a file with the same key and find out that something is there that they have no permission for anyway.

I think it is natural and intuitive for a storage protocol to be able to list file names, even for files you have no access to.

For pagination, I also prefer Promise<{arr,pageID}>. That pattern is widely used for querying the list of files in a "bucket" by "prefix" and getting everything with the same prefix. Both AWS S3 and GCS use it, so it isn't strange.

Seriously, I think we are fine with limited functionality. But not being able to work directly with data that already exists there is just too disappointing. When I upload a folder and the only way to know that everything uploaded properly is to check the Firebase console, it makes me feel that something is wrong here.

Lastly,
I have been thinking for a while that, if the current implementation of Firebase Storage makes us rely on the database, then why not put storage file references in the database as part of the storage service itself?

I am a fan of the Cloudant database, which uses CouchDB. The three features that I would say it does better than the Firebase DB are MapReduce, Lucene index search, and attachments. Here I want to point at attachments.

If the Firebase Storage workaround is to keep a reference in the database, then there is no reason not to have a native feature that lets us put an attachment as an object into the Firebase database, converts it to a standardized reference, and stores the content in Firebase Storage.

You sell these services together anyway, so this should be native, and it benefits us both to have it maintained as efficiently as possible. I mean, if deleting a reference from the database also deleted the file in Storage, that would be really convenient for us. And we wouldn't need to maintain reference-tracking code that would be the same in every project.