[Rails 6] Bug in caching key generation when working with Active Record collections?


Marc Köhlbrugge

Oct 24, 2019, 8:52:22 PM
to Ruby on Rails: Core
I'm running into a problem when using fragment caching with Active Record collections.

Rails 6 uses a combination of `collection.cache_key` and `collection.cache_version` to determine whether a fragment cache is fresh. However, there is a scenario where the collection changes while `collection.cache_key` and `collection.cache_version` stay the same.

It happens when you use a limit on the collection and then destroy one of those records, as long as it's not the most recently updated one. `collection.cache_key` stays the same because the underlying query does not change. And `collection.cache_version` stays the same as well: because of the limit, another record slides into the collection, so the count remains unchanged, as does the maximum `updated_at` timestamp of the most recent record.

I've built a sample app for demonstration purposes:


The README describes how to reproduce the issue via the website itself or the Rails console.
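In short, the reproduction looks roughly like this (assuming a `Post` model with default timestamps and more than five records):

```ruby
posts = Post.order(created_at: :desc).limit(5)

key     = posts.cache_key      # based on a digest of the SQL, so it won't change
version = posts.cache_version  # count + max updated_at

# Destroy a record inside the window that is NOT the most recently updated
# one. A sixth record slides into the window, so the count stays at 5 and
# the maximum updated_at still belongs to the untouched newest record.
posts.second.destroy

posts = Post.order(created_at: :desc).limit(5)
posts.cache_key     == key      # => true
posts.cache_version == version  # => true, even though the collection changed
```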

Would this be considered a bug or expected behavior? Are there any known workarounds?

richard schneeman

Oct 27, 2019, 9:55:18 AM
to rubyonra...@googlegroups.com
Sounds like a bug. Can you move this into an issue?



--
Richard Schneeman
https://www.schneems.com

Marc Köhlbrugge

Oct 27, 2019, 10:41:27 AM
to Ruby on Rails: Core
👍 https://github.com/rails/rails/issues/37555

I did find a few similar issues, but they all got marked as stale.


I'm surprised this doesn't appear to be a bigger problem. Perhaps I'm not supposed to fragment cache my paginated results. Using collection caching (`render partial: "post", collection: @posts, cached: true`) might be performant enough.


Aaron Lipman

Oct 28, 2019, 1:22:43 PM
to rubyonra...@googlegroups.com
Hi Marc & Richard,

I'd categorize this as a bug. When generating a cache_version for a collection, one solution might be to include a hash of the IDs of all items in the collection. This way, the cache_version will change should an item be removed.

I believe modifying the `compute_cache_version` method defined in `activerecord/lib/active_record/relation.rb` is the way to go about this.
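As a rough standalone sketch of the idea (not the real `compute_cache_version`, and it naively loads every ID):

```ruby
require "digest/md5"

# Hypothetical helper, not the actual Rails code: fold a digest of the
# collection's IDs into the usual "count-max_updated_at" version, so that
# removing a record inside a limited window (same count, same max
# timestamp) still yields a different cache version.
def collection_cache_version(relation, timestamp_column = :updated_at)
  timestamp  = relation.maximum(timestamp_column)
  ids_digest = Digest::MD5.hexdigest(relation.ids.join("-"))

  "#{relation.count}-#{timestamp&.utc&.to_s(:usec)}-#{ids_digest}"
end
```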

Marc, please let me know if you intend to submit a pull request. While I'm not on the Rails team, I'd be happy to offer any assistance and lobby for implementing a fix. (If you're not interested in submitting a PR, I'd be excited to pick this up myself.)


Daniel Heath

Oct 28, 2019, 4:31:32 PM
to rubyonra...@googlegroups.com
The full collection could be millions of records. Fetching the IDs to hash might not be an option either, unless they can be hashed on the database side.

Thanks,
Daniel Heath 


Aaron Lipman

Oct 30, 2019, 3:17:25 PM
to rubyonra...@googlegroups.com
Thanks for the flag, Daniel. On my first read of the code, I didn't catch that an aggregate query is used to determine the max updated_at timestamp (without fetching individual records) when the relation isn't loaded.

Figuring out a database-side hashing function that works with all the flavors of SQL supported by Rails may prove tricky. I'm going to consider this some more, but an alternative approach might be to update the documentation to acknowledge this issue and explain how to customize a model's cache_version method according to one's needs.
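For what it's worth, on Postgres a database-side digest could look roughly like this (illustrative only; md5 and string_agg are Postgres functions, so each adapter would need its own SQL):

```ruby
# Postgres-only illustration: let the database digest the IDs so Ruby never
# materializes them. md5/string_agg are Postgres functions; MySQL and SQLite
# would each need their own equivalent.
scope = Post.order(created_at: :desc).limit(5)

sql = <<~SQL
  SELECT md5(string_agg(id::text, ',' ORDER BY id))
  FROM (#{scope.select(:id).to_sql}) AS limited_ids
SQL

ids_digest = ActiveRecord::Base.connection.select_value(sql)
```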

Daniel Heath

Oct 30, 2019, 6:19:35 PM
to rubyonra...@googlegroups.com
I think it’s worth considering a per-database implementation, as the major databases all have something that’ll work, and the keys don’t need to be stable across different data store implementations.

Thanks,
Daniel Heath 


Aaron Lipman

Nov 4, 2019, 10:22:00 AM
to rubyonra...@googlegroups.com
I've been thinking on this for a few days; here are a couple of potential solutions I've been considering:

Hash-based: Sum the MD5 hashes of the IDs and append the last 32 hexadecimal digits of the sum to the cache version. Alternatively, use a mathematical hashing function like Fibonacci hashing. (The second approach wouldn't accommodate non-numeric ID columns, but may be easier to implement across the various SQL flavors without conditional logic.) Both approaches need to account for the 64-bit integer limits found in MySQL and Postgres, e.g. MySQL's CONV() function.
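In Ruby terms, the hash-sum idea is roughly this (in practice the computation would live in SQL):

```ruby
require "digest/md5"

# Ruby illustration of the hash-sum idea; the database would do this for
# real. Summing makes the checksum order-independent: it identifies the
# set of IDs no matter how the collection is sorted.
def id_checksum(ids)
  sum = ids.sum { |id| Digest::MD5.hexdigest(id.to_s).to_i(16) }
  (sum % (1 << 128)).to_s(16).rjust(32, "0")  # last 32 hex digits of the sum
end

id_checksum([1, 2, 3]) == id_checksum([3, 1, 2])  # => true
```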

Last-ID-based: It occurs to me that when a record within a collection is destroyed, either the size of the collection or the ID of its last item will change. If we can reliably get that last ID, we can use it in conjunction with the collection size to generate more robust cache keys. The LAST_VALUE SQL window function might provide this. MySQL didn't add LAST_VALUE until version 8 (2018), but I think the same functionality can be emulated via MySQL session variables.
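Sketched out (again just illustrative, fetching the window's IDs directly rather than via LAST_VALUE):

```ruby
# Illustration of the last-ID idea. Destroying a record inside the window
# either shrinks the collection or pulls a new record (with a different ID)
# into the last position, so count plus last ID detects the change.
scope = Post.order(created_at: :desc).limit(5)
ids   = scope.ids  # a window-function query could avoid loading these

version = "#{ids.size}-#{ids.last}-#{scope.maximum(:updated_at)&.utc&.to_s(:usec)}"
```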

I'm leaning towards the last-ID path. I think it's a little more elegant, and probably more efficient than calculating a sum over really large collections. I'm starting work on a pull request, but welcome other ideas, directions, and considerations.


Daniel

Nov 4, 2019, 5:58:15 PM
to rubyonra...@googlegroups.com

The last ID doesn't change if you replace an item within the collection though, right (assuming it's via a HABTM association, so the associated record doesn't get touched)?

Aaron Lipman

Nov 5, 2019, 9:16:18 AM
to rubyonra...@googlegroups.com
Again, good point. There are some workarounds (https://evrim.io/invalidating-caches-when-using-many-to-many-associations-in-rails/), but the absence of a `touch: true` option on `has_many` associations makes the last-ID approach less optimal.

Also, I'm noting that support for the LAST_VALUE window function didn't get added to SQLite until version 3.25.0 (2018). As much as I liked the idea of the last-ID approach, I'm not seeing a clean way to implement it that doesn't require Rails devs to update their development environments.

I'm going to continue to pursue a hash-based approach. Thanks again for your insights, Daniel!

richard schneeman

Nov 5, 2019, 11:38:38 AM
to Ruby on Rails: Core
Love this passion and energy. Can you all move the conversation over to the issue, so that it's easier for other people who have a similar problem to find it and chime in?

Aaron Lipman

Nov 25, 2019, 8:17:13 AM
to rubyonra...@googlegroups.com
Thanks for the reminder, Richard.

I wanted to let folks know that I've submitted a PR adding first and last IDs to cache versions:

(My reasoning for choosing bounding IDs over a checksum is in the PR summary, but I wanted to restate my gratitude for everyone's input.)
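For those following along here, the rough shape of the bounding-ID idea (a hypothetical sketch, not the actual PR code):

```ruby
# Hypothetical shape of the bounding-ID version, not the actual PR code:
# include the first and last IDs of the ordered window, so removing or
# replacing a record inside it changes the version even when the count
# and max updated_at stay the same.
scope = Post.order(created_at: :desc).limit(5)
ids   = scope.ids

version = [
  ids.size,
  ids.first,
  ids.last,
  scope.maximum(:updated_at)&.utc&.to_s(:usec)
].join("-")
```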

I greatly appreciate further review!


