Proposal: replication_local_dc_only option for NetworkTopologyStrategy

Ihor Zashchelkin

<ihorz@tuta.com>
Aug 15, 2025, 9:15:42 AM
to ScyllaDB development
Hi. I've started looking into the Scylla codebase to see if a feature we need is easily implementable.
The idea is to add an option "replication_local_dc_only" to NetworkTopologyStrategy.
This would allow disabling cross-DC replication entirely for a given keyspace: data written in a datacenter would be replicated only within that datacenter, still honoring that datacenter's configured RF.
The primary motivation is not to hide data, but to optimize tables that store transient data for which cross-DC replication makes little sense (e.g., locks, sessions, rate limiting).
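For illustration, the keyspace DDL could look roughly like this (keyspace and DC names are placeholders; the option name and syntax are just what my prototype assumes):

    CREATE KEYSPACE app_sessions WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': 3,
        'dc2': 3,
        'replication_local_dc_only': 'true'
    };

The intent is that a write coordinated in dc1 would get its 3 replicas only in dc1, and likewise for dc2.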

I have a patch prototype, but currently it works only for vnodes.
I'm looking for a place where I should move the logic so that it works for both vnodes and tablets.

My questions are:
1. Do you see this as a valid feature for Scylla, especially for users who decided not to use a separate cache like Redis?
2. In your opinion, where would be the ideal place to put this logic so that it works consistently for both vnode and tablet replication?


--- a/locator/network_topology_strategy.cc
+++ b/locator/network_topology_strategy.cc
@@ -40,7 +40,9 @@ namespace locator {
 
 network_topology_strategy::network_topology_strategy(replication_strategy_params params) :
         abstract_replication_strategy(params,
-                                      replication_strategy_type::network_topology) {
+                                      replication_strategy_type::network_topology),
+        _rep_local_dc_only{false}
+{
     auto opts = _config_options;
     process_tablet_options(*this, opts, params);
 
@@ -57,6 +59,11 @@ network_topology_strategy::network_topology_strategy(replication_strategy_params
             continue;
         }
 
+        if (boost::equals(key, "replication_local_dc_only")) {
+            _rep_local_dc_only = boost::iequals(val, "true");
+            continue;
+        }
+
         if (boost::iequals(key, "replication_factor")) {
             if (boost::equals(key, "replication_factor")) {
                 on_internal_error(rslogger, "replication_factor should have been replaced with a DC:RF mapping by now");
@@ -192,7 +199,7 @@ class natural_endpoints_tracker {
     size_t _dcs_to_fill;
 
 public:
-    natural_endpoints_tracker(const token_metadata& tm, const std::unordered_map<sstring, size_t>& dc_rep_factor)
+    natural_endpoints_tracker(const token_metadata& tm, const std::unordered_map<sstring, size_t>& dc_rep_factor, bool rep_local_dc_only)
         : _tm(tm)
         , _tp(_tm.get_topology())
         , _dc_rep_factor(dc_rep_factor)
@@ -207,6 +214,8 @@ class natural_endpoints_tracker {
             return i != map.end() ? i->second.size() : size_t(0);
         };
 
+        auto& local_dc = tm.get_topology().get_datacenter();
+
         // Create a data_center_endpoints object for each non-empty DC.
         for (auto& [dc, rf] : _dc_rep_factor) {
             auto node_count = size_for(_token_owners, dc);
@@ -215,9 +224,14 @@ class natural_endpoints_tracker {
                 continue;
             }
 
+            if (rep_local_dc_only && dc != local_dc) {
+                // Skip remote DCs entirely: a DC without a data_center_endpoints
+                // entry never gets any replicas, so there is no need to modify
+                // _dc_rep_factor, which is taken by const reference.
+                continue;
+            }
+
             _dcs.emplace(dc, data_center_endpoints(rf, size_for(_racks, dc), node_count, _replicas, _seen_racks));
-            _dcs_to_fill = _dcs.size();
         }
+        _dcs_to_fill = _dcs.size();
     }
 
     bool add_endpoint_and_check_if_done(host_id ep) {
@@ -256,7 +270,7 @@ future<host_id_set>
 network_topology_strategy::calculate_natural_endpoints(
     const token& search_token, const token_metadata& tm) const {
 
-    natural_endpoints_tracker tracker(tm, _dc_rep_factor);
+    natural_endpoints_tracker tracker(tm, _dc_rep_factor, _rep_local_dc_only);
 
     for (auto& next : tm.ring_range(search_token)) {
         co_await coroutine::maybe_yield();
@@ -281,6 +295,14 @@ void network_topology_strategy::validate_options(const gms::feature_service& fs,
             on_internal_error(rslogger, fmt::format("'replication_factor' tag should be unrolled into a list of DC:RF by now."
                                                     "_config_options:{}", _config_options));
         }
+
+        if (c.first == sstring("replication_local_dc_only")) {
+            if (boost::iequals(c.second, "true") || boost::iequals(c.second, "false")) {
+                continue;
+            }
+            throw exceptions::configuration_exception(format("replication_local_dc_only must be true or false."));
+        }
+
         if (!dcs.contains(c.first)) {
             throw exceptions::configuration_exception(format("Unrecognized strategy option {{{}}} "
                 "passed to NetworkTopologyStrategy", this->to_qualified_class_name(c.first)));
@@ -293,6 +315,7 @@ effective_replication_map_ptr network_topology_strategy::make_replication_map(ta
     if (!uses_tablets()) {
         on_internal_error(rslogger, format("make_replication_map() called for table {} but replication strategy not configured to use tablets", table));
     }
+
     return do_make_replication_map(table, shared_from_this(), std::move(tm), _rep_factor);
 }
 
diff --git a/locator/network_topology_strategy.hh b/locator/network_topology_strategy.hh
index 7d5f9e5e2e..0d1afd3ff5 100644
--- a/locator/network_topology_strategy.hh
+++ b/locator/network_topology_strategy.hh
@@ -70,6 +70,7 @@ class network_topology_strategy : public abstract_replication_strategy
 
     // map: data centers -> replication factor
     std::unordered_map<sstring, size_t> _dc_rep_factor;
+    bool _rep_local_dc_only;
 
     std::vector<sstring> _datacenteres;
     size_t _rep_factor;


Thanks,
Ihor

Avi Kivity

<avi@scylladb.com>
Aug 15, 2025, 10:28:53 AM
to Ihor Zashchelkin, ScyllaDB development
What's the difference between this and having a keyspace per datacenter, each keyspace only replicating locally?
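I mean something along these lines (keyspace and DC names are just placeholders):

    CREATE KEYSPACE locks_dc1 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
    CREATE KEYSPACE locks_dc2 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc2': 3};

Each keyspace lists an RF only for its own DC, so NetworkTopologyStrategy never places replicas elsewhere, and the application uses the keyspace matching the datacenter it runs in.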

Ihor Zashchelkin

<ihorz@tuta.com>
Aug 26, 2025, 5:11:49 AM
to ScyllaDB development
For people who specify table IDs in CREATE TABLE statements (we do):
Table IDs would be the same across datacenters, which would allow them to be hardcoded in a .cql file.
Very simple to maintain — no moving pieces.
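For example, something like this could then live in a static .cql file (the table, keyspace and ID below are made up):

    CREATE TABLE app.locks (
        name text PRIMARY KEY,
        owner text,
        expires_at timestamp
    ) WITH ID = 'a2f1c3d4-5b6e-4f70-8a9b-0c1d2e3f4a5b';

With one keyspace per datacenter, the same file could not be reused as-is, because each per-DC keyspace would need its own table, each with its own ID.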

The keyspace name would be the same no matter where you deploy:
Again, very simple to maintain. Currently, we have to ensure that keyspace names are obtainable at runtime, which is an extra moving piece.

Of course, it's not much, but the schema information would not need to be duplicated for each datacenter.

Also, it would be similar to LocalStrategy, but at the datacenter level.

Avi Kivity

<avi@scylladb.com>
Aug 26, 2025, 5:18:09 AM
to Ihor Zashchelkin, ScyllaDB development
I agree it's nicer for the application if the database takes care of everything. But we have to balance it with the extra QA that comes with an additional replication mechanism.

So the question is, how common would this local-only replication be? If it's just your use case, it's better done in the application. If there are more, we can offer it as a keyspace option.

btw, fully separate clusters also do local-only replication, with different tradeoffs. This doesn't work if you want a mix of local-only and global replication.

Christian Sp

<horschi@gmail.com>
Aug 27, 2025, 3:05:52 AM
to ScyllaDB development
How would we know if other users would use it too? I think it is not the kind of feature request people come up with on their own.

But we think it makes sense for a lot of use-cases:
- user sessions
- locking
- logging
- rate limiting