Transaction APIs do not consult the DB router to choose DB connection

閲覧: 998 回
最初の未読メッセージにスキップ

N Aditya

未読、
2021/05/27 12:18:502021/05/27
To: Django developers (Contributions to Django itself)

From the Django docs, for any ORM query made, the DB alias to use is determined by the following rules:

  • Checks if the using keyword is used either as a parameter in the function call or chained with QuerySet.
  • Consults the DB routers in order until a match is found.
  • Falls back to the default router which always returns default as the alias.

Using the router scheme works perfectly for ORM queries. However, when it comes to using transaction APIs ​(like the transaction.atomic construct), it becomes mandatory to specify the using kwarg explicitly in all of its publicly exposed APIs if a DB other than the default alias has to be chosen.

To illustrate why this is a problem, consider the following scenario:

  • A DB router exists such that it directs queries to a specific database based on a value set in thread-local storage by a middleware.
  • When ORM queries from within the view are fired, they automatically get forwarded to the right DB without explicitly citing the using keyword owing to the router.
  • But if the transaction.atomic construct is used, the using keyword would have to be explicitly specified each time. While this might seem fine, it creates the following problems:
    1. For large code bases, it becomes highly unwieldy to make sure that the using keyword has been mentioned in every transaction API call. Also, if one tries to implement the above scheme in an already existing code base, it would be impractical to add the using keyword in all lines calling the transaction APIs.
    2. It doesn't gel well with the the routers framework.

A proposed workaround could to be to add a separate method named db_for_transaction to the routers framework which would be consulted by the transaction APIs first, before falling back to using the default DB alias.

  1. Having a separate method for transaction is good because it need not be mangled up with the other methods i.e db_for_read / db_for_write as they clearly indicate certain operational semantics which aren't tied to a transaction since it can have both reads and writes within it. 
  2. Also, if in the future, there is some special info that can be delivered to the transaction based on which a decision regd which DB to use could be made, then it would be cleanly isolated into its own method.


If we can finalise on this, I could submit a PR for the same.

charettes

未読、
2021/05/28 0:35:262021/05/28
To: Django developers (Contributions to Django itself)
Ticket that directed to the mailing list for wider feedback https://code.djangoproject.com/ticket/32788

---

Can you think of places where this db_for_transaction hook would differ in any way from what db_for_write returns? That's what Django uses internally in such instances

  1. https://github.com/django/django/blob/0d67481a6664a1e66d875eef59b96ed489060601/django/db/models/base.py#L745-L760
  2. https://github.com/django/django/blob/0d67481a6664a1e66d875eef59b96ed489060601/django/db/models/base.py#L950
  3. https://github.com/django/django/blob/0d67481a6664a1e66d875eef59b96ed489060601/django/db/models/deletion.py#L400

I get that your asking for a way to avoid explicitly passing atomic(using) all over the place but I'm having a hard time figuring out in which cases a db_for_transaction hook, which cannot be provided any contextual details like other router methods do, can take an an advised decision without relying on contextual/global state.

Is there a reason you can't replace your wide spread internal uses of atomic by something along these lines

def routed_atomic(savepoint=True, durable=False):

    using = get_global_using()

    return transaction.atomic(using, savepoint=savepoint, durable=durable)

Cheers,

Simon

N Aditya

未読、
2021/05/28 4:17:262021/05/28
To: Django developers (Contributions to Django itself)
Hey Simon, 

It would be possible to write a method as you've suggested above. But it entails the following problems:
1. There would have to be a wrapper for each of the transaction APIs, not just transaction.atomic since all of them take the using kwarg.
2.  I'm trying to build a library that helps achieve multi-tenancy (the isolated DB approach) by dynamically routing DB queries to the appropriate databases at runtime. For more info, check this out -> PyCon India 2021 proposal. If that be the case, for any new code base, I could expose a bunch of methods as suggested above and ask them to incorporate it. But for older/existing code bases which already directly use various transaction APIs, it would be hard to replace all lines with another method. 

By inspecting code, I figured that there is this method named get_connection in the transaction.py module which is being used by all the other transaction APIs for fetching the DB connection. Currently I've patched this method with my own to achieve the functionality mentioned above. Might I suggest an alternate implementation for this method as follows:

#settings.py

TRANSACTION_DB_SELECTOR = "path_to_some_callable"

#transaction.py
...
transaction_db_selector = import_string(settings.TRANSACTION_DB_SELECTOR)
def get_connection():
    if transaction_db_selector:
        using = transaction_db_selector()
    if using is None:
        using = DEFAULT_DB_ALIAS
    return connections[using]

Let me know your thoughts on this. 

Regards, 
Aditya N

Adam Johnson

未読、
2021/05/28 4:33:162021/05/28
To: django-d...@googlegroups.com
Aditya - you didn't answer Simon's first question: "Can you think of places where this db_for_transaction hook would differ in any way from what db_for_write returns?" I think this is the crux of the issue. atomic does already call DB routers - in what case could you need to return a different result for atomic() than for a write query?

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/001a5f54-d848-4398-8712-d0a068519f1an%40googlegroups.com.

N Aditya

未読、
2021/05/28 5:25:022021/05/28
To: Django developers (Contributions to Django itself)
Hey Adam, 

"atomic does already call DB routers" -> Firstly after reading code, I don't think the transaction APIs consult the routers. Secondly, I think I already answered it in the initial discussion.

FYI from message-1:

  1. Having a separate method for transaction is good because it need not be mangled up with the other methods i.e db_for_read / db_for_write as they clearly indicate certain operational semantics which aren't tied to a transaction since it can have both reads and writes within it. 
  2. Also, if in the future, there is some special info that can be delivered to the transaction based on which a decision regd which DB to use could be made, then it would be cleanly isolated into its own method.
Regards, 
Aditya N

N Aditya

未読、
2021/05/28 5:34:142021/05/28
To: Django developers (Contributions to Django itself)
Hey Adam, 

Also, after giving it a bit of thought, I figured that integrating this logic with the routers framework isn't entirely necessary.
So I came up with another perspective to the solution as well which I've illustrated in message-3 of this thread.

Both approaches work for me. 

Let me know your thoughts on the same.

Regards, 
Aditya N

N Aditya

未読、
2021/05/31 3:13:162021/05/31
To: Django developers (Contributions to Django itself)
Hey Adam/Simon, 

How can we take this forward ? 

Regards,
Aditya N

charettes

未読、
2021/05/31 5:43:132021/05/31
To: Django developers (Contributions to Django itself)
Aditya,

> "atomic does already call DB routers" -> Firstly after reading code, I don't think the transaction APIs consult the routers. Secondly, I think I already answered it in the initial discussion.

Atomic doesn't consult it directly but the ORM does before interacting with transactions. While it's true that the transaction is not entirely bound to the writes if you're running Django in non-autocommit mode it's not the default usage and even discouraged. When you *do* run Django in auto-commit mode having db_for_read, db_for_write properly defined work just fine without a need for this proposed hook assuming your application code retrieves the proper database using db_for_write and use the return value as atomic(using). That's what Django does internally and what I was referring to in my message.

> Also, if in the future, there is some special info that can be delivered to the transaction based on which a decision regd which DB to use could be made, then it would be cleanly isolated into its own method.

I don't think this is a strong argument by any mean. To me this is basically saying "it might be useful in the future so lets add it now" without providing any details about why it might be useful in the first place.

> How can we take this forward ?

You'll need to gather positive support for your proposal, I don't know where Adam stands but I'm personally -1 on it. I'm still not convinced it's worth adding for your detailed usage for the following reasons:

1. There's multiple existing multi-tenancy applications out there that are relatively mature (I've even personally written one [1]) and none of them expressed a need for such a hook[2]. From my experience adding a lot of entries to DATABASES or using one schema per tenant only scales to a certain point when it starts breaking badly. Django assumes `DATABASES` is a static constant mapping, many unexpected and subtle things happen when it's not the case (connections leak, KeyErrors)
2. It's just not common enough to warrant an inclusion in Django core. Every single added feature comes with a maintenance burden cost and risks introducing bugs when interacting other existing features. This is particularly true with niche features interacting with complex systems such as ORM transaction handling.

Best,
Simon

Adam Johnson

未読、
2021/05/31 6:13:582021/05/31
To: django-d...@googlegroups.com
I'm also -1 on changing anything in Django right now.

I think we should indeed take the "default position" for complex features: write a third party library, get some traction. If many people find it useful, we can look at adding something to core. It sounds like you're already working on such a library Aditya, so part of the way there.

Florian Apolloner

未読、
2021/06/01 2:34:162021/06/01
To: Django developers (Contributions to Django itself)
On Monday, May 31, 2021 at 12:13:58 PM UTC+2 Adam Johnson wrote:
I'm also -1 on changing anything in Django right now.

-1 as well, I simply see no way how something like:

```
with transaction.atomic():
     Model1.objects.create()
     Model2.objects.create()
```

will allow for any useful behavior via the router. Unless transaction.atomic could inspect the contents and figure out the involved dependencies, but this is surely way more than what could go into core for now.

N Aditya

未読、
2021/06/01 6:09:332021/06/01
To: django-d...@googlegroups.com
Hey all, 

I believe there's a major misunderstanding of what I've been trying to convey. Let me rephrase:

Problem: Include the request object as a decision param in the routers framework/transaction APIs based on which a DB can be chosen i.e all queries(reads/writes) and transactions are directed to a specific database without having to explicitly mention it. This could be location based, IP based, domain based or based on any other metadata available in the request. 

Explanation:
1. This use-case is an extension of the examples provided in Django docs for multi-database support wherein based on the model (or) based on the type of operation(read/write) the database is chosen via the routers framework. I need to achieve the same but this time it wouldn't be based on the model or type of operation, rather it would be based on some metadata in the request
2. The multi-tenancy library which I've built just adds solidity to this use-case. Simply put, it achieves routing based on request headers. 

The gap: I believe the essence of the routers framework is to provide a hook by which ORM queries can be directed to a DB based on some condition without having to explicitly specify it in every query. A similar hook is missing for all of the transaction APIs. This is the gap I'm trying to bridge. To add further context, in the library that I built, though the routers framework currently doesn't receive the request object, I was able to work around using contextvars but the essence is that the hooks were there. I'm unable to do the same for the transaction APIs without monkey patching since such a hook is unavailable.

> I don't think this is a strong argument by any mean. To me this is basically saying "it might be useful in the future so lets add it now" without providing any details about why it might be useful in the first place.

  1. I thought I made it clear but reinstating anyway. I'd like the routers framework to include the request object as part of the arguments list to each of its methods so that decisions made can be based on it as well. If so, there could be a separate method(hook) for transaction APIs as well, like db_for_transaction which can take in the request object to make a decision. It could include something else as well in the future. (Mentioned in message-1 of this thread. Probably I forgot to mention the request object part).
  2. If the above is difficult to achieve, then we could at the least expose a hook which can be invoked with the request object as its param to make a decision. I believe this should be fairly simple to implement and maintain. (Mentioned in message-3 of this thread).

If you guys feel that this use-case is still invalid, I'd like to know a logical explanation apart from maintenance overhead issues and waiting for more people to come up with the same problem. 

----------------------------------------------------------------

Simon, regd the multi-tenancy thing, I've done some extensive research on the libraries available currently to achieve multi-tenancy (including yours) in Django and none of them have adopted the isolated DB approach and declared it production ready. There are multiple libraries using the schema per tenant approach but the downsides are: 

1. It is DB implementation specific i.e only DBs' which are using schemas (like PostgreSQL) can take advantage of it. If they're using MongoDB or other data stores, they're out of luck.
2. In a microservices architecture, one may be using a combination of DBs' for various purposes. For eg: PostgreSQL for OLTP, ElasticSearch for real-time search etc., In such cases, they're out of luck again. 
3.  A lot of B2B SAAS companies have a mandate that their data cannot be collocated in the same physical space so as to avoid data leaks and breaches, especially in healthcare systems (HIPAA). 

Also, there are a few libraries using the shared table FK approach to inject relationship filters dynamically but after reading code, I found that it doesn't cover all intricate/edge cases like:
1. m2m relations between two tenant independent models.
2. Not all methods of the QuerySet have been overridden properly. 

Though I agree with your point that all these libraries (including mine) might not scale well for tenants in the range 50 - 100K, I don't see why it wouldn't work for tenants in the order of a few 1000s. Also, I'd genuinely like to know why the DATABASES dictionary or the CACHES dictionary would run into connection leaks/KeyErrors with such an approach. 

Regards, 
Aditya N

--
You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/clzg6MiixFc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6292ccbb-bc83-48d0-b08e-aa765a29ce32n%40googlegroups.com.

Aymeric Augustin

未読、
2021/06/01 7:45:292021/06/01
To: django-d...@googlegroups.com
Hello,

The first item in Django's design philosophies is Loose coupling. Per this principle, the database layer shouldn't know about the HTTP layer. This is a strong reason for keeping the HTTP request object away from the database layer (e.g. database routers or the transactions API). This is why "a hook which can be invoked with the request object as its param to make a decision" is extremely unlikely to be accepted.

If you disagree with the consequences of this philosophy, you can store the current HTTP request as a thread-local variable in your application and access it from anywhere, for example from a database router.

Since I'm the original author of the transactions API, I've been thinking about your suggestion for the last few days. However, I'm not seeing what context could meaningfully be passed to `transaction.atomic()` besides the existing `using` parameter. Passing the HTTP request object is not an option.

I would recommend running without persistent database connections (CONN_MAX_AGE = 0) and switching settings.DATABASE["default"] in a middleware at the beginning of every request. Modifying settings at runtime isn't officially supported, but this looks like the closest option to the intent you expressed.

Best regards,

-- 
Aymeric.



You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CABP%2BMjQNWQ2AK8%3DmSQKH0%3DdNnHHeV3kSrT13R--9%3DKz%3DvbfX%2Bw%40mail.gmail.com.

N Aditya

未読、
2021/06/01 8:35:172021/06/01
To: Django developers (Contributions to Django itself)
Hey, 

I agree to the fact that the request object being passed to the DB layer (routers/transaction APIs) may not be pragmatic according to the Loose coupling approach. 


Quoting what you said: 
> you can store the current HTTP request as a thread-local variable in your application and access it from anywhere, for example from a database router.

From The Gap section in the above message:

> To add further context, in the library that I built, though the routers framework currently doesn't receive the request object, I was able to work around using contextvars but the essence is that the hooks were there. I'm unable to do the same for the transaction APIs without monkey patching since such a hook is unavailable.

Both of us are saying the same thing.

Setting something in thread-local/contextvars and accessing it via the DB router -> Cool. Works fine for all DB queries/migrations.
Setting something in thread-local/contextvars and accessing it via ________ -> For transaction APIs (not just atomic but also the other APIs such as savepoint, savepoinnt_rollback etc.,) the dash is what I'm looking for. 

All I'm looking for is a hook that the transaction APIs can call before deciding on which database to use. I don't see any reason for why providing a hook seems so difficult. 
A simple implementation can be: (From message-3 of this conversation):


> #settings.py

TRANSACTION_DB_SELECTOR = "path_to_some_callable"

#transaction.py
...
transaction_db_selector = import_string(settings.TRANSACTION_DB_SELECTOR)
def get_connection():
    if transaction_db_selector:
        using = transaction_db_selector()
    if using is None:
        using = DEFAULT_DB_ALIAS
    return connections[using]

Let me know your thoughts on this. 




Florian Apolloner

未読、
2021/06/01 8:51:022021/06/01
To: Django developers (Contributions to Django itself)
On Tuesday, June 1, 2021 at 1:45:29 PM UTC+2 Aymeric Augustin wrote:
I would recommend running without persistent database connections (CONN_MAX_AGE = 0) and switching settings.DATABASE["default"] in a middleware at the beginning of every request. Modifying settings at runtime isn't officially supported, but this looks like the closest option to the intent you expressed.

Funny that you should mention this. I thought about this when doing something similar for ARA, but settled down on a custom database backend: https://github.com/ansible-community/ara/blob/master/ara/server/db/backends/distributed_sqlite/base.py -- it does work surprisingly well and I think it might be more resilient than settings shenanigans. Not that I am particularly proud of that backend but I thought I'd share it here for the off-chance someone finds your post via a search engine and wonders how to implement such a thing.

Cheers,
Florian 

Florian Apolloner

未読、
2021/06/01 9:02:442021/06/01
To: Django developers (Contributions to Django itself)
On Tuesday, June 1, 2021 at 2:35:17 PM UTC+2 gojeta...@gmail.com wrote:
I don't see any reason for why providing a hook seems so difficult. 

It is more code to maintain, needs tests etc and increases complexity. Just because something is easy on the surface, doesn't mean it will be easy in the longrun.

A simple implementation can be: (From message-3 of this conversation):


> #settings.py

TRANSACTION_DB_SELECTOR = "path_to_some_callable"

#transaction.py
...
transaction_db_selector = import_string(settings.TRANSACTION_DB_SELECTOR)
def get_connection():
    if transaction_db_selector:
        using = transaction_db_selector()
    if using is None:
        using = DEFAULT_DB_ALIAS
    return connections[using]

I do not think that completely ignoring the `using` that was passed in by the user would be a very good idea (for the case when there is also a `TRANSACTION_DB_SELECTOR` set).

Cheers,
Florian

N Aditya

未読、
2021/06/01 12:17:202021/06/01
To: Django developers (Contributions to Django itself)
Hey Florian, 

I'd like to point out that the code snippet which I wrote is just a scratch version of how it might look. I'd first like to reach consensus on the idea itself before writing something concrete.
Obviously it wouldn't be as naive as overriding something that the user provided explicitly. 

Regards,
Aditya N

Lokesh Dokara

未読、
2021/06/01 14:04:552021/06/01
To: Django developers (Contributions to Django itself)
Hi Everyone,

Our use case is that we have a writer primary database and read-only replica databases. The replicas can sometimes be a bit behind the primary
We have written a Router like this

class CustomRouter:
    def db_for_read(self, model, **hints):
        return 'reader'

    def db_for_write(self, model, **hints):
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return None

So this is all fine for normal queries. But we observed that even within a transaction.atomic() block, the read queries go to a different database, and write queries go to a different database

So I was hoping db_for_transaction proposal may improve this behavior.
On Tuesday, June 1, 2021 at 6:32:44 PM UTC+5:30 f.apo...@gmail.com wrote:

N Aditya

未読、
2021/06/01 15:42:122021/06/01
To: Django developers (Contributions to Django itself)
Hey Lokesh, 

Just out of curiosity, I'd like to clarify the expected behaviour. 

If a db_for_transaction method is implemented, I believe it would be consulted for transaction APIs like atomic etc., Reads and writes which happen within that transaction are nevertheless going to consult their respective router methods. Is your expectation that the router somehow detect that there's an open transaction block somewhere above and scope all reads/writes that happen within it to be routed to the DB it was tied to ? i.e the DB alias used by the transaction takes precedence over what the router methods namely db_for_read/db_for_write may return.

If this is your expectation, I think it might be hard to bake this into core Django. We could receive more views on the same from the community.

Looking at it from another perspective, I believe this can be achieved externally if core Django could expose hooks such as transaction.on_enter / transaction.on_exit to which callbacks can be hooked which could set 
some state info in thread-locals/contextvars which can later be parsed by the db_for_read/db_for_write methods to determine whether to use this info to make its decision or fallback to the default behaviour.

Also, something I'd like to clarify is that my use-case for a hook such as db_for_transaction is different. You could read above to know more about this.

Regards,
Aditya N

Aymeric Augustin

未読、
2021/06/01 16:32:352021/06/01
To: django-d...@googlegroups.com
Hello,

On 1 Jun 2021, at 14:35, N Aditya <gojeta...@gmail.com> wrote:

All I'm looking for is a hook that the transaction APIs can call before deciding on which database to use. I don't see any reason for why providing a hook seems so difficult. 

Just because something is easy to implement doesn't mean it gets added to Django. It also needs to be generally useful and well designed. Your proposal doesn't meet the second criterion and I'm skeptical about the first one.

Indeed, the get_connection() API you're proposing doesn't take any argument. As a consequence, the only way to make it return a different value at different times would be to depend on global variables. This is bad software engineering. It doesn't meet Django's standards.

(If you don't know why having functions that return different values based on global variables is a bad idea, you can search for "global state" in your favorite search engine. I'm pretty sure you can find a good blog post explaining why. And yes Django suffers from some legacy in this area. We are trying to avoid taking more debt.)

Four committers rejected your proposal and suggested alternatives. We're trying to help. Pushing the same idea again won't work any better. Please take into account the objections and try to find a design that addresses them, or implement one of the alternatives we offered.

Thank you,

-- 
Aymeric.




N Aditya

未読、
2021/06/02 1:49:522021/06/02
To: Django developers (Contributions to Django itself)
Hey Augustin,

Just to clarify, from before, you quoted:

> you can store the current HTTP request as a thread-local variable in your application and access it from anywhere, for example from a database router.

Also, from your previous message, you quoted

> As a consequence, the only way to make it return a different value at different times would be to depend on global variables. This is bad software engineering. It doesn't meet Django's standards.

I believe the final suggestion you made was as follows:

> I would recommend running without persistent database connections (CONN_MAX_AGE = 0) and switching settings.DATABASE["default"] in a middleware at the beginning of every request. Modifying settings at runtime isn't officially supported, but this looks like the closest option to the intent you expressed.

Below are a few things I'd like to clarify:
  1. Are you referring to thread-locals as `global state/variables` in your previous message ? If so, why suggest something which you consider bad practise ?
  2. As a consequence, if using thread-locals is considered bad software engg, could you please give me a way by which I could pass on request metadata to the router layer cleanly ?
  3. The final recommendation you gave is something that isn't officially supported and would fail when a threaded server or a fully async request path is used.
    (Even your favourite search engine might not be of much help here)

Based on your suggestions, I haven't arrived at a soln which can make this work.


> Just because something is easy to implement doesn't mean it gets added to Django

I missed to type in a couple of words which has created quite the spur around this. My apologies for that. What I really wanted to key in was this:

> I don't see any reason for why providing a hook seems so difficult to implement

I guess we all agreed on the fact that this would be fairly easy to implement. But I never implied/wanted to imply anything about adding this to Django just because the implementation was fairly easy.

Also, you quoted in your previous message:

> It also needs to be generally useful and well designed. Your proposal doesn't meet the second criterion and I'm skeptical about the first one.

I believe I've fairly stated my use-case. If you're stating that the proposal is not `generally useful` then I believe you're implying that routing of DB queries based on request metadata like domain, IP, location etc., may not be useful to most folks. I'm skeptical about this. 
Also, regd the `well designed` part, I believe we haven't arrived at a clean solution based on your recommendations either. So I believe we'll have to get this sorted. 

Also, you quoted:

> Four committers rejected your proposal and suggested alternatives

I don't think that apart from your suggestion, the other folks mentioned anything concrete along the lines of a work around. Let me know if I'm missing something here. 

To summarise, I've been trying to help from my end as well to try and make ends meet. I love using Django and I respect a lot of its design philosophies. I would like to be a contributor in the long run and do some exciting work. Let me knw how we can take this forward in the best possible way. 

Regards, 
Aditya N

Shai Berger

未読、
2021/06/02 5:54:202021/06/02
To: django-d...@googlegroups.com
Hi Aditya,

I think the basic issue is that the DB Routers framework is not the
right tool for the task you have in mind. You are trying to redirect
all database activity according to request parameters. The routers are
built for specific uses, and -- by design -- they don't cover all
cases; it's not just transactions. Off the top of my head, it's also raw
sql queries that you'd want to redirect in a similar way. So generally,
Aymeric's suggestion of changing the settings -- although I agree with
your criticism about its sensitivity to any form parallelism -- seems
closer to the right track than the idea of extending the routers.

I _think_ -- haven't looked in the details, so this may be a complete
blunder -- that you can achieve what you want using the Database
Instrumentation framework; it is built to allow you to intervene in the
execution of any SQL operation, at a lower level.

HTH,
Shai.

Aymeric Augustin

未読、
2021/06/02 17:18:272021/06/02
To: django-d...@googlegroups.com
On 2 Jun 2021, at 07:49, N Aditya <gojeta...@gmail.com> wrote:

Below are a few things I'd like to clarify:
  1. Are you referring to thread-locals as `global state/variables` in your previous message ?
Yes.
  1. If so, why suggest something which you consider bad practise ?
You rejected the good practice before — atomic(using=...) so I'm looking for something that matches your constraint.

  1. As a consequence, if using thread-locals is considered bad software engg, could you please give me a way by which I could pass on request metadata to the router layer cleanly ?
Pass it explicitly with `atomic(using=...)`. I'm aware that it doesn't work for third-party libraries that you don't control. I don't have a perfect solution.
  1. The final recommendation you gave is something that isn't officially supported and would fail when a threaded server or a fully async request path is used.
    (Even your favourite search engine might not be of much help here)
Indeed, you'd need something slightly smarter, but since connections are thread-local in Django, it should be manageable.

I don't think you can meaningfully put "fully async request path" and "database transaction managed by the Django ORM" in the same sentence. You'd run the transaction in a thread pool, which brings you back to by previous case.

-- 
Aymeric.



N Aditya

未読、
2021/06/03 6:03:042021/06/03
To: Django developers (Contributions to Django itself)
Hey Augustin, 

I'd like to clarify a few things:

1. If atomic(using=...) is the way to go and the same has been implemented for ORM queries as well, why introduce something like the routers framework in the first place ?

2. Also generally speaking, was the intention of building the routers framework only to allow routing based on models and other hints available as part of the args list or to expose hooks that can achieve routing of 
DB queries based on custom logic that could either depend on the args list or not (The use-cases are plenty. It can be from some state being pushed into a KV store based on some logic that gets executed in the view etc., not just thread-local state. Also, considering thread-local state here to be globals seems completely unfair since in actuality, it is tied to the request's scope and not global for all requests).
    
    i. If your answer is the former, then you guys should strongly re-consider the design since they can easily be perceived and used as hooks to achieve the latter. I would end my discussion here.
    ii. If your answer is the latter, then I'm looking for a similar hook for the transaction APIs as well. I would want to continue the discussion here.

Also, if possible, I'd like to hear opinions about the above from the authors of the routers framework as well. 

-----------------------

Also, I believe you quoted this in your previous message:
> Indeed, you'd need something slightly smarter, but since connections are thread-local in Django, it should be manageable.

Using some kind of thread sync primitive like a Lock/Semaphore at the middleware level to achieve this with stability doesn't seem manageable. It's just bad. Also, if for some reason within the view, the connection gets broken and I try to re-establish it, I have no guarantees as to what settings.databases["default"] might look like at that point and so I'll have to fetch the config that was used to initially create the connection from somewhere and then re-establish it again using a thread-sync primitive. I'm not even going to talk about the repercussions of using this approach with a ThreadPoolExecutor/ProcessPoolExecutor from within a view.


Regards, 
Aditya N

Aymeric Augustin

未読、
2021/06/03 7:34:482021/06/03
To: django-d...@googlegroups.com
On 3 Jun 2021, at 12:03, N Aditya <gojeta...@gmail.com> wrote:

1. If atomic(using=...) is the way to go and the same has been implemented for ORM queries as well, why introduce something like the routers framework in the first place ?

You can meaningfully route individual ORM queries based on the model being queried and it's often useful to do so. So Django supports it.

It would be convenient to do the same for transaction blocks, but it isn't possible in a generally useful way because you don't know yet which models will be queried. So Django doesn't support it.

2. Also generally speaking, was the intention of building the routers framework only to allow routing based on models and other hints available as part of the args list

Yes, it was.

or to expose hooks that can achieve routing of DB queries based on custom logic that could either depend on the args list or not (The use-cases are plenty. It can be from some state being pushed into a KV store based on some logic that gets executed in the view etc., not just thread-local state. Also, considering thread-local state here to be globals seems completely unfair since in actuality, it is tied to the request's scope and not global for all requests).

No, it wasn't.

Also, if possible, I'd like to hear opinions about the above from the authors of the routers framework as well. 

Also, I believe you quoted this in your previous message:
> Indeed, you'd need something slightly smarter, but since connections are thread-local in Django, it should be manageable.

(...)

Let's say my proposal doesn't work, then. I didn't try it!

-- 
Aymeric.

AJAY

未読、
2022/06/12 14:59:462022/06/12
To: Django developers (Contributions to Django itself)
Hi ,

I guess I was facing similar issue like Aditya and while searching landed on this page. 

By following that is available to us (passing using in transaction apis and router for ORM) definitely we got the results we were expecting but my point is overhead of writing it. In my understanding  ORM first check using param then fallback to router and then ultimately fallback to default. I guess we can do same thing for transaction too. I dont know completely about django standers but by this it will be backward compatible (issues mentioned in previous messages of ignoring using and other stuff is removed) and the overhead of unnecessary passing using in each transaction api can be avoided (not to mention about the issues we face when some miss it and code breaks). 

About the point of maintenance, testing etc etc, in my opinion that is part of life and we cannot avoid it. He may not be asking to release it immediately/in separate  release, rather you can take this minor enhance with other release and manage your testing and all. 
全員に返信
投稿者に返信
転送
新着メール 0 件