Writing ORM from a complex sql statement

13 views
Skip to first unread message

Desmond Lim

unread,
Jun 9, 2019, 9:57:36 PM6/9/19
to sqlalchemy
Hi there,

I really have no idea how to do this via sqlalchemy. I have 2 tables:

class NodesModel(db.Model):
    __tablename__ = 'nodes'

    id = db.Column(db.BigInteger, primary_key=True)
    topic = db.Column(db.String(20), nullable=False)

    pn = relationship("PNModel", backref="nodes")

class PNModel(db.Model):
    __tablename__ = 'pn'

    post_id = db.Column(db.BigInteger, db.ForeignKey('posts.id'), primary_key=True)
    node_id = db.Column(db.BigInteger, db.ForeignKey('nodes.id'), primary_key=True)


I will get a list of nodes that have the topic (e.g. one, two). Then taking the nodes, I'll select the PN that have both nodes (it can be 1 or more) associated with it.

e.g.

telegram-cloud-photo-size-5-6327616107510343927-x.jpg

I'm doing it this way at this point:

nodes = NodesModel.find_by_topic_list(topic_list)
node_id_list = [node.id for node in nodes]

Then running this sql statement:
WITH pn_1 AS (SELECT post_id, ARRAY_AGG(DISTINCT node_id) AS nodes
              FROM pn GROUP BY post_id),
     nodes_1 AS (SELECT ARRAY_AGG(DISTINCT id) AS s_nodes 
                 FROM pn WHERE node_id IN (dict(node_id_list))
SELECT DISTINCT post_id FROM pn_1
WHERE nodes_1 = (SELECT s_nodes FROM select_nodes);

There has to be a better and more elegant way to do this with SQLAlchemy and I can't seem to figure it out. I've tried using subqueries and then I realised that I'm actually calling the nodes table twice, which I shouldn't have to.

Can anyone tell me how to use the nodes table to get the post_id form pn (after which getting the post should be easy via ORM should be easy, I think).

Thanks.
Desmond

Simon King

unread,
Jun 10, 2019, 5:08:41 AM6/10/19
to sqlal...@googlegroups.com
Hi Desmond.

I don't really understand your table structure. Could you present it
in the form of a standalone script that we can run (including sample
data)?

Thanks,

Simon

On Mon, Jun 10, 2019 at 2:57 AM Desmond Lim <limw...@gmail.com> wrote:
>
> Hi there,
>
> I really have no idea how to do this via sqlalchemy. I have 2 tables:
>
> class NodesModel(db.Model):
> __tablename__ = 'nodes'
>
> id = db.Column(db.BigInteger, primary_key=True)
> topic = db.Column(db.String(20), nullable=False)
>
> pn = relationship("PNModel", backref="nodes")
>
> class PNModel(db.Model):
> __tablename__ = 'pn'
>
> post_id = db.Column(db.BigInteger, db.ForeignKey('posts.id'), primary_key=True)
> node_id = db.Column(db.BigInteger, db.ForeignKey('nodes.id'), primary_key=True)
>
>
> I will get a list of nodes that have the topic (e.g. one, two). Then taking the nodes, I'll select the PN that have both nodes (it can be 1 or more) associated with it.
>
> e.g.
>
>
>
> I'm doing it this way at this point:
>
> nodes = NodesModel.find_by_topic_list(topic_list)
> node_id_list = [node.id for node in nodes]
>
> Then running this sql statement:
> WITH pn_1 AS (SELECT post_id, ARRAY_AGG(DISTINCT node_id) AS nodes
> FROM pn GROUP BY post_id),
> nodes_1 AS (SELECT ARRAY_AGG(DISTINCT id) AS s_nodes
> FROM pn WHERE node_id IN (dict(node_id_list))
> SELECT DISTINCT post_id FROM pn_1
> WHERE nodes_1 = (SELECT s_nodes FROM select_nodes);
>
> There has to be a better and more elegant way to do this with SQLAlchemy and I can't seem to figure it out. I've tried using subqueries and then I realised that I'm actually calling the nodes table twice, which I shouldn't have to.
>
> Can anyone tell me how to use the nodes table to get the post_id form pn (after which getting the post should be easy via ORM should be easy, I think).
>
> Thanks.
> Desmond
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CAM%2BCzahgN2TF88p4__5W76DiYO3THD1tV2BgekvX913Mv6nU7Q%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Desmond Lim

unread,
Jun 10, 2019, 8:22:40 AM6/10/19
to sqlalchemy
Hi Simon,

The tables and sample data are below.

So first I search the nodes table for a and b, getting the node_id of them (in this example, 1 and 2).

Then using the SQL statement, I would get all posts that have 1 and 2 as the node_id. So in the example data, 100, 103, 108 will be returned as these posts contain the nodes 1 and 2.

Hope this helps.

Desmond 

-------------

CREATE SEQUENCE IF NOT EXISTS nodes_id_seq;

CREATE TABLE "public"."nodes" (
    "id" int8 NOT NULL DEFAULT nextval('nodes_id_seq'::regclass),
    "project_uuid" uuid NOT NULL,
    "topic" varchar(20) NOT NULL,
    PRIMARY KEY ("id")
);

CREATE SEQUENCE IF NOT EXISTS posts_id_seq;


CREATE TABLE "public"."posts" (
    "id" int8 NOT NULL DEFAULT nextval('posts_id_seq'::regclass),
    "project_uuid" uuid NOT NULL,
    PRIMARY KEY ("id")
);

CREATE TABLE "public"."pn" (
    "post_id" int8 NOT NULL,
    "node_id" int8 NOT NULL,
    PRIMARY KEY ("post_id","node_id")
);

INSERT INTO "public"."nodes" ("id", "project_uuid", "topic", "size", "given_id") VALUES 
('1', '42a8e275-f153-452b-99d3-645bdd513ca9', 'a'),
('2', '42a8e275-f153-452b-99d3-645bdd513ca9', 'b'),
('3', '42a8e275-f153-452b-99d3-645bdd513ca9', 'c'),
('4', '42a8e275-f153-452b-99d3-645bdd513ca9', 'd')

INSERT INTO "public"."posts" ("id", "project_uuid") VALUES 
('100', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('101', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('102', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('103', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('104', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('105', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('106', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('107', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('108', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('109', '42a8e275-f153-452b-99d3-645bdd513ca9'),
('110', '42a8e275-f153-452b-99d3-645bdd513ca9')

INSERT INTO "public"."pn" ("post_id", "node_id") VALUES 
('1', '100'),
('2', '100'),
('3', '100'),
('2', '101'),
('3', '101'),
('4', '101'),
('5', '101'),
('3', '102'),
('4', '102'),
('1', '103'),
('2', '103'),
('3', '103'),
('2', '104'),
('4', '104'),
('1', '105'),
('4', '105'),
('3', '105'),
('3', '106'),
('4', '106'),
('4', '107'),
('1', '108'),
('2', '108'),
('3', '109'),
('1', '109'),
('4', '109'),
('2', '110'),
('3', '110')

ALTER TABLE "public"."posts" ADD FOREIGN KEY ("project_uuid") REFERENCES "public"."projects"("uuid");
ALTER TABLE "public"."pn" ADD FOREIGN KEY ("post_id") REFERENCES "public"."posts"("id");
ALTER TABLE "public"."pn" ADD FOREIGN KEY ("node_id") REFERENCES "public"."nodes"("id");

Simon King

unread,
Jun 10, 2019, 10:41:20 AM6/10/19
to sqlal...@googlegroups.com
I'm still not sure I understand. Here's an example that does what I
*think* you're asking for:

import sqlalchemy as sa
import sqlalchemy.orm as saorm
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


pn = sa.Table(
"pn", Base.metadata,
sa.Column("post_id", sa.Integer, sa.ForeignKey("posts.id")),
sa.Column("node_id", sa.Integer, sa.ForeignKey("nodes.id")),
)


class Node(Base):
__tablename__ = "nodes"
id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
project_uuid = sa.Column(sa.Text, nullable=False)
topic = sa.Column(sa.Text(20), nullable=False)

posts = saorm.relationship("Post", secondary=pn, backref="nodes")


class Post(Base):
__tablename__ = "posts"
id = sa.Column(sa.Integer, autoincrement=True, primary_key=True)
project_uuid = sa.Column(sa.Text, nullable=False)


if __name__ == "__main__":
engine = sa.create_engine("sqlite://")
Base.metadata.create_all(bind=engine)
Session = saorm.sessionmaker(bind=engine)

projectid = "42a8e275-f153-452b-99d3-645bdd513ca9"
topics = ['a', 'b', 'c', 'd']
nodes = [Node(project_uuid=projectid, topic=t) for t in topics]
posts = [Post(project_uuid=projectid) for i in range(11)]

import random
random.seed(0)
for node in nodes:
node.posts = random.sample(posts, 4)

session = Session()
session.add_all(nodes + posts)
session.flush()

node_ids = [1, 2]
print("Looking for posts with node IDs: %s" % node_ids)
posts = (
session.query(Post)
.join("nodes")
.filter(Node.id.in_(node_ids))
.all()
)
for p in posts:
print("Post %s: Node IDs %s" % (p.id, [n.id for n in p.nodes]))

Simon
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CAM%2BCzahCwZ7_yKbFTPvYX4kXTz1bZ%3DoTXGVw1g0eO26qeGBGpg%40mail.gmail.com.

Desmond Lim

unread,
Jun 10, 2019, 8:30:49 PM6/10/19
to sqlalchemy
Hi Simon,

Sorry I think I really didn't make what I'm asking for clear.

I have these codes

nodes = NodesModel.find_by_project_uuid_and_topic_list(project_uuid=project_uuid, topic_list=node_dict['topics'])

node_id_list = [node.id for node in nodes]

select_sql = text('WITH pn_o AS (SELECT post_id, ARRAY_AGG(DISTINCT node_id) AS pn_nodes FROM pn GROUP BY post_id), '
                  'n_o AS (SELECT ARRAY_AGG(DISTINCT node_id) as n_nodes FROM pn WHERE node_id IN :input_tuple) '
                  'SELECT post_id FROM pn_o WHERE pn_nodes = (SELECT n_nodes FROM n_o)')

params = {"input_tuple": tuple(node_id_list)}
results = db.session.execute(select_sql, params).fetchall()

Which does what I want. In the end, in results, I get a list of posts that have nodes that contain all the topics I need, in other words, the posts contain all the topic words.

I'm just wondering, if there is a way to ORM the select_sql as I would prefer to work with the models than the raw SQL. And I'm doing it this way because I want the server to do the heavy lifting.

Desmond

Desmond Lim

unread,
Jun 10, 2019, 10:51:45 PM6/10/19
to sqlalchemy
Okay, I have ORMed my sql statement:

pn = db.session.query(PNModel.post_id, db.func.array_agg(PNModel.node_id.distinct()).label('pn_nodes')).group_by(PNModel.post_id)
n = db.session.query(db.func.array_agg(NodesModel.id.distinct()).label('n_nodes')).filter(NodesModel.project_uuid == 
    project_uuid).filter(NodesModel.topic.in_(topic_list)).subquery()
query = db.session.query(pn.c.post_id).filter(pn.c.pn_nodes == db.session.query(n.c.n_nodes))
results = query.all()

This returns what my original SQL returned. My next step is to get all the posts of the post_id in the list

post_id_list = [result[0] for result in results]
posts = PostsModel.find_by_id_list(id_list=post_id_list)

I'm just wondering, if there is an easy way to get the post object from the query instead of doing the above.

Desmond

Simon King

unread,
Jun 11, 2019, 5:10:07 AM6/11/19
to sqlal...@googlegroups.com
Does something like this work?

query = db.session.query(PostsModel).join(pn, PostsModel.id ==
pn.c.post_id).filter(pn.c.pn_nodes == db.session.query(n.c.n_nodes))

I can't help feeling this is a very inefficient query though. I still
don't really understand your data model (I don't understand where
topics come into this). But if you've got a many-to-many relationship
between posts and nodes, and you want to find posts that are linked to
*all* of a set of nodes, perhaps you could start with something like
this:

SELECT post_id
FROM pn
WHERE node_id IN (...)
GROUP BY post_id
HAVING count(node_id) = <number of distinct node IDs>

Without the HAVING clause, that query would return all the posts that
match at least one of the nodes. The HAVING clause ensures that the
posts match *all* of the nodes.

Simon
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CAM%2BCzagfcZmCPDHPYUWXTVV%3DzBqq96KSu30JJnO0GbAGt8wn9g%40mail.gmail.com.
Reply all
Reply to author
Forward
0 new messages