Comparing an SQLAlchemy object to a similar object

17 views
Skip to first unread message

Andrew Martin

unread,
Apr 10, 2021, 5:18:25 PM4/10/21
to sqlalchemy
Hi everyone,

This might be a totally premature optimization, but I want to at least start off in the right direction.

I'm pulling messages from a slack channel into an Airflow DAG and writing events to a table  if certain attributes of the slack message has changed. Slack message structures are really variable depending on how a message was posted (i.e., you just type something, you share a message from another channel, it has a file attachment, or you share it and add a message, etc. The dictionary you get is totally different and unreliable.) So I have a SlackMessage class that susses all of this out and creates an object with standardized attributes that I can rely on. It looks like this:

class SlackMessage:
    def __init__(self, message):
        # print(message + "\n")
        self.reactions = []
        self.ts = ""
        self.uuid = ""
        self.datetime = ""
        self.teams = []
        self.codename = ""
        self.needs_ticket = ""
        self.triage_status = ""
        self.resolution_type = ""
        self.message_type = ""
        self.message_text = ""
        self.file_urls = []
        self.message_url = ""
        self.local_url = ""
        self.jira_title = ""
        self.datetime_resolved = ""

        self.parse_message(message)

The init creates the attributes so I don't have to check for them to exist, and the parse_message method populates anything that can be at the time.

I'm pushing these objects to a triage_message table in which I want to add new records when a message has changed based on the reactions, triage_status, ticket_status, or teams values have changed. This is sort of intentionally not upsert. I want to preserve event data so that I can do analytics, like how long from the time the message was created until it obtained the "complete" reaction.

My SQLAlchemy model is different from the slack_message class and has some additional fields like a generated bigint ID, a foreign key to my data warehouse date dimension, and some automated date fields for creation and update time (although in this design, there really shouldn't be an updated date.)

The model looks like this:

class TriageMessage(Base):
    __tablename__ = "triage_message"

    uuid = Column(Text, nullable=False)
    dim_date_id = Column(
        ForeignKey("dim_date.id", ondelete="CASCADE", onupdate="CASCADE"),
        nullable=False,
    )
    ts = Column(Integer, nullable=False)
    datetime = Column(Date, nullable=False)
    datetime_resolved = Column(Date)
    codename = Column(Text, nullable=False)
    jira_title = Column(Text, nullable=False)
    local_url = Column(Text, nullable=False)
    message_type = Column(Text, nullable=False)
    message_url = Column(Text, nullable=False)
    needs_ticket = Column(Boolean, nullable=False)
    reactions = Column(ARRAY(Text()), nullable=False)
    teams = Column(ARRAY(Text()), nullable=False)
    triage_status = Column(Text, nullable=False)
    file_urls = Column(ARRAY(Text()))
    resolution_type = Column(Text)

I'm trying to figure out what the best way is to decide if a new record should be created. These aren't the same class, so it doesn't immediately make sense to me to create __eq__ and __ne__ method overrides on these.

There are two different approaches that do seem reasonable to me.

1. a comparison function

def slack_msg_eq_sqla_msg(slack_msg, sqla_msg):
    <compare the attributes I care about here and return True or False>

2. convert the slack message object to the sqla object and test for equality.

Does one of these approaches make more sense than the other to the group here?

Also bonus question! If the uuid of the message doesn't exist in the database at all, I can skip all of this and just write the new event. If I'm pulling 20k messages at a time from the slack channel and postgres, is it worth it to create a dict with uuid as the key so that I'm searching a hash table instead of doing a list comprehension to gather new items based on the uuid value?

Mike Bayer

unread,
Apr 12, 2021, 9:18:27 AM4/12/21
to noreply-spamdigest via sqlalchemy
option #1 seems much simpler I'd likely start with that


re uuid, I usually take the "existing" ids and put them in a dictionary so I know which ones to skip, absolutely.  I don't understand what the "list comprehension" approach would entail that isn't using a hash lookup of some kind.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages