class AccountIngestedData(Base):
__tablename__ = "account_ingested_data"
__table_args__ = {"schema": "datablender"}
account_ingested_data_id = Column(
BigInteger,
primary_key=True,
server_default=text("nextval('account_ingested_data_id_seq'::regclass)"),
)
salesforce_id = Column(ForeignKey(AccountRawData.id), nullable=False)
account_data_source_id = Column(
ForeignKey(AccountDataSource.account_data_source_id), nullable=False
)
account_enrichment_source_field_mapper_id = Column(
ForeignKey(
AccountEnrichmentSourceFieldMapper.account_enrichment_source_field_mapper_id
),
nullable=False,
)
account_state_id = Column(ForeignKey(AccountState.account_state_id), nullable=False)
account_experian_file_id = Column(
ForeignKey(AccountExperianFile.account_experian_file_id), nullable=True
)
account_name = Column(Text, nullable=False)
account_address1 = Column(Text, nullable=False)
account_address2 = Column(Text, nullable=False)
account_city = Column(Text, nullable=False)
account_state = Column(Text, nullable=False)
account_zip = Column(Text, nullable=False)
account_data_source = relationship("AccountDataSource")
account_enrichment_source_field_mapper = relationship(
"AccountEnrichmentSourceFieldMapper"
)
account_experian_file = relationship("AccountExperianFile")
account_state = relationship("AccountState")
salesforce = relationship("AccountRawData")
And we have a state table:
class AccountState(Base):
__tablename__ = "account_state"
__table_args__ = {"schema": "datablender"}
account_state_id = Column(
BigInteger,
primary_key=True,
server_default=text("nextval('account_state_id_seq'::regclass)"),
)
account_state_description = Column(Text, nullable=False)
Example values for AccountState are like
100000, "ingested"
100001, "address valid"
100002, "address invalid"
100003, "address error"
100004, "address manual review"
100005, "owner info found"
100006, "owner info ambiguous"
100007, "owner info error"
100008, "owner info manual review"
Of course this list goes on and on as we add more and more integrations. At each stage of the automated pipeline, there's a query to get the records that correspond to each state.
My question is this: what's the cleanest way to assign the Foreign Key IDs to the accounts table based on what's happening?
I feel like there must be a really obvious solution to this that doesn't involve a global python dictionary that has to be updated every time a new status is added, and also doesn't have to bang another query on the status table for every record that gets loaded. But my brain can't figure out how to make this happen.
Really curious if anyone has thoughts about this pattern. It's obviously common, I just haven't seen the SQLA implementation before.