Periodic import of data to app


Christoph Damm

Dec 30, 2022, 4:13:41 AM
to Abridged recipients
Hello,

We need to import data into one of our apps on a daily basis.
As a first approach, this was done with the nodes API. However, this is currently not running well: it uses a lot of RAM (nearly OOM with 8 GB).
Entries are added/updated one by one and then published (roughly the flow sketched below).
I believe this is not the right/best use of the nodes API.
Any suggestions?
Bulk sending the data over the nodes API?
Switching to a custom CSV import, etc.?
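
For illustration, a per-entry flow of this kind could look roughly like the sketch below. This is illustrative only: it assumes Magnolia's v1 nodes endpoint and the default publish command, and the workspace name, paths, credentials and payload shape are all placeholders (check the nodes/commands endpoint docs for the exact request formats).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: endpoint paths, workspace name, credentials and the payload
// shape are placeholders, not a verified request format.
public class PerEntryImport {

    static final HttpClient HTTP = HttpClient.newHttpClient();
    static final String AUTHOR = "https://author.example.com"; // placeholder

    // One node write plus one publish per entry: two HTTP round trips and
    // one activation each time, which adds up quickly for large feeds.
    static void importEntry(String parentPath, String name, String title)
            throws Exception {
        String node = """
                {"name":"%s","type":"mgnl:content","path":"%s/%s",
                 "properties":[{"name":"title","type":"String",
                                "multiple":false,"values":["%s"]}]}"""
                .formatted(name, parentPath, name, title);
        send("PUT", "/.rest/nodes/v1/myworkspace" + parentPath + "/" + name, node);

        String publish = """
                {"repository":"myworkspace","path":"%s/%s"}"""
                .formatted(parentPath, name);
        send("POST", "/.rest/commands/v2/default/publish", publish);
    }

    static void send(String method, String path, String body) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(AUTHOR + path))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic ...") // placeholder credentials
                .method(method, HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> res =
                HTTP.send(req, HttpResponse.BodyHandlers.ofString());
        if (res.statusCode() >= 300) {
            throw new IllegalStateException(
                    method + " " + path + " -> " + res.statusCode());
        }
    }
}
```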

Thanks and regards,
Christoph

Tobias Szczepanski

Dec 30, 2022, 9:36:35 AM
to Magnolia User Mailing List, christo...@gmail.com
Hi Christoph,

To be honest, I'd challenge the architectural decision to import/push the data on a daily basis in the first place. Unless there is a valid reason that forces you to accept the extra storage, the architectural cost of data duplication is most likely not worth it.
Do I understand your scenario correctly? Assumptions:
  1. There's a system which is the source of the data
  2. The system decides, in fact, when the data is published
  3. The system pushes the data to the author instance
  4. The system triggers the publish command for the pushed data
My questions to you would be the following:
  1. Why is the data not provided by the system via a REST endpoint, instead of the system pushing its data into Magnolia? (see the sketch after this list)
    1. Caching should allow for performant requests and rendering, right?
  2. Why is Magnolia's publishing mechanism used on top of the system's own? Isn't assumption #2 of my scenario correct?
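
On question 1.1, a minimal sketch of what "serve it from the source system with caching" could look like. The endpoint URL and the 60-second TTL are assumptions; a real setup would also want per-key entries, error handling and protection against cache stampedes:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

// Minimal TTL cache in front of the source system's REST endpoint.
public class CachedSource {

    static final HttpClient HTTP = HttpClient.newHttpClient();
    static final Duration TTL = Duration.ofSeconds(60);
    static final URI SOURCE = URI.create("https://source.example.com/api/data");

    private String cached;
    private Instant fetchedAt = Instant.MIN;

    // Serve from memory while fresh; re-fetch at most once per TTL window.
    synchronized String data() throws Exception {
        if (Instant.now().isAfter(fetchedAt.plus(TTL))) {
            HttpRequest req = HttpRequest.newBuilder(SOURCE).GET().build();
            cached = HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
            fetchedAt = Instant.now();
        }
        return cached;
    }
}
```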
That said, if you want to keep pushing the data into Magnolia and your requirements don't actually need Magnolia's publishing mechanism (which might be the most significant bottleneck here), you might want to take a look at Jackrabbit clustering:
https://docs.magnolia-cms.com/product-docs/6.2/Administration/Instances/Clustering.html
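
For reference, clustering is configured per instance via a Cluster element in Jackrabbit's repository.xml. A rough sketch; the id, sync delay, database driver/URL and credentials are placeholders, see the docs above for the details:

```xml
<!-- Sketch only: id, driver, URL and credentials are placeholders. -->
<Cluster id="author1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="org.postgresql.Driver"/>
    <param name="url" value="jdbc:postgresql://db:5432/journal"/>
    <param name="databaseType" value="postgresql"/>
    <param name="user" value="jackrabbit"/>
    <param name="password" value="..."/>
  </Journal>
</Cluster>
```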

Best,
Tob

Christoph Damm

Jan 11, 2023, 4:07:09 AM
to Tobias Szczepanski, Magnolia User Mailing List
Hi Tobias,
thank you for your input, and sorry for my late reply.
Your assumptions are correct, and your suggestion number one is an option, but it was not possible due to time constraints. Your concern regarding performance is also valid: various components on a page template could rely on the data (which can be dynamic per user, so page caching is not an option), so the API would have to be performant enough for those needs.
On question 2: I'm not sure I understand that correctly. Right now the system pushes the data to the author instance and publishes it right away; there is no delay between those actions, and none is needed at the domain level. I guess one could also just put the data directly into the public instance(s) and avoid publishing altogether.
However, I need to do a more detailed analysis of what is slowing things down. My initial comment was probably wrong: it wasn't the adding/publishing that used that much RAM, but rather a GET request for all nodes. Still, the individual actions (PUT/POST + publish) take seconds or more, not the milliseconds I would expect.
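If the per-entry publish turns out to be the main cost, one common mitigation would be to batch the JCR writes server-side (e.g. in a command or Groovy script) and publish the parent once, recursively, instead of per node. A rough sketch of that idea; the target path, node type and property names are placeholders:

```java
import javax.jcr.Node;
import javax.jcr.Session;

// Sketch of a batched import: one session.save() per batch instead of one
// save (plus one publish) per entry.
public class BatchedImport {

    static final int BATCH_SIZE = 200;

    void importAll(Session session, Iterable<Item> items) throws Exception {
        Node parent = session.getNode("/imported-data"); // assumed target path
        int pending = 0;
        for (Item item : items) {
            Node node = parent.hasNode(item.name())
                    ? parent.getNode(item.name())
                    : parent.addNode(item.name(), "mgnl:content");
            node.setProperty("title", item.title());
            if (++pending >= BATCH_SIZE) {
                session.save(); // persist the whole batch in one call
                pending = 0;
            }
        }
        session.save();
        // Afterwards, publish "/imported-data" once (recursively) rather
        // than triggering a publish per node.
    }

    record Item(String name, String title) {}
}
```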
Anyway, I'll do some deeper analysis, but I'm open to more input.

Thank you again!

Regards,

Christoph
