Will property change events affect memory requirements?

hejunh...@gmail.com

Jun 21, 2017, 6:57:20 AM
to actionml-user
Hi~ 
As far as I know, the number of user-ids, item-ids, and event types affects the memory required to train the model, so I limit the size of the data with the eventWindow. But because "$set property change events are never dropped", does that mean the number of item-ids will keep growing and the memory requirements will keep increasing? (In my case, items expire after 3 days.)

Pat Ferrel

Jun 21, 2017, 12:25:50 PM
to hejunh...@gmail.com, actionml-user
Yes, but the db-cleaner template and the SelfCleaningDatasource have an option to compress duplicates and create one aggregate $set from many changes. For highly constrained environments I’d run db-cleaner periodically to keep training time constant.
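
As a rough illustration of what "one aggregate $set from many changes" means, here is a minimal Python sketch that folds every $set for an entity into a single event. It is not the db-cleaner or SelfCleaningDatasource implementation; the function names are hypothetical, and which eventId and eventTime the aggregate keeps is an arbitrary choice made here (the first event's).

```python
from datetime import datetime


def parse_time(ts):
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def compress_sets(events):
    """Fold all $set events of each entity into one aggregate $set.

    Hypothetical helper for illustration only. Events are applied in
    eventTime order, so a later value for the same property wins.
    """
    merged = {}
    for ev in sorted(events, key=lambda e: parse_time(e["eventTime"])):
        if ev["event"] != "$set":
            continue  # only property-setting events are folded here
        key = (ev["entityType"], ev["entityId"])
        if key not in merged:
            # Copy so the caller's event dicts are not mutated.
            merged[key] = {**ev, "properties": dict(ev["properties"])}
        else:
            merged[key]["properties"].update(ev["properties"])
    return list(merged.values())
```

Note that this bounds the number of $set events per item, not the number of items: every item-id that was ever set still has one event.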

hejunh...@gmail.com

Jun 21, 2017, 10:09:49 PM
to actionml-user, p...@occamsmachete.com
Thank you for your reply, Pat.
I have used db-cleaner to control the size of the data, but since reserved events (e.g. $set) are not deleted, the item-ids associated with reserved events will always exist in the ES model, even though those item-ids will never be the target of any user event. Will those "meaningless" item-ids (from reserved events) affect the resource requirements when training the URModel (e.g. when constructing the BiMap)? Or are item-ids that have no named events (e.g. purchase) on them simply indexed into the ES model?

On Thursday, June 22, 2017 at 12:25:50 AM UTC+8, pat wrote:

Pat Ferrel

Jun 22, 2017, 11:33:24 AM
to hejunh...@gmail.com, actionml-user
The slow accumulation of your $set events (actually the objects they create, i.e. items) can be reversed by sending $delete when you know the item is no longer available.
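
A minimal sketch of sending such a $delete to the PIO EventServer over its REST API, using only the Python standard library. The helper names, the default server address `http://localhost:7070`, and the access key are assumptions for illustration; the Python SDK's `delete_item` call mentioned later in this thread is the more usual route. This only records the event; it does not by itself show what a cleaner later does with the item's stored events.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def build_delete_event(item_id, event_time=None):
    """Build the JSON body of a $delete event for one item (hypothetical helper)."""
    event_time = event_time or datetime.now(timezone.utc)
    return {
        "event": "$delete",
        "entityType": "item",
        "entityId": item_id,
        # Pass a timezone-aware datetime so the offset is included.
        "eventTime": event_time.isoformat(timespec="milliseconds"),
    }


def post_event(event, access_key, server="http://localhost:7070"):
    """POST one event to the EventServer; server and key are assumptions."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    req = urllib.request.Request(
        f"{server}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `post_event(build_delete_event("item-1"), access_key="YOUR_ACCESS_KEY")` once an item has expired.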


Pat Ferrel

Jun 24, 2017, 12:31:18 PM
to Pat Ferrel, hejunh...@gmail.com, actionml-user
Hmm, actually I'm not sure that is true. It would take an experiment: create an item with $set, then $delete it, so that both events are in the PIO EventServer. Then run db-cleaner with compression turned on and see if the events are still there. If they are, this is a bug in PIO, so you should create a bug report there.


hejunh...@gmail.com

Jun 25, 2017, 3:07:34 AM
to actionml-user, p...@occamsmachete.com

I just tested it and the result was not what we hoped for. Maybe it's a bug.

TEST:

[input-events] (python-SDK create_event / delete_item api) 

[
  {
    "eventId": "jMame_rld_XelHmDsvlZQwAAAVza1NQAk7Kw94kNGnE",
    "event": "$set",
    "entityType": "item",
    "entityId": "item-1",
    "properties": {
      "gender": "male"
    },
    "eventTime": "2017-06-25T00:00:00.000+08:00",
    "creationTime": "2017-06-25T06:41:23.174Z"
  },
  {
    "eventId": "jMame_rld_XelHmDsvlZQwAAAVza1NQAn_o4ammNV0Y",
    "event": "$set",
    "entityType": "item",
    "entityId": "item-1",
    "properties": {
      "name": "alice",
      "id": "123"
    },
    "eventTime": "2017-06-25T00:00:00.000+08:00",
    "creationTime": "2017-06-25T06:30:25.362Z"
  },
  {
    "eventId": "jMame_rld_XelHmDsvlZQwAAAVzd9vAtgElY5zCAd68",
    "event": "$delete",
    "entityType": "item",
    "entityId": "item-1",
    "properties": {},
    "eventTime": "2017-06-25T06:36:07.085Z",
    "creationTime": "2017-06-25T06:36:07.093Z"
  }
]

Then run "pio train -v test.json" in the db-cleaner-master directory.

PS: compressProperties:true, removeDuplicates:true

[after-train]

[
  {
    "eventId": "jMame_rld_XelHmDsvlZQwAAAVza1NQAk7Kw94kNGnE",
    "event": "$set",
    "entityType": "item",
    "entityId": "item-1",
    "properties": {
      "gender": "male",
      "name": "alice",
      "id": "123"
    },
    "eventTime": "2017-06-25T00:00:00.000+08:00",
    "creationTime": "2017-06-25T00:00:00.000+08:00"
  },
  {
    "eventId": "jMame_rld_XelHmDsvlZQwAAAVzd9vAtgElY5zCAd68",
    "event": "$delete",
    "entityType": "item",
    "entityId": "item-1",
    "properties": {},
    "eventTime": "2017-06-25T06:36:07.085Z",
    "creationTime": "2017-06-25T06:36:07.085Z"
  }
]

The properties were compressed, but item-1 and its '$delete' event are still there.

On Sunday, June 25, 2017 at 12:31:18 AM UTC+8, pat wrote:

Pat Ferrel

Jun 25, 2017, 11:09:00 AM
to hejunh...@gmail.com, actionml-user
File a bug on PredictionIO with this info. The db-cleaner template just uses internal PIO APIs.


Pat Ferrel

Jun 25, 2017, 11:12:11 AM
to us...@predictionio.incubator.apache.org, actionml-user, hejunh...@gmail.com, Pat Ferrel
Sounds like the only way to remove these will be to export the events, process them yourself to remove all events for deleted items, and re-import them.
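
The "process yourself" step could be sketched in Python as below, assuming the export is a file with one JSON event per line and that every eventTime carries a timezone. The function names are hypothetical. Treating an entity as deleted when its most recent reserved event ($set, $unset, $delete) is a $delete, and also dropping usage events that target such an entity, are interpretations of "all events for deleted items", not something confirmed in this thread.

```python
import json
from datetime import datetime

RESERVED = ("$set", "$unset", "$delete")


def parse_time(ts):
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def drop_deleted_entities(events):
    """Return events with everything touching a deleted entity removed."""
    latest = {}  # (entityType, entityId) -> (time, name) of newest reserved event
    for ev in events:
        if ev["event"] in RESERVED:
            key = (ev["entityType"], ev["entityId"])
            t = parse_time(ev["eventTime"])
            if key not in latest or t >= latest[key][0]:
                latest[key] = (t, ev["event"])
    deleted = {k for k, (_, name) in latest.items() if name == "$delete"}

    def touches_deleted(ev):
        if (ev["entityType"], ev["entityId"]) in deleted:
            return True
        target = (ev.get("targetEntityType"), ev.get("targetEntityId"))
        return target in deleted

    return [ev for ev in events if not touches_deleted(ev)]


def filter_export_file(src, dst):
    """Read a JSON-lines export, filter it, and write a file to re-import."""
    with open(src) as fin:
        events = [json.loads(line) for line in fin if line.strip()]
    with open(dst, "w") as fout:
        for ev in drop_deleted_entities(events):
            fout.write(json.dumps(ev) + "\n")
```

An item that is set again after its $delete is kept, since its newest reserved event is then a $set. The filtered file would be re-imported into a fresh or emptied app, since importing on top of the existing data would leave the old events in place.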

