I am interested in building a tokenization system for system logs
(Apache, DNS, syslog, Windows, ...) so that I can parse them, analyze them, and carry out further investigations. I haven't found any tokenization functionality in the nltk module meant for exactly this purpose; system logs are not natural language in the way human speech is.
Nevertheless, I would like to ask the community whether anyone has found a solution to a similar problem, or knows good references on NLP for logs, and could shed some light on this.
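To make the question more concrete, here is a minimal sketch of the kind of rule-based tokenizer I have in mind. It uses Python's standard `re` module rather than nltk. The token pattern and the `tokenize_log_line` helper are my own hypothetical choices, not part of any library or log format specification, and the sample lines are made-up syslog and Apache-style examples.

```python
import re

# Hypothetical token pattern for system log lines (illustrative only).
# Alternatives are tried left to right at each position, so the more
# specific rules (IPs, times, brackets, quotes) must come before the
# generic word rule. All groups are non-capturing so findall returns
# whole matches.
LOG_TOKEN_PATTERN = re.compile(
    r"""
    \d{1,3}(?:\.\d{1,3}){3}(?::\d+)?   # IPv4 address, optional :port
    | \d{2}:\d{2}:\d{2}                # time of day, e.g. 10:15:32
    | \[[^\]]*\]                       # bracketed field, e.g. a PID or an Apache date
    | "[^"]*"                          # quoted string, e.g. an HTTP request line
    | [\w./-]+                         # words, paths, hostnames, numbers, "-"
    | [^\w\s]                          # any other single punctuation character
    """,
    re.VERBOSE,
)


def tokenize_log_line(line: str) -> list[str]:
    """Split one log line into tokens; whitespace is simply skipped."""
    return LOG_TOKEN_PATTERN.findall(line)


if __name__ == "__main__":
    syslog_line = (
        "Jan 12 10:15:32 myhost sshd[1234]: Failed password "
        "for root from 192.168.1.10 port 22 ssh2"
    )
    apache_line = (
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326'
    )
    print(tokenize_log_line(syslog_line))
    print(tokenize_log_line(apache_line))
```

Because the IP and time rules are tried before the generic word rule, `192.168.1.10` and `10:15:32` come out as single tokens instead of being split apart. Each log source (DNS, Windows events, ...) would probably need its own adjustments to a pattern like this.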
Thank you for your time,