Hello community,
The service's main purpose is to tag anomalies in PE files (x86/x64) and show extended reports.
Only static analysis of PE files is available for now.
We use both heuristic rules and machine learning to classify a file as malicious or clean (in other words, it is a file verdict system).
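To give a feel for what static analysis of a PE file involves, here is a minimal sketch that reads the COFF file header and reports the architecture. The header layout (the `MZ` signature, the `e_lfanew` offset at 0x3C, the `PE\0\0` signature, and the machine values 0x014C and 0x8664) follows the public PE format. The function name `inspect_pe_header` and the two anomaly rules are hypothetical examples for illustration, not rules taken from the service.

```python
import struct

# Machine values from the PE/COFF format for the two supported targets.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}


def inspect_pe_header(data: bytes) -> dict:
    """Parse the COFF file header; raise ValueError if data is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if pe_offset + 24 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte PE signature.
    (machine, n_sections, timestamp, _symtab, _nsyms,
     _opt_size, _characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)

    anomalies = []  # hypothetical example rules, for illustration only
    if n_sections == 0:
        anomalies.append("no sections")
    if timestamp == 0:
        anomalies.append("zero timestamp")

    return {
        "arch": MACHINE_NAMES.get(machine, "other"),
        "sections": n_sections,
        "timestamp": timestamp,
        "anomalies": anomalies,
    }
```

Calling `inspect_pe_header(open(path, "rb").read())` on a file returns a small dictionary that a rule engine or a feature extractor could consume.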
We kindly ask the community to help us test the service on different PE files and to suggest features that would improve it.
Some technical information:
The heuristic core is written in C++ from scratch (more than 10k lines of code).
The prediction core is a Random Forest ensemble trained on more than 70 major features to classify whether a file is malicious or not.
To train it we used a dataset of about 1k malware samples and about 1k clean samples (from Program Files, Windows, etc.).
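As an illustration of how a clean set like this can be gathered from a directory tree, here is a minimal sketch that walks a folder, keeps files that start with the `MZ` signature, and removes duplicates by SHA-256. The function name `collect_unique_pe_files` is hypothetical, and the `MZ` check is only a cheap pre-filter, not a full PE validation. It is not the script used to build the dataset above.

```python
import hashlib
import os


def collect_unique_pe_files(root: str) -> dict:
    """Map SHA-256 digest -> first path seen, for files starting with 'MZ'."""
    unique = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable or locked file: skip it
            if data[:2] != b"MZ":
                continue  # cheap pre-filter only
            digest = hashlib.sha256(data).hexdigest()
            # Keep only the first copy of each distinct file content.
            unique.setdefault(digest, path)
    return unique
```

Deduplicating by content hash matters because the same DLL is often shipped by many programs, which would otherwise inflate the clean set with identical samples.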
The prediction rate is about 97% on the training set.
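For readers who want to compare numbers, here is a minimal sketch of how a rate like this can be computed from true labels and predicted verdicts, together with the false positive and false negative rates that matter for a verdict system. The function name `verdict_metrics` and the label encoding (1 = malicious, 0 = clean) are assumptions for illustration and are not taken from the service.

```python
def verdict_metrics(y_true, y_pred):
    """Compute accuracy, false positive rate and false negative rate.

    Labels are assumed to be 1 for malicious and 0 for clean.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

The same function can be applied to samples that were held out of training, which usually gives a less optimistic figure than the training set does.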
We also have the full database of VirusShare samples (100k+), but we need roughly the same number of clean samples to build a better dataset.
It would be great if someone could tell us how to get 100k clean, real-world PE program files :).