Below are some of the regular expressions I have been using to extract function calls and variable names. They seem to work reasonably well.
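import re

# Function calls: matches names such as eval or document.getElementById (up to three dotted parts).
m = re.findall(r"(\w+|\w+\.\w+|\w+\.\w+\.\w+)\(.*?\);", js)
# Variable extraction: (name, constructor) pairs from "var x = new Foo(...)".
v_new = re.findall(r"var\s+(\w+)\s*=\s*new\s+(\w+)\(.*?\);?", js)
# Variable extraction: the first name declared by each "var" statement.
v = re.findall(r"var\s+(\w+)\s*[=;,]+", js)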
These seem to work better than the earlier implementations in javascriptfeatures.py.
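To make the behaviour of the three patterns concrete, here is a minimal sketch that runs them over a small made-up JavaScript sample. The sample text and the names CALL_RE, NEW_RE, VAR_RE, calls, new_objects and var_names are placeholders of mine for illustration and are not taken from javascriptfeatures.py.

```python
import re

# Hypothetical sample input, invented for this illustration.
js = """
var req = new XMLHttpRequest();
var count = 0;
var a, b;
document.getElementById("out");
eval(payload);
"""

# Raw strings keep backslashes such as \w and \s intact for the regex engine.
CALL_RE = re.compile(r"(\w+|\w+\.\w+|\w+\.\w+\.\w+)\(.*?\);")
NEW_RE = re.compile(r"var\s+(\w+)\s*=\s*new\s+(\w+)\(.*?\);?")
VAR_RE = re.compile(r"var\s+(\w+)\s*[=;,]+")

# Names followed by "(...);". Note that the constructor in
# "new XMLHttpRequest();" is also picked up as a call.
calls = CALL_RE.findall(js)

# (variable, constructor) pairs from "var x = new Foo(...)".
new_objects = NEW_RE.findall(js)

# Only the first name after each "var" is captured, so "b" in
# "var a, b;" is missed by this pattern.
var_names = VAR_RE.findall(js)

print(calls)
print(new_objects)
print(var_names)
```

On this sample the call pattern also reports the constructor name, and the last pattern skips the second name in a comma-separated declaration, so both are cases worth checking against real scripts.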
I am working on improving them, as well as the other regular expressions used for feature extraction. In any case, I was wondering whether I am on the right track, and whether there is another approach to the same task that is better in terms of time and space.