Also, evaluation model could produce the score in several formats:
#thescore: <score>, #Score: <score>, **#score:** <score>, #Thescore: <score>, #TheScore: <score>
To handle all these variants in the 'get_jailbreak_score' method, I suggest:
1. 'evaluation_response' should be converted to lower case before we pass it to "extract_content" method
2.
'extract_content' needs to iterate through a list of all possible score tags to reduce LLM evaluation errors.
The current leaderboard has some randomness because:
if the evaluation model produces the score in this
format: '#thescore: <score>' the sample will get score [1 to 5] otherwise evaluaation model score will be 0