Oh yeah, the fact that certain Wikipedia pages are under protection definitely lowers their revert counts. And I acknowledge that in my blog as well.
Blog excerpt: "This is more of a caveat. Some of these pages will have been placed under protection to stop them being abused. Protection could, for example, mean that edits by anonymous users don't appear till they're approved, or that someone who has been a registered user for less than 4 days and has made fewer than 10 edits won't be able to edit certain pages. The page on Narendra Modi is under even stricter protection: only someone who's been a user for more than 30 days and has done over 500 edits can edit the page, and this has been the case since April. Such protections bring down the abuse levels for many pages, so that should be kept in mind."
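To make the protection tiers in that excerpt concrete, here is a minimal Python sketch. It is only an illustration and not part of my actual script. The names `Editor`, `can_edit` and the labels "none", "semi" and "extended" are made up for this example. It assumes an editor has to meet both the account-age and the edit-count threshold, and it simplifies the "edits held for approval" case to "cannot edit directly".

```python
from dataclasses import dataclass

# Thresholds taken from the blog excerpt, not from Wikipedia's live configuration.
SEMI_MIN_DAYS, SEMI_MIN_EDITS = 4, 10          # at least 4 days and 10 edits
EXTENDED_MIN_DAYS, EXTENDED_MIN_EDITS = 30, 500  # more than 30 days and over 500 edits


@dataclass
class Editor:
    registered: bool
    account_age_days: int
    edit_count: int


def can_edit(editor: Editor, protection: str) -> bool:
    """Return True if the editor could edit a page directly.

    `protection` is a hypothetical label: "none", "semi" or "extended".
    Assumption: both the age and the edit-count threshold must be met.
    """
    if protection == "none":
        return True
    if not editor.registered:
        # Simplification: anonymous edits are treated as blocked,
        # although in practice they may be held for approval instead.
        return False
    if protection == "semi":
        return (editor.account_age_days >= SEMI_MIN_DAYS
                and editor.edit_count >= SEMI_MIN_EDITS)
    if protection == "extended":
        return (editor.account_age_days > EXTENDED_MIN_DAYS
                and editor.edit_count > EXTENDED_MIN_EDITS)
    raise ValueError(f"unknown protection level: {protection!r}")


if __name__ == "__main__":
    newcomer = Editor(registered=True, account_age_days=2, edit_count=5)
    print(can_edit(newcomer, "semi"))
```

The point is just that the stricter the tier, the smaller the pool of people who can edit (or vandalise) a page, which is why protected pages end up with fewer reverts.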
But then you could turn around and say, "Hey Shijith, if you're aware of this, why didn't you factor it into your calculations?" It's because I think there should be a limit to the level of complexity a journalist aims for in their story. Granted, I posted this to the datameet mailing list, but the story itself is aimed at a general audience.
Again from my blog: "I've done work in the past that tries to be (conscientiously) rigorous, but I don't think journalism is the place for such work. Academic journals maybe, but not publications meant for the general public. In the pursuit of precision and conclusion validity, a lot of data journalism in India has become completely unreadable. I don't think there's anything wrong in admitting upfront that this post is a best-effort attempt, the conclusions may not be completely valid, but that hopefully this promotes the topic of Wikipedia abuse in India as a legitimate avenue of research. And that some academic out there does a better job than I did in their paper. But I don't think a journalist should be the one doing that paper."
As for how organised the abuse of certain Wikipedia pages is, and how that coordination happens over Telegram and WhatsApp: many of these groups are private, so it's difficult to monitor their activity and find out which pages they are targeting.
But even so, I don't think I'll be missing any pages. Right now the script goes through 150k pages (and once I've reworked the code, all the pages) that have been assessed by WikiProject India, the group of editors that maintains pages about India, and they cover almost everything. So there's not much scope for missing disputed pages.
To your point about following OpIndia or other right-wing handles to find which Wikipedia pages they are targeting: the only issue I have with that is it won't capture abuse that isn't the result of coordinated, organised campaigns. Abuse that comes from individual actors editing separately, but is still devastating at an aggregate level, is also interesting to me. The right-wing agenda is pretty much all-pervasive now, and people don't need to be prodded by politicians, media etc. to do their bidding; they pretty much act of their own volition now :(