Data Mods


Greg Stanton Marra

Nov 3, 2018, 3:08:20 PM
to theblueallian...@googlegroups.com
Hi everyone!

I've been giving some thought to how we can streamline the way we collect corrections from people who report incorrect or incomplete data. Here are a few thoughts I've written out in a doc in our public TBA Google Drive folder.

Curious to hear everyone's thoughts and feedback.

-Greg

Data Mods

The Blue Alliance aims to be a complete index of all information related to FIRST Robotics Competition Teams, Events, and Matches.


  • People come to TBA to watch webcasts, so having coverage of the webcast for every event is critical.

  • People come to TBA to scout for future events, so having coverage of media for teams is critical.

  • People also come to TBA to re-watch old matches, so we aim to have as much coverage as possible of older matches.


Data Ingestion

We have several primary ways we obtain data:



Things to Improve

  • Webcast Coverage

    • Gameday and webcasts are the most popular site features. We should actively track down webcasts for upcoming events.

  • Offseason Moderators

    • We should aim to have an offseason moderator for every Offseason Event

  • Misc Reports

    • Many reports are never properly triaged into a “todo list”.

    • Which site data moderators read our email address or Facebook group is inconsistent.

People

We should better organize our site data moderators.


  • Currently we use #mods on the TBA Slack to organize.

  • We should also create an email list to organize and share communications. Alternatively, we can use theblueallian...@googlegroups.com, since it’s very low traffic.

  • We should determine the minimum number of active data mods we need at any time, and have a process to identify and add (and remove) data mods.

Process

We should have certain regular processes regarding data moderation. We need to identify a “lead data mod” to drive these.


  • Use GitHub as the source of truth for required pending edits. Ask people to open issues against our GitHub repository so we can track each issue to closure and not lose reports.

  • Send a weekly “State of the Data” email looking back on the past week and forward to webcast coverage of next week’s events, and data mod coverage of offseason events.


Tools

There are some tools that would assist with this.


  • Improve /webcasts to highlight next week’s events that do not have webcasts yet.

  • Create a tool to see upcoming Offseasons that do not have write keys issued.

  • Have a bot email a report to the Data Mods mailing list, especially highlighting those two things.
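As a rough illustration of the first tool, here is a minimal Python sketch that picks out events starting in the next week with no webcast listed. It assumes each event is a plain dict with "key", "start_date" (ISO date string), and "webcasts" fields; those field names, the function name, and the sample event keys are all made up for this example and are not taken from the TBA codebase.

```python
from datetime import date, timedelta


def events_missing_webcasts(events, today, days_ahead=7):
    """Return sorted keys of events that start within the next
    `days_ahead` days (inclusive) and have no webcast listed."""
    window_end = today + timedelta(days=days_ahead)
    missing = []
    for event in events:
        start = date.fromisoformat(event["start_date"])
        # An absent or empty "webcasts" list both count as "no webcast".
        if today <= start <= window_end and not event.get("webcasts"):
            missing.append(event["key"])
    return sorted(missing)


# Hypothetical sample data; these event keys are invented.
sample_events = [
    {"key": "2018aaa", "start_date": "2018-11-10", "webcasts": []},
    {"key": "2018bbb", "start_date": "2018-11-09",
     "webcasts": [{"type": "twitch", "channel": "example"}]},
    {"key": "2018ccc", "start_date": "2018-12-15", "webcasts": []},
]

print(events_missing_webcasts(sample_events, today=date(2018, 11, 3)))
```

The same filter could feed both the /webcasts page and the emailed report, so the two would not drift apart.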



Todo

  • Data Mods

    • Get list of current active data mods

    • Figure out how many data mods we need

    • Have a recruitment process

    • Figure out who should be “data mod lead”

  • Data Coverage

    • Calculate % of matches by year with match videos to understand trend

    • Calculate % of events with webcast by year to understand webcast trend

  • Tools

    • Upgrade Webcast page to pull out “next week” events

    • Build tool that shows “offseasons without data mods”

  • Ad Hoc Data Reports

    • Edit Facebook Group description to describe GitHub process

    • Edit https://www.thebluealliance.com/add-data to capture modern best practices

    • Have some process where we periodically triage the GitHub issues
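For the Data Coverage items above, the per-year percentages could be computed along these lines. This is only a sketch: it assumes the input has already been reduced to (year, covered) pairs, where "covered" means a match has a video or an event has a webcast. The function name and the sample numbers are invented for illustration and are not real TBA data.

```python
from collections import defaultdict


def coverage_by_year(records):
    """Given an iterable of (year, covered) pairs, return a dict
    mapping each year to the percentage of records that are covered."""
    totals = defaultdict(int)
    covered = defaultdict(int)
    for year, is_covered in records:
        totals[year] += 1
        if is_covered:
            covered[year] += 1
    return {
        year: round(100.0 * covered[year] / totals[year], 1)
        for year in sorted(totals)
    }


# Invented sample: 3 of 4 records covered in 2016, 1 of 2 in 2017.
sample = [
    (2016, True), (2016, True), (2016, False), (2016, True),
    (2017, True), (2017, False),
]

print(coverage_by_year(sample))
```

The same function works for both trends; only the meaning of "covered" changes between match videos and event webcasts.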

Timothy J Flynn

Nov 3, 2018, 4:10:24 PM
to theblueallian...@googlegroups.com

Greg (and all),

You raise good points about our general data flow, especially for moderation of offseasons. Still, moderators for events are generally volunteers, and can make mistakes. A "File Issue" button on discrete pages would help double-check their work and give us a centralized location for reports, and GitHub would be a good place for that. On the other hand, asking normal users to make a GH account to file one issue may not be the best way to approach it. Unfortunately, I don't have a better solution.

People who want offseason data to be uploaded often don't know the format TBA needs to import it (xlsx files, that is), or even how to request that an offseason event be added. In several cases, getting the event added to TBA, much less getting livestreams and media added, is a low priority for the event-runners. Perhaps a blog post or TBA page on "How To Get Content To TBA" that goes into detail on "Team Media, Event Media, Offseasons, etc." would be helpful? That way it doesn't have to spread via word of mouth, and people not directly responsible for running the event can get a "hit list" in advance. Twice now I've had people try to send me FMS backups to add event results to TBA...

With respect to webcasts, every single in-season event had a webcast listed this season (178 of them via API check). However, some were in a format that made them difficult to archive. If we became aware of these problem webcasts in advance, that would definitely help the archival intent. Also, some events (especially offseasons) have their livestreams added after the fact as event media (see 2017njbe).

As to the number of data mods... a number of them were added because they said "sure I'll help whittle down the backlog" when we had several thousand suggestions to review.  I don't see a large need for them in the off-season, but perhaps we could make use of their services to split events that have matches and event media, but no discrete per-match video?  If we're aiming for a data mod for each offseason, then perhaps a list in the moderation queue for future events without an assigned moderator in the "approved writers" category would be useful?

Tim

--
You received this message because you are subscribed to the Google Groups "thebluealliance-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to thebluealliance-dev...@googlegroups.com.
To post to this group, send email to theblueallian...@googlegroups.com.
Visit this group at https://groups.google.com/group/thebluealliance-developers.
For more options, visit https://groups.google.com/d/optout.

Allen Gregory IV

Nov 3, 2018, 6:08:35 PM
to theblueallian...@googlegroups.com
I think there are some big things we can do to improve off-season data.

- Increase the number of events using FMS-sync
Recruit mods from various large areas to help off-season events in their area set up FMS sync. We should also work with HQ to make setting up FMS sync easier. FMS sync is clearly the best way to get accurate data.

- Create a way to easily split, label, and upload live match videos from events.

Ideally we find a way to make uploading match video easier for events. I have talked to a few people about making some form of live FRC match video splitter that gives events a simple way to run the automated match splitter and uploader either locally or in the cloud, whichever is easier. This is most useful for districts and off-seasons.

- Improve ways for mods to be able to delete incorrect data

I often find media or webcasts that shouldn't be shown, but I don't have the ability to remove them. The site would be better if we could remove offline webcasts (when a correct one is also available) or incorrect team media.

Thanks for all the hard work in making TBA a cornerstone of FRC,
Allen
--

Allen Gregory
Spectrum FRC#3847
360-390-5244(call or text)

Greg Stanton Marra

Nov 18, 2018, 10:47:10 AM
to theblueallian...@googlegroups.com
Timothy: I agree GitHub isn't the best workflow for everyone. I think making it the "source of truth for stuff to fix" can be an upgrade though – sometimes I see a Facebook post or email and I don't have the time to make a complex fix or I am on my phone. Having someone say "issue filed <link>" can make sure we don't let things slip through the cracks.

I did some data analysis of offseason data: https://github.com/gregmarra/frc-r/blob/master/webcast_coverage.md . We're getting more webcasts over time, but fewer matches with videos on YouTube. Splitting and uploading is clearly a problem. Making the "best way to run an event to get data coverage later" easier and easier seems like a big opportunity here.

Allen, great point that empowering data mods to remove bad webcasts or bad media would be a big upgrade to the site. "Media Removal Suggestions" would be a way to approach this.