You said "the specification allows us have a text track synchronizes with the video track".
Can you tell me which specification you were referring to in the above sentence?
APIs in browsers are specified by the W3C; protocols (and content) used on the wire are specified by the IETF. Something might be possible with the protocols (IETF specs), but if there is no corresponding JS API (W3C spec), your web app will not have access to it. Only the browser itself can use it.
You can modify the browser code if you want, but how would you get your users to download and install the modified version? In practice, if you want a web app, you are limited to what the browser vendors provide.
If you use a native app and only interact with other native apps, you are free to do anything you want, and you do not need to interoperate.
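To make the point concrete, here is a rough sketch of how a web app can check whether a feature is actually exposed to JavaScript. The helper name isExposed is made up for this example, and addTextTrack is used only as an illustration of a text-track API on media elements; it is not necessarily the feature you had in mind.

```typescript
// Hypothetical helper (not part of any spec): walks a property path
// starting at `scope` and reports whether every step exists.
function isExposed(scope: unknown, path: string[]): boolean {
  let current: unknown = scope;
  for (const key of path) {
    const isContainer =
      (typeof current === "object" || typeof current === "function") &&
      current !== null;
    if (!isContainer) return false;
    if (!(key in (current as object))) return false;
    current = (current as Record<string, unknown>)[key];
  }
  return current !== undefined;
}

// In a browser, globalThis is the window object. Outside a browser
// (e.g. Node.js) HTMLMediaElement is normally absent, so this is false.
const canAddTextTrack = isExposed(globalThis, [
  "HTMLMediaElement",
  "prototype",
  "addTextTrack",
]);

console.log(
  canAddTextTrack
    ? "Text track API is exposed to script"
    : "No JS API here: the web app cannot use this feature"
);
```

If the check fails, there is nothing the web app can do about it, whatever the wire protocol supports.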
Alex.