This is a curly question: whatever the answer, at least one person will almost always end up unhappy. The current behaviour, as per the documentation:
When a viewer clicks into an extension, Twitch sends focus back to the player, to ensure that keyboard shortcuts for the player continue to work. However, if an extension asks viewers to click on a form field element (for example, “field,” “select,” “textarea”) and the viewer does so, the focus stays on the form element.
Viewers expect to be able to use the player controls, which is why this approach was taken.
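To make the quoted rule concrete, here is a rough sketch of it as a small decision function. This is only a model of the documented behaviour, not Twitch's actual code: the function name `focusAfterClick`, the `FocusTarget` type, and the tag list are all made up for illustration, and I'm assuming the docs' "field" corresponds to an `input` element.

```typescript
// Hypothetical model of the documented rule; not Twitch's implementation.
// Assumption: these tags are what the docs mean by "form field element"
// ("field" in the docs is taken to mean <input> here).
const FORM_FIELD_TAGS = new Set(["input", "select", "textarea"]);

type FocusTarget = "player" | "form-element";

// Given the tag name of the element the viewer clicked inside the extension,
// return where keyboard focus ends up according to the documented behaviour.
function focusAfterClick(tagName: string): FocusTarget {
  return FORM_FIELD_TAGS.has(tagName.toLowerCase()) ? "form-element" : "player";
}

// A click on a plain <div> or <button> hands focus back to the player,
// while a click on a <textarea> keeps focus in the extension.
const examples = ["div", "button", "textarea"].map(focusAfterClick);
```

If that model is right, the practical upshot is that an extension which needs keyboard input has to take it through a form field, since clicks anywhere else give focus back to the player.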