Hey mate,
I love this idea. You could get some wild Dutch-angle, close-up videos of your target's face output to the monitors. Turning a target ROI into a new preview window is definitely the next challenging hoop to jump through.
The closest code I’ve seen that does something similar is this script for an OAK-D Lite and Raspberry Pi system - Lossless Zoom By Luxonis. That Python script evaluates the footage from a 4K camera, identifies a human target as the ROI, and then outputs the ROI to a preview window. A lot of the ideas in that system should translate to your project; however, you’re going to have to dive in and figure out how to modify it appropriately.
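To give you a feel for the core idea, here is a rough sketch of just the crop step. To be clear, this is not the Luxonis script. It is my own minimal example using NumPy on the host, and the function name `crop_roi`, the `pad` parameter, and the normalised bounding box format (xmin, ymin, xmax, ymax in the 0 to 1 range) are all assumptions for illustration.

```python
import numpy as np


def crop_roi(frame, bbox, pad=0.1):
    """Return the part of `frame` inside a normalised bounding box.

    bbox is (xmin, ymin, xmax, ymax) with values in the 0..1 range
    (an assumed format for this sketch). `pad` grows the box by that
    fraction of its width/height on every side.
    """
    h, w = frame.shape[:2]
    xmin, ymin, xmax, ymax = bbox

    # Grow the box a little so the target is not cut off at the edges.
    bw, bh = xmax - xmin, ymax - ymin
    xmin, xmax = xmin - bw * pad, xmax + bw * pad
    ymin, ymax = ymin - bh * pad, ymax + bh * pad

    # Clamp to the frame and convert to pixel coordinates.
    x0 = int(max(0.0, xmin) * w)
    y0 = int(max(0.0, ymin) * h)
    x1 = int(min(1.0, xmax) * w)
    y1 = int(min(1.0, ymax) * h)

    # Fall back to the full frame if the box is empty.
    if x1 <= x0 or y1 <= y0:
        return frame
    return frame[y0:y1, x0:x1]


# Stand-in for a 4K frame (height 2160, width 3840, 3 colour channels).
frame = np.zeros((2160, 3840, 3), dtype=np.uint8)
roi = crop_roi(frame, (0.25, 0.25, 0.75, 0.75), pad=0.0)
print(roi.shape)  # (1080, 1920, 3)
```

In a real setup the bounding box would come from your detection step each frame, and you would pass the returned crop to something like OpenCV's `cv2.imshow` to get the second preview window.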
I talk a fair bit about OAK-D Lite camera modules here, so if you are interested, hit up this link - Integrated Computer Vision Package - OAK-D Lite With Raspberry Pi Set Up
Hopefully that will help you along the way. I’d love to see this finalised project end up on our Projects page, so definitely pop back again if you have any other questions.
Best of luck and thanks for your kind words,
Tim