I used to work a lot with CasparCG. Recently I was working on a ‘real-time’ people-motion-tracking project (an AR app, nothing to do with TV/CG/graphics) and thought it could be fun to port it for use within CasparCG. I can track hands and faces and return points for every finger/arm/leg/etc. (which you could then ‘attach’ graphics to).
Do you think something like this could be useful? I don’t have a demo yet, but maybe I could put something together in a couple of days.
It’s fully web-based (so it would be a template, or even usable within other templates), without any changes to the server component itself (as long as the Chromium version works with my code).
I’ll make a video today/tomorrow to demonstrate the functionality/tracking.
Sorry for the late response - I was very busy the last couple of days. I had a few minutes to make a video showing the points we could attach things to (the demo video shows a wrist watch, which is the original use case).
In theory it should be possible to simply use the values within a template, or to build the whole library into an HTML template itself.
The next step will be to figure out the best way to implement this (in a way that is easily usable for others). It’s been a while since I last used Caspar (back when HTML became a thing), so any suggestions/tips/wishes are appreciated here.
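To illustrate "using the values within a template": MediaPipe-style trackers return landmarks normalized to [0, 1], so before a graphic can be pinned to one, the point has to be scaled to the template's resolution. A minimal sketch (the function names and the 1920x1080 resolution are just assumptions for the example):

```javascript
// Scale a normalized landmark (x, y in [0, 1]) to pixel coordinates
// for a given template resolution.
function landmarkToPixels(landmark, width, height) {
  return {
    x: Math.round(landmark.x * width),
    y: Math.round(landmark.y * height),
  };
}

// Build a CSS transform that moves a positioned element to that point.
function transformFor(landmark, width, height) {
  const p = landmarkToPixels(landmark, width, height);
  return `translate(${p.x}px, ${p.y}px)`;
}

// In a template this could drive an absolutely positioned element, e.g.:
//   el.style.transform = transformFor(wrist, 1920, 1080);
```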
No, the graph is trained to recognize people (it uses Google MediaPipe).
But there are other ways to do something like that. It depends on the size/speed of the drone, but I think that would be quite hard to do (especially if you have multiple drones in the same shot that change position all the time). In that case I think you would be better off tracking the drones directly (via a WiFi beacon/LoRaWAN or something like that) plus the camera zoom/movement.
On the other hand (depending on the version of Chromium Caspar is using), it shouldn’t be too hard to do something like a virtual set (or at least position virtual objects/graphics in 3D space and track them).
Do you have any info on hand about which Chromium version Caspar is using? As far as I can tell, it makes the most sense (at least from my perspective) to build everything outside Caspar, clone the video feed, and send the position values back to the template. Or is there a better way to do this (from a developer’s point of view)?
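For "send the position values back to the template", one option would be a small wire format over a WebSocket from the external tracker process to the template. The field names and message shape below are assumptions for illustration, not an existing protocol:

```javascript
// Encode one frame's tracked points (normalized coordinates) as a JSON message.
// points: e.g. { wrist: { x: 0.1, y: 0.2 }, nose: { x: 0.5, y: 0.3 } }
function encodePoints(frameId, points) {
  return JSON.stringify({ frame: frameId, points });
}

// Decode and minimally validate a message on the template side.
function decodePoints(message) {
  const data = JSON.parse(message);
  if (typeof data.frame !== "number" || typeof data.points !== "object") {
    throw new Error("malformed tracking message");
  }
  return data;
}

// In the template (browser side), roughly:
//   const ws = new WebSocket("ws://localhost:9091"); // port is an assumption
//   ws.onmessage = (ev) => applyPoints(decodePoints(ev.data).points);
```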
This would be super useful in many broadcast applications. If you can place a demo HTML <div> in a fixed 3D position relative to a person’s position in the surroundings and the camera movement - that’s a whole other level.
For instance, is it possible to fix this “SPECIAL COVERAGE” text in HTML relative to camera movement with your code?
Yes, didi got that right - it’s ‘just’ for people (hands, face, arms, …). But as long as your focal length + real-world camera position are constant, it should be possible without any AI/tracking (edit: ‘just’ with a few sensors on the camera). I got a request from a TV channel to do something like this - I can keep you updated on that one.
I quickly tested ‘packing everything into a template’ today, and it seems there are some issues (not sure what went wrong, but it looks like there is a problem with the initialisation process). I think the next thing to try is to run the recognition outside the template (which would be a better solution anyway, but I had hoped to just try it out the easy way) and just pass the values to Caspar via JavaScript…
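On the "pass the values via JavaScript" side: CasparCG calls a global `update()` function on an HTML template with the template data as a string. A defensive sketch of the receiving end, assuming the external tracker sends its values as JSON (the payload shape and element wiring are assumptions; older setups use the XML template-data format instead, which isn't handled here):

```javascript
// Parse the payload defensively so one malformed frame doesn't break the template.
function parseTrackingData(raw) {
  try {
    return JSON.parse(raw);
  } catch (e) {
    return null; // ignore malformed updates instead of throwing inside the template
  }
}

// In the template itself (browser side), roughly:
//   window.update = (raw) => {
//     const data = parseTrackingData(raw);
//     if (data && data.wrist) {
//       el.style.transform = `translate(${data.wrist.x}px, ${data.wrist.y}px)`;
//     }
//   };
```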