I’ve been thinking about what the best architecture would be for the network I’m building. I started off with the idea that I wanted the network to “focus” only on the objects that are moving. I want the model to be able to pick out those objects, recognize that they are distinct, and understand that their motion has direction.
During my search for relevant papers, I came across the concept of optical flow. Imagine you’re watching a clip of a green triangle moving across a white background. The pixels that make up the triangle are not actually moving, but one could say that pixels in one frame correspond to pixels in the next. The green pixel at the very tip of one of the vertices in frame 1 corresponds to the green pixel at the tip of the same vertex in frame 2. There’s a sort of “flow” from the pixel in frame 1 to that in frame 2. One could draw a vector from each green pixel in frame 1 to its corresponding pixel in frame 2 and use those vectors to get the direction in which the triangle is moving.
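To make the triangle example concrete, here is a minimal sketch in Python with NumPy. It is a toy under strong assumptions, not a real optical flow method: it assumes the whole triangle moves by a single rigid shift, finds that shift by brute force, and then assigns the same vector to every triangle pixel. Real optical flow methods estimate a separate vector per pixel. All function names here (`make_frame`, `estimate_shift`, `flow_field`) are made up for illustration.

```python
import numpy as np


def make_frame(top_left, size=8, shape=(32, 32)):
    """Binary frame: 1 where the (right) triangle is, 0 for the background."""
    frame = np.zeros(shape, dtype=np.uint8)
    r0, c0 = top_left
    for i in range(size):
        # Row i of the triangle is i + 1 pixels wide.
        frame[r0 + i, c0:c0 + i + 1] = 1
    return frame


def estimate_shift(frame1, frame2, max_shift=5):
    """Brute-force search for the (dy, dx) that best aligns frame1 with frame2.

    Assumption: the object moves rigidly and stays away from the borders,
    so np.roll's wrap-around never comes into play.
    """
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(frame1, (dy, dx), axis=(0, 1))
            err = np.sum(shifted != frame2)  # number of mismatched pixels
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def flow_field(frame1, frame2):
    """Per-pixel (dy, dx) vectors: the shift on triangle pixels, zero elsewhere."""
    dy, dx = estimate_shift(frame1, frame2)
    flow = np.zeros(frame1.shape + (2,), dtype=np.int64)
    flow[frame1 == 1] = (dy, dx)
    return flow


# Frame 1 and frame 2: the triangle moves 2 pixels down and 3 pixels right.
frame1 = make_frame((10, 5))
frame2 = make_frame((12, 8))

flow = flow_field(frame1, frame2)

# Averaging the vectors over the triangle's pixels gives its direction of motion.
mean_motion = flow[frame1 == 1].mean(axis=0)
print(mean_motion)
```

The background pixels get a zero vector, which is the property I care about: only the moving object carries flow, and the vectors on it all point in the direction of motion.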
Admittedly, I feel kind of stuck. I’ve been bouncing around between different papers and it’s a lot of information that I’m not sure what to do with. I’ll need to chat with Igor to figure out next steps.