
A nano-banana canvas experiment. It generates from a source image plus the overlapping prompts, and caches the results. Like adding semantic layers to an image.
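A minimal sketch of what caching on the combination of image and prompt could look like. The names `cache_key`, `GenerationCache`, and the `generate` callback are hypothetical stand-ins, not the experiment's actual code; the real generation call is left as a plug-in function.

```python
import hashlib
from typing import Callable, Dict


def cache_key(image_bytes: bytes, prompt: str) -> str:
    """Derive one key from the image bytes and the prompt text together."""
    h = hashlib.sha256()
    # Length prefix so the boundary between image and prompt is unambiguous.
    h.update(len(image_bytes).to_bytes(8, "big"))
    h.update(image_bytes)
    h.update(prompt.encode("utf-8"))
    return h.hexdigest()


class GenerationCache:
    """In-memory cache wrapped around a (hypothetical) generation function."""

    def __init__(self, generate: Callable[[bytes, str], bytes]) -> None:
        self._generate = generate
        self._store: Dict[str, bytes] = {}
        self.misses = 0  # counts how often the generator was actually called

    def get(self, image_bytes: bytes, prompt: str) -> bytes:
        key = cache_key(image_bytes, prompt)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(image_bytes, prompt)
        return self._store[key]
```

Usage would be along the lines of `cache = GenerationCache(my_generate)` followed by `cache.get(image_bytes, "add a red hat")`, where `my_generate` is whatever function calls the model.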
Caching by combined image and prompt works well, but doesn't fully solve the order issue of:

Implemented a rough cache in the GIF viewer, since GIFs mainly work by applying patches on top of the starting image. I run through the whole thing once at the start, caching evenly spaced frames, then use the nearest earlier cached frame as a shortcut base when seeking to frames further in.
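Roughly, the keyframe idea can be sketched like this. It is a simplified model, not the viewer's actual code: frames are plain 2D lists of pixel values, a patch is a hypothetical `(x, y, rows)` tuple, and GIF disposal methods and transparency are ignored. `KeyframeSeeker` and `stride` are names made up for the example.

```python
from bisect import bisect_right
from copy import deepcopy


def apply_patch(canvas, patch):
    """Overwrite a rectangle of the canvas in place with the patch's rows."""
    x0, y0, rows = patch
    for dy, row in enumerate(rows):
        canvas[y0 + dy][x0:x0 + len(row)] = row


class KeyframeSeeker:
    """Caches every `stride`-th composited frame to shorten seeks."""

    def __init__(self, first_frame, patches, stride):
        # patches[i] turns frame i into frame i + 1.
        self.patches = patches
        self.key_indices = []
        self.key_frames = []
        canvas = deepcopy(first_frame)
        # One full pass up front, snapshotting evenly spaced frames.
        for i in range(len(patches) + 1):
            if i % stride == 0:
                self.key_indices.append(i)
                self.key_frames.append(deepcopy(canvas))
            if i < len(patches):
                apply_patch(canvas, patches[i])

    def seek(self, index):
        """Return frame `index`, replaying only from the nearest keyframe."""
        if not 0 <= index <= len(self.patches):
            raise IndexError(index)
        # Last cached keyframe at or before the requested frame.
        k = bisect_right(self.key_indices, index) - 1
        canvas = deepcopy(self.key_frames[k])
        for i in range(self.key_indices[k], index):
            apply_patch(canvas, self.patches[i])
        return canvas
```

With this layout a seek replays at most `stride - 1` patches instead of everything from frame 0, at the cost of holding one full frame in memory per keyframe.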

I feel like I could do some fun things with a segmenter tuned to drawn objects. MediaPipe (understandably) isn't quite there though.
Potential to do more conventional computer vision stuff? Maybe.