Microsoft’s computer vision model will generate alt text for Reddit images

Two years ago, Microsoft announced Florence, an AI system that it pitched as a “complete rethinking” of modern computer vision models. Unlike most vision models at the time, Florence was both “unified” and “multimodal,” meaning it could (1) understand language as well as images and (2) handle a range of tasks rather than being limited to specific applications, like generating captions.
Now, as a part of Microsoft’s broader, ongoing effort to commercializ