Cheap AI “video scraping” can now extract data from any screen recording

Video scraping is just one of many new tricks made possible because the latest large language models (LLMs), such as Google’s Gemini and OpenAI’s GPT-4o, are actually “multimodal” models that accept audio, video, image, and text input. These models translate any multimedia input into tokens (chunks of data), which they use to make predictions about which tokens should come next in a sequence.

A term like “token prediction model” (TPM) might be more accurate than “LLM” these days for AI models with multimodal inputs and outputs, but no generalized alternative term has really taken off yet. Whatever you call it, an AI model that can take video inputs has interesting implications, both good and potentially bad.

Breaking down input barriers

Willison is far from the first person to feed video into AI models to achieve interesting results (more on that below, and here’s a 2015 paper that uses the “video scraping” term), but as soon as Gemini launched its video input capability, he began to experiment with it in earnest.

In February, Willison demonstrated another early application of AI video scraping on his blog, where he took a seven-second video of the books on his bookshelves, then got Gemini 1.5 Pro to extract all of the book titles it saw in the video and put them in a structured, or organized, list.
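To make the "structured list" step concrete, here is a minimal Python sketch of what might happen after the model replies. It is not Willison's code and it calls no real model API. It assumes, purely for illustration, that the model was asked to answer with a JSON array of objects that each have a "title" field. The function name `parse_book_titles` and the sample reply are hypothetical.

```python
import json


def parse_book_titles(model_reply: str) -> list[str]:
    """Turn a model's JSON reply into a clean, de-duplicated list of titles.

    Assumes (hypothetically) that the reply is a JSON array of objects
    with a "title" key, e.g. '[{"title": "Dune"}]'.
    """
    records = json.loads(model_reply)
    seen = set()
    titles = []
    for record in records:
        title = str(record.get("title", "")).strip()
        key = title.casefold()  # compare titles case-insensitively
        # Skip blanks and repeats: the same spine can show up in several frames.
        if title and key not in seen:
            seen.add(key)
            titles.append(title)
    return titles


# Hypothetical reply text, standing in for real model output.
sample_reply = '[{"title": "Dune"}, {"title": "dune "}, {"title": "Neuromancer"}, {"title": ""}]'
print(parse_book_titles(sample_reply))
```

De-duplicating matters because a panning video can show the same book in many frames, so a model may report it more than once.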

Converting unstructured data into structured data is important to Willison, because he’s also a data journalist. Willison has created tools for data journalists in the past, such as the Datasette project, which lets anyone publish data as an interactive website.

To every data journalist’s frustration, some sources of data prove resistant to scraping (capturing data for analysis) due to how the data is formatted, stored, or presented. In these cases, Willison delights in the potential for AI video scraping because it bypasses these traditional barriers to data extraction.
