Seoul National University

12/12/2025 | Press release | Distributed by Public on 12/12/2025 06:09

A Crash Course on How to Improve AI Image Recognition


Seoul National University's Artificial Intelligence Institute (SNU AIIS) has been diligently working towards its goal of maximising AI impact in academic, industrial, and social settings. Since its founding in 2019, the institute has hosted the AIIS Colloquium Series as part of its education program. The Colloquium Series features lectures by SNU professors on the application of AI across diverse fields and is open to all. On December 4, AIIS presented the second installment of 2025's Autumn Colloquium Series, titled "Multimodal Image and Video Understanding in Various Applications."


Professor Lee presenting "Multimodal Image and Video Understanding in Various Applications"

The lecture was led by Professor Lee Joonseok from the Graduate School of Data Science and focused on three key areas of his image recognition AI research: referring image segmentation, video summarization, and image editing. Referring image segmentation is the process of identifying and segmenting a specific object in an image based on a text description. He explained that while AI models excel at targeting unique objects, they struggle significantly with tasks that require distinguishing between multiple similar objects. On exercises requiring logical reasoning, models failed more than half the time, compared to an 80% accuracy rate on easier cases; averaged together, these yield a misleading overall accuracy of about 60% for current models.

In order to tackle this problem, Professor Lee and his team developed a strategy around data augmentation. By placing four images together (one target and three dummies), the team created a training exercise that forces AI models to learn complex textual cues to differentiate between objects. They additionally used CLIP, a model that understands image-text similarity, to ensure the dummy images were very similar to the target, making the exercise truly challenging for the AI models. This significantly improved performance on difficult tasks without requiring manual labelling of thousands of hard examples, reducing the time and labor required.
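The press release does not include the team's code, but the dummy-selection idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical `pick_hard_dummies` helper and mock embedding vectors standing in for real CLIP image features; the actual pipeline would compute embeddings with a CLIP model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def pick_hard_dummies(target_emb, candidate_embs, k=3):
    """Select the k candidates most similar to the target, so the
    four-image composite forces the model to rely on textual cues
    rather than coarse visual differences."""
    ranked = sorted(candidate_embs.items(),
                    key=lambda item: cosine_similarity(target_emb, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Mock embeddings (illustrative only, not real CLIP outputs).
target = [1.0, 0.0, 0.0]
candidates = {
    "dog_brown": [0.9, 0.1, 0.0],   # very similar -> hard dummy
    "dog_black": [0.8, 0.2, 0.1],
    "cat_white": [0.1, 0.9, 0.2],
    "car_red":   [0.0, 0.1, 1.0],   # dissimilar -> too easy, skipped
}
print(pick_hard_dummies(target, candidates))
```

Ranking candidates by similarity to the target is what makes the composite "truly challenging": visually dissimilar dummies would let the model succeed without reading the text description at all.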


Professor Lee speaking on data augmentation and image segmentation

Professor Lee's second area of research focused on the challenges of condensing long videos into short summaries or highlights. His main critique was twofold: the standard datasets are small, averaging about 25 videos, and what constitutes a good summary is inherently subjective. As a result, researchers overfit their models to the test sets and reported high scores, even though the models performed no better than random guessing when tested properly.

Professor Lee's solution to this issue was Mr. HiSum, a massive dataset scraped from 30,000 YouTube videos that uses the site's 'Most Replayed' feature as a proxy for each video's most important moments. The team applied diffusion models to the dataset to generate importance scores for each frame, which allowed the model to learn a probability distribution and generate multiple valid summaries for the same video.
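The core idea of turning replay statistics into supervision can be sketched in a few lines. This is an illustrative simplification with hypothetical helper names (`importance_scores`, `top_segments`), not the Mr. HiSum pipeline itself: it min-max normalizes a raw replay curve into per-segment importance scores and reads off the most-replayed moments.

```python
def importance_scores(replay_counts):
    """Min-max normalize a raw replay-count curve to [0, 1]
    importance scores, one per video segment."""
    lo, hi = min(replay_counts), max(replay_counts)
    if hi == lo:
        return [0.0] * len(replay_counts)
    return [(c - lo) / (hi - lo) for c in replay_counts]

def top_segments(replay_counts, k=2):
    """Indices of the k most-replayed segments, used as a proxy
    for the video's most important moments."""
    scores = importance_scores(replay_counts)
    return sorted(range(len(scores)), key=lambda i: scores[i],
                  reverse=True)[:k]

replay = [120, 340, 980, 860, 150, 90]   # replays per segment
print(importance_scores(replay))
print(top_segments(replay))              # the two peak segments
```

Because the scores form a distribution over segments rather than a single "correct" summary, a model trained on them can produce several different, equally valid highlight reels for the same video.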


Professor Lee explaining the difficulties of video summarization

Finally, Professor Lee addressed image editing using text prompts, with particular emphasis on keeping the subject's identity intact. The primary challenge AI models currently face is mixing features of different subjects when editing an image with several specific concepts. For example, when a user requests an edit involving "my dog" and "my toy", the dog may take on the color of the toy, or the toy may disappear entirely. Another problem is that fine-tuning a model on a few pictures of one specific subject often makes the model forget how to generate other things properly.

To solve this, the team used the original, pre-trained model as a 'teacher' to ensure the new model being trained did not deviate too much from the original knowledge base. By forcing the internal attention maps of the AI model to remain consistent with the teacher's, the team achieved 'Multi-Concept Customization', which successfully places two specific, user-defined objects in a new scene without their features bleeding into each other.
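The teacher-guided training objective described above can be sketched as a combined loss. This is a toy illustration under assumed names (`mse`, `total_loss`, the weight `lam`), not the team's actual formulation: the fine-tuning loss is augmented with a penalty that pulls the student's attention maps toward the frozen teacher's.

```python
def mse(a, b):
    """Mean squared error between two flattened attention maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(task_loss, student_attn, teacher_attn, lam=0.5):
    """Combined objective: the customization task loss plus a
    consistency penalty that keeps the fine-tuned model's attention
    close to the frozen teacher's, limiting knowledge drift."""
    return task_loss + lam * mse(student_attn, teacher_attn)

teacher = [0.7, 0.2, 0.1]    # frozen pre-trained model's attention
student = [0.6, 0.3, 0.1]    # fine-tuned model's attention
print(total_loss(0.25, student, teacher))
```

The weight `lam` trades off fidelity to the new subjects against retention of the original model's general generation ability; too small and the model forgets, too large and it never learns the user's concepts.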


Professor Lee delivering his talk to the audience

Wrapping up his lecture, Professor Lee urged students to attempt to reproduce his findings and use any potential failures as research opportunities. As a former researcher at Google, he acknowledged the resource gap between industry and academia. Accordingly, he advised students to focus on novelty and efficiency that may be overlooked in a profit-driven setting, rather than attempting to compete with industry giants in terms of scale.

The Autumn Colloquium Series ends on December 18 with a lecture on "The New Paradigm and Developmental Direction of Self-Driving" by Professor Choi Jun Won of the Department of Electrical and Computer Engineering. The SNU AIIS Colloquium Series will be back in 2026, so all are encouraged to attend these enlightening talks.

Written by Lee Eusun, SNU English Editor, [email protected]

Seoul National University published this content on December 12, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on December 12, 2025 at 12:09 UTC.