Gen Li

Hi! I am currently a third-year PhD student at the School of Informatics, University of Edinburgh, fully funded by a CDT-RAS scholarship. My supervisors are Dr. Laura Sevilla-Lara and Prof. Timothy Hospedales, and I also work closely with Dr. Varun Jampani and Dr. Deqing Sun at Google Research.

My research interests lie in few-shot learning, learning under limited supervision, and multi-modal deep learning. Currently, I am working on visual affordance and object functionality understanding.

I am actively seeking a research intern position this year! Please feel free to contact me if you have any available positions or potential collaborations.

news

Mar 28, 2024	The work “OOAL” has been accepted to CVPR 2024!
Jul 24, 2023	Started as a research intern at Huawei, Noah’s Ark Lab, London.
Feb 28, 2023	Our paper “LOCATE” has been accepted to CVPR 2023!
Oct 20, 2021	Our paper “SuperstyleNet” has been accepted to BMVC 2021.
Sep 1, 2021	Joined the CDT-RAS PhD program at the University of Edinburgh.

selected publications

CVPR’24


One-Shot Open Affordance Learning with Foundation Models

Gen Li, Deqing Sun, Laura Sevilla-Lara, and Varun Jampani

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

arXiv Website

We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category but is expected to identify novel objects and affordances. While vision-language models excel at recognizing novel objects and scenes, they often struggle to understand finer levels of granularity such as affordances. To address this, we conduct a comprehensive analysis of existing foundation models to explore their inherent understanding of affordances and to assess the potential for data-limited affordance learning. We then propose a vision-language framework with simple yet effective designs that boost the alignment between visual features and affordance text embeddings. Experiments on two affordance segmentation benchmarks show that the proposed method outperforms state-of-the-art models with less than 1% of the full training data and exhibits reasonable generalization to unseen objects and affordances.

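The abstract does not spell out the architecture, so the following is only a rough, hypothetical sketch of what "alignment between visual features and affordance text embeddings" can mean at inference time, not the OOAL implementation: each pixel is labeled with the affordance whose text embedding has the highest cosine similarity to its visual feature, a common CLIP-style dense classification step. The toy feature vectors, the `temperature` value, and all function names are assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def affordance_map(pixel_feats, text_embs, temperature=0.07):
    """Label each pixel with the affordance whose text embedding it matches best.

    pixel_feats: list of D-dim visual features, one per pixel (hypothetical input).
    text_embs: dict of affordance name -> D-dim text embedding (hypothetical input).
    Returns one (label, probability) pair per pixel.
    """
    names = list(text_embs)
    result = []
    for f in pixel_feats:
        # Temperature-scaled cosine similarities act as per-pixel logits.
        logits = [cosine(f, text_embs[n]) / temperature for n in names]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=lambda i: probs[i])
        result.append((names[best], probs[best]))
    return result


# Toy 3-D "embeddings", purely illustrative.
text = {"grasp": [1.0, 0.0, 0.0], "cut": [0.0, 1.0, 0.0]}
pixels = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
print([label for label, _ in affordance_map(pixels, text)])  # ['grasp', 'cut']
```

Because the text side is just a set of embeddings, adding a novel affordance in this sketch only means adding another entry to `text_embs`; no per-class weights need to be trained.
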
CVPR’23


LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

Gen Li, Varun Jampani, Deqing Sun, and Laura Sevilla-Lara

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

arXiv Code Website

Humans excel at acquiring knowledge through observation. For example, we can learn to use new tools by watching demonstrations. This skill is fundamental for intelligent systems to interact with the world. A key step in acquiring this skill is to identify which part of the object affords each action, a task called affordance grounding. In this paper, we address this problem and propose a framework called LOCATE that identifies matching object parts across images to transfer knowledge from images where an object is being used (exocentric images, used for learning) to images where the object is inactive (egocentric images, used for testing). To this end, we first find interaction areas and extract their feature embeddings. Then we learn to aggregate the embeddings into compact prototypes (human, object part, and background) and select the one representing the object part. Finally, we use the selected prototype to guide affordance grounding. We do this in a weakly supervised manner, learning only from image-level affordance and object labels. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods by a large margin on both seen and unseen objects.

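As a loose illustration of the prototype step described above (not the authors' code), the sketch below clusters feature embeddings from interaction areas into three prototypes with plain k-means and keeps the one closest, by cosine similarity, to a reference object feature. Both the use of plain k-means and the selection rule based on a hypothetical `object_ref` vector are assumptions made for this example; the abstract does not say how the object-part prototype is chosen.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points, k, iters=20):
    """Plain k-means, initialised with the first k points so results are deterministic."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers


def select_part_prototype(interaction_feats, object_ref, k=3):
    """Cluster interaction-area features into k prototypes (meant to capture
    human, object part, and background) and return the one most similar to
    object_ref. The selection rule is a hypothetical stand-in."""
    prototypes = kmeans(interaction_feats, k)
    return max(prototypes, key=lambda p: cosine(p, object_ref))


# Toy features: two "human", two "object part", two "background" vectors.
feats = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.9, 0.1, 0], [0.1, 0.9, 0], [0, 0.1, 0.9]]
print(select_part_prototype(feats, object_ref=[0, 1, 0.1]))  # [0.05, 0.95, 0.0]
```

Compressing many embeddings into a handful of prototypes, as the abstract describes, means the later guidance step only has to compare against one compact vector rather than every interaction-area feature.
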
CVPR’21


Adaptive Prototype Learning and Allocation for Few-Shot Segmentation

Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, and Joongkyu Kim

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

arXiv Code Website

Prototype learning is extensively used for few-shot segmentation. Typically, a single prototype is obtained from the support feature by averaging the global object information. However, using one prototype to represent all the information may lead to ambiguities. In this paper, we propose two novel modules, named superpixel-guided clustering (SGC) and guided prototype allocation (GPA), for multiple prototype extraction and allocation. Specifically, SGC is a parameter-free and training-free approach that extracts more representative prototypes by aggregating similar feature vectors, while GPA selects matched prototypes to provide more accurate guidance. By integrating SGC and GPA, we propose the Adaptive Superpixel-guided Network (ASGNet), a lightweight model that adapts to variations in object scale and shape. In addition, our network generalizes easily to k-shot segmentation, with substantial improvement and no additional computational cost. In particular, our evaluations on COCO demonstrate that ASGNet surpasses the state-of-the-art method by 5% in 5-shot segmentation.
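To make the two modules more concrete, here is a simplified, hypothetical sketch, not the ASGNet code. Masked support features are grouped into several prototypes with plain k-means (a stand-in for SGC, which the abstract describes only as aggregating similar feature vectors), and each query location is then allocated the prototype with the highest cosine similarity, in the spirit of GPA. The rule tying the number of prototypes to the foreground area (`pixels_per_proto`, `max_protos`) is an assumption added only to illustrate adapting to object scale.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def extract_prototypes(support_feats, mask, pixels_per_proto=2, max_protos=5, iters=10):
    """Cluster masked support features into several prototypes (k-means stand-in for SGC).

    The prototype count grows with foreground area; the rule used here
    (one per `pixels_per_proto` foreground pixels, capped at `max_protos`)
    is hypothetical.
    """
    fg = [f for f, m in zip(support_feats, mask) if m]
    if not fg:
        return []
    k = max(1, min(max_protos, len(fg) // pixels_per_proto))
    centers = [list(f) for f in fg[:k]]  # deterministic initialisation
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in fg:
            groups[min(range(k), key=lambda c: dist2(f, centers[c]))].append(f)
        # Recompute each center as its cluster mean; empty clusters keep the old center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def allocate_prototypes(query_feats, prototypes):
    """For each query location, return the index of the most similar prototype."""
    return [
        max(range(len(prototypes)), key=lambda i: cosine(q, prototypes[i]))
        for q in query_feats
    ]


support = [[1, 0], [0, 1], [0.9, 0.1], [0.1, 0.9], [5, 5]]
mask = [1, 1, 1, 1, 0]  # the last location is background and is ignored
protos = extract_prototypes(support, mask)
print(allocate_prototypes([[0.8, 0.2], [0.3, 0.7]], protos))  # [0, 1]
```

Because the clustering in this sketch has no learned weights, the number of prototypes can change from one support image to the next without retraining, which loosely mirrors the "parameter-free and training-free" description of SGC.
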