In InterLinK, we envision a novel AI-based framework that empowers smart agents and robotic systems to visually recognize and anticipate the high-level semantics of Human-Object(s) Interactions (HOI), building on deep neural networks, knowledge graphs and visual reasoning. The project pursues the following objectives:
The development of a novel, semantically enriched, hierarchical representation of HOI comprising three main components: (i) graph-based modelling of the semantics, appearance and dynamics of the human body, hand(s) and object(s) at multiple levels of abstraction, adaptive to the scale of the available observations, (ii) learning the temporal semantic structure of HOI as spatio-temporal scene graphs, and (iii) reasoning about action-object relationships using semantic information based on knowledge graphs (a minimal data-structure sketch follows this list of objectives).
The acquisition of a new dataset of visual and motion data demonstrating fine-grained HOI during activities of daily living (ADL) using household objects, captured at multiple observation scales. Rich annotations will be provided with respect to the semantics of HOI at multiple levels of abstraction, together with 2D and 3D ground-truth data on the geometry and pose of humans, hands and objects during manipulation tasks.
The development of a novel visual-semantic method for the recognition of fine-grained HOI and reasoning over long videos using Deep Learning (DL) and Knowledge Graphs (KGs).
The development of a novel vision-based method for short- and long-term prediction and anticipation of fine-grained HOI in long videos using DL and high-level semantics based on KGs.
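To make the representation described in objective (i)-(iii) above more concrete, the sketch below illustrates one possible way to organize per-frame scene graphs that carry semantic, appearance and dynamics attributes of hands and objects. It is a minimal illustration only; all class and field names are our own assumptions, not the project's actual implementation.

```python
# A minimal sketch of a spatio-temporal scene graph for HOI, assuming
# per-frame entity detections; all names here are illustrative, not
# the project's actual implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    entity: str        # e.g. "right_hand", "cup"
    semantics: str     # semantic role, e.g. "agent", "object"
    appearance: list   # appearance feature vector
    dynamics: list     # motion feature vector


@dataclass
class FrameGraph:
    t: int                                     # frame index
    nodes: dict = field(default_factory=dict)  # entity name -> Node
    edges: list = field(default_factory=list)  # (src, relation, dst) triples

    def add_relation(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))


# A spatio-temporal scene graph is then the ordered sequence of frame
# graphs, with temporal links implied by shared entity names across frames.
g0 = FrameGraph(t=0)
g0.nodes["right_hand"] = Node("right_hand", "agent", [0.1], [0.0])
g0.nodes["cup"] = Node("cup", "object", [0.7], [0.0])
g0.add_relation("right_hand", "reaches_for", "cup")
video_graph = [g0]  # extend with g1, g2, ... as new observations arrive
```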
The research project is supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the
“3rd Call for H.F.R.I. Research Projects to support Post-Doctoral Researchers” (Project Number: 07678).
2023 – 2025
In collaboration with the Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH),
Computational Vision and Robotics Laboratory (CVRL) and the Information Systems Laboratory (ISL).
Konstantinos Papoutsakis is a postdoctoral researcher at the Department of Management Science and Technology of the Hellenic Mediterranean University (HMU) in Crete, and is also affiliated with the Computational Vision and Robotics Laboratory of the Institute of Computer Science at FORTH. He earned his Diploma in Computer Engineering and Informatics from the University of Patras, Greece, in 2007, and his M.Sc. and Ph.D. degrees in Computer Science from the University of Crete in 2010 and 2019, respectively. His main research interests fall in the areas of computer vision, machine learning and visual perception for robotics, with emphasis on human motion analysis, video segmentation, action segmentation and recognition, and human-computer and human-robot interaction. He has been actively involved in EU-funded projects as a Computer Vision Engineer and during his postgraduate studies. For more information: http://users.ics.forth.gr/~papoutsa/
Filippos Gouidis is a Ph.D. candidate at the University of Crete. He received a Diploma in Civil Engineering from the Aristotle University of Thessaloniki and a B.Sc. in Computer Science from the University of Crete, where he also earned M.Sc. degrees in Computer Science and Engineering and in Cognitive Sciences. His research interests focus on image classification and detection, knowledge representation and zero-shot learning.
Victoria Manousaki is a Postdoctoral Researcher at the Department of Management Science and Technology of HMU, also collaborating with the Computational Vision and Robotics Laboratory (CVRL) at FORTH. She received her Ph.D. from the Computer Science Department of the University of Crete. Her research interests lie in the topics of action/activity prediction, anticipation and recognition in human-object interactions.
Constantinos Panagiotakis is an Associate Professor and Head of the DMST at HMU, and Director of DataLab, which hosts the InterLinK project. His research interests span the areas of image and video analysis, data modelling, machine learning, 3D animation, signal processing, multimedia and pattern recognition. He has published more than 90 articles in international conferences and journals.
Dimitris Plexousakis is a Professor at the Computer Science Department (CSD) of the University of Crete (UoC), Director of FORTH-ICS and Head of the Information Systems Laboratory at FORTH-ICS. His research interests span the areas of Knowledge Representation, Knowledge Base Design, formal reasoning systems, the Semantic Web and more. He has published over 180 articles in international conferences and journals and has extensive experience in the scientific coordination of national and European research projects.
Theodore Patkos is a Principal Researcher (Grade B) at FORTH-ICS. His research activities revolve around the fields of knowledge representation and non-monotonic reasoning, contextual and common-sense reasoning, multi-agent and cognitive systems, argumentation and formal representation models for the Semantic Web. He has served as PI and as a researcher in national and European projects, has co-authored more than 55 scientific papers in peer-reviewed conferences and journals, including IJCAI, KR, TPLP, AAMAS and LPAR.
Presentation of the InterLinK project at the International Young Scientists Conference (YSC 2023) for young researchers and professionals in computational science, Artificial Intelligence, Big Data and Machine Learning, organized by ITMO University, Abu Dhabi, October 2023.
Link: https://github.com/itmo-ai/YSC-2023-Papers
V. Manousaki*, K. Bacharidis*, K. Papoutsakis and A. Argyros, "VLMAH: Visual-Linguistic Modeling of Action History for Effective Action Anticipation", In IEEE International Conference on Computer Vision Workshops (ACVR 2023), Paris, France, October 2023. (*Equal Contribution)
Abstract: Although existing methods for action anticipation have shown considerably improved performance on the predictability of future events in videos, the way they exploit information related to past actions is constrained by time duration and encoding complexity. This paper addresses the task of action anticipation by taking into consideration the history of all executed actions throughout long, procedural activities. A novel approach, termed Visual-Linguistic Modeling of Action History (VLMAH), is proposed that fuses the immediate past in the form of visual features as well as the distant past based on a cost-effective form of linguistic constructs (semantic labels of nouns, verbs, or actions). Our approach generates accurate near-future action predictions during procedural activities by leveraging information on the long- and short-term past. Extensive experimental evaluation was conducted on three challenging video datasets containing procedural activities, namely Meccano, Assembly-101 and 50Salads. The results confirm that using long-term action history improves action anticipation and enhances the state-of-the-art (SOTA) Top-1 accuracy.
Paper: Read online
Code: Visit Site
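The core fusion idea of VLMAH, combining short-term visual features with a linguistic encoding of the long-term action history, can be sketched roughly as follows. This is a minimal illustration under assumed feature dimensions and module names, not the authors' released implementation.

```python
# A minimal sketch of visual-linguistic fusion for action anticipation,
# in the spirit of VLMAH; dimensions and module names are assumptions.
import torch
import torch.nn as nn


class VisualLinguisticFusion(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=300, hid_dim=256, n_actions=50):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hid_dim)    # recent visual features
        self.txt_proj = nn.Linear(txt_dim, hid_dim)    # embedded action history
        self.head = nn.Linear(2 * hid_dim, n_actions)  # next-action scores

    def forward(self, vis_feat, hist_feat):
        # Fuse short-term visual evidence with long-term linguistic history.
        fused = torch.cat([self.vis_proj(vis_feat).relu(),
                           self.txt_proj(hist_feat).relu()], dim=-1)
        return self.head(fused)


model = VisualLinguisticFusion()
vis = torch.randn(1, 512)    # e.g. pooled clip features of the immediate past
hist = torch.randn(1, 300)   # e.g. averaged word vectors of past action labels
next_action_logits = model(vis, hist)
```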
Filippos Gouidis, Konstantinos Papoutsakis, Theodore Patkos, Antonis Argyros and Dimitris Plexousakis, "Exploring the Impact of Knowledge Graphs on Zero-Shot Visual Object State Classification", In International Conference on Computer Vision Theory and Applications (VISAPP 2024), Rome, Italy.
Abstract: In this work, we explore the potential of Knowledge Graphs (KGs) towards an effective Zero-Shot Learning (ZSL) approach for Object State Classification (OSC) in images. For this problem, the performance of traditional supervised learning methods is hindered mainly by data scarcity, as they attempt to encode the highly varying visual features of a multitude of combinations of object state and object type classes (e.g. open bottle, folded newspaper). The ZSL paradigm indicates a promising alternative to enable the classification of object state classes by leveraging structured semantic descriptions acquired from external commonsense knowledge sources. We formulate an effective ZS-OSC scheme by employing a Transformer-based Graph Neural Network model and a pre-trained CNN classifier. We also investigate best practices for both the construction and integration of visually-grounded common-sense information based on KGs. An extensive experimental evaluation is reported using 4 related image datasets, 5 different knowledge repositories and 30 KGs that are constructed semi-automatically by querying known object state classes to retrieve contextual information at different node depths. The performance of vision-language models for ZS-OSC is also assessed. Overall, the obtained results suggest performance improvements for ZS-OSC models on all datasets, while both the size of a KG and the sources utilized for its construction are important for task performance.
Paper: Read online
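The central mechanism explored here, a graph network that maps semantic KG node embeddings to visual classifier weights so that unseen state classes can be scored, can be sketched as follows. Dimensions, the identity adjacency and all names are illustrative assumptions; the paper employs a Transformer-based Graph Neural Network rather than this simplified propagation.

```python
# A minimal sketch of the KG-driven zero-shot idea: a graph network maps
# semantic node embeddings to visual classifier weights; names/dims assumed.
import torch
import torch.nn as nn


class GraphProjector(nn.Module):
    """Two-step neighborhood propagation that projects KG node embeddings
    (e.g. word vectors of state classes) into CNN feature space."""
    def __init__(self, sem_dim=300, vis_dim=512):
        super().__init__()
        self.fc1 = nn.Linear(sem_dim, 512)
        self.fc2 = nn.Linear(512, vis_dim)

    def forward(self, x, adj):
        x = torch.relu(adj @ self.fc1(x))  # propagate over KG edges
        return adj @ self.fc2(x)           # predicted classifier weights


n_nodes, sem_dim = 30, 300
emb = torch.randn(n_nodes, sem_dim)   # semantic embedding per KG node
adj = torch.eye(n_nodes)              # normalized adjacency (here: self-loops)
weights = GraphProjector()(emb, adj)  # one weight vector per class node

img_feat = torch.randn(1, 512)        # feature from a pre-trained CNN
scores = img_feat @ weights.T         # zero-shot state-class scores
```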
F. Gouidis, K. Papantoniou, K. Papoutsakis, T. Patkos, A.A. Argyros and D. Plexousakis, "Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification", In AAAI 2024 Spring Symposium on Empowering Machine Learning and Large Language Models with Domain and Commonsense Knowledge, (AAAI-MAKE), Stanford University, USA, March 2024.
Abstract: Domain-specific knowledge can significantly contribute to addressing a wide variety of vision tasks. However, the generation of such knowledge entails considerable human labor and time costs. This study investigates the potential of Large Language Models (LLMs) in generating and providing domain-specific information through semantic embeddings. To achieve this, an LLM is integrated into a pipeline that utilizes Knowledge Graphs and pre-trained semantic vectors in the context of the vision-based Zero-Shot Object State Classification task. We thoroughly examine the behavior of the LLM through an extensive ablation study. Our findings reveal that the integration of LLM-based embeddings, in combination with general-purpose pre-trained embeddings, leads to substantial performance improvements. Drawing insights from this ablation study, we conduct a comparative analysis against competing models, thereby highlighting the state-of-the-art performance achieved by the proposed approach.
Paper: Read online
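A rough sketch of the paper's key ingredient, combining LLM-derived domain-specific embeddings with general-purpose pre-trained embeddings into KG node features, is given below; shapes and names are assumptions for illustration only.

```python
# A minimal sketch of combining LLM-derived and general-purpose embeddings
# as KG node features, as the ablations suggest; all names are hypothetical.
import torch


def fuse_node_features(llm_emb: torch.Tensor, glove_emb: torch.Tensor) -> torch.Tensor:
    """Concatenate domain-specific (LLM) and general-purpose (e.g. GloVe)
    semantic vectors into a single node feature for the downstream GNN."""
    return torch.cat([llm_emb, glove_emb], dim=-1)


llm = torch.randn(30, 768)    # e.g. sentence embeddings of LLM descriptions
glove = torch.randn(30, 300)  # e.g. GloVe vectors of the class names
node_features = fuse_node_features(llm, glove)  # shape: (30, 1068)
```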
Victoria Manousaki*, Konstantinos Bacharidis*, Filippos Gouidis*, Konstantinos Papoutsakis, Dimitris Plexousakis and Antonis Argyros, "Anticipating Object State Changes", (Under Review) (*Equal Contribution)
Abstract: Anticipating object state changes in images and videos is a challenging problem whose solution has important implications in vision-based scene understanding, automated monitoring systems, and action planning. In this work, we propose the first method for solving this problem. The proposed method predicts object state changes that will occur in the near future as a result of yet unseen human actions. To address this new problem, we propose a novel framework that integrates learnt visual features, which represent recent visual information, with natural language (NLP) features that represent past object state changes and actions. Leveraging the extensive and challenging Ego4D dataset, which provides a large-scale collection of first-person perspective videos across numerous interaction scenarios, we introduce new curated annotation data for the object state change anticipation task. An extensive experimental evaluation demonstrates the efficacy of the proposed method in predicting object state changes in dynamic scenarios. The proposed work underscores the potential of integrating video and linguistic cues to enhance the predictive performance of video understanding systems and lays the groundwork for future research on the new task of object state change anticipation.
Paper: Read online
Code & Dataset: Visit Site
F. Gouidis, K. Papantoniou, K. Papoutsakis, T. Patkos, A. Argyros and D. Plexousakis, "Enabling Visual Intelligence by Leveraging Visual Object States in a Neurosymbolic Framework", In Australasian Joint Conference on Artificial Intelligence (AJCAI 2024), Melbourne, Australia, November 2024 (to appear).
Abstract: This paper investigates the potential of integrating visual object states for developing methods that address complex visual intelligence tasks, such as Long-Term Action Anticipation (LTAA), and proposes that this should be achieved with the aid of a Neurosymbolic (NeSy) framework. We consider that this approach could offer significant advancements in applications requiring nuanced understanding and anticipation of future scenarios, and could serve as an inspiration for the further development of NeSy methods exhibiting Visual Intelligence.
Paper: Read online
F. Gouidis, K. Papantoniou, K. Papoutsakis, T. Patkos, A.A. Argyros and D. Plexousakis, "LLM-aided Knowledge Graph Construction for Zero-Shot Visual Object State Classification", In IEEE International Conference on Pattern Recognition Systems (ICPRS), University of Westminster, London, UK, July 2024.
Abstract: The problem of classifying the states of objects using visual information holds great importance in both applied and theoretical contexts. This work focuses on the special case of Zero-Shot Object-Agnostic State Classification (ZS-OaSC). To tackle this problem, we introduce an innovative strategy that capitalizes on the capabilities of Graph Neural Networks to learn to project semantic embeddings into visual space, and on the potential of Large Language Models (LLMs) to provide rich content for constructing Knowledge Graphs (KGs). Through a comprehensive ablation study, we explore the synergies between LLMs and KGs, uncovering critical insights about their integration in the context of the ZS-OaSC problem. Our proposed methodology is rigorously evaluated against current state-of-the-art (SoA) methods, demonstrating superior performance on various image datasets.
Paper: Read online
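The LLM-aided KG construction step can be pictured roughly as below. The query_llm function is a hypothetical stand-in for an actual LLM prompt returning commonsense triples, and the triples it returns here are dummy placeholders, not real model output.

```python
# A minimal sketch of LLM-aided KG construction: query an LLM for triples
# about each state class and add them to a graph. `query_llm` is a
# hypothetical stand-in for a real chat-completion call.
import networkx as nx


def query_llm(state_class: str) -> list[tuple[str, str, str]]:
    # Placeholder: in practice, prompt an LLM, e.g.
    # "List commonsense relations of the object state '<state_class>' as triples."
    return [(state_class, "related_to", "container"),
            (state_class, "caused_by", "open action")]


def build_kg(state_classes: list[str]) -> nx.DiGraph:
    kg = nx.DiGraph()
    for cls in state_classes:
        for head, rel, tail in query_llm(cls):
            kg.add_edge(head, tail, relation=rel)  # one edge per triple
    return kg


kg = build_kg(["open", "closed", "folded"])
print(kg.number_of_nodes(), kg.number_of_edges())
```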