Evaluating VLM Problem-Solving Capabilities in Point-and-Click Puzzle Games
This post outlines the findings from my recent paper, “Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?”, co-authored with Maximilian Triebel and Dominik Helfenstein. The study investigates whether current Vision-Language Models (VLMs) can replicate the physical reasoning and precise interactions required in complex puzzle environments.
Overview
To properly assess these capabilities, we introduced VLATIM (Vision-Language Against The Incredible Machine). This novel benchmark is built around the classic physics puzzle game The Incredible Machine 2.
A Critical Review of Visiomotor Policies and VLA Models in Robotic Manipulation
This post summarizes a recent review I authored, titled “Visiomotor Policies, Vision-Language-Action Models, World Models for Robotic Manipulation: A Review”. The paper provides a comprehensive analysis of the methodological landscape in robotic learning, specifically contrasting specialized task-specific policies with general-purpose Vision-Language-Action (VLA) models.
Overview
The current paradigm of robotic learning is fragmented. On one side, specialized policies offer high robustness and precision in controlled environments but struggle with novel instructions or unseen visual scenes.
Approaches for the Efficient Deployment of Large Codebases for LLMs
This post summarizes key insights from a recent academic paper I co-authored, which investigates methods for handling large codebases using Large Language Models (LLMs). The original paper was written in German and focuses on the technical and structural challenges of applying LLMs in software engineering contexts, especially when working with large repositories.
Overview
The paper, titled “Ansätze zur effizienten Bereitstellung großer Codebasen für Large Language Models”, presents a structured review of current approaches in the field.