Leader: Alessandra Sorrentino (UNIFI); Other collaborator(s):
Intelligent and adaptable multi-modal behaviors will be designed and developed to improve HCI/HRI, thus providing advanced human-like interaction and communication. The data perceived from the scene/context will be merged with the user profile to increase the level of understanding, to measure parameters and to plan tailored machine reactions. In particular, i) social cues (e.g. gaze, emotion, body pose and gestures, voice tone) extracted from vision sensors and Natural Language Processing (NLP) algorithms will be merged at different levels to generate models of interaction. Then, ii) such models will be used by advanced AI-based reasoning strategies to endow the machine with advanced social capabilities, adapting its behaviors to the humans and enhancing the social-task engagement.
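As an illustration of this data flow only, the sketch below (hypothetical Python; the cue fields, profile fields and decision rule are assumptions, not the actual reasoning module) shows how perceived social cues could be merged with a user profile to plan a tailored machine reaction.

```python
# Hypothetical sketch: merging perceived social cues with a user profile
# to select a tailored robot reaction. All names and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class SocialCues:          # features extracted from the vision/NLP pipelines
    gaze_on_robot: float   # fraction of time the user looks at the robot [0, 1]
    valence: float         # estimated emotional valence [-1, 1]
    speech_rate: float     # words per second from the NLP module

@dataclass
class UserProfile:         # static or slowly changing information about the user
    prefers_verbose_feedback: bool
    baseline_valence: float

def plan_reaction(cues: SocialCues, profile: UserProfile) -> str:
    """Toy decision rule combining perceived cues with the user profile."""
    engagement = 0.6 * cues.gaze_on_robot + 0.4 * max(0.0, cues.valence - profile.baseline_valence)
    if engagement < 0.3:
        return "re-engage"        # e.g., address the user by name, propose a break
    if profile.prefers_verbose_feedback:
        return "verbal-feedback"  # detailed spoken explanation plus gesture
    return "gesture-feedback"     # short utterance accompanied by a gesture
```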
Brief description of the activities and of the intermediate results:
The activities of Task 2.1 followed two main, interconnected directions. On the one hand, we focused on the identification of the requirements of an innovative co-speech gesture generation model which, once integrated on a robotic platform, could foster the social engagement of the end-user. Namely, we reviewed the current literature on the topic, identified the main limitations and designed the most appropriate solution. We developed a GAN-based co-speech gesture generation model, which allows the robot to directly learn the appropriate association between the gesture and the content (what) and quality (how) of speech, simultaneously. We integrated the proposed model on a humanoid robot (i.e. the Pepper robot) and evaluated it quantitatively and qualitatively. In the first case, we adopted statistical metrics to compare the quality and accuracy of the generated gestures with respect to the target ones. To qualitatively evaluate the model, we recruited a cohort of individuals who interacted with the robot in an ad-hoc experimental session, completing a questionnaire at the end. The results show that the proposed model is accurate and appreciated by the human counterpart, even though fine-tuning of gesture synchronisation and velocity requires additional effort.
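For illustration, the following PyTorch sketch outlines a conditional generator/discriminator pair of the kind described above; the layer sizes, feature dimensions and conditioning signals are placeholder assumptions and do not reproduce the model actually developed in the task.

```python
# Minimal PyTorch sketch of a conditional GAN for co-speech gesture generation.
# Tensor shapes and feature definitions are illustrative assumptions.
import torch
import torch.nn as nn

class GestureGenerator(nn.Module):
    """Maps speech content ("what") and prosody ("how") features to a pose sequence."""
    def __init__(self, text_dim=300, audio_dim=64, noise_dim=16, pose_dim=30, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(text_dim + audio_dim + noise_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)  # joint configuration per frame

    def forward(self, text_feat, audio_feat, noise):
        x = torch.cat([text_feat, audio_feat, noise], dim=-1)  # (B, T, *)
        h, _ = self.rnn(x)
        return self.head(h)                                    # (B, T, pose_dim)

class GestureDiscriminator(nn.Module):
    """Scores how realistic a pose sequence is, given the same speech conditioning."""
    def __init__(self, text_dim=300, audio_dim=64, pose_dim=30, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(text_dim + audio_dim + pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, text_feat, audio_feat, poses):
        x = torch.cat([text_feat, audio_feat, poses], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h[:, -1])  # one realism score per sequence

# Toy forward pass with random conditioning features.
B, T = 4, 50
G, D = GestureGenerator(), GestureDiscriminator()
text, audio, z = torch.randn(B, T, 300), torch.randn(B, T, 64), torch.randn(B, T, 16)
fake_poses = G(text, audio, z)
realism = D(text, audio, fake_poses)
```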
Additionally, the activities of the team focused on the engagement estimation task. Namely, we addressed this problem by investigating user engagement dynamics during a robot-to-human (R2H) handover task, considering three main components of engagement: affective, cognitive, and behavioral. For this study, we automatically extracted 10 visual features from the video recordings of 31 participants, using state-of-the-art automatic frameworks: Mediapipe (for pose estimation) and OpenFace (for gaze detection and emotion recognition). Each individual engaged in eight consecutive sessions with a robot manipulator designed with social cues (i.e. a social manipulator). Our statistical analysis indicated that prolonged interaction with the robot could influence user engagement. Comparing user engagement in the first and in the last interaction, we observed a decrease in positive emotions (affective) and a more regulated quantity of motion (behavioral). Additionally, there was a reduced attentional focus on the robot’s assigned tasks (cognitive), although the participants’ execution of the task itself remained unchanged. The obtained results were reported in a regular paper, submitted to the 33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2024).
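As an example of the behavioral component, the sketch below shows how a quantity-of-motion feature can be computed from a session video with the MediaPipe Pose solution; the remaining features and the OpenFace-based gaze/emotion extraction are not reproduced here, and the file names are placeholders.

```python
# Sketch: quantity of motion from MediaPipe Pose landmarks of a session video.
import cv2
import mediapipe as mp
import numpy as np

def quantity_of_motion(video_path: str) -> float:
    """Mean frame-to-frame displacement of the detected body landmarks."""
    pose = mp.solutions.pose.Pose(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    prev, displacements = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks is None:
            continue  # skip frames where no body is detected
        pts = np.array([[lm.x, lm.y] for lm in result.pose_landmarks.landmark])
        if prev is not None:
            displacements.append(np.linalg.norm(pts - prev, axis=1).mean())
        prev = pts
    cap.release()
    pose.close()
    return float(np.mean(displacements)) if displacements else 0.0

# Example (placeholder file names): compare first and last session of one participant.
# qom_first = quantity_of_motion("subject01_session1.mp4")
# qom_last  = quantity_of_motion("subject01_session8.mp4")
```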
Main policy, industrial and scientific implications:
The scenario investigated for assessing the social-task engagement of participants interacting with the robotic manipulator is part of a rehabilitation scenario designed in collaboration with the UNIFI-Don Gnocchi Joint Lab.
Please see the next reporting period.
The activities of Task 2.1 focused on the identification of digital biomarkers that can be extracted directly from the sensors mounted on a social robotic platform. Two main scenarios were investigated. The first scenario refers to a gait activity task, in which the human walks in front of the robot, which follows the human from behind. Considering the laser and the RGB-D camera as the perception sensors of interest, a preliminary data analysis on healthy young subjects was conducted to automatically segment the gait phases of each foot (i.e., stance and swing phases). Given this information, a selected pool of digital biomarkers was extracted, namely: number of steps (NS), average step length (SL), gait time (GT), and gait velocity (GV); see the sketch after this paragraph. In parallel to this activity, we continued the investigation of digital biomarkers of interest in the robot-to-human handover scenario. In this activity, the work focused on exploiting further digital biomarkers related to the user’s arm motion trajectories that may be associated with the social-task engagement, as well as with the user’s task performance. Similarly to the analysis already performed for the behavioral engagement, we exploited the information carried by the body motion of the user in each sub-phase of the interaction (Reaching, Handover, Placing), and we compared the variability of the collected parameters between the first and the last interactions. A small group of interactions (i.e., 4 interactions) has been analyzed so far.
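A minimal computation sketch of the gait biomarkers, assuming the heel-strike events (both feet, in chronological order) and the corresponding foot positions along the walking direction have already been obtained from the segmented gait phases; the data layout and units are assumptions.

```python
# Sketch: deriving NS, SL, GT and GV from segmented heel-strike events.
import numpy as np

def gait_biomarkers(heel_strike_times, heel_strike_positions):
    """heel_strike_times: heel-strike instants in seconds (both feet, sorted);
    heel_strike_positions: foot positions along the walking direction, in meters
    (e.g., from the laser/RGB-D tracking)."""
    t = np.asarray(heel_strike_times, dtype=float)
    x = np.asarray(heel_strike_positions, dtype=float)
    ns = len(t) - 1                       # number of steps (NS)
    gt = t[-1] - t[0]                     # gait time (GT), s
    step_lengths = np.abs(np.diff(x))     # per-step length, m
    sl = step_lengths.mean()              # average step length (SL), m
    gv = step_lengths.sum() / gt          # gait velocity (GV), m/s
    return {"NS": ns, "SL": sl, "GT": gt, "GV": gv}

# Example with synthetic data: 5 heel strikes, ~0.6 m steps every 0.55 s.
print(gait_biomarkers([0.0, 0.55, 1.1, 1.65, 2.2], [0.0, 0.6, 1.2, 1.8, 2.4]))
```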
The activities of Task 2.1 continued to focus on the identification of digital biomarkers that can be extracted directly from the sensors mounted on a social robotic platform during a robot-to-human handover scenario. Namely, the work focused on exploiting the information carried by the body motion of the user in each sub-phase of the interaction (Reaching, Handover, Placing), and we compared the variability of the collected parameters between the first and the last interactions of each user. A total of 20 users have been analyzed, revealing some differences in the behaviors related to the diminishing engagement of the users while performing the task. In parallel to this activity, we also started the design and implementation of a real-time hand-tracking system based on visual sensors, namely a novel multi-device hand-tracking system composed solely of visual sensors (i.e. two Leap Motion Controllers and one Intel RealSense camera), which could be integrated into several human-robot interaction frameworks. The aim of this system is to detect the hand motion of the user, so that the robot can use this information to better predict the action the user wants to perform in physical human-robot interactions. The system's performance was assessed during two distinct hand gestures, referred to as Grasp 1 and Grasp 2, involving different palm orientations. Data were collected from real users performing 60 grasping repetitions each. The results indicate that the middle and index fingertip positions most effectively characterize grasping gestures. In terms of data quality, the RGB-D camera provided higher confidence values in hand detection compared to the Leap Motion Controllers, though its higher rate of missing data reduces overall reliability. The data stream from the Leap Motion Controllers achieved a favorable balance between data quality and quantity. The preliminary results of this activity were presented at the “Real-World Physical and Social Human-Robot Interaction” workshop, held in conjunction with the IEEE-RAS International Conference on Humanoid Robots, Nancy (France), November 22-24, 2024.
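For illustration, the sketch below abstracts the device interfaces away (the field names and the confidence-weighted fusion rule are assumptions, not the implemented system) and shows how fingertip measurements from the three devices could be combined in a common reference frame while skipping missing samples.

```python
# Sketch: confidence-weighted fusion of fingertip positions from two Leap Motion
# Controllers and one RGB-D camera; device drivers are abstracted away.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class FingertipSample:
    device: str                       # "leap_left", "leap_right", "realsense"
    position: Optional[np.ndarray]    # 3D fingertip position in a common frame, or None if missing
    confidence: float                 # per-device hand-detection confidence [0, 1]

def fuse_fingertip(samples: list) -> Optional[np.ndarray]:
    """Confidence-weighted average of the available measurements for one frame."""
    valid = [s for s in samples if s.position is not None and s.confidence > 0.0]
    if not valid:
        return None                   # all devices dropped this frame
    weights = np.array([s.confidence for s in valid])
    stacked = np.stack([s.position for s in valid])
    return (weights[:, None] * stacked).sum(axis=0) / weights.sum()

# Example frame: one Leap sample missing, the other two devices fused.
frame = [
    FingertipSample("leap_left", None, 0.0),
    FingertipSample("leap_right", np.array([0.10, 0.02, 0.30]), 0.8),
    FingertipSample("realsense", np.array([0.11, 0.03, 0.31]), 0.9),
]
print(fuse_fingertip(frame))
```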
Scientific publications
Dissemination events