VRST ’25: Proceedings of the 31st ACM Symposium on Virtual Reality Software and Technology
SESSION 1: Interaction Design and Input Techniques I
SpatialMouse: A Hybrid Pointing Device for Seamless Interaction Across 2D and 3D Spaces
We introduce the SpatialMouse, a hybrid pointing device that combines the capabilities of a desktop mouse with the spatial input of a virtual reality (VR) controller, enabling seamless transitions between 2D and 3D interaction spaces in immersive mixed reality environments. Holistic usage scenarios in mixed reality involve tasks suited alternately to 2D or 3D information spaces. Yet, existing input devices excel in either 2D or 3D, but not both, making it necessary to switch between multiple input devices (e.g., mouse and VR controller). Our SpatialMouse addresses this issue, offering the affordances of a desktop mouse for indirect 2D pointing and the spatial capabilities of VR controllers with six degrees of freedom. In a user study with 12 participants, our prototype significantly reduced perceived task load and improved user experience compared to switching between separate devices. We extract design recommendations to further support such hybrid input approaches.
Gated Temporal Shifts with Depth-Efficient Channel Attention for Real-Time Hand-Gesture Interaction
We introduce a compact video-classification pipeline for real-time dynamic hand-gesture recognition in mixed-reality (MR) settings. The network marries a MobileNetV3 backbone with two purpose-built temporal components: (1) a Gated Discriminative Temporal Shift Module (G-DiTSM) that inserts first-order motion differences and learns channel-wise gates to fuse them adaptively, and (2) a lightweight Depth-Efficient Channel Attention (DepthECA) block that recalibrates spatial features on the fly. Operating on eight sparsely sampled frames per clip (Temporal Segment Network paradigm), the resulting model contains 2.65 M parameters and requires only 0.084 GFLOPs per inference. Evaluated on the RGB-only 20BN Jester benchmark (148k clips spanning 27 gesture classes, recorded from front-facing viewpoints), the system reaches 95.34% Top-1 and 99.80% Top-5 accuracy, surpassing recent 3D CNNs and transformer baselines while being an order of magnitude lighter. Ablations confirm that DepthECA and G-DiTSM provide complementary gains (+18.78% and +0.93% Top-1, respectively, over the MobileNetV3 baseline). Because all components are plug-and-play and introduce minimal overhead, the architecture is well suited to the tight latency and power budgets of standalone MR headsets, paving the way for natural grab, rotate, and command interactions using only on-board RGB cameras.
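To illustrate the temporal-difference gating idea described above, the following PyTorch-style sketch fuses first-order frame differences into clip features through learned channel-wise sigmoid gates. It is a minimal illustration under assumed tensor shapes and module names, not the authors' G-DiTSM implementation.

    import torch
    import torch.nn as nn

    class GatedTemporalDiff(nn.Module):
        """Illustrative sketch: fuse per-channel first-order motion differences
        into frame features via learned sigmoid gates (N clips, T frames)."""
        def __init__(self, channels: int):
            super().__init__()
            # one gate value per channel, learned end-to-end
            self.gate = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

        def forward(self, x):                      # x: (N, T, C, H, W)
            diff = x[:, 1:] - x[:, :-1]            # first-order temporal differences
            diff = torch.cat([diff, torch.zeros_like(x[:, :1])], dim=1)  # zero diff for last frame
            desc = x.mean(dim=(3, 4))              # channel descriptor via global average pooling, (N, T, C)
            g = self.gate(desc).unsqueeze(-1).unsqueeze(-1)  # (N, T, C, 1, 1)
            return x + g * diff                    # adaptively add motion cues

    # usage on a dummy 8-frame clip batch
    feats = torch.randn(2, 8, 64, 14, 14)
    out = GatedTemporalDiff(64)(feats)
    print(out.shape)  # torch.Size([2, 8, 64, 14, 14])

The gate lets the network decide, per channel, how strongly motion cues should augment appearance features, which is the general intuition behind gated temporal shifting.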
Enhancing the Sensation of Depth in Mid-Air Image Interactions with Pictorial Depth Cues
In virtual reality, visual information plays a critical role, and head-mounted displays are widely recognized as the primary means of presentation. However, non-wearable approaches such as projection mapping and mid-air images have also been explored. Mid-air images present content near real objects without screens, making them promising for mixed reality. Yet, their lack of physicality weakens depth perception and diminishes the sensation of pressing buttons. We tested whether pictorial cues (shading, shadow, size) enhance depth perception and button-press sensation in mid-air image UIs. Each experiment involved 14–16 participants. These cues increased perceived depth and improved pressing sensation. These findings suggest that pictorial cues can compensate for the absence of physical sensation and enhance the usability of mid-air image UIs.
Enhancing Freehand VR Interaction Using Fingertip Deformation on User Performance
This study investigated the use of fingertip deformation of a virtual hand to enhance depth perception during freehand interaction in virtual reality (VR). Artificial fingertip deformation may provide a visual mapping to the real hand position and generate pseudo-haptics, improving UI usability. We conducted two experiments focusing on depth manipulation in both pointing and steering tasks. Our results revealed that changes in fingertip shape reduced operation time in pointing tasks and improved accuracy in steering tasks. Additionally, we conducted subjective evaluation surveys for both experiments, which showed improvements in pseudo-haptics, spatial perception, and user experience. Based on these results, we propose several applications and demonstrate that fingertip deformations in virtual hands can contribute to better 3D UI design.
Trade-offs in Virtual Grasping: The Interplay of Interaction Fidelity and Object Affordance
In Virtual Reality (VR), object grasping is a core interaction that critically influences both user immersion and task performance. While contemporary systems offer both high-precision controllers and intuitive hand tracking, they present a trade-off between performance and naturalness. However, empirical guidance for selecting an optimal grasping method remains limited. In particular, how object shape and size (as affordance-related factors) modulate this trade-off within a standardized pick-and-place paradigm is underexplored.
We investigate the interplay between interaction fidelity and object shape/size and its impact on user performance and experience in a controlled pick-and-place task. We conducted a within-subjects study with a 3 (grasping modality: controller, pinch, plausible gesture) × 5 (object shape: cube, sphere, cylinder, handled mug, complex model) × 3 (object size) factorial design. We measured objective performance (task completion time, placement accuracy) and subjective experience (NASA-TLX workload, IPQ presence).
Our findings provide evidence-based answers to the scoped question: “Which grasping method is best suited for an object of a given shape and size in a pick-and-place task?” Ultimately, this work offers actionable guidelines to help VR developers design effective and satisfying object-grasping interactions tailored to users’ task goals and virtual environments, without claiming a single universally “best” method.
Rethinking Gesture Recognition: Toward Fatigue-Aware sEMG Gesture Recognition for VR Interaction
Advances in virtual reality (VR) are transforming interaction paradigms by shifting towards gesture-based control driven by physiological sensing, enabling more intuitive and embodied experiences. Surface electromyography (sEMG) is emerging as a reliable modality for this hands-free and expressive gesture recognition in VR. However, prolonged mid-air gestures can lead to muscle fatigue and physiological changes that degrade overall recognition performance. Further, this degradation is not uniform across gestures, which can impact user performance and experience in VR applications. While existing literature has shown that fatigue alters sEMG signals, its effects during extended immersive interaction and across various gestures remain underexplored. We conducted a 35-participant study in which each participant continuously performed five gestures in VR for 20 minutes each, while we collected high-resolution sEMG data from eight forearm sensors and real-time subjective fatigue ratings using the Borg CR10 scale. Further, we evaluate how gesture recognition models behave under fatigue and explore the impact of incorporating both objective (signal-derived) and subjective (user-reported) fatigue features into classification models. Our results show that integrating fatigue signals enhances model robustness and improves recognition accuracy during extended use.
Freehand Sketch-Based 3D Reconstruction with Contour Constraints via Elastic Metrics
Sketch-based 3D reconstruction enables intuitive content creation through freehand drawings, yet generating high-fidelity 3D models from geometrically ambiguous, structurally simplified, and sparse sketches remains challenging. To overcome existing methods’ limitations in sketch-style generalization, contour accuracy, and suboptimal texture effects, we propose an end-to-end framework that generates textured 3D models directly from a single freehand sketch and semantic labels. To address the scarcity of paired freehand sketch training data, we introduce a 3D model-based automated sketch generation method for extracting mesh contours via a 3D mesh-to-sketch pipeline and synthesizing freehand-style sketches employing a Transformer-based stroke generator to construct a paired dataset of hand-drawn sketches and 3D models. Meanwhile, we design a contour constraint mechanism that jointly optimizes projection-space Chamfer distances and elastic metrics, significantly enhancing the reconstruction accuracy of complex geometries. Furthermore, we integrate a semantic-guided texture generation module using Text2Tex with depth-aware diffusion models and dynamic view-optimization strategies, achieving a complete geometry-appearance integrated modeling pipeline. Finally, extensive experimental results demonstrate that our method outperforms existing structural reconstruction and texture synthesis approaches, exhibiting strong generalization capabilities and practical applicability.
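As a concrete reference for the projection-space contour term mentioned above, the snippet below computes a symmetric 2D Chamfer distance between projected mesh-contour samples and sketch-stroke samples; the point sets and equal weighting of the two directions are illustrative assumptions rather than the paper's exact loss.

    import numpy as np

    def chamfer_2d(contour_pts: np.ndarray, sketch_pts: np.ndarray) -> float:
        """Symmetric Chamfer distance between two 2D point sets (K, 2) and (M, 2)."""
        # pairwise squared distances, shape (K, M)
        d2 = ((contour_pts[:, None, :] - sketch_pts[None, :, :]) ** 2).sum(-1)
        # nearest-neighbour terms in both directions
        return d2.min(axis=1).mean() + d2.min(axis=0).mean()

    # toy example: projected mesh contour vs. freehand stroke samples
    contour = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
    stroke  = np.array([[0.1, 0.0], [0.9, 0.1], [1.0, 0.9], [0.5, 0.5]])
    print(round(chamfer_2d(contour, stroke), 4))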
SESSION 2: Locomotion and Wayfinding
Not All WIP Are Perceived Equally: Different Speed Expectations in Seated Walk-in-Place Locomotion
Gesture-based locomotion enhances immersion in virtual reality (VR), with seated motion being crucial for accessibility and prolonged use. However, existing techniques often apply uniform gesture-to-walking speed mappings, ignoring the fact that different gestures involve varying levels of physical effort and subjective impressions. This mismatch can degrade the user experience. This study investigates how three seated gestures with different physical loads—Tap-in-Place (TIP), Swing-in-Place (SIP), and Grip-in-Place (GIP)—influence users’ expected walking speed. While the evaluations revealed unique experiential trade-offs for each gesture, our primary finding is a consistent perceptual pattern in the expectation of walking speed: Users expected to walk fastest with SIP, followed by GIP, then TIP (SIP > GIP > TIP). These results demonstrate that a one-size-fits-all approach is insufficient and provide empirical recommendations for designing more intuitive seated VR locomotion systems that align walking speed with user perception.
VisionPort: Enhancing Building-Scale Indoor Navigation through Obstacle-Removing Point-and-Teleport Techniques
We present VisionPort, an enhanced Point-and-Teleport technique designed for navigating building-scale indoor virtual environments, specifically addressing the challenges posed by obstacles including walls, ceilings, and floors. VisionPort is available in two versions: VisionPort-essential and VisionPort-full. VisionPort-essential removes only the necessary portion of an obstacle targeted by the pointer, revealing the landing position behind it. In contrast, VisionPort-full allows for the complete removal of obstacles. Both versions enable users to seamlessly pass through barriers during the Point-and-Teleport locomotion process, while preserving the natural flow of navigation within the virtual building. Our evaluation, conducted in a multi-floor building setting, demonstrates that VisionPort improves navigation by reducing the time, head movement, and distance required to reach destinations. While VisionPort-full enhances efficiency, VisionPort-essential provides users with a greater sense of control, reflecting diverse preferences among participants.
Tunnels vs. Wires: A Comparative Analysis of Two 3D Steering Tasks in Virtual Environments
Steering involves continuous movement along constrained paths and is well studied in 2D. Its extensions to 3D, the Ring-and-Wire and Ball-and-Tunnel tasks, have often been treated as interchangeable in previous work. In this paper, we directly compare these two tasks through a within-subjects user study (n = 18) with varying 3D path orientations. The results show that Ring-and-Wire significantly outperformed Ball-and-Tunnel, with 17.17% lower task time, 21.65% higher throughput, and 21.52% faster average speed. Participants also preferred Ring-and-Wire and reported lower workload. Visual ambiguity, especially near the tunnel’s rear surface, complicated spatial perception in the Ball-and-Tunnel task. We thus recommend that future studies choose 3D steering tasks carefully for experiments, as the two tasks are not interchangeable.
IGUANA: Immersive Guidance, Navigation, and Control for Consumer UAV
As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, which enables more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR-based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and live camera preview for high-level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low-level control, and (3) a spatial overlay that helps track the UAV when it is not visible to the naked eye or the visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users from manual control and suggesting improved accuracy and consistency with lower perceived workload relative to a conventional dual-stick controller, (2) the virtual ball interface is intuitive but limited by the lack of physical feedback, and (3) the spatial overlay is very useful in enhancing the users’ situational awareness.
Beyond Parabolas: Linear Pointer Teleportation for Vertical Navigation in VR
Virtual reality teleportation using hand tracking faces significant challenges in vertical navigation, with conventional parabolic methods requiring users to mentally calculate trajectories and landing points. We present two novel linear pointer-based teleportation methods—SphereBackcast and Penetration—that enable intuitive vertical movement through direct pointing and collision handling strategies. Through two experiments involving 34 participants total, we evaluated these methods against traditional parabolic teleportation across diverse environments ranging from flat terrain to multi-level structures with 4m height differences. Results demonstrate that the Penetration method significantly outperforms parabolic teleportation, achieving 53% faster completion times (13.85s vs 29.38s) and 35% lower path deviation (0.552m vs 0.851m) in environments with 2m+ vertical elements. Controller input provided 12-18% performance improvements over hand tracking while maintaining consistent relative advantages of linear methods. Both proposed methods received superior usability ratings (SUS: 69.2 and 68.75 vs 64.66) and reduced cognitive workload (NASA-TLX: 25.35 and 25.08 vs 33.05), with 50% of participants preferring the Penetration method. These findings establish efficient teleportation techniques that address critical limitations in current VR navigation, offering practical solutions for applications requiring vertical movement such as adventure games and architectural visualization.
You Have Arrived… Kind of: Investigating the Limits of Undetectable Destination Displacement During Teleportation
Teleportation has become a popular locomotion method for virtual reality due to lesser demands on physical space and decreased levels of motion sickness compared to other methods. However, prior work has shown that these advantages come at the cost of impaired spatial perception and awareness, the extent of which is still largely unknown. In this work, we present a within-subjects study (N = 29) that explores the effects of teleportation on spatial perception by investigating how much humans can be unknowingly displaced relative to their intended destination during teleportation. After teleporting to the specified location, participants indicated the direction and magnitude (small, medium, large) of the perceived shift or rotation. Displacement from the target happened either as a translation in the forward- or strafe-axis, or a rotation about the up-axis at the intended target. Each displacement condition included eleven offsets that were repeated six times. Our results indicate points of subjective equality, which show a significant perceptual shift along the forward-direction, as well as detection thresholds, which indicate a comparatively wide range in which humans are unable to detect induced shifts. Furthermore, our results show that even if humans are able to detect these shifts, larger ones can be introduced before their magnitudes are rated as medium or large, which provides ample opportunities for interface designers who want to leverage these results in virtual reality.
SESSION 3: Cybersickness, Health, and Digital Twins
Unmanned Aerial Vehicles Control in a Digital Twin: Exploring the Effect of Different Points of View on User Experience in Virtual Reality
Controlling Unmanned Aerial Vehicles (UAVs) is a cognitively demanding task, with accidents often arising from insufficient situational awareness, inadequate training, and bad user experiences. Providing more intuitive and immersive visual feedback—particularly through Digital Twin technologies—offers new opportunities to enhance pilot awareness and the overall experience quality. In this study, we investigate how different virtual points of view (POVs) influence user experience and performance during UAV piloting in Virtual Reality (VR), utilizing a digital twin that faithfully replicates the real-world flight environment. We developed a VR application that enables participants to control a physical DJI Mini 4 Pro drone while immersed in a digital twin with four distinct camera perspectives: Baseline View (static external), First Person View, Chase View, and Third Person View. Nineteen participants completed a series of ring-based obstacle courses from each perspective. In addition to objective flight data, we collected standardized subjective assessments of user experience, presence, workload, cybersickness, and situational awareness. Quantitative analyses revealed that the First Person View was associated with significantly higher mental demand and effort, greater trajectory deviation, but smoother control inputs compared to the Third Person and Chase perspectives. Complementing these findings, preference data indicated that the Third Person View was most consistently favored, whereas the First Person View elicited polarized reactions.
An In-the-Wild Accessibility Evaluation of Apple Vision Pro for Deaf or Hard of Hearing Users
Extended Reality (XR) technologies, including Mixed Reality (MR), Augmented Reality (AR), and Virtual Reality (VR), are blurring the lines between physical and digital environments, transcending the limitations of traditional two-dimensional (2D) interfaces. This shift toward embodied, often context-aware spatial interaction offers broad potential benefits, yet also introduces unique challenges, especially for certain user groups. For people who are Deaf or hard of hearing (DHH), XR’s immersive and multi-sensory environments provide unique opportunities to improve accessibility. However, design principles that work well in 2D interfaces may not always translate seamlessly into immersive contexts, creating new accessibility barriers. The launch of the Apple Vision Pro marks a significant moment in the mainstream adoption of spatial computing, yet little is known about its accessibility implications for deaf users. To explore this emerging area, we conducted an in-the-wild, open-ended study with five deaf participants who have diverse communication preferences, evaluating the Apple Vision Pro in everyday situations. Based on this exploratory evaluation, we identify key accessibility challenges and opportunities and provide practical recommendations to make spatial computing more inclusive for deaf users in the future.
Design and Evaluation of a Mixed Reality Biofeedback System for Home-Based Physiotherapy Exercises
Home-based exercise programs are a cornerstone in managing chronic non-specific back pain. However, their effectiveness is often limited by low adherence and incorrect exercise execution. This study presents and evaluates a Mixed Reality (MR) biofeedback system that tracks body motion using a multi-Kinect setup and provides real-time feedback via the Microsoft HoloLens 2. The evaluation focuses on whether the proposed real-time biofeedback enables participants to perform physiotherapy exercises more accurately and in closer alignment with prescribed guidance in a home-based setting, while also assessing system usability as well as cognitive and emotional workload experienced by users.
Thirty-two healthy adults (16 female, 16 male) participated in two sessions in a counterbalanced cross-over design. In the first session, participants were introduced to the rehabilitation exercises and instructed in the use of the MR-based biofeedback system. In the second session, which took place 2–3 days later, they performed the exercises with and without the system in a simulated home-based scenario. Of a total of 23 observed parameters, 17 showed improvements, including 8 with notably strong progress. The findings demonstrate that MR biofeedback improves the accuracy of exercise execution in home-based physiotherapy.
A Systematic Mapping Study on the Joint Use of AI and VR in Stroke Care
Context. Stroke remains a leading cause of long-term disability, prompting growing interest in emerging technologies like artificial intelligence (AI) and virtual reality (VR) to improve treatments. The combination of AI’s adaptability and VR’s immersive environments holds promise for personalized, engaging, and scalable stroke care, though research in this area remains fragmented. Objective. This study provides an overview of current research on the combined use of AI and VR in stroke care, focusing on system types, clinical validation, technologies employed, and autonomy levels. Method. We conducted a systematic mapping study of papers published between 2014 and 2024. Results. We identified 73 relevant studies. Most systems are still in early prototype or usability-testing stages, with limited clinical validation and frequent human oversight. Technologies used are diverse, and longitudinal evaluations are rare. Conclusion. Significant research gaps persist, including limited validation, lack of pre-stroke applications, and fragmented tools. These findings offer guidance for developing more robust, clinically viable, and interoperable AI and VR systems for stroke care.
The Impact of Sensory Levels on Presence and Cybersickness in Virtual Reality
This study investigates how varying sensory inputs—visual, auditory, and tactile—influence both presence and cybersickness in a virtual reality (VR) environment. Grounded in sensory conflict theory, which posits that mismatched multisensory input can cause cybersickness symptoms, we evaluated participant responses across three sensory configurations: video-only (V), video with audio (VA), and video with audio and directional wind (VAF). Fifty-six participants experienced a VR roller coaster simulation in either increasing or decreasing sensory order. Presence was assessed through self-reported realism, movement perception, and speed, while cybersickness was measured using the Simulator Sickness Questionnaire (SSQ), heart rate monitoring, Fast Motion Sickness (FMS) scale, and discomfort ratings. Results showed that increased sensory input significantly enhanced realism and speed perception, indicating improved presence. Similarly, the highest sensory condition (VAF) yielded the lowest cybersickness indicators across FMS, heart rate, and discomfort ratings. These findings suggest that carefully integrated multisensory stimuli can improve VR user experience by increasing immersion and reducing discomfort.
Ghost in the VR Shell: Capturing Spectral Cardio-Respiratory Rates from Subtle VR Device Movements
This work examines the estimation of heart rate and respiratory rate using only the kinematic data captured by a consumer-grade standalone VR devices. The high-resolution motion tracking offered by these devices creates an opportunity for indirect vital sign detection through spectral analysis of subtle VR device movement data. In our study, kinematic data were collected from a Meta Quest 3 head-mounted display, controllers and MX Ink pen across multiple posture configurations (e.g., seated, standing, lying down), both at rest and after moderate exercise. These postures emulate real-world XR scenarios for rest, fitness, and meditation. The collected data was processed using what we refer to as the Ghost approach, a simple yet effective method that applies a Fast Fourier Transform to capture the spectral components associated with respiratory and cardiac rhythms. Ground-truth biosignals were simultaneously recorded using wearable physiological sensors for validation. Results clearly reveal that both heart rate and respiratory rate can be reliably estimated from subtle micro-movements in the head-mounted display, VR controllers, or VR pen, revealing the potential for non-contact physiological monitoring within immersive environments. Finally, we demonstrate a use case of a VR stethoscope, where a standard VR controller is repurposed to estimate heart and respiratory rates.
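The spectral idea behind the described Ghost approach can be sketched in a few lines of NumPy: detrend a motion trace, take its FFT, and pick the dominant peak within assumed respiratory (roughly 0.1–0.5 Hz) and cardiac (roughly 0.8–2.5 Hz) bands. The band limits, windowing, and sampling rate here are illustrative assumptions, not the paper's exact pipeline.

    import numpy as np

    def dominant_rate_bpm(signal: np.ndarray, fs: float, f_lo: float, f_hi: float) -> float:
        """Return the dominant frequency in [f_lo, f_hi] Hz as a per-minute rate."""
        x = signal - signal.mean()                       # remove DC offset
        spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        band = (freqs >= f_lo) & (freqs <= f_hi)
        return 60.0 * freqs[band][np.argmax(spectrum[band])]

    # synthetic HMD micro-motion: 1.2 Hz "cardiac" + 0.25 Hz "respiratory" component
    fs = 60.0
    t = np.arange(0, 60, 1 / fs)
    trace = 0.0005 * np.sin(2 * np.pi * 1.2 * t) + 0.002 * np.sin(2 * np.pi * 0.25 * t)
    print(dominant_rate_bpm(trace, fs, 0.8, 2.5))   # ~72 bpm (heart rate)
    print(dominant_rate_bpm(trace, fs, 0.1, 0.5))   # ~15 breaths/min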
See It and Hear It: Multimodal Guidance in MR-Based Neurosurgical Simulation for Skill Retention
External Ventricular Drain (EVD) placement is a complex neurosurgical task that requires identifying a target point within the brain and accurately positioning a catheter at the appropriate angle. While Mixed Reality (MR) technologies have seen limited adoption in the operating room, they offer significant potential for developing training systems that enhance skill acquisition and retention in unaided conditions. A current gap in research concerns the effectiveness of multimodal guidance systems that incorporate both visual and audio-based MR cues. In this paper, we present an MR-based simulator for EVD placement training and evaluate the impact of three MR-guided training modalities: (1) a baseline condition using only 2D CT scans and a 2D catheter projection; (2) a visual guidance modality incorporating a 3D trajectory overlay; and (3) an embodied-audio guidance modality featuring a virtual agent delivering spoken instructions and feedback. Participants underwent a digital training phase using one of the three modalities, followed by an unaided EVD placement on a physical phantom with a real catheter to evaluate skill transfer and retention. Results indicate that both advanced MR modalities significantly improve procedural accuracy and execution speed, and receive higher scores in usability and technology acceptance compared to the baseline. Notably, training with 3D visual trajectory guidance led to significantly higher unaided placement accuracy, indicating stronger skill retention. However, multimodal guidance demonstrated equivalent execution speed, while showing a trend toward lower overall cognitive load.
A Feasibility and Impact Investigation of Continuous Subjective Cybersickness Feedback Reporting
This paper quantitatively investigates the feasibility of, and merit in, soliciting continuous subjective cybersickness ratings as participants passively engage in an immersive VR experience. The main research questions addressed are: (1) Feasibility: To what extent will participants successfully engage, unprompted, in continuous cybersickness reporting while engaging with a secondary task? and (2) Merit: To what extent do continuously reported subjective cybersickness ratings offer valuable insights beyond what can be obtained from less frequent querying?
Participants used a physical slider device, in conjunction with discreet visual feedback, to continuously report their instantaneous motion sickness state as they rode nineteen consecutive rounds of a virtual roller coaster ride and performed a simple visual counting task. We analyzed the reported sickness ratings in the context of pre-post SSQ scores, #rounds endured before quitting, tonic skin conductance levels (SCL), optical flow of the visual stimulus, and rotational and translational velocity of the virtual viewpoint, as well as in comparison to previously-obtained data from different participants who underwent the same exposure but only verbally reported a single FMS score at the end of each round (every 65s).
We found that most participants used the slider actively, and that, averaged across participants, the reported sickness scores not only increased over time but also varied up and down in conjunction with the intensity of the ride. We found a statistically significant positive correlation between instantaneous reported sickness levels and tonic electrodermal activity in 76% of our participants, as well as a statistically significant positive correlation with optical flow magnitude and viewpoint rotational velocity. We observed no significant differences in #rounds completed, Δ SSQ scores, and average maximum or average last-reported sickness levels between the continuous and discrete reporting groups.
Altogether, our results (1) demonstrate the feasibility of collecting valid self-reported ratings of cybersickness on a continuous basis during a passive VR experience, and (2) suggest that such data has the potential to be useful for better understanding cybersickness evolution in the context of potentially transient triggers.
SESSION 4: Interaction Design and Input Techniques II
The Importance of Cueing While Visually Searching a 360 Degree Environment for Multiple Targets in the Presence of Distractors
Visually searching for objects is an everyday task. In many contexts, people must visually search for multiple objects at the same time while avoiding distractor objects, such as triage during a mass casualty incident. While many prior augmented reality (AR) and virtual reality (VR) studies have investigated cues to aid in visual search tasks, few have investigated cues in contexts involving multiple targets and distractors with a full 360° effective field of regard (EFOR). Individually, multiple targets, distractors, and a full 360° EFOR each add complexity to visual search; when combined, they compound the difficulty even further. In this paper, we present such a study that compares three common types of visual cues (2D Wedge, 3D Arrow, and Gaze Line) to a baseline condition with no cueing for a 360° visual search task. Our results reinforce the importance of providing some type of cue, with the Gaze Line design being particularly beneficial. We discuss the potential implications of these findings for designing cues specifically for such complex visual search tasks.
Perceiving Multilingual Text in Virtual Reality: Glyph Complexity and Font Effects on Preferred Viewing Distance
Differences in text perception among users of diverse language backgrounds may be accentuated in immersive environments. To investigate this, we measured preferred perceptual viewing distances in virtual reality (VR) as a function of language familiarity, glyph complexity, font weight, and font type. 30 native readers each of Chinese, English, and Japanese adjusted text panels—initially placed at 0.5 m, 2.5 m, 5 m, and 10 m—to the distance they perceived as most appropriate for reading. Stimuli varied in visual complexity (simple vs. complex characters or words) and in font style (serif vs. sans-serif, light vs. bold). Our results show that at the farthest distances, native English and Japanese readers positioned text significantly farther away than non-natives, indicating a top-down perceptual compensation effect; this advantage was not observed for native Chinese participants. Moreover, at the closest distances, native English readers also required slightly farther viewing distances across all language conditions. Across all groups, simple glyphs and bold fonts supported greater perceptual distances, whereas complex glyphs and light fonts required closer viewing. These findings show how language background and font variables shape text perception in VR and provide a theoretical basis for adaptive rendering to optimize display parameters for diverse user populations.
Assessing Redundant Interface Designs for Precise Number Input in Virtual Reality
Typing and editing precise numerical input, in particular floating-point values, are essential tasks in Spatial Computing for applications such as 3D precision modeling, object measurement, object dimensioning, mathematical visualization, and immersive media creation. Yet the adopted interfaces and interaction techniques in VR/AR/MR are often replicas of flat, palm-sized numpads such as those found in physical calculators or their WIMP counterparts. To move beyond such conventional confines, this paper explores redesigning the numpad by leveraging the spatial freedom of VR with a specific focus on introducing redundancy in the input of floating point values. To do so, we took inspiration from mechanisms such as combination dials, movable numbers, and a mechanical calculator that offer a larger and multi-column number layout. We assess how redundant interfaces can enhance user experience and efficiency when it comes to precise number editing of floating point values. Through a user study (N=30), we compared four numpads where participants engage in inputting a list of target numbers within a virtual environment. Our findings reveal that the redesigned numpads, which utilize redundant design elements, were preferred by users over the conventional numpad design as they provided clearer and more efficient number input methods in VR.
SnapSteer: A Bimanual 3D Manipulation Interface with Limitable Motion Degrees of Freedom
We propose SnapSteer, a bimanual 3D manipulation interface using common VR controllers, which allows restriction of motion degrees of freedom (DoFs) as needed. This interface is based on the conventional one-handed 6-DoF manipulation interface called Robot Telekinesis, and assigns the other hand the role of controlling whether and in which direction the DoFs are restricted. This enables users to quickly switch between unconstrained 6-DoF operation and precise 1-DoF operation according to the task. We designed and implemented a prototype of this interface in VR, and conducted a user study (N=12) comparing its performance in a straight 3D steering task with two baseline interfaces (i.e., a 6-DoF individual control interface and Robot Telekinesis). The results showed that our interface outperformed the other two in task efficiency. On the other hand, there was no significant difference in subjective workload or usability compared to Robot Telekinesis, which motivates a discussion of improvements to visual feedback during the direction adjustment phase.
Beyond the Portal: Enhancing Recognition in Virtual Reality Through Multisensory Cues
While Virtual Reality (VR) systems have become increasingly immersive, they still rely predominantly on visual input, which can constrain perceptual performance when visual information is limited. Incorporating additional sensory modalities, such as sound and scent, offers a promising strategy to enhance user experience and overcome these limitations. This paper investigates the contribution of auditory and olfactory cues in supporting perception within the portal metaphor, a VR technique that reveals remote environments through narrow, visually constrained transitions. We conducted a user study in which participants identified target scenes by selecting the correct portal among alternatives under varying sensory conditions. The results demonstrate that integrating visual, auditory, and olfactory cues significantly improved both recognition accuracy and response time. These findings highlight the potential of multisensory integration to compensate for visual constraints in VR and emphasize the value of incorporating sound and scent to enhance perception, immersion, and interaction within future VR system designs.
CONTEXT-GAD: A Context-Aware Gaze Adaptive Dwell model for Gaze-based Selections in XR Environments
Gaze-based selection, via techniques such as gaze dwell, is one of the most common hands-free interactions performed by users in eXtended Reality (XR) environments. However, selecting a small constant dwell threshold to activate a target might lead to mis-interactions, also known as the Midas Touch problem, while a large threshold leads to eye fatigue. Prior research has proposed methodologies to adapt dwell thresholds based on the probability of the user activating a certain target considering past interactions or predicting intent based on gaze features. However, utilizing past inputs or gaze features leads to a system heavily biased towards individual strategies or physiology and cannot be generalized to other XR scenarios or users. In this work, we propose a novel context-aware system that leverages visual features of the task environment and user behavioral features such as the frequency of interactions, gaze speed variance, and head rotation velocity to adapt dwell thresholds across three distinct levels. We conducted a data collection experiment with twenty participants performing gaze dwell interactions in a general User Interface (UI) navigation task and a visual search task. We trained a hierarchical machine learning model to predict and adapt dwell thresholds into three levels based on the induced cognitive load. We evaluated our system by utilizing standard machine learning metrics and by conducting a user study (n=17) based on quantitative and qualitative measures. Our system achieves a classification accuracy of 70.72% on the first level and 85.43% on the second. In addition, the system significantly reduces task completion time in less complex tasks and improves error rates in more cognitively intensive scenes.
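The adaptation loop described above can be summarized by a toy sketch that maps a predicted load level to one of three dwell thresholds; the feature names, heuristic rules, and threshold values below are illustrative stand-ins for the paper's trained hierarchical model.

    from dataclasses import dataclass

    # illustrative dwell thresholds (seconds) for three predicted load levels
    DWELL_BY_LEVEL = {"low": 0.4, "medium": 0.7, "high": 1.0}

    @dataclass
    class BehavioralFeatures:
        interaction_freq_hz: float   # recent selections per second
        gaze_speed_var: float        # variance of gaze speed (deg^2/s^2)
        head_rot_vel: float          # head rotation velocity (deg/s)

    def predict_load_level(f: BehavioralFeatures) -> str:
        """Stand-in for the hierarchical classifier: simple heuristic thresholds."""
        if f.gaze_speed_var > 200.0 or f.head_rot_vel > 60.0:
            return "high"
        if f.interaction_freq_hz > 0.5:
            return "low"      # frequent, confident selections -> shorter dwell
        return "medium"

    def adaptive_dwell(f: BehavioralFeatures) -> float:
        return DWELL_BY_LEVEL[predict_load_level(f)]

    print(adaptive_dwell(BehavioralFeatures(0.8, 50.0, 10.0)))   # 0.4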
Saccaidance: Saccade-Aware Pattern Embedding for Gaze Guidance on High-Speed Displays
Gaze guidance is essential for directing user attention to specific areas of interest. However, conventional visual cues generate persistent visual noise that hinders concentration during tasks. We propose Saccaidance, a gaze-guidance method that appears only when users move their gaze. Saccaidance employs temporal additive color mixing and 480 Hz high-speed displays to shift the color phase of guidance patterns. This renders the patterns barely visible during fixation and makes them appear transiently, through a color-breakup effect, when users move their gaze. This intermittent gaze guidance appears only during gaze transitions, providing effective guidance without interfering with focused work or requiring eye-tracking hardware. We conducted experiments with 24 participants under four conditions that involved search tasks: an unmodified baseline, conventional explicit guidance, and our proposed method using oval and radial patterns. The results show that our approach effectively constrains the exploration area while preserving subjective naturalness. We also outline application scenarios of our method, including document highlighting.
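A minimal sketch of the temporal additive color-mixing principle: split a guidance color into high-rate sub-frame colors whose temporal average equals the background, so the pattern integrates away during fixation and separates (color breakup) during saccades. The alternating modulation and the 8-sub-frame split assumed here are illustrative, not the paper's exact phase pattern.

    import numpy as np

    def subframe_colors(background_rgb, modulation_rgb, n_sub=8, amplitude=0.2):
        """Split a background color into n_sub sub-frame colors that average back
        to the background; an alternating +/- modulation is an illustrative phase pattern."""
        bg = np.asarray(background_rgb, dtype=float)
        mod = np.asarray(modulation_rgb, dtype=float)
        signs = np.where(np.arange(n_sub) % 2 == 0, 1.0, -1.0)
        frames = np.clip(bg + amplitude * np.outer(signs, mod), 0.0, 1.0)
        assert np.allclose(frames.mean(axis=0), bg)  # integrates to background during fixation
        return frames

    # 480 Hz panel driving 60 Hz content -> 8 sub-frames per content frame
    print(subframe_colors([0.5, 0.5, 0.5], [1.0, -1.0, 0.0]))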
SESSION 5: Human Factors
Five-day research-in-the-wild observation of notifications on smartglasses: A double-edged sword
Notifications are a fundamental aspect of daily computing, whether on desktops, laptops, smartphones, or smartwatches. On average, adults receive around 200 notifications per day—approximately one every five minutes during waking hours. As Extended Reality (XR) headsets advance, they may become the primary medium for digital interactions, making notification management a crucial factor in their usability. While notifications are known to be disruptive on smartphones, their impact could be even more pronounced on head-worn devices. To investigate this, we conducted an exploratory five-day study with eight participants wearing display-equipped smartglasses that delivered notifications from their smartphones. Participants used the glasses throughout their day for at least 2 hours, receiving on average 62% of all notifications on the glasses, submitted daily journal entries, and participated in post-study interviews. We also logged notification sources and timestamps throughout the study. Our findings reveal both practical advantages and significant challenges of head-worn notification delivery. While participants appreciated the convenience and immediacy of glanceable alerts, concerns about privacy, social acceptability, and distraction emerged as key barriers to adoption.
Exploring How Prior Knowledge and Presence Shape Transfer of a Reversed Size-Weight Illusion From Virtual to Real
Virtual Reality (VR) can create experiences that conflict with a user’s prior knowledge; however, how such conflicts influence subsequent real-world behavior remains unclear. This study explores how a virtual experience that contradicts real-world expectations affects later perception and motor actions, using the size-weight illusion—where people expect larger objects to be heavier than smaller ones. We conducted a 2 (internal model robustness: reinforced vs. weakened) by 2 (presence: high vs. low) mixed-design experiment. Participants first received real-world training to either strengthen or weaken their size-weight expectations, then experienced a reversed size-weight mapping in VR under varying levels of presence. We assessed how this virtual experience influenced real-world weight estimation and object lifting behavior. Results showed that participants with weakened prior knowledge exhibited lower confidence in their weight judgments and greater motor instability when lifting objects. However, the level of presence in VR did not significantly affect transfer outcomes. These findings suggest that the strength of prior knowledge modulates how conflicting virtual experiences influence real-world behavior, underscoring the need for careful VR design, particularly for younger users with less stable internal models.
AR-TMT: Investigating the Impact of Distraction Types on Attention and Behavior in AR-based Trail Making Test
Despite the growing use of AR in safety-critical domains, the field lacks a systematic understanding of how different types of distraction affect user behavior in AR environments. To address this gap, we present AR-TMT, an AR adaptation of the Trail Making Test that spatially renders targets for sequential selection on the Magic Leap 2. We implemented distractions in three categories: top-down, bottom-up, and spatial distraction based on Wolfe’s Guided Search model, and captured performance, gaze, motor behavior, and subjective load measures to analyze user attention and behavior. A user study with 34 participants revealed that top-down distraction degraded performance through semantic interference, while bottom-up distraction disrupted initial attentional engagement. Spatial distraction destabilized gaze behavior, leading to more scattered and less structured visual scanning patterns. We also found that performance was correlated with attention control (R² = .20–.35) under object-based distraction conditions, where distractors possessed task-relevant features. The study offers insights into distraction mechanisms and their impact on users, providing opportunities for generalization to ecologically relevant AR tasks while underscoring the need to address the unique demands of AR environments.
How a task-blind adaptive VR system can improve users’ task performance: an assisted immersive analytics use case
Recently, some works have built adaptive systems providing assistance to the user in virtual reality (VR), with little or no knowledge of the user’s task. These task-blind help systems can influence behaviours and exploration strategies; however, their ability to significantly improve users’ performance on their tasks is still unclear. In this study, we aim to clarify the impact of task-blind help systems on user performance. We also explore two avenues that could provide a better understanding of why these systems can be effective and interesting to study. Our controlled user study involved 56 participants in an immersive analytics environment and compared four VR help-system configurations, including three task-blind systems and a no-assistance baseline. Results showed significant task performance improvements with one task-blind system, highlighting user control as a key factor of efficiency. This work demonstrates the potential of task-blind help systems, offering a flexible framework for adaptive design and raising questions about their broader applications.
Guiding Attention in VR: Comparing the Effect of Peripheral and Central Cues on Presence and Workload
Virtual Reality applications increasingly require methods to effectively guide users to important elements within the virtual environment. Central visual cues are the most common method; they have proven effective for directing attention, yet often compromise the level of immersion. This work explored whether peripheral visual cues could serve as an alternative approach that supports attention guidance while preserving the sense of presence. We performed a user study with 24 participants to compare four visual cues: two central cues (Floating Text and Floating Arrow) and two peripheral cues (Edge Lighting and Swarm). Users completed a visual search task of 7 objects for each visual cue, with data collected on performance through reaction time, round time, and total errors. Additionally, presence and workload were evaluated through the IGROUP Presence Questionnaire and NASA Task Load Index, respectively. No statistically significant differences were found between peripheral and central cues for presence; however, performance and workload varied significantly based on the specific cue implementation rather than the type of positioning. Our findings indicate that peripheral positioning does not inherently provide attention guidance advantages over central placement. Instead, thoughtful cue design, with a simple yet clear appearance and behavior, appears to be the critical factor for achieving effective attention guidance while preserving presence in IVEs. These results provide valuable insights for VR content creators to facilitate the design process of VR experiences.
PatchFusionVR: Multitask Prediction of User Gaze, Reaction Time, and Cognitive Load in Virtual Reality from Multimodal Signals
Enhancing user experience and performance, including task load in immersive environments, requires accurate prediction of user gaze point, reaction time, and mental and physical load. Current gaze prediction approaches focus primarily on motion-based information, lacking physiological data, which leads to poor prediction accuracy in highly dynamic virtual reality (VR) environments. Traditional cognitive load measurements rely on post-task analysis without proper multimodal data integration and fail to capture the real-time dynamics of user states during interaction. Likewise, reaction time and attention load are often assessed only after the interaction, without using real-time immersive sensor data, which limits adaptive responsiveness. To tackle these limitations, we leveraged a comprehensive multimodal dataset – VRWalking – which recorded timestamped eye-tracking metrics, physiological signals (heart rate and galvanic skin response), and behavioral performance data during real-time engagement in a VR environment. We developed a unified multitask model based on the MultiPatchFormer architecture, which processes multimodal VR signals through dual patch projection branches for gaze and classification inputs. The model employs multiscale patch embeddings, cross-attention between gaze and classification pathways, channel attention, and transformer encoders to jointly predict continuous user gaze and classify reaction time and cognitive load (mental and physical load). Our methodology achieved excellent predictive performance: 95.64% for reaction time, 98.01% for mental load, and 97.45% for physical load, with a MAPE (Mean Absolute Percentage Error) of 15.24% for gaze prediction. We applied Shapley Additive Explanations (SHAP) analysis to interpret the model’s behavior across all features, including eye-tracking, head-tracking, and physiological signals. The analysis revealed which features most influenced the predictions of user gaze, reaction time, mental load, and physical load. Our methods, while based only on the VRWalking dataset, demonstrated strong performance across all tasks, suggesting promising potential for real-world VR applications such as interactive training systems that respond to user attention lapses, educational platforms that adapt to cognitive load, and performance assessments that consider physiological indicators.
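As a structural illustration of the dual-branch, cross-attention design described above, the PyTorch-style sketch below patch-embeds a multimodal signal window into two branches, lets the gaze branch attend to the classification branch, and emits a gaze regression plus a load classification. Dimensions, layer counts, and heads are assumptions, not the MultiPatchFormer configuration.

    import torch
    import torch.nn as nn

    class MultiTaskPatchSketch(nn.Module):
        """Illustrative two-branch sketch: patch-embed multimodal sequences, exchange
        information via cross-attention, then regress gaze and classify load levels."""
        def __init__(self, in_ch=8, d=64, patch=10, n_classes=3):
            super().__init__()
            self.embed_gaze = nn.Conv1d(in_ch, d, kernel_size=patch, stride=patch)
            self.embed_cls  = nn.Conv1d(in_ch, d, kernel_size=patch, stride=patch)
            self.cross = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
            self.enc = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)
            self.gaze_head = nn.Linear(d, 2)          # (x, y) gaze point
            self.load_head = nn.Linear(d, n_classes)  # e.g. low/medium/high load

        def forward(self, x):                         # x: (N, C, T) multimodal signals
            g = self.embed_gaze(x).transpose(1, 2)    # (N, P, d) gaze-branch patches
            c = self.embed_cls(x).transpose(1, 2)     # (N, P, d) classification patches
            g2, _ = self.cross(g, c, c)               # gaze queries attend to cls branch
            h = self.enc(g2).mean(dim=1)              # pooled sequence representation
            return self.gaze_head(h), self.load_head(h)

    gaze, load = MultiTaskPatchSketch()(torch.randn(4, 8, 200))
    print(gaze.shape, load.shape)  # torch.Size([4, 2]) torch.Size([4, 3])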
SESSION 6: Multimodal Experiences
TangiAR: Markerless Tangible Input for Immersive Augmented Reality with Everyday Objects
Tangible interactions with everyday objects have been shown to be fast, accurate, and natural, and have shown promise when combined with immersive augmented reality. However, implementing tangible controls presents considerable challenges. Previous works in the field either rely on additional tracking markers on objects, inadvertently shifting the difficulty to users, or are too computationally demanding for real-time operation on a head-mounted display (HMD). We propose TangiAR, a tangible control system which tracks everyday objects without the need for fiducial trackers, enabling them as passive controllers and virtual proxies in AR applications. TangiAR additionally enables hand and finger proximity interactions with tangibles, further expanding the interaction space. TangiAR can run on an unmodified Microsoft HoloLens 2, making it immediately practical. We evaluated the performance of TangiAR through a technical evaluation, including occlusion robustness and tracking accuracy tests, and a user study which examined the usability of our markerless object tracking system in various AR interactions.
Manipulating Stiffness Perception of Compliant Objects While Pinching in Virtual Reality
Providing users with realistic sensations of object stiffness in virtual environments remains challenging due to the intricacies of our haptic sense. We investigate the use of a visuo-haptic illusion to alter the perceived stiffness of hand-held objects in virtual reality. We manipulate the Control-to-Display ratio of the index finger and thumb movements during pinching to make virtual objects feel softer or harder. We evaluated this approach on a variety of haptic representations and visualizations we selected through a pre-study survey (N=24). Results of our user study (N=20) demonstrate that this method effectively and reliably modifies stiffness perception, bridging gaps of 50% in physical stiffness without adversely affecting the visuo-haptic experience. Our findings offer insights into how different visual and haptic presentations impact stiffness perception, contributing to more effective and adaptable future haptic feedback systems.
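A minimal sketch of the underlying Control-to-Display manipulation: the rendered pinch aperture compresses more or less than the tracked one, which is what makes a physically constant prop appear softer or harder. The gain values and rest aperture below are illustrative, not the study's calibrated parameters.

    def virtual_aperture(real_aperture_mm: float,
                         rest_aperture_mm: float,
                         cd_gain: float) -> float:
        """Map the tracked finger-thumb distance to the rendered one.
        cd_gain > 1 exaggerates compression (object looks/feels softer);
        cd_gain < 1 attenuates it (object looks/feels stiffer)."""
        compression = rest_aperture_mm - real_aperture_mm     # how far the fingers have closed
        return max(rest_aperture_mm - cd_gain * compression, 0.0)

    # squeezing a 60 mm prop down to 50 mm, rendered with two different gains
    print(virtual_aperture(50.0, 60.0, 1.5))  # 45.0 -> virtual object yields more (softer)
    print(virtual_aperture(50.0, 60.0, 0.5))  # 55.0 -> virtual object yields less (stiffer)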
Impact of passive haptics on task performance: the effect of technological evolution
Since its early development in the 1990s, Virtual Reality (VR) technology, particularly head-mounted displays (HMDs), has seen significant advancements. In 1999, an empirical study demonstrated that passive haptics could significantly improve both user performance and preference in 2D tasks. In this paper, we replicate this experiment using modern VR hardware to investigate the influence of technological evolution on the relevance of passive haptics in similar scenarios. Our findings show that, for the tasks examined, performance in non-haptic conditions with current VR systems is comparable to that in haptic conditions from 1999, challenging the relevance of passive haptics for such tasks by today’s standards. Our results imply that enhancements in visual fidelity, tracking, and interaction design may have reduced the performance gap that passive haptics were previously used to address.
Understanding Latency Sensitivity in Thermal and Tactile Feedback for Multimodal Haptics in VR
Low-latency multimodal feedback is essential for maintaining a high-quality user experience in VR; however, unpredictable network conditions can introduce latency that negatively impacts user experience. This work investigates how users perceive multimodal haptic feedback—specifically thermal (hot/cold) and tactile stimuli—and how latency in such feedback affects user experience. We first measured users’ response times for thermal, tactile, and combined thermal-tactile stimuli. Subsequently, we conducted a psychophysical study to identify delay thresholds for each modality by examining temporal congruency between visual and haptic cues. We designed a haptic delay network simulator to emulate a realistic network environment. Results highlighted that combined thermal-tactile feedback has higher latency tolerance than thermal-only feedback, indicating that multimodal integration can buffer the negative effects of latency. Using these thresholds, we designed controlled latency conditions and assessed user experience. Based on our findings, we propose design recommendations for haptic data transmission in networked VR systems.
SESSION 7: Immersive Visualization and Interaction
Animated Transitions for Abstract and Concrete Immersive Visualizations: A Design Space and Experiment
While data visualizations are typically abstract, there is a growing body of work around concrete visualizations, which use familiar objects to convey data. Concrete visualizations can complement abstract ones, especially in immersive analytics, but it is unclear how to design smoothly animated transitions between these two kinds of representations. We investigate a design space of abstract and concrete visualizations, where animated transitions are pathways through the design space. The design space is defined with four axes, each corresponding to a different transformation. We consider different ways to design animated transitions by staging and ordering the transformations along these axes. In a controlled experiment conducted in virtual reality with 16 participants, we compared four types of animated transitions and found quantitative and qualitative evidence of the superiority of a specific staging approach over the simultaneous application of all transformations. Our study pre-registration is available at https://osf.io/8mu73.
A Low-Latency Volumetric Display and Its Application to an Augmented Reality Mirror
Most conventional swept volumetric displays do not allow for direct physical interaction because their rigid, high-speed sweep screens make touch dangerous or impractical. However, several existing approaches enable direct interaction with volumetric content through the use of devices such as re-imaging plates and flexible screens. A major challenge in implementing direct interaction systems lies in their stringent latency requirements, yet it is, in general, difficult to achieve low latency with swept volumetric displays because their refresh rates are limited by the physical sweep periods.
This paper reports yet another type of interactive swept volumetric display using a half mirror to realize a truly 3D augmented reality mirror system. It enables a “pseudo-direct” interaction by aligning the volumetric content with the user’s mirror image without the occlusion problems inherent in existing direct-interactive systems. This configuration, which presents the reflected real body and the displayed content closely together without occlusion, imposes even stricter latency demands. To address this, we propose a new low-latency control method for a swept volumetric display that makes the displayed content swiftly track the mirrored target’s movement. The proposed method dynamically updates each slice of the volumetric content in response to the latest pose of the tracked target, without increasing the sweeping rate of the screen.
Experiments demonstrate that the proposed method effectively maintains image fidelity at a moderate speed of target movement while significantly reducing perceived latency, enabling smooth and natural pseudo-direct interaction with volumetric content.
Investigating Resolution Strategies for Workspace-Occlusion in Augmented Virtuality
Augmented Virtuality integrates physical content into virtual environments, but the occlusion of physical content by virtual content is a challenge. This unwanted occlusion may disrupt user interactions with physical devices and compromise safety and usability. This paper investigates two resolution strategies to address this issue: Redirected Walking, which subtly adjusts the user’s movement to maintain physical-virtual alignment, and Automatic Teleport Rotation, which realigns the virtual environment during travel. A user study set in a virtual forest demonstrates that both methods effectively reduce occlusion. While Automatic Teleport Rotation achieves higher occlusion resolution in our testbed, it is suspected to increase cybersickness compared to the less intrusive Redirected Walking approach.
Investigating Seamless Transitions Between Immersive Computational Notebooks and Embodied Data Interactions
A growing interest in Immersive Analytics (IA) has led to the extension of computational notebooks (e.g., Jupyter Notebook) into an immersive environment to enhance analytical workflows. However, existing solutions rely on the WIMP (windows, icons, menus, pointer) metaphor, which remains impractical for complex data exploration. Although embodied interaction offers a more intuitive alternative, immersive computational notebooks and embodied data exploration systems are implemented as standalone tools. This separation requires analysts to invest considerable effort to transition from one environment to an entirely different one during analytical workflows. To address this, we introduce ICoN, a prototype that facilitates a seamless transition between computational notebooks and embodied data explorations within a unified, fully immersive environment. Our findings reveal that unification improves transition efficiency and intuitiveness during analytical workflows, highlighting its potential for seamless data analysis.
Interacting Beyond Reach: Multi-Perspective Augmented Reality for Precise Virtual Border Definition in Constrained Spaces
In spatially constrained environments, such as warehouses or industrial workspaces, users often face difficulties in defining virtual regions due to occlusions, physical barriers, or limited accessibility. This paper presents a multi-perspective Augmented Reality (AR) system designed to support the precise placement of 3D virtual borders in such scenarios. The approach integrates spatially aligned remote camera perspectives into a mobile AR application, allowing users to view and interact with virtual content from otherwise unreachable positions. A loosely coupled system architecture enables dynamic integration and removal of remote cameras, ensuring scalability and adaptability to diverse setups. We evaluate the system in a user study (N=17), assessing its impact on physical and cognitive workload and analyzing the usage and effect of multiple perspectives during virtual object manipulation in constrained environments. Participants reported improved spatial understanding and ease of interaction, though occasional misplacement errors occurred when relying solely on static views. These findings suggest that integrating additional perspectives into AR interfaces can effectively enhance interaction in complex and constrained environments.
Enhancing Spatial Understanding in Mixed-Reality Presentations
Mixed reality (MR) presentations often involve a presenter wearing a head-mounted display (HMD) and an audience watching via a large display, making it difficult for audiences to perceive spatial relationships between the presenter and virtual objects. We report two experiments testing three design variations: (1) scene camera placement (audience-aligned vs. opposite), (2) overlaying the presenter’s first-person view, and (3) highlighting objects in the presenter’s view. Results show that audience-aligned cameras and object highlighting improve spatial understanding, while combining third- and first-person views can further aid perception. We derive design guidelines for configuring MR presentations to better support audience comprehension.
Towards Understanding how Changing Translation Gain Affects Detection Thresholds
Redirected Walking (RDW) enables users to explore expansive virtual environments within limited physical spaces by subtly manipulating the mapping between their physical and virtual movements. One such manipulation, translation gain, alters the scale of the user’s virtual forward movement relative to their physical forward movement. The primary objective of the presented study (n = 35) was to understand how changing the user’s translation gain in a constant manner affects their ability to detect the manipulation. Specifically, the study presented users with three different rates of change (slow, moderate, and fast) and two directions (increasing and decreasing) for the applied translation gain. The study was conducted using a “Method of Limits” psychometric technique, which allows much quicker collection of the user’s detection threshold than other psychometric techniques used in prior RDW literature. Our results show that both rate of change and direction had a significant effect on participants’ detection thresholds, but also suggest that time of exposure to noticeable translation gain manipulations may affect detection thresholds as well. Finally, we discuss these findings, their potential implications, and relevant future work.
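For readers unfamiliar with translation gain, the sketch below shows how a constantly changing gain might be applied to per-frame forward movement. The ramp rate, frame rate, and function names are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def apply_translation_gain(physical_steps, start_gain=1.0, rate_per_s=0.05, dt=1 / 90):
    """Scale per-frame physical forward displacement by a linearly changing gain.

    physical_steps: per-frame physical forward displacements (e.g., metres).
    rate_per_s: how quickly the gain ramps (positive = increasing condition).
    """
    gains = start_gain + rate_per_s * dt * np.arange(len(physical_steps))
    return np.asarray(physical_steps) * gains   # virtual displacement per frame

# Example: ~1 s (90 frames) of 1 cm physical steps under a slowly increasing gain.
virtual_steps = apply_translation_gain([0.01] * 90, start_gain=1.0, rate_per_s=0.05)
```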
Enhancing Immersive Virtual Reality Experiences with Multiple Tasks Prediction Using Pre-Trained Large Foundation Models
Immersive virtual reality (VR) environments pose significant cognitive and physical challenges as users engage in multitasking scenarios involving attention management and working memory, often leading to increased cognitive load, sensory conflicts, and cybersickness that diminish users’ performance and immersion. While traditional machine learning (ML) and deep learning (DL) methods have been employed to predict individual factors such as cybersickness or attention, they often fail to capture the interconnected and dynamic nature of these cognitive and physiological demands. Moreover, these methods typically require large volumes of labeled data and extended training times, and struggle to generalize across diverse VR contexts. To address these limitations, we propose an innovative method for predicting multiple tasks, i.e., cybersickness, cognitive load, working memory, and attention, by leveraging the knowledge of pre-trained large foundation models, namely TimeGPT and Chronos. We apply two learning mechanisms, zero-shot and few-shot learning, to adapt these foundation models for multiple task predictions. We validate our approach on the open-source VRWalking dataset, utilizing multimodal data fusion and participant-specific grouping (based on age and gender), and compare it against traditional DL-based methods trained from scratch. Results show that our few-shot fine-tuned TimeGPT and Chronos models significantly outperform traditional DL models across multiple tasks. Specifically, the fine-tuned TimeGPT model achieves significantly lower RMSE values for predicting cybersickness, cognitive physical load, cognitive mental load, working memory, attention success rate, and reaction time, outperforming the traditional Transformer. Furthermore, the fine-tuned TimeGPT model achieves a 4.52× reduction in training time compared to a conventional Transformer model for the same prediction tasks. Moreover, we deploy the fine-tuned TimeGPT model on the HTC VIVE Pro VR headset, enabling real-time prediction of multiple task severity levels from streaming VR simulation data during gameplay.
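As a rough illustration of the evaluation loop described above, the sketch below conditions a forecaster on a short context window (few-shot style) and scores the next horizon with RMSE. The `forecaster.predict` call is a hypothetical stand-in for any pre-trained time-series foundation model and does not reproduce the TimeGPT or Chronos APIs.

```python
import numpy as np

def rmse(pred, target):
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def evaluate_forecaster(forecaster, series, context_len=256, horizon=16):
    """Condition on a short context and score predictions for the next horizon."""
    context = series[:context_len]
    target = series[context_len:context_len + horizon]
    pred = forecaster.predict(context, horizon)   # hypothetical interface
    return rmse(pred, target)
```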
SESSION 8: Affective, Collaborative, and Social Interaction
Effects of Co-speech Gesture Size of Virtual Agents on Persuasive Communication
Co-speech gestures are crucial for enriching both human–human and human–agent communication. Yet, the specific impacts of gesture size—especially when generated by advanced data-driven techniques—remain underexplored. This study investigates how varying gesture sizes affect human–agent interactions across two distinct persuasive contexts (informational and emotional), with a focus on social outcomes such as persuasion and empathy. We conducted two controlled experiments, each involving 36 participants, comparing three gesture conditions: minimal, small, and large gestures. Experiment 1, set in an informational sales context, showed that small and large gestures significantly enhanced persuasive effectiveness, social presence, and communication quality compared to the minimal gesture condition, although no meaningful differences emerged between small and large gestures. In contrast, Experiment 2, situated in an emotionally charged context, revealed that larger gestures progressively amplified both persuasive impact and perceived empathy. These findings highlight that gesture size matters in emotionally intensive communication and underscore the substantial social benefits of deep-learning techniques for gesture generation.
A Silent Negotiator? Cross-cultural VR Evaluation of Smart Pole Interaction Units in Dynamic Shared Spaces
As autonomous vehicles (AVs) enter pedestrian-centric environments, existing vehicle-mounted external human–machine interfaces (eHMIs) often fall short in shared spaces due to line-of-sight limitations, inconsistent signaling, and increased decision latency for pedestrians. To address these challenges, we introduce the Smart Pole Interaction Unit (SPIU), an infrastructure-based eHMI that decouples intent signaling from vehicles and provides context-aware, elevated visual cues. We evaluate SPIU using immersive VR-AWSIM simulations in four high-risk urban scenarios: four-way intersections, autonomous mixed traffic, blind spots, and nighttime crosswalks. The experiment was developed in Japan and replicated in Norway, with forty participants completing 32 trials each under both SPIU-present and SPIU-absent conditions. Behavioral (response time) and subjective (acceptance scale) data were collected. Results show that SPIU significantly improves pedestrian decision-making, with response-time reductions ranging from 40% to over 80% depending on scenario and cultural context, particularly in complex or low-visibility scenarios. Cross-cultural analyses highlight SPIU’s adaptability across differing urban and social contexts. We release our open-source Smartpole-VR-AWSIM framework to support reproducible, immersive behavioral studies and the global advancement of infrastructure-based eHMI research.
Toward Multimodal Asynchronous Collaboration in VR Artistic Creation with S.P.A.R.K
Recent artistic explorations in VR have examined how users engage with virtual spaces, sounds, and bodies, whether as artists, coders, or spectators. While many applications focus primarily on musical interaction or motion capture, few allow users to actively compose spatial and temporal relationships within a multimodal collaborative creation context that merges different artistic modalities. In this project, we present a Virtual Reality application that enables novel forms of collaboration between dancers and musicians. The application allows dancers to record full-body performances, which musicians can then use as the basis for sonic composition by spatially mapping sound triggers onto the dancer’s recorded movement. Rather than relying on live capture or real-time gesture tracking, our approach treats movement as a timeline for interaction, blending choreography with sound design in an asynchronous workflow. We employ an iterative design process to ensure usability among experts. This paper details the first implementation and a study involving 10 participants recruited from professional and amateur artists with electronic music backgrounds, highlighting positive reception of the application’s creative potential and usability.
Exploring Bichronous Collaboration in Virtual Environments
Virtual environments (VEs) empower geographically distributed teams to collaborate on a shared project regardless of time. Existing research has separately investigated collaborations within these VEs at the same time (i.e., synchronous) or different times (i.e., asynchronous). In this work, we highlight the often-overlooked concept of bichronous collaboration and define it as the seamless integration of archived information during a real-time collaborative session. We revisit the time-space matrix of computer-supported cooperative work (CSCW) and reclassify the time dimension as a continuum. We describe a system that empowers collaboration across the temporal states of the time continuum within a VE during remote work. We conducted a user study using the system to discover how the bichronous temporal state impacts the user experience during a collaborative inspection. Findings indicate that the bichronous temporal state is beneficial to collaborative activities for information processing, but has drawbacks such as changed interaction and positioning behaviors in the VE.
ArithMotion: Peer-Relative Motion Generation for Social VR via Arithmetic Metaphor
In social VR, bodily motions are a major nonverbal channel for expressing intent or emotion. However, freely making bodily motions is not always possible due to the unaffordability of rich tracking devices, physical disabilities, or social and spatial constraints. While current social VR platforms provide methods such as emotes, expressions are limited to a finite preset. To facilitate open-ended and socially aligned motion in constrained environments, our insight is the peer-relativity found in everyday interaction. Specifically, we propose ArithMotion, an end-to-end system that generates peer-relative motions by combining generative models with arithmetic-inspired interaction. We fully implemented and iteratively refined the system. User studies show that participants experienced novel, open-ended expressions closely tied to social context.
Enhancing the Audience Experience for VR and AR Theatre with AI-generated Subtitles
Recent technological developments in AI and immersive media are transforming the artistic landscape, providing novel mechanisms for artists and audiences. Following a human-centric approach, together with a theatre company in Greece, this paper investigates how subtitle placement affects user experience and cognitive load in a live theatre performance enhanced by AR glasses. To do so, we design and develop a system for displaying subtitles in VR and AR. We evaluated the system in two conditions (N = 19; N = 12), both in a controlled environment (VR) and in an actual theatre (AR). In the latter, we integrate AI solutions to provide automatic captioning and translation in real time, and VFX to further augment the experience. Our quantitative and qualitative results showed no difference between subtitle placements in terms of cognitive load and user experience, with users equally liking the two proposed approaches. Results also highlighted the perceived usefulness of AR for enhancing theatre performances, indicating new paths for wider accessibility and further immersion.
SESSION 9: Avatars, Agents, and Embodiment
The 2×2 of Being Me and You: How the Combination of Self and Other Avatars and Movements Alters How We Reflect on Ourselves in VR
Effective self-reflection is crucial for motor skill acquisition, yet it is challenging to facilitate in single-user VR training environments. We investigate this through a method where users are embodied as a virtual trainer and prompted to actively evaluate a recorded performance. In an empirical study, we systematically varied the trainee’s appearance and their movements. Our mixed-methods analysis reveals that confronting one’s own performance triggers a fundamental role conflict between the user’s identity as the performer and their new role as the evaluator. Most importantly, this conflict challenges a binary view of embodiment. Participants experienced a multi-faceted sense of self, oscillating between identifying with the trainee and detaching as the trainer. Our work contributes a novel characterization of embodied self-evaluation, revealing a psychological duality at its core and offering clear design implications for VR systems that foster self-insight in training and therapy.
Investigating How to Control Virtual Spiders While Embodying Them in Virtual Reality
Virtual Reality (VR) enables users to embody avatars with vastly different appearances and anatomies. Embodying virtual spiders, with their alien morphology, could offer exciting experiences for immersive VR in gaming or education. While prior research has explored embodiment of human-like avatars and even non-human forms such as animals, it remains unclear how best to control anatomically distinct avatars such as spiders. In this exploratory study, we systematically compared four control methods—standard VR controller, hand control, half-body control, and full-body control—while embodying a spider in VR. Using a repeated-measures design with 20 participants, we assessed each control method in terms of embodiment, usability, and perceived exertion. Results indicate that half-body control offered the best overall balance, with the highest usability and lowest exertion, while maintaining a level of embodiment comparable to the other methods. Full-body control was rated significantly lower in usability and higher in perceived exertion. These findings suggest that half-body control may provide a good balance between realism and usability for embodying spiders in VR.
VReflect: Evaluating the Impact of Perspectives, Mirrors and Avatars in Virtual Reality Movement Training
Virtual reality training systems require the careful design of content presentation, user embodiment, and overall user experience. We explore the impact of different perspectives (first-person and third-person) and virtual self-visualization techniques (VSVTs: mirrors and external avatars) on user embodiment, performance and experience. In a study with 28 participants learning karate movements, we tested four combinations of these factors. Results indicate that perspective influences visual focus and embodiment, while VSVTs affect movement execution, particularly in the third-person avatar condition. Measurements of physiological activity, workload, presence, and enjoyment found no significant overall advantages for any of the conditions. Interviews revealed that most participants preferred the familiar first-person mirror combination, although participants in third-person perspective focused more on their own body and noted the helpfulness of this viewpoint. The study demonstrates that alternative perspectives and visualization techniques offer valuable training options, as these conditions did not produce significant differences in measured cognitive load when compared with each other. Future VR training systems should incorporate interactive feedback and customization options to accommodate individual preferences and optimize learning experiences.
Joining the Circle: Human Entry Behavior in a Mixed Reality F-Formation with Agent, Avatar, and Human Partners
According to Hall’s theory, the space individuals maintain between one another depends on relational closeness and situational context. Prior research suggests that interpersonal distance (IPD) varies not only between virtual and real humans, but also among virtual humans depending on their perceived agency. However, little is known about how people spatially negotiate entry into mixed groups comprising different types of agents in extended reality (XR) settings. In this study, we examine participants’ entry behavior as they join a circular F-formation composed of three distinct entities: an agent, an avatar, and a real human. Specifically, we investigate how participants position themselves relative to each entity, analyzing their preferences and behaviors in terms of IPD and entry dynamics. Our findings reveal that participants maintained the greatest IPD from the real human, followed by the avatar and the agent, suggesting nuanced social distinctions among these three entities. Furthermore, when the real human was absent, participants tended to maintain a greater distance from the avatar compared to the agent. These results offer valuable insights for the design of XR collaboration environments and for understanding social dynamics in multi-agent interactions.
How Avatar User Visual Incongruities Impact the Sense of Embodiment in Virtual Reality: A Systematic Review
Virtual reality (VR) is considered a technological megatrend that is driving the digitization of all aspects of human life. Avatars, virtual bodies controlled by users, play an important role in VR. Many scholars consider the sense of embodiment as a key affordance of avatars. However, our understanding of how to optimize embodiment through avatar representation in VR remains underdeveloped. This study systematically reviewed a body of 43 studies from 41 research papers that investigated the manipulations of avatars in VR. The corpus was coded with head-mounted display (HMD) models, avatar creation tools, mirror use, task context, and avatar manipulations. Based on these experiment-based studies, we discuss how different types of avatar representations affect users’ embodiment. Based on the findings, we indicate practical implications of avatar design in VR applications.
I feel you: The Impact of Emotional Virtual Characters on Emotional State, Player Experience, and Connectedness in VR Games
Engaging in digital games, in particular using highly immersive VR technology, has emerged as a coping mechanism to escape negative feelings and create positive emotions. It is therefore interesting to investigate which mechanisms in digital games influence players’ emotions. One potential mechanism is using virtual characters as companions, as they can positively affect the emotional state and the player experience. However, the impact of the virtual character’s specific emotions on players has been studied less. The presence of others can positively influence people and improve their well-being when they are in a bad emotional state. According to Emotional Contagion theory, these positive effects can be achieved by the presence of positive-minded others or, following Emotional Similarity theory, also by the presence of negative-minded others. Therefore, we investigate how virtual characters in different emotional states affect players when they are sad and immerse themselves in a game. Hence, we posed the following research question: “How do different emotions of a virtual character affect emotional state, player experience, and connectedness when players are in a sad emotional state?”. Our lab study, in which 75 participants were put in a sad emotional state, revealed significant differences between playing a cooperative VR game with either a happy or a sad virtual character regarding participants’ emotional state and connectedness, but not regarding player experience. We discuss the results and implications of emotional virtual characters to enhance well-being.
Co-embodied Mirroring: Investigating the Effects of Movement Blending on Partner Impressions in Virtual Environments
The mirroring effect has been demonstrated to facilitate smooth social interactions. Digital technology can automatically implement mirroring to achieve more natural interactions in contexts where spontaneous mirroring is challenging, such as remote human communication or dialogue with non-human agents. However, many existing automatic mirroring systems replicate a user’s movements with a time delay, and when applied to full-body interactions between human users, such systems often result in excessive and unnatural mimicry. To address this issue, this study examines the effectiveness of Co-embodiment, a technique that blends the movements of a user and their partner in real time, as a method for achieving both the benefits of mimicry and the naturalness of interaction. We examined how varying the blending ratio (0%, 25%, 50%) affects social impressions in a two-person interaction in a virtual environment. Results showed that moderate blending (25%, 50%) enabled natural and comfortable interactions, comparable to no blending (0%). However, we did not observe significant improvements in perceived closeness or trustworthiness. Semi-structured interviews helped explain these results, revealing that participants differed in their interpretations of subtle synchronized movements. Some viewed them as signs of mutual understanding or cooperation, while others experienced discomfort. In some cases, this discomfort seemed specific to virtual contexts, as subtle similarities made participants doubt the presence of a real human behind the avatar. These findings highlight that impressions depend on whether blended movements feel socially responsive or merely imitative, stressing the need for interaction designs that enhance the sense of social presence.
Compensating Motion-Induced Errors in Smartphone-Based VR Avatar Reconstruction
Recent developments in smartphone-based avatar reconstruction have made the creation of personalized and realistic avatars significantly more accessible. However, relying on a single smartphone camera means images are captured sequentially, which introduces new challenges: in particular, longer capture times increase susceptibility to subject motion, resulting in degraded reconstructions.
We present a novel approach for smartphone-based avatar reconstruction that combines photogrammetry, silhouette constraints, and inverse rendering to produce high-fidelity, realistic avatars free of motion-induced artifacts. By using short, motion-resilient image sequences, referred to as sub-scans, we considerably reduce motion-induced artifacts. Our pipeline achieves high visual quality while offering improved robustness and outperforms current state-of-the-art methods in terms of computation time and accuracy.
SESSION 10: Security and Systems
Synthesizing Evidence-Based AR Design Recommendations and Identifying Gaps in Practice
From handheld devices to head-mounted displays, augmented reality (AR) technologies are becoming commonplace in everyday settings, supporting tasks in education, healthcare, gaming, and beyond. Prior research has developed a number of evidence-based design recommendations for AR apps. However, these recommendations are often scattered across academic literature and differ in scope and focus. In addition, there are still open research questions about the degree to which existing guidelines are applied in practice, particularly in handheld AR contexts. To address these gaps, we synthesized AR design recommendations from academic literature and organized them into an integrated set of guidelines. We then empirically analyzed 52 commercial handheld AR apps to assess how well they align with these guidelines. We found that while most apps follow basic usability guidelines, such as using familiar UI layouts, many apps do not adopt context-aware features, offer limited support for multimodal interaction and feedback, and overlook key usability practices such as onboarding and navigational aids. In addition, we saw very few guidelines related to data privacy, collaborative AR, safety and accessibility. We contribute a synthesis of evidence-based AR recommendations and identify key areas of disconnect between recommendations and practice for handheld AR apps, which aids future designers and developers.
Beyond the Headset: A Systematization of Knowledge on Extended Reality Privacy and Security in Healthcare
Extended reality (XR) systems offer transformative potential for healthcare in domains ranging from surgical planning to remote rehabilitation and mental-health therapy. The rich streams of sensor, biometric, and environmental data that enable these applications, however, also create novel and poorly understood privacy and security vulnerabilities: adversaries can exploit unencrypted signaling, sensor side-channels, and application-layer flaws to infer sensitive patient information or disrupt clinical workflows. Nevertheless, few thorough Systematizations of Knowledge (SoK) currently examine XR for healthcare. In this SoK, we survey 65 peer-reviewed works published between 2017 and 2024 across leading XR, security, and privacy venues, synthesizing a unified threat taxonomy that spans device, network, user, and cloud layers. We introduce a quantitative evaluation framework, XR-PRISM (Privacy and Risk Impact Scoring Metric), drawing on adapted risk scores, detection performance, and usability assessments to rigorously assess the level of security and privacy risks. Our analysis reveals critical gaps: over 70% of countermeasures lack standardized risk evaluations, fewer than 15% include high prerequisites to launch an attack, and reproducibility is hampered by scarce artifact releases. Finally, we chart a research roadmap advocating for open benchmark suites with shared datasets, artifact disclosure policies, cloud-layer protections, and robust detection and recovery mechanisms. By quantifying “what works—and by how much,” this SoK provides a data-driven foundation for developing secure, privacy-preserving, and usable XR healthcare technologies.
User Identification in Virtual Reality through Behavioral Biometrics and the Influence of Colocated Interactions
Behavioral biometrics in Virtual Reality (VR) allow for implicit user identification, as the head and hand movements captured from the head-mounted display and the controllers are highly descriptive of the user’s true identity. Such body movements have been explored in the past; however, to date, it is unclear how they perform in settings where more than one person interacts in a shared virtual environment. In this work, we explored through a user study (N=40) how behavioral biometrics in VR change when one or more persons interact with each other in a shared virtual environment, and whether this is influenced by the nature of the interaction itself. We find that user identification is possible with up to 83.38% accuracy by applying deep learning models, and that particularly cooperative interactions between multiple VR users lead to highly identifiable body movements. Our results help advance behavioral biometrics for seamless user identification in VR as a viable alternative to PINs and passwords.
Motion Forecasting Attacks on Behavioral Biometric Authentication Systems in Virtual Reality
Inspired by behavioral biometrics for keystroke- and touch-based systems, a large body of work has emerged over the past decade on using user behavior in VR applications as a signature of the genuine user. Recent work on forecasting approaches for behavioral biometrics in VR helps address a key challenge in existing approaches, where complete user movement signatures are needed to authenticate the user. Forecasting-based approaches enable VR authentication systems to use limited user behavior data and forecast future movement trajectories. However, forecasting-based approaches present a new concern: malicious users can exploit the predictability of user motions to launch an attack. In this paper, we present the first forecasting-based attack model against VR authentication systems that rely on behavioral biometrics. We propose a two-phase approach to assess authentication performance and adversarial risk. Phase 1 develops a Fully Convolutional Network for authentication using VR motion data, evaluating stochastic gradient descent (SGD) and Adam optimizers with Equal Error Rate (EER) as the primary metric. Phase 2 introduces a forecasting attack, where partial motion sequences from an impostor are fed to a Transformer model to generate future trajectories that resemble genuine user behavior to the authenticator, enabling the impostor to deceive the authentication system. Experimental results demonstrate the attack’s effectiveness, achieving an EER as low as 0.0346 and exposing security risks in motion-based authentication. These findings underscore the urgent need for robust countermeasures to defend against predictive motion attacks in VR environments. Our code is shared at: https://bit.ly/4n1GtxG.
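Since both phases above are scored with Equal Error Rate, a minimal reference computation of EER from genuine and impostor score arrays may help readers interpret the numbers. The scoring convention (higher score = more likely genuine) and the brute-force threshold search are assumptions for this sketch, not the paper's code.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Find the operating point where false accept rate ~= false reject rate."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, best_eer = np.inf, None
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # impostors wrongly accepted
        frr = np.mean(genuine_scores < t)     # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```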
Not Just Who You Are, but Where and How: Modeling XR Authentication Scenarios
Authentication in extended reality (XR) presents unique challenges due to embodied interaction, spatial immersion, and variable environmental conditions. As XR systems become more prevalent, secure and usable authentication mechanisms are critical. However, current research often overlooks the scenarios in which these mechanisms operate, limiting comparability, reproducibility, and real-world applicability. This paper addresses this gap by presenting a structured model of XR authentication scenarios. We conducted semi-structured interviews with experts in the Usable Security and Privacy domain to identify key scenario dimensions influencing the design and evaluation of XR authentication mechanisms. Through thematic analysis, we identified dimensions related to contextual parameters, environmental conditions, and XR-specific properties. The resulting scenario model was validated through literature mapping and demonstrated via a realistic use case. Our work provides a foundation for context-aware design and more rigorous evaluation of authentication mechanisms across diverse XR environments.
Predictability-Aware Motion Prediction for Edge XR via High-Order Error-State Kalman Filtering
As 6G networks evolve, offloading extended reality (XR) applications emerges as a key use case, leveraging reduced latency and edge processing to migrate computationally intensive tasks, such as rendering, from user devices to the network. This enables lower battery consumption and smaller device form factors in cellular environments.
However, offloading incurs delays from network transmission and edge-server queuing, particularly under multi-user concurrency, resulting in elevated motion-to-photon (MTP) latency that degrades user experience. Motion prediction techniques, including deep learning and Kalman filtering (KF), have been proposed to compensate, but deep learning struggles to scale at resource-constrained edges as user loads grow, while traditional KFs struggle to handle complex motions and packet loss over 6G’s high-frequency interfaces.
To address these challenges, we introduce a context-aware error-state Kalman filter (ESKF) framework for forecasting user head motion trajectories in remote XR, integrating a motion classifier that categorizes movements by predictability to minimize prediction errors across classes. Our results show that this optimized ESKF outperforms conventional Kalman filters in positional and orientational accuracy, while demonstrating superior robustness and resilience to packet loss.
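As a simplified illustration of latency-masking motion prediction, the sketch below runs a plain constant-velocity Kalman filter on a one-dimensional head-position track and extrapolates a few steps ahead. It is not the authors' error-state formulation, and the noise parameters and frame interval are assumed.

```python
import numpy as np

def kf_predict_ahead(z_history, dt=0.011, horizon_steps=5, q=1e-3, r=1e-2):
    """Filter observed positions with a constant-velocity model, then extrapolate.

    z_history: observed 1D head positions; horizon_steps: frames of MTP latency to mask.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity state transition
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q, R = q * np.eye(2), np.array([[r]])
    x, P = np.zeros((2, 1)), np.eye(2)
    for z in z_history:                      # standard predict/update over the track
        x, P = F @ x, F @ P @ F.T + Q
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
    for _ in range(horizon_steps):           # extrapolate to the prediction horizon
        x = F @ x
    return float(x[0, 0])                    # predicted position at the horizon
```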
When AR Hinders Performance: The Hidden Costs of Video-See-Through Displays
Head-mounted displays (HMDs) are increasingly used in safety-critical fields such as surgery, aviation, and industrial manufacturing. As major manufacturers shift toward video see-through (VST) designs to deliver unified AR and VR experiences, they also replace direct visual access to the real world with a video feed. This design choice raises concerns about its impact on user performance. This study investigates the isolated impact of VST and optical see-through HMDs on real-world perceptual–motor performance by comparing two leading HMDs, the Apple Vision Pro (AVP) and the HoloLens 2 (MHL2), against unencumbered vision using the Purdue Pegboard Test (PPT), a standard assessment of manual dexterity. Twenty participants completed tasks across three conditions (AVP, MHL2, and Baseline), while we recorded dexterity scores, cognitive load, system usability, VR sickness, and subjective feedback. Movement data were also collected via Apple Watches. Results revealed that dexterity scores significantly declined under the AVP condition across all subtests, accompanied by significantly higher cognitive load and a notable drop in RMS acceleration values (observed in the RMS analysis of a subset of 13 participants). The dexterity analysis yielded a significant difference between MHL2 and Baseline for only a single subtest of the PPT (Left Hand). Post-task interviews revealed greater discomfort, visual fatigue, and reduced task confidence with the AVP. These findings suggest that current VST HMDs impose a hidden ergonomic cost, undermining user performance in tasks where precision and comfort are essential. For AR applications designed to enhance user performance, such as assistive tools, training systems, or task guidance interfaces, designers must account for and mitigate this performance degradation through counterbalancing strategies that offset the visual and cognitive burden introduced by VST HMDs.
SESSION: VRST 2025 Poster Abstracts: Interaction Design and Input Techniques I
VR Eye Tracking Data for Gender Identification: A Look at Same-Domain and Cross-Domain Scenarios
Prior research has shown that cross-domain gender identification (GI) in VR is challenging, often due to limited overlapping features and a lack of shared users across datasets. In this work, we examine two distinct VR environments—a solar panel task and a biological exploration task—using a consistent feature set and eye-tracking (ET) data from common users. Our results confirm that cross-domain classification is substantially harder than domain-specific tasks and highlight head position as a key feature. Importantly, we show that incorporating common users improves model performance, emphasizing the role of user overlap in enhancing the generalizability of GI models in VR.
SESSION: Locomotion and Wayfinding
Visual Constraints Impact on Steering in VR Driving Simulation
Human vision guides lane keeping and hazard anticipation during driving. However, isolating how visual field constraints affect steering is difficult in real driving. This study used immersive VR with a depth-aligned aperture to restrict vision while participants drove curved roads under several conditions. Results revealed that misaligned restrictions impaired steering, while tangent point alignment partly improved performance. Results highlight how VR can probe visual–motor mechanisms in driving.
Detection of Translation Gain is Decreased When Virtual Reality Users Are Unaware of Its Presence
The prevalent evaluation methods used to estimate detection of redirected walking are based on methods from psychophysics that require users to know their virtual movements are being manipulated. However, this higher-than-normal level of attention toward their movements yields conservative detection thresholds. We find that participants who were unaware that redirected walking (translation gain) was applied detected the technique at a significantly higher gain than users who were aware (at gains of 1.73 and 1.38, respectively). We provide evidence that redirected walking-based navigation solutions may be able to leverage gain values that are larger than the current threshold guidelines would suggest.
Visualizing Time-Dependent Navigation Zones in Mixed Reality
Using Mixed Reality (MR) to plan the motion of robots, for example the path of a drone, is an effective way to optimize their movements in collaborative working environments. Path planning using MR allows for the incorporation of motion constraints arising from real-world obstacles, virtual objects, and time-scheduling considerations.
We address time-dependent path planning of a drone in a three-dimensional dynamic indoor environment using MR. We evaluate visualization methods which convey the planned path of the drone and which show time-dependent safety zones around real and virtual obstacles. Our results with 51 participants show the effectiveness of the visualization to support interactive path planning tasks. The contribution of our research is a novel MR visualization method for three-dimensional and time-dependent (4D) path planning.
Vibrotactile Feedback to Make Real Walking in Virtual Reality More Accessible for People With and Without Mobility Impairments
This research aims to examine the effects of various vibrotactile feedback techniques on gait (i.e., walking patterns) in virtual reality (VR). Prior studies have demonstrated that gait disturbances in VR users are significant usability barriers. However, adequate research has not been performed to address this problem. In our study, 39 participants (with mobility impairments: 18, without mobility impairments: 21) performed timed walking tasks in a real-world environment and identical activities in a VR environment with different forms of vibrotactile feedback (spatial, static, and rhythmic). Within-group results revealed that each form of vibrotactile feedback improved gait performance in VR significantly compared to the no vibrotactile condition in VR for individuals with and without mobility impairments. Moreover, spatial vibrotactile feedback increased gait performance significantly in both participant groups compared to other vibrotactile conditions.
Seeing With Sound in Safe Virtual Environments: A Walk-In-Place VR Training System for Users With Visual Impairment Using the vOICe Algorithm
We present a virtual reality (VR) training system that supports safe mobility skill development for low-vision users through visual-to-auditory sensory substitution. The system combines the vOICe algorithm with walk-in-place locomotion to enable navigation in immersive environments while minimizing physical risks and spatial requirements. Training with the system follows a two-phase structure: an initial learning phase to build familiarity with visual-to-audio substitution, followed by a navigation phase in which users apply auditory cues to explore and reach destinations in VR. The system provides a safe, controlled environment for developing non-visual spatial awareness and serves as an early exploration of a platform for evaluating sensory substitution techniques. Through this work, we aim to contribute to solutions that promote greater independence in mobility for visually impaired users.
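For context, the vOICe mapping scans an image from left to right, converting vertical position to pitch and pixel brightness to loudness. The toy sketch below illustrates that principle with assumed frequency range and sweep duration; it is a simplified illustration, not the training system's implementation.

```python
import numpy as np

def image_to_soundscape(img, duration_s=1.0, fs=22050, f_lo=500.0, f_hi=5000.0):
    """img: (rows, cols) grayscale array in [0, 1]; returns a mono audio buffer."""
    rows, cols = img.shape
    col_len = int(duration_s * fs / cols)         # samples spent on each image column
    t = np.arange(col_len) / fs
    freqs = np.linspace(f_hi, f_lo, rows)         # top of the image maps to high pitch
    audio = []
    for c in range(cols):                         # horizontal scan becomes the time axis
        tones = [img[r, c] * np.sin(2 * np.pi * freqs[r] * t) for r in range(rows)]
        audio.append(np.sum(tones, axis=0) / rows)  # brightness drives loudness
    return np.concatenate(audio)
```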
SESSION: Cybersickness, Health, and Digital Twins
Meltdown: Bridging the Perception Gap in Sustainable Food Behaviors Through Immersive VR
Climate change education often struggles to connect personal actions with environmental consequences. Meltdown is an immersive VR escape room that teaches sustainable food consumption and waste practices through scenario-based tasks and consequence-driven feedback. A user study (N = 36) found significant gains in familiarity, confidence, and behavioral intentions, with modest knowledge improvements. Exploratory metrics (n = 13) showed high accuracy on familiar decisions but lower accuracy on less intuitive ones. These findings suggest that consequence-driven VR can effectively engage learners, link everyday choices to visible outcomes, and foster sustainable behavior change.
Accessible VR for Older Adults: Mounting without Straps
Older adults face ergonomic barriers in virtual reality (VR), particularly with head-mounted displays (HMDs), which are heavy and poorly suited for fragile musculoskeletal systems. This poster introduces a 3D-printed support system that alleviates the weight of the headset by redistributing it to an external frame. This solution promotes accessibility by allowing passive, seated VR experiences, enabling better comfort and extending the use of VR for seniors in health, leisure, and telepresence contexts.
Background Sound Tempo Modulation Can Influence Scene-Specific Memory in Virtual Reality
Sustaining user memory in digital environments such as virtual reality (VR) is a significant challenge. We show that temporary tempo modulations in background music (BGM) can selectively and naturally enhance memory in VR. In a user study (N = 20), decreasing the BGM tempo by approximately 21% significantly improved recall of events. These findings point to a new acoustic design approach that adapts scene by scene to narrative pacing and importance while maintaining a natural user experience.
Varying Ecological Validity of the Virtual Environment Influences Soccer Pass Reaction Times
Keeping It Clean: A VR Simulation for Dental Sterilization
This paper explores the applicability of virtual reality (VR) to dental education, specifically focused on learning the routine but critical process of sterilizing dental equipment. We present a novel sterilization training prototype and evaluate the extent to which educators and practicing hygienists would implement VR training tools in the future. A survey was conducted with dental providers (N=28), and a subsequent in-person evaluation of the prototype was conducted with educators (N=7) in a dental hygienist training program. Results suggest overall positive opinions of VR for dentistry, while also offering suggestions for improving the prototype.
SESSION: Interaction Design and Input Techniques II
Interior design method in AR based on AI with a gesture modification
Computer-aided design has a long history. Such techniques are founded on the construction of a three-dimensional model using Building Information Modelling software or dedicated visualisation software. These processes can be supported by modern inventory techniques, and the visual effects can be processed graphically and exported to various environments, including virtual reality. Integrating AI-powered model generation into the design process can advance spatial modeling techniques, and incorporating interaction-based technology can enhance existing architectural design methods. We propose a solution that enables the generation of 3D models through voice interaction, with the possibility of modification through gestures.
SESSION: Human Factors and Perception
Depth-Shifting Aerial Image Display Using Angle Changes Between the Display and Optical Elements
We propose a new optical system that creates three-dimensionally movable aerial images with a minimal optical configuration. The proposed system achieves a depth shift of the aerial image by adjusting the angle of each element. Experimental results showed that the proposed optical system can reproduce depth shifts and improves the luminance and luminance uniformity of the aerial image compared to the conventional linear-movement method.
An Experimental Study of Tilt Sensation Displayed with a Combination of Visual and Physical Tilt: A Case of an Interactive VR Work “Gravity Paradox”
The authors developed an interactive VR work in which the player’s goal is to escape from a mansion by changing the direction of gravity. This gravity change is represented by visual and physical tilt: the visual tilt is displayed through an HMD, and the physical tilt is presented through a motion platform. This presentation was effective, and players enjoyed the work at exhibitions. In this paper, the authors conducted an experiment to study the effect of adding physical tilt to visual tilt, particularly its contribution to the tilt sensation. Participants were presented with different tilt stimuli (images with varying tilt angles and the presence or absence of physical tilt) and asked to report their perceived tilt angles. Participants felt more tilted when physical tilt was presented. Moreover, although the perceived tilt angle seemed to depend on the history of the visual tilt angle, this phenomenon need not be considered when only simple tilt emphasis is required.
An Experiment on a High-Speed Image Projection Perceived Only During Smooth Pursuit using Striped Patterns
We have proposed a novel image display technique that uses high-speed projection to present different images depending on the user’s gaze movement direction. This method projects stripe-based frames at high refresh rates; when a user moves their gaze in a specific direction, previously hidden visual content becomes perceptible due to temporal interference patterns. In this method, the optimal gaze velocity is determined from parameters such as the decomposition parameters of the striped patterns and the projection speed of these images. However, it is known that in the human visual system, gaze velocity lags behind the target object’s velocity during pursuit eye movements. Therefore, we conducted an experiment with human subjects to compare the theoretically optimal gaze velocity with the actual measured velocity. The results suggest that the velocity of the guide point used to induce gaze movement should be set to approximately 1.5 times the theoretical optimal gaze velocity.
SESSION: Multimodal Experiences
SketchTo3DGen : GenAI Powered Articulation Ready 3D Asset Ideation using 3D Sketches and Audio Descriptions
We present SketchTo3DGen, a novel system for rapid 3D content ideation on a VR headset. SketchTo3DGen combines freehand 3D sketching and audio descriptions to generate photo-realistic 3D assets on the fly. Running on a Meta Quest headset with a Unity application, our system leverages remote GPU-accelerated services for AI-driven content creation using intuitive inputs. The user draws a 3D sketch in mid-air and describes the intended asset verbally; our pipeline transcribes and normalizes the speech into a text prompt, selects informative viewpoints of the 3D VR sketch, generates corresponding images via a state-of-the-art text-to-image model, and finally reconstructs a 3D mesh using an image-to-3D generator. The entire workflow is experienced in VR with minimal interface elements. We describe the design motivations, technical pipeline, and user interaction details of SketchTo3DGen. This in-headset pipeline, using intuitive inputs in the form of a hand-drawn 3D VR sketch and speech, streamlines 3D modeling and accelerates the generation of articulation-ready 3D assets.
SESSION: Immersive visualization and Interaction
Visualizing Simulated Airflow and Thermal Comfort in Extended Reality
We propose an XR (eXtended Reality) system that simulates and visualizes airflow and thermal comfort controlled by an air conditioner. This system reconstructs an indoor space to make its 3D map, tracks the air conditioner’s pose without markers, runs fluid and thermal comfort simulators, and visualizes the simulation results on a mobile XR device. It enables users to instantly find comfortable/uncomfortable spots in the indoor space and control the air conditioner more effectively. The modules for 3D mapping, object pose tracking, simulation and visualization are integrated on a single XR device, making the system portable and widely usable.
HoloViz Office: Location-Independent Mixed Reality Workspace for 3D Medical Data Visualization
The rapid shift to remote work has exposed limitations in traditional tools for 3D data visualization, particularly in medical training. We present HoloViz Office, a portable Mixed Reality (MR) workspace that enables immersive 3D medical data visualization independent of physical location. Unlike Virtual Reality (VR) solutions that isolate users, our MR approach using HoloLens preserves situational awareness while providing intuitive interaction with complex medical datasets. We demonstrate the system through comparative brain analysis for Cerebral Small Vessel Disease (CSVD) and dynamic human anatomy exploration. Evaluation with 10 participants confirms that HoloViz Office provides location-independent, convenient, and immersive visualization capabilities, contributing to effective remote collaboration tools in medical education.
ChemersiveLLM: Prompt-to-VR Simulation of Chemistry Experiments Using Generative AI
Large Language Models (LLMs) offer significant potential for integration with Virtual Reality (VR), but current AI systems struggle to generate accurate 3D environments and support semantic interaction. We present ChemersiveLLM, a VR-based chemistry learning platform that leverages LLMs for instruction sequencing, natural language grounding, and real-time guidance. Using a semantic action-mapping framework, the system translates AI-generated content into structured lab actions, enabling multimodal interaction, embodied experimentation, and intelligent feedback. Comparative evaluation across textbook, chatbot-based, and VR learning shows that our system improves engagement, comprehension, and satisfaction, underscoring its promise as a next-generation tool for science education.
SESSION: Reproducibility
VRCare – Improving Diagnostic Eyecare Experience – An investigative study
We present VRCare, a VR-based eye screening tool enabling remote, self-guided vision assessments for colour blindness, visual field, myopia, and contrast sensitivity. A user study with 33 participants evaluated usability, comfort, and perceived effectiveness. Participants rated the system highly for intuitiveness (4.45/5), ease of use (4.48/5), and comfort (4.27/5), with 91% willing to reuse the tool. Lower confidence in diagnostic accuracy (3.76/5) and reports of mild discomfort highlight the need for clinical validation and ergonomic refinement. Overall, findings demonstrate VR’s potential for accessible vision screening outside clinical settings.
Virtual Reality in the Treatment of Male Sexual Disorders: Protocol of a Replication Study
Sexual disorders negatively affect mental health and quality of life. Current therapies are often limited by adaptability and the challenges of safe exposure. Virtual reality (VR) offers immersive, controllable environments that can overcome these barriers. Early studies suggest benefits of VR-based therapies, but the small samples, unclear protocols, and outdated devices used limit their implementation. In this paper, we propose to replicate and evaluate VR-assisted therapy through a two-phase design: a pilot assessing immersion and cybersickness of participants, followed by a randomized trial comparing psychotherapy with and without VR.
SESSION: Interaction Design and Input Techniques III
StereoVisPoseNet: Stereo-based Visibility-aware Egocentric 3D Pose Estimation Network
Egocentric 3D pose estimation is challenging due to occlusions and errors at articulated joints. We propose StereoVisPoseNet, a stereo-based visibility-aware network that integrates depth and explicit joint visibility prediction to guide Transformer-based regression and refinement. Our method reduces MPJPE from 76.04 mm to 31.91 mm and PA-MPJPE from 63.43 mm to 28.73 mm compared to UnrealEgo, with substantial improvements for arms and legs. These results demonstrate the importance of combining stereo depth with visibility-aware modeling for robust egocentric pose estimation.
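MPJPE, the headline metric above, is the mean Euclidean distance between predicted and ground-truth joint positions (PA-MPJPE additionally applies a Procrustes alignment before measuring). A minimal reference computation on arbitrary arrays, shown only to clarify the metric:

```python
import numpy as np

def mpjpe(pred, gt):
    """pred, gt: (num_frames, num_joints, 3) arrays in millimetres."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```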
FlowZone: Real-Time Pose-Tracking Virtual Reality (VR) Yoga and Mindfulness Meditation
This work presents FlowZone, an exploratory VR prototype that integrates yoga and mindfulness meditation. The system uses headset and controller data for lightweight pose tracking and provides real-time feedback in guided sessions. A preliminary user study (N=6) revealed common challenges in yoga practice, including uncertainty about pose correctness, difficulty maintaining consistency, and the importance of calming environments. These insights informed the design of FlowZone, which combines accessible pose guidance with immersive, meditation-oriented settings. While not a full evaluation, our early findings suggest that VR yoga and meditation can lower barriers to practice and support stress reduction, pointing toward promising directions for future research.
SESSION: Affective, Collaborative, and Social Interaction
MultiSphere: Latency Optimized Multi-User 360° VR Telepresence with Edge-Assisted Viewport Adaptive IPv6 Multicast
360° video telepresence with VR enables immersive remote collaboration, but scaling to multiple users is subject to bandwidth and latency constraints. We present MultiSphere, a multi-user edge-assisted 360° VR telepresence system that combines viewport-adaptive IPv6 multicast tiling with a novel dual keyframe interval (KeyInt) streaming technique. Our approach addresses the latency bottleneck inherent in joining live video streams encoded with standard video codecs while maintaining visual quality through strategic use of low- and high-KeyInt streams. Our system achieves 75–94% bandwidth savings and an average request-to-decode latency of 56 ms, a 79% reduction compared to using a regular single-KeyInt stream.
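A back-of-envelope calculation shows why the keyframe interval dominates join latency when a new viewer tunes into a live stream: decoding can only begin at the next keyframe, so on average a viewer waits half an interval. The numbers below are illustrative assumptions, not MultiSphere's measurements; they only motivate joining on a short-KeyInt stream before switching to a long-KeyInt one.

```python
def mean_join_wait_ms(keyint_frames, fps):
    """Expected wait (ms) until the next keyframe for a random join time."""
    return (keyint_frames / fps) * 1000 / 2

print(mean_join_wait_ms(keyint_frames=120, fps=30))  # long KeyInt: ~2000 ms expected wait
print(mean_join_wait_ms(keyint_frames=6, fps=30))    # short KeyInt: ~100 ms expected wait
```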
SESSION: Avatars, Agents, and Embodiment
MoPriC : Two Stage Approach for Text Guided Motion-Primitives Composition
Text-to-motion generative models suffer from the long-term dependency problem, where it becomes difficult to maintain the context of text instructions as motion length increases. Moreover, current MoCap datasets include only predefined actions and fail to reflect diverse individual styles. To address these limitations, we introduce MoPriC, a two-stage motion composition framework that produces sequential motions from elementary motion primitives guided by text descriptions. We also present DancePrimitives, a new dataset of collected motion primitives that captures the semantics of each unit motion.
AR and LLM-Based Virtual Agent for ABA-Oriented Social Training in Autistic Children
This paper presents an interactive training system that combines augmented reality (AR) with large language models (LLMs) to support children with autism spectrum disorders (ASD) in practicing social skills. The virtual AR agent, equipped with multimodal sensing including eye tracking, speech recognition, and facial emotion detection, monitors the children's status in real time and provides adaptive training in four key skills: emotional expression, eye contact, initiating interaction, and understanding social etiquette, following the principles of Applied Behavior Analysis (ABA). Driven by LLMs, the agent delivers personalized verbal instructions and animated feedback. Expert reviews suggest that the proposed system offers an expandable, context-aware intervention framework, serving as a valuable supplement to traditional behavioral therapies for children with ASD.
The Effect of Avatar Transparency on Collaboration in Shared Virtual Spaces
In shared virtual spaces, users tend to mimic real-world social behaviors, such as maintaining interpersonal distances and avoiding collisions. During remote collaboration, these behaviors can limit movement and positioning even though users are not co-located. Prior work found that avatar transparency relaxes the positioning constraints induced by these social behaviors, but it can also weaken social presence between collaborators. Our goal was to develop a system that enhances navigation freedom without significantly reducing social presence. We designed a transparency management system based on interpersonal distances and collision avoidance. A user study involving groups of three remote collaborators indicated that our system reduced the distances between users compared to fully opaque avatars, with no significant difference in social presence.
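A minimal sketch of a distance-based transparency rule of the kind described, with assumed thresholds; the actual system additionally accounts for collision avoidance.

```python
# Illustrative sketch (assumed thresholds): fade another user's avatar toward
# transparency as interpersonal distance shrinks below a comfort threshold,
# so close proximity no longer feels like a collision.
def avatar_alpha(distance_m: float, opaque_at: float = 1.2, transparent_at: float = 0.4) -> float:
    """Return avatar opacity in [0, 1] given the distance to the other user in meters."""
    if distance_m >= opaque_at:
        return 1.0
    if distance_m <= transparent_at:
        return 0.0
    return (distance_m - transparent_at) / (opaque_at - transparent_at)
```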
SESSION: Security and Systems
Head Movement Biometrics for Continuous Authentication in Virtual Reality
This paper presents an approach for continuous user authentication in VR using head movement biometrics, utilizing bilateral head position data from the stereoscopic rendering systems of VR headsets. Our method employs a 1D Convolutional Neural Network (CNN) with a specialized feature extractor designed to capture temporal head-movement patterns, head impulse movements, pose stabilization behaviors, and frequency-domain characteristics from bilateral head velocities. We evaluated the system with 30 participants who performed door-opening and walking tasks across two sessions separated by 17 days. The system achieved an average Equal Error Rate (EER) of 2.9% for door-opening tasks, 8.3% for walking tasks, and 5.67% in activity-invariant scenarios, with an authentication decision made approximately every 0.3 seconds after an initial 14-second calibration period.
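For illustration, a compact 1D CNN over a window of bilateral head-velocity channels might look like the following; the layer sizes, the six-channel assumption (left/right velocities in x, y, z), and the embedding head are ours, not the paper's exact architecture.

```python
# Minimal sketch (assumed architecture): encode a short window of bilateral
# head-velocity channels into an embedding used for continuous verification.
import torch
import torch.nn as nn

class HeadMotionEncoder(nn.Module):
    def __init__(self, in_channels: int = 6, embed_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):  # x: (batch, channels, samples) head-velocity window
        return self.fc(self.net(x).squeeze(-1))
```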
SESSION: VRST 2025 Demos: Affective, Collaborative, and Social Interaction Demos I
Virtual Reality for Urban Soundscape Design: Exploring knowledge sharing, creation, and workplace integration
Urban sound is treated as noise, a nuisance to be mitigated, often in reaction to complaints. However, well-planned urban sounds can also contribute to the quality of urban spaces. Yet professionals of the built environment are not equipped to work with sound proactively as a resource for sustainable city-making. City Ditty, an interactive soundscape simulator, was developed to offer a multisensory approach for professionals who are not accustomed to working with sound. City Ditty acts as a conduit for knowledge discovery and sharing, engaging professionals in proactive urban sound planning. It provides a platform for education, rapid development, and evaluation of urban soundscapes. We provide an overview of City Ditty, its use in knowledge creation and sharing, and early-adopter use cases.
SESSION: Immersive Visualization and Interaction Demos
An Inner-Wrist Trackball Interaction Technique for Pointing and Gesture Input through Body-Rubbing
With the increasing adoption of smart glasses, there is a growing need for efficient and precise input techniques in augmented reality (AR) environments. Current input options, such as smartphones and hand tracking, have limitations: external devices must be picked up, or sufficient space is needed in front of the user. We propose an inner-wrist-worn device with a trackball that users operate by sliding their wrist against their body. This technique provides precise, long-range pointing without requiring extensive motion space and enables continuous interaction. Experiments show that the technique achieves reliable pointing performance regardless of user posture or sliding area. Applications we developed, including shooting games, gesture control, and smartwatch input, demonstrate its effectiveness for compact and precise interaction in AR/VR environments and in daily use.
SESSION: Cybersickness, Health, and Digital Twins Demos
Optical-Flow-Compensated Virtual Screens: Mitigating Visually Induced Motion Sickness in Mixed Reality Video Viewing
We propose a novel technique for mitigating Visually Induced Motion Sickness (VIMS) during video viewing on virtual screens in a mixed-reality (MR) environment. The key idea is to counteract the motion of objects in a video (its optical flow) by moving the virtual screen in the opposite direction at a corresponding speed. We implemented this approach in a head-mounted display (HMD) application and conducted an evaluation experiment to examine its effectiveness. The results indicate that the proposed method reduces VIMS in complex clips containing irregular rotations and translational motion, and appears more effective than dynamic field-of-view restriction techniques. The proposed method also seems to enhance spatial presence relative to both conventional viewing and dynamic field-of-view restriction, thereby achieving VIMS mitigation alongside heightened immersion.
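A minimal sketch of the compensation idea, assuming a per-frame optical-flow field and exponential smoothing of the screen offset; the gain and smoothing values are placeholders, not the paper's parameters.

```python
# Illustrative sketch: estimate the dominant optical flow of the current frame
# and displace the virtual screen in the opposite direction so that retinal
# motion is partially cancelled.
import numpy as np

def screen_offset(flow: np.ndarray, prev_offset: np.ndarray,
                  gain: float = 0.5, smoothing: float = 0.9) -> np.ndarray:
    """flow: (H, W, 2) per-pixel optical flow in screen units; returns a 2D screen offset."""
    mean_flow = flow.reshape(-1, 2).mean(axis=0)   # dominant scene motion
    target = -gain * mean_flow                     # oppose the motion
    return smoothing * prev_offset + (1.0 - smoothing) * target
```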
Interactive Depth-Shifting Aerial Image Display Using Angle Changes Between the Display and Optical Elements
We implemented an interactive system that enables three-dimensional movement of an aerial image. The proposed system employs two servo motors to adjust the angles of the retroreflective element and the light-source display, thereby controlling the depth position of the aerial image. The developed prototype can present a virtual character at arbitrary positions within a cubic range of 100 mm per side. Furthermore, by sensing the user's fingertips, the prototype enables spatial interactions such as the character following the fingertip, landing on it, and being flicked away by the user.
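A hedged sketch of the control mapping, assuming a linear calibration between servo angle and image depth; the actual relation depends on the optical geometry and is not given here, so all values below are placeholders.

```python
# Placeholder calibration: map a desired aerial-image depth inside the 100 mm
# interaction cube to a servo angle, clamping commands to the calibrated range.
def servo_angle_for_depth(depth_mm: float,
                          depth_range_mm=(0.0, 100.0),
                          angle_range_deg=(30.0, 60.0)) -> float:
    """Return the servo angle that places the aerial image at depth_mm (linear assumption)."""
    lo_d, hi_d = depth_range_mm
    lo_a, hi_a = angle_range_deg
    depth = max(lo_d, min(depth_mm, hi_d))
    t = (depth - lo_d) / (hi_d - lo_d)
    return lo_a + t * (hi_a - lo_a)
```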
SESSION: Multimodal Experiences Demos I
SESSION: Affective, Collaborative, and Social Interaction Demos II
SESSION: Multimodal Experiences Demos II
A Multimodal Haptic System for Pulling Virtual Objects
We propose an interactive haptic system that lets users pull virtual objects with various loads and multimodal feedback. The system is based on a syringe-plunger mechanism with a proportional solenoid valve at the syringe tip, while the plunger serves as the extractable object. The plunger integrates load cells for vertical and bending forces, an accelerometer, and a vibro transducer for audio and tactile feedback. Experiments examined how vertical loads vary with valve opening and timing, and how bending loads change with deflection. A virtual plant application demonstrates pulling and bending with diverse load profiles and combined tactile-auditory sensations.
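A placeholder sketch of the load control: the pulling resistance felt at the plunger is governed by the proportional valve opening (a more closed valve increases suction resistance). The linear mapping below is an assumption for illustration, not the measured calibration from the experiments.

```python
# Assumed, illustrative mapping from a target pulling load to a valve opening.
def valve_opening_for_load(target_load_n: float, max_load_n: float = 20.0) -> float:
    """Return valve opening in [0, 1]; a smaller opening yields higher pulling resistance."""
    target = max(0.0, min(target_load_n, max_load_n))
    return 1.0 - target / max_load_n
```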