Voice input

Original article: Voice input

TL;DR

HoloLens supports voice input, which lets users command holograms without gestures while gaze direction selects the target. Commands range from a basic “select” to hologram manipulation, and Cortana adds assistant-style requests. Limitations include poor fine-grained control, occasional misrecognition of commands, and the awkwardness of speaking commands in public.

Bullet points

  1. 🎙️ Voice Input: Voice input is a primary interaction method for HoloLens. It lets users interact with holograms without physical gestures, which can make navigation faster and more intuitive, especially in complex interfaces.

  2. 🎯 Voice and Gaze: In the HoloLens environment, voice commands are often paired with the user’s gaze direction. This combination allows the device to understand which hologram or application the user is referring to. For instance, if a user looks at a specific app and says “open”, the device knows which app to launch.

  3. 🎧 Device Support: Voice input is not exclusive to one device. Both the first and second generations of HoloLens, as well as immersive headsets equipped with microphones, support voice commands, so users get a consistent experience across devices.

  4. 🗣️ “Select” Command: The “select” voice command is a universal interaction across HoloLens devices. It acts as a verbal alternative to physical gestures, allowing users to activate or choose holograms by simply saying “select”. This is especially useful when the user’s hands are occupied or when they prefer a hands-free interaction.

  5. 🤖 Hey Cortana: Cortana is Microsoft’s voice-activated digital assistant. By saying “Hey Cortana”, users can prompt her to answer questions, open apps, or perform specific tasks. This feature enhances the user experience by providing instant access to information and functionalities.

  6. 📋 HoloLens-specific Commands: These are a set of voice commands tailored specifically for the HoloLens environment. They range from basic commands like “Take a picture” to more advanced ones like “Increase the brightness”. These commands provide users with quick access to device functionalities.

  7. 🏷️ “See It, Say It”: In this model, the label shown on a button doubles as its voice command. If a user sees a button labeled “Adjust”, they can simply say “Adjust” to activate it. Because the commands are visible on the interface, users do not have to memorize them, which reduces the learning curve.

  8. 🎨 Hologram Manipulation: Beyond basic interactions, HoloLens allows users to adjust holograms using voice commands. For instance, a user can command a hologram to face them or adjust its size by saying “Bigger” or “Smaller”. This provides a more immersive and interactive holographic experience.

  9. 📝 Dictation: Typing in a mixed reality environment can be cumbersome. Dictation offers a solution by allowing users to speak the text they want to input. When the holographic keyboard is active, users can switch to dictation mode, making text input faster and more efficient.

  10. 🚫 Challenges: Voice input in HoloLens has several limitations. Fine-grained control, such as adjusting volume by specific increments, is difficult to express by voice. Reliability can be an issue, because the system sometimes misinterprets commands. Social acceptability is another concern, as users might feel awkward speaking commands in public. Finally, commands that are not shown on the interface must be memorized, which adds a learning curve.
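
The voice-and-gaze model described in items 2, 4, 7, and 8 above can be illustrated with a minimal Python sketch. It is not the HoloLens API: the `Hologram` class, the `handle_voice_command` function, and the `SCALE_STEP` value are all hypothetical names and numbers chosen for illustration. The sketch only shows the idea that gaze picks the target and the spoken phrase picks the action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Assumed scale factor per "bigger"/"smaller" command; the source gives no value.
SCALE_STEP = 1.25


@dataclass
class Hologram:
    """Hypothetical hologram with a scale and a set of labeled buttons."""
    name: str
    scale: float = 1.0
    selected: bool = False
    # "See it, say it": each visible button label doubles as a voice command.
    buttons: Dict[str, Callable[[], str]] = field(default_factory=dict)


def handle_voice_command(phrase: str, gazed: Optional[Hologram]) -> str:
    """Apply a recognized phrase to the hologram the user is gazing at."""
    command = phrase.strip().lower()
    if gazed is None:
        # Without a gaze target, the command has nothing to act on.
        return "no target"
    if command == "select":
        gazed.selected = True
        return f"selected {gazed.name}"
    if command == "bigger":
        gazed.scale *= SCALE_STEP
        return f"scaled {gazed.name} to {gazed.scale:.2f}"
    if command == "smaller":
        gazed.scale /= SCALE_STEP
        return f"scaled {gazed.name} to {gazed.scale:.2f}"
    # Fall back to the button labels shown on the gazed hologram.
    for label, action in gazed.buttons.items():
        if label.lower() == command:
            return action()
    return "unrecognized"
```

For example, a hologram created with `buttons={"Adjust": some_callback}` would run `some_callback` when the user gazes at it and says “Adjust”, while the same phrase spoken with no gaze target returns `"no target"`. A real recognizer would also have to handle misheard phrases, which is the reliability challenge noted in item 10.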

Keywords

  • HoloLens: A mixed reality headset developed by Microsoft. It superimposes 3D holograms onto the real world.
  • Cortana: Microsoft’s virtual assistant, similar to Siri or Alexa.
  • Gaze Cursor: A visual indicator in HoloLens that shows where the user is looking, used to target the hologram a gesture or voice command applies to.
  • Air Tap: A gesture used in HoloLens to select or interact with holograms.
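  • Hologram: In HoloLens, a three-dimensional virtual object displayed so that it appears in the user’s real surroundings. In optics, the term refers to a three-dimensional image formed by the interference of light beams.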
  • Voice Dwell Tooltip: A tooltip that appears when gazing at a voice-enabled button in HoloLens.
  • Dictation: Speaking text aloud so that the device converts it to typed input, used in HoloLens as an alternative to the holographic keyboard.
  • Ambient Environment Audio Capture: Capturing the surrounding audio, often for recording or analysis.
  • AudioCategory_Communications: A stream category in HoloLens optimized for call quality and narration.
  • AudioCategory_Speech: A stream category in HoloLens optimized for the device’s speech engine.
  • AudioCategory_Other: A stream category in HoloLens optimized for ambient audio recording.
  • WMR OOBE: Windows Mixed Reality Out Of Box Experience, the initial setup process for Windows Mixed Reality devices.