http://arvips.squarespace.com/
AR for VIPs is an application designed to turn the Microsoft HoloLens 1 into an assistive device for visually impaired people. It does so by sonifying obstacles detected via the spatial mesh that the HoloLens creates, and by using Google's text recognition and text-to-speech services to let blind users read text at a distance.
It was created by University of California, Berkeley, School of Information graduate students Dylan Fox, Alyssa Li, Anu Pandey, and Rohan Kar, and Virtual Reality at Berkeley undergraduates Rajandeep Singh, Manish Kondapalu, Teresa Pho, and Elliot Choi.
For information on HoloLens basics, application setup, and voice commands, see this Google document: https://docs.google.com/document/d/1kNmwy0BvAHv7f46hym_HN8SThqGqnV9dunIzMdukwFk/edit?usp=sharing
For information on setting up your Unity environment, including download links for HoloToolkit 2017.4.3.0, see this Google document: https://docs.google.com/document/d/1bbTqdkJNIzG7EznGzk95gs6i77JLNr2ljUWSFLjtj6E/edit?usp=sharing
The text detection was largely based on Jonathan Huang's paper (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0210630); GitHub link: https://github.com/eacooper/HoloLensSignARApp
The remainder of this README describes the components in this GitHub repository and what they do.
These are the components that appear in the upper left “Hierarchy” panel in Unity when you start the Mesh-Manipulation project.
This is the standard Camera as set by the HoloToolkit Mixed Reality>Apply Mixed Reality Settings option, with one adjustment: the MixedRealityCamera’s audio listener component was disabled and a new CameraAudioListener added that is rotated 180 degrees, in order to counteract the audio reversal we encountered.
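For reference, the same workaround could also be applied from code. This is a minimal sketch (the script name is ours, and the rotation axis is an assumption; match whatever the scene actually uses):

using UnityEngine;

public class AudioListenerFix : MonoBehaviour
{
    void Start()
    {
        // Disable the listener that ships on the MixedRealityCamera...
        var defaultListener = Camera.main.GetComponent<AudioListener>();
        if (defaultListener != null)
        {
            defaultListener.enabled = false;
        }

        // ...and add a child listener rotated 180 degrees to counteract
        // the audio reversal we encountered (axis assumed here).
        var flipped = new GameObject("CameraAudioListener");
        flipped.transform.SetParent(Camera.main.transform, false);
        flipped.transform.localRotation = Quaternion.Euler(0f, 180f, 0f);
        flipped.AddComponent<AudioListener>();
    }
}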
These objects don’t appear physically in the scene, but they have important scripts attached that affect the project.
This is the default Input Manager as set by the HoloToolkit. It captures user inputs from many different sources. We made no modifications.
This component contains a “Keywords” list that includes all voice command keywords the application will listen for. To enter a new keyword, use the plus button in the bottom right of this component in the Inspector, then add it to the Speech Input Handler component in the ScriptManager (below).
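Keywords can also be handled in code instead of (or in addition to) the inspector wiring. A minimal sketch, assuming HoloToolkit 2017.4's ISpeechHandler interface (the keyword and response here are illustrative):

using HoloToolkit.Unity.InputModule;
using UnityEngine;

public class KeywordResponder : MonoBehaviour, ISpeechHandler
{
    void Start()
    {
        // Receive speech events regardless of what the user is gazing at.
        InputManager.Instance.AddGlobalListener(gameObject);
    }

    public void OnSpeechKeywordRecognized(SpeechEventData eventData)
    {
        // eventData.RecognizedText holds the keyword that was heard.
        if (eventData.RecognizedText.ToLower() == "toggle debug")
        {
            Debug.Log("'Toggle Debug' recognized.");
        }
    }
}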
This component holds several vital scripts: Control Script, Speech Input Handler, and Obstacle Beacon Script. See “Scripts” below.
This component holds the scripts related to image capture, text recognition, and text-to-speech: Camera Manager, Icon Manager, Settings Manager, Text Reco, and Text to Speech Google. See “Scripts” below.
This component holds scripts related to spatial mapping and mesh processing: Play Space Manager, Surface Meshes to Planes, and Remove Surface Vertices. See “Scripts” below.
This is the cursor as placed by HoloToolkit. No modifications.
This component controls the spatial mapping scripts and serves as the parent object for the spatial mesh when the app is launched. By modifying the Spatial Mapping Observer values for Triangles per Cubic Meter and Time Between Updates, you can control the speed and fidelity of obstacle meshing.
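If you would rather tune these values at runtime than in the inspector, a minimal sketch (the values are illustrative, not the project's shipped settings, and the field names assume HoloToolkit 2017.4's SpatialMappingObserver):

using HoloToolkit.Unity.SpatialMapping;
using UnityEngine;

public class MeshTuning : MonoBehaviour
{
    void Start()
    {
        // Attach to the SpatialMapping object. Higher triangle density
        // gives finer obstacle meshes at the cost of performance; shorter
        // update intervals make the mesh react faster.
        var observer = GetComponent<SpatialMappingObserver>();
        observer.TrianglesPerCubicMeter = 500f;
        observer.TimeBetweenUpdates = 2.5f;
    }
}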
A light to illuminate the beacons and other holograms. We did not modify it.
This holds the non-mesh objects that can be seen in the application.
This holds all obstacle and wall beacons.
This holds all text beacons.
A cube that changes color based on input. Currently disabled.
This is a collection of three pieces of debug information: Spotlight Text and Proximity Text, which indicate whether Spotlight and Proximity modes are on, respectively; and Debug Text, which prints a copy of all text sent to the Debug window for in-app debugging. The object has Tagalong and Billboard components, so it follows the user and stays rotated toward them. It can be toggled on or off with the “Toggle Debug” command and cleared with the “Clear Debug” command.
This script serves as the central nervous system of the application: it connects many of the important scripts and converts user input into activation of the core scripts. It needs the Text Manager, Text Beacon Manager, Sample Text, and Debug Text objects assigned in the inspector, and it works closely with the Obstacle Beacon Script and the TextToSpeechGoogle script.
It has the following method regions:
Initialization: Sets up variables and registers gesture and debug message events.
Gesture Controls: Currently set to use the tap gesture to turn obstacle mode on or off. We formerly experimented with using tap-and-hold or manipulation gestures to trigger other commands, but the different gestures interfered with one another too much.
Core Commands: These are primarily triggered by voice commands, as set in the Speech Input Source and Speech Input Handler components. They turn obstacles on or off, trigger the Text Manager's capture-text process, or trigger the Read Text routine.
Note that Read Text must use a coroutine because (a) there is a delay in getting the audio back from Google Text to Speech, and (b) the method needs to wait for the current string to finish playing before reading the next (see the first sketch after this list).
When using Read All Text, it iterates through every beacon in the Text Beacon Manager; when using Read Text, it performs a conecast and reads only the text the user is pointing their head at. The size and angle of this cone may still need adjusting.
StopPlayback, NextPlayback, and RepeatPlayback make the read-text function run more smoothly, letting the user stop playback, skip to the next string, or repeat the last played string. Increase Speed and Decrease Speed let the user adjust audio playback speed to their preference.
Clear Beacons: Allows the user to clear obstacle beacons (which also turns obstacle mode off) or clear text beacons. WARNING: clearing text beacons can cause serious errors in some circumstances when combined with the “Repeat” command (exact circumstances unknown).
Adjust Obstacle Beacon Cone: Adjusts which obstacle and wall beacons are triggered at once. You can change the number of beacons placed on each refresh and the deviation of the cone (low = tight cone, high = all around the user); see the second sketch after this list.
Mesh Processing: Stops scanning in order to create wall, ceiling, and floor planes, or restarts scanning.
HoloLens Debug UI: Writes debug content to the debug text window and lets the user clear it or toggle it on and off.
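As referenced above, here is a minimal sketch of the Read Text coroutine pattern (the names and the fixed delay are illustrative; the real routine pulls strings from the Text Beacon Manager and fetches clips from Google Text to Speech asynchronously):

using System.Collections;
using UnityEngine;

public class ReadTextSketch : MonoBehaviour
{
    public AudioSource audioSource;

    public IEnumerator ReadTextRoutine(string[] strings)
    {
        foreach (string s in strings)
        {
            // (a) Wait for the synthesized clip to come back from the API,
            // stubbed here as a fixed delay.
            yield return new WaitForSeconds(0.5f);

            audioSource.Play();

            // (b) Wait until the current string finishes playing before
            // reading the next one.
            yield return new WaitWhile(() => audioSource.isPlaying);
        }
    }
}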
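And a sketch of how the beacon count and cone deviation might interact when shooting obstacle beacons (the field names are hypothetical; the project's actual logic lives in Obstacle Beacon Script):

using UnityEngine;

public class BeaconConeSketch : MonoBehaviour
{
    public GameObject beaconPrefab;    // hypothetical; the project assigns its own prefabs
    public int beaconsPerRefresh = 30; // number of beacons placed on each refresh
    public float coneDeviation = 0.3f; // low = tight cone, high = all around the user

    public void ShootBeacons()
    {
        for (int i = 0; i < beaconsPerRefresh; i++)
        {
            // Perturb the gaze direction by a random offset scaled by the deviation.
            Vector3 direction = (Camera.main.transform.forward
                                 + Random.insideUnitSphere * coneDeviation).normalized;
            RaycastHit hit;
            if (Physics.Raycast(Camera.main.transform.position, direction, out hit))
            {
                Instantiate(beaconPrefab, hit.point, Quaternion.identity);
            }
        }
    }
}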
This script simply displays the help text for Script Manager in the inspector.
Obstacle Beacon Script: This is the second most important script in the application after Control Script. It manages the functions and part of the interfaces for all obstacle and wall beacons.
Shoot Beacon: Shoots a single beacon; used primarily for testing.
ConeCastExtension: Turns a spherecast into a “conecast,” making it more faithful to the “spotlight” metaphor. Script taken from https://github.com/walterellisfun/ConeCast
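The technique, roughly: spherecast along the gaze direction, then discard hits that fall outside the cone angle. A paraphrased sketch (see the linked repository for the exact implementation):

using System.Collections.Generic;
using UnityEngine;

public static class ConeCastSketch
{
    public static RaycastHit[] ConeCastAll(Vector3 origin, float maxRadius,
                                           Vector3 direction, float maxDistance,
                                           float coneAngleDegrees)
    {
        // Cast a wide sphere, starting slightly behind the origin so nearby
        // colliders are not skipped.
        RaycastHit[] sphereHits = Physics.SphereCastAll(
            origin - direction * maxRadius, maxRadius, direction, maxDistance);

        // Keep only hits whose direction from the origin lies within the cone.
        var coneHits = new List<RaycastHit>();
        foreach (RaycastHit hit in sphereHits)
        {
            Vector3 toHit = hit.point - origin;
            if (Vector3.Angle(direction, toHit) <= coneAngleDegrees)
            {
                coneHits.Add(hit);
            }
        }
        return coneHits.ToArray();
    }
}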
PlaySpaceManager: Controls the surface observer object that creates the spatial mesh. You can use it to adjust various parts of the meshing process, including whether the mesh is processed. Note that it works with the “Spatial Processing” script in the HoloToolkit.
ObstacleAudio: This script is currently unused, but is intended to adjust the pitch of obstacle beacons based on their height relative to the user.
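The intended behavior might look something like this sketch (the tuning constant is hypothetical):

using UnityEngine;

public class ObstacleAudioSketch : MonoBehaviour
{
    public float pitchPerMeter = 0.25f; // hypothetical tuning value

    void Update()
    {
        // Raise pitch for beacons above the user's head, lower it for
        // beacons near the floor.
        float heightDelta = transform.position.y - Camera.main.transform.position.y;
        GetComponent<AudioSource>().pitch = 1f + heightDelta * pitchPerMeter;
    }
}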
These are the beacons used for basic obstacle detection. The only difference between Linear and Logarithmic is the distance dropoff curve in the 3D Sound Settings: Linear is tuned to be audible from farther away, whereas Logarithmic makes it easier to judge distance as you get closer to a beacon.
Note that the Sphere Collider Radius can be adjusted to determine how many beacons will pack into an area.
Note that the Audio Clip attached to this object is a very important part of sound design!
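For reference, the runtime equivalent of the two dropoff curves looks like this sketch (the values are illustrative; the shipped prefabs set these in the inspector):

using UnityEngine;

public class BeaconAudioSetup : MonoBehaviour
{
    void ConfigureLinear(AudioSource source)
    {
        source.spatialBlend = 1f;                          // fully 3D sound
        source.rolloffMode = AudioRolloffMode.Linear;      // audible from farther away
        source.maxDistance = 10f;
    }

    void ConfigureLogarithmic(AudioSource source)
    {
        source.spatialBlend = 1f;
        source.rolloffMode = AudioRolloffMode.Logarithmic; // sharper near-field cues
        source.maxDistance = 5f;
    }
}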
Same as Obstacle Beacon, but uses a more subtle sound clip. Placed on meshes identified as walls.
These are placed at the approximate location of text whenever text is identified. The “Text Instance Script” attached to each beacon holds the text that the beacon represents.
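Such a script can be as simple as a holder component; a minimal sketch (the project's actual script may hold more state):

using UnityEngine;

public class TextInstanceSketch : MonoBehaviour
{
    // The recognized string this beacon represents; read back by the
    // Read Text routines.
    public string text;
}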
This object goes with the managers and contains scripts to perform spatial processing.
This is the default mesh used in the Unity preview window. When running the application on the HoloLens, it is ignored.
This camera is instantiated whenever Capture Text is called so that the application can separately keep track of what was in view when the command was issued.
The main scene used for our application. Includes all of the objects listed in the Unity Hierarchy above.
A collection of important assets for HoloLens functioning in Unity. See: https://github.com/Microsoft/MixedRealityToolkit-Unity/releases/tag/2017.4.3.0-Refresh
Examples of how assets in the HoloToolkit can be used.
The audio manager had to be taken from a separate version of the HoloToolkit.
Used to help parse the JSON received as part of text recognition.
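For reference, the relevant slice of a Google Cloud Vision text-detection response can be parsed as in this sketch, shown here with Unity's built-in JsonUtility rather than the bundled asset (the class names are ours; the field names follow the public API's response shape):

using System;
using UnityEngine;

// Minimal classes matching the parts of the response the app cares about.
[Serializable] public class VisionResponse { public AnnotateImageResponse[] responses; }
[Serializable] public class AnnotateImageResponse { public TextAnnotation[] textAnnotations; }
[Serializable] public class TextAnnotation { public string description; }

public static class TextRecoParsing
{
    public static string ExtractFullText(string json)
    {
        VisionResponse parsed = JsonUtility.FromJson<VisionResponse>(json);
        if (parsed.responses != null && parsed.responses.Length > 0 &&
            parsed.responses[0].textAnnotations != null &&
            parsed.responses[0].textAnnotations.Length > 0)
        {
            // The first annotation contains the full detected text block.
            return parsed.responses[0].textAnnotations[0].description;
        }
        return string.Empty;
    }
}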
For more information, contact dylan dot r dot fox at berkeley dot edu.