Preliminary Detailed Design
Team Vision for Preliminary Detailed Design Phase
Use Case Normal Flow:
The use case normal flow outlined above has been recreated as a demo to show how a typical end user would operate the proposed device.
Use Case Demo:
Using the information outlined above, research was conducted and is presented here: Concept Breakdown Presentation
Kinematic Measures
Definition: track user head/body movements and record the user's direction/facing.
Description: in order to maintain precise auditory cues even without visual contact with the reference point, it will be necessary to acquire: left/right head turn, up/down head turn, and the X, Y, and Z offsets of the user.
- What units of measurement are most helpful for the user (feet, meters, degrees)?
- Magnetometer/accelerometer: how will it work when mounted on a user's head? Is it a digital or analog sensor? How does it interface with the microcontroller? What are its error margins, cost, power consumption, and other characteristics?
- The IMU data may need to be filtered and processed at some point, because the system may need velocity while the IMU only provides acceleration. Will this happen at the input stage or at another stage? In short: what do we need from this sensor?
Answers: For precise audio feedback localization, we will need the X, Y, and Z location of the sound being played. In order to maintain precise auditory cues even without visual contact with the reference point, it will be necessary to acquire: left/right head turn, up/down head turn, and X, Y, and Z offsets. Furthermore, we need to know how those values relate to real-world units (inches, degrees, meters, etc.). Optionally, velocity could be used to add special effects to the sound, but it is not strictly necessary.
With the Adafruit IMU, a full 9 DoF can be measured, in units of m/s² from the accelerometer and rad/s from the gyroscope. The proposed best solution for obtaining the X, Y, Z location of the sound is to first gather a polar coordinate, with the angle coming from the orientation information already being gathered. The magnitude will most likely come from camera data, as a distance away from the reference point. This polar coordinate can then be converted into a Cartesian coordinate as follows:
- X = r cos(θ) = camera-data × cos(orientation)
- Y = r sin(θ) = camera-data × sin(orientation)
Additionally, the Z coordinate can most likely be pulled directly from the camera.
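The polar-to-Cartesian conversion above can be sketched as follows; the function name and argument units are assumptions for illustration, not final interface decisions:

```python
import math

def reference_point_offset(distance_m, yaw_rad, z_m):
    """Convert a camera-measured distance and an IMU yaw angle into
    Cartesian X/Y offsets of the reference point (a sketch).

    distance_m -- distance to the reference point from the camera (m)
    yaw_rad    -- user's horizontal orientation from the IMU (rad)
    z_m        -- vertical offset, read directly from the camera (m)
    """
    x = distance_m * math.cos(yaw_rad)  # X = r cos(theta)
    y = distance_m * math.sin(yaw_rad)  # Y = r sin(theta)
    return (x, y, z_m)
```

For example, a reference point 2 m straight ahead (yaw = 0) maps to an offset of (2, 0, z).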
Using the information outlined above, research was conducted and is presented here: Kinematic Measures
Environmental Input
Definition: what type of camera should be used? How many cameras?
- What type of camera are we using? Things to consider: number of cameras, ease of programmability, how it interfaces with the microcontroller, cost, power consumption, other status information, and team experience.
- What sensors will help us normalize the incoming image data? Who has experience with this?
Answers: Depth sensing is a necessary part of the information gathered from the chosen camera. The acquisition can be accomplished in two main ways. The first is to purchase a single-view depth camera, like the Kinect. The second is the use of a two-camera system.
I believe the best option for this project is the two-camera system. Single-view depth cameras can be expensive and are designed specifically with depth in mind. This means that, in general, the quality of the captured image is minimal and most likely undesirable for object recognition purposes.
A two-camera system works by correlating two pixels, one from each camera, and then performing triangulation on that pixel pair. A more thorough understanding of this process still needs to be researched; a starting point can be found here.
Additionally, with a two-camera system, either both cameras can be given equal weight with regard to object recognition, or one can take most or all of the load (think dominant eye). This would allow the controller to communicate with the second camera only when depth needs to be known. With this in mind, and the likelihood of a Raspberry Pi as the controller, two Pi cameras make the most sense for ease of connectivity and programmability.
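The triangulation step can be sketched with the standard pinhole stereo relation Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the pixel disparity. The parameter values in the example are illustrative assumptions, not Pi camera specifications:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Estimate depth from a two-camera (stereo) rig using the
    standard triangulation relation Z = f * B / d (a sketch).

    focal_px     -- focal length expressed in pixels
    baseline_m   -- horizontal separation of the two cameras (m)
    disparity_px -- pixel offset of the same point between the images
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 700 px focal length, 6 cm baseline,
# 35 px disparity -> roughly 1.2 m away.
depth = stereo_depth(700, 0.06, 35)
```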
A datasheet for the Pi cam can be found here https://www.raspberrypi.org/documentation/hardware/camera/
Using the information outlined above, research was conducted and is presented here: Environmental Input
Data Normalization
Definition: all restrictions and parameters on an obtained image so that images, which in our case are reference points, can be matched.
Description: handles taking pictures and processing them for use in reference point mapping. In addition, the subsystem will include noise filtering.
In general, an image is represented as an M×N matrix, at one of many possible resolutions (128×128, 256×256, 512×512, 640×480). Each entry in the matrix represents a light intensity value, a gray level from 0 (black) to 255 (white). This information can be encoded in three possible forms: binary, gray scale, and color. Many formats are used to transport and compress images, including TIF, PGM, PBM, GIF, and JPEG. An image contains noise, can be formed in two different ways, and can be sampled at infinitely many rates.
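As a small illustration of treating an image as an M×N matrix of gray levels, here is a sketch that converts an RGB matrix to gray levels using the common luminance weights; the weighting is an assumption, since the subsystem's actual normalization is still an open question:

```python
def to_grayscale_matrix(rgb_matrix):
    """Convert an M x N matrix of (R, G, B) tuples into an M x N
    matrix of gray levels from 0 (black) to 255 (white), using the
    common Rec. 601 luminance weights (a sketch only)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_matrix
    ]

# A 1x2 "image": one white pixel, one black pixel.
gray = to_grayscale_matrix([[(255, 255, 255), (0, 0, 0)]])
```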
Using the information outlined above, research was conducted and is presented here: Data Normalization
Feedback (write to feedback device)
Definition: takes in all reference point and obstacle avoidance data and converts it into rich, distinguishable feedback.
- Will audio through DACs be sufficient? Will we need to store mp3 files?
- How many ways are we going to communicate with the user?
- Can text-to-speech be employed? How does text-to-speech work?
Answers: I strongly recommend the use of an open-source library. OpenAL looks like it will be easy to use and integrate into an embedded system.
To write appropriate feedback to the device, we will want the X, Y, and Z coordinates of the sound. Optionally, we can indicate things like the velocity of the person moving toward the sound.
Two DAC outputs will be sufficient for any general pair of headphones (L/R channels plus ground); however, this may be more complicated on a Raspberry Pi, so more research needs to be performed.
This will need to be experimented with. We will likely be limited to roughly 2-3 "reference-point"-quality feedback positions (3D sounds that can be precisely localized). Any more than that may overload the user, so other information must be simple in nature.
There are two styles of feedback: reference point feedback and obstacle feedback.
- Current plans are for reference point feedback to be rich: fully three-dimensional (simulating all spatial elements of sound), allowing the user to more reliably "anchor" their position in a room.
- Obstacle feedback is not yet agreed upon: it will either be a simple alert that informs the user they are about to run into something (a more primitive system), or something more sophisticated, though care must be taken not to overload the user.
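As a rough illustration of how a sound's X/Y offset could drive left/right output levels, here is a constant-power amplitude-panning sketch. This is a simplified stand-in for discussion only, not the planned OpenAL spatialization:

```python
import math

def pan_gains(sound_x, sound_y):
    """Constant-power left/right gains for a sound at (x, y) relative
    to the listener (x = right, y = forward). A simplified stand-in
    for real 3D spatialization."""
    azimuth = math.atan2(sound_x, sound_y)     # 0 rad = straight ahead
    pan = azimuth / math.pi                    # -1 .. +1, left to right
    angle = (pan + 1.0) * math.pi / 4.0        # map to 0 .. pi/2
    return (math.cos(angle), math.sin(angle))  # (left_gain, right_gain)
```

A sound straight ahead yields equal gains in both ears; a sound to the right yields a louder right channel, and the squared gains always sum to one (constant perceived power).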
Text-to-speech can easily be employed with the usage of the correct API.
Using the information outlined above, research was conducted and is presented here: Feedback
Data Storage Processing (Sensor Interface)
Definition: a mini database management system that deals with managing storage space on the device, interfacing with external sources of data, and matching items in the database.
- What type and amount of storage needed to save the required information before being pushed to an external location (cloud, physical server)?
- In what structure/format should the data be saved?
- How are reference points going to be matched to existing reference points?
Here a matrix represents a 360° view of the reference point in the room.
Objective: consider using a neural network to detect the image/reference point.
Approach: for every reference point there is a neural network, trained using that reference point's image and its variations.
When a reference point needs to be identified, it is passed through all of the neural networks, and the model with the highest score is chosen. Because this is a classification problem, this score would be a weighted linear combination of the networks' outputs. A suggested weighting is each network's accuracy obtained during training.
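The selection scheme described above can be sketched as follows; the networks here are stand-in callables and the accuracies are placeholder values, not trained models:

```python
def best_match(feature_vector, networks, training_accuracies):
    """Score a candidate image against one network per stored
    reference point and pick the highest accuracy-weighted score
    (a sketch of the proposed matching scheme).

    networks            -- dict: reference-point name -> scoring callable
    training_accuracies -- dict: reference-point name -> accuracy weight
    """
    weighted = {
        name: training_accuracies[name] * net(feature_vector)
        for name, net in networks.items()
    }
    return max(weighted, key=weighted.get)

# Placeholder "networks" returning fixed scores for illustration.
nets = {"door": lambda v: 0.9, "window": lambda v: 0.8}
accuracies = {"door": 0.5, "window": 0.95}
match = best_match(None, nets, accuracies)  # weighting flips the winner
```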
Issues: if the only data available for comparison is the reference point itself, what do we use as negative examples? In other words, if all the neural network sees during training is 'yes', it is likely to predict 'yes' for any other input as well.
A possible solution would be a single multi-class neural network; however, this approach implies slower training and introduces the curse of dimensionality.
Conclusion: a neural network is not a good solution, since we do not have all possible combinations of our reference point data space. Reference points are unique to a user, and similar images may map to the same class. In summary, we cannot determine what a reference point is not, so we cannot appropriately train the network.
Going forward: analyze other algorithms and their feasibility for this solution. (2-Nov)
Verify the conclusion with a subject matter expert. (2-Nov)
Using the information outlined above, research was conducted and is presented here: Data Storage Processing
Post-Processing
Definition: a subsystem to identify the exact reference point from the given matching picture, and calculate the XYZ location of the reference point relative to the current position.
Description: perform triangulation to determine how far the user is from the reference point, determine the angular offset between the current facing and the reference point, and compute the final X, Y, Z offset of the reference point.
- What data should be stored in data storage processing and what data should be sent to the output stage?
- How are we going to use information from the sensors or data storage processing to provide richer feedback or calculate proper 3D audio parameters?
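The angular-offset step described above can be sketched as follows, assuming (for illustration only) that headings are measured in degrees clockwise from the +Y axis:

```python
import math

def angular_offset_deg(user_heading_deg, user_xy, rp_xy):
    """Angular offset between the user's current facing and the
    bearing to the reference point, normalized to -180 .. 180
    degrees (a sketch; the heading convention is an assumption)."""
    dx = rp_xy[0] - user_xy[0]
    dy = rp_xy[1] - user_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))           # bearing to RP
    return (bearing - user_heading_deg + 180) % 360 - 180

# A reference point due "east" of a "north"-facing user is +90 degrees.
offset = angular_offset_deg(0.0, (0.0, 0.0), (5.0, 0.0))
```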
Using the information outlined above, research was conducted and is presented here: Post-Processing
User and Device Triggered Output
Definition: feedback requested by the user for localization or orientation purposes, or requested by the device for obstacle avoidance purposes.
Description: provides meaningful feedback to the user that allows them to understand their location with little to no training.
- What kind of output can be requested? Orientation, localization?
- What kind of output is provided constantly? Orientation, localization, obstacle avoidance?
Answers: User-triggered output: the user triggers output with the click of a button, and the response is audio. Chirping, a waterfall, or any other audio emanates for a short duration, then fades until triggered again.
Device-triggered output: output generated by the device where user input is not required, used mainly for obstacle avoidance feedback. Feedback is delivered as beeps (less annoying than words) at two frequencies: frontal and lateral.
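The two-frequency beep scheme can be sketched as follows; the specific frequencies, duration, and sample rate are placeholders, since the plan above only specifies that frontal and lateral obstacles use distinct frequencies:

```python
import math

# Placeholder frequencies -- real values would come from user testing.
FRONTAL_HZ = 880.0
LATERAL_HZ = 440.0

def obstacle_beep(direction, duration_s=0.1, sample_rate=8000):
    """Sketch of a device-triggered obstacle beep: a short sine tone
    whose frequency encodes whether the obstacle is frontal or
    lateral. Returns raw samples in [-1, 1] for a DAC/audio library."""
    freq = FRONTAL_HZ if direction == "frontal" else LATERAL_HZ
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
```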
Using the information outlined above, research was conducted and is presented here: Triggered Output
Housing
Definition: holds components together and enhances the user interface while ensuring it can be used multiple times.
Testing Approach
CAD Models: create CAD models to determine scale and design for the glasses. Use Thingiverse to draw inspiration for the housing (VR headset).
3D Printing: make 3D prints of the final CAD models (to scale).
Laser Cut: laser-cut a cardboard VR headset to determine weight/size requirements.
Using the information outlined above, research was conducted and is presented here: Housing
Bill of Materials (BOM)
Design Review Materials
Include links to:
Plans for next phase
Individual 3-week plan:
| Name | Role | Individual 3-Week Plan |
| --- | --- | --- |
| Deepti Chintalapudi | Project Manager | Phase(I) Three-Week-Plan |
| Suhail Prasathong | Team Lead | Phase(I) Three-Week-Plan |
| AbdulAziz Alorifi | Team Facilitator | Phase(I) Three-Week-Plan |
| Josh Drezner | Purchasing | Phase(I) Three-Week-Plan |
| Stuart Burtner | Engineer | Phase(I) Three-Week-Plan |
| Eronmonsele Omiyi (EJ) | Engineer | Phase(I) Three-Week-Plan |
- As an individual on the team, what are you doing to help your team achieve these goals? (Use the individual 3-week plan template for this)