Introduction

In our daily life, many appliances are operated with remote controls. Elderly and disabled people have difficulty telling the remote controls of different devices apart, since most of them have the same shape and size. People with visual impairments, for example, must first work out which remote control belongs to which home appliance and then locate the target device among the appliances grouped around it. After reviewing numerous documents, I found that this problem can be addressed with devices controlled through gesture and voice recognition.

A solution that comes to everyone's mind is the smart home. It is a great concept to implement, but can you imagine what the cost would be? It is a great innovation, but it will remain largely limited to an elite group. Therefore, if we want a real transformation, we must make it available and affordable to all economic groups.

There are many possible solutions to this problem, yet these types of houses are still not very popular. The reason is simple: cost. Instead of creating a new product, we can make the solution much cheaper by using existing resources. Nowadays, cell phones are the most common gadgets we have at home, and we can make good use of this resource since every phone is already equipped with a microphone.

Literature Survey

Personalized Speech Recognition on Mobile Devices, Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada. In this article, the authors set out to build an accurate, low-latency speech recognition system with a small memory and computational footprint, so that it can run fast on Android devices.
The authors quantized a long short-term memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to predict target phonemes directly, and reduced its memory footprint further with an SVD-based compression scheme. The basic building blocks are quantized deep neural networks (DNNs) and on-the-fly language model rescoring, which together achieve real-time performance on modern smartphones. Under the constraints of a small memory footprint and limited computation, the word error rate (WER) and latency results obtained with LSTM recurrent neural networks (RNNs), trained with CTC and state-level minimum Bayes risk (sMBR) techniques, are valuable and highly accurate. The LSTMs are made small and fast enough by quantizing the parameters to 8 bits, using context-independent (CI) phone outputs instead of the more numerous context-dependent (CD) phone outputs, and applying singular value decomposition (SVD) compression.

The acoustic models are trained on 3 million manually transcribed, anonymized utterances extracted from Google voice search traffic (approximately 2,000 hours). All models are trained using asynchronous distributed stochastic gradient descent (ASGD). To improve robustness to noise, the authors generated "multi-style" training data by distorting each training utterance with a room simulator and a virtual noise source, producing 20 distorted versions of each utterance. The noise samples were mined from YouTube videos and environmental recordings of everyday events. To further reduce memory consumption, they compressed the acoustic models using projection layers that sit between the outputs of an LSTM layer and both the recurrent and non-recurrent inputs to the same and subsequent layers. Adapting the acoustic models with the multi-style training data described above yields an additional 12.8% relative improvement over the SVD-compressed model.
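To make the idea of 8-bit parameter quantization more concrete, the following is a minimal sketch of uniform linear quantization in plain Python. It is an illustration only: the exact quantization scheme used by the authors may differ, and the function names and the sample weights are assumptions introduced here.

```python
from array import array

def quantize_8bit(weights):
    """Map floats to integer codes in 0..255, plus the (scale, minimum) needed to undo it.

    Illustrative uniform quantizer; not necessarily the scheme used in the article.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = array("B", (min(255, max(0, round((w - lo) / scale))) for w in weights))
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float values from the 8-bit codes."""
    return [lo + c * scale for c in codes]

# Made-up sample weights, stored as 32-bit floats.
weights = array("f", [-0.42, 0.0, 0.13, 0.37, 0.91])
codes, scale, lo = quantize_8bit(weights)
restored = dequantize(codes, scale, lo)

float_bytes = weights.itemsize * len(weights)  # 4 bytes per weight
int8_bytes = codes.itemsize * len(codes)       # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each 32-bit float is replaced by a single byte, so the stored parameters shrink to a quarter of their size, while the rounding error per weight stays within half a quantization step.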
The 11.9 MB floating-point neural network acoustic model consumes a significant portion of memory and processing time. Quantizing the model parameters into an 8-bit integer representation had an immediate impact on memory usage, reducing the footprint of the acoustic model to roughly a quarter of its original size; the final footprint of the acoustic model was 3 MB.

For on-device language modeling, the focus is on building a compact language model for dictation and voice commands. To keep the system footprint small, the authors trained a single model for both domains and limited the vocabulary size to 64K. The language models are trained on unsupervised speech logs from the dictation domain (∼100 million utterances) and the voice command domain (∼2 million utterances).

The resulting compact, large-vocabulary speech recognition system works efficiently on mobile devices, accurately and with low latency. This was achieved with a CTC-based LSTM acoustic model that predicts context-independent phones and is compressed to one-tenth of its original size through a combination of SVD-based compression and quantization. For efficient decoding, the authors use an on-the-fly rescoring strategy followed by further optimizations for CTC models that reduce computation and memory usage. Combining these techniques yields a system that runs 7x faster than real time on a Nexus 5, with a total system footprint of 20.3 MB.

Remote control system of home appliances using voice recognition, Noriyuki Kawarazaki and Tadashi Yoshidome. In this article, the authors develop a remote control system for electrical devices based on voice recognition. The remote control system consists of the PMRC, a PC, a microphone and a speaker. The PMRC is a programmable multiple remote control that stores the functions of many remote controls, so a single device can do the work of several remote controls.
It has infrared LEDs mounted all around it, which send infrared signals in every direction, so the user does not have to worry about where the PMRC is placed. When a user gives a voice command to the system, the PMRC sends the infrared signal to the home appliance, and the system then plays a text-to-speech message back to the operator. To keep the interface user-friendly, many of the voice commands for remote control operations are sentence-based. The system therefore uses voice recognition software and morphological analysis software to recognize sentence-based voice commands. "Julius" is a well-known free speech recognition engine used by researchers. "MeCab" segments Japanese sentences into sequences of morphemes and analyzes them by part of speech. The average speech recognition rate is 60%; the errors are caused by failures to understand sentence-based commands. In the future, numerous tests will be carried out with visually impaired and elderly people.
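The command flow described above (recognized sentence, keyword extraction, infrared code, spoken confirmation) might be sketched roughly as follows. This is not the authors' implementation: simple English keyword matching stands in for the Julius and MeCab pipeline, and every name, keyword table and infrared code value below is a made-up placeholder.

```python
# Hypothetical keyword tables; a real system would use morphological analysis instead.
APPLIANCES = {"tv": "TV", "television": "TV", "light": "LIGHT", "fan": "FAN"}
ACTIONS = {"on": "POWER_ON", "off": "POWER_OFF"}

# Placeholder infrared codes, invented for illustration only.
IR_CODES = {
    ("TV", "POWER_ON"): 0x01,
    ("TV", "POWER_OFF"): 0x02,
    ("LIGHT", "POWER_ON"): 0x11,
    ("LIGHT", "POWER_OFF"): 0x12,
}

def parse_command(sentence):
    """Pick the first appliance keyword and the first action keyword in the sentence."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    appliance = next((APPLIANCES[w] for w in words if w in APPLIANCES), None)
    action = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    if appliance is None or action is None:
        return None
    return appliance, action

def handle(sentence):
    """Return (infrared code or None, confirmation message to be spoken back)."""
    parsed = parse_command(sentence)
    if parsed is None or parsed not in IR_CODES:
        return None, "Sorry, I did not understand the command."
    appliance, action = parsed
    return IR_CODES[parsed], f"Sending {action} to {appliance}."

code, message = handle("Please turn on the TV.")
unknown_code, unknown_message = handle("Open the window.")
```

Returning the confirmation text together with the code mirrors the text-to-speech feedback step, and a sentence with no recognizable appliance or action falls through to an error message instead of sending a signal.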