A mobile virtual assistant can be defined as a software agent on a handheld mobile device. The main function of virtual assistants such as Cortana, Siri and Google Now is to give device users a single point of entry: using voice input, they can accomplish different tasks across applications (such as searching and messaging). Progress in artificial intelligence (AI), natural language processing and neural networks has made it easier for smartphone users to accomplish many types of tasks simply by giving voice instructions. As noted by Webster et al. (2014), such software agents can answer users' questions and provide recommendations for specific tasks. In addition, they can perform actions in response to a user's voice request, interpreting the request using natural language processing.
Virtual assistants are developed mainly on the principles of natural language processing. These software programs use algorithms to identify specific components of the sentence given to the assistant as input and then match them with the most suitable output for the user's query or request (O'Sullivan & Grigoras, 2013). The agent continually collects contextual information about its user in order to manage and plan tasks for that user. Virtual assistants mainly use semantic web technologies as well as open data available on the Internet.
The contextual information about the user includes the calendar, location, time, appointments with other individuals, relations among different tasks and their decomposition, the history of tasks, and the user's specific interests for different tasks.
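The matching step described above can be illustrated with a deliberately simplified sketch. It assumes a hypothetical catalogue of intents (`send_message`, `get_weather`, `set_reminder`), each described by a few keywords, and picks the intent whose keywords overlap most with the words of the request. Real assistants use trained statistical or neural language-understanding models rather than keyword overlap; this only shows the idea of mapping sentence components to the most suitable output.

```python
from __future__ import annotations
import re

# Hypothetical intent catalogue: each intent is described by a set of keywords.
INTENTS = {
    "send_message": {"send", "text", "message"},
    "get_weather": {"weather", "forecast", "rain", "temperature"},
    "set_reminder": {"remind", "reminder"},
}

def tokenize(utterance: str) -> set[str]:
    """Lower-case the utterance and split it into word tokens."""
    return set(re.findall(r"[a-z']+", utterance.lower()))

def match_intent(utterance: str) -> str | None:
    """Return the intent whose keywords overlap most with the utterance, or None."""
    tokens = tokenize(utterance)
    best_intent, best_score = None, 0
    for intent, keywords in INTENTS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

print(match_intent("Will it rain tomorrow? What's the weather?"))  # get_weather
```

A request that shares no keywords with any intent, such as "Play some music", returns `None`, which a real assistant would have to handle with a fallback response.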
Mobile virtual assistants
Mobile virtual assistants are helpful for many types of users because they can accomplish many types of tasks. They are becoming more popular because of their simplicity and wide range of capabilities (Webster et al., 2014). They can be useful for people who need to do things more easily and quickly through a single interface, and for users who find it hard to type or search on small smartphone keyboards or touchpads. They can likewise help elderly people and children, who can simply say a command and get the desired outcome instead of switching between different applications or scrolling through the contacts list to find a particular name when they need to make a call. In addition, the application can work disconnected (without internet connectivity).
Emergence of Mobile virtual assistants
Mobile virtual assistants were developed to simplify users' lives by reducing the effort needed to complete routine tasks such as placing a call or texting someone (O'Sullivan & Grigoras, 2013). They can also retrieve real-time data about the weather, street traffic and news, remotely control smart home devices, request product purchases from online stores, and communicate with other applications (for instance, booking a cab through a cab booking application and interacting with other applications).
Technology and concept behind its functionality
Every virtual assistant used by smartphone users relies mainly on artificial intelligence to parse the instruction the user has spoken or typed, and then returns useful information in response to the query (Webster et al., 2014). For this AI component, the main objective is recognizing the speech of the device's user. The user-related data collected in this way are very helpful in modeling the behavior of a specific user with a high level of accuracy (including the user's travel, habits, and search or topic preferences). This is considered one of the great strengths that allow service providers such as Google and Apple to return impressively useful results based on users' habits and preferences (O'Sullivan & Grigoras, 2013), and it is achieved without sniffing into users' email or dipping into their search history on the device. A thorough voice-search history also lets these companies learn each user's vocal idiosyncrasies, so that the virtual assistants better understand the user's spoken requests in the future.
Mobile virtual assistants mainly use three components: voice recognition, text-to-speech conversion and a natural language processing engine (Webster et al., 2014). There is no predefined format for requesting these agents to do something (complete a task) or to answer a user's question.
Device to get input: Smartphones, the Echo Dot or other relevant devices give users a speech-based user interface. Through this interface, the user issues voice commands or makes inquiries, and the device responds to the query in speech (Chung et al., 2017). The devices are always listening for user commands, and only when they recognize a wake word such as “Alexa” or “Ok Google” do they transmit the audio file to the respective cloud service for processing.
Cloud services to send and receive the user data: The respective cloud service receives the audio file from the device and processes the command using speech recognition, mapping voice commands to actions and information customized for the user. Based on the identified intent, the cloud service dispatches the request to the appropriate skill server, operated either by Google/Amazon for built-in skills or by a third party for third-party application skills.
Figure 1: processing of user commands by Virtual mobile assistants
(Source: Created by Author using Visio)
Skill servers for processing of the user input/feature extraction: The skill servers are responsible for completing the tasks that the Google/Alexa servers may perform in response to a user's command, for example checking the weather or ordering a pizza from a store. Some basic tasks are offered and completed by Google/Amazon as built-in functionality for all users, and these are handled by the providers' own servers.
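The device, cloud and skill-server stages above can be sketched as a small Python pipeline. This is a simulation under stated assumptions, not how Google or Amazon implement it: text transcripts stand in for the audio a real device would upload, speech recognition is skipped, and the skill registry, skill names and keyword routing are hypothetical.

```python
from __future__ import annotations
from typing import Callable

WAKE_WORDS = ("alexa", "ok google")

# Hypothetical skills. In reality built-in skills run on the provider's servers and
# third-party skills on the developer's servers; here both are local functions.
def weather_skill(_command: str) -> str:
    return "Fetching the weather forecast."

def pizza_skill(_command: str) -> str:
    return "Placing a pizza order with the store."

# Keyword -> (who operates the skill, handler)
SKILLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "weather": ("built-in", weather_skill),
    "pizza": ("third-party", pizza_skill),
}

def device_listen(transcript: str) -> str | None:
    """Device stage: forward the request only if it starts with a wake word."""
    lowered = transcript.lower()
    for wake in WAKE_WORDS:
        if lowered.startswith(wake):
            return lowered[len(wake):].strip(" ,")
    return None  # nothing is sent to the cloud

def cloud_route(command: str) -> tuple[str, str]:
    """Cloud stage: map the command to a skill and dispatch it."""
    for keyword, (owner, handler) in SKILLS.items():
        if keyword in command:
            return owner, handler(command)
    return "built-in", "Sorry, I can't help with that yet."

def handle(transcript: str) -> tuple[str, str] | None:
    command = device_listen(transcript)
    if command is None:
        return None
    return cloud_route(command)
```

In this sketch, `handle("Alexa, what's the weather like?")` returns `("built-in", "Fetching the weather forecast.")`, while the same request without a wake word returns `None`, mirroring a device that sends nothing to the cloud until it hears the keyword.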
The general arrangement of components
Voice recognition: Voice recognition is the first and seemingly simplest stage of the whole process. When a user gives the virtual agent a command, the device captures the user's analog voice signal, converts it into a digital audio file (binary data that can be processed quickly) and sends it to the processing back-end servers (Chung et al., 2017). The subtleties of the user's voice, background noise accompanying the command and regional accents make it hard to interpret the command correctly. This mode of interaction is called a human user interface, in contrast to the standard graphical user interface we are used to. Notably, Google and Apple continually gather large numbers of queries from people speaking different languages, in various accents, from all corners of the world. In effect, through their activities and the errors in their voice commands, individual users contribute to improving the virtual agents that different organizations develop to make their products easier to use. Mobile virtual assistants today receive about a billion queries every week, and the back-end processing is able to respond to them with only a 5 percent error rate (Nguyen & Sidorova, 2017). Apple, Google and Microsoft are still working on their algorithms and AI to improve performance and user experience.
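The source does not say how the 5 percent error rate is measured. A common metric for speech recognition accuracy is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. A minimal sketch, assuming whitespace-separated words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein (edit) distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words and first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, recognizing "call tom on her mobile" when the user said "call mom on her mobile" is one substitution in five words, a WER of 0.2; a 5 percent rate corresponds to roughly one error in every twenty words.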
Most of the mobile virtual assistants follow a similar architecture in order to interact with the user and complete different tasks.
As argued in research that proposed a model for measuring the quality of experience and usability of mobile virtual personal assistants, user experience needs to be the most important consideration in the design and development of virtual mobile assistants. The researchers found that visual interaction is another component that must be considered an important part of virtual mobile assistants, since it is missing from virtual assistants when contrasted with a human assistant. It is worth asking whether the visual aspect should be explored in order to enhance the quality of interaction with virtual assistants (Webster et al., 2014). However, several aspects of existing virtual mobile assistants are known limitations. First, developing an ideal tone and pitch to convey the desired effect of the voice is important. Furthermore, imprecise identification of commands given in different languages and accents limits users' reliance on, and use of, a virtual mobile assistant.
In addition, McTear, Callejas and Griol (2016) argued that any assessment of a mobile virtual assistant framework must critically examine user satisfaction during interaction with the assistant (O'Sullivan & Grigoras, 2013). Multimodal interfaces combine visual and auditory signals to enrich interactions; however, they raise new challenges concerning the usability and effectiveness of such interfaces.
In the FASiL (“Flexible and Adaptive Spoken Language and Multimodal Interfaces”) project, a virtual assistant framework was created to understand which factors influence the user experience and the acceptance of multimodal services. The results show that a conversational and multimodal approach was very well accepted and supported by users (Chung et al., 2017). Moreover, the quality and speed of the system's feedback, as well as the recognition accuracy of the spoken parts, are key factors in better user interaction.
Researchers have found that the level of user satisfaction depends on the extent of interactivity, the sense of presence of the virtual assistant, the degree of reliability, the degree of dependency, the extent of focused attention, ease of use, the extent of satisfaction and the extent to which users' expectations are met (Webster et al., 2014).
A virtual personal assistant (VPA) is expected to support and improve the following features:
Informing the user: Mobile professionals can manage voice mail, email and faxes using their VPA while on the road.
Managing contacts: Scheduling, planning, group calendars, and contact and referral organization can all be handled using the virtual mobile assistant.
Call control: Virtual assistants enable remote users to perform conference calling and call management. Alerting and forwarding features let remote users be notified immediately when they receive particular phone messages, other messages, faxes or pages (Nguyen & Sidorova, 2017).
Web applications: Virtual mobile assistants let users access and interact with the web to source information ranging from weather, directions and schedules to stock performance, competitive information and news, all using simple, conversational voice commands.
Security and privacy issues
In most cases, the virtual assistant sends the user's queries or requests to the main server, which parses and processes the command or request. After processing, the appropriate response is sent back to the user's device. When a user's request is sent to the main server for processing, it is sent with a unique device ID so that the response can be returned to the right device. Although these device IDs are not linked to any specific user, Apple stores users' Siri requests together with the corresponding device IDs for at least six months (O'Sullivan & Grigoras, 2013). In addition, after six months, when the IDs are deleted, Apple keeps the audio clips of the requests for a further 18 months. Thus, if the company's main servers or IT infrastructure were compromised, these stored requests could be exposed.
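The retention timeline described above (requests stored with the device ID for six months, then audio kept without the ID for a further 18 months, 24 months in total) can be modeled as a simple policy function. This is a hypothetical sketch of the timeline only; the `StoredRequest` type and the month-based boundaries are assumptions, not Apple's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

ID_RETENTION_MONTHS = 6   # request kept together with its device ID
AUDIO_EXTRA_MONTHS = 18   # audio kept after the device ID has been removed

@dataclass(frozen=True)
class StoredRequest:       # hypothetical record type
    device_id: Optional[str]  # None once the ID has been dissociated
    audio: Optional[bytes]    # None once the clip has been deleted

def apply_retention(req: StoredRequest, age_months: int) -> StoredRequest:
    """Return what remains of a stored request after `age_months` months."""
    if age_months < ID_RETENTION_MONTHS:
        return req  # months 0-5: ID and audio both kept
    if age_months < ID_RETENTION_MONTHS + AUDIO_EXTRA_MONTHS:
        return StoredRequest(device_id=None, audio=req.audio)  # months 6-23: audio only
    return StoredRequest(device_id=None, audio=None)  # month 24 onward: nothing kept
```

The point the sketch makes concrete is that, for most of the two-year window, the audio survives even though the link to a device has gone, so a server breach during that period would still expose recordings.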
In most cases, mobile virtual assistants are found to respond to indirect speech, that is, speech not spoken live by the user. They execute commands delivered through a voice message or voice recording. In addition, they can execute the same command more than once, even when the commands are played from the same digital sound file (Coskun-Setirek & Mardikyan, 2017). This happens because the assistant cannot distinguish between a recorded voice and the natural voice of a user. These assistants are also able to execute commands that pass through obstructions.
Virtual mobile assistants are also vulnerable to various hacking attacks, which can let an attacker take control of a user's Amazon Echo. After gaining control of the Echo, attackers can turn it into a personal eavesdropping microphone to spy on the user. It has also been found that virtual assistants such as Google Assistant and Apple's Siri can be hijacked using inaudible sounds (sounds too high in frequency for humans to hear, i.e. above 20,000 Hz).
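One published mechanism behind such inaudible-command attacks is to amplitude-modulate a voice command onto an ultrasonic carrier and rely on the non-linear response of the microphone hardware to shift it back into the audible band. The sketch below illustrates the principle under simplifying assumptions: a 1 kHz tone stands in for the spoken command, the carrier is 25 kHz, and the microphone's non-linearity is idealized as a pure squared term. It uses only the standard library and measures single DFT bins directly.

```python
from __future__ import annotations
import math

FS = 192_000         # sampling rate (Hz), high enough to represent the carrier
N = 19_200           # number of samples (0.1 s at FS)
CARRIER_HZ = 25_000  # ultrasonic carrier, above the ~20 kHz limit of human hearing
TONE_HZ = 1_000      # audible tone standing in for the spoken command

def transmitted_sample(n: int) -> float:
    """Audible tone amplitude-modulated onto the ultrasonic carrier."""
    t = n / FS
    message = 0.5 * math.cos(2 * math.pi * TONE_HZ * t)
    return (1 + message) * math.cos(2 * math.pi * CARRIER_HZ * t)

def amplitude_at(signal: list[float], freq_hz: float) -> float:
    """Amplitude of one frequency component, computed from a single DFT bin."""
    w = 2 * math.pi * freq_hz / FS
    real = sum(x * math.cos(w * k) for k, x in enumerate(signal))
    imag = sum(x * math.sin(w * k) for k, x in enumerate(signal))
    return 2 * math.hypot(real, imag) / len(signal)

transmitted = [transmitted_sample(n) for n in range(N)]
# Idealized microphone non-linearity: the recorded signal contains a squared term.
recorded = [x * x for x in transmitted]

print(round(amplitude_at(transmitted, TONE_HZ), 6))  # 0.0: nothing at 1 kHz in the air
print(round(amplitude_at(recorded, TONE_HZ), 6))     # 0.5: the tone reappears
```

The transmitted signal has no energy at 1 kHz (its components sit at 24, 25 and 26 kHz), yet after the squaring step the 1 kHz tone reappears, which is why a microphone can pick up a command that a person cannot hear.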
Researchers from Zhejiang University were able to hack and take control of Siri, the virtual assistant used on MacBooks and iPhones; Cortana, used on Windows PCs; Bixby on Samsung phones; and Amazon Echo speakers (Coskun-Setirek & Mardikyan, 2017). Although exploiting the vulnerabilities of these mobile assistants requires high-level knowledge of embedded systems and the Linux operating system, once control over these tools has been gained, the attacker can operate them on the user's behalf.
Again, virtual mobile assistants cannot differentiate users by their voice. It is therefore possible for other people to order or command anything through the virtual assistant. As a result, unauthorized and unauthenticated users can place orders and give commands to the assistant.
Early this year, while the California TV channel CW-6 was reporting on a child's order of a dollhouse and some cookies through Alexa, the Alexa devices in viewers' homes picked up the anchor's words as a command from their own users and ordered dollhouses, resulting in a large number of dollhouse orders.
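One mitigation for the problem described above is speaker verification: compare a voice embedding of the incoming request with the enrolled owner's embedding, and execute sensitive commands such as purchases only when the two are similar enough. A minimal sketch, assuming the embeddings come from some trained speaker model (not shown); the three-dimensional vectors, the 0.8 threshold and the `SENSITIVE_INTENTS` set are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence

SENSITIVE_INTENTS = {"purchase", "unlock_door"}  # hypothetical intent names

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def authorize(intent: str, embedding: Sequence[float],
              enrolled: Sequence[float], threshold: float = 0.8) -> bool:
    """Allow non-sensitive intents; require a voice match for sensitive ones."""
    if intent not in SENSITIVE_INTENTS:
        return True
    return cosine_similarity(embedding, enrolled) >= threshold

enrolled_owner = [0.9, 0.1, 0.3]   # made-up embedding of the device owner
owner_request = [0.85, 0.15, 0.28]
tv_anchor = [0.1, 0.9, 0.2]
print(authorize("purchase", owner_request, enrolled_owner))  # True
print(authorize("purchase", tv_anchor, enrolled_owner))      # False
print(authorize("get_weather", tv_anchor, enrolled_owner))   # True
```

On its own this does not stop the replay attacks described earlier, since a recording of the owner's voice would still match the enrolled embedding.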
Reasons behind its popularity
The following are the reasons behind the success of these virtual mobile assistants:
Least effort required from the user to complete a task: The main objective of virtual mobile assistants is to make the user's life easier. A key prerequisite is that the service does not take much of the user's time (Webster et al., 2014). Using a virtual assistant does not burden the user with programming scripts of any kind, nor does it require acquiring specialized knowledge about the system. To achieve this, the system should learn autonomously from the user's input.
Supervision by the users: One of the most important features of virtual assistants is that the user retains control over the assistant's behavior.
Adaptive: The system adapts quickly to the user's changing preferences and needs.
Benefits of Virtual mobile assistant
A mobile virtual assistant is considered a combination of natural language processing and artificial intelligence. With these two tools, virtual assistants can process large volumes of information and answer or resolve user queries with a precision rate above 90 percent.
In other words, a mobile virtual assistant is a customized, computerized digital character that provides information and helps its users complete tasks (Chung et al., 2017). Such an assistant supports users' inquiries on topics such as company details, including financial and investor-related information, product portfolios and technology-related information. These assistants provide better customer engagement than customer service executives.
These virtual assistants have made people's lives easier, since users can send an instant message or email by giving voice commands to their assistants, check their schedule for a given day, use different applications installed on their devices, and turn settings on and off as needed (Coskun-Setirek & Mardikyan, 2017). Starting with Siri in 2011, various assistants are now available for different platforms (such as Google Assistant for Android devices and Bixby for Samsung devices).
Most virtual assistants need a continuous, uninterrupted internet connection, because users' commands are sent to remote servers (in Siri's case, Apple's servers) for processing so that the assistant can give the user an appropriate response (O'Sullivan & Grigoras, 2013). If the user does not have a stable internet connection, Siri and other virtual assistants cannot respond to the user's request.
Issues impacting its future
Different types of attack can intercept the data packets exchanged between the mobile assistant and other services. The intercepted packets can then be analyzed to recover sensitive data sent over the transmission medium (Coskun-Setirek & Mardikyan, 2017). In the worst case, eavesdroppers or hackers may be able to recover an individual's personal details and the payment information used for purchases from the transmitted packets.
Possible future of this technology
Despite the limitations and issues described above, given the current pace of research and development in this field, future mobile virtual assistants may continually prompt users with suggestions for specific tasks while taking commands and instructions from the device's user. In doing so, a virtual mobile assistant may come to know more about its users than perhaps they know about themselves, by analyzing different aspects of their behavior (Webster et al., 2014). Thus, future mobile virtual assistants may not be separate and distinct tools; instead, they may be embodied and unified as one entity. Ongoing developments in speech recognition and natural language processing will help assistants better understand users' commands, and these improvements will in turn strengthen their adaptive learning.
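Interception of this kind is usually countered by encrypting the channel with TLS and verifying the server's certificate, so that captured packets cannot be read or silently altered. The source does not describe the transport used by any particular assistant; the sketch below only shows a client that refuses unverified connections, using Python's standard `ssl` module. The host name `assistant.example.com` is a placeholder.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    """TLS client context that verifies certificates and host names."""
    context = ssl.create_default_context()             # CERT_REQUIRED + hostname checking
    context.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse older protocol versions
    return context

def open_secure_channel(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection; raises ssl.SSLError if verification fails."""
    context = make_client_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the plain TCP socket on handshake failure
        raise

# Example (requires network access; placeholder host and payload):
# with open_secure_channel("assistant.example.com") as conn:
#     conn.sendall(b"...")
```

Using `create_default_context()` rather than a bare `SSLContext` keeps certificate and hostname verification switched on, which is what prevents a man-in-the-middle from reading the traffic.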
According to the US-based media and analytics company comScore, there will be 200 billion searches by 2020, of which 50 billion will be voice-based. In the future, online platforms will have a greater need to collect and harness user information captured at various points, for example by tracking user activities at different times, product and other preferences, and so on (O'Sullivan & Grigoras, 2013). This information, gathered across different times and user activities, could be used in the AI's learning process and to take preventive measures against unwanted errors when executing user commands. For instance, the AI behind a virtual assistant could use these data when the user purchases products: analyzing the user's data would make purchasing easier by sorting products in online stores according to the preferences collected and analyzed from those data (Bellegarda, 2014). Mobile virtual assistants could also be used in other capacities, for example to support users' decision making, where the assistant could give automated answers to other recipients, which would help reduce costs and improve responsiveness.
Currently, augmented-reality-based mobile virtual assistants are being developed that will be capable of answering queries and completing other related tasks requested by their users.
Developing virtual assistants that act in augmented reality requires an intuitive and unobtrusive multimodal interface for the user. The device's user provides voice commands to the assistant (Coskun-Setirek & Mardikyan, 2017). Voice recognition is used to extract features from the given command, and in the response phase a text-to-speech process delivers the response to the user. The virtual assistant selects objects in the user's real environment using a gaze-based approach, while the user's location is determined using the device's GPS. The viewing direction is determined by the device's orientation sensor.
One of the best features of mobile virtual assistants is that they can learn and adapt to different situations without human supervision. They enable users of smartphones and tablets to speak natural language as voice commands for operating applications and other services, which also helps users operate the device and its applications to complete different tasks. Users can speak commands and receive audible confirmation responses from the agent in order to send messages, set reminders, call someone, or operate music and video applications on their phones and tablets.
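The product-sorting idea above can be sketched as ranking catalogue items by how often the user has bought from each category. This is a hypothetical illustration: the purchase history, product records and scoring rule (share of past purchases per category) are assumptions, and a real recommender would use far richer signals.

```python
from __future__ import annotations
from collections import Counter

def preference_scores(purchase_history: list[str]) -> dict[str, float]:
    """Share of the user's past purchases that fell into each category."""
    counts = Counter(purchase_history)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def rank_products(products: list[dict], purchase_history: list[str]) -> list[dict]:
    """Sort products so categories the user buys most often come first.
    Unseen categories score 0; ties keep catalogue order because sorted() is stable."""
    scores = preference_scores(purchase_history)
    return sorted(products, key=lambda p: scores.get(p["category"], 0.0), reverse=True)

history = ["books", "books", "electronics", "books"]  # hypothetical purchase log
catalogue = [
    {"name": "Headphones", "category": "electronics"},
    {"name": "Blender", "category": "kitchen"},
    {"name": "Novel", "category": "books"},
]
print([p["name"] for p in rank_products(catalogue, history)])  # ['Novel', 'Headphones', 'Blender']
```

Recomputing the scores after each purchase is the simplest way to let the ordering follow the user's changing preferences.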
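The location and orientation steps above can be combined in a simple way: compute the compass bearing from the user's GPS position to each known object, then select the object whose bearing is closest to the direction the device is facing. This sketch approximates gaze by the device heading from the orientation sensor; the object list, coordinates and 30-degree field of view are hypothetical, and the bearing uses the standard great-circle initial-bearing formula.

```python
from __future__ import annotations
import math

def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle initial bearing from point 1 to point 2, in degrees (0 = north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two compass angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def select_object(user_lat: float, user_lon: float, heading: float,
                  objects: dict[str, tuple[float, float]], fov: float = 30.0) -> str | None:
    """Pick the object closest to the viewing direction, within half the field of view."""
    best, best_diff = None, fov / 2
    for name, (lat, lon) in objects.items():
        diff = angular_difference(heading, initial_bearing(user_lat, user_lon, lat, lon))
        if diff <= best_diff:
            best, best_diff = name, diff
    return best

landmarks = {"cafe": (0.0, 0.001), "museum": (0.001, 0.0)}  # hypothetical objects
print(select_object(0.0, 0.0, 85.0, landmarks))  # cafe
```

When nothing lies within the field of view the function returns `None`, so the assistant can ask the user to clarify instead of guessing.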
References
Bellegarda, J. R. (2014). Spoken language understanding for natural interaction: The Siri experience. In Natural Interaction with Robots, Knowbots and Smartphones (pp. 3-14). Springer, New York, NY.
Canbek, N. G., & Mutlu, M. E. (2016). On the track of Artificial Intelligence: Learning with intelligent personal assistants. Journal of Human Sciences, 13(1), 592-601.
Causey III, J. D., Purvis, R. E., & Henke, J. L. (2013). U.S. Patent No. 8,579,813. Washington, DC: U.S. Patent and Trademark Office.
Chung, H., Iorga, M., Voas, J., & Lee, S. (2017). Alexa, Can I Trust You? Computer, 50(9), 100-104.
Coskun-Setirek, A., & Mardikyan, S. (2017). Understanding the Adoption of Voice Activated Personal Assistants. International Journal of E-Services and Mobile Applications (IJESMA), 9(3), 1-21.
McTear, M., Callejas, Z., & Griol, D. (2016). The Dawn of the Conversational Interface. In The Conversational Interface (pp. 11-24). Springer International Publishing.
Nguyen, Q. N., & Sidorova, A. (2017). AI capabilities and user experiences: a comparative study of user reviews for assistant and non-assistant mobile apps.
O'Sullivan, M. J., & Grigoras, D. (2013, March). The cloud personal assistant for providing services to mobile clients. In Service Oriented System Engineering (SOSE), 2013 IEEE 7th International Symposium on (pp. 478-485). IEEE.
Saad, U., Afzal, U., El-Issawi, A., & Eid, M. (2017). A model to measure QoE for virtual personal assistant. Multimedia Tools and Applications, 76(10), 12517-12537.
Santos, J., Rodrigues, J. J., Casal, J., Saleem, K., & Denisov, V. (2016). Intelligent personal assistants based on internet of things approaches. IEEE Systems Journal.
Sarda, S., Shah, Y., Das, M., Saibewar, N., & Patil, S. (2017). VPA: Virtual Personal Assistant. International Journal of Computer Applications, 165(1).
Webster, M., Dixon, C., Fisher, M., Salem, M., Saunders, J., Koay, K. L., & Dautenhahn, K. (2014). Formal verification of an autonomous personal robotic assistant. Proc. AAAI FVHMS, 74-79.