Using Sonar for Liveness Detection to Protect Smart Speakers against Remote Attackers

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, pp. 1-28, 2020.


Abstract:

Smart speakers, which wait for voice commands and complete tasks for users, are becoming part of common households. While voice commands came with basic functionalities in the earlier days, as the market grew, various commands with critical functionalities were developed; e.g., access banking services, send money, open front door. Such vo...

Introduction
  • Smart speakers, known as intelligent voice assistants such as Amazon Echo and Google Home, are becoming popular; as of May 2018, 54.4 million people in the U.S. own a smart speaker [20].
  • The authors make use of both techniques: active sonar is utilized for detecting the movement of users by emitting ultrasonic sound with a speaker and receiving the reflected sound with a microphone array, whereas passive sonar, similar to human ears, is used for localizing voice commands with sound source localization (SSL) techniques (a minimal sketch of the active-sonar idea follows this list).
  • In this work, the authors use SRP-PHAT-HSDA as it is relatively lightweight yet robust in performance.
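As a rough illustration of the active-sonar idea above, the following sketch measures Doppler-sideband energy around an ultrasonic probe tone in one received channel. This is a hypothetical sketch, not the authors' implementation; the 48 kHz sample rate, 20 kHz probe frequency, window sizes, and band limits are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

FS = 48_000        # assumed microphone sample rate (Hz)
PROBE_HZ = 20_000  # assumed frequency of the emitted ultrasonic probe tone (Hz)

def movement_energy(channel, guard_hz=50, band_hz=500):
    """Per-frame energy in the Doppler sidebands around the probe tone.

    A static scene reflects energy only at the probe frequency; a moving user
    spreads energy into nearby bins, so sideband energy indicates movement.
    """
    freqs, _, spec = stft(channel, fs=FS, nperseg=2048, noverlap=1024)
    offset = np.abs(freqs - PROBE_HZ)
    sideband = (offset > guard_hz) & (offset < band_hz)
    return np.abs(spec[sideband]).sum(axis=0)

# Example with a synthetic, purely static 20 kHz reflection as a stand-in for
# recorded audio; a walking user would raise the sideband energy over time.
t = np.arange(FS) / FS
static_reflection = np.sin(2 * np.pi * PROBE_HZ * t)
print(movement_energy(static_reflection))
```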
Highlights
  • Smart speakers, known as intelligent voice assistants such as Amazon Echo and Google Home, are becoming popular; as of May 2018, 54.4 million people in the U.S. own a smart speaker [20].
  • We make use of both techniques: active sonar is utilized for detecting the movement of users by emitting ultrasonic sound with a speaker and receiving the reflected sound with a microphone array, whereas passive sonar, similar to human ears, is used for localizing voice commands with sound source localization (SSL) techniques.
  • Google Home users issue an average of 5.98 voice commands daily, whereas Alexa users issue an average of 7.28.
  • We propose the Speaker-Sonar, a sonar-based defense system for smart speakers
  • Our defense system aims to protect the smart speakers from remote attackers that leverage network-connected speakers to send malicious commands
  • Small Direction Difference. This type of threat can be handled because Speaker-Sonar can distinguish small angle differences, as shown in Section 5; Speaker-Sonar can reject malicious commands with 90% accuracy at 20° and 80% at 10° from 1 m to 2 m (a minimal sketch of this consistency check follows this list).
  • Speaker-Sonar raises the bar for remote attacks and, to the best of our knowledge, is able to effectively tackle all known attack techniques that can be used to create malicious voice command attacks.
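To make the direction consistency check concrete, the sketch below compares the direction of the detected user movement with the direction of arrival of the command. It is a minimal illustration rather than the authors' implementation; the unit-vector representation of directions and the 20° default threshold (borrowed from the accuracy numbers above) are assumptions.

```python
import numpy as np

def angle_between(u, v):
    """Angle in degrees between two direction vectors."""
    u = np.asarray(u, float) / np.linalg.norm(u)
    v = np.asarray(v, float) / np.linalg.norm(v)
    return np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

def consistency_check(user_dir, command_dir, threshold_deg=20.0):
    """Accept a voice command only if its direction of arrival agrees with the
    direction of the detected user movement within the angular threshold."""
    return angle_between(user_dir, command_dir) <= threshold_deg

# Example: the command arrives roughly from where the user was detected moving.
user_direction = (0.94, 0.34, 0.0)     # hypothetical output of the active-sonar analyzer
command_direction = (0.90, 0.43, 0.0)  # hypothetical output of the passive-sonar (SSL) analyzer
print(consistency_check(user_direction, command_direction))  # True -> command accepted
```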
Methods
  • Design and Architecture

    The design of Speaker-Sonar, illustrated in Figure 2, consists of 4 modules: Spectrum Preparation, User Direction Analyzer, Command Direction Analyzer, and Direction Consistency Checker.
  • Their responsibilities are as follows: The Spectrum Preparation module prepares the frequency spectrum for the following steps by transmitting an ultrasonic sound and performing STFT (Short Time Fourier Transform) and windowing on the received signals.
  • The processed STFT results are duplicated, and each copy is passed to the User Direction Analyzer and the Command Direction Analyzer, respectively (a minimal sketch of this spectrum-preparation step follows this list).
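As a rough sketch of the Spectrum Preparation step described above, the code below windows the received multichannel signal, computes an STFT with SciPy, and duplicates the result for the two analyzers. The sample rate, frame length, hop size, and Blackman window are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np
from scipy.signal import stft

FS = 48_000   # assumed sample rate of the microphone array (Hz)
FRAME = 1024  # assumed STFT frame length
HOP = 512     # assumed hop size

def prepare_spectrum(mic_signals):
    """Window each channel and compute its STFT (the Spectrum Preparation step).

    mic_signals: (n_mics, n_samples) array of received audio, containing both
    the reflected ultrasonic probe and any voice command.
    Returns the complex STFT of shape (n_mics, n_freqs, n_frames).
    """
    _, _, spec = stft(mic_signals, fs=FS, window="blackman",
                      nperseg=FRAME, noverlap=FRAME - HOP, axis=-1)
    return spec

# The prepared spectrum is duplicated and handed to both analyzers: one copy
# for the User Direction Analyzer (Doppler around the ultrasonic probe) and
# one for the Command Direction Analyzer (localization of the voice band).
signals = np.random.randn(8, FS)   # stand-in for one second of 8-channel audio
spectrum = prepare_spectrum(signals)
user_view, command_view = spectrum.copy(), spectrum.copy()
```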
Results
  • The authors collected a total of 411 valid responses (199 for Alexa and 212 for Google Home).
  • Three Alexa users and 18 Google Home users report using the device at a distance of 10–20 m, and the authors count them as outliers.
  • Google Home users issue an average of 5.98 voice commands daily, whereas Alexa users issue an average of 7.28.
Conclusion
  • Additional Design and Practical Issues

    Here the authors discuss the practical issues of Speaker-Sonar and the additional design choices that alleviate these issues.
  • Popular smart speakers such as the Amazon Echo have circular LED lights that are used for various functions, e.g., indicating the direction of the voice command the device receives.
  • If more than one user is close together, the system would not be able to recognize the number of users.
  • Such a case would not affect the consistency check, as any command coming from the direction of the detected movement would be considered legitimate. In this work, the authors propose Speaker-Sonar, a sonar-based defense system for smart speakers.
  • Speaker-Sonar raises the bar for remote attacks and, to the best of the authors' knowledge, is able to effectively tackle all known attack techniques that can be used to create malicious voice command attacks.
Summary
  • Introduction:

    Smart speakers, known as intelligent voice assistants such as Amazon Echo and Google Home, are becoming popular; as of May 2018, 54.4 million people in the U.S. own a smart speaker [20].
  • The authors make use of both techniques: active sonar is utilized for detecting the movement of users by emitting ultrasonic sound with a speaker and receiving the reflected sound with a microphone array, whereas passive sonar, similar to human ears, is used for localizing voice commands with sound source localization (SSL) techniques.
  • In this work, the authors use SRP-PHAT-HSDA as it is relatively lightweight yet robust in performance.
  • Objectives:

    The authors aim to build a defense mechanism for smart speakers that verifies whether the direction of the user and the direction of the command the smart speaker received are consistent; in other words, the authors check whether the received command comes from the same direction as the user.
  • Methods:

    Design and Architecture

    The design of Speaker-Sonar, illustrated in Figure 2, consists of 4 modules: Spectrum Preparation, User Direction Analyzer, Command Direction Analyzer, and Direction Consistency Checker.
  • Their responsibilities are as follows: The Spectrum Preparation module prepares the frequency spectrum for the following steps by transmitting an ultrasonic sound and performing STFT (Short Time Fourier Transform) and windowing on the received signals.
  • The processed STFT results are duplicated, and each copy is passed to the User Direction Analyzer and the Command Direction Analyzer, respectively.
  • Results:

    The authors collected a total of 411 valid responses (199 for Alexa and 212 for Google Home).
  • Three Alexa users and 18 Google Home users report using the device at a distance of 10–20 m, and the authors count them as outliers.
  • Google Home users issue an average of 5.98 voice commands daily, whereas Alexa users issue an average of 7.28.
  • Conclusion:

    Additional Design and Practical Issues

    Here the authors discuss the practical issues of Speaker-Sonar and the additional design choices that alleviate these issues.
  • Popular smart speakers such as the Amazon Echo have circular LED lights that are used for various functions, e.g., indicating the direction of the voice command the device receives.
  • If more than one user is close together, the system would not be able to recognize the number of users.
  • Such a case would not affect the consistency check, as any command coming from the direction of the detected movement would be considered legitimate. In this work, the authors propose Speaker-Sonar, a sonar-based defense system for smart speakers.
  • Speaker-Sonar raises the bar for remote attacks and, to the best of the authors' knowledge, is able to effectively tackle all known attack techniques that can be used to create malicious voice command attacks.
Tables
  • Table 1: Terminologies in this Paper
  • Table 2
  • Table 3
  • Table 4: Accuracy of Consistency Check
  • Table 5: Accuracy of Consistency Check with Different Furniture Density
  • Table 6: Accuracy of Consistency Check with Users
  • Table 7: Survey result of the distance (in meters) of the user when talking to Alexa and Google Home
Related work
Smart speakers are becoming more ubiquitous as they become the primary interaction medium between people and machines (such as smartphones, personal voice assistants, and smart home appliances) [10, 13, 16]. Thus, ensuring the authenticity of voice commands has become an active research area.

Attack on Voice Interface. Recently, a growing body of research has exploited existing vulnerabilities in voice interfaces [39, 41, 63, 64]. Researchers craft sophisticated attacks to exploit vulnerabilities [40, 45, 60] in voice interfaces (such as Google Assistant [16], Amazon Echo [4], Google Home [17], and Apple Homepod [9]). Taking control of, or injecting malicious commands into, a voice interface enables an attacker to cause serious damage to the user, such as making an unwanted purchase from an online shop [5] or triggering malicious interactions with other smart home devices. Prior work has demonstrated serious attacks [29, 52, 70] that are very difficult to defend against because the injected commands are incomprehensible to humans. Several unintentional incidents have also revealed how vulnerable smart speakers are to such attacks [11]. Researchers have demonstrated new, easy-to-design techniques for executing commands on smart speakers; for example, Yuan et al. demonstrated an attack that embeds an adversarial voice command into a song that is still recognized by the voice recognition system [69]. Besides causing serious security issues by executing malicious commands, researchers found that user privacy and sensitive information can be leaked through smart speakers [32].

Defense on Voice Interface. While there is much research on attacks, little has been done to protect voice interfaces against them. As a consequence, voice interfaces remain vulnerable to state-of-the-art attacks with potentially severe consequences for the user. Blue et al. propose to differentiate between human-generated and machine-generated voice commands based on spectrum analysis [27]; this solution needs to build a noise filter for each speaker during an initialization phase, whereas Speaker-Sonar can start detection immediately. VoiceGesture [71] extracts user-specific features from the Doppler shift for live-user detection. Other researchers use a CAPTCHA to authenticate the user when a voice command is received [59]; however, such solutions are intrusive in terms of user experience, as they ask the user to perform additional actions. Alanwar et al. propose EchoSafe, a sonar-based defense mechanism against voice command attacks [22]. Whenever the room's environment changes (such as the position of furniture and other objects), EchoSafe fails to perform accurately because it must be retrained for each particular room layout; Speaker-Sonar, in contrast, is not affected by changes in the room's layout and reaches high accuracy under different scenarios. Blue et al. propose 2MA [26], which also utilizes the direction of arrival (DoA) of voice commands to prevent remote attacks. However, 2MA requires multiple devices for localization and assumes that the user is in constant possession of their mobile device. In contrast, our approach uses not only the DoA of the voice but also the movement of users, and it requires only a single speaker and a microphone array.

Presence Detection. Researchers are able to identify the presence of people in a room with wireless motion sensors and door sensors [44, 67]. Moreover, recent work [66] can count the total number of people in a room with the help of external hardware devices. Speaker-Sonar, in contrast, detects human presence without deploying additional sensors in the room.

Sonar-based Localization. A significant amount of research [24, 25, 43] uses RF for localization and activity recognition. Furthermore, [57] performs sonar-based localization with ultrasonic sound using special equipment (e.g., Sterling Audio ST55, Harman Kardon SoundSticks). However, not much work has been done with ultrasonic sound on commodity devices. [48] uses smart TVs and speakers to localize humans through barriers but masks the (audible) transmitted signal with music; [36] uses ultrasonic sound from a laptop speaker to infer gestures of a moving object; and [53] detects human motion with ultrasonic sound. Compared with prior work [22], Speaker-Sonar does not require retraining every time the environment changes.
Funding
  • This work is supported in part by NSF CNS-1527141, 1618493, 1838083, and 1801432.
  • The IIE authors are supported in part by NSFC U1836211 and 61728209, the National Top-notch Youth Talents Program of China, the Youth Innovation Promotion Association CAS, the Beijing Nova Program, the Beijing Natural Science Foundation (No. JQ18011), and the National Frontier Science and Technology Innovation Project (No. YJKYYQ20170070).
Study subjects and analysis
Alexa users: 3
The complete result can be found in Table 7 of the Appendix. Finally, 3 Alexa users and 18 Google Home users report using the device at a distance of 10–20 m, and we count them as outliers. Note that some users' ranges might be distributed across several of the sub-ranges we just reported.

pairs: 28
Once the Doppler spectrum is enhanced, we get the direction of the movement using SRP-PHAT-HSDA [35], an efficient, state-of-the-art TDoA-based localization algorithm. Given the enhanced Doppler spectrum of all microphone pairs (as we have 8 microphones, there are 28 pairs) as input, we calculate the x, y, z coordinates (pointing to a direction in three dimensions) and the energy of the movement using SRP-PHAT-HSDA. As discussed, the potential directions of movement include noise, as completely removing noise from the frequency spectrum is infeasible.
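To make the pair-wise structure concrete, here is a simplified sketch of a plain SRP-PHAT scan over all microphone pairs (28 pairs for 8 microphones). It is not the SRP-PHAT-HSDA algorithm of [35]; the far-field model, the candidate-direction grid, and the speed of sound are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

C = 343.0  # assumed speed of sound (m/s)

def srp_phat(spec, freqs, mic_pos, directions):
    """Simplified SRP-PHAT scan: score candidate directions by summing
    PHAT-weighted cross-spectra over every microphone pair.

    spec:       (n_mics, n_freqs) complex spectrum of one frame
    freqs:      (n_freqs,) frequency bins in Hz
    mic_pos:    (n_mics, 3) microphone coordinates in meters
    directions: (n_dirs, 3) unit vectors of candidate directions
    Returns (best_direction, energy_of_that_direction).
    """
    pairs = list(combinations(range(len(mic_pos)), 2))   # 8 mics -> 28 pairs
    scores = np.zeros(len(directions))
    for i, j in pairs:
        cross = spec[i] * np.conj(spec[j])
        cross /= np.abs(cross) + 1e-12                    # PHAT weighting
        # Expected far-field TDoA between mics i and j for each candidate direction.
        tdoa = directions @ (mic_pos[j] - mic_pos[i]) / C
        # Steer the weighted cross-spectrum toward each candidate and accumulate energy.
        steering = np.exp(2j * np.pi * np.outer(tdoa, freqs))
        scores += np.real(steering @ cross)
    best = int(np.argmax(scores))
    return directions[best], scores[best]
```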

speakers: 4
For all three scenarios, we place four directional portable speakers, which we assume to be the hacked speakers, in different parts of the user's room to send out malicious commands. The attacker is in another room with 4 laptops that can send commands to each of the 4 speakers in the user's room. For the first scenario, we randomly sent out malicious commands 50 times while no users were in the defense radius.

malicious commands sent with no users present: 50
The attacker is in another room with 4 laptops that can send commands to each of the 4 speakers in the user's room. For the first scenario, we randomly sent out malicious commands 50 times while no users were in the defense radius. As expected, we were able to reject the attack 100% of the time.

users: 11
Each user study took 45–60 minutes. In total, we tested our tool with 11 users of age 18 or older in a real living room environment (see Figure 10). For each user, every action was measured 10 times; in total, we collected 110 samples per action (11 users × 10 samples). As shown in Table 6, the result is comparable to the experiment we conducted ourselves (see Table 4). Note that the user study was approved by the Institutional Review Board (IRB).

References
  • [1] [n. d.]. 8 common Amazon Echo problems and how to fix them. https://www.cnet.com/how-to/common-amazon-alexa-problems-andhow-to-fix-them/.
  • [2] [n. d.]. 8 common Amazon Echo problems – and how to fix them quickly. https://www.trustedreviews.com/opinion/amazon-echoproblems-2946622.
  • [3] [n. d.]. Alexa Skill Statistics. https://voicebot.ai/2018/03/22/amazon-alexa-skill-count-surpasses-30000-u-s/.
  • [4] [n. d.]. Amazon Alexa. https://developer.amazon.com/alexa.
  • [5] [n. d.]. Amazon Alexa ordered people dollhouses after hearing its name on TV. https://www.theverge.com/2017/1/7/14200210/amazonalexa-tech-news-anchor-order-dollhouse.
  • [6] [n. d.]. Amazon Echo not responding or not hearing you properly? Here are possible fixes. https://www.techtimes.com/articles/203474/20170329/amazon-echo-not-responding-or-not-hearing-you-properly-here-are-possiblefixes.htm.
  • [7] [n. d.]. Amazon Pay Is Coming To Alexa's Skills. https://www.pymnts.com/amazon/2017/payments-are-coming-to-alexas-skills/.
  • [8] [n. d.]. Amazon's Next Mission: Using Alexa to Help You Pay Friends. https://www.wsj.com/articles/hey-alexa-can-you-help-amazonget-into-the-payments-business-1523007000.
  • [9] [n. d.]. Apple Homepod. https://www.apple.com/homepod/.
  • [10] [n. d.]. Apple Siri. https://www.apple.com/ios/siri/.
  • [11] [n. d.]. Burger King faces backlash after ad stunt backfires. https://wgntv.com/2017/04/13/burger-kings-whopper-ad-stunt/.
  • [12] [n. d.]. Samsung and Roku Smart TVs Vulnerable to Hacking, Consumer Reports Finds. https://www.consumerreports.org/televisions/samsung-roku-smart-tvs-vulnerable-to-hacking-consumer-reports-finds/.
  • [13] [n. d.]. Cortana. https://www.microsoft.com/en-us/cortana/skills.
  • [14] [n. d.]. Device Vulnerabilities in the Connected Home: Remote Code Execution and More. https://blog.trendmicro.com/trendlabs-security-intelligence/device-vulnerabilities-connected-home-remote-code-execution-andmore/.
  • [15] [n. d.]. GGMM D6 Portable Speaker for Amazon Echo Dot 2nd Generation, 20W Powerful True 360 Alexa Speakers. https://goo.gl/QwyMNC.
  • [16] [n. d.]. Google Assistant. https://assistant.google.com/.
  • [17] [n. d.]. Google Home. https://developers.google.com/actions/smarthome.
  • [18] [n. d.]. Raspberry Pi 3 Model B Motherboard in Amazon. https://www.amazon.com/Raspberry-Pi-RASPBERRYPI3-MODB-1GB-Model-Motherboard/dp/B01CD5VC92.
  • [19] [n. d.]. S.C. Mom Says Baby Monitor Was Hacked; Experts Say Many Devices Are Vulnerable. https://www.npr.org/sections/thetwoway/2018/06/05/617196788/s-c-mom-says-baby-monitor-was-hacked-experts-say-many-devices-are-vulnerable.
  • [20] [n. d.]. Smart Speaker Users Pass 50 Million in U.S. for the First Time. https://voicebot.ai/2018/06/28/smart-speaker-users-pass-50-million-in-u-s-for-the-first-time/.
  • [21] [n. d.]. Voice Development Board For Everyone. https://www.matrix.one/products/voice.
  • [22] Amr Alanwar, Bharathan Balaji, Yuan Tian, Shuo Yang, and Mani Srivastava. 2017. EchoSafe: Sonar-based Verifiable Interaction with Intelligent Digital Agents. In Proceedings of the 1st ACM Workshop on the Internet of Safe Things. ACM, 38–43.
  • [23] Jonathan Allen. 1977. Short term spectral analysis, synthesis, and modification by discrete Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing 25, 3 (1977), 235–238.
  • [24] Paramvir Bahl and Venkata N Padmanabhan. 2000. RADAR: An in-building RF-based user location and tracking system. In INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, Vol. 2. IEEE, 775–784.
  • [25] Paramvir Bahl, Venkata N Padmanabhan, and Anand Balachandran. 2000. Enhancements to the RADAR user location and tracking system. Microsoft Research 2, MSR-TR-2000-12 (2000), 775–784.
  • [26] Logan Blue, Hadi Abdullah, Luis Vargas, and Patrick Traynor. 2018. 2MA: Verifying Voice Commands via Two Microphone Authentication. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security. ACM, 89–100.
  • [27] Logan Blue, Luis Vargas, and Patrick Traynor. 2018. Speakers for Voice Interface Security. In Proceedings of the 11th ACM Conference on Security & Privacy in Wireless and Mobile Networks. ACM, 123–133.
  • [28] Alessio Brutti, Maurizio Omologo, and Piergiorgio Svaizer. 2008. Comparison between different sound source localization techniques based on a real data collection. In Hands-Free Speech Communication and Microphone Arrays, 2008. HSCMA 2008. IEEE, 69–72.
  • [29] Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, and Wenchao Zhou. 2016. Hidden Voice Commands.. In USENIX Security Symposium. 513–530.
  • [30] Subhadeep Chakraborty. 2013. Advantages of Blackman window over Hamming window method for designing FIR filter. International Journal of Computer Science & Engineering Technology 4, 08 (2013).
  • [31] Phillip L De Leon, Michael Pucher, Junichi Yamagishi, Inma Hernaez, and Ibon Saratxaga. 2012. Evaluation of speaker verification security and detection of HMM-based synthetic speech. IEEE Transactions on Audio, Speech, and Language Processing 20, 8 (2012), 2280–2290.
  • [32] Wenrui Diao, Xiangyu Liu, Zhe Zhou, and Kehuan Zhang. 2014. Your voice assistant is mine: How to abuse speakers to steal information and control your phone. In Proceedings of the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices. ACM, 63–74.
  • [33] HH Dodge, NC Mattek, Daniel Austin, TL Hayes, and JA Kaye. 2012. In-home walking speeds and variability trajectories associated with mild cognitive impairment. Neurology 78, 24 (2012), 1946–1952.
  • [34] Giorgio Franceschetti, James Tatoian, David Giri, and George Gibbs. 2007. Timed arrays and their application to impulse SAR for “through-the-wall” imaging. In Ultra-Wideband, Short-Pulse Electromagnetics 7.
  • [35] François Grondin and François Michaud. 2019. Lightweight and optimized sound source localization and tracking methods for open and closed microphone array configurations. Robotics and Autonomous Systems 113 (2019), 63–80.
  • [36] Sidhant Gupta, Daniel Morris, Shwetak Patel, and Desney Tan. 2012. Soundwave: using the doppler effect to sense gestures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1911–1914.
  • [37] Willem D Hackmann. 1986. Sonar research and naval warfare 1914-1954: A case study of a twentieth-century establishment science. Historical Studies in the Physical and Biological Sciences 16, 1 (1986), 83–110.
  • [38] Michael Heideman, Don Johnson, and C Burrus. 1984. Gauss and the history of the fast Fourier transform. IEEE ASSP Magazine 1, 4 (1984), 14–21.
  • [39] Artur Janicki, Federico Alegre, and Nicholas Evans. 2016. An assessment of automatic speaker verification vulnerabilities to replay spoofing attacks. Security and Communication Networks 9, 15 (2016), 3030–3044.
  • [40] Chaouki Kasmi and Jose Lopes Esteves. 2015. IEMI threats for information security: Remote command injection on modern smartphones. IEEE Transactions on Electromagnetic Compatibility 57, 6 (2015), 1752–1755.
  • [41] Tomi Kinnunen, Md Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Junichi Yamagishi, and Kong Aik Lee. 2017. The asvspoof 2017 challenge: Assessing the limits of replay spoofing attack detection. (2017).
  • [42] Malcolm Llewellyn-Jones. 2006. The Royal Navy and anti-submarine warfare, 1917-49. Routledge London.
  • [43] K Lorincz and M Welsh. 2004. “A Robust, Decentralized Approach to RF-Based Location Tracking,”Harvard University, Cambridge. MA, Tech. Rep. TR-19-04, Tech. Rep. (2004).
  • [44] Jiakang Lu, Tamim Sookoor, Vijay Srinivasan, Ge Gao, Brian Holben, John Stankovic, Eric Field, and Kamin Whitehouse. 2010. The smart thermostat: using occupancy sensors to save energy in homes. In Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems. ACM, 211–224.
  • [45] Dibya Mukhopadhyay, Maliheh Shirvanian, and Nitesh Saxena. 2015. All your voices are belong to us: Stealing voices to fool humans and machines. In European Symposium on Research in Computer Security. Springer, 599–621.
  • [46] Soumya Nag, Mark A Barnes, Tim Payment, and Gary Holladay. 2002. Ultrawideband through-wall radar for detecting the motion of people in real time. In Radar Sensor Technology and Data Visualization, Vol. 4744. International Society for Optics and Photonics, 48–58.
  • [47] Rajalakshmi Nandakumar, Vikram Iyer, Desney Tan, and Shyamnath Gollakota. 2016. Fingerio: Using active sonar for fine-grained finger tracking. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 1515–1525.
  • [48] Rajalakshmi Nandakumar, Alex Takakuwa, Tadayoshi Kohno, and Shyamnath Gollakota. 2017. Covertband: Activity information leakage using music. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3 (2017), 87.
  • [49] Stefan Niewiadomski. 2013. Filter handbook: a practical design guide. Newnes.
  • [50] Ali Pourmohammad and Seyed Mohammad Ahadi. 2012. Real time high accuracy 3-D PHAT-based sound source localization using a simple 4-microphone arrangement. IEEE Systems Journal 6, 3 (2012), 455–468.
  • [51] Nirupam Roy, Haitham Hassanieh, and Romit Roy Choudhury. 2017. Backdoor: Making microphones hear inaudible sounds. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2–14.
  • [52] Nirupam Roy, Sheng Shen, Haitham Hassanieh, and Romit Roy Choudhury. 2018. Inaudible Voice Commands: The Long-Range Attack and Defense. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, 547–560.
  • [53] James M Sabatier and Alexander E Ekimov. 2006. Ultrasonic methods for human motion detection. Technical Report. MISSISSIPPI UNIV
  • [54] Ville Pekka Sivonen. 2007. Directional loudness and binaural summation for wideband and reverberant sounds. The Journal of the Acoustical Society of America 121, 5 (2007), 2852–2861.
  • [55] Liwei Song and Prateek Mittal. 2017. Inaudible voice commands. arXiv:1708.07238 (2017).
  • [56] Petre Stoica, Randolph L Moses, et al. 2005. Spectral analysis of signals. (2005).
  • [57] Stephen P Tarzia, Robert P Dick, Peter A Dinda, and Gokhan Memik. 2009. Sonar-based measurement of user presence and attention. In Proceedings of the 11th international conference on Ubiquitous computing. ACM, 89–92.
  • [58] Kazuo Toraichi, Masaru Kamada, Shuichi Itahashi, and Ryoichi Mori. 1989. Window functions represented by B-spline functions. IEEE Transactions on Acoustics, Speech, and Signal Processing 37, 1 (1989), 145–147.
  • [59] Erkam Uzun, Simon Pak Ho Chung, Irfan Essa, and Wenke Lee. 2018. rtCaptcha: A Real-Time CAPTCHA Based Liveness Detection
  • [60] Tavish Vaidya, Yuankai Zhang, Micah Sherr, and Clay Shields. 2015. Cocaine noodles: exploiting the gap between human and machine speech recognition. WOOT 15 (2015), 10–11.
  • [61] Charles Van Loan. 1992. Computational frameworks for the fast Fourier transform. Vol. 10. Siam.
  • [62] Jesús Villalba and Eduardo Lleida. 2010. Speaker verification performance degradation against spoofing and tampering attacks. In FALA
  • [63] Zhizheng Wu, Sheng Gao, Eng Siong Cling, and Haizhou Li. 2014. A study on replay attack and anti-spoofing for text-dependent speaker verification.. In APSIPA. 1–5.
  • [64] Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilçi, Md Sahidullah, and Aleksandr Sizov. 2015. ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. In Sixteenth Annual Conference of the International Speech Communication Association.
  • [65] Junichi Yamagishi, Takao Kobayashi, Nakano Yuji, Katsumi Ogata, and Juri Isogai. 2009. Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. (2009).
  • [66] Danny B Yang, Leonidas J Guibas, et al. 2003. Counting people in crowds with a real-time network of simple image sensors. IEEE, 122.
  • [67] Longqi Yang, Kevin Ting, and Mani B Srivastava. 2014. Inferring occupancy from opportunistically available sensor data. In Pervasive Computing and Communications (PerCom), 2014 IEEE International Conference on. IEEE, 60–68.
  • [68] Yunqiang Yang and Aly E Fathy. 2005. See-through-wall imaging using ultra wideband short-pulse radar system. In Antennas and Propagation Society International Symposium, 2005 IEEE, Vol. 3. IEEE, 334–337.
  • [69] Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A Gunter. 2018. CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition. arXiv preprint arXiv:1801.08535 (2018).
  • [70] Guoming Zhang, Chen Yan, Xiaoyu Ji, Tianchen Zhang, Taimin Zhang, and Wenyuan Xu. 2017. DolphinAttack: Inaudible voice commands. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 103–117.
  • [71] Linghan Zhang, Sheng Tan, and Jie Yang. 2017. Hearing your voice is not enough: An articulatory gesture based liveness detection for voice authentication. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 57–71.
Survey questions
  • (1) What Virtual Personal Services do you use at home? (a) Google Home (b) Amazon Alexa (c) Both (d) Other
  • (2) Approximately how long have you used Alexa or Google Home? (a) 0 - 3 months (b) 3 - 12 months (c) 1 - 2 years (d) 2 - 3 years