USENIX Security '23 Technical Sessions

USENIX Security '23 is SOLD OUT.

Please do not plan to walk into the venue and register on site.
The event has reached maximum physical capacity, and we will not be able to accommodate any additional registrations.

Wednesday, August 9

7:45 am–8:45 am

Continental Breakfast

Platinum Foyer

8:45 am–9:15 am

Opening Remarks and Awards

Platinum Salon 5-6

9:15 am–9:30 am

Short Break

Platinum Foyer

9:30 am–10:30 am

Track 1

Breaking Wireless Protocols

Session Chair: Nils Ole Tippenhauer, CISPA

Platinum Salon 6

PhyAuth: Physical-Layer Message Authentication for ZigBee Networks

Ang Li and Jiawei Li, Arizona State University; Dianqi Han, University of Texas at Arlington; Yan Zhang, The University of Akron; Tao Li, Indiana University–Purdue University Indianapolis; Ting Zhu, The Ohio State University; Yanchao Zhang, Arizona State University

ZigBee is a popular wireless communication standard for Internet of Things (IoT) networks. Since each ZigBee network uses hop-by-hop network-layer message authentication based on a common network key, it is highly vulnerable to packet-injection attacks, in which the adversary exploits the compromised network key to inject arbitrary fake packets from any spoofed address to disrupt network operations and consume the network/device resources. In this paper, we present PhyAuth, a PHY hop-by-hop message authentication framework to defend against packet-injection attacks in ZigBee networks. The key idea of PhyAuth is to let each ZigBee transmitter embed into its PHY signals a PHY one-time password (called POTP) derived from a device-specific secret key and an efficient cryptographic hash function. An authentic POTP serves as the transmitter's PHY transmission permission for the corresponding packet. PhyAuth provides three schemes to embed, detect, and verify POTPs based on different features of ZigBee PHY signals. In addition, PhyAuth involves lightweight PHY signal processing and no change to the ZigBee protocol stack. Comprehensive USRP experiments confirm that PhyAuth can efficiently detect fake packets with very low false-positive and false-negative rates while having a negligible negative impact on normal data transmissions.
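The one-time-password idea in the abstract above can be illustrated with a minimal sketch. It assumes an HMAC-SHA256 derivation over a per-packet counter and a 16-bit truncation; the counter, the token length, and the function names are hypothetical choices made for illustration. The abstract does not specify the paper's actual derivation or how a POTP is embedded into PHY signals.

```python
import hashlib
import hmac


def derive_potp(device_key: bytes, packet_counter: int, bits: int = 16) -> int:
    """Derive a short per-packet token (hypothetical construction, not PhyAuth's)."""
    msg = packet_counter.to_bytes(8, "big")
    digest = hmac.new(device_key, msg, hashlib.sha256).digest()
    # Keep only the top `bits` bits so the token stays short.
    return int.from_bytes(digest, "big") >> (256 - bits)


def verify_potp(device_key: bytes, packet_counter: int, received: int,
                bits: int = 16) -> bool:
    """Recompute the token on the receiver side and compare in constant time."""
    size = (bits + 7) // 8
    expected = derive_potp(device_key, packet_counter, bits)
    if received < 0 or received >= 1 << bits:
        return False
    return hmac.compare_digest(expected.to_bytes(size, "big"),
                               received.to_bytes(size, "big"))
```

A receiver that shares the device key would call `verify_potp` for each packet and drop packets whose token does not match.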

Time for Change: How Clocks Break UWB Secure Ranging

Claudio Anliker, Giovanni Camurati, and Srdjan Čapkun, ETH Zurich

Due to its suitability for wireless ranging, Ultra-Wide Band (UWB) has gained traction over the past years. UWB chips have been integrated into consumer electronics and considered for security-relevant use cases, such as access control or contactless payments. However, several publications in the recent past have shown that it is difficult to protect the integrity of distance measurements on the physical layer. In this paper, we identify transceiver clock imperfections as a new, important parameter that has been widely ignored so far. We present Mix-Down and Stretch-and-Advance, two novel attacks against the current (IEEE 802.15.4z) and the upcoming (IEEE 802.15.4ab) UWB standard, respectively. We demonstrate Mix-Down on commercial chips and achieve distance reductions from 10 m to 0 m. For the Stretch-and-Advance attack, we show analytically that the current proposal of IEEE 802.15.4ab allows reductions of over 90 m. To prevent the attack, we propose and analyze an effective countermeasure.
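To see why clock imperfections matter for ranging at all, the following sketch works through textbook single-sided two-way ranging arithmetic. It is not a model of the Mix-Down or Stretch-and-Advance attacks, whose details the abstract does not give. The function names, the ideal initiator clock, and the 1 ms reply time are assumptions.

```python
C = 299_792_458.0  # speed of light in m/s


def ss_twr_distance(t_round: float, t_reply_reported: float) -> float:
    """Single-sided two-way ranging: distance from round-trip and reply times."""
    return C * (t_round - t_reply_reported) / 2.0


def measured_distance(true_distance: float, t_reply: float,
                      responder_offset_ppm: float) -> float:
    """Distance the initiator computes when the responder's clock is off.

    Assumes an ideal initiator clock; the responder reports its reply time
    scaled by (1 + offset), where the offset is given in parts per million.
    """
    tof = true_distance / C
    t_round = 2.0 * tof + t_reply
    t_reply_reported = t_reply * (1.0 + responder_offset_ppm * 1e-6)
    return ss_twr_distance(t_round, t_reply_reported)
```

Under these assumptions, a 20 ppm offset over a 1 ms reply time shifts a 10 m measurement by about 3 m, because the error is roughly `C * t_reply * offset / 2`.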

Formal Analysis and Patching of BLE-SC Pairing

Min Shi, Jing Chen, Kun He, Haoran Zhao, Meng Jia, and Ruiying Du, Wuhan University

Bluetooth Low Energy (BLE) is the mainstream Bluetooth standard, and BLE Secure Connections (BLE-SC) pairing is a protocol that authenticates two Bluetooth devices and derives a shared secret key between them. Although BLE-SC pairing employs well-studied cryptographic primitives to guarantee its security, a recent study revealed a logic flaw in the protocol.

In this paper, we develop the first comprehensive formal model of the BLE-SC pairing protocol. Our model is compliant with the latest Bluetooth specification version 5.3 and covers all association models in the specification to discover attacks caused by the interplay between different association models. We also partly loosen the perfect cryptography assumption in traditional symbolic analysis approaches by designing a low-entropy key oracle to detect attacks caused by the poorly derived keys. Our analysis confirms two existing attacks and discloses a new attack. We propose a countermeasure to fix the flaws found in the BLE-SC pairing protocol and discuss the backward compatibility. Moreover, we extend our model to verify the countermeasure, and the results demonstrate its effectiveness in our extended model.

Framing Frames: Bypassing Wi-Fi Encryption by Manipulating Transmit Queues

Domien Schepers and Aanjhan Ranganathan, Northeastern University; Mathy Vanhoef, imec-DistriNet, KU Leuven

Wi-Fi devices routinely queue frames at various layers of the network stack before transmitting, for instance, when the receiver is in sleep mode. In this work, we investigate how Wi-Fi access points manage the security context of queued frames. By exploiting power-save features, we show how to trick access points into leaking frames in plaintext, or encrypted using the group or an all-zero key. We demonstrate resulting attacks against several open-source network stacks. We attribute our findings to the lack of explicit guidance in managing security contexts of buffered frames in the 802.11 standards. The unprotected nature of the power-save bit in a frame’s header, which our work reveals to be a fundamental design flaw, also allows an adversary to force the queuing of frames intended for a specific client, resulting in its disconnection and a trivially executed denial-of-service attack. Furthermore, we demonstrate how an attacker can override and control the security context of frames that are yet to be queued. This exploits a design flaw in hotspot-like networks and allows the attacker to force an access point to encrypt yet-to-be-queued frames using an adversary-chosen key, thereby bypassing Wi-Fi encryption entirely. Our attacks have a widespread impact as they affect various devices and operating systems (Linux, FreeBSD, iOS, and Android) and because they can be used to hijack TCP connections or intercept client and web traffic. Overall, we highlight the need for transparency in handling security context across the network stack layers and the challenges in doing so.
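For readers unfamiliar with where the power-save bit lives, the sketch below reads it from the 16-bit Frame Control field that starts every 802.11 MAC header (Power Management is bit 12, Protected Frame is bit 14). The class and function names are hypothetical. The code only parses a header and does not reproduce any of the attacks described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameControl:
    version: int
    frame_type: int
    subtype: int
    power_management: bool
    protected: bool


def parse_frame_control(header: bytes) -> FrameControl:
    """Decode selected Frame Control bits from the first two header bytes."""
    if len(header) < 2:
        raise ValueError("need at least 2 bytes of MAC header")
    fc = int.from_bytes(header[:2], "little")  # field is sent little-endian
    return FrameControl(
        version=fc & 0b11,
        frame_type=(fc >> 2) & 0b11,
        subtype=(fc >> 4) & 0b1111,
        power_management=bool(fc & (1 << 12)),
        protected=bool(fc & (1 << 14)),
    )
```

For example, the bytes `0x48 0x11` decode as a data frame (type 2) of subtype 4 with the Power Management bit set and the Protected Frame bit clear.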

Track 2

Interpersonal Abuse

Session Chair: Danny Y. Huang, New York University

Platinum Salon 5

Abuse Vectors: A Framework for Conceptualizing IoT-Enabled Interpersonal Abuse

Sophie Stephenson and Majed Almansoori, University of Wisconsin–Madison; Pardis Emami-Naeini, Duke University; Danny Yuxing Huang, New York University; Rahul Chatterjee, University of Wisconsin–Madison

Tech-enabled interpersonal abuse (IPA) is a pervasive problem. Abusers, often intimate partners, use tools such as spyware to surveil and harass victim-survivors. Unfortunately, anecdotal evidence suggests that smart, Internet-connected devices such as home thermostats, cameras, and Bluetooth item finders may similarly be used against victim-survivors of IPA. To tackle abuse involving smart devices, it is vital that we understand the ecosystem of smart devices that enable IPA. Thus, in this work, we conduct a large-scale qualitative analysis of the smart devices used in IPA. We systematically crawl Google Search results to uncover web pages discussing how abusers use smart devices to enact IPA. By analyzing these web pages, we identify 32 devices used for IPA and detail the varied strategies abusers use for spying and harassment via these devices. Then, we design a simple, yet powerful framework—abuse vectors—which conceptualizes IoT-enabled IPA as four overarching patterns: Covert Spying, Unauthorized Access, Repurposing, and Intended Use. Using this lens, we pinpoint the necessary solutions required to address each vector of IoT abuse and encourage the security community to take action.

The Digital-Safety Risks of Financial Technologies for Survivors of Intimate Partner Violence

Rosanna Bellini, Cornell University; Kevin Lee, Princeton University; Megan A. Brown, Center for Social Media and Politics, New York University; Jeremy Shaffer, Cornell University; Rasika Bhalerao, Northeastern University; Thomas Ristenpart, Cornell Tech

Digital technologies play a growing role in exacerbating financial abuse for survivors of intimate partner violence (IPV). While abusers of IPV rarely employ advanced technological attacks that go beyond interacting via standard user interfaces, scant research has examined how consumer-facing financial technologies can facilitate or obstruct IPV-related attacks on a survivor's financial well-being. Through an audit of 13 mobile banking and 17 peer-to-peer payment smartphone applications and their associated usage policies, we simulated both close-range and remote attacks commonly used by IPV adversaries. We discover that mobile banking and peer-to-peer payment applications are generally ill-equipped to deal with user-interface bound (UI-bound) adversaries, permitting unauthorized access to logins, surreptitious surveillance, and harassing messages and system prompts.

To assess our discoveries, we interviewed 12 financial professionals who offer or oversee frontline services for vulnerable customers. While professionals expressed an interest in implementing mitigation strategies, they also highlight barriers to institutional approaches to intimate threats, and question professional responsibilities for digital safety. We conclude by providing recommendations for how digital financial service providers may better address UI-bound threats, and offer broader considerations for professional auditing and evaluation approaches to technology-facilitated abuse.

"It's the Equivalent of Feeling Like You're in Jail": Lessons from Firsthand and Secondhand Accounts of IoT-Enabled Intimate Partner Abuse

Sophie Stephenson and Majed Almansoori, University of Wisconsin—Madison; Pardis Emami-Naeini, Duke University; Rahul Chatterjee, University of Wisconsin—Madison

Victim-survivors of intimate partner violence (IPV) are facing a new technological threat: Abusers are leveraging IoT devices such as smart thermostats, hidden cameras, and GPS trackers to spy on and harass victim-survivors. Though prior work provides a foundation of what IoT devices can be involved in intimate partner violence, we lack a detailed understanding of the factors which contribute to this IoT abuse, the strategies victim-survivors use to mitigate IoT abuse, and the barriers they face along the way. Without this information, it is challenging to design effective solutions to stop IoT abuse.

To fill this gap, we interviewed 20 participants with firsthand or secondhand experience with IoT abuse. Our interviews captured 39 varied instances of IoT abuse, from surveillance with hidden GPS trackers to harassment with smart thermostats and light bulbs. They also surfaced 21 key barriers victim-survivors face while coping with IoT abuse. For instance, victim-survivors struggle to find proof of the IoT abuse they experience, which makes mitigations challenging. Even with proof, victim-survivors face barriers mitigating the abuse; for example, mitigation is all but impossible for victim-survivors living with an abusive partner. Our findings pinpoint several solutions to combat IoT abuse, including increased transparency of IoT devices, updated IoT access control protocols, and raising awareness of IoT abuse.

Sneaky Spy Devices and Defective Detectors: The Ecosystem of Intimate Partner Surveillance with Covert Devices

Rose Ceccio and Sophie Stephenson, University of Wisconsin—Madison; Varun Chadha, Capital One; Danny Yuxing Huang, New York University; Rahul Chatterjee, University of Wisconsin—Madison

Recent anecdotal evidence suggests that abusers have begun to use covert spy devices such as nanny cameras, item trackers, and audio recorders to spy on and stalk their partners. Currently, it is difficult to combat this type of intimate partner surveillance (IPS) because we lack an understanding of the prevalence and characteristics of commercial spy devices. Additionally, it is unclear whether existing devices, apps, and tools designed to detect covert devices are effective. We observe that many spy devices and detectors can be found on mainstream retailers. Thus, in this work, we perform a systematic survey of spy devices and detection tools sold through popular US retailers. We gather 2,228 spy devices, 1,313 detection devices, and 51 detection apps, then study a representative sample through qualitative analysis as well as in-lab evaluations.

Our results show a bleak picture of the IPS ecosystem. Not only can commercial spy devices easily be used for IPS, but many of them are advertised for use in IPS and other covert surveillance. On the other hand, commercial detection devices and apps are all but defective, and while recent academic detection systems show promise, they require much refinement before they can be useful to survivors. We urge the security community to take action by designing practical, usable detection tools to detect hidden spy devices.

Track 3

Inferring User Details

Session Chair: Lujo Bauer, Carnegie Mellon University

Platinum Salon 7–8

Towards a General Video-based Keystroke Inference Attack

Zhuolin Yang, Yuxin Chen, and Zain Sarwar, University of Chicago; Hadleigh Schwartz, Columbia University; Ben Y. Zhao and Haitao Zheng, University of Chicago

A large collection of research literature has identified the privacy risks of keystroke inference attacks that use statistical models to extract content typed onto a keyboard. Yet existing attacks cannot operate in realistic settings, and rely on strong assumptions of labeled training data, knowledge of keyboard layout, carefully placed sensors, or data from other side-channels. This paper describes experiences developing and evaluating a general, video-based keystroke inference attack that operates in common public settings using a single commodity camera phone, with no pretraining, no keyboard knowledge, no local sensors, and no side-channels. We show that using a self-supervised approach, noisy finger tracking data from a video can be processed, labeled, and filtered to train DNN keystroke inference models that operate accurately on the same video. Using IRB-approved user studies, we validate attack efficacy across a variety of environments, keyboards, and content, and users with different typing behaviors and abilities. Our project website is located at: https://sandlab.cs.uchicago.edu/keystroke/.

Going through the motions: AR/VR keylogging from user head motions

Carter Slocum, Yicheng Zhang, Nael Abu-Ghazaleh, and Jiasi Chen, University of California, Riverside

Augmented Reality/Virtual Reality (AR/VR) are the next step in the evolution of ubiquitous computing after personal computers and mobile devices. Applications of AR/VR continue to grow, including education and virtual workspaces, increasing opportunities for users to enter private text, such as passwords or sensitive corporate information. In this work, we show that there is a serious security risk of typed text in the foreground being inferred by a background application, without requiring any special permissions. The key insight is that a user’s head moves in subtle ways as she types on a virtual keyboard, and these motion signals are sufficient for inferring the text that a user types. We develop a system, TyPose, that extracts these signals and automatically infers words or characters that a victim is typing. Once the sensor signals are collected, TyPose uses machine learning to segment the motion signals in time to determine word/character boundaries, and also to perform inference on the words/characters themselves. Our experimental evaluation on commercial AR/VR headsets demonstrates the feasibility of this attack, both when multiple users’ data is used for training (82% top-5 word classification accuracy) and when the attack is personalized to a particular victim (92% top-5 word classification accuracy). We also show that first-line defenses of reducing the sampling rate or precision of head tracking are ineffective, suggesting that more sophisticated mitigations are needed.

Auditory Eyesight: Demystifying μs-Precision Keystroke Tracking Attacks on Unconstrained Keyboard Inputs

Yazhou Tu, Liqun Shan, and Md Imran Hossen, University of Louisiana at Lafayette; Sara Rampazzi and Kevin Butler, University of Florida; Xiali Hei, University of Louisiana at Lafayette

In various scenarios from system login to writing emails, documents, and forms, keyboard inputs carry alluring data such as passwords, addresses, and IDs. Due to commonly existing non-alphabetic inputs, punctuation, and typos, users' natural inputs rarely contain only constrained, purely alphabetic keys/words. This work studies how to reveal unconstrained keyboard inputs using auditory interfaces.

Audio interfaces are not intended to have the capability of light sensors such as cameras to identify compactly located keys. Our analysis shows that effectively distinguishing the keys can require a fine localization precision level of keystroke sounds close to the range of microseconds. This work (1) explores the limits of audio interfaces to distinguish keystrokes, (2) proposes a μs-level customized signal processing and analysis-based keystroke tracking approach that takes into account the mechanical physics and imperfect measuring of keystroke sounds, (3) develops the first acoustic side-channel attack study on unconstrained keyboard inputs that are not purely alphabetic keys/words and do not necessarily follow known sequences in a given dictionary or training dataset, and (4) reveals the threats of non-line-of-sight keystroke sound tracking. Our results indicate that, without relying on vision sensors, attacks using limited-resolution audio interfaces can reveal unconstrained inputs from the keyboard with a fairly sharp and bendable "auditory eyesight."

Watch your Watch: Inferring Personality Traits from Wearable Activity Trackers

Noé Zufferey and Mathias Humbert, University of Lausanne, Switzerland; Romain Tavenard, University of Rennes, CNRS, LETG, France; Kévin Huguenin, University of Lausanne, Switzerland

Wearable devices, such as wearable activity trackers (WATs), are increasing in popularity. Although they can help to improve one's quality of life, they also raise serious privacy issues. One particularly sensitive type of information has recently attracted substantial attention, namely personality, as it provides a means to influence individuals (e.g., voters in the Cambridge Analytica scandal). This paper presents the first empirical study to show a significant correlation between WAT data and personality traits (Big Five). We conduct an experiment with 200+ participants. The ground truth was established by using the NEO-PI-3 questionnaire. The participants' step count, heart rate, battery level, activities, sleep time, etc. were collected for four months. By following a principled machine-learning approach, the participants' personality privacy was quantified. Our results demonstrate that WAT data brings valuable information to infer the openness, extraversion, and neuroticism personality traits. We further study the importance of the different features (i.e., data types) and find that step counts play a key role in the inference of extraversion and neuroticism, while openness is more related to heart rate.

Track 4

Adversarial ML beyond ML

Session Chair: Birhanu Eshete, University of Michigan, Dearborn

Platinum Salon 9–10

Squint Hard Enough: Attacking Perceptual Hashing with Adversarial Machine Learning

Jonathan Prokos, Johns Hopkins University; Neil Fendley, Johns Hopkins University Applied Physics Laboratory; Matthew Green, Johns Hopkins University; Roei Schuster, Vector Institute; Eran Tromer, Tel Aviv University and Columbia University; Tushar Jois and Yinzhi Cao, Johns Hopkins University

Many online communications systems use perceptual hash matching systems to detect illicit files in user content. These systems employ specialized perceptual hash functions such as Microsoft's PhotoDNA or Facebook's PDQ to produce a compact digest of an image file that can be approximately compared to a database of known illicit-content digests. Recently, several proposals have suggested that hash-based matching systems be incorporated into client-side and end-to-end encrypted (E2EE) systems: in these designs, files that register as illicit content will be reported to the provider, while the remaining content will be sent confidentially. By using perceptual hashing to determine confidentiality guarantees, this new setting significantly changes the function of existing perceptual hashing — thus motivating the need to evaluate these functions from an adversarial perspective, using their perceptual capabilities against them. For example, an attacker may attempt to trigger a match on innocuous, but politically-charged, content in an attempt to stifle speech.

In this work we develop threat models for perceptual hashing algorithms in an adversarial setting, and present attacks against the two most widely deployed algorithms: PhotoDNA and PDQ. Our results show that it is possible to efficiently generate targeted second-preimage attacks in which an attacker creates a variant of some source image that matches some target digest. As a complement to this main result, we also further investigate the production of images that facilitate detection avoidance attacks, continuing a recent investigation of Jain et al. Our work shows that existing perceptual hash functions are likely insufficiently robust to survive attacks on this new setting.
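The "approximate comparison" of digests that this abstract relies on can be illustrated with a toy average hash and a Hamming-distance threshold. This is a deliberately simple stand-in and not PhotoDNA or PDQ. The function names, the list-of-lists image format, and the threshold are assumptions for illustration.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two digests."""
    return bin(a ^ b).count("1")


def matches(digest: int, database: list[int], threshold: int) -> bool:
    """Approximate match: any known digest within `threshold` differing bits."""
    return any(hamming(digest, known) <= threshold for known in database)
```

With this toy hash, uniformly brightening an image leaves its digest unchanged, which is the kind of robustness a matching system wants. The attacks in the abstract search instead for visually different inputs whose digests still fall within the threshold.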

How to Cover up Anomalous Accesses to Electronic Health Records

Xiaojun Xu, Qingying Hao, Zhuolin Yang, and Bo Li, University of Illinois at Urbana-Champaign; David Liebovitz, Northwestern University; Gang Wang and Carl A. Gunter, University of Illinois at Urbana-Champaign

Illegitimate access detection systems in hospital logs perform post hoc detection instead of runtime access restriction to allow widespread access in emergencies. We study the effectiveness of adversarial machine learning strategies against such detection systems on a large-scale dataset consisting of a year of access logs at a major hospital. We study a range of graph-based anomaly detection systems, including heuristic-based and Graph Neural Network (GNN)-based models. We find that evasion attacks, in which covering accesses (that is, accesses made to disguise a target access) are injected during the evaluation period of the target access, can successfully fool the detection system. We also show that such evasion attacks can transfer among different detection algorithms. On the other hand, we find that poisoning attacks, in which adversaries inject covering accesses during the training phase of the model, do not effectively mislead the trained detection system unless the attacker is given unrealistic capabilities such as injecting over 10,000 accesses or imposing a high weight on the covering accesses in the training algorithm. To examine the generalizability of the results, we also apply our attack against a state-of-the-art detection model on the LANL network lateral movement dataset, and observe similar conclusions.

KENKU: Towards Efficient and Stealthy Black-box Adversarial Attacks against ASR Systems

Xinghui Wu, Xi'an Jiaotong University; Shiqing Ma, University of Massachusetts Amherst; Chao Shen and Chenhao Lin, Xi'an Jiaotong University; Qian Wang, Wuhan University; Qi Li, Tsinghua University; Yuan Rao, Xi'an Jiaotong University

Prior researchers show that existing automatic speech recognition (ASR) systems are vulnerable to adversarial examples. Most existing adversarial attacks against ASR systems are either white- or gray-box, limiting their practical usage in the real world. Some black-box attacks also assume the knowledge of output probability vectors to infer output distribution. Other black-box attacks leverage inefficient heavyweight processes, i.e., training auxiliary models or estimating gradients. Moreover, they require input-specific and manual hyperparameter tuning to improve the attack success rate against a specific ASR system. Despite such a heavyweight tuning process, nearly or even more than half of the generated adversarial examples are perceptible to humans.

This paper designs KENKU, an efficient and stealthy black-box adversarial attack framework against ASRs, supporting hidden voice command and integrated command attacks. It optimizes the novel acoustic feature loss and perturbation loss, based on Mel-frequency Cepstral Coefficients (MFCC). Both loss values can be calculated locally, avoiding training auxiliary models or estimating gradients, making the attack efficient. Furthermore, we introduce a hyperparameter in optimization that balances the attack effectiveness and imperceptibility automatically. KENKU uses the binary search algorithm to find its optimal value. We evaluated our prototype on eight real-world systems (including five digital and three physical attacks) and compared KENKU with five state-of-the-art works. Results show that KENKU can outperform existing works in the attack performance.

Tubes Among Us: Analog Attack on Automatic Speaker Identification

Shimaa Ahmed and Yash Wani, University of Wisconsin-Madison; Ali Shahin Shamsabadi, Alan Turing Institute; Mohammad Yaghini, University of Toronto and Vector Institute; Ilia Shumailov, Vector Institute and University of Oxford; Nicolas Papernot, University of Toronto and Vector Institute; Kassem Fawaz, University of Wisconsin-Madison

Recent years have seen a surge in the popularity of acoustics-enabled personal devices powered by machine learning. Yet, machine learning has proven to be vulnerable to adversarial examples. A large number of modern systems protect themselves against such attacks by targeting artificiality, i.e., they deploy mechanisms to detect the lack of human involvement in generating the adversarial examples. However, these defenses implicitly assume that humans are incapable of producing meaningful and targeted adversarial examples. In this paper, we show that this base assumption is wrong. In particular, we demonstrate that for tasks like speaker identification, a human is capable of producing analog adversarial examples directly with little cost and supervision: by simply speaking through a tube, an adversary reliably impersonates other speakers in the eyes of ML models for speaker identification. Our findings extend to a range of other acoustic-biometric tasks such as liveness detection, bringing into question their use in security-critical settings in real life, such as phone banking.

Track 5

Private Set Operations

Session Chair: Wouter Lueks, CISPA

Platinum Salon 3–4

Efficient Unbalanced Private Set Intersection Cardinality and User-friendly Privacy-preserving Contact Tracing

Mingli Wu and Tsz Hon Yuen, The University of Hong Kong

An unbalanced private set intersection cardinality (PSI-CA) protocol is a protocol to securely get the intersection cardinality of two sets X and Y without disclosing anything else, in which |Y| < |X|. In this paper, we propose efficient unbalanced PSI-CA protocols based on fully homomorphic encryption (FHE). To handle the long item issue in PSI-CA protocols, we invent two techniques: virtual Bloom filter and polynomial links. The former can encode a long item into several independent shorter ones. The latter fragments each long item into shorter slices and builds links between them.

Our FHE-based unbalanced PSI-CA protocols have the lowest communication complexity O(|Y| log(|X|)), which is much cheaper than the existing balanced PSI-CA protocols with O(|Y|+|X|). When |X|=2^28 and |Y|=2048, our protocols are 172× ∼ 412× cheaper than the best balanced PSI-CA protocol. Our protocols can be easily modified into unbalanced PSI protocols. Compared with Cong et al. (CCS'21), one of our unbalanced PSI protocols can save 42.04% ∼ 58.85% communication costs and accelerate the receiver querying time.

We apply our lightweight unbalanced PSI-CA protocols to design a privacy-preserving contact tracing system. We demonstrate that our system outperforms existing schemes in terms of security and performance.
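As a rough aid to reading the complexity claim, the sketch below gives a non-private reference for what a PSI-CA protocol outputs and evaluates the two asymptotic growth terms. It reads the abstract's set sizes as |X| = 2^28 and |Y| = 2048, which is an assumption about the intended notation. The terms are unitless and ignore constants such as ciphertext size, so their ratio is not the measured 172× to 412× saving. All function names are hypothetical.

```python
import math


def psi_ca_reference(x: set, y: set) -> int:
    """Non-private reference: the only value a PSI-CA protocol should reveal."""
    return len(x & y)


def unbalanced_term(size_x: int, size_y: int) -> float:
    """Growth term |Y| * log2(|X|) for the unbalanced setting."""
    return size_y * math.log2(size_x)


def balanced_term(size_x: int, size_y: int) -> int:
    """Growth term |X| + |Y| for the balanced setting."""
    return size_x + size_y
```

For these sizes the unbalanced term is 2048 × 28 = 57,344, against 268,437,504 for the balanced term, which shows why a small |Y| benefits from a cost that depends only logarithmically on |X|.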

Near-Optimal Oblivious Key-Value Stores for Efficient PSI, PSU and Volume-Hiding Multi-Maps

Alexander Bienstock, New York University; Sarvar Patel and Joon Young Seo, Google; Kevin Yeo, Google and Columbia University

In this paper, we study oblivious key-value stores (OKVS) that enable encoding n key-value pairs into length m encodings while hiding the input keys. The goal is to obtain high rate, n/m, with efficient encoding and decoding algorithms. We present RB-OKVS built on random band matrices that obtains near-optimal rates as high as 0.97 whereas prior works could only achieve rates up to 0.81 with similar encoding times.

Using RB-OKVS, we obtain state-of-the-art protocols for private set intersection (PSI) and union (PSU). Our semi-honest PSI has up to 12% smaller communication and 13% reductions in monetary cost with slightly larger computation. We also obtain similar improvements for both malicious and circuit PSI. For PSU, our protocol obtains improvements of up to 22% in communication, 40% in computation and 21% in monetary cost. In general, we obtain the most communication- and cost-efficient protocols for all the above primitives.

Finally, we present the first connection between OKVS and volume-hiding encrypted multi-maps (VH-EMM) where the goal is to outsource storage of multi-maps while hiding the number of values associated with each key (i.e., volume). We present RB-MM with 16% smaller storage, 5x faster queries and 8x faster setup than prior works.
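The encode/decode interface of an OKVS can be sketched with a toy construction over GF(2): each key hashes to a short band of columns, and encoding solves a linear system so that XOR-ing the selected table entries returns the key's value. This is only an illustration of the band-matrix idea under assumed parameters. It is not the paper's RB-OKVS, makes no security claim, and does not approach the reported rates. All names, the SHA-256 row hash, and the retry-on-failure loop are assumptions.

```python
import hashlib
import secrets


def _row(key: bytes, seed: int, m: int, w: int) -> int:
    """Bitmask with a random w-bit band placed at a key-dependent offset."""
    h = hashlib.sha256(seed.to_bytes(4, "big") + key).digest()
    start = int.from_bytes(h[:8], "big") % (m - w + 1)
    band = (int.from_bytes(h[8:16], "big") % (1 << w)) | 1  # never all-zero
    return band << start


def _xor_selected(table: list[int], mask: int) -> int:
    """XOR together the table entries whose column bit is set in mask."""
    acc, j = 0, 0
    while mask:
        if mask & 1:
            acc ^= table[j]
        mask >>= 1
        j += 1
    return acc


def encode(pairs: dict[bytes, int], m: int, w: int = 16,
           value_bits: int = 32, max_tries: int = 100) -> tuple[int, list[int]]:
    """Return (seed, table) such that decode(table, seed, k, w) == pairs[k]."""
    for seed in range(max_tries):
        pivots: dict[int, tuple[int, int]] = {}  # lowest set column -> row
        consistent = True
        for key, val in pairs.items():
            mask = _row(key, seed, m, w)
            while mask:  # Gaussian elimination over GF(2)
                col = (mask & -mask).bit_length() - 1
                if col not in pivots:
                    break
                pmask, pval = pivots[col]
                mask ^= pmask
                val ^= pval
            if mask:
                pivots[col] = (mask, val)
            elif val:
                consistent = False  # dependent rows; retry with a new seed
                break
        if not consistent:
            continue
        # Free columns keep random filler; pivot columns are back-substituted.
        table = [secrets.randbits(value_bits) for _ in range(m)]
        for col in sorted(pivots, reverse=True):
            mask, val = pivots[col]
            table[col] = val ^ _xor_selected(table, mask ^ (1 << col))
        return seed, table
    raise ValueError("no solvable seed found; increase m or max_tries")


def decode(table: list[int], seed: int, key: bytes, w: int = 16) -> int:
    return _xor_selected(table, _row(key, seed, len(table), w))
```

The rate of this toy is `len(pairs) / m`; the narrow band is what keeps elimination cheap, since each row touches at most `w` consecutive columns.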

Distance-Aware Private Set Intersection

Anrin Chakraborti, Duke University; Giulia Fanti, Carnegie Mellon University; Michael K. Reiter, Duke University

Private set intersection (PSI) allows two mutually untrusting parties to compute an intersection of their sets, without revealing information about items that are not in the intersection. This work introduces a PSI variant called distance-aware PSI (DA-PSI) for sets whose elements lie in a metric space. DA-PSI returns pairs of items that are within a specified distance threshold of each other. This paper puts forward DA-PSI constructions for two metric spaces: (i) Minkowski distance of order 1 over the set of integers (i.e., for integers a and b, their distance is |a−b|); and (ii) Hamming distance over the set of binary strings of length ℓ. In the Minkowski DA-PSI protocol, the communication complexity scales logarithmically in the distance threshold and linearly in the set size. In the Hamming DA-PSI protocol, the communication volume scales quadratically in the distance threshold and is independent of the dimensionality of string length ℓ. Experimental results with real applications confirm that DA-PSI provides more effective matching at lower cost than naïve solutions.
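The output that a distance-aware PSI protocol is meant to compute can be written down as a plain, non-private reference for both metrics named in the abstract. This quadratic-time sketch reveals everything to whoever runs it and says nothing about how the actual protocols achieve privacy or their stated communication costs. The function names are hypothetical.

```python
def minkowski1_pairs(a: set[int], b: set[int], d: int) -> list[tuple[int, int]]:
    """Pairs (x, y) with |x - y| <= d (Minkowski distance of order 1)."""
    return sorted((x, y) for x in a for y in b if abs(x - y) <= d)


def hamming_distance(u: str, v: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(u) != len(v):
        raise ValueError("strings must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(u, v))


def hamming_pairs(a: set[str], b: set[str], d: int) -> list[tuple[str, str]]:
    """Pairs (u, v) of binary strings within Hamming distance d."""
    return sorted((u, v) for u in a for v in b if hamming_distance(u, v) <= d)
```

With threshold d = 0 both functions reduce to ordinary set intersection, which is the sense in which DA-PSI generalizes PSI.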

Linear Private Set Union from Multi-Query Reverse Private Membership Test

Cong Zhang, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences; Yu Chen, School of Cyber Science and Technology, Shandong University; State Key Laboratory of Cryptology; Key Laboratory of Cryptologic Technology and Information Security, Ministry of Education, Shandong University; Weiran Liu, Alibaba Group; Min Zhang, School of Cyber Science and Technology, Shandong University; State Key Laboratory of Cryptology; Key Laboratory of Cryptologic Technology and Information Security, Ministry of Education, Shandong University; Dongdai Lin, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences

Available Media

A private set union (PSU) protocol enables two parties, each holding a set, to compute the union of their sets without revealing anything else to either party. So far, there are two known approaches for constructing PSU protocols. The first mainly depends on additively homomorphic encryption (AHE), which is generally inefficient since it needs to perform a non-constant number of homomorphic computations on each item. The second, recently proposed by Kolesnikov et al. (ASIACRYPT 2019), is mainly based on oblivious transfer and symmetric-key operations. It features good practical performance and is several orders of magnitude faster than the first. However, neither of these two approaches is optimal, in the sense that their computation and communication complexity are not both O(n), where n is the size of the set. Therefore, the problem of constructing an optimal PSU protocol remains open.

In this work, we resolve this open problem by proposing a generic framework of PSU from oblivious transfer and a newly introduced protocol called multi-query reverse private membership test (mq-RPMT). We present two generic constructions of mq-RPMT. The first is based on symmetric-key encryption and general 2PC techniques. The second is based on re-randomizable public-key encryption. Both constructions lead to PSU with linear computation and communication complexity.

We implement our two PSU protocols and compare them with the state-of-the-art PSU. Experiments show that our PKE-based protocol has the lowest communication of all schemes, which is 3.7-14.8× lower depending on set size. The running time of our PSU scheme is 1.2-12× faster than that of the state-of-the-art, depending on network environments.

Track 6

Logs and Auditing

Session Chair: Davide Balzarotti, Eurecom

Platinum Salon 1–2

Auditing Frameworks Need Resource Isolation: A Systematic Study on the Super Producer Threat to System Auditing and Its Mitigation

Peng Jiang, Ruizhe Huang, Ding Li, Yao Guo, and Xiangqun Chen, MOE Key Lab of HCST, School of Computer Science, Peking University; Jianhai Luan, Yuxin Ren, and Xinwei Hu, Huawei Technologies

Available Media

System auditing is a crucial technique for detecting APT attacks. However, attackers may try to compromise the system auditing frameworks to conceal their malicious activities. In this paper, we present a comprehensive and systematic study of the super producer threat in auditing frameworks, which enables attackers to either corrupt the auditing framework or paralyze the entire system. Our analysis shows that the main cause of the super producer threat is the lack of data isolation in the centralized architecture of existing solutions. To address this threat, we propose a novel auditing framework, NODROP, which isolates provenance data generated by different processes with a threadlet-based architecture design. Our evaluation demonstrates that NODROP can ensure the integrity of the auditing frameworks while incurring an average 6.58% higher application overhead compared to vanilla Linux and 6.30% lower application overhead compared to a state-of-the-art commercial auditing framework, Sysdig, across eight different hardware configurations.

AIRTAG: Towards Automated Attack Investigation by Unsupervised Learning with Log Texts

Hailun Ding, Rutgers University; Juan Zhai, University of Massachusetts Amherst; Yuhong Nan, Sun Yat-sen University; Shiqing Ma, University of Massachusetts Amherst

Available Media

The success of deep learning (DL) techniques has led to their adoption in many fields, including attack investigation, which aims to recover the whole attack story from logged system provenance by analyzing the causality of system objects and subjects. Existing DL-based techniques, e.g., the state-of-the-art ATLAS, follow the design of traditional forensics analysis pipelines. They train a DL model with labeled causal graphs during offline training to learn benign and malicious patterns. During attack investigation, they first convert the log data to causal graphs and leverage the trained DL model to determine if an entity is part of the whole attack chain or not. This design does not fully exploit the power of DL. Existing works like BERT have demonstrated the superiority of leveraging unsupervised pre-trained models, achieving state-of-the-art results without costly and error-prone data labeling. Prior DL-based attack investigation has overlooked this opportunity. Moreover, generating and operating on the graphs is time-consuming and unnecessary. Based on our study, these operations take around 96% of the total analysis time, resulting in low efficiency. In addition, abstracting individual log entries to graph nodes and edges makes the analysis more coarse-grained, leading to inaccurate and unstable results. We argue that log texts provide the same information as causal graphs but are fine-grained and easier to analyze.

This paper presents AIRTAG, a novel attack investigation system. It is powered by unsupervised learning with log texts. Instead of training on labeled graphs, AIRTAG leverages unsupervised learning to train a DL model on the log texts. Thus, we do not require the heavyweight and error-prone process of manually labeling logs. During the investigation, the DL model directly takes log files as inputs and predicts entities related to the attack. We evaluated AIRTAG on 19 scenarios, including single-host and multi-host attacks. Our results show the superior efficiency and effectiveness of AIRTAG compared to existing solutions. By removing graph generation and operations, AIRTAG is 2.5x faster than the state-of-the-art method, ATLAS, with 9.0% fewer false positives and 16.5% more true positives on average.

Rethinking System Audit Architectures for High Event Coverage and Synchronous Log Availability

Varun Gandhi, Harvard University; Sarbartha Banerjee, University of Texas at Austin; Aniket Agrawal and Adil Ahmad, Arizona State University; Sangho Lee and Marcus Peinado, Microsoft Research

Available Media

Once an attacker compromises the operating system, the integrity and availability of unprotected system audit logs still kept on the computer become uncertain. In this paper, we ask the question: can recently proposed audit systems aimed at tackling such an attacker provide enough information for forensic analysis? Our findings suggest that the answer is no, because the inefficient logging pipelines of existing audit systems prohibit generating log entries for a vast majority of attack events and protecting logs as soon as they are created (i.e., synchronously). This leads to a low attack event coverage within generated logs, while allowing attackers to tamper with unprotected logs after a compromise. To counter these limitations, we present OMNILOG, a system audit architecture that composes an end-to-end efficient logging pipeline where logs are rapidly generated and protected using a set of platform-agnostic security abstractions. This allows OMNILOG to enable high attack event coverage and synchronous log availability, while even outperforming the state-of-the-art audit systems that achieve neither property.

Improving Logging to Reduce Permission Over-Granting Mistakes

Bingyu Shen, Tianyi Shan, and Yuanyuan Zhou, University of California, San Diego

Available Media

Access control configurations are gatekeepers to block unwelcome access to sensitive data. Unfortunately, system administrators (sysadmins) sometimes over-grant permissions when resolving unintended access-deny issues reported by legitimate users, which may open up security vulnerabilities for attackers. One of the primary reasons is that modern software does not provide informative logging to guide sysadmins to understand the reported problems.

This paper makes one of the first attempts (to the best of our knowledge) to help developers improve log messages in order to help sysadmins correctly understand and fix access-deny issues without over-granting permissions. First, we conducted an observation study to understand the current practices of access-deny logging in the server software. Our study shows that many access-control program locations do not have any log messages; and a large percentage of existing log messages lack useful information to guide sysadmins to correctly understand and fix the issues. On top of our observations, we built SECLOG, which uses static analysis to automatically help developers find missing access-deny log locations and identify relevant information at the log location.

We evaluated SECLOG with ten widely deployed server applications. Overall, SECLOG identified 380 new log statements for access-deny cases, and also enhanced 550 existing access-deny log messages with diagnostic information. We have reported 114 log statements to the developers of these applications, and so far 70 have been accepted into their main branches. We also conducted a user study with sysadmins (n=32) on six real-world access-deny issues. SECLOG can reduce the number of insecure fixes from 27 to 1, and also improve the diagnosis time by 64.2% on average.

10:30 am–11:00 am

Break with Refreshments

Platinum Foyer

11:00 am–12:00 pm

Track 1

Fighting the Robots

Session Chair: Yuan Tian, UCLA

Platinum Salon 6

Diving into Robocall Content with SnorCall

Sathvik Prasad, Trevor Dunlap, Alexander Ross, and Bradley Reaves, North Carolina State University

Available Media

Unsolicited bulk telephone calls — termed "robocalls" — nearly outnumber legitimate calls, overwhelming telephone users. While the vast majority of these calls are illegal, they are also ephemeral. Although telephone service providers, regulators, and researchers have ready access to call metadata, they do not have tools to investigate call content at the vast scale required. This paper presents SnorCall, a framework that scalably and efficiently extracts content from robocalls. SnorCall leverages the Snorkel framework that allows a domain expert to write simple labeling functions to classify text with high accuracy. We apply SnorCall to a corpus of transcripts covering 232,723 robocalls collected over a 23-month period. Among many other findings, SnorCall enables us to obtain first estimates on how prevalent different scam and legitimate robocall topics are, determine which organizations are referenced in these calls, estimate the average amounts solicited in scam calls, identify shared infrastructure between campaigns, and monitor the rise and fall of election-related political calls. As a result, we demonstrate how regulators, carriers, anti-robocall product vendors, and researchers can use SnorCall to obtain powerful and accurate analyses of robocall content and trends that can lead to better defenses.

UCBlocker: Unwanted Call Blocking Using Anonymous Authentication

Changlai Du and Hexuan Yu, Virginia Tech; Yang Xiao, University of Kentucky; Y. Thomas Hou, Virginia Tech; Angelos D. Keromytis, Georgia Institute of Technology; Wenjing Lou, Virginia Tech

Available Media

Telephone users are receiving more and more unwanted calls, including spam and scam calls, because of the transfer-without-verification nature of global telephone networks, which allows anyone to call any other number. To avoid unwanted calls, telephone users often ignore or block all incoming calls from unknown numbers, causing them to miss legitimate calls from new callers. This paper takes an end-to-end perspective to present a solution to block unwanted calls while allowing users to define the policies of acceptable calls. The proposed solution involves a new infrastructure based on anonymous credentials, which enables anonymous caller authentication and policy definition. Our design decouples caller authentication and call session initiation and introduces a verification code to interface and bind the two processes. This design minimizes changes to telephone networks, reduces latency to call initiation, and eliminates the need for a call-time data channel. A prototype of the system is implemented to evaluate its feasibility.

Combating Robocalls with Phone Virtual Assistant Mediated Interaction

Sharbani Pandit, Georgia Institute of Technology; Krishanu Sarker, Georgia State University; Roberto Perdisci, University of Georgia and Georgia Institute of Technology; Mustaque Ahamad and Diyi Yang, Georgia Institute of Technology

Available Media

Mass robocalls affect millions of people on a daily basis. Unfortunately, most current defenses against robocalls rely on phone blocklists and are ineffective against caller ID spoofing. To enable detection and blocking of spoofed robocalls, we propose an NLP-based smartphone virtual assistant that automatically vets incoming calls. Similar to a human assistant, the virtual assistant picks up an incoming call and uses machine learning models to interact with the caller to determine if the call source is a human or a robocaller. It interrupts a user by ringing the phone only when the call is determined to be not from a robocaller. Our security analysis shows that such a system can stop current and more sophisticated robocallers that might emerge in the future. We also conduct a user study that shows that the virtual assistant can preserve phone call user experience.

BotScreen: Trust Everybody, but Cut the Aimbots Yourself

Minyeop Choi, KAIST; Gihyuk Ko, Cyber Security Research Center at KAIST and Carnegie Mellon University; Sang Kil Cha, KAIST and Cyber Security Research Center at KAIST

Distinguished Paper Award Winner

Available Media

Aimbots, which assist players to kill opponents in First-Person Shooter (FPS) games, pose a significant threat to the game industry. Although there has been significant research effort to automatically detect aimbots, existing works suffer from either high server-side overhead or low detection accuracy. In this paper, we present a novel aimbot detection design and implementation that we refer to as BotScreen, which is a client-side aimbot detection solution for a popular FPS game, Counter-Strike: Global Offensive (CS:GO). BotScreen is the first to detect aimbots in a distributed fashion, thereby minimizing the server-side overhead. It also leverages a novel deep learning model to precisely detect abnormal behaviors caused by using aimbots. We demonstrate the effectiveness of BotScreen in terms of both accuracy and performance on CS:GO. We make our tool as well as our dataset publicly available to support open science.

Track 2

Perspectives and Incentives

Session Chair: Daniel Zappala, Brigham Young University

Platinum Salon 5

"If I could do this, I feel anyone could:" The Design and Evaluation of a Secondary Authentication Factor Manager

Garrett Smith, Tarun Yadav, and Jonathan Dutson, Brigham Young University; Scott Ruoti, University of Tennessee Knoxville; Kent Seamons, Brigham Young University

Available Media

Two-factor authentication (2FA) defends against account compromise by protecting an account with both a password (the primary authentication factor) and a device or resource that is hard to steal (the secondary authentication factor, or SAF). However, prior research shows that users need help registering their SAFs with websites and successfully enabling 2FA. To address these issues, we propose the concept of a SAF manager that helps users manage SAFs through their entire life cycle: setup, authentication, removal, replacement, and auditing. We design and implement two proof-of-concept prototypes. In a between-subjects user study (N=60), we demonstrate that our design improves users' ability to correctly and quickly set up and remove a SAF on their accounts. Qualitative results show that users responded very positively to the SAF manager and were enthusiastic about its ability to help them rapidly replace a SAF. Furthermore, our SAF manager prevented fatal errors that users experienced when not using the manager.

Exploring Privacy and Incentives Considerations in Adoption of COVID-19 Contact Tracing Apps

Oshrat Ayalon, Max Planck Institute for Software Systems; Dana Turjeman, Reichman University; Elissa M. Redmiles, Max Planck Institute for Software Systems

Available Media

Mobile Health (mHealth) apps, such as COVID-19 contact tracing and other health-promoting technologies, help support personal and public health efforts in response to the pandemic and other health concerns. However, due to the sensitive data handled by mHealth apps, and their potential effect on people's lives, their widespread adoption demands trust in a multitude of aspects of their design. In this work, we report on a series of conjoint analyses (N = 1,521) to investigate how COVID-19 contact tracing apps can be better designed and marketed to improve adoption. Specifically, with a novel design of randomization on top of a conjoint analysis, we investigate people's privacy considerations relative to other attributes when they are contemplating contact-tracing app adoption. We further explore how their adoption considerations are influenced by deployment factors such as offering extrinsic incentives (money, healthcare) and user factors such as receptiveness to contact-tracing apps and sociodemographics. Our results, which we contextualize and synthesize with prior work, offer insight into the most desired digital contact-tracing products (e.g., app features) and how they should be deployed (e.g., with incentives) and targeted to different user groups who have heterogeneous preferences.

Exploring Tenants' Preferences of Privacy Negotiation in Airbnb

Zixin Wang, Zhejiang University; Danny Yuxing Huang, New York University; Yaxing Yao, University of Maryland, Baltimore County

Available Media

The literature points to unmatched or conflicting privacy needs between users and bystanders in smart homes due to their different privacy concerns and priorities. A promising approach to mitigate such conflicts is through negotiation. Yet, it is not clear whether bystanders have privacy negotiation needs and if so, what factors may influence their negotiation intention and how to better support the negotiation to achieve their privacy goals. To answer these questions, we conducted a vignette study with 867 participants in the context of Airbnb that varied three categorical factors: device type, device location, and duration of stay. We further examined our participants' preferences regarding with whom, when, how, and why they would like to negotiate their privacy. Our findings showed that device type remained the only factor that significantly influenced our participants' negotiation intention. Additionally, we identified other participant preferences; for example, they preferred to contact Airbnb hosts first to convey their privacy needs through asynchronous channels (e.g., messages and emails). We summarized design implications to fulfill tenants' privacy negotiation needs.

Know Your Cybercriminal: Evaluating Attacker Preferences by Measuring Profile Sales on an Active, Leading Criminal Market for User Impersonation at Scale

Michele Campobasso and Luca Allodi, Eindhoven University of Technology

Available Media

In this paper we exploit market features specific to a leading Russian cybercrime market for user impersonation at scale to evaluate attacker preferences when purchasing stolen user profiles, and the overall economic activity of the market. We run our data collection over a period of 161 days and collect data on a sample of 1,193 sold user profiles out of 11,357 advertised products in that period and their characteristics. We estimate a market trade volume of up to approximately 700 profiles per day, corresponding to estimated daily sales of up to 4,000 USD and an overall market revenue within the observation period between 540k and 715k USD. We find profile provision to be rather stable over time and mainly focused on European profiles, whereas actual profile acquisition varies significantly depending on other profile characteristics. Attackers' interests focus disproportionately on profiles of certain types, including those originating in North America and featuring Crypto resources. We model and evaluate the relative importance of different profile characteristics in the final decision of an attacker to purchase a profile, and discuss implications for defenses and risk evaluation.

Track 3

Traffic Analysis

Session Chair: Rob Jansen, U.S. Naval Research Laboratory

Platinum Salon 7–8

HorusEye: A Realtime IoT Malicious Traffic Detection Framework using Programmable Switches

Yutao Dong, Tsinghua Shenzhen International Graduate School, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China; Qing Li, Peng Cheng Laboratory, Shenzhen, China; Kaidong Wu and Ruoyu Li, Tsinghua Shenzhen International Graduate School, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China; Dan Zhao, Peng Cheng Laboratory, Shenzhen, China; Gareth Tyson, Hong Kong University of Science and Technology (GZ), Guangzhou, China; Junkun Peng, Yong Jiang, and Shutao Xia, Tsinghua Shenzhen International Graduate School, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China; Mingwei Xu, Tsinghua University, Beijing, China

Available Media

The ever-growing volume of IoT traffic brings challenges to IoT anomaly detection systems. Existing anomaly detection systems perform all traffic detection on the control plane, which struggles to scale to the growing rates of traffic. In this paper, we propose HorusEye, a high throughput and accurate two-stage anomaly detection framework. In the first stage, preliminary burst-level anomaly detection is implemented on the data plane to exploit its high-throughput capability (e.g., 100Gbps). We design an algorithm that converts a trained iForest model into white list matching rules, and implement the first unsupervised model that can detect unseen attacks on the data plane. The suspicious traffic is then reported to the control plane for further investigation. To reduce the false-positive rate, the control plane carries out the second stage, where more thorough anomaly detection is performed over the reported suspicious traffic using flow-level features and a deep detection model. We implement a prototype of HorusEye and evaluate its performance through a comprehensive set of experiments. The experimental results illustrate that the data plane can detect 99% of the anomalies and offload 76% of the traffic from the control plane. Compared with the state-of-the-art schemes, our framework has superior throughput and detection performance.

An Input-Agnostic Hierarchical Deep Learning Framework for Traffic Fingerprinting

Jian Qu, Xiaobo Ma, and Jianfeng Li, Xi'an Jiaotong University; Xiapu Luo, The Hong Kong Polytechnic University; Lei Xue, Sun Yat-sen University; Junjie Zhang, Wright State University; Zhenhua Li, Tsinghua University; Li Feng, Southwest Jiaotong University; Xiaohong Guan, Xi'an Jiaotong University

Available Media

Deep learning has proven to be promising for traffic fingerprinting that explores features of packet timing and sizes. Although well-known for automatic feature extraction, it is faced with a gap between the heterogeneousness of the traffic (i.e., raw packet timing and sizes) and the homogeneousness of the required input (i.e., input-specific). To address this gap, we design an input-agnostic hierarchical deep learning framework for traffic fingerprinting that can hierarchically abstract comprehensive heterogeneous traffic features into homogeneous vectors seamlessly digestible by existing neural networks for further classification. The extensive evaluation demonstrates that our framework, with just one paradigm, not only supports heterogeneous traffic input but also achieves better or comparable performance compared to state-of-the-art methods across a wide range of traffic fingerprinting tasks.

Subverting Website Fingerprinting Defenses with Robust Traffic Representation

Meng Shen, School of Cyberspace Science and Technology, Beijing Institute of Technology; Kexin Ji and Zhenbo Gao, School of Computer Science, Beijing Institute of Technology; Qi Li, Institute for Network Sciences and Cyberspace, Tsinghua University; Liehuang Zhu, School of Cyberspace Science and Technology, Beijing Institute of Technology; Ke Xu, Department of Computer Science and Technology, Tsinghua University

Available Media

Anonymity networks, e.g., Tor, are vulnerable to various website fingerprinting (WF) attacks, which allow attackers to compromise user privacy on these networks. However, recently developed defenses can effectively interfere with WF attacks, e.g., by simply injecting dummy packets. In this paper, we propose a novel WF attack called Robust Fingerprinting (RF), which enables an attacker to fingerprint Tor traffic under various defenses. Specifically, we develop a robust traffic representation method that generates a Traffic Aggregation Matrix (TAM) to fully capture key informative features leaked from Tor traces. By utilizing TAM, an attacker can train a CNN-based classifier that learns common high-level traffic features uncovered by different defenses. We conduct extensive experiments with public real-world datasets to compare RF with state-of-the-art (SOTA) WF attacks. The closed- and open-world evaluation results demonstrate that RF significantly outperforms the SOTA attacks. In particular, RF can effectively fingerprint Tor traffic under the SOTA defenses with an average accuracy improvement of 8.9% over the best existing attack (i.e., Tik-Tok).

Rosetta: Enabling Robust TLS Encrypted Traffic Classification in Diverse Network Environments with TCP-Aware Traffic Augmentation

Renjie Xie and Jiahao Cao, Tsinghua University; Enhuan Dong and Mingwei Xu, Tsinghua University and Quan Cheng Laboratory; Kun Sun, George Mason University; Qi Li and Licheng Shen, Tsinghua University; Menghao Zhang, Tsinghua University and Kuaishou Technology

Available Media

As the majority of Internet traffic is encrypted by the Transport Layer Security (TLS) protocol, recent advances leverage Deep Learning (DL) models to conduct encrypted traffic classification by automatically extracting complicated and informative features from the packet length sequences of TLS flows. Though existing DL models have been reported to achieve excellent classification results on encrypted traffic, we conduct a comprehensive study to show that they all suffer significant performance degradation in real, diverse network environments. After systematically studying the reasons, we discover that the packet length sequences of flows may change dramatically due to various TCP mechanisms for reliable transmission in varying network environments. We therefore propose Rosetta to enable robust TLS encrypted traffic classification for existing DL models. It leverages TCP-aware traffic augmentation mechanisms and self-supervised learning to understand implicit TCP semantics, and hence extracts robust features of TLS flows. Extensive experiments show that Rosetta can significantly improve the classification performance of existing DL models on TLS traffic in diverse network environments.

Track 4

Adversarial Patches and Images

Session Chair: Yinzhi Cao, Johns Hopkins University

Platinum Salon 9–10

Towards Targeted Obfuscation of Adversarial Unsafe Images using Reconstruction and Counterfactual Super Region Attribution Explainability

Mazal Bethany, Andrew Seong, Samuel Henrique Silva, Nicole Beebe, Nishant Vishwamitra, and Peyman Najafirad, The University of Texas at San Antonio

Available Media

Online Social Networks (OSNs) are increasingly used by perpetrators to harass their targets via the exchange of unsafe images. Furthermore, perpetrators have resorted to using advanced techniques like adversarial attacks to evade the detection of such images. To defend against this threat, OSNs use AI/ML-based detectors to flag unsafe images. However, these detectors cannot explain the regions of unsafe content for the obfuscation and inspection of such regions, and are also critically vulnerable to adversarial attacks that fool their detection. In this work, we first conduct an in-depth investigation into state-of-the-art explanation techniques and commercially-available unsafe image detectors and find that they are severely deficient against adversarial unsafe images. To address these deficiencies we design a new system that performs targeted obfuscation of unsafe adversarial images on social media using reconstruction to remove adversarial perturbations and counterfactual super region attribution explainability to explain unsafe image segments, and created a prototype called ProjectName. We demonstrate the effectiveness of our system with a large-scale evaluation on three common unsafe images: Sexually Explicit, Cyberbullying, and Self-Harm. Our evaluations of ProjectName on more than 64,000 real-world unsafe OSN images, and unsafe images found in the wild such as sexually explicit celebrity deepfakes and self-harm images show that it significantly neutralizes the threat of adversarial unsafe images, by safely obfuscating 91.47% of such images.

TPatch: A Triggered Physical Adversarial Patch

Wenjun Zhu and Xiaoyu Ji, USSLAB, Zhejiang University; Yushi Cheng, BNRist, Tsinghua University; Shibo Zhang and Wenyuan Xu, USSLAB, Zhejiang University

Available Media

Autonomous vehicles increasingly utilize the vision-based perception module to acquire information about driving environments and detect obstacles. Correct detection and classification are important to ensure safe driving decisions. Existing works have demonstrated the feasibility of fooling the perception models such as object detectors and image classifiers with printed adversarial patches. However, most of them are indiscriminately offensive to every passing autonomous vehicle. In this paper, we propose TPatch, a physical adversarial patch triggered by acoustic signals. Unlike other adversarial patches, TPatch remains benign under normal circumstances but can be triggered to launch a hiding, creating or altering attack by a designed distortion introduced by signal injection attacks towards cameras. To avoid the suspicion of human drivers and make the attack practical and robust in the real world, we propose a content-based camouflage method and an attack robustness enhancement method to strengthen it. Evaluations with three object detectors, YOLO V3/V5 and Faster R-CNN, and eight image classifiers demonstrate the effectiveness of TPatch in both the simulation and the real world. We also discuss possible defenses at the sensor, algorithm, and system levels.

CAPatch: Physical Adversarial Patch against Image Captioning Systems

Shibo Zhang, USSLAB, Zhejiang University; Yushi Cheng, BNRist, Tsinghua University; Wenjun Zhu, Xiaoyu Ji, and Wenyuan Xu, USSLAB, Zhejiang University

Available Media

The fast-growing surveillance systems will make image captioning, i.e., automatically generating text descriptions of images, an essential technique to process the huge volumes of videos efficiently, and correct captioning is essential to ensure the text authenticity. While prior work has demonstrated the feasibility of fooling computer vision models with adversarial patches, it is unclear whether the vulnerability can lead to incorrect captioning, which involves natural language processing after image feature extraction. In this paper, we design CAPatch, a physical adversarial patch that can result in mistakes in the final captions, i.e., either create a completely different sentence or a sentence with keywords missing, against multi-modal image captioning systems. To make CAPatch effective and practical in the physical world, we propose a detection assurance and attention enhancement method to increase the impact of CAPatch and a robustness improvement method to address the patch distortions caused by image printing and capturing. Evaluations on three commonly-used image captioning systems (Show-and-Tell, Self-critical Sequence Training: Att2in, and Bottom-up Top-down) demonstrate the effectiveness of CAPatch in both the digital and physical worlds, whereby volunteers wear printed patches in various scenarios, clothes, and lighting conditions. With a size of 5% of the image, physically-printed CAPatch can achieve continuous attacks with an attack success rate higher than 73.1% over a video recorder.

Hard-label Black-box Universal Adversarial Patch Attack

Guanhong Tao, Shengwei An, Siyuan Cheng, Guangyu Shen, and Xiangyu Zhang, Purdue University

Available Media

Deep learning models are widely used in many applications. Despite their impressive performance, the security aspect of these models has raised serious concerns. Universal adversarial patch attack is one of the security problems in deep learning, where an attacker can generate a patch trigger on pre-trained models using gradient information. Whenever the trigger is pasted on an input, the model will misclassify it to a target label. Existing attacks are realized with access to the model's gradient or its output confidence. In this paper, we propose a novel attack method HardBeat that generates universal adversarial patches with access only to the predicted label. It utilizes historical data points during the search for an optimal patch trigger and performs focused/directed search through a novel importance-aware gradient approximation to explore the neighborhood of the current trigger. The evaluation is conducted on four popular image datasets with eight models and two online commercial services. The experimental results show HardBeat is significantly more effective than eight baseline attacks, producing more than twice as many high-ASR (attack success rate) patch triggers (>90%) on local models and 17.5% higher ASR on online services. Three existing advanced defense techniques fail to defend against HardBeat.

Track 5

Decentralized Finance

Session Chair: Joel Reardon, University of Calgary

Platinum Salon 3–4

Anatomy of a High-Profile Data Breach: Dissecting the Aftermath of a Crypto-Wallet Case

Svetlana Abramova and Rainer Böhme, Universität Innsbruck

Available Media

Media reports show an alarming increase of data breaches at providers of cybersecurity products and services. Since the exposed records may reveal security-relevant data, such incidents cause undue burden and create the risk of re-victimization to individuals whose personal data gets exposed. In pursuit of examining a broad spectrum of the downstream effects on victims, we surveyed 104 persons who purchased specialized devices for the secure storage of crypto-assets and later fell victim to a breach of customer data. Our case study reveals common nuisances (i.e., spam, scams, phishing e-mails) as well as previously unseen attack vectors (e.g., involving tampered devices), which are possibly tied to the breach. A few victims report losses of digital assets as a form of harm. We find that our participants exhibit heightened safety concerns, appear skeptical about litigation efforts, and demonstrate the ability to differentiate between the quality of the security product and the circumstances of the breach. We derive implications for the cybersecurity industry at large, and point out methodological challenges in data breach research.

Glimpse: On-Demand PoW Light Client with Constant-Size Storage for DeFi

Giulia Scaffino, TU Wien and Christian Doppler Laboratory Blockchain Technologies for the Internet of Things; Lukas Aumayr and Zeta Avarikioti, TU Wien; Matteo Maffei, TU Wien and Christian Doppler Laboratory Blockchain Technologies for the Internet of Things

Available Media

Cross-chain communication is instrumental in unleashing the full potential of blockchain technologies, as it allows users and developers to exploit the unique design features and the profit opportunities of different existing blockchains. The majority of interoperability solutions are provided by centralized exchanges and bridge protocols based on a trusted majority, both introducing undesirable trust assumptions compared to native blockchain assets. Hence, increasing attention has been given to decentralized solutions: Light and super-light clients paved the way for chain relays, which allow verifying on a blockchain the state of another blockchain by respectively verifying and storing a linear and logarithmic amount of data. Unfortunately, relays turn out to be inefficient in terms of computational costs, storage, or compatibility.

We introduce Glimpse, an on-demand bridge that leverages a novel on-demand light client construction with only constant on-chain storage, cost, and computational overhead. Glimpse is expressive, enabling a plethora of DeFi and offchain applications such as lending, pegs, proofs of oracle attestations, and betting hubs. Glimpse also remains compatible with blockchains featuring a limited scripting language such as the Liquid Network (a pegged sidechain of Bitcoin), for which we present a concrete instantiation. We prove Glimpse security in the Universal Composability (UC) framework and further conduct an economic analysis. We evaluate the cost of Glimpse for Bitcoin-like chains: verifying a simple transaction has at most 700 bytes of on-chain overhead, resulting in a one-time fee of $3, only twice as much as a standard Bitcoin transaction.

Mixed Signals: Analyzing Ground-Truth Data on the Users and Economics of a Bitcoin Mixing Service

Fieke Miedema, Kelvin Lubbertsen, Verena Schrama, and Rolf van Wegberg, Delft University of Technology

Available Media

Bitcoin mixing is a commodity, mostly offered in the underground economy, selling anonymity in the bitcoin ecosystem. Its popularity is rather remarkable, as transactions initiated by its users run through wallets of a centralized service where personally identifiable information is collected in the mixing process, without any prior knowledge of data retention policies. This leaves us to wonder if users resort to strategies to mitigate these risks (like the usage of IP proxy services) or test the service with smaller transactions to identify scam services at low cost.

In this paper, we explore unique ground-truth data capturing 15,574 mixing transactions, initiated by 8,838 users, totaling US $45M worth of bitcoins mixed through BestMixer between July 2018 and June 2019. We find that user adoption of risk mitigation strategies is limited, while transaction volumes users entrust to BestMixer are high and usage is frequent and recurrent, with 23% of users returning. Our analysis shows that only 61% of all transactions used some form of IP address obfuscation (i.e., VPN or VPS usage). We discuss possible explanations for these findings, including how information asymmetries and the role of mixers in the process of cashing-out criminal proceeds might force users to accept the risks associated with bitcoin mixing. Furthermore, we address the implications of our findings for the broader cryptocurrency security ecosystem.

Is Your Wallet Snitching On You? An Analysis on the Privacy Implications of Web3

Christof Ferreira Torres, Fiona Willi, and Shweta Shinde, ETH Zurich

Available Media

With the recent hype around the Metaverse and NFTs, Web3 is getting more and more popular. The goal of Web3 is to decentralize the web via decentralized applications. Wallets play a crucial role as they act as an interface between these applications and the user. Wallets such as MetaMask are being used by millions of users nowadays. Unfortunately, Web3 is often advertised as more secure and private. However, decentralized applications as well as wallets are based on traditional technologies, which are not designed with privacy of users in mind. In this paper, we analyze the privacy implications that Web3 technologies such as decentralized applications and wallets have on users. To this end, we build a framework that measures exposure of wallet information. First, we study whether information about installed wallets is being used to track users online. We analyze the top 100K websites and find evidence of 1,325 websites running scripts that probe whether users have wallets installed in their browser. Second, we measure whether decentralized applications and wallets leak the user's unique wallet address to third parties. We intercept the traffic of 616 decentralized applications and 100 wallets and find over 2,000 leaks across 211 applications and more than 300 leaks across 13 wallets. Our study shows that Web3 poses a threat to users' privacy and requires new designs towards more privacy-aware wallet architectures.

Track 6

Memory

Session Chair: Cynthia Irvine, Naval Postgraduate School

Platinum Salon 1–2

Capstone: A Capability-based Foundation for Trustless Secure Memory Access

Jason Zhijingcheng Yu, National University of Singapore; Conrad Watt, University of Cambridge; Aditya Badole, Trevor E. Carlson, and Prateek Saxena, National University of Singapore

Available Media

Capability-based memory isolation is a promising new architectural primitive. Software can access low-level memory only via capability handles rather than raw pointers, which provides a natural interface to enforce security restrictions. Existing architectural capability designs such as CHERI provide spatial safety, but fail to extend to other memory models that security-sensitive software designs may desire. In this paper, we propose Capstone, a more expressive architectural capability design that supports multiple existing memory isolation models in a trustless setup, i.e., without relying on trusted software components. We show how Capstone is well-suited for environments where privilege boundaries are fluid (dynamically extensible), memory sharing/delegation are desired both temporally and spatially, and where such needs are to be balanced with availability concerns. Capstone can also be implemented efficiently. We present an implementation sketch and through evaluation show that its overhead is below 50% in common use cases. We also prototype a functional emulator for Capstone and use it to demonstrate the runnable implementations of six real-world memory models without trusted software components: three types of enclave-based TEEs, a thread scheduler, a memory allocator, and Rust-style memory safety—all within the interface of Capstone.

FloatZone: Accelerating Memory Error Detection using the Floating Point Unit

Floris Gorter, Enrico Barberis, Raphael Isemann, Erik van der Kouwe, Cristiano Giuffrida, and Herbert Bos, Vrije Universiteit Amsterdam

Available Media

Memory sanitizers are powerful tools to detect spatial and temporal memory errors, such as buffer overflows and use-after-frees. Fuzzers and software testers often rely on these tools to discover the presence of bugs. Sanitizers, however, incur significant runtime overhead. For example, AddressSanitizer (ASan), the most widely used sanitizer, incurs a slowdown of 2x. The main source of this overhead consists of the sanitizer checks, which involve at least a memory lookup, a comparison, and a conditional branch instruction. Applying these checks to confirm the validity of the memory accesses in a program can greatly slow down the execution.

We introduce FloatZone, a compiler-based sanitizer to detect spatial and temporal memory errors in C/C++ programs using lightweight checks that leverage the Floating Point Unit (FPU). We show that the combined effects of "lookup, compare, and branch" can be achieved with a single floating point addition that triggers an underflow exception in the case of a memory violation. This novel method to detect illegal accesses greatly improves performance by avoiding the drawbacks of traditional comparisons: it prevents branch mispredictions, enables higher instruction-level parallelism due to offloading to the FPU, and also reduces the cache miss rate due to the lack of shadow memory.
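
The idea of folding "lookup, compare, and branch" into one floating point addition can be modelled in a few lines. The sketch below is only an illustration, not the FloatZone implementation: the redzone word `0x8B8B8B8B` and the addend `0x0B8B8B8A` are assumed constants that do not come from the abstract, and Python cannot raise a hardware underflow exception, so the code merely checks whether the float32 sum lands in the subnormal range, which is the condition that would trap on an FPU with the underflow exception unmasked.

```python
import struct

# Smallest positive normal float32; nonzero results below this are subnormal.
FLT_MIN_NORMAL = 2.0 ** -126

def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit word as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Assumed constants for illustration only (not taken from the abstract).
REDZONE_WORD = 0x8B8B8B8B                 # poison pattern written into redzones
MAGIC_ADDEND = bits_to_f32(0x0B8B8B8A)    # nearly cancels the poison value

def addition_underflows(word: int) -> bool:
    """Return True if word + MAGIC_ADDEND yields a tiny (subnormal) float32.

    On real hardware this single addition would raise the underflow
    exception; here the condition is only computed and returned.
    """
    total = round_to_f32(bits_to_f32(word) + MAGIC_ADDEND)
    return 0.0 < abs(total) < FLT_MIN_NORMAL

if __name__ == "__main__":
    for word in (REDZONE_WORD, 0x00000000, 0x3F800000):
        print(hex(word), addition_underflows(word))
```

Only a word whose float32 interpretation almost exactly cancels the addend produces a subnormal sum, so ordinary data passes through the addition without any comparison or branch.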

Our evaluation shows that FloatZone significantly outperforms existing systems, with just 37% runtime overhead on SPEC CPU2006 and CPU2017. Moreover, we measure an average 2.87x increase in fuzzing throughput compared to the state of the art. Finally, we confirm that FloatZone offers detection capabilities comparable with ASan on the Juliet test suite and a collection of OSS-Fuzz bugs.

PUMM: Preventing Use-After-Free Using Execution Unit Partitioning

Carter Yagemann, The Ohio State University; Simon P. Chung, Brendan Saltaformaggio, and Wenke Lee, Georgia Institute of Technology

Available Media

Critical software is written in memory unsafe languages that are vulnerable to use-after-free and double free bugs. This has led to proposals to secure memory allocators by strategically deferring memory reallocations long enough to make such bugs unexploitable. Unfortunately, existing solutions suffer from high runtime and memory overheads. Seeking a better solution, we propose to profile programs to identify units of code that correspond to the handling of individual tasks. With the intuition that little to no data should flow between separate tasks at runtime, reallocation of memory freed by the currently executing unit is deferred until after its completion; just long enough to prevent use-after-free exploitation.
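
The deferral policy described above can be sketched with a toy allocator. This is a hypothetical model, not PUMM's enforcer: the class name, the block-index representation, and the explicit `end_unit()` call are all assumptions made for illustration. Blocks freed while a unit is running go into a quarantine list and only become reusable once that unit completes.

```python
class DeferredFreeAllocator:
    """Toy allocator: blocks freed inside an execution unit are not
    handed out again until the unit has finished."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))  # blocks available for reuse
        self.quarantine = []                        # freed during the current unit

    def malloc(self) -> int:
        """Hand out the lowest-numbered reusable block."""
        if not self.free_blocks:
            raise MemoryError("no reusable block available")
        return self.free_blocks.pop(0)

    def free(self, block: int) -> None:
        """Defer reuse: the block stays quarantined until end_unit()."""
        self.quarantine.append(block)

    def end_unit(self) -> None:
        """The current unit is done; its freed blocks become reusable."""
        self.free_blocks.extend(self.quarantine)
        self.quarantine.clear()


if __name__ == "__main__":
    alloc = DeferredFreeAllocator(num_blocks=2)
    a = alloc.malloc()
    alloc.free(a)          # a dangling reference to `a` may still exist
    b = alloc.malloc()     # served from a different block
    alloc.end_unit()
    c = alloc.malloc()     # only now can the freed block be reused
    print(a, b, c)
```

Within a unit, a dangling pointer to a freed block can never alias a newly allocated object, which is the property that makes use-after-free bugs unexploitable in this model.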

To demonstrate the efficacy of our design, we implement a prototype for Linux, PUMM, which consists of an offline profiler and an online enforcer that transparently wraps standard libraries to protect C/C++ binaries. In our evaluation of 40 real-world and 3,000 synthetic vulnerabilities across 26 programs, including complex multi-threaded cases like the Chakra JavaScript engine, PUMM successfully thwarts all real-world exploits, and only allows 4 synthetic exploits, while reducing memory overhead by 52.0% over prior work and incurring an average runtime overhead of 2.04%.

MTSan: A Feasible and Practical Memory Sanitizer for Fuzzing COTS Binaries

Xingman Chen, Tsinghua University; Yinghao Shi, Institute of Information Engineering, Chinese Academy of Sciences; Zheyu Jiang and Yuan Li, Tsinghua University; Ruoyu Wang, Arizona State University; Haixin Duan, Tsinghua University and Zhongguancun Laboratory; Haoyu Wang, Huazhong University of Science and Technology; Chao Zhang, Tsinghua University and Zhongguancun Laboratory

Available Media

Fuzzing has been widely adopted for finding vulnerabilities in programs, especially when source code is not available. But the effectiveness and efficiency of binary fuzzing are curtailed by the lack of memory sanitizers for binaries. This lack of binary sanitizers is due to the information loss in compiling programs and challenges in binary instrumentation.

In this paper, we present a feasible and practical hardware-assisted memory sanitizer, MTSan, for binary fuzzing. MTSan can detect both spatial and temporal memory safety violations at runtime. It adopts a novel progressive object recovery scheme to recover objects in binaries and uses a customized binary rewriting solution to instrument binaries with the memory-tagging-based memory safety sanitizing policy. Further, MTSan uses a hardware feature, ARM Memory Tagging Extension (MTE), to significantly reduce its runtime overhead. We implemented a prototype of MTSan on AArch64 and systematically evaluated its effectiveness and performance. Our evaluation results show that MTSan could detect more memory safety violations than existing binary sanitizers while introducing much lower runtime and memory overhead.

12:00 pm–1:30 pm

Lunch (on your own)

1:30 pm–2:45 pm

Track 1

Security in Digital Realities

Session Chair: Lujo Bauer, Carnegie Mellon University

Platinum Salon 6

Hidden Reality: Caution, Your Hand Gesture Inputs in the Immersive Virtual World are Visible to All!

Sindhu Reddy Kalathur Gopal and Diksha Shukla, University of Wyoming; James David Wheelock, University of Colorado Boulder; Nitesh Saxena, Texas A&M University, College Station

Available Media

Text entry is an inevitable task while using Virtual Reality (VR) devices in a wide range of applications such as remote learning, gaming, and virtual meetings. VR users enter passwords/PINs to log in to their user accounts in various applications and type regular text to compose emails or browse the internet. The typing activity on VR devices is believed to be resistant to direct observation attacks as the virtual screen in an immersive environment is not directly visible to others present in physical proximity. This paper presents a video-based side-channel attack, Hidden Reality (HR), that shows that, although the virtual screen in VR devices is not in direct sight of adversaries, indirect observations can be exploited to steal the user’s private information.

The Hidden Reality (HR) attack utilizes video clips of the user’s hand gestures while they type on the virtual screen to decipher the typed text in various key entry scenarios on VR devices, including typed PINs and passwords. Experimental analysis performed on a large corpus of 368 video clips shows that the Hidden Reality model can successfully decipher an average of over 75% of the text inputs. The high success rate of our attack model led us to conduct a user study to understand the user’s behavior and perception of security in virtual reality. The analysis showed that over 95% of users were not aware of any security threats on VR devices and believed the immersive environments to be secure from digital attacks. Our attack model challenges users’ false sense of security in immersive environments and emphasizes the need for more stringent security solutions in VR space.

LocIn: Inferring Semantic Location from Spatial Maps in Mixed Reality

Habiba Farrukh, Reham Mohamed, Aniket Nare, Antonio Bianchi, and Z. Berkay Celik, Purdue University

Available Media

Mixed reality (MR) devices capture 3D spatial maps of users' surroundings to integrate virtual content into their physical environment. Existing permission models implemented in popular MR platforms allow all MR apps to access these 3D spatial maps without explicit permission. Unmonitored access of MR apps to these 3D spatial maps poses serious privacy threats to users as these maps capture detailed geometric and semantic characteristics of users' environments. In this paper, we present LocIn, a new location inference attack that exploits these detailed characteristics embedded in 3D spatial maps to infer a user's indoor location type. LocIn develops a multi-task approach to train an end-to-end encoder-decoder network that extracts a spatial feature representation for capturing contextual patterns of the user's environment. LocIn leverages this representation to detect 3D objects and surfaces and integrates them into a classification network with a novel unified optimization function to predict the user's indoor location. We demonstrate the LocIn attack on spatial maps collected from three popular MR devices. We show that LocIn infers a user's location type with an average 84.1% accuracy.

Unique Identification of 50,000+ Virtual Reality Users from Head & Hand Motion Data

Vivek Nair and Wenbo Guo, UC Berkeley; Justus Mattern, RWTH Aachen; Rui Wang and James F. O'Brien, UC Berkeley; Louis Rosenberg, Unanimous AI; Dawn Song, UC Berkeley

Available Media

With the recent explosive growth of interest and investment in virtual reality (VR) and the so-called "metaverse," public attention has rightly shifted toward the unique security and privacy threats that these platforms may pose. While it has long been known that people reveal information about themselves via their motion, the extent to which this makes an individual globally identifiable within virtual reality has not yet been widely understood. In this study, we show that a large number of real VR users (N=55,541) can be uniquely and reliably identified across multiple sessions using just their head and hand motion relative to virtual objects. After training a classification model on 5 minutes of data per person, a user can be uniquely identified amongst the entire pool of 50,000+ with 94.33% accuracy from 100 seconds of motion, and with 73.20% accuracy from just 10 seconds of motion. This work is the first to truly demonstrate the extent to which biomechanics may serve as a unique identifier in VR, on par with widely used biometrics such as facial or fingerprint recognition.

Exploring User Reactions and Mental Models Towards Perceptual Manipulation Attacks in Mixed Reality

Kaiming Cheng, Jeffery F. Tian, Tadayoshi Kohno, and Franziska Roesner, University of Washington

Available Media

Perceptual Manipulation Attacks (PMA) involve manipulating users’ multi-sensory (e.g., visual, auditory, haptic) perceptions of the world through Mixed Reality (MR) content, in order to influence users' judgments and subsequent actions. For example, an MR driving application that is expected to show safety-critical output might also (maliciously or unintentionally) overlay the wrong signal on a traffic sign, misleading the user into slamming on the brake. While current MR technology is sufficient to create such attacks, little research has been done to understand how users perceive, react to, and defend against such potential manipulations. To provide a foundation for understanding and addressing PMA in MR, we conducted an in-person study with 21 participants. We developed three PMA in which we focused on attacking three different perceptions: visual, auditory, and situational awareness. Our study first investigates how user reactions are affected by evaluating their performance on “microbenchmark” tasks under benchmark and different attack conditions. We observe both primary and secondary impacts from attacks, later impacting participants' performance even under non-attack conditions. We follow up with interviews, surfacing a range of user reactions and interpretations of PMA. Through qualitative data analysis of our observations and interviews, we identify various defensive strategies participants developed, and we observe how these strategies sometimes backfire. We derive recommendations for future investigation and defensive directions based on our findings.

Erebus: Access Control for Augmented Reality Systems

Yoonsang Kim, Sanket Goutam, Amir Rahmati, and Arie Kaufman, Stony Brook University

Available Media

Augmented Reality (AR) is widely considered the next evolution in personal devices, enabling seamless integration of the digital world into our reality. Such integration, however, often requires unfettered access to sensor data, causing significant overprivilege for applications that run on these platforms. Through analysis of 17 AR systems and 45 popular AR applications, we explore existing mechanisms for access control in AR platforms, identify key trends in how AR applications use sensor data, and pinpoint unique threats users face in AR environments. Using these findings, we design and implement Erebus, an access control framework for AR platforms that enables fine-grained control over data used by AR applications. Erebus achieves the principle of least privilege through creation of a domain-specific language (DSL) for permission control in AR platforms, allowing applications to specify data needed for their functionality. Using this DSL, Erebus further enables users to customize app permissions to apply under specific user conditions. We implement Erebus on Google's ARCore SDK and port five existing AR applications to demonstrate Erebus's capability to secure various classes of apps. Performance results using these applications and various microbenchmarks show that Erebus achieves its security goals while being practical, introducing negligible performance overhead to the AR system.

Track 2

Password Guessing

Session Chair: Blase Ur, University of Chicago

Platinum Salon 5

No Single Silver Bullet: Measuring the Accuracy of Password Strength Meters

Ding Wang, Xuan Shan, and Qiying Dong, Nankai University; Yaosheng Shen, Peking University; Chunfu Jia, Nankai University

Available Media

To help users create stronger passwords, nearly every respectable web service adopts a password strength meter (PSM) to provide real-time strength feedback upon user registration and password change. Recent research has found that PSMs that provide accurate feedback can indeed effectively nudge users toward choosing stronger passwords. Thus, it is imperative to systematically evaluate existing PSMs to facilitate the selection of accurate ones. In this paper, we highlight that there is no single silver bullet metric for measuring the accuracy of PSMs: For each given guessing scenario and strategy, a specific metric is necessary. We investigate the intrinsic characteristics of online and offline guessing scenarios, and for the first time, propose a systematic evaluation framework that is composed of four different dimensioned criteria to rate PSM accuracy under these two password guessing scenarios (as well as various guessing strategies).

More specifically, for online guessing, the strength misjudgments of passwords with different popularity would have varied effects on PSM accuracy, and we suggest the weighted Spearman metric and consider two typical attackers: the general attacker who is unaware of the target password distribution, and the knowledgeable attacker aware of it. For offline guessing, since the cracked passwords are generally weaker than the uncracked ones, and they correspond to two disparate distributions, we adopt the Kullback-Leibler divergence metric and investigate the four most typical guessing strategies: brute-force, dictionary-based, probability-based, and a combination of the above three strategies. In particular, we propose the Precision metric to measure PSM accuracy when non-binned strength feedback (e.g., probability) is transformed into easy-to-understand bins/scores (e.g., [weak, medium, strong]). We further introduce a reconciled Precision metric to characterize the impacts of strength misjudgments in different directions (e.g., weak→strong and strong→weak) on PSM accuracy. The effectiveness and practicality of our evaluation framework are demonstrated by rating 12 leading PSMs, leveraging 14 real-world password datasets. Finally, we provide three recommendations to help improve the accuracy of PSMs.
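
For readers unfamiliar with the Kullback-Leibler divergence mentioned above, the following is a minimal sketch of the standard discrete definition, D(P‖Q) = Σ p_i ln(p_i / q_i). How the paper applies the metric is not stated in the abstract; the three-bin distributions below are made-up numbers used only to show that two disparate distributions yield a clearly positive value.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats for two discrete
    distributions given as equal-length sequences of probabilities."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # a zero-probability bin in P contributes nothing
        if qi == 0:
            return math.inf     # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total

if __name__ == "__main__":
    # Hypothetical shares of passwords rated [weak, medium, strong] by a meter.
    cracked = [0.6, 0.3, 0.1]
    uncracked = [0.2, 0.3, 0.5]
    print(kl_divergence(cracked, uncracked))
    print(kl_divergence(cracked, cracked))
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so the order of the arguments matters.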

Password Guessing Using Random Forest

Ding Wang and Yunkai Zou, Nankai University; Zijian Zhang, Peking University; Kedong Xiu, Nankai University

Available Media

Passwords are the most widely used authentication method, and guessing attacks are the most effective method for password strength evaluation. However, existing password guessing models are generally built on traditional statistics or deep learning, and there has been no research on password guessing that employs classical machine learning.

To fill this gap, this paper provides a brand new technical route for password guessing. More specifically, we re-encode the password characters and make it possible for a series of classical machine learning techniques that tackle multi-class classification problems (such as random forest, boosting algorithms and their variants) to be used for password guessing. Further, we propose RFGuess, a random-forest based framework that characterizes the three most representative password guessing scenarios (i.e., trawling guessing, targeted guessing based on personally identifiable information (PII) and on users' password reuse behaviors).

Besides its theoretical significance, this work is also of practical value. Experiments using 13 large real-world password datasets demonstrate that our random-forest based guessing models are effective: (1) RFGuess for trawling guessing scenarios, whose guessing success rates are comparable to its foremost counterparts; (2) RFGuess-PII for targeted guessing based on PII, which guesses 20%~28% of common users within 100 guesses, outperforming its foremost counterpart by 7%~13%; (3) RFGuess-Reuse for targeted guessing based on users' password reuse/modification behaviors, which performs the best or 2nd best among related models. We believe this work makes a substantial step toward introducing classical machine learning techniques into password guessing.

Pass2Edit: A Multi-Step Generative Model for Guessing Edited Passwords

Ding Wang and Yunkai Zou, Nankai University; Yuan-An Xiao, Peking University; Siqi Ma, The University of New South Wales; Xiaofeng Chen, Xidian University

Available Media

While password stuffing attacks (that exploit the direct password reuse behavior) have gained considerable attention, only a few studies have examined password tweaking attacks, where an attacker exploits users' indirect reuse behaviors (with edit operations like insertion, deletion, and substitution). For the first time, we model the password tweaking attack as a multi-class classification problem for characterizing users' password edit/modification processes, and propose a generative model coupled with the multi-step decision-making mechanism, called Pass2Edit, to accurately characterize users' password reuse/modification behaviors.
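
To make the notion of indirect reuse concrete, the sketch below applies a sequence of insertion, deletion, and substitution operations to a known password, one step at a time. It illustrates the edit operations only and is not the Pass2Edit model: the `(operation, position, character)` encoding and the function name are assumptions introduced here.

```python
def apply_edits(password, edits):
    """Apply edit operations to a password, one step at a time.

    Each edit is a tuple (op, pos, ch), where op is "insert", "delete",
    or "substitute". Positions refer to the string as it stands after
    the preceding edits; ch is ignored for deletions.
    """
    chars = list(password)
    for op, pos, ch in edits:
        if op == "insert":
            chars.insert(pos, ch)
        elif op == "delete":
            del chars[pos]
        elif op == "substitute":
            chars[pos] = ch
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return "".join(chars)

if __name__ == "__main__":
    # Hypothetical tweak: capitalize the first letter, then append a symbol.
    steps = [("substitute", 0, "P"), ("insert", 9, "!")]
    print(apply_edits("password1", steps))
```

A guessing model in this setting would have to choose the next operation at every step, which is where the multi-class classification framing comes in.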

We demonstrate the effectiveness of Pass2Edit through extensive experiments, which consist of 12 practical attack scenarios and employ 4.8 billion real-world passwords. The experimental results show that Pass2Edit and its variant significantly improve over the prior art. More specifically, when the victim's password at site A (namely pwA) is known, within 100 guesses, the cracking success rate of Pass2Edit in guessing her password at site B (pwB ≠ pwA) is 24.2% (for common users) and 11.7% (for security-savvy users), respectively, which is 18.2%-33.0% higher than its foremost counterparts. Our results highlight that password tweaking is a much more damaging threat to password security than expected.

Improving Real-world Password Guessing Attacks via Bi-directional Transformers

Ming Xu and Jitao Yu, Fudan University; Xinyi Zhang, Facebook; Chuanwang Wang, Shenghao Zhang, Haoqi Wu, and Weili Han, Fudan University

Available Media

Password guessing attacks, prevalent issues in the real world, can be conceptualized as efforts to approximate the probability distribution of text tokens. Techniques in the natural language processing (NLP) field naturally lend themselves to password guessing. Among them, bi-directional transformers stand out with their ability to utilize bi-directional contexts to capture the nuances in texts.

To further improve password guessing attacks, we propose a bi-directional-transformer-based guessing framework, referred to as PassBERT, which applies the pre-training / fine-tuning paradigm to password guessing attacks. We first prepare a pre-trained password model, which contains the knowledge of the general password distribution. Then, we design three attack-specific fine-tuning approaches to tailor the pre-trained password model to the following real-world attack scenarios: (1) conditional password guessing, which recovers the complete password given a partial password; (2) targeted password guessing, which compromises the password(s) of a specific user using their personal information; (3) adaptive rule-based password guessing, which selects adaptive mangling rules for a word (i.e., base password) to generate rule-transformed password candidates. The experimental results show that our fine-tuned models can outperform the state-of-the-art models by 14.53%, 21.82% and 4.86% in the three attacks, respectively, demonstrating the effectiveness of bi-directional transformers on downstream guessing attacks. Finally, we propose a hybrid password strength meter to mitigate the risks from the three attacks.

Araña: Discovering and Characterizing Password Guessing Attacks in Practice

Mazharul Islam, University of Wisconsin–Madison; Marina Sanusi Bohuk, Cornell Tech; Paul Chung, University of Wisconsin–Madison; Thomas Ristenpart, Cornell Tech; Rahul Chatterjee, University of Wisconsin–Madison

Available Media

Remote password guessing attacks remain one of the largest sources of account compromise. Understanding and characterizing attacker strategies is critical to improving security but doing so has been challenging thus far due to the sensitivity of login services and the lack of ground truth labels for benign and malicious login requests. We perform an in-depth measurement study of guessing attacks targeting two large universities. Using a rich dataset of more than 34 million login requests to the two universities as well as thousands of compromise reports, we were able to develop a new analysis pipeline to identify 29 attack clusters—many of which involved compromises not previously known to security engineers. Our analysis provides the richest investigation to date of password guessing attacks as seen from login services. We believe our tooling will be useful in future efforts to develop real-time detection of attack campaigns, and our characterization of attack campaigns can help more broadly guide mitigation design.

Track 3

Privacy Policies, Labels, Etc.

Session Chair: Anastasia Shuba, DuckDuckGo

Platinum Salon 7–8

PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs

Hao Cui, Rahmadi Trimananda, Athina Markopoulou, and Scott Jordan, University of California, Irvine

Available Media

Privacy policies disclose how an organization collects and handles personal information. Recent work has made progress in leveraging natural language processing (NLP) to automate privacy policy analysis and extract data collection statements from different sentences, considered in isolation from each other. In this paper, we view and analyze, for the first time, the entire text of a privacy policy in an integrated way. In terms of methodology: (1) we define PoliGraph, a type of knowledge graph that captures statements in a privacy policy as relations between different parts of the text; and (2) we develop an NLP-based tool, PoliGraph-er, to automatically extract PoliGraph from the text. In addition, (3) we revisit the notion of ontologies, previously defined in heuristic ways, to capture subsumption relations between terms. We make a clear distinction between local and global ontologies to capture the context of individual privacy policies, application domains, and privacy laws. Using a public dataset for evaluation, we show that PoliGraph-er identifies 40% more collection statements than prior state-of-the-art, with 97% precision. In terms of applications, PoliGraph enables automated analysis of a corpus of privacy policies and allows us to: (1) reveal common patterns in the texts across different privacy policies, and (2) assess the correctness of the terms as defined within a privacy policy. We also apply PoliGraph to: (3) detect contradictions in a privacy policy, where we show false alarms by prior work, and (4) analyze the consistency of privacy policies and network traffic, where we identify significantly more clear disclosures than prior work.

Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning

Wenjun Qiu, David Lie, and Lisa Austin, University of Toronto

Available Media

A significant challenge to training accurate deep learning models on privacy policies is the cost and difficulty of obtaining a large and comprehensive set of training data. To address these challenges, we present Calpric, which combines automatic text selection and segmentation, active learning, and the use of crowdsourced annotators to generate a large, balanced training set for privacy policies at low cost. Automated text selection and segmentation simplifies the labeling task, enabling untrained annotators from crowdsourcing platforms, like Amazon's Mechanical Turk, to be competitive with trained annotators, such as law students, and also reduces inter-annotator agreement, which decreases labeling cost. Having reliable labels for training enables the use of active learning, which uses fewer training samples to efficiently cover the input space, further reducing cost and improving class and data category balance in the data set.

The combination of these techniques allows Calpric to produce models that are accurate over a wider range of data categories, and provide more detailed, fine-grain labels than previous work. Our crowdsourcing process enables Calpric to attain reliable labeled data at a cost of roughly $0.92–$1.71 per labeled text segment. Calpric's training process also generates a labeled data set of 16K privacy policy text segments across 9 data categories with balanced positive and negative samples.

POLICYCOMP: Counterpart Comparison of Privacy Policies Uncovers Overbroad Personal Data Collection Practices

Lu Zhou, Xidian University and Shanghai Jiao Tong University; Chengyongxiao Wei, Tong Zhu, and Guoxing Chen, Shanghai Jiao Tong University; Xiaokuan Zhang, George Mason University; Suguo Du, Hui Cao, and Haojin Zhu, Shanghai Jiao Tong University

Available Media

Since mobile apps' privacy policies are usually complex, various tools have been developed to examine whether privacy policies have contradictions and verify whether privacy policies are consistent with the apps' behaviors. However, to the best of our knowledge, no prior work answers whether the personal data collection practices (PDCPs) in an app's privacy policy are necessary for given purposes (i.e., whether to comply with the principle of data minimization). Though defined by most existing privacy regulations/laws such as GDPR, the principle of data minimization has been translated into different privacy practices depending on the different contexts (e.g., various developers and targeted users). In the end, the developers can collect personal data claimed in the privacy policies as long as they receive authorizations from the users.

Currently, it mainly relies on legal experts to manually audit the necessity of personal data collection according to the specific contexts, which is not very scalable for millions of apps. In this study, we aim to take the first step to automatically investigate whether PDCPs in an app's privacy policy are overbroad from the perspective of counterpart comparison. Our basic insight is that, if an app claims to collect much more personal data in its privacy policy than most of its counterparts, it is more likely to be conducting overbroad collection. To achieve this, POLICYCOMP, an automatic framework for detecting overbroad PDCPs is proposed. We use POLICYCOMP to perform a large-scale analysis on 10,042 privacy policies and flag 48.29% of PDCPs to be overbroad. We shared our findings with 2,000 app developers and received 52 responses from them, 39 of which acknowledged our findings and took actions (e.g., removing overbroad PDCPs).
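The counterpart-comparison insight can be illustrated with a short sketch. It is not the POLICYCOMP pipeline: the function name `flag_overbroad`, the `min_share` threshold, and the representation of a policy as a set of data-type strings are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Set


def flag_overbroad(
    target: Set[str],
    counterparts: List[Set[str]],
    min_share: float = 0.3,
) -> Set[str]:
    """Return data types the target app declares that few counterparts declare.

    A data type is flagged when the share of counterpart policies that
    also declare it is below `min_share` (an assumed cutoff).
    """
    if not counterparts:
        return set()  # nothing to compare against
    counts = Counter(item for policy in counterparts for item in policy)
    total = len(counterparts)
    return {item for item in target if counts[item] / total < min_share}


if __name__ == "__main__":
    peers = [
        {"email", "location"},
        {"email"},
        {"email", "contacts"},
        {"email"},
        {"email", "location"},
    ]
    print(sorted(flag_overbroad({"email", "location", "contacts", "sms"}, peers)))
```

A flagged item is only a candidate for review: a rare data type may still be necessary for a feature that the counterparts lack.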

Lalaine: Measuring and Characterizing Non-Compliance of Apple Privacy Labels

Yue Xiao, Zhengyi Li, and Yue Qin, Indiana University Bloomington; Xiaolong Bai, Orion Security Lab, Alibaba Group; Jiale Guan, Xiaojing Liao, and Luyi Xing, Indiana University Bloomington

Available Media

As a key supplement to privacy policies that are known to be lengthy and difficult to read, Apple has launched app privacy labels, which purportedly help users more easily understand an app's privacy practices. However, false and misleading privacy labels can dupe privacy-conscious consumers into downloading data-intensive apps, ultimately eroding the credibility and integrity of the labels. Although Apple releases requirements and guidelines for app developers to create privacy labels, little is known about whether and to what extent the privacy labels in the wild are correct and compliant, reflecting the actual data practices of iOS apps.

This paper presents the first systematic study, based on our new methodology named Lalaine, to evaluate data-flow to privacy-label (flow-to-label) consistency. Lalaine fully analyzed the privacy labels and binaries of 5,102 iOS apps, shedding light on the prevalence and seriousness of privacy-label non-compliance. We provide detailed case studies and analyze root causes for privacy label non-compliance that complements prior understandings. This has led to new insights for improving privacy-label design and compliance requirements, so app developers, platform stakeholders, and policy-makers can better achieve their privacy and accountability goals. Lalaine is thoroughly evaluated for its high effectiveness and efficiency. We are responsibly reporting the results to stakeholders.

Automated Cookie Notice Analysis and Enforcement

Rishabh Khandelwal and Asmit Nayak, University of Wisconsin–Madison; Hamza Harkous, Google, Inc.; Kassem Fawaz, University of Wisconsin–Madison

Available Media

Online websites use cookie notices to elicit consent from the users, as required by recent privacy regulations like the GDPR and the CCPA. Prior work has shown that these notices are designed in a way to manipulate users into making website-friendly choices which put users' privacy at risk. In this work, we present CookieEnforcer, a new system for automatically discovering cookie notices and extracting a set of instructions that result in disabling all non-essential cookies. In order to achieve this, we first build an automatic cookie notice detector that utilizes the rendering pattern of the HTML elements to identify the cookie notices. Next, we analyze the cookie notices and predict the set of actions required to disable all unnecessary cookies. This is done by modeling the problem as a sequence-to-sequence task, where the input is a machine-readable cookie notice and the output is the set of clicks to make. We demonstrate the efficacy of CookieEnforcer via an end-to-end accuracy evaluation, showing that it can generate the required steps in 91% of the cases. Via a user study, we also show that CookieEnforcer can significantly reduce the user effort. Finally, we characterize the behavior of CookieEnforcer on the top 100k websites from the Tranco list, showcasing its stability and scalability.

Track 4

ML Applications to Malware

Session Chair: Kevin Roundy, Norton Research

Platinum Salon 9–10

Continuous Learning for Android Malware Detection

Yizheng Chen, Zhoujie Ding, and David Wagner, UC Berkeley

Available Media

Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective, due to the evolution of malware apps and benign apps. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples.

In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since the machine learning technique needs to be continuously deployed, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is that similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with active learning. We propose a new hierarchical contrastive learning scheme, and a new sample selection technique to continuously train the Android malware classifier. Our evaluation shows that this leads to significant improvements, compared to previously published methods for active learning. Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate (from 0.86% to 0.48%). Also, our approach maintains more consistent performance across a seven-year time period than past methods.
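The selection step of such an active learning loop can be sketched as follows. This is a simplified stand-in, not the paper's method: it scores each unlabeled sample by its Euclidean distance to the nearest labeled embedding and assumes the embeddings already exist (for example, from a contrastively trained encoder). The names `uncertainty` and `select_for_labeling` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def uncertainty(sample: Vector, labeled: List[Vector]) -> float:
    """Distance to the closest labeled embedding; larger means less familiar."""
    if not labeled:
        raise ValueError("need at least one labeled embedding")
    return min(math.dist(sample, ref) for ref in labeled)


def select_for_labeling(
    unlabeled: List[Vector], labeled: List[Vector], budget: int
) -> List[int]:
    """Return indices of the `budget` most uncertain unlabeled samples."""
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: uncertainty(unlabeled[i], labeled),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    labeled = [(0.0, 0.0), (1.0, 1.0)]
    unlabeled = [(0.1, 0.0), (5.0, 5.0), (1.0, 1.2)]
    # The sample far from every labeled point is picked first.
    print(select_for_labeling(unlabeled, labeled, budget=2))
```

In a full loop, the selected samples would be labeled by analysts, appended to the training set, and the classifier retrained before the next selection round.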

Humans vs. Machines in Malware Classification

Simone Aonzo, EURECOM; Yufei Han, INRIA; Alessandro Mantovani and Davide Balzarotti, EURECOM

Available Media

Today, the classification of a file as either benign or malicious is performed by a combination of deterministic indicators (such as antivirus rules), Machine Learning classifiers, and, more importantly, the judgment of human experts.

However, to compare the difference between human and machine intelligence in malware analysis, it is first necessary to understand how human subjects approach malware classification. In this direction, our work presents the first experimental study designed to capture which 'features' of a suspicious program (e.g., static properties or runtime behaviors) are prioritized for malware classification by humans and by machines. For this purpose, we created a malware classification game where 110 human players worldwide and with different seniority levels (72 novices and 38 experts) have competed to classify the highest number of unknown samples based on detailed sandbox reports. Surprisingly, we discovered that both experts and novices base their decisions on approximately the same features, even if there are clear differences between the two expertise classes.

Furthermore, we implemented two state-of-the-art Machine Learning models for malware classification and evaluated their performances on the same set of samples. The comparative analysis of the results unveiled a common set of features preferred by both Machine Learning models and helped better understand the difference in the feature extraction.

This work reflects the difference in the decision-making process of humans and computer algorithms and the different ways they extract information from the same data. Its findings serve multiple purposes, from training better malware analysts to improving feature encoding.

Adversarial Training for Raw-Binary Malware Classifiers

Keane Lucas, Samruddhi Pai, Weiran Lin, and Lujo Bauer, Carnegie Mellon University; Michael K. Reiter, Duke University; Mahmood Sharif, Tel Aviv University

Available Media

Machine learning (ML) models have shown promise in classifying raw executable files (binaries) as malicious or benign with high accuracy. This has led to the increasing influence of ML-based classification methods in academic and real-world malware detection, a critical tool in cybersecurity. However, previous work provoked caution by creating variants of malicious binaries, referred to as adversarial examples, that are transformed in a functionality-preserving way to evade detection. In this work, we investigate the effectiveness of using adversarial training methods to create malware classification models that are more robust to some state-of-the-art attacks. To train our most robust models, we significantly increase the efficiency and scale of creating adversarial examples to make adversarial training practical, which has not been done before in raw-binary malware detectors. We then analyze the effects of varying the length of adversarial training, as well as analyze the effects of training with various types of attacks. We find that data augmentation does not deter state-of-the-art attacks, but that using a generic gradient-guided method, used in other discrete domains, does improve robustness. We also show that in most cases, models can be made more robust to malware-domain attacks by adversarially training them with lower-effort versions of the same attack. In the best case, we reduce one state-of-the-art attack's success rate from 90% to 5%. We also find that training with some types of attacks can increase robustness to other types of attacks. Finally, we discuss insights gained from our results, and how they can be used to more effectively train robust malware detectors.

Black-box Adversarial Example Attack towards FCG Based Android Malware Detection under Incomplete Feature Information

Heng Li, Huazhong University of Science and Technology; Zhang Cheng, NSFOCUS Technologies Group Co., Ltd. and Huazhong University of Science and Technology; Bang Wu, Liheng Yuan, Cuiying Gao, and Wei Yuan, Huazhong University of Science and Technology; Xiapu Luo, The Hong Kong Polytechnic University

Available Media

The function call graph (FCG) based Android malware detection methods have recently attracted increasing attention due to their promising performance. However, these methods are susceptible to adversarial examples (AEs). In this paper, we design a novel black-box AE attack towards the FCG based malware detection system, called BagAmmo. To mislead its target system, BagAmmo purposefully perturbs the FCG feature of malware through inserting "never-executed" function calls into malware code. The main challenges are two-fold. First, the malware functionality should not be changed by adversarial perturbation. Second, the information of the target system (e.g., the graph feature granularity and the output probabilities) is absent.

To preserve malware functionality, BagAmmo employs the try-catch trap to insert function calls to perturb the FCG of malware. Without the knowledge about feature granularity and output probabilities, BagAmmo adopts the architecture of generative adversarial network (GAN), and leverages a multi-population co-evolution algorithm (i.e., Apoem) to generate the desired perturbation. Every population in Apoem represents a possible feature granularity, and the real feature granularity can be achieved when Apoem converges.

Through extensive experiments on over 44k Android apps and 32 target models, we evaluate the effectiveness, efficiency and resilience of BagAmmo. BagAmmo achieves an average attack success rate of over 99.9% on MaMaDroid, APIGraph and GCN, and still performs well in the scenario of concept drift and data imbalance. Moreover, BagAmmo outperforms the state-of-the-art attack SRL in attack success rate.

Evading Provenance-Based ML Detectors with Adversarial System Actions

Kunal Mukherjee, Joshua Wiedemeier, Tianhao Wang, James Wei, Feng Chen, Muhyun Kim, Murat Kantarcioglu, and Kangkook Jee, The University of Texas at Dallas

Available Media

We present PROVNINJA, a framework designed to generate adversarial attacks that aim to elude provenance-based Machine Learning (ML) security detectors. PROVNINJA is designed to identify and craft adversarial attack vectors that statistically mimic and impersonate system programs.

Leveraging the benign execution profile of system processes commonly observed across a multitude of hosts and networks, our research proposes an efficient and effective method to probe evasive alternatives and devise stealthy attack vectors that are difficult to distinguish from benign system behaviors. PROVNINJA's suggestions for evasive attacks, originally derived in the feature space, are then translated into system actions, leading to the realization of actual evasive attack sequences in the problem space.

When evaluated against State-of-The-Art (SOTA) detector models using two realistic Advanced Persistent Threat (APT) scenarios and a large collection of fileless malware samples, PROVNINJA could generate and realize evasive attack variants, reducing the detection rates by up to 59%. We also assessed PROVNINJA under varying assumptions on adversaries' knowledge and capabilities. While PROVNINJA primarily considers the black-box model, we also explored two contrasting threat models that consider blind and white-box attack scenarios.

Track 5

Secure Messaging

Session Chair: Sebastian Schinzel, Münster University of Applied Sciences

Platinum Salon 3–4

TreeSync: Authenticated Group Management for Messaging Layer Security

Théophile Wallez, Inria Paris; Jonathan Protzenko, Microsoft Research; Benjamin Beurdouche, Mozilla; Karthikeyan Bhargavan, Inria Paris

Distinguished Paper Award Winner and Co-Winner of the 2023 Internet Defense Prize

Available Media

Messaging Layer Security (MLS), currently undergoing standardization at the IETF, is an asynchronous group messaging protocol that aims to be efficient for large dynamic groups, while providing strong guarantees like forward secrecy (FS) and post-compromise security (PCS). While prior work on MLS has extensively studied its group key establishment component (called TreeKEM), many flaws in early designs of MLS have stemmed from its group integrity and authentication mechanisms that are not as well-understood. In this work, we identify and formalize TreeSync: a sub-protocol of MLS that specifies the shared group state, defines group management operations, and ensures consistency, integrity, and authentication for the group state across all members.

We present a precise, executable, machine-checked formal specification of TreeSync, and show how it can be composed with other components to implement the full MLS protocol. Our specification is written in F* and serves as a reference implementation of MLS; it passes the RFC test vectors and is interoperable with other MLS implementations. Using the DY* symbolic protocol analysis framework, we formalize and prove the integrity and authentication guarantees of TreeSync, under minimal security assumptions on the rest of MLS. Our analysis identifies a new attack and we propose several changes that have been incorporated in the latest MLS draft. Ours is the first testable, machine-checked, formal specification for MLS, and should be of interest to both developers and researchers interested in this upcoming standard.

Formal Analysis of Session-Handling in Secure Messaging: Lifting Security from Sessions to Conversations

Cas Cremers, CISPA Helmholtz Center for Information Security; Charlie Jacomme, Inria Paris; Aurora Naska, CISPA Helmholtz Center for Information Security

Available Media

The building blocks for secure messaging apps, such as Signal’s X3DH and Double Ratchet (DR) protocols, have received a lot of attention from the research community. They have notably been proved to meet strong security properties even in the case of compromise such as Forward Secrecy (FS) and Post-Compromise Security (PCS). However, there is a lack of formal study of these properties at the application level. Whereas the research works have studied such properties in the context of a single ratcheting chain, a conversation between two persons in a messaging application can in fact be the result of merging multiple ratcheting chains.

In this work, we initiate the formal analysis of secure messaging taking the session-handling layer into account, and apply our approach to Sesame, Signal’s session management. We first experimentally show practical scenarios in which PCS can be violated in Signal by a clone attacker, despite its use of the Double Ratchet. We identify how this is enabled by Signal’s session-handling layer. We then design a formal model of the session-handling layer of Signal that is tractable for automated verification with the Tamarin prover, and use this model to rediscover the PCS violation and propose two provably secure mechanisms to offer stronger guarantees.

Cryptographic Administration for Secure Group Messaging

David Balbás, IMDEA Software Institute & Universidad Politécnica de Madrid; Daniel Collins and Serge Vaudenay, EPFL

Available Media

Many real-world group messaging systems delegate group administration to the application level, failing to provide formal guarantees related to group membership. Taking a cryptographic approach to group administration can prevent both implementation and protocol design pitfalls that result in a loss of confidentiality and consistency for group members.

In this work, we introduce a cryptographic framework for the design of group messaging protocols that offer strong security guarantees for group membership. To this end, we extend the continuous group key agreement (CGKA) paradigm used in the ongoing IETF MLS group messaging standardisation process and introduce the administrated CGKA (A-CGKA) primitive. Our primitive natively enables a subset of group members, the group admins, to control the addition and removal of parties and to update their own keying material in a secure manner. Notably, our security model prevents even corrupted (non-admin) members from forging messages that modify group membership. Moreover, we present two efficient and modular constructions of group administrators that are correct and secure with respect to our definitions. Finally, we propose, implement, and benchmark an efficient extension of MLS that integrates cryptographic administrators.

Wink: Deniable Secure Messaging

Anrin Chakraborti, Duke University; Darius Suciu and Radu Sion, Stony Brook University

Available Media

End-to-end encrypted (E2EE) messaging is an essential first step in providing message confidentiality. Unfortunately, all security guarantees of end-to-end encryption are lost when keys or plaintext are disclosed, either due to device compromise or coercion by powerful adversaries. This work introduces Wink, the first plausibly-deniable messaging system protecting message confidentiality from partial device compromise and compelled key disclosure. Wink can surreptitiously inject hidden messages in standard random coins, e.g., in salts, IVs, used by existing E2EE protocols. It does so as part of legitimate secure cryptographic functionality deployed inside the widely-available trusted execution environment (TEE) TrustZone. This results in hidden communication using virtually unchanged existing E2EE messaging apps, as well as strong plausible deniability. Wink has been demonstrated with multiple existing E2EE applications (including Telegram and Signal) with minimal (external) instrumentation, negligible overheads, and crucially, without changing on-wire message formats.
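The idea of carrying hidden data in "random coins" can be illustrated with a toy sketch. It is not Wink's construction and omits the TEE entirely: it masks a 16-byte hidden chunk with an HMAC-SHA256-derived pad so that the result can be used where a random IV is expected. The function names, the 16-byte IV length, and the counter-based pad derivation are assumptions for illustration.

```python
import hashlib
import hmac

IV_LEN = 16  # assumed IV size in bytes


def _pad(hidden_key: bytes, counter: int) -> bytes:
    """Derive a one-time pad for this message counter (never reuse a counter)."""
    digest = hmac.new(hidden_key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[:IV_LEN]


def embed_in_iv(hidden_key: bytes, counter: int, chunk: bytes) -> bytes:
    """Mask a hidden chunk so it can stand in for a random-looking IV."""
    if len(chunk) != IV_LEN:
        raise ValueError("chunk must be exactly IV_LEN bytes")
    return bytes(a ^ b for a, b in zip(chunk, _pad(hidden_key, counter)))


def extract_from_iv(hidden_key: bytes, counter: int, iv: bytes) -> bytes:
    """Recover the hidden chunk; XOR masking is its own inverse."""
    return embed_in_iv(hidden_key, counter, iv)


if __name__ == "__main__":
    key = b"k" * 32
    iv = embed_in_iv(key, 0, b"hidden message!!")
    print(extract_from_iv(key, 0, iv))
```

Because the on-wire message still carries an ordinary-sized IV, the message format is unchanged; only a party holding the hidden key can tell that the IV encodes data.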

Three Lessons From Threema: Analysis of a Secure Messenger

Kenneth G. Paterson, Matteo Scarlata, and Kien Tuong Truong, ETH Zurich

Available Media

We provide an extensive cryptographic analysis of Threema, a Swiss-based encrypted messaging application with more than 10 million users and 7000 corporate customers. We present seven different attacks against the protocol in three different threat models. We discuss impact and remediations for our attacks, which have all been responsibly disclosed to Threema and patched. Finally, we draw wider lessons for developers of secure protocols.

Track 6

x-Fuzz

Session Chair: Manuel Egele, Boston University

Platinum Salon 1–2

MorFuzz: Fuzzing Processor via Runtime Instruction Morphing enhanced Synchronizable Co-simulation

Jinyan Xu and Yiyuan Liu, Zhejiang University; Sirui He, City University of Hong Kong; Haoran Lin and Yajin Zhou, Zhejiang University; Cong Wang, City University of Hong Kong

Available Media

Modern processors are too complex to be bug free. Recently, a few hardware fuzzing techniques have shown promising results in verifying processor designs. However, due to the complexity of processors, they suffer from complex input grammar, deceptive mutation guidance, and model implementation differences. Therefore, how to effectively and efficiently verify processors is still an open problem.

This paper proposes MorFuzz, a novel processor fuzzer that can efficiently discover software triggerable hardware bugs. The core idea behind MorFuzz is to use runtime information to generate instruction streams with valid formats and meaningful semantics. MorFuzz designs a new input structure to provide multi-level runtime mutation primitives and proposes the instruction morphing technique to mutate instruction dynamically. Besides, we also extend the co-simulation framework to various microarchitectures and develop the state synchronization technique to eliminate implementation differences. We evaluate MorFuzz on three popular open-source RISC-V processors: CVA6, Rocket, BOOM, and discover 17 new bugs (with 13 CVEs assigned). Our evaluation shows MorFuzz achieves 4.4× and 1.6× more state coverage than the state-of-the-art fuzzer, DifuzzRTL, and the famous constrained instruction generator, riscv-dv.

µFUZZ: Redesign of Parallel Fuzzing using Microservice Architecture

Yongheng Chen, Georgia Institute of Technology; Rui Zhong, Pennsylvania State University; Yupeng Yang, Georgia Institute of Technology; Hong Hu and Dinghao Wu, Pennsylvania State University; Wenke Lee, Georgia Institute of Technology

Available Media

Fuzzing has been widely adopted as an effective testing technique for detecting software bugs. Researchers have explored many parallel fuzzing approaches to speed up bug detection. However, existing approaches are built on top of serial fuzzers and rely on periodic fuzzing state synchronization. Such a design has two limitations. First, the synchronous serial design of the fuzzer might waste CPU power due to blocking I/O operations. Second, state synchronization is either too late so that we fuzz with a suboptimal strategy or too frequent so that it causes enormous overhead.

In this paper, we redesign parallel fuzzing with microservice architecture and propose the prototype μFUZZ. To better utilize CPU power in the presence of I/O, μFUZZ breaks down the synchronous fuzzing loops into concurrent microservices, each with multiple workers. To avoid state synchronization, μFUZZ partitions the state into different services and their workers so that they can work independently but still achieve a great aggregated result. Our experiments show that μFUZZ outperforms the second-best existing fuzzers with 24% improvements in code coverage and 33% improvements in bug detection on average in 24 hours. Besides, μFUZZ finds 11 new bugs in well-tested real-world programs.
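A minimal sketch of the "concurrent services, each with multiple workers" structure is shown below. It is not the μFUZZ implementation and does not model state partitioning: it only shows a mutation stage and an execution stage connected by queues, so a worker blocked in one stage does not stall the other. The names `run_stage` and `pipeline`, and the callback signatures, are assumptions for illustration.

```python
import queue
import threading
from typing import Callable, Iterable, List

STOP = object()  # sentinel telling a worker to exit


def run_stage(fn: Callable, inbox: queue.Queue, outbox: queue.Queue, workers: int):
    """Start `workers` threads that apply `fn` to inbox items and forward outputs."""

    def loop() -> None:
        while True:
            item = inbox.get()
            if item is STOP:
                break
            for out in fn(item):
                outbox.put(out)

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def pipeline(
    seeds: Iterable[bytes],
    mutate: Callable[[bytes], Iterable[bytes]],
    execute: Callable[[bytes], Iterable[bytes]],
    workers: int = 2,
) -> List[bytes]:
    """Run seeds through a mutation service and an execution service."""
    to_mutate: queue.Queue = queue.Queue()
    to_execute: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    mutators = run_stage(mutate, to_mutate, to_execute, workers)
    executors = run_stage(execute, to_execute, results, workers)
    for seed in seeds:
        to_mutate.put(seed)
    for _ in mutators:  # one sentinel per mutation worker
        to_mutate.put(STOP)
    for t in mutators:
        t.join()
    for _ in executors:  # all mutants are queued; now stop executors
        to_execute.put(STOP)
    for t in executors:
        t.join()
    found = []
    while not results.empty():
        found.append(results.get())
    return found


if __name__ == "__main__":
    mutate = lambda s: [s + b"A", s + b"B"]
    execute = lambda d: [d] if d.endswith(b"B") else []  # pretend "B" inputs are interesting
    print(sorted(pipeline([b"x", b"y"], mutate, execute)))
```

The order in which results arrive depends on thread scheduling, which is why the usage example sorts them before printing.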

FISHFUZZ: Catch Deeper Bugs by Throwing Larger Nets

Han Zheng, National Computer Network Intrusion Protection Center, University of Chinese Academy of Science; School of Computer and Communication Sciences, EPFL; Zhongguancun Laboratory; Jiayuan Zhang, National Computer Network Intrusion Protection Center, University of Chinese Academy of Science; School of Computer and Communication, Lanzhou University of Technology; Yuhang Huang, National Computer Network Intrusion Protection Center, University of Chinese Academy of Science; Zezhong Ren, National Computer Network Intrusion Protection Center, University of Chinese Academy of Science; Zhongguancun Laboratory; He Wang, School of Cyber Engineering, Xidian University; Chunjie Cao, School of Cyberspace Security, Hainan University; Yuqing Zhang, National Computer Network Intrusion Protection Center, University of Chinese Academy of Science; Zhongguancun Laboratory; School of Cyberspace Security, Hainan University; School of Cyber Engineering, Xidian University; Flavio Toffalini and Mathias Payer, School of Computer and Communication Sciences, EPFL

Available Media

Fuzzers effectively explore programs to discover bugs. Greybox fuzzers mutate seed inputs and observe their execution. Whenever a seed reaches new behavior (e.g., new code or higher execution frequency), it is stored for further mutation. Greybox fuzzers directly measure exploration and, by repeating execution of the same targets with large amounts of mutated seeds, passively exploit any lingering bugs. Directed greybox fuzzers (DGFs) narrow the search to few code locations but so far generalize distance to all targets into a single score and do not prioritize targets dynamically.

FISHFUZZ introduces an input prioritization strategy that builds on three concepts: (i) a novel multi-distance metric whose precision is independent of the number of targets, (ii) a dynamic target ranking to automatically discard exhausted targets, and (iii) a smart queue culling algorithm, based on hyperparameters, that alternates between exploration and exploitation. FISHFUZZ enables fuzzers to seamlessly scale among thousands of targets and prioritize seeds toward interesting locations, thus achieving more comprehensive program testing. To demonstrate generality, we implement FISHFUZZ over two well-established greybox fuzzers (AFL and AFL++). We evaluate FISHFUZZ by leveraging all sanitizer labels as targets. Extensively comparing FISHFUZZ against modern DGFs and coverage-guided fuzzers demonstrates that FISHFUZZ reaches higher coverage compared to the direct competitors, finds up to 282% more bugs compared with baseline fuzzers and reproduces 68.3% existing bugs faster. FISHFUZZ also discovers 56 new bugs (38 CVEs) in 47 programs.

HyPFuzz: Formal-Assisted Processor Fuzzing

Chen Chen, Rahul Kande, Nathan Nguyen, Flemming Andersen, and Aakash Tyagi, Texas A&M University; Ahmad-Reza Sadeghi, Technische Universität Darmstadt; Jeyavijayan Rajendran, Texas A&M University

Available Media

Recent research has shown that hardware fuzzers can effectively detect security vulnerabilities in modern processors. However, existing hardware fuzzers do not fuzz well the hard-to-reach design spaces. Consequently, these fuzzers cannot effectively fuzz security-critical control- and data-flow logic in the processors, hence missing security vulnerabilities.

To tackle this challenge, we present HyPFuzz, a hybrid fuzzer that leverages formal verification tools to help fuzz the hard-to-reach part of the processors. To increase the effectiveness of HyPFuzz, we perform optimizations in time and space. First, we develop a scheduling strategy to prevent under- or over-utilization of the capabilities of formal tools and fuzzers. Second, we develop heuristic strategies to select points in the design space for the formal tool to target.

We evaluate HyPFuzz on five widely-used open-source processors. HyPFuzz detected all the vulnerabilities detected by the most recent processor fuzzer and found three new vulnerabilities that were missed by previous extensive fuzzing and formal verification. This led to two new common vulnerabilities and exposures (CVE) entries. HyPFuzz also achieves 11.68× faster coverage than the most recent processor fuzzer.

PolyFuzz: Holistic Greybox Fuzzing of Multi-Language Systems

Wen Li, Jinyang Ruan, and Guangbei Yi, Washington State University; Long Cheng, Clemson University; Xiapu Luo, The Hong Kong Polytechnic University; Haipeng Cai, Washington State University

Available Media

While offering many advantages during the software process, the practice of using multiple programming languages in constructing one software system also introduces additional security vulnerabilities in the resulting code. As this practice becomes increasingly prevalent, securing multi-language systems is of pressing criticality. Fuzzing has been a powerful security testing technique, yet existing fuzzers are commonly limited to single-language software. In this paper, we present PolyFuzz, a greybox fuzzer that holistically fuzzes a given multi-language system through cross-language coverage feedback and explicit modeling of the semantic relationships between (various segments of) program inputs and branch predicates across languages. PolyFuzz is extensible for supporting multilingual code written in different language combinations and has been implemented for C, Python, Java, and their combinations. We evaluated PolyFuzz versus state-of-the-art single-language fuzzers for these languages as baselines against 15 real-world multi-language systems and 15 single-language benchmarks. PolyFuzz achieved 25.3–52.3% higher code coverage and found 1–10 more bugs than the baselines against the multilingual programs, and even 10–20% higher coverage against the single-language benchmarks. In total, PolyFuzz has enabled the discovery of 12 previously unknown multilingual vulnerabilities and 2 single-language ones, with 5 CVEs assigned. Our results show the great promise of PolyFuzz for cross-language fuzzing, while justifying the strong need for holistic fuzzing as opposed to trivially applying single-language fuzzers to multi-language software.

2:45 pm–3:15 pm

Break with Refreshments

Platinum Foyer

3:15 pm–4:30 pm

Track 1

Programs, Code, and Binaries

Session Chair: Sébastien Bardin, CEA LIST

Platinum Salon 6

VIPER: Spotting Syscall-Guard Variables for Data-Only Attacks

Hengkai Ye, Song Liu, Zhechang Zhang, and Hong Hu, The Pennsylvania State University

Available Media

As control-flow protection techniques are widely deployed, it is difficult for attackers to modify control data, like function pointers, to hijack program control flow. Instead, data-only attacks corrupt security-critical non-control data (critical data), and can bypass all control-flow protections to revive severe attacks. Previous works have explored various methods to help construct or prevent data-only attacks. However, no solution can automatically detect program-specific critical data.

In this paper, we identify an important category of critical data, syscall-guard variables, and propose a set of solutions to automatically detect such variables in a scalable manner. Syscall-guard variables determine whether to invoke security-related system calls (syscalls), and altering them will allow attackers to request extra privileges from the operating system. We propose branch force, which intentionally flips every conditional branch during the execution and checks whether new security-related syscalls are invoked. If so, we conduct data-flow analysis to estimate the feasibility of flipping such branches through common memory errors. We build a tool, VIPER, to implement our ideas. VIPER successfully detects 34 previously unknown syscall-guard variables from 13 programs. We build four new data-only attacks on sqlite and v8, which execute arbitrary commands or delete arbitrary files. VIPER completes its analysis within five minutes for most programs, showing its practicality for spotting syscall-guard variables.
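
The branch-forcing idea in the abstract above can be illustrated with a toy sketch. This is not VIPER's implementation: the target program, the `branch` hook, the variable names, and the list of security-related syscalls are all hypothetical stand-ins. The sketch runs the toy program once as a baseline, then once per conditional with that branch's outcome flipped, and reports the branches whose flip triggers a security-related syscall that the baseline never made.

```python
# Hypothetical set of syscalls treated as security-related in this toy model.
SECURITY_SYSCALLS = {"execve", "unlink", "setuid"}


def toy_program(branch, syscall):
    """Hypothetical target: every conditional is routed through `branch`."""
    syscall("read")
    debug_mode = False
    if branch("debug_mode", debug_mode):
        syscall("execve")   # only reachable when debug_mode is true
    is_admin = False
    if branch("is_admin", is_admin):
        syscall("unlink")   # only reachable when is_admin is true
    verbose = False
    if branch("verbose", verbose):
        syscall("write")    # reachable when flipped, but not security-related


def run(force=None):
    """Execute the toy program, flipping the branch named `force` (if any)."""
    syscalls, branches_seen = [], []

    def branch(name, value):
        branches_seen.append(name)
        return (not value) if name == force else bool(value)

    toy_program(branch, syscalls.append)
    return branches_seen, set(syscalls)


def find_guard_candidates():
    """Return {branch name: newly reached security-related syscalls}."""
    branches, baseline = run()
    candidates = {}
    for name in branches:
        _, observed = run(force=name)
        new_calls = (observed - baseline) & SECURITY_SYSCALLS
        if new_calls:
            candidates[name] = new_calls
    return candidates


print(find_guard_candidates())
```

In this toy run, `debug_mode` and `is_admin` are flagged while `verbose` is not, because flipping it only reaches a syscall outside the security-related set. The follow-up step the abstract mentions, data-flow analysis of whether a memory error could actually flip the branch, is not modeled here.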

AURC: Detecting Errors in Program Code and Documentation

Peiwei Hu, Ruigang Liang, and Ying Cao, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China, and School of Cyber Security, University of Chinese Academy of Sciences, China; Kai Chen, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China, School of Cyber Security, University of Chinese Academy of Sciences, China, and Beijing Academy of Artificial Intelligence, China; Runze Zhang, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China, and School of Cyber Security, University of Chinese Academy of Sciences, China

Available Media

Error detection in program code and documentation is a critical problem in computer security. Previous studies have shown promising vulnerability discovery performance by extensive code or document-guided analysis. However, state-of-the-art approaches have the following significant limitations: (i) They assume the documents are correct and treat the code that violates documents as bugs, and thus cannot find documents’ defects and code’s bugs if APIs have defective documents or no documents. (ii) They utilize majority voting to judge the inconsistent code snippets and treat the deviants as bugs, and thus cannot cope with situations where correct usage is in the minority or all use cases are wrong.

In this paper, we present AURC, a static framework for detecting code bugs of incorrect return checks and document defects. We observe that three objects participate in the API invocation, the document, the caller (code that invokes API), and the callee (the source code of API). Mutual corroboration of these three objects eliminates the reliance on the above assumptions. AURC contains a context-sensitive backward analysis to process callees, a pre-trained model-based document classifier, and a container that collects conditions of if statements from callers. After cross-checking the results from callees, callers, and documents, AURC delivers them to the correctness inference module to infer the defective one. We evaluated AURC on ten popular codebases. AURC discovered 529 new bugs that can lead to security issues like heap buffer overflow and sensitive information leakage, and 224 new document defects. Maintainers acknowledge our findings and have accepted 222 code patches and 76 document patches.

Not All Data are Created Equal: Data and Pointer Prioritization for Scalable Protection Against Data-Oriented Attacks

Salman Ahmed, IBM Research; Hans Liljestrand, University of Waterloo; Hani Jamjoom, IBM Research; Matthew Hicks, Virginia Tech; N. Asokan, University of Waterloo; Danfeng (Daphne) Yao, Virginia Tech

Available Media

Data-oriented attacks are becoming increasingly realistic and effective against the state-of-the-art defenses in most operating systems. These attacks manipulate memory-resident data objects (data and pointers) without changing the control flow of a program. Software and hardware-based countermeasures for protecting data and pointers suffer from performance bottlenecks due to excessive instrumentation of all data objects. In this work, we propose a Data and Pointer Prioritization (DPP) framework utilizing rule-based heuristics to identify sensitive memory objects automatically from an application and protect only those sensitive data utilizing existing countermeasures. We evaluate the correctness of our framework using the Linux Flaw Project dataset, Juliet Test Suite, and five real-world programs (used for demonstrating data-oriented attacks). Our experiments show that DPP can identify vulnerable data objects from our tested applications by prioritizing as few as only 3–4% of total data objects. Our evaluation of the SPEC CPU2017 Integer benchmark suite shows that DPP-enabled AddressSanitizer (ASan) can improve performance (in terms of throughput) by ∼1.6x and reduce run-time overhead by ∼70% compared to the default ASan while protecting all the prioritized data objects.

SAFER: Efficient and Error-Tolerant Binary Instrumentation

Soumyakant Priyadarshan, Huan Nguyen, Rohit Chouhan, and R. Sekar, Stony Brook University

Available Media

Recent advances in binary instrumentation have been focused on performance. By statically transforming the code to avoid additional runtime operations, systems such as Egalito and RetroWrite achieve near zero overheads. The safety of these static transformations relies on several assumptions: (a) error-free and complete disassembly, (b) exclusive use of position-independent code, and (c) code pointer identification that is free of both false positives and false negatives. Violations of these assumptions can cause an instrumented program to crash, or worse, experience delayed failures that corrupt data or compromise security. Many earlier binary instrumentation techniques (e.g., DynamoRio, Pin, and BinCFI) minimized such assumptions, but the price to be paid is a much higher overhead, especially for indirect-call-intensive (e.g., C++) applications. Thus, an open research question is whether the safety benefits of the earlier works can be combined with the performance benefits of recent works. We answer this question in the affirmative by presenting a new instrumentation technique that (a) tolerates the use of position-dependent code and common disassembly and static analysis errors, and (b) detects assumption violations at runtime before they can lead to undefined behavior. Our approach provides a fail-crash primitive for graceful shutdown or recovery. We achieve safe instrumentation without sacrificing performance, introducing a low overhead of about 2%.

Reassembly is Hard: A Reflection on Challenges and Strategies

Hyungseok Kim, KAIST and The Affiliated Institute of ETRI; Soomin Kim and Junoh Lee, KAIST; Kangkook Jee, University of Texas at Dallas; Sang Kil Cha, KAIST

Available Media

Reassembly, a branch of static binary rewriting, has become a focus of research today. However, despite its widespread use and research interest, there have been no systematic investigations on the techniques and challenges of reassemblers. In this paper, we formally define different types of errors that occur in existing reassemblers, and present an automated tool named REASSESSOR to find such errors. Through our tool and the large-scale benchmark we created, we attempt to show the current challenges in the field and how they can be approached.

Track 2

IoT Security Expectations and Barriers

Session Chair: Earlence Fernandes, UCSD

Platinum Salon 5

Measuring Up to (Reasonable) Consumer Expectations: Providing an Empirical Basis for Holding IoT Manufacturers Legally Responsible

Lorenz Kustosch and Carlos Gañán, TU Delft; Mattis van 't Schip, Radboud University; Michel van Eeten and Simon Parkin, TU Delft

Available Media

With continued cases of security and privacy incidents with consumer Internet-of-Things (IoT) devices comes the need to identify which actors are in the best place to respond. Previous literature studied expectations of consumers regarding how security and privacy should be implemented and who should take on preventive efforts. But how do such normative consumer expectations differ from what is actually realistic, or reasonable to expect how security and privacy-related events will be handled? Using a vignette survey with 862 participants, we studied consumer expectations on how IoT manufacturers and users would and should respond when confronted with a potentially infected or privacy-invading IoT device. We find that expectations differ considerably between what is realistic and what is appropriate. Furthermore, security and privacy lead to different expectations around users’ and manufacturers’ actions, with a general diffusion of expectations on how to handle privacy-related events. We offer recommendations to IoT manufacturers and regulators on how to support users in addressing security and privacy issues.

Are Consumers Willing to Pay for Security and Privacy of IoT Devices?

Pardis Emami-Naeini, Duke University; Janarth Dheenadhayalan, Yuvraj Agarwal, and Lorrie Faith Cranor, Carnegie Mellon University

Available Media

Internet of Things (IoT) device manufacturers provide little information to consumers about their security and data handling practices. Therefore, IoT consumers cannot make informed purchase choices around security and privacy. While prior research has found that consumers would likely consider security and privacy when purchasing IoT devices, past work lacks empirical evidence as to whether they would actually pay more to purchase devices with enhanced security and privacy. To fill this gap, we conducted a two-phase incentive-compatible online study with 180 Prolific participants. We measured the impact of five security and privacy factors (e.g., access control) on participants' purchase behaviors when presented individually or together on an IoT label. Participants were willing to pay a significant premium for devices with better security and privacy practices. The biggest price differential we found was for de-identified rather than identifiable cloud storage. Mainly due to its usability challenges, the least valuable improvement for participants was to have multi-factor authentication as opposed to passwords. Based on our findings, we provide recommendations on creating more effective IoT security and privacy labeling programs.

Examining Consumer Reviews to Understand Security and Privacy Issues in the Market of Smart Home Devices

Swaathi Vetrivel, Veerle van Harten, Carlos H. Gañán, Michel van Eeten, and Simon Parkin, Delft University of Technology

Available Media

Despite growing evidence that consumers care about secure Internet-of-Things (IoT) devices, relevant security and privacy-related information is unavailable at the point of purchase. While initiatives such as security labels create new avenues to signal a device's security and privacy posture, we analyse an existing avenue for such market signals - customer reviews. We investigate whether and to what extent customer reviews of IoT devices with well-known security and privacy issues reflect these concerns. We examine 83,686 reviews of four IoT device types commonly infected with Mirai across all Amazon websites in English. We perform topic modelling to group the reviews and conduct manual coding to understand (i) the prevalence of security and privacy issues and (ii) the themes that these issues articulate. Overall, around one in ten reviews (9.8%) mentions security and privacy issues; the geographical distribution varies across the six countries. We distil references to security and privacy into seven themes and identify two orthogonal themes: reviews written in technical language and those that mention friction with security steps. Our results thus highlight the value of the already existing avenue of customer reviews. We draw on these results to make recommendations and identify future research directions.

Internet Service Providers' and Individuals' Attitudes, Barriers, and Incentives to Secure IoT

Nissy Sombatruang, National Institute of Information and Communications Technology; Tristan Caulfield and Ingolf Becker, University College London; Akira Fujita, Takahiro Kasama, Koji Nakao, and Daisuke Inoue, National Institute of Information and Communications Technology

Available Media

Internet Service Providers (ISPs) and individual users of Internet of Things (IoT) play a vital role in securing IoT. However, encouraging them to do so is hard. Our study investigates ISPs' and individuals' attitudes towards the security of IoT, the obstacles they face, and their incentives to keep IoT secure, drawing evidence from Japan.

Due to the complex interactions of the stakeholders, we follow an iterative methodology where we present issues and potential solutions to our stakeholders in turn. For ISPs, we survey 27 ISPs in Japan, followed by a workshop with representatives from government and 5 ISPs. Based on the findings from this, we conduct semi-structured interviews with 20 participants followed by a more quantitative survey with 328 participants. We review these results in a second workshop with representatives from government and 7 ISPs. The appreciation of challenges by each party has led to findings that are supported by all stakeholders.

Securing IoT devices is neither users' nor ISPs' priority. Individuals are keen on more interventions both from the government as part of regulation and from ISPs in terms of filtering malicious traffic. Participants are willing to pay for enhanced monitoring and filtering. While ISPs do want to help users, there appears to be a lack of effective technology to aid them. ISPs would like to see more public recognition for their efforts, but internally they struggle with executive buy-in and effective means to communicate with their customers. The majority of barriers and incentives are external to ISPs and individuals, demonstrating the complexity of keeping IoT secure and emphasizing the need for relevant stakeholders in the IoT ecosystem to work in tandem.

Detecting and Handling IoT Interaction Threats in Multi-Platform Multi-Control-Channel Smart Homes

Haotian Chi, Shanxi University and Temple University; Qiang Zeng, George Mason University; Xiaojiang Du, Stevens Institute of Technology

Available Media

A smart home involves a variety of entities, such as IoT devices, automation applications, humans, voice assistants, and companion apps. These entities interact in the same physical environment, which can yield undesirable and even hazardous results, called IoT interaction threats. Existing work on interaction threats is limited to considering automation apps, ignoring other IoT control channels, such as voice commands, companion apps, and physical operations. Second, it becomes increasingly common that a smart home utilizes multiple IoT platforms, each of which has a partial view of device states and may issue conflicting commands. Third, compared to detecting interaction threats, their handling is much less studied. Prior work uses generic handling policies, which are unlikely to fit all homes. We present IoTMediator, which provides accurate threat detection and threat-tailored handling in multi-platform multi-control-channel homes. Our evaluation in two real-world homes demonstrates that IoTMediator significantly outperforms prior state-of-the-art work.

Track 3

Differential Privacy

Session Chair: Amrita Roy Chowdhury, University of California, San Diego

Platinum Salon 7–8

Private Proof-of-Stake Blockchains using Differentially-Private Stake Distortion

Chenghong Wang, David Pujol, Kartik Nayak, and Ashwin Machanavajjhala, Duke University

Available Media

Safety, liveness, and privacy are three critical properties for any private proof-of-stake (PoS) blockchain. However, prior work (SP'21) has shown that to obtain safety and liveness, a PoS blockchain must in theory forgo privacy. In particular, to obtain safety and liveness, PoS blockchains elect parties proportional to their stake, which, in turn, can potentially reveal the stake of a party even if the transaction processing mechanism is private.

In this work, we make two key contributions. First, we present the first stake inference attack that can actually be run in practice. Specifically, our attack applies to both deterministic and randomized PoS protocols and has an exponentially lower running time than the state-of-the-art approach. Second, we use differentially private stake distortion to achieve privacy in PoS blockchains. We formulate certain privacy requirements to achieve transaction and stake privacy, and design two stake distortion mechanisms that any PoS protocol can use. Moreover, we analyze our proposed mechanisms with Ethereum 2.0, a well-known PoS blockchain that is already operating in practice. The results indicate that our mechanisms mitigate stake inference risks and, at the same time, provide reasonable privacy while preserving required safety and liveness properties.

PrivateFL: Accurate, Differentially Private Federated Learning via Personalized Data Transformation

Yuchen Yang, Bo Hui, and Haolin Yuan, The Johns Hopkins University; Neil Gong, Duke University; Yinzhi Cao, The Johns Hopkins University

Available Media

Federated learning (FL) enables multiple clients to collaboratively train a model with the coordination of a central server. Although FL improves data privacy by keeping each client's training data local, an attacker (e.g., an untrusted server) can still compromise the privacy of clients' local training data via various inference attacks. A de facto approach to preserving FL privacy is Differential Privacy (DP), which adds random noise during training. However, when applied to FL, DP suffers from a key limitation: to achieve a meaningful level of privacy, it sacrifices model accuracy substantially, and even more severely than when applied to traditional centralized learning.

In this paper, we study the cause of accuracy degradation in FL+DP and then design an approach to improve the accuracy. First, we propose that such accuracy degradation is partially because DP introduces additional heterogeneity among FL clients when adding different random noise with clipping bias during local training. To the best of our knowledge, we are the first to associate DP in FL with client heterogeneity. Second, we design PrivateFL to learn accurate, differentially private models in FL with reduced heterogeneity. The key idea is to jointly learn a differentially private, personalized data transformation for each client during local training. The personalized data transformation shifts each client's local data distribution to compensate for the heterogeneity introduced by DP, thus improving the FL model's accuracy.

In the evaluation, we combine and compare PrivateFL with eight state-of-the-art differentially private FL methods on seven benchmark datasets, including six image datasets and one non-image dataset. Our results show that PrivateFL learns accurate FL models with a small ε, e.g., 93.3% on CIFAR-10 with 100 clients under (ε = 2, δ = 1e-3)-DP. Moreover, PrivateFL can be combined with prior works to reduce DP-induced heterogeneity and further improve their accuracy.

What Are the Chances? Explaining the Epsilon Parameter in Differential Privacy

Priyanka Nanayakkara, Northwestern University; Mary Anne Smart, University of California San Diego; Rachel Cummings, Columbia University; Gabriel Kaptchuk, Boston University; Elissa M. Redmiles, Max Planck Institute for Software Systems

Available Media

Differential privacy (DP) is a mathematical privacy notion increasingly deployed across government and industry. With DP, privacy protections are probabilistic: they are bounded by the privacy loss budget parameter, ε. Prior work in health and computational science finds that people struggle to reason about probabilistic risks. Yet, communicating the implications of ε to people contributing their data is vital to avoiding privacy theater—presenting meaningless privacy protection as meaningful—and empowering more informed data-sharing decisions. Drawing on best practices in risk communication and usability, we develop three methods to convey probabilistic DP guarantees to end users: two that communicate odds and one offering concrete examples of DP outputs.

We quantitatively evaluate these explanation methods in a vignette survey study (n = 963) via three metrics: objective risk comprehension, subjective privacy understanding of DP guarantees, and self-efficacy. We find that odds-based explanation methods are more effective than (1) output-based methods and (2) state-of-the-art approaches that gloss over information about ε. Further, when offered information about ε, respondents are more willing to share their data than when presented with a state-of-the-art DP explanation; this willingness to share is sensitive to ε values: as privacy protections weaken, respondents are less likely to share data.
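
To make the link between ε and odds concrete, here is a minimal sketch based on standard properties of ε-differential privacy, not on the explanation texts evaluated in the paper. It assumes pure ε-DP (δ = 0), under which an observer's posterior odds about any one person's data are at most e^ε times the prior odds. The function names and the use of binary randomized response as the example mechanism are illustrative choices.

```python
import math


def max_odds_multiplier(epsilon):
    """Largest factor by which an epsilon-DP output can scale an observer's odds."""
    return math.exp(epsilon)


def worst_case_posterior(prior, epsilon):
    """Upper bound on an observer's belief after seeing an epsilon-DP output.

    Posterior odds <= e^epsilon * prior odds, converted back to a probability.
    """
    scaled = prior * math.exp(epsilon)
    return scaled / (scaled + (1.0 - prior))


def randomized_response_truth_probability(epsilon):
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


for eps in (0.1, math.log(3), 5.0):
    print(
        f"epsilon={eps:.2f}: odds multiplier <= {max_odds_multiplier(eps):.2f}, "
        f"belief 0.50 -> at most {worst_case_posterior(0.5, eps):.3f}"
    )
```

With ε = ln 3, an observer who starts at even odds can end up at most 75% confident, which matches the 75% truth-telling probability of randomized response at the same ε. As ε grows the bound approaches certainty, consistent with the finding above that respondents are less willing to share data as protections weaken.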

Tight Auditing of Differentially Private Machine Learning

Milad Nasr, Jamie Hayes, Thomas Steinke, and Borja Balle, Google DeepMind; Florian Tramèr, ETH Zurich; Matthew Jagielski, Nicholas Carlini, and Andreas Terzis, Google DeepMind

Distinguished Paper Award Winner

Available Media

Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implausible worst-case assumptions (e.g., a fully adversarial dataset). Second, they require thousands or millions of training runs to produce nontrivial statistical estimates of the privacy leakage.

This work addresses both issues. We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets—if the adversary can see all model updates during training. Prior auditing works rely on the same assumption, which is permitted under the standard differential privacy threat model. This threat model is also applicable, e.g., in federated learning settings. Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy. We demonstrate the utility of our improved auditing schemes by surfacing implementation bugs in private machine learning code that eluded prior auditing techniques.
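
For background, the empirical estimate that auditing produces can be sketched with the classic hypothesis-testing bound. This is not the improved scheme of the paper. Under (ε, δ)-DP, any attack that tries to tell two neighboring datasets apart must satisfy FPR + e^ε · FNR ≥ 1 − δ and FNR + e^ε · FPR ≥ 1 − δ, so observed error rates yield a lower bound on ε. The function name is illustrative, and the sketch uses point estimates only, whereas a real audit would replace the error rates with statistical confidence bounds.

```python
import math


def empirical_epsilon(fpr, fnr, delta=0.0):
    """Lower bound on epsilon implied by an attack's error rates.

    fpr and fnr are the attack's false-positive and false-negative rates.
    Point estimates only: no confidence intervals are computed here.
    """
    best = 0.0
    for a, b in ((fpr, fnr), (fnr, fpr)):
        slack = 1.0 - delta - a
        if slack <= 0:
            continue            # this inequality gives no information
        if b == 0:
            return math.inf     # a perfect attack rules out any finite epsilon
        best = max(best, math.log(slack / b))
    return best


print(empirical_epsilon(0.1, 0.3))   # ln(0.7 / 0.1) = ln 7
print(empirical_epsilon(0.5, 0.5))   # random guessing gives 0.0
```

If the value returned exceeds the ε claimed by a training pipeline's privacy analysis, the implementation or the analysis is likely wrong, which is how audits of this kind can surface bugs.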

PrivTrace: Differentially Private Trajectory Synthesis by Adaptive Markov Models

Haiming Wang, Zhejiang University; Zhikun Zhang, CISPA Helmholtz Center for Information Security; Tianhao Wang, University of Virginia; Shibo He, Zhejiang University; Michael Backes, CISPA Helmholtz Center for Information Security; Jiming Chen, Zhejiang University; Yang Zhang, CISPA Helmholtz Center for Information Security

Available Media

Publishing trajectory data (individuals' movement information) is very useful, but it also raises privacy concerns. To address these concerns, in this paper we apply differential privacy, the standard technique for data privacy, together with a Markov chain model to generate synthetic trajectories. We observe that existing studies all use the Markov chain model and thus propose a framework to analyze its usage in this problem. Based on the analysis, we develop an effective algorithm, PrivTrace, that uses first-order and second-order Markov models adaptively. We evaluate PrivTrace and existing methods on synthetic and real-world datasets to demonstrate the superiority of our method.

Track 4

Poisoning

Session Chair: Shuang Hao, University of Texas at Dallas

Platinum Salon 9–10

Meta-Sift: How to Sift Out a Clean Subset in the Presence of Data Poisoning?

Yi Zeng, Virginia Tech and SONY AI; Minzhou Pan, Himanshu Jahagirdar, and Ming Jin, Virginia Tech; Lingjuan Lyu, SONY AI; Ruoxi Jia, Virginia Tech

Available Media

External data sources are increasingly being used to train machine learning (ML) models as the data demand increases. However, the integration of external data into training poses data poisoning risks, where malicious providers manipulate their data to compromise the utility or integrity of the model. Most data poisoning defenses assume access to a set of clean data (referred to as the base set), which could be obtained through trusted sources. But it also becomes common that entire data sources for an ML task are untrusted (e.g., Internet data). In this case, one needs to identify a subset within a contaminated dataset as the base set to support these defenses.

This paper starts by examining the performance of defenses when poisoned samples are mistakenly mixed into the base set. We analyze five representative defenses that use base sets and find that their performance deteriorates dramatically with less than 1% poisoned points in the base set. These findings suggest that sifting out a base set with high precision is key to these defenses' performance. Motivated by these observations, we study how precise existing automated tools and human inspection are at identifying clean data in the presence of data poisoning. Unfortunately, neither effort achieves the precision needed to enable effective defenses. Worse yet, many of the outcomes of these methods are worse than random selection.

In addition to uncovering the challenge, we take a step further and propose a practical countermeasure, Meta-Sift. Our method is based on the insight that existing poisoning attacks shift data distributions, resulting in high prediction loss when training on the clean portion of a poisoned dataset and testing on the corrupted portion. Leveraging the insight, we formulate a bilevel optimization to identify clean data and further introduce a suite of techniques to improve the efficiency and precision of the identification. Our evaluation shows that Meta-Sift can sift a clean base set with 100% precision under a wide range of poisoning threats. The selected base set is large enough to give rise to successful defense when plugged into the existing defense techniques.

Towards A Proactive ML Approach for Detecting Backdoor Poison Samples

Xiangyu Qi, Tinghao Xie, Jiachen T. Wang, Tong Wu, Saeed Mahloujifar, and Prateek Mittal, Princeton University

Available Media

Adversaries can embed backdoors in deep learning models by introducing backdoor poison samples into training datasets. In this work, we investigate how to detect such poison samples to mitigate the threat of backdoor attacks. First, we uncover a post-hoc workflow underlying most prior work, where defenders passively allow the attack to proceed and then leverage the characteristics of the post-attacked model to uncover poison samples. We reveal that this workflow does not fully exploit defenders' capabilities, and defense pipelines built on it are prone to failure or performance degradation in many scenarios. Second, we suggest a paradigm shift by promoting a proactive mindset in which defenders engage proactively with the entire model training and poison detection pipeline, directly enforcing and magnifying distinctive characteristics of the post-attacked model to facilitate poison detection. Based on this, we formulate a unified framework and provide practical insights on designing detection pipelines that are more robust and generalizable. Third, we introduce the technique of Confusion Training (CT) as a concrete instantiation of our framework. CT applies an additional poisoning attack to the already poisoned dataset, actively decoupling benign correlation while exposing backdoor patterns to detection. Empirical evaluations on 4 datasets and 14 types of attacks validate the superiority of CT over 14 baseline defenses.

PORE: Provably Robust Recommender Systems against Data Poisoning Attacks

Jinyuan Jia, The Pennsylvania State University; Yupei Liu, Yuepeng Hu, and Neil Zhenqiang Gong, Duke University

Available Media

Data poisoning attacks spoof a recommender system to make arbitrary, attacker-desired recommendations by injecting fake users with carefully crafted rating scores into the recommender system. We envision a cat-and-mouse game for such data poisoning attacks and their defenses, i.e., new defenses are designed to defend against existing attacks and new attacks are designed to break them. To prevent such a cat-and-mouse game, we propose PORE, the first framework to build provably robust recommender systems. PORE can transform any existing recommender system to be provably robust against any untargeted data poisoning attacks, which aim to reduce the overall performance of a recommender system. Suppose PORE recommends top-N items to a user when there is no attack. We prove that PORE still recommends at least r of the N items to the user under any data poisoning attack, where r is a function of the number of fake users in the attack. Moreover, we design an efficient algorithm to compute r for each user. We empirically evaluate PORE on popular benchmark datasets.

Every Vote Counts: Ranking-Based Training of Federated Learning to Resist Poisoning Attacks

Hamid Mozaffari, Virat Shejwalkar, and Amir Houmansadr, University of Massachusetts Amherst

Available Media

Federated learning (FL) allows untrusted clients to collaboratively train a common machine learning model, called the global model, without sharing their private/proprietary training data. However, FL is susceptible to poisoning by malicious clients who aim to hamper the accuracy of the global model by contributing malicious updates during FL's training process.

We argue that the key factor to the success of poisoning attacks against existing FL systems is the large space of model updates available to the clients to choose from. To address this, we propose Federated Rank Learning (FRL). FRL reduces the space of client updates from model parameter updates (a continuous space of float numbers) in standard FL to the space of parameter rankings (a discrete space of integer values). To be able to train the global model using parameter ranks (instead of parameter weights), FRL leverages ideas from recent supermasks training mechanisms. Specifically, FRL clients rank the parameters of a randomly initialized neural network (provided by the server) based on their local training data, and the FRL server uses a voting mechanism to aggregate the parameter rankings submitted by the clients.

Intuitively, our voting-based aggregation mechanism prevents poisoning clients from making significant adversarial modifications to the global model, as each client will have a single vote! We demonstrate the robustness of FRL to poisoning through analytical proofs and experimentation, and we show its high communication efficiency.
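As a rough illustration of why rank voting bounds a single client's influence, the sketch below aggregates rankings by summing rank positions. The abstract does not spell out FRL's exact voting rule, so the position-sum rule, the function name aggregate_rankings, and the toy rankings are all assumptions made for this example.

```python
from typing import List


def aggregate_rankings(client_rankings: List[List[int]]) -> List[int]:
    """Combine per-client parameter rankings with a position-sum vote.

    Each ranking lists parameter indices from least to most important.
    NOTE: the position-sum rule is an assumption for illustration only;
    it is not taken from the FRL paper.
    """
    n = len(client_rankings[0])
    scores = [0] * n
    for ranking in client_rankings:
        if sorted(ranking) != list(range(n)):
            raise ValueError("each ranking must be a permutation of 0..n-1")
        # A client contributes exactly one bounded vote per parameter.
        for position, param in enumerate(ranking):
            scores[param] += position
    # Lowest total score first; ties are broken by parameter index.
    return sorted(range(n), key=lambda p: (scores[p], p))


# Three hypothetical honest clients and one client that submits a reversed ranking.
honest = [[0, 1, 2, 3], [0, 1, 3, 2], [1, 0, 2, 3]]
malicious = [[3, 2, 1, 0]]
global_ranking = aggregate_rankings(honest + malicious)
print(global_ranking)
```

In this toy run the reversed ranking does not change the aggregate, because a ranking can shift each parameter's score only by a bounded amount, unlike an arbitrarily scaled weight update.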

Fine-grained Poisoning Attack to Local Differential Privacy Protocols for Mean and Variance Estimation

Xiaoguang Li, Xidian University and Purdue University; Ninghui Li and Wenhai Sun, Purdue University; Neil Zhenqiang Gong, Duke University; Hui Li, Xidian University

Available Media

Although local differential privacy (LDP) protects individual users' data from inference by an untrusted data curator, recent studies show that an attacker can launch a data poisoning attack from the user side to inject carefully-crafted bogus data into the LDP protocols in order to maximally skew the final estimate by the data curator.

In this work, we further advance this knowledge by proposing a new fine-grained attack, which allows the attacker to fine-tune and simultaneously manipulate mean and variance estimations that are popular analytical tasks for many real-world applications. To accomplish this goal, the attack leverages the characteristics of LDP to inject fake data into the output domain of the local LDP instance. We call our attack the output poisoning attack (OPA). We observe a security-privacy consistency where a small privacy loss enhances the security of LDP, which contradicts the known security-privacy trade-off from prior work. We further study the consistency and reveal a more holistic view of the threat landscape of data poisoning attacks on LDP. We comprehensively evaluate our attack against a baseline attack that intuitively provides false input to LDP. The experimental results show that OPA outperforms the baseline on three real-world datasets. We also propose a novel defense method that can recover the result accuracy from polluted data collection and offer insight into the secure LDP design.

Track 5

Smart Contracts

Session Chair: Srdjan Capkun, ETH Zurich

Platinum Salon 3–4

Your Exploit is Mine: Instantly Synthesizing Counterattack Smart Contract

Zhuo Zhang, Purdue University; Zhiqiang Lin and Marcelo Morales, Ohio State University; Xiangyu Zhang and Kaiyuan Zhang, Purdue University

Available Media

Smart contracts are susceptible to exploitation due to their unique nature. Despite efforts to identify vulnerabilities using fuzzing, symbolic execution, formal verification, and manual auditing, exploitable vulnerabilities still exist and have led to billions of dollars in monetary losses. To address this issue, it is critical that runtime defenses are in place to minimize exploitation risk. In this paper, we present STING, a novel runtime defense mechanism against smart contract exploits. The key idea is to instantly synthesize counterattack smart contracts from attacking transactions and leverage the power of Maximal Extractable Value (MEV) to front run attackers. Our evaluation with 62 real-world recent exploits demonstrates its effectiveness, successfully countering 54 of the exploits (i.e., intercepting all the funds stolen by the attacker). In comparison, a general front-runner defense could only handle 12 exploits. Our results provide a clear proof-of-concept that STING is a viable defense mechanism against smart contract exploits and has the potential to significantly reduce the risk of exploitation in the smart contract ecosystem.

Smart Learning to Find Dumb Contracts

Tamer Abdelaziz, National University of Singapore; Aquinas Hobor, University College London

Available Media

We introduce Deep Learning Vulnerability Analyzer (DLVA), a vulnerability detection tool for Ethereum smart contracts based on powerful deep learning techniques for sequential data adapted for bytecode. We train DLVA to judge bytecode even though the supervising oracle, Slither, can only judge source code. DLVA's training algorithm is general: we “extend” a source code analysis to bytecode without any manual feature engineering, predefined patterns, or expert rules. DLVA's training algorithm is also robust: it overcame a 1.25% error rate of mislabeled contracts and, the student surpassing the teacher, found vulnerable contracts that Slither mislabeled. In addition to extending a source code analyzer to bytecode, DLVA is much faster than conventional tools for smart contract vulnerability detection based on formal methods: DLVA checks contracts for 29 vulnerabilities in 0.2 seconds, a 10–1,000x speedup compared to traditional tools.

DLVA has three key components. First, Smart Contract to Vector (SC2V) uses neural networks to map arbitrary smart contract bytecode to a high-dimensional floating-point vector. We benchmark SC2V against 4 state-of-the-art graph neural networks and show that it improves model differentiation by an average of 2.2%. Second, Sibling Detector (SD) classifies contracts when a target contract's vector is close in Euclidean distance to a labeled contract's vector in a training set; although only able to judge 55.7% of the contracts in our test set, it has an average Slither-predictive accuracy of 97.4% with a false positive rate of only 0.1%. Third, Core Classifier (CC) uses neural networks to infer vulnerable contracts regardless of vector distance. We benchmark DLVA's CC with 10 “off-the-shelf” machine learning techniques and show that the CC improves average accuracy by 11.3%. Overall, DLVA predicts Slither's labels with an overall accuracy of 92.7% and associated false positive rate of 7.2%.

Lastly, we benchmark DLVA against nine well-known smart contract analysis tools. Despite using much less analysis time, DLVA completed every query, leading the pack with an average accuracy of 99.7%, pleasingly balancing high true positive rates with low false positive rates.
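The Sibling Detector idea described above (label a contract only when its vector lies close to a labeled training vector, otherwise abstain) can be sketched as a thresholded nearest-neighbor lookup. The function name sibling_label, the max_distance threshold, and the two-dimensional toy vectors are hypothetical; DLVA's actual vectors and thresholds are not given in the abstract.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sibling_label(
    target: Sequence[float],
    labeled: List[Tuple[Sequence[float], str]],
    max_distance: float,
) -> Optional[str]:
    """Return the label of the nearest labeled vector, or None to abstain.

    NOTE: the threshold and labels are illustrative assumptions, not DLVA's.
    """
    best_label: Optional[str] = None
    best_dist = math.inf
    for vector, label in labeled:
        dist = math.dist(target, vector)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Abstain when no labeled contract is close enough to be a "sibling".
    return best_label if best_dist <= max_distance else None


training = [((0.0, 0.0), "safe"), ((1.0, 1.0), "vulnerable")]
print(sibling_label((0.1, 0.0), training, max_distance=0.2))
print(sibling_label((0.5, 0.5), training, max_distance=0.2))
```

Abstaining on distant vectors is what lets such a detector judge only part of a test set while staying accurate on the part it does judge; the remaining contracts would fall through to a general classifier.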

Confusum Contractum: Confused Deputy Vulnerabilities in Ethereum Smart Contracts

Fabio Gritti, Nicola Ruaro, Robert McLaughlin, Priyanka Bose, Dipanjan Das, Ilya Grishchenko, Christopher Kruegel, and Giovanni Vigna, University of California, Santa Barbara

Available Media

Smart contracts are immutable programs executed in the context of a globally distributed system known as a blockchain. They enable the decentralized implementation of many interesting applications, such as financial protocols, voting systems, and supply-chain management. In many cases, multiple smart contracts need to work together and communicate with one another to implement complex business logic. However, these smart contracts must take special care to guard against malicious interactions that might lead to the violation of a contract's security properties and possibly result in substantial financial losses.

In this paper, we introduce a class of inter-program communication flaws that we call confused contract vulnerabilities. This type of bug is an instance of the confused deputy vulnerability, set in the new context of smart contract inter-communication. When exploiting a confused contract bug, an attacker is able to divert a remote (inter-contract) call in a confused (victim) contract to a target contract and function of the attacker's choosing. The call performs sensitive operations on behalf of the confused contract, which can result in financial loss or malicious modifications of the persistent storage of the involved contracts.

To identify opportunities for confused contract attacks at scale, we implemented Jackal, a system that is able to automatically identify and exploit confused contracts and candidate target contracts on the Ethereum mainnet.

We leveraged Jackal to analyze a total of 2,335,193 smart contracts deployed in the past two years, and we identified 529 potential confused contracts for which we were able to generate 31 working exploits. When investigating the impact of our exploits, we discovered past and present opportunities for confused contract attacks that could have compromised digital assets worth more than one million US dollars.

Panda: Security Analysis of Algorand Smart Contracts

Zhiyuan Sun, The Hong Kong Polytechnic University and Southern University of Science and Technology; Xiapu Luo, The Hong Kong Polytechnic University; Yinqian Zhang, Southern University of Science and Technology

Available Media

Algorand has recently grown rapidly as a representative of the new generation of pure-proof-of-stake (PPoS) blockchains. At the same time, Algorand has also attracted more and more users to use it as a trading platform for non-fungible tokens. However, as with traditional programs, incorrect programming can lead to critical security vulnerabilities in Algorand smart contracts. In this paper, we first analyze the semantics of Algorand smart contracts and find 9 types of generic vulnerabilities. Next, we propose Panda, the first extensible static analysis framework that can automatically detect such vulnerabilities in Algorand smart contracts, and formally define the vulnerability detection rules. We also construct the first benchmark dataset to evaluate Panda. Finally, we used Panda to conduct a vulnerability assessment on all smart contracts on the Algorand blockchain and found 80,515 (10.38%) vulnerable smart signatures and 150,676 (27.73%) vulnerable applications. Of the vulnerable applications, 4,008 (4.04%) are still on the blockchain and have not been deleted. In the disclosure process, the vulnerabilities found by Panda have been acknowledged by many projects, including some critical blockchain infrastructures such as the decentralized exchange and the NFT auction platform.

Proxy Hunting: Understanding and Characterizing Proxy-based Upgradeable Smart Contracts in Blockchains

William E Bodell III, Sajad Meisami, and Yue Duan, Illinois Institute of Technology

Available Media

Upgradeable smart contracts (USCs) have become a key trend in smart contract development, bringing flexibility to otherwise immutable code. However, they also introduce security concerns. On the one hand, they require extensive security knowledge to implement in a secure fashion. On the other hand, they provide new strategic weapons for malicious activities. Thus, it is crucial to fully understand them, especially their security implications in the real-world. To this end, we conduct a large-scale study to systematically reveal the status quo of USCs in the wild. To achieve our goal, we develop a complete USC taxonomy to comprehensively characterize the unique behaviors of USCs and further develop USCHUNT, an automated USC analysis framework for supporting our study. Our study aims to answer three sets of essential research questions regarding USC importance, design patterns, and security issues. Our results show that USCs are of great importance to today’s blockchain as they hold billions of USD worth of digital assets. Moreover, our study summarizes eleven unique design patterns of USCs, and discovers a total of 2,546 real-world USC-related security and safety issues in six major categories.

Track 6

x-Fuzz and Fuzz-x

Session Chair: Kevin Butler, University of Florida

Platinum Salon 1–2

Fuzztruction: Using Fault Injection-based Fuzzing to Leverage Implicit Domain Knowledge

Nils Bars, Moritz Schloegel, Tobias Scharnowski, and Nico Schiller, Ruhr-Universität Bochum; Thorsten Holz, CISPA Helmholtz Center for Information Security

Distinguished Paper Award Winner and Runner-Up Winner of the 2023 Internet Defense Prize

Available Media

Today's digital communication relies on complex protocols and specifications for exchanging structured messages and data. Communication naturally involves two endpoints: One generating data and one consuming it. Traditional fuzz testing approaches replace one endpoint, the generator, with a fuzzer and rapidly test many mutated inputs on the target program under test. While this fully automated approach works well for loosely structured formats, this does not hold for highly structured formats, especially those that go through complex transformations such as compression or encryption.

In this work, we propose a novel perspective on generating inputs in highly complex formats without relying on heavyweight program analysis techniques, coarse-grained grammar approximation, or a human domain expert. Instead of mutating the inputs for a target program, we inject faults into the data generation program so that this data is almost of the expected format. Such data bypasses the initial parsing stages in the consumer program and exercises deeper program states, where it triggers more interesting program behavior. To realize this concept, we propose a set of compile-time and run-time analyses to mutate the generator in a targeted manner, so that it remains intact and produces semi-valid outputs that satisfy the constraints of the complex format. We have implemented this approach in a prototype called Fuzztruction and show that it outperforms the state-of-the-art fuzzers AFL++, SYMCC, and WEIZZ. Fuzztruction finds significantly more coverage than existing methods, especially on targets that use cryptographic primitives. During our evaluation, Fuzztruction uncovered 151 unique crashes (after automated deduplication). So far, we manually triaged and reported 27 bugs and 4 CVEs were assigned.

FuzzJIT: Oracle-Enhanced Fuzzing for JavaScript Engine JIT Compiler

Junjie Wang, College of Intelligence and Computing, Tianjin University; Zhiyi Zhang, CodeSafe Team, Qi An Xin Group Corp.; Shuang Liu, College of Intelligence and Computing, Tianjin University; Xiaoning Du, Monash University; Junjie Chen, College of Intelligence and Computing, Tianjin University

Available Media

We present a novel fuzzing technique, FuzzJIT, for exposing JIT compiler bugs in JavaScript engines, based on our insight that JIT compilers shall only speed up the execution but never change the execution result of JavaScript code. FuzzJIT can activate the JIT compiler for every test case and acutely capture any execution discrepancy caused by JIT compilers. The key to success is the design of an input wrapping template, which proactively activates the JIT compiler and makes the generated samples oracle-aware, so that the oracle is tested spontaneously during execution. We also design a set of mutation strategies to emphasize program elements promising in revealing JIT compiler bugs. FuzzJIT drills into JIT compilers and at the same time retains the high efficiency of fuzzing. We have implemented the design and applied the prototype to find new JIT compiler bugs in four mainstream JavaScript engines. In one month, 10, 5, 2, and 16 new bugs were exposed in JavaScriptCore, V8, SpiderMonkey, and ChakraCore, respectively, with three demonstrated to be exploitable.

GLeeFuzz: Fuzzing WebGL Through Error Message Guided Mutation

Hui Peng, Purdue University; Zhihao Yao and Ardalan Amiri Sani, UC Irvine; Dave (Jing) Tian, Purdue University; Mathias Payer, EPFL

Available Media

WebGL is a set of standardized JavaScript APIs for GPU accelerated graphics. Security of the WebGL interface is paramount because it exposes remote and unsandboxed access to the underlying graphics stack (including the native GL libraries and GPU drivers) in the host OS. Unfortunately, applying state-of-the-art fuzzing techniques to the WebGL interface for vulnerability discovery is challenging because of (1) its huge input state space, and (2) the infeasibility of collecting code coverage across concurrent processes, closed-source libraries, and device drivers in the kernel.

Our fuzzing technique, GLeeFuzz, guides input mutation by error messages instead of code coverage. Our key observation is that browsers emit meaningful error messages to aid developers in debugging their WebGL programs. Error messages indicate which part of the input fails (e.g., incomplete arguments, invalid arguments, or unsatisfied dependencies between API calls). Leveraging error messages as feedback, the fuzzer effectively expands coverage by focusing mutation on erroneous parts of the input. We analyze Chrome’s WebGL implementation to identify the dependencies between error-emitting statements and rejected parts of the input, and use this information to guide input mutation. We evaluate our GLeeFuzz prototype on Chrome, Firefox, and Safari on diverse desktop and mobile OSes. We discovered 7 vulnerabilities, 4 in Chrome, 2 in Safari, and 1 in Firefox. The Chrome vulnerabilities allow a remote attacker to freeze the GPU and possibly execute remote code at the browser privilege.

autofz: Automated Fuzzer Composition at Runtime

Yu-Fu Fu, Jaehyuk Lee, and Taesoo Kim, Georgia Institute of Technology

Available Media

Fuzzing has gained in popularity for software vulnerability detection by virtue of the tremendous effort to develop a diverse set of fuzzers. Thanks to various fuzzing techniques, most of the fuzzers have been able to demonstrate great performance on their selected targets. However, paradoxically, this diversity in fuzzers also made it difficult to select the fuzzers best suited for complex real-world programs, which we call selection burden. Communities attempted to address this problem by creating a set of standard benchmarks to compare and contrast the performance of fuzzers for a wide range of applications, but the result was always a suboptimal decision: the best-performing fuzzer on average does not guarantee the best outcome for the target of a user's interest.

To overcome this problem, we propose an automated, yet non-intrusive meta-fuzzer, called autofz, to maximize the benefits of existing state-of-the-art fuzzers via dynamic composition. To an end user, this means that, instead of spending time on selecting which fuzzer to adopt (similar in concept to hyperparameter tuning in ML), one can simply put all of the available fuzzers into autofz (similar in concept to AutoML), and achieve the best, optimal result. The key idea is to monitor the runtime progress of the fuzzers, called trends (similar in concept to gradient descent), and make a fine-grained adjustment of resource allocation (e.g., CPU time) of each fuzzer. This is in stark contrast to existing approaches that combine a set of fuzzers statically or via exhaustive pre-training per target program: autofz deduces a suitable set of fuzzers of the active workload in a fine-grained manner at runtime. Our evaluation shows that, given the same amount of computation resources, autofz outperforms any best-performing individual fuzzer in 11 out of 12 available benchmarks and beats the best collaborative fuzzing approaches in 19 out of 20 benchmarks without any prior knowledge in terms of coverage. Moreover, on average, autofz found 152% more bugs than individual fuzzers on UNIFUZZ and FTS, and 415% more bugs than collaborative fuzzing on UNIFUZZ.
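The trend-based resource adjustment described above can be illustrated with a minimal sketch that splits a CPU-time budget in proportion to each fuzzer's recent coverage gain. The proportional rule, the function name allocate_cpu_time, and the fuzzer names are assumptions for illustration; the abstract does not state autofz's actual scheduling policy.

```python
from typing import Dict


def allocate_cpu_time(trends: Dict[str, float], budget_seconds: float) -> Dict[str, float]:
    """Split a CPU-time budget across fuzzers in proportion to recent progress.

    `trends` maps a fuzzer name to its coverage gain in the last round.
    NOTE: proportional allocation is an assumed policy, not autofz's own.
    """
    if not trends:
        return {}
    total = sum(trends.values())
    if total == 0:
        # No fuzzer made progress: fall back to an equal split.
        share = budget_seconds / len(trends)
        return {name: share for name in trends}
    return {name: budget_seconds * gain / total for name, gain in trends.items()}


# Hypothetical coverage gains (new edges found) for three fuzzers in one round.
round_trends = {"fuzzer_a": 30.0, "fuzzer_b": 10.0, "fuzzer_c": 0.0}
print(allocate_cpu_time(round_trends, budget_seconds=100.0))
```

Recomputing the split every round is what makes the composition dynamic: a fuzzer that stalls on the current workload loses CPU time to one that is still finding new coverage.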

CarpetFuzz: Automatic Program Option Constraint Extraction from Documentation for Fuzzing

Dawei Wang, Ying Li, and Zhiyu Zhang, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Kai Chen, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Beijing Academy of Artificial Intelligence, China

Available Media

The large-scale code in software supports rich and diverse functionalities, and at the same time contains potential vulnerabilities. Fuzzing, as one of the most popular vulnerability detection methods, continues evolving in both industry and academia, aiming to find more vulnerabilities by covering more code. However, we find that even with the state-of-the-art fuzzers, there is still some unexplored code that can only be triggered using a specific combination of program options. Simply mutating the options may generate many invalid combinations due to the lack of consideration of constraints (also called relationships) among options. In this paper, we leverage natural language processing (NLP) to automatically extract option descriptions from program documents and analyze the relationships (e.g., conflicts, dependencies) among the options before filtering out invalid combinations and only leaving the valid ones for fuzzing. We implemented a tool called CarpetFuzz and evaluated its performance. The results show that CarpetFuzz accurately extracts the relationships from documents with 96.10% precision and 88.85% recall. Based on these relationships, CarpetFuzz reduced the option combinations to be tested by 67.91%. It helps AFL find 45.97% more paths that other fuzzers cannot discover. After analyzing 20 popular open-source programs, CarpetFuzz discovered 57 vulnerabilities, including 43 undisclosed ones. We also successfully obtained CVE IDs for 30 vulnerabilities.
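The filtering step described above (drop option combinations that violate conflict or dependency relationships) can be sketched as follows. The relationships here are hand-written rather than extracted with NLP, and the option names, the function valid_combinations, and the exhaustive enumeration are illustrative assumptions rather than CarpetFuzz's implementation.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def valid_combinations(
    options: Sequence[str],
    conflicts: Sequence[Tuple[str, str]],
    dependencies: Sequence[Tuple[str, str]],
) -> List[Tuple[str, ...]]:
    """Enumerate non-empty option combinations that satisfy all relationships.

    conflicts:    (a, b) means a and b must not appear together.
    dependencies: (a, b) means a may only appear if b is also present.
    NOTE: exhaustive enumeration is only practical for small option sets.
    """
    valid = []
    for size in range(1, len(options) + 1):
        for combo in combinations(options, size):
            chosen = set(combo)
            if any(a in chosen and b in chosen for a, b in conflicts):
                continue  # violates a conflict
            if any(a in chosen and b not in chosen for a, b in dependencies):
                continue  # violates a dependency
            valid.append(combo)
    return valid


# Hypothetical options: -a conflicts with -b, and -c requires -a.
kept = valid_combinations(["-a", "-b", "-c"], [("-a", "-b")], [("-c", "-a")])
print(kept)
```

In this toy case only 3 of the 7 non-empty combinations survive, so the fuzzer spends no executions on command lines the program would reject up front.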

4:30 pm–4:45 pm

Short Break

Platinum Foyer

4:45 pm–6:00 pm

Track 1

Cache Attacks

Session Chair: Shaanan Cohney, University of Melbourne

Platinum Salon 6

SCARF – A Low-Latency Block Cipher for Secure Cache-Randomization

Federico Canale, Ruhr-University Bochum; Tim Güneysu, Ruhr-University Bochum and DFKI; Gregor Leander and Jan Philipp Thoma, Ruhr-University Bochum; Yosuke Todo, NTT Social Informatics Laboratories; Rei Ueno, Tohoku University

Available Media

Randomized cache architectures have proven to significantly increase the complexity of contention-based cache side channel attacks and therefore present an important building block for side-channel secure microarchitectures. By randomizing the address-to-cache-index mapping, attackers can no longer trivially construct minimal eviction sets which are fundamental for contention-based cache attacks. At the same time, randomized caches maintain the flexibility of traditional caches, making them broadly applicable across various CPU-types. This is a major advantage over cache partitioning approaches.

A large variety of randomized cache architectures has been proposed. However, the actual randomization function received little attention and is often neglected in these proposals. Since the randomization operates directly on the critical path of the cache lookup, the function needs to have extremely low latency. At the same time, attackers must not be able to bypass the randomization which would nullify the security benefit of the randomized mapping. In this paper we propose SCARF (Secure CAche Randomization Function), the first dedicated cache randomization cipher which achieves low latency and is cryptographically secure in the cache attacker model. The design methodology for this dedicated cache cipher enters new territory in the field of block ciphers with a small 10-bit block length and heavy key-dependency in few rounds.

The Gates of Time: Improving Cache Attacks with Transient Execution

Daniel Katzman, Tel Aviv University; William Kosasih, The University of Adelaide; Chitchanok Chuengsatiansup, The University of Melbourne; Eyal Ronen, Tel Aviv University; Yuval Yarom, The University of Adelaide

Available Media

For over two decades, cache attacks have been shown to pose a significant risk to the security of computer systems. In particular, a large number of works show that cache attacks provide a stepping stone for implementing transient-execution attacks. However, much less effort has been expended investigating the reverse direction—how transient execution can be exploited for cache attacks. In this work, we answer this question.

We first show that using transient execution, we can perform arbitrary manipulations of the cache state. Specifically, we design versatile logical gates whose inputs and outputs are the caching state of memory addresses. Our gates are generic enough that we can implement them in WebAssembly. Moreover, the gates work on processors from multiple vendors, including Intel, AMD, Apple, and Samsung. We demonstrate that these gates are Turing complete and allow arbitrary computation on cache states, without exposing the logical values to the architectural state of the program.

We then show two use cases for our gates in cache attacks. The first use case is to amplify the cache state, allowing us to create timing differences of over 100 milliseconds between the cases that a specific memory address is cached or not. We show how we can use this capability to build eviction sets in WebAssembly, using only a low-resolution (0.1 millisecond) timer. For the second use case, we present the Prime+Store attack, a variant of Prime+Probe that decouples the sampling of cache states from the measurement of said state. Prime+Store is the first timing-based cache attack that can sample the cache state at a rate higher than the clock rate. We show how to use Prime+Store to obtain bits from a concurrently executing modular exponentiation, when the only timing signal is at a resolution of 0.1 millisecond.

Synchronization Storage Channels (S2C): Timer-less Cache Side-Channel Attacks on the Apple M1 via Hardware Synchronization Instructions

Jiyong Yu and Aishani Dutta, University of Illinois Urbana-Champaign; Trent Jaeger, Pennsylvania State University; David Kohlbrenner, University of Washington; Christopher W. Fletcher, University of Illinois Urbana-Champaign

Available Media

Shared caches have been a prime target for mounting cross-process/core side-channel attacks. Fundamentally, these attacks require a mechanism to accurately observe changes in cache state. Most cache attacks rely on timing measurements to indirectly infer cache state changes, and attack success hinges on the reliability/availability of accurate timing sources. Far fewer techniques have been proposed to directly observe cache state changes without reliance on timers. Further, none of said 'timer-less' techniques are accessible to userspace attackers targeting modern CPUs.

This paper proposes a novel technique for mounting timer-less cache attacks targeting Apple M1 CPUs named Synchronization Storage Channels (S2C). The key observation is that the implementation of synchronization instructions, specifically Load-Linked/Store-Conditional (LL/SC), makes architectural state changes when L1 cache evictions occur. This by itself is a useful starting point for attacks, but it faces multiple technical challenges when used to perpetrate cross-core cache attacks. Specifically, LL/SC only observes L1 evictions (not shared L2 cache evictions). Further, each attacker thread can monitor only one address at a time through LL/SC (as opposed to many). We propose a suite of techniques and reverse engineering to overcome these limitations, and demonstrate how a single-threaded userspace attacker can use LL/SC to simultaneously monitor multiple (up to 11) victim L2 sets and succeed at standard cache-attack applications, such as breaking cryptographic implementations and constructing covert channels.

ClepsydraCache -- Preventing Cache Attacks with Time-Based Evictions

Jan Philipp Thoma, Ruhr University Bochum; Christian Niesler, University of Duisburg-Essen; Dominic Funke, Gregor Leander, Pierre Mayr, and Nils Pohl, Ruhr University Bochum; Lucas Davi, University of Duisburg-Essen; Tim Güneysu, Ruhr University Bochum & DFKI

Available Media

In the recent past, we have witnessed the shift towards attacks on the microarchitectural CPU level. In particular, cache side-channels play a predominant role as they allow an attacker to exfiltrate secret information by exploiting the CPU microarchitecture. These subtle attacks exploit the architectural visibility of conflicting cache addresses. In this paper, we present ClepsydraCache, which mitigates state-of-the-art cache attacks using a novel combination of cache decay and index randomization. Each cache entry is linked with a Time-To-Live (TTL) value. We propose a new dynamic scheduling mechanism of the TTL which plays a fundamental role in preventing those attacks while maintaining performance. ClepsydraCache efficiently protects against the latest cache attacks such as Prime+(Prune+)Probe. We present a full prototype in gem5 and lay out a proof-of-concept hardware design of the TTL mechanism, which demonstrates the feasibility of deploying ClepsydraCache in real-world systems.
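To make the combination of cache decay and index randomization more concrete, here is a toy software model: a direct-mapped cache whose index is derived from a secret key and whose entries expire when a per-entry TTL runs out. The class name TtlCache, the random TTL range, and refreshing the TTL on a hit are assumptions for illustration; the dynamic TTL scheduling of ClepsydraCache itself is not modeled.

```python
import random
from typing import List, Optional, Tuple


class TtlCache:
    """Toy direct-mapped cache with a keyed index and time-based eviction.

    NOTE: an illustrative model only, not the ClepsydraCache hardware design.
    """

    def __init__(self, num_lines: int, max_ttl: int, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._key = self._rng.getrandbits(64)  # secret index-randomization key
        self._max_ttl = max_ttl
        self._lines: List[Optional[Tuple[int, int]]] = [None] * num_lines

    def _index(self, addr: int) -> int:
        # Keyed mapping stands in for a hardware randomization function.
        return hash((self._key, addr)) % len(self._lines)

    def access(self, addr: int) -> bool:
        """Return True on a hit; install or refresh the line with a random TTL."""
        i = self._index(addr)
        line = self._lines[i]
        hit = line is not None and line[0] == addr
        self._lines[i] = (addr, self._rng.randint(1, self._max_ttl))
        return hit

    def tick(self) -> None:
        """Advance time by one step and evict entries whose TTL expires."""
        for i, line in enumerate(self._lines):
            if line is None:
                continue
            addr, ttl = line
            self._lines[i] = (addr, ttl - 1) if ttl > 1 else None


cache = TtlCache(num_lines=16, max_ttl=3)
first = cache.access(0x40)   # miss: line is installed
second = cache.access(0x40)  # hit: line is present
for _ in range(3):
    cache.tick()             # TTL is at most 3, so the line has expired
third = cache.access(0x40)   # miss again, without any conflicting access
print(first, second, third)
```

The final miss occurs without any conflicting access, which is the property that makes evictions harder for an attacker to attribute to a victim's memory accesses.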

CacheQL: Quantifying and Localizing Cache Side-Channel Vulnerabilities in Production Software

Yuanyuan Yuan, Zhibo Liu, and Shuai Wang, The Hong Kong University of Science and Technology

Available Media

Cache side-channel attacks extract secrets by examining how victim software accesses cache. To date, practical attacks on crypto systems and media libraries have been demonstrated under different scenarios, inferring secret keys from crypto algorithms and reconstructing private media data such as images.

This work first presents eight criteria for designing a full-fledged detector for cache side-channel vulnerabilities. Then, we propose CacheQL, a novel detector that meets all of these criteria. CacheQL precisely quantifies information leaks of binary code, by characterizing the distinguishability of logged side channel traces. Moreover, CacheQL models leakage as a cooperative game, allowing information leakage to be precisely distributed to program points vulnerable to cache side channels. CacheQL is meticulously optimized to analyze whole side channel traces logged from production software (where each trace can have millions of records), and it alleviates randomness introduced by crypto blinding, ORAM, or real-world noises.

Our evaluation quantifies side-channel leaks of production crypto and media software. We further localize vulnerabilities reported by previous detectors and also identify a few hundred new vulnerable program points in recent OpenSSL (ver. 3.0.0), MbedTLS (ver. 3.0.0), and Libgcrypt (ver. 1.9.4). Many of our localized program points are within the pre-processing modules of crypto libraries, which are not analyzed by existing works due to scalability. We also localize vulnerabilities in Libjpeg (ver. 2.1.2) that leak privacy about input images.

Track 2

Authentication

Session Chair: Scott Ruoti, The University of Tennessee, Knoxville

Platinum Salon 5

InfinityGauntlet: Expose Smartphone Fingerprint Authentication to Brute-force Attack

Yu Chen and Yang Yu, Xuanwu Lab, Tencent; Lidong Zhai, Institute of Information Engineering, Chinese Academy of Sciences

Available Media

Billions of smartphone fingerprint authentications (SFA) occur daily for unlocking, privacy and payment. Existing threats to SFA include presentation attacks (PA) and some case-by-case vulnerabilities. The former need to know the victim's fingerprint information (e.g., latent fingerprints) and can be mitigated by liveness detection and security policies. The latter require additional conditions (e.g., third-party screen protector, root permission) and are only exploitable for individual smartphone models.

In this paper, we conduct the first investigation on the general zero-knowledge attack towards SFA where no knowledge about the victim is needed. We propose a novel fingerprint brute-force attack on off-the-shelf smartphones, named InfinityGauntlet. Firstly, we discover design vulnerabilities in SFA systems across various manufacturers, operating systems, and fingerprint types to achieve unlimited authentication attempts. Then, we use SPI MITM to bypass liveness detection and make automatic attempts. Finally, we customize a synthetic fingerprint generator to get a valid brute-force fingerprint dictionary.

We design and implement low-cost equipment to launch InfinityGauntlet. A proof-of-concept case study demonstrates that InfinityGauntlet can brute-force attack successfully in less than an hour without any knowledge of the victim. Additionally, empirical analysis on representative smartphones shows the scalability of our work.

A Study of Multi-Factor and Risk-Based Authentication Availability

Anthony Gavazzi, Ryan Williams, Engin Kirda, and Long Lu, Northeastern University; Andre King, Andy Davis, and Tim Leek, MIT Lincoln Laboratory

Available Media

Password-based authentication (PBA) remains the most popular form of user authentication on the web despite its long-understood insecurity. Given the deficiencies of PBA, many online services support multi-factor authentication (MFA) and/or risk-based authentication (RBA) to better secure user accounts. The security, usability, and implementations of MFA and RBA have been studied extensively, but attempts to measure their availability among popular web services have lacked breadth. Additionally, no study has analyzed MFA and RBA prevalence together or how the presence of Single-Sign-On (SSO) providers affects the availability of MFA and RBA on the web.

In this paper, we present a study of 208 popular sites in the Tranco top 5K that support account creation to understand the availability of MFA and RBA on the web, the additional authentication factors that can be used for MFA and RBA, and how logging into sites through more secure SSO providers changes the landscape of user authentication security. We find that only 42.31% of sites support any form of MFA, and only 22.12% of sites block an obvious account hijacking attempt. Though most sites do not offer MFA or RBA, SSO completely changes the picture. If one were to create an account for each site through an SSO provider that offers MFA and/or RBA, whenever available, 80.29% of sites would have access to MFA and 72.60% of sites would stop an obvious account hijacking attempt. However, this proliferation through SSO comes with a privacy trade-off, as nearly all SSO providers that support MFA and RBA are major third-party trackers.
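
The percentages above can be translated back into site counts. The short sketch below does that arithmetic for the 208 studied sites. The counts are inferred here by rounding and are not stated in the abstract; the dictionary labels are shorthand introduced for this example.

```python
TOTAL_SITES = 208

# Percentages as reported in the abstract; the labels are shorthand for this sketch.
REPORTED_PERCENT = {
    "MFA offered by the site": 42.31,
    "hijacking attempt blocked by the site": 22.12,
    "MFA available via SSO": 80.29,
    "hijacking attempt blocked via SSO": 72.60,
}


def implied_count(percent, total):
    """Nearest whole number of sites matching a reported percentage."""
    return round(percent / 100 * total)


IMPLIED_COUNTS = {
    label: implied_count(pct, TOTAL_SITES)
    for label, pct in REPORTED_PERCENT.items()
}

for label, count in IMPLIED_COUNTS.items():
    # Recompute the percentage from the inferred count as a consistency check.
    print(f"{label}: {count}/{TOTAL_SITES} = {100 * count / TOTAL_SITES:.2f}%")
```

Under this reading, roughly 88 sites offer MFA directly and 167 do once SSO providers are taken into account, which makes the size of the SSO effect easier to see than the percentages alone.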

A Large-Scale Measurement of Website Login Policies

Suood Al Roomi, Georgia Institute of Technology, Kuwait University; Frank Li, Georgia Institute of Technology

Available Media

Authenticating on a website using a password involves a multi-stage login process, where each stage entails critical policy and implementation decisions that impact login security and usability. While the security community has identified best practices for each stage of the login workflow, we currently lack a broad understanding of website login policies in practice. Prior work relied upon manual inspection of websites, producing evaluations of only a small population of sites skewed towards the most popular ones.

In this work, we seek to provide a more comprehensive and systematic picture of real-world website login policies. We develop an automated method for inferring website login policies and apply it to domains across the Google CrUX Top 1 Million. We successfully evaluate the login policies on between 18K and 359K sites (varying depending on the login stage considered), providing characterization of a population two to three orders of magnitude larger than previous studies. Our findings reveal the extent to which insecure login policies exist and identify some underlying causes. Ultimately, our study provides the most comprehensive empirical grounding to date on the state of website login security, shedding light on directions for improving online authentication.

Security and Privacy Failures in Popular 2FA Apps

Conor Gilsenan, UC Berkeley / ICSI; Fuzail Shakir and Noura Alomar, UC Berkeley; Serge Egelman, UC Berkeley / ICSI

Available Media

The Time-based One-Time Password (TOTP) algorithm is a 2FA method that is widely deployed because of its relatively low implementation costs and purported security benefits over SMS 2FA. However, users of TOTP 2FA apps face a critical usability challenge: maintain access to the secrets stored within the TOTP app, or risk getting locked out of their accounts. To help users avoid this fate, popular TOTP apps implement a wide range of backup mechanisms, each with varying security and privacy implications. In this paper, we define an assessment methodology for conducting systematic security and privacy analyses of the backup and recovery functionality of TOTP apps. We identified all general purpose Android TOTP apps in the Google Play Store with at least 100k installs that implemented a backup mechanism (n = 22). Our findings show that most backup strategies end up placing trust in the same technologies that TOTP 2FA is meant to supersede: passwords, SMS, and email. Many backup implementations shared personal user information with third parties, had serious cryptographic flaws, and/or allowed the app developers to access the TOTP secrets in plaintext. We present our findings and recommend ways to improve the security and privacy of TOTP 2FA app backup mechanisms.
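
For readers unfamiliar with TOTP, the following minimal sketch shows why the stored secret matters so much for backups: every code is a deterministic function of the shared secret and the current time. It follows the public TOTP and HOTP specifications (RFC 6238 and RFC 4226) using HMAC-SHA-1 and is not taken from the paper or from any of the studied apps. The function and variable names are chosen for this example.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA-1 over an 8-byte big-endian counter,
    followed by dynamic truncation to a short decimal code."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte select a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the number of elapsed time steps."""
    return hotp(secret, unix_time // step, digits)


# Test secret published in the RFCs; real secrets are random and provisioned per account.
RFC_TEST_SECRET = b"12345678901234567890"
EXAMPLE_CODE = totp(RFC_TEST_SECRET, 59, digits=8)
```

Anyone who obtains the secret, for example from a backup that is stored or transmitted in plaintext, can compute the same codes as the legitimate user at any time.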

Multi-Factor Key Derivation Function (MFKDF) for Fast, Flexible, Secure, & Practical Key Management

Vivek Nair and Dawn Song, University of California, Berkeley

Available Media

We present the first general construction of a Multi-Factor Key Derivation Function (MFKDF). Our function expands upon password-based key derivation functions (PBKDFs) with support for using other popular authentication factors like TOTP, HOTP, and hardware tokens in the key derivation process. In doing so, it provides an exponential security improvement over PBKDFs with less than 12 ms of additional computational overhead in a typical web browser. We further present a threshold MFKDF construction, allowing for client-side key recovery and reconstitution if a factor is lost. Finally, by "stacking" derived keys, we provide a means of cryptographically enforcing arbitrarily specific key derivation policies. The result is a paradigm shift toward direct cryptographic protection of user data using all available authentication factors, with no noticeable change to the user experience. We demonstrate the ability of our solution to not only significantly improve the security of existing systems implementing PBKDFs, but also to enable new applications where PBKDFs would not be considered a feasible approach.

Track 3

Private Data Leaks

Session Chair: Rahul Chatterjee, UW–Madison

Platinum Salon 7–8

Log: It’s Big, It’s Heavy, It’s Filled with Personal Data! Measuring the Logging of Sensitive Information in the Android Ecosystem

Allan Lyons, University of Calgary; Julien Gamba, IMDEA Networks Institute and Universidad Carlos III de Madrid; Austin Shawaga, University of Calgary; Joel Reardon, University of Calgary and AppCensus, Inc.; Juan Tapiador, Universidad Carlos III de Madrid; Serge Egelman, ICSI and UC Berkeley and AppCensus, Inc.; Narseo Vallina-Rodriguez, IMDEA Networks Institute and AppCensus, Inc.

Available Media

Android offers a shared system that multiplexes all logged data from all system components, including both the operating system and the console output of apps that run on it. A security mechanism ensures that user-space apps can only read the log entries that they create, though many "privileged" apps are exempt from this restriction. This includes preloaded system apps provided by Google, the phone manufacturer, the cellular carrier, as well as those sharing the same signature. Consequently, Google advises developers to not log sensitive information to the system log.

In this work, we examined the logging of sensitive data in the Android ecosystem. Using a field study, we show that most devices log some amount of user-identifying information. We show that the logging of "activity" names can inadvertently reveal information about users through their app usage. We also tested whether different smartphones log personal identifiers by default, examined preinstalled apps that access the system logs, and analyzed the privacy policies of manufacturers that report collecting system logs.

CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot

Liang Niu and Shujaat Mirza, New York University; Zayd Maradni and Christina Pöpper, New York University Abu Dhabi

Available Media

Code generation language models are trained on billions of lines of source code to provide code generation and auto-completion features, like those offered by code assistant GitHub Copilot with more than a million users. These datasets may contain sensitive personal information—personally identifiable, private, or secret—that these models may regurgitate.

This paper introduces and evaluates a semi-automated pipeline for extracting sensitive personal information from the Codex model used in GitHub Copilot. We employ carefully-designed templates to construct prompts that are more likely to result in privacy leaks. To overcome the non-public training data, we propose a semi-automated filtering method using a blind membership inference attack. We validate the effectiveness of our membership inference approach on different code generation models. We utilize hit rate through the GitHub Search API as a distinguishing heuristic followed by human-in-the-loop evaluation, uncovering that approximately 8% (43) of the prompts yield privacy leaks. Notably, we observe that the model tends to produce indirect leaks, compromising privacy as contextual integrity by generating information from individuals closely related to the queried subject in the training corpus.

Freaky Leaky SMS: Extracting User Locations by Analyzing SMS Timings

Evangelos Bitsikas, Northeastern University; Theodor Schnitzler, Research Center Trustworthy Data Science and Security; Christina Pöpper, New York University Abu Dhabi; Aanjhan Ranganathan, Northeastern University

Available Media

Short Message Service (SMS) remains one of the most popular communication channels since its introduction in 2G cellular networks. In this paper, we demonstrate that merely receiving silent SMS messages regularly opens a stealthy side-channel that allows other regular network users to infer the whereabouts of the SMS recipient. The core idea is that receiving an SMS inevitably generates Delivery Reports whose reception bestows a timing attack vector at the sender. We conducted experiments across various countries, operators, and devices to show that an attacker can deduce the location of an SMS recipient by analyzing timing measurements from typical receiver locations. Our results show that, after training an ML model, the SMS sender can accurately determine multiple locations of the recipient. For example, our model achieves up to 96% accuracy for locations across different countries, and 86% for two locations within Belgium. Due to the way cellular networks are designed, it is difficult to prevent Delivery Reports from being returned to the originator, making it challenging to thwart this covert attack without making fundamental changes to the network architecture.

The Writing on the Wall and 3D Digital Twins: Personal Information in (not so) Private Real Estate

Rachel McAmis and Tadayoshi Kohno, University of Washington

Available Media

Online real estate companies are starting to offer 3D virtual tours of homes (3D digital twins). We qualitatively analyzed 44 3D home tours with personal artifacts visible on Zillow and assessed each home for the extent and type of personal information shared. Using a codebook we created, we analyzed three categories of personal information in each home: government-provided guidance of what not to share on the internet, identity information, and behavioral information. Our analysis unearthed a wide variety of sensitive information across all homes, including names, hobbies, employment and education history, product preferences (e.g., pantry items, types of cigarettes), medications, credit card numbers, passwords, and more. Based on our analysis, residents both employed privacy protections and had privacy oversights. We identify potential adversaries that might use 3D tour information, highlight additional sensitive sources of indoor space information, and discuss future tools and policy changes that could address these issues.

Track 4

Generative AI

Session Chair: Lea Schönherr, CISPA

Platinum Salon 9–10

Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models

Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, and Ben Y. Zhao, University of Chicago

Distinguished Paper Award Winner and Co-Winner of the 2023 Internet Defense Prize

Available Media

Recent text-to-image diffusion models such as MidJourney and Stable Diffusion threaten to displace many in the professional artist community. In particular, models can learn to mimic the artistic style of specific artists after "fine-tuning" on samples of their art. In this paper, we describe the design, implementation and evaluation of Glaze, a tool that enables artists to apply "style cloaks" to their art before sharing online. These cloaks apply barely perceptible perturbations to images, and when used as training data, mislead generative models that try to mimic a specific artist. In coordination with the professional artist community, we deploy user studies to more than 1000 artists, assessing their views of AI art, as well as the efficacy of our tool, its usability and tolerability of perturbations, and robustness across different scenarios and against adaptive countermeasures. Both surveyed artists and empirical CLIP-based scores show that even at low perturbation levels (p=0.05), Glaze is highly successful at disrupting mimicry under normal conditions (>92%) and against adaptive countermeasures (>85%).

Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants

Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Siddharth Garg, and Brendan Dolan-Gavitt, New York University

Available Media

Large Language Models (LLMs) such as OpenAI Codex are increasingly being used as AI-based coding assistants. Understanding the impact of these tools on developers’ code is paramount, especially as recent work showed that LLMs may suggest cybersecurity vulnerabilities. We conduct a security-driven user study (N=58) to assess code written by student programmers when assisted by LLMs. Given the potential severity of low-level bugs as well as their relative frequency in real-world projects, we tasked participants with implementing a singly-linked ‘shopping list’ structure in C. Our results indicate that the security impact in this setting (low-level C with pointer and array manipulations) is small: AI-assisted users produce critical security bugs at a rate no greater than 10% more than the control, indicating the use of LLMs does not introduce new security risks.

Two-in-One: A Model Hijacking Attack Against Text Generation Models

Wai Man Si, Michael Backes, and Yang Zhang, CISPA Helmholtz Center for Information Security; Ahmed Salem, Microsoft

Available Media

Machine learning has progressed significantly in various applications ranging from face recognition to text generation. However, its success has been accompanied by different attacks. Recently a new attack has been proposed which raises both accountability and parasitic computing risks, namely the model hijacking attack. Nevertheless, this attack has only focused on image classification tasks. In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability. More concretely, we propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones, e.g., language translation, text summarization, and language modeling. We use a range of text benchmark datasets such as SST-2, TweetEval, AGnews, QNLI, and IMDB to evaluate the performance of our attacks. Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.

PTW: Pivotal Tuning Watermarking for Pre-Trained Image Generators

Nils Lukas and Florian Kerschbaum, University of Waterloo

Available Media

Deepfakes refer to content synthesized using deep generators, which, when misused, have the potential to erode trust in digital media. Synthesizing high-quality deepfakes requires access to large and complex generators only a few entities can train and provide. The threat is malicious users that exploit access to the provided model and generate harmful deepfakes without risking detection. Watermarking makes deepfakes detectable by embedding an identifiable code into the generator that is later extractable from its generated images. We propose Pivotal Tuning Watermarking (PTW), a method for watermarking pre-trained generators (i) three orders of magnitude faster than watermarking from scratch and (ii) without the need for any training data. We improve existing watermarking methods and scale to generators 4× larger than related work. PTW can embed longer codes than existing methods while better preserving the generator's image quality. We propose rigorous, game-based definitions for robustness and undetectability and our study reveals that watermarking is not robust against an adaptive white-box attacker who has control over the generator's parameters. We propose an adaptive attack that can successfully remove any watermarking with access to only 200 non-watermarked images. Our work challenges the trustworthiness of watermarking for deepfake detection when the parameters of a generator are available.

Track 5

Security Worker Perspectives

Session Chair: Daniel Votipka, Tufts University

Platinum Salon 3–4

Lessons Lost: Incident Response in the Age of Cyber Insurance and Breach Attorneys

Daniel W. Woods, University of Edinburgh; Rainer Böhme, University of Innsbruck; Josephine Wolff, Tufts University; Daniel Schwarcz, University of Minnesota

Available Media

Incident Response (IR) allows victim firms to detect, contain, and recover from security incidents. It should also help the wider community avoid similar attacks in the future. In pursuit of these goals, technical practitioners are increasingly influenced by stakeholders like cyber insurers and lawyers. This paper explores these impacts via a multi-stage, mixed methods research design that involved 69 expert interviews, data on commercial relationships, and an online validation workshop. The first stage of our study established 11 stylized facts that describe how cyber insurance sends work to a small number of IR firms, drives down the fee paid, and appoints lawyers to direct technical investigators. The second stage showed that lawyers, when directing incident response, often: introduce legalistic contractual and communication steps that slow down incident response; advise IR practitioners not to write down remediation steps or to produce formal reports; and restrict access to any documents produced.

Bug Hunters’ Perspectives on the Challenges and Benefits of the Bug Bounty Ecosystem

Omer Akgul, University of Maryland; Taha Eghtesad, Pennsylvania State University; Amit Elazari, University of California, Berkeley; Omprakash Gnawali, University of Houston; Jens Grossklags, Technical University of Munich; Michelle L. Mazurek, University of Maryland; Daniel Votipka, Tufts University; Aron Laszka, Pennsylvania State University

Distinguished Paper Award Winner

Available Media

Although researchers have characterized the bug-bounty ecosystem from the point of view of platforms and programs, minimal effort has been made to understand the perspectives of the main workers: bug hunters. To improve bug bounties, it is important to understand hunters’ motivating factors, challenges, and overall benefits. We address this research gap with three studies: identifying key factors through a free listing survey (n=56), rating each factor’s importance with a larger-scale factor-rating survey (n=159), and conducting semi-structured interviews to uncover details (n=24). Of 54 factors that bug hunters listed, we find that rewards and learning opportunities are the most important benefits. Further, we find scope to be the top differentiator between programs. Surprisingly, we find earning reputation to be one of the least important motivators for hunters. Of the challenges we identify, communication problems, such as unresponsiveness and disputes, are the most substantial. We present recommendations to make the bug-bounty ecosystem accommodating to more bug hunters and ultimately increase participation in an underutilized market.

Work-From-Home and COVID-19: Trajectories of Endpoint Security Management in a Security Operations Center

Kailani R. Jones and Dalton A. Brucker-Hahn, University of Kansas; Bradley Fidler, Independent Researcher; Alexandru G. Bardas, University of Kansas

Available Media

The COVID-19 surge of "Work From Home" (WFH) Internet use incentivized many organizations to strengthen their endpoint security monitoring capabilities. This trend has significant implications for how Security Operations Centers (SOCs) manage these end devices on their enterprise networks: in their organizational roles, regulatory environment, and required skills. By intersecting historical analysis (starting in the 1970s) and ethnography (analyzed 352 field notes across 1,000+ hours in a SOC over 34 months) whilst complementing with quantitative interviews (covering 7 other SOCs), we uncover causal forces that have pushed network management toward endpoints. We further highlight the negative impacts on end user privacy and analyst burnout. As such, we assert that SOCs should consider preparing for a continual, long-term shift from managing the network perimeter and the associated devices to commanding the actual user endpoints while facing potential privacy challenges and more burnout.

“Employees Who Don’t Accept the Time Security Takes Are Not Aware Enough”: The CISO View of Human-Centred Security

Jonas Hielscher and Uta Menges, Ruhr University Bochum; Simon Parkin, TU Delft; Annette Kluge and M. Angela Sasse, Ruhr University Bochum

Available Media

In larger organisations, the security controls and policies that protect employees are typically managed by a Chief Information Security Officer (CISO). In research, industry, and policy, there are increasing efforts to relate principles of human behaviour interventions and influence to the practice of the CISO, despite these being complex disciplines in their own right. Here we explore how well the concepts of human-centred security (HCS) have survived exposure to the needs of practice: in an action research approach we engaged with n=30 members of a Swiss-based community of CISOs in five workshop sessions over the course of 8 months, dedicated to discussing HCS. We coded and analysed over 25 hours of notes we took during the discussions. We found that CISOs first and foremost perceive HCS as what is available on the market, namely awareness and phishing simulations. While they regularly shift responsibility either to the management (by demanding more support) or to the employees (by blaming them), we see a lack of power but also silo-thinking that prevents CISOs from considering actual human behaviour and friction that security causes for employees. We conclude that industry best practices and the state-of-the-art in HCS research are not aligned.

Track 6

Deep Thoughts on Deep Learning

Session Chair: Gang Wang, University of Illinois at Urbana–Champaign

Platinum Salon 1–2

Aegis: Mitigating Targeted Bit-flip Attacks against Deep Neural Networks

Jialai Wang, Tsinghua University; Ziyuan Zhang, Beijing University of Posts and Telecommunications; Meiqi Wang, Tsinghua University; Han Qiu, Tsinghua University and Zhongguancun Laboratory; Tianwei Zhang, Nanyang Technological University; Qi Li, Tsinghua University and Zhongguancun Laboratory; Zongpeng Li, Tsinghua University and Hangzhou Dianzi University; Tao Wei, Ant Group; Chao Zhang, Tsinghua University and Zhongguancun Laboratory

Available Media

Bit-flip attacks (BFAs) have attracted substantial attention recently, in which an adversary could tamper with a small number of model parameter bits to break the integrity of DNNs. To mitigate such threats, a batch of defense methods are proposed, focusing on the untargeted scenarios. Unfortunately, they either require extra trustworthy applications or make models more vulnerable to targeted BFAs. Countermeasures against targeted BFAs, stealthier and more purposeful by nature, are far from well established.

In this work, we propose Aegis, a novel defense method to mitigate targeted BFAs. The core observation is that existing targeted attacks focus on flipping critical bits in certain important layers. Thus, we design a dynamic-exit mechanism to attach extra internal classifiers (ICs) to hidden layers. This mechanism enables input samples to early-exit from different layers, which effectively upsets the adversary's attack plans. Moreover, the dynamic-exit mechanism randomly selects ICs for predictions during each inference to significantly increase the attack cost for the adaptive attacks where all defense mechanisms are transparent to the adversary. We further propose a robustness training strategy to adapt ICs to the attack scenarios through simulating BFAs during the IC training phase, to increase model robustness. Extensive evaluations over four well-known datasets and two popular DNN structures reveal that Aegis could effectively mitigate different state-of-the-art targeted attacks, reducing attack success rate by 5-10x, significantly outperforming existing defense methods. We open source the code of Aegis.

Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation

Yifan Yan, Xudong Pan, Mi Zhang, and Min Yang, Fudan University

Available Media

Copyright protection for deep neural networks (DNNs) is an urgent need for AI corporations. To trace illegally distributed model copies, DNN watermarking is an emerging technique for embedding and verifying secret identity messages in the prediction behaviors or the model internals. Sacrificing less functionality and involving more knowledge about the target DNN, the latter branch called white-box DNN watermarking is believed to be accurate, credible and secure against most known watermark removal attacks, with emerging research efforts in both the academy and the industry.

In this paper, we present the first systematic study on how the mainstream white-box DNN watermarks are commonly vulnerable to neural structural obfuscation with dummy neurons, a group of neurons which can be added to a target model but leave the model behavior invariant. Devising a comprehensive framework to automatically generate and inject dummy neurons with high stealthiness, our novel attack intensively modifies the architecture of the target model to inhibit the success of watermark verification. With extensive evaluation, our work for the first time shows that nine published watermarking schemes require amendments to their verification procedures.
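
To make the idea of a dummy neuron concrete, the sketch below adds a hidden neuron whose outgoing weights are all zero to a tiny two-layer ReLU network and checks that the outputs do not change. This is the simplest conceivable behavior-preserving insertion, chosen here for illustration. The paper's generation framework aims for high stealthiness and is not reproduced; all function names and weights are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def forward(x, w1, b1, w2, b2):
    """Two-layer MLP: hidden = relu(w1 @ x + b1), output = w2 @ hidden + b2."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def add_dummy_neuron(w1, b1, w2, incoming, bias):
    """Append one hidden neuron with arbitrary incoming weights and bias.
    Its outgoing weights are zero, so it cannot influence any output."""
    new_w1 = [list(row) for row in w1] + [list(incoming)]
    new_b1 = list(b1) + [bias]
    new_w2 = [list(row) + [0.0] for row in w2]
    return new_w1, new_b1, new_w2


# Hypothetical toy model: 2 inputs, 2 hidden neurons, 1 output.
W1, B1 = [[0.5, -1.0], [1.5, 2.0]], [0.1, -0.2]
W2, B2 = [[1.0, -0.5]], [0.3]

W1_OBF, B1_OBF, W2_OBF = add_dummy_neuron(W1, B1, W2, incoming=[3.0, -4.0], bias=1.0)

SAMPLE = [0.7, -0.3]
ORIGINAL_OUT = forward(SAMPLE, W1, B1, W2, B2)
OBFUSCATED_OUT = forward(SAMPLE, W1_OBF, B1_OBF, W2_OBF, B2)
```

The obfuscated model has a different hidden-layer shape and different weight matrices, yet computes the same function, which is why a verification procedure that reads a watermark from fixed positions in the model internals can be disrupted.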

PELICAN: Exploiting Backdoors of Naturally Trained Deep Learning Models In Binary Code Analysis

Zhuo Zhang, Guanhong Tao, Guangyu Shen, Shengwei An, Qiuling Xu, Yingqi Liu, and Yapeng Ye, Purdue University; Yaoxuan Wu, University of California, Los Angeles; Xiangyu Zhang, Purdue University

Available Media

Deep Learning (DL) models are increasingly used in many cyber-security applications and achieve superior performance compared to traditional solutions. In this paper, we study backdoor vulnerabilities in naturally trained models used in binary analysis. These backdoors are not injected by attackers but rather products of defects in datasets and/or training processes. The attacker can exploit these vulnerabilities by injecting some small fixed input pattern (e.g., an instruction) called backdoor trigger to their input (e.g., a binary code snippet for a malware detection DL model) such that misclassification can be induced (e.g., the malware evades the detection). We focus on transformer models used in binary analysis. Given a model, we leverage a trigger inversion technique particularly designed for these models to derive trigger instructions that can induce misclassification. During attack, we utilize a novel trigger injection technique to insert the trigger instruction(s) to the input binary code snippet. The injection makes sure that the code snippets' original program semantics are preserved and the trigger becomes an integral part of such semantics and hence cannot be easily eliminated. We evaluate our prototype PELICAN on 5 binary analysis tasks and 15 models. The results show that PELICAN can effectively induce misclassification on all the evaluated models in both white-box and black-box scenarios. Our case studies demonstrate that PELICAN can exploit the backdoor vulnerabilities of two closed-source commercial tools.

IvySyn: Automated Vulnerability Discovery in Deep Learning Frameworks

Neophytos Christou, Di Jin, and Vaggelis Atlidakis, Brown University; Baishakhi Ray, Columbia University; Vasileios P. Kemerlis, Brown University

Available Media

We present IvySyn, the first fully-automated framework for discovering memory error vulnerabilities in Deep Learning (DL) frameworks. IvySyn leverages the statically-typed nature of native APIs in order to automatically perform type-aware mutation-based fuzzing on low-level kernel code. Given a set of offending inputs that trigger memory safety (and runtime) errors in low-level, native DL (C/C++) code, IvySyn automatically synthesizes code snippets in high-level languages (e.g., in Python), which propagate error-triggering input via high(er)-level APIs. Such code snippets essentially act as "Proof of Vulnerability", as they demonstrate the existence of bugs in native code that an attacker can target through various high-level APIs. Our evaluation shows that IvySyn significantly outperforms past approaches, both in terms of efficiency and effectiveness, in finding vulnerabilities in popular DL frameworks. Specifically, we used IvySyn to test TensorFlow and PyTorch. Although still an early prototype, IvySyn has already helped the TensorFlow and PyTorch framework developers to identify and fix 61 previously-unknown security vulnerabilities, and assign 39 unique CVEs.

6:00 pm–7:30 pm

Symposium Reception and Presentation of the USENIX Lifetime Achievement Award

Grand Salons E–F

Mingle with fellow attendees at the USENIX Security '23 Reception, featuring dinner, drinks, and the chance to connect with other attendees, speakers, and symposium organizers.

7:30 pm–8:30 pm

Lightning Talks

Platinum Salon 5

We will host a Lightning Talks session (also previously known as Work-in-Progress/Rump session) on the evening of Wednesday, August 9, 2023. This is intended as an informal session of short and engaging presentations on recent unpublished results, work in progress, or other topics of interest to USENIX Security attendees. As in the past, talks do not always need to be serious and funny talks are encouraged! For full consideration, submit your lightning talk via the lightning talk submission form, through Wednesday, July 26, 2023, 11:59 pm AoE. You can continue submitting talks via the submission form until Monday, August 7, 2023, 12:00 pm PDT. However, due to time, there is no guarantee of full consideration after the initial deadline.

Thursday, August 10

8:00 am–9:00 am

Continental Breakfast

Platinum Foyer

9:00 am–10:15 am

Track 1

Smart? Assistants

Session Chair: Habiba Farrukh, Purdue University

Platinum Salon 6

Hey Kimya, Is My Smart Speaker Spying on Me? Taking Control of Sensor Privacy Through Isolation and Amnesia

Piet De Vaere and Adrian Perrig, ETH Zürich

Available Media

Although smart speakers and other voice assistants are becoming increasingly ubiquitous, their always-standby nature continues to prompt significant privacy concerns. To address these, we propose Kimya, a hardening framework that allows device vendors to provide strong data-privacy guarantees. Concretely, Kimya guarantees that microphone data can only be used for local processing, and is immediately discarded unless a user-auditable notification is generated. Kimya thus makes devices accountable for their data-retention behavior. Moreover, Kimya is not limited to voice assistants, but is applicable to all devices with always-standby, event-triggered sensors. We implement Kimya for ARM Cortex-M, and apply it to a wake-word detection engine. Our evaluation shows that Kimya introduces low overhead, can be used in constrained environments, and does not require hardware modifications.

Spying through Your Voice Assistants: Realistic Voice Command Fingerprinting

Dilawer Ahmed, Aafaq Sabir, and Anupam Das, North Carolina State University

Available Media

Voice assistants are becoming increasingly pervasive due to the convenience and automation they provide through the voice interface. However, such convenience often comes with unforeseen security and privacy risks. For example, encrypted traffic from voice assistants can leak sensitive information about their users' habits and lifestyles. In this paper, we present a taxonomy of fingerprinting voice commands on the most popular voice assistant platforms (Google, Alexa, and Siri). We also provide a deeper understanding of the feasibility of fingerprinting third-party applications and streaming services over the voice interface. Our analysis not only improves the state-of-the-art technique but also studies a more realistic setup for fingerprinting voice activities over encrypted traffic. Our proposed technique considers a passive network eavesdropper observing encrypted traffic from various devices within a home and, therefore, first detects the invocation/activation of voice assistants followed by what specific voice command is issued. Using an end-to-end system design, we show that it is possible to detect when a voice assistant is activated with 99% accuracy and then utilize the subsequent traffic pattern to infer more fine-grained user activities with around 77-80% accuracy.

QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems

Guangke Chen, Yedi Zhang, and Zhe Zhao, ShanghaiTech University; Fu Song, ShanghaiTech University; Automotive Software Innovation Center; Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences

Available Media

Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS, thus still falling behind practical attacks against proprietary commercial APIs and voice-controlled devices. To fill this gap, we propose QFA2SR, an effective and imperceptible query-free black-box attack, by leveraging the transferability of adversarial voices. To improve transferability, we present three novel methods: tailored loss functions, SRS ensemble, and time-freq corrosion. The first one tailors loss functions to different attack scenarios. The latter two augment surrogate SRSs in two different ways. SRS ensemble combines diverse surrogate SRSs with new strategies, amenable to the unique scoring characteristics of SRSs. Time-freq corrosion augments surrogate SRSs by incorporating well-designed time-/frequency-domain modification functions, which simulate and approximate the decision boundary of the target SRS and distortions introduced during over-the-air attacks. QFA2SR boosts the targeted transferability by 20.9%-70.7% on four popular commercial APIs (Microsoft Azure, iFlytek, Jingdong, and TalentedSoft), significantly outperforming existing attacks in the query-free setting, with negligible effect on the imperceptibility. QFA2SR is also highly effective when launched over the air against three widespread voice assistants (Google Assistant, Apple Siri, and TMall Genie) with 60%, 46%, and 70% targeted transferability, respectively.

Learning Normality is Enough: A Software-based Mitigation against Inaudible Voice Attacks

Xinfeng Li, Xiaoyu Ji, and Chen Yan, USSLAB, Zhejiang University; Chaohao Li, USSLAB, Zhejiang University and Hangzhou Hikvision Digital Technology Co., Ltd.; Yichen Li, Hong Kong University of Science and Technology; Zhenning Zhang, University of Illinois at Urbana-Champaign; Wenyuan Xu, USSLAB, Zhejiang University

Available Media

Inaudible voice attacks silently inject malicious voice commands into voice assistants to manipulate voice-controlled devices such as smart speakers. To alleviate such threats for both existing and future devices, this paper proposes NormDetect, a software-based mitigation that can be instantly applied to a wide range of devices without requiring any hardware modification. To overcome the challenge that the attack patterns vary between devices, we design a universal detection model that does not rely on audio features or samples derived from specific devices. Unlike existing studies’ supervised learning approach, we adopt unsupervised learning inspired by anomaly detection. Though the patterns of inaudible voice attacks are diverse, we find that benign audios share similar patterns in the time-frequency domain. Therefore, we can detect the attacks (the anomaly) by learning the patterns of benign audios (the normality). NormDetect maps spectrum features to a low-dimensional space, performs similarity queries, and replaces them with the standard feature embeddings for spectrum reconstruction. This results in a more significant reconstruction error for attacks than normality. Evaluation based on the 383,320 test samples we collected from 24 smart devices shows an average AUC of 99.48% and EER of 2.23%, suggesting the effectiveness of NormDetect in detecting inaudible voice attacks.

Powering for Privacy: Improving User Trust in Smart Speaker Microphones with Intentional Powering and Perceptible Assurance

Youngwook Do and Nivedita Arora, Georgia Institute of Technology; Ali Mirzazadeh and Injoo Moon, Georgia Institute of Technology and Massachusetts Institute of Technology; Eryue Xu, Georgia Institute of Technology; Zhihan Zhang, Georgia Institute of Technology and University of Washington; Gregory D. Abowd, Georgia Institute of Technology and Northeastern University; Sauvik Das, Georgia Institute of Technology and Carnegie Mellon University

Available Media

Smart speakers come with always-on microphones to facilitate voice-based interaction. To address user privacy concerns, existing devices come with a number of privacy features: e.g., mute buttons and local trigger-word detection modules. But it is difficult for users to trust that these manufacturer-provided privacy features actually work given that there is a misalignment of incentives: Google, Meta, and Amazon benefit from collecting personal data and users know it. What's needed is perceptible assurance — privacy features that users can, through physical perception, verify actually work. To that end, we introduce, implement, and evaluate the idea of "intentionally-powered" microphones to provide users with perceptible assurance of privacy with smart speakers. We employed an iterative-design process to develop Candid Mic, a battery-free, wireless microphone that can only be powered by harvesting energy from intentional user interactions. Moreover, users can visually inspect the (dis)connection between the energy harvesting module and the microphone. Through a within-subjects experiment, we found that Candid Mic provides users with perceptible assurance about whether the microphone is capturing audio or not, and improves user trust in using smart speakers relative to mute button interfaces.

Track 2

Security-Adjacent Worker Perspectives

Session Chair: Mary Ellen Zurko, MIT Lincoln Laboratory

Platinum Salon 5

To Cloud or not to Cloud: A Qualitative Study on Self-Hosters' Motivation, Operation, and Security Mindset

Lea Gröber, CISPA Helmholtz Center for Information Security and Saarland University; Rafael Mrowczynski, CISPA Helmholtz Center for Information Security; Nimisha Vijay and Daphne A. Muller, Nextcloud; Adrian Dabrowski and Katharina Krombholz, CISPA Helmholtz Center for Information Security

Available Media

Despite readily available cloud services, some people decide to self-host internal or external services for themselves or their organization. In doing so, a broad spectrum of commercial, institutional, and private self-hosters take responsibility for their data, security, and reliability of their operations. Currently, little is known about what motivates these self-hosters, how they operate and secure their services, and which challenges they face. To improve the understanding of self-hosters' security mindsets and practices, we conducted a large-scale survey (N=994) with users of a popular self-hosting suite and in-depth follow-up interviews with selected commercial, non-profit, and private users (N=41). We found exemplary behavior in all user groups; however, we also found a significant part of self-hosters who approach security in an unstructured way, regardless of social or organizational embeddedness. Vague catch-all concepts such as firewalls and backups dominate the landscape, without proper reflection on the threats they help mitigate. At times, self-hosters engage in creative tactics to compensate for a potential lack of expertise or experience.

“I wouldn't want my unsafe code to run my pacemaker”: An Interview Study on the Use, Comprehension, and Perceived Risks of Unsafe Rust

Sandra Höltervennhoff, Leibniz University Hannover; Philip Klostermeyer and Noah Wöhler, CISPA Helmholtz Center for Information Security; Yasemin Acar, Paderborn University, George Washington University; Sascha Fahl, CISPA Helmholtz Center for Information Security

Available Media

Modern software development still struggles with memory safety issues as a significant source of security bugs. The Rust programming language addresses memory safety and provides further security features. However, Rust offers developers the ability to opt out of some of these guarantees using unsafe Rust. Previous work found that the source of many security vulnerabilities is unsafe Rust.

In this paper, we are the first to see behind the curtain and investigate developers' motivations for, experiences with, and risk assessment of using unsafe Rust in depth. Therefore, we conducted 26 semi-structured interviews with experienced Rust developers. We find that developers aim to use unsafe Rust sparingly and with caution. However, we also identify common misconceptions and tooling fatigue that can lead to security issues, find that security policies for using unsafe Rust are widely missing and that participants underestimate the security risks of using unsafe Rust.

We conclude our work by discussing the findings and recommendations for making the future use of unsafe Rust more secure.

Pushed by Accident: A Mixed-Methods Study on Strategies of Handling Secret Information in Source Code Repositories

Alexander Krause, CISPA Helmholtz Center for Information Security; Jan H. Klemmer and Nicolas Huaman, Leibniz University Hannover; Dominik Wermke, CISPA Helmholtz Center for Information Security; Yasemin Acar, Paderborn University, George Washington University; Sascha Fahl, CISPA Helmholtz Center for Information Security

Available Media

Version control systems for source code, such as Git, are key tools in modern software development. Many developers use services like GitHub or GitLab for collaborative software development. Many software projects include code secrets such as API keys or passwords that need to be managed securely. Previous research and blog posts found that developers struggle with secure code secret management and accidentally leaked code secrets to public Git repositories. Leaking code secrets to the public can have disastrous consequences, such as abusing services and systems or making sensitive user data available to attackers. In a mixed-methods study, we surveyed 109 developers with version control system experience. Additionally, we conducted 14 in-depth semi-structured interviews with developers who experienced secret leakage in the past. 30.3% of our participants encountered code secret leaks in the past. Most of them face several challenges with secret leakage prevention and remediation. Based on our findings, we discuss challenges, such as estimating the risks of leaked secrets, and the needs of developers in remediating and preventing code secret leaks, such as low adoption requirements. We conclude with recommendations for developers and source code platform providers to reduce the risk of secret leakage.

A Mixed-Methods Study of Security Practices of Smart Contract Developers

Tanusree Sharma, Zhixuan Zhou, Andrew Miller, and Yang Wang, University of Illinois at Urbana-Champaign

Available Media

Smart contracts are self-executing programs that run on blockchains (e.g., Ethereum). While security is a key concern for smart contracts, it is unclear how smart contract developers approach security. To help fill this research gap, we conducted a mixed-methods study of smart contract developers, including interviews and a code review task with 29 developers and an online survey with 171 valid respondents. Our findings show various smart contract security perceptions and practices, including the usage of different tools and resources. Overall, the majority of our participants did not consider security as a priority in their smart contract development. In addition, the security vulnerability identification rates in our code review tasks were alarmingly low (often lower than 50%) across different vulnerabilities and regardless of our participants' years of experience in smart contract development. We discuss how future education and tools could better support developers in ensuring smart contract security.

The Role of Professional Product Reviewers in Evaluating Security and Privacy

Wentao Guo, Jason Walter, and Michelle L. Mazurek, University of Maryland

Available Media

Consumers who use Internet-connected products are often exposed to security and privacy vulnerabilities that they lack time or expertise to evaluate themselves. Can professional product reviewers help by evaluating security and privacy on their behalf? We conducted 17 interviews with product reviewers about their procedures, incentives, and assumptions regarding security and privacy. We find that reviewers have some incentives to evaluate security and privacy, but they also face substantial disincentives and challenges, leading them to consider a limited set of relevant criteria and threat models. We recommend future work to help product reviewers provide useful advice to consumers in ways that align with reviewers' business models and incentives. These include developing usable resources and tools, as well as validating the heuristics they use to judge security and privacy expediently.

Track 3

Censorship and Internet Freedom

Session Chair: Rob Jansen, U.S. Naval Research Laboratory

Platinum Salon 7–8

Network Responses to Russia's Invasion of Ukraine in 2022: A Cautionary Tale for Internet Freedom

Reethika Ramesh, Ram Sundara Raman, and Apurva Virkud, University of Michigan; Alexandra Dirksen, TU Braunschweig; Armin Huremagic, University of Michigan; David Fifield, unaffiliated; Dirk Rodenburg and Rod Hynes, Psiphon; Doug Madory, Kentik; Roya Ensafi, University of Michigan

Available Media

Russia's invasion of Ukraine in February 2022 was followed by sanctions and restrictions: by Russia against its citizens, by Russia against the world, and by foreign actors against Russia. Reports suggested a torrent of increased censorship, geoblocking, and network events affecting Internet freedom.

This paper is an investigation into the network changes that occurred in the weeks following this escalation of hostilities. It is the result of a rapid mobilization of researchers and activists, examining the problem from multiple perspectives. We develop GeoInspector, and conduct measurements to identify different types of geoblocking, and synthesize data from nine independent data sources to understand and describe various network changes. Immediately after the invasion, more than 45% of Russian government domains tested blocked access from countries other than Russia and Kazakhstan; conversely, 444 foreign websites, including news and educational domains, geoblocked Russian users. We find significant increases in Russian censorship, especially of news and social media. We find evidence of the use of BGP withdrawals to implement restrictions, and we quantify the use of a new domestic certificate authority. Finally, we analyze data from circumvention tools, and investigate their usage and blocking. We hope that our findings showing the rapidly shifting landscape of Internet splintering serve as a cautionary tale, and encourage research and efforts to protect Internet freedom.

A Study of China's Censorship and Its Evasion Through the Lens of Online Gaming

Yuzhou Feng, Florida International University; Ruyu Zhai, Hangzhou Dianzi University; Radu Sion, Stony Brook University; Bogdan Carbunar, Florida International University

Available Media

For the past 20 years, China has increasingly restricted the access of minors to online games using addiction prevention systems (APSes). At the same time, and through different means, i.e., the Great Firewall of China (GFW), it also restricts general population access to the international Internet. This paper studies how these restrictions impact young online gamers, and their evasion efforts. We present results from surveys (n = 2,415) and semi-structured interviews (n = 35) revealing viable commonly deployed APS evasion techniques and APS vulnerabilities. We conclude that the APS does not work as designed, even against very young online game players, and can act as a censorship evasion training ground for tomorrow's adults, by familiarization with and normalization of general evasion techniques, and desensitization to their dangers. Findings from these studies may further inform developers of censorship-resistant systems about the perceptions and evasion strategies of their prospective users, and help design tools that leverage services and platforms popular among the censored audience.

DeResistor: Toward Detection-Resistant Probing for Evasion of Internet Censorship

Abderrahmen Amich and Birhanu Eshete, University of Michigan, Dearborn; Vinod Yegneswaran, SRI International; Nguyen Phong Hoang, University of Chicago

Available Media

The arms race between Internet freedom advocates and censors has catalyzed the emergence of sophisticated blocking techniques and directed significant research emphasis toward the development of automated censorship measurement and evasion tools based on packet manipulation. However, we observe that the probing process of censorship middleboxes using state-of-the-art evasion tools can be easily fingerprinted by censors, necessitating detection-resilient probing techniques.

We validate our hypothesis by developing a real-time detection approach that utilizes Machine Learning (ML) to detect flow-level packet-manipulation and an algorithm for IP-level detection based on Threshold Random Walk (TRW). We then take the first steps toward detection-resilient censorship evasion by presenting DeResistor, a system that facilitates detection-resilient probing for packet-manipulation-based censorship-evasion. DeResistor aims to defuse detection logic employed by censors by performing detection guided pausing of censorship evasion attempts and interleaving them with normal user-driven network activity.

We evaluate our techniques by leveraging Geneva, a state-of-the-art evasion strategy generator, and validate them against 11 simulated censors supplied by Geneva, while also testing them against real-world censors (i.e., China's Great Firewall (GFW), India and Kazakhstan). From an adversarial perspective, our proposed real-time detection method can quickly detect clients that attempt to probe censorship middleboxes with manipulated packets after inspecting only two probing flows. From a defense perspective, DeResistor is effective at shielding Geneva training from detection while enabling it to narrow the search space to produce less detectable traffic. Importantly, censorship evasion strategies generated using DeResistor can attain a high success rate from different vantage points against the GFW (up to 98%) and 100% in India and Kazakhstan. Finally, we discuss detection countermeasures and extensibility of our approach to other censor-probing-based tools.
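
The abstract above mentions IP-level detection based on Threshold Random Walk (TRW). As a rough illustration of the general TRW idea only (a sequential likelihood-ratio test over per-flow outcomes), the sketch below may help. It is not the authors' detector: the function name `trw_classify`, the probabilities `theta0` and `theta1`, and the targets `alpha` and `beta` are all hypothetical values chosen for the example.

```python
import math


def trw_classify(outcomes, theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99):
    """Generic TRW-style sequential hypothesis test (illustrative sketch).

    outcomes: iterable of booleans, True if a flow looked normal,
              False if it looked like a manipulated probing flow.
    theta0:   assumed P(flow looks normal | benign client)   (hypothetical)
    theta1:   assumed P(flow looks normal | probing client)  (hypothetical)
    alpha:    target false-positive rate                     (hypothetical)
    beta:     target detection rate                          (hypothetical)
    Returns (verdict, number_of_flows_inspected).
    """
    upper = math.log(beta / alpha)              # cross it: declare "prober"
    lower = math.log((1 - beta) / (1 - alpha))  # cross it: declare "benign"
    log_ratio = 0.0
    seen = 0
    for looks_normal in outcomes:
        seen += 1
        if looks_normal:
            log_ratio += math.log(theta1 / theta0)
        else:
            log_ratio += math.log((1 - theta1) / (1 - theta0))
        if log_ratio >= upper:
            return "prober", seen
        if log_ratio <= lower:
            return "benign", seen
    return "undecided", seen


if __name__ == "__main__":
    print(trw_classify([False, False, False, False, False]))
    print(trw_classify([True, False, True, False]))
```

With these made-up parameters the walk needs several consistent observations before it crosses a threshold; how many flows a real detector needs depends entirely on the parameters and features it uses.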

Timeless Timing Attacks and Preload Defenses in Tor's DNS Cache

Rasmus Dahlberg and Tobias Pulls, Karlstad University

Available Media

We show that Tor's DNS cache is vulnerable to a timeless timing attack, allowing anyone to determine if a domain is cached or not without any false positives. The attack requires sending a single TLS record. It can be repeated to determine when a domain is no longer cached to leak the insertion time. Our evaluation in the Tor network shows no instances of cached domains being reported as uncached and vice versa after 12M repetitions while only targeting our own domains. This shifts DNS in Tor from an unreliable side-channel—using traditional timing attacks with network jitter—to being perfectly reliable. We responsibly disclosed the attack and suggested two short-term mitigations.

As a long-term defense for the DNS cache in Tor against all types of (timeless) timing attacks, we propose a redesign where only an allowlist of domains is preloaded to always be cached across circuits. We compare the performance of a preloaded DNS cache to Tor's current solution towards DNS by measuring aggregated statistics for four months from two exits (after engaging with the Tor Research Safety Board and our university ethical review process). The evaluated preload lists are variants of the following top-lists: Alexa, Cisco Umbrella, and Tranco. Our results show that four-months-old preload lists can be tuned to offer comparable performance under similar resource usage or to significantly improve shared cache-hit ratios (2–3x) with a modest increase in memory usage and resolver load compared to a 100 Mbit/s exit. We conclude that Tor's current DNS cache is mostly a privacy harm because the majority of cached domains are unlikely to lead to cache hits but remain there to be probed by attackers.

How the Great Firewall of China Detects and Blocks Fully Encrypted Traffic

Mingshi Wu, GFW Report; Jackson Sippe, University of Colorado Boulder; Danesh Sivakumar and Jack Burg, University of Maryland; Peter Anderson, Independent researcher; Xiaokang Wang, V2Ray Project; Kevin Bock, University of Maryland; Amir Houmansadr, University of Massachusetts Amherst; Dave Levin, University of Maryland; Eric Wustrow, University of Colorado Boulder

Available Media

One of the cornerstones in censorship circumvention is fully encrypted protocols, which encrypt every byte of the payload in an attempt to “look like nothing”. In early November 2021, the Great Firewall of China (GFW) deployed a new censorship technique that passively detects—and subsequently blocks—fully encrypted traffic in real time. The GFW’s new censorship capability affects a large set of popular censorship circumvention protocols, including but not limited to Shadowsocks, VMess, and Obfs4. Although China had long actively probed such protocols, this was the first report of purely passive detection, leading the anti-censorship community to ask how detection was possible.

In this paper, we measure and characterize the GFW’s new system for censoring fully encrypted traffic. We find that, instead of directly defining what fully encrypted traffic is, the censor applies crude but efficient heuristics to exempt traffic that is unlikely to be fully encrypted traffic; it then blocks the remaining non-exempted traffic. These heuristics are based on the fingerprints of common protocols, the fraction of set bits, and the number, fraction, and position of printable ASCII characters. Our Internet scans reveal what traffic and which IP addresses the GFW inspects. We simulate the inferred GFW’s detection algorithm on live traffic at a university network tap to evaluate its comprehensiveness and false positives. We show evidence that the rules we inferred have good coverage of what the GFW actually uses. We estimate that, if applied broadly, it could potentially block about 0.6% of normal Internet traffic as collateral damage.

Our understanding of the GFW’s new censorship mechanism helps us derive several practical circumvention strategies. We responsibly disclosed our findings and suggestions to the developers of different anti-censorship tools, helping millions of users successfully evade this new form of blocking.
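
The heuristics described above rest on simple byte-level statistics of a payload. The sketch below computes such statistics for a single payload, as a way to see what "fraction of set bits" and "printable ASCII" features look like in practice. It deliberately applies no thresholds or exemption rules, since the abstract does not state them. The function name `payload_features` and the use of leading and longest printable runs as stand-ins for "position" are assumptions made for this example.

```python
def payload_features(payload: bytes) -> dict:
    """Compute byte-level statistics of a payload (illustrative sketch).

    Printable ASCII is taken to be the range 0x20 to 0x7E inclusive.
    The two "run" features are hypothetical proxies for the position
    of printable characters; they are not taken from the paper.
    """
    if not payload:
        raise ValueError("payload must not be empty")

    set_bits = sum(bin(byte).count("1") for byte in payload)
    printable = [0x20 <= byte <= 0x7E for byte in payload]

    # Length of the printable run at the very start of the payload.
    leading_run = 0
    for is_printable in printable:
        if not is_printable:
            break
        leading_run += 1

    # Length of the longest contiguous printable run anywhere.
    longest_run = current = 0
    for is_printable in printable:
        current = current + 1 if is_printable else 0
        longest_run = max(longest_run, current)

    return {
        "set_bit_fraction": set_bits / (8 * len(payload)),
        "printable_count": sum(printable),
        "printable_fraction": sum(printable) / len(payload),
        "leading_printable_run": leading_run,
        "longest_printable_run": longest_run,
    }


if __name__ == "__main__":
    print(payload_features(b"GET / HTTP/1.1"))
    print(payload_features(bytes([0xFF, 0x00, 0x8A, 0x13])))
```

A plaintext protocol header such as the first example is entirely printable, whereas uniformly random bytes would tend toward a set-bit fraction near 0.5 and few long printable runs.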

Track 4

Machine Learning Backdoors

Session Chair: Mahmood Sharif, Tel Aviv University

Platinum Salon 9–10

A Data-free Backdoor Injection Approach in Neural Networks

Peizhuo Lv, Chang Yue, Ruigang Liang, and Yunfei Yang, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Shengzhi Zhang, Department of Computer Science, Metropolitan College, Boston University, USA; Hualong Ma, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Kai Chen, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Beijing Academy of Artificial Intelligence, China

Available Media

Recently, the backdoor attack on deep neural networks (DNNs) has been extensively studied, which causes the backdoored models to behave well on benign samples, whereas performing maliciously on controlled samples (with triggers attached). Almost all existing backdoor attacks require access to the original training/testing dataset or data relevant to the main task to inject backdoors into the target models, which is unrealistic in many scenarios, e.g., private training data. In this paper, we propose a novel backdoor injection approach in a "data-free" manner. We collect substitute data irrelevant to the main task and reduce its volume by filtering out redundant samples to improve the efficiency of backdoor injection. We design a novel loss function for fine-tuning the original model into the backdoored one using the substitute data, and optimize the fine-tuning to balance the backdoor injection and the performance on the main task. We conduct extensive experiments on various deep learning scenarios, e.g., image classification, text classification, tabular classification, image generation, and multimodal, using different models, e.g., Convolutional Neural Networks (CNNs), Autoencoders, Transformer models, Tabular models, as well as Multimodal DNNs. The evaluation results demonstrate that our data-free backdoor injection approach can efficiently embed backdoors with a nearly 100% attack success rate, incurring an acceptable performance downgrade on the main task.

Sparsity Brings Vulnerabilities: Exploring New Metrics in Backdoor Attacks

Jianwen Tian, NKLSTISS, Institute of Systems Engineering, Academy of Military Sciences, China; Kefan Qiu, School of Cyberspace Science and Technology, Beijing Institute of Technology; Debin Gao, Singapore Management University; Zhi Wang, DISSec, College of Cyber Science, Nankai University; Xiaohui Kuang and Gang Zhao, NKLSTISS, Institute of Systems Engineering, Academy of Military Sciences, China

Available Media

Nowadays, using AI-based detectors to keep pace with the fast iteration of malware has attracted great attention. However, most AI-based malware detectors use features with vast sparse subspaces to characterize applications, which brings significant vulnerabilities to the model. To exploit this sparsity-related vulnerability, we propose a clean-label backdoor attack consisting of a dissimilarity metric-based candidate selection and a variation ratio-based trigger construction.

The proposed backdoor is verified on different datasets, including a Windows PE dataset, an Android dataset with numerical and boolean feature values, and a PDF dataset. The experimental results show that the attack can slash the accuracy on watermarked malware to nearly 0% even with the least number (0.01% of the class set) of watermarked goodwares compared to previous attacks. Problem space constraints are also considered with experiments in the data-agnostic scenario and the data-and-model-agnostic scenario, proving transferability between different datasets as well as deep neural networks and traditional classifiers. The attack proves consistently powerful under the above scenarios. Moreover, eight existing defenses were tested, and their effect left much to be desired. We demonstrated the reason and proposed a subspace compression strategy to boost models' robustness, which also makes part of the previously failed defenses effective.

Aliasing Backdoor Attacks on Pre-trained Models

Cheng'an Wei, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Yeonjoon Lee, Hanyang University, Ansan, Republic of Korea; Kai Chen, Guozhu Meng, and Peizhuo Lv, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China

Available Media

Pre-trained deep learning models are widely used to train accurate models with limited data in a short time. To reduce computational costs, pre-trained neural networks often employ subsampling operations. However, recent studies have shown that these subsampling operations can cause aliasing issues, resulting in problems with generalization. Despite this knowledge, there is still a lack of research on the relationship between the aliasing of neural networks and security threats, such as adversarial attacks and backdoor attacks, which manipulate model predictions without the awareness of victims. In this paper, we propose the aliasing backdoor, a low-cost and data-free attack that threatens mainstream pre-trained models and transfers to all student models fine-tuned from them. The key idea is to create an aliasing error in the strided layers of the network and manipulate a benign input to a targeted intermediate representation. To evaluate the attack, we conduct experiments on image classification, face recognition, and speech recognition tasks. The results show that our approach can effectively attack mainstream models with a success rate of over 95%. Our research, based on the aliasing error caused by subsampling, reveals a fundamental security weakness of strided layers, which are widely used in modern neural network architectures. To the best of our knowledge, this is the first work to exploit the strided layers to launch backdoor attacks.
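
To make the notion of aliasing under subsampling concrete, here is a toy one-dimensional illustration. It shows only the signal-processing effect that the abstract refers to, namely that two different inputs can become indistinguishable after a stride-2 subsampling step. It is not the attack itself, and the helper name `stride_subsample` is invented for this example.

```python
def stride_subsample(signal, stride=2):
    """Keep every `stride`-th sample, as a strided layer with a 1-tap kernel would."""
    return signal[::stride]


# Two clearly different inputs: a constant signal and one that flips sign
# at every sample (the highest frequency a sampled signal can carry).
constant = [1.0] * 8
alternating = [1.0, -1.0] * 4

if __name__ == "__main__":
    print(stride_subsample(constant))
    print(stride_subsample(alternating))
    # After stride-2 subsampling, both inputs yield the same sequence,
    # so later layers cannot tell them apart.
    print(stride_subsample(constant) == stride_subsample(alternating))
```

Real networks apply learned filters before subsampling, so the effect is less clean than in this toy case, but the same collapse of distinct inputs onto one intermediate representation is what makes aliasing relevant to security.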

ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms

Minzhou Pan and Yi Zeng, Virginia Tech; Lingjuan Lyu, Sony AI; Xue Lin, Northeastern University; Ruoxi Jia, Virginia Tech

Available Media

Backdoor data detection is traditionally studied in an end-to-end supervised learning (SL) setting. However, recent years have seen the proliferating adoption of self-supervised learning (SSL) and transfer learning (TL), due to their lesser need for labeled data. Successful backdoor attacks have also been demonstrated in these new settings. Yet we lack a thorough understanding of the applicability of existing detection methods across a variety of learning settings. By evaluating 56 attack settings, we show that the performance of most existing detection methods varies significantly across different attacks and poison ratios, and all fail on the state-of-the-art clean-label backdoor attack, which manipulates the features of only a few training samples with imperceptible noise and leaves labels unchanged. In addition, existing methods either become inapplicable or suffer large performance losses when applied to SSL and TL. We propose a new detection method called Active Separation-via Offset (ASSET), which actively induces different model behaviors between the backdoor and clean samples to promote their separation. We also provide procedures to adaptively select the number of suspicious points to remove. In the end-to-end SL setting, ASSET is superior to existing methods in terms of consistency of defensive performance across different attacks and robustness to changes in poison ratios; in particular, it is the only method that can detect the state-of-the-art clean-label attack. Moreover, ASSET's average detection rates are higher than the best existing methods in SSL and TL, respectively, by 69.3% and 33.2%, thus providing the first practical backdoor defense for these emerging DL settings.

VILLAIN: Backdoor Attacks Against Vertical Split Learning

Yijie Bai and Yanjiao Chen, Zhejiang University; Hanlei Zhang and Wenyuan Xu, Zhejiang University; Haiqin Weng and Dou Goodman, Ant Group

Available Media

Vertical split learning is a new paradigm of federated learning for participants with vertically partitioned data. In this paper, we make the first attempt to explore the possibility of backdoor attacks by a malicious participant in vertical split learning. Different from conventional federated learning, vertical split learning poses new challenges for backdoor attacks, the most looming ones being a lack of access to the training data labels and the server model. To tackle these challenges, we propose VILLAIN, a backdoor attack framework that features effective label inference and data poisoning strategies. VILLAIN realizes high inference accuracy of the target label samples for the attacker. Furthermore, VILLAIN intensifies the backdoor attack power by designing a stealthy additive trigger and introducing backdoor augmentation strategies to impose a larger influence on the server model. Our extensive evaluations on 6 datasets with comprehensive vertical split learning models and aggregation methods confirm the effectiveness of VILLAIN. It is also demonstrated that VILLAIN can resist the popular privacy inference defenses, backdoor detection or removal defenses, and adaptive defenses.

Track 5

Integrity

Session Chair: Shweta Shinde, ETH Zurich

Platinum Salon 3–4

ARI: Attestation of Real-time Mission Execution Integrity

Jinwen Wang, Yujie Wang, and Ao Li, Washington University in St. Louis; Yang Xiao, University of Kentucky; Ruide Zhang, Wenjing Lou, and Y. Thomas Hou, Virginia Polytechnic Institute and State University; Ning Zhang, Washington University in St. Louis

Available Media

With the proliferation of autonomous safety-critical cyber-physical systems (CPS) in our daily life, their security is becoming ever more important. Remote attestation is a powerful mechanism to enable remote verification of system integrity. While recent developments have made it possible to efficiently attest IoT operations, autonomous systems that are built on top of real-time cyber-physical control loops and execute missions independently present new unique challenges.

In this paper, we formulate a new security property, Real-time Mission Execution Integrity (RMEI), to provide proof of correct and timely execution of the missions. While it is an attractive property, measuring it can incur prohibitive overhead for the real-time autonomous system. To tackle this challenge, we propose policy-based attestation of compartments to enable a trade-off between the level of detail in measurement and runtime overhead. To further minimize the impact on real-time responsiveness, multiple techniques were developed to improve the performance, including customized software instrumentation and timing recovery through re-execution. We implemented a prototype of ARI and evaluated its performance on five CPS platforms. A user study involving 21 developers with different skill sets was conducted to understand the usability of our solution.

Design of Access Control Mechanisms in Systems-on-Chip with Formal Integrity Guarantees

Dino Mehmedagić, Mohammad Rahmani Fadiheh, Johannes Müller, Anna Lena Duque Antón, Dominik Stoffel, and Wolfgang Kunz, Rheinland-Pfälzische Technische Universität (RPTU) Kaiserslautern-Landau, Germany

Available Media

Many SoCs employ system-level hardware access control mechanisms to ensure that security-critical operations cannot be tampered with by less trusted components of the circuit. While there are many design and verification techniques for developing an access control system, continuous discoveries of new vulnerabilities in such systems suggest a need for an exhaustive verification methodology to find and eliminate such weaknesses. This paper proposes UPEC-OI, a formal verification methodology that exhaustively covers integrity vulnerabilities of an SoC-level access control system. The approach is based on iteratively checking a 2-safety interval property whose formulation does not require any explicit specification of possible attack scenarios. The counterexamples returned by UPEC-OI can provide designers of access control hardware with valuable information on possible attack channels, allowing them to perform pinpoint fixes. We present a verification-driven development methodology which formally guarantees the developed SoC’s access control mechanism to be secure with respect to integrity. We evaluate the proposed approach in a case study on OpenTitan’s Earl Grey SoC where we add an SoC-level access control mechanism alongside malicious IPs to model the threat. UPEC-OI was found vital to guarantee the integrity of the mechanism and was proven to be tractable for SoCs of realistic size.

HashTag: Hash-based Integrity Protection for Tagged Architectures

Lukas Lamster, Martin Unterguggenberger, David Schrammel, and Stefan Mangard, Graz University of Technology

Available Media

Modern computing systems rely on error-correcting codes to ensure the integrity of DRAM data. Linear checksums allow for fast detection and correction of specific error patterns. However, they do not offer sufficient protection against complex errors distributed over multiple data words and chips. Depending on the code and the error pattern, linear codes may fail to detect or even miscorrect errors, thus leading to silent data corruption. In this work, we show how compact error-correcting codes based on low-latency hashing functions allow for strong probabilistic error detection and correction while facilitating ECC bit repurposing. Our proposed design drastically lowers the expected rate of undetected errors, regardless of the underlying error patterns. By tailoring the size of our codes to the required level of integrity protection, we are able to free bits that would otherwise be required to store ECC data. We showcase how our design facilitates the efficient implementation of tagged memory architectures such as CHERI, ARM MTE, and SPARC ADI by repurposing the freed bits in commodity ECC DRAM. Thus, we harden systems against data corruption due to DRAM faults while simultaneously allowing for memory tagging without introducing additional memory accesses. We present a systematic analysis of schemes that allow memory tagging on a cache line granularity while maintaining error detection and correction capabilities, even in multi-bit fault scenarios. We evaluate our integrity protection with tagging for different use cases and show that we can store 32 bits of additional tags per cache line, twice the amount needed to implement ARM's MTE, without significantly affecting error correction capabilities. We also show how up to 51 bits can be made available while maintaining single-bit error correction.
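
The weakness of linear checksums described above can be shown with a small Python sketch. It does not reproduce the paper's code construction: the XOR checksum stands in for a linear code, the truncated BLAKE2b digest stands in for a low-latency hash, and the data words and the 32-bit tag size are assumptions made for illustration.

```python
import hashlib

def xor_checksum(words):
    """Linear checksum: XOR of all 64-bit data words."""
    acc = 0
    for word in words:
        acc ^= word
    return acc

def hash_tag(words, tag_bytes=4):
    """Short hash over all words (BLAKE2b truncated to tag_bytes; a stand-in)."""
    data = b"".join(word.to_bytes(8, "little") for word in words)
    return hashlib.blake2b(data, digest_size=tag_bytes).digest()

# Hypothetical contents of four 64-bit data words.
original = [0x0123456789ABCDEF, 0x0F0F0F0F0F0F0F0F, 0xDEADBEEFCAFEF00D, 0x0]

# Multi-word fault: the same bit position flips in two different words.
corrupted = list(original)
corrupted[0] ^= 1 << 5
corrupted[2] ^= 1 << 5

# The two flips cancel in the XOR, so the linear checksum is unchanged.
linear_detects = xor_checksum(original) != xor_checksum(corrupted)
# The hash tag changes except with probability of about 2**-32.
hash_detects = hash_tag(original) != hash_tag(corrupted)

print(linear_detects)  # False
print(hash_detects)
```

The tag length sets the probability of an undetected error, which is the knob the abstract refers to when it mentions tailoring code size to the required level of protection.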

XCheck: Verifying Integrity of 3D Printed Patient-Specific Devices via Computing Tomography

Zhiyuan Yu, Yuanhaur Chang, Shixuan Zhai, Nicholas Deily, and Tao Ju, Washington University in St. Louis; XiaoFeng Wang, Indiana University Bloomington; Uday Jammalamadaka, Rice University; Ning Zhang, Washington University in St. Louis

Available Media

3D printing is bringing revolutionary changes to the field of medicine, with applications ranging from hearing aids to regrowing organs. As our society increasingly relies on this technology to save lives, the security of these systems is a growing concern. However, existing defense approaches that leverage side channels may require domain knowledge from computer security to fully understand the impact of the attack.

To bridge the gap, we propose XCheck, which leverages medical imaging to verify the integrity of the printed patient-specific device (PSD). XCheck follows a defense-in-depth approach and directly compares the computed tomography (CT) scan of the printed device to its original design. XCheck utilizes a voxel-based approach to build multiple layers of defense involving both 3D geometric verification and multivariate material analysis. To further enhance usability, XCheck also provides an adjustable visualization scheme that allows practitioners' inspection of the printed object with varying tolerance thresholds to meet the needs of different applications. We evaluated the system with 47 PSDs representing different medical applications to validate the efficacy.
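
A voxel-based comparison with an adjustable tolerance can be sketched in Python as follows. This is only an illustration of the general idea: the function names, the mismatch metric, the cube-shaped design, and the threshold values are all assumptions and are not taken from XCheck.

```python
def mismatch_ratio(design_voxels, scan_voxels):
    """Fraction of voxels that differ between design and scan,
    relative to the size of the design."""
    differing = design_voxels ^ scan_voxels  # symmetric difference
    return len(differing) / len(design_voxels)

def passes(design_voxels, scan_voxels, tolerance):
    """Accept the printed object if the mismatch stays within the tolerance."""
    return mismatch_ratio(design_voxels, scan_voxels) <= tolerance

# Hypothetical design: a solid 4 x 4 x 4 cube of occupied voxels.
design = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}

# Hypothetical scan: the same cube with a 4-voxel internal void.
void = {(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2)}
scan = design - void

print(mismatch_ratio(design, scan))         # 0.0625
print(passes(design, scan, tolerance=0.05)) # False
print(passes(design, scan, tolerance=0.10)) # True
```

The same scan is rejected under a strict threshold and accepted under a looser one, which mirrors the idea of choosing tolerances per application.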

Demystifying Pointer Authentication on Apple M1

Zechao Cai, Jiaxun Zhu, Wenbo Shen, Yutian Yang, and Rui Chang, Zhejiang University and ZJU-Hangzhou Global Scientific and Technological Innovation Center; Yu Wang, Hangzhou Cyberserval Co., Ltd.; Jinku Li, Xidian University; Kui Ren, Zhejiang University and ZJU-Hangzhou Global Scientific and Technological Innovation Center

Available Media

Pointer Authentication (PA) was introduced by ARMv8.3 to safeguard the integrity of pointers. While the ARM specification allows vendors to implement and customize PA, Apple has tailored it on their hardware to protect iPhones and Macs with M-series chips. Since its debut, Apple PA has been considered effective in defeating pointer corruption. However, its details have not been publicly disclosed.
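
The general idea of pointer authentication can be sketched in Python: a short authentication code, computed from the pointer, a secret key, and a modifier, is stored in the unused upper bits of the pointer and checked before use. This is a conceptual model only. HMAC-SHA256 is a stand-in for the hardware primitive, and the 48-bit address width, the 16-bit code width, and all names and values are assumptions; none of it reflects Apple's implementation.

```python
import hashlib
import hmac

VA_BITS = 48                      # assumed virtual-address width
VA_MASK = (1 << VA_BITS) - 1      # upper 16 bits are assumed free for the code

def _pac(key, address, modifier):
    """16-bit authentication code over (address, modifier); HMAC is a stand-in."""
    msg = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, digestmod=hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")

def sign(key, pointer, modifier):
    """Place the authentication code in the upper bits of the pointer."""
    address = pointer & VA_MASK
    return (_pac(key, address, modifier) << VA_BITS) | address

def authenticate(key, signed_pointer, modifier):
    """Return the plain address if the code matches, otherwise raise."""
    address = signed_pointer & VA_MASK
    if (signed_pointer >> VA_BITS) != _pac(key, address, modifier):
        raise ValueError("pointer authentication failed")
    return address

# Hypothetical key, pointer, and modifier values.
key = b"demo-key"
pointer = 0x7F00DEADBEE0
modifier = 0x1234

signed = sign(key, pointer, modifier)
print(hex(authenticate(key, signed, modifier)))  # 0x7f00deadbee0

# Corrupting the code field is always detected in this model; corrupting the
# address or using a wrong modifier is detected except with probability of
# about 2**-16.
try:
    authenticate(key, signed ^ (1 << VA_BITS), modifier)
except ValueError as err:
    print(err)  # pointer authentication failed
```

The modifier binds a signed pointer to a context, so a pointer signed for one use does not authenticate in another unless the codes happen to collide.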

To shed light on Apple PA customization, this paper conducts an in-depth reverse engineering study focused on Apple PA's hardware implementation and usage on the M1 chip. We develop a reverse engineering framework and propose novel techniques to uncover and confirm our new findings.

Our study uncovers that Apple PA has implemented several hardware-based diversifiers to counter pointer forgery attacks across various domains, which were previously unknown to researchers outside of Apple. We further discover that the XNU kernel (the kernel used by iOS and macOS) incorporates nine types of modifiers for signing and authenticating pointers and customized key management based on Apple PA hardware. Based on our in-depth understanding of Apple PA, we perform a security analysis of PA-based control-flow integrity and data-flow integrity in the XNU kernel, identifying four attack surfaces. Apple has fixed these issues in a security update and assigned us a new CVE.

Track 6

Fuzzing Firmware and Drivers

Session Chair: Sang Kil Cha, KAIST

Platinum Salon 1–2

DDRace: Finding Concurrency UAF Vulnerabilities in Linux Drivers with Directed Fuzzing

Ming Yuan and Bodong Zhao, Tsinghua University; Penghui Li, The Chinese University of Hong Kong; Jiashuo Liang and Xinhui Han, Peking University; Xiapu Luo, The Hong Kong Polytechnic University; Chao Zhang, Tsinghua University and Zhongguancun Lab

Available Media

Concurrency use-after-free (UAF) vulnerabilities account for a large portion of UAF vulnerabilities in Linux drivers. Many solutions have been proposed to find either concurrency bugs or UAF vulnerabilities, but few of them can be directly applied to efficiently find concurrency UAF vulnerabilities. In this paper, we propose the first concurrency directed greybox fuzzing solution DDRace to discover concurrency UAF vulnerabilities efficiently in Linux drivers. First, we identify candidate use-after-free locations as target sites and extract the relevant concurrency elements to reduce the exploration space of directed fuzzing. Second, we design a novel vulnerability related distance metric and an interleaving priority scheme to guide the fuzzer to better explore UAF vulnerabilities and thread interleavings. Lastly, to make test cases reproducible, we design an adaptive kernel state migration scheme to assist continuous fuzzing. We have implemented a prototype of DDRace, and evaluated it on upstream Linux drivers. Results show that DDRace is effective at discovering concurrency use-after-free vulnerabilities. It finds 4 unknown vulnerabilities and 8 known ones, which is more effective than other state-of-the-art solutions.

Automata-Guided Control-Flow-Sensitive Fuzz Driver Generation

Cen Zhang and Yuekang Li, Nanyang Technological University, Continental-NTU Corporate Lab; Hao Zhou, The Hong Kong Polytechnic University; Xiaohan Zhang, Xidian University; Yaowen Zheng, Nanyang Technological University, Continental-NTU Corporate Lab; Xian Zhan, Southern University of Science and Technology; The Hong Kong Polytechnic University; Xiaofei Xie, Singapore Management University; Xiapu Luo, The Hong Kong Polytechnic University; Xinghua Li, Xidian University; Yang Liu, Nanyang Technological University, Continental-NTU Corporate Lab; Sheikh Mahbub Habib, Continental AG, Germany

Available Media

Fuzz drivers are essential for fuzzing library APIs. However, manually composing fuzz drivers is difficult and time-consuming. Therefore, several works have been proposed to generate fuzz drivers automatically. Although these works can learn correct API usage from the consumer programs of the target library, three challenges still hinder the quality of the generated fuzz drivers: 1) How to learn and utilize the control dependencies in API usage; 2) How to handle the noises of the learned API usage, especially for complex real-world consumer programs; 3) How to organize independent sets of API usage inside the fuzz driver to better coordinate with fuzzers.

To solve these challenges, we propose RUBICK, an automata-guided control-flow-sensitive fuzz driver generation technique. RUBICK has three key features: 1) it models the API usage (including API data and control dependencies) as a deterministic finite automaton; 2) it leverages an active automata learning algorithm to distill the learned API usage; 3) it synthesizes a single automata-guided fuzz driver, which provides a scheduling interface for the fuzzer to test independent sets of API usage during fuzzing. During the experiments, the fuzz drivers generated by RUBICK showed a significant performance advantage over the baselines by covering an average of 50.42% more edges than fuzz drivers generated by FUZZGEN and 44.58% more edges than manually written fuzz drivers from OSS-Fuzz or human experts. By learning from large-scale open source projects, RUBICK has generated fuzz drivers for 11 popular Java projects and two of them have been merged into OSS-Fuzz. So far, 199 bugs, including four CVEs, are found using these fuzz drivers, which can affect popular PC and Android software with dozens of millions of downloads.
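
Modeling API usage as a deterministic finite automaton can be sketched in Python as follows. The states, the API names (`open`, `read`, `write`, `close`), and the `accepts` helper are hypothetical and serve only to illustrate the idea; they are not RUBICK's representation.

```python
# Hypothetical usage model: (current state, API call) -> next state.
TRANSITIONS = {
    ("start", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "write"): "opened",
    ("opened", "close"): "closed",
}
ACCEPTING = {"closed"}

def accepts(calls):
    """Return True if the call sequence is a valid, complete API usage."""
    state = "start"
    for call in calls:
        state = TRANSITIONS.get((state, call))
        if state is None:        # no transition: the usage is invalid
            return False
    return state in ACCEPTING

print(accepts(["open", "read", "write", "close"]))  # True
print(accepts(["read", "close"]))                   # False: read before open
print(accepts(["open", "read"]))                    # False: never closed
```

A driver guided by such an automaton can let the fuzzer choose which valid path to walk, rather than hard-coding one call sequence.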

Hoedur: Embedded Firmware Fuzzing using Multi-Stream Inputs

Tobias Scharnowski and Simon Wörner, CISPA Helmholtz Center for Information Security; Felix Buchmann, Ruhr University Bochum; Nils Bars, Moritz Schloegel, and Thorsten Holz, CISPA Helmholtz Center for Information Security

Available Media

Embedded systems with their diverse, interconnected components form the backbone of our digital infrastructure. Despite their importance, analyzing their security in a scalable way has remained elusive and challenging. Recent firmware rehosting work has brought scalable, dynamic analyses to embedded systems, making fuzzing for automated vulnerability assessments feasible. As these works focus on modeling device behavior rather than fuzzing, they integrate with off-the-shelf fuzzers in an ad-hoc manner. They re-interpret traditional flat binary fuzzing input as a sequence of hardware responses. In practice, this presents the fuzzer with an input layout that is fragile, opaque, and hard to mutate effectively.

Our work is based on the insight that while firmware emulation recently matured significantly, the input space is presented to the fuzzer in an ineffective manner. We propose a novel method for a firmware-aware fuzzing integration based on multi-stream inputs. We reorganize the previously flat, sequential, and opaque firmware fuzzing input into multiple strictly typed and cohesive streams. This allows our fuzzer, HOEDUR, to perform type-aware mutations and maintain its progress. It also enables firmware fuzzing to use state-of-the-art mutation techniques. Overall, we find that these techniques significantly increase fuzzing effectiveness. Our evaluation shows that HOEDUR achieves up to 5x the coverage of state-of-the-art firmware fuzzers, finds bugs that other fuzzers do not, and discovers known bugs up to 550x faster. In total, HOEDUR uncovered 23 previously unknown bugs.
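
The difference between a flat input and a multi-stream input can be sketched in Python. The class name `MultiStreamInput`, the choice of keying streams by MMIO address and access width, and the register addresses are assumptions made for illustration; the abstract does not specify how HOEDUR defines its streams.

```python
from collections import defaultdict

class MultiStreamInput:
    """Fuzzing input split into one stream per (MMIO address, access width)."""

    def __init__(self, streams):
        self.streams = {key: list(values) for key, values in streams.items()}
        self.cursors = defaultdict(int)

    def read(self, address, width):
        """Return the next value for this access, or None if the stream is exhausted."""
        key = (address, width)
        values = self.streams.get(key, [])
        position = self.cursors[key]
        if position >= len(values):
            return None
        self.cursors[key] = position + 1
        # Truncate the value to the access width in bytes.
        return values[position] & ((1 << (8 * width)) - 1)

STATUS = (0x40001000, 4)   # hypothetical 32-bit status register
UART = (0x40002000, 1)     # hypothetical 8-bit data register

base = MultiStreamInput({STATUS: [0x1, 0x2], UART: [0x41]})
# Mutation: insert a value at the front of the status stream only.
mutated = MultiStreamInput({STATUS: [0x99, 0x1, 0x2], UART: [0x41]})

# The UART stream is unaffected by the mutation of the status stream.
print(base.read(*UART), mutated.read(*UART))      # 65 65
print(base.read(*STATUS), mutated.read(*STATUS))  # 1 153
```

With a flat byte sequence, inserting a value would shift every later hardware response; keeping streams separate confines the effect of a mutation to the accesses it targets.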

Forming Faster Firmware Fuzzers

Lukas Seidel, Qwiet AI; Dominik Maier, TU Berlin; Marius Muench, VU Amsterdam and University of Birmingham

Available Media

A recent trend for assessing the security of an embedded system’s firmware is rehosting, the art of running the firmware in a virtualized environment, rather than on the original hardware platform. One significant use case for firmware rehosting is fuzzing to dynamically uncover security vulnerabilities.

However, state-of-the-art implementations suffer from high emulator-induced overhead, leading to less-than-optimal execution speeds. Instead of emulation, we propose near-native rehosting: running embedded firmware as a Linux userspace process on a high-performance system that shares the instruction set family with the targeted device. We implement this approach with SAFIREFUZZ, a throughput-optimized rehosting and fuzzing framework for ARM Cortex-M firmware. SAFIREFUZZ takes monolithic binary-only firmware images and uses high-level emulation (HLE) and dynamic binary rewriting to run them on far more powerful hardware with low overhead. By replicating experiments of HALucinator, the state-of-the-art HLE-based rehosting system for binary firmware, we show that SAFIREFUZZ can provide a 690x throughput increase on average during 24-hour fuzzing campaigns while covering up to 30% more basic blocks.

ReUSB: Replay-Guided USB Driver Fuzzing

Jisoo Jang, Minsuk Kang, and Dokyung Song, Yonsei University

Available Media

Vulnerabilities in device drivers are constantly threatening the security of OS kernels. USB drivers are particularly concerning due to their widespread use and the wide variety of their attack vectors. Recently, fuzzing has been shown to be effective at finding vulnerabilities in USB drivers. Numerous vulnerabilities in USB drivers have been discovered by existing fuzzers; however, the number of code paths and vulnerabilities found, unfortunately, has stagnated. A key obstacle is the statefulness of USB drivers; that is, most of their code can be covered only when given a specific sequence of inputs.

We observe that record-and-replay defined at the trust boundary of USB drivers directly helps overcome this obstacle: deep states can be reached by reproducing recorded executions, and, combined with fuzzing, deeper code paths and vulnerabilities can be found. We present ReUSB, a USB driver fuzzer that guides fuzzing along two-dimensional record-and-replay of USB drivers to enhance their fuzzing. We address two fundamental challenges: faithfully replaying USB driver executions, and amplifying the effect of replay in fuzzing. To this end, we first introduce a set of language-level constructs that are essential in faithfully describing concurrent, two-dimensional traces but missing in state-of-the-art kernel fuzzers, and propose time-, concurrency-, and context-aware replay that can reproduce recorded driver executions with high fidelity. We then amplify the effect of our high-fidelity replay by guiding fuzzing along the replay of recorded executions, while mitigating the slowdown and side effects induced by replay via replay checkpointing. We implemented ReUSB, and evaluated it using two-dimensional traces of 10 widely used USB drivers of 3 different classes. The results show that ReUSB can significantly enhance USB driver fuzzing; it improved the code coverage of these drivers by 76% over a strong baseline, and found 15 previously unknown bugs.

10:15 am–10:45 am

Break with Refreshments

Platinum Foyer

10:45 am–12:00 pm

Track 1

Vehicles and Security

Session Chair: Martin Strohmeier, armasuisse Science + Technology

Platinum Salon 6

Exorcising "Wraith": Protecting LiDAR-based Object Detector in Automated Driving System from Appearing Attacks

Qifan Xiao, Xudong Pan, Yifan Lu, Mi Zhang, Jiarun Dai, and Min Yang, Fudan University

Available Media

Automated driving systems rely on 3D object detectors to recognize possible obstacles from LiDAR point clouds. However, recent works show the adversary can forge non-existent cars in the prediction results with a few fake points (i.e., appearing attack). Existing defenses, which work by removing statistical outliers, are however designed for specific attacks or biased by predefined heuristic rules. Towards more comprehensive mitigation, we first systematically inspect the mechanism of previous appearing attacks: their common weaknesses are observed in crafting fake obstacles which (i) have obvious differences in the local parts compared with real obstacles and (ii) violate the physical relation between depth and point density.

In this paper, we propose a novel plug-and-play defensive module which works alongside a trained LiDAR-based object detector to eliminate forged obstacles where a major proportion of local parts have low objectness, i.e., the degree to which a part belongs to a real object. At the core of our module is a local objectness predictor, which explicitly incorporates the depth information to model the relation between depth and point density, and predicts each local part of an obstacle with an objectness score. Extensive experiments show that our proposed defense eliminates at least 70% of cars forged by three known appearing attacks in most cases, while, for the best previous defense, less than 30% of forged cars are eliminated. Meanwhile, under the same circumstance, our defense incurs less overhead for AP/precision on cars compared with existing defenses. Furthermore, we validate the effectiveness of our proposed defense on simulation-based closed-loop control driving tests in the open-source system of Baidu's Apollo.

Discovering Adversarial Driving Maneuvers against Autonomous Vehicles

Ruoyu Song, Muslum Ozgur Ozmen, Hyungsub Kim, Raymond Muller, Z. Berkay Celik, and Antonio Bianchi, Purdue University

Available Media

Over 33% of vehicles sold in 2021 had integrated autonomous driving (AD) systems. While many adversarial machine learning attacks have been studied against these systems, they all require an adversary to perform specific (and often unrealistic) actions, such as carefully modifying traffic signs or projecting malicious images, which may arouse suspicion if discovered. In this paper, we present Acero, a robustness-guided framework to discover adversarial maneuver attacks against autonomous vehicles (AVs). These maneuvers look innocent to the outside observer but force the victim vehicle to violate safety rules for AVs, causing physical consequences, e.g., crashing with pedestrians and other vehicles. To optimally find adversarial driving maneuvers, we formalize seven safety requirements for AD systems and use this formalization to guide our search. We also formalize seven physical constraints that ensure the adversary does not place themselves in danger or violate traffic laws while conducting the attack. Acero then leverages trajectory-similarity metrics to cluster successful attacks into unique groups, enabling AD developers to analyze the root cause of attacks and mitigate them. We evaluated Acero on two open-source AD software, openpilot and Autoware, running on the CARLA simulator. Acero discovered 219 attacks against openpilot and 122 attacks against Autoware. 73.3% of these attacks cause the victim to collide with a third-party vehicle, pedestrian, or static object.

Understand Users' Privacy Perception and Decision of V2X Communication in Connected Autonomous Vehicles

Zekun Cai and Aiping Xiong, The Pennsylvania State University

Available Media

Connected autonomous vehicles (CAVs) offer opportunities to improve road safety and enhance traffic efficiency. Vehicle-to-everything (V2X) communication allows CAVs to communicate with any entity that may affect, or may be affected by, the vehicles. The implementation of V2X in CAVs is inseparable from sharing and receiving a wide variety of data. Nevertheless, the public is not necessarily aware of such ubiquitous data exchange or does not understand their implications. We conducted an online study (N = 595) examining drivers’ privacy perceptions and decisions of four V2X application scenarios. Participants perceived more benefits but fewer risks of data sharing in the V2X scenarios where data collection is critical for driving than otherwise. They also showed more willingness to share data in those scenarios. In addition, we found that participants’ awareness of privacy risks (priming) and their experience on driving assistance and connectivity functions impacted their data-sharing decisions. Qualitative data confirmed that benefits, especially safety, come first, indicating a privacy-safety tradeoff. Moreover, factors such as misconceptions and novel expectations about CAV data collection and use moderated participants’ privacy decisions. We discuss implications of the obtained results to inform CAV privacy design and development.

You Can't See Me: Physical Removal Attacks on LiDAR-based Autonomous Vehicles Driving Frameworks

Yulong Cao, University of Michigan; S. Hrushikesh Bhupathiraju and Pirouz Naghavi, University of Florida; Takeshi Sugawara, The University of Electro-Communications; Z. Morley Mao, University of Michigan; Sara Rampazzi, University of Florida

Available Media

Autonomous Vehicles (AVs) increasingly use LiDAR-based object detection systems to perceive other vehicles and pedestrians on the road. While existing attacks on LiDAR-based autonomous driving architectures focus on lowering the confidence score of AV object detection models to induce obstacle misdetection, our research discovers how to leverage laser-based spoofing techniques to selectively remove the LiDAR point cloud data of genuine obstacles at the sensor level before being used as input to the AV perception. The ablation of this critical LiDAR information causes autonomous driving obstacle detectors to fail to identify and locate obstacles and, consequently, induces AVs to make dangerous automatic driving decisions. In this paper, we present a method invisible to the human eye that hides objects and deceives autonomous vehicles’ obstacle detectors by exploiting inherent automatic transformation and filtering processes of LiDAR sensor data integrated with autonomous driving frameworks. We call such attacks Physical Removal Attacks (PRA), and we demonstrate their effectiveness against three popular AV obstacle detectors (Apollo, Autoware, PointPillars), and we achieve 45° attack capability. We evaluate the attack impact on three fusion models (Frustum-ConvNet, AVOD, and Integrated-Semantic Level Fusion) and the consequences on the driving decision using LGSVL, an industry-grade simulator. In our moving vehicle scenarios, we achieve a 92.7% success rate removing 90% of a target obstacle’s cloud points. Finally, we demonstrate the attack’s success against two popular defenses against spoofing and object hiding attacks and discuss two enhanced defense strategies to mitigate our attack.

PatchVerif: Discovering Faulty Patches in Robotic Vehicles

Hyungsub Kim, Muslum Ozgur Ozmen, Z. Berkay Celik, Antonio Bianchi, and Dongyan Xu, Purdue University

Available Media

Modern software is continuously patched to fix bugs and security vulnerabilities. Patching is particularly important in robotic vehicles (RVs), in which safety and security bugs can cause severe physical damages. However, existing automated methods struggle to identify faulty patches in RVs, due to their inability to systematically determine patch-introduced behavioral modifications, which affect how the RV interacts with the physical environment.

In this paper, we introduce PATCHVERIF, an automated patch analysis framework. PATCHVERIF’s goal is to evaluate whether a given patch introduces bugs in the patched RV control software. To this aim, PATCHVERIF uses a combination of static and dynamic analysis to measure how the analyzed patch affects the physical state of an RV. Specifically, PATCHVERIF uses a dedicated input mutation algorithm to generate RV inputs that maximize the behavioral differences (in the physical space) between the original code and the patched one. Using the collected information about patch-introduced behavioral modifications, PATCHVERIF employs support vector machines (SVMs) to infer whether a patch is faulty or correct.

We evaluated PATCHVERIF on two popular RV control software (ArduPilot and PX4), and it successfully identified faulty patches with an average precision and recall of 97.9% and 92.1%, respectively. Moreover, PATCHVERIF discovered 115 previously unknown bugs, 103 of which have been acknowledged, and 51 of them have already been fixed.

Track 2

Verifying Users

Session Chair: Scott Ruoti, The University of Tennessee, Knoxville

Platinum Salon 5

Fast IDentity Online with Anonymous Credentials (FIDO-AC)

Wei-Zhu Yeoh, CISPA Helmholtz Center for Information Security; Michal Kepkowski, Macquarie University; Gunnar Heide, CISPA Helmholtz Center for Information Security; Dali Kaafar, Macquarie University; Lucjan Hanzlik, CISPA Helmholtz Center for Information Security

Available Media

Web authentication is a critical component of today's Internet and the digital world we interact with. The FIDO2 protocol enables users to leverage common devices to easily authenticate to online services in both mobile and desktop environments, following the passwordless authentication approach based on cryptography and biometric verification. However, there is little to no connection between the authentication process and users' attributes. More specifically, the FIDO protocol does not specify methods that could be used to combine trusted attributes with the FIDO authentication process generically and allow users to disclose them to the relying party arbitrarily. In essence, applications requiring attributes verification (e.g., age or expiry date of a driver's license, etc.) still rely on ad-hoc approaches that do not satisfy the data minimization principle and do not allow the user to check the disclosed data. A primary recent example is the data breach on Singtel Optus, one of the major telecommunications providers in Australia, where very personal and sensitive data (e.g., passport numbers) were leaked. This paper introduces FIDO-AC, a novel framework that combines the FIDO2 authentication process with the user's digital and non-shareable identity. We show how to instantiate this framework using off-the-shelf FIDO tokens and any electronic identity document, e.g., the ICAO biometric passport (ePassport). We demonstrate the practicality of our approach by evaluating a prototype implementation of the FIDO-AC system.

How to Bind Anonymous Credentials to Humans

Julia Hesse, IBM Research Europe - Zurich; Nitin Singh, IBM Research India - Bangalore; Alessandro Sorniotti, IBM Research Europe - Zurich

Available Media

Digital and paper-based authentication are the two predominant mechanisms that have been deployed in the real world to authenticate end-users. When verification of a digital credential is performed in person (e.g., the authentication that was often required to access facilities at the peak of the COVID global pandemic), the two mechanisms are often deployed together: the verifier checks government-issued ID to match the picture on the ID to the individual holding it, and then checks the digital credential to see that the personal details on it match those on the ID and to discover additional attributes of the holder. This pattern is extremely common and very likely to remain in place for the foreseeable future. However, it poses an interesting problem: if the digital credential is privacy-preserving (e.g., based on BBS+ or CL signatures), but the holder is still forced to show an ID card or a passport to verify that the presented credential was indeed issued to the holder, what is the point of deploying a privacy-preserving digital credential? In this paper we address this problem by redefining what an ID card should show and forcing a minimal but mandatory involvement of the card in the digital interaction. Our approach permits verifiers to successfully authenticate holders and to determine if they are the rightful owners of the digital credential. At the same time, optimal privacy guarantees are preserved. We design our scheme, formally define and analyse its security in the Universal Composability (UC) framework, and implement the card component, showing the running time to be below 200 ms irrespective of the number of certified attributes.

Inducing Authentication Failures to Bypass Credit Card PINs

David Basin, Patrick Schaller, and Jorge Toro-Pozo, ETH Zurich

Available Media

For credit card transactions using the EMV standard, the integrity of transaction information is protected cryptographically by the credit card. Integrity checks by the payment terminal use RSA signatures and are part of EMV’s offline data authentication mechanism. Online integrity checks by the card issuer use a keyed MAC. One would expect that failures in either mechanism would always result in transaction failure, but this is not the case as offline authentication failures do not always result in declined transactions. Consequently, the integrity of transaction data that is not protected by the keyed MAC (online) cannot be guaranteed.

We show how this missing integrity protection can be exploited to bypass PIN verification for high-value Mastercard transactions. As a proof-of-concept, we have built an Android app that modifies unprotected card-sourced data, including the data relevant for cardholder verification. Using our app, we have tricked real-world terminals into downgrading from PIN verification to either no cardholder verification or (paper) signature verification, for transactions of up to 500 Swiss Francs. Our findings have been disclosed to the vendor with the recommendation to decline any transaction where offline data authentication fails.

An Empirical Study & Evaluation of Modern CAPTCHAs

Andrew Searles, University of California, Irvine; Yoshimichi Nakatsuka, ETH Zürich; Ercan Ozturk, University of California, Irvine; Andrew Paverd, Microsoft; Gene Tsudik, University of California, Irvine; Ai Enkoji, Lawrence Livermore National Laboratory

Available Media

For nearly two decades, CAPTCHAs have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAs have continued to improve. Meanwhile, CAPTCHAs have also evolved in terms of sophistication and diversity, becoming increasingly difficult to solve for both bots (machines) and humans. Given this long-standing and still-ongoing arms race, it is critical to investigate how long it takes legitimate users to solve modern CAPTCHAs, and how they are perceived by those users.

In this work, we explore CAPTCHAs in the wild by evaluating users' solving performance and perceptions of unmodified currently-deployed CAPTCHAs. We obtain this data through manual inspection of popular websites and user studies in which 1,400 participants collectively solved 14,000 CAPTCHAs. Results show significant differences between the most popular types of CAPTCHAs: surprisingly, solving time and user perception are not always correlated. We performed a comparative study to investigate the effect of experimental context – specifically the difference between solving CAPTCHAs directly versus solving them as part of a more natural task, such as account creation. Whilst there were several potential confounding factors, our results show that experimental context could have an impact on this task, and must be taken into account in future CAPTCHA studies. Finally, we investigate CAPTCHA-induced user task abandonment by analyzing participants who start and do not complete the task.

Account Verification on Social Media: User Perceptions and Paid Enrollment

Madelyne Xiao, Mona Wang, Anunay Kulshrestha, and Jonathan Mayer, Princeton University

Available Media

We investigate how users perceive social media account verification, how those perceptions compare to platform practices, and what happens when a gap emerges. We use recent changes in Twitter's verification process as a natural experiment, where the meaning and types of verification indicators rapidly and significantly shift. The project consists of two components: a user survey and a measurement of verified Twitter accounts.

In the survey study, we ask a demographically representative sample of U.S. respondents (n = 299) about social media account verification requirements both in general and for particular platforms. We also ask about experiences with online information sources and digital literacy. More than half of respondents misunderstand Twitter's criteria for blue check account verification, and over 80% of respondents misunderstand Twitter's new gold and gray check verification indicators. Our analysis of survey responses suggests that people who are older or have lower digital literacy may be modestly more likely to misunderstand Twitter verification.

In the measurement study, we randomly sample 15 million English language tweets from October 2022. We obtain account verification status for the associated accounts in November 2022, just before Twitter's verification changes, and we collect verification status again in January 2023. The resulting longitudinal dataset of 2.85 million accounts enables us to characterize the accounts that gained and lost verification following Twitter's changes. We find that accounts posting conservative political content, exhibiting positive views about Elon Musk, and promoting cryptocurrencies disproportionately obtain blue check verification after Twitter's changes.

We close by offering recommendations for improving account verification indicators and processes.

Track 3

DNS Security

Session Chair: Ben Stock, CISPA Helmholtz Center for Information Security

Platinum Salon 7–8

User Awareness and Behaviors Concerning Encrypted DNS Settings in Web Browsers

Alexandra Nisenoff, Carnegie Mellon University and University of Chicago; Ranya Sharma and Nick Feamster, University of Chicago

Available Media

Recent developments to encrypt the Domain Name System (DNS) have resulted in major browser and operating system vendors deploying encrypted DNS functionality, often enabling various configurations and settings by default. In many cases, default encrypted DNS settings have implications for performance and privacy; for example, Firefox’s default DNS setting sends all of a user’s DNS queries to Cloudflare, potentially introducing new privacy vulnerabilities. In this paper, we confirm that most users are unaware of these developments—with respect to the rollout of these new technologies, the changes in default settings, and the ability to customize encrypted DNS configuration to balance user preferences between privacy and performance. Our findings suggest several important implications for the designers of interfaces for encrypted DNS functionality in both browsers and operating systems, to help improve user awareness concerning these settings, and to ensure that users retain the ability to make choices that allow them to balance tradeoffs concerning DNS privacy and performance.

Two Sides of the Shield: Understanding Protective DNS adoption factors

Elsa Rodríguez, Radu Anghel, Simon Parkin, Michel van Eeten, and Carlos Gañán, Delft University of Technology

Available Media

Protective DNS (PDNS) filters out DNS requests leading to harmful resources. PDNS is currently being promoted by various governments and industry players – some global public DNS providers offer it, as do some government-sponsored DNS resolvers. Yet, are end users even interested in adopting it? The extent of current PDNS usage, as well as the factors that encourage or discourage end-users' adoption, have not been studied. We found that overall PDNS adoption is minimal, though in some countries over 20% of the DNS queries are being answered by these types of resolvers. Four human subjects studies were undertaken to understand end-user adoption factors: a survey with 295 consumers; 24 interviews with ISP customers offered a free PDNS after a malware infection; 12 interviews with public and private enterprise professionals, and 9 interviews with DNS technology specialists. We found that users are more likely to use PDNS if operated by their own ISP rather than the government. For enterprises, we uncovered that access to global threat intelligence, a layered security strategy, and compliance with regulations were the main factors for PDNS adoption. The DNS technical specialists highlighted broader challenges of PDNS adoption such as transparency and centralization.

The Maginot Line: Attacking the Boundary of DNS Caching Protection

Xiang Li, Chaoyi Lu, and Baojun Liu, Tsinghua University; Qifan Zhang and Zhou Li, University of California, Irvine; Haixin Duan, Tsinghua University, QI-ANXIN Technology Research Institute, and Zhongguancun Laboratory; Qi Li, Tsinghua University and Zhongguancun Laboratory

Available Media

In this paper, we report MaginotDNS, a powerful cache poisoning attack against DNS servers that simultaneously act as forwarder and recursive resolver (termed as CDNS). The attack is made possible through exploiting vulnerabilities in the bailiwick checking algorithms, one of the cornerstones of DNS security since the 1990s, and affects multiple versions of popular DNS software, including BIND and Microsoft DNS. Through field tests, we find that the attack is potent, allowing attackers to take over entire DNS zones, even including Top-Level Domains (e.g., .com and .net). Through a large-scale measurement study, we also confirm the extensive usage of CDNSes in real-world networks (up to 41.8% of our probed open DNS servers) and find that at least 35.5% of all CDNSes are vulnerable to MaginotDNS. After interviews with ISPs, we show a wide range of CDNS use cases and real-world attacks. We have reported all the discovered vulnerabilities to DNS software vendors and received acknowledgments from all of them. 3 CVE-ids have been assigned, and 2 vendors have fixed their software. Our study brings attention to the implementation inconsistency of security checking logic in different DNS software and server modes (i.e., recursive resolvers and forwarders), and we call for standardization and agreements among software vendors.
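
For readers unfamiliar with bailiwick checking, the following is a minimal Python sketch of the textbook rule: a resolver keeps a record from a response only if the record's owner name falls inside the zone that was queried. The function names `in_bailiwick` and `filter_records` and the tuple layout of a record are assumptions made for illustration. This is not the logic of BIND, Microsoft DNS, or any other implementation analyzed in the paper, and it does not reproduce the attack.

```python
def in_bailiwick(record_name: str, zone: str) -> bool:
    """Return True if record_name equals zone or is a subdomain of it."""
    name = record_name.rstrip(".").lower()
    z = zone.rstrip(".").lower()
    if z == "":
        # The root zone contains every name.
        return True
    # Require a label boundary so "badexample.com" is not treated
    # as being inside "example.com".
    return name == z or name.endswith("." + z)


def filter_records(records, zone):
    """Keep only (owner_name, rtype, rdata) tuples that are inside the zone."""
    return [r for r in records if in_bailiwick(r[0], zone)]


if __name__ == "__main__":
    response = [
        ("ns1.example.com.", "A", "192.0.2.1"),
        ("www.victim.com.", "A", "203.0.113.7"),  # out of bailiwick, dropped
    ]
    print(filter_records(response, "example.com."))
```

The abstract's point is that forwarder and recursive modes apply checks of this kind inconsistently, so a record that one mode would drop can still end up in a shared cache.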

Fourteen Years in the Life: A Root Server’s Perspective on DNS Resolver Security

Alden Hilton, Sandia National Laboratories; Casey Deccio, Brigham Young University; Jacob Davis, Sandia National Laboratories

Available Media

We consider how the DNS security and privacy landscape has evolved over time, using data collected annually at A-root between 2008 and 2021. We consider issues such as deployment of security and privacy mechanisms, including source port randomization, TXID randomization, DNSSEC, and QNAME minimization. We find that achieving general adoption of new security practices is a slow, ongoing process. Of particular note, we find a significant number of resolvers lacking nearly all of the security mechanisms we considered, even as late as 2021. Specifically, in 2021, over 4% of the resolvers analyzed were unprotected by either source port randomization, DNSSEC validation, DNS cookies, or 0x20 encoding. Encouragingly, we find that the volume of traffic from resolvers with secure practices is significantly higher than that of other resolvers.
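
As an illustration of one of the mechanisms listed above, here is a minimal Python sketch of 0x20 encoding: the resolver randomizes the letter case of the query name and accepts a response only if the name is echoed back with the same case pattern. The helper names `encode_0x20` and `response_matches` are hypothetical, and the sketch works on plain strings rather than DNS wire-format messages.

```python
import secrets


def encode_0x20(qname: str) -> str:
    """Randomly flip the case of each ASCII letter in the query name."""
    return "".join(
        (c.upper() if secrets.randbits(1) else c.lower()) if c.isalpha() else c
        for c in qname
    )


def response_matches(sent_qname: str, echoed_qname: str) -> bool:
    """Accept a response only if the name is echoed with the exact same case."""
    return sent_qname == echoed_qname


if __name__ == "__main__":
    sent = encode_0x20("www.example.com")
    print(sent)                                        # random mixed-case name
    print(response_matches(sent, sent))                # honest echo is accepted
    print(response_matches("wWw.ExAmPle.com", "www.example.com"))  # rejected
```

Each letter in the name adds one bit an off-path attacker must guess, on top of the transaction ID and source port.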

NRDelegationAttack: Complexity DDoS attack on DNS Recursive Resolvers

Yehuda Afek and Anat Bremler-Barr, Tel-Aviv University; Shani Stajnrod, Reichman University

Available Media

Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.

The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.

We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9 based resolver. As a result of this work, three common vulnerabilities and exposures (CVEs) regarding NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.

Track 4

Graphs and Security

Session Chair: Yuan Tian, UCLA

Platinum Salon 9–10

Inductive Graph Unlearning

Cheng-Long Wang, King Abdullah University of Science and Technology and SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence; Mengdi Huai, Iowa State University; Di Wang, King Abdullah University of Science and Technology, Computational Bioscience Research Center, and SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence

Available Media

As a way to implement the "right to be forgotten" in machine learning, machine unlearning aims to completely remove the contributions and information of the samples to be deleted from a trained model without affecting the contributions of other samples. Recently, many frameworks for machine unlearning have been proposed, and most of them focus on image and text data. To extend machine unlearning to graph data, GraphEraser has been proposed. However, a critical issue is that GraphEraser is specifically designed for the transductive graph setting, where the graph is static and attributes and edges of test nodes are visible during training. It is unsuitable for the inductive setting, where the graph could be dynamic and the test graph information is invisible in advance. Such inductive capability is essential for production machine learning systems with evolving graphs like social media and transaction networks. To fill this gap, we propose the GUided InDuctivE Graph Unlearning framework (GUIDE). GUIDE consists of three components: guided graph partitioning with fairness and balance, efficient subgraph repair, and similarity-based aggregation. Empirically, we evaluate our method on several inductive benchmarks and evolving transaction graphs. Generally speaking, GUIDE can be implemented efficiently for inductive graph learning tasks because of its low graph partition cost, in terms of both computation and structure information. The code is available here: https://github.com/Happy2Git/GUIDE.

GAP: Differentially Private Graph Neural Networks with Aggregation Perturbation

Sina Sajadmanesh, Idiap Research Institute and EPFL; Ali Shahin Shamsabadi, Alan Turing Institute; Aurélien Bellet, Inria; Daniel Gatica-Perez, Idiap Research Institute and EPFL

Available Media

In this paper, we study the problem of learning Graph Neural Networks (GNNs) with Differential Privacy (DP). We propose a novel differentially private GNN based on Aggregation Perturbation (GAP), which adds stochastic noise to the GNN's aggregation function to statistically obfuscate the presence of a single edge (edge-level privacy) or a single node and all its adjacent edges (node-level privacy). Tailored to the specifics of private learning, GAP's new architecture is composed of three separate modules: (i) the encoder module, where we learn private node embeddings without relying on the edge information; (ii) the aggregation module, where we compute noisy aggregated node embeddings based on the graph structure; and (iii) the classification module, where we train a neural network on the private aggregations for node classification without further querying the graph edges. GAP's major advantage over previous approaches is that it can benefit from multi-hop neighborhood aggregations, and guarantees both edge-level and node-level DP not only for training, but also at inference with no additional costs beyond the training's privacy budget. We analyze GAP's formal privacy guarantees using Rényi DP and conduct empirical experiments over three real-world graph datasets. We demonstrate that GAP offers significantly better accuracy-privacy trade-offs than state-of-the-art DP-GNN approaches and naive MLP-based baselines. Our code is publicly available at https://github.com/sisaman/GAP.
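
The idea of aggregation perturbation can be sketched in a few lines of Python. The sketch below normalizes each node embedding to unit length so that a single edge changes a neighbor sum by a bounded amount, then adds Gaussian noise to each coordinate of the sum. The function names, the adjacency-list input, and the choice of `sigma` are assumptions for illustration. This is not the GAP implementation from the authors' repository, and it does not calibrate `sigma` to any privacy budget.

```python
import math
import random


def normalize_rows(x):
    """Scale each embedding to unit L2 norm (zero rows stay zero)."""
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm if norm > 0 else 0.0 for v in row])
    return out


def noisy_sum_aggregate(x, neighbors, sigma, rng):
    """Sum the unit-norm embeddings of each node's neighbors, then add
    independent Gaussian noise with standard deviation sigma."""
    x_unit = normalize_rows(x)
    dim = len(x[0])
    result = []
    for node in range(len(x)):
        total = [0.0] * dim
        for nb in neighbors[node]:
            for d in range(dim):
                total[d] += x_unit[nb][d]
        result.append([t + rng.gauss(0.0, sigma) for t in total])
    return result


if __name__ == "__main__":
    embeddings = [[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]]
    adjacency = {0: [1, 2], 1: [0], 2: [0]}
    rng = random.Random(0)  # fixed seed for a repeatable demo only
    print(noisy_sum_aggregate(embeddings, adjacency, sigma=1.0, rng=rng))
```

Because the noise is added to the aggregate rather than to gradients, the classifier that consumes these noisy sums never needs to query the edges again, which matches the abstract's claim about inference.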

PrivGraph: Differentially Private Graph Data Publication by Exploiting Community Information

Quan Yuan, Zhejiang University; Zhikun Zhang, Stanford University and CISPA Helmholtz Center for Information Security; Linkang Du, Zhejiang University; Min Chen, CISPA Helmholtz Center for Information Security; Peng Cheng and Mingyang Sun, Zhejiang University

Available Media

Graph data is used in a wide range of applications, while analyzing graph data without protection is prone to privacy breach risks. To mitigate the privacy risks, we resort to the standard technique of differential privacy to publish a synthetic graph. However, existing differentially private graph synthesis approaches either introduce excessive noise by directly perturbing the adjacency matrix, or suffer significant information loss during the graph encoding process. In this paper, we propose an effective graph synthesis algorithm PrivGraph by exploiting the community information. Concretely, PrivGraph differentially privately partitions the private graph into communities, extracts intra-community and inter-community information, and reconstructs the graph from the extracted graph information. We validate the effectiveness of PrivGraph on six real-world graph datasets and seven commonly used graph metrics.

On the Security Risks of Knowledge Graph Reasoning

Zhaohan Xi, Tianyu Du, Changjiang Li, and Ren Pang, Pennsylvania State University; Shouling Ji, Zhejiang University; Xiapu Luo, The Hong Kong Polytechnic University; Xusheng Xiao, Arizona State University; Fenglong Ma and Ting Wang, Pennsylvania State University

Available Media

Knowledge graph reasoning (KGR) – answering complex logical queries over large knowledge graphs – represents an important artificial intelligence task, entailing a range of applications (e.g., cyber threat hunting). However, despite its surging popularity, the potential security risks of KGR are largely unexplored, which is concerning, given the increasing use of such capability in security-critical domains.

This work represents a solid initial step towards bridging the striking gap. We systematize the security threats to KGR according to the adversary's objectives, knowledge, and attack vectors. Further, we present ROAR, a new class of attacks that instantiate a variety of such threats. Through empirical evaluation in representative use cases (e.g., medical decision support, cyber threat hunting, and commonsense reasoning), we demonstrate that ROAR is highly effective at misleading KGR into suggesting pre-defined answers for target queries, yet with negligible impact on non-target ones. Finally, we explore potential countermeasures against ROAR, including filtering of potentially poisoning knowledge and training with adversarially augmented queries, which leads to several promising research directions.

The Case for Learned Provenance Graph Storage Systems

Hailun Ding, Juan Zhai, Dong Deng, and Shiqing Ma, Rutgers University

Available Media

Cyberattacks are becoming more frequent and sophisticated, and investigating them becomes more challenging. Provenance graphs are the primary data source to support forensics analysis. Because of system complexity and long attack duration, provenance graphs can be huge, and efficiently storing them remains a challenging problem. Existing works typically use relational or graph databases to store provenance graphs. These solutions suffer from high storage overhead and low query efficiency. Recently, researchers leveraged Deep Neural Networks (DNNs) in storage system design and achieved promising results. We observe that DNNs can embed given inputs as context-aware numerical vector representations, which are compact and support parallel query operations. In this paper, we propose to learn a DNN as the storage system for provenance graphs to achieve storage and query efficiency. We also present novel designs that leverage domain knowledge to reduce provenance data redundancy and build fast-query processing with indexes. We built a prototype LEONARD and evaluated it on 12 datasets. Compared with the relational database Quickstep and the graph database Neo4j, LEONARD reduced the space overhead by up to 25.90x and boosted up to 99.6% query executions.

Track 5

Ethereum Security

Session Chair: Thorsten Holz, CISPA

Platinum Salon 3–4

A Large Scale Study of the Ethereum Arbitrage Ecosystem

Robert McLaughlin, Christopher Kruegel, and Giovanni Vigna, University of California, Santa Barbara

Available Media

The Ethereum blockchain rapidly became the epicenter of a complex financial ecosystem, powered by decentralized exchanges (DEXs). These exchanges form a diverse capital market where anyone can swap one type of token for another. Arbitrage trades are a normal and expected phenomenon in free capital markets, and, indeed, several recent works identify these transactions on decentralized exchanges.

Unfortunately, existing studies leave significant knowledge gaps in our understanding of the system as a whole, which hinders research into the security, stability, and economic impacts of arbitrage. To address this issue, we perform two large-scale measurements over a 28-month period. First, we design a novel arbitrage identification strategy capable of analyzing over 10x more DEX applications than prior work. This uncovers 3.8 million arbitrages, which yield a total of $321 million in profit. Second, we design a novel arbitrage opportunity detection system, which is the first to support modern complex price models at scale. This system identifies 4 billion opportunities and would generate a weekly profit of 395 Ether (approximately $500,000, at the time of writing). We observe two key insights that demonstrate the usefulness of these measurements: (1) an increasing percentage of revenue is paid to the miners, which threatens consensus stability, and (2) arbitrage opportunities occasionally persist for several blocks, which implies that price-oracle manipulation attacks may be less costly than expected.
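
To make the notion of a cyclic arbitrage concrete, here is a minimal Python sketch that assumes every pool follows the constant-product pricing rule with a 0.3% fee. The function names and the `(reserve_in, reserve_out)` pool representation are hypothetical. The paper's detection system supports more complex price models that this sketch does not cover.

```python
def swap_out(amount_in, reserve_in, reserve_out, fee=0.003):
    """Output amount of a constant-product pool (x * y = k) after the fee."""
    amount_in_after_fee = amount_in * (1 - fee)
    return reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)


def cycle_profit(amount_in, pools, fee=0.003):
    """Route amount_in through a cycle of pools and return the net gain.

    Each pool is a (reserve_in, reserve_out) pair oriented along the cycle,
    so the last pool pays out the same token the cycle started with.
    """
    amount = amount_in
    for reserve_in, reserve_out in pools:
        amount = swap_out(amount, reserve_in, reserve_out, fee)
    return amount - amount_in


if __name__ == "__main__":
    # Pool 1 prices token B cheaply relative to pool 2, so A -> B -> A gains.
    mispriced = [(1000.0, 2000.0), (1000.0, 1000.0)]
    balanced = [(1000.0, 1000.0), (1000.0, 1000.0)]
    print(cycle_profit(10.0, mispriced))  # positive: an arbitrage opportunity
    print(cycle_profit(10.0, balanced))   # negative: fees and slippage only
```

A positive result marks an opportunity at that input size; larger inputs move the pool prices, so the profit does not scale linearly.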

ACon^2: Adaptive Conformal Consensus for Provable Blockchain Oracles

Sangdon Park, Georgia Institute of Technology; Osbert Bastani, University of Pennsylvania; Taesoo Kim, Georgia Institute of Technology

Available Media

Blockchains with smart contracts are distributed ledger systems that achieve block-state consistency among distributed nodes by only allowing deterministic operations of smart contracts. However, the power of smart contracts is enabled by interacting with stochastic off-chain data, which in turn opens the possibility to undermine the block-state consistency. To address this issue, an oracle smart contract is used to provide a single consistent source of external data; but, simultaneously, this introduces a single point of failure, which is called the oracle problem. To address the oracle problem, we propose an adaptive conformal consensus (ACon^2) algorithm that derives a consensus set of data from multiple oracle contracts via the recent advance in online uncertainty quantification learning. Interestingly, the consensus set provides a desired correctness guarantee under distribution shift and Byzantine adversaries. We demonstrate the efficacy of the proposed algorithm on two price datasets and an Ethereum case study. In particular, the Solidity implementation of the proposed algorithm shows the potential practicality of the proposed algorithm, implying that online machine learning algorithms are applicable to address security issues in blockchains.

Snapping Snap Sync: Practical Attacks on Go Ethereum Synchronising Nodes

Massimiliano Taverna and Kenneth G. Paterson, ETH Zurich

Available Media

Go Ethereum is by far the most used Ethereum client. It originally implemented the Ethereum proof-of-work consensus mechanism, before the switch to proof-of-stake in 2022. We analyse the Go Ethereum implementation of chain synchronisation – the process through which a node first joining the network obtains the blockchain from its peers – in proof-of-work. We present three novel attacks that allow an adversary controlling a small fraction of the network mining power to induce synchronising nodes to deviate from consensus and eventually operate on an adversary-controlled version of the blockchain. We successfully implemented the attacks in a test network. We describe how the attacks can be leveraged to realise financial profits, through off-chain trading and via arbitrary code execution. Notably, the cheapest of our attacks can be mounted using a fraction of one GPU against both Ethereum Classic and EthereumPoW, two Ethereum forks still relying on the proof-of-work consensus mechanism and whose combined market capitalisation is around 3 billion USD. Our attacks would have also applied to the pre-Merge Ethereum mainnet during the period 2017 – 2022.

Token Spammers, Rug Pulls, and Sniper Bots: An Analysis of the Ecosystem of Tokens in Ethereum and in the Binance Smart Chain (BNB)

Federico Cernera, Massimo La Morgia, Alessandro Mei, and Francesco Sassi, Sapienza University of Rome

Available Media

In this work, we perform a longitudinal analysis of the BNB Smart Chain and Ethereum blockchain from their inception to March 2022. We study the ecosystem of the tokens and liquidity pools, highlighting analogies and differences between the two blockchains. We discover that about 60% of tokens are active for less than one day. Moreover, we find that 1% of addresses create an anomalous number of tokens (between 20% and 25%). We discover that these tokens are used as disposable tokens to perform a particular type of rug pull, which we call 1-day rug pull. We quantify the presence of this operation on both blockchains discovering its prevalence on the BNB Smart Chain. We estimate that 1-day rug pulls generated $240 million in profits. Finally, we present sniper bots, a new kind of trader bot involved in these activities, and we detect their presence and quantify their activity in the rug pull operations.

Automated Inference on Financial Security of Ethereum Smart Contracts

Wansen Wang and Wenchao Huang, University of Science and Technology of China; Zhaoyi Meng, Anhui University; Yan Xiong and Fuyou Miao, University of Science and Technology of China; Xianjin Fang, Anhui University of Science and Technology; Caichang Tu and Renjie Ji, University of Science and Technology of China

Available Media

Nowadays millions of Ethereum smart contracts are created per year and become attractive targets for financially motivated attackers. However, existing analyzers are not sufficient to analyze the financial security of a large number of contracts precisely. In this paper, we propose and implement FASVERIF, an automated inference system for fine-grained analysis of smart contracts. FASVERIF automatically generates models to be verified against security properties of smart contracts. Besides, different from existing approaches of formal verifications, our inference system also automatically generates the security properties. Specifically, we propose two types of security properties, invariant properties and equivalence properties, which can be used to detect various types of finance-related vulnerabilities and can be automatically generated based on our statistical analysis. As a result, FASVERIF can automatically process source code of smart contracts, and uses formal methods whenever possible to simultaneously maximize its accuracy. We also prove the soundness of verifying our properties using our translated model based on a custom semantics of Solidity.

We evaluate FASVERIF on a vulnerabilities dataset of 549 contracts by comparing it with other automatic tools. Our evaluation shows that FASVERIF greatly outperforms the representative tools using different technologies, with respect to accuracy and coverage of types of vulnerabilities. We also evaluate FASVERIF on a real-world dataset of 1700 contracts, and find 13 contracts with bugs that can still be leveraged by adversaries online.

Track 6

Supply Chains and Third-Party Code

Session Chair: Kevin Roundy, Norton Research

Platinum Salon 1–2

LibScan: Towards More Precise Third-Party Library Identification for Android Applications

Yafei Wu and Cong Sun, State Key Lab of ISN, School of Cyber Engineering, Xidian University, China; Dongrui Zeng, Palo Alto Networks, Inc., Santa Clara, CA, USA; Gang Tan, The Pennsylvania State University, University Park, PA, USA; Siqi Ma, University of New South Wales, Australia; Peicheng Wang, State Key Lab of ISN, School of Cyber Engineering, Xidian University, China

Available Media

Android apps pervasively use third-party libraries (TPLs) to reuse functionalities and improve development efficiency. Insufficient knowledge of TPL internals exposes developers and users to severe threats of security vulnerabilities. To mitigate such threats, people have proposed diversified approaches to identifying vulnerable or even malicious TPLs. However, the rich features of different modern obfuscators, including advanced repackaging, dead code removal, and control-flow randomization, have significantly impeded the precise detection of the TPLs. In this work, we propose a general-purpose TPL detection approach, LibScan. We first fingerprint code features to build the potential class correspondence relations between the app and TPL classes. Then, we use the method-opcode similarity and call-chain-opcode similarity to improve the accuracy of detected class correspondences. Moreover, we design early-stop criteria and reuse intermediate results to improve the efficiency of LibScan. In experiments, the evaluation with ground truths demonstrated the effectiveness of LibScan and its detection steps. We also applied LibScan to detect vulnerable TPLs in the top Google Play apps and large-scale wild apps, which shows the efficiency and scalability of our approach, as well as the potential of our approach as an auxiliary tool that helps malware detection.

Union under Duress: Understanding Hazards of Duplicate Resource Mismediation in Android Software Supply Chain

Xueqiang Wang, University of Central Florida; Yifan Zhang and XiaoFeng Wang, Indiana University Bloomington; Yan Jia, Nankai University; Luyi Xing, Indiana University Bloomington

Available Media

Malicious third-party libraries have become a major source of security risks to the Android software supply chain. A recent study shows that a malicious library could harvest data from other libraries hosted in the same app via unauthorized API accesses. However, it is unclear whether third-party libraries could still pose a threat to other libraries after their code and APIs are thoroughly vetted for security.

A third-party Android library often contains diverse resources to support its operations. These resources, along with resources from other libraries, are managed by the Android resource compiler (ARC) during the app build process. ARC needs to mediate the resources in case multiple libraries have duplicate resources.

In this paper, we report a new attack surface on the Android app supply chain: duplicate resource mismediation (Duress). This attack surface provides an opportunity for attackers to contaminate security- and privacy-sensitive resources of a victim library by exploiting ARC, using duplicate resources in malicious libraries. Our attack cases demonstrate that with several effective attack strategies, an attacker can stealthily mislead the victim library and its users into exposing sensitive data and weakening security protections. Further, we conduct the first systematic study to understand the impacts of Duress risks. Our study has brought to light the pervasiveness of the Duress risks in third-party libraries: an analysis of over 23K libraries and 150K apps discovered that 18.4% of libraries have sensitive resources that are exposed to Duress risks, 25.7% of libraries have duplicate sensitive resources with other libraries (i.e., integration risks), and over 400 apps in the wild are affected by potential occurrences of Duress. To mitigate the risks, we discuss a lightweight, compile-time resource isolation method to prevent malicious libraries from contaminating the sensitive resources of other libraries.

UVSCAN: Detecting Third-Party Component Usage Violations in IoT Firmware

Binbin Zhao, Georgia Institute of Technology and Zhejiang University; Shouling Ji and Xuhong Zhang, Zhejiang University; Yuan Tian, University of California, Los Angeles; Qinying Wang, Yuwen Pu, and Chenyang Lyu, Zhejiang University; Raheem Beyah, Georgia Institute of Technology

Available Media

Nowadays, IoT devices integrate a wealth of third-party components (TPCs) in firmware to shorten the development cycle. TPCs usually have strict usage specifications, e.g., checking the return value of a function. Violating the usage specifications of TPCs can cause serious consequences, e.g., NULL pointer dereference. Therefore, this massive amount of TPC integrations, if not properly implemented, will lead to pervasive vulnerabilities in IoT devices. Detecting vulnerabilities automatically in TPC integration is challenging from several perspectives: (1) There is a gap between the high-level specifications in TPC documents and the low-level implementations in IoT firmware. (2) IoT firmware is mostly distributed as closed-source binaries, which lose a lot of information during compilation from source code and target diverse architectures.

To address these challenges, we design and implement UVScan, an automated and scalable system to detect TPC usage violations in IoT firmware. In UVScan, we first propose a novel natural language processing (NLP)-based rule extraction framework, which extracts API specifications from inconsistently formatted TPC documents. We then design a rule-driven NLP-guided binary analysis engine, which maps the logical information from the high-level TPC document to the low-level binary, and detects TPC usage violations in IoT firmware across different architectures. We evaluate UVScan from four perspectives on four popular TPCs and six ground-truth datasets. The results show that UVScan achieves more than 70% precision and recall, and has a significant performance improvement compared with even the source-level API misuse detectors. To provide an in-depth status quo understanding of the TPC usage violation problem in IoT firmware, we conduct a large-scale analysis on 4,545 firmware images and detect 27,621 usage violations. Our further case studies, the Denial-of-Service attack and the Man-In-The-Middle attack on several firmware images, demonstrate the serious risks of TPC usage violations. Currently, 206 usage violations have been confirmed by vendors as vulnerabilities, and seven of them have been assigned CVE IDs with high severity.

Beyond Typosquatting: An In-depth Look at Package Confusion

Shradha Neupane, Worcester Polytechnic Institute; Grant Holmes, Elizabeth Wyss, and Drew Davidson, University of Kansas; Lorenzo De Carli, University of Calgary

Available Media

Package confusion incidents - where a developer is misled into importing a package other than the intended one - are one of the most severe issues in supply chain security with significant security implications, especially when the wrong package has malicious functionality. While the prevalence of the issue is generally well-documented, little work has studied the range of mechanisms by which confusion in a package name could arise or be employed by an adversary. In our work, we present the first comprehensive categorization of the mechanisms used to induce confusion, and we show how this understanding can be used for detection.

First, we use qualitative analysis to identify and rigorously define 13 categories of confusion mechanisms based on a dataset of 1200+ documented attacks. Results show that, while package confusion is thought to mostly exploit typing errors, in practice attackers use a variety of mechanisms, many of which work at semantic, rather than syntactic, level. Equipped with our categorization, we then define detectors for the discovered attack categories, and we evaluate them on the entire npm package set.

Evaluation of a sample, performed through an online survey, identifies a subset of highly effective detection rules which (i) return high-quality matches (77% matches marked as potentially or highly confusing, and 18% highly confusing) and (ii) generate low warning overhead (1 warning per 100M+ package pairs). Comparison with state-of-the-art reveals that the large majority of such pairs are not flagged by existing tools. Thus, our work has the potential to concretely improve the identification of confusable package names in the wild.
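As a rough illustration of what a name-based confusion detector might look like, the following minimal sketch flags a candidate package name that is either within one edit of a popular name (a typing error) or uses the same name tokens in a different order or with a different separator. The two rules, the threshold, and the function names are assumptions made for illustration; they are not the 13 categories or the detectors defined in the paper.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def name_tokens(name: str) -> list:
    """Split a package name on common separators and sort the tokens."""
    return sorted(t for t in re.split(r"[-_.]", name.lower()) if t)


def confusion_reasons(candidate: str, popular: str) -> list:
    """Return the (hypothetical) rules under which candidate resembles popular."""
    reasons = []
    if candidate == popular:
        return reasons
    if edit_distance(candidate, popular) <= 1:
        reasons.append("typo")
    if name_tokens(candidate) == name_tokens(popular):
        reasons.append("token-reorder")
    return reasons
```

For example, `confusion_reasons("parser-html", "html-parser")` reports only the token rule, a case that a purely edit-distance-based check would miss.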

SandDriller: A Fully-Automated Approach for Testing Language-Based JavaScript Sandboxes

Abdullah AlHamdan and Cristian-Alexandru Staicu, CISPA Helmholtz Center for Information Security

Available Media

Language-based isolation offers a cheap way to restrict the privileges of untrusted code. Previous work proposes a plethora of such techniques for isolating JavaScript code on the client-side, enabling the creation of web mashups. While these solutions are mostly out of fashion among practitioners, there is a growing trend to use analogous techniques for JavaScript code running outside of the browser, e.g., for protecting against supply chain attacks on the server-side. Irrespective of the use case, bugs in the implementation of language-based isolation can have devastating consequences. Hence, we propose SandDriller, the first dynamic analysis-based approach for detecting sandbox escape vulnerabilities. Our core insight is to design testing oracles based on two main objectives of language-based sandboxes: Prevent writes outside the sandbox and restrict access to privileged operations. Using instrumentation, we interpose oracle checks on all the references exchanged between the host and the guest code to detect foreign references that allow the guest code to escape the sandbox. If at run time, a foreign reference is detected by an oracle, SandDriller proceeds to synthesize an exploit for it. We apply our approach to six sandbox systems and find eight unique zero-day sandbox breakout vulnerabilities and two crashes. We believe that SandDriller can be integrated in the development process of sandboxes to detect security vulnerabilities in the pre-release phase.

12:00 pm–1:30 pm

Symposium Luncheon

Sponsored by Meta

Marquis Ballroom

1:30 pm–2:45 pm

Track 1

Cellular Networks

Session Chair: Yongdae Kim, KAIST

Platinum Salon 6

Instructions Unclear: Undefined Behaviour in Cellular Network Specifications

Daniel Klischies, Ruhr University Bochum; Moritz Schloegel and Tobias Scharnowski, CISPA Helmholtz Center for Information Security; Mikhail Bogodukhov, Independent; David Rupprecht, Radix Security; Veelasha Moonsamy, Ruhr University Bochum

Available Media

Cellular networks are a cornerstone of modern communication and indispensable to our daily lives. Their specifications span thousands of pages, written primarily in natural language. The ensuing complexity and lack of explicitness lead to underspecification, where only subsets of possible interactions are properly specified, while other behaviour is left undefined and open to interpretation by developers. In practice, this causes weird, unintended interactions in smartphone modems implementing the specification that, in the worst case, lead to security vulnerabilities.

In this work, we present the first generic approach for systematically discovering undefined behaviour in cellular specifications. Requiring solely a model of the behaviour defined in the specification, our technique extends this model to automatically reason about the presence of undefined behaviour. For each undefined behaviour, it automatically infers concrete examples as proof of existence. This not only allows improving the specification but also enables us to test smartphone modems. This way, we can verify whether an instance of undefined behaviour leads to a security vulnerability within modem firmware. With our approach, we identify 58 cases of undefined behaviour in LTE's Public Warning System, SMS, and Radio Resource Control specifications. Five of these cases resulted in previously unknown vulnerabilities that allow adversaries to read modem memory contents and perform remote Denial of Service attacks (in one case just via a single SMS) against commonly used smartphone modems. So far, two CVEs of high and one CVE of critical severity have been assigned.

MobileAtlas: Geographically Decoupled Measurements in Cellular Networks for Security and Privacy Research

Gabriel K. Gegenhuber, University of Vienna; Wilfried Mayer, SBA Research; Edgar Weippl, University of Vienna; Adrian Dabrowski, CISPA Helmholtz Center for Information Security

Available Media

Cellular networks are not merely data access networks to the Internet. Their distinct services and ability to form large complex compounds for roaming purposes make them an attractive research target in their own right. Their promise of providing a consistent service with comparable privacy and security across roaming partners falls apart at close inspection.

Thus, there is a need for controlled testbeds and measurement tools for cellular access networks doing justice to the technology's unique structure and global scope. Particularly, such measurements suffer from a combinatorial explosion of operators, mobile plans, and services. To cope with these challenges, we built a framework that geographically decouples the SIM from the cellular modem by selectively connecting both remotely. This allows testing any subscriber with any operator at any modem location within minutes without moving parts. The resulting GSM/UMTS/LTE measurement and testbed platform offers a controlled experimentation environment, which is scalable and cost-effective. The platform is extensible and fully open-sourced, allowing other researchers to contribute locations, SIM cards, and measurement scripts.

Using the above framework, our international experiments in commercial networks revealed exploitable inconsistencies in traffic metering, leading to multiple phreaking opportunities, i.e., fare-dodging. We also expose problematic IPv6 firewall configurations, hidden SIM card communication to the home network, and fingerprint dial progress tones to track victims across different roaming networks and countries with voice calls.

Eavesdropping Mobile App Activity via Radio-Frequency Energy Harvesting

Tao Ni, Shenzhen Research Institute, City University of Hong Kong, and Department of Computer Science, City University of Hong Kong; Guohao Lan, Department of Software Technology, Delft University of Technology; Jia Wang, College of Computer Science and Software Engineering, Shenzhen University; Qingchuan Zhao, Department of Computer Science, City University of Hong Kong; Weitao Xu, Shenzhen Research Institute, City University of Hong Kong, and Department of Computer Science, City University of Hong Kong

Available Media

Radio-frequency (RF) energy harvesting is a promising technology for Internet-of-Things (IoT) devices to power sensors and prolong battery life. In this paper, we present a novel side-channel attack that leverages RF energy harvesting signals to eavesdrop mobile app activities. To demonstrate this novel attack, we propose AppListener, an automated attack framework that recognizes fine-grained mobile app activities from harvested RF energy. The RF energy is harvested from a custom-built RF energy harvester which generates voltage signals from ambient Wi-Fi transmissions, and app activities are recognized from a three-tier classification algorithm. We evaluate AppListener with four mobile devices running 40 common mobile apps (e.g., YouTube, Facebook, and WhatsApp) belonging to five categories (i.e., video, music, social media, communication, and game); each category contains five application-specific activities. Experiment results show that AppListener achieves over 99% accuracy in differentiating four different mobile devices, over 98% accuracy in classifying 40 different apps, and 86.7% accuracy in recognizing five sets of application-specific activities. Moreover, a comprehensive study is conducted to show AppListener is robust to a number of impact factors, such as distance, environment, and non-target connected devices. Practices of integrating AppListener into commercial IoT devices also demonstrate that it is easy to deploy. Finally, countermeasures are presented as the first step to defend against this novel attack.

Sherlock on Specs: Building LTE Conformance Tests through Automated Reasoning

Yi Chen and Di Tang, Indiana University Bloomington; Yepeng Yao, {CAS-KLONAT, BKLONSPT}, Institute of Information Engineering, CAS, and School of Cyber Security, University of Chinese Academy of Sciences; Mingming Zha and Xiaofeng Wang, Indiana University Bloomington; Xiaozhong Liu, Worcester Polytechnic Institute; Haixu Tang, Indiana University Bloomington; Baoxu Liu, {CAS-KLONAT, BKLONSPT}, Institute of Information Engineering, CAS, and School of Cyber Security, University of Chinese Academy of Sciences

Available Media

Conformance tests are critical for finding security weaknesses in carrier network systems. However, building a conformance test procedure from specifications is challenging, as indicated by the slow progress made by the 3GPP, particularly in developing security-related tests, even with a large amount of resources already committed. A unique challenge in building the procedure is that a testing system often cannot directly invoke the condition event in a security requirement or directly observe the occurrence of the operation expected to be triggered by the event. Addressing this issue requires an event chain to be found, which once initiated leads to a chain reaction so the testing system can either indirectly trigger the target event or indirectly observe the occurrence of the expected event. To find a solution to this problem and make progress towards fully automated conformance test generation, we developed a new approach called Contester, which utilizes natural language processing and machine learning to build an event dependency graph from a 3GPP specification, and further performs automated reasoning on the graph to discover the event chains for a given security requirement. Such event chains are further converted by Contester into a conformance testing procedure, which is then executed by a testing system to evaluate the compliance of user equipment (UE) with the security requirement. Our evaluation shows that given 22 security requirements from the LTE NAS specifications, Contester successfully generated over a hundred test procedures in just 25 minutes. After running these procedures on 22 popular UEs including iPhone 13, Pixel 5a and IoT devices, our approach uncovered 197 security requirement violations, with 190 never reported before, exposing these devices to serious security risks such as MITM, fake base station and replay attacks.
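The idea of searching an event dependency graph for an event chain can be sketched as a shortest-path search. The sketch below assumes the graph is already available as a dictionary mapping each event to the events it can cause; the breadth-first search, the function name, and the event names in the usage note are hypothetical and do not reflect Contester's actual reasoning procedure or the 3GPP specification.

```python
from collections import deque


def find_event_chain(graph: dict, start: str, target: str):
    """Return a shortest list of events leading from start to target, or None.

    graph maps an event name to the list of events it can cause.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        event = path[-1]
        if event == target:
            return path
        for caused in graph.get(event, []):
            if caused not in seen:
                seen.add(caused)
                queue.append(path + [caused])
    return None  # the tester cannot reach the target from this event
```

With a made-up graph such as `{"send_request": ["start_timer"], "start_timer": ["timer_expiry"], "timer_expiry": ["release_connection"]}`, calling `find_event_chain(graph, "send_request", "release_connection")` returns the four-event chain that a tester could initiate in order to trigger the final event indirectly.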

BASECOMP: A Comparative Analysis for Integrity Protection in Cellular Baseband Software

Eunsoo Kim, Min Woo Baek, and CheolJun Park, KAIST; Dongkwan Kim, Samsung SDS; Yongdae Kim and Insu Yun, KAIST

Available Media

Baseband software is an important component in cellular communication. Unfortunately, it is almost impossible to implement baseband software correctly due to the complexity and the large volume of cellular specifications. As a result, dynamic testing has been widely used to discover implementation bugs in it. However, this approach suffers from the reachability problem, resulting in many missed bugs. Recently, BaseSpec proposed a static approach for analyzing baseband. However, BaseSpec requires heavy manual analysis and is limited to message decoding, failing to support integrity protection, the most critical step in mobile communication. In this paper, we propose a novel, semi-automated approach, BASECOMP, for analyzing integrity protection. To tame the complexity of baseband firmware, BASECOMP utilizes probabilistic inference to identify the integrity protection function. In particular, BASECOMP builds a factor graph from the firmware based on specifications and discovers the most probable function for integrity protection. Then, with additional manual analysis, BASECOMP performs symbolic analysis to validate that the function's behavior conforms to the specification and reports any discrepancies. We applied BASECOMP to 16 firmware images from two vendors (Samsung and MediaTek) in addition to srsRAN, an open-source 4G and 5G software radio suite. As a result, we discovered 29 bugs, including a NAS AKA bypass vulnerability in Samsung firmware that was assigned critical severity. Moreover, BASECOMP can narrow down the number of functions to be manually analyzed to 1.56 on average. This can significantly reduce the manual effort required for analysis, the primary limitation of the previous static analysis approach for baseband.

Track 2

Usability and User Perspectives

Session Chair: Rikke Bjerg Jensen, Royal Holloway, University of London

Platinum Salon 5

Investigating Verification Behavior and Perceptions of Visual Digital Certificates

Dañiel Gerhardt and Alexander Ponticello, CISPA Helmholtz Center for Information Security and Saarland University; Adrian Dabrowski and Katharina Krombholz, CISPA Helmholtz Center for Information Security

Available Media

This paper presents a qualitative study to explore how individuals perceive and verify visual digital certificates with QR codes. During the COVID-19 pandemic, such certificates have been used in the EU to provide standardized proof of vaccination.

We conducted semi-structured interviews with N=17 participants responsible for verifying COVID-19 certificates as part of their job. Using a two-fold thematic analysis approach, we, among other things, identified and classified multiple behavioral patterns, including inadequate reliance on visual cues as a proxy for proper digital verification.

We present design and structural recommendations based on our findings, including conceptual changes and improvements to storage and verification apps to limit shortcut opportunities. Our empirical findings are hence essential to improve the usability, robustness, and effectiveness of visual digital certificates and their verification.

"My Privacy for their Security": Employees' Privacy Perspectives and Expectations when using Enterprise Security Software

Jonah Stegman, Patrick J. Trottier, Caroline Hillier, and Hassan Khan, University of Guelph; Mohammad Mannan, Concordia University

Available Media

Employees are often required to use Enterprise Security Software (“ESS”) on corporate and personal devices. ESS products collect users’ activity data, including users’ location, applications used, and websites visited, operating from employees’ devices to the cloud. To the best of our knowledge, the privacy implications of this data collection have yet to be explored. We conduct an online survey (n=258) and a semi-structured interview (n=22) with ESS users to understand their privacy perceptions, the challenges they face when using ESS, and the ways they try to overcome those challenges. We found that while many participants reported receiving no information about what data their ESS collected, those who received some information often underestimated what was collected. Employees reported a lack of communication about various aspects of data collection, including the entities with access to the data and the scope of the data collected. We use the interviews to uncover several sources of misconceptions among the participants. Our findings show that while employees understand the need for data collection for security, the lack of communication and ambiguous data collection practices result in the erosion of employees’ trust in the ESS and employers. We obtain suggestions from participants on how to mitigate these misconceptions and collect feedback on our design mockups of a privacy notice and privacy indicators for ESS. Our work will help researchers, employers, and ESS developers protect users’ privacy in the growing ESS market.

Account Security Interfaces: Important, Unintuitive, and Untrustworthy

Alaa Daffalla and Marina Bohuk, Cornell University; Nicola Dell, Jacobs Institute Cornell Tech; Rosanna Bellini, Cornell University; Thomas Ristenpart, Cornell Tech

Distinguished Paper Award Winner

Available Media

Online services increasingly rely on user-facing interfaces to communicate important security-related account information—for example, which devices are logged into a user's account and when recent logins occurred. These are used to assess the security status of an account, which is particularly critical for at-risk users likely to be under active attack. To date, however, there has been no investigation into whether these interfaces work well.

We begin to fill this gap by partnering with a clinic that supports survivors of intimate partner violence (IPV). We investigated hundreds of transcripts to identify ones capturing interactions between clinic consultants and survivors seeking to infer the security status of survivor accounts, and we performed a qualitative analysis of 28 transcripts involving 19 consultants and 22 survivors. Our findings confirm the importance of these interfaces for assessing a user's security, but we also find that these interfaces suffer from a number of limitations that cause confusion and reduce their utility.

We go on to experimentally investigate the lack of integrity of information contained in device lists and session activity logs for four major services. For all the services investigated, we show how an attacker can either hide accesses entirely or spoof access details to hide illicit logins from victims.

Defining "Broken": User Experiences and Remediation Tactics When Ad-Blocking or Tracking-Protection Tools Break a Website’s User Experience

Alexandra Nisenoff, University of Chicago and Carnegie Mellon University; Arthur Borem, Madison Pickering, Grant Nakanishi, Maya Thumpasery, and Blase Ur, University of Chicago

Available Media

To counteract the ads and third-party tracking ubiquitous on the web, users turn to blocking tools—ad-blocking and tracking-protection browser extensions and built-in features. Unfortunately, blocking tools can cause non-ad, non-tracking elements of a website to degrade or fail, a phenomenon termed breakage. Examples include missing images, non-functional buttons, and pages failing to load. While the literature frequently discusses breakage, prior work has not systematically mapped and disambiguated the spectrum of user experiences subsumed under breakage, nor sought to understand how users experience, prioritize, and attempt to fix breakage. We fill these gaps. First, through qualitative analysis of 18,932 extension-store reviews and GitHub issue reports for ten popular blocking tools, we developed novel taxonomies of 38 specific types of breakage and 15 associated mitigation strategies. To understand subjective experiences of breakage, we then conducted a 95-participant survey. Nearly all participants had experienced various types of breakage, and they employed an array of strategies of variable effectiveness in response to specific types of breakage in specific contexts. Unfortunately, participants rarely notified anyone who could fix the root causes. We discuss how our taxonomies and results can improve the comprehensiveness and prioritization of ongoing attempts to automatically detect and fix breakage.

Cryptographic Deniability: A Multi-perspective Study of User Perceptions and Expectations

Tarun Kumar Yadav, Brigham Young University; Devashish Gosain, KU Leuven; Kent Seamons, Brigham Young University

Available Media

Cryptographic deniability allows a sender to deny authoring a message. However, it requires social and legal acceptance to be effective. Although popular secure messaging apps support deniability, security experts are divided on whether it should be the default property for these applications. This paper presents a multi-perspective, multi-methods study of user perceptions and expectations of deniability. The methodology includes (1) qualitative analysis of expert opinions obtained from a public forum on deniability, (2) qualitative analysis of semi-structured interviews of US participants, (3) quantitative analysis of a survey (n=664) of US participants, and (4) qualitative and quantitative analysis of US court cases with help from a legal expert to understand the legal standpoint of deniability. The results show that deniability is not socially accepted, and most users prefer non-repudiation. We found no US court cases involving WhatsApp that consider deniability. Significant human-centered research is needed before deniability can adequately protect vulnerable users.

Track 3

Entomology

Session Chair: Giancarlo Pellegrino, CISPA

Platinum Salon 7–8

Silent Bugs Matter: A Study of Compiler-Introduced Security Bugs

Jianhao Xu, Nanjing University; Kangjie Lu, University of Minnesota; Zhengjie Du, Zhu Ding, and Linke Li, Nanjing University; Qiushi Wu, University of Minnesota; Mathias Payer, EPFL; Bing Mao, Nanjing University

Available Media

Compilers assure that any produced optimized code is semantically equivalent to the original code. However, even "correct" compilers may introduce security bugs as security properties go beyond translation correctness. Security bugs introduced by such correct compiler behaviors can be disputable; compiler developers expect users to strictly follow language specifications and understand all assumptions, while compiler users may incorrectly assume that their code is secure. Such bugs are hard to find and prevent, especially when it is unclear whether they should be fixed on the compiler or user side. Nevertheless, these bugs are real and can be severe, thus should be studied carefully.

We perform a comprehensive study on compiler-introduced security bugs (CISB) and their root causes. We collect a large set of CISB in the wild by manually analyzing 4,827 potential bug reports of the most popular compilers (GCC and Clang), distilling them into a taxonomy of CISB. We further conduct a user study to understand how compiler users view compiler behaviors. Our study shows that compiler-introduced security bugs are common and may have serious security impacts. It is unrealistic to expect compiler users to understand and comply with compiler assumptions. For example, the "no-undefined-behavior" assumption has become a nightmare for users and a major cause of CISB.

A Bug's Life: Analyzing the Lifecycle and Mitigation Process of Content Security Policy Bugs

Gertjan Franken, Tom Van Goethem, Lieven Desmet, and Wouter Joosen, imec-DistriNet, KU Leuven

Distinguished Paper Award Winner

Available Media

The constantly evolving Web exerts a chronic pressure on the development and maintenance of the Content Security Policy (CSP), which stands as one of the primary security policies to mitigate attacks such as cross-site scripting. Indeed, to attain comprehensiveness, the policy must account for virtually every newly introduced browser feature, and every existing browser feature must be scrutinized upon extension of CSP functionality. Unfortunately, this undertaking's complexity has already led to critical implementational shortcomings, resulting in the security subversion of all CSP-employing websites.

In this paper, we present the first systematic analysis of CSP bug lifecycles, shedding new light on bug root causes. As such, we leverage our automated framework, BugHog, to evaluate the reproducibility of publicly disclosed bug proofs of concept in over 100,000 browser revisions. By considering the entire source code revision history since the introduction of CSP for Chromium and Firefox, we identified 123 unique introducing and fixing revisions for 75 CSP bugs. Our analysis shows that inconsistent handling of bugs led to the early public disclosure of three, and that the lifetime of several others could have been considerably decreased through adequate bug sharing between vendors. Finally, we propose solutions to improve current bug handling and response practices.

Remote Code Execution from SSTI in the Sandbox: Automatically Detecting and Exploiting Template Escape Bugs

Yudi Zhao, Yuan Zhang, and Min Yang, Fudan University

Available Media

Template engines are widely used in web applications to ease the development of user interfaces. The powerful capabilities provided by the template engines can be abused by attackers through server-side template injection (SSTI), enabling severe attacks on the server side, including remote code execution (RCE). Hence, modern template engines have provided a sandbox mode to prevent SSTI attacks from RCE.

In this paper, we study an overlooked sandbox bypass vulnerability in template engines, called template escape, that could elevate SSTI attacks to RCE. By escaping the template rendering process, template escape bugs can be used to inject executable code on the server side. Template escape bugs are subtle to detect and exploit, due to their dependencies on the template syntax and the template rendering logic. Consequently, little is known about their prevalence and severity in the real world. To this end, we conduct the first in-depth study on template escape bugs and present TEFuzz, an automatic tool to detect and exploit such bugs. By incorporating several new techniques, TEFuzz does not need to learn the template syntax and can generate PoCs and exploits for the discovered bugs. We apply TEFuzz to seven popular PHP template engines. In all, TEFuzz discovers 135 new template escape bugs and synthesizes RCE exploits for 55 bugs. Our study shows that template escape bugs are prevalent and pose severe threats.

Detecting API Post-Handling Bugs Using Code and Description in Patches

Miaoqian Lin, Kai Chen, and Yang Xiao, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China

Available Media

Program APIs must be used in accordance with their specifications. API post-handling (APH) is a common type of specification that deals with APIs' return checks, resource releases, etc. Violations of APH specifications (i.e., APH bugs) can cause serious security problems, including memory corruption, resource leaks, etc. API documents, as a good source of APH specifications, are often analyzed to extract specifications for APH bug detection. However, documents are not always complete, so many bugs go undetected. In this paper, we find that patches could be another good source of APH specifications. In addition to the code differences introduced by patches, patches also contain descriptions, which help to accurately extract APH specifications. In order to make bug detection accurate and efficient, we design an API specification-based graph for reducing the number of paths to be analyzed and perform partial path-sensitive analysis. We implement a prototype named APHP (API Post-Handling bugs detector using Patches) for static detection of APH bugs. We evaluate APHP on four popular open-source programs, including the Linux kernel, QEMU, Git and Redis, and detect 410 new bugs, outperforming existing state-of-the-art work. 216 of the bugs have been confirmed by the maintainers, and 2 CVEs have been assigned. Some bugs have existed for more than 12 years. To date, many submitted patches have been backported to long-term stable versions of the Linux kernel.

Place Your Locks Well: Understanding and Detecting Lock Misuse Bugs

Yuandao Cai, Peisen Yao, Chengfeng Ye, and Charles Zhang, The Hong Kong University of Science and Technology

Available Media

Modern multi-threaded software systems commonly leverage locks to prevent concurrency bugs. Nevertheless, due to the complexity of writing correct concurrent code, using locks is itself often error-prone. In this work, we investigate a general variety of lock misuses. Our characteristic study of existing CVE IDs reveals that lock misuses can inflict concurrency errors and even severe security issues, such as denial-of-service and memory corruption. To alleviate the threats, we present a practical static analysis framework, namely Lockpick, which consists of two core stages to effectively detect misused locks. More specifically, Lockpick first conducts path-sensitive typestate analysis, tracking lock-state transitions and interactions to identify sequential typestate violations. Guided by the preceding results, Lockpick then performs concurrency-aware detection to pinpoint various lock misuse errors, effectively reasoning about the thread interleavings of interest. The results are encouraging: we have used Lockpick to uncover 203 unique and confirmed lock misuses across a broad spectrum of impactful open-source systems, such as OpenSSL, the Linux kernel, PostgreSQL, MariaDB, FFmpeg, Apache HTTPd, and FreeBSD. Three exciting results are that those confirmed lock misuses are long-latent, hiding for 7.4 years on average; in total, 16 CVE IDs have been assigned for the severe errors uncovered; and Lockpick can flag many real bugs missed by the previous tools with significantly fewer false positives.

Track 4

Adversarial Examples

Session Chair: Birhanu Eshete, University of Michigan, Dearborn

Platinum Salon 9–10

The Space of Adversarial Strategies

Ryan Sheatsley, Blaine Hoak, Eric Pauley, and Patrick McDaniel, University of Wisconsin-Madison

Available Media

Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet, our understanding of this phenomenon stems from a rather fragmented pool of knowledge; at present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on: both robust and non-robust models, seven datasets, and three extended p-based threat models incorporating compute costs, formalizing the Space of Adversarial Strategies. From our evaluation, we find attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain & threat models, and (2) go beyond the handful of known attacks used today.

“Security is not my field, I’m a stats guy”: A Qualitative Root Cause Analysis of Barriers to Adversarial Machine Learning Defenses in Industry

Jaron Mink, University of Illinois at Urbana-Champaign; Harjot Kaur, Leibniz University Hannover; Juliane Schmüser and Sascha Fahl, CISPA Helmholtz Center for Information Security; Yasemin Acar, Paderborn University and George Washington University

Available Media

Adversarial machine learning (AML) has the potential to leak training data, force arbitrary classifications, and greatly degrade overall performance of machine learning models, all of which academics and companies alike consider as serious issues. Despite this, seminal work has found that most organizations insufficiently protect against such threats. While the lack of defenses to AML is most commonly attributed to missing knowledge, it is unknown why mitigations are unrealized in industry projects. To better understand the reasons behind the lack of deployed AML defenses, we conduct semi-structured interviews (n=21) with data scientists and data engineers to explore what barriers impede the effective implementation of such defenses. We find that practitioners’ ability to deploy defenses is hampered by three primary factors: a lack of institutional motivation and educational resources for these concepts, an inability to adequately assess their AML risk and make subsequent decisions, and organizational structures and goals that discourage implementation in favor of other objectives. We conclude by discussing practical recommendations for companies and practitioners to be made more aware of these risks, and better prepared to respond.

X-Adv: Physical Adversarial Object Attacks against X-ray Prohibited Item Detection

Aishan Liu and Jun Guo, Beihang University; Jiakai Wang, Zhongguancun Laboratory; Siyuan Liang, Chinese Academy of Sciences; Renshuai Tao, Beihang University; Wenbo Zhou, University of Science and Technology of China; Cong Liu, iFLYTEK; Xianglong Liu, Beihang University, Zhongguancun Laboratory, and Hefei Comprehensive National Science Center; Dacheng Tao, JD Explore Academy

Available Media

Adversarial attacks are valuable for evaluating the robustness of deep learning models. Existing attacks are primarily conducted on the visible light spectrum (e.g., pixel-wise texture perturbation). However, attacks targeting texture-free X-ray images remain underexplored, despite the widespread application of X-ray imaging in safety-critical scenarios such as the X-ray detection of prohibited items. In this paper, we take the first step toward the study of adversarial attacks targeted at X-ray prohibited item detection, and reveal the serious threats posed by such attacks in this safety-critical scenario. Specifically, we posit that successful physical adversarial attacks in this scenario should be specially designed to circumvent the challenges posed by color/texture fading and complex overlapping. To this end, we propose X-Adv to generate physically printable metals that act as an adversarial agent capable of deceiving X-ray detectors when placed in luggage. To resolve the issues associated with color/texture fading, we develop a differentiable converter that facilitates the generation of 3D-printable objects with adversarial shapes, using the gradients of a surrogate model rather than directly generating adversarial textures. To place the printed 3D adversarial objects in luggage with complex overlapped instances, we design a policy-based reinforcement learning strategy to find locations eliciting strong attack performance in worst-case scenarios whereby the prohibited items are heavily occluded by other items. To verify the effectiveness of the proposed X-Adv, we conduct extensive experiments in both the digital and the physical world (employing a commercial X-ray security inspection system for the latter case). Furthermore, we present the physical-world X-ray adversarial attack dataset XAD. We hope this paper will draw more attention to the potential threats targeting safety-critical scenarios. 
Our code and the XAD dataset are available at https://github.com/DIG-Beihang/X-adv.

SMACK: Semantically Meaningful Adversarial Audio Attack

Zhiyuan Yu, Yuanhaur Chang, and Ning Zhang, Washington University in St. Louis; Chaowei Xiao, Arizona State University

Available Media

Voice controllable systems rely on speech recognition and speaker identification as the key enabling technologies. While they bring revolutionary changes to our daily lives, their security has become a growing concern. Existing work has demonstrated the feasibility of using maliciously crafted perturbations to manipulate speech or speaker recognition. Although these attacks vary in targets and techniques, they all require the addition of noise perturbations. While these perturbations are generally restricted to an Lp-bounded neighborhood, the added noise inevitably leaves unnatural traces that are recognizable by humans and can be used for defense. To address this limitation, we introduce a new class of adversarial audio attack, named Semantically Meaningful Adversarial Audio AttaCK (SMACK), where the inherent speech attributes (such as prosody) are modified such that they still semantically represent the same speech and preserve the speech quality. The efficacy of SMACK was evaluated against five transcription systems and two speaker recognition systems in a black-box manner. By manipulating semantic attributes, our adversarial audio examples are capable of evading the state-of-the-art defenses, with better speech naturalness compared to traditional Lp-bounded attacks in the human perceptual study.

URET: Universal Robustness Evaluation Toolkit (for Evasion)

Kevin Eykholt, Taesung Lee, Douglas Schales, Jiyong Jang, and Ian Molloy, IBM Research; Masha Zorin, University of Cambridge

Available Media

Machine learning models are known to be vulnerable to adversarial evasion attacks, as illustrated by image classification models. Thoroughly understanding such attacks is critical in order to ensure the safety and robustness of critical AI tasks. However, most evasion attacks are difficult to deploy against a majority of AI systems because they have focused on the image domain with only a few constraints. An image is composed of homogeneous, numerical, continuous, and independent features, unlike many other input types to AI systems used in practice. Furthermore, some input types include additional semantic and functional constraints that must be observed to generate realistic adversarial inputs. In this work, we propose a new framework to enable the generation of adversarial inputs irrespective of the input type and task domain. Given an input and a set of pre-defined input transformations, our framework discovers a sequence of transformations that results in a semantically correct and functional adversarial input. We demonstrate the generality of our approach on several diverse machine learning tasks with various input representations. We also show the importance of generating adversarial examples as they enable the deployment of mitigation techniques.

Track 5

Private Record Access

Session Chair: Wouter Lueks, CISPA

Platinum Salon 3–4

Authenticated private information retrieval

Simone Colombo, EPFL; Kirill Nikitin, Cornell Tech; Henry Corrigan-Gibbs, MIT; David J. Wu, UT Austin; Bryan Ford, EPFL

Available Media

This paper introduces protocols for authenticated private information retrieval. These schemes enable a client to fetch a record from a remote database server such that (a) the server does not learn which record the client reads, and (b) the client either obtains the "authentic" record or detects server misbehavior and safely aborts. Both properties are crucial for many applications. Standard private-information-retrieval schemes either do not ensure this form of output authenticity, or they require multiple database replicas with an honest majority. In contrast, we offer multi-server schemes that protect security as long as at least one server is honest. Moreover, if the client can obtain a short digest of the database out of band, then our schemes require only a single server. Performing an authenticated private PGP-public-key lookup on an OpenPGP key server's database of 3.5 million keys (3 GiB), using two non-colluding servers, takes under 1.2 core-seconds of computation, essentially matching the time taken by unauthenticated private information retrieval. Our authenticated single-server schemes are 30-100× more costly than state-of-the-art unauthenticated single-server schemes, though they achieve incomparably stronger integrity properties.
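For readers unfamiliar with the baseline, the following is a minimal sketch of the classic two-server XOR-based PIR, not the authenticated schemes introduced in the paper. The function names and the toy database are hypothetical. It illustrates property (a), since each server sees only a uniformly random-looking bit vector, and it shows why property (b) needs extra machinery: a single tampered answer silently yields a wrong record.

```python
import secrets

def make_queries(n, index):
    """Client: split a request for `index` into two bit vectors.
    Each vector alone is uniformly random; they differ only at `index`."""
    q0 = [secrets.randbelow(2) for _ in range(n)]
    q1 = q0.copy()
    q1[index] ^= 1
    return q0, q1

def answer(database, query):
    """Server: XOR together the (equal-length) records selected by the query bits."""
    acc = bytes(len(database[0]))
    for record, bit in zip(database, query):
        if bit:
            acc = bytes(a ^ b for a, b in zip(acc, record))
    return acc

def reconstruct(a0, a1):
    """Client: XOR of the two answers cancels everything except the wanted record."""
    return bytes(x ^ y for x, y in zip(a0, a1))

# Toy database of fixed-length records (hypothetical data).
db = [b"key0", b"key1", b"key2", b"key3"]
q0, q1 = make_queries(len(db), 2)
a0, a1 = answer(db, q0), answer(db, q1)
record = reconstruct(a0, a1)

# A misbehaving server flips one bit; this unauthenticated client cannot tell.
bad_a0 = bytes([a0[0] ^ 1]) + a0[1:]
forged = reconstruct(bad_a0, a1)
```

Here `record` is the requested entry, while `forged` differs from it without any error being raised, which is the gap that output authenticity is meant to close.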

Don’t be Dense: Efficient Keyword PIR for Sparse Databases

Sarvar Patel and Joon Young Seo, Google; Kevin Yeo, Google and Columbia University

Distinguished Paper Award Winner

Available Media

In this paper, we introduce SparsePIR, a single-server keyword private information retrieval (PIR) construction that enables querying over sparse databases. At its core, SparsePIR is based on a novel encoding algorithm that encodes sparse database entries as linear combinations while being compatible with important PIR optimizations including recursion. SparsePIR achieves response overhead that is half that of state-of-the-art keyword PIR schemes without requiring long-term client storage of linear-sized mappings. We also introduce two variants, SparsePIRg and SparsePIRc, that further reduce the size of the serving database at the cost of increased encoding time and small additional client storage, respectively. Our frameworks enable performing keyword PIR with, essentially, the same costs as standard PIR. Finally, we also show that SparsePIR may be used to build batch keyword PIR with halved response overhead without any client mappings.

GigaDORAM: Breaking the Billion Address Barrier

Brett Falk, University of Pennsylvania; Rafail Ostrovsky, Matan Shtepel, and Jacob Zhang, University of California, Los Angeles

Available Media

We design and implement GigaDORAM, a novel 3-server Distributed Oblivious Random Access Memory (DORAM) protocol. Oblivious RAM allows a client to read and write to memory on an untrusted server, while ensuring the server itself learns nothing about the client's access pattern. Distributed Oblivious RAM (DORAM) allows a group of servers to efficiently access a secret-shared array at a secret-shared index.

A recent generation of DORAM implementations (e.g., FLORAM, DuORAM) has focused on building DORAM protocols based on Function Secret-Sharing (FSS). These protocols have low communication complexity and low round complexity, but the servers' computational complexity is linear. Thus, they work for moderate-size databases, but at a certain size these FSS-based protocols become computationally inefficient.

In this work, we introduce GigaDORAM, a hierarchical-solution-based DORAM featuring poly-logarithmic computation and communication, but with an over 100× reduction in rounds per query compared to previous hierarchical DORAM protocols. In our implementation, we show that for moderate to large databases where FSS-based solutions become computation bound, our protocol is orders of magnitude more efficient than the best existing DORAM protocols. When N = 2^31, our DORAM is able to perform over 700 queries per second.

One Server for the Price of Two: Simple and Fast Single-Server Private Information Retrieval

Alexandra Henzinger, Matthew M. Hong, and Henry Corrigan-Gibbs, MIT; Sarah Meiklejohn, Google; Vinod Vaikuntanathan, MIT

Available Media

We present SimplePIR, the fastest single-server private information retrieval scheme known to date. SimplePIR’s security holds under the learning-with-errors assumption. To answer a client’s query, the SimplePIR server performs fewer than one 32-bit multiplication and one 32-bit addition per database byte. SimplePIR achieves 10 GB/s/core server throughput, which approaches the memory bandwidth of the machine and the performance of the fastest two-server private-information-retrieval schemes (which require non-colluding servers). SimplePIR has relatively large communication costs: to make queries to a 1 GB database, the client must download a 121 MB "hint" about the database contents; thereafter, the client may make an unbounded number of queries, each requiring 242 KB of communication. We present a second single-server scheme, DoublePIR, that shrinks the hint to 16 MB at the cost of slightly higher per-query communication (345 KB) and slightly lower throughput (7.4 GB/s/core). Finally, we apply our new private-information-retrieval schemes, together with a novel data structure for approximate set membership, to the task of private auditing in Certificate Transparency. We achieve a strictly stronger notion of privacy than Google Chrome’s current approach with 13x more communication: 16 MB of download per week, along with 1.5 KB per TLS connection.
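The communication figures quoted in this abstract can be compared with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 MB = 1000 KB), assumes the DoublePIR figures refer to the same 1 GB database as the SimplePIR figures, and uses hypothetical helper names that do not come from the paper.

```python
def total_kb(hint_mb, per_query_kb, queries):
    """Total client communication in KB: one-time hint plus per-query traffic.
    Assumes 1 MB = 1000 KB."""
    return hint_mb * 1000 + per_query_kb * queries

def break_even_queries():
    """Number of queries q at which both schemes cost the same:
    121000 + 242*q == 16000 + 345*q."""
    return (121 - 16) * 1000 / (345 - 242)

# Figures taken from the abstract: (hint in MB, per-query cost in KB).
SIMPLEPIR = (121, 242)
DOUBLEPIR = (16, 345)

for q in (1, 100, 1019, 1020, 10_000):
    simple = total_kb(*SIMPLEPIR, q)
    double = total_kb(*DOUBLEPIR, q)
    print(f"{q:>6} queries: SimplePIR {simple} KB, DoublePIR {double} KB")
```

Under these assumptions, DoublePIR's smaller hint gives it lower total communication for roughly the first thousand queries, after which SimplePIR's cheaper queries win out.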

Duoram: A Bandwidth-Efficient Distributed ORAM for 2- and 3-Party Computation

Adithya Vadapalli, University of Waterloo; Ryan Henry, University of Calgary; Ian Goldberg, University of Waterloo

Available Media

We design, analyze, and implement Duoram, a fast and bandwidth-efficient distributed ORAM protocol suitable for secure 2- and 3-party computation settings. Following Doerner and shelat's Floram construction (CCS 2017), Duoram leverages (2,2)-distributed point functions (DPFs) to represent PIR and PIR-writing queries compactly, but with a host of innovations that yield massive asymptotic reductions in communication cost and notable speedups in practice, even for modestly sized instances. Specifically, Duoram introduces a novel method for evaluating dot products of certain secret-shared vectors using communication that is only logarithmic in the vector length. As a result, for memories with n addressable locations, Duoram can perform a sequence of m arbitrarily interleaved reads and writes using just O(m lg n) words of communication, compared with Floram's O(mn) words. Moreover, most of this work can occur during a data-independent preprocessing phase, leaving just O(m) words of online communication cost for the sequence, i.e., a constant online communication cost per memory access.

Track 6

It’s All Fun and Games Until...

Session Chair: Tadayoshi Kohno, University of Washington

Platinum Salon 1–2

A Peek into the Metaverse: Detecting 3D Model Clones in Mobile Games

Chaoshun Zuo, Chao Wang, and Zhiqiang Lin, The Ohio State University

Available Media

3D models are indispensable assets in the metaverse in general and mobile games in particular. Yet, these 3D models can be readily extracted, duplicated, or cloned, a reality that poses a considerable threat. Although instances of games duplicating 3D models from other games have been documented, the pervasiveness of this issue remains unexplored. In this paper, we undertake the first systematic investigation of 3D model cloning within mobile games. However, multiple challenges have to be addressed for clone detection, including scalability, precision, and robustness. Our solution is 3DSCAN, an open source 3D Scanning tool for Clone Assessment and Notification. We have evaluated 3DSCAN with about 12.2 million static 3D models and 2.5 million animated 3D models extracted from 176K mobile games. With these 3D models, 3DSCAN determined that 63.03% of the static models are likely cloned ones (derived from 1,046,632 distinct models), and 37.07% of the animated 3D models are likely cloned ones (derived from 180,174 distinct models). With a heuristic-based clone detection algorithm, 3DSCAN finally detected 5,238 mobile games likely containing unauthorized 3D model clones.

PATROL: Provable Defense against Adversarial Policy in Two-player Games

Wenbo Guo, UC Berkeley; Xian Wu, Northwestern University; Lun Wang, UC Berkeley; Xinyu Xing, Northwestern University; Dawn Song, UC Berkeley

Available Media

Recent advances in deep reinforcement learning (DRL) have taken artificial intelligence to the next level, from making individual decisions to accomplishing sophisticated tasks via sequential decision making, such as defeating world-class human players in various games and making real-time trading decisions in stock markets. Following these achievements, we have recently witnessed a new attack specifically designed against DRL. Recent research shows that, by learning and controlling an adversarial agent/policy, an attacker could quickly discover a victim agent's weaknesses and thus force it to fail its task.

Due to differences in the threat model, most existing defenses proposed for deep neural networks (DNN) cannot be migrated to train robust policies against adversarial policy attacks. In this work, we draw insights from classical game theory and propose the first provable defense against such attacks in two-player competitive games. Technically, we first model the robust policy training problem as finding the Nash equilibrium (NE) point in the entire policy space. Then, we design a novel policy training method to search for the NE point in complicated DRL tasks. Finally, we theoretically prove that our proposed method could guarantee the lower-bound performance of the trained agents against arbitrary adversarial policy attacks. Through extensive evaluations, we demonstrate that our method significantly outperforms existing policy training methods in adversarial robustness and performance in non-adversarial settings.

The Blockchain Imitation Game

Kaihua Qin, Imperial College London, RDI; Stefanos Chaliasos, Imperial College London; Liyi Zhou, Imperial College London, RDI; Benjamin Livshits, Imperial College London; Dawn Song, UC Berkeley, RDI; Arthur Gervais, University College London, RDI

Available Media

The use of blockchains for automated and adversarial trading has become commonplace. However, due to the transparent nature of blockchains, an adversary is able to observe any pending, not-yet-mined transactions, along with their execution logic. This transparency further enables a new type of adversary, which copies and front-runs profitable pending transactions in real-time, yielding significant financial gains.

Shedding light on such "copy-paste" malpractice, this paper introduces the Blockchain Imitation Game and proposes a generalized imitation attack methodology called Ape. Leveraging dynamic program analysis techniques, Ape supports the automatic synthesis of adversarial smart contracts. Over a timeframe of one year (1st of August, 2021 to 31st of July, 2022), Ape could have yielded 148.96M USD in profit on Ethereum, and 42.70M USD on BNB Smart Chain (BSC).

Beyond its use as a malicious attack, we further show the potential of transaction and contract imitation as a defensive strategy. Within one year, we find that Ape could have successfully imitated 13 and 22 known DeFi attacks on Ethereum and BSC, respectively. Our findings suggest that blockchain validators can imitate attacks in real-time to prevent intrusions in DeFi.

It's all in your head(set): Side-channel attacks on AR/VR systems

Yicheng Zhang, Carter Slocum, Jiasi Chen, and Nael Abu-Ghazaleh, University of California, Riverside

Available Media

With the increasing adoption of Augmented Reality/Virtual Reality (AR/VR) systems, security and privacy concerns attract attention from both academia and industry. This paper demonstrates that AR/VR systems are vulnerable to side-channel attacks launched from software; a malicious application without any special permissions can infer private information about user interactions, other concurrent applications, or even the surrounding world. We develop a number of side-channel attacks targeting different types of private information. Specifically, we demonstrate three attacks on the victim's interactions, successfully recovering hand gestures, voice commands made by victims, and keystrokes on a virtual keyboard, with accuracy exceeding 90%. We also demonstrate an application fingerprinting attack where the spy is able to identify an application being launched by the victim. The final attack demonstrates that the adversary can perceive a bystander in the real-world environment and estimate the bystander's distance with Mean Absolute Error (MAE) of 10.3 cm. We believe the threats presented by our attacks are pressing; they expand our understanding of the threat model faced by these emerging systems and inform the development of new AR/VR systems that are resistant to these threats.

Egg Hunt in Tesla Infotainment: A First Look at Reverse Engineering of Qt Binaries

Haohuang Wen and Zhiqiang Lin, The Ohio State University

Available Media

As one of the most popular C++ extensions for developing graphical user interface (GUI) based applications, Qt has been widely used in desktops, mobile devices, IoT devices, automobiles, etc. Although existing binary analysis platforms (e.g., angr and Ghidra) could help reverse engineer Qt binaries, they still need to address many fundamental challenges such as the recovery of control flow graphs and symbols. In this paper, we take a first look at understanding the unique challenges and opportunities in Qt binary analysis, developing enabling techniques, and demonstrating novel applications. In particular, although callbacks make control flow recovery challenging, we notice that Qt’s signal and slot mechanism can be used to recover function callbacks. More interestingly, Qt’s unique dynamic introspection can also be repurposed to recover semantic symbols. Based on these insights, we develop QtRE for function callback and semantic symbol recovery for Qt binaries. We have tested QtRE with two suites of Qt binaries: Linux KDE and the Tesla Model S firmware, where QtRE additionally recovered 10,867 callback instances and 24,973 semantic symbols from 123 binaries, which cannot be identified by existing tools. We demonstrate a novel application of using QtRE to extract hidden commands from a Tesla Model S firmware. QtRE discovered 12 hidden commands including five unknown to the public, which can potentially be exploited to manipulate vehicle settings.

2:45 pm–3:15 pm

Break with Refreshments

Platinum Foyer

3:15 pm–4:30 pm

Track 1

Enclaves and Serverless Computing

Session Chair: Thorsten Holz, CISPA

Platinum Salon 6

Reusable Enclaves for Confidential Serverless Computing

Shixuan Zhao, The Ohio State University; Pinshen Xu, Southern University of Science and Technology; Guoxing Chen, Shanghai Jiao Tong University; Mengya Zhang, The Ohio State University; Yinqian Zhang, Southern University of Science and Technology; Zhiqiang Lin, The Ohio State University

Available Media

The recent development of Trusted Execution Environment has brought unprecedented opportunities for confidential computing within cloud-based systems. Among various popular cloud business models, serverless computing has gained dominance since its emergence, leading to a high demand for confidential serverless computing services based on trusted enclaves. However, the issue of cold start overhead significantly hinders its performance, as new enclaves need to be created to ensure a clean and verifiable execution environment. In this paper, we propose a novel approach for constructing reusable enclaves that enable rapid enclave reset and robust security with three key enabling techniques: enclave snapshot and rewinding, nested attestation, and multi-layer intra-enclave compartmentalisation. We have built a prototype system for confidential serverless computing, integrating OpenWhisk and a WebAssembly runtime, which significantly reduces the cold start overhead in an end-to-end serverless setting while imposing a reasonable performance impact on standard execution.

EnigMap: External-Memory Oblivious Map for Secure Enclaves

Afonso Tinoco, Sixiang Gao, and Elaine Shi, CMU

Available Media

Imagine that a privacy-conscious client would like to query a key-value store residing on an untrusted server equipped with a secure processor. To protect the privacy of the client's queries as well as the database, one approach is to implement an oblivious map inside a secure enclave. Indeed, earlier works demonstrated numerous applications of an enclave-based oblivious map, including private contact discovery, key transparency, and secure outsourced databases.

Our work is motivated by the observation that the previous enclave implementations of oblivious algorithms are suboptimal both asymptotically and concretely. We make the key observation that for enclave applications, the number of page swaps should be a primary performance metric. We therefore adopt techniques from the external-memory algorithms literature, and we are the first to implement such algorithms inside hardware enclaves. We also devise asymptotically better algorithms for ensuring a strong notion of obliviousness that resists cache-timing attacks. We complement our algorithmic improvements with various concrete optimizations that save constant factors in practice. The resulting system, called ENIGMAP, achieves a 15× speedup over Signal's linear scan implementation, and a 53× speedup over the prior best oblivious algorithm implementation, at a realistic database size of 256 million and a batch size of 1000. The speedup is asymptotic in nature and will be even greater as Signal's user base grows.

AEX-Notify: Thwarting Precise Single-Stepping Attacks through Interrupt Awareness for Intel SGX Enclaves

Scott Constable, Intel Corporation; Jo Van Bulck, imec-DistriNet, KU Leuven; Xiang Cheng, Georgia Institute of Technology; Yuan Xiao, Cedric Xing, and Ilya Alexandrovich, Intel Corporation; Taesoo Kim, Georgia Institute of Technology; Frank Piessens, imec-DistriNet, KU Leuven; Mona Vij, Intel Corporation; Mark Silberstein, Technion

Available Media

Intel® Software Guard Extensions (Intel® SGX) supports the creation of shielded enclaves within unprivileged processes. While enclaves are architecturally protected against malicious system software, Intel SGX's privileged attacker model could potentially expose enclaves to new powerful side-channel attacks. In this paper, we consider hardware-software co-design countermeasures to an important class of single-stepping attacks that use privileged timer interrupts to precisely step through enclave execution exactly one instruction at a time, as supported, e.g., by the open-source SGX-Step framework. This is a powerful deterministic attack primitive that has been employed in a broad range of high-resolution Intel SGX attacks, but so far remains unmitigated.

We propose AEX-Notify, a flexible hardware ISA extension that makes enclaves interrupt aware: enclaves can register a trusted handler to be run after an interrupt or exception. AEX-Notify can be used as a building block for implementing countermeasures against different types of interrupt-based attacks in software. With our primary goal to thwart deterministic single-stepping, we first diagnose the underlying hardware behavior to determine the root cause that enables it. We then apply the learned insights to remove this root cause by building an efficient software handler and constant-time disassembler to transparently determine and atomically prefetch the working set of the next enclave application instruction.

The ISA extension we propose in this paper has been incorporated into a revised version of the Intel SGX specification.

Controlled Data Races in Enclaves: Attacks and Detection

Sanchuan Chen, Fordham University; Zhiqiang Lin, The Ohio State University; Yinqian Zhang, Southern University of Science and Technology

Available Media

This paper introduces controlled data race attacks, a new class of attacks against programs guarded by trusted execution environments such as Intel SGX. Controlled data race attacks are analogous to controlled channel attacks, where the adversary controls the underlying operating system and manipulates the scheduling of enclave threads and the handling of interrupts and exceptions. Controlled data race attacks are of particular interest for two reasons: First, traditionally non-deterministic data race bugs can be triggered deterministically and exploited for security violations in the context of SGX enclaves. Second, an enclave intended to be single-threaded can be concurrently invoked by the adversary, which triggers unique interleaving patterns that would not occur in traditional settings. To detect controlled data race vulnerabilities in real-world enclave binaries (including the code linked with the SGX libraries), we present a lockset-based binary analysis detection algorithm. We have implemented our algorithm in a tool named SGXRacer, and evaluated it with four SGX SDKs and eight open-source SGX projects, identifying 1,780 data races originating from 476 shared variables.
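
For readers unfamiliar with the term, the general idea behind lockset-based race detection can be sketched as follows. This is a minimal illustration of the classic lockset refinement idea applied to a hypothetical, hand-written event trace; it is not SGXRacer's binary analysis, and the function name `find_lockset_races` and the trace format are assumptions made for this example only.

```python
from collections import defaultdict


def find_lockset_races(trace):
    """Report variables that may race, using lockset refinement.

    `trace` is a hypothetical list of (thread, op, target) tuples, where
    op is one of "acquire", "release", "read", or "write".
    """
    held = defaultdict(set)       # thread -> locks it currently holds
    candidates = {}               # variable -> locks held on every access so far
    accessors = defaultdict(set)  # variable -> threads that touched it
    written = set()               # variables written at least once

    for thread, op, target in trace:
        if op == "acquire":
            held[thread].add(target)
        elif op == "release":
            held[thread].discard(target)
        else:
            # Refine the candidate lockset: keep only locks held on this access too.
            if target in candidates:
                candidates[target] &= held[thread]
            else:
                candidates[target] = set(held[thread])
            accessors[target].add(thread)
            if op == "write":
                written.add(target)

    # A variable is suspicious if no common lock protects it, more than one
    # thread accessed it, and at least one access was a write.
    return sorted(
        var for var, locks in candidates.items()
        if not locks and len(accessors[var]) > 1 and var in written
    )


example_trace = [
    ("t1", "acquire", "m"), ("t1", "write", "counter"), ("t1", "release", "m"),
    ("t2", "write", "counter"),  # second thread writes without holding "m"
    ("t1", "acquire", "m"), ("t1", "write", "total"), ("t1", "release", "m"),
    ("t2", "acquire", "m"), ("t2", "read", "total"), ("t2", "release", "m"),
]
print(find_lockset_races(example_trace))
```

In this sketch, `counter` is flagged because one of its accesses holds no lock, while `total` is consistently protected by `m`. A real detector working on enclave binaries would have to recover threads, locks, and shared variables from machine code, which this example does not attempt.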

Guarding Serverless Applications with Kalium

Deepak Sirone Jegan, University of Wisconsin-Madison; Liang Wang, Princeton University; Siddhant Bhagat, Microsoft; Michael Swift, University of Wisconsin-Madison

Available Media

As an emerging application paradigm, serverless computing attracts attention from more and more adversaries. Unfortunately, security tools for conventional web applications cannot be easily ported to serverless computing due to its distributed nature, and existing serverless security solutions focus on enforcing user-specified information flow policies, which are unable to detect the manipulation of the order of functions in application control flow paths. In this paper, we present Kalium, an extensible security framework that leverages local function state and global application state to enforce control-flow integrity (CFI) in serverless applications. We evaluate the performance overhead and security of Kalium using realistic open-source applications; our results show that Kalium mitigates several classes of attacks with relatively low performance overhead and outperforms the state-of-the-art serverless information flow protection systems.

Track 2

Email and Phishing

Session Chair: Sebastian Schinzel, Münster University of Applied Sciences

Platinum Salon 5

“To Do This Properly, You Need More Resources”: The Hidden Costs of Introducing Simulated Phishing Campaigns

Lina Brunken, Annalina Buckmann, Jonas Hielscher, and M. Angela Sasse, Ruhr University Bochum

Available Media

Many organizations use phishing simulation campaigns to raise and measure their employees' security awareness. They can create their own campaigns, or buy phishing-as-a-service from commercial providers; however, evaluations of their effectiveness in reducing the vulnerability to such attacks have produced mixed results. Recently, researchers have pointed out "hidden costs", such as reduced productivity and employee trust. What has not been investigated is the cost involved in preparing an organization for a simulated phishing campaign. We present the first case study of an organization going through the process of selecting and purchasing a phishing simulation. We document and analyze the effort of the different stakeholders involved, and present reflections from semi-structured interviews with 6 key actors at the end of the procurement process. Our data analysis shows that procuring such simulations can require significant effort from different stakeholders (in our case, at least 50,000€ in person hours) and many hidden intangible costs. Evaluating whether a product or service meets training requirements and is acceptable to employees, and preparing the technical infrastructure and operational processes for running such a product, all require significant time and effort. The prevailing perception that phishing simulation campaigns are a quick and low-cost solution to providing security training to employees thus needs to be challenged.

You've Got Report: Measurement and Security Implications of DMARC Reporting

Md. Ishtiaq Ashiq and Weitong Li, Virginia Tech; Tobias Fiebig, Max-Planck-Institut für Informatik; Taejoong Chung, Virginia Tech

Available Media

Email, since its invention, has become the most widely used communication system, and SMTP is the standard for email transmission on the Internet. However, SMTP lacks built-in security features, such as sender authentication, making it vulnerable to attacks, including sender spoofing. To address the threat of spoofing, several security extensions, such as SPF or DKIM, have been proposed. Domain-based Message Authentication, Reporting, and Conformance (DMARC) was introduced in 2012 as a way for domain name owners to publish desired actions for email receivers to take through a DNS record if SPF or DKIM validation fails. The DMARC record can also request email receivers to send machine-generated reports back to the specified addresses to aid domain name owners in detecting and evaluating the risk of spoofed emails. However, DMARC's complexity creates opportunities for mismanagement that can be exploited by attackers. This paper presents a large-scale and comprehensive measurement study of DMARC reporting deployment and management. We collected data for all second-level domains under the .com, .net, .org, and .se TLDs over 13 months to analyze deployment and management from the domain name owner's perspective. Additionally, we investigated 7 popular email hosting services and 2 open-source DMARC reporting software packages to understand their reporting practices. Our study reveals pervasive mismanagement and missing security considerations in DMARC reporting. For example, we found that a single email from an attacker can make a victim SMTP server receive a large number of reports with a high amplification factor (e.g., 1,460×) by exploiting misconfigured SMTP servers. Based on our findings of several operational misconfigurations for DMARC reporting, we provide recommendations for improvement.
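
As background for the DNS record mentioned above, a DMARC policy is published as a TXT record at `_dmarc.<domain>` and consists of semicolon-separated `tag=value` pairs, with `p` giving the requested action and `rua`/`ruf` listing report destinations. The following is a minimal parsing sketch, not the measurement pipeline from the paper; the function name `parse_dmarc_record` and the example addresses are hypothetical, and optional details such as size limits appended to report URIs are not handled.

```python
def parse_dmarc_record(txt):
    """Parse a DMARC TXT record string into its policy and report URIs.

    Hypothetical helper for illustration; it only splits tag=value pairs
    and does not validate tag values beyond the version tag.
    """
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        name, value = part.split("=", 1)
        tags[name.strip().lower()] = value.strip()

    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")

    def uris(tag):
        # Report destinations are given as a comma-separated list of URIs.
        return [u.strip() for u in tags.get(tag, "").split(",") if u.strip()]

    return {"policy": tags.get("p"), "rua": uris("rua"), "ruf": uris("ruf")}


record = (
    "v=DMARC1; p=reject; "
    "rua=mailto:dmarc-reports@example.com,mailto:agg@example.net; "
    "ruf=mailto:forensics@example.com"
)
parsed = parse_dmarc_record(record)
print(parsed["policy"], parsed["rua"], parsed["ruf"])
```

Here `rua` holds the addresses that receive aggregate reports and `ruf` those that receive failure reports, which are the reporting channels whose deployment and management the study measures.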

Knowledge Expansion and Counterfactual Interaction for Reference-Based Phishing Detection

Ruofan Liu, Shanghai Jiao Tong University and National University of Singapore; Yun Lin, Shanghai Jiao Tong University; Yifan Zhang, Penn Han Lee, and Jin Song Dong, National University of Singapore

Available Media

Phishing attacks have become increasingly prevalent in recent years, significantly eroding societal trust. As a state-of-the-art defense solution, reference-based phishing detection excels in terms of accuracy, timeliness, and explainability. A reference-based solution detects phishing webpages by analyzing their domain-brand consistencies, utilizing a predefined reference list of domains and brand representations such as logos and screenshots. However, the predefined references have limitations in differentiating between legitimate webpages and those of unknown brands. This issue is particularly pronounced when new and emerging brands become targets of attacks.

In this work, we propose DynaPhish as a remedy for reference-based phishing detection, going beyond the predefined reference list. DynaPhish assumes a runtime deployment sce