Apple and The Cryptrickery Factory

Dr. Sybe Izaak Rispens
Jan 4, 2022


Does Apple’s proposed content scanning technology turn iPhones and Mac computers into “compliance assistants” that save us from data-hungry regulators? Or is it one of the most dangerous global surveillance systems the world has ever seen?

NOTE: 10.12.2022. Truth matters. Putting good arguments out in the world is not in vain. Apple told WIRED this week that it would permanently drop image content scanning. Thanks to all the great people and organisations who have spoken up; some of whose statements can be found below.

Apple’s announcement in August 2021 of new features called “Expanded Protections for Children” has been hotly debated ever since. Did the company achieve a technological breakthrough for some of the most challenging problems of our digital age — striking a balance between regulatory oversight and secure, trustworthy systems that respect a user’s privacy? Or is the whole idea of building technology for improving “child safety” just a cover story for introducing global mass surveillance technology on a scale never seen before?

“Expanded Protections for Children”

On August 5, 2021, Apple published several articles outlining Child Safety features on its website. They summarized how Apple wanted to introduce these features in three areas: tools for parents to help their children navigate communication online; intervention by Siri and Search when users try to search for Child Sexual Abuse Material (CSAM); and detection technology that will scan all photo material on a client device for inappropriate content whenever it is connected to iCloud (which is an option the majority of Apple users choose).

All these capabilities were going to be rolled out with updates to iOS 15, iPadOS 15, watchOS 8, and macOS Monterey. The idea was to activate them first for users in the United States and later in other parts of the world.

However, a week after the initial announcement, Apple published another document, called “Security Threat Model Review of Apple’s Child Safety Features”¹. This was because of the global public outcry immediately after the first announcement. The protests came from all over the globe: high-ranking security researchers (including cryptographer Bruce Schneier, Cambridge University professor of security engineering Ross Anderson, and whistleblower in exile Edward Snowden²), organizations such as the Electronic Frontier Foundation (EFF)³, and international organizations committed to civil rights, human rights, and digital rights⁴. There were even concerns within Apple’s ranks⁵.

On September 3, Apple announced it would postpone its plans. The company deserves credit for doing this; at least some people at the top started to understand the enormous impact of the new features.

In November, Apple changed its tune about a tiny component of the child protection features: the Messages app will no longer send notifications to parent accounts when it detects “inappropriate” content in text messages. This feature would have broken end-to-end encryption in Messages completely, harming the privacy and safety of all users. This is now canceled, not just postponed, which is good⁶.

On December 15, Apple quietly removed all mentions of CSAM from its Child Safety webpage, suggesting that the plans to detect child sexual abuse images on iPhones, iPads and Mac computers would at least get some re-thinking⁷. The plans were not called off, though, as the company confirmed⁸.

This means it is still necessary to continue reflecting on the implications of client-side content scanning and on the first principles that should govern our judgment: What processes and type of governance are necessary for this to happen? Since the technology has the potential to fundamentally harm the central values of a free society, such as privacy and the right to free speech, it is highly necessary to start thinking about this topic holistically. What digital world do we want to live in?

Platform Security

The “Expanded Protections for Children” have severe implications for Apple’s Platform Security. The published and then retracted documents seemed to be either written up hastily or kept deliberately vague. The word “deliberately” is ominous. Indeed, within the cyber security community, suspicions are rampant that the expanded child protection plans are mainly a cover story for rolling out mass-surveillance technology on an unprecedented global scale. The hunt for pedophiles has long been viewed as one of the “Four Horsemen of the Infocalypse” (next to the war on drug dealers, money launderers, and terrorists)⁹. This “infocalypse” would lead to a world in which ideas such as privacy, anonymity, data protection, and cybersecurity have been completely abandoned.

In a recent article entitled “Bugs in our Pockets: The Risks of Client-Side Scanning,” the authors — some of the most internationally renowned cryptanalysts and security researchers, such as Ross Anderson, Ron Rivest, Bruce Schneier and Whitfield Diffie (when these people are part of a research paper, you sit up and take notice!) — make precisely this argument: the risks of client-side content scanning are so high, and there are so many ways the technology can be abused or simply tricked into ineffectiveness, that one cannot sensibly argue for it unless the aim is to create mass surveillance technology¹⁰. “Such bulk surveillance,” the authors state, “can result in a significant chilling effect on freedom of speech and, indeed, on democracy itself.”

Why Content Scanning?

Content scanning allows digital service providers to identify and take action on illegal content proactively. Regulators increasingly require this type of interception from tech organizations. The European Union, for instance, is working towards some of the most far-reaching content scanning regulations in the world in order to combat online child sexual abuse. The urgency for EU regulators stems partly from the fact that most child sexual abuse material worldwide is hosted within the EU¹¹. (Update, 22.03.2022: A host of organisations have protested the plans, citing the current geopolitical situation around Ukraine as an urgent argument against any measures that jeopardise the privacy and confidentiality of communications.)

The planned EU regulation would require tech providers to indiscriminately monitor all content uploaded to cloud infrastructures, without prior court orders. In addition, all private communication that flows through digital networks, including images and messages on WhatsApp, Skype, and Signal, would need to be scanned for illegal content. This would preclude the end-to-end encryption of these applications. End-to-end encryption matters because it ensures that no third party can read content exchanged between a sender and a receiver. The mathematical algorithms used for this encryption are — in theory at least — unbreakable, even by the most skillful actors with the most powerful computers.

A coalition of nations that share intelligence, called “Five Eyes” (Australia, USA, UK, New Zealand, and Canada), has pushed for years to limit or ban encryption. Since 2015, China has placed a legal obligation on tech companies to provide technical and decryption assistance for public security and intelligence gathering purposes. The 2017 National Intelligence Law further solidifies that obligation, as does the Cybersecurity Law, which requires tech companies to store internet logs of users’ online activity for at least six months to aid law enforcement¹². In November 2021, China ordered a ban on all encryption using keys with a length of 256 bits or more¹³.

Given this regulatory pressure on tech companies, relinquishing strong encryption may be unavoidable. Apple has had first-hand experience with this in China, where it has cooperated extensively with the authorities and agreed to hand over all encryption keys¹⁴.

If a ban on encryption as we know it is inevitable, can we apply the laws of mathematics to create a better kind of encryption? One that keeps communication private and at the same time checks it for compliance? The answer appears to be “yes,” which is the way out of the conundrum that Apple intends to take.

A billboard from an ad campaign in 2019. Not true anymore.

A New Kind of Encryption

The new class of encryption that would solve the privacy/compliance contradiction is called “homomorphic encryption.” The first ideas for homomorphic encryption go back to the 1970s, when cryptographer Ronald Linn Rivest proposed a new type of cryptography that can process encrypted data (e.g., read it, write to it, and insert values into it) without knowing the decryption key¹⁵. This means that code can perform computations on encrypted data without knowing either the content of the input or the content of the output. But when the result of such an operation is decrypted with the correct key, it is identical to the result that would have been produced had the operations been performed on the unencrypted data.
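
A toy example makes the principle concrete. Textbook (unpadded) RSA has a homomorphic property: multiplying two ciphertexts yields a ciphertext of the product of the two plaintexts. The sketch below, in Python, is purely illustrative (tiny key, no padding, nothing resembling production cryptography) and is of course not the scheme Apple or anyone else would deploy.

```python
# Toy demonstration of a homomorphic property: textbook (unpadded) RSA is
# multiplicatively homomorphic. Illustration only: the key is tiny, there is
# no padding, and nothing here is production cryptography.

# Small RSA key: n = p*q, public exponent e, private exponent d.
p, q = 61, 53
n = p * q                            # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))    # modular inverse of e (Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 6
ca, cb = encrypt(a), encrypt(b)

# A "server" multiplies the two ciphertexts without ever seeing a or b.
c_product = (ca * cb) % n

# Decrypting the result gives the same answer as computing on the plaintexts.
assert decrypt(c_product) == a * b
print(decrypt(c_product))            # 42
```

Fully homomorphic schemes generalize this idea so that both additions and multiplications, and therefore arbitrary computations, can be carried out on ciphertexts.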

Homomorphic encryption offers something substantially new in the information security domain: “encryption in use,” in addition to the two already existing modes, “encryption at rest” and “encryption in transit.” Encryption in use is exactly what is needed to meet regulatory requirements banning illegal material while at the same time keeping prying eyes away from confidential information that two parties want to exchange.

But until recently, homomorphic encryption was not useful in the real world. Even simple computations on encrypted data would take weeks or months to complete.

This all changed around 2010, when faster processors and better algorithms made it possible to perform homomorphic computations within reasonable timeframes. It can still be a factor of a million slower than standard encryption methods. Yet, thanks to massive resource injections by companies such as Microsoft, Facebook, and Apple, amazing real-world applications have already been developed in the past few years.

For instance, users can store encrypted files on a remote file server, and the server can retrieve the files matching a search query without knowing the decryption key for the files or the content of the query. Another example: it is now possible to scan large encrypted medical datasets for particular diseases without revealing personal information in patient records.

With “encryption in use” within reach, computations can now be split up between multiple devices, e.g., mobile devices and powerful computers in a data center, without privacy being affected at any stage.

Private Set Intersection

Homomorphic encryption allows for a new type of functionality in which tasks are shared between computational agents while the data for that computation remains encrypted, and none of the agents have access to all that data. A scenario for this is as follows: two agents, each holding a set of elements that remain secret to each other, ask an external agent to calculate the intersection of their two sets. These data agents do not reveal anything to the computational agent, as the sets they share with it are encrypted. The data agents learn nothing about each other’s sets, except which items are held by both of them. For its part, the computational agent knows nothing about either the sets or their intersection — it will just do the computations on the encrypted datasets and return an encrypted result to both parties.

This is called “private set intersection” (or PSI for short). It is now a mature, practical technology. It is used for things that seemed utterly impossible until relatively recently, such as privacy-preserving location sharing.

For example, it makes proximity alerting possible: informing two parties that they are in each other’s vicinity¹⁶. Facebook is using PSI for measuring the effectiveness of its online advertising business. It compares the list of people who have seen an ad with those who have completed a transaction, without revealing to advertisers anything about the individuals¹⁷.

Apple recently implemented PSI in the Safari web browser to alert users to leaked passwords without Apple knowing the passwords themselves. This is realized by doing a private set intersection calculation, comparing the encrypted passwords in Safari’s keychain with an encrypted list of hundreds of millions of leaked passwords.
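
To make the idea concrete, here is a minimal sketch of one classic PSI construction, based on commutative Diffie-Hellman-style blinding. All parameters are toy-sized, the hash-to-group step is simplified, and the element names are invented; Apple’s own PSI protocol is not published, so this illustrates the general technique rather than Apple’s implementation.

```python
# Minimal sketch of Diffie-Hellman-style private set intersection (DH-PSI).
# Each party raises hashed items to a secret exponent; because exponentiation
# commutes, items held by both parties end up with identical double-blinded
# values, while the blinded values themselves reveal nothing about the items.
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus; a real deployment would use a proper group

def hash_to_group(item: str) -> int:
    """Map an item to a nonzero group element (simplified hash-to-group)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def blind(values, secret):
    return {pow(v, secret, P) for v in values}

# Two parties with private sets and private exponents.
set_a = {"alice@example.com", "bob@example.com", "carol@example.com"}
set_b = {"bob@example.com", "dave@example.com"}
secret_a = secrets.randbelow(P - 2) + 1
secret_b = secrets.randbelow(P - 2) + 1

# Round 1: each party blinds its own hashed items and sends them to the other.
blinded_a = blind({hash_to_group(x) for x in set_a}, secret_a)
blinded_b = blind({hash_to_group(x) for x in set_b}, secret_b)

# Round 2: each party blinds the other party's values with its own secret.
double_a = blind(blinded_a, secret_b)   # H(x)^(a*b) for every x in A
double_b = blind(blinded_b, secret_a)   # H(x)^(b*a) for every x in B

# Shared items collide in the double-blinded sets; nothing else is learned.
print(len(double_a & double_b))         # 1 ("bob@example.com" is the overlap)
```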

Avoiding Datacenter Disclosure

What could be more suitable for protecting personal privacy, if regulators require content scanning, than applying a private set intersection? Regulators would only learn about content that matches a list of known illegal material and would gain no insight into any other information. The computational tasks necessary for the content scanning and the private set intersection can easily be divided between local devices owned by the content owner and computational agents in data centers. Until the advent of homomorphic encryption, the only other option would be to decrypt data upon arrival in the data center to allow scanning, which would amount to full disclosure of all the data — to the immediate service provider, and possibly to external service providers that perform the content scanning.

Microsoft is one of the biggest providers of on-demand CSAM assessment services. Its image-identification technology for detecting child pornography and other illegal material was developed in 2009 and is called “PhotoDNA”¹⁸. Cloudflare, a US-based web security company best known for its DDoS mitigation services, has also recently started to offer CSAM scanning services¹⁹. All of these tools are proprietary. To most internet or service providers, implementing content scanning is mainly a black box that takes an unencrypted image as input and outputs an answer to the question of whether it contains targeted material.

When do these actors get access to user data? Usually, access is granted when the content is received in a data center. It is comparable to how things work in airports: passengers are scanned at security lines upon entering the check-in area. For a data center, this means that data encrypted on the user’s device for transit is decrypted upon arrival. The information is then sent through the content scanning system, after which it is usually encrypted again before it is stored or sent further.
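
The sketch below illustrates this conventional server-side flow and why it amounts to datacenter disclosure: the provider holds the transport key, so it necessarily sees the plaintext during the scan. The keys, the blocklist entry, and the use of SHA-256 (a real scanner would use a perceptual hash such as PhotoDNA) are assumptions made purely for illustration; the Fernet primitives come from the widely used Python cryptography package.

```python
# Sketch of the conventional server-side flow: the data center decrypts
# incoming content, scans the plaintext, and re-encrypts it for storage.
# During the scan the provider sees everything. Keys, blocklist entry, and
# the use of SHA-256 instead of a perceptual hash are illustrative only.
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

transport_key = Fernet.generate_key()   # shared with the client for the upload
storage_key = Fernet.generate_key()     # used by the data center at rest

# Hypothetical blocklist of hashes of known illegal images.
BLOCKLIST = {"9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"}

def ingest(ciphertext: bytes) -> bytes:
    """Receive an upload, scan it in the clear, re-encrypt it for storage."""
    plaintext = Fernet(transport_key).decrypt(ciphertext)  # full disclosure here
    digest = hashlib.sha256(plaintext).hexdigest()
    if digest in BLOCKLIST:
        print("match found, would be reported")
    return Fernet(storage_key).encrypt(plaintext)

# Client side: encrypt for transit and upload.
upload = Fernet(transport_key).encrypt(b"holiday photo bytes ...")
stored_blob = ingest(upload)
```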

How long can different actors have access to the data? This varies per provider and service. Some providers may be privacy-conscious and keep to a minimum the number of actors with access, as well as when and for how long they have it. Others may be willing, or may be forced, to share data for longer, sometimes even permanently. For instance, ICT providers in China must hand over the encryption keys used for long-term data storage to the authorities. This allows full access to all stored data, not just during the content scanning phase.

Personal Compliance Assistant

With “encryption in use” now being available as a new, third application domain for encryption, it is for the first time in history possible to create a solution to “datacenter disclosure” while still being able to flag illegal material.

Rather than being a passive data source, in which mobile devices hand over all information to remote data centers, the client device would, in that scenario, make a selection of possibly illegal information that must be further examined. The idea is not to allow data centers access to all data but only to content that is highly likely to be illegal.

With the device you hold in your hands becoming your trusted personal compliance assistant, the idea is to allow it to continuously, automatically, and without any user interaction scan all images stored on the device or on any local storage medium connected to that device. The local device compares the content with a list of known illegal material and silently alerts the authorities when such material has been found.

How does it work? First, the device creates a digital fingerprint of each image. Error margins and weaknesses of the algorithm used need to be considered, so we will briefly touch upon this. Then, fingerprints that match — the proof of illegal material — should somehow raise alerts, ideally without the owner of the iPhone being aware of them. Raising silent alerts is a whole subject in itself: high levels of complexity in the system arise from managing just one simple question: who knows what, and when?

Creating an Image’s Digital Fingerprint

The requirements for selectivity in any useful scanning for Child Sexual Abuse Material are high. For example, any image scanner on a phone that I use should correctly classify my pictures of our naked daughters enjoying their bathtub parties, of us taking a skinny dip during our last vacation, or of Gustav Klimt’s “Danae” (which happens to be visible in quite a few of our snapshots, because we have a large reproduction of it hanging in our bedroom). None of this should be flagged as illegal child pornography material.

On the other hand, the selectivity of the content scanning algorithm should be so high that it would recognize child sexual abuse material even if the owner tried to conceal the true nature of the images using obfuscation operations like resizing, changing from color to black and white, cropping, or compressing the image²⁰.

The algorithm that Apple engineers created for this is called “NeuralHash.” NeuralHash analyzes the content of an image and converts the result to a unique number, a kind of fingerprint called a “hash.” It does the calculation using neural networks that perform perceptual image analysis. This analysis does not classify, let alone judge, the content of the images, but it is capable of creating the same unique hash for visually similar images. Apple writes:

“Apple’s perceptual hash algorithm […] has not been trained on CSAM images, e.g., to deduce whether a given unknown image may also contain CSAM. It does not contain extracted features from CSAM images (e.g., faces appearing in such images) or any ability to find such features elsewhere. Indeed, NeuralHash knows nothing at all about CSAM images. It is an algorithm designed to answer whether one image is the same image as another, even if some image-altering transformations have been applied (like transcoding, resizing, and cropping).”

Until now, Apple has not provided many details of the NeuralHash algorithm. We do not know how selective it is. How big must visual differences between two images be for the algorithm to recognize them as distinctive? Is image selectivity dependent on the visual properties of the two images being compared, e.g., does it work better for images of humans than for landscapes?
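
NeuralHash itself is not public, but the general idea of a perceptual hash can be illustrated with a much simpler, well-known algorithm: the difference hash (dHash). It shrinks the image, compares neighbouring pixel brightnesses, and packs the comparisons into a 64-bit fingerprint that tends to survive resizing, recompression, and mild edits. The sketch below is far cruder than a learned perceptual hash and serves only to show the concept; the file names are placeholders.

```python
# A toy perceptual hash ("difference hash", dHash) for illustration. Visually
# similar images (resized, recompressed, slightly edited) tend to produce
# identical or nearly identical hashes; NeuralHash pursues the same goal with
# a neural network instead of raw pixel comparisons.
from PIL import Image  # pip install Pillow

def dhash(path: str, hash_size: int = 8) -> int:
    """Return a 64-bit perceptual fingerprint of the image at `path`."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    pixels = list(img.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance means 'perceptually similar'."""
    return bin(a ^ b).count("1")

# Hypothetical usage: compare an original with a recompressed copy.
# print(hamming_distance(dhash("photo.jpg"), dhash("photo_recompressed.jpg")))
```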

It is frustrating that Apple won’t make the details of such a crucial aspect of the new algorithm public. Because the algorithm is not public, it cannot be subjected to public scrutiny, deconstruction, or attack.

I wonder why this is the case. It seems to be something specific to the problem space, as the other prominent provider of technology for the automated detection of Child Sexual Abuse Material, Microsoft with its PhotoDNA, has also not publicly released any information about its algorithm.

One possibility is that the code is proprietary, and its inventors are just trying to defend their intellectual property. It is also conceivable that providing more details would lead to an irresponsibly large attack surface. For instance, it could lead to an unacceptably high risk of reverse hash lookup so that it would be possible to know what image is part of the set of illegal material.

A third reason might be that the algorithm has some technical fragility, making it susceptible to deception. This could happen, for instance, by creating “phantom images”: inserting small modifications in images that would be picked up by the algorithm while being (almost) invisible to the human eye²¹.

Error Tolerance

Like any image classification algorithm, NeuralHash is susceptible to errors. For example, the algorithm can produce false positives (two different images creating the same hash, called “hash collisions”) and false negatives.

A developer reverse-engineered the NeuralHash algorithm. He found the neural network model files hidden in the operating system of current iPhones and Apple computers, imported these files into a general neural network framework, and reconstructed how the algorithm is supposed to work. With that setup, he managed to create the first known hash-collision images.

Apple stated that such collisions were an expected outcome and that the NeuralHash model files shipped with previous versions of iOS and macOS do not reflect the current technical state²².

So, how good is it then? Apple initially stated that “the algorithm has an extremely low error rate of less than one case in one trillion images per year”²³. In information released later, the accuracy was stated as: “We empirically assessed NeuralHash performance by matching 100 million non-CSAM photographs against the perceptual hash database created from NCMEC’s (the National Center for Missing & Exploited Children) CSAM collection, obtaining a total of 3 false positives”²⁴.

Such statements are marketing as long as we don’t have the precise details of the empirical test. For example, we don’t know the nature of the tested images or what range of variance and distortion was tested. And it’s not even known what the exact size of NCMEC’s CSAM collection is. (Based on what the NCMEC website says, the organization has scanned around 300 million images²⁵. If we assume that about one percent of these are positively classified as child abuse images, that would put the total collection size at around three million, yet informal statements say it’s more like seven million²⁰.)

Given that Apple has sold 1.65 billion devices, with around half of them in use²⁶, and assuming that the average user takes a few thousand pictures per year, even the error rate Apple is quoting would lead to hundreds or even thousands of users per year who will be incorrectly flagged as having inappropriate images on their device. This is something that Apple has created a manual process for — of which we don’t have any meaningful details.
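
A back-of-the-envelope calculation, treating the figures above purely as assumptions (3 false positives per 100 million photos, on the order of 800 million devices in active use, a couple of thousand new photos per user per year), gives a sense of the scale involved:

```python
# Back-of-the-envelope estimate of falsely matched photos per year, using the
# figures from the text purely as assumptions (only the 3-in-100-million rate
# comes from Apple's own published test).
false_positive_rate = 3 / 100_000_000      # false matches per scanned photo
active_devices = 800_000_000               # assumption: roughly half of 1.65B sold
photos_per_user_per_year = 2_000           # assumption

photos_scanned = active_devices * photos_per_user_per_year
expected_false_matches = photos_scanned * false_positive_rate
print(f"{expected_false_matches:,.0f} falsely matched photos per year")  # ~48,000
```

Even under these rough assumptions, tens of thousands of photos per year would be matched incorrectly.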

Apple has encouraged the independent security research community to validate and verify its security claims²⁷. Yet, ironically, it has been in a protracted legal battle with Corellium, a company that provides the tools for doing exactly this. Corellium offers an iPhone emulator that allows a level of inspection of the hardware, software, and network traffic that is hard to achieve with a physical device. It has announced that it will award grants to researchers who want to inspect the new image scanning technique²⁸.

From Fingerprints to Alerts

The error rate of the NeuralHash fingerprint technology is only one aspect of the system’s overall accuracy. It is also necessary to look at its selectivity and at whether the algorithm can be hacked.

The actual task of “recognizing” child abuse material is split up between the local device and Apple’s servers. On the server side, Apple keeps a database with a set of hashes from images known to contain child sexual abuse material. This server-side part is much more critical from a privacy perspective than the NeuralHash image scanning algorithm. Apple has focussed most of its current publications on the algorithm, which might be a smokescreen that keeps other essential systems and processes out of the limelight.

Apple’s documentation contains hardly any information about this, but the process should work as follows. Independent, private, non-profit organizations whose mission is to help find missing children and reduce child sexual exploitation curate known child abuse material collections. Apple asks these organizations to generate hashes of these images with the NeuralHash algorithm and send them to Apple’s servers. Only hashes that at least two organizations report will be flagged by Apple as linked to child abuse material.
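
In code terms, the inclusion rule Apple describes boils down to an intersection over the hash lists submitted by the contributing organizations. The sketch below is a plain illustration of that rule; the organization names and hash values are invented, and the real pipeline operates on blinded NeuralHash values rather than readable strings.

```python
# Illustration of the inclusion rule: only hashes reported by at least two
# independent organizations make it into the on-device database.
# Organization names and hash values are invented for the example.
from collections import Counter

submissions = {
    "org_A": {"hash_001", "hash_002", "hash_003"},
    "org_B": {"hash_002", "hash_003", "hash_004"},
    "org_C": {"hash_003", "hash_005"},
}

counts = Counter(h for hashes in submissions.values() for h in hashes)
database = {h for h, n in counts.items() if n >= 2}

print(sorted(database))  # ['hash_002', 'hash_003']
```

The sketch also makes the limitation plain: the rule validates only who submitted a hash, not what image stands behind it.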

Requiring images to be flagged by two organizations independently mitigates some of the risks that are connected to the process: it could lessen the chance of flagging material that is considered offensive in one culture but not in another, and it gives some protection against the risk that material contributed by one organization is compromised, for instance by a nation-state actor. However, this does not mitigate any of the risks of a supply chain attack on the contributing non-profit organizations. And such an attack is quite likely to happen, given that non-profit organizations have IT infrastructures that can be easily exploited and do not have the resources to withstand attacks by powerful entities. Criminals may want to delete hashes of offensive material from the database, so that the corresponding images can remain in circulation among device users. Nation-state actors may want to add “offensive material,” either through legal requirements or covert action, to harass political opponents.

There need to be processes in place to protect the image supply chain. At a minimum, Apple would have to have a security operations center dedicated to protecting the IT infrastructures of the contributing organizations on a day-to-day basis. There would also need to be periodic audits by Apple experts and independent external experts to check the processing and storage of the images and the encryption and transportation of hashes to Apple’s servers. The results of these audits would need to contain full technical details, and they would need to be made publicly available.

Since Apple does not know the content of the images pointed to by the hashes, the only validation seems to be a formal one: the same hash needs to be delivered by more than one organization. Yet such a formal requirement alone is far from enough to have “extremely high confidence” (as Apple states in its documentation) that only CSAM images — and no other images — were used to generate the central hash database.

How could Apple ensure such high levels of confidence? Does the company keep records of every hash, tracking its history, such as when it was first delivered and by whom? Are hashes validated based on an exclusion algorithm, e.g., comparing them to a gigantic set of guaranteed non-CSAM material? Are there anomaly detection techniques in place for the hash database, ensuring that it has not been tampered with? Who audits the integrity of the code that is necessary for storing the hashes in the database?

Hash Table Distribution

The set of child abuse material will continuously grow. This means that Apple needs to periodically distribute the central database to the billions of devices in use globally. There will not be different versions for different geographic or political regions. The process seems to have been cleverly integrated into the existing software release process of Apple’s operating systems. From what we know about that process (from the sparse public documentation and historical incidents), it seems to be sound.

Users will be able to verify that their version of the database is indeed the one distributed by Apple. For this, users will be able to inspect the root hash of the encrypted database present on their device and compare it to the root hash that Apple publishes with each operating system update.
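
Root-hash verification of this kind typically means computing a Merkle-tree root over the database entries and comparing it with a published value. The sketch below shows the general technique with SHA-256; Apple has not published its exact construction, so this is only an illustration of the idea, with invented database entries.

```python
# Generic Merkle-root computation: if a single database entry changes, the
# root hash changes, so comparing the device's root against a published value
# can detect tampering, provided the comparison code itself can be trusted.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of byte strings into a single Merkle root hash."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:       # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical database of blinded CSAM hashes shipped with an OS update.
database = [b"blinded-hash-1", b"blinded-hash-2", b"blinded-hash-3"]
print(merkle_root(database).hex())    # compare against the published root hash
```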

However, the calculation of the root hash, which is shown to the user in the settings on the device, is done by code from Apple and is only subject to code inspection by Apple’s internal security reviews. This is entirely insufficient and, in the current state, unacceptable. External auditors should inspect the code.

Apple can switch CSAM-scanning on and off for subsets of user devices, based on the political or geographic region where they are being used. This enables Apple to introduce the system stepwise for different countries and regions, and possibly to have regional legal controls in place before rolling it out to, say, the EU or China.

It has not been documented so far, but the on-off switch based on geographical location is likely tied into the existing process for the iTunes store, which also defines what types of content are visible to the user based on regional location.

It is unknown whether the scanning algorithm can be remotely switched on and off for individual devices, or whether hackers might find ways to trick individual devices into signing in to different regions. The easiest way for child abusers to switch the CSAM-scanning off on their devices would be to trick the Apple store into believing that they are located in a region that is not included in the “Expanded Protections for Children” program.

Currently, the connection between a user and the geolocation he or she is in is tied to having a credit card with a valid address in that specific region. But it is possible that Apple adds further validation of the user’s region by tracking the GPS geolocation of the device.

Complexity on the Device

An essential part of getting from fingerprints to alerts is happening on the device. The complexity needed for this seemingly simple functionality is insanely high. This is not so much due to the NeuralHash algorithm, as creating hashes from images and comparing them to a database is a straightforward task. Instead, the complexity stems from a simple question: who knows what?

Let’s assume there are four actors: the user who is the creator or holder of the images, the local device, Apple, and any third parties, such as adversaries. (In the following, I will leave out most of the adversaries, because the multitude of security mechanisms and extra layers of encryption needed to keep them out distract too much from the main principles.)

Users are explicitly excluded from knowing several things that happen on their devices. They are not allowed to see the content of the CSAM image database. The user is also excluded from all processes within the code of the local device that are responsible for the comparison of image hashes. Users will not notice any of the background code that runs to analyze a picture, calculate its NeuralHash, and compare it to the CSAM database. Since this is a critical piece of software, it should have a good code review process. Yet we know nothing about this, except Apple’s repeated statement that “this claim is subject to code inspection by security researchers like all other iOS device-side security claims.” Without further proof or evidence, such a claim is entirely insufficient.

All actors are excluded from the result of the comparison of the hash of a picture against the CSAM database. The outcome must stay secret from all actors until a threshold has been reached: a user must possess 30 child abuse images before Apple gets notified.

This means that the code on the device also can’t know how many CSAM images were detected. It does not know the content of the CSAM database, and it does not know the outcome of the comparison of an image against the database that it performs. Otherwise, the code would have to do things like keeping a running counter of matching images to know whether a specific threshold has been passed. That number would have to be stored in a highly secure place, not available to any instance other than the code itself, and it would pose too high a risk of leaking. The user might be alerted to the detection of the CSAM images, or adversaries might find ways to detect the state of the code or the processor and circumvent it.

If the code does not know the outcome of the image comparison, how can it raise a CSAM alarm? The answer is that the local code does not raise alarms. It acts the same for all images, regardless of whether they are a positive or a negative match. The local code only creates something that Apple calls a “safety voucher.” This is a document that contains the outcome of the comparison of an image with the local CSAM database. The result includes information about the image, a thumbnail version of the original picture, and its NeuralHash value. This outcome is stored nowhere else but in the safety voucher.

It is reasonable to assume that the safety voucher is wrapped in several layers of encryption. It is then sent to Apple’s iCloud servers. At this stage, nobody, not even Apple, can read the encrypted content of the vouchers. Apple’s iCloud servers periodically scan them to discover whether a given account exceeded the match threshold. There is most likely some terrific use of homomorphic encryption involved in this step. Apple seems to have added truly innovative extensions to the existing homomorphic encryption algorithms to ensure that nobody can read any encrypted content until the alert threshold is passed.
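
Apple has not published the exact construction, but the standard building block for “nothing becomes readable until a threshold of matches is reached” is threshold secret sharing. The sketch below shows plain Shamir secret sharing over a prime field: a decryption secret is split so that any 30 shares reconstruct it while 29 reveal essentially nothing, and each positive match could contribute one share inside its safety voucher. This illustrates the general technique, not Apple’s actual protocol.

```python
# Toy Shamir secret sharing: the secret is only recoverable once a threshold
# number of shares (e.g. one per matching image) is available; fewer shares
# reveal nothing. Illustrative only -- not Apple's actual construction.
import secrets

PRIME = 2**127 - 1  # prime field large enough for a 128-bit-ish secret

def make_shares(secret: int, threshold: int, num_shares: int):
    """Split `secret` into shares; any `threshold` of them reconstruct it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    def f(x):  # evaluate the random polynomial at x
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, num_shares + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbelow(PRIME)            # e.g. a key protecting the vouchers
shares = make_shares(key, threshold=30, num_shares=100)

assert reconstruct(shares[:30]) == key    # 30 shares: key recovered
assert reconstruct(shares[:29]) != key    # 29 shares: (almost surely) not
```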

From Encryption To Encryptrickery

Cryptography is defined as the practice and study of techniques for secure communication. The aim of the field is straightforward. Phil Zimmermann, the inventor of the widespread encryption protocol “Pretty Good Privacy,” summarized it beautifully in 1991: “It’s personal. It’s private. And it’s no one’s business but yours”²⁹. Encryption is part theory and part practice. The theory part requires a good understanding of discrete mathematics, probability theory, and complexity theory. The practical part involves knowledge of ICT systems, regulations, and an insane amount of attention to detail³⁰. No wonder cryptographic failures recently moved up to second place on the Open Web Application Security Project’s top 10 list of risks. (The only reason they are not in first place is that most organizations will probably never get their access control in good shape)³¹.

The consensus among cryptographers is that you can aim for only one goal with cryptography: keeping things private. Given the nature of ICT technology, attaining that one goal is such a huge challenge that it is almost impossible even in tightly controlled data center environments, let alone on billions of decentralized user devices.

Examples of failure are everywhere; highly competent engineers work in fine organizations to build secure systems and fail miserably. Take, for example, the recent audit of Mozilla VPN. This open-source virtual private network is available as a browser extension, desktop application, and mobile app. It was released last year, and it has a top-notch code base. The VPN system has been designed and built by security-aware and competent engineers reaching for only one goal: keeping things secure. The software is audited annually by an external company, and the results are made public, which is all highly laudable³². Yet the 2020 and 2021 audits found numerous vulnerabilities, some of which are classified as “critical.” “Critical” includes things such as attackers being able to make secret screenshots of a victim’s screen and send them to an external host.

Failing security is not just because of encryption that isn’t working or protocols not being followed, but also because of the nature of technology itself. Systems degrade over time. Systems must contend with uses that did not exist and could not be foreseen at the time of their design and manufacture. Systems must operate within circumstances that vary considerably from the original operating parameters. Systems must co-exist with other systems that did not exist and were not anticipated at the time of their design and manufacture. Bad actors analyze systems. Users must interact with systems under conditions outside of the normal parameters. Designers try to anticipate these use-cases, but they can never predict them all. And when it comes to security, it’s rarely the case that systems fail gracefully³³.

That’s why the best people in the security community have embraced humility and transparency. Unfortunately, it seems like some people see “homomorphic encryption” as the end of humility.

That’s a mistake.

Take, for instance, the level of trust you must have in a local device. Automatically scanning all local images in an automated process outside your control is much more invasive than scanning all images you actively upload to a cloud service. The difference is comparable to that between having your bags searched by authorities at the airport while you are traveling and having your home continuously searched by authorities because you sometimes travel by air.

As Apple plans to implement the content scanning technique, data subjects basically have one choice to make: if you don’t want all of your local content to be continuously scanned outside your control, you must not upload any content to a cloud service that you do not operate yourself. If you are an Apple user, this means that you actively need to opt out of using the iCloud Photo Library.

The underlying assumption is that you trust your device fully: if you opt in to cloud usage, you trust that your device will flag only genuinely illegal material and that your privacy will be respected. If you opt out of cloud usage, you trust that your device will indeed stop all client-side content scanning and reporting to authorities (in addition to losing the convenience of the cloud). And you trust that this will be the case not just now, but for all future software updates that your device will receive.

The user has no means to check this, because the results of the content scanning are intentionally shielded from the user, as outlined above. Even digital forensics experts might not be able to verify what is going on in your iPhone.

It is also conceivable that the content scanning, intentionally or unintentionally, would look at other file types than just images. It would require just a few changes to — or bugs in — the complex infrastructure that Apple built. This is because the content scanning infrastructure Apple created is fully context- and content-agnostic; it can work for any type of material, be it image, text, audio or video, and for anything that is deemed “illegal”, be it child pornography or anything else that has been defined as “obscene” or “unwanted” by authorities.

Apple states that internal and external code reviews guarantee that the local devices do what they are intended to do. This would prove that the code acts as designed and stays within scope and context. It might work for the first aspect: it is possible to have auditors review code and verify that it does what it is designed to do. But having auditors ascertain that code which is intentionally scope- and context-agnostic is only being used within the intended boundaries is next to impossible.

Also, for this argument to carry weight, it is necessary to make all internal and external audit results public. Since this small piece of code, rolled out to millions or perhaps even billions of devices, is such a crucial element in the whole content scanning mechanism, it is not sufficient for Apple to publish a generic ISO certificate for its information security management system (ISMS) every three years, as it does currently.

In the case of an invasive technology like this, international regulating bodies need to step in and make it mandatory for all vendors who offer local content scanning technology to periodically publish audit results for the content scanning parts of their devices’ operating systems. It should probably also be prohibited for code with content scanning capabilities to be proprietary.

There should also be a whole set of more extensive quality requirements for content scanning code. For instance, algorithms that do not have a proven minimum level of “understanding” of content, or that cannot take into account the political, social, or environmental context in which they operate, should be banned entirely. Any CSAM detection technology, for instance, should inherently consider the nature of this illegal activity and how it is situated in our legal frameworks.

But to Apple’s image scanning algorithm, images of sexual abuse or political activists are the same. From a technical standpoint, widening the scope of the content scanning technology is a trivial thing to do. It would require just minimal changes to the system to include content regarded as “illegal” by some governments.
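
The point is easiest to see in code: the matching layer amounts to “hash the content, then look it up in a set,” and nothing in it knows or cares what the set contains. In the sketch below, SHA-256 stands in for a perceptual hash and both databases are invented; swapping one database for the other is the only change needed to repurpose the scanner.

```python
# The matching layer is agnostic: it hashes content and checks membership in
# whatever database it is given. Swapping one hash database for another is
# the only change needed to repurpose it. (SHA-256 stands in for a perceptual
# hash; both databases are invented for the example.)
import hashlib

def matches(content: bytes, database: set) -> bool:
    return hashlib.sha256(content).hexdigest() in database

csam_hashes = {"<hash of a known abuse image>"}
dissident_hashes = {"<hash of a protest poster>"}

photo = b"...image bytes..."
print(matches(photo, csam_hashes))       # identical code path,
print(matches(photo, dissident_hashes))  # entirely different purpose
```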

A crucial piece of software for unprecedented-scale content scanning is not located in tightly controlled server environments anymore, but in hundreds of millions and perhaps even billions of devices. This presents a new and severe vulnerability.

Adversaries who want to target one individual or a group of individuals — e.g., a social, political, or ethnic group — could try to use this as a new surveillance method. They could grant themselves access to the targeted devices and trick the local code into acting as if the user had opted in to cloud storage on a server of their choosing. They could also use the local content scanning mechanisms simply to obfuscate the activity of their own malicious code. It would be hard, if not entirely impossible, for users and even forensic experts to detect malicious additional scanning activity.

Governance

How will companies like Apple be able to withstand pressure from nation-states to include files in the local content scanning that are not just about child abuse or that are not uploaded to a cloud? How will it be possible to verify that this new technology is not being misused for general surveillance?

Such questions are part of the “slippery slope” objection against client-side CSAM scanning, which researchers from Princeton University have recently raised³⁴. They wrote the only peer-reviewed publication to date on building an image content scanning system. They conclude that Apple’s technology is hazardous for the reason previously touched upon: it is intentionally, and probably also technically, bound to be content and context agnostic.

This means that the content scanning technologies used are not specifically designed to flag only child abuse images. Apple explicitly states that the algorithm used for image scanning does not even know anything about child abuse images. The technology is fully universal and cannot distinguish between different types of content, even if that content may have dramatically different social or legal implications.

From a governance point of view, therefore, content scanning — whether it is done in data centers or on local devices — should be subject to the same type of international regulatory scrutiny as we currently have for financial services or critical infrastructures.

The Context of Information Technology

The actual “slippery slope” does not reside in the governance of a particular algorithm, or application of technology. It’s more about ICT technology in general and the direction we are heading in as a digital society. ICT technology is on a highway towards a complete merging of client devices with cloud services. Today, handheld devices are merged with cloud services up to a point where both are almost indistinguishable. Some of your data may be stored on a local device some of the time, while other data may be in a data center. Since the operating system will dynamically manage this, users will no longer have any insight into what is processed, where, and how content is stored.

This means that the only method users have today to prevent total transparency — switching off cloud integration for phones and computers — will soon no longer be available. Most future devices will have minimal functionality, or will not work at all, without connecting to the cloud.

Transparency

Local content scanning seems to be a promising strategy for reaching full legal compliance and privacy.

Yet it’s a fallacy to think that technology alone will solve these issues. Apple’s tradition of secrecy stands massively in the way of the company rolling out content scanning technology in anything close to a responsible manner. The documentation that was briefly available and then retracted was of such low quality that it was simply impossible to understand the implications of the new technology for overall platform security. For example, the previously mentioned article written by some of the best cryptographers to date completely missed that Apple plans to roll out homomorphic encryption at scale. This is because Apple does not communicate about the technology in a way that allows independent outsiders to validate its claims. Information about security from Apple is always a combination of marketing, protection of intellectual property, and keeping corporate strategies secret. For security, this is a “toxic mix.” The lack of full transparency from an organization planning to introduce technology that could potentially lead to an all-invasive, global surveillance system on a scale never seen before is simply unacceptable³⁵.

Outlook

Even if Apple’s content scanning system has been designed and built by the most competent system architects and engineers on the planet, even if Apple were the most humble and transparent organization in the world, and even if the system did exactly what it is designed to do, the balancing act between compliance and privacy requires such a high level of complexity that the main concern is not whether it works as advertised for the time being, but how long it will keep working as intended.

From a technical point of view, from a governance standpoint, and also from the technological context in which the image scanning technique lives, this is not something to be taken for granted.

From a technical point of view, making changes to a system with high levels of complexity is daunting. Even in perfect lab settings, it would be a challenge to keep the peephole in Apple’s Platform security as intended while making changes to other parts. So how is Apple going to make sure that future changes in the system won’t break things? The available documentation does not say. Are there even people at Apple thinking about what it means to ensure the quality of future states of the “child protection” technology?

Apple would need to improve the access of external security researchers to audit the technology. For a change like this to an operating system, it is essential to allow full and periodic access to the source code to validate and monitor the quality of the encryption. In addition, independent experts should regularly check whether we don’t have a case of “Encryptrickery” here — the art of deceiving people into thinking there is secure communication where there is, in fact, none.

Who will monitor the quality of future changes in the code or processes? There need to be mandatory, regular, publicly available audits. All of this needs independent, external oversight.

This requires something from Apple (and most companies of its size) that it struggles with most and perhaps will never reach in its current state of mind: radical transparency.

References

(1) https://www.apple.com/child-safety/pdf/Security_Threat_Model_Review_of_Apple_Child_Safety_Features.pdf

(2) “An Open Letter Against Apple’s Privacy-Invasive Content Scanning Technology: Security & Privacy Experts, Cryptographers, Researchers, Professors, Legal Experts and Apple Consumers Decry Apple’s Planned Move to Undermine User Privacy and End-to-End Encryption”; Edward Snowden’s rather polemic piece: https://edwardsnowden.substack.com/p/all-seeing-i. See also: Kurt Opsahl, “If You Build It, They Will Come: Apple Has Opened the Backdoor to Increased Surveillance and Censorship Around the World,” EFF, August 11, 2021; Steven J. Murdoch, “Apple letting the content-scanning genie out of the bottle,” August 17, 2021; Paul Rosenzweig, “The Apple Client-Side Scanning System,” Lawfare, August 24, 2021, https://www.lawfareblog.com/apple-client-side-scanning-system; Daniel Kahn Gillmor, “Apple’s New Child Safety Plan for iPhones Isn’t So Safe,” ACLU, August 26, 2021.

(3) India McKinney and Erica Portnoy, “Apple’s Plan to ‘Think Different’ About Encryption Opens a Backdoor to Your Private Life,” EFF, August 5, 2021. The authors argue that it is “impossible to build a client-side scanning system that can only be used for sexually explicit images sent or received by children. As a consequence, even a well-intentioned effort to build such a system will break key promises of the messenger’s encryption itself and open the door to broader abuses […] That’s not a slippery slope; that’s a fully built system just waiting for external pressure to make the slightest change.” The Center for Democracy and Technology has said that it is “deeply concerned that Apple’s changes in fact create new risks to children and all users, and mark a significant departure from long-held privacy and security protocols.”

(4) International Coalition Calls on Apple to Abandon Plan to Build Surveillance Capabilities into iPhones, iPads, and Other Products

(5) Joseph Menn and Julia Love, “Exclusive: Apple’s child protection features spark concern within its own ranks — sources,” Reuters, August 13, 2021

(6) https://www.eff.org/pl/deeplinks/2021/11/apple-has-listened-and-will-retract-some-harmful-phone-scanning

(7) https://www.macrumors.com/2021/12/15/apple-nixes-csam-references-website/

(8) https://www.theverge.com/2021/12/15/22837631/apple-csam-detection-child-safety-feature-webpage-removal-delay

(9) https://www.cybersociology.com/files/6_publicandprivatesecurity.html

(10) Ross Anderson, Steven M. Bellovin, Matt Blaze, Jon Callas, Peter G. Neumann, Jeffrey I. Schiller, Bruce Schneier, Whitfield Diffie, Ronald L. Rivest, and Vanessa Teague, “Bugs in our Pockets: The Risks of Client-Side Scanning,” arXiv, October 15, 2021.

(11) “Curbing the surge in online child abuse.” My home country, the Netherlands, accounts for most of the hosting: almost three quarters of all globally detected illegal material.

(12) Lorand Laskai, Adam Segal, The Encryption Debate in China, 2019.

(13) https://sanctionsnews.bakermckenzie.com/mofcom-issues-new-encryption-import-control-effective-immediately/

(14) Jack Nicas, Raymond Zhong and Daisuke Wakabayashi, “Censorship, Surveillance and Profits: A Hard Bargain for Apple in China,” NYTimes, 17.5.2021; see also “He Warned Apple About the Risks in China. Then They Became Reality.”

(15) R. Rivest, L. Adleman, and M. Dertouzos, “On data banks and privacy homomorphisms,” in Foundations of Secure Computation, pages 169–180, 1978.

(16) Arvind Narayanan, Narendran Thiagarajan, Mugdha Lakhani, Michael Hamburg, Dan Boneh, et al., “Location privacy via private proximity testing,” in NDSS, volume 11, 2011; Xuan Xia et al., “PPLS: A Privacy-Preserving Location-Sharing Scheme in Vehicular Social Networks,” arXiv:1804.02431v1 [cs.CR], April 6, 2018.

(17) B. Pinkas, T. Schneider, G. Segev, and M. Zohner, “Phasing: Private set intersection using permutation-based hashing,” in USENIX Security Symposium, USENIX, 2015.

(18) https://www.microsoft.com/en-us/photodna

(19) For instance, CloudFlare’s CSAM service

(20) Paul Rosenzweig, ‘The Law and Policy of Client-Side Scanning’

(21) Video detection algorithms can be easily hacked by inserting images that would appear for a few milliseconds. See the “split-second phantom attacks,” on Advanced Driver Assistance Systems. https://www.nassiben.com/phantoms

(22) On the latest version of macOS or a jailbroken iOS (14.7+), it is possible to copy the model files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS). See AppleNeuralHash2ONNX.

(23) Expanded Protections for Children, p.8

(24) Apple, “Security Threat Model Review of Apple’s Child Safety Features,” p. 10; see also “Apple says collision in child-abuse hashing system is not a concern,” The Verge, August 18, 2021.

(25) https://www.missingkids.org/theissues/csam

(26) https://www.theverge.com/2021/1/27/22253162/iphone-users-total-number-billion-apple-tim-cook-q1-2021

(27) “Independent research firm sued by Apple now wants to help vet the phone maker’s child sexual abuse scanning system,” Washington Post, 16.08.2021

(28) https://www.corellium.com/blog/open-security-initiative

(29) Phil Zimmermann, “Why I Wrote PGP,” part of the original 1991 PGP User’s Guide

(30) Rolf Oppliger, Cryptography 101: From Theory to Practice, Artec House, 2021.

(31) https://owasp.org/Top10/

(32) Cure53, Dr.-Ing. M. Heiderich, Dipl.-Ing. A. Inführ, BSc. T.-C. Filedescriptor Hong, MSc. R. Peraglie, MSc. S. Moritz, N. Hippert, MSc. F. Fäßler, “Pentest-Report Mozilla VPN Apps & Clients,” 03.2021

(33) Cory Doctorow, Everything is Always Broken, and That’s Okay.

(34) Anunay Kulshrestha and Jonathan Mayer, “Identifying Harmful Media in End-to-End Encrypted Communication: Efficient Private Membership Computation,” USENIX Security Symposium (USENIX Security 21), 2021.

(35) S. Rispens, “Apple Platform Security and Corporate Cyber Responsibility”: https://drrispens.medium.com/apple-platform-security-and-corporate-cyber-responsibility-dbc64673c3e6


Dr. Sybe Izaak Rispens

PhD on the foundations of AI, ISO27001 certified IT-Security expert. Information Security Officer at Trade Republic Bank GmbH, Berlin. Views are my own.