The Problem with Metadata
Anyone who has seen a police procedural or spy drama knows the scenario: a suspect is tracked down not through the content of their messages, but through their phone activity and contacts. Who they called or messaged, when, and how often: all of that is metadata. What many people don't realize is that this type of data is often much easier to access and analyze than the content of the communication itself.
What is Metadata?
Metadata is data that describes other data. It might record when and where a photo was taken, or a file's technical details such as its format, size, and creation date. Websites likewise carry metadata, such as page titles and descriptions, that helps search engines index them properly. Metadata makes digital information easier to manage, but it can also expose sensitive details about our digital behavior.
In the world of messaging and online communication, metadata includes details about who is communicating with whom, when the communication happens, and how much data is exchanged. While it doesn’t reveal the content of the communication, it can still be incredibly revealing. Knowing who spoke with whom and when can be enough to draw detailed conclusions about someone’s life, social network, and daily routines.
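To make this concrete, here is a minimal sketch of the kind of record a relay server might keep for each message. The `MessageMetadata` schema is hypothetical and holds only the fields named above (sender, recipient, time, size), with no message content at all. The equally hypothetical `contact_frequency` helper shows how simply counting these records already outlines a social network.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MessageMetadata:
    """Hypothetical record of what a server can see about one message (no content)."""
    sender: str
    recipient: str
    timestamp: datetime
    size_bytes: int


def contact_frequency(log: list[MessageMetadata]) -> Counter:
    """Count messages exchanged per pair of users, ignoring direction."""
    return Counter(tuple(sorted((m.sender, m.recipient))) for m in log)


# Example log: three messages, and not a word of their content.
log = [
    MessageMetadata("alice", "bob", datetime(2024, 3, 4, 7, 2), 312),
    MessageMetadata("bob", "alice", datetime(2024, 3, 4, 7, 5), 128),
    MessageMetadata("alice", "carol", datetime(2024, 3, 4, 22, 40), 2048),
]
print(contact_frequency(log).most_common())
# [(('alice', 'bob'), 2), (('alice', 'carol'), 1)]
```

Even this tiny log shows that Alice talks to Bob most often, early in the morning, and to Carol late at night.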
How is Metadata Collected?
Metadata is generated automatically whenever we use digital services, from websites to messaging apps, and in many cases these systems cannot function without it. Search engines rely on metadata to index websites, and messaging services need at least the recipient's address to route and deliver each message. As a result, while end-to-end encryption often protects the content of messages, the metadata typically remains unprotected.
This leaves metadata open to collection and analysis. A service provider, or in some cases a third party, can examine this data to learn a surprising amount about a person's behavior. For instance, researchers have analyzed messaging app metadata to reconstruct daily routines, such as inferring what time someone wakes up from when they send their first message of the day. Communication patterns can also reveal connections between people without any access to the content of their messages.
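The wake-up inference described above needs nothing more than send timestamps. The following is a minimal sketch, assuming naive `datetime` values already in the user's local time zone (a real analysis would also have to handle time zones, travel, and irregular days). `first_message_per_day` and `typical_first_activity` are hypothetical helper names, not the method of any particular study.

```python
import statistics
from datetime import date, datetime, time


def first_message_per_day(timestamps: list[datetime]) -> dict[date, time]:
    """Return the earliest send time observed on each calendar day."""
    earliest: dict[date, datetime] = {}
    for ts in timestamps:
        day = ts.date()
        if day not in earliest or ts < earliest[day]:
            earliest[day] = ts
    return {day: ts.time() for day, ts in sorted(earliest.items())}


def typical_first_activity(timestamps: list[datetime]) -> float:
    """Median first-message time across days, in minutes after midnight."""
    firsts = first_message_per_day(timestamps)
    minutes = [t.hour * 60 + t.minute for t in firsts.values()]
    return statistics.median(minutes)


# Send times over three days (assumed to be in the user's local time).
sent = [
    datetime(2024, 3, 4, 7, 2), datetime(2024, 3, 4, 12, 30),
    datetime(2024, 3, 5, 6, 58), datetime(2024, 3, 5, 23, 10),
    datetime(2024, 3, 6, 9, 15), datetime(2024, 3, 6, 7, 10),
]
print(divmod(typical_first_activity(sent), 60))  # (7, 2), i.e. around 07:02
```

The median is used rather than the mean so that a single unusually early or late day does not skew the estimate.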
The Risks of Metadata Exposure
The problem with metadata is that it accumulates everywhere in the digital world and is difficult to avoid. Simply using a messaging app generates metadata, such as the time and frequency of communications. Even when the message content is encrypted, this metadata remains exposed.
This can be especially problematic in some situations. During protests or in politically sensitive contexts, for example, metadata can reveal not only who is communicating but also roughly where they are, because IP addresses can be mapped to approximate geographic locations. Turning on airplane mode or using privacy-focused messaging apps that collect less metadata can reduce exposure, but these steps only go so far.
How Can We Protect Our Metadata?
There are already various methods for reducing metadata exposure. One example is Sealed Sender, a technique in which messages are delivered without revealing the sender's identity to the server, similar to dropping off a letter at the post office without a return address. Even this does not fully solve the problem, however: it is still possible to infer who is communicating with whom from the size and timing of the data exchanged. If IP address 1 sends a packet to a server and, moments later, the server sends a packet of the same size to IP address 2, it is easy to conclude that IP address 1 is communicating with IP address 2.
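The size-and-timing inference described above can be sketched as a simple matching rule. This assumes an observer who can see packets entering and leaving the server, and that a forwarded message keeps its size on the wire (real protocols add some overhead, so a practical attack would match sizes approximately). The `Packet` type, the `correlate` function, and the 0.5-second window are illustrative assumptions; the IP addresses come from documentation-reserved ranges.

```python
from collections import Counter
from typing import NamedTuple


class Packet(NamedTuple):
    ip: str      # sender IP for inbound packets, recipient IP for outbound ones
    time: float  # seconds since the observer started watching
    size: int    # bytes on the wire


def correlate(inbound: list[Packet], outbound: list[Packet],
              max_delay: float = 0.5) -> Counter:
    """Link senders to recipients by matching packet size and timing.

    Every outbound packet of the same size that leaves the server within
    max_delay seconds after an inbound packet counts as one piece of
    evidence for that (sender, recipient) pair.
    """
    links: Counter = Counter()
    for p_in in inbound:
        for p_out in outbound:
            if p_out.size == p_in.size and 0 <= p_out.time - p_in.time <= max_delay:
                links[(p_in.ip, p_out.ip)] += 1
    return links


inbound = [
    Packet("198.51.100.1", 10.00, 1200),
    Packet("203.0.113.7", 10.10, 800),
    Packet("198.51.100.1", 42.00, 640),
]
outbound = [
    Packet("192.0.2.2", 10.05, 1200),
    Packet("192.0.2.9", 10.30, 800),
    Packet("192.0.2.2", 42.20, 640),
]
print(correlate(inbound, outbound).most_common())
# [(('198.51.100.1', '192.0.2.2'), 2), (('203.0.113.7', '192.0.2.9'), 1)]
```

Each repeated match strengthens the inferred link, even though the observer never learns who the sender is from the messages themselves.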
IP addresses themselves are another source of metadata exposure. They can reveal not only who is communicating but also clues about a user's physical location. This information is visible not just to messaging service providers but also to any third parties that may have access to this data.
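Location clues arise because blocks of IP addresses are assigned to particular networks and regions, and geolocation databases map those blocks to approximate places. Here is a toy sketch of such a lookup, assuming a small hypothetical prefix table: the region names are placeholders and the prefixes are documentation-reserved ranges. The hypothetical `approximate_location` picks the most specific matching prefix, the same longest-prefix rule used in IP routing.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix table; real geolocation databases cover millions of
# address blocks and only give approximate locations.
PREFIX_TO_REGION = {
    ipaddress.ip_network("192.0.2.0/24"): "Country X",
    ipaddress.ip_network("192.0.2.128/25"): "City A",
    ipaddress.ip_network("203.0.113.0/24"): "City B",
}


def approximate_location(ip: str) -> Optional[str]:
    """Return the region of the most specific (longest) matching prefix, if any."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIX_TO_REGION if addr in net]
    if not matches:
        return None
    return PREFIX_TO_REGION[max(matches, key=lambda net: net.prefixlen)]


print(approximate_location("192.0.2.200"))  # City A (the /25 beats the /24)
```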
The Solution: Metadata Shredding
For truly robust privacy protection, a more advanced approach is needed: metadata shredding. It makes metadata unrecognizable by mixing each user's traffic with that of many other users, so that any given message could belong to any member of a large group, known as an "anonymity set." As a result, neither service providers nor third parties can track communication patterns or link senders to receivers.
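One classic way to build an anonymity set is a threshold mix, the batching idea behind mix networks. The article does not say which mechanism metadata shredding uses, so treat this as an illustration of the general principle rather than its implementation. The hypothetical `ThresholdMix` holds messages until `batch_size` have arrived, pads each to a fixed `PADDED_SIZE` (an assumed value), and releases the batch in random order. This removes the size and arrival-order signals that let an observer match senders to recipients. A real mix also re-encrypts every message so the bytes leaving it cannot be matched to the bytes entering it; that cryptographic layer is omitted here.

```python
import random
from typing import List, Optional, Tuple

PADDED_SIZE = 1024  # hypothetical fixed message size, in bytes


def pad(payload: bytes, size: int = PADDED_SIZE) -> bytes:
    """Pad a payload to a fixed size so all messages have the same length.

    Naive zero padding for illustration only; a real system would encrypt
    the padded message and use an unambiguous padding scheme.
    """
    if len(payload) > size:
        raise ValueError("payload exceeds the fixed message size")
    return payload + b"\x00" * (size - len(payload))


class ThresholdMix:
    """Collect messages until batch_size are queued, then release them shuffled."""

    def __init__(self, batch_size: int, rng: Optional[random.Random] = None):
        self.batch_size = batch_size
        self.pool: List[Tuple[str, bytes]] = []
        self.rng = rng if rng is not None else random.Random()

    def submit(self, recipient: str, payload: bytes) -> List[Tuple[str, bytes]]:
        """Queue one message; return the whole shuffled batch once full, else []."""
        self.pool.append((recipient, pad(payload)))
        if len(self.pool) < self.batch_size:
            return []  # nothing is forwarded until the batch is full
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)  # output order no longer follows arrival order
        return batch


mix = ThresholdMix(batch_size=3)
for recipient, text in [("bob", b"hi"), ("carol", b"see you at 8"), ("dave", b"ok")]:
    released = mix.submit(recipient, text)
print([(r, len(p)) for r, p in released])  # three equal-size messages, random order
```

With a batch of N messages, each released message could have come from any of the N senders in that batch, so N is the size of the anonymity set. Larger batches give stronger anonymity at the cost of longer delivery delays.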
The key advantage of metadata shredding is that it offers comprehensive privacy protection. Both the content of the messages and the metadata are kept private, making it impossible to draw conclusions about who is communicating or when. While this technique has primarily been applied to messaging services, it also has potential applications in payment systems and other online activities where metadata exposure is a concern.
As privacy becomes an increasing concern in today’s digital age, metadata shredding offers a promising solution to one of the more subtle but serious risks associated with online communication.