Updated: May 8, 2020
In the Smart Data group, we are not only concerned with every step of data processing - from data ingestion, through data cleaning, transformation and analysis to visualisation and insight - but also with how to guarantee data security throughout the entire process.
This can largely be achieved using encryption. Nowadays it is even possible to process encrypted data directly, i.e. without decrypting it first. This is called privacy-preserving computation, and we are doing active research on it.
However, even though this provides so-called end-to-end protection of the data, one severe security gap remains: the interaction between computers and humans. While computers can process encrypted data internally, the data has to be decrypted for a person to read it. This means that data shown on a screen or printed on paper is usually completely unprotected.
To see why this can be a problem, consider the following examples:
Industrial espionage: while waiting at an airport, an engineer opens her laptop to work on some sensitive documents about cutting-edge technology. She uses a VPN with end-to-end encryption to access the files on her company's server, and of course her local hard drive and RAM are protected by encryption as well. However, her entire security architecture is bypassed when a spy from a competing company manages to photograph her screen from behind her.
Plausible deniability: in an authoritarian country, a human rights activist is arrested by the police. They confiscate all his electronics. As long as all relevant files are encrypted, he can claim that their content is private – perhaps embarrassing – but legal. This may provide him with some plausible deniability. However, once the police force him to type in his password to unlock the files, they are readable by anyone and can be used as evidence against him.
National security: a highly skilled group of hackers manages to infect the smartphones and computers of politicians and government employees with spyware that directly taps the screens of their devices, effectively creating and leaking screenshots of whatever is displayed there. In this way, they circumvent all end-to-end protection and obtain classified information about critical infrastructure, defence tactics, intelligence, ongoing criminal investigations, etc.
To solve this issue, we propose the concept of Human-Readable Encryption (HRE). The idea is to find encryption methods that allow a human to read a text in its encrypted form. This eliminates the need for decryption and closes the aforementioned security gap.
But is this even possible? And even if it is, doesn't the massive computing power of modern-day computers far exceed the capacity of the human brain? Wouldn't any encryption that is simple enough for humans to read also be easy to break?
Well, not necessarily. Encryption is secure not because it is overwhelmingly complex, but rather because it is constructed in such a way that it is hard to crack without the key. With the right key, however, decrypting a ciphertext is relatively easy. Based on this principle, there might be encryption schemes that a human with the right key can easily read, while a computer without the key cannot.
What's more, there are in fact already known forms of encryption that are human-readable. Consider for example monoalphabetic substitution. This is a class of schemes where every letter of the alphabet is replaced by some symbol. (This symbol may be a different letter of the same alphabet, or we may use an entirely different set of symbols - it doesn't really matter.)
Given a table listing all the letters of the alphabet and their corresponding replacement symbol, a human can learn how to read a text encrypted in such a way in less than one hour – quite fluently, in fact. (Go ahead and try it for yourself!)
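To make the idea concrete, here is a minimal sketch of a monoalphabetic substitution in Python. It assumes the simplest variant, in which each lowercase letter is replaced by another letter of the same alphabet. The helper names `make_key` and `substitute` and the seed value are our own choices for illustration, not part of any existing HRE tool.

```python
import random
import string

def make_key(seed):
    """Build a random letter-to-letter substitution table (hypothetical helper)."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    # A seeded generator keeps the example reproducible; a real key
    # would have to come from a cryptographically secure source.
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))

def substitute(text, table):
    """Replace every letter found in the table; leave all other characters as they are."""
    return "".join(table.get(ch, ch) for ch in text)

key = make_key(seed=2020)                   # arbitrary illustration value
inverse = {v: k for k, v in key.items()}    # the table a reader would memorise

plaintext = "a human can learn to read this"
ciphertext = substitute(plaintext, key)
recovered = substitute(ciphertext, inverse)

print(ciphertext)
print(recovered)
```

The `inverse` table plays the role of the lookup table described above: a reader who has memorised it can translate the ciphertext letter by letter, without any computer involved.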
Unfortunately, monoalphabetic substitution is also very easy to crack: it preserves the frequency of each letter, so a simple frequency analysis is usually enough to recover the table. So this is clearly not the solution, but it demonstrates that a solution might, in principle, be found.
To our knowledge, no secure HRE scheme exists today, but we would like to start the research necessary to develop one. Together with our colleagues from the Human Computer Interaction group, we want to answer the following questions:
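The weakness can be illustrated with a short Python sketch. It uses a shift by three letters as a simple special case of monoalphabetic substitution (our choice for illustration; any other table behaves the same way) and shows that the letter-frequency profile of the plaintext survives encryption unchanged. The sample sentence and all names are assumptions made for this example.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

def letter_counts(text):
    """Count how often each lowercase letter occurs, ignoring everything else."""
    return Counter(ch for ch in text if ch in ALPHABET)

# Shift-by-three table: one particular monoalphabetic substitution.
shift = {c: ALPHABET[(i + 3) % 26] for i, c in enumerate(ALPHABET)}

plaintext = "the engineer opens her laptop to work on some sensitive documents"
ciphertext = "".join(shift.get(ch, ch) for ch in plaintext)

# The sorted counts are identical: only the labels of the letters changed.
plain_profile = sorted(letter_counts(plaintext).values(), reverse=True)
cipher_profile = sorted(letter_counts(ciphertext).values(), reverse=True)

# The most frequent ciphertext letter is the image of the most frequent plaintext letter.
most_common_plain = letter_counts(plaintext).most_common(1)[0][0]
most_common_cipher = letter_counts(ciphertext).most_common(1)[0][0]

print(plain_profile == cipher_profile)
print(most_common_plain, "->", most_common_cipher)
```

An attacker who knows the typical letter frequencies of the language can therefore guess large parts of the table from the ciphertext alone, which is why this family of schemes is not secure.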
What types of encryption (other than monoalphabetic substitution) are in principle human-readable?
For each of these types, how fluent can a human become in reading an encrypted text?
For each of these types, how much time and effort does a human have to invest in learning to read fluently?
For each of these types, how secure is it against attacks from modern supercomputers?
Is there an overlap between sufficiently secure and sufficiently human-readable encryption types that may be used to develop secure and practical HRE?
Given a working HRE scheme, can it be modified to encrypt not only text, but also other types of information (visual, audio,…)?
Can we develop an HRE scheme that guarantees plausible deniability, i.e. given an arbitrary ciphertext and a cleartext, it is impossible to check whether they belong together?
Given a classically encrypted ciphertext, how do you translate it into an HRE ciphertext without decrypting it first? (Otherwise this would be a new security gap.)
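The plausible-deniability property asked for in the list above is known to hold for at least one classical scheme, the one-time pad. That scheme is not human-readable, so it does not answer the question. It only shows what the property looks like in practice. The following Python sketch rests on that assumption; the messages and the helper `xor_bytes` are made up for this example.

```python
import secrets

def xor_bytes(a, b):
    """XOR two byte strings of equal length (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

real_message = b"meet at noon"
key = secrets.token_bytes(len(real_message))    # random pad, used only once
ciphertext = xor_bytes(real_message, key)

# For any other message of the same length there is a key that
# "decrypts" the very same ciphertext to that message.
decoy_message = b"buy more tea"
decoy_key = xor_bytes(ciphertext, decoy_message)

print(xor_bytes(ciphertext, key))         # the real message
print(xor_bytes(ciphertext, decoy_key))   # the decoy message
```

Without the key, an observer cannot tell which of the two cleartexts belongs to the ciphertext. An HRE scheme with plausible deniability would need a comparable property while remaining readable for a human.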
We cannot do this research alone. We are data experts, not crypto experts, which is why we need your help!
Do you think the concept sounds interesting? Do you have some ideas for how to construct a secure HRE scheme? Do you know how to analyse and optimise the security of an encryption scheme? Can you provide a real-world use case for HRE? Would you like to develop a marketable product with us – an app, a browser add-on, a service? Do you know of existing attempts to achieve something similar? Or perhaps you can think of a good reason why secure HRE is simply impossible?
Then feel free to contact us! Send an e-mail to Philip.firstname.lastname@example.org to share and discuss ideas.