Please see the disclaimer.

Introduction

I am not a real cryptographer, so I have not verified the safety and security of the method of verifying safety numbers in Signal that I have described in this post. Use this method at your own risk. I welcome comments from real cryptographers.

I needed to ask my mother a question the other day, but it was not just any question; it was a question with an answer that needed to be kept private. So I asked her to download and install Signal Private Messenger, my choice of end-to-end encrypted messenger.

The process went relatively easily, but there was a problem: my mother and I live several hundred miles apart, and we don’t see each other in person very often.

With normal messengers, this is not a problem, but Signal is no ordinary messenger.

Because it is encrypted, it needs to use an encryption key, and it is important to keep that key secret.1 Signal also has something called a “safety number” for every person that you have a message conversation with, and it asks you to verify that number. The reason for this is that Signal is still vulnerable to a man-in-the-middle (MITM) attack, and verifying that you have the same safety number as the other person in the conversation ensures that your messages are not being intercepted.

Technically, the safety number can be public,2 but if Signal is vulnerable to MITM attacks, then any other way you could send safety numbers is also vulnerable, and governments have shown a desire to get around encryption by silently listening in on conversations.

I admit that I am just a little too paranoid to use regular methods because of that, which means that verifying in person was the only way I knew would be secure enough to satisfy me, especially since Signal has the awesome feature of just scanning a QR code on the other person’s device.

But since I could not meet my mother in person, I had to find another way. Before I asked her to download Signal, I spent time thinking about a way to verify our safety number.

What Won’t Work

First, I realized what would not work.

  1. Sending a message through Signal with the safety number.

    This fails because an attacker can intercept your message, realize that you are trying to verify a safety number, and replace your safety number with the one they share with the other person.

  2. Sending a screenshot of the safety number.

    This will not work for the same reason as above: the attacker can intercept the message, realize that you are trying to verify the safety number, take a screenshot of the safety number they share with the other person, and send that on instead.
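To make the failure concrete, here is a minimal sketch of the substitution, written as a toy model rather than anything that touches Signal itself. The two safety numbers and the names `mallory_relay` and `bob_checks` are made up for illustration; the only point is that a check carried over the intercepted channel passes even though the two people see different numbers.

```python
# Toy model: the attacker (Mallory) holds one session with Alice and another
# with Bob, so each victim sees a different safety number on their screen.
ALICE_SEES = "11111 22222"  # hypothetical number shown on Alice's device
BOB_SEES = "33333 44444"    # hypothetical number shown on Bob's device


def mallory_relay(message: str) -> str:
    """Forward a message, swapping Alice's number for the one Bob expects."""
    return message.replace(ALICE_SEES, BOB_SEES)


def bob_checks(received: str, bob_sees: str) -> bool:
    """Bob looks for the number on his own screen inside the message."""
    return bob_sees in received


sent = f"My safety number is {ALICE_SEES}"
received = mallory_relay(sent)

print(bob_checks(sent, BOB_SEES))      # False: an untouched message would expose the attack
print(bob_checks(received, BOB_SEES))  # True: the rewritten message hides it
```

The same swap works on a screenshot; it just takes the attacker a little more effort than a text replacement.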

Solution

Both of the broken methods above have one thing in common: they give the attacker time to intercept your message, interpret it, and send their own instead. My solution tries to remove that time advantage, though it may not seem like it at first glance.

This is my solution:

  1. Call the other person through Signal.
  2. Have the other person read the safety number to you, digit by digit, and listen to their voice carefully.
  3. For extra paranoia, read the number back to them.
  4. For even more paranoia, do a video call.
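The comparison in steps 2 and 3 is done by ear, but it can be sketched in code to show what "matching" means. This is only an illustration: it assumes the safety number is shown as blocks of five digits, and the numbers, `to_groups`, and `first_mismatch` are made up for the example.

```python
from typing import List, Optional


def to_groups(safety_number: str, size: int = 5) -> List[str]:
    """Drop spaces and line breaks, then split the digits into blocks."""
    digits = "".join(ch for ch in safety_number if ch.isdigit())
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def first_mismatch(heard: str, shown: str) -> Optional[int]:
    """Return the 1-based position of the first differing block, or None."""
    heard_groups, shown_groups = to_groups(heard), to_groups(shown)
    for position, (h, s) in enumerate(zip(heard_groups, shown_groups), start=1):
        if h != s:
            return position
    if len(heard_groups) != len(shown_groups):
        # One side ran out of blocks early, so the numbers cannot match.
        return min(len(heard_groups), len(shown_groups)) + 1
    return None


shown = "12345 67890 11111"  # made-up, shortened number on my screen
heard = "12345 67890 11112"  # made-up number as read to me over the call

print(first_mismatch(heard, shown))  # 3
print(first_mismatch(shown, shown))  # None
```

A single differing block is enough to treat the conversation as intercepted.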

Why It Seems Secure

This seems to be secure for several reasons:

  1. If you are not subject to a MITM attack, your call is encrypted, giving you all of the benefits that come with encryption.
  2. If you are subject to a MITM attack, this allows you to know that you are, assuming that your attacker cannot create a convincing recreation of either person’s voice in real time.

I will explain what I mean by the second point below, but its assumption is the keystone on which this method is built. If it is wrong, then I am wrong that this method is secure.

What an Attacker Has to Do

But I think I am right, though not because of exhaustive proof, mathematical or otherwise. I think I am right just based on what I know about the current state of technology.

All it would take for me to be wrong is for someone in the world to be able to recreate anybody’s voice in real time, which in this case basically means “as fast as a person can talk.” As far as I know, there is no person, institution, government, or entity that can do that.

But why would an attacker have to do that?

The reason is that, if they were to attempt to intercept the messages and insert their own, this is what they would have to do:

  1. Use speech recognition to recognize that a safety number is about to be verified.
  2. Generate prose text, something that would be recognized as normal spoken speech, from their own safety number.
  3. Use speech synthesis to say that prose text.
  4. Do number 3 in the voice of the speaking person.
  5. Do all of the above within about 200 milliseconds.

Numbers 1, 2, and 3 are not out of reach for actors like three-letter agencies, even within the time limit of number 5, but as far as I know, number 4 is still safe because of number 5, though that may change at any time.
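The argument above is really a latency budget, and a rough sketch may make it easier to follow. Every number below is invented purely for illustration; I have not measured any of these stages. The names `total_delay` and `fits_budget` are hypothetical as well. The only figure taken from the list is the 200 millisecond limit.

```python
from typing import Dict

BUDGET_MS = 200.0  # the time limit from number 5 in the list above


def total_delay(stages_ms: Dict[str, float]) -> float:
    """Add up the delay of each stage the attacker has to run in sequence."""
    return sum(stages_ms.values())


def fits_budget(stages_ms: Dict[str, float], budget_ms: float = BUDGET_MS) -> bool:
    """True if the whole pipeline finishes within the time limit."""
    return total_delay(stages_ms) <= budget_ms


# Invented latencies for numbers 1 to 3 (not measurements).
generic_voice = {
    "speech recognition": 80.0,
    "text generation": 10.0,
    "speech synthesis": 60.0,
}
# Number 4 adds a voice-matching stage, again with an invented latency.
cloned_voice = {**generic_voice, "voice cloning": 150.0}

print(fits_budget(generic_voice))  # True: 150 ms, but the voice is wrong
print(fits_budget(cloned_voice))   # False: 300 ms, so the delay is noticeable
```

If the real figures for the cloning stage ever drop far enough, the second result flips, which is exactly the way this method would stop working.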

And since I assume that 4 is not currently possible within the time limit, I can assume that no one can generate my voice or the voice of any person I may try to verify safety numbers with.

So, because 4 may not be currently possible, you can detect, with fairly good certainty, if someone is trying to interpose audio. And if they don’t try, the person you are trying to verify a safety number with will probably have a different safety number, and you will know you are subject to a MITM attack.

On top of that, if you do a video call, the attacker also has to create convincing video within the 200 millisecond time limit, though that may be easier than recreating a voice.

The Economics of Cybersecurity

But even if I am wrong that certain voices cannot be generated in real time, I am still somewhat confident in this method, and the reason is that it raises the cost of the attack.

That’s the important thing about using real time audio and video: doing so raises the cost of doing such an attack, and when it comes to cybersecurity, cost is king.

If an attack costs the attacker more than what he thinks he will get out of it, then he won’t do it, so the most important thing to do in cybersecurity is to raise the cost of attacks.
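That rule can be written as a one-line comparison. The sketch below uses made-up figures in arbitrary units, and `attack_is_worthwhile` is a hypothetical name; it shows only the shape of the reasoning, not real attack costs.

```python
def attack_is_worthwhile(expected_gain: float, attack_cost: float) -> bool:
    """An attacker goes ahead only if the expected payoff exceeds the cost."""
    return expected_gain > attack_cost


EXPECTED_GAIN = 1_000.0            # made-up value of reading the conversation
COST_REWRITE_TEXT = 100.0          # made-up cost of swapping a text or screenshot
COST_FAKE_LIVE_VOICE = 50_000.0    # made-up cost of faking a voice in real time

print(attack_is_worthwhile(EXPECTED_GAIN, COST_REWRITE_TEXT))     # True
print(attack_is_worthwhile(EXPECTED_GAIN, COST_FAKE_LIVE_VOICE))  # False
```

The defender cannot change the attacker's expected gain very much, so raising the cost is the lever that is left.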

And that is what this method does.

What This Method Does Not Do

This method does not help you find a way to stop a MITM attack on your Signal conversations; it just helps you detect it.

It is also not meant to protect the secrecy of your safety number, which can be public. Keeping your safety number secret just happens to be a (likely) side effect.

Conclusion

Is this method foolproof? No, of course not. But for everyday ordinary people, who are not high-profile targets, I believe that this method raises the attack cost high enough that it would be prohibitive to all but the most persistent of attackers, and may even make the cost unbearable for a persistent attacker.

And until a real cryptographer comes up with a better method, this is the one that I personally use.


  1. Technically, there are two keys, only one of which must be kept secret. It is called the “private key.” The other key is the “public key,” and as its name implies, it is public. ↩︎

  2. This is because, as far as I understand it, the safety number is derived from a hash of the public keys of both people in the conversation. ↩︎