Decentralizing the Internet and Other Ideas

Introduction

My life has changed recently; I have begun an important task. This means that I have to shift my focus away from implementing all of my ideas and toward running a campaign.

Because of that and because I still want my ideas to be implemented, I am writing them down for anyone to do.

So, what do I mean by “ideas”? Ideas to make money or to improve the software industry.

Most of my ideas, at least in this post, have one thing in common: breaking monopolies that large tech companies have.

That is why I am writing them down here, for everyone to steal; I want to break those monopolies.

Also, good ideas are cheap. If someone else can implement my ideas better than I can, I should let them.

Decentralized Data

If we want to break the monopolies, we have to change the paradigm from “the company controls the data” to “the user controls the data.” In other words, we need to move from “centralized” data to “decentralized” data.

How do we do this? We do it by changing the storage of the data. Instead of having companies’ servers be the “master” copy of our data, we store our data ourselves and use the companies as backups.

This does mean that we need to get used to the idea of managing our own data and paying cloud companies to store it. However, I think the cost will be well worth it: the competition this creates will push prices down, and it removes the monopolies of tech giants, including cloud storage companies. If a user decides he doesn’t like the cloud company he is using, he can easily switch; he just uploads his data elsewhere.
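To make the “user holds the master copy, companies hold backups” idea concrete, here is a minimal sketch. `LocalStore` and `BackupProvider` are hypothetical names, and in-memory dicts stand in for real disks and real cloud storage APIs.

```python
from dataclasses import dataclass, field


@dataclass
class BackupProvider:
    """A hypothetical cloud company that only stores copies of the user's data."""
    name: str
    blobs: dict[str, bytes] = field(default_factory=dict)

    def upload(self, key: str, data: bytes) -> None:
        self.blobs[key] = data


@dataclass
class LocalStore:
    """The user's own master copy; providers are only backups."""
    data: dict[str, bytes] = field(default_factory=dict)
    providers: list[BackupProvider] = field(default_factory=list)

    def put(self, key: str, value: bytes) -> None:
        # Write to the master copy first, then mirror it to every backup.
        self.data[key] = value
        for provider in self.providers:
            provider.upload(key, value)

    def switch_provider(self, old: BackupProvider, new: BackupProvider) -> None:
        # Switching is just re-uploading the master copy somewhere else.
        for key, value in self.data.items():
            new.upload(key, value)
        self.providers = [p for p in self.providers if p is not old] + [new]
```

Because the user never depends on a provider for the authoritative copy, leaving one costs a re-upload and nothing more.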

Decentralized Identities

Except, it’s not that simple. How does a user let his family, friends, or audience know he moved?

This is really the biggest technical challenge to my ideas: implementing a decentralized identity system, which I will call “IDSYS” for the sake of discussion.

Yes, the name “IDSYS” sucks, especially compared to catchy names like Bitcoin and Ethereum. I am not trying to sell the product here; if I were actually implementing it (which I would do if I didn’t have more important things to do), I would spend time coming up with a good name and a great logo.

If this problem is solved, a user could update the location of his content in IDSYS, and everyone who cares could easily find the new location.
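The user-facing behavior is simple even if the decentralized machinery is not. This toy sketch assumes a plain dict in place of IDSYS and skips ownership checks entirely; `ToyRegistry` and the `.example` URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IdentityRecord:
    name: str
    location: str  # where the identity's content currently lives
    version: int = 0


class ToyRegistry:
    """Stand-in for IDSYS: maps each name to where its content lives now."""

    def __init__(self) -> None:
        self._records: dict[str, IdentityRecord] = {}

    def claim(self, name: str, location: str) -> None:
        if name in self._records:
            raise ValueError(f"{name!r} is already taken")
        self._records[name] = IdentityRecord(name, location)

    def move(self, name: str, new_location: str) -> None:
        # A real system would require the owner's signature here.
        record = self._records[name]
        record.location = new_location
        record.version += 1

    def resolve(self, name: str) -> str:
        return self._records[name].location
```

Followers store only the name, so a `move` is invisible to them apart from `resolve` returning the new address.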

However, solving this problem is…hard.

Requirements

What are the requirements for a good decentralized identity system?

Well, most people start with Zooko’s Triangle:

Decentralized
Human-meaningful/memorable
Secure

According to Wikipedia, “secure” means that malicious attackers cannot inflict much damage on the system.

This is a good place to start, but it is actually incomplete. The important elements are there, though.

As far as I can tell, there are actually six requirements:

Decentralized (obviously)
Human-meaningful/memorable
Secure
Safe
Unique/global
Recyclable

Let’s define each of them.

Decentralized

Even though this should be obvious, I want to precisely define it.

“Decentralized” means that no central authority can (easily) take control. It means that identities can be gained without appealing to any central authority. It also means that an identity can be claimed at any time if it is not already taken, and that gaining a name should not depend on anyone else except (maybe) consensus.

By the way, this definition precludes systems that use a hierarchical scheme, like Urbit. If anyone can deny someone else a name, besides (maybe) through consensus, the system is not decentralized.

Human-Meaningful/Memorable

This one is easy to define: if you can remember an identity at all, it is human-meaningful/memorable. Another good test is if it can be pronounced.

A bunch of random characters is not exactly meaningful. A word or a phrase is.

This means that “tsonhe189829o809ej7jnstw” is not memorable and “Yzena” is.

This is important because without memorable names, people will inevitably make mistakes when referring to an identity, either in speech or writing.

This is also another reason Urbit is less than ideal. While they did well in coming up with mnemonics to make their names as memorable as possible, the names are still hard to get used to.

Secure

“Secure” means exactly what Wikipedia says: the system is secure from attack, or secure “enough.”

Safe

While it may seem that this is the same as “secure,” it is not.

“Safe” means an attacker cannot “steal” a name, whether by actually stealing it (whatever that might mean in IDSYS) or impersonating it.

In other words, “secure” is about the security of the system, and “safe” is about the security of individual names.

Unique/Global

“Unique/global” means that only one person or entity can have any given name at any given time, across the entire Earth (and farther, if we make it into the stars).

This means that any way of getting a name that might cause a collision is out.

This is important because without it, IDSYS will have a hard time differentiating between two people/entities with the same name, and it could also reduce the safety of names and security of the system.

Recyclable

And here we come to the hardest one of all.

“Recyclable” means that a name can (almost always) be recycled and claimed by a new person or entity after the original owner dies or otherwise gives up the name.

This is important because without it, the namespace will slowly become filled up over time and eventually become unusable. But to stay safe, it must be clear when a name changes hands.

Starting Point

From the list of requirements, the starting point should be obvious: a blockchain.

But that’s all it is: a starting point.

This is because blockchains have very serious flaws that may eventually make the system untenable. We must solve those problems in IDSYS while keeping all of the advantages of blockchains.

So…what do blockchains have that we want? Proof of ownership through proof of work. We also want to force “miners” to accept any name transaction, whether renewal, claim, or otherwise. (In Bitcoin, miners mostly add transactions to blocks only in exchange for a small fee.)

What don’t we want? A ledger that grows without bound. We need some way of removing old blocks that don’t matter anymore, or the ledger will make IDSYS untenable.

What don’t we care about? Transaction speed. Identities should not change hands that often.

I bet we can come up with something that uses those facts to our advantage.

I am NOT a cryptographer. The ideas in this section (the whole post, really) have not been checked by professionals for viability. Do NOT implement these ideas without consultation and go-ahead from at least one real cryptographer, and preferably more than one.

Timestamps

In Bitcoin, blocks have timestamps, and there are specific rules about whether clients should accept blocks with specific timestamps.

Bitcoin has a problem with time: the network time is allowed to be off by more than the time it takes to make a block. This tolerance is necessary for Bitcoin because otherwise, a common network time would be impossible at the rate at which financial transactions need to be made.

But identity transactions do not need to be made that quickly. In fact, with DNS, delays in network propagation can be measured in days. So we do not need to worry about that.

But let’s go further and turn it into an advantage. The Block Timestamps page says that block times can be accurate within an hour or two. Let’s say our “grace period” for IDSYS will be two hours.

Then, let’s split the process of creating a block into several phases.

Right after a block is created, the window for sending transactions that will make it into the next block closes. Any transactions with a timestamp later than this point do not have to be included in the next block (though a miner could include them, to give his block a better chance of being accepted). This period lasts two hours, during which no blocks are created; its purpose is to ensure that all of the transactions made before the previous block was created have propagated through the network.
After the two-hour period expires, miners can safely start mining for the next block. I say “safely” because if their block does not include all of the transactions sent before the previous block was created, it will not be accepted. This period continues until the next block is made.
After that block is created, the first period starts again. Each new block must contain all of the transactions sent before the previous block was created that were not included in the previous block.

Because blocks won’t be accepted if they don’t have all of the transactions, it forces miners to include them in their blocks.
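The acceptance rule from the phases above could be checked roughly like this. It is a sketch under assumptions: timestamps are plain integer seconds, a “pending” set of known transactions is available to the validator, and `Tx`, `Block`, and `accept_block` are hypothetical names.

```python
from dataclasses import dataclass

GRACE_PERIOD = 2 * 60 * 60  # seconds; the two-hour window proposed above


@dataclass(frozen=True)
class Tx:
    tx_id: str
    timestamp: int  # seconds since some agreed epoch


@dataclass
class Block:
    created_at: int
    txs: frozenset[Tx]


def accept_block(candidate: Block, previous: Block, pending: set[Tx]) -> bool:
    """Hypothetical IDSYS rule: reject blocks that skip owed transactions."""
    # No block may be created during the propagation window.
    if candidate.created_at < previous.created_at + GRACE_PERIOD:
        return False
    # Every transaction sent before the previous block was created, but not
    # included in it, is owed to this block. Later ones are optional.
    owed = {tx for tx in pending
            if tx.timestamp < previous.created_at and tx not in previous.txs}
    return owed <= candidate.txs
```

Since validators reject any block missing an owed transaction, a miner who wants his block accepted has no room to leave fee-less transactions out.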

Benefits of Mining

What do miners get? The identities they create by mining a block become the most trusted identities, and only those identities have a shot at becoming permanent.

“Most trusted”? “Permanent”? Yes.

Every identity will be marked by the amount of work (the amount of “proof” in “proof of work”) that it took to reserve that identity. The more work it took to reserve an identity, the more that identity will be trusted.

This works because most identities can be created with proof of work that does not depend on previous blocks, so the work can be done offline. This lets regular people with regular computers create identities without needing access to enormous amounts of computing power; they can just run their machines for longer, across many blocks. The longer they spend, the more trusted their identity.

However, identities from mined blocks are the exception. They may technically have less proof of work than an offline transaction, but they become the most trusted simply because they came from mined blocks.
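Here is a minimal sketch of offline proof of work and work-based trust. Several things are assumptions, not part of the design above: SHA-256 as the hash, the hashed input being name plus wallet public key plus an 8-byte nonce, and trust measured as the expected number of hashes (2 to the number of leading zero bits). Treating mined identities as infinitely trusted is just one way to encode “most trusted by virtue of that alone.”

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a hash digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def claim_digest(name: str, wallet_key: bytes, nonce: int) -> bytes:
    # Hypothetical encoding: name, then wallet public key, then an 8-byte nonce.
    return hashlib.sha256(name.encode() + wallet_key + nonce.to_bytes(8, "big")).digest()


def offline_claim(name: str, wallet_key: bytes, target_bits: int) -> int:
    """Search for a nonce; nothing here depends on recent blocks."""
    nonce = 0
    while leading_zero_bits(claim_digest(name, wallet_key, nonce)) < target_bits:
        nonce += 1
    return nonce


def trust(name: str, wallet_key: bytes, nonce: int, from_mined_block: bool) -> float:
    """Expected number of hashes behind a claim; mined identities rank highest."""
    if from_mined_block:
        return float("inf")
    return 2.0 ** leading_zero_bits(claim_digest(name, wallet_key, nonce))
```

Someone with an ordinary computer who wants more trust simply raises `target_bits` and lets the search run longer.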

And only identities tied to mined blocks (“online” proof of work) can become permanent.

Permanent Identities

Permanent identities are meant to solve two problems at once. First, some identities stay tied to a famous person long after that person has died. Consider that the estate of Barack Obama would probably like to keep the rights to that name in perpetuity, since he was a President of the United States, and would not want his name reused later by someone trying to destroy his legacy. Second, they keep the blockchain from growing without bound.

Just to be clear, I am no fan of President Obama’s legacy, but erasing history isn’t a good idea either, because once it is erased, we can’t learn from it.

Of course, President Obama’s estate could just renew the identity forever, but that starts to take a lot of computing power, especially as the blockchain grows larger. So let’s make it so his estate doesn’t have to, and stop the blockchain from growing at the same time.

What his estate could do is go to the last block where they renewed the identity. Then, taking the list of currently permanent identities, they add the identity to that list and keep adding random data until the hash of the block containing the list of permanent names is exactly the same as the hash of the renewal block. Then they can broadcast that new “permanent” block as the newest Genesis Block, which I will call a “Permanence Block.” If the network accepts it, it becomes the newest Permanence Block, and the network collectively forgets all of the blocks that came before it, including the old renewal block.
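The Permanence Block search is a brute-force loop, sketched below. Two loud assumptions: the block layout (sorted names, a zero byte, then a counter as the “random data”) is made up, and the hash is truncated to its first 2 bytes (`PREFIX_BYTES`) so the loop actually finishes. Matching a full SHA-256 digest exactly is a second-preimage search, which a secure hash is designed to make infeasible, so this step is one a cryptographer would need to look at.

```python
import hashlib
import itertools

PREFIX_BYTES = 2  # toy setting: compare only the first 2 bytes of each hash


def block_hash(payload: bytes) -> bytes:
    return hashlib.sha256(payload).digest()[:PREFIX_BYTES]


def find_permanence_block(renewal_block: bytes, permanent: list[str], new_name: str) -> bytes:
    """Add new_name to the permanent list, then pad until the hashes match."""
    target = block_hash(renewal_block)
    names = sorted(set(permanent) | {new_name})
    body = "\n".join(names).encode()
    for counter in itertools.count():
        # The counter plays the role of the random filler data.
        candidate = body + b"\x00" + counter.to_bytes(8, "big")
        if block_hash(candidate) == target:
            return candidate
    raise AssertionError("unreachable: itertools.count() never ends")
```

With a 2-byte prefix the loop needs about 65,536 attempts on average; every extra byte multiplies that by 256.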

In this way, the estate of President Obama gets a permanent identity, and the blockchain does not grow without bound.

There is one caveat: the old renewal block must have come before the most recent renewal or claim of every non-permanent identity currently contained in the blockchain; otherwise, some identities would be lost, and that is not good.
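The caveat can be expressed as a simple check before anyone accepts a new Permanence Block. This sketch assumes blocks are identified by height and that nodes keep a hypothetical table mapping each non-permanent identity to the height of its latest claim or renewal.

```python
def can_prune_before(renewal_height: int, latest_event_height: dict[str, int]) -> bool:
    """True if forgetting blocks up to renewal_height loses no live identity.

    latest_event_height maps each non-permanent identity to the height of
    the block holding its most recent claim or renewal. The renewal block
    itself is forgotten too, so every identity must be strictly newer.
    """
    return all(height > renewal_height for height in latest_event_height.values())
```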

Renewal

In order to be recyclable, non-permanent identities should have a way of being reclaimed, and that way is time. After a certain amount of time, identities should expire and not be trusted anymore, unless they are renewed with new proof of work in a new block.

This is where the distinction between “most trusted” and “least trusted” comes in. The less trusted an identity is, the shorter the time until renewal is required.

Of course, a lot of identities will expire all the time, allowing them to be recycled.

This design has another good side effect: it will make cybersquatting less profitable since it will take computing power to keep names reserved.

Identities IRL

All of that is well and good, but how would people know who is attached to what identities, and how could those identities prove it?

The answer also comes from Bitcoin: wallets. In Bitcoin, bitcoins are contained in wallets, and in IDSYS, identities should as well.

This also has the good side effect that people can claim aliases: a single wallet can hold several identities.

By the way, this is also how identities will prove themselves legitimate when, say, signing into websites: the identity’s wallet signs a token with its private key, and the website uses that wallet’s public key to verify the signature and confirm that the sign-in is valid.

However, this presents a problem: what if someone wants to move an identity to another wallet, such as another one they own, or to someone else’s wallet because that person bought the identity for real-world money? This should be possible, but it should put the identity at the lowest possible trust until it has been renewed with better proof of work.

By the way, this means that if a permanent identity is transferred, its permanence should be removed, and it should become a normal identity. This also means that miners attempting to create new Permanence Blocks will need to delete that identity from their new Permanence Blocks.
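The transfer rules can be captured in a few lines. `Identity`, `transfer`, and `LOWEST_TRUST` are hypothetical names, and wallets are represented by plain strings for brevity.

```python
from dataclasses import dataclass

LOWEST_TRUST = 0.0  # hypothetical floor for freshly transferred identities


@dataclass
class Identity:
    name: str
    wallet: str  # hypothetical: the owning wallet's public key, hex-encoded
    trust: float
    permanent: bool = False


def transfer(identity: Identity, new_wallet: str, permanent_names: set[str]) -> None:
    """Move an identity to a new wallet, resetting trust and permanence."""
    identity.wallet = new_wallet
    identity.trust = LOWEST_TRUST  # stays here until renewed with new work
    if identity.permanent:
        identity.permanent = False
        # Miners building the next Permanence Block must drop this name.
        permanent_names.discard(identity.name)
```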

Attaching Data

But of what real use are identities if there isn’t a way to attach data to them?

As part of defining an identity, users should be able to attach data to it, probably in a key-value format like JSON. As part of renewal, they should be able to edit that data.

IDSYS itself should not define the structure of this data; instead, specific applications should be able to define what data they expect.
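A sketch of the split between IDSYS and applications: IDSYS only insists on a JSON object, and each application checks for the keys it cares about. The `profile_url` and `avatar` keys are hypothetical examples of what a social media app might require.

```python
import json


def parse_attached_data(raw: str) -> dict:
    """IDSYS's only rule (in this sketch): the data is a JSON object."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("attached data must be a JSON object of key-value pairs")
    return data


def missing_keys(data: dict, required: set[str]) -> set[str]:
    """Each application decides which keys it needs."""
    return required - set(data)
```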

Problems

Any new and untested design has problems; this is no exception. I will try to address them here.

Computing Power

Like Bitcoin, the IDSYS network could consume copious amounts of energy if there is a big incentive to mine as many identities as possible, especially as ASICs come online.

We can reduce the effect of this by lowering the profitability of ASICs and GPUs, and thus lowering the incentive to mine for profit.

One of the first things we can do is use a memory-hard hash function, though this has limits.
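A memory-hard function forces every hash attempt to touch a large block of memory, which narrows the advantage of ASICs and GPUs. The sketch below uses scrypt through Python’s `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or newer). Choosing scrypt over alternatives such as Argon2, and the tiny cost parameters, are assumptions for illustration only.

```python
import hashlib

# Toy cost parameters: scrypt needs roughly 128 * r * n bytes of memory per
# hash (about 1 MiB here). A real deployment would tune these much higher.
SCRYPT_N = 2 ** 10
SCRYPT_R = 8
SCRYPT_P = 1


def memory_hard_hash(data: bytes, salt: bytes) -> bytes:
    return hashlib.scrypt(data, salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)


def leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def solve(data: bytes, salt: bytes, target_bits: int) -> int:
    """Find a nonce whose memory-hard hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(memory_hard_hash(data + nonce.to_bytes(8, "big"), salt)) < target_bits:
        nonce += 1
    return nonce


def verify(data: bytes, salt: bytes, nonce: int, target_bits: int) -> bool:
    """Checking costs one hash, so verifiers pay far less than solvers."""
    return leading_zero_bits(memory_hard_hash(data + nonce.to_bytes(8, "big"), salt)) >= target_bits
```

The limits mentioned above still apply: specialized hardware can add memory too, so this raises the bar rather than removing it.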

Sybil Attack

There is an attack on network services called the Sybil Attack. It uses a large number of identities to gain an outsized amount of reputation.

It has been proven that decentralized systems are always vulnerable to Sybil Attacks, so we can’t eliminate the problem. However, we can probably reduce it.

Since IDSYS already uses wallets, we can use that to our advantage. If any identity from a wallet is caught behaving like a bot, any user can decide not to trust any identity from that wallet.

But that just moves the problem, right? Because bad actors can just make as many wallets as they want.

Yes, unless we prevent them from doing so.

Bitcoin does not have any proof of work for its wallets. Why don’t we have proof of work for IDSYS wallets? We can do that by generating the public/private key pair and then appending random data to the public key until the hash of the result has enough leading zeroes.
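Wallet proof of work is the same search with the public key as input. This sketch assumes SHA-256 and an 8-byte random suffix; the all-zero “public key” in the usage is a placeholder, not a real key.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mine_wallet(public_key: bytes, target_bits: int) -> bytes:
    """Append random data to the public key until the hash has enough zeroes."""
    while True:
        suffix = os.urandom(8)
        if leading_zero_bits(hashlib.sha256(public_key + suffix).digest()) >= target_bits:
            return suffix


def wallet_work_bits(public_key: bytes, suffix: bytes) -> int:
    """Anyone can recompute a wallet's work from the key and its suffix."""
    return leading_zero_bits(hashlib.sha256(public_key + suffix).digest())
```

For example, `mine_wallet(bytes(32), 10)` takes about a thousand attempts on average, and the wallet then publishes the suffix alongside its public key.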

The way it would work is like this: an identity’s trust “score” would be tied to a combination of its own proof of work and the proof of work of the wallet it is a part of. Since the wallet does not have to be renewed, its proof of work should count less than the proof of work of the actual identity, but it should still matter.

I think it might be best to have a low proof of work on the wallet reduce the trust score of its identities, while a high proof of work leaves them unaffected.

And once again, the higher the trust score of the identity (including the wallet), the less often the identity needs to be renewed.

This makes it difficult for bad actors to create a lot of identities that could simply be marked untrusted. If they attach many identities to a few high-trust wallets, they risk losing the impact of all of those identities when one is found to be a bad actor. On the other hand, if they spread those identities across many wallets, it becomes prohibitively expensive to keep creating wallets for new identities, especially since a low proof of work on a wallet hurts the trust of its identities.
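One way to combine the two kinds of work into a trust score and a renewal interval is sketched below. Every constant and the linear formulas are hypothetical; work is measured in bits (log2 of expected hashes), and the wallet only ever scales the identity’s score down, matching the suggestion that low wallet work hurts while high wallet work leaves identities unaffected.

```python
WALLET_FULL_BITS = 24    # hypothetical: wallets at or above this are not penalized
MIN_RENEWAL_DAYS = 30    # hypothetical: least-trusted identities renew monthly
DAYS_PER_TRUST_BIT = 15  # hypothetical: each bit of trust adds 15 days
MAX_RENEWAL_DAYS = 5 * 365


def trust_score(identity_bits: int, wallet_bits: int) -> float:
    """Identity work, scaled down when the wallet's own work is low."""
    wallet_factor = min(1.0, wallet_bits / WALLET_FULL_BITS)
    return identity_bits * wallet_factor


def renewal_days(score: float) -> int:
    """Higher trust means renewing less often, up to a cap."""
    days = MIN_RENEWAL_DAYS + DAYS_PER_TRUST_BIT * score
    return int(min(days, MAX_RENEWAL_DAYS))
```

With these numbers, an identity with 20 bits of work in a well-mined wallet renews every 330 days; the same identity in a wallet with half the required work scores 10 and renews every 180 days.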

In other words, we can’t eliminate the possibility of a Sybil Attack, but we can attempt to make it economically infeasible.

Decentralized Social Media

So…let’s assume we managed to solve all of the problems with IDSYS, and it became a reality. What could we do then?

Well, all we have to do is write a piece of software that will serve social media content, whether posts, videos, pictures, etc. Then users will run that software on virtual machines provided by cloud companies.

And suddenly, it’s a decentralized social media platform. It has the same network effects as a centralized social media platform, because all instances share a common way to identify users, but it is not controlled by one company or another. Cloud companies have to compete for users. And users can move, removing vendor lock-in.

Facebook, Twitter, YouTube, Instagram, and others will all be irrelevant almost overnight, and cloud storage companies will see a boom in demand. Yet despite that boom, they will need to compete for users, so their incentives will align with users’ incentives.

Decentralized Services

That doesn’t really take care of Google’s power; Google doesn’t really do social media (except for YouTube).

But we can break Google’s monopoly too. We can write software to replace all of Google’s services, from Maps, to Contacts, to Calendar, to Keep, to Images, to Drive, to Documents, to Sheets, and more. All that is needed is software that serves you your own data.

Oh, and this includes email.

Encrypted Data

But if we are to do all of that, we must ensure one thing: that the data is only readable by those we give permission to.

And that does not include the cloud companies.

The data should be encrypted by default when stored on anyone else’s servers. It should not be readable by anyone unless you say so. Such is the idea behind Tarsnap, for example.

Of course, to have decentralized social media, we must have a way to say that some data is not encrypted and a way to share some encrypted data with some people.

Portable Executables

Nowadays, a binary executable compiled for x86_64 Windows will not work on Linux, nor the other way around. This is true for almost all combinations of architectures and operating systems.

That is stupid. Let’s fix that.

Let’s take ideas from the LLVM IR and its bitcode format. Let’s build a new IR first to get rid of LLVM’s problems. Then, let’s make it so the IR and the bitcode formats we make can embed platform-specific code and routines for an optimizer to use.

Voilà! We have a format for portable executables. As long as there is a compiler backend on the system that understands the format, those executables can be run.

Types

Let’s go further and standardize the types used in those executables. Since types can be values, we should be able to embed those types into the executable as well.

This also applies to types that have more than one type inside of them. For example, a struct in C is a type, but it has multiple types inside of it. In other words, these executables should know the structure of the types that they use.
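To show what “knowing the structure of a type” could mean, here is a sketch of a self-describing descriptor for a C struct, serialized so it can be embedded, plus a decoder driven purely by that descriptor. The descriptor schema and names like `SAMPLE_TYPE` are hypothetical; Python’s `struct` module stands in for the executable’s own machinery, using little-endian layout with no padding.

```python
import json
import struct

# A hypothetical, self-describing layout for:
#   struct sample { int32_t id; float values[3]; };
SAMPLE_TYPE = {
    "name": "sample",
    "kind": "struct",
    "fields": [
        {"name": "id", "type": "int32"},
        {"name": "values", "type": "float32", "count": 3},
    ],
}

# Map primitive names to struct-module format codes.
PRIMITIVES = {"int32": "i", "uint8": "B", "float32": "f", "float64": "d"}


def format_string(type_desc: dict) -> str:
    codes = "".join(f"{f.get('count', 1)}{PRIMITIVES[f['type']]}" for f in type_desc["fields"])
    return "<" + codes  # little-endian, no padding


def embed(type_desc: dict) -> bytes:
    """Serialize the descriptor so it can travel inside an executable or file."""
    return json.dumps(type_desc, sort_keys=True).encode()


def decode(type_desc: dict, raw: bytes) -> dict:
    """Turn raw bytes into named fields using only the descriptor."""
    values = list(struct.unpack(format_string(type_desc), raw))
    result = {}
    for f in type_desc["fields"]:
        count = f.get("count", 1)
        chunk, values = values[:count], values[count:]
        result[f["name"]] = chunk if "count" in f else chunk[0]
    return result
```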

I know this doesn’t seem important, but it is. Trust me. You will see why soon.

Universal File Format

But let’s go further.

Some of the biggest monopolies companies have come from their control over useful file formats. Microsoft basically controls everything you do with .docx files. Adobe does the same with the files produced by their Creative Cloud suite.

Let’s destroy those monopolies.

Most of the ideas in this section come from the post No Formats, No Format Wars. I have simply adapted them to ideas I already had.

Companies control file formats because they control the code that reads and writes those files.

In No Formats, No Format Wars, a story is told about how a Staff Sergeant in the US Air Force made many competing formats into a universal format by just wrapping them in a format that allowed them to embed the code necessary for reading and writing the format.

Hmm…embedding code seems familiar. Oh yeah, we just did it.

We can wrap every file in a universal format that includes routines for reading and writing the format. It also needs to include type(s) that describe the in-memory form of the file.

Yes, this is where the types are important. Well, one place where they are important.

If we want to go further, the wrapper format can also include optional routines for displaying the file and for doing other things.
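The wrapper can be pictured as a record that bundles the original bytes with everything needed to use them. In this sketch, `WrappedFile` is a hypothetical name and plain Python callables stand in for the embedded portable code described earlier; a real wrapper would carry that code as bytecode.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class WrappedFile:
    """Hypothetical universal wrapper: payload plus everything needed to use it."""
    type_desc: dict                  # describes the in-memory form of the contents
    payload: bytes                   # the original on-disk format, untouched
    read: Callable[[bytes], Any]     # on-disk bytes -> in-memory value
    write: Callable[[Any], bytes]    # in-memory value -> on-disk bytes
    display: Optional[Callable[[Any], str]] = None  # optional extra routine

    def load(self) -> Any:
        return self.read(self.payload)

    def save(self, value: Any) -> None:
        self.payload = self.write(value)
```

A spreadsheet program that has never seen the inner format can still call `load`, edit the value using `type_desc`, and `save` it back, which is what makes competing programs interchangeable.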

With these changes, any program should have the ability to work with any relevant files. All spreadsheet programs would be interchangeable; all word processors would have to compete on the things that matter, rather than relying on control of a format.

One more set of monopolies down.

Version Control for Binary Files

But we can take this even further.

Since we have a universal wrapper format that understands types, even in binary files, could we implement a version control system (VCS) that can track changes to binary files as well as it can for text files?

Absolutely. Well, it can for files wrapped in the universal format.

All we need to do is add optional routines for displaying diffs. Since the wrapper knows the in-memory form of the file, it can detect differences like “this 4-byte integer was changed” or “items 1 and 5 in this array of floats were changed,” even when the on-disk form differs from the in-memory form, because the wrapper also has routines for reading the file.
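Here is a minimal sketch of a type-aware diff. It assumes a hypothetical descriptor (field name, `struct` format code, element count) for a record holding one 4-byte integer and an array of six floats, and it compares decoded values rather than raw bytes.

```python
import struct

# Hypothetical type descriptor: (field name, struct code, element count).
FIELDS = [("id", "i", 1), ("weights", "f", 6)]
FORMAT = "<" + "".join(f"{count}{code}" for _, code, count in FIELDS)


def split_fields(raw: bytes) -> dict[str, tuple]:
    """Decode raw bytes and group the values by field."""
    values = struct.unpack(FORMAT, raw)
    out, pos = {}, 0
    for name, _, count in FIELDS:
        out[name] = values[pos:pos + count]
        pos += count
    return out


def typed_diff(old: bytes, new: bytes) -> list[str]:
    """Report changes in terms of the type, not in terms of bytes."""
    before, after = split_fields(old), split_fields(new)
    report = []
    for name, code, count in FIELDS:
        changed = [i for i in range(count) if before[name][i] != after[name][i]]
        if not changed:
            continue
        if count == 1:
            report.append(f"{name} ({struct.calcsize('<' + code)}-byte {code!r}) changed")
        else:
            report.append(f"items {changed} of {name} changed")
    return report
```

Changing the second and last floats yields `["items [1, 5] of weights changed"]`, which is the kind of message a byte-level diff cannot produce.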

Requirements

This is the idea that I would most love to implement as a proof-of-concept for Yao, so I am attached to it a little more. As such, I want to give some more details of what I would want in this VCS.

First, it should use patch theory from Darcs and Pijul. In fact, it should probably use Pijul’s if it has eliminated the exponential merge problem. (See more about patch theory in Pijul here.) The biggest reason to use patch theory is that it is the most compatible with tracking changes in binary types.

Second, it should use a more reliable form of storage than just a pile of files.

Third, it should remove the ability to rewrite history. I like the idea of version control being reliable enough to use in court, and I like to know what actually happened rather than what should have happened.

Yes, I admit that I used git’s history rewrite to wipe irrelevant code and committers from one of my repos. My opinion has changed since then.

Fourth, it should include something like git’s index. I don’t see this as incompatible with the requirement above since the changes are not committed yet. Sometimes, I end up working on several pieces at the same time, and I would still like to differentiate between them in commits.

Fifth, it should have something like git’s stash, also because I end up working on several pieces at a time.

Sixth, it should include a built-in website, wiki, bug tracker, documentation and forum. The reason for this is that bug tickets, documentation, and other such things are as much a part of a software project as the code. When I moved from GitHub to a self-hosted Gitea instance, the decision was hard because of the bug and pull request history that I ended up losing.

In the end, I moved because I hate monopolies so much that I was willing to make a sacrifice to hurt a monopoly.

Bonus points will go to any implementation of this idea that uses IDSYS so that people don’t have to create an account for every individual software project they participate in.

Double bonus points will go to any implementation of this idea that allows themes for the website, and quadruple bonus points to any implementation that does that and allows the look and form of the website (beyond just colors) to be controlled by the site admin.

Conclusion

I hope that, despite the speed and carelessness with which this post was written, you may have some idea of how we can remove the monopolies from the Internet and substitute the fair competition of capitalism. Once that happens, I believe the Internet will be more free (as in freedom) while reducing or eliminating surveillance capitalism.

Doing one of those two things would be worth it. Doing both at the same time? Priceless.

Oh, and the other ideas seem pretty neat, too.
