What's the solution to cloud blackouts and censorship? It's all in the CAN!
My good friend @thecryptofiend made a very interesting post today that I thought I would highlight and bring to everyone's attention so I answered it and resteemed the post.
If you read my comments there you'll notice that I purposefully avoided mentioning any sort of VIVA based answer, because I wanted to highlight what's here now.
But I thought I would take this opportunity to share with you the VIVA based answer, drawn from our own whitepaper, because this is coming in the next few weeks and it would have prevented the problems inherent in centralized content distribution networks.
But what actually is the problem here?
AWS has an outage, so what? How does that affect you?
Well in this particular instance, AWS S3 was being used by a large chunk of the internet as a file store.
When it went down, so did half the internet, because so many sites were relying on it to hold their files.
Basically, people were taking for granted that Amazon Web Services are offered solely on a best effort basis, and because of that they had no fallback plan when AWS started returning errors. In fact, had AWS just disappeared for whatever reason, there's a very good chance that much of the internet simply could not have recovered.
There's a very good chance that if you have a file backup solution running, that you're backing up to the same S3 service that just went out. You might want to check your data integrity, because there are no guarantees data wasn't lost or corrupted during this event.
So what happens if your family photos disappear forever?
The problem here isn't Amazon; a dozen competitors can and do pop up, and it still doesn't really solve the problem.
So what is the problem?
It's centralization. Too much content is consolidating in too few hands.
The reason for this is simple. Amazon and these other CDNs have vast amounts of resources sitting idle. These are resources that they can sell cheap and still make amazing amounts of profit on. This race to the bottom has caused massive consolidation around Amazon, Akamai, Cloudflare and a handful of others. You sitting at home on your computer could never even begin to compete.
It also isn't any good if your backup solution is cloud based, because you're at the mercy of a company. A company with shareholders and a company that must turn a profit and must comply with the laws of whatever draconian regime happens to be in place in whatever countries they decide to operate in.
Even if you discount the unlikely event that Amazon went completely out of business, all these services are still massively centralized. If you have something that may be offensive or god forbid copyrighted, they can censor you instantly and you'll lose your content, period.
But what about other solutions?
If you take a good look around, you'll find plenty of potential contenders that believe that they can solve this problem.
StorJ, IPFS, zeronet all have very good ideas but they suffer from some serious flaws as well.
StorJ - Requires a monthly contract to be paid or your data disappears. Not only is it no longer accessible to you, it's just deleted from the network.
IPFS - Speaks a protocol that is a mix of BitTorrent and Git; it does not truly use the world wide web, and you need a special client app running in the background to interact with it.
ZeroNet is like IPFS, but focuses on sites and adds a nice layer on top that handles domain name resolution within the .bit namespace handled by Namecoin. The big problem with ZeroNet, though, is that it's built solidly on Namecoin: you need to register a custom .bit domain through Namecoin, and .bit isn't a valid Top Level Domain (TLD). Again, you need to have a custom application installed to use it at all.
Maidsafe? Who knows? It changes every few months.
This list gets really long, and they're all really nice tries. But the thing is, they all miss the most important aspect of solving the long-term problems inherent in data storage.
If you want to defeat cloud based webservices, you need to be able to interface with the world wide web like they do.
That isn't possible if people need to download, install & configure some custom app and ensure it's running in the background all the time.
There just isn't a good way to distribute content if it's not coming straight off a valid top level domain.
So what does the best solution look like here?
How about a globally distributed peer-to-peer content caching network?
How would something like that work without installing a custom app?
With a simple browser extension of course!
Isn't this the same as installing an app?
No, because your website is still hosted on the world wide web at any address you want.
Users don't have to install anything to access it, but you can incentivize them to use the plugin because it makes money for them and for you.
With the VIVA Content Addressable Network (VIVA CAN) plugin, every visitor to the site can seamlessly share their local cache of your content.
As the page loads, a hook fires in the browser. The URL of each resource (images, scripts, etc.) is hashed, and the VIVA network is queried for a "live version" of the resource.
If the resource is not found, then the plugin downloads the content from the website, hashes the URL and the content, and uploads them to the VIVA network.
How is this possible if I'm not running a server app on my machine?
This is the point where everyone else falls down, but the answer is so simple, you're going to be banging your head on your desk.
We use a websocket connection for peer discovery, but we serve all content, peer to peer directly over WebRTC.
An example of just how easy that is to accomplish is given here...
https://www.html5rocks.com/en/tutorials/webrtc/datachannels/
Ok so that's a good gloss, but what about some more details?
When the plugin first fires up, it connects via a websocket connection to mint(s) chosen by the end user.
It then broadcasts a message that announces "Hey I'm here!" along with a list of data_hashes it has on hand and a list of data_hashes it's still looking for if any.
{
  message_type: 'HELLO',
  data_discovered: [array of hashes],
  data_seeking: [array of DRCs]
}
This message is broadcast over the websocket to all connected clients, and then also relayed peer to peer via WebRTC.
Each client gets that message and keeps an index of who has which hashes, and also which clients relayed the HELLO message to it most quickly.
This allows every client to find nearby nodes that contain content they're looking for.
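A minimal sketch of that per-client index, under my own assumptions about its shape (the whitepaper doesn't specify a data structure; only the `data_discovered` field name comes from the message above):

```javascript
// Tracks which peers claim to hold which data hashes, and how quickly each
// peer has relayed messages to us, so lookups can prefer nearby nodes.
class PeerIndex {
  constructor() {
    this.holders = new Map(); // data hash -> Set of peer ids
    this.latency = new Map(); // peer id -> fastest relay time seen (ms)
  }
  onHello(peerId, msg, relayMs) {
    for (const hash of msg.data_discovered) {
      if (!this.holders.has(hash)) this.holders.set(hash, new Set());
      this.holders.get(hash).add(peerId);
    }
    const best = this.latency.get(peerId);
    if (best === undefined || relayMs < best) this.latency.set(peerId, relayMs);
  }
  // Peers that claim to hold `hash`, fastest first.
  nearestHolders(hash) {
    return [...(this.holders.get(hash) ?? [])].sort(
      (a, b) => (this.latency.get(a) ?? Infinity) - (this.latency.get(b) ?? Infinity)
    );
  }
}
```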
Doesn't this reveal the URLs of the sites I'm visiting?
No, it doesn't.
VIVA is a content addressable network where the hash of each URL serves as an additional index alongside the data hash.
What is to prevent someone from faking the data, i.e. hashing some random data but attaching it to a legitimate URL?
When new content is discovered, the upload process does NOT actually upload the data.
Instead the URL is hashed, the content is downloaded from its original source, and a hash of the content is taken.
This goes into a "content claim manifest" (CCM), which looks like this...
{
  index_type: 'CCM',
  urlhash: SHA256 of URL,
  datahash: SHA256 of DATA,
  discovered: timestamp (now),
  expires: content_expiry date,
  discoveredby: VIVA account name,
  signature: signature of the VIVA account holder
}
This information is all that's known. The actual URL is NEVER stored.
A CCM is important information, but it's low value and useless for faking or tracking.
So why do it this way?
It's a way of announcing to the network that we have discovered content available at a particular location in the graph.
It enables rapid indexing. The URL hash isn't anything more than an additional attribute in the graph search.
Other nodes in the network can then request the raw data and perform their own hash on the data to validate that the content is a match.
In the meantime, other nodes can check their own cache of the URL hash (if they have it already) and see if the hash is a match. If it's not, they can re-request the resource from its original source and validate whether the new hash supersedes the old one, i.e. whether the old content is still valid.
If it is found that the new datahash supersedes the old datahash, then the validating node gives what amounts to an upvote on the content, thereby lending its weight to the urlhash in the global index. If it's found that a node is misbehaving and uploading junk, other nodes can silence it by downvoting it in the index. Thus when a node requests a given URL hash, the results returned are ordered by the combined weight that other nodes which have seen that URL have lent to the representative data hash. We call this a popularity index.
At this point the client should have multiple copies of the data sitting in its cache, and it can lend its own weight by upvoting the correct one.
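The upvote/downvote weighting can be sketched as a small index; a sketch only, since the real weighting function (stake, reputation, decay) isn't described in the post:

```javascript
// Each (urlhash, datahash) pair accumulates signed weight from validating
// nodes; lookups return the data hashes for a URL hash, most popular first.
class PopularityIndex {
  constructor() {
    this.weights = new Map(); // "urlhash:datahash" -> accumulated weight
  }
  vote(urlhash, datahash, weight) { // negative weight downvotes junk
    const key = `${urlhash}:${datahash}`;
    this.weights.set(key, (this.weights.get(key) ?? 0) + weight);
  }
  lookup(urlhash) {
    return [...this.weights]
      .filter(([key]) => key.startsWith(`${urlhash}:`))
      .sort((a, b) => b[1] - a[1])
      .map(([key]) => key.split(':')[1]);
  }
}
```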
Now as for requesting content, that comes through a "data request contract" (DRC), which looks like this...
{
  index_type: 'DRC',
  request_hash: ANY,
  contracts: [{
    contract data
  }]
}
A contract is an offer for payment, and you'll notice that there is no mention of who's paying for it.
In VIVA all smart contracts are JSON objects and anonymous by default.
What they do contain is a signature field.
Public keys are registered with mints. Each public key has a spending limit provisioned by the owner of the key, but this is not public information.
When a node sees a DRC it is interested in claiming, all it needs to do is complete the contract.
The default contract is called DATA_HASH_OF, and it's fulfilled by stapling on data that hashes to the request_hash and submitting it to their mint.
The mint first validates that the datahash is a match. If it is, then it checks its public key storage for a public key that matches the signature on any of the contracts. If it finds a match, then it debits that public key and credits the account of the fulfiller node.
If it doesn't find a signature match, then it blinds the DRC (deleting the content), places its own stamp on the contract (certifying it has the matching data) and begins forwarding it to other mints for their signatures.
Once a contract has been settled, then each requester with a valid signature is sent the raw data, those nodes can begin to relay the information back upstream.
In most cases this is going to be a 1:1 mapping of data to signatures. However, content that has been requested but not found can circulate for quite a while, gaining additional contracts with additional signatures and thus becoming more valuable. This can happen if it's been a long time since anyone on the network has seen data matching that hash.
So what does a contract for content look like?
{
  contract_type: "C4C",
  contract_operands: ['DATA_HASH_OF'],
  data_hash: hash of requested chunk,
  reference_hash: parent DRC,
  expires: some date in the future,
  amount: some quantity of VIVA, defaults to 0.01 VIVA,
  signature: signature data
}
So in this regard a Data Request Contract is really an array of contracts, and the longer it circulates, the more valuable it becomes.
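That accumulation of value can be sketched directly: the payout available to the first node with matching data is the sum of the unexpired contracts attached to the request. The helper names are illustrative, and amounts are shown in hypothetical whole units rather than fractional VIVA:

```javascript
// Append a C4C contract to a circulating DRC (immutably, for clarity).
function appendContract(drc, contract) {
  return { ...drc, contracts: [...drc.contracts, contract] };
}

// Total claimable value of a DRC: the sum of its unexpired contracts.
function drcValue(drc, now = Date.now()) {
  return drc.contracts
    .filter((c) => c.expires > now)
    .reduce((sum, c) => sum + c.amount, 0);
}
```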
This is all well and good, but aren't we just moving the centralization to mints?
No, we aren't.
The data is still served by peers. Any node has the option of immediately responding with the correct data and claiming the contract at their leisure.
By being the first with the correct data, they have a legitimate claim to the funds of any contracts that are appended to the request, they've broadcast it to the whole network, and other nodes are signing off that they've seen it.
So they can freely submit the data directly upstream to their nearest nodes at the same time they are submitting the claim to their mint.
If a contract is claimed, it is ultimately the responsibility of the mint that executes the contract, to ensure that the data begins circulating correctly.
The mint will do this within 30 seconds, regardless of whether the contract claim has fully executed, i.e. all signatures found. This is optimistic execution, and it negates any reason the peer node would have for failing to submit the data.
Mints do not get paid for serving the data, but they must keep the data actively circulating for a minimum of 24hrs.
The final question then becomes: if everyone is connected to a mint, don't we have a traditional client / server architecture? What happens if the mint goes offline?
The client / server architecture applies only to payment processing and initial peer discovery. The data exchange aspects are 100% peer to peer; the mint is merely a data source of last resort, if for instance the client is firewalled.
As mentioned before, all mints are required to keep content for a minimum of 24hrs.
However special "data storage nodes" exist to specifically archive content, potentially forever.
These nodes are registered with the mint and may or may not be owned by the mint as an extra revenue source.
These are a data source of last resort, because they will wait until the last second of the contract to supply information, or until the content age and content size versus fee has moved to a point where it becomes worthwhile to respond. In the meantime, their primary purpose is to rapidly slurp up as much data as they can.
How do we know if data is really fresh?
If you right click on any website and do a view source, you're going to notice "meta" tags.
These meta tags are placed on the site by the owners of the site in order to among other things, tell search engines how long to cache the content.
Some examples are below...
<meta http-equiv="cache-control" content="max-age=0">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT">
<meta http-equiv="pragma" content="no-cache">
So we examine the meta tags and compare them to the cache age of the content received from peers. If the content has expired, then we fill out a new content claim manifest.
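A simplified freshness check covering the example tags above might look like this. It is deliberately minimal and limited to those directives; a real implementation would honor the full HTTP caching rules:

```javascript
// Decide whether a peer-served copy fetched at `fetchedAt` (ms epoch) is
// still fresh according to the page's cache meta tags.
function isFresh(metaTags, fetchedAt, now = Date.now()) {
  for (const tag of metaTags) {
    const name = tag['http-equiv'].toLowerCase();
    const value = tag.content.trim().toLowerCase();
    if (name === 'pragma' && value === 'no-cache') return false;
    if (name === 'cache-control') {
      if (value.includes('no-cache')) return false;
      const m = value.match(/max-age=(\d+)/);
      if (m) return now - fetchedAt < Number(m[1]) * 1000;
    }
    if (name === 'expires') {
      if (value === '0') return false; // "0" means already expired
      const t = Date.parse(tag.content);
      return !Number.isNaN(t) && now < t;
    }
  }
  return true; // no caching directives: treat the peer copy as fresh
}
```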
Why do we even have content claim manifests?
CCMs are stored in the blockchain and kept indefinitely, or until expiration. Mint owners set a fee split with CCM creators when they are trying to attract content hosts, and when data that was subject to a CCM is requested and fulfilled, the initial finder of the data gets their cut, regardless of whether they were the one that served the data on that request or not. In this way, we reward content discovery, but only at the time it is being requested. It also helps to avoid over-duplication.
We want several copies of every piece of data circulating at all times, but it needs to strike a fine balance.
Something like Bootstrap would be in the user's local cache, but if the CDN serving it were offline, the user could request it for nearly free.
Whereas someone's YouTube upload of their child's birthday party isn't likely to be requested very often. If, for example, YouTube took it down because the "Happy Birthday" song is copyrighted, the network would still have a cached copy of the video, and while it might cost a dollar or two to revive it, at least it would be possible to do so, which is something that isn't presently possible.
What if two people submit identical CCMs but with different datahashes?
The URL hash is merely an index; all data stored in the VIVA CAN is returned for each URL hash, automatically sorted by popularity.
Thus if there is a collision (which can happen because data does change), the network returns the most popular result first.
It is up to the mint to correctly maintain the popularity index, invalidating entries at expiration, and a mint that doesn't do this properly is going to find itself quickly out of business.
Does this work for only web accessible content? What about my cat pictures?
You have a private space within VIVA where you can upload whatever you want and you can mark it public, private or paid.
If it's marked private it's AES encrypted and only you have the key, but the key can be regenerated because it's deterministic.
If it's marked public then the entire world can see it.
If it's marked "paid", then it's still encrypted, but it's encrypted with a mint key and a shared key.
The mint will only release the key upon execution of a contract for the key, which generally means someone paid the mint according to terms you set.
In this case, the key is sent ONLY to the direct requester.
For any of this content you can set a VIVA link, which looks like any other URL:
viva://username@mint/filename or filehash
or simply
viva://hash
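Resolving the two link forms above could be done with a small parser. This is purely illustrative; the real addressing scheme may differ in detail:

```javascript
// Split a viva:// link into its parts: either a bare content hash, or a
// username @ mint plus a filename (or filehash) path.
function parseVivaLink(link) {
  const body = link.replace(/^viva:\/\//, '');
  const at = body.indexOf('@');
  if (at === -1) return { kind: 'hash', hash: body };         // viva://hash
  const user = body.slice(0, at);
  const [mint, ...path] = body.slice(at + 1).split('/');
  return { kind: 'named', user, mint, file: path.join('/') }; // viva://user@mint/file
}
```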
Again the neat thing about all of this is there's nothing external to download and run.
You just install the upcoming VIVA plugin in your browser, and you're good to go!
I hope this has gotten you excited about some of the features upcoming in VIVA.
Interested in learning more about VIVA?
Start with these links...
@williambanks/introduction-to-viva-a-price-stable-crypto-currency-with-basic-income-that-s-not-hypothetical
@williambanks/introduction-to-viva-part-2-more-than-meets-the-eye
@williambanks/introduction-to-viva-part-3-how-does-it-work
@williambanks/introduction-to-viva-part-4-how-do-you-bootstrap-a-new-economy
This post is 100% steem powered!