Security is obviously one of the most important aspects of Avatar design. We’ve tried to follow best practices and minimize as many different threats as we can come up with, but it is our sincere hope that this section is closely inspected by the crypto community and questions are asked.
We have tried to keep everything related to security as simple as possible for two reasons. The first reason is that we don't want to create complex protocols prone to either implementation or logical errors. Sticking with the best practices and fundamentals will provide a solution for almost all security challenges. The second reason is that we want everything to be as easy as possible for others to understand and review properly. This way possible errors in thinking are found fast and fixed even faster.
If you are reading this chapter, you are probably familiar with Alice and Bob. In case you aren't, all you need to know is that they are two friends using Avatar.
Design goals and threat models
The design goals for the Avatar OS are summarized in Avatar Principles. The overarching goal is to connect and enable people to interact over the Internet securely and privately.
The threat model for Avatar OS is an adversary that can seize and access the device but is not able to constantly monitor the user using the device. There's very little that we can do against an adversary who is running a keylogger or spying on the screen in real-time.
The threat model for Avatar Network is based on an adversary not being able to control all nodes in the routing path. We also assume the adversary isn't able to intercept 100% of the packets. Due to the internet's global nature it's unlikely that an adversary has the capability to intercept 100% of traffic all over the world. The decentralization fundamentals provide the mental framework about how Avatar OS was designed to work with Avatar Network.
Encryption schemes used
All Objects are encrypted with AES in CTR mode using 256bit keys with Encrypt-then-MAC composition. Avatar uses the Forge library as a generic crypto library.
For messaging and identification/signing purposes we use elliptic curve DSA (ECDSA) with secp256k1, a Koblitz curve. The first iteration of the design used RSA with 2048-bit keys but tests showed that performance is noticeably poor on today's mobile devices. Please note that the suspected NSA backdoor concerns curves generated from unexplained seed values, such as secp256r1, and not secp256k1. The library for ECC is JSBN extended with ECDSA_JSBN for ECDSA and secp256k1. We are aware of theoretical weaknesses in secp256k1 and are evaluating Curve25519 as a replacement.
Avatar OS code delivery protocol
Secure code delivery within the browser is a challenging problem because, due to the nature of the web, the browser was designed to treat all code as potentially malicious and to survive running it without any external authority saying what code is good and what code is bad. This is of course fantastic from the freedom point of view but it means browsers don't have any method to verify the validity of the code they attempt to execute. There have been a few proposals in recently published whitepapers that would provide this functionality but at the moment there's no mechanism that enables a browser to ask "is this code 100% identical to what foo.com/code.js is supposed to be".
Traditionally websites use SSL (TLS) connection to secure the connection and believe it's enough protection. However even if the connection between web server and visitor is secured, a properly motivated adversary can always compromise the server itself, replacing the necessary files with malicious ones. For this reason we believe it's not optimal to serve Avatar OS code in avatar.ai without an external verification mechanism making sure the files on the server have not been tampered with. The current solution is still not optimal because files are stored at avatar.ai. The optimal solution would be to use Avatar Network to store Avatar OS code but due to various challenges it's not possible yet.
Our code delivery protocol is based on storing a proof of validity in the Namecoin blockchain and doing near real-time checks for validity of the live code at avatar.ai.
Our thinking behind this is that an adversary would have to either get enough computing power to replace the hash in the Namecoin blockchain, or create a very expensive attack separating the requests that validate the content from the requests of normal users. We know some governments have the capabilities to divert traffic before it even touches the target server but the current assumption is that traffic diverting is possible only on a very small scale and can't be done on the scale required here.
In the Avatar OS release packaging phase, Avatar OS code will be base64 encoded and hashed with sha256. The hash is then stored in the Namecoin blockchain. Because Avatar is packaged as one HTML file for portability, we can verify it just by requesting it over the Internet, computing the hash from the received file and comparing it with the hash in the Namecoin blockchain.
The checks are done in 5-15 second intervals via Tor network. Tor helps us to protect the location of servers doing the checks thus making the attack surface smaller. It also makes it harder for anybody monitoring the traffic to know which requests are doing the validation. Checker servers access the Namecoin blockchain directly to get the hash and make the comparison with the live code. If the live code verification fails, the checker server will alert the admins and attempt to replace the compromised live code with a verified one.
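The comparison a checker server performs can be sketched as follows. This is a minimal sketch, not the actual checker: the function names are hypothetical, the hash is assumed to be stored as a lowercase hex string, and fetching the live file over Tor and reading the hash from the Namecoin blockchain are left out.

```python
import base64
import hashlib
import hmac


def release_hash(html_bytes: bytes) -> str:
    """Base64-encode the packaged HTML file and return its sha256 hex digest."""
    encoded = base64.b64encode(html_bytes)
    return hashlib.sha256(encoded).hexdigest()


def live_code_is_valid(live_html: bytes, published_hash: str) -> bool:
    """Compare the hash of the fetched file with the hash read from the blockchain.

    Assumption: published_hash is the hex digest produced by release_hash()
    at packaging time.
    """
    return hmac.compare_digest(release_hash(live_html), published_hash.lower())
```

If the comparison fails, the checker would alert the admins and attempt to restore a verified copy, as described above.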
Everything is an encrypted Object
All communication between Avatars happens with encrypted JSON objects. All messages, files and basically everything you produce with Avatar OS are Objects. These Objects are stored either locally or in the Network. To access any Object in Avatar, even the locally stored, you need to know the unique id (UID), the access key (AKEY) and the key for encryption (EKEY). UID, AKEY and EKEY are explained in more detail further on in the Object address protocol chapter.
The main design goal with Objects is that if somebody is able to obtain an Object, there should be no way for them, without a correct EKEY, to say which type of Object it is, what data it has, who created it and who has accessed it. By requiring both UID and AKEY to fetch an Object we make it much harder to bulk request random Objects by generating random 64-character strings. The secondary goal was to provide more security when Alice uses a shared device. This is achieved by storing meta data inside Alice's Avatar instead of the Objects themselves. Even if Alice doesn't securely log out and leaves her Objects on the shared device they can't be opened, or analyzed, because of the missing EKEY that only Alice's Avatar knows. This helps with the inadequate access controls of browser technologies like IndexedDB.
Anatomy of Avatar
We have many different Avatar parts in the system but when we are saying "Avatar", we are talking about a user's Avatar Object which contains all the important data for that user. Each Avatar is composed of an Object Registry, Personal Information Registry and Contact List. Everything is stored in one JSON object which is treated just like any other Object in the system. This makes it very hard for an adversary to separate high-value targets like Avatar Objects from any other Objects in the Network.
When a new Avatar is created it will generate a unique Avatar ID (AID), a public address (APA) and – for cryptographic signing purposes – a private key (APRK) and a public key (APK).
Object Registry
Object Registry is the place where all the knowledge of all Objects which Avatar knows about is kept. Essentially it's one long list of Object UIDs and each record contains enough information to reach the Object. Object Registry will be explained in detail later in this document.
Personal Information Registry
PIR is a place where Avatar stores all data about the user. The user can store any arbitrary data and if any external entity wants to request information from PIR, Avatar will request permission from the user. Nothing in PIR is available to any external entity without the user's explicit permission.
The idea behind PIR is to provide a way for users to store important data about themselves so that it's protected but at the same time they can give permission for others to access it.
Contact List
Contact List has all the information related to relationships with different Avatars. Each Contact shares its unique address (shared address) with a user's Avatar enabling one-to-one communication. Messaging and friending protocols are explained later in the document.
Authentication protocol
Usernames and passwords are never broadcast or sent anywhere. When Avatar OS has been executed successfully it expects the user to provide a username and password to authenticate. The user's Avatar may or may not exist on the device. If UID and AKEY don’t return anything from the local cache, Avatar OS will ask Avatar Network to deliver the correct Object.
The user's Avatar consists of two separate objects: Buffer Object and Avatar Object. Like everything else in Avatar, Objects are encrypted JSON objects.
Buffer Object’s sole purpose is to give the user the ability to change password without encrypting Avatar Object again and the ability to see a password reminder in case they have forgotten their password. Because Avatar is a decentralized system, there is no one who can reset the user’s password. If the password is forgotten, there is no way to retrieve the data from the user's Avatar. Another reason for this design was to make it more expensive to steal objects from Avatar Network and try to find Avatar Objects by bruteforcing weak passwords.
When the user starts the authentication process, the only information available is the username and the password. Because the whole point of Buffer Object is to give the user a chance to recover the forgotten password from their memory, we can’t use the user's password to derive UID and AKEY for Buffer Object.
Buffer Object UID and AKEY are derived by computing a 128-character hash from the username with scrypt and then splitting the hash into two 64-character long strings. The salt is the username reversed, N is 1024, r is 8 and p is 1. We realize that N isn’t optimal but we had to decrease it due to serious performance issues on mobile devices. The N value will be customizable at some point. Avatar uses the sjcl-scrypt library for scrypt, however this will change before the first version to another implementation without SJCL.
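The Buffer Object address derivation can be illustrated with Python's standard library scrypt. This is only a sketch under assumptions: the function name is hypothetical, the username is assumed to be UTF-8 encoded, and the 128-character hash is assumed to be the hex encoding of a 64-byte scrypt output with the first half used as UID.

```python
import hashlib
from typing import Tuple


def buffer_object_address(username: str) -> Tuple[str, str]:
    """Derive (UID, AKEY) for the Buffer Object from the username alone."""
    name = username.encode("utf-8")
    salt = username[::-1].encode("utf-8")  # the salt is the username reversed
    # N=1024, r=8, p=1 as stated in the text; 64 bytes give 128 hex characters.
    digest = hashlib.scrypt(name, salt=salt, n=1024, r=8, p=1, dklen=64)
    hex_hash = digest.hex()
    # Assumption: the first 64 characters are the UID, the last 64 the AKEY.
    return hex_hash[:64], hex_hash[64:]
```

Because the derivation only depends on the username, any device can compute the same address without the password, which is what makes the password reminder reachable.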
The Buffer Object consists of two layers. The first layer is encrypted by using username as EKEY. The first layer contains the reminder question as a plain text and the second layer as an encrypted string. The first layer isn't expected to be secure. The second layer is encrypted with the user's password and contains UID, AKEY and EKEY for the user’s Avatar Object. If the user decides to change the password, only the second layer needs to be changed and encrypted again.
Once Avatar Object has been successfully fetched from the Network and decrypted with EKEY from the Buffer Object it can be imported into Avatar OS. Avatar Object contains all user information from personal information to references to stored data. Essentially it's the heart of Avatar OS and what makes it yours.
Object address protocol
Storing data in Avatar Network is a challenge because our assumptions dictate that we can't trust any nodes to behave properly. This makes it difficult to create a system that offers the features users are expecting for any messaging or online storage platform. The first versions of Avatar will focus on providing message storing via Avatar Network and the DHT item size will be limited at the protocol level. Once we have more production data to verify everything works as designed, this limitation will be removed.
Avatar Network uses a modified R5N DHT (whitepaper) as a distributed storage layer on top of Avatar Network. Avatar Network Protocol also utilizes R5N DHT for its internal purposes.
In DHT terminology UID would be the equivalent of a key for a data item. DHT has been modified so that to fetch a data item you also need to send a password which is AKEY. If AKEY is missing or wrong, DHT will drop the request. UID and AKEY are both 64-character strings.
Object address protocol for non-chainable Objects
Non-chainable Objects are individual Objects that exist without a previous or next Object. For example individual chunks from a bigger Object are non-chainable Objects because they are meant to exist without any means to connect them to other chunks.
For non-chainable Objects both UID and AKEY are randomly generated by using Fortuna PRNG and the current timestamp.
Object address protocol for chainable Objects
Some Manifest Objects need to have chainability capabilities to provide a chronological ordering where needed, for example in discussion threads and status updates. We can solve this by creating a UID/AKEY scheme that provides a chronologically ordered list of Objects. Chainable Objects use a deterministic scheme to create UID and AKEY.
Chainable Object's UID is derived from the following deterministic scheme: sha256(seed + (n+1)). The seed can be anything but most likely it will be a shared address which will be covered later on. When Avatar wants to add a new Object to the chain, it will simply keep adding to n until it finds a free slot. AKEY is derived by taking the first 32 characters of the seed: sha256(32 characters + (n+1)). This isn't optimal but makes bulk fishing the Network for Objects very expensive and difficult.
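A minimal sketch of the chainable scheme follows. The text does not specify how the seed and the counter are combined, so this sketch assumes plain string concatenation with the counter written in decimal and hex digests as output. The `is_taken` callback is a hypothetical stand-in for asking the Network whether a UID is already in use.

```python
import hashlib
from typing import Callable


def chain_uid(seed: str, n: int) -> str:
    """UID = sha256(seed + (n+1)), assuming decimal string concatenation."""
    return hashlib.sha256(f"{seed}{n + 1}".encode("utf-8")).hexdigest()


def chain_akey(seed: str, n: int) -> str:
    """AKEY = sha256(first 32 characters of the seed + (n+1))."""
    return hashlib.sha256(f"{seed[:32]}{n + 1}".encode("utf-8")).hexdigest()


def next_free_slot(seed: str, is_taken: Callable[[str], bool], start: int = 0) -> int:
    """Keep increasing n until the derived UID is not yet used in the Network."""
    n = start
    while is_taken(chain_uid(seed, n)):
        n += 1
    return n
```

Anyone who knows the seed can regenerate the whole sequence of addresses, which is what gives the chain its chronological order.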
Object address protocol for multi-level chainable Objects
Multi-level chainable Object means that there are two or more seed values and an iteration number. At the moment these are only used in messaging protocols to provide conversation-like mechanics.
For multi-level chainable Objects the UID is derived from the following deterministic scheme: sha256(seed1 + seed2 + (n+1)). Similar to chainable Objects, AKEY is derived by taking the first 32 characters of the first seed: sha256(32 characters + seed2 + (n+1)).
Because Avatar Network can also expire items for storage efficiency purposes, expired keys are not deleted but instead show up as empty. When Avatar follows the chain it will keep trying until it finds a key that doesn't exist. For efficiency purposes we can also utilize other Avatars (with permission) for knowledge about Objects and their chain lengths.
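Following a multi-level chain could look roughly like the sketch below. It assumes string concatenation and hex digests for the sha256 scheme, and a hypothetical `fetch(uid, akey)` callback that returns `None` when a key does not exist and an empty value for an expired item. None of these names come from Avatar itself.

```python
import hashlib
from typing import Callable, List, Optional


def multi_uid(seed1: str, seed2: str, n: int) -> str:
    """UID = sha256(seed1 + seed2 + (n+1))."""
    return hashlib.sha256(f"{seed1}{seed2}{n + 1}".encode("utf-8")).hexdigest()


def multi_akey(seed1: str, seed2: str, n: int) -> str:
    """AKEY = sha256(first 32 characters of seed1 + seed2 + (n+1))."""
    return hashlib.sha256(f"{seed1[:32]}{seed2}{n + 1}".encode("utf-8")).hexdigest()


def follow_chain(
    seed1: str,
    seed2: str,
    fetch: Callable[[str, str], Optional[bytes]],
) -> List[bytes]:
    """Collect every live item in the chain, in order."""
    found: List[bytes] = []
    n = 0
    while True:
        item = fetch(multi_uid(seed1, seed2, n), multi_akey(seed1, seed2, n))
        if item is None:  # the key does not exist: end of the chain
            return found
        if item:  # expired slots come back empty and are skipped
            found.append(item)
        n += 1
```

Because expired slots still exist as empty keys, the walk does not stop early when an old item in the middle of the chain has expired.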
Data handling
Avatar OS wants to provide a truly easy way to store and share files online while keeping maximum security.
When the user imports a file, the OS automatically converts it to an Object. An Object can store any data, binary or ASCII. It has a meta data partition so the OS can determine whether it can open/edit it by itself or if it is data that requires an external application. We use the file's content type to determine this. By converting everything into Objects we hope to create a standardized way to share data over the Internet without having to worry about file formats. In the future Avatar Bridge will support basic transcoding tasks, which opens up a lot of opportunities. For example Avatars could sell their computing power for transcoding.
Object Registry
Each Avatar Object has an Object Registry which is a list of all Objects it knows about. Objects can be either available in the local cache or fetched from Avatar Network when required. Being able to choose what to keep locally is necessary to support devices with lower storage capabilities and to make it possible to quickly log in to Avatar on any device to check messages without downloading everything.
Object Registry Item
When a new Object appears, Object Registry creates an Object Registry Item for that specific Object. Object Registry Item only carries UID, AKEY and EKEY for an Object's Manifest Object. This is because we want to keep Object Registry as small as possible so Avatar Object's size remains relatively low even with a lot of files. If Avatar has been given permission secrets for an Object, those are also stored in Object Registry Item. Permission secrets will be covered later on.
Manifest Object and Chunks
Manifest Objects are basically small documents listing the contents of an Object. Their main purpose is to separate security details from Objects containing data and to provide a lightweight way to share Objects.
One of the main challenges for a voluntary-participation-driven network is how to balance network load, fairly distribute bandwidth usage, and guarantee availability of the data. These challenges are even harder when it comes to mobile clients. For these reasons, Avatar splits all files bigger than 32 kilobytes into chunks of varying sizes between 32KB and 2MB. The size of the chunk is mainly determined by the size of the file and randomized to a degree. Small chunk size will help availability challenges with unreliable infrastructure where individual Bridges are expected to drop out frequently. Random sizes also make it harder to analyze chunks to identify more high value Objects like Object Registries or Avatar Objects for example.
Each data chunk will be assigned a unique and random UID, AKEY and EKEY. All chunk sizes, UIDs, AKEYs and EKEYs are stored in Manifest Object in order. Each chunk will be hashed with sha256 and the hash will be stored in the Manifest. The first chunk is always the meta data partition of the Object so Avatar doesn't have to download every chunk to access the meta data. This also makes it possible to stream Objects.
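The chunking and Manifest layout described above can be sketched like this. It is an illustration under several assumptions: chunk sizes are drawn uniformly between 32KB and 2MB (the text says size also depends on the file size), the final chunk may be shorter, hashes are taken over unencrypted chunks, keys are random hex strings, and encryption is omitted. The field names in the Manifest dictionary are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, List, Tuple

MIN_CHUNK = 32 * 1024        # 32KB
MAX_CHUNK = 2 * 1024 * 1024  # 2MB


def split_into_chunks(data: bytes) -> List[bytes]:
    """Split data into randomly sized chunks; small files stay in one piece."""
    if len(data) <= MIN_CHUNK:
        return [data]
    chunks, pos = [], 0
    while pos < len(data):
        size = MIN_CHUNK + secrets.randbelow(MAX_CHUNK - MIN_CHUNK + 1)
        chunks.append(data[pos:pos + size])
        pos += size
    return chunks


def build_manifest(meta: bytes, data: bytes) -> Tuple[Dict[str, list], List[bytes]]:
    """Return (manifest, chunks); the first chunk is the meta data partition."""
    parts = [meta] + split_into_chunks(data)
    entries = []
    for part in parts:
        entries.append({
            "size": len(part),
            "uid": secrets.token_hex(32),   # 64-character random strings
            "akey": secrets.token_hex(32),
            "ekey": secrets.token_hex(32),
            "sha256": hashlib.sha256(part).hexdigest(),
        })
    return {"chunks": entries}, parts
```

Since the meta data partition is always the first entry, a reader only needs to fetch one chunk to learn what the Object is before deciding to download the rest.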
When you want to send data to another Avatar you send the UID, AKEY and EKEY to the Manifest Object instead of sending the Object.
Object permissions and secure Object modifying
Updating anything in a distributed environment is hard from a security point of view. The challenge can be distilled into a question: How do you guarantee that nobody other than the owner is able to replace data? If you rely on the standard mechanism of the owner sending new data and an update command, how do you make sure the receiving node doesn't use this information to issue another update command replacing the data with its own?
One common everyday work scenario is that Alice creates a document and she wants to give Bob editing rights. However the document will also be sent to Charlie but Charlie shouldn't be allowed editing rights.
With centralized architecture it would be as simple as restricting what Charlie can do at the server level, but we don't have that luxury with Avatar. In a decentralized architecture we can't restrict what Charlie does with the Object if he has received it, but we can control that he doesn't replace Alice's and Bob's version in the Network without permission from Alice or Bob.
Permission info is kept in a separate permissions object for each chunk inside the Manifest Object. The Permission object contains scopes for various actions like changing permissions, updating and deleting the Object from the Network. Each scope contains three items: shared secret (PSEC), public key hash (PHASH) and meta data (PMETA). Having to compute each chunk separately is not very efficient but we haven't found any other solutions that would provide a way for Alice to update her Objects in the Network without giving the malicious node a way to reuse Alice's update credentials and make malicious updates in her name.
PSEC is the first secret derived with Shamir's secret sharing algorithm from a 256-bit ECDSA private key. Please note that anybody who has access to the permission change scope has access to the variables used to generate shared secrets, and therefore can add secrets without anybody else's permission. This happens because if we want to be able to potentially add shared secrets (for new users) later on, we must lock in the parameters used to generate the secrets in the first place, in order to be able to keep the existing shared secrets while adding new ones. So allowing other users to change permissions should be decided with care.
By using Shamir's secret sharing we can also build m-of-n permissions. This means that you can create an Object that can be modified only if, for example, 3 out of 5 holders agree and combine their shared secrets to recover the private key. The default setting is 1 out of X.
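To make the m-of-n idea concrete, here is a small, self-contained sketch of Shamir's secret sharing over a prime field. It is not Avatar's implementation: the field prime (2^521 - 1) and the function names are choices made for this example, and the secret stands in for a 256-bit private key represented as an integer.

```python
import secrets
from typing import List, Tuple

PRIME = 2**521 - 1  # a Mersenne prime, larger than any 256-bit secret


def split_secret(secret: int, threshold: int, shares: int) -> List[Tuple[int, int]]:
    """Split secret into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def recover_secret(points: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat
        secret = (secret + yi * num * inv_den) % PRIME
    return secret
```

With a threshold of 1 the polynomial is constant, so every share carries the secret on its own, which matches the "1 out of X" default. Adding shares later requires the original polynomial coefficients, which is why the generation parameters have to be kept.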
PHASH is a sha256 hash computed from the public key counterpart to the private key in PSEC. Because Shor's algorithm would give quantum computers a very fast way to derive private keys from public keys, we want to avoid exposing any keys unless absolutely necessary. Instead of storing the public key in the chunk's permission scope, we only store a hash of it. This way the public key is only exposed if the Object needs to be updated. Quantum computing is not known to pose a comparable threat to cryptographic hash functions.
PMETA offers a way to store other arbitrary data related to the permission scope. For example, when generating new shared secrets Bob needs the original random number used for Shamir's secret sharing algorithm. PMETA is encrypted with AES using the scope's ECDSA private key as an encryption key.
Updating example
Alice has decided to give Bob permission to update the data portion of her Object. First she computes new shared secrets for the update permission scope for all the chunks, and sends them to Bob. Bob stores his shared secrets to the Object Registry Item corresponding to the Object. Bob makes some changes and wants to publish them. First he will chunk the Object with chunk sizes specified in the Manifest. After that he checks which chunks have changed due to his updates by hashing the old and the new and comparing the hashes.
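Bob's change detection can be sketched as below. The function names are hypothetical, and the sketch assumes the edited Object has the same total length as the original so that the chunk sizes from the Manifest still cover it, and that the Manifest hashes were computed over the same form of chunk data that is being compared.

```python
import hashlib
from typing import List


def split_by_sizes(data: bytes, sizes: List[int]) -> List[bytes]:
    """Re-chunk the Object using the chunk sizes recorded in the Manifest."""
    chunks, pos = [], 0
    for size in sizes:
        chunks.append(data[pos:pos + size])
        pos += size
    return chunks


def changed_chunks(old_hashes: List[str], new_chunks: List[bytes]) -> List[int]:
    """Return the indexes of chunks whose sha256 hash differs from the Manifest."""
    return [
        i
        for i, chunk in enumerate(new_chunks)
        if hashlib.sha256(chunk).hexdigest() != old_hashes[i]
    ]
```

Only the chunks listed by `changed_chunks` need to be re-encrypted, signed and sent to the Network.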
For each changed chunk he takes his shared secrets and derives ECDSA private keys from PSECs stored in the update permission scopes. Each private key will give him the public key for that specific chunk. For each chunk Bob will now encrypt the data with the chunk's EKEY, hash the encrypted data with sha256 and use a private key and hash to get a signature consisting of two values, checksum 1 (PCHK1) and checksum 2 (PCHK2).
Bob will now do an update call to the Network. With the call he sends new chunk data, UID/AKEY for access, a public key, PCHK1 and PCHK2. The network will route Bob's call to the nodes that are holding the chunk with that specific UID. The nodes will verify access by matching AKEY. If Bob's AKEY matches with the one on the chunk, the node will then compute sha256(public key). The hash of the public key will be matched against the requested permission scope's PHASH value. If the hashes match, Bob's public key is valid and the node now knows that Bob has the correct private key.
Now the node needs to make sure that Bob's new data hasn't been altered in transit and do one more check that Bob has the right to change the data. The node will now compute a sha256 hash from the new data chunk. Then it will feed the hash, PCHK2 and the public key through ECDSA. If the result matches PCHK1 then the node knows that Bob truly has the private key and that the data is identical to the data from which PCHK1 and PCHK2 were computed.
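The order of checks on the receiving node can be summarized in a short sketch. The record types and the function name are hypothetical, and the ECDSA verification itself is passed in as a callback (`ecdsa_verify`) because Python's standard library has no secp256k1 implementation; a real node would plug in its ECC library there.

```python
import hashlib
import hmac
from typing import Callable, NamedTuple


class StoredChunk(NamedTuple):
    akey: str          # access key stored with the chunk
    update_phash: str  # PHASH of the update scope, as a sha256 hex digest


class UpdateRequest(NamedTuple):
    uid: str
    akey: str
    public_key: bytes
    pchk1: int
    pchk2: int
    new_chunk: bytes   # the new, already encrypted chunk data


def node_accepts_update(
    stored: StoredChunk,
    request: UpdateRequest,
    ecdsa_verify: Callable[[bytes, bytes, int, int], bool],
) -> bool:
    """Apply the checks in the order described: AKEY, PHASH, then signature."""
    if not hmac.compare_digest(stored.akey, request.akey):
        return False  # wrong AKEY: drop the request
    key_hash = hashlib.sha256(request.public_key).hexdigest()
    if not hmac.compare_digest(key_hash, stored.update_phash):
        return False  # public key does not belong to this permission scope
    chunk_hash = hashlib.sha256(request.new_chunk).digest()
    # Assumed callback signature: (public key, message hash, PCHK1, PCHK2).
    return ecdsa_verify(request.public_key, chunk_hash, request.pchk1, request.pchk2)
```

The node never needs the private key for any of these steps, which is why it cannot produce a valid PCHK1/PCHK2 pair for data of its own.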
Because the node doesn't know the private key, it can't compute PCHK1 and PCHK2 to create a malicious update. The only thing a malicious node can do is to corrupt the data and store that, or drop the update. A malicious node can do both of these actions anyway. Either way, the node would soon be flagged and dropped from the Network. However, having to expose the public key creates a weakness which quantum computing could exploit. A malicious node can store all public keys it encounters and derive private keys later on. The attack surface is limited in a sense that if the private key is compromised, the node can't read data, it can only forge a certain action provided by the permission scope. This could be theoretically fixed by creating some sort of changing key scheme.
"Friending" and following protocol
Avatar's friending mechanism operates on shared addresses. A shared address is a 64-character string that is shared between two Avatars, so it’s essentially a relationship ID. When Alice wants to friend Bob and give Bob a chance to send her something, Alice's Avatar will create a new unique address for that relationship, and then send that address to Bob's APA. We are assuming that Bob has shared his public address with Alice. When Bob receives Alice's friend request, he can then either accept the request or decline it. Bob can assign Alice to a certain group or mark Alice as an "acquaintance", so that unless Bob specifically views Alice's profile, he won't see Alice's public updates but he will still receive any private messages Alice sends. From his Avatar's Contact List Bob can configure exactly what type of communication he will allow Alice to send him. The idea is that you can store contacts in Avatar that you don't feel are really part of your life but you still want to possibly reach in the future.
Because Avatar always defaults to private, your Avatar is not visible to anybody else unless you post a public update or you give out your AID. When you post an update you can limit it to a certain group you have created, or make it public. Public updates are available to everyone in Avatar Network who knows your AID or APA.
If you've published a public update or shared your AID, another Avatar may "friend" you and add your APA to their Contact List. Your APA enables people to only read your public updates, nothing more. This is like a personal RSS feed. Other people can read the feed but you can't approach them unless they let you know you have their permission.
You can follow somebody's public updates with or without giving them permission to send you anything other than their public updates. It's also optional to share that you are following their updates. Following somebody's updates should not be an all-inclusive right to send anything else. This mechanism allows you to decide how visibly you want to follow somebody. You might want to follow, for example, a person representing a certain ideology but wouldn't want to share your interest publicly or with the person in question.
Messaging protocol
Avatar wants to offer easy-to-use, anonymous and secure messaging with store-and-forward capabilities (like email). Avatar Network protocol takes care of anonymization so the messaging protocol focuses on storing and delivering secured messages in one-to-one, one-to-many and many-to-many scenarios.
The biggest challenge with a decentralized messaging solution is to offer secure store-and-forward capabilities without going back to centralized servers. Store-and-forward capabilities are essential for anybody who would rather send a message than chat.
Message security
Messages are Objects just like everything else and enjoy the same protection as other Objects. All messages themselves are signed by the sender with their private key (APRK), so recipients can verify the signature with the sender's public key (APK).
One-to-one messaging
When Alice wants to send a message to Bob, she first generates a random 64-character string as a new conversation ID. The conversation ID is used as a secondary seed in a multi-level chainable Object. Then her Avatar follows data handling and Object address protocols and ends up with a multi-level chainable Manifest Object. She stores the Manifest Object in the Network and tries to establish a connection to Bob's Avatar over the Network and send him a notification about the message. The notification includes the last 10 characters of the shared address, conversation id and AKEY. This way we minimize the risk of compromising identities, or the shared address. By looking at his Contact List for shared addresses Bob's Avatar can easily identify which Contact it came from. Bob's Avatar then follows multi-level Object chaining protocol and gets the correct UID and AKEY for fetching the Manifest Object.
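How Bob's Avatar might map a notification back to a Contact can be sketched as follows. The function name and the shape of the Contact List (a mapping from contact name to shared address) are assumptions for this example, as is the choice to reject a suffix that matches more than one Contact.

```python
from typing import Dict, Optional


def contact_for_notification(contacts: Dict[str, str], address_suffix: str) -> Optional[str]:
    """Find the Contact whose shared address ends with the 10-character suffix.

    contacts maps a contact name to the full 64-character shared address.
    Returns None when no Contact, or more than one Contact, matches.
    """
    matches = [name for name, address in contacts.items() if address[-10:] == address_suffix]
    return matches[0] if len(matches) == 1 else None
```

Once the Contact is identified, the full shared address and the conversation ID from the notification serve as the two seeds for the multi-level chainable Object address.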
If Bob is not online when Alice tries to notify him, Alice will find the next free key in Bob's shared address and store the notification there. When Bob successfully connects to Avatar Network, his Avatar will go through all Contacts and try to follow each shared address' Object chain. Alice's notification will be waiting for Bob and Bob's Avatar now knows to get the message.
If Bob wants to reply, he uses multi-level Object chaining protocol to generate possible UIDs and AKEYs and keeps trying until he finds a free slot. Then he stores a Manifest Object and tries to notify Alice, just like Alice did before.
One-to-many messaging
In one-to-many messaging Alice sends a message to multiple recipients. These are usually status updates. One-to-many messaging works exactly like one-to-one messaging but instead of notifying one receiver, Avatar notifies multiple. Here using Manifest Objects pays off because sending the whole Object, which could be text, video or any random data, would be very expensive.
Many-to-many messaging
Many-to-many messaging is essentially how discussion boards work. All participants see everyone else's messages. Many-to-many messaging can be created by automatically sharing relevant Manifest Objects to participating users.
There are a few efficiency concerns in many-to-many messaging which require further research. The default behaviour in Avatar OS is that new messages would be propagated by the author when published, essentially notifying all other participants. However with many-to-many messaging this is very inefficient, so a better option is to have Avatar pull new messages when the user views the discussion. Automated notifications for new messages in "followed" discussions would be nice though.
Push or pull
Avatar uses both. Wherever possible, pushing is used for performance reasons. There are Twitter accounts with nearly 50 million followers and follower counts are constantly increasing. If an Avatar user were that popular and all of those 50 million followers checked at frequent intervals for potential updates it would create a huge amount of unnecessary stress on Avatar Network. This can be avoided by pushing an update notification and the follower's Avatar OS would pull the update when the user logs in.
Off-the-record (OTR)
Avatar messaging doesn't support Off-the-Record messaging in its current form. The essential part of OTR is to provide a private, face-to-face type of conversation. With OTR the problem is the store-and-forward mechanism, which can't exist without compromising OTR's basic principles. Theoretically it is possible to create OTR chat between Avatars over Avatar Network but it's not currently on our roadmap.
We feel that it's more important to offer a secure alternative to an email type of messaging than to provide OTR functionality that already exists in many IM applications.