Comments:"SpiderOak's Analysis and Recommendations for the Crypto in Kim Dotcom's Mega, Part One"
User Authentication starts in user0001.js
with theu_login()
function. It works by sending the server a calculated
authentication value value called "userhash."
Here's the code for calculating userhash
.
It's a cute little hash with similarities to CBC-MAC. A 128-bit "message" is created by splitting up the string into 128-bit sections and xor-ing all of those together with 128 bits of zeroes. This message is then run through 16384 rounds of AES using a supplied key. In the case of the userhash, the key is the output of your password run through the Key Derivation Function.
Your email address and userhash are sent to Mega's servers, which presumably authenticate using the userhash, and your encrypted master key is returned along with a session identifier (sid). Getting the master key is therefore based on a simple token that can be replayed instead of a more secure challenge/response mechanism.
On the server side, auditors should look for a timing vulnerability that can be exploited when the server compares your userhash, which would allow an attacker to gradually learn the right hash over many attempts. If the server side implementation is competent, they're treating this stringhash output as a password (it effectively is one for auth purposes) and storing it server side using bcrypt or scrypt.
Now that the client has your master key, it is decrypted using the
password-derived key. It gets a little confusing with their unfortunate reuse
of variable names, but aes
winds up being your master key. It's
also interesting to note that if you select "remember me" on login, your master
key gets stored unencrypted in browser local storage, where it undoubtedly gets
written to disk.
if (typeof res == 'object') { var aes = new sjcl.cipher.aes(ctx.passwordkey); // decrypt master key if (typeof res[0].k == 'string') { k = base64_to_a32(res[0].k); if (k.length == 4) { k = decrypt_key(aes,k); aes = new sjcl.cipher.aes(k);
The sid is derived in one of two ways. Either you get a tsid
with the encrypted master key, which seems to be for ephemeral and unconfirmed
accounts, or you get a csid
, which seems to be for confirmed
accounts. If you get a csid
, your RSA private key is unpacked from
the response and decrypted with your master key. t
will be used in
a bit.
var t = mpi2b(base64urldecode(res[0].csid)); var privk = a32_to_str(decrypt_key(aes,base64_to_a32(res[0].privk)));The RSA private key is then turned into more usable integer arrays and used to decrypt
csid
, which becomes your sid. You will have to squint to
see what's going on here, it's the second element of r
.r = [k, base64urlencode( b2s( RSAdecrypt(t, rsa_privk[2], rsa_privk[0], rsa_privk[1], rsa_privk[3]) ).substr(0,43)), rsa_privk];
The far more interesting path is when you get a tsid
. The
client checks it by encrypting the first 16 bytes with your master key and
verifying it against the last 16 bytes.
t = base64urldecode(res[0].tsid); if (a32_to_str( encrypt_key(aes,str_to_a32(t.substr(0,16))) ) == t.substr(-16) ) r = [k,res[0].tsid];
It seems that in the case of a tsid
, the server does have the
unprotected master key. It's the only way the server could have calculated the
last sixteen bytes that are verified here. Since this is probably only for
ephemeral accounts, the master key won't be used again and this doesn't really
matter. It does open the interesting possibility that data uploaded with
ephemeral accounts could be less private than confirmed accounts.
Analysis and First Recommendations
Beyond our caveats above, Mega's first "hair on fire" problem is the implementation of their Key Derivation Function. This function's job is to take a (probably weak) passphrase and 1) turn it into a usable symmetric key that can encrypt secrets and 2) do so in a way that is not easily brute forced. The first part it does well; the second part it hardly does at all.
Mega's KDF is not one of the best practice recommendations such as PBKDF2 or Bcrypt, but rather a simple home grown implementation which runs the user's passphrase through 216 rounds of an AES cipher with a fixed key. That key is the same across all users — there is no added "salt".
A "salt" is a small random value used to vary KDF output with the same password input. No salt means that any two users with the same passphrase will have the same KDF output. Various research suggests that a large portion of end users will have passwords found on a list of the top 10,000 most common passwords. Calculating Mega keys from such a list would be simple. Having to account for a random salt in the KDF exponentially complicates this effort, offering some protection to a database of users who often make poor password choices.
The other problem with Mega's KDF is just that AES is too fast. It's implemented in hardware on commodity systems — you're probably reading this on a system with hardware AES. Fast AES makes each password attempt faster, making it easier to brute force the password. In contrast, Bcrypt intentionally uses Blowfish because it is expensive to compute.
With this easily computed KDF, huge databases of password keys can be built using the fixed key and every password combination out to some reasonable length. These tables will not only allow the recovery of data encrypted inside every account, but they will also recover all the passwords themselves. (A database with combinations of usernames and passwords is often considered more valuable than the accounts they protect, because they can be re-used at PayPal.com, etc. and many will succeed.) It's been revealed that Mega confirmation emails contain the encrypted master key, making it that much easier to brute force the password offline. Oh, and someone already created a crack tool to help you.
I suspect that the implementers of the system were reasoning to the effect of: "So what? This derived key is only held in memory and is only used for encrypting another key, and only that encrypted output is saved to the server. That next key is acting like a salt." To be effective, the salt must go at the beginning of this process rather than the end, so that every attack attempt has to restart calculations from the beginning. Whatever those next level of keys decrypt will be plaintext and look like plaintext so checking for success only adds a couple of cheap steps to an attack, if they're necessary at all.
Recommendation: Use PBKDF2, Scrypt, or Bcrypt instead of AES. There are JavaScript implementations available (we use jsBCrypt for web signups). If you must stick with AES for some reason, make the input to the KDF not be the raw password, but a SHA256 HMAC of the password with the salt as the HMAC key. Save the salt (plaintext) along with the output. Use at least 32 bytes of true random data for the salt. Turn up the rounds as high as you can stand (and use web workers to avoid stalling the browser).
While you're working on that section of the code, you can also improve the authentication process. When creating an account, run the KDF twice with the same password, but with two different salts. One salt is for deriving a outer layer key like usual, and the other is for creating and storing a challenge key. Then you can use a real challenge/response approach to auth instead of a static token, challenging a user to decrypt something using their challenge key. This makes it cheap for the server to do zero knowledge password proof auth (no CPU intensive bcrypt comparisons), avoids an attacker authenticating through just replaying a previous login session, and keeps you out of the business of storing a database of password equivalents.
In the short term, those of you using Mega should set a very long random password that you never use elsewhere and avoid storing sensitive data. Then as soon as these flaws are resolved, close your old Mega account and create a new one with a different password (and thus a whole new set of keys). Expect everything from the old account to be compromised.
We will continue in additional parts to discuss metadata encryption, data encryption, MACs, convergent encryption / encrypted data de-duplication, and exchanging messages with others.