Channel: Hacker News 50

Female Founders Conference


Announcing the Female Founders Conference - Y Combinator Posthaven


URL:http://blog.ycombinator.com/announcing-the-female-founders-conference


Jessica Livingston

I'm delighted to announce that Kat Manalac, Kirsty Nathoo, Carolynn Levy and I are hosting Y Combinator's first Female Founders Conference on Saturday, March 1. We're going to gather together female founders at all stages to share stories, give advice, and make connections.

The original idea was to make this an event where female YC alumni shared their experiences.  But once we started planning the event we thought it would be exciting to invite Julia Hartz and Diane Greene to speak as well, so that we'd have speakers who could talk about what happens at even later stages.

As well as the speakers, many female YC alumni will be attending the event, so this will be an opportunity to get to know them and ask questions.

The best source of information about startups is the stories of people who've started them.  Our goal with this conference is to inspire women to start (or hang in there with!) a startup through the insights and experiences of those who have done it already.  If you're a woman interested in learning more about startups, I encourage you to apply.

Why Bitcoin Matters - NYTimes.com


URL:http://mobile.nytimes.com/blogs/dealbook/2014/01/21/why-bitcoin-matters/


Editor’s note: Marc Andreessen’s venture capital firm, Andreessen Horowitz, has invested just under $50 million in Bitcoin-related start-ups. The firm is actively searching for more Bitcoin-based investment opportunities. He does not personally own more than a de minimis amount of Bitcoin.

A mysterious new technology emerges, seemingly out of nowhere, but actually the result of two decades of intense research and development by nearly anonymous researchers.

Political idealists project visions of liberation and revolution onto it; establishment elites heap contempt and scorn on it.

On the other hand, technologists – nerds – are transfixed by it. They see within it enormous potential and spend their nights and weekends tinkering with it.

Eventually mainstream products, companies and industries emerge to commercialize it; its effects become profound; and later, many people wonder why its powerful promise wasn’t more obvious from the start.

What technology am I talking about? Personal computers in 1975, the Internet in 1993, and – I believe – Bitcoin in 2014.

One can hardly accuse Bitcoin of being an uncovered topic, yet the gulf between what the press and many regular people believe Bitcoin is, and what a growing critical mass of technologists believe Bitcoin is, remains enormous. In this post, I will explain why Bitcoin has so many Silicon Valley programmers and entrepreneurs all lathered up, and what I think Bitcoin’s future potential is.

First, Bitcoin at its most fundamental level is a breakthrough in computer science – one that builds on 20 years of research into cryptographic currency, and 40 years of research in cryptography, by thousands of researchers around the world.

Bitcoin is the first practical solution to a longstanding problem in computer science called the Byzantine Generals Problem. To quote from the original paper defining the B.G.P.: “[Imagine] a group of generals of the Byzantine army camped with their troops around an enemy city. Communicating only by messenger, the generals must agree upon a common battle plan. However, one or more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the loyal generals will reach agreement.”

More generally, the B.G.P. poses the question of how to establish trust between otherwise unrelated parties over an untrusted network like the Internet.

The practical consequence of solving this problem is that Bitcoin gives us, for the first time, a way for one Internet user to transfer a unique piece of digital property to another Internet user, such that the transfer is guaranteed to be safe and secure, everyone knows that the transfer has taken place, and nobody can challenge the legitimacy of the transfer. The consequences of this breakthrough are hard to overstate.

What kinds of digital property might be transferred in this way? Think about digital signatures, digital contracts, digital keys (to physical locks, or to online lockers), digital ownership of physical assets such as cars and houses, digital stocks and bonds … and digital money.

All these are exchanged through a distributed network of trust that does not require or rely upon a central intermediary like a bank or broker. And all in a way where only the owner of an asset can send it, only the intended recipient can receive it, the asset can only exist in one place at a time, and everyone can validate transactions and ownership of all assets anytime they want.

How does this work?

Bitcoin is an Internet-wide distributed ledger. You buy into the ledger by purchasing one of a fixed number of slots, either with cash or by selling a product and service for Bitcoin. You sell out of the ledger by trading your Bitcoin to someone else who wants to buy into the ledger. Anyone in the world can buy into or sell out of the ledger any time they want – with no approval needed, and with no or very low fees. The Bitcoin “coins” themselves are simply slots in the ledger, analogous in some ways to seats on a stock exchange, except much more broadly applicable to real world transactions.

The Bitcoin ledger is a new kind of payment system. Anyone in the world can pay anyone else in the world any amount of value of Bitcoin by simply transferring ownership of the corresponding slot in the ledger. Put value in, transfer it, the recipient gets value out, no authorization required, and in many cases, no fees.

That last part is enormously important. Bitcoin is the first Internetwide payment system where transactions either happen with no fees or very low fees (down to fractions of pennies). Existing payment systems charge fees of about 2 to 3 percent – and that’s in the developed world. In lots of other places, there either are no modern payment systems or the rates are significantly higher. We’ll come back to that.

Bitcoin is a digital bearer instrument. It is a way to exchange money or assets between parties with no pre-existing trust: A string of numbers is sent over email or text message in the simplest case. The sender doesn’t need to know or trust the receiver or vice versa. Related, there are no chargebacks – this is the part that is literally like cash – if you have the money or the asset, you can pay with it; if you don’t, you can’t. This is brand new. This has never existed in digital form before.

Bitcoin is a digital currency, whose value is based directly on two things: use of the payment system today – volume and velocity of payments running through the ledger – and speculation on future use of the payment system. This is one part that is confusing people. It’s not as much that the Bitcoin currency has some arbitrary value and then people are trading with it; it’s more that people can trade with Bitcoin (anywhere, everywhere, with no fraud and no or very low fees) and as a result it has value.

It is perhaps true right at this moment that the value of Bitcoin currency is based more on speculation than actual payment volume, but it is equally true that that speculation is establishing a sufficiently high price for the currency that payments have become practically possible. The Bitcoin currency had to be worth something before it could bear any amount of real-world payment volume. This is the classic “chicken and egg” problem with new technology: new technology is not worth much until it’s worth a lot. And so the fact that Bitcoin has risen in value in part because of speculation is making the reality of its usefulness arrive much faster than it would have otherwise.

Critics of Bitcoin point to limited usage by ordinary consumers and merchants, but that same criticism was leveled against PCs and the Internet at the same stage. Every day, more and more consumers and merchants are buying, using and selling Bitcoin, all around the world. The overall numbers are still small, but they are growing quickly. And ease of use for all participants is rapidly increasing as Bitcoin tools and technologies are improved. Remember, it used to be technically challenging to even get on the Internet. Now it’s not.

The criticism that merchants will not accept Bitcoin because of its volatility is also incorrect. Bitcoin can be used entirely as a payment system; merchants do not need to hold any Bitcoin currency or be exposed to Bitcoin volatility at any time. Any consumer or merchant can trade in and out of Bitcoin and other currencies any time they want.

Why would any merchant – online or in the real world – want to accept Bitcoin as payment, given the currently small number of consumers who want to pay with it? My partner Chris Dixon recently gave this example:

“Let’s say you sell electronics online. Profit margins in those businesses are usually under 5 percent, which means conventional 2.5 percent payment fees consume half the margin. That’s money that could be reinvested in the business, passed back to consumers or taxed by the government. Of all of those choices, handing 2.5 percent to banks to move bits around the Internet is the worst possible choice. Another challenge merchants have with payments is accepting international payments. If you are wondering why your favorite product or service isn’t available in your country, the answer is often payments.”

In addition, merchants are highly attracted to Bitcoin because it eliminates the risk of credit card fraud. This is the form of fraud that motivates so many criminals to put so much work into stealing personal customer information and credit card numbers.

Since Bitcoin is a digital bearer instrument, the receiver of a payment does not get any information from the sender that can be used to steal money from the sender in the future, either by that merchant or by a criminal who steals that information from the merchant.

Credit card fraud is such a big deal for merchants, credit card processors and banks that online fraud detection systems are hair-trigger wired to stop transactions that look even slightly suspicious, whether or not they are actually fraudulent. As a result, many online merchants are forced to turn away 5 to 10 percent of incoming orders that they could take without fear if the customers were paying with Bitcoin, where such fraud would not be possible. Since these are orders that were coming in already, they are inherently the highest margin orders a merchant can get, and so being able to take them will drastically increase many merchants’ profit margins.

Bitcoin’s antifraud properties even extend into the physical world of retail stores and shoppers.

For example, with Bitcoin, the huge hack that recently stole 70 million consumers’ credit card information from the Target department store chain would not have been possible. Here’s how that would work:

You fill your cart and go to the checkout station like you do now. But instead of handing over your credit card to pay, you pull out your smartphone and take a snapshot of a QR code displayed by the cash register. The QR code contains all the information required for you to send Bitcoin to Target, including the amount. You click “Confirm” on your phone and the transaction is done (including converting dollars from your account into Bitcoin, if you did not own any Bitcoin).
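(A side note on mechanics: the QR code in a flow like this is typically just a short payment URI that the wallet app parses. Below is a minimal Python sketch of building such a URI in the BIP 21 "bitcoin:" style; the address, amount, and label are made-up placeholders, and a real point-of-sale system may include other fields.)

from urllib.parse import quote

def payment_uri(address, amount_btc, label=None):
    # Build a BIP 21-style payment URI that a wallet app can read from a QR code.
    uri = 'bitcoin:{}?amount={:.8f}'.format(address, amount_btc)
    if label:
        uri += '&label=' + quote(label)
    return uri

# Hypothetical checkout asking for 0.05 BTC:
print(payment_uri('1ExamplePlaceholderAddressxxxxxxxx', 0.05, label='Example Store 123'))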

Target is happy because it has the money in the form of Bitcoin, which it can immediately turn into dollars if it wants, and it paid no or very low payment processing fees; you are happy because there is no way for hackers to steal any of your personal information; and organized crime is unhappy. (Well, maybe criminals are still happy: They can try to steal money directly from poorly-secured merchant computer systems. But even if they succeed, consumers bear no risk of loss, fraud or identity theft.)

Finally, I’d like to address the claim made by some critics that Bitcoin is a haven for bad behavior, for criminals and terrorists to transfer money anonymously with impunity. This is a myth, fostered mostly by sensationalistic press coverage and an incomplete understanding of the technology. Much like email, which is quite traceable, Bitcoin is pseudonymous, not anonymous. Further, every transaction in the Bitcoin network is tracked and logged forever in the Bitcoin blockchain, or permanent record, available for all to see. As a result, Bitcoin is considerably easier for law enforcement to trace than cash, gold or diamonds.

What’s the future of Bitcoin?

Bitcoin is a classic network effect, a positive feedback loop. The more people who use Bitcoin, the more valuable Bitcoin is for everyone who uses it, and the higher the incentive for the next user to start using the technology. Bitcoin shares this network effect property with the telephone system, the web, and popular Internet services like eBay and Facebook.

In fact, Bitcoin is a four-sided network effect. There are four constituencies that participate in expanding the value of Bitcoin as a consequence of their own self-interested participation. Those constituencies are (1) consumers who pay with Bitcoin, (2) merchants who accept Bitcoin, (3) “miners” who run the computers that process and validate all the transactions and enable the distributed trust network to exist, and (4) developers and entrepreneurs who are building new products and services with and on top of Bitcoin.

All four sides of the network effect are playing a valuable part in expanding the value of the overall system, but the fourth is particularly important.

All over Silicon Valley and around the world, many thousands of programmers are using Bitcoin as a building block for a kaleidoscope of new product and service ideas that were not possible before. And at our venture capital firm, Andreessen Horowitz, we are seeing a rapidly increasing number of outstanding entrepreneurs – not a few with highly respected track records in the financial industry – building companies on top of Bitcoin.

For this reason alone, new challengers to Bitcoin face a hard uphill battle. If something is to displace Bitcoin now, it will have to have sizable improvements and it will have to happen quickly. Otherwise, this network effect will carry Bitcoin to dominance.

One immediately obvious and enormous area for Bitcoin-based innovation is international remittance. Every day, hundreds of millions of low-income people go to work in hard jobs in foreign countries to make money to send back to their families in their home countries – over $400 billion in total annually, according to the World Bank. Every day, banks and payment companies extract mind-boggling fees, up to 10 percent and sometimes even higher, to send this money.

Switching to Bitcoin, which charges no or very low fees, for these remittance payments will therefore raise the quality of life of migrant workers and their families significantly. In fact, it is hard to think of any one thing that would have a faster and more positive effect on so many people in the world’s poorest countries.

Moreover, Bitcoin generally can be a powerful force to bring a much larger number of people around the world into the modern economic system. Only about 20 countries around the world have what we would consider to be fully modern banking and payment systems; the other roughly 175 have a long way to go. As a result, many people in many countries are excluded from products and services that we in the West take for granted. Even Netflix, a completely virtual service, is only available in about 40 countries. Bitcoin, as a global payment system anyone can use from anywhere at any time, can be a powerful catalyst to extend the benefits of the modern economic system to virtually everyone on the planet.

And even here in the United States, a long-recognized problem is the extremely high fees that the “unbanked” — people without conventional bank accounts – pay for even basic financial services. Bitcoin can be used to go straight at that problem, by making it easy to offer extremely low-fee services to people outside of the traditional financial system.

A third fascinating use case for Bitcoin is micropayments, or ultrasmall payments. Micropayments have never been feasible, despite 20 years of attempts, because it is not cost effective to run small payments (think $1 and below, down to pennies or fractions of a penny) through the existing credit/debit and banking systems. The fee structure of those systems makes that nonviable.

All of a sudden, with Bitcoin, that’s trivially easy. Bitcoins have the nifty property of infinite divisibility: currently down to eight decimal places after the dot, but more in the future. So you can specify an arbitrarily small amount of money, like a thousandth of a penny, and send it to anyone in the world for free or near-free.
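(To make the arithmetic concrete: Bitcoin amounts are commonly handled in the smallest unit, the satoshi, where 1 BTC = 100,000,000 satoshis. A small Python sketch, using a made-up exchange rate, of how a fraction-of-a-penny price becomes an exact integer amount:)

SATOSHIS_PER_BTC = 100000000  # Bitcoin is divisible to 8 decimal places

def usd_to_satoshis(usd_amount, btc_price_usd):
    # Convert a dollar price into whole satoshis at a given (hypothetical) BTC/USD rate.
    btc = usd_amount / btc_price_usd
    return int(round(btc * SATOSHIS_PER_BTC))

# Charging a thousandth of a penny ($0.00001) when 1 BTC trades at $800:
print(usd_to_satoshis(0.00001, 800.0))  # -> 1 satoshi, an exact integer amount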

Think about content monetization, for example. One reason media businesses such as newspapers struggle to charge for content is because they need to charge either all (pay the entire subscription fee for all the content) or nothing (which then results in all those terrible banner ads everywhere on the web). All of a sudden, with Bitcoin, there is an economically viable way to charge arbitrarily small amounts of money per article, or per section, or per hour, or per video play, or per archive access, or per news alert.

Another potential use of Bitcoin micropayments is to fight spam. Future email systems and social networks could refuse to accept incoming messages unless they were accompanied with tiny amounts of Bitcoin – tiny enough to not matter to the sender, but large enough to deter spammers, who today can send uncounted billions of spam messages for free with impunity.

Finally, a fourth interesting use case is public payments. This idea first came to my attention in a news article a few months ago. A random spectator at a televised sports event held up a placard with a QR code and the text “Send me Bitcoin!” He received $25,000 in Bitcoin in the first 24 hours, all from people he had never met. This was the first time in history that you could see someone holding up a sign, in person or on TV or in a photo, and then send them money with two clicks on your smartphone: take the photo of the QR code on the sign, and click to send the money.

Think about the implications for protest movements. Today protesters want to get on TV so people learn about their cause. Tomorrow they’ll want to get on TV because that’s how they’ll raise money, by literally holding up signs that let people anywhere in the world who sympathize with them send them money on the spot. Bitcoin is a financial technology dream come true for even the most hardened anticapitalist political organizer.

The coming years will be a period of great drama and excitement revolving around this new technology.

For example, some prominent economists are deeply skeptical of Bitcoin, even though Ben S. Bernanke, formerly Federal Reserve chairman, recently wrote that digital currencies like Bitcoin “may hold long-term promise, particularly if they promote a faster, more secure and more efficient payment system.” And in 1999, the legendary economist Milton Friedman said: “One thing that’s missing but will soon be developed is a reliable e-cash, a method whereby on the Internet you can transfer funds from A to B without A knowing B or B knowing A – the way I can take a $20 bill and hand it over to you, and you may get that without knowing who I am.”

Economists who attack Bitcoin today might be correct, but I’m with Ben and Milton.

Further, there is no shortage of regulatory topics and issues that will have to be addressed, since almost no country’s regulatory framework for banking and payments anticipated a technology like Bitcoin.

But I hope that I have given you a sense of the enormous promise of Bitcoin. Far from a mere libertarian fairy tale or a simple Silicon Valley exercise in hype, Bitcoin offers a sweeping vista of opportunity to reimagine how the financial system can and should work in the Internet era, and a catalyst to reshape that system in ways that are more powerful for individuals and businesses alike.

Clone Angry Birds with SpriteBuilder


URL:https://www.makegameswith.us/tutorials/getting-started-with-spritebuilder/


We're going to give you an outline of how you will use SpriteBuilder to clone Angry Birds and what you will learn along the way!

Create a new Xcode project in SpriteBuilder and verify that Xcode and SpriteBuilder have been successfully installed.

Learn how to navigate the SpriteBuilder interface.

Learn how to add art assets to your project and manage them in SpriteBuilder!

Use the art in our art pack to animate a bear with SpriteBuilder's new Timeline feature!

Learn how to connect your CCB files to your classes in Xcode.

Learn how to easily create menus using SpriteBuilder!

Set up the scene that will load in all your levels!

Build awesome levels easily with SpriteBuilder's drag-and-drop interface!

Learn how to use SpriteBuilder's Chipmunk integration to set up the catapult in your game.

Delve a little deeper in Chipmunk physics and learn how to release the penguins from the catapult!

Learn how to detect physics collisions in your game so you can destroy the enemy seals on impact!

SpriteBuilder has a built-in particle effect generator. Learn how to use it to add particle effects when seals are eliminated!


Now for some finishing touches. Allow the player to shoot multiple penguins!

2014 Gates Annual Letter: Myths About Foreign Aid - Gates Foundation

StrongLoop | What’s New in Node.js v0.12 – Performance Optimizations

Yesterday, The Internet Solved a 20-year-old Mystery - On The Media


URL:http://www.onthemedia.org/story/yesterday-internet-solved-20-year-old-mystery/


Back in October, we told a story on the TLDR podcast about Daniel Drucker. Drucker was looking through his recently deceased dad's computer when he found a document that contained only joke punchlines. He turned to the website Ask Metafilter for help. Within hours, the website's users had reunited the punchlines with their long lost setups.

It looks like they've done it again.

Yesterday afternoon, a user posted a thread asking for help with a decades old family mystery:

My grandmother passed away in 1996 of a fast-spreading cancer. She was non-communicative her last two weeks, but in that time, she left at least 20 index cards with scribbled letters on them. My cousins and I were between 8-10 years old at the time, and believed she was leaving us a code. We puzzled over them for a few months trying substitution ciphers, and didn't get anywhere.

The index cards appear to be just a random series of letters, and they had confounded the poster's family for years. But it took Metafilter only 15 minutes to at least partially decipher them. User harperpitt quickly realized she was using the first letters of words, and that she was, in fact, writing prayers:

Was she a religious woman? The last As, as well as the AAA combo, make me think of "Amen, amen, amen." So extrapolating -- TYAGF = "Thank you Almighty God for..." It would make sense to end with "Thank you, Almighty God, for everything, Amen - Thank you, Almighty God, for everything, Amen, Amen, Amen."

AGH, YES! Sorry for the double post, but:

OFWAIHHBTNTKCTWBDOEAIIIHFUTDODBAFUOTAWFTWTAUALUNITBDUFEFTITKTPATGFAEA

Our Father who art in Heaven, hallowed be thy name... etc etc etc

The whole thread is fascinating. You should take a look at it. You might even be able to contribute. And if you haven't heard our interview with Daniel Drucker, you can listen to it below.

ondras/my-mind · GitHub


Backblaze Blog » What Hard Drive Should I Buy?


URL:http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/


My last two blog posts were about expected drive lifetimes and drive reliability. These posts were an outgrowth of the careful work that we’ve done at Backblaze to find the most cost-effective disk drives. Running a truly unlimited online backup service for only $5 per month means our cloud storage needs to be very efficient and we need to quickly figure out which drives work.

Because Backblaze has a history of openness, many readers expected more details in my previous posts. They asked what drive models work best and which last the longest. Given our experience with over 25,000 drives, they asked which ones are good enough that we would buy them again. In this post, I’ll answer those questions.

Drive Population

At the end of 2013, we had 27,134 consumer-grade drives spinning in Backblaze Storage Pods. The breakdown by brand looks like this:

Hard Drives by Manufacturer Used by Backblaze

Brand              Number of Drives    Terabytes    Average Age in Years
Seagate            12,765              39,576       1.4
Hitachi            12,956              36,078       2.0
Western Digital    2,838               2,581        2.5
Toshiba            58                  174          0.7
Samsung            18                  18           3.7

As you can see, they are mostly Seagate and Hitachi drives, with a good number of Western Digital thrown in. We don’t have enough Toshiba or Samsung drives for good statistical results.

Why do we have the drives we have? Basically, we buy the least expensive drives that will work. When a new drive comes on the market that looks like it would work, and the price is good, we test a pod full and see how they perform. The new drives go through initial setup tests, a stress test, and then a couple weeks in production. (A couple of weeks is enough to fill the pod with data.) If things still look good, that drive goes on the buy list. When the price is right, we buy it.

We are willing to spend a little bit more on drives that are reliable, because it costs money to replace a drive. We are not willing to spend a lot more, though.

Excluded Drives

Some drives just don’t work in the Backblaze environment. We have not included them in this study. It wouldn’t be fair to call a drive “bad” if it’s just not suited for the environment it’s put into.

We have some of these drives running in storage pods, but are in the process of replacing them because they aren’t reliable enough. When one drive goes bad, it takes a lot of work to get the RAID back on-line if the whole RAID is made up of unreliable drives. It’s just not worth the trouble.

The drives that just don’t work in our environment are Western Digital Green 3TB drives and Seagate LP (low power) 2TB drives. Both of these drives start accumulating errors as soon as they are put into production. We think this is related to vibration. The drives do somewhat better in the new low-vibration Backblaze Storage Pod, but still not well enough.

These drives are designed to be energy-efficient, and spin down aggressively when not in use. In the Backblaze environment, they spin down frequently, and then spin right back up. We think that this causes a lot of wear on the drive.

Failure Rates

We measure drive reliability by looking at the annual failure rate, which is the average number of failures you can expect running one drive for a year. A failure is when we have to replace a drive in a pod.
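(The post doesn't spell out the exact formula, but the usual way to annualize this kind of number, and the assumption behind the sketch below, is failures per drive-year of operation:)

def annual_failure_rate(failures, drive_days):
    # Failures per drive-year of operation, expressed as a percentage.
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

# Hypothetical example: 120 failures across 1,000 drives that each ran a full year.
print(annual_failure_rate(120, 1000 * 365))  # -> 12.0 (%)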

The table below has more details, including the number of drives of each model that we have and how old the drives are:

Number of Hard Drives by Model at Backblaze

Model                                            Size     Number of Drives    Average Age in Years    Annual Failure Rate
Seagate Desktop HDD.15 (ST4000DM000)             4.0TB    5199                0.3                     3.8%
Hitachi GST Deskstar 7K2000 (HDS722020ALA330)    2.0TB    4716                2.9                     1.1%
Hitachi GST Deskstar 5K3000 (HDS5C3030ALA630)    3.0TB    4592                1.7                     0.9%
Seagate Barracuda (ST3000DM001)                  3.0TB    4252                1.4                     9.8%
Hitachi Deskstar 5K4000 (HDS5C4040ALE630)        4.0TB    2587                0.8                     1.5%
Seagate Barracuda LP (ST31500541AS)              1.5TB    1929                3.8                     9.9%
Hitachi Deskstar 7K3000 (HDS723030ALA640)        3.0TB    1027                2.1                     0.9%
Seagate Barracuda 7200 (ST31500341AS)            1.5TB    539                 3.8                     25.4%
Western Digital Green (WD10EADS)                 1.0TB    474                 4.4                     3.6%
Western Digital Red (WD30EFRX)                   3.0TB    346                 0.5                     3.2%
Seagate Barracuda XT (ST33000651AS)              3.0TB    293                 2.0                     7.3%
Seagate Barracuda LP (ST32000542AS)              2.0TB    288                 2.0                     7.2%
Seagate Barracuda XT (ST4000DX000)               4.0TB    179                 0.7                     n/a
Western Digital Green (WD10EACS)                 1.0TB    84                  5.0                     n/a
Seagate Barracuda Green (ST1500DL003)            1.5TB    51                  0.8                     120.0%

The following sections focus on different aspects of these results.

1.5TB Seagate Drives

The Backblaze team has been happy with Seagate Barracuda LP 1.5TB drives. We’ve been running them for a long time – their average age is pushing 4 years. Their overall failure rate isn’t great, but it’s not terrible either.

The non-LP 7200 RPM drives have been consistently unreliable. Their failure rate is high, especially as they’re getting older.

1.5 TB Seagate Drives Used by Backblaze

Model                                    Size     Number of Drives    Average Age in Years    Annual Failure Rate
Seagate Barracuda LP (ST31500541AS)      1.5TB    1929                3.8                     9.9%
Seagate Barracuda 7200 (ST31500341AS)    1.5TB    539                 3.8                     25.4%
Seagate Barracuda Green (ST1500DL003)    1.5TB    51                  0.8                     120.0%

The Seagate Barracuda Green 1.5TB drive, though, has not been doing well. We got them from Seagate as warranty replacements for the older drives, and these new drives are dropping like flies. Their average age shows 0.8 years, but since these are warranty replacements, we believe that they are refurbished drives that were returned by other customers and erased, so they already had some usage when we got them.

Bigger Seagate Drives

The bigger Seagate drives have continued the tradition of the 1.5TB drives: they're solid workhorses, but there is a constant attrition as they wear out.

2.0 to 4.0 TB Seagate Drives Used by Backblaze

Model                                   Size     Number of Drives    Average Age in Years    Annual Failure Rate
Seagate Desktop HDD.15 (ST4000DM000)    4.0TB    5199                0.3                     3.8%
Seagate Barracuda (ST3000DM001)         3.0TB    4252                1.4                     9.8%
Seagate Barracuda XT (ST33000651AS)     3.0TB    293                 2.0                     7.3%
Seagate Barracuda LP (ST32000542AS)     2.0TB    288                 2.0                     7.2%
Seagate Barracuda XT (ST4000DX000)      4.0TB    179                 0.7                     n/a

The good pricing on Seagate drives along with the consistent, but not great, performance is why we have a lot of them.

Hitachi Drives

If the price were right, we would be buying nothing but Hitachi drives. They have been rock solid, and have had a remarkably low failure rate.

Hitachi Drives Used by Backblaze

Model                                            Size     Number of Drives    Average Age in Years    Annual Failure Rate
Hitachi GST Deskstar 7K2000 (HDS722020ALA330)    2.0TB    4716                2.9                     1.1%
Hitachi GST Deskstar 5K3000 (HDS5C3030ALA630)    3.0TB    4592                1.7                     0.9%
Hitachi Deskstar 5K4000 (HDS5C4040ALE630)        4.0TB    2587                0.8                     1.5%
Hitachi Deskstar 7K3000 (HDS723030ALA640)        3.0TB    1027                2.1                     0.9%

Western Digital Drives

Back at the beginning of Backblaze, we bought Western Digital 1.0TB drives, and that was a really good choice. Even after over 4 years of use, the ones we still have are going strong.

We wish we had more of the Western Digital Red 3TB drives (WD30EFRX). They’ve also been really good, but they came after we already had a bunch of the Seagate 3TB drives, and when they came out their price was higher.

Western Digital Drives Used by Backblaze

Model                               Size     Number of Drives    Average Age in Years    Annual Failure Rate
Western Digital Green (WD10EADS)    1.0TB    474                 4.4                     3.6%
Western Digital Red (WD30EFRX)      3.0TB    346                 0.5                     3.2%
Western Digital Green (WD10EACS)    1.0TB    84                  5.0                     n/a

What About Drives That Don’t Fail Completely?

Another issue when running a big data center is how much personal attention each drive needs. When a drive has a problem, but doesn’t fail completely, it still creates work. Sometimes automated recovery can fix this, but sometimes a RAID array needs that personal touch to get it running again.

Each storage pod runs a number of RAID arrays. Each array stores data reliably by spreading data across many drives. If one drive fails, the data can still be obtained from the others. Sometimes, a drive may “pop out” of a RAID array but still seem good, so after checking that its data is intact and it’s working, it gets put back in the RAID to continue operation. Other times a drive may stop responding completely and look like it’s gone, but it can be reset and continue running.

Measuring the time spent in a “trouble” state like this is a measure of how much work a drive creates. Once again, Hitachi wins. Hitachi drives get “four nines” of untroubled operation time, while the other brands just get “two nines”.

Untroubled Operation of Drives by Manufacturer Used at Backblaze

Brand              Active    Trouble    Number of Drives
Seagate            99.72%    0.28%      12,459
Western Digital    99.83%    0.17%      933
Hitachi            99.99%    0.01%      12,956

Drive Lifetime by Brand

The chart below shows the cumulative survival rate for each brand. Month by month, how many of the drives are still alive?
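(The survival chart itself isn't reproduced in this text version. The statistic it plots can be approximated as a running product of monthly survival rates, a simplified life-table calculation that assumes each month's failure fraction applies to the drives still alive:)

def cumulative_survival(monthly_failure_fractions):
    # Fraction of the original drive population still alive after each month.
    survival, alive = [], 1.0
    for f in monthly_failure_fractions:
        alive *= (1.0 - f)
        survival.append(alive)
    return survival

# Hypothetical: 1% of the remaining drives fail each month for 36 months.
print(cumulative_survival([0.01] * 36)[-1])  # -> about 0.70 still running after 3 years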

Hitachi does really well. There is an initial die-off of Western Digital drives, and then they are nice and stable. The Seagate drives start strong, but die off at a consistently higher rate, with a burst of deaths near the 20-month mark.

Having said that, you’ll notice that even after 3 years, by far most of the drives are still operating.

What Drives Is Backblaze Buying Now?

We are focusing on 4TB drives for new pods. For these, our current favorite is the Seagate Desktop HDD.15 (ST4000DM000). We’ll have to keep an eye on them, though. Historically, Seagate drives have performed well at first, and then had higher failure rates later.

Our other favorite is the Western Digital 3TB Red (WD30EFRX).

We still have to buy smaller drives as replacements for older pods where drives fail. The drives we absolutely won’t buy are Western Digital 3TB Green drives and Seagate 2TB LP drives.

A year and a half ago, Western Digital acquired the Hitachi disk drive business. Will Hitachi drives continue their excellent performance? Will Western Digital bring some of the Hitachi reliability into their consumer-grade drives?

At Backblaze, we will continue to monitor and share the performance of a wide variety of disk drive models. What has your experience been?


Geocodio | Ridiculously cheap bulk geocoding

Generate a Random Name - Fake Name Generator

Dots and Perl | Perl Hacks


URL:http://perlhacks.com/2014/01/dots-perl/


I was running a training course this week, and a conversation I had with the class reminded me that I have been planning to write this article for many months.

There are a number of operators in Perl that are made up of nothing but dots. How many of them can you name?

There are actually five dot operators in Perl. If the people on my training courses are any guide, most Perl programmers can only name two or three of them. Here’s a list. It’s in approximate order of how well-known I think the operators are (which is, coincidentally, also the order of increasing length).

One Dot

Everyone knows the one dot operator, right? It’s the concatenation operator. It converts its two operands to strings and concatenates them.

# $string contains 'one stringanother string'
my $string = 'one string' . 'another string';

It’s also sometimes useful to use it to force scalar context onto an expression. Consider the difference between these two statements.

say "Time: ", localtime;
say "Time: " . localtime;

The difference between the two statements is tiny, but the output is very different: the comma passes localtime to say in list context, so you get a run of raw numbers (seconds, minutes, hours and so on), while the dot forces scalar context, so you get the familiar human-readable time string.

There’s not much more to say about the single dot.

Two Dots

Things start to get a little more interesting when we look at two dots. The two dot operator is actually two different operators depending on how it is used. Most people know that it can be used to generate a range of values.

my @numbers = (1 .. 100);
my @letters = ('a' .. 'z');

You can also use it in something like a for loop. This is an easy way to execute an operation a number of times.

do_something() for 1 .. 3;

In older versions of Perl, this could potentially eat up a lot of memory as a temporary array was created to contain the range and therefore something like

do_something() for 1 .. 1000000;

could be a problem. But in all modern versions of Perl, that temporary array is not created and the expression causes no problems.

Two dots acts as a range operator when it is used in list context. In scalar context, its behaviour is different. It becomes a different operator – the flip-flop operator.

The flip-flop operator is so-called because it flip-flops between two states. It starts out returning false and continues to do so until something "flips" it into its true state. It then continues to return true until something else "flops" it back into a false state. The cycle then repeats.

So what causes it to flip-flop between its two states? It's the evaluation of its left and right operands. Imagine you are processing a file that contains text that you are interested in and other text that you can ignore. The start of a block that you want to process is marked with a line containing "START" and the end of the block is marked with "END". Between an "END" marker and the next "START", there can be lots of text that you want to ignore.

A naive way to process this would involve some kind of “process this” flag.

my $process_this = 0;
while (<$file>) {
 $process_this = 1 if /START/;
 $process_this = 0 if /END/;
 process_this_line($_) if $process_this;
}

I’ve often seen code that looks like this. The author of that code didn’t know about the flip-flop operator. The flip-flop operator encapsulates all of that logic for you. Using the flip-flop operator, the previous code becomes this:

while (<$file>) {
 process_this_line($_) if /START/ .. /END/;
}

The flip-flop operator returns false until its left operand (/START/) evaluates as true. It then returns true until its right operand (/END/) evaluates as true. And then the cycle repeats.

The flip-flop operator has one more trick up its sleeve. One common requirement is to only process certain line numbers in a file (perhaps we just want to process lines 20 to 40). If one of the operands is a constant number, then it is compared to the current record number (in $.) and the operand fires if the line number matches. So processing lines 20 to 40 of a file is as simple as:

while (<$file>) {
 process_this_line($_) if 20 .. 40;
}

Three Dots

It was the three dot operator that triggered the conversation which reminded me to write this article. A three dot operator was added in Perl 5.12. It’s called the “yada-yada” operator. It is used to stand for unimplemented code.

sub reverse_polarity {
 # TODO: Must write this
 ...
}

Programmers have been leaving “TODO” comments in their code for decades. And they’ve been using ellipsis (a.k.a. three dots) to signify unimplemented code for just as long. But now you can actually put three dots in your source code and it will compile and run. So what’s the benefit of this over just leaving a TODO comment? Well, what happens if you call a function that just contains a comment? The function executes and does nothing. You might not realise that you haven’t implemented that function yet. With the yada-yada operator standing in for the unimplemented code, Perl throws an “unimplemented” error and you are reminded that you need to write that code before calling it.

But the yada-yada operator wasn’t the first three dot operator in Perl. There has been another one in the language for a very long time (you might say since the year dot!) And I bet very few of you know what it is.

The original three dot operator is another flip-flop operator. And the difference between it and the two dot version is subtle. It’s all to do with how many tests can be run against the same line. With the two dot version, when the operator flips to true it also checks the right-hand operand in the same iteration – meaning that it can flip from false to true and back to false again as part of the same iteration. If you use the three dot version, then once the operator flips to true, then it won’t check the right-hand operand until the next iteration – meaning that the operator can only flip once per iteration.

When does this matter? Well if our text file sometimes contains empty records that contain START and END on the same line, the difference between the two and three dot flip-flops will determine exactly which lines are processed.

If the two dot version encounters one of these empty records, it flips to true (because it matches /START/) and then flops back to false (because it matches /END/). However, the flop to false doesn't happen until after the expression returns a value (which is true). The net effect, therefore, is that the line is printed, but the flip-flop is left in the false state and the following lines won't be printed until one contains a START.

If the three dot version encounters one of these empty records, it also flips to true (because it matches /START/) but then doesn't check the right-hand operand. So the line gets printed and the flip-flop remains in its true state, so the following lines will continue to be printed until one contains an END.

Which is correct? Well, of course, it depends on your requirements. In this case, I expect that the two dot version gives the results that most people would expect. But the three dot version is also provided for the cases when its behaviour is required. As always, Perl gives you the flexibility to do exactly what you want.

But I suspect that the relatively small number of people who seem to know about the three dot flip-flop indicates that its behaviour isn't needed very often.

So, there you have them. Perl’s five dot operators – concatenation, range, flip-flop, yada-yada and another flip-flop. I hope that helps you impress people in your next Perl job interview.

Update: dakkar points out that yada-yada isn’t actually an operator. He’s absolutely right, of course, it stands in place of a statement and has no operands. But being slightly loose with our terminology here makes for a more succinct article.

An open letter to the Yale Community from Dean Mary Miller | Yale College


URL:http://yalecollege.yale.edu/content/open-letter-yale-community-dean-mary-miller?


*** UPDATED ***

January 20, 2014

To the Yale community,

A great deal has happened since I posted my January 17 open letter regarding YBB+, so I write a second time with more information and the latest updates.

Many of you have written to me directly or posted public comments expressing your concerns that the University’s reaction to YBB+ was heavy-handed. In retrospect, I agree that we could have been more patient in asking the developers to take down information they had appropriated without permission, before taking the actions that we did. However, I disagree that Yale violated its policies on free expression in this situation.

The information at the center of this controversy is the faculty evaluation, which Yale began collecting, not as a course selection tool, but as a way of helping faculty members improve their teaching. When a faculty committee decided in 2003 to collect and post these evaluations online for student use, it gave careful consideration to the format and felt strongly that numerical data would be misleading and incomplete if they were not accompanied by student comments. The tool created by YBB+ set aside the richer body of information available on the Yale website, including student comments, and focused on simple numerical ratings. In doing so, the developers violated Yale’s appropriate use policy by taking and modifying data without permission, but, more importantly, they encouraged students to select courses on the basis of incomplete information. To claim that Yale’s effort to ensure that students received complete information somehow violated freedom of expression turns that principle on its head.

Although the University acted in keeping with its policies and principles, I see now that it erred in trying to compel students to have as a reference the superior set of data that the complete course evaluations provide. That effort served only to raise concerns about the proper use of network controls. In the end, students can and will decide for themselves how much effort to invest in selecting their courses.

Technology has moved faster than the faculty could foresee when it voted to make teaching evaluations available to students over a decade ago, and questions of who owns data are evolving before our very eyes. Just this weekend, we learned of a tool that replicates YBB+'s efforts without violating Yale’s appropriate use policy, and that leapfrogs over the hardest questions before us. What we now see is that we need to review our policies and practices. To that end, the Teaching, Learning, and Advising Committee, which originally brought teaching evaluations online, will take up the question of how to respond to these developments, and the appropriate members of the IT staff, along with the University Registrar, will review our responses to violations of University policy. We will also state more clearly the requirement/expectation for student software developers to consult with the University before creating applications that depend on Yale data, and we will create an easy means for them to do so.

I thank all who have written, either to me directly or publicly, for their thoughts and for the civility with which they expressed them.

Mary Miller
Dean of Yale College
Sterling Professor of History of Art

 

------------------------------------------------------------

 

January 17, 2014

To the Yale Community:

This past week, students in Yale College lost access to YBB+ because its developers, although acting with good intentions, used university resources without permission and violated the acceptable use policy that applies to all members of the Yale community. The timing for its users could not have been worse: over 1,000 of them had uploaded worksheets during the course selection period and relied on those worksheets to design their course schedules. And the means for shutting down the site immediately -- by blocking it -- led to charges that the university was suppressing free speech.

Free speech defines Yale's community; the people who belong to it understand that they are entitled to share their views just as they must tolerate the views of others, no matter how offensive. The right to free speech, however, does not entitle anyone to appropriate university resources. In the case of YBB+, developers were unaware that they were not only violating the appropriate use policy but also breaching the trust the faculty had put in the college to act as stewards of their teaching evaluations. Those evaluations, whose primary purpose is to inform instructors how to improve their teaching, became available to students only in recent years and with the understanding that the information they made available to students would appear only as it currently appears on Yale's sites -- in its entirety.

Members of the YCDO and the University Registrar met this week with the developers, and to good end: the developers learned more about the underlying problems with using data without permission, the importance of communicating in advance with the university on projects that require approval and cooperation, and some of the existing mechanisms for collaborating with the university, among them the Yale College Council. Administrators, for their part, heard more about the demand for better tools and guidelines for the growing number of student developers, the need for a better approach to students who violate the acceptable use policy -- in most cases unwittingly -- and the value students place on information contained in teaching evaluations. All parties agreed to work toward a positive outcome, and they remain in conversation with each other to that end.

Mary Miller
Dean of Yale College
Sterling Professor of History of Art

Article 26


URL:http://nbviewer.ipython.org/url/norvig.com/ipython/Economics.ipynb


This is a simulation of an economic marketplace in which there is a population of actors, each of which has a level of wealth (a single number) that changes over time. On each time step two agents (chosen by an interaction rule) interact with each other and exchange wealth (according to a transaction rule). The idea is to understand the evolution of the population's wealth over time. My hazy memory is that this idea came from a class by Prof. Sven Anderson at Bard (any errors or misconceptions here are due to my (Peter Norvig) misunderstanding of his idea). Why this is interesting: (1) an example of using simulation to model the world. (2) Many students will have preconceptions about how economies work that will be challenged by the results shown here.

Population Distributions

First things first: what should our initial population look like? We will provide several distribution functions (constant, uniform, Gaussian, etc.) and a sample function, which samples N elements from a distribution and then normalizes them to have a given mean. By default we will have N=5000 actors and an initial mean wealth of 100 simoleons.

In [299]:

import random
import matplotlib
import matplotlib.pyplot as plt

N = 5000   # Default size of population
mu = 100.  # Default mean of population's wealth

def sample(distribution, N=N, mu=mu):
    "Sample from the distribution N times, then normalize results to have mean mu."
    return normalize([distribution() for _ in range(N)], mu * N)

def constant(mu=mu):          return mu
def uniform(mu=mu, width=mu): return random.uniform(mu - width/2, mu + width/2)
def gauss(mu=mu, sigma=mu/3): return random.gauss(mu, sigma)
def beta(alpha=2, beta=3):    return random.betavariate(alpha, beta)
def pareto(alpha=4):          return random.paretovariate(alpha)

def normalize(numbers, total):
    "Scale the numbers so that they add up to total."
    factor = total / float(sum(numbers))
    return [x * factor for x in numbers]

In a transaction, two actors come together; they have existing wealth levels X and Y. For now we will only consider transactions that conserve wealth, so our transaction rules will decide how to split up the pot of X+Y total wealth.

In [360]:

def random_split(X, Y):
    "Take all the money in the pot and divide it randomly between X and Y."
    pot = X + Y
    m = random.uniform(0, pot)
    return m, pot - m

def winner_take_most(X, Y, most=3/4.):
    "Give most of the money in the pot to one of the parties."
    pot = X + Y
    m = random.choice((most * pot, (1 - most) * pot))
    return m, pot - m

def winner_take_all(X, Y):
    "Give all the money in the pot to one of the actors."
    return winner_take_most(X, Y, 1.0)

def redistribute(X, Y):
    "Give 55% of the pot to the winner; 45% to the loser."
    return winner_take_most(X, Y, 0.55)

def split_half_min(X, Y):
    """The poorer actor only wants to risk half his wealth;
    the other actor matches this; then we randomly split the pot."""
    pot = min(X, Y)
    m = random.uniform(0, pot)
    return X - pot/2. + m, Y + pot/2. - m

How do you decide which parties interact with each other? The rule anyone samples two members of the population uniformly and independently, but there are other possible rules, like nearby(pop, k), which chooses one member uniformly and then chooses a second within k index elements away, to simulate interactions within a local neighborhood.

In [356]:

def anyone(pop):
    return random.sample(range(len(pop)), 2)

def nearby(pop, k=5):
    i = random.randrange(len(pop))
    j = i + random.choice((1, -1)) * random.randint(1, k)
    return i, (j % len(pop))

def nearby1(pop):
    return nearby(pop, 1)

Now let's describe the code to run the simulation and summarize/plot the results. The function simulate does the work; it runs the interaction function to find two actors, then calls the transaction function to figure out how to split their wealth, and repeats this T times. The only other thing it does is record results. Every so-many steps, it records some summary statistics of the population (by default, this will be every 25 steps).

What information do we record to summarize the population? Out of the N=5000 (by default) actors, we will record the wealth of exactly nine of them: the ones, in sorted-by-wealth order, that occupy the 1% spot (that is, if N=5000, this would be the 50th wealthiest actor), then the 10%, 25%, 1/3, and median spots; and then likewise from the bottom the 1%, 10%, 25% and 1/3.

(Note that we record the median, which changes over time; the mean is defined to be 100 when we start, and since all transactions conserve wealth, the mean will always be 100.)

What do we do with these results, once we have recorded them? First we print them in a table for the first time step, the last, and the middle. Then we plot them as nine lines in a plot where the y-axis is wealth and the x-axis is time (note that when the x-axis goes from 0 to 1000, and we have record_every=25, that means we have actually done 25,000 transactions, not 1000).

In [368]:

def simulate(population, transaction_fn, interaction_fn, T, percentiles, record_every):
    "Run simulation for T steps; collect percentiles every 'record_every' time steps."
    results = []
    for t in range(T):
        i, j = interaction_fn(population)
        population[i], population[j] = transaction_fn(population[i], population[j])
        if t % record_every == 0:
            results.append(record_percentiles(population, percentiles))
    return results

def report(distribution=gauss, transaction_fn=random_split, interaction_fn=anyone, N=N, mu=mu, T=5*N,
           percentiles=(1, 10, 25, 33.3, 50, -33.3, -25, -10, -1), record_every=25):
    "Print and plot the results of the simulation running T steps."
    # Run simulation
    population = sample(distribution, N, mu)
    results = simulate(population, transaction_fn, interaction_fn, T, percentiles, record_every)
    # Print summary
    print('Simulation: {} * {}(mu={}) for T={} steps with {} doing {}:\n'.format(
        N, name(distribution), mu, T, name(interaction_fn), name(transaction_fn)))
    fmt = '{:6}' + '{:10.2f} ' * len(percentiles)
    print(('{:6}' + '{:>10} ' * len(percentiles)).format('', *map(percentile_name, percentiles)))
    for (label, nums) in [('start', results[0]), ('mid', results[len(results)//2]), ('final', results[-1])]:
        print(fmt.format(label, *nums))
    # Plot results
    for line in zip(*results):
        plt.plot(line)
    plt.show()

def record_percentiles(population, percentiles):
    "Pick out the percentiles from population."
    population = sorted(population, reverse=True)
    N = len(population)
    return [population[int(p*N/100.)] for p in percentiles]

def percentile_name(p):
    return ('median' if p == 50 else
            '{} {}%'.format(('top' if p > 0 else 'bot'), abs(p)))

def name(obj):
    return getattr(obj, '__name__', str(obj))

Finally, let's run a simulation!

In [369]:

report(gauss, random_split)

How do we interpret this? Well, we can see the mass of wealth spreading out: the rich get richer and the poor get poorer. We know the rich get richer because the blue and green lines (top 10% and top 1%) are going up: the actor in the 1% position (the guy with the least money out of the 1%, or to put it another way, the most money out of the 99%) starts with 177.13 and ends up with 447.98 (note this is not necessarily the same guy, just the guy who ends up in that position). The guy at the 10% spot also gets richer, going from 141.87 to 228.06. The 25% and 33% marks stay roughly flat, but everyone else gets poorer! The median actor loses 30% of his wealth, and the bottom 1% actor loses almost 95% of his wealth.

Effect of Starting Population

Now let's see if the starting population makes any difference. My vague impression is that we're dealing with ergodic Markov chains and it doesn't much matter what state you start in. But let's see:
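(The cell that ran this comparison isn't included in this extract; with the functions defined above, it would look something like the following, looping over the different starting distributions:)

for dist in (constant, uniform, gauss, beta, pareto):
    report(dist, random_split)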

It looks like we can confirm that the starting population doesn't matter much—if we are using the random_split rule then in the end, wealth accumulates to the top third at the expense of the bottom two-thirds, regardless of starting population.

Effect of Transaction Rule

Now let's see what happens when we vary the transaction rule. The random_split rule produces inequality: the actor at the bottom quarter of the population has only about a third of the mean wealth, and the actor at the top 1% spot has 4.5 times the mean. Suppose we want a society with more income equality. We could use the split_half_min rule, in which each transaction has a throttle in that the poorer party only risks half of their remaining wealth. Or we could use the redistribute rule, in which the loser of a transaction still gets 45% of the total (meaning the loser will actually gain in many transactions). Let's see what effects these rules have. In analyzing these plots, note that they have different Y-axes.
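(Again, the cells that produced these plots aren't in this extract; a plausible reconstruction using the rules defined earlier:)

report(gauss, random_split)
report(gauss, split_half_min)
report(gauss, redistribute)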

We see that the redistribute rule is very effective in reducing income inequality: the lines of the plot all converge towards the mean of 100 instead of diverging. With the split_half_min rule, inequality increases at a rate about half as fast as random_split. However, the split_half_min plot looks like it hasn't converged yet (whereas all the other plots reach convergence at about the 500 mark). Let's try running split_half_min 10 times longer:

In [372]:

report(gauss, split_half_min, T=50*N)

It looks like split_half_min still hasn't converged, and is continuing to (slowly) drive wealth to the top 10%.

Now let's shift gears: suppose that we don't care about decreasing income inequality; instead we want to increase opportunity for some actors to become wealthier. We can try the winner_take_most or winner_take_all rules (compared to the baseline random_split):
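Again, the rules were defined earlier in the notebook; a sketch of what they plausibly look like, given that winner_take_most is called a few cells below with an explicit share argument (the coin-flip winner and the 3/4 default are assumptions drawn from the surrounding text):

import random

def winner_take_most(X, Y, most=0.75):
    # Pool the wealth; a coin flip decides the winner, who takes the
    # 'most' fraction of the pot while the loser keeps the rest.
    pot = X + Y
    return (most * pot, (1 - most) * pot) if random.random() < 0.5 else ((1 - most) * pot, most * pot)

def winner_take_all(X, Y):
    # The winner walks away with the entire pot.
    return winner_take_most(X, Y, 1.0)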

We see that the winner_take_most rule, in which the winner of a transaction takes 3/4 of the pot, does not increase the opportunity for wealth as much as random_split does, but that winner_take_all is very effective at concentrating almost all the wealth in the hands of the top 10%, and makes the top 1% four times as wealthy as they are under random_split.

That suggests we look at where the breaking point is. Let's consider several different amounts for what winner takes:

In [375]:

def winner_take_80(X, Y): return winner_take_most(X, Y, 0.80)
def winner_take_90(X, Y): return winner_take_most(X, Y, 0.90)
def winner_take_95(X, Y): return winner_take_most(X, Y, 0.95)

report(gauss, winner_take_80)
report(gauss, winner_take_90)
report(gauss, winner_take_95)

We see that winner takes 80% produces results similar to random_split, and that winner takes 95% is similar to winner takes all for the top 10%, but is much kinder to the bottom 75%.

Suppose that transactions are constrained to be local; that you can only do business with your close neighbors. Will that make income more equitable, because there will be no large, global conglomerates? Let's see:

We see that the nearby rule, which limits transactions to your 5 closest neighbors in either direction (out of 5000 total actors), has a negligible effect on the outcome. I found that fairly surprising. But the nearby1 rule, which lets you do business only with your immediate left or right neighbor does have a slight effect towards income equality. The bottom quarter still do poorly, but the top 1% only gets to about 85% of what they get under unconstrained trade.
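The interaction rules are defined earlier in the notebook; a sketch of what they plausibly look like, assuming an actor's position is simply its index in the population list (the wrap-around at the ends is my assumption):

import random

def nearby(population, k=5):
    # Pick a random actor, then a trading partner within k positions on
    # either side (wrapping around the ends of the list).
    N = len(population)
    i = random.randrange(N)
    offset = random.choice([d for d in range(-k, k + 1) if d != 0])
    return i, (i + offset) % N

def nearby1(population):
    # Trade only with the immediate left or right neighbor.
    return nearby(population, 1)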

Announcing New Amazon EC2 M3 Instance Sizes and Lower Prices for Amazon S3 and Amazon EBS


Comments:"Announcing New Amazon EC2 M3 Instance Sizes and Lower Prices for Amazon S3 and Amazon EBS"

URL:http://aws.amazon.com/about-aws/whats-new/2014/01/21/announcing-new-amazon-ec2-m3-instance-sizes-and-lower-prices-for-amazon-s3-and-amazon-ebs/


Announcing New Amazon EC2 M3 Instance Sizes and Lower Prices for Amazon S3 and Amazon EBS

Posted on Jan 21, 2014

We are excited to announce the availability of two new Amazon EC2 M3 instance sizes, m3.medium and m3.large. We are also lowering the prices of storage for Amazon S3 and Amazon EBS in all regions, effective February 1st, 2014.

Amazon EC2 M3 instance sizes and features: We have introduced two new sizes for M3 instances: m3.medium and m3.large, with 1 and 2 vCPUs respectively. We have also added SSD-based instance storage and support for instance store-backed AMIs (previously known as S3-backed AMIs) for all M3 instance sizes. M3 instances feature high-frequency Intel Xeon E5-2670 (Sandy Bridge or Ivy Bridge) processors. When compared to previous-generation M1 instances, M3 instances provide higher, more consistent compute performance at a lower price. These new instance sizes are available in all AWS regions, with AWS GovCloud (US) support coming soon. You can launch M3 instances as On-Demand, Reserved, or Spot instances. To learn more about M3 instances, please visit the Amazon EC2 Instance Types page.

Amazon S3 storage prices are lowered up to 22%: All Amazon S3 standard storage and Reduced Redundancy Storage (RRS) customers will see a reduction in their storage costs. In the US Standard region, we are lowering S3 standard storage prices up to 22%, with similar price reductions across all other regions. The new lower prices can be found on the Amazon S3 pricing page.

Amazon EBS prices are lowered up to 50%: EBS Standard volume prices are lowered up to 50% for both storage and I/O requests. For example, in the US East region, the price for Standard volumes is now $0.05 per GB-month of provisioned storage and $0.05 per 1 million I/O requests. The new lower prices can be found on the Amazon EBS pricing page.
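As a quick back-of-the-envelope check of the new US East Standard-volume prices quoted above (the volume size and I/O count below are made-up example numbers, not AWS figures):

storage_gb = 500          # hypothetical provisioned Standard volume size
io_requests = 20000000    # hypothetical I/O requests per month

monthly_cost = storage_gb * 0.05 + (io_requests / 1000000.0) * 0.05
print('${:.2f} per month'.format(monthly_cost))   # -> $26.00 per month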


Anonyfox/node-webkit-hipster-seed · GitHub


Comments:"Anonyfox/node-webkit-hipster-seed · GitHub"

URL:https://github.com/Anonyfox/node-webkit-hipster-seed


node-webkit-hipster-seed

Bootstrap a crossplatform Desktop Application using tools you probably never heard of.

If you're familiar with the node.js world, this sketch should get you up to speed; if not, an explanation follows the workflow.

Workflow

0. Prerequisites

You need the following stuff installed on your machine:

  • Node.js & NPM (see the instructions for your operating system. Ensure that globally installed NPM modules are in your PATH!)
  • Git. (Brunch and Bower depend on Git to work.)
  • Brunch via a global npm installation: npm install -g brunch.
  • Bower via a global npm installation: npm install -g bower.

1. Bootstrap a new Desktop App!

brunch new https://github.com/Anonyfox/node-webkit-hipster-seed MyApp

This may take a few minutes depending on your hardware and internet connection, since this git repo will be cloned, a bunch of npm modules will be installed, including the somewhat big node-webkit, and several bower modules afterwards.

2. Develop an AngularJS App on Steroids!

cd MyApp. Place your typical application code under /app. So:

  • /app/styles contains all your stylesheets as LESS files. You may look into /app/styles/app.less when fine-tuning your included CSS-related components.
  • /app/scripts is the folder for your coffeescript application logic, especially your AngularJS stuff. The mighty AngularJS main-module is defined in /app/app.coffee and includes the angular module loader and the url routing definitions.
  • /app/partials contains your Jade templates which are compiled and merged into an AngularJS template module. The main index file is located at /app/index.jade and will be compiled to an actual index.html file.
  • /app/assets is the catch-all directory for everything else, like images or fonts. The whole directory, including the folder-hierarchy, is copied as is into the final application folder. If you want to use npm modules inside your application, install them here, and NOT in the toplevel folder! Also, the /app/assets/package.json is used to describe and build your application, NOT the toplevel /package.json!

The App-level structure is basically the same as angular-brunch-seed.

All this assembling stuff is managed for you automatically when you run the following command:

npm run compiler

While this task is running, every change in your /app folder triggers an efficient partial rebuild of the relevant files. Any bower install <frontend-module> triggers this, too.

To run your app locally, just enter:

npm run app

3. Add more modules and plugins!

Gone are the days of drag'n'droppin' your jQuery plugins from diverse websites into your script folders. Just use Bower for anything "browser related". Think of it as a NPM for the frontend. Any components installed by bower are saved in bower_components and automatically inserted in the compilation process.

4. Test ALL the things!

Since your desktop application is basically just an AngularJS app, you can use Karma, which is especially written for testing AngularJS apps end-to-end. (ToDo: configure karma to fire up node-webkit instead of chromium.)

5. Deploy your App!

When you're done building your awesome app, just type

npm run deploy

and you'll have your final application folders located in /dist for each major operating system. When performing this task the first time, it'll take several minutes to download the necessary node-webkit binaries per target system.

So far this has only been tested on OSX. The application icon and several minor features still require some work; have a look at grunt-node-webkit-builder if you want to lend a helping hand.

So, what is this?

Let's look at the sketch again:

Imagine building a Single Page App (SPA) with Angular.js, using the brunch skeleton from angular-brunch-seed.

This means you're using:

  • Coffeescript instead of raw Javascript.
  • LESS instead of plain CSS.
  • Jade as your HTML templating language.
  • Bootstrap as UI-Framework, directly integrated as AngularJS directives.
  • and of course: Angular.js as superior Client-MV*-Framework.

Now you want to build a real desktop application instead of just another web app. Fine! Just start your app with node-webkit instead of an http-server. Think of it as a Chromium Browser merged with Node.js in one process. Basically your final application is capable of doing everything a modern browser can do, plus some very interesting quirks. Look at the wiki for some features.

Most important: you don't need any webserver at all. Instead of doing Ajax requests, just do what you want to do on the server directly in-place! Yeah, that's right, you can require "my-node-js-module" in your Angular.js application! Oh, and you have access to the file system as you would have in Node.js. Really, it's full-blown node.js and a real chromium melted together!

Last but not least: ship your app with ease! Just type npm run deploy, and your app will be compiled for windows, osx and linux, ready to distribute. Yes, it's that easy. Kudos to grunt-node-webkit-builder for the toolchain.

TL;DR?

  • npm run compiler assembles your application into /_public and watches file changes.
  • npm run app starts your application locally.
  • npm run deploy builds your app for windows, osx and linux. the binaries are placed in /dist after building.
  • bower install <frontend-module> for any frontend-related stuff. jQuery, Angular-plugins, and so on.
  • npm install my-module inside of app/assets to install node.js modules.

Licence

MIT. Drop me a line if some of the used stuff collides with the MIT Licence.

Feedback

  • Just use the issues section to discuss features or report bugs.
  • There is an ongoing discussion on HackerNews.
  • If you have general questions not related to this project, you may tweet to @Hisako1337 (that's me.).

Roadmap

So far everything described should work as is, but there are some more advanced features I'd like to see:

  • include Apache Cordova in the build task, to build crossplatform mobile apps additionally.
  • include Greenworks to make it easy to sell your games on steam.
  • develop an automatic updating mechanism for the apps, as it is used in Google Chrome to keep the App up to date.
  • set up a default storage solution, probably NeDB with a thin wrapper/API.

Snowden-haters are on the wrong side of history | The Reinvigorated Programmer


Comments:"Snowden-haters are on the wrong side of history | The Reinvigorated Programmer"

URL:http://reprog.wordpress.com/2014/01/20/snowden-haters-are-on-the-wrong-side-of-history/


In the autumn of 1963, J. Edgar Hoover's FBI, worried at Martin Luther King's growing influence, began tapping his phones and bugging his hotel rooms. They hoped to discredit him by gaining evidence that he was a communist, but found no such evidence. But they did find evidence that he was having affairs. The FBI gathered what they considered to be the most incriminating clips, and in November 1964 they anonymously sent tapes to him along with a letter telling him to commit suicide:

White people in this country have enough frauds of their own but I am sure they don’t have one at this time anywhere near your equal. [...] You are a colossal fraud and an evil, vicious one at that. [...] you don’t believe in any personal moral principles. You [...] have turned out to be not a leader but a dissolute, abnormal moral imbecile. [...] Your “honorary” degrees, your Nobel Prize (what a grim farce) and other awards will not save you. King, there is only one thing left for you to do. You know what it is. [...] There is but one way out for you. You better take it before your filthy, abnormal fraudulent self is bared to the nation.

It seems incredible that a law-enforcement agency could write this, but it's well documented and uncontroversial that they did.

Jump forward fifty years, and here is what NSA analysts and Pentagon insiders are saying about ubiquitous-surveillance whistleblower Edward Snowden:

“In a world where I would not be restricted from killing an American, I personally would go and kill him myself. A lot of people share this sentiment.” “I would love to put a bullet in his head. I do not take pleasure in taking another human beings life, having to do it in uniform, but he is single-handedly the greatest traitor in American history.” “His name is cursed every day over here. Most everyone I talk to says he needs to be tried and hung, forget the trial and just hang him.”

Sounds kinda familiar, doesn’t it?

Meanwhile, Marc Thiessen, conservative commentator and previously George W. Bush speech-writer, is saying this:

Amnesty? Have they lost their minds? Snowden is a traitor to his country, who is responsible for the most damaging theft and release of classified information in American history. [...] Maybe we offer him life in prison instead of a firing squad, but amnesty? That would be insanity

Today, the third Monday in January, is Martin Luther King day.

Ever notice how we don’t have a J. Edgar Hoover day?

For anyone who’s paying attention to all this, the verdict of history is already in. Fools trying to paint Snowden as a spy are really not paying attention. For the hard of thinking, here is key observation: spies do not give their material to newspapers. An actual spy would have quietly disappeared with the damaging intel, and no-one in America would ever have known anything about it. Instead, Snowden has demonstrated extraordinary courage in doing what he knew to be the right thing — revealing a threat to the American constitution that he swore to uphold — even knowing it meant that his life as he knew it was over.

It seems perfectly clear that Snowden will eventually receive a full presidential pardon and a place in the history books as an American hero. It seems extremely unlikely that Obama will have the guts to issue the pardon (though I wouldn't necessarily rule it out); his successor might not; his successor's successor might not. But eventually a president with the perspective of history, clearly seeing Snowden in his place alongside Martin Luther King, Daniel Ellsberg and Rosa Parks, will issue that pardon. We can only hope it will be soon enough for Snowden to enjoy a good chunk of his life back in the country he loves.

So. The verdict of history on Snowden is really not in question.

The question that remains is what side of history commentators like Marc Thiessen, and all those conveniently anonymous NSA sources, want to be on. Because at the moment, they’re setting themselves up to be this decade’s J. Edgar Hoover, George Wallace and Bull Connor.

 

Check out my new Doctor Who book, the Eleventh Doctor



Interview with Raffi Krikorian on Twitter's Infrastructure


Comments:"Interview with Raffi Krikorian on Twitter's Infrastructure"

URL:http://www.infoq.com/articles/twitter-infrastructure


Raffi Krikorian, Vice President of Platform Engineering at Twitter, gives insight into how Twitter prepares for unexpected traffic peaks and how its system architecture is designed to withstand failure.

InfoQ: Hi, Raffi. Would you please introduce yourself to the audience and the readers of InfoQ?

Raffi: Sure. My name is Raffi Krikorian. I'm the Vice President of Platform Engineering at Twitter. We're the team that runs basically the backend infrastructure for all of Twitter.

InfoQ: With the help of "Castle in the Sky," Twitter set a new peak TPS record. How does Twitter deal with such unpredictable peak traffic?

Raffi: Sure. So what you're referring to is what we call the Castle in the Sky event internally, and that was a television show that aired in Tokyo. We hit our new record during that event, with around 34,000 tweets a second coming into Twitter. Normally, Twitter experiences something on the order of 5,000 to 10,000 tweets a second, so this is pretty far out of our standard operating bounds. I think it says a few things about us. I think it says how Twitter reacts to the world at large, like things happen in the world and they get reflected on Twitter. So the way that we end up preparing for something like this is really years of work beforehand, like this type of event could happen at any time without real notice. So what we end up doing is we do load tests against the Twitter infrastructure. We'd run those on the order of every month -- I don't know what the exact schedule is these days -- and then we do analyses of every single system at Twitter. So when we build architecture and we build systems at Twitter, we look at the performance of all those systems on a weekly basis to really understand what we think the theoretical capacity of this system looks like right now on a per-service basis, and then we try to understand what the theoretical capacity looks like overall. So from that, we can decide: (1) do we have the right number of machines in production at any given time or do we need to buy more computers? (2) we can have a rational conversation on whether or not the system is operating efficiently. So if we have certain services, for example, that can only take half the number of requests a second as other services, we should go look at those and understand architecturally, are they performing correctly or do we need to make a change. So for us, the architecture to get to something like the Castle in the Sky event is a slow evolutionary process. We make a change, we see how that change reacts and how that change behaves in the system, and we look and we make a decision on a slow rolling basis of whether or not this is acceptable to us, and we make a tradeoff, like do we buy more machinery or do we write new software in order to withstand this? So while we had never experienced a Castle in the Sky-like event before, some of our load tests have pushed us to those limits before, so we were comfortable knowing when it happened in real life. We're like "Yes, it actually worked."

InfoQ: Are there any emergency plans at Twitter? Do you practice for emergencies in normal times, for example by shutting down some servers or switches?

Raffi: Yeah. So we do two different things basically as our emergency planning, maybe three, it depends how you look at it. Every system is carefully documented on what would it take to turn it on, what would it take to turn it off, so we have what we call runbooks for every single system so we understand what we would do in an emergency. We've already thought through the different types of failures. We don't believe we thought through everything, but at least the most common ones we think we've documented and we understand what we need to do. Two, we're always running tests against production, so we have a good understanding of what the system would look like when we hit it really hard so we can practice. So like we hit it really hard, teams on call, they might get a page or something might happen or a pager might go off, so we can try to decide whether or not we do need to do something differently and how to react to that. And third, we've taken some inspiration from Netflix. And Netflix has what they call their Chaos Monkey which proactively kills machines in production. We have something similar to that within Twitter so we can make sure that we didn't accidentally introduce a single point of failure somewhere. So we can randomly kill machines within the data center and make sure that the service doesn't see a blip while that's happening. All this requires us to have really good transparency into what the success rate of all the different systems are. So we have a massive board. It's a glass wall with all these graphs on it so we can really see what’s going on within Twitter. And then when these events happen, we can see in an instant like whether or not something is changing, whether it would be traffic to Twitter or whether it's a failure within a data center so that we can react to it as quickly as we can.

InfoQ: How do you isolate a broken module in the system? When something goes wrong, what's your first reaction?

Raffi: Yeah. So the way that Twitter is architected these days is that a failure should stay relatively constrained to the feature that the failure occurred in. Now, of course, the deeper you get down the stack, the bigger the problem becomes. So if our storage mechanisms all of a sudden have a problem, a bunch of different systems would exhibit a behavior of something going wrong. For example, if someone made a mistake on the website, it won't affect the API these days. So the way that we know that something is going wrong, again, is just being able to see the different graphs of the system, and then we have alerts set up over different thresholds on a service-by-service basis. So if the success rate of the API fell below some number, a bunch of pagers immediately go off; there's always someone on call for every single service at Twitter and they can react to that as quickly as they can. Our operations team and our network command center will also see this, and they might try some really rudimentary things, the equivalent of should we turn it off and on again and see what happens, while the actual software developers on a second track try to really understand what is going on and wrong with the system. So operations is trying to make sure the site comes back as quickly as it can. Software development is trying to understand what actually went wrong, and do we have a bug that we need to take care of. So this is how we end up reacting to this. But, like I said, the architecture at Twitter keeps failure fairly well constrained. If we think it's going to propagate, or we think that, for example, the social graph is having a problem that's only being seen in this particular feature, the social graph team will then start immediately notifying everyone else just in case they should be on alert for something going wrong. Emergency management is very much one of our strengths these days, I like to say jokingly: what do you do in the case of a disaster? Because it could happen at any time, and my contract to the world is that Twitter will be up so you don't have to worry about it.

InfoQ: The new architecture helps a lot with stability and performance. Could you give us a brief introduction to the new architecture?

Raffi: Sure. So when I joined Twitter a couple of years ago, we ran the system on what we call the monolithic codebase. So everything to do with the software of Twitter was in one codebase that anyone could deploy, anyone could touch, anyone could modify. So that sounds great. In theory, that's actually excellent. It means that every developer in Twitter is empowered to do the right thing. In practice, however, there's a balancing act: developers then need to understand how everything actually works in order to make a change. And in practical reality, the concern I would have is that, at the speed Twitter is writing new code, people don't give deep thought to changes in places that they haven't seen before. I think this is standard in the way developers write software. It's like: I don't understand what I fully need to do to make this change, but if I change just this one line it probably gets the effect I want. I'm not saying that this is a bad behavior. It's a very prudent and expedient behavior. But this means that there is technical debt that's being built up when you do that. So what we've done instead is we've taken this monolithic codebase and broken it up into hundreds of different services that comprise Twitter. This way we can have actual real owners for every single piece of business logic and every single piece of functionality at Twitter. There's actually a team, one of whose jobs is to manage photos for Twitter. There's another team, one of whose jobs is to manage the URLs for Twitter, so that there are subject matter experts now throughout the company, and you could consult them when you want to make a feature change that would change something in the way URLs work, for example. So since we've broken it up in all these different ways, we now have subject matter experts, but this also allows things that we've spoken about: isolation for failure, and also isolation for feature development. If you want to make a change to the way tweets work, you only have to change a certain number of systems. You don't have to change everything in Twitter anymore, so we can have really good isolation both for failure and for development.

InfoQ: You have a system called Decider. What's the role of Decider in the system?

Raffi: Sure. So Decider is one of our runtime configuration mechanisms at Twitter. What I mean by that is that we can turn off features and software in Twitter without doing a deploy. So every single service at Twitter is constantly looking to the Decider system as to what are the current runtime values of Twitter right now. How that practically maps is I could say the discover homepage, for example, has a Decider value that wraps it, and that Decider value tells discover whether it's on or off right now. So I can deploy discover into Twitter and have it deployed in the state that Decider says it should be off. So this point we don't get an inconsistent state. The discover, for example, or any feature at Twitter runs across many machines. It doesn't run on one machine, so you don't want to get in the inconsistent state where some of the machines have the feature and some of them don't. So we can deploy it off using Decider and then when is on all the machines that we want it to be on, we can turn it on atomically across the data center by flipping a Decider switch. This also gives us the ability to do a percentage-based control. So I can say actually now that it's on all of the machines, I only want 50% of users to get it. I can actually make that decision as opposed to it being a side effect of the way that things are being deployed in Twitter. So this allows us to really have a runtime control over Twitter without having to push code. Pushing code is actually a dangerous thing, like the highest correlation to failure in a system like ours, not just Twitter but any big system, is software development error. So this way we can actually deploy software in a relatively safe way because it's off. Turn it on really slowly, purposely, make sure it's good and then ramp it up as fast as I want.
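To make the percentage-based rollout idea concrete, here is a minimal illustrative sketch in Python (not Twitter's actual Decider code; the feature name, config shape, and hashing scheme are all assumptions). The point is that every machine reads the same runtime value and buckets users deterministically, so a 50% ramp means the same half of users everywhere, with no redeploy needed to change the percentage:

import zlib

DECIDERS = {'discover_homepage': 50}   # hypothetical runtime config: feature -> percent on

def feature_enabled(feature, user_id):
    # Hash the user into one of 100 buckets and enable the feature for the
    # first DECIDERS[feature] of them. Deterministic, so every machine in
    # the data center makes the same call for the same user.
    percent_on = DECIDERS.get(feature, 0)
    bucket = zlib.crc32('{}:{}'.format(feature, user_id).encode('utf-8')) % 100
    return bucket < percent_on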

InfoQ: How does Twitter push code to production? Would you please share the deployment process with us? For example, how many stages are there, and do you push daily, weekly, or both?

Raffi: Sure. So Twitter deployment, because we have this services architecture, is really actually up to the control of every single individual team. So the onus is on the team to make sure that when they're deploying code, everyone that could possibly be affected by it should know that you're doing it, and the network control center should also know what you're doing, just so they have a global view of the system. But it's really up to every single team to decide when and if they want to push. On average, I would say teams have a bi- or tri-weekly deploy schedule. Some teams deploy every single day; some teams only deploy once a month. But the deployment process looks about the same to everybody, which is: you deploy into a development environment. This is so developers can hack on it really quickly, make changes, check with the product manager, check with the designer, make sure it does the right thing. Then we deploy into what we call a "Canary system" within Twitter, which means that it's getting live production traffic but we don't rely on its results just yet. So we're basically just loading it up to make sure it handles the traffic performantly, and we can look at the results it would have returned, compare them, and manually inspect them to make sure that it did what we thought it would do given live traffic. Our testing scenarios may not have covered all the different edge cases that the live traffic gets, so it's one way we learn to understand what the real testing scenarios should look like. Then after we go into Canary, we deploy it dark, and then we slowly start to ramp it up to really understand what it looks like at scale. And that ramp-up could take anywhere from a day to a week actually, like we've had different products that we've ramped to a hundred percent in the course of a week or two. We've had other products that we've ramped up to 100% in the course of minutes. So again, it's really up to the team. And each team is responsible for their feature, is responsible for their service. So it's their call on how they want to do it, but those stages of development, canary, dark reading, and ramp-up by Decider are the pattern that everyone follows.

InfoQ: There are huge amounts of data in Twitter. You must have special infrastructure (such as Gizzard and Snowflake) and methods to store the data, and even to process it in real time.

Raffi: Yeah. So that's really two different questions I think. So there is how do we ingest all this data that's coming into Twitter because Twitter is a real-time system, like I measure the latency for a tweet to get delivered in milliseconds to Twitter users. And then there's the second question of you have a lot of data. What do we do with all that data? So the first one you're right; we have systems like Snowflake, Gizzard and things like that to handle tweet ingest. Tweets are only one piece of data that comes into Twitter, obviously. We have things like favorites. We have retweets. We have people sending direct messages. People change their avatar images, their background images and things like that. So all people click on URLs; people load web pages. These are all events that are coming into Twitter. So we begin to ingest all this and log them so we can do analysis. It's a pretty hard thing. We actually have different SLAs depending on what kind of data comes in. So tweets, we measure that in milliseconds. In order to get around database locking, for example, we developed Snowflake that can generate unique IDs for us incredibly quickly and do it decentralized so that we don't have a single point of failure in generating IDs for us. We have Gizzard which handles data flowing in and sharding it as quickly as possible so that we don't have hot spots on different clusters in the system, like it actually tries to probabilistically spread the load so that databases don't get overloaded by the amount of data coming in. Again, tweets go through very fast on a system. Logs, for example, like people are clicking on things, people view tweets, have their SLA measured in minutes as opposed to milliseconds. So those go into completely different pipeline. Most of it is based around Scribe these days. So those just slowly trickle through, get aggregated, get collected and get jumped into HDFS so we can do a later analysis of them. For long-term retention, all of the data, whether it be real-time or not, ends up in HDFS and that's where we run massive like MapReduce jobs and Hadoop jobs to really understand what's going on in the system. So we try to achieve a balance of what needs to be taken care of right now especially given the onslaught of data we have and then where do we put things because this unclogged data accumulates very fast. Like if I'm generating 400 million tweets a day and Twitter has been running for a couple of years now, you can imagine the size of our corpus. So HDFS handles all that for us so then we can run these big mass of MapReduce jobs off them.
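As an aside, the "decentralized unique IDs" idea Raffi describes can be sketched in a few lines. This is an illustration of the general scheme (a millisecond timestamp, a worker id, and a per-worker sequence packed into one integer, so many machines can hand out unique, roughly time-ordered ids without coordinating), not Twitter's actual Snowflake implementation; the field widths and the absence of a custom epoch are assumptions:

import time

class SnowflakeLikeIds(object):
    "Roughly time-ordered ids, unique per worker, generated with no coordination."
    def __init__(self, worker_id):
        self.worker_id = worker_id % 1024   # assume 10 bits for the worker id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self):
        ms = int(time.time() * 1000)
        if ms == self.last_ms:
            self.sequence = (self.sequence + 1) % 4096   # assume a 12-bit sequence
        else:
            self.sequence = 0
            self.last_ms = ms
        return (ms << 22) | (self.worker_id << 12) | self.sequence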

InfoQ: Twitter is an amazing place for engineers. What does the growth path of an engineer at Twitter look like? In particular, how does one become a successful geek like you? Would you please give us some advice?

Raffi: Well, I can't say I'm a successful engineer since I don't write software anymore these days. I started at Twitter as an engineer, and I've risen into this position of running a lot of engineering these days. Twitter has a couple of different philosophies and mentalities around it, but we have a career path for engineers which basically involves tackling harder and harder and harder problems. We would like to say that it doesn’t actually matter how well the feature you built does. In some cases, it does. But really like what's the level of technical thought and technical merit you've put into the project you work on. So growth through Twitter is done very much in a peer-based mechanism. So for example, to talk very concretely about promotions. To be promoted from one level to the next level at Twitter requires consensus -- not consensus but requires a bunch of engineers at that higher level to agree that yes, you've done the work needed in order to get to this level at Twitter. To help with that, managers make sure projects go to engineers that are looking for big challenges. Engineers can move between teams. They're not stuck on the tweet team, for example, or a timeline team. If an engineer says, "I want to work on the mobile team because that's interesting. I think there's career growth for me. In fact, my job as a person that manages a lot of this is to make that possible." So you can do almost whatever you want within Twitter. I tell you what my priorities are in running engineering and what the company's priorities are in either user growth or money or features you want to build. And then engineers should flow to the projects that they think they can make the biggest impact on. And then on top of that, I run a small university within Twitter that we call Twitter University. It's a group of people whose whole job is training. So for example, if an engineer wants to join the mobile team but they are a back-end Java developer, we're like "Great. We've created a training class so you can learn Android engineering or iOS engineering and you can take a one weeklong class that will get you to the place that you've committed to that codebase and then you can join that team for real." So this gives you a way to sort of expand your horizons within Twitter and a way to safely decide whether or not you want to go and try something new. So we invest in our engineers because honestly they're the backbone of the company. The engineers build the thing that we all thrive on within Twitter and the world uses, so I give them as many opportunities as I can in order to try different things and to geek out in lots of different ways.

About the Interviewee

Raffi Krikorian is Vice President of Platform Engineering at Twitter. His teams manage the business logic, scalable delivery, APIs, and authentication of Twitter's application. His group helped create the iOS 5 Twitter integration as well as the "The X Factor" + Twitter voting mechanism.
