Damien Katz: Follow up to "The Unreasonable Effectiveness of C"

URL:http://damienkatz.net/2013/01/follow_up_to_the_unreasonable.html



My post The Unreasonable Effectiveness of C generated a ton of discussion on Reddit and Hacker News, nearly 1200 comments combined, as people got into all sorts of heated arguments. I also got a bunch of private correspondence about it.

So I'm going to address some of the most common questions, feedback and misunderstandings it has generated.

Is C the best language for everything?

Hell no! Higher level languages, like Python and Ruby, are extremely useful and should definitely be used where appropriate. Java has a lot of advantages, and so does C++. Erlang is amazing. Almost every popular language has uses where it's the better choice.

But when both raw performance and reliability are critical, C is very very hard to beat. At Couchbase we need industrial grade reliability without compromising performance.

I love me some Erlang. It's very reliable and predictable, and the whole design of the language is about robustness, even in the face of hardware failures. Just because we experienced a crash problem in the core of Erlang shouldn't tarnish its otherwise excellent track record.

However, it's not fast enough for our needs or our customers' needs. This is key: the hard work to make our code as efficient and fast as possible in C now benefits our many thousands of Couchbase server deployments all over the world, saving a ton of money and resources. It's an investment that is paid back many, many times.

But for most projects the extra engineering cost isn't worth it. If you are building something that's only used by your organization, or by a small number of customers, your money is likely better spent on faster or more hardware than on very expensive engineers coding, testing and debugging C code. There is a good chance you don't have the same economies of scale we do at Couchbase, where the costs are spread over a large number of customers.

Don't just blindly use C; understand its own tradeoffs and whether it makes sense in your situation. Erlang is quite good for us, but to stay competitive we need to move on to something faster and industrial grade for our performance-oriented code. And Erlang itself is written in C.

If a big problem was C code in Erlang, why would using more C be good?

Because it's easier to debug when you don't lose context between the "application" layer and the lower level code. The big problem we've seen is that when C code is called from higher level code in the same process, we lose all the debugging context between the higher level code and the underlying C code.

So when we were getting these crashes, we didn't have the expertise and tooling to figure out what exactly the Erlang code was doing at the moment it crashed. Erlang is highly concurrent and many different things were all being executed at the same time. We knew it had something to do with the async IO settings we were using in the VM and the opening and closing of files, but exactly what or why still eluded us.

Also, we couldn't reproduce the crash with test code, though we tried, which made it hard to report the issue to the Erlang maintainers. We had to run the full Couchbase stack under heavy load in order to trigger the crash, and it would often take 6 or more hours before we saw it. This made debugging problematic, as we had confounding factors of our own: in-process C code that could also have been the source of the crashes.

In the end, we found through code inspection that the problem was Erlang's disk-based sorting code, the compression options it was using, and the interaction with how Erlang closes files. When Erlang closed files with the compression option, it would occasionally hit a race condition low down in the VM that led to a dangling pointer and a double-free. If we hadn't lost all the context between the Erlang user code and the underlying C code, we could have tracked this problem down much sooner. We would have had a complete C stack trace of what our code was doing when the library code crashed, allowing us to narrow down the flawed C code and modules very quickly.
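The failure mode described above, two code paths racing to close the same file so that it gets freed twice, is a classic C bug. Below is a minimal sketch of one common guard, assuming a C11 compiler with `<stdatomic.h>`. The `file_handle` type and `handle_close` function are hypothetical illustrations and are not taken from Erlang's VM. `atomic_exchange` swaps the stored pointer for `NULL` and returns the old value in one atomic step, so only one caller ever receives the `FILE *` to close.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical handle type for illustration only (not Erlang's file driver). */
typedef struct {
    _Atomic(FILE *) fp;
} file_handle;

/* Close the underlying file at most once.
   atomic_exchange stores NULL and returns the previous pointer atomically,
   so if two threads race here, exactly one of them gets the non-NULL FILE*
   and calls fclose; the other sees NULL and returns without touching it. */
bool handle_close(file_handle *h) {
    assert(h != NULL);
    FILE *old = atomic_exchange(&h->fp, NULL);
    if (old == NULL) {
        /* Already closed: calling fclose again here would be a double-free. */
        return false;
    }
    fclose(old);
    return true;
}
```

This guards only the close itself. A thread that loaded the pointer before the exchange could still use the closed `FILE *`, which is the dangling-pointer half of the bug, so real code would also need a lock or reference counting around every use of the handle.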

Why isn't C++ a suitable replacement for C?

Often it is, but the problem with C++ is that you have to be very disciplined to use it without complicating or obfuscating your code unnecessarily. It's also not as portable to as many environments (particularly embedded), and it tends to have much longer compilation and build times, which hurts developer productivity.

C++ is also a complicated mess, so when you adopt C++ for its libraries and community, you have to take the good with the bad and the weird to get the benefits. And there is a lot of disagreement about what constitutes bad or weird. Your sane subset of the language is very likely to be at odds with others' ideas of a sane subset. C has this problem to a much, much smaller degree.

What about Go as a replacement for C?

Perhaps someday. Right now Go is far slower than C. It also doesn't give as much control over memory, since it's garbage collected. It's not as portable, and you also can't host Go code in other environments or language VMs, which limits what you can do with your code.

Go, however, has a lot of momentum and a very bright future; they've made some very nice and pragmatic choices in its design. If it continues to flourish, I expect every objection I listed, except for the garbage collection, will eventually be addressed.

What about D as a replacement for C?

It's not there yet, for the same reasons as Go. It's possible that someday it will be suitable, but I'm less optimistic about it strictly from a momentum perspective: it doesn't have a big backer like Google and doesn't seem to be growing very rapidly. But perhaps it will get there someday.

Is there anything else that could replace C?

I don't know a lot of what's out there on the horizon, and there are some efforts to create a better C. But for completely new languages as a replacement, I'm most hopeful and optimistic about Mozilla's Rust. It's designed to be fast and portable, callable from any language/environment much like C, with no garbage collection yet still safe from buffer overruns, leaks and race conditions. It also has Erlang style concurrency features built in.

But it's a very young and rapidly evolving language. The performance is not yet close to C. The syntax might be too foreign for the masses to ever hit the mainstream, and it may suffer the same niche fate as Erlang because of that.

However if Rust achieves its stated goals, C-like performance but safe with Erlang concurrency and robustness built in, it would be the language of my dreams. I'll be watching its progress very closely.

That's just, like, your opinion, man

Yes, my post was an opinion piece.

But I'm not new to this programming game. I've done this professionally since 1995.

I've coded GUIs using very high level languages like HTML/JavaScript and Visual Basic, and with lower level languages like Java, C and C++.

I've built a ton of backend code in C, C++ and Erlang. I've written in excess of 100k lines of C and C++ code. I've easily read, line by line, 300k lines of C code.

I've written a byte code VM in C++ that's been deployed on 100 million+ desktops and hundreds of thousands of servers. I used C++ inheritance, templates, exceptions, custom memory allocation and a bunch of other features I thought were very cool at the time. Now I feel bad for the people who have to maintain it.

I've fixed many bugs in MySQL and its C++ codebase. I wrote the Enterprise Edition thread pooling and evented network IO feature that increases client scalability by over an order of magnitude.

I also created and wrote, from scratch, Apache CouchDB, including the storage engine and tail-append btrees, the incremental Map/Reduce indexer and query engine, master/master replication with conflict management, and the HTTP API, plus a zillion small details necessary to make it all work.

In short, I have substantial real world experience in projects used by millions of people every day. Maybe I know what I'm talking about.

So while most of what I wrote is my opinion and difficult to back up with hard data, it's born from being cut so many times by the newest and coolest stuff. My view of C has changed over the years. I used to think the older guys who loved C were just behind the times. Now I see why many of them felt that way: they saw what is traded away when you stray from the simple and effective.

Think about the most widely used backend projects around and see how they are able to get both reliability and performance. Chances are, they are using plain C. That's not just a coincidence.

Follow me on Twitter for more of my coding opinions and updates on Couchbase progress.

Posted January 17, 2013 11:27 AM
