
Fascinating Graphs Show How Reddit Got Huge by Going Mainstream | Wired Design | Wired.com


Comments:"Fascinating Graphs Show How Reddit Got Huge by Going Mainstream | Wired Design | Wired.com"

URL:http://www.wired.com/design/2014/01/the-gentrification-of-reddit-in-a-few-great-graphs/


A detail of Randy Olson’s stacked area chart of reddit. Fuuuuuuu. Image: Randy Olson

Here's 2006. In the site's early days, Reddit.com was the main attraction, but in February they added the first topical subreddits. Image: Randy Olson

2007. Nerd-friendly subreddits thrive. A celebrity-centric Lipstick.com section is replaced by r/entertainment. Image: Randy Olson

2008. The year of Reddit's "Cambrian Explosion," as Olson puts it. Why? This was the first point Reddit allowed users to create their own subreddits. Image: Randy Olson

2009. The subredditification continues. Olson's not sure what caused the /r/reddit.com spike in the middle of the year. Image: Randy Olson

2010. Olson calls this "the year gamers and porn invaded," partially thanks to an influx of users from the then-recently-defunct Digg.com. Image: Randy Olson

2011. The year of the memes. Also the year mods pulled the plug on the catchall subreddit r/reddit.com. Image: Randy Olson

2012. The Reddit of today, dominated by images and other more mainstream, accessible content. Image: Randy Olson

Recently, we were treated to a clever interactive map of the popular news site reddit, created by Michigan State University PhD student Randy Olson. Instead of one big group of horny programmers, Olson’s map revealed a vast constellation of communities dedicated to all sorts of different topics. With these graphs, Olson shows how reddit developed into that diverse ecosystem. The keystone species? Sure enough: Horny programmers!

The most revealing image from Olson’s research is this stacked area image of the site’s 24 most popular subreddits over the years. It gives us a stratigraphic look at reddit’s past. Olson calls it the evolution of reddit. You could also say it shows the site’s gentrification. Layer by layer, we see the influx of safe, accessible content that has allowed reddit to expand to its massive size today.

Still, the graph shows the site’s beginnings in the primordial muck of porn and programming. The r/NSFW subreddit was one of the first, Olson explains (though he notes that the site was never 100 percent NSFW; his graph leaves out /r/reddit.com for purposes of legibility).

A stacked area graph showing the evolution of Reddit. Image: Randy Olson

The biggest change came around 2008, the year reddit gave users the ability to create subreddits. This introduction led to reddit’s “Cambrian Explosion,” as Olson puts it. The year starts with an uptick in politics, coincident with the run-up to the 2008 presidential election. Around that same time, you see the supernova of activity that gave us the reddit we know today: subreddits “WTF,” “funny,” “Ask reddit,” and more.

These days, Olson points out, four of the five top subreddits are dedicated to sharing images. “From this brief survey, it becomes abundantly clear that the primary content of reddit nowadays is pictures and videos,” he writes on the project page. “This trend makes sense, too: Pictures are easy content to produce and take only a few seconds to look at, enjoy, and upvote.”

In other words, reddit’s media diet has transformed along with the rest of the internet’s. Even amongst the site’s one-step-ahead-of-the-mainstream audience, quick hits, funny pictures, and cute cat videos rule the day.


Wave Group Video Chat App For Sale


Comments:"Wave Group Video Chat App For Sale"

URL:http://www.trywaveapp.com/techsale/


Here's the offer:

  • Exclusive ownership of our proprietary multi-party video chat technology for iPhone. This is 5k lines of Objective-C on the phone, and 4k lines of backend Python+Cython. Getting this right required expertise in video and audio encoding, deep systems-level understanding of the iPhone platform, years of networking and distributed systems experience, and a hefty dose of creativity.
  • Google Hangouts and ooVoo don't work for group chat. No existing apps work well when chatting with groups on mobile devices. They all get slow, choppy and unusable as you add participants to the chat. We figured out how to make it work well with up to 6.
  • Try it for yourself in the App Store. The purchaser will receive the latest code, and full development history, for all the frontend and backend code that implements this app.
  • As a bonus, Lincoln will come on-site for a week to help you integrate this technology into your codebase. Lincoln is the CTO and primary developer of the technology.

Bid Now for the Code + 1 Week Consulting Integration

Contact Lincoln by email lincoln@trywaveapp.com or in the chat box below with questions.

Bid Now on eBay

Details & Technical Specifications

See the discussion on Hacker News here.

What you don't get:
  • Sorry, you don't get to hire the founders. We're starting a new project.
  • We are keeping ownership of the Wave brand, and shutting down the app. So you don't get the name, logo, or userbase. (We want to reuse the penguin logo and name, and the userbase is only in the 4 figures)
  • Since we will be working on a new project, we can't give technical support for the code beyond the one week integration period. (But you can pay for 1 more week of Lincoln's time - see below.)
More details about what you get:
  • You'll get access to the Git repos for our source code:
    • iPhone app code - 15k lines of objective-c, plus libraries. The code to support video chatting is about 5k lines.
    • Backend video server code - 3k lines of Python and 1k lines of Cython, plus libraries.
    • Backend Parse Cloud Code - 1k lines of Javascript that runs on Parse. Very little of this (only about 100 lines, at most) supports video chat.
  • With the source code comes our proprietary implementations of live drawing on top of the video, and hardware h.264 encoding. These were also tricky to get right, and are probably of value on their own.
  • One working week of Lincoln's time for integration and training. This can be flexibly scheduled. If you aren't based in NYC, you will be responsible for up to $1000 of Lincoln's travel costs.
  • The option to buy another week of Lincoln's time at the price of $20k plus travel costs.
Technical specs
  • Our tech implements server-mediated video chat for up to 6 participants. It requires some processing power on the server, but it is designed with a "scale-out" architecture so you should be able to get it running cheaply at scale without major rewrites. It's currently iOS only, but it should work to copy the iPhone code's structure to implement clients for all relevant platforms -- Android, WinPhone, Blackberry, web, and/or desktop. Performance will be critical for slower devices, so you'll need to implement it carefully.
  • The video codec is h.264 at 144x144. (We experimented with up to 320x320 and found that with current connection quality on phones, especially on urban wifi and 3G, sticking with a low-resolution default tends to produce the best user experience.)
  • We encode video in hardware on the phone (but decode it in software).
  • The audio codec is Opus at 16 KHz.
  • We use UDP for most of the protocol. Our protocol is based on RTP, but we modified it in incompatible ways. We don't use SIP at all. We use TCP for the live drawing.

Copyright © Chime Inc, 2013

All Norwegians become crown millionaires, in oil saving landmark | Reuters


Comments:" All Norwegians become crown millionaires, in oil saving landmark | Reuters "

URL:http://www.reuters.com/article/2014/01/08/us-norway-millionaires-idUSBREA0710U20140108


By Alister Doyle

OSLO, Wed Jan 8, 2014 12:25pm EST

OSLO (Reuters) - Everyone in Norway became a theoretical crown millionaire on Wednesday in a milestone for the world's biggest sovereign wealth fund that has ballooned thanks to high oil and gas prices.

Set up in 1990, the fund owns around 1 percent of the world's stocks, as well as bonds and real estate from London to Boston, making the Nordic nation an exception when others are struggling under a mountain of debts.

A preliminary counter on the website of the central bank, which manages the fund, rose to 5.11 trillion crowns ($828.66 billion), fractionally more than a million times Norway's most recent official population estimate of 5,096,300.

It was the first time it reached the equivalent of a million crowns each, central bank spokesman Thomas Sevang said.

Not that Norwegians will be able to access or spend the money, squirreled away for a rainy day for them and future generations. Norway has resisted the temptation to splurge all the windfall since striking oil in the North Sea in 1969.

Finance Minister Siv Jensen told Reuters the fund, called the Government Pension Fund Global, had helped iron out big, unpredictable swings in oil and gas prices. Norway is the world's number seven oil exporter.

"Many countries have found that temporary large revenues from natural resource exploitation produce relatively short-lived booms that are followed by difficult adjustments," she said in an email.

The fund, equivalent to 183 percent of 2013 gross domestic product, is expected to peak at 220 percent around 2030.

"The fund is a success in the sense that parliament has managed to put aside money for the future. There are many examples of countries that have mot managed that," said Oeystein Doerum, chief economist at DNB Markets.

Norway has sought to avoid the boom and bust cycle by investing the cash abroad, rather than at home. Governments can spend 4 percent of the fund in Norway each year, slightly more than the annual return on investment.

Still, in Norway, oil wealth may have made the state reluctant to push through reforms or to cut subsidies that would be unthinkable elsewhere. Farm subsidies, for instance, allow farmers to keep dairy cows in heated barns in the Arctic.

It may also have made some Norwegians reluctant to work. "One in five people of working age receives some kind of social insurance instead of working," Doerum said, despite an official unemployment rate of 3.3 percent.

(Reporting by Alister Doyle; Editing by Alison Williams)

Airlock - Facebook's mobile A/B testing framework | Engineering Blog | Facebook Code | Facebook


Comments:"Airlock - Facebook's mobile A/B testing framework | Engineering Blog | Facebook Code | Facebook"

URL:https://code.facebook.com/posts/520580318041111/airlock-facebook-s-mobile-a-b-testing-framework/


Two years ago, we rewrote our mobile apps on iOS and Android to use the native development stacks in place of the custom web-stack we had been developing. This gave us finer control over when and how items were downloaded, cached, and freed. It also opened up access for deeper integration into the respective operating systems and revealed a full toolbox for tuning and tweaking all systems under the hood.

Testing is an important part of our development, but in switching to native, we lost the ability to A/B test. Not every test makes it into production, but even failed tests help us understand how to improve. Losing some of this ability became a challenge.

A/B Testing

Shipping our apps on iOS and Android requires developers from many different teams to coordinate and produce a new binary packed with new features and bug fixes every four weeks. After we ship a new update, it's important for us to understand how:

• New features perform

• The fixes improved performance and reliability

• Improvements to the user interface change how people use the app and where they spend their time

In order to analyze these objectives, we needed a mobile A/B testing infrastructure that would let us expose our users to multiple versions of our apps (version A and version B), which are the same in all aspects except for some specific tests. So we created Airlock, a testing framework that lets us compare metric data from each version of the app and the various tests, and then decide which version to ship or how to iterate further.

Build from scratch

We began with the simplest experiment possible--using the A/B binning system we had for our web-stack, we constructed an experiment to test changing a chat icon into the word “Chat.” When the app started up, it would send a network request to our servers requesting the parameters for that experiment. Once the response came back, we would update the button and voilà, some employees had the icon and others had the word “Chat.” Our expectation was that the only effect it would have would be to impact the amount of messages sent and likely not by much, and that no other metrics would move.

Exposure logging

Once that version of the app went public, we waited for the data to stabilize and found that the people who saw the word “Chat” were much more engaged with the app. Had we found some secret, magic like-incentive? Sadly, no. We had arrived at a pile of bugs, the main issue being that one of the many components was incorrectly caching the value. With a system this large, the infrastructure had to be bulletproof or the data gathered was useless.

The data pipeline began with the server deciding which variant a given person belonged to. Then that value had to be packaged up and sent to the device, which would parse the response and store it. Next, the value would be used to reconfigure the UI and then it was finally available onscreen. The problem was that we were relying on the server’s categorization for our data analysis. A single bug led to a large number of people seeing a different variant than we had anticipated. The server was insisting, “I told the device to show the string!” but somewhere along the way the statement became a little fuzzy (a bug in the client storage logic).

Deployment graph for a given experiment

The graph above shows how an experiment was deployed. The light green bar is the number of people in the experiment and the dark green bar is the actual number of affected users. As we can see, the difference between the server data and the device data is pretty large: on the first day, most people received the configuration, but most of them did not actually see our experiment.

This issue is coupled with a second problem: it is not enough to know what the device was told; our data analysts also need to know when the device received the information and showed it correctly in the UI. Even if the information arrives correctly, there is a delay during which the UI is incorrect. We solved this by adding a two-way handshake. The device requests the data for an experiment and the server logs the response that it sends out. When the client actually uses the data, it logs that on the server. Therefore, even if someone does not see what we want, we can still perform the correct analysis (but have to beware of selection bias or skew if the distribution ends up uneven for any reason).

Scale it up

Over the course of a few months, we had to scale our system from supporting two experiments to supporting many throughout the entire app. The experiment that drove the evolution of Airlock was a project we started with the intent of evolving and simplifying the navigation model within our apps. Over the course of a few months we tested making the left-hand drawer narrower with only icons, putting a tab-bar at the bottom of the screen with your timeline in it, combining friend requests and notifications into one tab in a tab-bar, and eventually landing on the tab-bar design that is now the user interface for the Facebook for iPhone app. We built a bunch of different versions of the UI that didn't make the cut, but that's the nature of testing.

The creation of Airlock helped us ship a navigation model that feels slicker, is easier to use one-handed, and keeps better track of your state in the app. This tool has allowed us to now scale the framework to support 10 or 15 different variations of a single experiment and put it in the hands of millions of people using our apps. We had to relearn the rules of not letting one experiment pollute another, keeping some experiments dependent and others exclusive, and how to ensure the logging was correct in the control group. The last bit was tricky because sometimes a control group means that some piece of UI doesn’t exist. How does one log that someone did not go to a place that doesn’t exist? Here we learned to log both the decision on which UI to construct and then separately to log the interaction with it.

As the framework scaled to support more experiments, the number of parameter requests, the amount of data logging, and the client-side computation began to rise very quickly. The framework needed to be fast on the client in order to have experiments ready without blocking any of the startup path, so we optimized the cold-start performance of our apps so that basic, critical configurations could be loaded when the app starts and all heavy work was deferred until after the app's UI is displayed. Likewise, we had to tune the interaction between the device and the server, minimizing the data flow and simplifying the amount of data processing on both ends.

Airlock has made it possible to test on native and improve our apps faster than ever. With the freedom to test, re-test, and evaluate the results, we’re looking forward to building better and better tests and user experiences.

Implementing a JIT Compiler with Haskell and LLVM ( Stephen Diehl )


Comments:"Implementing a JIT Compiler with Haskell and LLVM ( Stephen Diehl )"

URL:http://www.stephendiehl.com/llvm/#chapter-1-introduction


Adapted by Stephen Diehl ( @smdiehl )

This is an open source project hosted on Github. Corrections and feedback always welcome.

The written text is licensed under the LLVM License and is adapted from the original LLVM documentation. The new Haskell source is released under the MIT license.

Welcome to the Haskell version of the "Implementing a language with LLVM" tutorial. This tutorial runs through the implementation of a simple language, showing how fun and easy it can be. It will get you up and running and help you build a framework you can extend to other languages. The code in this tutorial can also be used as a playground to hack on other LLVM-specific things. This tutorial is the Haskell port of the C++, Python and OCaml Kaleidoscope tutorials. Although most of the original meaning of the tutorial is preserved, most of the text has been rewritten to incorporate Haskell.

An intermediate knowledge of Haskell is required. We will make heavy use of monads and transformers without pause for exposition. If you are not familiar with monads, applicatives and transformers then it is best to learn these topics before proceeding. Conversely if you are an advanced Haskeller you may notice the lack of modern techniques which could drastically simplify our code. Instead we will shy away from advanced patterns since the purpose is to instruct in LLVM and not Haskell programming. Whenever possible we will avoid cleverness and just do the "stupid thing".

The overall goal of this tutorial is to progressively unveil our language, describing how it is built up over time. This will let us cover a fairly broad range of language design and LLVM-specific usage issues, showing and explaining the code for it all along the way, without overwhelming you with tons of details up front.

It is useful to point out ahead of time that this tutorial is really about teaching compiler techniques and LLVM specifically, not about teaching modern and sane software engineering principles. In practice, this means that we'll take a number of shortcuts to simplify the exposition. If you dig in and use the code as a basis for future projects, fixing these deficiencies shouldn't be hard.

I've tried to put this tutorial together in a way that makes chapters easy to skip over if you are already familiar with or are uninterested in the various pieces. The structure of the tutorial is:

  • Chapter #1: Introduction to the Kaleidoscope language, and the definition of its Lexer - This shows where we are going and the basic functionality that we want it to do. We will build the lexer and parser by hand rather than using lexer and parser generators; LLVM obviously works just fine with such tools, feel free to use one if you prefer.

  • Chapter #2: Implementing a Parser and AST - With the lexer in place, we can talk about parsing techniques and basic AST construction. This tutorial describes recursive descent parsing and operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific, the code doesn't even link in LLVM at this point. :)

  • Chapter #3: Code generation to LLVM IR - With the AST ready, we can show off how easy generation of LLVM IR really is.

  • Chapter #4: Adding JIT and Optimizer Support - Because a lot of people are interested in using LLVM as a JIT, we'll dive right into it and show you the 3 lines it takes to add JIT support. LLVM is also useful in many other ways, but this is one simple and "sexy" way to show off its power. :)

  • Chapter #5: Extending the Language: Control Flow - With the language up and running, we show how to extend it with control flow operations (if/then/else and a ‘for' loop). This gives us a chance to talk about simple SSA construction and control flow.

  • Chapter #6: Extending the Language: User-defined Operators - This is a silly but fun chapter that talks about extending the language to let the user program define their own arbitrary unary and binary operators (with assignable precedence!). This lets us build a significant piece of the "language" as library routines.

  • Chapter #7: Extending the Language: Mutable Variables - This chapter talks about adding user-defined local variables along with an assignment operator. The interesting part about this is how easy and trivial it is to construct SSA form in LLVM: no, LLVM does not require your front-end to construct SSA form!

  • Chapter #8: Conclusion and other useful LLVM tidbits - This chapter wraps up the series by talking about potential ways to extend the language, but also includes a bunch of pointers to info about "special topics" like adding garbage collection support, exceptions, debugging, support for "spaghetti stacks", and a bunch of other tips and tricks.

This tutorial will be illustrated with a toy language that we'll call Kaleidoscope (derived from "meaning beautiful, form, and view" or "observer of beautiful forms"). Kaleidoscope is a procedural language that allows you to define functions, use conditionals, math, etc. Over the course of the tutorial, we'll extend Kaleidoscope to support the if/then/else construct, a for loop, user defined operators, JIT compilation with a simple command line interface, etc.

If you do not have Haskell set up, it is recommended that you install the Haskell Platform. This will provide you with GHC, cabal and most of the Haskell libraries needed for building our compiler.

You will of course also need LLVM 3.3 (not 3.2 or earlier) installed on your system. Run the command for your Linux distribution:

$ pacman -S llvm # Arch Linux
$ apt-get install llvm # Debian/Ubuntu
$ emerge llvm # Gentoo
$ yum install llvm # Fedora

The included "kaleidoscope.cabal" will install the necessary Haskell bindings. It is recommended that you work within a sandbox:

$ cabal sandbox init
$ cabal configure
$ cabal install --only-dependencies

Because we want to keep things simple, the only datatype in Kaleidoscope is a 64-bit floating point type (aka ‘double' in C parlance). As such, all values are implicitly double precision and the language doesn't require type declarations. This gives the language a very nice and simple syntax. For example, the following simple example computes Fibonacci numbers:

# Compute the x'th fibonacci number.
def fib(x)
  if x < 3 then
    1
  else
    fib(x-1)+fib(x-2)

# This expression will compute the 40th number.
fib(40)

We also allow Kaleidoscope to call into standard library functions (the LLVM JIT makes this completely trivial). This means that we can use the ‘extern' keyword to define a function before we use it (this is also useful for mutually recursive functions). For example:

extern sin(arg);
extern cos(arg);
extern atan2(arg1 arg2);
atan2(sin(.4), cos(42))

A more interesting example is included in Chapter 6 where we write a little Kaleidoscope application that displays a Mandelbrot Set at various levels of magnification.

Let's dive into the implementation of this language!

A typical compiler pipeline will consist of several stages. The middle phase will often consist of several representations of the code to be generated known as intermediate representations.

LLVM is a statically typed intermediate representation and an associated toolchain for manipulating, optimizing and converting this intermediate form into native code. LLVM code comes in two flavors, a binary bitcode format (.bc) and assembly (.ll). The command line tools llvm-dis and llvm-as can be used to convert between the two forms. We'll mostly be working with the human readable LLVM assembly and will just refer to it casually as IR and reserve the word assembly to mean the native assembly that is the result of compilation. An important note is that the binary format for LLVM bitcode starts with the magic two byte sequence ( 0x42 0x43 ) or "BC".

An LLVM module consists of a sequence of toplevel mutually scoped definitions of functions, globals, type declarations, and external declarations.

Symbols used in an LLVM module are either global or local. Global symbols begin with @ and local symbols begin with %. All symbols must be defined or forward declared.

declare i32 @putchar(i32)
define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}

define void @main() {
  %1 = call i32 @add(i32 0, i32 97)
  call i32 @putchar(i32 %1)
  ret void
}

An LLVM function consists of a sequence of basic blocks containing a sequence of instructions and assignments to local values. During compilation basic blocks will roughly correspond to labels in the native assembly output.

define double @main(double %x) {
entry:
  %0 = alloca double
  br label %body
body:
  store double %x, double* %0
  %1 = load double* %0
  %2 = fadd double %1, 1.000000e+00
  ret double %2
}

First class types in LLVM align very closely with machine types. Alignment and platform specific sizes are detached from the type specification in the data layout for a module.

i1            An unsigned 1 bit integer
i32           An unsigned 32 bit integer
i32*          A pointer to a 32 bit integer
i32**         A pointer to a pointer to a 32 bit integer
double        A 64-bit floating point value
float (i32)   A function taking an i32 and returning a 32-bit floating point value
<4 x i32>     A width-4 vector of 32-bit integer values
{i32, double} A struct of a 32-bit integer and a double
<{i8*, i32}>  A packed structure of an integer pointer and a 32-bit integer
[4 x i32]     An array of four i32 values
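To connect this to the Haskell bindings used later, here is a small sketch (not part of the original tutorial; the module and constructor names assume llvm-general-pure 3.3.x) of how a few of the types above are spelled with the AST constructors:

import LLVM.General.AST            -- assumed module name (llvm-general-pure)
import LLVM.General.AST.AddrSpace  -- assumed module name (llvm-general-pure)

i32, i32ptr, dbl, vec, struct, arr :: Type
i32    = IntegerType 32                    -- i32
i32ptr = PointerType i32 (AddrSpace 0)     -- i32*
dbl    = FloatingPointType 64 IEEE         -- double
vec    = VectorType 4 i32                  -- <4 x i32>
struct = StructureType False [i32, dbl]    -- {i32, double}
arr    = ArrayType 4 i32                   -- [4 x i32]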

While LLVM is normally generated procedurally we can also write it by hand. For example consider the following minimal LLVM IR example.

declare i32 @putchar(i32)
define void @main() {
 call i32 @putchar(i32 42)
 ret void
}

This will compile (using llc) into the following platform specific assembly. For example, with march=x86-64 on a Linux system we generate output like the following:

 .file "minimal.ll"
 .text
 .globl main
 .align 16, 0x90
 .type main,@function
main:
 movl $42, %edi
 jmp putchar 
.Ltmp0:
 .size main, .Ltmp0-main
 .section ".note.GNU-stack","",@progbits

What makes LLVM so compelling is that it lets us write our assembly-like IR as if we had an infinite number of CPU registers and abstracts away the register allocation and instruction selection. LLVM IR also has the advantage of being mostly platform independent and retargetable, although there are some details about calling conventions, vectors, and pointer sizes which make it not entirely independent.

As part of the Clang project LLVM is very well suited for compiling C-like languages, but it is nonetheless a very adequate toolchain for compiling both imperative and functional languages. Some notable languages using LLVM include:

  • Idris - A dependently typed general purpose language
  • Rust - A general purpose systems language
  • Parakeet - Numeric specializer for Python
  • Cloudera Impala - Open-source real-time query engine for Apache Hadoop
  • Disciple - An experimental Haskell-like language with effect typing
  • Haskell - GHC has an LLVM compilation path that is enabled with the -fllvm flag. The library ghc-core can be used to view the IR compilation artifacts.

See src/chapter1 for the full source from this chapter.

For parsing in Haskell it is quite common to use a family of libraries known as parser combinators, which let us write parser-generating code that looks very similar to the BNF (Backus–Naur Form) of the grammar itself!

Structurally, a parser combinator is a higher-order function which takes other parsing functions as input and returns a new parser as its output. Our lexer will consist of functions which operate directly on matching string inputs and are composed with a variety of common combinators yielding the full parser. The Parsec library exposes a collection of combinators:

<|>       The choice operator tries to parse the first argument before proceeding to the second. Can be chained sequentially to generate a sequence of options.
many      Consumes an arbitrary number of patterns matching the given pattern and returns them as a list.
many1     Like many but requires at least one match.
optional  Optionally parses a given pattern, returning its value as a Maybe.
try       Backtracking operator that lets us parse ambiguous matching expressions and restart with a different pattern.

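As a quick, self-contained illustration of how these combinators compose (this snippet is not part of the tutorial's source; the parsers ident' and token' are made up for the example):

import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more letters, optionally followed by a trailing prime character.
ident' :: Parser String
ident' = do
  name <- many1 letter
  optional (char '\'')
  return name

-- Choice with backtracking: try an identifier first, otherwise digits.
token' :: Parser String
token' = try ident' <|> many1 digit

-- parseTest token' "foo'"  prints "foo"
-- parseTest token' "42"    prints "42"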
Our initial language has very simple lexical syntax.

integer: 1, -2, 42

integer :: Parser Integer
integer = Tok.integer lexer

float: 3.14, 2.71, 0.0

float :: Parser Double
float = Tok.float lexer

identifier: a, b, foo, ncc1701d

identifier :: Parser String
identifier = Tok.identifier lexer

And several tokens which enclose other token(s), returning a composite expression.

parens :: Parser a -> Parser a
parens = Tok.parens lexer

semiSep :: Parser a -> Parser [a]
semiSep = Tok.semiSep lexer

commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer

Lastly, our lexer requires that several tokens be reserved and not used as identifiers; we reference these separately.

reserved: def, extern

reservedOp: +, *, -, ;

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

Putting it all together we have our Lexer.hs module.

module Lexer where

import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)

import qualified Text.Parsec.Token as Tok

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    ops = ["+","*","-",";"]
    names = ["def","extern"]
    style = emptyDef {
               Tok.commentLine = "#"
             , Tok.reservedOpNames = ops
             , Tok.reservedNames = names
             }

integer :: Parser Integer
integer = Tok.integer lexer

float :: Parser Double
float = Tok.float lexer

parens :: Parser a -> Parser a
parens = Tok.parens lexer

commaSep :: Parser a -> Parser [a]
commaSep = Tok.commaSep lexer

semiSep :: Parser a -> Parser [a]
semiSep = Tok.semiSep lexer

identifier :: Parser String
identifier = Tok.identifier lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

reservedOp :: String -> Parser ()
reservedOp = Tok.reservedOp lexer

The AST for a program captures its behavior in such a way that it is easy for later stages of the compiler (e.g. code generation) to interpret. We basically want one object for each construct in the language, and the AST should closely model the language. In Kaleidoscope, we have expressions, a prototype, and a function object. When parsing with Parsec we will unpack tokens straight into our AST which we define as the Expr algebraic data type:

module Syntax where

type Name = String

data Expr
  = Float Double
  | BinOp Op Expr Expr
  | Var String
  | Call Name [Expr]
  | Function Name [Expr] Expr
  | Extern Name [Expr]
  deriving (Eq, Ord, Show)

data Op
  = Plus
  | Minus
  | Times
  | Divide
  deriving (Eq, Ord, Show)

This is all (intentionally) rather straight-forward: variables capture the variable name, binary operators capture their operation (e.g. Plus, Minus, ...), and calls capture a function name as well as a list of any argument expressions.
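For instance (an illustrative value, not part of the tutorial's modules), the source expression x + foo(y, 4.0) is represented by the following Expr:

example :: Expr
example = BinOp Plus (Var "x") (Call "foo" [Var "y", Float 4.0])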

We create a Parsec parser which will scan an input source and unpack it into our Expr type. The code composes within the Parser monad to generate the resulting parser, which is then executed using the parse function.

module Parser where

import Text.Parsec
import Text.Parsec.String (Parser)

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

import Lexer
import Syntax

binary s f assoc = Ex.Infix (reservedOp s >> return (BinOp f)) assoc

table = [[binary "*" Times Ex.AssocLeft,
          binary "/" Divide Ex.AssocLeft]
        ,[binary "+" Plus Ex.AssocLeft,
          binary "-" Minus Ex.AssocLeft]]

int :: Parser Expr
int = do
  n <- integer
  return $ Float (fromInteger n)

floating :: Parser Expr
floating = do
  n <- float
  return $ Float n

expr :: Parser Expr
expr = Ex.buildExpressionParser table factor

variable :: Parser Expr
variable = do
  var <- identifier
  return $ Var var

function :: Parser Expr
function = do
  reserved "def"
  name <- identifier
  args <- parens $ many variable
  body <- expr
  return $ Function name args body

extern :: Parser Expr
extern = do
  reserved "extern"
  name <- identifier
  args <- parens $ many variable
  return $ Extern name args

call :: Parser Expr
call = do
  name <- identifier
  args <- parens $ commaSep expr
  return $ Call name args

factor :: Parser Expr
factor = try floating
      <|> try int
      <|> try extern
      <|> try function
      <|> try call
      <|> variable
      <|> parens expr

defn :: Parser Expr
defn = try extern
   <|> try function
   <|> expr

contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

toplevel :: Parser [Expr]
toplevel = many $ do
    def <- defn
    reservedOp ";"
    return def

parseExpr :: String -> Either ParseError Expr
parseExpr s = parse (contents expr) "<stdin>" s

parseToplevel :: String -> Either ParseError [Expr]
parseToplevel s = parse (contents toplevel) "<stdin>" s

The driver for this simply invokes all of the compiler phases in a loop, feeding the resulting artifacts to the next iteration. We will use the haskeline library to give us readline interactions for the small REPL.

module Main where

import Parser

import Control.Monad.Trans
import System.Console.Haskeline

process :: String -> IO ()
process line = do
  let res = parseToplevel line
  case res of
    Left err -> print err
    Right ex -> mapM_ print ex

main :: IO ()
main = runInputT defaultSettings loop
  where
    loop = do
      minput <- getInputLine "ready> "
      case minput of
        Nothing -> outputStrLn "Goodbye."
        Just input -> (liftIO $ process input) >> loop

In under 100 lines of code, we fully defined our minimal language, including a lexer, parser, and AST builder. With this done, the executable will validate Kaleidoscope code, print out the Haskell representation of the AST, and tell us the position information for any syntax errors. For example, here is a sample interaction:

ready> def foo(x y) x+foo(y, 4.0);
Function "foo" [Var "x",Var "y"] (BinOp Plus (Var "x") (Call "foo" [Var "y",Float 4.0]))
ready> def foo(x y) x+y y;
Function "foo" [Var "x",Var "y"] (BinOp Plus (Var "x") (Var "y"))
Var "y"
ready> def foo(x y) x+y );
"<stdin>" (line 1, column 18):
unexpected ")"
expecting float, natural, "extern", "def", identifier, "(" or ";"
ready> extern sin(a);
Extern "sin" [Var "a"]
ready> ^D
Goodbye.

There is a lot of room for extension here. You can define new AST nodes, extend the language in many ways, etc. In the next installment, we will describe how to generate LLVM Intermediate Representation (IR) from the AST.

See src/chapter2 for the full source from this chapter.

This chapter illustrates how to transform the Abstract Syntax Tree, built in Chapter 2, into LLVM IR. This will demonstrate a little bit about how LLVM does things, as well as demonstrate how easy it is to use.

The LLVM bindings for Haskell are split across two packages:

  • llvm-general-pure is a pure Haskell representation of the LLVM IR.

  • llvm-general is the FFI bindings to LLVM required for constructing the C representation of the LLVM IR and performing optimization and compilation.

llvm-general-pure does not require the LLVM libraries be available on the system.

There is an older version of the LLVM bindings on Hackage called llvm; it should likely be avoided since it has not been updated since its development a few years ago.

As an aside, GHCi can have issues with the FFI which can lead to errors when working with llvm-general. If you end up with errors like the following, then you are likely trying to use GHCi or runhaskell and it is unable to link against your LLVM library. Instead compile with standalone GHC.

Loading package llvm-general-3.3.8.2 ... linking ... ghc: /usr/lib/llvm-3.3/lib/libLLVMSupport.a: unknown symbol `_ZTVN4llvm14error_categoryE'
ghc: unable to load package `llvm-general-3.3.8.2'

We start with a new Haskell module Codegen.hs which will hold the pure code generation logic that we'll use to drive building llvm-general's AST. For simplicity's sake we'll insist that all variables be of a single type, the double type.

double :: Type
double = FloatingPointType 64 IEEE

To start we create a new record type to hold the internal state of our code generator as we walk the AST. We'll use two records, one for the toplevel module code generation and one for basic blocks inside of function definitions.

type SymbolTable = [(String, Operand)]

data CodegenState
  = CodegenState {
    currentBlock :: Name                     -- Name of the active block to append to
  , blocks       :: Map.Map Name BlockState  -- Blocks for function
  , symtab       :: SymbolTable              -- Function scope symbol table
  , blockCount   :: Int                      -- Count of basic blocks
  , count        :: Word                     -- Count of unnamed instructions
  , names        :: Names                    -- Name Supply
  } deriving Show

data BlockState
  = BlockState {
    idx   :: Int                             -- Block index
  , stack :: [Named Instruction]             -- Stack of instructions
  , term  :: Maybe (Named Terminator)        -- Block terminator
  } deriving Show

We'll hold the state of the code generator inside of the Codegen State monad; the Codegen monad contains a map of block names to their BlockState representation.

newtype Codegen a = Codegen { runCodegen :: State CodegenState a }
  deriving (Functor, Applicative, Monad, MonadState CodegenState)

At the top level we'll create an LLVM State monad which will hold all code for the LLVM module and upon evaluation will emit an llvm-general Module containing the AST. We'll append to the list of definitions in the AST.Module field moduleDefinitions.

newtype LLVM a = LLVM { unLLVM :: State AST.Module a }
  deriving (Functor, Applicative, Monad, MonadState AST.Module)

runLLVM :: AST.Module -> LLVM a -> AST.Module
runLLVM = flip (execState . unLLVM)

emptyModule :: String -> AST.Module
emptyModule label = defaultModule { moduleName = label }

addDefn :: Definition -> LLVM ()
addDefn d = do
  defs <- gets moduleDefinitions
  modify $ \s -> s { moduleDefinitions = defs ++ [d] }

Inside of our module we'll need to insert our toplevel definitions. For our purposes this will consist entirely of local functions and external function declarations.

define :: Type -> String -> [(Type, Name)] -> [BasicBlock] -> LLVM ()
define retty label argtys body = addDefn $
  GlobalDefinition $ functionDefaults {
    name        = Name label
  , parameters  = ([Parameter ty nm [] | (ty, nm) <- argtys], False)
  , returnType  = retty
  , basicBlocks = body
  }

external :: Type -> String -> [(Type, Name)] -> [BasicBlock] -> LLVM ()
external retty label argtys body = addDefn $
  GlobalDefinition $ functionDefaults {
    name        = Name label
  , parameters  = ([Parameter ty nm [] | (ty, nm) <- argtys], False)
  , returnType  = retty
  , basicBlocks = body
  }

With our monad we'll create several functions to manipulate the current block state so that we can push and pop the block "cursor" and append instructions into the current block.

entry :: Codegen Name
entry = gets currentBlock

addBlock :: String -> Codegen Name
addBlock bname = do
  bls <- gets blocks
  ix  <- gets blockCount
  nms <- gets names
  let new             = emptyBlock ix
      (qname, supply) = uniqueName bname nms
  modify $ \s -> s { blocks = Map.insert (Name qname) new bls
                   , blockCount = ix + 1
                   , names = supply
                   }
  return (Name qname)

setBlock :: Name -> Codegen Name
setBlock bname = do
  modify $ \s -> s { currentBlock = bname }
  return bname

getBlock :: Codegen Name
getBlock = gets currentBlock

modifyBlock :: BlockState -> Codegen ()
modifyBlock new = do
  active <- gets currentBlock
  modify $ \s -> s { blocks = Map.insert active new (blocks s) }

current :: Codegen BlockState
current = do
  c <- gets currentBlock
  blks <- gets blocks
  case Map.lookup c blks of
    Just x  -> return x
    Nothing -> error $ "No such block: " ++ show c

Now that we have the basic infrastructure in place we'll wrap the raw llvm-general AST nodes inside a collection of helper functions to push instructions onto the stack held within our monad.

Instructions in LLVM are either numbered sequentially (%0, %1, ...) or given explicit variable names (%a, %foo, ...). For example, the arguments to the following function are named values, while the result of the add instruction is unnamed.

define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}

In the implementation of llvm-general both of these are represented in a sum type containing the constructors UnName and Name. For most of our purposes we will simply use numbered expressions and map those numbers to identifiers within our symbol table. Every instruction added will increment the internal counter; to accomplish this we add a fresh name supply.

fresh :: Codegen Word
fresh = do
  i <- gets count
  modify $ \s -> s { count = 1 + i }
  return $ i + 1

Throughout our code, however, we will refer to named values within the module; these have a special data type Name, for which we'll create a second name-supply map which guarantees that our block names are unique. We'll also instantiate an IsString instance for this type so that Haskell can automatically perform the boilerplate coercions between String types.

type Names = Map.Map String Int

uniqueName :: String -> Names -> (String, Names)
uniqueName nm ns =
  case Map.lookup nm ns of
    Nothing -> (nm, Map.insert nm 1 ns)
    Just ix -> (nm ++ show ix, Map.insert nm (ix + 1) ns)

instance IsString Name where
  fromString = Name . fromString

Since we can now work with named LLVM values we need to create several functions for referring to the references of these values.

local :: Name -> Operand
local = LocalReference

externf :: Name -> Operand
externf = ConstantOperand . C.GlobalReference

Our function externf will emit a named value which refers to a toplevel function (@add) in our module or will refer to an externally declared function (@putchar). For instance:

declare i32 @putchar(i32)
define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}

define void @main() {
  %1 = call i32 @add(i32 0, i32 97)
  call i32 @putchar(i32 %1)
  ret void
}

Since we'd like to refer to values on the stack by named quantities we'll implement a simple symbol table as an association list letting us assign variable names to operand quantities and subsequently look them up on use.

assign :: String -> Operand -> Codegen ()
assign var x = do
  lcls <- gets symtab
  modify $ \s -> s { symtab = [(var, x)] ++ lcls }

getvar :: String -> Codegen Operand
getvar var = do
  syms <- gets symtab
  case lookup var syms of
    Just x  -> return x
    Nothing -> error $ "Local variable not in scope: " ++ show var

Now that we have a way of naming instructions we'll create an internal function to take an llvm-general AST node and push it onto the current basic block stack. We'll return the left-hand-side reference of the instruction. Instructions will come in two flavors, instructions and terminators. Every basic block has a unique terminator and the last basic block in a function must terminate in a ret.

instr :: Instruction -> Codegen Operand
instr ins = do
  n <- fresh
  blk <- current
  let i = stack blk
  let ref = (UnName n)
  modifyBlock $ blk { stack = i ++ [ref := ins] }
  return $ local ref

terminator :: Named Terminator -> Codegen (Named Terminator)
terminator trm = do
  blk <- current
  modifyBlock $ blk { term = Just trm }
  return trm

Using the instr function we now wrap the AST nodes for basic arithmetic operations of floating point values.

fadd :: Operand -> Operand -> Codegen Operand
fadd a b = instr $ FAdd a b []

fsub :: Operand -> Operand -> Codegen Operand
fsub a b = instr $ FSub a b []

fmul :: Operand -> Operand -> Codegen Operand
fmul a b = instr $ FMul a b []

fdiv :: Operand -> Operand -> Codegen Operand
fdiv a b = instr $ FDiv a b []

On top of the basic arithmetic functions we'll add the basic control flow operations which will allow us to direct the control flow between basic blocks and return values.

br :: Name -> Codegen (Named Terminator)
br val = terminator $ Do $ Br val []

cbr :: Operand -> Name -> Name -> Codegen (Named Terminator)
cbr cond tr fl = terminator $ Do $ CondBr cond tr fl []

ret :: Operand -> Codegen (Named Terminator)
ret val = terminator $ Do $ Ret (Just val) []

Finally we'll add several "effect" instructions which will invoke memory and evaluation side-effects. The call instruction will take a named function reference and a list of arguments, evaluate them, and invoke the function at the current position. The alloca instruction will create a pointer to a stack-allocated uninitialized value of the given type.

call :: Operand -> [Operand] -> Codegen Operand
call fn args = instr $ Call False CC.C [] (Right fn) (toArgs args) [] []

alloca :: Type -> Codegen Operand
alloca ty = instr $ Alloca ty Nothing 0 []

store :: Operand -> Operand -> Codegen Operand
store ptr val = instr $ Store False ptr val Nothing 0 []

load :: Operand -> Codegen Operand
load ptr = instr $ Load False ptr Nothing 0 []

Now that we have the infrastructure in place we can begin to ingest our AST from Syntax.hs and construct an LLVM module from it. We will create a new Emit.hs module and spread the logic across two functions. The first, codegenTop, will emit toplevel constructions in modules (functions and external definitions) and will return values in the LLVM monad. We'll bind the last instruction on the stack into the ret instruction to ensure it is emitted as the return value of the function. We'll also sequentially assign each of the named arguments from the function to a stack-allocated value with a reference in our symbol table.

codegenTop :: S.Expr -> LLVM ()
codegenTop (S.Function name args body) = do
  define double name fnargs bls
  where
    fnargs = toSig args
    bls = createBlocks $ execCodegen $ do
      entry <- addBlock entryBlockName
      setBlock entry
      forM args $ \a -> do
        var <- alloca double
        store var (local (AST.Name a))
        assign a var
      cgen body >>= ret

codegenTop (S.Extern name args) = do
  external double name fnargs []
  where fnargs = toSig args

codegenTop exp = do
  define double "main" [] blks
  where
    blks = createBlocks $ execCodegen $ do
      entry <- addBlock entryBlockName
      setBlock entry
      cgen exp >>= ret

toSig :: [String] -> [(AST.Type, AST.Name)]
toSig = map (\x -> (double, AST.Name x))

The second is the expression-level code generation function (cgen) which will recursively walk the AST, pushing instructions onto the stack and changing the current block as needed. The simplest AST nodes are constant integers and floating point values, which simply return constant values in LLVM IR.

cgen :: S.Expr -> Codegen AST.Operand
cgen (S.Float n) = return $ cons $ C.Float (F.Double n)
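The cons helper used here comes from the tutorial's Codegen.hs and is not shown in this excerpt; presumably it simply wraps a constant as an operand, along the lines of:

cons :: C.Constant -> Operand
cons = ConstantOperand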

We need to reference local variables, so we'll invoke our getvar function in conjunction with a load to use the values. The conscientious reader will intuit that this might produce an excessive amount of extraneous instructions pushing temporary values on the stack, something that we'll address later with a simple optimization pass.

cgen (S.Var x) = getvar x >>= load

For Call we'll first evaluate each argument and then invoke the function with the values. Since our language only has double type values, this is trivial and we don't need to worry too much.

cgen (S.Call fn args) = do
  largs <- mapM cgen args
  call (externf (AST.Name fn)) largs

Finally, for our operators we'll construct a predefined association map of symbol strings to implementations of functions containing the corresponding logic for the operation.

binops = Map.fromList [
 ("+", fadd)
 , ("-", fsub)
 , ("*", fmul)
 , ("/", fdiv)
 , ("<", lt)
 ]

For the comparison operator we'll invoke uitofp, which will convert an unsigned integer quantity to a floating point value. LLVM requires unsigned single-bit types as the values for comparison and test operations, but we prefer to work entirely with doubles where possible.

lt :: AST.Operand -> AST.Operand -> Codegen AST.Operand
lt a b = do
  test <- fcmp FP.ULT a b
  uitofp double test
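The fcmp and uitofp wrappers live alongside the other instruction helpers in Codegen.hs and are not shown in this excerpt; a plausible sketch, following the same instr pattern as the arithmetic wrappers above, is:

fcmp :: FP.FloatingPointPredicate -> Operand -> Operand -> Codegen Operand
fcmp cond a b = instr $ FCmp cond a b []

uitofp :: Type -> Operand -> Codegen Operand
uitofp ty a = instr $ UIToFP a ty []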

Just like the call instruction above we simply generate the code for operands and invoke the function we just looked up for the symbol.

cgen (S.BinaryOp op a b) = do
  case Map.lookup op binops of
    Just f -> do
      ca <- cgen a
      cb <- cgen b
      f ca cb
    Nothing -> error "No such operator"

Putting everything together we find that we have a nice little minimal language that supports both function abstraction and basic arithmetic. The final step is to hook into the LLVM bindings to generate a string representation of the LLVM IR, which we'll print out on each action in the REPL. We'll discuss these functions in more depth in the next chapter.

codegen :: AST.Module -> [S.Expr] -> IO AST.Module
codegen mod fns = withContext $ \context ->
  liftError $ withModuleFromAST context newast $ \m -> do
    llstr <- moduleString m
    putStrLn llstr
    return newast
  where
    modn   = mapM codegenTop fns
    newast = runLLVM mod modn

Running Main.hs we can observe our code generator in action.

ready> def foo(a b) a*a + 2*a*b + b*b
; ModuleID = 'my cool jit'

define double @foo(double %a, double %b) {
entry:
  %0 = fmul double %a, %a
  %1 = fmul double %a, 2.000000e+00
  %2 = fmul double %1, %b
  %3 = fadd double %0, %2
  %4 = fmul double %b, %b
  %5 = fadd double %4, %3
  ret double %5
}

ready> def bar(a) foo(a, 4.0) + bar(31337)
define double @bar(double %a) {
entry:
  %0 = alloca double
  store double %a, double* %0
  %1 = load double* %0
  %2 = call double @foo(double %1, double 4.000000e+00)
  %3 = call double @bar(double 3.133700e+04)
  %4 = fadd double %2, %3
  ret double %4
}

See src/chapter3 for the full source from this chapter.

In the previous chapter we were able to map our language Syntax into the LLVM IR and print it out to the screen. This chapter describes two new techniques: adding optimizer support to our language, and adding JIT compiler support. These additions will demonstrate how to get nice, efficient code for the Kaleidoscope language.

We'll refer to a Module as holding the internal representation of the LLVM IR. Modules can be generated from the Haskell LLVM AST or from strings containing bitcode.

Both data types have the same name (Module), so by convention we will qualify the imports of the libraries to distinguish between the two; a sketch of the imports follows the list below.

  • AST.Module : Haskell AST Module
  • Module : Internal LLVM Module
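A minimal sketch of the qualified imports this convention produces (module names here assume llvm-general / llvm-general-pure 3.3.x):

import qualified LLVM.General.AST as AST   -- AST.Module: the pure Haskell AST
import LLVM.General.Module                 -- Module: the internal LLVM module
import LLVM.General.Context                -- withContext, needed to build a Module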

llvm-general provides two important functions for converting between them. withModuleFromAST operates in ErrorT since it may fail if given a malformed expression, so it is important to handle both cases of the resulting Either value.

withModuleFromAST :: Context -> AST.Module -> (Module -> IO a) -> ErrorT String IO a

moduleAST :: Module -> IO AST.Module

We can also generate the assembly code for our given module by passing a specification of the CPU and platform information we wish to target, called the TargetMachine.

moduleAssembly :: TargetMachine -> Module -> ErrorT String IO String

Recall the so-called "Bracket" pattern in Haskell for managing IO resources. llvm-general makes heavy use of this pattern to manage the life-cycle of certain LLVM resources. It is very important to remember not to pass or attempt to use resources outside of the bracket as this will lead to undefined behavior and/or segfaults.

bracket :: IO a          -- computation to run first ("acquire resource")
        -> (a -> IO b)   -- computation to run last ("release resource")
        -> (a -> IO c)   -- computation to run in-between
        -> IO c

In addition to this we'll often be dealing with operations which can fail in an EitherT monad if given bad code. We'll often want to lift this error up the monad transformer stack with the pattern:

liftError :: ErrorT String IO a -> IO a
liftError = runErrorT >=> either fail return

To start we'll create a runJIT function which will begin with a stack of brackets. We'll then simply generate the IR and print it out to the screen.

runJIT :: AST.Module -> IO (Either String ())
runJIT mod = do
  withContext $ \context ->
    runErrorT $ withModuleFromAST context mod $ \m -> do
      s <- moduleString m
      putStrLn s

Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately, it does not produce wonderful code. However, the naive construction of the LLVM module will perform some minimal transformations to generate a module which is not a literal transcription of the AST but preserves the same semantics.

The "dumb" transcription would look like:

ready> def test(x) 1+2+x
define double @test(double %x) {
entry:
 %addtmp = fadd double 2.000000e+00, 1.000000e+00
 %addtmp1 = fadd double %addtmp, %x
 ret double %addtmp1
}

The "smarter" transcription would eliminate the first line since it contains a simple constant that can be computed at compile-time.

ready> def test(x) 1+2+x
define double @test(double %x) {
entry:
 %addtmp = fadd double 3.000000e+00, %x
 ret double %addtmp
}

Constant folding, as seen above, is a very common and very important optimization: so much so that many language implementors implement constant folding support in their AST representation. This technique is limited by the fact that it does all of its analysis inline with the code as it is built. If you take a slightly more complex example:

ready> def test(x) (1+2+x)*(x+(1+2))
define double @test(double %x) {
entry:
 %addtmp = fadd double 3.000000e+00, %x
 %addtmp1 = fadd double %x, 3.000000e+00
 %multmp = fmul double %addtmp, %addtmp1
 ret double %multmp
}

In this case, the left and right hand sides of the multiplication are the same value. We'd really like to see this generate tmp = x+3; result = tmp*tmp instead of computing x+3 twice.

Unfortunately, no amount of local analysis will be able to detect and correct this. This requires two transformations: reassociation of expressions (to make the adds lexically identical) and Common Subexpression Elimination (CSE) to delete the redundant add instruction. Fortunately, LLVM provides a broad range of optimizations that we can use, in the form of “passes”.

LLVM provides many optimization passes, which do many different sorts of things and have different trade-offs. Unlike other systems, LLVM doesn't hold to the mistaken notion that one set of optimizations is right for all languages and for all situations. LLVM allows a compiler implementor to make complete decisions about what optimizations to use, in which order, and in what situation.

As a concrete example, LLVM supports both “whole module” passes, which look across as large a body of code as they can (often a whole file, but if run at link time, this can be a substantial portion of the whole program). It also supports and includes “per-function” passes which just operate on a single function at a time, without looking at other functions. For more information on passes and how they are run, see the How to Write a Pass document and the List of LLVM Passes.

For Kaleidoscope, we are currently generating functions on the fly, one at a time, as the user types them in. We aren't shooting for the ultimate optimization experience in this setting, but we also want to catch the easy and quick stuff where possible.

We won't delve too much into the details of the passes since they are better described elsewhere. We will instead just invoke the default "curated passes" with an optimization level which will perform most of the common clean-ups and a few non-trivial optimizations.

passes :: PassSetSpec
passes = defaultCuratedPassSetSpec { optLevel = Just 3 }

To apply the passes we create a bracket for a PassManager and invoke runPassManager on our working module. Note that this modifies the module in-place.

runJIT :: AST.Module -> IO (Either String AST.Module)
runJIT mod = do
  withContext $ \context ->
    runErrorT $ withModuleFromAST context mod $ \m ->
      withPassManager passes $ \pm -> do
        runPassManager pm m
        optmod <- moduleAST m
        s <- moduleString m
        putStrLn s
        return optmod

With this in place, we can try our test above again:

ready> def test(x) (1+2+x)*(x+(1+2))
; ModuleID = 'my cool jit'
; Function Attrs: nounwind readnone
define double @test(double %x) #0 {
entry:
 %0 = fadd double %x, 3.000000e+00
 %1 = fmul double %0, %0
 ret double %1
}
attributes #0 = { nounwind readnone }

As expected, we now get our nicely optimized code, saving a floating point add instruction from every execution of this function. We also see some extra metadata attached to our function, which we can ignore for now, but is indicating certain properties of the function that aid in later optimization.

LLVM provides a wide variety of optimizations that can be used in certain circumstances. Some documentation about the various passes is available, but it isn't very complete. Another good source of ideas can come from looking at the passes that Clang runs to get started. The “opt” tool allows us to experiment with passes from the command line, so we can see if they do anything.

One important pass is an "analysis pass" which will validate that the internal IR is well-formed. Since it is quite possible (even easy!) to construct nonsensical or unsafe IR, it is very good practice to validate our IR before attempting to optimize or execute it. To do so we simply invoke the verify function with our active module.

runJIT :: AST.Module -> IO (Either String AST.Module)
runJIT mod = do
  ...
  withPassManager passes $ \pm -> do
    runErrorT $ verify m

Now that we have reasonable code coming out of our front-end, let's talk about executing it!

Code that is available in LLVM IR can have a wide variety of tools applied to it. For example, we can run optimizations on it (as we did above), we can dump it out in textual or binary forms, we can compile the code to an assembly file (.s) for some target, or we can JIT compile it. The nice thing about the LLVM IR representation is that it is the “common currency” between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The basic idea that we want for Kaleidoscope is to have the user enter function bodies as they do now, but immediately evaluate the top-level expressions they type in. For example, if they type in “1 + 2;”, we should evaluate and print out 3. If they define a function, they should be able to call it from the command line.

In order to do this, we add another function to bracket the creation of the JIT Execution Engine. There are two provided engines: jit and mcjit. The distinction is not important for us but we will opt to use the newer mcjit.

import qualified LLVM.General.ExecutionEngine as EE

jit :: Context -> (EE.MCJIT -> IO a) -> IO a
jit c = EE.withMCJIT c optlevel model ptrelim fastins
  where
    optlevel = Just 2  -- optimization level
    model    = Nothing -- code model ( Default )
    ptrelim  = Nothing -- frame pointer elimination
    fastins  = Nothing -- fast instruction selection

The result of the JIT compiling our function will be a C function pointer which we can call from within the JIT's process space. We need some (unsafe!) plumbing to coerce our foreign C function into a callable object from Haskell. Some care must be taken when performing these operations since we're telling Haskell to "trust us" that the pointer we hand it is actually typed as we describe it. If we don't take care with the casts we can expect undefined behavior.

foreign import ccall "dynamic" haskFun :: FunPtr (IO Double) -> (IO Double)

run :: FunPtr a -> IO Double
run fn = haskFun (castFunPtr fn :: FunPtr (IO Double))

Integrating this with our function from above we can now manifest our IR as executable code inside the ExecutionEngine and pass the resulting native types to and from the Haskell runtime.

runJIT :: AST.Module -> IO (Either String ())
runJIT mod = do
  ...
  jit context $ \executionEngine ->
    ...
    EE.withModuleInEngine executionEngine m $ \ee -> do
      mainfn <- EE.getFunction ee (AST.Name "main")
      case mainfn of
        Just fn -> do
          res <- run fn
          putStrLn $ "Evaluated to: " ++ show res
        Nothing -> return ()

Having to statically declare our function pointer type is rather inflexible. If we wish to extend this to be more flexible, a library like libffi is very useful for calling functions with argument types that can be determined at runtime.
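
For illustration, a sketch of that alternative using the Haskell libffi bindings (assuming the Foreign.LibFFI interface of the libffi package; this is not part of the chapter's code):

import Foreign.Ptr (FunPtr)
import Foreign.LibFFI (callFFI, retCDouble, argCDouble)

-- Call a JIT'd function whose double arguments are only known at runtime.
runWithArgs :: FunPtr a -> [Double] -> IO Double
runWithArgs fn args =
  fmap realToFrac $ callFFI fn retCDouble (map (argCDouble . realToFrac) args)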

The JIT provides a number of other more advanced interfaces for things like freeing allocated machine code, rejit'ing functions to update them, etc. However, even with this simple code, we get some surprisingly powerful capabilities - check this out:

ready> extern sin(x)
; ModuleID = 'my cool jit'
declare double @sin(double)

ready> extern cos(x)
; ModuleID = 'my cool jit'
declare double @sin(double)
declare double @cos(double)

ready> sin(1.0)
; ModuleID = 'my cool jit'
declare double @sin(double)
declare double @cos(double)

define double @main() {
entry:
  %0 = call double @sin(double 1.000000e+00)
  ret double %0
}
Evaluated to: 0.8414709848078965

Whoa, how does the JIT know about sin and cos? The answer is surprisingly simple: in this example, the JIT started execution of a function and got to a function call. It realized that the function was not yet JIT compiled and invoked the standard set of routines to resolve the function. In this case, there is no body defined for the function, so the JIT ended up calling dlsym("sin") on the Kaleidoscope process itself. Since "sin" is defined within the JIT's address space, it simply patches up calls in the module to call the libm version of sin directly.

The LLVM JIT provides a number of interfaces for controlling how unknown functions get resolved. It allows us to establish explicit mappings between IR objects and addresses (useful for LLVM global variables that we want to map to static tables, for example), allows us to dynamically decide on the fly based on the function name, and even allows us to JIT compile functions lazily the first time they're called.

One interesting application of this is that we can now extend the language by writing arbitrary C code to implement operations. For example, if we create a shared library cbits.so:

/* cbits
$ gcc -fPIC -shared cbits.c -o cbits.so
$ clang -fPIC -shared cbits.c -o cbits.so
*/
#include "stdio.h"

// putchard - putchar that takes a double and returns 0.
double putchard(double X) {
  putchar((char)X);
  fflush(stdout);
  return 0;
}

Compile this with your favorite C compiler. We can then link it into our Haskell binary by simply including it alongside the rest of the Haskell source files:

$ ghc cbits.so --make Main.hs -o Main

Now we can produce simple output to the console by using things like: extern putchard(x); putchard(120);, which prints a lowercase 'x' on the console (120 is the ASCII code for 'x'). Similar code could be used to implement file I/O, console input, and many other capabilities in Kaleidoscope.

To bring external shared objects into the process address space we can call Haskell's bindings to the system dynamic linking loader to load external libraries. In addition if we are statically compiling our interpreter we can tell GHC to link against the shared objects explicitly by passing them in with the -l flag.
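
A sketch of the dynamic loading approach, assuming the unix package's System.Posix.DynamicLinker bindings (the path to cbits.so is illustrative):

import System.Posix.DynamicLinker (dlopen, RTLDFlags(..))

-- Load cbits.so into the running process so the JIT's symbol resolution can
-- find putchard, just as it found sin above.
loadCbits :: IO ()
loadCbits = do
  _ <- dlopen "./cbits.so" [RTLD_NOW, RTLD_GLOBAL]
  return ()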

This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At this point, we can compile a non-Turing-complete programming language, optimize and JIT compile it in a user-driven way. Next up we'll look into extending the language with control flow constructs, tackling some interesting LLVM IR issues along the way.

See src/chapter4 for the full source from this chapter.

Welcome to Chapter 5 of the Implementing a language with LLVM tutorial. Parts 1-4 described the implementation of the simple Kaleidoscope language and included support for generating LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is mostly useless: it has no control flow other than call and return. This means that we can't have conditional branches in the code, significantly limiting its power. In this episode of "build that compiler", we'll extend Kaleidoscope to have an if/then/else expression plus a simple 'for' loop.

Extending Kaleidoscope to support if/then/else is quite straightforward. It basically requires adding support for this "new" concept to the lexer, parser, AST, and LLVM code emitter. This example is nice, because it shows how easy it is to "grow" a language over time, incrementally extending it as new ideas are discovered.

Before we get going on "how" we add this extension, let's talk about "what" we want. The basic idea is that we want to be able to write this sort of thing:

def fib(x)
  if x < 3 then
    1
  else
    fib(x-1) + fib(x-2)

In Kaleidoscope, every construct is an expression: there are no statements. As such, the if/then/else expression needs to return a value like any other. Since we're using a mostly functional form, we'll have it evaluate its conditional, then return the ‘then' or ‘else' value based on how the condition was resolved. This is very similar to the C "?:" expression.

The semantics of the if/then/else expression is that it evaluates the condition to a boolean equality value: 0.0 is considered to be false and everything else is considered to be true. If the condition is true, the first subexpression is evaluated and returned, if the condition is false, the second subexpression is evaluated and returned. Since Kaleidoscope allows side-effects, this behavior is important to nail down.

Now that we know what we "want", let's break this down into its constituent pieces.

To represent the new expression we add a new AST node for it:

data Expr
  ...
  | If Expr Expr Expr
  deriving (Eq, Ord, Show)

We also extend our lexer definition with the new reserved names.

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    ops = ["+","*","-","/",";",",","<"]
    names = ["def","extern","if","then","else"]
    style = emptyDef {
               Tok.commentLine = "#"
             , Tok.reservedOpNames = ops
             , Tok.reservedNames = names
             }

Now that we have the relevant tokens coming from the lexer and we have the AST node to build, our parsing logic is relatively straightforward. First we define a new parsing function:

ifthen :: Parser Expr
ifthen = do
  reserved "if"
  cond <- expr
  reserved "then"
  tr <- expr
  reserved "else"
  fl <- expr
  return $ If cond tr fl
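
To make the new form reachable from the grammar, the ifthen parser also needs to be added as an alternative in the expression factor. A sketch, where the other alternatives (floating, int, call, variable) are assumed from the parser built in the earlier chapters:

factor :: Parser Expr
factor = try floating
     <|> try int
     <|> try call
     <|> try variable
     <|> ifthen
     <|> parens expr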

Now that we have it parsing and building the AST, the final piece is adding LLVM code generation support. This is the most interesting part of the if/then/else example, because this is where it starts to introduce new concepts. All of the code above has been thoroughly described in previous chapters.

To motivate the code we want to produce, let's take a look at a simple example. Consider:

extern foo();
extern bar();
def baz(x) if x then foo() else bar();

declare double @foo()
declare double @bar()

define double @baz(double %x) {
entry:
  %ifcond = fcmp one double %x, 0.000000e+00
  br i1 %ifcond, label %then, label %else

then:                             ; preds = %entry
  %calltmp = call double @foo()
  br label %ifcont

else:                             ; preds = %entry
  %calltmp1 = call double @bar()
  br label %ifcont

ifcont:                           ; preds = %else, %then
  %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
  ret double %iftmp
}

To visualize the control flow graph, we can use a nifty feature of the LLVM opt tool. If we put this LLVM IR into "t.ll" and run

$ llvm-as < t.ll | opt -analyze -view-cfg

A window will pop up showing the control flow graph of the function.

LLVM has many nice features for visualizing various graphs, but note that these are available only if your LLVM was built with Graphviz support (accomplished by having Graphviz and Ghostview installed when building LLVM).

Getting back to the generated code, it is fairly simple: the entry block evaluates the conditional expression ("x" in our case here) and compares the result to 0.0 with the fcmp one instruction (one is "Ordered and Not Equal"). Based on the result of this expression, the code jumps to either the "then" or "else" blocks, which contain the expressions for the true/false cases.

Once the then/else blocks are finished executing, they both branch back to the if.exit block to execute the code that happens after the if/then/else. In this case the only thing left to do is to return to the caller of the function. The question then becomes: how does the code know which expression to return?

The answer to this question involves an important SSA operation: the Phi operation. If you're not familiar with SSA, the Wikipedia article is a good introduction and there are various other introductions to it available on your favorite search engine. The short version is that "execution" of the Phi operation requires "remembering" which block control came from. The Phi operation takes on the value corresponding to the input control block. In this case, if control comes in from the if.then block, it gets the value of calltmp. If control comes from the if.else block, it gets the value of calltmp1.

At this point, you are probably starting to think "Oh no! This means my simple and elegant front-end will have to start generating SSA form in order to use LLVM!". Fortunately, this is not the case, and we strongly advise not implementing an SSA construction algorithm in your front-end unless there is an amazingly good reason to do so. In practice, there are two sorts of values that float around in code written for your average imperative programming language that might need Phi nodes:

  • Code that involves user variables: x = 1; x = x + 1;
  • Values that are implicit in the structure of your AST, such as the Phi node in this case.

In Chapter 7 of this tutorial ("mutable variables"), we'll talk about #1 in depth. For now, just accept that you don't need SSA construction to handle this case. For #2, you have the choice of using the techniques that we will describe for #1, or you can insert Phi nodes directly, if convenient. In this case, it is really easy to generate the Phi node, so we choose to do it directly.

Okay, enough of the motivation and overview, let's generate code!

In order to generate code for this, we implement the Codegen method for the If node:

cgen (S.If cond tr fl) = do
  ifthen <- addBlock "if.then"
  ifelse <- addBlock "if.else"
  ifexit <- addBlock "if.exit"

  -- %entry
  ------------------
  cond <- cgen cond
  test <- fcmp FP.ONE false cond
  cbr test ifthen ifelse          -- Branch based on the condition

  -- if.then
  ------------------
  setBlock ifthen
  trval <- cgen tr                -- Generate code for the true branch
  br ifexit                       -- Branch to the merge block
  ifthen <- getBlock

  -- if.else
  ------------------
  setBlock ifelse
  flval <- cgen fl                -- Generate code for the false branch
  br ifexit                       -- Branch to the merge block
  ifelse <- getBlock

  -- if.exit
  ------------------
  setBlock ifexit
  phi double [(trval, ifthen), (flval, ifelse)]

We start by creating three blocks.

 ifthen <- addBlock "if.then"
 ifelse <- addBlock "if.else"
 ifexit <- addBlock "if.exit"

Next we emit the expression for the condition, then compare that value to zero to get a truth value as a 1-bit (i.e. bool) value. We end this entry block by emitting the conditional branch that chooses between the two cases.

 test <- fcmp FP.ONE false cond
 cbr test ifthen ifelse -- Branch based on the condition

After the conditional branch is inserted, we switch blocks to start inserting into the if.then block.

 setBlock ifthen

We recursively codegen the tr expression from the AST. To finish off the if.then block, we create an unconditional branch to the merge block. One interesting (and very important) aspect of the LLVM IR is that it requires all basic blocks to be "terminated" with a control flow instruction such as return or branch. This means that all control flow, including fallthroughs must be made explicit in the LLVM IR. If we violate this rule, the verifier will emit an error.

 trval <- cgen tr -- Generate code for the true branch
 br ifexit -- Branch to the merge block
 ifthen <- getBlock -- Get the current block

The final line here is quite subtle, but is very important. The basic issue is that when we create the Phi node in the merge block, we need to set up the block/value pairs that indicate how the Phi will work. Importantly, the Phi node expects to have an entry for each predecessor of the block in the CFG. Why then, are we getting the current block when we just set it 3 lines above? The problem is that the ifthen expression may actually itself change the block that the Builder is emitting into if, for example, it contains a nested "if/then/else" expression. Because calling cgen recursively could arbitrarily change the notion of the current block, we are required to get an up-to-date value for code that will set up the Phi node.

 setBlock ifelse
 flval <- cgen fl -- Generate code for the false branch
 br ifexit -- Branch to the merge block
 ifelse <- getBlock

Code generation for the if.else block is basically identical to codegen for the if.then block.

 setBlock ifexit
 phi double [(trval, ifthen), (flval, ifelse)]

The first line changes the insertion point so that newly created code will go into the if.exit block. Once that is done, we need to create the Phi node and set up the block/value pairs for the Phi.

Finally, the cgen function returns the phi node as the value computed by the if/then/else expression. In our example above, this returned value will feed into the code for the top-level function, which will create the return instruction.

Overall, we now have the ability to execute conditional code in Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language that can calculate a wide variety of numeric functions. Next up we'll add another useful expression that is familiar from non-functional languages...

Now that we know how to add basic control flow constructs to the language, we have the tools to add more powerful things. Let's add something more aggressive, a ‘for' expression:

extern putchard(char)
def printstar(n)
  for i = 1, i < n, 1.0 in
    putchard(42);  # ascii 42 = '*'

# print 100 '*' characters
printstar(100);

This expression defines a new variable (i in this case) which iterates from a starting value, while the condition (i < n in this case) is true, incrementing by an optional step value (1.0 in this case). While the loop is true, it executes its body expression. Because we don't have anything better to return, we'll just define the loop as always returning 0.0. In the future when we have mutable variables, it will get more useful.

To get started, we again extend our lexer with new reserved names "for" and "in".

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
    ops = ["+","*","-","/",";",",","<"]
    names = ["def","extern","if","then","else","in","for"]
    style = emptyDef {
               Tok.commentLine = "#"
             , Tok.reservedOpNames = ops
             , Tok.reservedNames = names
             }

As before, let's talk about the changes that we need to make to Kaleidoscope to support this. The AST node is just as simple. It basically boils down to capturing the variable name and the constituent expressions in the node.

data Expr
  ...
  | For Name Expr Expr Expr Expr
  deriving (Eq, Ord, Show)

The parser code captures a named value for the iterator variable and the four expression objects for the parameters of the loop.

for ::ParserExpr
for =do
 reserved "for"
 var <- identifier
 reservedOp "="
 start <- expr
 reservedOp ","
 cond <- expr
 reservedOp ","
 step <- expr
 reserved "in"
  body <- expr
  return $ For var start cond step body

Now we get to the good part: the LLVM IR we want to generate for this thing. With the simple example above, we get this LLVM IR (note that this dump is generated with optimizations disabled for clarity):

declare double @putchard(double)
define double @printstar(double %n) {
entry:
 br label %loop
loop:
  %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
  %calltmp = call double @putchard(double 4.200000e+01)
  %nextvar = fadd double %i, 1.000000e+00
  %cmptmp = fcmp ult double %i, %n
  %booltmp = uitofp i1 %cmptmp to double
  %loopcond = fcmp one double %booltmp, 0.000000e+00
 br i1 %loopcond, label %loop, label %afterloop
afterloop:
 ret double 0.000000e+00
}

The code to generate this is only slightly more complicated than the above "if" statement.

cgen (S.For ivar start cond step body) = do
  forloop <- addBlock "for.loop"
  forexit <- addBlock "for.exit"

  -- %entry
  ------------------
  i <- alloca double
  istart <- cgen start            -- Generate loop variable initial value
  stepval <- cgen step            -- Generate loop variable step
  store i istart                  -- Store the loop variable initial value
  assign ivar i                   -- Assign loop variable to the variable name
  br forloop                      -- Branch to the loop body block

  -- for.loop
  ------------------
  setBlock forloop
  cgen body                       -- Generate the loop body
  ival <- load i                  -- Load the current loop iteration
  inext <- fadd ival stepval      -- Increment loop variable
  store i inext

  cond <- cgen cond               -- Generate the loop condition
  test <- fcmp FP.ONE false cond  -- Test if the loop condition is True ( 1.0 )
  cbr test forloop forexit        -- Branch back to the loop or exit

The first step is to set up the LLVM basic block for the start of the loop body. In the case above, the whole loop body is one block, but remember that the generated code for the body of the loop could consist of multiple blocks (e.g. if it contains an if/then/else or a for/in expression).

 forloop <- addBlock "for.loop"
 forexit <- addBlock "for.exit"

Next we allocate the iteration variable and generate the code for the constant initial value and step.

 i <- alloca double
 istart <- cgen start -- Generate loop variable initial value
 stepval <- cgen step -- Generate loop variable step

Now the code starts to get more interesting. Our ‘for' loop introduces a new variable to the symbol table. This means that our symbol table can now contain either function arguments or loop variables. Once the loop variable is set into the symbol table, the code recursively codegen's the body. This allows the body to use the loop variable: any references to it will naturally find it in the symbol table.
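
For reference, a sketch of the symbol table helpers assumed from the Codegen module of the earlier chapters (the symtab field and Codegen state monad are assumptions here, named after the calls used in this chapter):

assign :: String -> Operand -> Codegen ()
assign var x = do
  lcls <- gets symtab
  modify $ \s -> s { symtab = (var, x) : lcls }

getvar :: String -> Codegen Operand
getvar var = do
  syms <- gets symtab
  case lookup var syms of
    Just x  -> return x
    Nothing -> error $ "Local variable not in scope: " ++ show var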

 store i istart -- Store the loop variable initial value
 assign ivar i -- Assign loop variable to the variable name
 br forloop -- Branch to the loop body block

Now that the "preheader" for the loop is set up, we switch to emitting code for the loop body.

 setBlock forloop
 cgen body -- Generate the loop body

The body will contain the iteration variable scoped within its code generation. After loading its current value we increment it by the step value and store the new value.

 ival <- load i -- Load the current loop iteration
 inext <- fadd ival stepval -- Increment loop variable
 store i inext

Finally, we evaluate the exit test of the loop, and conditionally either branch back to the same block or exit the loop.

  cond <- cgen cond               -- Generate the loop condition
  test <- fcmp FP.ONE false cond  -- Test if the loop condition is True ( 1.0 )
  cbr test forloop forexit        -- Branch back to the loop or exit

Finally, code generation of the for loop always returns 0.0. Also note that the loop variable remains in scope even after the function exits.

  setBlock forexit
  return zero

See src/chapter5 for the full source from this chapter.

Welcome to Chapter 6 of the "Implementing a language with LLVM" tutorial. At this point in our tutorial, we now have a fully functional language that is fairly minimal, but also useful. There is still one big problem with it, however. Our language doesn't have many useful operators (like division, logical negation, or even any comparisons besides less-than).

This chapter of the tutorial takes a wild digression into adding user-defined operators to the simple and beautiful Kaleidoscope language. This digression now gives us a simple and ugly language in some ways, but also a powerful one at the same time. One of the great things about creating our own language is that we get to decide what is good or bad. In this tutorial we'll assume that it is okay to use this as a way to show some interesting parsing techniques.

At the end of this tutorial, we'll run through an example Kaleidoscope application that renders the Mandelbrot set. This gives an example of what we can build with Kaleidoscope and its feature set.

The "operator overloading" that we will add to Kaleidoscope is more general than languages like C++. In C++, we are only allowed to redefine existing operators: we can't programatically change the grammar, introduce new operators, change precedence levels, etc. In this chapter, we will add this capability to Kaleidoscope, which will let the user round out the set of operators that are supported.

The two specific features we'll add are programmable unary operators (right now, Kaleidoscope has no unary operators at all) as well as binary operators. An example of this is:

# Logical unary not.
def unary!(v)
  if v then
    0
  else
    1;

# Define > with the same precedence as <.
def binary> 10 (LHS RHS)
  RHS < LHS;

# Binary "logical or", (note that it does not "short circuit")
def binary| 5 (LHS RHS)
  if LHS then
    1
  else if RHS then
    1
  else
    0;

# Define = with slightly lower precedence than relationals.
def binary= 9 (LHS RHS)
  !(LHS < RHS | LHS > RHS);

Many languages aspire to being able to implement their standard runtime library in the language itself. In Kaleidoscope, we can implement significant parts of the language in the library!

We will break down implementation of these features into two parts: implementing support for user-defined binary operators and adding unary operators.

We extend the lexer with two new keywords for "binary" and "unary" toplevel definitions.

lexer ::Tok.TokenParser ()
lexer = Tok.makeTokenParser style
  where
 ops = ["+","*","-","/",";","=",",","<",">","|",":"]
 names = ["def","extern","if","then","else","in","for"
 ,"binary", "unary"]
 style = emptyDef {
 Tok.commentLine ="#"
 , Tok.reservedOpNames = ops
 , Tok.reservedNames = names
 }

Parsec has no default function to parse "any symbolic" string, but it can be added simply by defining a new operator token.

operator ::ParserString
operator =do
 c <- Tok.opStart emptyDef
  cs <- many $ Tok.opLetter emptyDef
  return (c:cs)

Using this we can then parse any binary expression. By default all our operators will be left-associative and have equal precedence, except for the built-ins we provide. A more general system would allow the parser to have internal state about the known precedences of operators before parsing. Without predefined precedence values we'll need to disambiguate expressions with parentheses.

binop = Ex.Infix (BinaryOp <$> op) Ex.AssocLeft
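
As an aside, the "more general system" mentioned above could thread a table of user-declared precedences through Parsec's user state. A sketch with purely illustrative names, not part of the chapter's code:

import qualified Data.Map as Map
import Text.Parsec (Parsec, modifyState)

type PrecTable = Map.Map String Int
type StatefulParser = Parsec String PrecTable

-- Record a user-declared operator and its precedence as definitions are parsed.
declareOp :: String -> Int -> StatefulParser ()
declareOp name prec = modifyState (Map.insert name prec)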

Using the expression parser we can extend our table of operators with the "binop" class of custom operators. Note that this will match any and all operators even at parse-time, even if there is no corresponding definition.

binops = [[binary "*" Ex.AssocLeft,
           binary "/" Ex.AssocLeft]
         ,[binary "+" Ex.AssocLeft,
           binary "-" Ex.AssocLeft]
         ,[binary "<" Ex.AssocLeft]]

expr :: Parser Expr
expr = Ex.buildExpressionParser (binops ++ [[binop]]) factor

The extensions to the AST consist of adding new toplevel declarations for the operator definitions.

data Expr
  ...
  | BinaryOp Name Expr Expr
  | UnaryOp Name Expr
  | BinaryDef Name [Name] Expr
  | UnaryDef Name [Name] Expr

The parser extension is straightforward and essentially a function definition with a few slight changes. Note that we capture the string value of the operator as given to us by the parser.

binarydef ::ParserExpr
binarydef =do
 reserved "def"
 reserved "binary"
 o <- op
 prec <- int
 args <- parens $ many identifier
  body <- expr
  return $ BinaryDef o args body

To generate code we'll implement two extensions to our existing code generator. At the toplevel we'll emit the BinaryDef declarations by simply creating a normal function with the name "binary" suffixed with the operator.

codegenTop (S.BinaryDef name args body) =
  codegenTop $ S.Function ("binary" ++ name) args body

Now, for our binary operators, instead of failing in the presence of a binary operator not declared in our binops list, we instead create a call to a named "binary" function with the operator name.

cgen (S.BinaryOp op a b) = do
  case Map.lookup op binops of
    Just f -> do
      ca <- cgen a
      cb <- cgen b
      f ca cb
    Nothing -> cgen (S.Call ("binary" ++ op) [a,b])

For unary operators we implement the same strategy as binary operators. We add a parser for unary operators simply as a Prefix operator matching any symbol.

unop = Ex.Prefix (UnaryOp <$> op)

We add this to the expression parser like above.

expr :: Parser Expr
expr = Ex.buildExpressionParser (binops ++ [[unop], [binop]]) factor

The parser extension for the toplevel unary definition is precisely the same as function syntax except prefixed with the "unary" keyword.

unarydef ::ParserExpr
unarydef =do
 reserved "def"
 reserved "unary"
 o <- op
 args <- parens $ many identifier
  body <- expr
  return $ UnaryDef o args body

For toplevel declarations we'll simply emit a function with the convention that the name is prefixed with the word "unary". For example ("unary!", "unary-").

codegenTop (S.UnaryDef name args body) =
  codegenTop $ S.Function ("unary" ++ name) args body

Up until now we have not had any unary operators, so for code generation we will simply always search for an implementation as a function.

cgen (S.UnaryOp op a) = do
  cgen $ S.Call ("unary" ++ op) [a]

That's it for unary operators, quite easy indeed!

It is somewhat hard to believe, but with a few simple extensions we’ve covered in the last chapters, we have grown a real-ish language. With this, we can do a lot of interesting things, including I/O, math, and a bunch of other things. For example, we can now add a nice sequencing operator (printd is defined to print out the specified value and a newline):

ready> extern printd(x)
declare double @printd(double)
ready> def binary : 1 (x y) 0;
..
ready> printd(123) : printd(456) : printd(789);
123.000000
456.000000
789.000000
Evaluated to 0.000000

We can also define a bunch of other "primitive" operations, such as:

# Logical unary not.
def unary!(v)
  if v then
    0
  else
    1;

# Unary negate.
def unary-(v)
  0-v;

# Define > with the same precedence as <.
def binary> 10 (LHS RHS)
  RHS < LHS;

# Binary logical or, which does not short circuit.
def binary| 5 (LHS RHS)
  if LHS then
    1
  else if RHS then
    1
  else
    0;

# Binary logical and, which does not short circuit.
def binary& 6 (LHS RHS)
  if !LHS then
    0
  else
    !!RHS;

# Define = with slightly lower precedence than relationals.
def binary = 9 (LHS RHS)
  !(LHS < RHS | LHS > RHS);

# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

Given the previous if/then/else support, we can also define interesting functions for I/O. For example, the following prints out a character whose "density" reflects the value passed in: the lower the value, the denser the character:

ready> extern putchard(char)
def printdensity(d)
  if d > 8 then
    putchard(32)  # ' '
  else if d > 4 then
    putchard(46)  # '.'
  else if d > 2 then
    putchard(43)  # '+'
  else
    putchard(42); # '*'
...
ready> printdensity(1): printdensity(2): printdensity(3):
 printdensity(4): printdensity(5): printdensity(9):
 putchard(10);
**++.
Evaluated to 0.000000

The Mandelbrot set is a set of two dimensional points generated by the complex function z = z^2 + c whose boundary forms a fractal.

Based on our simple primitive operations defined above, we can start to define more interesting things. For example, here's a little function that solves for the number of iterations it takes a function in the complex plane to converge:

# Determine whether the specific location diverges.
# Solve for z = z^2 + c in the complex plane.
def mandleconverger(real imag iters creal cimag)
  if iters > 255 | (real*real + imag*imag > 4) then
    iters
  else
    mandleconverger(real*real - imag*imag + creal,
                    2*real*imag + cimag,
                    iters+1, creal, cimag);

# Return the number of iterations required for the iteration to escape
def mandleconverge(real imag)
  mandleconverger(real, imag, 0, real, imag);

Our mandelconverge function returns the number of iterations that it takes for a complex orbit to escape, saturating to 255. This is not a very useful function by itself, but if we plot its value over a two-dimensional plane, we can see the Mandelbrot set. Given that we are limited to using putchard here, our amazing graphical output is limited, but we can whip together something using the density plotter above:

# Compute and plot the mandlebrot set with the specified 2 dimensional range
# info.
def mandelhelp(xmin xmax xstep ymin ymax ystep)
  for y = ymin, y < ymax, ystep in (
    (for x = xmin, x < xmax, xstep in
       printdensity(mandleconverge(x,y)))
    : putchard(10)
  );

# mandel - This is a convenient helper function for plotting the mandelbrot set
# from the specified position with the specified Magnification.
def mandel(realstart imagstart realmag imagmag)
  mandelhelp(realstart, realstart+realmag*78, realmag,
             imagstart, imagstart+imagmag*40, imagmag);

Given this, we can try plotting out the Mandelbrot set! Let's try it out:


******************************************************************************
******************************************************************************
****************************************++++++********************************
************************************+++++...++++++****************************
*********************************++++++++.. ...+++++**************************
*******************************++++++++++.. ..+++++*************************
******************************++++++++++. ..++++++************************
****************************+++++++++.... ..++++++***********************
**************************++++++++....... .....++++**********************
*************************++++++++. . ... .++*********************
***********************++++++++... ++*********************
*********************+++++++++.... .+++********************
******************+++..+++++.... ..+++*******************
**************++++++. .......... +++*******************
***********++++++++.. .. .++*******************
*********++++++++++... .++++******************
********++++++++++.. .++++******************
*******++++++..... ..++++******************
*******+........ ...++++******************
*******+... .... ...++++******************
*******+++++...... ..++++******************
*******++++++++++... .++++******************
*********++++++++++... ++++******************
**********+++++++++.. .. ..++*******************
*************++++++.. .......... +++*******************
******************+++...+++..... ..+++*******************
*********************+++++++++.... ..++********************
***********************++++++++... +++********************
*************************+++++++.. . ... .++*********************
**************************++++++++....... ......+++**********************
****************************+++++++++.... ..++++++***********************
*****************************++++++++++.. ..++++++************************
*******************************++++++++++.. ...+++++*************************
*********************************++++++++.. ...+++++**************************
***********************************++++++....+++++****************************
***************************************++++++++*******************************
******************************************************************************
******************************************************************************
******************************************************************************
******************************************************************************

At this point, you may be starting to realize that Kaleidoscope is a real and powerful language. It may not be self-similar :), but it can be used to plot things that are!

With this, we conclude the "adding user-defined operators" chapter of the tutorial. We have successfully augmented our language, adding the ability to extend the language in the library, and we have shown how this can be used to build a simple but interesting end-user application in Kaleidoscope. At this point, Kaleidoscope can build a variety of applications that are functional and can call functions with side-effects, but it can't actually define and mutate a variable itself.

Strikingly, variable mutation is an important feature of imperative languages, and it is not at all obvious how to add support for mutable variables without having to add an "SSA construction" phase to our front-end. In the next chapter, we will describe how we can add variable mutation without building SSA in our front-end.

See src/chapter6 for the full source from this chapter.

Welcome to Chapter 7 of the "Implementing a language with LLVM" tutorial. In chapters 1 through 6, we've built a very respectable, albeit simple, functional programming language. In our journey, we learned some parsing techniques, how to build and represent an AST, how to build LLVM IR, and how to optimize the resultant code as well as JIT compile it.

While Kaleidoscope is interesting as a functional language, the fact that it is functional makes it "too easy" to generate LLVM IR for it. In particular, a functional language makes it very easy to build LLVM IR directly in SSA form. Since LLVM requires that the input code be in SSA form, this is a very nice property and it is often unclear to newcomers how to generate code for an imperative language with mutable variables.

The short (and happy) summary of this chapter is that there is no need for our front-end to build SSA form: LLVM provides highly tuned and well tested support for this, though the way it works is a bit unexpected for some.

To understand why mutable variables cause complexities in SSA construction, consider this extremely simple C example:

int G, H;

int test(_Bool Condition) {
  int X;
  if (Condition)
    X = G;
  else
    X = H;
  return X;
}

In this case, we have the variable "X", whose value depends on the path executed in the program. Because there are two different possible values for X before the return instruction, a Phi node is inserted to merge the two values. The LLVM IR that we want for this example looks like this:

@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.2
}

The control flow graph for the above IR:

In this example, the loads from the G and H global variables are explicit in the LLVM IR, and they live in the then/else branches of the if statement (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node in the cond_next block selects the right value to use based on where control flow is coming from: if control flow comes from the cond_false block, X.2 gets the value of X.1. Alternatively, if control flow comes from cond_true, it gets the value of X.0. The intent of this chapter is not to explain the details of SSA form. For more information, see one of the many online references.

The question for this article is "who places the phi nodes when lowering assignments to mutable variables?". The issue here is that LLVM requires that its IR be in SSA form: there is no "non-SSA" mode for it. However, SSA construction requires non-trivial algorithms and data structures, so it is inconvenient and wasteful for every front-end to have to reproduce this logic.

The ‘trick' here is that while LLVM does require all register values to be in SSA form, it does not require (or permit) memory objects to be in SSA form. In the example above, note that the loads from G and H are direct accesses to G and H: they are not renamed or versioned. This differs from some other compiler systems, which do try to version memory objects. In LLVM, instead of encoding dataflow analysis of memory into the LLVM IR, it is handled with Analysis Passes which are computed on demand.

With this in mind, the high-level idea is that we want to make a stack variable (which lives in memory, because it is on the stack) for each mutable object in a function. To take advantage of this trick, we need to talk about how LLVM represents stack variables.

In LLVM, all memory accesses are explicit with load/store instructions, and it is carefully designed not to have (or need) an "address-of" operator. Notice how the type of the @G/@H global variables is actually i32* even though the variable is defined as i32. What this means is that @G defines space for an i32 in the global data area, but its name actually refers to the address for that space. Stack variables work the same way, except that instead of being declared with global variable definitions, they are declared with the LLVM alloca instruction:

define i32 @example() {
entry:
  %X = alloca i32           ; type of %X is i32*.
  ...
  %tmp = load i32* %X       ; load the stack value %X from the stack.
  %tmp2 = add i32 %tmp, 1   ; increment it
  store i32 %tmp2, i32* %X  ; store it back
  ...

This code shows an example of how we can declare and manipulate a stack variable in the LLVM IR. Stack memory allocated with the alloca instruction is fully general: we can pass the address of the stack slot to functions, we can store it in other variables, etc. In our example above, we could rewrite the example to use the alloca technique to avoid using a Phi node:

@G = weak global i32 0   ; type of @G is i32*
@H = weak global i32 0   ; type of @H is i32*

define i32 @test(i1 %Condition) {
entry:
  %X = alloca i32
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  store i32 %X.0, i32* %X
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  store i32 %X.1, i32* %X
  br label %cond_next

cond_next:
  %X.2 = load i32* %X
  ret i32 %X.2
}

With this, we have discovered a way to handle arbitrary mutable variables without the need to create Phi nodes at all:

  • Each mutable variable becomes a stack allocation.
  • Each read of the variable becomes a load from the stack.
  • Each update of the variable becomes a store to the stack.
  • Taking the address of a variable just uses the stack address directly.

While this solution has solved our immediate problem, it introduced another one: we have now apparently introduced a lot of stack traffic for very simple and common operations, a major performance problem. Fortunately for us, the LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles this case, promoting allocas like this into SSA registers, inserting Phi nodes as appropriate. If we run this example through the pass, for example, we'll get:

$ llvm-as < example.ll | opt -mem2reg | llvm-dis
@G = weak global i32 0
@H = weak global i32 0

define i32 @test(i1 %Condition) {
entry:
  br i1 %Condition, label %cond_true, label %cond_false

cond_true:
  %X.0 = load i32* @G
  br label %cond_next

cond_false:
  %X.1 = load i32* @H
  br label %cond_next

cond_next:
  %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  ret i32 %X.01
}

We say a block "A" dominates a different block "B" in the control flow graph if it's impossible to reach "B" without passing through "A", equivalently "A" is the dominator of "B". The mem2reg pass implements the standard "iterated dominance frontier" algorithm for constructing SSA form and has a number of optimizations that speed up (very common) degenerate cases.

The mem2reg optimization pass is the answer to dealing with mutable variables, and we highly recommend that you depend on it. Note that mem2reg only works on variables in certain circumstances:

  • mem2reg is alloca-driven: it looks for allocas and if it can handle them, it promotes them. It does not apply to global variables or heap allocations.
  • mem2reg only looks for alloca instructions in the entry block of the function. Being in the entry block guarantees that the alloca is only executed once, which makes analysis simpler.
  • mem2reg only promotes allocas whose uses are direct loads and stores. If the address of the stack object is passed to a function, or if any funny pointer arithmetic is involved, the alloca will not be promoted.
  • mem2reg only works on allocas of first class values (such as pointers, scalars and vectors), and only if the array size of the allocation is 1 (or missing in the .ll file).
  • mem2reg is not capable of promoting structs or arrays to registers. Note that the "scalarrepl" pass is more powerful and can promote structs, "unions", and arrays in many cases.

All of these properties are easy to satisfy for most imperative languages, and we'll illustrate it below with Kaleidoscope. The final question you may be asking is: should I bother with this nonsense for my front-end? Wouldn't it be better if I just did SSA construction directly, avoiding use of the mem2reg optimization pass? In short, we strongly recommend that you use this technique for building SSA form, unless there is an extremely good reason not to. Using this technique is:

  • Proven and well tested: llvm-gcc and clang both use this technique for local mutable variables. As such, the most common clients of LLVM are using this to handle a bulk of their variables. You can be sure that bugs are found fast and fixed early.
  • Extremely Fast: mem2reg has a number of special cases that make it fast in common cases as well as fully general. For example, it has fast-paths for variables that are only used in a single block, variables that only have one assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
  • Needed for debug info generation: Debug information in LLVM relies on having the address of the variable exposed so that debug info can be attached to it. This technique dovetails very naturally with this style of debug info.

If nothing else, this makes it much easier to get our front-end up and running, and is very simple to implement. Let's extend Kaleidoscope with mutable variables now!

Now that we know the sort of problem we want to tackle, let's see what this looks like in the context of our little Kaleidoscope language. We're going to add two features:

  • The ability to mutate variables with the ‘=' operator.
  • The ability to define new variables.

While the first item is really what this is about, we only have variables for incoming arguments as well as for induction variables, and redefining those only goes so far :). Also, the ability to define new variables is a useful thing regardless of whether we will be mutating them. Here's a motivating example that shows how we could use these:

# Define ':' for sequencing: as a low-precedence operator that ignores operands
# and just returns the RHS.
def binary : 1 (x y) y;

# Recursive fib, we could do this before.
def fib(x)
  if (x < 3) then
    1
  else
    fib(x-1)+fib(x-2);

# Iterative fib.
def fibi(x)
  var a = 1, b = 1, c = 0 in
  (for i = 3, i < x in
     c = (a + b) :
     a = b :
     b = c) :
  b;

# Call it.
fibi(10);

At this point in Kaleidoscope’s development, it only supports variables for two things: incoming arguments to functions and the induction variable of ‘for’ loops. For consistency, we’ll allow mutation of these variables in addition to other user-defined variables. This means that these will both need memory locations.

We introduce a new var syntax which behaves much like the let notation in Haskell. We will let the user define a sequence of new variable names and inject these new variables into the symbol table.

data Expr
  ...
  | Let Name Expr Expr
  deriving (Eq, Ord, Show)

The parser will allow multiple declarations on a single line and right fold the AST node bodies, allowing us to use variables declared earlier in the list in subsequent declarations (i.e. var x = 3, y = x + 1).

letins :: Parser Expr
letins = do
  reserved "var"
  defs <- commaSep $ do
    var <- identifier
    reservedOp "="
    val <- expr
    return (var, val)
  reserved "in"
  body <- expr
  return $ foldr (uncurry Let) body defs
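
For illustration, a declaration like var x = 3, y = x + 1 in y right-folds into nested Let nodes (assuming Float is the numeric literal constructor from the earlier chapters):

example :: Expr
example =
  Let "x" (Float 3)
    (Let "y" (BinaryOp "+" (Var "x") (Float 1))
      (Var "y"))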

The code generation for this new syntax is very straightforward: we simply allocate a new reference, assign it to the name given, and then return the assigned value.

cgen (S.Let a b c) = do
 i <- alloca double
 val <- cgen b
 store i val
 assign a i
 cgen c

We can test out this new functionality. Note that the code below is unoptimized and involves several extraneous instructions that would normally be optimized away by mem2reg.

ready> def main(x) var y = x + 1 in y;
; ModuleID = 'my cool jit'
define double @main(double %x) {
entry:
  %0 = alloca double
  store double %x, double* %0
  %1 = alloca double
  %2 = load double* %0
  %3 = fadd double %2, 1.000000e+00
  store double %3, double* %1
  %4 = load double* %1
 ret double %4
}
Evaluated to: 1.0

Mutation of existing variables is also quite simple. We'll special case our code generator for the "=" operator to add internal logic for looking up the LHS variable and assigning it the right hand side using the store operation.

cgen (S.BinaryOp "=" (S.Var var) val) = do
  a <- getvar var
  cval <- cgen val
  store a cval
  return cval

Testing this out for a trivial example we find that we can now update variables.

ready> def main(x) x = 1;
; ModuleID = 'my cool jit'
define double @main(double %x) {
entry:
  %0 = alloca double
  store double %x, double* %0
  store double 1.000000e+00, double* %0
 ret double 1.000000e+00
}
Evaluated to: 1.0

Finally we can write down our Fibonacci example using mutable updates.

def fibi(x)
 var a = 1, b = 1, c = 0 in
 (for i = 3, i < x, 1.0 in 
 c = (a + b) : 
 a = b : 
 b = c
 ): b;
fibi(10);

With this, we completed what we set out to do. Our nice iterative fib example from the intro compiles and runs just fine. The mem2reg pass optimizes all of our stack variables into SSA registers, inserting PHI nodes where needed, and our front-end remains simple: no “iterated dominance frontier” computation anywhere in sight.

define double @fibi(double %x) #0 {
entry:
  br label %for.loop

for.loop:                                 ; preds = %for.loop, %entry
  %0 = phi double [ %4, %for.loop ], [ 3.000000e+00, %entry ]
  %1 = phi double [ %3, %for.loop ], [ 1.000000e+00, %entry ]
  %2 = phi double [ %1, %for.loop ], [ 1.000000e+00, %entry ]
  %3 = fadd double %2, %1
  %4 = fadd double %0, 1.000000e+00
  %5 = fcmp ult double %4, %x
  br i1 %5, label %for.loop, label %for.exit

for.exit:                                 ; preds = %for.loop
  %6 = call double @"binary:"(double 0.000000e+00, double %3)
 ret double %6
}

Running the optimizations we see that we get nicely optimal assembly code for our loop.

fibi:                                   # @fibi
# BB#0:                                 # %entry
 vmovsd .LCPI2_0(%rip), %xmm2
 vmovsd .LCPI2_1(%rip), %xmm3
 vmovaps %xmm2, %xmm1
 vmovaps %xmm2, %xmm4
 .align 16, 0x90
.LBB2_1: # %for.loop
 vmovaps %xmm1, %xmm5
 vaddsd %xmm4, %xmm5, %xmm1
 vaddsd %xmm2, %xmm3, %xmm3
 vucomisd %xmm0, %xmm3
 vmovaps %xmm5, %xmm4
  jb .LBB2_1
# BB#2:                                 # %for.exit
 vmovaps %xmm1, %xmm0
 ret

See src/chapter7 for the full source from this chapter.

Welcome to the final chapter of the "Implementing a language with LLVM" tutorial. In the course of this tutorial, we have grown our little Kaleidoscope language from being a useless toy, to being a semi-interesting (but probably still useless) toy. :)

It is interesting to see how far we've come, and how little code it has taken. We built the entire lexer, parser, AST, code generator, and an interactive run-loop (with a JIT!) by-hand in under 700 lines of (non-comment/non-blank) code.

Our little language supports a couple of interesting features: it supports user defined binary and unary operators, it uses JIT compilation for immediate evaluation, and it supports a few control flow constructs with SSA construction.

Part of the idea of this tutorial was to show how easy and fun it can be to define, build, and play with languages. Building a compiler need not be a scary or mystical process! Now that we've seen some of the basics, I strongly encourage you to take the code and hack on it. For example, try adding:

  • global variables - While global variables have questionable value in modern software engineering, they are often useful when putting together quick little hacks like the Kaleidoscope compiler itself. Fortunately, our current setup makes it very easy to add global variables: just have value lookup check to see if an unresolved variable is in the global variable symbol table before rejecting it.
  • typed variables - Kaleidoscope currently only supports variables of type double. This gives the language a very nice elegance, because only supporting one type means that we never have to specify types. Different languages have different ways of handling this. The easiest way is to require the user to specify types for every variable definition, and record the type of the variable in the symbol table along with its Value*.
  • arrays, structs, vectors, etc - Once we add types, we can start extending the type system in all sorts of interesting ways. Simple arrays are very easy and are quite useful for many different applications. Adding them is mostly an exercise in learning how the LLVM getelementptr instruction works: it is so nifty/unconventional, it has its own FAQ! If we add support for recursive types (e.g. linked lists), make sure to read the section in the LLVM Programmer's Manual that describes how to construct them.
  • standard runtime - Our current language allows the user to access arbitrary external functions, and we use it for things like "printd" and "putchard". As we extend the language to add higher-level constructs, often these constructs make the most sense if they are lowered to calls into a language-supplied runtime. For example, if we add hash tables to the language, it would probably make sense to add the routines to a runtime, instead of inlining them all the way.
  • memory management - Currently we can only access the stack in Kaleidoscope. It would also be useful to be able to allocate heap memory, either with calls to the standard libc malloc/free interface or with a garbage collector. If we would like to use garbage collection, note that LLVM fully supports Accurate Garbage Collection including algorithms that move objects and need to scan/update the stack.
  • debugger support - LLVM supports generation of DWARF Debug info which is understood by common debuggers like GDB. Adding support for debug info is fairly straightforward. The best way to understand it is to compile some C/C++ code with "clang -g -O0" and taking a look at what it produces.
  • exception handling support - LLVM supports generation of zero cost exceptions which interoperate with code compiled in other languages. You could also generate code by implicitly making every function return an error value and checking it. You could also make explicit use of setjmp/longjmp. There are many different ways to go here.
  • object orientation, generics, database access, complex numbers, geometric programming, ... - Really, there is no end of crazy features that we can add to the language.
  • unusual domains - We've been talking about applying LLVM to a domain that many people are interested in: building a compiler for a specific language. However, there are many other domains that can use compiler technology that are not typically considered. For example, LLVM has been used to implement OpenGL graphics acceleration, translate C++ code to ActionScript, and many other cute and clever things. Maybe you will be the first to JIT compile a regular expression interpreter into native code with LLVM? Have fun: try doing something crazy and unusual. Building a language like everyone else always has is much less fun than trying something a little crazy or off the wall and seeing how it turns out. If you get stuck or want to talk about it, feel free to email the llvmdev mailing list: it has lots of people who are interested in languages and are often willing to help out.

llvm-as

The assembler transforms human-readable LLVM assembly into LLVM bitcode.

Usage:

$ clang -S -emit-llvm hello.c -c -o hello.ll
$ llvm-as hello.ll -o hello.bc

llvm-dis

The disassembler transforms LLVM bitcode back into human-readable LLVM assembly.

Usage:

$ clang -emit-llvm hello.c -c -o hello.bc
$ llvm-dis < hello.bc | less

llvm-link

llvm-link links multiple LLVM modules into a single program.

Usage:

$ llvm-link foo.ll bar.ll -o foobar.ll

lli

lli is the LLVM interpreter, which can directly execute LLVM bitcode.

Usage:

$ clang -emit-llvm hello.c -c -o hello.bc
$ lli hello.bc
$ lli -use-mcjit hello.bc

llc

llc is the LLVM backend compiler, which translates LLVM bitcode into native assembly code.

Usage:

$ clang -emit-llvm hello.c -c -o hello.bc
$ llc hello.bc -o hello.s
$ cc hello.s -o hello.native
$ llc -march=x86-64 hello.bc -o hello.s
$ llc -march=arm hello.bc -o hello.s

opt

opt reads LLVM bitcode, applies a series of LLVM-to-LLVM transformations, and then outputs the resultant bitcode. opt can also be used to run a specific analysis on an input LLVM bitcode file and print out the results.

Usage:

$ clang -emit-llvm hello.c -c -o hello.bc
$ opt -mem2reg hello.bc
$ opt -simplifycfg hello.bc
$ opt -inline hello.bc
$ opt -dce hello.bc
$ opt -analyze -view-cfg hello.bc
$ opt -bb-vectorize hello.bc
$ opt -loop-vectorize -force-vector-width=8

IBM's Watson For Business: The $1 Billion Siri Slayer | Fast Company | Business + Innovation


Comments:"IBM's Watson For Business: The $1 Billion Siri Slayer | Fast Company | Business + Innovation"

URL:http://www.fastcompany.com/3024604/ibms-watson-for-business-the-1-billion-siri-slayer


IBM is announcing the launch of a new, $1 billion Watson Business Group, $100 million in venture capital earmarks toward new Watson apps, and a shiny new Watson headquarters in New York's East Village neighborhood on Thursday. The news means that IBM has essentially made a $1 billion bet on Watson, the big data-cloud service hybrid which famously competed on Jeopardy and, more importantly, saves lives. If it succeeds, IBM will have the means to provide Watson's services to every American workplace.

Put another way, IBM wants to transform Watson into a Siri for business. The platform is designed for users to ask Watson questions, with Watson giving answers--such as medical diagnoses for hard-to-diagnose diseases, or the likely outcome of business decisions--on the spot. The iconic tech multinational has excelled for decades at enterprise services, but has had much less success targeting home users (just remember how IBM PC clones became de rigueur in the 1980s, and the failure of OS/2). But the new Watson Business Group appears to be aimed at making Watson as ubiquitous as IBM's computer equipment. Stephen Gold, vice president of IBM Watson Solutions, tells Fast Company that over 2,000 employees will work in support of the new Watson Business Group, which will preside over an app ecosystem where over 750 applicants have expressed interest in developing Watson-based apps using $100 million in future IBM equity investment.

Gold says a big part of this push grew from Watson being used in industries that do not currently use much big data or analytics. Two new Watson products, IBM Watson Discovery Advisor and IBM Watson Analytics Advisor, target a mass business audience. According to an IBM press release, the Discovery Advisor takes aim at publishing, education, and health care. And Analytics Advisor could be a game-changer for IBM if it works as promised: The software “allows business users to send natural language questions and raw data sets into the cloud, for Watson to crunch, uncover, and visualize insights; without the need for advanced analytics training. After analyzing the data, Watson will deliver results to its users through graphic representations that are easy to understand, interact with, and share with colleagues; all in a matter of minutes.”

The 2,000+ employees of the Watson Business Group will be headquartered at 51 Astor in New York's East Village. Twitter is reportedly looking at office space in the building and the headquarters of AOL and the Huffington Post (and soon Facebook) are directly across the street. The formerly bohemian neighborhood is situated right next to New York University and Cooper Union, too. (Expect the line at Ippudo ramen house to grow even more spectacularly long, as well.) Alongside conventional office space, the Watson headquarters will also include space for a tech incubator for startups building Watson-based apps and an events area that will offer “networking opportunities” alongside workshops and seminars for Watson developers. “We chose [the Silicon Alley location] because of access to universities, talent, and accessibility,” Gold said. “Being based in New York City is central for drawing talent for Watson.”

Photo by Dan Winters

Also buried in IBM's announcement is boring but important news: The company is reengineering Watson to be deployed on Softlayer, a cloud computing firm IBM purchased for $2 billion in 2013. Watson's lack of integration with Softlayer is reportedly a sore spot for IBM customers; as of press time, no date is available for the Softlayer integration. Wall Street Journal reporter Spencer E. Ante also pointed out that “Watson's basic learning process requires IBM engineers to master the technicalities of a customer's business--and translate those requirements into usable software. The process has been arduous.”

Over the past few years, IBM has worked hard to sell Watson to health care consumers and to turn the big data platform into a linchpin of research medicine. This past November, the company announced a series of new Watson apps for health care, including WatsonPaths--a fascinating product that crunches text from tens of thousands of medical textbooks in order to generate semantic data. When doctors and researchers ask WatsonPaths a question about a hard-to-diagnose patient, Watson uses the data it was fed to refine diagnoses and suggest treatment options. It's an early effort, to be sure, but an example of the transformative things that can be done with data these days. IBM has also built Watson partnerships with medical facilities and educational institutions around the country.

As we spoke, Gold seemed most excited about the new ecosystem that is being built around Watson--this past November, IBM introduced a Watson API that's a precursor to the new Watson business ecosystem and this is being expanded into a larger app store in 2014. This year's currently slated apps seem interesting but not world-changing--personal shopping app Fluid, products by medical supply chain firm MD Buyline, and products by WellTok, a company which lets health insurers engage with consumers. At press time, the Royal Bank of Canada and the Nielsen Company are confirmed as firms speaking with IBM about Watson business possibilities.

Gold tells Fast Company that Watson for Business is "one of the top innovations in IBM's history" and it could even be the biggest IBM innovation since the IBM PC: If IBM can make the UI and search functions easy enough to grasp, it'll be them--and not Apple's Siri or Google's Google Now--that really makes big data part of our everyday lives. But the onus is on IBM to convince the business world that Watson is worth adopting.

Correction: An earlier version of this article incorrectly stated that Watson makes diagnoses for medical patients. Rather, it refines potential diagnoses and suggests treatment options.

Review Cryptocat Mobile's source code, 3 months before it hits app stores

Devcharm | Must-have Django packages.


Comments:"Devcharm | Must-have Django packages."

URL:https://devcharm.com/pages/79-must-have-django-packages


Whenever you need to integrate your web app with StackOverflow, run jobs asynchronously, debug slow pages, or build an API, there is an extension you can easily pip install. This list contains some of the most interesting Django extensions out there.

Authentication and authorization

  • Python social auth is the most comprehensive social authentication/registration mechanism for Python. The backend support is massive: you can authenticate against more than 50 providers. Install it via pip install python-social-auth

  • Django Guardian implements per-object permissions for your models (a minimal usage sketch follows this list). Install it via pip install django-guardian

  • Django OAuth Toolkit provides out of the box all the endpoints, data and logic needed to add OAuth2 provider capabilities to your Django projects. It can be nicely integrated with Django REST framework. Install it via pip install django-oauth-toolkit

  • django-allauth is an integrated set of Django applications addressing authentication, registration, and account management, as well as third-party (social) account authentication.
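As promised in the Django Guardian item above, here is a minimal usage sketch. It assumes a Django project with guardian in INSTALLED_APPS and a hypothetical Task model that declares a view_task permission in its Meta class; the model and function names are illustrative, not part of the package.

# Hypothetical sketch of django-guardian's per-object permissions.
# Assumes 'guardian' is in INSTALLED_APPS and Task declares a
# ('view_task', 'Can view task') permission in its Meta class.
from django.contrib.auth.models import User
from guardian.shortcuts import assign_perm, get_objects_for_user

from myapp.models import Task  # hypothetical model

def share_task(task, user):
    # Grant this user permission on this single object only.
    assign_perm('view_task', user, task)

def visible_tasks(user):
    # Query back every Task the user has per-object access to.
    return get_objects_for_user(user, 'myapp.view_task', klass=Task)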

Backend

  • Celery. Celery is the de facto standard for managing asynchronous, distributed job queues, and it can be easily integrated into your Django app. Install it via pip install Celery

  • Django REST framework is an insanely awesome framework to build REST APIs. It handles serialization, throttling, content negotiation, and pagination for you and—drum roll—it builds a browsable API for free, so developers can browse and experiment with your API from the web browser (see the sketch after this list). Install it via pip install djangorestframework

  • Django stored messages is a small, non-intrusive app which integrates smoothly with Django’s messages framework (django.contrib.messages) and lets users decide which messages have to be stored on the database backend, thus making them available across sessions.

  • django-cors-headers is a tiny app for setting up CORS headers. Very handy for managing cross-domain requests in your Django apps (e.g. a JavaScript client served from a CDN). Install it via pip install django-cors-headers
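The Django REST framework entry above mentions a sketch; here is a minimal one. It assumes a Django project with rest_framework in INSTALLED_APPS and a hypothetical Article model, and it shows the usual serializer/viewset/router pattern rather than the only possible wiring.

# Hypothetical sketch: a browsable read/write API for an Article model
# with Django REST framework ('rest_framework' must be in INSTALLED_APPS).
from rest_framework import routers, serializers, viewsets

from myapp.models import Article  # hypothetical model with title/body fields

class ArticleSerializer(serializers.ModelSerializer):
    class Meta:
        model = Article
        fields = ('id', 'title', 'body')

class ArticleViewSet(viewsets.ModelViewSet):
    # list/retrieve/create/update/delete plus the free browsable API
    queryset = Article.objects.all()
    serializer_class = ArticleSerializer

# In urls.py, include router.urls to expose the /articles/ endpoints.
router = routers.DefaultRouter()
router.register(r'articles', ArticleViewSet)
urlpatterns = router.urls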

Debugging

  • Debug toolbar. Ever wondered why your app is so freaking slow? Debug toolbar is a nice plugin that will show you all the SQL queries Django is doing to render your page, and much more. Install it via pip install django-debug-toolbar
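Getting the toolbar running is mostly a settings.py affair. A rough sketch of a development configuration follows; the exact hooks vary a little between toolbar and Django versions, so treat the names as illustrative and check the docs for yours.

# settings.py sketch for django-debug-toolbar, development only.
DEBUG = True

INSTALLED_APPS += ('debug_toolbar',)

MIDDLEWARE_CLASSES += (
    'debug_toolbar.middleware.DebugToolbarMiddleware',
)

# The toolbar only renders for requests coming from these addresses.
INTERNAL_IPS = ('127.0.0.1',)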

Static Assets

  • Django Storages is a powerful and configurable plugin that makes storing your static assets on an external service super easy. Simply run python manage.py collectstatic after installing it to copy all modified static files to your chosen backend. The most popular add-on works with the python-boto library to let you store those files on Amazon S3 using their cheap, easy-to-use, and fast file buckets (a minimal S3 configuration sketch follows this list). Install it via pip install django-storages

  • Django Pipeline is a static asset packaging library for Django, providing both CSS and JavaScript concatenation and compression. Supporting multiple compilers (LESS, SASS, et al.) and multiple compressors for CSS and JS, it gives you plenty of customizability. Pipeline also works nicely with Django Storages and other storage backends. Install it with pip install django-pipeline.
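As noted in the Django Storages item above, here is a minimal S3 configuration sketch. The bucket name and keys are placeholders, it assumes django-storages and boto are installed, and the setting names can differ slightly between releases.

# settings.py sketch: serve collected static files from S3 via
# django-storages + boto. Bucket name and credentials are placeholders.
INSTALLED_APPS += ('storages',)

STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

AWS_ACCESS_KEY_ID = 'your-access-key'         # placeholder
AWS_SECRET_ACCESS_KEY = 'your-secret-key'     # placeholder
AWS_STORAGE_BUCKET_NAME = 'my-static-bucket'  # placeholder

# After this, `python manage.py collectstatic` uploads straight to the bucket.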

Utils

  • Reversion provides version control facilities for your models. With a few configuration lines, you can recover deleted models or roll back to any point in a model's history; a minimal usage sketch follows this list. The integration with the Django admin interface takes seconds. Install it via pip install django-reversion

  • Django extensions is a collection of 17 custom extensions for the Django framework. The most notable ones are: shell_plus, a shell that autoloads the apps' database models; RunScript, to run scripts in the Django context; graph_models, to render a graphical overview of your models (it's extremely useful); and sqldiff, to print the ALTER TABLE statements for the given app names. Install it via pip install django-extensions

  • Django braces is a collection of reusable, generic mixins for Django providing common behaviours and patterns for views, forms, and other components. Very effective at removing boilerplate. Install it via pip install django-braces
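And the Reversion sketch promised above: a hypothetical Page model is saved inside a revision block, then its stored versions are listed. The registration and query APIs have shifted a little between django-reversion releases, so treat this as an approximation rather than gospel.

# Hypothetical sketch of django-reversion: record a revision when saving,
# then list the stored versions for that object. Assumes the Page model
# has been registered with reversion (e.g. via @reversion.register()).
import reversion
from reversion.models import Version

from myapp.models import Page  # hypothetical registered model

def rename_page(page, new_title):
    with reversion.create_revision():
        page.title = new_title
        page.save()
        reversion.set_comment("Renamed page")

def history(page):
    # Every saved revision for this object, newest first.
    return Version.objects.get_for_object(page)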

PS: loads of useful comments on Hacker News.


Sense · Cloud Data Science Platform

Lessons from working 6 months on a math problem (and failing)


Comments:"Lessons from working 6 months on a math problem (and failing)"

URL:http://alexandros.resin.io/lessons-from-working-6-months-on-a-math-problem-and-failing/


I'm a coder, most certainly not a mathematician. But when I saw the 17x17 challenge in late 2011, I couldn't resist having a crack at it. You can read a bit more on the problem itself here. Each day, I couldn't resist spending another day on it. Long story short, despite some temporary success, I didn't end up solving the problem. But I consider that time spent, about 6 months of full-time effort, to have been an incredibly worthwhile investment. Below is the furthest I got, a grid with 3 monochromatic rectangles (every corner of each marked rectangle is the same colour). Still not zero, which was the goal, but for a few months it was the 'least broken' solution known.

It might not look like much, but it took crazy amounts of time to find the right colourings for those damn cells. The solution ended up being found by means of a SAT solver, but here's what I learned in the process of not finding it:

1. A problem which is easy to state can addict quite severely

I was able to get some people hooked on the problem just by explaining it to them in a few minutes. The same process probably happened with me and lots of others on the internet who got caught up with this problem over the years it took to solve. Maybe there's something there to learn about motivation, or perhaps asymmetries like this are just a rare occurrence.

2. Optimisation with instant feedback is incredibly addictive

As I started writing code for this problem, I found out I could work on it for hours on end without distraction, quite unusual for me. I chalk this up to a tight feedback loop. Have an idea, implement it, get back a number. Think of ways to improve the number, start the cycle all over again. This is a cycle I would run dozens of times a day. Obviously more fundamental ideas would take more time to see the fruits of, and that's when I actually lost focus, but when the brain has been rewarded so richly with small cycles, it can afford to go a bit longer without reinforcement. This tells me that A/B testing or a similar numeric optimisation area would be quite motivating to me and I've made a mental note to go into this in the future.

3. There are some really big numbers out there

The space of possible colourings for the 17x17 problem is almost 10^174. Comprehending this number is beyond the human mind. If you took the number of all particles in the observable universe and squared it, you would still be a factor of 10^14 off.
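The back-of-the-envelope arithmetic is easy to check. The sketch below assumes the standard formulation of the challenge, a 17x17 grid where each cell takes one of 4 colours, and the usual ~10^80 order-of-magnitude estimate for particles in the observable universe.

# Rough check of the numbers above: 4 colours on a 17x17 grid,
# compared against (particles in the observable universe) squared.
from math import log10

colourings = 4 ** (17 * 17)
print(round(log10(colourings)))                         # ~174, i.e. roughly 10^174

particles = 10 ** 80                                    # common estimate
print(round(log10(colourings) - 2 * log10(particles)))  # ~14 orders of magnitude short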

While I can't say I grasped that number in the slightest, I do feel it has recalibrated my sense of what 'big' is. The earth's population which nears 7 billion now feels like a decidedly small number.

4. The value of optimisations at different levels of the stack

Once my basic strategy was set, a lot of the work came down to optimisation. I categorise that work into three fairly separate categories: mathematical, algorithmic, and implementation/hardware.

By mathematical optimisations, I mean ones where discovering a symmetry in the search space allowed me to prove that solving one case equated to solving a whole class of cases, as each member of the class was equivalent to every other member. If I discovered a symmetry that folded 32 cases into one, I automatically had a 32x speedup.

Algorithmic optimisations are more pedestrian. A certain computation needs to be made, and finding a better way to compute it means speeding up the whole process, since as a rule these computations ran millions of times. By the end, I realised that indeed the best way is to have the code do almost nothing.

Implementation optimisations are ones that have to do with better mapping to hardware. For instance at some point I decided that Python wasn't going to cut it and I'd need to drop to C. By translating my algorithms unchanged into C I got about a 10x improvement. This was a very naive translation done in a day, without me having much experience in C past some classes in university.

Of course, I also cheated. I realised that my code spent a huge amount of time counting the set bits on a byte. I could implement that on my side, or get a CPU that implements it as a single instruction. I did spend quite some time implementing faster versions of it, mostly ending up with lookup tables, but at the end of the day, the i7 family of CPUs has POPCNT as a built-in instruction. Moving to the i7 caused a big speedup, and using the POPCNT builtin was another large boost to speed. Unfortunately I don't have exact data but it was certainly integer multiples of the previous level of performance.
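To make the set-bit-counting idea concrete, here is a small illustration, in Python rather than the C the author actually used, of the two approaches he describes: a per-byte lookup table versus leaning on a built-in, the software analogue of the i7's POPCNT instruction.

# Illustration of the two popcount strategies mentioned above.
# The author's real code was C with the hardware POPCNT instruction;
# this Python version just shows the shape of the lookup-table trick.

# Precompute the popcount of every possible byte value once.
POPCOUNT_BYTE = [bin(b).count("1") for b in range(256)]

def popcount_table(x):
    # Count set bits by summing per-byte table lookups.
    total = 0
    while x:
        total += POPCOUNT_BYTE[x & 0xFF]
        x >>= 8
    return total

# Built-in equivalent: int.bit_count() on Python 3.10+, or bin(x).count("1").
assert popcount_table(0b10110110) == bin(0b10110110).count("1") == 5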

The vast array of optimisation techniques I could bring to bear was definitely a surprise. Generally the mathematical optimisations tended to dominate whenever I could get them, but each optimisation level in aggregate yielded many orders of magnitude worth of improvement in my final result. While I don't have exact numbers, I do recall that a single run went from around 4 minutes to tens of ms, and then the mathematical optimisations reduced the amount of runs needed from a theoretical 2 * 10^20 to about 100 million. All in all, by the end I was able to do a complete run in a bit over a day on 2 cores of an i7. That run produced the matrix above, but unfortunately, no solution.

5. The interplay between thinking in code and thinking on paper

It's clear that an algorithmic breakthrough can be worth 1000 micro-optimisations. But sometimes the opposite happened. Glancing through the results of an algorithmic run showed a pattern I wasn't aware of. Taking that back to paper allowed me to make further theoretical breakthroughs that sped up runtime by additional orders of magnitude. So getting my hands dirty with code, besides giving non-trivial speedups, also hinted at mathematical symmetries not to be sneezed at.

6. Deep understanding of combinatorics and symmetries

In other words, I learned some math on the intuitive level. My knowledge of symbols is still probably poor, but I do have a mental toolkit on combinatorics developed that I trot out once in a while, and will probably invest more in in the future. I hear combinatorics is a good avenue to probability, and probability is something I really want to get good at.

7. Intuitive understanding of trees

Working through combinatorics, one cannot avoid getting very familiar with the tree data structure. It's not that I couldn't understand a tree before, but there is a whole other level of 'burnt in' that something can get when you keep digging those neural pathways day in and day out. It helped me work on parsers and grammars soon after, and to this day I think I use the tree structure to consider possible avenues of action in my current role as 'business guy'.

8. If your approach is wrong, hard work doesn't matter

Despite the enormous amount of work I put in, I bet on an early assumption that I knew would make or break the effort. I did this knowingly as I felt I couldn't beat more experienced mathematicians without cheating somehow. My assumption ended up being wrong, and therefore my code yielded no results. No matter how hard I worked, it wouldn't have made a difference.

This is always something to keep in mind when working on startups. If your high-order bit is set wrong, you can optimise the lower ones to infinity and while the final cause of death might be 'ran out of money', the effort may be finished before it starts for reasons like 'chose too small a market'.

9. If you're passionate enough, it may be the direct results that don't matter.

This is not to say that just because you might fail you shouldn't try. While working on this problem, I knew full well I could and probably would fail. But doing a mental check on the pleasure and learning I drew from the exercise, I constantly came up positive, such that actually finding a solution would be the icing on the cake. I feel the same way about my startup. I want it to succeed, in fact I'm doing my absolute best to ensure it does, but at the same time I know I'm already on the plus side in terms of experience gained such that if it all was to evaporate tomorrow, I'd still have been more than adequately rewarded for my efforts.

10. The immense joy of shutting everything out and falling into absolute focus

I consider my focus to be very fragile. It has taken many years of trial and error (or just aging) to approach some stable level of productivity, and even this is definitely not 100%. While working on the 17x17 however, I was able to experience something completely new for me: absolute focus approaching bliss. Sometimes I'd start working in the morning, and when my girlfriend would come back home in the evening, I would feel like I was waking up into reality, only to spend a few hours resting to start again next morning. There's something special to be said about that, regardless of all the other lessons above. My girlfriend did mention that I may be slipping into Uncle Petros territory, but thankfully I was able to stop in time to finish my PhD and move on.

I'd like to avoid closing with a cliché worthy of a whiskey commercial such as "...Because sometimes, the journey is worth more than the destination...". Instead, I'll just say that I'd trade several of my minor successes for another failure this good. Maybe another problem will come and capture my imagination in a similar way in the future.

Fun aside: HN user TN1ck did up the actual 17x17 solution in d3.js and SVG/AngularJS, using the technique described in my previous post. Cool stuff!

Udacity's Sebastian Thrun, Godfather Of Free Online Education, Changes Course | Fast Company | Business + Innovation


Comments:"Udacity's Sebastian Thrun, Godfather Of Free Online Education, Changes Course | Fast Company | Business + Innovation"

URL:http://www.fastcompany.com/3021473/udacity-sebastian-thrun-uphill-climb


There's a story going around college campuses--whispered about over coffee in faculty lounges, held up with great fanfare in business-school sections, and debated nervously by chain-smoking teaching assistants.

It begins with a celebrated Stanford University academic who decides that he isn't doing enough to educate his students. The Professor is a star, regularly packing 200 students into lecture halls, and yet he begins to feel empty. What are 200 students in an age when billions of people around the world are connected to the Internet?

So one day in 2011, he sits down in his living room with an inexpensive digital camera and starts teaching, using a stack of napkins instead of a chalkboard. "Welcome to the first unit of Online Introduction to Artificial Intelligence," he begins, his face poorly lit and slightly out of focus. "I'll be teaching you the very basics today." Over the next three months, the Professor offers the same lectures, homework assignments, and exams to the masses as he does to the Stanford students who are paying $52,000 a year for the privilege. A computer handles the grading, and students are steered to web discussion forums if they need extra help.

Some 160,000 people sign up: young men dodging mortar attacks in Afghanistan, single mothers struggling to support their children in the United States, students in more than 190 countries. The youngest kid in the class is 10; the oldest is 70. Most struggle with the material, but a good number thrive. When the Professor ranks the scores from the final exam, he sees something shocking: None of the top 400 students goes to Stanford. They all took the class on the Internet. The experiment starts to look like something more.

Higher education is an enormous business in the United States--we spend approximately $400 billion annually on universities, a figure greater than the revenues of Amazon, Apple, Facebook, Google, Microsoft, and Twitter combined--and the Professor has no trouble rounding up a group of Silicon Valley's most prestigious investors to support his new project. The Professor's peers follow suit: Two fellow Stanford faculty members launch a competing service the following spring, with tens of millions of dollars from an equally impressive group of backers, and Harvard and MIT team up to offer their own platform for online courses. By early 2013, nearly every major institution of higher learning--from the University of Colorado to the University of Copenhagen, Wesleyan to West Virginia University--will be offering a course through one of these platforms.

Suddenly, something that had been unthinkable--that the Internet might put a free, Ivy League–caliber education within reach of the world's poor--seems tantalizingly close. "Imagine," an investor in the Professor's company says, "you can hand a kid in Africa a tablet and give him Harvard on a piece of glass!" The wonky term for the Professor's work, massive open online course, goes into such wide use that a New York Times headline declares 2012 the "Year of the MOOC." "Nothing has more potential to lift more people out of poverty," its star columnist Thomas Friedman enthuses, terming the new category "a budding revolution in global online higher education."

It is a good story, as well manicured as a college quad during homecoming weekend. But there's a problem: The man who started this revolution no longer believes the hype.

"I'd aspired to give people a profound education--to teach them something substantial," Professor Sebastian Thrun tells me when I visit his company, Udacity, in its Mountain View, California, headquarters this past October. "But the data was at odds with this idea."

As Thrun was being praised by Friedman, and pretty much everyone else, for having attracted a stunning number of students--1.6 million to date--he was obsessing over a data point that was rarely mentioned in the breathless accounts about the power of new forms of free online education: the shockingly low number of students who actually finish the classes, which is fewer than 10%. Not all of those people received a passing grade, either, meaning that for every 100 pupils who enrolled in a free course, something like five actually learned the topic. If this was an education revolution, it was a disturbingly uneven one.

"We were on the front pages of newspapers and magazines, and at the same time, I was realizing, we don't educate people as others wished, or as I wished. We have a lousy product," Thrun tells me. "It was a painful moment." Turns out he doesn't even like the term MOOC.

When Thrun says this, I nearly fall out of my chair. He is arguably the most famous scientist in the world--and perhaps only Elon Musk bests him in successfully persuading regular people to embrace wild ideas. Thrun has been a public figure since 2005, when a modified Volkswagen Touareg of his design won a Department of Defense–sponsored competition that pitted cars without drivers through a 128-mile, pedestrian-free course in the Mojave Desert. That such a competition almost seems ho-hum eight years later is itself a tribute to Thrun's genius. He joined Google in 2007, where he led the program to develop its self-driving car, and then founded Google X, the ultra-secretive research lab behind Google Glass and other research projects so far-out that Google calls them "moon shots."

But building a company is different from building a research lab. It requires compromises, humility, and, crucially, taking in more money than you spend. And it's why Thrun might be giving up the moon--free education for all! Harvard on a piece of glass!--in favor of something far more pedestrian. It will be, Thrun admits, "the biggest shift in the history of the company," a pivot that involves charging money for classes and abandoning academic disciplines in favor of more vocational-focused learning. In short, Thrun must prove that Udacity is something more than a good story.

Sebastian Thrun is in a hurry.

"Let's just get dressed here," he says, leading me into an empty suite two floors below the Udacity offices. He tosses a pair of bike cleats and a Lycra cycling kit onto the ground, kicks off his sneakers, starts taking off his pants, and then motions for me to do the same. "I don't mind," he says. There's no locker room at the Udacity office, so he's led me downstairs, into a part of the building that is still under construction--never mind the floor-to-ceiling windows. After a few awkward seconds, I move into an adjacent room, throw on my gear, and follow Thrun east toward the Los Altos Hills.

Thrun, who is 46 years old and originally from Germany, is a committed athlete who possesses that outdoorsy vigor (and lack of physical modesty) often found in middle-aged European men. He has run half a dozen marathons; he snowboards; he kite-surfs; and he is an avid road cyclist. "I haven't been biking as much as I'd normally like to," Thrun confesses before we set out, explaining that he's done "only two" centuries, or 100-mile bike rides, this year.

I'd been warned that keeping up with Thrun tends to be a challenge in any setting, but I hadn't entirely appreciated it until Thrun clipped into his custom-made road bike and scooted up Arastradero Road, leaving me panting a few lengths behind. "Sebastian is like the smartest guy you've ever met, but on speed," says the entrepreneur Steve Blank, a friend of Thrun's and a Udacity investor. "And he hates to lose."

When I catch up to him, trying not to seem out of breath, he acknowledges that he normally doesn't ride with anyone, for this very reason. "I feel like everyone has this competitive instinct," he says. "And I want to be able to go at my own pace. I have trouble with all of these little decisions of running a company. Being alone--that helps."

The youngest of three children in a lower-middle-class family in Hildesheim, a town of 100,000 just outside Hannover, Thrun was a geeky kid, spending much of his free time in libraries or in front of a NorthStar Horizon home computer, on which he tried to write software programs to solve puzzles and play solitaire. As a lonely undergraduate at an obscure provincial college, Thrun thrust himself into trying to understand people better, dabbling in psychology, economics, and medicine. Eventually, he found his way to what was at the time a relatively obscure field: artificial intelligence, or the study of making machines that make their own decisions. "Nobody phrases it this way, but I think that artificial intelligence is almost a humanities discipline," Thrun says. "It's really an attempt to understand human intelligence and human cognition."

Thrun seems to owe much of his academic success to this early insight. As his peers wrestled with theoretical quandaries and high mathematics, Thrun's work had a romantic, populist flair. He designed and built robots around human problems, and gave them accessible names. Rhino, part of his thesis project at the University of Bonn, gave guided tours of the local museum. During a stint at Carnegie Mellon University, Thrun developed Pearl, a Jetsons-like "nursebot" with a human-looking face, to assist in elder-care facilities. His greatest achievement, though, was Stanley, the autonomous car that won Stanford a $2 million Defense Department prize and won Thrun the notice of Google cofounder Larry Page.

Thrun and his team originally planned to spin their research out into their own company that would create detailed images of the world's roads, using car-mounted cameras like the ones used to steer Stanley. Page offered to hire them instead. The collaboration helped lay the groundwork for Google Street View, and eventually for the fleet of self-driving Google-branded Priuses that these days navigate rush-hour traffic on Bay Area freeways without incident. Page and cofounder Sergey Brin went on to ask Thrun to launch Google X.

His trip in March of 2011 to the TED Conference in Long Beach, California, where he delivered a talk about his work, led to an unexpected change in his plans. Thrun movingly recounted how a high school friend had been killed in a car accident, the result of the kind of human error that self-driving cars would eliminate. Although he was well received, Thrun was upstaged by a young former hedge-fund analyst named Sal Khan, who spoke of using cheaply produced, wildly popular web videos to tutor millions of high school students on the Internet. Thrun's competitive streak kicked in. "I was a fully tenured Stanford professor . . . and here's this guy who teaches millions," he would later recount. "It was embarrassing." Though Thrun insists the timing was coincidental, just a few weeks later, he informed Stanford that he would be giving up tenure and joining Google full time as a VP. (He did continue teaching and is still a faculty member.)

Initially, Udacity was just another modest research project on Thrun's docket; he didn't even bother warning the higher-ups in the computer science department until after he had announced that first AI class. After two weeks, more than 56,000 students had signed up. "The conversation took a radically different turn," says Blank of his friend's interaction with Stanford after the response far outpaced anyone's expectations. The university was initially cool to the idea but ultimately embraced it, allowing two other computer science courses to be offered in the same manner. (Blank's popular entrepreneurship class at Stanford would eventually be offered on Udacity as well.) Thrun contributed $300,000 of his own money in seed funding, installed one of his old Stanford graduate students, David Stavens, as CEO of the new company, and set about recording crude course videos about Markov models and the like.

"It was this catalytic moment," Thrun says. "I was educating more AI students than there were AI students in all the rest of the world combined."

"It was this catalytic moment," Thrun says. "I was educating more AI students than there were AI students in all the rest of the world combined." By the end of the semester, he'd raised another $5 million and was standing in front of the Digital Life Design conference in Munich, promising a world in which education was nearly free, available to poor people in the developing world, and better than anything that had come before it. "I can't teach at Stanford again," he said definitively. "I feel like there's a red pill and a blue pill. And you can take the blue pill and go back to your classroom and lecture your students. But I've taken the red pill. I've seen Wonderland."

It's hard to imagine a story that more thoroughly flatters the current sensibilities of Silicon Valley than the one into which Thrun stumbled. Not only is reinventing the university a worthy goal--tuition prices at both public and private colleges have soared in recent years, and the debt burden borne by American students is more than $1 trillion--but it's hard to imagine an industry more ripe for disruption than one in which the professionals literally still don medieval robes. "Education hasn't changed for 1,000 years," says Peter Levine, a partner with Andreessen Horowitz and a Udacity board member, summing up the Valley's conventional wisdom on the topic. "Udacity just seemed like a fundamentally new way to change how communities of people are educated."

The dream that new technologies might radically disrupt education is much older than Udacity, or even the Internet itself. As rail networks made the speedy delivery of letters a reality for many Americans in the late 19th century, correspondence classes started popping up in the United States. The widespread proliferation of home radio sets in the 1920s led such institutions as New York University and Harvard to launch so-called Colleges of the Air, which, according to an article in The Chronicle of Higher Education, prompted a 1924 journalist to contemplate a world in which the new medium would be "the chief arm of education" and suggest that "the child of the future [would be] stuffed with facts as he sits at home or even as he walks about the streets with his portable receiving-set in his pocket." Udacity wasn't even the first attempt to deliver an elite education via the Internet: In 2001, MIT launched the OpenCourseWare project to digitize notes, homework assignments, and, in some cases, full video lectures for all of the university's courses.

And yet, all of these efforts have been hampered by the same basic problem: Very few people seem to finish courses when they're not sitting in a lecture hall. Udacity employs state-of-the-art technology and sophisticated pedagogical strategies to keep their users engaged, peppering students with quizzes and gamifying their education with progress meters and badges. But a recent study found that only 7% of students in this type of class actually make it to the end. (This is even worse than for-profit colleges such as the University of Phoenix, which graduates 17% of its full-time online students, according to the Department of Education.) Although Thrun initially positioned his company as "free to the world and accessible everywhere," and aimed at "people in Africa, India, and China," the reality is that the vast majority of people who sign up for this type of class already have bachelor's degrees, according to Andrew Kelly, the director of the Center on Higher Education Reform at the American Enterprise Institute. "The sort of simplistic suggestion that MOOCs are going to disrupt the entire education system is very premature," he says.

Thrun had assumed that low completion rates in his early classes would be temporary, and during Udacity's early days he continued to spend most of his time at Google, recording his Udacity classes in the middle of the night. His investors had been urging him to expand his role for months, and in May 2012, Thrun informed Page and Brin that he'd have to step down from Google X to focus on Udacity. For the first time in his life, he was now CEO of a company. "There was no one who understood the nuances of what he was trying to accomplish as well as Sebastian did," says Levine, who led a $15 million investment in Udacity, on behalf of Andreessen Horowitz, in October 2012. (Thrun still serves as a part-time consultant to Google X, spending one day a week working there.) "If it hadn't been for Sebastian," says Levine, "we wouldn't have done this investment."

Thrun initially approached the problem of low completion rates as one that he could solve single-handedly. "I was looking at the data, and I decided I would make a really good class," he recalls. Statistics 101, taught by the master himself and recorded that summer, is interactive and full of accessible analogies. Most important, it is designed so that students who are not particularly adept at math or programming can make it through. Thrun told me that he tried to smile whenever he was recording a voice-over, so that even though he couldn't be seen, his enthusiasm for the subject would be imputed to his online students. "From a pedagogical perspective, it was the best I could have done," he says. "It was a good class."

Only it wasn't: For all of his efforts, Statistics 101 students were not any more engaged than any of Udacity's other students. "Nothing we had done had changed the drop-off curve," Thrun acknowledges.

He then set about a number of other initiatives to address this thorny problem, including hiring "mentors," many of them former academics looking for a change, to moderate class forums and offer help via live chats. But he also pursued the more obvious way to incentivize students to finish their courses: He offered college credit. In late 2012, Thrun proposed a collaboration to California Governor Jerry Brown, who had been struggling to cope with rising tuition costs, poor student performance, and overcrowding in state universities. At a press conference the following January, Brown and Thrun announced that Udacity would open enrollment in three subjects--remedial math, college algebra, and elementary statistics--and they would count toward credit at San Jose State University, a 30,000-student public college. Courses were offered for just $150 each, and students were drawn from a lower-income high school and the underperforming ranks of SJSU's student body. "A lot of these failures are avoidable," Thrun said at the press conference. "I would love to set these students up for success, not for failure."

Viewed within this frame, the results were disastrous. Among those pupils who took remedial math during the pilot program, just 25% passed. And when the online class was compared with the in-person variety, the numbers were even more discouraging. A student taking college algebra in person was 52% more likely to pass than one taking a Udacity class, making the $150 price tag--roughly one-third the normal in-state tuition--seem like something less than a bargain. The one bright spot: Completion rates shot through the roof; 86% of students made it all the way through the classes, better than eight times Udacity's old rate. (The program is supposed to resume this January; for more on the pilot, see "Mission Impossible.")

But for Thrun, who had been wrestling over who Udacity's ideal students should be, the results were not a failure; they were clarifying. "We were initially torn between collaborating with universities and working outside the world of college," Thrun tells me. The San Jose State pilot offered the answer. "These were students from difficult neighborhoods, without good access to computers, and with all kinds of challenges in their lives," he says. "It's a group for which this medium is not a good fit."

BEEP-BEEP-BEEP.

A 43-year-old instructor named Chris Wilson sits hunched over a tablet computer in a soundproof recording studio--one of three in Udacity's offices--and hits a button that emits three quick tones that indicate the start of a new take.

The room is dark except for two bright drafting lamps pointed at the table. A digital camera mounted above his head records everything he writes, and a small headset microphone--the kind worn by megachurch pastors and TED talkers--records everything he says. Lounging on a beanbag chair just outside the studio is Udacity course developer Sean Bennett, who is staying close at hand in case Wilson needs help with a last-minute revision. All Udacity classes are scripted and storyboarded in advance by the same five-person in-house team, which means they generally look more uniform and polished than those offered by the competition. "A lot of the scripting process is thinking about what the students are going to be doing," Bennett says. "The words are mostly Chris's."

I watch as Wilson--a big man with wavy shoulder-length hair, wearing a baggy T-shirt and cargo shorts--struggles to communicate a web-development concept called fluid layout, which allows pages to render properly on differently sized screens. "Now, fluid layout means I should stop fixing all those width--eh. All right."

He tries again, and then stumbles a few words later. "The average for me is probably about three takes," he says.

If Wilson seems slightly unprofessional as an educator, that's because his only formal teaching credential is as an assistant scuba-diving instructor. Wilson works at Google as a developer advocate in the company's Chrome division. His class was conceived, and paid for, by Google as a way to attract developers to its platforms. Over the past year, Udacity has recruited a dozen or so companies, including Autodesk, Intuit, Cloudera, Nvidia, 23andMe, and Salesforce.com, which had sent a couple of reps to discuss a forthcoming course on how to best use its application programming interface, or API. The companies pay to produce the classes and pledge to accept the certificates awarded by Udacity for purposes of employment.

Udacity won't disclose how much it is making, but Levine of Andreessen Horowitz says he's pleased. "The attitude from the beginning, about how we'd make money, was, 'We'll figure it out,'" he says. "Well, we figured it out."

Thrun, ever a master of academic branding, terms this sponsored-course model the Open Education Alliance and says it is both the future of Udacity and, more generally, college education. "At the end of the day, the true value proposition of education is employment," Thrun says, sounding more CEO than professor. "If you focus on the single question of who knows best what students need in the workforce, it's the people already in the workforce. Why not give industry a voice?"

"At the end of the day, the true value proposition of education is employment," Thrun says, sounding more CEO than professor. "Why not give industry a voice?"

Thrun's friends and colleagues repeatedly told me that he has a great capacity for intellectual flexibility. "Most founder–CEOs have this belief that their vision of the universe will prevail and everyone else's vision will lose," says George Zachary, a partner with Charles River Ventures and Thrun's first investor. "Sebastian is the opposite. He's so far away from Steve Jobs on the CEO spectrum, it's amusing."

Still, I couldn't help but feel as if Thrun's revised vision for Udacity was quite a comedown from the educational Wonderland he had talked about when he launched the company. Learning, after all, is about more than some concrete set of vocational skills. It is about thinking critically and asking questions, about finding ways to see the world from different points of view rather than one's own. These, I point out, are not skills easily acquired by YouTube video.

Thrun seems to enjoy this objection. He tells me he wasn't arguing that Udacity's current courses would replace a traditional education--only that it would augment it. "We're not doing anything as rich and powerful as what a traditional liberal-arts education would offer you," he says. He adds that the university system will most likely evolve to shorter-form courses that focus more on professional development. "The medium will change," he says.

It might already be changing. This January, several hundred computer science students around the world will begin taking classes for an online master's degree program being jointly offered by Udacity and the Georgia Institute of Technology. Fees will be substantial--$6,600 for the equivalent of a three-semester course of study--but still less than one-third of what an in-state student would pay at Georgia Tech, and one-seventh of the tuition charged to an out-of-state one.

It's a bold program, partly because it is the first accredited degree to be offered by a provider of massive open online courses, but also because of how it's structured. Georgia Tech professors will teach the courses and handle admissions and accreditation, and students will get a Georgia Tech diploma when they're done, but Udacity will host the course material. Thrun expects the partnership to generate $1.3 million by the end of its first year. The sum will be divided 60-40 between the university and Udacity, respectively, giving the startup its single largest revenue source to date.

Crucially, the program won't ultimately cost either Udacity or Georgia Tech anything. Expenses are being covered by AT&T, which put up $2 million in seed capital in the hope of getting access to a new pool of well-trained engineers. "There's a recruiting angle for us, but there's also a training angle," says Scott Smith, an SVP of human resources at the telco. Though Smith says the grant to Georgia Tech came with no strings attached, AT&T plans to send a large group of its employees through the program and is in talks with Udacity to sponsor additional courses as well. "That's the great thing about this model," Smith says. "Sebastian is reaching out to us and saying, 'Help us build this--and, oh, by the way, the payoff is you get instruction for your employees.'" Says Zachary, "The Georgia Tech deal isn't really a Georgia Tech deal. It's an AT&T deal."

I first became acquainted with Thrun's work nearly 10 years ago, in a very traditional university setting. I was getting my bachelor's degree in English--an experience that, I must say, taught me very little of obvious professional value but nonetheless seemed worth the outrageously high price--and had been required to take three science classes. In the final semester of my senior year, I took an introduction to mechanical engineering, where the professor showed us a video of the first DARPA Grand Challenge. I remember being moved by the quiet beauty of a driverless car winding up hills in an empty desert, and when I saw pictures of Stanley the following year, I felt a sense of awe, like a little boy getting a good look at a car for the first time.

I tell Thrun this, and he seems flattered. "They put it in the Smithsonian Air and Space Museum," he says proudly. "So now a lot of 8- and 9-year-olds know who I am."

"I hope [my 5-year-old son] can hit the workforce relatively early and engage in lifelong education," Thrun says. "I wish to do away with the idea of spending one big chunk of time learning."

Thrun's 5-year-old son, Jasper, is not yet old enough to be impressed by his father's work, but he's already starting his education. "In my son's kindergarten, they're telling us how to get him into Stanford," he says. "By their advice, I'm doing everything wrong, because I'm trying to make him happy rather than putting him through as many piano lessons as possible." He dreams that his son will take a less conventional view of education. "I hope he can hit the workforce relatively early and engage in lifelong education," Thrun says. "I wish to do away with the idea of spending one big chunk of time learning."

I ask Thrun if it isn't odd that someone like him--someone for whom the traditional education system has done so much--would wind up railing against it. "Innovation means change," he says. "I could restrict myself to helping a class of 20 insanely smart Stanford students who would be fine without me. But how could that impact not be dwarfed by teaching 160,000 students?"

All visionary entrepreneurs must, at some point, find their own sense of romance in the compromises they make to build a profitable business, and the size of the crowd is where Thrun finds his. He's moved by the idea of many, many students from many, many places learning something because of him--even if it's something as mundane as a Salesforce.com API. I have a hard time believing that he really wants his son to get Salesforce certified rather than Stanford educated, but in this one thing Thrun seems entirely earnest.

Two days after our bike ride, I return to the Udacity offices, where Thrun is rerecording a segment for his statistics class. He'd mistakenly used an incorrect notation in writing out a math problem, and he's returned to the studio to get it right, spending an hour or so alone in the dark room, talking into the microphone and scribbling on a tablet. "It's kind of like being onstage, where you have all these lights in your face and can't see the audience, but you still have to be able to excite them," he says. "So I think of the football stadium full of people that I'm facing. I get a kick out of that." Thrun's taken the red pill. There's no going back.

Frameworkless JavaScript – Why Angular, Ember, or Backbone don't work for us

India Manages to Free Itself of Polio - WSJ.com

Best development book I've read, has no code in it.


Comments:"Best development book I've read, has no code in it."

URL:http://arasatasaygin.com/pages/best-development-book-I-read-has-no-code-in-it.html


ARAS ATASAYGIN

Apprenticeship Patterns by Dave Hoover and Adewale Oshineye is a great guidance book for technical craftsmen. For me, the main doctrine of the book is walking the Long Road. As the book says: "The people who walk The Long Road are not the heroes who sprint for a few years and burn out—they are the people moving at a sustainable pace for decades."

Here are some of my highlights from the book:

Mastering is more than just knowing. It is knowing in a way that lightens your load.

If you’re worried that your current job is rotting your brain, it probably is.

The best way to learn is to be in the same room with people who are trying to achieve some goal using the skills you wish to learn.

“How long will it take to master aikido?” a prospective student asks. “How long do you expect to live?” is the only respectable response.

Expose Your Ignorance. Tomorrow I need to look stupider and feel better about it. This staying quiet and trying to guess what’s going on isn’t working so well.

Just as the runner training for a marathon develops stronger leg muscles. She’s not training to have strong legs; she’s training to run. Like the motivated developer who after working on a Python project for two years achieves a deep knowledge of Python, the marathon runner’s strong leg muscles are a means, not an end.

Be the Worst. Be the lion’s tail rather than the fox’s head! Surround yourself with developers who are better than you. Find a stronger team where you are the weakest member and have room to grow.

Software development is composed of two primary activities: learning and communication.

You have been drinking steadily through a straw. But there are seasons in an apprenticeship when one must drink from the fire hose of information available to most software developers. Expanding your ability to take in new information is a critical, though sometimes overwhelming, step for apprentices. You must develop the discipline and techniques necessary to efficiently absorb new information, as well as to understand it, retain it, and apply it.

We can all benefit by doing occasional “toy” programs, when artificial restrictions are set up, so that we are forced to push our abilities to the limit.

If you hang on long enough, people will start calling you “experienced,” but that should not be your goal. All experience indicates is that you have been able to survive. It does not indicate the amount you have learned, only the time you have spent. Your goal should be to become skilled rather than experienced.

Software is not a product, it’s a medium for storing knowledge. Therefore, software development is not a product producing activity, it is a knowledge acquiring activity.

Sometimes the best tool for the job and the one you’re most familiar with are different. At those times, you have to decide if your productivity is more important than the team’s productivity.

Being a genius, lucky, rich, or famous doesn’t make you a master. These things aren’t essential to craftsmanship. Skill across all facets of software development and the ability to transmit that skill in ways that move the discipline forward are at the heart of the craft.

For a craftsman to starve is a failure; he’s supposed to earn a living at his craft.

Working with masters is the best way to learn a craft.

Check my other posts

How were the pyramids built? British engineer dubbed Indiana James stuns archaeologists with new theory - Mirror Online


Comments:"How were the pyramids built? British engineer dubbed Indiana James stuns archaeologists with new theory - Mirror Online"

URL:http://www.mirror.co.uk/news/weird-news/how-were-pyramids-built-british-3010204



A British engineer dubbed Indiana James has stunned archaeologists by claiming their theories on how pyramids were built are wrong.

Peter James said he believes ancient Egyptians formed the structures by piling up rubble on the inside and attaching bricks later.

His shock claim challenges hundreds of years of accepted belief that the pyramids were built with giant blocks carried up huge ramps.

The structural engineer reckons that would have been impossible, as the ramps would have had to be at least a quarter of a mile long to achieve a workable angle for taking the blocks to such great heights.

Good point: Pyramid. Image: Getty

Mr James, who has spent the past 20 years studying the pyramids, said: "Under the current theories, to lay the two million stone blocks required, the Egyptians would have had to lay a large block once every three minutes on long ramps.

"If that happened, there would still be signs that the ramps had been there, and there aren't any.

"I'm going to have a war with archaeologists. They will say: 'How would you know? You're not an archaeologist.' But if you wanted a house built would you use me or an archaeologist? "Archaeologists have never had the engineering experience."

Mr James, an engineer for 54 years, and his team at Cintec International in Newport, South Wales, are world leaders in restoring ancient structures.

 



Stephen Colbert urged to cancel speech for NSA-linked privacy firm RSA | World news | theguardian.com


Comments:" Stephen Colbert urged to cancel speech for NSA-linked privacy firm RSA | World news | theguardian.com "

URL:http://www.theguardian.com/world/2014/jan/10/stephen-colbert-nsa-linked-privacy-firm-rsa


Privacy rights groups are calling on comedian Stephen Colbert to cancel his guest speaker appearance at a conference organised by RSA, the security firm accused of accepting millions from the National Security Agency to weaken encryption software.

The host of Comedy Central’s Colbert Report is due to be the closing speaker at RSA’s annual conference in San Francisco in February. A number of security experts scheduled to speak at the conference have already dropped out following reports that RSA was paid $10m by the NSA to distribute a flawed encryption algorithm that allowed the security agency to bypass security protections on personal computers and other products.

The Guardian reported last September that the NSA was using a battery of methods to undermine encryption, the codes used to keep users’ data private online. Last month Reuters revealed that RSA was paid $10m by the NSA to incorporate a weakened algorithm into an encryption product called BSafe that would allow the spy agency easier access to protected information.

RSA has been one of the most respected names in online security. It is now part of EMC, one of the world’s largest data storage and cloud computing companies. The payment for the adoption of a flawed system by a company with a long history of championing online privacy caused widespread anger in the tech community.

The company has vehemently denied that it knowingly undermined its own encryption. “Recent press coverage has asserted that RSA entered into a ‘secret contract’ with the NSA to incorporate a known flawed random number generator into its BSafe encryption libraries. We categorically deny this allegation,” it said in a statement last month.

Digital rights group Fight For The Future has now set up an online petition asking Colbert to withdraw from the conference. “Last month, we learned that RSA accepted $10m from the NSA to stick a backdoor in one of their encryption products, and intentionally weaken the safety of the entire internet.

“We know you, Stephen, and we know you love a good backdoor as much as we do – but this is no laughing matter. By colluding with the NSA and covering it up, RSA has endangered all of us,” says the petition.

Earlier this week, Google software engineer Adam Langley, Mozilla’s global chief of privacy Alex Fowler, and six other security and privacy experts announced they would cancel their talks at this year’s conference. “I've become convinced that a public stance serves more than self-aggrandisement, so: I've pulled out of the Cryptographers Panel at RSA 2014,” Langley said via Twitter.

Jeffrey Carr, a respected cybersecurity analyst, has also withdrawn from the conference and called for a boycott. “It's not enough to just talk about how bad this is. RSA's parent EMC, like every other corporation, has a board of directors that is answerable to its shareholders for maximizing revenue. If RSA's customers begin canceling their contracts and/or refuse to buy RSA products, the company's earnings will drop and that's the type of message that forces boards to make changes,” he wrote on his blog.

Holmes Wilson, co-founder of Fight For The Future, said: “Colbert isn’t a technologist but he understands this issue very well. His appearance at this conference will let participants laugh about something that is a very serious issue. I’d like to hear his speech too but this is not the venue.”

RSA and Colbert were not immediately available for comment. 


Swatch Internet Time - Wikipedia, the free encyclopedia


Comments:"Swatch Internet Time - Wikipedia, the free encyclopedia"

URL:https://en.wikipedia.org/wiki/Swatch_Internet_Time


Swatch Internet Time (or beat time) is a decimal time concept introduced in 1998 by the Swatch corporation as part of their marketing campaign for their line of "Beat" watches.

Instead of hours and minutes, the mean solar day is divided up into 1000 parts called ".beats". Each .beat lasts 1 minute and 26.4 seconds. Times are notated as a 3-digit number out of 1000 after midnight. So, @248 would indicate a time 248 .beats after midnight representing 248/1000 of a day, just over 5 hours and 57 minutes.

There are no time zones in Internet Time; instead, the new time scale of Biel Meantime (BMT) is used, based on Swatch's headquarters in Biel, Switzerland and equivalent to Central European Time, West Africa Time, and UTC+1. Unlike civil time in Switzerland and many other countries, Swatch Internet Time does not observe daylight saving time.

History

Swatch Internet Time was announced on October 23, 1998, in a ceremony at the Junior Summit '98, attended by Nicolas G. Hayek, President and CEO of the Swatch Group, G.N. Hayek, President of Swatch Ltd., and Nicholas Negroponte, founder and then-director of the MIT Media Lab. During the Summit, Swatch Internet Time became the official time system for Nation1, an online country (supposedly) created and run by children.

During 1999, Swatch produced several models of watch, branded "Swatch .beat", that displayed Swatch Internet Time as well as standard time, and even convinced a few websites (such as CNN.com) to use the new format. PHP's date() function has a format specifier 'B' which returns the Swatch Internet Time notation for a given time stamp.[1] It is also used as a time reference on ICQ, and the online role-playing game Phantasy Star Online has used it since its launch on the Dreamcast in 2000 to try to facilitate cross-continent gaming (as the game allowed Japanese, American and European players to mingle on the same servers). In March 2001, Ericsson released the T20e, a mobile phone which gave the user the option of displaying Internet Time. Outside these areas, however, it appears to be infrequently used. While Swatch still offers the concept on its website, it no longer markets beat watches.

Beatnik satellite controversy

In early 1999, Swatch began a marketing campaign based around the launch of their Beatnik satellite for a set of Internet Time watches. However, they were criticized for planning to use an amateur radio frequency for broadcasting a commercial message (an act banned by international treaties). Swatch reallocated the transmitter batteries to a different function on the MIR space station, thus the satellite never broadcast.[2]

Description

The concept was touted as an alternative, decimal measure of time. One of the supposed goals was to simplify the way people in different time zones communicate about time, mostly by eliminating time zones altogether.

Instead of hours and minutes, the mean solar day is divided up into 1000 parts called ".beats". Each .beat lasts 1 minute and 26.4 seconds. Although Swatch does not specify units smaller than one .beat, third party implementations have extended the standard by adding "centibeats" or "sub-beats" as a decimal fraction, for extended precision: @248.00.[3][4] One .beat is equal to one decimal minute in French decimal time.

Time zones

There are no time zones; instead, the new time scale of Biel Meantime (BMT) is used, based on the company's headquarters in Biel, Switzerland. Despite the name, BMT does not refer to mean solar time at the Biel meridian (7°15′E), but to the standard time there. It is equivalent to the Central European Time and West Africa Time or UTC+1.

Like UTC, Swatch Internet Time is the same throughout the world. For example, when the time is 875 .beats, or @875, in New York, it is also @875 in Tokyo: 0.875 × 24 hours = 21:00 BMT = 20:00 UTC. Unlike civil time in most European countries, Internet Time does not observe daylight saving time and thus it matches Central European Time during (European) winter and Western European Summer Time, which is observed by the United Kingdom, Ireland, Portugal and the Canary Islands (Spain), during summer.

Notation

The most distinctive aspect of Swatch Internet Time is its notation; as an example, "@248" would indicate a time 248 .beats after midnight, equivalent to a fractional day of 0.248 CET, or 04:57:07.2 UTC. No explicit format was provided for dates, although the Swatch website formerly displayed the Gregorian calendar date in the order day-month-year, separated by periods and prefixed by the letter d (d31.01.99).
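
A minimal Python sketch of the calculation, assuming the floor-of-seconds convention described above (the function name swatch_beats is ours, not an official Swatch API):

```python
from datetime import datetime, timezone, timedelta

# Biel Meantime (BMT) is fixed at UTC+1 with no daylight saving time.
BMT = timezone(timedelta(hours=1))

def swatch_beats(dt=None):
    """Return Swatch Internet Time for an aware datetime as a string like '@248'."""
    if dt is None:
        dt = datetime.now(timezone.utc)
    bmt = dt.astimezone(BMT)
    seconds_since_midnight = bmt.hour * 3600 + bmt.minute * 60 + bmt.second
    # One .beat is 86.4 seconds (86,400 seconds per day / 1,000 .beats).
    beats = int(seconds_since_midnight / 86.4) % 1000
    return f"@{beats:03d}"

# 04:57:08 UTC is 05:57:08 BMT, i.e. just past 248/1000 of the day.
print(swatch_beats(datetime(1999, 1, 31, 4, 57, 8, tzinfo=timezone.utc)))  # @248
```

PHP's date('B') specifier mentioned earlier should produce the same three-digit value, though implementations may differ in how they round at .beat boundaries.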

Time unit conversion

Unit: .beat equivalent
1 day = 1000 .beats
1 hour = 41.6 .beats
1 min = 0.694 .beats
1 s = 0.011574 .beats


NSA makes final push to retain most mass surveillance powers | World news | theguardian.com


Comments:" NSA makes final push to retain most mass surveillance powers | World news | theguardian.com "

URL:http://www.theguardian.com/world/2014/jan/10/nsa-mass-surveillance-powers-john-inglis-npr


The National Security Agency and its allies are making a final public push to retain as much of their controversial mass surveillance powers as they can, before President Barack Obama’s forthcoming announcement about the future scope of US surveillance.

Security officials concede a need for greater transparency and for adjustments to broad domestic intelligence collection, but argue that limiting the scope of such collection would put the country at greater risk of terrorist attacks. 

In a lengthy interview that aired on Friday on National Public Radio (NPR), the NSA’s top civilian official, the outgoing deputy director John C Inglis, said that the agency would cautiously welcome a public advocate to argue for privacy interests before the secret court which oversees surveillance. Such a measure is being promoted by some of the agency’s strongest legislative critics.

Inglis also suggested that the so-called Fisa court have “somebody who would assist them with matters of interpreting technology”, which also has the potential to recast the court’s relationship with the NSA.

Currently, the judges on the panel rely entirely on the NSA to explain how the agency’s complex technological systems work, an institutional disadvantage that judges have highlighted in secret rulings bemoaning “systemic” misrepresentations by the powerful surveillance agency. 

But security officials are arguing strongly against curtailing the substance of domestic surveillance activities. 

While Inglis conceded in his NPR interview that at most one terrorist attack might have been foiled by NSA’s bulk collection of all American phone data – a case in San Diego that involved a money transfer from four men to al-Shabaab in Somalia – he described it as an “insurance policy” against future acts of terrorism.

“I'm not going to give that insurance policy up, because it's a necessary component to cover a seam that I can't otherwise cover,” Inglis said.

White House spokesman Jay Carney said Friday that Obama will unveil his surveillance proposals on January 17. Expectations are high that Obama will follow the recommendations of a review group he set up, which suggested that the responsibility for the bulk domestic call records database should be transferred from the NSA to a third party, such as the phone companies. But Inglis said that would not necessarily mean the end of the program, provided any dataset held outside of NSA had “sufficient depth” and “sufficient breadth” over “the whole haystack” of call records going back for years, and provided “sufficient agility” to the NSA to search it.

That is the subject of a heated dispute between the NSA and privacy advocates at the White House this week. Civil libertarian groups want to ensure that the legal standards for NSA to access phone records are heightened to prevent Americans not suspected of wrongdoing from being caught in surveillance dragnets, and that companies are not required to store data for longer than the 18-month average maximum in the industry.

On Thursday afternoon, a coalition of civil liberties groups met the White House counsel, Kathryn Ruemmler. Michelle Richardson of the American Civil Liberties Union, who was at the meeting, said the coalition’s main message to the White House was to end bulk domestic phone collection, rather than repackage it. 

“Bulk collection is the big one, that’s the big question: whether you continue to spy on Americans or not,” Richardson said after the meeting. “You have to resolve that. That’s all people will remember.”

Inglis was bolstered on Thursday by the new FBI director James Comey, who said he opposed curbing the bureau’s power to collect information from businesses through a non-judicial subpoena called a national security letter. The use of national security letters, which occurs in secret, came under sharp criticism from Obama’s surveillance review panel, which advocated judicial approval over them.

Comey told reporters that such a change would make it harder for his agency to investigate national security issues than to conduct bank fraud investigations.

Surveillance skeptics who left meetings at the White House this week said they believe deliberations are still ongoing internally about how far to scale back US surveillance. Ron Wyden, the Oregon Democrat who sits on the Senate intelligence committee, told the Guardian on Thursday the “debate is clearly fluid”.

On Friday, Wyden and two intelligence committee colleagues, the Democrats Mark Udall and Martin Heinrich, released a letter to Obama in which they urged him to definitively end bulk surveillance on domestic phone data.

“While it might be more convenient for the NSA to collect phone records in bulk rather than directing individual queries to the various phone companies, convenience alone does not justify the collection of the personal information of millions of ordinary, innocent Americans, especially when the same or more information can be obtained in a timely manner using less intrusive methods,” the three senators wrote.

Obama’s White House staff were meeting representatives of tech firms on Friday, concluding a packed week of meetings with surveillance stakeholders.

Even as Inglis, who will soon retire, argued against the restriction of his agency’s powers, he said the burden will be on the NSA to be more forthcoming about what its activities actually are. 

“We have to actually kind of be more transparent going forward, so the American public understands what we do, why we do it, how we do it,” he told NPR.
