Channel: Hacker News 50

Eve Online wages largest war in its 10 year history | Polygon


Comments:"Eve Online wages largest war in its 10 year history | Polygon"

URL:http://www.polygon.com/2014/1/28/5352774/eve-online-wages-largest-battle-in-its-10-year-history


A massive battle involving more than 2,200 players in its main engagement is underway in CCP's massively multiplayer online game Eve Online. It is easily the largest battle in the game's decade-long history, according to Alexander "The Mittani" Gianturco, the CEO of Goonswarm Federation.

An exact real-life currency figure for the in-game damage has not yet been established, but unconfirmed estimates put the amount at more than $200,000. An automated battle report will be generated from server data once the battle concludes. The current battle stems from "The Halloween War," which has instigated multiple conflicts between unofficial political alliances.

"But today's battle happened literally by accident: system control (sovereignty) was dropped in a key staging system due to a mistake," Gianturco told Polygon about the war underway in B-R5 sector in Immensea, a Nulli Secunda system. "This then escalated into the largest battle in Eve history, which delightfully our forces seem to be winning."

Last year, a fight erupted in the lowsec Asakai system after a ship piloted by a member of Goonswarm accidentally jumped into enemy space alone, which escalated as more ships joined the fray to attack and defend. According to Gianturco, the participants in the current fight were involved in last year's battle, which involved more than 2,800 players.

"This is a bigger battle in terms of the hardware and players involved though there's less raw numbers in a single system — this fight is over multiple systems," Gianturco explained. "The main fight is in B-R5, with approximately 2,200 players. The blocs are keeping the numbers intentionally low to make sure our guns don't lag out."

More than 50 Titans, capital class starships, have been destroyed in the 15-hour battle so far. Gianturco estimates that his side has lost 18 to 23 Titans, at the time of writing, and that the other side is down more than 40 Titans so far with "more to come." Notable Titan kills include The Kan's Erebus, which is valued at 222b ISK, or approximately $5,500.

"As vengeance for Asakai goes, it's somewhat ironic; our forces lost three Titans and seven supercarriers last year in Asakai, and lost the battle," he said. "This year we've killed 40+ hostile Titans and we have seven more hours of killing before downtime."

Beginning at 14:30 GMT, there are fewer than seven hours of fighting left. Gianturco explains that their forces — made up of his coalition, the CFC, along with a group called Black Legion and another coalition called the "Russian Bloc" — have already won the battle, and all that remains is mopping-up work.

"We've already won the fight, what remains is just killing work when someone breaks and tries to run from the field it becomes a rout," he said. "They've lost the ability to kill any more of our Titans and we're still killing theirs, as their force quality has degraded. So this is mop-up as they try to extract."

To check out the battle in action, users dobbbnick, fuzzeh and sdeel are streaming the Eve Online fight on Twitch.tv.


Court of Appeals denies IP-block | Duly noted | bureau Brandeis


Comments:"Court of Appeals denies IP-block | Duly noted | bureau Brandeis"

URL:http://bureaubrandeis.com/duly-noted/court-appeals-denies-ip-block/


Today, the Court of Appeals of The Hague rendered its judgment in the appeal of internet service providers XS4ALL and Ziggo against anti-piracy organization BREIN. At first instance, the District Court had allowed BREIN's claims for an IP-block and a DNS-block. The purpose of the block was to prevent the providers' subscribers from accessing The Pirate Bay website.

The Court of Appeals overturned the ruling because the providers could show that the block had not been effective since the first ruling. Applying the case law of the European Court of Justice (ECJ), the Court of Appeals held that an access provider is not under an obligation to take measures that are disproportionate and/or ineffective.

XS4ALL was represented by Christiaan Alberdingk Thijm and Caroline de Vries of bureau Brandeis.

Read the judgment: Arrest Hof Den Haag 20140128 (judgment of the Court of Appeal of The Hague, 28 January 2014).

HubSpot/tether · GitHub


Comments:"HubSpot/tether · GitHub"

URL:https://github.com/HubSpot/tether


Tether

Tether is a JavaScript library for efficiently making an absolutely positioned element stay next to another element on the page.

It aims to be the canonical implementation of this type of positioning, such that you can build products, not positioning libraries.

Take a look at the documentation for a more detailed explanation of why you should star it now to remember it for your next project.

This is the one map you need to understand Ukraine’s crisis


Comments:"This is the one map you need to understand Ukraine’s crisis"

URL:http://www.washingtonpost.com/blogs/worldviews/wp/2014/01/24/this-is-the-one-map-you-need-to-understand-ukraines-crisis/


Ukraine's protests and the 2010 election results. Click to enlarge. (Max Fisher/Washington Post)

After two months of rallies in the capital city of Kiev against President Viktor Yanukovych's decision to reject a deal for closer integration with the European Union, Ukraine's protests are spreading to other major cities throughout the country's west. Protesters have even seized government administrative buildings in several regional capitals, heightening concerns about where Ukraine's crisis will go.

What's happening in Ukraine is about much more than the anger over Yanukovych rejecting the European Union deal and drawing the country closer to Russia. To help explain what's going on, I've put this map together up top. The red stripes show regions where mass protests are surrounding the regional capital buildings. The black stripes show regions where protesters have actually seized the government administrative buildings. The blue regions are where Yanukovych won a majority in the last presidential election, in 2010; dark blue means he won at least 70 percent. Orange regions show where Yulia Tymoshenko, then prime minister and candidate for a pro-European party, won the majority; she won at least 70 percent in dark orange regions.

Here's why this map is important: There is a big dividing line in Ukrainian politics -- an actual, physical line that separates the north and west from the south and east. You can see it in this map and in just about every electoral map since the country's independence. That divide goes beyond the question of whether Ukraine faces toward Europe or toward Russia, but that question is a major factor. And it's polarizing.

This map drives two things home: First is that the protests are practically endemic in the half of the country that voted against Yanukovych, which includes Kiev. Second, the protests are not really a factor in the half who voted for Yanukovych. That doesn't mean that people in the blue areas adore Yanukovych, but they're certainly not pouring out into the streets to oppose him. It also doesn't mean that the protesters lack legitimate gripes or that it's just about their candidate losing. The economy is in terrible shape, and the government recently imposed severe restrictions against free speech, media and assembly rights, which is part of why the protests kicked back up again.

In other words, in the European-facing half of Ukraine, the orange half, the protests are even more widespread and severe than you might have gathered from watching the media coverage. But it's important to keep in mind that the other half of the country, the blue half, is much quieter.

You may be wondering, then, why there is such a consistent and deep divide between these two halves of Ukraine. Here's the really crucial thing to understand about Ukraine: A whole lot of the country speaks Russian, rather than Ukrainian. This map shows the country's linguistic divide, which you may notice lines up just about perfectly with its political divide.

Ukraine's protests and linguistic breakdown. Click to enlarge. (Max Fisher/Washington Post)

Ukrainian is the majority and official language of Ukraine. But, as a legacy of the country's subjugation by Russia, many Ukrainians speak Russian, which is the native language for about one-third of the population. The Russian speakers are clustered in the south and east. A significant chunk of them are ethnic Russians, as well. In some regions, more than three-quarters of the population speaks Russian as their primary language.

Heavily Russian-speaking regions can tend to be more sympathetic (or at least less hostile) to policies that bring their country closer to Russia, as Yanukovych has been doing. But the Ukrainian-speaking regions have historically sought a Ukrainian national identity that is less Russia-facing and more European. So this is about politics, yes, but it's also about identity, about the question of what it means to be Ukrainian.

Ukraine's ethno-linguistic political division is sort of like the United States' "red America" and "blue America" divide, but in many ways much deeper -- imagine if red and blue America literally spoke different languages. The current political conflict, which at its most basic level is over whether the country will lean toward Europe or toward Russia, is part of a long-running and unresolved national identity crisis. Yes, it's also about Yanukovych's failures to fix the economy and his draconian restrictions against basic freedoms. But there's so much more to it than that, which helps make the crisis so intractable.

linux - When should I not kill -9 a process? - Unix & Linux Stack Exchange


Comments:"linux - When should I not kill -9 a process? - Unix & Linux Stack Exchange"

URL:http://unix.stackexchange.com/questions/8916/why-not-kill-9-a-process


I use kill -9 in much the same way that I throw kitchen implements in the dishwasher: if a kitchen implement is ruined by the dishwasher then I don't want it.

The same goes for most programs (even databases): if I can't kill them without things going haywire, I don't really want to use them. (And if you happen to use one of these non-databases that encourages you to pretend they have persisted data when they haven't: well, I guess it is time you start thinking about what you are doing).

Because in the real world stuff can go down at any time for any reason.

People should write software that is tolerant to crashes. In particular on servers. You should learn how to design software that assumes that things will break, crash etc.
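For context (an addition of mine, not part of the original answer): kill -9 sends SIGKILL, which a process cannot catch, block, or ignore, while a plain kill sends the catchable SIGTERM. A minimal sketch of a graceful-shutdown handler that SIGKILL will simply bypass:

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t shutting_down = 0;

static void on_term(int sig)
{
    (void)sig;
    shutting_down = 1; /* async-signal-safe: just set a flag */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGTERM, &sa, NULL); /* plain kill gives us a chance to clean up */
    /* SIGKILL (kill -9) cannot be caught, so the real defense is to keep
       state consistent at all times, not to rely on this handler running. */

    while (!shutting_down)
        sleep(1); /* ... do real, crash-tolerant work here ... */

    /* best-effort tidy-up; never runs under kill -9 */
    return 0;
}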

The same goes for desktop software. When I want to shut down my browser it usually takes AGES to shut down. There is nothing my browser needs to do that should take more than at most a couple of seconds. When I ask it to shut down it should manage to do that immediately. When it doesn't, well, then we pull out kill -9 and make it.

There’s Only Four Billion Floats–So Test Them All! | Random ASCII


Comments:"There’s Only Four Billion Floats–So Test Them All! | Random ASCII"

URL:http://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/


A few months ago I saw a blog post touting fancy new SSE3 functions for implementing vector floor, ceil, and round functions. There was the inevitable proud proclaiming of impressive performance and correctness. However the ceil function gave the wrong answer for many numbers it was supposed to handle, including odd-ball numbers like ‘one’.

The floor and round functions were similarly flawed. The reddit discussion of these problems then discussed two other sets of vector math functions. Both of them were similarly buggy.

Fixed versions of some of these functions were produced, and they are greatly improved, but some of them still have bugs.

Floating-point math is hard, but testing these functions is trivial, and fast. Just do it.

The functions ceil, floor, and round are particularly easy to test because there are presumed-good CRT functions that you can check them against. And, you can test every float bit-pattern (all four billion!) in about ninety seconds. It's actually very easy. Just iterate through all four billion (technically 2^32) bit patterns, call your test function, call your reference function, and make sure the results match. Properly comparing NaN and zero results takes a bit of care but it's still not too bad.

Aside: floating-point math has a reputation for producing results that are unpredictably wrong. This reputation is then used to justify sloppiness, which then justifies the reputation. In fact IEEE floating-point math is designed to, whenever practical, give the best possible answer (correctly rounded), and functions that extend floating-point math should follow this pattern, and only deviate from it when it is clear that correctness is too expensive.

Later on I’ll show the implementation for my ExhaustiveTest function but for now here is the function declaration:

typedef float (*Transform)(float);

// Pass in a range of float representations to compare against.
// start and stop are inclusive. Pass in 0, 0xFFFFFFFF to scan all
// floats. The floats are iterated through by incrementing
// their integer representation.
void ExhaustiveTest(uint32_t start, uint32_t stop, Transform TestFunc,
                    Transform RefFunc, const char* desc);

Typical test code that uses ExhaustiveTest is shown below. In this case I am testing the original SSE 2 _mm_ceil_ps2 function that started the discussion, with a wrapper to translate between float and __m128. The function didn’t claim to handle floats outside of the range of 32-bit integers so I restricted the test range to just those numbers:

float old_mm_ceil_ps2(float f)
{
    __m128 input = { f, 0, 0, 0 };
    __m128 result = old_mm_ceil_ps2(input);
    return result.m128_f32[0];
}

int main()
{
    // This is the biggest number that can be represented in
    // both float and int32_t. It's 2^31-128.
    Float_t maxfloatasint(2147483520.0f);
    const uint32_t signBit = 0x80000000;
    ExhaustiveTest(0, (uint32_t)maxfloatasint.i, old_mm_ceil_ps2, ceil,
                   "old _mm_ceil_ps2");
    ExhaustiveTest(signBit, signBit | maxfloatasint.i, old_mm_ceil_ps2, ceil,
                   "old _mm_ceil_ps2");
}

Note that this code uses the Float_t type to get the integer representation of a particular float. I described Float_t years ago in Tricks With the Floating-Point Format .

How did the original functions do?

_mm_ceil_ps2 claimed to handle all numbers in the range of 32-bit integers, which is already ignoring about 38% of floating-point numbers. Even in that limited range it had 872,415,233 errors – that’s a 33% failure rate over the 2,650,800,128 floats it tried to handle. _mm_ceil_ps2 got the wrong answer for all numbers between 0.0 and FLT_EPSILON * 0.25, all odd numbers below 8,388,608, and a few other numbers. A fixed version was quickly produced after the errors were pointed out.

Another set of vector math functions that was discussed was DirectXMath. The 3.03 version of DirectXMath’s XMVectorCeiling claimed to handle all floats. However it failed on lots of tiny numbers, and on most odd numbers. In total there were 880,803,839 errors out of the 4,294,967,296 numbers (all floats) that it tried to handle. The one redeeming point for XMVectorCeiling is that these bugs have been known and fixed for a while, but you need the latest Windows SDK (comes with VS 2013) in order to get the fixed 3.06 version. And even the 3.06 version doesn’t entirely fix XMVectorRound.

The LiraNuna / glsl-sse2 family of functions were the final set of math functions that were mentioned. The LiraNuna ceil function claimed to handle all floats but it gave the wrong answer on 864,026,625 numbers. That’s better than the others, but not by much.

I didn’t exhaustively test the floor and round functions because it would complicate this article and wouldn’t add significant value. Suffice it to say that they have similar errors.

Sources of error

Several of the ceil functions were implemented by adding 0.5 to the input value and rounding to nearest. This does not work. This technique fails in several ways:

  • Round to nearest even is the default IEEE rounding mode. This means that 5.5 rounds to 6, and 6.5 also rounds to 6. That's why many of the ceil functions fail on odd integers.
  • The technique also fails on the largest float smaller than 1.0, because that value plus 0.5 gives 1.5, which rounds to 2.0.
  • For very small numbers (less than about FLT_EPSILON * 0.25) adding 0.5 gives 0.5 exactly, and this then rounds to zero. Since about 40% of the positive floating-point numbers are smaller than FLT_EPSILON * 0.25, this results in a lot of errors – over 850 million of them!
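These failure modes are easy to reproduce. Here is a small standalone demonstration (my own sketch, not code from the post); nearbyintf rounds in the current rounding mode, which defaults to round-to-nearest-even:

#include <float.h>
#include <math.h>
#include <stdio.h>

// The broken technique: ceil(x) computed as round(x + 0.5).
static float bad_ceil(float f)
{
    return nearbyintf(f + 0.5f); // rounds to nearest even by default
}

int main()
{
    // Odd integer: 5 + 0.5 = 5.5, which rounds to 6; ceil(5) is 5.
    printf("bad_ceil(5.0f) = %g, expected 5\n", bad_ceil(5.0f));
    // Tiny value: x + 0.5 rounds to exactly 0.5, which rounds to 0;
    // ceil of any tiny positive value is 1.
    printf("bad_ceil(tiny) = %g, expected 1\n",
           bad_ceil(FLT_EPSILON * 0.125f));
    return 0;
}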

The 3.03 version of DirectXMath’s XMVectorCeiling used a variant of this technique. Instead of adding 0.5 they added g_XMOneHalfMinusEpsilon. Perversely enough the value of this constant doesn’t match its name – it’s actually one half minus 0.75 times FLT_EPSILON. Curious. Using this constant avoids errors on 1.0f but it still fails on small numbers and on odd numbers greater than one.

NaN handling

The fixed version of _mm_ceil_ps2 comes with a handy template function that can be used to extend it to support the full range of floats. Unfortunately, due to an implementation error, it fails to handle NaNs. This means that if you call _mm_safeInt_ps<new_mm_ceil_ps2>() with a NaN then you get a normal number back. Whenever possible NaNs should be ‘sticky’ in order to aid in tracking down the errors that produce them.

The problem is that the wrapper function uses cmpgt to create a mask that it can use to retain the value of large floats – this mask is all ones for large floats. However since all comparisons with NaNs are false this mask is zero for NaNs, so a garbage value is returned for them. If the comparison is switched to cmple and the two mask operations (and and andnot) are switched then NaN handling is obtained for free. Sometimes correctness doesn’t cost anything. Here’s a fixed version:

template< __m128 (FuncT)(const __m128&) >
inline __m128 _mm_fixed_safeInt_ps(const __m128& a)
{
    __m128 v8388608 = *(__m128*)&_mm_set1_epi32(0x4b000000);
    __m128 aAbs = _mm_and_ps(a, *(__m128*)&_mm_set1_epi32(0x7fffffff));
    // In order to handle NaNs correctly we need to use le instead of gt.
    // Using le ensures that the bitmask is clear for large numbers *and*
    // NaNs, whereas gt ensures that the bitmask is set for large numbers
    // but not for NaNs.
    __m128 aMask = _mm_cmple_ps(aAbs, v8388608);
    // Select a if greater than 8388608.0f, otherwise select the result of
    // FuncT. Note that 'and' and 'andnot' were reversed because the
    // meaning of the bitmask has been reversed.
    __m128 r = _mm_xor_ps(_mm_andnot_ps(aMask, a), _mm_and_ps(aMask, FuncT(a)));
    return r;
}

With this fix and the latest version of _mm_ceil_ps2 it becomes possible to handle all 4 billion floats correctly.

Conventional wisdom Nazis

Conventional wisdom says that you should never compare two floats for equality – you should always use an epsilon. Conventional wisdom is wrong.

I’ve written in great detail about how to compare floating-point values using an epsilon, but there are times when it is just not appropriate. Sometimes there really is an answer that is correct, and in those cases anything less than perfection is just sloppy.

So yes, I’m proudly comparing floats to see if they are equal.

How did the fixed versions do?

After the flaws in these functions were pointed out, fixed versions of _mm_ceil_ps2 and its sister functions were quickly produced, and these new versions work better.

I didn’t test every function, but here are the results from the final versions of functions that I did test:

  • XMVectorCeiling 3.06: zero failures
  • XMVectorFloor 3.06: zero failures
  • XMVectorRound 3.06: 33,554,432 errors on incorrectly handled boundary conditions
  • _mm_ceil_ps2 with _mm_safeInt_ps: 16,777,214 failures on NaNs
  • _mm_ceil_ps2 with _mm_fixed_safeInt_ps: zero failures
  • LiraNuna ceil: this function was not updated, so it still has 864,026,625 failures

Exhaustive testing works brilliantly for functions that take a single float as input. I used this to great effect when rewriting all of the CRT math functions for a game console some years ago. On the other hand, if you have a function that takes multiple floats or a double as input then the search space is too big. In that case a mixture of test cases for suspected problem areas and random testing should work. A trillion tests can complete in a reasonable amount of time, and it should catch most problems.

Test code

Here’s a simple function that can be used to test a function across all floats. The sample code linked below contains a more robust version that tracks how many errors are found.

// Pass in a uint32_t range of float representations to test.
// start and stop are inclusive. Pass in 0, 0xFFFFFFFF to scan all
// floats. The floats are iterated through by incrementing
// their integer representation.
void ExhaustiveTest(uint32_t start, uint32_t stop, Transform TestFunc,
                    Transform RefFunc, const char* desc)
{
    printf("Testing %s from %u to %u (inclusive).\n", desc, start, stop);
    // Use long long to let us loop over all positive integers.
    long long i = start;
    while (i <= stop)
    {
        Float_t input;
        input.i = (int32_t)i;
        Float_t testValue = TestFunc(input.f);
        Float_t refValue = RefFunc(input.f);
        // If the results don't match then report an error.
        if (testValue.f != refValue.f &&
            // If both results are NaNs then we treat that as a match.
            (testValue.f == testValue.f || refValue.f == refValue.f))
        {
            printf("Input %.9g, expected %.9g, got %1.9g \n",
                   input.f, refValue.f, testValue.f);
        }
        ++i;
    }
}

Subtle errors

My test code misses one subtle difference – it fails to detect one type of error. Did you spot it?

The correct result for ceil(-0.5f) is -0.0f. The sign bit should be preserved. The vector math functions all fail to do this. In most cases this doesn’t matter, at least for game math, but I think it is at least important to acknowledge this (minor) imperfection. If the compare function was put into ‘fussy’ mode (just compare the representation of the floats instead of the floats) then each of the ceil functions would have an additional billion or so failures, from all of the floats between -0.0 and -1.0.
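For reference, a 'fussy' comparison can be written by comparing representations instead of values; this sketch is mine, not the post's:

#include <stdint.h>
#include <string.h>

// Bitwise float comparison: distinguishes -0.0f from +0.0f, and treats
// NaNs as equal only when their representations are identical.
static int fussy_equal(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua); // well-defined way to read the bits
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}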

References

The original post that announced _mm_ceil_ps2 can be found here – with corrected code:

http://dss.stephanierct.com/DevBlog/?p=8

This post discusses the bugs in the 3.03 version of DirectXMath and how to get fixed versions:

http://blogs.msdn.com/b/chuckw/archive/2013/03/06/known-issues-directxmath-3-03.aspx

This post links to the LiraNuna glsl-sse2 math library:

https://github.com/LiraNuna/glsl-sse2

The original reddit discussion of these functions can be found here:

http://w3.reddit.com/r/programming/comments/1p2yys/sse3_optimized_vector_floor_ceil_round_and_mod/

Sample code for VC++ 2013 to run these tests. Just uncomment the test that you want to run from the body of main.

ftp://ftp.cygnus-software.com/pub/Test4Billion.zip

I’ve written before about running tests on all of the floats. The last time I was exhaustively testing round-tripping of printed floats, which took long enough that I showed how to easily parallelize it and then I verified that they round-tripped between VC++ and gcc. This time the tests ran so quickly that it wasn’t even worth spinning up extra threads.


Facebook can now read your texts


Comments:"Facebook can now read your texts"

URL:http://tony.calileo.com/fb/


As of their latest update, Facebook can read your texts on Android phones.

Edit: I just found out via the Reddit comments that Facebook updates are rolled out, so this update may not be available to you yet - or you may already have it on your device. This page originally stated that the update came out today (Jan 27).

Do you have an Android phone with Facebook installed? Like most people, I blindly clicked "accept" when prompted for new permissions in Facebook's Android app update today (Jan 27). Something caught my eye, and after I cancelled the update, I took a screenshot. I figured other people might be interested in what they agreed to earlier today if they updated:

This is just one of a bunch of new permissions the app is requesting for this update, but it's probably the most alarming.

Facebook, like any social media site, makes its money off of targeted advertising. Advertisements for that new Nikon camera, for instance, will sell for more money if only people interested in photography are seeing it. Facebook gathers this information from pretty much everything you feed them - that's nothing new. But now they're able to read your text messages. At this point, it's either stick with the old version, get rid of Facebook on Android, or accept that they've got access to every one of your text messages, your call log, your personal contact card, the names of whatever other apps you're using... :/

Billionaire Beach Owner Wants Californians To Keep Out : NPR


Comments:"Billionaire Beach Owner Wants Californians To Keep Out : NPR"

URL:http://www.npr.org/2014/01/27/264901370/california-fights-billionaire-s-keep-out-sign-for-beach-access


A view of Martins Beach, just south of San Francisco. (Courtesy of Ed Grant)

California officials are battling a Silicon Valley billionaire for public access to a spectacular slice of sand south of San Francisco.

It's the latest in an ongoing fight between the state and its richest residents over choice stretches of beach, a particular problem in the Southern California city of Malibu. But as California's economic center has shifted north, so has the battle over its coast.

This latest round involves Martins Beach, a crescent-shaped stretch of sand totally hidden from the highway.

Vinod Khosla owns Martins Beach and has blocked public access with a gate and a sign that says "Private Property: Keep Out." (James Duncan Davidson/O'Reilly Media, Inc. via Wikimedia Commons)

"For a lot of us, it feels like a little Yosemite of the coast," saysMike Wallace, a surf coach at the local high school. His daughter caught her first wave off of Martins Beach.

All California beaches are, by law, public between the ocean and the high-tide mark. The problem is getting here. Unless you're on a boat, the only way to get to Martins Beach is the road — which has a gate and a big sign that says "Private Property: Keep Out."

For almost a century, the land was owned by a family who charged a small entrance fee to visitors. In 2008, they sold Martins Beach to a new owner for $37 million. Almost immediately, says Wallace, the gate and sign went up.

"No one knew quite what the status was," he says. "They weren't sure who the owners were."

Eventually, they figured it out: Silicon Valley venture capitalist Vinod Khosla. Khosla wouldn't comment for this story, and his lawyers declined to answer questions.

With his background in solar power and biofuels, Khosla — who recently promoted one of his clean energy ventures on 60 Minutes — isn't the kind of person you'd expect to find in a showdown with environmentalists.

"Even billionaires with a solid track record of conservation efforts, taking coastal property and trying to privatize it — people generally are not willing to allow that to happen," says Mark Massara, a longtime surfer and a lawyer involved in one of several lawsuits filed over Martins Beach.

He and others point out that people have been coming to Martins Beach for decades. They say amenities like public bathrooms, an old cafe and a parking lot set a precedent of access.

Mike Wallace, a high school surf coach, sits outside a cafe on Martins Beach. Some argue that amenities like the cafe, public bathrooms and a parking lot set a precedent of access. (Amy Standen/KQED)

Massara says if Khosla wins this fight, he won't be the last one. "Make no mistake that if the beach is allowed to be privatized in this case, it will inspire other efforts by other wealthy individuals."

Nancy Cave of the California Coastal Commission — the state agency whose job it is to keep beaches open to the public — says the group wrote to Khosla asking if he wanted to resolve the issue. Maybe, she thought, something could get worked out. After all, Khosla is going to need permits if he wants to develop the property — and he's going to have to get them from her office.

The group did meet with Khosla's attorneys. "They showed no interest in resolving," says Cave. "They only wanted to litigate."

She and her colleagues say they know how quickly a billionaire can strain the legal resources of a small and chronically underfunded state agency like hers — but they're steeling themselves for a long fight.

Meanwhile, the most recent lawsuit could go to trial in the spring.


Homophony Groups in Haskell - Andrew Gibiansky


Comments:"Homophony Groups in Haskell - Andrew Gibiansky"

URL:http://andrew.gibiansky.com/blog/linguistics/homophony-groups


A few days ago, a friend of mine sent me a fascinating problem. The problem goes like this:

The homophony group (of English) is the group with 26 generators a,b, c, and so on until z and one relation for every pair of English words which sound the same. Prove that the group is trivial!

For example, consider the group elements knight and night. By the cancellation laws, this implies that k must be the identity element. Recall that a trivial group is one which consists solely of its identity element, so our task is to show that each letter of the English alphabet is the identity element.

Skipping all of the algebraic jargon, we want to show that if we set all homophones "equal" to one another, and do left cancellation, right cancellation, and substitution, we can show that all the English letters equal one.

This is a fun exercise to do by hand, but I'd like to do it in Haskell.

Note: This work was done in IHaskell, and what you're reading is the IHaskell notebook exported to HTML for viewing in the browser. You can download the original notebook here.

I've started by compiling a list of homophones in American English, starting with this list and removing all single letters (such as j being a homophone with jay) and all words with apostrophes and periods, as well as some less commonly used words. You can download my list, or make your own.

The contents of the file look like this:

ad add
add ad
arc ark
ark arc
...

Each line is a space-delimited list of words. The first word in the list sounds identical to all the remaining words in the list. This is why you see repeats - ad sounds like add but also add sounds like ad. This repetition isn't necessary, as we could do it programmatically, but is convenient.

Let's go ahead and load this list:

In [1]:

import Control.Applicative ((<$>))
import Data.List.Utils (split)

removeEmpty = filter (not . null)
homophones <- removeEmpty . map words . lines <$> readFile "homophones.list"

Let's take a look at a few more of these homophones.

In [2]:

import Control.Monad (forM_)
import Data.List (intercalate)

-- Show ten of the homophone sets
forM_ (take 10 homophones) $ \homs ->
    putStrLn $ intercalate "\t" homs
adieu ado
ado adieu
affect effect
aid aide
aide aid
ail ale
air err heir
airs errs heirs
aisle isle
ale ail

Note that some of the sets have more than two elements, yet they are all on the same line.

Let's convert this into a more usable format. We'll define a new type WordPair which represents a single pair of homophones, and convert this list into a list of WordPairs.

In [3]:

data WordPair = WordPair String String
              deriving Show

-- Convert a list of homophones into a list of word pairs.
-- Note that the word pairs should only use the first of the
-- list as the first word, since there will be repeat sets.
-- For instance, the set ["a", "b", "c"] would only generate
-- word pairs [WordPair "a" "b", WordPair "a" "c"].
pairs :: [String] -> [WordPair]
pairs (str:strs) = map (WordPair str) strs

-- All pairs of words we consider homophones.
wordPairs = concatMap pairs homophones

Now that we have this data in a usable form, let's use it to derive relations.

The initial relations we have are simply the set of word pairs. However, we can use two operations in order to derive more relations:

  • reduce: The reduction operation will be the application of left and right cancellation laws. If a relation has the same thing on the left of both sides, we can take it off; same for the right side. This generates a new, simpler relation.
  • substitute: The substitution operation will be substituting identity relations in. For instance, if we've derived that d is the identity element, then we can remove d from all known relations to get new, simpler relations.

In addition to each relation storing what strings it considers equal, we'd also like to be able to track what operations led to the creation of that word pair. So before defining a relation, let's define a history data type:

In [4]:

data History = Reduce String String
             | Substitute Char
             deriving Show

Now, we'd like a relation to store all the transformations that were used to generate it, and also the two things it relates:

In [5]:

data Relation = Relation [History] String String
              deriving Show

-- We'd like equality to only look at the strings, not the history.
instance Eq Relation where
    Relation _ s1 s2 == Relation _ t1 t2 = s1 == t1 && s2 == t2

Since Relation and WordPair are slightly different, let's convert all our WordPairs to Relations. This gives us our initial set of relations, which we will use to derive all other relations.

In [6]:

toRelation :: WordPair -> Relation
toRelation (WordPair first second) = Relation [] first second

initRelations = map toRelation wordPairs

Eventually, we're going to iteratively improve these relations until we have proven that all letters equal the identity. First, though, let's define our two operators, starting with reduce.

When we reduce a relation, we apply the right and left cancellation laws. If we have the equation \[ab = ac\] we can use the left cancellation law to reduce it to \(b = c\); similarly, using the right cancellation law, we can reduce the equation \[xa = ya\] to just \(x = y\).

Our reduce operator repeats these steps until it can no longer do so, and then the resulting strings are the reduced relation.

In [7]:

reduce :: Relation -> Relation
reduce rel@(Relation hist first second)
    | canReduce first second = go (first, second)
    -- Note that we also have to be careful with the history.
    -- If the `reduce` does nothing, then we do not want to add
    -- anything to the history of the relation.
    | otherwise = rel
  where
    -- A reduction can happen if both strings are non-empty
    -- and share a common first or last letter.
    canReduce first second =
        not (null first) && not (null second) &&
        (head first == head second || last first == last second)

    -- Modified history including this reduction.
    hist' = Reduce first second : hist

    -- Base case: if we've reduced a word pair to an empty string
    -- and something else, we're done, as that something else
    -- is equivalent to the identity element.
    go ("", word) = Relation hist' word ""
    go (word, "") = Relation hist' word ""
    go (first, second)
        -- Chop off the first element if they're equal.
        | head first == head second = go (tail first, tail second)
        -- Chop off the last element if they're equal.
        | last first == last second = go (init first, init second)
        -- If neither first nor last element are equal,
        -- we've simplified the relation down as much
        -- as we can simplify it.
        | otherwise = Relation hist' first second

This looks pretty good. Next, let's define the substitute operator.

The substitute operator removes a character from a relation. For instance, if we know that d is the identity, we can simplify the relation \[ad = dyd\] to just \(a = y\).

Just like the reduce operator, we avoid modifying the Relation's history if the substitute does nothing.

In [8]:

import Data.List.Utils (replace)

-- Generate a new relation by removing characters we know to be
-- the identity. Make sure to update the history of the relation
-- with this substitution!
substitute :: Char -> Relation -> Relation
substitute char rel@(Relation hist first second)
    | canSubstitute first second =
        Relation (Substitute char : hist) (replaced first) (replaced second)
    | otherwise = rel
  where
    canSubstitute first second = char `elem` first || char `elem` second
    replaced = replace [char] ""

With substitute implemented, we've finished all the machinery we're going to use for simplifying our relations. We're going to iteratively reduce and substitute until we've found that all the English letters are the identity element of the homophony group. We're still missing one thing, though - how do we know which letters we've proven to be the identity?

Let's define a quick helper datatype for every identity we find. We're going to store the character that we've proven is the identity, as well as the history; that way, when we want to examine the results, we can see exactly how each letter was reduced to the identity.

In [9]:

data FoundIdent = FoundIdent
    { char :: Char
    , hist :: [History]
    }

Let's also define a function that extracts all the identity elements from a set of relations.

In [10]:

-- mapMaybe = map fromJust . filter isJust . map
import Data.Maybe (mapMaybe)

identities :: [Relation] -> [FoundIdent]
identities = mapMaybe go
  where
    go :: Relation -> Maybe FoundIdent
    go (Relation hist [char] "") = Just $ FoundIdent char hist
    go (Relation hist "" [char]) = Just $ FoundIdent char hist
    go _ = Nothing

Let's finally put all of this together. We're going to start with our initial set of relations, initRelations, and then we're going to iteratively simplify them. Initially, we have no known identity elements.

In each iteration, we

  • Substitute into each relation each known identity (replacing it with the empty string).
  • Reduce the resulting relations.
  • Collect all known identity elements.

In [11]:

import Data.List (nubBy)
import Data.Function (on)

-- The iteration starts with a list of known identity elements
-- and the current set of relations. It outputs the updated
-- relations and all known identity elements.
iteration :: ([FoundIdent], [Relation]) -> ([FoundIdent], [Relation])
iteration (idents, relations) = (newIdents, newRelations)
  where
    -- Collect all the substitutions into a single function.
    substitutions = foldl (.) id $ map (substitute . char) idents

    -- Do all substitutions, then reduce (for each relation).
    newRelations = map (reduce . substitutions) relations

    -- We have to remove duplicate identity elements, because
    -- in each iteration we find multiple ways to prove that some
    -- letters are the identity element. We just want one.
    removeDuplicateIdents = nubBy ((==) `on` char)

    -- Find all identities in the new relations.
    newIdents = removeDuplicateIdents $ idents ++ identities newRelations

Let's iterate this process until we have all the identities we want. We want 26 of them, so we can just check the length. (If this operation never finishes, we're out of luck!)

In [12]:

-- Generate the infinite list of iterations and their results.
initIdents = []
iterations = iterate iteration (initIdents, initRelations)

-- Define a completion condition.
-- We're done when there are 26 known identity elements.
done (idents, _) = length idents == 26

-- Discard all iteration results until completion.
-- Take the next one - the first one where the condition is met.
result = head $ dropWhile (not . done) iterations

Woohoo! We're done! Let's take a look at the results!

In [13]:

import Data.List (sort)

idents = fst result
identChars = map char idents

putStrLn $ sort identChars
length identChars
abcdefghijklmnopqrstuvwxyz

Looks like we do indeed have every single letter mapped to the identity.

Let's see if we can deduce, for each letter, how it was mapped to the identity. Instead of doing it in alphabetical order, we'll look at them in the order they were deduced, so it follows some logical flow.

In [14]:

import Text.Printf (printf)

forM_ idents $ \(FoundIdent char hist) -> do
    printf "Proving %c = 1:\n" char
    forM_ (reverse hist) $ \op ->
        putStrLn $ case op of
            Reduce first second -> printf "Reduce %s and %s" first second
            Substitute ch -> printf "Substitute %c for ''" ch
    putStr "\n"
Proving e = 1:
Reduce aid and aide
Proving a = 1:
Reduce aisle and isle
Proving u = 1:
Reduce ant and aunt
Proving t = 1:
Reduce but and butt
Proving n = 1:
Reduce cannon and canon
Proving s = 1:
Reduce cent and scent
Proving h = 1:
Reduce choral and coral
Proving k = 1:
Reduce doc and dock
Proving l = 1:
Reduce filet and fillet
Proving w = 1:
Reduce hole and whole
Proving b = 1:
Reduce plum and plumb
Proving g = 1:
Reduce reign and rein
Proving c = 1:
Reduce scent and sent
Proving o = 1:
Reduce to and too
Proving i = 1:
Reduce waive and wave
Proving r = 1:
Reduce air and err
Substitute i for ''
Substitute a for ''
Substitute e for ''
Proving d = 1:
Reduce awed and odd
Substitute o for ''
Substitute w for ''
Substitute a for ''
Substitute e for ''
Proving y = 1:
Reduce bite and byte
Substitute i for ''
Proving z = 1:
Reduce boos and booze
Substitute s for ''
Substitute e for ''
Proving q = 1:
Reduce cask and casque
Substitute k for ''
Substitute u for ''
Substitute e for ''
Proving x = 1:
Reduce coax and cokes
Substitute k for ''
Substitute s for ''
Substitute a for ''
Substitute e for ''
Proving p = 1:
Reduce coo and coup
Substitute o for ''
Substitute u for ''
Proving f = 1:
Reduce draft and draught
Substitute g for ''
Substitute h for ''
Substitute u for ''
Proving m = 1:
Reduce damned and dammed
Substitute n for ''
Proving j = 1:
Reduce genes and jeans
Substitute g for ''
Substitute n for ''
Substitute a for ''
Substitute e for ''
Proving v = 1:
Reduce felt and veldt
Substitute l for ''
Substitute e for ''
Substitute f for ''
Substitute d for ''

If you scan through the list above, there are a few weird cases, but for the most part it seems legitimate. (I mildly question felt and veldt, but it depends on how you pronounce things. If you look at the British English list of homophones, it's totally different anyways!)

So that's that! We've found the ways to reduce every letter to the identity, and shown how to do it.

I wonder if other languages also have trivial homophony groups. It might be fun to try Spanish, French, Russian, and others, and see if the homophony groups tell us anything interesting about the language!

The Descent to C


Comments:"The Descent to C"

URL:http://www.chiark.greenend.org.uk/~sgtatham/cdescent/


by Simon Tatham

1. Introduction

This article attempts to give a sort of ‘orientation tour’ for people whose previous programming background is in high (ish) level languages such as Java or Python, and who now find that they need or want to learn C.

C is quite different, at a fundamental level, from languages like Java and Python. However, well-known books on C (such as the venerable Kernighan & Ritchie) tend to have been written before Java and Python changed everyone's expectations of a programming language, so they might well not stop to explain the fundamental differences in outlook before getting into the nitty-gritty language details. Someone with experience of higher-level languages might therefore suffer a certain amount of culture shock when picking up such a book. My aim is to help prevent that, by warning about the culture shocks in advance.

This article will not actually teach C: I'll show the occasional code snippet for illustration and explain as much as I need to make my points, but I won't explain the language syntax or semantics in any complete or organised way. Instead, my aim is to give an idea of how you should expect C to differ from languages you previously knew about, so that when you do pick up an actual C book, you won't be distracted from the details by the fundamental weirdness.

I'm mostly aiming this article at people who are learning C in order to work with existing C programs. So I'll discuss ways in which things are commonly done, and things you're likely to encounter in real-world code, but not things that are theoretically possible but rare. (I do have other articles describing some of those.)

2. Memory layout

Modern high-level languages generally try to arrange that you don't need to think – or even know – about how the memory in a computer is actually organised, or how data of the kinds you care about is stored in it. Indeed, they actively try to hide this information from you: if you do know about that sort of thing, and you find yourself wondering (say) how the fields of a Java class are laid out in memory or where a Python lambda stores the variables captured from the scope it was defined in, the languages will provide no means for you to even ask it those questions. The implication is that that's not your business: you just write the semantics, and don't bother your head with implementation details.

By contrast, C thinks that these implementation details are your business. In fact, C will expect you to have a basic understanding that memory consists of a sequence of bytes each identified by a numeric address...

that most items of data are stored in one or more bytes with consecutive addresses...

... and that the language's methods of aggregating multiple data items into one larger one (arrays and structures) work by placing those data items adjacent to each other in contiguous pieces of memory (sometimes with padding to make the addresses nice round multiples of numbers like 4). For example, defining this array...

struct foo { char c; int i; };
struct foo my_array[3] = {
    { 'x', 123 },
    { 'y', 456 },
    { 'z', 789 },
};

... might give you a memory layout in which each struct foo occupies eight bytes: the char at offset 0, three bytes of padding, and the int in the following four bytes. (The exact sizes and padding vary by platform.)
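You can ask the compiler about this layout directly; the following sketch (mine, not the article's, and ABI-dependent in its exact numbers) prints the offsets that the padding produces:

#include <stddef.h>
#include <stdio.h>

struct foo { char c; int i; };

int main(void)
{
    printf("sizeof(struct foo) = %zu\n", sizeof(struct foo)); /* typically 8 */
    printf("offset of c = %zu\n", offsetof(struct foo, c));   /* 0 */
    printf("offset of i = %zu\n", offsetof(struct foo, i));   /* typically 4 */
    return 0;
}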

C will expect you to not only understand this concept, but to use it in your work. For example, the C standard library provides a function called ‘memcpy’, which copies a certain number of ‘char’ values (i.e. bytes) from one contiguous memory area to another. If you wrote that function in Java, it would work fine, but you'd only be able to use it with actual arrays of char; there would be no way to use it to copy the contents of an array of int, or the contents of a Java class. But in C, memcpy can copy anything at all – as long as you know the address in memory where your object starts and how many bytes it occupies, you can copy it to a different address one byte at a time using memcpy, and all the larger data items such as ints will arrive at their destination unharmed by taking them apart and putting them back together like that. This isn't just a curiosity: sometimes you'll have to do this sort of thing, because it will be the only convenient way to get the data you need into the place you need it. That's what I mean by saying that C will expect you to have and use a basic understanding of memory organisation.
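As a minimal illustration of that point (a sketch of mine, not the article's), copying a structure through memcpy one byte at a time delivers the int field intact:

#include <stdio.h>
#include <string.h>

struct foo { char c; int i; };

int main(void)
{
    struct foo src = { 'x', 123 };
    struct foo dst;
    memcpy(&dst, &src, sizeof src);  /* copy the raw bytes */
    printf("%c %d\n", dst.c, dst.i); /* prints: x 123 */
    return 0;
}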

3. Pointers

Naturally, if you're going to refer to objects in memory by their addresses, the language you do it in must provide some kind of data type suitable for storing an address. In C, this data type is called a pointer. In fact, there's a whole family of them: for every data type, there is a corresponding pointer type (pointer to int, pointer to char, pointer to some structure you defined yourself) which stores the address of an object of the original type. (Yes, you can have pointers to pointers too.)

Higher-level languages generally have some kind of mechanism which is essentially pointer-like in implementation, in that it lets you have more than one variable through which you can access the same actual data. However, C pointers are a bit different in several ways.

3.1. Pointers are explicit

If you've used Java or Python, you'll probably be familiar with the idea that some types of data behave differently from others when you assign them from one variable to another. If you write an assignment such as ‘a = b’ where a and b are integers, then you get two independent copies of the same integer: after the assignment, modifying a does not also cause b to change its value. But if a and b are both variables of the same Java class type, or Python lists, then after the assignment they refer to the same underlying object, so that if you make a change to a (e.g. by calling a class method on it, or appending an item to the list) then you see the same difference when you look at b.

In Java and Python, you don't get much of a choice about that. You can't make a class type automatically copy itself properly on assignment, or make multiple ‘copies’ of an integer really refer to the same single data item. It's implicit in the type system that some types have ‘value semantics’ (copies are independent) and some have ‘reference semantics’ (copies are still really the same thing underneath).

In C, this distinction is explicit, not implicit. Every data type has value semantics by default, even big structures containing lots of fields (analogous to a Java class): assigning one structure variable to another causes a large amount of memory to be physically copied, and then you end up with two independent instances of your large structure type. On the other hand, if you want reference semantics, you can take the address of any ordinary variable, creating a value of the appropriate pointer type which points at the original variable.

For example, suppose you write this snippet of code:

void function(int x)
{
    x = x + 1;
    other_function(x);
}

Just like in Java or Python, the parameter variable x is treated as a local variable of the function: you're allowed to modify it, but the modifications only apply to your own copy. If someone calls this function by writing

 function(my_important_int_variable);

then the value of their important int variable will be unchanged afterwards.

But if you want to write a function that modifies an int variable specified by the caller, you can do it using a pointer. You'd change the function so that it looked like this:

void function(int *x)
{
    *x = *x + 1;
    other_function(*x);
}

and then the caller would have to write this:

 function(&my_int_variable);

The syntax ‘int *x’ declares x to be a pointer to int; the syntax ‘*x’ inside the function is called a dereference, and means ‘the int value stored at the address given by x’. The ‘&’ sign is the opposite of ‘*’, in that it starts from an int (or anything else) and returns a pointer value giving the address of that int.

For these purposes, there's no difference between simple types like int and complicated user-defined structure types analogous to Java classes. If I'd written the entire example above using a structure type in place of int, it would have looked exactly the same – passing a structure argument to a function in the most obvious way causes an independent copy to be made, whereas if you want the called function to be able to modify the structure then you have to pass a pointer instead, and there will be an ‘&’ visible at every call site to remind you of that.

3.2. Pointers can go stale

In the previous section I showed an example where you call a function and pass it the address of a variable you want it to write into. Suppose the calling function had done something like this:

int caller(void)
{
    int my_int_variable = 3;
    function(&my_int_variable);
    return my_int_variable;
}

This function initialises a variable to 3; then it calls the example function from the previous section to change the value of the variable; finally it returns the updated value.

Now suppose that, instead of doing what it did in the previous section, function() had kept a copy of the pointer it was passed, by stashing it in a global variable:

int *stashed_pointer;

void function(int *x)
{
    stashed_pointer = x; /* save a copy of the pointer */
}

Then it returns to caller(), which returns in turn. When caller() returns, its local variable my_int_variable goes ‘out of scope’ – that is to say, it stops existing.

So now what's the status of the value in ‘stashed_pointer’?

The answer is that it's a stale pointer. That is, it's the address of a piece of memory which used to hold a currently active variable, but doesn't hold one any more. So if any part of the program uses that pointer value in future, bad things will happen. Some kinds of stale pointer access can cause the program to crash, but the one described here is unlikely to do that; more probably, it will have the much worse effect of overwriting a piece of memory that's been reused to store something completely different, so that you don't immediately notice the problem, and your program starts behaving oddly at a later time when it's too late to easily work out why.

(Java's implicit approach to reference semantics avoids this risk, because local variables in functions are either references already – pointing to objects whose lifetime is not tied to the lifetime of the function – or else cannot be converted into references. The risk arises because C lets you take the address of any variable you like.)

3.3. You can do arithmetic on pointers

Objects in C are often stored at addresses which differ by a fixed amount. Successive elements of an array are laid end-to-end in memory, for example, and so are fields of a structure. So it's often useful to be able to construct a new pointer value by doing arithmetic on an old one.

C arranges this by letting you add an integer to a pointer value, e.g. if p is a pointer then you can write p+1 or p-100. The number you add is implicitly multiplied by the size of the type that p points to. In other words, if p is a pointer to int, then writing p+1 returns a pointer whose address is not one byte after it in memory, but one int after it in memory. This means, in particular, that if p originally pointed to one of the elements of an array, then adding 1 to it causes it to point at the next element in the array. For example, in this snippet:

int array[4] = { 1, 11, 121, 1331 };
int *ptr = &array[1]; /* ptr points to the second element (11) */

if we suppose that our array starts at address 100, then the pointer variable ptr holds the address 104 (pointing at array[1]). If you compute ptr+1, the resulting pointer will hold the value 108 (not 105), so that it points at the next array element array[2]; if you compute ptr-1, you'll get a pointer holding the value 100, pointing at array[0].

Put another way: in C, a pointer to type Foo isn't just an address where you can expect to find an object of type Foo. It's always possible that what you can find there is one Foo in a long sequence stuck end-to-end, so the language provides you with a convenient way to step back and forth along that sequence.

Idiomatic C code will use this facility frequently. String handling code in particular (which we'll examine in more detail in section 7) has a strong tendency to use pointer arithmetic in preference to array indices. That is, instead of having an integer variable i starting at zero, examining a character of the string by referring to string[i], and incrementing i when you want to move on to the next character, it's more common to assign a pointer value p to point at the start of the string, examine a character by referring to *p, and increment p to step on to the next character.
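A short sketch of that idiom (mine, not the article's), assuming a NUL-terminated string:

/* Count the spaces in a string by walking a pointer along it. */
int count_spaces(const char *p)
{
    int n = 0;
    for (; *p != '\0'; p++) /* step to the next char each iteration */
        if (*p == ' ')
            n++;
    return n;
}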

3.4. Pointers are not type-safe

Another way in which pointers in C differ from high-level languages' references is that you're not constrained to point them at objects of the right type.

A reference in a high-level language can be relied on to either point at a valid object of the right type, or at NULL or nil or None or some such. We've already seen (section 3.2) that C pointers don't obey the same constraint, because they can point at memory that used to contain a valid object but doesn't any more.

But that's not the only way in which a pointer can point at ‘wrong’ things. Pointers are just addresses in memory, and C doesn't assume it knows better than you about what you keep where in memory. So C will typically let you just construct any pointer value you like by casting an integer to a pointer type, or by taking an existing pointer to one type and casting it so that it becomes a pointer to an entirely different type.

For example, in section 2 I already mentioned that you can treat any C object of any type as a plain sequence of bytes, just by taking the address of the object and casting it to a pointer to char. It's not generally a good idea to rely on the exact byte values you'll read out of an object that way; the exact details of objects' memory layout will vary between platforms, so you won't get the same answers on every kind of system if you do this. But you can depend on a few things, such as that if you copy a whole object byte by byte from one place to another then you'll end up with a valid copy of the same object at the destination.
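For instance, viewing an int's bytes through a char pointer looks like this (a sketch of mine; the bytes printed depend on the platform):

#include <stdio.h>

int main(void)
{
    int x = 123;
    unsigned char *bytes = (unsigned char *)&x; /* same address, new type */
    for (size_t j = 0; j < sizeof x; j++)
        printf("%02x ", bytes[j]); /* e.g. "7b 00 00 00" on little-endian */
    putchar('\n');
    return 0;
}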

4. Arrays are not bounds-checked

In higher-level languages, you're probably used to the idea that looking up an element of an array will either successfully return an element if the index is in the correct range, or else crash with an exception of some kind. In order to do this, arrays have to come with some implicit metadata indicating their length, and the language generally lets you access that metadata directly as well (len(array) in Python, array.length in Java).

In C, arrays are a much simpler and more primitive concept. An array is just some number of pieces of data of the same type, all laid out in memory end-to-end. You have a pointer to the start of the array, and you access array[i] by first adding i to the value of that pointer, and then looking up whatever is in memory at that location. (In fact, that's what array[i] means – the language defines it to be a synonym for *(array+i).)
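
In other words, the two lookups below are interchangeable by definition:

int a[3] = { 10, 20, 30 };
int x = a[2];        /* the usual notation */
int y = *(a + 2);    /* exactly equivalent, by the definition above */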

So the length of the array isn't stored in memory anywhere, and therefore C arrays have no way to look up the length at run time like Java and Python do. This also means that you are responsible for making sure you don't overrun the bounds of the array: the language cannot detect it if you do, so you won't be notified of your mistake by a nice clean exception. Instead, overrunning an array will have effects similar to those of dereferencing a stale pointer (see section 3.2 above): if you're lucky the operating system will manage to crash your program, but if you're unlucky, you'll overwrite memory that's being used by something else, and cause an unpredictable change to your program's behaviour which might not show up for a long time.

Therefore, it's important to make sure you keep track of array lengths yourself. This is often done by keeping the array pointer alongside a variable giving the array's length; alternatively, you might arrange that the array has some kind of recognisable terminating element at the end, and then any function to which you pass a pointer to the array must make sure to stop when it encounters that terminating element. (Strings are often handled this way; see section 7.)
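
For example, here's a sketch of the pointer-plus-length convention (the function name and signature are invented for illustration):

#include <stddef.h>

/* One common convention: pass the length alongside the pointer,
 * since the pointer alone carries no size information. */
int sum(const int *array, size_t len)
{
    int total = 0;
    for (size_t i = 0; i < len; i++)
        total += array[i];
    return total;
}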

A compensatory advantage to C's very primitive concept of arrays is that you can pretend that they're a different size or that they start in a different place. For example, if you have a function which expects to be given a pointer to an array of ten Whatsits, and in fact you have an array of fifty Whatsits and you want the function to operate on elements 20,…,29 of that array, that's just fine – pass the value array+20 to the function and it'll never know or care that the pointer it received wasn't ‘really’ the start of the array from your point of view.
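
For instance, reusing the hypothetical sum() function sketched above:

int whatsits[50];
/* ... fill in all fifty elements ... */
int subtotal = sum(whatsits + 20, 10);   /* operates on elements 20..29 only */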

5. Allocated memory must be manually freed

Perhaps the best-known difference between C and high-level languages is the manual memory management.

In a language like Java, you're used to being able to say ‘new Foo’ to create a new object of type Foo. In C, things are not too different: you say ‘malloc(n)’ to ask for a piece of memory n bytes long, so if you want to create a new object of type Foo then you say ‘malloc(sizeof(Foo))’, which will allocate just the right amount of memory to keep a Foo in.

(Though note that the small difference between the apparently synonymous ‘new Foo’ and ‘malloc(sizeof(Foo))’ does have one noticeable consequence, which is that the return value of malloc has a generic pointer type rather than being a pointer to Foo. So if you pass sizeof(the wrong thing) to malloc, nothing at compile time will warn you that you've allocated the wrong amount of memory. This is another way in which you can accidentally overwrite important data when programming in C.)

The difference in C is that when you've finished with that piece of memory, you have to dispose of it by hand, by calling the function free() and passing it the same address you got from malloc.

This has two complementary consequences. Firstly, of course, it means you have to remember to free your memory, and in particular, to free it at the right moment – after it's not needed any more, but before you lose all your copies of the pointer (so that you wouldn't be able to free it anyway). And secondly, this gives you another way in which pointers can become stale: if you have several copies of a pointer to allocated memory in different parts of your code, and then one part of the code frees the memory, the other parts now have a stale pointer which they must avoid using for anything.

(In particular, you mustn't free a pointer that's already been freed. Freeing twice is an error and will typically corrupt the memory allocator's state.)
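
Putting these pieces together, here's a minimal sketch of the allocate/use/free cycle (the structure and function names are invented):

#include <stdlib.h>

struct Foo { int x, y; };

void demo(void)
{
    /* sizeof *p, rather than sizeof(struct Foo), is a common idiom
     * that keeps the allocation size in step with the pointer's type. */
    struct Foo *p = malloc(sizeof *p);
    if (p == NULL)
        return;                 /* malloc can fail: always check */

    p->x = 1;
    p->y = 2;
    /* ... use the object ... */

    free(p);
    p = NULL;                   /* a nulled-out pointer can't be
                                 * accidentally used or freed twice */
}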

A typical technique for getting this right is to imagine a concept of ‘ownership’ of allocated memory. The idea is, if several parts of the code have copies of the same allocated pointer, one of them is considered to ‘own’ it. The owner of the memory is the only one who can free it. So the piece of code which owns the pointer must make sure to free it at some appropriate moment (e.g. if the pointer is kept in another allocated structure, then when the latter structure is disposed of that's often a good moment to free the pointer too); on the other hand, any code which doesn't own the pointer has a potential risk of finding the pointer has become stale, if the owner frees it and doesn't let them know. So you need to think about how that's going to work, and find a way for non-owners of the pointer to avoid making that mistake. (There are lots of ways to do that. For example, you might be able to arrange that the owner is freed last, so that all the non-owners have gone away already before the owner calls free(). Alternatively, you might keep a list of non-owners, and notify them all when you free the memory. And so on.)

6. Undefined behaviour

In previous sections I've described a number of things you can get wrong in a C program – accessing stale pointers, allocating the wrong size of memory, overrunning the bounds of an array, freeing the same pointer twice – which will cause your program to overwrite the wrong piece of memory. In a ‘safe’ high-level language, you would expect all of these to be either impossible in the first place (because the language syntax doesn't permit you to even express the idea) or else be checked at run time and give rise to an exception, so that the program deliberately crashes and tells you something about what you did wrong.

In C, you are not protected in either of those ways, so making mistakes of this kind can lead to your program corrupting pieces of memory that other parts of the program were depending on. If you're lucky, this will lead to a reasonably prompt crash of some kind; but if you're unlucky, the effects won't be immediately noticeable, and your program will appear to work sensibly for the moment and then fail much later for no obvious reason. Worse still, if your program is on a security boundary and any of these errors can occur in response to input from untrusted users, then a malicious attacker may be able to manipulate errors like this to arrange to overwrite particular pieces of memory on purpose, and perhaps take control of your program.

The technical term for this kind of situation, in the C standard, is ‘undefined behaviour’. The C standard is intentionally only a partial specification of how programs must behave: there are a lot of things you can do in C for which the standard specifies no behaviour, so that if you do any of those things then anything can happen without the compiler or runtime being in violation of the standard.

Overwriting the wrong piece of memory is a major and important example of undefined behaviour, but it's not the only one. There are other things which you should avoid doing in C, but which the language implementation won't give you any help with, and if you do get them wrong then unpredictable weirdness can result.

An example of this is integer overflow: trying to store a too-large or too-small value in a C signed integer type such as ‘int’. You could imagine several plausible things that might happen if you try this: wraparound (adding one to the largest possible value gets you the smallest possible value), saturation (adding to the largest possible value leaves it unchanged), or a crash (the run-time environment detects the error and terminates the program). In fact, C says that the behaviour if you do this is completely undefined: that is, any of those reasonably sensible things could happen, but anything else could happen instead, no matter how silly. For example, consider this code:

int f(int n)
{
    if (n < 0)
        return 0;
    n = n + 100;
    if (n < 0)
        return 0;
    return n;
}

Looking at this function, you'd think there was no way it could return a negative number. We first make sure n isn't negative; then even after we add 100 to it, we check again that it isn't negative, in case wraparound due to integer overflow caused it to become so. You could imagine the function crashing, if it were compiled and run on a platform where that's the response to overflow, but if it returns at all then surely it must return either zero, or a positive integer. Right?

Actually, no. The GNU C compiler (gcc) generates code for this function which can return a negative integer, if you pass in (for example) the maximum representable ‘int’ value. After the first if statement the compiler knows that n is non-negative; it then assumes that integer overflow does not occur, uses that assumption to conclude that the value of n after the addition must still be non-negative, and so removes the second if statement completely, returning the result of the addition unchecked.

Put another way, you could imagine that the compiler is thinking to itself: ‘If the user passed in an integer so large that the addition overflows, that would be undefined behaviour, so we can assume the user didn't do it. So they must be implicitly promising to always call this function with input values that don't cause overflow – in which case the function doesn't need the second test, and will run faster with it removed.’ And, in a sense, it's right: as long as you don't cause integer overflow by passing a too-large value as input to the above function, it will work correctly like that and run faster. It's just that the test you deliberately put in as a safety check hasn't worked, because undefined behaviour is too weird for any safety check to be able to reliably spot it after it's happened.

And even that is still only one possible example of how a program might misbehave if you cause undefined behaviour – another time, it might be some other totally different thing that you couldn't have predicted in advance. So beware! The only safe thing is not to allow undefined behaviour to happen in the first place: for example, in the above code, the right thing would have been to check whether n is close to overflow before trying to add anything to it, and not do the addition at all if so. (Another option in some situations is to rewrite the code using the type ‘unsigned int’, which the standard defines to be somewhat better behaved.)
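
For example, a version of the earlier function that tests for overflow before the addition, while the arithmetic is still well defined (a sketch; the function name is invented):

#include <limits.h>

int f_checked(int n)
{
    if (n < 0)
        return 0;
    /* Check *before* adding, while everything is still well defined;
     * INT_MAX comes from <limits.h>. */
    if (n > INT_MAX - 100)
        return 0;
    return n + 100;
}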

7. There is no convenient string type

String handling in C is very, very primitive by the standards of almost any other language. In Java or Python, you expect that most of the time you can treat string variables just like any other kind of variable. In C, that's not true at all.

In C, a string is just an array of ‘char’ values (or ‘wchar_t’, if you're using Unicode, but that doesn't really affect the upcoming discussion). That is to say, you just have a lot of chars laid end-to-end in memory. The standard convention (although you can choose to do it differently if you need to) is that strings are terminated with a zero byte; so if you pass a string to a function, for example, you typically just pass a char pointer that points to the first character of the string, and the function receiving it has to search along the string for the zero byte if it wants to know how long the string is.
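
For instance, here's a sketch of how a length lookup has to work under this convention (essentially what the standard strlen does):

#include <stddef.h>

/* Walk along the string until the terminating zero byte, then
 * report how far we walked. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}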

The interesting effect of this is: suppose you want to construct a new string, for example by concatenating two existing strings. Where do you put it? You need to find an appropriately sized chunk of memory to keep your new string in.

The most obvious approach is to use malloc to allocate a chunk of memory the right size. In order to do that, first you have to find out the lengths of your two input strings, by counting along each one looking for the terminating zero byte. Then you have to call malloc and ask for a number of bytes equal to the sum of those two lengths plus one (leaving room for the terminating zero byte), and then you have to copy the actual string data from the two input strings into the newly allocated memory. And then, of course, you have to keep track of when you've finished with the string, and remember to free it once it's no longer needed.
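
In code, that procedure looks something like this (a sketch; the function name is invented, and the caller owns the result and must remember to free it):

#include <stdlib.h>
#include <string.h>

char *concat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *result = malloc(la + lb + 1);   /* +1 for the zero byte */
    if (result == NULL)
        return NULL;
    memcpy(result, a, la);
    memcpy(result + la, b, lb + 1);       /* copies b's terminator too */
    return result;
}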

That all adds up to a lot of effort compared to the typical high-level language just letting you write something like ‘newstring = string1 + string2’. And it's a typical example, unfortunately: most string handling operations in C are about as annoying as that.

Of course, the comparatively easy string handling in high-level languages really does have to do all of this same work; it's just that the language hides it from you, and takes care of the tedious details automatically. So in a high-level language, you can easily write some pretty slow string-handling code without really noticing – you can concatenate two strings with a single plus sign, and in some situations that one character can cause the language to have to move megabytes of data bodily around the computer's memory. In C, you're much more likely to notice when the string handling you asked for is going to be long and tedious and slow – and sometimes, you can find ways to avoid some of the pain.

For example, suppose you've got a long string, and you want to break it up at the space characters so that you end up with lots of smaller strings representing individual words. If you do that in a high-level language with the built-in string handling, you'll almost certainly end up with a series of string variables containing copies of the parts of the original string that contain each word.

In C, a common approach to this problem (at least if you didn't need the original string itself afterwards, which you often don't) avoids having to actually copy anything, so it will run much faster. What you do is to modify the original array of chars, by writing zero bytes just beyond the ends of words. For example, suppose you have an input string like this:

char input_string[] = " string of four words";

(Note that input_string is declared as an array, so the string data lives in modifiable memory – it's legal to overwrite its bytes – and the array ends with a zero byte, written \0 in C notation, which terminates the string.)

Then you transform it by overwriting the space just after the end of each word with a zero byte, so that each word becomes a little zero-terminated string of its own. Next, you construct an array of char-pointer values (let's call it ptrs) whose elements point at the first character of each word, skipping the spaces before them.
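Here's a compilable sketch of the whole trick, assuming for simplicity at most four words separated by spaces:

#include <stdio.h>

int main(void)
{
    char input_string[] = " string of four words";
    char *ptrs[4];
    int nwords = 0;

    /* One pass along the string: remember where each word starts,
     * then overwrite the character just after it with a zero byte. */
    for (char *p = input_string; *p != '\0'; ) {
        while (*p == ' ')
            p++;                          /* skip over spaces */
        if (*p == '\0')
            break;
        ptrs[nwords++] = p;               /* a word starts here */
        while (*p != ' ' && *p != '\0')
            p++;                          /* find the end of the word */
        if (*p == ' ')
            *p++ = '\0';                  /* terminate the word in place */
    }

    for (int i = 0; i < nwords; i++)
        printf("%s\n", ptrs[i]);          /* prints each word on its own line */
    return 0;
}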

So each pointer points to a zero-terminated string containing one of the original words. Hence, we've effectively constructed a separate string value per word, without ever having to move the string data from one place to another or allocate new memory to contain copies of it. (Well, we probably did have to allocate memory for our ptrs array, but that's typically a lot smaller.)

In C, this kind of trickery is done as often as possible, to save the considerable pain of allocating memory and copying stuff. When it's possible, it can make C's string handling a lot faster than doing the same thing in a high-level language – so there at least is some kind of advantage in return for all the extra programmer effort.

7.1. The standard library is not always your friend

Something worth remembering about string handling in C is that although the standard library provides you with a set of functions to do typical things (e.g. copying a string, finding its length, comparing two strings), it's not always sensible to do a particular job by using the library function for that job. It can often be worth thinking about how the library function will work, and noticing that in some situations there's a way to avoid doing unnecessary work.

For example, the standard library provides a function called ‘strcat’, which concatenates one string on to the end of another. This doesn't work as I describe above (allocating new memory to put the combined string in), but instead it expects you to have spare space after the end of the first string. For example, if you already had room for 100 characters of data, and what's actually stored in those 100 characters is the string ‘hello’ followed by a terminating zero byte, then there's plenty of room to add more stuff on the end without overflowing your buffer, so you could safely use strcat to add the word ‘world’ on to the end.

So, suppose you have an enormous buffer already allocated, and you want to concatenate hundreds of little strings together. You might naturally suppose that the best way to do that is to start with an empty string and to repeatedly call strcat to add another little bit on to the end of it. But in fact, that's a slow way to do the job: each time you call strcat, it has to start at the beginning of the combined string (because that's the only thing you gave it a pointer to), count all the way along it to find the zero byte at the end, and then when it finds it, write the new string into the buffer. So if you combine lots of tiny strings this way, you'll spend most of your time walking repeatedly along the finished part.

A better approach is to keep a pointer to the end of the string, i.e. always pointing at the terminating zero byte. Every time you add an extra little bit to the string, you just copy it to where your pointer currently points, and then you advance the pointer until it finds the new terminating zero. Then you don't have to keep retracing your steps. So the string-concatenation function provided by the standard library is not always the best way to concatenate strings.
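
A minimal sketch of that end-pointer technique (the function name is invented, and the caller must ensure the buffer has room):

#include <string.h>

/* Append src at the position end points to, and return a pointer to
 * the new terminating zero byte, so the next call needn't re-scan
 * the whole buffer the way repeated strcat calls do. */
char *append(char *end, const char *src)
{
    size_t len = strlen(src);
    memcpy(end, src, len + 1);            /* +1 copies the zero byte */
    return end + len;
}

Starting with end pointing at a zero byte at the front of the buffer, you'd then call end = append(end, little_string); in a loop, never retracing your steps.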

(Of course, in this scenario, you also have to make sure you don't overrun your buffer, either by tracking how much space you have left and stopping if you're about to go over, or by counting up the total length of all the strings in the first place and allocating the right amount beforehand.)

8. No object orientation

Object orientation is pretty standard these days: most modern languages have it in some form. C does not, because it predates the idea's rise to ubiquity.

The closest thing in C to a class type is a structure, which is effectively a class without any methods – just a collection of data items, clustered together in memory so they can be conveniently treated as a unit, but there's no support provided for defining functions that go along with that data, and also there's no direct language support for data-hiding (‘private’ or ‘protected’).

So, how do you solve problems in C that users of other languages would do with classes?

One very common pattern in C is just to do the same thing unofficially. Define a structure type containing your data fields, and then write a set of C functions which take a pointer to your structure type as one argument, and can read and write the data fields in order to implement the operations provided by the ‘class’. A simple class in Java is really no different from that; it's just that where a Java programmer writes ‘myclass.doThing(1, 2)’, a C programmer writes something more like ‘do_thing(&myclass, 1, 2)’.
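
For example, a sketch with invented names:

struct counter {
    int value;
};

/* The 'methods': plain functions taking a pointer to the structure. */
void counter_reset(struct counter *c)      { c->value = 0; }
void counter_add(struct counter *c, int n) { c->value += n; }
int counter_get(const struct counter *c)   { return c->value; }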

Although there's no hard guarantee of data hiding, you can do something about it: if your program is big enough to be divided into multiple source files, then you can put the definition of your structure type and the implementations of all the ‘methods’ in one source file, so that no other file can see the structure definition. That will prevent other parts of the program from reading the fields of the structure in practice, because they don't have access to the information about what the fields all are and where they live in the structure. Of course, this being C, it's possible to do uncontrolled writes to the structure regardless (just convert the structure pointer into some other kind of pointer and then write into the memory it points at), but sensible programmers won't do that, because the results will almost certainly not be useful.
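
Continuing the invented counter example, the shared header would expose only an incomplete (‘opaque’) type and the operations on it, while the field definitions stay private to one source file:

/* counter.h – callers see the type's name but not its fields. */
struct counter;                              /* incomplete type */
struct counter *counter_new(void);           /* must allocate for you,
                                                since callers don't
                                                know the size */
void counter_add(struct counter *c, int n);
int counter_get(const struct counter *c);
void counter_free(struct counter *c);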

All of that works fine as long as you don't want to use inheritance, or Java-style interfaces, or polymorphism. In that situation, you would typically do something in C which mimics the underlying mechanism by which high-level languages implement that kind of feature: you'd define your structure type to contain some pointers to functions, and implement certain class operations by extracting one of those pointers from the structure and calling the function it points to. That way, different instances of the same structure type could behave differently, by having their function pointer fields point at different functions.
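
A sketch of that mechanism, with invented names – a structure carrying a function pointer, plus one concrete ‘subclass’:

struct shape {
    double (*area)(const struct shape *self);
};

struct circle {
    struct shape base;          /* placed first, so the cast below works */
    double radius;
};

double circle_area(const struct shape *self)
{
    const struct circle *c = (const struct circle *)self;
    return 3.141592653589793 * c->radius * c->radius;
}

double area_of(const struct shape *s)
{
    return s->area(s);          /* the 'virtual call', done by hand */
}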

In other words, you can do anything in C that a higher-level language's object orientation system provides. You just have to do it by hand, rather than having it done automatically for you.

More importantly, you don't get the error checking. In a high-level language, you can typically take a reference to a derived class and treat it as if it was a reference to the base class, but if you try to do the same with a class that isn't a derived class of that base, you'll get a compiler error to let you know you've made a mistake. In C, there's no selective type checking like that: you can convert any structure pointer to a different kind of structure pointer if you write an explicit cast operator to indicate to the compiler ‘don't worry, I know what I'm doing’, but if you do that, the compiler will never complain, no matter what two pointer types you convert between. So you can't make it diagnose only the conversions that are semantically wrong; instead, you have to use a lot of self-discipline to make sure you don't make those mistakes in the first place.

9. The preprocessor

An unusual feature of C, not found in more modern languages (unless you count C++), is its preprocessor. Before being compiled, every C source file is put through a conceptually separate textual translator, which responds to special directives beginning with ‘#’ on lines by themselves and uses them to transform the source code. The output of the preprocessor still looks like C source code, but with pieces removed or added and some words replaced by other things; then that output goes to the ‘real’ compiler, which turns it into machine code.

The main preprocessor directives are: ‘#include’, ‘#define’, and a set of conditional directives ‘#if’, ‘#else’, ‘#elif’ and ‘#endif’.

9.1. #include

#include's job is to cause another file of C code to be copied into the main one. This is usually used as part of C's system for breaking a program up into modules: in order to refer to a function or variable in another module, the compiler needs to know what type the variable is, or what types of arguments and return values the function expects. Typically you don't #include a file containing the actual definitions of functions; instead, you include a file that just gives that information on its own. For example, if you had two source files ‘main.c’ and ‘foo.c’, and main.c needs to call a function foo() which is defined in foo.c, you might write a third file ‘foo.h’ saying something like this:

int foo(int a, char *b);

That declares the function, meaning that it lets the compiler know that a function with that name exists and that it takes two arguments of particular types and returns an int, but it does not define the function (meaning to provide the actual code showing what the function does). Then, in foo.c, you'd repeat basically the same line of code, but replace the trailing ‘;’ with the body of the function:

#include <string.h>   /* for strlen */

int foo(int a, char *b)
{
    return a * strlen(b) + 1;
}

And finally, in main.c, you'd include foo.h in order to be able to call the function:

#include "foo.h"
int main(int argc, char **argv)
{
 printf("The answer is %d\n", foo(argc, argv[0]));
}

(In fact, it's usually a good idea to include foo.h in foo.c itself as well as in other modules that need to use the functions. That way, if you make a mistake and write different argument types in the function definition in foo.c and the declaration in foo.h, the compiler will warn you that they don't match.)

9.2. #define

#define allows you to define ‘macros’, which are pieces of text that the preprocessor substitutes for other text whenever it sees them in the source code. Macros can look like individual words, or they can look like function calls. For example:

#define ARRAYSIZE 1000
#define TRIANGLE(n) (n*(n+1)/2)
int array[ARRAYSIZE];
int red_snooker_balls = TRIANGLE(5);

After the preprocessor has done its job, the #define statements themselves have been removed, and everywhere the macros appeared they will have been replaced by whatever they were defined to be. So this source file will turn into:

int array[1000];
int red_snooker_balls = (5*(5+1)/2);

and then the compiler in turn will actually do the arithmetic to turn the second definition into the number 15.

It's important to note that all of this substitution is done textually, with no understanding of how C works. So, for example, one of my definitions above contains a deliberate mistake: TRIANGLE will go wrong if you write the following.

 a = TRIANGLE(b+c);

To work out what would happen to this, go back to the definition of the macro TRIANGLE, and substitute the text ‘b+c’ everywhere the definition used the name of the macro parameter ‘n’. The preprocessor will transform it into this:

 a = (b+c*(b+c+1)/2);

But now this doesn't mean what you might have expected it to mean, because of operator precedence: as in most languages, C interprets ‘b + c * stuff’ to mean multiplying c by stuff and then adding b to the result, whereas the macro definition clearly wanted ‘(b+c) * stuff’. So macros can be dangerous if not used carefully, because they don't have to respect the syntactic integrity of their parameters – as here, where what looked like an unambiguous command to add b to c turned into something entirely different by the time the code was generated. This is one reason why it's conventional to write macro names in capitals – to warn the programmer that they might do strange things.
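
The conventional defence is to parenthesise every use of a macro parameter, and usually the whole expansion too, so that the substituted text can't interact with neighbouring operators:

#define TRIANGLE(n) ((n)*((n)+1)/2)

Now TRIANGLE(b+c) expands to ((b+c)*((b+c)+1)/2), which means what it looks like it means.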

This unchecked nature can sometimes be used to deliberate effect. If you do it right, it's possible to write C macros containing partial statements and unbalanced brackets in such a way as to extend the capabilities of the language and let you pretend it has all sorts of useful features that the C designers didn't think to put in. However, I won't go into that here; the usual uses for the preprocessor are the sorts of thing you see above. Just be aware that if you're reading someone else's C code and it uses a control construction that you don't recognise, or it looks as if it's violating the structure of the language in some way, it's worth checking to see if there's a macro somewhere which makes the strange-looking code turn into legal code by the time the main compiler sees it.

You might be wondering why it wasn't more sensible to define ‘ARRAYSIZE’ as a constant and ‘TRIANGLE’ as a function, something like this:

const int ARRAYSIZE = 1000;
int TRIANGLE(int n) { return n*(n+1)/2; }

The answer is that if you do that, then those definitions can't be used in some contexts. In the example above, I used ‘ARRAYSIZE’ to set the number of elements in an array variable, and ‘TRIANGLE’ to set the initial value of a global variable. In both cases, the compiler needs to be able to know the right number at compile time rather than waiting until the compiled code is run, and it can't do that if you define them like this: TRIANGLE is a function call, which means running the compiled code, and ARRAYSIZE requires looking in the piece of memory where the variable is stored. (Yes, even if it's declared as ‘const’, for annoying technical reasons.)

For this sort of reason, a lot of constants and simple functions of this kind tend to be written using the preprocessor rather than in the main C language.

9.3. #if

Finally, #if and its friends allow you to ‘conditionally compile’ code: that is, to decide at preprocessing time whether to include a piece of code in the output program, depending on any criterion the preprocessor has the ability to judge. A typical use for this would be to write a program which mostly looks the same on several operating systems, but in one particular place where things have to be done differently on (say) Windows and Linux, there's a #if segment which tells the preprocessor to choose between the Windows and the Linux implementations of a function depending on what OS it's compiling for. You might write something like this, for example:

void clean_up_temp_file(void)
{
#if defined _WINDOWS
    DeleteFile("C:\\TEMP\\TEMPFILE.DAT");
#elif defined linux
    unlink("/tmp/tempfile.dat");
#else
#error Unrecognised platform
#endif
}

This would cause the call to the Windows operating system function ‘DeleteFile’ to be included in the program when a Windows compiler compiles it, because the Windows C compiler's preprocessor defines the special macro ‘_WINDOWS’ to allow this kind of decision-making in the preprocessor. On Linux, on the other hand, the macro ‘linux’ is defined and so the function call to the Linux function ‘unlink’ would be used instead. And if you try to compile the same program on a platform which defines neither of those macros, then the ‘#error’ directive (another feature of the preprocessor) ensures that compilation fails, so that you don't get as far as running the program before realising you left something out.

10. So why is C like this, anyway?

You're probably thinking, by now, that C sounds like a horrible language to work in. It forces you to do by hand a lot of things you're used to having done for you automatically; it constantly threatens you with unrecoverably weird behaviour, hard-to-find bugs, and dangerous security holes if you put one foot across any of a large number of completely invisible lines that neither the compiler nor the runtime will help you to avoid; and, for goodness' sake, it can't even handle strings properly. How could anyone have designed a language that bad?

To a large extent, the answer is: C is that way because reality is that way. C is a low-level language, which means that the way things are done in C is very similar to the way they're done by the computer itself. If you were writing machine code, you'd find that most of the discussion above was just as true as it is in C: strings really are very difficult to handle efficiently (and high-level languages only hide that difficulty, they don't remove it), pointer dereferences are always prone to that kind of problem if you don't either code defensively or avoid making any mistakes, and so on.

(That doesn't explain all of C's curious design. The undefined-behaviour problem with integer overflow wouldn't happen in machine code; that's a consequence of C needing to run fast on lots of very different kinds of computer, which is a problem machine code doesn't even try to solve. And there's no simple excuse for the preprocessor; I don't know exactly why that exists, but my guess is that back in the 1970s it was an easy way to get at least an approximation to several desirable language features without having to complicate the actual compiler. These days compilers don't have to fit into such small computers, so we don't mind complicating them a lot more.)

A key feature of C is that it needs very little ‘run-time support’, by which I mean not just library functions your program can call if it wants to (C does have those) but library code which the program can't run at all without. Higher-level language features like garbage collection, bounds checking and exceptions all need complicated library code to support them, and all that library code has to be written in some language – and you can't write Java's garbage collector in Java, because you need to have the garbage collector already before you can even run Java code. So one niche in which C is still important is that it's a language in which it's possible to write the supporting code that high-level languages need to run in the first place. The Python interpreter is written in C, for example.

C can also be extremely fast. As a direct result of leaving out all the safety checks that other languages include, C code can run faster, because you can judge for yourself which of the safety checks are actually necessary, and not bother with the rest. Of course, make one mistake and you've had it – but that's just like the rest of C…

But those aren't the reasons why most C code is in C. Mostly, C is important simply because lots of code was written in it before safer languages gained momentum, and now lock-in and network effects mean that C (or C++) can still be the path of least resistance – if you need to work with existing libraries of code that have C interfaces, or reuse and adapt existing programs that were written in C, then naturally you'll have to write in C too, and so the cycle continues.

Copyright © 2013 Simon Tatham. This document is OpenContent. You may copy and use the text under the terms of the OpenContent Licence. Please send comments and criticism on this article to anakin@pobox.com.

URX - Job Board


Comments:" URX - Job Board "

URL:http://goo.gl/YiJZoU


URX drives sales for retailers by creating mobile ads that take users to the product pages in their apps.

URX was a member of the YCombinator Summer 2013 class and is funded by thought leading investors including First Round Capital, Google Ventures, SV Angel, and Greylock. Our work has been covered in Forbes, AdExchanger, TechCrunch, and Mobile Marketer.

We’re a fast growing, highly technical team with deep backgrounds in machine learning, data science, mobile advertising and e-commerce. We believe in bringing together the most creative, driven and focused thinkers to help accelerate the growth of connected commerce.

URX offers employees both massive responsibility and competitive compensation (including stock options, health insurance, and paid vacation) in a high growth environment. As one of the first 20 members of our team, you’ll play an essential role in our success; expect (and be excited) to wear multiple hats. URX is headquartered out of the storied South Park neighborhood of SOMA, San Francisco.

Microsoft Joins Open Compute Project, Shares its Server Designs | Data Center Knowledge


Comments:"Microsoft Joins Open Compute Project, Shares its Server Designs | Data Center Knowledge"

URL:http://www.datacenterknowledge.com/archives/2014/01/27/microsoft-joins-open-compute-project-shares-server-designs/


January 27th, 2014
By: Rich Miller

These are some of the more than 1 million servers powering Microsoft’s Internet infrastructure. The company is joining the Open Compute Project and sharing the designs of its servers and storage. (Photo: Microsoft)

SAN JOSE, Calif. – In a dramatic move that illustrates how cloud computing has altered the data center landscape, Microsoft is opening up the server and rack designs that power its vast online platforms and sharing them with the world.

Microsoft has joined the Open Compute Project and will be contributing specs and designs for the cloud servers that power Bing, Windows Azure and Office 365. The company will discuss its plans tomorrow in the keynote session of the Open Compute Summit in San Jose.

Why would Microsoft, long seen as the standard-bearer for proprietary technology, suddenly make such an aggressive move into open hardware?

“We came to the conclusion that by sharing these hardware innovations, it will help us accelerate the growth of cloud computing,” said Kushagra Vaid, General Manager of Cloud Server Engineering for Microsoft. “This will directly factor into products for enterprise and private clouds. It’s a virtuous cycle in which we create a consistent experience across all three clouds.”

Azure Clouds for the Enterprise

The designs and code for Microsoft’s cloud servers will now be available for other companies to use. A larger circle of vendors will be able to build hardware based upon the designs, which in turn will allow enterprises to create hybrid Windows Azure clouds running on the same hardware across its on-premises data centers and in Microsoft’s cloud.

The Open Compute Project (OCP) was founded by Facebook in 2011 to take the concepts behind open source software and create an “open hardware” movement to build commodity systems for hyperscale data centers. It has spurred the growth of a vibrant development community, which is now expanding its focus to cover network equipment.

Microsoft now wants to reap the benefits of that ecosystem, which has rapidly transformed Facebook’s initial server and storage designs into commercial products. It also hopes to expand OCP’s efforts to include management software.

“The depth of information Microsoft is sharing with OCP is unprecedented,” said Bill Laing, Microsoft Corporate VP for Server and Cloud, in a blog post. “As part of this effort, Microsoft Open Technologies is open sourcing the software code we created for the management of hardware operations, such as server diagnostics, power supply and fan control. We would like to help build an open source software community within OCP as well.”

Competition in the Cloud

Microsoft’s move to align with Open Compute reflects the intensifying competition in cloud services, where Microsoft, Google and Rackspace are among the players seeking to wrest share from market leader Amazon Web Services. Tapping the OCP’s nimble ecosystem of hardware vendors could accelerate innovation on Microsoft’s cloud platform, resulting in an integrated hybrid cloud platform that can keep pace with AWS.


Rich Miller is the founder and editor-in-chief of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.

Glad to see MS taking steps in the right direction, IMHO:
In the long run, the companies that best leverage the network effects of open source and open communities will have a bigger competitive advantage.
And in MS's case in particular, it's nice to get more information on their system design and architecture.
Regards

One key motivation is likely Google. Data centers represent one of Google’s largest competitive advantages. Commoditization of your competitors’ products and infrastructure is just good business: It’s a strategy that works and usually benefits both consumers and upstart businesses.

Square thinks I don’t exist — Kevin Chen


Comments:"Square thinks I don’t exist — Kevin Chen"

URL:http://kevinchen.co/blog/square-identity-verification/


Square thinks I don’t exist

27 Jan 2014

Square thinks I’m not a real person.

Seriously. During the signup process, they ask for some personal information: your name, home address, phone number, and Social Security Number. They hand it off to an identity verification service, which scrapes, say, public records and credit reports, to ask questions that only you can answer:

This might be one of the reasons Square says you have to be 18 to sign up: without a financial history, this kind of identity verification simply doesn't work. The problem is that ID verification companies take a long time to update their data. Assuming you apply for a credit card on your 18th birthday, you could be drinking alcohol legally by the time they catch up. When they can't find you in their database, they pull irrelevant questions associated with somebody else's dossier — especially if you have a common name like I do.

Other services, such as Amazon Payments and Citibank, forced me to fax copies of my driver’s license and Social Security card. How backwards.

Square, being the champion of user experience in finance, swiftly and permanently blocked me from using their service without warning:

I took the obvious next step: searching their support site. All that they had was a cute but ultimately useless fish video.

Next, I tried to contact customer service. After all, there’s plenty of evidence that I’m real: I visited their office once, they gave my friends a ton of Square Readers, and I even interviewed for a job there. A week later, nobody had responded. As it turns out, they ignore you unless you call them out on Twitter. Their response? “Our decision remains final.”

What’s worse, Square taunts me about it with a popup. Every single time I log in.

I have honestly never seen such a user-hostile design in financial services — and I use PayPal! As much as Jack Dorsey likes to make fun of PayPal, for people my age, Square’s user experience is usually orders of magnitude worse.

Design is how it works, not how it looks.

| Mobile A/B Testing That Actually Works


Comments:" | Mobile A/B Testing That Actually Works"

URL:http://blog.taplytics.com/mobile-ab-testing-youll-actually-use/


 

Anyone who has built a mobile app would probably agree that iterating and optimizing a mobile app is a slow process. It generally requires countless cycles of updating your app and then waiting for Apple to approve it before users can download the update and you can see whether your changes were positive.

As a result, it’s generally been easier to just link your app to an analytics suite like Mixpanel, Google Analytics, or Flurry and track changes over time than to actually use any of the mobile A/B testing platforms available today.

Today, A/B Testing Means Waiting for Updates

Today, A/B tests must be written in code and then reviewed by Apple before your users can see them. When a platform says they allow you to push experiments without an App Store update, what they mean is that you can deliver the winning experiment to the rest of your users without an update. The real problem is that those experiments needed an update to get there in the first place.

As a result, as your app evolves and you learn more about your users, you still need to do an app update to try a new A/B test.

What Should A/B Testing Be Like?

Because of this limited benefit, few mobile teams actually do any A/B testing, even though it is generally accepted that testing can significantly increase conversions, engagement and retention of users. This begs the question, what would mobile A/B testing have to be like to actually get people to test on mobile? We think the answer is that it has to be dead simple, it can’t require any coding and it has to maintain the performance of the app.

Introducing Taplytics One-Line SDK

We thought long and hard about the problems in mobile A/B testing and what the right platform should be, and today we are delivering the A/B testing platform that people will actually use: it is dead simple, it requires no coding of experiments or goals, and it completely maintains the performance of your app. With the Taplytics SDK, all you have to do is initialize the SDK with one line; from there, no more coding is required. To link elements to Taplytics, all you have to do is touch, tap or click on them directly in the live app. Our SDK does all of the work for you, binding the elements to our platform and creating all of the event triggers for goal tracking.

The Nitty Gritties

If you haven’t watched the video above yet, we strongly encourage that you do. It shows the open-source WordPress app running multiple variations, linking elements with just a tap and how changes can be viewed live on a series of dev devices.

We showed a deliberately simple test – just calls to action, position and color – but that simple test is actually quite powerful and a great place for anyone to start. As mentioned in our previous post about mobile on-boarding, the vast majority of users are lost in the process of login. With our app you don't have to worry about taking a month to dream up a complex test for on-boarding. Because you can push tests instantly without writing code, you can start small, see how positioning, messaging and highlighting elements change the process, and as you gather more information, Taplytics can scale in complexity with you. Your simple content experiment can, with a few taps and clicks, be modified into an experiment on your on-boarding flows, pushing segments of users through different sequences of screens. The power you have to retain and engage your users with Taplytics is huge, but if you don't start somewhere simple and accessible, you will never get to the really powerful tests.

We are really excited that we can finally share this new SDK and feature set. We’re also excited to see how people use this tool to improve their apps in endless ways, big and small. So please take the first step to getting more out of your mobile app by signing up for a free trial. In the spirit of simplicity we’ve even included the sign up form below, so you can get started without having to go anywhere else. That’s how committed we are to the concept of making all of our users’ lives easier.

[PATCH] SPDY/3.1 protocol implementation


Billion – let's take Buffett's Billion

Government to Allow Companies to Disclose More Data on Surveillance Requests

Apple - Press Info - Apple Reports First Quarter Results


Comments:"Apple - Press Info - Apple Reports First Quarter Results"

URL:http://www.apple.com/pr/library/2014/01/27Apple-Reports-First-Quarter-Results.html


iPhone and iPad Sales Drive Record Revenue and Operating Profit

CUPERTINO, California—January 27, 2014—Apple® today announced financial results for its fiscal 2014 first quarter ended December 28, 2013. The Company posted record quarterly revenue of $57.6 billion and quarterly net profit of $13.1 billion, or $14.50 per diluted share. These results compare to revenue of $54.5 billion and net profit of $13.1 billion, or $13.81 per diluted share, in the year-ago quarter. Gross margin was 37.9 percent compared to 38.6 percent in the year-ago quarter. International sales accounted for 63 percent of the quarter’s revenue.

The Company sold 51 million iPhones, an all-time quarterly record, compared to 47.8 million in the year-ago quarter. Apple also sold 26 million iPads during the quarter, also an all-time quarterly record, compared to 22.9 million in the year-ago quarter. The Company sold 4.8 million Macs, compared to 4.1 million in the year-ago quarter. 

Apple’s Board of Directors has declared a cash dividend of $3.05 per share of the Company’s common stock.  The dividend is payable on February 13, 2014, to shareholders of record as of the close of business on February 10, 2014.

“We are really happy with our record iPhone and iPad sales, the strong performance of our Mac products and the continued growth of iTunes, Software and Services,” said Tim Cook, Apple’s CEO. “We love having the most satisfied, loyal and engaged customers, and are continuing to invest heavily in our future to make their experiences with our products and services even better.”

“We generated $22.7 billion in cash flow from operations and returned an additional $7.7 billion in cash to shareholders through dividends and share repurchases during the December quarter, bringing cumulative payments under our capital return program to over $43 billion,” said Peter Oppenheimer, Apple’s CFO.

Apple is providing the following guidance for its fiscal 2014 second quarter:

  • revenue between $42 billion and $44 billion
  • gross margin between 37 percent and 38 percent
  • operating expenses between $4.3 billion and $4.4 billion
  • other income/(expense) of $200 million
  • tax rate of 26.2 percent

Apple will provide live streaming of its Q1 2014 financial results conference call beginning at 2:00 p.m. PST on January 27, 2014 at www.apple.com/quicktime/qtv/earningsq114. This webcast will also be available for replay for approximately two weeks thereafter.

This press release contains forward-looking statements including without limitation those about the Company’s estimated revenue, gross margin, operating expenses, other income/(expense), and tax rate. These statements involve risks and uncertainties, and actual results may differ. Risks and uncertainties include without limitation the effect of competitive and economic factors, and the Company’s reaction to those factors, on consumer and business buying decisions with respect to the Company’s products; continued competitive pressures in the marketplace; the ability of the Company to deliver to the marketplace and stimulate customer demand for new programs, products, and technological innovations on a timely basis; the effect that product introductions and transitions, changes in product pricing or mix, and/or increases in component costs could have on the Company’s gross margin; the inventory risk associated with the Company’s need to order or commit to order product components in advance of customer orders; the continued availability on acceptable terms, or at all, of certain components and services essential to the Company’s business currently obtained by the Company from sole or limited sources; the effect that the Company’s dependency on manufacturing and logistics services provided by third parties may have on the quality, quantity or cost of products manufactured or services rendered; risks associated with the Company’s international operations; the Company’s reliance on third-party intellectual property and digital content; the potential impact of a finding that the Company has infringed on the intellectual property rights of others; the Company’s dependency on the performance of distributors, carriers and other resellers of the Company’s products; the effect that product and service quality problems could have on the Company’s sales and operating profits; the continued service and availability of key executives and employees; war, terrorism, public health issues, natural disasters, and other circumstances that could disrupt supply, delivery, or demand of products; and unfavorable results of other legal proceedings. More information on potential factors that could affect the Company’s financial results is included from time to time in the “Risk Factors” and “Management’s Discussion and Analysis of Financial Condition and Results of Operations” sections of the Company’s public reports filed with the SEC, including the Company’s Form 10-K for the fiscal year ended September 28, 2013 and its Form 10-Q for the quarter ended December 28, 2013 to be filed with the SEC. The Company assumes no obligation to update any forward-looking statements or information, which speak as of their respective dates.

Apple designs Macs, the best personal computers in the world, along with OS X, iLife, iWork and professional software. Apple leads the digital music revolution with its iPods and iTunes online store. Apple has reinvented the mobile phone with its revolutionary iPhone and App Store, and is defining the future of mobile media and computing devices with iPad.

Press Contact
Steve Dowling
Apple
dowling@apple.com
(408) 974-1896

Investor Relations Contacts:
Nancy Paxton
Apple
paxton1@apple.com
(408) 974-5420

Joan Hoover
Apple
hoover1@apple.com
(408) 974-4570

 


Apple, the Apple logo, Mac, Mac OS and Macintosh are trademarks of Apple. Other company and product names may be trademarks of their respective owners.

Why isn't People-Centric UI Design taking off? - Scott Hanselman


Comments:"Why isn't People-Centric UI Design taking off? - Scott Hanselman"

URL:http://www.hanselman.com/blog/WhyIsntPeopleCentricUIDesignTakingOff.aspx


NOTE: This post is just speculation and brainstorming. I'm not a UX expert by any means, although I have worked in UI testing labs, run A/B tests, yada yada yada. I dabble. Also, I work for Microsoft, but on the Web and in Open Source. I use an iPhone. Those facts don't affect my ramblings here.

I'm just a little disappointed that 30 years later (longer of course, if you consider Xerox Alto and before, but you get the idea) and we're still all looking at grids of icons. But not just icons, icons are great. It's that the icons still represent applications. Even on my iPhone or iPad I can't have an icon that represents a document. The closest I can get is to add a URL from Mobile Safari.

After Windows 3.1, Microsoft made a big deal about trying to say that Windows was a "document-centric operating system." OS/2 Warp did similarly, except object-centric, which was rather too meta for the average business user. Bear with me here, this is old news, but it was a big deal while we were living it. They kept pushing it up through Windows 98.

This document-centric approach is reflected in a number of Windows 98 features. For example, you can place new blank documents on the Desktop or in any folder window. You can access documents via the Documents menu on the Start menu. You can click a file icon and have its associated application open it, and you can define actions to be taken on a file and display those actions as options in the context menu.

Today on the desktop we take all this for granted. Ubuntu, OS X, Windows all know (for the most part) how a document was created and let us open documents in associated programs. iOS is starting to get similar document-centric abilities, although it appears Open In is limited to 10 apps.

In Windows Phone and Windows 8+ I can pin People to the Start Screen. It's a killer feature that no one talks about. In fact, Nokia recently tweeted a screenshot of a 1080p Windows Phone (I've been testing one myself this last month) and I think they made a mistake here. Rather than pinning People, Faces, Groups, Friends, Family, Co-Workers, etc, they shrunk down a bunch of ordinarily good looking icons to their most unflattering to see how many they could fit on the screen.

(Plus they have 19 Updates pending, which I just find annoying.)

Here's mine next to theirs, just to contrast. Now, far be it from me to tell someone how to personalize their phone, I'm just trying to show that it doesn't have to be cartoonish.

What I'm really interested in is this: why do we, as humans, find App Centric interfaces more intuitive than People Centric ones?

 

The "story" around People Centric is that you don't think "go to twitter and tweet my friend" or "go to Skype and call my friend," instead you click a picture of your friend and then contact them in any possible way using any enlisted app from there.

For example, if I search my Windows machine for "Scott Guthrie" I get this (Scott is lousy about keeping his pictures up to date.)

You can see from here I can Email, Call, Facebook, Skype (if he had Skype), or get a map to his house. All his actual accounts, Twitter, Facebook, etc are linked into one Scott Guthrie Person.

It works great on the phone, where I'm more likely to do more than just email. Note at the bottom there's a chain with a number showing that my wife has 6 accounts (Google, Hotmail, Facebook, Skype, etc) that are all linked into one Contact.

Folks that use Windows Phone mostly know about these features, and the hardcore users I know pin people to Start. On the desktop, though, I never see this. I wonder why. I am surprised that, in a people-focused world of social networks, elevating our friends, family and loved ones to be at least peers with notepad.exe hasn't happened by now.

What do you think, Dear Reader? Have you given this some thought in your interfaces?

Sponsor: Thanks to Red Gate for sponsoring Blog Feed this week! Want Easy release management? Deploy your SQL Server databases in a single, repeatable process with Red Gate's Deployment Manager. There's a free Starter edition, so get started now!

Dropbox Tech Blog » Blog Archive » Improving Dropbox Performance: Retrieving Thumbnails


Comments:"Dropbox Tech Blog » Blog Archive » Improving Dropbox Performance: Retrieving Thumbnails"

URL:https://tech.dropbox.com/2014/01/retrieving-thumbnails/


Posted by Ziga Mahkovec on January 27, 2014

Dropbox brings your photos, videos, documents, and other files to any platform: mobile, web, desktop, or API. Over time, through automatic camera uploads on iOS and Android, you might save thousands of photos, and this presents a performance challenge: photo thumbnails need to be accessible on all devices, instantly.

We pre-generate thumbnails at various resolutions for the different devices at upload time, to reduce the cost of scaling photos at rendering time. But when users are quickly scrolling through many photos, we need to request a large number of thumbnails. Since most platforms limit the number of concurrent requests, the requests might get queued and cause slow render times. We present a solution that allows us to reduce the number of HTTP requests and improve performance on all platforms, without major changes to our serving infrastructure.

Request queuing

Let’s look at this problem in more detail on the web, specifically the Photos tab at www.dropbox.com/photos. Here’s what the Network view in Chrome’s Developer Tools looks like if we were to load every photo thumbnail on the page individually:

You can see that a limited set of images is loaded in parallel, blocking the next set of thumbnails from being loaded. If the latency of fetching each image is high—e.g. for users far away from our datacenters—loading the images can drastically increase the page load time. This waterfall effect is common for web pages loading lots of subresources, since most browsers have a limit of 6 concurrent connections per host name.

A common workaround for web pages is to use domain sharding, spreading resources over multiple domains (in this case photos1.dropbox.com, photos2.dropbox.com, etc.) and thus increasing the number of concurrent requests. However, domain sharding has its downsides—each new domain requires a DNS resolution, a new TCP connection, and SSL handshake—and is also not practical when loading thousands of images and requiring many domains. We saw similar issues on our mobile apps: both iOS and Android have per-host or global limits on the number of concurrent connections.
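
Domain sharding is typically implemented by hashing each resource to one of a fixed set of hostnames, so the same image always maps to the same shard and stays cache-friendly. A minimal sketch for illustration only (Dropbox didn't adopt this approach, and the shard count here is an assumption):

// Deterministically assign each image path to one of N sharded hostnames,
// letting the browser open more parallel connections across hosts.
var NUM_SHARDS = 4; // illustrative shard count

function shardedUrl(path) {
  var hash = 0;
  for (var i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0; // simple 32-bit string hash
  }
  // Same path always hashes to the same shard, keeping caches effective.
  var shard = (hash % NUM_SHARDS) + 1;
  return 'https://photos' + shard + '.dropbox.com' + path;
}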

To solve the problem, we need to reduce the number of HTTP requests. This way we avoid problems with request queueing, make full use of the available connections, and speed up photo rendering.

Measuring performance

Before embarking on any performance improvement, we need to make sure we have all of the instrumentation and measurements in place. This allows us to quantify any improvements, run A/B experiments to evaluate different approaches, and make sure we’re not introducing performance regressions in the future.

For our web application, we use the Navigation Timing API to report back performance metrics. The API allows us to collect detailed metrics using JavaScript, for example DNS resolution time, SSL handshake time, page render time, and page load time.
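
Here's a minimal sketch of collecting these metrics with the Navigation Timing API; the metric names and the /log_perf reporting endpoint are illustrative assumptions, not Dropbox's actual instrumentation:

// Collect Navigation Timing metrics once the page has finished loading.
window.addEventListener('load', function () {
  setTimeout(function () { // wait a tick so loadEventEnd is populated
    var t = window.performance.timing;
    var metrics = {
      dns: t.domainLookupEnd - t.domainLookupStart,
      tcp: t.connectEnd - t.connectStart,
      // secureConnectionStart is 0 when no SSL handshake took place
      ssl: t.secureConnectionStart ? t.connectEnd - t.secureConnectionStart : 0,
      render: t.domContentLoadedEventEnd - t.navigationStart,
      load: t.loadEventEnd - t.navigationStart
    };
    var xhr = new XMLHttpRequest(); // report back to the frontend for logging
    xhr.open('POST', '/log_perf', true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(JSON.stringify(metrics));
  }, 0);
});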

Similarly, we log detailed timing data from the desktop and mobile clients.

All metrics are reported back to our frontends, stored in log files, and imported into Apache Hive for analysis. We log every request with metadata (e.g. the originating country of the request), which allows us to break down the metrics. Hive's percentile() function is useful for looking at the page load time distribution – it's important to track tail latency in addition to the mean. More importantly, the data is fed into dashboards that the development teams use to track how we're doing over time.

We instrumented our clients to measure how long it takes to load thumbnails. This included both page-level metrics (e.g. page render time) and more targeted metrics measured on the client (e.g. time from sending thumbnail requests to rendering all the thumbnails in the current viewport).

Batching requests

With the instrumentation in place, we set out to improve the thumbnail loading times. The first solution we had in mind was SPDY. SPDY improves on HTTP by allowing multiple multiplexed requests over a single connection. This solves the issue with request queueing and saves on round-trips (a single TCP connection and SSL handshake needs to be established for all the requests). However, we hit a few roadblocks along the way:

  • We use nginx on our frontends. At the time, there was no stable nginx version with SPDY support.
  • We use Amazon ELB for load balancing, and ELB doesn’t support SPDY.
  • For our mobile apps, we didn’t have any SPDY support in the networking stack. While there are open-source SPDY implementations, this would require more work and introduce potentially risky changes to our apps.

Instead of SPDY, we resorted to plain old HTTPS. We used a scheme where clients would send HTTP requests with multiple image urls (batch requests):

GET https://photos.dropbox.com/thumbnails_batch?paths=
 /path/to/thumb0.jpg,/path/to/thumb1.jpg,[...],/path/to/thumbN.jpg
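
A hypothetical sketch of how a client might build these batch URLs; the batch size is an assumption, and sorting the paths keeps batches consistent, which matters for caching (see below):

// Split thumbnail paths into batch request URLs.
var BATCH_SIZE = 20; // assumed batch size, chosen to keep URLs short

function buildBatchUrls(paths) {
  // Sort so the same set of paths always yields the same URLs,
  // keeping batches consistent and cache-friendly.
  var sorted = paths.slice().sort();
  var urls = [];
  for (var i = 0; i < sorted.length; i += BATCH_SIZE) {
    // Commas inside paths are percent-encoded, so the separator stays unambiguous.
    var batch = sorted.slice(i, i + BATCH_SIZE).map(encodeURIComponent);
    urls.push('https://photos.dropbox.com/thumbnails_batch?paths=' + batch.join(','));
  }
  return urls;
}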

The server sends back a batch response:

HTTP/1.1 200 OK
Cache-Control: public
Content-Encoding: gzip
Content-Type: text/plain
Transfer-Encoding: chunked
1:data:image/jpeg;base64,4AAQ4BQY5FBAYmI4B[...]
0:data:image/jpeg;base64,I8FWC3EAj+4K846AF[...]
3:data:image/jpeg;base64,houN3VmI4BA3+BQA3[...]
2:data:image/jpeg;base64,MH3Gw15u56bHP67jF[...]
[...]

The response is:

  • Batched: we return all the images in a single plain-text response. Each image is on its own line, as a base-64-encoded data URI. Data URIs are required to make batching work with the web code rendering the photos page, since we can no longer just point an <img> tag's src attribute at the response. JavaScript code sends the batch request with AJAX, splits the response, and injects the data URIs directly into <img> src attributes. Base-64 encoding makes it easier to manipulate the response with JavaScript (e.g. splitting the lines). For mobile apps, we need to base64-decode the images before rendering them.
  • Progressive with chunked transfer encoding: on the backend, we fire off thumbnail requests in parallel to read the image data from our storage system. We stream the images back the moment they’re retrieved on the backend, without waiting for the entire response to be ready; this avoids head-of-line blocking, but also means we potentially send the images back out of order. We need to use chunked transfer encoding, since we don’t know the content length of the response ahead of time. We also need to prefix each line with the image index based on the order of request urls, to make sure the client can reorder the responses.
    On the client side, we can start interpreting the response the moment the first line is received. For web code we use progressive XMLHttpRequest (see the sketch after this list); similarly for mobile apps, we simply read the response as it's streamed down.
  • Compressed: we compress the response with gzip. Base-64 encoding generally introduces 33% overhead. However, that overhead goes away after gzip compression: the compressed response is no larger than sending the raw image data would have been.
  • Cacheable: we mark the response as cacheable. When clients issue the same request in the future, we can avoid network traffic and serve the response out of cache. This does require us to keep the batches consistent, however – any change in the request url would bypass the cache and require us to re-issue the network request.
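
Here's a minimal sketch of that progressive parsing on the web, assuming the index:dataURI line format shown above; loadThumbnails and imgElements are hypothetical names, not Dropbox's actual web code:

// Stream a batch response and render each thumbnail as soon as its line arrives.
function loadThumbnails(batchUrl, imgElements) {
  var xhr = new XMLHttpRequest();
  var parsedUpTo = 0; // offset of the first unconsumed character in responseText
  function parseAvailable() {
    var text = xhr.responseText;
    var lastNewline = text.lastIndexOf('\n');
    if (lastNewline < parsedUpTo) return; // no new complete lines yet
    var lines = text.substring(parsedUpTo, lastNewline).split('\n');
    parsedUpTo = lastNewline + 1;
    for (var i = 0; i < lines.length; i++) {
      if (!lines[i]) continue; // skip empty lines
      var sep = lines[i].indexOf(':');
      var index = parseInt(lines[i].substring(0, sep), 10);
      // Everything after the first colon is a base64 data URI; render it now.
      imgElements[index].src = lines[i].substring(sep + 1);
    }
  }
  xhr.onprogress = parseAvailable; // fires repeatedly as chunks stream in
  xhr.onload = parseAvailable;     // catch any trailing data at the end
  xhr.open('GET', batchUrl, true);
  xhr.send();
}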

Results

Since the scheme is relatively simple and uses plain HTTPS instead of SPDY, we were able to deploy it on all platforms, and we saw significant performance improvements: a 40% improvement in page load time on the web.

However, we don’t see this as a long-term strategy – we’re planning on adding SPDY support to all of our clients and take care of pipelining at the protocol level. This will simplify the code, give us similar performance improvements and better cacheability (see note about consistent batches above).

The Dropbox performance team is a small team of engineers focused on instrumentation, metrics and improving performance across Dropbox’s many platforms. If you obsess over making things faster and get excited when graphs point down and to the right, join us!
