Channel: Hacker News 50

MSDN Blogs

URL:http://blogs.msdn.com/b/oldnewthing/archive/2013/12/24/10484402.aspx


James Mickens has written a number of essays for ;login: magazine. The wall-of-text presentation is kind of scary, and the first time I encountered them, I skimmed the essays rather than reading them through. As a result, my reaction was, "I got tired." But if you follow the path and read the essays through, you realize that they are all brilliant.

You can't just place a LISP book on top of an x86 chip and hope the hardware learns about lambda calculus by osmosis.

and in the "so funny because it's true that it wraps around and isn't funny any more, but then wraps around a second time and is funny again, but with a tinge of sadness" category:

I HAVE NO TOOLS BECAUSE I'VE DESTROYED MY TOOLS WITH MY TOOLS.

... because in those days, you could XOR anything with anything and get something useful.

When researchers talk about mobile computers, they use visionary, exciting terms like "fast", "scalable", and "this solution will definitely work in practice."

With careful optimization, only 14 gajillion messages are necessary.

One of my colleagues found the stash of columns in the "Miscellaneous Excellence" section on Mickens' Web site and reacted with "This is better than getting free cookies."

Here's an interview with "the funniest man in Microsoft Research".

I would have done this for TechNet Magazine if I had known this was possible.

Also if I had the talent.

Mostly the talent part.

Bonus Mickensian reading: What is art?


Unprofessionalism - Allen Pike

URL:http://www.allenpike.com/2013/unprofessionalism/


Acting professionally and being a professional are different things.

One year ago, we released Party Monster, our fun little DJ app for parties and road trips. We take our work seriously, but we included something a little unprofessional: by default, the app wouldn’t play Nickelback. Here’s what reviewers had to say about it:

“You know you want to check out an app with a “Refuse to Play Nickelback” preference setting.” - John Gruber, Daring Fireball

“There’s a very clever option in the app’s settings, enabled by default, that refuses playback of Nickelback songs.” - Preshit Deorukhkar, Beautiful Pixels

“Party Monster offers the single best settings option ever: Refuse to Play Nickelback. It’s on by default. Canada, we forgive you.” - Dave Wiskus, iMore

It was a lot of fun to get the positive feedback. People appreciated our silly little easter egg! Well, some people appreciated it.

“Soooooo creative.” - @nickelbackers

“I am very unhappy with this as it is not there right to band certain artists especially when you have paid for the app and created all the playlists for your wedding. Very displeased!” - Junomei, one star app reviewer

“Congratulations on subtracting functionality for the sake of a circlejerk, you obnoxious cunt.” - eifersucht12a

Wait, what? We did something unprofessional and knew we might get some blowback, but I was surprised by that last one. I shouldn’t have been - this is what happens when you reveal the awkward truth: you’re a human being.

Pissing people off

The behaviours that make us human are not professional. Honesty, frankness, humour, emotionality, embracing the moment, speaking up for what you believe, affection, sincerity. Quoting extremely offensive trolls. These are all things that will make some people love you and others hate you. When you get more attention, these aspects of your personality fuel the inevitable backlash. As your audience grows, the chance of any given action triggering criticism asymptotically approaches 100%.

“Look on Twitter, look at the @-reply stream of a celebrity, or somebody with 100,000 followers or more. Look at Gruber’s @-replies. Look at anybody who has a large following who says anything of any value ever. You’re going to see hundreds and hundreds of people calling them an asshole and calling them an idiot.” - Marco Arment, ATP 41

It’s particularly bad if you speak or write on a broad range of topics. If you write about a very narrow topic, for example Objective-C, you won’t get as much crap. People will only listen if they’re interested in Objective-C.

If you have a broad range of passions, however, you will subject fans of some passions (say, productivity) to your other passions (say, comics). Some of your audience will be annoyed because they don’t care about comics, and won’t be equipped to have a thoughtful conversation on that topic. Similarly, we get shit when our otherwise useful music app refuses to play Nickelback. It would be a lot easier on us if we all just stuck to the script.

Staying human

“If you want to get better at what you do, and you want to make better things, you are going to have to make your peace with the fact that it’s going to be out there. People are going to think what they think of it, and you have to decide what their response to it has to do with what you decide to do next. I would hope that strangers not liking what you do is not going to stop you from doing things you want in the way that you want. Until you can make your peace with that, there’s a pretty good chance that you’re not going to make that much cool stuff. ” - Merlin Mann, Back to Work #149

People with audiences have three potential strategies for surviving the shit generated by being a human with an audience:

  • Resistance: Developing a thick skin. A better way of describing it is learning how to filter feedback in a way that helps you grow, but discards trolling and lashing out. Usually this involves only paying attention to criticism when it comes from somebody you know and trust. If a celebrity comes off like a jerk, this is often what’s happening.
  • Split Personality: Being able to turn on and off the “professional version” of yourself in “professional” settings. Many rock stars have an alter ego, perhaps under an entirely different name. This alter ego isn’t political, divisive, emotional, or anything else that might be controversial. One may get their unprofessional side out via a joke tumblr, or by maintaining a crazy alter ego that uses different punctuation.
  • Reclusion: Pulling back when things get too hot to handle. When somebody’s ego isn’t able to handle the grief they’re getting, that person can either slowly or abruptly pull back from the spotlight to save themselves. Occasionally this is for the best, but sometimes it’s a tragedy, as in the case of Kathy Sierra.

All three of these strategies are valid. All three of these are preferable to taking all available feedback, and reducing what you do to boring pablum that isn’t worth giving feedback on at all. We can take our work seriously while still being ourselves.

This year, I’ve been more myself in public, and taken more opportunities to be unprofessional. Unprofessional in the best possible sense: taking my humanity just as seriously as I take my profession. It generates a lot more feedback, positive and negative. Still, I encourage you to try it.

North Korean Officials Flood to China, Possible Mass Defection - koreaBANG

URL:http://www.koreabang.com/2013/stories/north-korean-officials-flood-to-china-possible-mass-defection.html


Sources within the South Korean government report that a large number of senior North Korean officials have fled to China in the wake of Jang Song-Taek’s execution, presenting a golden opportunity for South Korean intelligence. One official is rumored to have a list of North Korean spies living in the South, another has knowledge of North Korean provocations planned for sometime between January and March.

Online, South Koreans eagerly anticipated further evidence of a collapse in the North and any revelations about spies in the South.

Article From Segye Ilbo:

[Exclusive] 70 high-ranking North Korean Officials, including Jang’s former aides, fled to China.

A senior official related to North Korea’s nuclear arms and slush funds in secret contact with Seoul concerning possible defection

List of possible defectors includes former ambassador and high-ranking party and military officials.

Jang is reported to have transferred more than ₩70 million to Kim Jong-nam, the elder half brother of Kim Jong-un.

A group of about 70 North Koreans, high-ranking party and military officials and their families, is reported to have fled North Korea into China around the time Jang Song-Taek, former deputy chairman of North Korea’s Military Commission, was executed.

Some of the group are known to be in touch with South Korean intelligence while they lie low at a safe place in China and decide whether to defect.

A North Korean officer watches the hunt for Jang Song-Taek’s associates and makes a call to South Korea’s intelligence service, saying he may have some interesting information…

On December 18th, a source within the South Korean government said, “Jang’s aides, who are concerned about being targeted for a political purge by North Korean leader Kim Jong-un in the wake of the execution of Jang Song-thaek, as well as some other officials fearing the reign of terror in the North have escaped from Pyongyang to China en masse.” The sources added, “Intelligence authorities (of South Korea) have already identified about seventy North Koreans who fled in recent days.”

The mass exodus of North Koreans to China, including dignitaries, is highly unusual, presenting fresh challenges to China-North Korea relations as well as inter-Korean ties over the handling of the escapees.

The sources said, “the group of seventy North Koreans does not count the ordinary North Koreans who may have fled, the list includes a former ambassador who served in multiple European countries and an official who handed over confidential documents containing Pyongyang’s provocation plans.”

In a November 17th teleconference with military commanders, Defense Minister Kim Kwan-jin stated that North Korea is highly likely to make provocations sometime between late January and early March in 2014. The government source said that the minister’s comments were based on the classified documents from the high-ranking North Korean official who spoke with South Korean intelligence.

Military sources familiar with intelligence on North Korea said, “We know that among the North Korean officials is one who is well aware of how slush funds of the ruling family in North Korea have been run and another heavyweight who is bargaining with the National Intelligence Service (of South Korea) to share a list of Pyongyang-deployed spies in the South and nuclear arms-related information,” adding, “Most of the escaped North Koreans want to defect to South Korea.”

Another source said, “If one of Jang’s lieutenants wants to defect to South Korea, chances are high that he has been in charge of Jang Song-Taek’s secret funds. An official who has handled Jang’s money would not survive in North Korea, since one of the charges Jang faced was corruption.”

Some lawmakers of the ruling Saenuri Party floated the possibility that Kim Jong-nam, the elder half brother of North Korean leader Kim Jong-un, might even decide to seek asylum in the South.

According to the intelligence agency’s assessment, Jang’s swift execution was triggered by his attempt to make Kim Jong-nam the leader of North Korea rather than his nephew. Jang is known to have sent a total of $70 million to the elder Kim.

Diplomatic sources said, “The Ministry of Foreign Affairs has yet to receive any information regarding the flight and possible defection to South Korea of North Korean officials,” adding, “It appears that intelligence authorities will be taking direct control of the situation in the interest of security.”

In a hurriedly-arranged meeting of the Foreign Affairs and Unification Committee of the National Assembly, Minister of Unification Ryoo Kihl-jae said, “We need to keep a close watch on the possible defections of Jang’s aides,” adding, “I have no knowledge about the defection of Jang’s aides and a deputy prime minister-level official that has been reported in the media.”

Comments from Naver:

mola****:

We must get our hands on the list of North Korean spies here.

xhrl****

Hey Kim Jong-un, you just opened Pandora’s Box.

kbio****:

Reporter! Your article will encourage North Korea to tighten its border patrol and prevent more North Korean residents from fleeing out of there.

ingl****:

I trust NIS officials. Though you are in trouble right now, I think you guys work hard to fight against foreign spies. Please do your job for the sake of our nation. If you do well, it will be all dae-bak for us.

ilik****:

When Defense Minister Kim Kwan-jin said, “North Korea is likely to make armed provocations between January and March of next year,” the opposition Democratic Party (DP) said, “vague predictions stir up unease among people,” and made partisan attacks. The DP always does things this way.

ykro****:

North Korea has already begun its implosion. Pyongyang’s collapse is just a matter of time.

rlwn****:

The most welcome news is that a North Korean official fled with the list of North Korean spies here. Let’s find out how many spies operate in the South. No need to have a trial for them, all they need is summary execution. But they are not worth bullets, so shatter their brains with a hammer.

hjle****:

Why are the Catholic Priests’ Association for Justice (CPAJ) and religious groups remaining silent about the execution of Jang Song-Taek and the people around him? Is it because these things happened in another country? Aren’t they supposed to pray for peace for all? Or are they just politicians pretending to be religious people? Please, you need to talk about it. [...] [Note: In a November 22 mass, CPAJ called for President Park's resignation over the spy agency's alleged meddling in the 2012 presidential election and said that North Korea's November 23, 2010 shelling of Yeonpyeong island near the western maritime border can be described as inevitable because the maritime border has had problems before, sparking strong protest against the priests.]

prel****:

Kim Jong-un’s ascent to power brings us one step closer to the unification of the two Koreas. The day when taxes for defense will be lifted and taxes for unification are collected seems close now. I think it will not be long before the Republic of Korea becomes resurgent, a unified Korea able to take on the Number 1 country in Asia. With Japan on the verge of self-destruction, a reunified Korea will beat Japan easily. China has as much trash as it has talent, so it will be long before China joins the ranks of advanced nations. [...]

Comments from Daum:

비니파더님:

My feeling is that it will not be long before North Korea’s collapse. We need to achieve unification quickly and get rid of that bastard Kim Jong-un. It will be a pity for the Saenuri Party because they will no longer be able to play around with McCarthyism. Quickly bring the North Korean officials here, catch the spies and execute them all. And keep an eye out for spies who try to run away.

야생국화:

For our part, we need to drive the North Korean sympathizers here out to North Korea.

푸른:

The list of spies must include Lee Seok-ki and Lee Jeong-hee.

dlathdxo:

North Korean spies here will get fucked soon. They will get identified after we get that list from the North Korean official. I will personally fire 100 bullets at the heads of the Unified Progressive Party (UPP) bitches when they are caught fleeing to North Korea.

홍어헌터:

I think cutting the source of money will make Kim Jong-nam defect to South Korea soon.

GNUnet 0.10.0 | GNUnet

URL:https://gnunet.org/gnunet0-10-0


We are pleased to announce the release of GNUnet 0.10.0. This release represents a major overhaul of the cryptographic primitives used by the system. GNUnet used RSA 2048 since its inception in 2001, but as of GNUnet 0.10.0, we are "powered by Curve25519". Naturally, changing cryptographic primitives like this breaks backwards compatibility entirely. We have used this opportunity to implement protocol improvements all over the system. In terms of usability, users should be aware that (1) compiling GNUnet requires recent versions of libraries that were only released in December 2013 and are thus unlikely to be available in common distributions, (2) the nascent network is tiny and thus unlikely to provide good anonymity or extensive amounts of interesting information, and (3) that we had limited time to test the new code, especially in a real-world deployment. As a result, this release is only suitable for early adopters with some reasonable pain tolerance.

About GNUnet

GNUnet is a framework for secure peer-to-peer networking. GNUnet's primary design goals are to protect the privacy of its users and to guard itself against attacks or abuse. At this point, GNUnet offers four primary applications on top of the framework:

The file-sharing service allows anonymous censorship-resistant file-sharing. Files, searches and search results are encrypted to make it hard to control, track or censor users. GNUnet's anonymity protocol (gap) is designed to make it difficult to link users to their file-sharing activities. Users can also individually trade off between performance and anonymity. Despite providing anonymity, GNUnet's excess-based economy rewards contributing users with better performance.

The VPN service allows offering of services within GNUnet (using the .gnu TLD) and can be used to tunnel IPv4 and IPv6 traffic over the P2P network. The VPN can also be used for IP protocol translation (6-to-4, 4-to-6) and it is possible to tunnel IP traffic over GNUnet (6-over-4, 4-over-6). Note that at this stage, it is possible for peers to determine the IP address at which services are hosted, so the VPN does not offer anonymity.

The GNU Name System (GNS) provides a fully-decentralized and censorship-resistant replacement for DNS. GNS can be used alongside DNS and can be integrated with legacy applications (such as traditional browsers) with moderate effort. GNS provides censorship-resistance, memorable names and cryptographic integrity protection for the records. Note that at this stage, it is possible for a strong adversary to determine which peer is responsible for a particular zone, so GNS does not offer strong anonymity. However, GNS offers query privacy, that is, other participants can typically not decrypt queries or replies.

GNUnet Conversation allows voice calls to be made over GNUnet. Users are identified using GNS and voice data is encrypted. However, GNUnet Conversation does not provide anonymity at this stage --- other peers may observe a connection between the two endpoints and it is possible to determine the IP address associated with a phone.

Other applications are still under development.

Key features of GNUnet include:

  • Works on GNU/Linux, FreeBSD, OS X and W32
  • P2P communication over TCP, UDP, HTTP (IPv4 or IPv6), HTTPS, WLAN or Bluetooth
  • Communication can be restricted to friends (F2F mode)
  • Includes a general-purpose, secure distributed hash table
  • NAT traversal using UPnP, ICMP or manual hole-punching (possibly in combination with DynDNS)
  • Small memory footprint (specifics depend on the configuration)

For developers, GNUnet offers:

  • Access to all subsystems via clean C APIs
  • Mostly written in C, but extensions possible in other languages
  • Multi-process architecture for fault-isolation between components
  • Use of event loop and processes instead of threads for ease of development
  • Extensive logging and statistics facilities
  • Integrated testing library for automatic deployment of large-scale experiments with tens of thousands of peers

Noteworthy improvements in 0.10.0

  • Improved documentation, including an extensive developer handbook and a new post-installation tutorial with first-steps for users
  • New application: GNUnet Conversation
  • New combined multi-process GUI gnunet-gtk
  • New tool to create GNS Business Cards gnunet-bcd
  • New tool to import GNS QR codes gnunet-qr
  • Use of EdDSA and ECDHE instead of RSA for peers' public key cryptography
  • CORE connections now use perfect forward secrecy with 12h rotation intervals
  • Use of ECDSA for GNU Name System and identity management
  • Unified identity management for GNS and File-sharing
  • KSK and SKS queries in file-sharing are now indistinguishable
  • Peers in F2F mode can use "do not gossip" flag to hide their existence from non-friends entirely
  • End-to-end encrypted mesh tunnels
  • Flow- and congestion-control for mesh tunnels
  • Improved key revocation scheme for the GNU Name System
  • Improved query privacy for the GNU Name System
  • Improved name shortening for the GNU Name System
  • Improved handling of shadow records for the GNU Name System

The above is just the short list; our bug tracker lists over 350 individual issues that were resolved. It also contains a list of known open issues that might be useful to consult.

Known Issues

We have a few issues that were reported by developers in the last week that were most likely not resolved in the final release. Users should be aware of these issues, which we hope to address shortly.

  • NAT traversal does not work as well as it should (feature); explicit hole punching and specification of the external IP in the configuration are advised
  • Timestamps in log files do not respect winter time (#3236)
  • When the HTTP(S) transport plugins are enabled, peers sometimes fail to connect at all (#3238)
  • Rarely, the TCP transport plugin may cause a crash (#3232)
  • Bandwidth allocation among the neighbors of a peer seems to be sometimes rather unfair (#3237)
  • Crashes in gnunet-fs-gtk (#3240) and the MESH service (#3239) were reported but could not yet be reproduced

In addition to this list, you may also want to consult our bug tracker at
https://gnunet.org/bugs/.

Availability

The GNUnet 0.10.0 source code is available from all GNU FTP mirrors. The GTK frontends (which include the gnunet-setup tool) are a separate download.
Please note that the mirrors might still be synchronizing.

  • All known releases: https://gnunet.org/current-downloads
  • GNUnet on an FTP mirror near you: http://ftpmirror.gnu.org/gnunet/gnunet-0.10.0.tar.gz
  • GNUnet GTK on an FTP mirror near you: http://ftpmirror.gnu.org/gnunet/gnunet-gtk-0.10.0.tar.gz
  • GNUnet FUSE on an FTP mirror near you: http://ftpmirror.gnu.org/gnunet/gnunet-fuse-0.10.0.tar.gz
  • GNUnet on the primary GNU FTP server: ftp://ftp.gnu.org/pub/gnu/gnunet/gnunet-0.10.0.tar.gz
  • GNUnet GTK on the primary GNU FTP server: ftp://ftp.gnu.org/pub/gnu/gnunet/gnunet-gtk-0.10.0.tar.gz
  • GNUnet FUSE on the primary GNU FTP server: ftp://ftp.gnu.org/pub/gnu/gnunet/gnunet-fuse-0.10.0.tar.gz

Note that GNUnet is now started using "gnunet-arm -s". GNUnet should be stopped using "gnunet-arm -e".

Thanks

This release was the work of many people. The following people contributed code and were thus easily identified: Alejandra Morales, Andreas Fuchs, Bart Polot, Bruno Cabral, Christian Fuchs, Christian Grothoff, Claudiu Olteanu, David Barksdale, Fabian Oehlmann, Florian Dold, Gabor X Toth, LRN, Martin Schanzenbach, Matthias Wachs, Maximilian Szengel, Nils Durner, Simon Dieterle, Sree Harsha Totakura, Stephan A. Posselt, and Werner Koch. Additionally, we thank Sébastien Moratinos, Diana del Burgo, and gillux for their work on the website.

Further Information

  • GNUnet Homepage: https://gnunet.org/
  • GNUnet Installation Handbook: https://gnunet.org/installation-handbook
  • GNUnet Forum: https://gnunet.org/forum
  • GNUnet Bug tracker: https://gnunet.org/bugs/
  • IRC: irc://irc.freenode.net/#gnunet

Article 42

URL:http://blog.existentialize.com/so-you-want-to-crypto.html


I've been following the Telegram story over the past week.

I couldn't get past how the team at Telegram made such odd decisions. Presumably they are a group of smart people who want to help people communicate. So how did they manage to piss off the entire crypto community?

They did it by disregarding best practices and mountains of advice. They did it by not consulting professional cryptographers. They did it by assuming they were smart enough to figure it out as they went along.

Don't make the same mistakes they did.

So, you want to implement some sort of cryptography in your software or hardware project. Great. If you fuck this up people aren't going to be just mad like they might be with other bugs. They might be in prison or they might have been assassinated.

Cryptography in practice is engineering, not hacking, and it comes with serious responsibility. So get it right. Ask for help. Do not let users use your product until it's been vetted. Don't listen to idiots who tell you otherwise.

Learn the theoretical background

So, if you don't know much about cryptography, you should probably take a course. The Cryptography I course at Coursera is a good start, as is your local university's cryptography course.

Both Applied Cryptography and the Handbook of Applied Cryptography are great resources, although they're a little dated now.

Matthew Green also has a great list of cryptography resources.

Learn how to implement it

More important than knowing how to use the Chinese remainder theorem is knowing how to use cryptography in practice.

Step one is to read Cryptography Engineering. This is not optional. Read it. It is a fantastic book that details how to use cryptographic primitives. You'll be able to say to your crypto-ignorant friends: Yes, you encrypt your message with AES, but you used ECB! Or, you used Encrypt-and-MAC instead of Encrypt-then-MAC, you dummy!
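
To make the Encrypt-then-MAC point concrete, here is a minimal sketch of my own (not from the book), assuming the third-party pyca/cryptography package is available: encrypt with AES in CTR mode first, then authenticate the nonce and ciphertext with HMAC-SHA256 under a separate key, and verify the tag before decrypting. The key handling and function names are illustrative only.

import os
import hmac
import hashlib

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_then_mac(enc_key, mac_key, plaintext):
    # enc_key must be 16/24/32 bytes; use a different key for the MAC.
    # Encrypt first: AES in CTR mode with a fresh 16-byte nonce.
    nonce = os.urandom(16)
    encryptor = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).encryptor()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    # Then MAC the nonce and ciphertext (not the plaintext).
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag

def verify_then_decrypt(enc_key, mac_key, nonce, ciphertext, tag):
    # Verify the MAC before doing anything with the ciphertext.
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("MAC check failed")
    decryptor = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()

Encrypt-and-MAC, by contrast, computes the MAC over the plaintext, which can leak information about it and tempts you to decrypt before you have authenticated anything.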

Step two is to study crypto in practice. OTR, the standard for secure messaging, is a very well studied implementation of several key crypto features: key exchange, the socialist millionaire's protocol, perfect forward secrecy and more. Read this presentation and maybe even the protocol spec, if for nothing else than to see what a well-written cryptographic protocol looks like.

Other good pieces of software to study are Tarsnap, the backup utility, and its crypto choices, and Whisper Systems' TextSecure app. Their blog posts are typically excellent.

Keep up-to-date

Cryptography is something you'll need to keep learning for the rest of your career.

The way I do it is via blogs. Here are some to get you started:

There are also a few good mailing lists.

The most important thing to do is to follow crypto best practices. Since you're not a professional cryptographer, you aren't really aware of the security trade-offs of, say, AES-IGE versus AES-CTR with a SHA256-HMAC.

So what are the best practices?

Steal first

Most applications of cryptography are not secure messaging or anonymity networks. Instead, they're "authenticate this REST API" or "encrypt this gossip protocol".

If your application fits in this area, try to steal someone else's design first.

So if you need a way to authenticate your REST API, don't roll your own. Adapt the AWS authentication scheme to your own purposes. This is how you can turn Amazon's years of experience getting it wrong to your own benefit.
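
To give a flavor of what an adapted AWS-style scheme can look like, here is a hedged sketch (not Amazon's actual Signature specification; the header names and canonical form are made up for the example): the client builds a canonical string from the request and signs it with a shared secret using HMAC-SHA256, and the server recomputes the same string and compares signatures.

import hmac
import hashlib
import time

def sign_request(secret_key, method, path, body):
    # secret_key is bytes shared out-of-band; it is never sent on the wire.
    timestamp = str(int(time.time()))
    body_hash = hashlib.sha256(body).hexdigest()
    # Canonical string: anything not covered here can be tampered with,
    # so real schemes also fold in headers, query strings, and so on.
    canonical = "\n".join([method, path, timestamp, body_hash])
    signature = hmac.new(secret_key, canonical.encode(), hashlib.sha256).hexdigest()
    return {"X-Timestamp": timestamp, "X-Signature": signature}

# Example: headers = sign_request(b"shared-secret", "POST", "/update", b'{"x": 1}')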

If you go this route, you will still need to vet your design.

Design and vetting

So, if your application is fairly unique, and you can't just borrow someone else's design what do you do?

Design your own.

Cryptography isn't something you can iterate on until you get it right, because you'll never know if you do. It's best if you design your protocol up front (before you write any code) and then ask people who are in the know what they think.

If you're a company, hire a professional cryptographer. She can audit your design, or (better yet) design one for you. This isn't something you can afford to get wrong.

If you're just an individual, try emailing your design to the Liberation Tech mailing list. I did this for a project and was told (rightly so) that my design wasn't good enough. Ask for feedback early and often.

You'll need to pick cryptographic primitives too.

Colin Percival has a great blog post, "Cryptographic Right Answers" which details what ciphers/modes/hashes to use. Even though it was written in 2009, it's still valid today.

Do not pick wacky modes or unknown ciphers. There is little reason to be creative when you can be correct. Choosing AES-IGE is suspicious and there's no reason to pick that when you can instead use CFB or CTR.

Explanations are paramount

You know the old saying: “Every ‘why’ has a ‘wherefore.’” - Dromio of Syracuse, The Comedy of Errors, by William Shakespeare

Cryptographers have "nothing up my sleeve" numbers. These numbers are typically values that are static in the cipher, and could be anything. However, what if you chose your numbers such that they dramatically reduced the effort required by an attacker?

Sound implausible? It's happened.

So, to show that they have nothing to hide, they use famous numbers such as π or e.
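
A concrete illustration of the idea (my example, not the author's): SHA-256's initial hash words are defined as the first 32 bits of the fractional parts of the square roots of the first eight primes, so anyone can recompute them and confirm that nothing was hidden in the choice.

import math

# First eight primes; SHA-256 derives its eight initial hash words from them.
primes = [2, 3, 5, 7, 11, 13, 17, 19]
iv = [int((math.sqrt(p) % 1) * 2**32) for p in primes]

print([hex(word) for word in iv])
# The first word comes out as 0x6a09e667, exactly as published in the SHA-256 spec.
assert iv[0] == 0x6A09E667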

In your design, you similarly need to show that you chose the right cipher modes, the right IV generation tactic and so on. You'll need to explain why you chose what you did.

If you've chosen standard algorithms and implementations recommended by cryptographers, your job is easy. Similarly, if you've copied a design from another service, your job is easier.

Implementation

Often, attacks are easier against the implementation of the cryptography than against the cryptography itself. These are known as side channel attacks.

Timing attacks are attacks you should especially watch out for.

Say you've signed your REST API request with an HMAC and want to compare it against the value you computed on your server.

Easy right?

@post('/update/<apikey>/<hash_value>')
def update(apikey, hash_value):
    server_hash_value = compute_hash(apikey, request)
    if server_hash_value != hash_value:
        raise HTTPError()
    else:
        pass  # Process

This is wrong.

Because strings are typically compared one byte at a time (not always true, I know), the comparison stops at the first difference. Attackers can then work out how many bytes matched by timing the comparison.

Instead, you need to use a timing-attack-resistant comparison function.
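
In modern Python, the standard library provides one: hmac.compare_digest (also exposed as secrets.compare_digest), whose running time does not depend on where the inputs first differ. A corrected sketch of the handler above might look like this; the original snippet appears to use Bottle, and compute_hash here is a hypothetical stand-in for whatever the API actually signs.

import hmac
import hashlib

from bottle import post, request, HTTPError

API_SECRET = b'change-me'  # shared secret, placeholder value

def compute_hash(apikey, req):
    # Hypothetical: HMAC the API key and raw request body with the shared secret.
    body = req.body.read() or b''
    return hmac.new(API_SECRET, apikey.encode() + body, hashlib.sha256).hexdigest()

@post('/update/<apikey>/<hash_value>')
def update(apikey, hash_value):
    server_hash_value = compute_hash(apikey, request)
    # Constant-time comparison: no early exit, so timing reveals nothing
    # about how many leading characters of the signature were correct.
    if not hmac.compare_digest(server_hash_value, hash_value):
        raise HTTPError(403, 'bad signature')
    return 'ok'  # process the request here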

Cryptography is littered with seemingly minor implementation gotchas. It's a long road to writing great cryptography software. The best thing you can do is embrace the crypto community and ask for help.

People who aren't aware get this wrong.

After you've designed your software and implemented it, you need to open source your code and have people review it. Even if you're a company selling software. If it's not open source, it's not safe.

You'll also need to encourage people to look at your software. Everyone is busy and not everyone cares about your project or company.

Offer a bug bounty program. Not a contest. Actually pay people who find problems, even minor problems.

And don't make your cryptography project sound like snake oil. Saying "military grade encryption" or "N bits of security" makes you sound like you don't know what you're talking about.

You can't just learn cryptography like you learn CSS or Erlang or MySQL. You need to study it first and then implement it. Otherwise you're dangerous.

But don't let that stop you from learning cryptography. Some people will say you shouldn't be doing crypto at all unless you're a cryptographer. That's nonsense.

More people should learn cryptography. But you should realize that if you mess up, you're no longer just shooting yourself in the foot. You'll be hurting other people.

And you'll be responsible.

Philip Guo - Unicorn Jobs

URL:http://www.pgbovine.net/unicorn-jobs.htm


December 2013

Summary

Many creative technically-minded people want a dream “unicorn job” where they can get paid to spend most of their time hacking on their own projects. Bad news: Unicorn jobs don't exist, at least not for long. Good news: There are ways to approximate them if you're willing to work hard enough. This article provides some ideas on how to find near-unicorn jobs.

In the past year, I've talked with dozens of creative technical people – mostly computer scientists – who are looking for a dream “unicorn job” where they can get paid to spend the majority of their work days on their own personal projects. I've also spent quite a bit of time looking for my own unicorn job as I was finishing up my Ph.D.

My conclusion thus far is that these mythical jobs simply don't exist ... at least not for long. This article provides some ideas on how to approximate such an ideal work setup, though.

Personal Unicorns

So far I've managed to hold down two unicorn jobs for about three months each:

  • During Summer 2011, I interned at Google to work on CDE, my own open-source project with about 10,000 users. I got this lucky gig by giving a Google Tech Talk on CDE earlier that year. Serendipitously a technical manager attended my talk and loved the project enough to offer me an internship on the spot. While all the other interns were grinding away on their boss's projects, I had full freedom to hack on CDE all summer.

  • From July to October 2012, I was a full-time employee at Google spending most of my time working on Online Python Tutor, another open-source project of mine, which had around 100,000 users at the time. Again, I got this lucky gig because a very influential person at Google vouched for me and offered to sponsor this project for the near term.

Since the first gig was an internship, I knew it was a limited time offer. And I didn't know how long the second one would last, so I rode it out as long as I could. After a few months, though, the political tides within my division at Google shifted, and I suddenly lost the freedom to work primarily on my own project. My unicorn job morphed into a regular software engineering job (albeit one with great pay and benefits).

When I realized that my unicorn job was gone, I still remained at Google for a few more months but started looking for a new job right away. I concluded that the closest approximation for me was to get a job as a tenure-track assistant professor at a research-focused university, so that's the path I'm on right now. It's not a unicorn job – but it's the closest that I can get at this point, so I'm very grateful for this opportunity.

Unicorns Are Temporary

I've heard similar stories of influential people at companies sponsoring unicorn jobs for lucky unknowns (like me) for a few months at a time. But ultimately this model is not sustainable. Sooner or later, the political tides will inevitably shift, and the unicorn job will disappear as quickly as it was created. After all, companies aren't in the business of paying unknowns to work on their own projects.

Two years was the longest that anyone I know has ever kept a unicorn job. The co-founder and CTO of a prominent startup really valued the expertise that my friend built up during his Ph.D., and also liked him as a friend, so he hired him to unicorn it nearly full time. Again, his gig was great for a while, but as the company grew and political tides shifted against the CTO, it became harder and harder for him to protect my friend from the forces of capitalism: His unicorn project wasn't benefiting the company financially, so it was constantly in danger of getting killed. It was only a matter of time.

Even famous programmers cannot sustain unicorn jobs forever. Guido van Rossum, the creator of the Python language, which is used by bazillions of people, worked for Google for seven years. He was allowed to spend 50% of his time working on Python, his unicorn. Other famous programming language and operating systems creators usually have similar agreements with their employers, where they split time evenly between their own project and the company's projects. So if world-famous luminaries can't even spend most of their days working on unicorn projects, then what makes you think that you can?

Finding Your Own Temporary Unicorn

Maybe I'm turning into a crabby old man, but whenever someone now asks me for advice about how to find a unicorn job, I first ask them, Okay, are you really good at anything? I mean, really, really good, like world-class good. If you've got a moderately known software project like mine, then maybe you can sneak into a unicorn job for three to six months. If you're a leading expert in your field like my friend with his Ph.D. work, maybe you can hide out at a company (or at a university research lab as a postdoc or visiting research fellow) unicorning for two years. And if you're a world-renowned, award-winning programmer like Guido, then maybe you can sustain a long-term job with 50% unicorn time.

On the flip side, if you're not really good at anything, then you have no shot at a unicorn job, even a temporary one. So start getting good at something, preferably something useful to others. How do you do that? Contribute to open source software, get a Ph.D. in an applied field with compelling real-world applications to become an expert on a given topic, or work in a bunch of hard jobs to build up your technical expertise, street cred, and professional connections. Merely graduating from college – even a top-ranked one – isn't nearly enough; there are tens of thousands of fresh-faced 21-year-old grads just like you every year coming out of the world's top universities.

My second question is: Are you good with people? Ultimately somebody influential needs to “sponsor” your project and convince their organization to let you work on it. And for that to happen, you need to be good at knowing what other people want and love, to lead from below. You need to convince others to pay you to work on your dream; that's no easy feat. If you're a reclusive asshole whom nobody likes, then nobody is going to sponsor you to unicorn it all day. (The only exception is if you're so incredibly good at what you do that you become world famous even despite your lack of social graces.)

My third question is: Are you willing to move? The only unicorn jobs that exist are temporary and maybe not where you live, so you need to be willing to pick up at a moment's notice and move on to the next gig. It's a nomadic lifestyle, and finding your next unicorn gets more and more difficult as you grow older and desire more stability. One way to approximate a long-term unicorn job is to continuously hop from one short-term unicorn job to another, but if each gig lasts for less than a year, then that's a lot of job hopping, which can get exhausting.

In sum, if you want to unicorn it, you stand the best chance by getting short-term gigs that cater to your specialized skill set. Doing so requires deep expertise in something, not just a vanilla college degree; a Ph.D. is one well-recognized way to demonstrate expertise, but so is becoming a famous technical blogger or open-source software contributor. Also, unicorn jobs usually materialize through personal referrals, so the better you are with people, the more likely you'll find one. And finally, remember that even if your unicorn doesn't last, it will still be a memorable and worthwhile experience, and might help you catch future unicorns. Good luck!

Conclusion: Nobody Truly Unicorns

The creative people I know who love their jobs most are successful professors, entrepreneurs, senior engineers, and corporate business leaders who have made it to the top of their respective professional ladders. Even though they have sustained a passion for their work over many years, they only spend a tiny fraction of their time doing what they love – maybe 10% to 30% at most. In other words, they don't have unicorn jobs. However, they put up with the other 70% to 90% of unglamorous grind because that 10% to 30% is totally worth it to them. (Even world-famous programming language inventors get only 50% unicorn time.)

Thus, it seems like the only way to sustain a near-unicorn career long-term – over a few decades – is to acknowledge that you will only get to do what you love a small fraction of the time. Creative people must fight hard to carve out the freedom to pursue their passions at work, and keep fighting on different fronts as political tides shift. That means doing lots of tasks that they don't ordinarily want to do, and frequently stepping out of their comfort zone. Freedom isn't free.

One trite suggestion is, Why don't you just start your own company? From talking with friends who have done so, I can confidently say that entrepreneurship is not a unicorn job. You spend the majority of your work days on logistics, errands, coordination, and other overhead that's not at all related to furthering your core dream – but those steps are ultimately necessary for launching your product and succeeding in the marketplace. It's a great gig for some people, but definitely not a unicorn job.

The only people who can potentially have a true unicorn job are those who are already wealthy enough that they don't need to work for a living. In other words, they don't need a job. Succeeding as an entrepreneur is one common way to get there. And if you're in that category, then you obviously have better things to do than reading this article.

Created: 2013-12-11
Last modified: 2013-12-11


Jude : How to Become a Better Developer

URL:https://coderwall.com/p/hd49fq


It’s important that you read through the entire documentation for a technology before asking questions in IRC, maybe even multiple times.

I strongly disagree with this. Especially when a certain piece of technology is advertised, hyped, and implied to be able to do X, it's in the best interest of the author(s) of said tech to show why and how it achieves that.

It's nice if I get ten pages about how the inner file descriptor and buffer allocation happens, and how revolutionary X's network approach is, but without at least a bunch of different examples that show why the good old APIs, for example BSD sockets or the POSIX file system calls, don't cut it, or how much easier it is to do what they do with X, I get really sad. And, for better or worse, we already have enough alternative implementations, proposals, protocols, stacks, gems and packages, and not necessarily the best technological solution (the one that best caters to the most use cases and user problems, i.e. needs).

So, I think, it should be a desirable goal for engineers to clearly communicate their achievements and results, and tend to their creations.

Indian Scientists developed Insulin Pill for diabetics

URL:http://www.jagranjosh.com/current-affairs/indian-scientists-developed-insulin-pill-for-diabetics-1387625296-1


Indian scientists at the National Institute of Pharmaceutical Education and Research (NIPER) developed an insulin pill for diabetics in the third week of December 2013.


The scientists developed a long-sought insulin pill that could spare millions of diabetics injections by shifting the delivery of insulin therapy from a jab to a pill.

In experiments with rats, the pill lowered blood glucose levels almost as much as injected insulin, and its effects lasted longer than those of injected insulin.

The body’s digestive enzymes, which are so good at breaking down food, also break down insulin before it can get to work. In addition, insulin is not easily absorbed through the gut into the bloodstream.

To solve these problems, researchers from the National Institute of Pharmaceutical Education and Research (NIPER) in Punjab combined two approaches: shielding insulin from the digestive enzymes and then getting it into the blood.

The team of researchers, Ashish Kumar Agrawal, Harshad Harde, Kaushik Thanki and Sanyog Jain, packaged insulin in tiny sacs made of lipids (fats) called liposomes, then wrapped the liposomes in layers of protective molecules called polyelectrolytes.
To help the resulting layersomes get absorbed and transported across the intestinal wall into the bloodstream, they attached folic acid, a kind of vitamin B.

This was published in Biomacromolecules, a journal of the American Chemical Society, Washington.

About diabetes:
Diabetes, also called diabetes mellitus, describes a group of metabolic diseases in which a person has high blood glucose (blood sugar), either because insulin production is inadequate, because the body's cells do not respond properly to insulin, or both.

There are three types of diabetes: Type 1 diabetes, Type 2 diabetes and gestational diabetes.

An estimated 347 million people globally are living with diabetes. Diabetics must test their blood sugar several times and need insulin jabs for the rest of their lives in order to maintain adequate levels of the hormone.



Why [Programming Language X] Is Unambiguously Better than [Programming Language Y] | Joel Grus

URL:http://joelgrus.com/2013/12/24/why-programming-language-x-is-unambiguously-better-than-programming-language-y/


Recently I have seen a lot of people wondering about the difference between [X] and [Y]. After all, they point out, both are [paradigm] languages that target [platform] and encourage the [style] style of programming while leaving you enough flexibility to [write shitty code].

Having written [simple program that's often asked about in phone screens] in both languages, I think I’m pretty qualified to weigh in. I like to think about it in the following way: imagine [toy problem that you might give to a 5th grader who is just learning to program]. A [Y] implementation of it might look like this:

[really poorly engineered Y code]

Whereas in [X] you could accomplish the same thing with just

[slickly written X code that shows off syntactic sugar]

It’s pretty clear that the second is easier to understand and less error-prone.

Now consider type systems. [Religious assertion about the relative merits and demerits of static and dynamic typing.] Sure, [Y] gives you [the benefit of Y's type system or lack thereof] but is this worth [the detriment of Y's type system or lack thereof]? Obviously not!

Additionally, consider build tools. While [Y] uses [tool that I have never bothered to understand], [X] uses the far superior [tool that I marginally understand]. That’s reason enough to switch!

Finally, think about the development process. [X] has the amazing [X-specific IDE that's still in pre-alpha], and it also integrates well with [text-editor that's like 50 years old and whose key-bindings are based on Klingon] and [IDE that everyone uses but that everyone hates]. Sure, you can use [Y] with some of these, but it’s a much more laborious and painful process.

In conclusion, while there is room for polyglotism on the [platform] platform, we would all be well served if you [Y] developers would either crawl into a hole somewhere or else switch to [X] and compete with us for the handful of [X] jobs. Wait, never mind, [Y] is awesome!

(Hacker News link)

Rap Genius Founders – Open Letter to Google About Rap Genius SEO | News Genius

URL:http://news.rapgenius.com/Rap-genius-founders-open-letter-to-google-about-rap-genius-seo-lyrics


On Rap Genius, users and the artists themselves explore lyrics interactively via line-by-line annotations that they can read, create, and edit. By contrast, other popular lyrics sites are ad-strewn and reminiscent of a spammier era of the internet. For example, compare Rap Genius’ annotated edition of Justin Bieber's new hit single "Heartbreaker" – which dozens of Bieber fans have annotated with details of his break up with Selena Gomez – to AZLyrics’ version of the same.

That’s our main SEO strategy: to create an amazing experience for users and hope they prefer us to all other lyrics sites and link to us. We believe that any unbiased user would prefer the Rap Genius version over the alternatives – and that this advantage in quality is responsible for the majority of our search traffic.

The other strategy we employ on a much smaller scale (the subject of recent Hacker News controversy) is to find blogs whose content we think our followers will enjoy and ask them to link to pages on Rap Genius that are relevant to their posts. We actually thought we had set this up to be compliant with Google’s linking policy in its Terms of Service, but we messed up and want to explain how. Here's a look at this strategy in relation to Google's guidelines:

1) Buying or selling links that pass PageRank. This includes exchanging money for links, or posts that contain links; exchanging goods or services for links; or sending someone a “free” product in exchange for them writing about it and including a link.

We’ve never bought or sold links.

We do provide exposure on our Twitter and Facebook feed (not rapgenius.com) for fans that link to us, if and only if they send us good content. We don’t tweet out weak content because our followers won’t like it, and we don’t want links to Rap Genius placed on irrelevant or poorly written blogs.

Although we extend an open invitation to all bloggers to reach out to us – and we respond to all who do – our policy is to only promote the ones who send us good and relevant content.

But the terms also state: "Additionally, creating links that weren’t editorially placed or vouched for by the site’s owner on a page, otherwise known as unnatural links, can be considered a violation of our guidelines."

This is where we messed up. Though any links to our tracks that our fans put on their pages were editorially placed or vouched for by them, in some instances we have fallen short in terms of making sure that the links people post are natural.

Here’s an example of good content: a post on Beyonce’s new album, followed by a useful list of links to the corresponding tracks.

Here’s an example of what shouldn’t have happened: a post with the best verses of 2013 followed by a list of Bieber links.

Posts like the former are what we intended; posts like the latter could indeed fall under the “unnatural links” policy, and we’ll discourage things like this in the future. We are also getting in touch with the relevant site owners individually to request that they remove any such links. Just to be clear, this is not a widespread practice, and it should not be too difficult to stamp out.

2) Excessive link exchanges ("Link to me and I'll link to you") or partner pages exclusively for the sake of cross-linking

We don’t do this.

In more detail: for that subset of fans whose posts we tweet, we are (a) not linking back to them from rapgenius.com and (b) all links on Twitter are rel=nofollow anyway. That is, links shared on Twitter may give temporary traffic to fan sites, but not long-term link juice.

All links in annotations and other user-generated content on rapgenius.com are marked rel=nofollow as well, specifically to avoid the possibility of any link-juice value exchange.

3) Large-scale article marketing or guest posting campaigns with keyword-rich anchor text links

We don’t do this.

The only thing that might be relevant: we write a weekly "The top 5 lines of the week" guest feature on thegrio.com (example) which contains links to the 5 hottest lines of the week (natural) and links to hot new songs (less natural).

We also do occasional collaborations around new album releases with sites like Vibe and Esquire.

4) Using automated programs or services to create links to your site

We don’t do this.

As noted above, we’re going to be requesting that site owners take down links to Rap Genius that don’t fit well with their editorial content. Going forward, we do believe a track list widget that bloggers can embed has value, but we’ll develop one in JS rather than HTML.

Rap Genius is the product of many passionate communities collaborating to create a massive, living knowledge base for the world to enjoy. We do not want to break Google’s rules, and will do whatever it takes to learn them inside out and comply with them. Thank you very much!

Tom, Ilan, and Mahbod

PS:

With limited tools (Open Site Explorer), we found some suspicious backlinks to some of our competitors (CLICK FOR DETAILS):

AZLyrics.com

Metrolyrics.com

Lyricsfreak.com

Lyricsmode.com

Lyrics007.com

Songlyrics.com

Who’s Selling Credit Cards from Target? — Krebs on Security

URL:http://krebsonsecurity.com/2013/12/whos-selling-credit-cards-from-target/


The previous two posts on this blog have featured stories about banks buying back credit and debit card accounts stolen in the Target hack and that ended up for sale on rescator[dot]la, a popular underground store. Today’s post looks a bit closer at open-source information on a possible real-life identity for the proprietor of that online fraud shop.

Rescator[dot]la is run by a miscreant who uses the nickname Rescator, and who is a top member of the Russian and English language crime forum Lampeduza[dot]la. He operates multiple online stores that sell stolen card data, including rescator[dot]la, kaddafi[dot]hk, octavian[dot]su and cheapdumps[dot]org. Rescator also maintains a presence on several other carding forums, most notably cpro[dot]su and vor[dot]cc.

A private message on cpro[dot]su between Rescator and a member interested in his card shop. Notice the ad for Rescator’s email flood service at the bottom; this will become important as you read on.

In an Aug. 2011 thread that has since been deleted, Rescator introduced himself to the existing members of vor[dot]cc, a fairly exclusive Russian carding forum. When new members join a carding community, it is customary for them to explain their expertise and list previous nicknames and forums on which they have established reputations.

Rescator, a.k.a. “Hel” a.k.a. “Helkern” the onetime administrator of the Darklife forum, introduces himself to vor[dot]cc crime forum members.

In this particular thread, pictured in the screenshot above, we can see Rescator listing his bona fides and telling others he was “Hel,” one of three founders of darklife[dot]ws, a now-defunct hacker forum.

Rescator says his former nickname was “Hel,” short for Helkern, the administrator of Darklife.

The only darklife member who matched that nickname was “Helkern,” one of darklife’s three founders. Darklife administrators were all young men who fancied themselves skilled hackers, and at one point the group hacked into the venerable and closely-guarded Russian hacking forum cih[dot]ms after guessing the password of an administrator there.

Darklife admin “Helkern” brags to other members about hacking into cih[dot]ms, a more elite Russian hacking forum.

In a counterattack documented in the entertaining thread that is still posted as a trophy of sorts at cih[dot]ms/old/epicfail, hackers from cih[dot]ms hacked into the Darklife forum, and posted personal photos of Helkern and fellow Darklife leaders, including these two of Helkern:

And a self-portrait of Helkern:

So if Helkern is Rescator, who is Helkern? If we check some of the other Russian forums that Helkern was active in at the time that Darklife was online in 2008, we can see he was a fairly frequent contributor to the now-defunct Grabberz[dot]com; in this cached post, Helkern can be seen pasting an exploit he developed for a remote SQL injection vulnerability. In it, he claims ownership of the ICQ instant messenger address 261333.

In this introductions page from a Russian language gaming forum, a user named Helkern also was active in 2008 and claimed that same ICQ address. Helkern said his email address was root@helkern.net.ua, his Skype address was helkern_skype, and that he lived in Odessa, the third-largest city in Ukraine. Helkern — going by his shortened username “Hel,” also was a VIP member of xaker[dot]name. In this cached post we can see him again claiming the 261333 ICQ address, and pointing out to other members that his real nickname is Helkern.

Andrew from Odessa’s LiveJournal profile pic from the account “ikaikki”

A historic WHOIS lookup ordered from domaintools.com shows that helkern.net.ua was first registered in 2008 to an Andrey Hodirevski from Illichivsk, a city in the Odessa province of southwestern Ukraine.

I located a relatively recent LiveJournal profile (ikaikki.livejournal.com/profile) for an Andrew Hodirevski from Odessa, Ukraine that includes several profile pictures which are remarkably similar to the photos of Helkern leaked by the cih[dot]ms guys. That profile (“ikaikki”) says Hodirevski’s email address is ikaikki@livejournal.com, that his Jabber instant message address is ikaikki@neko.im, and that his Twitter account is “purplexcite” (that Twitter has since been deleted). In almost a dozen posts on LiveJournal, Hodirevski talks about his interest in Java programming, and even includes a few pictures of himself attending an instructional class on Java.

The same anime profile image for Andrew’s LiveJournal page is also on the LinkedIn profile for an Andrew Hodirevski from Ukraine, and the two pages share the aforementioned Twitter profile (purplexcite). Andrew’s LinkedIn page also says he is the administrator and Web developer at a hosting company in Ukraine called ghost.ua. 

That site is no longer online, but a cached copy of it at archive.org shows that the business is located in Odessa at this address, and the phone number +38 (048) 799-53-13. Ghost.ua lists several pricing plans for its servers, naming them after different despotic leaders, including Fidel Castro and Muammar Gaddafi (it is spelled “Kaddafi” on Ghost.ua). Recall as I mentioned at the top of this post that one of the clones of the card shop at Rescator[dot]la is kaddafi[dot]hk.

This page at it-portfolio.net lists an Andrey Hodirevski from Odessa with the same anime profile image, the “purplexcite” Twitter profile, and a Skype address by the same name. It says his professional skills include programming in Java, CakePHP and MySQL, among others. This Google groups discussion about CakePHP includes a message from an Andrey Hodirevski who uses the email address andrew@purpled.biz.

Purpled.biz is no longer online, but a cached copy of it from archive.org shows it was once Andrew’s personal site. Here we learned that Andrew’s current goals (as of 2010) were to get married to his girlfriend, buy the $20,000 Toyota Solara pictured below, move to Helsinki, and to achieve world domination. In order to accomplish the latter goal, Andrew jokes that he “will probably have to rob all the banks in the world.”

After searching my huge personal archive of hacked cybercrime forums for Andrew’s various email and Jabber addresses, I found several private messages sent by different users on the Spamdot[dot]biz forum who recommended to other members the “ikaikki@neko.im” Jabber address as someone to contact in order to hire a service that could be used to flood someone’s Gmail inbox with tens or hundreds of thousands of junk messages. Recall that this Jabber address is the same one listed at Andrew’s LiveJournal profile.

To bring this full circle, one of the many services that Rescator sells these days is a popular email flooding service at rescator[dot]me. Turns out, Yours Truly has already been the direct target of an attack launched through Rescator’s service; I wrote about it in this July 2012 story, Cyberheist Smokescreen: Email, Phone, SMS Floods.

The email flood service at rescator[dot]me

I have no idea if Rescator/Helkern/Andrew was involved in hacking Target, but it’s a good bet that he at least knows who was. I sought comment from various contact addresses listed above for this individual, and received a reply from someone at kaddafi[dot]me who said he knew Andrew and would relay my questions to him. Ultimately, he came back to me not with answers, but with a bribe not to run my story.

(1:48:35 PM) krebs//: hi

(1:48:44 PM) krebs//: brian krebs here

(1:49:05 PM) krebs//: trying to reach rescator

(1:49:11 PM) krebs//: aka andrey

(1:51:12 PM) krebs//: don’t believe it’s really krebs?

(1:51:15 PM) krebs//: http://krebsonsecurity.com/wp-content/uploads/2013/12/kaddaficon.png

(1:53:32 PM) krebs//:

(1:53:53 PM) krebs//: tyt?

(2:00:14 PM) kaddafi.me: Hello Brian

(2:00:24 PM) kaddafi.me has not been authenticated yet. You should authenticate this buddy.

(2:00:24 PM) Unverified conversation with kaddafi.me/Muammar started. Your client is not logging this conversation.

(2:00:30 PM) kaddafi.me: ooo you’ve got OTR

(2:00:37 PM) kaddafi.me: Afraid of NSA? )

(2:01:38 PM) kaddafi.me: Why do you want to talk to Andrew?

(2:03:46 PM) krebs//: i am more afraid of others

[Image] (2:03:56 PM) The privacy status of the current conversation is now: Private

(2:04:11 PM) kaddafi.me: Yeah well you should after someone sent you drugs from silkroad.

(2:04:24 PM) krebs//:

(2:04:59 PM) krebs//: you’re right of course, it’s andrew

(2:05:17 PM) kaddafi.me: What’s all the commotion about Rescator anyways?

(2:05:20 PM) krebs//: well i have a story about him going up tomorrow

(2:05:23 PM) kaddafi.me: Did you even notice other shops are selling same shit?

(2:05:32 PM) krebs//: sure

(2:05:46 PM) krebs//: but I’m not looking at other shops right now

(2:06:05 PM) kaddafi.me: Well you should )

(2:06:10 PM) krebs//: in time

Kaddafi promised a response by 10 p.m. ET yesterday. This morning, not seeing a response, I pinged this individual again, and received the following response:

(10:08:46 AM) kaddafi.me: Hi.

(10:09:19 AM) kaddafi.me: You better contact me from another jabber that’s not associated with your name, I’ve got an offer for you.

(10:11:12 AM) krebs//: why from a different jabber?

(10:11:33 AM) kaddafi.me: Because I’ve got an offer for you. So you don’t think I’m trying to play games and fool around with logs after you read my offer.

(10:11:52 AM) krebs//: what kind of offer?

(10:12:27 AM) $10.000 not to post your article

Obviously, I did not take him up on his offer, assuming he was not just messing with me. Here is a mind map I put together (using MindNode Pro for Mac) that outlines how much of this information was derived and connected.

Tags: andrew hodrievski, andrew@purpled.biz, andrey hodirevski, cheapdumps[dot]org, cih[dot]ms, cpro[dot]su, darklife[dot]ws, Grabberz[dot]com, Hel, Helkern, helkern_skype, icq 261333, ikaikki, kaddafi[dot]hk, Lampeduza[dot]la, octavian[dot]su, purplexcite, rescator, rescator[dot]me, root@helkern.net.ua, vor[dot]cc, Андрей Ходыревский


benjojo/Countdown · GitHub

$
0
0

Comments:"benjojo/Countdown · GitHub"

URL:https://github.com/benjojo/Countdown


Countdown

What

This is a system that can watch Countdown (if fed the video stream for it), decode the letters that are being offered, and then tweet the response at speed. The system runs at around 100 FPS on an i5 laptop, far greater than the 25 FPS input rate from TV.

Contained inside

Inside is the source code for the MJPEG decoder that is used for the input. You can feed the system an FLV by using the following command in the terminal.

ffmpeg.exe -i ../countdowna.flv -acodec none -vcodec mjpeg -f mjpeg - | ./AutoCountdown/bin/Debug/AutoCountdown.exe

This will take the input FLV (in the real system it is piped into ffmpeg) and have ffmpeg output an MJPEG stream.

Inside the program is an MJPEG decoder for this purpose.

How

Input

This system uses ffmpeg for the major heavy lifting; it takes MJPEG as input.

Detection

The system looks for frames like the following:

Each frame is evaluated to determine whether it is a frame showing the letters, by checking the two solid parts as shown:

Extraction

If the frame passes that test, it is scanned to find the 'box' edges:

The 'box' is then processed to find the bright parts (the text), which are then written out to a PNG file.

This file is then passed to tesseract for OCR, and the result is read back.
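To make the detection and extraction steps above concrete, here is a rough sketch of the same idea in Python, using Pillow and the tesseract command-line tool. This is only an illustration, not the C# code in this repository: the reference pixel positions, crop box, colour test and threshold are made-up placeholders standing in for the hard-coded values in AutoCountdown.

import subprocess
from PIL import Image

# Hypothetical coordinates -- the real values live in the C# source.
REFERENCE_PIXELS = [(40, 300), (680, 300)]   # two points that should be solid board colour on a letters frame
LETTER_BOX = (60, 280, 660, 340)             # left, upper, right, lower bounds of the letters strip

def looks_like_letters_frame(frame):
    """Cheap check: both reference pixels are roughly the board's solid blue (assumes an RGB frame)."""
    def is_solid_blue(pixel):
        r, g, b = pixel[:3]
        return b > 150 and r < 80 and g < 80
    return all(is_solid_blue(frame.getpixel(p)) for p in REFERENCE_PIXELS)

def extract_letters(frame):
    """Crop the letters box, threshold it so only the bright text survives, then OCR it."""
    box = frame.crop(LETTER_BOX).convert("L")
    box = box.point(lambda v: 255 if v > 180 else 0)   # keep only the bright pixels (the text)
    box.save("letters.png")
    # 'tesseract letters.png stdout' prints the recognised text on standard output.
    result = subprocess.run(["tesseract", "letters.png", "stdout"],
                            capture_output=True, text=True)
    return "".join(c for c in result.stdout.upper() if c.isalpha())

if __name__ == "__main__":
    frame = Image.open("frame.png").convert("RGB")    # a single frame pulled from the MJPEG stream
    if looks_like_letters_frame(frame):
        print(extract_letters(frame))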

Twitter posting

Once the winning words are calculated, Tinytwitter is used to post the tweet. (Patch for API 1.1 is coming.)
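The README does not show how the winning words are found, but the usual approach to the Countdown letters round is an anagram dictionary: index a word list by its sorted letters, then try every subset of the nine OCR'd letters from longest to shortest. A small illustrative sketch in Python (the word-list path is an assumption, not something shipped with this repository):

from collections import defaultdict
from itertools import combinations

def build_anagram_index(wordlist_path="/usr/share/dict/words"):
    """Map each word's sorted letters to the words made of exactly those letters."""
    index = defaultdict(list)
    with open(wordlist_path) as f:
        for word in f:
            word = word.strip().lower()
            if word.isalpha():
                index["".join(sorted(word))].append(word)
    return index

def best_words(letters, index, how_many=3):
    """Return the longest words that can be made from the OCR'd letters."""
    letters = letters.lower()
    for size in range(len(letters), 2, -1):              # try 9-letter words first, then 8, ...
        found = []
        for combo in set(combinations(sorted(letters), size)):
            found.extend(index.get("".join(combo), []))
        if found:                                         # stop at the first (longest) length with any hits
            return sorted(set(found))[:how_many]
    return []

# Example: index = build_anagram_index(); print(best_words("rateslino", index))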

Requirements

Right now there are a few hardcoded paths to things like tesseract. While the system should run under mono, the paths for tesseract need to be fixed for that to happen.

Why use Clojure?

$
0
0

Comments:"Why use Clojure?"

URL:http://www.paradiso.cc/why-use-clojure/


I've seen a few posts now regarding why people should start using Clojure, but until recent events, I couldn't exactly place why I thought Clojure should be used over 'X' language. Now, after becoming a daily user of Clojure, I think I can make a good case for it.

The Business Case

But first, some history...

Traditionally, the development languages chosen for a product have been imperative in nature (think C, C++, Java, etc.), selected based on their ease of use[1], the number of available developers for that language, and, likely, their portability into the given customer domain(s). All the while this product development was going on, other languages, functional in nature, were taking off from an academic standpoint (Lisp, Haskell, Scheme, etc.). Unfortunately, the latter set of languages never achieved much market penetration in the commercial sector.

Now, and this is purely conjecture, I can only assume that one of the primary reasons functional languages never took hold commercially was a lack of qualified developers for those languages in the workplace. The comments I've heard from others in the field today are that those languages are too 'hard' or 'difficult' to comprehend and don't have a place in the commercial world. The statement "yeah, I remember 'functional language x' in college, hated it" comes up all too often. In the end I'd chalk their ultimate demise up to poor perception[2].

(def Clojure "v1.5.1")

If you aren't familiar with Clojure then let's begin with a ten-second elevator pitch...

Clojure is a functional language built on top of the JVM with sound features in concurrent operations and Java interoperability

Business Case No. 1 - Fewer lines of code is a great thing

Every developer has heard the saying that the number of lines of code written is directly proportional to the number of bugs left to be solved. If you've been developing for a while then you probably agree, to some extent or another, that the above is true. Although, if you've never been a functional-language person, I would venture to guess you haven't experienced it at such a noticeable magnitude. With Clojure, as many will attest[3], the reduction in code size from, for instance, Java is roughly 10 times[4]. Nothing speaks better than an example, though, so let's look at the Fibonacci sequence and how it would be computed in both traditional Java and in Clojure.

In Java[5]:

public class Fibonacci {
    public static long fib(int n) {
        if (n <= 1) return n;
        else return fib(n-1) + fib(n-2);
    }

    public static void main(String[] args) {
        int N = Integer.parseInt(args[0]);
        for (int i = 1; i <= N; i++)
            System.out.println(i + ": " + fib(i));
    }
}

In Clojure[6]:

(def fib-seq (lazy-cat [0 1] (map + (rest fib-seq) fib-seq)))
(defn fibonacci [n] (take n fib-seq))

Both examples are extremely brief, but they still show the brevity and simplicity of Clojure compared with imperative languages.

Business Case No. 2 - Invest early for greater payout later

Functional languages, as the personal experience above leads me to believe, are inherently more difficult to comprehend, but they provide greater benefit over time. Much like the RISC versus CISC architecture debate over whether a small or large number of instructions is better, the same debate extends to imperative versus functional programming languages. The core difference is that in the architecture debate the developer doesn't need to actively remember such a plethora of commands.

This concept can be burdensome for many newcomers to Clojure. With a total of around 590 unique functions[7] in the Clojure core library alone, it can be quite daunting when compared to C or Java, which only have around 30 to 50 keywords. If one takes the plunge, though, and moves slowly, they will come out the other end of the tunnel with a host of new capabilities allowing them to operate at higher speed, with a more confident pace and with fewer errors.

Business Case No. 3 - Natural breakdown of tasks

Anyone who has ever tested code knows that constant and concerted effort goes into ensuring everyone is working towards that ever-elusive 90%+ code coverage. This leads to one of the best side effects (no pun intended)[8] of working with Clojure.

Functional languages naturally have a tendency to remove state from within their functions and rather, like mathematics, pass their state in (e.g. f(g(x))). This can be an extremely powerful way to express computation, but becomes burdensome to work with when coming from an imperative language.

But, and this is a large but, because functional languages avoid state and mutable data they have a natural tendency to break down into convenient subtasks. These subtasks can then be tested, evaluated, and promoted into production code much faster than in other languages. As an example, here is the validation function I wrote for a project to verify configuration parameters.

(defn validate
  "Given a configuration file as a Clojure map, this will validate each value of each key within
  the map. For all values that do not pass validation they will be logged with the default logging
  mechanism and removed from the map."
  [conf-map]
  (loop [conf conf-map, valid-conf {}]
    (if (empty? conf)
      valid-conf
      (let [[k v] (first conf)]
        (if (contains? pred-val-map k)
          (recur (rest conf) (if ((k pred-val-map) v) (merge {k v} valid-conf) valid-conf))
          (do
            (log/warn "Cannot validate key" k "and value" v "; no predicate validation function found")
            (recur (rest conf) valid-conf)))))))

For this example you can assume the pred-val-map looks something like:

(def pred-val-map
  {:host        #(string? %)
   :port        #(let [port (Integer. %)]
                   (and (>= port 0) (< port 65536)))
   :version     (fn [x]
                  (empty? (filter #(not (number? (Integer. %)))
                                  (clj-string/split x #"\."))))
   :description #(string? %)})

And finally to test my validate function I would merely need to write a few different map structures to ensure each function acted as it should. This can be done quite easily with Midje, but that is covered in detail below.

It begins to clear up as you work with the language more, but I find that people always gravitate towards this style and end up writing more coherent and cohesive code.

Compatibility

One of Java's major selling points is the Java Virtual Machine (JVM). That small utility grants any and all JVM-based developers the capability to write code and have it run anywhere and on nearly any hardware. That said, Clojure also benefits from this lovely feature, but in reality that only skims the surface of why Clojure and the JVM work so well together.

Java, with its rich history, has become a force to be reckoned with. It would only make sense that if Clojure could leverage any and all Java code then it would only benefit from the years of development cycles that have already been consumed making Java what it is today. Lucky for us this happened and all Clojure converts can rejoice knowing they have the ability to continue calling their favorite Java libraries from within their Clojure code. This is, in my opinion, one of the major selling points of the language and provides a nice segue for traditional Java developers to move into a functional world.

I won't go into great depth on all of the interoperability functions Clojure has, but, if the reader is so inclined, I've found this blog post to be of great assistance when needing examples of Java to Clojure and vice versa.

What makes a language great isn't just its own innate ability, but that of the community surrounding it, and, I must say, Clojure has an extremely active following with some very bright individuals. I don't want to go into great detail here as I feel the 'proof is in the pudding'[9], but I will share a few of the major tools I use on a daily basis which should help any beginning Clojure developer.

If you've ever looked at a Maven pom.xml and thought "there must be something easier," then Leiningen is what you were looking for. It is extremely lightweight, has a simple and concise design, and couldn't be easier to work with. Is it perfect? Probably not, but I haven't found anything I needed to do that I couldn't do with it. From deploying to remote Nexus repositories (with code signing) to polyglot codebase uberjars, Leiningen does everything I could ask. Here is a simple example defining a project:

(defproject sample-project "0.0.1"
 :description "My sample description"
 :plugins [[lein-marginalia "0.7.1"]
 [lein-javadoc "0.1.1"]]
 :javadoc-opts {:package-names ["tecs.ingest"]
 :output-dir "docs"}
 :resource-paths ["resources/hadoop"]
 :source-paths ["src/clj"]
 :java-source-paths ["src/jvm/"]
 :javac-options ["-target" "1.6" "-source" "1.6" "-Xlint:-options"]
 :dependencies [[org.apache.hadoop/hadoop-core "1.0.3"]
 [org.apache.accumulo/accumulo-core "1.4.0" :exclusions [org.slf4j/slf4j-log4j12]]
 [org.apache.commons/commons-lang3 "3.1"]
 [com.taoensso/timbre "2.7.1"]]
 ;; These dependencies are used to compile against, but not placed in the uberjar
 :profiles {:dev {:dependencies [[org.clojure/clojure "1.4.0"]
 [storm "0.8.2"]]}
 :1.5 {:dependencies [[org.clojure/clojure "1.5.1"]]}}
 :aliases {"docs"
 ^{:doc "Generate all documentation and place in the docs/ folder"}
 ["do" "javadoc," "marg"]
 "build"
 ^{:doc "Build a clean project; like 'fresh!', but without the documentation"}
 ["do" "clean," "deps," "javac," "compile," "uberjar"]
 "fresh!"
 ^{:doc "Pull dependencies, clean the project, build, jar, and generate documentation"}
 ["do" "clean," "deps," "javac," "compile," "uberjar," "docs"]}
 :aot :all)

This might seem complex, but in that project I'm also leveraging two plugins which will allow me to generate all the documentation on a codebase which uses both Clojure and Java. Long story short, there's a lot of power packed into that little application.

Up until I found Light Table and started working heavily in Clojure my IDE of choice was Emacs (if you can even call it an IDE). I found other IDE's much too bulky, providing myriad features I never used, unstable, or attempting to do too much at once. I am, and always will be, a minimalist and only want what I need to get the job done. That said, the permanent switch to Light Table became apparent after testing it out for a few days and stumbling on the 'InstaREPL' feature. Given Clojure can run in a REPL there is a certain nicety that comes with having this function at your fingertips right next to your code. My typical development time for any activity went down (roughly) 30% because I was able to actively diagnose as I was working. Once I noticed this I knew I'd be working with Light Table full time.

I don't want everyone to think that's the only good feature provided, though. There are loads of other capabilities packed into the application that I'm continuing to discover, and I leave it as an exercise for the reader to download Light Table and give it a try.

Testing has always been a necessary part of software development, but in my own personal experience testing has never felt quite right. Enter Midje - a beautiful and idiomatic way to assert facts about the code you've produced and then evaluate those facts to ensure correctness. If anyone has ever taken a Programming Languages course that sentence should speak volumes.

Here is a simple example taken from the Midje website which walks through evaluating the split function built into Clojure.

(fact "`split` splits strings on regular expressions and returns a vector"
 (str/split "a/b/c" #"/") => ["a" "b" "c"]
 (str/split "" #"irrelevant") => [""]
 (str/split "no regexp matches" #"a+\s+[ab]") => ["no regexp matches"])

As you can see from the above example Midje helps to further alleviate the testing woes many face by providing a convenient and intuitive way to assert facts that others can readily understand.

Marginalia has become my de-facto documentation tool. Given the plethora of doc-string capabilities already built into Clojure it is extremely convenient to provide, as nothing more, a Leiningen plugin to leverage the comments which are likely already spattered throughout and offer them up as a beautifully intuitive .html document.

The above are only a few of the many tools and libraries available to Clojure. For everything else there's the Clojure Toolbox, a site constantly updated with new and exciting Clojure capabilities categorized by their primary focus (i.e. logging, databases, NLP, machine learning, etc.). Before rolling my own anything I check here first.

Closing Thoughts

For me, Clojure is a gem of a language to work with. It has had a net-positive effect on my output at work as well as on the mental aspects of my overall work life. Every day I'm more excited to tackle new problem spaces or architect the next application and, to this end, I've seen noticeable improvements in my code: it is briefer, faster to complete, more readable, and less error-prone. To conclude, I hope I've inspired everyone to take away these two points...

  • Think about your individual preconceived notions regarding functional languages and determine whether they are honest truth or bias from a previous opinion. I'm not trying to convert the world into Clojure experts, but to present facts so that others can see a different point of view regarding functional languages.
  • Take a moment out of the day and give Clojure a try. It won't hurt and, if you like it, the community as a whole will only prosper. If you aren't a fan, then you have experience and fact to present the next time someone asks you about functional languages.

If you have any questions or comments feel free to email me at brennon@paradiso.cc.

Thanks for reading, until next time!

References

1 - I use the words 'ease of use' in a relative sense here as that definition is opinionated by the given developer(s) at the time.
2 - Some agreement with this train of thought; Why do Java programmers love Scala and shy away from Clojure?
3 - One Night With Clojure Makes a Scala Guy Humble
4 - Java to clojure rewrite
5 - Shortest fibonacci sequence I could find written in Java; Princeton Fibonacci Sequence
6 - Roughly similar to this example
7 - Roughly how many functions are in the Clojure core libraries?
8 - If you laughed at that then you're already 'in the know' with functional languages.
9 - Etymology of "the proof is in the pudding".

I, Developer | Blog

$
0
0

Comments:"I, Developer | Blog"

URL:http://jmoses.co/2013/12/21/is-ruby-dying.html


Is Ruby Dying?

I have been working with nodejs a lot lately and have been discussing with coworkers whether nodejs is taking steam away from Ruby at all. I think the popularity of a language is an important talking point when selecting a language and framework for a new project.

I think a graph of gem release dates over time could help determine an answer. The front page of rubygems only shows data on the most popular gems, but I am really interested in seeing recent activity. My theory is that if developers’ contributions to different gems are slowing down, then so is the popularity of the language.

Getting the data

After a little searching, I was unable to find the data in the format necessary for putting together a graph. And anyway, I was interested in scraping the site. My tool of choice is the nodejs cheerio library. The gem executable can give us a list of each gem by running gem list --remote. Fortunately, each gem has its own homepage with a nice URL. e.g. Rails can be found at http://rubygems.org/gems/rails

var request = require('request'),
    cheerio = require('cheerio'),
    bytes = require('bytes'),
    sys = require('sys'),
    fs = require('fs'),
    exec = require('child_process').exec;

console.log('program begin: ' + new Date());
fs.openSync('out.csv', 'w');
exec("gem list --remote", { maxBuffer: 20000 * 1024 }, processGems);

function processGems(error, stdout, stderr) {
  var gems = stdout.split("\n");
  console.info('total gems parsed: ' + gems.length);
  gems.forEach(function (gem) {
    gem = gem.substring(0, gem.indexOf(' '));
    console.info('crawling gem: ' + gem);
    request({
      uri: 'https://rubygems.org/gems/' + gem,
    }, getContent);
  });
}

function parseSize(size) {
  size = size.replace(' ', '');
  size = size.replace(/\)|\(/g, '');
  size = size.toLowerCase();
  try {
    size = bytes(size);
  } catch (e) {
    console.error('unable to parse :' + size);
  }
  return size;
}

function escape(s) {
  if (s.indexOf('"') != -1) {
    s = s.replace(/"/g, '""');
  }
  if (s.match(/"|,/)) {
    s = '"' + s + '"';
  }
  return s;
}

function getContent(error, response, body) {
  if (error && response.statusCode != 200) {
    console.error(error);
    console.error(response);
    return;
  }
  console.info(response.request.href + ' complete, status: ' + response.statusCode);
  var $ = cheerio.load(body),
      gem = $('div.title h2 a').text(),
      latest = $('div.versions ol li').last(),
      version = $(latest).children('a').text(),
      date = $(latest).children('small').text(),
      size = $(latest).children('span.size').text(),
      line;
  line = escape(gem) + ',' + escape(version) + ',' + escape(date) + ',' + parseSize(size) + '\n';
  fs.appendFile('out.csv', line);
}

Cleaning the data

The resulting file is 2.7 MB and can be found here. It is not in the right format for the graph I want. The graph I am picturing has the number of releases for a given day on the y-axis and the release date on the x-axis. So I need two columns, the first with the release date and the second with the number of releases on that day. The following nodejs takes the dumped CSV and puts it in the format I want.

var __ = require('lodash'),
    moment = require('moment'),
    csv = require('csv'),
    fs = require('fs');

fs.openSync('releasedate.csv', 'w');

csv()
  .from.path(__dirname + '/gems.csv', { delimiter: ',', escape: '"' })
  .to.array(function (data) {
    var csv = '';
    var grouped = __.groupBy(data, function (gem) {
      return gem[2];
    });
    var array = [];
    for (var gem in grouped) {
      array.push({
        date: escape(gem),
        unixtime: moment(gem, 'MMM D, YYYY').valueOf(),
        released: grouped[gem].length
      });
    }
    arraySorted = __.sortBy(array, 'unixtime');
    arraySorted.forEach(function (gem) {
      csv += gem.date + ',' + gem.released + '\n';
    });
    fs.appendFile('releasedate.csv', csv);
  });

function escape(s) {
  if (s.indexOf('"') != -1) {
    s = s.replace(/"/g, '""');
  }
  if (s.match(/"|,/)) {
    s = '"' + s + '"';
  }
  return s;
}

Graphing in R

Perfect! Now to get graphing. My first thought is to use Excel, but when I try I find out that I can only create a graph with 255 points. That is not going to work, as I have ~65k, so I boot up R. The following few lines of R produce the line chart I want to see and remind me why I like to use R.

library(zoo)  # the zoo package provides the ordered time-series object used below
data <- read.csv('releasedate.csv')
plot(zoo(data$Released, as.Date(data$Date, "%m/%d/%y")), xlab="Release Date", ylab="Releases", main="RubyGems Release Date Trend")

The end result is a nice line chart showing the trend I am looking for.

At first I am surprised. It does not appear to be slowing as I had thought. When I think more about it, it makes sense that the developers are still using gems:

  • The Ruby meetup groups have a devout following and strong leadership
  • There are several highly marketed learning resources online
  • Rails makes it super easy to create a web application

What do you guys think? Is Ruby dying? Did you once use Ruby and have now started to use something else? Comments can be found over at Hackernews

How Netflix Reinvented HR - Harvard Business Review

$
0
0

Comments:"How Netflix Reinvented HR - Harvard Business Review"

URL:http://hbr.org/2014/01/how-netflix-reinvented-hr/ar/1


Netflix founder and CEO Reed Hastings discusses the company’s unconventional HR practices.

HBR: Why did you write the Netflix culture deck?
Hastings: It’s our version of Letters to a Young Poet for budding entrepreneurs. It’s what we wish we had understood when we started. More than 100 people at Netflix have made major contributions to the deck, and we have more improvements coming.

Many of the ideas in it seem like common sense, but they go against traditional HR practices. Why aren’t companies more innovative when it comes to talent management?
As a society, we’ve had hundreds of years to work on managing industrial firms, so a lot of accepted HR practices are centered in that experience. We’re just beginning to learn how to run creative firms, which is quite different. Industrial firms thrive on reducing variation (manufacturing errors); creative firms thrive on increasing variation (innovation).

What reactions have you gotten from your peers to steps such as abolishing formal vacation and performance review policies? In general, do you think other companies admire your HR innovations or look askance at them?
My peers are mostly in the creative sector, and many of the ideas in our culture deck came from them. We are all learning from one another.

Which idea in the culture deck was the hardest sell with employees?
“Adequate performance gets a generous severance package.” It’s a pretty blunt statement of our hunger for excellence.

Have any of your talent management innovations been total flops?
Not so far.

Patty talks about how leaders should model appropriate behaviors to help people adapt to an environment with fewer formal controls. With that in mind, how many days off did you take in 2013?
“Days off” is a very industrial concept, like being “at the office.” I find Netflix fun to think about, so there are probably no 24-hour periods when I never think about work. But I did take three or four weeklong family trips over the past year, which were both stimulating and relaxing.


Python Practice Projects

Thoughts on Programming: Clojure vs Scala

The Universe of Discourse : Moonpig: a billing system that doesn't suck

$
0
0

Comments:"The Universe of Discourse : Moonpig: a billing system that doesn't suck"

URL:http://blog.plover.com/prog/Moonpig.html


           

Moonpig: a billing system that doesn't suck
I'm in Amsterdam now, because Booking.com brought me out to tell them about Moonpig, the billing and accounting system that Rik Signes and I wrote. The talk was mostly a rehash of one I gave at the Pittsburgh Perl Workshop a couple of months ago, but I think it's of general interest.

The assumption behind the talk is that nobody wants to hear about how the billing system actually works, because most people either have their own billing system already or else don't need one at all. I think I could do a good three-hour talk about the internals of Moonpig, and it would be very interesting to the right group of people, but it would be a small group. So instead I have this talk, which lasts less than an hour. The takeaway from this talk is a list of several basic design decisions that Rik and I made while building Moonpig which weren't obviously good ideas at the time, but which turned out well in hindsight. That part I think everyone can learn from. You may not ever need to write a billing system, but chances are at some point you'll consider using an ORM, and it might be useful to have a voice in your head that says “Dominus says it might be better to do something completely different instead. I wonder if this is one of those times?”

So because I think the talk was pretty good, and it's fresh in my mind right now, I'm going to try to write it down. The talk slides are here if you want to see them. The talk is mostly structured around a long list of things that suck, and how we tried to design Moonpig to eliminate, avoid, or at least mitigate these things.

Moonpig, however, does not suck.

Sometimes I see other people fuck up a project over and over, and I say “I could do that better”, and then I get a chance to try, and I discover it was a lot harder than I thought, and I realize that those people who tried before were not as stupid as I believed.

That did not happen this time. Moonpig is a really good billing system. It is not that hard to get right. Those other guys really were as stupid as I thought they were.

When I tell people I was working for IC Group, they frown; they haven't heard of it. But quite often when I say that IC Group runs pobox.com, those same people smile and say “Oh, pobox!”.

ICG is a first wave dot-com. In the late nineties, people would often have email through their employer or their school, and then they would switch jobs or graduate and their email address would go away. The basic idea of pobox was that for a small fee, something like $15 per year, you could get a pobox.com address that would forward all your mail to your real email address. Then when you changed jobs or schools you could just tell pobox to change the forwarding record, and your friends would continue to send email to the same pobox.com address as before. Later, ICG offered mail storage, web mail, and, through listbox.com, mailing list management and bulk email delivery.

Moonpig was named years and years before the project to write it was started. ICG had a billing and accounting system already, a terrible one. ICG employees would sometimes talk about the hypothetical future accounting system that would solve all the problems of the current one. This accounting system was called Moonpig because it seemed clear that it would never actually be written, until pigs could fly.

And in fact Moonpig wouldn't have been written, except that the existing system severely constrained the sort of pricing structures and deals that could actually be executed, and so had to go. Even then the first choice was to outsource the billing and accounting functions to some company that specialized in such things. The Moonpig project was only started as a last resort after ICG's president had tried for 18 months to find someone to take over the billing and collecting. She was unsuccessful. A billing provider would seem perfect and then turn out to have some bizarre shortcoming that rendered it unsuitable for ICG's needs. The one I remember was the one that did everything we wanted, except it would not handle checks. “Don't worry,” they said. “It's 2010. Nobody pays by check any more.”

Well, as it happened, many of our customers, including some of the largest institutional ones, had not gotten this memo, and did in fact pay by check.

So with some reluctance, she gave up and asked Rik and me to write a replacement billing and accounting system.

As I mentioned, I had always wanted to do this. I had very clear ideas, dating back many years, about mistakes I would not make, were I ever called upon to write a billing system.

For example, I have many times received a threatening notice of this sort:

Your account is currently past due! Pay the outstanding balance of $      0 . 00   or we will be forced to refer your account for collection.

What I believe happened here is: some idiot programmer knows that money amounts are formatted with decimal points, so decides to denominate the money with floats. The amount I paid rounds off a little differently than the amount I actually owed, and the result after subtraction is all roundoff error, and leaves me with a nominal debt on the order of !!2^{-64}!! dollars.

So I have said to myself many times “If I'm ever asked to write a billing system, it's not going to use any fucking floats.” And at the meeting at which the CEO told me and Rik that we would write it, those were nearly the first words out of my mouth: No fucking floats.

I will try to keep this as short as possible, including only as much as is absolutely required to understand the more interesting and generally applicable material later. ICG has two basic use cases. One is Pobox addresses and mailboxes, where the customer pays us a certain amount of money to forward (or store) their mail for a certain amount of time, typically a year. The other is Listbox mailing lists, where the customer pays us a certain amount to attempt a certain number of bulk email deliveries on their behalf.

The life cycle for a typical service looks like this: The customer pays us some money: a flat fee for a Pobox account, or a larger or smaller pile for Listbox bulk mailing services, depending on how much mail they need us to send. We deliver service for a while. At some point the funds in the customer's account start to run low. That's when we send them an invoice for an extension of the service. If they pay, we go back and continue to provide service and the process repeats; if not, we stop providing the service.

But on top of this basic model there are about 10,019 special cases:
  • Customers might cancel their service early.

  • Pobox has a long-standing deal where you get a sixth year free if you pay for five years of service up front.

  • Sometimes a customer with only email forwarding ($20 per year) wants to upgrade their account to one that does storage and provides webmail access ($50 per year), or vice-versa, in the middle of a year. What to do in this case? Business rules dictate that they can apply their current balance to the new service, and it should be properly pro-rated. So if I have 64 days of $50-per-year service remaining, and I downgrade to the $20-per-year service, I now have 160 days of service left. (A quick worked example of this arithmetic appears just after this list.)

    Well, that wasn't too bad, except that we should let the customer know the new expiration date. And also, if their service will now expire sooner than it would have, we should give them a chance to pay to extend the service back to the old date, and deal properly with their payment or nonpayment.

    Also something has to be done about any 6th free year that I might have had. We don't want someone to sign up for 5 years of $50-per-year service, get the sixth year free, then downgrade their account and either get a full free year of $50-per-year service or get a full free year of $20-per-year service after only !!\frac{20}{50}!! of five full years.

  • Sometimes customers do get refunds.

  • Sometimes we screw up and give people a credit for free service, as an apology. Unlike regular credits, these are not refundable!

  • Some customers get gratis accounts. The other cofounder of ICG used to hand these out at parties.

  • There are a number of cases for coupons and discounts. For example, if you refer a friend who signs up, you get some sort of credit. Non-profit institutions get some sort of discount off the regular rates. Customers who pay for many accounts get some sort of bulk discount. I forget the details.

  • Most customers get their service cut off if they don't pay. Certain large and longstanding customers should not be treated so peremptorily, and are allowed to run a deficit.

  • And so to infinity and beyond.
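To make the pro-rating arithmetic from the upgrade/downgrade case above concrete, here is a tiny back-of-the-envelope check. It is an illustrative Python sketch, not Moonpig code (Moonpig is Perl), but it works in the integer millicents described later in this article:

def dollars(n):
    return n * 100 * 1000                # dollars -> millicents, mirroring the helper shown later

old_daily = dollars(50) // 365           # 13698 m¢ per day for the $50-per-year account
new_daily = dollars(20) // 365           # 5479 m¢ per day for the $20-per-year account

balance = 64 * old_daily                 # value of the 64 remaining days, in millicents
print(balance // new_daily)              # => 160 days of service at the cheaper rate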

The Moonpig data store is mostly organized as a huge pile of ledgers. Each represents a single customer or account. It contains some contact information, a record of all the transactions associated with that customer, a history of all the invoices ever sent to that customer, and so forth.

It also contains some consumer objects. Each consumer represents some service that we have promised to perform in exchange for money. The consumer has methods in it that you can call to say “I just performed a certain amount of service; please charge accordingly”. It has methods for calculating how much money has been allotted to it, how much it has left, how fast it is consuming its funds, how long it expects to last, and when it expects to run out of money. And it has methods for constructing its own replacement and for handing over control to that replacement when necessary.

Every day, a cron job sends a heartbeat event to each ledger. The ledger doesn't do anything with the heartbeat itself; its job is to propagate the event to all of its sub-components. Most of those, in turn, ignore the heartbeat event entirely.

But consumers do handle heartbeats. The consumer will wake up and calculate how much longer it expects to live. (For Pobox consumers, this is simple arithmetic; for mailing-list consumers, it guesses based on how much mail has been sent recently.) If it notices that it is going to run out of money soon, it creates a successor that can take over when it is gone. The successor immediately sends the customer an invoice: “Hey, your service is running out, do you want to renew?”

Eventually the consumer does run out of money. At that time it hands over responsibility to its replacement. If it has no replacement, it will expire, and the last thing it does before it expires is terminate the service.

Somewhere is a machine that runs a daily cron job to heartbeat each ledger. What if one day, that machine is down, as they sometimes are, and the cron job never runs?

Or what if the machine crashes while the cron job is running, and the cron job only has time to heartbeat 3,672 of the 10,981 ledgers in the system?

In a perfect world, every component would be able to depend on exactly one heartbeat arriving every day. We don't live in that world. So it was an ironclad rule in Moonpig development that anything that handles heartbeat events must be prepared to deal with missing heartbeats, duplicate heartbeats, or anything else that could screw up.

When a consumer gets a heartbeat, it must not cheerfully say "Oh, it's the dawn of a new day! I'll charge for a day's worth of service!". It must look at the current date and at its own charge record and decide on that basis whether it's time to charge for a day's worth of service.

Now the answers to those questions of a few paragraphs earlier are quite simple. What if the machine is down and the cron job never runs? What to do?

A perfectly acceptable response here is: Do nothing. The job will run the next day, and at that time everything will be up to date. Some customers whose service should have been terminated today will have it terminated tomorrow instead; they will have received a free day of service. This is an acceptable loss. Some customers who should have received invoices today will receive them tomorrow. The invoices, although generated and sent a day late, will nevertheless show the right dates and amounts. This is also an acceptable outcome.

What if the cron job crashes after heartbeating 3,672 of 10,981 ledgers? Again, an acceptable response is to do nothing. The next day's heartbeat will bring the remaining 7,309 ledgers up to date, after which everything will be as it should. And an even better response is available: simply rerun the job. 3,672 of the ledgers will receive the same event twice, and will ignore it the second time.

Contrast this with the world in which heartbeats were (mistakenly) assumed to be reliable. In this world, the programming staff must determine precisely which ledgers received the event before the crash, either by trawling through the log files or by grovelling over the ledger data. Then someone has to hack up a program to send the heartbeats to just the 7,309 ledgers that still need it. And there is a stiff deadline: they have to get it done before tomorrow's heartbeat issues!

Making everything robust in the face of heartbeat failure is a little more work up front, but that cost is recouped the first time something goes wrong with the heartbeat process, when instead of panicking you smile and open another beer. Let N be the number of failures and manual repairs that are required before someone has had enough and makes the heartbeat handling code robust. I hypothesize that you can tell a lot about an organization from the value of N.

Here's an example of the sort of code that is required. The non-robust version of the code would look something like this:

 sub charge {
 my ($self, $event) = @_;
 $self->charge_one_day();
 }
The code, implemented by a role called Moonpig::Role::Consumer::ChargesPeriodically, actually looks something like this:
 has last_charge_date => ( … );
 sub charge {
 my ($self, $event) = @_;
 my $now = Moonpig->env->now;
 CHARGE: until ($self->next_charge_date->follows($now)) {
 my $next = $self->next_charge_date;
 $self->charge_one_day();
 $self->last_charge_date($next);
 if ($self->is_expired) {
 $self->replacement->handle_event($event) if $self->replacement;
 last CHARGE;
 }
 }
 }
The last_charge_date member records the last time the consumer actually issued a charge. The next_charge_date method consults this value and returns the next day on which the consumer should issue a charge—not necessarily the following day, since the consumer might issue weekly or monthly charges. The consumer will issue charge after charge until the next_charge_date is in the future, at which point it stops: it runs the until loop, using charge_one_day to issue another charge each time through and updating last_charge_date each time, until next_charge_date is in the future.

The one tricky part here is the if block. This is because the consumer might run out of money before the loop completes. In that case it passes the heartbeat event on to its successor (replacement) and quits the loop. The replacement will run its own loop for the remaining period.

A customer pays us $20. This will cover their service for 365 days. The business rules say that they should receive their first invoice 30 days before the current service expires; that is, after 335 days. How are we going to test that the invoice is in fact sent precisely 335 days later?

Well, put like that, the answer is obvious: Your testing system must somehow mock the time. But obvious as this is, I have seen many many tests that made some method call and then did sleep 60, waiting and hoping that the event they were looking for would have occurred by then, reporting a false positive if the system was slow, and making everyone that much less likely to actually run the tests.

I've also seen a lot of tests that crossed their fingers and hoped that a certain block of code would execute between two ticks of the clock, and that failed nondeterministically when that didn't happen.

So another ironclad law of Moonpig design was that no object is ever allowed to call the time() function to find out what time it actually is. Instead, to get the current time, the object must call Moonpig->env->now.

The tests run in a test environment. In the test environment, Moonpig->env returns a Moonpig::Env::Test object, which contains a fake clock. It has a stop_clock method that stops the clock, and an elapse_time method that forces the clock forward a certain amount. If you need to check that something happens after 40 days, you can call Moonpig->env->elapse_time(86_400 * 40), or, more likely:

 for (1..40) {
 Moonpig->env->elapse_time(86_400);
 $test_ledger->heartbeat;
 }
In the production environment, the environment object still has a now method, but one that returns the true current time from the system clock. Trying to stop the clock in the production environment is a fatal error.

Similarly, no Moonpig object ever interacts directly with the database; instead it must always go through the mediator returned by Moonpig->env->storage. In tests, this can be a fake storage object or whatever is needed. It's shocking how many tests I've seen that begin by allocating a new MySQL instance and executing a huge pile of DDL. Folks, this is not how you write a test.

Again, no Moonpig object ever posts email. It asks Moonpig->env->email_sender to post the email on its behalf. In tests, this uses the CPAN Email::Sender::Transport suite, and the test code can interrogate the email_sender to see exactly what emails would have been sent.

We never did anything that required filesystem access, but if we had, there would have been a Moonpig->env->fs for opening and writing files.

The Moonpig->env object makes this easy to get right, and hard to screw up. Any code that acts on the outside world becomes a red flag: Why isn't this going through the environment object? How are we going to test it?
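The same environment-object pattern is easy to sketch in any language. The following is an illustrative Python version, not Moonpig's Perl API: a production environment consults the real clock and a real mail transport, while a test environment has a fake clock and a captured outbox, so a test can drive time forward and inspect what would have been sent.

import datetime

class ProductionEnv:
    def now(self):
        return datetime.datetime.now(datetime.timezone.utc)
    def send_email(self, message):
        # hand the message to a real mail transport here
        ...

class TestEnv:
    """Fake environment: the clock only moves when the test says so, and email is captured."""
    def __init__(self, start):
        self._now = start
        self.sent = []
    def now(self):
        return self._now
    def elapse_time(self, seconds):
        self._now += datetime.timedelta(seconds=seconds)
    def send_email(self, message):
        self.sent.append(message)

# Application code asks the environment for the time instead of reading the system clock:
def maybe_send_invoice(env, service_expires_at, invoice):
    if (service_expires_at - env.now()).days <= 30:
        env.send_email(invoice)

# In a test, drive the fake clock forward and inspect what "was sent":
env = TestEnv(datetime.datetime(2013, 1, 1, tzinfo=datetime.timezone.utc))
expiry = env.now() + datetime.timedelta(days=365)
for _ in range(340):
    env.elapse_time(86_400)
    maybe_send_invoice(env, expiry, "renewal invoice")
assert env.sent   # an invoice went out once we were within 30 days of expiry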

I've already complained about how I loathe floating-point numbers. I just want to add that although there are probably use cases for floating-point arithmetic, I don't actually know what they are. I've had a pretty long and varied programming career so far, and legitimate uses for floating point numbers seem very few. They are really complicated, and fraught with traps; I say this as a mathematical expert with a much stronger mathematical background than most programmers.

The law we adopted for Moonpig was that all money amounts are integers. Each money amount is an integral number of “millicents”, abbreviated “m¢”, worth !!\frac1{1000}!! of a cent, which in turn is !!\frac1{100}!! of a U.S. dollar. Fractional millicents are not allowed. Division must be rounded to the appropriate number of millicents, usually in the customer's favor, although in practice it doesn't matter much, because the amounts are so small.

For example, a $20-per-year Pobox account actually bills $$\left\lfloor\frac{2,000,000}{365}\right\rfloor = 5479$$ m¢ each day. (5464 in leap years.)

Since you don't want to clutter up the test code with a bunch of numbers like 1000000 ($10), there are two utterly trivial utility subroutines:

 sub cents { $_[0] * 1000 }
 sub dollars { $_[0] * 1000 * 100 }
Now $10 can be written dollars(10).

Had we dealt with floating-point numbers, it would have been tempting to write test code that looked like this:

 cmp_ok(abs($actual_amount - $expected_amount), "<", $EPSILON, …);
That's because with floats, it's so hard to be sure that you won't end up with a leftover !!2^{-64}!! or something, so you write all the tests to ignore small discrepancies. This can lead to overlooking certain real errors that happen to result in small discrepancies. With integer amounts, these discrepancies have nowhere to hide. It sometimes happened that we would write some test and the money amount at the end would be wrong by 2m¢. Had we been using floats, we might have shrugged and attributed this to incomprehensible roundoff error. But with integers, that is a difference of 2, and you cannot shrug it off. There is no incomprehensible roundoff error. All the calculations are exact, and if some integer is off by 2 it is for a reason. These tiny discrepancies usually pointed to serious design or implementation errors. (In contrast, when a test would show a gigantic discrepancy of a million or more m¢, the bug was always quite easy to find and fix.)

There are still roundoff errors; they are unavoidable. For example, a consumer for a $20-per-year Pobox account bills only 365·5479m¢ = 1999835m¢ per year, an error in the customer's favor of 165m¢ per account; after about 12,000 years the customer will have accumulated enough error to pay for an extra year of service. For a business of ICG's size, this loss was deemed acceptable. For a larger business, it could be significant. (Imagine 6,000,000 customers times 165m¢ each; that's $9,900.) In such a case I would keep the same approach but denominate everything in micro-cents instead.
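The roundoff numbers above are easy to verify. A quick sanity check, written here as an illustrative Python snippet rather than Moonpig's Perl:

year_price = 20 * 100 * 1000             # $20 per year, expressed in millicents
daily = year_price // 365                # 5479 m¢ charged per day
shortfall = year_price - daily * 365     # 165 m¢ per year left unbilled, in the customer's favor

print(daily, shortfall)                  # 5479 165
print(year_price // shortfall)           # 12121 -- years until the error adds up to a free year
print(6_000_000 * shortfall / 100_000)   # 9900.0 -- dollars per year across six million customers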

Happily, Moonpig did not have to deal with multiple currencies. That would have added tremendous complexity to the financial calculations, and I am not confident that Rik and I could have gotten it right in the time available.

Dates and times are terribly complicated, partly because the astronomical motions they model are complicated, and mostly because the world's bureaucrats keep putting their fingers in. It's been suggested recently that you can identify whether someone is a programmer by asking if they have an opinion on time zones. A programmer will get very red in the face and pound their fist on the table.

After I wrote that sentence, I then wrote 1,056 words about the right way to think about date and time calculations, which I'll spare you, for now. I'm going to try to keep this from turning into an article about all the ways people screw up date and time calculations, by skipping the arguments and just stating the main points:

Date-time values are a kind of number, and should be considered as such. In particular:

  • Date-time values inside a program should be immutable.
  • There should be a single canonical representation of date-time values in the program, and it should be chosen for ease of calculation.
  • If the program does have to deal with date-time values in some other representation, it should convert them to the canonical representation as soon as possible, or from the canonical representation as late as possible, and in any event should avoid letting non-canonical values percolate around the program.

The canonical representation we chose was DateTime objects in UTC time. Requiring that the program deal only with UTC eliminates many stupid questions about time zones and DST corrections, and simplifies all the rest as much as they can be simplified. It also avoids DateTime's unnecessarily convoluted handling of time zones.

We held our noses when we chose to use DateTime. It has my grudging approval, with a large side helping of qualifications. The internal parts of it are okay, but the methods it provides are almost never what you actually want to use. For example, it provides a set of mutators. But, as per item 1 above, date-time values are numbers and ought to be immutable. Rik has a good story about a horrible bug that was caused when he accidentally called the ->subtract method on some widely-shared DateTime value and so mutated it, causing an unexpected change in the behavior of widely-separated parts of the program that consulted it afterward.

So instead of using raw DateTime, we wrapped it in a derived class called Moonpig::DateTime. This removed the mutators and also made a couple of other convenient changes that I will shortly describe.

If you have a pair of DateTime objects and you want to know how much time separates the two instants that they represent, you have several choices, most of which will return a DateTime::Duration object. All those choices are wrong, because DateTime::Duration objects are useless. They are a kind of Roach Motel for date and time information: Data checks into them, but doesn't check out. I am not going to discuss that here, because if I did it would take over the article, but I will show the simple example I showed in the talk:
 my $then = DateTime->new( month => 4, day => 2, year => 1969,
 hour => 0, minute => 0, second => 0);
 my $now = DateTime->now();
 my $elapsed = $now - $then;
 print $elapsed->in_units('seconds'), "\n";
You might think, from looking at this code, that it might print the number of seconds that elapsed between 1969-04-02 00:00:00 (in some unspecified time zone!) and the current moment. You would be mistaken; you have failed to reckon with the $elapsed object, which is a DateTime::Duration. Computing this object seems reasonable, but as far as I know once you have it there is nothing to do but throw it away and start over, because there is no way to extract from it the elapsed amount of time, or indeed anything else of value. In any event, the print here does not print the correct number of seconds. Instead it prints ME CAGO EN LA LECHE, which I have discovered is Spanish for “I shit in the milk”.

So much for DateTime::Duration. When a and b are Moonpig::DateTime objects, a-b returns the number of seconds that have elapsed between the two times; it is that simple. You can divide it by 86,400 to get the number of days.

Other arithmetic is similarly overloaded: If i is a number, then a+i and a-i are the times obtained by adding or subtracting i seconds to a, respectively.

(C programmers should note the analogy with pointer arithmetic; C's pointers, and date-time values—also temperatures—are examples of a mathematical structure called an affine space, and study of the theory of affine spaces tells you just what rules these objects should obey. I hope to discuss this at length another time.)

Going along with this arithmetic are a family of trivial convenience functions, such as:

 sub hours { $_[0] * 3600 }
 sub days { $_[0] * 86400 }
so that you can use $a + days(7) to find the time 7 days after $a. Programmers at the Amsterdam talk were worried about this: what about leap seconds? And they are correct: the name days is not quite honest, because it promises, but does not deliver, exactly 7 days. It can't, because the definition of the day varies widely from place to place and time to time, and not only can't you know how long 7 days is unless you know where it is, but it doesn't even make sense to ask. That is all right. You just have to be aware that when you add days(7), the resulting time might not be the same time of day 7 days later. (Indeed, if the local date and time laws are sufficiently bizarre, it could in principle be completely wrong. But since Moonpig::DateTime objects are always reckoned in UTC, it is never more than one second wrong.)

Anyway, I was afraid that Moonpig::DateTime would turn out to be a leaky abstraction, producing pleasantly easy and correct results thirty times out of thirty-one, and annoyingly wrong or bizarre results the other time. But I was surprised: it never caused a problem, or at least none has come to light. I am working on releasing this module to CPAN, under the name DateTime::Moonpig. (A draft version is already available, but I don't recommend that you use it.)

I left this out of the talk, by mistake, but this is a good place to mention it: mutable data is often a bad idea. In the billing system we wanted to avoid it for accountability reasons: We never wanted the customer service agent to be in the position of being unable to explain to the customer why we thought they owed us $28.39 instead of the $28.37 they claimed they owed; we never wanted ourselves to be in the position of trying to track down a billing system bug only to find that the trail had been erased.

One of the maxims Rik and I repeated frequently was that the moving finger writes, and, having writ, moves on. Moonpig is full of methods with names like is_expired, is_superseded, is_canceled, is_closed, is_obsolete, is_abandoned and so forth, representing entities that have been replaced by other entities but which are retained as part of the historical record.

For example, a consumer has a successor, to which it will hand off responsibility when its own funds are exhausted; if the customer changes their mind about their future service, this successor might be replaced with a different one, or replaced with none. This doesn't delete or destroy the old successor. Instead it marks the old successor as "superseded", simultaneously recording the supersession time, and pushes the new successor (or undef, if none) onto the end of the target consumer's replacement_history array. When you ask for the current successor, you are getting the final element of this array.

This pattern appeared in several places. In a particularly simple example, a ledger was required to contain a Contact object with contact information for the customer to which it pertained. But the Contact wasn't simply this:

 has contact => (
   is       => 'rw',
   isa      => role_type( 'Moonpig::Role::Contact' ),
   required => 1,
 );
Instead, it was an array; "replacing" the contact actually pushed the new contact onto the end of the array, from which the contact accessor returned the final element:
 has contact_history => (
   is       => 'ro',
   isa      => ArrayRef[ role_type( 'Moonpig::Role::Contact' ) ],
   required => 1,
   traits   => [ 'Array' ],
   handles  => {
     contact         => [ get => -1 ],
     replace_contact => 'push',
   },
 );
Why do we use relational databases, anyway? Is it because they cleanly and clearly model the data we want to store? No, it's because they are lightning fast.

When your data truly is relational, a nice flat rectangle of records, each with all the same fields, RDBs are terrific. But Moonpig doesn't have much relational data. Its basic datum is the Ledger, which has a bunch of disparate subcomponents, principally a heterogeneous collection of Consumer objects. And I would guess that most programs don't deal in relational data; like Moonpig, they deal in some sort of object network.

Nevertheless we try to represent this data relationally, because we have a relational database, and when you have a hammer, you go around hammering everything with it, whether or not that thing needs hammering.

When the object model is mature and locked down, modeling the objects relationally can be made to work. But when the object model is evolving, it is a disaster. Your relational database schema changes every time the object model changes, and then you have to find some way to migrate the existing data forward from the old schema. Or worse, and more likely, you become reluctant to let the object model evolve, because reflecting that evolution in the RDB is so painful. The RDB becomes a ball and chain locked to your program's ankle, preventing it from going where it needs to go. Every change is difficult and painful, so you avoid change. This is the opposite of the way to design a good program. A program should be light and airy, its object model like a string of pearls.

In theory the mapping between the RDB and the objects is transparent, and is taken care of seamlessly by an ORM layer. That would be an awesome world to live in, but we don't live in it and we may never.

Right now the principal value of ORM software seems to be if your program is too fast and you need it to be slower; the ORM is really good at that. Since speed was the only benefit the RDB was providing in the first place, you have just attached two large, complex, inflexible systems to your program and gotten nothing in return.

Watching the ORM try to model the objects is somewhere between hilariously pathetic and crushingly miserable. Perl's DBIx::Class, to the extent it succeeds, succeeds because it doesn't even try to model the objects in the database. Instead it presents you with objects that represent database rows. This isn't because a row needs to be modeled as an object—database rows have no interesting behavior to speak of—but because the object is an access point for methods that generate SQL. DBIx::Class is not for modeling objects, but for generating SQL. I only realized this recently, and angrily shouted it at the DBIx::Class experts, expecting my denunciation to be met with rage and denial. But they just smiled with amusement. “Yes,” said the DBIx::Class experts on more than one occasion, “that is exactly correct.” Well then.

So Rik and I believe that for most (or maybe all) projects, trying to store the objects in an RDB, with an ORM layer mediating between the program and the RDB, is a bad, bad move. We determined to do something else. We eventually brewed our own object store, and this is the part of the project of which I'm least proud, because I believe we probably made every possible mistake that could be made, even the ones that everyone writing an object store should already know not to make.

For example, the object store has a method, retrieve_ledger, which takes a ledger's ID number, reads the saved ledger data from the disk, and returns a live Ledger object. But it must make sure that every such call returns not just a Ledger object with the right data, but the same object. Otherwise two parts of the program will have different objects to represent the same data, one part will modify its object, and the other part, looking at a different object, will not see the change it should see. It took us a while to figure out problems like this; we really did not know what we were doing.
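
The standard remedy is an identity map. Here is a sketch of the idea, with invented names (this is not the actual Moonpig code): retrieve_ledger consults a cache of live objects, held through weak references so the cache never keeps a dead ledger alive.

 package Sketch::ObjectStore;
 use strict;
 use warnings;
 use Scalar::Util qw(weaken);

 my %live;   # guid => weak reference to the live ledger object

 sub retrieve_ledger {
   my ($class, $guid) = @_;

   # If a live object for this GUID already exists, hand back that very
   # object, so every part of the program sees the same state.
   return $live{$guid} if defined $live{$guid};

   my $ledger = _read_from_disk($guid);
   $live{$guid} = $ledger;
   weaken($live{$guid});        # don't prevent garbage collection
   return $ledger;
 }

 # Stand-in for the real "read the saved ledger data from disk" step.
 sub _read_from_disk {
   my ($guid) = @_;
   return bless { guid => $guid }, 'Sketch::Ledger';
 }

 package main;
 my $one = Sketch::ObjectStore->retrieve_ledger('ABC-123');
 my $two = Sketch::ObjectStore->retrieve_ledger('ABC-123');
 print "same object\n" if $one == $two;   # true: both names refer to one object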

What we should have done, instead of building our own object store, was use someone else's object store. KiokuDB is frequently mentioned in this context. After I first gave this talk people asked “But why didn't you use KiokuDB?” or, on hearing what we did do, said “That sounds a lot like KiokuDB”. I had to get Rik to remind me why we didn't use KiokuDB. We had considered it, and decided to do our own not for technical but for political reasons. The CEO, having made the unpleasant decision to have me and Rik write a new billing system, wanted to see some progress. If she had asked us after the first week what we had accomplished, and we had said “Well, we spent a week figuring out KiokuDB,” her head might have exploded. Instead, we were able to say “We got the object store about three-quarters finished”. In the long run it was probably more expensive to do it ourselves, and the result was certainly not as good. But in the short run it kept the customer happy, and that is the most important thing; I say this entirely in earnest, without either sarcasm or bitterness.

(On the other hand, when I ran this article by Rik, he pointed out that KiokuDB had later become essentially unmaintained, and that had we used it he would have had to become the principal maintainer of a large, complex system which he did not help design or implement. The Moonpig object store may be technically inferior, but Rik was with it from the beginning and understands it thoroughly.)

All that said, here is how our object store worked. The bottom layer was an ordinary relational database with a single table. During the test phase this database was SQLite, and in production it was IC Group's pre-existing MySQL instance. The table had two fields: a GUID (globally-unique identifier) on one side, and on the other side a copy of the corresponding Ledger object, serialized with Perl's Storable module. To retrieve a ledger, you look it up in the table by GUID. To retrieve a list of all the ledgers, you just query the GUID field. That covers the two main use-cases, which are customer service looking up a customer's account history, and running the daily heartbeat job. A subsidiary table mapped IC Group's customer account numbers to ledger GUIDs, so that the storage engine could look up a particular customer's ledger starting from their account number. (Account numbers are actually associated with Consumers, but once you had the right ledger a simple method call to the ledger would retrieve the consumer object. But finding the right ledger required a table.) There were a couple of other tables of that sort, but overall it was a small thing.
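
Here is a sketch of that storage layer, using SQLite (as in the test configuration) and Storable. The table layout and function names here are invented, and the real thing did more (the account-number table, versioning, locking), but the basic shape is just this:

 use strict;
 use warnings;
 use DBI;
 use Storable qw(nfreeze thaw);

 my $dbh = DBI->connect("dbi:SQLite:dbname=store-sketch.db", "", "",
                        { RaiseError => 1, AutoCommit => 1 });

 $dbh->do(q{
   CREATE TABLE IF NOT EXISTS ledgers (
     guid            TEXT PRIMARY KEY,
     serialized_data BLOB NOT NULL
   )
 });

 # Serialize the ledger and file it under its GUID.
 sub save_ledger {
   my ($guid, $ledger) = @_;
   $dbh->do("INSERT OR REPLACE INTO ledgers (guid, serialized_data) VALUES (?, ?)",
            undef, $guid, nfreeze($ledger));
 }

 # Look the GUID up and thaw whatever we stored.
 sub load_ledger {
   my ($guid) = @_;
   my ($blob) = $dbh->selectrow_array(
     "SELECT serialized_data FROM ledgers WHERE guid = ?", undef, $guid);
   return defined $blob ? thaw($blob) : undef;
 }

 save_ledger("ABC-123", { balance => 2837, consumers => [] });
 my $copy = load_ledger("ABC-123");
 print "balance: $copy->{balance}\n";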

There are some fine points to consider. For example, you can choose whether to store just the object data, or the code as well. The choice is clear: you must store only the data, not the code. Otherwise, you would have to update all the objects every time you make a code change such as a bug fix. It should be clear that this would discourage bug fixes, and that had we gone this way the project would have ended as a pile of smoking rubble.

Since the code is not stored in the database, the object store must be responsible, whenever it loads an object, for making sure that the correct class for that object actually exists. The solution for this was that along with every object is stored a list of all the roles that it must perform. At object load time, if the object's class doesn't exist yet, the object store retrieves this list of roles (stored in a third column, parallel to the object data) and uses the MooseX::ClassCompositor module to create a new class that does those roles. MooseX::ClassCompositor was something Rik wrote for the purpose, but it seems generally useful for such applications.

Every once in a while you may make an upward-incompatible change to the object format. Renaming an object field is such a change, since the field must be renamed in all existing objects, but adding a new field isn't, unless the field is mandatory. When this happened—much less often than you might expect—we wrote a little job to update all the stored objects. This occurred only seven times over the life of the project; the update programs are all very short.

We did also make some changes to the way the objects themselves were stored: Booking.Com's Sereal module was released while the project was going on, and we switched to use it in place of Storable. Also one customer's Ledger object grew too big to store in the database field, which could have been a serious problem, but we were able to defer dealing with the problem by using gzip to compress the serialized data before storing it.
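
The compression trick is nothing deep. Here is a sketch of it, with Storable standing in for Sereal; the function names are invented:

 use strict;
 use warnings;
 use Storable qw(nfreeze thaw);
 use IO::Compress::Gzip     qw(gzip   $GzipError);
 use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

 # Serialize and compress an object before it goes into the database field.
 sub freeze_compressed {
   my ($object) = @_;
   my $frozen = nfreeze($object);
   gzip(\$frozen => \my $zipped) or die "gzip failed: $GzipError";
   return $zipped;
 }

 # Reverse the process when the object comes back out.
 sub thaw_compressed {
   my ($zipped) = @_;
   gunzip(\$zipped => \my $frozen) or die "gunzip failed: $GunzipError";
   return thaw($frozen);
 }

 my $blob   = freeze_compressed({ guid => "ABC-123", balance => 2837 });
 my $ledger = thaw_compressed($blob);
 print "balance: $ledger->{balance}\n";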

The use of the RDB engine for the underlying storage got us MySQL's implementation of transactions and atomicity guarantees, which we trusted. This gave us a firm foundation on which to build the higher functions; without those guarantees you have nothing, and it is impossible to build a reliable system. But since they are there, we could build a higher-level transactional system on top of them.

For example, we used an optimistic locking scheme to prevent race conditions while updating a single ledger. For performance reasons you typically don't want to force all updates to be done through a single process (although it can be made to work; see Rochkind's Advanced Unix Programming). In an optimistic locking scheme, you store a version number with each record. Suppose you are the low-level storage manager and you get a request to update a ledger with a certain ID. Instead of doing this:

 update ledger set serialized_data = …
 where ledger_id = 789
You do this:
 update ledger set serialized_data = …
 , version = 4
 where ledger_id = 789 and version = 3
and you check the return value from the SQL to see how many records were actually updated. The answer must be 0 or 1. If it is 1, all is well and you report the successful update back to your caller. But if it is 0, that means that some other process got there first and updated the same ledger, changing its version number from the 3 you were expecting to something bigger. Your changes are now in limbo; they were applied to a version of the object that is no longer current, so you throw an exception.
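
In code, the check is just a matter of looking at how many rows the UPDATE touched. A sketch (the names are invented; DBI's do reports the number of affected rows):

 # Attempt the conditional UPDATE; succeed only if nobody else got there first.
 sub save_ledger_optimistically {
   my ($dbh, $ledger_id, $blob, $expected_version) = @_;

   my $rows = $dbh->do(
     "UPDATE ledger SET serialized_data = ?, version = ?
       WHERE ledger_id = ? AND version = ?",
     undef,
     $blob, $expected_version + 1, $ledger_id, $expected_version,
   );

   # DBI returns "0E0" (numerically zero, but true) when no rows matched.
   if ($rows == 0) {
     die "ledger $ledger_id was updated by someone else; "
       . "expected version $expected_version is gone\n";
   }
   return 1;
 }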

But is the exception safe? What if the caller had previously made changes to the database that should have been rolled back when the ledger failed to save? No problem! We had exposed the RDB transactions to the caller, so when the caller requested that a transaction be begun, we propagated that request into the RDB layer. When the exception aborted the caller's transaction, all the previous work we had done on its behalf was aborted back to the start of the RDB transaction, just as one wanted. The caller even had the option to catch the exception without allowing it to abort the RDB transaction, and to retry the failed operation.

The major drawback of the object store was that it was very difficult to aggregate data across ledgers: to do it you have to thaw each ledger, one at a time, and traverse its object structure looking for the data you want to aggregate. We planned that when this became important, we could have a method on the Ledger or its sub-objects which, when called, would store relevant numeric data into the right place in a conventional RDB table, where it would then be available for the usual SELECT and GROUP BY operations. The storage engine would call this whenever it wrote a modified Ledger back to the object store. The RDB tables would then be a read-only view of the parts of the data that were needed for building reports.

A related problem is that some kinds of data really are relational, and to store them in object form is extremely inefficient. The RDB has a terrible impedance mismatch for most kinds of object-oriented programming, but not for all kinds. The main example that comes to mind is that every ledger contains a transaction log of every transaction it has ever performed: when a consumer deducts its 5479 m¢, that's a transaction, and every day each consumer adds one to the ledger. The transaction log for a large ledger with many consumers can grow rapidly.

We planned from the first that this transaction data would someday move out of the ledger entirely into a single table in the RDB, access to which would be mediated by a separate object, called an Accountant. At present, the Accountant is there, but it stores the transaction data inside itself instead of in an external table.

The design of the object store was greatly simplified by the fact that all the data was divided into disjoint ledgers, and that only ledgers could be stored or retrieved. A minor limitation of this design was that there was no way for an object to contain a pointer to a Ledger object, either its own or some other one. Such a pointer would have spoiled Perl's lousy garbage collection, so we weren't going to do it anyway. In practice, the few places in the code that needed to refer to another ledger just stored the ledger's GUID instead and looked it up when it was needed. In fact every significant object was given its own GUID, which was then used as needed. I was surprised to find how often it was useful to have a simple, reliable identifier for every object, and how much time I had formerly spent on programming problems that would have been trivially solved if objects had had GUIDs.
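
Handing out GUIDs is about one attribute's worth of code. Here is a sketch in Moose terms; the role and class names are invented, and Data::GUID is just one convenient module for generating the identifiers (I don't know that it is the one Moonpig used):

 package Sketch::Role::HasGuid;
 use Moose::Role;
 use Data::GUID;

 # Every object that composes this role gets a stable, unique identifier.
 has guid => (
   is       => 'ro',
   isa      => 'Str',
   init_arg => undef,                           # never supplied by the caller
   default  => sub { Data::GUID->new->as_string },
 );

 package Sketch::Invoice;
 use Moose;
 with 'Sketch::Role::HasGuid';

 package main;
 my $invoice = Sketch::Invoice->new;
 print $invoice->guid, "\n";   # prints the invoice's GUID string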

In all, I think the object store technique worked well and was a smart choice that went strongly against prevailing practice. I would recommend the technique for similar projects, except for the part where we wrote the object store ourselves instead of using one that had been written already. Had we tried to use an ORM backed by a relational database, I think the project would have taken at least a third longer; had we tried to use an RDB without any ORM, I think we would not have finished at all.

After I had been using Moose for a couple of years, including the Moonpig project, Rik asked me what I thought of it. I was lukewarm. It introduces a lot of convenience for common operations, but also hides a lot of complexity under the hood, and the complexity does not always stay well-hidden. It is very big and very slow to start up. On the whole, I said, I could take it or leave it.

“Oh,” I added. “Except for Roles. Roles are awesome.” I had a long section in the talk about what is good about Roles, but I moved it out to a separate talk, so I am going to take that as a hint about what I should do here. As with my theory of dates and times, I will present only the thesis, and save the arguments for another post:

Object-oriented programming is centered around objects, which are encapsulated groups of related data, and around methods, which are opaque functions for operating on particular kinds of objects. OOP does not mandate any particular theory of inheritance, either single or multiple, class-based or prototype-based, etc., and indeed, while all OOP systems have objects and methods that are pretty much the same, each has an inheritance system all its own. Over the past 30 years of OOP, many theories of inheritance have been tried, and all of them have had serious problems. If there were no alternative to inheritance, we would have to struggle on with inheritance. However, Roles are a good alternative to inheritance:

  • Every problem solved by inheritance is solved at least as well by Roles.
  • Many problems not solved at all by inheritance are solved by Roles.
  • Many problems introduced by inheritance do not arise when using Roles.
  • Roles introduce some of their own problems, but none of them are as bad as the problems introduced by inheritance.

It's time to give up on inheritance. It was worth a try; we tried it as hard as we could for thirty years or more. It didn't work. I'm going to repeat that: Inheritance doesn't work. It's time to give up on it.

Moonpig doesn't use any inheritance (except that Moonpig::DateTime inherits from DateTime, which we didn't control). Every class in Moonpig is composed from Roles. This wasn't because it was our policy to avoid inheritance. It's because Roles did everything we needed, usually in simple and straightforward ways.
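
For readers who have not seen Moose Roles, here is a tiny invented example of the style: no parent classes anywhere, just small units of behavior composed into a class with with. These roles are made up for illustration; they are not Moonpig's.

 package Sketch::Role::HasBalance;
 use Moose::Role;
 has balance => ( is => 'rw', isa => 'Int', default => 0 );
 sub charge { my ($self, $amount) = @_; $self->balance( $self->balance - $amount ) }

 package Sketch::Role::CanExpire;
 use Moose::Role;
 has is_expired => ( is => 'rw', isa => 'Bool', default => 0 );
 sub expire { $_[0]->is_expired(1) }

 package Sketch::Consumer;
 use Moose;
 # No superclass: all behavior comes from composing the roles.
 with qw( Sketch::Role::HasBalance Sketch::Role::CanExpire );

 package main;
 my $c = Sketch::Consumer->new( balance => 500 );
 $c->charge(100);
 $c->expire;
 printf "balance %d, expired %d\n", $c->balance, $c->is_expired;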

I plan to write more extensively on this later on.

That is the end of the things I want to excoriate. Note the transition from multiple inheritance, which was a tremendous waste of everyone's time, to Roles, which in my opinion are a tremendous success, the Right Thing, and gosh, if only Smalltalk-80 had gotten this right in the first place, look how much trouble we all would have saved.

Moonpig has a web API. Moonpig applications, such as the customer service dashboard, or the heartbeat job, invoke Moonpig functions through the API. The API is built using a system, developed in parallel with Moonpig, called Stick. (It was so-called because IC Group had tried before to develop a simple web API system, but none had been good enough to stick. This one, we hoped, would stick.)

The basic principle of Stick is distributed routing, which allows an object to have a URI, and to delegate control of the URIs underneath it to other objects.

To participate in the web API, an object must compose the Stick::Role::Routable role, which requires that it provide a _subroute method. The method is called with an array containing the path components of a URI. The _subroute method examines the array, or at least the first few elements, and decides whether it will handle the route. To refuse, it can throw an exception, or just return an undefined value, which will turn into a 404 error in the web protocol. If it does handle the path, it removes the part it handled from the array, and returns another object that will handle the rest, or, if there is nothing left, a public resource of some sort. In the former case the routing process continues, with the remaining route components passed to the _subroute method of the next object.

If the route is used up, the last object in the chain is checked to make sure it composes the Stick::Role::PublicResource role. This is to prevent accidentally exposing an object in the web API when it should be private. Stick then invokes one final method on the public resource, either resource_get, resource_post, or similar. Stick collects the return value from this method, serializes it and sends it over the network as the response.
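
The routing walk itself is short. Here is a sketch of it; only _subroute, the role name, and the resource_* convention come from Stick, and the rest of the scaffolding is invented for illustration:

 sub dispatch {
   my ($root, $http_method, @path) = @_;

   my $target = $root;
   while (@path) {
     # Each object consumes the components it understands and returns
     # whatever object should handle the remainder (or undef, for a 404).
     $target = $target->_subroute(\@path);
     return [ 404, "not found" ] unless defined $target;
   }

   # The route is used up; only objects marked public may be exposed.
   return [ 404, "not found" ]
     unless $target->does('Stick::Role::PublicResource');

   my $method = "resource_" . lc $http_method;   # resource_get, resource_post, ...
   return [ 200, $target->$method ];             # the real Stick serializes this
 }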

So for example, suppose a ledger wants to provide access to its consumers. It might implement _subroute like this:

 sub _subroute {
   my ($self, $route) = @_;
   if ($route->[0] eq "consumer") {
     shift @$route;
     my $consumer_id = shift @$route;
     return $self->find_consumer( id => $consumer_id );
   } else {
     return; # 404
   }
 }
Then if /path/to/ledger is any URI that leads to a certain ledger, /path/to/ledger/consumer/12435 will be a valid URI for the specified ledger's consumer with ID 12435. A request to /path/to/ledger/FOOP/de/DOOP will yield a 404 error, as will a request to /path/to/ledger/consumer/98765 whenever find_consumer(id => 98765) returns undefined.

A common pattern is to have a path that invokes a method on the target object. For example, suppose the ledger objects are already addressable at certain URIs, and one would like to expose in the API the ability to tell a ledger to handle a heartbeat event. In Stick, this is incredibly easy to implement:

 publish heartbeat => { -http_method => 'post' } => sub {
   my ($self) = @_;
   $self->handle_event( event('heartbeat') );
 };
This creates an ordinary method, called heartbeat, which can be called in the usual way, but which is also invoked whenever an HTTP POST request arrives at the appropriate URI, the appropriate URI being anything of the form /path/to/ledger/heartbeat.

The default for publish is that the HTTP method is GET; in that case one can omit mentioning it:

 publish amount_due => sub {
   my ($self) = @_;
   …
   return abs($due - $avail);
 };
More complicated published methods may receive arguments; Stick takes care of deserializing them, and checking that their types are correct, before invoking the published method. This is the ledger's method for updating its contact information:
 publish _replace_contact => {
   -path        => 'contact',
   -http_method => 'put',
   attributes   => HashRef,
 } => sub {
   my ($self, $arg) = @_;
   my $contact = class('Contact')->new($arg->{attributes});
   $self->replace_contact($contact);
   return $contact;
 };
Although the method is named _replace_contact, it is available in the web API via a PUT request to /path/to/ledger/contact, rather than one to /path/to/ledger/_replace_contact. If the contact information supplied in the HTTP request data is accepted by class('Contact')->new, the ledger's contact is updated. (class('Contact') is a utility method that returns the name of the class that represents a contact. This is probably just the string Moonpig::Class::Contact.)

In some cases the ledger has an entire family of sub-objects. For example, a ledger may have many consumers. In this case it's also equipped with a "collection" object that manages the consumers. The ledger can use the collection object as a convenient way to look up its consumers when it needs them, but the collection object also provides routing: If the ledger gets a request for a route that begins /consumers, it strips off /consumers and returns its consumer collection object, which handles further paths such as /guid/XXXX and /xid/1234 by locating and returning the appropriate consumer.

The collection object is a repository for all sorts of convenient behavior. For example, if one composes the Stick::Role::Collection::Mutable role onto it, it gains support for POST requests to …/consumers/add, handled appropriately.

Adding a new API method to any object is trivial, just a matter of adding a new published method. Unpublished methods are not accessible through the web API.

After I wrote this talk I wished I had written a talk about Stick instead. I'm still hoping to write one and present it at YAPC in Orlando this summer.

Unit tests often have a lot of repeated code, to set up test instances or run the same set of checks under several different conditions. Rik's Test::Routine makes a test program into a class. The class is instantiated, and the tests are methods that are run on the test object instance. Test methods can invoke one another. The test object's attributes are available to the test methods, so they're a good place to put test data. The object's initializer can set up the required test data. Tests can easily load and run other tests, all in the usual ways. If you like OO-style programming, you'll like all the same things about building tests with Test::Routine.

All this stuff is available for free under open licenses:

(This has been a really long article. Thanks for sticking with me. Headers in the article all have named anchors, in case you want to refer someone to a particular section.)

(I suppose there is a fair chance that this will wind up on Hacker News, and I know how much the kids at Hacker News love to dress up and play CEO and Scary Corporate Lawyer, and will enjoy posting dire tut-tuttings about whether my disclosure of ICG's secrets is actionable, and how reluctant they would be to hire anyone who tells such stories about his previous employers. So I may as well spoil their fun by mentioning that I received the approval of ICG's CEO before I posted this.)



Writing a full-text search engine using Bloom filters - Stavros' Stuff


Comments:"Writing a full-text search engine using Bloom filters - Stavros' Stuff"

URL:http://www.stavros.io/posts/bloom-filter-search-engine/?print


Search engines and Bloom filters: A match made in heaven?

A few minutes ago I came across a Hacker News post that detailed a method of adding search to your static site. As you probably know, adding search to a static site is a bit tricky, because you can’t just send the query to a server and have the server process it and return the results. If you want full-text search, you have to implement something like an inverted index.

How an inverted index works

An inverted index is a data structure that basically maps every word in every document to the IDs of the documents it can be found in. For example, such an index might look like {"python": [1, 3, 6], "raspberry": [3, 7, 19]}. To find the documents that mention both “python” and “raspberry”, you look those terms up in the index and find the common document IDs (in our example, that is only the document with ID 3).

However, when you have very long documents with varied words, this can grow a lot. It’s a hefty data structure, and, when you want to implement a client-side search engine, every byte you transmit counts.

Client-side search engine caveats

The problem with client-side search engines is that you (obviously) have to do all the searching on the client, so you have to transmit all available information there. What static site generators do is generate every required file when generating your site, then make those files available for the client to download. Usually, search-engine plugins limit themselves to tags and titles, to cut down on the amount of information that needs to be transmitted. How do we reduce the size? Easy, use a Bloom filter!

Bloom filters to the rescue

A Bloom filter is a very interesting data structure that can store elements in a fixed number of bits and tell you whether it’s seen those elements before when you query it. It sounds great for our use case, but let’s see if it will live up to the challenge.

A simple way to implement a full-text search engine that uses Bloom filters is the following:

  • Create one filter per document and add all the words in that document in the filter.
  • Serialize the (fixed-size) filter in some sort of string and send it to the client.
  • When the client needs to search, iterate through all the filters, looking for ones that match all the terms, and return the document names.
  • Profit!

Next, we’ll implement this very quickly in my favorite language, Python, using pybloom.

A quick implementation

To start with, let’s read some posts from this blog and create a list of all the words in each one:

from pybloom import BloomFilter
import os
import re

# Read all my posts.
posts = {post_name: open(POST_DIR + post_name).read()
         for post_name in os.listdir(POST_DIR)}

# Create a dictionary of {"post name": "lowercase word set"}.
split_posts = {name: set(re.split("\W+", contents.lower()))
               for name, contents in posts.items()}

At this point, we have a dictionary of posts and a normalized set of words in each. We could do more things, like stemming, removing common words (a, the, etc), but we’re going for naive, so let’s just create the filters for now:

filters = {}
for name, words in split_posts.items():
    filters[name] = BloomFilter(capacity=len(words), error_rate=0.1)
    for word in words:
        filters[name].add(word)

You can see above that the capacity of each filter is exactly the number of words in each post, to cut down on the number of bits needed to represent it. The error rate is tweakable, and it is the probability of a false positive. The lower the probability, the more accurate the filter, but the longer it becomes.

Now that we have all the filters ready, we can transmit them to the client using whatever serialization method we like. Let’s write a very simple search algorithm to find posts based on some search terms:

def search(search_string):
    search_terms = re.split("\W+", search_string)
    return [name for name, filter in filters.items()
            if all(term in filter for term in search_terms)]

All search() does is iterate through all the filters and return the ones that match every given term. Let’s try it out:

>>> search("android raspberry")
['2013-06-19 - how-remote-control-rf-devices-raspberry-pi.md',
 '2013-06-24 - writing-my-first-android-app-control-your-raspberr.md']

Judging by the titles, it found all the relevant posts, and without any false positives! Not bad at all, for a few minutes’ work! Let’s see what the average size of the filter is:

>>> sum(len(filter.bitarray.tobytes()) for filter in filters.values()) / len(filters)
298

298 bytes per post strikes me as a pretty reasonable size for something like this. We could decrease this further, if we didn’t mind more false positives, but, given the search results above, I think it’s pretty good for such a naive approach. For comparison, this paragraph is also 298 bytes long.

Strengths and weaknesses of this approach

Using a Bloom filter has a few advantages that make it suitable for use in static sites:

  • The space it takes up is proportional to the number of pages, rather than the number of words, so it takes up roughly the same space for a very long page as for a very short one (this isn’t exactly true because we size the filters depending on the number of words, but it’s much more compact than an inverted index).
  • Search complexity is proportional to the number of pages, rather than to their length. This doesn’t matter much when you have, at most, a few thousand pages, but it’s still good if you only have a few long ones.
  • Since Bloom filters are probabilistic, this method may produce false positives, but it will not produce false negatives, which is what we want for a search engine (we don’t mind a few irrelevant pages in the results, but we do mind if relevant ones are missing).

Naturally, it also has weaknesses, which include:

  • You can’t weight pages by relevance, since you don’t know how many times a word appears in a page, all you know is whether it appears or not. You may or may not care about this.
  • You need to implement a Bloom filter algorithm on the client side. This will probably not be much longer than the inverted index search algorithm, but it’s still probably a bit more complicated.

Of course, the full-text index will still be large and probably not practical to load on every page view, even when using this approach, but this method may be suitable for use in a dedicated “search” page in your static site.

I’m not actually advocating using this method on your static site, but, hey, it made for a fun hour of Saturday night hacking. Now, it’s time to go out for drinks. If you’re in Thessaloniki and want to join me, drop me an email or get me on Twitter.

Advanced Image Optimization Tricks


Comments:"Advanced Image Optimization Tricks"

URL:http://sixrevisions.com/web-development/advanced-image-optimization/



Dec 24 2013 by Rahul Mistry

You can use automated image optimization tools to compress your images. However, if you also take the time to manually optimize them, you can further improve your results. Here are five techniques for manually optimizing images.

Gaussian Blur JPEG Optimization

Gaussian blur softens the details of an image. In photo-editing, it’s typically used to enhance a photo’s quality or to give it an interesting visual effect.

However, if you only introduce a small amount of Gaussian blur to a photo — an amount that doesn’t alter its visual fidelity too much — you can lower its file size.

Demonstration

The following image is 60.9 KB:

We’ll open the image in Photoshop and then we will apply Filter > Blur > Gaussian Blur.

We then increase the Radius option until it starts to noticeably reduce the sharpness of the image. We then choose a value that’s visually acceptable to us.

After applying the Gaussian Blur filter, we then save our image in the normal manner.

Here is the optimized image:

The optimized image is 58.7 KB— a 3.6% decrease in file size.

Image Posterization

Posterization allows us to lower the file size of an image without harming the perceived image quality too much. Posterization works by converting continuous color gradients into non-continuous segments that require fewer colors to render.

Demonstration

In this demo, I will use a PNG image from a freebie:

The PNG image above is 51.0 KB.

I opened the PNG image in Photoshop to posterize it.

To posterize the image, go to Image > Adjustments > Posterize. In the Posterize dialog window, check the Preview option to see your edits in real-time. Set the Levels option to the lowest possible value you can get away with.

For my example, at a Levels value less than 76, the perceived image quality degradation is no longer acceptable to me.

After applying the image adjustment, we then just save the PNG as we normally would.

Below is the optimized image:

Because I was very aggressive with the posterization, the optimized image is only 37.6 KB— a 26.3% decrease in file size.

Further Reading

Pixel-fitting

Pixel-fitting is a useful technique for ensuring high-quality results for vector graphics that are converted to raster graphics.

Simple, non-photographic images such as icons and logos are best created as vector graphics because doing so allows us to scale them to different sizes without fidelity loss.

However, a problem often occurs when vector graphics are converted into static image formats (raster graphics) such as JPEG or PNG. When we use an image-editing software like Photoshop to automatically convert a vector graphic to a raster graphic, it tries to do its best to smooth out the edges — an automated process referred to as anti-aliasing.

The results of anti-aliasing vary, and often they are poor. In order to enhance the quality of the graphic, we can manually edit the pixels to make sure they fit inside the pixel grid. This is called pixel-fitting (or pixel hinting).

Source: dcurt.is

Using an image editor such as Photoshop, you can zoom into the vector graphic and then manually move its vector paths a bit until they fit perfectly inside the pixel grid before you save the vector as a raster:

Pixel-fitting only works for straight lines, so you will have to rely on anti-aliasing to display curves.

Further Reading on Pixel-fitting

8-pixel Grid JPEG Optimization

I came across this trick from Smashing Magazine’s article called Clever JPEG Optimization Techniques. In the same article, you will also find other useful tricks for optimizing JPEGs.

A JPEG image is divided into 8x8px blocks, and each block can be treated as its own entity.

By carefully aligning parts of the image within the 8x8px grid you can lower the file size of the image as well as get better image-quality results.

To demonstrate: I created two identical 8x8px square objects that I then saved in JPEG using a very high compression level (to make the difference more pronounced). The top square is not aligned inside the 8x8px grid.

Notice the quality difference and the extra pixels that are rendered for the one that isn’t aligned to the 8x8px grid.

This optimization trick is useful for JPEG images containing rectangular objects because you can easily fit them in a grid.

Further Reading

  • JPEG optimization. Part 1— Sergey Chikuyonok (author of the Smashing Magazine article mentioned above) discusses the 8x8px JPEG concept in this tutorial

Selective JPEG Compression

The way typical JPEG compression works is that a fixed level of compression is applied to the entire image.

In selective JPEG compression, we manually specify different compression levels for different areas of the image.

For example, we might want important areas of a photo to have a lower level of compression/higher-quality because we want to ensure that those areas look good. But then for other parts of the same image, like the photo’s background and low-detailed sections, we might be able to get away with a higher level of compression/lower-quality.

Demonstration

Selective JPEG compression can be done using Adobe Fireworks.

The photo below is compressed at a quality level of 80. Its file size is 54.0 KB.

Looking at the original photo, it appears that we can use selective image compression, particularly by increasing the compression/lowering the quality of the blue sky in the background and most of the black wires.

In Adobe Fireworks, we can mask the areas we want to protect. The masked area will have a higher quality level (80)/lower image compression. The rest of the image — the parts that are not masked — will get a lower quality level (60)/higher image compression.

We can use one of the Lasso tools (in my case, I used the Polygon Lasso tool) to place a marquee selection around parts we want to protect.

Once you are done selecting around the high-quality areas, go to Modify > Selective JPEG > Save Selection as JPEG Mask.

The parts of the image that will have a quality level of 80 will now be highlighted:

In the Optimize panel, lower the Quality option to 60 and set the Selective quality option to 80. (If you can’t see the Optimize panel, make sure Window > Optimize is checked.)

Then just go to File > Save as to save the original image as a JPEG.

The image shown below uses selective compression. It’s 50.2 KB– a 7.0% decrease in file size versus the non-selective compression I showed you earlier.

You will have to play around with the selective compression settings and masking in order to get your desired results. In the example above, detail-oriented folks will notice a huge difference between the two images. However, the results of the optimization might be all right by most people’s standards.

Selective JPEG compression is very time-consuming and the file size reduction is only slight in most instances. It’s impractical if you’re dealing with a lot of photos. However, if you are really concerned about optimizing image quality and image file size, this is one option.

Further Reading

Conclusion

There are simpler ways to optimize an image. Just using automated tools such as Photoshop’s Save for Web & Devices command and lossless compression tools like Smush.it can greatly reduce your image file sizes.

However, if you’re looking for finer image optimization control and even more file size reductions, try out the tricks above. An ideal workflow would be to use a lossless compression tool like Kraken.io or Smush.it, which will remove a big chunk of your image’s file size without affecting its quality. And then you can use the appropriate tricks discussed above to fine-tune your results.

Related Content

About the Author

Rahul Mistry is a web design enthusiast and writer for Heart Internet. Connect with Rahul on Twitter @mistry213 and Google+.
