Channel: Hacker News 50

The Saddest SaaS Pricing Pages of the Year

Comments: "The Saddest SaaS Pricing Pages of the Year"

URL: http://blog.priceintelligently.com/blog/bid/192297/The-Saddest-SaaS-Pricing-Pages-of-the-Year


The Saddest SaaS Pricing Pages of the Year

Discuss on Hacker News here. 

Your pricing page is the center of your universe. Everything you do, from your marketing and sales to your product development and support, works to drive people to that page, convert them, and keep them coming back. 

Yea. It’s that important. 

We’ve documented some amazing SaaS pricing pages in the past, chiefly in our annual SaaS pricing page pageant, but we decided to point out some pages that need a good New Year’s resolution. Many of the following pages have a lot going for them and come from phenomenally great companies (they’re clearly doing something right). Yet, they’ve fallen short in two key areas: design clarity and simplification. Let’s look at these two big points, apply them to some examples, and get you on your way to creating a better page. 

Overcoming the “Too Many Check Marks” Problem

If you’re a product person, I reckon you’re pretty proud of every single feature that you put out to the world (at least in theory). If you weren’t, then why the heck would you build them, right? You’re a maestro of the command line and the right customer will appreciate every subtle detail you’ve pushed live. As a result, you need your page to look something like Qualaroo's pricing. Look at all those beautiful checkmarks. Over 50% of the features displayed are included in every plan.

Qualaroo's Pricing 


You’re making the purchasing decision 10x more difficult

Here’s the thing though: most customers don’t care about every detail; they only care about what’s important to them. Including every single checkmark on the page limits your prospect’s ability to make a quick decision between your tiers. You want them to place themselves in a bucket without having to wade through rows and rows of features trying to figure out what’s core to the product and what’s included in each plan. You’re also allowing doubt to slip into their mind with questions of, “wait, is that feature I was marketed included in this plan or not?”

This is why pricing pages like UserVoice's pricing are so effective. Customers come in, see exactly what’s differentiated between the plans, and can make a quick decision. UserVoice even gives them an option to see the floodgate of features, but only if they want to click through.

UserVoice's Pricing


You’re not allowing yourself to market these core features effectively

Calling something “Deeper Analytics” and putting a small description below the tag or in a hover-over question mark isn’t effective product marketing. It’s deeper analytics; its value goes well beyond a tiny blurb. Buffer falls prey to this crevasse with their Buffer for Business product by having a beautifully built site and page, but taking up 70% of their pricing matrix with features that are included in each plan. How slick would it be if they eliminated this block of checkmarks and created a section that said, “All business plans include:” followed by a product-marketing-rich overview of each of those features? 

A phenomenal example of this style of marketing is Wistia’s pricing page. They keep it so simple by including all features with all plans (and marketing accordingly), displaying their differentiation only along their value metrics of videos and bandwidth.  

Buffer for Business Pricing 

Wistia's Pricing


You’re making it harder to upsell me

On the eighth day when the powers that be created pricing pages they realized the beauty that the tiered structure had on anchoring and upselling customers. Suddenly, customers who were typically basic customers saw four plans and thought, “well, I’m not basic, but I’m not enterprise, so I guess I land in the second plan.” Boom. MRR boosted. 

Here’s the problem though: confusing pricing pages not only hurt the decision making process, they destroy this upgrade potential. Look at DocuSign’s page. Great company, but there are more checkmarks here than I know what to do with, leading me to just pick the cheapest option out of frustration, rather than choice. 

Docusign's Pricing

The “We only have one customer” or “We have so many types of really specific customers” problem

Products start out as problems that real individuals have in their lives. The beauty of developing a SaaS product remains in your ability to quickly and nimbly respond to those individuals’ needs. This malleability in feature prioritization is a blessing and a curse though, as you could theoretically build something for every group of people on the planet (or build something for no one on the planet). Your product and your business therefore need to start with identifying and quantifying your customer personas.

Your business then becomes a game of cloning your customers, focusing all of your efforts like a laser set out on world domination. Yet, far too often we see companies, both successful and struggling, that haven’t homed in on this key step in the process. 

This is easy to spot in pricing pages that are either too simple with minimal tiers or too complicated with convoluted or too many tiers. Both ends of the spectrum result in cash being left on the table and even more confusion for customers coming through the door. 

Too few tiers leaves cash on the table

Take Hootsuite's pricing for example, a killer company with over 8 million users. The problem is they only have one main tier along with a simpler freemium plan and a “contact us” enterprise plan. You mean to tell me that in eight million users there’s only one main persona? We’d buy this if the product were much simpler, but this page alone has sixteen points of differentiation, and I’m sure there are more features not even mentioned that could be helpful. What’s worse is that this page suffers from some of the simplification problems discussed above, as well. It’s sad to see so much potential cash being left on the table. 

You need to make sure you quantify your customer personas and ensure your pricing tiers align to those personas: one tier for each persona, with each tier mutually exclusive of the others. 

Hootsuite's Pricing


Too much differentiation is just as bad

A lot of companies miss out on sales opportunities by utilizing too much differentiation or too many plans, as well. The result is mainly the dissonance in the mind of the customer we discussed above, but look at OneLogin and their pricing page for example. There’s so much going on with this page that, as a customer, I’m not really sure where I fit in initially. The plan labels do help, but as I dig into the features I get more confused. They’ve over-optimized the differentiation between their plans. 

OneLogin's Pricing

Dyn has a similar problem with their transactional email pricing, where the email send thresholds and the feature differentiation lead to a lot of confusion (although some of this could be cleared up by stylistically getting rid of those “X”s).

Dyn's Transactional Email Pricing

Keep Your Pricing Page Simple and Focused on the Customer

We say it all the time, but it bears repeating: your pricing, just like your product, marketing, sales, and the like, all starts with your customer. Check out some of our other posts on SaaS pricing pages (including our SaaS pricing page pageant), as well as the big study we did on the top 270 SaaS pricing pages to learn more about constructing your page and your pricing strategy as a whole. 

Discuss on Hacker News here. 


Improve Your Python: Metaclasses and Dynamic Classes With Type

Comments: "Improve Your Python: Metaclasses and Dynamic Classes With Type"

URL: http://www.jeffknupp.com/blog/2013/12/28/improve-your-python-metaclasses-and-dynamic-classes-with-type/


Metaclasses and the type built-in are each examples of little-used (and, thus, not well understood by most) Python constructs. In this article, we'll explore the different, erm, "types" of type() and how the little-known use of type relates to metaclasses.

Are You My Type?

The first use of type() is the most widely known and used: to determine the type of an object. Here, Python novices commonly interrupt and say, "But I thought Python didn't have types!" On the contrary, everything in Python has a type (even the types!) because everything is an object. Let's look at a few examples:

>>> type(1)
<class 'int'>
>>> type('foo')
<class 'str'>
>>> type(3.0)
<class 'float'>
>>> type(float)
<class 'type'>

The type of type

Everything is as expected, until we check the type of float. <class 'type'>? What is that? Well, odd, but let's continue:

>>> class Foo(object):
...     pass
...
>>> type(Foo)
<class 'type'>

Ah! <class 'type'> again. Apparently the type of all classes themselves is type (regardless of whether they're built-in or user-defined). What about the type of type itself?

>>> type(type)
<class 'type'>

Well, it had to end somewhere. type is the type of all types, including itself. In actuality, type is a metaclass, or "a thing that builds classes". Classes, like list(), build instances of that class, as in my_list = list(). In the same way, metaclasses build types, like Foo in:

class Foo(object):
    pass

Roll Your Own Metaclass

Just like regular classes, metaclasses can be user-defined. To use it, you set a class's __metaclass__ attribute to the metaclass you built. A metaclass can be any callable, as long as it returns a type. Usually, you'll assign a class's __metaclass__ to a function that, at some point, uses a variant of type we've not yet discussed: the three parameter variety used to create classes.
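
To make that concrete, here is a minimal sketch in the Python 2 style the article assumes (it relies on the __metaclass__ attribute); the logging_metaclass name is just an illustration, not anything from the article:

def logging_metaclass(name, bases, attrs):
    # Any callable works as a metaclass, as long as it returns a type.
    print('Creating class %s' % name)
    return type(name, bases, attrs)

class Foo(object):
    __metaclass__ = logging_metaclass  # Python 2 syntax; Python 3 spells this `class Foo(metaclass=...)`

Instantiating Foo afterwards behaves exactly like any other class; the metaclass only runs once, at class-creation time.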

The Darker Side of type

As mentioned, it turns out that type has a totally separate use, when called with three arguments. type(name, bases, dict) creates a new type, programmatically. If we had the following code:

class Foo(object):
    pass

We could achieve the exact same effect with the following:

Foo = type('Foo', (), {})

Foo is now referencing a class named "Foo", whose base class is object (classes created with type, if specified without a base class, are automatically made new-style classes).
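
As a quick check (not from the article) that the two spellings really are interchangeable:

class FooA(object):
    pass

FooB = type('FooB', (), {})

print(type(FooA) is type(FooB) is type)  # True: both classes were built by type
print(FooB.__bases__)                    # (<class 'object'>,) -- object is filled in as the default base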

That's all well and good, but what if we want to add member functions to Foo? This is easily achieved by setting attributes of Foo, like so:

def always_false(self):
    return False

Foo.always_false = always_false

We could have done it all in one go with the following:

Foo = type('Foo', (), {'always_false': always_false})

Of course, the bases parameter is a tuple of base classes of Foo. We've been leaving it empty, but it's perfectly valid to create a new class derived from Foo, again using type to create it:

FooBar = type('FooBar', (Foo,), {})

When Is This Ever Useful?

Once explained to someone, type and metaclasses are one of those topics where the very next question is, "OK, so when would I use it?". The answer is, not very often at all. However, there are times when creating classes dynamically with type is the appropriate solution. Let's take a look at an example.

sandman is a library I wrote to automatically generate a REST API and web-based admin interface for existing databases (without requiring any boilerplate code). Much of the heavy lifting is done by SQLAlchemy, an ORM framework.

There is only one way to register a database table with SQLAlchemy: create a Model class describing the table (not unlike Django's models). To get SQLAlchemy to recognize a table, a class for that table must be created in some way. Since sandman doesn't have any advanced knowledge of the database structure, it can't rely on pre-made model classes to register tables. Rather, it needs to introspect the database and create these classes on the fly. Sound familiar? Any time you're creating new classes dynamically, type is the correct/only choice.

Here's the relevant code from sandman:

if not current_app.endpoint_classes:
    db.metadata.reflect(bind=db.engine)
    for name in db.metadata.tables():
        cls = type(str(name), (sandman_model, db.Model),
                   {'__tablename__': name})
        register(cls)

As you can see, if the user has not manually created a model class for a table, it is automatically created with a __tablename__ attribute set to the name of the table (used by SQLAlchemy to match tables to classes).

In Summary

In this article, we discussed the two uses of type, metaclasses, and when the alternate use of type is required. Although metaclasses are a somewhat confusing concept, hopefully you now have a good base off of which you can build through further study.


Hyperloop: Not so fast! | Guy and Seth on Simulink

Comments: "Hyperloop: Not so fast! | Guy and Seth on Simulink"

URL: http://blogs.mathworks.com/seth/2013/11/22/hyperloop-not-so-fast/


November 22nd, 2013

Hyperloop: Not so fast!

This week Matt Brauer is back to describe work he did to analyze the trajectory of the Hyperloop proposal. This is a key input to the Hyperloop Simulink models we are building.

Matt's conclusion is surprising:
From the perspective of geographic constraints and rider comfort, the 760 mph peak speed is not an issue. It’s the 300 mph section through the suburbs of San Francisco that requires closer consideration.

Where is the Hyperloop going?

As mentioned in a previous blog post, we’ve begun implementing Simulink models of the Hyperloop concept. To exercise those models, a core piece of information is the trajectory. We need to know where the Hyperloop is going and at what speed. I analyzed the proposal to derive this information, and what I found was interesting. From the perspective of geographic constraints and rider comfort, the 760 mph peak speed is not an issue. It’s the 300 mph section through the suburbs of San Francisco that requires closer consideration.


Potential Hyperloop route, including image from http://www.spacex.com/sites/spacex/files/hyperloop_alpha-20130812.pdf and Google Earth

The basic idea put forth in the proposal is to follow existing highways as much as possible. In order to limit lateral accelerations experienced by the passengers to 0.5g, there need to be “minor deviations when the highway makes a sharp turn”. I remember from Freshman Physics class that centripetal acceleration in a curve is proportional to the square of velocity. This means that increasing your velocity from 76 to 760 mph is actually a 100x multiplication of g forces.
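
A quick back-of-the-envelope check of that scaling (my own numbers, not from the post), using a = v^2 / r and the proposal's 0.5g comfort limit:

g = 9.81
for mph in (76, 300, 760):
    v = mph * 0.44704                   # convert mph to m/s
    r_min = v ** 2 / (0.5 * g)          # minimum curve radius at the 0.5g limit, in metres
    print("%4d mph -> minimum curve radius of roughly %.1f km" % (mph, r_min / 1000))

At 760 mph the minimum radius comes out to roughly 23 km, and at 300 mph to roughly 3.7 km, which is why even the slower suburban section can be the harder routing problem when the road is twisty.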

Is it really possible to average 600 mph across the state of California on available land without making passengers nauseous? Or would a reasonable trajectory require wide loops, encroaching on private citizens’ backyards? I put the technical computing power of MATLAB to work on answering these questions.

Technical Computing of a Route

I won’t go into all the details because this is a Simulink blog, but here is an overview of the steps I followed:

  • First, I used Google Earth to get a set of longitude and latitude points along the California highways I-5 and I-580. After getting directions, I saved a KML file with the data.
  • I used the read_kml submission from the MATLAB Central File Exchange to import the data in MATLAB.
  • It is possible to use functions like wmsfind, wmsupdate and wmsread from the Mapping Toolbox to add topography data to the route and surrounding area. However, I only used this information for plotting. For this first study, the derived route is only 2-dimensional.
  • This gave me a set of discrete points along the highways, to which I associated a desired velocity based on the Hyperloop document.
  • From these target points, I created a smoothed trajectory using the fit function from the Curve Fitting Toolbox.
  • Using this smoothed trajectory, I wrote a simple script to calculate and plot the lateral acceleration along this trajectory.
  • For the portions of the trajectory that were exceeding 0.5g's, I adjusted some target points to drive lateral accelerations below 0.5g’s while staying as close as possible to the original route.
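
For anyone who wants to experiment with the same idea without the MATLAB toolboxes, here is a rough Python/NumPy sketch of the core of that pipeline; the synthetic points stand in for the KML data and the numbers are illustrative, not Matt's:

import numpy as np

# Synthetic stand-in for the smoothed route: x, y in metres along ~50 km of track.
s = np.linspace(0, 50000, 2001)            # distance along the path
x = s
y = 500.0 * np.sin(s / 5000.0)             # gentle synthetic curves

v = 300 * 0.44704                          # assumed constant speed: 300 mph in m/s

# Curvature of a parametric curve: k = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)
dx, dy = np.gradient(x, s), np.gradient(y, s)
ddx, ddy = np.gradient(dx, s), np.gradient(dy, s)
curvature = np.abs(dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

lateral_g = curvature * v ** 2 / 9.81      # a_lat = v^2 / r = v^2 * k
print("max lateral acceleration: %.2f g" % lateral_g.max())

The real workflow would replace the synthetic x and y with the projected KML points, fit a spline through them, and then nudge the target points wherever lateral_g creeps above 0.5.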

In the end, I got the results below:


Velocity and Acceleration along derived Hyperloop route (created using the Mapping Toolbox)

You can see that the travel time is very close to the advertised 35 minutes. The accelerations were kept within reason, although it looks like a pretty dynamic ride.

How much does the Hyperloop need to deviate from the existing highways in order to achieve these results?

The red line shows the targeted highways. The yellow portions show sections of the derived Hyperloop route that deviate from the mean highway route by more than 50m. The yellow, "off-highway", portions account for 113 miles of the 347 mile trip.

To review in depth, I wrote the trajectory back to KML using kmlwriteline from the Mapping Toolbox. Now it's possible to view the derived route in Google Earth.


No major issues seen along I-5 (Image created using Google Earth)

Again, the red line is the mean highway route. Here, the yellow line shows the complete derived Hyperloop route. Upon reviewing the details, the route seems pretty reasonable up until I-580 outside of the bay area.

When I-580 turns west and starts going through the San Francisco suburbs, it becomes difficult to stay on the road. Here’s a snapshot of a particularly rough curve:


Re-routing along I-580 to avoid excessive g-forces (Image created using Google Earth)

Below is the route data for the area surrounding the above curve.

Despite the issues highlighted above, the final conclusion is quite positive. It seems that the first 295 miles of the route can be accomplished in about 27 minutes without excessive g’s on the passengers or encroaching on private property. “Landing” in the Bay Area may take a bit longer than originally advertised, but that seems like splitting hairs at this point. Further investigation is certainly warranted. So, let’s get those Simulink models running!

Now it's your turn

What do you think? Is the Hyperloop going in the right direction?

Given Elon Musk's statement that the Hyperloop should be an open design concept, would you be interested in using Matt's trajectory and beginning to fill in some of the boxes in our Hyperloop model architectures?

Let us know what you think by leaving a comment here.

By Guy Rouleau



phinze/homebrew-cask · GitHub

Comments: "phinze/homebrew-cask · GitHub"

URL: https://github.com/phinze/homebrew-cask


"To install, drag this icon..." no more!

Let's see if we can get the elegance, simplicity, and speed of Homebrew for the installation and management of GUI Mac applications such as Google Chrome and Adium.

brew-cask provides a friendly homebrew-style CLI workflow for the administration of Mac applications distributed as binaries.

It's implemented as a homebrew "external command" called cask.

Let's try it!

$ brew tap phinze/cask
$ brew install brew-cask
$ brew cask install google-chrome
=> Downloading https://dl.google.com/chrome/mac/stable/GGRO/googlechrome.dmg
=> Success! google-chrome installed to /opt/homebrew-cask/Caskroom/google-chrome/stable-channel
=> Linking Google Chrome.app to /Users/phinze/Applications/Google Chrome.app

And there we have it. Google Chrome installed with a few quick commands: no clicking, no dragging, no dropping.

open ~/Applications/"Google Chrome.app"

Learn More

  • Find basic documentation on using homebrew-cask in USAGE.md
  • Want to contribute? Awesome! See CONTRIBUTING.md
  • More project-related details and discussion are available in FAQ.md

Questions? Wanna chat?

We're really rather friendly! Here are the best places to talk about the project:

  • Start an issue on GitHub
  • Join us on IRC, we're at #homebrew-cask on Freenode

License:

Code is under the BSD 2 Clause (NetBSD) license



Ocrad.js - Optical Character Recognition in Javascript

Comments: "Ocrad.js - Optical Character Recognition in Javascript"

URL: http://antimatter15.com/wp/2013/12/ocrad-js-pure-javascript-ocr-via-emscripten/


Ocrad.js 

Optical Character Recognition in JS

Ocrad.js is a pure-javascript version of the Ocrad project, automatically converted using Emscripten. It is a simple OCR (Optical Character Recognition) program that can convert scanned images of text back into text. Clocking in at about a megabyte of Javascript with no hefty training data dependencies (looking at you, Tesseract), it's on the lighter end of the spectrum.

This was made by antimatter15 (please follow me on Twitter or G+)

Below is a simple demo, which should hopefully demonstrate the capabilities but will more likely show the substantial limitations of the library. Hit the buttons on the left to reset the canvas or to randomly put some text in a random font. You can also try to draw something with your cursor.

You can also drag and drop an image from your computer (JPEG, PNG, GIF, BMP, SVG, or NetPBM) to feed into the text recognizer or choose a file by clicking anywhere on this box.

The Ocrad.js API is really simple. First you need to include ocrad.js which is about 1MB in size.

<script src="ocrad.js"></script>

This file exposes a single global function, OCRAD, which takes an image as an argument and returns the recognized text as a string.

var string = OCRAD(image);
alert(string);

The image argument can be a canvas element, a Context2D instance, or an instance of ImageData.
Ocrad.js also exposes all of the C library functions in addition to the extremely simple high level API covered in the last section. By calling OCRAD.version() you can find the current version string: 0.23-pre1, the latest pre-release version of the software available.

string OCRAD.version()
void OCRAD.write_file(string filename, Uint8Array PBM Image Data)
pointer OCRAD.open()
int OCRAD.close(pointer descriptor)
(and more!)

What about GOCR.js? If you stumbled upon that page first, you might have realized that this entire page is a heinous act of plagiarism probably worthy of academic suspension— if not for the fact that I made that other page as well. It turns out that porting things with Emscripten is just so gosh-darned easy and addictive (don't tell the DEA I don't have a permscription). The neat thing about GOCR is that it compiles under Emscripten without any modifications, whereas Ocrad has some dependency issues.

Unlike GOCR.js, Ocrad.js is designed as a port of the library, rather than a wrapper around the executable. This means that processing subsequent images doesn't involve reinitializing an executable, so processing an image can be done in as little as an eighth of the time it takes GOCR.js to do the same (The fact that Ocrad is naturally faster than GOCR doesn't hurt this statistic either).

With a simple script which generates some text in some random font selected from the Google Font API, I ran a few hundred trials comparing the recognized text with the original text. The metric was a modified Levenshtein distance where the substitution cost for capitalization errors is 0.1. Of 485 trials, 175 trials ended up favoring GOCR.js, 184 favoring Ocrad.js, and 126 resulted in a tie. From playing with the draw tool, it seems that Ocrad is much more predictable and forgiving for minor alignment and orientation errors.
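
For the curious, here is a small Python sketch of a metric along those lines (my own reconstruction of the idea, not the script actually used): ordinary Levenshtein distance, except that a substitution which only changes case costs 0.1 instead of 1.

def modified_levenshtein(a, b):
    # Standard dynamic-programming Levenshtein, with a reduced substitution
    # cost when two characters differ only in capitalization.
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                sub = 0.0
            elif ca.lower() == cb.lower():
                sub = 0.1                        # capitalization-only error
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + sub))   # substitution
        prev = cur
    return prev[-1]

print(modified_levenshtein("Hello World", "hello world"))  # 0.2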

There have been some other comparisons on the performance of OCRAD versus GOCR. In this comparison done by Peter Selinger, Ocrad comes out just behind Tesseract. Another comparison by Andreas Gohr has GOCR performing better than Ocrad.

Accuracy wise, they're actually pretty close. It might be possible to create something which meshes together the outputs of both, picking whichever output matches a certain heuristic for quality. Ocrad does seem to vastly outperform GOCR when it comes to letter sketches on a canvas, so that's the one I'm focusing on here.

Aside from the relentless march of Atwood's law, there are legitimate applications which might benefit from client side OCR (I'd like to think that I'm currently working on one, and no, it's not solving the wavy squiggly letters blockading your attempts at building a spam empire). Arguably, it'd be best to go for porting the best possible open source OCR engine in existence (looking at you, Tesseract). Unlike OCRAD and GOCR, which interestingly seem to be powered by painstakingly written rules for each recognizable glyph, Tesseract uses neural networks and the like to learn features common to different letters (which means it's extensible and multilingual). When you include the training data, Tesseract is actually kind of massive: a functional Emscripten port would probably be at least 30 times the size of OCRAD.js!

Apple Acquires Rapid-Fire Camera App Developer SnappyLabs | TechCrunch

Comments: "Apple Acquires Rapid-Fire Camera App Developer SnappyLabs | TechCrunch"

URL: http://techcrunch.com/2014/01/04/snappylabs/


Apple has acquired the one-man photo technology startup SnappyLabs, maker of SnappyCam, sources tell me. The startup was founded and run solely by John Papandriopoulos, an electrical engineering PhD from the University Of Melbourne who invented a way to make the iPhone’s camera take full-resolution photos at 20 to 30 frames per second — significantly faster than Apple’s native iPhone camera.

I first noticed something was up when we got tipped off that SnappyCam had disappeared from the App Store and all of SnappyLabs’ websites went blank. Sources have since affirmed that the company was acquired by Apple, and that there was also acquisition interest “from most of the usual players”, meaning other tech giants. I don’t have details on the terms of the deal, and I’m awaiting a response from Apple, which has not confirmed the acquisition.

But based on Papandriopoulos’ scientific breakthroughs in photography technology, it’s not hard to see why Apple would want to bring him in to help improve their cameras. The strategic acquisition of an extremely lean, hard-technology-focused team (of one) fits with Apple’s MO. It typically buys smaller teams to work on specific products rather than buying big staffs and trying to blend them in across the company.

Papandriopoulos built his burst-mode photo technology into SnappyCam, which he sold in the Apple App Store for $1. After I profiled the app in July, Papandriopoulos told me SnappyCam jumped to #1 on the paid app chart in nine countries. Sales of the app let him run SnappyLabs without big funding from venture capital firms.

Back in July, Papandriopoulos told me he had a eureka moment in “discrete cosine transform JPG science” and had essentially reinvented the JPG image format. In a blog post now taken down, the SnappyLabs founder explained:

“First we studied the fast discrete cosine transform (DCT) algorithms…We then extended some of that research to create a new algorithm that’s a good fit for the ARM NEON SIMD co-processor instruction set architecture. The final implementation comprises nearly 10,000 lines of hand-tuned assembly code, and over 20,000 lines of low-level C code. (In comparison, the SnappyCam app comprises almost 50,000 lines of Objective C code.) JPEG compression comprises two parts: the DCT (above), and a lossless Huffman compression stage that forms a compact JPEG file. Having developed a blazing fast DCT implementation, Huffman then became a bottleneck. We innovated on that portion with tight hand-tuned assembly code that leverages special features of the ARM processor instruction set to make it as fast as possible.”

By bringing Papandriopoulos in-house, Apple could build this technology and more into its iPhone, iPad, Mac, and MacBook cameras. Photography is a core use for smartphones, and offering high-resolution, rapid-fire burst mode shooting could become a selling point for iPhones over competing phones.

And in case you were wondering if Papandriopoulos will be a good fit at Apple, he once dressed as an iPhone at a San Francisco parade.

For more on SnappyLabs, read my profile of the startup.

Largest small system emulator

Comments: "Largest small system emulator"

URL: http://ioccc.org/2013/cable3/hint.html


Adrian Cable
adrian.cable@gmail.com

Judges' comments:

To build:

make cable3

To run:

./cable3 bios-image-file floppy-image-file [harddisk-image-file]

Try:

./runme

Selected Judges Remarks:

This entry weighs in at a magical 4043 bytes (8086 nibbles, 28,301 bits). It manages to implement most of the hardware in a 1980’s era IBM-PC using a few hundred fewer bits than the total number of transistors used to implement the original 8086 CPU.

If you are using OS X, the included sc-ioccc.terminal configuration file will correctly display console applications that use ANSI graphics.

Author’s comments:

The author hereby presents, for the delectation (?) of the judges, a portable PC emulator/VM written specifically for the IOCCC which runs DOS, Windows 3.0, Excel, MS Flight Simulator, AutoCAD, Lotus 1-2-3 …

In just 4043 bytes of C source, you get a complete mid-late 1980s-era IBM-compatible PC, consisting of:

  • Intel 8086/186 CPU
  • 1MB RAM
  • 8272A 3.5" floppy disk controller (1.44MB/720KB)
  • Fixed disk controller (supports a single hard drive up to 528MB)
  • Hercules graphics card with 720x348 2-color graphics (64KB video RAM), and CGA 80x25 16-color text mode support
  • 8253 programmable interval timer (PIT)
  • 8259 programmable interrupt controller (PIC)
  • 8042 keyboard controller with 83-key XT-style keyboard
  • MC146818 real-time clock
  • PC speaker

The emulator uses the SDL graphics library for portability, and compiles for Windows, Mac OS X, Linux and probably most other 32-bit/64-bit systems too.

If you like living on the edge you can try building the emulator on a big endian machine, and you will get an emulation of a big endian 8086, a rather bizarre and somewhat useless beast. For everyone else, please run the emulator on a little endian machine.

RULE 2 ABUSE DISCLAIMER

  • cable3.c is 4043 bytes in length (half an 8086)
  • iocccsize -i < cable3.c returns 1979 (the year the first 8086-based computer, the Seattle Computer Products SCP200B, was released)
  • Therefore, any suspicions the judges may have regarding rule 2 non-compliance may be well-intentioned but are groundless.
  • Nonetheless, the author would like to apologise to the judges for the one-big-block-of-code nature of this entry, which turned out to be unavoidable. Hopefully the joys of this entry will make up for its shortcomings.

/RULE 2 ABUSE DISCLAIMER

Why is this entry obfuscated/interesting?

  • First of all the 8086 is a nightmare processor to emulate. Instruction codings are complex and irregular in size and structure, with multiple addressing modes and no consistent memory placement for operands, very often multiple possible encodings for the same instruction, and the bizarre segment:offset memory model. In addition, the 8086 has a number of bugs (e.g. PUSH SP), undocumented behaviours and instructions (e.g. AAM/AAD + imm8, SALC, flag behaviour for MUL/DIV, etc. etc.), and archaic features (e.g. parity/auxiliary flags) which all need to be emulated properly. Here we emulate every feature of the CPU pretty exactly (in fact better than most commercial clones of the processor e.g. the NEC V30), with the exception of the trap flag which no real software uses except for debuggers (although support for the TF can be added if deemed important, without exceeding the IOCCC size limit).
  • In addition to the CPU we also emulate all the standard PC peripheral hardware. Parts of it are somewhat complete, much is rather dysfunctional (like the 8253/8259) but enough to support most real software.
  • Nonetheless the source, while large for the IOCCC, is rather tiny for a functional PC emulator - around 2% of the size of the source of other open-source 8086 PC emulators with comparable functionality.
  • In fact, this entry is just like happy hour - at 4043 bytes long, you pay for half the 8086, but you get served the whole one.
  • Getting the code down to <= 4096 characters to meet the IOCCC rule 2 overall size limit required a good deal of effort (and appreciable risk of divorce), with obfuscation being a pretty unavoidable consequence in most places … for the rest, I use short circuit operators whenever possible, mix x[y] and y[x], don’t use any flow control keywords except for for, use K&R-style declarations to honor the C language’s rich cultural heritage (plus it saves a byte), and occasionally go overboard with nasty nested indexes to give things like this: –64[T=1[O=32[L=(X=*Y&7)&1,o=X/2&1,l]=0,t=(c=y)&7,a=c/8&7,Y]>>6,g=~-T?y:(n)y,d=BX=y,l]
  • This entry highlights the importance of comments in C.
  • This entry might result in an adjustment to the IOCCC size tool for the 2014 competition (see above).

Compiling on different platforms

This entry has been tested on Windows (compiled with MS Visual Studio 2010), Mac OS X (clang and gcc), and Linux (clang and gcc). The Makefile supplied is good for Mac OS X, Linux and probably other UNIXes. You will need to adjust the Makefile if your system lacks sdl-config to correctly point to the SDL libraries and header files. The entry should also compile unchanged on Android tablets/phones but this hasn’t been tested.

On UNIX-based systems we can get raw keystrokes using stty. However Windows has no stty. Therefore the Makefile includes a -D entry to define a “keyboard driver” KB which as it stands is suitable for UNIXes, but maybe not non-UNIX platforms. For example, for Windows/MS Visual Studio, instead of the Makefile definition of KB, use something slightly different - add the following entry to the Preprocessor Definitions list in the Project Properties page:

KB=(kb=H(8),kbhit())&&(r[1190]=getch(),H(7))

Usage

./cable3 bios-image-file floppy-image-file [harddisk-image-file]

PLEASE NOTE that under UNIXes the keyboard must be in raw mode for the emulator to work properly. Therefore the emulator is best run from a shell script that looks something like:

stty cbreak raw -echo min 0
./cable3 bios floppy.img harddisk.img
stty cooked echo

To run the emulator - floppy mode only

The simplest use of the emulator is with a single floppy boot disk image, like Dos6.22.img, provided.

Before running the emulator on a Unix-type system, stty needs to be used to put the keyboard into raw mode (and afterwards it needs to be put back to cooked). So, run the emulator using something like this script (provided as “runme”):

stty cbreak raw -echo min 0
./cable3 bios Dos6.22.img
stty cooked echo

To run the emulator - floppy + HD mode

Easiest to start with is to try a ready-made 40MB hard disk image containing a whole bunch of software:

http://bitly.com/1bU8URK

For the more adventurous, you can start off with (for example) a blank 40MB image file called hd.img made using e.g. mkfile. Then use:

stty cbreak raw -echo min 0
./cable3 bios Dos6.22.img hd.img
stty cooked echo

Preparing the hard disk for use in the emulator is done just like a real PC. Boot the emulator, and use FDISK to partition the hard disk. When it’s done FDISK will reboot the emulator. Then you can use FORMAT C: and you are done. The resulting disk image is in the right format to be mounted on a real Windows PC using e.g. OSFMount, on a Mac using hdiutil, or on Linux using mount, providing an easy way to copy files and programs to and from the disk image. Or, you can install programs from regular floppy disk images (see “Floppy disk support” below).

Keyboard emulation

The emulator simulates an XT-style keyboard controlled by an Intel 8042 chip on I/O port 0x60, generating IRQ1 and then interrupt 9 on each keypress. This is harder than it sounds because a real 8042 returns scan codes rather than the ASCII characters which the C standard I/O functions return. Rather than make the emulator less portable and use ioctl or platform-dependent equivalents to obtain real scan codes from the keyboard, the emulator BIOS does the reverse of a real PC BIOS and converts ASCII characters to scancodes, simulating press/release of the modifier keys (e.g. shift) as necessary to work like a “real” keyboard. The OS (DOS/Windows) then converts them back to ASCII characters and normally this process works seamlessly (although don’t be surprised if there are issues, for example, with non-QWERTY e.g. international keyboards).

Most of the time you can just type normally, but there are special sequences to get Alt+xxx and Fxxx.

To send an Alt+XXX key combination, press Ctrl+A then the key, so for example to type Alt+F, press Ctrl+A then F.

To send an Fxx key, press Ctrl+F then a number key. For example, to get the F4 key, press Ctrl+F then 4. To get F10, press Ctrl+F then 0.

To send a Page Down key, press Ctrl+F then O. To send a Page Up key, press Ctrl+F then E. Other key combinations are left for the discovery of the user.

Text mode support

The emulator supports both text output via the standard BIOS interrupt 0x10 interface, and also direct video memory access (one page, 4KB video RAM at segment B800) in 80x25 CGA 16-color text mode.

BIOS text output calls are converted to simple writes to stdout. Direct video memory accesses for the 80x25 CGA color text mode are converted to ANSI terminal escape sequences. If you are using a terminal which does not support ANSI (e.g. you are compiling the emulator with MS VC++ and running in a Windows console window) then PC applications that directly write to video memory in text mode may be unusable.
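
As a purely illustrative sketch (in Python, not the entry's C or BIOS code), this is the kind of mapping that translation involves: a CGA attribute byte packs a 4-bit foreground, a 3-bit background and a blink bit, and the colour indices have to be remapped because CGA and ANSI order them differently.

CGA_TO_ANSI = [0, 4, 2, 6, 1, 5, 3, 7]   # CGA has blue=1/red=4, ANSI has red=1/blue=4

def cga_attr_to_ansi(attr):
    fg = attr & 0x0F             # low nibble: foreground colour, bit 3 = bright
    bg = (attr >> 4) & 0x07      # bits 4-6: background colour
    blink = attr >> 7            # bit 7: blink
    parts = []
    if fg & 0x08:
        parts.append('1')        # render "bright" foreground as bold
    if blink:
        parts.append('5')
    parts.append(str(30 + CGA_TO_ANSI[fg & 0x07]))
    parts.append(str(40 + CGA_TO_ANSI[bg]))
    return '\x1b[0;' + ';'.join(parts) + 'm'

print(cga_attr_to_ansi(0x1E) + 'bright yellow on blue' + '\x1b[0m')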

Most CGA I/O ports are not supported except for the CGA refresh register at 0x3DA, which some applications use for timing or synchronisation.

The regular PC character code page (437) includes various extended ASCII characters for things like line drawing. You might want to set the font in your terminal program to something that includes these (e.g. on Mac OS X there is a freeware font called Perfect DOS VGA 437 which does the trick).

Occasionally a DOS application on exit will leave the video hardware in an odd state which confuses the emulator, resulting in subsequent text output being invisible. If this happens, just use the DOS CLS command to clear the screen and all will be well again.

Graphics mode support

Hercules 720x348 monochrome graphics mode emulation is implemented using SDL. Most Hercules features are supported via the normal I/O interface on ports 0x3B8 and 0x3BA including video memory bank switching (segments B000/B800), which some games use for double-buffered graphics. CGA graphics modes are not supported.

When an application enters graphics mode, the emulator will open an SDL window (which will be closed when the application goes back to text mode). Including code to redirect keystrokes from the SDL window to the main terminal window would have busted the IOCCC size limits, so you need to keep the main emulator terminal window in focus at all times even when you are doing graphics (sounds a little odd but you will get used to it).

On UNIXes, SDL will automatically output graphics via X11 if the DISPLAY environment variable is set up.

Dual graphics card support

Some applications (e.g. AutoCAD) support a PC configuration with a CGA card and a Hercules card, for simultaneous text and graphics output on different displays. The emulator simulates this configuration, too, using separate windows for the (terminal) text and (SDL) graphics displays.

BIOS

Like a real PC, the emulator needs a BIOS to do anything useful. Here we use a custom BIOS, written from scratch specifically for the emulator. Source code for the BIOS (written in 8086 assembly language) which compiles with the freely-available NASM x86 assembler is available from the author on request.

The BIOS implements the standard interrupt interfaces for video, disk, timer, clock and so on, much as a “real” PC BIOS does, and also a small timer-controlled video driver to convert video memory formatting into ANSI escape sequences when the emulator is in text mode.

CPU and memory emulation

Memory map is largely as per a real PC, with interrupt vector table at 0:0, BIOS data area including keyboard buffer at 40:0, CGA text video memory at B800:0, Hercules dual-bank graphics memory at B000/B800:0, and BIOS at F000:100. Unlike a real PC, in the emulator the CPU registers are memory-mapped (at F000:0), which enables considerable optimisation of the emulator’s instruction execution unit by permitting the unification of memory and register operations, while remaining invisible to the running software.

CPU supports the full 8086/186 instruction set. Due to the complexities of the 8086’s arbitrary-length instruction decoding and flags, 8086 instructions are first converted to a simpler intermediate format before being executed. This conversion, along with instruction lengths and how each instruction modifies the flags, is assisted by some lookup tables which form part of the BIOS binary.

The CPU also implements some “special” two-byte opcodes to help the emulator talk with the outside world. These are:

0F 00 - output character in register AL to terminal
0F 01 - write real-time clock data (as returned by localtime) to memory location ES:BX
0F 02 - read AX bytes from disk at offset 512*(16*SI+BP) into memory location ES:BX. Disk is specified in DL (0 = hard disk, 1 = floppy disk)
0F 03 - write AX bytes at memory location ES:BX to disk at offset 512*(16*SI+BP). Disk is specified in DL as per 0F 02
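
Purely as an illustration of how a host-side dispatcher for these pseudo-opcodes might look, here is a hypothetical Python sketch (not the entry's code; the register dictionary, memory bytearray and disk file objects are stand-ins, and the byte layout written for 0F 01 is an assumption):

import time

def handle_special_opcode(sub, regs, mem, disks):
    if sub == 0x00:                          # 0F 00: write AL to the terminal
        print(chr(regs['AX'] & 0xFF), end='')
    elif sub == 0x01:                        # 0F 01: store localtime data at ES:BX
        t = time.localtime()
        addr = regs['ES'] * 16 + regs['BX']
        mem[addr:addr + 3] = bytes([t.tm_sec, t.tm_min, t.tm_hour])  # assumed layout
    elif sub in (0x02, 0x03):                # 0F 02 / 0F 03: disk read / write
        disk = disks[regs['DX'] & 0xFF]      # DL: 0 = hard disk, 1 = floppy
        offset = 512 * (16 * regs['SI'] + regs['BP'])
        count = regs['AX']
        addr = regs['ES'] * 16 + regs['BX']
        disk.seek(offset)
        if sub == 0x02:
            mem[addr:addr + count] = disk.read(count)
        else:
            disk.write(bytes(mem[addr:addr + count]))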

Emulator exit is triggered if CS:IP == 0:0 (which would be nonsensical in real software since this is where the interrupt vector table lives). The supplied Dos6.22.img disk includes a small program QUITEMU.COM which contains a single JMP 0:0 instruction, to allow the user to easily quit the emulator without shutting down the terminal.

Floppy disk support

Emulates a 3.5" high-density floppy drive. Can read, write and format 1.44MB disks (18 sectors per track, 2 heads) and 720KB disks (9 sectors per track, 2 heads).

If you want to install your own software from floppy images (downloaded from e.g. Vetusware), the easiest way to “change disks” is to copy each disk image in turn over the floppy image file you specify on the command line. Don’t forget to put your original boot disk back at the end!

Hard disk support

Supports up to 1023 cylinders, 63 sectors per track, 63 heads for disks up to 528MB.

Disk image format used is a subset of the standard “raw” format used by most disk image mount tools. In general, disk images prepared by the emulator will work with disk image tools and other emulators, but not the other way around.

The emulator uses a particularly dumb algorithm to derive a simulated cylinder/sector/head geometry from the disk image file’s size. This algorithm often results in not all the space in the image file being available for disk partitions. For example, creating a 40,000,000 byte image file results in DOS FDISK seeing only 31.9MB as the volume size.

Note that unlike a real PC, the emulator cannot boot from a hard disk (image). Therefore, you will always need to use a bootable floppy image, even if after boot everything runs from the HD.

Mouse

No mouse is emulated.

Real-time clock

Reading the RTC (both time and date) is emulated via the standard BIOS clock interface, pulling the time/date from the host computer. Setting the time or date is not supported.

Timers

A countdown timer on I/O port 0x40 is simulated in a broken way which is good enough for most software. On a real PC this has a default period of 55ms and is programmable. No programmability is supported in the emulator and the period may be about right or completely wrong depending on the actual speed of your computer.

On a real PC, IRQ0 and interrupt 8 are fired every 55ms. The emulator tries to do the same but again, the delay period is uncalibrated so you get what you get.

PC speaker

Beeps only, through the console.

Software supported

The emulator will run practically any software a real PC (of the spec listed at the top of this file) can. The author has tested a number of OSes/GUIs (MS-DOS 6.22, FreeDOS 0.82pl3, Windows 3.0, DESQview 2.8), professional software (Lotus 1-2-3 2.4 and AsEasyAs 5.7 for DOS, Excel 2.1 for Windows, AutoCAD 2.5, WordStar 4), programming languages (QBASIC, GWBASIC, Turbo C++), games (Carrier Command, Police Quest, and a bunch of freeware Windows games), and diagnostic/benchmark software (Manifest, Microsoft MSD, InfoSpot, CheckIt) and all of them run well.

Screenshots of some of these applications running (on Mac OS X) are provided for the impatient.

Compiler warnings

A lot of compiler warnings are produced by clang. Missing type specifiers, control reaching the end of functions without returning values, incompatible pointer type assignments, and some precedence warnings, all necessary to keep the source size down. Other compilers are likely to produce similar warnings.

How Python 3 Should Have Worked (Aaron Swartz's Raw Thought)

Comments: "How Python 3 Should Have Worked (Aaron Swartz's Raw Thought)"

URL: http://www.aaronsw.com/weblog/python3


As a workaday Python developer, it’s hard to shake the feeling that the Python 2 to 3 transition isn’t working. I get occasional requests to make my libraries work in 3 but it’s far from clear how to and when I try to look it up I find all sorts of conflicting advice, none of which sounds very practical.

Indeed, when you see new 3.x versions rolling off the line and no one using them, it’s hard to shake the feeling that Python might die in this transition. How will we ever make it across the chasm?

It seems to me that in all the talk about Python 3000 being a new, radical, blue-sky vision of the future, we neglected the proven methods of getting there. In the Python 2 era, we had a clear method for adding language changes:

  • In Python 2.a, support for from __future__ import new_feature was added so you could use the new feature if you explicitly declared you wanted it.
  • In Python 2.b, support was added by default so you could just use it without the future declaration.
  • In Python 2.c, warnings began being issued when you tried to use the old way, explaining you needed to change or your code would stop working.
  • In Python 2.d, it actually did stop working.
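
A concrete instance of that mechanism, using a feature that really did ship this way (print as a function, opt-in from Python 2.6 and the default in 3.x):

# Python 2.6+: explicitly opt in to the Python 3 behaviour.
from __future__ import print_function

print("hello", "world", sep=", ")   # works identically on Python 2 (with the import) and on Python 3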

It seems to me this process worked pretty well. And I don’t see why it couldn’t work for the Python 3 transition. This would mean mainly just:

A Python 2.x release that added support for from __future__ import python3.

Putting this at the top of a file would declare it to be a Python 3 file and allow the interpreter to parse it accordingly. (I realize behind the scenes this would mean a lot of work to merge the 2 and 3 interpreters, but honestly it would always have been better to have a unified codebase to maintain.)

Then if I wanted my Python 2 program to use some 3 modules, I just need to make sure those modules have the import line at the top. If I want to do a new release of my module that works on Python 3, I just need to declare that it only works in Python 2.x and higher and release a version that's been run through 2to3 (with the new import statement). If my project is big, I can even port files to 3 one at a time, leaving the rest as 2 until someone gets around to fixing them. Most importantly, I can start porting to Python 3 without waiting for all my dependencies to do the same, parallelizing what until now has been a rather serial process.

Users know they can safely upgrade to 2.x since it won’t break any existing code. Developers know everyone will eventually upgrade to 2.x so they can drop support for earlier versions. But since 2.x supports code that also runs in 3, they can start writing and releasing code that’s future-compatible as well. Eventually the vast majority of code will work in 3 and users can upgrade to 3. (2.x will issue warnings to the remaining stragglers.) Finally, we can drop support for 2.x and all live happily having crossed the bridge together.

This isn’t a radical idea. It’s how Python upgrades have always worked. And unless we use it again, I don’t see how we’re ever going to cross this chasm.

You should follow me on twitter here.

March 9, 2012

Hi Aaron,

Been a while, mate. Anyway I think one can accept your points as valid without despairing about “crossing the chasm.” The transition from Python 1.x to 2.x was similarly slow, and I’m pretty sure 1.5.2 was still more heavily used by the time 2.2 emerged. I think the fact that it takes a while is actually a testament to Python’s success.

—Uche

posted by Uche Ogbuji on March 10, 2012 #

Some of these thoughts have occurred to me as well recently, except I’m a bit more pessimistic than you. I’ve debated writing a “Python’s Dark Age?” piece. It’s not just py3k, but pypy, and numpy+scipy that have me a bit gloomy with all the fragmentation of focus. In some ways I think Py3K wasn’t a big enough change if they were going to break backwards compatibility. They should have added much more thorough functional programming underpinnings, etc.. Implementation wise, CPython is terrribly slow, especially when compared to Javascript these days, which, if anything, is even more dynamic than Python, and has been approaching C-like speeds with the new JITs. And with cross-compiled languages like Coffeescript, even Python’s syntax is looking clunky in some ways.

PyPy is very promising but it is not numpy, scipy compatible, which are the main reasons I stick around in Python these days. PyPy is also 2.7 compat. I believe, so Python 3K is truly sitting out there all alone. Though I consider myself a Python guru, and use it quite a lot, there are a couple of other languages I would consider switching too for evertyhing if I could: CoffeeScript/JS + node, Lua + LuaJIT, Haskell, Mono C# even… have you seen the awesome new async features being added to C#? Sigh.

posted by Craig on March 10, 2012 #

Aaron, Python will not die out because of 3.x and I will explain why. I’m a senior in high school; I started coding 2 years ago on Python and everybody suggested I start with 2.7 because there is no real “community” or books for 3.x beginners yet. I then read an article dated 1999! about the early adopters of 2.x when 1.5 was being phased out. It spoke about new coders or beginners being taught in 2.x, and how educational institutions, teaching blank-slate students, would give them a head start by early adopting 2.x. Guess what? That’s exactly what happened, and who uses 1.5.2 now? That being said, 2.x was backwards compatible with 1.5.2. The point is, though, this year our compsci classes started coding in 3.2.2. A lot of teachers are starting to teach in 3.2.2. The younger guys are going to do the porting for popular 2.x modules; it’s gonna take a little time but it will happen. You will start to see a lot of people in a few years who never used 2.x but are pro coders in 3.x. Plus the problem I had 2 years ago is no longer a problem; there are plenty of great beginners’ books and on-line tutorials available now for 3.x. Don’t get left behind buddy, adapt!

posted by Mike on March 11, 2012 #

I have no doubt that Python 3 will win the day in the end. It is taking a while but it is no Perl 6.

I am a huge fan of Mono as well and the async stuff is going to be great. You can even use IronPython on Mono and just use C# with async/await where it makes sense. Or you can use F# for key parts. The ability to mix and match languages in a single codebase is one of the great strengths of Mono.

posted by Justin on March 12, 2012 #

You can also send comments by email.

Five Paragraph Essays - 42Floors

Comments: "Five Paragraph Essays - 42Floors"

URL: http://blog.42floors.com/five-paragraph-essays/


 

In my freshman college class, I had to do a peer review of another student’s essay.  It was on King Lear and the prompt was:  What was King Lear’s fatal flaw?

I was a good writer in high school, and I had studied King Lear in my AP English class.  As I read through my partner’s essay, I was astounded at how awful it was.  He was a smart guy.  Not just book smart but also clearly intelligent and persuasive as a contributor in class.  But this essay lacked so many of the fundamentals. 

In my review, I commented how he needed to better define the thesis statement out front so that the reader would know the purpose of the essay.  Likewise, he needed to have clear topic sentences to each paragraph so that each of his examples would be tied together. I noted that the concluding paragraph needed to tie the thesis back together and link to his initial introduction in order to leave the essay whole.

It was surprising to me how he could be missing such simple fundamentals. I told him to work on the basic structure before we tackled the actual content.

 

A couple of days later Professor Wheaton called me into his office.

“Jason, I read your review of your partner’s essay. It was a thoughtful attempt, but it was not what I’m looking for.”

I was a little stunned at first.  He continued.

“What was the name of your high school English teacher?”

“Mrs. Streeter,” I replied.

“AP English?”

“Yep.”

“5 on the exam?”

“Yes.”

“Okay. So this is going to sting a little, but I need to help you un-learn high school writing. I don’t want to demonize Mrs. Streeter, because it’s in fact not her fault. You learned how to write the perfect five paragraph essay, didn’t you? Because that’s what the AP exams test for.”

“Yeah,” I replied cautiously. “Is that not what you wanted?”

“Jason, I asked you to dissect King Lear. I’ve read this play 25 times, and I still can’t make up my mind about King Lear. I have a Ph.D. in English literature and I teach Shakespeare for a living, and I can’t for the life of me make up my mind what to think about King Lear. It’s his complexity that makes him so timeless. So in short, no, I did not think you could explain King Lear in a five paragraph essay. What I wanted you and your partner to do was put some real thought into the character of King Lear and write me an essay that helps unpack all of that. It could take you 5 paragraphs; it could take you 14 paragraphs. The structure is unimportant. Your writing should simply reflect the depth and clarity of your thought.”

 

 

 

 

Fast forward to today. It’s time to leave Mrs. Streeter’s five paragraph essay behind.

The medium I use to write is now the blog post. There is no format that is perfect for a good blog post.  Sometimes an interesting post will start with a story or maybe a question I’m trying to answer. Some blog posts will get straight to the point.  Often I’ll try to include a set of tips at the bottom, but only if that makes sense for that particular post.

I bring this up because I talk to a lot of people who would like to do more blogging.  But when I read their drafts, I’m shocked to be looking at five paragraph essays again – smart people writing in overly formulaic ways, locked into a high school level of writing.

So I wanted to share my one trick for helping people become better writers.  It’s really simple:

 

To become a better writer you have to stop writing and start speaking.

 

For a lot of people (obviously not all), it’s much easier to get thoughts across naturally by talking about a subject than by formally writing. I started doing this a few years ago, and I now dictate one hundred percent of my blog posts.  I’ll go on a long walk with my phone in my pocket, the voice memo turned on, and I’ll simply talk through a subject that I’ve been thinking about.  I then email the audio file to have it transcribed by a virtual assistant and Booyah!  I have my first draft.

And while editing and rewriting will occur from there, I’m far better at getting to a decent draft using this method than if I were to sit down at a coffee shop and try to write it out from scratch.  I encourage you to give it a shot.

If you want to get into blogging and your first few posts are coming out like five paragraph essays, try dictating a few.  You won’t necessarily get it right on the first try, but you may find that “writing” actually becomes enjoyable.

 

 

 

Discuss on Hacker News.

About Jason Freedman

Entrepreneur, Co-Founder at 42Floors, Co-Founder at FlightCaster, YC-alum, and a Tuck MBA

We Need Viable Search Engine Competition, Now | PEEBS.ORG

Comments: "We Need Viable Search Engine Competition, Now | PEEBS.ORG"

URL: http://peebs.org/2014/01/04/we-need-viable-search-engine-competition-now/


It’s become clear to me that we desperately need a viable competitor (or two) in the search engine space. A related thought I keep having is the (probably inaccurate) sense that bringing a viable competitor to Google to market may not be nearly as hard as it has appeared for the last decade.

We need competitors now. Most websites see more than 80% of their search engine traffic arriving from just Google, and this is not a good long term recipe for a vibrant internet.

Inherent Conflict of Interest
Google’s revenue model of placing paid ads next to organic search results operates under the (publicly accepted) belief that there’s a secure “Chinese wall” between the paid and organic functions. The wall was all the more secure, some argued, because the short-term conflict between receiving revenue for rankings (paid) and displaying the best rankings (organic) was not a long-term conflict: better organic results were always in Google’s interest, because those competitive results maintained its dominance and users’ trust. And so we believed. To be fair, I feel that Google does a somewhat decent job in this area, but I continue to feel that the user experience of AdWords exhibits various dark patterns (more about this here) and that Google’s corporate inertia seems to be focused on a walled-garden approach with G+ and Android. Let’s just say that I’m no longer going to blindly trust Google in the face of a worrying conflict of interest that’s central to their most valuable product. Declining empires under siege are the ones you have to be careful of, after all.

Vulnerable to Manipulation
Is there anything worse than “SEO”? The very idea of this industry, filled with people whose sole job is to manipulate Google, is bad enough, but the fact that “black hat” SEO can produce material gains is genuinely worrying. Having had to clean up a mess created by a black hat (who insisted he wasn’t one), and now being in the middle of another mess of toxic backlinks that may or may not have been generated by a competitor, I find the whole thing annoying, wasteful, and embarrassing for Google. I get that they’re trying to clean this up with Penguin, Panda, and their various iterations.

Arbitrary and Corrupt
When RapGenius violated Google’s SEO guidelines, they were only caught because of a public revelation on Hacker News, were then immediately penalised by a human (to compensate for where the algorithm failed), and were then permitted to communicate directly with Google to discuss ways out of the mess. Now it appears they’ve been fast-tracked back into the listings, albeit at somewhat of a disadvantage.

All aspects of this rub me the wrong way:

  • Google is making arbitrary rules on how sites should behave because it has a monopoly. If it didn’t have a monopoly, it might not be able to make these arbitrary rules, and others might not follow them.
  • Google needs these rules because its rankings are apparently trivial to game. Build a ton of links and make sure you don’t over-optimise your link text; that’ll do it for most key phrases, apparently, as long as you’re not completely obvious.
  • There’s a clear incentive for “bad guys” to win using “bad ways”, which penalises good sites just trying to get on with business. Does anyone actually believe that the ridiculously obvious, poorly written link farms that Google catches periodically are the only examples out there? Smarter people doing a better job are gaming Google all the time, and it appears to be getting worse.
  • Google feels entitled, at any time and with zero due process, transparency, or appeal, to manually penalise sites that successfully ignore its rules yet rank highly. This is not transparent, fair, or reliable. It is scary for legitimate businesses, and this kind of instability should not be the norm, but it is.
  • The only organisations or individuals who can actually engage with Google over a penalisation or problem in any meaningful way are Silicon Valley favourites, companies backed by influential VCs, or [insert some other not-available-to-the-public recourse here]. This is the definition of corruption.

We Need A Competitive Alternative
Competition could provide a healthy response to many of these items. I don’t think regulation is the answer, but it may become one if these trends continue and intensify. A different revenue model could remove the conflict of interest, a better or different algorithm could be less prone to manipulation, and a search engine that prided itself on a transparent and efficient arbitration process for ranking disputes could win users’ trust. Of course, Google could also work on these problems themselves, but it seems like they’re more or less happy with the current state of affairs.

Is PageRank really the indomitable tech of our generation? Nobody can do better algorithmically, or integrate some kind of crowd-sourced feedback, or measure browsing time and habits, or simply hand-tune some of the most competitive key phrases? I’m sure I’m oversimplifying, but I wonder if we haven’t all been hypnotised by the complexity, much of which is marketing hype, and missed the enormous opportunity that exists right in front of our noses. Does the next search engine have to be as big, be involved in as many things, employ as many people, and fight on the same footing to accomplish the goal of providing a counterpoint to Google?

Time will tell.


Sen. Paul says he's suing over NSA policies


Comments:"Sen. Paul says he's suing over NSA policies"

URL:http://bigstory.ap.org/article/sen-paul-says-hes-suing-over-nsa-policies/


WASHINGTON (AP) — Republican Sen. Rand Paul says he is filing suit against the Obama administration over the data-collection policies of the National Security Agency. On his website, he's urging Americans to join the lawsuit, in his words, "to stop Barack Obama's NSA from snooping on the American people."

In an interview Friday night on the Fox News show "Hannity," the Kentucky Republican tells host Eric Bolling he believes everyone in the U.S. with a cellphone would be eligible to join the suit as a class action.

Paul says that people who want to join the suit are telling the government that it can't have access to emails and phone records without permission or without a specific warrant.

Paul says the lead lawyer in the suit is Virginia's former attorney general, Ken Cuccinelli.



Daring Fireball Linked List: The Problem Is With the Product


Comments:"Daring Fireball Linked List: The Problem Is With the Product"

URL:http://daringfireball.net/linked/2014/01/02/scoble-glass


Robert Scoble, in a long ramble on Google Glass:

I’m also worried at a new trend: I rarely see Google employees wearing theirs anymore. Most say “I just don’t like advertising that I work for Google.” I understand that. Quite a few people assume I work for Google when they see me with mine. I just hope it doesn’t mean that Google’s average employee won’t support it. That is really what killed the tablet PC efforts inside Microsoft until Apple forced them to react due to popularity of iPad.

Scoble has the cause and effect backwards. If Glass were a good product, people who have them would wear them. It’s that simple. Same with tablet PCs — the problem wasn’t that Microsoft employees wouldn’t use them and that the product thus lost momentum and didn’t catch on with consumers. The problem is that tablet PCs were crap products.

When your own employees don’t use or support your product, the problem is with the product, not the employees.

Thursday, 2 January 2014

XL97: Data Not Returned from Query Using ORACLE Data Source


Comments:" XL97: Data Not Returned from Query Using ORACLE Data Source "

URL:http://support.microsoft.com/kb/168702


This article was previously published under Q168702

WARNING: This information is preliminary and has not been confirmed or tested by Microsoft. Use only with discretion. Some or all of the information in this article has been taken from unconfirmed customer reports. Microsoft provides this information "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.

The third-party products that are discussed in this article are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, regarding the performance or reliability of these products.

If you are a Small Business customer, find additional troubleshooting and learning resources at the Support for Small Business site.
To work around this problem, use one of the following methods:

Method 1: Turn Off the Enable Background Refresh Setting

To prevent the query from running in the background, follow these steps:
  1. Create your query in Microsoft Query, and then select the option to return the data to Microsoft Excel.
  2. In the Returning External Data to Microsoft Excel dialog box, click Properties.
  3. Click to clear the check box for Enable Background Refresh, and then click OK.
  4. To return the results of your query to the worksheet, click OK.

NOTE: When you turn off the Enable Background Refresh option, you cannot do other tasks while this query runs.

Method 2: Move Your Mouse Pointer

If you move your mouse pointer continuously while the data is being returned to Microsoft Excel, the query may not fail. Do not stop moving the mouse until all the data has been returned to Microsoft Excel.

NOTE: Depending on your query, it may take several minutes to return the results of your query to the worksheet.

Method 3: Paste the Data in the Worksheet

To paste the data from Microsoft Query in the worksheet, follow these steps:
  1. In Microsoft Query, create the correct parameters for your query.
  2. In the data pane, select an item.
  3. Press CTRL+SHIFT+SPACEBAR to select all the data in the data pane.
  4. On the Edit menu, click Copy.
  5. In Microsoft Excel, on the Edit menu, click Paste.

NOTE: When you use this method, the parameters that you used to query your database are not saved with the data, so you cannot use the Refresh Data command to update the results of the query. You must follow these steps each time that you want to update the query.

Article ID: 168702 - Last Review: October 21, 2002 - Revision: 1.0

APPLIES TO
  • Microsoft Excel 97 Standard Edition
  • Microsoft Query 2000
Retired KB Content Disclaimer

This article was written about products for which Microsoft no longer offers support. Therefore, this article is offered "as is" and will no longer be updated.

Daft Punk “One More Time” sample: Video shows how the electronic music producers made one of their most famous songs, from Eddie Johns’ “More Spell on You.”


Comments:"Daft Punk “One More Time” sample: Video shows how the electronic music producers made one of their most famous songs, from Eddie Johns’ “More Spell on You.”"

URL:http://www.slate.com/blogs/browbeat/2014/01/03/daft_punk_one_more_time_sample_video_shows_how_the_electronic_music_producers.html


A lot of people who listen to hip-hop and electronic music might find the process of building beats a bit mystifying, especially since the genres’ instruments aren’t as readily identifiable as, say, the electric guitar. (The drum machine, the synthesizer, and the sampler aren’t as widely understood, leading some people to say they’re not “real instruments” at all.) This video, which just started going viral this week, should help demystify that process a bit.

In it, YouTube user SadowickProduction gives a step-by-step breakdown of how Daft Punk built the basic backing track for “One More Time,” using Eddie Johns’ “More Spell on You.” (The author of the video claims that Daft Punk has denied using the sample, but Vibe says it was officially cleared.)

A lot of people will see this video as proof of how building songs from samples takes a keen ear and a lot of skill: You’re not just “stealing” something, you’re transforming it. Others will see it as proof of the opposite: Anyone can put a urinal on the wall and call it art. Of course, those people didn’t put a urinal on the wall, and they aren’t Daft Punk.

Rap Genius Founders – Rap Genius is Back on Google | News Genius


Comments:"Rap Genius Founders – Rap Genius is Back on Google | News Genius"

URL:http://news.rapgenius.com/Rap-genius-founders-rap-genius-is-back-on-google-lyrics


It takes a few days for things to return to normal, but we’re officially back!

First of all, we owe a big thanks to Google for being fair and transparent and allowing us back onto their results pages. We overstepped, and we deserved to get smacked.

This post has 2 parts:

  1. The history of our link-building strategy: how it started, evolved, and eventually spun out of control.
  2. The story of how we approached the problem of identifying and removing problematic links from the Internet and got back on Google.

The Story of the Rap Genius Blog Family

In the beginning (back in 2009, when it was just a few friends), we’d email sites like Nah Right, HipHopDX, Rap Radar, and NYMag and beg them to feature stuff like well-annotated songs, the Rap Map, etc. In this, we had minimal success.

Over the next couple years, the site and the community of annotators grew. Many of our contributors had music blogs, and some would link to Rap Genius pages when mentioning tracks in their posts. Our social media presence grew, and we made friends on Twitter with a handful of music bloggers.

So at this point, our blog family was just a series of blogs, many of which belonged to site contributors, whose content and tone aligned well with Rap Genius. They linked our song pages in relevant posts, and we helped promote their blogs. The spirit of this was broader than a quid-pro-quo link exchange – the bloggers linked to us because their readers were interested in our lyrics and annotations, not just because they wanted to help us. We plugged them because they had good, tweet-worthy blog posts, not just because we appreciated their help with promotion.

As we grew, our blog family began to get fancy. We began collaborations with major publications – including TheGrio, Esquire, Huffington Post, and The Atlantic – where they would ask us to write reviews and op-ed pieces on their sites. This was fun because it was a chance to promote Rap Genius to bigger audiences, and, since the publications liked it when we linked to interesting Rap Genius content (top lines, top songs, etc.), it also helped our search rankings.

Most of the time the links from our blog family and these guest posts were organically woven into the text. For example (with Rap Genius links highlighted):

Other times we would link an album’s entire tracklist at the bottom of a post, and we encouraged other bloggers to do the same. Eventually we made it very easy for bloggers to grab the links for a whole album by adding an “embed” button to our album pages. This produced posts that look like this:

This is a blog post about Reasonable Doubt, with the full tracklist linked below. This definitely looks less natural, but at the time we didn’t think we were breaking Google’s rules because:

This takes us up to the past couple months, when we did two things that were more or less totally debauched:

  • On guest posts, we appended lists of song links (often tracklists of popular new albums) that were sometimes completely unrelated to the music that was the subject of the post.
  • We offered to promote any blog whose owner linked to an album on Rap Genius in any post, regardless of its content.

This practice led to posts like this:

This last one triggered the controversy that caused Google to blacklist us. It started when John Marbach wrote to Mahbod to ask him about the details of the “Rap Genius blog affiliate program” (a recent Mahbod coinage):

Mahbod wrote back, and without asking what kind of blog John had or anything about the content of the post he intended to write, gave him the HTML of the tracklist of Bieber’s new album and asked him to link it. In return, he offered to tweet exactly what John wanted and promised “MASSIVE traffic” to his site.

The dubious-sounding “Rap Genius blog affiliate program”, the self-parodic used car salesman tone of the email to John, the lack of any discretion in the targeting of a partner – this all looked really bad. And it was really bad: a lazy and likely ineffective “strategy”, so over-the-top in its obviousness that it was practically begging for a response from Google.

When Matt Cutts chimed in on the Hacker News thread, action from Google seemed inevitable. Sure enough, we woke up on Christmas morning to a “manual action” from Google, which bumped Rap Genius to the sixth page of search results even for queries like [rap genius].

How did we get back on Google?

Google’s manual action had the reason “Unnatural links to your site”, which they explain as “a pattern of unnatural, artificial, deceptive, or manipulative links pointing to your site.” Google recommends a 4-step approach to fixing the problem:

  1. Download a list of links to your site from Webmaster Tools.
  2. Check this list for any links that violate our guidelines on linking.
  3. For any links that violate our guidelines, contact the webmaster of that site and ask that they either remove the links or prevent them from passing PageRank, such as by adding a rel="nofollow" attribute.
  4. Use the Disavow links tool in Webmaster Tools to disavow any links you were unable to get removed.

So that morning we dug in.

First we assembled the biggest list of inbound Rap Genius links we could get our hands on. We did this by combining the list of “links to your site” from Webmaster Tools that Google recommends with the list of inbound links you can download from Moz’s link search tool, Open Site Explorer. After some cleaning and de-duping, we ended up with a master list of 177,781 URLs that contained inbound links to Rap Genius.
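
For illustration, the merge-and-de-dupe step could look something like the sketch below, assuming both exports were saved as plain one-URL-per-line text files (the file names and the normalization rules are assumptions, not the actual script):

# Combine both link exports into one de-duped master list (sketch)
webmaster_tools    = File.readlines('webmaster_tools_links.txt')
open_site_explorer = File.readlines('open_site_explorer_links.txt')

master = (webmaster_tools + open_site_explorer)
           .map { |line| line.strip.chomp('/') }   # crude normalization
           .reject(&:empty?)
           .uniq

File.write('master_list.txt', master.join("\n"))
puts "#{master.size} unique linking URLs"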

Now we had to find out which of these URLs contained Rap Genius links that Google considered unnatural. The obvious place to start was the URLs associated with publications that we had promoted via Twitter or otherwise had relationships with. So we compiled a list of 100 “potentially problematic domains” and filtered the master list of 178k URLs down to a new list of 3,333 URLs that contained inbound Rap Genius links and were hosted on one of those potentially problematic domains.
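
The filtering step is mechanical. A sketch, assuming master is the de-duped list from the previous sketch and that the 100 suspect domains live in a hypothetical text file:

require 'uri'

problematic = File.readlines('potentially_problematic_domains.txt')
                  .map { |d| d.strip.downcase }
                  .reject(&:empty?)

# Keep only URLs whose host is (or is a subdomain of) a suspect domain
to_review = master.select do |url|
  host = begin
    URI.parse(url).host.to_s.downcase
  rescue URI::InvalidURIError
    ''
  end
  problematic.any? { |domain| host == domain || host.end_with?(".#{domain}") }
end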

Next we manually examined each of these 3,333 URLs and categorized each one based on whether it contained unnatural links.

Here are the groupings and counts we came up with:

  • Group 1: Contains no links or already counted in another group. These we discarded. (1,791 pages)
  • Group 2: Contains links organically woven into the text (1,294 pages)
  • Group 3: Contains relevant structured link lists (169 pages)
  • Group 4: Contains irrelevant structured link lists (129 pages)

The URLs in Group 4 obviously had to go, but we decided to remove the links on the Group 3 URLs as well just to be safe. So we started emailing bloggers and asking them to take down links.

This was a good start, but we wanted to catch everything. There had to be a better way! Enter the scraper.

The Scraper

To be completely thorough, we needed a way to examine every single URL in the master list of 178k linking URLs for evidence of unnatural links so we could get them removed.

So we wrote a scraper to download each of the 178k URLs, parse the HTML and rank them by “suspiciousness”, which was calculated from:

  • The total number of Rap Genius song page links the URL contained (more links is more suspicious)
  • How many of those links contained “ Lyrics” (standardized anchor text is more suspicious)
  • Whether the links were in a “clump” (i.e., 2 or more links separated by either nothing or whitespace)

Then we visited and categorized the most suspicious URLs by hand.
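
For concreteness, here is a minimal sketch of how such a suspiciousness score could be computed with Nokogiri. The weights, the clump heuristic, and the function name are illustrative assumptions, not the actual Rap Genius code:

require 'nokogiri'

# Score one downloaded page for "suspiciousness" (weights are arbitrary)
def suspiciousness(html)
  doc   = Nokogiri::HTML(html)
  links = doc.css('a[href*="rapgenius.com"]')

  # Standardized "... Lyrics" anchor text is more suspicious
  lyrics_links = links.count { |a| a.text =~ /\bLyrics\b/i }

  # Approximate a "clump": the next element after a Rap Genius link is
  # another Rap Genius link (i.e. nothing but whitespace in between)
  clumped = links.count do |a|
    nxt = a.next_element
    nxt && nxt.name == 'a' && nxt['href'].to_s.include?('rapgenius.com')
  end

  links.size + 2 * lyrics_links + 3 * clumped
end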

Calculating suspiciousness for an individual URL is relatively straightforward with Nokogiri, but downloading all the URLs is more challenging. How did we do it?

Technical Digression: How to scrape 178k URLs in Ruby in less than 15 minutes

Ok, so you have 178k URLs in a postgres database table. You need to scrape and analyze all of them and write the analysis back to the database. Then, once everything’s done, generate a CSV from the scraped data.

The naive approach

Open-uri is probably the easiest way to download a URL in Ruby:

require 'open-uri'

urls.each do |url|
  analyze_response(open(url).read)
end

But downloading and analyzing 178k URLs one at a time would take days – how do we make it faster?

Concurrency

To make this faster we need a way to download multiple URLs simultaneously: Ruby threads. So let’s create a bunch of threads to simultaneously grab URLs from a queue and download them. Something like this:

require 'open-uri'
require 'thread'

queue, processed = Queue.new, Queue.new
urls.each { |u| queue << u }

concurrency = 200
concurrency.times do
  Thread.new { processed << open(queue.pop).read until queue.empty? }
end

urls.length.times { analyze_response(processed.pop) }

But this is verbose and writing concurrent code can be quite confusing. A better idea is to use Typhoeus, which abstracts away the thread handling logic behind a simple, callback-based API. Here’s the functionality of the code above implemented using Typhoeus:

hydra = Typhoeus::Hydra.new(max_concurrency: 200)

urls.each do |url|
  hydra.queue(request = Typhoeus::Request.new(url))
  request.on_complete do |response|
    analyze_response(response.body)
  end
end

hydra.run

Now we can download 200 pages at once – nice! But even at this rate it still takes over 3 hours to scrape all 178k URLs. Can we make it faster?

Even More Concurrency

The naive way to make this faster is to turn up Typhoeus’s concurrency level past 200. But as Typhoeus’s documentation warns, doing this causes things to “get flakey” – your program can only do so many things at once before it runs out of memory.

Also, though turning up Typhoeus’s concurrency would increase parallelization in downloading the URLs (since this is IO-bound), the processing of each response is CPU-bound and therefore cannot be effectively parallelized within a single MRI Ruby process.

So to achieve more parallelism we need more memory and CPU power – what if we could get 100 machines each downloading 200 URLs at a time by running their own version of the program?

Sounds like the perfect job for Amazon EC2. But configuring, spinning up, and coordinating a bunch of EC2 nodes is annoying and you have to write a bunch of boilerplate code to get it going. If only there were a way to abstract away the underlying virtual machines and instead only think about executing 100 simultaneous processes. Fortunately Heroku does exactly this!

People think of Heroku as a platform for web applications, but it’s pretty easy to hack it to run a work queue application like ours. All you have to do is put your app into maintenance mode, scale your web dynos to 0, edit your Procfile, and spin up some worker processes. So we added the smallest database that supports 100 simultaneous connections, spun up 100 workers, and started scraping.
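
In concrete terms, that shuffle looks roughly like the following (standard Heroku CLI commands; the dyno counts and the worker script name are assumptions):

# Procfile (worker script name is hypothetical)
worker: bundle exec ruby scrape_worker.rb

$ heroku maintenance:on
$ heroku ps:scale web=0 worker=100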

Now we had 100 instances of the program running simultaneously, each of which was scraping 200 URLs at a time, for a four-order-of-magnitude improvement over our naive solution. But...

How do we prevent workers from tripping over one another?

This latest solution is highly performant, but that performance doesn’t come for free – we have to add additional logic to make the workers work together. Specifically, we must ensure that every worker grabs its own URLs to scrape and that no 2 workers ever end up scraping the same URLs at the same time.

There are a number of approaches to keeping the workers out of each other’s way, but for simplicity we decided to do it all in the database. The idea is simple: when a worker wants to grab a new set of 200 URLs to scrape, it performs an update on the URLs table to lock each row it selects. Since each worker only tries to grab unlocked URL rows, no 2 workers will ever grab the same URL.

We decided to implement this by borrowing some of the locking ideas from Delayed Job ActiveRecord:

scope :not_locked, -> { where(locked_at: nil) }
scope :unscraped,  -> { where(fetched: false).not_locked }

def Url.reserve_batch_for_scraping(limit)
  urls_subquery = unscraped.limit(limit).order(:id).select(:id).lock(true).to_sql
  db_time_now   = connection.quote(Time.now.utc.to_s(:db))

  find_by_sql <<-SQL
    UPDATE urls
    SET locked_at = #{db_time_now}, updated_at = #{db_time_now}
    WHERE id IN (#{urls_subquery})
    RETURNING *
  SQL
end
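
Tied together, each worker’s main loop might look roughly like the sketch below. The loop structure, the timeout, and the column names (url, suspiciousness) are assumptions for illustration, not the production code:

# scrape_worker.rb -- hypothetical worker entry point referenced by the Procfile
require 'typhoeus'

BATCH_SIZE = 200

loop do
  batch = Url.reserve_batch_for_scraping(BATCH_SIZE)
  break if batch.empty?

  hydra = Typhoeus::Hydra.new(max_concurrency: BATCH_SIZE)
  batch.each do |url_record|
    hydra.queue(request = Typhoeus::Request.new(url_record.url, timeout: 20))
    request.on_complete do |response|
      # Persist the analysis and release the row's lock
      url_record.update(fetched: true,
                        locked_at: nil,
                        suspiciousness: suspiciousness(response.body))
    end
  end
  hydra.run
end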

And that’s it.

Ok fine, that’s not it – you still have to write logic to unlock URL records when a worker locks them but somehow dies before finishing scraping them. But after that you’re done!
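
One simple way to handle that, sketched below with an arbitrary 15-minute threshold, is to treat any lock older than the longest plausible batch as abandoned and release it:

# Hypothetical stale-lock cleanup, run periodically or at worker boot
MAX_RUN_TIME = 15 * 60 # seconds; arbitrary threshold

def Url.release_stale_locks
  where(fetched: false)
    .where('locked_at IS NOT NULL AND locked_at < ?', Time.now.utc - MAX_RUN_TIME)
    .update_all(locked_at: nil)
end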

The Results

With our final approach – using 100 workers each of which scraped 200 URLs at a time – it took less than 15 minutes to scrape all 178k URLs. Not bad.

Some numbers from the final run of scraping/parsing:

  • Total pages fetched: 177,755
  • Total pages scraped successfully: 124,286
  • Total scrape failures: 53,469
  • Time outs (20s): 15,703
  • Pages that no longer exist (404): 13,884
  • Other notable error codes (code: count):
  • 403: 12,305
  • 522: 2,951
  • 503: 2,759
  • 520: 2,159
  • 500: 1,896

If you’re curious and want to learn more, check out the code on GitHub

End of technical digression

The scraper produced great results – we discovered 590 more pages with structured linked lists and asked the relevant webmasters to remove the links.

However, the vast majority of the URLs the scraper uncovered were fundamentally different from the pages we had seen before. They contained structured lists of Rap Genius links, but were part of spammy aggregator / scraping sites that had scraped (sometimes en masse) the posts of the sites with whom we had relationships (and posts from Rap Genius itself).

Generally Google doesn’t hold you responsible for unnatural inbound links outside of your control, so under normal circumstances we’d expect Google to simply ignore these spammy pages.

But in this case, we had screwed up and wanted to be as proactive as possible. So we wrote another script to scrape WHOIS listings and pull down as many contact email addresses for these aggregation/scraping operations that we could find. We asked all the webmasters whose contact information we could find to remove the links.
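
That script might have looked something like the sketch below, shelling out to the standard whois command and grepping the output for addresses (the file handling and the email regex are assumptions):

require 'set'

EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

# Pull contact emails out of a domain's WHOIS record (sketch)
def whois_emails(domain)
  raw = IO.popen(['whois', domain], &:read)
  raw.scan(EMAIL_RE).to_set
rescue Errno::ENOENT
  Set.new # whois binary not available
end

spam_domains = File.readlines('spammy_domains.txt').map(&:strip).reject(&:empty?)
contacts     = spam_domains.flat_map { |d| whois_emails(d).to_a }.uniq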

Disavowing links we couldn’t get removed

We were very successful in getting webmasters we knew to remove unnatural links. Most of these folks were friends of the site and were sad to see us disappear from Google.

All in all, of the 286 potentially problematic URLs that we manually identified, 217 (more than 75 percent!) have already had all unnatural links purged.

Unsurprisingly we did not have as much success in getting the unnatural links flagged by our scraper removed. Most of these sites are super-spammy aggregators/scrapers that we can only assume have Dr. Robotnik-esque webmasters without access to email or human emotion. Since we couldn’t reach them in their mountain lairs, we had no choice but to disavow these URLs (and all remaining URLs containing unnatural links from our first batch) using Google’s disavowal tool.
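
Generating the disavow file itself is straightforward: Google’s tool accepts a plain-text file with one URL per line, domain: entries to disavow an entire host, and # comments. A sketch (file names hypothetical):

spammy = File.readlines('unreachable_spam_domains.txt').map(&:strip).reject(&:empty?)

File.open('disavow.txt', 'w') do |f|
  f.puts '# Unnatural inbound links we could not get removed'
  spammy.each { |domain| f.puts "domain:#{domain}" }
end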

Conclusion

We hope this gives some insight into how Rap Genius did SEO and what went on behind the scenes while we were exiled from Google. Also, if you find your website removed from Google, we hope you find our process and tools helpful for getting back.

To Google and our fans: we’re sorry for being such morons. We regret our foray into irrelevant unnatural linking. We’re focused on building the best site in the world for understanding lyrics, poetry, and prose and watching it naturally rise to the top of the search results.

Though Google is an extremely important part of helping people discover and navigate Rap Genius, we hope that this ordeal will make fans see that Rap Genius is more than a Google-access-only website. The only way to fully appreciate and benefit from Rap Genius is to sign up for an account and use Rap Genius – not as a substitute for Wikipedia or lyrics sites, but as a social network where curious and intelligent people gather to socialize and engage in close reading of text.

Much love. iOS app dropping next week!
