Dynamics of Hacker News

URL:http://mayank.lahiri.me/writing/hackernews/index.html


April 28th, 2013

Hacker News is an interesting microcosm of Silicon Valley, where startups and side projects are regularly launched. Users submit links with short descriptions, which are shown on a single page of newest submissions (/newest). Other users vote on submissions, and each vote gives a submission a better chance of hitting the coveted front page and benefiting from a surge in visitor traffic. During peak hours, a submitted link may fall off the newest submissions page quite quickly, which has led to many analyses of the "best time" to submit to Hacker News. Some examples: HNPickup (apparently defunct), another analysis (without source data), a Quora question, and an analysis by a Hacker News regular. This is another addition to that list, but one with downloadable source data.

Collecting the Data

The robots.txt file specifies a minimum crawl delay of 30 seconds, so I chose a conservative value of 5 minutes between crawls of just the /newest page (this took some tweaking to get right, as can be seen in the dataset). I let the crawler run from March 18th to April 16th, during which it collected 5,790 snapshots of the newest submissions page. There were some periods where my server went down, or my crawler was (presumably) banned from Hacker News, which explains why this is fewer than the number of snapshots a non-stop 5-minute crawl would have produced.
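To put the shortfall in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant 5-minute interval (the interval was actually tweaked during the crawl), counts both the first and last day in full, and takes the year from the article date; the function and constant names are made up for illustration.

```python
from datetime import date

CRAWL_INTERVAL_MINUTES = 5      # assumed constant; the crawl interval was tweaked in practice
FIRST_DAY = date(2013, 3, 18)   # year assumed from the article date
LAST_DAY = date(2013, 4, 16)
SNAPSHOTS_COLLECTED = 5790      # figure reported above


def expected_snapshots(first_day, last_day, interval_minutes):
    """Upper bound on snapshots for a non-stop crawl, counting both days in full."""
    days = (last_day - first_day).days + 1
    return days * 24 * 60 // interval_minutes


expected = expected_snapshots(FIRST_DAY, LAST_DAY, CRAWL_INTERVAL_MINUTES)
coverage = SNAPSHOTS_COLLECTED / expected
print(f"expected at most {expected} snapshots, collected {coverage:.0%} of that")
```

Under these assumptions the upper bound is 8,640 snapshots, so the dataset covers roughly two thirds of the crawl window.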

Each snapshot is parsed and put into a MySQL table with the following columns:

  • href Submitted link
  • pos Position of link on page (1=top, 30=bottom)
  • title Title of submission
  • delay Approximate time since submission, relative to snapshot time
  • snapshot_epoch Timestamp of snapshot
  • submit_epoch Snapshot timestamp less delay
  • score Upvotes for submission at snapshot time
  • submitter Username of submitter
Download the MySQL table dump (2.6 MB compressed, 22 MB uncompressed), or continue reading for some analysis.
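For readers who want to work with the dump in code, the columns above could be modeled roughly as follows. This is only a sketch: the class name SnapshotRow is hypothetical, the Python types are guesses, and delay is assumed to be stored in seconds, which the column list does not state.

```python
from dataclasses import dataclass


@dataclass
class SnapshotRow:
    """One link as seen in one snapshot of the /newest page."""
    href: str            # submitted link
    pos: int             # position on page (1 = top, 30 = bottom)
    title: str           # title of submission
    delay: int           # approximate age at snapshot time (ASSUMED to be seconds)
    snapshot_epoch: int  # Unix timestamp of the snapshot
    score: int           # score at snapshot time
    submitter: str       # username of submitter

    @property
    def submit_epoch(self) -> int:
        # "Snapshot timestamp less delay", as described in the column list.
        return self.snapshot_epoch - self.delay


row = SnapshotRow("http://example.com/", 3, "Example", 600, 1364914800, 1, "someone")
print(row.submit_epoch)
```

Because delay is only approximate, submit_epoch derived this way can differ slightly between two snapshots of the same submission.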

The Best Time to Submit

I'll be the first to admit that any stated "best time to submit" is a specious assertion. No amount of careful timing will magically boost linkspam to the front page, but it's also probably true that the amount of time a link spends on the /newest page affects its chances of making it to the front page, all other factors being equal.

The simplest analysis is to measure the number of positions each submission falls between snapshots. This should be correlated with the submission rate, but is affected by the fact that upvotes increase the amount of time a submission spends on the /newest page. With almost four weeks of data, Fig. 1 shows the mean and median "fall rates" for submissions broken down by hour, in Pacific Standard Time.
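The fall-rate measurement described above can be sketched as follows. This is not the script used for the figures, just a minimal illustration under stated assumptions: rows are (href, snapshot_epoch, pos) tuples, a link is identified by its href alone (so re-submissions of the same URL are not separated), Pacific time is a fixed UTC-8 offset, and pairs of sightings more than max_gap_minutes apart are skipped to avoid crawler outages. All function names are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean, median

PST = timezone(timedelta(hours=-8))  # fixed offset; daylight saving is ignored


def hour_of_week(epoch):
    """Map a Unix timestamp to 0..167, where 0 is Monday 00:00 PST."""
    local = datetime.fromtimestamp(epoch, tz=PST)
    return local.weekday() * 24 + local.hour


def fall_rates(rows, max_gap_minutes=15):
    """Return (earlier_snapshot_epoch, positions_dropped_per_minute) pairs."""
    by_href = defaultdict(list)
    for href, snapshot_epoch, pos in rows:
        by_href[href].append((snapshot_epoch, pos))
    rates = []
    for sightings in by_href.values():
        sightings.sort()
        # Compare each sighting of a link with the next sighting of the same link.
        for (t0, p0), (t1, p1) in zip(sightings, sightings[1:]):
            minutes = (t1 - t0) / 60.0
            if 0 < minutes <= max_gap_minutes:
                rates.append((t0, (p1 - p0) / minutes))
    return rates


def summarize_by_hour(rates):
    """Group rates by hour of week and return {hour: (mean, median)}."""
    buckets = defaultdict(list)
    for epoch, rate in rates:
        buckets[hour_of_week(epoch)].append(rate)
    return {hour: (mean(v), median(v)) for hour, v in buckets.items()}
```

A link that moves from position 1 to position 6 between two snapshots taken 5 minutes apart contributes a rate of 1 position per minute to the hour in which the earlier snapshot was taken.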

Fig 1. Mean and median positions dropped per minute by hour of week.

On a Tuesday at 8am Pacific, new submissions drop an average (and median) of more than 1 position per minute. This means a story submitted around 8am will be off the /newest page in under half an hour, since there are 30 slots on the page. Fig. 2 is a more human-readable version of Fig. 1 that shows the amount of time a new submission lingers on the first of the /newest pages.

Fig 2. Mean and median time spent on /newest homepage by hour of week.

Voting Behavior

Assuming that a submission is worthy of upvotes, how does it go about procuring them? All submissions start with a score of 1, which increases by 1 for each upvote received. Hacker News does not allow submissions to be downvoted. From Fig 3(a), we can see that 60% of stories that are upvoted (and reach a score of 2) fall off the /newest page with just that one single upvote (a kindly friend of the submitter, perhaps). About 90% fall off with 10 or fewer upvotes.

What's more interesting is the amount of time till a submission acquires its first upvote. We can break this down by two types of submissions for convenience: those that received at least 1 upvote, and those that received at least 10 upvotes before they fell off the /newest page (the latter set is contained within the former). Fig 3(b) shows the distribution of times till the first upvote for both these classes of stories. Interesting tidbit: 50% of submissions that are going to receive at least 1 upvote do so within 11 minutes of submission. However, more than 50% of submissions that are going to receive at least 10 upvotes receive their first upvote within 5 minutes of submission. Hacker News sniffs out interesting stuff quickly.
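The time-to-first-upvote measurement can be approximated along these lines. Again this is a sketch rather than the original analysis script: rows are assumed to be (href, snapshot_epoch, submit_epoch, score) tuples, a submission is identified by href alone, and the first upvote is dated to the first snapshot in which the score is 2 or more, so the resolution is limited by the crawl interval.

```python
from collections import defaultdict
from statistics import median


def first_upvote_minutes(rows, min_upvotes=1):
    """Minutes from submission to the first snapshot showing an upvote,
    for submissions that reached at least min_upvotes upvotes on /newest."""
    by_href = defaultdict(list)
    for href, snapshot_epoch, submit_epoch, score in rows:
        by_href[href].append((snapshot_epoch, submit_epoch, score))
    waits = []
    for sightings in by_href.values():
        sightings.sort()
        top_score = max(score for _, _, score in sightings)
        # Scores start at 1, so upvotes received = score - 1.
        if top_score - 1 < min_upvotes:
            continue
        for snapshot_epoch, submit_epoch, score in sightings:
            if score >= 2:
                waits.append((snapshot_epoch - submit_epoch) / 60.0)
                break
    return waits


# Toy data: "a" gets its first upvote by the 10-minute snapshot, "b" never
# gets one, and "c" is upvoted within 5 minutes and later reaches 11 upvotes.
rows = [
    ("a", 300, 0, 1), ("a", 600, 0, 2), ("a", 900, 0, 3),
    ("b", 300, 0, 1), ("b", 600, 0, 1),
    ("c", 300, 0, 2), ("c", 900, 0, 12),
]
print(median(first_upvote_minutes(rows)))
print(median(first_upvote_minutes(rows, min_upvotes=10)))
```

Calling the function with min_upvotes=1 and min_upvotes=10 gives the two classes of stories compared in Fig 3(b).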

If you submit to Hacker News, you probably only need to wait half an hour to figure out how well your submission is going to fare.

Fig 3(a). The cumulative distribution of the highest score attained by a submission while still on the /newest page.

Fig 3(b). For submissions that were upvoted, the time in minutes from submission till the first upvote was seen.

A Dubious Trick

In the course of sanity-checking my parser, I discovered a number of instances of an interesting phenomenon: a URL submission seen at time T with score S would appear at a later time with a score less than S. It would appear that some submissions are dropping in score, i.e., being downvoted, even though Hacker News does not allow duplicate submissions, or submissions to be downvoted. Here are three examples (usernames omitted):

Snapshot time   Score   Submission time   URL
Apr 2 05:42     2       Apr 2 04:54       http://www.youtube.com/watch?v=zPrEgGAXdhI
Apr 2 06:41     1       Apr 2 06:38       http://www.youtube.com/watch?v=zPrEgGAXdhI
Mar 28 07:01    5       Mar 28 06:43      http://demo.peerkit.com/static/index_demo.html
Mar 28 12:47    1       Mar 28 12:47      http://demo.peerkit.com/static/index_demo.html
Mar 22 06:51    2       Mar 22 06:18      http://blog.smartbear.com/software-quality/bid/275689/What-Makes-Beautiful-Software
Mar 22 07:07    1       Mar 22 07:01      http://blog.smartbear.com/software-quality/bid/275689/What-Makes-Beautiful-Software

In many of these cases, users are clearly using a loophole that allows submissions to be deleted by the person who submitted them. These users are then free to re-submit the same links, effectively increasing the amount of time that their submissions stay on the /newest page. There are on the order of hundreds of instances of this type of behavior, indicating that it's a well-known but not widely prevalent trick. It's probably a loophole that should be patched.
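One simple way to surface candidates for this behavior is to look for a URL whose score goes down between consecutive sightings, which should not happen for a single submission. The sketch below takes that approach; it is not the downloadable script, the function name is made up, and rows are assumed to be (href, snapshot_epoch, score) tuples.

```python
from collections import defaultdict


def find_score_drops(rows):
    """Return (href, earlier_epoch, earlier_score, later_epoch, later_score)
    for every pair of consecutive sightings where the score decreased."""
    by_href = defaultdict(list)
    for href, snapshot_epoch, score in rows:
        by_href[href].append((snapshot_epoch, score))
    drops = []
    for href, sightings in by_href.items():
        sightings.sort()
        for (t0, s0), (t1, s1) in zip(sightings, sightings[1:]):
            # Submissions cannot be downvoted, so a lower score suggests
            # that the later sighting is a fresh submission of the same URL.
            if s1 < s0:
                drops.append((href, t0, s0, t1, s1))
    return drops


# Toy data modeled on the first example above (timestamps are made up).
rows = [
    ("http://www.youtube.com/watch?v=zPrEgGAXdhI", 1000, 2),
    ("http://www.youtube.com/watch?v=zPrEgGAXdhI", 4540, 1),
    ("http://example.com/steady", 1000, 1),
    ("http://example.com/steady", 1300, 3),
]
for drop in find_score_drops(rows):
    print(drop)
```

A score drop is only a hint: it flags deletions followed by re-submission, but each hit still needs a look at the submission times to confirm.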

Summary

  • A submission will stay on the /newest page the longest after 9pm PST, and on Sundays. The worst times are between 6am PST and 3pm PST on weekdays. The absolute worst time to post is Tuesday at 8am PST, and the absolute best is Saturday or Sunday at 9pm PST, where "best" is defined as staying on the page longer.
  • Stories that get 10 or more upvotes while still on the /newest page tend to get their first upvote within 5 minutes of submission. If a submission hasn't gotten any upvotes in 20 minutes, it's highly unlikely that it will get at least 10 upvotes.
  • There is a loophole that allows you to submit, wait a few hours, delete the submission, and then re-submit to maximize time on the /newest page. This is probably not a worthwhile tactic, although there are hundreds of instances of it in the data.
  • Download the MySQL table dump, a Python script for generating data used in the figures, an R script for generating the plots here, and another Python script for finding instances of the submit-delete-resubmit phenomenon.
