An Inside Look at Lean Domain Search's Brandable Domain Name Generation Algorithm | Lean Domain Search

URL:http://www.leandomainsearch.com/blog/42-an-inside-look-at-lean-domain-search-s-brandable-domain-name-generation-algorithm


16 May 2013

Posted by Matt Mazur (@mhmazur)

At the end of March I launched Lean Domain Search's new Brandable Domain Names section. Brandable domain names, for those of you not familiar with the term, are domain names that can be used for a wide variety of websites. Think names like Obsera and Innoviza. The names don't convey the site's purpose, so they can be branded for pretty much anything. Since the launch, over 1,000 brandable domain names have been released at a rate of 1 per hour. At the time of this writing, almost 20% of them have been registered.

I've received a few emails about how I generate these domain names, so I figured I'd write up a short blog post explaining the process. It's slightly complicated, but hopefully by the end you will have a pretty good idea of how it works. This tutorial won't walk through my actual code, though you are free to implement the algorithm on your own if you'd like to experiment with it.

How it Works

The key to generating good brandable domain names is to ensure that they are pronounceable. This is easier said than done, though. If you throw a bunch of letters together randomly, you'll more than likely wind up with something that is entirely unpronounceable. What you need is some list of letter combinations that can be pronounced easily. While not comprehensive, a standard English dictionary is a great place to start.

What if we took every English word that ends in US and replaced the US with an A?

The English word list might look like this:

Replacing the trailing US's with A's changes the names to:

By generating domain names based on English words that we already know are pronounceable, we've managed to generate a list of domain name ideas that are also mostly pronounceable. There are some exceptions: if the original word ended in IOUS or OUS, then it now ends in IOA or OA, which is not very pronounceable, but we can add some rules that say to disregard those when coming up with the actual list.
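Here's a minimal sketch of that idea in Python. The function name and the sample words are made up for illustration, and the only rule applied is the one mentioned above (throw away results that end in OA); it isn't the code Lean Domain Search actually runs.

```python
def swap_suffix(words, old="us", new="a", skip_result_endings=("oa",)):
    """Replace a trailing `old` with `new`, dropping results with awkward endings."""
    results = []
    for word in words:
        word = word.lower()
        if not word.endswith(old):
            continue  # only words that end in the old suffix qualify
        candidate = word[: -len(old)] + new
        if candidate.endswith(skip_result_endings):
            continue  # e.g. "famous" -> "famoa", "various" -> "varioa"
        results.append(candidate)
    return results


# Hypothetical sample input, not the word list used in the post.
sample_words = ["campus", "focus", "famous", "various", "table"]
print(swap_suffix(sample_words))  # ['campa', 'foca']
```

More rules can be added by extending `skip_result_endings` or by adding further checks inside the loop.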

Using a dictionary of common English words is a good start, but it's somewhat limited. In the list of common English words that I am using, there are 602 words that end in US. Not bad, but not a huge number either. Is there somewhere else we can look?

Using the Zone File for Inspiration

Enter VeriSign's zone file. This list, published daily, contains most of the registered .com domain names in existence. With over 100 million domain names and counting, this is an invaluable resource for generating domain name ideas. For every registered domain name out there, someone decided that it was good enough to pay money for, which means that it is more likely than not to be pronounceable. By looking at existing domain names and making slight modifications, we can generate domain names of our own that are also likely to be pronounceable. There are a few things we need to do first, though.

The zone file that I am using contains 609,888 .com domain names that end in US. If all we did was replace the trailing US with an A for all of those domain names you'd wind up with a pretty bad list of domain name ideas. For example, the original list would contain domain names such as:

Replacing the trailing US with A's results in:

Not a bad start, but there are still a lot of problems: some are not pronounceable, some contain numbers, and others contain words that make them unusable as company names (guerillamarketingpla will be read as Guerilla Marketing Pla, for example, a name no one would want to use). Some of these issues can be mitigated with various rules (only look at domain names of a certain length, ignore ones with numbers and dashes, etc.), but you still have the problem that some convey meaning. ourcampa would pass all of our rules, but it's still not a good domain name. What to do?

The Importance of Common Roots

The root of a domain name is the part of the domain name that does not contain its suffix or prefix. For example, in a domain name like github, hub is the suffix and git is the root. In a domain name like ourcampus, our is the prefix and campus is the root. What if instead we said that ourcamp is the root and us is the suffix? That's not what most of us would consider the root and suffix, but bear with me for a second.

What if you looked at all of the roots for domain names that end in US and compared them to the roots of all domain names that end in, say, IS? By looking at the roots that they have in common, we'll likely end up with a pretty good list of pronounceable roots. And because each of these roots is registered with multiple suffixes, there's a good chance that it doesn't have an actual meaning (for example, guerillamarketingplus would be a result for US, but guerillamarketingplis with an IS is unlikely to be registered, so guerillamarketingpl wouldn't make the list of common roots).
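The common-roots idea boils down to a set intersection. Here's a rough sketch in Python, assuming the registered names are already loaded as plain strings without the .com. The function names are my own for this example, and the tiny input list only reuses names mentioned in this post.

```python
def roots_for_suffix(domains, suffix):
    """Return the roots of all names ending in `suffix`, with the suffix cut off."""
    suffix = suffix.lower()
    return {
        name.lower()[: -len(suffix)]
        for name in domains
        if name.lower().endswith(suffix) and len(name) > len(suffix)
    }


def common_roots(domains, suffixes=("us", "is")):
    """Return the roots that are registered with every one of the given suffixes."""
    root_sets = [roots_for_suffix(domains, s) for s in suffixes]
    return set.intersection(*root_sets)


# Toy stand-in for the zone file, using names that appear in this post.
registered = ["vizalus", "vizalis", "guerillamarketingplus", "ourcampus"]
print(sorted(common_roots(registered)))  # ['vizal']
```

guerillamarketingpl and ourcamp drop out because nothing in the list pairs them with IS, which is exactly the filtering effect described above.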

There are 609,888 .com domain names that end in US, which means there are 609,888 roots for those domain names. There are 540,887 .com domain names that end with IS and therefore 540,887 roots. Of those, there are 26,782 roots in common. By adding a new suffix such as A to these common roots, you wind up with some pretty good domain name ideas. If you restrict it to results that are between 5 and 9 characters (all 1, 2, 3, and 4 letter .coms are registered and 10 or more for a brandable name tends to be too long), remove all of the names that contain numbers and dashes, and apply a few rules (no domain names that end in UA, YA, ZA, etc.), you can reduce the list to a mere 10,021 domain name ideas. These domains include:

To drive the point home: if you replace the trailing A with a US or an IS then those domain names are registered. For example, the presence of Vizala means that Vizalus and Vizalis are registered.
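The suffix and filtering step might look something like the sketch below. It assumes the 5 to 9 character limit applies to the name without the .com, and it only knows about the three banned endings named above (the real rule list is longer). The function name and the sample roots are hypothetical.

```python
import re

LETTERS_ONLY = re.compile(r"[a-z]+")


def make_candidates(roots, new_suffix="a", min_len=5, max_len=9,
                    banned_endings=("ua", "ya", "za")):
    """Append `new_suffix` to each root and keep only names that pass the rules."""
    candidates = []
    for root in sorted(roots):
        name = root.lower() + new_suffix
        if not min_len <= len(name) <= max_len:
            continue  # too short to be available or too long to be brandable
        if not LETTERS_ONLY.fullmatch(name):
            continue  # contains a number, a dash, or anything else that isn't a-z
        if name.endswith(banned_endings):
            continue  # awkward ending such as UA, YA, or ZA
        candidates.append(name)
    return candidates


# Hypothetical roots chosen to trip each rule once; only "vizal" survives.
roots = {"vizal", "guerillamarketingpl", "web2", "my-sit", "viz", "vertu"}
print(make_candidates(roots))  # ['vizala']
```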

At this point we have a list of domain name ideas but we haven't checked to see which are still available to register. After running them through an availability checking script, we're left with 1,211 domain names. These include:

Not bad, right?

The final step for me is to manually review these results. As good as this method is at generating available brandable domain names, it still comes up with some bad names so I wind up reviewing the results and selecting which ones to add to Lean Domain Search.

By playing with which suffixes it checks the roots for and what new suffixes to add to the common roots, this algorithm can be used to generate thousands of great available domain names.

Summary

To recap, here's how the algorithm works:

1. Determine all of the registered .com domain names that end with specific suffixes (US and IS in this case)
2. Figure out the roots for those domain names and determine which ones they have in common
3. Add a new suffix to those roots (A in this case)
4. Programmatically remove domain names that indicate low quality (numbers, dashes, length, certain letter combinations, etc)
5. Check which of those domain names are available
6. Manually review the results for quality
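The steps above can be strung together in a short Python sketch. Treat it as an illustration under a few assumptions rather than the real implementation: `generate_names` and its parameters are names I made up, the availability check is a stand-in that only looks the name up in the same list of registered names (a real check would query for availability, since the zone file doesn't contain every registered domain), and the manual review step has no code equivalent.

```python
import re


def generate_names(registered, suffixes=("us", "is"), new_suffix="a",
                   banned_endings=("ua", "ya", "za"), min_len=5, max_len=9):
    """Sketch of steps 1-5; `registered` holds names without the .com."""
    registered = {name.lower() for name in registered}

    # Steps 1-2: collect the roots for each suffix and keep the shared ones.
    root_sets = [
        {n[: -len(s)] for n in registered if n.endswith(s) and len(n) > len(s)}
        for s in suffixes
    ]
    shared_roots = set.intersection(*root_sets)

    # Step 3: add the new suffix.
    candidates = {root + new_suffix for root in shared_roots}

    # Step 4: drop names that fail the length, character, or ending rules.
    filtered = {
        c for c in candidates
        if min_len <= len(c) <= max_len
        and re.fullmatch(r"[a-z]+", c)
        and not c.endswith(banned_endings)
    }

    # Step 5 (stand-in): call a name available if it isn't in the input list.
    return sorted(filtered - registered)


# Made-up input: the "exampl..." names are placeholders, not real zone data.
registered = ["vizalus", "vizalis", "ourcampus",
              "examplus", "examplis", "exampla"]
print(generate_names(registered))  # ['vizala']
```

Changing `suffixes` and `new_suffix` is how you would experiment with other combinations.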

If you have any questions, feedback, or ideas on how to improve it, please feel free to leave a comment or email me at matt@leandomainsearch.com.

Thanks!
Matt
