c# - \d less efficient than [0-9]

URL:http://stackoverflow.com/questions/16621738/d-less-efficient-than-0-9

I made a comment yesterday on an answer where someone had used [0123456789] in a regex rather than [0-9] or \d. I said it was probably more efficient to use a range or digit specifier than a character set.

I decided to test that out today and found, to my surprise, that (in the C# regex engine at least) \d appears to be less efficient than either of the other two, which don't seem to differ much from each other. Here is my test output over 10000 random strings of 1000 random characters each, 5077 of which actually contain a digit:

Regex \d took 00:00:00.2141226 result: 5077/10000
Regex [0-9] took 00:00:00.1357972 result: 5077/10000 63.42 % of first
Regex [0123456789] took 00:00:00.1388997 result: 5077/10000 64.87 % of first

It's a surprise to me for two reasons, and I would be interested if anyone can shed some light on them:

1. I would have thought the range would be implemented much more efficiently than the set.
2. I can't understand why \d is worse than [0-9]. Is there more to \d than simply shorthand for [0-9]?

Here is the test code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace SO_RegexPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            var rand = new Random(1234);
            var strings = new List<string>();
            //10K random strings
            for (var i = 0; i < 10000; i++)
            {
                //generate random string
                var sb = new StringBuilder();
                for (var c = 0; c < 1000; c++)
                {
                    //add a-z randomly
                    sb.Append((char)('a' + rand.Next(26)));
                }
                //in roughly 50% of them, put a digit
                if (rand.Next(2) == 0)
                {
                    //replace 1 char with a digit 0-9
                    sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
                }
                strings.Add(sb.ToString());
            }
            var baseTime = TestPerformance(strings, @"\d");
            Console.WriteLine();
            var testTime = TestPerformance(strings, "[0-9]");
            Console.WriteLine(" {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
            testTime = TestPerformance(strings, "[0123456789]");
            Console.WriteLine(" {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
        }

        //time how long the given pattern takes to scan every string, and count the matches
        private static TimeSpan TestPerformance(List<string> strings, string regex)
        {
            var sw = new Stopwatch();
            int successes = 0;
            var rex = new Regex(regex);
            sw.Start();
            foreach (var str in strings)
            {
                if (rex.Match(str).Success)
                {
                    successes++;
                }
            }
            sw.Stop();
            Console.Write("Regex {0,-12} took {1} result: {2}/{3}", regex, sw.Elapsed, successes, strings.Count);
            return sw.Elapsed;
        }
    }
}
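
One caveat about a harness like this: whichever pattern is timed first may also absorb one-time warm-up costs (for example, JIT compilation of the matching code), which could exaggerate the gap. Below is a minimal sketch of a variation that runs one untimed pass and then keeps the best of several timed passes. The class `RegexTiming`, the helper `Measure`, and the `rounds` parameter are hypothetical names introduced here for illustration and are not part of the original test code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Hypothetical helper (not from the original post): best-of-N timing with a warm-up pass.
    public static TimeSpan Measure(IReadOnlyList<string> inputs, string pattern,
                                   RegexOptions options = RegexOptions.None, int rounds = 5)
    {
        var regex = new Regex(pattern, options);

        // Untimed warm-up pass so one-time costs are not charged to this pattern.
        foreach (var input in inputs)
        {
            regex.IsMatch(input);
        }

        var best = TimeSpan.MaxValue;
        for (var round = 0; round < rounds; round++)
        {
            var sw = Stopwatch.StartNew();
            foreach (var input in inputs)
            {
                regex.IsMatch(input);
            }
            sw.Stop();
            if (sw.Elapsed < best)
            {
                best = sw.Elapsed;
            }
        }
        return best;
    }

    public static void Main()
    {
        // Tiny stand-in data set; the original test used 10000 random strings.
        var inputs = new List<string> { new string('a', 1000), new string('a', 999) + "7" };
        foreach (var pattern in new[] { @"\d", "[0-9]", "[0123456789]" })
        {
            Console.WriteLine("Regex {0,-12} best of 5: {1}", pattern, Measure(inputs, pattern));
        }
    }
}
```

Taking the minimum rather than the mean is a design choice that reduces noise from other activity on the machine. It does not by itself explain the difference between \d and [0-9].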

Maybe \d deals with locales. E.g. Hebrew uses letters for digits. – Barmar 2 days ago

Basically, when you have to deal with Unicode, then it is going to be much slower (since it has to do more checks). – nhahtdh 2 days ago

@Barmar Hebrew does not normally use letters for digits; it uses the same Latin numeral digits [0-9]. Letters can be substituted for digits, but this is a rare use and reserved for special terms. I would not expect a regex parser to match כ"ג יורדי סירה (with כ"ג being a substitute for 23). Also, as can be seen in Sina Iravanian's answer, Hebrew letters do not appear as valid matches for \d. – Yuval Adam 7 hours ago
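
The Unicode point raised in the comments can be checked directly. In .NET, \d matches any Unicode decimal digit (category Nd), not only ASCII 0-9, unless RegexOptions.ECMAScript is specified. The following is a small sketch of that behaviour. The class name `DigitClassDemo` and the sample characters are chosen here for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    public static void Main()
    {
        // U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit (category Nd) outside ASCII.
        const string arabicIndicThree = "\u0663";
        // U+05DB HEBREW LETTER KAF: a letter, not a decimal digit.
        const string hebrewKaf = "\u05DB";

        // \d covers every Unicode decimal digit by default.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"\d"));                           // True
        // The explicit range covers ASCII digits only.
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, "[0-9]"));                         // False
        // With ECMAScript-compliant behaviour, \d is restricted to [0-9].
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"\d", RegexOptions.ECMAScript));  // False
        // Hebrew letters are not matched by \d.
        Console.WriteLine(Regex.IsMatch(hebrewKaf, @"\d"));                                  // False
    }
}
```

So \d and [0-9] are not the same character class in .NET, which is consistent with \d needing more work per character. Whether that accounts for the whole timing difference measured above is not established by this sketch.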
