Nepal pictures, history, people, facts

 
TOPIC: regex Speed comparison of regex versus index, lc, and / /i
#40046
Ben Bullock (Visitor)
regex Speed comparison of regex versus index, lc, and / /i  
In a recent discussion on this newsgroup, it was mentioned that index is better than regular expressions for matching fixed strings. Coincidentally, I've recently been setting up a search system for a fairly large volume of text files (about 30 megabytes). As a first approximation I wrote a simple routine that opens each file and searches it for the string using index. To test the proposition that index is better than regexes, I also tried doing the same job with a regex. The regex version turned out to be almost identical in speed to the index version (within a few percent), which leads me to think that under the bonnet index is probably just using a regex anyway.

Furthermore, the biggest bottleneck in the code wasn't the pattern matching but the use of lc to convert the text to lower case for case-insensitive search. Saving all the text files in lower case beforehand, instead of converting the strings with lc, cut the total execution time by more than half, so the difference between index and regexes was insignificant compared to the time spent converting to lower case. I also found that the /i modifier made the regex run drastically slower in the same way.

Another big bottleneck was decoding the files. Opening the (UTF-8 encoded) files with

open my $file, '<:utf8', $filename;

saved about 30% of the total execution time compared to converting the text after reading it in.

So my conclusion is that index isn't necessary and one can always use regexes (unless anyone can prove otherwise), and that Perl's regexes are so fast that they may not be much of a bottleneck. But why lc should be so slow I don't know.
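As a rough illustration of the comparison described above, here is a minimal sketch using Perl's core Benchmark module. It is not the original poster's code (which wasn't posted); the sample text, the needle and the variant names are made-up assumptions. Every sub answers the same yes/no question, so the table compares index against a literal regex, and pre-lowered text against lc and /i.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up stand-ins for the real data: roughly 1.4 MB of mixed-case
# filler with the search term at the very end.
my $needle  = 'search term';    # the query, already in lower case
my $text    = ('Some Mixed Case Filler Text. ' x 50_000) . 'Search Term';
my $lc_text = lc $text;         # "saved as lower case" ahead of time

# Each sub answers the same question: does the text contain $needle,
# ignoring case?
my %search = (
    index_prelowered => sub { index($lc_text, $needle) >= 0 },
    regex_prelowered => sub { $lc_text =~ /\Q$needle\E/ },
    index_lc         => sub { index(lc($text), $needle) >= 0 },
    regex_i          => sub { $text =~ /\Q$needle\E/i },
);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%search);
```

A negative count such as -1 tells Benchmark to run each sub for at least that many CPU seconds rather than a fixed number of iterations.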
 
#40047
John W. Krahn (Visitor)
Re: regex Speed comparison of regex versus index, lc, and / /i
> [...] the biggest bottleneck in the code wasn't the pattern matching, but the use of lc to convert the text into lower case for case insensitive search. [...]
> But why lc should be so slow I don't know.

I assume that most of the slowdown was caused by the introduction of the use of UTF, etc.

John
 
#40048
Ben Bullock (Visitor)
Re: regex Speed comparison of regex versus index, lc, and / /i
> I assume that most of the slowdown was caused by the introduction of the use of UTF, etc.

No, the lc-related slowdown happened even when I read the files in as bytes and did not convert them into anything. I'm sure of this because I switched to UTF-8 halfway through coding, because of an unrelated problem, and by that point I had already noticed that lc or / /i more than doubled the program's execution time. In fact, at the same time as I converted the searched files to UTF-8, I also converted them to lower case.
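The two ways of handling UTF-8 input discussed in this thread (decoding in the I/O layer versus decoding after reading), plus the one-off lower-casing of the searched files, can be sketched as below. The sub names are hypothetical. The sketch uses the :encoding(UTF-8) layer rather than the :utf8 layer from the first post, because :utf8 does not validate the byte sequences it reads, while :encoding(UTF-8) does.

```perl
use strict;
use warnings;
use feature 'unicode_strings';    # Unicode rules for lc on every string
use Encode qw(decode);

# Hypothetical loader 1: let the I/O layer decode UTF-8 while reading.
sub slurp_with_layer {
    my ($filename) = @_;
    open my $fh, '<:encoding(UTF-8)', $filename
        or die "Cannot open $filename: $!";
    local $/;                      # undef $/ = slurp the whole file
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Hypothetical loader 2: read raw bytes, then decode them in a second pass.
sub slurp_then_decode {
    my ($filename) = @_;
    open my $fh, '<:raw', $filename or die "Cannot open $filename: $!";
    local $/;
    my $bytes = <$fh>;
    close $fh;
    return decode('UTF-8', $bytes);
}

# One-off preprocessing: write a lower-cased UTF-8 copy of a file so the
# search loop never has to call lc (or use /i) at all.
sub save_lowercased_copy {
    my ($in, $out) = @_;
    my $text = slurp_with_layer($in);
    open my $fh, '>:encoding(UTF-8)', $out or die "Cannot write $out: $!";
    print {$fh} lc $text;
    close $fh or die "Cannot close $out: $!";
    return;
}
```

Both loaders return the same character string; only where the decoding work happens differs, which is what a timing comparison between them would measure.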
 
#40049
nolo contendere (Visitor)
Re: regex Speed comparison of regex versus index, lc, and / /i
>> Just the opposite. AFAIK if searching for a literal string (as opposed to a regular expression pattern) the regexp engine will use the same algorithm as index().
>
> I don't know what it does internally, but actually using non-literal strings in the regular expression match, like something|else or first.*second, did not result in a significant slowdown. The search string did not change at all during the execution of the program, so the regular expression would only have been compiled once.

Could you post the code you used to compare, as well as the output? I'm assuming you used Benchmark; please correct me if I'm wrong. Also, what's the output of perl -V, and what are your system specs?
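The quoted point that the search string never changes, so the regex is compiled only once, can be made explicit by building the pattern with qr// before looping over the files. This is a sketch under assumptions: files_matching is a hypothetical helper, and its $is_literal flag decides whether the query is quoted with \Q...\E (a fixed-string search, the case where index would also work) or used as a pattern such as something|else or first.*second.

```perl
use strict;
use warnings;

# Hypothetical helper: return the files whose contents match $query.
# With $is_literal true, \Q...\E quotes the query so that characters
# such as '.' and '|' match themselves (a fixed-string search).
sub files_matching {
    my ($query, $is_literal, @files) = @_;

    # Build the regex once, before looping over the files.
    my $re = $is_literal ? qr/\Q$query\E/ : qr/$query/;

    my @hits;
    for my $file (@files) {
        open my $fh, '<:encoding(UTF-8)', $file
            or die "Cannot open $file: $!";
        my $text = do { local $/; <$fh> };   # slurp the whole file
        close $fh;
        push @hits, $file if $text =~ $re;   # reuses the compiled regex
    }
    return @hits;
}
```

For example, files_matching('first.*second', 0, @files) treats the query as a pattern, while files_matching('first.*second', 1, @files) only finds files containing that exact text.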
 
#40050
Re: regex Speed comparison of regex versus index, lc, and / /i
> So my conclusion is that index isn't necessary and one can always use regexes

True. On the other hand, Perl isn't necessary and one can always use other languages. Computers aren't necessary and one can always use paper and pencil. Where is this headed?

And of course, if you are interested in where the string matches (i.e. the return value of index, and not just whether or not it is -1), then it is simpler to get that from index than from a regex.

Xho
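The point about match positions can be shown side by side. index returns the offset directly (or -1), while with a regex the start offset of the last successful match has to be read from the @- array, and finding every occurrence needs either a start offset for index or a //g loop. The strings here are arbitrary examples.

```perl
use strict;
use warnings;

my $text   = 'the quick brown fox';
my $needle = 'brown';

# index gives the offset of the first occurrence directly, or -1.
my $pos_index = index($text, $needle);

# With a regex, the start offset of the last successful match is $-[0].
my $pos_regex = $text =~ /\Q$needle\E/ ? $-[0] : -1;

print "index: $pos_index, regex: $pos_regex\n";    # prints "index: 10, regex: 10"

# Every occurrence: index with a start offset, versus a //g loop.
my $hay = 'abcabcabc';
my @by_index;
for (my $p = index($hay, 'bc'); $p >= 0; $p = index($hay, 'bc', $p + 1)) {
    push @by_index, $p;
}
my @by_regex;
while ($hay =~ /bc/g) {
    push @by_regex, $-[0];
}
print "@by_index | @by_regex\n";                   # prints "1 4 7 | 1 4 7"
```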
 