Tuesday, January 18, 2011

How fast can we replace multiple strings at once?

Always check out the original article at http://www.oraclequirks.com for latest comments, fixes and updates.

One of my favorite techniques for sending emails from Apex is to put the text of the message in a page region and then retrieve it through the apex dictionary. In most cases this text contains some substitution strings that i need to replace with a dynamic value, pretty much in the same way Apex does itself when rendering a page.
When i need to process a large number of emails, i usually "cache" the mail message body in a global variable initialized inside the package body initialization section, such that the apex repository is queried only when a new oracle session is created or the package body is invalidated for some reason.

Whether it is better to store all these email texts in a separate page or inside the relevant page with a display condition of "never" is a matter of taste, the most important aspect is in the fact that i am keeping these "user interface" components together with my application and when i export it, I can be sure that my email bodies will follow without the need of any additional operation. Moreover they are located in a place where they can be easily edited.

While i am using the aforementioned approach since years, this morning i decided to investigate a little further the performance aspects of replacing my custom substitution strings with their dynamic values, so i set up some search and replace benchmarking code in order to understand if there was a method better than others.
When it comes to measure optimizations like these, you must consider that most likely you won't perceive any visible difference unless you iterate the operation on a large scale. This is the reason why in the code below i had to repeat the same operation 100000 times to begin seeing some variation.

Also, when running such benchmarks, you should perform the operation in a controlled environment (no users kicking in while doing the job) and repeat the test cases several times to ensure that you are comparing consistent samples, excluding the first run where the parse time will adversely impact the measure.
That's why i repeated each test case five times recording the best and worst performance, excluding the first run of each test case.
-- first method: explicit IFs
declare
l_body1 varchar2(4000);
l_body2 varchar2(4000);
p_s0 varchar2(50) := '#CODICE#';
p_v0 varchar2(50) := 'SI0000001';
p_s1 varchar2(50) := '#DENOMINAZIONE#';
p_v1 varchar2(50) := 'prova';
p_s2 varchar2(50) := '#USERNAME#';
p_v2 varchar2(50) := 'U0000001';
p_s3 varchar2(50) := '#PASSWORD#';
p_v3 varchar2(50) := 'test';
p_s4 varchar2(50);
p_v4 varchar2(50);
p_s5 varchar2(50);
p_v5 varchar2(50);
p_s6 varchar2(50);
p_v6 varchar2(50);
p_s7 varchar2(50);
p_v7 varchar2(50);
p_s8 varchar2(50);
p_v8 varchar2(50);
p_s9 varchar2(50);
p_v9 varchar2(50);
begin
select region_source
into l_body1
from apex_application_page_regions
where application_id = 55001 and
page_id = 9999 and
static_id = 'mail_message';

for i in 1..100000 loop
if p_s0 is not null then
l_body2 := replace(l_body1, p_s0, p_v0);
else
l_body2 := l_body1;
end if;
if p_s1 is not null then
l_body2 := replace(l_body2, p_s1, p_v1);
end if;
if p_s2 is not null then
l_body2 := replace(l_body2, p_s2, p_v2);
end if;
if p_s3 is not null then
l_body2 := replace(l_body2, p_s3, p_v3);
end if;
if p_s4 is not null then
l_body2 := replace(l_body2, p_s4, p_v4);
end if;
if p_s5 is not null then
l_body2 := replace(l_body2, p_s5, p_v5);
end if;
if p_s6 is not null then
l_body2 := replace(l_body2, p_s6, p_v6);
end if;
if p_s7 is not null then
l_body2 := replace(l_body2, p_s7, p_v7);
end if;
if p_s8 is not null then
l_body2 := replace(l_body2, p_s8, p_v8);
end if;
if p_s9 is not null then
l_body2 := replace(l_body2, p_s9, p_v9);
end if;
end loop;
end;

-- best 1.635s, worst 1.716s

-- second method, brute force replace
declare
l_body1 varchar2(4000);
l_body2 varchar2(4000);
p_s0 varchar2(50) := '#CODICE#';
p_v0 varchar2(50) := 'SI0000001';
p_s1 varchar2(50) := '#DENOMINAZIONE#';
p_v1 varchar2(50) := 'prova';
p_s2 varchar2(50) := '#USERNAME#';
p_v2 varchar2(50) := 'U0000001';
p_s3 varchar2(50) := '#PASSWORD#';
p_v3 varchar2(50) := 'test';
p_s4 varchar2(50);
p_v4 varchar2(50);
p_s5 varchar2(50);
p_v5 varchar2(50);
p_s6 varchar2(50);
p_v6 varchar2(50);
p_s7 varchar2(50);
p_v7 varchar2(50);
p_s8 varchar2(50);
p_v8 varchar2(50);
p_s9 varchar2(50);
p_v9 varchar2(50);
begin
select region_source
into l_body1
from apex_application_page_regions
where application_id = 55001 and
page_id = 9999 and
static_id = 'mail_message';

for i in 1..100000 loop
l_body2 := replace(l_body1, p_s0, p_v0);
l_body2 := replace(l_body2, p_s1, p_v1);
l_body2 := replace(l_body2, p_s2, p_v2);
l_body2 := replace(l_body2, p_s3, p_v3);
l_body2 := replace(l_body2, p_s4, p_v4);
l_body2 := replace(l_body2, p_s5, p_v5);
l_body2 := replace(l_body2, p_s6, p_v6);
l_body2 := replace(l_body2, p_s7, p_v7);
l_body2 := replace(l_body2, p_s8, p_v8);
l_body2 := replace(l_body2, p_s9, p_v9);
end loop;
end;

-- best 1.766s, worst 1.867s

--third method, using collections as parameter for increased flexibility
create or replace
function multi_replace(
p_str in varchar2,
p_search_arr in apex_application_global.vc_arr2,
p_replace_arr in apex_application_global.vc_arr2)
return varchar2 as
l_str varchar2(32767);
begin

if p_search_arr.count != p_replace_arr.count then
raise_application_error(-20001, 'arrays count do not match');
end if;

l_str := p_str;
for i in 1..p_search_arr.count loop
l_str := replace(l_str, p_search_arr(i), p_replace_arr(i));
end loop;
return l_str;
end;
/
declare
l_body1 varchar2(4000);
l_body2 varchar2(4000);
l_s_tab apex_application_global.vc_arr2;
l_v_tab apex_application_global.vc_arr2;
begin
select region_source
into l_body1
from apex_application_page_regions
where application_id = 55001 and
page_id = 9999 and
static_id = 'mail_message';

l_s_tab(1) := '#CODICE#';
l_v_tab(1) := 'SI0000001';
l_s_tab(2) := '#DENOMINAZION#';
l_v_tab(2) := 'prova';
l_s_tab(3) := '#USERNAME#';
l_v_tab(3) := 'U0000001';
l_s_tab(4) := '#PASSWORD#';
l_v_tab(4) := 'test';

for i in 1..100000 loop
l_body2 := multi_replace(l_body1, l_s_tab, l_v_tab);
end loop;
end;

-- best 1.679s, worst 1.748s

So, the first method seems slightly faster, albeit the code is longer. The first technique could become a little bit faster if we assume that substitution strings are "dense", that is if no susbstitution string is given for p_s1, then we assume that subsequent strings must be all NULLs and replace the distinct IFs with a series of ELSIFs.
The third method is slightly slower but it adds some flexibility as there are no restrictions in terms of the total number of substitution strings to be processed and their "density".

Unless someone else finds a better way of doing this (please let me know!), the conclusion is that all the three methods perform in comparable times and you won't notice any practical difference regardless of the method, therefore I'd go for the third method because it gives maximum flexibility to the programmer.

2 comments:

Buzz Killington said...

I use Adrian Billington's freplace function all the time. It is great for creating templates with a bunch of substitution tokens and replacing them at run-time. Code is very small too which I like.

http://www.oracle-developer.net/content/utilities/replacef.sql

Byte64 said...

Hi Buzz,
i see however i don't like very much this implementation as it is strictly positional.
As far as i can see it only works for single occurrences of each token and you need to provide them in the exact order as they appear.
Considering that probably you'd get very similar execution times if not higher, I'd stick to my MULTI_REPLACE, but it could make sense to overload MULTI_REPLACE such that it can be executed inside SQL cursors whereas currently it works only inside PL/SQL.

Thank you!
Flavio

yes you can!

Two great ways to help us out with a minimal effort. Click on the Google Plus +1 button above or...
We appreciate your support!

latest articles