More than once customers wanted me to tune queries where the LIKE clause made the query very slow…
The easy answer is : you cannot. If you want to search for some characters in the middle of the string, Oracle will not use an index.
Is it a correct answer? Maybe not.
Let’s imagine my application allow you to search for a string in the middle of the name, but to avoid scanning too large amount of data, the application enforces the string to be at least 3 characters long. For example ABC.
In this case, instead of doing a full table scan of the table to retrieve only a few rows, we can use an hint to tell Oracle to use an index on the name:
CREATE TABLE lsc_t AS
SELECT
ROWNUM ID,
SUBSTR(DBMS_RANDOM.STRING ('n', DBMS_RANDOM.VALUE (2, 8))
|| ' '
|| DBMS_RANDOM.STRING ('n', DBMS_RANDOM.VALUE (2, 8))
,1,17) NAME,
TRUNC(SYSDATE-ABS(10000*DBMS_RANDOM.NORMAL)) birthdate,
LPAD('X',4000,'X') address
FROM DUAL
CONNECT BY LEVEL <= 1e5;
ALTER TABLE lsc_t ADD PRIMARY KEY(ID);
CREATE INDEX lsc_i ON lsc_t(NAME);
EXEC dbms_stats.gather_table_stats(user,'LSC_T',cascade=>true)
let’s measure the time for a full table scan
SQL> set timi on
SQL> SELECT ID, NAME, birthdate
FROM lsc_t WHERE NAME LIKE '%ABC%';
ID NAME BIRTHDATE
--------- ----------------- ---------
60249 ABCBIFAB KRKBCRN 11-MAR-90
16714 AF YABCG 09-OCT-95
55571 BABCIQ GESGLW 27-MAR-50
77561 BP GABC 24-APR-90
80027 DALSABC TZLOAWDV 05-NOV-01
49817 EABCTFIY XWB 10-FEB-88
23283 EMMOGGBF DABCB 20-DEC-87
39530 FMABCKB AB 18-SEP-87
68605 FTPGOHE ABCC 28-SEP-28
74615 KIFDWABC CSSUQ 08-AUG-82
31772 KNOABCT BO 08-SEP-77
68730 KRYIEN LMABC 10-APR-07
43317 LUFJKZJT AUABCZR 19-DEC-88
76851 MZABC TEIFG 14-SEP-92
54589 NXE YABCDX 03-MAY-88
6940 OIWABCZ DLFFXY 29-MAR-88
59070 ONIB ADGABCI 29-JUL-07
27264 PGHOABC ZY 05-OCT-90
38157 QABC OPZHE 13-JUN-87
17511 QPDKD CIABCJ 08-AUG-69
25507 RX OWULOABC 24-FEB-92
62159 SEABC DAILK 25-JUN-02
3845 SK CCABCG 22-JAN-80
50059 SPABC BVHRHW 18-MAR-86
54700 UABCPC WUHAJS 28-OCT-71
70207 UKY OIDUABC 23-APR-88
39484 WABC TJLYHVJZ 14-MAR-78
14561 WDRWABC XZKDH 29-MAR-86
61501 YBYU RYABCGI 28-JUN-78
30578 YEWENGX ABCHARA 12-SEP-67
35397 YHBEABC HFKO 25-AUG-85
26450 YOABCVG HJT 23-DEC-98
87224 ZKNLNY YAABC 13-NOV-61
33 rows selected.
Elapsed: 00:00:02.56
about 3 seconds for retrieving 33 rows out of 100000
let’s try with an index
SQL> SELECT /*+INDEX(LSC_T,LSC_I)*/ ID, NAME, birthdate
FROM lsc_t WHERE NAME LIKE '%ABC%';
ID NAME BIRTHDATE
--------- ----------------- ---------
60249 ABCBIFAB KRKBCRN 11-MAR-90
16714 AF YABCG 09-OCT-95
55571 BABCIQ GESGLW 27-MAR-50
77561 BP GABC 24-APR-90
80027 DALSABC TZLOAWDV 05-NOV-01
49817 EABCTFIY XWB 10-FEB-88
23283 EMMOGGBF DABCB 20-DEC-87
39530 FMABCKB AB 18-SEP-87
68605 FTPGOHE ABCC 28-SEP-28
74615 KIFDWABC CSSUQ 08-AUG-82
31772 KNOABCT BO 08-SEP-77
68730 KRYIEN LMABC 10-APR-07
43317 LUFJKZJT AUABCZR 19-DEC-88
76851 MZABC TEIFG 14-SEP-92
54589 NXE YABCDX 03-MAY-88
6940 OIWABCZ DLFFXY 29-MAR-88
59070 ONIB ADGABCI 29-JUL-07
27264 PGHOABC ZY 05-OCT-90
38157 QABC OPZHE 13-JUN-87
17511 QPDKD CIABCJ 08-AUG-69
25507 RX OWULOABC 24-FEB-92
62159 SEABC DAILK 25-JUN-02
3845 SK CCABCG 22-JAN-80
50059 SPABC BVHRHW 18-MAR-86
54700 UABCPC WUHAJS 28-OCT-71
70207 UKY OIDUABC 23-APR-88
39484 WABC TJLYHVJZ 14-MAR-78
14561 WDRWABC XZKDH 29-MAR-86
61501 YBYU RYABCGI 28-JUN-78
30578 YEWENGX ABCHARA 12-SEP-67
35397 YHBEABC HFKO 25-AUG-85
26450 YOABCVG HJT 23-DEC-98
87224 ZKNLNY YAABC 13-NOV-61
33 rows selected.
Elapsed: 00:00:00.06
Much better 🙂
reposted due to % in url
Great example, but do you have any explanation as to how this works? This seems to defy conventional wisdom.
Ok, if you look at the explain plan, it is doing a full index scan. As the index is much smaller than the table, the full index scan is faster.
Here the explain plan :
SELECT /*+INDEX(LSC_T,LSC_I)*/
ID, NAME, birthdate
FROM lsc_t
WHERE NAME LIKE '%ABC%';
Execution Plan
----------------------------------------------------------
Plan hash value: 3948252342
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5048 | 128K| 5326 (1)| 00:01:04 |
| 1 | TABLE ACCESS BY INDEX ROWID| LSC_T | 5048 | 128K| 5326 (1)| 00:01:04 |
|* 2 | INDEX FULL SCAN | LSC_I | 5000 | | 324 (1)| 00:00:04 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("NAME" LIKE '%ABC%')
SELECT ID, NAME, birthdate
FROM lsc_t
WHERE NAME LIKE '%ABC%';
Execution Plan
----------------------------------------------------------
Plan hash value: 3022182369
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5048 | 128K| 21915 (1)| 00:04:23 |
|* 1 | TABLE ACCESS FULL| LSC_T | 5048 | 128K| 21915 (1)| 00:04:23 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME" LIKE '%ABC%')
Another benefit would be a smaller set to check if there were any NULLs. None in this set, but might be in others.
Excellent post Laurent. I never thought of that.
I performed the same test as you did here on my emrep db:
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 – 64bit Production
The only change I made in the test was to swap the order of the selects. First, I selected with the index hint:
40 rows selected.
Elapsed: 00:00:02.53
And second, I selected without the index hint:
40 rows selected.
Elapsed: 00:00:01.45
So is it possible that your results are largely the result of caching? I confirmed with autotrace that the index range scan is only done when I use the index hint, but I could not realize any performance benefit. The full table scan is always faster. The statistics that autotrace provides are practically identical.
I’m also interested in your thoughts about how an index range scan can even be performed for a like query like this, unless the range encompasses the entire index. 🙂
Hi Laurent,
Don’t use hints use a bind variable instead (Oracle 11g).
(I’m really curious about explaining this comportment while thinking at peek variable binding.)
mni@DIANA> variable b varchar2(10)
mni@DIANA> exec :b := ‘%ABC%’
ProcÚdure PL/SQL terminÚe avec succÞs.
mni@DIANA> set autotrace on
mni@DIANA> SELECT ID, NAME, birthdate
2 FROM lsc_t WHERE NAME LIKE :b;
ID NAME BIRTHDAT
———- —————————— ——–
23306 ABC SFNSK 15/07/02
63424 ABCK CURAX 25/09/72
24261 ABCSOJ UAEW 06/10/91
17789 ABCVFYJ OEL 19/10/07
94597 ANABCJF PXEWGJWD 28/03/92
…
40 ligne(s) sÚlectionnÚe(s).
Plan d’exÚcution
———————————————————-
Plan hash value: 2979532608
————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————-
| 0 | SELECT STATEMENT | | 5000 | 122K| 904 (0)| 00:00:11 |
| 1 | TABLE ACCESS BY INDEX ROWID| LSC_T | 5000 | 122K| 904 (0)| 00:00:11 |
|* 2 | INDEX RANGE SCAN | LSC_I | 900 | | 4 (0)| 00:00:01 |
————————————————————————————-
Predicate Information (identified by operation id):
—————————————————
2 – access(“NAME” LIKE :B)
filter(“NAME” LIKE :B)
Statistiques
———————————————————-
412 recursive calls
0 db block gets
434 consistent gets
0 physical reads
0 redo size
162871 bytes sent via SQL*Net to client
438 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
40 rows processed
Sorry,
I’ve forgot to say that I believe that yours example misses an dbms_random.seed as the first instruction in order to be reproducible
@Philip
Did you use the exact same command to create the table? including a somewhat largish address field?
The query that uses an index will generate about 300 logical IOs for scanning the indexes plus 1 LIO for each row returned , while the full scan table will generate 1 logical IO for each row in the table.
We are talking 340 LIOs vs. 100,000 LIOs.
I don’t see any way 100,000 LIOS will be faster. Cache or no cache. Not to mention the fact that 340 blocks fit in a much smaller cache.
Laurent, of course, cheated a bit by not mentioning the fact that each row takes a full block, making for an impressive 800M table. While the indexes take around 100 and 200 blocks each. Size matters.
Using the index will not work as well in less extreme cases. Maybe a responsible SQL author should mention this 🙂
Its late, and I’m tired. Of course, only one of the indexes was used.
Laurent,
Just when I begin to think that I am starting to understand how CBO works, posts like this take me back to square one (just kidding)…:)
But seriously, why, in this case, CBO did not go for (much better) INDEX SCAN on its own (i.e. without a hint)?
Hi,
on my configuration (10.2.04 8k block and default pctused 40 /pcfree 10 )
each row of this table takes one block(due to the address field).
A full table scan reads 100 000 rows=100 000 blocks
(my test autotrace -> consistent gets=100018)
my LSC_I contains 321 leafs so
A full index scan reads reads 321 blocks + 40 rows=40 blocks of LSC_T
Total 361 blocks.
autotrace gives consistent gets=362
and we retrieve only 0.04 percent of rows….
it’s a perfect example of using a hint:)
@philip no caching. did you compare your execution plans?
@marius use the index automatically in 11g? sounds nice… why a seed, you will very probably get about 30 rows with ABC anyway, right ?
@chen it is not cheating, it is written in the code! ok, 4000 chars is probably more than an average row len.
select avg(avg_row_len)
from user_tables
where avg_row_len!=0;
AVG
---
158
So 4000 is a bit extreme…
lol@responsible author !!! I am writting a blog without a responsible reviewer like Chen Shapira, who spent more than 100 hours for my book, but fortunately I sometimes receive funny comments in my blog 😉
@narendra The CBO does not know that %ABC% return less than one thousandth of the rows. But he maybe should know!
@christophe Thanks!
OK, I got it: it’s the high value of the clustering factor that makes the index unusable: 100 000 for 99 060 distinct values. I think also that there are few chances that this situation occurs in practice for a NAME column.
@Laurent, Oracle 10 xe also uses the index if the query is using binds.
Pingback: Blogroll Report 03/07/2009 – 10/07/2006 « Coskan’s Approach to Oracle
Cool… i have in my production enviroment some really hard SQL´s that read from various tables….
could you make some example ??
regards.
really nice post, buddy!
@Chen Shapira: Yes, I used the same commands — copied and pasted, as a matter of fact!
@Laurent SchneiderI follow the logic — fewer LIOs is going to be faster, and scanning a smaller index is going to have fewer LIOs than scanning the table.
But there must be something else in play in my version at least — I’m running this in the stock database (emrep) that comes with Grid Control. Here’s the full output when I re-ran, with the same results:
SQL*Plus: Release 10.1.0.4.0 - Production on Tue Jul 14 15:02:47 2009
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
SQL> set autotrace on
SQL> set lines 132 pages 50000
SQL> CREATE TABLE lsc_t AS
2 SELECT
3 ROWNUM ID,
4 SUBSTR(DBMS_RANDOM.STRING ('n', DBMS_RANDOM.VALUE (2, 8))
5 || ' '
6 || DBMS_RANDOM.STRING ('n', DBMS_RANDOM.VALUE (2, 8))
7 ,1,17) NAME,
8 TRUNC(SYSDATE-ABS(10000*DBMS_RANDOM.NORMAL)) birthdate,
9 LPAD('X',4000,'X') address
10 FROM DUAL
11 CONNECT BY LEVEL <= 1e5; Table created. SQL> set timing on
SQL> ALTER TABLE lsc_t ADD PRIMARY KEY(ID);
Table altered.
Elapsed: 00:00:06.40
SQL> CREATE INDEX lsc_i ON lsc_t(NAME);
Index created.
Elapsed: 00:00:01.93
SQL> EXEC dbms_stats.gather_table_stats(user,'LSC_T',cascade=>true)
PL/SQL procedure successfully completed.
Elapsed: 00:00:03.90
SQL> SELECT /*+INDEX(LSC_T,LSC_I)*/ ID, NAME, birthdate
2 FROM lsc_t WHERE NAME LIKE '%ABC%';
ID NAME BIRTHDATE
---------- ---------------- ---------
48499 ABC UY 15-NOV-72
90410 ABCFUFL NNJAICF 22-NOV-73
26299 ABCH MSI 17-DEC-79
9676 ABCP FMDQ 26-DEC-47
60078 ABCZVCP RGOOR 07-OCT-89
9600 FGTEC QABCONZN 13-DEC-92
52663 GK OABCA 12-DEC-00
7477 HMCRHJE YYBABC 22-MAR-06
1954 HMMYSABC GX 13-OCT-03
88834 IKFXJYW PABCH 22-DEC-98
91975 ILABCSH HBFU 17-MAY-74
49988 JGABC BDODNVTH 10-FEB-78
61072 JIABC JBTYSSX 28-JUL-74
75697 KABCF SIADWOC 02-NOV-87
98934 KKLHWABC XEZEUT 13-JUL-71
19202 LPR AIOABC 11-JUL-81
6272 LTVABC HKZON 28-DEC-08
87546 NLAXP ARUABCU 01-NOV-75
30942 OGVGME PKABC 12-NOV-03
16687 OIBWXNF AOABCUL 07-NOV-90
80854 OOKLU ABCI 15-NOV-00
36058 PLMAMB QABCIF 11-AUG-81
21598 RCBRLJ KABC 13-JUN-90
21773 REVABC TYIYJCJ 14-NOV-89
2734 RIDWABC EJG 04-JAN-98
31459 RPYOR ABCGNRU 17-SEP-94
55713 RVABCQF IDENHCMP 22-JUL-06
68639 SDSHIFD EXCABCJ 19-NOV-82
592 SRMTJPLM ABCGMC 01-JUL-01
14004 SROKHSY XXABCSLB 16-JUL-85
24813 TWABCU LALZZOOS 05-JUN-92
7814 VBIRR SABCCWSY 04-NOV-89
99557 WQ ABC 11-DEC-63
27146 XWCM ABCYDI 22-NOV-91
68633 ZKU DABCKFHS 23-JUN-99
35 rows selected.
Elapsed: 00:00:02.21
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5025 Card=5000 Bytes=130000)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'LSC_T' (TABLE) (Cost=5025 Card=5000 Bytes=130000)
2 1 INDEX (RANGE SCAN) OF 'LSC_I' (INDEX) (Cost=18 Card=5000)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
100325 consistent gets
99074 physical reads
0 redo size
2078 bytes sent via SQL*Net to client
686 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
35 rows processed
SQL> SELECT ID, NAME, birthdate
2 FROM lsc_t WHERE NAME LIKE '%ABC%';
ID NAME BIRTHDATE
---------- ---------------- ---------
592 SRMTJPLM ABCGMC 01-JUL-01
1954 HMMYSABC GX 13-OCT-03
2734 RIDWABC EJG 04-JAN-98
6272 LTVABC HKZON 28-DEC-08
7477 HMCRHJE YYBABC 22-MAR-06
7814 VBIRR SABCCWSY 04-NOV-89
9600 FGTEC QABCONZN 13-DEC-92
9676 ABCP FMDQ 26-DEC-47
14004 SROKHSY XXABCSLB 16-JUL-85
16687 OIBWXNF AOABCUL 07-NOV-90
19202 LPR AIOABC 11-JUL-81
21598 RCBRLJ KABC 13-JUN-90
21773 REVABC TYIYJCJ 14-NOV-89
24813 TWABCU LALZZOOS 05-JUN-92
26299 ABCH MSI 17-DEC-79
27146 XWCM ABCYDI 22-NOV-91
30942 OGVGME PKABC 12-NOV-03
31459 RPYOR ABCGNRU 17-SEP-94
36058 PLMAMB QABCIF 11-AUG-81
48499 ABC UY 15-NOV-72
49988 JGABC BDODNVTH 10-FEB-78
52663 GK OABCA 12-DEC-00
55713 RVABCQF IDENHCMP 22-JUL-06
60078 ABCZVCP RGOOR 07-OCT-89
61072 JIABC JBTYSSX 28-JUL-74
68633 ZKU DABCKFHS 23-JUN-99
68639 SDSHIFD EXCABCJ 19-NOV-82
75697 KABCF SIADWOC 02-NOV-87
80854 OOKLU ABCI 15-NOV-00
87546 NLAXP ARUABCU 01-NOV-75
88834 IKFXJYW PABCH 22-DEC-98
90410 ABCFUFL NNJAICF 22-NOV-73
91975 ILABCSH HBFU 17-MAY-74
98934 KKLHWABC XEZEUT 13-JUL-71
99557 WQ ABC 11-DEC-63
35 rows selected.
Elapsed: 00:00:01.47
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=9963 Card=5000 Bytes=130000)
1 0 TABLE ACCESS (FULL) OF 'LSC_T' (TABLE) (Cost=9963 Card=5000 Bytes=130000)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
100020 consistent gets
92438 physical reads
0 redo size
2078 bytes sent via SQL*Net to client
686 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
35 rows processed
SQL> drop table lsc_t;
Table dropped.
Elapsed: 00:00:00.41
SQL> quit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
@Philip
you need to run the statement more than once to avoid caching issues. Or you can flush the buffer cache between each run.
@Laurent Schneider I already did that the first time I ran the test. I ran the hinted and un-hinted statements multiple times, and the timing of each was very repeatable. The FTS was *always* faster.
I suspect that this may have to do with the database version. I’ll try to get off my lazy ass tomorrow and do the test in a 10.2.0.3 database.
Regardless, this was a very cool post about pointing out another way to think about the “everybody knows you can’t use an index with non-prefixed LIKE clauses” rule of thumb.
Interesting, Maybe the speed of your disk or some internal settings ?
@Philip: What baffles me, is the fact you have so much PIO in both cases. I tried the same on my MS Windows workstation, with 10.2.0.3, and the results are even more amazing: 0.60 secs on the hinted, vs 17.6 secs on the FTS. The hinted plan:
The not hinted one (and note the above LIO/PIO versus the plan here!):
That is about the same as you – you nust have a very small buffer…
@Philip
Hello I want to learn more on SQL. So give me a sample queries on student database for my learning.Thank u.