Home » Server Options » Text & interMedia » Text Index return min of words specified (Oracle 11G)
Text Index return min of words specified [message #616613] Thu, 19 June 2014 01:11 Go to next message
sherlocksher
Messages: 14
Registered: June 2013
Junior Member
Hi,
We have a text index that is built on a multi column data store . The column names are DISC_CMPLNT_TX and MAINT_ACT_TX .

Our requirement is as follows :

1. The user can specify two types of words for searching. The first type is a REQUIRED KEYWORD and the second type is an OPTIONAL keyword.

2. Records that are returned should satisfy the condition where the required keyword is present and along with it ATLEAST 50% of the no of optional keywords should also be present .

For e.g. If the required keyword is RIGHT and the OPTIONAL keyword are SEAT, MAINTENANCE, ENGINE, SUPPLY,WING then records that contain the word RIGHT and three or more than three words from the optional keywords specified should be retrieved.

3. In addition, fuzzy search is switched on by default and each word can have synonyms.

4. For the REQUIRED keyword condition, I am able to specify the condition in the below manner where RHT,RT,R/H and RGT given below are synonyms of the word RIGHT.


SELECT a.ac_maint_id,
a.disc_cmplnt_tx,
a.MAINT_ACT_TX
score (1),
score (2)
FROM ac_maint a
WHERE project_id = 1261
AND contains (DISC_CMPLNT_TX,
'((?{RIGHT} = {RHT} = {RT} | {R/H} | {RGT}))',
1) > 0

5. Can somebody please clarify on how do I get atleast 50% of the OPTIONAL KEYWORDS. Also if the same OPTIONAL keyword is duplicated multiple times, then it should be treated as a single word.

Re: Text Index return min of words specified [message #616705 is a reply to message #616613] Thu, 19 June 2014 17:10 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
I believe the following meets your requirements and does so efficiently, using bind variables and one domain index access.

SCOTT@orcl12c> -- table:
SCOTT@orcl12c> CREATE TABLE ac_maint
  2    (ac_maint_id	 NUMBER,
  3  	project_id	 NUMBER,
  4  	disc_cmplnt_tx	 VARCHAR2(20),
  5  	maint_act_tx	 VARCHAR2(30))
  6  /

Table created.

SCOTT@orcl12c> -- data:
SCOTT@orcl12c> INSERT ALL
  2  INTO ac_maint VALUES ( 1, 1261, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
  3  INTO ac_maint VALUES ( 2, 1261, 'SEAT ENGINE SUPPLY', 'MAINTENANCE WING RIGHT')
  4  INTO ac_maint VALUES ( 3, 1261, 'RICHT', 'SEAT MAINTENANCE SUPPLE')
  5  INTO ac_maint VALUES ( 4, 1261, 'SEAT WING SUPPLE', 'RICHT')
  6  INTO ac_maint VALUES ( 5, 1261, 'R/H', 'ENGINE WING SUPPLE')
  7  INTO ac_maint VALUES ( 6, 1261, 'ENGINE WING SUPPLE', 'R/H')
  8  INTO ac_maint VALUES ( 7, 1261, 'RIGHT', 'WING WING WING')
  9  INTO ac_maint VALUES ( 8, 9999, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
 10  SELECT * FROM DUAL
 11  /

8 rows created.

SCOTT@orcl12c> INSERT INTO ac_maint
  2  SELECT ROWNUM + 10, 8888, NULL, SUBSTR (object_name, 1, 30)
  3  FROM   all_objects
  4  /

89921 rows created.

SCOTT@orcl12c> -- multi_column_datastore:
SCOTT@orcl12c> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('test_ds', 'COLUMNS', 'disc_cmplnt_tx, maint_act_tx');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- index:
SCOTT@orcl12c> CREATE INDEX text_index
  2  ON ac_maint (disc_cmplnt_tx)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  FILTER BY project_id
  5  PARAMETERS
  6    ('DATASTORE	test_ds')
  7  /

Index created.

SCOTT@orcl12c> -- thesaurus with synonyms:
SCOTT@orcl12c> BEGIN
  2    CTX_THES.CREATE_THESAURUS ('test_thes');
  3    CTX_THES.CREATE_RELATION  ('test_thes', 'RIGHT', 'SYN', 'RHT');
  4    CTX_THES.CREATE_RELATION  ('test_thes', 'RIGHT', 'SYN', 'RT');
  5    CTX_THES.CREATE_RELATION  ('test_thes', 'RIGHT', 'SYN', 'R/H');
  6    CTX_THES.CREATE_RELATION  ('test_thes', 'RIGHT', 'SYN', 'RGT');
  7  END;
  8  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- required and optional variables and values:
SCOTT@orcl12c> VARIABLE required VARCHAR2(100)
SCOTT@orcl12c> EXEC :required := 'RIGHT'

PL/SQL procedure successfully completed.

SCOTT@orcl12c> VARIABLE optional VARCHAR2(100)
SCOTT@orcl12c> EXEC :optional := 'SEAT, MAINTENANCE, ENGINE, SUPPLY,WING'

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- query:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx
  2  FROM   ac_maint a,
  3  	    (SELECT  TRIM (REGEXP_SUBSTR (:optional, '[^,]+', 1, ROWNUM)) word,
  4  		     COUNT (*) OVER () words
  5  	     FROM    DUAL
  6  	     CONNECT BY LEVEL <= REGEXP_COUNT (:optional, ',') + 1) t
  7  WHERE  CONTAINS
  8  	      (a.disc_cmplnt_tx,
  9  	       '(?{' || :required || '} OR SYN ({' || :required || '}, test_thes)) AND
 10  		SDATA (project_id = 1261) AND
 11  		(?{' || t.word || '} OR SYN ({' || t.word || '}, test_thes))',
 12  	       1) > 0
 13  GROUP  BY a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx, words
 14  HAVING COUNT (*) / t.words >= 0.5
 15  ORDER  BY COUNT (*) DESC
 16  /

AC_MAINT_ID DISC_CMPLNT_TX       MAINT_ACT_TX
----------- -------------------- ------------------------------
          2 SEAT ENGINE SUPPLY   MAINTENANCE WING RIGHT
          1 RIGHT SEAT ENGINE    MAINTENANCE SUPPLY WING
          4 SEAT WING SUPPLE     RICHT
          6 ENGINE WING SUPPLE   R/H
          5 R/H                  ENGINE WING SUPPLE
          3 RICHT                SEAT MAINTENANCE SUPPLE

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3029711498

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |    42 |  5586 |    16   (0)| 00:00:01 |
|   1 |  SORT ORDER BY                      |            |    42 |  5586 |    16   (0)| 00:00:01 |
|*  2 |   FILTER                            |            |       |       |            |          |
|   3 |    HASH GROUP BY                    |            |    42 |  5586 |    16   (0)| 00:00:01 |
|   4 |     NESTED LOOPS                    |            |    42 |  5586 |    16   (0)| 00:00:01 |
|   5 |      VIEW                           |            |     1 |    79 |     2   (0)| 00:00:01 |
|   6 |       WINDOW BUFFER                 |            |     1 |       |     2   (0)| 00:00:01 |
|   7 |        COUNT                        |            |       |       |            |          |
|*  8 |         CONNECT BY WITHOUT FILTERING|            |       |       |            |          |
|   9 |          FAST DUAL                  |            |     1 |       |     2   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID    | AC_MAINT   |    42 |  2268 |    16   (0)| 00:00:01 |
|* 11 |       DOMAIN INDEX                  | TEXT_INDEX |       |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(COUNT(*)/"WORDS">=0.5)
   8 - filter(LEVEL<= REGEXP_COUNT (:OPTIONAL,',')+1)
  11 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(?{'||:REQUIRED||'} OR SYN
              ({'||:REQUIRED||'}, test_thes)) AND            SDATA (project_id = 1261) AND
              (?{'||"T"."WORD"||'} OR SYN ({'||"T"."WORD"||'}, test_thes))',1)>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SCOTT@orcl12c>

Re: Text Index return min of words specified [message #616934 is a reply to message #616705] Mon, 23 June 2014 05:41 Go to previous messageGo to next message
sherlocksher
Messages: 14
Registered: June 2013
Junior Member
Thanks a lot Barbara ! . You are a lifesaver Very Happy
Some more questions.

1. Instead of using a create thesaurus, how can I use the synonyms directly for each OPTIONAL KEYWORD . We are using dynamic queries to retrieve data and insert records. Basically I need the text that needs to replace the below text with synonyms for t.word. Since we are using dynamic queries, I can build it and substitute it. What is the format in which it needs to be built.

OR SYN ({' || t.word || '}, test_thes)

2. Also , when the query is formed dynamically and the synonyms are added, how do we handle for cases where the synonym contains special characters like 'R/H'.

[Updated on: Mon, 23 June 2014 06:23]

Report message to a moderator

Re: Text Index return min of words specified [message #616975 is a reply to message #616934] Mon, 23 June 2014 13:27 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
The following revised demonstration does not use a thesaurus. Instead, it assumes that your synonyms will be passed within your variables with the | symbol between them, which means OR to Oracle Text. The brackets { and } are then wrapped around each individual word or synonym as part of the query to handle the special characters. I also made it so that it does fuzzy matching on the synonyms as well as the main words.

SCOTT@orcl12c> -- table:
SCOTT@orcl12c> CREATE TABLE ac_maint
  2    (ac_maint_id	 NUMBER,
  3  	project_id	 NUMBER,
  4  	disc_cmplnt_tx	 VARCHAR2(20),
  5  	maint_act_tx	 VARCHAR2(30))
  6  /

Table created.

SCOTT@orcl12c> -- data:
SCOTT@orcl12c> INSERT ALL
  2  INTO ac_maint VALUES ( 1, 1261, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
  3  INTO ac_maint VALUES ( 2, 1261, 'CHAIR ENGINE SUPPLY', 'MAINTENANCE WING RIGHT')
  4  INTO ac_maint VALUES ( 3, 1261, 'RICHT', 'CHAIR MAINTENANCE SUPPLE')
  5  INTO ac_maint VALUES ( 4, 1261, 'CHIAR WING SUPPLE', 'RICHT')
  6  INTO ac_maint VALUES ( 5, 1261, 'R/H', 'ENGINE WING SUPPLE')
  7  INTO ac_maint VALUES ( 6, 1261, 'ENGINE WING SUPPLE', 'R/H')
  8  INTO ac_maint VALUES ( 7, 1261, 'RIGHT', 'WING WING WING')
  9  INTO ac_maint VALUES ( 8, 9999, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
 10  SELECT * FROM DUAL
 11  /

8 rows created.

SCOTT@orcl12c> INSERT INTO ac_maint
  2  SELECT ROWNUM + 10, 8888, NULL, SUBSTR (object_name, 1, 30)
  3  FROM   all_objects
  4  /

89874 rows created.

SCOTT@orcl12c> -- multi_column_datastore:
SCOTT@orcl12c> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('test_ds', 'COLUMNS', 'disc_cmplnt_tx, maint_act_tx');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- index:
SCOTT@orcl12c> CREATE INDEX text_index
  2  ON ac_maint (disc_cmplnt_tx)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  FILTER BY project_id
  5  PARAMETERS
  6    ('DATASTORE	test_ds')
  7  /

Index created.

SCOTT@orcl12c> -- required and optional variables and values:
SCOTT@orcl12c> VARIABLE required VARCHAR2(100)
SCOTT@orcl12c> EXEC :required := 'RIGHT|RHT|RT|R/H|RGT'

PL/SQL procedure successfully completed.

SCOTT@orcl12c> VARIABLE optional VARCHAR2(100)
SCOTT@orcl12c> EXEC :optional := 'SEAT|CHAIR, MAINTENANCE, ENGINE, SUPPLY,WING'

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- query:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx
  2  FROM   ac_maint a,
  3  	    (SELECT  TRIM (REGEXP_SUBSTR (:optional, '[^,]+', 1, ROWNUM)) word,
  4  		     COUNT (*) OVER () words
  5  	     FROM    DUAL
  6  	     CONNECT BY LEVEL <= REGEXP_COUNT (:optional, ',') + 1) t
  7  WHERE  CONTAINS
  8  	      (a.disc_cmplnt_tx,
  9  	       '(?{' || REPLACE (:required, '|', '}|?{') || '}) AND
 10  		SDATA (project_id = 1261) AND
 11  		(?{' || REPLACE (t.word, '|', '}|?{') || '})',
 12  	       1) > 0
 13  GROUP  BY a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx, words
 14  HAVING COUNT (*) / t.words >= 0.5
 15  ORDER  BY COUNT (*) DESC
 16  /

AC_MAINT_ID DISC_CMPLNT_TX       MAINT_ACT_TX
----------- -------------------- ------------------------------
          2 CHAIR ENGINE SUPPLY  MAINTENANCE WING RIGHT
          1 RIGHT SEAT ENGINE    MAINTENANCE SUPPLY WING
          5 R/H                  ENGINE WING SUPPLE
          4 CHIAR WING SUPPLE    RICHT
          3 RICHT                CHAIR MAINTENANCE SUPPLE
          6 ENGINE WING SUPPLE   R/H

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3029711498

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |    42 |  5586 |    17   (6)| 00:00:01 |
|   1 |  SORT ORDER BY                      |            |    42 |  5586 |    17   (6)| 00:00:01 |
|*  2 |   FILTER                            |            |       |       |            |          |
|   3 |    HASH GROUP BY                    |            |    42 |  5586 |    17   (6)| 00:00:01 |
|   4 |     NESTED LOOPS                    |            |    42 |  5586 |    16   (0)| 00:00:01 |
|   5 |      VIEW                           |            |     1 |    79 |     2   (0)| 00:00:01 |
|   6 |       WINDOW BUFFER                 |            |     1 |       |     2   (0)| 00:00:01 |
|   7 |        COUNT                        |            |       |       |            |          |
|*  8 |         CONNECT BY WITHOUT FILTERING|            |       |       |            |          |
|   9 |          FAST DUAL                  |            |     1 |       |     2   (0)| 00:00:01 |
|  10 |      TABLE ACCESS BY INDEX ROWID    | AC_MAINT   |    42 |  2268 |    16   (0)| 00:00:01 |
|* 11 |       DOMAIN INDEX                  | TEXT_INDEX |       |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(COUNT(*)/"WORDS">=0.5)
   8 - filter(LEVEL<= REGEXP_COUNT (:OPTIONAL,',')+1)
  11 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(?{'||REPLACE(:REQUIRED,'|','}|?{'
              )||'}) AND            SDATA (project_id = 1261) AND
              (?{'||REPLACE("T"."WORD",'|','}|?{')||'})',1)>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SCOTT@orcl12c>

[Updated on: Mon, 23 June 2014 14:03]

Report message to a moderator

Re: Text Index return min of words specified [message #617050 is a reply to message #616975] Tue, 24 June 2014 07:35 Go to previous messageGo to next message
sherlocksher
Messages: 14
Registered: June 2013
Junior Member
Thanks again Barbara !.
Actually we are using some legacy code to retrieve records that satisfies this 50% optional keywords . Can you please check this logic and let me know your inputs on this.

1. Lets say the no of optional keywords is 6

2. Divide 100 by no of optional keywords which gives 16.66

3. Multiply 16.66 by ( 50% of No of optional keywords - 1) which translates to 16.66 * 2 and this comes to 32.

4. Now when we are forming the query text where instead of checking for contains text > 0, we check if the contains text is greater than the valued computed in step 3 i.e. 32

(?(({SEAT} = {SEATS}) , ({MAINTENANCE} = {MAINT}) , ({ENGINE} = {ENG} = {ENGINES}) , ({SUPPLY} = {SUPPY} = {SUP} = {SUPPL}) , ({WING} = {WINGS})))',1) > 32

Apparently this has "WORKED" for several years . I am not able to find any documentation or comments describing why this approach was followed and how this works.

If you have any inputs on how this works and its pitfalls, please let me know.
Re: Text Index return min of words specified [message #617075 is a reply to message #617050] Tue, 24 June 2014 13:41 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Brilliant! It uses the comma, which is the accum operator, whose scoring is based on the number of matching terms. That enables you to tell where to limit the score, based on the number of terms. Please see the following section of the online documentation for a more detailed explanation and examples:

http://docs.oracle.com/cd/E16655_01/text.121/e17747/cqoper.htm#CCREF0302

Using that method necessitates putting that in a separate contains clause, which requires two domain index hits, but that may still perform better than what I previously suggested.

Using the = operator seems to not be able to handle variables with / like R/H even when surrounded by { and }, so I would use | instead.

I believe you need to place your ? in front of each word if you want to apply fuzzy to each word. If you want to have more options you could even use the word fuzzy and it's parameters, instead of just ? which uses the default parameters.

I don't know how you are dynamicaly creating your query. I hope you are not concatenating variables. You should be using bind variables for your required and optional values for maximum efficiency.

Please see my revised demonstration below.

SCOTT@orcl12c> -- table:
SCOTT@orcl12c> CREATE TABLE ac_maint
  2    (ac_maint_id	 NUMBER,
  3  	project_id	 NUMBER,
  4  	disc_cmplnt_tx	 VARCHAR2(20),
  5  	maint_act_tx	 VARCHAR2(30))
  6  /

Table created.

SCOTT@orcl12c> -- data:
SCOTT@orcl12c> INSERT ALL
  2  INTO ac_maint VALUES ( 1, 1261, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
  3  INTO ac_maint VALUES ( 2, 1261, 'SEAT ENGINE SUPPLY', 'MAINTENANCE WING RIGHT')
  4  INTO ac_maint VALUES ( 3, 1261, 'RICHT', 'SEAT MAINTENANCE SUPPLE')
  5  INTO ac_maint VALUES ( 4, 1261, 'SEAT WING SUPPLE', 'RICHT')
  6  INTO ac_maint VALUES ( 5, 1261, 'R/H', 'ENGINE WING SUPPLE')
  7  INTO ac_maint VALUES ( 6, 1261, 'ENGONE WING SUPPLE', 'R/H')
  8  INTO ac_maint VALUES ( 7, 1261, 'RIGHT', 'WING WING WING')
  9  INTO ac_maint VALUES ( 8, 9999, 'RIGHT SEAT ENGINE', 'MAINTENANCE SUPPLY WING')
 10  SELECT * FROM DUAL
 11  /

8 rows created.

SCOTT@orcl12c> INSERT INTO ac_maint
  2  SELECT ROWNUM + 10, 8888, NULL, SUBSTR (object_name, 1, 30)
  3  FROM   all_objects
  4  /

89830 rows created.

SCOTT@orcl12c> -- multi_column_datastore:
SCOTT@orcl12c> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
  3    CTX_DDL.SET_ATTRIBUTE ('test_ds', 'COLUMNS', 'disc_cmplnt_tx, maint_act_tx');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- index:
SCOTT@orcl12c> CREATE INDEX text_index
  2  ON ac_maint (disc_cmplnt_tx)
  3  INDEXTYPE IS CTXSYS.CONTEXT
  4  FILTER BY project_id
  5  PARAMETERS ('DATASTORE  test_ds')
  6  /

Index created.

SCOTT@orcl12c> -- required and optional variables and values:
SCOTT@orcl12c> VARIABLE required VARCHAR2(100)
SCOTT@orcl12c> EXEC :required := 'RIGHT|RHT|RT|R/H|RGT'

PL/SQL procedure successfully completed.

SCOTT@orcl12c> VARIABLE optional VARCHAR2(100)
SCOTT@orcl12c> EXEC :optional := 'SEAT|SEATS,MAINTENANCE|MAINT,ENGINE|ENG|ENGINES,SUPPLY|SUPPY|SUP|SUPPL,WING|WINGS'

PL/SQL procedure successfully completed.

SCOTT@orcl12c> -- query:
SCOTT@orcl12c> SET AUTOTRACE ON EXPLAIN
SCOTT@orcl12c> SELECT a.ac_maint_id, a.disc_cmplnt_tx, a.maint_act_tx
  2  FROM   ac_maint a
  3  WHERE  CONTAINS
  4  	      (a.disc_cmplnt_tx,
  5  	       '(((?{' ||
  6  	       REPLACE (REPLACE (:required, '|', '}|?{'), ',', '}),(?{') ||
  7  	       '}))) AND
  8  	       SDATA (project_id = 1261)',
  9  	       1) > 0
 10  AND    CONTAINS
 11  	      (a.disc_cmplnt_tx,
 12  	       '(((?{' ||
 13  	       REPLACE (REPLACE (:optional, '|', '}|?{'), ',', '}),(?{') ||
 14  	       '})))',
 15  	       2) > (100 / (REGEXP_COUNT (:optional, ',') + 1)) *
 16  		    ((ROUND ((REGEXP_COUNT (:optional, ',') + 1) * 0.5)) - 1)
 17  ORDER  BY SCORE(2) DESC
 18  /

AC_MAINT_ID DISC_CMPLNT_TX       MAINT_ACT_TX
----------- -------------------- ------------------------------
          1 RIGHT SEAT ENGINE    MAINTENANCE SUPPLY WING
          2 SEAT ENGINE SUPPLY   MAINTENANCE WING RIGHT
          6 ENGONE WING SUPPLE   R/H
          4 SEAT WING SUPPLE     RICHT
          5 R/H                  ENGINE WING SUPPLE
          3 RICHT                SEAT MAINTENANCE SUPPLE

6 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1435967249

---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |     1 |    54 |     8   (0)| 00:00:01 |
|   1 |  SORT ORDER BY                       |            |     1 |    54 |     8   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| AC_MAINT   |     1 |    54 |     8   (0)| 00:00:01 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |            |       |       |            |          |
|   4 |     BITMAP AND                       |            |       |       |            |          |
|   5 |      BITMAP CONVERSION FROM ROWIDS   |            |       |       |            |          |
|   6 |       SORT ORDER BY                  |            |       |       |            |          |
|*  7 |        DOMAIN INDEX                  | TEXT_INDEX |       |       |     4   (0)| 00:00:01 |
|   8 |      BITMAP CONVERSION FROM ROWIDS   |            |       |       |            |          |
|   9 |       SORT ORDER BY                  |            |       |       |            |          |
|* 10 |        DOMAIN INDEX                  | TEXT_INDEX |       |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(((?{'||REPLACE(REPLACE(:REQUIRED,'
              |','}|?{'),',','}),(?{')||'}))) AND           SDATA (project_id = 1261)',1)>0 AND
              "CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(((?{'||REPLACE(REPLACE(:OPTIONAL,'|','}|?{'),','
              ,'}),(?{')||'})))',2)>100/( REGEXP_COUNT (:OPTIONAL,',')+1)*(ROUND(( REGEXP_COUNT
              (:OPTIONAL,',')+1)*0.5)-1))
  10 - access("CTXSYS"."CONTAINS"("A"."DISC_CMPLNT_TX",'(((?{'||REPLACE(REPLACE(:OPTIONAL,'
              |','}|?{'),',','}),(?{')||'})))',2)>100/( REGEXP_COUNT (:OPTIONAL,',')+1)*(ROUND((
              REGEXP_COUNT (:OPTIONAL,',')+1)*0.5)-1))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SCOTT@orcl12c>


Re: Text Index return min of words specified [message #617102 is a reply to message #617075] Wed, 25 June 2014 01:27 Go to previous messageGo to next message
sherlocksher
Messages: 14
Registered: June 2013
Junior Member
Thanks Barbara for the link.

a. Yes. We are using two domain index hits. One for searching required words with contains > 0 and the other for the optional words with contains > computed value.

b. The exact query that we are using is as follows
/* Formatted on 6/25/2014 11:55:59 AM (QP5 v5.227.12220.39724) */
SELECT a.ac_maint_id,
a.disc_cmplnt_tx,
a.MAINT_ACT_TX,
a.LGBK_CMT_TX,
score (1),
score (2)
FROM ac_maint a
WHERE project_id = 1261
AND contains (DISC_CMPLNT_TX,
'((?{RIGHT} = {RHT} = {RT} | {R/H} | {RGT}))',
1) > 0
AND contains (
DISC_CMPLNT_TX,
'((?{SEAT} = {SEATS}) , (?{ENTRY}) , (?{MAINTENANCE} = {MX} = {MTC} = {MAINT}) , (?{MUSIC}) , (?{ENGINE} = {ENGINES} = {ENG}) , (?{SUPPLY} = {SUP} = {SUPPL} = {SUPPY}) , (?{WING} = {WINGS}))',
2) > 40

c. We are not using bind variables. We are just forming the dynamic query with the concatenated variables. We will change the implementation to use bind variables as you suggested to improve performance.

Now using the OR operator is not giving the expected results when the text contains Special characters . Please see example below.

1. The scores in Step 6 for both ID 1 and 2 are 18 and 34 respectively whereas they should be identical. Since the word "C/B" is present for ID2 and we are searching for the synonym "A/C", it is matching the "C" of the synonym with the "C" of the text and returning a higher score.

2. I referred the link https://community.oracle.com/thread/2261631 where you had answered to use printjoins for special characters to ensure EQUIV operator works as expected and when I dropped the index and re-created with this , then the scores for both the records came out as 18 which was what we expected. Note : In this case both OR and EQUIV operator returned the same score



-- STEP 1
CREATE TABLE accumtbl
(
ID NUMBER,
disc_cmplnt_tx CLOB,
maint_act_tx CLOB
);

-- STEP 2
INSERT INTO accumtbl
VALUES (
1,
'RT ENG IGNITION WOULD NOT START ON PUSHBACK ON RT ENG. RESET CB ON RT ENG AFTER MANY ATTEMPT TO CLEAR FUEL OUT OF TAIL PIPE/OPS OK. MITCHELL',
'ABCD');

-- STEP 3

INSERT INTO accumtbl
VALUES (
2,
'RT ENG IGNITION WOULD NOT START ON PUSHBACK ON RT ENG. RESET C/B ON RT ENG AFTER MANY ATTEMPT TO CLEAR FUEL OUT OF TAIL PIPE/OPS OK. MITCHELL',
'DEFG');


-- STEP 4
BEGIN
CTX_DDL.CREATE_PREFERENCE ('test_ds', 'MULTI_COLUMN_DATASTORE');
CTX_DDL.SET_ATTRIBUTE ('test_ds',
'COLUMNS',
'disc_cmplnt_tx, maint_act_tx');
END;
/
-- STEP 5

CREATE INDEX accumtbl_idx
ON accumtbl (disc_cmplnt_tx)
INDEXTYPE IS ctxsys.context;


-- STEP 6

SELECT id,
disc_cmplnt_tx,
score (1),
score (2)
FROM accumtbl
WHERE contains (disc_cmplnt_tx,
'({RIGHT} | {R/H} | {RHT} = {RGT} = {RT})',
1) > 0
AND contains (
disc_cmplnt_tx,
'(({AIRCRAFT} = {ACFT} | {A/C}) , ({SEAT} = {SEATS}) , ({FUEL} = {FUELS}) , ({ENGINE} = {ENG} = {ENGINES}) , ({SUPPLY} = {SUPPY} = {SUP} = {SUPPL}) , ({WING} = {WINGS}))',
2) > 0;


-- STEP 7
DROP INDEX accumtbl_idx;

-- STEP 8

BEGIN
CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
CTX_DDL.SET_ATTRIBUTE ('test_lex', 'PRINTJOINS', '/');
END;
/

-- STEP 9

CREATE INDEX accumtbl_idx
ON accumtbl (disc_cmplnt_tx)
INDEXTYPE IS ctxsys.context
PARAMETERS ('LEXER test_lex');


-- STEP 10
SELECT id,
disc_cmplnt_tx,
score (1),
score (2)
FROM accumtbl
WHERE contains (disc_cmplnt_tx,
'({RIGHT} | {R/H} | {RHT} = {RGT} = {RT})',
1) > 0
AND contains (
disc_cmplnt_tx,
'(({AIRCRAFT} = {ACFT} | {A/C}) , ({SEAT} = {SEATS}) , ({FUEL} = {FUELS}) , ({ENGINE} = {ENG} = {ENGINES}) , ({SUPPLY} = {SUPPY} = {SUP} = {SUPPL}) , ({WING} = {WINGS}))',
2) > 0;

[Updated on: Wed, 25 June 2014 01:38]

Report message to a moderator

Re: Text Index return min of words specified [message #617167 is a reply to message #617102] Wed, 25 June 2014 12:40 Go to previous message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
Without the printjoins, it converts / to a space, so it tokenizes C/B as C and B and given A/C searches for A or C and thus finds the C from A/C as the C from C/B. In my previous testing, using = without setting / as a printjoin produced an error, but using OR did not. Apparently, setting the / as a printjoin solves both problems. So, it looks like you've got it all figured out correctly now. Thanks for providing the full explanation and code.
Previous Topic: CTXSYS fuzzy searching
Next Topic: Oracle Text index catsearch order by close match
Goto Forum:
  


Current Time: Tue Mar 19 05:22:14 CDT 2024