Home » Server Options » Text & interMedia » Accent insensitive indexation (12.1.0.2)
Accent insensitive indexation [message #634261] Fri, 06 March 2015 11:40 Go to next message
Michel Cadot
Messages: 68618
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Here a question that has been sent to me and I can't answer.
We have an Oracle Text defaulted to French.
The question is: "is it possible to create a domain index in a way it is not sensitive to the accents?", that is (I think) it works in the same way if some accents are present or not in a text.
But maybe the question is not adequate, maybe the question should be "is there a way to search text so that it is accent insensitive?".

Or maybe the solution is the generic (not Oracle Text specific) one: use FRENCH_AI for NLS_SORT but I don't know if it will be sufficient to use with a domain index.

Waiting for your answers...

Re: Accent insensitive indexation [message #634268 is a reply to message #634261] Fri, 06 March 2015 16:43 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
If you create a lexer and set the base_letter attribute to yes for the lexer, then use that lexer in your index parameters, it will cause all letters with accents and other diacritical marks to be indexed without the accents and such and searching with or without accents will find results with or without accents. Please see the brief simplified demonstration below.

SCOTT@orcl12c> CREATE TABLE test_tab (id  NUMBER, test_col  VARCHAR2(60))
  2  /

Table created.

SCOTT@orcl12c> INSERT ALL
  2  INTO test_tab VALUES (1, 'très')
  3  INTO test_tab VALUES (2, 'tres')
  4  SELECT * FROM DUAL
  5  /

2 rows created.

SCOTT@orcl12c> BEGIN
  2    CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
  3    CTX_DDL.SET_ATTRIBUTE ('test_lex', 'BASE_LETTER', 'YES');
  4  END;
  5  /

PL/SQL procedure successfully completed.

SCOTT@orcl12c> CREATE INDEX test_idx ON test_tab (test_col) INDEXTYPE IS CTXSYS.CONTEXT
  2  PARAMETERS ('LEXER test_lex')
  3  /

Index created.

SCOTT@orcl12c> SELECT token_text FROM dr$test_idx$i
  2  /

TOKEN_TEXT
----------------------------------------------------------------
TRES

1 row selected.

SCOTT@orcl12c> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'très') > 0
  2  /

        ID TEST_COL
---------- ------------------------------------------------------------
         1 très
         2 tres

2 rows selected.

SCOTT@orcl12c> SELECT * FROM test_tab WHERE CONTAINS (test_col, 'tres') > 0
  2  /

        ID TEST_COL
---------- ------------------------------------------------------------
         1 très
         2 tres

2 rows selected.

Re: Accent insensitive indexation [message #634287 is a reply to message #634268] Sat, 07 March 2015 00:46 Go to previous message
Michel Cadot
Messages: 68618
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Many thanks, I will suggest this next Monday and tell you.

Previous Topic: Materialized view having oracle text index not returning results after complete refresh
Next Topic: Abbreviations and acronyms thesaurus
Goto Forum:
  


Current Time: Tue Mar 19 05:21:29 CDT 2024