Documentum Full Text Index Server

November 2, 2011

For faster and better search functionality, EMC developed a Full Text Index Server, which is installed separately from the content management software to provide index-based search. In version 5.2.5 SPx the full-text search engine was Verity. It was replaced by FAST (Fast Search & Transfer) from 5.3 SPx onwards, and FAST was in turn later replaced by xPlore.

In Verity, the attributes to be indexed have to be defined explicitly in the Content Server configuration, whereas one of the salient features of FAST is that, by default, all attributes are indexed along with the content of the document. Since FAST is no longer tightly coupled with the installation of the Content Server, one has the option of not installing the Index Server. If the Full Text Index Server is not installed, simple search performs a case-sensitive database search against the object_name, title and subject attributes of dm_sysobject and its subtypes.

This post describes the various components of Index Server and their operations.

1.   Software Components

Full-text indexing in a Documentum repository is controlled by three software components:

  • Content Server, which manages the objects in a repository, generates the events that trigger full-text indexing operations, queries the full-text indexes, and returns query results to client applications.
  • The index agent, which exports documents from a repository and prepares them for indexing.
  • The index server, which is a third-party server product that creates and maintains the full-text index for a repository. The index server also receives full-text queries from Content Server and responds to those queries.

2.   Set Up Configuration

a)  Basic Set Up

The basic indexing model consists of a single index agent and index server supporting a single repository. The index agent and index server may be installed on the Content Server host or on a different host.

b)  Consolidated Set Up

In a consolidated deployment, a single index server provides search and indexing services to multiple repositories. The repositories may be in the same Content Server installation or on different hosts. However, all repositories must be of the same Content Server version.

3.   Index Server Processes

The index server consists of five groups of processes that have different functions. Three of them are described below.

a) Document processors

Document processors (also sometimes called procservers) extract indexable content from content files, convert DFTXML to FIXML (a format that is used directly by the indexer), and merge the indexable content with the metadata during the DFTXML conversion process. Document processors are the largest consumer of CPU power in the index server.

b) Indexer

The indexer creates the searchable full-text index from the intermediate FIXML format. It consists of two processes. The frtsobj process interfaces with the document processor and spawns different findex processes as necessary to build the index from FIXML.

c) Query and Results servers

The QR Server (Query and Results Server) is a permanently-running process that accepts queries from Content Server, passes queries to the fsearch processes, and merges the results when there are multiple fsearch processes running.

The index server can run in continuous mode or in a special mode called suspended mode. In suspended mode, FIXML is generated for any updates to the index but is not integrated into the index. When the index server is taken out of suspended mode, the index is updated. Running in suspended mode speeds up the indexing process. Suspended mode should be used when a large volume of documents has to be indexed or an entire repository has to be re-indexed.

4.   Health check up for Index Server processes

Execute the following command from a command prompt:

nctrl sysstatus

This lists all the Index Server processes with their status.

Another option is to use the Index Server admin console at http://localhost:<portno.>/admin and navigate to the “System Management” tab.

Navigate to the “Matching Engines” tab for details on the total number of documents in all the filestores (if there are multiple repositories) and the number of documents processed by the Index Server. It also provides a link to the Index Server log file.

5.   How to determine Index Server ports

Using the Index Server base port, we can determine the ports for the various Index Server processes:

Index Server admin console: Base Port + 3000

FAST Search console: Base Port + 2100
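The two offsets above can be sketched as a small helper. This is only an illustration of the arithmetic: the function name is made up for this post, and the base port 13000 used in the example is an assumption (it is the value mentioned later for fds_base_port), so substitute the base port of your own install.

```python
# Offsets as stated in the post; everything else here is illustrative.
ADMIN_CONSOLE_OFFSET = 3000
SEARCH_CONSOLE_OFFSET = 2100


def index_server_ports(base_port):
    """Return the console ports derived from the Index Server base port."""
    return {
        "admin_console": base_port + ADMIN_CONSOLE_OFFSET,
        "fast_search_console": base_port + SEARCH_CONSOLE_OFFSET,
    }


if __name__ == "__main__":
    # 13000 is an assumed base port; use the one chosen at install time.
    for name, port in index_server_ports(13000).items():
        print(f"{name}: {port}")
```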

6.   Fulltext indexing queue messages

When a document has been marked and submitted for fulltext indexing, it is queued to the Index Agent/Index Server.

The fulltext index status can be checked with the following DQL query:

select sent_by, date_sent, item_name, content_type, task_state, message from dmi_queue_item where item_id = ”

The task_state can have one of the following values:

‘’ – The item is available to be picked up by an Index Agent for indexing.

‘acquired’ – The item is being processed. If an Index Agent stops abruptly, a queue item can be left in this state until the Index Agent is restarted or an Administrator clears the queue item.

‘warning’ – The item was indexed with a warning. Often this indicates that the content of the object failed to index but the metadata was indexed successfully. The ‘message’ attribute and the Index Agent log will have further details.

‘failed’ – The item failed to index. Refer to the ‘message’ attribute and the Index Agent log for more information.

‘done’ – The item was indexed successfully.
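When many queue items are returned, it can help to summarize them by task_state. The sketch below works on a plain list of task_state values that you have already fetched by whatever means you run DQL. The helper names and the sample data are hypothetical, and the meanings are paraphrased from the list above.

```python
from collections import Counter

# Meanings paraphrased from the list above; keys are task_state values.
TASK_STATE_MEANING = {
    "": "available to be picked up by an Index Agent",
    "acquired": "being processed by an Index Agent",
    "warning": "indexed with a warning (often metadata only)",
    "failed": "failed to index",
    "done": "indexed successfully",
}


def normalize(task_state):
    """Treat None and whitespace-only values as the empty state."""
    return (task_state or "").strip()


def summarize_task_states(task_states):
    """Count queue items per task_state and attach the meaning of each state."""
    counts = Counter(normalize(state) for state in task_states)
    return {
        state: (count, TASK_STATE_MEANING.get(state, "unknown state"))
        for state, count in counts.items()
    }


def needs_attention(task_state):
    """Items in 'warning' or 'failed' deserve a look at 'message' and the log."""
    return normalize(task_state) in {"warning", "failed"}


if __name__ == "__main__":
    sample = ["", "done", "failed", "done", "warning"]  # hypothetical results
    for state, (count, meaning) in summarize_task_states(sample).items():
        print(f"{state!r}: {count} item(s), {meaning}")
```

An item stuck in ‘acquired’ is not flagged here, since that state is normal while an Index Agent is running. Whether to flag it is a judgment call for your environment.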

7.   Index Agent Modes

An index agent may run in one of three operational modes:

normal

In normal mode, the index agent processes index queue items and prepares the SysObjects associated with the queue items for indexing. When the index agent successfully submits the object for indexing, the index agent deletes the queue item from the repository. If the object is not submitted successfully, the queue item remains in the repository and the error or warning generated by the attempt to index the object is stored in the queue item.

migration

In migration mode, the index agent processes all SysObjects in a repository sequentially in r_object_id order and prepares them for indexing. A special queue item, the high-water mark queue item, is used to mark the index agent’s progress in the repository.

An index agent in normal mode and an index agent in migration mode cannot simultaneously update the same index.

file

In file mode, a file is used to submit a list of object IDs to the index agent. This mode is used when a new index has been created and index verification has determined which objects are missing from the index.

8.   Switching modes of Index Agent

At the time of Index Agent set up the wizard gives an option to start the Index Agent under “Normal” or “Migration” mode.

The following steps should be performed to change the Index Agent from one mode to another.

1. Login to Index Agent Admin console through http://localhost:<index agent port no.>/IndexAgent<no.>/login.jsp

e.g., http://localhost:9081/IndexAgent1/login.jsp

2. Stop the Index Agent

3. Now change the Index Agent mode from Normal to Migration or Migration to Normal as the case may be.


4. Click on OK

5. Start the Index Agent again.

Note:

i) While in Migration mode Index Agent doesn’t appear in DA under Indexing Management tab. On the Index Agent admin screen it will provide the details of the no. of documents processed out of the total no. of documents.

ii) If the Index Agent service is restarted from services console then start the Index Agent from Index Agent Admin console or through DA under Indexing Management tab.

9.   Re-configuring Index Agent and FAST

Configuring a new Index Agent (IA) and FAST installation for a repository that was previously configured to work with a different IA and FAST does not modify the dm_ftengine_config object. As a result, the IA fails to start and displays an error while trying to connect to the old FAST machine.

To resolve:

Manually update the dm_ftengine_config object based on the settings from the new machine

1. Go to IAPI and execute the following API –

iapi> retrieve,c,dm_ftengine_config

2. Note the object_id retrieved by the above API and use it to execute the following API –

iapi> dump,c,l

3. In the dump results note the following param_name, param_value pairs

fds_base_port should match 13000 or the base port number for Index Server Install

fds_config_host should match the host name where the Index Server is installed.

and so on….

4. The param_name/param_value pairs should be changed to match the values for the index server install.

5. Delete the following via DQL:

delete dm_ftengine_config object where r_object_id = 'old_value'

delete dm_ftindex_agent_config object where r_object_id = 'old_value'

6. Run the index agent configuration program to create new index agent.

10.   Relocating fulltext indexes in Index Server

The following steps describe how we can change the location of fulltext indexes

 1. Shutdown the Index Agent

 2. Shutdown the Index Server

 3. Copy the indexes to the target location (both the fixml and the index directories)

 4. Default location of the fixml and index directories:

      for Windows – %DOCUMENTUM%/data/fulltext

      for Unix – $DOCUMENTUM/data/fulltext

 5. Edit the following:

 In Windows: %DOCUMENTUM%/fulltext/IndexServer/etc/searchrc-1.xml. Change the “index path”

 In Unix: $DOCUMENTUM/fulltext/IndexServer/etc/searchrc-1.xml. Change the “index path”

 6. Edit the following:

 In Windows: %DOCUMENTUM%/fulltext/IndexServer/etc/config_data/RTSearch/webcluster/rtsearchrc.xml. Change fixmlpath and fsearchdatasetdir to the new path

 In Unix: $DOCUMENTUM/fulltext/IndexServer/etc/config_data/RTSearch/webcluster/rtsearchrc.xml. Change fixmlpath and fsearchdatasetdir to the new path

 7. Startup IndexServer, Index Agent

11.   dm_FTCreateEvents Job

The Create Full-Text Events tool (dm_FTCreateEvents) may be used in two ways:

 a) To complete an upgrade by causing any objects missed by the pre-upgrade indexing operations to be indexed.

 The job generates events for each indexable object added to a repository between the time a new 5.3 or later full-text index is created for a 5.2.5 repository and the time the repository is upgraded to 5.3.

 This is the out-of-the-box behavior of the job.

 b) To generate the events required to re-index an entire 5.3 SP1 or later repository.

 Re-indexing the repository does not require deleting the existing index.

Please refer to the screenshot for the configuration of dm_FTCreateEvents Job –

To re-index an entire 5.3 SPx or later repository, the -full_reindex argument must be set to TRUE so that the job generates the required events.

The first time the job runs in its default mode, the job determines the last object indexed by an index agent running in migration mode and the date on which that object was indexed. The job searches for objects modified after that date and before the job runs for the first time and generates events for those objects. On its subsequent iterations, the job searches for objects modified after the end of the last iteration and before the beginning of the current iteration.

Before the job is run in a 5.3 SP1 or later repository with the -full_reindex argument set to TRUE, you must manually create a high-water-mark queue item (dmi_queue_item) using the API –

create,c,dmi_queue_item

save,c,l

and specify the r_object_id of the queue item as the -high_water_mark_id argument of the dm_FTCreateEvents Job.

In case you get the following error message in the job’s report –

FTCreateEvents was aborted. Error happened while processing job. Error: No high water mark found for qualification:

Verify the -high_water_mark_id argument and check whether the API above was executed after installation or re-installation of the index server to obtain the required r_object_id.

Disable the job if the application is not using Full Text Index Server.

The job can also be deactivated if the following events are registered for dm_fulltext_index_user:

  • dm_save
  • dm_destroy
  • dm_readonlysave
  • dm_checkin
  • dm_move_content

Execute the following query to verify the same:

select event from dmi_registry where user_name='dm_fulltext_index_user'
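Once the query above has returned its rows, the comparison against the five events can be done by eye or with a few lines of code. The sketch below assumes you have already collected the event names into a list. The function name and the sample input are made up for illustration.

```python
# The five events listed above for dm_fulltext_index_user.
REQUIRED_EVENTS = {
    "dm_save",
    "dm_destroy",
    "dm_readonlysave",
    "dm_checkin",
    "dm_move_content",
}


def missing_events(registered):
    """Return the required events absent from the registered ones, sorted."""
    return sorted(REQUIRED_EVENTS - {event.strip() for event in registered})


if __name__ == "__main__":
    # Hypothetical query result with three of the five events registered.
    rows = ["dm_save", "dm_checkin", "dm_destroy"]
    print(missing_events(rows) or "all required events are registered")
```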

12.   Using FT Integrity Tool

Modify the parameter file:

a) Log in to the index server host and navigate to the parameter file location.
On Windows, Drive:\Program Files\Documentum\IndexAgents\IndexAgentN\webapps\IndexAgentN

b) Open the ftintegrity.params.txt file in a text editor.

The first line is  -D repositoryname

where repositoryname is the repository for which you created a new index.

c) Add the following two lines immediately after the first line

-U username

-P password

where username is the user name of the Superuser whose account was used to install the index agent and password is the Superuser’s password.

 d) Save the ftintegrity.params.txt file to %Documentum%\fulltext\IndexServer\bin (Windows).

Sample FT Integrity params file

Note: The instructions above save a Superuser name and password to the file system in a plain text parameter file. For security reasons, you may wish to remove that information from the file after running the FTIntegrity tool. It is recommended that you save the parameter file in a location accessible only to the repository Superuser and installation owner.
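The edit in step c) can also be scripted. The following is a minimal sketch that inserts the -U and -P lines right after the first line of the parameter text. It assumes the first line starts with "-D " as described above. The function name, the repository name, the credentials and the extra "-x example" line are all placeholders, not real ftintegrity options.

```python
def add_credentials(params_text, username, password):
    """Insert -U and -P lines immediately after the leading -D line."""
    lines = params_text.splitlines()
    if not lines or not lines[0].startswith("-D "):
        raise ValueError("expected the first line to be '-D repositoryname'")
    updated = [lines[0], f"-U {username}", f"-P {password}", *lines[1:]]
    return "\n".join(updated) + "\n"


if __name__ == "__main__":
    # Placeholder content; a real file is generated by the index agent setup.
    original = "-D myrepo\n-x example\n"
    print(add_credentials(original, "dmadmin", "secret"), end="")
```

Since the result contains a plain text password, the note above about removing it after the run applies to any scripted version as well.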

 To run the index verification tool:

  1. Navigate to %Documentum%\fulltext\IndexServer\bin (Windows) or
  2. To verify both completeness and accuracy, open a command prompt and execute

           cobra ftintegrity.py -i ftintegrity.params.txt -m b

  3. To verify completeness only, open a command prompt and execute

           cobra ftintegrity.py -i ftintegrity.params.txt -m c

  4. To verify accuracy only and query all indexed objects, open a command prompt and execute

           cobra ftintegrity.py -i ftintegrity.params.txt -m a

FT Integrity generates 3 reports:

res-comp-common.txt – object IDs of all documents that are found in both the index and the repository.

res-comp-dctmonly.txt – object IDs of documents that are in the repository but not indexed.

res-comp-fastonly.txt – object IDs of documents that are in the index but not in the repository.

It also generates the ftintegrityoutput.txt file, which contains the console output in text form.

 To resubmit objects that failed indexing:

a) Navigate to %DOCUMENTUM%\fulltext\IndexServer\bin.

b) Copy the res-comp-dctmonly.txt file to

drive:\Program Files\Documentum\IndexAgents\IndexAgentN\webapps\IndexAgentN\WEB-INF\classes.

c) Rename the res-comp-dctmonly.txt file to ids.txt

The index agent periodically checks for the existence of ids.txt. If the file is found, the objects are resubmitted for indexing.
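Steps b) and c) amount to copying one file under a new name, which is easy to script if it has to be done often. Below is a minimal sketch. The function name is made up, both directories are passed in by the caller, and the demo at the bottom uses temporary folders and a made-up object ID instead of a real install.

```python
import shutil
import tempfile
from pathlib import Path

REPORT_NAME = "res-comp-dctmonly.txt"  # report produced by ftintegrity
IDS_NAME = "ids.txt"                   # file name the index agent looks for


def resubmit_missing_objects(report_dir, classes_dir):
    """Copy the 'repository only' report into classes_dir as ids.txt."""
    source = Path(report_dir) / REPORT_NAME
    target = Path(classes_dir) / IDS_NAME
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; run ftintegrity first")
    shutil.copyfile(source, target)
    return target


if __name__ == "__main__":
    # Demo with temporary folders standing in for the real directories.
    with tempfile.TemporaryDirectory() as bin_dir, \
            tempfile.TemporaryDirectory() as classes_dir:
        (Path(bin_dir) / REPORT_NAME).write_text("0900000180000001\n")
        print(resubmit_missing_objects(bin_dir, classes_dir).name)
```

Copying rather than moving keeps the original report in place, so it is still available for later comparison.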

13.   Moving From Index Server 5.3 SPx to D6.5 SPx

 a) Index Server D6.5 can use indexes created by a 5.3 SPx Index Server. This reduces the overhead of indexing the entire repository during upgrade.

 While uninstalling Index Server 5.3 SPx leave the checkbox for deleting the Indexes unchecked. Then during D6.5 Index Server installation point it to the existing indices folder as data folder.

b) When upgrading from a 5.3 SPx to a D6.5 Index Server, if the indices from 5.3 are preserved, the data folder should be on the same drive as the Index Server home directory.

c) The path for the attribute mapping XML is $Documentum\fulltext\fast, as against the $Documentum\fulltext\fast40 folder in 5.3 SPx.

 This path should be correct in dm_ftengine_config object for index based search to function properly.

d) Index Agent in D6.5 uses 20 consecutive ports, as against 1 port in 5.3 SPx. For example, if Index Agent 1 is running at port 9081, then Index Agent 2 cannot take any port from 9081 to 9100.

e) D6.5 doesn’t provide an option to switch Index Agent between different modes using Index Agent admin console.

f) The params file for using FT Integrity tool is placed under $Documentum\jboss4.2.0\server\DctmServer_IndexAgentN\deploy\IndexAgentN.war

Accordingly the ids.txt file should be placed under $Documentum\jboss4.2.0\server\DctmServer_IndexAgentN\deploy\IndexAgentN.war\WEB-INF\classes
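The port rule in point d) is easy to get wrong when several Index Agents share a host, so here is a small sketch of the check. It encodes only what is stated above: each D6.5 Index Agent occupies 20 consecutive ports beginning at its starting port. The function names are made up for illustration.

```python
PORTS_PER_AGENT = 20  # D6.5 Index Agent uses 20 consecutive ports


def agent_port_range(start_port):
    """Ports occupied by an Index Agent that starts at start_port."""
    return range(start_port, start_port + PORTS_PER_AGENT)


def ranges_overlap(start_a, start_b):
    """True if two Index Agents with these starting ports would collide."""
    return abs(start_a - start_b) < PORTS_PER_AGENT


if __name__ == "__main__":
    first = agent_port_range(9081)
    print(f"Index Agent 1 occupies {first[0]}-{first[-1]}")
    print("9100 collides:", ranges_overlap(9081, 9100))
    print("9101 collides:", ranges_overlap(9081, 9101))
```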

Hope this post is useful for all those developers who need more detailed, step-by-step information on Index Servers.
