Software in Review

dtSearch 7.0 review

By Jem Matzan

When you reach a point of data supersaturation -- when the amount of data you have is so great that it has become unmanageable -- you need a tool to sort through it. Most individual computer users will never reach this point because single-user data tends to be replaced or deleted when it becomes irrelevant. Businesses, on the other hand, often need to keep every document, email, log file, and chat session on file for many years. Even when your company is well organized, you will still need software to index and search the data. That's where dtSearch comes in -- it's a suite of desktop and network search tools that can end the hassle associated with managing large amounts of data.

The dtSearch suite

dtSearch is composed of several different parts: dtSearch Desktop, CD Wizard, Indexer, and dtSearch Web.

The Indexer creates indexes for dtSearch. You select which drives or directories you want to include in the index, then create or update it. You can maintain several different indexes, if necessary. Indexing is a fairly slow process -- even on a fast machine -- because it is disk-intensive, and the hard drive is the slowest part of the computing equation. For businesses serious about using dtSearch on a single server (as opposed to a cluster), I highly recommend a dedicated RAID 0+1 or RAID 5 array for safe and efficient storage of the index. Each index will be large -- not far short of the size of the data being indexed. I created a test index of a 4.4GB portion of my /home directory and came out with a 3.3GB index, or about three quarters the size of the source data. Indexes are limited to 1TB in size, but a program included with the dtSearch suite can create a meta index out of several indexes, thereby eliminating the 1TB limit.
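
For storage planning, the figures above can be turned into a rough back-of-the-envelope estimate. The sketch below is not part of dtSearch; it assumes the index-to-data ratio from my single test (3.3GB of index for 4.4GB of data) holds for other data sets, which it may not, and it treats the 1TB limit as 1024GB. The function names are made up for illustration.

```python
import math

# Assumption: the ratio observed in this review's one test index.
# Real ratios will vary with the mix of file types being indexed.
OBSERVED_RATIO = 3.3 / 4.4

# Assumption: the 1TB per-index limit is taken here as 1024 GB.
INDEX_LIMIT_GB = 1024


def estimate_index_gb(data_gb, ratio=OBSERVED_RATIO):
    """Estimate the index size in GB for a given amount of source data."""
    return data_gb * ratio


def indexes_needed(data_gb, ratio=OBSERVED_RATIO, limit_gb=INDEX_LIMIT_GB):
    """Minimum number of indexes so that none exceeds the per-index limit."""
    return max(1, math.ceil(estimate_index_gb(data_gb, ratio) / limit_gb))


if __name__ == "__main__":
    # Print data size, estimated index size, and index count for a few cases.
    for data_gb in (4.4, 500, 4000):
        print(data_gb, round(estimate_index_gb(data_gb), 1), indexes_needed(data_gb))
```

By this estimate, anything beyond roughly 1.3TB of source data would need to be split across several indexes and tied together with a meta index.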

Click here for a screen shot of dtSearch Indexer

Once you have an index, the other tools in the suite can be used. To search a shared or local machine's index, you'd use dtSearch Desktop. It's a standalone application that offers a variety of search methods for querying a selected index: fuzzy, phonic, natural language, Boolean, and proximity. dtSearch Desktop will search through nearly any kind of file: HTML, XML, TXT, PDF, Word DOC, Excel XLS, PowerPoint PPT, WordPerfect WPD, RTF, ZIP, and email MBOX files. It'll even find readable text strings in binary files.
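
For readers unfamiliar with the terminology, here is a minimal sketch of what fuzzy and proximity searching mean in general. It is not dtSearch's algorithm, which is proprietary and works against a prebuilt index; this toy version scans a list of words directly, uses plain Levenshtein edit distance for the fuzzy match, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def fuzzy_match(term, tokens, max_edits=1):
    """Fuzzy search: return the tokens within max_edits edits of term."""
    return [t for t in tokens if edit_distance(term, t) <= max_edits]


def proximity_match(term_a, term_b, tokens, window=5):
    """Proximity search: True if the two terms occur within `window` words."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)
```

A fuzzy search for the misspelling "repart" would still find documents containing "report", and a proximity search would match "report" and "auditor" only when they appear close together rather than anywhere in the same file.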

Click here for a screen shot of dtSearch Desktop

The CD Wizard creates searchable indexes for backup CDs and DVDs. On each disc is an index and a self-contained search tool that resembles a Web browser. From there you can search the contents of the disc to see what's on it. The CD Wizard doesn't write directly to discs or create ISO images -- all it does is provide a directory structure and search mechanism for the disc project. So to write the CD, you'd start your preferred CD writing application and start a new data project. Then you'd select the data and directory structure from the folder you created with the CD Wizard, and write the CD. When the CD is finished, you can put it into any Windows-based computer and it'll autorun the dtSearch engine interface to navigate the CD's contents.

Click here for a screen shot of CD Wizard

dtSearch Web is a Web-based search tool that converts all readable files (listed above) into basic HTML pages for online viewing, much as Google converts PDF documents into Web pages in its search results. The big downside to this program is its dependence on Microsoft's IIS Web server, which has historically been a security nightmare. It also requires that the server run Windows, which, according to Netcraft, has barely 20% of the Web server market.

Click here for a screen shot of dtSearch Web

Also included with the dtSearch suite is a diagnostic tool that analyzes system information and compiles it all into an XML-formatted report. From within the diagnostic tool you can combine these reports and dtSearch logs into a zip file. If there are problems with the index or with the software, this makes it very easy to email all of the necessary information to the system administrator.

Lastly, there is a small program that comes with dtSearch which extracts readable text strings from binary files.
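
The general idea is the same as the Unix strings utility. The sketch below is my own illustration of the technique, not the dtSearch tool itself: it assumes "readable" means a run of at least four printable ASCII characters, and the function name is hypothetical.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII (space through ~) of at least min_length bytes."""
    # Assumption: only 7-bit printable ASCII counts as readable text.
    pattern = re.compile(rb"[ -~]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    sample = b"\x00\x01Hello, world\x00\xffab\x00dtSearch\x02"
    # Short runs such as "ab" fall below the default minimum and are skipped.
    print(extract_strings(sample))
```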

dtSearch Java, .NET, and C++ libraries

dtSearch can be integrated directly into Java, C++, C#, ASP.NET, Delphi, and .NET programs. I tried to test the Java and C++ demonstration code on 64-bit Gentoo Linux, but neither would execute due to library dependency problems. I suspect this may mean that the dtSearch code is not 64-bit clean; there is no 64-bit edition of any part of the software available at this time.

More information on the dtSearch APIs can be found here.

Summary

Overall, I found dtSearch 7 to be extremely useful as a desktop and network search tool. It goes far beyond the puny standalone applications meant for consumer-grade searching of small amounts of data. dtSearch is an unusual enterprise search product in that it is easily implemented; a fool could set up and use dtSearch, and any moderately experienced developer could implement the search libraries in a program. Considering all it can do and its lack of per-seat licensing requirements, dtSearch is quite reasonably priced.

Don't ignore the need for robust storage with dtSearch. If you're going to share indexes over a network, you will at the very least need a decent dedicated server with SCSI or SATA RAID. If you're sharing several large indexes over a network, a dedicated cluster may be necessary.

The end-user tools are intuitive and powerful; with a small degree of training, any user should be able to expertly search through indexes of company data. Once the indexes were built and the software set up and configured properly, I found the dtSearch Desktop and dtSearch Web interfaces to be as easy to use as -- if not easier than -- a common Web search engine like Google or Excite.

Despite the time it takes to create indexes, information retrieval on local searches is incredibly speedy -- faster than searching Google over a broadband connection. Text from a variety of formats is displayed in a built-in viewer, so you don't need Microsoft Word to search inside Word documents. In short, it's hard to find any significant shortcomings in dtSearch 7.0.

Purpose: Desktop search tool and programming libraries
Manufacturer: dtSearch Corporation
Platforms and architectures: Microsoft Windows 95/98/ME/NT/2000/XP/2003, but the search engine libraries are cross-platform
License: Proprietary, restrictive in all the usual ways. Unlimited client access.
Market: Large businesses, developers in need of a third-party search algorithm
Price (retail): US $1000 for a single-server license; $2500 for a three-server license; $200 for the desktop-only version; upgrades from version 6.x are free
Previous version: dtSearch 6.x
Product Web site: Click here