Start at the right end
Choosing a search solution is one of the key decisions in your search strategy.
Don't start at the wrong end by looking for The Newest Hottest Search Engine or The One With The Most Bells And Whistles. Remember, you are trying to find a search solution that fits your needs, not to make your needs fit the search solution.
There is rarely a One-Size-Fits-All solution. You may also discover that the most expensive or advanced search solution is not necessarily the best one for your needs.
Keep in mind:
- Don’t expect a conclusion at the end of this article telling you which solution to choose. That decision is up to you, but a checklist will help you define your requirements.
- These are pointers for choosing the underlying search technology. A great search experience also requires product customization and user interaction design, which are outside the scope of this article.
- Most search vendors listen to their customers’ needs. If you have a requirement that is currently unsupported, there’s a good chance they can implement it for you.
- Disclaimer: Epinova does not endorse or represent any of the vendors mentioned.
Define your requirements
There is no point considering search solutions that don’t fit your needs. If a solution lacks features you deem critical, or is out of your price range, you’re wasting your time with it. If you only need 10% of its features, you’re probably better off choosing another.
- First, define your requirements to narrow your options down to a manageable list. I'm not just talking about which fancy tricks the search engine can do with your search results, but also more fundamental issues like budget, technology platform and infrastructure. Think big picture.
- Next, prioritize your requirements. Priority levels will range from "critical" to "optional", and will let you build a matrix that filters out search solutions that don't match.
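The filtering matrix described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the intermediate priority levels ("important", "nice-to-have") and their weights are hypothetical, critical requirements act as hard filters, and the remaining levels only add to a score. The product names and requirements are placeholders, not real evaluations.

```python
from dataclasses import dataclass, field

# Hypothetical weights: "critical" requirements are hard filters (no weight),
# the other (made-up) priority levels only contribute to a candidate's score.
PRIORITY_WEIGHTS = {"important": 3, "nice-to-have": 2, "optional": 1}


@dataclass
class Candidate:
    name: str
    satisfies: set[str] = field(default_factory=set)  # requirements it meets


def rank_candidates(requirements: dict[str, str],
                    candidates: list[Candidate]) -> list[tuple[str, int]]:
    """Drop candidates that miss any critical requirement, then rank the rest
    by the summed weight of the non-critical requirements they meet."""
    critical = {req for req, prio in requirements.items() if prio == "critical"}
    ranked = []
    for candidate in candidates:
        if not critical <= candidate.satisfies:
            continue  # fails a hard requirement: filtered out of the matrix
        score = sum(
            PRIORITY_WEIGHTS[prio]
            for req, prio in requirements.items()
            if prio != "critical" and req in candidate.satisfies
        )
        ranked.append((candidate.name, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Placeholder requirements and products, not real evaluations.
requirements = {
    "runs on .NET/IIS": "critical",
    "within budget": "critical",
    "facets": "important",
    "synonyms": "nice-to-have",
    "geo search": "optional",
}
candidates = [
    Candidate("Product A", {"runs on .NET/IIS", "within budget", "facets"}),
    Candidate("Product B", {"runs on .NET/IIS", "facets", "synonyms", "geo search"}),
    Candidate("Product C", {"runs on .NET/IIS", "within budget", "synonyms"}),
]
print(rank_candidates(requirements, candidates))
# [('Product A', 3), ('Product C', 2)]  (Product B misses "within budget")
```

Treating critical requirements as a hard filter rather than a very large weight keeps a feature-rich product that blows the budget from outscoring a product that actually fits.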
Determine your target audience
When considering a product, ask yourself for whom it will add the most value:
- Your customers/end-users, who might get more relevant search results and complete their tasks quicker?
- Your site editors/admins, who might get better tools and reports to tweak content and indexing?
- Your developers, who might be able to implement more flexible, useful and advanced features in a shorter amount of time?
What to look for in a search product
Type of site
This is a vague criterion, but the type of site you’re building can indicate how important search is to your site and what kind of search engine you'll need.
- For an enterprise site, the ability to index all kinds of documents across many platforms and systems will be important.
- For a commerce/service-driven site, the ability to index, cross-reference and present metadata will be important.
- For a small-to-medium informational site, search might just be secondary to navigation and you'll get far with basic content/file indexing and decent weighting control, plus maybe some free plugins to spice up your search results page.
EPiServer sites tend to use the page tree structure as the basis for content and navigation. But there are also navigational concepts where content is not as tightly bound to its location in the page tree. Think of sites where pages are decorated with tags/categories and navigation is based on these tags. In such cases, you’ll need a search solution that can index and serve pages based on this kind of property.
On-premise search vs hosted search
On-premise search (or on-site search) means a search feature installed on a (local) web server that you control. The search solution will often be either a plugin integrated directly with your EPiServer site, or a connector to a standalone service installed on your web server. Indexes are located in your server domain, and your search feature will have direct access to them through a local API/connector. Some examples in this category are EPiServer Full Text Search, EasySearch, Google Search Appliance (and its smaller sibling, the Google Mini), Forward Search and IntelliSearch. Many also choose to set up Apache Solr locally, though it’s also available in various hosted flavors, like WebSolr.
Hosted search (or Search-as-a-Service) means a service hosted on a remote server controlled by the search vendor. Indexes will be located on the remote server, and you make queries from your site via the vendor's web service-based API. Examples in this category are EPiServer Find, SiteSeeker, and Ankiro.
Note that some vendors that mainly offer hosted search may also provide an on-premise version (at a premium price).
Factors to consider:
- Privacy/Security: How much control do you need over your search data?
With an on-premise search, you have more control, as data from your indexes never leaves your domain.
With hosted search, your indexes are stored remotely and search data leaves your domain. Search vendors should be able to clarify how they handle security and privacy.
- Maintenance: Will you operate your own servers?
With on-premise search, you have to fund and operate the servers where your search solution and indexes are hosted, and make sure they can cope with the traffic load and ever-expanding index size.
With hosted search, the vendor takes care of all that for you.
- Hardware: Some on-premise search solutions require that you install proprietary hardware (like the Google Mini server).
- Free/Premium: Would you sacrifice control for price?
Some hosted search services (like Google Custom Search) let you integrate a search solution for free, but you will have virtually no control over indexing and presentation, your search data will be exposed, and your visitors will be served ads unless you upgrade to Premium.
Technology platform
With EPiServer, your web server is already .NET-based. When evaluating search solutions, consider whether they are compatible or require another runtime environment. (These factors are mostly relevant for on-premise solutions.)
Factors to consider:
- Technology: Is the search solution .NET or Java based?
- OS: Will the search solution work on Windows Server / Azure or require another OS?
- Web server: Will the search solution run on IIS, or will it require Apache?
- Hardware: Does the search solution require specific hardware?
- 64-bit: Does the search solution support (or even require) 64-bit servers?
- Connectors: Are there ready-made connectors for EPiServer, will you need to build custom connectors, or are none required?
- Competence: Does your organization have the necessary competence in the technologies involved, or will you require external assistance/training?
Customization and support
No search solution fits perfectly out of the box, and you’ll most likely want to tweak some features. In addition to customization, you need to consider what kind of technical support you can expect from the vendor.
The big enterprise search vendors tend to be rich in functionality, with good documentation and support (necessary given the number and size of their customers). But you might have a hard time getting them to implement new features, due to the complexity of the product and long release cycles. Connectors for various CMSs and platforms are generally widely available.
Smaller search vendors tend to have adequate all-round functionality, but specialize in a few features that make them stand out from the competition. In my experience, documentation and support vary from very good to quite poor. However, smaller search vendors are generally more flexible regarding new features. Their connectors tend to target specific CMSs and platforms rather than being available in “all flavors”.
Factors to consider:
- Proprietary/open source: Is the product provided “as is”, or can you download and change the source code to implement your own features/changes? If open source, does the usage license require you to make your own changes freely available too?
- Standards: Is the product built on established industry standard technologies, cutting-edge new technology or custom made proprietary frameworks and components?
- Third party components: Is it a complete standalone product, or is it built on top of other third party components? For example, EPiServer Find is built on Elasticsearch, which in turn is built on Apache Lucene. Find is not unique in this sense, though; many search products use third party components for one or more of their features. On one hand, reusing well-established building blocks is a wise move; on the other hand, the more external dependencies, the higher the risk of a weak link in the chain.
- Tools: Are there flexible developer APIs, tools for index customization or analysis/optimization available? How about plugins for EPiServer editors or administrators to tweak indexing or check statistics?
- Support & community: Can the search vendor provide adequate support? Is there extensive and up-to-date documentation? Are there any online communities where users can share code samples and experiences? Any demo sites available? Does the search vendor offer training/certification for developers/editors?
- Future updates: How frequently does the vendor release bug fixes or new features? Can you be sure that product development won’t be discontinued?
- Extensibility: Can the product be extended with add-on features or integrated with other EPiServer plugins to add value to your search?
Indexing
Your content will most likely be EPiServer pages, all kinds of uploaded file types in your VPP or other file sources, and perhaps even content from other integrated systems. The search product’s ability to index it all will be crucial.
Factors to consider:
- Cross-platform indexing: How well does the product find and index content across multiple platforms and systems, like SharePoint, Active Directory, ImageVault etc? Will items from different systems be presented as separate indexes/reports, or in one common report?
- Federated index: Will content from various systems be presented as separate object types, converted into a lowest-common-denominator object type (which may mean items don't expose all their unique metadata), or indexed with all their unique metadata intact?
- Binary text formats: Does it support indexing the contents of binary text formats like Microsoft Office (docx, xlsx, pptx) or Adobe (pdf), or will you have to install additional format support/filters (like IFilter)?
- Metadata: Will only superficial properties like page name, heading, textual content and file names be indexed, or will the search product retrieve metadata as well? Examples of metadata can be found in image files (location, camera model, orientation etc), office documents (author, version, copyright etc) and file uploads in EPiServer (fields from filesummary.config etc).
- Non-plaintext: Does the product index content in non-plaintext form, like text within images (using Optical Character Recognition techniques) or XML/HTML-encoded text?
- Crawler vs event-driven: Does the search product use a crawler or event-driven indexing?
A crawler will dig through your content at regular intervals, adding new content to the index on each run. Obviously, this can cause a delay between when new content is published and when it's indexed.
Event-driven indexing will trigger automatically on specific EPiServer events, like new file uploads or published pages, adding items directly to the index.
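To make the latency difference concrete, here is a toy Python simulation that is not modeled on any real product: `crawl()` scans published content on a fixed schedule, while `on_page_published()` is a hypothetical hook standing in for a CMS publish event. The 60-unit crawl interval and the publish times are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    text: str
    published_at: int  # publish time, in arbitrary time units


class SearchIndex:
    """Toy index: stores page text and remembers when each page was indexed."""

    def __init__(self) -> None:
        self.docs: dict[int, str] = {}
        self.indexed_at: dict[int, int] = {}

    def add(self, page: Page, now: int) -> None:
        self.docs[page.page_id] = page.text
        self.indexed_at[page.page_id] = now


def crawl(index: SearchIndex, published: list[Page], now: int) -> None:
    """Crawler run: walk all published content, add anything not yet indexed."""
    for page in published:
        if page.page_id not in index.docs:
            index.add(page, now)


def on_page_published(index: SearchIndex, page: Page) -> None:
    """Hypothetical publish-event hook: index the page the moment it is published."""
    index.add(page, page.published_at)


def indexing_delays(index: SearchIndex, pages: list[Page]) -> dict[int, int]:
    """Time between publishing and indexing, per page."""
    return {p.page_id: index.indexed_at[p.page_id] - p.published_at for p in pages}


pages = [Page(1, "Opening hours", published_at=5), Page(2, "New product", published_at=70)]
CRAWL_INTERVAL = 60  # invented schedule: one crawler run every 60 time units

crawled = SearchIndex()
for run_time in range(0, 181, CRAWL_INTERVAL):  # runs at t = 0, 60, 120, 180
    visible = [p for p in pages if p.published_at <= run_time]
    crawl(crawled, visible, run_time)

event_driven = SearchIndex()
for page in sorted(pages, key=lambda p: p.published_at):
    on_page_published(event_driven, page)

print("crawler:", indexing_delays(crawled, pages))          # {1: 55, 2: 50}
print("event-driven:", indexing_delays(event_driven, pages))  # {1: 0, 2: 0}
```

The crawler indexes each page up to one full interval after it was published, while the event-driven path indexes it immediately. A real crawler would also need to detect changed and deleted content, which this sketch ignores.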
Price
Price obviously plays a big part in most decisions in web projects. Some search products are expensive because of their enterprise features or hardware requirements. Others may come with a cheap or free license, but will require implementation and customization costs.
Factors to consider:
- Open Source vs Licensed: Enterprise products often have a large initial cost, but include a lot of features, tools, support and documentation. Open source products may have a very low (or no) initial cost, but will typically require more developer hours for implementation and customization.
- License models: Does the vendor offer incremental license models to accommodate your site type and size? Does the price model allow your site to scale up without having to reimplement the search product?
- Expected development costs: Are there any reference cases that document how long it takes to implement the search product?
- Support costs: Is support included in the price, or does it cost extra? Does the vendor have a dedicated support department? Or, as is often the case with open source products, will you have to rely on community forums and online documentation for support?
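Comparing the two cost profiles is simple arithmetic once every component is laid out side by side. The Python sketch below assumes a basic cost model (one-off license and implementation hours, plus yearly license, support and maintenance hours); all figures are hypothetical placeholders to be replaced with your own estimates and vendor quotes.

```python
from dataclasses import dataclass


@dataclass
class CostEstimate:
    initial_license: int           # one-off license fee
    yearly_license: int            # recurring license/subscription fee
    implementation_hours: int      # developer hours to implement and customize
    yearly_maintenance_hours: int  # developer hours per year to keep it running
    yearly_support: int = 0        # paid support; 0 if included or community-only

    def total(self, years: int, hourly_rate: int) -> int:
        """Total cost of ownership over the given number of years."""
        one_off = self.initial_license + self.implementation_hours * hourly_rate
        recurring = (self.yearly_license + self.yearly_support
                     + self.yearly_maintenance_hours * hourly_rate)
        return one_off + years * recurring


# Hypothetical figures, for illustration only.
licensed = CostEstimate(initial_license=15_000, yearly_license=3_000,
                        implementation_hours=60, yearly_maintenance_hours=10)
open_source = CostEstimate(initial_license=0, yearly_license=0,
                           implementation_hours=250, yearly_maintenance_hours=50)

for years in (1, 3, 5):
    print(years, licensed.total(years, hourly_rate=100),
          open_source.total(years, hourly_rate=100))
```

With these made-up numbers, the licensed product comes out cheaper over three years (33,000 vs 40,000) despite its license fee, but different estimates can easily flip the result. The point is to compare totals, not sticker prices.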
Features
Here's a list of common and popular features to look for in a search product.
- Globalization support: Indexing content in all the different languages of your EPiServer site. Indexing non-Latin scripts like Arabic, Cyrillic, Greek or Chinese requires Unicode support.
- Multiple index support: Keeping separate indexes for different subsites. This might include the ability to query indexes individually or merged as needed.
- Facets: Presenting categorized search results. May include "drill-down/up" functionality (narrowing/broadening the search results criteria to reveal less/more hits).
- Synonyms: Understanding that different words may have the same meaning, like "car" and "automobile".
- Best bet: Promoting specific search results that are most likely to be what the user is looking for.
- Document weighting: Influencing how individual files/documents (or even individual properties/metadata in an EPiServer page/file) are weighted.
- Keyword weighting: Influencing the weight of specific keywords in your content, which in turn will affect how the pages/files they occur in are weighted.
- Custom field mapping: Specifying which EPiServer properties should be indexed or not.
- Security model support: Supporting EPiServer's built-in permissions system for roles and users, so that search results are automatically filtered for each visitor.
- Spellchecking: Recognizing misspelled search terms and mapping them to their correctly spelled equivalents (typically used for "did you mean...?" features).
- Conceptual relations: Understanding that different words may be related to the same concept, like "transportation" and "vehicle".
- Load balancing support: Supporting indexing of EPiServer sites that are set up in a load balanced environment. Alternatively, supporting load balancing of the search service/index itself, creating redundancy.
- Match highlighting: Highlighting/emphasising the user's search terms in the search results to help the user visually evaluate their relevance.
- Geo/spatial search: Using geographic data (user's location, time/distance between objects etc) to determine the relevance of search results (typically used for "closest point of interest" functionality).
- OCR/binary search: Recognizing textual content in non-text formats, e.g. extracting text from an image or proprietary formats like doc, pdf etc.
- Case & archive systems: Integration with, and indexing of, popular specialized systems commonly used by government agencies.
- Statistics/diagnostics: Tools and reports that help site owners or developers tweak their content, indexing setup and search strategies.
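Several of these features are easy to picture with a toy example. The Python sketch below is a deliberately naive in-memory search, not how any real engine works (real products use inverted indexes, linguistic analyzers and permissions pulled from the CMS). It illustrates synonyms via a hypothetical `SYNONYMS` table, security model support by filtering hits on the visitor's roles, facets as hit counts per category, and a "did you mean...?" spellchecker using Python's standard `difflib.get_close_matches` as a stand-in.

```python
import difflib
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    title: str
    text: str
    category: str                  # field used for facets
    allowed_roles: frozenset[str]  # roles allowed to read the document


# Hypothetical synonym groups: a query word matches any word in its group.
SYNONYMS = [{"car", "automobile"}, {"bike", "bicycle"}]


def expand(word: str) -> set[str]:
    for group in SYNONYMS:
        if word in group:
            return set(group)
    return {word}


def search(docs: list[Document], query: str, user_roles: set[str]):
    """Return (hits, facet counts) for documents matching any expanded query word."""
    terms: set[str] = set()
    for word in query.lower().split():
        terms |= expand(word)
    hits = [
        d for d in docs
        if d.allowed_roles & user_roles          # security filtering by role
        and terms & set(d.text.lower().split())  # any (synonym) term matches
    ]
    facets = Counter(d.category for d in hits)   # hits per category
    return hits, facets


def did_you_mean(docs: list[Document], word: str) -> Optional[str]:
    """Suggest the closest indexed word, using difflib as a stand-in spellchecker."""
    vocabulary = sorted({w for d in docs for w in d.text.lower().split()})
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1)
    return matches[0] if matches else None


docs = [
    Document("Car insurance", "compare car insurance offers", "Insurance", frozenset({"everyone"})),
    Document("Automobile loans", "automobile loans for members", "Finance", frozenset({"members"})),
    Document("Bicycle parking", "bicycle parking at the office", "Facilities", frozenset({"everyone"})),
]

hits, facets = search(docs, "car", {"everyone"})
print([d.title for d in hits])  # ['Car insurance']  (loans are members-only)

hits, facets = search(docs, "car", {"everyone", "members"})
print([d.title for d in hits], dict(facets))
# ['Car insurance', 'Automobile loans'] {'Insurance': 1, 'Finance': 1}

print(did_you_mean(docs, "insurnace"))  # insurance
```

Note how the "car" query finds "Automobile loans" only through the synonym table, and only for a visitor whose roles permit it.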
Summary
The TL;DR version: Don't be blinded by flashy features, focus on what your target audience requires, and stay away from poorly supported products.
If you don't have time to consider the concepts listed herein, good luck with your Randomly Selected Search Solution.
If you do take the time to properly evaluate the available products, you'll avoid paying for features you don't need, your users will have a better search experience, and you can focus your efforts on optimization rather than support.