Development

sfSearchPlugin/Chapter2

You must first sign up to be able to contribute.

Version 5 (modified by Carl.Vondrick, 9 years ago)
--

Creating An Index

sfSearch reduces the complexity of searching by using a centralized place to describe everything about the search engine. This configuration describes how documents are added, deleted, and displayed. sfSearch calls this central place an index. The search index provides a simple and efficient method to manage your search system.

To create your first index, run the symfony command:

symfony search:init-index SiteSearch

This command will create the file %SF_ROOT_DIR%/lib/search/SiteSearch.class.php with a skeleton index. After opening this file, you will find a class appropriately named SiteSearch with an empty ->configure() method. We put the search configuration in the ->configure() method.

Services

sfSearch can index virtually any type of data, whether it is a PDF file, a Propel model, or a symfony action. However, we must tell sfSearch how to index the content. sfSearch uses services to describe this behavior. A service controls the indexing behavior for different types of data. Once we tell the service what to do, it takes care of the details.

There are three components to a service:

  1. Identifiers notify the service if it can handle the data type and helps it discover similar types of data.
  2. Builders create indexable documents from the data to be stored in the index.
  3. Retorts enable access to information when displaying search results.

Most of these components are included in separate plugins. For example, sfPropelSearchPlugin contains components to index and return Propel models, and sfSymfonySearchPlugin allows developers to index and return symfony actions. If a plugin does not exist to index your data, it is easy to write your own identifiers, builders, and retorts. See Chapter 5 on how to extend sfSearch.

Create a Service

To create a service, we open the search index file and create a new service object in the ->configure() method:

<?php
class SiteSearch extends xfIndexSingle
{
  protected function configure()
  {
    $service = new xfService(new MyIdentifier);
  }
}

The constructor to xfService takes an identifier.

Identifiers

The job of an identifier is to tell the service what types of data it can index and allows it to discover new types. For example, the sfPropelSearchPlugin contains an identifier that tells the service to index only a certain model and instructs sfSearch how to select all of the models.

sfSearch does not ship with any identifiers to keep the package concise.

Builders

Builders tell a service how to index its content. In essence, builders extract data from some type of object and store it into a format that sfSearch can understand.

A service can have as many builders as the developer requires. For example, one builder might index the fields of the Propel object, while a completely separate builder is used to store related models.

To add a builder to a service, we simply use the ->addBuilder() method:

<?php
$service->addBuilder(new MyBuilder);
$service->addBuilder(new YourBuilder);

sfSearch does not ship with any builders by default. However, sfPropelSearchPlugin provides builders for Propel models and sfSymfonySearchPlugin provides builders for symfony actions. Refer to their respective documentations for usage.

Fields

Most builders take a fields argument that tells the builder what to store. For example, the sfPropelSearchPlugin builder only indexes the fields you tell it to, as otherwise your index would become massive.

sfSearch provides the xfField class to facilitate of passing the field to the index. xfField's constructor first argument is the field name, which is the name that it is stored as in the index. The second argument is the field type. The field type specifies how the engine should store the field. Refer to the following table for the field types:

Type Behavior
STORED Stores value in index to be retrieved later
INDEXED Indexes value in index to be matched in queries
TOKENIZED Breaks value into tokens for natural matching
BINARY Indicates data is raw binary, such as an image

You can use bitwise logic to mix and match the exact requirements. For example:

<?php
$field = new xfField('title', xfField::STORED | xfField::INDEXED);

will create a field that is stored in the index, indexed, but not tokenized nor binary.

If you do like or do not understand bitwise logic, xfField provides some common shortcut constants:

Shortcut Bitwise alternative
KEYWORD INDEXED + STORED
TEXT INDEXED + STORED + TOKENIZED
UNSTORED INDEXED + TOKENIZED
UNINDEXED STORED

You can use shortcuts just as you could use the bitwise versions:

<?php
$field = new xfField('title', xfField::STORED | xfField::INDEXED);
// or
$field = new xfField('title', xfField::KEYWORD);

Some fields are also more important than others. You can set a boost on the field to signify that this field holds more than weight all the others:

<?php
$field = new xfField('title', xfField::TEXT);
$field->setBoost(2);

The default boost is 1. You can use values less than 1 to signify that this field is less important.

Finally, you can register a callback to the field that will the field will use to transform values that are assigned to it. For example, suppose you wish to store the hash of a field in your ORM model. Instead of going out of your way to add a "->getHashedField()" to your model, simply use xfField's callbacks:

<?php
$field = new xfField('title', xfField::TEXT);
$field->registerCallback('md5');

All the fields for 'title' will now automatically be MD5 before being processed by the index.

Note: sfSearch uses the underscore convention when defining fields. Instead of creating the field authorName , use the field author_name .

Retorts

Retorts describe how to display the data indexed by the builders. They provide a convenient means of easily generating data from the results. Like builders, you may register as many retorts as you require.

To add a retort to a service, we can use the ->addRetort() method:

<?php
$service->addRetort(new MyRetort);
$service->addRetort(new YourRetort);

All results returned by $service will have retorts you register for it. As described in Chapter 3, retorts bind to the result objects returned from searching the index and respond to methods. For example, if you call ->getTitle() on the result object, it will search for a retort that can respond to the event "getTitle".

sfSearch ships with three of the most common retorts:

Fields

The most common need for a search engine is to provide an interface to return the fields stored in your search index. For example, you may store the fields "title" and "author" for the books your website sell. In order to be able to retrieve the title and author of a book returned from the search result, we must register xfRetortField retort with the service:

<?php
$service->addRetort(new xfRetortField);

This retort binds to "getXXX" where XXX is any field in your index.

Routes

Usually you want to provide a route for your search results that take the user to more information. To continue the book analogy, the search results might just display the title, author, and a short description of the book. But, with the "more info" link, we are able to provide a page with its ISBN number, table of contents, price, and reviews. The retort to use for a route is xfRetortRoute:

<?php
$service->addRetort(new xfRetortRoute('book/show?isbn=$isbn$'));

The above retort binds to "getRoute" and will return the string "book/show?isbn=$isbn$" where $isbn$ is replaced by the isbn field stored in the index. You can pass the result of this retort to the URL helper, url_for() .

Filters

The final common retort is a filtering retort. A filtering retort wraps around another retort but sends the result through one or more filters before returning it to the caller. For example, you may want to strip all HTML tags from your search results to keep your site safe. We can use xfRetortFilter to register strip_tags() as a filter around the fields:

<?php
$retortFilter = new xfRetortFilter(new xfRetortField); // the constructor will take *any* retort
$retortFilder->registerFilter('strip_tags');
$service->addRetort($retortFilter);

The above retort will bind to whatever retort it wraps in the constructor.

Service Registry

After we have configured a service, we must register it with the service registry. The service registry holds all the services.

To register a service in the service registry, we follow the syntax:

<?php
class SiteSearch extends xfIndexSingle
{
  protected function configure()
  {
    $service = new xfService(new MyIdentifier);

    $service->addBuilder(new MyBuilder);
    $service->addBuilder(new YourBuilder);

    $service->addRetort(new MyRetort);
    $service->addRetort(new YourRetort);

    $this->getServiceRegistry()->register($service);
  }
}

$this->getServiceRegistry() returns an xfServiceRegistry object that holds all services. The service registry interacts with sfSearch directly to implement all of the functionality described here.

Engine

If services tell sfSearch what and how to index data, we must also tell sfSearch where to store the index. We use sfSearch's search engine abstraction layer (SEAL) to accomplish this, which does the heavy lifting of tokenizing and matching search results for us. Every search index must contain an engine.

As you can probably guess, sfSearch does not ship with any useful engine, however sfLucenePlugin provides a search engine which integrates Zend_Search_Lucene with sfSearch.

To bind your engine to your index, simply follow the syntax:

<?php
class SiteSearch extends xfIndexSingle
{
  protected function configure()
  {
    // service registry is setup here

    $this->setEngine(new MyEngine);
  }
}

An index can only hold one engine and all engines must implement the "xfEngine" interface.

Index Groups

If your project has multiple search engines that share a common service registry, you may consider using an index group. An index group combines multiple indices together.

For example, if your web application supports three cultures, you likely need one index for each culture. But, as you want to convey the same information no matter the culture, you use a similar service registry for each index. xfIndexGroup can automate this process.

To create a group, run the symfony command:

symfony search:init-index --group SiteSearchGroup

This will create the file lib/search/SiteSearchGroup.class.php and will look nearly identical to a single search index.

However, there are three main differences between a group index and a single index:

  1. A group index does not have an engine, while a single index does.
  2. A group index holds multiple indices, while a single index holds just one.
  3. A group index does all its operations on all of the indices, while a single index just operates on its own.

When creating a group, we must specify a service registry and bind the child indices. Consider the following example:

<?php
class SiteSearchGroup extends xfIndexGroup
{
  protected function configure()
  {
    $service = new xfService(new MyIdentifier);
    $service->addBuilder(new MyBuilder);
    $service->addRetort(new MyRetort);
    $this->getServiceRegistry()->register($service);

    $this->addIndex('en_US', new SiteSearchEnglish);
    $this->addIndex('fr_FR', new SiteSearchFrench);
  }
}

Both SiteSearchEnglish and SiteSearchFrench will automatically inherit the service registry, where they can modify it and extend it to match their requirements depending on the locale.

The method ->addIndex() takes two parameters. The first parameter is an internal name that the group uses to identify the child index. This name is necessary for when you retrieve that specific index. Do not confuse this name with an individual index name. The second parameter takes an object instance of xfIndex, which can be either a group index or single index.

You are not limited to just child indices -- you can also add other child groups. For example:

<?php
// SiteSearchEnglishGroup.class.php
class SiteSearchEnglishGroup extends xfIndexGroup
{
  protected function configure()
  {
    $this->addIndex('public', new SiteSearchEnglishPublic);
    $this->addIndex('private', new SiteSearchEnglishPrivate);
  }
}

// SiteSearchFrenchGroup.class.php
class SiteSearchFrenchGroup extends xfIndexGroup
{
  protected function configure()
  {
    $this->addIndex('public', new SiteSearchFrenchPublic);
    $this->addIndex('private', new SiteSearchFrenchPrivate);
  }
}

// SiteSearchGroup.class.php
class SiteSearchGroup extends xfIndexGroup 
{
  protected function configure()
  {
    $this->addIndex('en_US', new SiteSearchEnglishGroup);
    $this->addIndex('fr_FR', new SiteSearchFrenchGroup);
  }
}

Using groups, you can create complex group hierarchies to keep your indices DRY.

Summary

sfSearch uses a search engine abstraction layer (SEAL) to enable efficient search results and a service registry to manage your index. Using external plugins, it is possible to index any type of data.

Return to Chapter 1

Continue to Chapter 3