Koha search #1

For people used to web searches and database-oriented applications, Koha’s way of dealing with searches (since v.3) might seem a bit alien. This is a small set of notes on some not very obvious questions I had to answer in order to do some configuration that seemed trivial at the beginning. I am not saying Koha’s approach is wrong or ineffective, just different, and different means time to learn. I’m writing this to help you reduce that time.

Search does not happen in the database

…and reindexing is done at fixed intervals

Koha stores the content in the database but does not use the database for searching. Instead, all new content is indexed into another application called Zebra. If you have dealt with previous versions of Koha, think of Zebra as your own Z39.50 server, which in fact is what it is. So one of the first things to know is that reindexing is not automatic and is not triggered by a database modification (what happened to zebraqueue I don’t know). Instead, reindexing is done at fixed intervals and is triggered by an external script launched by cron. This is why, if you add a biblio, you will not find it immediately in search: you have to wait for the reindexing to take place. This can be quite annoying if, for example, you need to add an authority and reuse it on the spot when you add your biblio. Of course you can schedule the reindexing to run often to keep the interval short, but a delay will exist nevertheless.
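To get a feeling for the delay, here is a minimal Python sketch. It assumes the reindexing script runs on a fixed schedule starting at some known time; the function name and parameters are made up for illustration and are not part of Koha.

```python
from datetime import datetime, timedelta

def next_reindex(saved_at, interval_minutes, first_run):
    """Return the first scheduled reindex run at or after saved_at.

    Hypothetical helper: it assumes runs happen at first_run,
    first_run + interval, first_run + 2 * interval, and so on.
    """
    interval = timedelta(minutes=interval_minutes)
    if saved_at <= first_run:
        return first_run
    elapsed = saved_at - first_run
    # Ceiling division on timedeltas: number of intervals needed to reach saved_at.
    runs_needed = -(-elapsed // interval)
    return first_run + runs_needed * interval

if __name__ == "__main__":
    first = datetime(2024, 1, 1, 10, 0)
    saved = datetime(2024, 1, 1, 10, 1)
    print(next_reindex(saved, 5, first) - saved)
```

With a five-minute schedule, a biblio saved one minute after a run waits four minutes; the worst case is always just under one full interval.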

Search means PQF

Since Z39.50 is very, very old, it must have been designed by someone who taught computer science and could not imagine that people might not know Polish notation, so expect some very strange queries. But first things first: what are these queries? Koha communicates with Zebra using a query language called PQF (Prefix Query Format), which does not look very meaningful at first. And this is not just for searching but for everything, such as selecting a biblio or an authority by id. You can find these queries if you look in koha/var/log/koha-zebradaemon-output.log. There are a lot of documents on the net but, as I said, it’s a different approach: you will find the grammar of the language but very few explained examples. Let’s have one:

Assume you are doing a keyword search for “marketing”. Here is the PQF for it:

@attrset Bib-1 @attr 1=1016 @attr 4=6 @attr 5=1 marketing

In translation, search:

  • using the attributes defined in Bib-1 (i.e. the file koha/etc/zebradb/biblios/etc/bib1.att)
  • in use attribute 1016 (@attr 1=1016), which is the “Any” index
  • where the term can be a word list (@attr 4=6)
  • using right truncation (@attr 5=1)
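If you want to decode such a query without looking up every number, a small lookup table helps. The following Python sketch only knows the handful of values mentioned in this post (Bib-1 defines many more), and the function name is my own invention, not something from Koha or YAZ.

```python
import re

# Attribute types and values mentioned in this post only.
ATTR_TYPES = {1: "use (index)", 4: "structure", 5: "truncation"}
ATTR_VALUES = {
    (1, 1016): "Any",
    (1, 1003): "Author",
    (1, 4): "Title",
    (1, 21): "Subject",
    (4, 6): "word list",
    (5, 1): "right truncation",
}

def explain_attrs(pqf):
    """Return one readable line per '@attr type=value' pair found in pqf."""
    lines = []
    # '@attr' must be followed by whitespace, so '@attrset' is not matched.
    for attr_type, value in re.findall(r"@attr\s+(\d+)=(\d+)", pqf):
        t, v = int(attr_type), int(value)
        type_name = ATTR_TYPES.get(t, "unknown type")
        value_name = ATTR_VALUES.get((t, v), "not in this table")
        lines.append(f"@attr {t}={v}: {type_name} -> {value_name}")
    return lines

if __name__ == "__main__":
    query = "@attrset Bib-1 @attr 1=1016 @attr 4=6 @attr 5=1 marketing"
    for line in explain_attrs(query):
        print(line)
```

It is only a reading aid: it does not parse operators or terms, it just names the attributes it recognises.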

But how can you do an or?

Let’s assume you want to search in Author or Title or Subject. Can you guess the PQF?

@attrset Bib-1 @attr 4=6 @attr 5=1 @or @attr 1=1003 marketing @or @attr 1=4 marketing @attr 1=21 marketing
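The nesting in this query can be generated mechanically, which may make the prefix style easier to digest. Here is a minimal Python sketch; the function names are hypothetical, and it assumes a single-word term (a term with spaces would need quoting, which this sketch does not do).

```python
def pqf_or(clauses):
    """Fold clauses into nested binary @or operators, prefix style.

    Each @or takes exactly two operands, so three clauses become
    '@or A @or B C'.
    """
    if not clauses:
        raise ValueError("at least one clause is required")
    if len(clauses) == 1:
        return clauses[0]
    return f"@or {clauses[0]} {pqf_or(clauses[1:])}"

def search_any_of(term, use_attrs):
    """Build a PQF query matching term in any of the given use attributes."""
    clauses = [f"@attr 1={use} {term}" for use in use_attrs]
    # Word list and right truncation apply to the whole @or tree, as in the post.
    return f"@attrset Bib-1 @attr 4=6 @attr 5=1 {pqf_or(clauses)}"

if __name__ == "__main__":
    # 1003 = Author, 4 = Title, 21 = Subject
    print(search_any_of("marketing", [1003, 4, 21]))
```

Called with 1003, 4 and 21, it reproduces the query above character for character.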

Now you understand why I mentioned Polish notation: each @or comes before its two operands, so three alternatives need two nested @or operators.

And how to test my wonderful search?

You can use yaz-client, as for any Z39.50 server. Just connect locally (through the unix: socket) and select the biblios database:

yaz-client unix:/usr/share/koha304/var/run/zebradb/bibliosocket
Z>base biblios
Z>f @attrset Bib-1 @attr 4=6 @attr 5=1 @or @attr 1=1003 marketing @or @attr 1=4 marketing @attr 1=21 marketing

Forget PQF, it’s time for CCL

Were you happy to have learned this wonderful new language? Well, not so fast. The actual search is written in CCL (Common Command Language), a more readable query language which is converted to PQF before execution. So if you were wondering what kw,wrdl: marketing means, you just found out: it’s CCL. Don’t worry, you did not learn PQF for nothing, since there are a lot of places where PQF is used directly. Just search for @attr in koha/lib/C4/*.pm and you will find a lot of results, just not for the searches. The CCL equivalent of the above PQF query is:

au,wrdl: marketing or ti,wrdl: marketing or su,wrdl: marketing

Somewhere in Search.pm it gets converted into PQF and sent to Zebra.
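To make the idea of the conversion concrete, here is a toy translator in Python. It is not Koha’s actual code: the qualifier tables contain only the values used in this post, the function names are invented, it understands nothing but “or”, and it adds no truncation attribute because the CCL above does not ask for one.

```python
import re

# Qualifier tables limited to what appears in this post (assumed mapping).
USE_ATTR = {"kw": 1016, "au": 1003, "ti": 4, "su": 21}
STRUCTURE_ATTR = {"wrdl": 6}

def ccl_clause_to_pqf(clause):
    """Translate one 'index,structure: term' clause into PQF."""
    qualifiers, term = clause.split(":", 1)
    index, _, structure = qualifiers.strip().partition(",")
    parts = [f"@attr 1={USE_ATTR[index]}"]
    if structure:
        parts.append(f"@attr 4={STRUCTURE_ATTR[structure]}")
    parts.append(term.strip())
    return " ".join(parts)

def ccl_to_pqf(ccl):
    """Translate clauses joined by 'or' into nested prefix @or operators."""
    # Naive split: a search term containing the word 'or' would break it.
    clauses = [ccl_clause_to_pqf(c) for c in re.split(r"\s+or\s+", ccl)]
    query = clauses[-1]
    for clause in reversed(clauses[:-1]):
        query = f"@or {clause} {query}"
    return f"@attrset Bib-1 {query}"

if __name__ == "__main__":
    print(ccl_to_pqf("au,wrdl: marketing or ti,wrdl: marketing or su,wrdl: marketing"))
```

The result attaches the word-list attribute to each clause instead of once in front of the whole tree, so it is not identical to the hand-written PQF, but the shape of the @or nesting is the same.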

But where is MARC?

You might ask what these 1016, 1003, 4 and 21 are. They don’t look like any known MARC fields. In fact they are not: they are Zebra indexes (remember the “Any” index?). This is another thing to bump your head against, but as your head might be spinning already, it deserves a separate post.


2 Responses

  2. Thanks for the nice article though I still have a problem when trying to search, nothing comes though I have cataloged them already.

    What am I supposed to do in order to be to search for the items in my Library?

    BR.

    Emmanuel.
