Solr
- UI Query Screen
- Introduction (youtube)
Solr Instance
- multiple instances can run ('multiple solr instances are running')
- deploy webapp on multiple servers, each of which is an instance
Solr Core
- each solr instance can have multiple cores
- also referred to as Solr Index, or simply Core or Index
- analogous to a database: each core is backed by its own Lucene index and configuration
- generally, each core runs in isolation, but some communication between cores can be configured via CoreContainer
Document
- 0..m documents live in a core
- basic unit of information
Field
- 0..m fields live in a document
- various types: text, numeric, date, etc.
- type tells solr how to interpret the field and how it can be queried
- type: String stores a word or sentence as an exact string without tokenization. Commonly used where exact matches are needed, e.g., for faceting.
- type: Text typically performs tokenization and secondary processing (such as lower-casing). Useful whenever we want to match part of a sentence.
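The difference between the two field types can be sketched in plain Ruby. This is only a rough stand-in for Solr's analysis chain: the helper names are made up for illustration, and real tokenization depends on the analyzers configured for the field type.

```ruby
# Simplified illustration only; Solr's real analyzers are configured per field type.

# A String field keeps the whole value as one exact token.
def string_tokens(value)
  [value]
end

# A Text field is tokenized and lower-cased (a rough approximation).
def text_tokens(value)
  value.downcase.scan(/[a-z0-9]+/)
end

# A term matches when it equals one of the indexed tokens.
def string_match?(value, term)
  string_tokens(value).include?(term)
end

def text_match?(value, term)
  text_tokens(value).include?(term.downcase)
end

title = 'The Art of Archery'

puts string_match?(title, 'Archery')             # false: only the full string matches
puts string_match?(title, 'The Art of Archery')  # true
puts text_match?(title, 'archery')               # true: matches one token of the sentence
```

This is why a String field suits facet values (the whole value is one unit), while a Text field suits free-text search.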
Facet
index via...
- Request Handlers & Update Handlers (via HTTP POST/PUT)
- default: XML, Binary, JSON, CSV, etc.
- can define own handlers in config
- Index Handlers
- import from databases
- Solr Cell framework (uses Apache Tika to extract and index rich documents such as PDF and Word files)
- custom Java application to ingest data through Solr's Java Client and other apps
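As a minimal sketch of indexing through the /update handler over HTTP POST, the Ruby below builds (but does not send) a request that adds documents as a JSON array. The host, the core name (development), the document fields, and the helper name are assumptions for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical host and core name; adjust for your installation.
SOLR_UPDATE_URL = 'http://localhost:8983/solr/development/update?commit=true'

# Builds (but does not send) a POST request that adds documents as a JSON array.
def build_add_request(docs, url = SOLR_UPDATE_URL)
  request = Net::HTTP::Post.new(URI(url), 'Content-Type' => 'application/json')
  request.body = JSON.generate(docs)
  request
end

# Example documents; the field names are illustrative only.
docs = [
  { id: 'book-1', title: 'Archery Basics', category: 'book' },
  { id: 'book-2', title: 'Advanced Archery', category: 'book' }
]

request = build_add_request(docs)
puts request.path
puts request.body

# To actually send it to a running Solr instance:
# uri = URI(SOLR_UPDATE_URL)
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```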
update processors
- signature
- logging
- indexing
<!-- solr.SearchHandler -->
<requestHandler name="standard" class="solr.SearchHandler"> <!-- /select -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
<requestHandler name="permissions" class="solr.SearchHandler" >
<requestHandler name="document" class="solr.SearchHandler" >
<!-- solr.UpdateRequestHandler -->
<requestHandler name="/update" class="solr.UpdateRequestHandler" />
<!-- other handlers -->
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" />
<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
To see what a requestHandler returns, change the value of qt from /select to the name of the handler in the Solr admin Query page. NOTE: You will need to change the host to your Solr admin host and may need to change the name of the core from development to the name of your core.
- receive XML, JSON, CSV, or binary (via HTTP GET)
- request handlers (via HTTP GET)
- default: /admin, /select, /spell
- can define own handlers in config
- search components
- query
- spelling
- faceting
- highlighting
- statistics
- debug
- clustering
- search process (see Common Query Parameters)
parameter | description | default | example |
---|---|---|---|
qt | selects the Request Handler for a query sent to /select | DisMaxRequestHandler | |
defType | selects a Query Parser for the query | parser configured in Request Handler | |
q | query in the form field_name:field_value, with * as a wildcard | `*:*` | q=title:Archery |
fq | filter query: restricts the results of q by applying an additional query (same syntax as q); its results are cached | `*:*` | fq=popularity:[10 TO *]&fq=section:0 |
sort | sort field | score desc | |
start | an offset into the query results where the returned response should begin | 0 | start=0 |
rows | the number of rows to be displayed at one time | 10 | rows=20 |
fl | fields to return in result | all | fl=id,name |
df | default field to search when the query does not name a field | none unless set in the Request Handler config | df=description |
wt | selects a Response Writer for formatting the query response | xml or json (depends on Solr version) | wt=json |
qf | list of fields and the "boosts" to associate with each of them when building DisjunctionMaxQueries (see also SOLR df and qf explanation) | falls back to df when absent | qf=title^20 description^10 |
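The parameters above are ordinary URL query parameters, so a request can be assembled with Ruby's standard library. The sketch below assumes a local Solr with a core named development; the helper name is made up for illustration. Passing an array for fq repeats the parameter, and reserved characters such as spaces and brackets are percent-encoded.

```ruby
require 'uri'

# Hypothetical base URL; host and core name ("development") are assumptions.
SOLR_SELECT_URL = 'http://localhost:8983/solr/development/select'

# Turns a hash of query parameters into a full /select URL.
# Array values (e.g. several fq filters) become repeated parameters.
def select_url(params, base = SOLR_SELECT_URL)
  uri = URI(base)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

url = select_url(
  q: 'title:Archery',
  fq: ['popularity:[10 TO *]', 'section:0'],
  sort: 'score desc',
  start: 0,
  rows: 20,
  fl: 'id,name',
  wt: 'json'
)
puts url
```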
High Level
- Advanced Full-Text Search
- Optimized for High Volume Web Traffic
- Standards Based Open Interfaces - XML, JSON, HTTP
- Comprehensive HTML Admin Interfaces
- Service statistics exposed over JMX for monitoring
- Near real-time indexing
- Adaptable with XML configuration
- Linearly scalable, with automatic index replication
- Extensible plugin architecture
Specific Features
- faceting
- highlighting
- spell checking
- query re-ranking
- transforming
- suggesters
- more like this
- pagination
- grouping & clustering
- spatial search
- components
- real time (get & update)
- labs
- schema.xml
- field types
- etc.
- solrconfig.xml
- register Request Handlers for querying the index
- register Update Handlers for indexing documents
- register Event Handlers for searcher events (e.g. queries to execute to warm new searchers)
- activate version-dependent features in Lucene
- Lib directives indicate where Solr can find JAR files for extensions
- Index management settings
- Enable JMX instrumentation of Solr MBeans
- Cache-management settings
- solr.xml
- core.properties
Defined in schema.xml
Reference: schema.xml
defined by <types><fieldType>...</></>
postfix code | meaning |
---|---|
t | text (tokenized) |
te | english text (tokenized) |
s | string |
i | integer |
it | trie integer |
f | float |
ft | trie float |
l | long |
lt | trie long |
db | double |
dbt | trie double |
b | boolean |
dt | date |
dtt | trie date |
ll | location |
coordinate | trie double to index lat and long of a location with indexed=true/stored=false |
NOTE: letter indicates the postfix indicator that sets the type for Hyrax dynamic fields. Ex. name_tsi means that name has type="text"
defined by <fields><dynamicField>...</></>
postfix code | parameter | impact | meaning if true |
---|---|---|---|
s | stored | sets stored to true | when true, value is returned in solr document |
i | indexed | sets indexed to true | when true, value is searchable |
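m | multiValued | sets multiValued to true | when true, the field can hold multiple values |
v | termVectors | sets termVectors to true | when true, term vectors are stored for the field (used by features such as highlighting and MoreLikeThis) |
v | termPositions | sets termPositions to true | when true, term positions are stored in the term vectors |
v | termOffsets | sets termOffsets to true | when true, term offsets are stored in the term vectors |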
NOTE: letter indicates the postfix indicator that sets that parameter to true for Hyrax dynamic fields. Ex. name_tsi means that name has stored=true,indexed=true
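To make the postfix convention concrete, here is a small Ruby sketch that decodes a dynamic field name into its type and flags. It covers only a subset of the type codes listed above, and the constant and method names are hypothetical; it is not part of Hyrax.

```ruby
# A subset of the type codes from the field type table.
TYPE_CODES = {
  'dtt' => 'trie date', 'dt' => 'date', 'te' => 'english text',
  'it' => 'trie integer', 'ft' => 'trie float', 'll' => 'location',
  't' => 'text', 's' => 'string', 'i' => 'integer',
  'f' => 'float', 'l' => 'long', 'b' => 'boolean'
}.freeze

# Flag letters from the dynamic field table.
FLAGS = { 's' => :stored, 'i' => :indexed, 'm' => :multi_valued, 'v' => :term_vectors }.freeze

# Longer type codes are listed first so that "te" is tried before "t".
SUFFIX_PATTERN = /\A(.+)_(#{TYPE_CODES.keys.sort_by { |k| -k.length }.join('|')})([simv]*)\z/

# Returns the base name, type and flags, or nil when the name has no known suffix.
def decode_field(name)
  match = SUFFIX_PATTERN.match(name)
  return nil unless match

  { base: match[1], type: TYPE_CODES[match[2]], flags: match[3].chars.map { |c| FLAGS[c] } }
end

p decode_field('name_tsi')     # text, stored, indexed
p decode_field('title_tesim')  # english text, stored, indexed, multi-valued
```

The type code comes first and the flag letters follow, which is how the same letter can mean a type (s for string) in one position and a flag (s for stored) in another.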
stored="true" indexed="false"
- destination URL
- file system path
- time stamp
- icon image
- sort string - have a name that is tokenized text with stored=false/indexed=true and this field that is the exact string for sorting
stored="false" indexed="true"
- bag of words - want to be able to search for all terms in the bag, but don't want them in the solr document search results
- common misspellings - allow common misspellings to match in search, but don't include in solr document search results
indexed="false" stored="false"
- Use this when you want to ignore fields. For example, the following ignores unknown fields that don't match a defined field instead of raising an error, which is the default behavior.
<fieldType name="ignored" stored="false" indexed="false" class="solr.StrField" />
<dynamicField name="*" type="ignored" />
- horizontal scaling (for sharding and replication)
- elastic scaling
- high availability
- distributed indexing
- distributed searching
- central configuration for entire cluster
- automatic load balancing
- automatic failover for queries
- zookeeper integration for coordination & configurations
Return all results matching the search term "book"
http://localhost:8983/solr/development/select?q=book
NOTE: Examples use stream.body to show how to do this through a URL. Updates are usually sent via HTTP POST, and recent Solr versions disable stream.body by default.
Delete by ID
http://localhost:8983/solr/development/update?stream.body=<delete><id>SOLR1000</id></delete>
http://localhost:8983/solr/development/update?stream.body=<commit/>
Delete by Query
http://localhost:8983/solr/development/update?stream.body=<delete><query>cat:software</query></delete>
http://localhost:8983/solr/development/update?stream.body=<commit/>
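Since updates are usually sent via HTTP POST, the same two deletes can be sketched in Ruby using Solr's JSON update syntax. The requests are built but not sent; the host, the core name (development), and the helper name are assumptions for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical host and core name; commit=true asks Solr to commit as part of the request.
DELETE_URL = 'http://localhost:8983/solr/development/update?commit=true'

# Builds (but does not send) a POST request with a JSON delete command.
# selector is either { id: '...' } or { query: '...' }.
def build_delete_request(selector, url = DELETE_URL)
  request = Net::HTTP::Post.new(URI(url), 'Content-Type' => 'application/json')
  request.body = JSON.generate({ delete: selector })
  request
end

by_id    = build_delete_request({ id: 'SOLR1000' })
by_query = build_delete_request({ query: 'cat:software' })

puts by_id.body     # {"delete":{"id":"SOLR1000"}}
puts by_query.body  # {"delete":{"query":"cat:software"}}

# To actually send one of them to a running Solr instance:
# uri = URI(DELETE_URL)
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(by_id) }
```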
Steps to delete all via Solr Admin UI
- In the Solr UI, select the core to affect from the selection box on the left side menu
- select Documents on the left side menu
- set Document Type = XML
- set the Document(s) text area to
<delete><query>*:*</query></delete>
- leave Commit Within and Overwrite as defaults
- Submit
Delete All in Hyrax
require 'active_fedora/cleaner'
ActiveFedora::Cleaner.clean!
Delete All in Valkyrie-Solr in Hyrax
conn = Valkyrie::IndexingAdapter.find(:solr_index).connection
conn.delete_by_query('*:*', params: { 'softCommit' => true })
Search a specific field (category) for a search term (book)
http://localhost:8983/solr/development/select?q=category:book
Search for price between 0 and 400, inclusive
http://localhost:8983/solr/development/select?q=price:[0 TO 400]
Limit search results to return only the fields id, name, and price.
http://localhost:8983/solr/development/select?q=book&fl=id,name,price
Return facets for a specific field (category), with counts for each value of category based on the search results.
http://localhost:8983/solr/development/select?q=book&fl=id,name,price&facet=on&facet.field=category
Partial response showing the returned facet information.
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="category">
      <int name="book">10</int>
      <int name="video">2</int>
      <int name="audio">2</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>
Return facets for a specific field (category), filtered to a specific category value (book), with counts for each value of category based on the search results.
http://localhost:8983/solr/development/select?q=book&fl=id,name,price&facet=on&facet.field=category&fq=category:book
Partial response showing the returned facet information.
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="category">
      <int name="book">10</int>
      <int name="video">0</int>
      <int name="audio">0</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
</lst>
NOTE: Can include multiple filter queries (fq).
NOTE: When filter query is applied, all categories are still listed, but now have 0 for count if they don't include the filtered value.
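When the response is requested as JSON (wt=json), Solr by default returns each facet field as a flat list of alternating values and counts. The Ruby sketch below turns such a list into a hash and drops the zero-count entries. The sample response is hand-written to mirror the counts shown above, not captured from a real server, and the helper name is hypothetical.

```ruby
require 'json'

# Hand-written sample mirroring the filtered facet counts; not real server output.
SAMPLE_RESPONSE = <<~EOS
  {"facet_counts": {"facet_fields": {"category": ["book", 10, "video", 0, "audio", 0]}}}
EOS

# Converts the flat [value, count, value, count, ...] list into a hash.
def facet_counts(response_json, field)
  flat = JSON.parse(response_json).dig('facet_counts', 'facet_fields', field) || []
  flat.each_slice(2).to_h
end

counts = facet_counts(SAMPLE_RESPONSE, 'category')
p counts

# Categories excluded by the filter query are still listed, with a count of 0.
non_empty = counts.reject { |_value, count| count.zero? }
p non_empty
```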