nested-complex-facets (#68)
* make 'keyword' the default index type

* allow for 'nested' fields in config

* fix /indices to deal with nested config

* translate nested queries to ES

* make multiFacet aux queries 'nested' compliant

* accept new style order/size in aggregations

* translate nested aggregations query, including new style order/size params

* remove redundant semicolon

* remove unused WIP classes

* Abstract deep nested ES result to bitesize 🥦

* inline var (logging no longer needed)

* Fix use map entry, not its size (!)

* Del spurious newline

* Bump version

* Remove debug toString()

* cleanup

* Fix (re-)ignore 'curTerm' when doing aux queries

* Bump version

* Upgrade base image (fix several CVEs)

* Make multiFacetCountQueries work with nested facets

* multiFacetCountQuery should only request aggs for its own term

* bump version

* simplify predicate lambda/let

* bump version (partial fix)

* Fix multiFacetCountQueries to work with nested terms

* Handle ES nested multiFacetCountQueries aggregation counts

* bump version

* remove obsolete mondriaan ignore

* WIP: untangle configuration-derived mess to prep for ES query building

* WIP: build 'main query' part of ES query for logical facets

* WIP: build 'main query' part of ES query for logical facets

* WIP: "filters" part of aggregation works, on to "aggs" part

* WIP: "aggs" portion works, left to do: the "size" and "order" spec

* WIP: add size+order spec

* WIP: refactor>extract code clone

* WIP: rename sortSpec

* WIP: return path from ES: start parsing result

* WIP: halfway extracting buckets; time to piece together the facetName

* WIP: fix some typing issues

* WIP: first working version of ES return mapping logical facet aggregations

* WIP: cleanup some debug prints

* WIP: make it work for configs without fixed field

* version bump (0.40-xxx-7a)

* Fix building queries for facets with no fixed value

* version bump (0.40-xxx-7b)

* Use deep value_count aggregation to get 'document' count for sorting nested facets by count

* bump version (7c)

* FROM and AS require same casing to satisfy linter

* WIP: escalate 'size' to largest found in aggSpecs

* WIP: tidy scope merge code a bit

* cull aggregation results when less requested than returned by ES

* bump version

* nested-facets-8b: remove debug print

* WIP: add multiple sort expressions to ES query when logical facets mapping to the same nested facet require different sorts

TODO: extract the correct portions on ES return

* WIP: suppress another unchecked cast

* WIP: extract ES results according to query desires

* Remove dev prints and bump version

* don't recompute aggs for logical facets themselves

* Use latest republic AR container
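
The "escalate 'size'" and "cull aggregation results" entries above describe a request-then-trim step. The following is a minimal Kotlin sketch of that idea, not the code from this commit: `AggSpec`, `escalatedSizes` and `cull` are hypothetical names, and the assumption is that several logical facets may map onto the same nested field while asking for different sizes.

```kotlin
// Hypothetical type for illustration; not taken from the broccoli code base.
data class AggSpec(val facetName: String, val nestedField: String, val size: Int)

// Ask ES once per nested field, using the largest size any logical facet wants.
fun escalatedSizes(specs: List<AggSpec>): Map<String, Int> =
    specs.groupBy { it.nestedField }
        .mapValues { (_, group) -> group.maxOf { it.size } }

// ES may now return more buckets than a facet asked for; trim back to its own size.
fun cull(spec: AggSpec, buckets: List<Pair<String, Int>>): List<Pair<String, Int>> =
    buckets.take(spec.size)

fun main() {
    val specs = listOf(
        AggSpec("locationName", "entities.name", 5),
        AggSpec("personName", "entities.name", 20),
    )
    println(escalatedSizes(specs)) // {entities.name=20}

    val returned = listOf("Amsterdam" to 7, "Den Haag" to 4, "Leiden" to 2)
    println(cull(AggSpec("locationName", "entities.name", 2), returned))
}
```

Requesting the maximum once keeps the ES query small; the trade-off is that every facet has to be trimmed on the way back.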
hayco authored Nov 26, 2024
1 parent 6e729b2 commit 63b5f11
Showing 13 changed files with 753 additions and 231 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -19,5 +19,4 @@ target/
  *.log
  globalise.http
  /globalise.http
- /mondriaan.http
  /config.yml-
148 changes: 94 additions & 54 deletions config.yml
@@ -58,13 +58,10 @@ projects:
  fields:
  - name: bodyType
  path: "$.body.type"
- type: keyword
  - name: invNr
  path: "$.body.metadata.inventoryNumber"
- type: keyword
  - name: document
  path: "$.body.metadata.document"
- type: keyword
  textRepo:
  uri: https://globalise.tt.di.huc.knaw.nl

@@ -112,40 +109,28 @@ projects:
  fields:
  - name: bodyType
  path: "$.body.type"
- type: keyword
  - name: lang
  path: "$.body.metadata.lang"
- type: keyword
  - name: type
  path: "$.body.metadata.type"
- type: keyword
  - name: anno
  path: "$.body.metadata.anno"
- type: keyword
  - name: country
  path: "$.body.metadata.country"
- type: keyword
  - name: institution
  path: "$.body.metadata.institution"
- type: keyword
  - name: msid
  path: "$.body.metadata.msid"
- type: keyword
  - name: period
  path: "$.body.metadata.period"
- type: keyword
  - name: periodLong
  path: "$.body.metadata.periodlong"
- type: keyword
  - name: letterId
  path: "$.body.metadata.letterid"
- type: keyword
  - name: correspondent
  path: "$.body.metadata.correspondent"
- type: keyword
  - name: location
  path: "$.body.metadata.location"
- type: keyword
  annoRepo:
  containerName: 'mondriaan-0.9.0'
  uri: https://mondriaan.annorepo.dev.clariah.nl
@@ -160,18 +145,81 @@ projects:
  deleteKey: 'republic-dev-mag-weg'
  joinSeparator: " "
  indices:
- - name: 'republic-2024.06.18'
+ - name: 'rep-2024.11.18'
  bodyTypes: [ Resolution ]
  fields:
+ - name: attendantId
+ logical:
+ scope: attendants
+ path: ".id"
+ - name: attendantName
+ logical:
+ scope: attendants
+ path: ".name"
+ - name: locationName
+ logical:
+ scope: entities
+ path: ".name"
+ fixed:
+ path: ".category"
+ value: LOC
+ - name: locationLabels
+ logical:
+ scope: entities
+ path: ".labels"
+ fixed:
+ path: ".category"
+ value: LOC
+ - name: organisationName
+ logical:
+ scope: entities
+ path: ".name"
+ fixed:
+ path: ".category"
+ value: ORG
+ - name: organisationLabels
+ logical:
+ scope: entities
+ path: ".labels"
+ fixed:
+ path: ".category"
+ value: ORG
+ - name: personName
+ logical:
+ scope: entities
+ path: ".name"
+ fixed:
+ path: ".category"
+ value: PERS
+ - name: personLabels
+ logical:
+ scope: entities
+ path: ".labels"
+ fixed:
+ path: ".category"
+ value: PERS
+ - name: roleName
+ logical:
+ scope: entities
+ path: ".name"
+ fixed:
+ path: ".category"
+ value: HOE
+ - name: roleLabels
+ logical:
+ scope: entities
+ path: ".labels"
+ fixed:
+ path: ".category"
+ value: HOE
+ - name: bodyType
+ path: "$.body.type"
  - name: propositionType
  path: "$.body.metadata.propositionType"
- type: keyword
  - name: resolutionType
  path: "$.body.metadata.resolutionType"
- type: keyword
  - name: textType
  path: "$.body.metadata.textType"
- type: keyword
  - name: sessionDate
  path: "$.body.metadata.sessionDate"
  type: date
@@ -184,32 +232,38 @@ projects:
  - name: sessionYear
  path: "$.body.metadata.sessionYear"
  type: short
- - name: delegateId
- path: "$.body.metadata.delegateId"
- type: keyword
- - name: delegateName
- path: "$.body.metadata.delegateName"
- type: keyword
- - name: entityCategory
- path: "$.body.metadata.category"
- type: keyword
- - name: entityId
- path: "$.body.metadata.entityId"
- type: keyword
- - name: entityLabels
- path: "$.body.metadata.entityLabels"
- type: keyword
- - name: entityName
- path: "$.body.metadata.name"
- type: keyword
+ - name: attendants
+ type: nested
+ nested:
+ from: [ Attendant ]
+ fields:
+ - name: id
+ path: "$.body.metadata.delegateId"
+ - name: name
+ path: "$.body.metadata.delegateName"
+ with:
+ - equal: "$.body.metadata.sessionID"
+ - name: entities
+ type: nested
+ nested:
+ from: [ Entity ]
+ fields:
+ - name: category
+ path: "$.body.metadata.category"
+ - name: id
+ path: "$.body.metadata.entityId"
+ - name: labels
+ path: "$.body.metadata.entityLabels"
+ - name: name
+ path: "$.body.metadata.name"
+ with:
+ - overlap: LogicalText
  - name: bodyType
  path: "$.body.type"
- type: keyword
  - name: sessionWeekday
  path: "$.body.metadata.sessionWeekday"
- type: keyword
  annoRepo:
- containerName: republic-2024.06.18
+ containerName: republic-2024.11.18
  uri: https://annorepo.republic-caf.diginfra.org
  textRepo:
  uri: https://textrepo.republic-caf.diginfra.org
@@ -255,28 +309,21 @@ projects:
  fields:
  - name: bodyType
  path: "$.body.type"
- type: keyword
  - name: date
  path: "$.body.metadata.date"
  type: date
  - name: recipient
  path: "$.body.metadata.recipient"
- type: keyword
  - name: recipientLoc
  path: "$.body.metadata.recipientLoc"
- type: keyword
  - name: sender
  path: "$.body.metadata.sender"
- type: keyword
  - name: senderLoc
  path: "$.body.metadata.senderLoc"
- type: keyword
  - name: editorNotes
  path: "$.body.metadata.editorNotes"
- type: keyword
  - name: shelfmark
  path: "$.body.metadata.shelfmark"
- type: keyword
  - name: summary
  path: "$.body.metadata.summary"
  type: text
@@ -299,7 +346,6 @@ projects:
  fields:
  - name: bodyType
  path: "$.body.type"
- type: keyword
  textRepo:
  uri: https://brieven-van-hooft.tt.di.huc.knaw.nl

@@ -328,22 +374,16 @@ projects:
  fields:
  - name: correspondent
  path: "$.body.metadata.correspondent"
- type: keyword
  - name: institution
  path: "$.body.metadata.institution"
- type: keyword
  - name: location
  path: "$.body.metadata.location"
- type: keyword
  - name: msid
  path: "$.body.metadata.msid"
- type: keyword
  - name: period
  path: "$.body.metadata.period"
- type: keyword
  - name: periodLong
  path: "$.body.metadata.periodLong"
- type: keyword
  annoRepo:
  containerName: 'vangogh-0.2.0'
  uri: https://vangogh.annorepo.dev.clariah.nl
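
The config.yml changes above introduce "logical" facets (`scope`, `path`, optional `fixed` path/value) on top of `nested` fields. As a rough illustration of how such an entry might translate into an Elasticsearch aggregation, here is a Kotlin sketch. It is an assumption, not the translation implemented in this commit: `LogicalFacet`, `Fixed` and `nestedAggregation` are hypothetical names, the data class flattens `logical` and `fixed` into one object, and the sketch uses a single `filter` aggregation, whereas the commit's actual query shape is not visible in this diff. The only ES features relied on are the standard `nested`, `filter` and `terms` aggregations.

```kotlin
// Hypothetical mirror of a logical field entry in config.yml.
data class Fixed(val path: String, val value: String)
data class LogicalFacet(
    val name: String,
    val scope: String,
    val path: String,
    val fixed: Fixed? = null,
)

fun nestedAggregation(facet: LogicalFacet, size: Int = 10): Map<String, Any> {
    // Config paths are relative to the scope: ".name" in scope "entities" -> "entities.name"
    fun field(path: String) = facet.scope + path

    val terms: Map<String, Any> =
        mapOf("terms" to mapOf("field" to field(facet.path), "size" to size))

    // With a fixed path/value, only count nested objects matching it (e.g. category == LOC).
    val fixed = facet.fixed
    val inner: Map<String, Any> =
        if (fixed == null) terms
        else mapOf(
            "filter" to mapOf("term" to mapOf(field(fixed.path) to fixed.value)),
            "aggs" to mapOf("values" to terms),
        )

    return mapOf(
        facet.name to mapOf(
            "nested" to mapOf("path" to facet.scope),
            "aggs" to mapOf(facet.name to inner),
        )
    )
}

fun main() {
    val locationName = LogicalFacet("locationName", "entities", ".name", Fixed(".category", "LOC"))
    println(nestedAggregation(locationName, size = 5))

    val attendantName = LogicalFacet("attendantName", "attendants", ".name")
    println(nestedAggregation(attendantName))
}
```

Serialised to JSON, the first call would yield a `nested` aggregation on path `entities`, filtered on `entities.category = LOC`, with a `terms` aggregation on `entities.name`.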
4 changes: 2 additions & 2 deletions k8s/broccoli-server/Dockerfile
@@ -1,11 +1,11 @@
- FROM maven:3.8.5 as builder
+ FROM maven:3.8.5 AS builder

  WORKDIR /build/
  COPY ./src /build/src
  COPY ./pom.xml /build/
  RUN mvn --no-transfer-progress --batch-mode --update-snapshots --also-make package

- FROM openjdk:20-slim
+ FROM openjdk:24-jdk-slim
  RUN apt-get update && apt-get install -y curl jq

  WORKDIR /apps/broccoli
16 changes: 10 additions & 6 deletions pom.xml
@@ -6,7 +6,7 @@

  <groupId>nl.knaw.huc</groupId>
  <artifactId>broccoli</artifactId>
- <version>0.39.0</version>
+ <version>0.40-nested-facets-8d</version>

  <packaging>jar</packaging>

@@ -18,7 +18,7 @@
  <maven.build.timestamp.format>yyyy-MM-dd'T'HH:mm:ss'Z'</maven.build.timestamp.format>

  <kotlin.code.style>official</kotlin.code.style>
- <kotlin.version>1.9.25</kotlin.version>
+ <kotlin.version>2.0.20</kotlin.version>
  <java.version>17</java.version>
  <maven.compiler.source>${java.version}</maven.compiler.source>
  <maven.compiler.target>${java.version}</maven.compiler.target>
@@ -104,8 +104,10 @@
  <configuration>
  <createDependencyReducedPom>true</createDependencyReducedPom>
  <transformers>
- <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
- <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
+ <transformer
+ implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+ <transformer
+ implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
  <mainClass>${mainClass}</mainClass>
  </transformer>
  </transformers>
@@ -142,8 +144,10 @@
  </goals>
  <configuration>
  <transformers>
- <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
- <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
+ <transformer
+ implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+ <transformer
+ implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
  <mainClass>${mainClass}</mainClass>
  </transformer>
  </transformers>
4 changes: 4 additions & 0 deletions src/main/kotlin/nl/knaw/huc/broccoli/api/Constants.kt
@@ -29,4 +29,8 @@ object Constants {
  }

  const val TEXT_TOKEN_COUNT = "text.tokenCount"
+
+ const val NO_FILTERS = "no_filters"
+
+ const val DOC_COUNT = "doc_count"
  }
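
The new `DOC_COUNT` constant matches the `doc_count` key that Elasticsearch puts on every aggregation bucket. How the commit uses it is not visible in this part of the diff, so the following Kotlin sketch is only an assumption about one likely use: reading bucket counts out of an already-parsed ES response. `bucketCounts` is a hypothetical helper, and the constant is redeclared locally so the snippet is self-contained.

```kotlin
// Same value as Constants.DOC_COUNT in the diff above; redeclared to keep the sketch standalone.
const val DOC_COUNT = "doc_count"

// Turn the "buckets" list of an ES terms aggregation into key -> count,
// skipping any bucket that lacks a key or a numeric doc_count.
fun bucketCounts(buckets: List<Map<String, Any?>>): Map<String, Int> =
    buckets.mapNotNull { bucket ->
        val key = bucket["key"]?.toString()
        val count = (bucket[DOC_COUNT] as? Number)?.toInt()
        if (key != null && count != null) key to count else null
    }.toMap()

fun main() {
    val buckets = listOf(
        mapOf("key" to "LOC", DOC_COUNT to 12),
        mapOf("key" to "ORG", DOC_COUNT to 3),
    )
    println(bucketCounts(buckets))
}
```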