
Configuration

Cezary Kluczyński edited this page Mar 17, 2023 · 21 revisions

The main way of providing configuration to STAPI is through Java properties.

First, follow the basic setup guidelines on copying the properties file to the right location.

Once the properties file is in place, configuration can be done.

Configuration specific to Spring itself is described in the Spring documentation.

Configuration using profiles

The following profiles can be turned on and off for different effects:

  • stapi-custom - enables an application-stapi-custom.properties file in the same directory as application.properties. This file is ignored by Git.
  • genderize - whether or not to connect to Genderize.io. When this profile is present, the connection is made, and the gender of real people is decided based on responses from this API. If gender accuracy is not currently important, it is advised to leave this profile out. Genderize.io offers 1000 free requests a day, a number that could be exceeded when multiple tests are run, but one well above the 150 to 200 API requests to Genderize.io that a single datafeed requires.
  • docker - profile used when STAPI's Docker image is built. Changes compared to the non-Docker version are minimal and only concern where some static files are stored in the Docker container.

Note: Profiles default and stapi-custom should always be on.
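The stapi-custom profile relies on standard Spring Boot behavior: a key defined in a profile-specific file such as application-stapi-custom.properties takes precedence over the same key in application.properties. The sketch below imitates only that precedence rule with plain java.util.Properties; it is not how Spring actually locates or loads the files, and the class name and property values are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ProfileOverrideSketch {

    // Loads the base file first, then the profile-specific file into the same object.
    // Properties.load() overwrites keys that already exist, so profile values win.
    static Properties merge(String applicationProperties, String profileProperties) {
        Properties merged = new Properties();
        try {
            merged.load(new StringReader(applicationProperties)); // application.properties
            merged.load(new StringReader(profileProperties));     // application-stapi-custom.properties
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical file contents; the values are placeholders, not STAPI defaults.
        String application = "spring.datasource.hikari.username=stapi\n"
                + "spring.datasource.hikari.password=stapi\n";
        String stapiCustom = "spring.datasource.hikari.password=my-local-password\n";

        Properties effective = merge(application, stapiCustom);
        System.out.println(effective.getProperty("spring.datasource.hikari.username")); // stapi
        System.out.println(effective.getProperty("spring.datasource.hikari.password")); // my-local-password
    }
}
```

This is why local secrets, such as database credentials, fit naturally in the Git-ignored stapi-custom file: only the overridden keys need to appear there.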

Configuration properties

Database properties

  • spring.datasource.url - database URL; check the Spring documentation for details
  • spring.datasource.hikari.username - database username
  • spring.datasource.hikari.password=stapi - database password

Logging

  • logging.file.path - directory for logs. Defaults to ./logs
  • logging.config - location of the Logback config. Defaults to classpath:logback-spring.xml

The remaining properties prefixed with logging. set logging levels for various packages. The exact configuration can be found at https://github.com/cezarykluczynski/stapi/blob/master/server/src/main/resources/logback-spring.xml.

Sources

  • source.mediawiki.memoryAlphaEn.apiUrl - URL of the Memory Alpha API, or its local fork.
  • source.mediawiki.memoryBetaEn.apiUrl - URL of the Memory Beta API, or its local fork. For the remaining properties, see the common properties section.
  • source.genderize.apiUrl - URL of Genderize.io API.
  • source.wordpress.starTrekCards.apiUrl - URL of StarTrekCards.com.
  • wordPress.starTrekCards.minimalInterval - minimal interval between requests to StarTrekCards.com. Defaults to 3000.

Common properties for all MediaWiki sources

All MediaWiki sources share common properties:

  • source.mediawiki.XXX.minimalInterval - minimal interval between requests. Defaults to auto. Can be either auto or a number of milliseconds. If set to auto, it is 0 for a local fork and one second for Wikia's wiki.
  • source.mediawiki.XXX.logPostpones - whether the fact that an API request was postponed should be logged. Defaults to false.
  • source.mediawiki.XXX.intervalCalculationStrategy - how the interval between requests is calculated. Possible values are FROM_BEFORE_SEND and FROM_AFTER_RECEIVED. FROM_BEFORE_SEND means the interval is counted from the moment the previous request was sent, so the next request is sent, for example, 1000 milliseconds after the last one was sent. FROM_AFTER_RECEIVED means the interval is counted from the moment the previous response was received, so the next request is sent, for example, 1000 milliseconds after the last response was received.
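The two strategies differ only in which timestamp the minimal interval is counted from. The following Java sketch illustrates that difference, together with the auto rule described above (0 ms for a local fork, 1000 ms for Wikia's wiki). The class and method names are hypothetical and do not mirror STAPI's actual classes; only the enum constant names come from this page.

```java
class RequestIntervalSketch {

    enum IntervalCalculationStrategy { FROM_BEFORE_SEND, FROM_AFTER_RECEIVED }

    // Resolves the minimalInterval property: either "auto" or a number of milliseconds.
    // With "auto", a local fork gets no delay and Wikia's wiki gets one second.
    static long resolveMinimalInterval(String minimalInterval, boolean localFork) {
        if ("auto".equals(minimalInterval)) {
            return localFork ? 0L : 1000L;
        }
        return Long.parseLong(minimalInterval);
    }

    // Returns the earliest time (epoch millis) at which the next request may be sent.
    static long nextAllowedSendTime(IntervalCalculationStrategy strategy,
                                    long lastSentAt, long lastReceivedAt, long minimalInterval) {
        long reference = strategy == IntervalCalculationStrategy.FROM_BEFORE_SEND
                ? lastSentAt       // counted from when the last request was sent
                : lastReceivedAt;  // counted from when its response arrived
        return reference + minimalInterval;
    }

    public static void main(String[] args) {
        long interval = resolveMinimalInterval("auto", false); // Wikia's wiki: 1000 ms
        // The last request was sent at t=0 and its response arrived at t=400.
        System.out.println(nextAllowedSendTime(
                IntervalCalculationStrategy.FROM_BEFORE_SEND, 0, 400, interval));    // 1000
        System.out.println(nextAllowedSendTime(
                IntervalCalculationStrategy.FROM_AFTER_RECEIVED, 0, 400, interval)); // 1400
    }
}
```

With slow responses, FROM_AFTER_RECEIVED leaves a larger gap between requests, which is gentler on the remote wiki; FROM_BEFORE_SEND keeps a steadier request rate.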

ETL configuration

  • spring.batch.job.enabled - when set to true, ETL routines written on top of Spring Batch are executed, for all enabled steps, the first time the application is started and on every subsequent start. When set to false, neither ETL routines nor database migrations are executed.

Steps configuration

Every step can be configured in a similar manner. All steps share the same properties:

  • step.XXX.enabled=true - whether or not this step is turned on. When a step is turned off, its reader is provided with an empty list of objects to process and, as a result, the step is marked as completed immediately.
  • step.XXX.commitInterval=50 - how often the resulting entities should be saved. A commit interval of 1, where present, should not be changed, because steps that use it were not tested with higher intervals. Steps with a commit interval higher than 1 can be tuned.
  • step.XXX.order=1 - order in which steps are executed. Currently not used, but validated: duplicate order values are not allowed.
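A minimal sketch of how the step.XXX properties could be read and the uniqueness of order checked, assuming plain java.util.Properties as input. The StepConfigSketch class and the step names used in main are hypothetical; STAPI's own validation may be implemented differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class StepConfigSketch {

    // Reads step.<name>.enabled; in this sketch a missing or non-"true" value counts as disabled.
    static boolean isEnabled(Properties props, String step) {
        return Boolean.parseBoolean(props.getProperty("step." + step + ".enabled"));
    }

    // Reads step.<name>.commitInterval, i.e. how many entities are written per commit.
    static int commitInterval(Properties props, String step) {
        return Integer.parseInt(props.getProperty("step." + step + ".commitInterval"));
    }

    // Rejects the configuration when two steps share the same step.<name>.order value.
    static void validateOrders(Properties props, List<String> steps) {
        Map<Integer, String> seen = new HashMap<>();
        for (String step : steps) {
            int order = Integer.parseInt(props.getProperty("step." + step + ".order"));
            String previous = seen.putIfAbsent(order, step);
            if (previous != null) {
                throw new IllegalStateException(
                        "Steps " + previous + " and " + step + " share order " + order);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical step names, for illustration only.
        props.setProperty("step.createSeries.enabled", "true");
        props.setProperty("step.createSeries.commitInterval", "50");
        props.setProperty("step.createSeries.order", "1");
        props.setProperty("step.createEpisodes.enabled", "false");
        props.setProperty("step.createEpisodes.commitInterval", "1");
        props.setProperty("step.createEpisodes.order", "2");

        validateOrders(props, List.of("createSeries", "createEpisodes")); // passes: orders 1 and 2
        System.out.println(isEnabled(props, "createEpisodes"));           // false
        System.out.println(commitInterval(props, "createSeries"));        // 50
    }
}
```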

Legal documents

Terms of service and privacy policy can be configured with properties. Both files are optional HTML-formatted text files (without <html>, <head>, and <body> tags, just the formatted content) that are used on the /terms-of-service and /privacy-policy pages, respectively.

  • legal.termsOfService - full path to terms of service.
  • legal.privacyPolicy - full path to privacy policy.
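To illustrate why the files should contain only a fragment, the sketch below reads an optional legal document from a configured path and embeds it into a page that already supplies <html>, <head>, and <body>. The LegalDocumentSketch class, the wrap helper, and the page markup are hypothetical; only the property names come from this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Properties;

class LegalDocumentSketch {

    // Returns the HTML fragment configured under the given key, or empty when the key is unset.
    static Optional<String> load(Properties props, String key) {
        String path = props.getProperty(key);
        if (path == null || path.isBlank()) {
            return Optional.empty(); // the document is optional
        }
        try {
            return Optional.of(Files.readString(Path.of(path)));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + key + " from " + path, e);
        }
    }

    // The page supplies the outer tags, so the fragment must not contain them.
    static String wrap(String title, String fragment) {
        return "<html><head><title>" + title + "</title></head><body>" + fragment + "</body></html>";
    }

    public static void main(String[] args) {
        // Reads the path from a JVM system property, e.g. -Dlegal.termsOfService=/path/to/tos.html
        String page = load(System.getProperties(), "legal.termsOfService")
                .map(fragment -> wrap("Terms of service", fragment))
                .orElse("No terms of service configured");
        System.out.println(page);
    }
}
```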

Configuration env variables

Several environment variables can be set. They are mostly useful when STAPI is deployed to the stapi.co domain, but can be used in other contexts.

  • STAPI_DATA_VERSION – set when the Docker image is built. It is the version of the data to be downloaded from AWS, in the format 2023-02. In a built image, it determines the month, shown on the main page, in which the currently served data was gathered.
  • STAPI_TOS_AND_PP – whether to show the site's privacy policy and TOS in the footer. Only really required for stapi.co. The value true includes them; any other value excludes them.
  • STAPI_CANONICAL_DOMAIN – only requests to this one domain are accepted, and all others are rejected. This prevents people from pointing their own domains at the server on which STAPI runs.
  • STAPI_UPGRADE_INSECURE_REQUESTS – whether to respect the Upgrade-Insecure-Requests header. If the value is set to true, any request to http://stapi.co/some-url is redirected to https://stapi.co/some-url. Any other value ignores this header and no action is taken based on its value.
  • STAPI_EAGER_CACHING – whether complex full DB entities, and as a result complex API entities, should be cached eagerly. When enabled, a process runs in the background of the Docker container (or the local application) and uses existing repositories that implement CriteriaMatcher to load full entities into the application's in-memory caches. This does not delay application startup. Not all entities are preloaded this way, only those that have relations to other entities or that proved to load slowly in performance tests. Enabling this adds several hundred MB to the container's memory usage. It also adds some 30 GB to the container's disk usage, for reasons yet to be discovered, so use it with care. Loading the cache takes about 40 minutes, though this varies with the resources of the machine. Not enabling this does not disable caching; it only means that caches won't be filled eagerly. Only full entities are cached. Search is not affected by this setting.
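The semantics of the simpler variables can be summarized in code. The following Java sketch is illustrative only: the class and method names are hypothetical, the display format "February 2023" for STAPI_DATA_VERSION is an assumption, accepting every host when STAPI_CANONICAL_DOMAIN is unset is an assumption, and the redirect requires the client to send Upgrade-Insecure-Requests: 1, which is how the header is defined by its specification.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class EnvSettingsSketch {

    // STAPI_TOS_AND_PP: only the exact value "true" enables the footer links.
    static boolean showTosAndPrivacyPolicy(String value) {
        return "true".equals(value);
    }

    // STAPI_CANONICAL_DOMAIN: accept a request only when its host matches the canonical domain.
    // Assumption: when the variable is unset (null), every host is accepted.
    static boolean isAcceptedHost(String canonicalDomain, String requestHost) {
        return canonicalDomain == null || canonicalDomain.equalsIgnoreCase(requestHost);
    }

    // STAPI_UPGRADE_INSECURE_REQUESTS: when "true" and the client sent the header with value "1",
    // an http:// URL is redirected to its https:// counterpart; null means no redirect.
    static String redirectTarget(String upgradeSetting, String upgradeHeader, String requestUrl) {
        if ("true".equals(upgradeSetting) && "1".equals(upgradeHeader) && requestUrl.startsWith("http://")) {
            return "https://" + requestUrl.substring("http://".length());
        }
        return null;
    }

    // STAPI_DATA_VERSION: "2023-02" identifies the month in which the data was gathered.
    static String dataMonth(String dataVersion) {
        return YearMonth.parse(dataVersion)
                .format(DateTimeFormatter.ofPattern("MMMM uuuu", Locale.ENGLISH));
    }

    public static void main(String[] args) {
        System.out.println(showTosAndPrivacyPolicy("true"));                         // true
        System.out.println(isAcceptedHost("stapi.co", "example.com"));               // false
        System.out.println(redirectTarget("true", "1", "http://stapi.co/some-url")); // https://stapi.co/some-url
        System.out.println(dataMonth("2023-02"));                                    // February 2023
    }
}
```

In a real deployment the first argument of each method would come from System.getenv, which returns null for unset variables; the sketch treats null the same as any non-"true" value.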