
Configuration

Cezary Kluczyński edited this page Dec 18, 2017 · 21 revisions

The main way of providing configuration to STAPI is through Java properties.

First, follow the basic setup guidelines on copying the properties file into the right location.

Once the properties file is in place, STAPI can be configured.

Configuration specific to Spring itself is described in the Spring documentation.

Configuration using profiles

The following profiles can be turned on and off for different effects:

  • stapi-custom - allows an application-stapi-custom.properties file to be placed in the same directory as application.properties. This file is ignored by Git.
  • genderize - whether to connect to Genderize.io. When this profile is present, connections are made, and the gender of real people is determined from the responses of this API. If gender accuracy is not currently important, it is advised to leave this profile off. Genderize.io offers 1000 free requests a day. That limit could be exceeded when multiple tests are run, but a single data feed requires only 150 to 200 API requests to Genderize.io, well below the limit.
  • apiThrottle - whether the API should be throttled. When enabled, throttling is performed according to the settings from the throttle configuration. If not enabled, no throttling is performed.

Profiles default and stapi-custom should always be on.
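
As a minimal sketch, assuming profiles are selected through Spring Boot's standard spring.profiles.active property (this page does not say where STAPI expects the profile list), enabling throttling but not Genderize.io could look like this:

```properties
# Assumption: Spring Boot's standard profile property is used to pick profiles.
# default and stapi-custom are kept on, apiThrottle is added, genderize is left out.
spring.profiles.active=default,stapi-custom,apiThrottle
```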

Configuration properties

Database properties

  • stapi.datasource.main.url - URL of database, check Spring documentation for details
  • stapi.datasource.main.username - database username
  • stapi.datasource.main.password=stapi - database password
  • stapi.datasource.metrics.url - URL of metrics database
  • stapi.datasource.metrics.username - metrics database username
  • stapi.datasource.metrics.password=stapi - metrics database password
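
For illustration only, a filled-in set of database properties might look like the sketch below. The JDBC URLs, host, port, database names, and usernames are hypothetical placeholders and not values taken from STAPI. Only the property names and the stapi password value come from the list above.

```properties
# Hypothetical example values; replace with the details of your own database.
stapi.datasource.main.url=jdbc:postgresql://localhost:5432/stapi
stapi.datasource.main.username=stapi
stapi.datasource.main.password=stapi
# Hypothetical metrics database, assumed here to be a separate database.
stapi.datasource.metrics.url=jdbc:postgresql://localhost:5432/stapi_metrics
stapi.datasource.metrics.username=stapi
stapi.datasource.metrics.password=stapi
```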

Logging

  • logging.log-files-path - subdirectory for logs. Defaults to ./logs
  • logging.config - location of the logback config. Defaults to classpath:logback-spring.xml

The remaining properties prefixed with logging. are responsible for logging levels for various packages. The exact configuration can be found at https://github.com/cezarykluczynski/stapi/blob/master/server/src/main/resources/logback-spring.xml.
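
Written out explicitly, the two documented defaults correspond to the following fragment. It only restates the defaults listed above, so it should not change any behavior:

```properties
# Defaults as documented above, spelled out for reference.
logging.log-files-path=./logs
logging.config=classpath:logback-spring.xml
```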

Hits statistics

These properties control how often endpoint hit statistics are persisted and read to be displayed on the statistics subpage. Reads and writes are executed by two independent asynchronous processes, so cron expressions should not overlap.

  • statistics.persist.endpointHit - when endpoint hit statistics should be persisted. Defaults to 0 * * * * *
  • statistics.read.endpointHit - when endpoint hit statistics should be read to be displayed on the statistics subpage. Defaults to 30 * * * * *
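
The defaults can be read as follows, assuming these values are Spring-style six-field cron expressions (second, minute, hour, day of month, month, day of week). Under that assumption, the two processes run once a minute, offset from each other by 30 seconds:

```properties
# Assumption: Spring cron format, where the first field is seconds.
# Persist at second 0 of every minute.
statistics.persist.endpointHit=0 * * * * *
# Read at second 30 of every minute, so the two schedules never coincide.
statistics.read.endpointHit=30 * * * * *
```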

Sources

  • source.mediaWiki.memoryAlphaEn.apiUrl - URL of the Memory Alpha API, or its local fork.
  • source.mediaWiki.memoryBetaEn.apiUrl - URL of the Memory Beta API, or its local fork. For the remaining properties, see the common properties section.
  • source.genderize.apiUrl - URL of the Genderize.io API.
  • source.wordPress.starTrekCards.apiUrl - URL of StarTrekCards.com.
  • wordPress.starTrekCards.minimalInterval - minimal interval between requests to StarTrekCards.com. Defaults to 3000.

Common properties for all MediaWiki sources

All MediaWiki sources share common properties:

  • source.mediaWiki.XXX.minimalInterval - minimal interval between requests. Defaults to auto. Can be either auto or a number of milliseconds. If set to auto, it will be 0 for a local fork, and one second for Wikia's wiki.
  • source.mediaWiki.XXX.logPostpones - whether the fact that an API request was postponed should be logged. Defaults to false.
  • source.mediaWiki.XXX.intervalCalculationStrategy - how the interval between requests should be calculated. Possible values are FROM_BEFORE_SEND and FROM_AFTER_RECEIVED. FROM_BEFORE_SEND means that the interval is calculated from the moment before the request was sent, so another request will be sent, for example, 1000 milliseconds after the last one was sent. FROM_AFTER_RECEIVED means that the interval is calculated from the moment the response was received, so another request will be sent, for example, 1000 milliseconds after the last response was received.
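
As a sketch, the common properties for one source could be set as shown below. It assumes that XXX is replaced with the source name as it appears in the apiUrl properties (memoryAlphaEn here) and that the mediaWiki casing follows those property names. The values are illustrative and not recommendations.

```properties
# Assumption: XXX stands for the source name, e.g. memoryAlphaEn.
# Wait at least 1000 milliseconds between requests instead of using auto.
source.mediaWiki.memoryAlphaEn.minimalInterval=1000
# Log every postponed API request (the default is false).
source.mediaWiki.memoryAlphaEn.logPostpones=true
# Measure the interval from the moment the previous response was received.
source.mediaWiki.memoryAlphaEn.intervalCalculationStrategy=FROM_AFTER_RECEIVED
```

With FROM_AFTER_RECEIVED, slow responses push the next request further out. FROM_BEFORE_SEND keeps the time between request starts closer to the configured interval.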

ETL configuration

  • etl.enabled - when this is set to true, ETL routines will execute the first time the application is started. When this is set to false, it is assumed that the database state is right, which might or might not be true, and neither ETL routines nor database migrations will be executed.

Steps configuration

Every step can be configured in a similar manner. All steps share the same properties:

  • step.XXX.enabled=true - whether this step is turned on. When a step is turned off, the step reader is provided with an empty list of objects to process, and, as a result, the step is marked as completed immediately.
  • step.XXX.commitInterval=50 - how often the resulting entities should be saved. A commit interval of 1, when found, should not be changed, because steps that have it were not tested with higher intervals. Steps with a commit interval higher than 1 can be tuned.
  • step.XXX.order=1 - order in which steps are executed. Currently not used, but validated: no duplicated order values are allowed.
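
A hedged sketch of how two steps might be configured together with etl.enabled follows. The step names exampleStepA and exampleStepB are hypothetical placeholders for XXX. The real step names are not listed on this page.

```properties
# Run ETL routines the first time the application is started.
etl.enabled=true

# exampleStepA and exampleStepB are hypothetical step names standing in for XXX.
step.exampleStepA.enabled=true
step.exampleStepA.commitInterval=50
step.exampleStepA.order=1

# A disabled step gets an empty list to process and completes immediately.
step.exampleStepB.enabled=false
step.exampleStepB.commitInterval=1
# Order values must be unique across steps, even though they are not used yet.
step.exampleStepB.order=2
```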

Throttle configuration

  • throttle.ipAddressHourlyLimit - how many requests an IP address can make every hour. Defaults to 250.
  • throttle.minutesToDeleteExpiredIpAddresses - how many minutes should pass before expired entries for IP addresses are deleted. Defaults to 1815.
  • throttle.frequentRequestsPeriodLengthInSeconds - length, in seconds, of the period over which requests are considered when deciding whether they are arriving too fast. Defaults to 15.
  • throttle.frequentRequestsMaxRequestsPerPeriod - how many requests can be made in the period defined by throttle.frequentRequestsPeriodLengthInSeconds. Defaults to 45.
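
For reference, the documented defaults are written out below as a properties fragment. According to the profiles section, these settings only take effect when the apiThrottle profile is enabled.

```properties
# Documented defaults; these only apply when the apiThrottle profile is on.
# At most 250 requests per IP address per hour.
throttle.ipAddressHourlyLimit=250
# Expired IP entries are deleted after 1815 minutes (30 hours and 15 minutes).
throttle.minutesToDeleteExpiredIpAddresses=1815
# At most 45 requests within any 15-second period.
throttle.frequentRequestsPeriodLengthInSeconds=15
throttle.frequentRequestsMaxRequestsPerPeriod=45
```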

Cache configuration

  • cache.cachingStrategyType - default strategy for caching read-only database queries. Possible values are NO_CACHE, CACHE_ALL, and CACHE_FULL_ENTITIES. The first two are self-explanatory. CACHE_FULL_ENTITIES caches only the results of entities retrieved by GUID. This setting has no effect if the etl profile is enabled.
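
For example, caching only entities retrieved by GUID would be configured like this. The choice of value is illustrative and not a recommendation:

```properties
# One of NO_CACHE, CACHE_ALL, CACHE_FULL_ENTITIES.
cache.cachingStrategyType=CACHE_FULL_ENTITIES
```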

OAuth configuration

For GitHub OAuth to work, a clientId and clientSecret pair must be configured.

  • oauth.github.clientId=CLIENT_ID - client ID taken from the GitHub OAuth application page.
  • oauth.github.clientSecret=CLIENT_SECRET - client secret taken from the GitHub OAuth application page. Should never be made publicly available.
  • oauth.github.adminIdentifiers=1,2,3 - GitHub user IDs of users who should also be site admins.

API keys

API key limits can be configured using properties.

  • apiKey.keyLimitPerAccount - how many keys a single account can have. Defaults to 5.
  • apiKey.requestLimitPerKey - request limit per key. Defaults to 10000.
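
As a small sketch, tightening both limits below their defaults could look like this. The numbers are arbitrary example values.

```properties
# Example values only; the documented defaults are 5 and 10000.
apiKey.keyLimitPerAccount=3
apiKey.requestLimitPerKey=5000
```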

Actuator

Actuator is disabled by default, but it can be enabled using the following configuration:

endpoints.metrics.enabled=true

endpoints.health.enabled=true

endpoints.info.enabled=true

security.user.name=USERNAME

security.user.password=PASSWORD

USERNAME and PASSWORD should be changed to something unguessable.
