CPAN Testers Project Roadmap

Operations and Documentation

  • Status: In Development

Repositories

Requirements

  • Monitoring
    • Each component of CPAN Testers should be monitored for correct operation
    • If a component is failing or if a component is operating more slowly than desired, it should send an alert
      • A single test report should take less than 30 minutes to reach the web site (in the future, we should reduce that to 5 minutes)
    • The monitoring system should have a web-based GUI for seeing the status of the system
    • The server doing monitoring should be in a different data center
  • Logging
    • All the system components should log to a central place
    • Logs should be kept for at least a month
      • This will ensure that someone is able to look at a problem in the logs after it is reported
  • Statistics
    • Statistics should be generated by every part of the system
      • System stats
        • CPU usage
        • Disk usage
        • Network usage
      • Backend
        • Number of reports
        • Reports by OS
        • Reports by Perl version
      • Web app and APIs
        • Number of requests
    • Statistics should be stored and trends calculated to help with future resource planning
    • Statistics should be open to users for reporting
  • Deployment
    • Deployment should be automated
    • A QA/Staging environment should be set up
    • The deployment should be able to create a development server in a VM
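
As a rough illustration of the automated-deployment requirement above, here is a minimal Rex task sketch. The host name, package, file paths, and service name are placeholders, not the real CPAN Testers infrastructure.

```perl
# Rexfile -- minimal deployment sketch (hosts and paths are hypothetical)
use Rex -feature => ['1.4'];

user 'cpantesters';
group web => 'web1.example.com';

desc 'Deploy the CPAN Testers web application';
task 'deploy', group => 'web', sub {
    pkg 'perl', ensure => 'present';

    # Push the application configuration to the server
    file '/etc/cpantesters/app.conf',
        source => 'files/app.conf',
        owner  => 'cpantesters',
        mode   => '0644';

    service 'cpantesters-web' => 'restart';
};
```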

Current Problems

  • Monitoring

    • Various users have monitoring scripts of their own in place, but CPAN Testers does not have its own monitoring
    • CPAN Testers has a lot of moving parts that each need monitoring for stability
  • Metrics and Statistics

    • The statistics pages, such as http://stats.cpantesters.org/osmatrix-month.html, are not being updated.
    • More research and documentation on this problem is needed
      • How are statistics being generated?
      • Where are statistics being stored?
      • How are statistics reports being generated?

Future Plans

  • Monitoring

    • Possible Technologies
      • Icinga2
        • Based on / forked from Nagios, it has well-documented configuration, easy command-line tools for automation and deployment, and a nice web GUI for viewing system status
      • Grafana
        • Has a new, built-in alerting/monitoring tool with easy configuration from the Grafana front-end
        • Can use Graphite as a back-end, which is something I know how to do, and is robust and scalable
    • Opportunities
      • Build a Rex plugin to deploy the monitoring tool and automate building the monitoring configuration.
  • Statistics

    • Possible Technologies
      • Graphite (a metric-push sketch follows this list)
        • Python based statistics database
        • Easy configuration and maintenance
        • Remote access over the network
        • Performs well enough for large organizations (which we are not)
        • Works with Grafana for dashboards and graphs
        • Supports automated aggregations
        • Does not support tagging metrics for later aggregation queries
      • RRDTool
        • Fast, native round-robin databases
        • Very simple data layer
        • Will need a model layer on top to route incoming stats to the right database file
        • Will need a remote access layer on top to allow centralized statistics storage
        • RRDTool's limitations are why Graphite exists
      • OpenTSDB
        • Java/Hadoop-based time series database
        • Very fast
        • Works with Grafana for dashboards and graphs
        • Supports tagging metrics for running queries on different tags
    • Opportunities
      • Build a Rex plugin to deploy the metrics database and automate building the configuration
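
For the Graphite option above, metrics are pushed over Carbon's plaintext protocol ("metric value timestamp"). A minimal sketch, assuming a hypothetical Carbon listener at graphite.example.com:2003 and an illustrative metric name:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Carbon's plaintext protocol takes "<metric path> <value> <unix timestamp>\n"
my $sock = IO::Socket::INET->new(
    PeerAddr => 'graphite.example.com',   # hypothetical host
    PeerPort => 2003,
    Proto    => 'tcp',
) or die "Cannot connect to Carbon: $!";

printf {$sock} "cpantesters.backend.reports.processed %d %d\n", 1, time;
close $sock;
```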

Backend: Data Processing (ETL)

Requirements

The Backend must read incoming data from the Metabase and process it into useful statistics:

  • Distribution Name
  • Distribution Version
  • Perl Version
  • Perl Architecture
  • OS Name
  • OS Version
  • Status (pass/fail/unknown/na)

These statistics are made available to generate reports via APIs and the web app.

The backend should also store summary statistics containing only test status (pass/fail/unknown/na) aggregated by distribution and distribution version, for efficient consumption by other systems.
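
A minimal sketch of that extraction step, assuming the report has already been decoded from JSON; the hash keys, table, and column names are illustrative, not the real Metabase or cpanstats schema:

```perl
use strict;
use warnings;
use JSON::MaybeXS qw( decode_json );
use DBI;

# Placeholder connection details; the real backend reads these from its config
my $dbh = DBI->connect( 'dbi:mysql:cpanstats', 'user', 'password', { RaiseError => 1 } );

# Read one raw report (JSON) and pull out the fields listed above
my $report = decode_json( do { local $/; <STDIN> } );

$dbh->do(
    'INSERT INTO stats ( dist, version, perl, platform, osname, osvers, state )
     VALUES ( ?, ?, ?, ?, ?, ?, ? )',
    undef,
    @{ $report }{qw( distribution dist_version perl_version archname osname osvers grade )},
);
```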

Current Problems

  • Performance
    • Users expect to see reports very shortly after they release their distribution, or at least, very shortly after a report is delivered to the Metabase. If a report is in the Metabase, but not visible on the web app, they get frustrated.
      • For example, the Matrix app has a corresponding Fast Matrix which loads data from the Metabase log, bypassing the CPAN Testers backend

Future Plans

  • Messaging API / Job workers

    • Using a message queue to get pushed updates from the Metabase to an array of backend workers can improve performance and reliability
    • The API could also be consumed by external users to allow real-time updates on the status of their reports
    • Possible Technologies
      • Minion
        • Minion is a job queue / worker model based on Mojolicious (a minimal sketch follows this list)
        • Using a job-runner platform will increase scalability
        • Using existing tech will decrease development time
        • Integrates well with Mojolicious web framework
      • Mercury
        • Mercury is a message broker using WebSockets which supports a worker job distribution pattern
        • Written by preaction, so bias is likely
        • Using a pure-Perl messaging solution will simplify distribution and installation as opposed to ZeroMQ or nanomsg
          • ZeroMQ is installable from CPAN as long as there's a compiler available
        • Using a pure-Perl message broker will simplify distribution and installation as opposed to ActiveMQ, RabbitMQ, or Kafka
        • Mercury currently has no authentication, which we would need to ensure that only our own job workers join the worker pool
    • Progress:
      • The CPAN Testers API currently uses Mercury to do some simple notifications. This Mercury daemon could be used to notify job runners.
        • The API site must provide authentication for internal job runners to ensure only authorized backends can run jobs
  • ETL Framework

    • It would be nice if there were an existing ETL framework we could use for our processing, to reduce the amount of code we have to write ourselves.
    • Having done a lot of ETL at various workplaces, I also know this has been a strange gap on CPAN: those in-house ETL frameworks have never escaped into the wild.
    • Since our problem is basically "Copy data from this JSON blob into these fields in a SQL database", it seems like an ETL framework should be able to do this in a few lines of configuration.
    • Possible Technologies
    • Progress:
      • I've built Beam::Runner, which provides a nice framework for configuring jobs to run. It's possible we could punt on building or adopting a full ETL framework and just migrate what we have now to this format until we know more about the ETL requirements.
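
A minimal sketch of the Minion option above, using the SQLite backend and a hypothetical process_report task; the real backend would enqueue a job whenever the Metabase announces a new report:

```perl
use Mojolicious::Lite -signatures;

# Minion needs a storage backend; Minion::Backend::SQLite is the simplest to start with
plugin Minion => { SQLite => 'sqlite:minion.db' };

# Hypothetical task name; the real ETL step would run here
app->minion->add_task( process_report => sub ( $job, $report_id ) {
    $job->app->log->info("Processing report $report_id");
} );

# Enqueue from the API when a new report arrives:
#   app->minion->enqueue( process_report => [ $report_id ] );
# Then start workers with:
#   perl myapp.pl minion worker

app->start;
```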

Data APIs

Requirements

There is a wide ecosystem of applications that depend on data from CPAN Testers.

These applications need APIs to get at the CPAN Testers data in various forms, including the exact form stored in the database and a summary form, aggregated by distribution and version, that is useful for small views.

The APIs created must be stable, so API versioning is a requirement. They must also be as fast as possible, so aggressive database indexing and local caching should be used.
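
A minimal sketch of what a versioned, cacheable summary endpoint could look like in Mojolicious; the route, parameter names, cache lifetime, and summary_for() helper are illustrative, not the deployed API:

```perl
use Mojolicious::Lite -signatures;

# Hypothetical lookup; the real API would query the summary table here
sub summary_for ( $dist, $since ) {
    return { dist => $dist, since => $since, results => [] };
}

get '/v1/summary/:dist' => sub ($c) {
    my $since = $c->param('since');    # optional "updated since" filter
    $c->res->headers->cache_control('public, max-age=3600');
    $c->render( json => summary_for( $c->param('dist'), $since ) );
};

app->start;
```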

Current Problems

  • Performance

    • The existing APIs do not provide an efficient way to summarize the data, resulting in the Matrix pulling a 30MB JSON file for a very well-tested distribution like Test-Simple
  • Stability

    • The current summary statistics API is a SQLite database, which has been error-prone to create, resulting in corrupted SQLite databases
      • This API is still being used by MetaCPAN, and we need a new API before we can shut this down

Completed Tasks

  • Provide a web API to retrieve summary data
    • Backend will generate and store the summary data locally
    • Create a bulk-read API for copying into other databases
      • Allow the API to accept an "updated since" parameter to return only records that have changed
    • Create a REST-style API for returning data on single distributions
      • Use HTTP Cache-Control heavily for performance
    • Rely on a caching proxy or static files for performance
    • Possible Technologies

Future Plans

  • Provide a web API to build CPAN Testers Matrix-style reports
    • #8
    • The API should provide the possible axes. The user should be able to choose 1-3 of them.
      • Possible axes
        • Distribution version
        • Perl version
        • OS name
      • In the future, we may provide other axes

Web App

  • Status: Pending

Requirements

  • A web application to show test results
  • Should allow searching for authors, distributions, and versions
  • Should show an author dashboard
  • Should allow configuring which reports are sent to the author via e-mail
  • Should allow communication between authors and testers

Current Problems

  • Performance
    • The current site is quite slow and often times out
    • Fastly caching has not mitigated the issue
    • Certain pages are slower than others
      • Showing individual reports is the slowest thing
        • I've had to ban bots from these pages, and I've firewalled bots that did not respect the robots.txt ban
  • Organization
    • The web app is spread out among 8-12 CPAN distributions
    • The relationships between the web apps are not well-documented
    • The responsibilities of the individual web apps are not well-documented

Future Plans

  • Performance
    • The existing web app is CGI
      • Profile it to see if its startup time is a problem and whether it can be fixed (a profiling sketch follows this list)
    • Possible Technologies
      • Mojolicious
  • Organization
    • The entire web app should be a single distribution
    • Possible Technologies
      • Mojolicious
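
For the profiling item above, Devel::NYTProf can show where the CGI app spends its startup time. A sketch, assuming the CGI script can be run directly from a checkout (the script name is illustrative):

```sh
# Run the CGI script once under the profiler
perl -d:NYTProf reports.cgi

# Turn the resulting nytprof.out into a browsable HTML report
nytprofhtml
```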

Metabase

  • Status: Under Discussion

The Metabase (http://metabase.cpantesters.org) is the highly-available document storage for incoming test reports.

Requirements

  • Highly-available
    • I cannot stress this enough
  • Arbitrary structured data
    • There is a lot of data about the tester's machine and setup that is contained in reports, over and above the full text of the report itself.
    • Being able to add more structured data to reports is important, as parsing data out of the full text reports is nigh-impossible

Current Problems

  • It includes its own API separate from all other CPAN Testers APIs
  • It is fairly complicated internally for what it's being used for
    • The needs for the flexibility and expansion never materialized, so now it's a bit more complex than CPAN Testers really needs
  • It is hosted on Amazon SimpleDB
    • Which charges per-query
    • And has an annoying limit, which we reach about once per year; working around it involves manually updating a bunch of configuration files
    • But, it is highly-available!
    • Except that the API in front of it is a single point of failure, yet it has been working just fine
  • There is a whole copy of Metabase in the CPAN Testers MySQL
    • Because of Amazon SimpleDB being costly to query, we keep a whole copy locally
    • We wouldn't need this copy if there were a single store we could query directly
    • Populating this copy takes time and increases the latency between incoming test reports and consumable CPAN Testers data.
    • This copy in the CPAN Testers MySQL is formatted inefficiently, and viewing individual reports from it is the single biggest resource problem on the CPAN Testers web app
      • This page occasionally results in the server going down due to overload

Future Plans

  • Integrate the Metabase API into the CPAN Testers API
    • This will reduce the amount of things we need to maintain
    • The API app can be made highly-available if needed
      • With HAProxy, even just the Metabase parts could be made such
  • Move the Metabase to a locally-hosted document store
    • Possible Technologies:
      • ElasticSearch (an indexing sketch follows this list)
        • I've hosted Elastic before. It's easy to set up, and performs well out of the box.
        • It's scalable, with new nodes easily joining the cluster and distributing data and load
        • It could be used to perform full-text searches on test reports, opening us up to more features for CPAN Testers
        • ElasticSearch has other roles as a logging database and monitoring system
      • MongoDB
        • I used MongoDB a while back, and its clustering was not as nice as ElasticSearch's
        • Besides (or even because of) that, it is easier to set up than ElasticSearch
        • Full-text searching is not a key feature
        • I should research this more, as it's been years since I last looked...
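
A minimal sketch of storing a report in ElasticSearch with the Search::Elasticsearch client; the node address, index name, and document shape are illustrative:

```perl
use strict;
use warnings;
use Search::Elasticsearch;

my $es = Search::Elasticsearch->new( nodes => ['localhost:9200'] );

# Index one incoming report, keyed by its GUID (document shape is illustrative)
$es->index(
    index => 'cpantesters-reports',
    id    => '00000000-0000-0000-0000-000000000000',
    body  => {
        distribution => 'Test-Simple',
        grade        => 'pass',
        report       => 'Full text of the report...',
    },
);
```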

Testers

  • Status: In Development

Requirements

  • A way to send in test reports from any CPAN client

    • cpan
      • CPAN::Reporter (a sample configuration follows this list)
    • cpanplus
      • Bundle::CPANPLUS::Test::Reporter?
      • Task::CPANPLUS::Metabase?
    • cpanm
      • cpanm-reporter
  • A way to set up a system to automatically run tests on CPAN modules

    • Run tests on newly-uploaded distributions
      • The path from release to report to the CPAN Testers web app should be as fast as possible
    • Run tests on all of CPAN
      • Useful to test new Perls against the existing CPAN
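
For the CPAN::Reporter path above, reporting is enabled with a small config file. A sketch of ~/.cpanreporter/config.ini, with the e-mail address and id_file path as placeholders:

```ini
# ~/.cpanreporter/config.ini (values are placeholders)
edit_report = default:no
send_report = default:yes
email_from = tester@example.com
transport = Metabase uri https://metabase.cpantesters.org/api/v1/ id_file /home/tester/.cpanreporter/metabase_id.json
```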

Current Problems

XXX

Future Plans

XXX