CPAN Testers Project Roadmap

Operations and Documentation

  • Status: In Development

Repositories

Requirements

  • Monitoring
    • Each component of CPAN Testers should be monitored for correct operation
    • If a component is failing or if a component is operating more slowly than desired, it should send an alert
      • A single test report should take less than 30 minutes to reach the web site (in the future, we should reduce that to 5 minutes)
    • The monitoring system should have a web-based GUI for seeing the status of the system
    • The server doing monitoring should be in a different data center
  • Logging
    • All the system components should log to a central place
    • Logs should be kept for at least a month
      • This will ensure that someone is able to look at a problem in the logs after it is reported
  • Statistics
    • Statistics should be generated by every part of the system
      • System stats
        • CPU usage
        • Disk usage
        • Network usage
      • Backend
        • Number of reports
        • Reports by OS
        • Reports by Perl version
      • Web app and APIs
        • Number of requests
    • Statistics should be stored and trends calculated to help with future resource planning
    • Statistics should be open to users for reporting
  • Deployment
    • Deployment should be automated
    • A QA/Staging environment should be set up
    • The deployment should be able to create a development server in a VM
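
As a rough illustration of the automated-deployment requirement above, here is a minimal Rex task sketch. The host name, package, file paths, and service name are placeholders, not the real CPAN Testers infrastructure.

```perl
# Rexfile -- minimal deployment sketch (hosts and paths are hypothetical)
use Rex -feature => ['1.4'];

user 'cpantesters';
group web => 'web1.example.com';

desc 'Deploy the CPAN Testers web application';
task 'deploy', group => 'web', sub {
    pkg 'perl', ensure => 'present';

    # Push the application configuration to the server
    file '/etc/cpantesters/app.conf',
        source => 'files/app.conf',
        owner  => 'cpantesters',
        mode   => '0644';

    service 'cpantesters-web' => 'restart';
};
```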

Current Problems

  • Monitoring

    • Various users have monitoring scripts of their own in place, but CPAN Testers does not have its own monitoring
    • CPAN Testers has a lot of moving parts that each need monitoring for stability
  • Metrics and Statistics

    • The statistics pages, such as http://stats.cpantesters.org/osmatrix-month.html, are not being updated.
    • More research and documentation on this problem is needed
      • How are statistics being generated?
      • Where are statistics being stored?
      • How are statistics reports being generated?

Future Plans

  • Monitoring

    • Possible Technologies
      • Icinga2
        • Based on / forked from Nagios, it has well-documented configuration, easy command-line tools for automation and deployment, and a nice web GUI for viewing system status
      • Grafana
        • Has a new, built-in alerting/monitoring tool with easy configuration from the Grafana front-end
        • Can use Graphite as a back-end, which is something I know how to do, and is robust and scalable
    • Opportunities
      • Build a Rex plugin to deploy the monitoring tool and automate building the monitoring configuration.
  • Statistics

    • Possible Technologies
      • Graphite (a metric-push sketch follows this list)
        • Python based statistics database
        • Easy configuration and maintenance
        • Remote access over the network
        • Performs well enough for large organizations (which we are not)
        • Works with Grafana for dashboards and graphs
        • Supports automated aggregations
        • Does not support tagging metrics for later aggregation queries
      • RRDTool
        • Fast, native round-robin databases
        • Very simple data layer
        • Will need a model layer on top to route incoming stats to the right database file
        • Will need a remote access layer on top to allow centralized statistics storage
        • RRDTool's limitations are why Graphite exists
      • OpenTSDB
        • Java/Hadoop-based time series database
        • Very fast
        • Works with Grafana for dashboards and graphs
        • Supports tagging metrics for running queries on different tags
    • Opportunities
      • Build a Rex plugin to deploy the metrics database and automate building the configuration
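
For the Graphite option above, metrics are pushed over Carbon's plaintext protocol ("metric value timestamp"). A minimal sketch, assuming a hypothetical Carbon listener at graphite.example.com:2003 and an illustrative metric name:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Carbon's plaintext protocol takes "<metric path> <value> <unix timestamp>\n"
my $sock = IO::Socket::INET->new(
    PeerAddr => 'graphite.example.com',   # hypothetical host
    PeerPort => 2003,
    Proto    => 'tcp',
) or die "Cannot connect to Carbon: $!";

printf {$sock} "cpantesters.backend.reports.processed %d %d\n", 1, time;
close $sock;
```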

Backend: Data Processing (ETL)

Requirements

The Backend must read incoming data from the Metabase and process it into useful statistics:

  • Distribution Name
  • Distribution Version
  • Perl Version
  • Perl Architecture
  • OS Name
  • OS Version
  • Status (pass/fail/unknown/na)

These statistics are made available to generate reports via APIs and the web app.

The backend should also store summary statistics containing only test status (pass/fail/unknown/na) aggregated by distribution and distribution version, for efficient consumption by other systems.
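
A minimal sketch of that extraction step, assuming the report has already been decoded from JSON; the hash keys, table, and column names are illustrative, not the real Metabase or cpanstats schema:

```perl
use strict;
use warnings;
use JSON::MaybeXS qw( decode_json );
use DBI;

# Placeholder connection details; the real backend reads these from its config
my $dbh = DBI->connect( 'dbi:mysql:cpanstats', 'user', 'password', { RaiseError => 1 } );

# Read one raw report (JSON) and pull out the fields listed above
my $report = decode_json( do { local $/; <STDIN> } );

$dbh->do(
    'INSERT INTO stats ( dist, version, perl, platform, osname, osvers, state )
     VALUES ( ?, ?, ?, ?, ?, ?, ? )',
    undef,
    @{ $report }{qw( distribution dist_version perl_version archname osname osvers grade )},
);
```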

Current Problems

  • Performance
    • Users expect to see reports very shortly after they release their distribution, or at least, very shortly after a report is delivered to the Metabase. If a report is in the Metabase, but not visible on the web app, they get frustrated.
      • For example, the Matrix app has a corresponding Fast Matrix which loads data from the Metabase log, bypassing the CPAN Testers backend

Future Plans

  • Messaging API / Job workers

    • Using a message queue to get pushed updates from the Metabase to an array of backend workers can improve performance and reliability
    • The API could also be consumed by external users to allow real-time updates on the status of their reports
    • Possible Technologies
      • Minion
        • Minion is a job queue / worker model based on Mojolicious (a minimal sketch follows this list)
        • Using a job-runner platform will increase scalability
        • Using existing tech will decrease development time
        • Integrates well with Mojolicious web framework
      • Mercury
        • Mercury is a message broker using WebSockets which supports a worker job distribution pattern
        • Written by preaction, so bias is likely
        • Using a pure-Perl messaging solution will simplify distribution and installation as opposed to ZeroMQ or nanomsg
          • ZeroMQ is installable from CPAN as long as there's a compiler available
        • Using a pure-Perl message broker will simplify distribution and installation as opposed to ActiveMQ, RabbitMQ, or Kafka
        • Mercury currently has no authentication, which we would need to ensure that only our own job workers join the worker pool
    • Progress:
      • The CPAN Testers API currently uses Mercury to do some simple notifications. This Mercury daemon could be used to notify job runners.
        • The API site must provide authentication for internal job runners to ensure only authorized backends can run jobs
  • ETL Framework

    • It would be nice if there were an existing ETL framework we could use for our processing, to reduce the amount of code we have to write ourselves.
    • Having done a lot of ETL at various workplaces, I also know this has been a strange gap on CPAN: those in-house ETL frameworks have never escaped into the wild.
    • Since our problem is basically "Copy data from this JSON blob into these fields in a SQL database", it seems like an ETL framework should be able to do this in a few lines of configuration.
    • Possible Technologies
    • Progress:
      • I've built Beam::Runner, which provides a nice framework for configuring jobs to run. It's possible we could punt on building or adopting a full ETL framework and just migrate what we have now to this format until we know more about the ETL requirements.
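
A minimal sketch of the Minion option above, using the SQLite backend and a hypothetical process_report task; the real backend would enqueue a job whenever the Metabase announces a new report:

```perl
use Mojolicious::Lite -signatures;

# Minion needs a storage backend; Minion::Backend::SQLite is the simplest to start with
plugin Minion => { SQLite => 'sqlite:minion.db' };

# Hypothetical task name; the real ETL step would run here
app->minion->add_task( process_report => sub ( $job, $report_id ) {
    $job->app->log->info("Processing report $report_id");
} );

# Enqueue from the API when a new report arrives:
#   app->minion->enqueue( process_report => [ $report_id ] );
# Then start workers with:
#   perl myapp.pl minion worker

app->start;
```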

Data APIs

Requirements

There is a wide ecosystem of applications that depend on data from CPAN Testers.

These applications need APIs to get at the CPAN Testers data in various forms, including the exact form stored in the database and a summary form, aggregated by distribution and version, that is useful for small views.

The APIs created must be stable, so API versioning is a requirement. They must also be as fast as possible, so aggressive database indexing and local caching should be used.
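
A minimal sketch of what a versioned, cacheable summary endpoint could look like in Mojolicious; the route, parameter names, cache lifetime, and summary_for() helper are illustrative, not the deployed API:

```perl
use Mojolicious::Lite -signatures;

# Hypothetical lookup; the real API would query the summary table here
sub summary_for ( $dist, $since ) {
    return { dist => $dist, since => $since, results => [] };
}

get '/v1/summary/:dist' => sub ($c) {
    my $since = $c->param('since');    # optional "updated since" filter
    $c->res->headers->cache_control('public, max-age=3600');
    $c->render( json => summary_for( $c->param('dist'), $since ) );
};

app->start;
```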

Current Problems

  • Performance

    • The existing APIs do not provide an efficient way to summarize the data, resulting in the Matrix pulling a 30MB JSON file for a very well-tested distribution like Test-Simple
  • Stability

    • The current summary statistics API is a SQLite database, which has been error-prone to create, resulting in corrupted SQLite databases
      • This API is still being used by MetaCPAN, and we need a new API before we can shut this down

Completed Tasks

  • Provide a web API to retrieve summary data
    • Backend will generate and store the summary data locally
    • Create a bulk-read API for copying into other databases
      • Allow the API to accept an "updated since" parameter to return only records that have changed
    • Create a REST-style API for returning data on single distributions
      • Use HTTP Cache-Control heavily for performance
    • Rely on a caching proxy or static files for performance
    • Possible Technologies

Future Plans

  • Provide a web API to build CPAN Testers Matrix-style reports
    • #8
    • The API should provide the possible axes. The user should be able to choose 1-3 of them.
      • Possible axes
        • Distribution version
        • Perl version
        • OS name
      • In the future, we may provide other axes

Web App

  • Status: Pending

Requirements

  • A web application to show test results
  • Should allow searching for authors, distributions, and versions
  • Should show an author dashboard
  • Should allow configuring which reports are sent to the author via e-mail
  • Should allow communication between authors and testers

Current Problems

  • Performance
    • The current site is quite slow and often times out
    • Fastly caching has not mitigated the issue
    • Certain pages are slower than others
      • Showing individual reports is the slowest thing
        • I've had to ban bots from these pages, and I've firewalled bots that did not respect the robots.txt ban
  • Organization
    • The web app is spread out among 8-12 CPAN distributions
    • The relationships between the web apps are not well-documented
    • The responsibilities of the individual web apps are not well-documented

Future Plans

  • Performance
    • The existing web app is CGI
      • Profile it to see if its startup time is a problem and whether it can be fixed (a profiling sketch follows this list)
    • Possible Technologies
      • Mojolicious
  • Organization
    • The entire web app should be a single distribution
    • Possible Technologies
      • Mojolicious
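
For the profiling item above, Devel::NYTProf can show where the CGI app spends its startup time. A sketch, assuming the CGI script can be run directly from a checkout (the script name is illustrative):

```sh
# Run the CGI script once under the profiler
perl -d:NYTProf reports.cgi

# Turn the resulting nytprof.out into a browsable HTML report
nytprofhtml
```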

Metabase

  • Status: Under Discussion

The Metabase (http://metabase.cpantesters.org) is the highly-available document storage for incoming test reports.

Requirements

  • Highly-available
    • I cannot stress this enough
  • Arbitrary structured data
    • There is a lot of data about the tester's machine and setup that is contained in reports, over and above the full text of the report itself.
    • Being able to add more structured data to reports is important, as parsing data out of the full text reports is nigh-impossible

Current Problems

  • It includes its own API separate from all other CPAN Testers APIs
  • It is fairly complicated internally for what it's being used for
    • The needs for the flexibility and expansion never materialized, so now it's a bit more complex than CPAN Testers really needs
  • It is hosted on Amazon SimpleDB
    • Which charges per-query
    • And has an annoying limit, which we reach about once per year; working around it involves manually updating a bunch of configuration files
    • But, it is highly-available!
    • Except that the API in front of it is a single point of failure, yet it has been working just fine
  • There is a whole copy of Metabase in the CPAN Testers MySQL
    • Because of Amazon SimpleDB being costly to query, we keep a whole copy locally
    • We wouldn't need this copy if there were a single store we could query directly
    • Populating this copy takes time and increases the latency between incoming test reports and consumable CPAN Testers data.
    • This copy in the CPAN Testers MySQL is formatted inefficiently, and viewing individual reports from it is the single biggest resource problem on the CPAN Testers web app
      • This page occasionally results in the server going down due to overload

Future Plans

  • Integrate the Metabase API into the CPAN Testers API
    • This will reduce the amount of things we need to maintain
    • The API app can be made highly-available if needed
      • With HAProxy, even just the Metabase parts could be made such
  • Move the Metabase to a locally-hosted document store
    • Possible Technologies:
      • ElasticSearch (an indexing sketch follows this list)
        • I've hosted Elastic before. It's easy to set up, and performs well out of the box.
        • It's scalable, with new nodes easily joining the cluster and distributing data and load
        • It could be used to perform full-text searches on test reports, opening us up to more features for CPAN Testers
        • ElasticSearch has other roles as a logging database and monitoring system
      • MongoDB
        • I used MongoDB a while back, and its clustering was not as nice as ElasticSearch's
        • Besides (or even because of) that, it is easier to set up than ElasticSearch
        • Full-text searching is not a key feature
        • I should research this more, as it's been years since I last looked...
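
A minimal sketch of storing a report in ElasticSearch with the Search::Elasticsearch client; the node address, index name, and document shape are illustrative:

```perl
use strict;
use warnings;
use Search::Elasticsearch;

my $es = Search::Elasticsearch->new( nodes => ['localhost:9200'] );

# Index one incoming report, keyed by its GUID (document shape is illustrative)
$es->index(
    index => 'cpantesters-reports',
    id    => '00000000-0000-0000-0000-000000000000',
    body  => {
        distribution => 'Test-Simple',
        grade        => 'pass',
        report       => 'Full text of the report...',
    },
);
```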

Testers

  • Status: In Development

Requirements

  • A way to send in test reports from any CPAN client

    • cpan
      • CPAN::Reporter (a sample configuration follows this list)
    • cpanplus
      • Bundle::CPANPLUS::Test::Reporter?
      • Task::CPANPLUS::Metabase?
    • cpanm
      • cpanm-reporter
  • A way to set up a system to automatically run tests on CPAN modules

    • Run tests on newly-uploaded distributions
      • The path from release to report to the CPAN Testers web app should be as fast as possible
    • Run tests on all of CPAN
      • Useful to test new Perls against the existing CPAN
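
For the CPAN::Reporter path above, reporting is enabled with a small config file. A sketch of ~/.cpanreporter/config.ini, with the e-mail address and id_file path as placeholders:

```ini
# ~/.cpanreporter/config.ini (values are placeholders)
edit_report = default:no
send_report = default:yes
email_from = tester@example.com
transport = Metabase uri https://metabase.cpantesters.org/api/v1/ id_file /home/tester/.cpanreporter/metabase_id.json
```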

Current Problems

XXX

Future Plans

XXX