- Status: In Development
- Repositories:
- cpantesters-project
- Documenting the project plan and ongoing issues
- cpantesters-deploy
- Documenting the current application and deployment process
- Monitoring
- Each component of CPAN Testers should be monitored for correct operation
- If a component is failing or operating more slowly than desired, it should send an alert
- A single test report should take less than 30 minutes to reach the web site (in the future, we should reduce that to 5 minutes); a latency-check sketch follows this list
- The monitoring system should have a web-based GUI for seeing the status of the system
- The server doing monitoring should be in a different data center
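- For illustration, a minimal sketch of such a latency check, assuming a hypothetical /v1/report/latest endpoint that returns the newest report's epoch timestamp:

      use strict;
      use warnings;
      use Mojo::UserAgent;

      # Fetch the newest report's timestamp (the endpoint and field name
      # are assumptions) and alert if it is older than 30 minutes
      my $ua  = Mojo::UserAgent->new;
      my $res = $ua->get( 'http://api.cpantesters.org/v1/report/latest' )->res;
      my $age = time - $res->json->{timestamp};
      warn "ALERT: newest report is ${age}s old\n" if $age > 30 * 60;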
- Logging
- All the system components should log to a central place
- Logs should be kept for at least a month
- This ensures someone can still investigate a problem in the logs after it is reported
- Statistics
- Statistics should be generated by every part of the system
- System stats
- CPU usage
- Disk usage
- Network usage
- Backend
- Number of reports
- Reports by OS
- Reports by Perl version
- Web app and APIs
- Number of requests
- System stats
- Statistics should be stored and trends calculated to help plan future resource needs
- Statistics should be open to users for reporting
- Deployment
- Deployment should be automated
- A QA/Staging environment should be set up
- The deployment should be able to create a development server in a VM
- Monitoring
- Various users have monitoring scripts of their own in place, but CPAN Testers does not have its own monitoring
- CPAN Testers has a lot of moving parts that each need monitoring for stability
- Metrics and Statistics
- The statistics pages on http://stats.cpantesters.org/osmatrix-month.html are not being updated.
- More research and documentation on this problem is needed
- How are statistics being generated?
- Where are statistics being stored?
- How are statistics reports being generated?
- Monitoring
- Possible Technologies
- Icinga2
- Based on / forked from Nagios, it has a well-documented configuration format, easy command-line tools for automation and deployment, and a nice web GUI application for viewing system status
- Grafana
- Has a new, built-in alerting/monitoring tool with easy configuration from the Grafana front-end
- Can use Graphite as a back-end, which is something I know how to do, and is robust and scalable
- Opportunities
- Build a Rex plugin to deploy the monitoring tool and automate building the monitoring configuration (a sketch follows this list)
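- A minimal sketch of what such a Rex task might look like; the host, package name, and file paths are assumptions:

      use Rex -feature => ['1.0'];

      group 'monitor' => 'monitor.cpantesters.org';   # hypothetical host

      desc 'Install and configure the monitoring daemon';
      task 'deploy_monitoring', group => 'monitor', sub {
          pkg 'icinga2', ensure => 'present';
          file '/etc/icinga2/conf.d/cpantesters.conf',
              source    => 'files/icinga2/cpantesters.conf',
              on_change => sub { service icinga2 => 'restart' };
          service 'icinga2', ensure => 'started';
      };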
- Statistics
- Possible Technologies
- Graphite
- Python-based statistics database (a usage sketch follows this list)
- Easy configuration and maintenance
- Remote access over the network
- Performs well enough for large organizations (which we are not)
- Works with Grafana for dashboards and graphs
- Supports automated aggregations
- Does not support tagging metrics for later aggregation queries
- RRDTool
- Fast, native round-robin databases
- Very simple data layer
- Will need a model layer on top to route incoming stats to the right database file
- Will need a remote access layer on top to allow centralized statistics storage
- RRDTool's limitations are why Graphite exists
- OpenTSDB
- Java/Hadoop-based time series database
- Very fast
- Works with Grafana for dashboards and graphs
- Supports tagging metrics for running queries on different tags
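- A minimal sketch of feeding a counter to Graphite over its plaintext protocol (TCP port 2003 by default); the host and metric names are assumptions:

      use strict;
      use warnings;
      use IO::Socket::INET;

      # Graphite's plaintext protocol accepts "metric value timestamp\n"
      my $sock = IO::Socket::INET->new(
          PeerAddr => 'metrics.cpantesters.org',   # hypothetical host
          PeerPort => 2003,
          Proto    => 'tcp',
      ) or die "Cannot connect to Graphite: $!";

      my $count = 42;   # e.g. reports processed in the last interval
      $sock->print( sprintf "cpantesters.backend.reports.count %d %d\n",
          $count, time );
      $sock->close;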
- Opportunities
- Build a Rex plugin to deploy the metrics database and automate building the configuration
- Status: Planning
- Repositories:
The Backend must read incoming data from the Metabase and process it into useful statistics:
- Distribution Name
- Distribution Version
- Perl Version
- Perl Architecture
- OS Name
- OS Version
- Status (pass/fail/unknown/na)
These statistics are made available to generate reports via APIs and the web app.
The backend should also store summary statistics containing only test status (pass/fail/unknown/na) aggregated by distribution and distribution version, for efficient consumption by other systems.
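For illustration, the summary aggregation might look like the following DBI sketch; the table and column names are assumptions, not the actual schema:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical `report` table with one row per test report;
    # credentials are placeholders
    my $dbh = DBI->connect( 'dbi:mysql:database=cpantesters',
        'user', 'password', { RaiseError => 1 } );

    # Aggregate pass/fail/unknown/na counts by distribution and version
    # (SUM over a boolean comparison is MySQL-specific)
    my $sth = $dbh->prepare( q{
        SELECT dist, dist_version,
               SUM(grade = 'pass')    AS pass,
               SUM(grade = 'fail')    AS fail,
               SUM(grade = 'unknown') AS unknown,
               SUM(grade = 'na')      AS na
          FROM report
         GROUP BY dist, dist_version
    } );
    $sth->execute;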
- Performance
- Users expect to see reports very shortly after they release their
distribution, or at least, very shortly after a report is
delivered to the Metabase. If a report is in the Metabase, but not
visible on the web app, they get frustrated.
- For example, the Matrix app has a corresponding Fast Matrix which loads data from the Metabase log, bypassing the CPAN Testers backend
- Messaging API / Job workers
- Using a message queue to get pushed updates from the Metabase to an array of backend workers can improve performance and reliability
- The API could also be consumed by external users to allow real-time updates on the status of their reports
- Possible Technologies
- Minion
- Minion is a job queue / worker model based on Mojolicious (a task sketch follows this list)
- Using a job-runner platform will increase scalability
- Using existing tech will decrease development time
- Integrates well with Mojolicious web framework
- Mercury
- Mercury is a message broker using WebSockets which supports a worker job distribution pattern
- Written by preaction, so bias is likely
- Using a pure-Perl messaging solution will simplify
distribution and installation as opposed to ZeroMQ or
nanomsg
- ZeroMQ is installable from CPAN as long as there's a compiler available
- Using a pure-Perl message broker will simplify distribution and installation as opposed to ActiveMQ, RabbitMQ, or Kafka
- Mercury currently has no authentication, which would be necessary to ensure only our own job workers can join the worker pool
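- A minimal sketch of a Minion task for processing reports, assuming the SQLite backend (Minion::Backend::SQLite, a separate distribution) and a hypothetical task name:

      use Mojolicious::Lite;

      plugin Minion => { SQLite => 'sqlite:minion.db' };

      # Hypothetical task: process one incoming report by its GUID
      app->minion->add_task( process_report => sub {
          my ( $job, $guid ) = @_;
          # ... fetch the report and update the summary tables ...
          $job->finish;
      } );

      # When a report arrives, the API would enqueue a job:
      # app->minion->enqueue( process_report => [ $guid ] );

      app->start;

- Workers would then be started with Minion's built-in command: ./myapp.pl minion worker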
- Progress:
- The CPAN Testers API currently uses Mercury to do some simple notifications. This Mercury daemon could be used to notify job runners (a subscriber sketch follows this list).
- The API site must provide authentication for internal job runners to ensure only authorized backends can run jobs
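- For illustration, a sketch of a job runner subscribing to notifications over a WebSocket, assuming a Mercury broker on localhost and a hypothetical report topic:

      use Mojo::UserAgent;
      use Mojo::IOLoop;

      my $ua = Mojo::UserAgent->new;
      $ua->inactivity_timeout(0);   # keep the connection open
      # The /sub/ endpoint path and topic name are assumptions
      $ua->websocket( 'ws://localhost:3000/sub/report' => sub {
          my ( $ua, $tx ) = @_;
          die 'WebSocket handshake failed' unless $tx->is_websocket;
          $tx->on( message => sub {
              my ( $tx, $msg ) = @_;
              # ... enqueue or process the report notification ...
          } );
      } );
      Mojo::IOLoop->start unless Mojo::IOLoop->is_running;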
- ETL Framework
- It would be nice if there was an existing ETL framework we could use for our processing to reduce the amount of code we have to write ourselves.
- Having done a lot of ETL at various workplaces, I also know this has been a strange gap on CPAN: those in-house ETL frameworks have never escaped into the wild.
- Since our problem is basically "Copy data from this JSON blob into these fields in a SQL database", it seems like an ETL framework should be able to do this in a few lines of configuration.
- Possible Technologies
- Catmandu http://metacpan.org/pod/Catmandu
- Mature ETL framework with command-line utilities and a Perl API (a sketch follows this list)
- Supports a simple transformation language
- Supports MongoDB, ElasticSearch, and DBI
- Logging with Log::Any
- Good documentation at http://librecat.org/Catmandu/
- Data::Tubes http://metacpan.org/pod/Data::Tubes
- Pretty new, author warns some option names may change
- Pure Perl. Easy to understand for Perl developers
- Really easy to get started and add new functionality to
- Some questionable API parts that may lead to strange interactions
- tube() in scalar context may return a single item or an arrayref if there are multiple items, with no way to know which is happening
- Package names in a tube are subject to some confusing rules regarding what they resolve to: https://metacpan.org/pod/distribution/Data-Tubes/lib/Data/Tubes/Util.pod#resolve_module
- ETL::Yertl
- Yertl is an ETL framework written in Perl, designed to build ETL jobs in shell.
- Written by preaction, so bias is likely
- Yertl would need quite a bit of work before it could be
used for CPAN Testers
- A Perl-based API for use inside a Minion task
- A document-storage database API which can be hooked into the Metabase (Amazon SimpleDB)
- ETL::Pipeline
- ETL::Pipeline is a simple ETL for mapping input data to an output
- Seems to be missing the SQL output referred to in the documentation https://rt.cpan.org/Ticket/Display.html?id=117475
- No public repository for collaboration https://rt.cpan.org/Ticket/Display.html?id=117473
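- A sketch of the "JSON blob to SQL fields" job in Catmandu; the file name, fix rules, and table name are assumptions (Catmandu::Store::DBI is a separate distribution):

      use Catmandu;
      use Catmandu::Fix;

      my $importer = Catmandu->importer( 'JSON', file => 'reports.json' );
      my $fixer    = Catmandu::Fix->new( fixes => [
          'copy_field(distribution, dist)',
          'copy_field(result.grade, grade)',
      ] );
      my $store = Catmandu->store( 'DBI',
          data_source => 'dbi:SQLite:cpantesters.db',
      );

      # Fix each record, then add the whole stream to the `report` bag
      $store->bag('report')->add_many( $fixer->fix( $importer ) );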
- Progress:
- I've built Beam::Runner which provides a nice framework for configuring jobs to run. It's possible we could punt on building/using a full ETL framework and just migrate what we have now to this format until we know more about the ETL's requirements.
- Status: In Development
- Repository: cpantesters-api
There is a wide ecosystem of applications that depend on data from CPAN Testers.
- MetaCPAN displays a summary of CPAN Testers data in a release's infobox
- Matrix displays an alternate view of CPAN Testers data (even bypassing CPAN Testers when needed: http://fast-matrix.cpantesters.org)
- Analysis (CPAN::Testers::ParseReport) analyzes the data to perform statistical analysis to find possible reasons behind new failures
- TuX's CPAN Dashboard
- CPAN::Dashboard
- cpXXXan, which uses CPAN Testers results to build CPAN indexes containing the last known good release of each distribution for your Perl and OS
These applications need APIs to get at the CPAN Testers data in various forms, including the exact form in the database, and a summary form aggregated by distribution+version useful for small views.
The APIs created must be stable, so versioned APIs are a requirement. They must also be as fast as possible, so aggressive database indexing and local caching should be used.
- Performance
- The existing APIs do not provide an efficient way to summarize the data, resulting in the Matrix pulling a 30MB JSON file for a very well-tested distribution like Test-Simple
- Stability
- The current summary statistics API is a SQLite database, which has
been error-prone to create, resulting in corrupted SQLite
databases
- This API is still being used by MetaCPAN, and we need a new API before we can shut this down
- Provide a web API to retrieve summary data
- Backend will generate and store the summary data locally
- Create a bulk-read API for copying into other databases
- Allow the API to accept an "updated since" to return only records that have changed
- Create a REST-style API for returning data on single distributions (a sketch follows the technology list below)
- Use HTTP Cache-Control heavily for performance
- Rely on a caching proxy or static files for performance
- Possible Technologies
- Mojolicious
- Pure-perl, high performance, asynchronous web framework
- OpenAPI
- A specification format for JSON APIs which helps document and validate them
- Mojolicious::Plugin::OpenAPI
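- A minimal Mojolicious::Lite sketch of such an endpoint, with an "updated since" filter and Cache-Control; the route, parameter, and lookup_summary() helper are hypothetical:

      use Mojolicious::Lite;

      get '/v1/summary/:dist' => sub {
          my $c     = shift;
          my $since = $c->param('since');   # optional "updated since" filter
          my $rows  = lookup_summary( $c->stash('dist'), $since );
          # Let caching proxies (Fastly) hold the response
          $c->res->headers->cache_control('public, max-age=300');
          $c->render( json => $rows );
      };

      # Stub standing in for the real summary lookup
      sub lookup_summary { return [] }

      app->start;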
- Provide a web API to build CPAN Testers Matrix-style reports
- #8
- API should provide possible axes. The user should be able to choose 1-3 axes (a hypothetical request/response sketch follows this list).
- Possible axes
- Distribution version
- Perl version
- OS name
- In the future, we may provide other axes
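- A hypothetical request/response shape for such an endpoint; all names here are invented for illustration:

      GET /v1/matrix/Test-Simple?rows=perl_version&columns=os_name

      {
          "rows"    : [ "5.24.0", "5.22.1" ],
          "columns" : [ "linux", "freebsd" ],
          "cells"   : {
              "5.24.0" : { "linux" : { "pass" : 120, "fail" : 2 } }
          }
      }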
- Status: Pending
- A web application to show test results
- Should allow searching for authors, distributions, and versions
- Should show an author dashboard
- Should allow configuring of reports being sent to the author via e-mail
- Should allow communication between authors and testers
- Performance
- The current site is quite slow and often times out
- Fastly caching has not mitigated the issue
- Certain pages are slower than others
- Showing individual reports is the slowest thing
- I've had to ban bots from these pages, and firewalled bots that did not respect the robots.txt ban
- Organization
- The web app is spread out among 8-12 CPAN distributions
- The relationships between the web apps are not well-documented
- The responsibilities of the individual web apps are not well-documented
- Performance
- The existing web app is CGI
- Profile it to see if its startup time is a problem and see if it can be fixed
- Possible Technologies
- Mojolicious
- Organization
- The entire web app should be a single distribution
- Possible Technologies
- Mojolicious
- Status: Under Discussion
The Metabase (http://metabase.cpantesters.org) is the highly-available document storage for incoming test reports.
- Highly-available
- I cannot stress this enough
- Arbitrary structured data
- There is a lot of data about the tester's machine and setup that is contained in reports, over and above the full text of the report itself.
- Being able to add more structured data to reports is important, as parsing data out of the full text reports is nigh-impossible
- It includes its own API separate from all other CPAN Testers APIs
- It is fairly complicated internally for what it's being used for
- The needs for the flexibility and expansion never materialized, so now it's a bit more complex than CPAN Testers really needs
- It is hosted on Amazon SimpleDB
- Which charges per-query
- And has an annoying limit which we reach about once per year which involves manually updating a bunch of configuration files
- But, it is highly-available!
- Except, the API in front of it is a single point of failure, yet has been working just fine
- There is a whole copy of Metabase in the CPAN Testers MySQL
- Because of Amazon SimpleDB being costly to query, we keep a whole copy locally
- We wouldn't need this copy if we had a single store we could query directly
- Populating this copy takes time and increases the latency between incoming test reports and consumable CPAN Testers data.
- This whole copy in the CPAN Testers MySQL is formatted
inefficiently and viewing individual reports is the single biggest
resource problem on the CPAN Testers web app
- This page occasionally results in the server going down due to overload
- Integrate the Metabase API into the CPAN Testers
API
- This will reduce the amount of things we need to maintain
- The API app can be made highly-available if needed
- With HAProxy, even just the Metabase parts could be made highly-available
- Move the Metabase to a locally-hosted document store
- Possible Technologies:
- ElasticSearch
- I've hosted Elastic before. It's easy to set up, and performs well out of the box (an indexing sketch follows this list).
- It's scalable, with new nodes easily joining the cluster and distributing data and load
- It could be used to perform full-text searches on test reports, opening us up to more features for CPAN Testers
- ElasticSearch could also fill other roles, such as a logging database and monitoring backend
- MongoDB
- I used MongoDB a while back, and clustering was not as nice as ElasticSearch
- Besides (or even because of) that, it is easier to set up than ElasticSearch
- Full-text searching is not a key feature
- I should research this more, as it's been years since I last looked...
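- A sketch of indexing a report with the Search::Elasticsearch client; the index, type, and field names are assumptions, not the Metabase schema:

      use Search::Elasticsearch;

      my $es = Search::Elasticsearch->new( nodes => ['localhost:9200'] );

      # Index one report as a JSON document, keyed by its GUID
      $es->index(
          index => 'cpantesters',
          type  => 'report',
          id    => 'example-report-guid',   # hypothetical GUID
          body  => {
              dist    => 'Test-Simple',
              version => '1.302056',
              grade   => 'pass',
              report  => 'Full text of the report, for full-text search',
          },
      );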
- Status: In Development
- A way to send in test reports from any CPAN client
- cpan
- CPAN::Reporter
- cpanplus
- Bundle::CPANPLUS::Test::Reporter?
- Task::CPANPLUS::Metabase?
- cpanm
- cpanm-reporter
- A way to set up a system to automatically run tests on CPAN modules
- Run tests on newly-uploaded distributions
- From release to report to CPAN Testers webapp should be as fast as possible
- Run tests on all of CPAN
- Useful to test new Perls against the existing CPAN