SunshineLibrary/statlysis

Statlysis

Statistical Analysis in Ruby DSL

Usage

setup

Statlysis.setup do
  set_database :statlysis

  daily CodeGist
  hourly EoeLog, :time_column => :t # support custom time_column

  [EoeLog,
   EoeLog.where(:ui => 0), # support query scope
   EoeLog.where(:ui => {"$ne" => 0}),
   Mongoid[/eoe_logs_[0-9]+$/].where(:ui => {"$ne" => 0}), # support collection name regexp
   EoeLog.where(:do => {"$in" => [DOMAINS_HASH[:blog], DOMAINS_HASH[:my]]}),
  ].each do |s|
    daily s, :time_column => :t
  end
end

access

Statlysis.daily # => returns the daily crons
Statlysis.daily.run # => runs the daily crons
Statlysis.daily[/name_regexp/] # => returns the daily crons whose names match

process

[23] pry(#<Statlysis::Configuration>)> Statlysis.daily['multi'].first

Features

  • Supports time columns that are stored as integers.
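
To illustrate what an integer time column implies for querying, here is a minimal sketch in plain Ruby. It assumes the column (`:t`, as in the setup example) holds Unix timestamps in seconds; the helper `hour_range_as_integers` and the sample rows are hypothetical and are not part of Statlysis.

```ruby
# Hypothetical helper: the half-open range of Unix timestamps [start, start + 3600)
# covering the UTC hour that contains `time`.
def hour_range_as_integers(time)
  utc = time.getutc
  start = Time.utc(utc.year, utc.month, utc.day, utc.hour).to_i
  (start...(start + 3600))
end

# Hypothetical log rows whose :t column stores integer Unix timestamps.
logs = [
  { :t => Time.utc(2013, 1, 1, 10, 5).to_i },
  { :t => Time.utc(2013, 1, 1, 10, 59).to_i },
  { :t => Time.utc(2013, 1, 1, 11, 0).to_i },
]

# Count the rows that fall inside the 10:00 hour by comparing integers directly.
range = hour_range_as_integers(Time.utc(2013, 1, 1, 10, 30))
count = logs.count { |row| range.cover?(row[:t]) }
```

Because both the bounds and the stored values are integers, the comparison needs no per-row conversion back to `Time`.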

TODO

  • Admin interface
  • Statistical query API in Ruby and over HTTP
  • Integration with JavaScript charting libraries, e.g. Highcharts, D3.

Statistical Process

  1. Delete invalid statistical data, e.g. data dated tomorrow
  2. Count the data within the specified time range, grouped by the dimensions
  3. Delete overlapping data, and insert the new data
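
The three steps above can be sketched in plain Ruby over in-memory arrays. This is only an illustration under stated assumptions, not the library's implementation: `run_daily_stat`, `day_start`, and the `:t` / `:c` keys are hypothetical, timestamps are integer Unix seconds, and time is the only dimension.

```ruby
DAY = 86_400 # seconds in one day

# Hypothetical helper: truncate an integer timestamp to UTC midnight.
def day_start(ts)
  ts - (ts % DAY)
end

# logs:  source rows,      e.g. { :t => unix_seconds }
# stats: existing results, e.g. { :t => day_start_seconds, :c => count }
def run_daily_stat(logs, stats, now, days)
  today = day_start(now)

  # 1. Delete invalid statistical data, e.g. rows dated after today.
  stats = stats.reject { |row| row[:t] > today }

  # 2. Count the source rows for each day in the requested window.
  window = ((today - (days - 1) * DAY)..today)
  fresh = window.step(DAY).map do |t|
    { :t => t, :c => logs.count { |log| day_start(log[:t]) == t } }
  end

  # 3. Delete the rows that overlap the window, then insert the new counts.
  stats = stats.reject { |row| window.cover?(row[:t]) }
  (stats + fresh).sort_by { |row| row[:t] }
end

now = Time.utc(2013, 1, 3, 12).to_i
logs = [Time.utc(2013, 1, 2, 1), Time.utc(2013, 1, 2, 5), Time.utc(2013, 1, 3, 9)]
logs = logs.map { |t| { :t => t.to_i } }
stats = [
  { :t => Time.utc(2013, 1, 1).to_i, :c => 7 },  # outside the window, kept
  { :t => Time.utc(2013, 1, 2).to_i, :c => 99 }, # stale, replaced in step 3
  { :t => Time.utc(2013, 1, 4).to_i, :c => 5 },  # dated tomorrow, removed in step 1
]
result = run_daily_stat(logs, stats, now, 2)
```

Deleting the overlapping rows before inserting makes a rerun over the same window idempotent: the counts are replaced rather than duplicated.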

FAQ

Q: Why use Sequel instead of ActiveRecord?

A: When initializing an ORM object, ActiveRecord is 3 times slower than Sequel, and we only need the basic operations, including read, write, enumerate, etc. See more details in Quick dive into Ruby ORM object initialization.

Q: Why do you recommend using multiple collections to store logs rather than a single collection, or a capped collection?

A: MongoDB can effectively reuse the space freed by removing entire collections, without leading to data fragmentation. See details at http://docs.mongodb.org/manual/use-cases/storing-log-data/#multiple-collections-single-database
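
As a rough sketch of how the multiple-collection layout fits with the collection name regexp from the setup example, the following plain Ruby picks out the log collections and separates the expired ones. The collection names, the year-month suffix format, and the cutoff value are assumptions made for illustration; no MongoDB driver calls are shown.

```ruby
# Hypothetical collection names; the numeric suffix is assumed to be a year-month.
collection_names = %w[
  eoe_logs_201211 eoe_logs_201212 eoe_logs_201301 eoe_logs_archive users
]

# The same pattern as in the setup example: only names ending in digits match.
pattern = /eoe_logs_[0-9]+$/
log_collections = collection_names.grep(pattern)

# Collections older than the cutoff can be dropped whole, instead of deleting
# individual documents from one large collection.
cutoff = "201212"
expired, kept = log_collections.partition { |name| name[/[0-9]+$/] < cutoff }
```

Here `expired` would hold the names to drop as whole collections, and `kept` the ones that remain available for statistics.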

Q: In MongoDB, why use MapReduce instead of Aggregation?

A: The result of an aggregation pipeline is a single document, so it is subject to the BSON document size limit, which is currently 16 megabytes. See more details at http://docs.mongodb.org/manual/core/aggregation-pipeline/#pipeline

Copyright

MIT. David Chen at eoe.cn, sunshine-library.

Related

Projects

Articles

Event collector

Admin interface

  • http://three.kibana.org/ A browser-based analytics and search interface to Logstash and other timestamped data sets stored in ElasticSearch.

ETL
