- BeamSearch now correctly uses
beam_width
parameter
Some internal changes to the continuous integration pipeline on top of version 0.7.6.
Moved to pyscaffold, src/test structure and GitHub Actions.
- you can now additionally provide constraints to SubgroupDiscovery
- MinSupportConstraint added
- you can now run the slow tests py passing
--runslow
to pytest Conjunction
,Disjunction
and Selectors now all have the public property.selectors
that provides all basic selectors involved
- support for weights has been removed, it will probably be added in the future as seperate targets and Quality functions.
create_numeric_selector_for_attribute
has been renamed tocreate_numeric_selectors_for_attribute
(inserting ans
) This brings it in lign with the corresponding name shema for nominal.
- statistics are now also store along with score and description
- The function
ps.get_cover_array_and_size
was added, it allows for a consistent way to acces a cover array (a.k.a. sth to be thrown into a dataframe or a numpy array) - algorithm tests now also call the
to_subgroups
andto_dataframe
methods to check they work with that algorithm - the order of
calculate_statistics
andget_base_statistics
are now in lign with that of quality functions (first subgroup then data) - the size of a subgroup specified in a statistics object is now called
size_sg
uniformly. This avoids confusion with thesize
attribute of numpy arrays etc.
This update prepares pysubgroup for a better future. To do so we had to break backwards compatibility. Many of the classes that you know and love have been renamed so as to make their purpose more clear.
- SubgroupDescription is now called Conjunction
- NominalTarget is now called BinaryTarget
- algorithms now return a SubgroupDiscoveryResult object
- the structure of quality functions changed (see documentation for more info)
- pysubgroup now has a bunch of tests
- some algorithms and quality functions support numba for just in time compilation
- ModelTarget
- gp-growth
- 3 types of Representations (bitset, set, numpy-set)
- Refinement operator
- Disjunction
- New algorithms
- Apriori now has the option to disable numba using the use_numba flag
- SimpleSrach now has a progressbar (enabled via the show_progress=True flag)
- The number of quality function evaluations can now be tracked using the CountCallsInterestingMeasure as a wrapper
- StandardQfNumeric now offers three different options to calculate the optimistic estimate
- 'sum' (default) sums the values larger then the dataset mean (cf. Lemmerich 2014 p. 81 top)
- 'average' uses the maximum target values as estimate (cf. Lemmerich 2014 p. 82 center)
- 'order' uses ordering based bounds (cf. Lemmerich 2014 p. 89 bottom)
- Apriori now calculates the constant statistics before using representation
- DFS now properly works with any quality function
- Apriori now reuses the compiled numba function
- Nominal target now uses subgroup.size to access the size of a subgroup representation
- StaticSpecializationOperator now avoids checking refinements of the same attribute
- test_algorithms_numeric now checks more algorithms
- SubgroupDescription has been replaced with Conjunction
- Selector .covers function returns a numpy array instead of a pandas Series (speedup on dense data)
- Conjunction .selectors is renamed to Conjunction._selectors
- quality functions have a different interface
- calculate_constant_statistics(self, task) caches necessary precomputation
- calculate_statistics(self, subgroup, data=None) returns a namedtuple with necessary statistics
- evaluate(self, subgroup, statistics=None) computes quality from provided statistics
- optimistic_estimate(self, subgroup, statistics=None) computes optimistic estimate from provided statistics
- Conjunction (replaces SubgroupDescription)
- Disjunction
- DNF (Disjunctive Normal Form)
- representations (given a dataset selectors are queried only once)
- BitsetRepresentation
- SetRepresentation
- NumpySetRepresentation
- SimpleSearch algorithm
- DFS (Depth first search) using a representation for StandardQF
- tests
- access to datasets for testing is provided through DataSets class
- tests for selector classes (NominalSelector, NumericSelector)
- __eq__
- __lt__
- __hash__ similarity
- uniqueness of selectors
- cover function for NominalSelector
- tests for Conjunction, Disjuntion
- __eq__
- __lt__
- __hash__ similarity
- cover
- tests for algorithms with nominal target concept on the creditg dataset (StandardQF(1) + NominalSearchSpace, StandardQF(1)+Nominal&Numeric Searchspace, StandardQF(0.5)+Nominal&Numeric Searchspace)
- Apriori
- SimpleDFS
- BeamSearch
- DFS_bitset
- DFS_set
- DFS_numpy_sets
- SimpleSearch
- tests for algorithms with numeric target concept (StandardQFNumeric)
- Apriori
- SimpleDFS
- DFSNumeric
- tests for algorithm with fi target (CountQF)
- Apriori
- DFS
- tests for algorithms to find the best Disjunctions
- Apriori
- Generalising BFS
- Apriori algorithm now runs significantly faster due to precomputing and usage of list comprehension