Commit

Merge pull request #25 from fidelity/default-span-constraint-on-index
Built-in span constraint
takojunior authored Jul 5, 2022
2 parents 993d62c + ac3b5a7 commit 0a0fc60
Showing 15 changed files with 2,202 additions and 2,126 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.txt
@@ -2,6 +2,15 @@
CHANGELOG
=========

-------------------------------------------------------------------------------
June, 28, 2022 1.3.3
-------------------------------------------------------------------------------

Minor:
- Add a default index attribute and a default maximum span constraint
- Fix Check_gap() in build_mdd.cpp to use the original gap value instead of the absolute gap value
- Update unit tests where the default constraint may change results

-------------------------------------------------------------------------------
May, 20, 2022 1.3.2
-------------------------------------------------------------------------------
24 changes: 20 additions & 4 deletions docs/_modules/sequential/seq2pat.html
@@ -70,13 +70,13 @@ <h1>Source code for sequential.seq2pat</h1><div class="highlight"><pre>
<span class="c1"># SPDX-License-Identifier: GPL-2.0</span>

<span class="kn">import</span> <span class="nn">gc</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">NamedTuple</span><span class="p">,</span> <span class="n">List</span><span class="p">,</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">NoReturn</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">NamedTuple</span><span class="p">,</span> <span class="n">List</span><span class="p">,</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">NoReturn</span><span class="p">,</span> <span class="n">Optional</span>

<span class="kn">from</span> <span class="nn">sequential.backend</span> <span class="kn">import</span> <span class="n">seq_to_pat</span> <span class="k">as</span> <span class="n">stp</span>
<span class="kn">from</span> <span class="nn">sequential.utils</span> <span class="kn">import</span> <span class="n">Num</span><span class="p">,</span> <span class="n">check_true</span><span class="p">,</span> <span class="n">get_max_column_size</span><span class="p">,</span> \
<span class="n">get_min_value</span><span class="p">,</span> <span class="n">get_max_value</span><span class="p">,</span> <span class="n">sort_pattern</span><span class="p">,</span> <span class="n">item_map</span><span class="p">,</span> \
<span class="n">string_to_int</span><span class="p">,</span> <span class="n">int_to_string</span><span class="p">,</span> <span class="n">check_sequence_feature_same_length</span><span class="p">,</span> \
<span class="n">validate_attribute_values</span><span class="p">,</span> <span class="n">validate_sequences</span>
<span class="n">validate_attribute_values</span><span class="p">,</span> <span class="n">validate_sequences</span><span class="p">,</span> <span class="n">validate_max_span</span>


<span class="c1"># IMPORTANT: Constant values should not be changed</span>
@@ -345,11 +345,18 @@ <h1>Source code for sequential.seq2pat</h1><div class="highlight"><pre>
<span class="sd"> sequences : List[list]</span>
<span class="sd"> A list of sequences each with a list of events.</span>
<span class="sd"> The event values can be all strings or all integers.</span>
<span class="sd"> max_span : Optional[int]</span>
<span class="sd"> The maximum span, in number of items, enforced by a built-in span constraint during mining; max_span=10 by</span>
<span class="sd"> default (10 items). This keeps regular users from running into scaling issues when the data contains long</span>
<span class="sd"> sequences and no other constraints are set to keep mining efficient and practical.</span>
<span class="sd"> Power users can drop this constraint by setting it to None, or increase the maximum span</span>
<span class="sd"> as far as system resources allow.</span>
<span class="sd"> &quot;&quot;&quot;</span>

<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sequences</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">list</span><span class="p">]):</span>
<span class="c1"># Validate input sequences</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sequences</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">list</span><span class="p">],</span> <span class="n">max_span</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="mi">10</span><span class="p">):</span>
<span class="c1"># Validate input</span>
<span class="n">validate_sequences</span><span class="p">(</span><span class="n">sequences</span><span class="p">)</span>
<span class="n">validate_max_span</span><span class="p">(</span><span class="n">max_span</span><span class="p">)</span>

<span class="c1"># Input sequences</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_sequences</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">list</span><span class="p">]</span> <span class="o">=</span> <span class="n">sequences</span>
@@ -373,6 +380,15 @@ <h1>Source code for sequential.seq2pat</h1><div class="highlight"><pre>
<span class="c1"># Cython implementor object</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_cython_imp</span> <span class="o">=</span> <span class="kc">None</span>

<span class="k">if</span> <span class="n">max_span</span><span class="p">:</span>
<span class="c1"># Create index attribute</span>
<span class="n">index_attr</span> <span class="o">=</span> <span class="n">Attribute</span><span class="p">([[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">seq</span><span class="p">))]</span> <span class="k">for</span> <span class="n">seq</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">])</span>

<span class="c1"># Add built-in maximum span constraint on index.</span>
<span class="c1"># The minimum span is at least 1 between two indices. Here we add it explicitly.</span>
<span class="c1"># Given max_span items, the maximum difference on the index is (max_span - 1)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">add_constraint</span><span class="p">(</span><span class="mi">1</span> <span class="o">&lt;=</span> <span class="n">index_attr</span><span class="o">.</span><span class="n">span</span><span class="p">()</span> <span class="o">&lt;=</span> <span class="p">(</span><span class="n">max_span</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>

<span class="nd">@property</span>
<span class="k">def</span> <span class="nf">sequences</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">list</span><span class="p">]:</span>
<span class="sd">&quot;&quot;&quot;Sequence</span>
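
The constructor change in this hunk can be sketched in plain Python, without the library, to show what the index attribute holds and which bounds the built-in constraint uses. The helper names `build_index_attribute` and `index_span_bounds` are assumptions made for this sketch; the diff itself builds an `Attribute` and calls `add_constraint` directly.

```python
from typing import List, Optional, Tuple


def build_index_attribute(sequences: List[list]) -> List[List[int]]:
    """Give every event its 0-based position within its own sequence."""
    return [list(range(len(seq))) for seq in sequences]


def index_span_bounds(max_span: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return (lower, upper) bounds on the index span, or None if disabled.

    A pattern covering max_span items has an index difference of at most
    max_span - 1 between its first and last event.
    """
    if not max_span:
        return None
    return 1, max_span - 1


sequences = [["A", "B", "C"], ["A", "C"]]
index_values = build_index_attribute(sequences)   # [[0, 1, 2], [0, 1]]
default_bounds = index_span_bounds(10)            # (1, 9)
disabled_bounds = index_span_bounds(None)         # None
```

With the default of 10 items, the span of the index is bounded by 9, which matches the `(max_span - 1)` upper bound in the diff.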
8 changes: 7 additions & 1 deletion docs/api.html
@@ -112,13 +112,19 @@

<dl class="py class">
<dt class="sig sig-object py" id="sequential.seq2pat.Seq2Pat">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">sequential.seq2pat.</span></span><span class="sig-name descname"><span class="pre">Seq2Pat</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">sequences</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">List</span><span class="p"><span class="pre">[</span></span><a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#list" title="(in Python v3.10)"><span class="pre">list</span></a><span class="p"><span class="pre">]</span></span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/sequential/seq2pat.html#Seq2Pat"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#sequential.seq2pat.Seq2Pat" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">sequential.seq2pat.</span></span><span class="sig-name descname"><span class="pre">Seq2Pat</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">sequences</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">List</span><span class="p"><span class="pre">[</span></span><a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#list" title="(in Python v3.10)"><span class="pre">list</span></a><span class="p"><span class="pre">]</span></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">max_span</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Optional</span><span class="p"><span class="pre">[</span></span><a class="reference external" href="https://docs.python.org/3/library/functions.html#int" title="(in Python v3.10)"><span class="pre">int</span></a><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">10</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/sequential/seq2pat.html#Seq2Pat"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#sequential.seq2pat.Seq2Pat" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <a class="reference external" href="https://docs.python.org/3/library/functions.html#object" title="(in Python v3.10)"><code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></a></p>
<p><strong>Seq2Pat: Sequence-to-Pattern Generation Library</strong></p>
<dl class="simple">
<dt>sequences<span class="classifier">List[list]</span></dt><dd><p>A list of sequences each with a list of events.
The event values can be all strings or all integers.</p>
</dd>
<dt>max_span<span class="classifier">Optional[int]</span></dt><dd><p>The maximum span, in number of items, enforced by a built-in span constraint during mining; max_span=10 by
default (10 items). This keeps regular users from running into scaling issues when the data contains long
sequences and no other constraints are set to keep mining efficient and practical.
Power users can drop this constraint by setting it to None, or increase the maximum span
as far as system resources allow.</p>
</dd>
</dl>
<dl class="py method">
<dt class="sig sig-object py" id="sequential.seq2pat.Seq2Pat.add_constraint">
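
To see why a default maximum span can change mining results, here is a brute-force sketch in plain Python. It is not the library's MDD-based algorithm; `count_patterns` is a hypothetical helper, and the choices of a minimum pattern length of 2 and counting each pattern once per sequence are assumptions of this sketch.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def count_patterns(sequences: List[list], min_frequency: int,
                   max_span: Optional[int] = None) -> Dict[Tuple, int]:
    """Count subsequences of length >= 2 that occur in enough sequences.

    An occurrence is kept only if the index difference between its first
    and last event is at most max_span - 1 (no limit when max_span is None).
    """
    counts: Dict[Tuple, int] = {}
    for seq in sequences:
        found = set()  # each pattern counts once per sequence
        for length in range(2, len(seq) + 1):
            for positions in combinations(range(len(seq)), length):
                span = positions[-1] - positions[0]
                if max_span is not None and span > max_span - 1:
                    continue
                found.add(tuple(seq[p] for p in positions))
        for pattern in found:
            counts[pattern] = counts.get(pattern, 0) + 1
    return {p: c for p, c in counts.items() if c >= min_frequency}


sequences = [["A", "B", "C", "D"], ["A", "X", "Y", "D"]]

# Without a span limit, ("A", "D") occurs in both sequences.
unconstrained = count_patterns(sequences, min_frequency=2)

# With max_span=3 the index difference may be at most 2, but "A" and "D"
# are 3 positions apart in both sequences, so the pattern is dropped.
constrained = count_patterns(sequences, min_frequency=2, max_span=3)
```

The same data yields different pattern sets depending on `max_span`, which is the reason the changelog notes that unit tests had to be updated.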