a bunch of docs
Lips7 committed Jul 18, 2024
1 parent 47eee45 commit ab1ebec
Showing 24 changed files with 2,638 additions and 107 deletions.
16 changes: 7 additions and 9 deletions .github/workflows/rust.yml
@@ -22,16 +22,14 @@ jobs:
         toolchain: nightly
     - name: Build
       run: cargo build --release --verbose
-    - name: Run tests prebuilt
-      run: cargo test -p matcher_rs --verbose --no-default-features --features "prebuilt"
-    - name: Run tests runtime_build
-      run: cargo test -p matcher_rs --verbose --no-default-features --features "runtime_build"
-    - name: Run tests prebuilt and dfa
-      run: cargo test -p matcher_rs --verbose --no-default-features --features "prebuilt,dfa"
-    - name: Run tests runtime_build and dfa
+    - name: Test
+      run: cargo test -p matcher_rs --verbose --no-default-features
+    - name: Test dfa
+      run: cargo test -p matcher_rs --verbose --no-default-features --features "dfa"
+    - name: Test runtime_build and dfa
       run: cargo test -p matcher_rs --verbose --no-default-features --features "runtime_build,dfa"
-    - name: Run tests serde
-      run: cargo test -p matcher_rs --verbose --no-default-features --features "prebuilt,dfa,serde"
+    - name: Test serde and dfa
+      run: cargo test -p matcher_rs --verbose --no-default-features --features "serde,dfa"
     - name: Run doc
       run: cargo doc
     - name: Release
14 changes: 6 additions and 8 deletions .github/workflows/test.yml
@@ -55,16 +55,14 @@ jobs:
         target: ${{ matrix.platform.target }}
     - name: Build
       run: cargo build --release --target ${{ matrix.platform.target }}
-    - name: Test prebuilt
-      run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features --features "prebuilt"
-    - name: Test runtime_build
-      run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features --features "runtime_build"
-    - name: Test prebuilt and dfa
-      run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features --features "prebuilt,dfa"
+    - name: Test
+      run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features
+    - name: Test dfa
+      run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features --features "dfa"
     - name: Test runtime_build and dfa
       run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features --features "runtime_build,dfa"
-    - name: Test serde
-      run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features --features "runtime_build,dfa,serde"
+    - name: Test serde and dfa
+      run: cargo test -p matcher_rs --target ${{ matrix.platform.target }} --verbose --no-default-features --features "serde,dfa"
     - name: Run doc
       run: cargo doc
     - name: Rename & move
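Both workflows now drop the `prebuilt` feature and run the same four `cargo test` invocations, each with `--no-default-features`: no extra features, then `dfa`, `runtime_build,dfa`, and `serde,dfa` (`test.yml` additionally passes `--target`). A minimal Python sketch that regenerates that command list; `FEATURE_SETS` and `cargo_test_commands` are hypothetical names used for illustration and are not part of the repository:

```python
from typing import List, Optional

# Feature sets exercised by the updated "Test" steps (hypothetical constant);
# None stands for the bare `--no-default-features` run.
FEATURE_SETS: List[Optional[str]] = [None, "dfa", "runtime_build,dfa", "serde,dfa"]


def cargo_test_commands(target: Optional[str] = None) -> List[List[str]]:
    """Build one `cargo test` argv per feature set (hypothetical helper)."""
    commands = []
    for features in FEATURE_SETS:
        cmd = ["cargo", "test", "-p", "matcher_rs"]
        if target is not None:
            # test.yml inserts the matrix target triple right after the package.
            cmd += ["--target", target]
        cmd += ["--verbose", "--no-default-features"]
        if features is not None:
            # The shell strips the quotes seen in the YAML, so argv holds the bare list.
            cmd += ["--features", features]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in cargo_test_commands():
        print(" ".join(cmd))
```

Keeping the feature sets in one list makes it easy to check that the two workflows stay in sync; passing a target triple (for example `cargo_test_commands("x86_64-unknown-linux-gnu")`) reproduces the `test.yml` argument order.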
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
 # Changelog
 
+## 0.5.0 - 2024-07-18
+### Changed
+- A bunch of changes and I don't want to explain one by one.
+
 ## 0.4.6 - 2024-07-15
 ### Performance
 
6 changes: 3 additions & 3 deletions Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -12,7 +12,7 @@ readme = "README.md"
 keywords = ["text", "string", "search", "pattern", "multi"]
 license = "Apache-2.0 OR MIT"
 repository = "https://github.com/Lips7/Matcher"
-version = "0.4.6"
+version = "0.5.0"
 rust-version = "1.79.0"
 
 [profile.release]
2 changes: 1 addition & 1 deletion README.md
@@ -128,7 +128,7 @@ Please refer to [benchmarks](./matcher_rs/README.md#benchmarks) for details.
 - [ ] Customize str conversion map.
 - [x] Add Matcher process function to py, c and java.
 - [ ] For simple matcher, is it possible to use regex-automata to replace aho-corasick? and support regex.
-- [ ] Add simple match type to `RegexMatcher` and `SimMatcher` to pre-process a text.
+- [x] Add simple match type to `RegexMatcher` and `SimMatcher` to pre-process a text.
 
 ### Readability
 - [x] More precise and convenient MatchTable.
2 changes: 1 addition & 1 deletion matcher_c/Cargo.toml
@@ -18,6 +18,6 @@ name = "matcher_c"
 crate-type = ["cdylib", "rlib"]
 
 [dependencies]
-matcher_rs = { path = "../matcher_rs", version = "0.4.6" }
+matcher_rs = { path = "../matcher_rs", version = "0.5.0" }
 rmp-serde = "1.3.0"
 sonic-rs = "0.3.8"
147 changes: 145 additions & 2 deletions matcher_c/extension_types.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,23 @@
 from enum import Enum, IntFlag
-from typing import Dict, List, TypedDict
+from typing import Dict, List, TypedDict, Union
 
 
 class ProcessType(IntFlag):
+    """
+    An enumeration representing various types of text processing operations.
+    Attributes:
+        MatchNone (IntFlag): An operation that performs no matching (binary 00000001).
+        MatchFanjian (IntFlag): An operation that matches traditional and simplified Chinese characters (binary 00000010).
+        MatchDelete (IntFlag): An operation that matches deleted characters (binary 00000100).
+        MatchNormalize (IntFlag): An operation that normalizes characters (binary 00001000).
+        MatchDeleteNormalize (IntFlag): A combined operation that deletes and normalizes characters (binary 00001100).
+        MatchFanjianDeleteNormalize (IntFlag): A combined operation that matches traditional and simplified Chinese characters,
+            deletes, and normalizes (binary 00001110).
+        MatchPinYin (IntFlag): An operation that matches Pinyin representations of Chinese characters (binary 00010000).
+        MatchPinYinChar (IntFlag): An operation that matches individual characters in the Pinyin representation (binary 00100000).
+    """
+
     MatchNone = 0b00000001
     MatchFanjian = 0b00000010
     MatchDelete = 0b00000100
@@ -14,44 +29,116 @@ class ProcessType(IntFlag):
 
 
 class RegexMatchType(Enum):
+    """
+    An enumeration representing various types of regex matching operations.
+    Attributes:
+        MatchSimilarChar (str): An operation that matches characters that are similar in some way.
+        MatchAcrostic (str): An operation that matches acrostic patterns.
+        MatchRegex (str): An operation that matches using standard regular expressions.
+    """
+
     MatchSimilarChar = "similar_char"
     MatchAcrostic = "acrostic"
     MatchRegex = "regex"
 
 
 class SimMatchType(Enum):
+    """
+    An enumeration representing various types of similarity matching operations.
+    Attributes:
+        MatchLevenshtein (str): An operation that matches using the Levenshtein distance metric.
+    """
+
     MatchLevenshtein = "levenshtein"
 
 
 class Simple(TypedDict):
+    """
+    A TypedDict representing a simple text processing operation.
+    Attributes:
+        process_type (ProcessType): The type of processing operation to be performed.
+    """
+
     process_type: ProcessType
 
 
 class Regex(TypedDict):
+    """
+    A TypedDict representing a regex-based text processing operation.
+    Attributes:
+        process_type (ProcessType): The type of processing operation to be performed.
+        regex_match_type (RegexMatchType): The type of regex matching operation to be used.
+    """
+
     process_type: ProcessType
     regex_match_type: RegexMatchType
 
 
 class Similar(TypedDict):
+    """
+    A TypedDict representing a similarity-based text processing operation.
+    Attributes:
+        process_type (ProcessType): The type of processing operation to be performed.
+        sim_match_type (SimMatchType): The type of similarity matching operation to be used.
+        threshold (float): The threshold value for the similarity matching operation.
+    """
+
     process_type: ProcessType
     sim_match_type: SimMatchType
     threshold: float
 
 
 class MatchTableType:
     def Simple(process_type: ProcessType) -> Dict[str, Simple]:
+        """
+        Create a dictionary representing a simple text processing operation.
+        Args:
+            process_type (ProcessType): The type of processing operation to be performed.
+        Returns:
+            Dict[str, Simple]: A dictionary with one key "simple" mapping to a Simple TypedDict
+                containing the provided process_type.
+        """
         return {"simple": Simple(process_type=process_type)}
 
     def Regex(
         process_type: ProcessType, regex_match_type: RegexMatchType
     ) -> Dict[str, Regex]:
+        """
+        Create a dictionary representing a regex-based text processing operation.
+        Args:
+            process_type (ProcessType): The type of processing operation to be performed.
+            regex_match_type (RegexMatchType): The type of regex matching operation to be used.
+        Returns:
+            Dict[str, Regex]: A dictionary with one key "regex" mapping to a Regex TypedDict
+                containing the provided process_type and regex_match_type.
+        """
         return {
             "regex": Regex(process_type=process_type, regex_match_type=regex_match_type)
         }
 
     def Similar(
         process_type: ProcessType, sim_match_type: SimMatchType, threshold: float
     ) -> Dict[str, Similar]:
+        """
+        Create a dictionary representing a similarity-based text processing operation.
+        Args:
+            process_type (ProcessType): The type of processing operation to be performed.
+            sim_match_type (SimMatchType): The type of similarity matching operation to be used.
+            threshold (float): The threshold value for the similarity matching operation.
+        Returns:
+            Dict[str, Similar]: A dictionary with one key "similar" mapping to a Similar TypedDict
+                containing the provided process_type, sim_match_type, and threshold.
+        """
         return {
             "similar": Similar(
                 process_type=process_type,
@@ -62,24 +149,80 @@ def Similar(
 
 
 class MatchTable(TypedDict):
+    """
+    A TypedDict representing a table for matching operations.
+    Attributes:
+        table_id (int): A unique identifier for the match table.
+        match_table_type (Union[Dict[str, Simple], Dict[str, Regex], Dict[str, Similar]]):
+            A dictionary that specifies the type of match operation to be performed. The key is a string indicating
+            the match type ('simple', 'regex', 'similar'), and the value is a corresponding TypedDict describing
+            the operation.
+        word_list (List[str]): A list of words that are subject to the matching operations.
+        exemption_process_type (ProcessType): The type of process for which certain words are exempt from matching.
+        exemption_word_list (List[str]): A list of words that are exempt from the matching process.
+    """
+
     table_id: int
-    match_table_type: MatchTableType
+    match_table_type: Union[Dict[str, Simple], Dict[str, Regex], Dict[str, Similar]]
     word_list: List[str]
     exemption_process_type: ProcessType
     exemption_word_list: List[str]
 
 
 MatchTableMap = Dict[int, List[MatchTable]]
+"""
+A type alias for mapping table identifiers to lists of MatchTable objects.
+Type:
+    Dict[int, List[MatchTable]]
+This dictionary maps an integer table ID to a list of MatchTable objects that correspond to the ID. It is used to
+organize and retrieve match tables based on their unique identifiers.
+"""
 
 
 class MatchResult(TypedDict):
+    """
+    A TypedDict representing the result of a matching operation.
+    Attributes:
+        match_id (int): A unique identifier for the match result.
+        table_id (int): The identifier of the match table where the matching operation was performed.
+        word_id (int): The identifier of the matched word within the word list.
+        word (str): The matched word.
+        similarity (float): The similarity score of the match operation.
+    """
+
     match_id: int
     table_id: int
     word_id: int
     word: str
     similarity: float
 
 
 SimpleTable = Dict[ProcessType, Dict[int, str]]
+"""
+A type alias for representing a simple table structure for text processing.
+This dictionary maps a `ProcessType` to another dictionary that maps an integer ID to a string.
+The outer dictionary's keys represent different types of processing operations, while the inner
+dictionary's keys represent unique identifiers corresponding to specific strings related to the
+operations.
+Type:
+    Dict[ProcessType, Dict[int, str]]
+"""
 
 
+class SimpleResult(TypedDict):
+    """
+    A TypedDict representing a simplified result of a text processing operation.
+    Attributes:
+        word_id (int): The identifier of the word within the word list.
+        word (str): The word corresponding to the word_id.
+    """
+
+    word_id: int
+    word: str
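`ProcessType` is an `IntFlag`, so the combined members listed in its docstring are bitwise ORs of the single-step flags, and `SimpleTable` uses those flags as dictionary keys. A minimal sketch that re-declares the enum with the bit values from the docstring (the member lines after `MatchDelete` are collapsed in this diff, so those values are taken from the docstring rather than the code); the words in `simple_table` are placeholders:

```python
from enum import IntFlag
from typing import Dict


class ProcessType(IntFlag):
    # Bit values as listed in the ProcessType docstring; members after
    # MatchDelete are collapsed in the diff view, so they come from the docstring.
    MatchNone = 0b00000001
    MatchFanjian = 0b00000010
    MatchDelete = 0b00000100
    MatchNormalize = 0b00001000
    MatchDeleteNormalize = 0b00001100
    MatchFanjianDeleteNormalize = 0b00001110
    MatchPinYin = 0b00010000
    MatchPinYinChar = 0b00100000


SimpleTable = Dict[ProcessType, Dict[int, str]]

# A combined member is the bitwise OR of its single-step flags.
combo = ProcessType.MatchFanjian | ProcessType.MatchDelete | ProcessType.MatchNormalize
assert combo == ProcessType.MatchFanjianDeleteNormalize

# Flag membership: is a single step part of a combined process type?
assert ProcessType.MatchDelete in ProcessType.MatchDeleteNormalize
assert ProcessType.MatchFanjian not in ProcessType.MatchDeleteNormalize

# A SimpleTable groups word_id -> word under the process type applied
# before matching (placeholder words, for illustration only).
simple_table: SimpleTable = {
    ProcessType.MatchNone: {1: "hello", 2: "world"},
    ProcessType.MatchFanjianDeleteNormalize: {3: "example"},
}

for process_type, words in simple_table.items():
    print(int(process_type), sorted(words.values()))
```

Because `IntFlag` members are `int` subclasses, they also compare equal to the plain integers shown in the docstring, for example `ProcessType.MatchDeleteNormalize == 0b1100`.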
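The `MatchTableType` helpers return single-key dictionaries (`{"simple": ...}`, `{"regex": ...}`, `{"similar": ...}`), which matches this commit's change of `MatchTable.match_table_type` from `MatchTableType` to a `Union` of those dictionary shapes. The sketch below assembles a `MatchTableMap` from the helpers and serializes it to JSON. The commit does not show which payload format the C bindings accept; JSON is an assumption here, suggested only by `matcher_c` depending on `sonic-rs` (JSON) and `rmp-serde` (MessagePack). `match_table_map` and `encode_enum` are hypothetical, and the types are trimmed copies with the regex variant left out:

```python
import json
from enum import Enum, IntFlag
from typing import Dict, List, TypedDict, Union


# Trimmed copies of the declarations in extension_types.py (regex variant omitted).
class ProcessType(IntFlag):
    MatchNone = 0b00000001
    MatchFanjian = 0b00000010
    MatchDelete = 0b00000100
    MatchNormalize = 0b00001000
    MatchDeleteNormalize = 0b00001100


class SimMatchType(Enum):
    MatchLevenshtein = "levenshtein"


class Simple(TypedDict):
    process_type: ProcessType


class Similar(TypedDict):
    process_type: ProcessType
    sim_match_type: SimMatchType
    threshold: float


class MatchTableType:
    # Declared without @staticmethod, as in the diff; calling through the class
    # works because a function looked up on a class is a plain function.
    def Simple(process_type: ProcessType) -> Dict[str, Simple]:
        return {"simple": Simple(process_type=process_type)}

    def Similar(
        process_type: ProcessType, sim_match_type: SimMatchType, threshold: float
    ) -> Dict[str, Similar]:
        return {
            "similar": Similar(
                process_type=process_type,
                sim_match_type=sim_match_type,
                threshold=threshold,
            )
        }


class MatchTable(TypedDict):
    table_id: int
    match_table_type: Union[Dict[str, Simple], Dict[str, Similar]]
    word_list: List[str]
    exemption_process_type: ProcessType
    exemption_word_list: List[str]


MatchTableMap = Dict[int, List[MatchTable]]

# Hypothetical configuration: two tables grouped under one integer key.
match_table_map: MatchTableMap = {
    1: [
        MatchTable(
            table_id=1,
            match_table_type=MatchTableType.Simple(ProcessType.MatchFanjian),
            word_list=["hello", "world"],
            exemption_process_type=ProcessType.MatchNone,
            exemption_word_list=[],
        ),
        MatchTable(
            table_id=2,
            match_table_type=MatchTableType.Similar(
                ProcessType.MatchDeleteNormalize, SimMatchType.MatchLevenshtein, 0.8
            ),
            word_list=["hello"],
            exemption_process_type=ProcessType.MatchNone,
            exemption_word_list=[],
        ),
    ]
}


def encode_enum(obj):
    # IntFlag members are ints, so json encodes them natively; a plain Enum
    # such as SimMatchType is not JSON serializable and falls through to here.
    if isinstance(obj, Enum):
        return obj.value
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = json.dumps(match_table_map, default=encode_enum)
print(payload)
```

The `default` hook matters because `ProcessType` members are `int` subclasses that `json` writes as plain integers, whereas `RegexMatchType` and `SimMatchType` are plain `Enum`s that `json.dumps` rejects with a `TypeError` unless they are converted to their string values.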

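`Similar.threshold` and `MatchResult.similarity` are both floats, but the diff does not define how the score is computed. Assuming a normalized Levenshtein similarity, `1 - distance / max(len(a), len(b))` (one common convention, not confirmed for matcher_rs), a minimal sketch of the score and of a threshold filter over `MatchResult`-shaped dicts looks like this; `levenshtein`, `normalized_similarity`, and `above_threshold` are hypothetical helpers:

```python
from typing import List, TypedDict


class MatchResult(TypedDict):
    # Same fields as MatchResult in extension_types.py.
    match_id: int
    table_id: int
    word_id: int
    word: str
    similarity: float


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # delete ca
                    curr[j - 1] + 1,  # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, free when equal
                )
            )
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / longer length (1.0 for two empty strings)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def above_threshold(results: List[MatchResult], threshold: float) -> List[MatchResult]:
    # Keep the results whose similarity reaches the table's threshold.
    return [r for r in results if r["similarity"] >= threshold]


score = normalized_similarity("kitten", "sitting")
results = [
    MatchResult(match_id=1, table_id=2, word_id=1, word="sitting", similarity=score),
    MatchResult(match_id=1, table_id=2, word_id=2, word="kitten", similarity=1.0),
]
kept = above_threshold(results, 0.8)
```

Under this normalization, `"kitten"` versus `"sitting"` has distance 3 over a longer length of 7, so its score of about 0.57 falls below a 0.8 threshold and only the exact match is kept.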