Text span utilities for Rust and Python

Rust doc: https://docs.rs/textspan

Usage (Python)

Install: pip install pytextspan

`align_spans`

def align_spans(spans: List[Tuple[int, int]], text: str, original_text: str) -> List[List[Tuple[int, int]]]: ...

Converts spans defined on `text` into the corresponding spans on `original_text`.

This is useful, for example, for mapping spans found in a normalized text back to the original text.

>>> import textspan
>>> spans = [(0, 3), (3, 6)]
>>> text = "foobarbaz"
>>> original_text = "FOo.BåR baZ";
>>> textspan.align_spans(spans, text, original_text)
[[(0, 3)], [(4, 7)]]

`align_spans_by_mapping`

def align_spans_by_mapping(spans: List[Tuple[int, int]], mapping: List[List[int]]) -> List[List[Tuple[int, int]]]: ...

Converts spans using the given character mapping.

In general, the character correspondence between two texts is not necessarily surjective or injective, and need not even be a mathematical map: a character in textA may have no counterpart in textB, or several. Thus, `mapping` is given as `List[List[int]]`, where `mapping[i]` lists the indices in textB that correspond to character `i` of textA.

>>> import textspan
>>> spans = [(0, 2), (3, 4)]
>>> mapping = [[0, 1], [], [2], [4, 5, 6]]
>>> textspan.align_spans_by_mapping(spans, mapping)
[[(0, 2)], [(4, 7)]]
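A minimal sketch of the semantics shown in the example, assuming that the target indices reached by a span are merged into maximal runs of consecutive positions (consistent with the `List[List[...]]` return type, but not confirmed by this README). `align_spans_by_mapping_sketch` is a hypothetical name, not part of the package.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def align_spans_by_mapping_sketch(spans: List[Span], mapping: List[List[int]]) -> List[List[Span]]:
    """Hypothetical re-implementation: map each span through `mapping` and
    merge consecutive target indices into half-open spans."""
    result: List[List[Span]] = []
    for start, end in spans:
        # Collect every target index reachable from characters in [start, end).
        targets = sorted({t for i in range(start, end) for t in mapping[i]})
        runs: List[Span] = []
        for t in targets:
            if runs and runs[-1][1] == t:
                runs[-1] = (runs[-1][0], t + 1)  # extend the current run
            else:
                runs.append((t, t + 1))  # start a new run
        result.append(runs)
    return result


spans = [(0, 2), (3, 4)]
mapping = [[0, 1], [], [2], [4, 5, 6]]
print(align_spans_by_mapping_sketch(spans, mapping))
# [[(0, 2)], [(4, 7)]]
```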

`get_original_spans`

def get_original_spans(tokens: List[str], original_text: str) -> List[List[Tuple[int, int]]]: ...

Returns, for each token, its spans in `original_text`, computed from the shortest edit script (SES) between the tokens and `original_text`.

This is useful, for example, for locating in the original text the tokens that a tokenizer produced from a normalized text.

>>> import textspan
>>> tokens = ["foo", "bar"]
>>> textspan.get_original_spans(tokens, "FO.o  BåR")
[[(0, 2), (3, 4)], [(6, 9)]]
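A rough pure-Python sketch of the same idea, not the library's algorithm. The README describes an SES-based alignment; this sketch swaps in `difflib.SequenceMatcher`, whose matching blocks are not guaranteed to form a shortest edit script, and assumes a per-character NFKD/lowercase normalization. `get_original_spans_sketch` and `_key` are hypothetical names. For the README example it gives the same output.

```python
import difflib
import unicodedata
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def _key(ch: str) -> str:
    # ASSUMPTION: NFKD base character, lowercased ("å" -> "a"); always one character.
    decomposed = unicodedata.normalize("NFKD", ch)
    return decomposed[0].lower()[0] if decomposed else ch


def get_original_spans_sketch(tokens: List[str], original_text: str) -> List[List[Span]]:
    joined = "".join(tokens)
    a = "".join(_key(c) for c in joined)
    b = "".join(_key(c) for c in original_text)
    # difflib matching blocks stand in for an SES-based character alignment.
    mapping: List[Optional[int]] = [None] * len(joined)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for i, j, size in matcher.get_matching_blocks():
        for k in range(size):
            mapping[i + k] = j + k
    result: List[List[Span]] = []
    offset = 0
    for token in tokens:
        runs: List[Span] = []
        for i in range(offset, offset + len(token)):
            t = mapping[i]
            if t is None:
                continue  # character has no counterpart in original_text
            if runs and runs[-1][1] == t:
                runs[-1] = (runs[-1][0], t + 1)  # extend the current run
            else:
                runs.append((t, t + 1))  # start a new run
        result.append(runs)
        offset += len(token)
    return result


print(get_original_spans_sketch(["foo", "bar"], "FO.o  BåR"))
# [[(0, 2), (3, 4)], [(6, 9)]]
```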

`lift_span_index`

def lift_span_index(span: Tuple[int, int], target_spans: List[Tuple[int, int]]) -> Tuple[Tuple[int, bool], Tuple[int, bool]]: ...

Examples:

>>> import textspan
>>> spans = [(0, 3), (3, 4), (4, 9), (9, 12)]
>>> start, end = textspan.lift_span_index((2, 10), spans)
>>> assert (start[0], end[0]) == (0, 4)
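The example maps the character span `(2, 10)` to the token-index range `0..4`, i.e. the target spans that cover it. A minimal sketch of that index computation with `bisect`, assuming `target_spans` is sorted and non-overlapping; it returns only the two indices and omits the boolean flags in the library's signature, whose exact meaning this README does not state. `lift_span_index_sketch` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Span = Tuple[int, int]


def lift_span_index_sketch(span: Span, target_spans: List[Span]) -> Tuple[int, int]:
    """Return (i, j) such that target_spans[i:j] covers `span`.

    Assumes target_spans is sorted and non-overlapping; the library's
    per-end boolean flags are omitted.
    """
    start, end = span
    starts = [s for s, _ in target_spans]
    # Index of the last target span that begins at or before `start`.
    i = max(bisect_right(starts, start) - 1, 0)
    # Number of target spans that begin before `end` (exclusive end index).
    j = bisect_left(starts, end)
    return i, j


spans = [(0, 3), (3, 4), (4, 9), (9, 12)]
print(lift_span_index_sketch((2, 10), spans))  # (0, 4)
```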

`lift_spans_index`

def lift_spans_index(spans: List[Tuple[int, int]], target_spans: List[Tuple[int, int]]) -> List[Tuple[Tuple[int, bool], Tuple[int, bool]]]: ...
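This function is documented here only by its signature; judging from it, `lift_spans_index` appears to be the list version of `lift_span_index`. A sketch under that assumption, building the lookup table once for all spans; the boolean flags are again omitted, `target_spans` is assumed sorted and non-overlapping, and `lift_spans_index_sketch` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Span = Tuple[int, int]


def lift_spans_index_sketch(spans: List[Span], target_spans: List[Span]) -> List[Tuple[int, int]]:
    """Lift every span to a (start, end) index range over target_spans.

    Assumes target_spans is sorted and non-overlapping; the library's
    boolean flags are omitted.
    """
    starts = [s for s, _ in target_spans]  # computed once, reused per span
    lifted: List[Tuple[int, int]] = []
    for start, end in spans:
        i = max(bisect_right(starts, start) - 1, 0)
        j = bisect_left(starts, end)
        lifted.append((i, j))
    return lifted


char_spans = [(0, 3), (2, 10)]
targets = [(0, 3), (3, 4), (4, 9), (9, 12)]
print(lift_spans_index_sketch(char_spans, targets))  # [(0, 1), (0, 4)]
```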

`remove_span_overlaps`

def remove_span_overlaps(tokens: List[Tuple[int, int]]) -> List[Tuple[int, int]]: ...

Removes overlapping spans from the given spans.

When two spans overlap, the one that starts first is kept; if they start at the same position, the longer one is kept.

>>> import textspan
>>> spans = [(0, 2), (0, 3), (2, 4), (5, 7)]
>>> assert textspan.remove_span_overlaps(spans) == [(0, 3), (5, 7)]
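The rule above can be sketched as a greedy scan: sort by start position (longer first on ties) and keep a span only if it does not overlap the last kept one. This is an illustration, not the library's code; it assumes half-open spans, so adjacent spans such as `(0, 3)` and `(3, 5)` do not overlap. `remove_span_overlaps_sketch` is a hypothetical name.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def remove_span_overlaps_sketch(spans: List[Span]) -> List[Span]:
    """Greedy sketch: scan spans by start position (longer first on ties)
    and keep a span only if it starts at or after the end of the last kept one."""
    kept: List[Span] = []
    for start, end in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        # Kept spans are disjoint and sorted, so the last one has the largest end.
        if not kept or start >= kept[-1][1]:
            kept.append((start, end))
    return kept


print(remove_span_overlaps_sketch([(0, 2), (0, 3), (2, 4), (5, 7)]))  # [(0, 3), (5, 7)]
```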

`remove_span_overlaps_idx`

def remove_span_overlaps_idx(tokens: List[Tuple[int, int]]) -> List[int]: ...

Removes overlapping spans from the given spans and returns the indices of the remaining spans.

When two spans overlap, the one that starts first is kept; if they start at the same position, the longer one is kept.

>>> import textspan
>>> spans = [(0, 2), (0, 3), (2, 4), (5, 7)]
>>> assert textspan.remove_span_overlaps_idx(spans) == [1, 3]
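The same greedy rule, returning indices into the input list instead of the spans themselves. A minimal sketch, not the library's implementation; it assumes half-open spans and returns indices in order of span position. `remove_span_overlaps_idx_sketch` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def remove_span_overlaps_idx_sketch(spans: List[Span]) -> List[int]:
    """Return indices of the spans kept by a greedy scan ordered by start
    position, with longer spans first when starts are equal."""
    order = sorted(range(len(spans)), key=lambda k: (spans[k][0], spans[k][0] - spans[k][1]))
    kept: List[int] = []
    last_end: Optional[int] = None
    for k in order:
        start, end = spans[k]
        if last_end is None or start >= last_end:
            kept.append(k)
            last_end = end
    return kept


print(remove_span_overlaps_idx_sketch([(0, 2), (0, 3), (2, 4), (5, 7)]))  # [1, 3]
```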
