Merge pull request #466 from YuukiToriyama/release/v0.1.19
Merge release/v0.1.19 into the main branch
YuukiToriyama authored Oct 19, 2024
2 parents 197454f + 3338295 commit 8f84a78
Showing 9 changed files with 59 additions and 96 deletions.
1 change: 1 addition & 0 deletions .github/workflows/code-quality-check.yaml
@@ -5,6 +5,7 @@ on:
     paths:
       - '**.rs'
       - '**/Cargo.toml'
+      - '!*.md'

 jobs:
   build:
1 change: 1 addition & 0 deletions .github/workflows/python-build-check.yaml
@@ -2,6 +2,7 @@ name: Python module build check

 on:
   pull_request:
+    paths-ignore: [ '*.md' ]
   push:
     branches: [ 'main' ]

1 change: 1 addition & 0 deletions .github/workflows/run-test.yaml
@@ -2,6 +2,7 @@ name: Unit test & Integration test

 on:
   pull_request:
+    paths-ignore: [ '*.md' ]
   push:
     branches: [ 'main' ]

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -8,7 +8,7 @@ members = [
 resolver = "2"

 [workspace.package]
-version = "0.1.18"
+version = "0.1.19"
 edition = "2021"
 description = "A Rust Library to parse japanese addresses."
 repository = "https://github.com/YuukiToriyama/japanese-address-parser"
31 changes: 15 additions & 16 deletions README.md
@@ -5,16 +5,15 @@
 ![Rust Version](https://img.shields.io/badge/rust%20version-%3E%3D1.73.0-orange)
 [![Unit test & Integration test](https://github.com/YuukiToriyama/japanese-address-parser/actions/workflows/run-test.yaml/badge.svg?branch=main)](https://github.com/YuukiToriyama/japanese-address-parser/actions/workflows/run-test.yaml)

-A Rust Library to parse japanese addresses.
+A Rust library for parsing Japanese addresses.

 ## Usage

-Add this to your `Cargo.toml`
+Add the following to your `Cargo.toml`.

-```bash
-cargo add japanese-address-parser
-# or
-cargo add japanese-address-parser -F blocking
+```toml
+[dependencies]
+japanese-address-parser = "0.1"
 ```

 ### Async Version
@@ -47,7 +46,7 @@ fn main() {
 [![npmjs](https://img.shields.io/npm/v/%40toriyama/japanese-address-parser)](https://www.npmjs.com/package/@toriyama/japanese-address-parser)

 This crate is designed to be buildable for `wasm32-unknown-unknown` with `wasm-pack`.
-Pre-compiled wasm module is available npmjs.com
+Pre-compiled wasm module is available on npmjs.com

 ```bash
 npm install @toriyama/japanese-address-parser
@@ -68,28 +67,28 @@ init().then(() => {

 [![PyPI - Version](https://img.shields.io/pypi/v/japanese-address-parser-py)](https://pypi.org/project/japanese-address-parser-py/)

-This library can be called from Python world. For more detail, see [python module's README](python/README.md).
+This library can be called from the Python world. For more details, see [python module's README](python/README.md).

 ## Road to v1

-The goals that this library aims to achieve are below.
+The goals of this library are as follows.

-- Supports not only wasm target but also various platforms and architectures.
+- Supports not only wasm but also multiple platforms and architectures.
 - Enables more advanced normalization. For example, provides more detailed analysis than town level.
-- Provides latlng of the given address.
-- Enables processing of town names that have ceased to exist as a result of municipal mergers.
+- Returns the location of the given address.
+- Enables processing of town names that no longer exist due to municipal mergers.

 ## Support

 This software is maintained by [YuukiToriyama](https://github.com/yuukitoriyama).
-If you have questions, please create an issue.
+If you have any questions, please create a new issue.

 ## Acknowledgements

-This software was developed inspired
+This software was inspired
 by [@geolonia/normalize-japanese-addresses](https://github.com/geolonia/normalize-japanese-addresses).
-Also, the parsing process uses [Geolonia 住所データ](https://github.com/geolonia/japanese-addresses) provided
-by [株式会社Geolonia](https://www.geolonia.com/company/).
+In addition, the parsing process uses [Geolonia 住所データ](https://github.com/geolonia/japanese-addresses) which is
+provided by [株式会社Geolonia](https://www.geolonia.com/company/).

 ## License

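The new Usage snippet declares `japanese-address-parser = "0.1"`. Cargo treats a bare version string in `[dependencies]` as a caret requirement. This line should therefore accept any 0.1.x release, including the 0.1.19 version set by this merge, while rejecting 0.2.0. Below is a minimal std-only sketch of Cargo's caret rule. It assumes plain MAJOR.MINOR.PATCH versions without pre-release tags. `Version` and `caret_matches` are hypothetical helpers for illustration and are not part of Cargo or the `semver` crate.

```rust
/// A plain MAJOR.MINOR.PATCH version (pre-release and build metadata are ignored).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Splits a requirement such as "0.1" or "1.2.3" into its 1 to 3 numeric parts.
fn parse_parts(req: &str) -> Option<Vec<u64>> {
    let parts = req
        .split('.')
        .map(|p| p.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    if parts.is_empty() || parts.len() > 3 {
        return None;
    }
    Some(parts)
}

/// Hypothetical helper: does `v` satisfy the caret requirement `req`?
/// Returns None if `req` cannot be parsed.
pub fn caret_matches(req: &str, v: Version) -> Option<bool> {
    let parts = parse_parts(req)?;
    // Lower bound: the requirement itself, with missing parts filled with 0.
    let mut lower = [0u64; 3];
    lower[..parts.len()].copy_from_slice(&parts);
    // Upper bound: bump the leftmost non-zero part; if every written part
    // is 0 (e.g. "0.0"), bump the last part that was written instead.
    let bump = parts.iter().position(|&p| p != 0).unwrap_or(parts.len() - 1);
    let mut upper = [0u64; 3];
    upper[..bump].copy_from_slice(&lower[..bump]);
    upper[bump] = lower[bump] + 1;
    let lower = Version::new(lower[0], lower[1], lower[2]);
    let upper = Version::new(upper[0], upper[1], upper[2]);
    Some(lower <= v && v < upper)
}
```

For example, `caret_matches("0.1", Version::new(0, 1, 19))` yields `Some(true)`, and `caret_matches("0.1", Version::new(0, 2, 0))` yields `Some(false)`. Bumping the leftmost non-zero part is what makes a `0.x` requirement stricter than a `1.x` one. `"0.1"` stops before 0.2.0, whereas `"1.2"` allows everything below 2.0.0.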
1 change: 1 addition & 0 deletions core/Cargo.toml
@@ -33,6 +33,7 @@ regex = { version = "1.10.6", default-features = false, features = ["std", "unic
 serde.workspace = true
 reqwest = { version = "0.12.5", default-features = false, features = ["json", "rustls-tls"] }
 js-sys = "0.3.67"
+jisx0401 = "0.1.0-beta.3"

 [dev-dependencies]
 criterion = { version = "0.5.1", default-features = false, features = ["html_reports"] }
20 changes: 10 additions & 10 deletions core/src/parser.rs
@@ -81,7 +81,7 @@ impl Parser {
 pub async fn parse(api: Arc<AsyncApi>, input: &str) -> ParseResult {
     let tokenizer = Tokenizer::new(input);
     // Identify the prefecture
-    let (prefecture_name, tokenizer) = match tokenizer.read_prefecture() {
+    let (prefecture, tokenizer) = match tokenizer.read_prefecture() {
         Ok(found) => found,
         Err(tokenizer) => {
             return ParseResult {
@@ -91,7 +91,7 @@ pub async fn parse(api: Arc<AsyncApi>, input: &str) -> ParseResult {
         }
     };
     // Fetch the list of municipality names for that prefecture
-    let prefecture = match api.get_prefecture_master(&prefecture_name).await {
+    let prefecture_master = match api.get_prefecture_master(prefecture.name_ja()).await {
         Err(error) => {
             return ParseResult {
                 address: Address::from(tokenizer.finish()),
@@ -101,11 +101,11 @@ pub async fn parse(api: Arc<AsyncApi>, input: &str) -> ParseResult {
         Ok(result) => result,
     };
     // Identify the municipality name
-    let (city_name, tokenizer) = match tokenizer.read_city(&prefecture.cities) {
+    let (city_name, tokenizer) = match tokenizer.read_city(&prefecture_master.cities) {
         Ok(found) => found,
         Err(not_found) => {
             // If the municipality cannot be identified and the feature flag is enabled, consider that the county name may have been omitted
-            match not_found.read_city_with_county_name_completion(&prefecture.cities) {
+            match not_found.read_city_with_county_name_completion(&prefecture_master.cities) {
                 Ok(found) if cfg!(feature = "city-name-correction") => found,
                 _ => {
                     // If it still cannot be found, stop here
@@ -118,7 +118,7 @@ pub async fn parse(api: Arc<AsyncApi>, input: &str) -> ParseResult {
         }
     };
     // Fetch the list of town names for that municipality
-    let city = match api.get_city_master(&prefecture_name, &city_name).await {
+    let city = match api.get_city_master(prefecture.name_ja(), &city_name).await {
         Err(error) => {
             return ParseResult {
                 address: Address::from(tokenizer.finish()),
@@ -247,7 +247,7 @@ mod tests {
 #[cfg(feature = "blocking")]
 pub fn parse_blocking(api: Arc<BlockingApi>, input: &str) -> ParseResult {
     let tokenizer = Tokenizer::new(input);
-    let (prefecture_name, tokenizer) = match tokenizer.read_prefecture() {
+    let (prefecture, tokenizer) = match tokenizer.read_prefecture() {
         Ok(found) => found,
         Err(tokenizer) => {
             return ParseResult {
@@ -256,7 +256,7 @@ pub fn parse_blocking(api: Arc<BlockingApi>, input: &str) -> ParseResult {
             }
         }
     };
-    let prefecture = match api.get_prefecture_master(&prefecture_name) {
+    let prefecture_master = match api.get_prefecture_master(prefecture.name_ja()) {
         Err(error) => {
             return ParseResult {
                 address: Address::from(tokenizer.finish()),
@@ -265,10 +265,10 @@ pub fn parse_blocking(api: Arc<BlockingApi>, input: &str) -> ParseResult {
         }
         Ok(result) => result,
     };
-    let (city_name, tokenizer) = match tokenizer.read_city(&prefecture.cities) {
+    let (city_name, tokenizer) = match tokenizer.read_city(&prefecture_master.cities) {
         Ok(found) => found,
         Err(not_found) => {
-            match not_found.read_city_with_county_name_completion(&prefecture.cities) {
+            match not_found.read_city_with_county_name_completion(&prefecture_master.cities) {
                 Ok(found) if cfg!(feature = "city-name-correction") => found,
                 _ => {
                     return ParseResult {
@@ -279,7 +279,7 @@ pub fn parse_blocking(api: Arc<BlockingApi>, input: &str) -> ParseResult {
             }
         }
     };
-    let city = match api.get_city_master(&prefecture_name, &city_name) {
+    let city = match api.get_city_master(prefecture.name_ja(), &city_name) {
         Err(error) => {
             return ParseResult {
                 address: Address::from(tokenizer.finish()),
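In both `parse` and `parse_blocking`, the city step tries an exact match first. It falls back to `read_city_with_county_name_completion` only when that fails, and it keeps the fallback result only if the `city-name-correction` feature is enabled. Below is a minimal sketch of that control flow on plain `&str` data. It rests on several assumptions:

- `read_city`, `read_city_with_county_completion`, `find_city`, and the completion rule are hypothetical simplifications, not the crate's implementation. The assumed rule is that a listed name such as 額田郡幸田町 also matches input that omits the county part 額田郡.
- The compile-time `cfg!(feature = "city-name-correction")` check is replaced by a runtime `bool`, so that both branches can be exercised.

```rust
/// Exact match: the input must start with one of the known city names.
/// Returns the matched city name and the remaining input.
fn read_city<'a>(input: &'a str, cities: &[&str]) -> Option<(String, &'a str)> {
    cities
        .iter()
        .find(|city| input.starts_with(**city))
        .map(|city| (city.to_string(), &input[city.len()..]))
}

/// Hypothetical county-name completion: a city listed as "<county>郡<town>"
/// also matches when the input starts with "<town>" alone.
fn read_city_with_county_completion<'a>(
    input: &'a str,
    cities: &[&str],
) -> Option<(String, &'a str)> {
    cities.iter().find_map(|city| {
        let (_county, town) = city.split_once('郡')?;
        if !town.is_empty() && input.starts_with(town) {
            Some((city.to_string(), &input[town.len()..]))
        } else {
            None
        }
    })
}

/// Mirrors the fallback in `parse`: exact match first, then the completion,
/// accepting the completion only when correction is enabled.
fn find_city<'a>(
    input: &'a str,
    cities: &[&str],
    city_name_correction: bool,
) -> Option<(String, &'a str)> {
    match read_city(input, cities) {
        Some(found) => Some(found),
        None => match read_city_with_county_completion(input, cities) {
            Some(found) if city_name_correction => Some(found),
            // Still not found: the real parser returns a ParseResult carrying an error here.
            _ => None,
        },
    }
}
```

With the cities `["額田郡幸田町", "豊橋市"]`, the input `幸田町1-1` resolves to `("額田郡幸田町", "1-1")` only when `city_name_correction` is `true`. In the crate itself, `cfg!` expands to a boolean literal at compile time. The completion call is therefore still evaluated when the feature is off, and the match guard simply discards its result.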
74 changes: 15 additions & 59 deletions core/src/tokenizer/read_prefecture.rs
@@ -3,56 +3,6 @@ use crate::tokenizer::{End, Init, PrefectureNameFound, Tokenizer};
 use crate::util::extension::StrExt;
 use std::marker::PhantomData;

-const PREFECTURE_NAME_LIST: [&str; 47] = [
-    "北海道",
-    "青森県",
-    "岩手県",
-    "宮城県",
-    "秋田県",
-    "山形県",
-    "福島県",
-    "茨城県",
-    "栃木県",
-    "群馬県",
-    "埼玉県",
-    "千葉県",
-    "東京都",
-    "神奈川県",
-    "新潟県",
-    "富山県",
-    "石川県",
-    "福井県",
-    "山梨県",
-    "長野県",
-    "岐阜県",
-    "静岡県",
-    "愛知県",
-    "三重県",
-    "滋賀県",
-    "京都府",
-    "大阪府",
-    "兵庫県",
-    "奈良県",
-    "和歌山県",
-    "鳥取県",
-    "島根県",
-    "岡山県",
-    "広島県",
-    "山口県",
-    "徳島県",
-    "香川県",
-    "愛媛県",
-    "高知県",
-    "福岡県",
-    "佐賀県",
-    "長崎県",
-    "熊本県",
-    "大分県",
-    "宮崎県",
-    "鹿児島県",
-    "沖縄県",
-];
-
 impl Tokenizer<Init> {
     pub(crate) fn new(input: &str) -> Self {
         Self {
@@ -68,11 +18,12 @@ impl Tokenizer<Init> {

     pub(crate) fn read_prefecture(
         &self,
-    ) -> Result<(String, Tokenizer<PrefectureNameFound>), Tokenizer<End>> {
-        for prefecture_name in PREFECTURE_NAME_LIST {
-            if self.rest.starts_with(prefecture_name) {
-                return Ok((
-                    prefecture_name.to_string(),
+    ) -> Result<(jisx0401::Prefecture, Tokenizer<PrefectureNameFound>), Tokenizer<End>> {
+        match find_prefecture(&self.rest) {
+            Some(prefecture) => {
+                let prefecture_name = prefecture.name_ja();
+                Ok((
+                    prefecture.clone(),
                     Tokenizer {
                         tokens: vec![Token::Prefecture(Prefecture {
                             prefecture_name: prefecture_name.to_string(),
@@ -85,17 +36,22 @@ impl Tokenizer<Init> {
                             .collect::<String>(),
                         _state: PhantomData::<PrefectureNameFound>,
                     },
-                ));
+                ))
             }
+            None => Err(self.finish()),
         }
-        Err(self.finish())
     }
 }

+fn find_prefecture(input: &str) -> Option<&jisx0401::Prefecture> {
+    jisx0401::Prefecture::values().find(|&prefecture| input.starts_with(prefecture.name_ja()))
+}
+
 #[cfg(test)]
 mod tests {
     use crate::domain::common::token::Token;
     use crate::tokenizer::Tokenizer;
+    use jisx0401::Prefecture;

     #[test]
     fn new() {
@@ -124,8 +80,8 @@ mod tests {
         let tokenizer = Tokenizer::new("東京都港区芝公園4丁目2-8");
         let result = tokenizer.read_prefecture();
         assert!(result.is_ok());
-        let (prefecture_name, tokenizer) = result.unwrap();
-        assert_eq!(prefecture_name, "東京都");
+        let (prefecture, tokenizer) = result.unwrap();
+        assert_eq!(prefecture, Prefecture::TOKYO);
         assert_eq!(tokenizer.tokens.len(), 1);
         assert_eq!(tokenizer.rest, "港区芝公園4丁目2-8");
     }
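With this change, `read_prefecture` returns a typed prefecture value instead of a `String`. `find_prefecture` replaces the hand-written `PREFECTURE_NAME_LIST` with a prefix search over `jisx0401::Prefecture::values()`. The sketch below approximates that shape with only the standard library, so it runs without the `jisx0401` crate. The three-variant `Prefecture` enum, its `name_ja`/`values` methods, and the simplified `Tokenizer` fields are hypothetical stand-ins, not the real crate's types. The typestate markers (`Init`, `PrefectureNameFound`, `End`) carried in `PhantomData` follow the names in the diff.

```rust
use std::marker::PhantomData;

// Typestate markers with the same names as in the diff.
pub struct Init;
pub struct PrefectureNameFound;
pub struct End;

/// Hypothetical stand-in for `jisx0401::Prefecture`, reduced to three prefectures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Prefecture {
    Hokkaido,
    Tokyo,
    Kyoto,
}

// A 'static table so that `values()` can hand out `&'static Prefecture`.
static ALL_PREFECTURES: [Prefecture; 3] = [Prefecture::Hokkaido, Prefecture::Tokyo, Prefecture::Kyoto];

impl Prefecture {
    pub fn name_ja(&self) -> &'static str {
        match self {
            Prefecture::Hokkaido => "北海道",
            Prefecture::Tokyo => "東京都",
            Prefecture::Kyoto => "京都府",
        }
    }

    pub fn values() -> impl Iterator<Item = &'static Prefecture> {
        ALL_PREFECTURES.iter()
    }
}

pub struct Tokenizer<State> {
    pub tokens: Vec<String>,
    pub rest: String,
    _state: PhantomData<State>,
}

impl Tokenizer<Init> {
    pub fn new(input: &str) -> Self {
        Tokenizer { tokens: Vec::new(), rest: input.to_string(), _state: PhantomData }
    }

    /// On success the tokenizer moves to the `PrefectureNameFound` state;
    /// on failure it ends in the `End` state with the input left untouched.
    pub fn read_prefecture(
        &self,
    ) -> Result<(Prefecture, Tokenizer<PrefectureNameFound>), Tokenizer<End>> {
        match find_prefecture(&self.rest) {
            Some(prefecture) => {
                let name = prefecture.name_ja();
                Ok((
                    *prefecture,
                    Tokenizer {
                        tokens: vec![name.to_string()],
                        // Drop the matched name, counting characters rather than bytes.
                        rest: self.rest.chars().skip(name.chars().count()).collect(),
                        _state: PhantomData,
                    },
                ))
            }
            None => Err(Tokenizer {
                tokens: Vec::new(),
                rest: self.rest.clone(),
                _state: PhantomData,
            }),
        }
    }
}

/// First prefecture whose Japanese name is a prefix of `input`.
fn find_prefecture(input: &str) -> Option<&'static Prefecture> {
    Prefecture::values().find(|&prefecture| input.starts_with(prefecture.name_ja()))
}
```

Returning the typed value lets the caller ask it for its name only when needed. The `parser.rs` hunks do exactly that by passing `prefecture.name_ja()` to `get_prefecture_master` and `get_city_master`. Deriving `Copy` on the stand-in enum lets the sketch write `*prefecture` where the diff calls `prefecture.clone()`.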
24 changes: 14 additions & 10 deletions python/README.md
@@ -1,15 +1,18 @@
 # japanese-address-parser-py
-A python toolkit for processing japanese addresses
+
+A Python toolkit for processing Japanese addresses

 [![PyPI - Version](https://img.shields.io/pypi/v/japanese-address-parser-py)](https://pypi.org/project/japanese-address-parser-py/)
 [![PyPI - Downloads](https://img.shields.io/pypi/dm/japanese-address-parser-py)](https://pypi.org/project/japanese-address-parser-py/#history)
 [![Unit test & Integration test](https://github.com/YuukiToriyama/japanese-address-parser/actions/workflows/run-test.yaml/badge.svg?branch=main)](https://github.com/YuukiToriyama/japanese-address-parser/actions/workflows/run-test.yaml)

 ## What is it?
-**japanese-address-parser-py** is a Python package for parsing japanese addresses.
-Any address can be processed into structured data.
+
+**japanese-address-parser-py** is a Python package for parsing Japanese addresses.
+Any address can be parsed into structured data.

 ## Installation from PyPI
+
 ```bash
 pip install japanese-address-parser-py
 ```
@@ -38,7 +41,6 @@ for address in address_list:
 {'town': '日本大通', 'city': '横浜市中区', 'prefecture': '神奈川県', 'rest': '1'}
 ```

-
 ```python
 from japanese_address_parser_py import Parser

@@ -59,8 +61,9 @@ print(parse_result.address["rest"])
 ```

 ## Development
-This library is written in Rust language. You need to set up a Rust development environment to build this library.
-Also, you need to install `maturin` because this library uses it in order to generate Python bindings.
+
+This library is written in Rust. You need to set up a Rust development environment to build this library.
+Also, you need to install `maturin` as this library uses it in order to generate Python bindings.

 ```bash
 # Install maturin
@@ -78,18 +81,19 @@ pip3 install dist/japanese_address_parser_py-[version]-cp37-abi3-[arch].whl
 ## Support

 This software is maintained by [YuukiToriyama](https://github.com/yuukitoriyama).
-If you have questions, please create an issue.
+If you have any questions, please create a new issue.

 ## Where to get source code

 The source code is hosted on GitHub at:
 https://github.com/YuukiToriyama/japanese-address-parser

 ## Acknowledgements

-This software was developed inspired
+This software was inspired
 by [@geolonia/normalize-japanese-addresses](https://github.com/geolonia/normalize-japanese-addresses).
-Also, the parsing process uses [Geolonia 住所データ](https://github.com/geolonia/japanese-addresses) provided
-by [株式会社Geolonia](https://www.geolonia.com/company/).
+In addition, the parsing process uses [Geolonia 住所データ](https://github.com/geolonia/japanese-addresses) which is
+provided by [株式会社Geolonia](https://www.geolonia.com/company/).

 ## License
