Merge pull request #2 from duguying/develop
Support Chinese Word Segmentation to deal with polyphone, release v0.4.0
duguying committed Feb 22, 2015
2 parents fb6aac2 + 1890acc commit 5340333
Showing 23 changed files with 7,576 additions and 20,775 deletions.
1 change: 1 addition & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -2,6 +2,7 @@ language: c
compiler:
- "gcc"
before_script:
- "sudo apt-get update -qq"
- "sudo apt-get install php5-dev"
script:
- "phpize"
29 changes: 18 additions & 11 deletions CMakeLists.txt
@@ -22,26 +22,33 @@ add_definitions(
set(PHPSDK "C:/php/SDK")

link_directories(
"${PHPSDK}/lib"
# "${PHPSDK}/lib"
# "D:/wamp/bin/php/php5.3.10/dev"
)
include_directories(
"${PHPSDK}/include"
"${PHPSDK}/include/main"
"${PHPSDK}/include/TSRM"
"${PHPSDK}/include/win32"
"${PHPSDK}/include/Zend"
# "${PHPSDK}/include"
# "${PHPSDK}/include/main"
# "${PHPSDK}/include/TSRM"
# "${PHPSDK}/include/win32"
# "${PHPSDK}/include/Zend"
)

set(
PY_LIST
./pinyin.c
./py_hashtable.c
./py_pinyin.c
./src/pinyin.c
./src/hashtable.c
./src/py_pinyin.c
)
add_library(php_pinyin SHARED ${PY_LIST})

target_link_libraries(php_pinyin php5ts)
set(
PY_TEST
./src/hashtable.c
./src/py_pinyin.c
./src/test_pinyin.c
)

# add_library(php_pinyin SHARED ${PY_LIST})
# target_link_libraries(php_pinyin php5ts)

add_executable(pinyin_test ${PY_TEST})

35 changes: 13 additions & 22 deletions README.md
@@ -19,31 +19,22 @@ make
make test
```

**win32(cmake)**
## configuration
add following into `php.ini`

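To build with cmake, you must first make sure that you have the PHP SDK locally (you can obtain this SDK by compiling PHP), as well as the static library of the target PHP version (e.g. `php5ts.lib`). Then proceed as follows:

- Open the VC command prompt (taking VC9 as the target version in this example)

- If your SDK version differs from the target version, search the file `${PHPSDK}/include/main/config.w32.h` for the string `PHP_COMPILER_ID` and change that macro's definition to the identifier of the target version, which is `VC9` in this example

- Set the corresponding `add_definitions` parameters in `CMakeList.txt` to match your environment.

- Set the `PHPSDK` path in `CMakeList.txt`.

- Build from the VC9 command prompt: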

```shell
mkdir build & cd build
cmake -G"NMake Makefiles" ..
nmake
```

This produces php_pinyin.dll
```
pinyin.chars=/path/to/chars.csv
pinyin.words=/path/to/words.csv
```

**about**
## usage

the file `pinyin.inc`, it is a resource file. of course, you needn't change the file name or create a `pinyin.inc`, the file `pinyin.inc` will be built by a php script in the `/data/` directory.
```php
<?php
echo pinyin("汉"),"\n";
echo pinyin("わたしわ阿飞, and my English name is Rex Lee. 网名是独孤影! ^_^。下面是一段多音分词歧义测试,这个人无伤无臭味。"),"\n";
?>
```

# License #

28 changes: 28 additions & 0 deletions clean
@@ -0,0 +1,28 @@
rm -rf autom4te.cache
rm -rf build
rm -rf include
rm -rf modules

rm acinclude.m4
rm aclocal.m4
rm config.sub
rm configure
rm configure.in
rm config.status
rm config.guess
rm config.h
rm config.h.in
rm config.log
rm config.nice
rm Makefile*
rm ltmain.sh
rm libtool
rm mkinstalldirs
rm missing
rm install-sh
rm run-tests.php

rm -rf *.la
rm -rf src/*.lo
rm -rf *.a
rm -rf *.so
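
A note on the `clean` script above: the plain `rm` calls print an error for every file that is already gone, so running it on a clean tree is noisy. The following is a minimal sketch of a quieter variant, not the author's script. The function name `clean_tree` and its optional directory argument are assumptions made for illustration; the file list is copied from the script above.

```shell
#!/bin/sh
# Hypothetical variant of the clean script (clean_tree is an illustrative name).
# Uses rm -f so that missing files are not reported as errors, and takes the
# tree to clean as an optional argument (default: current directory).
clean_tree() {
    dir="${1:-.}"
    (
        cd "$dir" || exit 1
        # Generated directories
        rm -rf autom4te.cache build include modules
        # Files generated by phpize / configure
        rm -f acinclude.m4 aclocal.m4 config.sub configure configure.in \
              config.status config.guess config.h config.h.in config.log \
              config.nice Makefile* ltmain.sh libtool mkinstalldirs missing \
              install-sh run-tests.php
        # Build artifacts
        rm -rf ./*.la src/*.lo ./*.a ./*.so
    )
}
```

Calling `clean_tree .` from the repository root would remove the same files as the original script; a second call on an already clean tree completes without error messages.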
6 changes: 3 additions & 3 deletions config.m4
@@ -3,7 +3,7 @@ PHP_ARG_WITH(pinyin, for pinyin support,

PHP_ADD_INCLUDE(pinyin)

PHP_NEW_EXTENSION(pinyin, pinyin.c \
py_hashtable.c \
py_pinyin.c \
PHP_NEW_EXTENSION(pinyin, src/pinyin.c \
src/hashtable.c \
src/py_pinyin.c \
,$ext_shared)
6 changes: 3 additions & 3 deletions config.w32
@@ -1,6 +1,6 @@
ARG_ENABLE("pinyin", "for pinyin support", "yes");

EXTENSION("pinyin", "pinyin.c \
py_hashtable.c \
py_pinyin.c");
EXTENSION("pinyin", "src/pinyin.c \
src/hashtable.c \
src/py_pinyin.c");

10 changes: 2 additions & 8 deletions data/README.md
@@ -1,11 +1,5 @@
# get the latest data #
## about data ##

you can get the latest data file(pinyin.inc) from the internet via run the script `getData.sh`.
here is the data for pinyin-php extension

```
./getData.sh
```

## about pinyin-php resource build ##

the raw data is in the MySQL database, I read the data from the database and build the data into three variables.They are `char* cnchar`,`char* pinyin` and `char* py`. Chinese characters is in the variable `cnchar`. Chinese PinYin characters is in the variable `pinyin`. The PinYin alphabet is in the variable `py`.