feat(stripEndings): add stripEndings function to remove line endings
Closes #25
Closes #138
Harjot1Singh committed Jun 2, 2020
1 parent afdb20b commit 9158e56
Showing 8 changed files with 205 additions and 4 deletions.
4 changes: 3 additions & 1 deletion README.hbs
@@ -29,6 +29,7 @@ const {
toShahmukhi,
stripAccents,
stripVishraams,
stripEndings,
isGurmukhi,
} = require( 'gurmukhi-utils' )

@@ -41,7 +42,8 @@ toHindi('ਕੁਲ ਜਨ ਮਧੇ ਮਿਲੵੋਿ ਸਾਰਗ ਪਾਨ
toShahmukhi('ਹਰਿ ਹਰਿ ਹਰਿ ਗੁਨੀ') // => هر هر هر گُنی
stripAccents('ਜ਼ਫ਼ੈਸ਼ਸ') // => ਜਫੈਸਸ
stripVishraams('sbid mrY. so mir rhY; iPir.') // => sbid mrY so mir rhY iPir
-isGurmukhi('ਗੁਰਮੁਖੀ') // t=> true
+stripEndings('ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥') // => ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ
+isGurmukhi('ਗੁਰਮੁਖੀ') // => true
```

Additionally, the package is available for web use via [unpkg CDN](https://unpkg.com/gurmukhi-utils).
34 changes: 33 additions & 1 deletion README.md
@@ -22,6 +22,7 @@ Want to speak with us? <p>[![Slack](https://slack.shabados.com/badge.svg)](https
* [firstLetters(line, [stripNukta], [withVishraams]) ⇒ String](#firstlettersline-stripnukta-withvishraams-%E2%87%92-string)
* [isGurmukhi(text, [exhaustive]) ⇒ boolean](#isgurmukhitext-exhaustive-%E2%87%92-boolean)
* [stripAccents(text) ⇒ String](#stripaccentstext-%E2%87%92-string)
* [stripEndings(text) ⇒ String](#stripendingstext-%E2%87%92-string)
* [stripVishraams(text, options) ⇒ String](#stripvishraamstext-options-%E2%87%92-string)
* [toAscii(text) ⇒ String](#toasciitext-%E2%87%92-string)
* [toEnglish(line) ⇒ String](#toenglishline-%E2%87%92-string)
@@ -45,6 +46,7 @@ const {
toShahmukhi,
stripAccents,
stripVishraams,
stripEndings,
isGurmukhi,
} = require( 'gurmukhi-utils' )

@@ -57,7 +59,8 @@ toHindi('ਕੁਲ ਜਨ ਮਧੇ ਮਿਲੵੋਿ ਸਾਰਗ ਪਾਨ
toShahmukhi('ਹਰਿ ਹਰਿ ਹਰਿ ਗੁਨੀ') // => هر هر هر گُنی
stripAccents('ਜ਼ਫ਼ੈਸ਼ਸ') // => ਜਫੈਸਸ
stripVishraams('sbid mrY. so mir rhY; iPir.') // => sbid mrY so mir rhY iPir
-isGurmukhi('ਗੁਰਮੁਖੀ') // t=> true
+stripEndings('ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥') // => ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ
+isGurmukhi('ਗੁਰਮੁਖੀ') // => true
```

Additionally, the package is available for web use via [unpkg CDN](https://unpkg.com/gurmukhi-utils).
@@ -138,6 +141,35 @@ Useful for generalising search queries.
stripAccents('ਜ਼ਫ਼ੈਸ਼ਸਓ') // => ਜਫੈਸਸੳ
stripAccents('Z^Svb') // => gKsvb
```
### stripEndings(text) ⇒ <code>String</code>
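Strips line endings (verse-ending markers such as `॥੧॥ ਰਹਾਉ ॥`, or `||1||Pause||` and `(4-2-9)` in translations, rather than newline characters) from any Gurmukhi or translation string.
Accepts both Unicode and ASCII input.
Useful for generating accurate first letters, or for tidying non-Gurbani text for display.
*Not* designed for headings or Sirlekhs.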

**Returns**: <code>String</code> - The text with its line endings removed and trailing whitespace trimmed.

| Param | Type | Description |
| --- | --- | --- |
| text | <code>String</code> | The text to strip endings from. |

**Example** *(Line ending phrases)*
```js
stripEndings('ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥') // => ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ
stripEndings('ਹੁਕਮੁ ਪਛਾਣਿ; ਤਾ ਖਸਮੈ ਮਿਲਣਾ ॥੧॥ ਰਹਾਉ ਦੂਜਾ ॥') // => ਹੁਕਮੁ ਪਛਾਣਿ; ਤਾ ਖਸਮੈ ਮਿਲਣਾ
stripEndings('ਜਨ ਨਾਨਕ. ਗੁਰਮੁਖਿ ਜਾਤਾ ਰਾਮ ॥੪॥੬॥ ਛਕਾ ੧ ॥') // => ਜਨ ਨਾਨਕ. ਗੁਰਮੁਖਿ ਜਾਤਾ ਰਾਮ
```
**Example** *(English Translations)*
```js
stripEndings('O Nanak, Forever And Ever True. ||1||') // => O Nanak, Forever And Ever True.
stripEndings('lush greenery. ||1||Pause||') // => lush greenery.
stripEndings('always I live within the Khalsa. 519') // => always I live within the Khalsa.
stripEndings('without your reminiscence.(1) (3)') // => without your reminiscence.
```
**Example** *(Spanish Translations)*
```js
stripEndings('ofrece su ser en sacrificio a Ti. (4-2-9)') // => ofrece su ser en sacrificio a Ti.
```
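The description above says stripping endings helps when generating first letters. The sketch below illustrates why, under two simplifying assumptions: a naive first-letter rule (first character of each whitespace-separated word) and a reduced ending stripper that only recognises Unicode `।`, `॥` and Gurmukhi digits. `stripUnicodeEndings` and `naiveFirstLetters` are hypothetical helpers, not part of gurmukhi-utils; the library's own `firstLetters` also deals with nukta and vishraams.

```javascript
// Hypothetical, simplified ending stripper (not the library implementation):
// removes an optional danda/double danda followed by a Gurmukhi digit (੦-੯)
// and everything after it. ASCII and English endings are ignored here.
const stripUnicodeEndings = text => text
  .replace( / ?[।॥]?[੦-੯].*/g, '' )
  .trimEnd()

// Hypothetical naive first letters: the first character of each word.
// Unlike the library's firstLetters(), there is no nukta or vishraam handling.
const naiveFirstLetters = line => line
  .split( /\s+/ )
  .filter( word => word.length > 0 )
  .map( word => [ ...word ][ 0 ] )
  .join( '' )

const sample = 'ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥'

// The verse marker and ਰਹਾਉ leak into the result
console.log( naiveFirstLetters( sample ) ) // => ਸਘਰਵਤ॥ਰ॥
// With the ending removed first, only the line's own words contribute
console.log( naiveFirstLetters( stripUnicodeEndings( sample ) ) ) // => ਸਘਰਵਤ
```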
### stripVishraams(text, options) ⇒ <code>String</code>
Removes the specified vishraams from a string.

34 changes: 33 additions & 1 deletion docs/README.md
@@ -22,6 +22,7 @@ Want to speak with us? <p>[![Slack](https://slack.shabados.com/badge.svg)](https
* [firstLetters(line, [stripNukta], [withVishraams]) ⇒ String](#firstlettersline-stripnukta-withvishraams-%E2%87%92-string)
* [isGurmukhi(text, [exhaustive]) ⇒ boolean](#isgurmukhitext-exhaustive-%E2%87%92-boolean)
* [stripAccents(text) ⇒ String](#stripaccentstext-%E2%87%92-string)
* [stripEndings(text) ⇒ String](#stripendingstext-%E2%87%92-string)
* [stripVishraams(text, options) ⇒ String](#stripvishraamstext-options-%E2%87%92-string)
* [toAscii(text) ⇒ String](#toasciitext-%E2%87%92-string)
* [toEnglish(line) ⇒ String](#toenglishline-%E2%87%92-string)
@@ -45,6 +46,7 @@ const {
toShahmukhi,
stripAccents,
stripVishraams,
stripEndings,
isGurmukhi,
} = require( 'gurmukhi-utils' )

@@ -57,7 +59,8 @@ toHindi('ਕੁਲ ਜਨ ਮਧੇ ਮਿਲੵੋਿ ਸਾਰਗ ਪਾਨ
toShahmukhi('ਹਰਿ ਹਰਿ ਹਰਿ ਗੁਨੀ') // => هر هر هر گُنی
stripAccents('ਜ਼ਫ਼ੈਸ਼ਸ') // => ਜਫੈਸਸ
stripVishraams('sbid mrY. so mir rhY; iPir.') // => sbid mrY so mir rhY iPir
-isGurmukhi('ਗੁਰਮੁਖੀ') // t=> true
+stripEndings('ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥') // => ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ
+isGurmukhi('ਗੁਰਮੁਖੀ') // => true
```

Additionally, the package is available for web use via [unpkg CDN](https://unpkg.com/gurmukhi-utils).
@@ -138,6 +141,35 @@ Useful for generalising search queries.
stripAccents('ਜ਼ਫ਼ੈਸ਼ਸਓ') // => ਜਫੈਸਸੳ
stripAccents('Z^Svb') // => gKsvb
```
### stripEndings(text) ⇒ <code>String</code>
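Strips line endings (verse-ending markers such as `॥੧॥ ਰਹਾਉ ॥`, or `||1||Pause||` and `(4-2-9)` in translations, rather than newline characters) from any Gurmukhi or translation string.
Accepts both Unicode and ASCII input.
Useful for generating accurate first letters, or for tidying non-Gurbani text for display.
*Not* designed for headings or Sirlekhs.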

**Returns**: <code>String</code> - The text with its line endings removed and trailing whitespace trimmed.

| Param | Type | Description |
| --- | --- | --- |
| text | <code>String</code> | The text to strip endings from. |

**Example** *(Line ending phrases)*
```js
stripEndings('ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥') // => ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ
stripEndings('ਹੁਕਮੁ ਪਛਾਣਿ; ਤਾ ਖਸਮੈ ਮਿਲਣਾ ॥੧॥ ਰਹਾਉ ਦੂਜਾ ॥') // => ਹੁਕਮੁ ਪਛਾਣਿ; ਤਾ ਖਸਮੈ ਮਿਲਣਾ
stripEndings('ਜਨ ਨਾਨਕ. ਗੁਰਮੁਖਿ ਜਾਤਾ ਰਾਮ ॥੪॥੬॥ ਛਕਾ ੧ ॥') // => ਜਨ ਨਾਨਕ. ਗੁਰਮੁਖਿ ਜਾਤਾ ਰਾਮ
```
**Example** *(English Translations)*
```js
stripEndings('O Nanak, Forever And Ever True. ||1||') // => O Nanak, Forever And Ever True.
stripEndings('lush greenery. ||1||Pause||') // => lush greenery.
stripEndings('always I live within the Khalsa. 519') // => always I live within the Khalsa.
stripEndings('without your reminiscence.(1) (3)') // => without your reminiscence.
```
**Example** *(Spanish Translations)*
```js
stripEndings('ofrece su ser en sacrificio a Ti. (4-2-9)') // => ofrece su ser en sacrificio a Ti.
```
### stripVishraams(text, options) ⇒ <code>String</code>
Removes the specified vishraams from a string.

4 changes: 3 additions & 1 deletion example.js
@@ -7,7 +7,8 @@ const {
toEnglish,
stripAccents,
stripVishraams,
-isGurmukhi
+stripEndings,
+isGurmukhi,
} = require( 'gurmukhi-utils' )

console.log(toUnicode( 'Koj' ))
@@ -19,4 +20,5 @@ console.log(toShahmukhi( 'ਹਰਿ ਹਰਿ ਹਰਿ ਗੁਨੀ' ))
console.log(toHindi( 'ਕੁਲ ਜਨ ਮਧੇ ਮਿਲੵੋਿ ਸਾਰਗ ਪਾਨ ਰੇ ॥' ))
console.log(stripAccents('ਜ਼ਫ਼ੈਸ਼ਸ'))
console.log(stripVishraams('sbid mrY. so mir rhY; iPir.'))
console.log(stripEndings('ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥'))
console.log(isGurmukhi('ਗੁਰਮੁਖੀ'))
3 changes: 3 additions & 0 deletions index.d.ts
@@ -13,10 +13,13 @@ export function firstLetters(text: string, stripNukta?: boolean = true, withVish
export function isGurmukhi(text: string, exhaustive?: boolean): boolean

export function stripAccents(text: string): string

interface StripVishraamsOptions {
heavy?: boolean;
medium?: boolean;
light?: boolean;
}

export function stripVishraams(text: string, options?: StripVishraamsOptions): string

export function stripEndings(text: string): string
2 changes: 2 additions & 0 deletions index.js
@@ -7,6 +7,7 @@ const toHindi = require( './lib/toHindi' )
const isGurmukhi = require( './lib/isGurmukhi' )
const stripAccents = require( './lib/stripAccents' )
const stripVishraams = require( './lib/stripVishraams' )
const stripEndings = require( './lib/stripEndings' )

module.exports = {
toAscii,
@@ -18,4 +19,5 @@ module.exports = {
isGurmukhi,
stripAccents,
stripVishraams,
stripEndings,
}
51 changes: 51 additions & 0 deletions lib/stripEndings.js
@@ -0,0 +1,51 @@
const toUnicode = require( './toUnicode' )
const toAscii = require( './toAscii' )
const { getRegexClass, getRegexGroup } = require( './regex-utils' )

// Line endings in both ASCII, Unicode, and English
const endingClass = getRegexClass( [ '।', '॥', ']', '[', '|' ] )
// Sometimes translation line endings begin with these characters, before numbers
const optionalEndingClass = getRegexClass( [ '(' ] )
// Remove any broken endings
const brokenEndingClass = getRegexGroup( [ '()' ] )

// All numbers in ASCII, Unicode
const numbers = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ].map( i => i.toString() )
const numberClass = getRegexClass( [ ...numbers, ...numbers.map( toUnicode ) ] )

// Rahao in English, ASCII, Unicode
const pauseGroup = getRegexGroup( [ 'ਰਹਾਉ', toAscii( 'ਰਹਾਉ' ), 'Pause' ] )

const matchers = [
// Endings followed by any number => match the rest of the line
` ?(${endingClass}|${optionalEndingClass}?)${numberClass}.*`,
// || Rahao || style endings
` ?${endingClass} ?${pauseGroup} ?${endingClass}`,
// Clean up any lingering ending characters
brokenEndingClass,
endingClass,
].map( exp => new RegExp( exp, 'g' ) )


/**
* Strips line endings from any Gurmukhi or translation string.
* Accepts both Unicode and ASCII input.
* Useful for generating accurate first letters or modifying non-Gurbani for better display.
* *Not* designed for headings or Sirlekhs.
* @param {String} text The text to strip endings from.
* @return {String} The text with its line endings removed and trailing whitespace trimmed.
* @example <caption>Line ending phrases</caption>
* stripEndings('ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥') // => ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ
* stripEndings('ਹੁਕਮੁ ਪਛਾਣਿ; ਤਾ ਖਸਮੈ ਮਿਲਣਾ ॥੧॥ ਰਹਾਉ ਦੂਜਾ ॥') // => ਹੁਕਮੁ ਪਛਾਣਿ; ਤਾ ਖਸਮੈ ਮਿਲਣਾ
* stripEndings('ਜਨ ਨਾਨਕ. ਗੁਰਮੁਖਿ ਜਾਤਾ ਰਾਮ ॥੪॥੬॥ ਛਕਾ ੧ ॥') // => ਜਨ ਨਾਨਕ. ਗੁਰਮੁਖਿ ਜਾਤਾ ਰਾਮ
* @example <caption>English Translations</caption>
* stripEndings('O Nanak, Forever And Ever True. ||1||') // => O Nanak, Forever And Ever True.
* stripEndings('lush greenery. ||1||Pause||') // => lush greenery.
* stripEndings('always I live within the Khalsa. 519') // => always I live within the Khalsa.
* stripEndings('without your reminiscence.(1) (3)') // => without your reminiscence.
* @example <caption>Spanish Translations</caption>
* stripEndings('ofrece su ser en sacrificio a Ti. (4-2-9)') // => ofrece su ser en sacrificio a Ti.
*/
const stripEndings = text => matchers.reduce( ( text, exp ) => text.replace( exp, '' ), text ).trimRight()

module.exports = stripEndings
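`./regex-utils` is not part of this commit, so how `getRegexClass` and `getRegexGroup` build their patterns is not shown here. The following is a self-contained sketch of the same ordered-replacement approach, assuming `getRegexClass` returns an escaped character class and `getRegexGroup` an escaped alternation group. `escapeRegex` and `stripEndingsSketch` are hypothetical names, and the ASCII Rahao is hard-coded as `rhwau` (assumed to equal `toAscii( 'ਰਹਾਉ' )`) so the block runs without the library.

```javascript
// Hypothetical stand-ins for ./regex-utils, which this commit does not show.
// Assumption: both helpers escape regex metacharacters in their inputs.
const escapeRegex = s => s.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' )
const getRegexClass = chars => `[${chars.map( escapeRegex ).join( '' )}]`
const getRegexGroup = strings => `(${strings.map( escapeRegex ).join( '|' )})`

// Ending characters: Unicode । and ॥, ASCII ] and [, English |
const endingClass = getRegexClass( [ '।', '॥', ']', '[', '|' ] )
// Translation endings may open with "(" before the verse number
const optionalEndingClass = getRegexClass( [ '(' ] )
// Leftover empty parentheses
const brokenEndingGroup = getRegexGroup( [ '()' ] )

// ASCII digits 0-9 plus Gurmukhi digits ੦-੯ (U+0A66 to U+0A6F)
const digits = [ ...Array( 10 ).keys() ]
const numberClass = getRegexClass( [
  ...digits.map( String ),
  ...digits.map( i => String.fromCharCode( 0x0A66 + i ) ),
] )

// Rahao in Unicode, ASCII (assumed to be "rhwau"), and English
const pauseGroup = getRegexGroup( [ 'ਰਹਾਉ', 'rhwau', 'Pause' ] )

// Order matters: "ending + number ..." first, then "|| Pause ||" style
// endings, then stray empty parentheses and ending characters.
const matchers = [
  ` ?(${endingClass}|${optionalEndingClass}?)${numberClass}.*`,
  ` ?${endingClass} ?${pauseGroup} ?${endingClass}`,
  brokenEndingGroup,
  endingClass,
].map( exp => new RegExp( exp, 'g' ) )

// Apply each matcher in turn, then trim trailing whitespace
const stripEndingsSketch = text => matchers
  .reduce( ( acc, exp ) => acc.replace( exp, '' ), text )
  .trimEnd()

console.log( stripEndingsSketch( 'ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ ॥੧॥ ਰਹਾਉ ॥' ) ) // => ਸੋ ਘਰੁ ਰਾਖੁ; ਵਡਾਈ ਤੋਇ
console.log( stripEndingsSketch( 'so Gru rwKu; vfweI qoie ]1] rhwau ]' ) ) // => so Gru rwKu; vfweI qoie
console.log( stripEndingsSketch( 'lush greenery. ||1||Pause||' ) ) // => lush greenery.
```

Because each matcher is global and applied in sequence, the first pattern already discards everything from the first digit (with its preceding ending character, if any) onward; the later ones only handle Rahao-only endings and tidy stray `|`, `॥`, `]` or `()` characters.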