Go package for splitting strings (aware of enclosing braces and quotes)
The problem with the standard Go `strings.Split` is that it does not take into account that the string being split may contain enclosing braces and/or quotes, inside which the separator should not be treated as a separator.

Take, for example, a string representing a slice of comma-separated strings:

```go
str := `"aaa","bbb","this, for sanity, should not be split"`
```

Running `strings.Split` on that:
```go
package main

import "strings"

func main() {
	str := `"aaa","bbb","this, for sanity, should not be split"`
	parts := strings.Split(str, `,`)
	println(len(parts))
}
```
yields 5 instead of the desired 3.

With splitter, however, the result is different:
```go
package main

import "github.com/go-andiamo/splitter"

func main() {
	commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotes)
	str := `"aaa","bbb","this, for sanity, should not be split"`
	parts, _ := commaSplitter.Split(str)
	println(len(parts))
}
```
which yields the desired 3.

Note: the variadic arguments after the separator argument are the desired enclosures (e.g. quotes, brackets, etc.) to be taken into account.
While splitting, any specified enclosures are checked for balance.
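To make the idea concrete, here is a minimal stand-alone sketch of quote-aware splitting using only the standard library. It handles a single enclosure (plain double quotes, no escaping) and reports an unclosed quote as an error, loosely mirroring the balance check described above. The helper `splitOutsideQuotes` is hypothetical and is not how the splitter package is implemented.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// splitOutsideQuotes splits s on sep, but ignores any sep that falls
// between a pair of double quotes. An unclosed quote is reported as an
// error. Hypothetical helper for illustration only.
func splitOutsideQuotes(s string, sep rune) ([]string, error) {
	var parts []string
	start := 0       // byte offset where the current part begins
	inQuote := false // true while between an opening and a closing '"'
	for i, r := range s {
		switch {
		case r == '"':
			inQuote = !inQuote
		case r == sep && !inQuote:
			parts = append(parts, s[start:i])
			start = i + utf8.RuneLen(r)
		}
	}
	if inQuote {
		return nil, errors.New("unbalanced double quote")
	}
	return append(parts, s[start:]), nil
}

func main() {
	str := `"aaa","bbb","this, for sanity, should not be split"`
	parts, err := splitOutsideQuotes(str, ',')
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(parts)) // 3

	_, err = splitOutsideQuotes(`"aaa,bbb`, ',')
	fmt.Println(err) // unbalanced double quote
}
```

Toggling a single boolean is enough for one quote type; handling several enclosures at once (and nesting) needs more state, as the package's own enclosure handling does.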
To install Splitter, use `go get`:

```shell
go get github.com/go-andiamo/splitter
```

To update Splitter to the latest version, run:

```shell
go get -u github.com/go-andiamo/splitter
```
Enclosures tell the splitter about specific start/end sequences within which the separator is not to be considered. An enclosure can be one of two types: quotes or brackets.
Quote-type enclosures differ from bracket-type enclosures only in how their optional escaping works:
- Quote enclosures can be:
  - escaped by escape prefix, e.g. a quote enclosure starting with `"` and ending with `"`, where `\"` is not seen as an ending
  - escaped by doubles, e.g. a quote enclosure starting with `'` and ending with `'`, where any doubled `''` is not seen as an ending
- Bracket enclosures can only be:
  - escaped by escape prefix, e.g. a bracket enclosure starting with `(` and ending with `)` with the escape set to `\`, where `\(` is not seen as a start and `\)` is not seen as an end
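The two quote-escaping styles can be sketched as a scan for the closing quote. This is a hypothetical illustration: the `escapeMode` type and the `quoteEnd` function are not part of the splitter API, and the sketch assumes a backslash prefix escapes whichever character follows it.

```go
package main

import "fmt"

// escapeMode describes how a closing quote can be escaped inside a quoted
// region. These names are hypothetical, not part of the splitter API.
type escapeMode int

const (
	noEscape     escapeMode = iota // first matching quote always ends the region
	prefixEscape                   // a backslash escapes the next character, so \" does not close
	doubleEscape                   // a doubled quote such as '' does not close
)

// quoteEnd returns the index (in rs) of the quote that closes the region
// opened at rs[open], or -1 if the region is never closed.
func quoteEnd(rs []rune, open int, mode escapeMode) int {
	q := rs[open]
	for i := open + 1; i < len(rs); i++ {
		switch {
		case mode == prefixEscape && rs[i] == '\\' && i+1 < len(rs):
			i++ // skip the escaped character
		case rs[i] == q:
			if mode == doubleEscape && i+1 < len(rs) && rs[i+1] == q {
				i++ // doubled quote: skip both and stay inside the region
				continue
			}
			return i
		}
	}
	return -1
}

func main() {
	a := []rune(`"say \"hi\"",x`)
	fmt.Println(string(a[:quoteEnd(a, 0, prefixEscape)+1])) // "say \"hi\""

	b := []rune(`'don''t',x`)
	fmt.Println(string(b[:quoteEnd(b, 0, doubleEscape)+1])) // 'don''t'
}
```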
Note that brackets are ignored inside quotes, but quotes can exist within brackets. When splitting, separators found within any specified quote or bracket enclosure are not considered.
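A rough sketch of how nested brackets and quotes interact, assuming a fixed set of enclosures (`()`, `[]`, `{}`, `"` and `'`) and no prefix escaping. A stack tracks open brackets, an open quote suspends bracket handling until it closes, and a separator only splits when nothing is open. The `splitEnclosed` helper is hypothetical, not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Hypothetical enclosure set for this sketch: bracket pairs nest, while an
// open quote switches off bracket handling until it closes.
var brackets = map[rune]rune{'(': ')', '[': ']', '{': '}'}
var quotes = map[rune]bool{'"': true, '\'': true}

// splitEnclosed splits s on sep, ignoring separators inside any bracket or
// quote enclosure, and reports unbalanced enclosures as errors.
func splitEnclosed(s string, sep rune) ([]string, error) {
	var parts []string
	var stack []rune // expected closing runes for the currently open brackets
	var quote rune   // the open quote rune, or 0 when not inside quotes
	start := 0
	for i, r := range s {
		switch {
		case quote != 0:
			if r == quote {
				quote = 0 // brackets and separators were ignored until here
			}
		case quotes[r]:
			quote = r
		case brackets[r] != 0:
			stack = append(stack, brackets[r])
		case len(stack) > 0 && r == stack[len(stack)-1]:
			stack = stack[:len(stack)-1]
		case r == ')' || r == ']' || r == '}':
			return nil, fmt.Errorf("unexpected %q at byte %d", r, i)
		case r == sep && len(stack) == 0:
			parts = append(parts, s[start:i])
			start = i + utf8.RuneLen(r)
		}
	}
	if quote != 0 || len(stack) > 0 {
		return nil, errors.New("unclosed enclosure")
	}
	return append(parts, s[start:]), nil
}

func main() {
	str := `do(not,)split,'don''t,split,this',[,{,(a,"this has "" quotes")}]`
	parts, err := splitEnclosed(str, ',')
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, pt := range parts {
		fmt.Printf("\t[%d]%s\n", i, pt)
	}

	_, err = splitEnclosed(`a,(b,c`, ',')
	fmt.Println(err) // unclosed enclosure
}
```

Doubled quotes such as `''` happen to work here without special handling, because they close and immediately reopen the quote; prefix escaping is deliberately left out to keep the sketch short.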
The Splitter provides many pre-defined enclosures:
Var Name | Type | Start - End | Escaped end |
---|---|---|---|
DoubleQuotes | Quote | `"` `"` | none |
DoubleQuotesBackSlashEscaped | Quote | `"` `"` | `\"` |
DoubleQuotesDoubleEscaped | Quote | `"` `"` | `""` |
SingleQuotes | Quote | `'` `'` | none |
SingleQuotesBackSlashEscaped | Quote | `'` `'` | `\'` |
SingleQuotesDoubleEscaped | Quote | `'` `'` | `''` |
SingleInvertedQuotes | Quote | `` ` `` `` ` `` | none |
SingleInvertedQuotesBackSlashEscaped | Quote | `` ` `` `` ` `` | `` \` `` |
SingleInvertedQuotesDoubleEscaped | Quote | `` ` `` `` ` `` | ``` `` ``` |
SinglePointingAngleQuotes | Quote | `‹` `›` | none |
SinglePointingAngleQuotesBackSlashEscaped | Quote | `‹` `›` | `\›` |
DoublePointingAngleQuotes | Quote | `«` `»` | none |
LeftRightDoubleDoubleQuotes | Quote | `“` `”` | none |
LeftRightDoubleSingleQuotes | Quote | `‘` `’` | none |
LeftRightDoublePrimeQuotes | Quote | `〝` `〞` | none |
SingleLowHigh9Quotes | Quote | `‚` `‛` | none |
DoubleLowHigh9Quotes | Quote | `„` `‟` | none |
Parenthesis | Brackets | `(` `)` | none |
CurlyBrackets | Brackets | `{` `}` | none |
SquareBrackets | Brackets | `[` `]` | none |
LtGtAngleBrackets | Brackets | `<` `>` | none |
LeftRightPointingAngleBrackets | Brackets | `〈` `〉` | none |
SubscriptParenthesis | Brackets | `₍` `₎` | none |
SuperscriptParenthesis | Brackets | `⁽` `⁾` | none |
SmallParenthesis | Brackets | `﹙` `﹚` | none |
SmallCurlyBrackets | Brackets | `﹛` `﹜` | none |
DoubleParenthesis | Brackets | `⸨` `⸩` | none |
MathWhiteSquareBrackets | Brackets | `⟦` `⟧` | none |
MathAngleBrackets | Brackets | `⟨` `⟩` | none |
MathDoubleAngleBrackets | Brackets | `⟪` `⟫` | none |
MathWhiteTortoiseShellBrackets | Brackets | `⟬` `⟭` | none |
MathFlattenedParenthesis | Brackets | `⟮` `⟯` | none |
OrnateParenthesis | Brackets | `﴾` `﴿` | none |
AngleBrackets | Brackets | `〈` `〉` | none |
DoubleAngleBrackets | Brackets | `《` `》` | none |
FullWidthParenthesis | Brackets | `（` `）` | none |
FullWidthSquareBrackets | Brackets | `［` `］` | none |
FullWidthCurlyBrackets | Brackets | `｛` `｝` | none |
SubstitutionBrackets | Brackets | `⸂` `⸃` | none |
SubstitutionQuotes | Quote | `⸂` `⸃` | none |
DottedSubstitutionBrackets | Brackets | `⸄` `⸅` | none |
DottedSubstitutionQuotes | Quote | `⸄` `⸅` | none |
TranspositionBrackets | Brackets | `⸉` `⸊` | none |
TranspositionQuotes | Quote | `⸉` `⸊` | none |
RaisedOmissionBrackets | Brackets | `⸌` `⸍` | none |
RaisedOmissionQuotes | Quote | `⸌` `⸍` | none |
LowParaphraseBrackets | Brackets | `⸜` `⸝` | none |
LowParaphraseQuotes | Quote | `⸜` `⸝` | none |
SquareWithQuillBrackets | Brackets | `⁅` `⁆` | none |
WhiteParenthesis | Brackets | `⦅` `⦆` | none |
WhiteCurlyBrackets | Brackets | `⦃` `⦄` | none |
WhiteSquareBrackets | Brackets | `〚` `〛` | none |
WhiteLenticularBrackets | Brackets | `〖` `〗` | none |
WhiteTortoiseShellBrackets | Brackets | `〘` `〙` | none |
FullWidthWhiteParenthesis | Brackets | `⦅` `⦆` | none |
BlackTortoiseShellBrackets | Brackets | `⦗` `⦘` | none |
BlackLenticularBrackets | Brackets | `【` `】` | none |
PointingCurvedAngleBrackets | Brackets | `⧼` `⧽` | none |
TortoiseShellBrackets | Brackets | `〔` `〕` | none |
SmallTortoiseShellBrackets | Brackets | `﹝` `﹞` | none |
ZNotationImageBrackets | Brackets | `⦇` `⦈` | none |
ZNotationBindingBrackets | Brackets | `⦉` `⦊` | none |
MediumOrnamentalParenthesis | Brackets | `❨` `❩` | none |
LightOrnamentalTortoiseShellBrackets | Brackets | `❲` `❳` | none |
MediumOrnamentalFlattenedParenthesis | Brackets | `❪` `❫` | none |
MediumOrnamentalPointingAngleBrackets | Brackets | `❬` `❭` | none |
MediumOrnamentalCurlyBrackets | Brackets | `❴` `❵` | none |
HeavyOrnamentalPointingAngleQuotes | Quote | `❮` `❯` | none |
HeavyOrnamentalPointingAngleBrackets | Brackets | `❰` `❱` | none |
Note: to make any of the above enclosures escapable, use the `MakeEscapable()` or `MustMakeEscapable()` functions.
Quotes within quotes can be handled by using an enclosure that specifies how escaping works. For example, the following uses `\` (backslash) prefixed escaping:
```go
package main

import "github.com/go-andiamo/splitter"

func main() {
	commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesBackSlashEscaped)
	str := `"aaa","bbb","this, for sanity, \"should\" not be split"`
	parts, _ := commaSplitter.Split(str)
	println(len(parts))
}
```
Or with double escaping:

```go
package main

import "github.com/go-andiamo/splitter"

func main() {
	commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesDoubleEscaped)
	str := `"aaa","bbb","this, for sanity, """"should,,,,"" not be split"`
	parts, _ := commaSplitter.Split(str)
	println(len(parts))
}
```
Multiple enclosures can be specified together. In the following example, brackets are nested and quotes appear inside brackets:

```go
package main

import (
	"fmt"

	"github.com/go-andiamo/splitter"
)

func main() {
	encs := []*splitter.Enclosure{
		splitter.Parenthesis, splitter.SquareBrackets, splitter.CurlyBrackets,
		splitter.DoubleQuotesDoubleEscaped, splitter.SingleQuotesDoubleEscaped,
	}
	commaSplitter, _ := splitter.NewSplitter(',', encs...)
	str := `do(not,)split,'don''t,split,this',[,{,(a,"this has "" quotes")}]`
	parts, _ := commaSplitter.Split(str)
	println(len(parts))
	for i, pt := range parts {
		fmt.Printf("\t[%d]%s\n", i, pt)
	}
}
```
Options define behaviours that are carried out on each part found during splitting.
An option, by virtue of its return values from `.Apply()`, can do one of three things:
- return a modified string of what is to be added to the split parts
- return `false` to indicate that the split part is not to be added to the split result
- return an `error` to indicate that the split part is unacceptable (further splitting stops and the error is returned from the `Split` method)

Options can be added directly to the Splitter using the `.AddDefaultOptions()` method. These options are applied on every call to the splitter's `.Split()` method.

Options can also be specified when calling the splitter's `.Split()` method. These options apply only to that call (and are carried out after any options already specified on the splitter).
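The option mechanism can be pictured as a small pipeline applied to each part. In this hypothetical sketch, `option` is a simplified function type (the package's own options, which expose an `.Apply()` method, are not modeled exactly), splitting is done with plain `strings.Split` for brevity (so no enclosures), and default options run before per-call options, as described above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// option is a hypothetical, simplified stand-in for a splitter option: it
// receives one found part and returns the (possibly modified) part, whether
// to keep it, and an error that aborts splitting.
type option func(part string) (string, bool, error)

var trimSpaces option = func(p string) (string, bool, error) {
	return strings.TrimSpace(p), true, nil
}

var ignoreEmpties option = func(p string) (string, bool, error) {
	return p, p != "", nil
}

var noEmpties option = func(p string) (string, bool, error) {
	if p == "" {
		return "", false, errors.New("empty part")
	}
	return p, true, nil
}

// splitWithOptions splits on sep, then runs the default options followed
// by the per-call options on every part, in order.
func splitWithOptions(s, sep string, defaults []option, perCall ...option) ([]string, error) {
	opts := append(append([]option{}, defaults...), perCall...)
	var result []string
	for _, part := range strings.Split(s, sep) {
		keep := true
		for _, opt := range opts {
			var err error
			if part, keep, err = opt(part); err != nil {
				return nil, err // stop splitting and surface the error
			}
			if !keep {
				break // part is dropped; remaining options are skipped
			}
		}
		if keep {
			result = append(result, part)
		}
	}
	return result, nil
}

func main() {
	defaults := []option{trimSpaces}
	parts, _ := splitWithOptions(`/a/ /c/`, "/", defaults, ignoreEmpties)
	fmt.Printf("%q\n", parts) // ["a" "c"]

	_, err := splitWithOptions(`/a/ /c/`, "/", defaults, noEmpties)
	fmt.Println(err) // empty part
}
```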
For example, ignoring empty parts with `IgnoreEmpties`:

```go
package main

import (
	"fmt"

	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.IgnoreEmpties)
	parts, _ := s.Split(`/a//c/`)
	println(len(parts))
	fmt.Printf("%+v", parts)
}
```
Ignoring only an empty first and/or last part with `IgnoreEmptyFirst` and `IgnoreEmptyLast`:

```go
package main

import (
	"fmt"

	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.IgnoreEmptyFirst, splitter.IgnoreEmptyLast)
	parts, _ := s.Split(`/a//c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`a//c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`/a//c`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
}
```
Trimming surrounding spaces from each part with `TrimSpaces`:

```go
package main

import (
	"fmt"

	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.TrimSpaces)
	parts, _ := s.Split(`/a/b/c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(` / a /b / c/ `)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`/ a / b / c /`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
}
```
Combining `TrimSpaces` with `IgnoreEmpties`:

```go
package main

import (
	"fmt"

	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.TrimSpaces, splitter.IgnoreEmpties)
	parts, _ := s.Split(`/a/ /c/`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(` / a // c/ `)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
	parts, _ = s.Split(`/ a / / c /`)
	println(len(parts))
	fmt.Printf("%+v\n", parts)
}
```
Combining `TrimSpaces` with `NoEmpties`, which rejects empty parts by returning an error from `Split`:

```go
package main

import (
	"fmt"

	"github.com/go-andiamo/splitter"
)

func main() {
	s := splitter.MustCreateSplitter('/').
		AddDefaultOptions(splitter.TrimSpaces, splitter.NoEmpties)
	if parts, err := s.Split(`/a/ /c/`); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
	if parts, err := s.Split(` / a // c/ `); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
	if parts, err := s.Split(`/ a / / c /`); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
	if parts, err := s.Split(` a / b/c `); err != nil {
		println(err.Error())
	} else {
		println(len(parts))
		fmt.Printf("%+v\n", parts)
	}
}
```