-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEXT-217: Add SnakeCase Parsing #552
base: master
Are you sure you want to change the base?
Conversation
For me, camel case starts with a lower case like a Java method name or an Open API key. The test does not make it easy to understand what is converted to what or what supported round-trips are supported. Explicit tests are helpful as documentation. |
The test does a cross product of cases. I find that using My strategy here is take what was originally provided and use that. If character case changes are required the user can do so in the calling code. Adding a pair of methods to get the segments as a |
Please see my reply in TEXT-217 The tests are too obtuse from my POV and there is zero Javadoc to set user expectations (see link above). I'd like to see something we can point to users like: assertToCamelCase("MyJavaMethodName", "myJavaMethodName"); |
case Dot: | ||
return "Hello.World"; | ||
default: | ||
throw new RuntimeException("Unsupported StringCase: " + stringCase); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use JUnit's fail()
API.
*/ | ||
public enum StringCase { | ||
/** | ||
* Camel case identifies strings like 'CamelCase'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each string case should have a precise definition (a RegEx for example) or a link to a precise definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments here and in TEXT-217
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Claudenw
Great changes! TY.
I have 2 concerns and lots of small comments embedded.
- The main class let's you create illegal instances.
- It's still not clear to me this is generally useful for interfacing with programming languages. At the very least, I should be able to create legal Java type and method names. Even if eventually out of scope, there should be tests to show what's missing.
* @since 1.13.0 | ||
*/ | ||
public class CasedString { | ||
/** the string of the cased format. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A sentence should start with a capital letter.
* A method to join camel string fragments together. | ||
*/ | ||
private static final Function<String[], String> CAMEL_JOINER = a -> { | ||
StringBuilder sb = new StringBuilder(a[0].toLowerCase(Locale.ROOT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should account for size 0 and null input, unless the framework guarantees otherwise, in which case that fact can go in the functions Javadoc.
* @param segments the segments to create the CasedString from. | ||
* @return a CasedString | ||
*/ | ||
public String assemble(String[] segments) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does null input mean here?
*/ | ||
public String[] getSegments(String string) { | ||
if (string == null) { | ||
return new String[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor this array into a constant, it's contents can never change.
public void testToCamelCase() { | ||
assertThat(new CasedString(StringCase.CAMEL, "").toString()).isEqualTo(""); | ||
assertThat(new CasedString(StringCase.CAMEL, " ").toString()).isEqualTo(""); | ||
assertThat(new CasedString(StringCase.CAMEL, "Tocamelcase").toString()).isEqualTo("tocamelcase"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the opposite of what I expect: It can't be used to create a Java method name. This hints that there are different kinds of camel cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cam across a reference that says camel case is lower case first character and pascal case is upper case first character. to create a java method name use
CasedString cString = new CasedString(StringCase.....
String javaClassName = WordUtils.capitalise(cCstring.toCase(StringCase.CAMEL));
String javaVarName = cString.toCase(StringCase.CAMEL));
Alternatively we could add PASCAL case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cam across a reference that says camel case is lower case first character and pascal case is upper case first character. to create a java method name use
CasedString cString = new CasedString(StringCase..... String javaClassName = WordUtils.capitalise(cCstring.toCase(StringCase.CAMEL)); String javaVarName = cString.toCase(StringCase.CAMEL));
Alternatively we could add PASCAL case.
That's what I'm getting at, camel case is ambiguous depending on your context. In Java it means one thing for classes and another for method names. And then there is Pascal (it's not an acronym BTW).
Camel case is a way of writing phrases without spaces, where the first letter of each word is capitalized, except for the first letter of the entire compound word, which may be either upper or lower case.
From https://developer.mozilla.org/en-US/docs/Glossary/Camel_case
This tells me that StringCase.CAMEL does not define anything precise. StringCase.HIGHER_CAMEL, StringCase.LOWER_CAMEL lets you know what you get and it is one variation.
Another variation is what to do with words that are in all caps like acronyms. From the same URL:
Note that if the phrase contains acronyms (such as URI and HTML), camel casing practices vary. Some prefer to keep all of them capitalized, such as encodeURIComponent above. This may sometimes lead to ambiguity with multiple consecutive acronyms, such as XMLHTTPRequest. Others prefer to only capitalize the first letter, as XmlHttpRequest. The actual global variable, XMLHttpRequest, uses a mix of both.
If the input is "XMLHttpRequest", it would be odd to get back a camel string of "xMLHttpRequest" for LOWER_CAMEL, I'd expect "xmlHttpRequest".
All of this to say, that we need to define the behavior better IMO.
/** | ||
* A method to join camel string fragments together. | ||
*/ | ||
private static final Function<String[], String> CAMEL_JOINER = a -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me a JOINER should be kept in it's specifc case enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to do this. Every attempt I made resulted in the enum not being able to locate the method.
* @param stringCase The {@code StringCase} that the {@code string} argument is in. | ||
* @param string The string. | ||
*/ | ||
public CasedString(StringCase stringCase, String string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor should not let me construct illegal instances, for example " a b c " should not be an allowed camel case string (or a kebab, or a snake string). Otherwise, it's just GIGO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just GIGO. I expect that you know what the string looks like that you are starting with. So I guess it is not really "parsing" but converting. Perhaps it does not belong with TEXT-217 but should have its own ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not acceptable IMO, when I create a BigInteger or a Duration, I get an exception when I feed the methods garbage. It should be the same here, fail-fast is always better in this case, otherwise, I could get all sorts of junk down the line.
I expect that you know what the string looks like that you are starting with.
This will not only be for user input, where typos could still happen of course.
When you start with an Open API or XML Schema, you hope the input is well-formed and valid, but even if so, the contents may be junky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no way that I can think of that will not produce errors down the line for somebody. The commitment here is that the string will be parsed into segments based upon the splitter rule, and it does that. Other than the SnakeCase the conversions do not change the case of the characters. If you want to validate the result then the external code should do that before or after the conversion.
I ended up developing this code because I needed to convert commons-cli based long kebab keys into Java class and variable names to convert the CLI interface into Ant and Maven build tools for the Rat project.
Perhaps it does not belong here, but I went looking for a lightweight conversion utility and found none.
I think that the "parser" moniker does not fit and that conversion is probably the correct term. So this is really not a fix for TEXT-217 as it does not "parse".
@Claudenw Please set this PR to Draft while we are still discussing it :-) |
I haven't downloaded this PR yet to take a good look at it yet. Probably Monday. I'll pull it and run a few aggressive tests. As far as conventions for camelCase, PascalCase, etc. there is a pretty good list I developed for a raw edit of CaseUtils in PR 528 of conventions, not opinions on what they should or should not be. camelCase should always start in lower case, PascalCase should always start with upper case if converting to ASCII, or title case (as defined by Unicode) if retaining UTF encodings. My PR tries to convert everything to lower ASCII that can be converted. dot case and phrase case are not included in my PR but are easily obtained through delimiters. @theshoesiner took a completely different approach in PR 450 that retains Unicode characters instead of converting to lower ASCII Latin. I'd thought of merging our two PRs and seeing if we could make something of it with an option to retain Unicode or convert to lower ASCII. I haven't looked at yours yet, but I did write some good JUnit tests I can use on it to see how she handles. As I said, I'll give that a shot sometime Monday. |
This implementation adds a CasedString class that can convert between several different formats.
Initially supported formats:
CasedString does not convert the character case except where mandated by the case. so
SnakeCase
converted to kabob case isSnake-Case
andkabob-case
converted to snake case iskabobCase
Other utilities are available to modify the characte case.