Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEXT-217: Add SnakeCase Parsing #552

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Claudenw
Copy link

This implementation adds a CasedString class that can convert between several different formats.

Initially supported formats:

  • Camel case identifies strings like 'CamelCase'.
  • Snake case identifies strings like 'Snake_Case'
  • Kebab case identifies strings like 'kebab-case'
  • Phrase case identifies phrases of words like 'phrase case'
  • Dot case identifies strings of words like 'dot.case'

CasedString does not convert the character case except where mandated by the case. so SnakeCase converted to kabob case is Snake-Case and kabob-case converted to snake case is kabobCase

Other utilities are available to modify the characte case.

@Claudenw Claudenw self-assigned this May 25, 2024
@Claudenw Claudenw requested a review from aherbert May 25, 2024 10:06
@garydgregory
Copy link
Member

garydgregory commented May 25, 2024

For me, camel case starts with a lower case like a Java method name or an Open API key.

The test does not make it easy to understand what is converted to what or what supported round-trips are supported.

Explicit tests are helpful as documentation.

@Claudenw
Copy link
Author

The test does a cross product of cases.

I find that using WordUtils.capitalise() and WordUitls.uncapitalise() can address the capitalisation SnakeCase issues. I found I need to generate both Java Class names and argument names.

My strategy here is take what was originally provided and use that. If character case changes are required the user can do so in the calling code.

Adding a pair of methods to get the segments as a String[] and to take a String[] and create the phrase might be a useful addition.

@garydgregory
Copy link
Member

Please see my reply in TEXT-217

The tests are too obtuse from my POV and there is zero Javadoc to set user expectations (see link above). I'd like to see something we can point to users like:

assertToCamelCase("MyJavaMethodName", "myJavaMethodName");

case Dot:
return "Hello.World";
default:
throw new RuntimeException("Unsupported StringCase: " + stringCase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use JUnit's fail() API.

*/
public enum StringCase {
/**
* Camel case identifies strings like 'CamelCase'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each string case should have a precise definition (a RegEx for example) or a link to a precise definition.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments here and in TEXT-217

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Claudenw
Great changes! TY.
I have 2 concerns and lots of small comments embedded.

  1. The main class let's you create illegal instances.
  2. It's still not clear to me this is generally useful for interfacing with programming languages. At the very least, I should be able to create legal Java type and method names. Even if eventually out of scope, there should be tests to show what's missing.

* @since 1.13.0
*/
public class CasedString {
/** the string of the cased format. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A sentence should start with a capital letter.

* A method to join camel string fragments together.
*/
private static final Function<String[], String> CAMEL_JOINER = a -> {
StringBuilder sb = new StringBuilder(a[0].toLowerCase(Locale.ROOT));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should account for size 0 and null input, unless the framework guarantees otherwise, in which case that fact can go in the functions Javadoc.

* @param segments the segments to create the CasedString from.
* @return a CasedString
*/
public String assemble(String[] segments) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does null input mean here?

*/
public String[] getSegments(String string) {
if (string == null) {
return new String[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor this array into a constant, it's contents can never change.

public void testToCamelCase() {
assertThat(new CasedString(StringCase.CAMEL, "").toString()).isEqualTo("");
assertThat(new CasedString(StringCase.CAMEL, " ").toString()).isEqualTo("");
assertThat(new CasedString(StringCase.CAMEL, "Tocamelcase").toString()).isEqualTo("tocamelcase");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the opposite of what I expect: It can't be used to create a Java method name. This hints that there are different kinds of camel cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cam across a reference that says camel case is lower case first character and pascal case is upper case first character. to create a java method name use

CasedString cString = new CasedString(StringCase.....
String javaClassName = WordUtils.capitalise(cCstring.toCase(StringCase.CAMEL));
String javaVarName = cString.toCase(StringCase.CAMEL));

Alternatively we could add PASCAL case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cam across a reference that says camel case is lower case first character and pascal case is upper case first character. to create a java method name use

CasedString cString = new CasedString(StringCase.....
String javaClassName = WordUtils.capitalise(cCstring.toCase(StringCase.CAMEL));
String javaVarName = cString.toCase(StringCase.CAMEL));

Alternatively we could add PASCAL case.

That's what I'm getting at, camel case is ambiguous depending on your context. In Java it means one thing for classes and another for method names. And then there is Pascal (it's not an acronym BTW).

Camel case is a way of writing phrases without spaces, where the first letter of each word is capitalized, except for the first letter of the entire compound word, which may be either upper or lower case.

From https://developer.mozilla.org/en-US/docs/Glossary/Camel_case

This tells me that StringCase.CAMEL does not define anything precise. StringCase.HIGHER_CAMEL, StringCase.LOWER_CAMEL lets you know what you get and it is one variation.

Another variation is what to do with words that are in all caps like acronyms. From the same URL:

Note that if the phrase contains acronyms (such as URI and HTML), camel casing practices vary. Some prefer to keep all of them capitalized, such as encodeURIComponent above. This may sometimes lead to ambiguity with multiple consecutive acronyms, such as XMLHTTPRequest. Others prefer to only capitalize the first letter, as XmlHttpRequest. The actual global variable, XMLHttpRequest, uses a mix of both.

If the input is "XMLHttpRequest", it would be odd to get back a camel string of "xMLHttpRequest" for LOWER_CAMEL, I'd expect "xmlHttpRequest".

All of this to say, that we need to define the behavior better IMO.

/**
* A method to join camel string fragments together.
*/
private static final Function<String[], String> CAMEL_JOINER = a -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me a JOINER should be kept in it's specifc case enum.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to do this. Every attempt I made resulted in the enum not being able to locate the method.

* @param stringCase The {@code StringCase} that the {@code string} argument is in.
* @param string The string.
*/
public CasedString(StringCase stringCase, String string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor should not let me construct illegal instances, for example " a b c " should not be an allowed camel case string (or a kebab, or a snake string). Otherwise, it's just GIGO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just GIGO. I expect that you know what the string looks like that you are starting with. So I guess it is not really "parsing" but converting. Perhaps it does not belong with TEXT-217 but should have its own ticket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not acceptable IMO, when I create a BigInteger or a Duration, I get an exception when I feed the methods garbage. It should be the same here, fail-fast is always better in this case, otherwise, I could get all sorts of junk down the line.

I expect that you know what the string looks like that you are starting with.

This will not only be for user input, where typos could still happen of course.

When you start with an Open API or XML Schema, you hope the input is well-formed and valid, but even if so, the contents may be junky.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way that I can think of that will not produce errors down the line for somebody. The commitment here is that the string will be parsed into segments based upon the splitter rule, and it does that. Other than the SnakeCase the conversions do not change the case of the characters. If you want to validate the result then the external code should do that before or after the conversion.

I ended up developing this code because I needed to convert commons-cli based long kebab keys into Java class and variable names to convert the CLI interface into Ant and Maven build tools for the Rat project.

Perhaps it does not belong here, but I went looking for a lightweight conversion utility and found none.

I think that the "parser" moniker does not fit and that conversion is probably the correct term. So this is really not a fix for TEXT-217 as it does not "parse".

@garydgregory
Copy link
Member

@Claudenw Please set this PR to Draft while we are still discussing it :-)

@speters33w
Copy link

I haven't downloaded this PR yet to take a good look at it yet. Probably Monday. I'll pull it and run a few aggressive tests.

As far as conventions for camelCase, PascalCase, etc. there is a pretty good list I developed for a raw edit of CaseUtils in PR 528 of conventions, not opinions on what they should or should not be. camelCase should always start in lower case, PascalCase should always start with upper case if converting to ASCII, or title case (as defined by Unicode) if retaining UTF encodings.

My PR tries to convert everything to lower ASCII that can be converted.

dot case and phrase case are not included in my PR but are easily obtained through delimiters.

@theshoesiner took a completely different approach in PR 450 that retains Unicode characters instead of converting to lower ASCII Latin.

I'd thought of merging our two PRs and seeing if we could make something of it with an option to retain Unicode or convert to lower ASCII.

I haven't looked at yours yet, but I did write some good JUnit tests I can use on it to see how she handles.

As I said, I'll give that a shot sometime Monday.

@Claudenw Claudenw marked this pull request as draft May 26, 2024 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants