Name mangling #189
Comments
As of #190, basic infra for this is now in place (see |
I thought that C identifiers are pretty restricted, but that knowledge is outdated: C does indeed support Unicode now! From the C23 working draft (N3096):
Package A Haskell identifier "consists of a letter followed by zero or more letters, digits, underscores, and single quotes" (Haskell 2010 Language Report). Some details:
Perhaps we can remove any characters that are not valid in Haskell names, as well as change the case of the first character according to the type of name as appropriate. There are some edge cases:
Name conversion is pure. Perhaps it is acceptable to let edge case compiler errors happen, notifying users that they need to work around such name issues? That might be preferred over workarounds such as encoding invalid characters, etc. It is possible to convert to Haskell-style |
To avoid clashes, the mangling should be injective.

struct foo { ... };
typedef struct foo Foo;

We cannot mangle both |
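The clash described above can be checked mechanically. The following is a minimal sketch, not the project's implementation: `mangleTypeName` and `collisions` are hypothetical names, and the mangler simply upper-cases the first character, which is an assumption about what a naive type-name mangling would do.

```haskell
import Data.Char (toUpper)
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical naive mangler: upper-case the first character, as a
-- Haskell type constructor name requires.
mangleTypeName :: String -> String
mangleTypeName []       = []
mangleTypeName (c : cs) = toUpper c : cs

-- Group the C names by their mangled result and keep only the groups
-- with at least two members. Each such group is a witness that the
-- mangler is not injective on the given input.
collisions :: (String -> String) -> [String] -> [(String, [String])]
collisions mangle names =
  [ (m, map snd grp)
  | grp@((m, _) : _ : _) <-
      groupBy ((==) `on` fst) (sortOn fst [(mangle n, n) | n <- names])
  ]
```

With the names from the example, `collisions mangleTypeName ["foo", "Foo"]` reports both C names under the single Haskell name `Foo`, so this mangler cannot be used for both as-is.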
Thank you very much for the example! C identifiers that differ only in the case of the first letter do indeed provide a challenge. It is also common for a

typedef struct bar {
  char c;
  struct bar *next;
} bar;

Currently, conversion is done via a

class ToHsName (ns :: Namespace) where
  toHsName :: CName -> HsName ns

Perhaps we need more context. I am trying to get a feeling for what the desired result might be. For a C identifiers may differ only in the case of the first letter in general, though, so that special case is not sufficient.

enum Bar {
  HOGE,
  PIYO
};

What should we do in the general case? With more context, perhaps we can use a prefix or suffix. For example, |
C With more context, perhaps we could prefix member names with the name of the TODO:
|
The Haskell 2010 Language Report specs are likely insufficient. It is probably better to reference the GHC source. |
An example of a character that is valid in C but causes problems in Haskell:
This character can be used in C identifiers as long as it is not the first character. When used in Haskell, however, it is interpreted as a symbol character that creates an infix operator. This is an edge case, but it is feasible that somebody could write C using Japanese identifiers. Other problematic characters that I have found seem at least as unlikely. If we want to handle all cases, however, we need to filter or escape such characters. I am attaching a simple utility script that shows various information about characters passed as command-line arguments. The extension is renamed to |
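A check along these lines can be sketched with `Data.Char` from `base`. This is only an approximation based on Unicode general categories, not the attached script and not GHC's actual lexer rules; `suspectChars` is a hypothetical name. It lists the characters of a name that fall into a symbol or punctuation category, exempting underscore and single quote because Haskell 2010 allows them in identifiers.

```haskell
import Data.Char (GeneralCategory, generalCategory, isPunctuation, isSymbol)

-- Characters of a name that Haskell is likely to treat as symbol or
-- punctuation rather than as part of an identifier, paired with their
-- Unicode general category. This is an approximation: it does not
-- reproduce GHC's lexer.
suspectChars :: String -> [(Char, GeneralCategory)]
suspectChars name =
  [ (c, generalCategory c)
  | c <- name
  , c `notElem` "_'"               -- allowed in Haskell identifiers
  , isSymbol c || isPunctuation c  -- Unicode S* and P* categories
  ]
```

For instance, U+00B7 MIDDLE DOT is a permitted identifier continuation character in Unicode's identifier syntax but has a punctuation general category, so `suspectChars` flags it. That character is chosen only for illustration and is not necessarily the one discussed above.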
I think it's probably important to have a definition that deals with the most urgent problems first (capitalization issues, name clashes, etc.), before we deal with the various Unicode intricacies. I think keeping this context free is not only useful for code clarity, but also for predictability of the names for users. Perhaps we can just use a system of standard prefixes? Actually, on the topic of prefixes, perhaps we should have some context: it might be useful to prefix the names of the fields of a struct with the name of that struct; this avoids name clashes between fields of structs, and is anyway a common idiom in Haskell. (Perhaps at a later stage we could add an option for enabling the use of |
Perhaps we should introduce a type family mapping |
(Still a local context though; I think for predictability it is important that it doesn't depend on which other types happen to be in scope also.) |
Thank you for the feedback! I agree that we should prioritize common issues over edge cases such as Unicode.

Predictability of names for users is indeed a significant concern. Generating Haddock documentation (#26) would likely be really appreciated. This is especially true in the Template Haskell case, IMHO, so that users do not have to read dumped splices.

By "system of standard prefixes," perhaps you are thinking of prepending strings such as

One benefit of using standard prefixes is that the case of the first letter would be part of the prefix. If we do not try to use Haskell naming conventions, we could use prefixes to work around issues caused by identifiers that only differ in case of the first letter.

Contrived Example

struct bar {
  int a;
  int b;
};
struct Bar {
  int b;
  int c;
};

data Struct_bar = Struct_bar {
    struct_bar_a :: Int
  , struct_bar_b :: Int
  }
data Struct_Bar = Struct_Bar {
    struct_Bar_b :: Int
  , struct_Bar_c :: Int
  }

Prefixing the names of fields indeed requires (local) context. We need to know that the name is a

class ToHsName (ns :: Namespace) where
  type ToHsNameContext ns :: Type
  toHsName :: (ToHsNameContext ns) -> CName -> HsName ns

I understand the desire and benefits of only using a local context, rather than using state to keep track of which identifiers have already been used within each namespace. That is how I was thinking about it at first, but I then recalled that the low level API should work without user customization. In cases where there is a collision, what recourse does the user have? Perhaps they have to develop a C wrapper header with some names changed, duplicating the whole API since we operate on a file level. I wonder if this is acceptable. If we go this route, we should probably write documentation to help users who run into collisions.

FWIW, I think that using a local context sounds good, assuming that it is acceptable for users to have to work around issues using C wrappers.

Here is an overview of the types of collisions (thought of so far):
|
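To make the local-context idea concrete, here is a minimal sketch of how the class above could be instantiated so that it reproduces the `Struct_bar` / `struct_bar_a` names from the contrived example. Everything except the shape of `ToHsName` is an assumption: the `Namespace` constructors, the `EnclosingStruct` context type, and the representations of `CName` and `HsName` are hypothetical and chosen only for illustration.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Hypothetical namespaces; the real project may distinguish more.
data Namespace = NsTypeConstr | NsVar

newtype CName = CName String

newtype HsName (ns :: Namespace) = HsName String
  deriving (Eq, Show)

class ToHsName (ns :: Namespace) where
  type ToHsNameContext ns :: Type
  toHsName :: ToHsNameContext ns -> CName -> HsName ns

-- Type names need no context: a fixed prefix carries the upper-case
-- first letter, so the case of the C name is preserved.
instance ToHsName 'NsTypeConstr where
  type ToHsNameContext 'NsTypeConstr = ()
  toHsName () (CName n) = HsName ("Struct_" ++ n)

-- Hypothetical local context for field names: the enclosing struct.
newtype EnclosingStruct = EnclosingStruct CName

-- Field names are prefixed with the struct name, so fields of
-- different structs cannot clash with each other.
instance ToHsName 'NsVar where
  type ToHsNameContext 'NsVar = EnclosingStruct
  toHsName (EnclosingStruct (CName s)) (CName f) =
    HsName ("struct_" ++ s ++ "_" ++ f)
```

Because the context is only the enclosing struct, the result for a given field does not depend on which other declarations happen to be in scope, which keeps the generated names predictable.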
Yes indeed, though instead of

class ToHsName (ns :: Namespace) where
  type ToHsNameContext ns :: Type
  toHsName :: (ToHsNameContext ns) -> CName -> HsName ns

Yes indeed (no need for the brackets though 😄). While we're at it, we might as well also add a |
I just discovered #160, which is the same topic. This issue is the duplicate, but I am closing #160 in favor of this one because this one has discussion.
I see that the case of the first letter is already causing problems. I will fix this ASAP, before adding the context and options or considering edge cases. |
The following example should address both of the basic case issues that @phadej indicated.

// lowercase first character
struct foo {
  // uppercase first character
  int A;
};

I prefer to see a test fail before I address the issue, so that I have confidence that I am going in the right direction when the test starts to pass. I tried using
EDIT: It turns out that this was already in the tests, but the tests were not failing because they were testing against invalid Haskell. I updated the tests. I implemented very simple name mangling that just changes the case of the first letter of the name as required, according to the target namespace. Some GHCi tests
I did this minimal change first in case this is blocking anybody, and I will add the context and options next. |
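For readers following along, the minimal change described above might look roughly like the sketch below. It is not the code from the commit: `FirstCase` and `adjustFirst` are hypothetical names, and the sketch only changes the case of the first character according to what the target namespace requires.

```haskell
import Data.Char (toLower, toUpper)

-- Hypothetical stand-in for the target namespace: what the first
-- character of a valid Haskell name in that namespace must look like.
data FirstCase
  = UpperFirst  -- e.g. type constructors and data constructors
  | LowerFirst  -- e.g. variables and record fields

-- Change only the case of the first character. Names that start with a
-- character without case (such as '_') are returned unchanged, so they
-- are NOT handled by this sketch.
adjustFirst :: FirstCase -> String -> String
adjustFirst _ [] = []
adjustFirst fc (c : cs) = case fc of
  UpperFirst -> toUpper c : cs
  LowerFirst -> toLower c : cs
```

Applied to the example, `adjustFirst UpperFirst "foo"` gives `"Foo"` for the struct and `adjustFirst LowerFirst "A"` gives `"a"` for the field. As noted earlier in the thread, such a mapping is not injective, which is what the context and options are meant to address.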
Adjust case of first letter for namespace (#189)
We need to be careful translating C names to Haskell names.