Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel "libc" #2

Open
rellermeyer opened this issue Mar 8, 2014 · 53 comments
Open

Kernel "libc" #2

rellermeyer opened this issue Mar 8, 2014 · 53 comments
Assignees
Labels

Comments

@rellermeyer
Copy link
Owner

We will need a restricted set of libc-like functions in the kernel. E.g., some way of printing output for debugging, some initial way of dynamically allocating memory a la kmalloc, etc.

@SensitiveDongle
Copy link
Contributor

From some cursory research it seems like this will take a bit of mucking around in assembly to get the basic library routines up. Is that correct?

@rellermeyer
Copy link
Owner Author

It might possibly involve some assembly but I would think that the majority is in C.

@ascensionra
Copy link
Collaborator

We're not reinventing the wheel here, so we'll be able to use things like stdlib.h to construct this libc? I think I would be interested in working on this.

@ascensionra ascensionra self-assigned this Mar 11, 2014
@rellermeyer
Copy link
Owner Author

Well, we cannot use code from glibc or alike that because of license issues but we can take code from musl-libc (http://www.musl-libc.org) which is under MIT license.

However, for the kernel we should focus on only the library functions that we really need since the library is only needed inside the kernel and nowhere else.
For some that we will need, e.g., a way for the kernel to print debug output, we can discuss how we want them because it does not necessarily have to be printf. Just to give an example, some people (most prominently the designers of Solaris) believe that kernel code should not support variable length strings for reliability and security reasons, etc.

@SheldonSandbekkhaug
Copy link
Collaborator

I would like to request a string compare function like memcmp. All it does is compare two strings of bytes.

@stv8
Copy link
Collaborator

stv8 commented Mar 19, 2014

So we are allowed to use anything from the musl library? Once it is added to the repo that is.

@rellermeyer
Copy link
Owner Author

Yes, if we want certain functionality from it we can pull it in. Just make sure that all copyright headers are left in place to correctly attribute the authorship of the code.

@null-sysadmin
Copy link
Collaborator

Do you guys need any help with this? Jeremy and I are trying to write the ELF parser, but we can't do anything until we have libc finished.

@ascensionra
Copy link
Collaborator

Hi guys... First off, my apologies. I've been dealing with a lot of family stuff recently (brother in hospital, thousands of miles in road trips to deal with these issues), and I've fallen behind. Any help would be welcome, primarily if you could identify the key functionality you need from the library to get the parser to work. I will focus on getting those items completed. Again, my apologies for dropping the ball on this one...

@SheldonSandbekkhaug
Copy link
Collaborator

For the argument parser, all we need at the moment is a memory comparing function like memcmp so that we can compare two strings. I'll let you know if the parser needs anything else.

@ascensionra
Copy link
Collaborator

Would memcmp and strcmp be helpful? I'll get those added in ASAP.

@ascensionra
Copy link
Collaborator

memcmp and strmp added. Config.mk and makefile updated, Make run works. Added COPYRIGHT, since I used and made minor changes to the musl-libc.

@null-sysadmin
Copy link
Collaborator

Hey, no worries. For the ELF parser, we mainly need read and print functionality.

@rellermeyer
Copy link
Owner Author

Read as the IO call? I don't think we will have this any time soon, it requires an IO subsystem.
I would think that in the first phase you can consider the program binary to be mapped into memory and you will get a pointer to the start.

@SheldonSandbekkhaug
Copy link
Collaborator

Thanks for adding those two functions. Do you think you could push your changes so I can use them?

@ascensionra
Copy link
Collaborator

Ok, I merged the header and source for libc to master. Please make sure to read the comments in the files. To avoid conflict with existing functions, I prepended the functions with os_, so to call strcmp, you'd use os_strcmp

@SheldonSandbekkhaug
Copy link
Collaborator

I've been thinking that a function to split a string (like strtok) may be useful. Is that something that should be in our libc?

@rellermeyer
Copy link
Owner Author

If you have a use for it, you should add it or ask the klibc maintainers to add it.

@SheldonSandbekkhaug
Copy link
Collaborator

Would it be alright with you guys (the people working on libc) if I went ahead and added strtok or something like it from MUSL?

@ascensionra
Copy link
Collaborator

I'll add strtok. Anything else you would like to request while I'm at it?

@SheldonSandbekkhaug
Copy link
Collaborator

I think memcpy (or a function that copies a region of memory from one location to another) could be useful.

@SheldonSandbekkhaug
Copy link
Collaborator

^That was an accident, sorry.

@ascensionra
Copy link
Collaborator

I'll try to get this done today (along with strtok if it's not there already).

@SheldonSandbekkhaug
Copy link
Collaborator

Are you guys still working on klibc? I wouldn't mind adding strtok from MUSL myself if that's alright with you guys.

@SheldonSandbekkhaug
Copy link
Collaborator

I've added a couple of functions from MUSL to klibc. However, I think strtok is not working properly, so I'm going try to figure out what the problem is.

@ascensionra
Copy link
Collaborator

Hey Sheldon, thanks for picking up the slack on this. I'm still dealing with a lot of family stuff, so it is much appreciated.

Is there anything that is critical right now that I can resume working on?

@SheldonSandbekkhaug
Copy link
Collaborator

ascensionra, no problem! I just want to get things done, you know? :)

SUMDude21, I had copied the strtok from MUSL. I put it in klibc.c under os_strtok(). It turns out the problem with strtok was actually in strspn and strcspn (for some reason, those work in desktop Linux but not in CourseOS on the virtual machine). I just replaced the MUSL versions of strspn and strcspn with functions I wrote myself, so there may be hidden bugs that I haven't found yet. See commit d3615e9

@SUMDude21
Copy link
Contributor

@SheldonSandbekkhaug All right I'll write some tests for them, anything else I can help with?

@SheldonSandbekkhaug
Copy link
Collaborator

I can't think of anything at the moment, though things will come up, except if you ever need to use a library function in the kernel, just add it from MUSL if you can!

@SheldonSandbekkhaug
Copy link
Collaborator

I'm looking for a function to convert a hex string (like "17fa0b") to an integer. MUSL has a function called strtol (http://git.musl-libc.org/cgit/musl/tree/src/stdlib/strtol.c) but it requires some functions I cannot find, such as shlim. Does anyone know where these functions are, or is there another library with an acceptable license we could copy code from?

@SensitiveDongle
Copy link
Contributor

come on, sheldon - you're better than that! it would take like 5 minutes to code up.

@SheldonSandbekkhaug
Copy link
Collaborator

Alright, I wrote one in commit f9009ec.

@jdonszelmann
Copy link
Collaborator

The kernel libc is a complete mess. Everything in one header and the functions are probably buggy. I already replaced string and memory functions but the entire code base needs to have an update.

@rellermeyer
Copy link
Owner Author

I am not sure if everything in one header is a problem because this is not public API, it is only consumed in the kernel. I think the consensus at that point was that unless the kernel becomes large, one header was a good enough starting point.

I do agree with the buggy implementations, though. There are also some questionable decisions that I would advise against. For instance, I would go the more safe and conservative route and do not allow variable length strings at any point in the kernel, including printk

@jdonszelmann
Copy link
Collaborator

jdonszelmann commented Mar 10, 2020

at any point in the kernel

So the filesystem can't have variable length filenames?

I think a good klibc is very useful. And all the functions are already there. I already reimplemented most questionable functions so it all feels very similar to a normal libc like musl.

@rellermeyer
Copy link
Owner Author

I would not allow filenames to have infinite length either, no. So rather than scanning infinitely for a terminal character and opening up all avenues for malicious or unintentional buffer overflows, I would have a maximum length and possibly also store the actual length of the string in the data structure.

@jdonszelmann
Copy link
Collaborator

Also we built a very safe wrapper around strings called a qstring (reimplementation of linux' qstring) which allows for O(1) strlength() operations and stores a hash of the string to make comparing strings faster

@jdonszelmann
Copy link
Collaborator

We store filenames as qstrings which can't buffer overflow

@jdonszelmann
Copy link
Collaborator

jdonszelmann commented Mar 10, 2020

@rellermeyer
Copy link
Owner Author

Sounds good, although arguably qstring is a bandaid on a previously bad decision. Many other kernels never allowed variable strings in the first place.

With the hash, I would instantiate the value lazily to avoid the overhead.

Also make sure that you do not copy code from other sources that are under incompatible licenses. Remembering the concept and implementing it clean-room is okay.

@jdonszelmann
Copy link
Collaborator

That's exactly what we did. We didn't copy a single line of code.

Now that I remember, even though filenames may be fixed lengths, paths are definitely not. they are the sum of file/dirnames. So to store them in a similar way linux does in their dcaches we would really need variable length strings

@jdonszelmann
Copy link
Collaborator

Also, we do instantiate the hash lazily. Only on the first compare when we loop through the string anyway do we set the hash. If they compare equal we even set both hashes

@rellermeyer
Copy link
Owner Author

Paths are composites, though. Every file or directory name can be fixed length and then you have a finite concatenation of them.

@jdonszelmann
Copy link
Collaborator

jdonszelmann commented Mar 10, 2020

That does mean a lot of overhead. If the max filename is 128 and the max dir depth is 32. Then we're storing 4kb per path even though we might only have 4 letter filenames and a 3 deep directory structure. That could be stored in 12 bytes. With a null terminator and 3 slashes that gives 16 bytes total. That's a factor of 256 better

@rellermeyer
Copy link
Owner Author

If you make a path an array of strings, it consumes less memory than making it always the full 32 times 128.
An array of qstring pointers would only introduce the overhead for the additional length and hash fields.

It would need to be empirically proven but I could imagine that you save quite some time by avoiding the repeated parsing of the path, though.

So in the end it is a tradeoff and I leave it up to you folks.

@jdonszelmann
Copy link
Collaborator

We are currently live discussing it. We already have parsed path structures to reduce that overhead. Internally we think it's fine to use null terminated strings as long as any syscall involving strings has a fixed length set. With those precautions we don't think the current way of doing it is a problem

@jdonszelmann
Copy link
Collaborator

jdonszelmann commented Mar 10, 2020

Plus, we think that the since most people are used to user land c, people might understand the current system a lot better. This kernel is not designed to be super secure. It is after all a course_os, not linux.

@rellermeyer
Copy link
Owner Author

rellermeyer commented Mar 10, 2020

A kernel not designed to be secure is not a kernel :-)
(Also, Linux is not a good reference point for security in OS but this is another discussion)

But I understand your rationale. If you decide to go with this solution then this is fine for the moment.

@ValentijnvdBeek ValentijnvdBeek self-assigned this Feb 20, 2021
@ValentijnvdBeek
Copy link
Collaborator

ValentijnvdBeek commented Feb 20, 2021

A little living comment with what is missing from libc, feel free to add to comment anywhere down below with suggestions or tips.

Short overview: the kernel libc was never really finished and features were added by various students as time went on. In order to help development of more advanced features it is a good idea to figure out what has happened until now, what should still be done, and what should be rewritten. The target standard is C99, since that one is somewhat similar to the one already implemented. It is a best-faith-effort, so don't expect everything to be standard conforming or for it to include everything.

As of right now the libc kernel contains the following aspects:

  • Standard headers
  • Default memory allocators
  • A few datastructures, some from other (free software) sources.

The list of implemented headers and data structures:

  • math.h
  • stdarg.h
  • stdbool.h
  • stdint.h
  • stdio.h
  • stdlib.h
  • string.h
  • HashMap
  • Priority Queue
  • Qstr
  • Array List
  • Single Linked List

List of headers and data structures to implement or research:

  • assert.h
  • inttypes.h
  • signal.h
  • complex.h
  • iso646.h
  • ctype.h
  • limits.h
  • tgmath.h
  • errno.h
  • locale.h
  • stddef.h
  • time.h
  • fenv.h
  • wchar.h
  • float .h
  • setjmp.h
  • wctype.h
  • Doubly Linked List
  • Some functional programming (at least for VPLL)

@jdonszelmann
Copy link
Collaborator

jdonszelmann commented Feb 20, 2021

assert.h is sort of already implemented with the testing framework. But it's definitely non-standard. Some headers I think are a bit unnecessary, for example setjmp.h, iso646.h and signal.h (the last one only makes sense in userland). Locale.h can be usful maybe, but usually I think userland processes are responsible for stuff like that, and it's something that the kernel doesn't need to worry about much. You might also want to think about adding some things that are especially useful for the kernel. An example of that is the constants.h file we have with some definition of memory sizes. Apart from that: great work setting up a to do list!

@ValentijnvdBeek
Copy link
Collaborator

assert.h is sort of already implemented with the testing framework. But it's definitely non-standard. Some headers I think are a bit unnecessary, for example setjmp.h, iso646.h and signal.h (the last one only makes sense in userland). Locale.h can be usful maybe, but usually I think userland processes are responsible for stuff like that, and it's something that the kernel doesn't need to worry about much. You might also want to think about adding some things that are especially useful for the kernel. An example of that is the constants.h file we have with some definition of memory sizes. Apart from that: great work setting up a to do list!

It deliberately says to implement or to research! The point is to make this a comprehensive look at the libc to prevent anyone else having to worry about it, so I added all the headers from C99 so I could cross them out here and give my reason why here. For example, the chance that I will implement wide characters is really small but if I look at them first and document my reasons for skipping them then people won't have to worry about it in the future

@jdonszelmann
Copy link
Collaborator

alright, you're right it does say that 👍 Then this is just my initial reaction on them. But doing your research seems like a good idea. I suspect you will have to embrace the fact that libc of a kernel just has different needs than ansi C 99. But I think you already know that :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants