Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stdlib: user-friendly filesystem API #211

Open
ospencer opened this issue Jul 12, 2020 · 22 comments · May be fixed by #1966
Open

Stdlib: user-friendly filesystem API #211

ospencer opened this issue Jul 12, 2020 · 22 comments · May be fixed by #1966
Assignees
Labels

Comments

@ospencer
Copy link
Member

No description provided.

@c0d3d
Copy link
Contributor

c0d3d commented Jul 12, 2020

I'll take it.

@spotandjake
Copy link
Member

spotandjake commented Nov 30, 2021

Out of curiosity what type of methods would you guys like for this lib to have I imagine

read: (path) -> Result<string>
write: (path, data) -> Result<String>

and maybe a few other directory ones are a must but anything else for a first iteration.

@alex-snezhko
Copy link
Member

In the brainstorming process for what this API might look like, I encountered a few contention points where design decisions should be made:

  1. Should this API be completely independent of sys/file from the user's perspective? For example if a user wanted to get the stats of a file should they get back a Filestats enum from sys/file or should a separate type be created in the new module? (I am personally leaning towards the latter as it allows for better abstraction and independence, but it has a few disadvantages also).
  2. How should the API handle working with file streams/channels (e.g. opening a file and doing a sequence of read/write operations on it)? Possible options:
    1. Have the open function return a file stream; this would be simpler to work with, but the user would then have to close the stream manually lest they be scolded by their operating system.
    2. Have the user pass a callback function to open; this approach is safer, but may be a bit more annoying to work with in certain situations.
    3. Implement some sort of automatic resource management mechanism (like with in Python, using in C#, etc); this would offer the best of both worlds but I am not sure that such a feature fits in with the rest of the language.

@spotandjake
Copy link
Member

I would argue for using a new enum that could be more user friendly possibly, I also personally more prefer the open approach for streams. though automatic resource management would be cool it feels like it would push having a user friendly api which is much needed far down the line. just my opinions though.

@phated
Copy link
Member

phated commented Oct 24, 2022

  1. Independent
  2. Why is there a streams/channels API at all when wasm can't do async things yet? I like the idea of a "file handle" type API like nodejs did with their fs.promises.open, etc

@alex-snezhko
Copy link
Member

Why would async be required for this? Maybe I'm not using the right term but I essentially mean a file handle

@phated
Copy link
Member

phated commented Oct 25, 2022

Fair enough. I only think about streams in the context of async data processing (from node) but I suppose OCaml has blocking streams & channels 😛 I'd definitely like to stay away from that overloaded terminology if possible.

@av8ta
Copy link
Contributor

av8ta commented Jan 11, 2023

I've made a pull request for this. Happy to have discussion about the api 😝

@av8ta
Copy link
Contributor

av8ta commented Jan 15, 2023

I'm working on an implementation in av8ta/fs based on feedback that looks something like in this comment: #1596 (comment)

export enum Error<e> {
  EISDIR(e),
  ENOENT(e),
  .... 75 more errors (just these two currently implemented to gather feedback)
  UNKNOWN(e),
}

export record PathOpenOptions {
  lookupFlag: List<File.LookupFlag>,
  openFlags: List<File.OpenFlag>,
  rights: List<File.Rights>,
  inheriting: List<File.Rights>,
  rwFlags: List<File.FdFlag>,
}

I've implemented open and readfile to start with using FileDescriptors. That will change when I implement Fs.FileHandle. The user will interact directly with handles instead of FileDescriptors. Error(e) are returned instead of exceptions.

// -> Result<File.FileDescriptor, Error<String>>
export let open = (path: Path.Path, options: PathOpenOptions) => {}

Recursively reads file in 1024 byte chunks. Opens & closes file descriptor.

// -> Result<String, Error<String>>
export let readFile = (path: Path.Path) => {}

Will begin working on Fs.FileHandle while implementing the below pseudo-code. Any objections, comments feedback are greatly appreciated :)

export let readFileChunk = (handle: Fs.FileHandle, maxBytesToRead: Number) => {
  // store handle if not seen previously
  // read one chunk
  // successive calls to readFileChunk advance

  (chunk, numBytesRead)
}

export let readFileChunks = (handle: Fs.FileHandle, maxBytesToRead: Number, callback) => {
  // store handle if not seen previously
  // read each chunk & call back on each 
  // until end of file

  callback((chunk, numBytesRead))
}

export let readFileSlice = (handle: Fs.FileHandle, range:Range.Range) => {
  // store handle if not seen previously
  // read chunk in range

  (slice, numBytesRead)
}

export let readFileSlices = (handle: Fs.FileHandle, range:Range.Range, maxBytesToRead: Number, callback) => {
  // store handle if not seen previously
  // read chunk in range up to maxBytesToRead
  // callback on each chunk

  callback((chunk, numBytesRead))
}

@phated
Copy link
Member

phated commented Jan 16, 2023

@av8ta Why are you having readFileChunks and readFileSlices taking a callback?

@av8ta
Copy link
Contributor

av8ta commented Jan 17, 2023

@phated
My thinking was devs might want to read files in chunks and work on them as they're streaming in. Perhaps they've got an infinite length file like a logfile that's constantly being appended to, or a file that's too large to fit in memory.

Rather than load the whole file into memory and then work on it, devs can get work done on the data immediately. Think; logfile streaming in on stdin, processing the chunks, and then writing the transformation to stdout. Loads of other usecases of course because streams are a great way to work on data.

Of course if they would rather do it in a loop then there are the singular versions readFileChunk and readFileSlice to use.

I also think wasm on edge compute is one of the many great usecases for webassembly and grain is already great for cli tools. Those environments can be very resource constrained.

I tried this and got a greatly improved output file :)

let writeStdout = message => File.fdWrite(File.stdout, message)
let improveCsv = string => String.replaceAll(",", "🌾\n\n", string)
let callback = string => writeStdout(improveCsv(string))

readFileChunks(Fs.stdin, 1024, ((string, num)) => callback(string))

@ospencer
Copy link
Member Author

ospencer commented Feb 4, 2023

For this first iteration, let's keep the API really small and then we can expand it from there. We'll hold off on chunks/streaming for now, since that'll play a lot into Grain's async story, but we're a ways off from that.

So for the first version, I propose this small API:

enum FileError {
  ...
}
readFile: Path -> Result<Bytes, FileError>
writeFile: (Path, Bytes) -> Result<Number, FileError> // number of bytes written
appendFile: (Path, Bytes) -> Result<Number, FileError>
module Utf8 {
  readFile: Path -> Result<String, FileError>
  writeFile: (Path, String) -> Result<Number, FileError>
  appendFile: (Path, String) -> Result<Number, FileError>
}

There are some natural extensions to this, like reading/writing a range, but those can come later. This gives us a really good starting point that we can continue to build from.

This will need a long overdue update to sys/file to operate on bytes, but I'll get that change in swiftly.

@ospencer
Copy link
Member Author

ospencer commented Feb 4, 2023

Here is said change: #1655

@av8ta
Copy link
Contributor

av8ta commented Feb 5, 2023

Looks good to me @ospencer, thanks for the feedback

@alex-snezhko
Copy link
Member

@av8ta Since you made a lot of progress towards this API on your initial PR, were you still planning on writing a new iteration based on the recent feedback or would you be ok with someone else bringing it over the finish line?

@spotandjake spotandjake assigned alex-snezhko and unassigned av8ta Apr 5, 2023
@av8ta
Copy link
Contributor

av8ta commented Apr 10, 2023

@alex-snezhko I'm keen to resume with the new design. I found myself afk for a few weeks while shifting country and getting settled in, and can get back into it now.

@alex-snezhko
Copy link
Member

@av8ta Sounds good! I made some progress for the new proposed API that you could use as a base if you'd like https://github.com/alex-snezhko/grain/blob/ergo-fs/stdlib/sys/fs.gr

@av8ta
Copy link
Contributor

av8ta commented Apr 11, 2023

Thanks @alex-snezhko that's really helpful

@phated
Copy link
Member

phated commented Apr 24, 2023

Hey @av8ta and @alex-snezhko - I'd love for this PR to be up soon so we can get as much landed for 0.6 as possible. We are probably only a couple of weeks away! 🙏

@av8ta
Copy link
Contributor

av8ta commented May 11, 2023

Hey @phated I've had busy weekends lately, but I've got a chunk of this one free so I'll take a look at it. Hopefully that won't be too late for 0.6 🙏

av8ta pushed a commit to av8ta/grain that referenced this issue Oct 3, 2023
Authored-by: Alex Snezhko <alexsnezhko89@gmail.com>

Co-authored-by: av8ta <av8ta@pm.me>
@alex-snezhko
Copy link
Member

@av8ta I've had some time to pick this back up as I wasn't sure if you had the time to get to this since your last comment. However, I see that you pushed up a commit on Oct 3rd, so I was curious if you were still looking to close this out or if you'd be fine with me making this PR? I implemented several additional functions and think I have it at a near-complete state now.

@av8ta
Copy link
Contributor

av8ta commented Dec 19, 2023

@alex-snezhko yeah go for it, I had planned to but having trouble creating the time for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants