Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filehandle.readableWebStream() chunks return incompatible ArrayBuffer instead of Uint8Array #54041

Open
karlhorky opened this issue Jul 25, 2024 · 17 comments · May be fixed by #55461
Open

filehandle.readableWebStream() chunks return incompatible ArrayBuffer instead of Uint8Array #54041

karlhorky opened this issue Jul 25, 2024 · 17 comments · May be fixed by #55461

Comments

@karlhorky
Copy link
Contributor

karlhorky commented Jul 25, 2024

Version

v20.12.0

Platform

Linux ljysjk 6.1.43 #1 SMP PREEMPT_DYNAMIC Sun Aug  6 20:05:33 UTC 2023 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

  1. Open the CodeSandbox (see code below)
  2. Observe the ERR_INVALID_ARG_TYPE error below

CodeSandbox demo: https://codesandbox.io/p/devbox/filehandle-readablewebstream-with-new-response-ljysjk?file=%2Findex.js

import fs from "node:fs/promises";
import http from "node:http";
import path from "node:path";
import { Readable } from "node:stream";

const filePath = "./image.jpg";

const server = http.createServer(async (nodeRequest, nodeResponse) => {
  const stats = await fs.stat(filePath);
  const fileHandle = await fs.open(filePath);
  const webStream = fileHandle.readableWebStream();

  nodeResponse.writeHead(200, {
    "Content-Disposition": `attachment; filename=${path.basename(filePath)}`,
    "Content-Type": "image/jpeg",
    "Content-Length": String(stats.size),
  });

  const nodeStream = Readable.fromWeb(webStream);
  nodeStream.pipe(nodeResponse);
});

const port = 3000;
server.listen(port, () => {
  console.log(`Server running at http://localhost:${port}/`);
});

Error:

Server running at http://localhost:3000/
node:events:496
      throw er; // Unhandled 'error' event
      ^

TypeError [ERR_INVALID_ARG_TYPE]: The "chunk" argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of ArrayBuffer
    at readableAddChunkPushByteMode (node:internal/streams/readable:480:28)
    at Readable.push (node:internal/streams/readable:390:5)
    at node:internal/webstreams/adapters:517:22
Emitted 'error' event on Readable instance at:
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Node.js v20.12.0
Failed running 'index.js'

Screenshot 2024-07-25 at 12 52 56

Alternative version with new Response():

import fs from "node:fs/promises";
import http from "node:http";
import path from "node:path";

const filePath = "./image.jpg";

const server = http.createServer(async (nodeRequest, nodeResponse) => {
  const stats = await fs.stat(filePath);
  const fileHandle = await fs.open(filePath);
  const webStream = fileHandle.readableWebStream();

  // Version with new Response()
  const webResponse = new Response(webStream, {
    status: 200,
    headers: new Headers({
      "content-disposition": `attachment; filename=${path.basename(filePath)}`,
      "content-type": "image/jpeg",
      "content-length": String(stats.size),
    }),
  });

  nodeResponse.writeHead(
    webResponse.status,
    Object.fromEntries(webResponse.headers.entries())
  );

  webResponse.body.pipeTo(
    new WritableStream({
      write(chunk) {
        nodeResponse.write(chunk);
      },
      close() {
        nodeResponse.end();
      },
    })
  );
});

const port = 3000;
server.listen(port, () => {
  console.log(`Server running at http://localhost:${port}/`);
});

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

Web streams created by fileHandle.readableWebStream() should be compatible with Readable.fromWeb()

What do you see instead?

Web streams created by fileHandle.readableWebStream() should are incompatible with Readable.fromWeb()

Additional information

Also reported over here:

cc @jasnell

@Ethan-Arrowood
Copy link
Contributor

Taking a look 👀

@Ethan-Arrowood
Copy link
Contributor

My understanding is that the readableWebStream method returns a ReadableStream in bytes mode (https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#type) by default because files can contain any kind of data. As the error suggests, ArrayBuffer is not directly compatible with Node.js streams.

The best solution I could recommend right now is to maybe use a TransformStream and convert each chunk of the ReadableStream into a Node.js Buffer using Buffer.from(chunk) (which should correctly handle ArrayBuffer).

Also, I assume the code you provided is just for reproduction purposes, but it's pretty inefficient to go from Node Stream to Web Stream back to Node stream. Keep it all as a Node stream, or convert it to Web stream and keep it as that.

@Ethan-Arrowood
Copy link
Contributor

Roughly:

const t = new TransformStream({
	transform(chunk, controller) {
		controller.enqueue(Buffer.from(chunk))
	},
})

const nodeStream = Readable.fromWeb(webStream.pipeThrough(t));

🤷‍♂️

@Ethan-Arrowood
Copy link
Contributor

Regarding an actual fix to Node.js, I'd say we have to improve Readable.fromWeb to infer this and maybe do this transform automatically. I don't immediately see an issue with that. WDYT @jasnell ?

@karlhorky
Copy link
Contributor Author

karlhorky commented Jul 25, 2024

My understanding is that the readableWebStream method returns a ReadableStream in bytes mode (developer.mozilla.org/en-US/docs/Web/API/ReadableStream/ReadableStream#type) by default because files can contain any kind of data.

I'd say we have to improve Readable.fromWeb to infer this and maybe do this transform automatically

Oh ok, interesting - or should there a better default for file.readableWebStream()?

Also, I assume the code you provided is just for reproduction purposes, but it's pretty inefficient to go from Node Stream to Web Stream back to Node stream. Keep it all as a Node stream, or convert it to Web stream and keep it as that.

Right, I can see how converting back and forth multiple times with this code would seem unrealistic or add overhead.

But this may not be so unrealistic if working with a Node.js framework that expects Web streams in user code - eg. Route Handlers in Next.js:

eric-burel/demo-readableWebStream in @eric-burel's demo repo

import fs from "fs/promises"
import path from "path"

export const dynamic = 'force-dynamic'

export async function GET(request: Request) {
    const filePath = path.resolve("./public/image.jpg")
    const stats = await fs.stat(filePath);
    const fileHandle = await fs.open(filePath)
    const stream = fileHandle.readableWebStream()
    return new Response(stream, {
        status: 200,
        headers: new Headers({
            "content-disposition": `attachment; filename=${path.basename(filePath)}`,
            "content-type": "image/jpeg",
            "content-length": stats.size + "",
        })
    })

...where in the background, there may be code like this to convert it to a Node.js stream:

Keep it all as a Node stream, or convert it to Web stream and keep it as that

If there was a mode / way of working to instead keep everything as a Web stream, and also serve up Web streams from a Node.js response with node:http or similar, then that would I guess be an option for frameworks to migrate to... 🤔

@Ethan-Arrowood
Copy link
Contributor

Oh ok, interesting - or should there a better default for file.readableWebStream()?

Maybe, I'm trying to think through where to best "fix" this. Because both side's limitation make sense to me.

On one hand, you don't really want file.readableWebStream() to muck the underlying data. On the other, Readable.fromWeb should definitely handle this case. So the question comes down to how?

@jasnell
Copy link
Member

jasnell commented Jul 26, 2024

FWIW, The readableWebStream() does not return a bytes stream by default... If you call const readable = fileHandle.readableWebStream({ type: 'bytes' }) then you'll get a proper byte-oriented stream that appears to work correctly with new Response(readable) ... I think maybe the change to make here is that readableWebStream() should return a byte-oriented stream by default.

@Ethan-Arrowood
Copy link
Contributor

Ah I misunderstood the code here

readableWebStream(options = kEmptyObject) {
if (this[kFd] === -1)
throw new ERR_INVALID_STATE('The FileHandle is closed');
if (this[kClosePromise])
throw new ERR_INVALID_STATE('The FileHandle is closing');
if (this[kLocked])
throw new ERR_INVALID_STATE('The FileHandle is locked');
this[kLocked] = true;
if (options.type !== undefined) {
validateString(options.type, 'options.type');
}
let readable;
if (options.type !== 'bytes') {
const {
newReadableStreamFromStreamBase,
} = require('internal/webstreams/adapters');
readable = newReadableStreamFromStreamBase(
this[kHandle],
undefined,
{ ondone: () => this[kUnref]() });
} else {
const {
ReadableStream,
} = require('internal/webstreams/readablestream');
const readFn = FunctionPrototypeBind(this.read, this);
const ondone = FunctionPrototypeBind(this[kUnref], this);
readable = new ReadableStream({
type: 'bytes',
autoAllocateChunkSize: 16384,
async pull(controller) {
const view = controller.byobRequest.view;
const { bytesRead } = await readFn(view, view.byteOffset, view.byteLength);
if (bytesRead === 0) {
ondone();
controller.close();
}
controller.byobRequest.respond(bytesRead);
},
cancel() {
ondone();
},
});
}

Do you foresee any issue with that change in default?

@jasnell
Copy link
Member

jasnell commented Jul 26, 2024

I wouldn't imagine any issues but it would need to be a semvver-major change that should be called out in notable changes.

@jasnell
Copy link
Member

jasnell commented Jul 26, 2024

@karlhorky ... can you verify quickly if const stream = fileHandle.readableWebStream({ type: 'bytes' }); works for your case?

@karlhorky
Copy link
Contributor Author

Should be pretty easy to change the CodeSandbox linked above yep.

Although probably I'll need to check that out tomorrow

@Ethan-Arrowood
Copy link
Contributor

The method is still labeled as experimental in the docs, so would it really be a major change? I think a fix is appropriate

@karlhorky
Copy link
Contributor Author

karlhorky commented Jul 27, 2024

@karlhorky ... can you verify quickly if const stream = fileHandle.readableWebStream({ type: 'bytes' }); works for your case?

Seems to work, yes! 👍

CodeSandbox: https://codesandbox.io/p/devbox/filehandle-readablewebstream-with-new-response-forked-mszlgw?file=%2Findex.js

Changes to original sandbox from PR description:

  1. I forked the sandbox just now
  2. I added { type: 'bytes' } as an argument to fileHandle.readableWebStream()
  3. I removed the "Content-Disposition": `attachment; filename=${path.basename(filePath)}`, (which also worked to download the file by visiting the page directly, but didn't allow for a very nice demo)
  4. I added fileHandle.close(), to avoid these warnings:
    (node:6079) Warning: Closing file descriptor 23 on garbage collection
    (Use `node --trace-warnings ...` to show where the warning was created)
    (node:6079) [DEP0137] DeprecationWarning: Closing a FileHandle object on garbage collection is deprecated. Please close FileHandle objects explicitly using FileHandle.prototype.close(). In the future, an error will be thrown if a file descriptor is closed during garbage collection.
    

Screenshot 2024-07-27 at 13 05 19

cc @eric-burel

@eric-burel
Copy link

eric-burel commented Jul 31, 2024

@karlhorky absolutely wonderful, I'll update my resources on the topic to match this API.
Last tiny issue, I hit some TypeScript issues in the context of Next.js:

Argument of type 'ReadableStream<any>' is not assignable to parameter of type 'BodyInit | null | undefined'.
  Type 'import("stream/web").ReadableStream<any>' is not assignable to type 'ReadableStream<any>'.
    Types of property 'pipeThrough' are incompatible.

When passing the stream to Response. But maybe an issue on Next.js side.

Note that you can't close a file after the Response is sent in Next.js, so no way to properly close the file (closing it before the stream is sent will trigger an error as expected).

@Ethan-Arrowood
Copy link
Contributor

@eric-burel that is an issue on the Next.js side. I believe it is a mismatch between Node.js Web Stream types and the Web Stream types from TypeScript's standard library (lib).

@isker
Copy link
Contributor

isker commented Oct 17, 2024

@jasnell

I think maybe the change to make here is that readableWebStream() should return a byte-oriented stream by default.

I'm looking into this problem and I'm not sure why the stream should ever be allowed to not be byte-oriented. It's not clear to me why this should be an option in the API at all.

For example, as far as I know, the Node stream APIs in fs only deal in bytes (unless you use encoding to get strings; but this is done with something like TextDecoderStream in web streams).

isker added a commit to isker/node that referenced this issue Oct 19, 2024
The original implementation of the experimental
`FileHandle.readableWebStream` API created non-`type: 'bytes'` streams,
which prevented callers from creating `mode: 'byob'` readers from the
returned stream, which means they could not achieve the associated
"zero-copy" performance characteristics.

Then, nodejs#46933 added a parameter allowing callers to pass the `type`
parameter down to the ReadableStream constructor, exposing the same
semantics to callers of `FileHandle.readableWebStream`.

But there is no point to giving callers this choice: FileHandle-derived
streams are by their very nature byte streams. We should not require
callers to explicitly opt in to having byte stream semantics. Moreover,
I do not see a situation in which callers would ever want to have a
non-bytes stream: bytes-streams only do anything differently than normal
ones if `mode: 'byob'` is passed to `getReader`.

So, remove the `options` parameter and always create a ReadableStream
with `type: 'bytes'`.

Fixes nodejs#54041.
isker added a commit to isker/node that referenced this issue Nov 7, 2024
The original implementation of the experimental
`FileHandle.readableWebStream` API created non-`type: 'bytes'` streams,
which prevented callers from creating `mode: 'byob'` readers from the
returned stream, which means they could not achieve the associated
"zero-copy" performance characteristics.

Then, nodejs#46933 added a parameter allowing callers to pass the `type`
parameter down to the ReadableStream constructor, exposing the same
semantics to callers of `FileHandle.readableWebStream`.

But there is no point to giving callers this choice: FileHandle-derived
streams are by their very nature byte streams. We should not require
callers to explicitly opt in to having byte stream semantics. Moreover,
I do not see a situation in which callers would ever want to have a
non-bytes stream: bytes-streams only do anything differently than normal
ones if `mode: 'byob'` is passed to `getReader`.

So, remove the `options` parameter and always create a ReadableStream
with `type: 'bytes'`.

Fixes nodejs#54041.
isker added a commit to isker/node that referenced this issue Nov 7, 2024
The original implementation of the experimental
`FileHandle.readableWebStream` API created non-`type: 'bytes'` streams,
which prevented callers from creating `mode: 'byob'` readers from the
returned stream, which means they could not achieve the associated
"zero-copy" performance characteristics.

Then, nodejs#46933 added a parameter allowing callers to pass the `type`
parameter down to the ReadableStream constructor, exposing the same
semantics to callers of `FileHandle.readableWebStream`.

But there is no point to giving callers this choice: FileHandle-derived
streams are by their very nature byte streams. We should not require
callers to explicitly opt in to having byte stream semantics. Moreover,
I do not see a situation in which callers would ever want to have a
non-bytes stream: bytes-streams only do anything differently than normal
ones if `mode: 'byob'` is passed to `getReader`.

So, remove the `options` parameter and always create a ReadableStream
with `type: 'bytes'`.

Fixes nodejs#54041.
@karlhorky
Copy link
Contributor Author

@jasnell @Ethan-Arrowood PR has been opened by @isker over here:

isker added a commit to isker/node that referenced this issue Nov 7, 2024
The original implementation of the experimental
`FileHandle.readableWebStream` API created non-`type: 'bytes'` streams,
which prevented callers from creating `mode: 'byob'` readers from the
returned stream, which means they could not achieve the associated
"zero-copy" performance characteristics.

Then, nodejs#46933 added a parameter allowing callers to pass the `type`
parameter down to the ReadableStream constructor, exposing the same
semantics to callers of `FileHandle.readableWebStream`.

But there is no point to giving callers this choice: FileHandle-derived
streams are by their very nature byte streams. We should not require
callers to explicitly opt in to having byte stream semantics. Moreover,
I do not see a situation in which callers would ever want to have a
non-bytes stream: bytes-streams only do anything differently than normal
ones if `mode: 'byob'` is passed to `getReader`.

So, remove the `options` parameter and always create a ReadableStream
with `type: 'bytes'`.

Fixes nodejs#54041.
isker added a commit to isker/node that referenced this issue Nov 17, 2024
The original implementation of the experimental
`FileHandle.readableWebStream` API created non-`type: 'bytes'` streams,
which prevented callers from creating `mode: 'byob'` readers from the
returned stream, which means they could not achieve the associated
"zero-copy" performance characteristics.

Then, nodejs#46933 added a parameter allowing callers to pass the `type`
parameter down to the ReadableStream constructor, exposing the same
semantics to callers of `FileHandle.readableWebStream`.

But there is no point to giving callers this choice: FileHandle-derived
streams are by their very nature byte streams. We should not require
callers to explicitly opt in to having byte stream semantics. Moreover,
I do not see a situation in which callers would ever want to have a
non-bytes stream: bytes-streams only do anything differently than normal
ones if `mode: 'byob'` is passed to `getReader`.

So, remove the `options` parameter and always create a ReadableStream
with `type: 'bytes'`.

Fixes nodejs#54041.
isker added a commit to isker/node that referenced this issue Nov 17, 2024
The original implementation of the experimental
`FileHandle.readableWebStream` API created non-`type: 'bytes'` streams,
which prevented callers from creating `mode: 'byob'` readers from the
returned stream, which means they could not achieve the associated
"zero-copy" performance characteristics.

Then, nodejs#46933 added a parameter allowing callers to pass the `type`
parameter down to the ReadableStream constructor, exposing the same
semantics to callers of `FileHandle.readableWebStream`.

But there is no point to giving callers this choice: FileHandle-derived
streams are by their very nature byte streams. We should not require
callers to explicitly opt in to having byte stream semantics. Moreover,
I do not see a situation in which callers would ever want to have a
non-bytes stream: bytes-streams only do anything differently than normal
ones if `mode: 'byob'` is passed to `getReader`.

So, remove the `options` parameter and always create a ReadableStream
with `type: 'bytes'`.

Fixes nodejs#54041.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants