SUBSTREAM, or temporarily resetting the EOF ? #1198

rmkaplan · 2021-08-06T17:57:03Z

rmkaplan
Aug 6, 2021
Maintainer

One of the complexities of the external format interface is that there is no guaranteed correspondence between the number of characters and the number of bytes. So for the few functions that want to keep track (PFCOPYBYTES, FILEPOS), the format interface functions have the added complexity of needing to communicate how many bytes were read.
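To make the mismatch concrete, here is a minimal sketch in Python (not Interlisp), assuming a UTF-8 external format. The names `read_char_utf8` and `copy_chars` are hypothetical stand-ins, not the actual Medley functions; the point is only that the reader has to hand back a byte count alongside each character so the caller can keep track.

```python
import io

def read_char_utf8(stream):
    """Read one UTF-8 character; return (char, bytes_consumed), or (None, 0) at EOF."""
    first = stream.read(1)
    if not first:
        return None, 0
    lead = first[0]
    # Number of continuation bytes implied by the lead byte.
    if lead < 0x80:
        extra = 0
    elif lead >> 5 == 0b110:
        extra = 1
    elif lead >> 4 == 0b1110:
        extra = 2
    elif lead >> 3 == 0b11110:
        extra = 3
    else:
        raise ValueError("invalid UTF-8 lead byte")
    raw = first + stream.read(extra)
    return raw.decode("utf-8"), len(raw)

def copy_chars(stream, nchars):
    """Copy up to nchars characters; return (text, total_bytes_read)."""
    out, total = [], 0
    for _ in range(nchars):
        ch, nbytes = read_char_utf8(stream)
        if ch is None:
            break
        out.append(ch)
        total += nbytes  # the caller does the byte arithmetic itself
    return "".join(out), total

text, nbytes = copy_chars(io.BytesIO("héllo€".encode("utf-8")), 6)
```

Six characters come back, but the byte total differs because `é` and `€` are multi-byte in UTF-8.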

The stream itself actually knows what happened, but this is all set up so that the caller doesn't have to keep calling GETFILEPTR to keep track.

In the case of FILEPOS, the issue is not actually about the chars-to-bytes correspondence; that is already screwed up. It is set up to search a range of bytes, with a special optimization so that it can do that calculation without dealing with large numbers. That makes it even harder to update this to deal properly with character as opposed to byte-sequence searching.

So I wonder whether some of this complexity can be moved down into the stream itself, since streams already know how to keep track of their byte positions in an efficient way.

We have a function GETEOFPTR that tells us the end of file. We don't have a corresponding SETEOFPTR. Suppose we did, and we could RESETSAVE the EOFPTR to the end of the region that we want to operate on, and then have the ENDOFSTREAMOP trigger whenever we went beyond that.

Then PFCOPYBYTES and FILEPOS could just set things up for ordinary binning, with the endofstreamop triggering to say they went past the range. Those functions would no longer have to do their own arithmetic.
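A rough sketch of that idea, in Python rather than Interlisp. Everything here is an assumption for illustration: `ByteStream`, `temporary_eof`, and `copy_range` are hypothetical names, the context manager only stands in for what RESETSAVE would do, and the `end_of_stream_op` field is a toy version of ENDOFSTREAMOP that returns `None`.

```python
from contextlib import contextmanager

class ByteStream:
    """Toy stream: the reader consults only `eof` to decide when to stop."""
    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.eof = len(data)                     # what a GETEOFPTR analog would report
        self.end_of_stream_op = lambda s: None   # called on any read at or past eof

    def bin(self):
        if self.pos >= self.eof:
            return self.end_of_stream_op(self)
        b = self.data[self.pos]
        self.pos += 1
        return b

@contextmanager
def temporary_eof(stream, new_eof):
    """Hypothetical SETEOFPTR under a RESETSAVE-like unwind protection."""
    saved = stream.eof
    stream.eof = min(new_eof, saved)
    try:
        yield stream
    finally:
        stream.eof = saved   # restored even if the body raises

def copy_range(stream, start, end):
    """Copy bytes in [start, end) with no position arithmetic in the loop."""
    stream.pos = start
    out = bytearray()
    with temporary_eof(stream, end):
        while (b := stream.bin()) is not None:
            out.append(b)
    return bytes(out)
```

The copy loop just reads until the end-of-stream op fires; the range check lives entirely in the stream.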

Is there a different way of pushing this down into the stream?

masinter · 2021-08-06T18:45:51Z

masinter
Aug 6, 2021
Maintainer

I’d worry about a hardreset. RESETSAVE is ok for user code but not the file system. BIN is an opcode. Changes need to be coordinated with Maiko. Charcount as a special just passing in the value seems workable.

0 replies

rmkaplan · 2021-08-07T19:35:33Z

rmkaplan
Aug 7, 2021
Maintainer Author

We are already screwed on RESETSAVE on a hard reset (stackoverflow in URAID?) because typically we mangle the string's ENDOFSTREAMOP under a RESETSAVE.

The idea would be to change the currently operative stream values that code its eof and that BIN pays attention to, while saving the original true values in other stream fields to ensure restorability. On the assumption that the opcode only looks at those eof-coding fields, it would be transparent.
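One way to picture this, as a sketch only and in Python rather than Interlisp: the field the reader looks at is narrowed, and the true value is parked in a second field of the same stream, so restoring does not depend on any unwind machinery having run. The field `true_eof` and the methods `narrow_eof` and `restore_eof` are hypothetical names, not existing stream fields.

```python
class Stream:
    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.eof = len(data)   # the only field the reader (BIN analog) consults
        self.true_eof = None   # hypothetical extra field; None means "not narrowed"

    def narrow_eof(self, limit):
        # Save the true value once, so nested narrowing cannot lose it.
        if self.true_eof is None:
            self.true_eof = self.eof
        self.eof = min(limit, self.true_eof)

    def restore_eof(self):
        # Idempotent: safe to call from any cleanup path, any number of times.
        if self.true_eof is not None:
            self.eof = self.true_eof
            self.true_eof = None

    def bin(self):
        if self.pos >= self.eof:
            return None
        b = self.data[self.pos]
        self.pos += 1
        return b
```

Because the saved value travels with the stream, whatever code next touches the stream after a reset could check `true_eof` and repair it.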

Byte counting (as distinct from charcounting; that's the heart of the issue) is somewhat difficult to implement, because the inccode implementations have to accurately account for every byte that they read or unread. (I have a simplification ("improvement") to the interface that I still can't get through a load-up; it always breaks in reading bitmaps from ADISPLAY.)

But the thing that triggered this question is not the cases where byte-counting is currently being used in a fairly transparent way (e.g. COPYCHARS, PFCOPYBYTES), but the fact that there is at least one routine (FILEPOS) that has its own special complexity of segmenting the byte range to avoid creating large numbers while it is counting. That is an obstacle to upgrading FILEPOS to become a character-searcher instead of a byte-sequence searcher.

It's the temptation of such difficult-to-maintain-and-extend optimizations that I would like to eliminate in favor of a general notification scheme when a reader tries to advance beyond a specified range (which I know will require a little adjustment in the multi-byte-character reading implementations, but not in any character reading clients.)

0 replies

rmkaplan · 2023-05-05T17:42:21Z

rmkaplan
May 5, 2023
Maintainer Author

This arose originally because of the stack-smashing problem of CL:VALUES. The legacy code was reporting the number of bytes as a second value for PFCOPYBYTES and COPYCHARS and similar functions that wanted to keep track of the imperfect correspondence between characters and bytes, when characters are read (or written) through the generic external-format interface.

I worked around the issue in this context in another hackish way, by having a caller that wants to know optionally pass a flag to the generic functions that tells the format-implementors to freely set a known variable, bound by the caller, to the number of bytes. This allows the caller to do its own byte-counting, in a way that adds only a marginal cost (check the flag) when the flag is not passed.
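The shape of that workaround, sketched in Python rather than Interlisp and assuming a UTF-8 format for the example. The module-level `last_read_bytes` stands in for the known variable bound by the caller, and the save/restore in the caller only emulates a dynamic binding; `read_char` and `copy_chars_counting` are hypothetical names.

```python
import io

last_read_bytes = 0   # stand-in for the known variable the caller binds

def utf8_len(lead):
    """Total length of a UTF-8 sequence from its lead byte (assumes valid input)."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4

def read_char(stream, count_bytes=False):
    """Format implementor: reports the byte count only when the flag is passed."""
    global last_read_bytes
    first = stream.read(1)
    if not first:
        return None
    raw = first + stream.read(utf8_len(first[0]) - 1)
    if count_bytes:                  # the only extra cost for non-counting callers
        last_read_bytes = len(raw)
    return raw.decode("utf-8")

def copy_chars_counting(stream, nchars):
    """Caller that wants byte counts: binds the variable and does its own sum."""
    global last_read_bytes
    saved = last_read_bytes          # emulate a dynamic binding
    try:
        out, total = [], 0
        for _ in range(nchars):
            ch = read_char(stream, count_bytes=True)
            if ch is None:
                break
            out.append(ch)
            total += last_read_bytes
        return "".join(out), total
    finally:
        last_read_bytes = saved
```

Callers that do not pass the flag never touch the variable, which matches the "marginal cost" point above.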

But it remains that it would be nice to have a function e.g. (SUBSTREAM STREAM START END) that constructs a stream datum just like STREAM except that it maps its byte-position 0 to START in STREAM and its EOF to END. The ENDOFSTREAMOP of the substream datum could be smashed without changing the behavior of the parent, and then functions like COPYCHARS and PFCOPYBYTES could just test for NIL and not do any of their own arithmetic.
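A minimal sketch of what such a SUBSTREAM could look like, written in Python rather than Interlisp. The class `SubStream` and its methods are hypothetical; this version assumes a seekable parent and restores the parent's file position after each read, which is one possible design choice and not something the proposal specifies.

```python
import io

class SubStream:
    """Byte window [start, end) onto a seekable parent stream."""
    def __init__(self, parent, start, end):
        self.parent = parent
        self.start = start
        self.end = end
        self.pos = 0          # substream position 0 corresponds to `start` in the parent

    def geteofptr(self):
        return self.end - self.start

    def bin(self):
        """Return the next byte as an int, or None once past the window."""
        if self.start + self.pos >= self.end:
            return None       # the substream's own end-of-stream behavior
        saved = self.parent.tell()
        try:
            self.parent.seek(self.start + self.pos)
            b = self.parent.read(1)
        finally:
            self.parent.seek(saved)   # leave the parent's position untouched
        if not b:
            return None       # parent was shorter than `end`
        self.pos += 1
        return b[0]

def copy_all(stream):
    """Client loop: just test for None, no range arithmetic."""
    out = bytearray()
    while (b := stream.bin()) is not None:
        out.append(b)
    return bytes(out)

parent = io.BytesIO(b"header:payload:trailer")
sub = SubStream(parent, 7, 14)
payload = copy_all(sub)
```

The copying client never sees START or END; it only sees a stream that happens to end early.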

1 reply

nbriggs May 5, 2023
Maintainer

Can we keep a test case around for the stack-smashing problem of CL:VALUES so we can eventually track this down?

masinter · 2023-05-05T18:36:04Z

masinter
May 5, 2023
Maintainer

Issue #19. https://github.com/Interlisp/medley/files/6925843/stackbug.tar.gz has a test case. The problem is that multiple value return walks the PC over the FN 1 \MVLIST in a way that the return leaves an extra value on the stack.

0 replies
