Classify tape images #150

Open
larsbrinkhoff opened this issue Aug 17, 2022 · 8 comments

Comments

@larsbrinkhoff
Owner

larsbrinkhoff commented Aug 17, 2022

Add a tool to classify tape images, especially those formats in use at MIT.

  • ITS DUMP format.
  • TOPS-20 DUMPER.
  • TOPS-10: BACKUP and FAILSAFE.
  • Skip optional ANSI label.
  • Unix: tar, dump, cpio.
  • Lispm tapes.
  • VMS BACKUP.

CC @eswenson1 @ams

@larsbrinkhoff
Owner Author

For PDP-10 tapes, we'll also have to consider whether they are 9 or 7 track, little or big endian, etc.

@ams

ams commented Aug 17, 2022 via email

@larsbrinkhoff
Owner Author

> there are multiple formats for LispM tapes; Symbolics and MIT.

I can see that. I see something beginning with some ASCII text strings like

PRELUDE
VERSION 3
TAPE-SYSTEM-VERSION 429
LMFS-VERSION 428.1 RELEASED

Is that the (or a) Symbolics format? I think those are tapes from REAGAN.

And then there's something more binary looking.

@larsbrinkhoff
Owner Author

I also see VMS BACKUP tapes. Hoping this will help: https://github.com/kkaempf/vmsbackup

@larsbrinkhoff
Owner Author

larsbrinkhoff commented Aug 18, 2022

I have some code up on a branch called lars/classify-tape. So far I try to detect:

  • Big or little endian record length.
  • SIMH or E11 tape format (odd record padded or not).
  • For PDP-10 tapes, if they are 7 or 9 track.
  • ITS DUMP.
  • TOPS-20 DUMPER.
  • Symbolics LMFS dump.
  • VMS BACKUP.
  • Unix tar - several versions; old/new, 16/32-bit, little/big endian.
  • Unix dump.
  • MS-DOS FAT file system(?!)
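The first two checks in the list above (record length byte order, and whether odd-length records are padded) could be sketched roughly as follows. This is not the code on the branch; it assumes the usual tape image framing where each record is a 32-bit length, the data, and the same length repeated, with a zero length standing for a tape mark. The function names are made up for illustration, and end-of-medium markers and error flags are not handled.

```python
import struct

def read_record(image: bytes, pos: int, byteorder: str = "<", pad_odd: bool = True):
    """Return (data, next_pos) for the record at pos, or None if the framing
    does not fit. A zero length word is a tape mark and yields empty data."""
    if pos + 4 > len(image):
        return None
    (length,) = struct.unpack_from(byteorder + "I", image, pos)
    if length == 0:
        return b"", pos + 4
    end = pos + 4 + length
    # Padded images add one pad byte after an odd-length record.
    trailer = end + (length & 1 if pad_odd else 0)
    if trailer + 4 > len(image):
        return None
    if struct.unpack_from(byteorder + "I", image, trailer)[0] != length:
        return None
    return image[pos + 4:end], trailer + 4

def guess_framing(image: bytes):
    """Try each (byteorder, pad_odd) combination and return the first one
    that parses the whole image, or None if none does."""
    for byteorder in ("<", ">"):
        for pad_odd in (True, False):
            pos = 0
            while pos < len(image):
                rec = read_record(image, pos, byteorder, pad_odd)
                if rec is None:
                    break
                pos = rec[1]
            else:
                return byteorder, pad_odd
    return None
```

If every record on a tape has an even length, the padded and unpadded layouts are indistinguishable, and this sketch simply reports the first combination it tried.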

I see some unclassified tapes that look like Unix or Lispm. The vast majority of tapes are ITS or TOPS-20, but there are quite a few Unix, Lispm, and VMS tapes that are as yet totally unexplored.

@larsbrinkhoff
Owner Author

larsbrinkhoff commented Aug 18, 2022

Detection schemes range from heuristic to hacky. I mainly look at the first record, skipping any ANSI label if present.

  • PDP-10 records should be multiples of 5 or 6 frames (9 and 7 track, respectively).
  • ITS likes to write records of 1024 words, but they may be smaller. A DUMP tape begins with the word -4,,0 or -4,,2.
  • TOPS-20 DUMPER uses records of 518 words exclusively, so that's a good indication.
  • TOPS-20 install media starts with an .EXE file; the first word has 1776 in the left half.
  • Symbolics LMFS dump tapes have a bunch of ASCII text metadata; I look for the string "TAPE-SYSTEM-VERSION".
  • Unix tar has the string "ustar" at offset 257 (decimal) in the first record. (There are other tar formats.)
  • Unix dump has the magic number 60011 at offset 18, or 60012 at offset 24 (V7 and 32-bit Unix, respectively).
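As a rough illustration of the first-record checks listed above, here is a minimal sketch. It is not the code on the branch, and it makes several assumptions: the record has already been extracted from the image and any ANSI label skipped, PDP-10 data is 9-track in "core dump" packing (five frames per word, with the low four bits of the word in the fifth frame), the Unix dump magic numbers are little endian, and the "ustar" check uses offset 257, where the POSIX ustar header places its magic field. The function names and the returned labels are illustrative only.

```python
import struct

def words_9track(record: bytes):
    """Unpack 36-bit words, assuming core dump packing: 5 frames per word,
    the first four carrying 8 bits each and the fifth the low 4 bits."""
    words = []
    for i in range(0, len(record) - 4, 5):
        b = record[i:i + 5]
        words.append((b[0] << 28) | (b[1] << 20) | (b[2] << 12)
                     | (b[3] << 4) | (b[4] & 0o17))
    return words

def classify_first_record(record: bytes) -> str:
    """Apply the signature checks in a fixed order; the first match wins."""
    if b"TAPE-SYSTEM-VERSION" in record:
        return "Symbolics LMFS dump"
    if record[257:262] == b"ustar":
        return "Unix tar"
    # Assumption: little-endian magic numbers (PDP-11 and VAX byte order).
    if len(record) >= 20 and struct.unpack_from("<H", record, 18)[0] == 60011:
        return "Unix dump (V7)"
    if len(record) >= 28 and struct.unpack_from("<I", record, 24)[0] == 60012:
        return "Unix dump (32-bit)"
    if record and len(record) % 5 == 0:
        words = words_9track(record)
        left, right = words[0] >> 18, words[0] & 0o777777
        # -4 in an 18-bit half word is 777774 octal.
        if left == 0o777774 and right in (0, 2):
            return "ITS DUMP"
        if len(words) == 518:
            return "TOPS-20 DUMPER"
        if left == 0o1776:
            return "TOPS-20 install"
    return "Unknown"
```

The order of the checks is a design choice in this sketch: the byte-oriented signatures are tested first because a record length divisible by 5 is weak evidence on its own.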

@larsbrinkhoff
Owner Author

Hello @romkey,

I'm working on classifying tape image files from MIT's "Tapes o' Tech Square" collection of backups. Some of the files seem to be MS-DOS floppy images, or at least some kind of FAT file system. I was surprised to see that type of data on MIT backup media, especially on what should be magnetic tapes.

When I think about PCs at MIT in the 80s, I of course think of the author of PC/IP. So, do you have any idea how this MS-DOS data would appear on magtapes from MIT?

Thanks!

@larsbrinkhoff
Owner Author

larsbrinkhoff commented Aug 18, 2022

Classification results so far, on my limited corpus.

   2136 ITS DUMP
   1410 TOPS-20 DUMPER
    683 Unix dump
    261 Symbolics LMFS dump
    236 Unknown
    154 ITS DUMP (with label)
    106 Unix tar
    105 Error reading tape image
     78 VMS BACKUP
     15 TOPS-20 install
      8 FAT file system
      4 Unix cpio
      4 TOPS-10 BACKUP
      1 TOPS-10 FAILSAFE
      1 MIT/LMI dump

Breakdown of platforms:

  Tapes  Percentage  Platform
   5202       100    All
   3720        72    PDP-10
    793        15    Unix
    341         7    Unknown/bad
    262         5    Lispm
     78         1    VMS
      8         0.1  PC
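For reference, the platform totals can be reproduced from the per-format counts. The sketch below uses the counts from the classification results; the mapping from format to platform is an assumption, chosen because it reproduces the totals in the breakdown.

```python
# Per-format counts, copied from the classification results.
counts = {
    "ITS DUMP": 2136, "TOPS-20 DUMPER": 1410, "Unix dump": 683,
    "Symbolics LMFS dump": 261, "Unknown": 236, "ITS DUMP (with label)": 154,
    "Unix tar": 106, "Error reading tape image": 105, "VMS BACKUP": 78,
    "TOPS-20 install": 15, "FAT file system": 8, "Unix cpio": 4,
    "TOPS-10 BACKUP": 4, "TOPS-10 FAILSAFE": 1, "MIT/LMI dump": 1,
}

# Assumed grouping of formats into platforms.
platform_of = {
    "ITS DUMP": "PDP-10", "ITS DUMP (with label)": "PDP-10",
    "TOPS-20 DUMPER": "PDP-10", "TOPS-20 install": "PDP-10",
    "TOPS-10 BACKUP": "PDP-10", "TOPS-10 FAILSAFE": "PDP-10",
    "Unix dump": "Unix", "Unix tar": "Unix", "Unix cpio": "Unix",
    "Symbolics LMFS dump": "Lispm", "MIT/LMI dump": "Lispm",
    "VMS BACKUP": "VMS", "FAT file system": "PC",
    "Unknown": "Unknown/bad", "Error reading tape image": "Unknown/bad",
}

totals = {}
for fmt, n in counts.items():
    platform = platform_of[fmt]
    totals[platform] = totals.get(platform, 0) + n

all_tapes = sum(counts.values())
for platform, n in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{n:7d} {100 * n / all_tapes:5.1f}% {platform}")
```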
