Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

M68K compatible #1

Open
leuat opened this issue Aug 26, 2020 · 11 comments
Open

M68K compatible #1

leuat opened this issue Aug 26, 2020 · 11 comments

Comments

@leuat
Copy link

leuat commented Aug 26, 2020

Hi there! I'm the creator of TRSE (www.turborascal.com), and I'm currently in the process of adding automatic packing + extraction of M68K resources (for the Amiga + Atari ST 520). I'm trying to get your decompression code to work on the M68K, but I'm having some problems, and was wondering if you would be able to help me

if we mange to get this work, this will be the de-facto way of compressing resources in TRSE, and you'll naturally be credited in the TRSE credits

I've compared the (compressed) files to the compressed output from fc8, and since the files are identical - the problem must reside with the decompression method. I've tried changing it to M68000 compatibility, but can't make it work. I'm suspecting that by (slow) byte copying method is incorrect, or perhaps the " lea _FC8_LENGTH_DECODE_LUT(pc),a5" that I'm not really getting. Any help would be very .. helpful!

Btw the only major things that I've changed are marked with "CHANGES HERE"

Nicolaas

Here's what I've tried to modify :

procedure Decompress(in, out : global pointer of integer; outsize : global long);
begin
;//#pragma parameter __D0 fc8_decode(__A0, __A1, __D1)
asm("
;asm unsigned short fc8_decode(unsigned char* in, unsigned char* out, unsigned long outsize)
;	machine 68020
FC8_HEADER_SIZE EQU 8
FC8_DECODED_SIZE_OFFSET  EQU 4
	movem.l	d1-d7/a0-a6,-(sp)
	move.l compression_in,a0	
	move.l compression_out,a1	
	move.l compression_outsize,d1	
	move.l #32000,d1	

	bra 	_Init_Decode

;// lookup table for decoding the copy length-1 parameter
_FC8_LENGTH_DECODE_LUT:
	dc.b	2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
	dc.b	18,19,20,21,22,23,24,25,26,27,28,34,47,71,127,255

;//-------------------------------------------------------------------------------
;// _GetUINT32 - alignment independent reader for 32-bit integers
;// a0 = in
;// d6 = offset
;// d7 = result
;//-------------------------------------------------------------------------------
_GetUINT32:
	move.b	(a0,d6.w),d7
	asl.w	#8,d7
	move.b	1(a0,d6.w),d7
	swap	d7
	move.b	2(a0,d6.w),d7
	asl.w	#8,d7
	move.b	3(a0,d6.w),d7
	rts
	
_Init_Decode:
	;// check magic ID
	cmp.b	#'F',(a0)
	bne	_fail
	cmp.b	#'C',1(a0)
	bne	_fail
	cmp.b	#'8',2(a0)
	bne	_fail
	cmp.b	#'_',3(a0)
	bne	_fail
	
	;// check decoded size - enough space in the output buffer?
	moveq	#FC8_DECODED_SIZE_OFFSET,d6
	bsr.s	_GetUINT32
	cmp.l	d7,d1				
	bhi	_fail

	;// advance a0 to start of compressed data
	lea		FC8_HEADER_SIZE(a0),a0

	;// a5 = base of length decode lookup table
	lea		_FC8_LENGTH_DECODE_LUT(pc),a5
	;lea		-110(pc),a5	;// fix PC offset manually after assembly
	
	;// helpful constants
	move.l	#$0000001F,d1
	move.l	#$00000007,d2
	move.l	#$00000001,d3
	move.l	#$0000003F,d4		;// Main decompression loop
_mainloop:
	move.b	(a0)+,d6			;// d6 = next token
	bmi.s	_BR1_BR2			;// BR1 and BR2 tokens have the high bit set
	
	btst	#6,d6				;// check bit 6 to see if this is a BR0 or LIT token
	bne.s	_BR0

;// LIT 00aaaaaa - copy literal string of aaaaaa+1 bytes from input to output	
_LIT:							
	move.l	a0,a4				;// a4 = source ptr for copy
	and.w 	d4,d6				;// AND with 0x3F, d6 = length-1 word for copy
	lea		1(a0,d6.w),a0		;// advance a0 to the end of the literal string
	bra.s 	_copyLoop
	
;// BR0 01baaaaa - copy b+3 bytes from output backref aaaaa to output
_BR0:								
	move.b	d6,d5
	and.l	d1,d5				;// AND with 0x1F, d5 = offset for copy = (long)(t0 & 0x1F)
	beq		_done				;// BR0 with 0 offset means EOF
	
	move.l	a1,a4
	sub.l	d5,a4				;// a4 = source ptr for copy
	
	move.b	(a4)+,(a1)+		;// copy 3 bytes, can't use move.w. or move.l because src and dest may overlap
	move.b	(a4)+,(a1)+
	move.b	(a4)+,(a1)+
	btst	#5,d6				;// check b bit
	beq.s	_mainloop
	move.b	(a4)+,(a1)+		;// copy 1 more byte
	bra.s	_mainloop
	
_BR1_BR2:
	btst	#6,d6				;// check bit 6 to see if this is a BR1 or BR2 token
	bne.s	_BR2

;// BR1 10bbbaaa'aaaaaaaa - copy bbb+3 bytes from output backref aaa'aaaaaaaa to output
_BR1:							
	move.b	d6,d5
	and.l	d2,d5				;// AND with 0x07 
	lsl.l	#8,d5
	move.b	(a0)+,d5			;// d5 = offset for copy = ((long)(t0 & 0x07) << 8) | t1
	lsr.b	#3,d6
	and.w	d2,d6				;// AND with 0x07
	addq.w	#2,d6				;// d6 = length-1 word for copy = ((word)(t0 >> 3) & 0x7) + 1
	bra.s 	_copyBackref

;// BR2 11bbbbba'aaaaaaaa'aaaaaaaa - copy lookup_table[bbbbb] bytes from output backref a'aaaaaaaa'aaaaaaaa to output
_BR2:	
	move.b	d6,d5
	and.w	d3,d5				;// AND with 0x01
	swap	d5
	move.w	(a0)+,d5			;// d5 = offset for copy = ((long)(t0 & 0x01) << 16) | (t1 << 8) | t2
	lsr.b	#1,d6
	and.w	d1,d6				;// AND with 0x1F
	move.b	(a5,d6.w),d6		;// d6 = length-1 word for copy = ((word)(t0 >> 3) & 0x7) + 1
	;// fall through to _copyBackref

	;// copy data from a previous location in the output buffer
	;// d5 = offset from current buffer position	
_copyBackref:
	move.l	a1,a4
	sub.l	d5,a4 				;// a4 = source ptr for copy
	cmpi.l 	#4,d5   
	blt.s 	_nearCopy 			;// must copy byte-by-byte if offset < 4, to avoid overlapping long copies
	
	;// Partially unrolled block copy. Requires 68020 or better.
	;// Uses move.l and move.w where possible, even though both source and dest may be unaligned.
	;// It's still faster than multiple move.b instructions
	;// d6 = length-1
_copyLoop:	
	cmpi.w 	#16,d6
	bge.s 	_copy17orMore	
	jmp _copyBytes_leuat


; CHANGES HERE  

_copyBytes_leuat:
	cmp.w #0,d6
	beq.s _mainloop

;     subi.w #1,d6 ;; Copy 1 byte less
 
_topp_leuat:
  move.b (a4)+,(a1)+
  dbf d6,_topp_leuat
  bra		_mainloop

; CHANGES HERE

_copy17orMore:
	    move.l 	(a4)+,(a1)+
	    move.l 	(a4)+,(a1)+
	    move.l 	(a4)+,(a1)+
	    move.l 	(a4)+,(a1)+
	    subi.w 	#16,d6
	    cmpi.w 	#16,d6
		bge.s 	_copy17orMore	
		jmp _copyBytes_leuat

		
_nearCopy:	
		cmpi.l	#1,d5
		beq.s	_copyRLE	
_nearLoop:
	    move.b 	(a4)+,(a1)+
	    dbf 	d6,_nearLoop
	    bra		_mainloop

_copyRLE:
		;// assumes length is at least 3
		move.b	(a4),(a1)+		;// copy first byte
		btst	#0,d6
		beq.s	_doRLE			;// branch if copy length is odd (because d6 is length-1)
		move.b	(a4),(a1)+		;// copy second byte
_doRLE:
		subq.w	#2,d6
		lsr.w	#1,d6			;// length = (length-2) / 2
		move.w	(a4),d5	
_rleLoop:
		move.w	d5,(a1)+
		dbf		d6,_rleLoop
		bra		_mainloop

_fail:
	moveq.w	#0,d0				;// result = 0
	bra		_exit	
	
_done:
	moveq.w	#1,d0				;// result = 1
	
_exit:	
	movem.l	(sp)+,d1-d7/a0-a6
	");
end;
@steve-chamberlin
Copy link
Owner

The details of this code have faded from my mind since four years ago. Since this is just vanilla 68K code, I'd suggest running your modified version in an emulator like Easy68K, and stepping through it while it performs the decompression to find the point where the output is wrong. From a quick glance, I think you need to modify the code at _copy17orMore to use move.b instead of move.l, because the address might be unaligned and I don't think the 68000 supports unaligned memory access. There may also be some unaligned memory access happening in _doRLE and _rleLoop. For best performance on the 68000, you probably need to check if the source and destination addresses are aligned, copy a few bytes individually with move.b if necessary, and then copy the rest with aligned move.w or move.l copies.

@steve-chamberlin
Copy link
Owner

The lea _FC8_LENGTH_DECODE_LUT(pc),a5 is intended to set a5 to point to the beginning of _FC8_LENGTH_DECODE_LUT table, in a position-independent way relative to the PC. I seem to recall I had some difficulty getting this to work right, so I inspected the assembled code or ran it in an emulator, and then manually set the -110 offset to make it work. If you've modified any code that's before that lea statement, then you'll probably need to change that -110 to another number.

@leuat
Copy link
Author

leuat commented Aug 26, 2020

Thanks for the feedback! If I don't get it working, I'll probably just re-implement the whole thing in TRSE (kinda slow!) and then convert each part to assembly. I'm guessing that the crashes is due to the alignment, so I'll start there.

@steve-chamberlin
Copy link
Owner

As a starting point to get it working, I think you could use _nearLoop for all the copying, and delete _copyLoop and its related code as well as deleting _copyRLE, _doRLE, and _rleLoop. nearLoop is the most basic form of copying, and everything else is just intended to optimize specific cases. But I'm now noticing there might also be unaligned access in a few other places, like move.w (a0)+,d5 in BR2.

@leuat
Copy link
Author

leuat commented Aug 26, 2020

hehe I almost got it working, but not quite yet... check out this video: https://www.irio.co.uk/div/almost.mp4 , but the image still has artifacts. Yeah I don't really care about speed right now (usually, decompressing a single resource - like an image - won't have that much penalty in terms of speed)

@leuat
Copy link
Author

leuat commented Aug 26, 2020

ok I'll try to remove those other copying mechanisms

@leuat
Copy link
Author

leuat commented Aug 26, 2020

got it! Works like a charm! Check out https://www.irio.co.uk/div/almost.mp4 (when the first effect transitions, it reverts to the raw uncompressed data and scrambles) - but - thanks a bunch! Slow as %(#/%, but who cares.

I've added support in TRSE for an automatic "compression" flag, so the user will be able to include any resource as such:
..
myData : incbin("myImage.bin") compressed; // this will automatically compress +include the data

and within each M68K system, I've added a "compression" library so you can simply do "Compression::Decompress(myData, uncompressed_storage);"

Since I'm using (a modified version) of both your C++ compression / m68K decompression code, I'll add you to the list of credits of TRSE (again, www.turborascal.com). Also, I'll retain the header you have in the .h file. Let me know if I should add anything else

regards

Nicolaas

@steve-chamberlin
Copy link
Owner

Great, glad to hear it works. Have you read this: https://www.bigmessowires.com/2016/05/06/fc8-faster-68k-decompression/

FC8 is a balance between compression density and decompression speed. If decompression speed isn't very important for your purposes, then tighter compression is definitely possible. See the "tradeoffs" section of the linked blog post.

@steve-chamberlin
Copy link
Owner

Could you also submit your 68000 changes as a patch or a separate file? I'm sure others would benefit.

@leuat
Copy link
Author

leuat commented Aug 26, 2020

yup well I'll have to rewrite the library first, add some comments and make it more accessible. Also, since it is written as a library within TRSE, there will be some hybrid stuff - Pascal mixed with assembler. For for the most part, it should be simple to copy-paste. And yes, most of the decompression will be of images and demo effect data - stuff that can be done while the music is playing + the raster irq can do some simple effects, so no problem if the load time is 2s instead of 1 for an image. Would therefore love to have higher compression rates, especially since I'd like to keep things in RAM (as opposed to loading resources from disk).

@leuat
Copy link
Author

leuat commented Aug 26, 2020

The M68K decompression library (for the Atari ST) in TRSE can be found here : https://github.com/leuat/TRSE/blob/master/Publish/tutorials/ATARI520ST/tru/compression.tru

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants