RUA is a new LLM architecture designed with low-memory, high-compute use in mind.
RUA is based on the transformer architecture and has two parts: the encoder modules and the decoder module. The outputs of the encoder modules act as independent K and V values for the decoder.
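
As a rough illustration of the encoder-to-decoder data flow, the sketch below shows standard scaled dot-product cross-attention in NumPy, with queries taken from the decoder and a separate K and V computed from each encoder module's output. This is only one possible reading of "independent K and V values", not RUA's actual implementation. The function names, the dimensions, the use of two encoder modules, and the random weights are all assumptions made for the example. How the per-encoder results are combined is not specified above, so the sketch leaves them separate.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(decoder_states, encoder_output, w_q, w_k, w_v):
    """Scaled dot-product attention: Q from the decoder, K and V from one encoder output."""
    q = decoder_states @ w_q                   # (tgt_len, d_head)
    k = encoder_output @ w_k                   # (src_len, d_head)
    v = encoder_output @ w_v                   # (src_len, d_head)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (tgt_len, src_len)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, d_head = 16, 8                        # hypothetical sizes

# Hypothetical inputs: 4 decoder positions, two encoder modules
# producing sequences of length 6 and 10.
decoder_states = rng.normal(size=(4, d_model))
encoder_outputs = [
    rng.normal(size=(6, d_model)),
    rng.normal(size=(10, d_model)),
]

# One shared query projection, and one (w_k, w_v) pair per encoder module,
# so each encoder output yields its own K and V (an assumption).
w_q = rng.normal(size=(d_model, d_head))
kv_weights = [
    (rng.normal(size=(d_model, d_head)), rng.normal(size=(d_model, d_head)))
    for _ in encoder_outputs
]

# Attend to each encoder module separately; results are kept per encoder.
results = [
    cross_attention(decoder_states, enc, w_q, w_k, w_v)
    for enc, (w_k, w_v) in zip(encoder_outputs, kv_weights)
]

for context, weights in results:
    print(context.shape, weights.shape)
```

Each entry in `results` holds the decoder's context vectors and attention weights for one encoder module. Because K and V depend only on the encoder output, they could in principle be computed once per encoder and reused across decoding steps.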