Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manifest (performance improvement) #182

Closed
wants to merge 3 commits into from

Conversation

notcancername
Copy link

@notcancername notcancername commented Jan 31, 2023

WIP

This will probably be replaced by a hash-based manifest given the official site has 530K files.....

This PR will add support for the manifest.txt proposed in #85. It's expected to bring performance improvements. Closes #85.

Description

The manifest is optional. It must located at the root of the base URL, with the name manifest.txt. It must contain a list of LF, CRLF, LFCR, or CR-terminated paths, relative to the base URL. A path is made up of components, separated by either / or \. Each path component must not include any of the bytes /, \, LF, or CR. If the manifest is present, it must contain paths of all files and directories accessible to the client. A manifest may include a leading empty path component or a leading . path component. However, this is discouraged. The manifest should be sorted in ascending order.

Design issues

  • LF and CR are used as separators. Thus, paths may not contain LF or CR. This is done for ease of generation with find . >manifest.txt or whatever the Windows equivalent is. To fix this, escaping and/or zero-termination may be used instead.
  • \ is a valid path separator. Thus, /-separated path components may not contain \. This is done for compatibility with Windows.
  • Mixing \ and / path component separators is allowed. This is inelegant, but easy to process.

Implementation issues

  • Currently, we use binary search to determine whether an element exists. Thus, we will have to make log2(n)+1 comparisons in the worst case. This is not an issue for vanilla, a vanilla manifest is about 3000 paths and the worst case number of comparisons is 13. Also, storing all the flle paths wastes space and causes a bunch of cache misses. Consider using a hash set.
  • I kept the old fileExists to not break the API. Look into removing it.

Practical issues

  • In my testing, the fact that all the URLs are specified in lower case but the actual paths are sentence case caused issues. Not sure how to address that other than to specify it correctly.
  • I am a complete beginner at both JS and TS and have never done webdev. Please, PLEASE check all the code very thoroughly. While I have tested everything both with and without a manifest, I might still have fucked something up unknowingly.

@Salanto
Copy link

Salanto commented Jan 31, 2023

In my testing, the fact that all the URLs are specified in lower case but the actual paths are sentence case caused issues. Not sure how to address that other than to specify it correctly.

Convert to lowercase, lol. AO should've adopted a standard for content spelling years ago.

@notcancername
Copy link
Author

notcancername commented Jan 31, 2023

@Salanto That's what I ended up doing.

@notcancername notcancername changed the title Manifest Manifest (performance improvement) Jan 31, 2023
stonedDiscord
stonedDiscord previously approved these changes Jan 31, 2023
Copy link
Member

@stonedDiscord stonedDiscord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holy shit it actually became real

@stonedDiscord
Copy link
Member

stonedDiscord commented Jan 31, 2023

the windows command is
dir /S /B /A:-D > manifest.txt
and then you have to remove the absolute paths

@stonedDiscord stonedDiscord dismissed their stale review January 31, 2023 23:14

testing booboo

@notcancername
Copy link
Author

@stonedDiscord You said manfred's sprites don't load anymore after you add the manifest?

@notcancername
Copy link
Author

notcancername commented Jan 31, 2023

confirmed, I guess?
That didn't happen on my test server 🤔
The fact that it's 24(!!!) MB is concerning. We probably need a new approach.

@notcancername
Copy link
Author

I'm closing this. 24 Megs per WebAO session is not tolerable.

@@ -11,7 +13,7 @@ const getAnimLength = async (url) => {
const extensions = ['.gif', '.webp', '.apng'];
for (const extension of extensions) {
const urlWithExtension = url + extension;
const exists = await fileExists(urlWithExtension);
const exists = await fileExistsManifest(urlWithExtension);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some extra spacing here

* Otherwise, look it up in the manifest */
const fileExistsManifest = async (url) =>
new Promise((resolve, reject) => {
if(client.manifest.length == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=== here please

resolve(fileExists(url));
return;
}
const c_url = encodeURI(canonicalizePath(decodeURI(url.slice(AO_HOST.length))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, JS uses camel case, so it may be a good idea to stick with that schema. Also calling it the full canonical.
Kinda nitpicking that since c has to be slightly interpreted

} else {
url = `${characterFolder}${encodeURI(charactername)}/${encodeURI(
prefix
)}${encodeURI(emotename)}${extension}`;
}
const exists = await fileExists(url);

const exists = await fileExistsManifest(url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to update the unit tests for setEmote

else if (comparison < 0)
start = middle + 1;
else
end = middle - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get unit tests for this file and can you also convert it to TS?

}
const c_url = encodeURI(canonicalizePath(decodeURI(url.slice(AO_HOST.length))));

if(binarySearch(client.manifest, c_url) != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!==

/* Removes `/./`, `/../`, and `//`.
* Does not add a leading `/` or `./`.
* Does not add a trailing `/`. */
export function canonicalizePath(path: string): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to update the unit tests please

* Does not add a leading `/` or `./`.
* Does not add a trailing `/`. */
export function canonicalizePath(path: string): string {
const path_elements = path.split("/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camelCasing here please

@@ -10,11 +10,11 @@ const tryUrls = async (url: string) => {
for (let i = 0; i < urlExtensionsToTry.length; i++) {
const extension = urlExtensionsToTry[i]
const fullFileUrl = url + extension
const exists = await fileExists(fullFileUrl);
const exists = await fileExistsManifest(fullFileUrl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests for this please

@oldmud0
Copy link
Member

oldmud0 commented Feb 1, 2023

I would suggest making separate manifests in these places:

  • base/characters/**/manifest.txt - basically, one manifest per character. When a character is loaded at startup, also load the manifest.
  • base/sound/manifest.txt
  • base/evidence/manifest.txt
  • base/background/manifest.txt
  • base/manifest.txt that covers the remainder.

Then make a file that instructs webAO where each manifest is located, and to preload each manifest (except for the character manifests). This would be useful for a more general mount system where there are multiple sources of assets.

There is a more esoteric approach that focuses on your particular need, which is to use filename hashes and also use their byte representation instead of text. You could also try to encode it in a tree structure like Huffman coding, e.g.

9f -- 64 -- 2d -- 1e      value: 9f642d1e
         \- 1c -- 2d      value: 9f641c2d
               \- cc      value: 9f641ccc

encoded:
9f 64 2d 1e UP UP 1c 2d UP cc
where the UP character can be anything

@Salanto
Copy link

Salanto commented Feb 1, 2023

Getting some air conditioning vibes rn :
https://github.com/AttorneyOnline/AO3Protocol/blob/master/assets.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support server content file manifest via manifest.txt to reduce number of guessing
5 participants