
RAM Usage issue with m3u8 videos #8

Closed
fl4shforward opened this issue Jan 5, 2024 · 15 comments · Fixed by #10
Labels: performance (Unoptimized CPU/RAM usage)
@fl4shforward

fl4shforward commented Jan 5, 2024

Hi!
I dockerized your fork and run it on a NAS with 8 GB of RAM.

A creator I follow posts 15-20 minute 4K videos (1.5 to 3 GB file size) that make the scraper's RAM usage explode and saturate all 8 GB of RAM on the NAS. At most, the scraper was using close to 6 GB.
[Screenshot: NAS RAM usage graph, 2024-01-04]

As you can see in the screenshot above, the NAS starts aggressively killing everything to reclaim RAM (all containers and NAS services).
I was able to work around the issue by limiting RAM usage with Docker limits, but the scraper runs super slowly because of it.

I'm under the impression that m3u8 videos are fully downloaded into RAM before being written out to a file; is that right?
Would this be something that could be mitigated?

@prof79
Owner

prof79 commented Jan 5, 2024

Hi!

The M3U8 code is largely from the original codebase; regrettably, I am not an expert in M3U8 or stream processing in general, nor in Python in particular.

All TS streams are plucked from the M3U8, downloaded, and then re-muxed (demuxed/muxed). I don't know whether memory usage already explodes during the downloads or only during the re-muxing. I see Avnsx properly uses a streamed web request to save memory, but all streams are downloaded almost at once using a thread pool. For the re-muxing I currently lack the knowledge.

I can do some more research and try to debug it, but without a concrete sample where this can be observed, it is even more difficult. I also had to postpone some important private matters due to #3, so I beg your patience.

@prof79 prof79 self-assigned this Jan 5, 2024
@prof79 prof79 added the help wanted Extra attention is needed label Jan 5, 2024
@prof79
Owner

prof79 commented Jan 5, 2024

Now I see it, you are absolutely right: although the downloads are streamed/chunked, everything goes into a memory buffer, and all .ts file contents are then collected/merged, also in memory!

This will take quite some effort: re-writing the code to use temporary files on disk instead, without breaking anything in the process. According to my schedule this may take 14 days or so; if I get a free spot, maybe earlier, but no promises.
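For illustration, the fix prof79 describes, streaming each segment to a temporary file instead of collecting it in a memory buffer, can be sketched like this. This is a minimal sketch, not the downloader's actual code; the helper name and the usage comment are assumptions.

```python
import shutil
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # copy 64 KiB at a time; peak RAM stays around one chunk


def save_stream_to_file(stream, target: Path) -> int:
    """Copy a file-like object to disk in fixed-size chunks.

    Unlike reading the whole body into a bytes buffer, memory usage is
    bounded by CHUNK_SIZE regardless of how large the segment is.
    Returns the number of bytes written.
    """
    with open(target, "wb") as out:
        shutil.copyfileobj(stream, out, length=CHUNK_SIZE)
    return target.stat().st_size


# Hypothetical usage with a streamed HTTP response (names are illustrative,
# not the project's actual API):
#
# with requests.get(segment_url, stream=True) as resp:
#     resp.raise_for_status()
#     save_stream_to_file(resp.raw, tmp_dir / "segment_000.ts")
```

The merge step would then read the temporary .ts files from disk rather than concatenating in-memory buffers, so peak RAM no longer scales with video size.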

@prof79 prof79 added performance Unoptimized CPU/RAM usage and removed help wanted Extra attention is needed labels Jan 5, 2024
@fl4shforward
Author

fl4shforward commented Jan 5, 2024

> Now I see it, you are absolutely right - although the downloads are streamed/chunked, everything goes into a memory buffer and all .ts file contents are then collected/merged - also in memory!

That would explain why RAM usage is about double the size of the file. That's roughly how I guessed it was working from a quick look at the code.

No worries though, I'm not in a rush; it works, slowly, but it works.
Since I run it on a NAS it's really not an urgent issue.

I was just sharing my discovery 😄

I run it fully headless and noticed the issue when my monitoring started alerting me that all my services were going down lol

@prof79
Owner

prof79 commented Jan 5, 2024

Thanks a lot for sharing; I neither know any 4K creators nor would I have noticed on my gaming PC 😂 (shame on me)
So, hopefully, all old code paths will get an overhaul sooner or later.

Glad this is just a private NAS and nothing critical 😂

@prof79
Owner

prof79 commented Jan 19, 2024

While writing a little scraper for another site I learned more about M3U8, MPEG-TS and ffmpeg, and I plan on moving to this new way of downloading and processing. This might, however, break de-duplication of existing videos and re-download them. I also hope this will package properly. Stay tuned.

@prof79
Owner

prof79 commented Jan 20, 2024

Hi, you might try this version, but note the warning: try with a different folder or back up your existing creator(s). Though I have some ideas, I do not yet have a solution for the de-duplication issue, as the files essentially become different files once ffmpeg merges them properly.

I'm not sure whether this will work in your Docker container. It didn't work in WSL on a mounted project directory, but I didn't try on native Linux. There might be an issue regarding pyffmpeg and quoting, as mentioned on their GitHub, but the error message is different.

https://github.com/prof79/fansly-downloader-ng/releases/tag/ondemand

@fl4shforward
Author

fl4shforward commented Jan 22, 2024

I'll have to try and build a new image with your new branch.
I'll also have to make a new branch myself with a new submodule version; never done that before and I'm not a git expert by any means.
It will probably take some days until I have the time to figure it out :)

[EDIT]

The following error occurs at the concat step: "Error opening input files: No such file or directory"

 Info | 11:52 || Downloading video '2023-11-18_at_19-46_id_582327431130525696.m3u8'
2024-01-22 11:52:41,156 - pyffmpeg.FFmpeg - INFO - Checking GitHub Activeness: True
2024-01-22 11:52:44,568 - pyffmpeg.FFmpeg - INFO - Using /root/.pyffmpeg/bin/ffmpeg as ffmpeg file
2024-01-22 11:52:44,568 - pyffmpeg.FFmpeg - INFO - Options is: /root/.pyffmpeg/bin/ffmpeg -y -f concat -i "downloads/******/Messages/Videos/_ffmpeg_concat_.ffc" -c copy "downloads/******/Messages/Videos/2023-11-18_at_19-46_id_582327431130525696.mp4" as at now
2024-01-22 11:52:44,572 - pyffmpeg.FFmpeg - ERROR - Error opening input files: No such file or directory
 [43]ERROR | 11:52 || Unexpected error during Messages download: 
Traceback (most recent call last):
  File "/usr/src/fansly-ng/download/common.py", line 144, in process_download_accessible_media
    download_media(config, state, accessible_media)
  File "/usr/src/fansly-ng/download/media.py", line 162, in download_media
    file_downloaded = download_m3u8(
                      ^^^^^^^^^^^^^^
  File "/usr/src/fansly-ng/download/m3u8.py", line 166, in download_m3u8
    ffmpeg.options(
  File "/usr/local/lib/python3.11/site-packages/pyffmpeg/__init__.py", line 292, in options
    raise Exception(self.error)
Exception: Error opening input files: No such file or directory
Continuing in 15 seconds ...

Files and concat are written to disk:

[Screenshot: segment files and concat list written to disk]

_ffmpeg_concat_.ffc is populated:

[Screenshot: contents of _ffmpeg_concat_.ffc, 2024-01-22]

Shouldn't the files be prefixed with the full path (downloads/******/Messages/Videos/) in the _ffmpeg_concat_.ffc file? That doesn't change anything, though.

@prof79
Owner

prof79 commented Jan 22, 2024

Very quick! I wanted to write that I might dispel my Linux doubts in a few days :D

Actually there are two ways to do such concat files, relative or absolute, and absolute would require an unsafe flag. Relative paths are resolved relative to where the concat file is located, so that should be no problem in this case, as I also pass the list file name to pyffmpeg in a fully-qualified manner.
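To illustrate the two variants, a concat list with relative entries and the corresponding ffmpeg invocations might look like this (file names are made up for the example):

```shell
# _ffmpeg_concat_.ffc -- entries are resolved relative to this file's directory:
#   file 'segment_000.ts'
#   file 'segment_001.ts'
#   file 'segment_002.ts'

# Relative paths work with the concat demuxer's default safe mode:
ffmpeg -y -f concat -i downloads/Videos/_ffmpeg_concat_.ffc -c copy output.mp4

# Absolute paths are rejected unless safe mode is disabled with -safe 0:
ffmpeg -y -f concat -safe 0 -i /abs/path/_ffmpeg_concat_.ffc -c copy output.mp4
```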

I rather suspect, though I could not yet test it due to hashing headaches, that pyffmpeg does some fancy quoting or non-quoting of the command-line options in the background. Thus I'll try a version where I manually launch the ffmpeg binary provided by pyffmpeg using the subprocess module. With some luck this might already be a winner. I can at least tell you that using pyffmpeg's binary directly from a WSL command line with a set of .ts files and a list file works.

There are also several issues in the pyffmpeg GitHub repository hinting that Linux support is currently broken and that they, for whatever reason, do not merge the necessary changes/pull requests upstream. But it is the only package I could find so far that is not overblown/overly complicated and includes an ffmpeg binary independent of platform.
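The subprocess idea can be sketched as follows. This is a hypothetical sketch, not the project's actual code; function names and argument handling are illustrative. Passing an argument list to subprocess avoids the shell entirely, which sidesteps any quoting issue in pyffmpeg's option string handling.

```python
import subprocess
from pathlib import Path


def build_concat_command(ffmpeg_bin: str, list_file: Path, output: Path) -> list[str]:
    """Build the argv for an ffmpeg concat-demuxer merge."""
    return [
        ffmpeg_bin,
        "-y",                 # overwrite the output file without asking
        "-f", "concat",       # use the concat demuxer
        "-i", str(list_file), # the _ffmpeg_concat_.ffc list file
        "-c", "copy",         # remux only, no re-encoding
        str(output),
    ]


def run_concat(ffmpeg_bin: str, list_file: Path, output: Path) -> None:
    # check=True raises CalledProcessError on a non-zero exit code,
    # giving real exception handling instead of parsing log output.
    subprocess.run(
        build_concat_command(ffmpeg_bin, list_file, output),
        check=True,
        capture_output=True,
    )
```

Since the argv is a list, no quoting is applied or needed, regardless of spaces in the paths.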

@fl4shforward
Author

I'm under the impression that pyffmpeg "overrides" the current working directory, which would obviously cause issues with relative paths. Currently setting up a Linux VM to do some more testing.

> Actually there is two ways to do such concat files - relative or absolute; and absolute would require an unsafe flag.

I'm curious about this; do you know of any docs I could read on that? I never knew absolute paths were unsafe.

> I rather suspect, but could not yet test it due to hashing headaches, that pyffmpeg does some fancy quoting or non-quoting stuff with the command-line options in the background. Thus I'll try a version where I'll manually launch ffmpeg as provided by pyffmpeg using the subprocess module.

Could be a relatively quick and easy fix, yes.

@prof79
Owner

prof79 commented Jan 22, 2024

Yeah, and it gives more control, like exception handling. I can also be fast, and should probably have started out as a psychic 😂

Try this: https://github.com/prof79/fansly-downloader-ng/releases/tag/ondemand - 176c42f 😁

Regarding your question:

https://trac.ffmpeg.org/wiki/Concatenate

https://ffmpeg.org/ffmpeg-formats.html#concat-1 -> see 3.5.2

@prof79
Owner

prof79 commented Jan 22, 2024

I've also implemented a new selective MP4 hashing algorithm that, unlike the old manual method, ignores stupid lavf version info and re-muxing "artifacts" in the header, like deviating bitrates and such, as long as the track data is identical.
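prof79 doesn't show the algorithm here, so the following is a purely illustrative sketch of the general idea: hash only the media data ('mdat') boxes of an MP4 and skip metadata boxes like 'ftyp' and 'moov', so two re-muxes of identical track data hash the same even when ffmpeg rewrote the header. The function name and simplifications are my assumptions, not the project's implementation.

```python
import hashlib
import struct


def hash_mdat_only(mp4_bytes: bytes) -> str:
    """Hash only the payload of top-level 'mdat' boxes in an MP4 file.

    Metadata boxes ('ftyp', 'moov', ...) are skipped, so differing lavf
    version strings or header bitrates do not change the hash.
    Simplified sketch: no 64-bit (size == 1) or to-end-of-file
    (size == 0) box sizes are handled.
    """
    digest = hashlib.sha256()
    pos = 0
    while pos + 8 <= len(mp4_bytes):
        # Each box starts with a 4-byte big-endian size (including the
        # 8-byte header) followed by a 4-byte type code.
        size, = struct.unpack_from(">I", mp4_bytes, pos)
        box_type = mp4_bytes[pos + 4:pos + 8]
        if size < 8:
            break  # malformed box, stop walking
        if box_type == b"mdat":
            digest.update(mp4_bytes[pos + 8:pos + size])
        pos += size
    return digest.hexdigest()
```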

An opt-in, more succinct file naming scheme, probably using a CRC, is also on my personal wishlist. But I still don't know whether using pHash for images is currently beneficial or detrimental ...

@fl4shforward
Author

fl4shforward commented Jan 22, 2024

It seems to work fine. I lifted the Docker limits and I'm trying a full scrape of the creator that raised the issue.
RAM usage is way more under control. No services are down at the moment 😁
[Screenshots: NAS RAM usage after the fix]

Happy NAS !

Regarding your question:

https://trac.ffmpeg.org/wiki/Concatenate

https://ffmpeg.org/ffmpeg-formats.html#concat-1 -> see 3.5.2

Oh, I see ! Never thought of that.

@prof79
Owner

prof79 commented Jan 22, 2024

Awesome! 😁🙏
That's what I expected from profiling with memory-profiler (whose commented-out fragments are still in the code): memory usage aside from the video data was somewhat shy of 100 MiB. If I interpret it correctly, most of your RAM usage is caching by the NAS or the Docker engine itself; the grey-blueish usage bar is almost unnoticeable 😁
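For quick spot checks like this, the standard library's tracemalloc can serve as a lighter-weight alternative to memory-profiler; the helper below is an illustrative sketch, not the profiling code prof79 used.

```python
import tracemalloc


def measure_peak(fn, *args, **kwargs):
    """Run fn and return (result, peak_bytes) of traced Python allocations.

    Peak covers Python-level allocations made while tracing is active,
    which is enough to spot a multi-gigabyte in-memory buffer.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Hypothetical usage on a buffer-heavy step (call names are illustrative):
#
# _, peak = measure_peak(download_m3u8, config, state, media_item)
# print(f"peak: {peak / 2**20:.1f} MiB")
```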

We can leave this open if you want to do some more testing; you can close it yourself, or I'll close it with the next main branch release tomorrow or later. I also need to write up some explanation/release notes, but not today ...

@fl4shforward
Author

Blue is the RAM; yes, the most I've seen used so far is around 900 MB, but most of the time it's around 180-200 MB.
Red is cache; I don't really know what that means, though.

From what I can see, all the big 4K vids are downloaded with no issues. I guess we can close.

@prof79
Owner

prof79 commented Jan 22, 2024

Well, to be honest, I haven't monitored what RAM usage the ffmpeg binary itself contributes during a merge, but that sounds OK I guess.

Cache is cache 😁: all the stuff from disk that is used by the NAS OS/services/Docker and identified as potentially needed often is proactively loaded into RAM, aka cached, since RAM is just so much faster than even the fastest SSD can be. What's more, this ensures good use of your RAM instead of it sitting largely empty all the time 😁 Also, data written back to disk may get buffered (cached) in RAM to speed things up and cut the disks some slack. But cache can be freed/shrunk by the OS as needed.
E.g. my Windows workhorse with 32 GiB of RAM uses 13.1 GiB of its 18.1 GiB of used RAM for caching alone, so effectively only ~5 GiB are required for basic OS services, lots of browser sessions and VS Code.

@prof79 prof79 linked a pull request Jan 27, 2024 that will close this issue