changelog
version 598
misc
- I screwed up the import folder deduplication fix last week! it caused import folders that contained duplicated items (and a handful of subscriptions, and even one normal GUI session) to not be able to save back their work. nothing was damaged, _per se_, but progress was not being saved and work was stopping after the respective systems paused out of safety. I am sorry for the trouble and worry here, and I hate it when this kind of error happens. I did make a test to check this worked, but it wasn't good enough. I have fixed it now and I am rejigging my test procedures to explicitly check for this specific class of object type problem (issue #1624)
- fixed the duplicate filter comparison statements to obey the new 'do not use pretty (720p etc.) resolution swap-in strings' option (issue #1621)
- the 'maintenance and processing' page now has some expand/collapse stuff on its boxes to make the options page not want to be so tall on startup
- the 'edit filename tagging options' panel under the 'edit import folder' dialog now auto-populates the example filename from the actual folder's current contents. thanks to a user for pointing this out
- moved a bunch of checkboxes around in the options. `options->tags` is renamed `tag autocomplete tabs` and now just handles children and favourites. `search` is renamed `file search` and handles the 'read' autocomplete and implicit system:limit, and a new page `tag editing` is added to handle the 'write' autocomplete and various 'tag service panel' settings
- the normal search page 'selection tags' list now only computes the tags for the first n thumbnails (default 4096) on a page when you have no files selected. this saves time on mega pages when you click off a selection and also on giant import pages where new files are continually streaming in at the end. I expect this to reduce CPU and lag significantly on clients that idle looking at big import pages. you can set the n under `options->tag presentation`, including turning it off entirely. I did some misc optimisation here too, but I also found some places I can improve the general tag re-compute in future cleanup work
- I may have improved some media viewer hover window positioning, sizing, and flicker in layout, particularly on the note window
- the 'do really heavy sibling and parents calculation work in the background' daemon now waits 60 seconds after boot to start work (previously 10s). since I added the new fast sibling and parent cache (which works quickly but takes some extra work to initialise), I've noticed you often get a heap of lag as this guy is initially populated right after boot. so, the primary caller now happens a little later in the boot rush and _should_ smooth out the curve a little
listbooks
- I rewrote the 'ListBook' the options dialog relies on from ancient and ill-designed wx code to a nice clean simple Qt panel
- if you have a ton of tag services, a new 'use listbook instead of tabbed notebook for tag service panels' checkbox under `options->tag editing` now lets you use the new listbook instead of the old notebook/tabbed widget in: manage tags, manage tag siblings, manage tag parents, manage tag display and application, and review tag display sync
drag and drops
- moved the DnD options out of `options->gui` and into a new `exporting` panel, and added a bit of explanatory text
- the BUGFIX 'secret' Discord fix is now formalised into an always-on 'set the DnD to a move flag', with a nice explanatory tooltip. it is now also always safe, because it will now only ever run if you are set to export your DnDs to the temp folder
- the 'DnD temp folder' system is now cleaner, and DnD temp folders will now be deleted after six hours (previously they were only cleaned up on client exit)
- added a note to the 'getting started with files' help to say you can export files with drag and drop
some multi-column list fixes
- fixed a bad list type definition in the new auto-resolution rules UI. it thought it was the export folder dialog's list and was throwing weird errors if that list was sorted in column >=4
- if a multi-column list fails to sort, it now catches and displays the error and continues with whatever was going on at the time
- if a multi-column list status is asked for a non-existing column type, the status now reports the error info and attempts its best fallback
- improved multi-column list initialisation across the board so the above problem cannot happen again (the list type was being set in two different locations, and I missed a ctrl+c/v edit)
parsing
- behind the scenes, the 'subsidiary page parser' is now one object. it was a janky thing before
- the subsidiary page parsers list in the parsing edit UI now has import/export/duplicate buttons
- it doesn't matter outside of file log post order, I don't think, but subsidiary page parsers now always work in alphabetical order
- they also now name themselves specifically when they cause an error
- parsers now deduplicate the list when saying what they 'produce/parse' in UI
boring linting cleanup
- tweaked my linter settings to better catch some stupid errors and put the effort into cleaning up the hundreds of long-time warnings, probably more than a thousand items of Qt Signal false-positive spam, and the actual real bugs. I am hoping to better expose future needles without a haystack of garbage in the way. I am determined to maintain a 0 error count on Unresolved References going forward
- every single unused import statement is now removed or suppressed. I'm sure there are still tangles and bad ideas generally, but everything is completely lean now
- fixed some PILImage enum references
- improved some hydrus serialisable typedefs
- fixed some exception/warning defs
- deleted some old defunct 'retry' code from subscriptions
- fixed some bitmap generation code to handle non-c-contiguous memoryviews properly
- cleaned up some html parsing to properly navigate weird stuff bs4 might put out
- fixed a stupid type error in the old HydrusTagArchive namespace code
- fixed some account type calls in _manage services_ auto-account creation
- fixed an issue with unusual tab drag and drops
- deleted the empty `TestClientData.py`
- deleted the empty `ServerServices.py`
- fixed a bunch of misc typedefs in general
boring build/source stuff
- updated my Windows 'running from source' help to now say you need to put the sqlite3.dll in your actual python DLLs dir. as this is more scary than just dropping it in your hydrus install dir, I emphasise this is optional
- updated my 'requirements_server.txt', which is not used directly but serves as a reference, to use the new requests and setuptools versions we recently updated to
- I am dropping support for the ancient OpenCV 2. we've had some enum monkeypatches in place for years and years, but I don't know if it would even run on any modern python; it is gone now
version 597
hydrus network - client and server
The hydrus network client is a desktop application written for Anonymous and other internet enthusiasts with large media collections. It organises your files into an internal database and browses them with tags instead of folders, a little like a booru on your desktop. Tags and files can be anonymously shared through custom servers that any user may run. Everything is free, nothing phones home, and the source code is included with the release. It is developed mostly for Windows, but builds for Linux and macOS are available (perhaps with some limitations, depending on your situation).
The software is constantly being improved. I try to put out a new release every Wednesday by 8pm Eastern.
Hydrus supports various filetypes for images, video and audio files, image project files, and more. A full list of supported filetypes is here.
On the Windows and Linux builds, an MPV window is embedded to play video and audio smoothly. For files like pdf, which cannot currently be viewed in the client, it is easy to launch any file with your OS's default program.
The client can download files and parse tags from a number of websites, including by default:
- 4chan and other imageboards, with a thread watcher
- the popular boorus
- gallery sites like deviant art, hentai foundry, and pixiv
- tumblr and twitter
And can be extended to download from more locations using easily shareable user-made downloaders. It can also be set to 'subscribe' to any gallery search, repeating it every few days to keep up with new results.
The program's emphasis is on your freedom. There is no DRM, no spying, no censorship. The program never phones home.
"},{"location":"index.html#start_here","title":"Start Here","text":"If you would like to try hydrus, I strongly recommend you check out the help and getting started guide. It will take you through all the main systems.
"},{"location":"index.html#links","title":"links","text":"- homepage
- github (latest build)
- issue tracker
- 8chan.moe /t/ (Hydrus Network General)
- tumblr
- x
- discord
- patreon
- user-run repository and wiki (including download presets for several non-default boorus)
- more links and contact info
Fixing Hydrus Random Crashes Under Linux

Why is hydrus crashing?

- Hydrus crashes without a crash log
- Standard error reads `Killed`
- System logs say OOMKiller
- Programs appear to have very high virtual memory utilization despite low real memory.
Add the following lines to the end of `/etc/sysctl.conf`. You will need admin, so use `sudo nano /etc/sysctl.conf` or `sudo gedit /etc/sysctl.conf`:

```
vm.min_free_kbytes=1153434
vm.overcommit_memory=1
```
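These lines take effect at the next boot. If you want to apply them to the running system as well, a typical approach (assuming the stock `sysctl` tool) is to reload the file or set the keys directly:

```
# reload every setting in /etc/sysctl.conf
sudo sysctl -p

# or set just these two keys for the current session
sudo sysctl vm.min_free_kbytes=1153434
sudo sysctl vm.overcommit_memory=1
```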
Check that you have (enough) swap space or you might still run out of memory.

```
sudo swapon --show
```
If you need swap:

```
sudo fallocate -l 16G /swapfile #make 16GiB of swap
sudo chmod 600 /swapfile
sudo mkswap /swapfile
```

Add to `/etc/fstab` so your swap is mounted on reboot:

```
/swapfile swap swap defaults 0 0
```
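The fstab entry only mounts the swapfile on the next boot; to start using it right away, something like the following should work (assuming the `/swapfile` path used above):

```
sudo swapon /swapfile
```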
You may add as many swapfiles as you like, and should add a new swapfile before you delete an old one if you plan to do so, as unmounting a swapfile will evict its contents back into real memory. You may also wish to use a swapfile type that uses compression: this trades a little performance for some disk space, and it also saves significantly on mostly empty memory.
Reboot for all changes to take effect, or use `sysctl` to set `vm` variables directly.

Details

Linux's memory allocator is lazy and does not perform opportunistic reclaim. This means that the system will continue to give your process memory from the real and virtual memory pool (swap) until there is none left.

Linux will only clean up if the available total real and virtual memory falls below the watermark as defined in the system control configuration file `/etc/sysctl.conf`. The watermark's name is `vm.min_free_kbytes`; it is the number of kilobytes the system keeps in reserve, and therefore the maximum amount of memory the system can allocate in one go before needing to reclaim memory it gave earlier but which is no longer in use.

The default value is `vm.min_free_kbytes=65536`, which means about 66MB (megabytes).

If for a given request the amount of memory asked to be allocated is under `vm.min_free_kbytes`, but this would result in an amount of total free memory less than `vm.min_free_kbytes`, then the OS will clean up memory to service the request.

If `vm.min_free_kbytes` is less than the amount requested and there is no virtual memory left, then the system is officially unable to service the request and will launch the OOMKiller (Out Of Memory Killer) to free memory by killing memory-glut processes.

Increase the `vm.min_free_kbytes` value to prevent this scenario.

The OOM Killer

The OOM Killer decides which program to kill to reclaim memory. Since hydrus loves memory, it is usually picked first, even if another program asking for memory caused the OOM condition. Setting the minimum free kilobytes higher will avoid the running of the OOMKiller, which is always preferable, and almost always preventable.

Memory Overcommit

We mentioned that Linux will keep giving out memory, but actually it's possible for Linux to launch the OOM Killer if it just feels like our program is asking for too much memory too quickly. Since hydrus is a heavyweight scientific processing package, we need to turn this feature off. To turn it off, change the value of `vm.overcommit_memory`, which defaults to `2`.

Set `vm.overcommit_memory=1`. This prevents the OS from using a heuristic, and it will just always give memory to anyone who asks for it.

What about swappiness?

Swappiness is a setting you might have seen, but it only determines Linux's desire to spend a little bit of time moving memory you haven't touched in a while out of real memory and into virtual memory; it will not prevent the OOM condition, it just determines how much time to use for moving things into swap.

Why does my Linux system stutter or become unresponsive when hydrus has been running a while?

You are running out of pages because Linux releases I/O buffer pages only when a file is closed, OR memory fragmentation in hydrus is high because you have a big session weight or had a big I/O spike. Thus the OS is waiting for you to hit the watermark (as described in "why is hydrus crashing") to start freeing pages, which causes the chug.

When content is written from memory to disk, the page is retained so that if you reread that part of the disk the OS does not need to access the disk; it just pulls it from the much faster memory. This is usually a good thing, but because hydrus makes many small writes to files you probably won't be asking for again soon, it eats up pages over time.

Hydrus also holds the database open and reads/writes new areas of it often, even if it will not access those parts again for ages. It tends to accumulate lots of I/O cache for these small pages it will not be interested in. This is really good for hydrus (because it will over time have the root of the most important indexes in memory) but sucks for the responsiveness of other apps, and will cause hydrus to consume pages after doing a lengthy operation in anticipation of needing them again, even when it is thereafter idle. You need to set `vm.dirtytime_expire_seconds` to a lower value.

`vm.dirtytime_expire_seconds`: When a lazytime inode is constantly having its pages dirtied, the inode with an updated timestamp will never get a chance to be written out. And, if the only thing that has happened on the file system is a dirtytime inode caused by an atime update, a worker will be scheduled to make sure that inode eventually gets pushed out to disk. This tunable is used to define when a dirty inode is old enough to be eligible for writeback by the kernel flusher threads. And, it is also used as the interval to wake up the dirtytime writeback thread.

On many distros this happens only once every 12 hours; try setting it closer to every one or two hours. This will cause the OS to drop pages that were written over 1-2 hours ago, returning them to the free store for use by other programs.

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

Why does everything become clunky for a bit if I have tuned all of the above settings? (especially if I try to do something on the system that isn't hydrus)

The kernel launches a process called `kswapd` to swap and reclaim memory pages; after hydrus has used pages, they need to be returned to the OS (unless fragmentation is preventing this). The OS needs to scan for pages allocated to programs which are not in use; it doesn't do this all the time, because holding the required locks would have a serious performance impact. The behaviour of `kswapd` is governed by several important values. If you are using a classic system with a reasonably sized amount of memory and a swapfile, you should tune these. If you are using memory compression (or should be using memory compression because you have a cheap system), read this whole document for info specific to that configuration.

- `vm.watermark_scale_factor`: This factor controls the aggressiveness of kswapd. It defines the amount of memory left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep. The unit is in fractions of 10,000. The default value of 10 means the distances between watermarks are 0.1% of the available memory in the node/system. The maximum value is 1000, or 10% of memory. A high rate of threads entering direct reclaim (allocstall) or kswapd going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate that the number of free pages kswapd maintains for latency reasons is too small for the allocation bursts occurring in the system. This knob can then be used to tune kswapd aggressiveness accordingly.
- `vm.watermark_boost_factor`: If memory fragmentation is high, raise the scale factor to look for reclaimable/swappable pages more aggressively.

I like to keep `watermark_scale_factor` at 70 (70/10,000 = 0.7%), so kswapd will run until at least 0.7% of system memory has been reclaimed, i.e. with 32GiB (real and virt) of memory, it will try to keep at least 0.224GiB immediately available.

- `vm.dirty_ratio`: The absolute maximum amount of un-synced memory (as a percentage of available memory) that the system will buffer before blocking writing processes. This protects you against OOM, but does not keep your system responsive.
    - Note: A default installation of Ubuntu sets this way too high (60%), as it does not expect your workload to just be hammering possibly slow disks with written pages. Even with memory overcommitting this can make you OOM, because you will run out of real memory before the system pauses the program that is writing so hard. A more reasonable value is 10 (10%).
- `vm.dirty_background_ratio`: The number of unsynced pages that can exist before the system starts committing them in the background. If this is set too low, the system will constantly spend cycles trying to write out dirty pages. If it is set too high, it will be way too lazy. I like to set it to 8.
- `vm.vfs_cache_pressure`: The tendency for the kernel to reclaim I/O cache for files and directories. This is less important than the other values, but hydrus opens and closes lots of file handles, so you may want to boost it a bit higher than default. Default=100; set it to 110 to bias the kernel into reclaiming I/O pages over keeping them at a "fair rate" compared to other pages. Hydrus tends to write a lot of files and then ignore them for a long time, so it's a good idea to prefer freeing pages for infrequent I/O. Note: Increasing `vfs_cache_pressure` significantly beyond 100 may have a negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With `vfs_cache_pressure=1000`, it will look for ten times more freeable objects than there are.

An example `/etc/sysctl.conf` section for virtual memory settings:

```
########
# virtual memory
########

#1 always overcommit, prevents the kernel from using a heuristic to decide that a process is bad for asking for a lot of memory at once and killing it.
#https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
vm.overcommit_memory=1

#force linux to reclaim pages if under a gigabyte
#is available so large chunk allocates don't fire off the OOM killer
vm.min_free_kbytes = 1153434

#Start freeing up pages that have been written but which are in open files, after 2 hours.
#Allows pages in long lived files to be reclaimed
vm.dirtytime_expire_seconds = 7200

#Have kswapd try to reclaim .7% = 70/10000 of pages before returning to sleep
#This increases responsiveness by reclaiming a larger portion of pages in low memory condition
#So that the next time you make a large allocation the kernel doesn't have to stall and look for pages to free immediately.
vm.watermark_scale_factor=70

#Have the kernel prefer to reclaim I/O pages at 110% of the rate at which it frees other pages.
#Don't set this value much over 100 or the kernel will spend all its time reclaiming I/O pages
vm.vfs_cache_pressure=110
```

Virtual Memory Under Linux 5: Phenomenal Cosmic Power; Itty bitty living space
Are you trying to run hydrus on a 200 dollar miniPC, this is suprisingly doable, but you will need to really understand what you are tuning.
To start, let's explain memory tiers. As memory moves further away from the CPU it becomes slower. Memory close to the CPU is volatile, which means if you remove power from it, it disappears forever. Conversely, disk is called non-volatile memory, or persistent storage. We want to get files written to non-volatile storage, and we don't want to have to compete to read non-volatile storage; we would also prefer not to have to compete for writing, but this is harder.
The most straightforward way of doing this is to separate where hydrus writes its SQLite database (index) files from where it writes the imported files. But we can make a more flexible setup that will also keep our system responsive; we just need to make sure that the system writes to the fastest possible place first. So let's illustrate the options.
graph\n direction LR\n CPU-->RAM;\n RAM-->ZRAM;\n ZRAM-->SSD;\n SSD-->HDD;\n subgraph Non-Volatile\n SSD;\n HDD;\n end
- RAM: Information must be in RAM for it to be operated on
- ZRAM: A compressed area of RAM that cannot be directly accessed. Like a zip file but in memory. Or for the more technical, like a compressed ramdisk.
- SSD: Fast non-volatile storage, good for random access, about 100-1000x slower than RAM.
- HDD: Slow non-volatile storage, good for random access. About 10000x slower than RAM.
- Tape(Not shown): Slow archival storage or backup. surprisingly fast actually but can only be accessed sequentially.
The objective is to make the most of our limited hardware so we definitely want to go through zram first. Depending on your configuration you might have a bulk storage (NAS) downstream that you can write the files to, if all of your storage is in the same tower as you are running hydrus, then make sure the SQLITE .db files are on an SSD volume.
Next you should enable ZRAM devices (not to be confused with zswap). A ZRAM device is a compressed swapfile that lives in RAM.
ZRAM can drastically improve performance and RAM capacity. Experimentally, a 1.7GB partition usually shrinks to around 740MiB. Depending on your system, ZRAM may generate several partitions; the author asked for 4x2GiB=8GiB of partitions, hence the cited ratio.
ZRAM must be created every boot, as RAM disks are lost when power is removed, so install a zram generator as part of your startup process. If you still do not have enough swap, you can still create a swapfile: ZRAM can be configured to use a partition as fallback, but not a file; however, you can enable a standard swapfile as described in the prior section. ZRAM generators usually create ZRAM partitions with the highest priority (lowest priority number), so ZRAM will fill up first, before normal disk swapping.
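As a rough sketch of what such a generator automates at every boot (the device name, 8G size, and zstd compression here are illustrative assumptions, not recommendations):

```
# load the zram module and configure one compressed swap device
sudo modprobe zram
echo zstd | sudo tee /sys/block/zram0/comp_algorithm
echo 8G | sudo tee /sys/block/zram0/disksize

# format it as swap and enable it with a higher priority than any disk swapfile
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0
```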
To check your swap configuration:

```
swapon #no argument
cat /proc/swaps
```
To make maximum use of your swap, make sure to SET THE FOLLOWING VM SETTINGS:

```
#disable IO clustering since we are writing to memory which is super fast
#IF YOU DO NOT DO THIS YOUR SYSTEM WILL HITCH as it tries to lock multiple RAM pages. This would be desirable on non-volatile storage but is actually bad on RAM.
vm.page-cluster=0

#Tell the system that it costs almost as much to swap as to write out dirty pages. But bias it very slightly to completing writes. This is ideal since hydrus tends to hammer the system with writes, and we want to use ZRAM to eat spikes, but also want the system to slightly prefer writing dirty pages
vm.swappiness=99
```
The above is good for most users. If, however, you also need to speed up your storage because a high number of applications on your network are using it, you may wish to install a cache, provided you have at least one or two available SSD slots and the writing pattern is many small random writes.
You should never create a write cache without knowing what you are doing. You need two SSDs to cross-check each other, and ideally resilient server SSDs with large capacitors that ensure all content is always written. If you go with a commercial storage solution, they will probably check this already and give you a nice interface for just inserting and assigning SSD cache.
You can also create a cache manually with the Logical Volume Manager (LVM). If you do this, you can group storage volumes together; in particular, you can put a read or write cache on an SSD in front of a slower HDD.
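If you do go the LVM route, the general shape is something like the following sketch; the volume group, logical volume, and device names are placeholders, and `writethrough` is chosen because, as noted above, a write-back cache needs redundant, power-safe SSDs:

```
# assume vg0 already holds the slow HDD logical volume 'files',
# and /dev/nvme0n1p1 is a spare SSD partition to donate as cache
sudo vgextend vg0 /dev/nvme0n1p1

# create a cache pool on the SSD and attach it to the slow volume
sudo lvcreate --type cache-pool -L 100G -n files_cache vg0 /dev/nvme0n1p1
sudo lvconvert --type cache --cachemode writethrough --cachepool vg0/files_cache vg0/files
```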
"},{"location":"PTR.html","title":"PTR for Dummies","text":"or Myths and facts about the Public Tag Repository
"},{"location":"PTR.html#what_is_the_ptr","title":"What is the PTR?","text":"Short for Public Tag Repository, a now community managed repository of tags. Locally it acts as a tag service, just like
my tags
. At the time of writing 54 million files have tags on it. The PTR only store the sha256 hash and tag mappings of a file, not the files themselves or any non-tag meta data. In other words: If you do not see it in the tag list then it is not stored.Most of the things in this document also applies to self-hosted servers, except for tag guidelines.
"},{"location":"PTR.html#connecting_to_the_ptr","title":"Connecting to the PTR","text":"The easiest method is to use the built in function, found under
help -> add the public tag repository
. For adding it manually, if you so desire, read the Hydrus help document on access keys.Once you are connected, Hydrus will proceed to download and then process the update files. The progress of this can be seen under
services -> review services -> remote -> tag repositories -> public tag repository
. Here you can view its status, your account (the default account is a shared public account. Currently only janitors and the administrator have personal accounts), tag status, and how synced you are. Being behind on the sync by a certain amount makes you unable to push tags and petitions until you are caught up again.QuickSync 2
If you are starting out with a completely fresh client (i.e. you have not imported any files yet), you can instead download a fully pre-synced client here (overview) Though a little out of date, it will nonetheless save processing time. Some settings may differ from the defaults of an official installation.
"},{"location":"PTR.html#how_does_it_work","title":"How does it work?","text":"For something to end up on the PTR it has to be pushed there. Tags can either be entered into the tag service manually by the user through the
manage tags
window, or be routed there by a parser when downloading files. See parsing tags. Once tags have been entered into the PTR tag service they are pending until pushed. This is indicated by thepending ()
that will appear betweentags
andhelp
in the menu bar. Here you can chose to either push your changes to the PTR or discard them.- Adding tags pass automatically.
- Deleting (petitioning) tags requires janitor action.
- If a tag has been deleted from a file it will not be added again.
- Currently there is no way for a normal user to re-add a deleted tag. If it gets deleted then it is gone. A janitor can undelete tags manually.
- Adding and petitioning siblings and parents all require janitor action.
- The client always assumes the server approves any petition. If your petition gets rejected you wont know.
When making petitions it is important to remember that janitors are only human. We do not necessarily know everything about every niche. We do not necessarily have the files you are making changes for and we will only see a blank thumbnail if we do not have the file. Explain why you are making a petition. Try and keep the number of files manageable. If a janitor at any point is unsure if the petition is correct they are likely to deny the entire petition rather than risk losing good tags. Some users have pushed changes regarding hundreds of tags over thousands of files at once, but due to disregarding PTR tagging practices or being lazy with justification the petition has been denied entirely. Or they have just been plain wrong, trying to impose frankly stupid tagging methods.
Furthermore, if you are two weeks out of sync with PTR you are unable to push additions or deletions until you're back within the threshold.
Q: Does this automagically tag my files? A: No. Until we get machine learning based auto-tagging nothing is truly automatic. All tags on the PTR were uploaded by another user, so if nobody uploaded tags associated with the hash of your file it won't have any tags in the PTR. Q: How good is the PTR at tagging [insert file format or thing from site here]? A: That depends largely on if there's a scrapable database of tags for whatever you're asking about. Anything that comes from a booru or site that supports tags is fairly likely to have something on the PTR. Original content on some obscure chan-style imageboard is less so. Q: Help! My files don't have any tags! What do!? A: As stated above, some things are just very likely to not have any tags. It is also possible that the files have been altered by whichever service you downloaded from. Imgur, Reddit, Discord, and many other sites and services recompress images to save space which might give it a different hash even if it looks indistinguishable from the original file. Use one of the IQDB lookup programs linked in Cuddle's wiki. Q: Why is my database so big!? This can't be right. A: It is working as intended. The size is because you are literally downloading and processing the entire tag database and history of the PTR. It is done this way to ensure redundancy and privacy. Redundancy because anybody with an up-to-date PTR sync can just start their own. Privacy because nobody can tell what files you have since you are downloading the tags for everything the PTR has. Q: Does that mean I can't do anything about the size? A: Correct. There are some plans to crunch the size through a few methods but there are a lot of other far more requested features being, well, requested. Speaking crassly if you are bothered by the size requirement of the PTR you probably don't have a big enough library to really benefit and would be better off just using the IQDB script."},{"location":"PTR.html#janitors","title":"Janitors","text":"Janitors are the people that review petitions. You can meet us at the community Discord to ask questions or see us bitch about some of the silly stuff boorus and users cause to end up in the PTR.
"},{"location":"PTR.html#tag_guidelines","title":"Tag Guidelines","text":"These are a mix of standard practice used by various boorus and changes made by Hydrus Developer and PTR users, ratified by the janitors that actually have to manage all of this. The \"full\" document is viewable at Cuddle's git repo. See Hydrus Developer's thoughts on a public tagging schema.
If you are looking to help out by tagging low tag-count files, remember to keep the tags objective, start simple by for example adding the characters/persons and big obvious things in the image or what else. Tagging every little thing and detail is a sure path to burnout. If you are looking to petition removal of tags then it is preferable to sibling common misspellings, underscores, and defunct tags rather than deleting them outright. The exception is for ambiguous tags where it is better to delete and replace with a less ambiguous tag. When deleting tags that don't belong in the image it can be helpful if you include a short description as to why. It's also helpful if you sanitise downloaded tags from sites with tagged galleries before pushing them to the PTR. For example Pixiv, where you can have a gallery of multiple images, each containing one character, and all of the characters being tagged. Consequently all images in that gallery will have all of the character tags despite no image having more than one character.
"},{"location":"PTR.html#siblings_and_parents","title":"Siblings and parents","text":"When making siblings, go for the closest less-bad tag. Example:
bad_tag
->bad tag
, rather than going for what the top level sibling might be. This creates less potential future work in case standards change and makes it so your request is less likely to be denied by a janitor not being entirely certain that what you're asking is right. Be careful about creating siblings for potentially ambiguous tags. Isjames bond
supposed to becharacter:james bond
or is itseries:james bond
? This is a bit of a bad example due to having the case of the character always belonging to the series, so you can safely sibling it toseries:james bond
since all instances of the character will also have the series, but not all instances of the series will have the character. So let us look at another example: how aboutwool
? Is it the material harvested from sheep, or is it the Malaysian artist that likes to draw Touhou? In doubtful cases it's better to leave it as is, petition the tag for deletion if it's incorrect and add the correct tag.When making parents, make sure it's an always factually correct relationship.
"},{"location":"PTR.html#namespaces","title":"Namespaces","text":"character:james bond
always belongs toseries:james bond
. Butcharacter:james bond
is not alwaysperson:pierce brosnan
. Common examples of not-always true relationships: gender (genderbending), species (furrynisation/humanisation/anthropomorphism), hair colour, eye colour, and other mutable traits.creator:
Used for the creator of the tagged piece of media. Hydrus being primarily used for images it will often be the artist that drew the image. Other potential examples are the author of a book or musician for a song.character:
Refers to characters. James Bond is a character.person:
Refers to real persons. Pierce Brosnan is a person.series:
Used for series. James Bond is a series tag and so is GoldenEye. Due to usage being different on some boorus chance is that you will also see things like Absolut Vodka and other brands in it.photoset:
Used for photosets. Primarily seen for content from idols, cosplayers, and gravure idols.studio:
Is used for the entity that facilitated the production of the file or what's in it. Eon Productions for the James Bond movies.species:
Species of the depicted characters/people/animals. Somewhat controversial for being needlessly detailed, some janitors not liking the namespace at all. Primarily used for furry content.title:
The title of the file. One of the tags Hydrus uses for various purposes such as sorting and collecting. Somewhat tainted by rampant Reddit parsers.medium:
Used for tags about the image and how it's made. Photography, water painting, napkin sketch as a few examples. White background, simple background, checkered background as a few others. What you see about the image.meta:
This namespace is used for information that isn't visible in the image itself or where you might need to go to the source. Some examples include: third-party edit, paid reward (patreon/enty/gumroad/fantia/fanbox), translated, commentary, and such. What you know about the image.Namespaces not listed above are not \"supported\" by the janitors and are liable to get siblinged out, removed, and/or mocked if judged being bad and annoying enough to justify the work. Do not take this to mean that all un-listed namespaces are bad, some are created and used by parsers to indicate where an image came from which can be helpful if somebody else wants to fetch the original or check source tags against the PTR tags. But do exercise some care in what you put on the PTR if you use custom namespaces. Recently
"},{"location":"Understanding_Database_Synchronization.html","title":"Understanding Database Synchronization Options","text":"clothing:
was removed due to being disliked, no booru using it, and the person(s) pushing for it seeming to have disappeared, leaving a less-than-finished mess behind. It was also rife with lossy siblings and things that just plain don't belong with clothing, such asclothing:brown hair
.Tuning your database synchronization using the
"},{"location":"Understanding_Database_Synchronization.html#key_points","title":"Key Points","text":"--db_synchronous_override=0
launch argument can make Hydrus significantly faster with some caveats.- This is a tutorial for advanced users who have read and understood this document and the risk/recovery procedure.
- It is nearly always safe to use
--db_synchronous_override=1
on any modern filesystem and this is the default. - It is always more expensive to access the disk than doing things in memory. SSDs are 10-100x as slow as memory, and HDDs are 1000-10000x as slow as memory.
- If you turn synchronization to
0
you are gambling, but it is a safe gamble if you have a backup and know exactly what you are doing. - After running with synchronization set to zero you must either:
- Exit hydrus normally and let the OS flush disk caches (either by letting the system run/\"idle\" for a while, running
sync
on *NIX systems, or normal shutdown), or - Restore the sqlite database files backup if the OS shutdown abnormally.
- Because of the potential for a lot of outstanding writes when using
synchronous=0
, other I/O on your system will slow down as the pending writes are interleaved. Normal shutdown may also take abnormally long because the system is syncing these pending writes, but you must allow it to take its time as explained in the section below.
Note: In historical versions of hydrus (
"},{"location":"Understanding_Database_Synchronization.html#the_secret_sauce","title":"The Secret Sauce","text":"synchronous=2
), performance was terrible because hydrus would agressively (it was arguably somewhat paranoid) write changes to disk.Setting the synchronous to 0 lets the database engine defer writing to disk as long as physically possible. In the normal operation of your system, files are constantly being partially transfered to disk, even if the OS pretends they have been fully written to disk. This is called write cache and it is really important to use it or your system's performance would be terrible. The caveat is that until you have \"
synced
\" the disk cache, the changes to files are not actually in permanent storage. One purpose of a normal shutdown of the operating system is to make sure all disk caches have been flushed and synced. A program can also request that a file it has just written to be flushed or synced, and it will wait until that is done before continuing.When not in synchronous 0 mode, the database engine syncs at regular intervals to make sure data has been written. - Setting synchronous to 0 is generally safe if and only if the system also shuts down normally, allowing any of these pending writes to be flushed. - The database can back out of partial changes if hydrus crashes even if
"},{"location":"Understanding_Database_Synchronization.html#technical_explanation","title":"Technical Explanation","text":"synchronous=0
, so your database will not go corrupt from hydrus shutting down abnormally, only from the system shutting down abnormally.Programmers are responsible for handling partially written files, but this is tedious for large complex data, so they use a database engine which handles all of this. The database ensures that any partially written data is reversible to a known state (called a rollback).
An existing file may be in 3 possible states:
- Unflushed: Contents is owned by the program writing the file, but control returns immediately to the program instead of waiting for a full write. Content can be transitioned from unflushed to flushed using
fflush(FILE)
.fflush()
is called automatically when a programmer closes a file, or exits the program normally(under most runtimes but not for example in Java). If the program exits abnormally before data is flushed it will be lost when the program crashes. - Flushed: Pending write to permenant storage but memory has been transfered to the operating system. Data will not be lost if the calling program crashes, since the OS promises it will \"eventually\" arrive on disk before returning from
fflush()
. When you \"safely shutdown:, you are instructing the OS among other things to sync the flushed files. If someone decides to read a file before it has been synced the OS will read the contents up until the flush from the flush buffer, and return that instead of what is actually on disk. If the OS crashes due to error or power failure, data that are flushed but not synced will be lost. - Synced: Written to permenant storage. A programmer may request that the contents of the file be synced, or it is done gradually over time to free the OS buffers
To ensure the consistency of the database and rollback when needed, the database engine keeps a journal of what it is doing. Each transaction ends in a
flush
which may be followed by async
. Insynchronous=2
there is a sync after EVERYCOMMIT
, forsynchronous=1
it depends on the journal mode, often enough to maintian consistanc, but not after every commit. The flush ensures that everything written before the flush will occur before the line that indicates the transaction completed. The sync ensures that the entire contents of the transaction has been written to permenant storage before proceeding. The OS is not obligated to write chunks of the database file in the order it recieves them. It only guarantees that if you flush, everything submitted before the flush happens first, and everything submitted after the flush happens next.The sync is what is controlled by the
"},{"location":"Understanding_Database_Synchronization.html#an_example_journal","title":"An example journal","text":"synchronous
switch. Allowing the database to ignore whether sync actually completes is the magic that makessynchronous=0
so dang fast.- Begin Transaction 1
- Write Change 1
- Write Change 2
- Read data
- Write Change 3
- End Transaction 1
Each of these steps are performed in order. Suppose a crash occcured mid writing
- Begin Transaction 1
- Write Change 1
- Write Cha
When the database resumes it will start scanning the journal at step 1. Since it will reach the end without seeing
End Transaction 1
it knows that data was only partialy written, and can put the data back in the state before transaction 1 began. This property of a database is called atomicity in the sense that something atomic is \"indivisible\"; either all of the steps in transaction 1 occur or non of them occur.Hydrus is structured in such a way that the database is written to to keep track of your file catalog only once the file has been fully imported and moved where it is supposed to be. Thus every action hydrus takes is kept \"atomic\" or \"repeatable\" (redo existing work that was partway through). If hydrus crashes in the middle of importing a file, then when it resumes, as far as it is aware, it didn't even start importing the file. It will repeat the steps from the start until the file catalog is \"consistent\" with what is on disk.
"},{"location":"Understanding_Database_Synchronization.html#where_synchronization_comes_in","title":"Where synchronization comes in","text":"Let's revisit the journal, this time with two transactions. Note that the database is syncing on step 8 and thus will have to wait for the OS to write to disk before proceeding, holding up transaction 2, and any other access to the database.
- Begin Transaction 1
- Write Change 1
- Write Change 2
- Read data
- Write Change 3
- FLUSH
- End Transaction 1
- SYNC
- Begin Transaction 2
- Write Change 2
- Write Change 2
- Read data
- Write Change 3
- FLUSH
- End Transaction 2
- SYNC
What happens if we remove step 8 and then die at step 11?
- Begin Transaction 1
- Write Change 1
- Write Change 2
- Read data
- Write Change 3
- FLUSH
- End Transaction 1
- SYNC
- Begin Transaction 2
- Write Change 2
- Write Ch
What if we crash ,
End Transaction 1
possibly has not been written to disk. Now not only do we need to repeat transaction 2, we also need to repeat transaction 1. Note that this just increases the ammount of repeatable work, and actually is fully recoverable (assuming a file you were downloading didn't cease to exist in the interim).Now what happens if we do the above and the OS crashes? As written we are actually glossing over a number of steps that happen in step 8. Actually the database must make a few syncs to be sure the database is reversible. The steps are roughly speaking
- Write and sync rollback
- Update database file with changes
- Sync database file
- Remove rollback/update WAL checkpoint
If sqlite crashes, but the OS doesn't that's fine all of this in flight data is in the OS write buffer and the OS will pretend as if it is on disc. But what if We haven't even finished creating a rollback for the changes made in step 1 and step 2 starts partially changing the database file? Then bam power failure. We now can't revert the database because we don't have a complete rollback, but we also can't move forward in time either because we don't have a marker showing the completion of transaction 2. So we are stuck in the middle of an incomplete transaction, and have lost the data necessary to leave either end.
See also: https://www.sqlite.org/atomiccommit.html#section_6_2
Thus if the OS crashes at the exact wrong moment, there is no way to be sure that the journal is correct if syncing was skipped (
synchronous=0
). This means there is no way for you to determine whether the database file is correct after a system crash if you had synchronous 0, and you MUST restore your files from backup as this will be the ONLY WAY to know they are in a known good state.So, setting
"},{"location":"about_docs.html","title":"About These Docs","text":"synchronous=0
gets you a pretty huge speed boost, but you are gambling that everything goes perfectly and will pay the price of a manual restore every time it doesn't.The Hydrus docs are built with MkDocs using the Material for MkDocs theme. The .md files in the
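As a concrete sketch of how the launch argument is passed (the `hydrus_client` executable name here is an assumption; substitute your build's binary or `hydrus_client.py` if you run from source):

```
# the default, safe on any modern filesystem
./hydrus_client --db_synchronous_override=1

# the fast gamble described above: only with a current backup and a clean shutdown
./hydrus_client --db_synchronous_override=0
```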
"},{"location":"about_docs.html#local_setup","title":"Local Setup","text":"docs
directory are converted into nice html in thehelp
directory. This is done automatically in the built releases, but if you run from source, you will want to build your own.To see or work on the docs locally, install
mkdocs-material
:The recommended installation method is
pip
:
"},{"location":"about_docs.html#building","title":"Building","text":"pip install mkdocs-material\n
To build the help, run:
In the base hydrus directory (same as themkdocs build -d help\n
mkdocs.yml
file), which will build it into thehelp
directory. You will then be good!Repeat the command and MkDocs will clear out the old directory and rebuild it, so you can fold this into any update script.
"},{"location":"about_docs.html#live_preview","title":"Live Preview","text":"To edit the
docs
directory, you can run the live preview development server with:mkdocs serve \n
Again in the base hydrus directory. It will host the help site at http://127.0.0.1:8000/, and when you change a file, it will automatically rebuild and reload the page in your browser.
"},{"location":"access_keys.html","title":"PTR access keys","text":"The PTR is now run by users with more bandwidth than I had to give, so the bandwidth limits are gone! If you would like to talk with the new management, please check the discord.
A guide and schema for the new PTR is here.
"},{"location":"access_keys.html#first_off","title":"first off","text":"I don't like it when programs I use connect anywhere without asking me, so I have purposely not pre-baked any default repositories into the client. You have to choose to connect yourself. The client will never connect anywhere until you tell it to.
For a long time, I ran the Public Tag Repository myself and was the lone janitor. It grew to 650 million tags, and siblings and parents were just getting complicated, and I no longer had the bandwidth or time it deserved. It is now run by users.
There also used to be just one user account that everyone shared. Everyone was essentially the same Anon, and all uploads were merged to that one ID. As the PTR became more popular, and more sophisticated and automatically generated content was being added, it became increasingly difficult for the janitors to separate good submissions from bad and undo large scale mistakes.
That old shared account is now a 'read-only' account. This account can only download--it cannot upload new tags or siblings/parents. Users who want to upload now generate their own individual accounts, which are still Anon, but separate, which helps janitors approve and deny uploaded petitions more accurately and efficiently.
I recommend using the shared read-only account, below, to start with, but if you decide you would like to upload, making your own account is easy--just click the 'check for automatic account creation' button in services->manage services, and you should be good. You can change your access key on an existing service--you don't need to delete and re-add or anything--and your client should quickly resync and recognise your new permissions.
"},{"location":"access_keys.html#privacy","title":"privacy","text":"I have tried very hard to ensure the PTR respects your privacy. Your account is a very barebones thing--all a server stores is a couple of random hexadecimal texts and which rows of content you uploaded, and even the memory of what you uploaded is deleted after a delay. The server obviously needs to be aware of your IP address to accept your network request, but it forgets it as soon as the job is done. Normal users are never told which accounts submitted any content, so the only privacy implications are against janitors or (more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!) the server owner or anyone else with raw access to the server as it operates or its database files.
Most users should have very few worries about privacy. The general rule is that it is always healthy to use a VPN, but please check here for a full discussion and explanation of the anonymisation routine.
"},{"location":"access_keys.html#ssd","title":"a note on resources","text":"Danger
If your database files are stored on an HDD, or your SSD does not have at least 96GB of free space, do not add the PTR!
The PTR has been operating since 2011 and is now huge, more than two billion mappings! Your client will be downloading and indexing them all, which is currently (2021-06) about 6GB of bandwidth and 50GB of hard drive space. It will take hours of total processing time to catch up on all the years of submissions. Furthermore, because of mechanical drive latency, HDDs are too slow to process all the content in reasonable time. Syncing is only recommended if your hydrus db is on an SSD. It doesn't matter if you store your jpegs and webms and stuff on an external HDD; this is just your actual .db database files (normally in install_dir/db folder). Note also that it is healthier if the work is done in small pieces in the background, either during idle time or shutdown time, rather than trying to do it all at once. Just leave it to download and process on its own--it usually takes a couple of weeks to quietly catch up. If you happen to see it working, it will start as fast as 50,000 rows/s (with some bumps down to 1 rows/s as it commits data), and eventually it will slow, when fully synced, to 100-1,000 rows/s. You'll see tags appear on your files as processing continues, first on older, then all the way up to new files just uploaded a couple days ago. Once you are synced, the daily processing work to stay synced is usually just a few minutes. If you leave your client on all the time in the background, you'll likely never notice it.
"},{"location":"access_keys.html#easy_setup","title":"easy setup","text":"Hit help->add the public tag repository and you will all be set up.
"},{"location":"access_keys.html#manually","title":"manually","text":"Hit services->manage services and click add->hydrus tag repository. You'll get a panel, fill it out like this:
Here's the info so you can copy it:
address
portptr.hydrus.network\n
access key45871\n
4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f\n
Note that because this is the public shared key, you can ignore the 'DO NOT SHARE' red text warning.
It is worth checking the 'test address' and 'test access key' buttons just to double-check your firewall and key are all correct. Notice the 'check for automatic account creation' button, for if and when you decide you want to contribute to the PTR.
Then you can check your PTR at any time under services->review services, under the 'remote' tab:
"},{"location":"access_keys.html#quicksync","title":"jump-starting an install","text":"A user kindly manages a store of update files and pre-processed empty client databases. If you want to start a new client that syncs with the PTR (i.e. you have not started this new client and not imported any files yet), this will get you going quicker. This is generally recommended for advanced users or those following a guide, but if you are otherwise interested, please check it out:
https://cuddlebear92.github.io/Quicksync/
"},{"location":"adding_new_downloaders.html","title":"adding new downloaders","text":""},{"location":"adding_new_downloaders.html#anonymous","title":"all downloaders are user-creatable and -shareable","text":"Since the big downloader overhaul, all downloaders can be created, edited, and shared by any user. Creating one from scratch is not simple, and it takes a little technical knowledge, but importing what someone else has created is easy.
Hydrus objects like downloaders can sometimes be shared as data encoded into png files, like this:
This contains all the information needed for a client to add a realbooru tag search entry to the list you select from when you start a new download or subscription.
You can get these pngs from anyone who has experience in the downloader system. An archive is maintained here.
To 'add' the easy-import pngs to your client, hit network->downloaders->import downloaders. A little image-panel will appear onto which you can drag-and-drop these png files. The client will then decode and go through the png, looking for interesting new objects and automatically import and link them up without you having to do any more. Your only further input on your end is a 'does this look correct?' check right before the actual import, just to make sure there isn't some mistake or other glaring problem.
Objects imported this way will take precedence over existing functionality, so if one of your downloaders breaks due to a site change, importing a fixed png here will overwrite the broken entries and become the new default.
"},{"location":"advanced.html","title":"general clever tricks","text":"this is non-comprehensive
I am always changing and adding little things. The best way to learn is just to look around. If you think a shortcut should probably do something, try it out! If you can't find something, let me know and I'll try to add it!
"},{"location":"advanced.html#advanced_mode","title":"advanced mode","text":"To avoid confusing clutter, several advanced menu items and buttons are hidden by default. When you are comfortable with the program, hit help->advanced mode to reveal them!
"},{"location":"advanced.html#exclude_deleted_files","title":"exclude deleted files","text":"In the client's options is a checkbox to exclude deleted files. It recurs pretty much anywhere you can import, under 'import file options'. If you select this, any file you ever deleted will be excluded from all future remote searches and import operations. This can stop you from importing/downloading and filtering out the same bad files several times over. The default is off. You may wish to have it set one way most of the time, but switch it the other just for one specific import or search.
"},{"location":"advanced.html#ime","title":"inputting non-english lanuages","text":"If you typically use an IME to input Japanese or another non-english language, you may have encountered problems entering into the autocomplete tag entry control in that you need Up/Down/Enter to navigate the IME, but the autocomplete steals those key presses away to navigate the list of results. To fix this, press Insert to temporarily disable the autocomplete's key event capture. The autocomplete text box will change colour to let you know it has released its normal key capture. Use your IME to get the text you want, then hit Insert again to restore the autocomplete to normal behaviour.
"},{"location":"advanced.html#tag_display","title":"tag display","text":"If you do not like a particular tag or namespace, you can easily hide it with tags->manage tag display and search:
This image is out of date, sorry!
You can exclude single tags, as shown above, or entire namespaces (enter the colon, like 'species:'), or all namespaced tags (use ':'), or all unnamespaced tags (''). 'all known tags' will be applied to everything, as well as any repository-specific rules you set.
A blacklist excludes whatever is listed; a whitelist excludes whatever is not listed.
This censorship is local to your client. No one else will experience your changes or know what you have censored.
"},{"location":"advanced.html#importing_with_tags","title":"importing and adding tags at the same time","text":"Add tags before importing on file->import files lets you give tags to the files you import en masse, and intelligently, using regexes that parse filename:
This should be somewhat self-explanatory to anyone familiar with regexes. I hate them, personally, but I recognise they are powerful and exactly the right tool to use in this case. This is a good introduction.
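If you have never touched regexes before, here is a rough standalone Python sketch of the general idea the dialog is doing for you--the filename pattern and the namespaces here are made up for illustration, they are not hydrus defaults:

```python
import re

# hypothetical filenames of the form "<series> - c<chapter> - p<page>.jpg"
FILENAME_PATTERN = re.compile(r'^(?P<series>.+?) - c(?P<chapter>\d+) - p(?P<page>\d+)\.jpg$')

def tags_from_filename(filename: str) -> list[str]:
    # pull named groups out of the filename and turn them into namespaced tags
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return []
    return [
        f'series:{match.group("series")}',
        f'chapter:{int(match.group("chapter"))}',
        f'page:{int(match.group("page"))}',
    ]

print(tags_from_filename('cool manga - c003 - p017.jpg'))
# ['series:cool manga', 'chapter:3', 'page:17']
```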
Once you are done, you'll get something neat like this:
Which you can more easily manage by collecting:
Collections have a small icon in the bottom left corner. Selecting them actually selects many files (see the status bar), and performing an action on them (like archiving, uploading) will do so to every file in the collection. Viewing collections fullscreen pages through their contents just like an uncollected search.
Here is a particularly zoomed out view, after importing volume 2:
Importing with tags is great for long-running series with well-formatted filenames, and will save you literally hours of finicky tagging.
"},{"location":"advanced.html#tag_migration","title":"tag migration","text":"Danger
At some point I will write some better help for this system, which is powerful. Be careful with it!
Sometimes, you may wish to move thousands or millions of tags from one place to another. These actions are now collected in one place: services->tag migration.
It proceeds from left to right, reading data from the source and applying it to the destination with the chosen action. There are multiple filters available to select which sorts of tag mappings or siblings or parents will be selected from the source. The source and destination can be the same: for instance, if you wanted to delete all 'clothing:' tags from a service, you would pull all those tags and then apply the 'delete' action on the same service.
You can import from and export to Hydrus Tag Archives (HTAs), which are external, portable .db files. In this way, you can move millions of tags between two hydrus clients, or share with a friend, or import from an HTA put together from a website scrape.
Tag Migration is a powerful system. Be very careful with it. Do small experiments before starting large jobs, and if you intend to migrate millions of tags, make a backup of your db beforehand, just in case it goes wrong.
This system was once much more simple, but it still had HTA support. If you wish to play around with some HTAs, there are some old user-created ones here.
"},{"location":"advanced.html#shortcuts","title":"custom shortcuts","text":"Once you are comfortable with manually setting tags and ratings, you may be interested in setting some shortcuts to do it quicker. Try hitting file->shortcuts or clicking the keyboard icon on any media viewer window's top hover window.
There are two kinds of shortcuts in the program--reserved, which have fixed names, are undeletable, and are always active in certain contexts (related to their name), and custom, which you create and name and edit and are only active in a media viewer when you want them to. You can redefine some simple shortcut commands, but most importantly, you can create shortcuts for adding/removing a tag or setting/unsetting a rating.
Use the same 'keyboard' icon to set the current and default custom shortcuts.
"},{"location":"advanced.html#finding_duplicates","title":"finding duplicates","text":"system:similar_to lets you run the duplicates processing page's searches manually. You can either insert the hash and hamming distance manually, or you can launch these searches automatically from the thumbnail right-click->find similar files menu. For example:
"},{"location":"advanced.html#file_import_errors","title":"truncated/malformed file import errors","text":"Some files, even though they seem ok in another program, will not import to hydrus. This is usually because they file has some 'truncated' or broken data, probably due to a bad upload or storage at some point in its internet history. While sophisticated external programs can usually patch the error (often rendering the bottom lines of a jpeg as grey, for instance), hydrus is not so clever. Please feel free to send or link me, hydrus developer, to these files, so I can check them out on my end and try to fix support.
If the file is one you particularly care about, the easiest solution is to open it in photoshop or gimp and save it again. Those programs should be clever enough to parse the file's weirdness, and then make a nice clean saved file when it exports. That new file should be importable to hydrus.
"},{"location":"advanced.html#password","title":"setting a password","text":"the client offers a very simple password system, enough to keep out noobs. You can set it at database->set a password. It will thereafter ask for the password every time you start the program, and will not open without it. However none of the database is encrypted, and someone with enough enthusiasm or a tool and access to your computer can still very easily see what files you have. The password is mainly to stop idle snoops checking your images if you are away from your machine.
"},{"location":"advanced_multiple_local_file_services.html","title":"multiple local file services","text":"The client lets you store your files in different overlapping partitions. This can help management workflows and privacy.
"},{"location":"advanced_multiple_local_file_services.html#the_problem","title":"what's the problem?","text":"Most of us end up storing all sorts of things in our clients, often from different parts of our lives. With everything in the same 'my files' domain, some personal photos might be sitting right beside nsfw content, a bunch of wallpapers, and thousands of comic pages. Different processing jobs, like 'go through those old vidya screenshots I imported' and 'filter my subscription files' and 'load up my favourite pictures of babes' all operate on the same gigantic list of files and must be defined through careful queries of tags, ratings, and other file metadata to separate what you want from what you don't.
The problem is aggravated the larger your client grows. When you are trying to sift the 500 art reference images out of 850,000 random internet files from the last ten years, it can be difficult getting good tag counts or just generally browsing around without stumbling across other content. This particularly matters when you are typing in search tags, since the tag you want, 'anatomy drawing guide', is going to come with thousands of others, starting 'a...', 'an...', and 'ana...' as you type. If someone is looking over your shoulder as you load up the images, you also want to preserve your privacy.
Wouldn't it be nice if you could break your collection into separate areas?
"},{"location":"advanced_multiple_local_file_services.html#file_domains","title":"multiple file domains","text":"tl;dr: you can have more than one 'my files', add them in 'manage services'.
A file domain (or file service) in the hydrus context, is, very simply, a list of files. There is a bit of extra metadata like the time each file was imported to the domain, and a ton of behind the scenes calculation to accelerate searching and aggregate autocomplete tag counts and so on, but overall, when you search in 'my files', you are telling the client \"find all the files in this list that have tag x, y, z on any tag domain\". If you switch to searching 'trash', you are then searching that list of trashed files.
A search page's tag domain is similar. Normally, you will be set to 'all known tags', which is basically the union of all your tag services, but if you need to, you can search just 'my tags' or 'PTR', which will make your search \"find all the files in my files that have tag x, y, z on my tags\". You are setting up an intersection of a file and a tag domain.
Changing the tag domain to 'PTR' or 'all known tags' would make for a different blue circle with a different intersection of search results ('PTR' probably has a lot more 'pretty dress', although maybe not for your files, and 'all known tags', being the union of all the blue circles, will make the same or larger intersection).
This idea of dynamically intersecting domains is very important to hydrus. Each service stands on its own, and the 'my tags' domain is not linked to 'my files'. It does not care where its tagged files are. When you delete a file, no tags are changed. But when you delete a file, the 'file domain' circle will shrink, and that may change the search results in the intersection.
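A toy sketch of that intersection idea, with made-up file ids--hydrus does this with indexed database queries rather than literal Python sets, but the logic is the same:

```python
# made-up file ids: each file service is, at heart, just a list of files
my_files = {1, 2, 3, 4, 5}                   # a file domain
trash = {9}                                  # file 9 was deleted (unused below, just for context)
my_tags = {'pretty dress': {2, 3, 9, 11}}    # a tag domain: tag -> files known to have it

def search(file_domain: set[int], tag_domain: dict[str, set[int]], tag: str) -> set[int]:
    # "find all the files in this list that have tag x on this tag domain"
    return file_domain & tag_domain.get(tag, set())

print(search(my_files, my_tags, 'pretty dress'))  # {2, 3} -- 9 is trashed, 11 is not local
```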
With multiple local file services, you can create new file lists beyond 'my files', letting you make different red circles. You can move and copy files between your local file domains to make new sub-collections and search them separately for a very effective filter.
You can add and remove them under services->manage services:
"},{"location":"advanced_multiple_local_file_services.html#sfw","title":"what does this actually mean?","text":"I think the best simple idea for most regular users is to try a sfw/nsfw split. Make a new 'sfw' local file domain and start adding some images to it. You might eventualy plan to send all your sfw images there, or just your 'IRL' stuff like family photos, but it will be a separate area for whitelisted safe content you are definitely happy for others to glance at.
Search up some appropriate images in your collection and then add them to 'sfw':
This 'add' command is a copy. The files stay in 'my files', but they also go to 'sfw'. You still only have one file on your hard drive, but the database has its identifier in both file lists. Now make a new search page, switch it to 'sfw', and try typing in a search.
The tag results are limited to the files we added to 'sfw'. Nothing from 'my files' bleeds over. The same is true of a file search. Note the times the file was added to 'my files' and 'sfw' are both tracked.
Also note that these files now have two 'delete' commands. You will be presented with more complicated delete and undelete dialogs for files in multiple services. Files only end up in the trash when they are no longer in any local file domain.
You can be happy that any search in this new domain--for tags or files--is not going to provide any unexpected surprises. You can also do 'system:everything', 'system:limit=64' for a random sample, or any other simple search predicate for browsing, and the search should run fast and safe.
If you want to try multiple local file services out, I recommend this split to start off. If you don't like it, you can delete 'sfw' later with no harm done.
Note
While 'add to y' copies the files, 'move from x to y' deletes the files from the original location. They get a delete timestamp (\"deleted from my files 5 minutes ago\"), and they can be undeleted or 'added' back, and they will get their old import timestamp back.
"},{"location":"advanced_multiple_local_file_services.html#using_it","title":"using it","text":"The main way to add and move files around is the thumbnail/media viewer right-click menu.
You can make shortcuts for the add/move operations too. Check file->shortcuts and then the 'media actions' set.
In the future, I expect to have more ways to move files around, particularly integration into the archive/delete filter, and ideally a 'file migration' system that will allow larger operations such as 'add all the files in search x to place y'.
I also expect to write a system to easily merge clients together. Several users already run several different clients to get their 'my files' separation (e.g. a sfw client and a nsfw client), and now that this tech is supported in one client, it makes a lot of sense, efficiency-wise, to merge them together.
Note that when you select a file domain, you can select 'multiple locations'. This provides the union of whichever domains you like. Tag counts will be correct but imprecise, often something like 'blonde hair (2-5)', meaning 'between two and five files', due to the complexity of quickly counting within these complicated domains.
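One plausible way to see why only a range is cheap to compute: without deduplicating files across the selected domains, all you can quickly know is that the true count sits somewhere between the largest single-domain count and the sum of them. This sketch is illustrative only--hydrus may compute its displayed range differently:

```python
# made-up per-domain counts for one tag across two selected domains
counts = {'my files': 2, 'sfw': 3}

lower = max(counts.values())  # every smaller domain could be entirely duplicated in the larger one
upper = sum(counts.values())  # or every file could be unique to its domain

label = f'blonde hair ({lower}-{upper})' if lower != upper else f'blonde hair ({lower})'
print(label)  # blonde hair (3-5)
```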
As soon as you add another local file service, you will also see an 'all my files' service listed in the file domain selector. This is a virtual service that provides a very efficient and accurate search space of the union of all your local file domains.
This whole system is new. I will keep working on it, including better 'at a glance' indications of which files are where (current thoughts are custom thumbnail border colours and little indicator icons). Let me know how you get on with it!
"},{"location":"advanced_multiple_local_file_services.html#meta_file_domains","title":"advanced: a word on the meta file domains","text":"If you are in help->advanced mode, your file search file domain selectors will see 'all known files'. This domain is similar to 'all known tags', but it is not useful for normal browsing. It represents not filtering your tag services by any file list, fetching all tagged file results regardless of what your client knows about them.
If you search 'all known files'/'PTR', you can search all the files the PTR knows about, the vast majority of which you will likely never import. The client will show these files with a default hydrus thumbnail and offer very limited information about them. For file searches, this search domain is only useful for debug and janitorial purposes. You cannot combine 'all known files' with 'all known tags'. It also has limited sibling/parent support.
You can search for deleted files under 'multiple domains' too. These may or may not still be in your client, so they might get the hydrus icon again. You won't need to do this much, but it can be super useful for some maintenance operations like 'I know I deleted this file by accident, what was its URL so I can find it again?'.
Another service is 'all local files'. This is a larger version of 'all my files'. It essentially means 'all the files on your hard disk', which strictly means the union of all the files in your local file domains ('my files' and any others you create, i.e. the 'all my files' domain), 'repository updates' (which stores update files for hydrus repository sync), and 'trash'. This search can be useful for some advanced maintenance jobs.
If you select 'repository updates' specifically, you can inspect this advanced domain, but I recommend you not touch it! Otherwise, if you search 'all local files', repository files are usually hidden from view.
Your client looks a bit like this:
graph TB\n A[all local files] --- B[repository updates]\n A --- C[all my files]\n C --- D[local file domains]\n A --- E[trash]
Repository files, your media, and the trash are actually mutually exclusive. When a file is imported, it is added to 'all local files' and either repository updates or 'all my files' and one or more local file domains. When it is deleted from all of those, it is taken from 'all my files' and moved to trash. When trashed files are cleared, the files are removed from 'trash' and then 'all local files' and thus your hard disk.
"},{"location":"advanced_multiple_local_file_services.html#advanced","title":"more advanced usage","text":"Warning
Careful! It is easy to construct a massively overcomplicated Mind Palace here that won't actually help you due to the weight of overhead. If you want to categorise things, tags are generally better. But if you do want strict search separations for speed, workflow, or privacy, try this out.
If you put your files through several layers of processing, such as `inbox/archive->tags->rating`, it might be helpful to create different file domains for each step. I have seen a couple of proposals like this that I think make sense:
graph LR\n A[inbox] --> B[sfw processing]\n A --> C[nsfw processing]\n B --> D[sfw archive]\n C --> E[nsfw archive]
Where the idea would be to make the 'is this sfw/nsfw?' choice early, probably at the same time as archive/delete, and splitting files off to either side before doing tagging and rating. I expect to expand the 'archive/delete' filter to support more actions soon to help make these workflows easy.
File Import Options allows you to specify which service a file will import to. You can even import to multiple, although that is probably a bit much. If your inbox filters are overwhelming you--or each other--you might like to have more than one 'landing zone' for your files:
graph LR\n A[subscription and gallery inbox] --> B[archive]\n B --- C[sfw]\n D[watcher inbox] --> B\n E[hard drive inbox] --> B\n F[that zip of cool architecture photos] --> C
Some users have floated the idea of storing your archive on one drive and the inbox on another. This makes a lot of sense for network storage situations--the new inbox could be on a local disk, but the less-accessed archive on cheap network storage. File domains would be a great way to manage this in future, turning the workflow into nice storage commands.
Another likely use of this in future is in the Client API, when sharing with others. If you were to put the files you wanted to share in a file domain, and the Client API were set up to search just on that domain, this would guarantee great privacy. I am still thinking about this, and it may ultimately end up just being something that works that way behind the scenes.
"},{"location":"advanced_parents.html","title":"tag parents","text":"graph LR\n A[inbox] --> B[19th century fishman conspiracy theory evidence]\n A --> C[the mlp x sonic hyperplex]\n A --> D[extremely detailed drawings of hands and feet]\n A --> E[normal stuff]\n E --- F[share with dave]
Tag parents let you automatically add a particular tag every time another tag is added. The relationship will also apply retroactively.
"},{"location":"advanced_parents.html#the_problem","title":"what's the problem?","text":"Tags often fall into certain heirarchies. Certain tags always imply other tags, and it is annoying and time-consuming to type them all out individually every time.
As a basic example, a `car` is a `vehicle`. It is a subset. Any time you see a car, you also see a vehicle. Similarly, a `rifle` is a `firearm`, `face tattoo` implies `tattoo`, and `species:pikachu` implies `species:pokémon`, which also implies `series:pokémon`.
Another way of thinking about this is considering what you would expect to see when you search these terms. If you search `vehicle`, you would expect the result to include all `cars`. If you search `series:league of legends`, you would expect to see all instances of `character:ahri` (even if, on rare occasion, she were just appearing in cameo or in a crossover).
For hydrus terms, `character x is in series y` is a common relationship, as is `costume x is of character y`:
graph TB\n C[series:metroid] --- B[character:samus aran] --- A[character:zero suit samus]
In this instance, anything with `character:zero suit samus` would also have `character:samus aran`. Anything with `character:samus aran` (and thus anything with `character:zero suit samus`) would have `series:metroid`.
Remember that the reverse is not true. Samus comes inextricably from Metroid, but not everything Metroid is Samus (e.g. a picture of just Ridley).
Even a small slice of these relationships can get complicated:
graph TB\n A[studio:blizzard entertainment]\n A --- B[series:overwatch]\n B --- B1[character:dr. angela 'mercy' ziegler]\n B1 --- B1b[character:pink mercy]\n B1 --- B1c[character:witch mercy]\n B --- B2[character:hana 'd.va' song]\n B2 --- B2b[\"character:d.va (gremlin)\"]\n A --- C[series:world of warcraft]\n C --- C1[character:jaina proudmoore]\n C1 --- C1a[character:dreadlord jaina]\n C --- C2[character:sylvanas windrunner]
Some franchises are bananas:
Also, unlike siblings, which as we previously saw are `n->1`, some tags have more than one implication (`n->n`):
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n C[swimsuit] --- B
`adjusting swimsuit` implies both a `swimsuit` and `adjusting clothes`. Consider how `adjusting bikini` might fit on this chart--perhaps this:
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n A --- E[adjusting bikini]\n C[swimsuit] --- B\n F[bikini] --- E\n D[swimwear] --- C\n D --- F
Note this is not a loop--like with siblings, loops are not allowed--this is a family tree with three 'generations'. `adjusting bikini` is a child to both `bikini` and `adjusting clothes`, and `bikini` is a child to the new `swimwear`, which is also a parent to `swimsuit`. `adjusting bikini` and `adjusting swimsuit` are both grandchildren to `swimwear`.
This can obviously get as complicated and over-engineered as you like, but be careful of being too confident. Reasonable people disagree on what is 'clearly' a parent or sibling, or what is an excessive level of detail (e.g. `person:scarlett johansson` may be `gender:female`, if you think that useful, but `species:human`, `species:mammal`, and `species:animal` may be going a little far). Beyond its own intellectual neatness, ask yourself the purpose of what you are creating.
Of course you can create any sort of parent tags on your local tags or your own tag repositories, but this sort of thing can easily lead to arguments between reasonable people on a shared server like the PTR.
Just like with normal tags, try not to create anything 'perfect' or stray away from what you actually search with, as it usually ends up wasting time. Act from need, not toward purpose.
"},{"location":"advanced_parents.html#tag_parents","title":"tag parents","text":"Let's define the child-parent relationship 'C->P' as saying that tag P is the semantic superset/superclass of tag C. All files that have C should also have P, without exception.
Any file that has C should appear to have P. Any search for P will include all of C implicitly.
Tags can have multiple parents, and multiple tags have the same parent. Loops are not allowed.
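A minimal sketch of the idea, if you think in code--given a child->parents mapping, collect everything a tag implies. hydrus caches all this in the database; this is just the concept:

```python
PARENTS = {
    'character:zero suit samus': {'character:samus aran'},
    'character:samus aran': {'series:metroid'},
}

def implied_tags(tag: str, parents: dict[str, set[str]]) -> set[str]:
    # walk child->parent links, collecting every ancestor (loops are not allowed)
    out, stack = set(), [tag]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out

print(implied_tags('character:zero suit samus', PARENTS))
# {'character:samus aran', 'series:metroid'}
```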
Note
In hydrus, tag parents are virtual. P is not actually added to every file by C, it just appears as if it is. When you look at a file in manage tags, you will see the implication, just like you see how tags will be renamed by siblings, but you won't see the parent unless it actually happens to also be there as a 'hard' tag. If you remove a `C->P` parent relationship, all the implied P tags will disappear!
It also takes a bunch of CPU to figure this stuff out. Please bear with this system, sometimes it can take time.
"},{"location":"advanced_parents.html#how_to_do_it","title":"how you do it","text":"Go to tags->manage tag parents:
Which looks and works just like the manage tag siblings dialog.
Note that when you hit ok, the client will look up all the files with all your added tag Cs and retroactively apply/pend the respective tag Ps if needed. This could mean thousands of tags!
Once you have some relationships added, the parents and grandparents will show indented anywhere you 'write' tags, such as the manage tags dialog:
"},{"location":"advanced_parents.html#remote_parents","title":"remote parents","text":"Whenever you add or remove a tag parent pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that parent pair. If it is denied, only you will see it.
"},{"location":"advanced_parents.html#parent_favourites","title":"parent 'favourites'","text":"As you use the client, you will likely make several processing workflows to archive/delete your different sorts of imports. You don't always want to go through things randomly--you might want to do some big videos for a bit, or focus on a particular character. A common search page is something like
[system:inbox, creator:blah, limit:256]
, which will show a sample of a creator in your inbox, so you can process just that creator. This is easy to set up and save in your favourite searches and quick to run, so you can load it up, do some archive/delete, and then dismiss it without too much hassle.But what happens if you want to search for multiple creators? You might be tempted to make a large OR search predicate, like
creator:aaa OR creator:bbb OR creator:ccc OR creator:ddd
, of all your favourite creators so you can process them together as a 'premium' group. But if you want to add or remove a creator from that long OR, it can be cumbersome. And OR searches can just run slow sometimes. One answer is to use the new tag parents tools to apply a 'favourite' parent on all the artists and then search for that favourite.Let's assume you want to search bunch of 'creator' tags on the PTR. What you will do is:
- Create a new 'local tag service' in manage services called 'my parent favourites'. This will hold our subjective parents without uploading anything to the PTR.
- Go to tags->manage where tag siblings and parents apply and add 'my parent favourites' as the top priority for parents, leaving 'PTR' as second priority.
- Under tags->manage tag parents, on your 'my parent favourites' service, add:
creator:aaa->favourite:aesthetic art
creator:bbb->favourite:aesthetic art
creator:ccc->favourite:aesthetic art
creator:ddd->favourite:aesthetic art
Watch/wait a few seconds for the parents to apply across the PTR for those creator tags.
- Then save a new favourite search of `[system:inbox, favourite:aesthetic art, limit:256]`. This search will deliver results with any of the child 'creator' tags, just like a big OR search, and real fast!
If you want to add or remove any creators to the 'aesthetic art' group, you can simply go back to tags->manage tag parents, and it will apply everywhere. You can create more umbrella/group tags if you like (and not just creators--think about clothing, or certain characters), and also use them in regular searches when you just want to browse some cool files.
"},{"location":"advanced_siblings.html","title":"tag siblings","text":"Tag siblings let you replace a bad tag with a better tag.
"},{"location":"advanced_siblings.html#the_problem","title":"what's the problem?","text":"Reasonable people often use different words for the same things.
A great example is in Japanese names, which are natively written surname first.
`character:ayanami rei` and `character:rei ayanami` have the same meaning, but different users will use one, or the other, or even both.
Other examples are tiny syntactic changes, common misspellings, and unique acronyms:
- smiling and smile
- staring at camera and looking at viewer
- pokemon and pokémon
- jersualem and jerusalem
- lotr and series:the lord of the rings
- marimite and series:maria-sama ga miteru
- ishygddt and i sure hope you guys don't do that
A particular repository may have a preferred standard, but it is not easy to guarantee that all the users will know exactly which tag to upload or search for.
After some time, you get this:
Without continual intervention by janitors or other experienced users to make sure y⊇x (i.e. making the yellow circle entirely overlap the blue by manually giving y to everything with x), searches can only return x (blue circle) or y (yellow circle) or x∩y (the lens-shaped overlap). What we really want is x∪y (both circles).
So, how do we fix this problem?
"},{"location":"advanced_siblings.html#tag_siblings","title":"tag siblings","text":"Let's define a relationship, A->B, that means that any time we would normally see or use tag A or tag B, we will instead only get tag B:
Note that this relationship implies that B is in some way 'better' than A.
"},{"location":"advanced_siblings.html#more_complicated","title":"ok, I understand; now confuse me","text":"This relationship is transitive, which means as well as saying
A->B
, you can also sayB->C
, which impliesA->C
andB->C
.graph LR\n A[lena_oxton] --> B[lena oxton] --> C[character:tracer];
In this case, everything with 'lena_oxton' or 'lena oxton' will show 'character:tracer' instead.
You can also have an `A->C` and `B->C` that does not include `A->B`.
graph LR\n A[d.va] --> C[character:hana 'd.va' song]\n B[hana song] --> C
The outcome of these two arrangements is the same--everything ends up as C.
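If it helps to see the collapse as code, here is a toy sketch--follow the A->B links until you hit the ideal tag. hydrus resolves this with its own cached structures; this just shows the 'everything ends up as C' behaviour:

```python
SIBLINGS = {
    'lena_oxton': 'lena oxton',
    'lena oxton': 'character:tracer',
    'd.va': "character:hana 'd.va' song",
    'hana song': "character:hana 'd.va' song",
}

def ideal_tag(tag: str, siblings: dict[str, str]) -> str:
    # follow the chain until a tag has no better sibling (the graphs are non-cyclic)
    while tag in siblings:
        tag = siblings[tag]
    return tag

print(ideal_tag('lena_oxton', SIBLINGS))  # character:tracer
print(ideal_tag('hana song', SIBLINGS))   # character:hana 'd.va' song
```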
Many complicated arrangements are possible (and inevitable, as we try to merge many different communities' ideal tags):
graph LR\n A[angela_ziegler] --> B[angela ziegler] --> I[character:dr. angela 'mercy' ziegler]\n C[\"angela_ziegler_(overwatch)\"] --> B\n D[character:mercy] --> I\n E[\"character:mercy (overwatch)\"] --> I\n F[dr angela ziegler] --> I\n G[\"character:\u30de\u30fc\u30b7\u30fc\uff08\u30aa\u30fc\u30d0\u30fc\u30a6\u30a9\u30c3\u30c1\uff09\"] --> E\n H[overwatch mercy] --> I
Note that if you say `A->B`, you cannot also say `A->C`. This is an `n->1` relationship. Many things can point to a single ideal, but a tag cannot have more than one ideal. Also, obviously, these graphs are non-cyclic--no loops.
"},{"location":"advanced_siblings.html#how_to_do_it","title":"how you do it","text":"Just open tags->manage tag siblings, and add a few.
The client will automatically collapse the tagspace to whatever you set. It'll even work with autocomplete, like so:
Please note that siblings' autocomplete counts may be slightly inaccurate, as unioning the count is difficult to quickly estimate.
The client will not collapse siblings anywhere you 'write' tags, such as the manage tags dialog. You will be able to add or remove A as normal, but it will be written in some form of \"A (B)\" to let you know that, ultimately, the tag will end up displaying in the main gui as B:
Although the client may present A as B, it will secretly remember A! You can remove the association A->B, and everything will return to how it was. No information is lost at any point.
"},{"location":"advanced_siblings.html#remote_siblings","title":"remote siblings","text":"Whenever you add or remove a tag sibling pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that sibling pair. If it is denied, only you will see it.
"},{"location":"advanced_sidecars.html","title":"sidecars","text":"Sidecars are files that provide additional metadata about a master file. They typically share the same basic filename--if the master is 'Image_123456.jpg', the sidecar will be something like 'Image_123456.txt' or 'Image_123456.jpg.json'. This obviously makes it easy to figure out which sidecar goes with which file.
Hydrus does not use sidecars in its own storage, but it can import data from them and export data to them. It currently supports raw data in .txt files and encoded data in .json files, and that data can be either tags or URLs. I expect to extend this system in future to support XML and other metadata types such as ratings, timestamps, and inbox/archive status.
We'll start with .txt, since they are simpler.
"},{"location":"advanced_sidecars.html#importing_sidecars","title":"Importing Sidecars","text":"Imagine you have some jpegs you downloaded with another program. That program grabbed the files' tags somehow, and you want to import the files with their tags without messing around with the Client API.
If your extra program can export the tags to a simple format--let's say newline-separated .txt files with the same basic filename as the jpegs, or you can, with some very simple scripting, convert to that format--then importing them to hydrus is easy!
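For instance, if your other program can hand you the tags in a Python structure, the 'very simple scripting' can be as small as this sketch (the EXPORTED_TAGS data here is a stand-in for whatever your source actually provides):

```python
from pathlib import Path

# stand-in for whatever your other program can give you: filename -> tags
EXPORTED_TAGS = {
    'Image_123456.jpg': ['flowers', 'landscape', 'blue sky'],
    'Image_123457.jpg': ['fast car', 'anime girl', 'night sky'],
}

def write_sidecars(directory: str) -> None:
    for filename, tags in EXPORTED_TAGS.items():
        # hydrus's default .txt expectation: (filename.ext).txt, newline-separated tags
        sidecar = Path(directory) / (filename + '.txt')
        sidecar.write_text('\n'.join(tags) + '\n', encoding='utf-8')

write_sidecars('.')
```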
Put the jpegs and the .txt files in the same directory and then drag and drop the directory onto the client, as you would for a normal import. The .txt files should not be added to the list. Then click 'add tags/urls with the import'. The sidecars are managed on one of the tabs:
This system can get quite complicated, but the essential idea is that you are selecting one or more sidecar sources, parsing their text, and sending that list of data to one hydrus service destination. Most of the time you will be pulling from just one sidecar at a time.
"},{"location":"advanced_sidecars.html#the_source_dialog","title":"The Source Dialog","text":"The source
is a description of a sidecar to load and how to read what it contains.In this example, the texts are like so:
4e01850417d1978e6328d4f40c3b550ef582f8558539b4ad46a1cb7650a2e10b.jpg.txt: flowers\nlandscape\nblue sky\n
5e390f043321de57cb40fd7ca7cf0cfca29831670bd4ad71622226bc0a057876.jpg.txt:
fast car\nanime girl\nnight sky\n
Since our sidecars in this example are named (filename.ext).txt, and use newlines as the separator character, we can leave things mostly as default.
If you do not have newline-separated tags, for instance comma-separated tags (
flowers, landscape, blue sky
), then you can set that here. Be careful if you are making your own sidecars, since any separator character obviously cannot be used in tag text!If your sidecars are named (filename).txt instead of (filename.ext).txt, then just hit the checkbox, but if the conversion is more complicated, then play around with the filename string converter and the test boxes.
If you need to, you can further process the texts that are loaded. They'll be trimmed of extra whitespace and so on automatically, so no need to worry about that, but if you need to, let's say, add the
"},{"location":"advanced_sidecars.html#the_router_dialog","title":"The Router Dialog","text":"creator:
prefix to everything, or filter out some mis-parsed garbage, this is the place.A 'Router' is a single set of orders to grab from one or more sidecars and send to a destination. You can have several routers in a single import or export context.
You can do more string processing here, and it will apply to everything loaded from every sidecar.
The destination is either a tag service (adding the loaded strings as tags), or your known URLs store.
"},{"location":"advanced_sidecars.html#previewing","title":"Previewing","text":"Once you have something set up, you can see the results are live-loaded in the dialog. Make sure everything looks all correct, and then start the import as normal and you should see the tags or URLs being added as the import works.
It is good to try out some simple situations with one or two files just to get a feel for the system.
"},{"location":"advanced_sidecars.html#import_folders","title":"Import Folders","text":"If you have a constant flow of sidecar-attached media, then you can add sidecars to Import Folders too. Do a trial-run of anything you want to parse with a manual import before setting up the automatic system.
"},{"location":"advanced_sidecars.html#exporting_sidecars","title":"Exporting Sidecars","text":"The rules for exporting are similar, but now you are pulling from one or more hydrus service
sources
and sending to a singledestination
sidecar every time. Let's look at the UI:I have chosen to select these files' URLs and send them to newline-separated .urls.txt files. If I wanted to get the tags too, I could pull from one or more tag services, filter and convert the tags as needed, and then output to a .tags.txt file.
The best way to learn with this is just to experiment. The UI may seem intimidating, but most jobs don't need you to work with multiple sidecars or string processing or clever filenames.
"},{"location":"advanced_sidecars.html#json_files","title":"JSON Files","text":"JSON is more complicated than .txt. You might have multiple metadata types all together in one file, so you may end up setting up multiple routers that parse the same file for different content, or for an export you might want to populate the same export file with multiple kinds of content. Hydrus can do it!
"},{"location":"advanced_sidecars.html#importing","title":"Importing","text":"Since JSON files are richly structured, we will have to dip into the Hydrus parsing system:
If you have made a downloader before, you will be familiar with this. If not, then you can brave the help or just have a play around with the UI. In this example, I am getting the URL(s) of each JSON file, which are stored in a list under the
file_info_urls
key.It is important to paste an example JSON file that you want to parse into the parsing testing area (click the paste button) so you can test on read data live.
Once you have the parsing set up, the rest of the sidecar UI is the same as for .txt. The JSON Parsing formula is just the replacement/equivalent for the .txt 'separator' setting.
Note that you could set up a second Router to import the tags from this file!
"},{"location":"advanced_sidecars.html#exporting","title":"Exporting","text":"In Hydrus, the exported JSON is typically a nested Object with a similar format as in the Import example. You set the names of the Object keys.
Here I have set the URLs of each file to be stored under
metadata->urls
, which will make this sort of structure:{\n \"metadata\" : {\n \"urls\" : [\n \"http://example.com/123456\",\n \"https://site.org/post/45678\"\n ]\n }\n}\n
The cool thing about JSON files is I can export multiple times to the same file and it will update it! Lets say I made a second Router that grabbed the tags, and it was set to export to the same filename but under
metadata->tags
. The final sidecar would look like this:{\n \"metadata\" : {\n \"tags\" : [\n \"blonde hair\",\n \"blue eyes\",\n \"skirt\"\n ],\n \"urls\" : [\n \"http://example.com/123456\",\n \"https://site.org/post/45678\"\n ]\n }\n}\n
You should be careful that the location you are exporting to does not have any old JSON files with conflicting filenames in it--hydrus will update them, not overwrite them! This may be an issue if you have a synchronising Export Folder that exports random files with the same filenames.
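If you want to mimic that 'update, not overwrite' behaviour in your own scripts, a rough sketch looks like this--note this is just an illustration of the outcome, not hydrus's own code:

```python
import json
from pathlib import Path

def merge_into_sidecar(path: str, keys: list[str], values: list[str]) -> None:
    # load the existing sidecar if there is one, otherwise start fresh
    file = Path(path)
    data = json.loads(file.read_text(encoding='utf-8')) if file.exists() else {}
    node = data
    for key in keys[:-1]:
        node = node.setdefault(key, {})   # walk/create the nested Objects
    node[keys[-1]] = values               # set the final key, leaving other keys alone
    file.write_text(json.dumps(data, indent=4), encoding='utf-8')

merge_into_sidecar('example.jpg.json', ['metadata', 'urls'], ['http://example.com/123456'])
merge_into_sidecar('example.jpg.json', ['metadata', 'tags'], ['blonde hair', 'blue eyes'])
# example.jpg.json now holds both metadata->urls and metadata->tags
```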
"},{"location":"advanced_sidecars.html#note_on_notes","title":"Note on Notes","text":"You can now import/export notes with your sidecars. Since notes have two variables--name and text--but the sidecars system only supports lists of single strings, I merge these together! If you export notes, they will output in the form 'name: text'. If you want to import notes, arrange them in the same form, 'name: text'.
If you do need to select a particular note out of many, see if a String Match (regex
^name:
) in the String Processor will do it.If you need to work with multiple notes that have newlines, I recommend you use JSON rather than txt. If you have to use txt on multiple multi-paragraph-notes, then try a different separator than newline. Go for
||||
or something, whatever works for your job.Depending on how awkward this all is, I may revise it.
"},{"location":"after_disaster.html","title":"Recovering After Disaster","text":""},{"location":"after_disaster.html#you_just_had_a_database_problem","title":"you just had a database problem","text":"I have helped quite a few users recover a mangled database from disk failure or accidental deletion. You just had similar and have been pointed here. This is a simple spiel on the next step that I, hydev, like to give people once we are done.
"},{"location":"after_disaster.html#what_next","title":"what next?","text":"When I was younger, I lost a disk with about 75,000 curated files. It really sucks to go through, and whether you have only had a brush with death or lost tens or hundreds of thousands of files, I know exactly how you have been feeling. The only thing you can change now is the future. Let's make sure it does not happen again.
The good news is the memory of that sinking 'oh shit' feeling is a great motivator. You don't want to feel that way again, so use that to set up and maintain a proper backup regime. If you have a good backup, the worst case scenario, even if your whole computer blows up, is usually just a week's lost work.
So, plan to get a good external USB drive and figure out a backup script and a reminder to ensure you never forget to run it. Having a 'backup day' in your schedule works well, and you can fold in other jobs like computer updates and restarts at the same time. It takes a bit of extra 'computer budget' every year and a few minutes a week, but it is absolutely worth the peace of mind it brings.
Here's the how to backup help, if you want to revisit it. If you would like help setting up FreeFileSync or ToDoList or other similar software, let me know.
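If you would rather have a tiny script than dedicated software, a minimal Python sketch along these lines covers the basic copy--the paths are hypothetical, so substitute your own, and only run it while the client is closed:

```python
import shutil
from datetime import date
from pathlib import Path

# hypothetical locations: adjust to your own install and backup drive
DB_DIR = Path.home() / 'Hydrus Network' / 'db'
BACKUP_ROOT = Path('E:/backups/hydrus')

def backup_db() -> None:
    # copy the whole db directory into a dated folder
    target = BACKUP_ROOT / f'db-{date.today().isoformat()}'
    shutil.copytree(DB_DIR, target)
    print(f'backed up to {target}')

if __name__ == '__main__':
    backup_db()
```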
This is also a great time to think about backing up other things in your life. All of your documents, family photos, your password manager file--are they backed up? Would you be ok with losing them if their drive failed tomorrow? Movies and music will need a real drive, but your smaller things like documents can also fit on an (encrypted) USB stick that you can put in your wallet or keychain.
"},{"location":"changelog.html","title":"changelog","text":"Note
This is the new changelog, only the most recent builds. For all versions, see the old changelog.
"},{"location":"changelog.html#version_597","title":"Version 597","text":""},{"location":"changelog.html#misc","title":"misc","text":"- fixed an issue that caused non-empty hard drive file import file logs that were created before v595 (this typically affected import folders that are set to 'leave source alone, do not reattempt it' for any of the result actions) to lose track of their original import objects' unique IDs and thus, when given more items to possibly add (again, usually on an import folder sync), to re-add the same items one time over again and essentially double-up in size one time. this broke the ability to review the file log UI panel too, so users who noticed the behaviour was jank couldn't see what was going on. on update, all the newer duplicate items will be removed and you'll reset to the original 'already in db' etc.. stuff you had before. all file logs now check for and remove newer duplicates whenever they load or change contents. this happened because of the 'make file logs load faster' update in v595--it worked great for downloaders and subs, but local file imports use a slightly different ID system to differentiate separate objects and it was not updated correct
- the main text-fetching routine that failed to load the list UI in the above case can now recover from null results if this happens again
- file import objects now have some more safety code to ensure they are identifying themselves correctly on load
- did some more work on copying tags: the new 'always copy parents with tags' was not as helpful as I expected, so this is no longer the default when you hit Ctrl+C (it goes back to the old behaviour of just copying the top-line rows in your selection). when you open a tag selection 'copy' menu, it now lists as a separate item 'copy 2 selected and 3 parents' kind of thing if you do want parents. also, parents will no longer copy with their indent (wew), and the taglists are now deduped so you will not be inundated with tagspam. futhermore, the 'what tags do we have' taglist in the manage tags dialog, and favourites/suggestions taglists, are now more parent-aware and plugged into this system
- added Mr Bones to the frame locations list under `options->gui`. if you use him a lot, he'll now remember where he was and how big he was
- also added `manage_times_dialog`, `manage_urls_dialog`, `manage_notes_dialog`, and `export_files_frame` to the list. they will all remember last size and position by default
- rewrote the way the media viewer hover windows and their sub-controls are updated to the current media object. the old asynchronous pubsub is out, and synchronous Qt signals are in. fingers crossed this truly fixes the rare-but-annoying 'oh the ratings in the top-right hover aren't updating I guess' bug, but we'll see. I had to be stricter about the pipeline here, and I was careful to ensure it would be failsafe, so if you discover a media viewer with hover windows that simply won't switch media (they'd probably be frozen in a null state from viewer open), let me know the details!
- some built versions of the client seem unable to find their local help, so now, when a user asks to open a help page, if it seems to be missing locally, a little text with the paths involved is now written to the log
- all formulae now have a 'name/description' field. this is wholly decorative and simply appears in the single- or multi-line summary of the formula in UI. all formulae start with and will initialise with a blank label
- the generic 'edit formula' panel (the one where you can change the formula type) now has import/export buttons
- updated the ZIPPER UI to use a newer single-class 'queue list' widget rather than some ten year old 'still has some wx in it' scatter of gubbins
- added import/export/duplicate capability to the 'queue list' widget, and added it for ZIPPER formulae
- also added import/export/duplicate buttons to the 'edit string processor' list!!
- 'any characters' String Match objects now describe themselves with the 'such as' respective example string, with the new proviso that no String Match will give this string if it is stuck at the 'example string' default. you'll probably most see this in the manage url class dialog for components and parameters
- cleaned a bunch of this code generally
- fixed an issue fetching millisecond-precise timestamps in the `file_metadata` call when one of the timestamps had a null value (for instance if the file has no modified date of any kind registered)
- in the various potential duplicates calls, some simple searches (usually when one/both of two searches are system:everything) are now optimised using the same routine that happens in UI
- the client api version is now 75
- for Win 7 users who run from source, I believe the program's newer virtual environments will no longer build on Win 7. it looks like a new version of psd-tools will not compile in python 3.8, and there's also some code in newer versions of the program that 3.8 simply won't run. I think the last version that works for you is v582. we've known this train was coming for a while, so I'm afraid Win 7 guys will have to freeze at that version unless and until they update Windows or move to Linux/macOS
- I have updated the 'running from source' help to talk about this, including adding the magic git line you need to choose a specific version rather than normal git pull. this is likely the last time I will specifically support Win 7, and I suspect I will sunset pyside2 and PyQt5 testing too
- I am releasing a future build alongside this release, just for Windows. it has new dlls for SQLite and mpv. advanced users are invited to test it out and tell me if there are any problems booting and playing media, and if there are no issues, I'll fold this into the normal build next week
- mpv: 2023-08-20 to 2024-10-20
- SQLite: 3.45.3 to 3.47.0
- these bring normal optimisations and bug fixes. I expect no huge problems (although I believe the mpv dll strictly no longer supports Win 7, but that is now moot), but please check and we'll see
- in prep for duplicates auto-resolution, the five variables that go into a potential duplicates search (two file searches, the search type, the pixel dupe requirement, and the max hamming distance) are now bundled into one nice clean object that is simpler to handle and will be easier to update in future. everything that touches this stuff--the page manager, the page UI (there's a whole edit panel for the new class), the filter itself, the Client API, the db search code, all the unit tests, and now the duplicates auto-resolution system--all works on this new thing rather than throwing list of variables around
- I pushed this forward in a bunch of ways. nothing actually works yet, still, but if you poke around in the advanced placeholder UI, you'll see the new potential duplicates search context UI, now with side-by-side file search context panels, for the fleshed-out pixel-perfect jpeg/png default
- due to an ill-planned parsing update, several downloaders' hash lookups (which allow the client to quickly determine 'already in db'/'previously deleted' sometimes) broke last week. they are fixed today, sorry for the trouble!
- the fps number on the file info line, which was previously rounded always to the nearest integer, is now reported to two sig figs when small. it'll say 1.2fps and 0.50fps
- I hacked in some collapse/expand tech into my static box layout that I use all over the place and tentatively turned it on, and defaulting to collapsed, in the bigger review services sub-panels. the giganto-tall repository panel is now much shorter by default, making the rest of the pages more normal sized on first open. let's see how it goes, and I expect I'll put it elsewhere too and add collapse memory and stuff if that makes sense
- the 'copy service key' on review services panels is now hidden behind advanced mode
- tweaked some layout sizers for some spinboxes (the number controls that have an up/down arrow on the side) and my 'noneable' spinboxes so they aren't so width-hesitant. they were not showing their numbers fully on some styles where the arrows were particularly wide. they mostly size stupidly wide now, but at least that lines up with pretty much everything else so the number of stupid layout problems we are dealing with has reduced by one
- the frame locations list under `options->gui` has four new buttons to mass-set 'remember size/position' and 'reset last size/position' to all selected
- max implicit system:limit in `options->search` is raised from 100 thousand to 100 million
- I messed up the 'hex' and 'base64' decode stuff last week. we used to have hex and base64 decode back in python 2 to do some hash conversion stuff, but it was overhauled into the content parser hash type dropdown and the explict conversion was deprecated to a no-op. last week, I foolishly re-used the same ids when I revived the decoding functionality, which caused a bunch of old parsers like gelbooru 0.2.5, e621, 4chan, and likely others, which still had the no-op, to suddenly hex- or base-64-afy their parsed hashes, breaking the parse and lookup
- this week I redefined the hacky enums and generally cleaned this code, and I am deleting all hex and base64 string conversion decodes from all pre-596 parsers. this fixes all the old downloaders by explicitly deleting the no-op so it won't trouble us again
- if you made a string converter in v595 that decodes hex or base64, that encoding step will be deleted, sorry! I have to ask you to re-make it
- added a 'connect.bat' (and .sql file) to the db dir to make it easy to load up the whole database with 'correct' ATTACHED schema names in the sqlite3 terminal
- added `database->db maintenance->get tables using definitions`, which uses the long-planned database module rewrite maintenance tech (basically a faux foreign key) to fetch every table that uses hash_ids or tag_ids along with the specific column name that uses the id. this will help with various advanced maintenance jobs where we need to clear off a particular master definition to, as for instance happened this week, reset a super-huge autoincrement value on the master hashes table. this same feature will eventually trim client.master.db by discovering which master definitions are no longer used anywhere (e.g. after PTR delete)
- thanks to the continuing efforts of the user making Ugoira improvements, the Client API's `/get_files/render` call will now render an Ugoira to apng or animated webp. note the apng conversion appears to take a while, so make sure you try both formats to see what you prefer
- fixed a critical bug in the Client API where if you used the `file_id(s)` request parameter, and gave novel ids, the database was hitting emergency repair code and filling in the ids with pseudorandom recovery hashes. this wasn't such a huge deal, but if you put a very high number in, the autoincrement `hash_id` of the hashes table would then move up to there, and if the number was sufficiently high, SQLite would have trouble because of max integer limits and all kinds of stuff blew up. asking about a non-existent `file_id` will now raise a 404, as originally intended
- refactored the note set/delete calls, which were doing their own thing, to use the unified hash-parsing routine with the new safety code
- if the Client API is ever asked about a hash_id that is negative or over a ~quadrillion (1024^5), it now throws a special error
- as a backup, if the Client DB is ever asked about a novel hash_id that is negative or over a ~quadrillion (1024^5), it now throws a special error rather than trigger the pseudorandom hash recovery code
- the Client API version is now 74
- fleshed out the duplicates auto-resolution manager and plugged it into the main controller. the mainloop boots and exits now, but it doesn't do anything yet
- updated the multiple-file warning in the edit file urls dialog
- gave the Client API review services panel a very small user-friendliness pass
- I converted more old multi-column list display/sort generation code from the old bridge to the newer, more efficient separated calls for 10 of the remaining 43 lists to do
- via some beardy-but-I-think-it-is-ok typedefs, all the managers and stuff that take the controller as a param now use the new 'only import when linting' `ClientGlobals` Controller type, all unified through that one place, and in a way that should be failsafe, making for much better linting in any decent IDE. I didn't want to spam the 'only import when linting' blocks everywhere, so this was the compromise
- deleted the `interface` modules with the Controller interface gubbins. this was an ok start of an idea, but the new Globals import trick makes it redundant
- pulled and unified a bunch of the common `ManagerWithMainLoop` code up to the superclass and cleaned up all the different managers a bit
- deleted `ClientMaintenance.py`, which was an old attempt to unify some global maintenance daemons that never got off the ground and I had honestly forgotten about
- moved responsibility for the `remote_thumbnails` table to the Client Repositories DB module; it is also now plugged into the newer content type maintenance system
- moved responsibility for the `service_info` table to the Client Services DB module
- fixed some bugs/holes in the table definition reporting system after playing with the new table export tool (some bad sibling/parent tables, wrongly reported deferred tables, missing notes_map and url_map due to a bad content type def, and the primary master definition tables, which I decided to include). I'm sure there are some more out there, but we are moving forward on a long-term job here and it seems to work
- thanks to a user who put in a lot of work, we finally have Ugoira rendering! all ugoiras will now animate using the hydrus native animation player. if the ugoira has json timing data in its zip (those downloaded with PixivUtil and gallery-dl will!), we will use that, but if it is just a zip of images (which is most older ugoiras you'll see in the wild), it'll check a couple of note names for the timing data, and, failing that, will assign a default 125ms per frame fallback. ugoiras without internal timing data will currently get no 'duration' metadata property, but right-clicking on them will show their note-based or simulated duration on the file info line
- all existing ugoiras will be metadata rescanned and thumbnail regenned on update
- technical info here: https://hydrusnetwork.github.io/hydrus/filetypes.html#ugoira
- ugoira metadata and thumbnail generation is cleaner
- a bug in ugoira thumbnail selection, when the file contains non-image files, is fixed
- a future step will be to write a special hook into the hydrus downloader engine to recognise ugoiras (typically on Pixiv) and splice the timing data into the zip on download, at which point we'll finally be able to turn on Ugoira downloading on Pixiv on our end. for now, please check out PixivUtil or gallery-dl to get rich Ugoiras
- I'd like to bake the simulated or note-based durations into the database somehow, as I don't like the underlying media object thinking these things have no duration, but it'll need more thought
- all multi-column lists now sort string columns in a caseless manner. a subscription called 'Tents' will now slot between 'sandwiches' and 'umbrellas'
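As an aside, this is just the standard caseless-sort idiom; a quick sketch (not hydrus's actual list code) showing the ordering described:

```python
# a plain sort puts uppercase first; a caseless sort gives the ordering described above
names = ['umbrellas', 'Tents', 'sandwiches']
print(sorted(names))                    # ['Tents', 'sandwiches', 'umbrellas']
print(sorted(names, key=str.casefold))  # ['sandwiches', 'Tents', 'umbrellas']
```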
- in 'favourite searches', the 'folder' name now has hacky nested folder support. just put '/' in the folder name and it'll make nested submenus. in future this will be implemented with a nicer tree widget
- file logs now load faster in a couple of ways, which should speed up UI session and subscriptions dialog load. previously, there were two rounds of URL normalisation on URL file import object load, one wasteful and one fixable with a cache; these are now dealt with. thanks to the users who sent in profiles of the subscriptions dialog opening; let me know how things seem now (hopefully this fixes/relieves #1612)
- added 'Swap in common resolution labels' to `options->media viewer`. this lets you turn off the '1080p' and '4k'-style label swap-ins for common resolutions on file descriptor strings
- the 'are you sure you want to exit the client? 3 pages say "I am still importing"' popup now says the page names, and in a pretty way, and it shows multiple messages nicer
- the primary 'sort these tags in a human way m8' routine now uses unicode tech to sort things like ß better
- the String Converter can decode 'hex' and 'base64' again (so you can now do '68656c6c6f20776f726c64' or 'aGVsbG8gd29ybGQ=' to 'hello world'). these functions were a holdover from hash parsing in the python 2 times, but I've brushed them off and cleared out the 'what if we put raw bytes in the parsing system bro' nonsense we used to have to deal with. these types are now explicitly UTF-8. I also added a couple unit tests for them
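For reference, the equivalent standard-library calls (not the hydrus String Converter itself) reproduce the example above:

```python
import base64

# the two example strings from the entry above, both decoding to 'hello world'
print(bytes.fromhex('68656c6c6f20776f726c64').decode('utf-8'))
print(base64.b64decode('aGVsbG8gd29ybGQ=').decode('utf-8'))
```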
- fixed an options initialisation bug where setting two files in the duplicate filter as 'not related' was updating the A file to have the B file's file modified time if that was earlier!! if you have files in this category, you will be asked on update if you want to reset their file modified date back to what is actually on disk (the duplicate merge would not have overwritten this; this only happens if you edit the time in the times dialog by hand). a unit test now checks this situation. sorry for the trouble, and thank you to the user who noticed and reported this
- the hydrus Docker package now sets the 'hydrus' process to `autorestart=unexpected`. I understand this makes `file->exit` stick without an automatic restart. it seems like commanding the whole Docker image to shut down still causes a near-instant unclean exit (some SIGTERM thing isn't being caught right, I think), but `file->exit` should now be doable beforehand. we will keep working here
- the new 'replace selected with their OR' and the original 'add an OR of the selected' are now mutually exclusive, depending on whether the current selection is entirely in the active search list
- added 'start an OR with selected', which opens the 'edit OR predicate' panel on the current selection. this works if you only select one item, too
- added 'dissolve selected into single predicates', when you select only OR predicates. it does the opposite of the 'replace'
- the new OR menu gubbins is now in its own separated menu section on the tag right-click
- the indent for OR sub preds is moved up from two spaces to four
- wrote some help about the 'force page refetch' checkboxes in 'tag import options' here: https://hydrusnetwork.github.io/hydrus/getting_started_downloading.html#force_page_fetch
- added a new submenu `urls->force metadata refetch` that lets you quickly and automatically create a new urls downloader page with the selected files' 'x URL Class' urls with the tag import options set to the respective URLs' default but with these checkboxes all set for you. we finally have a simple answer to 'I messed up my tag parse, I need to redownload these files to get the tags'!
- the urls menu offers the 'for x url class' even when only one file is selected now. crazy files with fifty of the same url class can now be handled
- wrote some placeholder UI for the new system. anyone who happens to be in advanced mode will see another tab on duplicate filter pages. you can poke around if you like, but it is mostly just blank lists that aren't plugged into anything
- wrote some placeholder help too. same deal, just a placeholder that you have to look for to find that I'll keep working on
- I still feel good about the duplicates auto-resolution system. there is much more work to do, but I'll keep iterating and fleshing things out
- the new `/get_files/file_path` command now returns the `filetype` and `size` of the file (see the sketch below)
- updated the Client API help and unit tests for this
- client api version is now 73
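A minimal sketch of calling it, assuming a local client and a placeholder access key with the 'see local paths' permission; the file_id and response values shown are illustrative:

```python
import requests

response = requests.get(
    "http://127.0.0.1:45869/get_files/file_path",
    headers={"Hydrus-Client-API-Access-Key": "replace-with-your-64-char-hex-key"},
    params={"file_id": 123},  # hypothetical file_id
)
print(response.json())  # e.g. {"path": "...", "filetype": "image/jpeg", "size": 123456}
```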
- the library updates we've been testing the past few weeks have gone well, so I am rolling them into the normal builds for everyone. the libraries that do 'fetch stuff from the internet' and 'help python manage its packages' are being updated because of some security problems that I don't think matter for us at all (there's some persistent https verification thing in requests that I know we don't care about, and a malicious URL exploit in setuptools that only matters if you are using it to download packages, which, as I understand, we don't), but we are going to be good and update anyway
- `requests` is updated from `2.31.0` to `2.32.3`
- `setuptools` is updated from `69.1.1` to `70.3.0`
- `PyInstaller` is updated from `6.2` to `6.7` for Windows and Linux to handle the new `setuptools`
- there do not appear to be any update conflicts with dlls or anything, so just update like you normally do. I don't think the new pyinstaller will have problems with older/weirder Windows, but let me know if you run into anything
- users who run from source may like to reinstall their venvs after pulling to get the new libraries too
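If you run from source and want to confirm a rebuilt venv picked up the new versions, a quick check (assuming Python 3.8+ for importlib.metadata) might look like:

```python
# quick sanity check of the library versions mentioned above
import importlib.metadata

for package, expected in (("requests", "2.32.3"), ("setuptools", "70.3.0")):
    installed = importlib.metadata.version(package)
    print(f"{package}: installed {installed}, expected {expected}")
```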
- refactored `ClientGUIDuplicates` to a new `duplicates` gui module and renamed it to `ClientGUIDuplicateActions`
- harmonised some duplicates auto-resolution terminology across the client to exactly that form. not auto-duplicates or duplicate auto resolution, but 'duplicates auto-resolution'
- fixed some bad help link anchors
- clarified a couple things in the 'help my db is broke.txt' document
- updated the new x.svg to a black version; it looks a bit better in light & dark styles
- fixed an error that was stopping files from being removed sometimes (it also messed up thumbnail selection). it could even cause crashes! the stupid logical problem was in my new list code; it was causing the thumbnail grid backing list to get pseudorandomly poisoned with bad indices when a previous remove event removed the last item in the list
- the tag `right-click->search` menu, on a multiple selection of non-OR predicates that exists in its entirety in the current search context, now has `replace selected with their OR`, which removes the selection and replaces it with an OR of them all!
- the system predicate parser no longer removes all underscores from to-be-parsed text. this fixes parsing for namespaces, URLs, service names, etc.. with underscores in (issue #1610)
- fixed some bad layout in the edit predicates dialog for system:hash (issue #1590)
- fixed some content update logic for the advanced delete choices of 'delete from all local file domains' and 'physically delete now', where the UI-side thumbnail logic was not removing the file from the 'all my files' or 'all local files' domains respectively, which caused some funny thumbnail display and hide/show rules until a restart rebuilt the media object from the (correct) db source
- if you physically delete a file, I no longer force-remove it from view so enthusiastically. if you are looking at 'all known files', it should generally still display after the delete (and now it will properly recognise it is now non-local)
- I may have fixed an issue with page tab bar clicks on the very new Qt 6.8, which has been rolling out this week
- wrote out my two rules for tagging (don't be perfect, only tag what you search) to the 'getting started - more tags' help page: https://hydrusnetwork.github.io/hydrus/getting_started_more_tags.html#tags_are_for_searching_not_describing
- I cleaned up and think I fixed some SIGTERM and related 'woah, we have to shut down right now' shutdown handling. if a non-UI thread calls for the program to exit, the main 'save data now' calls are now all done by or blocked on that thread, with improved thread safety for when it does tell Qt to hide and save the UI and so on (issue #1601, but not sure I totally fixed it)
- added some SIGTERM test calls to `help->debug->tests` so we can explore this more in future
- on the client, the managers for db maintenance, quick downloads, file maintenance, and import folders now shut down more gracefully, with overall program shutdown waiting for them to exit their loops and reporting what it is still waiting on in the exit splash (like it already does for subscriptions and tag display). as a side thing, these managers also start faster on program boot if you nudge their systems to do something
- wrote some unit tests to test my unique list and better catch stupid errors like I made last week
- added default values for the 'select from list of things' dialogs for: edit duplicate merge rating action; edit duplicate merge tag action; and edit url/parser link
- moved `FastIndexUniqueList` from `HydrusData` to `HydrusLists`
- fixed an error in the main import object if it parses (and desires to skip associating) a domain-modified 'post time' that's in the first week of 1970
- reworked the text for the 'focus the text input when you change pages' checkbox under `options->gui pages` and added a tooltip
- reworded and changed tone of the boot error message on missing database tables if the tables are all caches and completely recoverable
- updated the twitter link and icon in `help->links` to X
- in a normal search page tag autocomplete input, search results will recognise exact-text-matches of their worse siblings for 'put at the top of the list' purposes. so, if you type 'lotr', and it was siblinged to 'series:lord of the rings', then 'series:lord of the rings' is now promoted to the top of the list, regardless of count, as if you had typed in that full ideal tag
- OR predicates are now multi-line. the top line is OR:, and then each sub-tag is now listed indented below. if you construct an OR pred using shift+enter in the tag autocomplete, this new OR does start to eat up some space, but if you are making crazy 17-part OR preds, maybe you'll want to use the OR button dialog input anyway
- when you right-click an OR predicate, the 'copy' menu now recognises this as '3 selected tags' etc.. and will copy all the involved tags and handle subtags correctly
- the 'remove/reset for all selected' file relationship menu is no longer hidden behind advanced mode. it being buried five layers deep is enough
- to save a button press, the manage tag siblings dialog now has a paste button for the right-side tag autocomplete input. if you paste multiple lines of content, it just takes the first
- updated the file maintenance job descriptions for the 'try to redownload' jobs to talk about how to deal with URL downloads that 404 or produce a duplicate and brushed up a bit of that language in general
- the new 'if a db job took more than 15 seconds, log it' thing now tests if the program was non-idle at the start or end of the db job, rather than just the end. this will catch some 'it took so long that some "wake up" stuff had time to kick in' instances
- fixed a typo where if the 'other' hashes were unknown, the 'sha512 (unknown)' label was saying 'md5 (unknown)'
- file import logs get a new 'advanced' menu option, tucked away a little, to 'renormalise' their contents. this is a maintenance job to clear out duplicate chaff on an existing list after the respective URL Class rules have changed to remove something in normalisation (e.g. setting a parameter to be ephemeral). I added a unit test for this also, but let me know how it works in the wild
- fixed the source time parsing for the gelbooru 0.2.0 (rule34.xxx and others) and gelbooru 0.2.5 (gelbooru proper) page parsers
- fixed the 'permits everything' API Permissions update from a couple weeks ago. it was supposed to set 'permits everything' when the existing permissions structure was 'mostly full', but the logic was bad and it was setting it when the permissions were sparse. if you were hit by this and did not un-set the 'permits everything' yourself in review services, you will get a yes/no prompt on update asking if you want to re-run the fixed update. if the update only missed out setting "permits everything" where it should have, you'll just get a popup saying it did them. sorry for missing this, my too-brief dev machine test happened to be exactly on the case of a coin flip landing three times on its edge--I've improved my API permission tests for future
- I got started on the db module that will handle duplicates auto-resolution. this started out feeling daunting, and I wasn't totally sure how I'd do some things, but I gave it a couple iterations and managed to figure out a simple design I am very happy with. I think it is about 25-33% complete (while object design is ~50-75% and UI is 0%), so there is a decent bit to go here, but the way is coming into focus
- updated my `SortedList`, which does some fast index lookup stuff, to handle more situations, optimised some remove actions, made it more compatible as a list drop-in replacement, moved it to `HydrusData`, and renamed it to `FastIndexUniqueList`
- the autocomplete results system uses the new `FastIndexUniqueList` a bit for some cached matches and results reordering stuff
- expanded my `TemporerIntegerTable` system, which I use to do some beardy 'executemany' SELECT statements, to support an arbitrary number of integer columns. the duplicate auto-resolution system is going to be doing mass potential pair set intersections, and this makes it simple
- thanks to a user, the core `Globals` files get some linter magic that lets an IDE do good type checking on the core controller classes without running into circular import issues. this reduced project-wide PyCharm linter warnings from like 4,500 to 2,200 wew
- I pulled the `ServerController` and `TestController` gubbins out of `HydrusGlobals` into their own 'Globals' files in their respective modules to ensure other module-crawlers (e.g. perhaps PyInstaller) do not get confused about what they are importing here, and to generally clean this up a bit
- improved a daemon unit test that would sometimes fail because it was not waiting long enough for the daemon to finish. I cut some other fat and it is now four or five seconds faster too
- the 'read' autocomplete dropdown has a new one-click 'clear search' button, just beside the favourites 'star' menu button. the 'empty page' favourite is removed from new users' defaults
- in an alteration to the recent Autocomplete key processing, Ctrl+c/Ctrl+Insert will now propagate to the results list if you currently have none of the text input selected (i.e. if it would have been a no-op on the text input, we assume you wanted whatever is selected in the list)
- in the normal thumbnail/viewer menu and review services, the 'files' entry is renamed to 'locations'. this continues work in the left hand button of the autocomplete dropdown where you set the 'location', which can be all sorts of complicated things these days, rather than just 'file service key selector'. I don't think I'll rename 'my files' or anything, but I will try to emphasise this 'locations' idea more when I am talking about local file domains etc.. in other places going forward; what I often think of as 'oh yeah the files bit' isn't actually referring to the files themselves, but where they are located, so let's be precise
- last week's tag pair filtering in tags->migrate tags now has 'if either the left or right of the pair have count', and when you hit 'Go' with any of the new count filter checkboxes hit, the preview summary on the yes/no confirmation dialog talks about it
- any time a watcher subject is parsed, if the text contains non-decoded html entities (like `&gt;`), they are now auto-converted to normal chars. these strings are often ripped from odd places and are only used for user display, so this just makes that simpler
- if you are set to remove trashed files from view, this now works when the files are in multiple local file domains, and you choose 'delete from all local file services', and you are looking at 'all my files' or a subset of your local file domains
- we now log any time (when the client is non-idle) that a database job's work inside the transaction wrapper takes more than 15 seconds to complete
- fixed an issue caused by the sibling or parents system doing some regen work at an unlucky time
- thanks to user help, the derpibooru post parser now additionally grabs the raw markdown of a description as a second note. this catches links and images better than the html string parse. if you strictly only want one of these notes, please feel free to dive into `network->downloaders->default import options` for your derpi downloader and try to navigate the 'note import options' hell I designed and let me know how it could be more user friendly
- added a new NESTED formula type. this guy holds two formulae of any type internally, parsing the document with the first and passing those results on to the second. it is designed to solve the problem of 'how do I parse this JSON tucked inside HTML' and vice versa. various encoding stuff all seems to be handled, no extra work needed
- added Nested formula stuff to the 'how to make a downloader' help
- made all the screenshots in the parsing formula help clickable
- renamed the COMPOUND formula to ZIPPER formula
- all the 'String Processor' buttons across the program now have copy and paste buttons, so it is now easy to duplicate some rules you set up
- in the parsing system, sidecar importer, and clipboard watcher, all strings are now cleansed of errant 'surrogate' characters caused by the source incorrectly providing utf-16 garbage in a utf-8 stream. fingers crossed, the cleansing here will actually fix problem characters by converting them to utf-8, but we'll see
- thanks to a user, the JSON parsing system has a new 'de-minify json' parsing rule, which decompresses a particular sort of minified JSON that expresses multiply-referenced values using list positions. as it happened that I added NESTED formulae this week, I wonder if we will migrate this capability to the string processing system, but let's give it time to breathe
- fixed the permission check on the new 'get file/thumbnail local path' commands--due to me copy/pasting stupidly, they were still just checking 'search files' perm
- added `/get_files/local_file_storage_locations`, which spits out the stuff in `database->move media files` and lets you do local file access en masse (see the sketch below)
- added help and a unit test for this new command
- the client api version is now 72
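A minimal sketch of the new call, with a placeholder access key; it needs the 'see local paths' permission, and the printed response is illustrative:

```python
import requests

response = requests.get(
    "http://127.0.0.1:45869/get_files/local_file_storage_locations",
    headers={"Hydrus-Client-API-Access-Key": "replace-with-your-64-char-hex-key"},
)
print(response.json())  # the locations you set under database->move media files
```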
- the 'old' OpenCV version in the `(a)dvanced` setup, which pointed to version 4.5.3.56, which had the webp vulnerability, is no longer an option. I believe this means that the program will no longer run on python 3.7. I understand Win 7 can run python 3.8 at the latest, so we are nearing the end of the line on that front
- the old/new Pillow choice in `(a)dvanced` setup, which offered support for python 3.7, is removed
- I have added a new question to the `(a)dvanced` venv setup to handle misc 'future' tests better, and I added a new future test for two security patches for `setuptools` and `requests`:
- A) `setuptools` is updated to 70.3.0 (from 69.1.1) to resolve a security issue related to downloading packages from bad places (don't think this would ever affect us, but we'll be good)
- B) `requests` is updated to 2.32.3 (from 2.31.0) to resolve a security issue with verify=False (the specific problem doesn't matter for us, but we'll be good)
- if you run from source and want to help me test, you might like to rebuild your venv this week and choose the new future choice. these version increments do not appear to be a big deal, so assuming no problems I will roll these new libraries into a 'future' test build next week, and then into the normal builds a week after
- did a bunch more `super()` refactoring. I think all `__init__` is now converted across the program, and I cleared all the normal calls in the canvas and media results panel code too
- refactored `ClientGUIResults` into four files for the core class, the loading, the thumbnails, and some menu gubbins. also unified the mish-mash of `Results` and `MediaPanel` nomenclature to `MediaResultsPanel`
- fixed a stupid oversight with last week's "move page focus left/right after closing tab" thing where it was firing even when the page closed was not the current tab!! it now correctly only moves your focus if you close the current tab, not if you just middle click some other one
- fixed the share->export files menu command not showing if you right-clicked on just one file
- cleaned some of the broader thumbnail menu code, separating the 'stuff to show if we have a focus' and 'stuff to show if we have a selection'; the various 'manage' commands now generally show even if there is no current 'focus' in the preview (which happens if you select with ctrl+click or ctrl+a and then right-click in whitespace)
- the 'migrate tags' dialog now allows you to filter the sibling or parent pairs by whether the child/worse or parent/ideal tag has actual mapping counts on an arbitrary tag service. some new unit tests ensure this capability
- fixed an error in the duplicate metadata merge system where if files were exchanging known URLs, and one of those URLs was not actually an URL (e.g. it was garbage data, or human-entered 'location' info), a secondary system that tried to merge correlated domain-based timestamps was throwing an exception
- to reduce comma-confusion, the template for 'show num files and import status' on page names is now "name - (num_files - import_status)"
- the option that governs whether page names have the file count after them (under options->gui pages) has a new choice--'show for all pages, but only if greater than zero'--which is now the default for new users
- broke up the over-coupled 'migrate tags' unit tests into separate content types and the new count-filtering stuff
- cleaned up the 'share' menu construction code--it was messy after some recent rewrites
- added some better error handling around some of the file/thumbnail path fetching/regen routines
- the client api gets a new permissions state this week: the permissions structure you edit for an access key can now be (and, as a convenient default, starts as) a simple 'permits everything' state. if the permissions are set to 'permit everything', then this overrules all the specific rules and tag search filter gubbins. nice and simple, and a permissions set this way will automatically inherit new permissions in the future. any api access keys that have all the permissions up to 'edit ratings' will be auto-updated to 'permits everything' and you will get an update saying this happened--check your permissions in review services if you need finer control
- added a new permission, `13`, for 'see local paths'
- added `/get_files/file_path`, which fetches the local path of a file. it needs the new permission
- added `/get_files/thumbnail_path`, which fetches the local path of a thumbnail and optionally the filetype of the actual thumb (jpeg or png). it needs the new permission
- the `/request_new_permissions` command now accepts a `permits_everything` bool as a selective alternate to the `basic_permissions` list (see the sketch below)
- the `/verify_access_key` command now responds with the name of the access key and the new `permits_everything` value
- the API help is updated for the above
- new unit tests test all the above
- the Client API version is now 71
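A hedged sketch of the new flow, with parameter names as given above; check the API help for the exact response shapes:

```python
import requests

API_URL = "http://127.0.0.1:45869"

# ask the client for a new access key that permits everything; the user must have
# the API key request dialog open in review services for this to be caught
r = requests.get(
    f"{API_URL}/request_new_permissions",
    params={"name": "my example tool", "permits_everything": "true"},
)
access_key = r.json()["access_key"]

# confirm what we got back: the response now includes the key's name
# and the new permits_everything value
r = requests.get(
    f"{API_URL}/verify_access_key",
    headers={"Hydrus-Client-API-Access-Key": access_key},
)
print(r.json())
```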
- the main `ClientLocalServerResources` file has been getting too huge (5,000 lines), so I've moved it and `ClientLocalServer` to their own `api` module and broken the Resources file up into core functions, the superclass, and the main verbs
- fixed permissions check for `/manage_popups/update_popup`, which was checking for pages permission rather than popup permission
- did a general linting pass of these easier-to-handle files; cleaned up some silly stuff
- the 'check now' button in manage subscriptions is generally more intelligent and now offers questions around paused status: if all the selected queries are DEAD, it now asks you if you want to resurrect them with a yes/no variant of the DEAD/ALIVE question (previously it just did it); if you are in edit subscriptions and any of the selected subs are paused, it now asks you if you want to include them (and unpause) in the check now, and if not it reduces the queries examined for the DEAD/ALIVE question appropriately (previously it just did their queries, and did not unpause); in either edit subscriptions or edit subscription, if any queries in the selection after any 'paused subs' or 'DEAD/ALIVE' filtering are paused, it asks you if you want to include (and unpause) them in the check now (previously it just did and unpaused them all)
- if you shrink the search page's preview window down to 0 size (which it will suddenly snap to, and which is a slightly different hide state to the one caused by double-left-clicking the splitter sash), the preview canvas will now recognise it is hidden and no longer load media as you click on thumbs. previously this thing was loading noisy videos in the background etc..
- the `StringMatch` 'character set' match type now has 'hexadecimal characters' (`^[\da-fA-F]+$`) and 'base-64 characters' (`^[a-zA-Z\d+/]+={0,2}$`) in its dropdown choice (see the sketch just below)
- the 'gui pages' options panel now has 'when closing tabs, move focus (left/right)', so if you'd rather move left when middle-clicking tabs etc.., you can now set it, and if your style's default behaviour is whack and never moved to the right before despite you wanting it, now you can force it; it is now explicit either way. let me know if any crazy edge-case focus logic happens in this mode with nested page of pages or whatever
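For reference, here is what those two patterns accept, checked with Python's re rather than the hydrus StringMatch object itself:

```python
import re

# the two new character-set patterns, exactly as listed above
hex_pattern = re.compile(r'^[\da-fA-F]+$')
base64_pattern = re.compile(r'^[a-zA-Z\d+/]+={0,2}$')

print(bool(hex_pattern.match('68656c6c6f20776f726c64')))  # True
print(bool(base64_pattern.match('aGVsbG8gd29ybGQ=')))     # True
```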
- when you right-click a file, in the share->copy hash menu, the md5, sha1, and sha512 hashes are now loaded from the database, usually in the milliseconds after the menu is opened, and shown in the menu labels for quick human reference. if your client does not have these hashes for the file, it says so
- the 'share' thumbnail menu is now visible on non-local files. it is severely truncated, basically just shows copy hash/file_id stuff
- wrote a 'Current Deleted Pending Petitioned' section for the Developer API to discuss how the states in the content storage system overlap and change in relation to various commands in the content update pipeline https://hydrusnetwork.github.io/hydrus/developer_api.html#CDPP It may be of interest to non-API-devs who are generally interested in what exactly the 'pending' state etc.. is
- if the file import options in a hard drive import page currently imports to an empty location context (e.g. you deleted the local file service it wanted to import to), the import page now pauses and presents an appropriate error text. the URL importers already did this, so this is the hdd import joining them
- this 'check we are good to do file work' test in the importer pages now in all cases pursues a 'default' file import options to the actual real one that will be used, so if your importer file import options are borked, this is now detected too and the importer will pause rather than fail everything in its file log
- thanks to a user, fixed a typo bug in the new multi-column list work that was causing problems when looking at gallery logs that included mis-linked log entries. in general, the main 'turn this integer into something human' function will now handle errors better
- advanced/technical, tl;dr: x.com URLs save better now. since a better fix will take more work, the 'x post' URL class is for now set to associate URLs. this fixes the association of x.com URLs when those are explicitly referred to as source URLs in a booru post. previously, some hydrus network engine magic related to how x URLs are converted to twitter URLs (and then fx/vxtwitter URLs) to get parsed by the twitter parser was causing some problems. a full 'render this URL as this URL' system will roll out in future to better handle this situation where two non-API URLs can mean the same thing. this will result in some twitter/x post URL duplication--we'll figure out a nice merge later!
- I have written the first skeleton of the `MetadataConditional` object. it has a rule based on a system predicate (like 'width > 400px') and returns True/False when you give it a media object. this lego-brick will plug into a variety of different systems in future, including the duplicate auto-resolution system, with a unified UI (a tiny illustrative sketch follows below)
- system predicates cannot yet do this arbitrarily, so it will be future work to fill out this code. to start with, I've just got system:filetype working to ultimately effect the first duplicate auto-resolution job of 'if pixel duplicates and one is jpeg, one png, then keep the jpeg'
- added some unit tests to test this capability
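To illustrate the 'predicate in, bool out' shape of that lego-brick, here is a purely hypothetical sketch; the names and fields are invented for illustration and this is not the actual hydrus class:

```python
# purely illustrative sketch -- not hydrus's actual MetadataConditional
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExampleMedia:
    width: int
    filetype: str

@dataclass
class MetadataConditionalSketch:
    # stands in for a system predicate such as 'width > 400px' or 'system:filetype = png'
    predicate: Callable[[ExampleMedia], bool]

    def Test(self, media: ExampleMedia) -> bool:
        return self.predicate(media)

mc = MetadataConditionalSketch(predicate=lambda m: m.width > 400)
print(mc.Test(ExampleMedia(width=800, filetype="jpeg")))  # True
print(mc.Test(ExampleMedia(width=200, filetype="png")))   # False
```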
- refactored the main `Predicate` object and friends to the new `ClientSearchPredicate`
- refactored the main `NumberTest` object and friends to the new `ClientNumberTest`
- refactored the main `TagContext` object and friends to the new `ClientTagContext`
- refactored the main `FileSearchContext` object and friends to the new `ClientSearchFileSearchContext`
- moved some other `ClientSearch` stuff to other places and renamed the original file to `ClientSearchFavourites`; it now just contains the favourite searches manager
- some misc cleanup around here. some enums had bad names, that sort of thing
- the similar-files search maintenance code has an important update that corrects tree rebalancing for a variety of clients that initialised with an unlucky first import file. in the database update, I will check if you were affected here and immediately optimise your tree if so. it might take a couple minutes if you have millions of files
- tag parent and sibling changes now calculate faster at the database level. a cache that maintains the structure of which pairs need to be synced is now adjusted with every parent/sibling content change, rather than regenerated. for the PTR, I believe this will save about a second of deferred CPU time on an arbitrary parent/sibling change for the price of about 2MB of memory, hooray. fingers crossed, looking at the tags->sibling/parent sync->review panel while repository processing is going on will now be a smooth-updating affair, rather than repeated 'refreshing...' wait-flicker
- the 'the pairs you mean to add seem to connect to some loops' auto-loop-resolution popup in the manage siblings/parents dialogs will now only show when it is relevant to pairs to be added. previously, this thing was spamming during the pre-check of the process of the user actually breaking up loops by removing pairs
- added an item, 'sync now', to the tags->sibling/parent sync menu. this is a nice easy way to force 'work hard' on all services that need work. it tells you if there was no work to do
- reworked the 'new page chooser' mini-dialog and better fixed-in-place the intended static 3x3 button layout
- showing 'all my files' and 'local files' in the 'new page chooser' mini-dialog is now configurable in options->pages. previously 'local files' was hidden behind advanced mode. 'all my files' will only ever show if you have more than one local files domain
- when a login script fails with 401 or 403, or indeed any other network error, it now presents a simpler error in UI (previously it could spam the raw html of the response up to UI)
- generally speaking, the network job status widget will now only show the first line of any status text it is given. if some crazy html document or other long error ends up spammed to this thing, it should now show a better summary
- the 'filename' and 'first/second/etc.. directory' checkbox-and-text-input controls in the filename tagging panel now auto-check when you type something in
- the 'review sibling/parent sync' and 'manage where tag siblings and parents apply' dialogs are now plugged into the 'default tag service' system. they open to this tab, and if you are set to update it to the last seen, they save over the value on changes
- fixed the default safebooru file page parser to stop reading undesired '?' tags for every namespace (they changed their html recently I think)
- catbox 'collection' pages are now parseable by default
- fixed an issue with showing the 'manage export folders' dialog. sorry for the trouble--in my list rewrite, I didn't account for one thing that is special for this list and it somehow slipped through testing. as a side benefit, we are better prepped for a future update that will support column hiding and rearranging
- optimised about half of the new multi-column lists, as discussed last week. particularly included are file log, gallery log, watcher page, gallery page, and filename tagging panel, which all see a bunch of regular display/sort updates. the calls to get display data or sort data for a row are now separate, so if the display code is CPU expensive, it won't slow a sort
- in a couple places, url type column is now sorted by actual text, i.e. file url-gallery url-post url-watchable url, rather than the previous conveniently ordered enum. not sure if this is going to be annoying, so we'll see
- the filename tagging list no longer sorts the tag column by tag contents, instead it just does '#'. this makes this list sort superfast, so let's see if it is super annoying, but since this guy often has 10,000+ items, we may prefer the fast sort/updates for now
- the `/add_files/add_file` command now has a `delete_after_successful_import` parameter, default false, that does the same as the manual file import's similar checkbox. it only works on commands with a `path` parameter, obviously (see the sketch below)
- updated client api help and unit tests to test this
- client api version is now 70
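A minimal sketch of the new parameter in use; the path and access key are placeholders:

```python
import requests

response = requests.post(
    "http://127.0.0.1:45869/add_files/add_file",
    headers={"Hydrus-Client-API-Access-Key": "replace-with-your-64-char-hex-key"},
    json={
        "path": "C:\\incoming\\example.jpg",  # hypothetical path on the client's machine
        "delete_after_successful_import": True,
    },
)
print(response.json())  # the import status for the file
```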
- I cleaned up a mash of ancient shortcut-processing jank in the tag autocomplete input and fixed some logic. everything is now processed through one event filter, the result flags are no longer topsy-turvy, and the question of which key events are passed from the text input to the results list is now a simple strict whitelist--basically now only up/down/page up/page down/home/end/enter (sometimes)/escape (sometimes) and ctrl+p/n (for legacy reasons) are passed to the results list. this fixes some unhelpful behaviour where you couldn't select text and ctrl+c it unless the results list was empty (since the list was jumping in, after recent updates, and saying 'hey, I can do ctrl+c, bro' and copying the currently selected results)
- the key event processing in multi-column lists is also cleaned up from the old wx bridge to native Qt handling
- and some crazy delete handling in the manage urls dialog is cleaned up too
- the old `EVT_KEY_DOWN` wx bridge is finally cleared out of the program. I also cleared out some other old wx event definitions that have long been replaced. mostly we just have some mouse handling and window state changes to deal with now
- replaced many of my ancient static inheritance references with python's `super()` gubbins. I disentangled all the program's multiple inheritance into super() and did I think about half of the rest. still like 360 `__init__` lines to do in future
- refactored the repair file locations dialog and manage options dialog and new page picker mini-dialog to their own python files
- tl;dr: big lists faster now. you do not need to do anything
- every multi-column list in the program (there's about 75 of them) now works on a more sophisticated model (specifically, we are updating from QTreeWidget to QTreeView). instead of the list storing and regenerating display labels for every single row of a table, only the rows that are currently in view are generally consulted. sort events are similarly extremely fast, with off-screen updates virtualised and deferred
- in my tests, a list with 170,000 rows now sorts in about four seconds. my code is still connected to a non-optimised part of the old system, so I hope to improve gains with background cleanup work in coming months. I believe I can make it work at least twice as fast in places, particularly in initialisation
- multi-column lists are much better about initialising/migrating the selection 'focus' (the little, usually dotted-line-border box that says where keyboard focus is) through programmatic insertions and deletes and sorts
- column headers now show the up/down 'sort' arrows using native style. everything is a bit more Qt-native and closer to C++ instead of my old custom garbage
- none of this changes anything to do with single-column lists across the program, which are still using somewhat jank old code. my taglist in particular is an entirely custom object that is neat in some ways but stuck in place by my brittle design. the above rewrite was tricky in a couple of annoying ways but overall very worth doing, so I expect to replicate it elsewhere. another open choice is rewriting the similarly entirely custom thumbnail canvas to a proper Qt widget with a QLayout and such. we'll see how future work goes
- fixed the 'show' part of 'pages->sidebar and preview panels->show/hide sidebar and preview panel', which was busted last week in the page relayout cleanup
- I think I fixed the frame of flicker (usually a moment of page-wide autocomplete input) you would sometimes get when clicking a 'show these files' popup message files button
- fixed the new shimmie parser (by adding a simpler dupe and wangling the example urls around) to correctly parse r34h tags
- I think I may have fixed some deadlocks and/or mega-pauses in the manage tag parents/siblings dialogs when entering pairs causes a dialog (a yes/no confirmation, or the 'enter a reason' input) to pop up
- I think I have fixed the 'switch between fullscreen borderless and regular framed window' command when set to the 'media_viewer' shortcut set. some command-processing stuff wasn't wired up fully after I cleared out some old hacks a while ago
- the manage tag parents dialog has some less janky layout as it is expanded/shrunk
- if similar files search tree maintenance fails to regenerate a branch, the user is now told to try running the full regen
- the full similar files search tree regen now filters out nodes with invalid phashes (i.e. due to database damage), deleting those nodes fully and printing all pertinent info to the log, and tells the user what to do next
- you can now regen the similar files search tree on an empty database without error, lol
- while I was poking around lists, I fixed a bit of bad error handling when you try to import a broken serialised data png to a multi-column list
- the `/get_files/search_files` command now supports `include_current_tags` and `include_pending_tags`, mirroring the buttons on the normal search interface (issue #1577)
- updated the help and unit tests to check these new params
- client api version is now 69
The hydrus client now supports a very simple API so you can access it with external programs.
"},{"location":"client_api.html#enabling_the_api","title":"Enabling the API","text":"By default, the Client API is not turned on. Go to services->manage services and give it a port to get it started. I recommend you not allow non-local connections (i.e. only requests from the same computer will work) to start with.
The Client API should start immediately. It will only be active while the client is open. To test it is running all correct (and assuming you used the default port of 45869), try loading this:
http://127.0.0.1:45869
You should get a welcome page. By default, the Client API is HTTP, which means it is ok for communication on the same computer or across your home network (e.g. your computer's web browser talking to your computer's hydrus), but not secure for transmission across the internet (e.g. your phone to your home computer). You can turn on HTTPS, but due to technical complexities it will give itself a self-signed 'certificate', so the security is good but imperfect, and whatever is talking to it (e.g. your web browser looking at https://127.0.0.1:45869) may need to add an exception.
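If you prefer to test from a script, /api_version needs no access key; a minimal check might look like this, assuming the default port:

```python
import requests

response = requests.get("http://127.0.0.1:45869/api_version")
print(response.json())  # e.g. {"version": 74, "hydrus_version": 598}
```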
The Client API is still experimental and sometimes not user friendly. If you want to talk to your home computer across the internet, you will need some networking experience. You'll need a static IP or reverse proxy service or dynamic domain solution like no-ip.org so your device can locate it, and potentially port-forwarding on your router to expose the port. If you have a way of hosting a domain and have a signed certificate (e.g. from Let's Encrypt), you can overwrite the client.crt and client.key files in your 'db' directory and HTTPS hydrus should host with those.
Once the API is running, go to its entry in services->review services. Each external program trying to access the API will need its own access key, which is the familiar 64-character hexadecimal used in many places in hydrus. You can enter the details manually from the review services panel and then copy/paste the key to your external program, or the program may have the ability to request its own access while a mini-dialog launched from the review services panel waits to catch the request.
"},{"location":"client_api.html#tools_created_by_hydrus_users","title":"Tools created by hydrus users","text":""},{"location":"client_api.html#browser_add-on","title":"Browser Add-on","text":"- Hydrus Companion: a Chrome/Firefox extension for hydrus that allows easy download queueing as you browse and advanced login support
- Hydrus Web: a web client for hydrus (allows phone browsing of hydrus)
- Hyshare: a way to share small galleries with friends--a replacement for the old 'local booru' system
- Hydra Vista: a macOS client for hydrus
- LoliSnatcher: a booru client for Android that can talk to hydrus
- Anime Boxes: a booru browser, now supports adding your client as a Hydrus Server
- FlipFlip: an advanced slideshow interface, now supports hydrus as a source
- Hydrus Archive Delete: Archive/Delete filter in your web browser
- hydownloader: Hydrus-like download system based on gallery-dl
- hydrus-dd: DeepDanbooru tagging for Hydrus
- wd-e621-hydrus-tagger: More AI tagging, with more models
- Hydrus Video Deduplicator: Discovers duplicate videos in your client and queues them for the duplicate filter
- tagrank: Shows you comparison images and cleverly ranks your favourite tag.
- hyextract: Extract archives from Hydrus and reimport with tags and URL associations
- Send to Hydrus: send URLs from your Android device to your client
- Iwara-Hydrus: a userscript to simplify sending Iwara videos to Hydrus Network
- dolphin-hydrus-actions: Adds Hydrus right-click context menu actions to Dolphin file manager.
- more projects on github
I welcome all your bug reports, questions, ideas, and comments. It is always interesting to see how other people are using my software and what they generally think of it. Most of the changes every week are suggested by users.
You can contact me by email, twitter, discord, or the release threads on 8chan or Endchan--I do not mind which. Please know that I have difficulty with social media, and while I try to reply to all messages, it sometimes takes me a while to catch up.
If you need it, here's my public GPG key.
The Github Issue Tracker was turned off for some time, as it did not fit my workflow and I could not keep up, but it is now running again, managed by a team of volunteer users. Please feel free to submit feature requests there if you are comfortable with Github. I am not socially active on Github, please do not ping me there.
I am on the discord on Saturday afternoon, USA time, if you would like to talk live, and briefly on Wednesday after I put the release out. If that is not a good time for you, please leave me a DM and I will get to you when I can. There are also plenty of other hydrus users who idle who can help with support questions.
I delete all tweets and resolved email conversations after three months. So, if you think you are waiting for a reply, or I said I was going to work on something you care about and seem to have forgotten, please do nudge me.
I am always overwhelmed by work and behind on my messages. This is not to say that I do not enjoy just hanging out or talking about possible new features, but forgive me if some work takes longer than expected or if I cannot get to a particular idea quickly. In the same way, if you encounter actual traceback-raising errors or crashes, there is only one guy to fix it, so I prefer to know ASAP so I can prioritise.
I work by myself because I have acute difficulty working with others. Please do not spontaneously write long design documents or prepare other work for me--I find it more stressful than helpful, every time, and I won't give it the attention it deserves. If you would like to contribute time to hydrus, the user projects like the downloader repository and wiki help guides always have things to do.
That said:
- homepage
- github (latest build)
- issue tracker
- 8chan.moe /t/ (Hydrus Network General) (endchan bunker (.org))
- tumblr (rss)
- new downloads
- old downloads
- x
- discord
- patreon
- user-run repository and wiki (including download presets for several non-default boorus)
Warning
I am working on this system right now and will be moving the 'move files now' action to a more granular, always-on background migration. This document will update to reflect those changes!
"},{"location":"database_migration.html#database_migration","title":"database migration","text":""},{"location":"database_migration.html#intro","title":"the hydrus database","text":"A hydrus client consists of three components:
- the software installation
This is the part that comes with the installer or extract release, with the executable and dlls and a handful of resource folders. It doesn't store any of your settings--it just knows how to present a database as a nice application. If you just run the hydrus_client executable straight, it looks in its 'db' subdirectory for a database, and if one is not found, it creates a new one. If it sees a database running at a lower version than itself, it will update the database before booting it.
It doesn't really matter where you put this. An SSD will load it marginally quicker the first time, but you probably won't notice. If you run it without command-line parameters, it will try to write to its own directory (to create the initial database), so if you mean to run it like that, it should not be in a protected place like Program Files.
- the actual SQLite database
The client stores all its preferences and current state and knowledge about files--like file size and resolution, tags, ratings, inbox status, and so on and on--in a handful of SQLite database files, defaulting to install_dir/db. Depending on the size of your client, these might total 1MB in size or be as much as 10GB.
In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well.
- your media files
All of your jpegs and webms and so on (and their thumbnails) are stored in a single complicated directory that is by default at install_dir/db/client_files. All the files are named by their hash and stored in efficient hash-based subdirectories. In general, it is not navigable by humans, but it works very well for the fast access from a giant pool of files the client needs to do to manage your media.
Thumbnails tend to be fetched dozens at a time, so it is, again, ideal if they are stored on an SSD. Your regular media files--which on many clients total hundreds of GB--are usually fetched one at a time for human consumption and do not benefit from the expensive low-latency of an SSD. They are best stored on a cheap HDD, and, if desired, also work well across a network file system.
Although an initial install will keep these parts together, it is possible to, say, run the SQLite database on a fast drive but keep your media in cheap slow storage. This is an excellent arrangement that works for many users. And if you have a very large collection, you can even spread your files across multiple drives. It is not very technically difficult, but I do not recommend it for new users.
Backing such an arrangement up is obviously more complicated, and the internal client backup is not sophisticated enough to capture everything, so I recommend you figure out a broader solution with a third-party backup program like FreeFileSync.
"},{"location":"database_migration.html#pulling_media_apart","title":"pulling your media apart","text":"Danger
As always, I recommend creating a backup before you try any of this, just in case it goes wrong.
If you would like to move your files and thumbnails to new locations, I generally recommend you not move their folders around yourself--the database has an internal knowledge of where it thinks its file and thumbnail folders are, and if you move them while it is closed, it will become confused.
Missing Locations

If your folders are in the wrong locations on a client boot, a repair dialog appears, and you can manually update the client's internal understanding. This is not impossible to figure out, and in some tricky storage situations doing this on purpose can be faster than letting the client migrate things itself, but generally it is best and safest to do everything through the dialog.
Go database->move media files, giving you this dialog:
The buttons let you add more locations and remove old ones. The operations on this dialog are simple and atomic--at no point is your db ever invalid.
Beneath db? means that the path is beneath the main db dir and so is stored internally as a relative path. Portable paths will still function if the database changes location between boots (for instance, if you run the client from a USB drive and it mounts under a different location).
Weight means the relative amount of media you would like to store in that location. It only matters if you are spreading your files across multiple locations. If location A has a weight of 1 and B has a weight of 2, A will get approximately one third of your files and B will get approximately two thirds.
Max Size means the max total size of files the client will want to store in that location. Again, it only matters if you are spreading your files across multiple locations, but it is a simple way to ensure you don't go over a particular smaller hard drive's size. One location must always be limitless. This is not precise, so give it some padding. When one location is maxed out, the remaining locations will distribute the remainder of the files according to their respective weights. For the meantime, this will not update by itself. If you import many files, the location may go over its limit and you will have to revisit 'move media files' to rebalance your files again. Bear with me--I will fix this soon with the background migrate.
Let's set up an example move:
I made several changes:
- Added `C:\hydrus_files` to store files.
- Added `D:\hydrus_files` to store files, with a max size of 128MB.
- Set `C:\hydrus_thumbs` as the location to store thumbnails.
- Removed the original `C:\Hydrus Network\db\client_files` location.
While the ideal usage has changed significantly, note that the current usage remains the same. Nothing moves until you click 'move files now'. Moving files will take some time to finish. Once done, it looks like this:
The current and ideal usages line up, and the defunct `C:\Hydrus Network\db\client_files` location, which no longer stores anything, is removed from the list.

informing the software that the SQLite database is not in the default location

A straight call to the hydrus_client executable will look for a SQLite database in install_dir/db. If one is not found, it will create one. If you move your database and then try to run the client again, it will try to create a new empty database in that old location!
To tell it about the new database location, pass it a `-d` or `--db_dir` command line argument, like so:

hydrus_client -d="D:\media\my_hydrus_database"

--or--

hydrus_client --db_dir="G:\misc documents\New Folder (3)\DO NOT ENTER"

--or, from source--

python hydrus_client.py -d="D:\media\my_hydrus_database"

--or, for macOS--

open -n -a "Hydrus Network.app" --args -d="/path/to/db"
And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system.
Bad Locations
Do not run a SQLite database on a network location! The database relies on clever hardware-level exclusive file locks, which network interfaces often fake. While the program may work, I cannot guarantee the database will stay non-corrupt.
Do not run a SQLite database on a location with filesystem-level compression enabled! In the best case (BTRFS), the database can suddenly get extremely slow when it hits a certain size; in the worst (NTFS), a >50GB database will encounter I/O errors and receive sporadic corruption!
Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument in. Something like this:
Note that an install with an 'external' database no longer needs access to write to its own path, so you can store it anywhere you like, including protected read-only locations (e.g. in 'Program Files'). Just double-check your shortcuts are good.
"},{"location":"database_migration.html#finally","title":"backups","text":"If your database now lives in one or more new locations, make sure to update your backup routine to follow them!
"},{"location":"database_migration.html#to_an_ssd","title":"moving to an SSD","text":"As an example, let's say you started using the hydrus client on your HDD, and now you have an SSD available and would like to move your thumbnails and main install to that SSD to speed up the client. Your database will be valid and functional at every stage of this, and it can all be undone. The basic steps are:
- Move your 'fast' files to the fast location.
- Move your 'slow' files out of the main install directory.
- Move the install and db itself to the fast location and update shortcuts.
Specifically:
- Update your backup if you maintain one.
- Create an empty folder on your HDD that is outside of your current install folder. Call it 'hydrus_files' or similar.
- Create two empty folders on your SSD with names like 'hydrus_db' and 'hydrus_thumbnails'.
- Set the 'thumbnail location override' to 'hydrus_thumbnails'. You should get that new location in the list, currently empty but prepared to take all your thumbs.
- Hit 'move files now' to actually move the thumbnails. Since this involves moving a lot of individual files from a high-latency source, it will take a long time to finish. The hydrus client may hang periodically as it works, but you can just leave it to work on its own--it will get there in the end. You can also watch it do its disk work under Task Manager.
- Now hit 'add location' and select your new 'hydrus_files'. 'hydrus_files' should appear and be willing to take 50% of the files.
- Select the old location (probably 'install_dir/db/client_files') and hit 'remove location' or 'decrease weight' until it has weight 0 and you are prompted to remove it completely. 'hydrus_files' should now be willing to take all 100% of the files from the old location.
- Hit 'move files now' again to make this happen. This should be fast since it is just moving a bunch of folders across the same partition.
- With everything now 'non-portable' and hence decoupled from the db, you can now easily migrate the install and db to 'hydrus_db' simply by shutting the client down and moving the install folder in a file explorer.
- Update your shortcut to the new hydrus_client.exe location and try to boot.
- Update your backup scheme to match your new locations.
- Enjoy a much faster client.
You should now have something like this (let's say the D drive is the fast SSD, and E is the high capacity HDD):
"},{"location":"database_migration.html#multiple_clients","title":"p.s. running multiple clients","text":"Since you now know how to tell the software about an external database, you can, if you like, run multiple clients from the same install (and if you previously had multiple install folders, now you can now just use the one). Just make multiple shortcuts to the same hydrus_client executable but with different database directories. They can run at the same time. You'll save yourself a little memory and update-hassle.
"},{"location":"developer_api.html","title":"API documentation","text":""},{"location":"developer_api.html#library_modules_created_by_hydrus_users","title":"Library modules created by hydrus users","text":"- Hydrus API: A python module that talks to the API.
- hydrus.js: A node.js module that talks to the API.
- more projects on github
In general, the API deals with standard UTF-8 JSON. POST requests and 200 OK responses are generally going to be a JSON 'Object' with variable names as keys and values obviously as values. There are examples throughout this document. For GET requests, everything is in standard GET parameters, but some variables are complicated and will need to be JSON-encoded and then URL-encoded. An example would be the 'tags' parameter on GET /get_files/search_files, which is a list of strings. Since GET http URLs have limits on what characters are allowed, but hydrus tags can have all sorts of characters, you'll be doing this:
-
Your list of tags:
[ 'character:samus aran', 'creator:\u9752\u3044\u685c', 'system:height > 2000' ]\n
-
JSON-encoded:
[\"character:samus aran\", \"creator:\\\\u9752\\\\u3044\\\\u685c\", \"system:height > 2000\"]\n
-
Then URL-encoded:
%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D\n
-
In python, converting your tag list to the URL-encoded string would be:
urllib.parse.quote( json.dumps( tag_list ), safe = '' )\n
-
Full URL path example:
/get_files/search_files?file_sort_type=6&file_sort_asc=false&tags=%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D\n
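To tie those steps together, here is a minimal Python sketch that builds and sends that same search request using only the standard library. The host, port, and access key are assumptions for the example (45869 is the default Client API port); the default response carries a 'file_ids' list.

```python
import json
import urllib.parse
import urllib.request

# assumptions for this sketch: a local client on the default port and a made-up access key
API_BASE = 'http://127.0.0.1:45869'
ACCESS_KEY = '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae'

tag_list = [ 'character:samus aran', 'system:height > 2000' ]

# JSON-encode the list, then URL-encode the JSON string, exactly as above
encoded_tags = urllib.parse.quote( json.dumps( tag_list ), safe = '' )

url = f'{API_BASE}/get_files/search_files?file_sort_type=6&file_sort_asc=false&tags={encoded_tags}'

request = urllib.request.Request( url, headers = { 'Hydrus-Client-API-Access-Key' : ACCESS_KEY } )

with urllib.request.urlopen( request ) as response:
    
    result = json.loads( response.read() )

print( result[ 'file_ids' ] )
```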
The API returns JSON for everything except actual file/thumbnail requests. Every JSON response includes the `version` of the Client API and `hydrus_version` of the Client hosting it (for brevity, these values are not included in the example responses in this help). For errors, you'll typically get 400 for a missing/invalid parameter, 401/403/419 for missing/insufficient/expired access, and 500 for a real deal serverside error.
Note
For any request sent to the API, the total size of the initial request line (this includes the URL and any parameters) and the headers must not be larger than 2 megabytes. Exceeding this limit will cause the request to fail. Make sure to use pagination if you are passing very large JSON arrays as parameters in a GET request.
"},{"location":"developer_api.html#cbor","title":"CBOR","text":"The API now tentatively supports CBOR, which is basically 'byte JSON'. If you are in a lower level language or need to do a lot of heavy work quickly, try it out!
To send CBOR, for POST put Content-Type
application/cbor
in your request header instead ofapplication/json
, and for GET just add acbor=1
parameter to the URL string. Use CBOR to encode any parameters that you would previously put in JSON:For POST requests, just print the pure bytes in the body, like this:
cbor2.dumps( arg_dict )\n
For GET, encode the parameter value in base64, like this:
-or-base64.urlsafe_b64encode( cbor2.dumps( argument ) )\n
str( base64.urlsafe_b64encode( cbor2.dumps( argument ) ), 'ascii' )\n
If you send CBOR, the client will return CBOR. If you want to send CBOR and get JSON back, or vice versa (or you are uploading a file and can't set CBOR Content-Type), send the Accept request header, like so:
Accept: application/cbor\nAccept: application/json\n
If the client does not support CBOR, you'll get 406.
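As a rough sketch of a CBOR round trip (cbor2 and requests are third-party libraries, and the endpoint and access key here are just example values):

```python
import base64

import cbor2     # third-party CBOR library
import requests  # third-party; any http library works

API_BASE = 'http://127.0.0.1:45869'
ACCESS_KEY = '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae'

tags = [ 'character:samus aran', 'system:archive' ]

# for GET, CBOR-encode the parameter and then base64 it so it survives the URL
encoded_tags = str( base64.urlsafe_b64encode( cbor2.dumps( tags ) ), 'ascii' )

response = requests.get(
    API_BASE + '/get_files/search_files',
    params = { 'cbor' : 1, 'tags' : encoded_tags },
    headers = { 'Hydrus-Client-API-Access-Key' : ACCESS_KEY }
)

# since we sent CBOR, the response body is CBOR too
result = cbor2.loads( response.content )
```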
"},{"location":"developer_api.html#access_and_permissions","title":"Access and permissions","text":"The client gives access to its API through different 'access keys', which are the typical random 64-character hex used in many other places across hydrus. Each guarantees different permissions such as handling files or tags. Most of the time, a user will provide full access, but do not assume this. If a access key header or parameter is not provided, you will get 401, and all insufficient permission problems will return 403 with appropriate error text.
Access is required for every request. You can provide this as an http header, like so:
Hydrus-Client-API-Access-Key : 0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\n
Or you can include it in the normal parameters of any request (except POST /add_files/add_file, which uses the entire POST body for the file's bytes).
For GET, this means including it into the URL parameters:
/get_files/thumbnail?file_id=452158&Hydrus-Client-API-Access-Key=0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\n
For POST, this means in the JSON body parameters, like so:
{\n \"hash_id\" : 123456,\n \"Hydrus-Client-API-Access-Key\" : \"0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\"\n}\n
There is also a simple 'session' system, where you can get a temporary key that gives the same access without having to include the permanent access key in every request. You can fetch a session key with the /session_key command and thereafter use it just as you would an access key, just with Hydrus-Client-API-Session-Key instead.
Session keys will expire if they are not used within 24 hours, or if the client is restarted, or if the underlying access key is deleted. An invalid/expired session key will give a 419 result with an appropriate error text.
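A minimal sketch of that handover, assuming a local client on the default port and a hypothetical access key:

```python
import json
import urllib.request

API_BASE = 'http://127.0.0.1:45869'  # assumption: local client on the default port
ACCESS_KEY = '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae'  # hypothetical

def get_json( path, headers ):
    
    request = urllib.request.Request( API_BASE + path, headers = headers )
    
    with urllib.request.urlopen( request ) as response:
        
        return json.loads( response.read() )
    

# trade the permanent access key for a temporary session key
session_key = get_json( '/session_key', { 'Hydrus-Client-API-Access-Key' : ACCESS_KEY } )[ 'session_key' ]

# then use the session key in the same way for subsequent requests
result = get_json( '/verify_access_key', { 'Hydrus-Client-API-Session-Key' : session_key } )

print( result[ 'human_description' ] )
```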
Bear in mind the Client API is still under construction. Setting up the Client API to be accessible across the internet requires technical experience to be convenient. HTTPS is available for encrypted comms, but the default certificate is self-signed (which basically means an eavesdropper can't see through it, but your ISP/government could if they decided to target you). If you have your own domain to host from and an SSL cert, you can replace them and it'll use them instead (check the db directory for client.crt and client.key). Otherwise, be careful about transmitting sensitive content outside of your localhost/network.
"},{"location":"developer_api.html#common_complex_parameters","title":"Common Complex Parameters","text":""},{"location":"developer_api.html#parameters_files","title":"files","text":"If you need to refer to some files, you can use any of the following:
Arguments:
- `file_id` : (selective, a numerical file id)
- `file_ids` : (selective, a list of numerical file ids)
- `hash` : (selective, a hexadecimal SHA256 hash)
- `hashes` : (selective, a list of hexadecimal SHA256 hashes)
In GET requests, make sure any list is percent-encoded JSON. Your `[1,2,3]` becomes `urllib.parse.quote( json.dumps( [1,2,3] ), safe = '' )`, and thus `file_ids=%5B1%2C%202%2C%203%5D`.
"},{"location":"developer_api.html#parameters_file_domain","title":"file domain","text":"When you are searching, you may want to specify a particular file domain. Most of the time, you'll want to just set `file_service_key`, but this can get complex:
Arguments:
- `file_service_key` : (optional, selective A, hexadecimal, the file domain on which to search)
- `file_service_keys` : (optional, selective A, list of hexadecimals, the union of file domains on which to search)
- `deleted_file_service_key` : (optional, selective B, hexadecimal, the 'deleted from this file domain' on which to search)
- `deleted_file_service_keys` : (optional, selective B, list of hexadecimals, the union of 'deleted from this file domain' on which to search)
The service keys are as in /get_services.
Hydrus supports two concepts here:
- Searching over a UNION of subdomains. If the user has several local file domains, e.g. 'favourites', 'personal', 'sfw', and 'nsfw', they might like to search two of them at once.
- Searching deleted files of subdomains. You can specifically, and quickly, search the files that have been deleted from somewhere.
You can play around with this yourself by clicking 'multiple locations' in the client with help->advanced mode on.
In extreme edge cases, these two can be mixed by populating both A and B selective, making a larger union of both current and deleted file records.
Please note that unions can be very very computationally expensive. If you can achieve what you want with a single file_service_key, two queries in a row with different service keys, or an umbrella like `all my files` or `all local files`, please do. Otherwise, let me know what is running slow and I'll have a look at it.
'deleted from all local files' includes all files that have been physically deleted (i.e. deleted from the trash) and are not available any more for fetch file/thumbnail requests. 'deleted from all my files' includes all of those physically deleted files and the trash. If a file is deleted with the special 'do not leave a deletion record' command, then it won't show up in a 'deleted from file domain' search!
'all known files' is a tricky domain. It converts much of the search tech to ignore where files actually are and look at the accompanying tag domain (e.g. all the files that have been tagged), and can sometimes be very expensive.
Also, if you have the option to set both file and tag domains, you cannot enter 'all known files'/'all known tags'. It is too complicated to support, sorry!
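For illustration, a union search over two local file domains might be encoded like this. The service keys are made up for the sketch; fetch your real ones from /get_services:

```python
import json
import urllib.parse

# hypothetical service keys for two local file domains, e.g. 'favourites' and 'sfw'
favourites_key = '572ff2bd34857c0b3210b967a5a40cb338ca4c5747f2218d4041ddf8b6d077f1'
sfw_key = 'ae7d9a603008919612894fc360130ae3d9925b8577d075cd0473090ac38b12b6'

def encode( value ):
    
    # percent-encoded JSON, as with any list parameter in a GET request
    return urllib.parse.quote( json.dumps( value ), safe = '' )

path = '/get_files/search_files?tags={}&file_service_keys={}'.format(
    encode( [ 'system:inbox' ] ),
    encode( [ favourites_key, sfw_key ] )
)

print( path )
```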
"},{"location":"developer_api.html#legacy_service_name_parameters","title":"legacy service_name parameters","text":"The Client API used to respond to name-based service identifiers, for instance using 'my tags' instead of something like '6c6f63616c2074616773'. Service names can change, and they aren't strictly unique either, so I have moved away from them, but there is some soft legacy support.
The client will attempt to convert any of these to their 'service_key(s)' equivalents:
- file_service_name
- tag_service_name
- service_names_to_tags
- service_names_to_actions_to_tags
- service_names_to_additional_tags
But I strongly encourage you to move away from them as soon as reasonably possible. Look up the service keys you need with /get_service or /get_services.
If you have a clever script/program that does many things, then hit up /get_services on session initialisation and cache an internal map of key_to_name for the labels to use when you present services to the user.
Also, note that all users can now copy their service keys from review services.
"},{"location":"developer_api.html#services_object","title":"The Services Object","text":"Hydrus manages its different available domains and actions with what it calls services. If you are a regular user of the program, you will know about review services and manage services. The Client API needs to refer to services, either to accept commands from you or to tell you what metadata files have and where.
When it does this, it gives you this structure, typically under a
Services Objectservices
key right off the root node:{\n \"c6f63616c2074616773\" : {\n \"name\" : \"my tags\",\n \"type\": 5,\n \"type_pretty\" : \"local tag service\"\n },\n \"5674450950748cfb28778b511024cfbf0f9f67355cf833de632244078b5a6f8d\" : {\n \"name\" : \"example tag repo\",\n \"type\" : 0,\n \"type_pretty\" : \"hydrus tag repository\"\n },\n \"6c6f63616c2066696c6573\" : {\n \"name\" : \"my files\",\n \"type\" : 2,\n \"type_pretty\" : \"local file domain\"\n },\n \"7265706f7369746f72792075706461746573\" : {\n \"name\" : \"repository updates\",\n \"type\" : 20,\n \"type_pretty\" : \"local update file domain\"\n },\n \"ae7d9a603008919612894fc360130ae3d9925b8577d075cd0473090ac38b12b6\" : {\n \"name\": \"example file repo\",\n \"type\" : 1,\n \"type_pretty\" : \"hydrus file repository\"\n },\n \"616c6c206c6f63616c2066696c6573\" : {\n \"name\" : \"all local files\",\n \"type\": 15,\n \"type_pretty\" : \"virtual combined local file service\"\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"name\" : \"all my files\",\n \"type\" : 21,\n \"type_pretty\" : \"virtual combined local media service\"\n },\n \"616c6c206b6e6f776e2066696c6573\" : {\n \"name\" : \"all known files\",\n \"type\" : 11,\n \"type_pretty\" : \"virtual combined file service\"\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"name\" : \"all known tags\",\n \"type\": 10,\n \"type_pretty\" : \"virtual combined tag service\"\n },\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : {\n \"name\" : \"example local rating like service\",\n \"type\" : 7,\n \"type_pretty\" : \"local like/dislike rating service\",\n \"star_shape\" : \"circle\"\n },\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : {\n \"name\" : \"example local rating numerical service\",\n \"type\" : 6,\n \"type_pretty\" : \"local numerical rating service\",\n \"star_shape\" : \"fat star\",\n \"min_stars\" : 1,\n \"max_stars\" : 5\n },\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : {\n \"name\" : \"example local rating inc/dec service\",\n \"type\" : 22,\n \"type_pretty\" : \"local inc/dec rating service\"\n },\n \"7472617368\" : {\n \"name\" : \"trash\",\n \"type\" : 14,\n \"type_pretty\" : \"local trash file domain\"\n }\n}\n
I hope you recognise some of the information here. But what's that hex key on each section? It is the
service_key
.All services have these properties:
name
- A mutable human-friendly name like 'my tags'. You can use this to present the service to the user--they should recognise it.type
- An integer enum saying whether the service is a local tag service or like/dislike rating service or whatever. This cannot change.service_key
- The true 'id' of the service. It is a string of hex, sometimes just twenty or so characters but in many cases 64 characters. This cannot change, and it is how we will refer to different services.
This
service_key
is important. A user can rename their services, soname
is not an excellent identifier, and definitely not something you should save to any permanent config file.If we want to search some files on a particular file and tag domain, we should expect to be saying something like
file_service_key=6c6f63616c2066696c6573
andtag_service_key=f032e94a38bb9867521a05dc7b189941a9c65c25048911f936fc639be2064a4b
somewhere in the request.You won't see all of these, but the service
type
enum is:- 0 - tag repository
- 1 - file repository
- 2 - a local file domain like 'my files'
- 5 - a local tag domain like 'my tags'
- 6 - a 'numerical' rating service with several stars
- 7 - a 'like/dislike' rating service with on/off status
- 10 - all known tags -- a union of all the tag services
- 11 - all known files -- a union of all the file services and files that appear in tag services
- 12 - the local booru -- you can ignore this
- 13 - IPFS
- 14 - trash
- 15 - all local files -- all files on hard disk ('all my files' + updates + trash)
- 17 - file notes
- 18 - Client API
- 19 - deleted from anywhere -- you can ignore this
- 20 - local updates -- a file domain to store repository update files in
- 21 - all my files -- union of all local file domains
- 22 - a 'inc/dec' rating service with positive integer rating
- 99 - server administration
type_pretty
is something you can show users. Hydrus uses the same labels in manage services and so on.Rating services now have some extra data:
- like/dislike and numerical services have
star_shape
, which is one ofcircle | square | fat star | pentagram star
- numerical services have
min_stars
(0 or 1) andmax_stars
(1 to 20)
If you are displaying ratings, don't feel crazy obligated to obey the shape! Show a \u2158, select from a dropdown list, do whatever you like!
If you want to know the services in a client, hit up /get_services, which simply gives the above. The same structure has recently been added to /get_files/file_metadata for convenience, since that refers to many different services when it is talking about file locations and ratings and so on.
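As a sketch of how you might consume this object (the structure is as in the example above), building the key-to-name cache mentioned back in the legacy service_name section is just a dictionary walk:

```python
# 'services_object' is the parsed value of the 'services' key from GET /get_services
def build_service_maps( services_object ):
    
    service_keys_to_names = {}
    tag_service_keys = []
    
    for ( service_key, info ) in services_object.items():
        
        service_keys_to_names[ service_key ] = info[ 'name' ]
        
        # type 5 is a local tag domain, type 0 is a tag repository
        if info[ 'type' ] in ( 0, 5 ):
            
            tag_service_keys.append( service_key )
            
        
    
    return ( service_keys_to_names, tag_service_keys )
```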
Note: If you need to do some quick testing, you should be able to copy the
"},{"location":"developer_api.html#CDPP","title":"Current Deleted Pending Petitioned","text":"service_key
of any service by hitting the 'copy service key' button in review services.The content storage and update pipeline systems in hydrus consider content (e.g. 'on file service x, file y exists', 'on tag service x, file y has tag z', or 'on rating service x, file y has rating z') as existing within a blend of four states:
- Current - The content exists on the service.
- Deleted - The content has been removed from the service.
- Pending - The content is queued to be added to the service.
- Petitioned - The content is queued to be removed from the service.
Content that has never touched the service has a default 'null status' of no state at all.
Content may be in two categories at once--for instance, any Petitioned data is always also Current--but some states are mutually exclusive: Current data cannot also be Deleted.
Let's examine this more carefully. Current, Deleted, and Pending may exist on their own, and Deleted and Pending may exist simultaneously. Read the table from row to column, such that 'Content that is Current may also be Petitioned' while 'Content that is Petitioned must also be Current':
| | Current | Deleted | Pending | Petitioned |
|---|---|---|---|---|
| Current | - | Never | Never | May |
| Deleted | Never | - | May | Never |
| Pending | Never | May | - | Never |
| Petitioned | Must | Never | Never | - |
Local services have no concept of pending or petitioned, so they generally just have 'add x'/'delete x' verbs to convert content between current and deleted. Remote services like the PTR have a queue of pending changes waiting to be committed by the user to the server, so in these cases I will expose to you the full suite of 'pend x'/'petition x'/'rescind pend x'/'rescind petition x'. Although you might somewhere be able to 'pend'/'petition' content to a local service, these 'pending' changes will be committed instantly so they are synonymous with add/delete.
- When an 'add' is committed, the data is removed from the deleted record and added to the current record.
- When a 'delete' is committed, the data is removed from the current record and added to the deleted record.
- When a 'pend' is committed, the data is removed from the deleted record and added to the current record. (It is also deleted from the pending record!)
- When a 'petition' is committed, the data is removed from the current record and added to the deleted record. (It is also deleted from the petitioned record!)
- When a 'rescind pend' is committed, the data is removed from the pending record.
- When a 'rescind petition' is committed, the data is removed from the petitioned record.
Let's look at where the verbs make sense. Again, horizontal, so 'Content that is Current can receive a Petition command':
| | Add/Pend | Delete/Petition | Rescind Pend | Rescind Petition |
|---|---|---|---|---|
| No state | May | May | - | - |
| Current | - | May | - | - |
| Deleted | May | - | - | - |
| Pending | May overwrite an existing reason | - | May | - |
| Petitioned | - | May overwrite an existing reason | - | May |
In hydrus, anything in the content update pipeline that doesn't make sense, here a '-', tends to result in an errorless no-op, so you might not care to do too much filtering on your end of things if you don't need to--don't worry about deleting something twice.
Note that content that does not yet exist can be pre-emptively petitioned/deleted. A couple of advanced cases enjoy this capability, for instance when you are syncing delete records from one client to another.
Also, it is often the case that content that is recorded as deleted is more difficult to re-add/re-pend. You might need to be a janitor to re-pend something, or, for this API, set some `override_previously_deleted_mappings` parameter. This is by design and helps you to stop automatically re-adding something that the user spent slow human time deciding to delete.
"},{"location":"developer_api.html#access_management","title":"Access Management","text":""},{"location":"developer_api.html#api_version","title":"GET/api_version","text":"Gets the current API version. This increments every time I alter the API.
Restricted access: NO.
Required Headers: n/a
Arguments: n/a
Response: Some simple JSON describing the current api version (and hydrus client version, if you are interested). Note that this is not very useful any more, for two reasons:- The 'Server' header of every response (and a duplicated 'Hydrus-Server' one, if you have a complicated proxy situation that overwrites 'Server') are now in the form \"client api/{client_api_version} ({software_version})\", e.g. \"client api/32 (497)\".
- Every JSON response explicitly includes this now.
"},{"location":"developer_api.html#request_new_permissions","title":"GET{\n \"version\" : 17,\n \"hydrus_version\" : 441\n}\n
/request_new_permissions
","text":"Register a new external program with the client. This requires the 'add from api request' mini-dialog under services->review services to be open, otherwise it will 403.
Restricted access: NO.
Required Headers: n/a
Arguments:name
: (descriptive name of your access)permits_everything
: (selective, bool, whether to permit all tasks now and in future)-
basic_permissions
: Selective. A JSON-encoded list of numerical permission identifiers you want to request.The permissions are currently:
- 0 - Import and Edit URLs
- 1 - Import and Delete Files
- 2 - Edit File Tags
- 3 - Search for and Fetch Files
- 4 - Manage Pages
- 5 - Manage Cookies and Headers
- 6 - Manage Database
- 7 - Edit File Notes
- 8 - Edit File Relationships
- 9 - Edit File Ratings
- 10 - Manage Popups
- 11 - Edit File Times
- 12 - Commit Pending
- 13 - See Local Paths
Example request (permitting everything)/request_new_permissions?name=migrator&permits_everything=true\n
Example request (for permissions [0,1])/request_new_permissions?name=my%20import%20script&basic_permissions=%5B0%2C1%5D\n
Response: Some JSON with your access key, which is 64 characters of hex. This will not be valid until the user approves the request in the client ui. Example response
{\n \"access_key\" : \"73c9ab12751dcf3368f028d3abbe1d8e2a3a48d0de25e64f3a8f00f3a1424c57\"\n}\n
The
"},{"location":"developer_api.html#session_key","title":"GETpermits_everything
overrules all the individual permissions and will encompass any new permissions added in future. It is a convenient catch-all for local-only services where you are running things yourself or the user otherwise implicitly trusts you./session_key
","text":"Get a new session key.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: Some JSON with a new session key in hex. Example response{\n \"session_key\" : \"f6e651e7467255ade6f7c66050f3d595ff06d6f3d3693a3a6fb1a9c2b278f800\"\n}\n
Note
Note that the access you provide to get a new session key can be a session key, if that happens to be useful. As long as you have some kind of access, you can generate a new session key.
A session key expires after 24 hours of inactivity, whenever the client restarts, or if the underlying access key is deleted. A request on an expired session key returns 419.
"},{"location":"developer_api.html#verify_access_key","title":"GET/verify_access_key
","text":"Check your access key is valid.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: 401/403/419 and some error text if the provided access/session key is invalid, otherwise some JSON with basic permission info. Example response
"},{"location":"developer_api.html#get_service","title":"GET{\n \"name\" : \"autotagger\",\n \"permits_everything\" : false,\n \"basic_permissions\" : [0, 1, 3],\n \"human_description\" : \"API Permissions (autotagger): add tags to files, import files, search for files: Can search: only autotag this\"\n}\n
/get_service
","text":"Ask the client about a specific service.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.Required Headers: n/a
Arguments:service_name
: (selective, string, the name of the service)service_key
: (selective, hex string, the service key of the service)
Response: Some JSON about the service. A similar format as /get_services and The Services Object. Example response/get_service?service_name=my%20tags\n/get_service?service_key=6c6f63616c2074616773\n
{\n \"service\" : {\n \"name\" : \"my tags\",\n \"service_key\" : \"6c6f63616c2074616773\",\n \"type\" : 5,\n \"type_pretty\" : \"local tag service\"\n }\n}\n
If the service does not exist, this gives 404. It is very unlikely but edge-case possible that two services will have the same name; in that case, you'll get the pseudorandom first one.
It will only respond to services in the /get_services list. I will expand the available types in future as we add ratings etc... to the Client API.
"},{"location":"developer_api.html#get_services","title":"GET/get_services
","text":"Ask the client about its services.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.Required Headers: n/a
Arguments: n/a
Response: Some JSON listing the client's services. Example response{\n \"services\" : \"The Services Object\"\n}\n
This now primarily uses The Services Object.
Note
If you do the request and look at the actual response, you will see a lot more data under different keys--this is deprecated, and will be deleted in 2024. If you use the old structure, please move over!
"},{"location":"developer_api.html#importing_and_deleting_files","title":"Importing and Deleting Files","text":""},{"location":"developer_api.html#add_files_add_file","title":"POST/add_files/add_file
","text":"Tell the client to import a file.
Restricted access: YES. Import Files permission needed. Required Headers:- Content-Type:
application/json
(if sending path),application/octet-stream
(if sending file)
path
: (the path you want to import)delete_after_successful_import
: (optional, defaults tofalse
, sets to delete the source file on a 'successful' or 'already in db' result)- file domain (optional, local file domain(s) only, defaults to your \"quiet\" file import options's destination)
{\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
If you include a file domain, it can only include 'local' file domains (by default on a new client this would just be \"my files\"), but you can send multiple to import to more than one location at once. Asking to import to 'all local files', 'all my files', 'trash', 'repository updates', or a file repository/ipfs will give you 400.
Arguments (as bytes): You can alternately just send the file's raw bytes as the entire POST body. In this case, you cannot send any other parameters, so you will be left with the default import file domain. Response:Some JSON with the import result. Please note that file imports for large files may take several seconds, and longer if the client is busy doing other db work, so make sure your request is willing to wait that long for the response. Example response
{\n \"status\" : 1,\n \"hash\" : \"29a15ad0c035c0a0e86e2591660207db64b10777ced76565a695102a481c3dd1\",\n \"note\" : \"\"\n}\n
status
is:- 1 - File was successfully imported
- 2 - File already in database
- 3 - File previously deleted
- 4 - File failed to import
- 7 - File vetoed
A file 'veto' is caused by the file import options (which in this case is the 'quiet' set under the client's options->importing) stopping the file due to its resolution or minimum file size rules, etc...
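As an aside, here is a minimal sketch of sending this request in both forms, assuming the third-party requests library; the path, filename, and access key are hypothetical:

```python
import json

import requests  # third-party, used here for brevity

API_BASE = 'http://127.0.0.1:45869'  # assumption: local client on the default port
HEADERS = { 'Hydrus-Client-API-Access-Key' : '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae' }

# form 1: tell the client about a path it can read itself
response = requests.post(
    API_BASE + '/add_files/add_file',
    headers = { 'Content-Type' : 'application/json', **HEADERS },
    data = json.dumps( { 'path' : 'E:\\to_import\\ayanami.jpg' } )
)

# form 2: send the raw bytes as the entire POST body (no other parameters possible)
with open( 'E:\\to_import\\ayanami.jpg', 'rb' ) as f:
    
    response = requests.post(
        API_BASE + '/add_files/add_file',
        headers = { 'Content-Type' : 'application/octet-stream', **HEADERS },
        data = f.read()
    )

print( response.json()[ 'status' ] )
```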
'hash' is the file's SHA256 hash in hexadecimal, and 'note' is any additional human-readable text appropriate to the file status that you may recognise from hydrus's normal import workflow. For an outright import error, it will be a summary of the exception that you can present to the user, and a new field
"},{"location":"developer_api.html#add_files_delete_files","title":"POSTtraceback
will have the full trace for debugging purposes./add_files/delete_files
","text":"Tell the client to send files to the trash.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
:application/json
- files
- file domain (optional, defaults to all my files)
reason
: (optional, string, the reason attached to the delete action)
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
If you specify a file service, the file will only be deleted from that location. Only local file domains are allowed (so you can't delete from a file repository or unpin from ipfs yet). It defaults to all my files, which will delete from all local services (i.e. force sending to trash). Sending 'all local files' on a file already in the trash will trigger a physical file delete.
"},{"location":"developer_api.html#add_files_undelete_files","title":"POST/add_files/undelete_files
","text":"Tell the client to restore files that were previously deleted to their old file service(s).
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
- file domain (optional, defaults to all my files)
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This is the reverse of a delete_files--restoring files back to where they came from. If you specify a file service, the files will only be undeleted to there (if they have a delete record, otherwise this is nullipotent). The default, 'all my files', undeletes to all local file services for which there are deletion records.
This operation will only occur on files that are currently in your file store (i.e. in 'all local files', and maybe, but not necessarily, in 'trash'). You cannot 'undelete' something you do not have!
"},{"location":"developer_api.html#add_files_clear_file_deletion_record","title":"POST/add_files/clear_file_deletion_record
","text":"Tell the client to forget that it once deleted files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This is the same as the advanced deletion option of the same basic name. It will erase the record that a file has been physically deleted (i.e. it only applies to deletion records in the 'all local files' domain). A file that no longer has a 'all local files' deletion record will pass a 'exclude previously deleted files' check in a file import options.
"},{"location":"developer_api.html#add_files_migrate_files","title":"POST/add_files/migrate_files
","text":"Copy files from one local file domain to another.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
:application/json
- files
- file domain
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\",\n \"file_service_key\" : \"572ff2bd34857c0b3210b967a5a40cb338ca4c5747f2218d4041ddf8b6d077f1\"\n}\n
This is only appropriate if the user has multiple local file services. It does the same as the media files->add to->domain menu action. If the files are originally in local file domain A, and you say to add to B, then afterwards they will be in both A and B. You can say 'B and C' to add to multiple domains at once, if needed. The action is idempotent and will not overwrite 'already in' files with fresh timestamps or anything.
If you need to do a 'move' migrate, then please follow this command with a delete from wherever you need to remove from.
If you try to add non-local files (specifically, files that are not in 'all my files'), or migrate to a file domain that is not a local file domain, then this will 400!
"},{"location":"developer_api.html#add_files_archive_files","title":"POST/add_files/archive_files
","text":"Tell the client to archive inboxed files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This puts files in the 'archive', taking them out of the inbox. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the archive.
"},{"location":"developer_api.html#add_files_unarchive_files","title":"POST/add_files/unarchive_files
","text":"Tell the client re-inbox archived files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This puts files back in the inbox, taking them out of the archive. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the inbox.
"},{"location":"developer_api.html#add_files_generate_hashes","title":"POST/add_files/generate_hashes
","text":"Generate hashes for an arbitrary file.
Restricted access: YES. Import Files permission needed. Required Headers:- Content-Type:
application/json
(if sending path),application/octet-stream
(if sending file)
path
: (the path you want to import)
Arguments (as bytes): You can alternately just send the file's bytes as the POST body. Response:{\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
Some JSON with the hashes of the file Example response
{\n \"hash\": \"7de421a3f9be871a7037cca8286b149a31aecb6719268a94188d76c389fa140c\",\n \"perceptual_hashes\": [\n \"b44dc7b24dcb381c\"\n ],\n \"pixel_hash\": \"c7bf20e5c4b8a524c2c3e3af2737e26975d09cba2b3b8b76341c4c69b196da4e\",\n}\n
hash
is the sha256 hash of the submitted file.perceptual_hashes
is a list of perceptual hashes for the file.pixel_hash
is the sha256 hash of the pixel data of the rendered image.
"},{"location":"developer_api.html#importing_and_editing_urls","title":"Importing and Editing URLs","text":""},{"location":"developer_api.html#add_urls_get_url_files","title":"GEThash
will always be returned for any file, the others will only be returned for filetypes they can be generated for./add_urls/get_url_files
","text":"Ask the client about an URL's files.
Restricted access: YES. Import URLs permission needed.Required Headers: n/a
Arguments:url
: (the url you want to ask about)doublecheck_file_system
: true or false (optional, defaults False)
http://safebooru.org/index.php?page=post&s=view&id=2753608
:
Response: Some JSON which files are known to be mapped to that URL. Note this needs a database hit, so it may be delayed if the client is otherwise busy. Don't rely on this to always be fast. Example response/add_urls/get_url_files?url=http%3A%2F%2Fsafebooru.org%2Findex.php%3Fpage%3Dpost%26s%3Dview%26id%3D2753608\n
{\n \"normalised_url\" : \"https://safebooru.org/index.php?id=2753608&page=post&s=view\",\n \"url_file_statuses\" : [\n {\n \"status\" : 2,\n \"hash\" : \"20e9002824e5e7ffc240b91b6e4a6af552b3143993c1778fd523c30d9fdde02c\",\n \"note\" : \"url recognised: Imported at 2015/10/18 10:58:01, which was 3 years 4 months ago (before this check).\"\n }\n ]\n}\n
The
url_file_statuses
is a list of zero-to-n JSON Objects, each representing a file match the client found in its database for the URL. Typically, it will be of length 0 (for as-yet-unvisited URLs or Gallery/Watchable URLs that are not attached to files) or 1, but sometimes multiple files are given the same URL (sometimes by mistaken misattribution, sometimes by design, such as pixiv manga pages). Handling n files per URL is a pain but an unavoidable issue you should account for.status
has the same mapping as for/add_files/add_file
, but the possible results are different:- 0 - File not in database, ready for import (you will only see this very rarely--usually in this case you will just get no matches)
- 2 - File already in database
- 3 - File previously deleted
hash
is the file's SHA256 hash in hexadecimal, and 'note' is some occasional additional human-readable text you may recognise from hydrus's normal import workflow.If you set
"},{"location":"developer_api.html#add_urls_get_url_info","title":"GETdoublecheck_file_system
totrue
, then any result that is 'already in db' (2) will be double-checked against the actual file system. This check happens on any normal file import process, just to check for and fix missing files (if the file is missing, the status becomes 0--new), but the check can take more than a few milliseconds on an HDD or a network drive, so the default behaviour, assuming you mostly just want to spam for 'seen this before' file statuses, is to not do it./add_urls/get_url_info
","text":"Ask the client for information about a URL.
Restricted access: YES. Import URLs permission needed.Required Headers: n/a
Arguments:url
: (the url you want to ask about)
https://boards.4chan.org/tv/thread/197641945/itt-moments-in-film-or-tv-that-aged-poorly
:
Response:/add_urls/get_url_info?url=https%3A%2F%2Fboards.4chan.org%2Ftv%2Fthread%2F197641945%2Fitt-moments-in-film-or-tv-that-aged-poorly\n
Some JSON describing what the client thinks of the URL. Example response
{\n \"request_url\" : \"https://a.4cdn.org/tv/thread/197641945.json\",\n \"normalised_url\" : \"https://boards.4chan.org/tv/thread/197641945\",\n \"url_type\" : 4,\n \"url_type_string\" : \"watchable url\",\n \"match_name\" : \"8chan thread\",\n \"can_parse\" : true\n}\n
The url types are currently:
- 0 - Post URL
- 2 - File URL
- 3 - Gallery URL
- 4 - Watchable URL
- 5 - Unknown URL (i.e. no matching URL Class)
'Unknown' URLs are treated in the client as direct File URLs. Even though the 'File URL' type is available, most file urls do not have a URL Class, so they will appear as Unknown. Adding them to the client will pass them to the URL Downloader as a raw file for download and import.
The
normalised_url
is the fully normalised URL--what is used for comparison and saving to disk.The
"},{"location":"developer_api.html#add_urls_add_url","title":"POSTrequest_url
is either the lighter 'for server' normalised URL, which may include ephemeral token parameters, or, as in the case here, the fully converted API/redirect URL. (When hydrus is asked to check a 4chan thread, it doesn't hit the HTML, but the JSON API.)/add_urls/add_url
","text":"Tell the client to 'import' a URL. This triggers the exact same routine as drag-and-dropping a text URL onto the main client window.
Restricted access: YES. Import URLs permission needed. Add Tags needed to include tags. Required Headers:Content-Type
:application/json
url
: (the url you want to add)destination_page_key
: (optional page identifier for the page to receive the url)destination_page_name
: (optional page name to receive the url)- file domain (optional, sets where to import the file)
show_destination_page
: (optional, defaulting to false, controls whether the UI will change pages on add)service_keys_to_additional_tags
: (optional, selective, tags to give to any files imported from this url)filterable_tags
: (optional tags to be filtered by any tag import options that applies to the URL)
If you specify a
destination_page_name
and an appropriate importer page already exists with that name, that page will be used. Otherwise, a new page with that name will be created (and used by subsequent calls with that name). Make sure that page name is unique (e.g. '/b/ threads', not 'watcher') in your client, or it may not be found.Alternately,
destination_page_key
defines exactly which page should be used. Bear in mind this page key is only valid to the current session (they are regenerated on client reset or session reload), so you must figure out which one you want using the /manage_pages/get_pages call. If the correct page_key is not found, or the page it corresponds to is of the incorrect type, the standard page selection/creation rules will apply.You can set a destination file domain, which will select (or, for probably most of your initial requests, create) a download page that has a non-default 'file import options' with the given destination. If you set both a file domain and also a
destination_page_key
, then the page key takes precedence. If you do not set a file domain, then the import uses whatever the page has, like normal; for url import pages, this is probably your \"loud\" file import options default.show_destination_page
defaults to False to reduce flicker when adding many URLs to different pages quickly. If you turn it on, the client will behave like a URL drag and drop and select the final page the URL ends up on.service_keys_to_additional_tags
uses the same data structure as in /add_tags/add_tags--service keys to a list of tags to add. You will need 'add tags' permission or this will 403. These tags work exactly as 'additional' tags work in a tag import options. They are service specific, and always added unless some advanced tag import options checkbox (like 'only add tags to new files') is set.filterable_tags works like the tags parsed by a hydrus downloader. It is just a list of strings. They have no inherant service and will be sent to a tag import options, if one exists, to decide which tag services get what. This parameter is useful if you are pulling all a URL's tags outside of hydrus and want to have them processed like any other downloader, rather than figuring out service names and namespace filtering on your end. Note that in order for a tag import options to kick in, I think you will have to have a Post URL URL Class hydrus-side set up for the URL so some tag import options (whether that is Class-specific or just the default) can be loaded at import time.
Example request body
Example request body{\n \"url\" : \"https://8ch.net/tv/res/1846574.html\",\n \"destination_page_name\" : \"kino zone\",\n \"service_keys_to_additional_tags\" : {\n \"6c6f63616c2074616773\" : [\"as seen on /tv/\"]\n }\n}\n
Example request body (with filterable_tags){\n \"url\" : \"https://safebooru.org/index.php?page=post&s=view&id=3195917\",\n \"filterable_tags\" : [\n \"1girl\",\n \"artist name\",\n \"creator:azto dio\",\n \"blonde hair\",\n \"blue eyes\",\n \"breasts\",\n \"character name\",\n \"commentary\",\n \"english commentary\",\n \"formal\",\n \"full body\",\n \"glasses\",\n \"gloves\",\n \"hair between eyes\",\n \"high heels\",\n \"highres\",\n \"large breasts\",\n \"long hair\",\n \"long sleeves\",\n \"looking at viewer\",\n \"series:metroid\",\n \"mole\",\n \"mole under mouth\",\n \"patreon username\",\n \"ponytail\",\n \"character:samus aran\",\n \"solo\",\n \"standing\",\n \"suit\",\n \"watermark\"\n ]\n}\n
Response: Some JSON with info on the URL added. Example response{\n \"human_result_text\" : \"\\\"https://8ch.net/tv/res/1846574.html\\\" URL added successfully.\",\n \"normalised_url\" : \"https://8ch.net/tv/res/1846574.html\"\n}\n
"},{"location":"developer_api.html#add_urls_associate_url","title":"POST
/add_urls/associate_url
","text":"Manage which URLs the client considers to be associated with which files.
Restricted access: YES. Import URLs permission needed. Required Headers:Content-Type
:application/json
- files
url_to_add
: (optional, selective A, an url you want to associate with the file(s))urls_to_add
: (optional, selective A, a list of urls you want to associate with the file(s))url_to_delete
: (optional, selective B, an url you want to disassociate from the file(s))urls_to_delete
: (optional, selective B, a list of urls you want to disassociate from the file(s))normalise_urls
: (optional, default true, only affects the 'add' urls)
The single/multiple arguments work the same--just use whatever is convenient for you.
Unless you really know what you are doing, I strongly recommend you stick to associating URLs with just one single 'hash' at a time. Multiple hashes pointing to the same URL is unusual and frequently unhelpful.
By default, anything you throw at the 'add' side will be normalised nicely, but if you need to add some specific/weird URL text, or you need to add a URI, set
Example request bodynormalise_urls
tofalse
. Anything you throw at the 'delete' side will not be normalised, so double-check you are deleting exactly what you mean to via GET /get_files/file_metadata etc..
Response: 200 with no content. Like when adding tags, this is safely idempotent--do not worry about re-adding URLs associations that already exist or accidentally trying to delete ones that don't."},{"location":"developer_api.html#editing_file_tags","title":"Editing File Tags","text":""},{"location":"developer_api.html#add_tags_clean_tags","title":"GET{\n \"url_to_add\" : \"https://rule34.xxx/index.php?id=2588418&page=post&s=view\",\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
/add_tags/clean_tags
","text":"Ask the client about how it will see certain tags.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want cleaned)
[ \" bikini \", \"blue eyes\", \" character : samus aran \", \" :)\", \" \", \"\", \"10\", \"11\", \"9\", \"system:wew\", \"-flower\" ]
:
Response:/add_tags/clean_tags?tags=%5B%22%20bikini%20%22%2C%20%22blue%20%20%20%20eyes%22%2C%20%22%20character%20%3A%20samus%20aran%20%22%2C%20%22%3A%29%22%2C%20%22%20%20%20%22%2C%20%22%22%2C%20%2210%22%2C%20%2211%22%2C%20%229%22%2C%20%22system%3Awew%22%2C%20%22-flower%22%5D\n
The tags cleaned according to hydrus rules. They will also be in hydrus human-friendly sorting order. Example response
{\n \"tags\" : [\"9\", \"10\", \"11\", \" ::)\", \"bikini\", \"blue eyes\", \"character:samus aran\", \"flower\", \"wew\"]\n}\n
Mostly, hydrus simply trims excess whitespace, but the other examples are rare issues you might run into. 'system' is an invalid namespace, tags cannot be prefixed with hyphens, and any tag starting with ':' is secretly dealt with internally as \"[no namespace]:[colon-prefixed-subtag]\". Again, you probably won't run into these, but if you see a mismatch somewhere and want to figure it out, or just want to sort some numbered tags, you might like to try this.
"},{"location":"developer_api.html#add_tags_get_siblings_and_parents","title":"GET/add_tags/get_siblings_and_parents
","text":"Ask the client about tags' sibling and parent relationships.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want info on)
[ \"blue eyes\", \"samus aran\" ]
:
Response:/add_tags/get_siblings_and_parents?tags=%5B%22blue%20eyes%22%2C%20%22samus%20aran%22%5D\n
An Object showing all the display relationships for each tag on each service. Also The Services Object. Example response
{\n \"services\" : \"The Services Object\"\n \"tags\" : {\n \"blue eyes\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\",\n \"blue_eyes\",\n \"blue eye\",\n \"blue_eye\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n }\n },\n \"samus aran\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"character:samus aran\",\n \"siblings\" : [\n \"samus aran\",\n \"samus_aran\",\n \"character:samus aran\"\n ],\n \"descendants\" : [\n \"character:samus aran (zero suit)\"\n \"cosplay:samus aran\"\n ],\n \"ancestors\" : [\n \"series:metroid\",\n \"studio:nintendo\"\n ]\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"samus aran\",\n \"siblings\" : [\n \"samus aran\"\n ],\n \"descendants\" : [\n \"zero suit samus\",\n \"samus_aran_(cosplay)\"\n ],\n \"ancestors\" : []\n }\n }\n }\n}\n
This data is essentially how mappings in the
storage
tag_display_type
becomedisplay
.The hex keys are the service keys, which you will have seen elsewhere, like GET /get_files/file_metadata. Note that there is no concept of 'all known tags' here. If a tag is in 'my tags', it follows the rules of 'my tags', and then all the services' display tags are merged into the 'all known tags' pool for user display.
Also, the siblings and parents here are not just what is in tags->manage tag siblings/parents, they are the final computed combination of rules as set in tags->manage where tag siblings and parents apply. The data given here is not guaranteed to be useful for editing siblings and parents on a particular service. That data, which is currently pair-based, will appear in a different API request in future.
ideal_tag
is how the tag appears in normal display to the user.siblings
is every tag that will show as theideal_tag
, including theideal_tag
itself.descendants
is every child (and recursive grandchild, great-grandchild...) that implies theideal_tag
.ancestors
is every parent (and recursive grandparent, great-grandparent...) that our tag implies.
Every descendant and ancestor is an
ideal_tag
itself that may have its own siblings.Most situations are simple, but remember that siblings and parents in hydrus can get complex. If you want to display this data, I recommend you plan to support simple service-specific workflows, and add hooks to recognise conflicts and other difficulty and, when that happens, abandon ship (send the user back to Hydrus proper). Also, if you show summaries of the data anywhere, make sure you add a 'and 22 more...' overflow mechanism to your menus, since if you hit up 'azur lane' or 'pokemon', you are going to get hundreds of children.
I generally warn you off computing sibling and parent mappings or counts yourself. The data from this request is best used for sibling and parent decorators on individual tags in a 'manage tags' presentation. The code that actually computes what siblings and parents look like in the 'display' context can be a pain at times, and I've already done it. Just run /search_tags or /file_metadata again after any changes you make and you'll get updated values.
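A short sketch of the 'decorator' use described above, reading the structure from the example response (the tag and service key are whatever you are currently editing):

```python
# 'tags_object' is the parsed 'tags' value from GET /add_tags/get_siblings_and_parents
def describe_tag( tags_object, tag, service_key ):
    
    info = tags_object[ tag ][ service_key ]
    
    ideal_tag = info[ 'ideal_tag' ]
    
    if ideal_tag != tag:
        
        print( f'"{tag}" will display as "{ideal_tag}"' )
        
    
    for ancestor in info[ 'ancestors' ]:
        
        print( f'"{tag}" also implies "{ancestor}"' )
```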
"},{"location":"developer_api.html#add_tags_search_tags","title":"GET/add_tags/search_tags
","text":"Search the client for tags.
Restricted access: YES. Search for Files and Add Tags permission needed.Required Headers: n/a
Arguments:search
: (the tag text to search for, enter exactly what you would in the client UI)- file domain (optional, defaults to all my files)
tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to all known tags)tag_display_type
: (optional, string, to select whether to search raw or sibling-processed tags, defaults tostorage
)
The
file domain
andtag_service_key
perform the function of the file and tag domain buttons in the client UI.The
tag_display_type
can be eitherstorage
(the default), which searches your file's stored tags, just as they appear in a 'manage tags' dialog, ordisplay
, which searches the sibling-processed tags, just as they appear in a normal file search page. In the example above, setting thetag_display_type
todisplay
could well combine the two kim possible tags and give a count of 3 or 4.'all my files'/'all known tags' works fine for most cases, but a specific tag service or 'all known files'/'tag service' can work better for editing tag repository
Example request: Example requeststorage
contexts, since it provides results just for that service, and for repositories, it gives tags for all the non-local files other users have tagged.
Response: Some JSON listing the client's matching tags. Example response/add_tags/search_tags?search=kim&tag_display_type=display\n
{\n \"tags\" : [\n {\n \"value\" : \"series:kim possible\", \n \"count\" : 3\n },\n {\n \"value\" : \"kimchee\", \n \"count\" : 2\n },\n {\n \"value\" : \"character:kimberly ann possible\", \n \"count\" : 1\n }\n ]\n}\n
The
tags
list will be sorted by descending count. The various rules in tags->manage tag display and search (e.g. no pure*
searches on certain services) will also be checked--and if violated, you will get 200 OK but an empty result.Note that if your client api access is only allowed to search certain tags, the results will be similarly filtered.
"},{"location":"developer_api.html#add_tags_add_tags","title":"POST/add_tags/add_tags
","text":"Make changes to the tags that files have.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in JSON):- files
service_keys_to_tags
: (selective B, an Object of service keys to lists of tags to be 'added' to the files)service_keys_to_actions_to_tags
: (selective B, an Object of service keys to content update actions to lists of tags)override_previously_deleted_mappings
: (optional, defaulttrue
)create_new_deleted_mappings
: (optional, defaulttrue
)
In 'service_keys_to...', the keys are as in /get_services. You may need some selection UI on your end so the user can pick what to do if there are multiple choices.
Also, you can use either '...to_tags', which is simple and add-only, or '...to_actions_to_tags', which is more complicated and allows you to remove/petition or rescind pending content.
The permitted 'actions' are:
- 0 - Add to a local tag service.
- 1 - Delete from a local tag service.
- 2 - Pend to a tag repository.
- 3 - Rescind a pend from a tag repository.
- 4 - Petition from a tag repository. (This is special)
- 5 - Rescind a petition from a tag repository.
Read about Current Deleted Pending Petitioned for more info on these states.
When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of \"Petitioned from API\" will be given. If you want to set your own reason, you can instead give a list of [ tag, reason ] pairs.
Some example requests: Adding some tags to a file
Adding more tags to two files{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"character:supergirl\", \"rating:safe\"]\n }\n}\n
A complicated transaction with all possible actions{\n \"hashes\" : [\n \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"f2b022214e711e9a11e2fcec71bfd524f10f0be40c250737a7861a5ddd3faebf\"\n ],\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"process this\"],\n \"ccb0cf2f9e92c2eb5bd40986f72a339ef9497014a5fb8ce4cea6d6c9837877d9\" : [\"creator:dandon fuga\"]\n }\n}\n
{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_actions_to_tags\" : {\n \"6c6f63616c2074616773\" : {\n \"0\" : [\"character:supergirl\", \"rating:safe\"],\n \"1\" : [\"character:superman\"]\n },\n \"aa0424b501237041dab0308c02c35454d377eebd74cfbc5b9d7b3e16cc2193e9\" : {\n \"2\" : [\"character:supergirl\", \"rating:safe\"],\n \"3\" : [\"filename:image.jpg\"],\n \"4\" : [[\"creator:danban faga\", \"typo\"], [\"character:super_girl\", \"underscore\"]],\n \"5\" : [\"skirt\"]\n }\n }\n}\n
This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.
Note that the enumerated status keys in the service_keys_to_actions_to_tags structure are strings, not ints (JSON does not support int keys for Objects).
The `override_previously_deleted_mappings` parameter adjusts your Add/Pend actions. In the client, if a human uses the manage tags dialog to add a tag mapping that has previously been deleted, that deleted record is overwritten. An automatic system like a gallery parser will instead filter/skip any Add/Pend actions in this case (so that repeat downloads do not overwrite a human user's delete, etc.). By default, the Client API acts like a human and overwrites previously deleted mappings. If you want to add a lot of new mappings but do not want to overwrite previous deletion decisions, acting like a downloader, then set this to `false`.

The `create_new_deleted_mappings` parameter adjusts your Delete/Petition actions, specifically whether a delete record should be made even if the tag does not currently exist on the file. There are not many ways to spontaneously create a delete record in the normal hydrus UI, but you, as the Client API, should think about whether this is what you want. By default, the Client API will write a delete record whether the tag already exists for the file or not. If you only want to create a delete record (which prohibits the tag being added back again by something like a downloader, as with `override_previously_deleted_mappings`) when the tag already exists on the file, then set this to `false`. Are you saying 'migrate all these deleted tag records from A to B so that none of them are re-added'? Then you want this `true`. Are you saying 'this tag was applied incorrectly to some but perhaps not all of these files; where it exists, delete it'? Then set it `false`.

There is currently no way to delete a tag mapping without leaving a delete record (as you can with files). This will probably happen eventually, and it will be a new parameter here.
Response: 200 and no content.

Note:
Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended, or add a tag that already exists, and not worry about an error--the surplus add action will be discarded. The same goes for pending a tag that is already current, or rescinding a petition that does not exist. Any invalid actions will fail silently.
It is fine to just throw your 'process this' tags at every file import and not have to worry about checking which files you already added them to.
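As a concrete illustration, here is a minimal sketch of the simple `service_keys_to_tags` form, assuming Python with the `requests` library, the default Client API address, and a placeholder access key header as covered earlier in this document:

```python
import requests

API = "http://127.0.0.1:45869"  # assumed default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

body = {
    "hash": "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
    "service_keys_to_tags": {
        # '6c6f63616c2074616773' is the example 'local tags' service key from above
        "6c6f63616c2074616773": ["process this", "character:supergirl"],
    },
}

# json= sets the required application/json Content-Type.
# This is idempotent--re-sending tags the file already has is harmless.
r = requests.post(f"{API}/add_tags/add_tags", headers=HEADERS, json=body)
r.raise_for_status()
```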
"},{"location":"developer_api.html#editing_file_ratings","title":"Editing File Ratings","text":""},{"location":"developer_api.html#edit_ratings_set_rating","title":"POST/edit_ratings/set_rating
","text":"Add or remove ratings associated with a file.
Restricted access: YES. Edit Ratings permission needed. Required Headers:Content-Type
:application/json
- files
rating_service_key
: (hexadecimal, the rating service you want to edit)rating
: (mixed datatype, the rating value you want to set)
{\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\",\n \"rating_service_key\" : \"282303611ba853659aa60aeaa5b6312d40e05b58822c52c57ae5e320882ba26e\",\n \"rating\" : 2\n}\n
This is fairly simple, but there are some caveats around the different rating service types and the actual data you are setting here. It is the same as you'll see in GET /get_files/file_metadata.
"},{"location":"developer_api.html#likedislike_ratings","title":"Like/Dislike Ratings","text":"Send
"},{"location":"developer_api.html#numerical_ratings","title":"Numerical Ratings","text":"true
for 'like',false
for 'dislike', ornull
for 'unset'.Send an
"},{"location":"developer_api.html#incdec_ratings","title":"Inc/Dec Ratings","text":"int
for the number of stars to set, ornull
for 'unset'.Send an
int
for the number to set. 0 is your minimum.As with GET /get_files/file_metadata, check The Services Object for the min/max stars on a numerical rating service.
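A hedged sketch of a set_rating call, again assuming Python `requests`, the default address, and placeholder keys; the `rating` value follows the per-service-type rules above:

```python
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

def set_rating(file_hash: str, rating_service_key: str, rating) -> None:
    """rating: True/False/None for like/dislike services, int or None for
    numerical services, plain int for inc/dec services."""
    body = {
        "hash": file_hash,
        "rating_service_key": rating_service_key,
        "rating": rating,
    }
    r = requests.post(f"{API}/edit_ratings/set_rating", headers=HEADERS, json=body)
    r.raise_for_status()

# e.g. set_rating(file_hash, numerical_service_key, 2) for two stars
```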
Response: 200 and no content.

## Editing File Times

### POST `/edit_times/set_time`

Add or remove timestamps associated with a file.

Restricted access: YES. Edit Times permission needed.

Required Headers:

- `Content-Type`: `application/json`

Arguments (in JSON):

- files
- `timestamp`: (selective, float or int of the time in seconds, or `null` for deleting web domain times)
- `timestamp_ms`: (selective, int of the time in milliseconds, or `null` for deleting web domain times)
- `timestamp_type`: (int, the type of timestamp you are editing)
- `file_service_key`: (dependant, hexadecimal, the file service you are editing in 'imported'/'deleted'/'previously imported')
- `canvas_type`: (dependant, int, the canvas type you are editing in 'last viewed')
- `domain`: (dependant, string, the domain you are editing in 'modified (web domain)')
Example request body:

```json
{
  "timestamp" : 1641044491,
  "timestamp_type" : 5
}
```

Example request body, more complicated:

```json
{
  "timestamp" : 1641044491.458,
  "timestamp_type" : 6,
  "canvas_type" : 1
}
```

Example request body, deleting:

```json
{
  "timestamp_ms" : null,
  "timestamp_type" : 0,
  "domain" : "blahbooru.org"
}
```
This is a copy of the manage times dialog in the program, so if you are uncertain about something, check that out. The client records timestamps up to millisecond accuracy.
You have to select some files, obviously. I'd imagine most uses will be over one file at a time, but you can spam 100 or 10,000 if you need to.
Then choose whether you want to work with `timestamp` or `timestamp_ms`. `timestamp` can be an integer or a float, and in the latter case the API will take the three most significant decimal digits as the millisecond data. `timestamp_ms` is an integer of milliseconds, simply the `timestamp` value multiplied by 1,000. It doesn't matter which you use--whichever is easiest for you.

If you send a `null` timestamp time, this instructs the client to delete the existing value, if possible and reasonable.

`timestamp_type` is an enum as follows:

- 0 - File modified time (web domain)
- 1 - File modified time (on the hard drive)
- 3 - File import time
- 4 - File delete time
- 5 - Archived time
- 6 - Last viewed (in the media viewer)
- 7 - File originally imported time
Adding or Deleting
You can add or delete type 0 (web domain) timestamps, but you can only edit existing instances of all the others. This is broadly how the manage times dialog works, also. Stuff like 'last viewed' is tied up with other numbers like viewtime and num_views, so if that isn't already in the database, then we can't just add the timestamp on its own. Same with 'deleted time' for a file that isn't deleted! So, in general, other than web domain stuff, you can only edit times you already see in /get_files/file_metadata.
If you select 0, you have to include a `domain`, which will usually be a web domain, but you can put anything in there.

If you select 1, the client will not alter the modified time on your hard disk, only the database record. This is unlike the dialog. Let's let this system breathe a bit before we try to get too clever.

If you select 3, 4, or 7, you have to include a `file_service_key`. The 'previously imported' time is for deleted files only; it records when the file was originally imported, so if the user hits 'undo', the database knows what import time to give back to it.

If you select 6, you have to include a `canvas_type`, which is:

- 0 - Media viewer
- 1 - Preview viewer
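For illustration, a small sketch that adds a web-domain modified time (type 0, the only type that can be freely added), assuming Python `requests` and placeholder values:

```python
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

body = {
    "hash": "3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba",
    "timestamp": 1604055647,   # seconds; use timestamp_ms instead for millisecond precision
    "timestamp_type": 0,       # modified time (web domain)
    "domain": "example.com",   # hypothetical domain
}

requests.post(f"{API}/edit_times/set_time", headers=HEADERS, json=body).raise_for_status()
```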
## Editing File Notes

### POST `/add_notes/set_notes`

Add or update notes associated with a file.

Restricted access: YES. Add Notes permission needed.

Required Headers:

- `Content-Type`: `application/json`

Arguments (in JSON):

- `notes`: (an Object mapping string names to string texts)
- `hash`: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)
- `file_id`: (selective, the integer numerical identifier for the file)
- `merge_cleverly`: true or false (optional, defaults false)
- `extend_existing_note_if_possible`: true or false (optional, defaults true)
- `conflict_resolution`: 0, 1, 2, or 3 (optional, defaults 3)

With `merge_cleverly` left `false`, this is a simple update operation. Existing notes will be overwritten exactly as you specify, and any other notes the file has will be untouched.

Example request body:

```json
{
  "notes" : {
    "note name" : "content of note",
    "another note" : "asdf"
  },
  "hash" : "3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba"
}
```
If you turn on `merge_cleverly`, the client will merge your new notes into the file's existing notes using the same logic you have seen in Note Import Options and the Duplicate Metadata Merge Options. This navigates conflict resolution, and you should use it if you are adding potential duplicate content from an 'automatic' source like a parser and do not want to wade into the logic. Do not use it for a user-editing experience (a user expects a strict overwrite/replace experience and will be confused by this mode).

To start off, in this mode, if your note text already exists under a different name for the file, your dupe note will not be added under your new name.

If a new note name already exists and its new text differs from what already exists:

`extend_existing_note_if_possible` makes it so your new note text will overwrite the existing note (or a '... (1)' rename of that name) if the existing text is inside your given text.

`conflict_resolution` is an enum governing what to do in all other conflicts (there is a request sketch after this list):

- 0 - replace - Overwrite the existing conflicting note.
- 1 - ignore - Make no changes.
- 2 - append - Append the new text to the existing text.
- 3 - rename (default) - Add the new text under a 'name (x)'-style rename.
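Here is a sketch of a 'clever' note merge from an automatic source, assuming Python `requests` and placeholder values; the note name and text are hypothetical:

```python
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

body = {
    "hash": "3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba",
    "notes": {"source commentary": "text parsed from a hypothetical booru post"},
    "merge_cleverly": True,                   # merge like Note Import Options
    "extend_existing_note_if_possible": True,
    "conflict_resolution": 3,                 # rename on conflict (the default)
}

r = requests.post(f"{API}/add_notes/set_notes", headers=HEADERS, json=body)
r.raise_for_status()
print(r.json()["notes"])  # what was actually set, which may differ from the input
```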
Response: 200 and the notes that were actually set. If `merge_cleverly=false`, this is exactly what you gave, and this operation is idempotent. If `merge_cleverly=true`, then this may differ, even be empty, and this operation might not be idempotent.

Example response:

```json
{
  "notes" : {
    "note name" : "content of note",
    "another note (1)" : "asdf"
  }
}
```

### POST `/add_notes/delete_notes`

Remove notes associated with a file.
Restricted access: YES. Add Notes permission needed.

Required Headers:

- `Content-Type`: `application/json`

Arguments (in JSON):

- `note_names`: (a list of string note names to delete)
- `hash`: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)
- `file_id`: (selective, the integer numerical identifier for the file)

Example request body:

```json
{
  "note_names" : ["note name", "another note"],
  "hash" : "3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba"
}
```

Response: 200 with no content. This operation is idempotent.

## Searching and Fetching Files
File search in hydrus is not paginated like a booru--all searches return all results in one go. In order to keep this fast, search is split into two steps--fetching file identifiers with a search, and then fetching file metadata in batches. You may have noticed that the client itself performs searches like this--thinking a bit about a search and then bundling results in batches of 256 files before eventually throwing all the thumbnails on screen.
"},{"location":"developer_api.html#get_files_search_files","title":"GET/get_files/search_files
","text":"Search for the client's files.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of tags you wish to search for)- file domain (optional, defaults to all my files)
tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to all my files)include_current_tags
: (optional, bool, whether to search 'current' tags, defaults totrue
)include_pending_tags
: (optional, bool, whether to search 'pending' tags, defaults totrue
)file_sort_type
: (optional, integer, the results sort method, defaults to2
forimport time
)file_sort_asc
: true or false (optional, defaulttrue
, the results sort order)return_file_ids
: true or false (optional, defaulttrue
, returns file id results)return_hashes
: true or false (optional, defaultfalse
, returns hex hash results)
/get_files/search_files?tags=%5B%22blue%20eyes%22%2C%20%22blonde%20hair%22%2C%20%22%5Cu043a%5Cu0438%5Cu043d%5Cu043e%22%2C%20%22system%3Ainbox%22%2C%20%22system%3Alimit%3D16%22%5D\n
If the access key's permissions only permit search for certain tags, at least one positive whitelisted/non-blacklisted tag must be in the "tags" list or this will 403. Tags can be prepended with a hyphen to make a negated tag (e.g. "-green eyes"), but these will not be checked against the permissions whitelist.
Wildcards and namespace searches are supported, so if you search for 'character:sam*' or 'series:*', this will be handled correctly clientside.
Many system predicates are also supported using a text parser! The parser was designed by a clever user for human input and allows for a certain amount of error (e.g. ~= instead of ≈, or "isn't" instead of "is not") or requires more information (e.g. the specific hashes for a hash lookup). Here's a big list of examples that are supported:

System Predicates:

- system:everything
- system:inbox
- system:archive
- system:has duration
- system:no duration
- system:is the best quality file of its duplicate group
- system:is not the best quality file of its duplicate group
- system:has audio
- system:no audio
- system:has exif
- system:no exif
- system:has human-readable embedded metadata
- system:no human-readable embedded metadata
- system:has icc profile
- system:no icc profile
- system:has tags
- system:no tags
- system:untagged
- system:number of tags > 5
- system:number of tags ~= 10
- system:number of tags > 0
- system:number of words < 2
- system:height = 600
- system:height > 900
- system:width < 200
- system:width > 1000
- system:filesize ~= 50 kilobytes
- system:filesize > 10megabytes
- system:filesize < 1 GB
- system:filesize > 0 B
- system:similar to abcdef01 abcdef02 abcdef03, abcdef04 with distance 3
- system:similar to abcdef distance 5
- system:limit = 100
- system:filetype = image/jpg, image/png, apng
- system:hash = abcdef01 abcdef02 abcdef03 (this does sha256)
- system:hash = abcdef01 abcdef02 md5
- system:modified date < 7 years 45 days 7h
- system:modified date > 2011-06-04
- system:last viewed time < 7 years 45 days 7h
- system:last view time < 7 years 45 days 7h
- system:date modified > 7 years 2 months
- system:date modified < 0 years 1 month 1 day 1 hour
- system:import time < 7 years 45 days 7h
- system:time imported < 7 years 45 days 7h
- system:time imported > 2011-06-04
- system:time imported > 7 years 2 months
- system:time imported < 0 years 1 month 1 day 1 hour
- system:time imported ~= 2011-1-3
- system:time imported ~= 1996-05-2
- system:duration < 5 seconds
- system:duration ~= 600 msecs
- system:duration > 3 milliseconds
- system:file service is pending to my files
- system:file service currently in my files
- system:file service is not currently in my files
- system:file service is not pending to my files
- system:number of file relationships = 2 duplicates
- system:number of file relationships > 10 potential duplicates
- system:num file relationships < 3 alternates
- system:num file relationships > 3 false positives
- system:ratio is wider than 16:9
- system:ratio is 16:9
- system:ratio taller than 1:1
- system:num pixels > 50 px
- system:num pixels < 1 megapixels
- system:num pixels ~= 5 kilopixel
- system:media views ~= 10
- system:all views > 0
- system:preview views < 10
- system:media viewtime < 1 days 1 hour 0 minutes
- system:all viewtime > 1 hours 100 seconds
- system:preview viewtime ~= 1 day 30 hours 100 minutes 90s
- system:has url matching regex index\\.php
- system:does not have a url matching regex index\\.php
- system:has url https://safebooru.donmai.us/posts/4695284
- system:does not have url https://safebooru.donmai.us/posts/4695284
- system:has domain safebooru.com
- system:does not have domain safebooru.com
- system:has a url with class safebooru file page
- system:does not have a url with url class safebooru file page
- system:tag as number page < 5
- system:has notes
- system:no notes
- system:does not have notes
- system:num notes is 5
- system:num notes > 1
- system:has note with name note name
- system:no note with name note name
- system:does not have note with name note name
- system:has a rating for `service_name`
- system:does not have a rating for `service_name`
- system:rating for `service_name` > ⅗ (numerical services)
- system:rating for `service_name` is like (like/dislike services)
- system:rating for `service_name` = 13 (inc/dec services)
Please test out the system predicates you want to send. If you are in help->advanced mode, you can test this parser in the advanced text input dialog when you click the OR* button on a tag autocomplete dropdown. More system predicate types and input formats will be available in future. Reverse engineering system predicate data from text is obviously tricky. If a system predicate does not parse, you'll get 400.
Also, OR predicates are now supported! Just nest within the tag list, and it'll be treated like an OR. For instance (see the encoding sketch after the rendered list below):

```
[ "skirt", [ "samus aran", "lara croft" ], "system:height > 1000" ]
```
Makes:
- skirt
- samus aran OR lara croft
- system:height > 1000
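If you are unsure how to get that nested list into the query string, here is a sketch using Python `requests`, which percent-encodes the JSON-dumped `tags` parameter for you (default address and placeholder access key assumed):

```python
import json
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

# a nested list is treated as an OR clause
tags = ["skirt", ["samus aran", "lara croft"], "system:height > 1000"]

r = requests.get(
    f"{API}/get_files/search_files",
    headers=HEADERS,
    params={"tags": json.dumps(tags)},  # requests percent-encodes the JSON for us
)
r.raise_for_status()
print(r.json()["file_ids"])
```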
The file and tag services are for search domain selection, just like clicking the buttons in the client. They are optional--default is 'all my files' and 'all known tags'.
`include_current_tags` and `include_pending_tags` do the same as the buttons on the normal search interface. They alter the search of normal tags and tag-related system predicates like 'system:number of tags', including or excluding that type of tag from whatever the search is doing. If you set both of these to `false`, you'll often get no results.

File searches occur in the `display` `tag_display_type`. If you want to pair autocomplete tag lookup from /search_tags with this file search (e.g. for making a standard booru search interface), then make sure you are searching `display` tags there.

`file_sort_asc` is 'true' for ascending, and 'false' for descending. The default is descending.

`file_sort_type` is by default import time. It is an integer according to the following enum, and I have written the semantic (asc/desc) meaning for each type after:
- 0 - file size (smallest first/largest first)
- 1 - duration (shortest first/longest first)
- 2 - import time (oldest first/newest first)
- 3 - filetype (N/A)
- 4 - random (N/A)
- 5 - width (slimmest first/widest first)
- 6 - height (shortest first/tallest first)
- 7 - ratio (tallest first/widest first)
- 8 - number of pixels (ascending/descending)
- 9 - number of tags (on the current tag domain) (ascending/descending)
- 10 - number of media views (ascending/descending)
- 11 - total media viewtime (ascending/descending)
- 12 - approximate bitrate (smallest first/largest first)
- 13 - has audio (audio first/silent first)
- 14 - modified time (oldest first/newest first)
- 15 - framerate (slowest first/fastest first)
- 16 - number of frames (smallest first/largest first)
- 18 - last viewed time (oldest first/newest first)
- 19 - archive timestamp (oldest first/newest first)
- 20 - hash hex (lexicographic/reverse lexicographic)
- 21 - pixel hash hex (lexicographic/reverse lexicographic)
- 22 - blurhash (lexicographic/reverse lexicographic)
The pixel and blurhash sorts will put files without one of these (e.g. an mp3) at the end, regardless of asc/desc.
Response: The full list of numerical file ids that match the search.

Example response:

```json
{
  "file_ids" : [125462, 4852415, 123, 591415]
}
```

Example response with return_hashes=true:

```json
{
  "hashes" : [
    "1b04c4df7accd5a61c5d02b36658295686b0abfebdc863110e7d7249bba3f9ad",
    "fe416723c731d679aa4d20e9fd36727f4a38cd0ac6d035431f0f452fad54563f",
    "b53505929c502848375fbc4dab2f40ad4ae649d34ef72802319a348f81b52bad"
  ],
  "file_ids" : [125462, 4852415, 123]
}
```
You can of course also specify `return_hashes=true&return_file_ids=false` just to get the hashes. The order of both lists is the same.

File ids are internal and specific to an individual client. For a client, a file with hash H always has the same file id N, but two clients will have different ideas about which N goes with which H. IDs are a bit faster to retrieve and search with en masse than hashes, which is why they are exposed here.
This search does not apply the implicit limit that most clients set to all searches (usually 10,000), so if you do system:everything on a client with millions of files, expect to get boshed. Even with a system:limit included, complicated queries with large result sets may take several seconds to respond. Just like the client itself.
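Putting the two-step search together, a sketch of a search followed by metadata fetched in batches of 256, the same batch size the client itself uses (Python `requests`, default address, placeholder access key):

```python
import json
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

search = requests.get(
    f"{API}/get_files/search_files",
    headers=HEADERS,
    params={"tags": json.dumps(["blue eyes", "system:inbox", "system:limit=1024"])},
)
search.raise_for_status()
file_ids = search.json()["file_ids"]

# fetch metadata in batches of 256, like the client does
metadata = []
for i in range(0, len(file_ids), 256):
    r = requests.get(
        f"{API}/get_files/file_metadata",
        headers=HEADERS,
        params={"file_ids": json.dumps(file_ids[i:i + 256])},
    )
    r.raise_for_status()
    metadata.extend(r.json()["metadata"])
```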
"},{"location":"developer_api.html#get_files_file_hashes","title":"GET/get_files/file_hashes
","text":"Lookup file hashes from other hashes.
Restricted access: YES. Search for Files permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):hash
: (selective, a hexadecimal hash)hashes
: (selective, a list of hexadecimal hashes)source_hash_type
: [sha256|md5|sha1|sha512] (optional, defaulting to sha256)desired_hash_type
: [sha256|md5|sha1|sha512]
If you have some MD5 hashes and want to see what their SHA256 are, or vice versa, this is the place. Hydrus records the non-SHA256 hashes for every file it has ever imported. This data is not removed on file deletion.
Example request
Response: A mapping Object of the successful lookups. Where no matching hash is found, no entry will be made (therefore, if none of your source hashes have matches on the client, this will return an empty/get_files/file_hashes?hash=ec5c5a4d7da4be154597e283f0b6663c&source_hash_type=md5&desired_hash_type=sha256\n
hashes
Object). Example response
"},{"location":"developer_api.html#get_files_file_metadata","title":"GET{\n \"hashes\" : {\n \"ec5c5a4d7da4be154597e283f0b6663c\" : \"2a0174970defa6f147f2eabba829c5b05aba1f1aea8b978611a07b7bb9cf9399\"\n }\n}\n
/get_files/file_metadata
","text":"Get metadata about files in the client.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

Required Headers: n/a

Arguments (in percent-encoded JSON):

- files
- `create_new_file_ids`: true or false (optional if asking with hash(es), defaulting to false)
- `only_return_identifiers`: true or false (optional, defaulting to false)
- `only_return_basic_information`: true or false (optional, defaulting to false)
- `detailed_url_information`: true or false (optional, defaulting to false)
- `include_blurhash`: true or false (optional, defaulting to false. Only applies when `only_return_basic_information` is true)
- `include_milliseconds`: true or false (optional, defaulting to false)
- `include_notes`: true or false (optional, defaulting to false)
- `include_services_object`: true or false (optional, defaulting to true)
- `hide_service_keys_tags`: Deprecated, will be deleted soon! true or false (optional, defaulting to true)

If your access key is restricted by tag, the files you search for must have been in the most recent search result.

Example request for two files with ids 123 and 4567:

```
/get_files/file_metadata?file_ids=%5B123%2C%204567%5D
```

The same, but only wanting hashes back:

```
/get_files/file_metadata?file_ids=%5B123%2C%204567%5D&only_return_identifiers=true
```

And one that fetches two hashes:

```
/get_files/file_metadata?hashes=%5B%224c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2%22%2C%20%223e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82%22%5D
```
This request string can obviously get pretty ridiculously long. It also takes a bit of time to fetch metadata from the database. In its normal searches, the client usually fetches file metadata in batches of 256.
Response: A list of JSON Objects that store a variety of file metadata. Also The Services Object for service reference.

Example response:
And one where only_return_identifiers is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 150,\n \"duration\" : null,\n \"time_modified\" : null,\n \"time_modified_details\" : {},\n \"file_services\" : {\n \"current\" : {},\n \"deleted\" : {}\n },\n \"ipfs_multihashes\" : {},\n \"has_audio\" : false,\n \"blurhash\" : \"U6PZfSi_.AyE_3t7t7R**0o#DgR4_3R*D%xt\",\n \"pixel_hash\" : \"2519e40f8105599fcb26187d39656b1b46f651786d0e32fff2dc5a9bc277b5bb\",\n \"num_frames\" : null,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : false,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : true,\n \"has_human_readable_embedded_metadata\" : true,\n \"has_icc_profile\" : true,\n \"has_transparency\" : false,\n \"known_urls\" : [],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : null,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : null,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 0\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n }\n }\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 113,\n \"duration\" : 4040,\n \"time_modified\" : 1604055647,\n \"time_modified_details\" : {\n \"local\" : 1641044491,\n \"gelbooru.com\" : 1604055647\n },\n \"file_services\" : {\n \"current\" : {\n \"616c6c206c6f63616c2066696c6573\" : {\n \"time_imported\" : 1641044491\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"time_imported\" : 1641044491\n },\n \"cb072cffbd0340b67aec39e1953c074e7430c2ac831f8e78fb5dfbda6ec8dcbd\" : {\n \"time_imported\" : 1641204220\n }\n },\n \"deleted\" : {\n \"6c6f63616c2066696c6573\" : {\n \"time_deleted\" : 1641204274,\n \"time_imported\" : 1641044491\n }\n }\n },\n \"ipfs_multihashes\" : {\n \"55af93e0deabd08ce15ffb2b164b06d1254daab5a18d145e56fa98f71ddb6f11\" : \"QmReHtaET3dsgh7ho5NVyHb5U13UgJoGipSWbZsnuuM8tb\"\n },\n \"has_audio\" : true,\n \"blurhash\" : \"UHF5?xYk^6#M@-5b,1J5@[or[k6.};FxngOZ\",\n \"pixel_hash\" : \"1dd9625ce589eee05c22798a9a201602288a1667c59e5cd1fb2251a6261fbd68\",\n \"num_frames\" : 102,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : true,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : false,\n \"has_human_readable_embedded_metadata\" : false,\n \"has_icc_profile\" : false,\n \"has_transparency\" : false,\n \"known_urls\" : [\n \"https://gelbooru.com/index.php?page=post&s=view&id=4841557\",\n \"https://img2.gelbooru.com/images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n 
\"http://origin-orig.deviantart.net/ed31/f/2019/210/7/8/beachqueen_samus_by_dandonfuga-ddcu1xg.jpg\"\n ],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : true,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : 3,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 11\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\"],\n \"2\" : [\"process this later\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\"],\n \"2\" : [\"process this later\"]\n }\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {\n \"0\" : [\"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\", \"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\", \"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n }\n }\n }\n ]\n}\n
And where only_return_basic_information is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\"\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\"\n }\n ]\n}\n
"},{"location":"developer_api.html#basics","title":"basics","text":"{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"duration\" : null,\n \"has_audio\" : false,\n \"num_frames\" : null,\n \"num_words\" : null\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"duration\" : 4040,\n \"has_audio\" : true,\n \"num_frames\" : 102,\n \"num_words\" : null\n }\n ]\n}\n
Size is in bytes. Duration is in milliseconds, and may be an int or a float.
`is_trashed` means the file is currently in the trash but still available on the hard disk. `is_deleted` means it is currently either in the trash or completely deleted from disk.

`file_services` stores which file services the file is currently in and deleted from. The entries are keyed by service key, same as for tags later on. In rare cases, the timestamps may be `null` if they are unknown (e.g. a `time_deleted` for a file deleted before this information was tracked). The `time_modified` can also be null. Time modified is just the filesystem modified time for now, but it will evolve into more complicated storage in future with multiple locations (website post times) that will be aggregated to a sensible value in the UI.

`ipfs_multihashes` stores the ipfs service key to any known multihash for the file.

The `thumbnail_width` and `thumbnail_height` are a generally reliable prediction but aren't a promise. The actual thumbnail you get from /get_files/thumbnail will be different if the user hasn't looked at it since changing their thumbnail options. You only get these rows for files that hydrus actually generates a thumbnail for; things like pdf won't have them. You can use your own thumb, or ask the api and it'll give you a fixed fallback; those are mostly 200x200, but you can and should size them to whatever you want.
will decide whether to show a file's notes, in a simple names->texts Object.include_milliseconds
will determine if timestamps are integers (1641044491
), which is the default, or floats with three significant figures (1641044491.485
). As of v559, all file timestamps across the program are internally tracked with milliseconds.If the file has a thumbnail,
blurhash
gives a base 83 encoded string of its blurhash.pixel_hash
is an SHA256 of the image's pixel data and should exactly match for pixel-identical files (it is used in the duplicate system for 'must be pixel duplicates').If the file's filetype is forced by the user,
"},{"location":"developer_api.html#tags","title":"tags","text":"filetype_forced
becomestrue
and a second mime string,original_mime
is added.The
tags
structure is similar to the /add_tags/add_tags scheme, excepting that the status numbers are:- 0 - current
- 1 - pending
- 2 - deleted
- 3 - petitioned
Note
Since JSON Object keys must be strings, these status numbers are strings, not ints.
Read about Current Deleted Pending Petitioned for more info on these states.
While the 'storage_tags' represent the actual tags stored on the database for a file, 'display_tags' reflect how tags appear in the UI, after siblings are collapsed and parents are added. If you want to edit a file's tags, refer to the storage tags. If you want to render to the user, use the display tags. The display tag calculation logic is very complicated; if the storage tags change, do not try to guess the new display tags yourself--just ask the API again.
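As a small sketch of that advice, this hypothetical helper pulls the status-0 ('current') display tags from one metadata entry for rendering; '616c6c206b6e6f776e2074616773' is the 'all known tags' service key used in the examples above:

```python
def current_display_tags(metadata_entry: dict,
                         tag_service_key: str = "616c6c206b6e6f776e2074616773") -> list:
    """Return the 'current' display tags for one file, suitable for showing to a user.
    Status keys are strings ('0' = current), per the note above."""
    service = metadata_entry.get("tags", {}).get(tag_service_key, {})
    return service.get("display_tags", {}).get("0", [])
```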
"},{"location":"developer_api.html#ratings","title":"ratings","text":"The
ratings
structure is simple, but it holds different data types. For each service:- For a like/dislike service, 'no rating' is null. 'like' is true, 'dislike' is false.
- For a numerical service, 'no rating' is null. Otherwise it will be an integer, for the number of stars.
- For an inc/dec service, it is always an integer. The default value is 0 for all files.
Check The Services Object to see the shape of a rating star, and min/max number of stars in a numerical service.
"},{"location":"developer_api.html#services","title":"services","text":"The
tags
,ratings
, andfile_services
structures use the hexadecimalservice_key
extensively. If you need to look up the respective service name or type, check The Services Object under the top levelservices
key.Note
If you look, those file structures actually include the service name and type already, but this bloated data is deprecated and will be deleted in 2024, so please transition over.
If you don't want the services object (it is generally superfluous on the 'simple' responses), then add
"},{"location":"developer_api.html#parameters","title":"parameters","text":"include_services_object=false
.The
metadata
list should come back in the same sort order you asked, whether that is infile_ids
orhashes
!If you ask with hashes rather than file_ids, hydrus will, by default, only return results when it has seen those hashes before. This is to stop the client making thousands of new file_id records in its database if you perform a scanning operation. If you ask about a hash the client has never encountered before--for which there is no file_id--you will get this style of result:
Missing file_id example{\n \"metadata\" : [\n {\n \"file_id\" : null,\n \"hash\" : \"766da61f81323629f982bc1b71b5c1f9bba3f3ed61caf99906f7f26881c3ae93\"\n }\n ]\n}\n
You can change this behaviour with `create_new_file_ids=true`, but bear in mind you will get a fairly 'empty' metadata result with lots of 'null' lines, so this is only useful for gathering the numerical ids for later Client API work.

If you ask about file_ids that do not exist, you'll get 404.
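A short sketch of the hash-lookup behaviour, assuming Python `requests`: hashes the client has never seen come back with a `null` file_id, which you can check for before doing further work:

```python
import json
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

hashes = ["766da61f81323629f982bc1b71b5c1f9bba3f3ed61caf99906f7f26881c3ae93"]
r = requests.get(
    f"{API}/get_files/file_metadata",
    headers=HEADERS,
    params={"hashes": json.dumps(hashes)},
)
r.raise_for_status()

for entry in r.json()["metadata"]:
    if entry.get("file_id") is None:
        print(f"client has never seen {entry['hash']}")
```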
If you set `only_return_basic_information=true`, this will be much faster for first-time requests than the full metadata result, but it will be slower for repeat requests. The full metadata object is cached after the first fetch; the limited file info object is not. You can optionally set `include_blurhash` when using this option to fetch blurhash strings for the files.

If you add `detailed_url_information=true`, a new entry, `detailed_known_urls`, will be added for each file, with a list of the same structure as /add_urls/get_url_info. This may be an expensive request if you are querying thousands of files at once.

For example:
"},{"location":"developer_api.html#get_files_file","title":"GET{\n \"detailed_known_urls\": [\n {\n \"normalised_url\": \"https://gelbooru.com/index.php?id=4841557&page=post&s=view\",\n \"url_type\": 0,\n \"url_type_string\": \"post url\",\n \"match_name\": \"gelbooru file page\",\n \"can_parse\": true\n },\n {\n \"normalised_url\": \"https://img2.gelbooru.com/images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n \"url_type\": 5,\n \"url_type_string\": \"unknown url\",\n \"match_name\": \"unknown url\",\n \"can_parse\": false\n }\n ]\n}\n
/get_files/file
","text":"Get a file.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

Required Headers: n/a

Arguments:

- `file_id`: (selective, numerical file id for the file)
- `hash`: (selective, a hexadecimal SHA256 hash for the file)
- `download`: (optional, boolean, default `false`)

Only use one of `file_id` or `hash`. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.

Example requests:

```
/get_files/file?file_id=452158
/get_files/file?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true
```

Response: The file itself. You should get the correct mime type as the Content-Type header.
By default, this will set the `Content-Disposition` header to `inline`, which causes a web browser to show the file. If you set `download=true`, it will set it to `attachment`, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.

This stuff supports `Range` requests, so if you want to build a video player, go nuts.
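For large files, a sketch of streaming the response to disk and of a partial `Range` fetch, assuming Python `requests` and a placeholder file id:

```python
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

# stream the whole file to disk without loading it into memory
with requests.get(f"{API}/get_files/file", headers=HEADERS,
                  params={"file_id": 452158}, stream=True) as r:
    r.raise_for_status()
    with open("my_file", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)

# or ask for just the first megabyte via a Range request
partial = requests.get(f"{API}/get_files/file",
                       headers={**HEADERS, "Range": "bytes=0-1048575"},
                       params={"file_id": 452158})
partial.raise_for_status()
```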
","text":"Get a file's thumbnail.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

Required Headers: n/a

Arguments:

- `file_id`: (selective, numerical file id for the file)
- `hash`: (selective, a hexadecimal SHA256 hash for the file)

Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.

Example requests:

```
/get_files/thumbnail?file_id=452158
/get_files/thumbnail?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a
```

Response: The thumbnail for the file. Some hydrus thumbs are jpegs, some are pngs. It should give you the correct image/jpeg or image/png Content-Type.
If hydrus keeps no thumbnail for the filetype, for instance with pdfs, then you will get the same default 'pdf' icon you see in the client. If the file does not exist in the client, or the thumbnail was expected but is missing from storage, you will get the fallback 'hydrus' icon, again just as you would in the client itself. This request should never give a 404.
Size of Normal Thumbs
Thumbnails are not guaranteed to be the correct size! If a thumbnail has not been loaded in the client in years, it could well have been fitted for older thumbnail settings. Also, even 'clean' thumbnails will not always fit inside the settings' bounding box; they may be boosted due to a high-DPI setting or spill over due to a 'fill' vs 'fit' preference. You cannot easily predict what resolution a thumbnail will or should have!
In general, thumbnails are the correct ratio. If you are drawing thumbs, you should embed them to fit or fill, but don't fix them at 100% true size: make sure they can scale to the size you want!
Size of Defaults
If you get a 'default' filetype thumbnail like the pdf or hydrus one, you will be pulling the pngs straight from the hydrus/static folder. They will most likely be 200x200 pixels.
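A sketch of fetching a thumbnail and trusting the Content-Type rather than guessing an extension (Python `requests`, placeholder hash):

```python
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

r = requests.get(
    f"{API}/get_files/thumbnail",
    headers=HEADERS,
    params={"hash": "7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a"},
)
r.raise_for_status()

# some thumbs are jpeg, some are png--use the Content-Type, not an assumed extension
ext = ".png" if r.headers.get("Content-Type") == "image/png" else ".jpg"
with open(f"thumb{ext}", "wb") as f:
    f.write(r.content)
```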
"},{"location":"developer_api.html#get_files_file_path","title":"GET/get_files/file_path
","text":"Get a local file path.
Restricted access: YES. Search for Files permission and See Local Paths permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments :file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)
Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
Example request
Example request/get_files/file_path?file_id=452158\n
Response: The actual path to the file on the host system. Filetype and size are included for convenience. Example response/get_files/file_path?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a\n
{\n \"path\" : \"D:\\hydrus_files\\f7f\\7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a.jpg\",\n \"filetype\" : \"image/jpeg\",\n \"size\" : 95237\n}\n
This will give 404 if the file is not stored locally (which includes if it should exist but is actually missing from the file store).
"},{"location":"developer_api.html#get_files_thumbnail","title":"GET/get_files/thumbnail_path
","text":"Get a local thumbnail path.
Restricted access: YES. Search for Files permission and See Local Paths permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments:file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)include_thumbnail_filetype
: (optional, boolean, defaults tofalse
)
Only use one of
file_id
orhash
. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.Example request
Example request/get_files/thumbnail?file_id=452158\n
Response: The actual path to the thumbnail on the host system. Example response/get_files/thumbnail?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&include_thumbnail_filetype=true\n
Example response with include_thumbnail_filetype=true{\n \"path\" : \"D:\\hydrus_files\\f7f\\7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a.thumbnail\"\n}\n
{\n \"path\" : \"C:\\hydrus_thumbs\\f85\\85daaefdaa662761d7cb1b026d7b101e74301be08e50bf09a235794ec8656f79.thumbnail\",\n \"filetype\" : \"image/png\"\n}\n
All thumbnails in hydrus have the .thumbnail file extension and in content are either jpeg (almost always) or png (to handle transparency).
This will 400 if the given file type does not have a thumbnail in hydrus, and it will 404 if there should be a thumbnail but one does not exist and cannot be generated from the source file (which probably would mean that the source file was itself Not Found).
"},{"location":"developer_api.html#get_local_file_storage_locations","title":"GET/get_files/local_file_storage_locations
","text":"Get the local file storage locations, as you see under database->migrate files.
Restricted access: YES. Search for Files permission and See Local Paths permission needed.Required Headers: n/a
Arguments: n/a
Response: A list of the different file storage locations and what they store. Example response{\n \"locations\" : [\n {\n \"path\" : \"C:\\my_thumbs\",\n \"ideal_weight\" : 1,\n \"max_num_bytes\": None,\n \"prefixes\" : [\n \"t00\", \"t01\", \"t02\", \"t03\", \"t04\", \"t05\", \"t06\", \"t07\", \"t08\", \"t09\", \"t0a\", \"t0b\", \"t0c\", \"t0d\", \"t0e\", \"t0f\",\n \"t10\", \"t11\", \"t12\", \"t13\", \"t14\", \"t15\", \"t16\", \"t17\", \"t18\", \"t19\", \"t1a\", \"t1b\", \"t1c\", \"t1d\", \"t1e\", \"t1f\",\n \"t20\", \"t21\", \"t22\", \"t23\", \"t24\", \"t25\", \"t26\", \"t27\", \"t28\", \"t29\", \"t2a\", \"t2b\", \"t2c\", \"t2d\", \"t2e\", \"t2f\",\n \"t30\", \"t31\", \"t32\", \"t33\", \"t34\", \"t35\", \"t36\", \"t37\", \"t38\", \"t39\", \"t3a\", \"t3b\", \"t3c\", \"t3d\", \"t3e\", \"t3f\",\n \"t40\", \"t41\", \"t42\", \"t43\", \"t44\", \"t45\", \"t46\", \"t47\", \"t48\", \"t49\", \"t4a\", \"t4b\", \"t4c\", \"t4d\", \"t4e\", \"t4f\",\n \"t50\", \"t51\", \"t52\", \"t53\", \"t54\", \"t55\", \"t56\", \"t57\", \"t58\", \"t59\", \"t5a\", \"t5b\", \"t5c\", \"t5d\", \"t5e\", \"t5f\",\n \"t60\", \"t61\", \"t62\", \"t63\", \"t64\", \"t65\", \"t66\", \"t67\", \"t68\", \"t69\", \"t6a\", \"t6b\", \"t6c\", \"t6d\", \"t6e\", \"t6f\",\n \"t70\", \"t71\", \"t72\", \"t73\", \"t74\", \"t75\", \"t76\", \"t77\", \"t78\", \"t79\", \"t7a\", \"t7b\", \"t7c\", \"t7d\", \"t7e\", \"t7f\",\n \"t80\", \"t81\", \"t82\", \"t83\", \"t84\", \"t85\", \"t86\", \"t87\", \"t88\", \"t89\", \"t8a\", \"t8b\", \"t8c\", \"t8d\", \"t8e\", \"t8f\",\n \"t90\", \"t91\", \"t92\", \"t93\", \"t94\", \"t95\", \"t96\", \"t97\", \"t98\", \"t99\", \"t9a\", \"t9b\", \"t9c\", \"t9d\", \"t9e\", \"t9f\",\n \"ta0\", \"ta1\", \"ta2\", \"ta3\", \"ta4\", \"ta5\", \"ta6\", \"ta7\", \"ta8\", \"ta9\", \"taa\", \"tab\", \"tac\", \"tad\", \"tae\", \"taf\",\n \"tb0\", \"tb1\", \"tb2\", \"tb3\", \"tb4\", \"tb5\", \"tb6\", \"tb7\", \"tb8\", \"tb9\", \"tba\", \"tbb\", \"tbc\", \"tbd\", \"tbe\", \"tbf\",\n \"tc0\", \"tc1\", \"tc2\", \"tc3\", \"tc4\", \"tc5\", \"tc6\", \"tc7\", \"tc8\", \"tc9\", \"tca\", \"tcb\", \"tcc\", \"tcd\", \"tce\", \"tcf\",\n \"td0\", \"td1\", \"td2\", \"td3\", \"td4\", \"td5\", \"td6\", \"td7\", \"td8\", \"td9\", \"tda\", \"tdb\", \"tdc\", \"tdd\", \"tde\", \"tdf\",\n \"te0\", \"te1\", \"te2\", \"te3\", \"te4\", \"te5\", \"te6\", \"te7\", \"te8\", \"te9\", \"tea\", \"teb\", \"tec\", \"ted\", \"tee\", \"tef\",\n \"tf0\", \"tf1\", \"tf2\", \"tf3\", \"tf4\", \"tf5\", \"tf6\", \"tf7\", \"tf8\", \"tf9\", \"tfa\", \"tfb\", \"tfc\", \"tfd\", \"tfe\", \"tff\"\n ]\n },\n {\n \"path\" : \"D:\\hydrus_files_1\",\n \"ideal_weight\" : 5,\n \"max_num_bytes\": None,\n \"prefixes\" : [\n \"f00\", \"f02\", \"f04\", \"f05\", \"f08\", \"f0c\", \"f11\", \"f12\", \"f13\", \"f15\", \"f17\", \"f18\", \"f1a\", \"f1b\", \"f20\", \"f23\",\n \"f25\", \"f26\", \"f27\", \"f2b\", \"f2e\", \"f2f\", \"f31\", \"f35\", \"f36\", \"f37\", \"f38\", \"f3a\", \"f40\", \"f42\", \"f43\", \"f44\",\n \"f49\", \"f4b\", \"f4d\", \"f4e\", \"f50\", \"f51\", \"f55\", \"f59\", \"f60\", \"f63\", \"f64\", \"f65\", \"f66\", \"f68\", \"f69\", \"f6e\",\n \"f71\", \"f73\", \"f78\", \"f79\", \"f7a\", \"f7d\", \"f7f\", \"f82\", \"f83\", \"f84\", \"f86\", \"f87\", \"f88\", \"f89\", \"f8f\", \"f90\",\n \"f91\", \"f96\", \"f9e\", \"fa1\", \"fa4\", \"fa5\", \"fa7\", \"faa\", \"fad\", \"faf\", \"fb1\", \"fb9\", \"fba\", \"fbb\", \"fbf\", \"fc1\",\n \"fc4\", \"fc7\", \"fc8\", \"fcf\", \"fd2\", \"fd6\", \"fd7\", \"fd8\", \"fd9\", \"fdf\", \"fe2\", \"fe8\", \"fe9\", \"fea\", \"feb\", \"fec\",\n 
\"ff4\", \"ff7\", \"ffd\", \"ffe\"\n ]\n },\n {\n \"path\" : \"E:\\hydrus\\hydrus_files_2\",\n \"ideal_weight\" : 2,\n \"max_num_bytes\": 805306368000,\n \"prefixes\" : [\n \"f01\", \"f03\", \"f06\", \"f07\", \"f09\", \"f0a\", \"f0b\", \"f0d\", \"f0e\", \"f0f\", \"f10\", \"f14\", \"f16\", \"f19\", \"f1c\", \"f1d\",\n \"f1e\", \"f1f\", \"f21\", \"f22\", \"f24\", \"f28\", \"f29\", \"f2a\", \"f2c\", \"f2d\", \"f30\", \"f32\", \"f33\", \"f34\", \"f39\", \"f3b\",\n \"f3c\", \"f3d\", \"f3e\", \"f3f\", \"f41\", \"f45\", \"f46\", \"f47\", \"f48\", \"f4a\", \"f4c\", \"f4f\", \"f52\", \"f53\", \"f54\", \"f56\",\n \"f57\", \"f58\", \"f5a\", \"f5b\", \"f5c\", \"f5d\", \"f5e\", \"f5f\", \"f61\", \"f62\", \"f67\", \"f6a\", \"f6b\", \"f6c\", \"f6d\", \"f6f\",\n \"f70\", \"f72\", \"f74\", \"f75\", \"f76\", \"f77\", \"f7b\", \"f7c\", \"f7e\", \"f80\", \"f81\", \"f85\", \"f8a\", \"f8b\", \"f8c\", \"f8d\",\n \"f8e\", \"f92\", \"f93\", \"f94\", \"f95\", \"f97\", \"f98\", \"f99\", \"f9a\", \"f9b\", \"f9c\", \"f9d\", \"f9f\", \"fa0\", \"fa2\", \"fa3\",\n \"fa6\", \"fa8\", \"fa9\", \"fab\", \"fac\", \"fae\", \"fb0\", \"fb2\", \"fb3\", \"fb4\", \"fb5\", \"fb6\", \"fb7\", \"fb8\", \"fbc\", \"fbd\",\n \"fbe\", \"fc0\", \"fc2\", \"fc3\", \"fc5\", \"fc6\", \"fc9\", \"fca\", \"fcb\", \"fcc\", \"fcd\", \"fce\", \"fd0\", \"fd1\", \"fd3\", \"fd4\",\n \"fd5\", \"fda\", \"fdb\", \"fdc\", \"fdd\", \"fde\", \"fe0\", \"fe1\", \"fe3\", \"fe4\", \"fe5\", \"fe6\", \"fe7\", \"fed\", \"fee\", \"fef\",\n \"ff0\", \"ff1\", \"ff2\", \"ff3\", \"ff5\", \"ff6\", \"ff8\", \"ff9\", \"ffa\", \"ffb\", \"ffc\", \"fff\"\n ]\n }\n ]\n}\n
Note that
ideal_weight
andmax_num_bytes
are provided for courtesy and mean nothing fixed. Each storage location might store anything, thumbnails or files or nothing, regardless of the ideal situation. Whenever a folder is non-ideal, the 'move media files' dialog shows \"files need to be moved now\", but it will still keep doing its thing.For now, a prefix only occurs in one location, so there will always be 512 total prefixes in this response, all unique. However, please note that this will not always be true! In a future expansion, the client will be, on user command, slowly migrating files from one place to another in the background, and during that time there will be multiple valid locations for a file to actually be. When this happens, you will have to hit all the possible locations and test.
Also, it won't be long before the client supports moving to some form of three- and four-character prefix. I am still thinking how this will happen other than it will be an atomic change--no slow migration where we try to support both at once--but it will certainly complicate something in here (e.g. while the prefix may be 'f012', maybe the subfolder will be '\\f01\\2'), so we'll see.
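As a sketch of how you might use this response today, assuming the current 'f' + first-two-hex-characters subfolder scheme visible in the /get_files/file_path example above (hash `7f30...` lives under `f7f`); remember this layout is expected to change, so prefer /get_files/file_path when you can:

```python
import os

def guess_file_folder(locations_response: dict, sha256_hex: str):
    """Guess which storage subfolder should hold a file, under the assumed
    'f' + first-two-hex-characters prefix scheme. Returns None if no location claims it."""
    prefix = "f" + sha256_hex[:2].lower()
    for location in locations_response["locations"]:
        if prefix in location["prefixes"]:
            return os.path.join(location["path"], prefix)
    return None
```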
"},{"location":"developer_api.html#get_files_render","title":"GET/get_files/render
","text":"Get an image or ugoira file as rendered by Hydrus.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.

Required Headers: n/a

Arguments:

- `file_id`: (selective, numerical file id for the file)
- `hash`: (selective, a hexadecimal SHA256 hash for the file)
- `download`: (optional, boolean, default `false`)
- `render_format`: (optional, integer, the filetype enum value to render the file to; for still images it defaults to `2` for PNG, for Ugoiras it defaults to `23` for APNG)
- `render_quality`: (optional, integer, the quality or PNG compression level to use for encoding the image, default `1` for PNG and `80` for JPEG and WEBP, has no effect for Ugoiras using APNG)
- `width` and `height`: (optional but must provide both if used, integer, the width and height to scale the image to. Doesn't apply to Ugoiras)
Only use one of file_id or hash. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
Currently the accepted values for `render_format` for image files are:

- `1` for JPEG (`quality` sets JPEG quality 0 to 100, always progressive 4:2:0 encoding)
- `2` for PNG (`quality` sets the compression level from 0 to 9. A higher value means a smaller size and longer compression time)
- `33` for WEBP (`quality` sets WEBP quality 1 to 100; for values over 100, lossless compression is used)

The accepted values for Ugoiras are:

- `23` for APNG (`quality` does nothing for this format)
- `83` for animated WEBP (`quality` sets WEBP quality 1 to 100; for values over 100, lossless compression is used)
The file you request must be a still image file that Hydrus can render (this includes PSD files) or a Ugoira file. This request uses the client image cache for images.
Example requests:

```
/get_files/render?file_id=452158
/get_files/render?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true
```

Response: A PNG (or APNG), JPEG, or WEBP file of the image as would be rendered in the client, optionally resized as specified in the query parameters. It will be converted to sRGB color if the file had a color profile, but the rendered file will not have any color profile.
By default, this will set the `Content-Disposition` header to `inline`, which causes a web browser to show the file. If you set `download=true`, it will set it to `attachment`, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.

## Managing File Relationships

This refers to the File Relationships system, which includes 'potential duplicates', 'duplicates', and 'alternates'.
This system is pending significant rework and expansion, so please do not get too married to some of the routines here. I am mostly just exposing my internal commands, so things are a little ugly/hacked. I expect duplicate and alternate groups to get some form of official identifier in future, which may end up being the way to refer and edit things here.
Also, at least for now, 'Manage File Relationships' permission is not going to be bound by the search permission restrictions that normal file search does. Getting this file relationship management permission allows you to search anything.
There is more work to do here, including adding various 'dissolve'/'undo' commands to break groups apart.
"},{"location":"developer_api.html#manage_file_relationships_get_file_relationships","title":"GET/manage_file_relationships/get_file_relationships
","text":"Get the current relationships for one or more files.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- files
- file domain (optional, defaults to all my files)
Response: A JSON Object mapping the hashes to their relationships. Example response/manage_file_relationships/get_file_relationships?hash=ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\n
{\n \"file_relationships\" : {\n \"ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\" : {\n \"is_king\" : false,\n \"king\" : \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"king_is_on_file_domain\" : true,\n \"king_is_local\" : true,\n \"0\" : [\n ],\n \"1\" : [],\n \"3\" : [\n \"8bf267c4c021ae4fd7c4b90b0a381044539519f80d148359b0ce61ce1684fefe\"\n ],\n \"8\" : [\n \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"3fa8ef54811ec8c2d1892f4f08da01e7fc17eed863acae897eb30461b051d5c3\"\n ]\n }\n }\n}\n
`king` refers to which file is set as the best of a duplicate group. If you are doing potential duplicate comparisons, the kings of your two groups are usually the ideal representatives, and the 'get some pairs to filter'-style commands try to select the kings of the various to-be-compared duplicate groups. `is_king` is a convenience bool for when a file is king of its own group.

It is possible for the king to not be available. Every group has a king, but if that file has been deleted, or if the file domain here is limited and the king is on a different file service, then it may not be available. A similar issue occurs when you search for filtering pairs--while it is ideal to compare kings with kings, if you set 'files must be pixel dupes', then the user will expect to see those pixel duplicates, not their champions--you may be forced to compare non-kings.

`king_is_on_file_domain` lets you know if the king is on the file domain you set, and `king_is_local` lets you know if it is on the hard disk--if `king_is_local=true`, you can do a `/get_files/file` request on it. It is generally rare, but you have to deal with the king being unavailable--in this situation, your best bet is to just use the file itself as its own representative.

All the relationships you get are filtered by the file domain. If you set the file domain to 'all known files', you will get every relationship a file has, including all deleted files, which is often less useful than you would think. The default, 'all my files', is usually most useful.
A file that has no duplicates is considered to be in a duplicate group of size 1 and thus is always its own king.
The numbers are from a duplicate status enum, as so:
- 0 - potential duplicates
- 1 - false positives
- 3 - alternates
- 8 - duplicates
Note that because of JSON constraints, these are the string versions of the integers since they are Object keys.
All the hashes given here are in 'all my files', i.e. not in the trash. A file may have duplicates that have long been deleted, but, like the null king above, they will not show here.
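A sketch of fetching relationships and choosing a representative file, preferring the king when it is locally available, per the advice above (Python `requests`, placeholder hash):

```python
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "REPLACE_WITH_YOUR_ACCESS_KEY"}

h = "ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d"
r = requests.get(
    f"{API}/manage_file_relationships/get_file_relationships",
    headers=HEADERS,
    params={"hash": h},
)
r.raise_for_status()
rel = r.json()["file_relationships"][h]

# prefer the duplicate group's king as the representative, but only if we can fetch it
representative = rel["king"] if rel["king_is_local"] else h
alternates = rel["3"]   # status keys are strings, per the enum above
duplicates = rel["8"]
```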
"},{"location":"developer_api.html#manage_file_relationships_get_potentials_count","title":"GET/manage_file_relationships/get_potentials_count
","text":"Get the count of remaining potential duplicate pairs in a particular search domain. Exactly the same as the counts you see in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- file domain (optional, defaults to all my files)
tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)
/manage_file_relationships/get_potentials_count?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0&max_num_pairs=50\n
`tag_service_key_x` and `tags_x` work the same as /get_files/search_files. The `_2` variants are only useful if the `potentials_search_type` is 2.

`potentials_search_type` and `pixel_duplicates` are enums:

- 0 - one file matches search 1
- 1 - both files match search 1
- 2 - one file matches search 1, the other 2
-and-
- 0 - must be pixel duplicates
- 1 - can be pixel duplicates
- 2 - must not be pixel duplicates
The `max_hamming_distance` is the same 'search distance' you see in the Client UI. A higher number means a more speculative 'similar files' search. If `pixel_duplicates` is set to 'must be', then `max_hamming_distance` is obviously ignored.

Response: A JSON Object stating the count.

Example response:

```json
{
  "potential_duplicates_count" : 17
}
```
If you confirm that a pair of potentials are duplicates, this may transitively collapse other potential pairs and decrease the count by more than 1.
"},{"location":"developer_api.html#manage_file_relationships_get_potential_pairs","title":"GET/manage_file_relationships/get_potential_pairs
","text":"Get some potential duplicate pairs for a filtering workflow. Exactly the same as the 'duplicate filter' in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- file domain (optional, defaults to all my files)
tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)max_num_pairs
: (optional, integer, defaults to client's option, how many pairs to get in a batch)
/manage_file_relationships/get_potential_pairs?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0&max_num_pairs=50\n
The search arguments work the same as /manage_file_relationships/get_potentials_count.
max_num_pairs is simple and just caps how many pairs you get.
Response: A JSON Object listing a batch of hash pairs. Example response: {\n \"potential_duplicate_pairs\" : [\n [ \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\", \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\" ],\n [ \"eeea390357f259b460219d9589b4fa11e326403208097b1a1fbe63653397b210\", \"9215dfd39667c273ddfae2b73d90106b11abd5fd3cbadcc2afefa526bb226608\" ],\n [ \"a1ea7d671245a3ae35932c603d4f3f85b0d0d40c5b70ffd78519e71945031788\", \"8e9592b2dfb436fe0a8e5fa15de26a34a6dfe4bca9d4363826fac367a9709b25\" ]\n ]\n}\n
The selected pair sample and their order is strictly hardcoded for now (e.g. to guarantee that a decision will not invalidate any other pair in the batch, you shouldn't see the same file twice in a batch, nor two files in the same duplicate group). Treat it as the client filter does, where you fetch batches to process one after another. I expect to make it more flexible in future, in the client itself and here.
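The batch-at-a-time workflow might look like this sketch: fetch a batch, hand each pair to whatever comparison UI or logic you have, commit the decisions, then fetch the next batch. The base URL and access-key header are assumptions, and process_pair is a hypothetical placeholder for your own code.

```python
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

def process_pair(hash_a: str, hash_b: str) -> None:
    # hypothetical placeholder: show the pair to a user, compare metadata, etc.
    print(hash_a, "vs", hash_b)

r = requests.get(f"{API}/manage_file_relationships/get_potential_pairs",
                 params={"max_num_pairs": 50}, headers=HEADERS)
pairs = r.json()["potential_duplicate_pairs"]

for hash_a, hash_b in pairs:
    process_pair(hash_a, hash_b)

# commit your decisions with /manage_file_relationships/set_file_relationships,
# then request another batch; an empty list means there is nothing left to filter
```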
You will see significantly fewer than max_num_pairs (and potential duplicate count) as you get close to the last available pairs, and when there are none left, you will get an empty list.
"},{"location":"developer_api.html#manage_file_relationships_get_random_potentials","title":"GET/manage_file_relationships/get_random_potentials
","text":"Get some random potentially duplicate file hashes. Exactly the same as the 'show some random potential dupes' button in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- file domain (optional, defaults to all my files)
tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the files should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the files should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the files)
/manage_file_relationships/get_random_potentials?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0\n
The arguments work the same as /manage_file_relationships/get_potentials_count, with the caveat that
potentials_search_type
has special logic:- 0 - first file matches search 1
- 1 - all files match search 1
- 2 - first file matches search 1, the others 2
Essentially, the first hash is the 'master' to which the others are paired. The other files will include every matching file.
Response: A JSON Object listing a group of hashes exactly as the client would. Example response{\n \"random_potential_duplicate_hashes\" : [\n \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\",\n \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\",\n \"9e0d6b928b726562d70e1f14a7b506ba987c6f9b7f2d2e723809bb11494c73e6\",\n \"9e01744819b5ff2a84dda321e3f1a326f40d0e7f037408ded9f18a11ee2b2da8\"\n ]\n}\n
If there are no potential duplicate groups in the search, this returns an empty list.
"},{"location":"developer_api.html#manage_file_relationships_remove_potentials","title":"POST/manage_file_relationships/remove_potentials
","text":"Remove all potential pairs that any of the given files are a part of. If you hit /manage_file_relationships/get_file_relationships after this on any of these files, they will have no potential relationships, and any hashes that were potential to them before will no longer, conversely, refer to these files as potentials.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json
- files
Example request body: {\n \"file_id\" : 123\n}\n
Response: 200 with no content.
If the files are a part of any potential pairs (with any files, including those you did not specify), those pairs will be deleted. This deletes everything they are involved in, and the files will not be queued up for a re-scan, so I recommend you only do this if you know you added the potentials yourself (e.g. this is regarding video files) or you otherwise have a plan to replace the deleted potential pairs with something more useful.
"},{"location":"developer_api.html#manage_file_relationships_set_file_relationships","title":"POST/manage_file_relationships/set_file_relationships
","text":"Set the relationships to the specified file pairs.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json
relationships
: (a list of Objects, one for each file-pair being set)
Each Object is:
* `hash_a`: (a hexadecimal SHA256 hash)\n* `hash_b`: (a hexadecimal SHA256 hash)\n* `relationship`: (integer enum for the relationship being set)\n* `do_default_content_merge`: (bool)\n* `delete_a`: (optional, bool, default false)\n* `delete_b`: (optional, bool, default false)\n
hash_a
andhash_b
are normal hex SHA256 hashes for your file pair.relationship
is one of this enum:- 0 - set as potential duplicates
- 1 - set as false positives
- 2 - set as same quality
- 3 - set as alternates
- 4 - set A as better
- 7 - set B as better
2, 4, and 7 all make the files 'duplicates' (8 under
/get_file_relationships
), which, specifically, merges the two files' duplicate groups. 'same quality' has different duplicate content merge options to the better/worse choices, but it ultimately sets something similar to A>B (but see below for more complicated outcomes). You obviously don't have to use 'B is better' if you prefer just to swap the hashes. Do what works for you.
do_default_content_merge sets whether the user's duplicate content merge options should be loaded and applied to the files along with the relationship. Most operations in the client do this automatically, so the user may expect it to apply, but if you want to do content merge yourself, set this to false.
delete_a and delete_b are booleans that select whether to delete A and/or B in the same operation as setting the relationship. You can also do this externally if you prefer.
Example request body: {\n \"relationships\" : [\n {\n \"hash_a\" : \"b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2\",\n \"hash_b\" : \"bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"22667427eaa221e2bd7ef405e1d2983846c863d40b2999ce8d1bf5f0c18f5fb2\",\n \"hash_b\" : \"65d228adfa722f3cd0363853a191898abe8bf92d9a514c6c7f3c89cfed0bf423\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"0480513ffec391b77ad8c4e57fe80e5b710adfa3cb6af19b02a0bd7920f2d3ec\",\n \"hash_b\" : \"5fab162576617b5c3fc8caabea53ce3ab1a3c8e0a16c16ae7b4e4a21eab168a7\",\n \"relationship\" : 2,\n \"do_default_content_merge\" : true\n }\n ]\n}\n
Response: 200 with no content.
If you try to add an invalid or redundant relationship, for instance setting files that are already duplicates as potential duplicates, no changes are made.
This is the file relationships request that is probably most likely to change in future. I may implement content merge options. I may move from file pairs to group identifiers. When I expand alternates, those file groups are going to support more variables.
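To tie the pieces together, here is a minimal sketch that commits one 'A is better' decision, reusing the first pair from the example body above. The base URL and access-key header are assumptions; the body fields are exactly as documented in this section.

```python
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

body = {
    "relationships": [
        {
            "hash_a": "b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2",
            "hash_b": "bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845",
            "relationship": 4,               # set A as better
            "do_default_content_merge": True,
            "delete_b": True,                # also delete the worse file in the same operation
        }
    ]
}

r = requests.post(f"{API}/manage_file_relationships/set_file_relationships",
                  json=body, headers=HEADERS)
r.raise_for_status()  # 200 with no content on success
```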
"},{"location":"developer_api.html#king_merge_rules","title":"king merge rules","text":"Recall in
/get_file_relationships
that we discussed how duplicate groups have a 'king' for their best file. This file is the most useful representative when you do comparisons, since if you say \"King A > King B\", then we know that King A is also better than all of King B's normal duplicate group members. We can merge the group simply just by folding King B and all the other members into King A's group.So what happens if you say 'A = B'? We have to have a king, so which should it be?
What happens if you say \"non-king member of A > non-king member of B\"? We don't want to merge all of B into A, since King B might be higher quality than King A.
The logic here can get tricky, but I have tried my best to avoid overcommitting and accidentally promoting the wrong king. Here are all the possible situations ('>' means 'better than', and '=' means 'same quality as'):
Merges- King A > King B
- Merge B into A
- King A is king of the new combined group
- Non-King A > King B
- Merge B into A
- King of A is king of the new combined group
- King A > Non-King B
- Remove Non-King B from B and merge it into A
- King A stays king of A
- King of B stays king of B
- Non-King A > Non-King B
- Remove Non-King B from B and merge it into A
- King of A stays king of A
- King of B stays king of B
- King A = King B
- Merge B into A
- King A is king of the new combined group
- Non-King A = King B
- Merge B into A
- King of A is king of the new combined group
- King A = Non-King B
- Merge A into B
- King of B is king of the new combined group
- Non-King A = Non-King B
- Remove Non-King B from B and merge it into A
- King of A stays king of A
- King of B stays king of B
So, if you can, always present kings to your users, and action using those kings' hashes. It makes the merge logic easier in all cases. Remember that you can set system:is the best quality file of its duplicate group in any file search to exclude any non-kings (e.g. if you are hunting for easily actionable pixel potential duplicates).
"},{"location":"developer_api.html#manage_file_relationships_set_kings","title":"POST/manage_file_relationships/set_kings
","text":"Set the specified files to be the kings of their duplicate groups.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json
- files
Example request body: {\n \"file_id\" : 123\n}\n
Response: 200 with no content.
The files will be promoted to be the kings of their respective duplicate groups. If the file is already the king (also true for any file with no duplicates), this is idempotent. It also processes the files in the given order, so if you specify two files in the same group, the latter will be the king at the end of the request.
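A quick sketch of promoting a file to king. I use the 'hashes' form of the standard files argument here, which is an assumption (the example body in this section uses file_id), and the base URL and access-key header are likewise assumptions.

```python
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

# promote this file to be the king of its duplicate group; the call is
# idempotent if it is already the king or has no duplicates
body = {"hashes": ["16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3"]}

r = requests.post(f"{API}/manage_file_relationships/set_kings",
                  json=body, headers=HEADERS)
r.raise_for_status()
```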
"},{"location":"developer_api.html#managing_services","title":"Managing Services","text":"For now, this refers to just seeing and committing pending content (which you see in the main \"pending\" menubar if you have an IPFS, Tag Repository, or File Repository service).
"},{"location":"developer_api.html#manage_services_get_pending_counts","title":"GET/manage_services/get_pending_counts
","text":"Get the counts of pending content for each upload-capable service. This basically lets you construct the \"pending\" menu in the main GUI menubar.
Restricted access: YES. Start Upload permission needed.Required Headers: n/a
Arguments: n/a
Example request: /manage_services/get_pending_counts\n
Response: A JSON Object of all the service keys capable of uploading and their current pending content counts. Also The Services Object. Example response:
{\n \"services\" : \"The Services Object\",\n \"pending_counts\" : {\n \"ae91919b0ea95c9e636f877f57a69728403b65098238c1a121e5ebf85df3b87e\" : {\n \"pending_tag_mappings\" : 11564,\n \"petitioned_tag_mappings\" : 5,\n \"pending_tag_siblings\" : 2,\n \"petitioned_tag_siblings\" : 0,\n \"pending_tag_parents\" : 0,\n \"petitioned_tag_parents\" : 0\n },\n \"3902aabc3c4c89d1b821eaa9c011be3047424fd2f0c086346e84794e08e136b0\" : {\n \"pending_tag_mappings\" : 0,\n \"petitioned_tag_mappings\" : 0,\n \"pending_tag_siblings\" : 0,\n \"petitioned_tag_siblings\" : 0,\n \"pending_tag_parents\" : 0,\n \"petitioned_tag_parents\" : 0\n },\n \"e06e1ae35e692d9fe2b83cde1510a11ecf495f51910d580681cd60e6f21fde73\" : {\n \"pending_files\" : 2,\n \"petitioned_files\" : 0\n }\n }\n}\n
The keys are as in /get_services.
Each count here represents one 'row' of content, so for \"tag_mappings\" that is one (tag, file) pair and for \"tag_siblings\" one (tag, tag) pair. You always get everything, even if the counts are all 0.
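As a small illustration, this sketch fetches the counts and prints a one-line summary for each service that has anything waiting. The base URL and access-key header are assumptions; the response keys are as shown above, and you could match the service keys to names via the included services Object or /get_services.

```python
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

data = requests.get(f"{API}/manage_services/get_pending_counts", headers=HEADERS).json()

for service_key, counts in data["pending_counts"].items():
    total = sum(counts.values())
    if total > 0:
        print(f"{service_key}: {total} pending/petitioned rows")
```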
"},{"location":"developer_api.html#manage_services_commit_pending","title":"POST/manage_services/commit_pending
","text":"Start the job to upload a service's pending content.
Restricted access: YES. Start Upload permission needed.Required Headers: n/a
Arguments (in JSON):service_key
: (the service to commit)
{\n \"service_key\" : \"ae91919b0ea95c9e636f877f57a69728403b65098238c1a121e5ebf85df3b87e\"\n}\n
This starts the upload popup, just like if you click 'commit' in the menu. This upload could ultimately take one second or several minutes to finish, but the response will come back immediately.
If the job is already running, this will return 409. If it cannot start because of a difficult problem, like all repositories being paused or the service account object being unsynced or something, it gives 422; in this case, please direct the user to check their client manually, since there is probably an error popup on screen.
If tracking the upload job's progress is important, you could hit it again and see if it gives 409, or you could /get_pending_counts again--since the counts will update live as the upload happens--but note that the user may pend more just after the upload is complete, so do not wait forever for it to fall back down to 0.
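Putting the error codes and the polling advice together, a hedged sketch might look like this. The base URL and access-key header are assumptions; the 409/422 handling and the count polling follow the description above.

```python
import time
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption
SERVICE_KEY = "ae91919b0ea95c9e636f877f57a69728403b65098238c1a121e5ebf85df3b87e"

r = requests.post(f"{API}/manage_services/commit_pending",
                  json={"service_key": SERVICE_KEY}, headers=HEADERS)

if r.status_code == 409:
    print("an upload job is already running")
elif r.status_code == 422:
    print("could not start--ask the user to check the client for an error popup")
else:
    r.raise_for_status()
    # optionally watch the counts fall as the upload progresses; do not wait
    # forever for zero, since the user may pend new content in the meantime
    for _ in range(20):
        time.sleep(30)
        counts = requests.get(f"{API}/manage_services/get_pending_counts",
                              headers=HEADERS).json()["pending_counts"][SERVICE_KEY]
        print("rows remaining:", sum(counts.values()))
```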
"},{"location":"developer_api.html#manage_services_forget_pending","title":"POST/manage_services/forget_pending
","text":"Forget all pending content for a service.
Restricted access: YES. Start Upload permission needed.Required Headers: n/a
Arguments (in JSON):service_key
: (the service to forget for)
{\n \"service_key\" : \"ae91919b0ea95c9e636f877f57a69728403b65098238c1a121e5ebf85df3b87e\"\n}\n
This clears all pending content for a service, just like if you click 'forget' in the menu.
Response description: 200 and no content."},{"location":"developer_api.html#managing_cookies","title":"Managing Cookies","text":"This refers to the cookies held in the client's session manager, which you can review under network->data->manage session cookies. These are sent to every request on the respective domains.
"},{"location":"developer_api.html#manage_cookies_get_cookies","title":"GET/manage_cookies/get_cookies
","text":"Get the cookies for a particular domain.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
Example request: /manage_cookies/get_cookies?domain=gelbooru.com\n
Response: A JSON Object listing all the cookies for that domain in [ name, value, domain, path, expires ] format. Example response:
{\n \"cookies\" : [\n [\"__cfduid\", \"f1bef65041e54e93110a883360bc7e71\", \".gelbooru.com\", \"/\", 1596223327],\n [\"pass_hash\", \"0b0833b797f108e340b315bc5463c324\", \"gelbooru.com\", \"/\", 1585855361],\n [\"user_id\", \"123456\", \"gelbooru.com\", \"/\", 1585855361]\n ]\n}\n
"},{"location":"developer_api.html#manage_cookies_set_cookies","title":"POSTNote that these variables are all strings except 'expires', which is either an integer timestamp or _null_ for session cookies.\n\nThis request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.\n
/manage_cookies/set_cookies
","text":"Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers:Content-Type
: application/json
cookies
: (a list of cookie rows in the same format as the GET request above)
{\n \"cookies\" : [\n [\"PHPSESSID\", \"07669eb2a1a6e840e498bb6e0799f3fb\", \".somesite.com\", \"/\", 1627327719],\n [\"tag_filter\", \"1\", \".somesite.com\", \"/\", 1627327719]\n ]\n}\n
You can set 'value' to be null, which will clear any existing cookie with the corresponding name, domain, and path (acting essentially as a delete).
Expires can be null, but session cookies will time-out in hydrus after 60 minutes of non-use.
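Here is a small sketch of copying a login into hydrus, reusing the cookie rows from the example above; the old_cookie row is a hypothetical example of a delete. The base URL and access-key header are assumptions; the [ name, value, domain, path, expires ] row format and the null-value delete behaviour are as documented here.

```python
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

body = {
    "cookies": [
        # [ name, value, domain, path, expires ]
        ["PHPSESSID", "07669eb2a1a6e840e498bb6e0799f3fb", ".somesite.com", "/", 1627327719],
        ["tag_filter", "1", ".somesite.com", "/", 1627327719],
        # a null value deletes any existing cookie with this name, domain, and path
        ["old_cookie", None, ".somesite.com", "/", None],
    ]
}

r = requests.post(f"{API}/manage_cookies/set_cookies", json=body, headers=HEADERS)
r.raise_for_status()
```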
"},{"location":"developer_api.html#managing_http_headers","title":"Managing HTTP Headers","text":"This refers to the custom headers you can see under network->data->manage http headers.
"},{"location":"developer_api.html#manage_headers_get_headers","title":"GET/manage_headers/get_headers
","text":"Get the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
: optional, the domain to fetch headers for
Example request: /manage_headers/get_headers?domain=gelbooru.com\n
Example request (for global): /manage_headers/get_headers\n
Response: A JSON Object listing all the headers. Example response:
"},{"location":"developer_api.html#manage_headers_set_headers","title":"POST{\n \"network_context\" : {\n \"type\" : 2,\n \"data\" : \"gelbooru.com\"\n },\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\",\n \"approved\" : \"approved\",\n \"reason\" : \"Set by Client API\"\n },\n \"DNT\" : {\n \"value\" : \"1\",\n \"approved\" : \"approved\",\n \"reason\" : \"Set by Client API\"\n }\n }\n}\n
/manage_headers/set_headers
","text":"Manages the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: *Content-Type
: application/json Arguments (in JSON):domain
: (optional, the specific domain to set the header for)headers
: (a JSON Object that holds \"key\" objects)
Example request body: {\n \"domain\" : \"mysite.com\",\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n },\n \"DNT\" : {\n \"value\" : \"1\"\n },\n \"CoolStuffToken\" : {\n \"value\" : \"abcdef0123456789\",\n \"approved\" : \"pending\",\n \"reason\" : \"This unlocks the Sonic fanfiction!\"\n }\n }\n}\n
Example request body that deletes some existing headers: {\n \"domain\" : \"myothersite.com\",\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : null\n },\n \"Authorization\" : {\n \"value\" : null\n }\n }\n}\n
If you do not set a domain, or you set it to
null
, the 'context' will be the global context, which applies as a fallback to all jobs.Domain headers also apply to their subdomains--unless they are overwritten by specific subdomain entries.
Each
key
Object underheaders
has the same form as /manage_headers/get_headers.value
is obvious--it is the value of the header. If the pair doesn't exist yet, you need thevalue
, but if you just want to approve something, it is optional. Set it tonull
to delete an existing pair.You probably won't ever use
approved
orreason
, but they plug into the 'validation' system in the client. They are both optional. Approved can be any of[ approved, denied, pending ]
, and by default everything you add will beapproved
. If there is anythingpending
when a network job asks, the user will be presented with a yes/no popup presenting the reason for the header. If they click 'no', the header is set todenied
and the network job goes ahead without it. If you have a header that changes behaviour or unlocks special content, you might like to make it optional in this way.
If you need to reinstate it, the default global User-Agent is Mozilla/5.0 (compatible; Hydrus Client).
"},{"location":"developer_api.html#manage_headers_set_user_agent","title":"POST/manage_headers/set_user_agent
","text":"This is deprecated--move to /manage_headers/set_headers!
This sets the 'Global' User-Agent for the client, as typically editable under network->data->manage http headers, for instance if you want hydrus to appear as a specific browser associated with some cookies.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: *Content-Type
: application/json Arguments (in JSON):user-agent
: (a string)
{\n \"user-agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n}\n
Send an empty string to reset the client back to the default User-Agent, which should be Mozilla/5.0 (compatible; Hydrus Client).
"},{"location":"developer_api.html#managing_pages","title":"Managing Pages","text":"This refers to the pages of the main client UI.
"},{"location":"developer_api.html#manage_pages_get_pages","title":"GET/manage_pages/get_pages
","text":"Get the page structure of the current UI session.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments: n/a
Response: A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Page of pages beneath it will list their own sub-page lists. Example response{\n \"pages\" : {\n \"name\" : \"top pages notebook\",\n \"page_key\" : \"3b28d8a59ec61834325eb6275d9df012860a1ecfd9e1246423059bc47fb6d5bd\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"is_media_page\" : false,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"files\",\n \"page_key\" : \"d436ff5109215199913705eb9a7669d8a6b67c52e41c3b42904db083255ca84d\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"is_media_page\" : true,\n \"selected\" : false\n },\n {\n \"name\" : \"thread watcher\",\n \"page_key\" : \"40887fa327edca01e1d69b533dddba4681b2c43e0b4ebee0576177852e8c32e7\",\n \"page_state\" : 0,\n \"page_type\" : 9,\n \"is_media_page\" : true,\n \"selected\" : false\n },\n {\n \"name\" : \"pages\",\n \"page_key\" : \"2ee7fa4058e1e23f2bd9e915cdf9347ae90902a8622d6559ba019a83a785c4dc\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"is_media_page\" : false,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"urls\",\n \"page_key\" : \"9fe22cb760d9ee6de32575ed9f27b76b4c215179cf843d3f9044efeeca98411f\",\n \"page_state\" : 0,\n \"page_type\" : 7,\n \"is_media_page\" : true,\n \"selected\" : true\n },\n {\n \"name\" : \"files\",\n \"page_key\" : \"2977d57fc9c588be783727bcd54225d577b44e8aa2f91e365a3eb3c3f580dc4e\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"is_media_page\" : true,\n \"selected\" : false\n }\n ]\n }\n ]\n }\n}\n
name is the full text on the page tab.
page_key is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a session reload.
page_type is as follows:
- 1 - Gallery downloader
- 2 - Simple downloader
- 3 - Hard drive import
- 5 - Petitions (used by repository janitors)
- 6 - File search
- 7 - URL downloader
- 8 - Duplicates
- 9 - Thread watcher
- 10 - Page of pages
page_state is as follows:
- 0 - ready
- 1 - initialising
- 2 - searching/loading
- 3 - search cancelled
Most pages will be 0, normal/ready, at all times. Large pages will start in an 'initialising' state for a few seconds, which means their session-saved thumbnails aren't loaded yet. Search pages will enter 'searching' after a refresh or search change and will either return to 'ready' when the search is complete, or fall to 'search cancelled' if the search was interrupted (usually this means the user clicked the 'stop' button that appears after some time).
is_media_page is simply a shorthand for whether the page is a normal page that holds thumbnails or a 'page of pages'. Only media pages can have files (and accept /manage_pages/add_files commands).
selected means which page is currently in view. It will propagate down the page of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a media page.
The top page of pages will always be there, and always selected.
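To show how the nesting and the 'selected' flags work in practice, here is a sketch that walks the notebook tree to find the media page currently in view (which, as noted, may not exist if the selection ends on an empty page of pages). The base URL and access-key header are assumptions; the field names are taken from the example response above.

```python
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

def find_selected_media_page(page):
    """Follow 'selected' down the page-of-pages tree; return None if it ends on a notebook."""
    if page.get("is_media_page"):
        return page
    for sub_page in page.get("pages", []):
        if sub_page.get("selected"):
            return find_selected_media_page(sub_page)
    return None

top = requests.get(f"{API}/manage_pages/get_pages", headers=HEADERS).json()["pages"]
current = find_selected_media_page(top)
print(current["page_key"] if current else "selection ends on an empty page of pages")
```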
"},{"location":"developer_api.html#manage_pages_get_page_info","title":"GET/manage_pages/get_page_info
","text":"Get information about a specific page.
Under Construction
This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting), will come later.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments:page_key
: (hexadecimal page_key as stated in /manage_pages/get_pages)simple
: true or false (optional, defaulting to true)
Example request: /manage_pages/get_page_info?page_key=aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da&simple=true\n
Response: A JSON Object of the page's information. At present, this mostly means downloader information. Example response with simple = true:
{\n \"page_info\" : {\n \"name\" : \"threads\",\n \"page_key\" : \"aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da\",\n \"page_state\" : 0,\n \"page_type\" : 3,\n \"is_media_page\" : true,\n \"management\" : {\n \"multiple_watcher_import\" : {\n \"watcher_imports\" : [\n {\n \"url\" : \"https://someimageboard.net/m/123456\",\n \"watcher_key\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\",\n \"created\" : 1566164269,\n \"last_check_time\" : 1566164272,\n \"next_check_time\" : 1566174272,\n \"files_paused\" : false,\n \"checking_paused\" : false,\n \"checking_status\" : 0,\n \"subject\" : \"gundam pictures\",\n \"imports\" : {\n \"status\" : \"4 successful (2 already in db)\",\n \"simple_status\" : \"4\",\n \"total_processed\" : 4,\n \"total_to_process\" : 4\n },\n \"gallery_log\" : {\n \"status\" : \"1 successful\",\n \"simple_status\" : \"1\",\n \"total_processed\" : 1,\n \"total_to_process\" : 1\n }\n },\n {\n \"url\" : \"https://someimageboard.net/a/1234\",\n \"watcher_key\" : \"6bc17555b76da5bde2dcceedc382cf7d23281aee6477c41b643cd144ec168510\",\n \"created\" : 1566063125,\n \"last_check_time\" : 1566063133,\n \"next_check_time\" : 1566104272,\n \"files_paused\" : false,\n \"checking_paused\" : true,\n \"checking_status\" : 1,\n \"subject\" : \"anime pictures\",\n \"imports\" : {\n \"status\" : \"124 successful (22 already in db), 2 previously deleted\",\n \"simple_status\" : \"124\",\n \"total_processed\" : 124,\n \"total_to_process\" : 124\n },\n \"gallery_log\" : {\n \"status\" : \"3 successful\",\n \"simple_status\" : \"3\",\n \"total_processed\" : 3,\n \"total_to_process\" : 3\n }\n }\n ]\n },\n \"highlight\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\"\n }\n },\n \"media\" : {\n \"num_files\" : 4\n }\n}\n
name, page_key, page_state, and page_type are as in /manage_pages/get_pages.
As you can see, even the 'simple' mode can get very large. Imagine that response for a page watching 100 threads! Turning simple mode off will display every import item, gallery log entry, and all hashes in the media (thumbnail) panel.
For this first version, the five importer pages--hdd import, simple downloader, url downloader, gallery page, and watcher page--all give rich info based on their specific variables. The first three only have one importer/gallery log combo, but the latter two of course can have multiple. The \"imports\" and \"gallery_log\" entries are all in the same data format.
"},{"location":"developer_api.html#manage_pages_add_files","title":"POST/manage_pages/add_files
","text":"Add files to a page.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
page_key
: (the page key for the page you wish to add files to)- files
The files you set will be appended to the given page, just like a thumbnail drag and drop operation. The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body: {\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\",\n \"file_ids\" : [123, 124, 125]\n}\n
Response: 200 with no content. If the page key is not found, it will 404. If you try to add files to a 'page of pages' (i.e. is_media_page=false in the /manage_pages/get_pages call), you'll get 400.
"},{"location":"developer_api.html#manage_pages_focus_page","title":"POST/manage_pages/focus_page
","text":"'Show' a page in the main GUI, making it the current page in view. If it is already the current page, no change is made.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
page_key
: (the page key for the page you wish to show)
The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body: {\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
Response: 200 with no content. If the page key is not found, this will 404.
"},{"location":"developer_api.html#manage_pages_refresh_page","title":"POST
/manage_pages/refresh_page
","text":"Refresh a page in the main GUI. Like hitting F5 in the client, this obviously makes file search pages perform their search again, but for other page types it will force the currently in-view files to be re-sorted.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
page_key
: (the page key for the page you wish to refresh)
The page key is the same as fetched in the /manage_pages/get_pages call. If a file search page is not set to 'searching immediately', a 'refresh' command does nothing.
Example request body: {\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
Response: 200 with no content. If the page key is not found, this will 404.
Poll the page_state in /manage_pages/get_pages or /manage_pages/get_page_info to see when the search is complete.
"},{"location":"developer_api.html#managing_popups","title":"Managing Popups","text":"Under Construction
This is under construction. The popup management APIs and data structures may change in future versions.
"},{"location":"developer_api.html#job_status_objects","title":"Job Status Objects","text":"Job statuses represent shared information about a job in hydrus. In the API they are currently only used for popups.
Job statuses have these fields:
key
: the generated hex key identifying the job statuscreation_time
: the UNIX timestamp when the job status was created, as a floating point number in seconds.status_title
: the title for the job statusstatus_text_1
andstatus_text_2
: Two fields for body texthad_error
: a boolean indicating if the job status has an error.traceback
: if the job status has an error this will contain the traceback text.is_cancellable
: a boolean indicating the job can be canceled.is_cancelled
: a boolean indicating the job has been cancelled.is_deleted
: a boolean indicating the job status has been dismissed but not removed yet.is_pausable
: a boolean indicating the job can be pausedis_paused
: a boolean indicating the job is paused.is_working
: a boolean indicating whether the job is currently working.nice_string
: a string representing the job status. This is generated using thestatus_title
,status_text_1
,status_text_2
, andtraceback
if present.attached_files_mergable
: a boolean indicating whether the files in the job status can be merged with the files of another submitted job status with the same label.popup_gauge_1
andpopup_gauge_2
: each of these is a 2 item array of numbers representing a progress bar shown in the client. The first number is the current value and the second is the maximum of the range. The minimum is always 0. When using these in combination with thestatus_text
fields they are shown in this order:status_text_1
,popup_gauge_1
,status_text_2
,popup_gauge_2
.api_data
: an arbitrary object for use by API clients.files
: an object representing the files attached to this job status, shown as a button in the client that opens a search page for the given hashes. It has these fields:hashes
: an array of sha256 hashes.label
: the label for the show files button.
user_callable_label
: if the job status has a user callable function this will be the label for the button that triggers it.network_job
: An object representing the current network job. It has these fields:url
: the url being downloaded.waiting_on_connection_error
: booleandomain_ok
: booleanwaiting_on_serverside_bandwidth
: booleanno_engine_yet
: booleanhas_error
: booleantotal_data_used
: integer number of bytesis_done
: booleanstatus_text
: stringcurrent_speed
: integer number of bytes per secondbytes_read
: integer number of bytesbytes_to_read
: integer number of bytes
All fields other than key and creation_time are optional and will only be returned if they're set.
"},{"location":"developer_api.html#manage_popups_get_popups","title":"GET/manage_popups/get_popups
","text":"Get a list of popups from the client.
Restricted access: YES. Manage Popups permission needed.Required Headers: n/a
Arguments:only_in_view
: whether to show only the popups currently in view in the client, true or false (optional, defaulting to false)
Response: A JSON Object containing job_statuses, which is a list of job status objects. Example response:
"},{"location":"developer_api.html#manage_popups_add_popuip","title":"POST{\n \"job_statuses\": [\n {\n \"key\": \"e57d42d53f957559ecaae3054417d28bfef3cd84bbced352be75dedbefb9a40e\",\n \"creation_time\": 1700348905.7647762,\n \"status_text_1\": \"This is a test popup message\",\n \"had_error\": false,\n \"is_cancellable\": false,\n \"is_cancelled\": false,\n \"is_done\": true,\n \"is_pausable\": false,\n \"is_paused\": false,\n \"is_working\": true,\n \"nice_string\": \"This is a test popup message\"\n },\n {\n \"key\": \"0d9e134fe0b30b05f39062b48bd60c35cb3bf3459c967d4cf95dde4d01bbc801\",\n \"creation_time\": 1700348905.7667763,\n \"status_title\": \"sub gap downloader test\",\n \"had_error\": false,\n \"is_cancellable\": false,\n \"is_cancelled\": false,\n \"is_done\": true,\n \"is_pausable\": false,\n \"is_paused\": false,\n \"is_working\": true,\n \"nice_string\": \"sub gap downloader test\",\n \"user_callable_label\": \"start a new downloader for this to fill in the gap!\"\n },\n {\n \"key\": \"d59173b59c96b841ab82a08a05556f04323f8446abbc294d5a35851fa01035e6\",\n \"creation_time\": 1700689162.6635988,\n \"status_text_1\": \"downloading files for \\\"elf\\\" (1/1)\",\n \"status_text_2\": \"file 4/27: downloading file\",\n \"status_title\": \"subscriptions - safebooru\",\n \"had_error\": false,\n \"is_cancellable\": true,\n \"is_cancelled\": false,\n \"is_done\": false,\n \"is_pausable\": false,\n \"is_paused\": false,\n \"is_working\": true,\n \"nice_string\": \"subscriptions - safebooru\\r\\ndownloading files for \\\"elf\\\" (1/1)\\r\\nfile 4/27: downloading file\",\n \"popup_gauge_2\": [\n 3,\n 27\n ],\n \"files\": {\n \"hashes\": [\n \"9b5485f83948bf369892dc1234c0a6eef31a6293df3566f3ee6034f2289fe984\",\n \"cd6ebafb8b39b3455fe382cba0daeefea87848950a6af7b3f000b05b43f2d4f2\",\n \"422cebabc95fabcc6d9a9488060ea88fd2f454e6eb799de8cafa9acd83595d0d\"\n ],\n \"label\": \"safebooru: elf\"\n },\n \"network_job\": {\n \"url\": \"https://safebooru.org//images/4425/17492ccf2fe97591e14531d4b070e922c70384c9.jpg\",\n \"waiting_on_connection_error\": false,\n \"domain_ok\": true,\n \"waiting_on_serverside_bandwidth\": false,\n \"no_engine_yet\": false,\n \"has_error\": false,\n \"total_data_used\": 2031616,\n \"is_done\": false,\n \"status_text\": \"downloading\u2026\",\n \"current_speed\": 2031616,\n \"bytes_read\": 2031616,\n \"bytes_to_read\": 3807369\n }\n }\n ]\n}\n
/manage_popups/add_popup
","text":"Add a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
- it accepts these fields of a job status object:
is_cancellable
is_pausable
attached_files_mergable
status_title
status_text_1
andstatus_text_2
popup_gauge_1
andpopup_gauge_2
api_data
files_label
: the label for the files attached to the job status. It will be returned aslabel
in thefiles
object in the job status object.- files that will be added to the job status. They will be returned as
hashes
in thefiles
object in the job status object.files_label
is required to add files.
A new job status will be created and submitted as a popup. Set a status_title on bigger ongoing jobs that will take a while and receive many updates--and leave it alone, even when the job is done. For simple notes, just set status_text_1.
Finishing Jobs
The pausable, cancellable, and files-mergable status of a job is only settable at creation. A pausable or cancellable popup represents an ongoing and unfinished job. The popup will exist indefinitely and will not be user-dismissable unless the user can first cancel it.
You, as the creator, must plan to call Finish once your work is done. Yes, even if there is an error!
Pausing and Cancelling
If the user pauses a job, you should recognise that and pause your work. Resume when they do.
If the user cancels a job, you should recognise that and stop work. Either call finish with an appropriate status update, or finish_and_dismiss if you have nothing more to say.
If your long-term job has a main loop, place this at the top of the loop, along with your status update calls.
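As a sketch of the whole lifecycle under the rules above--create a cancellable popup, update it from your main loop, respect pause/cancel, and always finish--something like the following could work. The base URL and access-key header are assumptions; the endpoints and job status fields are the ones documented in this section.

```python
import time
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

def my_status(key):
    # look our job status up again to see if the user paused or cancelled it
    statuses = requests.get(f"{API}/manage_popups/get_popups", headers=HEADERS).json()["job_statuses"]
    return next((js for js in statuses if js["key"] == key), {})

job = requests.post(f"{API}/manage_popups/add_popup", headers=HEADERS, json={
    "status_title": "my example job",
    "is_cancellable": True,
    "is_pausable": True,
}).json()["job_status"]
key = job["key"]

try:
    done, total = 0, 100
    while done < total:
        status = my_status(key)
        if status.get("is_cancelled"):
            break
        if status.get("is_paused"):
            time.sleep(5)
            continue
        # ... do one slice of real work here ...
        done += 1
        requests.post(f"{API}/manage_popups/update_popup", headers=HEADERS, json={
            "job_status_key": key,
            "status_text_1": f"working: {done}/{total}",
            "popup_gauge_1": [done, total],
        })
finally:
    # you must finish the job yourself, even if something errors out
    requests.post(f"{API}/manage_popups/update_popup", headers=HEADERS, json={
        "job_status_key": key,
        "status_text_1": "Done!",
        "popup_gauge_1": None,  # null removes the gauge
    })
    requests.post(f"{API}/manage_popups/finish_popup", headers=HEADERS,
                  json={"job_status_key": key})
```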
Example request body: {\n \"status_text_1\": \"Note to user\"\n}\n
Example request body: {\n \"status_title\": \"Example Popup\",\n \"popup_gauge_1\": [35, 120],\n \"popup_gauge_2\": [9, 10],\n \"status_text_1\": \"Doing things\",\n \"status_text_2\": \"Doing other things\",\n \"is_cancellable\": true,\n \"api_data\": {\n \"whatever\": \"stuff\"\n },\n \"files_label\": \"test files\",\n \"hashes\": [\n \"ad6d3599a6c489a575eb19c026face97a9cd6579e74728b0ce94a601d232f3c3\",\n \"4b15a4a10ac1d6f3d143ba5a87f7353b90bb5567d65065a8ea5b211c217f77c6\"\n ]\n}\n
Response: A JSON Object containing job_status, the job status object that was added.
"},{"location":"developer_api.html#manage_popups_call_user_callable","title":"POST/manage_popups/call_user_callable
","text":"Call the user callable function of a popup.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to call the user callable of
The job status must have a user callable (the user_callable_label in the job status object indicates this) to call it.
Example request body: {\n \"job_status_key\" : \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
Response: 200 with no content.
"},{"location":"developer_api.html#manage_popups_cancel_popup","title":"POST
/manage_popups/cancel_popup
","text":"Try to cancel a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to cancel
The job status must be cancellable to be cancelled. If it isn't, this is nullipotent.
Example request body: {\n \"job_status_key\" : \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
Response: 200 with no content.
"},{"location":"developer_api.html#manage_popups_dismiss_popup","title":"POST
/manage_popups/dismiss_popup
","text":"Try to dismiss a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to dismiss
This is a call an 'observer' (i.e. not the job creator) makes. In the client UI, it would be a user right-clicking a popup to dismiss it. If the job is dismissable (i.e. it
is_done
), the popup disappears, but if it is pausable/cancellable--an ongoing job--then this action is nullipotent.You should call this on jobs you did not create yourself.
Example request body: {\n \"job_status_key\": \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
Response: 200 with no content.
"},{"location":"developer_api.html#manage_popups_finish_popup","title":"POST
/manage_popups/finish_popup
","text":"Mark a popup as done.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to finish
Important
You may only call this on jobs you created yourself.
You only need to call it on jobs that you created pausable or cancellable. It clears those statuses, sets is_done, and allows the user to dismiss the job with a right-click.
Once called, the popup will remain indefinitely. You should marry this call with an update that clears the texts and gauges you were using and leaves a "Done, processed x files with y errors!" or similar statement to let the user know how the job went.
Example request body: {\n \"job_status_key\" : \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
Response: 200 with no content.
"},{"location":"developer_api.html#manage_popups_finish_and_dismiss_popup","title":"POST
/manage_popups/finish_and_dismiss_popup
","text":"Finish and dismiss a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to dismissseconds
: (optional) an integer number of seconds to wait before dismissing the job status, defaults to happening immediately
Important
You may only call this on jobs you created yourself.
This will call finish immediately and flag the message for auto-dismissal (i.e. removing it from the popup toaster) either immediately or after the given number of seconds.
You would want this instead of just finish for when you either do not need to leave a 'Done!' summary, or if the summary is not so important, and is only needed if the user happens to glance that way. If you did boring work for ten minutes, you might like to set a simple 'Done!' and auto-dismiss after thirty seconds or so.
Example request body: {\n \"job_status_key\": \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\",\n \"seconds\": 5\n}\n
Response: 200 with no content.
"},{"location":"developer_api.html#manage_popups_update_popuip","title":"POST
/manage_popups/update_popup
","text":"Update a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The hex key of the job status to update.- It accepts these fields of a job status object:
status_title
status_text_1
andstatus_text_2
popup_gauge_1
andpopup_gauge_2
api_data
files_label
: the label for the files attached to the job status. It will be returned aslabel
in thefiles
object in the job status object.- files that will be added to the job status. They will be returned as
hashes
in thefiles
object in the job status object.files_label
is required to add files.
The specified job status will be updated with the new values submitted. Any field without a value will be left alone and any field set to null will be removed from the job status.
Example request body: {\n \"job_status_key\": \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\",\n \"status_title\": \"Example Popup\",\n \"status_text_1\": null,\n \"popup_gauge_1\": [12, 120],\n \"api_data\": {\n \"whatever\": \"other stuff\"\n }\n}\n
Response: A JSON Object containing job_status, the job status object that was updated.
"},{"location":"developer_api.html#managing_the_database","title":"Managing the Database","text":""},{"location":"developer_api.html#manage_database_lock_on","title":"POST/manage_database/lock_on
","text":"Pause the client's database activity and disconnect the current connection.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is a hacky prototype. It commands the client database to pause its job queue and release its connection (and related file locks and journal files). This puts the client in a similar position as a long VACUUM command--it'll hang in there, but not much will work, and since the UI async code isn't great yet, the UI may lock up after a minute or two. If you would like to automate database backup without shutting the client down, this is the thing to play with.
This should return pretty quickly, but it will wait up to five seconds for the database to actually disconnect. If there is a big job (like a VACUUM) currently going on, it may take substantially longer to finish that up and process this STOP command. You might like to check for the existence of a journal file in the db dir just to be safe.
As long as this lock is on, all Client API calls except the unlock command will return 503. (This is a decent way to test the current lock status, too)
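For the backup use case, a hedged sketch might look like the following: lock, copy the db files, and unlock in a finally block so a failed copy cannot leave the client locked. The base URL and access-key header are assumptions, and both filesystem paths are hypothetical placeholders for your own setup.

```python
import shutil
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

requests.post(f"{API}/manage_database/lock_on", headers=HEADERS).raise_for_status()
try:
    # with the database disconnected, copy the db directory somewhere safe;
    # these paths are hypothetical placeholders
    shutil.copytree("/path/to/hydrus/db", "/path/to/backup/db", dirs_exist_ok=True)
finally:
    # while the lock is on, other API calls return 503, but lock_off still works
    requests.post(f"{API}/manage_database/lock_off", headers=HEADERS).raise_for_status()
```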
"},{"location":"developer_api.html#manage_database_lock_off","title":"POST/manage_database/lock_off
","text":"Reconnect the client's database and resume activity.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is the obvious complement to the lock. The client will resume processing its job queue and will catch up. If the UI was frozen, it should free up in a few seconds, just like after a big VACUUM.
"},{"location":"developer_api.html#manage_database_mr_bones","title":"GET/manage_database/mr_bones
","text":"Get the data from help->how boned am I?. This is a simple Object of numbers just for hacky advanced purposes if you want to build up some stats in the background. The numbers are the same as the dialog shows, so double check that to confirm what means what.
Restricted access: YES. Manage Database permission needed. Arguments (in percent-encoded JSON):tags
: (optional, a list of tags you wish to search for)- file domain (optional, defaults to all my files)
tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to all my files)
Example requests: /manage_database/mr_bones\n/manage_database/mr_bones?tags=%5B%22blonde_hair%22%2C%20%22blue_eyes%22%5D\n
Example response:
{\n \"boned_stats\" : {\n \"num_inbox\" : 8356,\n \"num_archive\" : 229,\n \"num_deleted\" : 7010,\n \"size_inbox\" : 7052596762,\n \"size_archive\" : 262911007,\n \"size_deleted\" : 13742290193,\n \"earliest_import_time\" : 1451408539,\n \"total_viewtime\" : [3280, 41621, 2932, 83021],\n \"total_alternate_files\" : 265,\n \"total_duplicate_files\" : 125,\n \"total_potential_pairs\" : 3252\n }\n}\n
The arguments here are the same as for GET /get_files/search_files. You can set any or none of them to set a search domain like in the dialog.
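A tiny sketch of pulling the numbers for a particular search; the tags go up as a percent-encoded JSON list, the same as /get_files/search_files. The base URL and access-key header are assumptions; the boned_stats keys are from the example response above.

```python
import json
import requests

API = "http://127.0.0.1:45869"  # assumption: default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "YOUR_ACCESS_KEY"}  # assumption

params = {"tags": json.dumps(["blonde_hair", "blue_eyes"])}
stats = requests.get(f"{API}/manage_database/mr_bones",
                     params=params, headers=HEADERS).json()["boned_stats"]

print(f"{stats['num_inbox']} in the inbox, {stats['num_archive']} archived")
```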
"},{"location":"developer_api.html#manage_database_get_client_options","title":"GET/manage_database/get_client_options
","text":"Unstable Response
The response for this path is unstable and subject to change without warning. No examples are given.
Gets the current options from the client.
Restricted access: YES. Manage Database permission needed.Required Headers: n/a
Arguments: n/a
Response: A JSON dump of nearly all options set in the client. The format of this is based on internal hydrus structures and is subject to change without warning with new hydrus versions. Do not rely on anything you find here to continue to exist and don't rely on the structure to be the same."},{"location":"docker.html","title":"Hydrus in a container(HiC)","text":"Latest hydrus client that runs in docker 24/7. Employs xvfb and vnc. Runs on alpine.
TL;DR:
docker run --name hydrusclient -d -p 5800:5800 -p 5900:5900 ghcr.io/hydrusnetwork/hydrus:latest
. Connect to noVNC viahttp://yourdockerhost:5800/vnc.html
or use Tiger VNC Viewer or any other VNC client and connect on port 5900.For persistent storage you can either create a named volume or mount a new/existing db path
"},{"location":"docker.html#the_container_will_not_fix_the_permissions_inside_the_db_folder_chown_your_db_folder_content_on_your_own","title":"The container will NOT fix the permissions inside the db folder. CHOWN YOUR DB FOLDER CONTENT ON YOUR OWN","text":"-v /hydrus/client/db:/opt/hydrus/db
. The client runs with default permissions of1000:1000
, this can be changed by the ENVUID
andGID
(not working atm, fixed to 1000) will be fixed someday\u2122.If you have enough RAM, mount
/tmp
as tmpfs. If not, download more RAM.As of
v359
hydrus understands IPFSnocopy
. And can be easily run with go-ipfs container. Read Hydrus IPFS help. MountHOST_PATH_DB/client_files
to/data/client_files
in ipfs. Go manage the ipfs service and set the path to/data/client_files
, you'll know where to put it in.Example compose file:
Further containerized application of interest:version: '3.8'\nvolumes:\n tor-config:\n driver: local\n hybooru-pg-data:\n driver: local\n hydrus-server:\n driver: local\n hydrus-client:\n driver: local\n ipfs-data:\n driver: local\n hydownloader-data:\n driver: local\nservices:\n hydrusclient:\n image: ghcr.io/hydrusnetwork/hydrus:latest\n container_name: hydrusclient\n restart: unless-stopped\n environment:\n - UID=1000\n - GID=1000\n volumes:\n - hydrus-client:/opt/hydrus/db\n tmpfs:\n - /tmp #optional for SPEEEEEEEEEEEEEEEEEEEEEEEEED and less disk access\n ports:\n - 5800:5800 #noVNC\n - 5900:5900 #VNC\n - 45868:45868 #Booru\n - 45869:45869 #API\n\n hydrusserver:\n image: ghcr.io/hydrusnetwork/hydrus:server\n container_name: hydrusserver\n restart: unless-stopped\n volumes:\n - hydrus-server:/opt/hydrus/db\n\n hydrusclient-ipfs:\n image: ipfs/go-ipfs\n container_name: hydrusclient-ipfs\n restart: unless-stopped\n volumes:\n - ipfs-data:/data/ipfs\n - hydrus-clients:/data/db:ro\n ports:\n - 4001:4001 # READ\n - 5001:5001 # THE\n - 8080:8080 # IPFS\n - 8081:8081 # DOCS\n\n hydrus-web:\n image: floogulinc/hydrus-web\n container_name: hydrus-web\n restart: always\n ports:\n - 8080:80 # READ\n\n hybooru-pg:\n image: healthcheck/postgres\n container_name: hybooru-pg\n environment:\n - POSTGRES_USER=hybooru\n - POSTGRES_PASSWORD=hybooru\n - POSTGRES_DB=hybooru\n volumes:\n - hybooru-pg-data:/var/lib/postgresql/data\n restart: unless-stopped\n\n hybooru:\n image: suika/hybooru:latest # https://github.com/funmaker/hybooru build it yourself\n container_name: hybooru\n restart: unless-stopped\n depends_on:\n hybooru-pg:\n condition: service_started\n ports:\n - 8081:80 # READ\n volumes:\n - hydrus-client:/opt/hydrus/db\n\n hydownloader:\n image: ghcr.io/thatfuckingbird/hydownloader:edge\n container_name: hydownloader\n restart: unless-stopped\n ports:\n - 53211:53211\n volumes:\n - hydownloader-data:/db\n - hydrus-client:/hydb\n\n tor-socks-proxy:\n #network_mode: \"container:myvpn_container\" # in case you have a vpn container\n container_name: tor-socks-proxy\n image: peterdavehello/tor-socks-proxy:latest\n restart: unless-stopped\n\n tor-hydrus:\n image: goldy/tor-hidden-service\n container_name: tor-hydrus\n depends_on:\n hydrusclient:\n condition: service_healthy\n hydrusserver:\n condition: service_healthy\n hybooru:\n condition: service_started\n environment:\n HYBOORU_TOR_SERVICE_HOSTS: '80:hybooru:80'\n HYBOORU_TOR_SERVICE_VERSION: '3'\n HYSERV_TOR_SERVICE_HOSTS: 45870:hydrusserver:45870,45871:hydrusserver:45871\n HYSERV_TOR_SERVICE_VERSION: '3'\n HYCLNT_TOR_SERVICE_HOSTS: 45868:hydrusclient:45868,45869:hydrusclient:45869\n HYCLNT_TOR_SERVICE_VERSION: '3'\n volumes:\n - tor-config:/var/lib/tor/hidden_service \n
- Hybooru: Hydrus-based booru-styled imageboard in React, inspired by hyve.
- hydownloader: Alternative way of downloading and importing files. Decoupled from hydrus logic and limitations.
"},{"location":"downloader_completion.html","title":"Putting it all together","text":"# Alpine (client)\ncd hydrus/\ndocker build -t ghcr.io/hydrusnetwork/hydrus:latest -f static/build_files/docker/client/Dockerfile .\n
Now that you know what GUGs, URL Classes, and Parsers are, you should have some idea of how URL Classes can steer what happens when the downloader is faced with a URL to process. Should a URL be imported as a media file, or should it be parsed? If so, how?
You may have noticed in the Edit GUG ui that it lists if a current URL Class matches the example URL output. If the GUG has no matching URL Class, it won't be listed in the main 'gallery selector' button's list--it'll be relegated to the 'non-functioning' page. Without a URL Class, the client doesn't know what to do with the output of that GUG. But if a URL Class does match, we can then hand the result over to a parser set at network->downloader components->manage url class links:
Here you simply set which parsers go with which URL Classes. If you have URL Classes that do not have a parser linked (which is the default for new URL Classes), you can use the 'try to fill in gaps...' button to automatically fill the gaps based on guesses using the parsers' example URLs. This is usually the best way to line things up unless you have multiple potential parsers for that URL Class, in which case it'll usually go by the parser name earliest in the alphabet.
If the URL Class has no parser set or the parser is broken or otherwise invalid, the respective URL's file import object in the downloader or subscription is going to throw some kind of error when it runs. If you make and share some parsers, the first indication that something is wrong is going to be several users saying 'I got this error: (copy notes from file import status window)'. You can then load the parser back up in manage parsers and try to figure out what changed and roll out an update.
manage url class links also shows 'api/redirect link review', which summarises which URL Classes redirect to others. In these cases, only the redirected-to URL gets a parser entry in the first 'parser links' window, since the first will never be fetched for parsing (in the downloader, it will always be converted to the Redirected URL, and that is fetched and parsed).
Once your GUG has a URL Class and your URL Classes have parsers linked, test your downloader! Note that Hydrus's URL drag-and-drop import uses URL Classes, so if you don't have the GUG and gallery stuff done but you have a Post URL set up, you can test that just by dragging a Post URL from your browser to the client, and it should be added to a new URL Downloader and just work. It feels pretty good once it does!
"},{"location":"downloader_gugs.html","title":"Gallery URL Generators","text":"Gallery URL Generators, or GUGs are simple objects that take a simple string from the user, like:
- blue_eyes
- blue_eyes blonde_hair
- InCase
- elsa dandon_fuga
- wlop
- goth* order:id_asc
And convert them into an initialising Gallery URL, such as:
- http://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0
- https://konachan.com/post?page=1&tags=blonde_hair+blue_eyes
- https://www.hentai-foundry.com/pictures/user/InCase/page/1
- http://rule34.paheal.net/post/list/elsa dandon_fuga/1
- https://www.deviantart.com/wlop/favourites/?offset=0
- https://danbooru.donmai.us/posts?page=1&tags=goth*+order:id_asc
These are all the 'first page' of the results if you type or click-through to the same location on those sites. We are essentially emulating their own simple search-url generation inside the hydrus client.
"},{"location":"downloader_gugs.html#doing_it","title":"actually doing it","text":"Although it is usually a fairly simple process of just substituting the inputted tags into a string template, there are a couple of extra things to think about. Let's look at the ui under network->downloader components->manage gugs:
The client will split whatever the user enters by whitespace, so `blue_eyes blonde_hair` becomes two search terms, `[ 'blue_eyes', 'blonde_hair' ]`, which are then joined back together with the given 'search terms separator', to make `blue_eyes+blonde_hair`. Different sites use different separators, although ' ', '+', and ',' are most common. The new string is substituted into the `%tags%` in the template phrase, and the URL is made.
Note that you will not have to make %20 or %3A percent-encodings for reserved characters here--the network engine handles all that before the request is sent. For the most part, if you need to include--or a user puts in--':' or ' ' or 'おっぱい', you can just pass it along straight into the final URL without worrying.
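Conceptually, the whole GUG step is just split, join, and substitute. Here is a minimal Python sketch of that idea (the function name and template are made up for illustration; this is not hydrus's own code):

```python
def make_gallery_url(user_input, separator, url_template):
    # split the user's query on whitespace, join with the site's separator,
    # and drop the result into the %tags% slot of the template
    search_terms = user_input.split()                 # ['blue_eyes', 'blonde_hair']
    tags_string = separator.join(search_terms)        # 'blue_eyes+blonde_hair'
    return url_template.replace('%tags%', tags_string)

print(make_gallery_url(
    'blue_eyes blonde_hair',
    '+',
    'https://safebooru.org/index.php?page=post&s=list&tags=%tags%&pid=0',
))
# -> https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes+blonde_hair&pid=0
```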
This ui should update as you change it, so have a play and look at how the output example url changes to get a feel for things. Look at the other defaults to see different examples. Even if you break something, you can just cancel out.
The name of the GUG is important, as this is what will be listed when the user chooses what 'downloader' they want to use. Make sure it has a clear unambiguous name.
The initial search text is also important. Most downloaders just take some text tags, but if your GUG expects a numerical artist id (like pixiv artist search does), you should specify that explicitly to the user. You can even put in a brief '(two tag maximum)' type of instruction if you like.
Notice that the Deviant Art example above is actually the stream of wlop's favourites, not his works, and without an explicit notice of that, a user could easily mistake what they have selected. 'gelbooru' or 'newgrounds' are bad names, and 'type here' is bad initialising text.
"},{"location":"downloader_gugs.html#nested_gugs","title":"Nested GUGs","text":"Nested Gallery URL Generators are GUGs that hold other GUGs. Some searches actually use more than one stream (such as a Hentai Foundry artist lookup, where you might want to get both their regular works and their scraps, which are two separate galleries under the site), so NGUGs allow you to generate multiple initialising URLs per input. You can experiment with this ui if you like--it isn't too complicated--but you might want to hold off doing anything for real until you are comfortable with everything and know how producing multiple initialising URLs is going to work in the actual downloader.
"},{"location":"downloader_intro.html","title":"Making a Downloader","text":"Caution
Creating custom downloaders is only for advanced users who understand HTML or JSON. Beware! If you are simply looking for how to add new downloaders, please head over here.
"},{"location":"downloader_intro.html#intro","title":"this system","text":"The first versions of hydrus's downloaders were all hardcoded and static--I wrote everything into the program itself and nothing was user-creatable or -fixable. After the maintenance burden of the entire messy system proved too large for me to keep up with and a semi-editable booru system proved successful, I decided to overhaul the entire thing to allow user creation and sharing of every component. It is designed to be very simple to the front-end user--they will typically handle a couple of png files and then select a new downloader from a list--but very flexible (and hence potentially complicated) on the back-end. These help pages describe the different compontents with the intention of making an HTML- or JSON- fluent user able to create and share a full new downloader on their own.
As always, this is all under active development. Your feedback on the system would be appreciated, and if something is confusing or you discover something in here that is out of date, please let me know.
"},{"location":"downloader_intro.html#downloader","title":"what is a downloader?","text":"In hydrus, a downloader is one of:
- Gallery Downloader: This takes a string like 'blue_eyes' to produce a series of thumbnail gallery page URLs that can be parsed for image page URLs, which can ultimately be parsed for file URLs and metadata like tags. Boorus fall into this category.
- URL Downloader: This does just the Gallery Downloader's back-end--instead of taking a string query, it takes the gallery or post URLs directly from the user, whether that is one from a drag-and-drop event or hundreds pasted from clipboard. For our purposes here, the URL Downloader is a subset of the Gallery Downloader.
- Watcher: This takes a URL that it will check in timed intervals, parsing it for new URLs that it then queues up to be downloaded. It typically stops checking after the 'file velocity' (such as '1 new file per day') drops below a certain level. It is mostly for watching imageboard threads.
- Simple Downloader: This takes a URL one time and parses it for direct file URLs. This is a miscellaneous system for certain simple gallery types and some testing/'I just need the third tag's src on this one page' jobs.
The system currently supports HTML and JSON parsing. XML should be fine under the HTML parser--it isn't strict about checking types and all that.
"},{"location":"downloader_intro.html#pipeline","title":"what does a downloader do?","text":"The Gallery Downloader is the most complicated downloader and uses all the possible components. In order for hydrus to convert our example 'blue_eyes' query into a bunch of files with tags, it needs to:
- Present some user interface named 'safebooru tag search' to the user that will convert their input of 'blue_eyes' into https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0.
- Recognise https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0 as a Safebooru Gallery URL.
- Convert the HTML of a Safebooru Gallery URL into a list URLs like https://safebooru.org/index.php?page=post&s=view&id=2437965 and possibly a 'next page' URL (e.g. https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=40) that points to the next page of thumbnails.
- Recognise the https://safebooru.org/index.php?page=post&s=view&id=2437965 URLs as Safebooru Post URLs.
- Convert the HTML of a Safebooru Post URL into a file URL like https://safebooru.org//images/2329/b6e8c263d691d1c39a2eeba5e00709849d8f864d.jpg and some tags like: 1girl, bangs, black gloves, blonde hair, blue eyes, braid, closed mouth, day, fingerless gloves, fingernails, gloves, grass, hair ornament, hairclip, hands clasped, creator:hankuri, interlocked fingers, long hair, long sleeves, outdoors, own hands together, parted bangs, pointy ears, character:princess zelda, smile, solo, series:the legend of zelda, underbust.
So we have three components:
- Gallery URL Generator (GUG): faces the user and converts text input into initialising Gallery URLs.
- URL Class: identifies URLs and informs the client how to deal with them.
- Parser: converts data from URLs into hydrus-understandable metadata.
URL downloaders and watchers do not need the Gallery URL Generator, as their input is an URL. And simple downloaders also have an explicit 'just download it and parse it with this simple rule' action, so they do not use URL Classes (or even full-fledged Page Parsers) either.
"},{"location":"downloader_login.html","title":"Login Manager","text":"The system works, but this help was never done! Check the defaults for examples of how it works, sorry!
"},{"location":"downloader_parsers.html","title":"Parsers","text":"In hydrus, a parser is an object that takes a single block of HTML or JSON data and returns many kinds of hydrus-level metadata.
Parsers are flexible and potentially quite complicated. You might like to open network->downloader components->manage parsers and explore the UI as you read these pages. Check out how the default parsers already in the client work, and if you want to write a new one, see if there is something already in there that is similar--it is usually easier to duplicate an existing parser and then alter it than to create a new one from scratch every time.
There are three main components in the parsing system (click to open each component's help page):
- Formulae: Take parsable data, search it in some manner, and return 0 to n strings.
- Content Parsers: Take parsable data, apply a formula to it to get some strings, and apply a single metadata 'type' and perhaps some additional modifiers.
- Page Parsers: Take parsable data, apply content parsers to it, and return all the metadata in an appropriate structure.
Once you are comfortable with these objects, you might like to check out these walkthroughs, which create full parsers from nothing:
- e621 HTML gallery page
- Gelbooru HTML file page
- Artstation JSON file page API
Once you are comfortable with parsers, and if you are feeling brave, check out how the default imageboard and pixiv parsers work. These are complicated and use more experimental areas of the code to get their job done. If you are trying to get a new imageboard parser going and can't figure out subsidiary page parsers, send me a mail or something and I'll try to help you out!
When you are making a parser, consider this checklist (you might want to copy/have your own version of this somewhere):
- Do you get good URLs with good priority? Do you ever accidentally get favourite/popular/advert results you didn't mean to?
- If you need a next gallery page URL, is it ever not available (and hence needs a URL Class fix)? Does it change for search tags with unicode or http-restricted characters?
- Do you get nice namespaced tags? Are any unwanted single characters like -/+/? getting through?
- Is the file hash available anywhere?
- Is a source/post time available?
- Is a source URL available? Is it good quality, or does it often just point to an artist's base twitter profile? If you pull it from text or a tooltip, is it clipped for longer URLs?
Taken a break? Now let's put it all together ---->
"},{"location":"downloader_parsers_content_parsers.html","title":"Content Parsers","text":"So, we can now generate some strings from a document. Content Parsers will let us apply a single metadata type to those strings to inform hydrus what they are.
A content parser has a name, a content type, and a formula. This example fetches the character tags from a danbooru post.
The name is just decorative, but it is generally a good idea so you can find things again when you next revisit them.
The current content types are:
"},{"location":"downloader_parsers_content_parsers.html#intro","title":"urls","text":"This should be applied to relative ('/image/smile.jpg') and absolute ('https://mysite.com/content/image/smile.jpg') URLs. If the URL is relative, the client will generate an absolute URL based on the original URL used to fetch the data being parsed (i.e. it should all just work).
You can set several types of URL:
- url to download/pursue means a Post URL or a File URL in our URL Classes system, like a booru post or an actual raw file like a jpg or webm.
- url to associate means an URL you want added to the list of 'known urls' for the file, but not one you want the client to actually download and parse. Use this to neatly add booru 'source' urls.
- next gallery page means the next Gallery URL on from the current one.
The 'file url quality precedence' allows the client to select the best of several possible URLs. Given multiple content parsers producing URLs at the same 'level' of parsing, it will select the one with the highest value. Consider these two posts:
- https://danbooru.donmai.us/posts/3016415
- https://danbooru.donmai.us/posts/3040603
The Garnet image fits into a regular page and so Danbooru embed the whole original file in the main media canvas. One easy way to find the full File URL in this case would be to select the \"src\" attribute of the \"img\" tag with id=\"image\".
The Cirno one, however, is much larger and has been scaled down. The src of the main canvas tag points to a resized 'sample' link. The full link can be found at the 'view original' link up top, which is an \"a\" tag with id=\"image-resize-link\".
The Garnet post does not have the 'view original' link, so to cover both situations we might want two content parsers--one fetching the 'canvas' \"src\" and the other finding the 'view original' \"href\". If we set the 'canvas' one with a quality of 40 and the 'view original' 60, then the parsing system would know to select the 60 when it was available but to fall back to the 40 if not.
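If it helps to see the logic, the selection amounts to 'take whichever parser's URL has the highest precedence value'. A small Python sketch, with hypothetical parsed results standing in for the two content parsers:

```python
# hypothetical (precedence, url) pairs from two content parsers on the same post
candidates = [
    (40, 'https://danbooru.donmai.us/data/sample/sample-123.jpg'),  # the canvas "src"
    (60, 'https://danbooru.donmai.us/data/original-123.png'),       # the "view original" href
]

# take the URL produced at the highest precedence; if the 60 parser found
# nothing on a given post, only the 40 entry would be present and would win
best = max(candidates, key=lambda pair: pair[0])
print(best[1])
```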
As it happens, Danbooru (afaik, always) gives a link to the original file under the 'Size:' metadata to the left. This is the same 'best link' for both posts above, but it isn't so easy to identify. It is a quiet \"a\" tag without an \"id\" and it isn't always in the same location, but if you could pin it down reliably, it might be nice to circumvent the whole issue.
Sites can change suddenly, so it is nice to have a bit of redundancy here if it is easy.
"},{"location":"downloader_parsers_content_parsers.html#tags","title":"tags","text":"These are simple--they tell the client that the given strings are tags. You set the namespace here as well. I recommend you parse 'splashbrush' and set the namespace 'creator' here rather than trying to mess around with 'append prefix \"creator:\"' string conversions at the formula level--it is simpler up here and it lets hydrus handle any edge case logic for you.
Leave the namespace field blank for unnamespaced tags.
"},{"location":"downloader_parsers_content_parsers.html#file_hash","title":"file hash","text":"This says 'this is the hash for the file otherwise referenced in this parser'. So, if you have another content parser finding a File or Post URL, this lets the client know early that that destination happens to have a particular MD5, for instance. The client will look for that hash in its own database, and if it finds a match, it can predetermine if it already has the file (or has previously deleted it) without ever having to download it. When this happens, it will still add tags and associate the file with the URL for it's 'known urls' just as if it had downloaded it!
If you understand this concept, it is great to include. It saves time and bandwidth for everyone. Many site APIs include a hash for this exact reason--they want you to be able to skip a needless download just as much as you do.
The usual suite of hash types are supported: MD5, SHA1, SHA256, and SHA512. An old version of this required some weird string decoding, but this is no longer true. Select 'hex' or 'base64' from the encoding type dropdown, and then just parse the 'e5af57a687f089894f5ecede50049458' or '5a9XpofwiYlPXs7eUASUWA==' text, and hydrus should handle the rest. It will present the parsed hash in hex.
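If you want to convince yourself the two encodings really are the same hash, a couple of lines of plain Python (using the example strings above) shows the equivalence:

```python
import base64

hex_text = 'e5af57a687f089894f5ecede50049458'
b64_text = '5a9XpofwiYlPXs7eUASUWA=='

# both decode to the same 16 raw MD5 bytes
assert bytes.fromhex(hex_text) == base64.b64decode(b64_text)

# hydrus presents the parsed hash back to you in hex
print(base64.b64decode(b64_text).hex())
```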
"},{"location":"downloader_parsers_content_parsers.html#timestamp","title":"timestamp","text":"This lets you say that a given number refers to a particular time for a file. At the moment, I only support 'source time', which represents a 'post' time for the file and is useful for thread and subscription check time calculations. It takes a Unix time integer, like 1520203484, which many APIs will provide.
If you are feeling very clever, you can decode a 'MM/DD/YYYY hh:mm:ss' style string to a Unix time integer using string converters, which use some hacky and semi-reliable python %d-style values as per here. Look at the existing defaults for examples of this, and don't worry about being more accurate than 12/24 hours--trying to figure out timezone is a hell not worth attempting, and doesn't really matter in the long-run for subscriptions and thread watchers that might care.
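As a plain-Python illustration of that %d-style decoding--the format string and example date here are assumptions for a 'MM/DD/YYYY hh:mm:ss' style source, not a hydrus default:

```python
from datetime import datetime, timezone

posted_text = '03/04/2018 21:24:44'   # an assumed 'MM/DD/YYYY hh:mm:ss' source string

# decode with %-style codes, then convert to a Unix time integer
dt = datetime.strptime(posted_text, '%m/%d/%Y %H:%M:%S')
print(int(dt.replace(tzinfo=timezone.utc).timestamp()))
```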
"},{"location":"downloader_parsers_content_parsers.html#page_title","title":"watcher page title","text":"This lets the watcher know a good name/subject for its entries. The subject of a thread is obviously ideal here, but failing that you can try to fetch the first part of the first post's comment. It has precendence, like for URLs, so you can tell the parser which to prefer if you have multiple options. Just for neatness and ease of testing, you probably want to use a string converter here to cut it down to the first 64 characters or so.
"},{"location":"downloader_parsers_content_parsers.html#veto","title":"veto","text":"This is a special content type--it tells the next highest stage of parsing that this 'post' of parsing is invalid and to cancel and not return any data. For instance, if a thread post's file was deleted, the site might provide a default '404' stock File URL using the same markup structure as it would for normal images. You don't want to give the user the same 404 image ten times over (with fifteen kinds of tag and source time metadata attached), so you can add a little rule here that says \"If the image link is 'https://somesite.com/404.png', raise a veto: File 404\" or \"If the page has 'No results found' in its main content div, raise a veto: No results found\" or \"If the expected download tag does not have 'download link' as its text, raise a veto: No Download Link found--possibly Ugoira?\" and so on.
They will associate their name with the veto being raised, so it is useful to give these a decent descriptive name so you can see what might be going right or wrong during testing. If it is an appropriate and serious enough veto, it may also rise up to the user level and will be useful if they need to report an error to you (like \"After five pages of parsing, it gives 'veto: no next page link'\").
"},{"location":"downloader_parsers_formulae.html","title":"Parser Formulae","text":"Formulae are tools used by higher-level components of the parsing system. They take some data (typically some HTML or JSON) and return 0 to n strings. For our purposes, these strings will usually be tags, URLs, and timestamps. You will usually see them summarised with this panel:
The different types are currently html, json, nested, zipper, and context variable.
"},{"location":"downloader_parsers_formulae.html#html_formula","title":"html","text":"This takes a full HTML document or a sample of HTML--and any regular sort of XML should also work. It starts at the root node and searches for lower nodes using one or more ordered rules based on tag name and attributes, and then returns string data from those final nodes.
For instance, if you have this:
<html>\n <body>\n <div class=\"media_taglist\">\n <span class=\"generaltag\"><a href=\"(search page)\">blonde hair</a> (3456)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">blue eyes</a> (4567)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">bodysuit</a> (5678)</span>\n <span class=\"charactertag\"><a href=\"(search page)\">samus aran</a> (2345)</span>\n <span class=\"artisttag\"><a href=\"(search page)\">splashbrush</a> (123)</span>\n </div>\n <div class=\"content\">\n <span class=\"media\">(a whole bunch of content that doesn't have tags in)</span>\n </div>\n </body>\n</html>\n
(Most boorus have a taglist like this on their file pages.)
To find the artist, \"splashbrush\", here, you could:
- search beneath the root tag (`<html>`) for the `<div>` tag with attribute `class="media_taglist"`
- search beneath that `<div>` for `<span>` tags with attribute `class="artisttag"`
- search beneath those `<span>` tags for `<a>` tags
- and then get the string content of those `<a>` tags
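Outside of hydrus, the same four steps look roughly like this with BeautifulSoup (just to show the logic--in the client you build these rules in the UI, not in code):

```python
from bs4 import BeautifulSoup

html = '''
<div class="media_taglist">
  <span class="generaltag"><a href="#">blonde hair</a> (3456)</span>
  <span class="artisttag"><a href="#">splashbrush</a> (123)</span>
</div>
<div class="content"><span class="media">no tags in here</span></div>
'''

soup = BeautifulSoup(html, 'html.parser')
taglist = soup.find('div', class_='media_taglist')            # rule 1: the taglist <div>
artist_spans = taglist.find_all('span', class_='artisttag')   # rule 2: artist <span>s inside it
artist_links = [span.find('a') for span in artist_spans]      # rule 3: <a> tags inside those
print([a.get_text() for a in artist_links])                   # rule 4: string content -> ['splashbrush']
```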
Changing the `artisttag` to `charactertag` or `generaltag` would give you `samus aran` or `blonde hair`, `blue eyes`, `bodysuit` respectively.
You might be tempted to just go straight for any `<span>` with `class="artisttag"`, but many sites use the same class to render a sidebar of favourite/popular tags or some other sponsored content, so it is generally best to try to narrow down to a larger `<div>` container so you don't get anything you don't mean.
"},{"location":"downloader_parsers_formulae.html#the_ui","title":"the ui","text":"Clicking 'edit formula' on an HTML formula gives you this:
You edit on the left and test on the right.
"},{"location":"downloader_parsers_formulae.html#finding_the_right_html_tags","title":"finding the right html tags","text":"When you add or edit one of the specific tag search rules, you get this:
You can set multiple key/value attribute search conditions, but you'll typically be searching for 'class' or 'id' here, if anything.
Note that you can set it to fetch only the xth instance of a found tag, which can be useful in situations like this:
<span class=\"generaltag\">\n <a href=\"(add tag)\">+</a>\n <a href=\"(remove tag)\">-</a>\n <a href=\"(search page)\">blonde hair</a> (3456)\n</span>\n
Without any more attributes, there isn't a great way to distinguish the `<a>` with \"blonde hair\" from the other two--so just set `get the 3rd <a> tag` and you are good.
Most of the time, you'll be searching descendants (i.e. walking down the tree), but sometimes you might have this:
<span>\n <a href=\"(link to post url)\">\n <img class=\"thumb\" src=\"(thumbnail image)\" />\n </a>\n</span>\n
There isn't a great way to find the `<span>` or the `<a>` when looking from above here, as they are lacking a class or id, but you can find the `<img>` ok, so if you find those and then add a rule where instead of searching descendants, you are 'walking back up ancestors' like this:
You can solve some tricky problems this way!
You can also set a String Match, which is the same panel as you saw with URL Classes. It tests its best guess at the tag's 'string' value, so you can find a tag with 'Original Image' as its text, or one whose text starts with 'Posted on: ' according to a regex. Have a play with it and you'll figure it out.
"},{"location":"downloader_parsers_formulae.html#content_to_fetch","title":"content to fetch","text":"Once you have narrowed down the right nodes you want, you can decide what text to fetch. Given a node of:
<a href=\"(URL A)\" class=\"thumb_title\">Forest Glade</a>\n
Returning the `href` attribute would return the string \"(URL A)\", returning the string content would give \"Forest Glade\", and returning the full html would give `<a href="(URL A)" class="thumb_title">Forest Glade</a>`. This last choice is useful in complicated situations where you want a second, separated layer of parsing, which we will get to later.
"},{"location":"downloader_parsers_formulae.html#string_match_and_conversion","title":"string match and conversion","text":"You can set a final String Match to filter the parsed results (e.g. \"only allow strings that only contain numbers\" or \"only allow full URLs as based on (complicated regex)\") and String Converter to edit it (e.g. \"remove the first three characters of whatever you find\" or \"decode from base64\").
"},{"location":"downloader_parsers_formulae.html#testing","title":"testing","text":"The testing panel on the right is important and worth using. Copy the html from the source you want to parse and then hit the paste buttons to set that as the data to test with.
"},{"location":"downloader_parsers_formulae.html#json_formula","title":"json","text":"This takes some JSON and does a similar style of search:
It is a bit simpler than HTML--if the current node is a list (called an 'Array' in JSON), you can fetch every item or the xth item, and if it is a dictionary (called an 'Object' in JSON), you can fetch a particular entry by name. Since you can't jump down several layers with attribute lookups or tag names like with HTML, you have to go down every layer one at a time. In any case, if you have something like this:
Note
It is a great idea to check the html or json you are trying to parse with your browser. Most web browsers have excellent developer tools that let you walk through the nodes of the document you are trying to parse in a prettier way than I would ever have time to put together. This image is one of the views Firefox provides if you simply enter a JSON URL.
Searching for \"posts\"->1st list item->\"sub\" on this data will give you \"Nobody like kino here.\".
Searching for \"posts\"->all list items->\"tim\" will give you the three SHA256 file hashes (since the third post has no file attached and so no 'tim' entry, the parser skips over it without complaint).
Searching for \"posts\"->1st list item->\"com\" will give you the OP's comment, ~AS RAW UNPARSED HTML~.
The default is to fetch the final nodes' 'data content', which means coercing simple variables into strings. If the current node is a list or dict, no string is returned.
But if you like, you can return the json beneath the current node (which, like HTML, includes the current node). This again will come in useful later.
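In plain Python terms, those three example searches are just successive key and index lookups on the decoded JSON. A sketch with made-up thread data shaped like the example above:

```python
import json

data = json.loads('''
{
  "posts": [
    {"no": 1, "sub": "Nobody like kino here.", "com": "<p>OP comment html</p>", "tim": "hash_a"},
    {"no": 2, "com": "a reply", "tim": "hash_b"},
    {"no": 3, "com": "a reply with no file attached"}
  ]
}
''')

print(data['posts'][0]['sub'])                           # "posts" -> 1st item -> "sub"
print([p['tim'] for p in data['posts'] if 'tim' in p])   # "posts" -> all items -> "tim", absentees skipped
print(data['posts'][0]['com'])                           # "posts" -> 1st item -> "com" (raw unparsed html)
```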
"},{"location":"downloader_parsers_formulae.html#nested_formula","title":"nested","text":"If you want to parse some JSON that is tucked inside an HTML attribute, or vice versa, use a nested formula. This parses the text using one formula type and then passes the result(s) to another.
The especially neat thing about this is that encoded characters like `&gt;` or escaped JSON characters are all handled natively for you. Before we had this, we had to hack our way around with crazy regex.
"},{"location":"downloader_parsers_formulae.html#zipper_formula","title":"zipper","text":"If you want to combine strings from the results of different parsers--for instance by joining the 'tim' and the 'ext' in our json example--you can use a Zipper formula. This fetches multiple lists of strings and zips their result rows together using `\1` regex substitution syntax:
This is a complicated example taken from one of my thread parsers. I have to take a modified version of the original thread URL (the first rule, so `\1`) and then append the filename (`\2`) and its extension (`\3`) on the end to get the final file URL of a post. You can mix in more characters in the substitution phrase, like `\1.jpg`, or even have multiple instances (`https://\2.muhsite.com/\2/\1`), if that is appropriate.
If your sub-formulae produce multiple results, the Zipper will produce that many also, iterating the sub-lists together.
ExampleIf parser 1 gives:\n a\n b\n c\n\nAnd parser 2 gives:\n 1\n 2\n 3\n\nUsing a substitution phrase of \"\\1-\\2\" will give:\n a-1\n b-2\n c-3\n
If one of the sub-formulae produces fewer results than the others, its final value will be used to fill in the gaps. In this way, you might somewhere parse one prefix and seven suffixes, where joining them will use the same prefix seven times.
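A rough sketch of that zipping-and-padding behaviour in Python (not hydrus's implementation; the 'tim'/'ext' style values are made up):

```python
def zipper(substitution, *columns):
    # pad each shorter column out with its final value, then substitute \1, \2... per row
    length = max(len(col) for col in columns)
    padded = [col + [col[-1]] * (length - len(col)) for col in columns]
    results = []
    for row in zip(*padded):
        text = substitution
        for i, value in enumerate(row, start=1):
            text = text.replace('\\%d' % i, value)
        results.append(text)
    return results

print(zipper('\\1.\\2', ['1514701395192', '1514701624837'], ['jpg']))
# -> ['1514701395192.jpg', '1514701624837.jpg']
```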
"},{"location":"downloader_parsers_formulae.html#context_variable_formula","title":"context variable","text":"This is a basic hacky answer to a particular problem. It is a simple key:value dictionary that at the moment only stores one variable, 'url', which contains the original URL used to fetch the data being parsed.
If a different URL Class links to this parser via an API URL, this 'url' variable will always be the API URL (i.e. it literally is the URL used to fetch the data), not any thread/whatever URL the user entered.
Hit the 'edit example parsing context' to change the URL used for testing.
I have used this several times to stitch together file URLs when I am pulling data from APIs, like in the zipper formula example above. In this case, the starting URL is `https://a.4cdn.org/tg/thread/57806016.json`, from which I extract the board name, \"tg\", using the string converter, and then add in 4chan's CDN domain to make the appropriate base file URL (`https://i.4cdn.org/tg/`) for the given thread. I only have to jump through this hoop in 4chan's case because they explicitly store file URLs by board name. 8chan, on the other hand, has a static `https://media.8ch.net/file_store/` for all files, so it is a little easier (I think I just do a single 'prepend' string transformation somewhere).
If you want to make some parsers, you will have to get familiar with how different sites store and present their data!
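Coming back to the 4chan case, the string work above boils down to something like this in plain Python (a sketch of the same idea, not the client's code):

```python
import re

context_url = 'https://a.4cdn.org/tg/thread/57806016.json'   # the 'url' context variable

board = re.search(r'https://a\.4cdn\.org/([^/]+)/', context_url).group(1)
print('https://i.4cdn.org/%s/' % board)   # -> https://i.4cdn.org/tg/
# a parsed 'tim' and 'ext' would then be zipped onto the end to make the full File URL
```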
"},{"location":"downloader_parsers_full_example_api.html","title":"api example","text":"Some sites offer API calls for their pages. Depending on complexity and quality of content, using these APIs may or may not be a good idea. Artstation has a good one--let's first review our URL Classes:
We convert the original Post URL, https://www.artstation.com/artwork/mQLe1 to https://www.artstation.com/projects/mQLe1.json. Note that Artstation Post URLs can produce multiple files, and that the API url should not be associated with those final files.
So, when the client encounters an 'artstation file page' URL, it will generate the equivalent 'artstation file page json api' URL and use that for downloading and parsing. If you would like to review your API links, check out network->downloader components->manage url class links->api links. Using Example URLs, it will figure out which URL Classes link to others and ensure you are mapping parsers only to the final link in the chain--there should be several already in there by default.
Now lets look at the JSON. Loading clean JSON in a browser should present you with a nicer view:
I have highlighted the data we want, which is:
- The File URLs.
- Creator, title, medium, and unnamespaced tags.
- Source time.
JSON is a dream to parse, and I will assume you are comfortable with Content Parsers from the previous examples, so I'll simply paste the different formulae one after another:
Each image is stored under a separate numbered 'assets' list item. This one has just two, but some Artstation pages have dozens of images. The only unusual part here is I also put a String Match of `^(?!.*assets\/covers).*$`, which filters out 'cover' images (such as on here), which make for nice portfolio thumbs on the site but are not interesting to us.
These are all simple. You can take or leave the title and medium tags--some people like them, some don't. This example has no unnamespaced tags, but this one does. Creator-entered tags are sometimes not worth parsing (on tumblr, for instance, you often get run-on tags like #imbored #whatisevengoingon that are irrelevant to the work), but Artstation users are all professionals trying to get their work noticed, so the tags are usually pretty good.
This again uses python's datetime to decode the date, which Artstation presents with millisecond accuracy, ha ha. I use a `(.+:..)\..*->\1` regex (i.e. \"get everything before the period\") to strip off the timezone and milliseconds and then decode as normal.
"},{"location":"downloader_parsers_full_example_api.html#summary","title":"summary","text":"APIs that are stable and free to access (e.g. do not require OAuth or other complicated login headers) can make parsing fantastic. They save bandwidth and CPU time, and they are typically easier to work with than HTML. Unfortunately, the boorus that do provide APIs often list their tags without namespace information, so I recommend you double-check you can get what you want before you get too deep into it. Some APIs also offer incomplete data, such as relative URLs (relative to the original URL!), which can be a pain to figure out in our system.
"},{"location":"downloader_parsers_full_example_file_page.html","title":"file page example","text":"Let's look at this page: https://gelbooru.com/index.php?page=post&s=view&id=3837615.
What sorts of data are we interested in here?
- The image URL.
- The different tags and their namespaces.
- The secret md5 hash buried in the HTML.
- The post time.
- The Deviant Art source URL.
A tempting strategy for pulling the file URL is to just fetch the src of the embedded `<img>` tag, but:
- If the booru also supports videos or flash, you'll have to write separate and likely more complicated rules for `<video>` and `<embed>` tags.
- If the booru shows 'sample' sizes for large images--as this one does!--pulling the src of the image you see won't get the full-size original for large images.
If you have an account with the site you are parsing and have clicked the appropriate 'Always view original' setting, you may not see these sorts of sample-size banners! I recommend you log out of/go incognito for sites you are inspecting for hydrus parsing (unless a log-in is required to see content, in which case the hydrus user will have to set up hydrus-side login to actually use the parser), or you can easily miss NSFW gates and other logged-out hurdles.
When trying to pin down the right link, if there are no good alternatives, you often have to write several File URL rules with different precedence, saying 'get the \"Click Here to See Full Size\" link at 75' and 'get the embed's \"src\" at 25' and so on to make sure you cover different situations, but as it happens Gelbooru always posts the actual File URL at:
`<meta property="og:image" content="https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg" />` under the `<head>`, and at:
`<a href="https://simg3.gelbooru.com//images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg" target="_blank" style="font-weight: bold;">Original image</a>`
which can be found by putting a String Match in the html formula.
The `<meta>` with `property="og:image"` is easy to search for (and they use the same tag for video links as well!). For the Original Image, you can use a String Match like so:
I think I wrote my gelbooru parser before I added String Matches to individual HTML formulae tag rules, so I went with this, which is a bit more cheeky:
But it works. Sometimes, just regexing for links that fit the site's CDN is a good bet for finding difficult stuff.
"},{"location":"downloader_parsers_full_example_file_page.html#tags","title":"tags","text":"Most boorus have a taglist on the left that has a nice id or class you can pull, and then each namespace gets its own class for CSS-colouring:
Make sure you browse around the booru for a bit, so you can find all the different classes they use. character/artist/copyright are common, but some sneak in the odd meta/species/rating.
Skipping ?/-/+ characters can be a pain if you are lacking a nice tag-text class, in which case you can add a regex String Match to the HTML formula (as I do here, since Gelb offers '?' links for tag definitions) like [^\\?\\-+\\s], which means \"the text includes something other than just '?' or '-' or '+' or whitespace\".
"},{"location":"downloader_parsers_full_example_file_page.html#md5_hash","title":"md5 hash","text":"If you look at the Gelbooru File URL, https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg, you may notice the filename is all hexadecimal. It looks like they store their files under a two-deep folder structure, using the first four characters--386e here--as the key. It sure looks like '386e12e33726425dbd637e134c4c09b5' is not random ephemeral garbage!
In fact, Gelbooru use the MD5 of the file as the filename. Many storage systems do something like this (hydrus uses SHA256!), so if they don't offer a `<meta>` tag that explicitly states the md5 or sha1 or whatever, you can sometimes infer it from one of the file links. This screenshot is from the more recent version of hydrus, which has the more powerful 'string processing' system for string transformations. It has an intimidating number of nested dialogs, but we can stay simple for now, with only the one regex substitution step inside a string 'converter':
Here we are using the same `property="og:image"` rule to fetch the File URL, and then we are regexing the hex hash with `.*([0-9a-f]{32}).*` (MD5s are 32 hex characters). We select 'hex' as the encoding type. Hashes require a tiny bit more data handling behind the scenes, but in the Content Parser test page it presents the hash again neatly in English: \"md5 hash: 386e12e33726425dbd637e134c4c09b5\", meaning everything parsed correctly. It presents the hash in hex even if you select the encoding type as base64.
Finding the hash is hugely beneficial for a parser--it lets hydrus skip downloading files without ever having seen them before!
"},{"location":"downloader_parsers_full_example_file_page.html#source_time","title":"source time","text":"Post/source time lets subscriptions and watchers make more accurate guesses at current file velocity. It is neat to have if you can find it, but:
FUCK ALL TIMEZONES FOREVER
Gelbooru offers--
<li>Posted: 2017-08-18 19:59:44<br /> by <a href=\"index.php?page=account&s=profile&uname=jayage5ds\">jayage5ds</a></li>\n
--so let's see how we can turn that into a Unix timestamp:
I find the `<li>` that starts \"Posted: \" and then decode the date according to the hackery-dackery-doo format from here. `%c` and `%z` are unreliable, and attempting timezone adjustments is overall a supervoid that will kill your time for no real benefit--subs and watchers work fine with 12-hour imprecision, so if you have a +0300 or EST in your string, just cut those characters off with another String Transformation. As long as you are getting about the right day, you are fine.
"},{"location":"downloader_parsers_full_example_file_page.html#source_url","title":"source url","text":"Source URLs are nice to have if they are high quality. Some boorus only ever offer artist profiles, like `https://twitter.com/artistname`, whereas we want singular Post URLs that point to other places that host this work. For Gelbooru, you could fetch the Source URL as we did source time, searching for \"Source: \", but they also offer it more easily in an edit form:
`<input type="text" name="source" size="40" id="source" value="https://www.deviantart.com/art/Lara-Croft-Artifact-Dive-699335378" />`
This is a bit of a fragile location to parse from--Gelb could change or remove this form at any time, whereas the \"Posted: \" `<li>` is probably firmer, but I expect I wrote it before I had String Matches in. It works for now, which in this game is often Good Enough™.
"},{"location":"downloader_parsers_full_example_file_page.html#summary","title":"summary","text":"Phew--all that for a bit of Lara Croft! Thankfully, most sites use similar schemes. Once you are familiar with the basic idea, the only real work is to duplicate an existing parser and edit for differences. Our final parser looks like this:
This is overall a decent parser. Some parts of it may fail when Gelbooru update to their next version, but that can be true of even very good parsers with multiple redundancy. For now, hydrus can use this to quickly and efficiently pull content from anything running Gelbooru 0.2.5, and the effort spent now can save millions of combined right-click->save as and manual tag copies in future. If you make something like this and share it about, you'll be doing a good service for those who could never figure it out.
"},{"location":"downloader_parsers_full_example_gallery_page.html","title":"gallery page example","text":"Caution
These guides should roughly follow what comes with the client by default! You might like to have the actual UI open in front of you so you can play around with the rules and try different test parses yourself.
Let's look at this page: https://e621.net/post/index/1/rating:safe pokemon
We've got 75 thumbnails and a bunch of page URLs at the bottom.
"},{"location":"downloader_parsers_full_example_gallery_page.html#main_page","title":"first, the main page","text":"This is easy. It gets a good name and some example URLs. e621 has some different ways of writing out their queries (and as they use some tags with '/', like 'male/female', this can cause character encoding issues depending on whether the tag is in the path or query!), but we'll put that off for now--we just want to parse some stuff.
"},{"location":"downloader_parsers_full_example_gallery_page.html#thumbnail_urls","title":"thumbnail links","text":"Most browsers have some good developer tools to let you Inspect Element and get a better view of the HTML DOM. Be warned that this information isn't always the same as View Source (which is what hydrus will get when it downloads the initial HTML document), as some sites load results dynamically with javascript and maybe an internal JSON API call (when sites move to systems that load more thumbs as you scroll down, it makes our job more difficult--in these cases, you'll need to chase down the embedded JSON or figure out what API calls their JS is making--the browser's developer tools can help you here again). Thankfully, e621 is (and most boorus are) fairly static and simple:
Every thumb on e621 is a `<span>` with `class="thumb"` wrapping an `<a>` and an `<img>`. This is a common pattern, and easy to parse:
There's no tricky String Matches or String Converters needed--we are just fetching hrefs. Note that the links get relative-matched to example.com for now--I'll probably fix this to apply to one of the example URLs, but rest assured that IRL the parser will 'join' its url up with the appropriate Gallery URL used to fetch the data. Sometimes, you might want to add a rule for `search descendants for the first <div> tag with id=content` to make sure you are only grabbing thumbs from the main box, whether that is a `<div>` or a `<span>`, and whether it has `id="content"` or `class="mainBox"`, but unless you know that booru likes to embed \"popular\" or \"favourite\" 'thumbs' up top that will be accidentally caught by a `<span>` with `class="thumb"`, I recommend you not make your rules overly specific--all it takes is for their dev to change the name of their content box, and your whole parser breaks. I've ditched the `<span>` requirement in the rule here for exactly that reason--`class="thumb"` is necessary and sufficient.
Remember that the parsing system allows you to go up ancestors as well as down descendants. If your thumb-box has multiple links--like to see the artist's profile or 'set as favourite'--you can try searching for the `<span>`s, then down to the `<img>`, and then up to the nearest `<a>`. In English, this is saying, \"Find me all the image link URLs in the thumb boxes.\"
"},{"location":"downloader_parsers_full_example_gallery_page.html#next_gallery_url","title":"next gallery page link","text":"Most boorus have 'next' or '>>' at the bottom, which can be simple enough, but many have a neat `<link href="/post/index/2/rating:safe%20pokemon" rel="next" />` in the `<head>`. The `<head>` solution is easier, if available, but my default e621 parser happens to pursue the 'paginator':
As it happens, e621 also apply the `rel="next"` attribute to their \"Next >>\" links, which makes it all that easier for us to find. Sometimes there is no \"next\" id or class, and you'll want to add a String Match to your html formula to test for a string value of '>>' or whatever it is. A good trick is to View Source and then search for the critical `/post/index/2/` phrase you are looking for--you might find what you want in a `<link>` tag you didn't expect or even buried in a hidden 'share to tumblr' button. `<form>`s for reporting or commenting on content are another good place to find content ids.
Note that this finds two URLs. e621 apply the `rel="next"` to both the \"2\" link and the \"Next >>\" one. The download engine merges the parser's dupes, so don't worry if you end up parsing both the 'top' and 'bottom' next page links, or if you use multiple rules to parse the same data in different ways.
"},{"location":"downloader_parsers_full_example_gallery_page.html#summary","title":"summary","text":"With those two rules, we are done. Gallery parsers are nice and simple.
The Page Parser is the top level parsing object. It takes a single document and produces a list--or a list of lists--of metadata. Here's the main UI:
Notice that the edit panel has three sub-pages.
"},{"location":"downloader_parsers_page_parsers.html#main","title":"main","text":"- Name: Like for content parsers, I recommend you add good names for your parsers.
- Pre-parsing conversion: If your API source encodes or wraps the data you want to parse, you can do some string transformations here. You won't need to use this very often, but if your source gives the JSON wrapped in javascript (like the old tumblr API), it can be invaluable.
- Example URLs: Here you should add a list of example URLs the parser works for. This lets the client automatically link this parser up with URL classes for you and any users you share the parser with.
This page is just a simple list:
Each content parser here will be applied to the document and returned in this page parser's results list. Like most boorus, e621's File Pages only ever present one file, and they have simple markup, so the solution here was simple. The full contents of that test window are:
*** 1 RESULTS BEGIN ***\n\ntag: character:krystal\ntag: creator:s mino930\nfile url: https://static1.e621.net/data/fc/b6/fcb673ed89241a7b8d87a5dcb3a08af7.jpg\ntag: anthro\ntag: black nose\ntag: blue fur\ntag: blue hair\ntag: clothing\ntag: female\ntag: fur\ntag: green eyes\ntag: hair\ntag: hair ornament\ntag: jewelry\ntag: short hair\ntag: solo\ntag: video games\ntag: white fur\ntag: series:nintendo\ntag: series:star fox\ntag: species:canine\ntag: species:fox\ntag: species:mammal\n\n*** RESULTS END ***\n
When the client sees this in a downloader context, it will know where to download the file and which tags to associate with it based on what the user has chosen in their 'tag import options'.
"},{"location":"downloader_parsers_page_parsers.html#subsidiary_page_parsers","title":"subsidiary page parsers","text":"Here be dragons. This was an attempt to make parsing more helpful in certain API situations, but it ended up ugly. I do not recommend you use it, as I will likely scratch the whole thing and replace it with something better one day. It basically splits the page up into pieces that can then be parsed by nested page parsers as separate objects, but the UI and workflow is hell. Afaik, the imageboard API parsers use it, but little/nothing else. If you are really interested, check out how those work and maybe duplicate to figure out your own imageboard parser and/or send me your thoughts on how to separate File URL/timestamp combos better.
"},{"location":"downloader_sharing.html","title":"Sharing Downloaders","text":"If you are working with users who also understand the downloader system, you can swap your GUGs, URL Classes, and Parsers separately using the import/export buttons on the relevant dialogs, which work in pngs and clipboard text.
But if you want to share conveniently, and with users who are not familiar with the different downloader objects, you can package everything into a single easy-import png as per here.
The dialog to use is network->downloader components->export downloaders:
It isn't difficult. Essentially, you want to bundle enough objects to make one or more 'working' GUGs at the end. I recommend you start by just hitting 'add gug', which--using Example URLs--will attempt to figure out everything you need by itself.
This all works on Example URLs and some domain guesswork, so make sure your url classes are good and the parsers have correct Example URLs as well. If they don't, they won't all link up neatly for the end user. If part of your downloader is on a different domain to the GUGs and Gallery URLs, then you'll have to add them manually. Just start with 'add gug' and see if it looks like enough.
Once you have the necessary and sufficient objects added, you can export to png. You'll get a similar 'does this look right?' summary as what the end-user will see, just to check you have everything in order and the domains all correct. If that is good, then make sure to give the png a sensible filename and embellish the title and description if you need to. You can then send/post that png wherever, and any regular user will be able to use your work.
"},{"location":"downloader_url_classes.html","title":"URL Classes","text":"The fundamental connective tissue of the downloader system is the 'URL Class'. This object identifies and normalises URLs and links them to other components. Whenever the client handles a URL, it tries to match it to a URL Class to figure out what to do.
"},{"location":"downloader_url_classes.html#url_types","title":"the types of url","text":"For hydrus, an URL is useful if it is one of:
File URL: This returns the full, raw media file with no HTML wrapper. They typically end in a filename like http://safebooru.org//images/2333/cab1516a7eecf13c462615120ecf781116265f17.jpg, but sometimes they have a more complicated fetch command ending like 'file.php?id=123456' or '/post/content/123456'.
These URLs are remembered for the file in the 'known urls' list, so if the client happens to encounter the same URL in future, it can determine whether it can skip the download because the file is already in the database or has previously been deleted.
It is not important that File URLs be matched by a URL Class. File URL is considered the 'default', so if the client finds no match, it will assume the URL is a file and try to download and import the result. You might want to particularly specify them if you want to present them in the media viewer or discover File URLs are being confused for Post URLs or something.
Post URL: This typically returns some HTML that contains a File URL and metadata such as tags and post time. They sometimes present multiple sizes (like 'sample' vs 'full size') of the file or even different formats (like 'ugoira' vs 'webm'). The Post URL for the file above, http://safebooru.org/index.php?page=post&s=view&id=2429668, has this 'sample' presentation. Finding the best File URL in these cases can be tricky!
This URL is also saved to 'known urls' and will usually be similarly skipped if it has previously been downloaded. It will also appear in the media viewer as a clickable link.
Gallery URL: This presents a list of Post URLs or File URLs. They often also present a 'next page' URL. It could be a page like http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=0 or an API URL like http://safebooru.org/index.php?page=dapi&s=post&tags=yorha_no._2_type_b&q=index&pid=0.
Watchable URL: This is the same as a Gallery URL but represents an ephemeral page that receives new files much faster than a gallery but will soon 'die' and be deleted. For our purposes, this typically means imageboard threads.
"},{"location":"downloader_url_classes.html#url_components","title":"the components of a url","text":"As far as we are concerned, a URL string has four parts:
- Scheme: `http` or `https`
- Location/Domain: `safebooru.org` or `i.4cdn.org` or `cdn002.somebooru.net`
- Path Components: `index.php` or `tesla/res/7518.json` or `pictures/user/daruak/page/2` or `art/Commission-animation-Elsa-and-Anna-541820782`
- Parameters: `page=post&s=list&tags=yorha_no._2_type_b&pid=40` or `page=post&s=view&id=2429668`
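If it helps to see that decomposition concretely, Python's standard urllib splits a URL the same way (just an illustration, not how the client stores things):

```python
from urllib.parse import urlsplit, parse_qs

url = 'http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=40'
parts = urlsplit(url)

print(parts.scheme)           # http
print(parts.netloc)           # safebooru.org
print(parts.path)             # /index.php
print(parse_qs(parts.query))  # {'page': ['post'], 's': ['list'], 'tags': ['yorha_no._2_type_b'], 'pid': ['40']}
```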
So, let's look at the 'edit url class' panel, which is found under network->downloader components->manage url classes:
A TBIB File Page like https://tbib.org/index.php?page=post&s=view&id=6391256 is a Post URL. Let's look at the metadata first:
Name and type: Like with GUGs, we should set a good unambiguous name so the client can clearly summarise this url to the user. 'tbib file page' is good.
This is a Post URL, so we set the 'post url' type.
Association logic: All boorus and most sites only present one file per page, but some sites present multiple files on one page, usually several pages in a series/comic, as with pixiv. Danbooru-style thumbnail links to 'this file has a post parent' do not count here--I mean that a single URL embeds multiple full-size images, either with shared or separate tags. It is very important to the hydrus client's downloader logic (making decisions about whether it has previously visited a URL, and so whether to skip checking it again) that 'can produce multiple files' is checked if a site can present multiple files on a single page.
Related is the idea of whether a 'known url' should be associated. Typically, this should be checked for Post and File URLs, which are fixed, and unchecked for Gallery and Watchable URLs, which are ephemeral and give different results from day to day. There are some unusual exceptions, so give it a brief thought--but if you have no special reason, leave this as the default for the url type.
And now, for matching the string itself, let's revisit our four components:
Scheme: TBIB supports http and https, so I have set the 'preferred' scheme to https. Any 'http' TBIB URL a user inputs will be automatically converted to https.
Location/Domain: For Post URLs, the domain is always \"tbib.org\".
The 'allow' and 'keep' subdomains checkboxes let you determine if a URL with \"artistname.artsite.com\" will match a URL Class with \"artsite.com\" domain and if that subdomain should be remembered going forward. Most sites do not host content on subdomains, so you can usually leave 'allow' unchecked. The 'keep' option (which is only available if 'allow' is checked) is more subtle, only useful for rare cases, and unless you have a special reason, you should leave it checked. (For keep: In cases where a site farms out File URLs to CDN servers on subdomains--like randomly serving a mirror of \"https://muhbooru.org/file/123456\" on \"https://srv2.muhbooru.org/file/123456\"--and removing the subdomain still gives a valid URL, you may not wish to keep the subdomain.) Since TBIB does not use subdomains, these options do not matter--we can leave both unchecked.
'www' and 'www2' and similar subdomains are automatically matched. Don't worry about them.
Path Components: TBIB just uses a single \"index.php\" on the root directory, so the path is not complicated. Were it longer (like \"gallery/cgi/index.php\"), we would add more (\"gallery\" and \"cgi\"), and since the path of a URL has a strict order, we would need to arrange the items in the listbox there so they were sorted correctly.
Parameters: TBIB's index.php takes many parameters to render different page types. Note that the Post URL uses \"s=view\", while TBIB Gallery URLs use \"s=list\". In any case, for a Post URL, \"id\", \"page\", and \"s\" are necessary and sufficient."},{"location":"downloader_url_classes.html#string_matches","title":"string matches","text":"As you edit these components, you will be presented with the Edit String Match Panel:
This lets you set the type of string that will be valid for that component. If a given path or query component does not match the rules given here, the URL will not match the URL Class. Most of the time you will probably want to set 'fixed characters' of something like \"post\" or \"index.php\", but if the component you are editing is more complicated and could have a range of different valid values, you can specify just numbers or letters or even a regex pattern. If you try to do something complicated, experiment with the 'example string' entry to make sure you have it set how you think.
Don't go overboard with this stuff, though--most sites do not have super-fine distinctions between their different URL types, and hydrus users will not be dropping user account or logout pages or whatever on the client, so you can be fairly liberal with the rules.
"},{"location":"downloader_url_classes.html#match_details","title":"how do they match, exactly?","text":"This URL Class will be assigned to any URL that matches the location, path, and query. Missing path component or parameters in the URL will invalidate the match but additonal ones will not!
For instance, given:
- URL A: https://8ch.net/tv/res/1002432.html
- URL B: https://8ch.net/tv/res
- URL C: https://8ch.net/tv/res/1002432
- URL D: https://8ch.net/tv/res/1002432.json
- URL Class that looks for \"(characters)/res/(numbers).html\" for the path
Only URL A will match
And:
- URL A: https://boards.4chan.org/m/thread/16086187
- URL B: https://boards.4chan.org/m/thread/16086187/ssg-super-sentai-general-651
- URL Class that looks for \"(characters)/thread/(numbers)\" for the path
Both URL A and B will match
And:
- URL A: https://www.pixiv.net/member_illust.php?mode=medium&illust_id=66476204
- URL B: https://www.pixiv.net/member_illust.php?mode=medium&illust_id=66476204&lang=jp
- URL C: https://www.pixiv.net/member_illust.php?mode=medium
- URL Class that looks for \"illust_id=(numbers)\" in the query
Both URL A and B will match, URL C will not
If multiple URL Classes match a URL, the client will try to assign the most 'complicated' one, with the most path components and then parameters.
Given two example URLs and URL Classes:
- URL A: https://somebooru.com/post/123456
- URL B: https://somebooru.com/post/123456/manga_subpage/2
- URL Class A that looks for \"post/(number)\" for the path
- URL Class B that looks for \"post/(number)/manga_subpage/(number)\" for the path
URL A will match URL Class A but not URL Class B and so will receive A.
URL B will match both and receive URL Class B as it is more complicated.
This situation is not common, but when it does pop up, it can be a pain. It is usually a good idea to match exactly what you need--no more, no less.
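If you like thinking of it mechanically, the tie-break is just a sort: among the URL Classes whose rules all matched, take the one with the most path components, then the most parameters. A toy sketch (the counts here stand in for real rules; this is not hydrus's actual code):

```python
def pick_url_class(matching_classes):
    # prefer more path components, then more parameters, per the rule above
    return max(matching_classes,
               key=lambda c: (c['path_components'], c['parameters']))

candidates = [
    {'name': 'somebooru post page', 'path_components': 2, 'parameters': 0},
    {'name': 'somebooru manga subpage', 'path_components': 4, 'parameters': 0},
]
print(pick_url_class(candidates)['name'])  # 'somebooru manga subpage'
```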
"},{"location":"downloader_url_classes.html#url_normalisation","title":"normalising urls","text":"Different URLs can give the same content. The http and https versions of a URL are typically the same, and:
- https://gelbooru.com/index.php?page=post&s=view&id=3767497
- gives the same as:
- https://gelbooru.com/index.php?id=3767497&page=post&s=view
And:
- https://e621.net/post/show/1421754/abstract_background-animal_humanoid-blush-brown_ey
- is the same as:
- https://e621.net/post/show/1421754
- is the same as:
- https://e621.net/post/show/1421754/help_computer-made_up_tags-REEEEEEEE
Since we are in the business of storing and comparing URLs, we want to 'normalise' them to a single comparable beautiful value. You see a preview of this normalisation on the edit panel. Normalisation happens to all URLs that enter the program.
Note that in e621's case (and for many other sites!), that text after the id is purely decoration. It can change when the file's tags change, so if we want to compare today's URLs with those we saw a month ago, we'd rather just be without it.
On normalisation, all URLs will get the preferred http/https switch, and their parameters will be alphabetised. File and Post URLs will also cull out any surplus path or query components. This wouldn't affect our TBIB example above, but it will clip the e621 example down to that 'bare' id URL, and it will take any surplus 'lang=en' or 'browser=netscape_24.11' garbage off the query text as well. URLs that are not associated and saved and compared (i.e. normal Gallery and Watchable URLs) are not culled of unmatched path components or query parameters, which can sometimes be useful if you want to match (and keep intact) gallery URLs that might or might not include an important 'sort=desc' type of parameter.
Since File and Post URLs will do this culling, be careful that you not leave out anything important in your rules. Make sure what you have is both necessary (nothing can be removed and still keep it valid) and sufficient (no more needs to be added to make it valid). It is a good idea to try pasting the 'normalised' version of the example URL into your browser, just to check it still works.
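As a rough illustration of what normalisation does to a File or Post URL--preferred scheme, alphabetised parameters, unmatched parameters culled--here is a small sketch using Python's urllib (hydrus's real logic also handles path components, defaults, and encoding details):

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def normalise(url, keep_params, preferred_scheme='https'):
    parts = urlparse(url)
    # keep only the parameters the URL Class defines, then alphabetise them
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in keep_params)
    return urlunparse(parts._replace(scheme=preferred_scheme, query=urlencode(kept)))

url = 'http://gelbooru.com/index.php?s=view&lang=en&id=3767497&page=post'
print(normalise(url, keep_params={'id', 'page', 's'}))
# https://gelbooru.com/index.php?id=3767497&page=post&s=view
```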
"},{"location":"downloader_url_classes.html#default_values","title":"'default' values","text":"Some sites present the first page of a search like this:
https://danbooru.donmai.us/posts?tags=skirt
But the second page is:
https://danbooru.donmai.us/posts?tags=skirt&page=2
Another example is:
https://www.hentai-foundry.com/pictures/user/Mister69M
https://www.hentai-foundry.com/pictures/user/Mister69M/page/2
What happened to 'page=1' and '/page/1'? Adding those '1' values in works fine! Many sites, when an index is absent, will secretly imply an appropriate 0 or 1. This looks pretty to users looking at a browser address bar, but it can be a pain for us, who want to match both styles to one URL Class. It would be nice if we could recognise the 'bare' initial URL and fill in the '1' values to coerce it to the explicit, automation-friendly format. Defaults to the rescue:
After you set a path component or parameter String Match, you will be asked for an optional 'default' value. You won't want to set one most of the time, but for Gallery URLs, it can be hugely useful--see how the normalisation process automatically fills in the missing path component with the default! There are plenty of examples in the default Gallery URLs of this, so check them out. Most sites use page indices starting at '1', but Gelbooru-style imageboards use 'pid=0' file index (and often move forward 42, so the next pages will be 'pid=42', 'pid=84', and so on, although others use deltas of 20 or 40).
"},{"location":"downloader_url_classes.html#next_gallery_page_prediction","title":"can we predict the next gallery page?","text":"Now we can harmonise gallery urls to a single format, we can predict the next gallery page! If, say, the third path component or 'page' parameter is always a number referring to page, you can select this under the 'next gallery page' section and set the delta to change it by. The 'next gallery page url' section will be automatically filled in. This value will be consulted if the parser cannot find a 'next gallery page url' from the page content.
It is neat to set this up, but I only recommend it if you actually cannot reliably parse a next gallery page url from the HTML later in the process. It is neater to have searches stop naturally because the parser said 'no more gallery pages' than to have hydrus always one page beyond and end every single search on an uglier 'No results found' or 404 result.
Unfortunately, some sites will either not produce an easily parsable next page link or randomly just not include it due to some issue on their end (Gelbooru is a funny example of this). Also, APIs will often have a kind of 'start=200&num=50', 'start=250&num=50' progression but not include that state in the XML or JSON they return. These cases require the automatic next gallery page rules (check out Artstation and tumblr api gallery page URL Classes in the defaults for examples of this).
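Here is a hedged sketch of both ideas together--filling in a missing paging parameter with its default, then predicting the next page by adding a delta--using the Gelbooru-style 'pid' counting mentioned above (illustrative only, not hydrus's own code):

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def _rebuild(parts, query):
    return urlunparse(parts._replace(query=urlencode(sorted(query.items()))))

def fill_default(url, param, default):
    # coerce the pretty 'bare' first-page URL into the explicit format
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.setdefault(param, default)
    return _rebuild(parts, query)

def next_gallery_page(url, param, delta):
    # bump the paging parameter by the configured delta
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(int(query[param]) + delta)
    return _rebuild(parts, query)

first = fill_default('https://gelbooru.com/index.php?page=post&s=list&tags=skirt', 'pid', '0')
print(first)                                 # ...&pid=0&...
print(next_gallery_page(first, 'pid', 42))   # ...&pid=42&...
```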
"},{"location":"downloader_url_classes.html#api_links","title":"how do we link to APIs?","text":"If you know that a URL has an API backend, you can tell the client to use that API URL when it fetches data. The API URL needs its own URL Class.
To define the relationship, click the \"String Converter\" button, which gives you this:
You may have seen this panel elsewhere. It lets you convert a string to another over a number of transformation steps. The steps can be as simple as adding or removing some characters or applying a full regex substitution. For API URLs, you are mostly looking to isolate some unique identifying data (\"m/thread/16086187\" in this case) and then substituting that into the new API path. It is worth testing this with several different examples!
When the client links regular URLs to API URLs like this, it will still associate the human-pretty regular URL when it needs to display to the user and record 'known urls' and so on. The API is just a quick lookup when it actually fetches and parses the respective data.
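As a concrete, simplified example of the kind of transformation a String Converter performs, here is a regex substitution that turns the thread URL above into 4chan's JSON API equivalent. The API hostname and path follow 4chan's public API, but treat the snippet as an illustration of the technique rather than hydrus's stored rule:

```python
import re

def thread_url_to_api_url(url: str) -> str:
    # isolate 'board/thread/id' and substitute it into the API path
    return re.sub(
        r'^https://boards\.4chan\.org/(\w+)/thread/(\d+).*$',
        r'https://a.4cdn.org/\1/thread/\2.json',
        url,
    )

print(thread_url_to_api_url('https://boards.4chan.org/m/thread/16086187'))
# https://a.4cdn.org/m/thread/16086187.json
```

As the text says, it is worth testing a converter like this against several real examples before trusting it.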
"},{"location":"duplicates.html","title":"duplicates","text":"As files are shared on the internet, they are often resized, cropped, converted to a different format, altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some duplicates. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete. Along the way, it would be nice to merge your ratings and tags to the better files so you don't lose any work.
Finding and processing duplicates within a large collection is impossible to do by hand, so I have written a system to do the heavy lifting for you. It currently works on still images, but an extension for gifs and video is planned.
Hydrus finds potential duplicates using a search algorithm that compares images by their shape. Once these pairs of potentials are found, they are presented to you through a filter like the archive/delete filter to determine their exact relationship and if you want to make a further action, such as deleting the 'worse' file of a pair. All of your decisions build up in the database to form logically consistent groups of duplicates and 'alternate' relationships that can be used to infer future information. For instance, if you say that file A is a duplicate of B and B is a duplicate of C, A and C are automatically recognised as duplicates as well.
This all starts on--
"},{"location":"duplicates.html#duplicates_page","title":"the duplicates processing page","text":"On the normal 'new page' selection window, hit special->duplicates processing. This will open this page:
Let's go to the preparation page first:
The 'similar shape' algorithm works on distance. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be hairstyle or costume changes. You will be starting on distance 0 and should not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives.
If you are interested, the current version of this system uses a 64-bit phash to represent the image shape and a VPTree to search different files' phashes' relative hamming distance. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
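If the jargon is unfamiliar, the 'distance' here is a hamming distance: how many of the 64 bits of two perceptual hashes differ. A toy illustration (hydrus's actual phash generation and VPTree search are more involved):

```python
def hamming_distance(phash_a: int, phash_b: int) -> int:
    # count the bits that differ between two 64-bit perceptual hashes
    return bin(phash_a ^ phash_b).count('1')

a = 0x9C3E01F75A2D88B4          # a made-up 64-bit phash
b = a ^ 0b1001                  # the same hash with two bits flipped
print(hamming_distance(a, a))   # 0 -> 'exact match' territory
print(hamming_distance(a, b))   # 2 -> very close, likely a resize or re-encode
```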
Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.
Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.
Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.
"},{"location":"duplicates.html#duplicate_filtering_page","title":"the filtering page","text":"Processing duplicates can be real trudge-work if you do not set up a workflow you enjoy. It is a little slower than the archive/delete filter, and sometimes takes a bit more cognitive work. For many users, it is a good task to do while listening to a podcast or having a video going on another screen.
If you have a client with tens of thousands of files, you will likely have thousands of potential pairs. This can be intimidating, but do not worry--due to the A, B, C logical inferences as above, you will not have to go through every single one. The more information you put into the system, the faster the number will drop.
The filter has a regular file search interface attached. As you can see, it defaults to system:everything, but you can limit what files you will be working on simply by adding new search predicates. You might like to only work on files in your archive (i.e. that you know you care about to begin with), for instance. You can choose whether both files of the pair should match the search, or just one. 'creator:' tags work very well at cutting the search domain to something more manageable and consistent--try your favourite creator!
If you would like an example from the current search domain, hit the 'show some random potential pairs' button, and it will show two or more files that seem related. It is often interesting and surprising to see what it finds! The action buttons below allow for quick processing of these pairs and groups when convenient (particularly for large cg sets with 100+ alternates), but I recommend you leave these alone until you know the system better.
When you are ready, launch the filter.
"},{"location":"duplicates.html#duplicates_filter","title":"the duplicates filter","text":"We have not set up your duplicate 'merge' options yet, so do not get too into this. For this first time, just poke around, make some pretend choices, and then cancel out and choose to forget them.
Like the archive/delete filter, this uses quick mouse-clicks, keyboard shortcuts, or button clicks to action pairs. It presents two files at a time, labelled A and B, which you can quickly switch between just as in the normal media viewer. As soon as you action them, the next pair is shown. The two files will have their current zoom-size locked so they stay the same size (and in the same position) as you switch between them. Scroll your mouse wheel a couple of times and see if any obvious differences stand out.
Please note the hydrus media viewer does not currently work well with large resolutions at high zoom (it gets laggy and may have memory issues). Don't zoom in to 1600% and try to look at jpeg artifact differences on very large files, as this is simply not well supported yet.
The hover window on the right also presents a number of 'comparison statements' to help you make your decision. Green statements mean this current file is probably 'better', and red the opposite. Larger, older, higher-quality, more-tagged files are generally considered better. These statements have scores associated with them (which you can edit in file->options->duplicates), and the file of the pair with the highest score is presented first. If the files are duplicates, you can generally assume the first file you see, the 'A', is the better, particularly if there are several green statements.
The filter will need to occasionally checkpoint, saving the decisions so far to the database, before it can fetch the next batch. This allows it to apply inferred information from your current batch and reduce your pending count faster before serving up the next set. It will present you with a quick interstitial 'confirm/back' dialog just to let you know. This happens more often as the potential count decreases.
"},{"location":"duplicates.html#duplicates_decisions","title":"the decisions to make","text":"There are three ways a file can be related to another in the current duplicates system: duplicates, alternates, or false positive (not related).
False positive (not related) is the easiest. You will not see completely unrelated pairs presented very often in the filter, particularly at low search distances, but if the shape of face and hair and clothing happen to line up (or geometric shapes, often), the search system may make a false positive match. In this case, just click 'they are not related'.
Alternate relations are files that are not duplicates but obviously related in some way. Perhaps a costume change or a recolour. Hydrus does not have rich alternate support yet (but it is planned, and highly requested), so this relationship is mostly a 'holding area' for files that we will revisit for further processing in the future.
Duplicate files are of the exact same thing. They may be different resolutions, file formats, encoding quality, or one might even have a watermark, but they are fundamentally different views on the exact same art. As you can see with the buttons, you can select one file as the 'better' or say they are about the same. If the files are basically the same, there is no point stressing about which is 0.2% better--just click 'they are the same'. For better/worse pairs, you might have reason to keep both, but most of the time I recommend you delete the worse.
You can customise the shortcuts under file->shortcuts->duplicate_filter. The defaults are:
- Left-click: this is better, delete the other.
- Right-click: they are related alternates.
- Middle-click: Go back one decision.
- Enter/Escape: Stop filtering.
If two duplicates have different metadata like tags or archive status, you probably want to merge them. Cancel out of the filter and click the 'edit default duplicate metadata merge options' button:
By default, these options are fairly empty. You will have to set up what you want based on your services and preferences. Setting a simple 'copy all tags' is generally a good idea, and like/dislike ratings also often make sense. The settings for better and same quality should probably be similar, but it depends on your situation.
If you choose the 'custom action' in the duplicate filter, you will be presented with a fresh 'edit duplicate merge options' panel for the action you select and can customise the merge specifically for that choice. ('favourite' options will come here in the future!)
Once you are all set up here, you can dive into the duplicate filter. Please let me know how you get on with it!
"},{"location":"duplicates.html#future","title":"what now?","text":"The duplicate system is still incomplete. Now the db side is solid, the UI needs to catch up. Future versions will show duplicate information on thumbnails and the media viewer and allow quick-navigation to a file's duplicates and alternates.
For now, if you wish to see a file's duplicates, right-click it and select file relationships. You can review all its current duplicates, open them in a new page, appoint the new 'best file' of a duplicate group, and even mass-action selections of thumbnails.
You can also search for files based on the number of file relations they have (including when setting the search domain of the duplicate filter!) using system:file relationships. You can also search for best/not best files of groups, which makes it easy, for instance, to find all the spare duplicate files if you decide you no longer want to keep them.
I expect future versions of the system to also auto-resolve easy duplicate pairs, such as clearing out pixel-for-pixel png versions of jpgs.
"},{"location":"duplicates.html#game_cgs","title":"game cgs","text":"If you import a lot of game CGs, which frequently have dozens or hundreds of alternates, I recommend you set them as alternates by selecting them all and setting the status through the thumbnail right-click menu. The duplicate filter, being limited to pairs, needs to compare all new members of an alternate group to all other members once to verify they are not duplicates. This is not a big deal for alternates with three or four members, but game CGs provide an overwhelming edge case. Setting a group of thumbnails as alternate 'fixes' their alternate status immediately, discounting the possibility of any internate duplicates, and provides an easy way out of this situation.
"},{"location":"duplicates.html#duplicates_examples","title":"more information and examples","text":""},{"location":"duplicates.html#duplicates_examples_better_worse","title":"better/worse","text":"Which of two files is better? Here are some common reasons:
- higher resolution
- better image quality
- png over jpg for screenshots
- jpg over png for busy images
- jpg over png for pixel-for-pixel duplicates
- a better crop
- no watermark or site-frame or undesired blemish
- has been tagged by other people, so is likely to be the more 'popular'
However these are not hard rules--sometimes a file has a larger resolution or filesize due to a bad upscaling or encoding decision by the person who 'reinterpreted' it. You really have to look at it and decide for yourself.
Here is a good example of a better/worse pair:
The first image is better because it is a png (pixel-perfect pngs are always better than jpgs for screenshots of applications--note how obvious the jpg's encoding artifacts are on the flat colour background) and it has a slightly higher (original) resolution, making it less blurry. I presume the second went through some FunnyJunk-tier trash meme site to get automatically cropped to 960px height and converted to the significantly smaller jpeg. Whatever happened, let's drop the second and keep the first.
When both files are jpgs, differences in quality are very common and often significant:
Again, this is mostly due to some online service resizing and lowering quality to ease on their bandwidth costs. There is usually no reason to keep the lower quality version.
"},{"location":"duplicates.html#duplicates_examples_same","title":"same quality duplicates","text":"When are two files the same quality? A good rule of thumb is if you scroll between them and see no obvious differences, and the comparison statements do not suggest anything significant, just set them as same quality.
Here are two same quality duplicates:
There is no obvious difference between those two. The filesize is significantly different, so I suspect the smaller is a lossless png optimisation, but in the grand scheme of things, that doesn't matter so much. Many of the big content providers--Facebook, Google, Cloudflare--automatically 'optimise' the data that goes through their networks in order to save bandwidth. Although jpegs are often a slaughterhouse, with pngs it is usually harmless.
Given the filesize, you might decide that these are actually a better/worse pair--but if the larger image had tags and was the 'canonical' version on most boorus, the decision might not be so clear. You can choose better/worse and delete one randomly, but sometimes you may just want to keep both without a firm decision on which is best, so just set 'same quality' and move on. Your time is more valuable than a few dozen KB.
Sometimes, you will see pixel-for-pixel duplicate jpegs of very slightly different size, such as 787KB vs 779KB. The smaller of these is usually an exact duplicate that has had its internal metadata (e.g. EXIF tags) stripped by a program or website CDN. They are same quality unless you have a strong opinion on whether having internal metadata in a file is useful.
"},{"location":"duplicates.html#duplicates_examples_alternates","title":"alternates","text":"As I wrote above, hydrus's alternates system in not yet properly ready. It is important to have a basic 'alternates' relationship for now, but it is a holding area until we have a workflow to apply 'WIP'- or 'recolour'-type labels and present that information nicely in the media viewer.
Alternates are not of exactly the same thing, but one is a variant of the other or they are both descended from a common original. The precise definition is up to you, but it generally means something like:
- the files are recolours
- the files are alternate versions of the same image produced by the same or different artists (e.g. clean/messy or with/without hair ribbon)
- iterations on a close template
- different versions of a file's progress, such as the steps from the initial draft sketch to a final shaded version
Here are some recolours of the same image:
And some WIP:
And a costume change:
None of these are duplicates, but they are obviously related. The duplicate search will notice they are similar, so we should let the client know they are 'alternate'.
Here's a subtler case:
These two files are very similar, but try opening both in separate tabs and then flicking back and forth: the second's glove-string is further into the mouth, and it has improved chin shading, a more refined eye shape, and shaved pubic hair. It is simple to spot these differences in the client's duplicate filter when you scroll back and forth.
I believe the second is an improvement on the first by the same artist, so it is a WIP alternate. You might also consider it a 'better' improvement.
Here are three files you might or might not consider to be alternates:
These are all based on the same template--which is why the dupe filter found them--but they are not so closely related as those above, and the last one is joking about a different ideology entirely and might deserve to be in its own group. Ultimately, you might prefer just to give them some shared tag and consider them not alternates per se.
"},{"location":"duplicates.html#duplicates_examples_false_positive","title":"not related/false positive","text":"Here are two files that match false positively:
Despite their similar shape, they are neither duplicates nor of even the same topic. The only commonality is the medium. I would not consider them close enough to be alternates--just adding something like 'screenshot' and 'imageboard' as tags to both is probably the closest connection they have.
Recording the 'false positive' relationship is important to make sure the comparison does not come up again in the duplicate filter.
The incidence of false positives increases as you broaden the search distance--the less precise your search, the less likely it is to be correct. At distance 14, these files all match, but uselessly:
"},{"location":"duplicates.html#duplicates_advanced","title":"the duplicates system","text":"(advanced nonsense, you can skip this section. tl;dr: duplicate file groups keep track of their best quality file, sometimes called the King)
Hydrus achieves duplicate transitivity by treating duplicate files as groups. Although you action pairs, if you set (A duplicate B), that creates a group (A,B). Subsequently setting (B duplicate C) extends the group to be (A,B,C), and so (A duplicate C) is transitively implied.
The first version of the duplicate system attempted to record better/worse/same information for all files in a virtual duplicate group, but this proved very complicated, workflow-heavy, and not particularly useful. The new system instead appoints a single King as the best file of a group. All other files in the group are beneath the King and have no other relationship data retained.
This King represents the group in the duplicate filter (and in potential pairs, which are actually recorded between duplicate media groups--even if most of them at the outset only have one member). If the other file in a pair is considered better, it becomes the new King, but if it is worse or equal, it merges into the other members. When two Kings are compared, whole groups can merge!
Alternates are stored in a similar way, except the members are duplicate groups rather than individual files and they have no significant internal relationship metadata yet. If α, β, and γ are duplicate groups that each have one or more files, then setting (α alt β) and (β alt γ) creates an alternate group (α,β,γ), with the caveat that α and γ will still be sent to the duplicate filter once just to check they are not duplicates by chance. The specific file members of these groups, A, B, C and so on, inherit the relationships of their parent groups when you right-click on their thumbnails.
False positive relationships are stored between pairs of alternate groups, so they apply transitively between all the files of either side's alternate group. If (α alt β) and (ψ alt ω) and you apply (α fp ψ), then (α fp ω), (β fp ψ), and (β fp ω) are all transitively implied.
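For the curious, here is a very rough sketch of the group mechanics described above--one King per duplicate group, with whole groups merging as you action pairs. It deliberately ignores alternates, false positives, and the fact that real comparisons happen between Kings; it only shows why (A duplicate B) plus (B duplicate C) implies (A duplicate C):

```python
class DuplicateGroup:
    def __init__(self, king):
        self.king = king          # the best file of the group
        self.members = {king}

def set_better_than(groups, better, worse):
    ga, gb = groups[better], groups[worse]
    if ga is gb:
        return                    # already in the same group
    ga.members |= gb.members      # whole groups can merge
    ga.king = better              # the winning file is (or stays) King
    for f in gb.members:
        groups[f] = ga

groups = {f: DuplicateGroup(f) for f in 'ABC'}
set_better_than(groups, 'A', 'B')   # group (A,B), King A
set_better_than(groups, 'A', 'C')   # group (A,B,C), King A
print(groups['B'] is groups['C'])   # True -- B and C are duplicates by transitivity
print(groups['C'].king)             # 'A'
```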
More examples
The duplicates filter can be pretty tedious to work with. Pairs that have trivial differences are easy to resolve, but working through dozens of obvious resizes or pixel duplicates that all follow the same pattern can get boring.
If only there were some way to automate common situations! We could have hydrus solve these trivial duplicates in the background, leaving us with less, more interesting work to do.
"},{"location":"duplicates_auto_resolution.html#duplicates_auto-resolution","title":"duplicates auto-resolution","text":"This is a new system that I am still developing. The plan is to roll out a hardcoded rule that resolves jpeg and png pixel dupes and then iterate on the UI and workflow to let users add their own custom rules. If you try it, let me know how you find things!
So, let's start with a simple and generally non-controversial example: pixel duplicate jpegs and pngs. When you save a jpeg, you get some 'fuzzy' artifacts, but when you save a png, it is always pixel perfect. Thus, when you have a normal jpeg and a png that are pixel duplicates, you know, for certain, that the png is a copy of the jpeg. This happens most often when someone is posting from one application to another, or with a phone, and rather than uploading the source jpeg, they do 'copy image' and paste that into the upload box--the browser creates the accursed 'Clipboard.png', and we are thus overwhelmed with spam.
In this case, we always want to keep the (almost always smaller) jpeg and ditch the (bloated, derived) png, which in the duplicates system would be:
- A two-part duplicates search, for 'system:filetype is jpg' and 'system:filetype is png', with 'must be pixel dupes'.
- Arranging 'the jpeg is A, the png is B'
- Sending the normal duplicate action of 'set A as better than B, and delete B'.
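In data-shaped form, that rule might look something like the sketch below. The field names are made up for illustration--the real auto-resolution rule objects live in the UI shown next--but the three parts (search, arrangement, action) map directly onto the bullets above:

```python
# hypothetical structure, for illustration only
jpeg_png_pixel_dupe_rule = {
    'name': 'pixel-perfect png copies of jpegs',
    'search': {
        'file_a': ['system:filetype is jpeg'],
        'file_b': ['system:filetype is png'],
        'pairs_must_be': 'pixel duplicates',
    },
    'arrange': 'the jpeg is A, the png is B',
    'action': 'set A as better than B, and delete B',
}
```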
Let's check out the 'auto-resolution' tab under the duplicates filtering page:
(image)
The auto-resolution system lets you have multiple 'rules'. Each represents a search, a way of testing pairs, and then an action. Let's check the edit dialog:
(image of edit rules)
(image of edit rule, png vs jpeg)
Note that this adds the 'system:height/width > 128' predicates as a failsafe to ensure we are checking real images in this case, not tiny 16x16 icons where there might be a legitimate, accidental jpeg/png pixel dupe, and where the decision on what to keep is not so simple. Automated systems are powerful magic wands, and we should always be careful waving them around.
Talk about metadata conditional objects here.
Talk about the pair Comparator stuff, 4x filesize and so on. Might be more UI, so maybe a picture of the sub-panel.
Hydrus will work these rules in its normal background maintenance time. You can force them to work a bit harder if you want to catch up somewhere, but normally you can just leave them alone and they'll stay up to date with new imports.
"},{"location":"duplicates_auto_resolution.html#future","title":"future","text":"I will expand the Metadata Conditional to cover more tests, including most of the hooks in the duplicates filter summaries, like 'this has exif data'. And, assuming the trivial cases go well, I'd like to push toward less-certain comparions and have some sort of tools for 'A is at least 99.7% similar to B', which will help with resize comparisons and differentiating dupes from alternates.
I'd also eventually like auto-resolution to apply to files as they are imported, so, in the vein of 'previously deleted', you could have an instant import result of 'duplicate discarded: (rule name)'.
"},{"location":"faq.html","title":"FAQ","text":""},{"location":"faq.html#repositories","title":"What is a repository?","text":"A repository is a service in the hydrus network that stores a certain kind of information--files or tag mappings, for instance--as submitted by users all over the internet. Those users periodically synchronise with the repository so they know everything that it stores. Sometimes, like with tags, this means creating a complete local copy of everything on the repository. Hydrus network clients never send queries to repositories; they perform queries over their local cache of the repository's data, keeping everything confined to the same computer.
"},{"location":"faq.html#tags","title":"What is a tag?","text":"Wikipedia
A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
A good word for the connection of a particular tag to a particular file is mapping.
Hydrus is designed with the intention that tags are for searching, not describing. Workflows and UI are tuned for finding files and other similar files (e.g. by the same artist), and while it is possible to have nice metadata overlays around files, this is not considered their chief purpose. Trying to have 'perfect' descriptions for files is often a rabbit-hole that can consume hours of work with relatively little demonstrable benefit.
All tags are automatically converted to lower case. 'Sunset Drive' becomes 'sunset drive'. Why?
- Although it is more beautiful to have 'The Lord of the Rings' rather than 'the lord of the rings', there are many, many special cases where style guides differ on which words to capitalise.
- As 'The Lord of the Rings' and 'the lord of the rings' are semantically identical, it is natural to search in a case insensitive way. When case does not matter, what point is there in recording it?
Furthermore, leading and trailing whitespace is removed, and multiple whitespace is collapsed to a single character.
' yellow dress '
becomes
'yellow dress'
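Those cleaning rules are easy to express; here is a minimal sketch (not hydrus's actual tag-cleaning code, which handles more edge cases):

```python
import re

def clean_tag(tag: str) -> str:
    # lowercase, trim the ends, collapse runs of whitespace to one space
    return re.sub(r'\s+', ' ', tag.strip()).lower()

print(clean_tag('Sunset Drive'))        # 'sunset drive'
print(clean_tag(' yellow   dress '))    # 'yellow dress'
```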
"},{"location":"faq.html#namespaces","title":"What is a namespace?","text":"A namespace is a category that in hydrus prefixes a tag. An example is 'person' in the tag 'person:ron paul'--it lets people and software know that 'ron paul' is a name. You can create any namespace you like; just type one or more words and then a colon, and then the next string of text will have that namespace.
The hydrus client gives namespaces different colours so you can pick out important tags more easily in a large list, and you can also search by a particular namespace, even creating complicated predicates like 'give all files that do not have any character tags', for instance.
"},{"location":"faq.html#filenames","title":"Why not use filenames and folders?","text":"As a retrieval method, filenames and folders are less and less useful as the number of files increases. Why?
- A filename is not unique; did you mean this \"04.jpg\" or this \"04.jpg\" in another folder? Perhaps \"04 (3).jpg\"?
- A filename is not guaranteed to describe the file correctly, e.g. hello.jpg
- A filename is not guaranteed to stay the same, meaning other programs cannot rely on the filename address being valid or even returning the same data every time.
- A filename is often--for ridiculous reasons--limited to a certain prohibitive character set. Even when utf-8 is supported, some arbitrary ascii characters are usually not, and different localisations, operating systems and formatting conventions only make it worse.
- Folders can offer context, but they are clunky and time-consuming to change. If you put each chapter of a comic in a different folder, for instance, reading several volumes in one sitting can be a pain. Nesting many folders adds navigation-latency and tends to induce less informative \"04.jpg\"-type filenames.
So, the client tracks files by their hash. This technical identifier easily eliminates duplicates and permits the database to robustly attach other metadata like tags and ratings and known urls and notes and everything else, even across multiple clients and even if a file is deleted and later imported.
As a general rule, I suggest you not set up hydrus to parse and display all your imported files' filenames as tags. 'image.jpg' is useless as a tag. Shed the concept of filenames as you would chains.
"},{"location":"faq.html#external_files","title":"Can the client manage files from their original locations?","text":"When the client imports a file, it makes a quickly accessible but human-ugly copy in its internal database, by default under install_dir/db/client_files. When it needs to access that file again, it always knows where it is, and it can be confident it is what it expects it to be. It never accesses the original again.
This storage method is not always convenient, particularly for those who are hesitant about converting to using hydrus completely and also do not want to maintain two large copies of their collections. The question comes up--\"can hydrus track files from their original locations, without having to copy them into the db?\"
The technical answer is, \"This support could be added,\" but I have decided not to, mainly because:
- Files stored in locations outside of hydrus's responsibility can change or go missing (particularly if a whole parent folder is moved!), which erodes the assumptions it makes about file access, meaning additional checks would have to be added before important operations, often with no simple recovery.
- External duplicates would not be merged, and the file system would have to be extended to handle pointless 1->n hash->path relationships.
- Many regular operations--like figuring out whether orphaned files should be physically deleted--are less simple.
- Backing up or restoring a distributed external file system is much more complicated.
- It would require more code to maintain and would mean a laggier db and interface.
- Hydrus is an attempt to get away from files and folders--if a collection is too large and complicated to manage using explorer, what's the point in supporting that old system?
It is not unusual for new users who ask for this feature to find their feelings change after getting more experience with the software. If desired, path text can be preserved as tags using regexes during import, and getting into the swing of searching by metadata rather than navigating folders often shows how very effective the former is over the latter. Most users eventually import most or all of their collection into hydrus permanently, deleting their old folder structure as they go.
For this reason, if you are hesitant about doing things the hydrus way, I advise you try running it on a smaller subset of your collection, say 5,000 files, leaving the original copies completely intact. After a month or two, think about how often you used hydrus to look at the files versus navigating through folders. If you barely used the folders, you probably do not need them any more, but if you used them a lot, then hydrus might not be for you, or it might only be for some sorts of files in your collection.
"},{"location":"faq.html#sqlite","title":"Why use SQLite?","text":"Hydrus uses SQLite for its database engine. Some users who have experience with other engines such as MySQL or PostgreSQL sometimes suggest them as alternatives. SQLite serves hydrus's needs well, and at the moment, there are no plans to change.
Since this question has come up frequently, a user has written an excellent document talking about the reasons to stick with SQLite. If you are interested in this subject, please check it out here:
https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md
"},{"location":"faq.html#hashes","title":"What is a hash?","text":"Wikipedia
Hashes are a subject you usually have to be a software engineer to find interesting. The simple answer is that they are unique names for things. Hashes make excellent identifiers inside software, as you can safely assume that f099b5823f4e36a4bd6562812582f60e49e818cf445902b504b5533c6a5dad94 refers to one particular file and no other. In the client's normal operation, you will never encounter a file's hash. If you want to see a thumbnail bigger, double-click it; the software handles the mathematics.
For those who are interested: hydrus uses SHA-256, which spits out 32-byte (256-bit) hashes. The software stores the hash densely, as 32 bytes, only encoding it to 64 hex characters when the user views it or copies to clipboard. SHA-256 is not perfect, but it is a great compromise candidate; it is secure for now, it is reasonably fast, it is available for most programming languages, and newer CPUs perform it more efficiently all the time.
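If you want to see what this looks like in practice, the standard library can reproduce it. A short sketch (the file path is a hypothetical example) of hashing a file the same way:

```python
import hashlib

def sha256_hex(path: str) -> str:
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        # read in chunks so large files do not need to fit in memory
        for chunk in iter(lambda: f.read(1024 * 1024), b''):
            h.update(chunk)
    return h.digest().hex()   # 32 raw bytes stored, 64 hex characters shown

# print(sha256_hex('some_image.jpg'))  # hypothetical path
```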
"},{"location":"faq.html#access_keys","title":"What is an access key?","text":"The hydrus network's repositories do not use username/password, but instead a single strong identifier-password like this:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3
These hex numbers give you access to a particular account on a particular repository, and are often combined like so:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871
They are long enough to be impossible to guess, and also randomly generated, so they reveal nothing personally identifying about you. Many people can use the same access key (and hence the same account) on a repository without consequence, although they will have to share any bandwidth limits, and if one person screws around and gets the account banned, everyone will lose access.
The access key is the account. Do not give it to anyone you do not want to have access to the account. An administrator will never need it; instead they will want your account id.
"},{"location":"faq.html#account_ids","title":"What is an account id?","text":"This is another long string of random hexadecimal that identifies your account without giving away access. If you need to identify yourself to a repository administrator (say, to get your account's permissions modified), you will need to tell them your account id. You can copy it to your clipboard in services->review services.
"},{"location":"faq.html#service_isolation","title":"Why does the file I deleted and then re-imported still have its tags?","text":"Hydrus splits its different abilities and domains (e.g. the list of files on your disk, or the tag mappings in 'my tags', or your files' notes) into separate services. You can see these in review services and manage services. Although the services of the same type may interact (e.g. deleting a file from one service might send that file to the 'trash' service, or adding tag parents to one tag service might implicate tags on another), those of different types are generally completely independent. Your tags don't care where the files they map to are.
So, when you delete a file from 'my files', none of its tag mappings in 'my tags' change--they remain attached to the 'ghost' of the deleted file. Your notes, ratings, and known URLs are the same (URLs is important, since it lets the client skip URLs for files you previously deleted). If you re-import the file, it will have everything it did before, with only a couple of pertinent changes like, obviously, import time.
This is an important part of how the PTR works--when you sync with the PTR, your client downloads a couple billion mappings for files you do not have yet. Then, when you happen to import one of those files, it appears in your importer with its PTR tags 'apparently' already set--in truth, it always had them.
When you feel like playing with some more advanced concepts, turn on help->advanced mode and open a new search page. Change the file domain from 'my files' to 'all known files' or 'deleted from my files' and start typing a common tag--you'll get autocomplete results with counts! You can even run the search, and you'll get a ton of 'non-local' and therefore non-viewable files that are typically given a default hydrus thumbnail. These are files that your client is aware of, but does not currently have. You can run the manage x dialogs and edit the metadata of these ghost files just as you can your real ones. The only thing hydrus ever needs to attach metadata to a file is the file's SHA256 hash.
If you really want to delete the tags or other data for some files you deleted, then:
- If the job is small, do a search for the files inside 'deleted from my local files' (or 'all known files' if you did not leave a deletion record) and then hit Ctrl+A->manage tags and manually delete the tags there.
- If the job is very large, then make a backup and hit up tags->migrate tags. You can select the tag service x tag mappings for all files in 'deleted from my local files' and then make the action to delete from x again.
- If the job is complicated, then note that you can open the tags->migrate tags dialog from manage tags, and it will only apply to the files that booted manage tags.
Not really. Unless your situation involves millions of richly locally tagged files and a gigantic deleted:kept file ratio, don't worry about it.
"},{"location":"faq.html#does_the_metadata_for_files_i_deleted_mean_there_is_some_kind_of_a_permanent_record_of_which_files_my_client_has_heard_about_andor_seen_directly_even_if_i_purge_the_deletion_record","title":"Does the metadata for files I deleted mean there is some kind of a permanent record of which files my client has heard about and/or seen directly, even if I purge the deletion record?","text":"Yes. I am working on updating the database infrastructure to allow a full purge, but the structure is complicated, so it will take some time. If you are afraid of someone stealing your hard drive and matriculating your sordid MLP collection (or, in this case, the historical log of horrors that you rejected), do some research into drive encryption. Hydrus runs fine off an encrypted disk.
"},{"location":"faq.html#i_just_imported_files_from_my_hard_drive_collection_how_can_i_get_their_tags_from_the_boorus","title":"I just imported files from my hard drive collection. How can I get their tags from the boorus?","text":"The problem of 'what tags should these files have?' is technically difficult to solve, and there isn't a fast and easy way to query a booru and say 'hey, what are your tags for this?', particularly en masse. It is even more difficult to keep up with updates (e.g. someone adding a tag to a file some months or years after it was uploaded). This is the main problem I designed the PTR to solve.
If you cannot or do not want to devote the local resources to sync with the PTR, there are a few hacky ways to perform tag lookups, mostly with manual hash-based lookups. The big boorus support file search based on 'md5' hash, so there are ways to build a workflow where you can 'search' a booru or iqdb for one file at a time to see if there is a hit, and then get tags as if you were downloading it. An old system in the client called 'file lookup scripts' works like this, in the manage tags dialog, and some users have figured out ways to make it work with some clever downloaders.
Be careful with these systems. They tend to be slow and use a lot of resources serverside, so you will be rude if you hit them too hard. They work for a handful of files every now and then, but please do not set up jobs of many many thousands of files, and absolutely do not repeat the job for the same files regularly--you will just waste a lot of CPU and network time for everyone, and only gain a couple of tags in the process. Note that the hash-based lookups only work if your files have not changed since being downloaded; if you have scaled them, stripped metadata, or optimised quality, then they will count as new files and the hashes will have changed, and you will need to think about services like iqdb or saucenao, or ultimately the hydrus duplicate resolution system.
That said, here is a user guide on how to perform various kinds of file lookups.
If you are feeling adventurous, you can also explore the newer AI-tagging tools that users are working on.
Ultimately, though, a good and simple way to backfill your files' tags is just rely on normal downloading workflows. Try downloading your favourite artists (and later set up subscriptions) and you will naturally get files you like, with tags, and if, by (expected) serendipity, a file on the site is the same as one you already imported, hydrus will add the tags to it retroactively.
"},{"location":"faq.html#encryption","title":"Does Hydrus run ok off an encrypted drive partition?","text":"Yes! Both the database and your files should be fine on any of the popular software solutions. These programs give your OS a virtual drive that on my end looks and operates like any other. I have yet to encounter one that SQLite has a problem with. Make sure you don't have auto-dismount set--or at least be hawkish that it will never trigger while hydrus is running--or you could damage your database.
Drive encryption is a good idea for all your private things. If someone steals your laptop or USB stick, it means you only have to deal with frustration and replacement expenses (rather than also a nightmare of anxiety and identity-loss as some bad guy combs through all your things).
If you don't know how drive encryption works, search it up and have a play with a spare USB stick or a small 256MB file partition. Veracrypt is a popular and easy program, but there are several solutions. Get some practice and take it seriously, since if you act foolishly you can really screw yourself (e.g. locking yourself out of the only copy of data you have left because you forgot the password). Make sure you have a good plan, reliable (encrypted) backups, and a password manager.
"},{"location":"faq.html#delays","title":"Why can my friend not see what I just uploaded?","text":"The repositories do not work like conventional search engines; it takes a short but predictable while for changes to propagate to other users.
The client's searches only ever happen over its local cache of what is on the repository. Any changes you make will be delayed for others until their next update occurs. At the moment, the update period is 100,000 seconds, which is about 1 day and 4 hours.
"},{"location":"filetypes.html","title":"Supported Filetypes","text":"This is a list of all filetypes Hydrus can import. Hydrus determines the filetype based on examining the file itself rather than the extension or MIME type.
The filetype for a file can be overridden with `manage -> force filetype` in the context menu for a file.
"},{"location":"filetypes.html#images","title":"Images","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
|---|---|---|---|---|---|
| jpeg | .jpeg | image/jpeg | ✅ | ✅ | |
| png | .png | image/png | ✅ | ✅ | |
| static gif | .gif | image/gif | ✅ | ✅ | |
| webp | .webp | image/webp | ✅ | ✅ | |
| avif | .avif | image/avif | ✅ | ✅ | |
| bitmap | .bmp | image/bmp | ✅ | ✅ | |
| heic | .heic | image/heic | ✅ | ✅ | |
| heif | .heif | image/heif | ✅ | ✅ | |
| icon | .ico | image/x-icon | ✅ | ✅ | |
| qoi | .qoi | image/qoi | ✅ | ✅ | Quite OK Image Format |
| tiff | .tiff | image/tiff | ✅ | ✅ | |
"},{"location":"filetypes.html#animations","title":"Animations","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
|---|---|---|---|---|---|
| animated gif | .gif | image/gif | ✅ | ✅ | |
| apng | .apng | image/apng | ✅ | ✅ | |
| animated webp | .webp | image/webp | ✅ | ✅ | |
| avif sequence | .avifs | image/avif-sequence | ✅ | ✅ | |
| heic sequence | .heics | image/heic-sequence | ✅ | ✅ | |
| heif sequence | .heifs | image/heif-sequence | ✅ | ✅ | |
| ugoira | .zip | application/zip | ✅ | ⚠️ | More info |
"},{"location":"filetypes.html#ugoira","title":"Ugoira","text":"Pixiv Ugoira format is a custom animation format used by Pixiv. The Pixiv API provides a list of frame files (normally JPEG or PNG) and their durations. The frames can be stored in a ZIP file along with a JSON file containing the frame and duration information. A zip file containing images with 6 digit zero-padded filenames will be identified as a Ugoira file in hydrus.
If there are no frame durations provided hydrus will assume each frame should last 125ms. Hydrus will look inside the zip for a file called `animation.json` and try to parse it as the 2 most common metadata formats that PixivUtil and gallery-dl generate. The Ugoira file will only have a duration in the database if it contains a valid `animation.json`.
When played hydrus will first attempt to use the `animation.json` file, but if that does not exist, it will look for notes containing frame delays. First it looks for a note named `ugoira json` and attempts to read it like the `animation.json`, then it looks for a note called `ugoira frame delay array`, which should be a note containing a simple JSON array, for example: `[90, 90, 40, 90]`.
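For a feel of what the client is looking for, here is a hedged sketch that pulls frame delays out of a ugoira zip. It assumes a top-level 'frames' list with 'delay' values in `animation.json`, which is one common layout; real files vary, and hydrus itself falls back to 125ms per frame or to the notes described above when nothing usable is found:

```python
import json
import zipfile

def ugoira_frame_delays(path):
    with zipfile.ZipFile(path) as z:
        if 'animation.json' not in z.namelist():
            return None                      # no metadata in the zip
        data = json.loads(z.read('animation.json'))
    # assumed layout: {"frames": [{"file": "000000.jpg", "delay": 90}, ...]}
    return [frame.get('delay', 125) for frame in data.get('frames', [])]

# print(ugoira_frame_delays('12345678_ugoira.zip'))  # e.g. [90, 90, 40, 90]
```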
"},{"location":"filetypes.html#video","title":"Video","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
|---|---|---|---|---|---|
| mp4 | .mp4 | video/mp4 | ✅ | ✅ | |
| webm | .webm | video/webm | ✅ | ✅ | |
| matroska | .mkv | video/x-matroska | ✅ | ✅ | |
| avi | .avi | video/x-msvideo | ✅ | ✅ | |
| flv | .flv | video/x-flv | ✅ | ✅ | |
| quicktime | .mov | video/quicktime | ✅ | ✅ | |
| mpeg | .mpeg | video/mpeg | ✅ | ✅ | |
| ogv | .ogv | video/ogg | ✅ | ✅ | |
| realvideo | .rm | video/vnd.rn-realvideo | ✅ | ✅ | |
| wmv | .wmv | video/x-ms-wmv | ✅ | ✅ | |
"},{"location":"filetypes.html#audio","title":"Audio","text":"
| Filetype | Extension | MIME type | Viewable in Hydrus | Notes |
|---|---|---|---|---|
| mp3 | .mp3 | audio/mp3 | ✅ | |
| ogg | .ogg | audio/ogg | ✅ | |
| flac | .flac | audio/flac | ✅ | |
| m4a | .m4a | audio/mp4 | ✅ | |
| matroska audio | .mkv | audio/x-matroska | ✅ | |
| mp4 audio | .mp4 | audio/mp4 | ✅ | |
| realaudio | .ra | audio/vnd.rn-realaudio | ✅ | |
| tta | .tta | audio/x-tta | ✅ | |
| wave | .wav | audio/x-wav | ✅ | |
| wavpack | .wv | audio/wavpack | ✅ | |
| wma | .wma | audio/x-ms-wma | ✅ | |
"},{"location":"filetypes.html#applications","title":"Applications","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
|---|---|---|---|---|---|
| flash | .swf | application/x-shockwave-flash | ✅ | ❌ | |
| pdf | .pdf | application/pdf | ✅ | ❌ | 300 DPI assumed for resolution. No thumbnails for encrypted PDFs. |
| epub | .epub | application/epub+zip | ❌ | ❌ | |
| djvu | .djvu | image/vnd.djvu | ❌ | ❌ | |
| docx | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | ❌ | ❌ | |
| xlsx | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | ❌ | ❌ | |
| pptx | .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation | ✅ | ❌ | 300 DPI assumed for resolution. Thumbnail only if embedded in the document |
| doc | .doc | application/msword | ❌ | ❌ | |
| xls | .xls | application/vnd.ms-excel | ❌ | ❌ | |
| ppt | .ppt | application/vnd.ms-powerpoint | ❌ | ❌ | |
| rtf | .rtf | application/rtf | ❌ | ❌ | |
"},{"location":"filetypes.html#image_project_files","title":"Image Project Files","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
|---|---|---|---|---|---|
| clip | .clip | application/clip¹ | ✅ | ❌ | Clip Studio Paint |
| krita | .kra | application/x-krita | ✅ | ✅ | Krita. Hydrus shows the embedded preview image if present in the file. |
| procreate | .procreate | application/x-procreate¹ | ✅ | ❌ | Procreate app |
| psd | .psd | image/vnd.adobe.photoshop | ✅ | ✅ | Adobe Photoshop. Hydrus shows the embedded preview image if present in the file. |
| sai2 | .sai2 | application/sai2¹ | ❌ | ❌ | PaintTool SAI2 |
| svg | .svg | image/svg+xml | ✅ | ❌ | |
| xcf | .xcf | application/x-xcf | ❌ | ❌ | GIMP |
"},{"location":"filetypes.html#archives","title":"Archives","text":"
| Filetype | Extension | MIME type | Thumbnails | Notes |
|---|---|---|---|---|
| cbz | .cbz | application/vnd.comicbook+zip | ✅ | A zip file containing images with incrementing numbers in their filenames will be identified as a cbz file. The code for identifying a cbz file is in `hydrus/core/files/HydrusArchiveHandling.py` |
| 7z | .7z | application/x-7z-compressed | ❌ | |
| gzip | .gz | application/gzip | ❌ | |
| rar | .rar | application/vnd.rar | ❌ | |
| zip | .zip | application/zip | ❌ | |

1. This filetype doesn't have an official or de facto media type, the one listed was made up for Hydrus.
This page serves as a checklist or overview for the getting started part of Hydrus. It is recommended to read at least all of the getting started pages, but if you want to head to some specific section directly go ahead and do so.
"},{"location":"gettingStartedOverview.html#the_client","title":"The client","text":"Have a look at getting started with files to get an overview of the Hydrus client.
"},{"location":"gettingStartedOverview.html#local_files","title":"Local files","text":"If you already have many local files, either downloaded by hand or by some other downloader tool, head to the getting started importing section to begin importing them.
"},{"location":"gettingStartedOverview.html#downloading","title":"Downloading","text":"If you want to download with Hydrus, check out getting started with downloading. If you want to add the ability to download from sites not already available in Hydrus by default, check out adding new downloaders for how and a link to a user-maintained archive of downloaders.
"},{"location":"gettingStartedOverview.html#tags_and_ratings","title":"Tags and ratings","text":"If you have imported and/or downloaded some files and want to get started searching and tagging see searching and sorting and getting started with ratings.
It is also worth having a look at siblings for when you want to consolidate different tags that all mean the same thing, common misspellings, or preferential differences into one tag.
Parents are for when you want a tag to always add another tag. Commonly used for characters since you would usually want to add the series they're from too.
"},{"location":"gettingStartedOverview.html#duplicates","title":"Duplicates","text":"Have a lot of very similar looking pictures because of one reason or another? Have a look at duplicates, Hydrus' duplicates finder and filtering tool.
"},{"location":"gettingStartedOverview.html#api","title":"API","text":"Hydrus has an API that lets external tools connect to it. See API for how to turn it on and a list of some of these tools.
"},{"location":"getting_started_downloading.html","title":"Getting started with downloading","text":"The hydrus client has a sophisticated and completely user-customisable download system. It can pull from any booru or regular gallery site or imageboard, and also from some special examples like twitter and tumblr. A single file or URL to massive imports, the downloader can handle it all. A fresh install will by default have support for the bigger sites, but it is possible, with some work, for any user to create a new shareable downloader for a new site.
The downloader is highly parallelisable, and while the default bandwidth rules should stop you from running too hot and downloading so much at once that you annoy the servers you are downloading from, there are no brakes in the program on what you can get.
Danger
It is very important that you take this slow. Many users get overexcited with their new ability to download 500,000 files and then do so, only discovering later that 98% of what they got was junk that they now have to wade through. Figure out what workflows work for you, how fast you process files, what content you actually want, how much bandwidth and hard drive space you have, and prioritise and throttle your incoming downloads to match. If you can realistically only archive/delete filter 50 files a day, there is little benefit to downloading 500 new files a day. START SLOW.
It also takes a decent whack of CPU to import a file. You'll usually never notice this with just one hard drive import going, but if you have twenty different download queues all competing for database access and individual 0.1-second hits of heavy CPU work, you will discover your client starts to judder and lag. Keep it in mind, and you'll figure out what your computer is happy with. I also recommend you try to keep your total loaded files/urls to be under 20,000 to keep things snappy. Remember that you can pause your import queues, if you need to calm things down a bit.
"},{"location":"getting_started_downloading.html#downloader_types","title":"Downloader types","text":"There are a number of different downloader types, each with its own purpose:
- URL download: Intended for single posts or images. (Works with the API)
- Gallery: For big download jobs such as an artist's catalogue or everything with a given tag on a booru.
- Subscriptions: Repeated gallery jobs, for keeping up to date with an artist or tag. Use a gallery downloader to get everything and a subscription to keep updated.
- Watcher: Imageboard thread downloader, for sites such as 4chan and 8chan. (Works with the API)
- Simple downloader: Intended for simple one-off jobs like grabbing all linked images in a page.
"},{"location":"getting_started_downloading.html#url_download","title":"URL download","text":"The url downloader works like the gallery downloader but does not do searches. You can paste downloadable URLs to it, and it will work through them as one list. Dragging and dropping recognisable URLs onto the client (e.g. from your web browser) will also spawn and use this downloader.
The button next to the input field lets you paste multiple URLs at once such as if you've copied from a document or browser bookmarks. The URLs need to be newline separated.
"},{"location":"getting_started_downloading.html#api","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any non-watchable URLs sent to Hydrus through them will end up in an URL downloader page, the specifics depending on the program's settings. You can't use this to force Hydrus to download paged galleries since the URL downloader page doesn't support traversing to the next page, use the gallery downloader for this.
"},{"location":"getting_started_downloading.html#gallery_download","title":"Gallery download","text":"The gallery page can download from multiple sources at the same time. Each entry in the list represents a basic combination of two things:
- Source: The site you are getting from. Safebooru or Danbooru or Deviant Art or twitter or anywhere else. In the example image this is the button labelled `artstation artist lookup`.
- Query text: Something like 'contrapposto' or 'blonde_hair blue_eyes' or an artist name like 'incase'. Whatever is searched on the site to return a list of ordered media. In the example image this is the text field with `artist username` in it.

So, when you want to start a new download, you first select the source with the button and then type in a query in the text box and hit enter. The download will soon start and fill in information, and thumbnails should stream in, just like the hard drive importer. The downloader typically works by walking through the search's gallery pages one by one, queueing up the found files for later download. There are several intentional delays built into the system, so do not worry if work seems to halt for a little while--you will get a feel for hydrus's 'slow persistent growth' style with experience.
Do a test download now, for fun! Pause its gallery search after a page or two, and then pause the file import queue after a dozen or so files come in.
The thumbnail panel can only show results from one queue at a time, so double-click on an entry to 'highlight' it, which will show its thumbs and also give more detailed info and controls in the 'highlighted query' panel. I encourage you to explore the highlight panel over time, as it can show and do quite a lot. Double-click again to 'clear' it.
It is a good idea to 'test' larger downloads, either by visiting the site itself for that query, or just waiting a bit and reviewing the first files that come in. Just make sure that you are getting what you thought you would, whether that be verifying that the query text is correct or that the site isn't only giving you bloated gifs or other bad quality files. The 'file limit', which stops the gallery search after the set number of files, is also great for limiting fishing expeditions (such as overbroad searches like 'wide_hips', which on the bigger boorus have 100k+ results and return variable quality). If the gallery search runs out of new files before the file limit is hit, the search will naturally stop (and the entry in the list should gain a ⏹ 'stop' symbol).
Note that some sites only serve 25 or 50 pages of results, despite their indices suggesting hundreds. If you notice that one site always bombs out at, say, 500 results, it may be due to a decision on their end. You can usually test this by visiting the pages hydrus tried in your web browser.
In general, particularly when starting out, artist searches are best. They are usually fewer than a thousand files and have fairly uniform quality throughout.
"},{"location":"getting_started_downloading.html#subscriptions","title":"Subscriptions","text":"Let's say you found an artist you like. You downloaded everything of theirs from some site, but every week, one or two new pieces is posted. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.
Subscriptions are a way to automatically recheck a good query in future, to keep up with new files. Many users come to use them. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.
Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.
Note
The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it will not work reliably.
It is important to note that while subscriptions can have multiple queries (even hundreds!), they generally only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations.
"},{"location":"getting_started_downloading.html#setting_up_subscriptions","title":"Setting up subscriptions","text":"Here's the dialog, which is under network->manage subscriptions:
This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).
Before we trip over the advanced buttons here, let's zoom in on the actual subscription:
Danger
Do not change the max number of new files options until you know exactly what they do and have a good reason to alter them!
This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it.
Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--\"hong_soon-jae\" and \"houtengeki\". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they really are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader.
Warning
Subscription syncs are somewhat fragile. Do not try to play with the limits or checker options to download a whole 5,000 file query in one go--if you want everything for a query, run it in the manual downloader and get everything, then set up a normal sub for new stuff. There is no benefit to having a 'large' subscription, and it will trim itself down in time anyway.
You might want to put subscriptions off until you are more comfortable with galleries. There is more help here.
"},{"location":"getting_started_downloading.html#watchers","title":"Watchers","text":"If you are an imageboard user, try going to a thread you like and drag-and-drop its URL (straight from your web browser's address bar) onto the hydrus client. It should open up a new 'watcher' page and import the thread's files!
With only one URL to check, watchers are a little simpler than gallery searches, but as that page is likely receiving frequent updates, it checks it over and over until it dies. By default, the watcher's 'checker options' will regulate how quickly it checks based on the speed at which new files are coming in--if a thread is fast, it will check frequently; if it is running slow, it may only check once per day. When a thread falls below a critical posting velocity or 404s, checking stops.
In general, you can leave the checker options alone, but you might like to revisit them if you are always visiting faster or slower boards and find you are missing files or getting DEAD too early.
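To make the 'check speed follows posting speed' idea concrete, here is a toy Python sketch. The one-file-per-check target and the one-hour/one-week clamps are made-up example numbers, not hydrus's actual checker options defaults.

```python
# Toy model of an adaptive checker: fast threads get short check periods,
# slow threads get long ones. Example numbers only, not hydrus's defaults.
def next_check_period(new_files: int, seconds_covered: int,
                      files_per_check: int = 1,
                      min_period: int = 60 * 60,
                      max_period: int = 7 * 24 * 60 * 60) -> int:
    if new_files <= 0:
        return max_period
    seconds_per_file = seconds_covered / new_files
    return int(max(min_period, min(seconds_per_file * files_per_check, max_period)))

print(next_check_period(30, 6 * 3600))   # busy thread: 720s, clamped up to 3600
print(next_check_period(2, 3 * 86400))   # slow thread: 129600s (1.5 days)
```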
"},{"location":"getting_started_downloading.html#api_1","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any watchable URLs sent to Hydrus through them will end up in a watcher page, the specifics depending on the program's settings.
"},{"location":"getting_started_downloading.html#simple_downloader","title":"Simple downloader","text":"The simple downloader will do very simple parsing for unusual jobs. If you want to download all the images in a page, or all the image link destinations, this is the one to use. There are several default parsing rules to choose from, and if you learn the downloader system yourself, it will be easy to make more.
"},{"location":"getting_started_downloading.html#import_options","title":"Import options","text":"Every importer in Hydrus has some 'import options' that change what is allowed, what is blacklisted, and whether tags or notes should be saved.
In previous versions these were split into completely different windows called `file import options` and `tag import options`, so if you see those anywhere, this is what they're talking about and not some hidden menu somewhere.

Importers that download from websites rely on a flexible 'defaults' system, so you do not have to set them up every time you start a new downloader. While you should play around with your import options, once you know what works for you, you should set that as the default under network->downloaders->manage default import options. You can set them for all file posts generally, all watchers, and for specific sites as well.
"},{"location":"getting_started_downloading.html#file_import_options","title":"File import options","text":"This deals with the files being downloaded and what should happen to them. There's a few more tickboxes if you turn on advanced mode.
- pre-import checks: Pretty self-explanatory for the most part. If you want to redownload previously deleted files, turning off `exclude previously deleted files` will have Hydrus ignore deletion status. A few of the options have more information if you hover over them.
- import destinations: See multiple file services, an advanced feature.
- post import actions: See the files section on filtering for the first option; the other two have information if you hover over them.
"},{"location":"getting_started_downloading.html#tag_parsing","title":"Tag Parsing","text":"By default, hydrus now starts with a local tag service called 'downloader tags' and it will parse (get) all the tags from normal gallery sites and put them in this service. You don't have to do anything, you will get some decent tags. As you use the client, you will figure out which tags you like and where you want them. On the downloader page, click `import options`:

This is an important dialog, although you will not need to use it much. It governs which tags are parsed and where they go. To keep things easy to manage, a new downloader will refer to the 'default' tag import options for a website, but for now let's set some values just for this downloader:
You can see that each tag service on your client has a separate section. If you add the PTR, that will get a new box too. A new client is set to get all tags for 'downloader tags' service. Things can get much more complicated. Have a play around with the options here as you figure things out. Most of the controls have tooltips or longer explainers in sub-dialogs, so don't be afraid to try things.
It is easy to get tens of thousands of tags by downloading this way. Different sites offer different kinds and qualities of tags, and the client's downloaders (which were designed by me, the dev, or a user) may parse all or only some of them. Many users like to just get everything on offer, but others only ever want, say, `creator`, `series`, and `character` tags. If you feel brave, click that 'all tags' button, which will take you into hydrus's advanced 'tag filter', which allows you to select which of the incoming list of tags will be added.

The blacklist button will let you skip downloading files that have certain tags (perhaps you would like to auto-skip all images with `gore`, `scat`, or `diaper`?), again using the tag filter, while the whitelist enables you to only allow files that have at least one of a set of tags. The 'additional tags' adds some fixed personal tags to all files coming in--for instance, you might like to add 'process into favourites' to your 'my tags' for some query you really like so you can find those files again later and process them separately. That little 'cog' icon button can also do some advanced things.

Warning
The file limit and import options on the upper panel of a gallery or watcher page, if changed, will only apply to new queries. If you want to change the options for an existing queue, either do so on its highlight panel below or use the 'set options to queries' button.
"},{"location":"getting_started_downloading.html#force_page_fetch","title":"Force Page Fetch","text":"By default, hydrus will not revisit web pages or API endpoints for URLs it knows A) refer to one known file only, and B) that file is already in your database or has previously been deleted. The way it navigates this can be a complicated mix of hash and URL data, and in certain logical situations hydrus will determine its own records are untrustworthy and decide to check the source again. This saves bandwidth and time as you run successive queries that include the same results. You should not disable the capability for normal operation.
But if you mess up your tag import options somewhere and need to re-run a download with forced tag re-fetching, how do you do it?

At the moment, this is in tag import options, the `force page fetch even if...` checkboxes. You can either set up a one-time downloader page with specific tag import options that check both of these checkboxes and then paste URLs in, or you can right-click a selection of thumbnails and have hydrus create the page for you under the urls->force metadata refetch menu. Once you are done with the downloader page, delete it and do not use it for normal jobs--again, this method of downloading is inefficient and should not be used for repeating, long-term, or speculative jobs. Only use it to fill in specific holes.
"},{"location":"getting_started_downloading.html#note_parsing","title":"Note Parsing","text":"Hydrus also parses 'notes' from some sites. This is a young feature, and a little advanced at times, but it generally means the comments that artists leave on certain gallery sites, or something like a tweet's text. Notes are editable by you and appear in a hovering window on the right side of the media viewer.
Most of the controls here ensure that successive parses do not duplicate existing notes. The default settings are fine for all normal purposes, and you can leave them alone unless you know you want something special (e.g. turning note parsing off completely).
"},{"location":"getting_started_downloading.html#bandwidth","title":"Bandwidth","text":"It will not be too long until you see a \"bandwidth free in xxxxx...\" message. As a long-term storage solution, hydrus is designed to be polite in its downloading--both to the source server and your computer. The client's default bandwidth rules have some caps to stop big mistakes, spread out larger jobs, and at a bare minimum, no domain will be hit more than once a second.
All the bandwidth rules are completely customisable and are found in `network > data > review bandwidth usage and edit rules`. They can get quite complicated. I strongly recommend you not look for them until you have more experience. I especially strongly recommend you not ever turn them all off, thinking that will improve something, as you'll probably render the client too laggy to function and get yourself an IP ban from the next server you pull from.
Again: the real problem with downloading is not finding new things, it is keeping up with what you get. Start slow and figure out what is important to your bandwidth budget, hard drive budget, and free time budget. Almost everyone fails at this.
"},{"location":"getting_started_downloading.html#logins","title":"Logins","text":"The client now supports a flexible (but slightly prototype and ugly) login system. It can handle simple sites and is as completely user-customisable as the downloader system. The client starts with multiple login scripts by default, which you can review under network->logins->manage logins:
Many sites grant all their content without you having to log in at all, but others require it for NSFW or special content, or you may wish to take advantage of site-side user preferences like personal blacklists. If you wish, you can give hydrus some login details here, and it will try to login--just as a browser would--before it downloads anything from that domain.
Warning
For multiple reasons, I do not recommend you use important accounts with hydrus. Use a throwaway account you don't care much about.
To start using a login script, select the domain and click 'edit credentials'. You'll put in your username/password, and then 'activate' the login for the domain, and that should be it! The next time you try to get something from that site, the first request will wait (usually about ten seconds) while a login popup performs the login. Most logins last for about thirty days (and many refresh that 30-day timer every time you make a new request), so once you are set up, you usually never notice it again, especially if you have a subscription on the domain.
Most sites only have one way of logging in, but hydrus does support more. Hentai Foundry is a good example--by default, the client performs the 'click-through' login as a guest, which requires no credentials and means any hydrus client can get any content from the start. But this way of logging in only lasts about 60 minutes or so before having to be refreshed, and it does not hide any spicy stuff, so if you use HF a lot, I recommend you create a throwaway account, set the filters you like in your HF profile (e.g. no guro content), and then click the 'change login script' in the client to the proper username/pass login.
The login system is not very clever. Don't try to pull off anything too weird with it! If anything goes wrong, it will likely delay the script (and hence the whole domain) from working for a while, or invalidate it entirely. If the error is something simple, like a password typo or current server maintenance, go back to this dialog to fix and scrub the error and try again. If the site just changed its layout, you may need to update the login script. If it is more complicated, please contact me, hydrus_dev, with the details!
If you would like to login to a site that is not yet supported by hydrus (usually ones with a Captcha in the login page), you have two options:
- Get a web browser add-on that lets you export a cookies.txt (either for the whole browser or just for that domain) and then drag and drop that cookies.txt file onto the hydrus network->data->review session cookies dialog. This sometimes does not work if your add-on's export formatting is unusual. If it does work, hydrus will import and use those cookies, which skips the login by making your hydrus pretend to be your browser directly. This is obviously advanced and hacky, so if you need to do it, let me know how you get on and what tools you find work best!
- Use Hydrus Companion browser add-on to do the same basic thing automatically.
Boorus are usually easy to parse from, and there are many hydrus downloaders available that work well. Other sites are less easy to download from. Some will purposefully disguise access behind captchas or difficult login tokens that the hydrus downloader just isn't clever enough to handle. In these cases, it can be best just to go to an external downloader program that is specially tuned for these complex sites.
It takes a bit of time to set up these sorts of programs--and if you get into them, you'll likely want to make a script to help automate their use--but if you know they solve your problem, it is well worth it!
- yt-dlp - This is an excellent video downloader that can download from hundreds of different websites. Learn how it works, it is useful for all sorts of things!
- gallery-dl - This is an excellent image and small-vid downloader that works for pretty much any booru and many larger/professional gallery sites, particularly when those sites need logins. Check the documentation, since you may be able to get it to rip cookies right out of your firefox, or you can give it your actual user/password for many sites and it'll handle all the login for you.
- imgbrd-grabber - Another excellent, mostly booru downloader, with a UI. You can export some metadata to filenames, which you might like to then suck up with hydrus filename-import-parsing.
With these tools, used manually and/or with some scripts you set up, you may be able to set up a regular import workflow to hydrus (especially with an `Import Folder` as under the `file` menu) and get most of what you would with an internal downloader. Some things like known URLs and tag parsing may be limited or non-existent, but it is better than nothing, and if you only need to do it for a couple of sources on a couple of sites every month, you can fill in most of the gap manually yourself.

Hydev is planning to roll yt-dlp and gallery-dl support into the program natively in a future update of the downloader engine.
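To illustrate the external-downloader-plus-Import Folder glue mentioned above, here is a rough Python sketch that shells out to gallery-dl and drops the results into a staging directory that a hydrus Import Folder is set to watch. The staging path and URLs are placeholders, and gallery-dl's flags can differ between versions, so check its own `--help` first.

```python
# Sketch: run gallery-dl into a folder that a hydrus Import Folder watches.
# Assumes gallery-dl is installed and on your PATH; flags may vary by version.
import subprocess
from pathlib import Path

STAGING = Path.home() / 'hydrus_staging'   # the folder your Import Folder points at
STAGING.mkdir(parents=True, exist_ok=True)

urls = [
    'https://example.com/some/gallery',    # placeholder URLs
]

for url in urls:
    # -d sets the base download directory; add --cookies or -u/-p if the site needs a login
    subprocess.run(['gallery-dl', '-d', str(STAGING), url], check=False)
```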
"},{"location":"getting_started_files.html","title":"Getting started with files","text":"Warning
Hydrus can be powerful, and you control everything. By default, you are not connected to any servers and absolutely nothing is shared with other users--and you can't accidentally one-click your way to exposing your whole collection--but if you tag private files with real names and click to upload that data to a tag repository that other people have access to, the program won't try to stop you. If you want to do private sexy slideshows of your shy wife, that's great, but think twice before you upload files or tags anywhere, particularly as you learn. It is impossible to contain leaks of private information.
There are no limits and few brakes on your behaviour. It is possible to import millions of files. For many new users, their first mistake is downloading too much too fast in overexcitement and becoming overwhelmed. Take things slow and figure out good processing workflows that work for your schedule before you start adding 500 subscriptions.
"},{"location":"getting_started_files.html#the_problem","title":"The problem","text":"If you have ever seen something like this--
--then you already know the problem: using a filesystem to manage a lot of images sucks.
Finding the right picture quickly can be difficult. Finding everything by a particular artist at a particular resolution is unthinkable. Integrating new files into the whole nested-folder mess is a further pain, and most operating systems bug out when displaying 10,000+ thumbnails.
"},{"location":"getting_started_files.html#the_client","title":"The client","text":"Let's first focus on importing files.
When you first boot the client, you will see a blank page. There are no files in the database and so there is nothing to search. To get started, I suggest you simply drag-and-drop a folder with a hundred or so images onto the main window. A dialog will appear affirming what you want to import. Ok that, and a new page will open. Thumbnails will stream in as the software processes each file.
The files are being imported into the client's database. The client discards their filenames.
Notice your original folder and its files are untouched. You can move the originals somewhere else, delete them, and the client will still return searches fine. In the same way, you can delete from the client, and the original files will remain unchanged--import is a copy, not a move, operation. The client performs all its operations on its internal database, which holds copies of the files it imports. If you find yourself enjoying using the client and decide to completely switch over, you can delete the original files you import without worry. You can always export them back again later.
FAQ: can the client manage files from their original locations?
Now:
- Click on a thumbnail; it'll show in the preview screen, bottom left.
- Double- or middle-click the thumbnail to open the media viewer. You can hit F to switch between giving the fullscreen a frame or not. You can use your scrollwheel or page up/down to browse the media and ctrl+scrollwheel to zoom in and out.
-
Move your mouse to the top-left, top-middle and top-right of the media viewer. You should see some 'hover' panels pop into place.
The one on the left is for tags, the middle is for browsing and zoom commands, and the right is for status and ratings icons. You will learn more about these things as you get more experience with the program.
-
Press Enter or double/middle-click again to close the media viewer.
- You can quickly select multiple files by shift- or ctrl- clicking. Notice how the status bar at the bottom of the screen updates with the number selected and their total size. Right-clicking your selection will present another summary and many actions.
- Hit F9 to bring up a new page chooser. You can navigate it with the arrow keys, your numpad, or your mouse.
-
On the left of a normal search page is a text box. When it is focused, a dropdown window appears. It looks like this:
This is where you enter the predicates that define the current search. If the text box is empty, the dropdown will show 'system' tags that let you search by file metadata such as file size or animation duration. To select one, press the up or down arrow keys and then enter, or double click with the mouse.
When you have some tags in your database, typing in the text box will search them:
The (number) shows how many files have that tag, and hence how large the search result will be if you select that tag.
Clicking 'searching immediately' will pause the searcher, letting you add several tags in a row without sending it off to get results immediately. Ignore the other buttons for now--you will figure them out as you gain experience with the program.
-
You can remove from the list of 'active tags' in the box above with a double-click, or by entering the exact same tag again through the dropdown.
- Play with the system tags more if you like, and the sort-by dropdown. The collect-by dropdown is advanced, so wait until you understand namespaces before expecting it to do anything.
- To close a page, middle-click its tab.
Hydrus supports many filetypes. A full list can be viewed on the Supported Filetypes page.
Support for some of the more complicated filetypes is imperfect, though. For the Windows and Linux built releases, hydrus now embeds an MPV player for video, audio, and gifs, which provides smooth playback and audio, but some other environments may not support MPV and so will fall back, when possible, to the native hydrus software renderer, which does not support audio. When something does not render how you want, right-clicking on its thumbnail presents the option 'open externally', which will open the file in the appropriate default program (e.g. ACDSee, VLC).
The client can also download files from several websites, including 4chan and other imageboards, many boorus, and gallery sites like deviant art and hentai foundry. You will learn more about this later.
"},{"location":"getting_started_files.html#inbox_and_archive","title":"Inbox and archive","text":"The client sends newly imported files to an inbox, just like your email. Inbox acts like a tag, matched by 'system:inbox'. A small envelope icon is drawn in the top corner of all inbox files:
If you are sure you want to keep a file long-term, you should archive it, which will remove it from the inbox. You can archive from your selected thumbnails' right-click menu, or by pressing F7. If you make a mistake, you can spam Ctrl+Z for undo or hit Shift+F7 on any set of files to explicitly return them to the inbox.
Anything you do not want to keep should be deleted by selecting from the right-click menu or by hitting the delete key. Deleted files are sent to the trash. They will get a little trash icon:
A trashed file will not appear in subsequent normal searches, although you can search the trash specifically by clicking the 'my files' button on the autocomplete dropdown and changing the file domain to 'trash'. Undeleting a file (Shift+Del) will return it to 'my files' as if nothing had happened. Files that remain in the trash will be permanently deleted, usually after a few days. You can change the permanent deletion behaviour in the client's options.
A quick way of processing new files is as follows:
"},{"location":"getting_started_files.html#filtering_your_inbox","title":"Filtering your inbox","text":"Lets say you just downloaded a good thread, or perhaps you just imported an old folder of miscellany. You now have a whole bunch of files in your inbox--some good, some awful. You probably want to quickly go through them, saying yes, yes, yes, no, yes, no, no, yes, where yes means 'keep and archive' and no means 'delete this trash'. Filtering is the solution.
Select some thumbnails, and either choose filter->archive/delete from the right-click menu or hit F12. You will see them in a special version of the media viewer, with the following default controls:
- Left Button or F7: keep and archive the file, move on
- Right Button or Del: delete the file, move on
- Up: Skip this file, move on
- Middle Button or Backspace: I didn't mean that, go back one
- Esc, Enter, or F12: stop filtering now
Your choices will not be committed until you finish filtering.
This saves time.
"},{"location":"getting_started_files.html#what_hydrus_is_for","title":"What Hydrus is for","text":"The hydrus client's workflows are not designed for half-finished files that you are still working on. Think of it as a giant archive for everything excellent you have decided to store away. It lets you find and remember these things quickly.
In general, Hydrus is good for individual files like you commonly find on imageboards or boorus. Although advanced users can cobble together some page-tag-based solutions, it is not yet great for multi-file media like comics and definitely not as a typical playlist-based music player.
If you are looking for a comic manager to supplement hydrus, check out this user-made guide to other archiving software here!
And although the client can hold millions of files, it starts to creak and chug when displaying or otherwise tracking more than about 40,000 or so in a single gui window. As you learn to use it, please try not to let your download queues or general search pages regularly sit at more than 40 or 50k total items, or you'll start to slow other things down. Another common mistake is to leave one large 'system:everything' or 'system:inbox' page open with 70k+ files. For these sorts of 'ongoing processing' pages, try adding a 'system:limit=256' to keep them snappy. One user mentioned he had regular gui hangs of thirty seconds or so, and when we looked into it, it turned out his handful of download pages had three million files queued up! Just try and take things slow until you figure out what your computer's limits are.
"},{"location":"getting_started_importing.html","title":"Importing and exporting","text":"By now you should have launched Hydrus. If you're like most new users you probably already have a fair bit of images or other media files that you're looking at getting organised.
Note
If you're planning to import or export a large amount of files it's recommended to use the automated folders since Hydrus can have trouble dealing with large, single jobs. Splitting them up in this manner will make it much easier on the program.
"},{"location":"getting_started_importing.html#importing_files","title":"Importing files","text":"Navigate to
file -> import files
in the toolbar. OR Drag-and-drop one or more folders or files into Hydrus.This will open the
import files
window. Here you can add files or folders, or delete files from the import queue. Let Hydrus parse what it will update and then look over the options. By default the option to delete original files after succesful import (if it's ignored for any reason or already present in Hydrus for example) is not checked, activate on your own risk. Infile import options
you can find some settings for minimum and maximum file size, resolution, and whether to import previously deleted files or not.From here there's two options:
- `import now`, which will just import as is
- `add tags before import >>`, which lets you set up some rules to add tags to files on import. Examples are keeping the filename as a tag, adding folders as tags (useful if you have some sort of folder-based organisation scheme), or loading tags from an accompanying text file generated by some other program.

Once you're done, click apply (or `import now`) and Hydrus will start processing the files. Exact duplicates are not imported, so if you had dupes spread out you will end up with only one file in the end. If files look similar but Hydrus imports both, then that's a job for the dupe filter, as there is some difference even if you can't tell by eye. A common one is compression producing files with different file sizes that otherwise look identical, or files with extra metadata baked into them.
"},{"location":"getting_started_importing.html#exporting_files","title":"Exporting files","text":"If you want to share your files then export is the way to go. The basic way is to mark the files in Hydrus and drag and drop them from there to wherever you want them. You can also copy files or use export files to, well, export your files to a selected location. All (or at least most) non-drag'n'drop export options can be found by right-clicking the selected files and going down `share` and then either `copy` or `export`.
"},{"location":"getting_started_importing.html#dragndrop","title":"Drag'n'drop","text":"Just dragging from the thumbnail view will export (copy) all the selected files to wherever you drop them. You can also start a drag and drop for single files from the media viewer using this arrow button on the top hover window:
If you want to drag and drop to discord, check the special BUGFIX option under `options > gui`. You also find a filename pattern setting for that drag and drop here.
If you use a drag and drop to open a file inside an image editing program, remember to hit 'save as' and give it a new filename in a new location! The client does not expect files inside its db directory to ever change.
"},{"location":"getting_started_importing.html#copy","title":"Copy","text":"You can also copy the files by right-clicking and going down
"},{"location":"getting_started_importing.html#export","title":"Export","text":"share -> copy -> files
and then pasting the files where you want them.You can also export files with tags, either in filename or as a sidecar file by right-clicking and going down
share -> export -> files
. Have a look at the settings and then pressexport
. You can create folders to export files into by using backslashes on Windows (\\
) and slashes on Linux (/
) in the filename. This can be combined with the patterns listed in the pattern shortcut button dropdown. As example[series]\\{filehash}
will export files into folders named after theseries:
namespaced tags on the files, all files tagged with one series goes into one folder, files tagged with another series goes into another folder as seen in the image below.Clicking the
pattern shortcuts
button gives you an overview of available patterns.The EXPERIMENTAL option is only available under advanced mode, use at your own risk.
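To make the pattern idea concrete, here is a simplified Python sketch of how a `[series]\{filehash}` style pattern could expand into an export path. It is an illustration only, not hydrus's actual exporter, and the example tags are made up.

```python
# Toy expansion of an export filename pattern -- illustration only.
import os

def expand_pattern(pattern: str, tags: set, filehash: str, ext: str) -> str:
    series = next((t.split(':', 1)[1] for t in sorted(tags) if t.startswith('series:')),
                  'unknown series')
    out = pattern.replace('[series]', series).replace('{filehash}', filehash)
    return out.replace('\\', os.sep) + ext

tags = {'series:metroid', 'character:samus aran'}
print(expand_pattern(r'[series]\{filehash}', tags, 'ab01cd23', '.png'))
# -> 'metroid/ab01cd23.png' on Linux, 'metroid\ab01cd23.png' on Windows
```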
"},{"location":"getting_started_importing.html#automation","title":"Automation","text":"Under
"},{"location":"getting_started_importing.html#import_folders","title":"Import folders","text":"file -> import and export folders
you'll find options for setting up automated import and export folders that can run on a schedule. Both have a fair deal of options and rules you can set so look them over carefully.Like with a manual import, if you wish you can import tags by parsing filenames or loading sidecars.
"},{"location":"getting_started_importing.html#export_folders","title":"Export folders","text":"Like with manual export, you can set the filenames using a tag pattern, and you can export to sidecars too.
"},{"location":"getting_started_importing.html#importing_and_exporting_tags","title":"Importing and exporting tags","text":"While you can import and export tags together with images sometimes you just don't want to deal with the files.
Going to `tags -> migrate tags` you get a window that lets you deal with just tags. One of the options here is what's called a Hydrus Tag Archive, a file containing the hash <-> tag mappings for the files and tags matching the query.
"},{"location":"getting_started_installing.html","title":"Installing and Updating","text":"
"},{"location":"getting_started_installing.html#downloading","title":"Downloading","text":"You can get the latest release at the github releases page.
I try to release a new version every Wednesday by 8pm EST and write an accompanying post on my tumblr and a Hydrus Network General thread on 8chan.moe /t/.
"},{"location":"getting_started_installing.html#installing","title":"Installing","text":"WindowsmacOSLinuxDockerFrom Source- If you want the easy solution, download the .exe installer. Run it, hit ok several times.
- If you know what you are doing and want a little more control, get the .zip. Don't extract it to Program Files unless you are willing to run it as administrator every time (it stores all its user data inside its own folder). You probably want something like D:\\hydrus.
- If you run <Win10, you need Visual C++ Redistributable for Visual Studio 2015 if you don't already have it for vidya.
- If you use Windows 10 N (a version of Windows without some media playback features), you will likely need the 'Media Feature Pack'. There have been several versions of this, so it may best found by searching for the latest version or hitting Windows Update, but otherwise check here.
- If you run Win7, you cannot run Qt6 programs, so you cannot run the official executable release. You have options by running from source.
- Third parties (not maintained by Hydrus Developer):
- Chocolatey
- Scoop (`hydrus-network` in the 'Extras' bucket)
- Winget. The command is `winget install --id=HydrusNetwork.HydrusNetwork -e --location "\PATH\TO\INSTALL\HERE"`, which can, if you know what you are doing, be `winget install --id=HydrusNetwork.HydrusNetwork -e --location ".\"`, maybe rolled into a batch file.
- User guide for Anaconda
macOS
- Get the .dmg App. Open it, drag it to Applications, and check the readme inside.
- macOS users have no mpv support for now, so no audio, and video may be laggy.
- This release has always been a little buggy. Many macOS users are having better success running from source.
Linux
Wayland
Unfortunately, hydrus has several bad bugs in Wayland. The mpv window will often not embed properly into the media viewer, menus and windows may position on the wrong screen, and the taskbar icon may not work at all. Running from source may improve the situation, but some of these issues seem to be intractable for now. X11 is much happier with hydrus.
One user notes that launching with the environment variable `QT_QPA_PLATFORM=xcb` may help!

XCB Qt compatibility
If you run into trouble running Qt6, usually with an XCB-related error like `qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.`, try installing the packages `libicu-dev` and `libxcb-cursor-dev`. With `apt` that will be:

```
sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
```
- Get the .tar.gz. Extract it somewhere useful and create shortcuts to 'client' and 'server' as you like. The build is made on Ubuntu, so if you run something else, compatibility is hit and miss.
- If you have problems running the Ubuntu build, running from source is usually an improvement, and it is easy to set up these days.
- You might need to get 'libmpv1' to get mpv working and playing video/audio. This is the mpv library, not necessarily the player. Check help->about to see if it is available--if not, see if you can get it like so:
apt-get install libmpv1
- Use options->media to set your audio/video/animations to 'show using mpv' once you have it installed.
- If the about window provides you an mpv error popup like this:
```
OSError: /lib/x86_64-linux-gnu/libgio-2.0.so.0: undefined symbol: g_module_open_full
(traceback)
pyimod04_ctypes.install.<locals>.PyInstallerImportError: Failed to load dynlib/dll 'libmpv.so.1'. Most likely this dynlib/dll was not found when the application was frozen.
```
Then please do this:
- Search your /usr/ dir for `libgmodule*`. You are looking for something like `libgmodule-2.0.so`. Users report finding it in `/usr/lib64/` and `/usr/lib/x86_64-linux-gnu`.
- Copy that .so file to the hydrus install base directory.
- Boot the client and hit help->about to see if it reports a version.
- If it all seems good, hit options->media to set up mpv as your player for video/audio and try to view some things.
- If it still doesn't work, see if you can do the same for libmpv.so and libcdio.so--or consider running from source.
- You can also try running the Windows version in wine.
- Third parties (not maintained by Hydrus Developer):
- (These both run from source, so if you have trouble with the built release, they may work better for you!)
- AUR package - Although please note that since AUR packages work off your system python, this has been known to cause issues when Arch suddenly updates to the latest Qt or something before we have had a chance to test things and it breaks hydrus. If you can, try just running from source yourself instead, where we can control things better!
- flatpak
Docker
- Rudimentary documentation for the container setup can be found here.
From source
- You can also run from source. This is often the best way to fix compatibility problems, and it is the most pleasant way to run and update the program (you can update in five seconds!), although it requires a bit more work to set up the first time. It is not too complicated to do, though--my guide will walk you through each step.
By default, hydrus stores all its data--options, files, subscriptions, everything--entirely inside its own directory. You can extract it to a usb stick, move it from one place to another, have multiple installs for multiple purposes, wrap it all up inside a truecrypt volume, whatever you like. The .exe installer writes some unavoidable uninstall registry stuff to Windows, but the 'installed' client itself will run fine if you manually move it.
Bad Locations
Do not install to a network location! (i.e. on a different computer's hard drive) The SQLite database is sensitive to interruption and requires good file locking, which network interfaces often fake. There are ways of splitting your client up so the database is on a local SSD but the files are on a network--this is fine--but you really should not put the database on a remote machine unless you know what you are doing and have a backup in case things go wrong.
Do not install to a location with filesystem-level compression enabled! (e.g. BTRFS) It may work ok to start, but when the SQLite database grows to large size, this can cause extreme access latency and I/O errors and corruption.
For macOS users
The Hydrus App is non-portable and puts your database in `~/Library/Hydrus` (i.e. `/Users/[You]/Library/Hydrus`). You can update simply by replacing the old App with the new, but if you wish to backup, you should be looking at `~/Library/Hydrus`, not the App itself.
"},{"location":"getting_started_installing.html#anti_virus","title":"Anti-virus","text":"Hydrus is made by an Anon out of duct tape and string. It combines file parsing tech with lots of network and database code in unusual and powerful ways, and all through a hacked-together executable that isn't signed by any big official company.
Unfortunately, we have been hit by anti-virus false positives throughout development. Every few months, one or more of the larger anti-virus programs sees some code that looks like something bad, or they run the program in a testbed and don't like something it does, and then they quarantine it. Every single instance of this so far has been a false positive. They usually go away the next week or two when the next set of definitions roll out. Some hydrus users are kind enough to report the program as a false positive to the anti-virus companies themselves, which also helps here.
Some users have never had the problem, some get hit regularly. The situation is obviously worse on Windows. If you try to extract the zip and hydrus_client.exe or the whole folder suddenly disappears, please check your anti-virus software.
I am interested in reports about these false-positives, just so I know what is going on. Sometimes I have been able to reduce problems by changing something in the build (one of these was, no shit, an anti-virus testbed running the installer and then opening the help html at the end, which launched Edge browser, which then triggered Windows Update, which hit UAC and was considered suspicious. I took out the 'open help' checkbox from the installer as a result).
You should be careful about random software online. For my part, the program is completely open source, and I have a long track record of designing it with privacy foremost. There is no intentional spyware of any sort--the program never connects to another computer unless you tell it to. Furthermore, the exe you download is now built on github's cloud, so there are very few worries about a trojan-infected build environment putting something I did not intend into the program (as there once were when I built the release on my home machine). That doesn't stop Windows Defender from sometimes calling it an ugly name like \"Tedy.4675\" and definitively declaring \"This program is dangerous and executes commands from an attacker\" but that's the modern anti-virus ecosystem.
There aren't excellent solutions to this problem. I don't like to say 'just exclude the program directory from your anti-virus settings', but some users are comfortable with this and say it works fine. One thing I do know that helps (with other things too), if you are using the default Windows Defender, is going into the Windows Security shield icon on your taskbar, and 'virus and threat protection' and then 'virus and threat protection settings', and turning off 'Cloud-delivered protection' and 'Automatic sample submission'. It seems with these on, Windows will talk with a central server about executables you run and download early updates, and this gives a lot of false positives.
If you are still concerned, please feel free to run from source, as above. You are controlling everything, then, and can change anything about the program you like. Or you can only run releases from four weeks ago, since you know the community would notice by then if there ever were a true positive. Or just run it in a sandbox and watch its network traffic.
In 2022 I am going to explore a different build process to see if that reduces the false positives. We currently make the executable with PyInstaller, which has some odd environment set-up the anti-virus testbeds don't seem to like, and perhaps PyOxidizer will be better. We'll see.
"},{"location":"getting_started_installing.html#running","title":"Running","text":"To run the client:
Windows
- For the installer, run the Start menu shortcut it added.
- For the extract, run 'hydrus_client.exe' in the base directory, or make a shortcut to it.
macOS
- Run the App you installed.
Linux
- Run the 'client' executable in the base directory. You may be able to double-click it; otherwise you are running `./client` from the terminal.
- If you experience virtual memory crashes, please review this thorough guide by a user.
Warning
Hydrus is imageboard-tier software, wild and fun--but also unprofessional. It is written by one Anon spinning a lot of plates. Mistakes happen from time to time, usually in the update process. There are also no training wheels to stop you from accidentally overwriting your whole db if you screw around. Be careful when updating. Make backups beforehand!
Hydrus does not auto-update. It will stay the same version unless you download and install a new one.
Although I put out a new version every week, you can update far less often if you prefer. The client keeps to itself, so if it does exactly what you want and a new version does nothing you care about, you can just leave it. Other users enjoy updating every week, simply because it makes for a nice schedule. Others like to stay a week or two behind what is current, just in case I mess up and cause a temporary bug in something they like.
The update process:
- If the client is running, close it!
- If you maintain a backup, run it now!
- Update your install:
- If you use the installer, just download the new installer and run it. It should detect where the last install was and overwrite everything automatically.
- If you use the extract, then just extract the new version right on top of your current install and overwrite manually. It is wise to extract it straight from the archive to your install folder.
- If you use the macOS App, just drag and drop from the dmg to your Applications as normal.
- If you run from source, then run `git pull` as normal.
- Start your client or server. It may take a few minutes to update its database. I will say in the release post if it is likely to take longer.
A user has written a longer and more formal guide to updating here.
Be extremely careful making test runs of the Extract release

Do not test-run the extract before copying it over your install! Running the program anywhere will create database files in the /db/ dir, and if you then copy that once-run folder on top of your real install, you will overwrite your real database! Of course it doesn't really matter, because you made a full backup before you started, right? :^)
If you need to perform tests of an update, make sure you have a good backup before you start and then remember to delete any functional test extracts before extracting from the original archive once more for the actual 'install'.
Several older versions, like 334, 526, and 570, have special update instructions.
Unless the update specifically disables or reconfigures something, all your files and tags and settings will be remembered after the update.
Releases typically need to update your database to their version. New releases can retroactively perform older database updates, so if the new version is v255 but your database is on v250, you generally only need to get the v255 release, and it'll do all the intervening v250->v251, v251->v252, etc... update steps in order as soon as you boot it. If you need to update from a release more than, say, ten versions older than current, see below. You might also like to skim the release posts or changelog to see what is new.
Clients and servers of different versions can usually connect to one another, but from time to time, I make a change to the network protocol, and you will get polite error messages if you try to connect to a newer server with an older client or vice versa. There is still no need to update the client--it'll still do local stuff like searching for files completely fine. Read my release posts and judge for yourself what you want to do.
"},{"location":"getting_started_installing.html#clean_installs","title":"Clean installs","text":"This is usually only relevant if you use the extract release and have a dll conflict or otherwise update and cannot boot at all. A handful of hydrus updates through its history have needed this.
Very rarely, hydrus needs a clean install. This can be due to a special update like when we moved from 32-bit to 64-bit or needing to otherwise 'reset' a custom install situation. The problem is usually that a library file has been renamed in a new version and hydrus has trouble figuring out whether to use the older one (from a previous version) or the newer.
In any case, if you cannot boot hydrus and it either fails silently or you get a crash log or system-level error popup complaining in a technical way about not being able to load a dll/pyd/so file, you may need a clean install, which essentially means clearing any old files out and reinstalling.
However, you need to be careful not to delete your database! It sounds silly, but at least one user has made a mistake here. The process is simple, do not deviate:
- Make a backup if you can!
- Go to your install directory.
- Delete all the files and folders except the 'db' dir (and all of its contents, obviously).
- Extract the new version of hydrus as you normally do.
After that, you'll have a 'clean' version of hydrus that only has the latest version's dlls. If hydrus still will not boot, I recommend you roll back to your last working backup and let me, hydrus dev, know what your error is.
Note that macOS App users will not ever have to do a clean install because every App is self-contained and non-merging with previous Apps. Source users similarly do not have to worry about this issue, although if they update their system python, they'll want to recreate their venv. Windows Installer users basically get a clean install every time, so they shouldn't have to worry about this.
"},{"location":"getting_started_installing.html#big_updates","title":"Big updates","text":"If you have not updated in some time--say twenty versions or more--doing it all in one jump, like v290->v330, may not work. I am doing a lot of unusual stuff with hydrus, change my code at a fast pace, and do not have a ton of testing in place. Hydrus update code often falls to bit rot, and so some underlying truth I assumed for the v299->v300 code may not still apply six months later. If you try to update more than 50 versions at once (i.e. trying to perform more than a year of updates in one go), the client will give you a polite error rather than even try.
As a result, if you get a failure on trying to do a big update, try cutting the distance in half--try v290->v310 first, and boot it. If the database updates correctly and the program boots, then shut down and move on to v310->v330. If the update does not work, cut down the gap and try v290->v300, and so on. Again, it is very important you make a backup before starting a process like this so you can roll back and try a different version if things go wrong.
If you narrow the gap down to just one version and still get an error, please let me know. If the problem is ever quick to appear and ugly/serious-looking, and perhaps talking about a "bootloader" or "dll" issue, then try doing a clean install as above. I am very interested in these sorts of problems and will be happy to help figure out a fix with you (and everyone else who might be affected).
All that said, and while updating is complex and every client is different, various user reports over the years suggest this route works and is efficient: 204 > 238 > 246 > 291 > 328 > 335 (clean install) > 376 > 421 > 466 (clean install) > 474 > 480 > 521 (maybe clean install) > 527 (special clean install) > 535 > 558 > 571 (clean install)
334->335: We moved from python 2 to python 3.
If you need to update from 334 or before to 335 or later, then:
- If you use the Windows installer, install as normal.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, there are no special instructions. Update as normal.
427->428: Some new dlls caused a potential conflict.
If you need to update from 427 or before to 428 or later, then:
- If you use the Windows installer, install as normal.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, there are no special instructions. Update as normal.
526->527: The program executable name changed from 'client' to 'hydrus_client'. There was also a library update that caused a dll conflict with previous installs.
If you need to update from 526 or before to 527 or later, then:
- If you use the Windows installer, install as normal. Your start menu 'hydrus client' shortcut should be overwritten with one to the new executable, but if you use a custom shortcut, you will need to update that too.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, git pull as normal. If you haven't already, feel free to run setup_venv again to get the new OpenCV. Update your launch scripts to point at the new hydrus_client.py boot scripts.
570->571: The python version was updated, which caused a dll conflict with previous installs.
If you need to update from 570 or before to 571 or later, then:
- If you use the Windows installer, install as normal.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, there are no special instructions. Update as normal.
I am not joking around: if you end up liking hydrus, you should back up your database
Maintaining a regular backup is important for hydrus. The program stores a lot of complicated data that you will put hours and hours of work into, and if you only have one copy and your hard drive breaks, you could lose everything. This has happened before--to people who thought it would never happen to them--and it sucks big time to go through. Don't let it be you.
Hydrus's database engine, SQLite, is excellent at keeping data safe, but it cannot work in a faulty environment. Ways in which users of hydrus have damaged/lost their database:
- Hard drive hardware failure (age, bad ventilation, bad cables, etc...)
- Lightning strike on non-protected socket or rough power cut on non-UPS'd power supply
- RAM failure
- Motherboard/PSU power problems
- Accidental deletion
- Accidental overwrite (usually during a borked update)
- Encrypted partition auto-dismount/other borked settings
- Cloud backup interfering with ongoing writes
- An automatic OS backup routine misfiring and causing a rollback, wiping out more than a year of progress
- A laptop that incorrectly and roughly disconnected an external USB drive on every sleep
- Network drive location not guaranteeing accurate file locks
- Windows NVMe driver bugs necessitating a different SQLite journalling method
Some of those you can mitigate (don't run the database over a network!) and some will always be a problem, but if you have a backup, none of them can kill you.
This mostly means your database, not your files
Note that nearly all the serious and difficult-to-fix problems occur to the database, which is four large .db files, not your media. All your images and movies are read-only in hydrus, and there's less worry if they are on a network share with bad locks or a machine that suddenly loses power. The database, however, maintains a live connection, with regular complex writes, and here a hardware failure can lead to corruption (basically the failure scrambles the data that is written, so when you try to boot back up, a small section of the database is incomprehensible garbage).
If you do not already have a backup routine for your files, this is a great time to start. I now run a backup every week of all my data so that if my computer blows up or anything else awful happens, I'll at worst have lost a few days' work. Before I did this, I once lost an entire drive with tens of thousands of files, and it felt awful. If you are new to saving a lot of media, I hope you can avoid what I felt. ;_;
I use ToDoList to remind me of my jobs for the day, including backup tasks, and FreeFileSync to actually mirror over to an external usb drive. I recommend both highly (and for ToDoList, I recommend hiding the complicated columns, stripping it down to a simple interface). It isn't a huge expense to get a couple-TB usb drive either--it is absolutely worth it for the peace of mind.
By default, hydrus stores all your user data in one location, so backing up is simple:
"},{"location":"getting_started_installing.html#the_simple_way_-_inside_the_client","title":"The simple way - inside the client","text":"Go database->set up a database backup location in the client. This will tell the client where you want your backup to be stored. A fresh, empty directory on a different drive is ideal.
Once you have your location set up, you can thereafter hit database->update database backup. It will lock everything and mirror your files, showing its progress in a popup message. The first time you make this backup, it may take a little while (as it will have to fully copy your database and all its files), but after that, it will only have to copy new or altered files and should only ever take a couple of minutes.
Advanced users who have migrated their database and files across multiple locations will not have this option--use an external program in this case.
"},{"location":"getting_started_installing.html#the_powerful_and_best_way_-_using_an_external_program","title":"The powerful (and best) way - using an external program","text":"Doing it yourself is best. If you are an advanced user with a complicated hydrus install migrated across multiple drives, then you will have to do it this way--the simple backup will be disabled.
You need to back up two things, which are both, by default, beneath install_dir/db: the four client*.db files and your client_files directory(ies). The .db files contain absolutely everything about your client and files--your settings and file lists and metadata like inbox/archive and tags--while the client_files subdirs store your actual media and its thumbnails.
If everything is still under install_dir/db, then it is usually easiest to just back up the whole install dir, keeping a functional 'portable' copy of your install that you can restore from without any trouble. Make sure you keep the .db files together--they are not interchangeable and mostly useless on their own!
An example FreeFileSync profile for backing up a database will look like this:
Note it has 'file time and size' and 'mirror' as the main settings. This quickly ensures that changes to the left-hand side are copied to the right-hand side, adding new files and removing since-deleted files and overwriting modified files. You can save a backup profile like that and it should only take a few minutes every week to stay safely backed up, even if you have hundreds of thousands of files.
Shut the client down while you run the backup, obviously.
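If you prefer a script to a GUI tool, a one-line mirror can do the same job. This is only a sketch with example paths: robocopy's /MIR (and rsync's --delete) make the destination an exact mirror, deleting anything there that is no longer in the live db, so point them at a dedicated backup directory:
robocopy "D:\hydrus\db" "F:\hydrus backup\db" /MIR
rsync -a --delete /opt/hydrus/db/ /mnt/backup/hydrus_db/
Again, only run this while the client is closed.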
"},{"location":"getting_started_installing.html#a_few_options","title":"A few options","text":"There are a host of other great alternatives out there, probably far too many to count. These are a couple that are often recommended and used by Hydrus users and are, in the spirit of Hydrus Network itself, free and open source.
- FreeFileSync: Linux, MacOS, Windows. Recommended and used by dev. Somewhat basic but does the job well enough.
- Borg Backup: FreeBSD, Linux, MacOS. More advanced and featureful backup tool.
- Restic: Almost every OS you can name.
Danger
Do not put your live database in a folder that continuously syncs to a cloud backup. Many of these services will interfere with a running client and can cause database corruption. If you still want to use a system like this, either turn the sync off while the client is running, or use the above backup workflows to safely backup your client to a separate folder that syncs to the cloud.
There is significantly more information about the database structure here.
I recommend you always back up before you update, just in case there is a problem with my update code that breaks your database. If that happens, please contact me, describing the problem, and revert to the functioning older version. I'll get on any problems like that immediately.
"},{"location":"getting_started_installing.html#backing_up_small","title":"Backing up with not much space","text":"If you decide not to maintain a backup because you cannot afford drive space for all your files, please please at least back up your actual database files. Use FreeFileSync or a similar program to back up the four 'client*.db' files in install_dir/db when the client is not running. Just make sure you have a copy of those files, and then if your main install becomes damaged, we will have a reference to either roll back to or manually restore data from. Even if you lose a bunch of media files in this case, with an intact database we'll be able to schedule recovery of anything with a URL.
If you are really short on space, note also that the database files are very compressible. A very large database where the four files add up to 70GB can compress down to a 17GB archive with 7zip on default settings. Better compression ratios are possible if you make sure to put all four files in the same archive and turn up the quality. This obviously takes some additional time to do, but if you are really short on space it may be the only way it fits, and if your only backup drive is a slow USB stick, then you might actually save time from not having to transfer the other 53GB! Media files (jpegs, webms, etc...) are generally not very compressible, usually 5% at best, so it is usually not worth trying.
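As an illustrative sketch only (example paths, and 7zip's command-line tool assumed to be installed), compressing just the four database files into one high-compression archive looks something like this:
7z a -t7z -mx=9 "F:\backup\hydrus db backup.7z" "C:\Hydrus Network\db\client*.db"
The -mx=9 switch is the 'turn up the quality' part; it is slower but squeezes the .db files down considerably. Only run it while the client is closed.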
It is best to have all four database files. It is generally easy and quick to fix problems if you have a backup of all four. If client.caches.db is missing, you can recover but it might take ten or more hours of CPU work to regenerate. If client.mappings.db is missing, you might be able to recover tags for your local files from a mirror in an intact client.caches.db. However, client.master.db and client.db are the most important. If you lose either of those, or they become too damaged to read and you have no backup, then your database is essentially dead and likely every single archive and view and tag and note and url record you made is lost. This has happened before, do not let it be you.
"},{"location":"getting_started_more_tags.html","title":"Tags Can Get Complicated","text":"Tags are powerful, and there are many tools within hydrus to customise how they apply and display. I recommend you play around with the basics before making your own new local tag services or jumping right into the PTR, so take it slow.
"},{"location":"getting_started_more_tags.html#tags_are_for_searching_not_describing","title":"Tags are for Searching not Describing","text":"Hydrus users tend to be nerds of one sort or another, and we all like thinking about categorisation and semantic relationships. I provide several clever tools in this program, and it is not uncommon for newer users to spend hours sketching out intricate tree-charts and idiosyncratic taxonomy algebra in a One True Plan and then only tagging five actual files of anime cat girls before burning out. Try not to let this happen to you.
In making hydrus, I have discovered two rules to stop you going crazy:
- Don't try to be perfect.
- Only add those tags you actually use in searches.
There is always work to do, and it is easy to exhaust oneself or get lost in the bushes agonising over whether to use 'smile' or 'smiling' or 'smirk'--before you know it, you have been tagging the same file for four minutes, and there are twelve thousand to go. The details are not as important as the broad strokes, and problems are easy to correct in future. There is often also no perfect answer, and even if there were, we would never have time to apply it everywhere. The ride never ends.
The sheer number of tags can also be overwhelming. Importing all the many tags from boorus is totally fine, but if you are typing tags yourself, I suggest you try not to exhaustively tag everything in the image. You will go crazy and burn out!
Ultimately, tags are a medium for searching, not describing. Anyone can see what is in an image just by looking at it, so--for the most part--the only use in writing any of it down is if you would ever use those particular words to find the thing again. Character, series and creator namespaces are a great simple place to start. After that, add whatever you are most interested in, be that 'blue sky' or 'midriff' or fanfic ship names, whatever you would actually use in a search, and then you can spend your valuable time actually using your media rather than drowning-by-categorisation.
"},{"location":"getting_started_more_tags.html#tag_services","title":"Tag services","text":"Hydrus lets you organise tags across multiple separate 'services'. By default there are two, but you can have however many you want (
services->manage services
). You might like to add more for different sets of siblings/parents, for tags you don't want to see but still want to search by, or for parsing tags into different services based on the reliability of the source or the source itself. You could, for example, parse all tags from Pixiv into one service, Danbooru tags into another, DeviantArt into a third, and so on as you choose. You must always have at least one local tag service. Local tag services are stored only on your hard drive--they are completely private. No tags, siblings, or parents will accidentally leak, so feel free to go wild with whatever odd scheme you want to try out.
Each tag service comes with its own tags, siblings and parents.
"},{"location":"getting_started_more_tags.html#my_tags","title":"My tags","text":"The intent is to use this service for tags you yourself want to add.
"},{"location":"getting_started_more_tags.html#downloader_tags","title":"Downloader tags","text":"The default place for tags coming from downloaders. Tags of things you download will end up here unless you change the settings. It is a good idea to set up some tag blacklists for tags you do not want.
"},{"location":"getting_started_more_tags.html#tag_repositories","title":"Tag repositories","text":"It can take a long time to tag even small numbers of files well, so I created tag repositories so people can share the work.
Tag repos store many file->tag relationships. Anyone who has an access key to the repository can sync with it and hence download all these relationships. If any of their own files match up, they will get those tags. Access keys will also usually have permission to upload new tags and ask for incorrect ones to be deleted.
Anyone can run a tag repository, but it is a bit complicated for new users. I ran a public tag repository for a long time, and now this large central store is run by users. It has over a billion tags and is free to access and contribute to.
To connect with it, please check here. Please read that page if you want to try out the PTR. It is only appropriate for someone on an SSD!
If you add it, your client will download updates from the repository over time and, usually when it is idle or shutting down, 'process' them into its database until it is fully synchronised. The processing step is CPU and HDD heavy, and you can customise when it happens in file->options->maintenance and processing. As the repository synchronises, you should see some new tags appear, particularly on famous files that lots of people have.
You can watch more detailed synchronisation progress in the services->review services window.
Your new service should now be listed on the left of the manage tags dialog. Adding tags to a repository works very similarly to the 'my tags' service except hitting 'apply' will not immediately confirm your changes--it will put them in a queue to be uploaded. These 'pending' tags will be counted with a plus '+' or minus '-' sign.
Notice that a 'pending' menu has appeared on the main window. This lets you start the upload when you are ready and happy with everything that you have queued.
When you upload your pending tags, they will commit and look to you like any other tag. The tag repository will anonymously bundle them into the next update, which everyone else will download in a day or so. They will see your tags just like you saw theirs.
If you attempt to remove a tag that has been uploaded, you may be prompted to give a reason, creating a petition that a janitor for the repository will review.
I recommend you not spam tags to the public tag repo until you get a rough feel for the guidelines and my original tag schema thoughts, or just lurk until you get the idea. It roughly follows what you will see on a typical booru. The general rule is to only add factual tags--no subjective opinion.
You can connect to more than one tag repository if you like. When you are in the manage tags dialog, pressing the up or down arrow keys on an empty input switches between your services.
FAQ: why can my friend not see what I just uploaded?
"},{"location":"getting_started_more_tags.html#siblings_and_parents","title":"Siblings and parents","text":"For more in-depth information, see siblings and parents.
tl;dr: Siblings rename/alias tags in an undoable way. Parents virtually add/imply one or more tags (parents) if the 'child' tag is present. The PTR has a lot of them.
"},{"location":"getting_started_more_tags.html#display_rules","title":"Display rules","text":"If you go to
"},{"location":"getting_started_ratings.html","title":"getting started with ratings","text":"tags -> manage where siblings and parents apply
you'll get a window where you can customise where and in what order siblings and parents apply. The service at the top of the list has precedence over all else, then second, and so on depending on how many you have. If you for example have PTR you can use a tag service to overwrite tags/siblings for cases where you disagree with the PTR standards.The hydrus client supports two kinds of ratings: like/dislike and numerical. Let's start with the simpler one:
"},{"location":"getting_started_ratings.html#like_dislike","title":"like/dislike","text":"A new client starts with one of these, called 'favourites'. It can set one of two values to a file. It does not have to represent like or dislike--it can be anything you want, like 'send to export folder' or 'explicit/safe' or 'cool babes'. Go to services->manage services->add->local like/dislike ratings:
You can set a variety of colours and shapes.
"},{"location":"getting_started_ratings.html#numerical","title":"numerical","text":"This is '3 out of 5 stars' or '8/10'. You can set the range to whatever whole numbers you like:
As well as the shape and colour options, you can set how many 'stars' to display and whether 0/10 is permitted.
If you change the star range at a later date, any existing ratings will be 'stretched' across the new range. As values are collapsed to the nearest integer, this is best done for scales that are multiples. ⅖ will neatly become 4/10 on a zero-allowed service, for instance, and 0/4 can nicely become ⅕ if you disallow zero ratings in the same step. If you didn't intuitively understand that, just don't touch the number of stars or zero rating checkbox after you have created the numerical rating service!
"},{"location":"getting_started_ratings.html#incdec","title":"inc/dec","text":"This is a simple counter. It can represent whatever you like, but most people usually go for 'I x this image y times'. You left-click to +1 it, right-click to -1.
"},{"location":"getting_started_ratings.html#using_ratings","title":"now what?","text":"Ratings are displayed in the top-right of the media viewer:
Hovering over each control will pop up its name, in case you forget which is which.
For like/dislike:
- Left-click: Set 'like'
- Right-click: Set 'dislike'
- A second click of the same button: Set 'not rated'
For numerical:
- Left-click: Set value
- Right-click: Set 'not rated'
For inc/dec:
- Left-click: +1
- Right-click: -1
Pressing F4 on a selection of thumbnails will open a dialog with a very similar layout, which will let you set the same rating to many files simultaneously.
Once you have some ratings set, you can search for them using system:rating, which produces this dialog:
On my own client, I find it useful to have several like/dislike ratings set up as quick one-click pseudo-tags. Stuff like 'this would make a good post' or 'read this later' that I can hit while I am doing archive/delete filtering.
"},{"location":"getting_started_searching.html","title":"Searching and sorting","text":"The primary purpose of tags is to be able to find what you've tagged again. Let's see more how it works.
"},{"location":"getting_started_searching.html#searching","title":"Searching","text":"Just open a new search page (
"},{"location":"getting_started_searching.html#the_dropdown_controls","title":"The dropdown controls","text":"pages > new file search page
or Ctrl+T> file search
) and start typing in the search field which should be focused when you first open the page.Let's look at the tag autocomplete dropdown:
- system predicates: Hydrus calls search terms 'predicates'. 'system predicates', which search metadata other than simple tags, show on any search page with an empty autocomplete input. You can mix them into any search alongside tags. They are very useful, so try them out!
- include current/pending tags: Turn these on and off to control whether tag predicates apply to tags that exist, or to those pending to be uploaded to a tag repository. Just searching 'pending' tags is useful if you want to scan what you have pending to go up to the PTR--just turn off 'current' tags and search system:num tags > 0.
- searching immediately: This controls whether a change to the list of current search predicates will instantly run the new search and get new results. Turning this off is helpful if you want to add, remove, or replace several heavy search terms in a row without getting UI lag.
- OR: You only see this if you have 'advanced mode' on. It lets you enter some pretty complicated tags!
- file/tag domains: By default, you will search the 'my files' and 'all known tags' domains. This is the intersection of your local media files (on your hard disk) and the union of all known tag services. If you search for character:samus aran, then you will get file results from your 'my files' domain that have character:samus aran in any known tag service. For most purposes, this combination is fine, but as you use the client more, you will sometimes want to access different search domains. For instance, if you change the file domain to 'trash', then you will instead get files that are in your trash. Setting the tag domain to 'my tags' will ignore other tag services (e.g. the PTR) for all tag search predicates, so a system:num_tags or a character:samus aran will only look at 'my tags'. Turning on 'advanced mode' gives access to more search domains. Some of them are subtly complicated, run extremely slowly, and are only useful for clever jobs--most of the time, you still want 'my files' and 'all known tags'.
- favourite searches star: Once you are more experienced, have a play with this. It lets you save your common searches for the future, so you don't have to either keep re-entering them or keep them open all the time. If you close big things down when you aren't using them, you will keep your client lightweight and save time.
When you type a tag in a search page, Hydrus will treat a space the same way as an underscore. Searching
character:samus aran
will find files tagged withcharacter:samus aran
andcharacter:samus_aran
. This is true of some other syntax characters,[](){}/\\\"'-
, too.Tags will be searchable by all their siblings. If there's a sibling for
"},{"location":"getting_started_searching.html#wildcards","title":"Wildcards","text":"large
->huge
then typinglarge
will providehuge
as a suggestion. This goes for the whole sibling chain, no matter how deep or a tag's position in it.The autocomplete tag dropdown supports wildcard searching with
*
.The
*
will match any number of characters. Every normal autocomplete search has a secret*
on the end that you don't see, which is how full words get matched from you only typing in a few letters.This is useful when you can only remember part of a word, or can't spell part of it. You can put
*
characters anywhere, but you should experiment to get used to the exact way these searches work. Some results can be surprising!You can select the special predicate inserted at the top of your autocomplete results (the highlighted
*gelion
and*va*ge*
above). It will return all files that match that wildcard, i.e. every file for every other tag in the dropdown list.This is particularly useful if you have a number of files with commonly structured over-informationed tags, like this:
In this case, selecting the
"},{"location":"getting_started_searching.html#editing_predicates","title":"Editing Predicates","text":"title:cool pic*
predicate will return all three images in the same search, where you can conveniently give them some more-easily searched tags likeseries:cool pic
andpage:1
,page:2
,page:3
.You can edit any selected 'active' search predicates by either its Right-Click menu or through Shift+Double-Left-Click on the selection. For simple tags, this means just changing the text (and, say, adding/removing a leading hyphen for negation/inclusion), but any 'system' predicate can be fully edited with its original panel. If you entered 'system:filesize < 200KB' and want to make it a little bigger, don't delete and re-add--just edit the existing one in place.
"},{"location":"getting_started_searching.html#other_shortcuts","title":"Other Shortcuts","text":"These will eventually be migrated to the shortcut system where they will be more visible and changeable, but for now:
- Left-Click on any taglist is draggable, if you want to select multiple tags quickly.
- Shift+Left-Click across any taglist will do a multi-select. This click is also draggable.
- Ctrl+Left-Click on any taglist will add to or remove from the selection. This is draggable, and if you start on a 'remove', the drag will be a 'remove' drag. Play with it--you'll see how it works.
- Double-Left-Click on one or more tags in the 'selection tags' box moves them to the active search box. Doing the same on the active search box removes them.
- Ctrl+Double-Left-Click on one or more tags in the 'selection tags' box will add their negation (i.e. '-skirt').
- Shift+Double-Left-Click on more than one tag in the 'selection tags' box will add their 'OR' to the active search box. What's an OR? Well:
Searches find files that match every search 'predicate' in the list (it is an AND search), which makes it difficult to search for files that include one OR another tag. For example the query
red eyes
ANDgreen eyes
(aka what you get if you enter each tag by itself) will only find files that has both tags. While the queryred eyes
ORgreen eyes
will present you with files that are tagged with red eyes or green eyes, or both.More recently, simple OR search support was added. All you have to do is hold down Shift when you enter/double-click a tag in the autocomplete entry area. Instead of sending the tag up to the active search list up top, it will instead start an under-construction 'OR chain' in the tag results below:
You can keep searching for and entering new tags. Holding down Shift on new tags will extend the OR chain, and entering them as normal will 'cap' the chain and send it to the complete and active search predicates above.
Any file that has one or more of those OR sub-tags will match.
If you enter an OR tag incorrectly, you can either cancel or 'rewind' the under-construction search predicate with these new buttons that will appear:
You can also cancel an under-construction OR by hitting Esc on an empty input. You can add any sort of search term to an OR search predicate, including system predicates. Some unusual sub-predicates (typically a
-tag
, or a very broad system predicate) can run very slowly, but they will run much faster if you include non-OR search predicates in the search:This search will return all files that have the tag
fanfic
and one or more ofmedium:text
, a positive value for the like/dislike rating 'read later', or PDF mime.There's a more advanced OR search function available by pressing the OR button. Previous knowledge of operators expected and required.
"},{"location":"getting_started_searching.html#sorting","title":"Sorting","text":"At the top-left of most pages there's a
sort by:
dropdown menu. Most of the options are self-explanatory. They do nothing except change in what order Hydrus presents the currently searched files to you. Default sort order and more sort by: namespace options are found in file -> options -> sort/collect.
"},{"location":"getting_started_searching.html#sorting_with_systemlimit","title":"Sorting with system:limit","text":"If you add
system:limit
to a search, the client will consider what that page's file sort currently is. If it is simple enough--something like file size or import time--then it will sort your results before they come back and clip the limit according to that sort, getting the n 'largest file size' or 'newest imports' and so on. This can be a great way to set up a lightweight filtering page for 'the 256 biggest videos in my inbox'.If you change the sort, hydrus will not refresh the search, it'll just re-sort the n files you have. Hit F5 to refresh the search with a new sort.
Not all sorts are supported. Anything complicated like tag sort will result in a random sample instead.
"},{"location":"getting_started_searching.html#collecting","title":"Collecting","text":"Collection is found under the
"},{"location":"getting_started_subscriptions.html","title":"subscriptions","text":"sort by:
dropdown and uses namespaces listed in thesort by: namespace
sort options. The new namespaces will only be available in new pages.The introduction to subscriptions has been moved to the main downloading help here.
"},{"location":"getting_started_subscriptions.html#description","title":"how do subscriptions work?","text":"For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.
Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works:
The initial sync can sometimes take a few minutes, but after that, each query usually only needs thirty seconds' work every few days. If you leave your client on in the background, you'll rarely see them. If they ever get in your way, don't be afraid to click their little cancel button or call a global halt with network->pause->subscriptions--the next time they run, they will resume from where they were before.
Similarly, the initial sync may produce a hundred files, but subsequent runs are likely to only produce one to ten. If a subscription comes across a lot of big files at once, it may not download them all in one go--but give it time, and it will catch back up before you know it.
When it is done, it leaves a little popup button that will open a new page for you:
This can often be a nice surprise!
"},{"location":"getting_started_subscriptions.html#good_subs","title":"what makes a good subscription?","text":"The same rules as for downloaders apply: start slow, be hesitant, and plan for the long-term. Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.
Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu.
Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.
If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for keeping up with things you like. If you let them overwhelm you, you'll resent them.
It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using 'no limit' file limit, and when they are all finished, select them in the list and right-click->copy queries, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again!
"},{"location":"getting_started_subscriptions.html#checking","title":"images/how often do subscriptions check?","text":"Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. This is one of the things you should not mess with as you start out.
If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year.
I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. Subscriptions are for lightweight searches that are updated every now and then.
(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)
"},{"location":"getting_started_subscriptions.html#presentation","title":"ok, I set up three hundred queries, and now these popup buttons are a hassle","text":"On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the button makes, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page! I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.
If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great.
You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong.
Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under options->importing to behave this way as well. Efficient workflows only care about new files.
"},{"location":"getting_started_subscriptions.html#syncing_explanation","title":"how exactly does the sync work?","text":"Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images.
The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file sync is 100, which is a great number for almost all situations.
Subsequent syncs are more complicated. It ideally 'stops' searching when it reaches files it saw in a previous sync, but if it comes across new files mixed in with the old, it will search a bit deeper. It is not foolproof, and if a file gets tagged very late and ends up a hundred deep in the search, it will probably be missed. There is no good and computationally cheap way at present to resolve this problem, but thankfully it is rare.
Remember that an important 'staying sane' philosophy of downloading and subscriptions is to focus on dealing with the 99.5% you have before worrying about the 0.5% you do not.
The amount of time between syncs is calculated by the checker options. Based on the timestamps attached to existing urls in the subscription cache (either added time, or the post time as parsed from the url), the sub estimates how long it will be before n new files appear, and the next check is scheduled for then. Unless you know what you are doing, checker options, like file limits, are best left alone. A subscription will naturally adapt its checking speed to the file 'velocity' of the source, and there is usually very little benefit to trying to force a sub to check at a radically different speed.
Tip
If you want to force your subs to run at the same time, say every evening, it is easier to just use network->pause->subscriptions as a manual master on/off control. The ones that are due will catch up together, the ones that aren't won't waste your time.
Remember that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. Do not change the file limits or check times to try to make this happen. If you want to ensure complete sync with all existing content for a particular search, use the manual downloader.
In practice, most subs only need to check the first page of a gallery since only the first two or three urls are new.
"},{"location":"getting_started_subscriptions.html#periodic_file_limit","title":"periodic file limit exceeded","text":"If, during a regular sync, the sub keeps finding new URLs, never hitting a block of already-seen URLs, it will stop upon hitting its 'periodic file limit', which is also usually 100. When it happens, you will get a popup message notification. There are two typical reasons for this:
- A user suddenly posted a large number of files to the site for that query. This sometimes happens with CG gallery spam.
- The website changed their URL format.
The first case is a natural accident of statistics. The subscription now has a 'gap' in its sync. If you want to get what you missed, you can try to fill in the gap with a manual downloader page. Just download to 200 files or so, and the downloader will quickly work through the URLs in the gap in one go.
The second case is a safety stopgap for hydrus. If a site decides to have /post/123456 style URLs instead of post.php?id=123456 style, hydrus will suddenly see those as entirely 'new' URLs. It could also be because of an updated downloader, which pulls URLs in API format or similar. This is again thankfully quite rare, but it triggers several problems--the associated downloader usually breaks, as it does not yet recognise those new URLs, and all your subs for that site will parse through and hit the periodic limit for every query. When this happens, you'll usually get several periodic limit popups at once, and you may need to update your downloader. If you know the person who wrote the original downloader, they'll likely want to know about the problem, or may already have a fix sorted. It is often a good idea to pause the affected subs until you have it figured out and working in a normal gallery downloader page.
"},{"location":"getting_started_subscriptions.html#merging_and_separating","title":"I put character queries in my artist sub, and now things are all mixed up","text":"On the main subscription dialog, there are 'merge' and 'separate' buttons. These are powerful, but they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog.
"},{"location":"getting_started_tags.html","title":"Getting started with tags","text":"A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
"},{"location":"getting_started_tags.html#intro","title":"How do we find files?","text":"So, you have some files imported. Let's give them some tags so we can find them again later.
FAQ: what is a tag?
Your client starts with two local tag services, called 'my tags' and 'downloader tags', which keep all of their file->tag mappings in your client's database where only you can see them. 'my tags' is a good place to practise.
Select a file and press F3 to open the manage tags dialog:
The area below where you type is the 'autocomplete dropdown'. You will see this on normal search pages too. Type part of a tag, and matching results will appear below. Since you are starting out, your 'my tags' service won't have many tags in it yet, but things will populate fast! Select the tag you want with the arrow keys and hit enter. If you want to remove a tag, enter the exact same thing again or double-click it in the box above.
Prefixing a tag with a category and a colon will create a namespaced tag. This helps inform the software and other users about what the tag is. Examples of namespaced tags are:
character:batman
series:street fighter
person:jennifer lawrence
title:vitruvian man
The client is set up to draw common namespaces in different colours, just like boorus do. You can change these colours in the options.
Once you are happy with your tag changes, click 'apply', or hit F3 again, or simply press Enter on the text box while it is empty. The tags are now saved to your database.
Media Viewer Manage Tags
You can also open the manage tags dialog from the full media viewer, but note that this one does not have 'apply' and 'cancel' buttons, only 'close'. It makes its changes instantly, and you can keep using the rest of the program while it is open (it is a non-'modal' dialog).
Also, you need not close the media viewer's manage tags dialog while you browse. Just like you can hit Enter on the empty text box to close the dialog, hitting Page Up/Down navigates the parent viewer Back/Forward!
Also: hit Arrow Up/Down on an empty text input to switch between the tag service tabs!
Once you have some tags set, typing the first few characters of one in on a search page will show the counts of all the tags that start with that. Enter the one you want, and the search will run:
If you add more 'predicates' to a search, you will limit the results to those files that match every single one:
You can also exclude a tag by prefixing it with a hyphen (e.g.
-solo
).You can add as many tags as you want. In general, the more search predicates you add, the smaller and faster the results will be, but some types of tag (like excluded
"},{"location":"introduction.html","title":"introduction and statement of principles","text":""},{"location":"introduction.html#this_help","title":"this help","text":"-tags
), or the cleverersystem
tags that you will soon learn about, can be suddenly CPU expensive. If a search takes more than a few seconds to run, a 'stop' button appears by the tag input. It cancels things out pretty quick in most cases.Click the links on the left to go through the getting started guide. Subheadings are on the right. Larger sections are up top. Please at least skim every page in the getting started section, as this will introduce you to the main systems in the client. There is a lot, so you do not have to do it all in one go.
The section on installing, updating, and backing up is very important.
This help is available locally in every release. Hit
"},{"location":"introduction.html#files","title":"on having too many files","text":"help->help and getting started guide
in the client, or openinstall_dir/help/index.html
.I've been on the internet and imageboards for a long time, saving everything I like to my hard drive. After a while, the whole collection was just too large to manage on my own. I couldn't find anything in the mess, and I just saved new files in there with names like 'image1257.jpg'.
There aren't many solutions to this problem that aren't online, and I didn't want to lose my privacy or control.
"},{"location":"introduction.html#anonymous","title":"on being anonymous","text":"I enjoy being anonymous online. When you aren't afraid of repercussions, you can be as truthful as you want and share interesting things, no matter how unusual. You can have unique conversations and tackle some otherwise unsolvable problems. It's fun!
I'm a normal Anon, nothing special. :^)
"},{"location":"introduction.html#hydrus_network","title":"the hydrus network","text":"So! I'm developing a program that helps people organise their files on their own terms and, if they want to, collaborate with others anonymously. I want to help you do what you want with your stuff, and that's it. You can share some tags (and files, but this is limited) with other people if you want to, but you don't have to connect to anything if you don't. The default is complete privacy, no sharing, and every upload requires a conscious action on your part. I don't plan to ever record metrics on users, nor serve ads, nor charge for my software. The software never phones home.
This does a lot more than a normal image viewer. If you are totally new to the idea of personal media collections and booru-style tagging, I suggest you start slow, walk through the getting started guides, and experiment doing different things. If you aren't sure on what a button does, try clicking it! You'll be importing thousands of files and applying tens of thousands of tags in no time. The best way to learn is just to try things out.
The client is chiefly a file database. It stores your files inside its own folders, managing them far better than an explorer window or some online gallery. Here's a screenshot of one of my test installs with a search showing all files:
As well as the client, there is also a server that anyone can run to store files or tags for sharing between many users. This is advanced, and almost always confusing to new users, so do not explore this until you know what you are doing. There is, however, a user-run public tag repository, with more than a billion tags, that you can access and contribute to if you wish.
I have many plans to expand the client and the network.
"},{"location":"introduction.html#principles","title":"statement of principles","text":"- Speech should be as free as possible.
- Everyone should be able to control their own media diet.
- Computer data and network logs should be absolutely private.
None of the above are currently true, but I would love to live in a world where they were. My software is an attempt to move us a little closer.
Where possible, I prefer decentralised systems that are focused on people. I still use gmail and youtube IRL just like pretty much everyone, but I would rather we have alternative systems for alternate work, especially in the future. No one seemed to be making what I wanted for file management, particularly as everything rushed to the cloud space, so I decided to make a local solution myself, and here we are.
If, after a few months, you find you enjoy the software and would like to further support it, I have set up a simple no-reward patreon, which you can read more about here.
"},{"location":"introduction.html#license","title":"license","text":"These programs are free software. Everything I, hydrus dev, have made is under the Do What The Fuck You Want To Public License, Version 3:
license.txt

DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 3, May 2010

Copyright (C) 2011 Hydrus Developer

Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.

This license applies to any copyrightable work with which it is
packaged and/or distributed, except works that are already covered by
another license. Any other license that applies to the same work
shall take precedence over this one.

To the extent permitted by applicable law, the works covered by this
license are provided "as is" and do not come with any warranty except
where otherwise explicitly stated.


DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION, AND MODIFICATION

0. You just DO WHAT THE FUCK YOU WANT TO.
Do what the fuck you want to with my software, and if shit breaks, DEAL WITH IT.
"},{"location":"ipfs.html","title":"IPFS","text":"IPFS is a p2p protocol that makes it easy to share many sorts of data. The hydrus client can communicate with an IPFS daemon to send and receive files.
You can read more about IPFS from their homepage, or this guide that explains its various rules in more detail.
For our purposes, we only need to know about these concepts:
- IPFS daemon -- A running instance of the IPFS executable that can talk to the larger network.
- IPFS multihash -- An IPFS-specific identifier for a file or group of files.
- pin -- To tell our IPFS daemon to host a file or group of files.
- unpin -- To tell our IPFS daemon to stop hosting a file or group of files.
Note there is now a nicer desktop package here. I haven't used it, but it may be a nicer intro to the program.
Get the prebuilt executable here. Inside should be a very simple 'ipfs' executable that does everything. Extract it somewhere and open up a terminal in the same folder, and then type:
ipfs init
ipfs daemon
The IPFS exe should now be running in that terminal, ready to respond to requests:
You can kill it with Ctrl+C and restart it with the
ipfs daemon
call again (you only have to runipfs init
once).When it is running, opening this page should download and display an example 'Hello World!' file from ~~~across the internet~~~.
Your daemon listens for other instances of ipfs using port 4001, so if you know how to open that port in your firewall and router, make sure you do.
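How you do that depends on your setup. As a sketch, on a Linux machine running the ufw firewall it can be as simple as the following (your firewall and router will differ; newer daemons also speak QUIC over UDP on the same port):
sudo ufw allow 4001/tcp
sudo ufw allow 4001/udp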
"},{"location":"ipfs.html#connecting","title":"connecting your client","text":"IPFS daemons are treated as services inside hydrus, so go to services->manage services->remote->ipfs daemons and add in your information. Hydrus uses the API port, default 5001, so you will probably want to use credentials of
127.0.0.1:5001
. You can click 'test credentials' to make sure everything is working.Thereafter, you will get the option to 'pin' and 'unpin' from a thumbnail's right-click menu, like so:
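If you like, you can also poke the daemon's API directly from a terminal to confirm it is listening--a quick sketch (recent daemons expect POST for API calls):
curl -X POST http://127.0.0.1:5001/api/v0/version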
This works like hydrus's repository uploads--it won't happen immediately, but instead will be queued up at the pending menu. Commit all your pins when you are ready:
Notice how the IPFS icon appears on your pending and pinned files. You can search for these files using 'system:file service'.
Unpin works the same as pin, just like a hydrus repository petition.
Right-clicking any pinned file will give you a new 'share' action:
Which will put it straight in your clipboard. In this case, it is QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S.
If you want to share a pinned file with someone, you have to tell them this multihash. They can then:
- View it through their own ipfs daemon's gateway, at
http://127.0.0.1:8080/ipfs/[multihash]
- View it through a public web gateway, such as the one the IPFS people run, at
http://ipfs.io/ipfs/[multihash]
- Download it through their ipfs-connected hydrus client by going pages->new download popup->an ipfs multihash.
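As a quick sketch, pulling that example multihash down through your own local gateway from a terminal looks like this (the daemon must be running, and the output filename is arbitrary):
curl -o shared_file http://127.0.0.1:8080/ipfs/QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S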
If you have many files to share, IPFS also supports directories, and now hydrus does as well. IPFS directories use the same sorts of multihash as files, and you can download them into the hydrus client using the same pages->new download popup->an ipfs multihash menu entry. The client will detect the multihash represents a directory and give you a simple selection dialog:
You may recognise those hash filenames--this example was created by hydrus, which can create ipfs directories from any selection of files from the same right-click menu:
Hydrus will pin all the files and then wrap them in a directory, showing its progress in a popup. Your current directory shares are summarised on the respective services->review services panel:
"},{"location":"ipfs.html#additional_links","title":"additional links","text":"If you find you use IPFS a lot, here are some add-ons for your web browser, as recommended by /tech/:
This script changes all bare ipfs hashes into clickable links to the ipfs gateway (on page loads):
- https://greasyfork.org/en/scripts/14837-ipfs-hash-linker
These redirect all gateway links to your local daemon when it's on, it works well with the previous script:
- https://github.com/lidel/ipfs-firefox-addon
- https://github.com/dylanPowers/ipfs-chrome-extension
You can launch the program with several different arguments to alter core behaviour. If you are not familiar with this, you are essentially putting additional text after the launch command that runs the program. You can run this straight from a terminal console (usually good to test with), or you can bundle it into an easy shortcut that you only have to double-click. An example of a launch command with arguments:
C:\Hydrus Network\hydrus_client.exe -d="E:\hydrus db" --no_db_temp_files
You can also add --help to your program path, like this:
hydrus_client.py --help
hydrus_server.exe --help
./hydrus_server --help
Which gives you a full listing of all the arguments below. However, this will not work with the built hydrus_client executables, which are bundled as non-console programs and will not give you text output to any console they are launched from. As hydrus_client.exe is the most commonly run version of the program, here is the list, with some more help about each command:
"},{"location":"launch_arguments.html#-d_db_dir_--db_dir_db_dir","title":"-d DB_DIR, --db_dir DB_DIR
","text":"Lets you customise where hydrus should use for its base database directory. This is install_dir/db by default, but many advanced deployments will move this around, as described here. When an argument takes a complicated value like a path that could itself include whitespace, you should wrap it in quote marks, like this:
"},{"location":"launch_arguments.html#--temp_dir_temp_dir","title":"-d=\"E:\\my hydrus\\hydrus db\"\n
--temp_dir TEMP_DIR
","text":"This tells all aspects of the client, including the SQLite database, to use a different path for temp operations. This would be by default your system temp path, such as:
C:\\Users\\You\\AppData\\Local\\Temp\n
But you can also check it in help->about. A handful of database operations (PTR tag processing, vacuums) require a lot of free space, so if your system drive is very full, or you have unusual ramdisk-based temp storage limits, you may want to relocate to another location or drive.
"},{"location":"launch_arguments.html#--db_journal_mode_waltruncatepersistmemory","title":"--db_journal_mode {WAL,TRUNCATE,PERSIST,MEMORY}
","text":"Change the journal mode of the SQLite database. The default is WAL, which works great for almost all SSD drives, but if you have a very old or slow drive, or if you encounter 'disk I/O error' errors on Windows with an NVMe drive, try TRUNCATE. Full docs are here.
Briefly:
- WAL - Clever write flushing that takes advantage of new drive synchronisation tools to maintain integrity and reduce total writes.
- TRUNCATE - Compatibility mode. Use this if your drive cannot launch WAL.
- PERSIST - This is newly added to hydrus. The ideal is that if you have a high latency HDD drive and want sync with the PTR, this will work more efficiently than WAL journals, which will be regularly wiped and recreated and be fraggy. Unfortunately, with hydrus's multiple database file system, SQLite ultimately treats this as DELETE, which in our situation is basically the same as TRUNCATE, so does not increase performance. Hopefully this will change in future.
- MEMORY - Danger mode. Extremely fast, but you had better guarantee a lot of free ram and no unclean exits.
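As a rough example, if you hit those 'disk I/O error' problems, a TRUNCATE launch might look like this (the paths are illustrative; swap in your own install):

```
# built Windows release
"C:\Hydrus Network\hydrus_client.exe" --db_journal_mode TRUNCATE

# running from source
python hydrus_client.py --db_journal_mode TRUNCATE
```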
--db_transaction_commit_period DB_TRANSACTION_COMMIT_PERIOD
","text":"Change the regular duration at which any database changes are committed to disk. By default this is 30 (seconds) for the client, and 120 for the server. Minimum value is 10. Typically, if hydrus crashes, it may 'forget' what happened up to this duration on the next boot. Increasing the duration will result in fewer overall 'commit' writes during very heavy work that makes several changes to the same database pages (read up on WAL mode for more details here), but it will increase commit time and memory/storage needs. Note that changes can only be committed after a job is complete, so if a single job takes longer than this period, changes will not be saved until it is done.
"},{"location":"launch_arguments.html#--db_cache_size_db_cache_size","title":"--db_cache_size DB_CACHE_SIZE
","text":"Change the size of the cache SQLite will use for each db file, in MB. By default this is 256, for 256MB, which for the four main client db files could mean an absolute 1GB peak use if you run a very heavy client and perform a long period of PTR sync. This does not matter so much (nor should it be fully used) if you have a smaller client.
"},{"location":"launch_arguments.html#--db_synchronous_override_0123","title":"--db_synchronous_override {0,1,2,3}
","text":"Change the rules governing how SQLite writes committed changes to your disk. The hydrus default is 1 with WAL, 2 otherwise.
A user has written a full guide on this value here! SQLite docs here.
"},{"location":"launch_arguments.html#--no_db_temp_files","title":"--no_db_temp_files
","text":"When SQLite performs very large queries, it may spool temporary table results to disk. These go in your temp directory. If your temp dir is slow but you have a ton of memory, set this to never spool to disk, as here.
"},{"location":"launch_arguments.html#--boot_debug","title":"--boot_debug
","text":"Prints additional debug information to the log during the bootup phase of the application.
"},{"location":"launch_arguments.html#--profile_mode","title":"--profile_mode
","text":"This starts the program with 'Profile Mode' turned on, which captures the performance of boot functions. This is also a way to get Profile Mode on the server, although support there is very limited.
"},{"location":"launch_arguments.html#--win_qt_darkmode_test","title":"--win_qt_darkmode_test
","text":"Windows only, client only: This starts the program with Qt's 'darkmode' detection enabled, as here, set to 1 mode. It will override any existing qt.conf, so it is only for experimentation. We are going to experiment more with the 2 mode, but that locks the style to
"},{"location":"launch_arguments.html#server_arguments","title":"server arguments","text":"windows
, and can't handle switches between light and dark mode.The server supports the same arguments. It also takes an optional positional argument of 'start' (start the server, the default), 'stop' (stop any existing server), or 'restart' (do a stop, then a start), which should go before any of the above arguments.
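So, a server managed from the terminal might look roughly like this; the db path here is an assumption:

```
./hydrus_server start -d="/path/to/server_db"
./hydrus_server stop -d="/path/to/server_db"
./hydrus_server restart -d="/path/to/server_db"
```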
"},{"location":"petitionPractices.html","title":"Petitions practices","text":"This document exists to give a rough idea what to do in regard to the PTR to avoid creating uncecessary work for the janitors.
"},{"location":"petitionPractices.html#general_practice","title":"General practice","text":"Kindly avoid creating unnecessary work. Create siblings for underscore and non-namespaced/namespaced versions. Petition for deletion if they are wrong. Providing a reason outside of the stock choices helps the petition getting accepted. If, for whatever reason, you have some mega job that needs doing it's often a good idea to talk to a janitor instead since we can just go ahead and do the job directly without having to deal with potentially tens of petitions because of how Hydrus splits them on the server. An example that we often come across is the removal of the awful Sankaku URLs that are almost everywhere these days due to people using a faulty parser. It's a pretty easy search and delete for a janitor, but a lot of annoying clicking if dealt with as a petition since one big petition can be split out to God-only-knows-how many.
Eventually the PTR janitors will get tools to replace various bad but correct tags on the server itself. These include underscored, wrong or no namespace, common misspelling, wrong locale, and so on. Since we're going to have to do the job eventually anyway there's not much of a point making us do it twice by petitioning the existing bad but correct tags. Just sibling them and leave them be for now.
"},{"location":"petitionPractices.html#ambiguity","title":"Ambiguity","text":"Don't make additions involving ambiguous tags.
"},{"location":"petitionPractices.html#petitions_involving_system_predicates","title":"Petitions involving system predicates","text":"hibiki
->character:hibiki (kantai collection)
is bad since there's far more than one character with that name. There's quite a few wrongly tagged images because of things like this. Petitioning the deletion of such a bad sibling is good.Anything that's covered by system predicates. Siblinging these is unecessary and parenting pointless. There's no harm leaving them be aside from crowding the tag list but there's no harm to deleting them either.
- system:dimensions covers most everything related to resolution and aspect ratios: medium:high resolution, 4:3 aspect ratio, and pixel count.
- system:duration for whether something has duration (is a video or animated gif/png/whatever), or is a still image.
- system:has audio for whether a file has audio or not. system:has duration + system:no audio replaces video with no sound, as an example.
- system:filesize for things like huge filesize.
- system:filetype for filetypes: gif, webm, mp4, psd, and so on. Anything that Hydrus can recognise, which is quite a bit.
Don't push parents for tags that are not top-level siblings. It makes tracking down potential issues hard.
Only push parents for relations that are literally always true, no exceptions.
character:james bond -> series:james bond is a good example because James Bond always belongs to that series. -> gender:male is bad because an artist might decide to draw a genderbent piece of art. Similarly, -> person:pierce brosnan is bad because there have been other actors for the character.
List of some bad parents to character: tags as an example:
- species: due to the various -zations (humanization, animalization, mechanization).
- creator: since just about anybody can draw art of the character.
- gender: since genderswap and variations exist.
- Any form of physical characteristics such as hair or eye colour, hair length, clothing and accessories, etc.
"},{"location":"petitionPractices.html#translations","title":"Translations","text":"Translations should be siblinged to the closest in-use romanised tag if there is no proper translation. If the tag is ambiguous, such as 響 or ヒビキ, which both mean hibiki, just sibling them to the ambiguous tag. The tag can then later be deleted and replaced by a less ambiguous tag. On the other hand, 響(艦隊これくしょん) straight up means hibiki (kantai collection) and can safely be siblinged to the proper character: tag.
Do the same for subjective tags. 魅惑のふともも can be translated to bewitching thighs. まったく、駆逐艦は最高だぜ!! straight up translates to Geez, destroyers are the best!!, which does not contain much usable information for Hydrus currently. These can then either be siblinged down to an unsubjective tag (thighs) if there's objective information in the tag, deleted if purely subjective, or deleted and replaced if ambiguous.
"},{"location":"privacy.html","title":"privacy","text":"tl;dr
Using a trustworthy VPN for all your remotely fun internet traffic is a good idea. It is cheap and easy these days, and it offers multiple levels of general protection.
I have tried very hard to ensure the hydrus network servers respect your privacy. They do not work like normal websites, and the amount of information your client will reveal to them is very limited. For most general purposes, normal users can rest assured that their activity on a repository like the Public Tag Repository (PTR) is effectively completely anonymous.
You need an account to connect, but all that really means serverside is a random number with a random passcode. Your client tells nothing more to the server than the exact content you upload to it (e.g. tag mappings, which are a tag+file_hash pair). The server cannot help but be aware of your IP address to accept your network request, but in all but one situation--uploading a file to a file repository when the administrator has set to save IPs for DMCA purposes--it forgets your IP as soon as the job is done.
So that janitors can process petitions efficiently and correct mistakes, servers remember which accounts upload which content, but they do not communicate this to any place, and the memory only lasts for a certain time--after which the content is completely anonymised. The main potential privacy worries are over a malicious janitor or--more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!--a malicious server owner or anyone else who gains access to the server's raw database files or its code as it operates. Even in the case where you cannot trust the server you are talking to, hydrus should be fairly robust, simply because the client does not say much to the server, nor that often. The only realistic worries, as I talk about in detail below, are if you actually upload personal files or tag personal files with real names. I can't do much about being Anon if you, accidentally or not, declare who you are.
So, in general, if you are on a good VPN and tagging anime babes from boorus, I think we are near perfect on privacy. That said, our community is rightly constantly thinking about this topic, so in the following I have tried to go into exhaustive detail. Some of the vulnerabilities are impractical and esoteric, but if nothing else it is fun to think about. If you can think of more problems, or decent mitigations, let me know!
"},{"location":"privacy.html#https_certificates","title":"https certificates","text":"Hydrus servers only communicate in https, so anyone who is able to casually observe your traffic (say your roommate cracked your router, or the guy running the coffee shop whose wifi you are using likes to snoop) should not ever be able to see what data you are sending or receiving. If you do not use a VPN, they will be able to see that you are talking to the repository (and the repository will technically see who you are, too, though as above, it normally isn't interested). Someone more powerful, like your ISP or Government, may be able to do more:
If you just start a new server yourself: When you first make a server, the 'certificate' it creates to enable https is a low quality one. It is called 'self-signed' because it is only endorsed by itself and it is not tied to a particular domain on the internet that everyone agrees on via DNS. Your traffic to this server is still encrypted, but an advanced attacker who stands between you and the server could potentially perform what is called a man-in-the-middle attack and see your traffic.
This problem is fairly mitigated by using a VPN, since even if someone were able to MitM your connection, they know no more than your VPN's location, not your IP.
A future version of the network will further mitigate this problem by having you enter unverified certificates into a certificate manager and then compare to that store on future requests, to try to detect if a MitM attack is occurring.
If the server is on a domain and now uses a proper verified certificate: If the admin hosts the server on a website domain (rather than a raw IP address) and gets a proper certificate for that domain from a service like Let's Encrypt, they can swap that into the server and then your traffic should be protected from any eavesdropper. It is still good to use a VPN to further obscure who you are, including from the server admin.
You can check how good a server's certificate is by loading its base address in the form https://host:port into your browser. If it has a nice certificate--like the PTR--the welcome page will load instantly. If it is still on self-signed, you'll get one of those 'can't show this page unless you make an exception' browser error pages before it will show.
"},{"location":"privacy.html#accounts","title":"accounts","text":"An account has two hex strings, like this:
- Access key: 4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f
  This is in your services->manage services panel, and acts like a password. Keep this absolutely secret--only you know it, and no one else ever needs to. If the server has not had its code changed, it does not actually know this string, but it stores special data that lets it verify it when you 'log in'.
- Account ID: 207d592682a7962564d52d2480f05e72a272443017553cedbd8af0fecc7b6e0a
  This can be copied from a button in your services->review services panel, and acts a bit like a semi-private username. Only janitors should ever have access to this. If you ever want to contact the server admin about an account upgrade or similar, they will need to know this so they can load up your account and alter it.
When you generate a new account, the client first asks the server for a list of available auto-creatable account types, then asks for a registration token for one of them, then uses the token to generate an access key. The server is never told anything about you, and it forgets your IP address as soon as it finishes talking to you.
Your account also stores a bandwidth use record and some miscellaneous data such as when the account was created, if and when it expires, what permissions and bandwidth rules it has, an aggregate score of how often it has petitions approved rather than denied, and whether it is currently banned. I do not think someone inspecting the bandwidth record could figure out what you were doing based on byte counts (especially as with every new month the old month's bandwidth records are compressed to just one number) beyond the rough time you synced and whether you have done much uploading. Since only a janitor can see your account and could feasibly attempt to inspect bandwidth data, they would already know this information.
"},{"location":"privacy.html#downloading","title":"downloading","text":"When you sync with a repository, your client will download and then keep up to date with all the metadata the server knows. This metadata is downloaded the same way by all users, and it comes in a completely anonymous format. The server does not know what you are interested in, and no one who downloads knows who uploaded what. Since the client regularly updates, a detailed analysis of the raw update files will reveal roughly when a tag or other row was added or deleted, although that timestamp is no more precise than the duration of the update period (by default, 100,000 seconds, or a little over a day).
Your client will never ask the server for information about a particular file or tag. You download everything in generic chunks, form a local index of that information, and then all queries are performed on your own hard drive with your own CPU.
By just downloading, even if the server owner were to identify you by your IP address, all they know is that you sync. They cannot tell anything about your files.
In the case of a file repository, your client downloads all the thumbnails automatically, but then you download actual files separately as you like. The server does not log which files you download.
"},{"location":"privacy.html#uploading","title":"uploading","text":"When you upload, your account is temporarily linked to the rows of content you add. This is so janitors can group petitions by who makes them, undo large mistakes easily, and even leave you a brief message (like \"please stop adding those clothing siblings\") for your client to pick up the next time it syncs your account. After the temporary period is over, all submissions are anonymised. So, what are the privacy concerns with that? Isn't the account 'Anon'?
Privacy can be tricky. Hydrus tech is obviously far, far better than anything normal consumers use, but here I believe are the remaining barriers to pure Anonymity, assuming someone with resources was willing to put a lot of work in to attack you:
Note
I am using the PTR as the example since that is what most people are using. If you are uploading to a server run between friends, privacy is obviously more difficult to preserve--if there are only three users, it may not be too hard to figure out who is uploading the NarutoXSonichu diaperfur content! If you are talking to a server with a small group of users, don't upload anything crazy or personally identifying unless that's the point of the server.
"},{"location":"privacy.html#ip_address_across_network","title":"IP Address Across Network","text":"Attacker: ISP/Government.
Exposure: That you use the PTR.
Problem: Your IP address may be recorded by servers in between you and the PTR (e.g. your ISP/Government). Anyone who could convert that IP address and timestamp into your identity would know you were a PTR user.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#ip_address_at_ptr","title":"IP Address At PTR","text":"Attacker: PTR administrator or someone else who has access to the server as it runs.
Exposure: Which PTR account you are.
Problem: I may be lying to you about the server forgetting IPs, or the admin running the PTR may have secretly altered its code. If the malicious admin were able to convert IP address and timestamp into your identity, they would obviously be able to link that to your account and thus its various submissions.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#time_identifiable_uploads","title":"Time Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: If a tag was added way before the file was public, then it is likely the original owner tagged it. An example would be if you were an artist and you tagged your own work on the PTR two weeks before publishing the work. Anyone who looked through the server updates carefully and compared to file publish dates, particularly if they were targeting you already, could notice the date discrepancy and know you were a PTR user.
Mitigation: Don't tag any file you plan to share if you are currently the only person who has any copies. Upload it, then tag it.
"},{"location":"privacy.html#content_identifiable_uploads","title":"Content Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: All uploads are shared anonymously with other users, but if the content itself is identifying, you may be exposed. An example would be if there was some popular lewd file floating around of you and your girlfriend, but no one knew who was in it. If you decided to tag it with accurate 'person:' tags, anyone synced with the PTR, when they next looked at that file, would see those person tags. The same would apply if the file was originally private but then leaked.
Mitigation: Just like an imageboard, do not upload any personally identifying information.
"},{"location":"privacy.html#individual_account_cross-referencing","title":"Individual Account Cross-referencing","text":"Attacker: PTR administrator or someone else with access to the server database files after one of your uploads has been connected to your real identity, perhaps with a Time/Content Identifiable Upload as above.
Exposure: What you have been uploading recently.
Problem: If you accidentally tie your identity to an individual content row (could be as simple as telling an admin 'yes, I, person whose name you know, uploaded that sibling last week'), then anyone who can see which accounts uploaded what will obviously be able to see your other uploads.
Mitigation: Best practice is not to reveal specifically what you upload. Note that this vulnerability (an admin looking up what else you uploaded after they discover something else you did) is now well mitigated by the account history anonymisation as below (assuming the admin has not altered the code to disable it!). If the server is set to anonymise content after 90 days, then your account can only be identified from specific content rows that were uploaded in the past 90 days, and cross-references would also only see the last 90 days of activity.
"},{"location":"privacy.html#big_brain_individual_account_mapping_fingerprint_cross-referencing","title":"Big Brain Individual Account Mapping Fingerprint Cross-referencing","text":"Attacker: Someone who has access to tag/file favourite lists on another site and gets access to a hydrus repository that has been compromised to not anonymise history for a long duration.
Exposure: Which PTR account another website's account uses.
Problem: Someone who had raw access to the PTR database's historical account record (i.e. they had disabled the anonymisation routine below) and also had compiled some booru users' 'favourite tag/artist' lists and was very clever could try to cross reference those two lists and connect a particular PTR account to a particular booru account based on similar tag distributions. There would be many holes in the PTR record, since only the first account to upload a tag mapping is linked to it, but maybe it would be possible to get high confidence on a match if you have really distinct tastes. Favourites lists are probably decent digital fingerprints, and there may be a shadow of that in your PTR uploads, although I also think there are enough users uploading and 'competing' for saved records on different tags that each users' shadow would be too indistinct to really pull this off.
Mitigation: I am mostly memeing here. But privacy is tricky, and who knows what the scrapers of the future are going to do with all the cloud data they are sucking up. Even then, the historical anonymisation routine below now generally eliminates this threat, assuming the server has not been compromised to disable it, so it matters far less if its database files fall into bad hands in the future, but accounts on regular websites are already being aggregated by the big marketing engines, and this will only happen in more clever ways in future. I wouldn't be surprised if booru accounts are soon being connected to other online identities based on fingerprint profiles of likes and similar. Don't save your spicy favourites on a website, even if that list is private, since if that site gets hacked or just bought out one day, someone really smart could start connecting dots ten years from now.
"},{"location":"privacy.html#account_history","title":"account history anonymisation","text":"As the PTR moved to multiple accounts, we talked more about the potential account cross-referencing worries. The threats are marginal today, but it may be a real problem in future. If the server database files were to ever fall into bad hands, having a years-old record of who uploaded what is not excellent. Like the AOL search leak, that data may have unpleasant rammifications, especially to an intelligent scraper in the future. This historical record is also not needed for most janitorial work.
Therefore, hydrus repositories now completely anonymise all uploads after a certain delay. It works by assigning ownership of every file, mapping, or tag sibling/parent to a special 'null' account, so all trace that your account uploaded any of it is deleted. It happens by default 90 days after the content is uploaded, but it can be more or less depending on the local admin and janitors. You can see the current 'anonymisation' period under review services.
If you are a janitor with the ability to modify accounts based on uploaded content, you will see anything old will bring up the null account. It is specially labelled, so you can't miss it. You cannot ban or otherwise alter this account. No one can actually use it.
"},{"location":"reducing_lag.html","title":"reducing lag","text":""},{"location":"reducing_lag.html#intro","title":"hydrus is cpu and hdd hungry","text":"The hydrus client manages a lot of complicated data and gives you a lot of power over it. To add millions of files and tags to its database, and then to perform difficult searches over that information, it needs to use a lot of CPU time and hard drive time--sometimes in small laggy blips, and occasionally in big 100% CPU chunks. I don't put training wheels or limiters on the software either, so if you search for 300,000 files, the client will try to fetch that many.
Furthermore, I am just one unprofessional guy dealing with a lot of legacy code from when I was even worse at programming. I am always working to reduce lag and other inconveniences, and improve UI feedback when many things are going on, but there is still a lot for me to do.
In general, the client works best on snappy computers with low-latency hard drives where it does not have to constantly compete with other CPU- or HDD- heavy programs. Running hydrus on your games computer is no problem at all, but if you leave the client on all the time, then make sure under the options it is set not to do idle work while your CPU is busy, so your games can run freely. Similarly, if you run two clients on the same computer, you should have them set to work at different times, because if they both try to process 500,000 tags at once on the same hard drive, they will each slow to a crawl.
If you run on an HDD, keeping it defragged is very important, and good practice for all your programs anyway. Make sure you know what this is and that you do it.
"},{"location":"reducing_lag.html#maintenance_and_processing","title":"maintenance and processing","text":"I have attempted to offload most of the background maintenance of the client (which typically means repository processing and internal database defragging) to time when you are not using the client. This can either be 'idle time' or 'shutdown time'. The calculations for what these exactly mean are customisable in file->options->maintenance and processing.
If you run a quick computer, you likely don't have to change any of these options. Repositories will synchronise and the database will stay fairly optimal without you even noticing the work that is going on. This is especially true if you leave your client on all the time.
If you have an old, slower computer though, or if your hard drive is high latency, make sure these options are set for whatever is best for your situation. Turning off idle time completely is often helpful as some older computers are slow to even recognise--mid task--that you want to use the client again, or take too long to abandon a big task half way through. If you set your client to only do work on shutdown, then you can control exactly when that happens.
"},{"location":"reducing_lag.html#reducing_lag","title":"reducing search and general gui lag","text":"Searching for tags via the autocomplete dropdown and searching for files in general can sometimes take a very long time. It depends on many things. In general, the more predicates (tags and system:something) you have active for a search, and the more specific they are, the faster it will be.
You can also look at file->options->speed and memory. Increasing the autocomplete thresholds under tags->manage tag display and search is also often helpful. You can even force autocompletes to only fetch results when you manually ask for them.
Having lots of thumbnails open or downloads running can slow many things down. Check the 'pages' menu to see your current session weight. If it is about 50,000, or you have individual pages with more than 10,000 files or download URLs, try cutting down a bit.
"},{"location":"reducing_lag.html#profiles","title":"finally - profiles","text":"Programming is all about re-editing your first, second, third drafts of an idea. You are always going back to old code and adding new features or making it work better. If something is running slow for you, I can almost always speed it up or at least improve the way it schedules that chunk of work.
However, figuring out exactly why something is running slow or holding up the UI is tricky and often gives an unexpected result. I can guess what might be running inefficiently from reports, but what I really need in order to be sure is a profile, which drills down into every function of a job, counting how many times they are called and timing how long they take. A profile for a single call looks like this.
So, please let me know:
- The general steps to reproduce the problem (e.g. \"Running system:numtags>4 is ridiculously slow on its own on 'all known tags'.\")
- Your client's approximate overall size (e.g. \"500k files, and it syncs to the PTR.\")
- The type of hard drive you are running hydrus from. (e.g. \"A 2TB 7200rpm drive that is 20% full. I regularly defrag it.\")
- Any profiles you have collected.
You can generate a profile by hitting help->debug->profiling->profile mode, which tells the client to generate profile information for almost all of its behind the scenes jobs. This can be spammy, so don't leave it on for a very long time (you can turn it off by hitting the help menu entry again).
Turn on profile mode, do the thing that runs slow for you (importing a file, fetching some tags, whatever), and then check your database folder (most likely install_dir/db) for a new 'client profile - DATE.log' file. This file will be filled with several sets of tables with timing information. Please send that whole file to me, or if it is too large, cut what seems important. It should not contain any personal information, but feel free to look through it.
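If you are not sure which file to send, the newest 'client profile' log is the one you want. On Linux or macOS, something like this will point at it; the install_dir path is an assumption, swap in your own:

```
# list the most recently written profile log in the default db directory
ls -t install_dir/db/client\ profile*.log | head -n 1
```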
There are several ways to contact me.
"},{"location":"running_from_source.html","title":"running from source","text":"I write the client and server entirely in python, which can run straight from source. It is getting simpler and simpler to run python programs like this, so don't be afraid of it. If none of the built packages work for you (for instance if you use Windows 8.1 or 18.04 Ubuntu (or equivalent)), it may be the only way you can get the program to run. Also, if you have a general interest in exploring the code or wish to otherwise modify the program, you will obviously need to do this.
"},{"location":"running_from_source.html#simple_setup_guide","title":"Simple Setup Guide","text":"There are now setup scripts that make this easy on Windows and Linux. You do not need any python experience.
"},{"location":"running_from_source.html#summary","title":"Summary:","text":"- Get Python.
- Get Hydrus source.
- Get mpv/SQLite/FFMPEG.
- Run setup_venv script.
- Run setup_help script.
- Run client script.
First of all, you will need git. If you are just a normal Windows user, you will not have it. Get it:
Git for Windows
Git is an excellent tool for synchronising code across platforms. Instead of downloading and extracting the whole .zip every time you want to update, it allows you to just run one line and all the code updates are applied in about three seconds. You can also run special versions of the program, or test out changes I committed two minutes ago without having to wait for me to make a whole build. You don't have to, but I recommend you get it.
Installing it is simple, but it can be intimidating. These are a bunch of very clever tools coming over from Linux-land, and the installer has a 10+ page wizard with several technical questions. Luckily, the 'default' is broadly fine, but I'll write everything out so you can follow along. I can't promise this list will stay perfectly up to date, so let me know if there is something complex and new you don't understand. This is also a record that I can refer to when I set up a new machine.
- First off, get it here. Run the installer.
- On the first page, with checkboxes, I recommend you uncheck 'Windows Explorer Integration', with its 'Open Git xxx here' sub-checkboxes. This stuff will just be annoying for our purposes.
- Then set your text editor. Select the one you use, and if you don't recognise anything, set 'notepad'.
- Now we enter the meat of the wizard pages. Everything except the default console window is best left as default:
Let Git decide
on using \"master\" as the default main branch nameGit from the command line and also from 3rd-party software
Use bundled OpenSSH
Use the OpenSSL library
Checkout Windows-style, commit Unix-style line endings
- (NOT DEFAULT)
Use Windows' default console window
. Let's keep things simple, but it isn't a big deal. Fast-forward or merge
Git Credential Manager
- Do
Enable file system caching
/Do notEnable symbolic links
- Do not enable experimental stuff
Git should now be installed on your system. Any new terminal/command line/powershell window (shift+right-click on any folder and hit something like 'Open in terminal') now has the git command!
Windows 7
For a long time, I supported Windows 7 via running from source. Unfortunately, as libraries and code inevitably updated, this finally seems to no longer be feasible. Python 3.8 will no longer run the program. I understand v582 is one of the last versions of the program to work.
First, you will have to install the older Python 3.8, since that is the latest version that you can run.
Then, later, when you do the
git clone https://github.com/hydrusnetwork/hydrus
line, you will need to rungit checkout tags/v578
, which will rewind you to that point in time.You will also need to navigate to
install_dir/static/requirements/advanced
and editrequirements_core.txt
; remove the 'psd-tools' line before you run setup_venv.I can't promise anything though. The requirements.txt isn't perfect, and something else may break in future! You may like to think about setting up a Linux instance.
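Put together, the Windows 7 dance looks roughly like the sketch below. This only restates the steps above, with no guarantees; the tag and filenames are the ones this section mentions:

```
git clone https://github.com/hydrusnetwork/hydrus
cd hydrus
git checkout tags/v578
# edit static/requirements/advanced/requirements_core.txt and remove the 'psd-tools' line
# then run the setup_venv script as normal
```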
Then you will need to install Python. Get 3.10 or 3.11 here. During the install process, make sure it has something like 'Add Python to PATH' checked. This makes Python available everywhere in Windows.
You should already have a fairly new python. Ideally, you want at least 3.9. You can find out what version you have just by opening a new terminal and typing 'python'.
If you are already on newer python, like 3.12+, that's ok--you might need to select the 'advanced' setup later on and choose the '(t)est' options. If you are stuck on 3.9, try the same thing, but with the '(o)lder' options (but I can't promise it will work!).
Then, get the hydrus source. It is best to get it with Git: make a new folder somewhere, open a terminal in it, and then paste:
git clone https://github.com/hydrusnetwork/hydrus\n
The whole repository will be copied to that location--this is now your install dir. You can move it if you like.
If Git is not available, then just go to the latest release and download and extract the source code .zip somewhere.
Read-only install locations
Make sure the install directory has convenient write permissions (e.g. on Windows, don't put it in \"Program Files\"). Extracting straight to a spare drive, something like \"D:\\Hydrus Network\", is ideal.
We will call the base extract directory, the one with 'hydrus_client.py' in it, install_dir.
Mixed Builds
Don't mix and match build extracts and source extracts. The process that runs the code gets confused if there are unexpected extra .dlls in the directory. If you need to convert between built and source releases, perform a clean install.
If you are converting from one install type to another, make a backup before you start. Then, if it all goes wrong, you'll always have a safe backup to rollback to.
"},{"location":"running_from_source.html#built_programs","title":"Built Programs","text":"There are three special external libraries. You just have to get them and put them in the correct place:
WindowsLinuxmacOS-
mpv
- If you are on Windows 8.1 or older, this is known safe.
- If you are on Windows 10 or newer and want the very safe answer, try this.
- Otherwise, go for this.
- I have been testing this newer version and this very new version and things seem to be fine too, at least on updated Windows.
Then open that archive and place the 'mpv-1.dll'/'mpv-2.dll'/'libmpv-2.dll' into install_dir.
mpv on older Windows
I have word that the newer mpv, the API version 2.1 that you have to rename to mpv-2.dll, will work on Qt5 and Windows 7. If this applies to you, have a play around with different versions here. You'll need the newer mpv choice in the setup-venv script however, which, depending on your situation, may not be possible.
-
SQLite3
Go to
install_dir/static/build_files/windows
and copy 'sqlite3.dll' intoinstall_dir
. -
FFMPEG
Get a Windows build of FFMPEG here.
Extract the ffmpeg.exe into
install_dir/bin
.
-
mpv
Try running
apt-get install libmpv1
in a new terminal. You can typeapt show libmpv1
to see your current version. Or, if you use a different package manager, try searchinglibmpv
orlibmpv1
on that.- If you have earlier than 0.34.1, you will be looking at running the 'advanced' setup in the next section and selecting the 'old' mpv.
- If you have 0.34.1 or later, you can run the normal setup script.
-
SQLite3
No action needed.
-
FFMPEG
You should already have ffmpeg. Just type
ffmpeg
into a new terminal, and it should give a basic version response. If you somehow don't have ffmpeg, check your package manager.
-
mpv
Unfortunately, mpv is not well supported in macOS yet. You may be able to install it in brew, but it seems to freeze the client as soon as it is loaded. Hydev is thinking about fixes here.
-
SQLite3
No action needed.
-
FFMPEG
You should already have ffmpeg.
Windows: Double-click setup_venv.bat.
Linux: The file is setup_venv.sh. You may be able to double-click it. If not, open a terminal in the folder and type: ./setup_venv.sh
If you do not have permission to execute the file, do this before trying again:
chmod +x setup_venv.sh
You will likely have to do the same on the other .sh files.
If you get an error about the venv failing to activate during
setup_venv.sh
, you may need to install venv especially for your system. The specific error message should help you out, but you'll be looking at something along the lines ofapt install python3.10-venv
.If you like, you can run the
setup_desktop.sh
file to install an io.github.hydrusnetwork.hydrus.desktop file to your applications folder. (Or check the template ininstall_dir/static/io.github.hydrusnetwork.hydrus.desktop
and do it yourself!)Double-click
setup_venv.command
.If you do not have permission to run the .command file, then open a terminal on the folder and enter:
chmod +x setup_venv.command
You will likely have to do the same on the other .command files.
You may need to experiment with the advanced choices, especially if your macOS is a little old.
The setup will ask you some questions. Just type the letters it asks for and hit enter. Most users are looking at the (s)imple setup, but if your situation is unusual, try the (a)dvanced, which will walk you through the main decisions. Once ready, it should take a minute to download its packages and a couple minutes to install them. Do not close it until it is finished installing everything and says 'Done!'. If it seems like it hung, just give it time to finish.
If something messes up, or you want to make a different decision, just run the setup script again and it will reinstall everything. Everything these scripts do ends up in the 'venv' directory, so you can also just delete that folder to 'uninstall' the venv. It should just work on most normal computers, but let me know if you have any trouble.
Then run the 'setup_help' script to build the help. This isn't necessary, but it is nice to have it built locally. You can run this again at any time to rebuild the current help.
"},{"location":"running_from_source.html#running_it_1","title":"Running it","text":"WindowsLinuxmacOSRun 'hydrus_client.bat' to start the client.
Qt compatibility
If you run into trouble running newer versions of Qt6, some users have fixed it by installing the packages
libicu-dev
andlibxcb-cursor-dev
. Withapt
that will be:sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
If you still have trouble with the default Qt6 version, try running setup_venv again and choose a different version. There are several to choose from, including (w)riting a custom version. Check the advanced requirements.txts files in install_dir/static/requirements/advanced for more info, and you can also work off this list: PySide6
Linux: Run 'hydrus_client.sh' to start the client. Don't forget to set chmod +x hydrus_client.sh if you need it.
macOS: Run 'hydrus_client.command' to start the client. Don't forget to set chmod +x hydrus_client.command if you need it.
The first start will take a little longer (it has to compile all the code into something your computer understands). Once up, it will operate just like a normal build with the same folder structure and so on.
Missing a Library
If the client fails to boot, it should place a 'hydrus_crash.log' in your 'db' directory or your desktop, or, if it got far enough, it may write the error straight to the 'client - date.log' file in your db directory.
If that error talks about a missing library, try reinstalling your venv. Are you sure it finished correctly? Do you need to run the advanced setup and select a different version of Qt?
Windows: If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.bat' to 'hydrus_client-user.bat' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.bat'. New git pull commands will not affect 'hydrus_client-user.bat'.
You probably can't pin your .bat file to your Taskbar or Start (and if you try and pin the running program to your taskbar, its icon may revert to Python), but you can make a shortcut to the .bat file, pin that to Start, and in its properties set a custom icon. There's a nice hydrus one in install_dir/static.
However, some versions of Windows won't let you pin a shortcut to a bat to the start menu. In this case, make a shortcut like this:
C:\Windows\System32\cmd.exe /c "C:\hydrus\Hydrus Source\hydrus_client-user.bat"
This is a shortcut to tell the terminal to run the bat; it should be pinnable to start. You can give it a nice name and the hydrus icon and you should be good!
Linux: If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.sh' to 'hydrus_client-user.sh' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.sh'. New git pull commands will not affect 'hydrus_client-user.sh'.
macOS: If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.command' to 'hydrus_client-user.command' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.command'. New git pull commands will not affect 'hydrus_client-user.command'.
"},{"location":"running_from_source.html#simple_updating_guide","title":"Simple Updating Guide","text":"To update, you do the same thing as for the extract builds.
- If you installed by extracting the source zip, then download the latest release source zip and extract it over the top of the folder you have, overwriting the existing source files.
- If you installed with git, then just run
git pull
as normal. I have added easy 'git_pull' scripts to the install directory for your convenience (on Windows, just double-click 'git_pull.bat').
If you get a library version error when you try to boot, run the venv setup again. It is worth doing this anyway, every now and then, just to stay up to date.
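In practice, a source update is usually just the following, run from your install_dir (use the .bat equivalents on Windows; re-running the venv setup is only needed after a library version error, or every now and then to stay current):

```
git pull
./setup_venv.sh
```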
"},{"location":"running_from_source.html#migrating_from_an_existing_install","title":"Migrating from an Existing Install","text":"Many users start out using one of the official built releases and decide to move to source. There is lots of information here about how to migrate the database, but for your purposes, the simple method is this:
If you never moved your database to another place and do not use the -d/--db_dir launch parameter:
- Follow the above guide to get the source install working in a new folder on a fresh database.
- MAKE A BACKUP OF EVERYTHING
- Delete everything from the source install's db directory.
- Move your built release's entire db directory to the source install (see the sketch after this list).
- Run your source release again--it should load your old db no problem!
- Update your backup routine to point to the new source install location.
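As a rough sketch of the 'move the db' steps above; the paths are assumptions, and make sure both installs are closed and your backup is done first:

```
# throw away the fresh, empty db the source install created
rm -rf "/path/to/source_install/db"

# move the built release's db directory across wholesale
mv "/path/to/built_install/db" "/path/to/source_install/db"
```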
If you moved your database to another location and use the -d/--db_dir launch parameter:
- Follow the above guide to get the source install working in a new folder on a fresh database (without --db_dir).
- MAKE A BACKUP OF EVERYTHING
- Just to be neat, delete the .db files, .log files, and client_files folder from the source install's db directory.
- Run the source install with --db_dir just as you would the built executable--it should load your old db no problem!
This is for advanced users only.
If you have never used python before, do not try this. If the easy setup scripts failed for you and you don't know what happened, please contact hydev before trying this, as the thing that went wrong there will probably go much more wrong here.
You can also set up the environment yourself. Inside the extract should be hydrus_client.py and hydrus_server.py. You will be treating these basically the same as the 'client' and 'server' executables--with the right environment, you should be able to launch them the same way and they take the same launch parameters as the exes.
Hydrus needs a whole bunch of libraries, so let's now set your python up. I strongly recommend you create a virtual environment. It is easy and doesn't mess up your system python.
You have to do this in the correct order! Do not switch things up. If you make a mistake, delete your venv folder and start over from the beginning.
To create a new venv environment:
- Open a terminal at your hydrus extract folder. If
python3
doesn't work, usepython
. python3 -m pip install virtualenv
(if you need it)python3 -m venv venv
source venv/bin/activate
(CALL venv\\Scripts\\activate.bat
in Windows cmd)python -m pip install --upgrade pip
python -m pip install --upgrade wheel
venvs
That
source venv/bin/activate
line turns on your venv. You should see your terminal prompt note you are now in it. A venv is an isolated environment of python that you can install modules to without worrying about breaking something system-wide. Ideally, you do not want to install python modules to your system python.This activate line will be needed every time you alter your venv or run the
hydrus_client.py
/hydrus_server.py
files. You can easily tuck this into a launch script--check the easy setup files for examples.On Windows Powershell, the command is
.\\venv\\Scripts\\activate
, but you may find the whole deal is done much easier in cmd than Powershell. When in Powershell, just typecmd
to get an old fashioned command line. In cmd, the launch command is justvenv\\scripts\\activate.bat
, no leading period.After you have activated the venv, you can use pip to install everything you need to it from the requirements.txt in the install_dir:
python -m pip install -r requirements.txt\n
If you need different versions of libraries, check the cut-up requirements.txts the 'advanced' easy-setup uses in
"},{"location":"running_from_source.html#qt","title":"Qt","text":"install_dir/static/requirements/advanced
. Check and compare their contents to the main requirements.txt to see what is going on. You'll likely need the newer OpenCV on Python 3.10, for instance.Qt is the UI library. You can run PySide2, PySide6, PyQt5, or PyQt6. A wrapper library called
qtpy
allows this. The default is PySide6, but if it is missing, qtpy will fall back to an available alternative. For PyQt5 or PyQt6, you need an extra Chart module, so go:python -m pip install qtpy PyQtChart PyQt5\n-or-\npython -m pip install qtpy PyQt6-Charts PyQt6\n
If you have multiple Qts installed, then select which one you want to use by setting the
QT_API
environment variable to 'pyside2', 'pyside6', 'pyqt5', or 'pyqt6'. Check help->about to make sure it loaded the right one.If you want to set QT_API in a batch file, do this:
set QT_API=pyqt6
If you run <= Windows 8.1 or Ubuntu 18.04, you cannot run Qt6. Try PySide2 or PyQt5.
Qt compatibility
If you run into trouble running newer versions of Qt6 on Linux, often with an XCB-related error such as
qt.qpa.plugin: Could not load the Qt platform plugin \"xcb\" in \"\" even though it was found.
, try installing the packageslibicu-dev
andlibxcb-cursor-dev
. Withapt
that will be:sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
If you still have trouble with the default Qt6 version, check the advanced requirements.txts in
"},{"location":"running_from_source.html#mpv","title":"mpv","text":"install_dir/static/requirements/advanced
. There should be several older version examples you can explore, and you can also work off these lists: PySide6 PyQt6 PySide2 Pyqt5MPV is optional and complicated, but it is great, so it is worth the time to figure out!
As well as the python wrapper, 'python-mpv' (which is in the requirements.txt), you also need the underlying dev library. This is not mpv the program, but 'libmpv', often called 'libmpv1'.
For Windows, the dll builds are here, although getting a stable version can be difficult. Just put it in your hydrus base install directory. Check the links in the easy-setup guide above for good versions. You can also just grab the 'mpv-1.dll'/'mpv-2.dll' I bundle in my extractable Windows release.
If you are on Linux, you can usually get 'libmpv1' like so:
apt-get install libmpv1
On macOS, you should be able to get it with
brew install mpv
, but you are likely to find mpv crashes the program when it tries to load. Hydev is working on this, but it will probably need a completely different render API.Hit help->about to see your mpv status. If you don't have it, it will present an error popup box with more info.
"},{"location":"running_from_source.html#sqlite","title":"SQLite","text":"If you can, update python's SQLite--it'll improve performance. The SQLite that comes with stock python is usually quite old, so you'll get a significant boost in speed. In some python deployments, the built-in SQLite not compiled with neat features like Fast Text Search (FTS) that hydrus needs.
On Windows, get the 64-bit sqlite3.dll here, and just drop it in your base install directory. You can also just grab the 'sqlite3.dll' I bundle in my extractable Windows release.
You may be able to update your SQLite on Linux or macOS with:
apt-get install libsqlite3-dev
- (activate your venv)
python -m pip install pysqlite3
But as long as the program launches, it usually isn't a big deal.
Extremely safe no way it can go wrong
If you want to update SQLite for your Windows system python install, you can also drop it into
C:\\Program Files\\Python310\\DLLs
or wherever you have python installed, and it'll update for all your python projects. You'll be overwriting the old file, so make a backup of the old one (I have never had trouble updating like this, however).
A user who made a Windows venv with Anaconda reported they had to replace the sqlite3.dll in their conda env at ~/.conda/envs/<envname>/Library/bin/sqlite3.dll.
"},{"location":"running_from_source.html#ffmpeg","title":"FFMPEG","text":"If you don't have FFMPEG in your PATH and you want to import anything more fun than jpegs, you will need to put a static FFMPEG executable in your PATH or the install_dir/bin directory. This should always point to a new build for Windows. Alternately, you can just copy the exe from one of my extractable Windows releases.
"},{"location":"running_from_source.html#running_it","title":"Running It","text":"
source venv/bin/activate\npython hydrus_client.py\n
This will use the 'db' directory for your database by default, but you can use the launch arguments just like for the executables. For example, this could be your client-user.sh file:
"},{"location":"running_from_source.html#building_these_docs","title":"Building these Docs","text":"#!/bin/bash\n\nsource venv/bin/activate\npython hydrus_client.py -d=\"/path/to/database\"\n
When running from source you may want to build the hydrus help docs yourself. You can also check the setup_help scripts in the install directory.
"},{"location":"running_from_source.html#windows_build","title":"Building Packages on Windows","text":"
- Get Visual Studio 14/whatever build tools
- Pick a different library version
The second option is always simpler. If opencv-headless as the requirements.txt specifies won't compile in your python, then try a newer version--there will probably be one of these new highly compatible wheels and it'll just work in seconds. Check my build scripts and various requirements.txts for ideas on what versions to try for your python etc...
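For example, if the pinned opencv will not install on your newer python, you might try a more recent wheel. The version spec here is only an illustration, not the official pin; check the requirements files for what the program actually expects:
python -m pip install \"opencv-python-headless>=4.8\"\n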
If you are confident you need Visual Studio tools, then prepare for headaches. Although the tools are free from Microsoft, it can be a pain to get them through the official (and often huge) downloader installer from Microsoft. Expect a 5GB+ install with an eye-watering number of checkboxes that probably needs some stackexchange searches to figure out.
On Windows 10, Chocolatey has been the easy answer. These can be useful:
choco install -y vcredist-all\nchoco install -y vcbuildtools (this is Visual Studio 2015)\nchoco install -y visualstudio2017buildtools\nchoco install -y visualstudio2022buildtools\nchoco install -y windows-sdk-10.0\n
Update: On Windows 11, I have had some trouble with the above. The VS2015 seems not to install any more. A basic stock Win 11 install with Python 3.10 or 3.11 is fine getting everything on our requirements, but freezing with PyInstaller may have trouble finding certain 'api-***.dll' files. I am now trying to figure this out with my latest dev machine as of 2024-01. If you try this, let me know what you find out!
"},{"location":"running_from_source.html#additional_windows","title":"Additional Windows Info","text":"This does not matter much any more, but in the old days, building modules like lz4 and lxml was a complete nightmare, and hooking up Visual Studio was even more difficult. This page has a lot of prebuilt binaries--I have found it very helpful many times.
I have a fair bit of experience with Windows python, so send me a mail if you need help.
"},{"location":"running_from_source.html#my_code","title":"My Code","text":"I develop hydrus on and am most experienced with Windows, so the program is more stable and reasonable on that. I do not have as much experience with Linux or macOS, but I still appreciate and will work on your Linux/macOS bug reports.
My coding style is unusual and unprofessional. Everything is pretty much hacked together. If you are interested in how things work, please do look through the source and ask me if you don't understand something.
I'm constantly throwing new code together and then cleaning and overhauling it down the line. I work strictly alone. While I am very interested in detailed bug reports or suggestions for good libraries to use, I am not looking for pull requests or suggestions on style. I know a lot of things are a mess. Everything I do is WTFPL, so feel free to fork and play around with things on your end as much as you like.
"},{"location":"server.html","title":"running your own server","text":"Note
You do not need the server to do anything with hydrus! It is only for advanced users to do very specific jobs! The server is also hacked-together and quite technical. It requires a fair amount of experience with the client and its concepts, and it does not operate on a timescale that works well on a LAN. Only try running your own server once you have a bit of experience synchronising with something like the PTR and you think, 'Hey, I know exactly what that does, and I would like one!'
Here is a document put together by a user describing whether you want the server.
"},{"location":"server.html#intro","title":"setting up a server","text":"I will use two terms, server and service, to mean two distinct things:
- A server is an instantiation of the hydrus server executable (e.g. hydrus_server.exe in Windows). It has a complicated and flexible database that can run many different services in parallel.
- A service sits on a port (e.g. 45871) and responds to certain http requests (e.g. /file or /update) that the hydrus client can plug into. A service might be a repository for a certain kind of data, the administration interface to manage what services run on a server, or anything else.
Setting up a hydrus server is easy compared to, say, Apache. There are no .conf files to mess about with, and everything is controlled through the client. When started, the server will place an icon in your system tray in Windows or open a small frame in Linux or macOS. To close the server, either right-click the system tray icon and select exit, or just close the frame.
The basic process for setting up a server is:
- Start the server.
- Set up your client with its address and initialise the admin account
- Set the server's options and services.
- Make some accounts for your users.
- ???
- Profit
Let's look at these steps in more detail:
"},{"location":"server.html#start","title":"start the server","text":"Since the server and client have so much common code, I package them together. If you have the client, you have the server. If you installed in Windows, you can hit the shortcut in your start menu. Otherwise, go straight to 'hydrus_server' or 'hydrus_server.exe' or 'hydrus_server.py' in your installation directory. The program will first try to take port 45870 for its administration interface, so make sure that is free. Open your firewall as appropriate.
"},{"location":"server.html#setting_up_the_client","title":"set up the client","text":"In the services->manage services dialog, add a new 'hydrus server administration service' and set up the basic options as appropriate. If you are running the server on the same computer as the client, its hostname is 'localhost'.
In order to set up the first admin account and an access key, use 'init' as a registration token. This special registration token will only work to initialise this first super-account.
YOU'LL WANT TO SAVE YOUR ACCESS KEY IN A SAFE PLACE
If you lose your admin access key, there is no way to get it back, and if you are not sqlite-proficient, you'll have to restart from the beginning by deleting your server's database files.
If the client can't connect to the server, it is either not running or you have a firewall/port-mapping problem. If you want a quick way to test the server's visibility, just put https://host:port into your browser (make sure it is https! http will not work)--if it is working, your browser will probably complain about its self-signed https certificate. Once you add a certificate exception, the server should return some simple html identifying itself.
"},{"location":"server.html#setting_up_the_server","title":"set up the server","text":"You should have a new submenu, 'administrate services', under 'services', in the client gui. This is where you control most server and service-wide stuff.
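If you prefer a terminal to a browser, curl can do the same visibility check; -k skips verification of the self-signed certificate (this sketch assumes the default admin port of 45870 on the local machine):
curl -k https://localhost:45870\n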
admin->your server->manage services lets you add, edit, and delete the services your server runs. Every time you add one, you will also be added as that service's first administrator, and the admin menu will gain a new entry for it.
"},{"location":"server.html#making_accounts","title":"making accounts","text":"Go admin->your service->create new accounts to create new registration tokens. Send the registration tokens to the users you want to give these new accounts. A registration token will only work once, so if you want to give several people the same account, they will have to share the access key amongst themselves once one of them has registered the account. (Or you can register the account yourself and send them all the same access key. Do what you like!)
Go admin->manage account types to add, remove, or edit account types. Make sure everyone has at least downloader (get_data) permissions so they can stay synchronised.
You can create as many accounts of whatever kind you like. Depending on your usage scenario, you may want to have all uploaders, one uploader and many downloaders, or just a single administrator. There are many combinations.
"},{"location":"server.html#have_fun","title":"???","text":"The most important part is to have fun! There are no losers on the INFORMATION SUPERHIGHWAY.
"},{"location":"server.html#profit","title":"profit","text":"I honestly hope you can get some benefit out of my code, whether just as a backup or as part of a far more complex system. Please mail me your comments as I am always keen to make improvements.
"},{"location":"server.html#backing_up","title":"btw, how to backup a repo's db","text":"All of a server's files and options are stored in its accompanying .db file and respective subdirectories, which are created on first startup (just like with the client). To backup or restore, you have two options:
- Shut down the server, copy the database files and directories, then restart it. This is the only way, currently, to restore a db.
- In the client, hit admin->your server->make a backup. This will lock the db server-side while it makes a copy of everything server-related to server_install_dir/db/server_backup. When the operation is complete, you can ftp/batch-copy/whatever the server_backup folder wherever you like.
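As an example of that last copy step, an rsync to another machine might look like this; the host and paths here are purely illustrative:
rsync -a /path/to/hydrus_server/db/server_backup/ user@backupbox:/srv/backups/hydrus_server/\n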
If you get to a point where you can no longer boot the repository, try running SQLite Studio and opening server.db. If the issue is simple--like manually changing the port number--you may be in luck. Send me an email if it is tricky.
Remember that everything is breaking all the time. Make regular backups, and you'll minimise your problems.
"},{"location":"support.html","title":"Financial Support","text":""},{"location":"support.html#support","title":"can I contribute to hydrus development?","text":"I do not expect anything from anyone. I'm amazed and grateful that anyone wants to use my software and share tags with others. I enjoy the feedback and work, and I hope to keep putting completely free weekly releases out as long as there is more to do.
That said, as I have developed the software, several users have kindly offered to contribute money, either as thanks for a specific feature or just in general. I kept putting the thought off, but I eventually got over my hesitance and set something up.
I find the tactics of most internet fundraising very distasteful, especially when they promise something they then fail to deliver. I much prefer the 'if you like me and would like to contribute, then please do, meanwhile I'll keep doing what I do' model. I support several 'put out regular free content' creators on Patreon in this way, and I get a lot out of it, even though I have no direct reward beyond the knowledge that I helped some people do something neat.
If you feel the same way about my work, I've set up a simple Patreon page here. If you can help out, it is deeply appreciated.
"},{"location":"wine.html","title":"running a client or server in wine","text":"Several Linux and macOS users have found success running hydrus with Wine. Here is a post from a Linux dude:
Some things I picked up on after extended use:
- Wine is kinda retarded sometimes: do not try to close the window by pressing the red close button while in fullscreen.
- It will just \"go through\" it and do whatever to what's behind it.
- Flash does work, IF you download the internet explorer version and install it through wine.
- Hydrus is self-contained and portable. That means that one instance of hydrus does not know what another is doing. This is great if you want different installations for different things.
- Some of the input fields behave a little wonky. Though that may just be standard Hydrus behavior.
- Mostly everything else works fine. I was able to connect to the test server and view there. Only thing I need to test is the ability to host a server.
Installation process:
- Get a standard Wine installation.
- Download the latest hydrus .zip file.
- Unpack it with your chosen zip file opener, in a folder of your choosing. It does not need to be in the wine folder.
- Run it with wine, either through the file manager or through the terminal (see the example after this list).
- For Flash support install the IE version through wine.
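A terminal invocation might look like the following. The folder and the executable name are just examples--check what is actually in the folder you unpacked, since names can differ between releases:
cd ~/hydrus #wherever you unpacked the zip\nwine hydrus_client.exe\n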
If you get the client running in Wine, please let me know how you get on!
"},{"location":"youDontWantTheServer.html","title":"You don't want the server","text":"The hydrus_server.exe/hydrus_server.py is the victim of many a misconception. You don't need to use the server to use Hydrus. The vast majority of features are contained in the client itself so if you're new to Hydrus, just use that.
The server is only really useful for a few specific cases which will not apply for the vast majority of users.
"},{"location":"youDontWantTheServer.html#the_server","title":"The server","text":"The Hydrus server doesn't really work as most people envision a server working. Rather than on-demand viewing, when you link with a Hydrus server, you synchronise a complete copy of all its data. For the tag repository, you download every single tag it has ever been told about. For the file repository, you download the whole file list, related file info, and every single thumbnail, which lets you browse the whole repository in your client in a regular search page--to view files in the media viewer, you need to download and import them specifically.
"},{"location":"youDontWantTheServer.html#you_dont_want_the_server_probably","title":"You don't want the server (probably)","text":"Do you want to remotely view your files? You don't want the server.
Do you want to host your files on another computer since your daily driver doesn't have a lot of storage space? You don't want the server.
Do you want to use multiple clients and have everything synced between them? You don't want the server.
Do you want to expose the API for Hydrus Web, Hydroid, or some other third-party tool? You don't want the server.
Do you want to share some files and/or tags in a small group of friends? You might actually want the server.
"},{"location":"youDontWantTheServer.html#the_options","title":"The options","text":"Now, you're not the first person to have any of the above ideas and some of the thinkers even had enough programming know-how to make something for it. Below is a list of some options, see this page for a few more.
"},{"location":"youDontWantTheServer.html#hydrus_web","title":"Hydrus Web","text":"- Lets you browse and manage your collection.
- Lets you browse and manage your collection.
- Lets you browse your collection.
- Lets you host your files on another drive, even on another computer in the network.
The hydrus network client is a desktop application written for Anonymous and other internet enthusiasts with large media collections. It organises your files into an internal database and browses them with tags instead of folders, a little like a booru on your desktop. Tags and files can be anonymously shared through custom servers that any user may run. Everything is free, nothing phones home, and the source code is included with the release. It is developed mostly for Windows, but builds for Linux and macOS are available (perhaps with some limitations, depending on your situation).
The software is constantly being improved. I try to put out a new release every Wednesday by 8pm Eastern.
Hydrus supports various filetypes for images, video and audio files, image project files, and more. A full list of supported filetypes is here.
On the Windows and Linux builds, an MPV window is embedded to play video and audio smoothly. For files like pdf, which cannot currently be viewed in the client, it is easy to launch any file with your OS's default program.
The client can download files and parse tags from a number of websites, including by default:
- 4chan and other imageboards, with a thread watcher
- the popular boorus
- gallery sites like deviant art, hentai foundry, and pixiv
- tumblr and twitter
And can be extended to download from more locations using easily shareable user-made downloaders. It can also be set to 'subscribe' to any gallery search, repeating it every few days to keep up with new results.
The program's emphasis is on your freedom. There is no DRM, no spying, no censorship. The program never phones home.
"},{"location":"index.html#start_here","title":"Start Here","text":"If you would like to try hydrus, I strongly recommend you check out the help and getting started guide. It will take you through all the main systems.
"},{"location":"index.html#links","title":"links","text":"- homepage
- github (latest build)
- issue tracker
- 8chan.moe /t/ (Hydrus Network General)
- tumblr
- x
- discord
- patreon
- user-run repository and wiki (including download presets for several non-default boorus)
- more links and contact info
- Hydrus crashes without a crash log
- Standard error reads
Killed
- System logs say OOMKiller
- Programs appear to have very high virtual memory utilization despite low real memory usage.
Add the following lines to the end of
/etc/sysctl.conf
. You will need admin, so use sudo nano /etc/sysctl.conf
or sudo gedit /etc/sysctl.conf
vm.min_free_kbytes=1153434\nvm.overcommit_memory=1\n
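If you would rather not wait for a reboot, you can also apply the settings immediately with sysctl (assuming you have already added the lines to /etc/sysctl.conf as above):
sudo sysctl -p #reload everything in /etc/sysctl.conf\nsudo sysctl -w vm.overcommit_memory=1 #or set a single value directly\n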
Check that you have (enough) swap space or you might still run out of memory.
sudo swapon --show\n
If you need swap, create a swapfile and turn it on:
sudo fallocate -l 16G /swapfile #make 16GiB of swap\nsudo chmod 600 /swapfile\nsudo mkswap /swapfile\nsudo swapon /swapfile #enable it for the current session\n
Add the following to /etc/fstab so your swap is mounted on reboot:
/swapfile swap swap defaults 0 0\n
You may add as many swapfiles as you like, and you should add a new swapfile before you delete an old one if you plan to do so, as unmounting a swapfile will evict its contents back into real memory. You may also wish to use a swapfile type that uses compression; this saves you some disk space for a little bit of a performance hit, and it is especially effective on mostly empty memory.
Reboot for all changes to take effect, or use
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#details","title":"Details","text":"sysctl
to setvm
variables. Linux's memory allocator is lazy and does not perform opportunistic reclaim. This means that the system will continue to give your process memory from the real and virtual memory pool (swap) until there is none left.
Linux will only clean up if the available total real and virtual memory falls below the watermark defined in the system control configuration file
/etc/sysctl.conf
. The watermark's name is vm.min_free_kbytes
, it is the number of kilobytes the system keeps in reserve, and therefore the maximum amount of memory the system can allocate in one go before needing to reclaim memory it gave earlier but which is no longer in use. The default value is
vm.min_free_kbytes=65536
, which means roughly 64MiB. If for a given request the amount of memory asked to be allocated is under
vm.min_free_kbytes
, but this would result in an amount of total free memory less than
then the OS will clean up memory to service the request.If
vm.min_free_kbytes
is less than the ammount requested and there is no virtual memory left, then the system is officially unable to service the request and will lauch the OOMKiller (Out of Memory Killer) to free memory by kiling memory glut processes.Increase the
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#the_oom_killer","title":"The OOM Killer","text":"vm.min_free_kbytes
value to prevent this scenario.The OOM kill decides which program to kill to reclaim memory, since hydrus loves memory it is usually picked first, even if another program asking for memory caused the OOM condition. Setting the minimum free kilobytes higher will avoid the running of the OOMkiller which is always preferable, and almost always preventable.
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#memory_overcommmit","title":"Memory Overcommmit","text":"We mentioned that Linux will keep giving out memory, but actually it's possible for Linux to launch the OOM killer if it just feel like our program is aking for too much memory too quickly. Since hydrus is a heavyweight scientific processing package we need to turn this feature off. To turn it off change the value of
vm.overcommit_memory
which defaults to2
.Set
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#what_about_swappiness","title":"What about swappiness?","text":"vm.overcommit_memory=1
this prevents the OS from using a heuristic and it will just always give memory to anyone who asks for it. Swappiness is a setting you might have seen, but it only determines Linux's desire to spend a little bit of time moving memory you haven't touched in a while out of real memory and into virtual memory; it will not prevent the OOM condition, it just determines how much time to spend moving things into swap.
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#why_does_my_linux_system_studder_or_become_unresponsive_when_hydrus_has_been_running_a_while","title":"Why does my Linux system studder or become unresponsive when hydrus has been running a while?","text":"You are running out of pages because Linux releases I/O buffer pages only when a file is closed, OR memory fragmentation in Hydrus is high because you have a big session weight or had a big I/O spike. Thus the OS is waiting for you to hit the watermark(as described in \"why is hydrus crashing\") to start freeing pages, which causes the chug.
When contents are written from memory to disk, the page is retained so that if you reread that part of the disk the OS does not need to access the disk; it just pulls it from the much faster memory. This is usually a good thing, but because Hydrus makes many small writes to files you probably won't be asking for again soon, it eats up pages over time.
Hydrus also holds the database open and reads/writes new areas to it often, even if it will not access those parts again for ages. It tends to accumulate lots of I/O cache for these small pages it will not be interested in. This is really good for hydrus (because it will over time have the root of the most important indexes in memory) but sucks for the responsiveness of other apps, and will cause hydrus to consume pages after doing a lengthy operation in anticipation of needing them again, even when it is thereafter idle. You need to set
vm.dirtytime_expire_seconds
to a lower value. vm.dirtytime_expire_seconds
When a lazytime inode is constantly having its pages dirtied, the inode with an updated timestamp will never get a chance to be written out. And, if the only thing that has happened on the file system is a dirtytime inode caused by an atime update, a worker will be scheduled to make sure that inode eventually gets pushed out to disk. This tunable is used to define when a dirty inode is old enough to be eligible for writeback by the kernel flusher threads. And, it is also used as the interval to wake up the dirtytime writeback thread. On many distros this happens only once every 12 hours; try setting it closer to every one or two hours. This will cause the OS to drop pages that were written over 1-2 hours ago, returning them to the free store for use by other programs.
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
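As a concrete example, to let pages from these long-lived files become reclaimable after about an hour, you could add something like this to /etc/sysctl.conf (3600 seconds; the example config further down uses 7200, i.e. two hours):
vm.dirtytime_expire_seconds=3600\n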
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#why_does_everything_become_clunky_for_a_bit_if_i_have_tuned_all_of_the_above_settings_especially_if_i_try_to_do_something_on_the_system_that_isnt_hydrus","title":"Why does everything become clunky for a bit if I have tuned all of the above settings? (especially if I try to do something on the system that isn't hydrus)","text":"The kernel launches a process called
kswapd
to swap and reclaim memory pages, after hydrus has used pages they need to be returned to the OS (unless fragmentation is preventing this). The OS needs to scan for pages allocated to programs which are not in use, it doens't do this all the time because holding the required locks would have a serious performance impact. The behaviour ofkswapd
is governed by several important values. If you are using a classic system with a reasonably sized amount of memory and a swapfile, you should tune these. If you are using memory compression (or should be using memory compression because you have a cheap system), read this whole document for info specific to that configuration. -
vm.watermark_scale_factor
This factor controls the aggressiveness of kswapd. It defines the amount of memory left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep. The unit is in fractions of 10,000. The default value of 10 means the distances between watermarks are 0.1% of the available memory in the node/system. The maximum value is 1000, or 10% of memory. A high rate of threads entering direct reclaim (allocstall) or kswapd going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate that the number of free pages kswapd maintains for latency reasons is too small for the allocation bursts occurring in the system. This knob can then be used to tune kswapd aggressiveness accordingly. -
vm.watermark_boost_factor
: If memory fragmentation is high, raise the scale factor to look for reclaimable/swappable pages more aggressively.
I like to keep
watermark_scale_factor
at 70 (70/10,000)=0.7%, so kswapd will run until at least 0.7% of system memory has been reclaimed. e.g. with 32GiB of memory (real and virt), it will try to keep at least 0.224GiB immediately available. vm.dirty_ratio
: The absolute maximum amount of un-synced memory (as a percentage of available memory) that the system will buffer before blocking writing processes. This protects you against OOM, but does not keep your system responsive.
Note: A default installation of Ubuntu sets this way too high (60%), as it does not expect your workload to just be hammering possibly slow disks with written pages. Even with memory overcommitting this can make you OOM, because you will run out of real memory before the system pauses the program that is writing so hard. A more reasonable value is 10 (10%).
-
vm.dirty_background_ratio
: How many unsynced pages can exist before the system starts committing them in the background. If this is set too low the system will constantly spend cycles trying to write out dirty pages. If it is set too high it will be way too lazy. I like to set it to 8. -
vm.vfs_cache_pressure
The tendency for the kernel to reclaim I/O cache for files and directories. This is less important than the other values, but hydrus opens and closes lots of file handles, so you may want to boost it a bit higher than default. Default=100; set to 110 to bias the kernel into reclaiming I/O pages over keeping them at a \"fair rate\" compared to other pages. Hydrus tends to write a lot of files and then ignore them for a long time, so it's a good idea to prefer freeing pages for infrequent I/O. Note: Increasing vfs_cache_pressure
significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000
, it will look for ten times more freeable objects than there are.
An example
/etc/sysctl.conf
section for virtual memory settings.
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#virtual_memory_under_linux_5_phenomenal_cosmic_power_itty_bitty_living_space","title":"Virtual Memory Under Linux 5: Phenomenal Cosmic Power; Itty bitty living space","text":"########\n# virtual memory\n########\n\n#1 always overcommit, prevents the kernel from using a heuristic to decide that a process is bad for asking for a lot of memory at once and killing it.\n#https://www.kernel.org/doc/Documentation/vm/overcommit-accounting\nvm.overcommit_memory=1\n\n#force linux to reclaim pages if under a gigabyte \n#is available so large chunk allocates don't fire off the OOM killer\nvm.min_free_kbytes = 1153434\n\n#Start freeing up pages that have been written but which are in open files, after 2 hours.\n#Allows pages in long lived files to be reclaimed\nvm.dirtytime_expire_seconds = 7200\n\n#Have kswapd try to reclaim .7% = 70/10000 of pages before returning to sleep\n#This increases responsiveness by reclaiming a larger portion of pages in low memory condition\n#So that the next time you make a large allocation the kernel doesn't have to stall and look for pages to free immediately.\nvm.watermark_scale_factor=70\n\n#Have the kernel prefer to reclaim I/O pages at 110% of the rate at which it frees other pages.\n#Don't set this value much over 100 or the kernel will spend all its time reclaiming I/O pages\nvm.vfs_cache_pressure=110\n
Are you trying to run hydrus on a 200 dollar miniPC? This is surprisingly doable, but you will need to really understand what you are tuning.
To start, let's explain memory tiers. As memory moves further away from the CPU it becomes slower. Memory close to the CPU is volatile, which means if you remove power from it, it disappears forever. Conversely, disk is called non-volatile memory, or persistent storage. We want to get files written to non-volatile storage, and we don't want to have to compete to read non-volatile storage; we would also prefer not to have to compete for writing, but this is harder.
The most straightforward way of doing this is to separate where hydrus writes its SQLite database (index) files from where it writes the imported files. But we can make a more flexible setup that will also keep our system responsive; we just need to make sure that the system writes to the fastest possible place first. So let's illustrate the options.
graph\n direction LR\n CPU-->RAM;\n RAM-->ZRAM;\n ZRAM-->SSD;\n SSD-->HDD;\n subgraph Non-Volatile\n SSD;\n HDD;\n end
- RAM: Information must be in RAM for it to be operated on
- ZRAM: A compressed area of RAM that cannot be directly accessed. Like a zip file but in memory. Or for the more technical, like a compressed ramdisk.
- SSD: Fast non-volatile storage, good for random access, about 100-1000x slower than RAM.
- HDD: Slow non-volatile storage, good for random access. About 10000x slower than RAM.
- Tape (not shown): Slow archival storage or backup. Surprisingly fast actually, but can only be accessed sequentially.
The objective is to make the most of our limited hardware, so we definitely want to go through zram first. Depending on your configuration you might have a bulk storage (NAS) downstream that you can write the files to. If all of your storage is in the same tower as the one running hydrus, then make sure the SQLite .db files are on an SSD volume.
Next you should enable ZRAM devices (not to be confused with zswap). A ZRAM device is a compressed swapfile that lives in RAM.
ZRAM can drastically improve performance and effective RAM capacity. Experimentally, a 1.7GB partition usually shrinks to around 740MiB. Depending on your system, ZRAM may generate several partitions. The author asked for 4x2GiB=8GiB of partitions, hence the cited ratio.
ZRAM must be created every boot, as RAM-disks are lost when power is removed, so install a zram generator as part of your startup process. If you still do not have enough swap, you can also create a swapfile. ZRAM can be configured to use a partition as fallback, but not a file; however, you can enable a standard swapfile as described in the prior section. ZRAM generators usually create ZRAM partitions with the highest priority (lowest priority number), so ZRAM will fill up first, before normal disk swapping.
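Exactly how you get a zram device depends on your distro; many ship a generator service (for example systemd's zram-generator) that you just install and configure. As an illustration of what such a device amounts to, here is a rough manual sketch using the zram kernel module directly--the size and compression algorithm are only examples, and your kernel may not offer zstd:
sudo modprobe zram\necho zstd | sudo tee /sys/block/zram0/comp_algorithm\necho 8G | sudo tee /sys/block/zram0/disksize\nsudo mkswap /dev/zram0\nsudo swapon -p 100 /dev/zram0 #high priority so it fills before any disk swap\n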
To check your swap configuration
swapon #no argument\ncat /proc/swaps\n
To make maximum use of your swap make sure to SET THE FOLLOWING VM SETTINGS
#disable IO clustering we are writing to memory which is super fast\n#IF YOU DO NOT DO THIS YOUR SYSTEM WILL HITCH as it tries to lock multiple RAM pages. This would be desirable on non-volatile storage but is actually bad on RAM.\nvm.page-cluster=0\n\n#Tell the system that it costs almost as much to swap as to write out dirty pages. But bias it very slightly to completing writes. This is ideal since hydrus tends to hammer the system with writes, and we want to use ZRAM to eat spikes, but also want the system to slightly prefer writing dirty pages\nvm.swappiness=99\n
The above is good for most users. If, however, you also need to speed up your storage due to a high number of applications on your network using it, you may wish to install a cache, provided you have at least one or two available SSD slots and the writing pattern is many small random writes.
You should never create a write cache without knowing what you are doing. You need two SSDs to cross-check each other, and ideally resilient server SSDs with large capacitors that ensure all content is always written. If you go with a commercial storage solution they will probably check this already, and give you a nice interface for just inserting and assigning SSD cache.
You can also create a cache manually with the Logical Volume Manager (LVM). If you do this you can group storage volumes together. In particular you can put a read or write cache on an SSD in front of a slower HDD.
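If you do go down the LVM road, the rough shape of it is a cache pool on the SSD attached to the slow logical volume. Everything in this sketch (the volume group and LV names, the device, the sizes) is hypothetical, so read the lvmcache documentation before copying anything:
sudo lvcreate --type cache-pool -L 64G -n media_cache vg0 /dev/nvme0n1p2\nsudo lvconvert --type cache --cachepool vg0/media_cache --cachemode writethrough vg0/media\n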
"},{"location":"PTR.html","title":"PTR for Dummies","text":"or Myths and facts about the Public Tag Repository
"},{"location":"PTR.html#what_is_the_ptr","title":"What is the PTR?","text":"Short for Public Tag Repository, a now community managed repository of tags. Locally it acts as a tag service, just like
my tags
. At the time of writing 54 million files have tags on it. The PTR only store the sha256 hash and tag mappings of a file, not the files themselves or any non-tag meta data. In other words: If you do not see it in the tag list then it is not stored.Most of the things in this document also applies to self-hosted servers, except for tag guidelines.
"},{"location":"PTR.html#connecting_to_the_ptr","title":"Connecting to the PTR","text":"The easiest method is to use the built in function, found under
help -> add the public tag repository
. For adding it manually, if you so desire, read the Hydrus help document on access keys.Once you are connected, Hydrus will proceed to download and then process the update files. The progress of this can be seen under
services -> review services -> remote -> tag repositories -> public tag repository
. Here you can view its status, your account (the default account is a shared public account. Currently only janitors and the administrator have personal accounts), tag status, and how synced you are. Being behind on the sync by a certain amount makes you unable to push tags and petitions until you are caught up again.QuickSync 2
If you are starting out with a completely fresh client (i.e. you have not imported any files yet), you can instead download a fully pre-synced client here (overview). Though a little out of date, it will nonetheless save processing time. Some settings may differ from the defaults of an official installation.
"},{"location":"PTR.html#how_does_it_work","title":"How does it work?","text":"For something to end up on the PTR it has to be pushed there. Tags can either be entered into the tag service manually by the user through the
manage tags
window, or be routed there by a parser when downloading files. See parsing tags. Once tags have been entered into the PTR tag service they are pending until pushed. This is indicated by thepending ()
that will appear betweentags
andhelp
in the menu bar. Here you can choose to either push your changes to the PTR or discard them.
- Deleting (petitioning) tags requires janitor action.
- If a tag has been deleted from a file it will not be added again.
- Currently there is no way for a normal user to re-add a deleted tag. If it gets deleted then it is gone. A janitor can undelete tags manually.
- Adding and petitioning siblings and parents all require janitor action.
- The client always assumes the server approves any petition. If your petition gets rejected you won't know.
When making petitions it is important to remember that janitors are only human. We do not necessarily know everything about every niche. We do not necessarily have the files you are making changes for and we will only see a blank thumbnail if we do not have the file. Explain why you are making a petition. Try and keep the number of files manageable. If a janitor at any point is unsure if the petition is correct they are likely to deny the entire petition rather than risk losing good tags. Some users have pushed changes regarding hundreds of tags over thousands of files at once, but due to disregarding PTR tagging practices or being lazy with justification the petition has been denied entirely. Or they have just been plain wrong, trying to impose frankly stupid tagging methods.
Furthermore, if you are two weeks out of sync with PTR you are unable to push additions or deletions until you're back within the threshold.
Q: Does this automagically tag my files?
A: No. Until we get machine learning based auto-tagging nothing is truly automatic. All tags on the PTR were uploaded by another user, so if nobody uploaded tags associated with the hash of your file it won't have any tags in the PTR.
Q: How good is the PTR at tagging [insert file format or thing from site here]?
A: That depends largely on if there's a scrapable database of tags for whatever you're asking about. Anything that comes from a booru or site that supports tags is fairly likely to have something on the PTR. Original content on some obscure chan-style imageboard is less so.
Q: Help! My files don't have any tags! What do!?
A: As stated above, some things are just very likely to not have any tags. It is also possible that the files have been altered by whichever service you downloaded from. Imgur, Reddit, Discord, and many other sites and services recompress images to save space, which might give a file a different hash even if it looks indistinguishable from the original. Use one of the IQDB lookup programs linked in Cuddle's wiki.
Q: Why is my database so big!? This can't be right.
A: It is working as intended. The size is because you are literally downloading and processing the entire tag database and history of the PTR. It is done this way to ensure redundancy and privacy. Redundancy because anybody with an up-to-date PTR sync can just start their own. Privacy because nobody can tell what files you have, since you are downloading the tags for everything the PTR has.
Q: Does that mean I can't do anything about the size?
A: Correct. There are some plans to crunch the size through a few methods, but there are a lot of other far more requested features being, well, requested. Speaking crassly, if you are bothered by the size requirement of the PTR you probably don't have a big enough library to really benefit and would be better off just using the IQDB script.
"},{"location":"PTR.html#janitors","title":"Janitors","text":"Janitors are the people that review petitions. You can meet us at the community Discord to ask questions or see us bitch about some of the silly stuff boorus and users cause to end up in the PTR.
"},{"location":"PTR.html#tag_guidelines","title":"Tag Guidelines","text":"These are a mix of standard practice used by various boorus and changes made by Hydrus Developer and PTR users, ratified by the janitors that actually have to manage all of this. The \"full\" document is viewable at Cuddle's git repo. See Hydrus Developer's thoughts on a public tagging schema.
If you are looking to help out by tagging low tag-count files, remember to keep the tags objective, and start simple by, for example, adding the characters/persons and the big obvious things in the image. Tagging every little thing and detail is a sure path to burnout. If you are looking to petition removal of tags then it is preferable to sibling common misspellings, underscores, and defunct tags rather than deleting them outright. The exception is for ambiguous tags, where it is better to delete and replace with a less ambiguous tag. When deleting tags that don't belong in the image it can be helpful if you include a short description as to why. It's also helpful if you sanitise downloaded tags from sites with tagged galleries before pushing them to the PTR. For example Pixiv, where you can have a gallery of multiple images, each containing one character, and all of the characters being tagged. Consequently all images in that gallery will have all of the character tags despite no image having more than one character.
"},{"location":"PTR.html#siblings_and_parents","title":"Siblings and parents","text":"When making siblings, go for the closest less-bad tag. Example:
bad_tag
->bad tag
, rather than going for what the top level sibling might be. This creates less potential future work in case standards change and makes it so your request is less likely to be denied by a janitor not being entirely certain that what you're asking is right. Be careful about creating siblings for potentially ambiguous tags. Isjames bond
supposed to becharacter:james bond
or is itseries:james bond
? This is a bit of a bad example due to having the case of the character always belonging to the series, so you can safely sibling it toseries:james bond
since all instances of the character will also have the series, but not all instances of the series will have the character. So let us look at another example: how aboutwool
? Is it the material harvested from sheep, or is it the Malaysian artist that likes to draw Touhou? In doubtful cases it's better to leave it as is, petition the tag for deletion if it's incorrect and add the correct tag.When making parents, make sure it's an always factually correct relationship.
"},{"location":"PTR.html#namespaces","title":"Namespaces","text":"character:james bond
always belongs toseries:james bond
. Butcharacter:james bond
is not alwaysperson:pierce brosnan
. Common examples of not-always true relationships: gender (genderbending), species (furrynisation/humanisation/anthropomorphism), hair colour, eye colour, and other mutable traits.creator:
Used for the creator of the tagged piece of media. Hydrus being primarily used for images it will often be the artist that drew the image. Other potential examples are the author of a book or musician for a song.character:
Refers to characters. James Bond is a character.person:
Refers to real persons. Pierce Brosnan is a person.series:
Used for series. James Bond is a series tag and so is GoldenEye. Due to usage being different on some boorus chance is that you will also see things like Absolut Vodka and other brands in it.photoset:
Used for photosets. Primarily seen for content from idols, cosplayers, and gravure idols.studio:
Is used for the entity that facilitated the production of the file or what's in it. Eon Productions for the James Bond movies.species:
Species of the depicted characters/people/animals. Somewhat controversial for being needlessly detailed, some janitors not liking the namespace at all. Primarily used for furry content.title:
The title of the file. One of the tags Hydrus uses for various purposes such as sorting and collecting. Somewhat tainted by rampant Reddit parsers.medium:
Used for tags about the image and how it's made. Photography, water painting, napkin sketch as a few examples. White background, simple background, checkered background as a few others. What you see about the image.meta:
This namespace is used for information that isn't visible in the image itself or where you might need to go to the source. Some examples include: third-party edit, paid reward (patreon/enty/gumroad/fantia/fanbox), translated, commentary, and such. What you know about the image.Namespaces not listed above are not \"supported\" by the janitors and are liable to get siblinged out, removed, and/or mocked if judged being bad and annoying enough to justify the work. Do not take this to mean that all un-listed namespaces are bad, some are created and used by parsers to indicate where an image came from which can be helpful if somebody else wants to fetch the original or check source tags against the PTR tags. But do exercise some care in what you put on the PTR if you use custom namespaces. Recently
"},{"location":"Understanding_Database_Synchronization.html","title":"Understanding Database Synchronization Options","text":"clothing:
was removed due to being disliked, no booru using it, and the person(s) pushing for it seeming to have disappeared, leaving a less-than-finished mess behind. It was also rife with lossy siblings and things that just plain don't belong with clothing, such as clothing:brown hair
.Tuning your database synchronization using the
"},{"location":"Understanding_Database_Synchronization.html#key_points","title":"Key Points","text":"--db_synchronous_override=0
launch argument can make Hydrus significantly faster with some caveats.- This is a tutorial for advanced users who have read and understood this document and the risk/recovery procedure.
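For reference, the launch itself is just the normal client invocation with the extra argument; a sketch for a from-source install (the same argument works with the built executables):
source venv/bin/activate\npython hydrus_client.py --db_synchronous_override=0\n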
- It is nearly always safe to use
--db_synchronous_override=1
on any modern filesystem and this is the default. - It is always more expensive to access the disk than doing things in memory. SSDs are 10-100x as slow as memory, and HDDs are 1000-10000x as slow as memory.
- If you turn synchronization to
0
you are gambling, but it is a safe gamble if you have a backup and know exactly what you are doing. - After running with synchronization set to zero you must either:
- Exit hydrus normally and let the OS flush disk caches (either by letting the system run/\"idle\" for a while, running
sync
on *NIX systems, or normal shutdown), or - Restore the sqlite database files backup if the OS shutdown abnormally.
- Because of the potential for a lot of outstanding writes when using
synchronous=0
, other I/O on your system will slow down as the pending writes are interleaved. Normal shutdown may also take abnormally long because the system is syncing these pending writes, but you must allow it to take its time as explained in the section below.
Note: In historical versions of hydrus (
"},{"location":"Understanding_Database_Synchronization.html#the_secret_sauce","title":"The Secret Sauce","text":"synchronous=2
), performance was terrible because hydrus would aggressively (it was arguably somewhat paranoid) write changes to disk.
synced
\" the disk cache, the changes to files are not actually in permanent storage. One purpose of a normal shutdown of the operating system is to make sure all disk caches have been flushed and synced. A program can also request that a file it has just written to be flushed or synced, and it will wait until that is done before continuing.When not in synchronous 0 mode, the database engine syncs at regular intervals to make sure data has been written. - Setting synchronous to 0 is generally safe if and only if the system also shuts down normally, allowing any of these pending writes to be flushed. - The database can back out of partial changes if hydrus crashes even if
"},{"location":"Understanding_Database_Synchronization.html#technical_explanation","title":"Technical Explanation","text":"synchronous=0
, so your database will not go corrupt from hydrus shutting down abnormally, only from the system shutting down abnormally.Programmers are responsible for handling partially written files, but this is tedious for large complex data, so they use a database engine which handles all of this. The database ensures that any partially written data is reversible to a known state (called a rollback).
An existing file may be in 3 possible states:
- Unflushed: Contents is owned by the program writing the file, but control returns immediately to the program instead of waiting for a full write. Content can be transitioned from unflushed to flushed using
fflush(FILE)
.fflush()
is called automatically when a programmer closes a file, or exits the program normally(under most runtimes but not for example in Java). If the program exits abnormally before data is flushed it will be lost when the program crashes. - Flushed: Pending write to permenant storage but memory has been transfered to the operating system. Data will not be lost if the calling program crashes, since the OS promises it will \"eventually\" arrive on disk before returning from
fflush()
. When you \"safely shutdown:, you are instructing the OS among other things to sync the flushed files. If someone decides to read a file before it has been synced the OS will read the contents up until the flush from the flush buffer, and return that instead of what is actually on disk. If the OS crashes due to error or power failure, data that are flushed but not synced will be lost. - Synced: Written to permenant storage. A programmer may request that the contents of the file be synced, or it is done gradually over time to free the OS buffers
To ensure the consistency of the database and rollback when needed, the database engine keeps a journal of what it is doing. Each transaction ends in a
flush
which may be followed by async
. Insynchronous=2
there is a sync after EVERYCOMMIT
, forsynchronous=1
it depends on the journal mode, often enough to maintian consistanc, but not after every commit. The flush ensures that everything written before the flush will occur before the line that indicates the transaction completed. The sync ensures that the entire contents of the transaction has been written to permenant storage before proceeding. The OS is not obligated to write chunks of the database file in the order it recieves them. It only guarantees that if you flush, everything submitted before the flush happens first, and everything submitted after the flush happens next.The sync is what is controlled by the
"},{"location":"Understanding_Database_Synchronization.html#an_example_journal","title":"An example journal","text":"synchronous
switch. Allowing the database to ignore whether sync actually completes is the magic that makessynchronous=0
so dang fast.- Begin Transaction 1
- Write Change 1
- Write Change 2
- Read data
- Write Change 3
- End Transaction 1
Each of these steps are performed in order. Suppose a crash occcured mid writing
- Begin Transaction 1
- Write Change 1
- Write Cha
When the database resumes it will start scanning the journal at step 1. Since it will reach the end without seeing
End Transaction 1
it knows that data was only partially written, and can put the data back in the state before transaction 1 began. This property of a database is called atomicity, in the sense that something atomic is \"indivisible\"; either all of the steps in transaction 1 occur or none of them occur.
"},{"location":"Understanding_Database_Synchronization.html#where_synchronization_comes_in","title":"Where synchronization comes in","text":"Let's revisit the journal, this time with two transactions. Note that the database is syncing on step 8 and thus will have to wait for the OS to write to disk before proceeding, holding up transaction 2, and any other access to the database.
- Begin Transaction 1
- Write Change 1
- Write Change 2
- Read data
- Write Change 3
- FLUSH
- End Transaction 1
- SYNC
- Begin Transaction 2
- Write Change 2
- Write Change 2
- Read data
- Write Change 3
- FLUSH
- End Transaction 2
- SYNC
What happens if we remove step 8 and then die at step 11?
- Begin Transaction 1
- Write Change 1
- Write Change 2
- Read data
- Write Change 3
- FLUSH
- End Transaction 1
- SYNC
- Begin Transaction 2
- Write Change 2
- Write Ch
What if we crash?
End Transaction 1
possibly has not been written to disk. Now not only do we need to repeat transaction 2, we also need to repeat transaction 1. Note that this just increases the amount of repeatable work, and actually is fully recoverable (assuming a file you were downloading didn't cease to exist in the interim).
- Write and sync rollback
- Update database file with changes
- Sync database file
- Remove rollback/update WAL checkpoint
If sqlite crashes, but the OS doesn't that's fine all of this in flight data is in the OS write buffer and the OS will pretend as if it is on disc. But what if We haven't even finished creating a rollback for the changes made in step 1 and step 2 starts partially changing the database file? Then bam power failure. We now can't revert the database because we don't have a complete rollback, but we also can't move forward in time either because we don't have a marker showing the completion of transaction 2. So we are stuck in the middle of an incomplete transaction, and have lost the data necessary to leave either end.
See also: https://www.sqlite.org/atomiccommit.html#section_6_2
Thus if the OS crashes at the exact wrong moment, there is no way to be sure that the journal is correct if syncing was skipped (
synchronous=0
). This means there is no way for you to determine whether the database file is correct after a system crash if you had synchronous 0, and you MUST restore your files from backup as this will be the ONLY WAY to know they are in a known good state.So, setting
"},{"location":"about_docs.html","title":"About These Docs","text":"synchronous=0
gets you a pretty huge speed boost, but you are gambling that everything goes perfectly and will pay the price of a manual restore every time it doesn't.The Hydrus docs are built with MkDocs using the Material for MkDocs theme. The .md files in the
"},{"location":"about_docs.html#local_setup","title":"Local Setup","text":"docs
directory are converted into nice html in thehelp
directory. This is done automatically in the built releases, but if you run from source, you will want to build your own.To see or work on the docs locally, install
mkdocs-material
:The recommended installation method is
pip
:
"},{"location":"about_docs.html#building","title":"Building","text":"pip install mkdocs-material\n
To build the help, run:
In the base hydrus directory (same as themkdocs build -d help\n
mkdocs.yml
file), which will build it into thehelp
directory. You will then be good!Repeat the command and MkDocs will clear out the old directory and rebuild it, so you can fold this into any update script.
"},{"location":"about_docs.html#live_preview","title":"Live Preview","text":"To edit the
docs
directory, you can run the live preview development server with:mkdocs serve \n
Again in the base hydrus directory. It will host the help site at http://127.0.0.1:8000/, and when you change a file, it will automatically rebuild and reload the page in your browser.
"},{"location":"access_keys.html","title":"PTR access keys","text":"The PTR is now run by users with more bandwidth than I had to give, so the bandwidth limits are gone! If you would like to talk with the new management, please check the discord.
A guide and schema for the new PTR is here.
"},{"location":"access_keys.html#first_off","title":"first off","text":"I don't like it when programs I use connect anywhere without asking me, so I have purposely not pre-baked any default repositories into the client. You have to choose to connect yourself. The client will never connect anywhere until you tell it to.
For a long time, I ran the Public Tag Repository myself and was the lone janitor. It grew to 650 million tags, and siblings and parents were just getting complicated, and I no longer had the bandwidth or time it deserved. It is now run by users.
There also used to be just one user account that everyone shared. Everyone was essentially the same Anon, and all uploads were merged to that one ID. As the PTR became more popular, and more sophisticated and automatically generated content was being added, it became increasingly difficult for the janitors to separate good submissions from bad and undo large scale mistakes.
That old shared account is now a 'read-only' account. This account can only download--it cannot upload new tags or siblings/parents. Users who want to upload now generate their own individual accounts, which are still Anon, but separate, which helps janitors approve and deny uploaded petitions more accurately and efficiently.
I recommend using the shared read-only account, below, to start with, but if you decide you would like to upload, making your own account is easy--just click the 'check for automatic account creation' button in services->manage services, and you should be good. You can change your access key on an existing service--you don't need to delete and re-add or anything--and your client should quickly resync and recognise your new permissions.
"},{"location":"access_keys.html#privacy","title":"privacy","text":"I have tried very hard to ensure the PTR respects your privacy. Your account is a very barebones thing--all a server stores is a couple of random hexadecimal texts and which rows of content you uploaded, and even the memory of what you uploaded is deleted after a delay. The server obviously needs to be aware of your IP address to accept your network request, but it forgets it as soon as the job is done. Normal users are never told which accounts submitted any content, so the only privacy implications are against janitors or (more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!) the server owner or anyone else with raw access to the server as it operates or its database files.
Most users should have very few worries about privacy. The general rule is that it is always healthy to use a VPN, but please check here for a full discussion and explanation of the anonymisation routine.
"},{"location":"access_keys.html#ssd","title":"a note on resources","text":"Danger
If your database files are stored on an HDD, or your SSD does not have at least 96GB of free space, do not add the PTR!
The PTR has been operating since 2011 and is now huge, more than two billion mappings! Your client will be downloading and indexing them all, which is currently (2021-06) about 6GB of bandwidth and 50GB of hard drive space. It will take hours of total processing time to catch up on all the years of submissions. Furthermore, because of mechanical drive latency, HDDs are too slow to process all the content in reasonable time. Syncing is only recommended if your hydrus db is on an SSD. It doesn't matter if you store your jpegs and webms and stuff on an external HDD; this is just your actual .db database files (normally in install_dir/db folder). Note also that it is healthier if the work is done in small pieces in the background, either during idle time or shutdown time, rather than trying to do it all at once. Just leave it to download and process on its own--it usually takes a couple of weeks to quietly catch up. If you happen to see it working, it will start as fast as 50,000 rows/s (with some bumps down to 1 rows/s as it commits data), and eventually it will slow, when fully synced, to 100-1,000 rows/s. You'll see tags appear on your files as processing continues, first on older, then all the way up to new files just uploaded a couple days ago. Once you are synced, the daily processing work to stay synced is usually just a few minutes. If you leave your client on all the time in the background, you'll likely never notice it.
"},{"location":"access_keys.html#easy_setup","title":"easy setup","text":"Hit help->add the public tag repository and you will all be set up.
"},{"location":"access_keys.html#manually","title":"manually","text":"Hit services->manage services and click add->hydrus tag repository. You'll get a panel, fill it out like this:
Here's the info so you can copy it:
address: ptr.hydrus.network\n
port: 45871\n
access key: 4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f\n
Note that because this is the public shared key, you can ignore the 'DO NOT SHARE' red text warning.
It is worth checking the 'test address' and 'test access key' buttons just to double-check your firewall and key are all correct. Notice the 'check for automatic account creation' button, for if and when you decide you want to contribute to the PTR.
Then you can check your PTR at any time under services->review services, under the 'remote' tab:
"},{"location":"access_keys.html#quicksync","title":"jump-starting an install","text":"A user kindly manages a store of update files and pre-processed empty client databases. If you want to start a new client that syncs with the PTR (i.e. you have not started this new client and not imported any files yet), this will get you going quicker. This is generally recommended for advanced users or those following a guide, but if you are otherwise interested, please check it out:
https://cuddlebear92.github.io/Quicksync/
"},{"location":"adding_new_downloaders.html","title":"adding new downloaders","text":""},{"location":"adding_new_downloaders.html#anonymous","title":"all downloaders are user-creatable and -shareable","text":"Since the big downloader overhaul, all downloaders can be created, edited, and shared by any user. Creating one from scratch is not simple, and it takes a little technical knowledge, but importing what someone else has created is easy.
Hydrus objects like downloaders can sometimes be shared as data encoded into png files, like this:
This contains all the information needed for a client to add a realbooru tag search entry to the list you select from when you start a new download or subscription.
You can get these pngs from anyone who has experience in the downloader system. An archive is maintained here.
To 'add' the easy-import pngs to your client, hit network->downloaders->import downloaders. A little image-panel will appear onto which you can drag-and-drop these png files. The client will then decode and go through the png, looking for interesting new objects and automatically import and link them up without you having to do any more. The only further input needed on your end is a 'does this look correct?' check right before the actual import, just to make sure there isn't some mistake or other glaring problem.
Objects imported this way will take precedence over existing functionality, so if one of your downloaders breaks due to a site change, importing a fixed png here will overwrite the broken entries and become the new default.
"},{"location":"advanced.html","title":"general clever tricks","text":"this is non-comprehensive
I am always changing and adding little things. The best way to learn is just to look around. If you think a shortcut should probably do something, try it out! If you can't find something, let me know and I'll try to add it!
"},{"location":"advanced.html#advanced_mode","title":"advanced mode","text":"To avoid confusing clutter, several advanced menu items and buttons are hidden by default. When you are comfortable with the program, hit help->advanced mode to reveal them!
"},{"location":"advanced.html#exclude_deleted_files","title":"exclude deleted files","text":"In the client's options is a checkbox to exclude deleted files. It recurs pretty much anywhere you can import, under 'import file options'. If you select this, any file you ever deleted will be excluded from all future remote searches and import operations. This can stop you from importing/downloading and filtering out the same bad files several times over. The default is off. You may wish to have it set one way most of the time, but switch it the other just for one specific import or search.
"},{"location":"advanced.html#ime","title":"inputting non-english lanuages","text":"If you typically use an IME to input Japanese or another non-english language, you may have encountered problems entering into the autocomplete tag entry control in that you need Up/Down/Enter to navigate the IME, but the autocomplete steals those key presses away to navigate the list of results. To fix this, press Insert to temporarily disable the autocomplete's key event capture. The autocomplete text box will change colour to let you know it has released its normal key capture. Use your IME to get the text you want, then hit Insert again to restore the autocomplete to normal behaviour.
"},{"location":"advanced.html#tag_display","title":"tag display","text":"If you do not like a particular tag or namespace, you can easily hide it with tags->manage tag display and search:
This image is out of date, sorry!
You can exclude single tags, as shown above, or entire namespaces (enter the colon, like 'species:'), or all namespaced tags (use ':'), or all unnamespaced tags (''). 'all known tags' will be applied to everything, as well as any repository-specific rules you set.
A blacklist excludes whatever is listed; a whitelist excludes whatever is not listed.
This censorship is local to your client. No one else will experience your changes or know what you have censored.
"},{"location":"advanced.html#importing_with_tags","title":"importing and adding tags at the same time","text":"Add tags before importing on file->import files lets you give tags to the files you import en masse, and intelligently, using regexes that parse filename:
This should be somewhat self-explanatory to anyone familiar with regexes. I hate them, personally, but I recognise they are powerful and exactly the right tool to use in this case. This is a good introduction.
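To make the idea concrete, here is a rough sketch of the regex-to-namespaced-tag step, assuming a made-up filename scheme (the pattern and namespaces below are illustrative only, not hydrus's built-in behaviour):
```python
import re

# hypothetical filename: "Cool Manga - c03 p042 [scanlator].jpg"
PATTERN = re.compile(r'^(?P<title>.+?) - c(?P<chapter>\d+) p(?P<page>\d+)')

def tags_from_filename(filename: str) -> list[str]:
    """Turn a well-formatted filename into namespaced tags."""
    match = PATTERN.match(filename)
    if match is None:
        return []
    return [
        f'series:{match.group("title").lower()}',
        f'chapter:{int(match.group("chapter"))}',
        f'page:{int(match.group("page"))}',
    ]

print(tags_from_filename('Cool Manga - c03 p042 [scanlator].jpg'))
# ['series:cool manga', 'chapter:3', 'page:42']
```
In the actual dialog you set the regex and namespace per rule, but the mapping is the same shape: one capture group, one namespaced tag per file.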
Once you are done, you'll get something neat like this:
Which you can more easily manage by collecting:
Collections have a small icon in the bottom left corner. Selecting them actually selects many files (see the status bar), and performing an action on them (like archiving, uploading) will do so to every file in the collection. Viewing collections fullscreen pages through their contents just like an uncollected search.
Here is a particularly zoomed out view, after importing volume 2:
Importing with tags is great for long-running series with well-formatted filenames, and will save you literally hours of finicky tagging.
"},{"location":"advanced.html#tag_migration","title":"tag migration","text":"Danger
At some point I will write some better help for this system, which is powerful. Be careful with it!
Sometimes, you may wish to move thousands or millions of tags from one place to another. These actions are now collected in one place: services->tag migration.
It proceeds from left to right, reading data from the source and applying it to the destination with a certain action. There are multiple filters available to select which sorts of tag mappings or siblings or parents will be selected from the source. The source and destination can be the same, for instance if you wanted to delete all 'clothing:' tags from a service, you would pull all those tags and then apply the 'delete' action on the same service.
You can import from and export to Hydrus Tag Archives (HTAs), which are external, portable .db files. In this way, you can move millions of tags between two hydrus clients, or share with a friend, or import from an HTA put together from a website scrape.
Tag Migration is a powerful system. Be very careful with it. Do small experiments before starting large jobs, and if you intend to migrate millions of tags, make a backup of your db beforehand, just in case it goes wrong.
This system was once much more simple, but it still had HTA support. If you wish to play around with some HTAs, there are some old user-created ones here.
"},{"location":"advanced.html#shortcuts","title":"custom shortcuts","text":"Once you are comfortable with manually setting tags and ratings, you may be interested in setting some shortcuts to do it quicker. Try hitting file->shortcuts or clicking the keyboard icon on any media viewer window's top hover window.
There are two kinds of shortcuts in the program--reserved, which have fixed names, are undeletable, and are always active in certain contexts (related to their name), and custom, which you create and name and edit and are only active in a media viewer when you want them to. You can redefine some simple shortcut commands, but most importantly, you can create shortcuts for adding/removing a tag or setting/unsetting a rating.
Use the same 'keyboard' icon to set the current and default custom shortcuts.
"},{"location":"advanced.html#finding_duplicates","title":"finding duplicates","text":"system:similar_to lets you run the duplicates processing page's searches manually. You can either insert the hash and hamming distance manually, or you can launch these searches automatically from the thumbnail right-click->find similar files menu. For example:
"},{"location":"advanced.html#file_import_errors","title":"truncated/malformed file import errors","text":"Some files, even though they seem ok in another program, will not import to hydrus. This is usually because they file has some 'truncated' or broken data, probably due to a bad upload or storage at some point in its internet history. While sophisticated external programs can usually patch the error (often rendering the bottom lines of a jpeg as grey, for instance), hydrus is not so clever. Please feel free to send or link me, hydrus developer, to these files, so I can check them out on my end and try to fix support.
If the file is one you particularly care about, the easiest solution is to open it in photoshop or gimp and save it again. Those programs should be clever enough to parse the file's weirdness, and then make a nice clean saved file when it exports. That new file should be importable to hydrus.
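If you would rather script the re-save than open an editor, a rough Pillow sketch like this usually produces a clean copy (Pillow is an assumption here, not part of hydrus; adjust the format and quality to taste):
```python
from PIL import Image, ImageFile

# let Pillow tolerate files with missing trailing data
ImageFile.LOAD_TRUNCATED_IMAGES = True

def resave_clean_copy(source_path: str, dest_path: str) -> None:
    """Load a possibly-truncated image and write a fresh, well-formed file."""
    with Image.open(source_path) as img:
        img.load()  # force a full decode so problems surface here, not on save
        img.convert('RGB').save(dest_path, 'JPEG', quality=95)

resave_clean_copy('broken_download.jpg', 'clean_copy.jpg')
```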
"},{"location":"advanced.html#password","title":"setting a password","text":"the client offers a very simple password system, enough to keep out noobs. You can set it at database->set a password. It will thereafter ask for the password every time you start the program, and will not open without it. However none of the database is encrypted, and someone with enough enthusiasm or a tool and access to your computer can still very easily see what files you have. The password is mainly to stop idle snoops checking your images if you are away from your machine.
"},{"location":"advanced_multiple_local_file_services.html","title":"multiple local file services","text":"The client lets you store your files in different overlapping partitions. This can help management workflows and privacy.
"},{"location":"advanced_multiple_local_file_services.html#the_problem","title":"what's the problem?","text":"Most of us end up storing all sorts of things in our clients, often from different parts of our lives. With everything in the same 'my files' domain, some personal photos might be sitting right beside nsfw content, a bunch of wallpapers, and thousands of comic pages. Different processing jobs, like 'go through those old vidya screenshots I imported' and 'filter my subscription files' and 'load up my favourite pictures of babes' all operate on the same gigantic list of files and must be defined through careful queries of tags, ratings, and other file metadata to separate what you want from what you don't.
The problem is aggravated the larger your client grows. When you are trying to sift the 500 art reference images out of 850,000 random internet files from the last ten years, it can be difficult getting good tag counts or just generally browsing around without stumbling across other content. This particularly matters when you are typing in search tags, since the tag you want, 'anatomy drawing guide', is going to come with thousands of others, starting 'a...', 'an...', and 'ana...' as you type. If someone is looking over your shoulder as you load up the images, you want to preserve your privacy.
Wouldn't it be nice if you could break your collection into separate areas?
"},{"location":"advanced_multiple_local_file_services.html#file_domains","title":"multiple file domains","text":"tl;dr: you can have more than one 'my files', add them in 'manage services'.
A file domain (or file service) in the hydrus context, is, very simply, a list of files. There is a bit of extra metadata like the time each file was imported to the domain, and a ton of behind the scenes calculation to accelerate searching and aggregate autocomplete tag counts and so on, but overall, when you search in 'my files', you are telling the client \"find all the files in this list that have tag x, y, z on any tag domain\". If you switch to searching 'trash', you are then searching that list of trashed files.
A search page's tag domain is similar. Normally, you will be set to 'all known tags', which is basically the union of all your tag services, but if you need to, you can search just 'my tags' or 'PTR', which will make your search \"find all the files in my files that have tag x, y, z on my tags\". You are setting up an intersection of a file and a tag domain.
Changing the tag domain to 'PTR' or 'all known tags' would make for a different blue circle with a different intersection of search results ('PTR' probably has a lot more 'pretty dress', although maybe not for your files, and 'all known tags', being the union of all the blue circles, will make the same or larger intersection).
This idea of dynamically intersecting domains is very important to hydrus. Each service stands on its own, and the 'my tags' domain is not linked to 'my files'. It does not care where its tagged files are. When you delete a file, no tags are changed. But when you delete a file, the 'file domain' circle will shrink, and that may change the search results in the intersection.
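If it helps to see that set logic spelled out, here is a toy sketch of the file-domain/tag-domain intersection, with made-up hashes and a made-up tag:
```python
# toy data: file hashes (shortened, invented) in each file domain
my_files = {'a1b2', 'c3d4', 'e5f6', '0789'}

# tag domain: which files each tag service has tagged 'pretty dress'
my_tags_pretty_dress = {'c3d4', 'dead'}
ptr_pretty_dress = {'c3d4', 'e5f6', 'ffff'}

# search 'pretty dress' on my files / all known tags
all_known_tags = my_tags_pretty_dress | ptr_pretty_dress
print(my_files & all_known_tags)   # {'c3d4', 'e5f6'}

# deleting a file shrinks the file circle, and the intersection follows
my_files.discard('e5f6')
print(my_files & all_known_tags)   # {'c3d4'}
```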
With multiple local file services, you can create new file lists beyond 'my files', letting you make different red circles. You can move and copy files between your local file domains to make new sub-collections and search them separately for a very effective filter.
You can add and remove them under services->manage services:
"},{"location":"advanced_multiple_local_file_services.html#sfw","title":"what does this actually mean?","text":"I think the best simple idea for most regular users is to try a sfw/nsfw split. Make a new 'sfw' local file domain and start adding some images to it. You might eventualy plan to send all your sfw images there, or just your 'IRL' stuff like family photos, but it will be a separate area for whitelisted safe content you are definitely happy for others to glance at.
Search up some appropriate images in your collection and then add them to 'sfw':
This 'add' command is a copy. The files stay in 'my files', but they also go to 'sfw'. You still only have one file on your hard drive, but the database has its identifier in both file lists. Now make a new search page, switch it to 'sfw', and try typing in a search.
The tag results are limited to the files we added to 'sfw'. Nothing from 'my files' bleeds over. The same is true of a file search. Note the times the file was added to 'my files' and 'sfw' are both tracked.
Also note that these files now have two 'delete' commands. You will be presented with more complicated delete and undelete dialogs for files in multiple services. Files only end up in the trash when they are no longer in any local file domain.
You can be happy that any search in this new domain--for tags or files--is not going to provide any unexpected surprises. You can also do 'system:everything', 'system:limit=64' for a random sample, or any other simple search predicate for browsing, and the search should run fast and safe.
If you want to try multiple local file services out, I recommend this split to start off. If you don't like it, you can delete 'sfw' later with no harm done.
Note
While 'add to y' copies the files, 'move from x to y' deletes the files from the original location. They get a delete timestamp (\"deleted from my files 5 minutes ago\"), and they can be undeleted or 'added' back, and they will get their old import timestamp back.
"},{"location":"advanced_multiple_local_file_services.html#using_it","title":"using it","text":"The main way to add and move files around is the thumbnail/media viewer right-click menu.
You can make shortcuts for the add/move operations too. Check file->shortcuts and then the 'media actions' set.
In the future, I expect to have more ways to move files around, particularly integration into the archive/delete filter, and ideally a 'file migration' system that will allow larger operations such as 'add all the files in search x to place y'.
I also expect to write a system to easily merge clients together. Several users already run several different clients to get their 'my files' separation (e.g. a sfw client and a nsfw client), and now that we have this tech supported in one client, it makes a lot of efficiency sense to merge them together.
Note that when you select a file domain, you can select 'multiple locations'. This provides the union of whichever domains you like. Tag counts will be correct but imprecise, often something like 'blonde hair (2-5)', meaning 'between two and five files', due to the complexity of quickly counting within these complicated domains.
As soon as you add another local file service, you will also see an 'all my files' service listed in the file domain selector. This is a virtual service that provides a very efficient and accurate search space of the union of all your local file domains.
This whole system is new. I will keep working on it, including better 'at a glance' indications of which files are where (current thoughts are custom thumbnail border colours and little indicator icons). Let me know how you get on with it!
"},{"location":"advanced_multiple_local_file_services.html#meta_file_domains","title":"advanced: a word on the meta file domains","text":"If you are in help->advanced mode, your file search file domain selectors will see 'all known files'. This domain is similar to 'all known tags', but it is not useful for normal browsing. It represents not filtering your tag services by any file list, fetching all tagged file results regardless of what your client knows about them.
If you search 'all known files'/'PTR', you can search all the files the PTR knows about, the vast majority of which you will likely never import. The client will show these files with a default hydrus thumbnail and offer very limited information about them. For file searches, this search domain is only useful for debug and janitorial purposes. You cannot combine 'all known files' with 'all known tags'. It also has limited sibling/parent support.
You can search for deleted files under 'multiple domains' too. These may or may not still be in your client, so they might get the hydrus icon again. You won't need to do this much, but it can be super useful for some maintenance operations like 'I know I deleted this file by accident, what was its URL so I can find it again?'.
Another service is 'all local files'. This is a larger version of 'all my files'. It essentially means 'all the files on your hard disk', which strictly means the union of all the files in your local file domains ('my files' and any others you create, i.e. the 'all my files' domain), 'repository updates' (which stores update files for hydrus repository sync), and 'trash'. This search can be useful for some advanced maintenance jobs.
If you select 'repository updates' specifically, you can inspect this advanced domain, but I recommend you not touch it! Otherwise, if you search 'all local files', repository files are usually hidden from view.
Your client looks a bit like this:
graph TB\n A[all local files] --- B[repository updates]\n A --- C[all my files]\n C --- D[local file domains]\n A --- E[trash]
Repository files, your media, and the trash are actually mutually exclusive. When a file is imported, it is added to 'all local files' and either repository updates or 'all my files' and one or more local file domains. When it is deleted from all of those, it is taken from 'all my files' and moved to trash. When trashed files are cleared, the files are removed from 'trash' and then 'all local files' and thus your hard disk.
"},{"location":"advanced_multiple_local_file_services.html#advanced","title":"more advanced usage","text":"Warning
Careful! It is easy to construct a massively overcomplicated Mind Palace here that won't actually help you due to the weight of overhead. If you want to categorise things, tags are generally better. But if you do want strict search separations for speed, workflow, or privacy, try this out.
If you put your files through several layers of processing, such as inbox/archive->tags->rating, it might be helpful to create different file domains for each step. I have seen a couple of proposals like this that I think make sense:
graph LR\n A[inbox] --> B[sfw processing]\n A --> C[nsfw processing]\n B --> D[sfw archive]\n C --> E[nsfw archive]
Where the idea would be to make the 'is this sfw/nsfw?' choice early, probably at the same time as archive/delete, and splitting files off to either side before doing tagging and rating. I expect to expand the 'archive/delete' filter to support more actions soon to help make these workflows easy.
File Import Options allows you to specify which service it will import to. You can even import to multiple, although that is probably a bit much. If your inbox filters are overwhelming you--or each other--you might like to have more than one 'landing zone' for your files:
graph LR\n A[subscription and gallery inbox] --> B[archive]\n B --- C[sfw]\n D[watcher inbox] --> B\n E[hard drive inbox] --> B\n F[that zip of cool architecture photos] --> C
Some users have floated the idea of storing your archive on one drive and the inbox on another. This makes a lot of sense for network storage situations--the new inbox could be on a local disk, but the less-accessed archive on cheap network storage. File domains would be a great way to manage this in future, turning the workflow into nice storage commands.
Another likely use of this in future is in the Client API, when sharing with others. If you were to put the files you wanted to share in a file domain, and the Client API were set up to search just on that domain, this would guarantee great privacy. I am still thinking about this, and it may ultimately end up just being something that works that way behind the scenes.
"},{"location":"advanced_parents.html","title":"tag parents","text":"graph LR\n A[inbox] --> B[19th century fishman conspiracy theory evidence]\n A --> C[the mlp x sonic hyperplex]\n A --> D[extremely detailed drawings of hands and feet]\n A --> E[normal stuff]\n E --- F[share with dave]
Tag parents let you automatically add a particular tag every time another tag is added. The relationship will also apply retroactively.
"},{"location":"advanced_parents.html#the_problem","title":"what's the problem?","text":"Tags often fall into certain heirarchies. Certain tags always imply other tags, and it is annoying and time-consuming to type them all out individually every time.
As a basic example, a car is a vehicle. It is a subset. Any time you see a car, you also see a vehicle. Similarly, a rifle is a firearm, face tattoo implies tattoo, and species:pikachu implies species:pok\u00e9mon, which also implies series:pok\u00e9mon.
Another way of thinking about this is considering what you would expect to see when you search these terms. If you search vehicle, you would expect the result to include all cars. If you search series:league of legends, you would expect to see all instances of character:ahri (even if, on rare occasion, she were just appearing in cameo or in a crossover).
For hydrus terms, character x is in series y is a common relationship, as is costume x is of character y:
graph TB\n C[series:metroid] --- B[character:samus aran] --- A[character:zero suit samus]
In this instance, anything with character:zero suit samus would also have character:samus aran. Anything with character:samus aran (and thus anything with character:zero suit samus) would have series:metroid. Remember that the reverse is not true. Samus comes inextricably from Metroid, but not everything Metroid is Samus (e.g. a picture of just Ridley).
Even a small slice of these relationships can get complicated:
graph TB\n A[studio:blizzard entertainment]\n A --- B[series:overwatch]\n B --- B1[character:dr. angela 'mercy' ziegler]\n B1 --- B1b[character:pink mercy]\n B1 --- B1c[character:witch mercy]\n B --- B2[character:hana 'd.va' song]\n B2 --- B2b[\"character:d.va (gremlin)\"]\n A --- C[series:world of warcraft]\n C --- C1[character:jaina proudmoore]\n C1 --- C1a[character:dreadlord jaina]\n C --- C2[character:sylvanas windrunner]
Some franchises are bananas:
Also, unlike siblings, which as we previously saw are n->1, some tags have more than one implication (n->n):
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n C[swimsuit] --- B
adjusting swimsuit implies both a swimsuit and adjusting clothes. Consider how adjusting bikini might fit on this chart--perhaps this:
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n A --- E[adjusting bikini]\n C[swimsuit] --- B\n F[bikini] --- E\n D[swimwear] --- C\n D --- F
Note this is not a loop--like with siblings, loops are not allowed--this is a family tree with three 'generations'. adjusting bikini is a child to both bikini and adjusting clothes, and bikini is a child to the new swimwear, which is also a parent to swimsuit. adjusting bikini and adjusting swimsuit are both grandchildren to swimwear.
This can obviously get as complicated and over-engineered as you like, but be careful of being too confident. Reasonable people disagree on what is 'clearly' a parent or sibling, or what is an excessive level of detail (e.g. person:scarlett johansson may be gender:female, if you think that useful, but species:human, species:mammal, and species:animal may be going a little far). Beyond its own intellectual neatness, ask yourself the purpose of what you are creating.
Of course you can create any sort of parent tags on your local tags or your own tag repositories, but this sort of thing can easily lead to arguments between reasonable people on a shared server like the PTR.
Just like with normal tags, try not to create anything 'perfect' or stray away from what you actually search with, as it usually ends up wasting time. Act from need, not toward purpose.
"},{"location":"advanced_parents.html#tag_parents","title":"tag parents","text":"Let's define the child-parent relationship 'C->P' as saying that tag P is the semantic superset/superclass of tag C. All files that have C should also have P, without exception.
Any file that has C should appear to have P. Any search for P will include all of C implicitly.
Tags can have multiple parents, and multiple tags have the same parent. Loops are not allowed.
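As a rough mental model, the parents a tag implies are just the transitive closure of the C->P pairs; a toy sketch with hypothetical pairs (an illustration of the idea, not the client's actual implementation):
```python
# hypothetical child -> parents pairs
PARENTS = {
    'character:zero suit samus': {'character:samus aran'},
    'character:samus aran': {'series:metroid'},
}

def implied_tags(tag: str) -> set[str]:
    """Walk child->parent pairs and collect everything the tag implies."""
    seen: set[str] = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(implied_tags('character:zero suit samus'))
# {'character:samus aran', 'series:metroid'}
```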
Note
In hydrus, tag parents are virtual. P is not actually added to every file by C, it just appears as if it is. When you look at a file in manage tags, you will see the implication, just like you see how tags will be renamed by siblings, but you won't see the parent unless it actually happens to also be there as a 'hard' tag. If you remove a C->P parent relationship, all the implied P tags will disappear!
It also takes a bunch of CPU to figure this stuff out. Please bear with this system, sometimes it can take time.
"},{"location":"advanced_parents.html#how_to_do_it","title":"how you do it","text":"Go to tags->manage tag parents:
Which looks and works just like the manage tag siblings dialog.
Note that when you hit ok, the client will look up all the files with all your added tag Cs and retroactively apply/pend the respective tag Ps if needed. This could mean thousands of tags!
Once you have some relationships added, the parents and grandparents will show indented anywhere you 'write' tags, such as the manage tags dialog:
"},{"location":"advanced_parents.html#remote_parents","title":"remote parents","text":"Whenever you add or remove a tag parent pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that parent pair. If it is denied, only you will see it.
"},{"location":"advanced_parents.html#parent_favourites","title":"parent 'favourites'","text":"As you use the client, you will likely make several processing workflows to archive/delete your different sorts of imports. You don't always want to go through things randomly--you might want to do some big videos for a bit, or focus on a particular character. A common search page is something like
[system:inbox, creator:blah, limit:256]
, which will show a sample of a creator in your inbox, so you can process just that creator. This is easy to set up and save in your favourite searches and quick to run, so you can load it up, do some archive/delete, and then dismiss it without too much hassle.But what happens if you want to search for multiple creators? You might be tempted to make a large OR search predicate, like
creator:aaa OR creator:bbb OR creator:ccc OR creator:ddd
, of all your favourite creators so you can process them together as a 'premium' group. But if you want to add or remove a creator from that long OR, it can be cumbersome. And OR searches can just run slow sometimes. One answer is to use the new tag parents tools to apply a 'favourite' parent on all the artists and then search for that favourite.Let's assume you want to search bunch of 'creator' tags on the PTR. What you will do is:
- Create a new 'local tag service' in manage services called 'my parent favourites'. This will hold our subjective parents without uploading anything to the PTR.
- Go to tags->manage where tag siblings and parents apply and add 'my parent favourites' as the top priority for parents, leaving 'PTR' as second priority.
- Under tags->manage tag parents, on your 'my parent favourites' service, add:
creator:aaa->favourite:aesthetic art
creator:bbb->favourite:aesthetic art
creator:ccc->favourite:aesthetic art
creator:ddd->favourite:aesthetic art
Watch/wait a few seconds for the parents to apply across the PTR for those creator tags.
- Then save a new favourite search of [system:inbox, favourite:aesthetic art, limit:256]. This search will deliver results with any of the child 'creator' tags, just like a big OR search, and real fast!
If you want to add or remove any creators to the 'aesthetic art' group, you can simply go back to tags->manage tag parents, and it will apply everywhere. You can create more umbrella/group tags if you like (and not just creators--think about clothing, or certain characters), and also use them in regular searches when you just want to browse some cool files.
"},{"location":"advanced_siblings.html","title":"tag siblings","text":"Tag siblings let you replace a bad tag with a better tag.
"},{"location":"advanced_siblings.html#the_problem","title":"what's the problem?","text":"Reasonable people often use different words for the same things.
A great example is in Japanese names, which are natively written surname first.
character:ayanami rei and character:rei ayanami have the same meaning, but different users will use one, or the other, or even both.
Other examples are tiny syntactic changes, common misspellings, and unique acronyms:
- smiling and smile
- staring at camera and looking at viewer
- pokemon and pok\u00e9mon
- jersualem and jerusalem
- lotr and series:the lord of the rings
- marimite and series:maria-sama ga miteru
- ishygddt and i sure hope you guys don't do that
A particular repository may have a preferred standard, but it is not easy to guarantee that all the users will know exactly which tag to upload or search for.
After some time, you get this:
Without continual intervention by janitors or other experienced users to make sure y\u2287x (i.e. making the yellow circle entirely overlap the blue by manually giving y to everything with x), searches can only return x (blue circle) or y (yellow circle) or x\u2229y (the lens-shaped overlap). What we really want is x\u222ay (both circles).
So, how do we fix this problem?
"},{"location":"advanced_siblings.html#tag_siblings","title":"tag siblings","text":"Let's define a relationship, A->B, that means that any time we would normally see or use tag A or tag B, we will instead only get tag B:
Note that this relationship implies that B is in some way 'better' than A.
"},{"location":"advanced_siblings.html#more_complicated","title":"ok, I understand; now confuse me","text":"This relationship is transitive, which means as well as saying
A->B
, you can also sayB->C
, which impliesA->C
andB->C
.graph LR\n A[lena_oxton] --> B[lena oxton] --> C[character:tracer];
In this case, everything with 'lena_oxton' or 'lena oxton' will show 'character:tracer' instead.
You can also have an A->C and B->C that does not include A->B.
graph LR\n A[d.va] --> C[character:hana 'd.va' song]\n B[hana song] --> C
The outcome of these two arrangements is the same--everything ends up as C.
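A toy sketch of that collapse, following each tag along its sibling pairs until it reaches a tag with no further replacement (hypothetical pairs; not the client's actual code, which also handles counts, conflicts, and loop prevention):
```python
# hypothetical worse -> better sibling pairs
SIBLINGS = {
    'lena_oxton': 'lena oxton',
    'lena oxton': 'character:tracer',
    'd.va': "character:hana 'd.va' song",
    'hana song': "character:hana 'd.va' song",
}

def ideal_tag(tag: str) -> str:
    """Follow A->B->C chains to the final display tag (assumes no loops)."""
    while tag in SIBLINGS:
        tag = SIBLINGS[tag]
    return tag

print(ideal_tag('lena_oxton'))   # character:tracer
print(ideal_tag('hana song'))    # character:hana 'd.va' song
```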
Many complicated arrangements are possible (and inevitable, as we try to merge many different communities' ideal tags):
graph LR\n A[angela_ziegler] --> B[angela ziegler] --> I[character:dr. angela 'mercy' ziegler]\n C[\"angela_ziegler_(overwatch)\"] --> B\n D[character:mercy] --> I\n E[\"character:mercy (overwatch)\"] --> I\n F[dr angela ziegler] --> I\n G[\"character:\u30de\u30fc\u30b7\u30fc\uff08\u30aa\u30fc\u30d0\u30fc\u30a6\u30a9\u30c3\u30c1\uff09\"] --> E\n H[overwatch mercy] --> I
Note that if you say A->B, you cannot also say A->C. This is an n->1 relationship. Many things can point to a single ideal, but a tag cannot have more than one ideal. Also, obviously, these graphs are non-cyclic--no loops.
"},{"location":"advanced_siblings.html#how_to_do_it","title":"how you do it","text":"Just open tags->manage tag siblings, and add a few.
The client will automatically collapse the tagspace to whatever you set. It'll even work with autocomplete, like so:
Please note that siblings' autocomplete counts may be slightly inaccurate, as unioning the count is difficult to quickly estimate.
The client will not collapse siblings anywhere you 'write' tags, such as the manage tags dialog. You will be able to add or remove A as normal, but it will be written in some form of \"A (B)\" to let you know that, ultimately, the tag will end up displaying in the main gui as B:
Although the client may present A as B, it will secretly remember A! You can remove the association A->B, and everything will return to how it was. No information is lost at any point.
"},{"location":"advanced_siblings.html#remote_siblings","title":"remote siblings","text":"Whenever you add or remove a tag sibling pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that sibling pair. If it is denied, only you will see it.
"},{"location":"advanced_sidecars.html","title":"sidecars","text":"Sidecars are files that provide additional metadata about a master file. They typically share the same basic filename--if the master is 'Image_123456.jpg', the sidecar will be something like 'Image_123456.txt' or 'Image_123456.jpg.json'. This obviously makes it easy to figure out which sidecar goes with which file.
Hydrus does not use sidecars in its own storage, but it can import data from them and export data to them. It currently supports raw data in .txt files and encoded data in .json files, and that data can be either tags or URLs. I expect to extend this system in future to support XML and other metadata types such as ratings, timestamps, and inbox/archive status.
We'll start with .txt, since they are simpler.
"},{"location":"advanced_sidecars.html#importing_sidecars","title":"Importing Sidecars","text":"Imagine you have some jpegs you downloaded with another program. That program grabbed the files' tags somehow, and you want to import the files with their tags without messing around with the Client API.
If your extra program can export the tags to a simple format--let's say newline-separated .txt files with the same basic filename as the jpegs, or you can, with some very simple scripting, convert to that format--then importing them to hydrus is easy!
Put the jpegs and the .txt files in the same directory and then drag and drop the directory onto the client, as you would for a normal import. The .txt files should not be added to the list. Then click 'add tags/urls with the import'. The sidecars are managed on one of the tabs:
This system can get quite complicated, but the essential idea is that you are selecting one or more sidecar sources, parsing their text, and sending that list of data to one hydrus service destination. Most of the time you will be pulling from just one sidecar at a time.
"},{"location":"advanced_sidecars.html#the_source_dialog","title":"The Source Dialog","text":"The source is a description of a sidecar to load and how to read what it contains. In this example, the texts are like so:
4e01850417d1978e6328d4f40c3b550ef582f8558539b4ad46a1cb7650a2e10b.jpg.txt contains: flowers\nlandscape\nblue sky\n
5e390f043321de57cb40fd7ca7cf0cfca29831670bd4ad71622226bc0a057876.jpg.txt contains: fast car\nanime girl\nnight sky\n
Since our sidecars in this example are named (filename.ext).txt, and use newlines as the separator character, we can leave things mostly as default.
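For reference, reading such a sidecar outside hydrus is only a few lines; a minimal sketch with a made-up filename:
```python
from pathlib import Path

def read_txt_sidecar(media_path: str) -> list[str]:
    """Read newline-separated tags from a (filename.ext).txt sidecar, if present."""
    sidecar = Path(media_path + '.txt')
    if not sidecar.exists():
        return []
    lines = sidecar.read_text(encoding='utf-8').splitlines()
    return [line.strip() for line in lines if line.strip()]

print(read_txt_sidecar('Image_123456.jpg'))
# e.g. ['flowers', 'landscape', 'blue sky']
```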
If you do not have newline-separated tags, for instance comma-separated tags (flowers, landscape, blue sky), then you can set that here. Be careful if you are making your own sidecars, since any separator character obviously cannot be used in tag text!
If your sidecars are named (filename).txt instead of (filename.ext).txt, then just hit the checkbox, but if the conversion is more complicated, then play around with the filename string converter and the test boxes.
If you need to, you can further process the texts that are loaded. They'll be trimmed of extra whitespace and so on automatically, so no need to worry about that, but if you need to, let's say, add the creator: prefix to everything, or filter out some mis-parsed garbage, this is the place.
"},{"location":"advanced_sidecars.html#the_router_dialog","title":"The Router Dialog","text":"A 'Router' is a single set of orders to grab from one or more sidecars and send to a destination. You can have several routers in a single import or export context.
You can do more string processing here, and it will apply to everything loaded from every sidecar.
The destination is either a tag service (adding the loaded strings as tags), or your known URLs store.
"},{"location":"advanced_sidecars.html#previewing","title":"Previewing","text":"Once you have something set up, you can see the results are live-loaded in the dialog. Make sure everything looks all correct, and then start the import as normal and you should see the tags or URLs being added as the import works.
It is good to try out some simple situations with one or two files just to get a feel for the system.
"},{"location":"advanced_sidecars.html#import_folders","title":"Import Folders","text":"If you have a constant flow of sidecar-attached media, then you can add sidecars to Import Folders too. Do a trial-run of anything you want to parse with a manual import before setting up the automatic system.
"},{"location":"advanced_sidecars.html#exporting_sidecars","title":"Exporting Sidecars","text":"The rules for exporting are similar, but now you are pulling from one or more hydrus service
sources
and sending to a singledestination
sidecar every time. Let's look at the UI:I have chosen to select these files' URLs and send them to newline-separated .urls.txt files. If I wanted to get the tags too, I could pull from one or more tag services, filter and convert the tags as needed, and then output to a .tags.txt file.
The best way to learn with this is just to experiment. The UI may seem intimidating, but most jobs don't need you to work with multiple sidecars or string processing or clever filenames.
"},{"location":"advanced_sidecars.html#json_files","title":"JSON Files","text":"JSON is more complicated than .txt. You might have multiple metadata types all together in one file, so you may end up setting up multiple routers that parse the same file for different content, or for an export you might want to populate the same export file with multiple kinds of content. Hydrus can do it!
"},{"location":"advanced_sidecars.html#importing","title":"Importing","text":"Since JSON files are richly structured, we will have to dip into the Hydrus parsing system:
If you have made a downloader before, you will be familiar with this. If not, then you can brave the help or just have a play around with the UI. In this example, I am getting the URL(s) of each JSON file, which are stored in a list under the file_info_urls key.
It is important to paste an example JSON file that you want to parse into the parsing testing area (click the paste button) so you can test on read data live.
Once you have the parsing set up, the rest of the sidecar UI is the same as for .txt. The JSON Parsing formula is just the replacement/equivalent for the .txt 'separator' setting.
Note that you could set up a second Router to import the tags from this file!
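Outside of the parsing UI, grabbing the same data from a JSON sidecar like this is straightforward; a small sketch assuming a hypothetical layout with file_info_urls and tags keys at the top level:
```python
import json
from pathlib import Path

def read_json_sidecar(sidecar_path: str) -> tuple[list[str], list[str]]:
    """Return (urls, tags) from a hypothetical JSON sidecar layout."""
    data = json.loads(Path(sidecar_path).read_text(encoding='utf-8'))
    urls = data.get('file_info_urls', [])
    tags = data.get('tags', [])
    return urls, tags

urls, tags = read_json_sidecar('Image_123456.jpg.json')
print(urls, tags)
```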
"},{"location":"advanced_sidecars.html#exporting","title":"Exporting","text":"In Hydrus, the exported JSON is typically a nested Object with a similar format as in the Import example. You set the names of the Object keys.
Here I have set the URLs of each file to be stored under metadata->urls, which will make this sort of structure:
{\n    \"metadata\" : {\n        \"urls\" : [\n            \"http://example.com/123456\",\n            \"https://site.org/post/45678\"\n        ]\n    }\n}\n
The cool thing about JSON files is I can export multiple times to the same file and it will update it! Let's say I made a second Router that grabbed the tags, and it was set to export to the same filename but under metadata->tags. The final sidecar would look like this:
{\n    \"metadata\" : {\n        \"tags\" : [\n            \"blonde hair\",\n            \"blue eyes\",\n            \"skirt\"\n        ],\n        \"urls\" : [\n            \"http://example.com/123456\",\n            \"https://site.org/post/45678\"\n        ]\n    }\n}\n
You should be careful that the location you are exporting to does not have any old JSON files with conflicting filenames in it--hydrus will update them, not overwrite them! This may be an issue if you have a synchronising Export Folder that exports random files with the same filenames.
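That update-rather-than-overwrite behaviour is essentially a read-merge-write; here is a minimal sketch of the idea (my own illustration, not hydrus's actual exporter):
```python
import json
from pathlib import Path

def merge_into_sidecar(path: str, keys: list[str], values: list[str]) -> None:
    """Write values under nested keys (e.g. ['metadata', 'urls']), keeping existing content."""
    file = Path(path)
    data = json.loads(file.read_text(encoding='utf-8')) if file.exists() else {}
    node = data
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = values
    file.write_text(json.dumps(data, indent=4), encoding='utf-8')

merge_into_sidecar('Image_123456.jpg.json', ['metadata', 'urls'], ['http://example.com/123456'])
merge_into_sidecar('Image_123456.jpg.json', ['metadata', 'tags'], ['blonde hair', 'blue eyes'])
# the second call keeps the urls and adds tags alongside them
```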
"},{"location":"advanced_sidecars.html#note_on_notes","title":"Note on Notes","text":"You can now import/export notes with your sidecars. Since notes have two variables--name and text--but the sidecars system only supports lists of single strings, I merge these together! If you export notes, they will output in the form 'name: text'. If you want to import notes, arrange them in the same form, 'name: text'.
If you do need to select a particular note out of many, see if a String Match (regex ^name:) in the String Processor will do it.
If you need to work with multiple notes that have newlines, I recommend you use JSON rather than txt. If you have to use txt on multiple multi-paragraph-notes, then try a different separator than newline. Go for |||| or something, whatever works for your job.
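A tiny sketch of the 'name: text' convention, splitting on the first colon only (my own helper, not a hydrus function):
```python
def split_note(merged: str) -> tuple[str, str]:
    """Split a 'name: text' sidecar string into (name, text) on the first colon."""
    name, _, text = merged.partition(':')
    return name.strip(), text.strip()

print(split_note('source notes: found this on an old forum,\nsecond paragraph here'))
# ('source notes', 'found this on an old forum,\nsecond paragraph here')
```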
"},{"location":"after_disaster.html","title":"Recovering After Disaster","text":""},{"location":"after_disaster.html#you_just_had_a_database_problem","title":"you just had a database problem","text":"I have helped quite a few users recover a mangled database from disk failure or accidental deletion. You just had similar and have been pointed here. This is a simple spiel on the next step that I, hydev, like to give people once we are done.
"},{"location":"after_disaster.html#what_next","title":"what next?","text":"When I was younger, I lost a disk with about 75,000 curated files. It really sucks to go through, and whether you have only had a brush with death or lost tens or hundreds of thousands of files, I know exactly how you have been feeling. The only thing you can change now is the future. Let's make sure it does not happen again.
The good news is the memory of that sinking 'oh shit' feeling is a great motivator. You don't want to feel that way again, so use that to set up and maintain a proper backup regime. If you have a good backup, the worst case scenario, even if your whole computer blows up, is usually just a week's lost work.
So, plan to get a good external USB drive and figure out a backup script and a reminder to ensure you never forget to run it. Having a 'backup day' in your schedule works well, and you can fold in other jobs like computer updates and restarts at the same time. It takes a bit of extra 'computer budget' every year and a few minutes a week, but it is absolutely worth the peace of mind it brings.
Here's the how to backup help, if you want to revisit it. If you would like help setting up FreeFileSync or ToDoList or other similar software, let me know.
This is also a great time to think about backing up other things in your life. All of your documents, family photos, your password manager file--are they backed up? Would you be ok with losing them if their drive failed tomorrow? Movies and music will need a real drive, but your smaller things like documents can also fit on an (encrypted) USB stick that you can put in your wallet or keychain.
"},{"location":"changelog.html","title":"changelog","text":"Note
This is the new changelog, only the most recent builds. For all versions, see the old changelog.
"},{"location":"changelog.html#version_598","title":"Version 598","text":""},{"location":"changelog.html#misc","title":"misc","text":"- I screwed up the import folder deduplication fix last week! it caused import folders that contained duplicated items (and a handful of subscriptions, and even one normal GUI session) to not be able to save back their work. nothing was damaged, per se, but progress was not being saved and work was stopping after the respective systems paused out of safety. I am sorry for the trouble and worry here, and I hate it when this kind of error happens. I did made a test to test this thing worked, but it wasn't good enough. I have fixed it now and I am rejigging my test procedures to explicitly check for this specific class of object type problem (issue #1624)
- fixed the duplicate filter comparison statements to obey the new 'do not use pretty (720p etc..) resolution swap-in strings' option (issue #1621)
- the 'maintenance and processing' page now has some expand/collapse stuff on its boxes to make the options page not want to be so tall on startup
- the 'edit filename tagging options' panel under the 'edit import folder' dialog now auto-populates the example filename from the actual folder's current contents. thanks to a user for pointing this out
- moved a bunch of checkboxes around in the options. options->tags is renamed tag autocomplete tabs and now just handles children and favourites. search is renamed file search and handles the 'read' autocomplete and implicit system:limit, and a new page tag editing is added to handle the 'write' autocomplete and various 'tag service panel' settings
- the normal search page 'selection tags' list now only computes the tags for the first n thumbnails (default 4096) on a page when you have no files selected. this saves time on mega pages when you click off a selection and also on giant import pages where new files are continually streaming in at the end. I expect this to reduce CPU and lag significantly on clients that idle looking at big import pages. you can set the n under options->tag presentation, including turning it off entirely. I did some misc optimisation here too, but I also found some places I can improve the general tag re-compute in future cleanup work
- I may have improved some media viewer hover window positioning, sizing, and flicker in layout, particularly on the note window
- the 'do really heavy sibling and parents calculation work in the background' daemon now waits 60 seconds after boot to start work (previously 10s). since I added the new fast sibling and parent cache (which works quick but takes some extra work to initialise), I've noticed you often get a heap of lag as this guy is initially populated right after boot. so, the primary caller now happens a little later in the boot rush and should smooth out the curve a little
- I rewrote the 'ListBook' the options dialog relies on from ancient and ill-designed wx code to a nice clean simple Qt panel
- if you have a ton of tag services, a new 'use listbook instead of tabbed notebook for tag service panels' checkbox under options->tag editing now lets you use the new listbook instead of the old notebook/tabbed widget in: manage tags, manage tag siblings, manage tag parents, manage tag display and application, and review tag display sync
- moved the DnD options out of options->gui and to a new exporting panel and added a bit of text
- the BUGFIX 'secret' Discord fix is now formalised into an always-on 'set the DnD to a move flag', with a nice explanatory tooltip. it is now also always safe because it will now only ever run if you are set to export your DnDs to the temp folder
- the 'DnD temp folder' system is now cleaner and DnD temp folders will now be deleted after six hours (previously they were only cleaned up on client exit)
- added a note to the 'getting started with files' help to say you can export files with drag and drop m8
- fixed a bad list type definition in the new auto-resolution rules UI. it thought it was the export folder dialog's list and was throwing weird errors if that list was sorted in column >=4
- if a multi-column list fails to sort, it now catches and displays the error and continues with whatever was going on at the time
- if a multi-column list status is asked for a non-existing column type, the status now reports the error info and attempts its best fallback
- improved multi-column list initialisation across the board so the above problem cannot happen again (the list type was being set in two different locations, and I missed a ctrl+c/v edit)
- behind the scenes, the 'subsidiary page parser' is now one object. it was a janky thing before
- the subsidiary page parsers list in the parsing edit UI now has import/export/duplicate buttons
- it doesn't matter outside of file log post order, I don't think, but subsidiary page parsers now always work in alphabetical order
- they also now name themselves specifically when they cause an error
- parsers now deduplicate the list when saying what they 'produce/parse' in UI
- tweaked my linter settings to better catch some stupid errors and put the effort into cleaning up the hundreds of long-time warnings, probably more than a thousand items of Qt Signal false-positive spam, and the actual real bugs. I am hoping to better expose future needles without a haystack of garbage in the way. I am determined to maintain a 0 error count on Unresolved References going forward
- every single unused import statement is now removed or suppressed. I'm sure there are still tangles and bad ideas generally, but everything is completely lean now
- fixed some PILImage enum references
- improved some hydrus serialisable typedefs
- fixed some exception/warning defs
- deleted some old defunct 'retry' code from subscriptions
- fixed some bitmap generation code to handle non-c-contiguous memoryviews properly
- cleaned up some html parsing to properly navigate weird stuff bs4 might put out
- fixed a stupid type error in the old HydrusTagArchive namespace code
- fixed some account type calls in manage services auto-account creation
- fixed an issue with unusual tab drag and drops
- deleted the empty TestClientData.py
- deleted the empty ServerServices.py
- fixed a bunch of misc typedefs in general
- updated my Windows 'running from source' help to now say you need to put the sqlite3.dll in your actual python DLLs dir. as this is more scary than just dropping it in your hydrus install dir, I emphasise this is optional
- updated my 'requirements_server.txt', which is not used but a reference, to use the new requests and setuptools versions we recently updated to
- I am dropping support for the ancient OpenCV 2. we've had some enum monkeypatches in place for years and years, but I don't even know if 2 will even run on any modern python; it is gone now
- fixed an issue that caused non-empty hard drive file import file logs that were created before v595 (this typically affected import folders that are set to 'leave source alone, do not reattempt it' for any of the result actions) to lose track of their original import objects' unique IDs and thus, when given more items to possibly add (again, usually on an import folder sync), to re-add the same items one time over again and essentially double-up in size one time. this broke the ability to review the file log UI panel too, so users who noticed the behaviour was jank couldn't see what was going on. on update, all the newer duplicate items will be removed and you'll reset to the original 'already in db' etc.. stuff you had before. all file logs now check for and remove newer duplicates whenever they load or change contents. this happened because of the 'make file logs load faster' update in v595--it worked great for downloaders and subs, but local file imports use a slightly different ID system to differentiate separate objects and it was not updated correctly
- the main text-fetching routine that failed to load the list UI in the above case can now recover from null results if this happens again
- file import objects now have some more safety code to ensure they are identifying themselves correctly on load
- did some more work on copying tags: the new 'always copy parents with tags' was not as helpful as I expected, so this is no longer the default when you hit Ctrl+C (it goes back to the old behaviour of just copying the top-line rows in your selection). when you open a tag selection 'copy' menu, it now lists as a separate item 'copy 2 selected and 3 parents' kind of thing if you do want parents. also, parents will no longer copy with their indent (wew), and the taglists are now deduped so you will not be inundated with tagspam. furthermore, the 'what tags do we have' taglist in the manage tags dialog, and favourites/suggestions taglists, are now more parent-aware and plugged into this system
- added Mr Bones to the frame locations list under `options->gui`. if you use him a lot, he'll now remember where he was and how big he was
- also added `manage_times_dialog`, `manage_urls_dialog`, `manage_notes_dialog`, and `export_files_frame` to the list. they will all remember last size and position by default
- the client now recovers from a missing frame location entry with a fallback and a note in the log
- rewrote the way the media viewer hover windows and their sub-controls are updated to the current media object. the old asynchronous pubsub is out, and synchronous Qt signals are in. fingers crossed this truly fixes the rare-but-annoying 'oh the ratings in the top-right hover aren't updating I guess' bug, but we'll see. I had to be stricter about the pipeline here, and I was careful to ensure it would be failsafe, so if you discover a media viewer with hover windows that simply won't switch media (they'd probably be frozen in a null state from viewer open), let me know the details!
- some built versions of the client seem unable to find their local help, so now, when a user asks to open a help page, if it seems to be missing locally, a little text with the paths involved is now written to the log
- all formulae now have a 'name/description' field. this is wholly decorative and simply appears in the single- or multi-line summary of the formula in UI. all formulae start with and will initialise with a blank label
- the generic 'edit formula' panel (the one where you can change the formula type) now has import/export buttons
- updated the ZIPPER UI to use a newer single-class 'queue list' widget rather than some ten year old 'still has some wx in it' scatter of gubbins
- added import/export/duplicate capability to the 'queue list' widget, and added it for ZIPPER formulae
- also added import/export/duplicate buttons to the 'edit string processor' list!!
- 'any characters' String Match objects now describe themselves with the 'such as' respective example string, with the new proviso that no String Match will give this string if it is stuck at the 'example string' default. you'll probably most see this in the manage url class dialog for components and parameters
- cleaned a bunch of this code generally
- fixed an issue fetching millisecond-precise timestamps in the `file_metadata` call when one of the timestamps had a null value (for instance if the file has no modified date of any kind registered)
- in the various potential duplicates calls, some simple searches (usually when one/both of two searches are system:everything) are now optimised using the same routine that happens in UI
- the client api version is now 75
- for Win 7 users who run from source, I believe the program's newer virtual environments will no longer build on Win 7. it looks like a new version of psd-tools will not compile in python 3.8, and there's also some code in newer versions of the program that 3.8 simply won't run. I think the last version that works for you is v582. we've known this train was coming for a while, so I'm afraid Win 7 guys will have to freeze at that version unless and until they update Windows or move to Linux/macOS
- I have updated the 'running from source' help to talk about this, including adding the magic git line you need to choose a specific version rather than normal git pull. this is likely the last time I will specifically support Win 7, and I suspect I will sunset pyside2 and PyQt5 testing too
- I am releasing a future build alongside this release, just for Windows. it has new dlls for SQLite and mpv. advanced users are invited to test it out and tell me if there are any problems booting and playing media, and if there are no issues, I'll fold this into the normal build next week
- mpv: 2023-08-20 to 2024-10-20
- SQLite: 3.45.3 to 3.47.0
- these bring normal optimisations and bug fixes. I expect no huge problems (although I believe the mpv dll strictly no longer supports Win 7, but that is now moot), but please check and we'll see
- in prep for duplicates auto-resolution, the five variables that go into a potential duplicates search (two file searches, the search type, the pixel dupe requirement, and the max hamming distance) are now bundled into one nice clean object that is simpler to handle and will be easier to update in future. everything that touches this stuff--the page manager, the page UI (there's a whole edit panel for the new class), the filter itself, the Client API, the db search code, all the unit tests, and now the duplicates auto-resolution system--all works on this new thing rather than throwing list of variables around
- I pushed this forward in a bunch of ways. nothing actually works yet, still, but if you poke around in the advanced placeholder UI, you'll see the new potential duplicates search context UI, now with side-by-side file search context panels, for the fleshed-out pixel-perfect jpeg/png default
- due to an ill-planned parsing update, several downloaders' hash lookups (which allow the client to quickly determine 'already in db'/'previously deleted' sometimes) broke last week. they are fixed today, sorry for the trouble!
- the fps number on the file info line, which was previously rounded always to the nearest integer, is now reported to two sig figs when small. it'll say 1.2fps and 0.50fps
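To make the rounding rule concrete, here is a minimal sketch (not the client's actual code) of formatting a small fps value to two significant figures, matching the 1.2fps and 0.50fps examples above:

```python
import math

def format_fps( fps: float ) -> str:
    
    # big values stay as whole numbers, as before
    if fps >= 10:
        
        return f'{round( fps )}fps'
        
    
    if fps <= 0:
        
        return '0fps'
        
    
    # two significant figures for small values: 1.234 -> '1.2', 0.499 -> '0.50'
    decimals = max( 0, 1 - math.floor( math.log10( fps ) ) )
    
    return f'{fps:.{decimals}f}fps'

print( format_fps( 23.976 ) ) # 24fps
print( format_fps( 1.234 ) )  # 1.2fps
print( format_fps( 0.5 ) )    # 0.50fps
```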
- I hacked in some collapse/expand tech into my static box layout that I use all over the place and tentatively turned it on, and defaulting to collapsed, in the bigger review services sub-panels. the giganto-tall repository panel is now much shorter by default, making the rest of the pages more normal sized on first open. let's see how it goes, and I expect I'll put it elsewhere too and add collapse memory and stuff if that makes sense
- the 'copy service key' on review services panels is now hidden behind advanced mode
- tweaked some layout sizers for some spinboxes (the number controls that have an up/down arrow on the side) and my 'noneable' spinboxes so they aren't so width-hesitant. they were not showing their numbers fully on some styles where the arrows were particularly wide. they mostly size stupidly wide now, but at least that lines up with pretty much everything else so the number of stupid layout problems we are dealing with has reduced by one
- the frame locations list under `options->gui` has four new buttons to mass-set 'remember size/position' and 'reset last size/position' to all selected
- max implicit system:limit in `options->search` is raised from 100 thousand to 100 million
- if there is a critical drive problem when adding a file to the file structure, the exact error is now spammed to a popup and log. previously, it was just propagated up to the caller
- I messed up the 'hex' and 'base64' decode stuff last week. we used to have hex and base64 decode back in python 2 to do some hash conversion stuff, but it was overhauled into the content parser hash type dropdown and the explicit conversion was deprecated to a no-op. last week, I foolishly re-used the same ids when I revived the decoding functionality, which caused a bunch of old parsers like gelbooru 0.2.5, e621, 4chan, and likely others, which still had the no-op, to suddenly hex- or base-64-afy their parsed hashes, breaking the parse and lookup
- this week I redefined the hacky enums and generally cleaned this code, and I am deleting all hex and base64 string conversion decodes from all pre-596 parsers. this fixes all the old downloaders by explicitly deleting the no-op so it won't trouble us again
- if you made a string converter in v595 that decodes hex or base64, that encoding step will be deleted, sorry! I have to ask you to re-make it
- added a 'connect.bat' (and .sql file) to the db dir to make it easy to load up the whole database with 'correct' ATTACHED schema names in the sqlite3 terminal
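If you prefer python to the raw sqlite3 terminal, here is a rough equivalent of what the connect script sets up. the filenames are the standard client ones; the `external_*` schema names are my assumption about what the client uses internally, so check the generated connect.sql for the authoritative list:

```python
import sqlite3

# run from inside your db directory
db = sqlite3.connect( 'client.db' )

# assumed schema names--defer to connect.sql if yours differ
db.execute( 'ATTACH "client.caches.db" AS external_caches;' )
db.execute( 'ATTACH "client.mappings.db" AS external_mappings;' )
db.execute( 'ATTACH "client.master.db" AS external_master;' )

# cross-schema queries now resolve the same way they do inside the client
for ( name, ) in db.execute( "SELECT name FROM external_master.sqlite_master WHERE type = 'table';" ):
    
    print( name )
```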
- added `database->db maintenance->get tables using definitions`, which uses the long-planned database module rewrite maintenance tech (basically a faux foreign key) to fetch every table that uses hash_ids or tag_ids along with the specific column name that uses the id. this will help with various advanced maintenance jobs where we need to clear off a particular master definition to, as for instance happened this week, reset a super-huge autoincrement value on the master hashes table. this same feature will eventually trim client.master.db by discovering which master definitions are no longer used anywhere (e.g. after PTR delete)
- thanks to the continuing efforts of the user making Ugoira improvements, the Client API's `/get_files/render` call will now render an Ugoira to apng or animated webp. note the apng conversion appears to take a while, so make sure you try both formats to see what you prefer
- fixed a critical bug in the Client API where if you used the `file_id(s)` request parameter, and gave novel ids, the database was hitting emergency repair code and filling in the ids with pseudorandom recovery hashes. this wasn't such a huge deal, but if you put a very high number in, the autoincrement `hash_id` of the hashes table would then move up to there, and if the number was sufficiently high, SQLite would have trouble because of max integer limits and all kinds of stuff blew up. asking about a non-existent `file_id` will now raise a 404, as originally intended
- refactored the note set/delete calls, which were doing their own thing, to use the unified hash-parsing routine with the new safety code
- if the Client API is ever asked about a hash_id that is negative or over a ~quadrillion (1024^5), it now throws a special error
- as a backup, if the Client DB is ever asked about a novel hash_id that is negative or over a ~quadrillion (1024^5), it now throws a special error rather than trigger the pseudorandom hash recovery code
- the Client API version is now 74
- fleshed out the duplicates auto-resolution manager and plugged it into the main controller. the mainloop boots and exits now, but it doesn't do anything yet
- updated the multiple-file warning in the edit file urls dialog
- gave the Client API review services panel a very small user-friendliness pass
- I converted more old multi-column list display/sort generation code from the old bridge to the newer, more efficient separated calls for 10 of the remaining 43 lists to do
- via some beardy-but-I-think-it-is-ok typedefs, all the managers and stuff that take the controller as a param now use the new 'only import when linting' `ClientGlobals` Controller type, all unified through that one place, and in a way that should be failsafe, making for much better linting in any decent IDE. I didn't want to spam the 'only import when linting' blocks everywhere, so this was the compromise
- deleted the `interface` modules with the Controller interface gubbins. this was an ok start of an idea, but the new Globals import trick makes it redundant
- pulled and unified a bunch of the common `ManagerWithMainLoop` code up to the superclass and cleaned up all the different managers a bit
- deleted `ClientMaintenance.py`, which was an old attempt to unify some global maintenance daemons that never got off the ground and I had honestly forgotten about
- moved responsibility for the `remote_thumbnails` table to the Client Repositories DB module; it is also now plugged into the newer content type maintenance system
- moved responsibility for the `service_info` table to the Client Services DB module
- the only CREATE TABLE stuff still in the old Client DB creation method is the version table and the old YAML options structure, so we are essentially all moved to the new modules now
- fixed some bugs/holes in the table definition reporting system after playing with the new table export tool (some bad sibling/parent tables, wrongly reported deferred tables, missing notes_map and url_map due to a bad content type def, and the primary master definition tables, which I decided to include). I'm sure there are some more out there, but we are moving forward on a long-term job here and it seems to work
- thanks to a user who put in a lot of work, we finally have Ugoira rendering! all ugoiras will now animate using the hydrus native animation player. if the ugoira has json timing data in its zip (those downloaded with PixivUtil and gallery-dl will!), we will use that, but if it is just a zip of images (which is most older ugoiras you'll see in the wild), it'll check a couple of note names for the timing data, and, failing that, will assign a default 125ms per frame fallback. ugoiras without internal timing data will currently get no 'duration' metadata property, but right-clicking on them will show their note-based or simulated duration on the file info line
- all existing ugoiras will be metadata rescanned and thumbnail regenned on update
- technical info here: https://hydrusnetwork.github.io/hydrus/filetypes.html#ugoira
- ugoira metadata and thumbnail generation is cleaner
- a bug in ugoira thumbnail selection, when the file contains non-image files, is fixed
- a future step will be to write a special hook into the hydrus downloader engine to recognise ugoiras (typically on Pixiv) and splice the timing data into the zip on download, at which point we'll finally be able to turn on Ugoira downloading on Pixiv on our end. for now, please check out PixivUtil or gallery-dl to get rich Ugoiras
- I'd like to bake the simulated or note-based durations into the database somehow, as I don't like the underlying media object thinking these things have no duration, but it'll need more thought
- all multi-column lists now sort string columns in a caseless manner. a subscription called 'Tents' will now slot between 'sandwiches' and 'umbrellas'
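For anyone curious, the effect is the same as swapping a plain sort for a casefolded key, something like this minimal sketch:

```python
subscriptions = [ 'umbrellas', 'Tents', 'sandwiches' ]

# the old, case-sensitive sort puts capitals first
print( sorted( subscriptions ) )                      # ['Tents', 'sandwiches', 'umbrellas']

# roughly the caseless sort the lists now do
print( sorted( subscriptions, key = str.casefold ) )  # ['sandwiches', 'Tents', 'umbrellas']
```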
- in 'favourite searches', the 'folder' name now has hacky nested folder support. just put '/' in the folder name and it'll make nested submenus. in future this will be implemented with a nicer tree widget
- file logs now load faster in a couple of ways, which should speed up UI session and subscriptions dialog load. previously, there were two rounds of URL normalisation on URL file import object load, one wasteful and one fixable with a cache; these are now dealt with. thanks to the users who sent in profiles of the subscriptions dialog opening; let me know how things seem now (hopefully this fixes/relieves #1612)
- added 'Swap in common resolution labels' to `options->media viewer`. this lets you turn off the '1080p' and '4k'-style label swap-ins for common resolutions on file descriptor strings
- the 'are you sure you want to exit the client? 3 pages say "I am still importing"' popup now says the page names, and in a pretty way, and it shows multiple messages nicer
- the primary 'sort these tags in a human way m8' routine now uses unicode tech to sort things like ß better
- the String Converter can decode 'hex' and 'base64' again (so you can now do '68656c6c6f20776f726c64' or 'aGVsbG8gd29ybGQ=' to 'hello world'). these functions were a holdover from hash parsing in the python 2 times, but I've brushed them off and cleared out the 'what if we put raw bytes in the parsing system bro' nonsense we used to have to deal with. these types are now explicitly UTF-8. I also added a couple unit tests for them
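The decode steps do the same job as the standard python conversions, shown here with the example strings from above:

```python
import base64

print( bytes.fromhex( '68656c6c6f20776f726c64' ).decode( 'utf-8' ) ) # hello world
print( base64.b64decode( 'aGVsbG8gd29ybGQ=' ).decode( 'utf-8' ) )    # hello world
```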
- fixed an options initialisation bug where setting two files in the duplicate filter as 'not related' was updating the A file to have the B file's file modified time if that was earlier!! if you have files in this category, you will be asked on update if you want to reset their file modified date back to what is actually on disk (the duplicate merge would not have overwritten this; this only happens if you edit the time in the times dialog by hand). a unit test now checks this situation. sorry for the trouble, and thank you to the user who noticed and reported this
- the hydrus Docker package now sets the 'hydrus' process to `autorestart=unexpected`. I understand this makes `file->exit` stick without an automatic restart. it seems like commanding the whole Docker image to shut down still causes a near-instant unclean exit (some SIGTERM thing isn't being caught right, I think), but `file->exit` should now be doable beforehand. we will keep working here
- the new 'replace selected with their OR' and the original 'add an OR of the selected' are now mutually exclusive, depending on whether the current selection is entirely in the active search list
- added 'start an OR with selected', which opens the 'edit OR predicate' panel on the current selection. this works if you only select one item, too
- added 'dissolve selected into single predicates', when you select only OR predicates. it does the opposite of the 'replace'
- the new OR menu gubbins is now in its own separated menu section on the tag right-click
- the indent for OR sub preds is moved up from two spaces to four
- wrote some help about the 'force page refetch' checkboxes in 'tag import options' here: https://hydrusnetwork.github.io/hydrus/getting_started_downloading.html#force_page_fetch
- added a new submenu `urls->force metadata refetch` that lets you quickly and automatically create a new urls downloader page with the selected files' 'x URL Class' urls with the tag import options set to the respective URLs' default but with these checkboxes all set for you. we finally have a simple answer to 'I messed up my tag parse, I need to redownload these files to get the tags'!
- the urls menu offers the 'for x url class' even when only one file is selected now. crazy files with fifty of the same url class can now be handled
- wrote some placeholder UI for the new system. anyone who happens to be in advanced mode will see another tab on duplicate filter pages. you can poke around if you like, but it is mostly just blank lists that aren't plugged into anything
- wrote some placeholder help too. same deal, just a placeholder that you have to look for to find that I'll keep working on
- I still feel good about the duplicates auto-resolution system. there is much more work to do, but I'll keep iterating and fleshing things out
- the new `/get_files/file_path` command now returns the `filetype` and `size` of the file
- updated the Client API help and unit tests for this
- client api version is now 73
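If it helps to picture the change, here is a hedged sketch of hitting `/get_files/file_path` and reading the new values. the endpoint and the filetype/size additions are from this changelog; the exact response key names and the access key header are my assumptions from the rest of the Client API, so check the API help before relying on them:

```python
import requests

HYDRUS = 'http://127.0.0.1:45869'
HEADERS = { 'Hydrus-Client-API-Access-Key' : 'your 64-character hex key here' }

response = requests.get( f'{HYDRUS}/get_files/file_path', headers = HEADERS, params = { 'file_id' : 123 } )

info = response.json()

print( info.get( 'path' ) )     # local path of the file
print( info.get( 'filetype' ) ) # the file's filetype (new)
print( info.get( 'size' ) )     # the file's size in bytes (new)
```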
- the library updates we've been testing the past few weeks have gone well, so I am rolling them into the normal builds for everyone. the libraries that do 'fetch stuff from the internet' and 'help python manage its packages' are being updated because of some security problems that I don't think matter for us at all (there's some persistent https verification thing in requests that I know we don't care about, and a malicious URL exploit in setuptools that only matters if you are using it to download packages, which, as I understand, we don't), but we are going to be good and update anyway
- `requests` is updated from `2.31.0` to `2.32.3`
- `setuptools` is updated from `69.1.1` to `70.3.0`
- `PyInstaller` is updated from `6.2` to `6.7` for Windows and Linux to handle the new `setuptools`
- there do not appear to be any update conflicts with dlls or anything, so just update like you normally do. I don't think the new pyinstaller will have problems with older/weirder Windows, but let me know if you run into anything
- users who run from source may like to reinstall their venvs after pulling to get the new libraries too
- refactored `ClientGUIDuplicates` to a new `duplicates` gui module and renamed it to `ClientGUIDuplicateActions`
- harmonised some duplicates auto-resolution terminology across the client to exactly that form. not auto-duplicates or duplicate auto resolution, but 'duplicates auto-resolution'
- fixed some bad help link anchors
- clarified a couple things in the 'help my db is broke.txt' document
- updated the new x.svg to a black version; it looks a bit better in light & dark styles
- fixed an error that was stopping files from being removed sometimes (it also messed up thumbnail selection). it could even cause crashes! the stupid logical problem was in my new list code; it was causing the thumbnail grid backing list to get pseudorandomly poisoned with bad indices when a previous remove event removed the last item in the list
- the tag `right-click->search` menu, on a multiple selection of non-OR predicates that exists in its entirety in the current search context, now has `replace selected with their OR`, which removes the selection and replaces it with an OR of them all!
- the system predicate parser no longer removes all underscores from to-be-parsed text. this fixes parsing for namespaces, URLs, service names, etc.. with underscores in (issue #1610)
- fixed some bad layout in the edit predicates dialog for system:hash (issue #1590)
- fixed some content update logic for the advanced delete choices of 'delete from all local file domains' and 'physically delete now', where the UI-side thumbnail logic was not removing the file from the 'all my files' or 'all local files' domains respectively, which caused some funny thumbnail display and hide/show rules until a restart rebuilt the media object from the (correct) db source
- if you physically delete a file, I no longer force-remove it from view so enthusiastically. if you are looking at 'all known files', it should generally still display after the delete (and now it will properly recognise it is now non-local)
- I may have fixed an issue with page tab bar clicks on the very new Qt 6.8, which has been rolling out this week
- wrote out my two rules for tagging (don't be perfect, only tag what you search) to the 'getting started - more tags' help page: https://hydrusnetwork.github.io/hydrus/getting_started_more_tags.html#tags_are_for_searching_not_describing
- I cleaned up and think I fixed some SIGTERM and related 'woah, we have to shut down right now' shutdown handling. if a non-UI thread calls for the program to exit, the main 'save data now' calls are now all done by or blocked on that thread, with improved thread safety for when it does tell Qt to hide and save the UI and so on (issue #1601, but not sure I totally fixed it)
- added some SIGTERM test calls to `help->debug->tests` so we can explore this more in future
- on the client, the managers for db maintenance, quick downloads, file maintenance, and import folders now shut down more gracefully, with overall program shutdown waiting for them to exit their loops and reporting what it is still waiting on in the exit splash (like it already does for subscriptions and tag display). as a side thing, these managers also start faster on program boot if you nudge their systems to do something
- wrote some unit tests to test my unique list and better catch stupid errors like I made last week
- added default values for the 'select from list of things' dialogs for: edit duplicate merge rating action; edit duplicate merge tag action; and edit url/parser link
- moved `FastIndexUniqueList` from `HydrusData` to `HydrusLists`
- fixed an error in the main import object if it parses (and desires to skip associating) a domain-modified 'post time' that's in the first week of 1970
- reworked the text for the 'focus the text input when you change pages' checkbox under `options->gui pages` and added a tooltip
- reworded and changed tone of the boot error message on missing database tables if the tables are all caches and completely recoverable
- updated the twitter link and icon in `help->links` to X
- in a normal search page tag autocomplete input, search results will recognise exact-text-matches of their worse siblings for 'put at the top of the list' purposes. so, if you type 'lotr', and it was siblinged to 'series:lord of the rings', then 'series:lord of the rings' is now promoted to the top of the list, regardless of count, as if you had typed in that full ideal tag
- OR predicates are now multi-line. the top line is OR:, and then each sub-tag is now listed indented below. if you construct an OR pred using shift+enter in the tag autocomplete, this new OR does start to eat up some space, but if you are making crazy 17-part OR preds, maybe you'll want to use the OR button dialog input anyway
- when you right-click an OR predicate, the 'copy' menu now recognises this as '3 selected tags' etc.. and will copy all the involved tags and handle subtags correctly
- the 'remove/reset for all selected' file relationship menu is no longer hidden behind advanced mode. it being buried five layers deep is enough
- to save a button press, the manage tag siblings dialog now has a paste button for the right-side tag autocomplete input. if you paste multiple lines of content, it just takes the first
- updated the file maintenance job descriptions for the 'try to redownload' jobs to talk about how to deal with URL downloads that 404 or produce a duplicate and brushed up a bit of that language in general
- the new 'if a db job took more than 15 seconds, log it' thing now tests if the program was non-idle at the start or end of the db job, rather than just the end. this will catch some 'it took so long that some "wake up" stuff had time to kick in' instances
- fixed a typo where if the 'other' hashes were unknown, the 'sha512 (unknown)' label was saying 'md5 (unknown)'
- file import logs get a new 'advanced' menu option, tucked away a little, to 'renormalise' their contents. this is a maintenance job to clear out duplicate chaff on an existing list after the respective URL Class rules have changed to remove something in normalisation (e.g. setting a parameter to be ephemeral). I added a unit test for this also, but let me know how it works in the wild
- fixed the source time parsing for the gelbooru 0.2.0 (rule34.xxx and others) and gelbooru 0.2.5 (gelbooru proper) page parsers
- fixed the 'permits everything' API Permissions update from a couple weeks ago. it was supposed to set 'permits everything' when the existing permissions structure was 'mostly full', but the logic was bad and it was setting it when the permissions were sparse. if you were hit by this and did not un-set the 'permits everything' yourself in review services, you will get a yes/no prompt on update asking if you want to re-run the fixed update. if the update only missed out setting "permits everything" where it should have, you'll just get a popup saying it did them. sorry for missing this, my too-brief dev machine test happened to be exactly on the case of a coin flip landing three times on its edge--I've improved my API permission tests for future
- I got started on the db module that will handle duplicates auto-resolution. this started out feeling daunting, and I wasn't totally sure how I'd do some things, but I gave it a couple iterations and managed to figure out a simple design I am very happy with. I think it is about 25-33% complete (while object design is ~50-75% and UI is 0%), so there is a decent bit to go here, but the way is coming into focus
- updated my `SortedList`, which does some fast index lookup stuff, to handle more situations, optimised some remove actions, made it more compatible as a list drop-in replacement, moved it to `HydrusData`, and renamed it to `FastIndexUniqueList`
- the autocomplete results system uses the new `FastIndexUniqueList` a bit for some cached matches and results reordering stuff
- expanded my `TemporerIntegerTable` system, which I use to do some beardy 'executemany' SELECT statements, to support an arbitrary number of integer columns. the duplicate auto-resolution system is going to be doing mass potential pair set intersections, and this makes it simple
- thanks to a user, the core `Globals` files get some linter magic that lets an IDE do good type checking on the core controller classes without running into circular import issues. this reduced project-wide PyCharm linter warnings from like 4,500 to 2,200 wew
- I pulled the `ServerController` and `TestController` gubbins out of `HydrusGlobals` into their own 'Globals' files in their respective modules to ensure other module-crawlers (e.g. perhaps PyInstaller) do not get confused about what they are importing here, and to generally clean this up a bit
- improved a daemon unit test that would sometimes fail because it was not waiting long enough for the daemon to finish. I cut some other fat and it is now four or five seconds faster too
- the 'read' autocomplete dropdown has a new one-click 'clear search' button, just beside the favourites 'star' menu button. the 'empty page' favourite is removed from new users' defaults
- in an alteration to the recent Autocomplete key processing, Ctrl+c/Ctrl+Insert will now propagate to the results list if you currently have none of the text input selected (i.e. if it would have been a no-op on the text input, we assume you wanted whatever is selected in the list)
- in the normal thumbnail/viewer menu and review services, the 'files' entry is renamed to 'locations'. this continues work in the left hand button of the autocomplete dropdown where you set the 'location', which can be all sorts of complicated things these days, rather than just 'file service key selector'. I don't think I'll rename 'my files' or anything, but I will try to emphasise this 'locations' idea more when I am talking about local file domains etc.. in other places going forward; what I often think of as 'oh yeah the files bit' isn't actually referring to the files themselves, but where they are located, so let's be precise
- last week's tag pair filtering in tags->migrate tags now has 'if either the left or right of the pair have count', and when you hit 'Go' with any of the new count filter checkboxes hit, the preview summary on the yes/no confirmation dialog talks about it
- any time a watcher subject is parsed, if the text contains non-decoded html entities (like `&gt;`), they are now auto-converted to normal chars. these strings are often ripped from odd places and are only used for user display, so this just makes that simpler
- if you are set to remove trashed files from view, this now works when the files are in multiple local file domains, and you choose 'delete from all local file services', and you are looking at 'all my files' or a subset of your local file domains
- we now log any time (when the client is non-idle) that a database job's work inside the transaction wrapper takes more than 15 seconds to complete
- fixed an issue caused by the sibling or parents system doing some regen work at an unlucky time
- thanks to user help, the derpibooru post parser now additionally grabs the raw markdown of a description as a second note. this catches links and images better than the html string parse. if you strictly only want one of these notes, please feel free to dive into `network->downloaders->default import options` for your derpi downloader and try to navigate the 'note import options' hell I designed and let me know how it could be more user friendly
- added a new NESTED formula type. this guy holds two formulae of any type internally, parsing the document with the first and passing those results on to the second. it is designed to solve the problem of 'how do I parse this JSON tucked inside HTML' and vice versa. various encoding stuff all seems to be handled, no extra work needed
- added Nested formula stuff to the 'how to make a downloader' help
- made all the screenshots in the parsing formula help clickable
- renamed the COMPOUND formula to ZIPPER formula
- all the 'String Processor' buttons across the program now have copy and paste buttons, so it is now easy to duplicate some rules you set up
- in the parsing system, sidecar importer, and clipboard watcher, all strings are now cleansed of errant 'surrogate' characters caused by the source incorrectly providing utf-16 garbage in a utf-8 stream. fingers crossed, the cleansing here will actually fix problem characters by converting them to utf-8, but we'll see
- thanks to a user, the JSON parsing system has a new 'de-minify json' parsing rule, which decompresses a particular sort of minified JSON that expresses multiply-referenced values using list positions. as it happened that I added NESTED formulae this week, I wonder if we will migrate this capability to the string processing system, but let's give it time to breathe
- fixed the permission check on the new 'get file/thumbnail local path' commands--due to me copy/pasting stupidly, they were still just checking 'search files' perm
- added `/get_files/local_file_storage_locations`, which spits out the stuff in `database->move media files` and lets you do local file access en masse
- added help and a unit test for this new command
- the client api version is now 72
- the 'old' OpenCV version in the `(a)dvanced` setup, which pointed to version 4.5.3.56, which had the webp vulnerability, is no longer an option. I believe this means that the program will no longer run on python 3.7. I understand Win 7 can run python 3.8 at the latest, so we are nearing the end of the line on that front
- the old/new Pillow choice in `(a)dvanced` setup, which offered support for python 3.7, is removed
- I have added a new question to the `(a)dvanced` venv setup to handle misc 'future' tests better, and I added a new future test for two security patches for `setuptools` and `requests`:
- A) `setuptools` is updated to 70.3.0 (from 69.1.1) to resolve a security issue related to downloading packages from bad places (don't think this would ever affect us, but we'll be good)
- B) `requests` is updated to 2.32.3 (from 2.31.0) to resolve a security issue with verify=False (the specific problem doesn't matter for us, but we'll be good)
- if you run from source and want to help me test, you might like to rebuild your venv this week and choose the new future choice. these version increments do not appear to be a big deal, so assuming no problems I will roll these new libraries into a 'future' test build next week, and then into the normal builds a week after
- did a bunch more `super()` refactoring. I think all `__init__` is now converted across the program, and I cleared all the normal calls in the canvas and media results panel code too
- refactored `ClientGUIResults` into four files for the core class, the loading, the thumbnails, and some menu gubbins. also unified the mish-mash of `Results` and `MediaPanel` nomenclature to `MediaResultsPanel`
- fixed a stupid oversight with last week's "move page focus left/right after closing tab" thing where it was firing even when the page closed was not the current tab!! it now correctly only moves your focus if you close the current tab, not if you just middle click some other one
- fixed the share->export files menu command not showing if you right-clicked on just one file
- cleaned some of the broader thumbnail menu code, separating the 'stuff to show if we have a focus' and 'stuff to show if we have a selection'; the various 'manage' commands now generally show even if there is no current 'focus' in the preview (which happens if you select with ctrl+click or ctrl+a and then right-click in whitespace)
- the 'migrate tags' dialog now allows you to filter the sibling or parent pairs by whether the child/worse or parent/ideal tag has actual mapping counts on an arbitrary tag service. some new unit tests ensure this capability
- fixed an error in the duplicate metadata merge system where if files were exchanging known URLs, and one of those URLs was not actually an URL (e.g. it was garbage data, or human-entered 'location' info), a secondary system that tried to merge correlated domain-based timestamps was throwing an exception
- to reduce comma-confusion, the template for 'show num files and import status' on page names is now "name - (num_files - import_status)"
- the option that governs whether page names have the file count after them (under options->gui pages) has a new choice--'show for all pages, but only if greater than zero'--which is now the default for new users
- broke up the over-coupled 'migrate tags' unit tests into separate content types and the new count-filtering stuff
- cleaned up the 'share' menu construction code--it was messy after some recent rewrites
- added some better error handling around some of the file/thumbnail path fetching/regen routines
- the client api gets a new permissions state this week: the permissions structure you edit for an access key can now be (and, as a convenient default, starts as) a simple 'permits everything' state. if the permissions are set to 'permit everything', then this overrules all the specific rules and tag search filter gubbins. nice and simple, and a permissions set this way will automatically inherit new permissions in the future. any api access keys that have all the permissions up to 'edit ratings' will be auto-updated to 'permits everything' and you will get an update saying this happened--check your permissions in review services if you need finer control
- added a new permission, `13`, for 'see local paths'
- added `/get_files/file_path`, which fetches the local path of a file. it needs the new permission
- added `/get_files/thumbnail_path`, which fetches the local path of a thumbnail and optionally the filetype of the actual thumb (jpeg or png). it needs the new permission
- the `/request_new_permissions` command now accepts a `permits_everything` bool as a selective alternate to the `basic_permissions` list
- the `/verify_access_key` command now responds with the name of the access key and the new `permits_everything` value
- the API help is updated for the above
- new unit tests test all the above
- the Client API version is now 71
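As a rough illustration of the new flow, here is a sketch of requesting an all-access key and then checking it. the endpoints and the `permits_everything` field are from the notes above; the precise request/response shapes are my assumption, so defer to the API help:

```python
import requests

HYDRUS = 'http://127.0.0.1:45869'

# ask for an all-access key while the 'add from api request' mini-dialog in
# review services is listening
r = requests.get( f'{HYDRUS}/request_new_permissions', params = {
    'name' : 'my cool script',
    'permits_everything' : 'true'
} )

access_key = r.json()[ 'access_key' ]

# the verify call now reports the key's name and the new flag
r = requests.get( f'{HYDRUS}/verify_access_key', headers = { 'Hydrus-Client-API-Access-Key' : access_key } )

print( r.json() ) # e.g. { 'name' : 'my cool script', 'permits_everything' : True, ... }
```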
- the main `ClientLocalServerResources` file has been getting too huge (5,000 lines), so I've moved it and `ClientLocalServer` to their own `api` module and broken the Resources file up into core functions, the superclass, and the main verbs
- fixed permissions check for `/manage_popups/update_popup`, which was checking for pages permission rather than popup permission
- did a general linting pass of these easier-to-handle files; cleaned up some silly stuff
- the 'check now' button in manage subscriptions is generally more intelligent and now offers questions around paused status: if all the selected queries are DEAD, it now asks you if you want to resurrect them with a yes/no variant of the DEAD/ALIVE question (previously it just did it); if you are in edit subscriptions and any of the selected subs are paused, it now asks you if you want to include them (and unpause) in the check now, and if not it reduces the queries examined for the DEAD/ALIVE question appropriately (previously it just did their queries, and did not unpause); in either edit subscriptions or edit subscription, if any queries in the selection after any 'paused subs' or 'DEAD/ALIVE' filtering are paused, it asks you if you want to include (and unpause) them in the check now (previously it just did and unpaused them all)
- if you shrink the search page's preview window down to 0 size (which it will suddenly snap to, and which is a slightly different hide state to the one caused by double-left-clicking the splitter sash), the preview canvas will now recognise it is hidden and no longer load media as you click on thumbs. previously this thing was loading noisy videos in the background etc..
- the `StringMatch` 'character set' match type now has 'hexadecimal characters' (`^[\da-fA-F]+$`) and 'base-64 characters' (`^[a-zA-Z\d+/]+={0,2}$`) in its dropdown choice (a quick illustration of both patterns is just below)
- the 'gui pages' options panel now has 'when closing tabs, move focus (left/right)', so if you'd rather move left when middle-clicking tabs etc.., you can now set it, and if your style's default behaviour is whack and never moved to the right before despite you wanting it, now you can force it; it is now explicit either way. let me know if any crazy edge-case focus logic happens in this mode with nested page of pages or whatever
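Here is that quick illustration of the two character-set patterns, exactly as listed in the dropdown:

```python
import re

HEX_CHARACTERS = re.compile( r'^[\da-fA-F]+$' )
BASE64_CHARACTERS = re.compile( r'^[a-zA-Z\d+/]+={0,2}$' )

print( bool( HEX_CHARACTERS.match( '68656c6c6f20776f726c64' ) ) ) # True
print( bool( BASE64_CHARACTERS.match( 'aGVsbG8gd29ybGQ=' ) ) )    # True
print( bool( HEX_CHARACTERS.match( 'not hex' ) ) )                # False
```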
- when you right-click a file, in the share->copy hash menu, the md5, sha1, and sha512 hashes are now loaded from the database, usually in the milliseconds after the menu is opened, and shown in the menu labels for quick human reference. if your client does not have these hashes for the file, it says so
- the 'share' thumbnail menu is now visible on non-local files. it is severely truncated, basically just shows copy hash/file_id stuff
- wrote a 'Current Deleted Pending Petitioned' section for the Developer API to discuss how the states in the content storage system overlap and change in relation to various commands in the content update pipeline https://hydrusnetwork.github.io/hydrus/developer_api.html#CDPP It may be of interest to non-API-devs who are generally interested in what exactly the 'pending' state etc.. is
- if the file import options in a hard drive import page currently imports to an empty location context (e.g. you deleted the local file service it wanted to import to), the import page now pauses and presents an appropriate error text. the URL importers already did this, so this is the hdd import joining them
- this 'check we are good to do file work' test in the importer pages now in all cases pursues a 'default' file import options to the actual real one that will be used, so if your importer file import options are borked, this is now detected too and the importer will pause rather than fail everything in its file log
- thanks to a user, fixed a typo bug in the new multi-column list work that was causing problems when looking at gallery logs that included mis-linked log entries. in general, the main 'turn this integer into something human' function will now handle errors better
- advanced/technical, tl;dr: x.com URLs save better now. since a better fix will take more work, the 'x post' URL class is for now set to associate URLs. this fixes the association of x.com URLs when those are explicitly referred to as source URLs in a booru post. previously, some hydrus network engine magic related to how x URLs are converted to twitter URLs (and then fx/vxtwitter URLs) to get parsed by the twitter parser was causing some problems. a full 'render this URL as this URL' system will roll out in future to better handle this situation where two non-API URLs can mean the same thing. this will result in some twitter/x post URL duplication--we'll figure out a nice merge later!
- I have written the first skeleton of the `MetadataConditional` object. it has a rule based on a system predicate (like 'width > 400px') and returns True/False when you give it a media object. this lego-brick will plug into a variety of different systems in future, including the duplicate auto-resolution system, with a unified UI
- system predicates cannot yet do this arbitrarily, so it will be future work to fill out this code. to start with, I've just got system:filetype working to ultimately effect the first duplicate auto-resolution job of 'if pixel duplicates and one is jpeg, one png, then keep the jpeg'
- add some unit tests to test this capability
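To give a flavour of the idea, here is a purely hypothetical sketch, not the actual hydrus class, of a rule that takes a media object and answers True/False:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FakeMedia:
    
    mime: str
    width: int

class MetadataConditionalSketch:
    
    def __init__( self, name: str, rule: Callable[ [ FakeMedia ], bool ] ):
        
        self.name = name
        self._rule = rule
        
    
    def Test( self, media: FakeMedia ) -> bool:
        
        # give it a media object, get True/False back
        return self._rule( media )
        
    

# e.g. a 'width > 400px'-style rule and a filetype rule like the planned jpeg/png job
wider_than_400 = MetadataConditionalSketch( 'width > 400px', lambda m: m.width > 400 )
is_jpeg = MetadataConditionalSketch( 'filetype is jpeg', lambda m: m.mime == 'image/jpeg' )

print( wider_than_400.Test( FakeMedia( 'image/png', 1920 ) ) ) # True
print( is_jpeg.Test( FakeMedia( 'image/png', 1920 ) ) )        # False
```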
- refactored the main `Predicate` object and friends to the new `ClientSearchPredicate`
- refactored the main `NumberTest` object and friends to the new `ClientNumberTest`
- refactored the main `TagContext` object and friends to the new `ClientTagContext`
- refactored the main `FileSearchContext` object and friends to the new `ClientSearchFileSearchContext`
- moved some other `ClientSearch` stuff to other places and renamed the original file to `ClientSearchFavourites`; it now just contains the favourite searches manager
; it now just contains the favourite searches manager - some misc cleanup around here. some enums had bad names, that sort of thing
- the similar-files search maintenance code has an important update that corrects tree rebalancing for a variety of clients that initialised with an unlucky first import file. in the database update, I will check if you were affected here and immediately optimise your tree if so. it might take a couple minutes if you have millions of files
- tag parent and sibling changes now calculate faster at the database level. a cache that maintains the structure of which pairs need to be synced is now adjusted with every parent/sibling content change, rather than regenerated. for the PTR, I believe this will save about a second of deferred CPU time on an arbitrary parent/sibling change for the price of about 2MB of memory, hooray. fingers crossed, looking at the tags->sibling/parent sync->review panel while repository processing is going on will now be a smooth-updating affair, rather than repeated 'refreshing...' wait-flicker
- the 'the pairs you mean to add seem to connect to some loops' auto-loop-resolution popup in the manage siblings/parents dialogs will now only show when it is relevant to pairs to be added. previously, this thing was spamming during the pre-check of the process of the user actually breaking up loops by removing pairs
- added an item, 'sync now', to the tags->sibling/parent sync menu. this is a nice easy way to force 'work hard' on all services that need work. it tells you if there was no work to do
- reworked the 'new page chooser' mini-dialog and better fixed-in-place the intended static 3x3 button layout
- showing 'all my files' and 'local files' in the 'new page chooser' mini-dialog is now configurable in options->pages. previously 'local files' was hidden behind advanced mode. 'all my files' will only ever show if you have more than one local files domain
- when a login script fails with 401 or 403, or indeed any other network error, it now presents a simpler error in UI (previously it could spam the raw html of the response up to UI)
- generally speaking, the network job status widget will now only show the first line of any status text it is given. if some crazy html document or other long error ends up spammed to this thing, it should now show a better summary
- the 'filename' and 'first/second/etc.. directory' checkbox-and-text-input controls in the filename tagging panel now auto-check when you type something in
- the 'review sibling/parent sync' and 'manage where tag siblings and parents apply' dialogs are now plugged into the 'default tag service' system. they open to this tab, and if you are set to update it to the last seen, they save over the value on changes
- fixed the default safebooru file page parser to stop reading undesired '?' tags for every namespace (they changed their html recently I think)
- catbox 'collection' pages are now parseable by default
- fixed an issue with showing the 'manage export folders' dialog. sorry for the trouble--in my list rewrite, I didn't account for one thing that is special for this list and it somehow slipped through testing. as a side benefit, we are better prepped for a future update that will support column hiding and rearranging
- optimised about half of the new multi-column lists, as discussed last week. particularly included are file log, gallery log, watcher page, gallery page, and filename tagging panel, which all see a bunch of regular display/sort updates. the calls to get display data or sort data for a row are now separate, so if the display code is CPU expensive, it won't slow a sort
- in a couple places, url type column is now sorted by actual text, i.e. file url-gallery url-post url-watchable url, rather than the previous conveniently ordered enum. not sure if this is going to be annoying, so we'll see
- the filename tagging list no longer sorts the tag column by tag contents, instead it just does '#'. this makes this list sort superfast, so let's see if it is super annoying, but since this guy often has 10,000+ items, we may prefer the fast sort/updates for now
- the `/add_files/add_file` command now has a `delete_after_successful_import` parameter, default false, that does the same as the manual file import's similar checkbox. it only works on commands with a `path` parameter, obviously
- updated client api help and unit tests to test this
- client api version is now 70
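A hedged sketch of the new parameter in use: the endpoint and both parameters are from the notes above, while the access key header and the response shape are assumptions, so check the API help:

```python
import requests

HYDRUS = 'http://127.0.0.1:45869'
HEADERS = { 'Hydrus-Client-API-Access-Key' : 'your 64-character hex key here' }

response = requests.post( f'{HYDRUS}/add_files/add_file', headers = HEADERS, json = {
    'path' : '/home/me/pictures/image.jpg',
    'delete_after_successful_import' : True # the source file is only deleted if the import succeeds
} )

print( response.json() )
```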
- I cleaned up a mash of ancient shortcut-processing jank in the tag autocomplete input and fixed some logic. everything is now processed through one event filter, the result flags are no longer topsy-turvy, and the question of which key events are passed from the text input to the results list is now a simple strict whitelist--basically now only up/down/page up/page down/home/end/enter (sometimes)/escape (sometimes) and ctrl+p/n (for legacy reasons) are passed to the results list. this fixes some unhelpful behaviour where you couldn't select text and ctrl+c it unless the results list was empty (since the list was jumping in, after recent updates, and saying 'hey, I can do ctrl+c, bro' and copying the currently selected results)
- the key event processing in multi-column lists is also cleaned up from the old wx bridge to native Qt handling
- and some crazy delete handling in the manage urls dialog is cleaned up too
- the old `EVT_KEY_DOWN` wx bridge is finally cleared out of the program. I also cleared out some other old wx event definitions that have long been replaced. mostly we just have some mouse handling and window state changes to deal with now
- replaced many of my ancient static inheritance references with python's `super()` gubbins. I disentangled all the program's multiple inheritance into super() and did I think about half of the rest. still like 360 `__init__` lines to do in future
- refactored the repair file locations dialog and manage options dialog and new page picker mini-dialog to their own python files
The hydrus client now supports a very simple API so you can access it with external programs.
"},{"location":"client_api.html#enabling_the_api","title":"Enabling the API","text":"By default, the Client API is not turned on. Go to services->manage services and give it a port to get it started. I recommend you not allow non-local connections (i.e. only requests from the same computer will work) to start with.
The Client API should start immediately. It will only be active while the client is open. To test it is running all correct (and assuming you used the default port of 45869), try loading this:
http://127.0.0.1:45869
You should get a welcome page. By default, the Client API is HTTP, which means it is ok for communication on the same computer or across your home network (e.g. your computer's web browser talking to your computer's hydrus), but not secure for transmission across the internet (e.g. your phone to your home computer). You can turn on HTTPS, but due to technical complexities it will give itself a self-signed 'certificate', so the security is good but imperfect, and whatever is talking to it (e.g. your web browser looking at https://127.0.0.1:45869) may need to add an exception.
The Client API is still experimental and sometimes not user friendly. If you want to talk to your home computer across the internet, you will need some networking experience. You'll need a static IP or reverse proxy service or dynamic domain solution like no-ip.org so your device can locate it, and potentially port-forwarding on your router to expose the port. If you have a way of hosting a domain and have a signed certificate (e.g. from Let's Encrypt), you can overwrite the client.crt and client.key files in your 'db' directory and HTTPS hydrus should host with those.
Once the API is running, go to its entry in services->review services. Each external program trying to access the API will need its own access key, which is the familiar 64-character hexadecimal used in many places in hydrus. You can enter the details manually from the review services panel and then copy/paste the key to your external program, or the program may have the ability to request its own access while a mini-dialog launched from the review services panel waits to catch the request.
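If you want to sanity-check your setup from a script, something like this works, assuming the default port and a key you have copied from review services (a hedged sketch, not official example code):

```python
import requests

HYDRUS = 'http://127.0.0.1:45869'

# the welcome page--if this works, the API is on
print( requests.get( HYDRUS ).text )

# check your access key is recognised
r = requests.get( f'{HYDRUS}/verify_access_key', headers = {
    'Hydrus-Client-API-Access-Key' : 'your 64-character hex key here'
} )

print( r.json() )
```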
"},{"location":"client_api.html#tools_created_by_hydrus_users","title":"Tools created by hydrus users","text":""},{"location":"client_api.html#browser_add-on","title":"Browser Add-on","text":"- Hydrus Companion: a Chrome/Firefox extension for hydrus that allows easy download queueing as you browse and advanced login support
- Hydrus Web: a web client for hydrus (allows phone browsing of hydrus)
- Hyshare: a way to share small galleries with friends--a replacement for the old 'local booru' system
- Hydra Vista: a macOS client for hydrus
- LoliSnatcher: a booru client for Android that can talk to hydrus
- Anime Boxes: a booru browser, now supports adding your client as a Hydrus Server
- FlipFlip: an advanced slideshow interface, now supports hydrus as a source
- Hydrus Archive Delete: Archive/Delete filter in your web browser
- hydownloader: Hydrus-like download system based on gallery-dl
- hydrus-dd: DeepDanbooru tagging for Hydrus
- wd-e621-hydrus-tagger: More AI tagging, with more models
- Hydrus Video Deduplicator: Discovers duplicate videos in your client and queues them for the duplicate filter
- tagrank: Shows you comparison images and cleverly ranks your favourite tag.
- hyextract: Extract archives from Hydrus and reimport with tags and URL associations
- Send to Hydrus: send URLs from your Android device to your client
- Iwara-Hydrus: a userscript to simplify sending Iwara videos to Hydrus Network
- dolphin-hydrus-actions: Adds Hydrus right-click context menu actions to Dolphin file manager.
- more projects on github
I welcome all your bug reports, questions, ideas, and comments. It is always interesting to see how other people are using my software and what they generally think of it. Most of the changes every week are suggested by users.
You can contact me by email, twitter, discord, or the release threads on 8chan or Endchan--I do not mind which. Please know that I have difficulty with social media, and while I try to reply to all messages, it sometimes takes me a while to catch up.
If you need it, here's my public GPG key.
The Github Issue Tracker was turned off for some time, as it did not fit my workflow and I could not keep up, but it is now running again, managed by a team of volunteer users. Please feel free to submit feature requests there if you are comfortable with Github. I am not socially active on Github, please do not ping me there.
I am on the discord on Saturday afternoon, USA time, if you would like to talk live, and briefly on Wednesday after I put the release out. If that is not a good time for you, please leave me a DM and I will get to you when I can. There are also plenty of other hydrus users who idle who can help with support questions.
I delete all tweets and resolved email conversations after three months. So, if you think you are waiting for a reply, or I said I was going to work on something you care about and seem to have forgotten, please do nudge me.
I am always overwhelmed by work and behind on my messages. This is not to say that I do not enjoy just hanging out or talking about possible new features, but forgive me if some work takes longer than expected or if I cannot get to a particular idea quickly. In the same way, if you encounter actual traceback-raising errors or crashes, there is only one guy to fix it, so I prefer to know ASAP so I can prioritise.
I work by myself because I have acute difficulty working with others. Please do not spontaneously write long design documents or prepare other work for me--I find it more stressful than helpful, every time, and I won't give it the attention it deserves. If you would like to contribute time to hydrus, the user projects like the downloader repository and wiki help guides always have things to do.
That said:
- homepage
- github (latest build)
- issue tracker
- 8chan.moe /t/ (Hydrus Network General) (endchan bunker (.org))
- tumblr (rss)
- new downloads
- old downloads
- x
- discord
- patreon
- user-run repository and wiki (including download presets for several non-default boorus)
Warning
I am working on this system right now and will be moving the 'move files now' action to a more granular, always-on background migration. This document will update to reflect those changes!
"},{"location":"database_migration.html#database_migration","title":"database migration","text":""},{"location":"database_migration.html#intro","title":"the hydrus database","text":"A hydrus client consists of three components:
- the software installation
This is the part that comes with the installer or extract release, with the executable and dlls and a handful of resource folders. It doesn't store any of your settings--it just knows how to present a database as a nice application. If you just run the hydrus_client executable straight, it looks in its 'db' subdirectory for a database, and if one is not found, it creates a new one. If it sees a database running at a lower version than itself, it will update the database before booting it.
It doesn't really matter where you put this. An SSD will load it marginally quicker the first time, but you probably won't notice. If you run it without command-line parameters, it will try to write to its own directory (to create the initial database), so if you mean to run it like that, it should not be in a protected place like Program Files.
- the actual SQLite database
The client stores all its preferences and current state and knowledge about files--like file size and resolution, tags, ratings, inbox status, and so on and on--in a handful of SQLite database files, defaulting to install_dir/db. Depending on the size of your client, these might total 1MB in size or be as much as 10GB.
In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well.
- your media files
All of your jpegs and webms and so on (and their thumbnails) are stored in a single complicated directory that is by default at install_dir/db/client_files. All the files are named by their hash and stored in efficient hash-based subdirectories. In general, it is not navigable by humans, but it works very well for the fast access from a giant pool of files the client needs to do to manage your media.
Thumbnails tend to be fetched dozens at a time, so it is, again, ideal if they are stored on an SSD. Your regular media files--which on many clients total hundreds of GB--are usually fetched one at a time for human consumption and do not benefit from the expensive low-latency of an SSD. They are best stored on a cheap HDD, and, if desired, also work well across a network file system.
Although an initial install will keep these parts together, it is possible to, say, run the SQLite database on a fast drive but keep your media in cheap slow storage. This is an excellent arrangement that works for many users. And if you have a very large collection, you can even spread your files across multiple drives. It is not very technically difficult, but I do not recommend it for new users.
Backing such an arrangement up is obviously more complicated, and the internal client backup is not sophisticated enough to capture everything, so I recommend you figure out a broader solution with a third-party backup program like FreeFileSync.
"},{"location":"database_migration.html#pulling_media_apart","title":"pulling your media apart","text":"Danger
As always, I recommend creating a backup before you try any of this, just in case it goes wrong.
If you would like to move your files and thumbnails to new locations, I generally recommend you not move their folders around yourself--the database has an internal knowledge of where it thinks its file and thumbnail folders are, and if you move them while it is closed, it will become confused.
Missing Locations

If your folders are in the wrong locations on a client boot, a repair dialog appears, and you can manually update the client's internal understanding. This is not impossible to figure out, and in some tricky storage situations doing this on purpose can be faster than letting the client migrate things itself, but generally it is best and safest to do everything through the dialog.
Go to `database->move media files`, which gives you this dialog:
The buttons let you add more locations and remove old ones. The operations on this dialog are simple and atomic--at no point is your db ever invalid.
Beneath db? means that the path is beneath the main db dir and so is stored internally as a relative path. Portable paths will still function if the database changes location between boots (for instance, if you run the client from a USB drive and it mounts under a different location).
Weight means the relative amount of media you would like to store in that location. It only matters if you are spreading your files across multiple locations. If location A has a weight of 1 and B has a weight of 2, A will get approximately one third of your files and B will get approximately two thirds.
Max Size means the max total size of files the client will want to store in that location. Again, it only matters if you are spreading your files across multiple locations, but it is a simple way to ensure you don't go over a particular smaller hard drive's size. One location must always be limitless. This is not precise, so give it some padding. When one location is maxed out, the remaining locations will distribute the remainder of the files according to their respective weights. For the meantime, this will not update by itself. If you import many files, the location may go over its limit and you will have to revisit 'move media files' to rebalance your files again. Bear with me--I will fix this soon with the background migrate.
Let's set up an example move:
I made several changes:
- Added C:\hydrus_files to store files.
- Added D:\hydrus_files to store files, with a max size of 128MB.
- Set C:\hydrus_thumbs as the location to store thumbnails.
- Removed the original C:\Hydrus Network\db\client_files location.
While the ideal usage has changed significantly, note that the current usage remains the same. Nothing moves until you click 'move files now'. Moving files will take some time to finish. Once done, it looks like this:
The current and ideal usages line up, and the defunct C:\Hydrus Network\db\client_files location, which no longer stores anything, is removed from the list.

informing the software that the SQLite database is not in the default location

A straight call to the hydrus_client executable will look for a SQLite database in install_dir/db. If one is not found, it will create one. If you move your database and then try to run the client again, it will try to create a new empty database in that old location!
To tell it about the new database location, pass it a `-d` or `--db_dir` command line argument, like so:

hydrus_client -d="D:\media\my_hydrus_database"

--or--

hydrus_client --db_dir="G:\misc documents\New Folder (3)\DO NOT ENTER"

--or, from source--

python hydrus_client.py -d="D:\media\my_hydrus_database"

--or, for macOS--

open -n -a "Hydrus Network.app" --args -d="/path/to/db"
And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system.
Bad Locations
Do not run a SQLite database on a network location! The database relies on clever hardware-level exclusive file locks, which network interfaces often fake. While the program may work, I cannot guarantee the database will stay non-corrupt.
Do not run a SQLite database on a location with filesystem-level compression enabled! In the best case (BTRFS), the database can suddenly get extremely slow when it hits a certain size; in the worst (NTFS), a >50GB database will encounter I/O errors and receive sporadic corruption!
Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument in. Something like this:
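For instance, on Windows the shortcut's 'target' field might look something like this (the install and database paths here are just examples--use your own):

"C:\Hydrus Network\hydrus_client.exe" -d="D:\hydrus\db"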
Note that an install with an 'external' database no longer needs access to write to its own path, so you can store it anywhere you like, including protected read-only locations (e.g. in 'Program Files'). Just double-check your shortcuts are good.
"},{"location":"database_migration.html#finally","title":"backups","text":"If your database now lives in one or more new locations, make sure to update your backup routine to follow them!
"},{"location":"database_migration.html#to_an_ssd","title":"moving to an SSD","text":"As an example, let's say you started using the hydrus client on your HDD, and now you have an SSD available and would like to move your thumbnails and main install to that SSD to speed up the client. Your database will be valid and functional at every stage of this, and it can all be undone. The basic steps are:
- Move your 'fast' files to the fast location.
- Move your 'slow' files out of the main install directory.
- Move the install and db itself to the fast location and update shortcuts.
Specifically:
- Update your backup if you maintain one.
- Create an empty folder on your HDD that is outside of your current install folder. Call it 'hydrus_files' or similar.
- Create two empty folders on your SSD with names like 'hydrus_db' and 'hydrus_thumbnails'.
- Set the 'thumbnail location override' to 'hydrus_thumbnails'. You should get that new location in the list, currently empty but prepared to take all your thumbs.
- Hit 'move files now' to actually move the thumbnails. Since this involves moving a lot of individual files from a high-latency source, it will take a long time to finish. The hydrus client may hang periodically as it works, but you can just leave it to work on its own--it will get there in the end. You can also watch it do its disk work under Task Manager.
- Now hit 'add location' and select your new 'hydrus_files'. 'hydrus_files' should appear and be willing to take 50% of the files.
- Select the old location (probably 'install_dir/db/client_files') and hit 'remove location' or 'decrease weight' until it has weight 0 and you are prompted to remove it completely. 'hydrus_files' should now be willing to take all 100% of the files from the old location.
- Hit 'move files now' again to make this happen. This should be fast since it is just moving a bunch of folders across the same partition.
- With everything now 'non-portable' and hence decoupled from the db, you can now easily migrate the install and db to 'hydrus_db' simply by shutting the client down and moving the install folder in a file explorer.
- Update your shortcut to the new hydrus_client.exe location and try to boot.
- Update your backup scheme to match your new locations.
- Enjoy a much faster client.
You should now have something like this (let's say the D drive is the fast SSD, and E is the high capacity HDD):
"},{"location":"database_migration.html#multiple_clients","title":"p.s. running multiple clients","text":"Since you now know how to tell the software about an external database, you can, if you like, run multiple clients from the same install (and if you previously had multiple install folders, now you can now just use the one). Just make multiple shortcuts to the same hydrus_client executable but with different database directories. They can run at the same time. You'll save yourself a little memory and update-hassle.
"},{"location":"developer_api.html","title":"API documentation","text":""},{"location":"developer_api.html#library_modules_created_by_hydrus_users","title":"Library modules created by hydrus users","text":"- Hydrus API: A python module that talks to the API.
- hydrus.js: A node.js module that talks to the API.
- more projects on github
In general, the API deals with standard UTF-8 JSON. POST requests and 200 OK responses are generally going to be a JSON 'Object' with variable names as keys and values obviously as values. There are examples throughout this document. For GET requests, everything is in standard GET parameters, but some variables are complicated and will need to be JSON-encoded and then URL-encoded. An example would be the 'tags' parameter on GET /get_files/search_files, which is a list of strings. Since GET http URLs have limits on what characters are allowed, but hydrus tags can have all sorts of characters, you'll be doing this:
- Your list of tags:

[ 'character:samus aran', 'creator:青い桜', 'system:height > 2000' ]

- JSON-encoded:

["character:samus aran", "creator:\u9752\u3044\u685c", "system:height > 2000"]

- Then URL-encoded:

%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D

- In python, converting your tag list to the URL-encoded string would be:

urllib.parse.quote( json.dumps( tag_list ), safe = '' )

- Full URL path example:

/get_files/search_files?file_sort_type=6&file_sort_asc=false&tags=%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D
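As a rough sketch of the whole round trip in python (the access key here is made up, I assume the third-party requests module, and the default Client API port of 45869):

```python
import json
import urllib.parse

import requests

# hypothetical access key--use your own from services->review services
access_key = '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae'

tag_list = [ 'character:samus aran', 'system:height > 2000' ]

# JSON-encode, then URL-encode, exactly as described above
encoded_tags = urllib.parse.quote( json.dumps( tag_list ), safe = '' )

url = 'http://127.0.0.1:45869/get_files/search_files?tags=' + encoded_tags

response = requests.get( url, headers = { 'Hydrus-Client-API-Access-Key' : access_key } )

print( response.json() )
```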
The API returns JSON for everything except actual file/thumbnail requests. Every JSON response includes the `version` of the Client API and the `hydrus_version` of the client hosting it (for brevity, these values are not included in the example responses in this help). For errors, you'll typically get 400 for a missing/invalid parameter, 401/403/419 for missing/insufficient/expired access, and 500 for a real deal serverside error.

Note
For any request sent to the API, the total size of the initial request line (this includes the URL and any parameters) and the headers must not be larger than 2 megabytes. Exceeding this limit will cause the request to fail. Make sure to use pagination if you are passing very large JSON arrays as parameters in a GET request.
"},{"location":"developer_api.html#cbor","title":"CBOR","text":"The API now tentatively supports CBOR, which is basically 'byte JSON'. If you are in a lower level language or need to do a lot of heavy work quickly, try it out!
To send CBOR, for POST put Content-Type `application/cbor` in your request header instead of `application/json`, and for GET just add a `cbor=1` parameter to the URL string. Use CBOR to encode any parameters that you would previously put in JSON:

For POST requests, just print the pure bytes in the body, like this:

cbor2.dumps( arg_dict )
For GET, encode the parameter value in base64, like this:

base64.urlsafe_b64encode( cbor2.dumps( argument ) )

-or-

str( base64.urlsafe_b64encode( cbor2.dumps( argument ) ), 'ascii' )
If you send CBOR, the client will return CBOR. If you want to send CBOR and get JSON back, or vice versa (or you are uploading a file and can't set CBOR Content-Type), send the Accept request header, like so:
Accept: application/cbor
Accept: application/json
If the client does not support CBOR, you'll get 406.
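Here is a minimal python sketch of a CBOR GET, assuming the third-party cbor2 and requests modules and a made-up access key:

```python
import base64

import cbor2
import requests

access_key = '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae'

tag_list = [ 'character:samus aran' ]

# CBOR-encode the parameter, then base64 it for the GET query string
encoded_tags = str( base64.urlsafe_b64encode( cbor2.dumps( tag_list ) ), 'ascii' )

response = requests.get(
    'http://127.0.0.1:45869/get_files/search_files',
    params = { 'cbor' : 1, 'tags' : encoded_tags },
    headers = { 'Hydrus-Client-API-Access-Key' : access_key }
)

# since we sent CBOR, the client replies in CBOR
result = cbor2.loads( response.content )

print( result )
```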
"},{"location":"developer_api.html#access_and_permissions","title":"Access and permissions","text":"The client gives access to its API through different 'access keys', which are the typical random 64-character hex used in many other places across hydrus. Each guarantees different permissions such as handling files or tags. Most of the time, a user will provide full access, but do not assume this. If a access key header or parameter is not provided, you will get 401, and all insufficient permission problems will return 403 with appropriate error text.
Access is required for every request. You can provide this as an http header, like so:
Hydrus-Client-API-Access-Key : 0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae
Or you can include it in the normal parameters of any request (except POST /add_files/add_file, which uses the entire POST body for the file's bytes).
For GET, this means including it into the URL parameters:
/get_files/thumbnail?file_id=452158&Hydrus-Client-API-Access-Key=0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae
For POST, this means in the JSON body parameters, like so:
{\n \"hash_id\" : 123456,\n \"Hydrus-Client-API-Access-Key\" : \"0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\"\n}\n
There is also a simple 'session' system, where you can get a temporary key that gives the same access without having to include the permanent access key in every request. You can fetch a session key with the /session_key command and thereafter use it just as you would an access key, just with Hydrus-Client-API-Session-Key instead.
Session keys will expire if they are not used within 24 hours, or if the client is restarted, or if the underlying access key is deleted. An invalid/expired session key will give a 419 result with an appropriate error text.
Bear in mind the Client API is still under construction. Setting up the Client API to be accessible across the internet requires technical experience to be convenient. HTTPS is available for encrypted comms, but the default certificate is self-signed (which basically means an eavesdropper can't see through it, but your ISP/government could if they decided to target you). If you have your own domain to host from and an SSL cert, you can replace them and it'll use them instead (check the db directory for client.crt and client.key). Otherwise, be careful about transmitting sensitive content outside of your localhost/network.
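A small sketch of the session workflow in python (the access key is made up, and I assume the requests module):

```python
import requests

API = 'http://127.0.0.1:45869'
access_key = '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae'

# trade the permanent access key for a temporary session key
response = requests.get( API + '/session_key', headers = { 'Hydrus-Client-API-Access-Key' : access_key } )

session_key = response.json()[ 'session_key' ]

# use the session key for subsequent requests instead of the access key
headers = { 'Hydrus-Client-API-Session-Key' : session_key }

print( requests.get( API + '/api_version', headers = headers ).json() )
```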
"},{"location":"developer_api.html#common_complex_parameters","title":"Common Complex Parameters","text":""},{"location":"developer_api.html#parameters_files","title":"files","text":"If you need to refer to some files, you can use any of the following:
Arguments:file_id
: (selective, a numerical file id)file_ids
: (selective, a list of numerical file ids)hash
: (selective, a hexadecimal SHA256 hash)hashes
: (selective, a list of hexadecimal SHA256 hashes)
In GET requests, make sure any list is percent-encoded JSON. Your [1,2,3] becomes urllib.parse.quote( json.dumps( [1,2,3] ), safe = '' ), and thus file_ids=%5B1%2C%202%2C%203%5D.

file domain

When you are searching, you may want to specify a particular file domain. Most of the time, you'll want to just set `file_service_key`, but this can get complex:

Arguments:

- file_service_key: (optional, selective A, hexadecimal, the file domain on which to search)
- file_service_keys: (optional, selective A, list of hexadecimals, the union of file domains on which to search)
- deleted_file_service_key: (optional, selective B, hexadecimal, the 'deleted from this file domain' on which to search)
- deleted_file_service_keys: (optional, selective B, list of hexadecimals, the union of 'deleted from this file domain' on which to search)
The service keys are as in /get_services.
Hydrus supports two concepts here:
- Searching over a UNION of subdomains. If the user has several local file domains, e.g. 'favourites', 'personal', 'sfw', and 'nsfw', they might like to search two of them at once.
- Searching deleted files of subdomains. You can specifically, and quickly, search the files that have been deleted from somewhere.
You can play around with this yourself by clicking 'multiple locations' in the client with help->advanced mode on.
In extreme edge cases, these two can be mixed by populating both A and B selective, making a larger union of both current and deleted file records.
Please note that unions can be very, very computationally expensive. If you can achieve what you want with a single file_service_key, two queries in a row with different service keys, or an umbrella like 'all my files' or 'all local files', please do. Otherwise, let me know what is running slow and I'll have a look at it.

'deleted from all local files' includes all files that have been physically deleted (i.e. deleted from the trash) and are no longer available for file/thumbnail fetch requests. 'deleted from all my files' includes all of those physically deleted files and the trash. If a file is deleted with the special 'do not leave a deletion record' command, then it won't show up in a 'deleted from file domain' search!
'all known files' is a tricky domain. It converts much of the search tech to ignore where files actually are and look at the accompanying tag domain (e.g. all the files that have been tagged), and can sometimes be very expensive.
Also, if you have the option to set both file and tag domains, you cannot enter 'all known files'/'all known tags'. It is too complicated to support, sorry!
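To make the encoding concrete, here is a small python sketch that builds both a files parameter and a file domain union parameter (the service keys below are made up--get real ones from /get_services):

```python
import json
import urllib.parse

def encode_param( value ):

    # percent-encoded JSON, as described in the 'files' section above
    return urllib.parse.quote( json.dumps( value ), safe = '' )

# refer to files by id
print( 'file_ids=' + encode_param( [ 1, 2, 3 ] ) )

# search the union of two file domains
print( 'file_service_keys=' + encode_param( [
    '6c6f63616c2066696c6573',
    'ae7d9a603008919612894fc360130ae3d9925b8577d075cd0473090ac38b12b6'
] ) )
```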
"},{"location":"developer_api.html#legacy_service_name_parameters","title":"legacy service_name parameters","text":"The Client API used to respond to name-based service identifiers, for instance using 'my tags' instead of something like '6c6f63616c2074616773'. Service names can change, and they aren't strictly unique either, so I have moved away from them, but there is some soft legacy support.
The client will attempt to convert any of these to their 'service_key(s)' equivalents:
- file_service_name
- tag_service_name
- service_names_to_tags
- service_names_to_actions_to_tags
- service_names_to_additional_tags
But I strongly encourage you to move away from them as soon as reasonably possible. Look up the service keys you need with /get_service or /get_services.
If you have a clever script/program that does many things, then hit up /get_services on session initialisation and cache an internal map of key_to_name for the labels to use when you present services to the user.
Also, note that all users can now copy their service keys from review services.
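As a sketch of that key_to_name idea in python (the access key is made up, and I assume the requests module):

```python
import requests

API = 'http://127.0.0.1:45869'
headers = { 'Hydrus-Client-API-Access-Key' : '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae' }

services = requests.get( API + '/get_services', headers = headers ).json()[ 'services' ]

# cache service_key -> human-friendly name for the labels you show the user
key_to_name = { service_key : info[ 'name' ] for ( service_key, info ) in services.items() }

print( key_to_name )
```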
"},{"location":"developer_api.html#services_object","title":"The Services Object","text":"Hydrus manages its different available domains and actions with what it calls services. If you are a regular user of the program, you will know about review services and manage services. The Client API needs to refer to services, either to accept commands from you or to tell you what metadata files have and where.
When it does this, it gives you this structure, typically under a
Services Objectservices
key right off the root node:{\n \"c6f63616c2074616773\" : {\n \"name\" : \"my tags\",\n \"type\": 5,\n \"type_pretty\" : \"local tag service\"\n },\n \"5674450950748cfb28778b511024cfbf0f9f67355cf833de632244078b5a6f8d\" : {\n \"name\" : \"example tag repo\",\n \"type\" : 0,\n \"type_pretty\" : \"hydrus tag repository\"\n },\n \"6c6f63616c2066696c6573\" : {\n \"name\" : \"my files\",\n \"type\" : 2,\n \"type_pretty\" : \"local file domain\"\n },\n \"7265706f7369746f72792075706461746573\" : {\n \"name\" : \"repository updates\",\n \"type\" : 20,\n \"type_pretty\" : \"local update file domain\"\n },\n \"ae7d9a603008919612894fc360130ae3d9925b8577d075cd0473090ac38b12b6\" : {\n \"name\": \"example file repo\",\n \"type\" : 1,\n \"type_pretty\" : \"hydrus file repository\"\n },\n \"616c6c206c6f63616c2066696c6573\" : {\n \"name\" : \"all local files\",\n \"type\": 15,\n \"type_pretty\" : \"virtual combined local file service\"\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"name\" : \"all my files\",\n \"type\" : 21,\n \"type_pretty\" : \"virtual combined local media service\"\n },\n \"616c6c206b6e6f776e2066696c6573\" : {\n \"name\" : \"all known files\",\n \"type\" : 11,\n \"type_pretty\" : \"virtual combined file service\"\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"name\" : \"all known tags\",\n \"type\": 10,\n \"type_pretty\" : \"virtual combined tag service\"\n },\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : {\n \"name\" : \"example local rating like service\",\n \"type\" : 7,\n \"type_pretty\" : \"local like/dislike rating service\",\n \"star_shape\" : \"circle\"\n },\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : {\n \"name\" : \"example local rating numerical service\",\n \"type\" : 6,\n \"type_pretty\" : \"local numerical rating service\",\n \"star_shape\" : \"fat star\",\n \"min_stars\" : 1,\n \"max_stars\" : 5\n },\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : {\n \"name\" : \"example local rating inc/dec service\",\n \"type\" : 22,\n \"type_pretty\" : \"local inc/dec rating service\"\n },\n \"7472617368\" : {\n \"name\" : \"trash\",\n \"type\" : 14,\n \"type_pretty\" : \"local trash file domain\"\n }\n}\n
I hope you recognise some of the information here. But what's that hex key on each section? It is the
service_key
.All services have these properties:
name
- A mutable human-friendly name like 'my tags'. You can use this to present the service to the user--they should recognise it.type
- An integer enum saying whether the service is a local tag service or like/dislike rating service or whatever. This cannot change.service_key
- The true 'id' of the service. It is a string of hex, sometimes just twenty or so characters but in many cases 64 characters. This cannot change, and it is how we will refer to different services.
This
service_key
is important. A user can rename their services, soname
is not an excellent identifier, and definitely not something you should save to any permanent config file.If we want to search some files on a particular file and tag domain, we should expect to be saying something like
file_service_key=6c6f63616c2066696c6573
andtag_service_key=f032e94a38bb9867521a05dc7b189941a9c65c25048911f936fc639be2064a4b
somewhere in the request.You won't see all of these, but the service
type
enum is:- 0 - tag repository
- 1 - file repository
- 2 - a local file domain like 'my files'
- 5 - a local tag domain like 'my tags'
- 6 - a 'numerical' rating service with several stars
- 7 - a 'like/dislike' rating service with on/off status
- 10 - all known tags -- a union of all the tag services
- 11 - all known files -- a union of all the file services and files that appear in tag services
- 12 - the local booru -- you can ignore this
- 13 - IPFS
- 14 - trash
- 15 - all local files -- all files on hard disk ('all my files' + updates + trash)
- 17 - file notes
- 18 - Client API
- 19 - deleted from anywhere -- you can ignore this
- 20 - local updates -- a file domain to store repository update files in
- 21 - all my files -- union of all local file domains
- 22 - a 'inc/dec' rating service with positive integer rating
- 99 - server administration
type_pretty
is something you can show users. Hydrus uses the same labels in manage services and so on.Rating services now have some extra data:
- like/dislike and numerical services have
star_shape
, which is one ofcircle | square | fat star | pentagram star
- numerical services have
min_stars
(0 or 1) andmax_stars
(1 to 20)
If you are displaying ratings, don't feel crazy obligated to obey the shape! Show a \u2158, select from a dropdown list, do whatever you like!
If you want to know the services in a client, hit up /get_services, which simply gives the above. The same structure has recently been added to /get_files/file_metadata for convenience, since that refers to many different services when it is talking about file locations and ratings and so on.
Note: If you need to do some quick testing, you should be able to copy the
"},{"location":"developer_api.html#CDPP","title":"Current Deleted Pending Petitioned","text":"service_key
of any service by hitting the 'copy service key' button in review services.The content storage and update pipeline systems in hydrus consider content (e.g. 'on file service x, file y exists', 'on tag service x, file y has tag z', or 'on rating service x, file y has rating z') as existing within a blend of four states:
- Current - The content exists on the service.
- Deleted - The content has been removed from the service.
- Pending - The content is queued to be added to the service.
- Petitioned - The content is queued to be removed from the service.
Where content that has never touched the service has a default 'null status' of no state at all.
Content may be in two categories at once--for instance, any Petitioned data is always also Current--but some states are mutually exclusive: Current data cannot also be Deleted.
Let's examine this more carefully. Current, Deleted, and Pending may exist on their own, and Deleted and Pending may exist simultaneously. Read the following table horizontally, then vertically, such that 'Content that is Current may also be Petitioned' while 'Content that is Petitioned must also be Current':
|            | Current | Deleted | Pending | Petitioned |
|------------|---------|---------|---------|------------|
| Current    | -       | Never   | Never   | May        |
| Deleted    | Never   | -       | May     | Never      |
| Pending    | Never   | May     | -       | Never      |
| Petitioned | Must    | Never   | Never   | -          |

Local services have no concept of pending or petitioned, so they generally just have 'add x'/'delete x' verbs to convert content between current and deleted. Remote services like the PTR have a queue of pending changes waiting to be committed by the user to the server, so in these cases I will expose to you the full suite of 'pend x'/'petition x'/'rescind pend x'/'rescind petition x'. Although you might somewhere be able to 'pend'/'petition' content to a local service, these 'pending' changes will be committed instantly so they are synonymous with add/delete.
- When an 'add' is committed, the data is removed from the deleted record and added to the current record.
- When a 'delete' is committed, the data is removed from the current record and added to the deleted record.
- When a 'pend' is committed, the data is removed from the deleted record and added to the current record. (It is also deleted from the pending record!)
- When a 'petition' is committed, the data is removed from the current record and added to the deleted record. (It is also deleted from the petitioned record!)
- When a 'rescind pend' is committed, the data is removed from the pending record.
- When a 'rescind petition' is committed, the data is removed from the petitioned record.
Let's look at where the verbs make sense. Again, horizontal, so 'Content that is Current can receive a Petition command':
|            | Add/Pend                         | Delete/Petition                  | Rescind Pend | Rescind Petition |
|------------|----------------------------------|----------------------------------|--------------|------------------|
| No state   | May                              | May                              | -            | -                |
| Current    | -                                | May                              | -            | -                |
| Deleted    | May                              | -                                | -            | -                |
| Pending    | May overwrite an existing reason | -                                | May          | -                |
| Petitioned | -                                | May overwrite an existing reason | -            | May              |

In hydrus, anything in the content update pipeline that doesn't make sense, here a '-', tends to result in an errorless no-op, so you might not care to do too much filtering on your end of things if you don't need to--don't worry about deleting something twice.
Note that content that does not yet exist can be pre-emptively petitioned/deleted. A couple of advanced cases enjoy this capability, for instance when you are syncing delete records from one client to another.
Also, it is often the case that content that is recorded as deleted is more difficult to re-add/re-pend. You might need to be a janitor to re-pend something, or, for this API, set some
"},{"location":"developer_api.html#access_management","title":"Access Management","text":""},{"location":"developer_api.html#api_version","title":"GEToverride_previously_deleted_mappings
parameter. This is by design and helps you to stop automatically re-adding something that the user spent slow human time deciding to delete./api_version
","text":"Gets the current API version. This increments every time I alter the API.
Restricted access: NO.
Required Headers: n/a
Arguments: n/a
Response: Some simple JSON describing the current api version (and hydrus client version, if you are interested). Note that this is not very useful any more, for two reasons:- The 'Server' header of every response (and a duplicated 'Hydrus-Server' one, if you have a complicated proxy situation that overwrites 'Server') are now in the form \"client api/{client_api_version} ({software_version})\", e.g. \"client api/32 (497)\".
- Every JSON response explicitly includes this now.
"},{"location":"developer_api.html#request_new_permissions","title":"GET{\n \"version\" : 17,\n \"hydrus_version\" : 441\n}\n
/request_new_permissions
","text":"Register a new external program with the client. This requires the 'add from api request' mini-dialog under services->review services to be open, otherwise it will 403.
Restricted access: NO.
Required Headers: n/a
Arguments:name
: (descriptive name of your access)permits_everything
: (selective, bool, whether to permit all tasks now and in future)-
basic_permissions
: Selective. A JSON-encoded list of numerical permission identifiers you want to request.The permissions are currently:
- 0 - Import and Edit URLs
- 1 - Import and Delete Files
- 2 - Edit File Tags
- 3 - Search for and Fetch Files
- 4 - Manage Pages
- 5 - Manage Cookies and Headers
- 6 - Manage Database
- 7 - Edit File Notes
- 8 - Edit File Relationships
- 9 - Edit File Ratings
- 10 - Manage Popups
- 11 - Edit File Times
- 12 - Commit Pending
- 13 - See Local Paths
Example request (for permissions [0,1])/request_new_permissions?name=migrator&permit_everything=true\n
Response: Some JSON with your access key, which is 64 characters of hex. This will not be valid until the user approves the request in the client ui. Example response/request_new_permissions?name=my%20import%20script&basic_permissions=%5B0%2C1%5D\n
{\n \"access_key\" : \"73c9ab12751dcf3368f028d3abbe1d8e2a3a48d0de25e64f3a8f00f3a1424c57\"\n}\n
The
"},{"location":"developer_api.html#session_key","title":"GETpermits_everything
overrules all the individual permissions and will encompass any new permissions added in future. It is a convenient catch-all for local-only services where you are running things yourself or the user otherwise implicitly trusts you./session_key
","text":"Get a new session key.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: Some JSON with a new session key in hex. Example response{\n \"session_key\" : \"f6e651e7467255ade6f7c66050f3d595ff06d6f3d3693a3a6fb1a9c2b278f800\"\n}\n
Note
Note that the access you provide to get a new session key can be a session key, if that happens to be useful. As long as you have some kind of access, you can generate a new session key.
A session key expires after 24 hours of inactivity, whenever the client restarts, or if the underlying access key is deleted. A request on an expired session key returns 419.
"},{"location":"developer_api.html#verify_access_key","title":"GET/verify_access_key
","text":"Check your access key is valid.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: 401/403/419 and some error text if the provided access/session key is invalid, otherwise some JSON with basic permission info. Example response
"},{"location":"developer_api.html#get_service","title":"GET{\n \"name\" : \"autotagger\",\n \"permits_everything\" : false,\n \"basic_permissions\" : [0, 1, 3],\n \"human_description\" : \"API Permissions (autotagger): add tags to files, import files, search for files: Can search: only autotag this\"\n}\n
/get_service
","text":"Ask the client about a specific service.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.Required Headers: n/a
Arguments:service_name
: (selective, string, the name of the service)service_key
: (selective, hex string, the service key of the service)
Response: Some JSON about the service. A similar format as /get_services and The Services Object. Example response/get_service?service_name=my%20tags\n/get_service?service_key=6c6f63616c2074616773\n
{\n \"service\" : {\n \"name\" : \"my tags\",\n \"service_key\" : \"6c6f63616c2074616773\",\n \"type\" : 5,\n \"type_pretty\" : \"local tag service\"\n }\n}\n
If the service does not exist, this gives 404. It is very unlikely but edge-case possible that two services will have the same name; in this case you'll get the pseudorandom first.
It will only respond to services in the /get_services list. I will expand the available types in future as we add ratings etc... to the Client API.
"},{"location":"developer_api.html#get_services","title":"GET/get_services
","text":"Ask the client about its services.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.Required Headers: n/a
Arguments: n/a
Response: Some JSON listing the client's services. Example response{\n \"services\" : \"The Services Object\"\n}\n
This now primarily uses The Services Object.
Note
If you do the request and look at the actual response, you will see a lot more data under different keys--this is deprecated, and will be deleted in 2024. If you use the old structure, please move over!
"},{"location":"developer_api.html#importing_and_deleting_files","title":"Importing and Deleting Files","text":""},{"location":"developer_api.html#add_files_add_file","title":"POST/add_files/add_file
","text":"Tell the client to import a file.
Restricted access: YES. Import Files permission needed. Required Headers:- Content-Type:
application/json
(if sending path),application/octet-stream
(if sending file)
path
: (the path you want to import)delete_after_successful_import
: (optional, defaults tofalse
, sets to delete the source file on a 'successful' or 'already in db' result)- file domain (optional, local file domain(s) only, defaults to your \"quiet\" file import options's destination)
{\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
If you include a file domain, it can only include 'local' file domains (by default on a new client this would just be \"my files\"), but you can send multiple to import to more than one location at once. Asking to import to 'all local files', 'all my files', 'trash', 'repository updates', or a file repository/ipfs will give you 400.
Arguments (as bytes): You can alternately just send the file's raw bytes as the entire POST body. In this case, you cannot send any other parameters, so you will be left with the default import file domain. Response:Some JSON with the import result. Please note that file imports for large files may take several seconds, and longer if the client is busy doing other db work, so make sure your request is willing to wait that long for the response. Example response
{\n \"status\" : 1,\n \"hash\" : \"29a15ad0c035c0a0e86e2591660207db64b10777ced76565a695102a481c3dd1\",\n \"note\" : \"\"\n}\n
status
is:- 1 - File was successfully imported
- 2 - File already in database
- 3 - File previously deleted
- 4 - File failed to import
- 7 - File vetoed
A file 'veto' is caused by the file import options (which in this case is the 'quiet' set under the client's options->importing) stopping the file due to its resolution or minimum file size rules, etc...
'hash' is the file's SHA256 hash in hexadecimal, and 'note' is any additional human-readable text appropriate to the file status that you may recognise from hydrus's normal import workflow. For an outright import error, it will be a summary of the exception that you can present to the user, and a new field
"},{"location":"developer_api.html#add_files_delete_files","title":"POSTtraceback
will have the full trace for debugging purposes./add_files/delete_files
","text":"Tell the client to send files to the trash.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
:application/json
- files
- file domain (optional, defaults to all my files)
reason
: (optional, string, the reason attached to the delete action)
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
If you specify a file service, the file will only be deleted from that location. Only local file domains are allowed (so you can't delete from a file repository or unpin from ipfs yet). It defaults to all my files, which will delete from all local services (i.e. force sending to trash). Sending 'all local files' on a file already in the trash will trigger a physical file delete.
"},{"location":"developer_api.html#add_files_undelete_files","title":"POST/add_files/undelete_files
","text":"Tell the client to restore files that were previously deleted to their old file service(s).
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
- file domain (optional, defaults to all my files)
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This is the reverse of a delete_files--restoring files back to where they came from. If you specify a file service, the files will only be undeleted to there (if they have a delete record, otherwise this is nullipotent). The default, 'all my files', undeletes to all local file services for which there are deletion records.
This operation will only occur on files that are currently in your file store (i.e. in 'all local files', and maybe, but not necessarily, in 'trash'). You cannot 'undelete' something you do not have!
"},{"location":"developer_api.html#add_files_clear_file_deletion_record","title":"POST/add_files/clear_file_deletion_record
","text":"Tell the client to forget that it once deleted files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This is the same as the advanced deletion option of the same basic name. It will erase the record that a file has been physically deleted (i.e. it only applies to deletion records in the 'all local files' domain). A file that no longer has a 'all local files' deletion record will pass a 'exclude previously deleted files' check in a file import options.
"},{"location":"developer_api.html#add_files_migrate_files","title":"POST/add_files/migrate_files
","text":"Copy files from one local file domain to another.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
:application/json
- files
- file domain
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\",\n \"file_service_key\" : \"572ff2bd34857c0b3210b967a5a40cb338ca4c5747f2218d4041ddf8b6d077f1\"\n}\n
This is only appropriate if the user has multiple local file services. It does the same as the media files->add to->domain menu action. If the files are originally in local file domain A, and you say to add to B, then afterwards they will be in both A and B. You can say 'B and C' to add to multiple domains at once, if needed. The action is idempotent and will not overwrite 'already in' files with fresh timestamps or anything.
If you need to do a 'move' migrate, then please follow this command with a delete from wherever you need to remove from.
If you try to add non-local files (specifically, files that are not in 'all my files'), or migrate to a file domain that is not a local file domain, then this will 400!
"},{"location":"developer_api.html#add_files_archive_files","title":"POST/add_files/archive_files
","text":"Tell the client to archive inboxed files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This puts files in the 'archive', taking them out of the inbox. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the archive.
"},{"location":"developer_api.html#add_files_unarchive_files","title":"POST/add_files/unarchive_files
","text":"Tell the client re-inbox archived files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 and no content.{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
This puts files back in the inbox, taking them out of the archive. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the inbox.
"},{"location":"developer_api.html#add_files_generate_hashes","title":"POST/add_files/generate_hashes
","text":"Generate hashes for an arbitrary file.
Restricted access: YES. Import Files permission needed. Required Headers:- Content-Type:
application/json
(if sending path),application/octet-stream
(if sending file)
path
: (the path you want to import)
Arguments (as bytes): You can alternately just send the file's bytes as the POST body. Response:{\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
Some JSON with the hashes of the file Example response
{\n \"hash\": \"7de421a3f9be871a7037cca8286b149a31aecb6719268a94188d76c389fa140c\",\n \"perceptual_hashes\": [\n \"b44dc7b24dcb381c\"\n ],\n \"pixel_hash\": \"c7bf20e5c4b8a524c2c3e3af2737e26975d09cba2b3b8b76341c4c69b196da4e\",\n}\n
hash
is the sha256 hash of the submitted file.perceptual_hashes
is a list of perceptual hashes for the file.pixel_hash
is the sha256 hash of the pixel data of the rendered image.
"},{"location":"developer_api.html#importing_and_editing_urls","title":"Importing and Editing URLs","text":""},{"location":"developer_api.html#add_urls_get_url_files","title":"GEThash
will always be returned for any file, the others will only be returned for filetypes they can be generated for./add_urls/get_url_files
","text":"Ask the client about an URL's files.
Restricted access: YES. Import URLs permission needed.Required Headers: n/a
Arguments:url
: (the url you want to ask about)doublecheck_file_system
: true or false (optional, defaults False)
http://safebooru.org/index.php?page=post&s=view&id=2753608
:
Response: Some JSON which files are known to be mapped to that URL. Note this needs a database hit, so it may be delayed if the client is otherwise busy. Don't rely on this to always be fast. Example response/add_urls/get_url_files?url=http%3A%2F%2Fsafebooru.org%2Findex.php%3Fpage%3Dpost%26s%3Dview%26id%3D2753608\n
{\n \"normalised_url\" : \"https://safebooru.org/index.php?id=2753608&page=post&s=view\",\n \"url_file_statuses\" : [\n {\n \"status\" : 2,\n \"hash\" : \"20e9002824e5e7ffc240b91b6e4a6af552b3143993c1778fd523c30d9fdde02c\",\n \"note\" : \"url recognised: Imported at 2015/10/18 10:58:01, which was 3 years 4 months ago (before this check).\"\n }\n ]\n}\n
The
url_file_statuses
is a list of zero-to-n JSON Objects, each representing a file match the client found in its database for the URL. Typically, it will be of length 0 (for as-yet-unvisited URLs or Gallery/Watchable URLs that are not attached to files) or 1, but sometimes multiple files are given the same URL (sometimes by mistaken misattribution, sometimes by design, such as pixiv manga pages). Handling n files per URL is a pain but an unavoidable issue you should account for.status
has the same mapping as for /add_files/add_file
, but the possible results are different:- 0 - File not in database, ready for import (you will only see this very rarely--usually in this case you will just get no matches)
- 2 - File already in database
- 3 - File previously deleted
hash
is the file's SHA256 hash in hexadecimal, and 'note' is some occasional additional human-readable text you may recognise from hydrus's normal import workflow.If you set
"},{"location":"developer_api.html#add_urls_get_url_info","title":"GETdoublecheck_file_system
totrue
, then any result that is 'already in db' (2) will be double-checked against the actual file system. This check happens on any normal file import process, just to check for and fix missing files (if the file is missing, the status becomes 0--new), but the check can take more than a few milliseconds on an HDD or a network drive, so the default behaviour, assuming you mostly just want to spam for 'seen this before' file statuses, is to not do it./add_urls/get_url_info
","text":"Ask the client for information about a URL.
Restricted access: YES. Import URLs permission needed.Required Headers: n/a
Arguments:url
: (the url you want to ask about)
https://boards.4chan.org/tv/thread/197641945/itt-moments-in-film-or-tv-that-aged-poorly
:
Response:/add_urls/get_url_info?url=https%3A%2F%2Fboards.4chan.org%2Ftv%2Fthread%2F197641945%2Fitt-moments-in-film-or-tv-that-aged-poorly\n
Some JSON describing what the client thinks of the URL. Example response
{\n \"request_url\" : \"https://a.4cdn.org/tv/thread/197641945.json\",\n \"normalised_url\" : \"https://boards.4chan.org/tv/thread/197641945\",\n \"url_type\" : 4,\n \"url_type_string\" : \"watchable url\",\n \"match_name\" : \"8chan thread\",\n \"can_parse\" : true\n}\n
The url types are currently:
- 0 - Post URL
- 2 - File URL
- 3 - Gallery URL
- 4 - Watchable URL
- 5 - Unknown URL (i.e. no matching URL Class)
'Unknown' URLs are treated in the client as direct File URLs. Even though the 'File URL' type is available, most file urls do not have a URL Class, so they will appear as Unknown. Adding them to the client will pass them to the URL Downloader as a raw file for download and import.
The
normalised_url
is the fully normalised URL--what is used for comparison and saving to disk.The
"},{"location":"developer_api.html#add_urls_add_url","title":"POSTrequest_url
is either the lighter 'for server' normalised URL, which may include ephemeral token parameters, or, as in the case here, the fully converted API/redirect URL. (When hydrus is asked to check a 4chan thread, it doesn't hit the HTML, but the JSON API.)/add_urls/add_url
","text":"Tell the client to 'import' a URL. This triggers the exact same routine as drag-and-dropping a text URL onto the main client window.
Restricted access: YES. Import URLs permission needed. Add Tags needed to include tags. Required Headers:Content-Type
:application/json
url
: (the url you want to add)destination_page_key
: (optional page identifier for the page to receive the url)destination_page_name
: (optional page name to receive the url)- file domain (optional, sets where to import the file)
show_destination_page
: (optional, defaulting to false, controls whether the UI will change pages on add)service_keys_to_additional_tags
: (optional, selective, tags to give to any files imported from this url)filterable_tags
: (optional tags to be filtered by any tag import options that applies to the URL)
If you specify a
destination_page_name
and an appropriate importer page already exists with that name, that page will be used. Otherwise, a new page with that name will be recreated (and used by subsequent calls with that name). Make sure it that page name is unique (e.g. '/b/ threads', not 'watcher') in your client, or it may not be found.Alternately,
destination_page_key
defines exactly which page should be used. Bear in mind this page key is only valid to the current session (they are regenerated on client reset or session reload), so you must figure out which one you want using the /manage_pages/get_pages call. If the correct page_key is not found, or the page it corresponds to is of the incorrect type, the standard page selection/creation rules will apply.You can set a destination file domain, which will select (or, for probably most of your initial requests, create) a download page that has a non-default 'file import options' with the given destination. If you set both a file domain and also a
destination_page_key
, then the page key takes precedence. If you do not set a file domain, then the import uses whatever the page has, like normal; for url import pages, this is probably your \"loud\" file import options default.show_destination_page
defaults to False to reduce flicker when adding many URLs to different pages quickly. If you turn it on, the client will behave like a URL drag and drop and select the final page the URL ends up on.service_keys_to_additional_tags
uses the same data structure as in /add_tags/add_tags--service keys to a list of tags to add. You will need 'add tags' permission or this will 403. These tags work exactly as 'additional' tags work in a tag import options. They are service specific, and always added unless some advanced tag import options checkbox (like 'only add tags to new files') is set.filterable_tags works like the tags parsed by a hydrus downloader. It is just a list of strings. They have no inherant service and will be sent to a tag import options, if one exists, to decide which tag services get what. This parameter is useful if you are pulling all a URL's tags outside of hydrus and want to have them processed like any other downloader, rather than figuring out service names and namespace filtering on your end. Note that in order for a tag import options to kick in, I think you will have to have a Post URL URL Class hydrus-side set up for the URL so some tag import options (whether that is Class-specific or just the default) can be loaded at import time.
Example request body
Example request body{\n \"url\" : \"https://8ch.net/tv/res/1846574.html\",\n \"destination_page_name\" : \"kino zone\",\n \"service_keys_to_additional_tags\" : {\n \"6c6f63616c2074616773\" : [\"as seen on /tv/\"]\n }\n}\n
Response: Some JSON with info on the URL added. Example response{\n \"url\" : \"https://safebooru.org/index.php?page=post&s=view&id=3195917\",\n \"filterable_tags\" : [\n \"1girl\",\n \"artist name\",\n \"creator:azto dio\",\n \"blonde hair\",\n \"blue eyes\",\n \"breasts\",\n \"character name\",\n \"commentary\",\n \"english commentary\",\n \"formal\",\n \"full body\",\n \"glasses\",\n \"gloves\",\n \"hair between eyes\",\n \"high heels\",\n \"highres\",\n \"large breasts\",\n \"long hair\",\n \"long sleeves\",\n \"looking at viewer\",\n \"series:metroid\",\n \"mole\",\n \"mole under mouth\",\n \"patreon username\",\n \"ponytail\",\n \"character:samus aran\",\n \"solo\",\n \"standing\",\n \"suit\",\n \"watermark\"\n ]\n}\n
"},{"location":"developer_api.html#add_urls_associate_url","title":"POST{\n \"human_result_text\" : \"\\\"https://8ch.net/tv/res/1846574.html\\\" URL added successfully.\",\n \"normalised_url\" : \"https://8ch.net/tv/res/1846574.html\"\n}\n
/add_urls/associate_url
","text":"Manage which URLs the client considers to be associated with which files.
Restricted access: YES. Import URLs permission needed. Required Headers:Content-Type
:application/json
- files
url_to_add
: (optional, selective A, an url you want to associate with the file(s))urls_to_add
: (optional, selective A, a list of urls you want to associate with the file(s))url_to_delete
: (optional, selective B, an url you want to disassociate from the file(s))urls_to_delete
: (optional, selective B, a list of urls you want to disassociate from the file(s))normalise_urls
: (optional, default true, only affects the 'add' urls)
The single/multiple arguments work the same--just use whatever is convenient for you.
Unless you really know what you are doing, I strongly recommend you stick to associating URLs with just one single 'hash' at a time. Multiple hashes pointing to the same URL is unusual and frequently unhelpful.
By default, anything you throw at the 'add' side will be normalised nicely, but if you need to add some specific/weird URL text, or you need to add a URI, set
Example request bodynormalise_urls
tofalse
. Anything you throw at the 'delete' side will not be normalised, so double-check you are deleting exactly what you mean to via GET /get_files/file_metadata etc..
Response: 200 with no content. Like when adding tags, this is safely idempotent--do not worry about re-adding URLs associations that already exist or accidentally trying to delete ones that don't."},{"location":"developer_api.html#editing_file_tags","title":"Editing File Tags","text":""},{"location":"developer_api.html#add_tags_clean_tags","title":"GET{\n \"url_to_add\" : \"https://rule34.xxx/index.php?id=2588418&page=post&s=view\",\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
/add_tags/clean_tags
","text":"Ask the client about how it will see certain tags.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want cleaned)
[ \" bikini \", \"blue eyes\", \" character : samus aran \", \" :)\", \" \", \"\", \"10\", \"11\", \"9\", \"system:wew\", \"-flower\" ]
:
Response:/add_tags/clean_tags?tags=%5B%22%20bikini%20%22%2C%20%22blue%20%20%20%20eyes%22%2C%20%22%20character%20%3A%20samus%20aran%20%22%2C%20%22%3A%29%22%2C%20%22%20%20%20%22%2C%20%22%22%2C%20%2210%22%2C%20%2211%22%2C%20%229%22%2C%20%22system%3Awew%22%2C%20%22-flower%22%5D\n
The tags cleaned according to hydrus rules. They will also be in hydrus human-friendly sorting order. Example response
{\n \"tags\" : [\"9\", \"10\", \"11\", \" ::)\", \"bikini\", \"blue eyes\", \"character:samus aran\", \"flower\", \"wew\"]\n}\n
Mostly, hydrus simply trims excess whitespace, but the other examples are rare issues you might run into. 'system' is an invalid namespace, tags cannot be prefixed with hyphens, and any tag starting with ':' is secretly dealt with internally as \"[no namespace]:[colon-prefixed-subtag]\". Again, you probably won't run into these, but if you see a mismatch somewhere and want to figure it out, or just want to sort some numbered tags, you might like to try this.
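Here is a quick python sketch of trying it (the access key is made up, and I assume the requests module):

```python
import json
import urllib.parse

import requests

API = 'http://127.0.0.1:45869'
headers = { 'Hydrus-Client-API-Access-Key' : '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae' }

tags = [ ' bikini ', 'blue    eyes', 'system:wew' ]

# JSON-encode and percent-encode the tag list for the GET parameter
url = API + '/add_tags/clean_tags?tags=' + urllib.parse.quote( json.dumps( tags ), safe = '' )

print( requests.get( url, headers = headers ).json() )  # -> { 'tags' : [ 'bikini', 'blue eyes', 'wew' ] }
```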
"},{"location":"developer_api.html#add_tags_get_siblings_and_parents","title":"GET/add_tags/get_siblings_and_parents
","text":"Ask the client about tags' sibling and parent relationships.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want info on)
[ \"blue eyes\", \"samus aran\" ]
:
Response:/add_tags/get_siblings_and_parents?tags=%5B%22blue%20eyes%22%2C%20%22samus%20aran%22%5D\n
An Object showing all the display relationships for each tag on each service. Also The Services Object. Example response
{\n \"services\" : \"The Services Object\"\n \"tags\" : {\n \"blue eyes\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\",\n \"blue_eyes\",\n \"blue eye\",\n \"blue_eye\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n }\n },\n \"samus aran\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"character:samus aran\",\n \"siblings\" : [\n \"samus aran\",\n \"samus_aran\",\n \"character:samus aran\"\n ],\n \"descendants\" : [\n \"character:samus aran (zero suit)\"\n \"cosplay:samus aran\"\n ],\n \"ancestors\" : [\n \"series:metroid\",\n \"studio:nintendo\"\n ]\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"samus aran\",\n \"siblings\" : [\n \"samus aran\"\n ],\n \"descendants\" : [\n \"zero suit samus\",\n \"samus_aran_(cosplay)\"\n ],\n \"ancestors\" : []\n }\n }\n }\n}\n
This data is essentially how mappings in the
storage
tag_display_type
becomedisplay
.The hex keys are the service keys, which you will have seen elsewhere, like GET /get_files/file_metadata. Note that there is no concept of 'all known tags' here. If a tag is in 'my tags', it follows the rules of 'my tags', and then all the services' display tags are merged into the 'all known tags' pool for user display.
Also, the siblings and parents here are not just what is in tags->manage tag siblings/parents, they are the final computed combination of rules as set in tags->manage where tag siblings and parents apply. The data given here is not guaranteed to be useful for editing siblings and parents on a particular service. That data, which is currently pair-based, will appear in a different API request in future.
ideal_tag
is how the tag appears in normal display to the user.siblings
is every tag that will show as theideal_tag
, including theideal_tag
itself.descendants
is every child (and recursive grandchild, great-grandchild...) that implies theideal_tag
.ancestors
is every parent (and recursive grandparent, great-grandparent...) that our tag implies.
Every descendant and ancestor is an
ideal_tag
itself that may have its own siblings.Most situations are simple, but remember that siblings and parents in hydrus can get complex. If you want to display this data, I recommend you plan to support simple service-specific workflows, and add hooks to recognise conflicts and other difficulty and, when that happens, abandon ship (send the user back to Hydrus proper). Also, if you show summaries of the data anywhere, make sure you add a 'and 22 more...' overflow mechanism to your menus, since if you hit up 'azur lane' or 'pokemon', you are going to get hundreds of children.
I generally warn you off computing sibling and parent mappings or counts yourself. The data from this request is best used for sibling and parent decorators on individual tags in a 'manage tags' presentation. The code that actually computes what siblings and parents look like in the 'display' context can be a pain at times, and I've already done it. Just run /search_tags or /file_metadata again after any changes you make and you'll get updated values.
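To illustrate the decorator use-case, here is a rough Python sketch, again assuming the requests library, the default localhost client API, and an Add Tags access key.

import json
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}

response = requests.get(
    API + "/add_tags/get_siblings_and_parents",
    params={"tags": json.dumps(["blue eyes", "samus aran"])},
    headers=HEADERS,
)
response.raise_for_status()

# for each tag on each service, show what it will display as and what it implies
for tag, services in response.json()["tags"].items():
    for service_key, info in services.items():
        print(tag, service_key, info["ideal_tag"], info["ancestors"])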
"},{"location":"developer_api.html#add_tags_search_tags","title":"GET/add_tags/search_tags
","text":"Search the client for tags.
Restricted access: YES. Search for Files and Add Tags permission needed.Required Headers: n/a
Arguments:search
: (the tag text to search for, enter exactly what you would in the client UI)- file domain (optional, defaults to all my files)
tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to all known tags)tag_display_type
: (optional, string, to select whether to search raw or sibling-processed tags, defaults tostorage
)
The
file domain
andtag_service_key
perform the function of the file and tag domain buttons in the client UI.The
tag_display_type
can be eitherstorage
(the default), which searches your file's stored tags, just as they appear in a 'manage tags' dialog, ordisplay
, which searches the sibling-processed tags, just as they appear in a normal file search page. In the example above, setting thetag_display_type
todisplay
could well combine the two kim possible tags and give a count of 3 or 4.'all my files'/'all known tags' works fine for most cases, but a specific tag service or 'all known files'/'tag service' can work better for editing tag repository
storage
contexts, since it provides results just for that service, and for repositories, it gives tags for all the non-local files other users have tagged.
Example request: Example request
/add_tags/search_tags?search=kim&tag_display_type=display\n
Response: Some JSON listing the client's matching tags. Example response
{\n \"tags\" : [\n {\n \"value\" : \"series:kim possible\", \n \"count\" : 3\n },\n {\n \"value\" : \"kimchee\", \n \"count\" : 2\n },\n {\n \"value\" : \"character:kimberly ann possible\", \n \"count\" : 1\n }\n ]\n}\n
The
tags
list will be sorted by descending count. The various rules in tags->manage tag display and search (e.g. no pure*
searches on certain services) will also be checked--and if violated, you will get 200 OK but an empty result.Note that if your client api access is only allowed to search certain tags, the results will be similarly filtered.
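A rough Python sketch of an autocomplete lookup, under the same assumptions as the earlier sketches (requests, default localhost port, your own access key):

import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}

response = requests.get(
    API + "/add_tags/search_tags",
    params={"search": "kim", "tag_display_type": "display"},
    headers=HEADERS,
)
response.raise_for_status()

for result in response.json()["tags"]:  # sorted by descending count
    print(result["value"], result["count"])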
"},{"location":"developer_api.html#add_tags_add_tags","title":"POST/add_tags/add_tags
","text":"Make changes to the tags that files have.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in JSON):- files
service_keys_to_tags
: (selective B, an Object of service keys to lists of tags to be 'added' to the files)service_keys_to_actions_to_tags
: (selective B, an Object of service keys to content update actions to lists of tags)override_previously_deleted_mappings
: (optional, defaulttrue
)create_new_deleted_mappings
: (optional, defaulttrue
)
In 'service_keys_to...', the keys are as in /get_services. You may need some selection UI on your end so the user can pick what to do if there are multiple choices.
Also, you can use either '...to_tags', which is simple and add-only, or '...to_actions_to_tags', which is more complicated and allows you to remove/petition or rescind pending content.
The permitted 'actions' are:
- 0 - Add to a local tag service.
- 1 - Delete from a local tag service.
- 2 - Pend to a tag repository.
- 3 - Rescind a pend from a tag repository.
- 4 - Petition from a tag repository. (This is special)
- 5 - Rescind a petition from a tag repository.
Read about Current Deleted Pending Petitioned for more info on these states.
When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of \"Petitioned from API\" will be given. If you want to set your own reason, you can instead give a list of [ tag, reason ] pairs.
Some example requests: Adding some tags to a file
{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"character:supergirl\", \"rating:safe\"]\n }\n}\n
Adding more tags to two files
{\n \"hashes\" : [\n \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"f2b022214e711e9a11e2fcec71bfd524f10f0be40c250737a7861a5ddd3faebf\"\n ],\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"process this\"],\n \"ccb0cf2f9e92c2eb5bd40986f72a339ef9497014a5fb8ce4cea6d6c9837877d9\" : [\"creator:dandon fuga\"]\n }\n}\n
A complicated transaction with all possible actions
{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_actions_to_tags\" : {\n \"6c6f63616c2074616773\" : {\n \"0\" : [\"character:supergirl\", \"rating:safe\"],\n \"1\" : [\"character:superman\"]\n },\n \"aa0424b501237041dab0308c02c35454d377eebd74cfbc5b9d7b3e16cc2193e9\" : {\n \"2\" : [\"character:supergirl\", \"rating:safe\"],\n \"3\" : [\"filename:image.jpg\"],\n \"4\" : [[\"creator:danban faga\", \"typo\"], [\"character:super_girl\", \"underscore\"]],\n \"5\" : [\"skirt\"]\n }\n }\n}\n
This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.
Note that the enumerated status keys in the service_keys_to_actions_to_tags structure are strings, not ints (JSON does not support int keys for Objects).
The
override_previously_deleted_mappings
parameter adjusts your Add/Pend actions. In the client, if a human, in the manage tags dialog, tries to add a tag mapping that has been previously deleted, that deleted record will be overwritten. An automatic system like a gallery parser will filter/skip any Add/Pend actions in this case (so that repeat downloads do not overwrite a human user delete, etc..). The Client API acts like a human, by default, overwriting previously deleted mappings. If you want to spam a lot of new mappings but do not want to overwrite previous deletion decisions, acting like a downloader, then set this tofalse
.The
create_new_deleted_mappings
parameter adjusts your Delete/Petition actions, particularly whether a delete record should be made even if the tag does not exist on the file. There are not many ways to spontaneously create a delete record in the normal hydrus UI, but you as the Client API should think whether this is what you want. By default, the Client API will write a delete record whether the tag already exists for the file or not. If you only want to create a delete record (which prohibits the tag being added back again by something like a downloader, as withoverride_previously_deleted_mappings
) when the tag already exists on the file, then set this tofalse
. Are you saying 'migrate all these deleted tag records from A to B so that none of them are re-added'? Then you want thistrue
. Are you saying 'This tag was applied incorrectly to some but perhaps not all of these files; where it exists, delete it.'? Then set itfalse
.There is currently no way to delete a tag mapping without leaving a delete record (as you can with files). This will probably happen, though, and it'll be a new parameter here.
Response description: 200 and no content.Note
Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended, or add a tag that already exists, and not worry about an error--the surplus add action will be discarded. The same is true if you try to pend a tag that actually already exists, or rescind a petition that doesn't. Any invalid actions will fail silently.
It is fine to just throw your 'process this' tags at every file import and not have to worry about checking which files you already added them to.
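For the common add-only case, a rough Python sketch (same assumptions as the earlier sketches; the service key here is the example 'local tags' key used on this page, so fetch your real ones from /get_services):

import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}

body = {
    "hash": "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
    "service_keys_to_tags": {
        "6c6f63616c2074616773": ["character:supergirl", "rating:safe"]
    }
}

response = requests.post(API + "/add_tags/add_tags", json=body, headers=HEADERS)
response.raise_for_status()  # 200 and no content; re-adding existing tags is harmless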
"},{"location":"developer_api.html#editing_file_ratings","title":"Editing File Ratings","text":""},{"location":"developer_api.html#edit_ratings_set_rating","title":"POST/edit_ratings/set_rating
","text":"Add or remove ratings associated with a file.
Restricted access: YES. Edit Ratings permission needed. Required Headers:Content-Type
:application/json
- files
rating_service_key
: (hexadecimal, the rating service you want to edit)rating
: (mixed datatype, the rating value you want to set)
{\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\",\n \"rating_service_key\" : \"282303611ba853659aa60aeaa5b6312d40e05b58822c52c57ae5e320882ba26e\",\n \"rating\" : 2\n}\n
This is fairly simple, but there are some caveats around the different rating service types and the actual data you are setting here. It is the same as you'll see in GET /get_files/file_metadata.
"},{"location":"developer_api.html#likedislike_ratings","title":"Like/Dislike Ratings","text":"Send
"},{"location":"developer_api.html#numerical_ratings","title":"Numerical Ratings","text":"true
for 'like',false
for 'dislike', ornull
for 'unset'.Send an
"},{"location":"developer_api.html#incdec_ratings","title":"Inc/Dec Ratings","text":"int
for the number of stars to set, ornull
for 'unset'.Send an
int
for the number to set. 0 is your minimum.As with GET /get_files/file_metadata, check The Services Object for the min/max stars on a numerical rating service.
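A rough Python sketch covering the different value types, using the example rating service key from above (same assumptions as the earlier sketches):

import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}
HASH = "3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba"
RATING_SERVICE_KEY = "282303611ba853659aa60aeaa5b6312d40e05b58822c52c57ae5e320882ba26e"

def set_rating(rating):
    # True/False/None for like/dislike, int or None for numerical, int for inc/dec
    body = {"hash": HASH, "rating_service_key": RATING_SERVICE_KEY, "rating": rating}
    response = requests.post(API + "/edit_ratings/set_rating", json=body, headers=HEADERS)
    response.raise_for_status()

set_rating(2)     # two stars on a numerical service
set_rating(None)  # back to 'unset'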
Response: 200 and no content."},{"location":"developer_api.html#editing_file_times","title":"Editing File Times","text":""},{"location":"developer_api.html#edit_times_set_time","title":"POST/edit_times/set_time
","text":"Add or remove timestamps associated with a file.
Restricted access: YES. Edit Times permission needed. Required Headers:Content-Type
:application/json
- files
timestamp
: (selective, float or int of the time in seconds, ornull
for deleting web domain times)timestamp_ms
: (selective, int of the time in milliseconds, ornull
for deleting web domain times)timestamp_type
: (int, the type of timestamp you are editing)file_service_key
: (dependent, hexadecimal, the file service you are editing in 'imported'/'deleted'/'previously imported')canvas_type
: (dependent, int, the canvas type you are editing in 'last viewed')domain
: (dependent, string, the domain you are editing in 'modified (web domain)')
{\n \"timestamp\" : \"1641044491\",\n \"timestamp_type\" : 5\n}\n
Example request body, more complicated
{\n \"timestamp\" : \"1641044491.458\",\n \"timestamp_type\" : 6,\n \"canvas_type\" : 1\n}\n
Example request body, deleting
{\n \"timestamp_ms\" : null,\n \"timestamp_type\" : 0,\n \"domain\" : \"blahbooru.org\"\n}\n
This is a copy of the manage times dialog in the program, so if you are uncertain about something, check that out. The client records timestamps up to millisecond accuracy.
You have to select some files, obviously. I'd imagine most uses will be over one file at a time, but you can spam 100 or 10,000 if you need to.
Then choose whether you want to work with
timestamp
ortimestamp_ms
.timestamp
can be an integer or a float, and in the latter case, the API will suck up the first three decimal places to be the millisecond data.timestamp_ms
is an integer of milliseconds, simply thetimestamp
value multiplied by 1,000. It doesn't matter which you use--whichever is easiest for you.If you send
null
as the timestamp, this instructs the client to delete the existing value, if possible and reasonable.timestamp_type
is an enum as follows:- 0 - File modified time (web domain)
- 1 - File modified time (on the hard drive)
- 3 - File import time
- 4 - File delete time
- 5 - Archived time
- 6 - Last viewed (in the media viewer)
- 7 - File originally imported time
Adding or Deleting
You can add or delete type 0 (web domain) timestamps, but you can only edit existing instances of all the others. This is broadly how the manage times dialog works, also. Stuff like 'last viewed' is tied up with other numbers like viewtime and num_views, so if that isn't already in the database, then we can't just add the timestamp on its own. Same with 'deleted time' for a file that isn't deleted! So, in general, other than web domain stuff, you can only edit times you already see in /get_files/file_metadata.
If you select 0, you have to include a
domain
, which will usually be a web domain, but you can put anything in there.If you select 1, the client will not alter the modified time on your hard disk, only the database record. This is unlike the dialog. Let's let this system breathe a bit before we try to get too clever.
If you select 3, 4, or 7, you have to include a
file_service_key
. The 'previously imported' time is for deleted files only; it records when the file was originally imported so if the user hits 'undo', the database knows what import time to give back to it.If you select 6, you have to include a
canvas_type
, which is:- 0 - Media viewer
- 1 - Preview viewer
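A rough Python sketch that sets an archived time, same assumptions as the earlier sketches. Remember that, web domain times aside, you can only edit timestamps the file already has.

import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}

body = {
    "hash": "3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba",
    "timestamp_ms": 1641044491458,  # the same moment as the float 1641044491.458
    "timestamp_type": 5             # archived time
}

response = requests.post(API + "/edit_times/set_time", json=body, headers=HEADERS)
response.raise_for_status()  # raises on a 4xx error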
/add_notes/set_notes
","text":"Add or update notes associated with a file.
Restricted access: YES. Add Notes permission needed. Required Headers:Content-Type
:application/json
notes
: (an Object mapping string names to string texts)hash
: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)file_id
: (selective, the integer numerical identifier for the file)merge_cleverly
: true or false (optional, defaults false)extend_existing_note_if_possible
: true or false (optional, defaults true)conflict_resolution
: 0, 1, 2, or 3 (optional, defaults 3)
With
merge_cleverly
leftfalse
, then this is a simple update operation. Existing notes will be overwritten exactly as you specify. Any other notes the file has will be untouched. Example request body{\n \"notes\" : {\n \"note name\" : \"content of note\",\n \"another note\" : \"asdf\"\n },\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
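A rough Python sketch of this simple overwrite mode, same assumptions as the earlier sketches:

import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}

body = {
    "hash": "3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba",
    "notes": {
        "note name": "content of note",
        "another note": "asdf"
    }
}

response = requests.post(API + "/add_notes/set_notes", json=body, headers=HEADERS)
response.raise_for_status()
print(response.json()["notes"])  # the notes as they were actually saved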
If you turn on
merge_cleverly
, then the client will merge your new notes into the file's existing notes using the same logic you have seen in Note Import Options and the Duplicate Metadata Merge Options. This navigates conflict resolution, and you should use it if you are adding potential duplicate content from an 'automatic' source like a parser and do not want to wade into the logic. Do not use it for a user-editing experience (a user expects a strict overwrite/replace experience and will be confused by this mode).To start off, in this mode, if your note text exists under a different name for the file, your dupe note will not be added to your new name.
If a new note name already exists and its new text differs from what already exists:extend_existing_note_if_possible
makes it so your new note text will overwrite the existing note of that name (or a '... (1)' rename of that name) if the existing text is inside your given text.conflict_resolution
is an enum governing what to do in all other conflicts:- 0 - replace - Overwrite the existing conflicting note.
- 1 - ignore - Make no changes.
- 2 - append - Append the new text to the existing text.
- 3 - rename (default) - Add the new text under a 'name (x)'-style rename.
If
merge_cleverly=false
, this is exactly what you gave, and this operation is idempotent. Ifmerge_cleverly=true
, then this may differ, even be empty, and this operation might not be idempotent. Example response
"},{"location":"developer_api.html#add_notes_delete_notes","title":"POST{\n \"notes\" : {\n \"note name\" : \"content of note\",\n \"another note (1)\" : \"asdf\"\n }\n}\n
/add_notes/delete_notes
","text":"Remove notes associated with a file.
Restricted access: YES. Add Notes permission needed. Required Headers:Content-Type
:application/json
note_names
: (a list of string note names to delete)hash
: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)file_id
: (selective, the integer numerical identifier for the file)
Response: 200 with no content. This operation is idempotent."},{"location":"developer_api.html#searching_and_fetching_files","title":"Searching and Fetching Files","text":"{\n \"note_names\" : [\"note name\", \"another note\"],\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
File search in hydrus is not paginated like a booru--all searches return all results in one go. In order to keep this fast, search is split into two steps--fetching file identifiers with a search, and then fetching file metadata in batches. You may have noticed that the client itself performs searches like this--thinking a bit about a search and then bundling results in batches of 256 files before eventually throwing all the thumbnails on screen.
"},{"location":"developer_api.html#get_files_search_files","title":"GET/get_files/search_files
","text":"Search for the client's files.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of tags you wish to search for)- file domain (optional, defaults to all my files)
tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to all known tags)include_current_tags
: (optional, bool, whether to search 'current' tags, defaults totrue
)include_pending_tags
: (optional, bool, whether to search 'pending' tags, defaults totrue
)file_sort_type
: (optional, integer, the results sort method, defaults to2
forimport time
)file_sort_asc
: true or false (optional, defaulttrue
, the results sort order)return_file_ids
: true or false (optional, defaulttrue
, returns file id results)return_hashes
: true or false (optional, defaultfalse
, returns hex hash results)
/get_files/search_files?tags=%5B%22blue%20eyes%22%2C%20%22blonde%20hair%22%2C%20%22%5Cu043a%5Cu0438%5Cu043d%5Cu043e%22%2C%20%22system%3Ainbox%22%2C%20%22system%3Alimit%3D16%22%5D\n
If the access key's permissions only permit search for certain tags, at least one positive whitelisted/non-blacklisted tag must be in the \"tags\" list or this will 403. Tags can be prepended with a hyphen to make a negated tag (e.g. \"-green eyes\"), but these will not be checked against the permissions whitelist.
Wildcards and namespace searches are supported, so if you search for 'character:sam*' or 'series:*', this will be handled correctly clientside.
Many system predicates are also supported using a text parser! The parser was designed by a clever user for human input and allows for a certain amount of error (e.g. ~= instead of \u2248, or \"isn't\" instead of \"is not\") or requires more information (e.g. the specific hashes for a hash lookup). Here's a big list of examples that are supported:
System Predicates- system:everything
- system:inbox
- system:archive
- system:has duration
- system:no duration
- system:is the best quality file of its duplicate group
- system:is not the best quality file of its duplicate group
- system:has audio
- system:no audio
- system:has exif
- system:no exif
- system:has human-readable embedded metadata
- system:no human-readable embedded metadata
- system:has icc profile
- system:no icc profile
- system:has tags
- system:no tags
- system:untagged
- system:number of tags > 5
- system:number of tags ~= 10
- system:number of tags > 0
- system:number of words < 2
- system:height = 600
- system:height > 900
- system:width < 200
- system:width > 1000
- system:filesize ~= 50 kilobytes
- system:filesize > 10megabytes
- system:filesize < 1 GB
- system:filesize > 0 B
- system:similar to abcdef01 abcdef02 abcdef03, abcdef04 with distance 3
- system:similar to abcdef distance 5
- system:limit = 100
- system:filetype = image/jpg, image/png, apng
- system:hash = abcdef01 abcdef02 abcdef03 (this does sha256)
- system:hash = abcdef01 abcdef02 md5
- system:modified date < 7 years 45 days 7h
- system:modified date > 2011-06-04
- system:last viewed time < 7 years 45 days 7h
- system:last view time < 7 years 45 days 7h
- system:date modified > 7 years 2 months
- system:date modified < 0 years 1 month 1 day 1 hour
- system:import time < 7 years 45 days 7h
- system:time imported < 7 years 45 days 7h
- system:time imported > 2011-06-04
- system:time imported > 7 years 2 months
- system:time imported < 0 years 1 month 1 day 1 hour
- system:time imported ~= 2011-1-3
- system:time imported ~= 1996-05-2
- system:duration < 5 seconds
- system:duration ~= 600 msecs
- system:duration > 3 milliseconds
- system:file service is pending to my files
- system:file service currently in my files
- system:file service is not currently in my files
- system:file service is not pending to my files
- system:number of file relationships = 2 duplicates
- system:number of file relationships > 10 potential duplicates
- system:num file relationships < 3 alternates
- system:num file relationships > 3 false positives
- system:ratio is wider than 16:9
- system:ratio is 16:9
- system:ratio taller than 1:1
- system:num pixels > 50 px
- system:num pixels < 1 megapixels
- system:num pixels ~= 5 kilopixel
- system:media views ~= 10
- system:all views > 0
- system:preview views < 10
- system:media viewtime < 1 days 1 hour 0 minutes
- system:all viewtime > 1 hours 100 seconds
- system:preview viewtime ~= 1 day 30 hours 100 minutes 90s
- system:has url matching regex index\\.php
- system:does not have a url matching regex index\\.php
- system:has url https://safebooru.donmai.us/posts/4695284
- system:does not have url https://safebooru.donmai.us/posts/4695284
- system:has domain safebooru.com
- system:does not have domain safebooru.com
- system:has a url with class safebooru file page
- system:does not have a url with url class safebooru file page
- system:tag as number page < 5
- system:has notes
- system:no notes
- system:does not have notes
- system:num notes is 5
- system:num notes > 1
- system:has note with name note name
- system:no note with name note name
- system:does not have note with name note name
- system:has a rating for
service_name
- system:does not have a rating for
service_name
- system:rating for
service_name
> \u2157 (numerical services) - system:rating for
service_name
is like (like/dislike services) - system:rating for
service_name
= 13 (inc/dec services)
Please test out the system predicates you want to send. If you are in help->advanced mode, you can test this parser in the advanced text input dialog when you click the OR* button on a tag autocomplete dropdown. More system predicate types and input formats will be available in future. Reverse engineering system predicate data from text is obviously tricky. If a system predicate does not parse, you'll get 400.
Also, OR predicates are now supported! Just nest within the tag list, and it'll be treated like an OR. For instance:
[ \"skirt\", [ \"samus aran\", \"lara croft\" ], \"system:height > 1000\" ]
Makes:
- skirt
- samus aran OR lara croft
- system:height > 1000
The file and tag services are for search domain selection, just like clicking the buttons in the client. They are optional--default is 'all my files' and 'all known tags'.
include_current_tags
andinclude_pending_tags
do the same as the buttons on the normal search interface. They alter the search of normal tags and tag-related system predicates like 'system:number of tags', including or excluding that type of tag from whatever the search is doing. If you set both of these tofalse
, you'll often get no results.File searches occur in the
display
tag_display_type
. If you want to pair autocomplete tag lookup from /search_tags to this file search (e.g. for making a standard booru search interface), then make sure you are searchingdisplay
tags there.file_sort_asc is 'true' for ascending, and 'false' for descending. The default is descending.
file_sort_type is by default import time. It is an integer according to the following enum, and I have written the semantic (asc/desc) meaning for each type after:
- 0 - file size (smallest first/largest first)
- 1 - duration (shortest first/longest first)
- 2 - import time (oldest first/newest first)
- 3 - filetype (N/A)
- 4 - random (N/A)
- 5 - width (slimmest first/widest first)
- 6 - height (shortest first/tallest first)
- 7 - ratio (tallest first/widest first)
- 8 - number of pixels (ascending/descending)
- 9 - number of tags (on the current tag domain) (ascending/descending)
- 10 - number of media views (ascending/descending)
- 11 - total media viewtime (ascending/descending)
- 12 - approximate bitrate (smallest first/largest first)
- 13 - has audio (audio first/silent first)
- 14 - modified time (oldest first/newest first)
- 15 - framerate (slowest first/fastest first)
- 16 - number of frames (smallest first/largest first)
- 18 - last viewed time (oldest first/newest first)
- 19 - archive timestamp (oldest first/newest first)
- 20 - hash hex (lexicographic/reverse lexicographic)
- 21 - pixel hash hex (lexicographic/reverse lexicographic)
- 22 - blurhash (lexicographic/reverse lexicographic)
The pixel and blurhash sorts will put files without one of these (e.g. an mp3) at the end, regardless of asc/desc.
Response:The full list of numerical file ids that match the search. Example response
{\n \"file_ids\" : [125462, 4852415, 123, 591415]\n}\n
Example response with return_hashes=true
{\n \"hashes\" : [\n \"1b04c4df7accd5a61c5d02b36658295686b0abfebdc863110e7d7249bba3f9ad\",\n \"fe416723c731d679aa4d20e9fd36727f4a38cd0ac6d035431f0f452fad54563f\",\n \"b53505929c502848375fbc4dab2f40ad4ae649d34ef72802319a348f81b52bad\"\n ],\n \"file_ids\" : [125462, 4852415, 123]\n}\n
You can of course also specify
return_hashes=true&return_file_ids=false
just to get the hashes. The order of both lists is the same.File ids are internal and specific to an individual client. For a client, a file with hash H always has the same file id N, but two clients will have different ideas about which N goes with which H. IDs are a bit faster than hashes to retrieve and search with en masse, which is why they are exposed here.
This search does not apply the implicit limit that most clients set to all searches (usually 10,000), so if you do system:everything on a client with millions of files, expect to get boshed. Even with a system:limit included, complicated queries with large result sets may take several seconds to respond. Just like the client itself.
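To tie the arguments together, a rough Python sketch of a search (same assumptions as the earlier sketches); note the nested list, which becomes an OR, and the string booleans in the query string:

import json
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}

tags = ["skirt", ["samus aran", "lara croft"], "system:height > 1000", "system:limit = 64"]

response = requests.get(
    API + "/get_files/search_files",
    params={
        "tags": json.dumps(tags),
        "file_sort_type": 2,       # import time
        "file_sort_asc": "false",  # newest first
        "return_hashes": "true"
    },
    headers=HEADERS,
)
response.raise_for_status()

hashes = response.json()["hashes"]  # now fetch /get_files/file_metadata in batches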
"},{"location":"developer_api.html#get_files_file_hashes","title":"GET/get_files/file_hashes
","text":"Lookup file hashes from other hashes.
Restricted access: YES. Search for Files permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):hash
: (selective, a hexadecimal hash)hashes
: (selective, a list of hexadecimal hashes)source_hash_type
: [sha256|md5|sha1|sha512] (optional, defaulting to sha256)desired_hash_type
: [sha256|md5|sha1|sha512]
If you have some MD5 hashes and want to see what their SHA256 are, or vice versa, this is the place. Hydrus records the non-SHA256 hashes for every file it has ever imported. This data is not removed on file deletion.
Example request
/get_files/file_hashes?hash=ec5c5a4d7da4be154597e283f0b6663c&source_hash_type=md5&desired_hash_type=sha256\n
Response: A mapping Object of the successful lookups. Where no matching hash is found, no entry will be made (therefore, if none of your source hashes have matches on the client, this will return an empty
hashes
Object). Example response
"},{"location":"developer_api.html#get_files_file_metadata","title":"GET{\n \"hashes\" : {\n \"ec5c5a4d7da4be154597e283f0b6663c\" : \"2a0174970defa6f147f2eabba829c5b05aba1f1aea8b978611a07b7bb9cf9399\"\n }\n}\n
/get_files/file_metadata
","text":"Get metadata about files in the client.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments (in percent-encoded JSON):- files
create_new_file_ids
: true or false (optional if asking with hash(es), defaulting to false)only_return_identifiers
: true or false (optional, defaulting to false)only_return_basic_information
: true or false (optional, defaulting to false)detailed_url_information
: true or false (optional, defaulting to false)include_blurhash
: true or false (optional, defaulting to false. Only applies whenonly_return_basic_information
is true)include_milliseconds
: true or false (optional, defaulting to false)include_notes
: true or false (optional, defaulting to false)include_services_object
: true or false (optional, defaulting to true)hide_service_keys_tags
: Deprecated, will be deleted soon! true or false (optional, defaulting to true)
If your access key is restricted by tag, the files you search for must have been in the most recent search result.
Example request for two files with ids 123 and 4567
/get_files/file_metadata?file_ids=%5B123%2C%204567%5D\n
The same, but only wants hashes back
/get_files/file_metadata?file_ids=%5B123%2C%204567%5D&only_return_identifiers=true\n
And one that fetches two hashes
/get_files/file_metadata?hashes=%5B%224c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2%22%2C%20%223e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82%22%5D\n
This request string can obviously get pretty ridiculously long. It also takes a bit of time to fetch metadata from the database. In its normal searches, the client usually fetches file metadata in batches of 256.
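A rough Python sketch of that batching, same assumptions as the earlier sketches:

import json
import requests

API = "http://127.0.0.1:45869"
HEADERS = {"Hydrus-Client-API-Access-Key": "replace with your access key"}

file_ids = [125462, 4852415, 123, 591415]  # e.g. straight from /get_files/search_files

metadata = []

for i in range(0, len(file_ids), 256):
    batch = file_ids[i:i + 256]
    response = requests.get(
        API + "/get_files/file_metadata",
        params={"file_ids": json.dumps(batch)},
        headers=HEADERS,
    )
    response.raise_for_status()
    metadata.extend(response.json()["metadata"])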
Response: A list of JSON Objects that store a variety of file metadata. Also The Services Object for service reference.Example response
And one where only_return_identifiers is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 150,\n \"duration\" : null,\n \"time_modified\" : null,\n \"time_modified_details\" : {},\n \"file_services\" : {\n \"current\" : {},\n \"deleted\" : {}\n },\n \"ipfs_multihashes\" : {},\n \"has_audio\" : false,\n \"blurhash\" : \"U6PZfSi_.AyE_3t7t7R**0o#DgR4_3R*D%xt\",\n \"pixel_hash\" : \"2519e40f8105599fcb26187d39656b1b46f651786d0e32fff2dc5a9bc277b5bb\",\n \"num_frames\" : null,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : false,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : true,\n \"has_human_readable_embedded_metadata\" : true,\n \"has_icc_profile\" : true,\n \"has_transparency\" : false,\n \"known_urls\" : [],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : null,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : null,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 0\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n }\n }\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 113,\n \"duration\" : 4040,\n \"time_modified\" : 1604055647,\n \"time_modified_details\" : {\n \"local\" : 1641044491,\n \"gelbooru.com\" : 1604055647\n },\n \"file_services\" : {\n \"current\" : {\n \"616c6c206c6f63616c2066696c6573\" : {\n \"time_imported\" : 1641044491\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"time_imported\" : 1641044491\n },\n \"cb072cffbd0340b67aec39e1953c074e7430c2ac831f8e78fb5dfbda6ec8dcbd\" : {\n \"time_imported\" : 1641204220\n }\n },\n \"deleted\" : {\n \"6c6f63616c2066696c6573\" : {\n \"time_deleted\" : 1641204274,\n \"time_imported\" : 1641044491\n }\n }\n },\n \"ipfs_multihashes\" : {\n \"55af93e0deabd08ce15ffb2b164b06d1254daab5a18d145e56fa98f71ddb6f11\" : \"QmReHtaET3dsgh7ho5NVyHb5U13UgJoGipSWbZsnuuM8tb\"\n },\n \"has_audio\" : true,\n \"blurhash\" : \"UHF5?xYk^6#M@-5b,1J5@[or[k6.};FxngOZ\",\n \"pixel_hash\" : \"1dd9625ce589eee05c22798a9a201602288a1667c59e5cd1fb2251a6261fbd68\",\n \"num_frames\" : 102,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : true,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : false,\n \"has_human_readable_embedded_metadata\" : false,\n \"has_icc_profile\" : false,\n \"has_transparency\" : false,\n \"known_urls\" : [\n \"https://gelbooru.com/index.php?page=post&s=view&id=4841557\",\n \"https://img2.gelbooru.com/images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n 
\"http://origin-orig.deviantart.net/ed31/f/2019/210/7/8/beachqueen_samus_by_dandonfuga-ddcu1xg.jpg\"\n ],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : true,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : 3,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 11\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\"],\n \"2\" : [\"process this later\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\"],\n \"2\" : [\"process this later\"]\n }\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {\n \"0\" : [\"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\", \"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\", \"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n }\n }\n }\n ]\n}\n
And where only_return_basic_information is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\"\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\"\n }\n ]\n}\n
"},{"location":"developer_api.html#basics","title":"basics","text":"{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"duration\" : null,\n \"has_audio\" : false,\n \"num_frames\" : null,\n \"num_words\" : null\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_forced\" : false,\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"duration\" : 4040,\n \"has_audio\" : true,\n \"num_frames\" : 102,\n \"num_words\" : null\n }\n ]\n}\n
Size is in bytes. Duration is in milliseconds, and may be an int or a float.
is_trashed
means the file is currently in the trash but still available on the hard disk.is_deleted
means currently either in the trash or completely deleted from disk.file_services
stores which file services the file is currently in and deleted from. The entries are by the service key, same as for tags later on. In rare cases, the timestamps may benull
, if they are unknown (e.g. atime_deleted
for the file deleted before this information was tracked). Thetime_modified
can also be null. Time modified is just the filesystem modified time for now, but it will evolve into more complicated storage in future with multiple locations (website post times) that'll be aggregated to a sensible value in UI.ipfs_multihashes
stores the ipfs service key to any known multihash for the file.The
thumbnail_width
andthumbnail_height
are a generally reliable prediction but aren't a promise. The actual thumbnail you get from /get_files/thumbnail will be different if the user hasn't looked at it since changing their thumbnail options. You only get these rows for files that hydrus actually generates an actual thumbnail for. Things like pdf won't have it. You can use your own thumb, or ask the api and it'll give you a fixed fallback; those are mostly 200x200, but you can and should size them to whatever you want.include_notes
will decide whether to show a file's notes, in a simple names->texts Object.include_milliseconds
will determine if timestamps are integers (1641044491
), which is the default, or floats with three significant figures (1641044491.485
). As of v559, all file timestamps across the program are internally tracked with milliseconds.If the file has a thumbnail,
blurhash
gives a base 83 encoded string of its blurhash.pixel_hash
is an SHA256 of the image's pixel data and should exactly match for pixel-identical files (it is used in the duplicate system for 'must be pixel duplicates').If the file's filetype is forced by the user,
"},{"location":"developer_api.html#tags","title":"tags","text":"filetype_forced
becomestrue
and a second mime string,original_mime
is added.The
tags
structure is similar to the /add_tags/add_tags scheme, excepting that the status numbers are:- 0 - current
- 1 - pending
- 2 - deleted
- 3 - petitioned
Note
Since JSON Object keys must be strings, these status numbers are strings, not ints.
Read about Current Deleted Pending Petitioned for more info on these states.
While the 'storage_tags' represent the actual tags stored on the database for a file, 'display_tags' reflect how tags appear in the UI, after siblings are collapsed and parents are added. If you want to edit a file's tags, refer to the storage tags. If you want to render to the user, use the display tags. The display tag calculation logic is very complicated; if the storage tags change, do not try to guess the new display tags yourself--just ask the API again.
"},{"location":"developer_api.html#ratings","title":"ratings","text":"The
ratings
structure is simple, but it holds different data types. For each service:- For a like/dislike service, 'no rating' is null. 'like' is true, 'dislike' is false.
- For a numerical service, 'no rating' is null. Otherwise it will be an integer, for the number of stars.
- For an inc/dec service, it is always an integer. The default value is 0 for all files.
Check The Services Object to see the shape of a rating star, and min/max number of stars in a numerical service.
"},{"location":"developer_api.html#services","title":"services","text":"The
tags
,ratings
, andfile_services
structures use the hexadecimalservice_key
extensively. If you need to look up the respective service name or type, check The Services Object under the top levelservices
key.Note
If you look, those file structures actually include the service name and type already, but this bloated data is deprecated and will be deleted in 2024, so please transition over.
If you don't want the services object (it is generally superfluous on the 'simple' responses), then add
"},{"location":"developer_api.html#parameters","title":"parameters","text":"include_services_object=false
.The
metadata
list should come back in the same sort order you asked, whether that is infile_ids
orhashes
!If you ask with hashes rather than file_ids, hydrus will, by default, only return results when it has seen those hashes before. This is to stop the client making thousands of new file_id records in its database if you perform a scanning operation. If you ask about a hash the client has never encountered before--for which there is no file_id--you will get this style of result:
Missing file_id example{\n \"metadata\" : [\n {\n \"file_id\" : null,\n \"hash\" : \"766da61f81323629f982bc1b71b5c1f9bba3f3ed61caf99906f7f26881c3ae93\"\n }\n ]\n}\n
You can change this behaviour with
create_new_file_ids=true
, but bear in mind you will get a fairly 'empty' metadata result with lots of 'null' lines, so this is only useful for gathering the numerical ids for later Client API work.If you ask about file_ids that do not exist, you'll get 404.
If you set
only_return_basic_information=true
, this will be much faster for first-time requests than the full metadata result, but it will be slower for repeat requests. The full metadata object is cached after first fetch, the limited file info object is not. You can optionally setinclude_blurhash
when using this option to fetch blurhash strings for the files.If you add
detailed_url_information=true
, a new entry,detailed_known_urls
, will be added for each file, with a list of the same structure as /add_urls/get_url_info
. This may be an expensive request if you are querying thousands of files at once.
"},{"location":"developer_api.html#get_files_file","title":"GET{\n \"detailed_known_urls\": [\n {\n \"normalised_url\": \"https://gelbooru.com/index.php?id=4841557&page=post&s=view\",\n \"url_type\": 0,\n \"url_type_string\": \"post url\",\n \"match_name\": \"gelbooru file page\",\n \"can_parse\": true\n },\n {\n \"normalised_url\": \"https://img2.gelbooru.com/images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n \"url_type\": 5,\n \"url_type_string\": \"unknown url\",\n \"match_name\": \"unknown url\",\n \"can_parse\": false\n }\n ]\n}\n
/get_files/file
","text":"Get a file.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments :file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)download
: (optional, boolean, defaultfalse
)
Only use one of
file_id
orhash
. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.Example request
/get_files/file?file_id=452158\n
Example request
/get_files/file?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true\n
Response: The file itself. You should get the correct mime type as the Content-Type header.
By default, this will set the
Content-Disposition
header toinline
, which causes a web browser to show the file. If you setdownload=true
, it will set it toattachment
, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.This stuff supports
"},{"location":"developer_api.html#get_files_thumbnail","title":"GETRange
requests, so if you want to build a video player, go nuts./get_files/thumbnail
","text":"Get a file's thumbnail.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments:file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)
Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
Example request
/get_files/thumbnail?file_id=452158\n
Example request
/get_files/thumbnail?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a\n
Response:
The thumbnail for the file. Some hydrus thumbs are jpegs, some are pngs. It should give you the correct image/jpeg or image/png Content-Type.
If hydrus keeps no thumbnail for the filetype, for instance with pdfs, then you will get the same default 'pdf' icon you see in the client. If the file does not exist in the client, or the thumbnail was expected but is missing from storage, you will get the fallback 'hydrus' icon, again just as you would in the client itself. This request should never give a 404.
Size of Normal Thumbs
Thumbnails are not guaranteed to be the correct size! If a thumbnail has not been loaded in the client in years, it could well have been fitted for older thumbnail settings. Also, even 'clean' thumbnails will not always fit inside the settings' bounding box; they may be boosted due to a high-DPI setting or spill over due to a 'fill' vs 'fit' preference. You cannot easily predict what resolution a thumbnail will or should have!
In general, thumbnails are the correct ratio. If you are drawing thumbs, you should embed them to fit or fill, but don't fix them at 100% true size: make sure they can scale to the size you want!
Size of Defaults
If you get a 'default' filetype thumbnail like the pdf or hydrus one, you will be pulling the pngs straight from the hydrus/static folder. They will most likely be 200x200 pixels.
"},{"location":"developer_api.html#get_files_file_path","title":"GET/get_files/file_path
","text":"Get a local file path.
Restricted access: YES. Search for Files permission and See Local Paths permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments :file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)
Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
Example request
/get_files/file_path?file_id=452158\n
Example request
/get_files/file_path?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a\n
Response: The actual path to the file on the host system. Filetype and size are included for convenience. Example response
{\n \"path\" : \"D:\\hydrus_files\\f7f\\7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a.jpg\",\n \"filetype\" : \"image/jpeg\",\n \"size\" : 95237\n}\n
This will give 404 if the file is not stored locally (which includes if it should exist but is actually missing from the file store).
"},{"location":"developer_api.html#get_files_thumbnail","title":"GET/get_files/thumbnail_path
","text":"Get a local thumbnail path.
Restricted access: YES. Search for Files permission and See Local Paths permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments:file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)include_thumbnail_filetype
: (optional, boolean, defaults tofalse
)
Only use one of
file_id
orhash
. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.Example request
/get_files/thumbnail_path?file_id=452158\n
Example request
/get_files/thumbnail_path?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&include_thumbnail_filetype=true\n
Response: The actual path to the thumbnail on the host system. Example response
{\n \"path\" : \"D:\\hydrus_files\\f7f\\7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a.thumbnail\"\n}\n
Example response with include_thumbnail_filetype=true
{\n \"path\" : \"C:\\hydrus_thumbs\\f85\\85daaefdaa662761d7cb1b026d7b101e74301be08e50bf09a235794ec8656f79.thumbnail\",\n \"filetype\" : \"image/png\"\n}\n
All thumbnails in hydrus have the .thumbnail file extension and in content are either jpeg (almost always) or png (to handle transparency).
This will 400 if the given file type does not have a thumbnail in hydrus, and it will 404 if there should be a thumbnail but one does not exist and cannot be generated from the source file (which probably would mean that the source file was itself Not Found).
"},{"location":"developer_api.html#get_local_file_storage_locations","title":"GET/get_files/local_file_storage_locations
","text":"Get the local file storage locations, as you see under database->migrate files.
Restricted access: YES. Search for Files permission and See Local Paths permission needed.Required Headers: n/a
Arguments: n/a
Response: A list of the different file storage locations and what they store. Example response{\n \"locations\" : [\n {\n \"path\" : \"C:\\my_thumbs\",\n \"ideal_weight\" : 1,\n \"max_num_bytes\": None,\n \"prefixes\" : [\n \"t00\", \"t01\", \"t02\", \"t03\", \"t04\", \"t05\", \"t06\", \"t07\", \"t08\", \"t09\", \"t0a\", \"t0b\", \"t0c\", \"t0d\", \"t0e\", \"t0f\",\n \"t10\", \"t11\", \"t12\", \"t13\", \"t14\", \"t15\", \"t16\", \"t17\", \"t18\", \"t19\", \"t1a\", \"t1b\", \"t1c\", \"t1d\", \"t1e\", \"t1f\",\n \"t20\", \"t21\", \"t22\", \"t23\", \"t24\", \"t25\", \"t26\", \"t27\", \"t28\", \"t29\", \"t2a\", \"t2b\", \"t2c\", \"t2d\", \"t2e\", \"t2f\",\n \"t30\", \"t31\", \"t32\", \"t33\", \"t34\", \"t35\", \"t36\", \"t37\", \"t38\", \"t39\", \"t3a\", \"t3b\", \"t3c\", \"t3d\", \"t3e\", \"t3f\",\n \"t40\", \"t41\", \"t42\", \"t43\", \"t44\", \"t45\", \"t46\", \"t47\", \"t48\", \"t49\", \"t4a\", \"t4b\", \"t4c\", \"t4d\", \"t4e\", \"t4f\",\n \"t50\", \"t51\", \"t52\", \"t53\", \"t54\", \"t55\", \"t56\", \"t57\", \"t58\", \"t59\", \"t5a\", \"t5b\", \"t5c\", \"t5d\", \"t5e\", \"t5f\",\n \"t60\", \"t61\", \"t62\", \"t63\", \"t64\", \"t65\", \"t66\", \"t67\", \"t68\", \"t69\", \"t6a\", \"t6b\", \"t6c\", \"t6d\", \"t6e\", \"t6f\",\n \"t70\", \"t71\", \"t72\", \"t73\", \"t74\", \"t75\", \"t76\", \"t77\", \"t78\", \"t79\", \"t7a\", \"t7b\", \"t7c\", \"t7d\", \"t7e\", \"t7f\",\n \"t80\", \"t81\", \"t82\", \"t83\", \"t84\", \"t85\", \"t86\", \"t87\", \"t88\", \"t89\", \"t8a\", \"t8b\", \"t8c\", \"t8d\", \"t8e\", \"t8f\",\n \"t90\", \"t91\", \"t92\", \"t93\", \"t94\", \"t95\", \"t96\", \"t97\", \"t98\", \"t99\", \"t9a\", \"t9b\", \"t9c\", \"t9d\", \"t9e\", \"t9f\",\n \"ta0\", \"ta1\", \"ta2\", \"ta3\", \"ta4\", \"ta5\", \"ta6\", \"ta7\", \"ta8\", \"ta9\", \"taa\", \"tab\", \"tac\", \"tad\", \"tae\", \"taf\",\n \"tb0\", \"tb1\", \"tb2\", \"tb3\", \"tb4\", \"tb5\", \"tb6\", \"tb7\", \"tb8\", \"tb9\", \"tba\", \"tbb\", \"tbc\", \"tbd\", \"tbe\", \"tbf\",\n \"tc0\", \"tc1\", \"tc2\", \"tc3\", \"tc4\", \"tc5\", \"tc6\", \"tc7\", \"tc8\", \"tc9\", \"tca\", \"tcb\", \"tcc\", \"tcd\", \"tce\", \"tcf\",\n \"td0\", \"td1\", \"td2\", \"td3\", \"td4\", \"td5\", \"td6\", \"td7\", \"td8\", \"td9\", \"tda\", \"tdb\", \"tdc\", \"tdd\", \"tde\", \"tdf\",\n \"te0\", \"te1\", \"te2\", \"te3\", \"te4\", \"te5\", \"te6\", \"te7\", \"te8\", \"te9\", \"tea\", \"teb\", \"tec\", \"ted\", \"tee\", \"tef\",\n \"tf0\", \"tf1\", \"tf2\", \"tf3\", \"tf4\", \"tf5\", \"tf6\", \"tf7\", \"tf8\", \"tf9\", \"tfa\", \"tfb\", \"tfc\", \"tfd\", \"tfe\", \"tff\"\n ]\n },\n {\n \"path\" : \"D:\\hydrus_files_1\",\n \"ideal_weight\" : 5,\n \"max_num_bytes\": None,\n \"prefixes\" : [\n \"f00\", \"f02\", \"f04\", \"f05\", \"f08\", \"f0c\", \"f11\", \"f12\", \"f13\", \"f15\", \"f17\", \"f18\", \"f1a\", \"f1b\", \"f20\", \"f23\",\n \"f25\", \"f26\", \"f27\", \"f2b\", \"f2e\", \"f2f\", \"f31\", \"f35\", \"f36\", \"f37\", \"f38\", \"f3a\", \"f40\", \"f42\", \"f43\", \"f44\",\n \"f49\", \"f4b\", \"f4d\", \"f4e\", \"f50\", \"f51\", \"f55\", \"f59\", \"f60\", \"f63\", \"f64\", \"f65\", \"f66\", \"f68\", \"f69\", \"f6e\",\n \"f71\", \"f73\", \"f78\", \"f79\", \"f7a\", \"f7d\", \"f7f\", \"f82\", \"f83\", \"f84\", \"f86\", \"f87\", \"f88\", \"f89\", \"f8f\", \"f90\",\n \"f91\", \"f96\", \"f9e\", \"fa1\", \"fa4\", \"fa5\", \"fa7\", \"faa\", \"fad\", \"faf\", \"fb1\", \"fb9\", \"fba\", \"fbb\", \"fbf\", \"fc1\",\n \"fc4\", \"fc7\", \"fc8\", \"fcf\", \"fd2\", \"fd6\", \"fd7\", \"fd8\", \"fd9\", \"fdf\", \"fe2\", \"fe8\", \"fe9\", \"fea\", \"feb\", \"fec\",\n 
\"ff4\", \"ff7\", \"ffd\", \"ffe\"\n ]\n },\n {\n \"path\" : \"E:\\hydrus\\hydrus_files_2\",\n \"ideal_weight\" : 2,\n \"max_num_bytes\": 805306368000,\n \"prefixes\" : [\n \"f01\", \"f03\", \"f06\", \"f07\", \"f09\", \"f0a\", \"f0b\", \"f0d\", \"f0e\", \"f0f\", \"f10\", \"f14\", \"f16\", \"f19\", \"f1c\", \"f1d\",\n \"f1e\", \"f1f\", \"f21\", \"f22\", \"f24\", \"f28\", \"f29\", \"f2a\", \"f2c\", \"f2d\", \"f30\", \"f32\", \"f33\", \"f34\", \"f39\", \"f3b\",\n \"f3c\", \"f3d\", \"f3e\", \"f3f\", \"f41\", \"f45\", \"f46\", \"f47\", \"f48\", \"f4a\", \"f4c\", \"f4f\", \"f52\", \"f53\", \"f54\", \"f56\",\n \"f57\", \"f58\", \"f5a\", \"f5b\", \"f5c\", \"f5d\", \"f5e\", \"f5f\", \"f61\", \"f62\", \"f67\", \"f6a\", \"f6b\", \"f6c\", \"f6d\", \"f6f\",\n \"f70\", \"f72\", \"f74\", \"f75\", \"f76\", \"f77\", \"f7b\", \"f7c\", \"f7e\", \"f80\", \"f81\", \"f85\", \"f8a\", \"f8b\", \"f8c\", \"f8d\",\n \"f8e\", \"f92\", \"f93\", \"f94\", \"f95\", \"f97\", \"f98\", \"f99\", \"f9a\", \"f9b\", \"f9c\", \"f9d\", \"f9f\", \"fa0\", \"fa2\", \"fa3\",\n \"fa6\", \"fa8\", \"fa9\", \"fab\", \"fac\", \"fae\", \"fb0\", \"fb2\", \"fb3\", \"fb4\", \"fb5\", \"fb6\", \"fb7\", \"fb8\", \"fbc\", \"fbd\",\n \"fbe\", \"fc0\", \"fc2\", \"fc3\", \"fc5\", \"fc6\", \"fc9\", \"fca\", \"fcb\", \"fcc\", \"fcd\", \"fce\", \"fd0\", \"fd1\", \"fd3\", \"fd4\",\n \"fd5\", \"fda\", \"fdb\", \"fdc\", \"fdd\", \"fde\", \"fe0\", \"fe1\", \"fe3\", \"fe4\", \"fe5\", \"fe6\", \"fe7\", \"fed\", \"fee\", \"fef\",\n \"ff0\", \"ff1\", \"ff2\", \"ff3\", \"ff5\", \"ff6\", \"ff8\", \"ff9\", \"ffa\", \"ffb\", \"ffc\", \"fff\"\n ]\n }\n ]\n}\n
Note that
ideal_weight
andmax_num_bytes
are provided as a courtesy and do not guarantee anything on their own. Each storage location might store anything--thumbnails, files, or nothing--regardless of the ideal situation. Whenever a folder is non-ideal, the 'move media files' dialog shows \"files need to be moved now\", but the client will keep working regardless. For now, a prefix only occurs in one location, so there will always be 512 total prefixes in this response, all unique. However, please note that this will not always be true! In a future expansion, the client will, on user command, slowly migrate files from one place to another in the background, and during that time there will be multiple valid locations for a file to actually be. When this happens, you will have to check all the possible locations to find where a file actually is.
Also, it won't be long before the client supports moving to some form of three- and four-character prefix. I am still thinking about how this will happen, other than that it will be an atomic change--no slow migration where we try to support both at once--but it will certainly complicate something in here (e.g. while the prefix may be 'f012', maybe the subfolder will be '\\f01\\2'), so we'll see.
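To illustrate how a client might consume this, here is a minimal sketch in Python that, given the JSON response above, finds the subfolder(s) that should currently hold a particular file using the current two-character prefix rule. The function name is my own, and the exact filename inside that folder (hash plus extension) is out of scope here:

```python
import os

def get_candidate_file_subfolders(locations_response, sha256_hex):
    """Given the file storage locations response above, return every subfolder
    that could hold the file with this hash.

    For now each prefix maps to exactly one location, but this returns a list so
    it keeps working if multiple valid locations become possible in future."""
    prefix = 'f' + sha256_hex[:2]  # file prefixes are 'f' plus the first two hex characters; thumbnails use 't'
    return [
        os.path.join(location['path'], prefix)
        for location in locations_response['locations']
        if prefix in location['prefixes']
    ]
```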
"},{"location":"developer_api.html#get_files_render","title":"GET/get_files/render
","text":"Get an image or ugoira file as rendered by Hydrus.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments :file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)download
: (optional, boolean, defaultfalse
)render_format
: (optional, integer, the filetype enum value to render the file to, for still images it defaults to2
for PNG, for Ugoiras it defaults to23
for APNG)render_quality
: (optional, integer, the quality or PNG compression level to use for encoding the image, default1
for PNG and80
for JPEG and WEBP, has no effect for Ugoiras using APNG)width
andheight
: (optional but must provide both if used, integer, the width and height to scale the image to. Doesn't apply to Ugoiras)
Only use one of file_id or hash. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
Currently the accepted values for
render_format
for image files are:1
for JPEG (quality
sets JPEG quality 0 to 100, always progressive 4:2:0 encoding)2
for PNG (quality
sets the compression level from 0 to 9. A higher value means a smaller size and longer compression time)33
for WEBP (quality
sets WEBP quality 1 to 100, for values over 100 lossless compression is used)
The accepted values for Ugoiras are:
23
for APNG (quality
does nothing for this format)83
for animated WEBP (quality
sets WEBP quality 1 to 100, for values over 100 lossless compression is used)
The file you request must be a still image file that Hydrus can render (this includes PSD files) or a Ugoira file. This request uses the client image cache for images.
Example request
Example request/get_files/render?file_id=452158\n
Response: A PNG (or APNG), JPEG, or WEBP file of the image as would be rendered in the client, optionally resized as specified in the query parameters. It will be converted to sRGB color if the file had a color profile but the rendered file will not have any color profile./get_files/render?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true\n
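As a usage sketch, here is a small Python example using the `requests` library that saves a 512x512-bounded JPEG render of a file. The port, the `Hydrus-Client-API-Access-Key` header value, and the output filename are assumptions you would swap for your own setup:

```python
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

params = {
    'file_id': 452158,
    'render_format': 1,    # 1 = JPEG, per the enum above
    'render_quality': 80,  # JPEG quality 0-100
    'width': 512,
    'height': 512,
}

response = requests.get(API_URL + '/get_files/render', params=params, headers=HEADERS)
response.raise_for_status()

with open('render.jpg', 'wb') as f:
    f.write(response.content)
```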
By default, this will set the Content-Disposition header to inline, which causes a web browser to show the file. If you set download=true, it will set it to attachment, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.
"},{"location":"developer_api.html#managing_file_relationships","title":"Managing File Relationships","text":"This refers to the File Relationships system, which includes 'potential duplicates', 'duplicates', and 'alternates'.
This system is pending significant rework and expansion, so please do not get too married to some of the routines here. I am mostly just exposing my internal commands, so things are a little ugly/hacked. I expect duplicate and alternate groups to get some form of official identifier in future, which may end up being the way to refer and edit things here.
Also, at least for now, 'Manage File Relationships' permission is not going to be bound by the search permission restrictions that normal file search does. Getting this file relationship management permission allows you to search anything.
There is more work to do here, including adding various 'dissolve'/'undo' commands to break groups apart.
"},{"location":"developer_api.html#manage_file_relationships_get_file_relationships","title":"GET/manage_file_relationships/get_file_relationships
","text":"Get the current relationships for one or more files.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- files
- file domain (optional, defaults to all my files)
Response: A JSON Object mapping the hashes to their relationships. Example response/manage_file_relationships/get_file_relationships?hash=ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\n
{\n \"file_relationships\" : {\n \"ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\" : {\n \"is_king\" : false,\n \"king\" : \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"king_is_on_file_domain\" : true,\n \"king_is_local\" : true,\n \"0\" : [\n ],\n \"1\" : [],\n \"3\" : [\n \"8bf267c4c021ae4fd7c4b90b0a381044539519f80d148359b0ce61ce1684fefe\"\n ],\n \"8\" : [\n \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"3fa8ef54811ec8c2d1892f4f08da01e7fc17eed863acae897eb30461b051d5c3\"\n ]\n }\n }\n}\n
king
refers to which file is set as the best of a duplicate group. If you are doing potential duplicate comparisons, the kings of your two groups are usually the ideal representatives, and the 'get some pairs to filter'-style commands try to select the kings of the various to-be-compared duplicate groups.is_king
is a convenience bool for when a file is king of its own group.It is possible for the king to not be available. Every group has a king, but if that file has been deleted, or if the file domain here is limited and the king is on a different file service, then it may not be available. A similar issue occurs when you search for filtering pairs--while it is ideal to compare kings with kings, if you set 'files must be pixel dupes', then the user will expect to see those pixel duplicates, not their champions--you may be forced to compare non-kings.
king_is_on_file_domain
lets you know if the king is on the file domain you set, andking_is_local
lets you know if it is on the hard disk--ifking_is_local=true
, you can do a/get_files/file
request on it. It is generally rare, but you have to deal with the king being unavailable--in this situation, your best bet is to just use the file itself as its own representative.All the relationships you get are filtered by the file domain. If you set the file domain to 'all known files', you will get every relationship a file has, including all deleted files, which is often less useful than you would think. The default, 'all my files', is usually most useful.
A file that has no duplicates is considered to be in a duplicate group of size 1 and thus is always its own king.
The numbers are from a duplicate status enum, as so:
- 0 - potential duplicates
- 1 - false positives
- 3 - alternates
- 8 - duplicates
Note that because of JSON constraints, these are the string versions of the integers since they are Object keys.
All the hashes given here are in 'all my files', i.e. not in the trash. A file may have duplicates that have long been deleted, but, like the null king above, they will not show here.
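As a usage sketch (Python with `requests`; the port and access key header are assumptions for your own setup), this fetches the relationships for a hash and picks out its king, alternates, and duplicate group members:

```python
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

the_hash = 'ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d'

response = requests.get(
    API_URL + '/manage_file_relationships/get_file_relationships',
    params={'hash': the_hash},
    headers=HEADERS,
)
response.raise_for_status()

relationships = response.json()['file_relationships'][the_hash]

king = relationships['king']        # the best file of this file's duplicate group
alternates = relationships['3']     # '3' = alternates, per the enum above
duplicates = relationships['8']     # '8' = duplicates, i.e. the group members

if relationships['king_is_local']:
    # the king is on disk, so a /get_files/file request for it will work
    print('king is {}'.format(king))
```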
"},{"location":"developer_api.html#manage_file_relationships_get_potentials_count","title":"GET/manage_file_relationships/get_potentials_count
","text":"Get the count of remaining potential duplicate pairs in a particular search domain. Exactly the same as the counts you see in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- file domain (optional, defaults to all my files)
tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)
/manage_file_relationships/get_potentials_count?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0\n
tag_service_key_x
andtags_x
work the same as /get_files/search_files. The_2
variants are only useful if thepotentials_search_type
is 2.potentials_search_type
andpixel_duplicates
are enums:- 0 - one file matches search 1
- 1 - both files match search 1
- 2 - one file matches search 1, the other 2
-and-
- 0 - must be pixel duplicates
- 1 - can be pixel duplicates
- 2 - must not be pixel duplicates
The max_hamming_distance is the same 'search distance' you see in the Client UI. A higher number means more speculative 'similar files' search. If pixel_duplicates is set to 'must be', then max_hamming_distance is obviously ignored.
Response: A JSON Object stating the count. Example response{\n    \"potential_duplicates_count\" : 17\n}\n
If you confirm that a pair of potentials are duplicates, this may transitively collapse other potential pairs and decrease the count by more than 1.
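Since the list arguments go up as percent-encoded JSON, here is a sketch of the example request above in Python with `requests`, which handles the percent-encoding for you when you pass the JSON as a string parameter. The port and access key header are assumptions:

```python
import json
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

params = {
    'tag_service_key_1': 'c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f',
    'tags_1': json.dumps(['dupes_to_process', 'system:width<400']),  # list arguments are sent as JSON
    'potentials_search_type': 1,  # both files must match search 1
    'pixel_duplicates': 2,        # must not be pixel duplicates
    'max_hamming_distance': 0,
}

response = requests.get(
    API_URL + '/manage_file_relationships/get_potentials_count',
    params=params,
    headers=HEADERS,
)
response.raise_for_status()

print(response.json()['potential_duplicates_count'])
```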
"},{"location":"developer_api.html#manage_file_relationships_get_potential_pairs","title":"GET/manage_file_relationships/get_potential_pairs
","text":"Get some potential duplicate pairs for a filtering workflow. Exactly the same as the 'duplicate filter' in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- file domain (optional, defaults to all my files)
tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)max_num_pairs
: (optional, integer, defaults to client's option, how many pairs to get in a batch)
/manage_file_relationships/get_potential_pairs?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0&max_num_pairs=50\n
The search arguments work the same as /manage_file_relationships/get_potentials_count.
Response: A JSON Object listing a batch of hash pairs. Example responsemax_num_pairs
is simple and just caps how many pairs you get.{\n \"potential_duplicate_pairs\" : [\n [ \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\", \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\" ],\n [ \"eeea390357f259b460219d9589b4fa11e326403208097b1a1fbe63653397b210\", \"9215dfd39667c273ddfae2b73d90106b11abd5fd3cbadcc2afefa526bb226608\" ],\n [ \"a1ea7d671245a3ae35932c603d4f3f85b0d0d40c5b70ffd78519e71945031788\", \"8e9592b2dfb436fe0a8e5fa15de26a34a6dfe4bca9d4363826fac367a9709b25\" ]\n ]\n}\n
The selected pair sample and their order are strictly hardcoded for now (e.g. to guarantee that a decision will not invalidate any other pair in the batch, you shouldn't see the same file twice in a batch, nor two files from the same duplicate group). Treat it as the client filter does, where you fetch batches to process one after another. I expect to make it more flexible in future, both in the client itself and here.
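A sketch of that batch workflow in Python with `requests` (port and access key header assumed; the `process_pair` stub stands in for your own decision logic): fetch a batch, action every pair, then fetch the next batch until the list comes back empty.

```python
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

def process_pair(hash_a, hash_b):
    # assumption: your own comparison logic goes here, typically ending in a
    # /manage_file_relationships/set_file_relationships call that commits a decision
    pass

while True:
    response = requests.get(
        API_URL + '/manage_file_relationships/get_potential_pairs',
        params={'max_num_pairs': 50},
        headers=HEADERS,
    )
    response.raise_for_status()

    pairs = response.json()['potential_duplicate_pairs']

    if not pairs:
        break  # nothing left in this search domain

    for hash_a, hash_b in pairs:
        process_pair(hash_a, hash_b)
```

Note the loop only finishes because the decisions committed inside `process_pair` remove pairs from the search; if you fetch batches without actioning them, the same pairs will come back.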
You will see significantly fewer than max_num_pairs (and fewer than the potential duplicate count) as you get close to the last available pairs, and when there are none left, you will get an empty list.
"},{"location":"developer_api.html#manage_file_relationships_get_random_potentials","title":"GET/manage_file_relationships/get_random_potentials
","text":"Get some random potentially duplicate file hashes. Exactly the same as the 'show some random potential dupes' button in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):- file domain (optional, defaults to all my files)
tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the files should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the files should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the files)
/manage_file_relationships/get_random_potentials?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0\n
The arguments work the same as /manage_file_relationships/get_potentials_count, with the caveat that
potentials_search_type
has special logic:- 0 - first file matches search 1
- 1 - all files match search 1
- 2 - first file matches search 1, the others 2
Essentially, the first hash is the 'master' to which the others are paired. The other files will include every matching file.
Response: A JSON Object listing a group of hashes exactly as the client would. Example response{\n \"random_potential_duplicate_hashes\" : [\n \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\",\n \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\",\n \"9e0d6b928b726562d70e1f14a7b506ba987c6f9b7f2d2e723809bb11494c73e6\",\n \"9e01744819b5ff2a84dda321e3f1a326f40d0e7f037408ded9f18a11ee2b2da8\"\n ]\n}\n
If there are no potential duplicate groups in the search, this returns an empty list.
"},{"location":"developer_api.html#manage_file_relationships_remove_potentials","title":"POST/manage_file_relationships/remove_potentials
","text":"Remove all potential pairs that any of the given files are a part of. If you hit /manage_file_relationships/get_file_relationships after this on any of these files, they will have no potential relationships, and any hashes that were potential to them before will no longer, conversely, refer to these files as potentials.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 with no content.{\n \"file_id\" : 123\n}\n
If the files are a part of any potential pairs (with any files, including those you did not specify), those pairs will be deleted. This deletes everything they are involved in, and the files will not be queued up for a re-scan, so I recommend you only do this if you know you added the potentials yourself (e.g. this is regarding video files) or you otherwise have a plan to replace the deleted potential pairs with something more useful.
"},{"location":"developer_api.html#manage_file_relationships_set_file_relationships","title":"POST/manage_file_relationships/set_file_relationships
","text":"Set the relationships to the specified file pairs.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json
relationships
: (a list of Objects, one for each file-pair being set)
Each Object is:
* `hash_a`: (a hexadecimal SHA256 hash)\n* `hash_b`: (a hexadecimal SHA256 hash)\n* `relationship`: (integer enum for the relationship being set)\n* `do_default_content_merge`: (bool)\n* `delete_a`: (optional, bool, default false)\n* `delete_b`: (optional, bool, default false)\n
hash_a
andhash_b
are normal hex SHA256 hashes for your file pair.relationship
is one of this enum:- 0 - set as potential duplicates
- 1 - set as false positives
- 2 - set as same quality
- 3 - set as alternates
- 4 - set A as better
- 7 - set B as better
2, 4, and 7 all make the files 'duplicates' (8 under
/get_file_relationships
), which, specifically, merges the two files' duplicate groups. 'same quality' has different duplicate content merge options to the better/worse choices, but it ultimately sets something similar to A>B (but see below for more complicated outcomes). You obviously don't have to use 'B is better' if you prefer just to swap the hashes. Do what works for you.do_default_content_merge
sets whether the user's duplicate content merge options should be loaded and applied to the files along with the relationship. Most operations in the client do this automatically, so the user may expect it to apply, but if you want to do content merge yourself, set this to false.
Example request bodydelete_a
anddelete_b
are booleans that select whether to delete A and/or B in the same operation as setting the relationship. You can also do this externally if you prefer.
Response: 200 with no content.{\n \"relationships\" : [\n {\n \"hash_a\" : \"b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2\",\n \"hash_b\" : \"bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"22667427eaa221e2bd7ef405e1d2983846c863d40b2999ce8d1bf5f0c18f5fb2\",\n \"hash_b\" : \"65d228adfa722f3cd0363853a191898abe8bf92d9a514c6c7f3c89cfed0bf423\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"0480513ffec391b77ad8c4e57fe80e5b710adfa3cb6af19b02a0bd7920f2d3ec\",\n \"hash_b\" : \"5fab162576617b5c3fc8caabea53ce3ab1a3c8e0a16c16ae7b4e4a21eab168a7\",\n \"relationship\" : 2,\n \"do_default_content_merge\" : true\n }\n ]\n}\n
If you try to add an invalid or redundant relationship, for instance setting files that are already duplicates as potential duplicates, no changes are made.
This is the file relationships request that is probably most likely to change in future. I may implement content merge options. I may move from file pairs to group identifiers. When I expand alternates, those file groups are going to support more variables.
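As a usage sketch in Python with `requests` (port and access key header assumed), this commits one 'A is better' decision with the default content merge and deletes the worse file, mirroring the first entry of the example body above:

```python
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

body = {
    'relationships': [
        {
            'hash_a': 'b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2',
            'hash_b': 'bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845',
            'relationship': 4,  # 4 = set A as better
            'do_default_content_merge': True,
            'delete_b': True,
        }
    ]
}

response = requests.post(
    API_URL + '/manage_file_relationships/set_file_relationships',
    json=body,
    headers=HEADERS,
)
response.raise_for_status()
```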
"},{"location":"developer_api.html#king_merge_rules","title":"king merge rules","text":"Recall in
/get_file_relationships
that we discussed how duplicate groups have a 'king' for their best file. This file is the most useful representative when you do comparisons, since if you say \"King A > King B\", then we know that King A is also better than all of King B's normal duplicate group members. We can merge the group simply by folding King B and all the other members into King A's group. So what happens if you say 'A = B'? We have to have a king, so which should it be?
What happens if you say \"non-king member of A > non-king member of B\"? We don't want to merge all of B into A, since King B might be higher quality than King A.
The logic here can get tricky, but I have tried my best to avoid overcommitting and accidentally promoting the wrong king. Here are all the possible situations ('>' means 'better than', and '=' means 'same quality as'):
Merges- King A > King B
- Merge B into A
- King A is king of the new combined group
- Non-King A > King B
- Merge B into A
- King of A is king of the new combined group
- King A > Non-King B
- Remove Non-King B from B and merge it into A
- King A stays king of A
- King of B stays king of B
- Non-King A > Non-King B
- Remove Non-King B from B and merge it into A
- King of A stays king of A
- King of B stays king of B
- King A = King B
- Merge B into A
- King A is king of the new combined group
- Non-King A = King B
- Merge B into A
- King of A is king of the new combined group
- King A = Non-King B
- Merge A into B
- King of B is king of the new combined group
- Non-King A = Non-King B
- Remove Non-King B from B and merge it into A
- King of A stays king of A
- King of B stays king of B
So, if you can, always present kings to your users, and action using those kings' hashes. It makes the merge logic easier in all cases. Remember that you can set system:is the best quality file of its duplicate group in any file search to exclude any non-kings (e.g. if you are hunting for easily actionable pixel potential duplicates).
"},{"location":"developer_api.html#manage_file_relationships_set_kings","title":"POST/manage_file_relationships/set_kings
","text":"Set the specified files to be the kings of their duplicate groups.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json
- files
Response: 200 with no content.{\n \"file_id\" : 123\n}\n
The files will be promoted to be the kings of their respective duplicate groups. If the file is already the king (also true for any file with no duplicates), this is idempotent. It also processes the files in the given order, so if you specify two files in the same group, the latter will be the king at the end of the request.
"},{"location":"developer_api.html#managing_services","title":"Managing Services","text":"For now, this refers to just seeing and committing pending content (which you see in the main \"pending\" menubar if you have an IPFS, Tag Repository, or File Repository service).
"},{"location":"developer_api.html#manage_services_get_pending_counts","title":"GET/manage_services/get_pending_counts
","text":"Get the counts of pending content for each upload-capable service. This basically lets you construct the \"pending\" menu in the main GUI menubar.
Restricted access: YES. Start Upload permission needed.Required Headers: n/a
Arguments: n/a
Example request
Response: A JSON Object of all the service keys capable of uploading and their current pending content counts. Also The Services Object. Example response/manage_services/get_pending_counts\n
{\n \"services\" : \"The Services Object\",\n \"pending_counts\" : {\n \"ae91919b0ea95c9e636f877f57a69728403b65098238c1a121e5ebf85df3b87e\" : {\n \"pending_tag_mappings\" : 11564,\n \"petitioned_tag_mappings\" : 5,\n \"pending_tag_siblings\" : 2,\n \"petitioned_tag_siblings\" : 0,\n \"pending_tag_parents\" : 0,\n \"petitioned_tag_parents\" : 0\n },\n \"3902aabc3c4c89d1b821eaa9c011be3047424fd2f0c086346e84794e08e136b0\" : {\n \"pending_tag_mappings\" : 0,\n \"petitioned_tag_mappings\" : 0,\n \"pending_tag_siblings\" : 0,\n \"petitioned_tag_siblings\" : 0,\n \"pending_tag_parents\" : 0,\n \"petitioned_tag_parents\" : 0\n },\n \"e06e1ae35e692d9fe2b83cde1510a11ecf495f51910d580681cd60e6f21fde73\" : {\n \"pending_files\" : 2,\n \"petitioned_files\" : 0\n }\n }\n}\n
The keys are as in /get_services.
Each count here represents one 'row' of content, so for \"tag_mappings\" that is one (tag, file) pair and for \"tag_siblings\" one (tag, tag) pair. You always get everything, even if the counts are all 0.
"},{"location":"developer_api.html#manage_services_commit_pending","title":"POST/manage_services/commit_pending
","text":"Start the job to upload a service's pending content.
Restricted access: YES. Start Upload permission needed.Required Headers: n/a
Arguments (in JSON):service_key
: (the service to commit)
{\n \"service_key\" : \"ae91919b0ea95c9e636f877f57a69728403b65098238c1a121e5ebf85df3b87e\"\n}\n
This starts the upload popup, just like if you click 'commit' in the menu. This upload could ultimately take one second or several minutes to finish, but the response will come back immediately.
If the job is already running, this will return 409. If it cannot start because of a difficult problem, like all repositories being paused or the service account object being unsynced or something, it gives 422; in this case, please direct the user to check their client manually, since there is probably an error popup on screen.
If tracking the upload job's progress is important, you could hit this endpoint again and see if it gives 409, or you could hit /manage_services/get_pending_counts again--since the counts will update live as the upload happens--but note that the user may pend more content just after the upload is complete, so do not wait forever for the count to fall back down to 0.
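Putting the two calls together, here is a sketch in Python with `requests` (port and access key header assumed) that commits pending content for every service that has anything waiting:

```python
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

response = requests.get(API_URL + '/manage_services/get_pending_counts', headers=HEADERS)
response.raise_for_status()

for service_key, counts in response.json()['pending_counts'].items():
    if sum(counts.values()) == 0:
        continue  # nothing to upload for this service

    commit = requests.post(
        API_URL + '/manage_services/commit_pending',
        json={'service_key': service_key},
        headers=HEADERS,
    )

    if commit.status_code == 409:
        print('an upload is already running for {}'.format(service_key))
    elif commit.status_code == 422:
        print('could not start the upload for {}; check the client for an error popup'.format(service_key))
    else:
        commit.raise_for_status()
```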
"},{"location":"developer_api.html#manage_services_forget_pending","title":"POST/manage_services/forget_pending
","text":"Forget all pending content for a service.
Restricted access: YES. Start Upload permission needed.Required Headers: n/a
Arguments (in JSON):service_key
: (the service to forget for)
{\n \"service_key\" : \"ae91919b0ea95c9e636f877f57a69728403b65098238c1a121e5ebf85df3b87e\"\n}\n
This clears all pending content for a service, just like if you click 'forget' in the menu.
Response description: 200 and no content."},{"location":"developer_api.html#managing_cookies","title":"Managing Cookies","text":"This refers to the cookies held in the client's session manager, which you can review under network->data->manage session cookies. These are sent to every request on the respective domains.
"},{"location":"developer_api.html#manage_cookies_get_cookies","title":"GET/manage_cookies/get_cookies
","text":"Get the cookies for a particular domain.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
Response: A JSON Object listing all the cookies for that domain in [ name, value, domain, path, expires ] format. Example response/manage_cookies/get_cookies?domain=gelbooru.com\n
{\n \"cookies\" : [\n [\"__cfduid\", \"f1bef65041e54e93110a883360bc7e71\", \".gelbooru.com\", \"/\", 1596223327],\n [\"pass_hash\", \"0b0833b797f108e340b315bc5463c324\", \"gelbooru.com\", \"/\", 1585855361],\n [\"user_id\", \"123456\", \"gelbooru.com\", \"/\", 1585855361]\n ]\n}\n
"},{"location":"developer_api.html#manage_cookies_set_cookies","title":"POSTNote that these variables are all strings except 'expires', which is either an integer timestamp or _null_ for session cookies.\n\nThis request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.\n
/manage_cookies/set_cookies
","text":"Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers:Content-Type
: application/json
cookies
: (a list of cookie rows in the same format as the GET request above)
{\n \"cookies\" : [\n [\"PHPSESSID\", \"07669eb2a1a6e840e498bb6e0799f3fb\", \".somesite.com\", \"/\", 1627327719],\n [\"tag_filter\", \"1\", \".somesite.com\", \"/\", 1627327719]\n ]\n}\n
You can set 'value' to be null, which will clear any existing cookie with the corresponding name, domain, and path (acting essentially as a delete).
Expires can be null, but session cookies will time-out in hydrus after 60 minutes of non-use.
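For example, a sketch in Python with `requests` (port, access key header, and the cookie values are all assumptions) that copies a session cookie from a browser into hydrus and clears an old one:

```python
import time
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

one_year_from_now = int(time.time()) + 365 * 86400

body = {
    'cookies': [
        # [ name, value, domain, path, expires ]
        ['PHPSESSID', '07669eb2a1a6e840e498bb6e0799f3fb', '.somesite.com', '/', one_year_from_now],
        ['old_cookie_to_delete', None, '.somesite.com', '/', None],  # a null value deletes the existing cookie
    ]
}

response = requests.post(API_URL + '/manage_cookies/set_cookies', json=body, headers=HEADERS)
response.raise_for_status()
```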
"},{"location":"developer_api.html#managing_http_headers","title":"Managing HTTP Headers","text":"This refers to the custom headers you can see under network->data->manage http headers.
"},{"location":"developer_api.html#manage_headers_get_headers","title":"GET/manage_headers/get_headers
","text":"Get the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
: optional, the domain to fetch headers for
Example request/manage_headers/get_headers?domain=gelbooru.com\n
Example request (for global)/manage_headers/get_headers\n
Response: A JSON Object listing all the headers. Example response
{\n    \"network_context\" : {\n        \"type\" : 2,\n        \"data\" : \"gelbooru.com\"\n    },\n    \"headers\" : {\n        \"User-Agent\" : {\n            \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\",\n            \"approved\" : \"approved\",\n            \"reason\" : \"Set by Client API\"\n        },\n        \"DNT\" : {\n            \"value\" : \"1\",\n            \"approved\" : \"approved\",\n            \"reason\" : \"Set by Client API\"\n        }\n    }\n}\n
"},{"location":"developer_api.html#manage_headers_set_headers","title":"POST/manage_headers/set_headers
","text":"Manages the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: *Content-Type
: application/json Arguments (in JSON):domain
: (optional, the specific domain to set the header for)headers
: (a JSON Object that holds \"key\" objects)
Example request body{\n    \"domain\" : \"mysite.com\",\n    \"headers\" : {\n        \"User-Agent\" : {\n            \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n        },\n        \"DNT\" : {\n            \"value\" : \"1\"\n        },\n        \"CoolStuffToken\" : {\n            \"value\" : \"abcdef0123456789\",\n            \"approved\" : \"pending\",\n            \"reason\" : \"This unlocks the Sonic fanfiction!\"\n        }\n    }\n}\n
Example request body that deletes
{\n \"domain\" : \"myothersite.com\",\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : null\n },\n \"Authorization\" : {\n \"value\" : null\n }\n }\n}\n
If you do not set a domain, or you set it to
null
, the 'context' will be the global context, which applies as a fallback to all jobs.Domain headers also apply to their subdomains--unless they are overwritten by specific subdomain entries.
Each
key
Object underheaders
has the same form as /manage_headers/get_headers.value
is obvious--it is the value of the header. If the pair doesn't exist yet, you need thevalue
, but if you just want to approve something, it is optional. Set it tonull
to delete an existing pair.You probably won't ever use
approved
orreason
, but they plug into the 'validation' system in the client. They are both optional. Approved can be any of[ approved, denied, pending ]
, and by default everything you add will beapproved
. If there is anythingpending
when a network job asks, the user will be presented with a yes/no popup presenting the reason for the header. If they click 'no', the header is set todenied
and the network job goes ahead without it. If you have a header that changes behaviour or unlocks special content, you might like to make it optional in this way. If you need to reinstate it, the default global User-Agent is Mozilla/5.0 (compatible; Hydrus Client).
"},{"location":"developer_api.html#manage_headers_set_user_agent","title":"POST/manage_headers/set_user_agent
","text":"This is deprecated--move to /manage_headers/set_headers!
This sets the 'Global' User-Agent for the client, as typically editable under network->data->manage http headers, for instance if you want hydrus to appear as a specific browser associated with some cookies.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: *Content-Type
: application/json Arguments (in JSON):user-agent
: (a string)
{\n \"user-agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n}\n
Send an empty string to reset the client back to the default User-Agent, which should be Mozilla/5.0 (compatible; Hydrus Client).
"},{"location":"developer_api.html#managing_pages","title":"Managing Pages","text":"This refers to the pages of the main client UI.
"},{"location":"developer_api.html#manage_pages_get_pages","title":"GET/manage_pages/get_pages
","text":"Get the page structure of the current UI session.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments: n/a
Response: A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Page of pages beneath it will list their own sub-page lists. Example response{\n \"pages\" : {\n \"name\" : \"top pages notebook\",\n \"page_key\" : \"3b28d8a59ec61834325eb6275d9df012860a1ecfd9e1246423059bc47fb6d5bd\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"is_media_page\" : false,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"files\",\n \"page_key\" : \"d436ff5109215199913705eb9a7669d8a6b67c52e41c3b42904db083255ca84d\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"is_media_page\" : true,\n \"selected\" : false\n },\n {\n \"name\" : \"thread watcher\",\n \"page_key\" : \"40887fa327edca01e1d69b533dddba4681b2c43e0b4ebee0576177852e8c32e7\",\n \"page_state\" : 0,\n \"page_type\" : 9,\n \"is_media_page\" : true,\n \"selected\" : false\n },\n {\n \"name\" : \"pages\",\n \"page_key\" : \"2ee7fa4058e1e23f2bd9e915cdf9347ae90902a8622d6559ba019a83a785c4dc\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"is_media_page\" : false,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"urls\",\n \"page_key\" : \"9fe22cb760d9ee6de32575ed9f27b76b4c215179cf843d3f9044efeeca98411f\",\n \"page_state\" : 0,\n \"page_type\" : 7,\n \"is_media_page\" : true,\n \"selected\" : true\n },\n {\n \"name\" : \"files\",\n \"page_key\" : \"2977d57fc9c588be783727bcd54225d577b44e8aa2f91e365a3eb3c3f580dc4e\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"is_media_page\" : true,\n \"selected\" : false\n }\n ]\n }\n ]\n }\n}\n
name
is the full text on the page tab.page_key
is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a session reload.page_type
is as follows:- 1 - Gallery downloader
- 2 - Simple downloader
- 3 - Hard drive import
- 5 - Petitions (used by repository janitors)
- 6 - File search
- 7 - URL downloader
- 8 - Duplicates
- 9 - Thread watcher
- 10 - Page of pages
page_state
is as follows:- 0 - ready
- 1 - initialising
- 2 - searching/loading
- 3 - search cancelled
Most pages will be 0, normal/ready, at all times. Large pages will start in an 'initialising' state for a few seconds, which means their session-saved thumbnails aren't loaded yet. Search pages will enter 'searching' after a refresh or search change and will either return to 'ready' when the search is complete, or fall to 'search cancelled' if the search was interrupted (usually this means the user clicked the 'stop' button that appears after some time).
is_media_page
is simply a shorthand for whether the page is a normal page that holds thumbnails or a 'page of pages'. Only media pages can have files (and accept /manage_pages/add_files commands).selected
means which page is currently in view. It will propagate down the page of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a media page.The top page of pages will always be there, and always selected.
"},{"location":"developer_api.html#manage_pages_get_page_info","title":"GET/manage_pages/get_page_info
","text":"Get information about a specific page.
Under Construction
This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting) will come later.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments:page_key
: (hexadecimal page_key as stated in /manage_pages/get_pages)simple
: true or false (optional, defaulting to true)
Response description A JSON Object of the page's information. At present, this mostly means downloader information. Example response with simple = true/manage_pages/get_page_info?page_key=aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da&simple=true\n
{\n \"page_info\" : {\n \"name\" : \"threads\",\n \"page_key\" : \"aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da\",\n \"page_state\" : 0,\n \"page_type\" : 3,\n \"is_media_page\" : true,\n \"management\" : {\n \"multiple_watcher_import\" : {\n \"watcher_imports\" : [\n {\n \"url\" : \"https://someimageboard.net/m/123456\",\n \"watcher_key\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\",\n \"created\" : 1566164269,\n \"last_check_time\" : 1566164272,\n \"next_check_time\" : 1566174272,\n \"files_paused\" : false,\n \"checking_paused\" : false,\n \"checking_status\" : 0,\n \"subject\" : \"gundam pictures\",\n \"imports\" : {\n \"status\" : \"4 successful (2 already in db)\",\n \"simple_status\" : \"4\",\n \"total_processed\" : 4,\n \"total_to_process\" : 4\n },\n \"gallery_log\" : {\n \"status\" : \"1 successful\",\n \"simple_status\" : \"1\",\n \"total_processed\" : 1,\n \"total_to_process\" : 1\n }\n },\n {\n \"url\" : \"https://someimageboard.net/a/1234\",\n \"watcher_key\" : \"6bc17555b76da5bde2dcceedc382cf7d23281aee6477c41b643cd144ec168510\",\n \"created\" : 1566063125,\n \"last_check_time\" : 1566063133,\n \"next_check_time\" : 1566104272,\n \"files_paused\" : false,\n \"checking_paused\" : true,\n \"checking_status\" : 1,\n \"subject\" : \"anime pictures\",\n \"imports\" : {\n \"status\" : \"124 successful (22 already in db), 2 previously deleted\",\n \"simple_status\" : \"124\",\n \"total_processed\" : 124,\n \"total_to_process\" : 124\n },\n \"gallery_log\" : {\n \"status\" : \"3 successful\",\n \"simple_status\" : \"3\",\n \"total_processed\" : 3,\n \"total_to_process\" : 3\n }\n }\n ]\n },\n \"highlight\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\"\n }\n },\n \"media\" : {\n \"num_files\" : 4\n }\n}\n
name
,page_key
,page_state
, andpage_type
are as in /manage_pages/get_pages.As you can see, even the 'simple' mode can get very large. Imagine that response for a page watching 100 threads! Turning simple mode off will display every import item, gallery log entry, and all hashes in the media (thumbnail) panel.
For this first version, the five importer pages--hdd import, simple downloader, url downloader, gallery page, and watcher page--all give rich info based on their specific variables. The first three only have one importer/gallery log combo, but the latter two of course can have multiple. The \"imports\" and \"gallery_log\" entries are all in the same data format.
"},{"location":"developer_api.html#manage_pages_add_files","title":"POST/manage_pages/add_files
","text":"Add files to a page.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
page_key
: (the page key for the page you wish to add files to)- files
The files you set will be appended to the given page, just like a thumbnail drag and drop operation. The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body
Response: 200 with no content. If the page key is not found, it will 404. If you try to add files to a 'page of pages' (i.e.{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\",\n \"file_ids\" : [123, 124, 125]\n}\n
is_media_page=false
in the /manage_pages/get_pages call), you'll get 400."},{"location":"developer_api.html#manage_pages_focus_page","title":"POST/manage_pages/focus_page
","text":"'Show' a page in the main GUI, making it the current page in view. If it is already the current page, no change is made.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
page_key
: (the page key for the page you wish to show)
The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body
Response: 200 with no content. If the page key is not found, this will 404."},{"location":"developer_api.html#manage_pages_refresh_page","title":"POST{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
/manage_pages/refresh_page
","text":"Refresh a page in the main GUI. Like hitting F5 in the client, this obviously makes file search pages perform their search again, but for other page types it will force the currently in-view files to be re-sorted.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
page_key
: (the page key for the page you wish to refresh)
The page key is the same as fetched in the /manage_pages/get_pages call. If a file search page is not set to 'searching immediately', a 'refresh' command does nothing.
Example request body
Response: 200 with no content. If the page key is not found, this will 404.{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
Poll the
"},{"location":"developer_api.html#managing_popups","title":"Managing Popups","text":"page_state
in /manage_pages/get_pages or /manage_pages/get_page_info to see when the search is complete.Under Construction
This is under construction. The popup managment APIs and data structures may change in future versions.
"},{"location":"developer_api.html#job_status_objects","title":"Job Status Objects","text":"Job statuses represent shared information about a job in hydrus. In the API they are currently only used for popups.
Job statuses have these fields:
key
: the generated hex key identifying the job statuscreation_time
: the UNIX timestamp when the job status was created, as a floating point number in seconds.status_title
: the title for the job statusstatus_text_1
andstatus_text_2
: Two fields for body texthad_error
: a boolean indicating if the job status has an error.traceback
: if the job status has an error this will contain the traceback text.is_cancellable
: a boolean indicating the job can be cancelled.is_cancelled
: a boolean indicating the job has been cancelled.is_deleted
: a boolean indicating the job status has been dismissed but not removed yet.is_pausable
: a boolean indicating the job can be pausedis_paused
: a boolean indicating the job is paused.is_working
: a boolean indicating whether the job is currently working.nice_string
: a string representing the job status. This is generated using thestatus_title
,status_text_1
,status_text_2
, andtraceback
if present.attached_files_mergable
: a boolean indicating whether the files in the job status can be merged with the files of another submitted job status with the same label.popup_gauge_1
andpopup_gauge_2
: each of these is a 2 item array of numbers representing a progress bar shown in the client. The first number is the current value and the second is the maximum of the range. The minimum is always 0. When using these in combination with thestatus_text
fields they are shown in this order:status_text_1
,popup_gauge_1
,status_text_2
,popup_gauge_2
.api_data
: an arbitrary object for use by API clients.files
: an object representing the files attached to this job status, shown as a button in the client that opens a search page for the given hashes. It has these fields:hashes
: an array of sha256 hashes.label
: the label for the show files button.
user_callable_label
: if the job status has a user callable function this will be the label for the button that triggers it.network_job
: An object representing the current network job. It has these fields:url
: the url being downloaded.waiting_on_connection_error
: booleandomain_ok
: booleanwaiting_on_serverside_bandwidth
: booleanno_engine_yet
: booleanhas_error
: booleantotal_data_used
: integer number of bytesis_done
: booleanstatus_text
: stringcurrent_speed
: integer number of bytes per secondbytes_read
: integer number of bytesbytes_to_read
: integer number of bytes
All fields other than
"},{"location":"developer_api.html#manage_popups_get_popups","title":"GETkey
andcreation_time
are optional and will only be returned if they're set./manage_popups/get_popups
","text":"Get a list of popups from the client.
Restricted access: YES. Manage Popups permission needed.Required Headers: n/a
Arguments:only_in_view
: whether to show only the popups currently in view in the client, true or false (optional, defaulting to false)
Response: A JSON Object containing job_statuses, which is a list of job status objects. Example response
"},{"location":"developer_api.html#manage_popups_add_popuip","title":"POST{\n \"job_statuses\": [\n {\n \"key\": \"e57d42d53f957559ecaae3054417d28bfef3cd84bbced352be75dedbefb9a40e\",\n \"creation_time\": 1700348905.7647762,\n \"status_text_1\": \"This is a test popup message\",\n \"had_error\": false,\n \"is_cancellable\": false,\n \"is_cancelled\": false,\n \"is_done\": true,\n \"is_pausable\": false,\n \"is_paused\": false,\n \"is_working\": true,\n \"nice_string\": \"This is a test popup message\"\n },\n {\n \"key\": \"0d9e134fe0b30b05f39062b48bd60c35cb3bf3459c967d4cf95dde4d01bbc801\",\n \"creation_time\": 1700348905.7667763,\n \"status_title\": \"sub gap downloader test\",\n \"had_error\": false,\n \"is_cancellable\": false,\n \"is_cancelled\": false,\n \"is_done\": true,\n \"is_pausable\": false,\n \"is_paused\": false,\n \"is_working\": true,\n \"nice_string\": \"sub gap downloader test\",\n \"user_callable_label\": \"start a new downloader for this to fill in the gap!\"\n },\n {\n \"key\": \"d59173b59c96b841ab82a08a05556f04323f8446abbc294d5a35851fa01035e6\",\n \"creation_time\": 1700689162.6635988,\n \"status_text_1\": \"downloading files for \\\"elf\\\" (1/1)\",\n \"status_text_2\": \"file 4/27: downloading file\",\n \"status_title\": \"subscriptions - safebooru\",\n \"had_error\": false,\n \"is_cancellable\": true,\n \"is_cancelled\": false,\n \"is_done\": false,\n \"is_pausable\": false,\n \"is_paused\": false,\n \"is_working\": true,\n \"nice_string\": \"subscriptions - safebooru\\r\\ndownloading files for \\\"elf\\\" (1/1)\\r\\nfile 4/27: downloading file\",\n \"popup_gauge_2\": [\n 3,\n 27\n ],\n \"files\": {\n \"hashes\": [\n \"9b5485f83948bf369892dc1234c0a6eef31a6293df3566f3ee6034f2289fe984\",\n \"cd6ebafb8b39b3455fe382cba0daeefea87848950a6af7b3f000b05b43f2d4f2\",\n \"422cebabc95fabcc6d9a9488060ea88fd2f454e6eb799de8cafa9acd83595d0d\"\n ],\n \"label\": \"safebooru: elf\"\n },\n \"network_job\": {\n \"url\": \"https://safebooru.org//images/4425/17492ccf2fe97591e14531d4b070e922c70384c9.jpg\",\n \"waiting_on_connection_error\": false,\n \"domain_ok\": true,\n \"waiting_on_serverside_bandwidth\": false,\n \"no_engine_yet\": false,\n \"has_error\": false,\n \"total_data_used\": 2031616,\n \"is_done\": false,\n \"status_text\": \"downloading\u2026\",\n \"current_speed\": 2031616,\n \"bytes_read\": 2031616,\n \"bytes_to_read\": 3807369\n }\n }\n ]\n}\n
/manage_popups/add_popup
","text":"Add a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
- it accepts these fields of a job status object:
is_cancellable
is_pausable
attached_files_mergable
status_title
status_text_1
andstatus_text_2
popup_gauge_1
andpopup_gauge_2
api_data
files_label
: the label for the files attached to the job status. It will be returned aslabel
in thefiles
object in the job status object.- files that will be added to the job status. They will be returned as
hashes
in thefiles
object in the job status object.files_label
is required to add files.
A new job status will be created and submitted as a popup. Set a
status_title
on bigger ongoing jobs that will take a while and receive many updates--and leave it alone, even when the job is done. For simple notes, just setstatus_text_1
.Finishing Jobs
The pausable, cancellable, and files-mergable status of a job is only settable at creation. A pausable or cancellable popup represents an ongoing and unfinished job. The popup will exist indefinitely and will not be user-dismissable unless the user can first cancel it.
You, as the creator, must plan to call Finish once your work is done. Yes, even if there is an error!
Pausing and Cancelling
If the user pauses a job, you should recognise that and pause your work. Resume when they do.
If the user cancels a job, you should recognise that and stop work. Either call
finish
with an appropriate status update, orfinish_and_dismiss
if you have nothing more to say.If your long-term job has a main loop, place this at the top of the loop, along with your status update calls.
Example request body
Example request body{\n \"status_text_1\": \"Note to user\"\n}\n
Response: A JSON Object containing{\n \"status_title\": \"Example Popup\",\n \"popup_gauge_1\": [35, 120],\n \"popup_gauge_2\": [9, 10],\n \"status_text_1\": \"Doing things\",\n \"status_text_2\": \"Doing other things\",\n \"is_cancellable\": true,\n \"api_data\": {\n \"whatever\": \"stuff\"\n },\n \"files_label\": \"test files\",\n \"hashes\": [\n \"ad6d3599a6c489a575eb19c026face97a9cd6579e74728b0ce94a601d232f3c3\",\n \"4b15a4a10ac1d6f3d143ba5a87f7353b90bb5567d65065a8ea5b211c217f77c6\"\n ]\n}\n
job_status
, the job status object that was added."},{"location":"developer_api.html#manage_popups_call_user_callable","title":"POST/manage_popups/call_user_callable
","text":"Call the user callable function of a popup.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to call the user callable of
The job status must have a user callable (the
Example request bodyuser_callable_label
in the job status object indicates this) to call it.
Response: 200 with no content."},{"location":"developer_api.html#manage_popups_cancel_popup","title":"POST{\n \"job_status_key\" : \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
/manage_popups/cancel_popup
","text":"Try to cancel a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to cancel
The job status must be cancellable to be cancelled. If it isn't, this is nullipotent.
Example request body
Response: 200 with no content."},{"location":"developer_api.html#manage_popups_dismiss_popup","title":"POST{\n \"job_status_key\" : \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
/manage_popups/dismiss_popup
","text":"Try to dismiss a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to dismiss
This is a call an 'observer' (i.e. not the job creator) makes. In the client UI, it would be a user right-clicking a popup to dismiss it. If the job is dismissable (i.e. it
is_done
), the popup disappears, but if it is pausable/cancellable--an ongoing job--then this action is nullipotent.You should call this on jobs you did not create yourself.
Example request body
Response: 200 with no content."},{"location":"developer_api.html#manage_popups_finish_popup","title":"POST{\n \"job_status_key\": \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
/manage_popups/finish_popup
","text":"Mark a popup as done.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to finish
Important
You may only call this on jobs you created yourself.
You only need to call it on jobs that you created pausable or cancellable. It clears those statuses, sets
is_done
, and allows the user to dismiss the job with a right-click.Once called, the popup will remain indefinitely. You should marry this call with an
Example request bodyupdate
that clears the texts and gauges you were using and leaves a \"Done, processed x files with y errors!\" or similar statement to let the user know how the job went.
Response: 200 with no content."},{"location":"developer_api.html#manage_popups_finish_and_dismiss_popup","title":"POST{\n \"job_status_key\" : \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\"\n}\n
/manage_popups/finish_and_dismiss_popup
","text":"Finish and dismiss a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The job status key to dismissseconds
: (optional) an integer number of seconds to wait before dismissing the job status, defaults to happening immediately
Important
You may only call this on jobs you created yourself.
This will call
finish
immediately and flag the message for auto-dismissal (i.e. removing it from the popup toaster) either immediately or after the given number of seconds.You would want this instead of just
Example request bodyfinish
for when you either do not need to leave a 'Done!' summary, or if the summary is not so important, and is only needed if the user happens to glance that way. If you did boring work for ten minutes, you might like to set a simple 'Done!' and auto-dismiss after thirty seconds or so.
Response: 200 with no content."},{"location":"developer_api.html#manage_popups_update_popuip","title":"POST{\n \"job_status_key\": \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\",\n \"seconds\": 5\n}\n
/manage_popups/update_popup
","text":"Update a popup.
Restricted access: YES. Manage Popups permission needed. Required Headers:Content-Type
: application/json
job_status_key
: The hex key of the job status to update.- It accepts these fields of a job status object:
status_title
status_text_1
andstatus_text_2
popup_gauge_1
andpopup_gauge_2
api_data
files_label
: the label for the files attached to the job status. It will be returned aslabel
in thefiles
object in the job status object.- files that will be added to the job status. They will be returned as
hashes
in thefiles
object in the job status object.files_label
is required to add files.
The specified job status will be updated with the new values submitted. Any field without a value will be left alone and any field set to
Example request bodynull
will be removed from the job status.
Response: A JSON Object containing{\n \"job_status_key\": \"abee8b37d47dba8abf82638d4afb1d11586b9ef7be634aeb8ae3bcb8162b2c86\",\n \"status_title\": \"Example Popup\",\n \"status_text_1\": null,\n \"popup_gauge_1\": [12, 120],\n \"api_data\": {\n \"whatever\": \"other stuff\"\n }\n}\n
job_status
, the job status object that was updated."},{"location":"developer_api.html#managing_the_database","title":"Managing the Database","text":""},{"location":"developer_api.html#manage_database_lock_on","title":"POST/manage_database/lock_on
","text":"Pause the client's database activity and disconnect the current connection.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is a hacky prototype. It commands the client database to pause its job queue and release its connection (and related file locks and journal files). This puts the client in a similar position as a long VACUUM command--it'll hang in there, but not much will work, and since the UI async code isn't great yet, the UI may lock up after a minute or two. If you would like to automate database backup without shutting the client down, this is the thing to play with.
This should return pretty quickly, but it will wait up to five seconds for the database to actually disconnect. If there is a big job (like a VACUUM) currently going on, it may take substantially longer to finish that up and process this STOP command. You might like to check for the existence of a journal file in the db dir just to be safe.
As long as this lock is on, all Client API calls except the unlock command will return 503. (This is a decent way to test the current lock status, too)
"},{"location":"developer_api.html#manage_database_lock_off","title":"POST/manage_database/lock_off
","text":"Reconnect the client's database and resume activity.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is the obvious complement to the lock. The client will resume processing its job queue and will catch up. If the UI was frozen, it should free up in a few seconds, just like after a big VACUUM.
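A backup sketch along those lines in Python with `requests` (the port, access key header, db path, backup path, and the copy step itself are all assumptions), making sure the lock always comes back off:

```python
import shutil
import requests

API_URL = 'http://127.0.0.1:45869'  # assumption: default Client API port
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'}  # assumption: your own key

DB_DIR = '/opt/hydrus/db'       # assumption: your db location
BACKUP_DIR = '/backups/hydrus'  # assumption: your backup destination

requests.post(API_URL + '/manage_database/lock_on', headers=HEADERS).raise_for_status()

try:
    # the database is now disconnected; copy it however you normally would
    shutil.copytree(DB_DIR, BACKUP_DIR, dirs_exist_ok=True)
finally:
    # always release the lock, even if the copy failed
    requests.post(API_URL + '/manage_database/lock_off', headers=HEADERS).raise_for_status()
```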
"},{"location":"developer_api.html#manage_database_mr_bones","title":"GET/manage_database/mr_bones
","text":"Get the data from help->how boned am I?. This is a simple Object of numbers just for hacky advanced purposes if you want to build up some stats in the background. The numbers are the same as the dialog shows, so double check that to confirm what means what.
Restricted access: YES. Manage Database permission needed. Arguments (in percent-encoded JSON):tags
: (optional, a list of tags you wish to search for)- file domain (optional, defaults to all my files)
tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to all my files)
Example requests:
/manage_database/mr_bones\n/manage_database/mr_bones?tags=%5B%22blonde_hair%22%2C%20%22blue_eyes%22%5D\n
Example response:
{\n \"boned_stats\" : {\n \"num_inbox\" : 8356,\n \"num_archive\" : 229,\n \"num_deleted\" : 7010,\n \"size_inbox\" : 7052596762,\n \"size_archive\" : 262911007,\n \"size_deleted\" : 13742290193,\n \"earliest_import_time\" : 1451408539,\n \"total_viewtime\" : [3280, 41621, 2932, 83021],\n \"total_alternate_files\" : 265,\n \"total_duplicate_files\" : 125,\n \"total_potential_pairs\" : 3252\n }\n}\n
The arguments here are the same as for GET /get_files/search_files. You can set any or none of them to set a search domain like in the dialog.
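As a rough sketch of how the percent-encoded JSON tags argument fits together (host, port, and access key here are placeholders):
```python
import json

import requests

API = 'http://localhost:45869'
HEADERS = {'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY_HERE'}  # needs Manage Database permission

# tags is a JSON list, percent-encoded into the query string by requests
params = {'tags': json.dumps(['blonde_hair', 'blue_eyes'])}

response = requests.get(API + '/manage_database/mr_bones', params=params, headers=HEADERS)
boned_stats = response.json()['boned_stats']
print(boned_stats['num_inbox'], boned_stats['num_archive'])
```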
"},{"location":"developer_api.html#manage_database_get_client_options","title":"GET/manage_database/get_client_options
","text":"Unstable Response
The response for this path is unstable and subject to change without warning. No examples are given.
Gets the current options from the client.
Restricted access: YES. Manage Database permission needed.Required Headers: n/a
Arguments: n/a
Response: A JSON dump of nearly all options set in the client. The format of this is based on internal hydrus structures and is subject to change without warning with new hydrus versions. Do not rely on anything you find here to continue to exist and don't rely on the structure to be the same."},{"location":"docker.html","title":"Hydrus in a container(HiC)","text":"Latest hydrus client that runs in docker 24/7. Employs xvfb and vnc. Runs on alpine.
TL;DR:
docker run --name hydrusclient -d -p 5800:5800 -p 5900:5900 ghcr.io/hydrusnetwork/hydrus:latest
. Connect to noVNC viahttp://yourdockerhost:5800/vnc.html
or use Tiger VNC Viewer or any other VNC client and connect on port 5900.For persistent storage you can either create a named volume or mount a new/existing db path
"},{"location":"docker.html#the_container_will_not_fix_the_permissions_inside_the_db_folder_chown_your_db_folder_content_on_your_own","title":"The container will NOT fix the permissions inside the db folder. CHOWN YOUR DB FOLDER CONTENT ON YOUR OWN","text":"-v /hydrus/client/db:/opt/hydrus/db
. The client runs with default permissions of1000:1000
, this can be changed with the ENV variables UID
and GID
(not working atm, fixed to 1000; will be fixed someday™). If you have enough RAM, mount
/tmp
as tmpfs. If not, download more RAM.As of
v359
hydrus understands IPFSnocopy
. And can be easily run with go-ipfs container. Read Hydrus IPFS help. MountHOST_PATH_DB/client_files
to/data/client_files
in ipfs. Go manage the ipfs service and set the path to/data/client_files
, you'll know where to put it in.Example compose file:
version: '3.8'\nvolumes:\n  tor-config:\n    driver: local\n  hybooru-pg-data:\n    driver: local\n  hydrus-server:\n    driver: local\n  hydrus-client:\n    driver: local\n  ipfs-data:\n    driver: local\n  hydownloader-data:\n    driver: local\nservices:\n  hydrusclient:\n    image: ghcr.io/hydrusnetwork/hydrus:latest\n    container_name: hydrusclient\n    restart: unless-stopped\n    environment:\n      - UID=1000\n      - GID=1000\n    volumes:\n      - hydrus-client:/opt/hydrus/db\n    tmpfs:\n      - /tmp #optional for SPEEEEEEEEEEEEEEEEEEEEEEEEED and less disk access\n    ports:\n      - 5800:5800 #noVNC\n      - 5900:5900 #VNC\n      - 45868:45868 #Booru\n      - 45869:45869 #API\n\n  hydrusserver:\n    image: ghcr.io/hydrusnetwork/hydrus:server\n    container_name: hydrusserver\n    restart: unless-stopped\n    volumes:\n      - hydrus-server:/opt/hydrus/db\n\n  hydrusclient-ipfs:\n    image: ipfs/go-ipfs\n    container_name: hydrusclient-ipfs\n    restart: unless-stopped\n    volumes:\n      - ipfs-data:/data/ipfs\n      - hydrus-client:/data/db:ro\n    ports:\n      - 4001:4001 # READ\n      - 5001:5001 # THE\n      - 8080:8080 # IPFS\n      - 8081:8081 # DOCS\n\n  hydrus-web:\n    image: floogulinc/hydrus-web\n    container_name: hydrus-web\n    restart: always\n    ports:\n      - 8080:80 # READ\n\n  hybooru-pg:\n    image: healthcheck/postgres\n    container_name: hybooru-pg\n    environment:\n      - POSTGRES_USER=hybooru\n      - POSTGRES_PASSWORD=hybooru\n      - POSTGRES_DB=hybooru\n    volumes:\n      - hybooru-pg-data:/var/lib/postgresql/data\n    restart: unless-stopped\n\n  hybooru:\n    image: suika/hybooru:latest # https://github.com/funmaker/hybooru build it yourself\n    container_name: hybooru\n    restart: unless-stopped\n    depends_on:\n      hybooru-pg:\n        condition: service_started\n    ports:\n      - 8081:80 # READ\n    volumes:\n      - hydrus-client:/opt/hydrus/db\n\n  hydownloader:\n    image: ghcr.io/thatfuckingbird/hydownloader:edge\n    container_name: hydownloader\n    restart: unless-stopped\n    ports:\n      - 53211:53211\n    volumes:\n      - hydownloader-data:/db\n      - hydrus-client:/hydb\n\n  tor-socks-proxy:\n    #network_mode: \"container:myvpn_container\" # in case you have a vpn container\n    container_name: tor-socks-proxy\n    image: peterdavehello/tor-socks-proxy:latest\n    restart: unless-stopped\n\n  tor-hydrus:\n    image: goldy/tor-hidden-service\n    container_name: tor-hydrus\n    depends_on:\n      hydrusclient:\n        condition: service_healthy\n      hydrusserver:\n        condition: service_healthy\n      hybooru:\n        condition: service_started\n    environment:\n      HYBOORU_TOR_SERVICE_HOSTS: '80:hybooru:80'\n      HYBOORU_TOR_SERVICE_VERSION: '3'\n      HYSERV_TOR_SERVICE_HOSTS: 45870:hydrusserver:45870,45871:hydrusserver:45871\n      HYSERV_TOR_SERVICE_VERSION: '3'\n      HYCLNT_TOR_SERVICE_HOSTS: 45868:hydrusclient:45868,45869:hydrusclient:45869\n      HYCLNT_TOR_SERVICE_VERSION: '3'\n    volumes:\n      - tor-config:/var/lib/tor/hidden_service \n
Further containerized applications of interest:
- Hybooru: Hydrus-based booru-styled imageboard in React, inspired by hyve.
- hydownloader: Alternative way of downloading and importing files. Decoupled from hydrus logic and limitations.
"},{"location":"downloader_completion.html","title":"Putting it all together","text":"# Alpine (client)\ncd hydrus/\ndocker build -t ghcr.io/hydrusnetwork/hydrus:latest -f static/build_files/docker/client/Dockerfile .\n
Now that you know what GUGs, URL Classes, and Parsers are, you should have some ideas of how URL Classes could steer what happens when the downloader is faced with an URL to process. Should a URL be imported as a media file, or should it be parsed? If so, how?
You may have noticed in the Edit GUG ui that it lists if a current URL Class matches the example URL output. If the GUG has no matching URL Class, it won't be listed in the main 'gallery selector' button's list--it'll be relegated to the 'non-functioning' page. Without a URL Class, the client doesn't know what to do with the output of that GUG. But if a URL Class does match, we can then hand the result over to a parser set at network->downloader components->manage url class links:
Here you simply set which parsers go with which URL Classes. If you have URL Classes that do not have a parser linked (which is the default for new URL Classes), you can use the 'try to fill in gaps...' button to automatically fill the gaps based on guesses using the parsers' example URLs. This is usually the best way to line things up unless you have multiple potential parsers for that URL Class, in which case it'll usually go by the parser name earliest in the alphabet.
If the URL Class has no parser set or the parser is broken or otherwise invalid, the respective URL's file import object in the downloader or subscription is going to throw some kind of error when it runs. If you make and share some parsers, the first indication that something is wrong is going to be several users saying 'I got this error: (copy notes from file import status window)'. You can then load the parser back up in manage parsers and try to figure out what changed and roll out an update.
manage url class links also shows 'api/redirect link review', which summarises which URL Classes redirect to others. In these cases, only the redirected-to URL gets a parser entry in the first 'parser links' window, since the first will never be fetched for parsing (in the downloader, it will always be converted to the Redirected URL, and that is fetched and parsed).
Once your GUG has a URL Class and your URL Classes have parsers linked, test your downloader! Note that Hydrus's URL drag-and-drop import uses URL Classes, so if you don't have the GUG and gallery stuff done but you have a Post URL set up, you can test that just by dragging a Post URL from your browser to the client, and it should be added to a new URL Downloader and just work. It feels pretty good once it does!
"},{"location":"downloader_gugs.html","title":"Gallery URL Generators","text":"Gallery URL Generators, or GUGs are simple objects that take a simple string from the user, like:
- blue_eyes
- blue_eyes blonde_hair
- InCase
- elsa dandon_fuga
- wlop
- goth* order:id_asc
And convert them into an initialising Gallery URL, such as:
- http://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0
- https://konachan.com/post?page=1&tags=blonde_hair+blue_eyes
- https://www.hentai-foundry.com/pictures/user/InCase/page/1
- http://rule34.paheal.net/post/list/elsa dandon_fuga/1
- https://www.deviantart.com/wlop/favourites/?offset=0
- https://danbooru.donmai.us/posts?page=1&tags=goth*+order:id_asc
These are all the 'first page' of the results if you type or click-through to the same location on those sites. We are essentially emulating their own simple search-url generation inside the hydrus client.
"},{"location":"downloader_gugs.html#doing_it","title":"actually doing it","text":"Although it is usually a fairly simple process of just substituting the inputted tags into a string template, there are a couple of extra things to think about. Let's look at the ui under network->downloader components->manage gugs:
The client will split whatever the user enters by whitespace, so
blue_eyes blonde_hair
becomes two search terms,[ 'blue_eyes', 'blonde_hair' ]
, which are then joined back together with the given 'search terms separator', to makeblue_eyes+blonde_hair
. Different sites use different separators, although ' ', '+', and ',' are most common. The new string is substituted into the%tags%
in the template phrase, and the URL is made. Note that you will not have to make %20 or %3A percent-encodings for reserved characters here--the network engine handles all that before the request is sent. For the most part, if you need to include, or a user puts in, ':' or ' ' or 'おっぱい', you can just pass it along straight into the final URL without worrying.
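To make the process concrete, here is a toy version of that substitution--this is not hydrus's internal code, just the idea, reusing the konachan-style template from above:
```python
# what a GUG does with the user's input, in miniature
query = 'blue_eyes blonde_hair'
search_terms_separator = '+'
url_template = 'https://konachan.com/post?page=1&tags=%tags%'

search_terms = query.split()                      # ['blue_eyes', 'blonde_hair']
tags = search_terms_separator.join(search_terms)  # 'blue_eyes+blonde_hair'
gallery_url = url_template.replace('%tags%', tags)

print(gallery_url)  # https://konachan.com/post?page=1&tags=blue_eyes+blonde_hair
```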
This ui should update as you change it, so have a play and look at how the output example url changes to get a feel for things. Look at the other defaults to see different examples. Even if you break something, you can just cancel out.
The name of the GUG is important, as this is what will be listed when the user chooses what 'downloader' they want to use. Make sure it has a clear unambiguous name.
The initial search text is also important. Most downloaders just take some text tags, but if your GUG expects a numerical artist id (like pixiv artist search does), you should specify that explicitly to the user. You can even put in a brief '(two tag maximum)' type of instruction if you like.
Notice that the Deviant Art example above is actually the stream of wlop's favourites, not his works, and without an explicit notice of that, a user could easily mistake what they have selected. 'gelbooru' or 'newgrounds' are bad names, 'type here' is a bad initialising text.
"},{"location":"downloader_gugs.html#nested_gugs","title":"Nested GUGs","text":"Nested Gallery URL Generators are GUGs that hold other GUGs. Some searches actually use more than one stream (such as a Hentai Foundry artist lookup, where you might want to get both their regular works and their scraps, which are two separate galleries under the site), so NGUGs allow you to generate multiple initialising URLs per input. You can experiment with this ui if you like--it isn't too complicated--but you might want to hold off doing anything for real until you are comfortable with everything and know how producing multiple initialising URLs is going to work in the actual downloader.
"},{"location":"downloader_intro.html","title":"Making a Downloader","text":"Caution
Creating custom downloaders is only for advanced users who understand HTML or JSON. Beware! If you are simply looking for how to add new downloaders, please head over here.
"},{"location":"downloader_intro.html#intro","title":"this system","text":"The first versions of hydrus's downloaders were all hardcoded and static--I wrote everything into the program itself and nothing was user-creatable or -fixable. After the maintenance burden of the entire messy system proved too large for me to keep up with and a semi-editable booru system proved successful, I decided to overhaul the entire thing to allow user creation and sharing of every component. It is designed to be very simple to the front-end user--they will typically handle a couple of png files and then select a new downloader from a list--but very flexible (and hence potentially complicated) on the back-end. These help pages describe the different compontents with the intention of making an HTML- or JSON- fluent user able to create and share a full new downloader on their own.
As always, this is all under active development. Your feedback on the system would be appreciated, and if something is confusing or you discover something in here that is out of date, please let me know.
"},{"location":"downloader_intro.html#downloader","title":"what is a downloader?","text":"In hydrus, a downloader is one of:
- Gallery Downloader: This takes a string like 'blue_eyes' to produce a series of thumbnail gallery page URLs that can be parsed for image page URLs which can ultimately be parsed for file URLs and metadata like tags. Boorus fall into this category.
- URL Downloader: This does just the Gallery Downloader's back-end--instead of taking a string query, it takes the gallery or post URLs directly from the user, whether that is one from a drag-and-drop event or hundreds pasted from clipboard. For our purposes here, the URL Downloader is a subset of the Gallery Downloader.
- Watcher: This takes a URL that it will check in timed intervals, parsing it for new URLs that it then queues up to be downloaded. It typically stops checking after the 'file velocity' (such as '1 new file per day') drops below a certain level. It is mostly for watching imageboard threads.
- Simple Downloader: This takes a URL one-time and parses it for direct file URLs. This is a miscellaneous system for certain simple gallery types and some testing/'I just need the third tag's src on this one page' jobs.
The system currently supports HTML and JSON parsing. XML should be fine under the HTML parser--it isn't strict about checking types and all that.
"},{"location":"downloader_intro.html#pipeline","title":"what does a downloader do?","text":"The Gallery Downloader is the most complicated downloader and uses all the possible components. In order for hydrus to convert our example 'blue_eyes' query into a bunch of files with tags, it needs to:
- Present some user interface named 'safebooru tag search' to the user that will convert their input of 'blue_eyes' into https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0.
- Recognise https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0 as a Safebooru Gallery URL.
- Convert the HTML of a Safebooru Gallery URL into a list of URLs like https://safebooru.org/index.php?page=post&s=view&id=2437965 and possibly a 'next page' URL (e.g. https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=40) that points to the next page of thumbnails.
- Recognise the https://safebooru.org/index.php?page=post&s=view&id=2437965 URLs as Safebooru Post URLs.
- Convert the HTML of a Safebooru Post URL into a file URL like https://safebooru.org//images/2329/b6e8c263d691d1c39a2eeba5e00709849d8f864d.jpg and some tags like: 1girl, bangs, black gloves, blonde hair, blue eyes, braid, closed mouth, day, fingerless gloves, fingernails, gloves, grass, hair ornament, hairclip, hands clasped, creator:hankuri, interlocked fingers, long hair, long sleeves, outdoors, own hands together, parted bangs, pointy ears, character:princess zelda, smile, solo, series:the legend of zelda, underbust.
So we have three components:
- Gallery URL Generator (GUG): faces the user and converts text input into initialising Gallery URLs.
- URL Class: identifies URLs and informs the client how to deal with them.
- Parser: converts data from URLs into hydrus-understandable metadata.
URL downloaders and watchers do not need the Gallery URL Generator, as their input is an URL. And simple downloaders also have an explicit 'just download it and parse it with this simple rule' action, so they do not use URL Classes (or even full-fledged Page Parsers) either.
"},{"location":"downloader_login.html","title":"Login Manager","text":"The system works, but this help was never done! Check the defaults for examples of how it works, sorry!
"},{"location":"downloader_parsers.html","title":"Parsers","text":"In hydrus, a parser is an object that takes a single block of HTML or JSON data and returns many kinds of hydrus-level metadata.
Parsers are flexible and potentially quite complicated. You might like to open network->downloader components->manage parsers and explore the UI as you read these pages. Check out how the default parsers already in the client work, and if you want to write a new one, see if there is something already in there that is similar--it is usually easier to duplicate an existing parser and then alter it than to create a new one from scratch every time.
There are three main components in the parsing system (click to open each component's help page):
- Formulae: Take parsable data, search it in some manner, and return 0 to n strings.
- Content Parsers: Take parsable data, apply a formula to it to get some strings, and apply a single metadata 'type' and perhaps some additional modifiers.
- Page Parsers: Take parsable data, apply content parsers to it, and return all the metadata in an appropriate structure.
Once you are comfortable with these objects, you might like to check out these walkthroughs, which create full parsers from nothing:
- e621 HTML gallery page
- Gelbooru HTML file page
- Artstation JSON file page API
Once you are comfortable with parsers, and if you are feeling brave, check out how the default imageboard and pixiv parsers work. These are complicated and use more experimental areas of the code to get their job done. If you are trying to get a new imageboard parser going and can't figure out subsidiary page parsers, send me a mail or something and I'll try to help you out!
When you are making a parser, consider this checklist (you might want to copy/have your own version of this somewhere):
- Do you get good URLs with good priority? Do you ever accidentally get favourite/popular/advert results you didn't mean to?
- If you need a next gallery page URL, is it ever not available (and hence needs a URL Class fix)? Does it change for search tags with unicode or http-restricted characters?
- Do you get nice namespaced tags? Are any unwanted single characters like -/+/? getting through?
- Is the file hash available anywhere?
- Is a source/post time available?
- Is a source URL available? Is it good quality, or does it often just point to an artist's base twitter profile? If you pull it from text or a tooltip, is it clipped for longer URLs?
Taken a break? Now let's put it all together ---->
"},{"location":"downloader_parsers_content_parsers.html","title":"Content Parsers","text":"So, we can now generate some strings from a document. Content Parsers will let us apply a single metadata type to those strings to inform hydrus what they are.
A content parser has a name, a content type, and a formula. This example fetches the character tags from a danbooru post.
The name is just decorative, but it is generally a good idea so you can find things again when you next revisit them.
The current content types are:
"},{"location":"downloader_parsers_content_parsers.html#intro","title":"urls","text":"This should be applied to relative ('/image/smile.jpg') and absolute ('https://mysite.com/content/image/smile.jpg') URLs. If the URL is relative, the client will generate an absolute URL based on the original URL used to fetch the data being parsed (i.e. it should all just work).
You can set several types of URL:
- url to download/pursue means a Post URL or a File URL in our URL Classes system, like a booru post or an actual raw file like a jpg or webm.
- url to associate means an URL you want added to the list of 'known urls' for the file, but not one you want the client to actually download and parse. Use this to neatly add booru 'source' urls.
- next gallery page means the next Gallery URL on from the current one.
The 'file url quality precedence' allows the client to select the best of several possible URLs. Given multiple content parsers producing URLs at the same 'level' of parsing, it will select the one with the highest value. Consider these two posts:
- https://danbooru.donmai.us/posts/3016415
- https://danbooru.donmai.us/posts/3040603
The Garnet image fits into a regular page and so Danbooru embed the whole original file in the main media canvas. One easy way to find the full File URL in this case would be to select the \"src\" attribute of the \"img\" tag with id=\"image\".
The Cirno one, however, is much larger and has been scaled down. The src of the main canvas tag points to a resized 'sample' link. The full link can be found at the 'view original' link up top, which is an \"a\" tag with id=\"image-resize-link\".
The Garnet post does not have the 'view original' link, so to cover both situations we might want two content parsers--one fetching the 'canvas' \"src\" and the other finding the 'view original' \"href\". If we set the 'canvas' one with a quality of 40 and the 'view original' 60, then the parsing system would know to select the 60 when it was available but to fall back to the 40 if not.
As it happens, Danbooru (afaik, always) gives a link to the original file under the 'Size:' metadata to the left. This is the same 'best link' for both posts above, but it isn't so easy to identify. It is a quiet \"a\" tag without an \"id\" and it isn't always in the same location, but if you could pin it down reliably, it might be nice to circumvent the whole issue.
Sites can change suddenly, so it is nice to have a bit of redundancy here if it is easy.
"},{"location":"downloader_parsers_content_parsers.html#tags","title":"tags","text":"These are simple--they tell the client that the given strings are tags. You set the namespace here as well. I recommend you parse 'splashbrush' and set the namespace 'creator' here rather than trying to mess around with 'append prefix \"creator:\"' string conversions at the formula level--it is simpler up here and it lets hydrus handle any edge case logic for you.
Leave the namespace field blank for unnamespaced tags.
"},{"location":"downloader_parsers_content_parsers.html#file_hash","title":"file hash","text":"This says 'this is the hash for the file otherwise referenced in this parser'. So, if you have another content parser finding a File or Post URL, this lets the client know early that that destination happens to have a particular MD5, for instance. The client will look for that hash in its own database, and if it finds a match, it can predetermine if it already has the file (or has previously deleted it) without ever having to download it. When this happens, it will still add tags and associate the file with the URL for it's 'known urls' just as if it had downloaded it!
If you understand this concept, it is great to include. It saves time and bandwidth for everyone. Many site APIs include a hash for this exact reason--they want you to be able to skip a needless download just as much as you do.
The usual suite of hash types are supported: MD5, SHA1, SHA256, and SHA512. An old version of this required some weird string decoding, but this is no longer true. Select 'hex' or 'base64' from the encoding type dropdown, and then just parse the 'e5af57a687f089894f5ecede50049458' or '5a9XpofwiYlPXs7eUASUWA==' text, and hydrus should handle the rest. It will present the parsed hash in hex.
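If you want to sanity-check what the two encodings look like for the same hash, this is the whole trick--these are the example strings from just above:
```python
import base64

hex_text = 'e5af57a687f089894f5ecede50049458'
base64_text = '5a9XpofwiYlPXs7eUASUWA=='

# both decode to the same 16 raw MD5 bytes; hydrus presents the result in hex
assert bytes.fromhex(hex_text) == base64.b64decode(base64_text)
print(base64.b64decode(base64_text).hex())  # e5af57a687f089894f5ecede50049458
```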
"},{"location":"downloader_parsers_content_parsers.html#timestamp","title":"timestamp","text":"This lets you say that a given number refers to a particular time for a file. At the moment, I only support 'source time', which represents a 'post' time for the file and is useful for thread and subscription check time calculations. It takes a Unix time integer, like 1520203484, which many APIs will provide.
If you are feeling very clever, you can decode a 'MM/DD/YYYY hh:mm:ss' style string to a Unix time integer using string converters, which use some hacky and semi-reliable python %d-style values as per here. Look at the existing defaults for examples of this, and don't worry about being more accurate than 12/24 hours--trying to figure out timezone is a hell not worth attempting, and doesn't really matter in the long-run for subscriptions and thread watchers that might care.
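Underneath, this is the familiar python strptime dance. A rough sketch with a made-up example string:
```python
from datetime import datetime, timezone

text = '08/18/2017 19:59:44'  # an assumed 'MM/DD/YYYY hh:mm:ss' style string

dt = datetime.strptime(text, '%m/%d/%Y %H:%M:%S')
# pretend it is UTC--as above, 12/24 hours of imprecision is fine for subs and watchers
print(int(dt.replace(tzinfo=timezone.utc).timestamp()))  # 1503086384
```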
"},{"location":"downloader_parsers_content_parsers.html#page_title","title":"watcher page title","text":"This lets the watcher know a good name/subject for its entries. The subject of a thread is obviously ideal here, but failing that you can try to fetch the first part of the first post's comment. It has precendence, like for URLs, so you can tell the parser which to prefer if you have multiple options. Just for neatness and ease of testing, you probably want to use a string converter here to cut it down to the first 64 characters or so.
"},{"location":"downloader_parsers_content_parsers.html#veto","title":"veto","text":"This is a special content type--it tells the next highest stage of parsing that this 'post' of parsing is invalid and to cancel and not return any data. For instance, if a thread post's file was deleted, the site might provide a default '404' stock File URL using the same markup structure as it would for normal images. You don't want to give the user the same 404 image ten times over (with fifteen kinds of tag and source time metadata attached), so you can add a little rule here that says \"If the image link is 'https://somesite.com/404.png', raise a veto: File 404\" or \"If the page has 'No results found' in its main content div, raise a veto: No results found\" or \"If the expected download tag does not have 'download link' as its text, raise a veto: No Download Link found--possibly Ugoira?\" and so on.
They will associate their name with the veto being raised, so it is useful to give these a decent descriptive name so you can see what might be going right or wrong during testing. If it is an appropriate and serious enough veto, it may also rise up to the user level and will be useful if they need to report you an error (like \"After five pages of parsing, it gives 'veto: no next page link'\").
"},{"location":"downloader_parsers_formulae.html","title":"Parser Formulae","text":"Formulae are tools used by higher-level components of the parsing system. They take some data (typically some HTML or JSON) and return 0 to n strings. For our purposes, these strings will usually be tags, URLs, and timestamps. You will usually see them summarised with this panel:
The different types are currently html, json, nested, zipper, and context variable.
"},{"location":"downloader_parsers_formulae.html#html_formula","title":"html","text":"This takes a full HTML document or a sample of HTML--and any regular sort of XML should also work. It starts at the root node and searches for lower nodes using one or more ordered rules based on tag name and attributes, and then returns string data from those final nodes.
For instance, if you have this:
<html>\n <body>\n <div class=\"media_taglist\">\n <span class=\"generaltag\"><a href=\"(search page)\">blonde hair</a> (3456)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">blue eyes</a> (4567)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">bodysuit</a> (5678)</span>\n <span class=\"charactertag\"><a href=\"(search page)\">samus aran</a> (2345)</span>\n <span class=\"artisttag\"><a href=\"(search page)\">splashbrush</a> (123)</span>\n </div>\n <div class=\"content\">\n <span class=\"media\">(a whole bunch of content that doesn't have tags in)</span>\n </div>\n </body>\n</html>\n
(Most boorus have a taglist like this on their file pages.)
To find the artist, \"splashbrush\", here, you could:
- search beneath the root tag (
<html>
) for the<div>
tag with attributeclass=\"media_taglist\"
- search beneath that
<div>
for<span>
tags with attributeclass=\"artisttag\"
- search beneath those
<span>
tags for<a>
tags - and then get the string content of those
<a>
tags
Changing the
artisttag
tocharactertag
orgeneraltag
would give yousamus aran
orblonde hair
,blue eyes
,bodysuit
respectively.You might be tempted to just go straight for any
"},{"location":"downloader_parsers_formulae.html#the_ui","title":"the ui","text":"<span>
withclass=\"artisttag\"
, but many sites use the same class to render a sidebar of favourite/popular tags or some other sponsored content, so it is generally best to try to narrow down to a larger<div>
container so you don't get anything you don't mean.Clicking 'edit formula' on an HTML formula gives you this:
You edit on the left and test on the right.
"},{"location":"downloader_parsers_formulae.html#finding_the_right_html_tags","title":"finding the right html tags","text":"When you add or edit one of the specific tag search rules, you get this:
You can set multiple key/value attribute search conditions, but you'll typically be searching for 'class' or 'id' here, if anything.
Note that you can set it to fetch only the xth instance of a found tag, which can be useful in situations like this:
<span class=\"generaltag\">\n <a href=\"(add tag)\">+</a>\n <a href=\"(remove tag)\">-</a>\n <a href=\"(search page)\">blonde hair</a> (3456)\n</span>\n
Without any more attributes, there isn't a great way to distinguish the
<a>
with \"blonde hair\" from the other two--so just setget the 3rd <a> tag
and you are good.Most of the time, you'll be searching descendants (i.e. walking down the tree), but sometimes you might have this:
<span>\n <a href=\"(link to post url)\">\n <img class=\"thumb\" src=\"(thumbnail image)\" />\n </a>\n</span>\n
There isn't a great way to find the
<span>
or the<a>
when looking from above here, as they are lacking a class or id, but you can find the<img>
ok, so if you find those and then add a rule where instead of searching descendants, you are 'walking back up ancestors' like this:You can solve some tricky problems this way!
You can also set a String Match, which is the same panel as you say in with URL Classes. It tests its best guess at the tag's 'string' value, so you can find a tag with 'Original Image' as its text or that with a regex starts with 'Posted on: '. Have a play with it and you'll figure it out.
"},{"location":"downloader_parsers_formulae.html#content_to_fetch","title":"content to fetch","text":"Once you have narrowed down the right nodes you want, you can decide what text to fetch. Given a node of:
<a href=\"(URL A)\" class=\"thumb_title\">Forest Glade</a>\n
Returning the
"},{"location":"downloader_parsers_formulae.html#string_match_and_conversion","title":"string match and conversion","text":"href
attribute would return the string \"(URL A)\", returning the string content would give \"Forest Glade\", and returning the full html would give<a href=\"(URL A)\" class=\"thumb\">Forest Glade</a>
. This last choice is useful in complicated situations where you want a second, separated layer of parsing, which we will get to later.You can set a final String Match to filter the parsed results (e.g. \"only allow strings that only contain numbers\" or \"only allow full URLs as based on (complicated regex)\") and String Converter to edit it (e.g. \"remove the first three characters of whatever you find\" or \"decode from base64\").
You won't use these much, but they can sometimes get you out of a complicated situation.
"},{"location":"downloader_parsers_formulae.html#testing","title":"testing","text":"The testing panel on the right is important and worth using. Copy the html from the source you want to parse and then hit the paste buttons to set that as the data to test with.
"},{"location":"downloader_parsers_formulae.html#json_formula","title":"json","text":"This takes some JSON and does a similar style of search:
It is a bit simpler than HTML--if the current node is a list (called an 'Array' in JSON), you can fetch every item or the xth item, and if it is a dictionary (called an 'Object' in JSON), you can fetch a particular entry by name. Since you can't jump down several layers with attribute lookups or tag names like with HTML, you have to go down every layer one at a time. In any case, if you have something like this:
Note
It is a great idea to check the html or json you are trying to parse with your browser. Most web browsers have excellent developer tools that let you walk through the nodes of the document you are trying to parse in a prettier way than I would ever have time to put together. This image is one of the views Firefox provides if you simply enter a JSON URL.
Searching for \"posts\"->1st list item->\"sub\" on this data will give you \"Nobody like kino here.\".
Searching for \"posts\"->all list items->\"tim\" will give you the three SHA256 file hashes (since the third post has no file attached and so no 'tim' entry, the parser skips over it without complaint).
Searching for \"posts\"->1st list item->\"com\" will give you the OP's comment, ~AS RAW UNPARSED HTML~.
The default is to fetch the final nodes' 'data content', which means coercing simple variables into strings. If the current node is a list or dict, no string is returned.
But if you like, you can return the json beneath the current node (which, like HTML, includes the current node). This again will come in useful later.
"},{"location":"downloader_parsers_formulae.html#nested_formula","title":"nested","text":"If you want to parse some JSON that is tucked inside an HTML attribute, or vice versa, use a nested formula. This parses the text using one formula type and then passes the result(s) to another.
The especially neat thing about this is the encoded characters like
"},{"location":"downloader_parsers_formulae.html#zipper_formula","title":"zipper","text":">
or escaped JSON characters are all handled natively for you. Before we had this, we had to hack our way around with crazy regex.If you want to combine strings from the results of different parsers--for instance by joining the 'tim' and the 'ext' in our json example--you can use a Zipper formula. This fetches multiple lists of strings and zips their result rows together using
\\1
regex substitution syntax:This is a complicated example taken from one of my thread parsers. I have to take a modified version of the original thread URL (the first rule, so
\\1
) and then append the filename (\\2
) and its extension (\\3
) on the end to get the final file URL of a post. You can mix in more characters in the substitution phrase, like\\1.jpg
or even have multiple instances (https://\\2.muhsite.com/\\2/\\1
), if that is appropriate.If your sub-formulae produce multiple results, the Zipper will produce that many also, iterating the sub-lists together.
ExampleIf parser 1 gives:\n a\n b\n c\n\nAnd parser 2 gives:\n 1\n 2\n 3\n\nUsing a substitution phrase of \"\\1-\\2\" will give:\n a-1\n b-2\n c-3\n
If one of the sub-formulae produces fewer results than the others, its final value will be used to fill in the gaps. In this way, you might somewhere parse one prefix and seven suffixes, where joining them will use the same prefix seven times.
"},{"location":"downloader_parsers_formulae.html#context_variable_formula","title":"context variable","text":"This is a basic hacky answer to a particular problem. It is a simple key:value dictionary that at the moment only stores one variable, 'url', which contains the original URL used to fetch the data being parsed.
If a different URL Class links to this parser via an API URL, this 'url' variable will always be the API URL (i.e. it literally is the URL used to fetch the data), not any thread/whatever URL the user entered.
Hit the 'edit example parsing context' to change the URL used for testing.
I have used this several times to stitch together file URLs when I am pulling data from APIs, like in the zipper formula example above. In this case, the starting URL is
https://a.4cdn.org/tg/thread/57806016.json
, from which I extract the board name, \"tg\", using the string converter, and then add in 4chan's CDN domain to make the appropriate base file URL (https:/i.4cdn.org/tg/
) for the given thread. I only have to jump through this hoop in 4chan's case because they explicitly store file URLs by board name. 8chan on the other hand, for instance, has a statichttps://media.8ch.net/file_store/
for all files, so it is a little easier (I think I just do a single 'prepend' string transformation somewhere).If you want to make some parsers, you will have to get familiar with how different sites store and present their data!
"},{"location":"downloader_parsers_full_example_api.html","title":"api example","text":"Some sites offer API calls for their pages. Depending on complexity and quality of content, using these APIs may or may not be a good idea. Artstation has a good one--let's first review our URL Classes:
We convert the original Post URL, https://www.artstation.com/artwork/mQLe1 to https://www.artstation.com/projects/mQLe1.json. Note that Artstation Post URLs can produce multiple files, and that the API url should not be associated with those final files.
So, when the client encounters an 'artstation file page' URL, it will generate the equivalent 'artstation file page json api' URL and use that for downloading and parsing. If you would like to review your API links, check out network->downloader components->manage url class links->api links. Using Example URLs, it will figure out which URL Classes link to others and ensure you are mapping parsers only to the final link in the chain--there should be several already in there by default.
Now lets look at the JSON. Loading clean JSON in a browser should present you with a nicer view:
I have highlighted the data we want, which is:
- The File URLs.
- Creator, title, medium, and unnamespaced tags.
- Source time.
JSON is a dream to parse, and I will assume you are comfortable with Content Parsers from the previous examples, so I'll simply paste the different formulae one after another:
Each image is stored under a separate numbered 'assets' list item. This one has just two, but some Artstation pages have dozens of images. The only unusual part here is I also put a String Match of
^(?!.*assets\\/covers).*$
, which filters out 'cover' images (such as on here), which make for nice portfolio thumbs on the site but are not interesting to us.This fetches the 'creator' tag. Artstation's API is great because it includes profile data in content requests. There's the creator's presentation name, username, profile link, avatar URLs, all that inside a regular request about this particular work. When that information is missing (like in yiff.party), it may make the API useless to you.
These are all simple. You can take or leave the title and medium tags--some people like them, some don't. This example has no unnamespaced tags, but this one does. Creator-entered tags are sometimes not worth parsing (on tumblr, for instance, you often get run-on tags like #imbored #whatisevengoingon that are irrelevent to the work), but Artstation users are all professionals trying to get their work noticed, so the tags are usually pretty good.
This again uses python's datetime to decode the date, which Artstation presents with millisecond accuracy, ha ha. I use a
"},{"location":"downloader_parsers_full_example_api.html#summary","title":"summary","text":"(.+:..)\\..*->\\1
regex (i.e. \"get everything before the period\") to strip off the timezone and milliseconds and then decode as normal.APIs that are stable and free to access (e.g. do not require OAuth or other complicated login headers) can make parsing fantastic. They save bandwidth and CPU time, and they are typically easier to work with than HTML. Unfortunately, the boorus that do provide APIs often list their tags without namespace information, so I recommend you double-check you can get what you want before you get too deep into it. Some APIs also offer incomplete data, such as relative URLs (relative to the original URL!), which can be a pain to figure out in our system.
"},{"location":"downloader_parsers_full_example_file_page.html","title":"file page example","text":"Let's look at this page: https://gelbooru.com/index.php?page=post&s=view&id=3837615.
What sorts of data are we interested in here?
- The image URL.
- The different tags and their namespaces.
- The secret md5 hash buried in the HTML.
- The post time.
- The Deviant Art source URL.
A tempting strategy for pulling the file URL is to just fetch the src of the embedded
<img>
tag, but:- If the booru also supports videos or flash, you'll have to write separate and likely more complicated rules for
<video>
and<embed>
tags. - If the booru shows 'sample' sizes for large images--as this one does!--pulling the src of the image you see won't get the full-size original for large images.
If you have an account with the site you are parsing and have clicked the appropriate 'Always view original' setting, you may not see these sorts of sample-size banners! I recommend you log out of/go incognito for sites you are inspecting for hydrus parsing (unless a log-in is required to see content, so the hydrus user will have to set up hydrus-side login to actually use the parser), or you can easily miss NSFW-gates and other logged-out hurdles.
When trying to pin down the right link, if there are no good alternatives, you often have to write several File URL rules with different precedence, saying 'get the \"Click Here to See Full Size\" link at 75' and 'get the embed's \"src\" at 25' and so on to make sure you cover different situations, but as it happens Gelbooru always posts the actual File URL at:
<meta property=\"og:image\" content=\"https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg\" />
under the<head>
<a href=\"https://simg3.gelbooru.com//images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg\" target=\"_blank\" style=\"font-weight: bold;\">Original image</a>
which can be found by putting a String Match in the html formula.
<meta>
withproperty=\"og:image\"
is easy to search for (and they use the same tag for video links as well!). For the Original Image, you can use a String Match like so:Gelbooru uses \"Original Image\" even when they link to webm, which is helpful, but like \"og:image\", it could be changed to 'video' in future.
I think I wrote my gelbooru parser before I added String Matches to individual HTML formulae tag rules, so I went with this, which is a bit more cheeky:
But it works. Sometimes, just regexing for links that fit the site's CDN is a good bet for finding difficult stuff.
"},{"location":"downloader_parsers_full_example_file_page.html#tags","title":"tags","text":"Most boorus have a taglist on the left that has a nice id or class you can pull, and then each namespace gets its own class for CSS-colouring:
Make sure you browse around the booru for a bit, so you can find all the different classes they use. character/artist/copyright are common, but some sneak in the odd meta/species/rating.
Skipping ?/-/+ characters can be a pain if you are lacking a nice tag-text class, in which case you can add a regex String Match to the HTML formula (as I do here, since Gelb offers '?' links for tag definitions) like [^\\?\\-+\\s], which means \"the text includes something other than just '?' or '-' or '+' or whitespace\".
"},{"location":"downloader_parsers_full_example_file_page.html#md5_hash","title":"md5 hash","text":"If you look at the Gelbooru File URL, https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg, you may notice the filename is all hexadecimal. It looks like they store their files under a two-deep folder structure, using the first four characters--386e here--as the key. It sure looks like '386e12e33726425dbd637e134c4c09b5' is not random ephemeral garbage!
In fact, Gelbooru use the MD5 of the file as the filename. Many storage systems do something like this (hydrus uses SHA256!), so if they don't offer a
<meta>
tag that explicitly states the md5 or sha1 or whatever, you can sometimes infer it from one of the file links. This screenshot is from the more recent version of hydrus, which has the more powerful 'string processing' system for string transformations. It has an intimidating number of nested dialogs, but we can stay simple for now, with only the one regex substitution step inside a string 'converter':Here we are using the same property=\"og:image\" rule to fetch the File URL, and then we are regexing the hex hash with
.*(\\[0-9a-f\\]{32}).*
(MD5s are 32 hex characters). We select 'hex' as the encoding type. Hashes require a tiny bit more data handling behind the scenes, but in the Content Parser test page it presents the hash again neatly in English: \"md5 hash: 386e12e33726425dbd637e134c4c09b5\"), meaning everything parsed correct. It presents the hash in hex even if you select the encoding type as base64.If you think you have found a hash string, you should obviously test your theory! The site might not be using the actual MD5 of file bytes, as hydrus does, but instead some proprietary scheme. Download the file and run it through a program like HxD (or hydrus!) to figure out its hashes, and then search the View Source for those hex strings--you might be surprised!
Finding the hash is hugely beneficial for a parser--it lets hydrus skip downloading files without ever having seen them before!
"},{"location":"downloader_parsers_full_example_file_page.html#source_time","title":"source time","text":"Post/source time lets subscriptions and watchers make more accurate guesses at current file velocity. It is neat to have if you can find it, but:
FUCK ALL TIMEZONES FOREVER
Gelbooru offers--
<li>Posted: 2017-08-18 19:59:44<br /> by <a href=\"index.php?page=account&s=profile&uname=jayage5ds\">jayage5ds</a></li>\n
--so let's see how we can turn that into a Unix timestamp:
I find the
"},{"location":"downloader_parsers_full_example_file_page.html#source_url","title":"source url","text":"<li>
that starts \"Posted: \" and then decode the date according to the hackery-dackery-doo format from here.%c
and%z
are unreliable, and attempting timezone adjustments is overall a supervoid that will kill your time for no real benefit--subs and watchers work fine with 12-hour imprecision, so if you have a +0300 or EST in your string, just cut those characters off with another String Transformation. As long as you are getting about the right day, you are fine.Source URLs are nice to have if they are high quality. Some boorus only ever offer artist profiles, like
https://twitter.com/artistname
, whereas we want singular Post URLs that point to other places that host this work. For Gelbooru, you could fetch the Source URL as we did source time, searching for \"Source: \", but they also offer more easily in an edit form:<input type=\"text\" name=\"source\" size=\"40\" id=\"source\" value=\"https://www.deviantart.com/art/Lara-Croft-Artifact-Dive-699335378\" />\n
This is a bit of a fragile location to parse from--Gelb could change or remove this form at any time, whereas the \"Posted: \"
<li>
is probably firmer, but I expect I wrote it before I had String Matches in. It works for now, which in this game is often Good Enough\u2122.Also--be careful pulling from text or tooltips rather than an href-like attribute, as whatever is presented to the user may be clipped for longer URLs. Make sure you try your rules on a couple of different pages to make sure you aren't pulling \"https://www.deviantart.com/art/Lara...\" by accident anywhere!
"},{"location":"downloader_parsers_full_example_file_page.html#summary","title":"summary","text":"Phew--all that for a bit of Lara Croft! Thankfully, most sites use similar schemes. Once you are familiar with the basic idea, the only real work is to duplicate an existing parser and edit for differences. Our final parser looks like this:
This is overall a decent parser. Some parts of it may fail when Gelbooru update to their next version, but that can be true of even very good parsers with multiple redundancy. For now, hydrus can use this to quickly and efficiently pull content from anything running Gelbooru 0.2.5., and the effort spent now can save millions of combined right-click->save as and manual tag copies in future. If you make something like this and share it about, you'll be doing a good service for those who could never figure it out.
"},{"location":"downloader_parsers_full_example_gallery_page.html","title":"gallery page example","text":"Caution
These guides should roughly follow what comes with the client by default! You might like to have the actual UI open in front of you so you can play around with the rules and try different test parses yourself.
Let's look at this page: https://e621.net/post/index/1/rating:safe pokemon
We've got 75 thumbnails and a bunch of page URLs at the bottom.
"},{"location":"downloader_parsers_full_example_gallery_page.html#main_page","title":"first, the main page","text":"This is easy. It gets a good name and some example URLs. e621 has some different ways of writing out their queries (and as they use some tags with '/', like 'male/female', this can cause character encoding issues depending on whether the tag is in the path or query!), but we'll put that off for now--we just want to parse some stuff.
"},{"location":"downloader_parsers_full_example_gallery_page.html#thumbnail_urls","title":"thumbnail links","text":"Most browsers have some good developer tools to let you Inspect Element and get a better view of the HTML DOM. Be warned that this information isn't always the same as View Source (which is what hydrus will get when it downloads the initial HTML document), as some sites load results dynamically with javascript and maybe an internal JSON API call (when sites move to systems that load more thumbs as you scroll down, it makes our job more difficult--in these cases, you'll need to chase down the embedded JSON or figure out what API calls their JS is making--the browser's developer tools can help you here again). Thankfully, e621 is (and most boorus are) fairly static and simple:
Every thumb on e621 is a
<span>
with class=\"thumb\" wrapping an<a>
and an<img>
. This is a common pattern, and easy to parse:There's no tricky String Matches or String Converters needed--we are just fetching hrefs. Note that the links get relative-matched to example.com for now--I'll probably fix this to apply to one of the example URLs, but rest assured that IRL the parser will 'join' its url up with the appropriate Gallery URL used to fetch the data. Sometimes, you might want to add a rule for
search descendents for the first <div> tag with id=content
to make sure you are only grabbing thumbs from the main box, whether that is a<div>
or a<span>
, and whether it hasid=\"content
\" orclass=\"mainBox\"
, but unless you know that booru likes to embed \"popular\" or \"favourite\" 'thumbs' up top that will be accidentally caught by a<span>
's withclass=\"thumb\"
, I recommend you not make your rules overly specific--all it takes is for their dev to change the name of their content box, and your whole parser breaks. I've ditched the<span>
requirement in the rule here for exactly that reason--class=\"thumb\"
is necessary and sufficient.Remember that the parsing system allows you to go up ancestors as well as down descendants. If your thumb-box has multiple links--like to see the artist's profile or 'set as favourite'--you can try searching for the
"},{"location":"downloader_parsers_full_example_gallery_page.html#next_gallery_url","title":"next gallery page link","text":"<span>
s, then down to the<img>
, and then up to the nearest<a>
. In English, this is saying, \"Find me all the image link URLs in the thumb boxes.\"Most boorus have 'next' or '>>' at the bottom, which can be simple enough, but many have a neat
<link href=\"/post/index/2/rating:safe%20pokemon\" rel=\"next\" />
in the<head>
. The<head>
solution is easier, if available, but my default e621 parser happens to pursue the 'paginator':As it happens, e621 also apply the
rel=\"next\"
attribute to their \"Next >>\" links, which makes it all that easier for us to find. Sometimes there is no \"next\" id or class, and you'll want to add a String Match to your html formula to test for a string value of '>>' or whatever it is. A good trick is to View Source and then search for the critical/post/index/2/
phrase you are looking for--you might find what you want in a<link>
tag you didn't expect or even buried in a hidden 'share to tumblr' button.<form>
s for reporting or commenting on content are another good place to find content ids.Note that this finds two URLs. e621 apply the
"},{"location":"downloader_parsers_full_example_gallery_page.html#summary","title":"summary","text":"rel=\"next\"
to both the \"2\" link and the \"Next >>\" one. The download engine merges the parser's dupes, so don't worry if you end up parsing both the 'top' and 'bottom' next page links, or if you use multiple rules to parse the same data in different ways.With those two rules, we are done. Gallery parsers are nice and simple.
"},{"location":"downloader_parsers_page_parsers.html","title":"Page Parsers","text":"We can now produce individual rows of rich metadata. To arrange them all into a useful structure, we will use Page Parsers.
The Page Parser is the top level parsing object. It takes a single document and produces a list--or a list of lists--of metadata. Here's the main UI:
Notice that the edit panel has three sub-pages.
"},{"location":"downloader_parsers_page_parsers.html#main","title":"main","text":"- Name: Like for content parsers, I recommend you add good names for your parsers.
- Pre-parsing conversion: If your API source encodes or wraps the data you want to parse, you can do some string transformations here. You won't need to use this very often, but if your source gives the JSON wrapped in javascript (like the old tumblr API), it can be invaluable.
- Example URLs: Here you should add a list of example URLs the parser works for. This lets the client automatically link this parser up with URL classes for you and any users you share the parser with.
This page is just a simple list:
Each content parser here will be applied to the document and returned in this page parser's results list. Like most boorus, e621's File Pages only ever present one file, and they have simple markup, so the solution here was simple. The full contents of that test window are:
*** 1 RESULTS BEGIN ***\n\ntag: character:krystal\ntag: creator:s mino930\nfile url: https://static1.e621.net/data/fc/b6/fcb673ed89241a7b8d87a5dcb3a08af7.jpg\ntag: anthro\ntag: black nose\ntag: blue fur\ntag: blue hair\ntag: clothing\ntag: female\ntag: fur\ntag: green eyes\ntag: hair\ntag: hair ornament\ntag: jewelry\ntag: short hair\ntag: solo\ntag: video games\ntag: white fur\ntag: series:nintendo\ntag: series:star fox\ntag: species:canine\ntag: species:fox\ntag: species:mammal\n\n*** RESULTS END ***\n
When the client sees this in a downloader context, it will know where to download the file and which tags to associate with it based on what the user has chosen in their 'tag import options'.
"},{"location":"downloader_parsers_page_parsers.html#subsidiary_page_parsers","title":"subsidiary page parsers","text":"Here be dragons. This was an attempt to make parsing more helpful in certain API situations, but it ended up ugly. I do not recommend you use it, as I will likely scratch the whole thing and replace it with something better one day. It basically splits the page up into pieces that can then be parsed by nested page parsers as separate objects, but the UI and workflow is hell. Afaik, the imageboard API parsers use it, but little/nothing else. If you are really interested, check out how those work and maybe duplicate to figure out your own imageboard parser and/or send me your thoughts on how to separate File URL/timestamp combos better.
"},{"location":"downloader_sharing.html","title":"Sharing Downloaders","text":"If you are working with users who also understand the downloader system, you can swap your GUGs, URL Classes, and Parsers separately using the import/export buttons on the relevant dialogs, which work in pngs and clipboard text.
But if you want to share conveniently, and with users who are not familiar with the different downloader objects, you can package everything into a single easy-import png as per here.
The dialog to use is network->downloader components->export downloaders:
It isn't difficult. Essentially, you want to bundle enough objects to make one or more 'working' GUGs at the end. I recommend you start by just hitting 'add gug', which--using Example URLs--will attempt to figure out everything you need by itself.
This all works on Example URLs and some domain guesswork, so make sure your url classes are good and the parsers have correct Example URLs as well. If they don't, they won't all link up neatly for the end user. If part of your downloader is on a different domain to the GUGs and Gallery URLs, then you'll have to add them manually. Just start with 'add gug' and see if it looks like enough.
Once you have the necessary and sufficient objects added, you can export to png. You'll get a similar 'does this look right?' summary as what the end-user will see, just to check you have everything in order and the domains all correct. If that is good, then make sure to give the png a sensible filename and embellish the title and description if you need to. You can then send/post that png wherever, and any regular user will be able to use your work.
"},{"location":"downloader_url_classes.html","title":"URL Classes","text":"The fundamental connective tissue of the downloader system is the 'URL Class'. This object identifies and normalises URLs and links them to other components. Whenever the client handles a URL, it tries to match it to a URL Class to figure out what to do.
"},{"location":"downloader_url_classes.html#url_types","title":"the types of url","text":"For hydrus, an URL is useful if it is one of:
File URL: This returns the full, raw media file with no HTML wrapper. They typically end in a filename like http://safebooru.org//images/2333/cab1516a7eecf13c462615120ecf781116265f17.jpg, but sometimes they have a more complicated fetch command ending like 'file.php?id=123456' or '/post/content/123456'.
These URLs are remembered for the file in the 'known urls' list, so if the client happens to encounter the same URL in future, it can determine whether it can skip the download because the file is already in the database or has previously been deleted.
It is not important that File URLs be matched by a URL Class. File URL is considered the 'default', so if the client finds no match, it will assume the URL is a file and try to download and import the result. You might want to specify them particularly if you want to present them in the media viewer, or if you discover File URLs are being confused for Post URLs or something.
Post URL: This typically returns some HTML that contains a File URL and metadata such as tags and post time. They sometimes present multiple sizes (like 'sample' vs 'full size') of the file or even different formats (like 'ugoira' vs 'webm'). The Post URL for the file above, http://safebooru.org/index.php?page=post&s=view&id=2429668, has this 'sample' presentation. Finding the best File URL in these cases can be tricky!
This URL is also saved to 'known urls' and will usually be similarly skipped if it has previously been downloaded. It will also appear in the media viewer as a clickable link.
Gallery URL: This presents a list of Post URLs or File URLs. They often also present a 'next page' URL. It could be a page like http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=0 or an API URL like http://safebooru.org/index.php?page=dapi&s=post&tags=yorha_no._2_type_b&q=index&pid=0. Watchable URL: This is the same as a Gallery URL but represents an ephemeral page that receives new files much faster than a gallery but will soon 'die' and be deleted. For our purposes, this typically means imageboard threads."},{"location":"downloader_url_classes.html#url_components","title":"the components of a url","text":"As far as we are concerned, a URL string has four parts:
- Scheme: http or https
- Location/Domain: safebooru.org or i.4cdn.org or cdn002.somebooru.net
- Path Components: index.php or tesla/res/7518.json or pictures/user/daruak/page/2 or art/Commission-animation-Elsa-and-Anna-541820782
- Parameters: page=post&s=list&tags=yorha_no._2_type_b&pid=40 or page=post&s=view&id=2429668
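To make those four parts concrete, here is a minimal sketch (plain Python, not hydrus code) that splits the safebooru gallery URL above into scheme, domain, path components, and parameters:

```python
from urllib.parse import urlparse, parse_qs

url = 'http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=40'
parsed = urlparse(url)

print(parsed.scheme)                      # http
print(parsed.netloc)                      # safebooru.org
print(parsed.path.strip('/').split('/'))  # ['index.php']
print(parse_qs(parsed.query))             # {'page': ['post'], 's': ['list'], 'tags': ['yorha_no._2_type_b'], 'pid': ['40']}
```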
So, let's look at the 'edit url class' panel, which is found under network->downloader components->manage url classes:
A TBIB File Page like https://tbib.org/index.php?page=post&s=view&id=6391256 is a Post URL. Let's look at the metadata first:
Name and type: Like with GUGs, we should set a good unambiguous name so the client can clearly summarise this url to the user. 'tbib file page' is good.
This is a Post URL, so we set the 'post url' type.
Association logic: All boorus and most sites only present one file per page, but some sites present multiple files on one page, usually several pages in a series/comic, as with pixiv. Danbooru-style thumbnail links to 'this file has a post parent' do not count here--I mean that a single URL embeds multiple full-size images, either with shared or separate tags. It is very important to the hydrus client's downloader logic (which decides whether it has previously visited a URL, and so whether to skip checking it again) that 'can produce multiple files' is checked for any site that can present multiple files on a single page.
Related is the idea of whether a 'known url' should be associated. Typically, this should be checked for Post and File URLs, which are fixed, and unchecked for Gallery and Watchable URLs, which are ephemeral and give different results from day to day. There are some unusual exceptions, so give it a brief thought--but if you have no special reason, leave this as the default for the url type.
And now, for matching the string itself, let's revisit our four components:
Scheme: TBIB supports http and https, so I have set the 'preferred' scheme to https. Any 'http' TBIB URL a user inputs will be automatically converted to https. Location/Domain: For Post URLs, the domain is always \"tbib.org\".
The 'allow' and 'keep' subdomains checkboxes let you determine if a URL with \"artistname.artsite.com\" will match a URL Class with \"artsite.com\" domain and if that subdomain should be remembered going forward. Most sites do not host content on subdomains, so you can usually leave 'allow' unchecked. The 'keep' option (which is only available if 'allow' is checked) is more subtle, only useful for rare cases, and unless you have a special reason, you should leave it checked. (For keep: In cases where a site farms out File URLs to CDN servers on subdomains--like randomly serving a mirror of \"https://muhbooru.org/file/123456\" on \"https://srv2.muhbooru.org/file/123456\"--and removing the subdomain still gives a valid URL, you may not wish to keep the subdomain.) Since TBIB does not use subdomains, these options do not matter--we can leave both unchecked.
'www' and 'www2' and similar subdomains are automatically matched. Don't worry about them.
Path Components: TBIB just uses a single \"index.php\" on the root directory, so the path is not complicated. Were it longer (like \"gallery/cgi/index.php\"), we would add more (\"gallery\" and \"cgi\"), and since the path of a URL has a strict order, we would need to arrange the items in the listbox there so they were sorted correctly. Parameters: TBIB's index.php takes many parameters to render different page types. Note that the Post URL uses \"s=view\", while TBIB Gallery URLs use \"s=list\". In any case, for a Post URL, \"id\", \"page\", and \"s\" are necessary and sufficient."},{"location":"downloader_url_classes.html#string_matches","title":"string matches","text":"As you edit these components, you will be presented with the Edit String Match Panel:
This lets you set the type of string that will be valid for that component. If a given path or query component does not match the rules given here, the URL will not match the URL Class. Most of the time you will probably want to set 'fixed characters' of something like \"post\" or \"index.php\", but if the component you are editing is more complicated and could have a range of different valid values, you can specify just numbers or letters or even a regex pattern. If you try to do something complicated, experiment with the 'example string' entry to make sure you have it set how you think.
Don't go overboard with this stuff, though--most sites do not have super-fine distinctions between their different URL types, and hydrus users will not be dropping user account or logout pages or whatever on the client, so you can be fairly liberal with the rules.
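As an illustration of the kinds of tests described above (the type names here are mine, not hydrus's internal ones), a String Match for a single component boils down to something like:

```python
import re

def component_matches(value: str, match_type: str, rule: str = '') -> bool:
    if match_type == 'fixed':      # exactly this string, e.g. 'index.php' or 'post'
        return value == rule
    if match_type == 'numeric':    # digits only, e.g. a post id
        return value.isdigit()
    if match_type == 'regex':      # anything more complicated
        return re.fullmatch(rule, value) is not None
    raise ValueError('unknown match type: ' + match_type)

print(component_matches('index.php', 'fixed', 'index.php'))   # True
print(component_matches('2429668', 'numeric'))                # True
print(component_matches('res', 'regex', r'[a-z]+'))           # True
```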
"},{"location":"downloader_url_classes.html#match_details","title":"how do they match, exactly?","text":"This URL Class will be assigned to any URL that matches the location, path, and query. Missing path component or parameters in the URL will invalidate the match but additonal ones will not!
For instance, given:
- URL A: https://8ch.net/tv/res/1002432.html
- URL B: https://8ch.net/tv/res
- URL C: https://8ch.net/tv/res/1002432
- URL D: https://8ch.net/tv/res/1002432.json
- URL Class that looks for \"(characters)/res/(numbers).html\" for the path
Only URL A will match
And:
- URL A: https://boards.4chan.org/m/thread/16086187
- URL B: https://boards.4chan.org/m/thread/16086187/ssg-super-sentai-general-651
- URL Class that looks for \"(characters)/thread/(numbers)\" for the path
Both URL A and B will match
And:
- URL A: https://www.pixiv.net/member_illust.php?mode=medium&illust_id=66476204
- URL B: https://www.pixiv.net/member_illust.php?mode=medium&illust_id=66476204&lang=jp
- URL C: https://www.pixiv.net/member_illust.php?mode=medium
- URL Class that looks for \"illust_id=(numbers)\" in the query
Both URL A and B will match, URL C will not
If multiple URL Classes match a URL, the client will try to assign the most 'complicated' one, with the most path components and then parameters.
Given two example URLs and URL Classes:
- URL A: https://somebooru.com/post/123456
- URL B: https://somebooru.com/post/123456/manga_subpage/2
- URL Class A that looks for \"post/(number)\" for the path
- URL Class B that looks for \"post/(number)/manga_subpage/(number)\" for the path
URL A will match URL Class A but not URL Class B and so will receive A.
URL B will match both and receive URL Class B as it is more complicated.
This situation is not common, but when it does pop up, it can be a pain. It is usually a good idea to match exactly what you need--no more, no less.
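A rough sketch of that matching logic, with hypothetical names (hydrus's real implementation differs): missing required components fail the match, surplus ones are ignored, and ties between matching classes go to the most specific.

```python
def url_matches_class(path_parts, params, required_path_rules, required_param_keys):
    if len(path_parts) < len(required_path_rules):
        return False                                  # missing path components invalidate the match
    for part, rule in zip(path_parts, required_path_rules):
        if not rule(part):
            return False
    return all(key in params for key in required_param_keys)   # surplus params are fine

def pick_most_complicated(matching_classes):
    # matching_classes: (name, num_path_rules, num_params) tuples that matched the URL
    return max(matching_classes, key=lambda c: (c[1], c[2]))

print(pick_most_complicated([('class A', 2, 0), ('class B', 4, 0)]))   # ('class B', 4, 0)
```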
"},{"location":"downloader_url_classes.html#url_normalisation","title":"normalising urls","text":"Different URLs can give the same content. The http and https versions of a URL are typically the same, and:
- https://gelbooru.com/index.php?page=post&s=view&id=3767497
- gives the same as:
- https://gelbooru.com/index.php?id=3767497&page=post&s=view
And:
- https://e621.net/post/show/1421754/abstract_background-animal_humanoid-blush-brown_ey
- is the same as:
- https://e621.net/post/show/1421754
- is the same as:
- https://e621.net/post/show/1421754/help_computer-made_up_tags-REEEEEEEE
Since we are in the business of storing and comparing URLs, we want to 'normalise' them to a single comparable beautiful value. You see a preview of this normalisation on the edit panel. Normalisation happens to all URLs that enter the program.
Note that in e621's case (and for many other sites!), that text after the id is purely decoration. It can change when the file's tags change, so if we want to compare today's URLs with those we saw a month ago, we'd rather just be without it.
On normalisation, all URLs will get the preferred http/https switch, and their parameters will be alphabetised. File and Post URLs will also cull out any surplus path or query components. This wouldn't affect our TBIB example above, but it will clip the e621 example down to that 'bare' id URL, and it will take any surplus 'lang=en' or 'browser=netscape_24.11' garbage off the query text as well. URLs that are not associated and saved and compared (i.e. normal Gallery and Watchable URLs) are not culled of unmatched path components or query parameters, which can sometimes be useful if you want to match (and keep intact) gallery URLs that might or might not include an important 'sort=desc' type of parameter.
Since File and Post URLs will do this culling, be careful that you do not leave out anything important in your rules. Make sure what you have is both necessary (nothing can be removed and still keep it valid) and sufficient (no more needs to be added to make it valid). It is a good idea to try pasting the 'normalised' version of the example URL into your browser, just to check it still works.
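Here is a simplified sketch of those normalisation steps (the keep_params set is just an assumption for the TBIB example; hydrus derives it from the URL Class's own rules):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def normalise(url, preferred_scheme='https', keep_params=('id', 'page', 's')):
    p = urlparse(url)
    # alphabetise the parameters and cull anything surplus to the class's rules
    params = sorted((k, v) for k, v in parse_qsl(p.query) if k in keep_params)
    return urlunparse((preferred_scheme, p.netloc, p.path, '', urlencode(params), ''))

print(normalise('http://tbib.org/index.php?s=view&page=post&id=6391256&lang=en'))
# https://tbib.org/index.php?id=6391256&page=post&s=view
```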
"},{"location":"downloader_url_classes.html#default_values","title":"'default' values","text":"Some sites present the first page of a search like this:
https://danbooru.donmai.us/posts?tags=skirt
But the second page is:
https://danbooru.donmai.us/posts?tags=skirt&page=2
Another example is:
https://www.hentai-foundry.com/pictures/user/Mister69M
https://www.hentai-foundry.com/pictures/user/Mister69M/page/2
What happened to 'page=1' and '/page/1'? Adding those '1' values in works fine! Many sites, when an index is absent, will secretly imply an appropriate 0 or 1. This looks pretty to users looking at a browser address bar, but it can be a pain for us, who want to match both styles to one URL Class. It would be nice if we could recognise the 'bare' initial URL and fill in the '1' values to coerce it to the explicit, automation-friendly format. Defaults to the rescue:
After you set a path component or parameter String Match, you will be asked for an optional 'default' value. You won't want to set one most of the time, but for Gallery URLs, it can be hugely useful--see how the normalisation process automatically fills in the missing path component with the default! There are plenty of examples in the default Gallery URLs of this, so check them out. Most sites use page indices starting at '1', but Gelbooru-style imageboards use 'pid=0' file index (and often move forward 42, so the next pages will be 'pid=42', 'pid=84', and so on, although others use deltas of 20 or 40).
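A tiny sketch of the 'default value' idea: if the gallery URL is missing its page parameter, normalisation fills it in so both styles compare equal (illustrative only, not hydrus's internal code):

```python
from urllib.parse import urlparse, parse_qs

def fill_default(params, key, default):
    filled = dict(params)
    filled.setdefault(key, [default])   # only added when the parameter is absent
    return filled

query = parse_qs(urlparse('https://danbooru.donmai.us/posts?tags=skirt').query)
print(fill_default(query, 'page', '1'))   # {'tags': ['skirt'], 'page': ['1']}
```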
"},{"location":"downloader_url_classes.html#next_gallery_page_prediction","title":"can we predict the next gallery page?","text":"Now we can harmonise gallery urls to a single format, we can predict the next gallery page! If, say, the third path component or 'page' parameter is always a number referring to page, you can select this under the 'next gallery page' section and set the delta to change it by. The 'next gallery page url' section will be automatically filled in. This value will be consulted if the parser cannot find a 'next gallery page url' from the page content.
It is neat to set this up, but I only recommend it if you actually cannot reliably parse a next gallery page url from the HTML later in the process. It is neater to have searches stop naturally because the parser said 'no more gallery pages' than to have hydrus always go one page beyond and end every single search on an uglier 'No results found' or 404 result.
Unfortunately, some sites will either not produce an easily parsable next page link or randomly just not include it due to some issue on their end (Gelbooru is a funny example of this). Also, APIs will often have a kind of 'start=200&num=50', 'start=250&num=50' progression but not include that state in the XML or JSON they return. These cases require the automatic next gallery page rules (check out Artstation and tumblr api gallery page URL Classes in the defaults for examples of this).
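For the automatic rule, the prediction is just 'bump the chosen index parameter by the delta'. A sketch, assuming a Gelbooru-style pid that steps by 42:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def next_gallery_url(url, param, delta):
    p = urlparse(url)
    params = dict(parse_qsl(p.query))
    params[param] = str(int(params.get(param, '0')) + delta)   # bump the page/file index
    return urlunparse((p.scheme, p.netloc, p.path, '', urlencode(params), ''))

url = 'http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=0'
print(next_gallery_url(url, 'pid', 42))
# http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=42
```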
"},{"location":"downloader_url_classes.html#api_links","title":"how do we link to APIs?","text":"If you know that a URL has an API backend, you can tell the client to use that API URL when it fetches data. The API URL needs its own URL Class.
To define the relationship, click the \"String Converter\" button, which gives you this:
You may have seen this panel elsewhere. It lets you convert a string to another over a number of transformation steps. The steps can be as simple as adding or removing some characters or applying a full regex substitution. For API URLs, you are mostly looking to isolate some unique identifying data (\"m/thread/16086187\" in this case) and then substituting that into the new API path. It is worth testing this with several different examples!
When the client links regular URLs to API URLs like this, it will still associate the human-pretty regular URL when it needs to display to the user and record 'known urls' and so on. The API is just a quick lookup when it actually fetches and parses the respective data.
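As an example of the kind of regex-substitution step a String Converter performs, here is the 4chan case from above, isolating 'm/thread/16086187' and substituting it into the JSON API path (a sketch, not the exact converter hydrus ships with):

```python
import re

def thread_url_to_api_url(url):
    # pull out the 'board/thread/id' part and drop it into the API path
    return re.sub(
        r'^https?://boards\.4chan\.org/(\w+)/thread/(\d+).*$',
        r'https://a.4cdn.org/\1/thread/\2.json',
        url,
    )

print(thread_url_to_api_url('https://boards.4chan.org/m/thread/16086187'))
# https://a.4cdn.org/m/thread/16086187.json
```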
"},{"location":"duplicates.html","title":"duplicates","text":"As files are shared on the internet, they are often resized, cropped, converted to a different format, altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some duplicates. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete. Along the way, it would be nice to merge your ratings and tags to the better files so you don't lose any work.
Finding and processing duplicates within a large collection is impossible to do by hand, so I have written a system to do the heavy lifting for you. It currently works on still images, but an extension for gifs and video is planned.
Hydrus finds potential duplicates using a search algorithm that compares images by their shape. Once these pairs of potentials are found, they are presented to you through a filter like the archive/delete filter to determine their exact relationship and if you want to make a further action, such as deleting the 'worse' file of a pair. All of your decisions build up in the database to form logically consistent groups of duplicates and 'alternate' relationships that can be used to infer future information. For instance, if you say that file A is a duplicate of B and B is a duplicate of C, A and C are automatically recognised as duplicates as well.
This all starts on--
"},{"location":"duplicates.html#duplicates_page","title":"the duplicates processing page","text":"On the normal 'new page' selection window, hit special->duplicates processing. This will open this page:
Let's go to the preparation page first:
The 'similar shape' algorithm works on distance. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be hairstyle or costume changes. You will start on distance 0 and should not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives.
If you are interested, the current version of this system uses a 64-bit phash to represent the image shape and a VPTree to search different files' phashes' relative hamming distance. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
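The distance itself is just a hamming distance: the number of bits by which two 64-bit phashes differ. A toy illustration:

```python
def hamming_distance(phash_a, phash_b):
    return bin(phash_a ^ phash_b).count('1')   # count differing bits

a = 0xF0F0F0F0F0F0F0F0                         # a pretend 64-bit phash
b = a ^ 0b1001                                 # the same hash with two bits flipped
print(hamming_distance(a, a))                  # 0  -> likely an exact match or resize
print(hamming_distance(a, b))                  # 2  -> a small change
```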
Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.
Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.
Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.
"},{"location":"duplicates.html#duplicate_filtering_page","title":"the filtering page","text":"Processing duplicates can be real trudge-work if you do not set up a workflow you enjoy. It is a little slower than the archive/delete filter, and sometimes takes a bit more cognitive work. For many users, it is a good task to do while listening to a podcast or having a video going on another screen.
If you have a client with tens of thousands of files, you will likely have thousands of potential pairs. This can be intimidating, but do not worry--due to the A, B, C logical inferences as above, you will not have to go through every single one. The more information you put into the system, the faster the number will drop.
The filter has a regular file search interface attached. As you can see, it defaults to system:everything, but you can limit what files you will be working on simply by adding new search predicates. You might like to only work on files in your archive (i.e. that you know you care about to begin with), for instance. You can choose whether both files of the pair should match the search, or just one. 'creator:' tags work very well at cutting the search domain to something more manageable and consistent--try your favourite creator!
If you would like an example from the current search domain, hit the 'show some random potential pairs' button, and it will show two or more files that seem related. It is often interesting and surprising to see what it finds! The action buttons below allow for quick processing of these pairs and groups when convenient (particularly for large cg sets with 100+ alternates), but I recommend you leave these alone until you know the system better.
When you are ready, launch the filter.
"},{"location":"duplicates.html#duplicates_filter","title":"the duplicates filter","text":"We have not set up your duplicate 'merge' options yet, so do not get too into this. For this first time, just poke around, make some pretend choices, and then cancel out and choose to forget them.
Like the archive/delete filter, this uses quick mouse-clicks, keyboard shortcuts, or button clicks to action pairs. It presents two files at a time, labelled A and B, which you can quickly switch between just as in the normal media viewer. As soon as you action them, the next pair is shown. The two files will have their current zoom-size locked so they stay the same size (and in the same position) as you switch between them. Scroll your mouse wheel a couple of times and see if any obvious differences stand out.
Please note the hydrus media viewer does not currently work well with large resolutions at high zoom (it gets laggy and may have memory issues). Don't zoom in to 1600% and try to look at jpeg artifact differences on very large files, as this is simply not well supported yet.
The hover window on the right also presents a number of 'comparison statements' to help you make your decision. Green statements mean this current file is probably 'better', and red the opposite. Larger, older, higher-quality, more-tagged files are generally considered better. These statements have scores associated with them (which you can edit in file->options->duplicates), and the file of the pair with the highest score is presented first. If the files are duplicates, you can generally assume the first file you see, the 'A', is the better, particularly if there are several green statements.
The filter will need to occasionally checkpoint, saving the decisions so far to the database, before it can fetch the next batch. This allows it to apply inferred information from your current batch and reduce your pending count faster before serving up the next set. It will present you with a quick interstitial 'confirm/back' dialog just to let you know. This happens more often as the potential count decreases.
"},{"location":"duplicates.html#duplicates_decisions","title":"the decisions to make","text":"There are three ways a file can be related to another in the current duplicates system: duplicates, alternates, or false positive (not related).
False positive (not related) is the easiest. You will not see completely unrelated pairs presented very often in the filter, particularly at low search distances, but if the shape of face and hair and clothing happen to line up (or geometric shapes, often), the search system may make a false positive match. In this case, just click 'they are not related'.
Alternate relations are files that are not duplicates but obviously related in some way. Perhaps a costume change or a recolour. Hydrus does not have rich alternate support yet (but it is planned, and highly requested), so this relationship is mostly a 'holding area' for files that we will revisit for further processing in the future.
Duplicate files are of the exact same thing. They may be different resolutions, file formats, encoding quality, or one might even have a watermark, but they are fundamentally different views on the exact same art. As you can see with the buttons, you can select one file as the 'better' or say they are about the same. If the files are basically the same, there is no point stressing about which is 0.2% better--just click 'they are the same'. For better/worse pairs, you might have reason to keep both, but most of the time I recommend you delete the worse.
You can customise the shortcuts under file->shortcuts->duplicate_filter. The defaults are:
- Left-click: this is better, delete the other.
- Right-click: they are related alternates.
- Middle-click: Go back one decision.
- Enter/Escape: Stop filtering.
If two duplicates have different metadata like tags or archive status, you probably want to merge them. Cancel out of the filter and click the 'edit default duplicate metadata merge options' button:
By default, these options are fairly empty. You will have to set up what you want based on your services and preferences. Setting a simple 'copy all tags' is generally a good idea, and like/dislike ratings also often make sense. The settings for better and same quality should probably be similar, but it depends on your situation.
If you choose the 'custom action' in the duplicate filter, you will be presented with a fresh 'edit duplicate merge options' panel for the action you select and can customise the merge specifically for that choice. ('favourite' options will come here in the future!)
Once you are all set up here, you can dive into the duplicate filter. Please let me know how you get on with it!
"},{"location":"duplicates.html#future","title":"what now?","text":"The duplicate system is still incomplete. Now the db side is solid, the UI needs to catch up. Future versions will show duplicate information on thumbnails and the media viewer and allow quick-navigation to a file's duplicates and alternates.
For now, if you wish to see a file's duplicates, right-click it and select file relationships. You can review all its current duplicates, open them in a new page, appoint the new 'best file' of a duplicate group, and even mass-action selections of thumbnails.
You can also search for files based on the number of file relations they have (including when setting the search domain of the duplicate filter!) using system:file relationships. You can also search for best/not best files of groups, which makes it easy, for instance, to find all the spare duplicate files if you decide you no longer want to keep them.
I expect future versions of the system to also auto-resolve easy duplicate pairs, such as clearing out pixel-for-pixel png versions of jpgs.
"},{"location":"duplicates.html#game_cgs","title":"game cgs","text":"If you import a lot of game CGs, which frequently have dozens or hundreds of alternates, I recommend you set them as alternates by selecting them all and setting the status through the thumbnail right-click menu. The duplicate filter, being limited to pairs, needs to compare all new members of an alternate group to all other members once to verify they are not duplicates. This is not a big deal for alternates with three or four members, but game CGs provide an overwhelming edge case. Setting a group of thumbnails as alternate 'fixes' their alternate status immediately, discounting the possibility of any internate duplicates, and provides an easy way out of this situation.
"},{"location":"duplicates.html#duplicates_examples","title":"more information and examples","text":""},{"location":"duplicates.html#duplicates_examples_better_worse","title":"better/worse","text":"Which of two files is better? Here are some common reasons:
- higher resolution
- better image quality
- png over jpg for screenshots
- jpg over png for busy images
- jpg over png for pixel-for-pixel duplicates
- a better crop
- no watermark or site-frame or undesired blemish
- has been tagged by other people, so is likely to be the more 'popular'
However, these are not hard rules--sometimes a file has a larger resolution or filesize due to a bad upscaling or encoding decision by the person who 'reinterpreted' it. You really have to look at it and decide for yourself.
Here is a good example of a better/worse pair:
The first image is better because it is a png (pixel-perfect pngs are always better than jpgs for screenshots of applications--note how obvious the jpg's encoding artifacts are on the flat colour background) and it has a slightly higher (original) resolution, making it less blurry. I presume the second went through some FunnyJunk-tier trash meme site to get automatically cropped to 960px height and converted to the significantly smaller jpeg. Whatever happened, let's drop the second and keep the first.
When both files are jpgs, differences in quality are very common and often significant:
Again, this is mostly due to some online service resizing and lowering quality to ease on their bandwidth costs. There is usually no reason to keep the lower quality version.
"},{"location":"duplicates.html#duplicates_examples_same","title":"same quality duplicates","text":"When are two files the same quality? A good rule of thumb is if you scroll between them and see no obvious differences, and the comparison statements do not suggest anything significant, just set them as same quality.
Here are two same quality duplicates:
There is no obvious difference between those two. The filesize is significantly different, so I suspect the smaller is a lossless png optimisation, but in the grand scheme of things, that doesn't matter so much. Many of the big content providers--Facebook, Google, Cloudflare--automatically 'optimise' the data that goes through their networks in order to save bandwidth. Although jpegs are often a slaughterhouse, with pngs it is usually harmless.
Given the filesize, you might decide that these are actually a better/worse pair--but if the larger image had tags and was the 'canonical' version on most boorus, the decision might not be so clear. You can choose better/worse and delete one randomly, but sometimes you may just want to keep both without a firm decision on which is best, so just set 'same quality' and move on. Your time is more valuable than a few dozen KB.
Sometimes, you will see pixel-for-pixel duplicate jpegs of very slightly different size, such as 787KB vs 779KB. The smaller of these is usually an exact duplicate that has had its internal metadata (e.g. EXIF tags) stripped by a program or website CDN. They are same quality unless you have a strong opinion on whether having internal metadata in a file is useful.
"},{"location":"duplicates.html#duplicates_examples_alternates","title":"alternates","text":"As I wrote above, hydrus's alternates system in not yet properly ready. It is important to have a basic 'alternates' relationship for now, but it is a holding area until we have a workflow to apply 'WIP'- or 'recolour'-type labels and present that information nicely in the media viewer.
Alternates are not of exactly the same thing, but one is a variant of the other or they are both descended from a common original. The precise definition is up to you, but it generally means something like:
- the files are recolours
- the files are alternate versions of the same image produced by the same or different artists (e.g. clean/messy or with/without hair ribbon)
- iterations on a close template
- different versions of a file's progress, such as the steps from the initial draft sketch to a final shaded version
Here are some recolours of the same image:
And some WIP:
And a costume change:
None of these are duplicates, but they are obviously related. The duplicate search will notice they are similar, so we should let the client know they are 'alternate'.
Here's a subtler case:
These two files are very similar, but try opening both in separate tabs and then flicking back and forth: the second's glove-string is further into the mouth, and it has improved chin shading, a more refined eye shape, and shaved pubic hair. It is simple to spot these differences in the client's duplicate filter when you scroll back and forth.
I believe the second is an improvement on the first by the same artist, so it is a WIP alternate. You might also consider it a 'better' improvement.
Here are three files you might or might not consider to be alternates:
These are all based on the same template--which is why the dupe filter found them--but they are not so closely related as those above, and the last one is joking about a different ideology entirely and might deserve to be in its own group. Ultimately, you might prefer just to give them some shared tag and consider them not alternates per se.
"},{"location":"duplicates.html#duplicates_examples_false_positive","title":"not related/false positive","text":"Here are two files that match false positively:
Despite their similar shape, they are neither duplicates nor even of the same topic. The only commonality is the medium. I would not consider them close enough to be alternates--just adding something like 'screenshot' and 'imageboard' as tags to both is probably the closest connection they have.
Recording the 'false positive' relationship is important to make sure the comparison does not come up again in the duplicate filter.
The incidence of false positives increases as you broaden the search distance--the less precise your search, the less likely it is to be correct. At distance 14, these files all match, but uselessly:
"},{"location":"duplicates.html#duplicates_advanced","title":"the duplicates system","text":"(advanced nonsense, you can skip this section. tl;dr: duplicate file groups keep track of their best quality file, sometimes called the King)
Hydrus achieves duplicate transitivity by treating duplicate files as groups. Although you action pairs, if you set (A duplicate B), that creates a group (A,B). Subsequently setting (B duplicate C) extends the group to be (A,B,C), and so (A duplicate C) is transitively implied.
The first version of the duplicate system attempted to record better/worse/same information for all files in a virtual duplicate group, but this proved very complicated, workflow-heavy, and not particularly useful. The new system instead appoints a single King as the best file of a group. All other files in the group are beneath the King and have no other relationship data retained.
This King represents the group in the duplicate filter (and in potential pairs, which are actually recorded between duplicate media groups--even if most of them at the outset only have one member). If the other file in a pair is considered better, it becomes the new King, but if it is worse or equal, it merges into the other members. When two Kings are compared, whole groups can merge!
Alternates are stored in a similar way, except the members are duplicate groups rather than individual files and they have no significant internal relationship metadata yet. If \u03b1, \u03b2, and \u03b3 are duplicate groups that each have one or more files, then setting (\u03b1 alt \u03b2) and (\u03b2 alt \u03b3) creates an alternate group (\u03b1,\u03b2,\u03b3), with the caveat that \u03b1 and \u03b3 will still be sent to the duplicate filter once just to check they are not duplicates by chance. The specific file members of these groups, A, B, C and so on, inherit the relationships of their parent groups when you right-click on their thumbnails.
False positive relationships are stored between pairs of alternate groups, so they apply transitively between all the files of either side's alternate group. If (\u03b1 alt \u03b2) and (\u03c8 alt \u03c9) and you apply (\u03b1 fp \u03c8), then (\u03b1 fp \u03c9), (\u03b2 fp \u03c8), and (\u03b2 fp \u03c9) are all transitively implied.
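To make the transitivity concrete, here is a toy union-find sketch of how pairwise 'duplicate' decisions merge into groups (hydrus's real structures also track Kings, alternates, and false positives; this only shows the merging idea):

```python
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def set_duplicate(a, b):
    parent[find(a)] = find(b)           # merge a's group into b's

set_duplicate('A', 'B')
set_duplicate('B', 'C')
print(find('A') == find('C'))           # True: (A duplicate C) is implied
```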
More examples
The duplicates filter can be pretty tedious to work with. Pairs that have trivial differences are easy to resolve, but working through dozens of obvious resizes or pixel duplicates that all follow the same pattern can get boring.
If only there were some way to automate common situations! We could have hydrus solve these trivial duplicates in the background, leaving us with less, more interesting work to do.
"},{"location":"duplicates_auto_resolution.html#duplicates_auto-resolution","title":"duplicates auto-resolution","text":"This is a new system that I am still developing. The plan is to roll out a hardcoded rule that resolves jpeg and png pixel dupes and then iterate on the UI and workflow to let users add their own custom rules. If you try it, let me know how you find things!
So, let's start with a simple and generally non-controversial example: pixel duplicate jpegs and pngs. When you save a jpeg, you get some 'fuzzy' artifacts, but when you save a png, it is always pixel perfect. Thus, when you have a normal jpeg and a png that are pixel duplicates, you know, for certain, that the png is a copy of the jpeg. This happens most often when someone is posting from one application to another, or with a phone, and rather than uploading the source jpeg, they do 'copy image' and paste that into the upload box--the browser creates the accursed 'Clipboard.png', and we are thus overwhelmed with spam.
In this case, we always want to keep the (almost always smaller) jpeg and ditch the (bloated, derived) png, which in the duplicates system would be:
- A two-part duplicates search, for 'system:filetype is jpg' and 'system:filetype is png', with 'must be pixel dupes'.
- Arranging 'the jpeg is A, the png is B'
- Sending the normal duplicate action of 'set A as better than B, and delete B'.
Let's check out the 'auto-resolution' tab under the duplicates filtering page:
(image)
The auto-resolution system lets you have multiple 'rules'. Each represents a search, a way of testing pairs, and then an action. Let's check the edit dialog:
(image of edit rules)
(image of edit rule, png vs jpeg)
Note that this adds the 'system:height/width > 128' predicates as a failsafe to ensure we are checking real images in this case, not tiny 16x16 icons where there might be a legitimate accidental jpeg/png pixel dupe, and where the decision on what to keep is not so simple. Automated systems are powerful magic wands, and we should always be careful waving them around.
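Put together, the rule is roughly a predicate over a pair plus an action. A purely illustrative sketch (the field names here are hypothetical, not the hydrus API):

```python
def resolve_jpeg_png_pixel_dupes(file_a, file_b, are_pixel_dupes):
    # file_a / file_b: dicts like {'filetype': 'jpeg', 'width': 1200, 'height': 900}
    if not are_pixel_dupes:
        return None
    if {file_a['filetype'], file_b['filetype']} != {'jpeg', 'png'}:
        return None
    if min(f[d] for f in (file_a, file_b) for d in ('width', 'height')) <= 128:
        return None   # the height/width > 128 failsafe: skip tiny icons
    jpeg, png = (file_a, file_b) if file_a['filetype'] == 'jpeg' else (file_b, file_a)
    return ('jpeg is better than png, delete the png', jpeg, png)
```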
Talk about metadata conditional objects here.
Talk about the pair Comparator stuff, 4x filesize and so on. Might be more UI, so maybe a picture of the sub-panel.
Hydrus will work these rules in its normal background maintenance time. You can force them to work a bit harder if you want to catch up somewhere, but normally you can just leave them alone and they'll stay up to date with new imports.
"},{"location":"duplicates_auto_resolution.html#future","title":"future","text":"I will expand the Metadata Conditional to cover more tests, including most of the hooks in the duplicates filter summaries, like 'this has exif data'. And, assuming the trivial cases go well, I'd like to push toward less-certain comparions and have some sort of tools for 'A is at least 99.7% similar to B', which will help with resize comparisons and differentiating dupes from alternates.
I'd also eventually like auto-resolution to apply to files as they are imported, so, in the vein of 'previously deleted', you could have an instant import result of 'duplicate discarded: (rule name)'.
"},{"location":"faq.html","title":"FAQ","text":""},{"location":"faq.html#repositories","title":"What is a repository?","text":"A repository is a service in the hydrus network that stores a certain kind of information--files or tag mappings, for instance--as submitted by users all over the internet. Those users periodically synchronise with the repository so they know everything that it stores. Sometimes, like with tags, this means creating a complete local copy of everything on the repository. Hydrus network clients never send queries to repositories; they perform queries over their local cache of the repository's data, keeping everything confined to the same computer.
"},{"location":"faq.html#tags","title":"What is a tag?","text":"Wikipedia
A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
A good word for the connection of a particular tag to a particular file is mapping.
Hydrus is designed with the intention that tags are for searching, not describing. Workflows and UI are tuned for finding files and other similar files (e.g. by the same artist), and while it is possible to have nice metadata overlays around files, this is not considered their chief purpose. Trying to have 'perfect' descriptions for files is often a rabbit-hole that can consume hours of work with relatively little demonstrable benefit.
All tags are automatically converted to lower case. 'Sunset Drive' becomes 'sunset drive'. Why?
- Although it is more beautiful to have 'The Lord of the Rings' rather than 'the lord of the rings', there are many, many special cases where style guides differ on which words to capitalise.
- As 'The Lord of the Rings' and 'the lord of the rings' are semantically identical, it is natural to search in a case insensitive way. When case does not matter, what point is there in recording it?
Furthermore, leading and trailing whitespace is removed, and multiple whitespace is collapsed to a single character.
' yellow dress '
becomes
'yellow dress'
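Those cleaning rules are simple enough to sketch in a couple of lines (illustrative, not hydrus's exact code):

```python
import re

def clean_tag(tag):
    return re.sub(r'\s+', ' ', tag.strip()).lower()   # collapse whitespace, strip, lowercase

print(clean_tag('Sunset Drive'))        # sunset drive
print(clean_tag('  yellow   dress '))   # yellow dress
```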
"},{"location":"faq.html#namespaces","title":"What is a namespace?","text":"A namespace is a category that in hydrus prefixes a tag. An example is 'person' in the tag 'person:ron paul'--it lets people and software know that 'ron paul' is a name. You can create any namespace you like; just type one or more words and then a colon, and then the next string of text will have that namespace.
The hydrus client gives namespaces different colours so you can pick out important tags more easily in a large list, and you can also search by a particular namespace, even creating complicated predicates like 'give all files that do not have any character tags', for instance.
"},{"location":"faq.html#filenames","title":"Why not use filenames and folders?","text":"As a retrieval method, filenames and folders are less and less useful as the number of files increases. Why?
- A filename is not unique; did you mean this \"04.jpg\" or this \"04.jpg\" in another folder? Perhaps \"04 (3).jpg\"?
- A filename is not guaranteed to describe the file correctly, e.g. hello.jpg
- A filename is not guaranteed to stay the same, meaning other programs cannot rely on the filename address being valid or even returning the same data every time.
- A filename is often--for ridiculous reasons--limited to a certain prohibitive character set. Even when utf-8 is supported, some arbitrary ascii characters are usually not, and different localisations, operating systems and formatting conventions only make it worse.
- Folders can offer context, but they are clunky and time-consuming to change. If you put each chapter of a comic in a different folder, for instance, reading several volumes in one sitting can be a pain. Nesting many folders adds navigation-latency and tends to induce less informative \"04.jpg\"-type filenames.
So, the client tracks files by their hash. This technical identifier easily eliminates duplicates and permits the database to robustly attach other metadata like tags and ratings and known urls and notes and everything else, even across multiple clients and even if a file is deleted and later imported.
As a general rule, I suggest you not set up hydrus to parse and display all your imported files' filenames as tags. 'image.jpg' is useless as a tag. Shed the concept of filenames as you would chains.
"},{"location":"faq.html#external_files","title":"Can the client manage files from their original locations?","text":"When the client imports a file, it makes a quickly accessible but human-ugly copy in its internal database, by default under install_dir/db/client_files. When it needs to access that file again, it always knows where it is, and it can be confident it is what it expects it to be. It never accesses the original again.
This storage method is not always convenient, particularly for those who are hesitant about converting to using hydrus completely and also do not want to maintain two large copies of their collections. The question comes up--\"can hydrus track files from their original locations, without having to copy them into the db?\"
The technical answer is, \"This support could be added,\" but I have decided not to, mainly because:
- Files stored in locations outside of hydrus's responsibility can change or go missing (particularly if a whole parent folder is moved!), which erodes the assumptions it makes about file access, meaning additional checks would have to be added before important operations, often with no simple recovery.
- External duplicates would not be merged, and the file system would have to be extended to handle pointless 1->n hash->path relationships.
- Many regular operations--like figuring out whether orphaned files should be physically deleted--are less simple.
- Backing up or restoring a distributed external file system is much more complicated.
- It would require more code to maintain and would mean a laggier db and interface.
- Hydrus is an attempt to get away from files and folders--if a collection is too large and complicated to manage using explorer, what's the point in supporting that old system?
It is not unusual for new users who ask for this feature to find their feelings change after getting more experience with the software. If desired, path text can be preserved as tags using regexes during import, and getting into the swing of searching by metadata rather than navigating folders often shows how very effective the former is over the latter. Most users eventually import most or all of their collection into hydrus permanently, deleting their old folder structure as they go.
For this reason, if you are hesitant about doing things the hydrus way, I advise you try running it on a smaller subset of your collection, say 5,000 files, leaving the original copies completely intact. After a month or two, think about how often you used hydrus to look at the files versus navigating through folders. If you barely used the folders, you probably do not need them any more, but if you used them a lot, then hydrus might not be for you, or it might only be for some sorts of files in your collection.
"},{"location":"faq.html#sqlite","title":"Why use SQLite?","text":"Hydrus uses SQLite for its database engine. Some users who have experience with other engines such as MySQL or PostgreSQL sometimes suggest them as alternatives. SQLite serves hydrus's needs well, and at the moment, there are no plans to change.
Since this question has come up frequently, a user has written an excellent document talking about the reasons to stick with SQLite. If you are interested in this subject, please check it out here:
https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md
"},{"location":"faq.html#hashes","title":"What is a hash?","text":"Wikipedia
Hashes are a subject you usually have to be a software engineer to find interesting. The simple answer is that they are unique names for things. Hashes make excellent identifiers inside software, as you can safely assume that f099b5823f4e36a4bd6562812582f60e49e818cf445902b504b5533c6a5dad94 refers to one particular file and no other. In the client's normal operation, you will never encounter a file's hash. If you want to see a thumbnail bigger, double-click it; the software handles the mathematics.
For those who are interested: hydrus uses SHA-256, which spits out 32-byte (256-bit) hashes. The software stores the hash densely, as 32 bytes, only encoding it to 64 hex characters when the user views it or copies to clipboard. SHA-256 is not perfect, but it is a great compromise candidate; it is secure for now, it is reasonably fast, it is available for most programming languages, and newer CPUs perform it more efficiently all the time.
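For the curious, computing such a hash yourself is one call to the standard library (a sketch; 'example.jpg' is a placeholder path):

```python
import hashlib

def sha256_of_file(path):
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):   # read in 64KB chunks
            h.update(chunk)
    return h.digest()                                     # 32 raw bytes; .hex() gives the 64-character form

# sha256_of_file('example.jpg').hex() -> 'f099b5823f4e36a4...'
```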
"},{"location":"faq.html#access_keys","title":"What is an access key?","text":"The hydrus network's repositories do not use username/password, but instead a single strong identifier-password like this:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3
These hex numbers give you access to a particular account on a particular repository, and are often combined like so:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871
They are long enough to be impossible to guess, and also randomly generated, so they reveal nothing personally identifying about you. Many people can use the same access key (and hence the same account) on a repository without consequence, although they will have to share any bandwidth limits, and if one person screws around and gets the account banned, everyone will lose access.
The access key is the account. Do not give it to anyone you do not want to have access to the account. An administrator will never need it; instead they will want your account id.
"},{"location":"faq.html#account_ids","title":"What is an account id?","text":"This is another long string of random hexadecimal that identifies your account without giving away access. If you need to identify yourself to a repository administrator (say, to get your account's permissions modified), you will need to tell them your account id. You can copy it to your clipboard in services->review services.
"},{"location":"faq.html#service_isolation","title":"Why does the file I deleted and then re-imported still have its tags?","text":"Hydrus splits its different abilities and domains (e.g. the list of files on your disk, or the tag mappings in 'my tags', or your files' notes) into separate services. You can see these in review services and manage services. Although the services of the same type may interact (e.g. deleting a file from one service might send that file to the 'trash' service, or adding tag parents to one tag service might implicate tags on another), those of different types are generally completely independent. Your tags don't care where the files they map to are.
So, when you delete a file from 'my files', none of its tag mappings in 'my tags' change--they remain attached to the 'ghost' of the deleted file. Your notes, ratings, and known URLs are the same (URLs is important, since it lets the client skip URLs for files you previously deleted). If you re-import the file, it will have everything it did before, with only a couple of pertinent changes like, obviously, import time.
This is an important part of how the PTR works--when you sync with the PTR, your client downloads a couple billion mappings for files you do not have yet. Then, when you happen to import one of those files, it appears in your importer with its PTR tags 'apparently' already set--in truth, it always had them.
When you feel like playing with some more advanced concepts, turn on help->advanced mode and open a new search page. Change the file domain from 'my files' to 'all known files' or 'deleted from my files' and start typing a common tag--you'll get autocomplete results with counts! You can even run the search, and you'll get a ton of 'non-local' and therefore non-viewable files that are typically given a default hydrus thumbnail. These are files that your client is aware of, but does not currently have. You can run the manage x dialogs and edit the metadata of these ghost files just as you can your real ones. The only thing hydrus ever needs to attach metadata to a file is the file's SHA256 hash.
If you really want to delete the tags or other data for some files you deleted, then:
- If the job is small, do a search for the files inside 'deleted from my local files' (or 'all known files' if you did not leave a deletion record), then hit Ctrl+A->manage tags and manually delete the tags there.
- If the job is very large, then make a backup and hit up tags->migrate tags. You can select the tag service x tag mappings for all files in 'deleted from my local files' and then make the action to delete from x again.
- If the job is complicated, then note that you can open the tags->migrate tags dialog from manage tags, and it will only apply to the files that booted manage tags.
Not really. Unless your situation involves millions of richly locally tagged files and a gigantic deleted:kept file ratio, don't worry about it.
"},{"location":"faq.html#does_the_metadata_for_files_i_deleted_mean_there_is_some_kind_of_a_permanent_record_of_which_files_my_client_has_heard_about_andor_seen_directly_even_if_i_purge_the_deletion_record","title":"Does the metadata for files I deleted mean there is some kind of a permanent record of which files my client has heard about and/or seen directly, even if I purge the deletion record?","text":"Yes. I am working on updating the database infrastructure to allow a full purge, but the structure is complicated, so it will take some time. If you are afraid of someone stealing your hard drive and matriculating your sordid MLP collection (or, in this case, the historical log of horrors that you rejected), do some research into drive encryption. Hydrus runs fine off an encrypted disk.
"},{"location":"faq.html#i_just_imported_files_from_my_hard_drive_collection_how_can_i_get_their_tags_from_the_boorus","title":"I just imported files from my hard drive collection. How can I get their tags from the boorus?","text":"The problem of 'what tags should these files have?' is technically difficult to solve, and there isn't a fast and easy way to query a booru and say 'hey, what are your tags for this?', particularly en masse. It is even more difficult to keep up with updates (e.g. someone adding a tag to a file some months or years after it was uploaded). This is the main problem I designed the PTR to solve.
If you cannot or do not want to devote the local resources to sync with the PTR, there are a few hacky ways to perform tag lookups, mostly with manual hash-based lookups. The big boorus support file search based on 'md5' hash, so there are ways to build a workflow where you can 'search' a booru or iqdb for one file at a time to see if there is a hit, and then get tags as if you were downloading it. An old system in the client called 'file lookup scripts' works like this, in the manage tags dialog, and some users have figured out ways to make it work with some clever downloaders.
Be careful with these systems. They tend to be slow and use a lot of resources serverside, so you will be rude if you hit them too hard. They work for a handful of files every now and then, but please do not set up jobs of many many thousands of files, and absolutely do not repeat the job for the same files regularly--you will just waste a lot of CPU and network time for everyone, and only gain a couple of tags in the process. Note that the hash-based lookups only work if your files have not changed since being downloaded; if you have scaled them, stripped metadata, or optimised quality, then they will count as new files and the hashes will have changed, and you will need to think about services like iqdb or saucenao, or ultimately the hydrus duplicate resolution system.
That said, here is a user guide on how to perform various kinds of file lookups.
If you are feeling adventurous, you can also explore the newer AI-tagging tools that users are working on.
Ultimately, though, a good and simple way to backfill your files' tags is just to rely on normal downloading workflows. Try downloading your favourite artists (and later set up subscriptions) and you will naturally get files you like, with tags, and if, by (expected) serendipity, a file on the site is the same as one you already imported, hydrus will add the tags to it retroactively.
"},{"location":"faq.html#encryption","title":"Does Hydrus run ok off an encrypted drive partition?","text":"Yes! Both the database and your files should be fine on any of the popular software solutions. These programs give your OS a virtual drive that on my end looks and operates like any other. I have yet to encounter one that SQLite has a problem with. Make sure you don't have auto-dismount set--or at least be hawkish that it will never trigger while hydrus is running--or you could damage your database.
Drive encryption is a good idea for all your private things. If someone steals your laptop or USB stick, it means you only have to deal with frustration and replacement expenses (rather than also a nightmare of anxiety and identity-loss as some bad guy combs through all your things).
If you don't know how drive encryption works, search it up and have a play with a spare USB stick or a small 256MB file partition. Veracrypt is a popular and easy program, but there are several solutions. Get some practice and take it seriously, since if you act foolishly you can really screw yourself (e.g. locking yourself out of the only copy of data you have left because you forgot the password). Make sure you have a good plan, reliable (encrypted) backups, and a password manager.
"},{"location":"faq.html#delays","title":"Why can my friend not see what I just uploaded?","text":"The repositories do not work like conventional search engines; it takes a short but predictable while for changes to propagate to other users.
The client's searches only ever happen over its local cache of what is on the repository. Any changes you make will be delayed for others until their next update occurs. At the moment, the update period is 100,000 seconds, which is about 1 day and 4 hours.
"},{"location":"filetypes.html","title":"Supported Filetypes","text":"This is a list of all filetypes Hydrus can import. Hydrus determines the filetype based on examining the file itself rather than the extension or MIME type.
The filetype for a file can be overridden with
"},{"location":"filetypes.html#images","title":"Images","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes jpegmanage -> force filetype
in the context menu for a file..jpeg
image/jpeg
\u2705 \u2705 png.png
image/png
\u2705 \u2705 static gif.gif
image/gif
\u2705 \u2705 webp.webp
image/webp
\u2705 \u2705 avif.avif
image/avif
\u2705 \u2705 bitmap.bmp
image/bmp
\u2705 \u2705 heic.heic
image/heic
\u2705 \u2705 heif.heif
image/heif
\u2705 \u2705 icon.ico
image/x-icon
\u2705 \u2705 qoi.qoi
image/qoi
\u2705 \u2705 Quite OK Image Format tiff.tiff
image/tiff
\u2705 \u2705"},{"location":"filetypes.html#animations","title":"Animations","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes animated gif.gif
image/gif
\u2705 \u2705 apng.apng
image/apng
\u2705 \u2705 animated webp.webp
image/webp
\u2705 \u2705 avif sequence.avifs
image/avif-sequence
\u2705 \u2705 heic sequence.heics
image/heic-sequence
\u2705 \u2705 heif sequence.heifs
image/heif-sequence
\u2705 \u2705 ugoira.zip
application/zip
\u2705 \u26a0\ufe0f More info"},{"location":"filetypes.html#ugoira","title":"Ugoira","text":"Pixiv Ugoira format is a custom animation format used by Pixiv. The Pixiv API provides a list of frame files (normally JPEG or PNG) and their durations. The frames can be stored in a ZIP file along with a JSON file containing the frame and duration information. A zip file containing images with 6 digit zero-padded filenames will be identified as a Ugoira file in hydrus.
If there are no frame durations provided hydrus will assume each frame should last 125ms. Hydrus will look inside the zip for a file called
animation.json
and try to parse it as the 2 most common metadata formats that PixivUtil and gallery-dl generate. The Ugoira file will only have a duration in the database if it contains a validanimation.json
.When played hydrus will first attempt to use the
"},{"location":"filetypes.html#video","title":"Video","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes mp4animation.json
file, but if that does not exist, it will look for notes containing frame delays. First it looks for a note namedugoira json
and attempts to read it like theanimation.json
, it then looks for a note calledugoira frame delay array
which should be a note containing a simple JSON array, for example:[90, 90, 40, 90]
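To make that fallback order concrete, here is a minimal sketch of pulling frame durations out of a ugoira zip. The `{"frames": [{"file": ..., "delay": ...}]}` layout shown is an assumption based on the common gallery-dl/Pixiv-style metadata; check the actual file you have.

```python
import json
import zipfile

DEFAULT_FRAME_MS = 125  # the fallback duration mentioned above

def ugoira_frame_delays(zip_path):
    with zipfile.ZipFile(zip_path) as z:
        frame_names = sorted(n for n in z.namelist() if n.lower().endswith(('.jpg', '.jpeg', '.png')))
        try:
            meta = json.loads(z.read('animation.json'))
        except KeyError:
            # no animation.json in the zip--assume the default duration for every frame
            return [DEFAULT_FRAME_MS] * len(frame_names)
    # assumed layout: {"frames": [{"file": "000000.jpg", "delay": 90}, ...]}
    frames = meta.get('frames', [])
    if frames:
        return [f.get('delay', DEFAULT_FRAME_MS) for f in frames]
    return [DEFAULT_FRAME_MS] * len(frame_names)

# e.g. ugoira_frame_delays('some_ugoira.zip') -> [90, 90, 40, 90, ...]
```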
"},{"location":"filetypes.html#video","title":"Video","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
| --- | --- | --- | --- | --- | --- |
| mp4 | .mp4 | video/mp4 | ✅ | ✅ | |
| webm | .webm | video/webm | ✅ | ✅ | |
| matroska | .mkv | video/x-matroska | ✅ | ✅ | |
| avi | .avi | video/x-msvideo | ✅ | ✅ | |
| flv | .flv | video/x-flv | ✅ | ✅ | |
| quicktime | .mov | video/quicktime | ✅ | ✅ | |
| mpeg | .mpeg | video/mpeg | ✅ | ✅ | |
| ogv | .ogv | video/ogg | ✅ | ✅ | |
| realvideo | .rm | video/vnd.rn-realvideo | ✅ | ✅ | |
| wmv | .wmv | video/x-ms-wmv | ✅ | ✅ | |
"},{"location":"filetypes.html#audio","title":"Audio","text":"
| Filetype | Extension | MIME type | Viewable in Hydrus | Notes |
| --- | --- | --- | --- | --- |
| mp3 | .mp3 | audio/mp3 | ✅ | |
| ogg | .ogg | audio/ogg | ✅ | |
| flac | .flac | audio/flac | ✅ | |
| m4a | .m4a | audio/mp4 | ✅ | |
| matroska audio | .mkv | audio/x-matroska | ✅ | |
| mp4 audio | .mp4 | audio/mp4 | ✅ | |
| realaudio | .ra | audio/vnd.rn-realaudio | ✅ | |
| tta | .tta | audio/x-tta | ✅ | |
| wave | .wav | audio/x-wav | ✅ | |
| wavpack | .wv | audio/wavpack | ✅ | |
| wma | .wma | audio/x-ms-wma | ✅ | |
"},{"location":"filetypes.html#applications","title":"Applications","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
| --- | --- | --- | --- | --- | --- |
| flash | .swf | application/x-shockwave-flash | ✅ | ❌ | |
| pdf | .pdf | application/pdf | ✅ | ❌ | 300 DPI assumed for resolution. No thumbnails for encrypted PDFs. |
| epub | .epub | application/epub+zip | ❌ | ❌ | |
| djvu | .djvu | image/vnd.djvu | ❌ | ❌ | |
| docx | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | ❌ | ❌ | |
| xlsx | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | ❌ | ❌ | |
| pptx | .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation | ✅ | ❌ | 300 DPI assumed for resolution. Thumbnail only if embedded in the document. |
| doc | .doc | application/msword | ❌ | ❌ | |
| xls | .xls | application/vnd.ms-excel | ❌ | ❌ | |
| ppt | .ppt | application/vnd.ms-powerpoint | ❌ | ❌ | |
| rtf | .rtf | application/rtf | ❌ | ❌ | |
"},{"location":"filetypes.html#image_project_files","title":"Image Project Files","text":"
| Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes |
| --- | --- | --- | --- | --- | --- |
| clip | .clip | application/clip [1] | ✅ | ❌ | Clip Studio Paint |
| krita | .kra | application/x-krita | ✅ | ✅ | Krita. Hydrus shows the embedded preview image if present in the file. |
| procreate | .procreate | application/x-procreate [1] | ✅ | ❌ | Procreate app |
| psd | .psd | image/vnd.adobe.photoshop | ✅ | ✅ | Adobe Photoshop. Hydrus shows the embedded preview image if present in the file. |
| sai2 | .sai2 | application/sai2 [1] | ❌ | ❌ | PaintTool SAI2 |
| svg | .svg | image/svg+xml | ✅ | ❌ | |
| xcf | .xcf | application/x-xcf | ❌ | ❌ | GIMP |
"},{"location":"filetypes.html#archives","title":"Archives","text":"
| Filetype | Extension | MIME type | Thumbnails | Notes |
| --- | --- | --- | --- | --- |
| cbz | .cbz | application/vnd.comicbook+zip | ✅ | A zip file containing images with incrementing numbers in their filenames will be identified as a cbz file. The code for identifying a cbz file is in hydrus/core/files/HydrusArchiveHandling.py |
| 7z | .7z | application/x-7z-compressed | ❌ | |
| gzip | .gz | application/gzip | ❌ | |
| rar | .rar | application/vnd.rar | ❌ | |
| zip | .zip | application/zip | ❌ | |
[1] This filetype doesn't have an official or de facto media type; the one listed was made up for Hydrus.
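As a side note on the cbz rule in the archives table above, the 'incrementing numbers' heuristic is easy to picture. This is not the actual logic from HydrusArchiveHandling.py, just an illustrative sketch of the idea:

```python
import re
import zipfile

IMAGE_EXTS = ('.jpg', '.jpeg', '.png', '.gif', '.webp')

def looks_like_cbz(zip_path):
    # idea only: a zip whose image filenames carry an incrementing number
    # (page 1, page 2, ...) is probably a comic archive rather than a plain zip
    with zipfile.ZipFile(zip_path) as z:
        names = [n for n in z.namelist() if n.lower().endswith(IMAGE_EXTS)]
    numbers = []
    for name in names:
        m = re.search(r'(\d+)\D*$', name)  # last run of digits in the filename
        if not m:
            return False
        numbers.append(int(m.group(1)))
    return len(numbers) > 1 and sorted(numbers) == list(range(min(numbers), min(numbers) + len(numbers)))
```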
This page serves as a checklist or overview for the getting started part of Hydrus. It is recommended to read at least all of the getting started pages, but if you want to head to some specific section directly go ahead and do so.
"},{"location":"gettingStartedOverview.html#the_client","title":"The client","text":"Have a look at getting started with files to get an overview of the Hydrus client.
"},{"location":"gettingStartedOverview.html#local_files","title":"Local files","text":"If you already have many local files, either downloaded by hand or by some other downloader tool, head to the getting started importing section to begin importing them.
"},{"location":"gettingStartedOverview.html#downloading","title":"Downloading","text":"If you want to download with Hydrus, check out getting started with downloading. If you want to add the ability to download from sites not already available in Hydrus by default, check out adding new downloaders for how and a link to a user-maintained archive of downloaders.
"},{"location":"gettingStartedOverview.html#tags_and_ratings","title":"Tags and ratings","text":"If you have imported and/or downloaded some files and want to get started searching and tagging see searching and sorting and getting started with ratings.
It is also worth having a look at siblings for when you want to consolidate different tags that all mean the same thing, common misspellings, or preferential differences into one tag.
Parents are for when you want a tag to always add another tag. Commonly used for characters since you would usually want to add the series they're from too.
"},{"location":"gettingStartedOverview.html#duplicates","title":"Duplicates","text":"Have a lot of very similar looking pictures because of one reason or another? Have a look at duplicates, Hydrus' duplicates finder and filtering tool.
"},{"location":"gettingStartedOverview.html#api","title":"API","text":"Hydrus has an API that lets external tools connect to it. See API for how to turn it on and a list of some of these tools.
"},{"location":"getting_started_downloading.html","title":"Getting started with downloading","text":"The hydrus client has a sophisticated and completely user-customisable download system. It can pull from any booru or regular gallery site or imageboard, and also from some special examples like twitter and tumblr. A single file or URL to massive imports, the downloader can handle it all. A fresh install will by default have support for the bigger sites, but it is possible, with some work, for any user to create a new shareable downloader for a new site.
The downloader is highly parallelisable, and while the default bandwidth rules should stop you from running too hot and downloading so much at once that you annoy the servers you are downloading from, there are no brakes in the program on what you can get.
Danger
It is very important that you take this slow. Many users get overexcited with their new ability to download 500,000 files and then do so, only discovering later that 98% of what they got was junk that they now have to wade through. Figure out what workflows work for you, how fast you process files, what content you actually want, how much bandwidth and hard drive space you have, and prioritise and throttle your incoming downloads to match. If you can realistically only archive/delete filter 50 files a day, there is little benefit to downloading 500 new files a day. START SLOW.
It also takes a decent whack of CPU to import a file. You'll usually never notice this with just one hard drive import going, but if you have twenty different download queues all competing for database access and individual 0.1-second hits of heavy CPU work, you will discover your client starts to judder and lag. Keep it in mind, and you'll figure out what your computer is happy with. I also recommend you try to keep your total loaded files/urls to be under 20,000 to keep things snappy. Remember that you can pause your import queues, if you need to calm things down a bit.
"},{"location":"getting_started_downloading.html#downloader_types","title":"Downloader types","text":"There are a number of different downloader types, each with its own purpose:
| Downloader | Purpose |
| --- | --- |
| URL download | Intended for single posts or images. (Works with the API) |
| Gallery | For big download jobs such as an artist's catalogue, or everything with a given tag on a booru. |
| Subscriptions | Repeated gallery jobs, for keeping up to date with an artist or tag. Use the gallery downloader to get everything and a subscription to keep updated. |
| Watcher | Imageboard thread downloader, for 4chan, 8chan, and whatever else exists. (Works with the API) |
| Simple downloader | Intended for simple one-off jobs like grabbing all linked images in a page. |
"},{"location":"getting_started_downloading.html#url_download","title":"URL download","text":"The url downloader works like the gallery downloader but does not do searches. You can paste downloadable URLs to it, and it will work through them as one list. Dragging and dropping recognisable URLs onto the client (e.g. from your web browser) will also spawn and use this downloader.
The button next to the input field lets you paste multiple URLs at once such as if you've copied from a document or browser bookmarks. The URLs need to be newline separated.
"},{"location":"getting_started_downloading.html#api","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any non-watchable URLs sent to Hydrus through them will end up in an URL downloader page, the specifics depending on the program's settings. You can't use this to force Hydrus to download paged galleries since the URL downloader page doesn't support traversing to the next page, use the gallery downloader for this.
"},{"location":"getting_started_downloading.html#gallery_download","title":"Gallery download","text":"The gallery page can download from multiple sources at the same time. Each entry in the list represents a basic combination of two things:
- Source: The site you are getting from. Safebooru or Danbooru or Deviant Art or twitter or anywhere else. In the example image this is the button labelled `artstation artist lookup`.
- Query text: Something like 'contrapposto' or 'blonde_hair blue_eyes' or an artist name like 'incase'. Whatever is searched on the site to return a list of ordered media. In the example image this is the text field with `artist username` in it.
So, when you want to start a new download, you first select the source with the button and then type in a query in the text box and hit enter. The download will soon start and fill in information, and thumbnails should stream in, just like the hard drive importer. The downloader typically works by walking through the search's gallery pages one by one, queueing up the found files for later download. There are several intentional delays built into the system, so do not worry if work seems to halt for a little while--you will get a feel for hydrus's 'slow persistent growth' style with experience.
Do a test download now, for fun! Pause its gallery search after a page or two, and then pause the file import queue after a dozen or so files come in.
The thumbnail panel can only show results from one queue at a time, so double-click on an entry to 'highlight' it, which will show its thumbs and also give more detailed info and controls in the 'highlighted query' panel. I encourage you to explore the highlight panel over time, as it can show and do quite a lot. Double-click again to 'clear' it.
It is a good idea to 'test' larger downloads, either by visiting the site itself for that query, or just waiting a bit and reviewing the first files that come in. Just make sure that you are getting what you thought you would, whether that be verifying that the query text is correct or that the site isn't only giving you bloated gifs or other bad quality files. The 'file limit', which stops the gallery search after the set number of files, is also great for limiting fishing expeditions (such as overbroad searches like 'wide_hips', which on the bigger boorus have 100k+ results and return variable quality). If the gallery search runs out of new files before the file limit is hit, the search will naturally stop (and the entry in the list should gain a \u23f9 'stop' symbol).
Note that some sites only serve 25 or 50 pages of results, despite their indices suggesting hundreds. If you notice that one site always bombs out at, say, 500 results, it may be due to a decision on their end. You can usually test this by visiting the pages hydrus tried in your web browser.
In general, particularly when starting out, artist searches are best. They are usually fewer than a thousand files and have fairly uniform quality throughout.
"},{"location":"getting_started_downloading.html#subscriptions","title":"Subscriptions","text":"Let's say you found an artist you like. You downloaded everything of theirs from some site, but every week, one or two new pieces is posted. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.
Subscriptions are a way to automatically recheck a good query in future, to keep up with new files. Many users come to use them. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.
Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.
Note
The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it will not work reliably.
It is important to note that while subscriptions can have multiple queries (even hundreds!), they generally only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations.
"},{"location":"getting_started_downloading.html#setting_up_subscriptions","title":"Setting up subscriptions","text":"Here's the dialog, which is under network->manage subscriptions:
This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).
Before we trip over the advanced buttons here, let's zoom in on the actual subscription:
Danger
Do not change the max number of new files options until you know exactly what they do and have a good reason to alter them!
This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it.
Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--\"hong_soon-jae\" and \"houtengeki\". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they really are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader.
Warning
Subscription syncs are somewhat fragile. Do not try to play with the limits or checker options to download a whole 5,000 file query in one go--if you want everything for a query, run it in the manual downloader and get everything, then set up a normal sub for new stuff. There is no benefit to having a 'large' subscription, and it will trim itself down in time anyway.
You might want to put subscriptions off until you are more comfortable with galleries. There is more help here.
"},{"location":"getting_started_downloading.html#watchers","title":"Watchers","text":"If you are an imageboard user, try going to a thread you like and drag-and-drop its URL (straight from your web browser's address bar) onto the hydrus client. It should open up a new 'watcher' page and import the thread's files!
With only one URL to check, watchers are a little simpler than gallery searches, but as that page is likely receiving frequent updates, it checks it over and over until it dies. By default, the watcher's 'checker options' will regulate how quickly it checks based on the speed at which new files are coming in--if a thread is fast, it will check frequently; if it is running slow, it may only check once per day. When a thread falls below a critical posting velocity or 404s, checking stops.
In general, you can leave the checker options alone, but you might like to revisit them if you are always visiting faster or slower boards and find you are missing files or getting DEAD too early.
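For intuition only, the velocity idea looks something like this--this is not hydrus's actual formula, just the general shape of 'check more often when files arrive faster':

```python
def next_check_period(files_in_last_day, target_files_per_check=5,
                      min_period=5 * 60, max_period=24 * 60 * 60):
    # A thread that produced 48 files in the last day, with a target of ~5 new
    # files per check, gets checked roughly every 2.5 hours; a dead-slow thread
    # drifts out to the daily maximum instead.
    if files_in_last_day <= 0:
        return max_period
    period = 86_400 * target_files_per_check / files_in_last_day
    return int(min(max(period, min_period), max_period))

print(next_check_period(48))  # ~9000 seconds, i.e. about every two and a half hours
print(next_check_period(1))   # clamps to the 24-hour maximum
```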
"},{"location":"getting_started_downloading.html#api_1","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any watchable URLs sent to Hydrus through them will end up in a watcher page, the specifics depending on the program's settings.
"},{"location":"getting_started_downloading.html#simple_downloader","title":"Simple downloader","text":"The simple downloader will do very simple parsing for unusual jobs. If you want to download all the images in a page, or all the image link destinations, this is the one to use. There are several default parsing rules to choose from, and if you learn the downloader system yourself, it will be easy to make more.
"},{"location":"getting_started_downloading.html#import_options","title":"Import options","text":"Every importer in Hydrus has some 'import options' that change what is allowed, what is blacklisted, and whether tags or notes should be saved.
In previous versions these were split into completely different windows called `file import options` and `tag import options`, so if you see those anywhere, this is what they're talking about and not some hidden menu anywhere.
"},{"location":"getting_started_downloading.html#file_import_options","title":"File import options","text":"This deals with the files being downloaded and what should happen to them. There's a few more tickboxes if you turn on advanced mode.
- pre-import checks: Pretty self-explanatory for the most part. If you want to redownload previously deleted files, turning off `exclude previously deleted files` will have Hydrus ignore deletion status. A few of the options have more information if you hover over them.
- import destinations: See multiple file services, an advanced feature.
- post import actions: See the files section on filtering for the first option; the other two have information if you hover over them.
"},{"location":"getting_started_downloading.html#tag_parsing","title":"Tag Parsing","text":"By default, hydrus now starts with a local tag service called 'downloader tags' and it will parse (get) all the tags from normal gallery sites and put them in this service. You don't have to do anything, you will get some decent tags. As you use the client, you will figure out which tags you like and where you want them. On the downloader page, click `import options`:
This is an important dialog, although you will not need to use it much. It governs which tags are parsed and where they go. To keep things easy to manage, a new downloader will refer to the 'default' tag import options for a website, but for now let's set some values just for this downloader:
You can see that each tag service on your client has a separate section. If you add the PTR, that will get a new box too. A new client is set to get all tags for 'downloader tags' service. Things can get much more complicated. Have a play around with the options here as you figure things out. Most of the controls have tooltips or longer explainers in sub-dialogs, so don't be afraid to try things.
It is easy to get tens of thousands of tags by downloading this way. Different sites offer different kinds and qualities of tags, and the client's downloaders (which were designed by me, the dev, or a user) may parse all or only some of them. Many users like to just get everything on offer, but others only ever want, say, `creator`, `series`, and `character` tags. If you feel brave, click that 'all tags' button, which will take you into hydrus's advanced 'tag filter', which allows you to select which of the incoming list of tags will be added.
The blacklist button will let you skip downloading files that have certain tags (perhaps you would like to auto-skip all images with `gore`, `scat`, or `diaper`?), again using the tag filter, while the whitelist enables you to only allow files that have at least one of a set of tags. The 'additional tags' adds some fixed personal tags to all files coming in--for instance, you might like to add 'process into favourites' to your 'my tags' for some query you really like so you can find those files again later and process them separately. That little 'cog' icon button can also do some advanced things.
Warning
The file limit and import options on the upper panel of a gallery or watcher page, if changed, will only apply to new queries. If you want to change the options for an existing queue, either do so on its highlight panel below or use the 'set options to queries' button.
"},{"location":"getting_started_downloading.html#force_page_fetch","title":"Force Page Fetch","text":"By default, hydrus will not revisit web pages or API endpoints for URLs it knows A) refer to one known file only, and B) that file is already in your database or has previously been deleted. The way it navigates this can be a complicated mix of hash and URL data, and in certain logical situations hydrus will determine its own records are untrustworthy and decide to check the source again. This saves bandwidth and time as you run successive queries that include the same results. You should not disable the capability for normal operation.
But if you mess up your tag import options somewhere and need to re-run a download with forced tag re-fetching, how to do it?
At the moment, this is in tag import options, via the `force page fetch even if...` checkboxes. You can either set up a one-time downloader page with specific tag import options that check both of these checkboxes and then paste URLs in, or you can right-click a selection of thumbnails and have hydrus create the page for you under the urls->force metadata refetch menu. Once you are done with the downloader page, delete it and do not use it for normal jobs--again, this method of downloading is inefficient and should not be used for repeating, long-term, or speculative jobs. Only use it to fill in specific holes.
"},{"location":"getting_started_downloading.html#note_parsing","title":"Note Parsing","text":"Hydrus also parses 'notes' from some sites. This is a young feature, and a little advanced at times, but it generally means the comments that artists leave on certain gallery sites, or something like a tweet's text. Notes are editable by you and appear in a hovering window on the right side of the media viewer.
Most of the controls here ensure that successive parses do not duplicate existing notes. The default settings are fine for all normal purposes, and you can leave them alone unless you know you want something special (e.g. turning note parsing off completely).
"},{"location":"getting_started_downloading.html#bandwidth","title":"Bandwidth","text":"It will not be too long until you see a \"bandwidth free in xxxxx...\" message. As a long-term storage solution, hydrus is designed to be polite in its downloading--both to the source server and your computer. The client's default bandwidth rules have some caps to stop big mistakes, spread out larger jobs, and at a bare minimum, no domain will be hit more than once a second.
All the bandwidth rules are completely customisable and are found in `network > data > review bandwidth usage and edit rules`. They can get quite complicated. I strongly recommend you not look for them until you have more experience. I especially strongly recommend you not ever turn them all off, thinking that will improve something, as you'll probably render the client too laggy to function and get yourself an IP ban from the next server you pull from.
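To illustrate just the 'no domain more than once a second' part of that, here is a toy sketch only--the real rules are far richer, tracking data amounts and request counts over multiple time windows:

```python
import time

class DomainThrottle:
    """Toy per-domain limiter: at most one request per domain per second."""
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self.last_request = {}

    def wait(self, domain):
        # sleep until this domain's cooldown has passed, then record the hit
        now = time.monotonic()
        earliest = self.last_request.get(domain, 0.0) + self.min_interval
        if now < earliest:
            time.sleep(earliest - now)
        self.last_request[domain] = time.monotonic()

throttle = DomainThrottle()
for url in ['https://example.com/a', 'https://example.com/b']:
    throttle.wait('example.com')
    # ... fetch url here ...
```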
Again: the real problem with downloading is not finding new things, it is keeping up with what you get. Start slow and figure out what is important to your bandwidth budget, hard drive budget, and free time budget. Almost everyone fails at this.
"},{"location":"getting_started_downloading.html#logins","title":"Logins","text":"The client now supports a flexible (but slightly prototype and ugly) login system. It can handle simple sites and is as completely user-customisable as the downloader system. The client starts with multiple login scripts by default, which you can review under network->logins->manage logins:
Many sites grant all their content without you having to log in at all, but others require it for NSFW or special content, or you may wish to take advantage of site-side user preferences like personal blacklists. If you wish, you can give hydrus some login details here, and it will try to login--just as a browser would--before it downloads anything from that domain.
Warning
For multiple reasons, I do not recommend you use important accounts with hydrus. Use a throwaway account you don't care much about.
To start using a login script, select the domain and click 'edit credentials'. You'll put in your username/password, and then 'activate' the login for the domain, and that should be it! The next time you try to get something from that site, the first request will wait (usually about ten seconds) while a login popup performs the login. Most logins last for about thirty days (and many refresh that 30-day timer every time you make a new request), so once you are set up, you usually never notice it again, especially if you have a subscription on the domain.
Most sites only have one way of logging in, but hydrus does support more. Hentai Foundry is a good example--by default, the client performs the 'click-through' login as a guest, which requires no credentials and means any hydrus client can get any content from the start. But this way of logging in only lasts about 60 minutes or so before having to be refreshed, and it does not hide any spicy stuff, so if you use HF a lot, I recommend you create a throwaway account, set the filters you like in your HF profile (e.g. no guro content), and then click the 'change login script' in the client to the proper username/pass login.
The login system is not very clever. Don't try to pull off anything too weird with it! If anything goes wrong, it will likely delay the script (and hence the whole domain) from working for a while, or invalidate it entirely. If the error is something simple, like a password typo or current server maintenance, go back to this dialog to fix and scrub the error and try again. If the site just changed its layout, you may need to update the login script. If it is more complicated, please contact me, hydrus_dev, with the details!
If you would like to login to a site that is not yet supported by hydrus (usually ones with a Captcha in the login page), you have two options:
- Get a web browser add-on that lets you export a cookies.txt (either for the whole browser or just for that domain) and then drag and drop that cookies.txt file onto the hydrus network->data->review session cookies dialog. This sometimes does not work if your add-on's export formatting is unusual (there is an example of the usual layout just after this list). If it does work, hydrus will import and use those cookies, which skips the login by making your hydrus pretend to be your browser directly. This is obviously advanced and hacky, so if you need to do it, let me know how you get on and what tools you find work best!
- Use Hydrus Companion browser add-on to do the same basic thing automatically.
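For reference on the first option above, most cookie-export add-ons write the classic 'Netscape' cookies.txt layout, which is tab-separated with the fields domain, include-subdomains flag, path, secure flag, expiry (unix timestamp), name, and value. The values below are made up:

```
# Netscape HTTP Cookie File
.example.com	TRUE	/	TRUE	1767225600	session_id	0123456789abcdef
.example.com	TRUE	/	FALSE	1767225600	remember_me	1
```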
Boorus are usually easy to parse from, and there are many hydrus downloaders available that work well. Other sites are less easy to download from. Some will purposefully disguise access behind captchas or difficult login tokens that the hydrus downloader just isn't clever enough to handle. In these cases, it can be best just to go to an external downloader program that is specially tuned for these complex sites.
It takes a bit of time to set up these sorts of programs--and if you get into them, you'll likely want to make a script to help automate their use--but if you know they solve your problem, it is well worth it!
- yt-dlp - This is an excellent video downloader that can download from hundreds of different websites. Learn how it works, it is useful for all sorts of things!
- gallery-dl - This is an excellent image and small-vid downloader that works for pretty much any booru and many larger/professional gallery sites, particularly when those sites need logins. Check the documentation, since you may be able to get it to rip cookies right out of your firefox, or you can give it your actual user/password for many sites and it'll handle all the login for you.
- imgbrd-grabber - Another excellent, mostly booru downloader, with a UI. You can export some metadata to filenames, which you might like to then suck up with hydrus filename-import-parsing.
With these tools, used manually and/or with some scripts you set up, you may be able to set up a regular import workflow to hydrus (especially with an
Import Folder
as under thefile
menu) and get most of what you would with an internal downloader. Some things like known URLs and tag parsing may be limited or non-existent, but it is better than nothing, and if you only need to do it for a couple sources on a couple sites every month, you can fill in most of the gap manually yourself. There is a small example script of this sort of glue just below. Hydev is planning to roll yt-dlp and gallery-dl support into the program natively in a future update of the downloader engine.
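Here is a minimal sketch of that kind of glue script. The staging folder path is made up, and the `-d` flag is an assumption about gallery-dl's CLI--check `gallery-dl --help` on your version and point the output wherever your hydrus import folder actually watches:

```python
import subprocess
from pathlib import Path

# assumed: hydrus has an import folder watching this directory
IMPORT_DIR = Path.home() / 'hydrus_import_staging'
IMPORT_DIR.mkdir(parents=True, exist_ok=True)

urls = [
    'https://example.com/some/gallery',  # placeholder gallery/query URLs
]

for url in urls:
    # '-d' is assumed to set gallery-dl's base download directory
    subprocess.run(['gallery-dl', '-d', str(IMPORT_DIR), url], check=False)
```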
"},{"location":"getting_started_files.html","title":"Getting started with files","text":"Warning
Hydrus can be powerful, and you control everything. By default, you are not connected to any servers and absolutely nothing is shared with other users--and you can't accidentally one-click your way to exposing your whole collection--but if you tag private files with real names and click to upload that data to a tag repository that other people have access to, the program won't try to stop you. If you want to do private sexy slideshows of your shy wife, that's great, but think twice before you upload files or tags anywhere, particularly as you learn. It is impossible to contain leaks of private information.
There are no limits and few brakes on your behaviour. It is possible to import millions of files. For many new users, their first mistake is downloading too much too fast in overexcitement and becoming overwhelmed. Take things slow and figure out good processing workflows that work for your schedule before you start adding 500 subscriptions.
"},{"location":"getting_started_files.html#the_problem","title":"The problem","text":"If you have ever seen something like this--
--then you already know the problem: using a filesystem to manage a lot of images sucks.
Finding the right picture quickly can be difficult. Finding everything by a particular artist at a particular resolution is unthinkable. Integrating new files into the whole nested-folder mess is a further pain, and most operating systems bug out when displaying 10,000+ thumbnails.
"},{"location":"getting_started_files.html#the_client","title":"The client","text":"Let's first focus on importing files.
When you first boot the client, you will see a blank page. There are no files in the database and so there is nothing to search. To get started, I suggest you simply drag-and-drop a folder with a hundred or so images onto the main window. A dialog will appear affirming what you want to import. Ok that, and a new page will open. Thumbnails will stream in as the software processes each file.
The files are being imported into the client's database. The client discards their filenames.
Notice your original folder and its files are untouched. You can move the originals somewhere else, delete them, and the client will still return searches fine. In the same way, you can delete from the client, and the original files will remain unchanged--import is a copy, not a move, operation. The client performs all its operations on its internal database, which holds copies of the files it imports. If you find yourself enjoying using the client and decide to completely switch over, you can delete the original files you import without worry. You can always export them back again later.
FAQ: can the client manage files from their original locations?
Now:
- Click on a thumbnail; it'll show in the preview screen, bottom left.
- Double- or middle-click the thumbnail to open the media viewer. You can hit F to switch between giving the fullscreen a frame or not. You can use your scrollwheel or page up/down to browse the media and ctrl+scrollwheel to zoom in and out.
-
Move your mouse to the top-left, top-middle and top-right of the media viewer. You should see some 'hover' panels pop into place.
The one on the left is for tags, the middle is for browsing and zoom commands, and the right is for status and ratings icons. You will learn more about these things as you get more experience with the program.
-
Press Enter or double/middle-click again to close the media viewer.
- You can quickly select multiple files by shift- or ctrl- clicking. Notice how the status bar at the bottom of the screen updates with the number selected and their total size. Right-clicking your selection will present another summary and many actions.
- You can quickly copy-export files out of the client by drag-and-dropping to a folder or external program, including web browser upload boxes. Discord support may need a special checkbox set under options->exporting.
- Hit F9 to bring up a new page chooser. You can navigate it with the arrow keys, your numpad, or your mouse.
-
On the left of a normal search page is a text box. When it is focused, a dropdown window appears. It looks like this:
This is where you enter the predicates that define the current search. If the text box is empty, the dropdown will show 'system' tags that let you search by file metadata such as file size or animation duration. To select one, press the up or down arrow keys and then enter, or double click with the mouse.
When you have some tags in your database, typing in the text box will search them:
The (number) shows how many files have that tag, and hence how large the search result will be if you select that tag.
Clicking 'searching immediately' will pause the searcher, letting you add several tags in a row without sending it off to get results immediately. Ignore the other buttons for now--you will figure them out as you gain experience with the program.
-
You can remove from the list of 'active tags' in the box above with a double-click, or by entering the exact same tag again through the dropdown.
- Play with the system tags more if you like, and the sort-by dropdown. The collect-by dropdown is advanced, so wait until you understand namespaces before expecting it to do anything.
- To close a page, middle-click its tab.
Hydrus supports many filetypes. A full list can be viewed on the Supported Filetypes page.
Support for some of the more complicated filetypes is imperfect, though. For the Windows and Linux built releases, hydrus now embeds an MPV player for video, audio and gifs, which provides smooth playback and audio, but some other environments may not support MPV and so will default when possible to the native hydrus software renderer, which does not support audio. When something does not render how you want, right-clicking on its thumbnail presents the option 'open externally', which will open the file in the appropriate default program (e.g. ACDSee, VLC).
The client can also download files from several websites, including 4chan and other imageboards, many boorus, and gallery sites like deviant art and hentai foundry. You will learn more about this later.
"},{"location":"getting_started_files.html#inbox_and_archive","title":"Inbox and archive","text":"The client sends newly imported files to an inbox, just like your email. Inbox acts like a tag, matched by 'system:inbox'. A small envelope icon is drawn in the top corner of all inbox files:
If you are sure you want to keep a file long-term, you should archive it, which will remove it from the inbox. You can archive from your selected thumbnails' right-click menu, or by pressing F7. If you make a mistake, you can spam Ctrl+Z for undo or hit Shift+F7 on any set of files to explicitly return them to the inbox.
Anything you do not want to keep should be deleted by selecting from the right-click menu or by hitting the delete key. Deleted files are sent to the trash. They will get a little trash icon:
A trashed file will not appear in subsequent normal searches, although you can search the trash specifically by clicking the 'my files' button on the autocomplete dropdown and changing the file domain to 'trash'. Undeleting a file (Shift+Del) will return it to 'my files' as if nothing had happened. Files that remain in the trash will be permanently deleted, usually after a few days. You can change the permanent deletion behaviour in the client's options.
A quick way of processing new files is the archive/delete filter, described in the next section.
"},{"location":"getting_started_files.html#filtering_your_inbox","title":"Filtering your inbox","text":"Lets say you just downloaded a good thread, or perhaps you just imported an old folder of miscellany. You now have a whole bunch of files in your inbox--some good, some awful. You probably want to quickly go through them, saying yes, yes, yes, no, yes, no, no, yes, where yes means 'keep and archive' and no means 'delete this trash'. Filtering is the solution.
Select some thumbnails, and either choose filter->archive/delete from the right-click menu or hit F12. You will see them in a special version of the media viewer, with the following default controls:
- Left Button or F7: keep and archive the file, move on
- Right Button or Del: delete the file, move on
- Up: Skip this file, move on
- Middle Button or Backspace: I didn't mean that, go back one
- Esc, Enter, or F12: stop filtering now
Your choices will not be committed until you finish filtering.
This saves time.
"},{"location":"getting_started_files.html#what_hydrus_is_for","title":"What Hydrus is for","text":"The hydrus client's workflows are not designed for half-finished files that you are still working on. Think of it as a giant archive for everything excellent you have decided to store away. It lets you find and remember these things quickly.
In general, Hydrus is good for individual files like you commonly find on imageboards or boorus. Although advanced users can cobble together some page-tag-based solutions, it is not yet great for multi-file media like comics and definitely not as a typical playlist-based music player.
If you are looking for a comic manager to supplement hydrus, check out this user-made guide to other archiving software here!
And although the client can hold millions of files, it starts to creak and chug when displaying or otherwise tracking more than about 40,000 or so in a single gui window. As you learn to use it, please try not to let your download queues or general search pages regularly sit at more than 40 or 50k total items, or you'll start to slow other things down. Another common mistake is to leave one large 'system:everything' or 'system:inbox' page open with 70k+ files. For these sorts of 'ongoing processing' pages, try adding a 'system:limit=256' to keep them snappy. One user mentioned he had regular gui hangs of thirty seconds or so, and when we looked into it, it turned out his handful of download pages had three million files queued up! Just try and take things slow until you figure out what your computer's limits are.
"},{"location":"getting_started_importing.html","title":"Importing and exporting","text":"By now you should have launched Hydrus. If you're like most new users you probably already have a fair bit of images or other media files that you're looking at getting organised.
Note
If you're planning to import or export a large amount of files it's recommended to use the automated folders since Hydrus can have trouble dealing with large, single jobs. Splitting them up in this manner will make it much easier on the program.
"},{"location":"getting_started_importing.html#importing_files","title":"Importing files","text":"Navigate to
file -> import files
in the toolbar. OR Drag-and-drop one or more folders or files into Hydrus.This will open the
import files
window. Here you can add files or folders, or delete files from the import queue. Let Hydrus parse what it will update and then look over the options. By default the option to delete original files after succesful import (if it's ignored for any reason or already present in Hydrus for example) is not checked, activate on your own risk. Infile import options
you can find some settings for minimum and maximum file size, resolution, and whether to import previously deleted files or not.From here there's two options:
import now
which will just import as is, andadd tags before import >>
which lets you set up some rules to add tags to files on import. Examples are keeping filename as a tag, add folders as tag (useful if you have some sort of folder based organisation scheme), or load tags from an accompanying text file generated by some other program.Once you're done click apply (or
"},{"location":"getting_started_importing.html#exporting_files","title":"Exporting files","text":"import now
) and Hydrus will start processing the files. Exact duplicates are not imported so if you had dupes spread out you will end up with only one file in the end. If files look similar but Hydrus imports both then that's a job for the dupe filter as there is some difference even if you can't tell it by eye. A common one is compression giving files with different file sizes, but otherwise looking identical or files with extra meta data baked into them.If you want to share your files then export is the way to go. Basic way is to mark the files in Hydrus, dragging from there and dropping the files where you want them. You can also copy files or use export files to, well, export your files to a select location. All (or at least most) non-drag'n'drop export options can be found on right-clicking the select files and going down
"},{"location":"getting_started_importing.html#dragndrop","title":"Drag'n'drop","text":"share
and then eithercopy
orexport
.Just dragging from the thumbnail view will export (copy) all the selected files to wherever you drop them. You can also start a drag and drop for single files from the media viewer using this arrow button on the top hover window:
If you want to drag and drop to discord, check the special BUGFIX option under
options > gui
. You also find a filename pattern setting for that drag and drop here.By default, the files will be named by their ugly hexadecimal hash, which is how they are stored inside the database.
If you use a drag and drop to open a file inside an image editing program, remember to hit 'save as' and give it a new filename in a new location! The client does not expect files inside its db directory to ever change.
"},{"location":"getting_started_importing.html#copy","title":"Copy","text":"You can also copy the files by right-clicking and going down
"},{"location":"getting_started_importing.html#export","title":"Export","text":"share -> copy -> files
and then pasting the files where you want them.You can also export files with tags, either in filename or as a sidecar file by right-clicking and going down
share -> export -> files
. Have a look at the settings and then pressexport
. You can create folders to export files into by using backslashes on Windows (\\
) and slashes on Linux (/
) in the filename. This can be combined with the patterns listed in the pattern shortcut button dropdown. As example[series]\\{filehash}
will export files into folders named after theseries:
namespaced tags on the files, all files tagged with one series goes into one folder, files tagged with another series goes into another folder as seen in the image below.Clicking the
pattern shortcuts
button gives you an overview of available patterns.The EXPERIMENTAL option is only available under advanced mode, use at your own risk.
"},{"location":"getting_started_importing.html#automation","title":"Automation","text":"Under
"},{"location":"getting_started_importing.html#import_folders","title":"Import folders","text":"file -> import and export folders
you'll find options for setting up automated import and export folders that can run on a schedule. Both have a fair deal of options and rules you can set so look them over carefully.Like with a manual import, if you wish you can import tags by parsing filenames or loading sidecars.
"},{"location":"getting_started_importing.html#export_folders","title":"Export folders","text":"Like with manual export, you can set the filenames using a tag pattern, and you can export to sidecars too.
"},{"location":"getting_started_importing.html#importing_and_exporting_tags","title":"Importing and exporting tags","text":"While you can import and export tags together with images sometimes you just don't want to deal with the files.
Going to
"},{"location":"getting_started_installing.html","title":"Installing and Updating","text":"tags -> migrate tags
you get a window that lets you deal with just tags. One of the options here is what's called a Hydrus Tag Archive, a file containing the hash <-> tag mappings for the files and tags matching the query.If any of this is confusing, a simpler guide is here, and some video guides are here!
"},{"location":"getting_started_installing.html#downloading","title":"Downloading","text":"You can get the latest release at the github releases page.
I try to release a new version every Wednesday by 8pm EST and write an accompanying post on my tumblr and a Hydrus Network General thread on 8chan.moe /t/.
"},{"location":"getting_started_installing.html#installing","title":"Installing","text":"WindowsmacOSLinuxDockerFrom Source- If you want the easy solution, download the .exe installer. Run it, hit ok several times.
- If you know what you are doing and want a little more control, get the .zip. Don't extract it to Program Files unless you are willing to run it as administrator every time (it stores all its user data inside its own folder). You probably want something like D:\\hydrus.
- If you run <Win10, you need Visual C++ Redistributable for Visual Studio 2015 if you don't already have it for vidya.
- If you use Windows 10 N (a version of Windows without some media playback features), you will likely need the 'Media Feature Pack'. There have been several versions of this, so it may best found by searching for the latest version or hitting Windows Update, but otherwise check here.
- If you run Win7, you cannot run Qt6 programs, so you cannot run the official executable release. You have options by running from source.
- Third parties (not maintained by Hydrus Developer):
- Chocolatey
- Scoop (
hydrus-network
in the 'Extras' bucket) - Winget. The command is
winget install --id=HydrusNetwork.HydrusNetwork -e --location \"\\PATH\\TO\\INSTALL\\HERE\"
, which can, if you know what you are doing, bewinget install --id=HydrusNetwork.HydrusNetwork -e --location \".\\\"
, maybe rolled into a batch file. - User guide for Anaconda
- Get the .dmg App. Open it, drag it to Applications, and check the readme inside.
- macOS users have no mpv support for now, so no audio, and video may be laggy.
- This release has always been a little buggy. Many macOS users are having better success running from source.
Wayland
Unfortunately, hydrus has several bad bugs in Wayland. The mpv window will often not embed properly into the media viewer, menus and windows may position on the wrong screen, and the taskbar icon may not work at all. Running from source may improve the situation, but some of these issues seem to be intractable for now. X11 is much happier with hydrus.
One user notes that launching with the environment variable
QT_QPA_PLATFORM=xcb
may help!XCB Qt compatibility
If you run into trouble running Qt6, usually with an XCB-related error like
qt.qpa.plugin: Could not load the Qt platform plugin \"xcb\" in \"\" even though it was found.
, try installing the packageslibicu-dev
andlibxcb-cursor-dev
. Withapt
that will be:sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
- Get the .tar.gz. Extract it somewhere useful and create shortcuts to 'client' and 'server' as you like. The build is made on Ubuntu, so if you run something else, compatibility is hit and miss.
- If you have problems running the Ubuntu build, running from source is usually an improvement, and it is easy to set up these days.
- You might need to get 'libmpv1' to get mpv working and playing video/audio. This is the mpv library, not necessarily the player. Check help->about to see if it is available--if not, see if you can get it like so:
apt-get install libmpv1
- Use options->media to set your audio/video/animations to 'show using mpv' once you have it installed.
- If the about window provides you an mpv error popup like this:
OSError: /lib/x86_64-linux-gnu/libgio-2.0.so.0: undefined symbol: g_module_open_full
(traceback)
pyimod04_ctypes.install.<locals>.PyInstallerImportError: Failed to load dynlib/dll 'libmpv.so.1'. Most likely this dynlib/dll was not found when the application was frozen.
Then please do this:
- Search your /usr/ dir for
libgmodule*
. You are looking for something likelibgmodule-2.0.so
. Users report finding it in/usr/lib64/
and/usr/lib/x86_64-linux-gnu
. - Copy that .so file to the hydrus install base directory.
- Boot the client and hit help->about to see if it reports a version.
- If it all seems good, hit options->media to set up mpv as your player for video/audio and try to view some things.
- If it still doesn't work, see if you can do the same for libmpv.so and libcdio.so--or consider running from source
- Search your /usr/ dir for
- You can also try running the Windows version in wine.
- Third parties (not maintained by Hydrus Developer):
- (These both run from source, so if you have trouble with the built release, they may work better for you!)
- AUR package - Although please note that since AUR packages work off your system python, this has been known to cause issues when Arch suddenly updates to the latest Qt or something before we have had a chance to test things and it breaks hydrus. If you can, try just running from source yourself instead, where we can control things better!
- flatpak
- A rudimentary documentation for the container setup can be found here.
- You can also run from source. This is often the best way to fix compatibility problems, and it is the most pleasant way to run and update the program (you can update in five seconds!), although it requires a bit more work to set up the first time. It is not too complicated to do, though--my guide will walk you through each step.
By default, hydrus stores all its data--options, files, subscriptions, everything--entirely inside its own directory. You can extract it to a usb stick, move it from one place to another, have multiple installs for multiple purposes, wrap it all up inside a truecrypt volume, whatever you like. The .exe installer writes some unavoidable uninstall registry stuff to Windows, but the 'installed' client itself will run fine if you manually move it.
Bad Locations
Do not install to a network location! (i.e. on a different computer's hard drive) The SQLite database is sensitive to interruption and requires good file locking, which network interfaces often fake. There are ways of splitting your client up so the database is on a local SSD but the files are on a network--this is fine--but you really should not put the database on a remote machine unless you know what you are doing and have a backup in case things go wrong.
Do not install to a location with filesystem-level compression enabled! (e.g. BTRFS) It may work ok to start, but when the SQLite database grows to large size, this can cause extreme access latency and I/O errors and corruption.
For macOS users
The Hydrus App is non-portable and puts your database in ~/Library/Hydrus (i.e. /Users/[You]/Library/Hydrus). You can update simply by replacing the old App with the new, but if you wish to backup, you should be looking at ~/Library/Hydrus, not the App itself.
Anti-virus
Hydrus is made by an Anon out of duct tape and string. It combines file parsing tech with lots of network and database code in unusual and powerful ways, and all through a hacked-together executable that isn't signed by any big official company.
Unfortunately, we have been hit by anti-virus false positives throughout development. Every few months, one or more of the larger anti-virus programs sees some code that looks like something bad, or they run the program in a testbed and don't like something it does, and then they quarantine it. Every single instance of this so far has been a false positive. They usually go away the next week or two when the next set of definitions roll out. Some hydrus users are kind enough to report the program as a false positive to the anti-virus companies themselves, which also helps here.
Some users have never had the problem, some get hit regularly. The situation is obviously worse on Windows. If you try to extract the zip and hydrus_client.exe or the whole folder suddenly disappears, please check your anti-virus software.
I am interested in reports about these false-positives, just so I know what is going on. Sometimes I have been able to reduce problems by changing something in the build (one of these was, no shit, an anti-virus testbed running the installer and then opening the help html at the end, which launched Edge browser, which then triggered Windows Update, which hit UAC and was considered suspicious. I took out the 'open help' checkbox from the installer as a result).
You should be careful about random software online. For my part, the program is completely open source, and I have a long track record of designing it with privacy foremost. There is no intentional spyware of any sort--the program never connects to another computer unless you tell it to. Furthermore, the exe you download is now built on github's cloud, so there are very few worries about a trojan-infected build environment putting something I did not intend into the program (as there once were when I built the release on my home machine). That doesn't stop Windows Defender from sometimes calling it an ugly name like "Tedy.4675" and definitively declaring "This program is dangerous and executes commands from an attacker", but that's the modern anti-virus ecosystem.
There aren't excellent solutions to this problem. I don't like to say 'just exclude the program directory from your anti-virus settings', but some users are comfortable with this and say it works fine. One thing I do know that helps (with other things too), if you are using the default Windows Defender, is going into the Windows Security shield icon on your taskbar, and 'virus and threat protection' and then 'virus and threat protection settings', and turning off 'Cloud-delivered protection' and 'Automatic sample submission'. It seems with these on, Windows will talk with a central server about executables you run and download early updates, and this gives a lot of false positives.
If you are still concerned, please feel free to run from source, as above. You are controlling everything, then, and can change anything about the program you like. Or you can only run releases from four weeks ago, since you know the community would notice by then if there ever were a true positive. Or just run it in a sandbox and watch its network traffic.
In 2022 I am going to explore a different build process to see if that reduces the false positives. We currently make the executable with PyInstaller, which has some odd environment set-up the anti-virus testbeds don't seem to like, and perhaps PyOxidizer will be better. We'll see.
"},{"location":"getting_started_installing.html#running","title":"Running","text":"To run the client:
WindowsmacOSLinux- For the installer, run the Start menu shortcut it added.
- For the extract, run 'hydrus_client.exe' in the base directory, or make a shortcut to it.
- Run the App you installed.
- Run the 'client' executable in the base directory. You may be able to double-click it, otherwise you are running
./client
from the terminal. - If you experience virtual memory crashes, please review this thorough guide by a user.
Warning
Hydrus is imageboard-tier software, wild and fun--but also unprofessional. It is written by one Anon spinning a lot of plates. Mistakes happen from time to time, usually in the update process. There are also no training wheels to stop you from accidentally overwriting your whole db if you screw around. Be careful when updating. Make backups beforehand!
Hydrus does not auto-update. It will stay the same version unless you download and install a new one.
Although I put out a new version every week, you can update far less often if you prefer. The client keeps to itself, so if it does exactly what you want and a new version does nothing you care about, you can just leave it. Other users enjoy updating every week, simply because it makes for a nice schedule. Others like to stay a week or two behind what is current, just in case I mess up and cause a temporary bug in something they like.
The update process:
- If the client is running, close it!
- If you maintain a backup, run it now!
- Update your install:
- If you use the installer, just download the new installer and run it. It should detect where the last install was and overwrite everything automatically.
- If you use the extract, then just extract the new version right on top of your current install and overwrite manually. It is wise to extract it straight from the archive to your install folder.
- If you use the macOS App, just drag and drop from the dmg to your Applications as normal.
- If you run from source, then run git pull as normal.
- Start your client or server. It may take a few minutes to update its database. I will say in the release post if it is likely to take longer.
A user has written a longer and more formal guide to updating here.
Be extremely careful making test runs of the Extract release
Do not test-run the extract before copying it over your install! Running the program anywhere will create database files in the /db/ dir, and if you then copy that once-run folder on top of your real install, you will overwrite your real database! Of course it doesn't really matter, because you made a full backup before you started, right? :^)
If you need to perform tests of an update, make sure you have a good backup before you start and then remember to delete any functional test extracts before extracting from the original archive once more for the actual 'install'.
Several older versions, like 334, 526, and 570 have special update instructions.
Unless the update specifically disables or reconfigures something, all your files and tags and settings will be remembered after the update.
Releases typically need to update your database to their version. New releases can retroactively perform older database updates, so if the new version is v255 but your database is on v250, you generally only need to get the v255 release, and it'll do all the intervening v250->v251, v251->v252, etc... update steps in order as soon as you boot it. If you need to update from a release more than, say, ten versions older than current, see below. You might also like to skim the release posts or changelog to see what is new.
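To illustrate how those retroactive updates behave, here is a toy sketch--not hydrus's real update code--of the idea that each release only knows how to move the database from version v to v+1, so a newer client applies every intervening step in order when it boots an older database:

```python
# Toy sketch only: the step function is a stand-in for real per-version migrations.
def update_database(current_version, target_version, run_update_step):
    for v in range(current_version, target_version):
        run_update_step(v, v + 1)  # v250->v251, v251->v252, ... in order
    return target_version

# pretend updates just report what they would do
update_database(250, 255, lambda a, b: print(f"updating {a} -> {b}"))
```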
Clients and servers of different versions can usually connect to one another, but from time to time, I make a change to the network protocol, and you will get polite error messages if you try to connect to a newer server with an older client or vice versa. There is still no need to update the client--it'll still do local stuff like searching for files completely fine. Read my release posts and judge for yourself what you want to do.
"},{"location":"getting_started_installing.html#clean_installs","title":"Clean installs","text":"This is usually only relevant if you use the extract release and have a dll conflict or otherwise update and cannot boot at all. A handful of hydrus updates through its history have needed this.
Very rarely, hydrus needs a clean install. This can be due to a special update like when we moved from 32-bit to 64-bit or needing to otherwise 'reset' a custom install situation. The problem is usually that a library file has been renamed in a new version and hydrus has trouble figuring out whether to use the older one (from a previous version) or the newer.
In any case, if you cannot boot hydrus and it either fails silently or you get a crash log or system-level error popup complaining in a technical way about not being able to load a dll/pyd/so file, you may need a clean install, which essentially means clearing any old files out and reinstalling.
However, you need to be careful not to delete your database! It sounds silly, but at least one user has made a mistake here. The process is simple, do not deviate:
- Make a backup if you can!
- Go to your install directory.
- Delete all the files and folders except the 'db' dir (and all of its contents, obviously).
- Extract the new version of hydrus as you normally do.
After that, you'll have a 'clean' version of hydrus that only has the latest version's dlls. If hydrus still will not boot, I recommend you roll back to your last working backup and let me, hydrus dev, know what your error is.
Note that macOS App users will not ever have to do a clean install because every App is self-contained and non-merging with previous Apps. Source users similarly do not have to worry about this issue, although if they update their system python, they'll want to recreate their venv. Windows Installer users basically get a clean install every time, so they shouldn't have to worry about this.
"},{"location":"getting_started_installing.html#big_updates","title":"Big updates","text":"If you have not updated in some time--say twenty versions or more--doing it all in one jump, like v290->v330, may not work. I am doing a lot of unusual stuff with hydrus, change my code at a fast pace, and do not have a ton of testing in place. Hydrus update code often falls to bit rot, and so some underlying truth I assumed for the v299->v300 code may not still apply six months later. If you try to update more than 50 versions at once (i.e. trying to perform more than a year of updates in one go), the client will give you a polite error rather than even try.
As a result, if you get a failure on trying to do a big update, try cutting the distance in half--try v290->v310 first, and boot it. If the database updates correctly and the program boots, then shut down and move on to v310->v330. If the update does not work, cut down the gap and try v290->v300, and so on. Again, it is very important you make a backup before starting a process like this so you can roll back and try a different version if things go wrong.
If you narrow the gap down to just one version and still get an error, please let me know. If the problem is ever quick to appear and ugly/serious-looking, and perhaps talking about a \"bootloader\" or \"dll\" issue, then try doing a clean install as above. I am very interested in these sorts of problems and will be happy to help figure out a fix with you (and everyone else who might be affected).
All that said, and while updating is complex and every client is different, various user reports over the years suggest this route works and is efficient: 204 > 238 > 246 > 291 > 328 > 335 (clean install) > 376 > 421 > 466 (clean install) > 474 > 480 > 521 (maybe clean install) > 527 (special clean install) > 535 > 558 > 571 (clean install)
334->335
We moved from python 2 to python 3.
If you need to update from 334 or before to 335 or later, then:
- If you use the Windows installer, install as normal.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, there are no special instructions. Update as normal.
427->428
Some new dlls cause a potential conflict.
If you need to update from 427 or before to 428 or later, then:
- If you use the Windows installer, install as normal.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, there are no special instructions. Update as normal.
526->527
527 changed the program executable name from 'client' to 'hydrus_client'. There was also a library update that caused a dll conflict with previous installs.
If you need to update from 526 or before to 527 or later, then:
- If you use the Windows installer, install as normal. Your start menu 'hydrus client' shortcut should be overwritten with one to the new executable, but if you use a custom shortcut, you will need to update that too.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, git pull as normal. If you haven't already, feel free to run setup_venv again to get the new OpenCV. Update your launch scripts to point at the new hydrus_client.py boot scripts.
570->571
571 updated the python version, which caused a dll conflict with previous installs.
If you need to update from 570 or before to 571 or later, then:
- If you use the Windows installer, install as normal.
- If you use one of the normal extract builds, you will have to do a 'clean install', as above.
- If you use the macOS app, there are no special instructions. Update as normal.
- If you run from source, there are no special instructions. Update as normal.
I am not joking around: if you end up liking hydrus, you should back up your database
Maintaining a regular backup is important for hydrus. The program stores a lot of complicated data that you will put hours and hours of work into, and if you only have one copy and your hard drive breaks, you could lose everything. This has happened before--to people who thought it would never happen to them--and it sucks big time to go through. Don't let it be you.
Hydrus's database engine, SQLite, is excellent at keeping data safe, but it cannot work in a faulty environment. Ways in which users of hydrus have damaged/lost their database:
- Hard drive hardware failure (age, bad ventilation, bad cables, etc...)
- Lightning strike on non-protected socket or rough power cut on non-UPS'd power supply
- RAM failure
- Motherboard/PSU power problems
- Accidental deletion
- Accidental overwrite (usually during a borked update)
- Encrypted partition auto-dismount/other borked settings
- Cloud backup interfering with ongoing writes
- An automatic OS backup routine misfiring and causing a rollback, wiping out more than a year of progress
- A laptop that incorrectly and roughly disconnected an external USB drive on every sleep
- Network drive location not guaranteeing accurate file locks
- Windows NVMe driver bugs necessitating a different SQLite journalling method
Some of those you can mitigate (don't run the database over a network!) and some will always be a problem, but if you have a backup, none of them can kill you.
This mostly means your database, not your files
Note that nearly all the serious and difficult-to-fix problems occur to the database, which is four large .db files, not your media. All your images and movies are read-only in hydrus, and there's less worry if they are on a network share with bad locks or a machine that suddenly loses power. The database, however, maintains a live connection, with regular complex writes, and here a hardware failure can lead to corruption (basically the failure scrambles the data that is written, so when you try to boot back up, a small section of the database is incomprehensible garbage).
If you do not already have a backup routine for your files, this is a great time to start. I now run a backup every week of all my data so that if my computer blows up or anything else awful happens, I'll at worst have lost a few days' work. Before I did this, I once lost an entire drive with tens of thousands of files, and it felt awful. If you are new to saving a lot of media, I hope you can avoid what I felt. ;_;
I use ToDoList to remind me of my jobs for the day, including backup tasks, and FreeFileSync to actually mirror over to an external usb drive. I recommend both highly (and for ToDoList, I recommend hiding the complicated columns, stripping it down to a simple interface). It isn't a huge expense to get a couple-TB usb drive either--it is absolutely worth it for the peace of mind.
By default, hydrus stores all your user data in one location, so backing up is simple:
"},{"location":"getting_started_installing.html#the_simple_way_-_inside_the_client","title":"The simple way - inside the client","text":"Go database->set up a database backup location in the client. This will tell the client where you want your backup to be stored. A fresh, empty directory on a different drive is ideal.
Once you have your location set up, you can thereafter hit database->update database backup. It will lock everything and mirror your files, showing its progress in a popup message. The first time you make this backup, it may take a little while (as it will have to fully copy your database and all its files), but after that, it will only have to copy new or altered files and should only ever take a couple of minutes.
Advanced users who have migrated their database and files across multiple locations will not have this option--use an external program in this case.
"},{"location":"getting_started_installing.html#the_powerful_and_best_way_-_using_an_external_program","title":"The powerful (and best) way - using an external program","text":"Doing it yourself is best. If you are an advanced user with a complicated hydrus install migrated across multiple drives, then you will have to do it this way--the simple backup will be disabled.
You need to backup two things, which are both, by default, beneath install_dir/db: the four client*.db files and your client_files directory(ies). The .db files contain absolutely everything about your client and files--your settings and file lists and metadata like inbox/archive and tags--while the client_files subdirs store your actual media and its thumbnails.
If everything is still under install_dir/db, then it is usually easiest to just backup the whole install dir, keeping a functional 'portable' copy of your install that you can restore no prob. Make sure you keep the .db files together--they are not interchangeable and mostly useless on their own!
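If you prefer to script it, here is a minimal sketch of the idea with Python's standard library--not an official hydrus tool, and the paths are assumptions you would change for your own setup. It is a naive full copy rather than an incremental mirror, and it must only run while the client is shut down:

```python
import shutil
from pathlib import Path

# Assumed locations--adjust to your own install and backup drive.
INSTALL_DB = Path(r"C:\Hydrus Network\db")
BACKUP_DIR = Path(r"E:\hydrus_backup\db")

def backup_hydrus():
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    # the four client*.db files hold your settings, file lists, and tag mappings
    for db_file in INSTALL_DB.glob("client*.db"):
        shutil.copy2(db_file, BACKUP_DIR / db_file.name)
    # client_files holds your actual media and thumbnails
    shutil.copytree(INSTALL_DB / "client_files", BACKUP_DIR / "client_files", dirs_exist_ok=True)

if __name__ == "__main__":
    backup_hydrus()  # only run this while the client is closed
```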
An example FreeFileSync profile for backing up a database will look like this:
Note it has 'file time and size' and 'mirror' as the main settings. This quickly ensures that changes to the left-hand side are copied to the right-hand side, adding new files and removing since-deleted files and overwriting modified files. You can save a backup profile like that and it should only take a few minutes every week to stay safely backed up, even if you have hundreds of thousands of files.
Shut the client down while you run the backup, obviously.
"},{"location":"getting_started_installing.html#a_few_options","title":"A few options","text":"There are a host of other great alternatives out there, probably far too many to count. These are a couple that are often recommended and used by Hydrus users and are, in the spirit of Hydrus Network itself, free and open source.
FreeFileSync Linux, MacOS, Windows. Recommended and used by dev. Somewhat basic but does the job well enough.
Borg Backup FreeBSD, Linux, MacOS. More advanced and featureful backup tool.
Restic Almost every OS you can name.
Danger
Do not put your live database in a folder that continuously syncs to a cloud backup. Many of these services will interfere with a running client and can cause database corruption. If you still want to use a system like this, either turn the sync off while the client is running, or use the above backup workflows to safely backup your client to a separate folder that syncs to the cloud.
There is significantly more information about the database structure here.
I recommend you always backup before you update, just in case there is a problem with my update code that breaks your database. If that happens, please contact me, describing the problem, and revert to the functioning older version. I'll get on any problems like that immediately.
"},{"location":"getting_started_installing.html#backing_up_small","title":"Backing up with not much space","text":"If you decide not to maintain a backup because you cannot afford drive space for all your files, please please at least back up your actual database files. Use FreeFileSync or a similar program to back up the four 'client*.db' files in install_dir/db when the client is not running. Just make sure you have a copy of those files, and then if your main install becomes damaged, we will have a reference to either roll back to or manually restore data from. Even if you lose a bunch of media files in this case, with an intact database we'll be able to schedule recovery of anything with a URL.
If you are really short on space, note also that the database files are very compressible. A very large database where the four files add up to 70GB can compress down to 17GB zip with 7zip on default settings. Better compression ratios are possible if you make sure to put all four files in the same archive and turn up the quality. This obviously takes some additional time to do, but if you are really short on space it may be the only way it fits, and if your only backup drive is a slow USB stick, then you might actually save time from not having to transfer the other 53GB! Media files (jpegs, webms, etc...) are generally not very compressible, usually 5% at best, so it is usually not worth trying.
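As a sketch of that idea (assumed paths, not an official tool), Python's standard zipfile can bundle the four files into one archive--a dedicated 7zip pass at a high setting will compress further, but even plain deflate helps:

```python
import zipfile
from pathlib import Path

DB_DIR = Path(r"C:\Hydrus Network\db")      # assumed install location
ARCHIVE = Path(r"E:\hydrus_db_backup.zip")  # assumed backup target

# Keep all four client*.db files together in the same archive.
with zipfile.ZipFile(ARCHIVE, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    for db_file in sorted(DB_DIR.glob("client*.db")):
        zf.write(db_file, arcname=db_file.name)
```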
It is best to have all four database files. It is generally easy and quick to fix problems if you have a backup of all four. If client.caches.db is missing, you can recover but it might take ten or more hours of CPU work to regenerate. If client.mappings.db is missing, you might be able to recover tags for your local files from a mirror in an intact client.caches.db. However, client.master.db and client.db are the most important. If you lose either of those, or they become too damaged to read and you have no backup, then your database is essentially dead and likely every single archive and view and tag and note and url record you made is lost. This has happened before, do not let it be you.
"},{"location":"getting_started_more_tags.html","title":"Tags Can Get Complicated","text":"Tags are powerful, and there are many tools within hydrus to customise how they apply and display. I recommend you play around with the basics before making your own new local tag services or jumping right into the PTR, so take it slow.
"},{"location":"getting_started_more_tags.html#tags_are_for_searching_not_describing","title":"Tags are for Searching not Describing","text":"Hydrus users tend to be nerds of one sort or another, and we all like thinking about categorisation and semantic relationships. I provide several clever tools in this program, and it is not uncommon for newer users to spend hours sketching out intricate tree-charts and idiosyncratic taxonomy algebra in a One True Plan and then only tagging five actual files of anime cat girls before burning out. Try not to let this happen to you.
In making hydrus, I have discovered two rules to stop you going crazy:
- Don't try to be perfect.
- Only add those tags you actually use in searches.
There is always work to do, and it is easy to exhaust oneself or get lost in the bushes agonising over whether to use 'smile' or 'smiling' or 'smirk'--before you know it, you have been tagging the same file for four minutes, and there are twelve thousand to go. The details are not as important as the broad strokes, and problems are easy to correct in future. There is often also no perfect answer, and even if there were, we would never have time to apply it everywhere. The ride never ends.
The sheer number of tags can also be overwhelming. Importing all the many tags from boorus is totally fine, but if you are typing tags yourself, I suggest you try not to exhaustively tag everything in the image. You will go crazy and burn out!
Ultimately, tags are a medium for searching, not describing. Anyone can see what is in an image just by looking at it, so--for the most part--the only use in writing any of it down is if you would ever use those particular words to find the thing again. Character, series and creator namespaces are a great simple place to start. After that, add whatever you are most interested in, be that 'blue sky' or 'midriff' or fanfic ship names, whatever you would actually use in a search, and then you can spend your valuable time actually using your media rather than drowning-by-categorisation.
"},{"location":"getting_started_more_tags.html#tag_services","title":"Tag services","text":"Hydrus lets you organise tags across multiple separate 'services'. By default there are two, but you can have however many you want (
services->manage services
). You might like to add more for different sets of siblings/parents, tags you don't want to see but still search by, parsing tags into different services based on reliability of the source or the source itself. You could for example parse all tags from Pixiv into one service, Danbooru tags into another, Deviantart etc. and so on as you chose. You must always have at least one local tag service.Local tag services are stored only on your hard drive--they are completely private. No tags, siblings, or parents will accidentally leak, so feel free to go wild with whatever odd scheme you want to try out.
Each tag service comes with its own tags, siblings and parents.
"},{"location":"getting_started_more_tags.html#my_tags","title":"My tags","text":"The intent is to use this service for tags you yourself want to add.
"},{"location":"getting_started_more_tags.html#downloader_tags","title":"Downloader tags","text":"The default place for tags coming from downloaders. Tags of things you download will end up here unless you change the settings. It is a good idea to set up some tag blacklists for tags you do not want.
"},{"location":"getting_started_more_tags.html#tag_repositories","title":"Tag repositories","text":"It can take a long time to tag even small numbers of files well, so I created tag repositories so people can share the work.
Tag repos store many file->tag relationships. Anyone who has an access key to the repository can sync with it and hence download all these relationships. If any of their own files match up, they will get those tags. Access keys will also usually have permission to upload new tags and ask for incorrect ones to be deleted.
Anyone can run a tag repository, but it is a bit complicated for new users. I ran a public tag repository for a long time, and now this large central store is run by users. It has over a billion tags and is free to access and contribute to.
To connect with it, please check here. Please read that page if you want to try out the PTR. It is only appropriate for someone on an SSD!
If you add it, your client will download updates from the repository over time and, usually when it is idle or shutting down, 'process' them into its database until it is fully synchronised. The processing step is CPU and HDD heavy, and you can customise when it happens in file->options->maintenance and processing. As the repository synchronises, you should see some new tags appear, particularly on famous files that lots of people have.
You can watch more detailed synchronisation progress in the services->review services window.
Your new service should now be listed on the left of the manage tags dialog. Adding tags to a repository works very similarly to the 'my tags' service except hitting 'apply' will not immediately confirm your changes--it will put them in a queue to be uploaded. These 'pending' tags will be counted with a plus '+' or minus '-' sign.
Notice that a 'pending' menu has appeared on the main window. This lets you start the upload when you are ready and happy with everything that you have queued.
When you upload your pending tags, they will commit and look to you like any other tag. The tag repository will anonymously bundle them into the next update, which everyone else will download in a day or so. They will see your tags just like you saw theirs.
If you attempt to remove a tag that has been uploaded, you may be prompted to give a reason, creating a petition that a janitor for the repository will review.
I recommend you not spam tags to the public tag repo until you get a rough feel for the guidelines, and my original tag schema thoughts, or just lurk until you get the idea. It roughly follows what you will see on a typical booru. The general rule is to only add factual tags--no subjective opinion.
You can connect to more than one tag repository if you like. When you are in the manage tags dialog, pressing the up or down arrow keys on an empty input switches between your services.
FAQ: why can my friend not see what I just uploaded?
"},{"location":"getting_started_more_tags.html#siblings_and_parents","title":"Siblings and parents","text":"For more in-depth information, see siblings and parents.
tl;dr: Siblings rename/alias tags in an undoable way. Parents virtually add/imply one or more tags (parents) if the 'child' tag is present. The PTR has a lot of them.
"},{"location":"getting_started_more_tags.html#display_rules","title":"Display rules","text":"If you go to
"},{"location":"getting_started_ratings.html","title":"getting started with ratings","text":"tags -> manage where siblings and parents apply
you'll get a window where you can customise where and in what order siblings and parents apply. The service at the top of the list has precedence over all else, then second, and so on depending on how many you have. If you for example have PTR you can use a tag service to overwrite tags/siblings for cases where you disagree with the PTR standards.The hydrus client supports two kinds of ratings: like/dislike and numerical. Let's start with the simpler one:
"},{"location":"getting_started_ratings.html#like_dislike","title":"like/dislike","text":"A new client starts with one of these, called 'favourites'. It can set one of two values to a file. It does not have to represent like or dislike--it can be anything you want, like 'send to export folder' or 'explicit/safe' or 'cool babes'. Go to services->manage services->add->local like/dislike ratings:
You can set a variety of colours and shapes.
"},{"location":"getting_started_ratings.html#numerical","title":"numerical","text":"This is '3 out of 5 stars' or '8/10'. You can set the range to whatever whole numbers you like:
As well as the shape and colour options, you can set how many 'stars' to display and whether 0/10 is permitted.
If you change the star range at a later date, any existing ratings will be 'stretched' across the new range. As values are collapsed to the nearest integer, this is best done for scales that are multiples. 2/5 will neatly become 4/10 on a zero-allowed service, for instance, and 0/4 can nicely become 1/5 if you disallow zero ratings in the same step. If you didn't intuitively understand that, just don't touch the number of stars or zero rating checkbox after you have created the numerical rating service!
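As a rough sketch of the arithmetic (an illustration, not hydrus's actual code): the rating is kept as a fraction of the maximum and re-expressed in the new scale, rounding to the nearest whole star:

```python
def stretch_rating(stars, old_max, new_max, zero_allowed=True):
    """Re-express a rating when the star range changes, rounding to the nearest star."""
    fraction = stars / old_max             # e.g. 2/5 -> 0.4
    new_stars = round(fraction * new_max)  # 0.4 * 10 -> 4
    if not zero_allowed:
        new_stars = max(1, new_stars)      # a zero-disallowed service bumps up to 1
    return new_stars

print(stretch_rating(2, 5, 10))                       # 2/5 -> 4/10
print(stretch_rating(0, 4, 5, zero_allowed=False))    # 0/4 -> 1/5
```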
"},{"location":"getting_started_ratings.html#incdec","title":"inc/dec","text":"This is a simple counter. It can represent whatever you like, but most people usually go for 'I x this image y times'. You left-click to +1 it, right-click to -1.
"},{"location":"getting_started_ratings.html#using_ratings","title":"now what?","text":"Ratings are displayed in the top-right of the media viewer:
Hovering over each control will pop up its name, in case you forget which is which.
For like/dislike:
- Left-click: Set 'like'
- Right-click: Set 'dislike'
- Second X-click: Set 'not rated'
For numerical:
- Left-click: Set value
- Right-click: Set 'not rated'
For inc/dec:
- Left-click: +1
- Right-click: -1
Pressing F4 on a selection of thumbnails will open a dialog with a very similar layout, which will let you set the same rating to many files simultaneously.
Once you have some ratings set, you can search for them using system:rating, which produces this dialog:
On my own client, I find it useful to have several like/dislike ratings set up as quick one-click pseudo-tags. Stuff like 'this would make a good post' or 'read this later' that I can hit while I am doing archive/delete filtering.
"},{"location":"getting_started_searching.html","title":"Searching and sorting","text":"The primary purpose of tags is to be able to find what you've tagged again. Let's see more how it works.
"},{"location":"getting_started_searching.html#searching","title":"Searching","text":"Just open a new search page (
"},{"location":"getting_started_searching.html#the_dropdown_controls","title":"The dropdown controls","text":"pages > new file search page
or Ctrl+T> file search
) and start typing in the search field which should be focused when you first open the page.Let's look at the tag autocomplete dropdown:
-
system predicates
Hydrus calls search terms predicates. 'system predicates', which search metadata other than simple tags, show on any search page with an empty autocomplete input. You can mix them into any search alongside tags. They are very useful, so try them out!
-
include current/pending tags
Turn these on and off to control whether tag predicates apply to tags that exist, or those pending to be uploaded to a tag repository. Just searching 'pending' tags is useful if you want to scan what you have pending to go up to the PTR--just turn off 'current' tags and search
system:num tags > 0
. -
searching immediately
This controls whether a change to the list of current search predicates will instantly run the new search and get new results. Turning this off is helpful if you want to add, remove, or replace several heavy search terms in a row without getting UI lag.
-
OR
You only see this if you have 'advanced mode' on. It lets you enter some pretty complicated tags!
-
file/tag domains
By default, you will search in 'my files' and 'all known tags' domain. This is the intersection of your local media files (on your hard disk) and the union of all known tag searches. If you search for
character:samus aran
, then you will get file results from your 'my files' domain that havecharacter:samus aran
in any known tag service. For most purposes, this combination is fine, but as you use the client more, you will sometimes want to access different search domains.For instance, if you change the file domain to 'trash', then you will instead get files that are in your trash. Setting the tag domain to 'my tags' will ignore other tag services (e.g. the PTR) for all tag search predicates, so a
system:num_tags
or acharacter:samus aran
will only look 'my tags'.Turning on 'advanced mode' gives access to more search domains. Some of them are subtly complicated, run extremely slowly, and only useful for clever jobs--most of the time, you still want 'my files' and 'all known tags'.
-
favourite searches star
Once you are more experienced, have a play with this. It lets you save your common searches for future, so you don't have to either keep re-entering them or keep them open all the time. If you close big things down when you aren't using them, you will keep your client lightweight and save time.
When you type a tag in a search page, Hydrus will treat a space the same way as an underscore. Searching character:samus aran will find files tagged with character:samus aran and character:samus_aran. This is true of some other syntax characters, [](){}/\"'-, too.
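A minimal sketch of that normalisation idea (an illustration only, not hydrus's internal code):

```python
def normalise(tag: str) -> str:
    # treat underscores and spaces as the same character for matching purposes
    return tag.replace("_", " ").lower()

print(normalise("character:samus_aran") == normalise("character:samus aran"))  # True
```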
"},{"location":"getting_started_searching.html#wildcards","title":"Wildcards","text":"large
->huge
then typinglarge
will providehuge
as a suggestion. This goes for the whole sibling chain, no matter how deep or a tag's position in it.The autocomplete tag dropdown supports wildcard searching with
The * will match any number of characters. Every normal autocomplete search has a secret * on the end that you don't see, which is how full words get matched from you only typing in a few letters.
This is useful when you can only remember part of a word, or can't spell part of it. You can put * characters anywhere, but you should experiment to get used to the exact way these searches work. Some results can be surprising!
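To illustrate the matching (a rough sketch using Python's fnmatch, not the client's actual matcher), the implicit trailing * means your typed text behaves like a pattern with a wildcard stuck on the end:

```python
from fnmatch import fnmatch

tags = ["series:neon genesis evangelion", "character:samus aran"]

def autocomplete(typed, known_tags):
    pattern = typed + "*"  # the secret trailing wildcard every search gets
    return [t for t in known_tags if fnmatch(t, pattern)]

print(autocomplete("*gelion", tags))   # matches '...evangelion'
print(autocomplete("*va*ge*", tags))   # wildcards can go anywhere in the text
```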
You can select the special predicate inserted at the top of your autocomplete results (the highlighted *gelion and *va*ge* above). It will return all files that match that wildcard, i.e. every file for every other tag in the dropdown list.
This is particularly useful if you have a number of files with commonly structured over-informationed tags, like this:
In this case, selecting the title:cool pic* predicate will return all three images in the same search, where you can conveniently give them some more-easily searched tags like series:cool pic and page:1, page:2, page:3.
Editing Predicates
You can edit any selected 'active' search predicates by either its Right-Click menu or through Shift+Double-Left-Click on the selection. For simple tags, this means just changing the text (and, say, adding/removing a leading hyphen for negation/inclusion), but any 'system' predicate can be fully edited with its original panel. If you entered 'system:filesize < 200KB' and want to make it a little bigger, don't delete and re-add--just edit the existing one in place.
"},{"location":"getting_started_searching.html#other_shortcuts","title":"Other Shortcuts","text":"These will eventually be migrated to the shortcut system where they will be more visible and changeable, but for now:
- Left-Click on any taglist is draggable, if you want to select multiple tags quickly.
- Shift+Left-Click across any taglist will do a multi-select. This click is also draggable.
- Ctrl+Left-Click on any taglist will add to or remove from the selection. This is draggable, and if you start on a 'remove', the drag will be a 'remove' drag. Play with it--you'll see how it works.
- Double-Left-Click on one or more tags in the 'selection tags' box moves them to the active search box. Doing the same on the active search box removes them.
- Ctrl+Double-Left-Click on one or more tags in the 'selection tags' box will add their negation (i.e. '-skirt').
- Shift+Double-Left-Click on more than one tag in the 'selection tags' box will add their 'OR' to the active search box. What's an OR? Well:
Searches find files that match every search 'predicate' in the list (it is an AND search), which makes it difficult to search for files that include one OR another tag. For example, the query red eyes AND green eyes (aka what you get if you enter each tag by itself) will only find files that have both tags, while the query red eyes OR green eyes will present you with files that are tagged with red eyes or green eyes, or both.
More recently, simple OR search support was added. All you have to do is hold down Shift when you enter/double-click a tag in the autocomplete entry area. Instead of sending the tag up to the active search list up top, it will instead start an under-construction 'OR chain' in the tag results below:
You can keep searching for and entering new tags. Holding down Shift on new tags will extend the OR chain, and entering them as normal will 'cap' the chain and send it to the complete and active search predicates above.
Any file that has one or more of those OR sub-tags will match.
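As a sketch of the logic (plain illustrative Python, not the client's actual query code), a search is an AND over its predicates, and an OR predicate passes if any of its sub-tags is present:

```python
def matches(file_tags, predicates):
    # each predicate is either a single tag, or a list of tags forming one OR chain
    for pred in predicates:
        if isinstance(pred, list):
            if not any(tag in file_tags for tag in pred):  # OR: at least one must hit
                return False
        elif pred not in file_tags:                        # AND: every plain tag must hit
            return False
    return True

search = ["smile", ["red eyes", "green eyes"]]
print(matches({"smile", "green eyes"}, search))  # True
print(matches({"smile"}, search))                # False
```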
If you enter an OR tag incorrectly, you can either cancel or 'rewind' the under-construction search predicate with these new buttons that will appear:
You can also cancel an under-construction OR by hitting Esc on an empty input. You can add any sort of search term to an OR search predicate, including system predicates. Some unusual sub-predicates (typically a -tag, or a very broad system predicate) can run very slowly, but they will run much faster if you include non-OR search predicates in the search:
This search will return all files that have the tag fanfic and one or more of medium:text, a positive value for the like/dislike rating 'read later', or PDF mime.
There's a more advanced OR search function available by pressing the OR button. Previous knowledge of operators is expected and required.
"},{"location":"getting_started_searching.html#sorting","title":"Sorting","text":"At the top-left of most pages there's a
sort by:
dropdown menu. Most of the options are self-explanatory. They do nothing except change in what order Hydrus presents the currently searched files to you.Default sort order and more
"},{"location":"getting_started_searching.html#sorting_with_systemlimit","title":"Sorting withsort by: namespace
are found infile -> options -> sort/collect
.system:limit
","text":"If you add
system:limit
to a search, the client will consider what that page's file sort currently is. If it is simple enough--something like file size or import time--then it will sort your results before they come back and clip the limit according to that sort, getting the n 'largest file size' or 'newest imports' and so on. This can be a great way to set up a lightweight filtering page for 'the 256 biggest videos in my inbox'.If you change the sort, hydrus will not refresh the search, it'll just re-sort the n files you have. Hit F5 to refresh the search with a new sort.
Not all sorts are supported. Anything complicated like tag sort will result in a random sample instead.
"},{"location":"getting_started_searching.html#collecting","title":"Collecting","text":"Collection is found under the
"},{"location":"getting_started_subscriptions.html","title":"subscriptions","text":"sort by:
dropdown and uses namespaces listed in thesort by: namespace
sort options. The new namespaces will only be available in new pages.The introduction to subscriptions has been moved to the main downloading help here.
"},{"location":"getting_started_subscriptions.html#description","title":"how do subscriptions work?","text":"For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.
Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works:
The initial sync can sometimes take a few minutes, but after that, each query usually only needs thirty seconds' work every few days. If you leave your client on in the background, you'll rarely see them. If they ever get in your way, don't be afraid to click their little cancel button or call a global halt with network->pause->subscriptions--the next time they run, they will resume from where they were before.
Similarly, the initial sync may produce a hundred files, but subsequent runs are likely to only produce one to ten. If a subscription comes across a lot of big files at once, it may not download them all in one go--but give it time, and it will catch back up before you know it.
When it is done, it leaves a little popup button that will open a new page for you:
This can often be a nice surprise!
"},{"location":"getting_started_subscriptions.html#good_subs","title":"what makes a good subscription?","text":"The same rules as for downloaders apply: start slow, be hesitant, and plan for the long-term. Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.
Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu.
Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.
If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for keeping up with things you like. If you let them overwhelm you, you'll resent them.
It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using 'no limit' file limit, and when they are all finished, select them in the list and right-click->copy queries, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again!
"},{"location":"getting_started_subscriptions.html#checking","title":"images/how often do subscriptions check?","text":"Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. This is one of the things you should not mess with as you start out.
If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year.
I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. Subscriptions are for lightweight searches that are updated every now and then.
(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)
"},{"location":"getting_started_subscriptions.html#presentation","title":"ok, I set up three hundred queries, and now these popup buttons are a hassle","text":"On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the button makes, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page! I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.
If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great.
You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong.
Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under options->importing to behave this way as well. Efficient workflows only care about new files.
"},{"location":"getting_started_subscriptions.html#syncing_explanation","title":"how exactly does the sync work?","text":"Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images.
The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file sync is 100, which is a great number for almost all situations.
Subsequent syncs are more complicated. It ideally 'stops' searching when it reaches files it saw in a previous sync, but if it comes across new files mixed in with the old, it will search a bit deeper. It is not foolproof, and if a file gets tagged very late and ends up a hundred deep in the search, it will probably be missed. There is no good and computationally cheap way at present to resolve this problem, but thankfully it is rare.
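A rough sketch of that idea (illustrative logic only, not the client's actual sync code): walk the gallery newest-first, stop once you hit a run of already-seen URLs or the file limit, but tolerate a few old URLs mixed in with new ones:

```python
def find_new_urls(gallery_urls, seen_urls, file_limit=100, patience=5):
    """Walk a newest-first gallery and collect unseen URLs.

    Stops at the file limit, or once a run of already-seen URLs suggests
    we have caught up (patience lets a few old URLs be mixed in with new).
    """
    new, seen_in_a_row = [], 0
    for url in gallery_urls:
        if url in seen_urls:
            seen_in_a_row += 1
            if seen_in_a_row >= patience:  # looks like we have caught up
                break
        else:
            seen_in_a_row = 0
            new.append(url)
            if len(new) >= file_limit:     # initial/periodic file limit
                break
    return new

print(find_new_urls(["u9", "u8", "u7", "u6"], seen_urls={"u7", "u6"}, patience=2))  # ['u9', 'u8']
```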
Remember that an important 'staying sane' philosophy of downloading and subscriptions is to focus on dealing with the 99.5% you have before worrying about the 0.5% you do not.
The amount of time between syncs is calculated by the checker options. Based on the timestamps attached to existing urls in the subscription cache (either added time, or the post time as parsed from the url), the sub estimates how long it will be before n new files appear, and then next check is scheduled for then. Unless you know what you are doing, checker options, like file limits, are best left alone. A subscription will naturally adapt its checking speed to the file 'velocity' of the source, and there is usually very little benefit to trying to force a sub to check at a radically different speed.
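To illustrate the scheduling arithmetic (a simplified model, not the exact checker-options formula): the recent post timestamps suggest a file 'velocity', and the next check is placed where roughly n new files should have appeared:

```python
def estimate_next_check(post_timestamps, files_wanted=5):
    """Estimate seconds until the next check from recent post timestamps."""
    post_timestamps = sorted(post_timestamps)
    span = max(post_timestamps[-1] - post_timestamps[0], 1)  # guard against a zero span
    velocity = (len(post_timestamps) - 1) / span             # files per second
    return files_wanted / velocity                           # wait until ~n new files are expected

# e.g. 10 posts spread over the last 27 days -> check again in roughly 15 days for 5 new files
DAY = 86400
print(estimate_next_check([i * 3 * DAY for i in range(10)], files_wanted=5) / DAY)
```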
Tip
If you want to force your subs to run at the same time, say every evening, it is easier to just use network->pause->subscriptions as a manual master on/off control. The ones that are due will catch up together, the ones that aren't won't waste your time.
Remember that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. Do not change the file limits or check times to try to make this happen. If you want to ensure complete sync with all existing content for a particular search, use the manual downloader.
In practice, most subs only need to check the first page of a gallery since only the first two or three urls are new.
"},{"location":"getting_started_subscriptions.html#periodic_file_limit","title":"periodic file limit exceeded","text":"If, during a regular sync, the sub keeps finding new URLs, never hitting a block of already-seen URLs, it will stop upon hitting its 'periodic file limit', which is also usually 100. When it happens, you will get a popup message notification. There are two typical reasons for this:
- A user suddenly posted a large number of files to the site for that query. This sometimes happens with CG gallery spam.
- The website changed their URL format.
The first case is a natural accident of statistics. The subscription now has a 'gap' in its sync. If you want to get what you missed, you can try to fill in the gap with a manual downloader page. Just download to 200 files or so, and the downloader will work quickly to one-time work through the URLs in the gap.
The second case is a safety stopgap for hydrus. If a site decides to have /post/123456 style URLs instead of post.php?id=123456 style, hydrus will suddenly see those as entirely 'new' URLs. It could also be because of an updated downloader, which pulls URLs in API format or similar. This is again thankfully quite rare, but it triggers several problems--the associated downloader usually breaks, as it does not yet recognise those new URLs, and all your subs for that site will parse through and hit the periodic limit for every query. When this happens, you'll usually get several periodic limit popups at once, and you may need to update your downloader. If you know the person who wrote the original downloader, they'll likely want to know about the problem, or may already have a fix sorted. It is often a good idea to pause the affected subs until you have it figured out and working in a normal gallery downloader page.
I put character queries in my artist sub, and now things are all mixed up
On the main subscription dialog, there are 'merge' and 'separate' buttons. These are powerful, but they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog.
"},{"location":"getting_started_tags.html","title":"Getting started with tags","text":"A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
"},{"location":"getting_started_tags.html#intro","title":"How do we find files?","text":"So, you have some files imported. Let's give them some tags so we can find them again later.
FAQ: what is a tag?
Your client starts with two local tag services, called 'my tags' and 'downloader tags' which keep all of their file->tag mappings in your client's database where only you can see them. 'my tags' is a good place to practise.
Select a file and press F3 to open the manage tags dialog:
The area below where you type is the 'autocomplete dropdown'. You will see this on normal search pages too. Type part of a tag, and matching results will appear below. Since you are starting out, your 'my tags' service won't have many tags in it yet, but things will populate fast! Select the tag you want with the arrow keys and hit enter. If you want to remove a tag, enter the exact same thing again or double-click it in the box above.
Prefixing a tag with a category and a colon will create a namespaced tag. This helps inform the software and other users about what the tag is. Examples of namespaced tags are:
character:batman
series:street fighter
person:jennifer lawrence
title:vitruvian man
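As a tiny illustration of the convention (just a sketch, not hydrus's actual parser), the namespace is simply everything before the first colon:

```python
def split_tag(tag):
    """Split a tag into (namespace, subtag); unnamespaced tags get an empty namespace."""
    namespace, sep, subtag = tag.partition(":")
    return (namespace, subtag) if sep else ("", tag)

print(split_tag("character:batman"))   # ('character', 'batman')
print(split_tag("flower"))             # ('', 'flower')
```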
The client is set up to draw common namespaces in different colours, just like boorus do. You can change these colours in the options.
Once you are happy with your tag changes, click 'apply', or hit F3 again, or simply press Enter on the text box while it is empty. The tags are now saved to your database.
Media Viewer Manage Tags
You can also open the manage tags dialog from the full media viewer, but note that this one does not have 'apply' and 'cancel' buttons, only 'close'. It makes its changes instantly, and you can keep using the rest of the program while it is open (it is a non-'modal' dialog).
Also, you need not close the media viewer's manage tags dialog while you browse. Just like you can hit Enter on the empty text box to close the dialog, hitting Page Up/Down navigates the parent viewer Back/Forward!
Also, hit Arrow Up/Down on an empty text input to switch between the tag service tabs!
Once you have some tags set, typing the first few characters of one in on a search page will show the counts of all the tags that start with that. Enter the one you want, and the search will run:
If you add more 'predicates' to a search, you will limit the results to those files that match every single one:
You can also exclude a tag by prefixing it with a hyphen (e.g.
-solo
).You can add as many tags as you want. In general, the more search predicates you add, the smaller and faster the results will be, but some types of tag (like excluded
"},{"location":"introduction.html","title":"introduction and statement of principles","text":""},{"location":"introduction.html#this_help","title":"this help","text":"-tags
), or the cleverer system
tags that you will soon learn about, can be suddenly CPU expensive. If a search takes more than a few seconds to run, a 'stop' button appears by the tag input. It cancels things out pretty quick in most cases. Click the links on the left to go through the getting started guide. Subheadings are on the right. Larger sections are up top. Please at least skim every page in the getting started section, as this will introduce you to the main systems in the client. There is a lot, so you do not have to do it all in one go.
The section on installing, updating, and backing up is very important.
This help is available locally in every release. Hit
"},{"location":"introduction.html#files","title":"on having too many files","text":"help->help and getting started guide
in the client, or open install_dir/help/index.html
. I've been on the internet and imageboards for a long time, saving everything I like to my hard drive. After a while, the whole collection was just too large to manage on my own. I couldn't find anything in the mess, and I just saved new files in there with names like 'image1257.jpg'.
There aren't many solutions to this problem that aren't online, and I didn't want to lose my privacy or control.
"},{"location":"introduction.html#anonymous","title":"on being anonymous","text":"I enjoy being anonymous online. When you aren't afraid of repercussions, you can be as truthful as you want and share interesting things, no matter how unusual. You can have unique conversations and tackle some otherwise unsolvable problems. It's fun!
I'm a normal Anon, nothing special. :^)
"},{"location":"introduction.html#hydrus_network","title":"the hydrus network","text":"So! I'm developing a program that helps people organise their files on their own terms and, if they want to, collaborate with others anonymously. I want to help you do what you want with your stuff, and that's it. You can share some tags (and files, but this is limited) with other people if you want to, but you don't have to connect to anything if you don't. The default is complete privacy, no sharing, and every upload requires a conscious action on your part. I don't plan to ever record metrics on users, nor serve ads, nor charge for my software. The software never phones home.
This does a lot more than a normal image viewer. If you are totally new to the idea of personal media collections and booru-style tagging, I suggest you start slow, walk through the getting started guides, and experiment doing different things. If you aren't sure on what a button does, try clicking it! You'll be importing thousands of files and applying tens of thousands of tags in no time. The best way to learn is just to try things out.
The client is chiefly a file database. It stores your files inside its own folders, managing them far better than an explorer window or some online gallery. Here's a screenshot of one of my test installs with a search showing all files:
As well as the client, there is also a server that anyone can run to store files or tags for sharing between many users. This is advanced, and almost always confusing to new users, so do not explore this until you know what you are doing. There is, however, a user-run public tag repository, with more than a billion tags, that you can access and contribute to if you wish.
I have many plans to expand the client and the network.
"},{"location":"introduction.html#principles","title":"statement of principles","text":"- Speech should be as free as possible.
- Everyone should be able to control their own media diet.
- Computer data and network logs should be absolutely private.
None of the above are currently true, but I would love to live in a world where they were. My software is an attempt to move us a little closer.
Where possible, I prefer decentralised systems that are focused on people. I still use gmail and youtube IRL just like pretty much everyone, but I would rather we have alternative systems for alternate work, especially in the future. No one seemed to be making what I wanted for file management, particularly as everything rushed to the cloud space, so I decided to make a local solution myself, and here we are.
If, after a few months, you find you enjoy the software and would like to further support it, I have set up a simple no-reward patreon, which you can read more about here.
"},{"location":"introduction.html#license","title":"license","text":"These programs are free software. Everything I, hydrus dev, have made is under the Do What The Fuck You Want To Public License, Version 3:
license.txtDO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n Version 3, May 2010\n\nCopyright (C) 2011 Hydrus Developer\n\nEveryone is permitted to copy and distribute verbatim or modified\ncopies of this license document, and changing it is allowed as long\nas the name is changed.\n\nThis license applies to any copyrightable work with which it is\npackaged and/or distributed, except works that are already covered by\nanother license. Any other license that applies to the same work\nshall take precedence over this one.\n\nTo the extent permitted by applicable law, the works covered by this\nlicense are provided \"as is\" and do not come with any warranty except\nwhere otherwise explicitly stated.\n\n\n DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION, AND MODIFICATION\n\n 0. You just DO WHAT THE FUCK YOU WANT TO.\n
Do what the fuck you want to with my software, and if shit breaks, DEAL WITH IT.
"},{"location":"ipfs.html","title":"IPFS","text":"IPFS is a p2p protocol that makes it easy to share many sorts of data. The hydrus client can communicate with an IPFS daemon to send and receive files.
You can read more about IPFS from their homepage, or this guide that explains its various rules in more detail.
For our purposes, we only need to know about these concepts:
- IPFS daemon -- A running instance of the IPFS executable that can talk to the larger network.
- IPFS multihash -- An IPFS-specific identifier for a file or group of files.
- pin -- To tell our IPFS daemon to host a file or group of files.
- unpin -- To tell our IPFS daemon to stop hosting a file or group of files.
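For illustration only, these concepts map to plain commands you can run against the daemon in a terminal (hydrus will do all of this for you through the daemon's API later on; swap [multihash] for a real one):
ipfs pin add [multihash]
ipfs pin ls
ipfs pin rm [multihash]
The first tells your daemon to fetch and host that content, the second lists everything you are currently pinning, and the third stops hosting it again.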
Note there is now a nicer desktop package here. I haven't used it, but it may be a nicer intro to the program.
Get the prebuilt executable here. Inside should be a very simple 'ipfs' executable that does everything. Extract it somewhere and open up a terminal in the same folder, and then type:
ipfs init
ipfs daemon
The IPFS exe should now be running in that terminal, ready to respond to requests:
You can kill it with Ctrl+C and restart it with the
ipfs daemon
call again (you only have to run ipfs init
once). When it is running, opening this page should download and display an example 'Hello World!' file from ~~~across the internet~~~.
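If you prefer the terminal, you can also check that the daemon's API port is listening--this is the same port hydrus will talk to in the next section. Assuming the default API port of 5001, something like this should answer with a little version blob:
curl -X POST http://127.0.0.1:5001/api/v0/version
If it refuses the connection, the daemon is not running or is bound to a different port.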
Your daemon listens for other instances of ipfs using port 4001, so if you know how to open that port in your firewall and router, make sure you do.
"},{"location":"ipfs.html#connecting","title":"connecting your client","text":"IPFS daemons are treated as services inside hydrus, so go to services->manage services->remote->ipfs daemons and add in your information. Hydrus uses the API port, default 5001, so you will probably want to use credentials of
127.0.0.1:5001
. You can click 'test credentials' to make sure everything is working. Thereafter, you will get the option to 'pin' and 'unpin' from a thumbnail's right-click menu, like so:
This works like hydrus's repository uploads--it won't happen immediately, but instead will be queued up at the pending menu. Commit all your pins when you are ready:
Notice how the IPFS icon appears on your pending and pinned files. You can search for these files using 'system:file service'.
Unpin works the same as pin, just like a hydrus repository petition.
Right-clicking any pinned file will give you a new 'share' action:
Which will put it straight in your clipboard. In this case, it is QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S.
If you want to share a pinned file with someone, you have to tell them this multihash. They can then:
- View it through their own ipfs daemon's gateway, at
http://127.0.0.1:8080/ipfs/[multihash]
- View it through a public web gateway, such as the one the IPFS people run, at
http://ipfs.io/ipfs/[multihash]
- Download it through their ipfs-connected hydrus client by going pages->new download popup->an ipfs multihash.
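Putting that together with the example multihash from the 'share' action above, those gateway addresses would look like this:
http://127.0.0.1:8080/ipfs/QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S
http://ipfs.io/ipfs/QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S
The first only works for someone running their own daemon; the second works in any browser, at whatever speed the public gateway feels like giving you.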
If you have many files to share, IPFS also supports directories, and now hydrus does as well. IPFS directories use the same sorts of multihash as files, and you can download them into the hydrus client using the same pages->new download popup->an ipfs multihash menu entry. The client will detect the multihash represents a directory and give you a simple selection dialog:
You may recognise those hash filenames--this example was created by hydrus, which can create ipfs directories from any selection of files from the same right-click menu:
Hydrus will pin all the files and then wrap them in a directory, showing its progress in a popup. Your current directory shares are summarised on the respective services->review services panel:
"},{"location":"ipfs.html#additional_links","title":"additional links","text":"If you find you use IPFS a lot, here are some add-ons for your web browser, as recommended by /tech/:
This script changes all bare ipfs hashes into clickable links to the ipfs gateway (on page loads):
- https://greasyfork.org/en/scripts/14837-ipfs-hash-linker
These redirect all gateway links to your local daemon when it's on; they work well with the previous script:
- https://github.com/lidel/ipfs-firefox-addon
- https://github.com/dylanPowers/ipfs-chrome-extension
You can launch the program with several different arguments to alter core behaviour. If you are not familiar with this, you are essentially putting additional text after the launch command that runs the program. You can run this straight from a terminal console (usually good to test with), or you can bundle it into an easy shortcut that you only have to double-click. An example of a launch command with arguments:
C:\\Hydrus Network\\hydrus_client.exe -d=\"E:\\hydrus db\" --no_db_temp_files\n
You can also add --help to your program path, like this:
hydrus_client.py --help
hydrus_server.exe --help
./hydrus_server --help
Which gives you a full listing of all the below arguments; however, this will not work with the built hydrus_client executables, which are bundled as non-console programs and will not give you text output to any console they are launched from. As hydrus_client.exe is the most commonly run version of the program, here is the list, with some more help about each command:
"},{"location":"launch_arguments.html#-d_db_dir_--db_dir_db_dir","title":"-d DB_DIR, --db_dir DB_DIR
","text":"Lets you customise where hydrus should use for its base database directory. This is install_dir/db by default, but many advanced deployments will move this around, as described here. When an argument takes a complicated value like a path that could itself include whitespace, you should wrap it in quote marks, like this:
"},{"location":"launch_arguments.html#--temp_dir_temp_dir","title":"-d=\"E:\\my hydrus\\hydrus db\"\n
--temp_dir TEMP_DIR
","text":"This tells all aspects of the client, including the SQLite database, to use a different path for temp operations. This would be by default your system temp path, such as:
C:\\Users\\You\\AppData\\Local\\Temp\n
But you can also check it in help->about. A handful of database operations (PTR tag processing, vacuums) require a lot of free space, so if your system drive is very full, or you have unusual ramdisk-based temp storage limits, you may want to relocate to another location or drive.
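As an illustration (the path here is made up--point it at whatever spare drive suits you), relocating temp work is just another launch argument:
hydrus_client.exe --temp_dir=\"D:\\fast scratch\\hydrus temp\"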
"},{"location":"launch_arguments.html#--db_journal_mode_waltruncatepersistmemory","title":"--db_journal_mode {WAL,TRUNCATE,PERSIST,MEMORY}
","text":"Change the journal mode of the SQLite database. The default is WAL, which works great for almost all SSD drives, but if you have a very old or slow drive, or if you encounter 'disk I/O error' errors on Windows with an NVMe drive, try TRUNCATE. Full docs are here.
Briefly:
- WAL - Clever write flushing that takes advantage of new drive synchronisation tools to maintain integrity and reduce total writes.
- TRUNCATE - Compatibility mode. Use this if your drive cannot launch WAL.
- PERSIST - This is newly added to hydrus. The ideal is that if you have a high latency HDD drive and want to sync with the PTR, this will work more efficiently than WAL journals, which will be regularly wiped and recreated and be fraggy. Unfortunately, with hydrus's multiple database file system, SQLite ultimately treats this as DELETE, which in our situation is basically the same as TRUNCATE, so it does not increase performance. Hopefully this will change in future.
- MEMORY - Danger mode. Extremely fast, but you had better guarantee a lot of free ram and no unclean exits.
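So, if WAL is giving you 'disk I/O error' trouble, the typical fix is just to add the argument to however you normally launch the client, something like:
hydrus_client.exe --db_journal_mode=TRUNCATE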
--db_transaction_commit_period DB_TRANSACTION_COMMIT_PERIOD
","text":"Change the regular duration at which any database changes are committed to disk. By default this is 30 (seconds) for the client, and 120 for the server. Minimum value is 10. Typically, if hydrus crashes, it may 'forget' what happened up to this duration on the next boot. Increasing the duration will result in fewer overall 'commit' writes during very heavy work that makes several changes to the same database pages (read up on WAL mode for more details here), but it will increase commit time and memory/storage needs. Note that changes can only be committed after a job is complete, so if a single job takes longer than this period, changes will not be saved until it is done.
"},{"location":"launch_arguments.html#--db_cache_size_db_cache_size","title":"--db_cache_size DB_CACHE_SIZE
","text":"Change the size of the cache SQLite will use for each db file, in MB. By default this is 256, for 256MB, which for the four main client db files could mean an absolute 1GB peak use if you run a very heavy client and perform a long period of PTR sync. This does not matter so much (nor should it be fully used) if you have a smaller client.
"},{"location":"launch_arguments.html#--db_synchronous_override_0123","title":"--db_synchronous_override {0,1,2,3}
","text":"Change the rules governing how SQLite writes committed changes to your disk. The hydrus default is 1 with WAL, 2 otherwise.
A user has written a full guide on this value here! SQLite docs here.
"},{"location":"launch_arguments.html#--no_db_temp_files","title":"--no_db_temp_files
","text":"When SQLite performs very large queries, it may spool temporary table results to disk. These go in your temp directory. If your temp dir is slow but you have a ton of memory, set this to never spool to disk, as here.
"},{"location":"launch_arguments.html#--boot_debug","title":"--boot_debug
","text":"Prints additional debug information to the log during the bootup phase of the application.
"},{"location":"launch_arguments.html#--profile_mode","title":"--profile_mode
","text":"This starts the program with 'Profile Mode' turned on, which captures the performance of boot functions. This is also a way to get Profile Mode on the server, although support there is very limited.
"},{"location":"launch_arguments.html#--win_qt_darkmode_test","title":"--win_qt_darkmode_test
","text":"Windows only, client only: This starts the program with Qt's 'darkmode' detection enabled, as here, set to 1 mode. It will override any existing qt.conf, so it is only for experimentation. We are going to experiment more with the 2 mode, but that locks the style to
"},{"location":"launch_arguments.html#server_arguments","title":"server arguments","text":"windows
, and can't handle switches between light and dark mode. The server supports the same arguments. It also takes an optional positional argument of 'start' (start the server, the default), 'stop' (stop any existing server), or 'restart' (do a stop, then a start), which should go before any of the above arguments.
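For example, with a made-up db path, a restart from the terminal might look like this--the positional command goes first, then any arguments:
./hydrus_server restart -d=\"/home/you/hydrus/server db\"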
"},{"location":"petitionPractices.html","title":"Petitions practices","text":"This document exists to give a rough idea what to do in regard to the PTR to avoid creating uncecessary work for the janitors.
"},{"location":"petitionPractices.html#general_practice","title":"General practice","text":"Kindly avoid creating unnecessary work. Create siblings for underscore and non-namespaced/namespaced versions. Petition for deletion if they are wrong. Providing a reason outside of the stock choices helps the petition getting accepted. If, for whatever reason, you have some mega job that needs doing it's often a good idea to talk to a janitor instead since we can just go ahead and do the job directly without having to deal with potentially tens of petitions because of how Hydrus splits them on the server. An example that we often come across is the removal of the awful Sankaku URLs that are almost everywhere these days due to people using a faulty parser. It's a pretty easy search and delete for a janitor, but a lot of annoying clicking if dealt with as a petition since one big petition can be split out to God-only-knows-how many.
Eventually the PTR janitors will get tools to replace various bad but correct tags on the server itself. These include underscored, wrong or no namespace, common misspelling, wrong locale, and so on. Since we're going to have to do the job eventually anyway, there's not much point in making us do it twice by petitioning the existing bad but correct tags. Just sibling them and leave them be for now.
"},{"location":"petitionPractices.html#ambiguity","title":"Ambiguity","text":"Don't make additions involving ambiguous tags.
"},{"location":"petitionPractices.html#petitions_involving_system_predicates","title":"Petitions involving system predicates","text":"hibiki
->character:hibiki (kantai collection)
is bad since there's far more than one character with that name. There are quite a few wrongly tagged images because of things like this. Petitioning the deletion of such a bad sibling is good. Anything that's covered by system predicates: siblinging these is unnecessary and parenting pointless. There's no harm in leaving them be aside from crowding the tag list, but there's no harm in deleting them either.
-
system:dimensions
covers most everything related to resolution and aspect ratios. medium:high resolution
,4:3 aspect ratio
, and pixel count. -
system:duration
for whether something has duration (is a video or animated gif/png/whatever), or is a still image. -
system:has audio
for whether a file has audio or not. system:has duration + system:no audio
replaces video with no sound
as an example. -
system:filesize
for things like huge filesize
. -
system:filetype
for filetypes. Gif, webm, mp4, psd, and so on. Anything that Hydrus can recognise, which is quite a bit.
Don't push parents for tags that are not top-level siblings. It makes tracking down potential issues hard.
Only push parents for relations that are literally always true, no exceptions.
character:james bond
->series:james bond
is a good example because James Bond always belongs to that series. ->gender:male
is bad because an artist might decide to draw a genderbent piece of art. Similarly ->person:pierce brosnan
is bad because there have been other actors for the character. List of some bad parents to
"},{"location":"petitionPractices.html#translations","title":"Translations","text":"character:
tags as an example: -species:
due to the various -zations (humanization, animalization, mechanization). -creator:
since just about anybody can draw art of the character. -gender:
Since genderswap
and variations exist. - Any form of physical characteristics such as hair or eye colour, hair length, clothing and accessories, etc. Translations should be siblinged to what the closest in-use romanised tag is if there's no proper translation. If the tag is ambiguous, such as
"},{"location":"privacy.html","title":"privacy","text":"\u97ff
or \u30d2\u30d3\u30ad
which mean hibiki
, just sibling them to the ambiguous tag. The tag can then later on be deleted and replaced by a less ambiguous tag. On the other hand, \u97ff(\u8266\u968a\u3053\u308c\u304f\u3057\u3087\u3093)
straight up means hibiki (kantai collection)
and can safely be siblinged to the proper character:
tag. Do the same for subjective tags. \u9b45\u60d1\u306e\u3075\u3068\u3082\u3082
can be translated to bewitching thighs
. \u307e\u3063\u305f\u304f\u3001\u99c6\u9010\u8266\u306f\u6700\u9ad8\u3060\u305c!!
straight up translates to Geez, destroyers are the best!!
, which does not contain much usable information for Hydrus currently. These can then either be siblinged down to an unsubjective tag (thighs
) if there's objective information in the tag, deleted if purely subjective, or deleted and replaced if ambiguous.tl;dr
Using a trustworthy VPN for all your remotely fun internet traffic is a good idea. It is cheap and easy these days, and it offers multiple levels of general protection.
I have tried very hard to ensure the hydrus network servers respect your privacy. They do not work like normal websites, and the amount of information your client will reveal to them is very limited. For most general purposes, normal users can rest assured that their activity on a repository like the Public Tag Repository (PTR) is effectively completely anonymous.
You need an account to connect, but all that really means serverside is a random number with a random passcode. Your client tells nothing more to the server than the exact content you upload to it (e.g. tag mappings, which are a tag+file_hash pair). The server cannot help but be aware of your IP address to accept your network request, but in all but one situation--uploading a file to a file repository when the administrator has set to save IPs for DMCA purposes--it forgets your IP as soon as the job is done.
So that janitors can process petitions efficiently and correct mistakes, servers remember which accounts upload which content, but they do not communicate this to any place, and the memory only lasts for a certain time--after which the content is completely anonymised. The main potential privacy worries are over a malicious janitor or--more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!--a malicious server owner or anyone else who gains raw access to the server's raw database files or its code as it operates. Even in the case where you cannot trust the server you are talking to, hydrus should be fairly robust, simply because the client does not say much to the server, nor that often. The only realistic worries, as I talk about in detail below, are if you actually upload personal files or tag personal files with real names. I can't do much about being Anon if you (accidentally or not) declare who you are.
So, in general, if you are on a good VPN and tagging anime babes from boorus, I think we are near perfect on privacy. That said, our community is rightly constantly thinking about this topic, so in the following I have tried to go into exhaustive detail. Some of the vulnerabilities are impractical and esoteric, but if nothing else it is fun to think about. If you can think of more problems, or decent mitigations, let me know!
"},{"location":"privacy.html#https_certificates","title":"https certificates","text":"Hydrus servers only communicate in https, so anyone who is able to casually observe your traffic (say your roommate cracked your router, or the guy running the coffee shop whose wifi you are using likes to snoop) should not ever be able to see what data you are sending or receiving. If you do not use a VPN, they will be able to see that you are talking to the repository (and the repository will technically see who you are, too, though as above, it normally isn't interested). Someone more powerful, like your ISP or Government, may be able to do more:
If you just start a new server yourself: When you first make a server, the 'certificate' it creates to enable https is a low quality one. It is called 'self-signed' because it is only endorsed by itself and it is not tied to a particular domain on the internet that everyone agrees on via DNS. Your traffic to this server is still encrypted, but an advanced attacker who stands between you and the server could potentially perform what is called a man-in-the-middle attack and see your traffic.
This problem is fairly mitigated by using a VPN, since even if someone were able to MitM your connection, they know no more than your VPN's location, not your IP.
A future version of the network will further mitigate this problem by having you enter unverified certificates into a certificate manager and then compare to that store on future requests, to try to detect if a MitM attack is occurring.
If the server is on a domain and now uses a proper verified certificate: If the admin hosts the server on a website domain (rather than a raw IP address) and gets a proper certificate for that domain from a service like Let's Encrypt, they can swap that into the server and then your traffic should be protected from any eavesdropper. It is still good to use a VPN to further obscure who you are, including from the server admin. You can check how good a server's certificate is by loading its base address in the form
"},{"location":"privacy.html#accounts","title":"accounts","text":"https://host:port
into your browser. If it has a nice certificate--like the PTR--the welcome page will load instantly. If it is still on self-signed, you'll get one of those 'can't show this page unless you make an exception' browser error pages before it will show. An account has two hex strings, like this:
-
Access key:
4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f
This is in your services->manage services panel, and acts like a password. Keep this absolutely secret--only you know it, and no one else ever needs to. If the server has not had its code changed, it does not actually know this string, but it stores special data that lets it verify it when you 'log in'.
-
Account ID:
207d592682a7962564d52d2480f05e72a272443017553cedbd8af0fecc7b6e0a
This can be copied from a button in your services->review services panel, and acts a bit like a semi-private username. Only janitors should ever have access to this. If you ever want to contact the server admin about an account upgrade or similar, they will need to know this so they can load up your account and alter it.
When you generate a new account, the client first asks the server for a list of available auto-creatable account types, then asks for a registration token for one of them, then uses the token to generate an access key. The server is never told anything about you, and it forgets your IP address as soon as it finishes talking to you.
Your account also stores a bandwidth use record and some miscellaneous data such as when the account was created, if and when it expires, what permissions and bandwidth rules it has, an aggregate score of how often it has petitions approved rather than denied, and whether it is currently banned. I do not think someone inspecting the bandwidth record could figure out what you were doing based on byte counts (especially as with every new month the old month's bandwidth records are compressed to just one number) beyond the rough time you synced and whether you have done much uploading. Since only a janitor can see your account and could feasibly attempt to inspect bandwidth data, they would already know this information.
"},{"location":"privacy.html#downloading","title":"downloading","text":"When you sync with a repository, your client will download and then keep up to date with all the metadata the server knows. This metadata is downloaded the same way by all users, and it comes in a completely anonymous format. The server does not know what you are interested in, and no one who downloads knows who uploaded what. Since the client regularly updates, a detailed analysis of the raw update files will reveal roughly when a tag or other row was added or deleted, although that timestamp is no more precise than the duration of the update period (by default, 100,000 seconds, or a little over a day).
Your client will never ask the server for information about a particular file or tag. You download everything in generic chunks, form a local index of that information, and then all queries are performed on your own hard drive with your own CPU.
By just downloading, even if the server owner were to identify you by your IP address, all they know is that you sync. They cannot tell anything about your files.
In the case of a file repository, your client downloads all the thumbnails automatically, but then you download actual files separately as you like. The server does not log which files you download.
"},{"location":"privacy.html#uploading","title":"uploading","text":"When you upload, your account is temporarily linked to the rows of content you add. This is so janitors can group petitions by who makes them, undo large mistakes easily, and even leave you a brief message (like \"please stop adding those clothing siblings\") for your client to pick up the next time it syncs your account. After the temporary period is over, all submissions are anonymised. So, what are the privacy concerns with that? Isn't the account 'Anon'?
Privacy can be tricky. Hydrus tech is obviously far, far better than anything normal consumers use, but here I believe are the remaining barriers to pure Anonymity, assuming someone with resources was willing to put a lot of work in to attack you:
Note
I am using the PTR as the example since that is what most people are using. If you are uploading to a server run between friends, privacy is obviously more difficult to preserve--if there are only three users, it may not be too hard to figure out who is uploading the NarutoXSonichu diaperfur content! If you are talking to a server with a small group of users, don't upload anything crazy or personally identifying unless that's the point of the server.
"},{"location":"privacy.html#ip_address_across_network","title":"IP Address Across Network","text":"Attacker: ISP/Government.
Exposure: That you use the PTR.
Problem: Your IP address may be recorded by servers in between you and the PTR (e.g. your ISP/Government). Anyone who could convert that IP address and timestamp into your identity would know you were a PTR user.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#ip_address_at_ptr","title":"IP Address At PTR","text":"Attacker: PTR administrator or someone else who has access to the server as it runs.
Exposure: Which PTR account you are.
Problem: I may be lying to you about the server forgetting IPs, or the admin running the PTR may have secretly altered its code. If the malicious admin were able to convert IP address and timestamp into your identity, they would obviously be able to link that to your account and thus its various submissions.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#time_identifiable_uploads","title":"Time Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: If a tag was added way before the file was public, then it is likely the original owner tagged it. An example would be if you were an artist and you tagged your own work on the PTR two weeks before publishing the work. Anyone who looked through the server updates carefully and compared to file publish dates, particularly if they were targeting you already, could notice the date discrepancy and know you were a PTR user.
Mitigation: Don't tag any file you plan to share if you are currently the only person who has any copies. Upload it, then tag it.
"},{"location":"privacy.html#content_identifiable_uploads","title":"Content Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: All uploads are shared anonymously with other users, but if the content itself is identifying, you may be exposed. An example would be if there was some popular lewd file floating around of you and your girlfriend, but no one knew who was in it. If you decided to tag it with accurate 'person:' tags, anyone synced with the PTR, when they next looked at that file, would see those person tags. The same would apply if the file was originally private but then leaked.
Mitigation: Just like an imageboard, do not upload any personally identifying information.
"},{"location":"privacy.html#individual_account_cross-referencing","title":"Individual Account Cross-referencing","text":"Attacker: PTR administrator or someone else with access to the server database files after one of your uploads has been connected to your real identity, perhaps with a Time/Content Identifiable Upload as above.
Exposure: What you have been uploading recently.
Problem: If you accidentally tie your identity to an individual content row (could be as simple as telling an admin 'yes, I, person whose name you know, uploaded that sibling last week'), then anyone who can see which accounts uploaded what will obviously be able to see your other uploads.
Mitigation: Best practice is not to reveal specifically what you upload. Note that this vulnerability (an admin looking up what else you uploaded after they discover something else you did) is now well mitigated by the account history anonymisation as below (assuming the admin has not altered the code to disable it!). If the server is set to anonymise content after 90 days, then your account can only be identified from specific content rows that were uploaded in the past 90 days, and cross-references would also only see the last 90 days of activity.
"},{"location":"privacy.html#big_brain_individual_account_mapping_fingerprint_cross-referencing","title":"Big Brain Individual Account Mapping Fingerprint Cross-referencing","text":"Attacker: Someone who has access to tag/file favourite lists on another site and gets access to a hydrus repository that has been compromised to not anonymise history for a long duration.
Exposure: Which PTR account another website's account uses.
Problem: Someone who had raw access to the PTR database's historical account record (i.e. they had disabled the anonymisation routine below) and also had compiled some booru users' 'favourite tag/artist' lists and was very clever could try to cross reference those two lists and connect a particular PTR account to a particular booru account based on similar tag distributions. There would be many holes in the PTR record, since only the first account to upload a tag mapping is linked to it, but maybe it would be possible to get high confidence on a match if you have really distinct tastes. Favourites lists are probably decent digital fingerprints, and there may be a shadow of that in your PTR uploads, although I also think there are enough users uploading and 'competing' for saved records on different tags that each users' shadow would be too indistinct to really pull this off.
Mitigation: I am mostly memeing here. But privacy is tricky, and who knows what the scrapers of the future are going to do with all the cloud data they are sucking up. Even then, the historical anonymisation routine below now generally eliminates this threat, assuming the server has not been compromised to disable it, so it matters far less if its database files fall into bad hands in the future, but accounts on regular websites are already being aggregated by the big marketing engines, and this will only happen in more clever ways in future. I wouldn't be surprised if booru accounts are soon being connected to other online identities based on fingerprint profiles of likes and similar. Don't save your spicy favourites on a website, even if that list is private, since if that site gets hacked or just bought out one day, someone really smart could start connecting dots ten years from now.
"},{"location":"privacy.html#account_history","title":"account history anonymisation","text":"As the PTR moved to multiple accounts, we talked more about the potential account cross-referencing worries. The threats are marginal today, but it may be a real problem in future. If the server database files were to ever fall into bad hands, having a years-old record of who uploaded what is not excellent. Like the AOL search leak, that data may have unpleasant rammifications, especially to an intelligent scraper in the future. This historical record is also not needed for most janitorial work.
Therefore, hydrus repositories now completely anonymise all uploads after a certain delay. It works by assigning ownership of every file, mapping, or tag sibling/parent to a special 'null' account, so all trace that your account uploaded any of it is deleted. It happens by default 90 days after the content is uploaded, but it can be more or less depending on the local admin and janitors. You can see the current 'anonymisation' period under review services.
If you are a janitor with the ability to modify accounts based on uploaded content, you will see anything old will bring up the null account. It is specially labelled, so you can't miss it. You cannot ban or otherwise alter this account. No one can actually use it.
"},{"location":"reducing_lag.html","title":"reducing lag","text":""},{"location":"reducing_lag.html#intro","title":"hydrus is cpu and hdd hungry","text":"The hydrus client manages a lot of complicated data and gives you a lot of power over it. To add millions of files and tags to its database, and then to perform difficult searches over that information, it needs to use a lot of CPU time and hard drive time--sometimes in small laggy blips, and occasionally in big 100% CPU chunks. I don't put training wheels or limiters on the software either, so if you search for 300,000 files, the client will try to fetch that many.
Furthermore, I am just one unprofessional guy dealing with a lot of legacy code from when I was even worse at programming. I am always working to reduce lag and other inconveniences, and improve UI feedback when many things are going on, but there is still a lot for me to do.
In general, the client works best on snappy computers with low-latency hard drives where it does not have to constantly compete with other CPU- or HDD- heavy programs. Running hydrus on your games computer is no problem at all, but if you leave the client on all the time, then make sure under the options it is set not to do idle work while your CPU is busy, so your games can run freely. Similarly, if you run two clients on the same computer, you should have them set to work at different times, because if they both try to process 500,000 tags at once on the same hard drive, they will each slow to a crawl.
If you run on an HDD, keeping it defragged is very important, and good practice for all your programs anyway. Make sure you know what this is and that you do it.
"},{"location":"reducing_lag.html#maintenance_and_processing","title":"maintenance and processing","text":"I have attempted to offload most of the background maintenance of the client (which typically means repository processing and internal database defragging) to time when you are not using the client. This can either be 'idle time' or 'shutdown time'. The calculations for what these exactly mean are customisable in file->options->maintenance and processing.
If you run a quick computer, you likely don't have to change any of these options. Repositories will synchronise and the database will stay fairly optimal without you even noticing the work that is going on. This is especially true if you leave your client on all the time.
If you have an old, slower computer though, or if your hard drive is high latency, make sure these options are set for whatever is best for your situation. Turning off idle time completely is often helpful as some older computers are slow to even recognise--mid task--that you want to use the client again, or take too long to abandon a big task half way through. If you set your client to only do work on shutdown, then you can control exactly when that happens.
"},{"location":"reducing_lag.html#reducing_lag","title":"reducing search and general gui lag","text":"Searching for tags via the autocomplete dropdown and searching for files in general can sometimes take a very long time. It depends on many things. In general, the more predicates (tags and system:something) you have active for a search, and the more specific they are, the faster it will be.
You can also look at file->options->speed and memory. Increasing the autocomplete thresholds under tags->manage tag display and search is also often helpful. You can even force autocompletes to only fetch results when you manually ask for them.
Having lots of thumbnails open or downloads running can slow many things down. Check the 'pages' menu to see your current session weight. If it is about 50,000, or you have individual pages with more than 10,000 files or download URLs, try cutting down a bit.
"},{"location":"reducing_lag.html#profiles","title":"finally - profiles","text":"Programming is all about re-editing your first, second, third drafts of an idea. You are always going back to old code and adding new features or making it work better. If something is running slow for you, I can almost always speed it up or at least improve the way it schedules that chunk of work.
However, figuring out exactly why something is running slow or holding up the UI is tricky and often gives an unexpected result. I can guess what might be running inefficiently from reports, but what I really need in order to be sure is a profile, which drills down into every function of a job, counting how many times each is called and timing how long they take. A profile for a single call looks like this.
So, please let me know:
- The general steps to reproduce the problem (e.g. \"Running system:numtags>4 is ridiculously slow on its own on 'all known tags'.\")
- Your client's approximate overall size (e.g. \"500k files, and it syncs to the PTR.\")
- The type of hard drive you are running hydrus from. (e.g. \"A 2TB 7200rpm drive that is 20% full. I regularly defrag it.\")
- Any profiles you have collected.
You can generate a profile by hitting help->debug->profiling->profile mode, which tells the client to generate profile information for almost all of its behind the scenes jobs. This can be spammy, so don't leave it on for a very long time (you can turn it off by hitting the help menu entry again).
Turn on profile mode, do the thing that runs slow for you (importing a file, fetching some tags, whatever), and then check your database folder (most likely install_dir/db) for a new 'client profile - DATE.log' file. This file will be filled with several sets of tables with timing information. Please send that whole file to me, or if it is too large, cut what seems important. It should not contain any personal information, but feel free to look through it.
There are several ways to contact me.
"},{"location":"running_from_source.html","title":"running from source","text":"I write the client and server entirely in python, which can run straight from source. It is getting simpler and simpler to run python programs like this, so don't be afraid of it. If none of the built packages work for you (for instance if you use Windows 8.1 or 18.04 Ubuntu (or equivalent)), it may be the only way you can get the program to run. Also, if you have a general interest in exploring the code or wish to otherwise modify the program, you will obviously need to do this.
"},{"location":"running_from_source.html#simple_setup_guide","title":"Simple Setup Guide","text":"There are now setup scripts that make this easy on Windows and Linux. You do not need any python experience.
"},{"location":"running_from_source.html#summary","title":"Summary:","text":"- Get Python.
- Get Hydrus source.
- Get mpv/SQLite/FFMPEG.
- Run setup_venv script.
- Run setup_help script.
- Run client script.
First of all, you will need git. If you are just a normal Windows user, you will not have it. Get it:
Git for Windows: Git is an excellent tool for synchronising code across platforms. Instead of downloading and extracting the whole .zip every time you want to update, it allows you to just run one line and all the code updates are applied in about three seconds. You can also run special versions of the program, or test out changes I committed two minutes ago without having to wait for me to make a whole build. You don't have to, but I recommend you get it.
Installing it is simple, but it can be intimidating. These are a bunch of very clever tools coming over from Linux-land, and the installer has a 10+ page wizard with several technical questions. Luckily, the 'default' is broadly fine, but I'll write everything out so you can follow along. I can't promise this list will stay perfectly up to date, so let me know if there is something complex and new you don't understand. This is also a record that I can refer to when I set up a new machine.
- First off, get it here. Run the installer.
- On the first page, with checkboxes, I recommend you uncheck 'Windows Explorer Integration', with its 'Open Git xxx here' sub-checkboxes. This stuff will just be annoying for our purposes.
- Then set your text editor. Select the one you use, and if you don't recognise anything, set 'notepad'.
- Now we enter the meat of the wizard pages. Everything except the default console window is best left as default:
Let Git decide
on using \"master\" as the default main branch nameGit from the command line and also from 3rd-party software
Use bundled OpenSSH
Use the OpenSSL library
Checkout Windows-style, commit Unix-style line endings
- (NOT DEFAULT)
Use Windows' default console window
. Let's keep things simple, but it isn't a big deal. Fast-forward or merge
Git Credential Manager
- Do
Enable file system caching
/Do notEnable symbolic links
- Do not enable experimental stuff
Git should now be installed on your system. Any new terminal/command line/powershell window (shift+right-click on any folder and hit something like 'Open in terminal') now has the
Windows 7git
command! For a long time, I supported Windows 7 via running from source. Unfortunately, as libraries and code inevitably updated, this finally seems to no longer be feasible. Python 3.8 will no longer run the program. I understand v582 is one of the last versions of the program to work.
First, you will have to install the older Python 3.8, since that is the latest version that you can run.
Then, later, when you do the
git clone https://github.com/hydrusnetwork/hydrus
line, you will need to run git checkout tags/v578
, which will rewind you to that point in time. You will also need to navigate to
install_dir/static/requirements/advanced
and edit requirements_core.txt
; remove the 'psd-tools' line before you run setup_venv. I can't promise anything though. The requirements.txt isn't perfect, and something else may break in future! You may like to think about setting up a Linux instance.
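Put together, the Windows 7 source grab looks roughly like this in a terminal (then do the requirements_core.txt edit as above):
git clone https://github.com/hydrusnetwork/hydrus
cd hydrus
git checkout tags/v578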
Then you will need to install Python. Get 3.10 or 3.11 here. During the install process, make sure it has something like 'Add Python to PATH' checked. This makes Python available everywhere in Windows.
You should already have a fairly new python. Ideally, you want at least 3.9. You can find out what version you have just by opening a new terminal and typing 'python'.
If you are already on newer python, like 3.12+, that's ok--you might need to select the 'advanced' setup later on and choose the '(t)est' options. If you are stuck on 3.9, try the same thing, but with the '(o)lder' options (but I can't promise it will work!).
Then, get the hydrus source. It is best to get it with Git: make a new folder somewhere, open a terminal in it, and then paste:
git clone https://github.com/hydrusnetwork/hydrus\n
The whole repository will be copied to that location--this is now your install dir. You can move it if you like.
If Git is not available, then just go to the latest release and download and extract the source code .zip somewhere.
Read-only install locations
Make sure the install directory has convenient write permissions (e.g. on Windows, don't put it in \"Program Files\"). Extracting straight to a spare drive, something like \"D:\\Hydrus Network\", is ideal.
We will call the base extract directory, the one with 'hydrus_client.py' in it,
install_dir
.Mixed Builds
Don't mix and match build extracts and source extracts. The process that runs the code gets confused if there are unexpected extra .dlls in the directory. If you need to convert between built and source releases, perform a clean install.
If you are converting from one install type to another, make a backup before you start. Then, if it all goes wrong, you'll always have a safe backup to rollback to.
"},{"location":"running_from_source.html#built_programs","title":"Built Programs","text":"There are three special external libraries. You just have to get them and put them in the correct place:
WindowsLinuxmacOS-
mpv
- If you are on Windows 8.1 or older, this is known safe.
- If you are on Windows 10 or newer and want the very safe answer, try this.
- Otherwise, go for this.
- I have been testing this newer version and this very new version and things seem to be fine too, at least on updated Windows.
Then open that archive and place the 'mpv-1.dll'/'mpv-2.dll'/'libmpv-2.dll' into
install_dir
. mpv on older Windows: I have word that that newer mpv, the API version 2.1 that you have to rename to mpv-2.dll, will work on Qt5 and Windows 7. If this applies to you, have a play around with different versions here. You'll need the newer mpv choice in the setup-venv script however, which, depending on your situation, may not be possible.
-
SQLite3
This is optional and might feel scary, so feel free to ignore. It updates your python install to newer, faster database tech.
Open your python install location and find the DLLs folder. Likely something like
C:\\Program Files\\Python311\\DLLs
or C:\Python311\DLLs
. There should be a sqlite3.dll there. Rename it to sqlite3.dll.old, and then open install_dir/static/build_files/windows
and copy that 'sqlite3.dll' into the python DLLs
folder. The absolute newest sqlite3.dll is always available here. You want the x64 dll.
-
FFMPEG
Get a Windows build of FFMPEG here.
Extract the ffmpeg.exe into
install_dir/bin
.
-
mpv
Try running
apt-get install libmpv1
in a new terminal. You can type apt show libmpv1
to see your current version. Or, if you use a different package manager, try searching libmpv
or libmpv1
on that. - If you have earlier than 0.34.1, you will be looking at running the 'advanced' setup in the next section and selecting the 'old' mpv.
- If you have 0.34.1 or later, you can run the normal setup script.
-
SQLite3
No action needed.
-
FFMPEG
You should already have ffmpeg. Just type
ffmpeg
into a new terminal, and it should give a basic version response. If you somehow don't have ffmpeg, check your package manager.
-
mpv
Unfortunately, mpv is not well supported in macOS yet. You may be able to install it in brew, but it seems to freeze the client as soon as it is loaded. Hydev is thinking about fixes here.
-
SQLite3
No action needed.
-
FFMPEG
You should already have ffmpeg.
Double-click
setup_venv.bat
.The file is
setup_venv.sh
. You may be able to double-click it. If not, open a terminal in the folder and type:./setup_venv.sh
If you do not have permission to execute the file, do this before trying again:
chmod +x setup_venv.sh
You will likely have to do the same on the other .sh files.
If you get an error about the venv failing to activate during
setup_venv.sh
, you may need to install venv especially for your system. The specific error message should help you out, but you'll be looking at something along the lines ofapt install python3.10-venv
.If you like, you can run the
setup_desktop.sh
file to install an io.github.hydrusnetwork.hydrus.desktop file to your applications folder. (Or check the template ininstall_dir/static/io.github.hydrusnetwork.hydrus.desktop
and do it yourself!)Double-click
setup_venv.command
. If you do not have permission to run the .command file, then open a terminal on the folder and enter:
chmod +x setup_venv.command
You will likely have to do the same on the other .command files.
You may need to experiment with the advanced choices, especially if your macOS is a little old.
The setup will ask you some questions. Just type the letters it asks for and hit enter. Most users are looking at the (s)imple setup, but if your situation is unusual, try the (a)dvanced, which will walk you through the main decisions. Once ready, it should take a minute to download its packages and a couple minutes to install them. Do not close it until it is finished installing everything and says 'Done!'. If it seems like it hung, just give it time to finish.
If something messes up, or you want to make a different decision, just run the setup script again and it will reinstall everything. Everything these scripts do ends up in the 'venv' directory, so you can also just delete that folder to 'uninstall' the venv. It should just work on most normal computers, but let me know if you have any trouble.
Then run the 'setup_help' script to build the help. This isn't necessary, but it is nice to have it built locally. You can run this again at any time to rebuild the current help.
"},{"location":"running_from_source.html#running_it_1","title":"Running it","text":"WindowsLinuxmacOSRun 'hydrus_client.bat' to start the client.
Qt compatibility
If you run into trouble running newer versions of Qt6, some users have fixed it by installing the packages
libicu-dev
and libxcb-cursor-dev
. With apt
that will be: sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
If you still have trouble with the default Qt6 version, try running setup_venv again and choose a different version. There are several to choose from, including (w)riting a custom version. Check the advanced requirements.txts files in
install_dir/static/requirements/advanced
for more info, and you can also work off this list: PySide6. Run 'hydrus_client.sh' to start the client. Don't forget to set
chmod +x hydrus_client.sh
if you need it. Run 'hydrus_client.command' to start the client. Don't forget to set
chmod +x hydrus_client.command
if you need it. The first start will take a little longer (it has to compile all the code into something your computer understands). Once up, it will operate just like a normal build with the same folder structure and so on.
Missing a Library
If the client fails to boot, it should place a 'hydrus_crash.log' in your 'db' directory or your desktop, or, if it got far enough, it may write the error straight to the 'client - date.log' file in your db directory.
If that error talks about a missing library, try reinstalling your venv. Are you sure it finished correctly? Do you need to run the advanced setup and select a different version of Qt?
WindowsLinuxmacOSIf you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.bat' to 'hydrus_client-user.bat' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.bat'. New
git pull
commands will not affect 'hydrus_client-user.bat'. You probably can't pin your .bat file to your Taskbar or Start (and if you try and pin the running program to your taskbar, its icon may revert to Python), but you can make a shortcut to the .bat file, pin that to Start, and in its properties set a custom icon. There's a nice hydrus one in
install_dir/static
.However, some versions of Windows won't let you pin a shortcut to a bat to the start menu. In this case, make a shortcut like this:
C:\Windows\System32\cmd.exe /c "C:\hydrus\Hydrus Source\hydrus_client-user.bat"
This is a shortcut to tell the terminal to run the bat; it should be pinnable to start. You can give it a nice name and the hydrus icon and you should be good!
If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.sh' to 'hydrus_client-user.sh' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.sh'. New git pull commands will not affect 'hydrus_client-user.sh'.
If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.command' to 'hydrus_client-user.command' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.command'. New git pull commands will not affect 'hydrus_client-user.command'.
"},{"location":"running_from_source.html#simple_updating_guide","title":"Simple Updating Guide","text":"To update, you do the same thing as for the extract builds.
- If you installed by extracting the source zip, then download the latest release source zip and extract it over the top of the folder you have, overwriting the existing source files.
- If you installed with git, then just run git pull as normal. I have added easy 'git_pull' scripts to the install directory for your convenience (on Windows, just double-click 'git_pull.bat').
If you get a library version error when you try to boot, run the venv setup again. It is worth doing this anyway, every now and then, just to stay up to date.
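Put together, a git-based update is usually just this; the Linux/macOS setup script name in the second step is an assumption (use whichever setup_venv script your platform ships with), and you only need it if you hit the library version error described above:
git pull
# only if the next boot complains about library versions: re-run your platform's setup_venv script
# (script name assumed here; on macOS it is setup_venv.command)
./setup_venv.sh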
"},{"location":"running_from_source.html#migrating_from_an_existing_install","title":"Migrating from an Existing Install","text":"Many users start out using one of the official built releases and decide to move to source. There is lots of information here about how to migrate the database, but for your purposes, the simple method is this:
If you never moved your database to another place and do not use the -d/--db_dir launch parameter
- Follow the above guide to get the source install working in a new folder on a fresh database
- MAKE A BACKUP OF EVERYTHING
- Delete everything from the source install's db directory.
- Move your built release's entire db directory to the source (a rough shell sketch of these two steps follows this list).
- Run your source release again--it should load your old db no problem!
- Update your backup routine to point to the new source install location.
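As a rough shell sketch of the delete-and-move steps above--the paths are placeholders only, not real install locations, and this is only safe after you have made your backup:
# only run something like this AFTER you have made your backup!
rm -rf /path/to/source_install/db/*
mv /path/to/built_release/db/* /path/to/source_install/db/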
If you moved your database to another location and use the -d/--db_dir launch parameter
- Follow the above guide to get the source install working in a new folder on a fresh database (without --db_dir)
- MAKE A BACKUP OF EVERYTHING
- Just to be neat, delete the .db files, .log files, and client_files folder from the source install's db directory.
- Run the source install with --db_dir just as you would the built executable--it should load your old db no problem! (see the example just below)
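For that last step, the launch is the same as for the built executable, just through python in your activated venv; a minimal sketch with a placeholder path:
source venv/bin/activate
python hydrus_client.py --db_dir="/path/to/your/existing/db"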
This is for advanced users only.
If you have never used python before, do not try this. If the easy setup scripts failed for you and you don't know what happened, please contact hydev before trying this, as the thing that went wrong there will probably go much more wrong here.
You can also set up the environment yourself. Inside the extract should be hydrus_client.py and hydrus_server.py. You will be treating these basically the same as the 'client' and 'server' executables--with the right environment, you should be able to launch them the same way and they take the same launch parameters as the exes.
Hydrus needs a whole bunch of libraries, so let's now set your python up. I strongly recommend you create a virtual environment. It is easy and doesn't mess up your system python.
You have to do this in the correct order! Do not switch things up. If you make a mistake, delete your venv folder and start over from the beginning.
To create a new venv environment:
- Open a terminal at your hydrus extract folder. If python3 doesn't work, use python.
- python3 -m pip install virtualenv (if you need it)
- python3 -m venv venv
- source venv/bin/activate (CALL venv\Scripts\activate.bat in Windows cmd)
- python -m pip install --upgrade pip
- python -m pip install --upgrade wheel
venvs
That source venv/bin/activate line turns on your venv. You should see your terminal prompt note you are now in it. A venv is an isolated environment of python that you can install modules to without worrying about breaking something system-wide. Ideally, you do not want to install python modules to your system python.
This activate line will be needed every time you alter your venv or run the hydrus_client.py/hydrus_server.py files. You can easily tuck this into a launch script--check the easy setup files for examples.
On Windows Powershell, the command is .\venv\Scripts\activate, but you may find the whole deal is done much easier in cmd than Powershell. When in Powershell, just type cmd to get an old fashioned command line. In cmd, the launch command is just venv\scripts\activate.bat, no leading period.
After you have activated the venv, you can use pip to install everything you need to it from the requirements.txt in the install_dir:
python -m pip install -r requirements.txt
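Put together, a typical first-time Linux/macOS run of the above might look like this--it is just the same commands in sequence, nothing new:
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip wheel
python -m pip install -r requirements.txt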
If you need different versions of libraries, check the cut-up requirements.txts the 'advanced' easy-setup uses in install_dir/static/requirements/advanced. Check and compare their contents to the main requirements.txt to see what is going on. You'll likely need the newer OpenCV on Python 3.10, for instance.
"},{"location":"running_from_source.html#qt","title":"Qt","text":"Qt is the UI library. You can run PySide2, PySide6, PyQt5, or PyQt6. A wrapper library called qtpy allows this. The default is PySide6, but if it is missing, qtpy will fall back to an available alternative. For PyQt5 or PyQt6, you need an extra Chart module, so go:
python -m pip install qtpy PyQtChart PyQt5
-or-
python -m pip install qtpy PyQt6-Charts PyQt6
If you have multiple Qts installed, then select which one you want to use by setting the QT_API environment variable to 'pyside2', 'pyside6', 'pyqt5', or 'pyqt6'. Check help->about to make sure it loaded the right one.
If you want to set QT_API in a batch file, do this:
set QT_API=pyqt6
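On Linux/macOS the equivalent is a shell export before launch; qtpy reads the same QT_API variable, so a minimal sketch is:
export QT_API=pyside6
python hydrus_client.py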
If you run <= Windows 8.1 or Ubuntu 18.04, you cannot run Qt6. Try PySide2 or PyQt5.
Qt compatibility
If you run into trouble running newer versions of Qt6 on Linux, often with an XCB-related error such as:
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
try installing the packages libicu-dev and libxcb-cursor-dev. With apt that will be:
sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
If you still have trouble with the default Qt6 version, check the advanced requirements.txts in install_dir/static/requirements/advanced. There should be several older version examples you can explore, and you can also work off these lists: PySide6 PyQt6 PySide2 PyQt5
"},{"location":"running_from_source.html#mpv","title":"mpv","text":"MPV is optional and complicated, but it is great, so it is worth the time to figure out!
As well as the python wrapper, 'python-mpv' (which is in the requirements.txt), you also need the underlying dev library. This is not mpv the program, but 'libmpv', often called 'libmpv1'.
For Windows, the dll builds are here, although getting a stable version can be difficult. Just put it in your hydrus base install directory. Check the links in the easy-setup guide above for good versions. You can also just grab the 'mpv-1.dll'/'mpv-2.dll' I bundle in my extractable Windows release.
If you are on Linux, you can usually get 'libmpv1' like so:
apt-get install libmpv1
On macOS, you should be able to get it with brew install mpv, but you are likely to find mpv crashes the program when it tries to load. Hydev is working on this, but it will probably need a completely different render API.
Hit help->about to see your mpv status. If you don't have it, it will present an error popup box with more info.
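If you are not sure whether the wrapper can see libmpv at all, a quick hedged check from your activated venv is to import it directly--python-mpv loads the underlying library at import time, so a missing or incompatible libmpv usually fails right here:
source venv/bin/activate
# an OSError here usually means libmpv/'mpv-1.dll'/'mpv-2.dll' is missing or the wrong version
python -c "import mpv; print('python-mpv loaded ok')"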
"},{"location":"running_from_source.html#sqlite","title":"SQLite","text":"If you can, update python's SQLite--it'll improve performance. The SQLite that comes with stock python is usually quite old, so you'll get a significant boost in speed. In some python deployments, the built-in SQLite not compiled with neat features like Fast Text Search (FTS) that hydrus needs.
On Windows, get the 64-bit sqlite3.dll here, and just drop it in your base install directory. You can also just grab the 'sqlite3.dll' I bundle in my extractable Windows release.
You may be able to update your SQLite on Linux or macOS with:
- apt-get install libsqlite3-dev
- (activate your venv)
- python -m pip install pysqlite3
But as long as the program launches, it usually isn't a big deal.
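To check which SQLite your venv's python is actually using, you can print the library version--a minimal sketch:
source venv/bin/activate
# prints the version of the SQLite library python is linked against
python -c "import sqlite3; print(sqlite3.sqlite_version)"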
Extremely safe no way it can go wrong
If you want to update SQLite for your Windows system python install, you can also drop it into C:\Program Files\Python310\DLLs or wherever you have python installed, and it'll update for all your python projects. You'll be overwriting the old file, so make a backup of the old one (I have never had trouble updating like this, however).
A user who made a Windows venv with Anaconda reported they had to replace the sqlite3.dll in their conda env at ~/.conda/envs/<envname>/Library/bin/sqlite3.dll.
"},{"location":"running_from_source.html#ffmpeg","title":"FFMPEG","text":"If you don't have FFMPEG in your PATH and you want to import anything more fun than jpegs, you will need to put a static FFMPEG executable in your PATH or the install_dir/bin directory. This should always point to a new build for Windows. Alternately, you can just copy the exe from one of my extractable Windows releases.
"},{"location":"running_from_source.html#running_it","title":"Running It","text":"Once you have everything set up, hydrus_client.py and hydrus_server.py should look for and run off client.db and server.db just like the executables. You can use the 'hydrus_client.bat/sh/command' scripts in the install dir or use them as inspiration for your own. In any case, you are looking at entering something like this into the terminal:
source venv/bin/activate
python hydrus_client.py
This will use the 'db' directory for your database by default, but you can use the launch arguments just like for the executables. For example, this could be your client-user.sh file:
"},{"location":"running_from_source.html#building_these_docs","title":"Building these Docs","text":"#!/bin/bash\n\nsource venv/bin/activate\npython hydrus_client.py -d=\"/path/to/database\"\n
When running from source you may want to build the hydrus help docs yourself. You can also check the
"},{"location":"running_from_source.html#windows_build","title":"Building Packages on Windows","text":"setup_help
scripts in the install directory.Almost everything you get through pip is provided as pre-compiled 'wheels' these days, but if you get an error about Visual Studio C++ when you try to pip something, you have two choices:
- (A) Get Visual Studio 14/whatever build tools
- (B) Pick a different library version
Option B is always simpler. If the opencv-headless version the requirements.txt specifies won't compile in your python, then try a newer version--there will probably be one of these new highly compatible wheels and it'll just work in seconds. Check my build scripts and various requirements.txts for ideas on what versions to try for your python etc...
If you are confident you need Visual Studio tools, then prepare for headaches. Although the tools are free from Microsoft, it can be a pain to get them through the official (and often huge) downloader installer from Microsoft. Expect a 5GB+ install with an eye-watering number of checkboxes that probably needs some stackexchange searches to figure out.
On Windows 10, Chocolatey has been the easy answer. These can be useful:
choco install -y vcredist-all
choco install -y vcbuildtools (this is Visual Studio 2015)
choco install -y visualstudio2017buildtools
choco install -y visualstudio2022buildtools
choco install -y windows-sdk-10.0
Update: On Windows 11, I have had some trouble with the above. The VS2015 build tools seem not to install any more. A basic stock Win 11 install with Python 3.10 or 3.11 is fine for getting everything in our requirements, but freezing with PyInstaller may have trouble finding certain 'api-***.dll' files. I am now trying to figure this out with my latest dev machine as of 2024-01. If you try this, let me know what you find out!
"},{"location":"running_from_source.html#additional_windows","title":"Additional Windows Info","text":"This does not matter much any more, but in the old days, building modules like lz4 and lxml was a complete nightmare, and hooking up Visual Studio was even more difficult. This page has a lot of prebuilt binaries--I have found it very helpful many times.
I have a fair bit of experience with Windows python, so send me a mail if you need help.
"},{"location":"running_from_source.html#my_code","title":"My Code","text":"I develop hydrus on and am most experienced with Windows, so the program is more stable and reasonable on that. I do not have as much experience with Linux or macOS, but I still appreciate and will work on your Linux/macOS bug reports.
My coding style is unusual and unprofessional. Everything is pretty much hacked together. If you are interested in how things work, please do look through the source and ask me if you don't understand something.
I'm constantly throwing new code together and then cleaning and overhauling it down the line. I work strictly alone. While I am very interested in detailed bug reports or suggestions for good libraries to use, I am not looking for pull requests or suggestions on style. I know a lot of things are a mess. Everything I do is WTFPL, so feel free to fork and play around with things on your end as much as you like.
"},{"location":"server.html","title":"running your own server","text":"Note
You do not need the server to do anything with hydrus! It is only for advanced users to do very specific jobs! The server is also hacked-together and quite technical. It requires a fair amount of experience with the client and its concepts, and it does not operate on a timescale that works well on a LAN. Only try running your own server once you have a bit of experience synchronising with something like the PTR and you think, 'Hey, I know exactly what that does, and I would like one!'
Here is a document put together by a user describing whether you want the server.
"},{"location":"server.html#intro","title":"setting up a server","text":"I will use two terms, server and service, to mean two distinct things:
- A server is an instantiation of the hydrus server executable (e.g. hydrus_server.exe in Windows). It has a complicated and flexible database that can run many different services in parallel.
- A service sits on a port (e.g. 45871) and responds to certain http requests (e.g. /file or /update) that the hydrus client can plug into. A service might be a repository for a certain kind of data, the administration interface to manage what services run on a server, or anything else.
Setting up a hydrus server is easy compared to, say, Apache. There are no .conf files to mess about with, and everything is controlled through the client. When started, the server will place an icon in your system tray in Windows or open a small frame in Linux or macOS. To close the server, either right-click the system tray icon and select exit, or just close the frame.
The basic process for setting up a server is:
- Start the server.
- Set up your client with its address and initialise the admin account
- Set the server's options and services.
- Make some accounts for your users.
- ???
- Profit
Let's look at these steps in more detail:
"},{"location":"server.html#start","title":"start the server","text":"Since the server and client have so much common code, I package them together. If you have the client, you have the server. If you installed in Windows, you can hit the shortcut in your start menu. Otherwise, go straight to 'hydrus_server' or 'hydrus_server.exe' or 'hydrus_server.py' in your installation directory. The program will first try to take port 45870 for its administration interface, so make sure that is free. Open your firewall as appropriate.
"},{"location":"server.html#setting_up_the_client","title":"set up the client","text":"In the services->manage services dialog, add a new 'hydrus server administration service' and set up the basic options as appropriate. If you are running the server on the same computer as the client, its hostname is 'localhost'.
In order to set up the first admin account and an access key, use 'init' as a registration token. This special registration token will only work to initialise this first super-account.
YOU'LL WANT TO SAVE YOUR ACCESS KEY IN A SAFE PLACE
If you lose your admin access key, there is no way to get it back, and if you are not sqlite-proficient, you'll have to restart from the beginning by deleting your server's database files.
If the client can't connect to the server, it is either not running or you have a firewall/port-mapping problem. If you want a quick way to test the server's visibility, just put https://host:port into your browser (make sure it is https! http will not work)--if it is working, your browser will probably complain about its self-signed https certificate. Once you add a certificate exception, the server should return some simple html identifying itself.
"},{"location":"server.html#setting_up_the_server","title":"set up the server","text":"You should have a new submenu, 'administrate services', under 'services', in the client gui. This is where you control most server and service-wide stuff.
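(Returning to the visibility check above for a moment: if you prefer a terminal to a browser, a rough equivalent is a curl against the same address. The -k flag is needed because of the self-signed certificate, and the port shown is just the default administration port mentioned earlier.)
# -k skips certificate verification, since the server presents a self-signed https certificate
curl -k https://localhost:45870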
admin->your server->manage services lets you add, edit, and delete the services your server runs. Every time you add one, you will also be added as that service's first administrator, and the admin menu will gain a new entry for it.
"},{"location":"server.html#making_accounts","title":"making accounts","text":"Go admin->your service->create new accounts to create new registration tokens. Send the registration tokens to the users you want to give these new accounts. A registration token will only work once, so if you want to give several people the same account, they will have to share the access key amongst themselves once one of them has registered the account. (Or you can register the account yourself and send them all the same access key. Do what you like!)
Go admin->manage account types to add, remove, or edit account types. Make sure everyone has at least downloader (get_data) permissions so they can stay synchronised.
You can create as many accounts of whatever kind you like. Depending on your usage scenario, you may want to have all uploaders, one uploader and many downloaders, or just a single administrator. There are many combinations.
"},{"location":"server.html#have_fun","title":"???","text":"The most important part is to have fun! There are no losers on the INFORMATION SUPERHIGHWAY.
"},{"location":"server.html#profit","title":"profit","text":"I honestly hope you can get some benefit out of my code, whether just as a backup or as part of a far more complex system. Please mail me your comments as I am always keen to make improvements.
"},{"location":"server.html#backing_up","title":"btw, how to backup a repo's db","text":"All of a server's files and options are stored in its accompanying .db file and respective subdirectories, which are created on first startup (just like with the client). To backup or restore, you have two options:
- Shut down the server, copy the database files and directories, then restart it. This is the only way, currently, to restore a db.
- In the client, hit admin->your server->make a backup. This will lock the db server-side while it makes a copy of everything server-related to server_install_dir/db/server_backup. When the operation is complete, you can ftp/batch-copy/whatever the server_backup folder wherever you like.
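For that second option, the final step is just a normal folder copy once the in-client backup has finished; something like this, with a placeholder destination:
# copy the finished snapshot somewhere safe (destination path is only an example)
cp -r server_install_dir/db/server_backup /path/to/offsite/backups/hydrus_server_backup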
If you get to a point where you can no longer boot the repository, try running SQLite Studio and opening server.db. If the issue is simple--like manually changing the port number--you may be in luck. Send me an email if it is tricky.
Remember that everything is breaking all the time. Make regular backups, and you'll minimise your problems.
"},{"location":"support.html","title":"Financial Support","text":""},{"location":"support.html#support","title":"can I contribute to hydrus development?","text":"I do not expect anything from anyone. I'm amazed and grateful that anyone wants to use my software and share tags with others. I enjoy the feedback and work, and I hope to keep putting completely free weekly releases out as long as there is more to do.
That said, as I have developed the software, several users have kindly offered to contribute money, either as thanks for a specific feature or just in general. I kept putting the thought off, but I eventually got over my hesitance and set something up.
I find the tactics of most internet fundraising very distasteful, especially when they promise something they then fail to deliver. I much prefer the 'if you like me and would like to contribute, then please do, meanwhile I'll keep doing what I do' model. I support several 'put out regular free content' creators on Patreon in this way, and I get a lot out of it, even though I have no direct reward beyond the knowledge that I helped some people do something neat.
If you feel the same way about my work, I've set up a simple Patreon page here. If you can help out, it is deeply appreciated.
"},{"location":"wine.html","title":"running a client or server in wine","text":"Several Linux and macOS users have found success running hydrus with Wine. Here is a post from a Linux dude:
Some things I picked up on after extended use:
- Wine can be finicky sometimes; do not try to close the window by pressing the red close button while in fullscreen.
- It will just "go through" it and do whatever to what's behind it.
- Flash does work, IF you download the Internet Explorer version and install it through wine.
- Hydrus is self-contained and portable. That means that one instance of hydrus does not know what another is doing. This is great if you want different installations for different things.
- Some of the input fields behave a little wonky, though that may just be standard Hydrus behavior.
- Mostly everything else works fine. I was able to connect to the test server and view there. The only thing I still need to test is the ability to host a server.
Installation process:
- Get a standard Wine installation.
- Download the latest hydrus .zip file.
- Unpack it with your chosen zip file opener, in the folder of your choice. It does not need to be in the wine folder.
- Run it with wine, either through the file manager or through the terminal.
- For Flash support install the IE version through wine.
If you get the client running in Wine, please let me know how you get on!
"},{"location":"youDontWantTheServer.html","title":"You don't want the server","text":"The hydrus_server.exe/hydrus_server.py is the victim of many a misconception. You don't need to use the server to use Hydrus. The vast majority of features are contained in the client itself so if you're new to Hydrus, just use that.
The server is only really useful for a few specific cases which will not apply for the vast majority of users.
"},{"location":"youDontWantTheServer.html#the_server","title":"The server","text":"The Hydrus server doesn't really work as most people envision a server working. Rather than on-demand viewing, when you link with a Hydrus server, you synchronise a complete copy of all its data. For the tag repository, you download every single tag it has ever been told about. For the file repository, you download the whole file list, related file info, and every single thumbnail, which lets you browse the whole repository in your client in a regular search page--to view files in the media viewer, you need to download and import them specifically.
"},{"location":"youDontWantTheServer.html#you_dont_want_the_server_probably","title":"You don't want the server (probably)","text":"Do you want to remotely view your files? You don't want the server.
Do you want to host your files on another computer since your daily driver doesn't have a lot of storage space? You don't want the server.
Do you want to use multiple clients and have everything synced between them? You don't want the server.
Do you want to expose the API for Hydrus Web, Hydroid, or some other third-party tool? You don't want the server.
Do you want to share some files and/or tags in a small group of friends? You might actually want the server.
"},{"location":"youDontWantTheServer.html#the_options","title":"The options","text":"Now, you're not the first person to have any of the above ideas and some of the thinkers even had enough programming know-how to make something for it. Below is a list of some options, see this page for a few more.
"},{"location":"youDontWantTheServer.html#hydrus_web","title":"Hydrus Web","text":"- Lets you browse and manage your collection.
- Lets you browse and manage your collection.
- Lets you browse your collection.
- Lets you host your files on another drive, even on another computer in the network.