From 9a4a7b2c528128acb8c78625d547cf02afec549a Mon Sep 17 00:00:00 2001
From: Fufu Fang
Date: Sun, 25 Aug 2019 06:09:17 +0100
Subject: [PATCH] Updated README.md

---
 README.md | 146 ++++++++++++++++++++++++------------------------------
 1 file changed, 66 insertions(+), 80 deletions(-)

diff --git a/README.md b/README.md
index f31973c..a383f44 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,15 @@
 # HTTPDirFS - now with a permanent cache
 Have you ever wanted to mount those HTTP directory listings as if it was a
 partition? Look no further, this is your solution. HTTPDirFS stands for Hyper
-Text Transfer Protocol Directory Filesystem
+Text Transfer Protocol Directory Filesystem.
 
-The performance of the program is excellent, due to the use of curl-multi
-interface. HTTP connections are reused, and HTTP pipelining is used when
-available. The FUSE component itself also runs in multithreaded mode.
+The performance of the program is excellent. HTTP connections are reused due to
+the use of the curl-multi interface. The FUSE component runs in multithreaded
+mode.
 
-The permanent cache system caches all the files you have downloaded, so you
-don't need to download those files again if you later access them again. This
-feature is triggered by the ``--cache`` flag. This makes this filesystem much
-faster than ``rclone mount``.
+There is a permanent cache system which can cache all the file segments you have
+downloaded, so you don't need to download these segments again if you access
+them later. This feature is triggered by the ``--cache`` flag. This makes this
+filesystem much faster than ``rclone mount``.
 
 ## Usage
 
@@ -21,36 +20,42 @@
 An example URL would be
 keeps the program in the foreground, which is useful for monitoring which URL
 the filesystem is visiting.
-Useful options:
+### Useful options
+
+HTTPDirFS options:
 
-      -f                   foreground operation
-      -s                   disable multi-threaded operation
       -u  --username       HTTP authentication username
       -p  --password       HTTP authentication password
       -P  --proxy          Proxy for libcurl, for more details refer to
                            https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html
       --proxy-username     Username for the proxy
       --proxy-password     Password for the proxy
-      --cache              Enable cache, by default this is disabled
-      --cache-location     Set a custom cache location, by default it is
-                           located in ${XDG_CACHE_HOME}/httpdirfs
-      --dl-seg-size        The size of each download segment in MB,
-                           default to 8MB.
-      --max-seg-count      The maximum number of download segments a file
-                           can have. By default it is set to 128*1024. This
-                           means the maximum memory usage per file is 128KB.
-                           This allows caching file up to 1TB in size,
-                           assuming you are using the default segment size.
-      --max-conns          The maximum number of network connections that
-                           libcurl is allowed to make, default to 10.
-      --retry-wait         The waiting interval in seconds before making an
-                           HTTP request, after encountering an error,
-                           default to 5 seconds.
-      --user-agent         The user agent string, default to "HTTPDirFS".
+      --cache              Enable cache (default: off)
+      --cache-location     Set a custom cache location
+                           (default: "${XDG_CACHE_HOME}/httpdirfs")
+      --dl-seg-size        Set cache download segment size, in MB (default: 8)
+                           Note: this setting is ignored if previously
+                           cached data is found for the requested file.
+      --max-seg-count      Set maximum number of download segments a file
+                           can have. (default: 128*1024)
+                           With the default setting, the maximum memory usage
+                           per file is 128KB. This allows caching files up
+                           to 1TB in size using the default segment size.
+      --max-conns          Set maximum number of network connections that
+                           libcurl is allowed to make. (default: 10)
+      --retry-wait         Set delay in seconds before retrying an HTTP request
+                           after encountering an error. (default: 5)
+      --user-agent         Set user agent string (default: "HTTPDirFS")
+
+FUSE options:
+
+      -d   -o debug        enable debug output (implies -f)
+      -f                   foreground operation
+      -s                   disable multi-threaded operation
 
 ## Permanent cache system
 
-You can now cache all the files you have looked at permanently on your hard
-drive by using the ``--cache`` flag. The file it caches persist across sessions
+You can cache all the files you have looked at permanently on your hard drive by
+using the ``--cache`` flag. The files it caches persist across sessions.
 
 By default, the cache files are stored under ``${XDG_CACHE_HOME}/httpdirfs``,
 which by default is ``${HOME}/.cache/httpdirfs``. Each HTTP directory gets its
@@ -64,82 +69,63 @@
 maximum download speed is around 15MiB/s, as measured using my localhost as the
 web server. However after you have accessed a file once, accessing it again will
 be the same speed as accessing your hard drive.
 
-If you have any patches to make the initial download go faster, feel free to
-submit a pull request.
+If you have any patches to make the initial download go faster, please submit a
+pull request.
 
-The permanent cache system also relies on sparse allocation. Please make sure
-your filesystem supports it. Otherwise your hard drive / SSD might grind to
-a halt.For a list of filesystem that supports sparse allocation, please refer to
-[Wikipedia](https://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies).
+The permanent cache system relies on sparse allocation. Please make sure your
+filesystem supports it. Otherwise your hard drive / SSD will get heavy I/O from
+cache file creation. For a list of filesystems that support sparse allocation,
+please refer to
+[Wikipedia](https://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies).
 
 ## Configuration file support
 
-There is now rudimentary config file support. The configuration file that the
-program will read is ``${XDG_CONFIG_HOME}/httpdirfs/config``.
-If ``${XDG_CONFIG_HOME}`` is not set, it will default to ``${HOME}/.config``. So
-by default you need to put the configuration file at
-``${HOME}/.config/httpdirfs/config``. You will have to create the sub-directory
-and the configuration file yourself. In the configuration file, please supply
-one option per line. For example:
+This program has basic support for using a configuration file. The configuration
+file that the program reads is ``${XDG_CONFIG_HOME}/httpdirfs/config``, which by
+default is at ``${HOME}/.config/httpdirfs/config``. You will have to create the
+sub-directory and the configuration file yourself. In the configuration file,
+please supply one option per line. For example:
 
-    $ cat ${HOME}/.config/httpdirfs/config
     --username test
     --password test
     -f
 
 ## Compilation
 
-This program was developed under Debian Stretch. If you are using the same
-operating system as me, you need ``libgumbo-dev``, ``libfuse-dev``,
-``libssl1.0-dev`` and ``libcurl4-openssl-dev``.
+### Debian 10 "Buster" and newer versions
+Under Debian 10 "Buster" and newer versions, you need the following packages:
 
-If you run Debian Stretch, and you have OpenSSL 1.0.2 installed, and you get
-warnings that look like below during compilation,
+    libgumbo-dev libfuse-dev libssl-dev libcurl4-openssl-dev
+
+### Debian 9 "Stretch"
+Under Debian 9 "Stretch", you need the following packages:
+
+    libgumbo-dev libfuse-dev libssl1.0-dev libcurl4-openssl-dev
+
+If you get the following warnings during compilation,
 
     /usr/bin/ld: warning: libcrypto.so.1.0.2, needed by
     /usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/libcurl.so, may
     conflict with libcrypto.so.1.1
 
-then you need to check if ``libssl1.0-dev`` had been installed properly. If you
-get these compilation warnings, this program will ocassionally crash if you
-connect to HTTPS website. This is because OpenSSL 1.0.2 needs those functions
-for thread safety, whereas OpenSSL 1.1 does not. If you have ``libssl-dev``
-rather than ``libssl1.0-dev`` installed, those call back functions will not be
-linked properly.
+then this program will crash if you connect to an HTTPS website. You need to
+check if you have ``libssl1.0-dev`` installed rather than ``libssl-dev``.
+This is because you likely have the binaries of OpenSSL 1.0.2 installed
+alongside the header files for OpenSSL 1.1. The header files for OpenSSL 1.0.2
+link in additional mutex-related callback functions, whereas the header files
+for OpenSSL 1.1 do not.
 
-If you have OpenSSL 1.1 and the associated development headers installed, then
-you can safely ignore these warning messages. If you are on Debian Buster, you
-will definitely get these warning messages, and you can safely ignore them.
 
 ### Debugging Mutexes
-By default the debugging output associated with mutexes are not compiled. To enable them, compile the program using the following command:
+By default the debugging output associated with mutexes is not compiled. To
+enable it, compile the program using the following command:
 
     make CPPFLAGS=-DLOCK_DEBUG
 
-## SSL Support
-If you run the program in the foreground, when it starts up, it will output the
-SSL engine version string. Please verify that your libcurl is linked against
-OpenSSL, as the pthread mutex functions are designed for OpenSSL.
-
-The SSL engine version string looks something like this:
-
-    libcurl SSL engine: OpenSSL/1.0.2l
-
 ## The Technical Details
 
-I noticed that most HTTP directory listings don't provide the file size for the
-web page itself. I suppose this makes perfect sense, as they are generated on
-the fly. Whereas the actual files have got file sizes. So the listing pages can
-be treated as folders, and the rest are files.
-
 This program downloads the HTML web pages/files using
 [libcurl](https://curl.haxx.se/libcurl/), then parses the listing pages using
 [Gumbo](https://github.com/google/gumbo-parser), and presents them using
 [libfuse](https://github.com/libfuse/libfuse).
 
-I wrote the cache system myself. It was a Herculean effort. I am immensely proud
-of it. The cache system stores the metadata and the downloaded file into two
-separate directories. It uses bitmaps to record which segment of the file has
-been downloaded. By bitmap, I meant ``uint8_t`` arrays, which each byte
-indicating for a 1 MiB segment. I could not be bothered to implement proper
-bitmapping. The main challenge for the cache system was hunting down various
-race conditions which caused metadata corruption, downloading the same segment
-multiple times, and deadlocks.
+The cache system stores the metadata and the downloaded files in two separate
+directories. It uses ``uint8_t`` arrays to record which segments of the file
+have been downloaded.
 
 ## Other projects which incorporate HTTPDirFS
 - [Curious Container](https://www.curious-containers.cc/docs/red-connector-http#mount-dir)