Updated README.md
parent 242403098e, commit 9a4a7b2c52

@@ -1,16 +1,15 @@
# HTTPDirFS - now with a permanent cache

Have you ever wanted to mount those HTTP directory listings as if they were a
partition? Look no further, this is your solution. HTTPDirFS stands for Hyper
Text Transfer Protocol Directory Filesystem.

The performance of the program is excellent. HTTP connections are reused due to
the use of the curl-multi interface. The FUSE component runs in multithreaded
mode.

There is a permanent cache system which can cache all the file segments you have
downloaded, so you don't need to download these segments again if you access
them later. This feature is triggered by the ``--cache`` flag. This makes this
filesystem much faster than ``rclone mount``.

## Usage

@@ -21,36 +20,42 @@ An example URL would be
keeps the program in the foreground, which is useful for monitoring which URL
the filesystem is visiting.

### Useful options

HTTPDirFS options:

    -u  --username       HTTP authentication username
    -p  --password       HTTP authentication password
    -P  --proxy          Proxy for libcurl, for more details refer to
                         https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html
        --proxy-username Username for the proxy
        --proxy-password Password for the proxy
        --cache          Enable cache (default: off)
        --cache-location Set a custom cache location
                         (default: "${XDG_CACHE_HOME}/httpdirfs")
        --dl-seg-size    Set cache download segment size, in MB (default: 8)
                         Note: this setting is ignored if previously
                         cached data is found for the requested file.
        --max-seg-count  Set maximum number of download segments a file
                         can have (default: 128*1024).
                         With the default setting, the maximum memory usage
                         per file is 128KB. This allows caching files up
                         to 1TB in size using the default segment size.
        --max-conns      Set maximum number of network connections that
                         libcurl is allowed to make (default: 10)
        --retry-wait     Set delay in seconds before retrying an HTTP request
                         after encountering an error (default: 5)
        --user-agent     Set user agent string (default: "HTTPDirFS")

FUSE options:

    -d   -o debug        enable debug output (implies -f)
    -f                   foreground operation
    -s                   disable multi-threaded operation
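Putting the options together, a typical invocation with the cache enabled might look like the following. This is only a sketch: the listing URL, the credentials, and the mount point are placeholders, not values from this README.

```shell
# Mount an HTTP directory listing with the permanent cache enabled,
# staying in the foreground (-f) to watch which URLs are visited.
# The URL, credentials and mount point below are placeholders.
mkdir -p ~/mnt/httpdir
httpdirfs --cache -f -u myuser -p mypass https://example.com/files/ ~/mnt/httpdir
```

When you are done, unmount with ``fusermount -u ~/mnt/httpdir``.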
## Permanent cache system
You can cache all the files you have looked at permanently on your hard drive by
using the ``--cache`` flag. The files it caches persist across sessions.

By default, the cache files are stored under ``${XDG_CACHE_HOME}/httpdirfs``,
which by default is ``${HOME}/.cache/httpdirfs``. Each HTTP directory gets its

@@ -64,82 +69,63 @@ maximum download speed is around 15MiB/s, as measured using my localhost as the
web server. However, after you have accessed a file once, accessing it again
will be the same speed as accessing your hard drive.

If you have any patches to make the initial download go faster, please submit a
pull request.

The permanent cache system relies on sparse allocation. Please make sure your
filesystem supports it. Otherwise your hard drive / SSD will get heavy I/O from
cache file creation. For a list of filesystems that support sparse allocation,
please refer to [Wikipedia](https://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies).
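A quick way to check whether a given filesystem supports sparse files is sketched below. This is not part of HTTPDirFS; it assumes GNU coreutils ``truncate`` and ``du`` are available.

```shell
# Create a 1 GiB file consisting entirely of a hole, then check how much
# disk space it actually occupies. On a sparse-capable filesystem the
# reported usage stays near zero, far below the 1 GiB apparent size.
f=$(mktemp)
truncate -s 1G "$f"
du -k "$f"
rm "$f"
```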
## Configuration file support
This program has basic support for using a configuration file. The configuration
file that the program reads is ``${XDG_CONFIG_HOME}/httpdirfs/config``, which by
default is at ``${HOME}/.config/httpdirfs/config``. You will have to create the
sub-directory and the configuration file yourself. In the configuration file,
please supply one option per line. For example:

    $ cat ${HOME}/.config/httpdirfs/config
    --username test
    --password test
    -f
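The sub-directory and configuration file can be created like so (a sketch; the option values are the same placeholders as in the example above):

```shell
# Create the configuration directory and a minimal config file,
# one option per line, at the default location the program reads.
mkdir -p "${HOME}/.config/httpdirfs"
printf -- '--username test\n--password test\n-f\n' > "${HOME}/.config/httpdirfs/config"
```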
## Compilation
### Debian 10 "Buster" and newer versions
Under Debian 10 "Buster" and newer versions, you need the following packages:

    libgumbo-dev libfuse-dev libssl-dev libcurl4-openssl-dev
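On these releases the packages can be installed with ``apt`` (assuming you have root or ``sudo`` access):

```shell
# Install the build dependencies on Debian 10 "Buster" or newer
sudo apt install libgumbo-dev libfuse-dev libssl-dev libcurl4-openssl-dev
```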
### Debian 9 "Stretch"
Under Debian 9 "Stretch", you need the following packages:

    libgumbo-dev libfuse-dev libssl1.0-dev libcurl4-openssl-dev

If you get the following warnings during compilation,

    /usr/bin/ld: warning: libcrypto.so.1.0.2, needed by /usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/libcurl.so, may conflict with libcrypto.so.1.1

then this program will crash if you connect to an HTTPS website. You need to
check if you have ``libssl1.0-dev`` installed rather than ``libssl-dev``. This
is because you likely have the binaries of OpenSSL 1.0.2 installed alongside the
header files for OpenSSL 1.1. The header files for OpenSSL 1.0.2 link in
additional mutex related callback functions, whereas the header files for
OpenSSL 1.1 do not.
### Debugging Mutexes
By default the debugging output associated with mutexes is not compiled. To
enable it, compile the program using the following command:

    make CPPFLAGS=-DLOCK_DEBUG
## The Technical Details
This program downloads the HTML web pages/files using
[libcurl](https://curl.haxx.se/libcurl/), then parses the listing pages using
[Gumbo](https://github.com/google/gumbo-parser), and presents them using
[libfuse](https://github.com/libfuse/libfuse).

The cache system stores the metadata and the downloaded file into two
separate directories. It uses ``uint8_t`` arrays to record which segments of the
file have been downloaded.
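As an illustration of the idea (a sketch only; the type and function names below are hypothetical and not taken from the HTTPDirFS source), one byte per segment is enough to track what has been downloaded. With 8 MiB segments and at most 128*1024 segments per file, the map itself never exceeds 128 KiB while covering files up to 1 TiB:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative segment map: one byte per download segment. */
#define SEG_SIZE (8L * 1024 * 1024)   /* 8 MiB per segment */

typedef struct {
    uint8_t *seg;   /* seg[i] != 0 means segment i is already cached */
    long nseg;      /* number of segments this file is divided into */
} SegMap;

/* Allocate a zeroed map big enough for a file of the given size. */
static SegMap segmap_new(long file_size)
{
    SegMap m;
    m.nseg = (file_size + SEG_SIZE - 1) / SEG_SIZE;  /* round up */
    m.seg = calloc((size_t) m.nseg, sizeof(uint8_t));
    return m;
}

/* Mark the segment containing `offset` as downloaded. */
static void segmap_mark(SegMap *m, long offset)
{
    m->seg[offset / SEG_SIZE] = 1;
}

/* Is the byte at `offset` already covered by a cached segment? */
static int segmap_cached(const SegMap *m, long offset)
{
    return m->seg[offset / SEG_SIZE];
}
```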
## Other projects which incorporate HTTPDirFS
- [Curious Container](https://www.curious-containers.cc/docs/red-connector-http#mount-dir)