Update README.md

This commit is contained in:
Fufu Fang 2019-04-24 03:20:38 +01:00
parent cc1697894b
commit ccac51a94f
1 changed files with 44 additions and 40 deletions

View File

@ -1,11 +1,51 @@
# HTTPDirFS - now with a permanent cache
Have you ever wanted to mount those HTTP directory listings as if it was a partition? Look no further, this is your solution. HTTPDirFS stands for Hyper Text Transfer Protocol Directory Filesystem
The performance of the program is excellent, due to the use of curl-multi interface. HTTP connections are reused, and HTTP pipelining is used when available. I haven't benchmarked it, but I feel this is faster than ``rclone mount``. The FUSE component itself also runs in multithreaded mode.
The performance of the program is excellent, due to the use of curl-multi interface. HTTP connections are reused, and HTTP pipelining is used when available. The FUSE component itself also runs in multithreaded mode.
Furthermore, a permanent cache system has been implemented to cache all the files you have downloaded, so you don't need to download those files if you want to access them again. This is triggered by the ``--cache`` flag
The permanent cache system caches all the files you have downloaded, so you don't need to download those files again if you later access them again. This feature is triggered by the ``--cache`` flag. This makes this filesystem much faster than ``rclone mount``.
## Usage
./httpdirfs -c $CACHE_FOLDER -f $URL $YOUR_MOUNT_POINT
An example URL would be [Debian CD Image Server](https://cdimage.debian.org/debian-cd/). The ``-f`` flag keeps the program in the foreground, which is useful for monitoring which URL the filesystem is visiting.
Other useful options:
-u --username HTTP authentication username
-p --password HTTP authentication password
-P --proxy Proxy for libcurl, for more details refer to
https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html
--proxy-username Username for the proxy
--proxy-password Password for the proxy
--cache Set the cache folder
## Permanent cache system
You can now cache all the files you have looked at permanently on your hard
drive by using the ``--cache`` flag. The file it caches persist across sessions For example:
mkdir cache mnt
httpdirfs -f --cache cache http://cdimage.debian.org/debian-cd/ mnt
Once a segment of the file has been downloaded once, it won't be downloaded again.
Please note that due to the way the permanent cache system is implemented. The maximum download speed is around 6MiB/s, as measured using my localhost as the web server. However after you have accessed the file once, and you access it again, it will be the same speed as accessing your hard drive. .
If you have any patches to make the initial download go faster, feel free to submit a pull request.
The permanent cache system also heavily relies on sparse allocation. Please make sure your filesystem supports it. Otherwise your hard drive / SSD might grind to a halt.
## Configuration file support
There is now rudimentary config file support. The configuration file that the program will read is ``${XDG_CONFIG_HOME}/httpdirfs/config``. If ``${XDG_CONFIG_HOME}`` is not set, it will default to ``${HOME}/.config``. So by default you need to put the configuration file at ``${HOME}/.config/httpdirfs/config``. You will have to create the sub-directory and the configuration file yourself. In the configuration file, please supply one option per line. For example:
$ cat ${HOME}/.config/httpdirfs/config
--username test
--password test
-f
## Compilation
This program was developed under Debian Stretch. If you are using the same operating system as me, you need ``libgumbo-dev``, ``libfuse-dev``, ``libssl1.0-dev`` and ``libcurl4-openssl-dev``.
@ -23,55 +63,19 @@ then you need to check if ``libssl1.0-dev`` had been installed properly. If you
If you have OpenSSL 1.1 and the associated development headers installed, then you can safely ignore these warning messages. If you are on Debian Buster, you will definitely get these warning messages, and you can safely ignore them.
## Usage
./httpdirfs -f $URL $YOUR_MOUNT_POINT
An example URL would be [Debian CD Image Server](https://cdimage.debian.org/debian-cd/). The ``-f`` flag keeps the program in the foreground, which is useful for monitoring which URL the filesystem is visiting.
Other useful options:
-u --username HTTP authentication username
-p --password HTTP authentication password
-P --proxy Proxy for libcurl, for more details refer to
https://curl.haxx.se/libcurl/c/CURLOPT_PROXY.html
--proxy-username Username for the proxy
--proxy-password Password for the proxy
--cache Set the cache folder
## Permanent cache system
You can now cache all the files you have looked at permanently on your hard
drive by using the ``--cache`` flag. The file it caches persist across sessions For example:
mkdir cache mnt
httpdirfs --cache cache http://cdimage.debian.org/debian-cd/ mnt
Once a segment of the file has been downloaded once, it won't be downloaded again. You can also retrieve your partially or fully downloaded file from ``cache/metadata``.
Please note that due to the way the permanent cache system is implemented. The maximum download speed is around 6MiB/s, as measured using my localhost as the web server. If you have a fast connection, you might be better off not to run the permanent cache system. If you have any patches to make it run faster, feel free to submit a pull request.
The permanent cache system also heavily relies on sparse allocation. Please make sure your filesystem supports it. Otherwise your hard drive / SSD might grind to a halt.
## Configuration file support
There is now rudimentary config file support. The configuration file that the program will read is ``${XDG_CONFIG_HOME}/httpdirfs/config``. If ``${XDG_CONFIG_HOME}`` is not set, it will default to ``${HOME}/.config``. So by default you need to put the configuration file at ``${HOME}/.config/httpdirfs/config``. You will have to create the sub-directory and the configuration file yourself. In the configuration file, please supply one option per line. For example:
$ cat ${HOME}/.config/httpdirfs/config
--username test
--password test
-f
## SSL Support
If you run the program in the foreground, when it starts up, it will output the SSL engine version string. Please verify that your libcurl is linked against OpenSSL, as the pthread mutex functions are designed for OpenSSL.
The SSL engine version string looks something like this:
libcurl SSL engine: OpenSSL/1.0.2l
## The Technical Details
I noticed that most HTTP directory listings don't provide the file size for the web page itself. I suppose this makes perfect sense, as they are generated on the fly. Whereas the actual files have got file sizes. So the listing pages can be treated as folders, and the rest are files.
This program downloads the HTML web pages/files using [libcurl](https://curl.haxx.se/libcurl/), then parses the listing pages using [Gumbo](https://github.com/google/gumbo-parser), and presents them using [libfuse](https://github.com/libfuse/libfuse).
I wrote the cache system myself. It was a Herculean effort. I am immensely proud of it.
I wrote the cache system myself. It was a Herculean effort. I am immensely proud of it. The cache system stores the metadata and the downloaded file into two separate directories. It uses bitmaps to record which segment of the file has been downloaded. By bitmap, I meant ``uint8_t`` arrays, which each byte indicating for a 1 MiB segment. I could not be bothered to implement proper bitmapping. The main challenge for the cache system was hunting down a race condition which corrupted the metadata.
## Acknowledgement
- First of all, I would like to thank [Jerome Charaoui](https://github.com/jcharaoui) for being the Debian Maintainer for this piece of software. Thank you so much for packaging it!