updated README, also moved some status message around

This commit is contained in:
Fufu Fang 2019-04-23 02:35:17 +01:00
parent 96a1e4fd55
commit e166098162
4 changed files with 68 additions and 50 deletions

View File

@ -1,15 +1,10 @@
# HTTPDirFS
# HTTPDirFS - now with a permanent cache
Have you ever wanted to mount those HTTP directory listings as if they were a partition? Look no further, this is your solution. HTTPDirFS stands for Hyper Text Transfer Protocol Directory Filesystem.
The performance of the program is excellent, due to the use of the curl-multi interface. HTTP connections are reused, and HTTP pipelining is used when available. I haven't benchmarked it, but I feel this is faster than ``rclone mount``. The FUSE component itself also runs in multithreaded mode.
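If you are curious, the curl-multi pattern looks roughly like this. A minimal standalone sketch, not the actual HTTPDirFS code; the URL is just an example:

    #include <curl/curl.h>

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURLM *multi = curl_multi_init();
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL,
                         "https://cdimage.debian.org/debian-cd/");
        curl_multi_add_handle(multi, easy);

        /* Drive the transfer; libcurl keeps the connection alive for reuse */
        int running = 1;
        while (running) {
            curl_multi_perform(multi, &running);
            if (running)
                curl_multi_wait(multi, NULL, 0, 1000, NULL);
        }

        curl_multi_remove_handle(multi, easy);
        curl_easy_cleanup(easy);
        curl_multi_cleanup(multi);
        curl_global_cleanup();
        return 0;
    }

Connection reuse kicks in when further easy handles for the same host are added to the same multi handle.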
Furthermore, a permanent cache system has been implemented to cache all the files you have downloaded. This is triggered by the ``--cache`` flag.
## BUG
The permanent cache system seems to have problems when you seek randomly across the file during the initial download process. I am not sure what causes the problem. It is probably some sort of concurrency issue. The mutexes I set up don't seem to help with the problem.
This feature is also very slow. When downloading from localhost, it peaks at about 1.5 MiB/s. I am not entirely sure why.
Furthermore, a permanent cache system has been implemented to cache all the files you have downloaded, so you don't need to download those files if you want to access them again. This is triggered by the ``--cache`` flag.
## Compilation
This program was developed under Debian Stretch. If you are using the same operating system as me, you need ``libgumbo-dev``, ``libfuse-dev``, ``libssl1.0-dev`` and ``libcurl4-openssl-dev``.
@ -26,13 +21,13 @@ If you run Debian Stretch, and you have OpenSSL 1.0.2 installed, and you get war
then you need to check if ``libssl1.0-dev`` has been installed properly. If you get these compilation warnings, this program will occasionally crash if you connect to an HTTPS website. This is because OpenSSL 1.0.2 needs those functions for thread safety, whereas OpenSSL 1.1 does not. If you have ``libssl-dev`` rather than ``libssl1.0-dev`` installed, those callback functions will not be linked properly.
If you have OpenSSL 1.1 and the associated development headers installed, then you can safely ignore these warning messages. If you are on Debian Buster, you will definitely get these warning messages, and you can safely ignore them.
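For reference, the thread-safety hooks OpenSSL 1.0.2 expects look roughly like this. A sketch assuming pthreads, not the exact code this program links:

    #include <openssl/crypto.h>
    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t *ssl_locks;

    /* OpenSSL 1.0.2 calls this to lock/unlock its internal lock table */
    static void ssl_lock_cb(int mode, int n, const char *file, int line)
    {
        (void) file; (void) line;
        if (mode & CRYPTO_LOCK)
            pthread_mutex_lock(&ssl_locks[n]);
        else
            pthread_mutex_unlock(&ssl_locks[n]);
    }

    static void ssl_locks_init(void)
    {
        ssl_locks = malloc(CRYPTO_num_locks() * sizeof(pthread_mutex_t));
        for (int i = 0; i < CRYPTO_num_locks(); i++)
            pthread_mutex_init(&ssl_locks[i], NULL);
        CRYPTO_set_locking_callback(ssl_lock_cb);
    }

Under OpenSSL 1.1 these registration functions are no-ops, which is why the warnings are harmless there.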
## Usage
./httpdirfs -f $URL $YOUR_MOUNT_POINT
An example URL would be [Debian CD Image Server](https://cdimage.debian.org/debian-cd/). The ``-f`` flag keeps the program in the foreground, which is useful for monitoring which URL the filesystem is visiting.
Other useful options:
@ -50,10 +45,16 @@ drive by using the ``--cache`` flag. The files it caches persist across sessions
mkdir cache mnt
httpdirfs --cache cache http://cdimage.debian.org/debian-cd/ mnt
Once a segment of the file has been downloaded once, it won't be downloaded again. So the first time you use the file it is slow; subsequent accesses are fast. You can also retrieve your partially or fully downloaded file from ``cache/metadata``.
Once a segment of the file has been downloaded once, it won't be downloaded again. You can also retrieve your partially or fully downloaded file from ``cache/metadata``.
Please note that, due to the way the permanent cache system is implemented, the maximum download speed is around 6 MiB/s, as measured using my localhost as the web server. If you have a fast connection, you might be better off not running the permanent cache system. If you have any patches to make it run faster, feel free to submit a pull request.
The permanent cache system also heavily relies on sparse allocation. Please make sure your filesystem supports it. Otherwise your hard drive / SSD might grind to a halt.
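If you want to check whether your filesystem handles sparse files, here is a quick standalone test (the file name is just an example):

    #define _FILE_OFFSET_BITS 64
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        FILE *fp = fopen("sparse.dat", "w");
        if (!fp)
            return 1;
        fclose(fp);
        /* Extend the file to 16 GiB without writing any data */
        truncate("sparse.dat", (off_t) 16 * 1024 * 1024 * 1024);
        return 0;
    }

If sparse allocation works, ``ls -lh sparse.dat`` reports 16G while ``du -h sparse.dat`` reports next to nothing.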
The permanent cache system also appears to be slightly buggy. This software seems to crash less if I don't turn it on. Your mileage may vary.
## Configuration file support
There is now rudimentary config file support. The configuration file that the program will read is ``${XDG_CONFIG_HOME}/httpdirfs/config``. If ``${XDG_CONFIG_HOME}`` is not set, it will default to ``${HOME}/.config``. So by default you need to put the configuration file at ``${HOME}/.config/httpdirfs/config``. You will have to create the sub-directory and the configuration file yourself. In the configuration file, please supply one option per line. For example:
$ cat ${HOME}/.config/httpdirfs/config
--username test
@ -68,6 +69,12 @@ The SSL engine version string looks something like this:
libcurl SSL engine: OpenSSL/1.0.2l
## The Technical Details
I noticed that most HTTP directory listings don't provide a file size for the web page itself. I suppose this makes perfect sense, as those pages are generated on the fly, whereas the actual files have file sizes. So the listing pages can be treated as folders, and the rest as files.
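A sketch of that heuristic, using a hypothetical ``looks_like_folder()`` helper rather than the actual HTTPDirFS code:

    #include <curl/curl.h>

    /* Returns 1 if the server reports no Content-Length for the URL,
     * i.e. it is probably a generated listing page */
    static int looks_like_folder(const char *url)
    {
        CURL *c = curl_easy_init();
        double len = -1;
        curl_easy_setopt(c, CURLOPT_URL, url);
        curl_easy_setopt(c, CURLOPT_NOBODY, 1L); /* HEAD request */
        if (curl_easy_perform(c) == CURLE_OK)
            curl_easy_getinfo(c, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &len);
        curl_easy_cleanup(c);
        return len < 0; /* size unknown, treat as a folder */
    }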
This program downloads the HTML web pages/files using [libcurl](https://curl.haxx.se/libcurl/), then parses the listing pages using [Gumbo](https://github.com/google/gumbo-parser), and presents them using [libfuse](https://github.com/libfuse/libfuse)
This program downloads the HTML web pages/files using [libcurl](https://curl.haxx.se/libcurl/), then parses the listing pages using [Gumbo](https://github.com/google/gumbo-parser), and presents them using [libfuse](https://github.com/libfuse/libfuse).
I wrote the cache system myself. It was a Herculean effort. I am immensely proud of it.
## Acknowledgement
- I would like to thank [Cosmin Gorgovan](https://scholar.google.co.uk/citations?user=S7UZ6MAAAAAJ&hl=en) for the technical and moral support.
- I would like to thank [-Archivist](https://www.reddit.com/user/-Archivist/) for not providing FTP or WebDAV access to his server. This piece of software was written in direct response to his appalling behaviour.

View File

@ -14,18 +14,9 @@
/**
* \brief Data file block size
* \details The data file block size is set to 128KiB, for convenience. This is
* because the maximum requested block size by FUSE seems to be 128KiB under
* Debian Stretch. Note that the minimum requested block size appears to be
* 4KiB.
*
* More information regarding block size can be found at:
* https://wiki.vuze.com/w/Torrent_Piece_Size
*
* Note that at the current configuration, a 16GiB file uses 16MiB of memory to
* store the bitmap
* \details We set it to 1024*1024 = 1048576 bytes
*/
#define DATA_BLK_SZ 131072
#define DATA_BLK_SZ 1048576
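/* Note: Seg_set() keeps one byte of segment map per data block, so at
 * this block size a 16 GiB file needs a 16 KiB segment map in memory. */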
/**
* \brief the maximum length of a path
@ -35,6 +26,11 @@
int CACHE_SYSTEM_INIT = 0;
/**
 * \brief the receive buffer
 * \details Note that this buffer is shared by all cache files, while the
 * rw_lock taken in Cache_read() is per-file, so concurrent reads on
 * different files can still race on it.
 */
uint8_t RECV_BUF[DATA_BLK_SZ];
/**
* \brief The metadata directory
*/
@ -321,6 +317,12 @@ static long Data_write(const Cache *cf, const uint8_t *buf, off_t len,
return -EINVAL;
}
size_t start = offset;
size_t end = start + len;
char range_str[64];
snprintf(range_str, sizeof(range_str), "%zu-%zu", start, end);
fprintf(stderr, "Data_write(%s, %s);\n", cf->path, range_str);
long byte_written = -EIO;
if (fseeko(cf->dfp, offset, SEEK_SET)) {
@ -563,14 +565,13 @@ cf->content_length: %ld, Data_size(fn): %ld\n",
return NULL;
}
fprintf(stderr, "Cache_open(): Opened cache file %p.\n", cf);
fprintf(stderr, "Cache_open(): Opened cache file %s.\n", cf->path);
return cf;
}
void Cache_close(Cache *cf)
{
fprintf(stderr, "Cache_close(): Closing cache file %p.\n", cf);
fprintf(stderr, "Cache_close(): Closing cache file %s.\n", cf->path);
if (Meta_write(cf)) {
fprintf(stderr, "Cache_close(): Meta_write() error.");
@ -579,7 +580,6 @@ void Cache_close(Cache *cf)
if (fclose(cf->dfp)) {
fprintf(stderr, "Data_write(): fclose(): %s\n", strerror(errno));
}
return Cache_free(cf);
}
@ -605,27 +605,39 @@ static void Seg_set(Cache *cf, off_t offset, int i)
cf->seg[byte] = i;
}
long Cache_read(Cache *cf, char *buf, size_t size, off_t offset)
long Cache_read(Cache *cf, char *output_buf, off_t size, off_t offset)
{
pthread_mutex_lock(&(cf->rw_lock));
// size_t start = offset;
// size_t end = start + size;
// char range_str[64];
// snprintf(range_str, sizeof(range_str), "%lu-%lu", start, end);
// fprintf(stderr, "Cache_read(%s, %s);\n", cf->path, range_str);
long received;
long sent;
if (Seg_exist(cf, offset)) {
/* The metadata shows the segment already exists */
received = Data_read(cf, (uint8_t *) buf, size, offset);
/*
* The metadata shows the segment already exists. This part is easy,
* as you don't have to worry about alignment
*/
sent = Data_read(cf, (uint8_t *) output_buf, size, offset);
} else {
/* The metadata shows the segment doesn't already exist */
received = path_download(cf->path, buf, size, offset);
Data_write(cf, (uint8_t *) buf, received, offset);
Seg_set(cf, offset, 1);
/* Calculate the aligned offset */
off_t dl_offset = offset / cf->blksz * cf->blksz;
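/* e.g. with blksz = 1048576, a read at offset 1500000 is served
 * from the block that starts at dl_offset = 1048576 */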
/* Download the segment */
long recv = path_download(cf->path, (char *) RECV_BUF, cf->blksz, dl_offset);
/* Send it off */
memmove(output_buf, RECV_BUF + (offset-dl_offset), size);
sent = size;
/* Write it to disk, unless we received less data than expected */
if (recv == cf->blksz) {
Data_write(cf, RECV_BUF, cf->blksz, dl_offset);
Seg_set(cf, dl_offset, 1);
} else if (dl_offset == (cf->content_length / cf->blksz * cf->blksz)) {
/* Check if we are at the last block */
Data_write(cf, RECV_BUF, cf->blksz, dl_offset);
Seg_set(cf, dl_offset, 1);
} else {
fprintf(stderr,
"Cache_read(): recv (%ld) < cf->blksz! Possible network error?\n",
recv);
}
}
pthread_mutex_unlock(&(cf->rw_lock));
return received;
return sent;
}

View File

@ -81,8 +81,6 @@ void Cache_close(Cache *cf);
*/
int Cache_create(const char *fn, long len, long time);
/***************************** To be completed ******************************/
/**
* \brief Intelligently read from the cache system
* \details If the segment does not exist on the local hard disk, download from
@ -92,8 +90,8 @@ int Cache_create(const char *fn, long len, long time);
* \param[in] size the requested segment size
* \param[in] offset the start of the segment
* \return the length of the segment the cache system managed to obtain.
* \note Called by fs_read()
* \note Called by fs_read(), verified to be working
*/
long Cache_read(Cache *cf, char *buf, size_t size, off_t offset);
long Cache_read(Cache *cf, char *output_buf, off_t size, off_t offset);
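/* Typical call pattern, as a sketch: the FUSE read handler forwards
 * straight to the cache layer, e.g.
 *     Cache *cf = (Cache *) fi->fh;
 *     return Cache_read(cf, buf, size, offset);
 */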
#endif

View File

@ -16,10 +16,10 @@ static void *fs_init(struct fuse_conn_info *conn)
/** \brief release an opened file */
static int fs_release(const char *path, struct fuse_file_info *fi)
{
fprintf(stderr, "fs_release(): %s\n", path);
if (CACHE_SYSTEM_INIT) {
Cache_close((Cache *)fi->fh);
}
fprintf(stderr, "fs_release(): %s\n", path);
return 0;
}
@ -85,6 +85,8 @@ static int fs_read(const char *path, char *buf, size_t size, off_t offset,
/** \brief open a file indicated by the path */
static int fs_open(const char *path, struct fuse_file_info *fi)
{
fprintf(stderr, "fs_open(): %s\n", path);
if (!path_to_Link(path)) {
return -ENOENT;
}
@ -100,7 +102,6 @@ static int fs_open(const char *path, struct fuse_file_info *fi)
}
}
fprintf(stderr, "fs_open(): %s\n", path);
return 0;
}