This extends the URI parser so it supports full IRI (Internationalized
Resource Identifiers, RFC3987). Some areas of it can/may be improved,
but here's a start.
Note: we assume UTF-8 encoded IRIs.
Up until now I used a "poor man's" approach: the URI parser was barely
a parser; it tried to extract the path from the request, with some
minor checking, and that was all. This is obviously not
RFC3986-compliant.
The new RFC3986 (URI) parser should be fully compliant. It may accept
some invalid URIs, but shouldn't reject or mis-parse valid ones. (In
particular, the rule for the path is way more relaxed in this parser
than it is in the RFC text.)
A difference with RFC3986 is that we don't even try to parse the
(optional) userinfo part of a URI: following the Gemini spec we treat
it as an error.
A further caveat is that %2F in the path part of the URI is
indistinguishable from a literal '/': this is NOT conforming, but given
the scope and use of gmid, I don't see how else to treat a %2F sequence
in the path (reject the URI?).
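
To make the %2F caveat concrete, here is a minimal sketch of in-place
percent-decoding. It is not gmid's actual parser: pct_decode and hexval
are hypothetical names, and rejecting %00 is an assumption of this
sketch. After decoding, "/foo%2Fbar" and "/foo/bar" are the same
string, which is exactly the ambiguity described above.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int
hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower(c) - 'a' + 10;
}

/*
 * Decode %XX sequences in place (hypothetical helper).
 * Returns 0 on success, -1 on a truncated, non-hex or NUL escape.
 */
int
pct_decode(char *s)
{
	char *out = s;
	int c;

	for (; *s != '\0'; ++s) {
		if (*s != '%') {
			*out++ = *s;
			continue;
		}
		/* short-circuit keeps us from reading past the NUL */
		if (!isxdigit((unsigned char)s[1]) ||
		    !isxdigit((unsigned char)s[2]))
			return -1;
		c = hexval((unsigned char)s[1]) * 16 +
		    hexval((unsigned char)s[2]);
		if (c == 0)
			return -1;	/* assumption: never accept %00 */
		*out++ = (char)c;
		s += 2;
	}
	*out = '\0';
	return 0;
}
```

Calling pct_decode() on a buffer holding "/foo%2Fbar" leaves "/foo/bar"
in it; a server that wanted to tell the two apart would have to check
for %2F before decoding.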
Before, the -d option only accepted absolute paths, and this wasn't
documented. Worse, the default value of "docs" didn't work. Now gmid
transforms all relative paths to absolute paths before going on.
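
One simple way to turn a relative -d argument into an absolute path is
to prefix it with the current working directory. The sketch below
assumes that approach; absolutify is a hypothetical name and the 4096
byte buffer is an arbitrary choice, neither is taken from gmid.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Write an absolute version of path into buf (hypothetical helper).
 * Absolute paths are copied as-is, relative ones are joined to the
 * current working directory.  Returns 0 on success, -1 on error or
 * truncation.
 */
int
absolutify(const char *path, char *buf, size_t len)
{
	char cwd[4096];		/* arbitrary size for this sketch */
	int r;

	if (path[0] == '/') {
		r = snprintf(buf, len, "%s", path);
	} else {
		if (getcwd(cwd, sizeof(cwd)) == NULL)
			return -1;
		r = snprintf(buf, len, "%s/%s", cwd, path);
	}
	if (r < 0 || (size_t)r >= len)
		return -1;
	return 0;
}
```

This does not resolve ".." or symlinks; realpath(3) would, but it
requires the directory to exist at the time of the call.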
Internally, gmid doesn’t care whether the client provided a
certificate, but now we pass that information to the CGI script in some
new environment variables.
Enhance the CGI scripting support so that scripts can take path
parameters. That is, a script at /cgi/foo is called when the request
path is /cgi/foo/bar/...
This commit also introduces some backward-incompatible changes, as the
default env variables set for the CGI script changed.
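
The path-parameter lookup can be sketched as a walk over the request
path, testing each prefix that ends at a '/' boundary until one names a
script. This is only an illustration: split_cgi_path, script_pred and
demo_is_script are hypothetical names, and the demo predicate stands in
for a real check against the file system (e.g. stat(2) plus an
executable-bit test).

```c
#include <stddef.h>
#include <string.h>

/* Returns non-zero if the first len bytes of prefix name a script. */
typedef int (*script_pred)(const char *prefix, size_t len);

/*
 * Return the length of the shortest prefix of path, cut at a '/' or at
 * the end of the string, accepted by is_script; -1 if there is none.
 * Everything after that prefix is the path parameter.
 */
long
split_cgi_path(const char *path, script_pred is_script)
{
	size_t i, len = strlen(path);

	for (i = 1; i <= len; ++i) {
		if (path[i] != '/' && path[i] != '\0')
			continue;
		if (is_script(path, i))
			return (long)i;
	}
	return -1;
}

/* Demo predicate: pretend the only script is /cgi/foo. */
int
demo_is_script(const char *prefix, size_t len)
{
	static const char script[] = "/cgi/foo";

	return len == sizeof(script) - 1 &&
	    memcmp(prefix, script, len) == 0;
}
```

For "/cgi/foo/bar/baz" the demo predicate makes the function return 8,
so the script is "/cgi/foo" and the remainder "/bar/baz" is what would
be handed to it as the path parameter.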
Change the meaning of the -x flag: now it takes a string and executes
CGI scripts only if they are inside a directory with the given name,
relative to the document root.
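
A possible reading of the new -x semantics is a prefix check on the
request path. The sketch below assumes that reading (the directory must
be the leading component(s) of the path); in_cgi_dir is a hypothetical
name and this is not necessarily how gmid performs the check.

```c
#include <string.h>

/*
 * Return 1 if path (relative to the document root) lies strictly
 * inside the directory dir, 0 otherwise.  Leading slashes on both
 * arguments and trailing slashes on dir are ignored.
 */
int
in_cgi_dir(const char *path, const char *dir)
{
	size_t n;

	while (*path == '/')
		path++;
	while (*dir == '/')
		dir++;
	n = strlen(dir);
	while (n > 0 && dir[n - 1] == '/')
		n--;
	if (n == 0)
		return 0;
	/* the match must end on a '/' so "cgi" doesn't match "cgi-bin" */
	return strncmp(path, dir, n) == 0 && path[n] == '/';
}
```

With dir set to "cgi", "/cgi/foo" is accepted while "/cgi-bin/foo" and
"/docs/cgi/foo" are not.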
This is a first try at implementing CGI scripting. The idea is that,
if CGI is explicitly enabled by the user, when a client requests an
executable file, instead of serving it to the client, that file will be
executed and its output fed to the client.
There are various pieces that are still lacking; the first that come
to mind are:
- performance: handle_cgi just loops, ignoring
  WANT_POLLIN/POLLOUT and blocking if the child process hasn’t
  output anything.
- we don’t parse query variables (yet)
- we need to set more variables in the child environment.
  Side question: is it better to set the variables using setenv() or
  by providing an explicit environment?
- document what environment the CGI script will get
- improve the horrible unveil/pledge(cgi ? …)
But now I can serve “hello-world”-tier scripts from gmid!
(I know, changing variable names AND introducing changes is better
done in more commits, but…)
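
On the setenv() vs explicit environment question, here is a sketch of
the second option: the child gets only the variables listed in envp,
and the parent reads its output from a pipe. This is not the gmid
implementation; run_with_env is a hypothetical name and the sketch
blocks on read(2), which is precisely the performance issue listed
above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run prog with an explicit environment and capture up to outlen - 1
 * bytes of its stdout in out (NUL-terminated).
 * Returns the child's exit status, or -1 on error.
 */
int
run_with_env(const char *prog, char *const argv[], char *const envp[],
    char *out, size_t outlen)
{
	int p[2], status;
	pid_t pid;
	size_t off = 0;
	ssize_t r;

	if (outlen == 0 || pipe(p) == -1)
		return -1;

	switch (pid = fork()) {
	case -1:
		close(p[0]);
		close(p[1]);
		return -1;
	case 0:
		/* child: stdout goes to the pipe, then replace the image */
		close(p[0]);
		if (dup2(p[1], STDOUT_FILENO) == -1)
			_exit(1);
		close(p[1]);
		execve(prog, argv, envp);
		_exit(1);	/* only reached if execve failed */
	}

	/* parent: read until EOF or until the buffer is full */
	close(p[1]);
	while (off < outlen - 1 &&
	    (r = read(p[0], out + off, outlen - 1 - off)) > 0)
		off += (size_t)r;
	out[off] = '\0';
	close(p[0]);

	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The explicit envp has the advantage that nothing from the server's own
environment leaks into the script by accident; setenv() in the child
before exec is shorter to write but inherits everything else.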
Added back an lseek that was missing. If we get TLS_WANT_POLL{IN,OUT},
we need to re-send that block, but we also need to rewind the file in
order to read(2) that chunk again. This doesn’t solve the corruption
when transferring big files, but reduces it. I still haven’t tracked
down the corruption :/
At the moment there is a hardcoded table that maps file extensions to
mime types. For the time being this can be OK, as I don’t even
currently serve all those types of files, but in the future I’d like to
let the user pass a file with the mapping, like
/usr/share/misc/mime.types on OpenBSD. However, even in this case, we
should hardcode text/gemini IMHO, since most mime.types listings don’t
have it yet.
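
A hardcoded table of this kind could look like the sketch below. The
entries, the mime_for name and the application/octet-stream fallback
are assumptions for illustration, not a copy of the table in gmid.

```c
#include <string.h>

struct mime_entry {
	const char *ext;
	const char *mime;
};

/* Illustrative entries only; terminated by a NULL extension. */
static const struct mime_entry mime_table[] = {
	{"gmi",		"text/gemini"},
	{"gemini",	"text/gemini"},
	{"txt",		"text/plain"},
	{"png",		"image/png"},
	{"jpg",		"image/jpeg"},
	{NULL,		NULL},
};

/* Look up the mime type for path by its extension. */
const char *
mime_for(const char *path)
{
	const char *ext, *slash;
	const struct mime_entry *e;

	ext = strrchr(path, '.');
	slash = strrchr(path, '/');
	/* no dot, or the last dot belongs to a directory name */
	if (ext == NULL || (slash != NULL && ext < slash))
		return "application/octet-stream";
	ext++;
	for (e = mime_table; e->ext != NULL; ++e)
		if (strcmp(e->ext, ext) == 0)
			return e->mime;
	return "application/octet-stream";
}
```

If a user-supplied mime.types file is added later, the same lookup
could consult it first and keep this table, text/gemini included, as
the fallback.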
We can handle up to MAX_USERS (64 by default) clients concurrently.
Now, given that we don’t support CGI, it’s not a big deal. Gemini
requests are small (up to 1024 bytes), and the replies from the
server are small too (one line plus the document, if any), all over
TLS obviously. (But even there, it’s lighter than HTTP because we don’t
need to send the whole chain for the certificate; see TOFU.)
Given all the above, this doesn’t really improve the performance in
the real world, but it’s nice to have. The main use case for this is
to prevent slow clients from blocking fast clients.
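
One common way to serve a bounded number of clients from a single
process is a fixed array of struct pollfd, where a negative fd marks a
free slot (poll(2) ignores such entries). The sketch below assumes that
design; init_slots, add_client and drop_client are hypothetical names,
and only MAX_USERS with its default of 64 comes from the text above.

```c
#include <poll.h>

#define MAX_USERS 64

/* Mark every slot as free. */
void
init_slots(struct pollfd *fds)
{
	int i;

	for (i = 0; i < MAX_USERS; ++i) {
		fds[i].fd = -1;
		fds[i].events = 0;
		fds[i].revents = 0;
	}
}

/*
 * Put fd in the first free slot and wait for it to become readable.
 * Returns the slot index, or -1 if all MAX_USERS slots are taken.
 */
int
add_client(struct pollfd *fds, int fd)
{
	int i;

	for (i = 0; i < MAX_USERS; ++i) {
		if (fds[i].fd == -1) {
			fds[i].fd = fd;
			fds[i].events = POLLIN;
			return i;
		}
	}
	return -1;
}

/* Free slot i; the caller is responsible for closing the fd. */
void
drop_client(struct pollfd *fds, int i)
{
	fds[i].fd = -1;
	fds[i].events = 0;
}
```

The main loop would then call poll(fds, MAX_USERS, -1) and service only
the slots whose revents is non-zero, so a client that is slow to send
or receive never holds up the others.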