Commit Graph

2 Commits

Author SHA1 Message Date
Omar Polo df6ca41da3
IRI support
This extends the URI parser so it supports full IRI (Internationalized
Resource Identifiers, RFC3987).  Some areas of it can/may be improved,
but here's a start.

Note: we assume UTF-8 encoded IRI.
2020-12-26 00:33:11 +01:00
Omar Polo 33d32d1fd6
implement a valid RFC3986 (URI) parser
Up until now I used a "poor man" approach: the uri parser is barely a
parser, it tries to extract the path from the request, with some minor
checking, and that's all.  This obviously is not RFC3986-compliant.

The new RFC3986 (URI) parser should be fully compliant.  It may accept
some invalid URI, but shouldn't reject or mis-parse valid URI.  (in
particular, the rule for the path is way more relaxed in this parser
than it is in the RFC text).

A difference with RFC3986 is that we don't even try to parse the
(optional) userinfo part of a URI: following the Gemini spec we treat
it as an error.

A further caveats is that %2F in the path part of the URI is
indistinguishable from a literal '/': this is NOT conforming, but due
to the scope and use of gmid, I don't see how treat a %2F sequence in
the path (reject the URI?).
2020-12-25 13:13:12 +01:00