144 Commits

Author SHA1 Message Date
Corey McCaffrey 0683074b8b Added scraper rule for TheOatmeal.com
The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".
2020-05-13 21:28:00 -07:00
Corey McCaffrey 8f6c07afd6 Added scraper rule for RayWenderlich.com
RayWenderlich.com is a popular community site for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.
2020-05-13 21:28:00 -07:00
Frédéric Guillot 619aa58fb3 Handle more invalid dates
Fixes #617
2020-04-25 20:15:18 -07:00
Frédéric Guillot 592151bdb6 Add support for Invidious
- Embed Invidious player for invidio.us feeds
- Add new rewrite rule to use Invidious player for YouTube feeds
2020-03-20 20:56:59 -07:00
Andrew Williams 9974e0f458 Addition of scraper rule for wdwnt.com
By default, fetching the original content for wdwnt.com returns a snippet of the comments section; this rule captures the article content instead.
2020-02-28 20:24:58 -08:00
Frédéric Guillot 997e9422eb Ignore enclosures without URL 2020-01-30 21:18:49 -08:00
Frédéric Guillot 61f0c8aa66 Allow application/xhtml+xml links as comments URL in Atom replies 2020-01-04 16:07:06 -08:00
Frédéric Guillot bf632fad2e Allow only absolute URLs in comments URL
Some feeds are using invalid URLs (random text).
2020-01-04 15:54:16 -08:00
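A minimal sketch of an "absolute URLs only" check for comments links, assuming it is enough to parse the value with net/url and require both a scheme and a host. The helper name isAbsoluteURL is hypothetical and this is not Miniflux's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// isAbsoluteURL is a hypothetical helper: it accepts a comments URL only when
// the value parses and has both a scheme and a host.
func isAbsoluteURL(link string) bool {
	u, err := url.Parse(link)
	if err != nil {
		return false
	}
	// URL.IsAbs only checks for a scheme; also requiring a host rejects values
	// such as "mailto:someone@example.com" (an assumption of this sketch).
	return u.IsAbs() && u.Host != ""
}

func main() {
	for _, link := range []string{
		"https://example.org/post/1#comments",
		"random text",
		"/post/1/comments",
	} {
		fmt.Printf("%q -> %v\n", link, isAbsoluteURL(link))
	}
}
```

Random text such as "random text" parses without error as a relative path, which is why the sketch checks the parsed result rather than relying on the parse error alone.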
Kebin Liu 8cebd985a2 Use internal XML workarounds to detect feed format 2020-01-02 22:19:15 -08:00
Frédéric Guillot ac3c936820 Make sure whitelisted URI schemes are handled properly by the sanitizer 2020-01-02 11:03:51 -08:00
Frédéric Guillot 3debf75eb9 Normalize URL query string before executing HTTP requests
- Make sure query string parameters are encoded
- Unlike the standard library, do not append an equal sign for query parameters with an empty value
- Strip URL fragments, as web browsers do
2019-12-26 15:56:59 -08:00
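The three normalization rules above can be sketched with net/url. This is an illustration under stated assumptions, not the Miniflux implementation: the helper normalizeURL is hypothetical, and sorting the keys is this sketch's own choice (it mirrors what url.Values.Encode does).

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// normalizeURL is a hypothetical helper applying the three rules from the
// commit message; it is not Miniflux's actual code.
func normalizeURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}

	// Browsers never send the fragment to the server, so drop it.
	u.Fragment = ""

	values := u.Query()
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	// Sort keys for a deterministic result, as url.Values.Encode does.
	sort.Strings(keys)

	var parts []string
	for _, k := range keys {
		for _, v := range values[k] {
			if v == "" {
				// Unlike url.Values.Encode, emit "key" rather than "key=".
				parts = append(parts, url.QueryEscape(k))
			} else {
				// Re-encode both key and value.
				parts = append(parts, url.QueryEscape(k)+"="+url.QueryEscape(v))
			}
		}
	}
	u.RawQuery = strings.Join(parts, "&")

	return u.String(), nil
}

func main() {
	normalized, err := normalizeURL("https://example.org/feed?tag=go+lang&b=2&empty#top")
	if err != nil {
		panic(err)
	}
	fmt.Println(normalized) // https://example.org/feed?b=2&empty&tag=go+lang
}
```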
Frédéric Guillot 200b1c304b Improve Dublin Core support for RDF feeds 2019-12-23 14:45:58 -08:00
Frédéric Guillot 1b33bb3d1c Improve Podcast support (iTunes and Google Play feeds)
- Add support for Google Play XML namespace
- Improve existing iTunes namespace implementation
2019-12-23 13:51:42 -08:00
Frédéric Guillot 33fdb2c489 Add support for Atom 0.3 2019-12-22 22:42:00 -08:00
Frédéric Guillot cfb6ddfcea Add support for Atom 'replies' link relation
Show comments URL for Atom feeds as per RFC 4685.
See https://tools.ietf.org/html/rfc4685#section-4

Note that only the first link with type "text/html" is taken into consideration.
2019-12-22 18:03:04 -08:00
cinput 8e1ed8bef3 Return outer HTML when scraping elements 2019-12-21 21:18:31 -08:00
somini 30f22fbd78 Update scraper rule for "Le Monde" 2019-12-19 18:35:29 -08:00
Jebbs a155ab6deb Filter valid XML characters for UTF-8 XML documents before decoding
This change should reduce "illegal character code" XML errors.
2019-12-19 18:31:52 -08:00
Frédéric Guillot a4ebb33cd5 Trim spaces for RDF entry links 2019-12-01 15:06:01 -08:00
Frédéric Guillot 120d6ec7d8 Do not rewrite YouTube description twice in "add_youtube_video" rule
This is already done earlier for <media:description>.
2019-11-30 22:56:06 -08:00
Frédéric Guillot 69aa650203 Add the possibility to add rules during feed creation 2019-11-29 11:27:58 -08:00
Frédéric Guillot 912a98788e Add support for media elements in Atom feeds 2019-11-28 23:55:40 -08:00
Frédéric Guillot f90e9dfab0 Add support for media elements in RSS 2 feeds 2019-11-28 21:33:32 -08:00
Frédéric Guillot c43c9458a9 Add rewrite functions: convert_text_link and nl2br 2019-11-28 21:33:12 -08:00
Neo Ng 90064a8cf0 Update scraper rule for openingsource.org 2019-11-28 19:40:26 -08:00
Tony Wang 2eb2441f2b Improve XML decoder to remove illegal characters 2019-10-22 20:32:35 -07:00
Tony Wang 5517eebafe Add new formats to date parser 2019-10-20 09:52:18 -07:00
Frédéric Guillot 36d7732234 Disable strict XML parsing
This change should improve parsing of broken XML feeds.

See https://golang.org/pkg/encoding/xml/#Decoder
2019-09-18 22:45:56 -07:00
Frédéric Guillot 934385ff55 Replace Travis with GitHub Actions 2019-09-15 11:48:15 -07:00
Frédéric Guillot 8d8f78241d Add native lazy loading for images and iframes
This feature is available only in Chrome >= 76 for now.

See https://web.dev/native-lazy-loading
2019-09-10 21:22:19 -07:00
Peter De Wachter b6f3160dbc add_mailto_subject: New rewrite function
Dinosaur Comics (qwantz.com) likes to hide jokes in mailto: links, but
Miniflux's sanitizer strips those out.
2019-08-19 19:42:47 -07:00
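The core of an add_mailto_subject style rewrite is reading the subject parameter out of a mailto: link so it can be shown as visible text. The sketch below covers only that extraction step using net/url; the helper mailtoSubject is hypothetical, and the HTML rewriting itself is not shown.

```go
package main

import (
	"fmt"
	"net/url"
)

// mailtoSubject is a hypothetical helper: it returns the decoded subject of a
// mailto: link, and false when the link is not mailto or has no subject.
func mailtoSubject(href string) (string, bool) {
	u, err := url.Parse(href)
	if err != nil || u.Scheme != "mailto" {
		return "", false
	}
	// For "mailto:addr?subject=...", net/url stores the address in Opaque and
	// the parameters in RawQuery, so Query() works as for HTTP URLs.
	subject := u.Query().Get("subject")
	return subject, subject != ""
}

func main() {
	subject, ok := mailtoSubject("mailto:someone@example.com?subject=a%20hidden%20joke")
	fmt.Println(subject, ok) // a hidden joke true
}
```

Any subject extracted this way is untrusted text and would need HTML escaping before being inserted into the entry content.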
Frédéric Guillot ac45307da6 Add test case for parsing HTML entities 2019-08-15 21:42:13 -07:00
Peter De Wachter ea2b6e3608 addImageTitle: Fix HTML injection
This rewrite rule would change this:

    <img title="<foo>">

to this:

    <figure><img><figcaption><foo></figcaption></figure>

The image title needs to be properly escaped.
2019-08-15 21:39:41 -07:00
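The fix amounts to escaping the title before placing it in the generated markup. A minimal sketch with html.EscapeString, assuming a hypothetical helper figureWithCaption rather than the actual addImageTitle code:

```go
package main

import (
	"fmt"
	"html"
)

// figureWithCaption is a hypothetical helper: both the image URL and the
// title are escaped, so a title like "<foo>" is rendered as text, not markup.
func figureWithCaption(src, title string) string {
	return fmt.Sprintf(
		`<figure><img src="%s"><figcaption>%s</figcaption></figure>`,
		html.EscapeString(src),
		html.EscapeString(title),
	)
}

func main() {
	fmt.Println(figureWithCaption("https://example.org/comic.png", "<foo>"))
	// <figure><img src="https://example.org/comic.png"><figcaption>&lt;foo&gt;</figcaption></figure>
}
```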
Peter De Wachter 3a39d110f0 Accept HTML entities when parsing XML
Every once in a while, one of my feeds would throw an XML parse error
because it used `&nbsp;` or some other HTML entity. I feel Miniflux
should be lenient here, and Go already has a handy hook to make this
work.
2019-08-15 21:26:07 -07:00
Ilya Glotov c840268678 Sort feed categories before serialization
Add a function to normalize feeds and their categories.
The test ensures that the order is correct.
2019-07-05 20:34:49 +03:00
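Sorting matters here because Go randomizes map iteration order, so categories collected in a map would otherwise be serialized in a different order from run to run. A sketch under stated assumptions: the subscription type and groupByCategory helper are hypothetical, not the function added in this commit.

```go
package main

import (
	"fmt"
	"sort"
)

// subscription is a hypothetical stand-in for a feed and its category name.
type subscription struct {
	Title    string
	Category string
}

// groupByCategory groups subscriptions by category and returns the category
// names sorted alphabetically, giving a stable serialization order.
func groupByCategory(subs []subscription) ([]string, map[string][]subscription) {
	groups := make(map[string][]subscription)
	for _, s := range subs {
		groups[s.Category] = append(groups[s.Category], s)
	}

	names := make([]string, 0, len(groups))
	for name := range groups {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, groups
}

func main() {
	names, groups := groupByCategory([]subscription{
		{Title: "Dinosaur Comics", Category: "Comics"},
		{Title: "BBC News", Category: "News"},
		{Title: "The Oatmeal", Category: "Comics"},
	})
	for _, name := range names {
		fmt.Println(name, len(groups[name]))
	}
	// Comics 2
	// News 1
}
```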
Frédéric Guillot 129f1bf3da Add support for OPML v1 import 2019-03-26 20:09:31 -07:00
Jeremy Apthorp 304b43cb30 Add 'allow-popups' to iframe sandbox permissions 2019-03-26 18:26:56 -07:00
Frédéric Guillot 6764a420b0 Make parser compatible with Go 1.12
See changes in strings.Map(): https://golang.org/doc/go1.12#strings
2019-02-28 21:23:33 -08:00
Frédéric Guillot f3fc8b7072 Use feed ID instead of user ID to check entry URLs presence 2019-02-28 20:43:33 -08:00
Frédéric Guillot ed6ae7e0d2 Prefer the published date for Atom feeds
YouTube feeds use the published date for the original creation date.
2019-01-29 20:01:36 -08:00
Peter De Wachter 0cdcec10ca More robust Atom text handling
Miniflux couldn't deal with XHTML Summary elements.

- Make Summary an 'atomContent' field
- Define an atomContentToString function rather than inlining it three times
- Also properly escape special characters in plain text fields.
2019-01-07 17:55:02 -08:00
Frédéric Guillot 56efd2eb3f Add workaround for non GMT dates (RFC822, RFC850, and RFC1123)
RFC822, RFC850, and RFC1123 dates are supposed to always be in GMT.

This is a workaround for dates defined in the PST timezone.
2018-12-26 20:24:38 -08:00
Frédéric Guillot 012138179c Add function storage.UpdateFeedError() 2018-12-15 13:04:38 -08:00
Tom Matthews 8b40778ee1 Add BBC News scraping rule 2018-12-13 20:25:30 -08:00
Frederic Guillot 61bfb3cfa8 Make password prompt compatible with Windows 2018-12-09 17:44:33 -08:00
Frédéric Guillot 1bc8535dbb Move image proxy filter to template functions 2018-12-02 21:09:53 -08:00
Frédéric Guillot 6f5d93cbbe Update scraper rule for lemonde.fr 2018-12-02 20:53:22 -08:00
Frédéric Guillot 311a133ab8 Refactor manual entry scraper 2018-12-02 20:51:06 -08:00
mapl e47188eab2 Update scraper rule for heise.de 2018-12-01 11:49:30 -08:00
Frédéric Guillot 487852f07e Replace daemon and scheduler package with service package 2018-11-11 15:32:48 -08:00