Commit Graph

44 Commits

Author SHA1 Message Date
Frédéric Guillot cecab91298 Fix some linter issues 2022-08-08 22:06:38 -07:00
Frédéric Guillot 13fa08ad39 Handle Atom links with a text/html type defined 2022-07-31 17:43:03 -07:00
lf94 fa8431c5c6 Try to use outermost element text when title is empty 2022-04-13 21:51:54 -07:00
Frédéric Guillot 1eb01b39e7 Use truncated entry description as title if unavailable 2022-03-04 17:10:32 -08:00
Frédéric Guillot c9e0f0b3e4 Do not fallback to InnerXML if XHTML title is empty 2022-03-04 14:28:56 -08:00
Adrian Smith cc3e65dd3c Handle atom feed with space around CDATA
Trim space around CDATA elements before extracting the CharData.

This problem was discovered when reading https://www.sethvargo.com/feed.xml.
Title and Summary fields have newlines and space between the <title>
element and the CDATA element. e.g.

  <title>
    <![CDATA[Entry title here]]>
  </title>

This meant the title of the feed was coming into MiniFlux as,
  <![CDATA[Entry title here]]>
2022-01-17 15:25:22 -08:00
Frédéric Guillot f18ded6117 Add support for multiple authors in Atom feeds 2022-01-14 20:20:55 -08:00
Frédéric Guillot 6e2e2d1665 Setup golangci-lint Github Action 2021-03-22 21:34:48 -07:00
Frédéric Guillot e60e0ba3c4 Add workaround to handle some invalid dates 2021-03-21 10:52:27 -07:00
Frédéric Guillot 5877048749 Improve handling of Atom text content with CDATA 2021-03-20 20:47:35 -07:00
Frédéric Guillot c8c1f05328 Add better support of Atom text constructs
- Note that Miniflux does not render entry title with HTML tags as of now
- Omit XHTML div element because it should not be part of the content
2021-03-19 22:05:00 -07:00
Frédéric Guillot 14888f1cb8 Fix incorrect parsing of Atom entry content of type HTML 2021-03-18 21:43:59 -07:00
Frédéric Guillot 04f9c456d5 Handle entry title with double encoded entities in Atom feeds 2021-02-14 11:19:21 -08:00
Frédéric Guillot 291bf96d15 Do not strip tags for entry title
Some technical blogs have titles like "</some-title>" or "This is some <code>source code</code>".

Miniflux was removing these elements which prevent rendering the title correctly.
2021-01-03 11:44:07 -08:00
Frédéric Guillot f722fd1208 Handle invalid feeds with relative URLs 2020-12-02 20:58:18 -08:00
Frédéric Guillot a108cb7808 Handle various invalid date 2020-11-16 21:37:33 -08:00
Frédéric Guillot 4f358aa0f3 Do not escape HTML for Atom 1.0 text content during parsing
Avoid encoding single quotes to HTML entities (&#39;).

Feed contents are sanitized after parsing.
2020-10-30 23:41:33 -07:00
Frédéric Guillot 997e9422eb Ignore enclosures without URL 2020-01-30 21:18:49 -08:00
Frédéric Guillot 61f0c8aa66 Allow application/xhtml+xml links as comments URL in Atom replies 2020-01-04 16:07:06 -08:00
Frédéric Guillot bf632fad2e Allow only absolute URLs in comments URL
Some feeds are using invalid URLs (random text).
2020-01-04 15:54:16 -08:00
Frédéric Guillot 33fdb2c489 Add support for Atom 0.3 2019-12-22 22:42:00 -08:00
Frédéric Guillot cfb6ddfcea Add support for Atom 'replies' link relation
Show comments URL for Atom feeds as per RFC 4685.
See https://tools.ietf.org/html/rfc4685#section-4

Note that only the first link with type "text/html" is taken into consideration.
2019-12-22 18:03:04 -08:00
Frédéric Guillot 912a98788e Add support of media elements for Atom feeds 2019-11-28 23:55:40 -08:00
Tony Wang 2eb2441f2b Improve XML decoder to remove illegal characters 2019-10-22 20:32:35 -07:00
Frédéric Guillot 36d7732234 Disable strict XML parsing
This change should improve parsing of broken XML feeds.

See https://golang.org/pkg/encoding/xml/#Decoder
2019-09-18 22:45:56 -07:00
Frédéric Guillot ac45307da6 Add test case for parsing HTML entities 2019-08-15 21:42:13 -07:00
Peter De Wachter 3a39d110f0 Accept HTML entities when parsing XML
Every once in a while, one of my feeds would throw an XML parse error
because it used `&nbsp;` or some other HTML entity. I feel Miniflux
should be lenient here, and Go already has a handy hook to make this
work.
2019-08-15 21:26:07 -07:00
Frédéric Guillot ed6ae7e0d2 Use preferably the published date for Atom feeds
YouTube feeds use the published date for the original creation date.
2019-01-29 20:01:36 -08:00
Peter De Wachter 0cdcec10ca More robust Atom text handling
Miniflux couldn't deal with XHTML Summary elements.

- Make Summary an 'atomContent' field
- Define an atomContentToString function rather than inling it three times
- Also properly escape special characters in plain text fields.
2019-01-07 17:55:02 -08:00
Frédéric Guillot 9dc38a0803 Add missing package descriptions for GoDoc 2018-10-08 17:32:17 -07:00
Frédéric Guillot dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
neepl 5365f31e90 Add support for published tag in Atom feeds 2018-07-17 21:52:05 -07:00
Frédéric Guillot 482785c5e6 Convert enclosure size field to bigint 2018-03-14 20:09:06 -07:00
Frédéric Guillot f110384f11 Improve parser error messages 2018-02-27 21:19:59 -08:00
Frédéric Guillot 953d0a2dc0 Support localized feed errors generated by background workers 2018-02-27 21:08:32 -08:00
Frédéric Guillot 9292d5d604 Handle Atom feeds with HTML title 2018-02-17 12:21:58 -08:00
Frédéric Guillot 713b38e34c Handle more encoding edge cases
- Feeds with charset specified only in Content-Type header and not in XML document
- Feeds with charset specified in both places
- Feeds with charset specified only in XML document and not in HTTP header
2018-01-20 13:25:21 -08:00
Frédéric Guillot c39f2e1a8d Rename helper packages 2018-01-02 19:15:08 -08:00
Frédéric Guillot 1d8193b892 Add logger 2017-12-15 18:55:57 -08:00
Frédéric Guillot 827683ab59 Make sure that item URL are absolute 2017-12-13 20:16:15 -08:00
Frédéric Guillot 84d912c979 Rewrite imports 2017-12-12 21:48:13 -08:00
Frédéric Guillot 33445e5b68 Add the possibility to define rewrite rules for each feed 2017-12-11 22:16:32 -08:00
Frédéric Guillot 2b641cc224 Improve feed parsers 2017-11-22 14:52:31 -08:00
Frédéric Guillot d5838b6734 Move feed parsers packages in reader package 2017-11-20 19:17:04 -08:00