miniflux-v2/internal/reader
Ztec e54825bf02 Improve YouTube page feed detection
To be more resilient to YouTube URL variations and
to address this feature request: https://github.com/miniflux/v2/issues/2628
I've slightly reworked the way YouTube feed extraction is done.

I've kept all the `FindSubscriptionsFromYouTube*` functions in order
to keep all the existing unit tests as-is, ensuring little to no
regressions. As a result, `youtubeURLIDExtractor` is called twice.
In my opinion, that is a small performance penalty for peace of mind.

`youtubeURLIDExtractor` is written so that only one kind
of page can be detected at a time. This means I can
solve the "video in a playlist" feature request
by prioritizing the playlist ID over the video ID.

Also, using `url.Parse()` to get the IDs makes detection more robust
to URL mangling and variations. The most common variation
is the `t=42` parameter, which starts playback
at a given position. Previously, this kind of URL
would not be detected as a "YouTube URL".
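
To illustrate the two points above (one page kind per URL, playlist ID taking priority over video ID, and extra parameters such as `t=42` being tolerated), here is a minimal sketch. It is not the actual `youtubeURLIDExtractor`: the name `extractYouTubeID`, its return values, and the exact paths it checks are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractYouTubeID is a hypothetical stand-in, not the real
// youtubeURLIDExtractor. It reports at most one page kind per URL and
// returns empty strings when the URL is not a recognized YouTube page.
func extractYouTubeID(rawURL string) (string, string) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "" // unparsable URL: treated as "not YouTube"
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	if host != "youtube.com" {
		return "", ""
	}
	query := u.Query()
	switch {
	// The playlist check comes first, so a video opened inside a
	// playlist is reported as a playlist.
	case (u.Path == "/watch" || u.Path == "/playlist") && query.Get("list") != "":
		return "playlist", query.Get("list")
	// Other parameters such as t=42 are simply not looked at.
	case u.Path == "/watch" && query.Get("v") != "":
		return "video", query.Get("v")
	case strings.HasPrefix(u.Path, "/channel/"):
		channelID, _, _ := strings.Cut(strings.TrimPrefix(u.Path, "/channel/"), "/")
		if channelID != "" {
			return "channel", channelID
		}
	}
	return "", ""
}

func main() {
	for _, raw := range []string{
		"https://www.youtube.com/watch?v=abc123&t=42",
		"https://www.youtube.com/watch?v=abc123&list=PLxyz",
		"https://www.youtube.com/channel/UC123/videos",
	} {
		kind, id := extractYouTubeID(raw)
		fmt.Printf("%s -> kind=%q id=%q\n", raw, kind, id)
	}
}
```

Because the query string is parsed rather than matched as a whole, the order and number of parameters no longer matter in this sketch.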

I deliberately ignored the URL parsing error
to keep the previous behavior (skip the YouTube analysis and continue with the other analyses).
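
A rough sketch of that fall-through behavior, under assumptions: the `finder` type, `youtubeFinder`, `fallbackFinder`, and `firstMatch` are hypothetical names that do not come from the miniflux code base. They only show how swallowing the parse error lets the next analysis run.

```go
package main

import (
	"fmt"
	"net/url"
)

// finder is a hypothetical detection step. It returns a feed URL, or ""
// when it has nothing to report for the given page.
type finder func(pageURL string) string

// youtubeFinder skips itself when the URL cannot be parsed instead of
// surfacing an error that would abort the whole detection.
func youtubeFinder(pageURL string) string {
	u, err := url.Parse(pageURL)
	if err != nil {
		return "" // parse error deliberately ignored
	}
	if u.Hostname() == "www.youtube.com" && u.Path == "/playlist" {
		if list := u.Query().Get("list"); list != "" {
			return "https://www.youtube.com/feeds/videos.xml?playlist_id=" + url.QueryEscape(list)
		}
	}
	return ""
}

// fallbackFinder stands in for the other, non-YouTube analyses.
func fallbackFinder(pageURL string) string {
	return "fallback:" + pageURL
}

// firstMatch runs the finders in order and returns the first non-empty result.
func firstMatch(pageURL string, finders ...finder) string {
	for _, find := range finders {
		if feed := find(pageURL); feed != "" {
			return feed
		}
	}
	return ""
}

func main() {
	fmt.Println(firstMatch("https://www.youtube.com/playlist?list=PL123", youtubeFinder, fallbackFinder))
	fmt.Println(firstMatch("http://[::1", youtubeFinder, fallbackFinder))
}
```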

I also tried to keep debug logs the same as before as much as I could.

I manually tested all the YouTube cases (video, channel, playlist)
and they all work as expected except for the video case. But that one
does not work on main either. The `meta` HTML tag that was searched for
does not seem to exist anymore.

fix: #2628
2024-06-13 20:18:47 -07:00
atom Ensure enclosure URLs are always absolute 2024-03-19 21:57:46 -07:00
date Small refactoring of internal/reader/date/parser.go 2024-02-26 18:08:04 -08:00
dublincore Refactor RDF parser to use an adapter 2024-03-12 20:54:05 -07:00
encoding Inline a one-liner function 2024-03-20 17:21:30 -07:00
fetcher http/response: add brotli compression support 2024-04-19 12:16:49 -07:00
googleplay Refactor RSS Parser to use an adapter 2024-03-13 21:25:09 -07:00
handler Fix force refresh 2024-03-15 19:42:09 -07:00
icon Inline a one-liner function 2024-03-20 17:21:30 -07:00
itunes Refactor RSS Parser to use an adapter 2024-03-13 21:25:09 -07:00
json Ensure enclosure URLs are always absolute 2024-03-19 21:57:46 -07:00
media Minor simplification of internal/reader/media/media.go 2024-03-18 16:09:32 -07:00
opml Add description field to feed settings 2024-05-06 15:40:36 -07:00
parser Refactor Atom parser to use an adapter 2024-03-15 17:27:16 -07:00
processor reader/processor: error out for improper rewrite regexp 2024-06-01 10:37:02 -07:00
rdf Remove trailing space in SiteURL and FeedURL 2024-03-18 17:51:06 -07:00
readability Enable go-critic linter and fix various issues detected 2024-03-17 13:52:34 -07:00
readingtime reader/readingtime: fix incorrect package name 2024-05-21 18:12:24 -07:00
rewrite Update theverge.com rewrite rule: fix duplicate image 2024-06-10 21:08:59 -07:00
rss reader/rss: don't add empty tags to RSS items 2024-03-24 19:46:56 -07:00
sanitizer Enable go-critic linter and fix various issues detected 2024-03-17 13:52:34 -07:00
scraper Add pitchfork.com scraping rule 2024-06-10 21:08:59 -07:00
subscription Improve YouTube page feed detection 2024-06-13 20:18:47 -07:00
xml Enable go-critic linter and fix various issues detected 2024-03-17 13:52:34 -07:00