33 Commits

Author SHA1 Message Date
Marie Ramlow 48acd1feca Add rewrite and scraper rules for blog.cloudflare.com 2023-02-05 21:01:42 -08:00
Davide Masserut 690d66ce0b Update scraping rules for ilpost.it 2022-12-27 13:33:41 -08:00
Davide Masserut ef312ef770 Update scraping rule for ilpost.it 2022-12-16 15:07:10 -08:00
Davide Masserut c0bed53b42 Add scraping rule for ilpost.it 2022-12-15 19:53:12 -08:00
jgbresson 7f6ce16d85 Add scraping rules for theverge.com 2022-10-16 11:58:35 -07:00
Adam B 4d847c6a74 Add scraping rule for royalroad.com
This is what I use for several stories I follow, and I thought it might be useful to other miniflux users.
2022-08-17 19:25:39 -07:00
Owen Valentine f404ddde91 Add swordscomic.com 2022-08-17 19:23:29 -07:00
Owen Valentine c8a3d953cf Add smbc-comics.com 2022-08-17 19:23:29 -07:00
Owen Valentine f851ecac78 Sort alphabetically 2022-08-17 19:23:29 -07:00
Gabe Cook bd1dc3149e Add explosm.net scraper rule 2022-07-30 20:10:52 -07:00
Jouni K. Seppänen bb0d2bf675 Add Youtube videos in Quanta articles
Some articles (especially the recent year-in-review ones) include a YouTube
video. The server-side rendered articles do not include the YouTube iframe,
but they do have a script that looks like:

    <script type="text/javascript" data-reactid="6">
      window.__APOLLO_STATE__ = {
        ...
          youtube_id: "9uASADiYe_8",

We add a reformatting function that tries to detect obvious JavaScript code
that has a field or variable called youtube_id with an 11-character
double-quoted value, and adds the referenced YouTube videos at the beginning of
the article. This is slightly more general than needed for Quanta, in the hope
that it could be useful for similar sites.
2022-01-03 10:10:13 -08:00
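The detection described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `prependYouTubeVideos`, the exact regular expression, and the bare `iframe` markup are all assumptions made for the example.

```go
package main

import (
	"regexp"
	"strings"
)

// youtubeIDPattern is a hypothetical pattern: it looks for a field or
// variable named youtube_id, followed by ':' or '=', and an 11-character
// double-quoted value.
var youtubeIDPattern = regexp.MustCompile(`\byoutube_id"?\s*[:=]\s*"([A-Za-z0-9_-]{11})"`)

// prependYouTubeVideos scans the raw page source for youtube_id values and
// puts one embed iframe per distinct ID in front of the article content.
// The iframe markup is an assumption chosen for illustration.
func prependYouTubeVideos(pageHTML, articleHTML string) string {
	matches := youtubeIDPattern.FindAllStringSubmatch(pageHTML, -1)
	if len(matches) == 0 {
		return articleHTML
	}

	var b strings.Builder
	seen := make(map[string]bool)
	for _, m := range matches {
		id := m[1]
		if seen[id] {
			continue // skip IDs that appear more than once in the script
		}
		seen[id] = true
		b.WriteString(`<iframe src="https://www.youtube.com/embed/` + id + `"></iframe>`)
	}
	b.WriteString(articleHTML)
	return b.String()
}
```

Matching on the field name rather than on Quanta-specific markup keeps the sketch as general as the commit message intends, at the cost of possible false positives on unrelated scripts.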
Jouni K. Seppänen dcf87bd642 Add scrape and rewrite rules for quantamagazine
This is a somewhat complex React site so the rules could be a little fragile.
Text content seems to be always inside .outer--content, and most h6 elements
are fluff like "read later" or pointers to other articles. However, h6.byline
and h6.post__title__kicker are relevant to the current article.

Figure captions are sometimes inside both figure and div.outer--content
elements, sometimes only inside figure, so take both and remove the
intersection.

The figure elements sometimes contain multiple copies of images or
videos, and we just take them all. Math articles seem to use MathJax,
which we don't add.
2022-01-03 10:10:13 -08:00
Jouni K. Seppänen 2fedd8f234 Add scraper rule for ikiwiki.iki.fi
Feed: https://ikiwiki.iki.fi/feed.php?linkto=current&ns=uutiset%3Ablog&num=5

Example page: https://ikiwiki.iki.fi/uutiset/blog/20210923100421viiveita

(To clarify, I'm not a representative of iki.fi although I have an email address in the domain. This is a nonprofit association that offers email forwarding addresses, and the RSS feed in question contains news for their members.)
2021-12-27 20:51:37 -08:00
Frédéric Guillot b7c229f30f Update scraper rule for theregister.com 2021-08-16 20:04:02 -07:00
Frédéric Guillot 31435ef83e Add rewrite rule to fix Medium.com images 2020-09-29 22:27:32 -07:00
Manuel Müller ca918bc7e3 Added scraper rule for dilbert.com and turnoff.us 2020-06-10 20:15:46 -07:00
Corey McCaffrey 25d4b9fc0c Added scraper rule for financialsamurai.com
The default rule results in blank content.
2020-05-24 13:29:28 -07:00
Corey McCaffrey 0683074b8b Added scraper rule for TheOatmeal.com
The default rule does not show the comic posted to the feed. The comic image is in a div with id "comic".
2020-05-13 21:28:00 -07:00
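As an illustration of what a per-site scraper rule like the one above amounts to, here is a minimal sketch that maps a domain to a CSS selector. The map name `scraperRules`, the helper `selectorFor`, and the selector string `div#comic` are assumptions derived from the commit description (a div with id "comic"), not the rule as committed.

```go
package main

import "strings"

// scraperRules is a hypothetical rule table: site domain -> CSS selector
// for the element that holds the real content.
var scraperRules = map[string]string{
	"theoatmeal.com": "div#comic", // comic image lives in <div id="comic">
}

// selectorFor returns the selector registered for a host, ignoring case
// and a leading "www." prefix. The second result reports whether a rule
// exists; callers would fall back to the default extraction otherwise.
func selectorFor(host string) (string, bool) {
	host = strings.TrimPrefix(strings.ToLower(host), "www.")
	selector, ok := scraperRules[host]
	return selector, ok
}
```

With this sketch, `selectorFor("www.TheOatmeal.com")` would find the `div#comic` rule, while a host without an entry reports `false`.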
Corey McCaffrey 8f6c07afd6 Added scraper rule for RayWenderlich.com
RayWenderlich.com is a popular community for iOS and Android developers. The default rule results in "GROUP GROUP GROUP GROUP…" instead of the content posted on the blog.
2020-05-13 21:28:00 -07:00
Andrew Williams 9974e0f458 Addition of scraper rule for wdwnt.com
By default, fetching original content for wdwnt.com results in a snippet of the comments section. This rule captures the article content instead.
2020-02-28 20:24:58 -08:00
somini 30f22fbd78 Update scraper rule for "Le Monde" 2019-12-19 18:35:29 -08:00
Neo Ng 90064a8cf0 Update scraper rule for openingsource.org 2019-11-28 19:40:26 -08:00
Tom Matthews 8b40778ee1 Add BBC News scraping rule 2018-12-13 20:25:30 -08:00
Frédéric Guillot 6f5d93cbbe Update scraper rule for lemonde.fr 2018-12-02 20:53:22 -08:00
mapl e47188eab2 Update scraper rule for heise.de 2018-12-01 11:49:30 -08:00
Frédéric Guillot df2bebaf3d Update scraper rule for heise.de 2018-08-25 10:33:18 -07:00
Frédéric Guillot dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
Frédéric Guillot 1d7fe892e1 Add scraper rule for darkreading.com 2018-01-06 13:25:12 -08:00
Frédéric Guillot 48aa0d07ef Add more scraper rules 2018-01-04 19:32:24 -08:00
Frédéric Guillot c454f67037 Add scraper rules for version2.dk and ing.dk 2017-12-27 19:44:23 -08:00
Frédéric Guillot d4839b5597 Add more scraper rules 2017-12-27 13:36:07 -08:00
Frédéric Guillot c6d9eb3614 Improve content scraper 2017-12-13 21:30:40 -08:00
Frédéric Guillot 87ccad5c7f Add scraper rules 2017-12-10 20:51:04 -08:00