Commit Graph

1665 Commits

Author SHA1 Message Date
jvoisin 347740dce1 Speed up removeUnlikelyCandidates
`.Not` returns a brand new Selection, copied element by element.
2024-02-29 19:38:43 -08:00
jvoisin ab85d4d678 Improve EstimateReadingTime's speed by a factor 7
- Refactorise the tests and add some
- Use 250 signs instead of the whole text
- Only check for Korean, Chinese and Japanese script
- Add a benchmark
- Use a more idiomatic control flow

```console
$ # main branch
$ go test -bench=.
goos: linux
goarch: amd64
pkg: miniflux.app/v2/internal/reader/readingtime
BenchmarkEstimateReadingTime-12              267           4821268 ns/op
PASS
ok      miniflux.app/v2/internal/reader/readingtime     1.754s
$ # speed_up_reading_time branch
$ go test -bench=.
goos: linux
goarch: amd64
pkg: miniflux.app/v2/internal/reader/readingtime
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
BenchmarkEstimateReadingTime-12             1941            653312 ns/op
PASS
ok      miniflux.app/v2/internal/reader/readingtime     1.342s
$
```
2024-02-29 19:24:15 -08:00
jvoisin 31ac62f410 Don't compute reading-time when unused
If the user doesn't display reading times, there is no need to compute them.
This should speed things up a bit, since `whatlanggo.Detect` is abysmally slow.
2024-02-29 19:14:17 -08:00
Frédéric Guillot 97765b93a9 Revert "Minor internal/reader/readability/readability.go speedup"
This reverts commit 4db138d4b8.

```
panic: runtime error: index out of range [-1]

goroutine 49 [running]:
miniflux.app/v2/internal/reader/readability.getArticle.func1(0x8?, 0xc000b56570)
        /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:120 +0x2ac
github.com/PuerkitoBio/goquery.(*Selection).Each(0xc000b56510, 0xc000892fa8)
        /home/fred/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.9.0/iteration.go:10 +0x62
miniflux.app/v2/internal/reader/readability.getArticle(0xc00044f1f0, 0xc000a04a50)
        /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:101 +0x15d
miniflux.app/v2/internal/reader/readability.ExtractContent({0x1005d00?, 0xc0001522d0?})
        /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:91 +0x211
miniflux.app/v2/internal/reader/scraper.ScrapeWebsite(0xc000893688?, {0xc0007ce720, 0x54}, {0x0, 0x0})
        /home/fred/repos/miniflux/v2/internal/reader/scraper/scraper.go:63 +0x859
miniflux.app/v2/internal/reader/processor.ProcessFeedEntries(0xc000133188, 0xc000502c40, 0xc0003e6360, 0x0)
        /home/fred/repos/miniflux/v2/internal/reader/processor/processor.go:77 +0x8ea
miniflux.app/v2/internal/reader/handler.RefreshFeed(0xc000133188, 0x10cf, 0x52d5c, 0x0)
        /home/fred/repos/miniflux/v2/internal/reader/handler/handler.go:301 +0x1485
miniflux.app/v2/internal/cli.refreshFeeds.func1(0x0)
        /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:59 +0x2d7
created by miniflux.app/v2/internal/cli.refreshFeeds in goroutine 1
        /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:50 +0x5d5
```
2024-02-29 19:06:03 -08:00
dependabot[bot] f858ad5f26 Bump github.com/PuerkitoBio/goquery from 1.9.0 to 1.9.1
Bumps [github.com/PuerkitoBio/goquery](https://github.com/PuerkitoBio/goquery) from 1.9.0 to 1.9.1.
- [Release notes](https://github.com/PuerkitoBio/goquery/releases)
- [Commits](https://github.com/PuerkitoBio/goquery/compare/v1.9.0...v1.9.1)

---
updated-dependencies:
- dependency-name: github.com/PuerkitoBio/goquery
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-29 18:36:57 -08:00
jvoisin e6524f925f Simplify username generation for the tests
No need to generate random numbers 10 times, generate a single big-enough one.
A single int64 should be more than enough
2024-02-29 18:36:34 -08:00
Frédéric Guillot c493f8921e Add missing regex anchor detected by CodeQL 2024-02-28 20:50:17 -08:00
Frédéric Guillot b2ce98da87 Add missing plurals for some languages 2024-02-28 20:38:10 -08:00
jvoisin 4db138d4b8 Minor internal/reader/readability/readability.go speedup
- Don't use a capturing group in `divToPElementsRegexp`
- Remove a duplicate condition
- Replace a regex with a fixed-comparison and a `Contains`
2024-02-28 20:03:14 -08:00
jvoisin f12d5131b0 Divide the sanitization time by 3
Instead of having to allocate a ~100 keys map containing possibly dynamic
values (at least to the go compiler), allocate it once in a global variable.
This significantly speeds things up, by reducing the garbage
collector/allocator involvements.

Local synthetic benchmarks have shown a improvements from 38% of wall time to only
12%.
2024-02-28 20:00:13 -08:00
jvoisin 1f5c8ce353 Don't mix up capacity and length
- `make([]a, b)` create a slice of `b` elements `a`
- `make([]a, b, c)` create a slice of `0` elements `a`, but reserve space for `c` of them

When using `append` on the former, it will result on a slice with `b` leading
elements, which is unlikely to be what we want. This commit replaces the two
instances where this happens with the latter construct.
2024-02-28 19:57:30 -08:00
jvoisin 645a817685 Use modern for loops
Go 1.22 introduced a new [for-range](https://go.dev/ref/spec#For_range)
construct that looks a tad better than the usual `for i := 0; i < N; i++`
construct. I also tool the liberty of replacing some
`for i := 0; i < len(myitemsarray); i++ { … myitemsarray[i] …}`
with  `for item := range myitemsarray` when `myitemsarray` contains only pointers.
2024-02-28 19:55:28 -08:00
jvoisin f4f8342245 Remove a superfluous condition
No need to check if the length of `line` is positive since we're checking
afterwards that it contains the `=` sign.
2024-02-28 19:47:30 -08:00
jvoisin 543a690bfd Close resources as soon as possible, instead of using defer() in a loop
So that resources can be freed as soon as they're not used anymore, instead of
waiting for the two nested loops to finish.
2024-02-28 19:47:30 -08:00
jvoisin c4e5dad549 Remove superfluous escaping in a regex 2024-02-28 19:47:30 -08:00
jvoisin fa12c23d79 Use strings.ReplaceAll instead of strings.Replace(…, -1) 2024-02-28 19:47:30 -08:00
jvoisin 4fe902a5d2 Use `strings.EqualFold` instead of `strings.ToLower(…) ==` 2024-02-28 19:47:30 -08:00
jvoisin 61af08a721 Use .WriteString( instead of .Write([]byte(… 2024-02-28 19:47:30 -08:00
jvoisin b04550e2f2 Use `%q` instead of `"%s"` 2024-02-28 19:47:30 -08:00
jvoisin 5e5cb056c5 Make internal/worker/worker.go read-only
Since workers don't communicate anything back to the pool with the channel,
there is no need to have it bidirectional.
2024-02-28 19:39:03 -08:00
jvoisin 48fa64f8ec Use a switch-case construct in internal/locale/plural.go instead of an avalanche of if-if-if-if-if
Less lines or code and marginally greater readability, yay!
Oh and also preallocate a map in LoadCatalogMessages just because we can.
2024-02-28 19:36:38 -08:00
jvoisin f274394f0e Simplify formatFileSize
No need to use a loop with divisions and multiplications when we have logarithms.
2024-02-28 19:32:38 -08:00
jvoisin 9a4a942cc4 Simplify durationImpl 2024-02-28 19:32:38 -08:00
jvoisin 6b3b8e8c9b Inline some templating functions 2024-02-28 19:32:38 -08:00
jvoisin 5a7d6f8997 Make use of printer.Print when possible 2024-02-28 19:24:41 -08:00
jvoisin b4ed17fbac Add a printer.Print to internal/locale/printer.go
No need to use variadic functions with string format interpolation
to generate static strings.
2024-02-28 19:24:41 -08:00
dependabot[bot] 57476f4d59 Bump github.com/prometheus/client_golang from 1.18.0 to 1.19.0
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.18.0 to 1.19.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/v1.19.0/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.18.0...v1.19.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-27 21:25:42 -08:00
jvoisin 7660910232 Use prepared statement for intervals 2024-02-27 21:25:25 -08:00
jvoisin b054506e3a Use proper prepared statements for ArchiveEntries 2024-02-27 21:25:25 -08:00
jvoisin c961c6db7d Use proper prepared statement for updateEnclosures 2024-02-27 21:25:25 -08:00
Frédéric Guillot 0f126d4d11 Fix CodeQL workflow 2024-02-27 21:01:38 -08:00
jvoisin b94756bbf0 Add a warning for StripTags 2024-02-27 20:41:47 -08:00
jvoisin db6ae707ef Add some tests for add_image_title
I'm not sure if the behaviour is expected, but I didn't manage to
get the content injection to work in my browser, so I guess it's alright?
2024-02-27 20:41:15 -08:00
Frédéric Guillot 97feec8ebf Add more URL validation in media proxy 2024-02-26 20:29:40 -08:00
jvoisin bce21a9f91 Remove github.com/google/uuid
Replace it with a hand-rolled implementation. Heck, an UUID isn't even a
requirement, according to [omnivore](https://docs.omnivore.app/integrations/api.html#saving-a-url-with-the-api)'s
documentation, any "unique id" would do.
2024-02-26 18:31:12 -08:00
jvoisin 06e256e5ef Simplify internal/reader/icon/finder.go
- Use a simple regex to parse data uri instead of a hand-rolled parser, and
  document what fields are considered mandatory.
- Use case-insensitive matching to find (fav)icons, instead of doing the same
  query twice with different letter cases
- Add 'apple-touch-icon-precomposed.png' as a fallback favicon
- Reorder the queries to have i`con` first, since it seems to be the most
  popular one. It used to be last, meaning that pages had to be parsed
  completely 4 times, instead of one now.
- Minor factorisation in findIconURLsFromHTMLDocument
2024-02-26 18:18:04 -08:00
jvoisin 040938ff6d Small refactoring of internal/reader/date/parser.go
- Split dates formats into those that require local times
  and those who don't, so that there is no need to have a switch-case in the
  for loop with around 250 iterations at most.
- Be more strict when it comes to timezones, previously invalid ones like -13
  were accepted. Also add a test for this.
- Bail out early if the date is an empty string.
2024-02-26 18:08:04 -08:00
dependabot[bot] 21da7f77f5 Bump golang.org/x/crypto from 0.19.0 to 0.20.0
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.19.0 to 0.20.0.
- [Commits](https://github.com/golang/crypto/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-02-26 18:01:00 -08:00
jvoisin c2d2f31438 Improve a bit internal/reader/scraper/scraper.go
- make findContentUsingCustomRules' more idiomatic,
  since in golang a function returning an error might
  return garbage in other parameter. Moreover, ignoring
  errors is bad practise.
- getPredefinedScraperRules is now running in constant-time,
  instead of iterating on a list with around 50 items in it.
2024-02-26 18:00:23 -08:00
jvoisin 5b2558bf92 Miscellaneous improvements to internal/reader/subscription/finder.go
- Surface `localizedError` in FindSubscriptionsFromWellKnownURLs via slog
- Use an inline declaration for new subscriptions, like done elsewhere in the
  file, if only for consistency's sake
- Preallocate the `subscriptions` slice when using an RSS-bridge,
  it's a good practise, and it might even marginally improve
  performances when adding __a lot__ of feeds via an rss-bridge instance, wooo!
2024-02-26 17:52:21 -08:00
jvoisin ecd59009fb Add a couple of new possible locations for feeds
- Hugo likes to generate index.xml
- feed.atom and feed.rss are used by enterprise-scale/old-school gigantic CMS
2024-02-26 17:43:51 -08:00
jvoisin 4a943b722d Add a couple of fuzzers 2024-02-26 17:23:49 -08:00
Frédéric Guillot 9d1b1e19d4 Google Reader: Do not return a 500 error when no items is returned 2024-02-25 21:17:49 -08:00
Frédéric Guillot 7a8061fc72 Fix regression introduced in PR #2402 2024-02-25 20:45:34 -08:00
jvoisin bca84bac8b Use an update-where for MarkCategoryAsRead instead of a subquery 2024-02-25 17:50:30 -08:00
jvoisin 66e0eb1bd6 Reformat's ArchiveEntries's query for consistency's sake
And replace the `=ANY` with an `IN`
2024-02-25 17:50:30 -08:00
jvoisin 26d189917e Simplify cleanupEntries' query
- `NOT (hash=ANY(%4))` can be expressed as `hash NOT IN $4`
- There is no need for a subquery operating on the same table,
  moving the conditions out is equivalent.
2024-02-25 17:50:30 -08:00
jvoisin ccd3955bf4 Format GetReadTime's query for consistency's sake 2024-02-25 17:50:30 -08:00
jvoisin 8a2cc3a344 Reformat the query in GetEntryIDs
To make it more consistent with how all the other are formatted
2024-02-25 17:50:30 -08:00
jvoisin 647fa025f8 Simplify WeeklyFeedEntryCount
No need for a `BETWEEN`: we want to filter on entries published in the last
week, no need to express is as "entries published between now and last week",
"entries published after last week" is enough.
2024-02-25 17:50:30 -08:00