Commit Graph

84 Commits

Author SHA1 Message Date
Michael Eischer ff7ef5007e Replace most usages of ioutil with the underlying function
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.

This commit only mechanically replaces the function calls.
2022-12-02 19:36:43 +01:00
greatroar 09c14f33c8 internal/checker: Pass Error.Error pointer receiver 2022-10-14 14:13:32 +02:00
Michael Eischer 2e3f1c08c5 repository: split index into a separate package 2022-10-08 21:15:34 +02:00
Michael Eischer ddcf549eba repository: remove IsMixedPack and add replacement for checker
Repositories with mixed packs are probably quite rare by now. When
loading data blobs from a mixed pack file, this will no longer trigger
caching that file. However, usually tree blobs are accessed first such
that this shouldn't make much of a difference.

The checker gets a simpler replacement.
2022-10-03 12:03:59 +02:00
Michael Eischer 1ebd57247a repository: optimize MasterIndex.Each
Sending data through a channel at very high frequency is extremely
inefficient. Thus use simple callbacks instead of channels.

> name                old time/op  new time/op  delta
> MasterIndexEach-16   6.68s ±24%   0.96s ± 2%  -85.64%  (p=0.008 n=5+5)
2022-09-24 12:21:59 +02:00
Michael Eischer 04e49924fb checker: Fix S3 legacy layout detection 2022-07-23 11:19:32 +02:00
Michael Eischer fcb3ddf181 check: Complain about usage of s3 legacy layout 2022-07-23 11:19:32 +02:00
Michael Eischer 8b8bd4e8ac check: complain about mixed pack files 2022-07-23 11:19:32 +02:00
Michael Eischer 6f53ecc1ae adapt workers based on whether an operation is CPU or IO-bound
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.

Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
2022-07-03 12:19:26 +02:00
Michael Eischer 5e0f1c3cef check: remove dead code 2022-07-02 19:28:57 +02:00
Michael Eischer 0df022fa6d check: Print full ids
The short ids are not always unique. In addition, recovering from
damages is easier when having the full ids as that makes it easier to
access the corresponding files.
2022-07-02 19:28:57 +02:00
greatroar a0fa9c6e9f Revert "restic prune: Merge three loops over the index"
This reverts commit 8bdfcf779f.
Should fix #3809. Also needed to make #3290 apply cleanly.
2022-06-30 15:27:34 +02:00
MichaelEischer 19581dbc18
Merge pull request #3786 from greatroar/prune
restic prune: Merge three loops over the index
2022-06-18 16:54:50 +02:00
greatroar 8bdfcf779f restic prune: Merge three loops over the index
There were three loops over the index in restic prune, to find
duplicates, to determine sizes (in pack.Size) and to generate packInfos.
These three are now one loop. This way, prune doesn't need to construct
a set of duplicate blobs, pack.Size doesn't need to contain special
logic for prune's use case (the onlyHdr argument) and pack.Size doesn't
need to construct a map only to have it immediately transformed into a
different map.

Some quick testing on a 160GiB local repo doesn't show running time or
memory use of restic prune --dry-run changing significantly.
2022-06-18 10:40:33 +02:00
greatroar f92ecf13c9 all: Move away from pkg/errors, easy cases
github.com/pkg/errors is no longer getting updates, because Go 1.13
went with the more flexible errors.{As,Is} function. Use those instead:
errors from pkg/errors already support the Unwrap interface used by 1.13
error handling. Also:

* check for io.EOF with a straight ==. That value should not be wrapped,
  and the chunker (whose error is checked in the cases changed) does not
  wrap it.
* Give custom Error methods pointer receivers, so there's no ambiguity
  when type-switching since the value type will no longer implement error.
* Make restic.ErrAlreadyLocked private, and rename it to
  alreadyLockedError to match the stdlib convention that error type
  names end in Error.
* Same with rest.ErrIsNotExist => rest.notExistError.
* Make s3.Backend.IsAccessDenied a private function.
2022-06-14 08:36:38 +02:00
Michael Eischer 5815f727ee checker: convert error type to use pointer-receivers 2022-05-09 22:31:30 +02:00
Michael Eischer 7b9ae91e04 copy: Load snapshots before indexes 2022-04-09 12:27:25 +02:00
Michael Eischer 3d29083e60 copy/find/ls/recover/stats: Memorize snapshot listing before index
These commands filter the snapshots according to some criteria which
essentially requires loading the index before filtering the snapshots.
Thus create a copy of the snapshots list beforehand and use it later on.
2022-04-09 12:26:30 +02:00
Michael Eischer a773cb6527 pack: cleanup header size calculation 2022-03-28 22:09:49 +02:00
Michael Eischer 6408686973 repository: Simplify Blob equality check 2022-03-28 22:09:49 +02:00
Michael Eischer f78bd14e28 repository: Remove pack implementation details from MasterIndex 2022-03-28 22:09:49 +02:00
Michael Eischer 4b3dc415ef checker: cleanup header extraction 2022-02-12 20:18:25 +01:00
Michael Eischer 930a00ad54 checker: reuse bufio reader 2022-02-12 20:18:25 +01:00
Michael Eischer f1e58e7c7f checker: rewrite ReadData to stream packs 2022-02-12 20:18:25 +01:00
Alexander Neumann 3c753c071c errcheck: More error handling 2021-01-30 20:02:37 +01:00
Alexander Neumann 16313bfcc9 errcheck: Add error check for MergeFinalIndexes() 2021-01-30 20:02:37 +01:00
Alexander Neumann bdfedf1f5b
Merge pull request #3173 from MichaelEischer/unify-index-loading
Unify index loading
2021-01-28 13:50:42 +01:00
Michael Eischer e2b0072441 check: add progress bar to the tree structure check 2021-01-28 11:10:50 +01:00
Michael Eischer 258ce0c1e5 parallel: report progress for StreamTrees
This assigns an id to each tree root and then keeps track of how many
tree loads (i.e. trees referenced for the first time) are pending per
tree root. Once a tree root and its subtrees were fully processed there
are no more pending tree loads and the tree root is reported as
processed.
2021-01-28 11:08:43 +01:00
Michael Eischer 6e03f80ca2 check: Split the parallelized tree loader into a reusable component
The actual code change is minimal
2021-01-28 11:08:43 +01:00
Michael Eischer 1d7bb01a6b check: Cleanup tree loading and switch to use errgroup
The helper methods are now wired up in the Structure method.
2021-01-28 11:08:43 +01:00
Alexander Weiss 2a1add7538 check: remove file size counter 2020-12-23 02:34:31 +01:00
Michael Eischer 96904f8972 check: extract parallel index loading 2020-12-22 22:36:18 +01:00
Alexander Weiss 26f85779be Parallelize ForAllSnapshots 2020-12-06 05:09:58 +01:00
Alexander Weiss aa7a5f19c2 Use BlobHandle in index methods 2020-11-22 20:41:12 +01:00
Alexander Weiss a851c53cbe Use PackSize in checker 2020-11-21 22:13:54 +01:00
Alexander Weiss c3ddde9e7d Return hdrSize in ListPack 2020-11-21 22:13:54 +01:00
Michael Eischer 1f43cac12d check: Only track data blobs when unused blobs should be reported
This improves the memory usage of check a lot as it now only has to
track tree blobs when run using the default parameters.
2020-11-15 18:43:07 +01:00
Michael Eischer 6da66c15d8 check: Simplify referenced blob tracking
The result is identical as long as the context in not canceled. However,
in that case the result is incomplete anyways.
2020-11-15 18:42:55 +01:00
Michael Eischer 3500f9490c check: Simplify blob status tracking
UnusedBlobs now directly reads the list of existing blobs from the
repository index. This removes the need for the blobStatusExists flag,
which in turn allows converting the blobRefs map into a BlobSet.
2020-11-15 18:42:42 +01:00
Michael Eischer b8c7543a55 check: Merge 'size could not be found' and 'not found in index' errors
By construction these two errors always show up in pairs: 'size could
not be found' is printed when the blob is not found in the repository
index. That blob is also part of the `blobs` array. Later on, check
iterates over that array and checks whether the blob is marked as
existing. Which cannot be the case as that mark is generated by
iterating over the repository index.

The merged warning no longer reports the blob index within a file. That
information could also be derived by printing the affected tree using
`cat` and searching for the blob.
2020-11-15 18:41:50 +01:00
Alexander Weiss 17bb77b1f9 check: Also check blob length and offset 2020-11-14 00:42:49 +01:00
Alexander Weiss 80dcfca191 check: Check sizes computed from index and pack header 2020-11-14 00:42:49 +01:00
MichaelEischer 46d31ab86d
Merge pull request #3058 from greatroar/counter
Replace restic.Progress with new progress.Counter (fixes two race conditions)
2020-11-09 22:19:09 +01:00
Alexander Weiss 239931578c check: check index for packs that are read 2020-11-09 17:28:14 +01:00
greatroar 21b787a4d1 Stop Counters where they're constructed and started 2020-11-09 13:03:31 +01:00
greatroar ddca699cd2 Replace restic.Progress with new progress.Counter
This fixes two race conditions while cleaning up the code.
2020-11-09 12:12:35 +01:00
Alexander Weiss b44ecde8b0 Fix setting of ID in DecodeIndex 2020-10-17 09:12:58 +02:00
MichaelEischer 4ba237bb93
Merge pull request #3019 from greatroar/refactor-decodeindex
Refactor index decoding
2020-10-15 23:22:33 +02:00
greatroar b27375f5ce defer close(ch) outside repository.RunWorkers 2020-10-14 15:50:16 +02:00