Commit Graph

76 Commits

Author SHA1 Message Date
Michael Eischer a773cb6527 pack: cleanup header size calculation 2022-03-28 22:09:49 +02:00
Michael Eischer 6408686973 repository: Simplify Blob equality check 2022-03-28 22:09:49 +02:00
Michael Eischer f78bd14e28 repository: Remove pack implementation details from MasterIndex 2022-03-28 22:09:49 +02:00
Michael Eischer 4b3dc415ef checker: cleanup header extraction 2022-02-12 20:18:25 +01:00
Michael Eischer 930a00ad54 checker: reuse bufio reader 2022-02-12 20:18:25 +01:00
Michael Eischer f1e58e7c7f checker: rewrite ReadData to stream packs 2022-02-12 20:18:25 +01:00
Michael Eischer 9aa2eff384 Add plumbing to calculate backend specific file hash for upload
This enables the backends to request the calculation of a
backend-specific hash. For the currently supported backends this will
always be MD5. The hash calculation happens as early as possible, for
pack files this is during assembly of the pack file. That way the hash
would even capture corruptions of the temporary pack file on disk.
2021-08-04 22:17:46 +02:00
Alexander Neumann 04ca69cc78 Address issues reported by golint 2021-01-30 20:45:57 +01:00
Alexander Neumann 3c753c071c errcheck: More error handling 2021-01-30 20:02:37 +01:00
Alexander Neumann 16313bfcc9 errcheck: Add error check for MergeFinalIndexes() 2021-01-30 20:02:37 +01:00
Alexander Neumann bdfedf1f5b
Merge pull request #3173 from MichaelEischer/unify-index-loading
Unify index loading
2021-01-28 13:50:42 +01:00
Michael Eischer e2b0072441 check: add progress bar to the tree structure check 2021-01-28 11:10:50 +01:00
Michael Eischer 258ce0c1e5 parallel: report progress for StreamTrees
This assigns an id to each tree root and then keeps track of how many
tree loads (i.e. trees referenced for the first time) are pending per
tree root. Once a tree root and its subtrees were fully processed there
are no more pending tree loads and the tree root is reported as
processed.
2021-01-28 11:08:43 +01:00
Michael Eischer 6e03f80ca2 check: Split the parallelized tree loader into a reusable component
The actual code change is minimal
2021-01-28 11:08:43 +01:00
Michael Eischer 1d7bb01a6b check: Cleanup tree loading and switch to use errgroup
The helper methods are now wired up in the Structure method.
2021-01-28 11:08:43 +01:00
Alexander Weiss 2a1add7538 check: remove file size counter 2020-12-23 02:34:31 +01:00
Michael Eischer 96904f8972 check: extract parallel index loading 2020-12-22 22:36:18 +01:00
Alexander Weiss 26f85779be Parallelize ForAllSnapshots 2020-12-06 05:09:58 +01:00
Alexander Weiss aa7a5f19c2 Use BlobHandle in index methods 2020-11-22 20:41:12 +01:00
Alexander Weiss a851c53cbe Use PackSize in checker 2020-11-21 22:13:54 +01:00
Alexander Weiss c3ddde9e7d Return hdrSize in ListPack 2020-11-21 22:13:54 +01:00
Michael Eischer 1f43cac12d check: Only track data blobs when unused blobs should be reported
This improves the memory usage of check a lot as it now only has to
track tree blobs when run using the default parameters.
2020-11-15 18:43:07 +01:00
Michael Eischer 6da66c15d8 check: Simplify referenced blob tracking
The result is identical as long as the context in not canceled. However,
in that case the result is incomplete anyways.
2020-11-15 18:42:55 +01:00
Michael Eischer 3500f9490c check: Simplify blob status tracking
UnusedBlobs now directly reads the list of existing blobs from the
repository index. This removes the need for the blobStatusExists flag,
which in turn allows converting the blobRefs map into a BlobSet.
2020-11-15 18:42:42 +01:00
Michael Eischer b8c7543a55 check: Merge 'size could not be found' and 'not found in index' errors
By construction these two errors always show up in pairs: 'size could
not be found' is printed when the blob is not found in the repository
index. That blob is also part of the `blobs` array. Later on, check
iterates over that array and checks whether the blob is marked as
existing. Which cannot be the case as that mark is generated by
iterating over the repository index.

The merged warning no longer reports the blob index within a file. That
information could also be derived by printing the affected tree using
`cat` and searching for the blob.
2020-11-15 18:41:50 +01:00
Alexander Weiss 17bb77b1f9 check: Also check blob length and offset 2020-11-14 00:42:49 +01:00
Alexander Weiss 80dcfca191 check: Check sizes computed from index and pack header 2020-11-14 00:42:49 +01:00
MichaelEischer 46d31ab86d
Merge pull request #3058 from greatroar/counter
Replace restic.Progress with new progress.Counter (fixes two race conditions)
2020-11-09 22:19:09 +01:00
Alexander Weiss 239931578c check: check index for packs that are read 2020-11-09 17:28:14 +01:00
greatroar 21b787a4d1 Stop Counters where they're constructed and started 2020-11-09 13:03:31 +01:00
greatroar ddca699cd2 Replace restic.Progress with new progress.Counter
This fixes two race conditions while cleaning up the code.
2020-11-09 12:12:35 +01:00
Alexander Weiss b44ecde8b0 Fix setting of ID in DecodeIndex 2020-10-17 09:12:58 +02:00
MichaelEischer 4ba237bb93
Merge pull request #3019 from greatroar/refactor-decodeindex
Refactor index decoding
2020-10-15 23:22:33 +02:00
greatroar b27375f5ce defer close(ch) outside repository.RunWorkers 2020-10-14 15:50:16 +02:00
greatroar 27db3ec262 Refactor index decoding
Decoding old-format indices no longer requires loading and decrypting
twice.
2020-10-13 20:47:50 +02:00
Michael Eischer c458e114d4 pass context to Find / FindSnapshot
This allows proper interruption of restic while it searches for
snapshots or key files.
2020-10-09 22:37:56 +02:00
Michael Eischer 4784540f04 repository: Simplify worker group code 2020-09-05 10:07:16 +02:00
aawsome 0fed6a8dfc
Use "pack file" instead of "data file" (#2885)
- changed variable names, especially changed DataFile into PackFile
- changed in some comments
- always use "pack file" in docu
2020-08-16 11:16:38 +02:00
MichaelEischer 34181b13a2
Merge pull request #2328 from MichaelEischer/no-repeated-checks
Fix duplicate tree checks within `restic check`
2020-07-22 22:08:02 +02:00
Alexander Weiss e388d962a5 Merge final indexes together for faster index access 2020-07-22 21:54:02 +02:00
Michael Eischer 9b0e718852 checker: Test that blob types are not confused 2020-07-20 23:43:47 +02:00
Michael Eischer ddf0b8cd0b checker: Properly distinguish between data and tree blobs
If a data blob and a tree blob with the same ID (= same content) exist,
then the checker did not report a data or tree blob as unused when the
blob of the other type was still in use.
2020-07-20 22:58:39 +02:00
Michael Eischer 2d0c138c9b checker: Test that check only decodes trees once
The `DuplicateTree` flag is necessary to ensure that failures cannot be
swallowed. The old checker implementation ignores errors from LoadTree
if the corresponding tree was already checked.
2020-07-20 22:51:53 +02:00
Michael Eischer ef325ffc02 checker: Cleanup error handling code
This change only moves code around but does not result in any change in
behavior.
2020-07-20 22:51:53 +02:00
Michael Eischer 7a165f32a9 checker: Traverse trees in depth-first order
Backups traverse the file tree in depth-first order and saves trees on
the way back up. This results in tree packs filled in a way comparable
to the reverse Polish notation.  In order to check tree blobs in that
order, the treeFilter would have to delay the forwarding of tree nodes
until all children of it are processed which would complicate the
implementation.

Therefore do the next similar thing and traverse the tree in depth-first
order, but process trees already on the way down. The tree blob ids are
added in reverse order to the backlog, which is once again reverted when
removing the ids from the back of the backlog.
2020-07-20 22:51:53 +02:00
Michael Eischer 36c69e3ca7 checker: Unify blobs, processed trees and referenced blobs map
The blobRefs map and the processedTrees IDSet are merged to reduce the
memory usage. The blobRefs map now uses separate flags to track blob
usage as data or tree blob. This prevents skipping of trees whose
content is identical to an already processed data blob. A third flag
tracks whether a blob exists or not, which removes the need for the
blobs IDSet.
2020-07-20 22:51:47 +02:00
Michael Eischer 35d8413639 checker: Remove dead index map 2020-07-20 22:37:31 +02:00
Michael Eischer c66a0e408c checker: Reduce cost of debug log
Avoid duplicate allocation of the Subtree list.
2020-07-20 22:37:31 +02:00
Michael Eischer 70f4c014ef checker: Decode identical tree nodes only once
Even though the checkTreeWorker skips already processed chunks,
filterTrees did queue the same tree blob on every occurence. This
becomes a serious performance bottleneck for larger number of snapshots
that cover mostly the same directories. Therefore decode a tree blob
exactly once.
2020-07-20 22:37:31 +02:00
Michael Eischer f0d8710611 Add benchmark for checker scaling with snapshot count 2020-07-20 22:37:31 +02:00