restic/internal
greatroar 5141228e0c repository: Re-tune indexmap allocation strategy
Commit fd05037e1a changed the allocation batch size from 256 to 128 on
the assumption that an indexEntry is 60 bytes on amd64. It is actually 64
bytes: structs are padded out to a multiple of 8 for alignment reasons.
That means malloc would waste no space even without batch allocation, at
least on 64-bit machines. Allocating every entry individually cuts
overallocation dramatically when there are many small indexes, but it
also seems to slow allocation down (Go 1.18, Linux, amd64, -benchtime=2s):

    name                   old time/op    new time/op    delta
    DecodeIndex-8             4.67s ± 5%     4.60s ± 1%      ~     (p=0.953 n=10+5)
    DecodeIndexParallel-8     4.67s ± 3%     4.60s ± 1%      ~     (p=0.953 n=10+5)
    IndexHasUnknown-8        37.8ns ± 8%    36.5ns ±14%      ~     (p=0.841 n=5+5)
    IndexHasKnown-8          38.5ns ±12%    37.7ns ±10%      ~     (p=0.968 n=5+5)
    IndexAlloc-8              615ms ±18%     607ms ± 1%      ~     (p=1.000 n=10+5)
    IndexAllocParallel-8      245ms ±11%     285ms ± 6%   +16.40%  (p=0.001 n=10+5)
    MasterIndexAlloc-8        286ms ± 9%     275ms ± 2%      ~     (p=1.000 n=10+5)
    LoadIndex/v1-8           27.0ms ± 4%    26.8ms ± 1%      ~     (p=0.690 n=5+5)
    LoadIndex/v2-8           22.4ms ± 1%    22.8ms ± 2%    +1.48%  (p=0.016 n=5+5)

    name                   old alloc/op   new alloc/op   delta
    IndexAlloc-8              446MB ± 0%     446MB ± 0%    -0.00%  (p=0.000 n=8+4)
    IndexAllocParallel-8      446MB ± 0%     446MB ± 0%    -0.00%  (p=0.008 n=8+5)
    MasterIndexAlloc-8        213MB ± 0%     159MB ± 0%   -25.47%  (p=0.000 n=10+5)

    name                   old allocs/op  new allocs/op  delta
    IndexAlloc-8               913k ± 0%     2632k ± 0%  +188.19%  (p=0.008 n=5+5)
    IndexAllocParallel-8       913k ± 0%     2632k ± 0%  +188.21%  (p=0.008 n=5+5)
    MasterIndexAlloc-8         318k ± 0%     1172k ± 0%  +267.86%  (p=0.008 n=5+5)
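
The 60-versus-64-byte point made before the benchmark can be checked with
unsafe.Sizeof. The following is a minimal sketch, not restic code: the
entry type is a hypothetical stand-in for indexEntry, and its field list
is an assumption chosen so that the fields add up to 60 bytes on amd64.

```go
package main

import (
	"fmt"
	"unsafe"
)

// entry is a hypothetical stand-in for indexEntry. The field list is an
// assumption for illustration, not copied from the repository.
type entry struct {
	id                 [32]byte // 32 bytes
	next               *entry   // 8 bytes on 64-bit, 4 on 32-bit
	packIndex          int      // 8 bytes on 64-bit, 4 on 32-bit
	offset             uint32   // 4 bytes
	length             uint32   // 4 bytes
	uncompressedLength uint32   // 4 bytes
}

// fieldBytes sums the sizes of the individual fields, ignoring padding.
func fieldBytes() uintptr {
	var e entry
	return unsafe.Sizeof(e.id) + unsafe.Sizeof(e.next) +
		unsafe.Sizeof(e.packIndex) + unsafe.Sizeof(e.offset) +
		unsafe.Sizeof(e.length) + unsafe.Sizeof(e.uncompressedLength)
}

// paddedSize is the size the compiler reserves per entry, including
// any trailing padding.
func paddedSize() uintptr { return unsafe.Sizeof(entry{}) }

// alignment is the boundary that the struct size is rounded up to.
func alignment() uintptr { return unsafe.Alignof(entry{}) }

func main() {
	fmt.Printf("fields: %d bytes, struct: %d bytes, alignment: %d\n",
		fieldBytes(), paddedSize(), alignment())
}
```

With this assumed layout, the fields sum to 60 bytes on amd64 while the
struct itself occupies 64, because the pointer and int fields force
8-byte alignment.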

Instead, this patch sets a batch size of 4, which means no space is
wasted by malloc on 64-bit and very little on 32-bit. It still gets very
close to the savings from not allocating in batches, without requiring
special code for bits.UintSize==64. Benchmark results, again for
Linux/amd64:

    name                   old time/op    new time/op    delta
    DecodeIndex-8             4.67s ± 5%     4.83s ± 9%     ~     (p=0.315 n=10+10)
    DecodeIndexParallel-8     4.67s ± 3%     4.68s ± 4%     ~     (p=0.315 n=10+10)
    IndexHasUnknown-8        37.8ns ± 8%    44.5ns ±19%     ~     (p=0.095 n=5+5)
    IndexHasKnown-8          38.5ns ±12%    36.9ns ± 8%     ~     (p=0.690 n=5+5)
    IndexAlloc-8              615ms ±18%     628ms ±18%     ~     (p=0.218 n=10+10)
    IndexAllocParallel-8      245ms ±11%     262ms ± 9%   +7.02%  (p=0.043 n=10+10)
    MasterIndexAlloc-8        286ms ± 9%     287ms ±13%     ~     (p=1.000 n=10+10)
    LoadIndex/v1-8           27.0ms ± 4%    26.8ms ± 0%     ~     (p=1.000 n=5+5)
    LoadIndex/v2-8           22.4ms ± 1%    22.5ms ± 0%     ~     (p=0.056 n=5+5)

    name                   old alloc/op   new alloc/op   delta
    IndexAlloc-8              446MB ± 0%     446MB ± 0%     ~     (p=1.000 n=8+10)
    IndexAllocParallel-8      446MB ± 0%     446MB ± 0%   -0.00%  (p=0.000 n=8+8)
    MasterIndexAlloc-8        213MB ± 0%     160MB ± 0%  -25.02%  (p=0.000 n=10+9)

    name                   old allocs/op  new allocs/op  delta
    IndexAlloc-8               913k ± 0%     1333k ± 0%  +45.94%  (p=0.000 n=8+10)
    IndexAllocParallel-8       913k ± 0%     1333k ± 0%  +45.94%  (p=0.000 n=8+8)
    MasterIndexAlloc-8         318k ± 0%      525k ± 0%  +64.99%  (p=0.000 n=10+10)

The allocation method indexmap.newEntry has also been rewritten in a
form that is a few instructions shorter.
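
The batch-of-4 strategy can be sketched roughly as follows. This is not
the actual indexmap.newEntry: pool, entry, batchSize and the batches
counter are hypothetical names used only for illustration, and the entry
type is reduced to two fields. The idea shown is that entries are
allocated four at a time and handed out from a free list, so only every
fourth request reaches the allocator.

```go
package main

import "fmt"

// batchSize mirrors the batch size of 4 chosen in the commit message.
const batchSize = 4

// entry is a simplified, hypothetical stand-in for indexEntry.
type entry struct {
	id   [32]byte
	next *entry // links unused entries into a free list
}

// pool hands out entries from a free list that is refilled in batches.
type pool struct {
	free    *entry // head of the free list, nil when empty
	batches int    // number of batch allocations, for demonstration only
}

// newEntry returns an unused entry, allocating a new batch when the
// free list is empty.
func (p *pool) newEntry() *entry {
	if p.free == nil {
		batch := new([batchSize]entry) // one allocation for four entries
		p.batches++
		// Chain the entries of the batch together; the last one keeps
		// a nil next pointer and so terminates the list.
		for i := 0; i < batchSize-1; i++ {
			batch[i].next = &batch[i+1]
		}
		p.free = &batch[0]
	}
	e := p.free
	p.free = e.next
	e.next = nil
	return e
}

func main() {
	var p pool
	for i := 0; i < 10; i++ {
		p.newEntry()
	}
	fmt.Println("entries: 10, batch allocations:", p.batches)
}
```

Ten requests are served by three batch allocations here. With 64-byte
entries a batch is 4 × 64 = 256 bytes, which is the figure behind the
claim that malloc wastes no space on 64-bit.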
2022-05-11 21:22:14 +02:00
name        last commit                                                         date
archiver    repository: run blackbox tests using old and new repo version       2022-04-30 11:34:10 +02:00
backend     Use config file modes to derive new dir/file modes                  2022-04-30 15:59:51 +02:00
bloblru     bloblru: Fix comment for New function                               2022-03-28 22:25:25 +02:00
cache       crypto: Use helpers for size calculations                           2022-03-28 22:09:49 +02:00
checker     Add option global --compression                                     2022-04-30 11:34:10 +02:00
crypto      crypto: Remove unused error                                         2020-09-05 10:07:16 +02:00
debug       add go:build headers everywhere                                     2022-03-28 22:23:47 +02:00
dump        Refactor internal/dump + concurrent load/write                      2021-11-01 23:01:55 +01:00
errors      errors: Ensure that errors.IsFatal(errors.Fatal("err")) == true     2022-03-28 22:09:49 +02:00
filter      filter: short circuit if no negative patterns                       2022-03-20 13:33:08 +01:00
fs          Add simple test for fs.TempFile on windows                          2022-04-09 23:37:58 +02:00
fuse        copy/find/ls/recover/stats: Memorize snapshot listing before index  2022-04-09 12:26:30 +02:00
hashing     errcheck: Add error checks                                          2021-01-30 20:02:37 +01:00
limiter     golangci-lint: replace deprecated golint with revive                2022-03-28 22:33:17 +02:00
migrations  Fix issues reported by semgrep                                      2020-12-11 09:41:59 +01:00
mock        Backend: Expose connections parameter                               2022-04-23 11:13:08 +02:00
options     Some options fixes                                                  2020-12-23 23:26:04 +03:00
pack        pack: slightly expand testing of compressed blobs                   2022-04-30 11:34:10 +02:00
repository  repository: Re-tune indexmap allocation strategy                    2022-05-11 21:22:14 +02:00
restic      repository: run blackbox tests using old and new repo version       2022-04-30 11:34:10 +02:00
restorer    repository: implement pack compression                              2022-04-30 11:34:10 +02:00
selfupdate  Refactor file handing for self-update.                              2022-04-09 21:40:33 +02:00
test        Add more error handling                                             2021-01-30 20:19:47 +01:00
textfile    Add more error handling                                             2021-01-30 20:19:47 +01:00
ui          golangci-lint: replace deprecated golint with revive                2022-03-28 22:33:17 +02:00
walker      Limit number of large tree blobs loaded in parallel by StreamTrees  2022-02-19 12:26:09 +01:00