Trying this for every peer winds up being very slow and precludes it
from acceptable runtime in the CI, so reduce this to 4.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Now that we have parent pointers hooked up, we can simply go right to
the node and remove it in place, rather than having to recursively walk
the entire trie.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This makes the insertion algorithm a bit more efficient, while also now
taking on the additional task of connecting up parent pointers. This
will be handy in the following commit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Heavier network extensions might require the wireguard-go component to
use less ram, so let users of this reduce these as needed.
At some point we'll put this behind a configuration method of sorts, but
for now, just expose the consts as vars.
Requested-by: Josh Bleecher Snyder <josh@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
On Linux we can run `ip link del wg0`, in which case the fd becomes
stale, and we should exit. Since this is an intentional action, don't
treat it as an error.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
When debugging, it's useful to know why a receive func exited.
We were already logging that, but only in the "death spiral" case.
Move the logging up, to capture it always.
Reduce the verbosity, since it is not an error case any more.
Put the receive func name in the log line.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
Note: this bug is "hidden" by avoiding "death spiral" code path by
6228659 ("device: handle broader range of errors in RoutineReceiveIncoming").
If the code reached "death spiral" mechanism, there would be multiple
double frees happening. This results in a deadlock on iOS, because the
pools are fixed size and goroutine might stop until somebody makes
space in the pool.
This was almost 100% repro on the new ARM Macbooks:
- Build with 'ios' tag for Mac. This will enable bounded pools.
- Somehow call device.IpcSet at least couple of times (update config)
- device.BindUpdate() would be triggered
- RoutineReceiveIncoming would enter "death spiral".
- RoutineReceiveIncoming would stall on double free (pool is already
full)
- The stuck routine would deadlock 'device.closeBindLocked()' function
on line 'netc.stopping.Wait()'
Signed-off-by: Kristupas Antanavičius <kristupas.antanavicius@nordsec.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Instead of hard-coding exactly two sources from which
to receive packets (an IPv4 source and an IPv6 source),
allow the conn.Bind to specify a set of sources.
Beneficial consequences:
* If there's no IPv6 support on a system,
conn.Bind.Open can choose not to return a receive function for it,
which is simpler than tracking that state in the bind.
This simplification removes existing data races from both
conn.StdNetBind and bindtest.ChannelBind.
* If there are more than two sources on a system,
the conn.Bind no longer needs to add a separate muxing layer.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
RoutineReceiveIncoming exits immediately on net.ErrClosed,
but not on other errors. However, for errors that are known
to be permanent, such as syscall.EAFNOSUPPORT,
we may as well exit immediately instead of retrying.
This considerably speeds up the package device tests right now,
because the Bind sometimes (incorrectly) returns syscall.EAFNOSUPPORT
instead of net.ErrClosed.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
There's no way for len(peers)==0 when a current peer has
isRunning==false.
This requires some struct reshuffling so that the uint64 pointer is
aligned.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Googlers have a habit of graffiting their name in TODO items that then
are never addressed, and other people won't go near those because
they're marked territory of another animal. I've been gradually cleaning
these up as I see them, but this commit just goes all the way and
removes the remaining stragglers.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This code is stable, and the test is finicky, especially on high core
count systems, so just disable it.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This linked list implementation is awful, but maybe Go 2 will help
eventually, and at least we're not open coding the hlist any more.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
We're loosing our ownership of the port when bringing the device down,
which means another test process could reclaim it. Avoid this by
retrying for 4 seconds.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Rather than racing with Start(), since we're never destroying these
queues, we just set the variables at creation time.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Since RoutineHandshake calls peer.SendKeepalive(), it potentially is a
writer into the encryption queue, so we need to bump the wg count.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
RoutineReadFromTUN can trigger a call to SendStagedPackets.
SendStagedPackets attempts to protect against sending
on the encryption queue by checking peer.isRunning and device.isClosed.
However, those are subject to TOCTOU bugs.
If that happens, we get this:
goroutine 1254 [running]:
golang.zx2c4.com/wireguard/device.(*Peer).SendStagedPackets(0xc000798300)
.../wireguard-go/device/send.go:321 +0x125
golang.zx2c4.com/wireguard/device.(*Device).RoutineReadFromTUN(0xc000014780)
.../wireguard-go/device/send.go:271 +0x21c
created by golang.zx2c4.com/wireguard/device.NewDevice
.../wireguard-go/device/device.go:315 +0x298
Fix this with a simple, big hammer: Keep the encryption queue
alive as long as it might be written to.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This serves two purposes.
First, it makes repeatedly stopping then starting a peer cheaper.
Second, it prevents a data race observed accessing the queues.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The high iteration count was useful when TestUpDown
was the nexus of new bugs to investigate.
Now that it has stabilized, that's less valuable.
And it slows down running the tests and crowds out other tests.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
On a many-core machine with the race detector enabled,
this test can take several minutes to complete.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It's never used and we won't have a use for it. Also, move to go-running
stringer, for those without GOPATHs.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The test previously checked the offset within a substruct, not the
offset within the allocated struct, so this adds the two together.
It then fixes an alignment crash on 32-bit machines.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>