nebula

Commit Graph

Author	SHA1	Message	Date
Wade Simmons	949ec78653	don't set ConnectionState to nil (#590 ) * don't set ConnectionState to nil We might have packets processing in another thread, so we can't safely just set this to nil. Since we removed it from the hostmaps, the next packets to process should start the handshake over again. I believe this comment is outdated or incorrect, since the next handshake will start over with a new HostInfo, I don't think there is any way a counter reuse could happen: > We must null the connectionstate or a counter reuse may happen Here is a panic we saw that I think is related: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x93a037] goroutine 59 [running, locked to thread]: github.com/slackhq/nebula.(Firewall).Drop(...) github.com/slackhq/nebula/firewall.go:380 github.com/slackhq/nebula.(Interface).consumeInsidePacket(...) github.com/slackhq/nebula/inside.go:59 github.com/slackhq/nebula.(Interface).listenIn(...) github.com/slackhq/nebula/interface.go:233 created by github.com/slackhq/nebula.(Interface).run github.com/slackhq/nebula/interface.go:191 * use closeTunnel	2021-12-06 14:09:05 -05:00
Nate Brown	bcabcfdaca	Rework some things into packages (#489 )	2021-11-03 20:54:04 -05:00
Wade Simmons	ea2c186a77	remote_allow_ranges: allow inside CIDR specific remote_allow_lists (#540 ) This allows you to configure remote allow lists specific to different subnets of the inside CIDR. Example: remote_allow_ranges: 10.42.42.0/24: 192.168.0.0/16: true This would only allow hosts with a VPN IP in the 10.42.42.0/24 range to have private IPs (and thus don't connect over public IPs). The PR also refactors AllowList into RemoteAllowList and LocalAllowList to make it clearer which methods are allowed on which allow list.	2021-10-19 10:54:30 -04:00
Andrii Chubatiuk	d13f4b5948	fixed recv_errors spoofing condition (#482 ) Hi @nbrownus Fixed a small bug that was introduced in df7c7ee#diff-5d05d02296a1953fd5fbcb3f4ab486bc5f7c34b14c3bdedb068008ec8ff5beb4 having problems due to it	2021-06-03 13:04:04 -04:00
Nathan Brown	df7c7eec4a	Get out faster on nil udpAddr (#449 )	2021-04-26 20:21:47 -05:00
Nathan Brown	6f37280e8e	Fully close tunnels when CloseAllTunnels is called (#448 )	2021-04-26 10:42:24 -05:00
Nathan Brown	710df6a876	Refactor remotes and handshaking to give every address a fair shot (#437 )	2021-04-14 13:50:09 -05:00
Nathan Brown	75f7bda0a4	Lighthouse performance pass (#418 )	2021-03-31 17:32:02 -05:00
Nathan Brown	883e09a392	Don't use a global ca pool (#426 )	2021-03-29 12:10:19 -05:00
Nathan Brown	3ea7e1b75f	Don't use a global logger (#423 )	2021-03-26 09:46:30 -05:00
Nathan Brown	7073d204a8	IPv6 support for outside (udp) (#369 )	2021-03-18 20:37:24 -05:00
Wade Simmons	6c55d67f18	Refactor handshake_ix (#401 ) There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake. Co-authored-by: Nate Brown <nbrown.us@gmail.com> Co-authored-by: Ryan Huber <rhuber@gmail.com> Co-authored-by: forfuncsake <drussell@slack-corp.com>	2021-03-12 14:16:25 -05:00
Wade Simmons	64d8035d09	fix race in getOrHandshake (#400 ) We missed this race with #396 (and I think this is also the crash in issue #226). We need to lock a little higher in the getOrHandshake method, before we reset hostinfo.ConnectionInfo. Previously, two routines could enter this section and confuse the handshake process. This could result in the other side sending a recv_error that also has a race with setting hostinfo.ConnectionInfo back to nil. So we make sure to grab the lock in handleRecvError as well. Neither of these code paths are in the hot path (handling packets between two hosts over an active tunnel) so there should be no performance concerns.	2021-03-09 09:27:02 -05:00
Wade Simmons	2a4beb41b9	Routine-local conntrack cache (#391 ) Previously, every packet we see gets a lock on the conntrack table and updates it. When running with multiple routines, this can cause heavy lock contention and limit our ability for the threads to run independently. This change caches reads from the conntrack table for a very short period of time to reduce this lock contention. This cache will currently default to disabled unless you are running with multiple routines, in which case the default cache delay will be 1 second. This means that entries in the conntrack table may be up to 1 second out of date and remain in a routine local cache for up to 1 second longer than the global table. Instead of calling time.Now() for every packet, this cache system relies on a tick thread that updates the current cache "version" each tick. Every packet we check if the cache version is out of date, and reset the cache if so.	2021-03-01 19:52:17 -05:00
Wade Simmons	1bae5b2550	more validation in pending hostmap deletes (#344 ) We are currently seeing some cases where we are not deleting entries correctly from the pending hostmap. I believe this is a case of an inbound timer tick firing and deleting the Hosts map entry for a newer handshake attempt than intended, thus leaving the old Indexes entry orphaned. This change adds some extra checking when deleteing from the Indexes and Hosts maps to ensure we clean everything up correctly.	2021-03-01 12:40:46 -05:00
Tim Rots	e7e6a23cde	fix a few typos (#302 )	2021-03-01 11:14:34 -05:00
Wade Simmons	27d9a67dda	Proper multiqueue support for tun devices (#382 ) This change is for Linux only. Previously, when running with multiple tun.routines, we would only have one file descriptor. This change instead sets IFF_MULTI_QUEUE and opens a file descriptor for each routine. This allows us to process with multiple threads while preventing out of order packet reception issues. To attempt to distribute the flows across the queues, we try to write to the tun/UDP queue that corresponds with the one we read from. So if we read a packet from tun queue "2", we will write the outgoing encrypted packet to UDP queue "2". Because of the nature of how multi queue works with flows, a given host tunnel will be sticky to a given routine (so if you try to performance benchmark by only using one tunnel between two hosts, you are only going to be using a max of one thread for each direction). Because this system works much better when we can correlate flows between the tun and udp routines, we are deprecating the undocumented "tun.routines" and "listen.routines" parameters and introducing a new "routines" parameter that sets the value for both. If you use the old undocumented parameters, the max of the values will be used and a warning logged. Co-authored-by: Nate Brown <nbrown.us@gmail.com>	2021-02-25 15:01:14 -05:00
Wade Simmons	ee7c27093c	add HostMap.RemoteIndexes (#329 ) This change adds an index based on HostInfo.remoteIndexId. This allows us to use HostMap.QueryReverseIndex without having to loop over all entries in the map (this can be a bottleneck under high traffic lighthouses). Without this patch, a high traffic lighthouse server receiving recv_error packets and lots of handshakes, cpu pprof trace can look like this: flat flat% sum% cum cum% 2000ms 32.26% 32.26% 3040ms 49.03% github.com/slackhq/nebula.(*HostMap).QueryReverseIndex 870ms 14.03% 46.29% 1060ms 17.10% runtime.mapiternext Which shows 50% of total cpu time is being spent in QueryReverseIndex.	2020-11-23 14:51:16 -05:00
Wade Simmons	2e7ca027a4	Lighthouse handler optimizations (#320 ) We noticed that the number of memory allocations LightHouse.HandleRequest creates for each call can seriously impact performance for high traffic lighthouses. This PR introduces a benchmark in the first commit and then optimizes memory usage by creating a LightHouseHandler struct. This struct allows us to re-use memory between each lighthouse request (one instance per UDP listener go-routine).	2020-11-23 14:50:01 -05:00
Wade Simmons	aba42f9fa6	enforce the use of goimports (#248 ) * enforce the use of goimports Instead of enforcing `gofmt`, enforce `goimports`, which also asserts a separate section for non-builtin packages. * run `goimports` everywhere * exclude generated .pb.go files	2020-06-30 18:53:30 -04:00
Wade Simmons	b37a91cfbc	add meta packet statistics (#230 ) This change add more metrics around "meta" (non "message" type packets). For lighthouse packets, we also record statistics around the specific lighthouse meta type. We don't keep statistics for the "message" type so that we don't slow down the fast path (and you can just look at metrics on the tun interface to find that information).	2020-06-26 13:45:48 -04:00
Patrick Bogen	ecf0e5a9f6	drop packets even if we aren't going to emit Debug logs about it (#239 ) * drop packets even if we aren't going to emit Debug logs about it * smallify change	2020-06-10 16:55:49 -05:00
Patrick Bogen	363c836422	log the reason for fw drops (#220 ) * log the reason for fw drops * only prepare log if we will end up sending it	2020-04-10 10:57:21 -07:00
Wade Simmons	4f6313ebd3	fix config name for {remote,local}_allow_list (#219 ) This config option should be snake_case, not camelCase.	2020-04-08 16:20:12 -04:00
Wade Simmons	0a474e757b	Add lighthouse.{remoteAllowList,localAllowList} (#217 ) These settings make it possible to blacklist / whitelist IP addresses that are used for remote connections. `lighthouse.remoteAllowList` filters which remote IPs are allow when fetching from the lighthouse (or, if you are the lighthouse, which IPs you store and forward to querying hosts). By default, any remote IPs are allowed. You can provide CIDRs here with `true` to allow and `false` to deny. The most specific CIDR rule applies to each remote. If all rules are "allow", the default will be "deny", and vice-versa. If both "allow" and "deny" rules are present, then you MUST set a rule for "0.0.0.0/0" as the default. lighthouse: remoteAllowList: # Example to block IPs from this subnet from being used for remote IPs. "172.16.0.0/12": false # A more complicated example, allow public IPs but only private IPs from a specific subnet "0.0.0.0/0": true "10.0.0.0/8": false "10.42.42.0/24": true `lighthouse.localAllowList` has the same logic as above, but it applies to the local addresses we advertise to the lighthouse. Additionally, you can specify an `interfaces` map of regular expressions to match against interface names. The regexp must match the entire name. All interface rules must be either true or false (and the default rule will be the inverse). CIDR rules are matched after interface name rules. Default is all local IP addresses. lighthouse: localAllowList: # Example to blacklist docker interfaces. interfaces: 'docker.*': false # Example to only advertise IPs in this subnet to the lighthouse. "10.0.0.0/8": true	2020-04-08 15:36:43 -04:00
Wade Simmons	b4f2f7ce4e	log `certName` alongside `vpnIp` (#200 ) This change adds a new helper, `(*HostInfo).logger()`, that starts a new logrus.Entry with `vpnIp` and `certName`. We don't use the helper inside of handshake_ix though since the certificate has not been attached to the HostInfo yet. Fixes: #84	2020-04-06 11:34:00 -07:00
Ryan Huber	9333a8e3b7	subnet support	2019-12-12 16:34:17 +00:00
Slack Security Team	f22b4b584d	Public Release	2019-11-19 17:00:20 +00:00

28 Commits