We should support Cilium with at least one form of CNI - it's supported and used by both GKE and AKS, at least.
I am looking more into how to support ambient on unsupported CNI providers (no admin permissions, interception problems like Cilium, etc.), by continuing to inject ztunnel as an L4 proxy and using the old iptables modes.
As mentioned in Slack, one option we may explore for such CNI providers is to modify the default route on the pod so egress goes to the ztunnel pod, and in the ztunnel pod use the 'old style' interception so all traffic is captured. It may work for all CNIs since it's not doing anything on the host level - unless they mess with the routing.
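A minimal sketch of the route change described above, assuming iproute2 is run inside the pod's network namespace and that the ztunnel pod IP is usable as a next hop from there (which depends on the CNI). The helper name and the example address are hypothetical and not part of Istio.

```python
import ipaddress
from typing import List


def default_route_cmd(ztunnel_ip: str) -> List[str]:
    """Build the iproute2 command that points the pod's default route at ztunnel."""
    # Validate early so a malformed address never reaches the command line.
    addr = ipaddress.ip_address(ztunnel_ip)
    family = "-6" if addr.version == 6 else "-4"
    # `ip route replace` changes the existing default route, or adds one if missing.
    return ["ip", family, "route", "replace", "default", "via", str(addr)]


if __name__ == "__main__":
    # Hypothetical ztunnel pod IP, for illustration only.
    print(" ".join(default_route_cmd("10.244.1.7")))
```

Because the command runs with the pod's own network privileges, it inherits the limitation raised later in the thread: a pod with sufficient local permissions could change the route back.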
First, we don't have consensus on the options for update.
One option (which I'm advocating for our managed product) is to tie it to node updates, as it is the safest, and hopefully ztunnel is small enough to not require frequent changes, and we can decouple it from the rest.
The only other option I consider viable is to pre-deploy blue/green daemon sets and upgrade one.
It is possible to gradually move the route to the new one (starting with pods marked as canary/dev, for extra safety).
It is also possible to set two default routes, to both blue and green: when one ztunnel is upgraded or drained, new requests may be able to go to the other.
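The blue/green idea above can be sketched as a next-hop selection. This is only an illustration under assumptions: the `Ztunnel` type, the `ready` flag, and the rule that canary pods prefer green are all hypothetical, not an existing Istio mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ztunnel:
    name: str    # "blue" or "green"
    ip: str      # pod IP used as the route's next hop
    ready: bool  # False while the instance is being upgraded or drained


def pick_next_hop(pod_is_canary: bool, blue: Ztunnel, green: Ztunnel) -> Optional[str]:
    """Return the next-hop IP for a pod, or None if neither ztunnel is ready."""
    # Canary/dev pods move to the newly upgraded (green) instance first.
    preferred, fallback = (green, blue) if pod_is_canary else (blue, green)
    for candidate in (preferred, fallback):
        if candidate.ready:
            return candidate.ip
    return None
```

With both instances ready, regular pods stay on blue and canary pods use green; if the preferred instance is draining, the pod falls back to the other one.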
I personally like the 'simplest' option, and cordon and node upgrade seems to be not only the simplest but also the safest, and will be consistent with the future 'ztunnel built into native CNI' or even 'ztunnel is the CNI' cases we discussed in the past. Not sure if any vendor is doing live upgrades of the CNI.
Using the default route is also nice on VMs with Docker support - and there are options to still mark packets and use different routing tables. We already did something similar for the tproxy mode of capture - but with ztunnel in a separate pod and network namespace it is far simpler and cleaner (no need to worry about excluding egress from ztunnel).
On Tue, Apr 25, 2023 at 7:13 AM Yuval Kohavi wrote:
disadvantage of changing the default route of the pod is that you'll need to update it in all pods if ztunnel were to restart
As mentioned in Slack, one option we may explore for such CNI providers is to modify the default route on the pod so egress goes to the ztunnel pod, and in the ztunnel pod use the 'old style' interception so all traffic is captured. It may work for all CNIs since it's not doing anything on the host level - unless they mess with the routing.
Yeah - aside from the update problem @yuval-k mentioned, I also don't like this because any pod with sufficient local perms can self-bypass redirection.
That's the price paid for no host rules, and that feels like a pretty steep price - I think redirection has to be "forcible" in the sense that it can be guaranteed to happen for everything leaving the pod netns, regardless of the intended destination, what the pod thinks is happening, or what local netns/cgroup-level permissions it has to alter things.
AFAICT redirection rules (iptables or eBPF or whatever) must live outside the security context of the pod - or at least the ability to block/drop traffic that isn't tagged for redirection must live outside the security context of the pod.
Fundamentally, we cannot have two things fighting over host node redirection rules (eBPF or iptables, it doesn't really matter - it's the same problem in either case and CNI plugin chaining doesn't help with this problem).
Really the only options I see are:
1. Have Istio CNI skip setting up certain host-side redirection things if Cilium (or another self-redirecting CNI) is in use, and document how the self-redirecting CNI can be used to set up/enforce the redirection Istio needs, or create a separate Cilium-only Istio CNI plugin (kludgy)
2. Tell people using Istio and Cilium to avoid the use of some Cilium APIs (e.g. LocalRedirectPolicy) (also kludgy, slightly gross)
3. Convince GAMMA + Istio + Cilium to take this up as a shared API (ideal, but that could take years as has been discussed)
4. Come up with a DIY standard (basically (3) but without GAMMA) for making sure overlapping node-level redirection rules don't create problems and getting Istio and Cilium to both support it (eBPF chaining versus CNI chaining - technically possible, but there's no standard for how to do it - or some sort of generic API).
Just creating a completely standalone/dedicated Cilium Istio CNI flavor that perhaps interfaces with Cilium APIs and asks Cilium to set up the node-local redirection we want might be the best/simplest/most maintainable option (for both OSS and vendors that want the two to play nice) - nothing in Istio really cares who or what sets up pod->ztunnel redirection, we just need it to be in place.
disadvantage of changing the default route of the pod is that you'll need to update it in all pods if ztunnel were to restart
xref #43642 (comment) for using VRRP to provide ztunnel HA. Since the VRRP VIP can be used in the rules, no updates are required when switching from the active to the backup ztunnel.
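To illustrate why the VIP avoids rule updates, here is a simplified sketch of the failover. It is not a VRRP implementation: the election is reduced to "highest priority among healthy instances", and the VIP value, the `Instance` type, and the instance names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical virtual IP that the pod-side rules point at; it never changes.
VIP = "169.254.7.127"


@dataclass
class Instance:
    name: str
    priority: int  # higher wins, loosely modelled on VRRP priority
    healthy: bool


def vip_owner(instances: List[Instance]) -> Optional[str]:
    """Return the name of the instance that currently answers for the VIP."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return None
    # Simplification: the healthy instance with the highest priority owns the VIP.
    return max(healthy, key=lambda i: i.priority).name
```

When the active ztunnel goes unhealthy, `vip_owner` moves to the backup while `VIP`, and therefore every rule that references it, stays the same.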
I personally like the 'simplest' option, and cordon and node upgrade seems to be not only the simplest but also the safest, and will be consistent with the future 'ztunnel built into native CNI' or even 'ztunnel is the CNI' cases we discussed in the past. Not sure if any vendor is doing live upgrades of the CNI.
+1 until the use case arises for a more elaborate solution.
Hello everyone, I'm currently working on implementing this feature in Ambient. I've opened a draft pull request that implements what we can call an Ambient Plugin: an interface wrapping the calls Ambient makes to the underlying CNI.
I also plan to implement Cilium support on top of this interface as a PoC. I'm open to help on this one, and if you have any recommendations, feel free to comment on the PR.
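As a rough illustration of what such a plugin interface could look like, here is a sketch in Python (Istio itself is written in Go; Python is used only to keep the example short). Every name here is hypothetical and is not taken from the draft PR: `AmbientPlugin`, `add_pod`, `remove_pod`, and the in-memory `RecordingPlugin` are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AmbientPlugin(ABC):
    """Hypothetical seam between ambient and a CNI-specific redirection backend."""

    @abstractmethod
    def add_pod(self, pod_ip: str, ztunnel_ip: str) -> None:
        """Set up pod -> ztunnel redirection for one pod."""

    @abstractmethod
    def remove_pod(self, pod_ip: str) -> None:
        """Tear down redirection for one pod."""


class RecordingPlugin(AmbientPlugin):
    """Stand-in backend that only records intent; a real one would program the CNI."""

    def __init__(self) -> None:
        self.redirects: Dict[str, str] = {}

    def add_pod(self, pod_ip: str, ztunnel_ip: str) -> None:
        self.redirects[pod_ip] = ztunnel_ip

    def remove_pod(self, pod_ip: str) -> None:
        # Removing an unknown pod is treated as a no-op.
        self.redirects.pop(pod_ip, None)
```

The point of the seam is the one made earlier in the thread: nothing in Istio needs to care who sets up pod->ztunnel redirection, so a Cilium-specific backend could sit behind the same two calls.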
Problems of getting started with ambient (Istio 1.18.0-alpha.0) on CNIs Cilium, Calico & Flannel #46930