Add new ignored interfaces to NodeNetworkInterfaceDown Alert #3279

Open: wants to merge 1 commit into base: main

machadovilaca
Member

What this PR does / why we need it:

Additional network interfaces should be excluded from the NodeNetworkInterfaceDown alert. The resulting expression is:

count by (instance) (node_network_up{device!~"lo|tunbr|veth.+|ovs-system|genev_sys.+|br-int"} == 0) > 0
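
As a rough illustration of how an expression like the one above can be assembled from a list of patterns, here is a minimal sketch. The variable name mirrors the one visible in the diff, but the slice contents are an assumption reconstructed from the expression in this description, and `networkDownExpr` is a hypothetical helper, not a function from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// Patterns reconstructed from the expression in the PR description.
// Assumption: the real slice in the source file may differ.
var ignoredInterfacesForNetworkDown = []string{
	"lo", "tunbr", "veth.+", "ovs-system", "genev_sys.+", "br-int",
}

// networkDownExpr (hypothetical helper) joins the patterns into a single
// regex alternation and embeds it in the negative device matcher.
func networkDownExpr(ignored []string) string {
	return fmt.Sprintf(
		`count by (instance) (node_network_up{device!~"%s"} == 0) > 0`,
		strings.Join(ignored, "|"),
	)
}

func main() {
	fmt.Println(networkDownExpr(ignoredInterfacesForNetworkDown))
}
```

Keeping the patterns in a slice means a new interface family is a one-line addition rather than an edit inside a long quoted string.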

/cc @maiqueb

Reviewer Checklist

Reviewers are supposed to review the PR for every aspect below one by one. To check an item means the PR is either "OK" or "Not Applicable" in terms of that item. All items are supposed to be checked before merging a PR.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

https://issues.redhat.com/browse/CNV-55543

Release note:

Add new ignored interfaces to NodeNetworkInterfaceDown Alert

@kubevirt-bot requested a review from maiqueb January 28, 2025 15:27
@kubevirt-bot added labels Jan 28, 2025: release-note (denotes a PR that will be considered when it comes time to generate release notes), dco-signoff: yes (indicates the PR's author has DCO signed all their commits), size/S
@coveralls
Collaborator

coveralls commented Jan 28, 2025

Pull Request Test Coverage Report for Build 13305585006

Details

  • 2 of 2 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 71.958%

Totals Coverage Status
Change from base Build 13293099233: 0.0%
Covered Lines: 6169
Relevant Lines: 8573

💛 - Coveralls

promv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
"k8s.io/apimachinery/pkg/util/intstr"
"k8s.io/utils/ptr"
)

var ignoredInterfacesForNetworkDown = []string{
Contributor

What interfaces are we actually trying to monitor?

I'm logging into an existing node, and I see the following interfaces:

8: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0a:58:0a:87:00:02 brd ff:ff:ff:ff:ff:ff
    inet 10.135.0.2/23 brd 10.135.1.255 scope global ovn-k8s-mp0
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fe87:2/64 scope link 
       valid_lft forever preferred_lft forever
777: 9f17207c3cc0f46@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default 
    link/ether 82:e5:8e:03:e1:ce brd ff:ff:ff:ff:ff:ff link-netns 5bf0ae73-db72-4bfd-b646-b231f799e15d
    inet6 fe80::80e5:8eff:fe03:e1ce/64 scope link 
       valid_lft forever preferred_lft forever
9: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
    link/ether ce:98:9b:60:6c:df brd ff:ff:ff:ff:ff:ff
    inet6 fe80::cc98:9bff:fe60:6cdf/64 scope link 
       valid_lft forever preferred_lft forever
11: 493c31b001d3731@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default 
    link/ether 7a:0d:d9:56:78:5f brd ff:ff:ff:ff:ff:ff link-netns b1d150aa-91dc-4741-872f-1f71dab1a2da
    inet6 fe80::780d:d9ff:fe56:785f/64 scope link 
       valid_lft forever preferred_lft forever
12: 3d50190a313ba21@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default 
    link/ether 52:77:4d:f5:8c:4c brd ff:ff:ff:ff:ff:ff link-netns 28e514e3-3b79-495a-b5d6-483d779203c5
    inet6 fe80::5077:4dff:fef5:8c4c/64 scope link 
       valid_lft forever preferred_lft forever

And many more like 493c31b001d3731 (just an example).
Are we interested in raising alerts when these go down? These are the host side of the veths connecting the pods to OVS.

Contributor

Also, regarding ovn-k8s-mp0: would we want to know about this? This is the default cluster network management port, which is used for some OVN-K features.

If we think about primary UDN, we will have one of these per primary UDN.
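
To make the question concrete, the sketch below checks a few of the device names from the listing above against the ignore patterns proposed in the PR description. It relies on the fact that Prometheus label regex matchers are fully anchored, which is emulated here by wrapping the alternation in `^(?:...)$`. The helper `isIgnored` is hypothetical, and it is an assumption that the `device` label carries the interface name without the `@if2` suffix.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isIgnored (hypothetical helper) reports whether a device name would be
// excluded by a device!~"<patterns joined by |>" matcher. Prometheus anchors
// label regexes at both ends, so the alternation is wrapped in ^(?:...)$.
func isIgnored(patterns []string, device string) bool {
	re := regexp.MustCompile("^(?:" + strings.Join(patterns, "|") + ")$")
	return re.MatchString(device)
}

func main() {
	// Patterns taken from the expression in the PR description.
	patterns := []string{"lo", "tunbr", "veth.+", "ovs-system", "genev_sys.+", "br-int"}

	// Device names taken from the interface listing in the review comment.
	devices := []string{"lo", "genev_sys_6081", "ovn-k8s-mp0", "493c31b001d3731"}
	for _, d := range devices {
		fmt.Printf("%-16s ignored=%t\n", d, isIgnored(patterns, d))
	}
}
```

Under these assumptions, `ovn-k8s-mp0` and the hash-named host-side veths are not covered by any pattern, so they would still be counted by the alert if they went down.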

Member Author

Do you have any idea why, for example, the NodeNetworkInterfaceFlapping alert, changes(node_network_up{device!~"veth.+|tunbr",job="node-exporter"}[2m]) > 2, only ignores "veth.+" and "tunbr"?

Contributor

Totally clueless.

I guess they, whoever they are, don't run in OpenShift CI, whose monitor tests would fail on this.

Member Author

The OpenShift monitoring team also provides the flapping alert for OpenShift customers.

@hco-bot
Collaborator

hco-bot commented Jan 28, 2025

hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure
hco-e2e-operator-sdk-gcp lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-gcp lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-aws

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-aws, ci/prow/hco-e2e-operator-sdk-azure, ci/prow/hco-e2e-operator-sdk-sno-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hco-bot
Collaborator

hco-bot commented Jan 28, 2025

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure
hco-e2e-upgrade-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@hco-bot
Collaborator

hco-bot commented Jan 28, 2025

hco-e2e-upgrade-prev-operator-sdk-sno-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws

@hco-bot
Collaborator

hco-bot commented Jan 28, 2025

hco-e2e-consecutive-operator-sdk-upgrades-azure lane succeeded.
/override ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-aws
hco-e2e-upgrade-operator-sdk-azure lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-aws

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-aws, ci/prow/hco-e2e-upgrade-operator-sdk-aws

@hco-bot
Collaborator

hco-bot commented Jan 28, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

@hco-bot
Collaborator

hco-bot commented Jan 28, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

@@ -36,7 +48,7 @@ func clusterAlerts() []promv1.Rule {
},
{
Alert: "NodeNetworkInterfaceDown",
-Expr: intstr.FromString("count by (instance) (node_network_up{device!~\"veth.+|tunbr\"} == 0) > 0"),
+Expr: intstr.FromString(fmt.Sprintf("count by (instance) (node_network_up{device!~\"%s\"} == 0) > 0", strings.Join(ignoredInterfacesForNetworkDown, "|"))),
Collaborator

Let's get rid of the ugly \", either by

Suggested change
-Expr: intstr.FromString(fmt.Sprintf("count by (instance) (node_network_up{device!~\"%s\"} == 0) > 0", strings.Join(ignoredInterfacesForNetworkDown, "|"))),
+Expr: intstr.FromString(fmt.Sprintf(`count by (instance) (node_network_up{device!~"%s"} == 0) > 0`, strings.Join(ignoredInterfacesForNetworkDown, "|"))),

or by

Suggested change
-Expr: intstr.FromString(fmt.Sprintf("count by (instance) (node_network_up{device!~\"%s\"} == 0) > 0", strings.Join(ignoredInterfacesForNetworkDown, "|"))),
+Expr: intstr.FromString(fmt.Sprintf("count by (instance) (node_network_up{device!~%q} == 0) > 0", strings.Join(ignoredInterfacesForNetworkDown, "|"))),
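
For reference, a small sketch comparing the original escaped form with the two suggested alternatives. The helper `variants` and the shortened metric selector are hypothetical and used only for illustration. For patterns without backslashes or quotes, all three forms should yield the same string.

```go
package main

import (
	"fmt"
)

// variants (hypothetical helper) formats the same selector three ways:
// escaped quotes, a raw string literal, and the %q verb.
func variants(joined string) (escaped, raw, quoted string) {
	escaped = fmt.Sprintf("node_network_up{device!~\"%s\"}", joined)
	raw = fmt.Sprintf(`node_network_up{device!~"%s"}`, joined)
	quoted = fmt.Sprintf("node_network_up{device!~%q}", joined)
	return escaped, raw, quoted
}

func main() {
	escaped, raw, quoted := variants("veth.+|tunbr")
	fmt.Println(escaped == raw && raw == quoted)
	fmt.Println(quoted)
}
```

One difference to keep in mind: `%q` also escapes backslashes and double quotes inside the joined pattern, so the three forms only coincide when the patterns contain neither.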

Member Author

updated

@sradco
Collaborator

sradco commented Feb 12, 2025

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sradco

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Feb 12, 2025
@hco-bot
Collaborator

hco-bot commented Feb 13, 2025

hco-e2e-operator-sdk-gcp lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-azure
hco-e2e-operator-sdk-gcp lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-aws

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-aws, ci/prow/hco-e2e-operator-sdk-azure

@avlitman
Collaborator

/lgtm

@kubevirt-bot added the lgtm label (indicates that a PR is ready to be merged) Feb 13, 2025
@hco-bot
Collaborator

hco-bot commented Feb 13, 2025

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure
hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure
hco-e2e-upgrade-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure, ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure

@hco-bot
Collaborator

hco-bot commented Feb 13, 2025

hco-e2e-upgrade-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-operator-sdk-azure
hco-e2e-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-sno-azure, ci/prow/hco-e2e-upgrade-operator-sdk-azure

@hco-bot
Collaborator

hco-bot commented Feb 13, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

@hco-bot
Collaborator

hco-bot commented Feb 13, 2025

hco-e2e-upgrade-prev-operator-sdk-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure

@hco-bot
Collaborator

hco-bot commented Feb 13, 2025

hco-e2e-upgrade-prev-operator-sdk-sno-aws lane succeeded.
/override ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure

@hco-bot
Collaborator

hco-bot commented Feb 13, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

@machadovilaca
Member Author

@nunnatsa failures are unrelated to this PR, can we override these?

@machadovilaca
Member Author

/retest

@nunnatsa
Collaborator

/retest

openshift-ci bot commented Feb 19, 2025

@machadovilaca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure | d88c529 | link | true | /test hco-e2e-upgrade-prev-operator-sdk-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure | d88c529 | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-azure |
| ci/prow/hco-e2e-kv-smoke-azure | d88c529 | link | true | /test hco-e2e-kv-smoke-azure |
| ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-aws | d88c529 | link | true | /test hco-e2e-consecutive-operator-sdk-upgrades-aws |
| ci/prow/hco-e2e-consecutive-operator-sdk-upgrades-azure | d88c529 | link | true | /test hco-e2e-consecutive-operator-sdk-upgrades-azure |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@machadovilaca
Member Author

/retest

@nunnatsa failed again

Labels
approved, dco-signoff: yes, lgtm, release-note, size/S
8 participants