
K6 prometheus metric k6_http_req_failed_rate is broken #159

Closed
MarkSRobinson opened this issue Nov 15, 2023 · 1 comment
Labels: bug (Something isn't working), triage


MarkSRobinson commented Nov 15, 2023

Brief summary

When using the Prometheus remote write extension for k6, the k6_http_req_failed_rate metric is effectively useless: it does not rise and fall with the actual failure rate. Instead it jumps to 1 on the first error and stays there.

k6 version

0.47

OS

docker image

Docker version and image (if applicable)

grafana/k6:0.47.0

Steps to reproduce the problem

Config:

    containers:
      - name: k6-container
        image: grafana/k6:0.47.0
        command: ["/bin/sh", "-c"]
        args:
          - "k6 run /scripts/k6.js -o experimental-prometheus-rw"
        env:
          - name: K6_PROMETHEUS_RW_SERVER_URL
            value: "http://metrics-system-prometheus.monitoring.svc.cluster.local:9090/api/v1/write"
          - name: K6_PROMETHEUS_RW_TREND_STATS
            value: "p(95),p(99),min,max,avg"

K6 script:

    k6.js: |-
      import http from 'k6/http';
      import { check } from 'k6';

      export const options = {
        stages: [
          { target: 200, duration: '460s' },
          { target: 0, duration: '30s' },
        ],
      };

      export default function () {
        const result = http.get('http://emoji-svc-1-1.tar:8801/metrics');
        check(result, {
          'http response status code is 200': result.status === 200,
        });
      }

k6 output

     checks.........................: 99.99%  ✓ 9066640      ✗ 26     
     data_received..................: 58 GB   119 MB/s
     data_sent......................: 861 MB  1.8 MB/s
     http_req_blocked...............: avg=3.45µs   min=0s       med=1.58µs   max=64.74ms  p(90)=2.1µs    p(95)=2.59µs 
     http_req_connecting............: avg=55ns     min=0s       med=0s       max=38.77ms  p(90)=0s       p(95)=0s     
     http_req_duration..............: avg=5.26ms   min=0s       med=4.25ms   max=131.96ms p(90)=9.86ms   p(95)=12.33ms
       { expected_response:true }...: avg=5.26ms   min=676.31µs med=4.25ms   max=131.96ms p(90)=9.86ms   p(95)=12.33ms
     http_req_failed................: 0.00%   ✓ 26           ✗ 9066640
     http_req_receiving.............: avg=401.67µs min=0s       med=220.74µs max=108.02ms p(90)=822.31µs p(95)=1.28ms 
     http_req_sending...............: avg=15.41µs  min=0s       med=7.45µs   max=72.86ms  p(90)=9.82µs   p(95)=16.42µs
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s       max=0s       p(90)=0s       p(95)=0s     
     http_req_waiting...............: avg=4.84ms   min=0s       med=3.85ms   max=121.12ms p(90)=9.25ms   p(95)=11.64ms
     http_reqs......................: 9066666 18503.357772/s
     iteration_duration.............: avg=5.38ms   min=318.58µs med=4.36ms   max=132.03ms p(90)=10.02ms  p(95)=12.53ms
     iterations.....................: 9066666 18503.357772/s
     vus............................: 1       min=1          max=199  
     vus_max........................: 200     min=200        max=200  

During the active phase, I deleted 30% of the pods for the target service. This caused request errors as expected, but the errors are not reflected in the reported metrics.

Expected behaviour

[image: expected graph of k6_http_req_failed_rate, showing a brief spike that falls back to 0]

Because this is a rate metric, it should show a brief spike and then fall back to 0. An alternative fix would be to expose a k6_http_req_failed_total counter, which Prometheus could then turn into a rate (a workaround along those lines is sketched below).
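
As an illustration only (not part of the original report), here is a minimal script-level sketch of that idea: track failures with a custom k6 Counter so Prometheus can compute the rate itself. The metric name http_req_failures and the resulting remote-write series name are assumptions made for this sketch, and it assumes custom Counter metrics are exported as counter-style series by the remote-write output.

    import http from 'k6/http';
    import { check } from 'k6';
    import { Counter } from 'k6/metrics';

    // Hypothetical custom counter; assumed to surface via remote write as a
    // counter-style series (the exported series name is an assumption).
    const httpReqFailures = new Counter('http_req_failures');

    export default function () {
      const result = http.get('http://emoji-svc-1-1.tar:8801/metrics');
      const ok = check(result, {
        'http response status code is 200': result.status === 200,
      });
      // Count one failure for each request whose check did not pass.
      httpReqFailures.add(ok ? 0 : 1);
    }

Prometheus could then graph rate() over that series (for example, a 1m rate of the assumed k6_http_req_failures_total) to get the spike-and-recover shape described above.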

Actual behaviour

[image: actual graph of k6_http_req_failed_rate, jumping to 1 and staying there]

In this image, the metric jumped to 1 and stayed there. This isn't correct: it should have dropped back to zero after the system recovered.

MarkSRobinson added the bug (Something isn't working) label Nov 15, 2023
mstoykov transferred this issue from grafana/k6 Nov 15, 2023
mstoykov (Contributor) commented:

Hi @MarkSRobinson, please see #77, where this has already been discussed.

mstoykov closed this as not planned Nov 15, 2023