Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handler ErrorTracker.Integrations.Oban has failed and has been detached #63

Closed
jaimeiniesta opened this issue Aug 23, 2024 · 9 comments · Fixed by #70
Closed

Handler ErrorTracker.Integrations.Oban has failed and has been detached #63

jaimeiniesta opened this issue Aug 23, 2024 · 9 comments · Fixed by #70
Assignees
Labels
bug Something isn't working

Comments

@jaimeiniesta
Copy link

Hi! I'm seeing this in the logs and I'm not sure why that's happening.

I'm using these versions:

elixir 1.14.5
erlang 25.3.2.10
phoenix 1.7.12
oban 2.18.2
oban_pro 1.4.11
oban_web 2.10.5
error_tracker 0.2.2

Development logs

[error] Handler ErrorTracker.Integrations.Oban has failed and has been detached. Class=:error
Reason={:badkey, :file, nil}
Stacktrace=[
  {ErrorTracker.Error, :new, 3,
   [file: 'lib/error_tracker/schemas/error.ex', line: 32]},
  {ErrorTracker, :report, 3, [file: 'lib/error_tracker.ex', line: 106]},
  {:telemetry, :"-execute/3-fun-0-", 4,
   [
     file: '/Users/jaime/code/myapp/deps/telemetry/src/telemetry.erl',
     line: 160
   ]},
  {:lists, :foreach_1, 2, [file: 'lists.erl', line: 1442]},
  {Oban.Queue.Executor, :emit_event, 1,
   [file: 'lib/oban/queue/executor.ex', line: 284]},
  {Oban.Queue.Executor, :"-call/1-fun-0-", 1,
   [file: 'lib/oban/queue/executor.ex', line: 84]},
  {Oban.Backoff, :with_retry, 3, [file: 'lib/oban/backoff.ex', line: 108]},
  {Task.Supervised, :invoke_mfa, 2, [file: 'lib/task/supervised.ex', line: 89]},
  {Task.Supervised, :reply, 4, [file: 'lib/task/supervised.ex', line: 34]},
  {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}
]

Staging server logs

2024-08-23T07:46:01.833662+00:00 app[web.1]: 07:46:01.833 [error] Handler ErrorTracker.Integrations.Oban has failed and has been detached. Class=:error
2024-08-23T07:46:01.833679+00:00 app[web.1]: Reason={:badkey, :file, nil}
2024-08-23T07:46:01.833680+00:00 app[web.1]: Stacktrace=[
2024-08-23T07:46:01.833680+00:00 app[web.1]: {ErrorTracker.Error, :new, 3,
2024-08-23T07:46:01.833681+00:00 app[web.1]: [file: 'lib/error_tracker/schemas/error.ex', line: 32]},
2024-08-23T07:46:01.833682+00:00 app[web.1]: {ErrorTracker, :report, 3, [file: 'lib/error_tracker.ex', line: 106]},
2024-08-23T07:46:01.833683+00:00 app[web.1]: {:telemetry, :"-execute/3-fun-0-", 4,
2024-08-23T07:46:01.833683+00:00 app[web.1]: [file: '/tmp/build_477b1e97/deps/telemetry/src/telemetry.erl', line: 160]},
2024-08-23T07:46:01.833684+00:00 app[web.1]: {:lists, :foreach_1, 2, [file: 'lists.erl', line: 1442]},
2024-08-23T07:46:01.833684+00:00 app[web.1]: {Oban.Queue.Executor, :emit_event, 1,
2024-08-23T07:46:01.833684+00:00 app[web.1]: [file: 'lib/oban/queue/executor.ex', line: 284]},
2024-08-23T07:46:01.833685+00:00 app[web.1]: {Oban.Queue.Executor, :"-call/1-fun-0-", 1,
2024-08-23T07:46:01.833685+00:00 app[web.1]: [file: 'lib/oban/queue/executor.ex', line: 84]},
2024-08-23T07:46:01.833685+00:00 app[web.1]: {Oban.Backoff, :with_retry, 3, [file: 'lib/oban/backoff.ex', line: 108]},
2024-08-23T07:46:01.833686+00:00 app[web.1]: {Task.Supervised, :invoke_mfa, 2, [file: 'lib/task/supervised.ex', line: 89]}
2024-08-23T07:46:01.833686+00:00 app[web.1]: ]
@odarriba
Copy link
Contributor

According to the stack trace it looks like a problem with certain exceptions that do not report the file of your app in which happened.

We will take a look and push a fix for it. Will keep you posted.

Thanks for the early testing and reporting the issue :)

@crbelaus crbelaus added the bug Something isn't working label Aug 23, 2024
@odarriba odarriba self-assigned this Aug 24, 2024
@odarriba
Copy link
Contributor

I'm not able to reproduce this. I tried with several exception, including some that are specially difficult (' 1/0` or calling an unexistent function) without being able to reproduce it.

According to the stacktrace you are getting from telemetry it looks like the exception has no stacktrace lines, which is weird.

I can prepare a fix without reproducing it, but it would be best to be able to see why that happens.

Do you know which kind of exception did happen at first place? 👀 @jaimeiniesta

@jaimeiniesta
Copy link
Author

Thanks for taking a look! I'm not entirely sure. I suspect it's an Oban worker that saves screenshots to disk and then uploads them to S3, but I'm not sure. I'll try to find it.

@jaimeiniesta
Copy link
Author

Hey, I've been able to reproduce this. This ScreenshotWorker, in case of timeouts, will end returning {:error, :timeout}, so that Oban catches the error and retries. This is when ErrorTracker crashes. Here's how to reproduce it. This worker will always return {:error, :timeout}:

defmodule Myapp.Workers.TestWorker do
  use Oban.Worker

  def test do
    %{}
    |> __MODULE__.new()
    |> Oban.insert!()
  end

  @impl Oban.Worker
  def perform(_) do
    {:error, :timeout}
  end
end

If you launch the worker this is what happens:

ex(2)>  Myapp.Workers.TestWorker.test()
%Oban.Job{
  __meta__: #Ecto.Schema.Metadata<:loaded, "public", "oban_jobs">,
  id: 297861786,
  state: "available",
  queue: "default",
  worker: "Myapp.Workers.TestWorker",
  args: %{},
  meta: %{},
  tags: [],
  errors: [],
  attempt: 0,
  attempted_by: nil,
  max_attempts: 20,
  priority: 0,
  attempted_at: nil,
  cancelled_at: nil,
  completed_at: nil,
  discarded_at: nil,
  inserted_at: ~U[2024-08-26 07:12:28.666320Z],
  scheduled_at: ~U[2024-08-26 07:12:28.666320Z],
  conf: nil,
  conflict?: false,
  replace: nil,
  unique: nil,
  unsaved_error: nil
}
iex(3)> [error] Handler ErrorTracker.Integrations.Oban has failed and has been detached. Class=:error
Reason={:badkey, :file, nil}
Stacktrace=[
  {ErrorTracker.Error, :new, 3,
   [file: 'lib/error_tracker/schemas/error.ex', line: 32]},
  {ErrorTracker, :report, 3, [file: 'lib/error_tracker.ex', line: 106]},
  {:telemetry, :"-execute/3-fun-0-", 4,
   [
     file: '/Users/jaime/code/myapp/deps/telemetry/src/telemetry.erl',
     line: 160
   ]},
  {:lists, :foreach_1, 2, [file: 'lists.erl', line: 1442]},
  {Oban.Queue.Executor, :emit_event, 1,
   [file: 'lib/oban/queue/executor.ex', line: 284]},
  {Oban.Queue.Executor, :"-call/1-fun-0-", 1,
   [file: 'lib/oban/queue/executor.ex', line: 84]},
  {Oban.Backoff, :with_retry, 3, [file: 'lib/oban/backoff.ex', line: 108]},
  {Task.Supervised, :invoke_mfa, 2, [file: 'lib/task/supervised.ex', line: 89]},
  {Task.Supervised, :reply, 4, [file: 'lib/task/supervised.ex', line: 34]},
  {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}
]

@odarriba
Copy link
Contributor

Thanks for the detailed steps to reproduce it @jaimeiniesta ! The error makes sense because as the code of the worker is not generating an exception, no stack trace is returned from Oban.

I have prepared a fix in #70 which i hope will be released soon 🚂

Feel free to test it if you want, or even use that branch temporarily if it is breaking your envs.

And thanks again for so detailed issues. We are still polishing some parts and having reports from real users is really helpful 🤗

odarriba added a commit that referenced this issue Aug 28, 2024
There was an error when an Oban worker returned an `{:error, reason}`
tuple because Oban does not return an stacktrace (makes sense, as it is
not an exception by itself).

Our code was not prepared for that use case. I modified it to handle it
and circumvent the needed for source file and source function
information.

After this change, empty stack traces and source information are not
shown on the dashboard to avoid confusion.

As an extra feature I have also added the reporting of the `state` of
the job (which is mostly ever `failure` for us).

Closes #63
@odarriba
Copy link
Contributor

The fix has been released in v0.2.4 a few seconds ago 🚀

@jaimeiniesta
Copy link
Author

jaimeiniesta commented Sep 2, 2024

Hey, I'm trying the latest version, 0.2.4 and it works great, thanks!

I'm still thinking about what's the best strategy to deal with Oban errors, as I rely much on Oban's retry mechanism. For example, my ScreenshotWorker can deal with rate limits and timeouts in this way:

{:error, :rate_limited} ->
  {:snooze, 1}

{:error, :screenshot_timeout} ->
  if corrected_attempt(job) >= @max_attempts do
    {:cancel, :timeout}
  else
    {:error, :timeout}
  end

That is, for a rate limit we use Oban's snooze to automatically retry in 1 second. For a network timeout, we may cancel the job after the max attempts has been reached, or return with {:error, :timeout} otherwise, so that Oban will automatically retry the job.

In this last case we don't really care about tracking the exception, because Oban is going to retry. We're only interested in tracking the error in the case of returning :cancel, as that's the last time Oban will deal with the job.

We have many other Oban workers that will fail often with network errors, timeouts etc., as our app is a web scraper.

What would be the best approach?

@crbelaus
Copy link
Contributor

crbelaus commented Sep 2, 2024

Hey @jaimeiniesta. I am currently working on a feature that will allow users to ignore errors that they don't want to be tracked.

I think that this would solve your issue. You could just ignore those errors and they wouldn't be tracked and wouldn't show in the dashboard either.


Update: the new feature is ready to review in #79

@jaimeiniesta
Copy link
Author

@crbelaus thanks, I've tried that branch and it works great, it's easy to define and ignore that matches on fields from the error or context, I would use that. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants