[Feature request] Allow Interaction with PDF Files #242

Open
WeissP opened this issue Nov 1, 2024 · 20 comments

Comments

@WeissP

WeissP commented Nov 1, 2024

With GPT-4o, one can upload files such as PDFs via the Assistant object and get the ID of the file. Then, one can create a thread using that ID and ask questions regarding it. Here is an example that demonstrates what I mentioned in Python.

Can we do similar things via chatgpt-shell? My idea is to provide some helper functions for uploading files and returning the file ID. Then, one can attach :file-id as a parameter in org-babel. Do you have any thoughts about that?

@xenodium
Owner

xenodium commented Nov 1, 2024

Thanks for filing. I'd like to add features like this in the future. Recently, I've been focusing on some cleanups/rearchs so I can make it easier to enable things like this (and perhaps use different models other than OpenAI's).

> My idea is to provide some helper functions for uploading files and returning the file ID. Then, one can attach :file-id as a parameter in org-babel. Do you have any thoughts about that?

Yup. The upload function would be handy. We can hopefully reuse it in the shells too. I was thinking of scanning for markdown links to local files and automatically uploading and saving the file-id's to buffer-local vars (for future resolutions).

Are you keen to contribute an implementation?
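
The link-scanning idea mentioned above could look roughly like the following. This is only a sketch in Python (the language of the example linked in the first post), not chatgpt-shell code: the function name `find_local_file_links` and the regular expression are assumptions made for illustration, and an Emacs Lisp version would keep the results in buffer-local variables instead of returning a list.

```python
import re
from pathlib import Path

# Matches [label](target); the target stops at whitespace or ')'.
MARKDOWN_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
# Anything that starts with "scheme://" is treated as remote, not local.
URL_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def find_local_file_links(text, base_dir="."):
    """Return existing local files referenced by markdown links in TEXT.

    Relative targets are resolved against BASE_DIR. Remote URLs and
    targets that do not exist on disk are skipped. Each file is
    reported once, in order of first appearance.
    """
    found = []
    for _label, target in MARKDOWN_LINK.findall(text):
        if URL_SCHEME.match(target):
            continue
        path = Path(base_dir) / target
        if path.is_file() and path not in found:
            found.append(path)
    return found
```

Each returned path would then be a candidate for upload, with the resulting file ID remembered so the same link is not uploaded twice.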

@WeissP
Author

WeissP commented Nov 1, 2024

> Are you keen to contribute an implementation?

Yes! Since I personally really want this feature, I am willing to contribute to it.

However, I am not yet familiar with the source code of your project. Could you please give me some rough guidance on what I can do and where I should start?

@xenodium
Owner

xenodium commented Nov 1, 2024

> Could you please give me some rough guidance on what I can do and where I should start?

Take a look at chatgpt-shell-post-chatgpt-messages usages in chatgpt-shell.el or lower level shell-maker-make-http-request to build the upload function. chatgpt-shell--make-payload shows how to create the entire payload of an OpenAI request. chatgpt-shell-filter-chatgpt-output shows how to parse the response.

If you haven't used edebug before, it's great for stepping through code to figure out how things work.

Lemme know if you have more questions.

@xenodium
Owner

xenodium commented Nov 1, 2024

Also, calling shell-maker-make-http-request with :async nil (or omitting :async) may be handy while figuring out how to make a valid request. Setting shell-maker-logging to t is also useful for viewing chatgpt-shell's curl command. You can tweak this curl command (on the command line or elsewhere) and iterate until a valid request payload is sent (and also see the JSON response).

@xenodium
Owner

xenodium commented Nov 1, 2024

chatgpt-shell-describe-image may also be of interest as it uploads an image.

@WeissP
Author

WeissP commented Nov 1, 2024

Okay I've set up all the necessary HTTP requests. You can check out a demo by setting the pdf path in the first line, which will allow you to see a summary of the PDF file after evaluating the buffer.

Right now, everything runs synchronously; I will change that later. Also, messages are grouped by threads, so one can view all historical messages via a thread-id. I'm not sure yet how or if this should be made available to users. The interaction with assistants and threads differs quite a bit from the usual GPT usage. Additionally, I'm not very familiar with your package, so could you tell me what plans you have in mind?

@xenodium
Owner

xenodium commented Nov 3, 2024

Thanks for this! Gimme a little time to play with it and get a feel of the OpenAI api.

If I understand correctly, your main use case is org babel? This may be simpler than shell integration, so we could start with that and defer the shell for a little while?

@WeissP
Author

WeissP commented Nov 3, 2024

I just want a functional GPT interface in Emacs that can interact with PDFs, whether it's through the shell or org-babel. I'm choosing org-babel just because it looks easier to implement :)

So yes, we can start with org-babel and defer the shell feature.

@xenodium
Owner

xenodium commented Nov 4, 2024

Hey, the demo changes are great! 7f61f49 adds a babel experiment to the ob-assistant-file-query branch. Super rough and needs more work, but it's a start if you wanna give it a try (needs at least shell-maker v0.63.1).

@xenodium
Owner

xenodium commented Nov 4, 2024

If assistant-id, thread-id, file-id are missing, you can use :file and it will create the whole lot (and copy them to the kill ring in case you want to reuse).

#+begin_src chatgpt-shell :results output :file "path/to/some/file.pdf"
 What's this pdf about? Give me the tl;dr
#+end_src

@WeissP
Author

WeissP commented Nov 5, 2024

It already looks really nice. Great job!

Do you have any ideas about saving file-id, assistant-id, and thread-id persistently? Maybe the org-babel block could also accept :session (or any name you like) as an all-in-one, user-friendly parameter, and we could provide a function to help the user generate and insert such a config. For example, after invoking (generate-config "/path/to/book/Mastering-Emacs.pdf"), the following config would be inserted:

(add-to-list
 'gpt-session
 '("Mastering-Emacs" . ( 
                        :description "This is a session about book Mastering Emacs"
                        :file-path "/path/to/book/Mastering-Emacs.pdf"
                        :file-id "xx"
                        :assistant-id "yy"
                        :thread-id "zz"
                        )))

Additionally, we could provide a function prompting users to choose a session via its name, file-path, and description. Then, an org-babel environment like this will be inserted:

#+begin_src chatgpt-shell :results output :session "Mastering-Emacs"

#+end_src

@WeissP
Author

WeissP commented Nov 5, 2024

Or maybe we could even migrate the above thing into your :context parameter?

@xenodium
Owner

xenodium commented Nov 5, 2024

Do you know if we can rely on the OpenAI API and query it for these instead? What's the lifetime of each of these things? Wondering, as keeping a local copy will also require managing staleness.

@WeissP
Author

WeissP commented Nov 5, 2024

According to the documentation, files are never deleted, while assistants and threads are removed after 30 days. So, I think at least we should provide a way to store relations between uploaded files and their corresponding file IDs, as uploading a PDF file is both time-consuming and costly.

In addition, ideally, it would be better if users could choose an assistant declaratively. In other words, users should just need to set its name and prompt and chatgpt-shell will manage its assistant ID under the hood.
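
One way to store the relation between uploaded files and their file IDs, as suggested above, is a small on-disk cache keyed by a hash of the file contents. The sketch below is in Python and is purely illustrative: `FileIdCache`, `file_digest`, the JSON layout, and the `upload` callback are all hypothetical names, and nothing here is part of chatgpt-shell or the OpenAI client. It also does not address the staleness concern raised earlier, so a cached ID would still need to be validated if the file can disappear on the server side.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, so renamed or moved copies still match."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


class FileIdCache:
    """Maps file contents to previously obtained file IDs (hypothetical helper)."""

    def __init__(self, cache_path):
        self.cache_path = Path(cache_path)
        self.entries = {}
        if self.cache_path.exists():
            self.entries = json.loads(self.cache_path.read_text())

    def get_or_upload(self, path, upload):
        """Return the cached ID for PATH, calling UPLOAD(path) only on a miss."""
        key = file_digest(path)
        if key not in self.entries:
            self.entries[key] = {"file_id": upload(path), "path": str(path)}
            self.cache_path.write_text(json.dumps(self.entries, indent=2))
        return self.entries[key]["file_id"]
```

Keying on content rather than path means the expensive upload is skipped even if the PDF has been moved, at the cost of hashing the file on every lookup.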

@WeissP WeissP closed this as completed Nov 6, 2024
@xenodium
Owner

xenodium commented Nov 6, 2024

> When I use gptel, it always returns gptel-curl-get-response: Wrong type argument: json-value-p, private-gpt.

Hi there. Is this maybe meant for an issue filed on gptel project?

@xenodium
Owner

xenodium commented Nov 6, 2024

> WeissP closed this as [completed]

We should prolly keep this feature request open until we merge the https://github.com/xenodium/chatgpt-shell/tree/ob-assistant-file-query branch.

@xenodium xenodium reopened this Nov 6, 2024
@WeissP
Author

WeissP commented Nov 6, 2024

> Hi there. Is this maybe meant for an issue filed on gptel project?

I feel so sorry. I intended to answer questions in another place. Please just ignore what I posted.

@xenodium
Owner

xenodium commented Nov 6, 2024

No worries! 👍

@xenodium
Owner

xenodium commented Nov 9, 2024

Merged to main (may as well). While it doesn't yet have the :session feature (or local session tracking), it does offer to create all the relevant id's if any is missing. This behaviour is triggered when :file, :file-id, :assistant-id, or :thread-id are used.

At a bare minimum, :file is needed and all the rest is created for you. When they are created, they are automatically copied to the kill ring so you can save them for future usage.

Will have to do for the time being. I have to switch gears and dedicate available effort to making non-OpenAI models work in chatgpt-shell. That's a biggie.
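
The fallback behaviour described in this comment can be pictured as a small planning step over the block's header arguments. The Python sketch below is an assumption about the shape of that logic, not a transcription of the merged Emacs Lisp: the function name `plan_missing_ids`, the step names, and their order are all made up for illustration.

```python
def plan_missing_ids(params):
    """Return the creation steps needed for a block's header arguments.

    PARAMS is a dict that may contain "file", "file-id", "assistant-id"
    and "thread-id". Step names are illustrative only.
    """
    steps = []
    if not params.get("file-id"):
        # Without an existing file ID, a local :file is the bare minimum.
        if not params.get("file"):
            raise ValueError(":file is required when :file-id is missing")
        steps.append("upload-file")
    if not params.get("assistant-id"):
        steps.append("create-assistant")
    if not params.get("thread-id"):
        steps.append("create-thread")
    return steps
```

With only `{"file": "path/to/some/file.pdf"}` every step is planned; with all three IDs supplied, nothing needs to be created.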

@xenodium xenodium changed the title from "[Feature Request] Allow Interaction with PDF Files" to "[Feature request] Allow Interaction with PDF Files" Nov 26, 2024
@WeissP
Author

WeissP commented Jan 18, 2025

Just found a bug in the function ob-chatgpt-shell--upload-file. Running the shell-maker-make-http-request call within it produces errors like this:

((:success) (:output . "{
  \"error\": {
    \"message\": \"Invalid Content-Type header (application/json; charset=utf-8; boundary=------------------------rs7zligAOm2J2yNKLipuNp), expected multipart/form-data. (HINT: If you're using curl, you can pass -H 'Content-Type: multipart/form-data')\",
    \"type\": \"invalid_request_error\",
    \"param\": null,
    \"code\": null
  }
}
"))

The issue seems to be due to the header function chatgpt-shell-openai--make-headers specifying Content-Type: application/json, which the files endpoint rejects because it expects multipart/form-data for uploads. When I upload PDF files like this:

(let ((path "...")
      (purpose "assistants"))
  (shell-maker-make-http-request
   :async nil
   :url "https://api.openai.com/v1/files"
   :headers (list (format "Authorization: Bearer %s" (chatgpt-shell-openai-key)))
   :fields `(,(format "purpose=%s" purpose)
             ,(format "file=@%s" path))
   :filter (lambda (raw-response)
             (if-let* ((parsed (shell-maker--json-parse-string raw-response))
                       (response (or (let-alist parsed
                                       .error.message)
                                     (let-alist parsed
                                       .id))))
                 response
               (error "Couldn't parse %s" raw-response)))))

no errors occur.
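
To make the error above easier to read, here is what a well-formed upload request looks like at the HTTP level. This is a standard-library Python sketch, not code from chatgpt-shell or shell-maker; `build_multipart` is a hypothetical helper, and no request is actually sent. The point is that the Content-Type must be multipart/form-data and must carry the same boundary that delimits the parts of the body. The rejected header in the error looks like an explicit application/json value with a boundary appended, which is consistent with the diagnosis above.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Build (headers, body) for a multipart/form-data upload.

    FIELDS is a dict of plain text fields (for example {"purpose": "assistants"}).
    The file is sent as FILE_FIELD with the given FILENAME and raw bytes.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(head.encode() + str(value).encode() + b"\r\n")
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_head.encode() + file_bytes + b"\r\n")
    # The closing delimiter has two extra dashes after the boundary.
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, b"".join(parts)
```

For example, `build_multipart({"purpose": "assistants"}, "file", "demo.pdf", data)` yields a header of the form `multipart/form-data; boundary=...`, which is the shape the error message says the endpoint expects.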
