Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tag::read_from_path(): value: Parsing: uneven number of IPLS strings #147

Closed
randoragon opened this issue Dec 1, 2024 · 18 comments
Closed

Comments

@randoragon
Copy link

Recently, id3 completely stopped being able to parse any of my files. Strangely enough, I have not run cargo update in a long time, so I don't understand how I could have even introduced a regression, but nevertheless, here's a reproducible example.

use id3::Tag;

fn main() {
    let tag = Tag::read_from_path("sample.mp3").unwrap();
    println!("{:#?}", tag);
}

sample.mp3.tar.gz
I attached sample.mp3 as a gzipped tar archive, because GitHub does not seem to support attaching MP3s directly (but MP4s are totally fine, huh .-.). I tested many of my over 10,000 local mp3 files and all of them exhibit the same error. It did not use to happen before, and other tools like mid3v2 and eyeD3 have no trouble reading the same files.

Executing the above program with sample.mp3 in the current directory produces the following, with RUST_BACKTRACE=full:

$ RUST_BACKTRACE=full cargo r
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/rsid3`
thread 'main' panicked at src/main.rs:4:49:
called `Result::unwrap()` on an `Err` value: Parsing: uneven number of IPLS strings
stack backtrace:
   0:     0x5555555f2ed6 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h9754a6437f1de22a
   1:     0x5555556326cb - core::fmt::write::hb706a393bb60a06f
   2:     0x5555555f7399 - std::io::Write::write_fmt::h9b447dc5d824d0bd
   3:     0x55555561dad6 - std::panicking::default_hook::{{closure}}::h227952daede9dd84
   4:     0x55555561d766 - std::panicking::default_hook::h31626ee1feb8ee2a
   5:     0x55555561dfa1 - std::panicking::rust_panic_with_hook::h76d2aa694a00748e
   6:     0x5555555f3427 - std::panicking::begin_panic_handler::{{closure}}::h8855a344ffa1638b
   7:     0x5555555f30e9 - std::sys::backtrace::__rust_end_short_backtrace::h3110d0cbfbf26886
   8:     0x55555561dc14 - rust_begin_unwind
   9:     0x555555563833 - core::panicking::panic_fmt::h3af706d0346c1c60
  10:     0x555555563246 - core::result::unwrap_failed::h2ba69a02b8a8418b
  11:     0x55555556b843 - core::result::Result<T,E>::unwrap::h2d43590ad2b0d9b1
                               at /build/rustc-1.82.0-src/library/core/src/result.rs:1102:23
  12:     0x55555556b843 - rsid3::main::hd10313d690953dd6
                               at /home/pcache/Software/id3/src/main.rs:4:15
  13:     0x55555558f2db - core::ops::function::FnOnce::call_once::hb3f7b392556f4b59
                               at /build/rustc-1.82.0-src/library/core/src/ops/function.rs:250:5
  14:     0x5555555664de - std::sys::backtrace::__rust_begin_short_backtrace::h1656c15ebe1192d9
                               at /build/rustc-1.82.0-src/library/std/src/sys/backtrace.rs:154:18
  15:     0x555555566491 - std::rt::lang_start::{{closure}}::h59c225471d41be13
                               at /build/rustc-1.82.0-src/library/std/src/rt.rs:164:18
  16:     0x55555561f985 - std::rt::lang_start_internal::h2084fce485147f46
  17:     0x55555556646a - std::rt::lang_start::h1684807171f34ae4
                               at /build/rustc-1.82.0-src/library/std/src/rt.rs:163:17
  18:     0x55555556bafe - main
  19:     0x7ffff7dca27e - __libc_start_call_main
  20:     0x7ffff7dca339 - __libc_start_main_alias_2
  21:     0x555555563d45 - _start
  22:                0x0 - <unknown>

If there's something wrong with my files, I would appreciate some feedback. Although, again, other tools seem to have no trouble reading them, and neither did rust-id3 just a few weeks ago, so this seems strange.

@randoragon
Copy link
Author

I forced a specific version in Cargo.toml and found that this issue was introduced in 1.15.0, and is still present in 1.16.0. Prior versions read the tag just fine (earliest tested 1.12.0)

randoragon added a commit to randoragon/rsid3 that referenced this issue Dec 1, 2024
It seems the id3 package has introduced a regression, causing a reading
error on some mp3 files.
See polyfloyd/rust-id3#147
randoragon added a commit to randoragon/music-tools that referenced this issue Dec 1, 2024
@polyfloyd
Copy link
Owner

@Holzhaus Would you be able to look into this?

@randoragon
Copy link
Author

Also, don't mind my comment about not understanding how I could have introduced a regression. I forgot that cargo by default bumps minor versions on rebuild, and I have recently reinstalled my tools, so this makes sense. I was genuinely confused for a second :)

@Holzhaus
Copy link
Contributor

I can have a look, but according to the spec the IPLS tag is supposed to consist of involvee-involement tuples:

Since there might be a lot of people contributing to an audio file in various ways, such as musicians and technicians, the 'Text information frames' are often insufficient to list everyone involved in a project. The 'Involved people list' is a frame containing the names of those involved, and how they were involved. The body simply contains a terminated string with the involvement directly followed by a terminated string with the involvee followed by a new involvement and so on. There may only be one "IPLS" frame in each tag.

If there is an uneven number of IPLS items, then this does not hold. We coud either assume an empty last item, or maybe parse the entire IPLS as "Unknown" content in that case?

@norbertkeri
Copy link

I'm also encountering this, apparently out of the ~7000 mp3 files I have, now only 137 get parsed correctly, all the rest fail with the aforementioned error. They all parse successfully with 1.14.

@Holzhaus
Copy link
Contributor

Can you look into failing files? Could you post the tag contents for that frame here?

Maybe there is some quirk with popular tagging software that writes the tag slightly incorrectly? Then we could add a workaround instead of encoding the entire tag as unknown.

@randoragon
Copy link
Author

Can you look into failing files? Could you post the tag contents for that frame here?

In case it was overlooked, I've attached a sample failing file in the original post.

@norbertkeri
Copy link

norbertkeri commented Jan 28, 2025

I looked at some of the failing files, and it seems they don't have the IPLS tag filled out. I looked at it with MusicBrainz Picard, is there a better software to use? The files were originally tagged with a bunch of different tagging software not (just) Picard.

@Holzhaus
Copy link
Contributor

I looked at some of the failing files, and it seems they don't have the IPLS tag filled out. I looked at it with MusicBrainz Picard, is there a better software to use? The files were originally tagged with a bunch of different tagging software not (just) Picard.

IPLS is for ID3v2.3. If it's an ID3v2.4 tag, check the TMCL and TIPL frames.

@norbertkeri
Copy link

Those are also missing, or at least don't show up, but I wonder whether Picard actually supports these. If I try to add a new tag, I can't see anything resembling "involved people list" or "credits list" (which seem to correspond to TMCL and TIPL) in the list of tags that can be added.

@Holzhaus
Copy link
Contributor

Holzhaus commented Feb 8, 2025

Okay, I looked into the sample file that OP attached to the ticket:

$ cargo run --example tagdump sample.mp3 | hexdump -C | less
000000f0  00 03 52 75 73 68 4a 65  74 31 54 49 50 4c 00 00  |..RushJet1TIPL..|
00000100  00 09 00 00 03 61 72 72  61 6e 67 65 72 54 44 4f  |.....arrangerTDO|
00000110  52 00 00 00 0b 00 00 03  32 30 31 34 2d 31 32 2d  |R.......2014-12-|

As you can see, the TIPL tag contains only arranger, but not the second value. Looks like the tool used for tagging is broken and writes an invalid frame. But since there are such tools, it's not desirable to fail parsing the entire tag including the valid frames.

I can think of 3 ways to handle frames that fail to parse:

  1. We treat an invalid frame like the frame wasn't set at all. This is the worst option IMHO.
  2. We store an enum which can either be the parsed content (if valid) or the byte data (if invalid).
  3. We always only store a dummy object containing byte data and make parsing a method of that object. Then the actual frame content can be parsed lazily if the callee needs the frame contents and errors can be handled by the callee.

I have no strong preference, but both would be an API change.

@polyfloyd What do you think?

@randoragon
Copy link
Author

randoragon commented Feb 8, 2025

As you can see, the TIPL tag contains only arranger, but not the second value. Looks like the tool used for tagging is broken and writes an invalid frame. But since there are such tools, it's not desirable to fail parsing the entire tag including the valid frames.

For the record, all my files were tagged using the beets music library manager. It seems to be using the Mutagen library for doing the tagging. If you're sure this is an error on their behalf, we should open a ticket.

I haven't looked closely into how the TIPL tag works, but in my experience, ID3 can be a terribly vague standard when you zoom in on its details, so perhaps this is one of those cases of "we interpreted the spec differently", or "we know this is slightly misaligned with the spec, but historically this is how most tools have behaved and we need to conform to the majority".

@Holzhaus
Copy link
Contributor

Holzhaus commented Feb 8, 2025

For the record, all my files were tagged using the beets music library manager. It seems to be using the Mutagen library for doing the tagging. If you're sure this is an error on their behalf, we should open a ticket.

At least some time ago, beets didn't support the TIPL frame (not sure if they do now - lack of support for this was one of the reasons why I wrote my own tool). If you didn't configure beets to strip the frame, it would probably have just copied it the value set by the previous tagger. If they only copied the first value (before the null-byte), this would explain why the tag only contains a single value (instead of pairs of values, see below).

I haven't looked closely into how the TIPL tag works, but in my experience, ID3 can be a terribly vague standard when you zoom in on its details, so perhaps this is one of those cases of "we interpreted the spec differently", or "we know this is slightly misaligned with the spec, but historically this is how most tools have behaved and we need to conform to the majority".

I fully agree that the spec is vague. But in this specific case, it is pretty clear that the field should contain value pairs:

TMCL

The 'Musician credits list' is intended as a mapping between
instruments and the musician that played it. Every odd field is an
instrument and every even is an artist or a comma delimited list of
artists.

TIPL

The 'Involved people list' is very similar to the musician credits
list, but maps between functions, like producer, and names.

In the sample file, the frame contains only the string arranger when it should contain something like arranger\0Jane Doe. So it's clear the file is incompliant. It's not really a slight misalignment with the spec that we can handle gracefully.

@norbertkeri
Copy link

I used the same command you provided to check some of my files, and they look like the sample OP provided:

000000e0  58 58 58 00 00 54 49 50  4c 00 00 00 0b 00 00 03  |XXX..TIPL.......|
000000f0  61 72 72 61 6e 67 65 72  00 00 54 44 4f 52 00 00  |arranger..TDOR..|
00000100  00 0c 00 00 03 32 30 31  34 2d 30 34 2d 30 31 00  |.....2014-04-01.|
00000110  54 58 58 58 00 00 00 0d  00 00 03 53 63 72 69 70  |TXXX.......Scrip|

Also, I saw randoragon mention beets, I actually use that as well, but I completely forgot that it re-tags the files, I mostly use it to sort files into a neat directory structure. Sorry should have mentioned that. That gives more pointers that the problem might be in beets/mutagen.

@Holzhaus
Copy link
Contributor

Holzhaus commented Feb 8, 2025

Regarding graceful handling, it looks like mutagen just assumes an empty value at the end:

$ python -m venv myvenv
$ . myvenv/bin/activate
(myvenv) $ pip install mutagen
Collecting mutagen
  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)
Downloading mutagen-1.47.0-py3-none-any.whl (194 kB)
Installing collected packages: mutagen
Successfully installed mutagen-1.47.0
(myvenv) $ python -c 'import sys; import mutagen; f = mutagen.File(sys.argv[1]); print(sys.argv[1], f["TIPL"])' sample.mp3
sample.mp3 TIPL(encoding=<Encoding.UTF8: 3>, people=[['arranger', '']])

Maybe we can also consider doing this? It doesn't really make sense to have an empty name for the "arranger" role, but if stops making parsing fail this might also be a feasible option. Advantage would be that it doesn't require an API change.

Holzhaus added a commit to Holzhaus/rust-id3 that referenced this issue Feb 8, 2025
The test data from extracted from the sample file attached to polyfloyd#147.

    $ cargo run --example tagdump sample.mp3 > testdata/github-issue-147.id3
    Tag size: 42998
    Footer: no
    Writing 42998 bytes to stdout...
    Done.
@Holzhaus
Copy link
Contributor

Holzhaus commented Feb 8, 2025

Here's a patched version that mirrors how mutagen handles it: #154

polyfloyd pushed a commit that referenced this issue Feb 8, 2025
The test data from extracted from the sample file attached to #147.

    $ cargo run --example tagdump sample.mp3 > testdata/github-issue-147.id3
    Tag size: 42998
    Footer: no
    Writing 42998 bytes to stdout...
    Done.
@polyfloyd
Copy link
Owner

Merged the proposed fix. Thank you all for your involvement :)

@norbertkeri
Copy link

norbertkeri commented Feb 8, 2025

Just ran a test with the new 1.16.2 release, all my files parse again, thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants