Biojava fails to parse Genbank and EMBL format #843
Comments
Thank you, @innovate-invent. I've also created an issue for biojava-legacy at biojava/biojava-legacy#50
Hi, I am new to EMBL format parsing.
Hi @josemduarte,
Thanks @MaxGreil . The issue is quite well explained above. BioJava should be able to parse wrapped records as one record:
Best would be to have that in a unit test and develop a fix based on the unit test. There are more details on parse recommendations in the email pasted above:
BioJava fails to parse `anticodon` and `transl_except` feature qualifiers when they line wrap. BioJava expects the values to be quoted, which is invalid.
This causes applications like Mauve and Colombo/SigiHMM to emit
and discard large portions of the dataset.
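For illustration, here is a minimal sketch of how wrapped qualifier lines could be joined back into single qualifiers before their values are interpreted. It is not BioJava's parser and not a proposed patch: the class `QualifierUnwrap` and its methods are hypothetical names, and the sketch assumes the caller has already stripped the line prefix (for example `FT` in EMBL) and the leading blanks from each qualifier line. It treats a line as the start of a new qualifier only if it begins with `/` while no quoted value is still open, joins unquoted values such as `anticodon` without a separator, and joins wrapped quoted text with a single space.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of BioJava. */
public class QualifierUnwrap {

    /**
     * Joins wrapped qualifier lines into one string per qualifier.
     * Assumes each input line is the qualifier column only, with the
     * record prefix and leading whitespace already removed.
     */
    static List<String> unwrap(List<String> lines) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null;
        boolean inQuotes = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            // A leading '/' starts a new qualifier unless we are still
            // inside an unterminated quoted value.
            boolean startsNew = line.startsWith("/") && !inQuotes;
            if (startsNew || current == null) {
                if (current != null) {
                    out.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                // Quoted free text is rejoined with a space; unquoted
                // structured values are rejoined without one.
                if (inQuotes) {
                    current.append(' ');
                }
                current.append(line);
            }
            inQuotes = hasOpenQuote(current);
        }
        if (current != null) {
            out.add(current.toString());
        }
        return out;
    }

    /** True if the text contains an odd number of double quotes. */
    static boolean hasOpenQuote(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        return count % 2 == 1;
    }
}
```

Calling `QualifierUnwrap.unwrap` on a made-up pair of lines such as `/anticodon=(pos:complement(1..3),` followed by `aa:Leu)` yields one qualifier string, whether or not the value is quoted. Counting quote parity also copes with doubled quotes inside a quoted value, since each doubled pair leaves the parity unchanged.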
This line is from 15584_genome.embl from Biopython:
The matching line from 15584_genome.embl from BioPerl:
The difference between Biopython and BioPerl is that BioPerl quotes anticodons if they wrap.
Copying NIH statement here for reference. Some messages were removed/edited for brevity.
bioperl/bioperl-live#321