No isoform-level quants from Star_Salmon Output. #1487

Open
kimberzhu opened this issue Jan 16, 2025 · 1 comment

Comments

@kimberzhu

Hi All,
I am new to working with Nextflow's rnaseq pipeline, and am currently using previously produced outputs from this pipeline. The outputs produced from STAR_salmon include quantification matrices at the gene and transcript levels. However, when looking at my transcript outputs, there aren't any quantifications at the isoform level (e.g., I see ENST00000420443, but not ENST00000420443.1, ENST00000420443.2, etc.).

The original references (./Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz and ./homo_sapiens/Homo_sapiens.GRCh38.111.gtf.gz) look correct for what this pipeline requires, but unfortunately it looks like the reference transcriptome produced by the pipeline wasn't saved. I also don't have any of the quant.sf files, but I do have access to the downstream BAMs (*.markdup.sorted.bam) from STAR. These also appear to lack this isoform-level data.

Could anyone help me figure out why this may be, or whether it is possible to re-process these BAM files in an alternative way to extract this isoform info?

@davidecarlson

Hi @kimberzhu,

Different transcript isoforms from the same gene have different Ensembl IDs. For example, Ensembl gene ENSG00000139618 has transcript isoforms ENST00000380152.8, ENST00000530893.7, etc. (see here for the full list).

The ".8" and ".7" suffixes in the transcript IDs are version numbers: they track revisions of that one transcript's annotation and do not denote separate isoforms. Two isoforms differ in the ENST number itself, not in the suffix.
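If it helps to see the distinction concretely, here is a minimal Python sketch that separates the stable ID from the version suffix. The function name `split_ensembl_id` is made up for illustration and is not part of the pipeline or any Ensembl tool.

```python
from typing import Optional, Tuple


def split_ensembl_id(identifier: str) -> Tuple[str, Optional[int]]:
    """Split an Ensembl ID into (stable_id, version).

    The version is None when the ID carries no numeric ".N" suffix.
    Hypothetical helper, written only to illustrate the ID layout.
    """
    stable_id, sep, version = identifier.partition(".")
    if sep and version.isdigit():
        return stable_id, int(version)
    return stable_id, None


def same_transcript(id_a: str, id_b: str) -> bool:
    """Two IDs name the same transcript when their stable IDs match,
    regardless of the version suffix."""
    return split_ensembl_id(id_a)[0] == split_ensembl_id(id_b)[0]
```

With this, `same_transcript("ENST00000380152.8", "ENST00000380152")` is true (same transcript, version dropped), while `same_transcript("ENST00000380152.8", "ENST00000530893.7")` is false (two different isoforms of the same gene).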

So unless I'm misunderstanding your question (always possible!), you do have transcript isoform quantification data in your results. Also, you should have quant.sf files for each of your samples. They will be in individual subdirectories corresponding to your sample names within the star_salmon/ output directory.
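As a rough sketch of how you might check for those files, the Python below looks for `<sample>/quant.sf` under a star_salmon/ directory and reads the per-transcript TPM values. The function names are hypothetical, the directory path is whatever your run used, and the only assumption about the file is Salmon's usual tab-separated header with `Name` and `TPM` columns.

```python
import csv
from pathlib import Path
from typing import Dict, Union


def find_quant_files(star_salmon_dir: Union[str, Path]) -> Dict[str, Path]:
    """Map each sample name to its quant.sf path.

    Assumes the layout <star_salmon_dir>/<sample>/quant.sf described above.
    Returns an empty dict if no such files exist.
    """
    root = Path(star_salmon_dir)
    return {p.parent.name: p for p in sorted(root.glob("*/quant.sf"))}


def read_tpm(quant_path: Union[str, Path]) -> Dict[str, float]:
    """Read one quant.sf and return {transcript_id: TPM}.

    Assumes a tab-separated file with 'Name' and 'TPM' header columns.
    """
    with open(quant_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}
```

Calling `find_quant_files` on your star_salmon/ directory and getting an empty result would suggest the per-sample subdirectories were not kept. Otherwise each key of the `read_tpm` result is one transcript isoform.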

Best,
Dave
