NOTE: The Basic Data Skills Introduction to the command-line interface workshop is a prerequisite.
- Please study the contents and work through all the exercises within the following lessons:
Time | Topic | Instructor |
---|---|---|
09:30 - 09:45 | Workshop Introduction | Meeta |
09:45 - 11:00 | Understanding chromatin biology using high-throughput sequencing | Dr. Shannan Ho Sui |
11:00 - 11:05 | Break | |
11:05 - 11:20 | HPC review Q&A | Will |
11:20 - 11:50 | Dataset overview and project organization | Will |
11:50 - 12:00 | Overview of self-learning materials and homework submission | Meeta |
I. Please study the contents and work through all the code within the following lessons:
- Experimental design considerations for HTS of chromatin
Before you begin thinking about performing the experiment, it is important to plan for it and choose a protocol that is best suited for you. There are many things to consider depending on the cells you are working with, and your protein of interest.
In this lesson, we will:
- Highlight the experimental design considerations for ChIP-seq and compare and contrast with CUT&RUN and ATAC-seq
- Highlight the sequencing considerations for each of the methods listed above
- Quality Control of Sequence Data: Running FASTQC and evaluating results
The first step of most NGS analyses is to evaluate the quality of your sequencing reads.
In this lesson you will explore:
- The FASTQC software, and how to run it on your raw sequencing data
- The HTML report that is returned from FASTQC and how to interpret the different plots
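As a rough illustration of what running FASTQC on raw data can look like, here is a minimal sketch. The helper name `fastqc_cmd` and all file paths are hypothetical and not taken from the lesson; the function only prints the command so it can be reviewed before anything is executed. The `-o` (output directory) and `-t` (threads) options are standard FastQC options.

```shell
# Build the FastQC command line for one or more FASTQ files.
# Usage: fastqc_cmd OUTDIR THREADS FILE...
# (fastqc_cmd is a hypothetical helper, not part of FastQC itself.)
fastqc_cmd() {
    fq_outdir=$1
    fq_threads=$2
    shift 2
    # -o sets the output directory; -t sets how many files are processed at once
    echo "fastqc -o $fq_outdir -t $fq_threads $*"
}

# Hypothetical paths: print the command to review it before running it
fastqc_cmd results/fastqc 2 raw_data/sample1.fastq.gz raw_data/sample2.fastq.gz
```

To actually run the printed command, FastQC needs to be available in your environment, and the output directory should be created first (for example with `mkdir -p results/fastqc`).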
-
The next step is taking our high-quality reads and figuring out where in the genome they originated. In theory this seems like a simple task, but in practice it is quite challenging.
In this lesson you will cover:
- The Bowtie2 software, a popular tool for aligning DNA sequence reads
- Alignment file formats
- How to run your alignment as a job on the cluster
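To make the idea of running an alignment as a cluster job more concrete, the sketch below writes a small SLURM submission script for a single Bowtie2 run. It is an illustration under stated assumptions, not the lesson's own script: the helper name `write_bowtie2_job`, the partition name `short`, the resource requests, and every path are hypothetical, and the `--local` alignment mode is just one possible choice.

```shell
# Write a minimal SLURM submission script for one Bowtie2 alignment.
# Usage: write_bowtie2_job INDEX_PREFIX FASTQ OUT_SAM JOB_SCRIPT
# (hypothetical helper; resource values below are placeholders)
write_bowtie2_job() {
    cat > "$4" <<EOF
#!/bin/bash
#SBATCH -p short            # partition name is an assumption; use one your cluster provides
#SBATCH -t 0-2:00           # time limit (D-HH:MM)
#SBATCH -c 2                # CPU cores, matched to bowtie2 -p below
#SBATCH --mem 8G            # memory request
#SBATCH -o %j.out           # standard output file (%j is the job ID)
#SBATCH -e %j.err           # standard error file

# Load Bowtie2 here if your cluster uses environment modules

# -p threads, -q FASTQ input, -x index prefix, -U unpaired reads, -S SAM output
bowtie2 -p 2 -q --local -x $1 -U $2 -S $3
EOF
}

# Hypothetical paths: generate the job script and print it for review
write_bowtie2_job reference/genome_index raw_data/sample1.fastq results/sample1.sam bowtie2_job.sbatch
cat bowtie2_job.sbatch
```

Once the script looks right, it would typically be submitted from a login or compute node with `sbatch bowtie2_job.sbatch`.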
NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word `compute` in it). Need a refresher on the cluster? Check out this lesson from the pre-reading assignment.
1. Log in using `ssh [email protected]` and enter your password (replace the "XX" in the username with the number you were assigned in class). Your login information can be found here.
2. Once you are on the login node, use `srun --pty -p interactive -t 0-2:30 --mem 1G /bin/bash` to get on a compute node, or use the command specified in the lesson.
3. Proceed only once your command prompt has the word `compute` in it.
4. If you log out between lessons (using the `exit` command twice), please follow points 1. and 2. above to log back in and get on a compute node when you restart with the self-learning.
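The "is my prompt showing `compute`?" check can also be scripted. The sketch below assumes that compute-node hostnames contain the word `compute`, which is inferred from the prompt check described above rather than from any cluster documentation; the function name `is_compute_node` is hypothetical.

```shell
# Return success if the given hostname looks like a compute node.
# Assumption: compute-node hostnames contain the word "compute".
is_compute_node() {
    case "$1" in
        *compute*) return 0 ;;
        *) return 1 ;;
    esac
}

# uname -n prints the name of the machine you are currently on
if is_compute_node "$(uname -n)"; then
    echo "On a compute node: OK to run the lesson code."
else
    echo "Not on a compute node: request an interactive session first."
fi
```

Running this right after logging in should report that you are not yet on a compute node; running it again after the `srun` command should report that you are.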
- Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Forms the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
Time | Topic | Instructor |
---|---|---|
9:30 - 10:15 | Self-learning lessons review | All |
10:15 - 11:00 | Filtering BAM files | Will |
11:00 - 11:05 | Break | |
11:05 - 12:00 | Peak calling | Meeta |
I. Please study the contents and work through all the code within the following lessons:
- Handling peak files using `bedtools`
Now that we have called peaks for each of our samples, it's time to look at the output. The output of MACS2 includes various files, with the narrowPeak file being the most important for interpretation.
In this lesson you will cover:
- The basics of the BED file format (and how it extends to narrowPeak files)
- The bedtools suite of tools
- Filtering and intersecting BED files
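To illustrate how a narrowPeak file extends BED, the sketch below filters peaks on one of the extra columns using plain `awk` (not bedtools). narrowPeak is a BED6+4 format: the first three columns are chromosome, start, and end, and column 9 holds the -log10 q-value. The function name `filter_peaks`, the threshold, and the two example peaks are made up for illustration.

```shell
# Keep narrowPeak records whose -log10(q-value) (column 9) is at least MIN_Q,
# and print them as plain 3-column BED (chrom, start, end).
# Usage: filter_peaks MIN_Q < peaks.narrowPeak
filter_peaks() {
    awk -v min_q="$1" 'BEGIN { FS = OFS = "\t" } $9 >= min_q { print $1, $2, $3 }'
}

# Two made-up peaks for illustration only; the second one fails the threshold
{
    printf 'chr1\t100\t250\tpeak_1\t50\t.\t4.2\t6.1\t3.5\t70\n'
    printf 'chr1\t900\t980\tpeak_2\t12\t.\t1.8\t2.0\t0.9\t40\n'
} | filter_peaks 2
```

With a threshold of 2, only the first peak (`chr1 100 250`) is printed. The resulting 3-column BED output is the kind of input that interval tools such as `bedtools intersect` operate on.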
- File formats for peak visualization
ChIP-seq data is best evaluated by visualizing peaks. However, in order to do so we require the appropriate file formats.
In this lesson you will:
- Learn about different file formats for peak visualization
- Create bigWig files
- Discuss normalization metrics and considerations when choosing a method
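As one worked example of a normalization metric, counts per million (CPM) rescales each bin's raw read count by the total number of mapped reads: CPM = count × 1,000,000 / total mapped reads. The sketch below applies that arithmetic to a bedGraph-style table with `awk`; the function name `to_cpm`, the bin coordinates, and the read totals are hypothetical, and CPM is only one of several possible methods.

```shell
# Scale raw per-bin read counts (column 4 of a bedGraph) to counts per million (CPM).
# Usage: to_cpm TOTAL_MAPPED_READS < counts.bedgraph
to_cpm() {
    awk -v total="$1" 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $4 * 1000000 / total }'
}

# Two made-up 20 bp bins from a sample with 4,000,000 mapped reads
{
    printf 'chr1\t0\t20\t8\n'
    printf 'chr1\t20\t40\t2\n'
} | to_cpm 4000000
```

Here the bins with 8 and 2 reads become 2 and 0.5 CPM, so samples sequenced to different depths can be compared on the same scale. In practice a tool such as deepTools `bamCoverage` (with its `--normalizeUsing` option) performs this kind of scaling while creating the bigWig file.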
- Qualitative assessment of peak enrichment using deepTools
An exciting component of ChIP-seq analysis is being able to visualize your results and gain some biologically meaningful insight. This may in turn generate hypotheses for you to explore further with your data!
In this lesson you will learn:
- How to use deepTools to create heatmaps and profile plots
- To ask questions about your data and find answers through visualization
- Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Forms the day before the next class.
- If you get stuck due to an error while running code in the lesson, email us.
Time | Topic | Instructor |
---|---|---|
9:30 - 10:00 | Self-learning lessons review | All |
10:00 - 10:30 | Troubleshooting your ChIP-seq analysis | Meeta |
10:30 - 10:35 | Break | |
10:35 - 11:45 | Automating the ChIP-seq workflow | Will |
11:45 - 12:00 | Wrap-up | Meeta |
- Day 1 exercises
- Day 2 exercises
- Day 3 In-class
- ENCODE Data Standards and Processing Pipeline Information for Histone and Transcription Factors
- ENCODE guidelines and practices for ChIP-seq. An older paper, but a good outline of general best practices.
- Experimental design considerations:
- Thermofisher Step-by-step guide to a successful ChIP experiment
- "Chromatin Immunoprecipitation (ChIP) Principles and How to Obtain Quality Results", BenchSci Blog
- O’Geen et al. (2011), Methods Mol Biol - A focus on performing ChIP assays to characterize histone modifications
- Jung et al. (2014), NAR - Impact of sequencing depth in ChIP-seq experiments
- Integration of ChIP-seq and RNA-seq
- Advanced bash commands (aliases, copying files, and symlinks)
- Introduction to R workshop materials
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.