Download links and installation instructions can be found here.
The Tour shows you how to get started. It explains how to install HTSeq, and then demonstrates typical analysis steps with explicit examples. Read this first, and then see the Reference for details.
Tutorial: Transcription start sites (TSS)
This chapter illustrates typical usage patterns for HTSeq by explaining in detail three different solutions to the same programming task.
This chapter explores in detail the use case of counting the overlap of reads with annotation features, and explains how to implement custom logic by writing your own counting scripts.
The various classes of HTSeq are described here.
A brief overview of all classes.
Sequences and FASTA/FASTQ files
To represent sequences and reads (i.e., sequences with base-call quality information), the classes Sequence and SequenceWithQualities are used. The classes FastaReader and FastqReader parse FASTA and FASTQ files, respectively.
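In HTSeq itself, one typically iterates over `HTSeq.FastqReader(filename)` to obtain `SequenceWithQualities` objects. The following is not HTSeq's implementation; it is a minimal, stdlib-only sketch of what parsing a FASTQ record involves, assuming unwrapped four-line records and Phred+33 quality encoding. The `Read` type and the `parse_fastq` function are hypothetical names chosen for illustration.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    """Hypothetical stand-in for a read with base-call qualities."""

    name: str
    seq: str
    qual: List[int]  # Phred quality scores, one per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[Read]:
    """Yield reads from an unwrapped, four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual_text = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual_text):
            raise ValueError("sequence and quality lengths differ")
        # Decode each ASCII quality character into a Phred score.
        yield Read(header[1:].strip(), seq, [ord(c) - offset for c in qual_text])


# Usage: parse an in-memory FASTQ record.
reads = list(parse_fastq(io.StringIO("@r1\nACGT\n+\nII#5\n")))
print(reads[0].qual)  # [40, 40, 2, 20]
```

The `offset` parameter is exposed because older Illumina data used Phred+64; the default of 33 matches current instruments.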
Positions, intervals and arrays
The classes GenomicInterval and GenomicPosition represent intervals and positions in a genome. The class GenomicArray is an all-purpose container with easy access via a genomic interval or position, and GenomicArrayOfSets is a special case useful for dealing with genomic features (such as genes, exons, etc.).
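In the HTSeq tutorials, a GenomicArrayOfSets is typically filled with `array[feature.iv] += name` and queried by iterating over `array[iv].steps()`, where each step is a stretch of the genome over which the stored set does not change. The toy class below is not HTSeq code; it is a simplified, strand-agnostic sketch of that step idea, assuming 0-based half-open coordinates. `ToyArrayOfSets`, `add` and `steps` here are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


class ToyArrayOfSets:
    """Hypothetical, simplified stand-in for a genomic array of sets.

    Coordinates are 0-based and half-open, i.e. [start, end).
    Strand is ignored to keep the sketch short.
    """

    def __init__(self) -> None:
        self._features: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)

    def add(self, chrom: str, start: int, end: int, name: str) -> None:
        """Record that feature `name` covers [start, end) on `chrom`."""
        self._features[chrom].append((start, end, name))

    def steps(self, chrom: str, start: int, end: int) -> Iterator[Tuple[int, int, Set[str]]]:
        """Yield (step_start, step_end, features), splitting [start, end)
        at every feature boundary that falls strictly inside it."""
        feats = self._features.get(chrom, [])
        cuts = {start, end}
        for s, e, _ in feats:
            for boundary in (s, e):
                if start < boundary < end:
                    cuts.add(boundary)
        ordered = sorted(cuts)
        for a, b in zip(ordered, ordered[1:]):
            # A feature covers the whole step iff it overlaps it at all,
            # because no feature boundary lies strictly inside [a, b).
            names = {n for s, e, n in feats if s < b and a < e}
            yield a, b, names
```

A linear scan over features keeps the sketch readable; a real implementation needs an indexed structure to stay fast on genome-wide annotations.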
The classes described here are used to process the output of short-read aligners in various formats (e.g., SAM): they represent output files and alignments, i.e., reads together with their alignment information.
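HTSeq's alignment classes expose the CIGAR operations of each alignment together with the reference intervals they cover. As a stdlib-only illustration (not HTSeq code), the sketch below converts a SAM `POS` value and CIGAR string into the reference blocks covered by aligned bases, which is what a counting script needs for spliced reads. The function `aligned_blocks` is a hypothetical name; the operation codes follow the SAM specification.

```python
import re
from typing import List, Tuple

# CIGAR operation codes as defined in the SAM specification.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_VALID_CIGAR = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CONSUMES_REF = set("MDN=X")  # operations that advance along the reference
_ALIGNED = set("M=X")         # operations that align read bases to the reference


def aligned_blocks(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open reference intervals covered by aligned bases.

    `pos` is the 1-based leftmost mapping position from the SAM POS column.
    """
    if not _VALID_CIGAR.fullmatch(cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    ref = pos - 1  # switch to 0-based coordinates
    blocks: List[Tuple[int, int]] = []
    for length_text, op in _CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in _ALIGNED:
            blocks.append((ref, ref + length))
        if op in _CONSUMES_REF:
            ref += length
    return blocks


# A spliced read: 10 aligned bases, a 100-base intron (N), then 20 aligned bases.
print(aligned_blocks(101, "10M100N20M"))  # [(100, 110), (210, 230)]
```

Deletions (`D`) and skipped regions (`N`) advance the reference position without producing a block, so the intron is excluded from the read's footprint.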
The classes GenomicFeature and GFF_Reader help to deal with genomic annotation data.
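HTSeq's `GFF_Reader` turns each line of a GFF/GTF file into a `GenomicFeature`, and HTSeq stores intervals as 0-based and half-open. The sketch below is not HTSeq's parser; it is a minimal stdlib illustration of the two main steps, splitting the nine tab-separated columns and converting GTF's 1-based inclusive coordinates, assuming GTF-style `key "value";` attributes. `Feature` and `parse_gtf_line` are hypothetical names.

```python
import re
from typing import Dict, NamedTuple


class Feature(NamedTuple):
    """Hypothetical, simplified record for one GTF line."""

    chrom: str
    source: str
    feature_type: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    strand: str
    attrs: Dict[str, str]


# Matches GTF attributes of the form: key "value";
_ATTR_RE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;?')


def parse_gtf_line(line: str) -> Feature:
    """Parse one tab-separated GTF line (GFF3 attribute syntax is not handled)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns")
    chrom, source, ftype, start, end, _score, strand, _frame, attr_text = fields
    attrs = dict(_ATTR_RE.findall(attr_text))
    # GTF coordinates are 1-based and inclusive; convert to 0-based half-open.
    return Feature(chrom, source, ftype, int(start) - 1, int(end), strand, attrs)


line = 'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972"; transcript_id "ENST00000456328";'
f = parse_gtf_line(line)
print(f.start, f.end, f.attrs["gene_id"])  # 11868 12227 ENSG00000223972
```

Only the start shifts by one: a 1-based inclusive end equals the 0-based exclusive end, so the feature length `end - start` is unchanged.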
This page describes classes to parse VCF, Wiggle and BED files.
The following scripts can be used without any Python knowledge.
Quality Assessment with htseq-qa
Given a FASTQ or SAM file, this script produces a PDF file with plots depicting the base calls and base-call qualities by position in the read. This is useful to assess the technical quality of a sequencing run.
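To make "base-call qualities by position" concrete, the sketch below computes the mean Phred score at each read position across many reads. This is only one summary such a plot could show, and it is not htseq-qa's implementation; it assumes Phred+33 quality strings as input, and `mean_quality_by_position` is a hypothetical name.

```python
from typing import Iterable, List


def mean_quality_by_position(quality_strings: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred quality at each read position across reads of any length.

    Shorter reads simply contribute nothing to positions beyond their end.
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for i, char in enumerate(qual):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


print(mean_quality_by_position(["II", "5"]))  # [30.0, 40.0]
```

A typical run shows the mean dropping toward the 3' end of the reads, which is the kind of trend a per-position quality plot makes visible.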
htseq-count: counting reads within features
Given one or more SAM/BAM/CRAM files with alignments and a GTF file with genomic features, this script counts how many reads map to each feature. This script is especially popular for bulk and single-cell RNA-Seq analysis.
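htseq-count's documentation describes three overlap-resolution modes, `union`, `intersection-strict` and `intersection-nonempty`, defined on the sets S(i) of features overlapping each aligned position i of a read; reads that end up with no feature or several features are reported under `__no_feature` and `__ambiguous`. The sketch below implements those set rules in plain Python. It is not the script's code: it assumes each read has already been reduced to its list of per-position feature sets, and it omits strandedness, multi-mapping reads and other options. `resolve` and `count_reads` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Set


def resolve(step_sets: List[Set[str]], mode: str = "union") -> Set[str]:
    """Combine the feature sets S(i) met along one read into a single set."""
    if mode == "union":
        return set().union(*step_sets)
    if mode == "intersection-strict":
        # Every position must carry the feature, so one empty set empties the result.
        return set.intersection(*step_sets) if step_sets else set()
    if mode == "intersection-nonempty":
        # Positions without any feature are ignored before intersecting.
        nonempty = [s for s in step_sets if s]
        return set.intersection(*nonempty) if nonempty else set()
    raise ValueError(f"unknown mode: {mode}")


def count_reads(reads: Iterable[List[Set[str]]], mode: str = "union") -> Counter:
    """Tally reads per feature, plus special counters for unassigned reads."""
    counts: Counter = Counter()
    for step_sets in reads:
        features = resolve(step_sets, mode)
        if not features:
            counts["__no_feature"] += 1
        elif len(features) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[next(iter(features))] += 1
    return counts
```

For example, a read whose positions see `{"A"}` and then `{"B"}` is `__ambiguous` under `union` but `__no_feature` under both intersection modes, because the intersection of the two sets is empty.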
htseq-count-barcodes: counting reads with cell barcodes and UMIs
Similar to htseq-count, but for a single SAM/BAM/CRAM file containing reads with cell and molecular barcodes (e.g., 10X Genomics cellranger output). This script enables customization of single-cell RNA-Seq pipelines, e.g., to quantify exon-level expression or simply to obtain a count matrix that contains chromosome information or additional feature metadata.
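A common way to turn barcoded reads into a count matrix is to count each (cell barcode, UMI, feature) combination only once, so that PCR copies of the same molecule are not counted repeatedly. The sketch below shows that idea in plain Python. It is an assumption about the counting rule, not a description of htseq-count-barcodes' internals, and it assumes the tuples were already extracted upstream (for instance from 10X-style `CB`/`UB` BAM tags plus a feature assignment). `umi_count_matrix` is a hypothetical name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple


def umi_count_matrix(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, int]]:
    """Build a feature -> {cell: count} matrix from (cell_barcode, umi, feature) tuples.

    Reads sharing the same cell barcode, UMI and feature are counted once.
    """
    seen: Set[Tuple[str, str, str]] = set()
    matrix: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cell, umi, feature in records:
        key = (cell, umi, feature)
        if key in seen:
            continue  # treated as a copy of a molecule already counted
        seen.add(key)
        matrix[feature][cell] += 1
    # Convert the nested defaultdicts to plain dicts for a cleaner result.
    return {feature: dict(cells) for feature, cells in matrix.items()}
```

Keeping the matrix sparse (only cells with nonzero counts per feature) matters for single-cell data, where most feature/cell combinations are zero.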