Reference API

This page gives a brief overview of all classes and functions provided by HTSeq.

Parser and record classes

For every supported file format, HTSeq provides a parser class (its name ends in "Reader"). These classes are instantiated with a file name or an open file or stream, and they work as iterator generators: a parser object can be used, e.g., in a for loop to yield a sequence of objects, each describing one record. The table below lists, for each format, the parser class and the record class it yields. For details, see the linked documentation.
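To illustrate the reader pattern described above, here is a minimal, stdlib-only sketch of a FASTA parser written as a generator. It is not HTSeq's FastaReader; the names FastaRecord and read_fasta are hypothetical. It accepts any iterable of text lines (an open file, a stream or a plain list) and yields one record per entry, lazily.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class FastaRecord(NamedTuple):
    """Hypothetical record type: one FASTA entry."""
    name: str
    seq: str


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per '>' header line.

    Accepts any iterable of text lines, so an open file, a stream or a
    plain list all work. Lines before the first header are ignored.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield FastaRecord(name, "".join(chunks))
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if name is not None:
        yield FastaRecord(name, "".join(chunks))


# Usage: records are produced one at a time inside the for loop.
for rec in read_fasta([">chr1\n", "ACGT\n", "AC\n", ">chr2\n", "GG\n"]):
    print(rec.name, len(rec.seq))
```

With HTSeq itself the loop has the same shape, e.g. `for seq in HTSeq.FastaReader("genome.fa"): print(seq.name)`.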

For most formats, HTSeq also provides a way to write files in that format. See the detailed documentation, as these methods and classes have varying semantics.
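As an illustration of what writing a record involves, the sketch below formats one FASTQ record, encoding Phred quality scores as Phred+33 ("Sanger") ASCII characters. The helper write_fastq_record is hypothetical and is not HTSeq's SequenceWithQualities.write_to_fastq_file(); consult that method's documentation for its actual arguments and encoding options.

```python
import io
from typing import List, TextIO


def write_fastq_record(out: TextIO, name: str, seq: str, quals: List[int]) -> None:
    """Write one FASTQ record (hypothetical helper, not HTSeq's method).

    Phred quality scores are encoded as Phred+33 ("Sanger") ASCII characters.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Scores above 93 would map past '~' and leave the printable ASCII range.
    if any(not 0 <= q <= 93 for q in quals):
        raise ValueError("Phred scores must lie in 0..93 to stay printable")
    qual_str = "".join(chr(q + 33) for q in quals)
    out.write(f"@{name}\n{seq}\n+\n{qual_str}\n")


# Usage: write into an in-memory buffer instead of a real file.
buf = io.StringIO()
write_fastq_record(buf, "read1", "ACGT", [40, 30, 20, 2])
print(buf.getvalue(), end="")
```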

File format                   Typical content           Parser class        Record class yielded            Method/class for writing
----------------------------  ------------------------  ------------------  ------------------------------  -------------------------------------------
FASTA                         DNA sequences             FastaReader         Sequence                        Sequence.write_to_fasta_file()
FASTQ                         sequenced reads           FastqReader         SequenceWithQualities           SequenceWithQualities.write_to_fastq_file()
GFF (incl. GFF3 and GTF)      genomic annotation        GFF_Reader          GenomicFeature                  GenomicFeature.get_gff_line()
BED                           score data or annotation  BED_Reader          GenomicFeature
Wiggle (incl. BedGraph)       score data on a genome    WiggleReader        pair: (GenomicInterval, float)  GenomicArray.write_bedgraph_file()
SAM                           aligned reads             SAM_Reader          SAM_Alignment                   SAM_Alignment.get_sam_line()
BAM                           aligned reads             BAM_Reader          SAM_Alignment                   BAM_Writer
VCF                           variant calls             VCF_Reader          VariantCall
Bowtie (legacy format)        aligned reads             BowtieReader        BowtieAlignment
SolexaExport (legacy format)  aligned reads             SolexaExportReader  SolexaExportAlignment
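The Wiggle row above yields pairs rather than a dedicated record class. The following stdlib-only sketch shows what such pairs look like for bedGraph input, whose coordinates are zero-based and half-open like BED. The Interval type and read_bedgraph function are hypothetical stand-ins (HTSeq's WiggleReader uses GenomicInterval objects), and the sketch handles only bedGraph lines, not fixedStep or variableStep wiggle sections.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical stand-in for a genomic interval."""
    chrom: str
    start: int        # zero-based, inclusive
    end: int          # exclusive
    strand: str = "."


def read_bedgraph(lines: Iterable[str]) -> Iterator[Tuple[Interval, float]]:
    """Yield (Interval, score) pairs from bedGraph data lines."""
    for line in lines:
        # Skip blank lines, track/browser header lines and comments.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, score = line.split()[:4]
        yield Interval(chrom, int(start), int(end)), float(score)


for iv, score in read_bedgraph(["track type=bedGraph\n", "chr1\t0\t100\t2.5\n"]):
    print(iv.chrom, iv.start, iv.end, score)
```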

Most parser classes are subclasses of FileOrSequence, a class that users will rarely need to use directly.
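The role of such a base class is to let a parser accept either a file name or something that already yields lines. The sketch below illustrates that idea with a plain generator; it is not FileOrSequence's implementation. The name iter_lines is hypothetical, and treating a ".gz" suffix as gzip-compressed input is a design choice of this sketch.

```python
import gzip
import os
from typing import Iterable, Iterator, Union


def iter_lines(source: Union[str, os.PathLike, Iterable[str]]) -> Iterator[str]:
    """Yield text lines from a path or from any iterable of lines.

    Strings and path-like objects are always treated as file names; a
    ".gz" suffix selects gzip decompression (an assumption of this sketch).
    Anything else (an open file, a stream, a list) is iterated directly.
    """
    if isinstance(source, (str, os.PathLike)):
        path = os.fspath(source)
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as fh:
            yield from fh
    else:
        yield from source


# Usage: a parser built on iter_lines works the same for paths and line lists.
print(list(iter_lines(["chr1\t0\t10\n", "chr1\t10\t20\n"])))
```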

Specifying genomic positions and intervals

The class GenomicInterval specifies an interval on a chromosome (or contig or the like). It is defined by the chromosome (or contig) name, the start, the end and the strand. Convenience methods are offered for accessing this information in different ways and for testing intervals for overlap. A GenomicPosition, technically a GenomicInterval of length 1, refers to a single nucleotide or base-pair position.
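The overlap logic can be sketched with a small dataclass. This is a toy model, not HTSeq's GenomicInterval: it uses zero-based, half-open coordinates (the Python slicing convention, which HTSeq also follows), and the strand rule ("." is compatible with either strand, "+" and "-" never overlap) is an assumption of the sketch. The helper position() mirrors the idea of a position as an interval of length 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Toy genomic interval: zero-based, half-open [start, end) on one strand."""
    chrom: str
    start: int
    end: int
    strand: str = "."   # "+", "-" or "." (unstranded)

    @property
    def length(self) -> int:
        return self.end - self.start

    def overlaps(self, other: "Interval") -> bool:
        # Intervals on different chromosomes never overlap.
        if self.chrom != other.chrom:
            return False
        # Opposite strands do not overlap; "." is compatible with either strand.
        if "." not in (self.strand, other.strand) and self.strand != other.strand:
            return False
        # Half-open ranges overlap iff each one starts before the other ends.
        return self.start < other.end and other.start < self.end


def position(chrom: str, pos: int, strand: str = ".") -> Interval:
    """A single base position, modelled as an interval of length 1."""
    return Interval(chrom, pos, pos + 1, strand)


a = Interval("chr1", 100, 200, "+")
print(a.length, a.overlaps(Interval("chr1", 200, 300, "+")))
```

In HTSeq itself, the corresponding objects are created as, e.g., `HTSeq.GenomicInterval("chr1", 100, 200, "+")` and `HTSeq.GenomicPosition("chr1", 150, "+")`, and overlap is tested with the interval's overlaps() method.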

Objects of these classes are used internally wherever intervals or positions are represented, especially in record classes and as index keys to genomic arrays.

See page Positions, intervals and arrays for details.

Genomic arrays

The classes GenomicArray and GenomicArrayOfSets are container classes to store data associated with genomic positions or intervals.
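A common use of a genomic array is coverage counting: add 1 over every read's interval, then read off a piecewise-constant profile. The stdlib-only sketch below shows the idea with a simple step representation (sorted breakpoints plus one value per step). The classes StepArray and ToyGenomicArray are hypothetical, unstranded, and much simpler than HTSeq's storage; for instance, adjacent steps with equal values are not merged.

```python
import bisect
from typing import Dict, List, Tuple


class StepArray:
    """Piecewise-constant values over positions [0, length) (toy step storage)."""

    def __init__(self, length: int, default: float = 0):
        self.length = length
        self.starts: List[int] = [0]          # sorted start position of each step
        self.values: List[float] = [default]  # value held by each step

    def _split(self, pos: int) -> int:
        """Ensure a step starts at pos and return that step's index."""
        i = bisect.bisect_right(self.starts, pos) - 1
        if self.starts[i] != pos:
            self.starts.insert(i + 1, pos)
            self.values.insert(i + 1, self.values[i])
            i += 1
        return i

    def add(self, start: int, end: int, value: float) -> None:
        """Add value to every position in the half-open range [start, end)."""
        if not 0 <= start < end <= self.length:
            raise ValueError("range outside array")
        i = self._split(start)
        j = self._split(end) if end < self.length else len(self.starts)
        for k in range(i, j):
            self.values[k] += value

    def __getitem__(self, pos: int) -> float:
        return self.values[bisect.bisect_right(self.starts, pos) - 1]

    def steps(self) -> List[Tuple[int, int, float]]:
        """Return (start, end, value) for every step."""
        ends = self.starts[1:] + [self.length]
        return list(zip(self.starts, ends, self.values))


class ToyGenomicArray:
    """One StepArray per chromosome; chromosome lengths must be known up front."""

    def __init__(self, chrom_lengths: Dict[str, int]):
        self.chroms = {c: StepArray(n) for c, n in chrom_lengths.items()}

    def add(self, chrom: str, start: int, end: int, value: float = 1) -> None:
        self.chroms[chrom].add(start, end, value)


cov = ToyGenomicArray({"chr1": 100})
cov.add("chr1", 10, 30)   # first read
cov.add("chr1", 20, 40)   # second read, overlapping the first
print(cov.chroms["chr1"].steps())
```

With HTSeq, the same coverage computation is typically written as `cvg = HTSeq.GenomicArray("auto", stranded=False, typecode="i")` followed by `cvg[iv] += 1` for each interval iv; a GenomicArrayOfSets instead stores sets of values (e.g. feature names) per step.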

See page Positions, intervals and arrays for details.

Special features for SAM/BAM files

The class CigarOperation offers a convenient way to handle the information encoded in the CIGAR field of SAM files.
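To show what information a CIGAR string encodes, the sketch below splits one into operations and tracks the reference and read coordinates each operation covers, using the SAM specification's rules for which operations consume the query (M, I, S, =, X) and the reference (M, D, N, =, X). CigarOp and parse_cigar are hypothetical stand-ins, not HTSeq's CigarOperation; ref_start is assumed to be the zero-based leftmost position, i.e. SAM POS minus 1.

```python
import re
from typing import List, NamedTuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")   # per the SAM specification
CONSUMES_REF = set("MDN=X")


class CigarOp(NamedTuple):
    """Hypothetical stand-in for one CIGAR operation with its coordinates."""
    type: str
    size: int
    ref_start: int    # zero-based, half-open on the reference
    ref_end: int
    query_from: int   # zero-based, half-open on the read
    query_to: int


def parse_cigar(cigar: str, ref_start: int) -> List[CigarOp]:
    """Split a CIGAR string into operations, tracking reference and read positions."""
    ops: List[CigarOp] = []
    ref, query, pos = ref_start, 0, 0
    for m in CIGAR_RE.finditer(cigar):
        # Reject stray characters between operations.
        if m.start() != pos:
            raise ValueError(f"malformed CIGAR string: {cigar!r}")
        pos = m.end()
        size, op = int(m.group(1)), m.group(2)
        ref_end = ref + size if op in CONSUMES_REF else ref
        query_to = query + size if op in CONSUMES_QUERY else query
        ops.append(CigarOp(op, size, ref, ref_end, query, query_to))
        ref, query = ref_end, query_to
    if not ops or pos != len(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


for op in parse_cigar("3S10M2I5M3D8M", 100):
    print(op)
```

In HTSeq, the parsed operations of an alignment are available as `aln.cigar`, a list of CigarOperation objects with attributes such as type, size, ref_iv, query_from and query_to.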

The functions pair_SAM_alignments() and pair_SAM_alignments_with_buffer() help to pair up the records in a SAM file that describe a pair of alignments for mated reads from the same DNA fragment.
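The idea behind buffered pairing can be sketched in a few lines: hold each record until its mate (same read name) arrives, which is what is needed when mates are not adjacent in the file, e.g. in position-sorted input. This is a simplified illustration, not HTSeq's implementation: the Aln record and pair_by_name function are hypothetical, the buffer is unbounded, and the sketch assumes at most one alignment per mate (no multimappers).

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class Aln(NamedTuple):
    """Hypothetical minimal alignment record."""
    qname: str
    is_first_mate: bool
    pos: int


Pair = Tuple[Optional[Aln], Optional[Aln]]


def pair_by_name(records: Iterable[Aln]) -> Iterator[Pair]:
    """Pair mates by read name, buffering records until the mate arrives."""
    pending: Dict[str, Aln] = {}
    for rec in records:
        mate = pending.pop(rec.qname, None)
        if mate is None:
            pending[rec.qname] = rec   # wait for the mate
        elif rec.is_first_mate:
            yield rec, mate
        else:
            yield mate, rec
    # Records whose mate never appeared are reported with None in its slot.
    for rec in pending.values():
        yield (rec, None) if rec.is_first_mate else (None, rec)


recs = [Aln("a", True, 100), Aln("b", True, 300), Aln("a", False, 250), Aln("c", False, 500)]
for first, second in pair_by_name(recs):
    print(first, second)
```

With HTSeq, pairing is typically done by wrapping a reader, e.g. `for first, second in HTSeq.pair_SAM_alignments(HTSeq.BAM_Reader("reads.bam")): ...`.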

Similarly, the function bundle_multiple_alignments() bundles multiple alignment records pertaining to the same read or read pair.
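The bundling idea can be illustrated for single-end data with itertools.groupby: consecutive records sharing a read name form one bundle, and the SAM NH tag (number of reported alignments for the read) offers a sanity check. This is a hypothetical sketch, not bundle_multiple_alignments() itself; the Aln record and bundle_by_name function are invented for illustration, and the sketch assumes all alignments of a read are adjacent in the input.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple


class Aln(NamedTuple):
    """Hypothetical minimal single-end alignment record."""
    qname: str
    nh: int    # value of the SAM NH tag: number of reported alignments of the read
    pos: int


def bundle_by_name(records: Iterable[Aln]) -> Iterator[List[Aln]]:
    """Group consecutive records that share a read name into one bundle."""
    for _, group in groupby(records, key=lambda r: r.qname):
        bundle = list(group)
        # A mismatch with NH suggests the input was not grouped by read name.
        if bundle[0].nh != len(bundle):
            raise ValueError(
                f"read {bundle[0].qname}: NH={bundle[0].nh} but {len(bundle)} records"
            )
        yield bundle


recs = [Aln("r1", 2, 100), Aln("r1", 2, 900), Aln("r2", 1, 50)]
for bundle in bundle_by_name(recs):
    print(bundle[0].qname, len(bundle))
```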

See page Read alignments for details.