Modeller PIR Format IO
The homology modelling program Modeller uses a special form of the PIR format in which information about sequence numbering and chain codes is written into the 'description' line between the PIR protein tag and the protein alignment entry:
>P1;Q93Z60_ARATH
sequence:Q93Z60_ARATH:1:.:118:.:.
----MASTALSSAIVSTSFLRRQQTPISLRSLPFANT-QSLFGLKS-STARGGRVTAMATYKVKFITPEGEQ
EVECEEDVYVLDAAEEAGLDLPYSCRAGSCSSCAGKVVSGSIDQSD------QSFLD-D-------------
---------------------*
>P1;PDB|1FER|_
structureX:1FER:1:.:105:.:.
----------------------------------------------------AFVVTDNCIKCKY---TDCV
EV-CPVDCFY----EGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELAEVWPNITEK
KDPLPDAEDWDGVKGKLQHLE*
Jalview will attempt to parse any PIR entries conforming to the Modeller/PIR format, in order to extract the sequence start and end numbering and (possibly) a PDB file reference. The description line information is always stored in the sequence description string - so no information is lost if this parsing process fails.
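The parsing step described above can be illustrated with a short Python sketch. This is not Jalview's own parser: the names ModellerHeader and parse_modeller_description are hypothetical, and the field layout is assumed from the example entries on this page. When the line does not look like a Modeller description, the function returns None so that a caller can fall back to keeping the raw description string, in the spirit of the behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModellerHeader:
    """Fields of a Modeller/PIR description line (hypothetical container)."""
    kind: str            # "sequence" or "structureX"
    reference: str       # PDB reference, or a sequence ID as in the example
    start: Optional[int]
    start_chain: str
    end: Optional[int]
    end_chain: str
    rest: str            # anything after the sixth colon, kept verbatim


def _to_int(text: str) -> Optional[int]:
    """Return the residue number, or None if the field is not an integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def parse_modeller_description(line: str) -> Optional[ModellerHeader]:
    """Parse a description line; return None if it is not Modeller style."""
    # Split on the first six colons only, so any later text stays intact.
    fields = line.strip().split(":", 6)
    if len(fields) < 6 or fields[0] not in ("sequence", "structureX"):
        return None
    rest = fields[6] if len(fields) > 6 else ""
    return ModellerHeader(
        kind=fields[0],
        reference=fields[1],
        start=_to_int(fields[2]),
        start_chain=fields[3],
        end=_to_int(fields[4]),
        end_chain=fields[5],
        rest=rest,
    )


header = parse_modeller_description("structureX:1FER:1:.:105:.:.")
if header is not None:
    print(header.kind, header.reference, header.start, header.end)
```

A line such as "Ferredoxin from Arabidopsis" contains no colon-separated fields, so the sketch returns None for it and the text can simply be kept as the sequence description.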
The 'Modeller Output' flag in the 'Output' tab of the Jalview Preferences dialog box controls whether Jalview will also output Modeller-style PIR files. When the flag is set, any existing 'non-modeller PIR' header information in the description string of a sequence is appended to an automatically generated Modeller description line for that sequence.
The general format used for generating the Modeller/PIR sequence description line is shown below:
>P1;Primary_Sequence_ID
sequence or structureX:pdb-reference if available:start residue:start chain code:end residue:end chain code:. description text

The first field is either sequence or structureX, depending upon the presence of a PDB database ID for the sequence. If the protein has no PDB reference, then the chain code is not specified, unless one already existed when the sequence was imported into Jalview.
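As a rough illustration of the format above, the following Python sketch builds a description line from its parts. The function name modeller_description and its parameters are hypothetical and do not correspond to Jalview's code. Two details are assumptions taken from the example entries on this page: an unspecified chain code is written as '.', and the sequence ID is used in the second field when no PDB reference is available.

```python
from typing import Optional


def modeller_description(seq_id: str, start: int, end: int,
                         pdb_id: Optional[str] = None,
                         chain: Optional[str] = None,
                         description: str = "") -> str:
    """Build a Modeller/PIR description line (illustrative sketch only)."""
    # The first field depends on whether a PDB database ID is present.
    kind = "structureX" if pdb_id else "sequence"
    # Assumption: fall back to the sequence ID, as in the example entry.
    reference = pdb_id if pdb_id else seq_id
    # Assumption: '.' stands for "chain code not specified".
    chain_code = chain if chain else "."
    line = f"{kind}:{reference}:{start}:{chain_code}:{end}:{chain_code}:."
    # Any existing non-Modeller description text is appended at the end.
    if description:
        line += " " + description
    return line


print(">P1;Q93Z60_ARATH")
print(modeller_description("Q93Z60_ARATH", 1, 118))
# prints: sequence:Q93Z60_ARATH:1:.:118:.:.
```

Passing pdb_id="1FER" with residues 1 to 105 gives structureX:1FER:1:.:105:.:., which matches the second example entry near the top of this page.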