import jalview.datamodel.SequenceFeature;\r
import jalview.datamodel.SequenceI;\r
\r
+import java.util.Enumeration;\r
+import java.util.Hashtable;\r
import java.util.Iterator;\r
import java.util.Vector;\r
\r
public void setVersion(String version) {\r
this.version = version;\r
}\r
+/*\r
+ * EMBL Feature support is limited. The text below is included for the benefit of\r
+ * any developer working on improving EMBL feature import in Jalview.\r
+ * Extract from EMBL feature specification\r
+ * see http://www.embl-ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html\r
+3.5 Location\r
+3.5.1 Purpose\r
\r
+The location indicates the region of the presented sequence which corresponds \r
+to a feature. \r
+\r
+3.5.2 Format and conventions\r
+The location contains at least one sequence location descriptor and may \r
+contain one or more operators with one or more sequence location descriptors. \r
+Base numbers refer to the numbering in the entry. This numbering designates \r
+the first base (5' end) of the presented sequence as base 1. \r
+Base locations beyond the range of the presented sequence may not be used in \r
+location descriptors, the only exception being location in a remote entry (see \r
+3.5.2.1, e). \r
+\r
+Location operators and descriptors are discussed in more detail below. \r
+\r
+3.5.2.1 Location descriptors\r
+\r
+The location descriptor can be one of the following: \r
+(a) a single base number\r
+(b) a site between two indicated adjoining bases\r
+(c) a single base chosen from within a specified range of bases (not allowed for new\r
+ entries)\r
+(d) the base numbers delimiting a sequence span\r
+(e) a remote entry identifier followed by a local location descriptor\r
+ (i.e., a-d)\r
+\r
+A site between two adjoining nucleotides, such as an endonucleolytic cleavage\r
+site, is indicated by listing the two points separated by a caret (^). The\r
+permitted formats for this descriptor are n^n+1 (for example 55^56), or, for\r
+circular molecules, n^1, where "n" is the full length of the molecule, e.g.,\r
+1000^1 for a circular molecule with length 1000.\r
+\r
+A single base chosen from a range of bases is indicated by the first base\r
+number and the last base number of the range separated by a single period\r
+(e.g., '12.21' indicates a single base taken from between the indicated\r
+points). From October 2006 the usage of this descriptor is restricted:\r
+it is illegal to use "a single base from a range" (c) either on its own or\r
+in combination with the "sequence span" (d) descriptor for newly created entries.\r
+The existing entries where such descriptors exist are going to be retrofitted.\r
+\r
+Sequence spans are indicated by the starting base number and the ending base \r
+number separated by two periods (e.g., '34..456'). The '<' and '>' symbols may \r
+be used with the starting and ending base numbers to indicate that an end \r
+point is beyond the specified base number. The starting and ending base \r
+positions can be represented as distinct base numbers ('34..456') or a site \r
+between two indicated adjoining bases. \r
+\r
+A location in a remote entry (not the entry to which the feature table \r
+belongs) can be specified by giving the accession-number and sequence version \r
+of the remote entry, followed by a colon ":", followed by a location \r
+descriptor which applies to that entry's sequence (e.g., J12345.1:1..15; see\r
+also the examples below).\r
+\r
+3.5.2.2 Operators\r
+\r
+The location operator is a prefix that specifies what must be done to the \r
+indicated sequence to find or construct the location corresponding to the \r
+feature. A list of operators is given below with their definitions and most \r
+common format. \r
+\r
+complement(location) \r
+Find the complement of the presented sequence in the span specified by\r
+"location" (i.e., read the complement of the presented strand in its 5'-to-3'\r
+direction)\r
+\r
+join(location,location, ... location) \r
+The indicated elements should be joined (placed end-to-end) to form one \r
+contiguous sequence \r
+\r
+order(location,location, ... location) \r
+The elements can be found in the specified order (5' to 3' direction), but\r
+nothing is implied about the reasonableness of joining them\r
+\r
+Note: the location operator "complement" can be used in combination with either\r
+"join" or "order" within the same location; combinations of "join" and "order"\r
+within the same location (nested operators) are illegal.\r
+\r
+\r
+\r
+3.5.3 Location examples \r
+\r
+The following is a list of common location descriptors with their meanings: \r
+\r
+Location                Description\r
+\r
+467                     Points to a single base in the presented sequence\r
+\r
+340..565                Points to a continuous range of bases bounded by and\r
+                        including the starting and ending bases\r
+\r
+<345..500               Indicates that the exact lower boundary point of a feature\r
+                        is unknown. The location begins at some base previous to\r
+                        the first base specified (which need not be contained in\r
+                        the presented sequence) and continues to and includes the\r
+                        ending base\r
+\r
+<1..888                 The feature starts before the first sequenced base and\r
+                        continues to and includes base 888\r
+\r
+1..>888                 The feature starts at the first sequenced base and\r
+                        continues beyond base 888\r
+\r
+102.110                 Indicates that the exact location is unknown but that it is\r
+                        one of the bases between bases 102 and 110, inclusive\r
+\r
+123^124                 Points to a site between bases 123 and 124\r
+\r
+join(12..78,134..202)   Regions 12 to 78 and 134 to 202 should be joined to form\r
+                        one contiguous sequence\r
+\r
+complement(34..126)     Start at the base complementary to 126 and finish at the\r
+                        base complementary to base 34 (the feature is on the strand\r
+                        complementary to the presented strand)\r
+\r
+complement(join(2691..4571,4918..5163))\r
+                        Joins regions 2691 to 4571 and 4918 to 5163, then\r
+                        complements the joined segments (the feature is on the\r
+                        strand complementary to the presented strand)\r
+\r
+join(complement(4918..5163),complement(2691..4571))\r
+                        Complements regions 4918 to 5163 and 2691 to 4571, then\r
+                        joins the complemented segments (the feature is on the\r
+                        strand complementary to the presented strand)\r
+\r
+J00194.1:100..202       Points to bases 100 to 202, inclusive, in the entry (in\r
+                        this database) with primary accession number 'J00194'\r
+\r
+join(1..100,J00194.1:100..202)\r
+                        Joins region 1..100 of the existing entry with the region\r
+                        100..202 of remote entry J00194\r
+\r
+ */\r
/**\r
* Recover annotated sequences from EMBL file\r
* @param noNa don't return nucleic acid sequences \r
String prseq=null;\r
String prname=new String();\r
String prid=null;\r
+ Hashtable vals=new Hashtable();\r
int prstart=1;\r
// get qualifiers\r
if (feature.getQualifiers()!=null && feature.getQualifiers().size()>0) {\r
{\r
prstart = Integer.parseInt(q.getValue());\r
}\r
- else {\r
- // throw anything else into the title\r
- if (prname.length()==0) {\r
- prname = q.getValue();\r
- } else {\r
- prname = prname + q.getName()+":"+q.getValue();\r
- }\r
+ else \r
+ if (q.getName().equals("product")){\r
+ prname = q.getValue();\r
+ } else {\r
+ // throw anything else into the additional properties hash\r
+ vals.put(q.getName(), q.getValue());\r
}\r
}\r
}\r
sf.setType(feature.getName());\r
sf.setFeatureGroup(jalview.datamodel.DBRefSource.EMBL);\r
sf.setDescription("Exon "+(1+xint)+" for protein '"+prname+"' EMBLCDS:"+prid);\r
+ if (vals!=null && vals.size()>0) {\r
+ Enumeration kv = vals.keys();\r
+ while (kv.hasMoreElements()) {\r
+ Object key=kv.nextElement();\r
+ if (key!=null)\r
+ sf.setValue(key.toString(), vals.get(key));\r
+ }\r
+ }\r
dna.addSequenceFeature(sf);\r
}\r
}\r
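
The loop that copies the extra qualifiers onto the sequence feature depends on walking the table by key. The following is a minimal, standalone sketch of that pattern and is not part of the patch. `QualifierCopySketch`, `FeatureStub`, and `copyQualifiers` are hypothetical names introduced only for illustration; `FeatureStub` merely stands in for an object with a `setValue(String, Object)` method and is not the Jalview `SequenceFeature` class.

```java
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class QualifierCopySketch {
    /** Hypothetical stand-in for a feature object that stores named values. */
    static class FeatureStub {
        final Map<String, Object> values = new HashMap<String, Object>();

        void setValue(String key, Object value) {
            values.put(key, value);
        }
    }

    /** Copies every key/value pair from vals onto the feature. */
    static void copyQualifiers(Hashtable<String, String> vals, FeatureStub sf) {
        // keys() enumerates the keys. elements() would enumerate the values,
        // and passing a value to get() would normally return null.
        Enumeration<String> keys = vals.keys();
        while (keys.hasMoreElements()) {
            String key = keys.nextElement();
            sf.setValue(key, vals.get(key));
        }
    }

    public static void main(String[] args) {
        Hashtable<String, String> vals = new Hashtable<String, String>();
        vals.put("gene", "lacZ");
        vals.put("codon_start", "1");

        FeatureStub sf = new FeatureStub();
        copyQualifiers(vals, sf);
        System.out.println(sf.values.get("gene"));
    }
}
```

Hashtable rejects null keys, so a null check on each enumerated key is harmless but not required.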