CR-JAL-140 188
- Details
- Objectives
- General Comments 13
- Unresolved
- Resolved
- Number of files included: 38
-
jalview
0
-
Folder
benchmarking
0
-
Folder
src/main/java/org/jalview
0
- File HiddenColumnsBenchmark.java 0 Remove
- File README 0 Remove
-
Folder
src/main/java/org/jalview
0
-
Folder
src/jalview
0
-
Folder
appletgui
0
- File AlignFrame.java 3 Remove
- File AnnotationColumnChooser.java 0 Remove
- File AnnotationLabels.java 0 Remove
- File ScalePanel.java 0 Remove
- File SeqCanvas.java 2 Remove
- File SeqCanvas.java 0 Remove
-
Folder
datamodel
0
- File BoundedHiddenColsIterator.java 4 Remove
- File BoundedStartRegionIterator.java 5 Remove
- File HiddenColumns.java 123 Remove
- File HiddenColumnsCursor.java 15 Remove
- File HiddenCursorPosition.java 2 Remove
- File RegionsIterator.java 2 Remove
- File ReverseRegionsIterator.java 2 Remove
- File Sequence.java 0 Remove
- File SequenceI.java 0 Remove
- File VisibleColsIterator.java 4 Remove
- File VisibleContigsIterator.java 0 Remove
-
Folder
gui
0
- File AlignFrame.java 0 Remove
- File AlignViewport.java 0 Remove
- File AnnotationColumnChooser.java 0 Remove
- File AnnotationLabels.java 0 Remove
- File Jalview2XML.java 0 Remove
- File ScalePanel.java 0 Remove
- File SeqCanvas.java 0 Remove
- File SeqCanvas.java 2 Remove
- File SeqPanel.java 0 Remove
- File VamsasApplication.java 0 Remove
-
Folder
renderer
0
- File ScaleRenderer.java 0 Remove
-
Folder
util
0
- File MappingUtils.java 0 Remove
-
Folder
appletgui
0
-
Folder
test/jalview
0
-
Folder
analysis
0
- File DnaTest.java 0 Remove
-
Folder
datamodel
0
- File BoundedStartRegionIteratorTest.java 0 Remove
- File HiddenColumnsCursorTest.java 0 Remove
- File HiddenColumnsTest.java 11 Remove
- File SequenceTest.java 0 Remove
- File VisibleContigsIteratorTest.java 0 Remove
-
Folder
util
0
- File MappingUtilsTest.java 0 Remove
-
Folder
analysis
0
-
Folder
benchmarking
0
-
Filter
- Only show me content:
- Unfiltered files: dynamically added content
- Filtered files: dynamically added content
- Clear filters
Summarize the review outcomes (optional)
Details
Participant | Role | Time Spent | Comments | Latest Comment |
---|---|---|---|---|
Author | 7h 56m | 94 | ok | |
Reviewer - Complete | 4h 41m | 94 | can replace method body with this(null, lowerBound, upper... | |
Total | 12h 38m | 188 |
- Linked Issue:
-
Objectives
HiddenColumns methods typically iterate over the hidden columns collection to perform their operations. As a result HiddenColumns functionality scales poorly with increasing numbers of columns.
Proposed solution is to add a cursor (similar to the recently added SequenceCursor) to cache recent positioning information, which should improve performance for localised hidden columns operations e.g. those operating within the alignment viewport.
Branches in review
Repository | Branch to review | Branched from |
---|
General Comments
Mungo Carstairs
HiddenColumns is basically a list of ranges, decorated with a cursor, with va...HiddenColumns is basically a list of ranges, decorated with a cursor, with various operations exposed. It feels like this could be generalised (is not specific to hidden columns) e.g. in class and method naming. There are methods in it which perhaps belong elsewhere (see specific comments).
Could it for example be called RangeList - in fact could it hold List<jalview.datamodel.Range> rather than List<int[]>?
Kira Mourão
In general I agree. I think I've now moved the last method that belongs elsew...In general I agree. I think I've now moved the last method that belongs elsewhere (locateVisibleBoundsOfSequence). What remains is to refactor the reveal code so that it does not necessarily return a ColumnSelection (potentially depends on refactoring/redesign of ColumnSelection though).
Could use Range rather than List<int[]> but I think it ought to be benchmarked to check performance remains ok. Also that is potentially quite a big code change if the iterators are also updated to return List<Range> as that impacts all over the code.
So - I agree but I think it should be done as a separate issue.
Mungo Carstairs
Thinking further on these lines: as a range list it could equally be used to ...Thinking further on these lines: as a range list it could equally be used to encode gaps in a sequence (something we have considered as an alternative to a sequence as string with gaps).
Possibly the intention of class ShiftList, which however lacks adequate Javadoc (other than 'use at your own risk!'), tests, clearly named methods, and is buggy...
Is this the opportunity to rationalise code and generalise for more use cases than just hidden columns?
Mungo Carstairs
Also, is the datamodel for HiddenColumns any different from that for ColumnSe...Also, is the datamodel for HiddenColumns any different from that for ColumnSelection - could one serve for both?
ColumnSelection holds columns both as List<Integer> and also as a BitSet.
What is the rationale for storing List<Integer> for selected columns but List<int[]> for hidden columns?
Could HiddenColumns also store a BitSet as well?
I assume we want to optimise for read, not for write (a relatively infrequent operation).
Mungo Carstairs
Or to rephrase (yet again): *for correct function, it is necessary and suff...Or to rephrase (yet again):
- for correct function, it is necessary and sufficient for a HiddenColumns model to be able to answer true or false to isColumnHidden(int columnNumber)
- the same for ColumnSelection as isColumnSelected(int)
- i.e. they are functionally equivalent
- everything else is implementation details (read / write / memory performance)
If we are providing an ingenious implementation, it would be desirable to make it serve for both.
Kira Mourão
I think if functional equivalence is defined as 'both data structures support...I think if functional equivalence is defined as 'both data structures support is-a-member-of' then we could argue that using just about any data structure is equivalent to another, so I think this is maybe a bit too simplistic.
There is an interesting tweak in ColumnSelection where the list of selected columns is in selection order and not column order, used for example by Grouping:makeGroupsFromCols. Issues like that would need to be addressed by a common data structure. So while I agree that it seems likely a common list-of-ranges or similar data structure could (and should) be used here, I think it still needs some refactoring and design work to be done to (check and) make that possible.
Kira Mourão
I don't know the rationale for the design decisions around the data structure...I don't know the rationale for the design decisions around the data structures for ColumnSelection and HiddenColumns. I did consider the List<int[]> + BitSet option but was concerned about memory usage and about performance given the extra overhead of keeping both data structures up to date.
I think optimise for read is the correct choice, as that impacts interactive user operations (scrolling, use of overview etc), but the write performance needs to be reasonable to accommodate e.g. propagateInsertions where a large number of non-contiguous columns can be hidden at once.
Mungo Carstairs
Memory usage is unlikely to be an issue as there are not many of these object...Memory usage is unlikely to be an issue as there are not many of these objects in memory (small footprint compared to the size of alignment/annotation data), also BitSet is extremely compact.
Write performance ditto, since propagateInsertions is only called after a JPred prediction, so would only make up a small fraction of the overall elapsed time for that.
So optimising for read is definitely the priority.
Good point about ColumnSelection and selection order though. Though (notwithstanding comments in JAL-2172) I am dubious whether this is actually intended / desirable behaviour.
Mungo Carstairs
So what if, say: *the hidden columns were held as both a BitSet and a List<...So what if, say:
- the hidden columns were held as both a BitSet and a List<int[]>
- read operations use whichever is more suitable / performant
- update operations update the BitSet, then convert it to the List<int[]> (don't attempt to update the list in situ as too complicated)
- can also derive numColumns at the same time - or just BitSet.cardinality() when wanted, should be fast enough
Simple code. Small redundancy in storage. Negligible hit on update performance.
(And I would argue reusable for ColumnSelection with minimal abstraction.)
Kira Mourão
We are going around in circles here. I already more or less considered this -...We are going around in circles here. I already more or less considered this - it is what I wanted to do when I was looking at RoaringBitmaps (which are a souped up Bitset implementation). It might be a lot easier to do now that HiddenColumns is substantially cleaned up but I think there are still issues to consider.
In terms of memory: Current Java implementation of Bitset uses relatively large amounts of memory with only a few entries, if those entries happen to be at the high end (see e.g. http://java-performance.info/bit-sets/ ) because it has to allocate an array to hold all the unset values up to the highest set entry. This seems to me to be quite a likely use case in Jalview. This is especially a consideration if we are going to plan to use multiple alignment-wide Bitsets (for hidden columns, column selections, gaps, ... more?)
In terms of performance: I didn't test Bitset, but RoaringBitmaps, however they are meant to be more performant so it would be surprising if Bitset were much better. Performance was largely worse or similar to the pre-existing implementation. One of the things which made the performance worse was poor performance of the cardinality calculation (relative to the implementation then, of calculating the full sum over the list of ranges) and inversion operations - both of which, unfortunately, are valuable for simplification of the code.
I do agree with you about re-use for ColumnSelection and/or gaps, probably with another iteration over HiddenColumns to get it into suitable shape. The re-use options for HiddenColumns have only really become apparent/possible with the recent clean up, or I would have treated this as a redesign rather than a refactor. I think now the plan for re-use ought to be done as a proper design task, considering the implications and possibilities, with documented decisions and discussed before implementation, rather than a post-hoc design change via code review.
I'll comment on the classes generally although I realise some of it is refactoring / relocation of pre-existing code.