JAL-2759: Add cursor to HiddenColumns

Activity

CR-JAL-140 188

Keyboard shortcuts  
  • Summarize the review outcomes (optional)
     
    #permalink

    Details

    Warning: no files are visible, they have all been filtered.
    Participant Role Time Spent Comments Latest Comment
    Author 7h 56m 94 ok
    Reviewer - Complete 4h 41m 94 can replace method body with this(null, lowerBound, upper...
    Total   12h 38m 188  
    #permalink

    Objectives

    HiddenColumns methods typically iterate over the hidden columns collection to perform their operations. As a result HiddenColumns functionality scales poorly with increasing numbers of columns.

    Proposed solution is to add a cursor (similar to the recently added SequenceCursor) to cache recent positioning information, which should improve performance for localised hidden columns operations e.g. those operating within the alignment viewport.

    Branches in review

    #permalink

    Issues Raised From Comments

    Key Summary State Assignee
    #permalink

    General Comments

    Mungo Carstairs

    I'll comment on the classes generally although I realise some of it is refact...

    I'll comment on the classes generally although I realise some of it is refactoring / relocation of pre-existing code.

    Mungo Carstairs

    HiddenColumns is basically a list of ranges, decorated with a cursor, with va...

    HiddenColumns is basically a list of ranges, decorated with a cursor, with various operations exposed. It feels like this could be generalised (is not specific to hidden columns) e.g. in class and method naming. There are methods in it which perhaps belong elsewhere (see specific comments).
    Could it for example be called RangeList - in fact could it hold List<jalview.datamodel.Range> rather than List<int[]>?

    Kira Mourão

    In general I agree. I think I've now moved the last method that belongs elsew...

    In general I agree. I think I've now moved the last method that belongs elsewhere (locateVisibleBoundsOfSequence). What remains is to refactor the reveal code so that it does not necessarily return a ColumnSelection (potentially depends on refactoring/redesign of ColumnSelection though).

    Could use Range rather than List<int[]> but I think it ought to be benchmarked to check performance remains ok. Also that is potentially quite a big code change if the iterators are also updated to return List<Range> as that impacts all over the code.

    So - I agree but I think it should be done as a separate issue.

    Mungo Carstairs

    Thinking further on these lines: as a range list it could equally be used to ...

    Thinking further on these lines: as a range list it could equally be used to encode gaps in a sequence (something we have considered as an alternative to a sequence as string with gaps).
    Possibly the intention of class ShiftList, which however lacks adequate Javadoc (other than 'use at your own risk!'), tests, clearly named methods, and is buggy...
    Is this the opportunity to rationalise code and generalise for more use cases than just hidden columns?

    Kira Mourão

    As above

    As above

    Mungo Carstairs

    Also, is the datamodel for HiddenColumns any different from that for ColumnSe...

    Also, is the datamodel for HiddenColumns any different from that for ColumnSelection - could one serve for both?
    ColumnSelection holds columns both as List<Integer> and also as a BitSet.
    What is the rationale for storing List<Integer> for selected columns but List<int[]> for hidden columns?
    Could HiddenColumns also store a BitSet as well?
    I assume we want to optimise for read, not for write (a relatively infrequent operation).

    Mungo Carstairs

    Or to rephrase (yet again): *for correct function, it is necessary and suff...

    Or to rephrase (yet again):

    • for correct function, it is necessary and sufficient for a HiddenColumns model to be able to answer true or false to isColumnHidden(int columnNumber)
    • the same for ColumnSelection as isColumnSelected(int)
    • i.e. they are functionally equivalent
    • everything else is implementation details (read / write / memory performance)
      If we are providing an ingenious implementation, it would be desirable to make it serve for both.

    Kira Mourão

    I think if functional equivalence is defined as 'both data structures support...

    I think if functional equivalence is defined as 'both data structures support is-a-member-of' then we could argue that using just about any data structure is equivalent to another, so I think this is maybe a bit too simplistic.

    There is an interesting tweak in ColumnSelection where the list of selected columns is in selection order and not column order, used for example by Grouping:makeGroupsFromCols. Issues like that would need to be addressed by a common data structure. So while I agree that it seems likely a common list-of-ranges or similar data structure could (and should) be used here, I think it still needs some refactoring and design work to be done to (check and) make that possible.

    Kira Mourão

    I don't know the rationale for the design decisions around the data structure...

    I don't know the rationale for the design decisions around the data structures for ColumnSelection and HiddenColumns. I did consider the List<int[]> + BitSet option but was concerned about memory usage and about performance given the extra overhead of keeping both data structures up to date.

    I think optimise for read is the correct choice, as that impacts interactive user operations (scrolling, use of overview etc), but the write performance needs to be reasonable to accommodate e.g. propagateInsertions where a large number of non-contiguous columns can be hidden at once.

    Mungo Carstairs

    Memory usage is unlikely to be an issue as there are not many of these object...

    Memory usage is unlikely to be an issue as there are not many of these objects in memory (small footprint compared to the size of alignment/annotation data), also BitSet is extremely compact.
    Write performance ditto, since propagateInsertions is only called after a JPred prediction, so would only make up a small fraction of the overall elapsed time for that.
    So optimising for read is definitely the priority.
    Good point about ColumnSelection and selection order though. Though (notwithstanding comments in JAL-2172) I am dubious whether this is actually intended / desirable behaviour.

    Mungo Carstairs

    So what if, say: *the hidden columns were held as both a BitSet and a List<...

    So what if, say:

    • the hidden columns were held as both a BitSet and a List<int[]>
    • read operations use whichever is more suitable / performant
    • update operations update the BitSet, then convert it to the List<int[]> (don't attempt to update the list in situ as too complicated)
    • can also derive numColumns at the same time - or just BitSet.cardinality() when wanted, should be fast enough


    Simple code. Small redundancy in storage. Negligible hit on update performance.
    (And I would argue reusable for ColumnSelection with minimal abstraction.)

    Kira Mourão

    We are going around in circles here. I already more or less considered this -...

    We are going around in circles here. I already more or less considered this - it is what I wanted to do when I was looking at RoaringBitmaps (which are a souped up Bitset implementation). It might be a lot easier to do now that HiddenColumns is substantially cleaned up but I think there are still issues to consider.

    In terms of memory: Current Java implementation of Bitset uses relatively large amounts of memory with only a few entries, if those entries happen to be at the high end (see e.g. http://java-performance.info/bit-sets/ ) because it has to allocate an array to hold all the unset values up to the highest set entry. This seems to me to be quite a likely use case in Jalview. This is especially a consideration if we are going to plan to use multiple alignment-wide Bitsets (for hidden columns, column selections, gaps, ... more?)

    In terms of performance: I didn't test Bitset, but RoaringBitmaps, however they are meant to be more performant so it would be surprising if Bitset were much better. Performance was largely worse or similar to the pre-existing implementation. One of the things which made the performance worse was poor performance of the cardinality calculation (relative to the implementation then, of calculating the full sum over the list of ranges) and inversion operations - both of which, unfortunately, are valuable for simplification of the code.

    I do agree with you about re-use for ColumnSelection and/or gaps, probably with another iteration over HiddenColumns to get it into suitable shape. The re-use options for HiddenColumns have only really become apparent/possible with the recent clean up, or I would have treated this as a redesign rather than a refactor. I think now the plan for re-use ought to be done as a proper design task, considering the implications and possibilities, with documented decisions and discussed before implementation, rather than a post-hoc design change via code review.

    Mungo Carstairs

    Agree entirely

    Agree entirely

    /benchmarking/.../jalview/HiddenColumnsBenchmark.java Changed
    /benchmarking/README Changed
    /src/jalview/appletgui/AlignFrame.java Changed 3
    /src/.../appletgui/AnnotationColumnChooser.java Changed
    Open in IDE #permalink
    /src/.../appletgui/AnnotationLabels.java Changed
    Open in IDE #permalink
    /src/jalview/appletgui/ScalePanel.java Changed
    /src/jalview/appletgui/SeqCanvas.java Changed 2
    /src/jalview/appletgui/SeqCanvas.java Changed
    /src/.../datamodel/BoundedHiddenColsIterator.java Added 4
    Open in IDE #permalink
    /src/.../datamodel/BoundedStartRegionIterator.java Added 5
    /src/jalview/datamodel/HiddenColumns.java Changed 123
    /src/.../datamodel/HiddenColumnsCursor.java Added 15
    /src/.../datamodel/HiddenCursorPosition.java Added 2
    /src/.../datamodel/RegionsIterator.java Added 2
    /src/.../datamodel/ReverseRegionsIterator.java Added 2
    Open in IDE #permalink
    /src/jalview/datamodel/Sequence.java Changed
    /src/jalview/datamodel/SequenceI.java Changed
    /src/.../datamodel/VisibleColsIterator.java Added 4
    /src/.../datamodel/VisibleContigsIterator.java Added
    /src/jalview/gui/AlignFrame.java Changed
    /src/jalview/gui/AlignViewport.java Changed
    Open in IDE #permalink
    /src/.../gui/AnnotationColumnChooser.java Changed
    Open in IDE #permalink
    /src/jalview/gui/AnnotationLabels.java Changed
    /src/jalview/gui/Jalview2XML.java Changed
    Open in IDE #permalink
    /src/jalview/gui/ScalePanel.java Changed
    /src/jalview/gui/SeqCanvas.java Changed
    /src/jalview/gui/SeqCanvas.java Changed 2
    /src/jalview/gui/SeqPanel.java Changed
    /src/jalview/gui/VamsasApplication.java Changed
    Open in IDE #permalink
    /src/jalview/renderer/ScaleRenderer.java Changed
    /src/jalview/util/MappingUtils.java Changed
    Open in IDE #permalink
    /test/jalview/analysis/DnaTest.java Changed
    Open in IDE #permalink
    /test/.../datamodel/BoundedStartRegionIteratorTest.java Added
    Open in IDE #permalink
    /test/.../datamodel/HiddenColumnsCursorTest.java Added
    /test/.../datamodel/HiddenColumnsTest.java Changed 11
    /test/jalview/datamodel/SequenceTest.java Changed
    /test/.../datamodel/VisibleContigsIteratorTest.java Added
    Open in IDE #permalink
    /test/jalview/util/MappingUtilsTest.java Changed
    Open in IDE #permalink

    Review updated: Reload | Ignore | Collapse

    You cannot reload the review while writing a comment.

    Log time against