Skip to content

Take care of coordinate modifcation in selector probability_ratio #1076

Merged
dwhswenson merged 4 commits into
openpathsampling:masterfrom
sroet:add_modified_snapshot_selector
Oct 7, 2021
Merged

Take care of coordinate modifcation in selector probability_ratio #1076
dwhswenson merged 4 commits into
openpathsampling:masterfrom
sroet:add_modified_snapshot_selector

Conversation

@sroet

@sroet sroet commented Oct 6, 2021

Copy link
Copy Markdown
Member

From writing down the code in this comment I noticed that selector.probability_ratio might not properly work for snapshots that get their coordinates altered.

How does this fail?

  • selector.probability_ratio takes a frame, normally the original selected shooting point, the original trajectory, and a new trajectory
  • that frame can be modified before generating the new trajectory, which would lead to the situation that new_traj.index(frame) raises a ValueError if the coordinates were modified
  • the added test in 3ad2cc3 shows that this behaviour indeed fails

Why is the ValueError raised when you alter the coordinates, but not with velocity modifications?

  • Trajectory objects are essentially lists and inherits their way of checking for 'index'
  • Trajectory.index(snapshot) looks at the hashes of all internal Snapshots and compares them against hash(snapshot)
    (Not true; in general it checks ==, which for all objects in OPS this checks their __uuid__ (which is also used as part of the hash))
  • hash(snapshot) is determined by the coordinates -> identical coordinates generate identical hashes (This is not true and based on me misremembering things)
  • altering the coordinates of a snapshot alters the hash (This is not true and based on me misremembering things)
  • the hash of original frame does therefor is not found in the altered trajectory, leading to a failure of the index call

Why is this only an issue for this selector

  • All other selectors sidestep the issue by not relying on the snapshot hash (either by setting the ratio to 1.0 or the probability of all snapshots to 1.0)

Solution

This PR adds a keyword argument new_snapshot to probability_ratio, which defaults to snapshot if not given for all internal OPS selectors.

Possible OPS 2.0 alterations

@dwhswenson Should we update the call signature of this function to (old_snapshot, old_trajectory, new_snapshot, new_trajectory) in 2.0 and raise a Deprecation warning if new_snapshot is not defined for 1.X?

The integration of calling the probability ratio with the new snapshot inside the EngineMover will be done as part of #1075

@codecov

codecov Bot commented Oct 6, 2021

Copy link
Copy Markdown

Codecov Report

Merging #1076 (187a6cd) into master (d28571f) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1076   +/-   ##
=======================================
  Coverage   81.58%   81.59%           
=======================================
  Files         140      140           
  Lines       15416    15427   +11     
=======================================
+ Hits        12577    12587   +10     
- Misses       2839     2840    +1     
Impacted Files Coverage Δ
openpathsampling/deprecations.py 100.00% <100.00%> (ø)
openpathsampling/pathmovers/spring_shooting.py 100.00% <100.00%> (ø)
openpathsampling/shooting.py 100.00% <100.00%> (ø)
openpathsampling/netcdfplus/cache.py 63.72% <0.00%> (-0.33%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d28571f...187a6cd. Read the comment docs.

@dwhswenson

Copy link
Copy Markdown
Member
  • Trajectory.index(snapshot) looks at the hashes of all internal Snapshots and compares them against hash(snapshot)
  • hash(snapshot) is determined by the coordinates -> identical coordinates generate identical hashes

This combination of facts seems like a bug to me, although I don't see them in the code. Where are you getting this from? I don't see a reimplementation of index, but if we did, it should return based on equality, not based on hash. Python's list.index returns the first object that matches on ==. I'm also surprised that snapshot hash is based on coords -- it seems to be that it should be on UUID.

Before a detailed line-by-line review, I'd also like to see a test where you move the snapshot so that it does not cross the interface. In that case, you should get 0 acceptance probability (because there is no possibility of choosing this snapshot in the reverse move).

I'm fine with this actually calculating the (wasted) dynamics and giving zero acceptance, because that's a use case the users have to seriously try hard to reach (choose interface-constrained shooting with two-way shooting with a modifier that changes coordinates). However, it's worth keeping this in mind in the context of follow-ups to #1075, in particular anything that deals with early rejection in the sense of #794 (comment). (My thought is to calculate the bias components in the __call__ method; if any are 0, we can early-abort).

@sroet

sroet commented Oct 6, 2021

Copy link
Copy Markdown
Member Author

This combination of facts seems like a bug to me, although I don't see them in the code. Where are you getting this from? I don't see a reimplementation of index, but if we did, it should return based on equality, not based on hash. Python's list.index returns the first object that matches on ==. I'm also surprised that snapshot hash is based on coords -- it seems to be that it should be on UUID.

I checked it and it is based on __eq__ which is an __uuid__ comparison for all StorableObjects (Like snapshots).

I definetly misremembered the hashes being determined by the coordinates for snapshot. Was this ever a plan in an early version of OPS? (Don't know where I got this from otherwise, might have dreamt it up 🤷‍♂️ )

I was also quite certain at least some fast things inside python were done based on hashes (why would for example dict otherwise complain about unhashable keys...), but even these abominations work somehow on python 3.9.7:

>>> class b():
...     def __hash__(self):
...         return 1
... 
>>> a = b()
>>> c = b()
>>> a == c
False
>>> hash(c)
1
>>> hash(a)
1
>>> d = {c, a}
>>> d
{<__main__.b object at 0x7f5d85599a90>, <__main__.b object at 0x7f5d85599b50>}
>>> d = {c: "foo", a: "bar"}
>>> d
{<__main__.b object at 0x7f5d85599a90>: 'foo', <__main__.b object at 0x7f5d85599b50>: 'bar'}
>>> d[c]
'foo'
>>> d[a]
'bar'

@dwhswenson

Copy link
Copy Markdown
Member

I was also quite certain at least some fast things inside python were done based on hashes (why would for example dict otherwise complain about unhashable keys...), but even these abominations work somehow on python 3.9.7:

I just posted a gist from some of my own experiments. See also https://hynek.me/articles/hashes-and-equality/. TL;DR: hash is a shortcut that is followed by a check with ==. Your example above is as expected, but performance degrades to same as using a list (no fast look-up).

I'm planning to post an issue soon about hashing and equality changes for OPS 2.0, which need to happen for several reasons, but I'll be looking for feedback on details (and help catching edge cases I hadn't considered).

@sroet

sroet commented Oct 6, 2021

Copy link
Copy Markdown
Member Author

Before a detailed line-by-line review, I'd also like to see a test where you move the snapshot so that it does not cross the interface. In that case, you should get 0 acceptance probability (because there is no possibility of choosing this snapshot in the reverse move).

Done

I'm fine with this actually calculating the (wasted) dynamics and giving zero acceptance, because that's a use case the users have to seriously try hard to reach (choose interface-constrained shooting with two-way shooting with a modifier that changes coordinates). However, it's worth keeping this in mind in the context of follow-ups to #1075, in particular anything that deals with early rejection in the sense of #794 (comment). (My thought is to calculate the bias components in the call method; if any are 0, we can early-abort).

Sounds reasonable, we should probably add some is_valid_after_modification to selectors that can look at the validity on a case by case basis (sometimes rejection might not be possible without knowing the full new trajectory)

@sroet

sroet commented Oct 6, 2021

Copy link
Copy Markdown
Member Author

I just posted a gist from some of my own experiments. See also https://hynek.me/articles/hashes-and-equality/. TL;DR: hash is a shortcut that is followed by a check with ==. Your example above is as expected, but performance degrades to same as using a list (no fast look-up).

Intriguing, thanks for the write up!

@dwhswenson dwhswenson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the call signature of this function to (old_snapshot, old_trajectory, new_snapshot, new_trajectory) in 2.0

Yes (in a PR to 2.0, obviously)....

and raise a Deprecation warning if new_snapshot is not defined for 1.X?

... and yes (in this PR). That's the only change I see a need for; otherwise LGTM!

@sroet

sroet commented Oct 7, 2021

Copy link
Copy Markdown
Member Author

... and yes (in this PR). That's the only change I see a need for; otherwise LGTM!

Done, please have another look

@sroet sroet requested a review from dwhswenson October 7, 2021 08:35

@dwhswenson dwhswenson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question as an aside, but this looks good. Will merge!

# TODO OPS 2.0: We should probabily alter the order and keyword names
# of this call
if new_snapshot is None:
NEW_SNAPSHOT_SELECTOR.warn(stacklevel=3)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice you've been setting stacklevel=3 in these -- should the default stacklevel=2 be changed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. When I run py.test on these with stacklevel=2 I get

openpathsampling/tests/test_external_engine.py::TestExternalEngine::test_in_shooting_move
  /home/sander/github_files/openpathsampling/openpathsampling/shooting.py:41: DeprecationWarning: new_snapshot=None; If snapshot has been copied or modified we can't find it in trial_trajectory. This call signature will update to (old_snapshot, old_trajectory, new_snapshot, new_trajectory) in OpenPathSampling 2.0.  Call with kwargs and use new_snapshot=old_snapshot if  old_snapshot is not copied or modified in new_traj
    NEW_SNAPSHOT_SELECTOR.warn(stacklevel=2)

which is a lot less useful (in my opinion) than with stacklevel=3:

openpathsampling/tests/test_external_engine.py::TestExternalEngine::test_in_shooting_move
  /home/sander/github_files/openpathsampling/openpathsampling/pathmover.py:823: DeprecationWarning: new_snapshot=None; If snapshot has been copied or modified we can't find it in trial_trajectory. This call signature will update to (old_snapshot, old_trajectory, new_snapshot, new_trajectory) in OpenPathSampling 2.0.  Call with kwargs and use new_snapshot=old_snapshot if  old_snapshot is not copied or modified in new_traj
    bias = self.selector.probability_ratio(

So we should definetly default to stacklevel=3 for deprecations were we want people to update their calls.

(As a side note, this is stacklevel=1, which is why the default might be slightly off:

  /home/sander/github_files/openpathsampling/openpathsampling/deprecations.py:62: DeprecationWarning: new_snapshot=None; If snapshot has been copied or modified we can't find it in trial_trajectory. This call signature will update to (old_snapshot, old_trajectory, new_snapshot, new_trajectory) in OpenPathSampling 2.0.  Call with kwargs and use new_snapshot=old_snapshot if  old_snapshot is not copied or modified in new_traj
    warnings.warn(self.message, category, stacklevel=stacklevel)

@dwhswenson dwhswenson merged commit d9832f8 into openpathsampling:master Oct 7, 2021
@sroet sroet deleted the add_modified_snapshot_selector branch October 8, 2021 08:37
@dwhswenson dwhswenson mentioned this pull request Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants