
Reduce requirements #867

Merged
dwhswenson merged 15 commits into openpathsampling:master from dwhswenson:minimal_requirements
Nov 15, 2019

@dwhswenson

@dwhswenson dwhswenson commented Oct 18, 2019

Member

OPS currently has a lot of requirements that aren't actually required to do simple path sampling. All of these integrations should be included in the standard test suite, but we should also be able to do a minimal test suite run without the following installed (listed as checkboxes so we can check them off as we go):

  • matplotlib
  • jupyter
  • mdtraj
  • pyemma
  • openmm
  • openmmtools
  • Add minimal install test to Travis tests
  • Update install docs

This would leave the following as the actual core requirements:

  • future
  • numpy
  • scipy
  • pandas
  • netcdf4
  • networkx
  • svgwrite
  • ujson
  • matplotlib (re-added)

Note that removing OpenMM from the requirements will also make us pip-installable.
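
The minimal test suite run mentioned in this comment could be supported by skipping integration tests when an optional package is absent. The following is only a sketch of that idea using the standard library; `has_package`, `HAS_MDTRAJ`, and the test class are hypothetical names, not code from this PR:

```python
import importlib.util
import unittest


def has_package(name):
    """Return True if the top-level package `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Hypothetical flag; the name is illustrative, not taken from OPS.
HAS_MDTRAJ = has_package("mdtraj")


class TestOptionalIntegrations(unittest.TestCase):
    @unittest.skipUnless(HAS_MDTRAJ, "mdtraj not installed")
    def test_mdtraj_integration(self):
        # Only runs in the full environment, where mdtraj is present.
        import mdtraj
        self.assertTrue(hasattr(mdtraj, "load"))

    def test_core_only(self):
        # Runs in every environment, including a minimal install.
        self.assertFalse(has_package("surely_not_a_real_package_xyz"))
```

In a minimal environment the first test is reported as skipped rather than failed, so the same suite can serve both the full and the minimal CI runs.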

@dwhswenson

Member Author

Note that there are three kinds of packages in the "requirements."

  1. Absolute minimum requirements. You can't do anything with the code without these; import openpathsampling would throw an error.
  2. Package "requirements." This may include a few extra things that facilitate common uses.
  3. Additional functionality "requirements." If you install these, you get extra features for free.

There are packages that fall a little in between these, so I'm hoping to get user feedback on where they should go.

Between 1 and 2:

  • matplotlib : We could, in principle, make all the plotting helpers only enabled when matplotlib is installed. (That is, not raise an error on import openpathsampling if you haven't installed matplotlib.) I haven't done that yet, and I'm leaning toward not doing so, because if we already require numpy, scipy, and pandas, we might as well require matplotlib.

Between 2 and 3:

  • mdtraj : MDTraj is not required for the toy engine, but it may be required for the OpenMM engine and is definitely required for Gromacs. I lean toward making it so that import openpathsampling is fine if MDTraj isn't there, but to nonetheless have conda/pip install openpathsampling also include MDTraj. The reason is that I'd like OpenMM/Gromacs to immediately work if you just install those codes. Plus, MDTraj is central to so much of how we interact with other engines.

    An alternative would be to create a bunch of metapackages: openpathsampling-openmm; openpathsampling-gromacs, etc. But that's a mess if you want to use OPS with your already-installed (and customized for high performance) Gromacs install. Then you'd also need something like openpathsampling-gromacs-base.
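
One way to keep `import openpathsampling` working without MDTraj, while still giving a clear message when an MDTraj-dependent feature is used, is to defer the import until it is needed. This is a minimal sketch under that assumption; `require_optional` and `load_trajectory` are hypothetical helpers, not part of OPS:

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency, or raise an informative ImportError.

    Hypothetical helper: the name and message are illustrative only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "{} requires the optional package '{}'. "
            "Install it to enable this feature.".format(feature, module_name)
        ) from err


def load_trajectory(filename, topology):
    """Hypothetical engine helper that needs MDTraj only when called."""
    md = require_optional("mdtraj", "Reading trajectory files")
    return md.load(filename, top=topology)
```

With this pattern, the toy engine never touches MDTraj, and a user who calls an MDTraj-backed function without it installed sees which package to add.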


Anyway, those are my thoughts, but I'd really like to hear what users think would work best for them. Any thoughts, @sroet @hejung @ajsilveira @arjunwadhawan @IdsTeepe?

@IdsTeepe

Matplotlib is a standard that is present in almost every environment. In the rare case that people don't have it installed yet and they are about to use openpathsampling, they are bound to require it sooner rather than later. SciPy requires matplotlib itself, so why wouldn't openpathsampling require it?

Although I haven't used any engine other than the Gromacs engine, I do believe mdtraj is integral for reading trajectories. I'd definitely include it in conda/pip install openpathsampling, even if the toy engine is capable of running without it.

I'm not sure how I feel about creating a bunch of metapackages. Right now I have simtk installed for openmm, even though I don't use it. Then again, I really don't mind another small package in my environment, especially if it doesn't cause any conflict.

@hejung

hejung commented Oct 22, 2019

Contributor

While it is true that in principle we can do path sampling on toy systems (and I love to do that for development), I think the typical user will want to do path sampling on 'real' systems. That is only possible with mdtraj and openmm (or gromacs).
I believe atm that the "toy system only" setup is something that is only interesting for development purposes and I believe if you are able to install ops from source you will also be able to edit the requirements if really needed.

I would, however, opt for not requiring matplotlib and jupyter. Consider, e.g., the situation on the MPCDF clusters, where you can only run jobs via the queuing system and are not allowed/supposed to run interactive sessions on the login nodes; i.e., there is no way of using matplotlib except saving the plot as an image and hoping it looks as you intended. Using Jupyter notebooks via SSH tunneling is also not something the MPCDF would be happy about, since the login nodes are not intended for calculations.

Concerning SciPy requiring matplotlib, I think there is some confusion between the SciPy stack and the SciPy library. SciPy stack is the bundle of numpy, matplotlib, scipy library, and friends. The SciPy library is what we actually import with import scipy and it does not require matplotlib (see e.g. http://scipy.github.io/devdocs/building/linux.html).

TLDR:
mdtraj and openmm should/can be required
matplotlib and jupyter should not be required

@dwhswenson

Member Author

Thanks for the feedback. First, let me invent a few terms, for clarity in communication. "True requirements" are "crash on import if you don't have." "Co-packaged" are requirements that aren't technically necessary, but come with OPS by default. "Not required" means it requires a separate installation.

My current leanings

  • metapackages: I don't want a bunch of them. Sounds like neither do you. Good.

  • matplotlib: true requirement. A lot of recommended analysis requires this. @hejung: Is it a problem to have matplotlib installed on your clusters, as long as the code you run doesn't use it?

    @IdsTeepe: as @hejung pointed out, your link is to what used to be called "the SciPy Stack." I think that term has mostly been replaced by "The SciPy ecosystem", which is a collection of open source software for scientific computing in Python. That's not so much about requirements, but about the idea that all of these things play nicely together.

    But part of the argument for the SciPy ecosystem is that you should be able to assume that if you have part of it, you have most of it. When it comes to matplotlib, I'd like to be able to assume our users can access it unless there's a good reason not to require it.

    This might change in the future, as I'm hoping to develop some graphing tools based on other backends (e.g., more interactive tools like bokeh). But for now, my leaning is to leave it as is.

  • mdtraj: co-packaged. I want other engines (e.g., Gromacs) to be as easy as possible to install, and mdtraj facilitates this. It sounds like you, the users, also support this. In this PR I've already removed it from true requirements, so no reason to undo that. Co-packaged sounds right.

  • openmm: not required. One of my major goals in this PR is to make OPS pip-installable. OpenMM will not be pip-installable in the near future. In addition, having OPS require OpenMM currently means that it has to be in the omnia conda channel. I do not have commit access to the omnia/conda-recipes repo, meaning that every release involves me bugging them to merge (to be fair, I could probably get commit access just by asking, but the omnia channel is already deprecated and packages should move to conda-forge when they can).

  • jupyter: not required. I strongly recommend that users use Jupyter for setup and analysis, because I think the interactive environment makes debugging easier. However, I think most of our Jupyter-specific run requirements relate to checking whether we're in a Jupyter notebook for how we handle output (the trick where we delete the output from the previous step when we're running in a notebook). If you don't have Jupyter installed, obviously we're not running in a Jupyter notebook, so we know which branch of that code to use.
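
The notebook check described in the jupyter item can be sketched as follows. This uses a common heuristic (inspecting the IPython shell class name) and is not necessarily how OPS implements it; `in_jupyter_notebook` and `refresh_output` are hypothetical names:

```python
def in_jupyter_notebook():
    """Best-effort check for running inside a Jupyter notebook kernel."""
    try:
        from IPython import get_ipython
    except ImportError:
        # No IPython means no Jupyter, so we cannot be in a notebook.
        return False
    shell = get_ipython()
    # ZMQInteractiveShell is the shell class used by notebook kernels;
    # terminal IPython or plain Python gives a different class or None.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


def refresh_output(message):
    """Show `message`, replacing the previous output when in a notebook."""
    if in_jupyter_notebook():
        from IPython.display import clear_output
        clear_output(wait=True)
    print(message)
```

Because the IPython import sits inside the function, neither IPython nor Jupyter needs to be installed for the non-notebook branch to run.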

@hejung is absolutely correct that most users will want to work with real systems. My goal here is that installing the OPS engine for most of these will be no more difficult than installing the engine itself. You installed OpenMM? Now you can use OPS with OpenMM. You installed Gromacs? Now you can use OPS with Gromacs. Etc. That's a big part of the reason to co-package MDTraj. At some point we may need to give additional installation instructions for some engines (LAMMPS will require that it is compiled with its Python wrappers), but my goal is to make an instructions page for installing our various integrations (also including CV wrappers, like those for PyEMMA and PLUMED) that makes it clear that most of these are "you get it for free when you install the other package."

@dwhswenson dwhswenson changed the title from "[WIP] Reduce requirements" to "Reduce requirements" Nov 2, 2019
@dwhswenson

Member Author

Just as a remark to anyone interested: This is now pretty much done. I just need to add some more explanations to the install docs. If there's any significant opposition to the requirements shift outlined above, please let me know. Otherwise, I'll put this up for final review soon.

@dwhswenson

Member Author

As far as I'm concerned, this is now done. I'll leave the PR open for 48 hours (until 23:59 Thu 14 Nov) for any other comments, and then merge. And OPS 1.1 will be released on conda-forge and pypi!

@dwhswenson dwhswenson merged commit bf40e8b into openpathsampling:master Nov 15, 2019
@dwhswenson dwhswenson deleted the minimal_requirements branch November 15, 2019 11:49
@dwhswenson dwhswenson mentioned this pull request Nov 15, 2019
This was referenced Dec 11, 2019
@dwhswenson dwhswenson mentioned this pull request Sep 17, 2020
6 tasks

3 participants