NYC_buildings: Modernize notebook #386

Open
wants to merge 11 commits into
base: main

Conversation

Azaya89
Collaborator

@Azaya89 Azaya89 commented May 17, 2024

This is a WIP PR to help debug the issues I'm having with this notebook. I also added the new .parq file to this PR since I don't have access to upload it to AWS.

Modernizing an example checklist

Preliminary checks

  • Look for open PRs and issues that reference the project you are updating. Previous unmerged work in a PR could possibly be re-used to modernize the project. Comment on these PRs and issues when appropriate; hopefully we will be able to close some of them after your modernizing work.

Change ‘anaconda-project.yml’ to use the latest workable version of packages

  • Pin python=3.11
  • Remove the upper pin (e.g. hvplot<0.9 to hvplot, panel>=0.12,<1.0 to panel>=0.12) of all other dependencies. Removing the upper pins of dependencies could necessitate code revisions in the notebooks to address any errors encountered in the updated environment. Should complexities or extensive time requirements arise, document issues for team discussion on whether to re-pin specific packages or explore other solutions.
  • Add/update the lower pin of all other dependencies (e.g. hvplot to hvplot>=0.9.2, hvplot>=0.8 to hvplot>=0.9.2). Usually, the new/updated lower pin of a dependency will be the version resolved after anaconda-project prepare has been run. Execute !conda list in a notebook, or anaconda-project run conda list in the terminal, to display the version of each dependency installed in the environment. Adjusting the lower pin helps ensure that the locks produced for each platform (linux-64, win-64, osx-64, osx-arm64) rely on the tested dependencies and not on some older versions.
  • If one of the channels includes conda-forge or pyviz, ask Maxime if it can be removed
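
As a rough illustration of the lower-pin step above, the installed versions can also be read from Python with the standard library's importlib.metadata and turned into name>=version strings. This is only a sketch and not part of the project tooling: the helper name lower_pins and the package list are hypothetical, and conda package names do not always match Python distribution names, so the output should still be checked against conda list.

```python
from importlib import metadata


def lower_pins(packages):
    """Map each package name to a 'name>=version' string, or None if not installed."""
    pins = {}
    for name in packages:
        try:
            pins[name] = f"{name}>={metadata.version(name)}"
        except metadata.PackageNotFoundError:
            # Not installed in this environment: record None instead of guessing.
            pins[name] = None
    return pins


if __name__ == "__main__":
    # Hypothetical package list; use the ones declared in anaconda-project.yml.
    for name, pin in lower_pins(["hvplot", "panel", "geopandas"]).items():
        print(pin or f"{name}: not installed")
```

Run inside the prepared environment, this prints one candidate pin per package, which can then be copied into the project file by hand.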

Plot API updates (discussed on a per-example basis)

  • Generally, try to replace HoloViews usage with hvPlot. At a certain point of complexity, such as with the use of ‘.select’, it might be better to stick with HoloViews. Additional examples of ‘complexity boundaries’ should be documented in this document.
  • Almost always, try to replace the use of datashade with rasterize (read this page). Essentially, rasterize allows Bokeh to handle the colormapping instead of Datashader.

Interactivity API updates (discussed on a per-example basis)

  • Remove all pn.interact usage
  • Avoid .param.watch() usage. This is a pretty low-level and verbose approach and should not be used in Examples unless required, or unless an Example is specifically trying to demo its usage in an advanced workflow.
  • Prefer using pn.bind(). Read this page for explanation.
  • For apps built using a class approach, when they create a view() method and call it directly, update the class by inheriting from pn.viewable.Viewer and replace view() by __panel__(). Here is an example.

Panel App updates (discussed on a per-example basis)

  • If the project doesn’t at any point create a Panel app at all, consider creating one. It can be as simple as wrapping a plot in pn.Column, or more complicated to incorporate widgets, etc. Make the final app .servable().
  • If the project creates an app in a notebook but doesn’t deploy it (i.e. there is no command: dashboard declaration in the anaconda-project.yml file), try adding it.
  • If the project already deploys an app but doesn’t wrap it in a nice template, consider wrapping it in a template.
  • If the project deploys an app wrapped in a template, customize the template a little so all the apps don’t look similar (e.g. change the header background color). This doesn’t need to be discussed.
  • If you are building the application in a single cell, you can construct a template explicitly, like template = pn.template.BootstrapTemplate, but if building up an app across multiple cells, it is probably cleaner to declare the template at the top with pn.extension(template='bootstrap'). See the how-to guide on setting a template.

General code quality updates

  • If the notebook disables warnings (e.g. with warnings.simplefilter('ignore') somewhere at the start of the notebook), remove this line. Try to update the code to remove the warnings, if any. If updating the code to remove the warnings is taking a significant amount of time and effort, bring it up for discussion and we may decide to disable warnings again.
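
To find out which warnings the removed filter was hiding, one option is to record them while the relevant code runs. A minimal sketch using only the standard library; noisy_call is a hypothetical stand-in for notebook code that triggers a deprecation warning, and collect_warnings is an illustrative helper name.

```python
import warnings


def noisy_call():
    # Hypothetical stand-in for notebook code that uses a deprecated API.
    warnings.warn("old_api() is deprecated, use new_api()", DeprecationWarning)
    return 42


def collect_warnings(func):
    """Run func and return (result, list of 'Category: message' strings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func()
    return result, [f"{w.category.__name__}: {w.message}" for w in caught]


if __name__ == "__main__":
    result, messages = collect_warnings(noisy_call)
    for message in messages:
        print(message)
```

The recorded messages give a concrete list of code updates to attempt before deciding whether warnings need to be disabled again.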

Text content

  • Edit the text content anywhere and everywhere that it can be improved for clarity.
  • Check the links are valid, and update old links (e.g. http -> https, xyz.pyviz.org -> xyz.holoviz.org)
  • Remove instructions to install packages inside an example
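
The two link updates mentioned above (http to https, xyz.pyviz.org to xyz.holoviz.org) could be sketched as a small text rewrite. This is an assumption-laden helper, not existing tooling: modernize_links is a hypothetical name, and the rewritten links still need to be checked by hand, since not every site serves https and not every pyviz.org page necessarily has a holoviz.org equivalent.

```python
import re


def modernize_links(text):
    """Upgrade http:// links to https:// and move *.pyviz.org hosts to *.holoviz.org."""
    text = re.sub(r"\bhttp://", "https://", text)
    # Only rewrite the host part of a URL, not arbitrary mentions of 'pyviz.org'.
    text = re.sub(r"(https://[\w.-]*?)pyviz\.org\b", r"\1holoviz.org", text)
    return text


if __name__ == "__main__":
    print(modernize_links("See http://datashader.pyviz.org/index.html"))
```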

Visual appearance - Example

  • Check that the titles/headings make sense and are succinct.
  • Check that the text content blocks are easily readable; revise into additional paragraphs if needed.
  • Check that the code blocks are easily readable; revise as needed. (e.g. add spaces after commas in a list if there are none, wrap long lines, etc.)
  • Check image and plot sizes. If possible, making them responsive is highly recommended.
  • Check the appearance on a smartphone (check Google to see how to adapt the appearance of your browser to display pages as if they were seen from a smartphone, this is usually done via the web developer tools). This is not a top priority for all examples, but if there are a few easy and straightforward changes to make that can improve the experience, let’s do it.
  • Compare the updated notebook with the original notebook

Visual appearance - Gallery

  • Check the thumbnail is visually appealing
  • Check the project title is well formatted (e.g. Ml Annotators to ML Annotators), if not, add/update the examples_config.title field in anaconda-project.yml
  • Check the project description is appropriate, if not, update the description field in anaconda-project.yml

Workflow (after you have made the changes above)

  • Run successfully doit validate:<projectname>
  • Run successfully doit test:<projectname>
  • Run successfully doit doc_one --name <projectname>. It’s better if the project notebook(s) is saved with its outputs (but be sure to clear outputs before committing to the examples repo!) when building the docs. Then open this file in your browser ./builtdocs/index.html and check how the site looks.
  • If you’re happy with all the above, open a PR. Reminder, clear notebook outputs before pushing to the PR.
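
For the "clear notebook outputs" reminder, the idea can be sketched directly on the notebook JSON. This assumes the nbformat 4 layout, where each code cell has an outputs list and an execution_count; clear_outputs is a hypothetical helper, and in practice a tool such as jupyter nbconvert --clear-output --inplace does the same job.

```python
import json


def clear_outputs(notebook):
    """Empty outputs and reset execution counts of code cells (modifies and returns the dict)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


if __name__ == "__main__":
    # Tiny in-memory notebook in the nbformat 4 layout, used only for illustration.
    nb = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
            {
                "cell_type": "code",
                "execution_count": 3,
                "metadata": {},
                "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
                "source": ["print('hi')"],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    print(json.dumps(clear_outputs(nb), indent=1))
```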

@Azaya89 Azaya89 self-assigned this May 17, 2024
@Azaya89 Azaya89 marked this pull request as draft May 17, 2024 15:12
@Azaya89 Azaya89 added the NF SDG NumFocus Software Development Grant 2024 label May 17, 2024
Contributor

Your changes were successfully integrated in the dev site, make sure to review
the pages of the projects you touched before merging this PR: https://holoviz-dev.github.io/examples/.
You can also download an archive of the site from the workflow summary page which comes in handy
when your dev site build was overridden by another PR (we have a single dev site!).

@Azaya89 Azaya89 mentioned this pull request May 23, 2024
2 tasks
@Azaya89 Azaya89 marked this pull request as ready for review June 7, 2024 14:28
@Azaya89
Collaborator Author

Azaya89 commented Jun 7, 2024

In this PR, pinning notebook<7 prevents geopandas from being imported, so I skipped that step. This is also causing one of the CI failures.
@maximlt

@maximlt
Contributor

maximlt commented Jun 7, 2024

> In this PR, pinning notebook<7 prevents geopandas from being imported, so I skipped that step.

Can you please add more details on this?

@Azaya89
Collaborator Author

Azaya89 commented Jun 7, 2024

> Can you please add more details on this?

Screenshot 2024-06-07 at 6 24 33 PM

When I run cell 2, here's what I get:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[2], line 3
      1 import hvplot.dask # noqa
      2 import hvplot.pandas # noqa
----> 3 import geopandas as gpd
      4 import colorcet as cc
      5 from holoviews import opts

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/geopandas/__init__.py:3
      1 from geopandas._config import options
----> 3 from geopandas.geoseries import GeoSeries
      4 from geopandas.geodataframe import GeoDataFrame
      5 from geopandas.array import points_from_xy

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/geopandas/geoseries.py:13
     10 from pandas import Series, MultiIndex
     11 from pandas.core.internals import SingleBlockManager
---> 13 from pyproj import CRS
     14 import shapely
     15 from shapely.geometry.base import BaseGeometry

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/__init__.py:33
      1 """
      2 Python interface to PROJ (https://proj.org),
      3 cartographic projections and coordinate transformations library.
   (...)
     29 SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
     30 """
     31 import warnings
---> 33 import pyproj.network
     34 from pyproj._datadir import (  # noqa: F401 pylint: disable=unused-import
     35     _pyproj_global_context_initialize,
     36     set_use_global_context,
     37 )
     38 from pyproj._show_versions import (  # noqa: F401 pylint: disable=unused-import
     39     show_versions,
     40 )

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/network.py:10
      6 from typing import Union
      8 import certifi
---> 10 from pyproj._network import (  # noqa: F401 pylint: disable=unused-import
     11     _set_ca_bundle_path,
     12     is_network_enabled,
     13     set_network_enabled,
     14 )
     17 def set_ca_bundle_path(ca_bundle_path: Union[Path, str, bool, None] = None) -> None:
     18     """
     19     .. versionadded:: 3.0.0
     20 
   (...)
     40         variables.
     41     """

ImportError: dlopen(/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/_network.cpython-311-darwin.so, 0x0002): Library not loaded: @rpath/libtiff.5.dylib
  Referenced from: <1BF0DA3A-18BF-3035-BAF9-9B25E936A309> /Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/libproj.25.9.3.1.dylib
  Reason: tried: '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/../../../libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/../../../libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/bin/../lib/libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/bin/../lib/libtiff.5.dylib' (no such file), '/usr/local/lib/libtiff.5.dylib' (no such file), '/usr/lib/libtiff.5.dylib' (no such file, not in dyld cache)
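
The useful part of this traceback is the final dlopen message, which names the library that could not be loaded and the file that needs it. A small sketch for pulling those two values out of such a message; missing_library is a hypothetical helper, the regular expressions only assume the "Library not loaded:" and "Referenced from:" wording seen above, and the /env paths in the demo are shortened placeholders rather than the real paths.

```python
import re


def missing_library(message):
    """Return (missing library, file that references it) from a macOS dlopen error message."""
    missing = re.search(r"Library not loaded: (\S+)", message)
    # The referencing file may be preceded by a UUID in angle brackets.
    referrer = re.search(r"Referenced from: (?:<[^>]+> )?(\S+)", message)
    return (
        missing.group(1) if missing else None,
        referrer.group(1) if referrer else None,
    )


if __name__ == "__main__":
    # Shortened placeholder paths (/env/...), for illustration only.
    demo = (
        "ImportError: dlopen(/env/lib/python3.11/site-packages/pyproj/"
        "_network.cpython-311-darwin.so, 0x0002): "
        "Library not loaded: @rpath/libtiff.5.dylib\n"
        "  Referenced from: <1BF0DA3A-18BF-3035-BAF9-9B25E936A309> "
        "/env/lib/libproj.25.9.3.1.dylib\n"
    )
    print(missing_library(demo))
```

Here the message says that libproj.25.9.3.1.dylib expects libtiff.5.dylib, which is not present in the environment's lib directory.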

@maximlt
Contributor

maximlt commented Jun 8, 2024

Ok it looks like a packaging issue. I don't understand why notebook<7 would influence geopandas though. Can you try to create an environment (conda create -n reproissue ...) with just pyproj, geopandas and python and the versions you have in the current lock, and see if you can reproduce the error? I'm mentioning these packages only as they are the ones that show up in the traceback you shared.
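
Once such a reproduction environment exists, a quick way to see which of the packages fails to import is to try each one and collect the error text. This is only a sketch under that assumption: check_imports is a hypothetical helper, and the module names in the demo are the ones from the traceback.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Covers missing modules as well as failures loading compiled extensions.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in check_imports(["pyproj", "geopandas"]).items():
        print(f"{name}: {error or 'ok'}")
```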

@Azaya89
Collaborator Author

Azaya89 commented Jun 10, 2024

> Ok it looks like a packaging issue. I don't understand why notebook<7 would influence geopandas though. Can you try to create an environment (conda create -n reproissue ...) with just pyproj, geopandas and python and the versions you have in the current lock, and see if you can reproduce the error? I'm mentioning these packages only as they are the ones that show up in the traceback you shared.

OK, so I created a new environment with pyproj, geopandas, python, and pyarrow and it worked well, although the geopandas import took some time to load (about 20 secs):

Screenshot 2024-06-10 at 12 15 58 PM

So, I'm thinking the issue may be another dependency?

@maximlt
Contributor

maximlt commented Jun 10, 2024

@Azaya89 can you maybe try to pin notebook<7 again in the project file, re-lock, and push the changes to GitHub? It'd be interesting to see whether the issue you reported shows up on the CI or not.

@Azaya89
Collaborator Author

Azaya89 commented Jun 10, 2024

> @Azaya89 can you maybe try to pin notebook<7 again in the project file, re-lock, and push the changes to GitHub? It'd be interesting to see whether the issue you reported shows up on the CI or not.

This won't work because doit test:... fails due to the same import errors.

@maximlt
Contributor

maximlt commented Jun 10, 2024

> This won't work because doit test:... fails due to the same import errors.

I would like to see it failing on the CI to see if it reports the same error that you get.

@@ -15,20 +15,26 @@ user_fields: [examples_config]

channels:
- defaults
Contributor

@Azaya89 ah I just noticed something. It's good practice not to mix the defaults channel with conda-forge. So when we use conda-forge we should replace defaults with nodefaults, to prevent the defaults channel from being added by default 🙃 Can you try that on your machine and re-lock?
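
The channel rule described here can be sketched as a small function over the channels list. This is an illustration only: without_defaults is a hypothetical name, the channel lists in the demo are made up, and placing nodefaults last is a choice made for this sketch rather than something stated above.

```python
def without_defaults(channels):
    """Swap 'defaults' for 'nodefaults' when conda-forge is among the channels."""
    if "conda-forge" not in channels:
        # Rule only applies when conda-forge is used; return an unchanged copy.
        return list(channels)
    cleaned = [c for c in channels if c not in ("defaults", "nodefaults")]
    return cleaned + ["nodefaults"]


if __name__ == "__main__":
    # Hypothetical channel lists, for illustration only.
    print(without_defaults(["defaults", "conda-forge"]))
    print(without_defaults(["defaults"]))
```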

Collaborator Author

OK, but there was no failure in the CI here. This is diabolical!

Collaborator Author

Done!

@Azaya89 Azaya89 requested a review from maximlt June 11, 2024 10:15
@Azaya89
Collaborator Author

Azaya89 commented Jun 14, 2024

I think #199 is ready to be closed now @maximlt

Contributor

github-actions bot commented Jul 4, 2024

Your changes were successfully integrated in the dev site, make sure to review
the pages of the projects you touched before merging this PR: https://holoviz-dev.github.io/examples/.
You can also download an archive of the site from the workflow summary page which comes in handy
when your dev site build was overridden by another PR (we have a single dev site!).

@Azaya89
Collaborator Author

Azaya89 commented Jul 4, 2024

@maximlt I think this PR is ready for another review with the following notes:

  1. There are still some performance issues with rendering the plots and the dashboard, as you are already aware. It's faster than before but still slower than expected. This also affects the time it takes the tests to run (it took 2:38 to run 10 cells via doit test:...).
  2. The new_nyc_buildings.parq file is now the dataset used in the notebook, so it needs to be moved to S3 to replace the old one there and then deleted from the repo.
  3. The narrative about inspect_polygons was completely deleted from the notebook, as it only works with spatialpandas, not geopandas.

@maximlt
Contributor

maximlt commented Jul 4, 2024

Ok thanks for the report. Depending on the performance issues, it might be that we end up not updating the code in this example.

@Azaya89
Collaborator Author

Azaya89 commented Jul 4, 2024

> Ok thanks for the report. Depending on the performance issues, it might be that we end up not updating the code in this example.

:(

@droumis
Contributor

droumis commented Aug 7, 2024

Isaiah reports that it takes about 30 seconds to run a cell with the full visualization with geopandas and that now (reverting back to spatialpandas on his local machine) it's taking even longer.

@Azaya89
Collaborator Author

Azaya89 commented Aug 21, 2024

This resolves all the previous issues about performance and reading of the .parq files.

@Azaya89 Azaya89 requested a review from jbednar August 21, 2024 15:13
3 participants