Julia Wagemann is a Ph.D. student at the University of Marburg, investigating Big Data technologies for meteorological and climate use. She is also a visiting scientist at the European Centre for Medium-Range Weather Forecasts. Her work focuses on making geospatial data (meteorological and climate) more accessible to users via cloud-based services. Those interested in the field will find Julia’s tutorials and workflows for Jupyter Notebooks extremely helpful.
What is Jupyter? What’s Notebook, Lab, and Hub?
Jupyter is an ecosystem of tools and services to develop open-source code, tutorials, and reproducible research.
Jupyter Notebooks are an interface for combining code, documentation, and data access. Previously, your data was held in one location. You coded somewhere else and output it to a different folder which you opened to see what your code did. A notebook is a single place where you combine code, workflow, documentation, and the results.
JupyterHub is tremendously helpful for teaching and learning. It’s a cloud-based platform with libraries and packages on demand, perfect for educational environments where people shouldn’t have to spend hours installing and setting up resources before the lessons can start. The magic of JupyterHub is that you set up your environment once and then you can make that environment available to anyone!
JupyterLab is an interactive web-based software development environment. It’s similar to what PyCharm is for Python, but Lab is for developing code in Jupyter Notebooks.
Voilà is a relatively new project. Once you’ve developed workflows and prepared visualizations in your Notebooks, Voilà lets you present your results as a web-based application, with the Notebooks running in the back end.
What Languages Can You Use in JupyterHub?
A lot depends on how you set it up in the first place. Hub supports dozens of different programming languages. For geospatial users, it’s mainly Python and R as well as JavaScript and Julia.
Hub is a blend of files, Notebooks, data, and images in one place. For instance:
- Predefined Notebooks for teaching where you’ve most likely developed some content for your students.
- Notebooks from scratch for developing your own workflow.
What Does a Notebook Process Look Like?
The strength of Jupyter Notebooks is the ease of getting started with coding. If you’re familiar with Google Docs, you’ll get it quickly.
Imagine a sheet with a navigation pane at the top. You add a new cell and define whether it holds code or Markdown (text and documentation for the workflow). The result? Your very own Notebook with a full workflow.
It’s a ready-made environment. Log in and use this mashup of code and documentation in one place. Start creating cells, put your code in them, and execute them. You can see whether the code runs through or whether you have to go back and debug. For maps, plots, or data structures, you get a visual output as you execute along the way.
It’s a bit like a debugging process in an editor where you step through the workflow. The visual representation of each step makes this user-friendly.
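The cell structure described in this section can be made concrete with a small sketch. A Notebook file is JSON underneath, so the example below builds one Markdown cell and one code cell as plain Python dictionaries. The helper `make_cell` and the cell contents are hypothetical, chosen only for illustration; in practice the Jupyter interface (or the `nbformat` package) writes this structure for you.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dict (nbformat 4 layout).

    Hypothetical helper for illustration only.
    """
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also carry an execution counter and their outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


# A two-cell Notebook: documentation first, then the code it describes.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        make_cell("markdown", "# Temperature workflow\nLoad the values, then find the maximum."),
        make_cell("code", "temperatures = [11.5, 12.1, 13.4]\nmax(temperatures)"),
    ],
}

# Serialized like this and saved with an .ipynb extension, the structure
# is what Jupyter reads when it opens a Notebook.
notebook_json = json.dumps(notebook, indent=1)

cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

The point of the sketch is only that documentation and code live side by side in one file, each in its own typed cell.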
Can You Interchange Languages?
Yes. You can either stick with one language all the way through a Notebook, or you can pick and mix: write a function in Python and feed its result into R.
Notebooks work with kernels. Each Notebook has a base kernel, which could be Python, R, or your preferred language. You can do your data pre-processing in Python with xarray, save the results, and then use R and the Chi package for a visualization, which you write up in a separate Notebook.
You can even transfer objects from one language to another; bring one from Python and cast it as an R object. You can mix languages in one workflow and take advantage of them within the same Notebook.
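A minimal sketch of the file-based handoff between two Notebooks, assuming the pre-processing has already happened. The values, column names, and file name are hypothetical. The Python side writes a CSV file that a second Notebook on an R kernel could read; the R call appears only as a comment. Passing objects in memory within a single Notebook is usually done with a bridge package such as rpy2, which is not shown here.

```python
import csv
import os
import tempfile

# Hypothetical pre-processed results (in a real workflow these might
# come out of an xarray computation).
monthly_means = [("2020-01", 3.2), ("2020-02", 4.1), ("2020-03", 7.8)]

# Write the results to a file that both Notebooks can reach.
handoff_path = os.path.join(tempfile.mkdtemp(), "monthly_means.csv")
with open(handoff_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["month", "mean_temp_c"])
    writer.writerows(monthly_means)

# A second Notebook on an R kernel could then pick the file up with
# R's built-in reader:
#   means <- read.csv("<path to monthly_means.csv>")

# Read the file back here to confirm what the R side would see.
with open(handoff_path, newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0])
```

A plain CSV file keeps the two Notebooks independent of each other: either side can be rerun or replaced as long as the column names stay the same.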
What Problems Can Jupyter Solve?
Jupyter provides a pre-built environment for students, and it’s a tool for a visual way of programming that combines different languages. It’s self-documenting and brings things together.
It’s also a tool for better reproducible code, workflows, and science. Users can collaborate on code and workflows. They can share a block of code with someone else who can point to the same data source, execute it, test it, and build on it.
People can share research, workflows, and data processing code, and get comments and advice on GitHub. It allows them to take already developed ideas and workflows and build on others’ work without reinventing the wheel each time. It’s a big step forward in reproducible research.
Another place where Jupyter is solving problems is by bringing the code to the data. With Hub, you don’t need to download enormous volumes of data and store them on your local machine or a server. You get a link to cloud storage in the back end and bring the code to the data. You analyze and process the data where it already sits.
Pangeo (pangeo.io), a community project, uses Jupyter to serve the geoscience community working with climate and ocean data. For them, Python, JupyterHub, Notebooks, and cloud storage made data processing more reproducible and efficient.
Is There Anything in Jupyter Particularly Useful for the Geospatial Community?
Jupyter Notebooks are especially valuable for geospatial data analysis because they provide access to large volumes of data and effective visualization of the problem. Bringing the various programming languages together and setting up a single workflow with different code snippets is a considerable advantage for a GIS user.
How Does This Help GIS Decision-Makers?
Not everyone is interested in, or has the time for, the “how.” For those decision-makers who just need a total number, a plot, or a map, Jupyter Notebook is not the best solution. It’s not explicitly designed for them or their clients who, for example, just want to know about natural resources in a given area or about vegetation for their reforestation project. They don’t want to be bogged down in the exact technique or tools used to get the result.
It’s an excellent tool for team collaboration and for those interested in learning how to code and seeing what code can do with geospatial data.
Once the coding is done and the results are visible, you can invite decision-makers to see the maps and outcomes for the information they need via the more user-friendly web application Voilà.
What’s Next for Jupyter?
The Voilà platform is growing stronger precisely because it’s user-friendly.
Jupyter Widgets are coming along nicely, giving you the option to have interactive modules to change input data. The choice of a button or a slider has been there for some time, but now these widgets are integrated into the Voilà dashboard explorer to give users a better experience.
Notebook, as a whole, is excellent for reproducible research. It’s understandable and a great community tool. It’s easy to grasp, and it will start you off on your journey of coding. Because you don’t have to start from scratch, you can invest the time you save in structuring your Notebook, your workflow, and your documentation in finer detail and with more precision.
Should GIS People Explore Jupyter?
Absolutely. Especially if you’ve never coded before. It’s a neat and easy interface. It’s not scary, and you won’t “break” anything while you’re learning. It’s non-techy, and I strongly recommend that people at least go and get a feel for it.
Have you been inspired to open up Julia’s treasure chest of codes and workflows to see if you can pick and mix to make your own? Go to GitHub and see what people are mashing together? Or would you rather wait until someone, hopefully soon, comes up with a universal code?