Jupyter Notebook with Delta-spark in Multipass
Following up on my last article, I want to use Jupyter Notebook to record what I learn about Delta. The setup is straightforward.
First, shell into the multipass instance and install jupyter using pip.
[~/git/delta]$ multipass shell delta
# in ubuntu instance and pyspark venv
(pyspark) ubuntu@delta:~$ pip install jupyter
...
(pyspark) ubuntu@delta:~$
Second, some typical Jupyter housekeeping: disable fd output capture in the notebook kernel. In addition, I want Jupyter Notebook to start whenever I shell into the instance, so the shell becomes the Jupyter log console. To do that, I appended the Jupyter start command to .bashrc.
(pyspark) ubuntu@delta:~$ ipython profile create
...
(pyspark) ubuntu@delta:~$ echo 'c.IPKernelApp.capture_fd_output = False' >> .ipython/profile_default/ipython_kernel_config.py
(pyspark) ubuntu@delta:~$ echo 'jupyter notebook --no-browser --ip 0.0.0.0 --port 8888 --notebook-dir="/data"' >> .bashrc
(pyspark) ubuntu@delta:~$
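One thing to keep in mind with the two echo commands above: `>>` appends every time it runs, so repeating the setup would leave duplicate lines in both files. Below is a minimal sketch of an idempotent variant in Python. The helper name `ensure_line` is my own and not part of Jupyter or IPython; it only appends a line when an identical one is not already in the file.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line already exists.

    Returns True if the file was changed, False if the line was already there.
    `ensure_line` is a hypothetical helper written for this article.
    """
    target = Path(path).expanduser()
    text = target.read_text() if target.exists() else ""
    if line in text.splitlines():
        return False
    # Create the parent directory if needed (for example a fresh profile folder).
    target.parent.mkdir(parents=True, exist_ok=True)
    # Start on a new line when the existing file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with target.open("a") as handle:
        handle.write(f"{prefix}{line}\n")
    return True
```

Inside the instance, calling `ensure_line("~/.ipython/profile_default/ipython_kernel_config.py", "c.IPKernelApp.capture_fd_output = False")` should have the same effect as the first echo, and calling it a second time leaves the file untouched.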
Exit the instance and run the multipass shell command again.
[~/git/delta]$ multipass shell delta
Welcome to Ubuntu 24.04.1 LTS (GNU/Linux 6.8.0-49-generic aarch64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information as of Sat Nov 30 13:12:59 CST 2024
System load: 0.0
Usage of /: 47.9% of 8.65GB
Memory usage: 12%
Swap usage: 0%
Processes: 126
Users logged in: 0
IPv4 address for enp0s1: 192.168.205.7
IPv6 address for enp0s1: fd21:4101:a4fb:ee9c:5054:ff:fe2f:8805
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
Last login: Sat Nov 30 12:42:04 2024 from 192.168.205.1
[I 2024-11-30 13:13:00.262 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-11-30 13:13:00.264 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2024-11-30 13:13:00.266 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-11-30 13:13:00.267 ServerApp] notebook | extension was successfully linked.
[I 2024-11-30 13:13:00.356 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-11-30 13:13:00.363 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-11-30 13:13:00.365 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-11-30 13:13:00.365 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-11-30 13:13:00.366 LabApp] JupyterLab extension loaded from /home/ubuntu/pyspark/lib/python3.12/site-packages/jupyterlab
[I 2024-11-30 13:13:00.366 LabApp] JupyterLab application directory is /home/ubuntu/pyspark/share/jupyter/lab
[I 2024-11-30 13:13:00.366 LabApp] Extension Manager is 'pypi'.
[W 2024-11-30 13:13:00.366 LabApp] Failed to instantiate the extension manager pypi. Falling back to read-only manager.
Traceback (most recent call last):
File "/home/ubuntu/pyspark/lib/python3.12/site-packages/jupyterlab/labapp.py", line 837, in initialize_handlers
ext_manager = manager_factory(app_options, listings_config, self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/pyspark/lib/python3.12/site-packages/jupyterlab/extensions/__init__.py", line 46, in get_pypi_manager
return PyPIExtensionManager(app_options, ext_options, parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/pyspark/lib/python3.12/site-packages/jupyterlab/extensions/pypi.py", line 134, in __init__
self._httpx_client = httpx.AsyncClient(proxies=proxies)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: AsyncClient.__init__() got an unexpected keyword argument 'proxies'
[I 2024-11-30 13:13:00.368 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-11-30 13:13:00.369 ServerApp] notebook | extension was successfully loaded.
[I 2024-11-30 13:13:00.369 ServerApp] Serving notebooks from local directory: /data
[I 2024-11-30 13:13:00.369 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2024-11-30 13:13:00.369 ServerApp] http://delta:8888/tree?token=879dd78cf1207e3ab64e7b4262c873d6247c0e236e33c430
[I 2024-11-30 13:13:00.369 ServerApp] http://127.0.0.1:8888/tree?token=879dd78cf1207e3ab64e7b4262c873d6247c0e236e33c430
[I 2024-11-30 13:13:00.369 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-11-30 13:13:00.371 ServerApp]
The notebook should now be reachable from my local browser, but at which IP? You can find the IP by running multipass info delta. Don’t forget to log in to Jupyter with the token shown in the log.
[~/git/delta]$ multipass info delta
Name: delta
State: Running
Snapshots: 1
IPv4: 192.168.205.7
Release: Ubuntu 24.04.1 LTS
Image hash: 6e1f90d3e81b (Ubuntu 24.04 LTS)
CPU(s): 4
Load: 0.02 0.01 0.00
Disk usage: 4.2GiB out of 9.6GiB
Memory usage: 340.8MiB out of 1.9GiB
Mounts: /Users/ysung/git/delta => /data
UID map: 501:default
GID map: 20:default
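The URLs printed in the Jupyter log use the hostname delta or 127.0.0.1, neither of which resolves to the instance from the host machine. As a small sketch, the address for the local browser can be built by swapping the IPv4 value from multipass info into the logged URL. The function name `browser_url` is my own, and TOKEN stands in for the real token from the log.

```python
from urllib.parse import urlsplit, urlunsplit


def browser_url(logged_url, instance_ip):
    """Replace the host of a URL from the Jupyter log with the instance's IPv4 address.

    `browser_url` is a hypothetical helper; the port, path and token are kept as logged.
    """
    parts = urlsplit(logged_url)
    port = f":{parts.port}" if parts.port else ""
    return urlunsplit(parts._replace(netloc=f"{instance_ip}{port}"))


# TOKEN is a placeholder for the token printed in the Jupyter log.
print(browser_url("http://delta:8888/tree?token=TOKEN", "192.168.205.7"))
# prints http://192.168.205.7:8888/tree?token=TOKEN
```

Opening the resulting URL logs in directly, since the token travels in the query string.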
To verify this setup, I created a notebook folder and an IPython kernel notebook to test pyspark.
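For reference, a test notebook of this kind might look roughly like the sketch below. It is not the exact notebook I used. It assumes the delta-spark pip package is installed in the pyspark venv, and the function names and the /tmp/delta-smoke-test path are placeholders chosen for illustration. The two Spark settings and `configure_spark_with_delta_pip` come from the Delta Lake quickstart.

```python
def delta_session_conf():
    """Spark settings that enable Delta Lake's SQL extension and catalog."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(app_name="jupyter-smoke-test"):
    """Create a SparkSession with Delta enabled (assumes pyspark and delta-spark are installed)."""
    # Imported lazily so delta_session_conf() stays usable without Spark installed.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_session_conf().items():
        builder = builder.config(key, value)
    # Adds the Delta jars that match the installed delta-spark package.
    return configure_spark_with_delta_pip(builder).getOrCreate()


def smoke_test(spark, path="/tmp/delta-smoke-test"):
    """Write a tiny Delta table to `path` (a placeholder location) and return its row count."""
    spark.range(5).write.format("delta").mode("overwrite").save(path)
    return spark.read.format("delta").load(path).count()
```

In a notebook cell, `spark = build_session()` followed by `smoke_test(spark)` should return 5 if pyspark and Delta are wired up correctly.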
In sum, I added Jupyter Notebook to my Delta tutorials. Keep learning.