Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
This workshop covers the different ways Python packages can be managed in a cluster environment using conda and Python virtual environments, both in batch mode from the command line and with Jupyter Notebooks and JupyterLab on the cluster. The examples will be run on the GMU HOPPER Cluster.
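As a small taste of the topic (a generic sketch, not taken from the workshop materials), a notebook cell or batch script can confirm which conda environment or virtual environment its interpreter actually belongs to:

```python
# Minimal environment check, runnable in a notebook cell or a batch script.
import sys

print(sys.executable)  # path to the active interpreter (conda env or venv)
print(sys.prefix)      # root directory of the active environment
```

Registering an environment as its own Jupyter kernel (for example, with ipykernel) is the usual next step, so notebooks on the cluster can select it by name.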
This tutorial demonstrates how to create, manage, and deploy containerized Jupyter simulations for High-Performance Computing (HPC) environments, specifically using SLAC's S3DF infrastructure. By utilizing Apptainer (formerly Singularity) containers, users can package complex simulations with all necessary dependencies, input files, and configurations, ensuring reproducibility and ease of use for new users. The automated workflows, powered by GitHub Actions, handle building and updating the containers, while Open OnDemand provides an accessible interface for running Jupyter notebooks directly from the HPC environment. This approach eliminates setup errors, saves time, and ensures consistent simulation environments, enabling researchers to focus on their work instead of system configuration.
Supervised Machine Learning Readiness is a self-paced, beginner-friendly program designed for Earth systems scientists to explore the core principles of supervised machine learning. This series uses a combination of step-by-step frameworks, exploratory widgets, and low-code exercises in Jupyter Notebooks to walk through the full cycle of machine learning model development. No programming experience is required. By the end of the series, you will be able to recognize when machine learning is an appropriate tool and critically evaluate machine learning in Earth systems science contexts.
Access requires a free NSF Unidata eLearning account.
This beginner-friendly guide introduces Retrieval-Augmented Generation (RAG), a technique to enhance Large Language Models (LLMs) by integrating external data sources. It covers the fundamentals of AI, LLMs, and RAG, providing step-by-step instructions, examples, and visual aids. The guide also discusses tools like Milvus, Faiss, and LangChain, offering a practical approach to building smarter AI systems.
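As a minimal sketch of the pattern (toy code, not from the guide): the bag-of-words retriever below stands in for the embedding models and vector stores such as Milvus or Faiss that the guide covers, and the assembled prompt would be sent to an LLM in a real pipeline:

```python
# Minimal sketch of the RAG pattern with a toy retriever; a real system
# would use an embedding model and a vector store such as Milvus or Faiss.
import math
from collections import Counter

documents = [
    "Apptainer packages simulations with all of their dependencies.",
    "Dask scales Python analysis across the nodes of an HPC cluster.",
    "Retrieval-Augmented Generation grounds an LLM in external documents.",
]

def embed(text):
    # Toy bag-of-words "embedding"; stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

query = "How do I ground a language model in my own data?"
context = "\n".join(retrieve(query))
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # a real pipeline would send this prompt to the LLM
```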
This repository contains information about Jupyter Widgets and how they can be used to develop interactive workflows, data dashboards, and web applications that can be run on HPC systems and science gateways. Easy-to-build web applications are useful not only for scientists. They can also be used by software engineers and system admins who want to quickly create tools for file management and more!
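As a small illustration (generic, not from the repository), a few lines of ipywidgets are enough to wire a slider to a Python function in a notebook:

```python
import ipywidgets as widgets

# interact() builds a slider from the keyword argument and re-runs the
# function whenever the slider moves; the control renders inline in Jupyter.
def square(n):
    print(f"{n} squared is {n ** 2}")

widgets.interact(square, n=widgets.IntSlider(min=0, max=10, value=3))
```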
This repository offers accessible resources and workshops on AI and high-performance computing (HPC), designed for both STEM and non-STEM majors. The materials are presented in simple language, requiring no prior technical background, making them suitable for a wide range of learners. The focus is on bridging the AI digital gap and enabling participants to harness the power of AI and HPC for research, innovation, and discovery.
A tutorial on the effective use of Dask on HPC resources. The four-hour tutorial is split into two sections, with early topics focused on novice Dask users and later topics focused on intermediate usage on HPC and associated best practices; a brief code sketch follows the topic list below. The knowledge areas covered include (but are not limited to):
Beginner section
High-level collections including dask.array and dask.dataframe
Distributed Dask clusters using HPC job schedulers
Earth Science data analysis using Dask with Xarray
Using the Dask dashboard to understand your computation
Intermediate section
Optimizing the number of workers and memory allocation
Choosing appropriate chunk shapes and sizes for Dask collections
Querying resource usage and debugging errors
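To make these topics concrete, here is a minimal sketch (illustrative only, not taken from the tutorial): a chunked dask.array computation that runs anywhere, followed by the commented dask-jobqueue pattern for distributing the same work on a SLURM-managed cluster. The cores, memory, and walltime values are placeholders.

```python
import dask.array as da

# 10,000 x 10,000 array split into 1,000 x 1,000 chunks; chunk shape and
# size are among the tuning knobs covered in the intermediate section.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
result = (x + x.T).mean().compute()
print(result)

# On an HPC system, the same computation can be distributed with a cluster
# object from dask-jobqueue, e.g. (SLURM shown; parameters are illustrative):
# from dask_jobqueue import SLURMCluster
# from dask.distributed import Client
# cluster = SLURMCluster(cores=4, memory="16GB", walltime="01:00:00")
# cluster.scale(jobs=2)
# client = Client(cluster)
```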
GPU training series for scientists, software engineers, and students, with emphasis on Earth science applications.
The content of this course is coordinated with the six-month series of GPU training sessions starting in February 2022. The NVIDIA High Performance Computing Software Development Kit (NVHPC SDK) and CUDA Toolkit are the primary software requirements for this training; both are already available on NCAR's HPC clusters as modules you may load. The software is also free to download from NVIDIA by navigating to the NVHPC SDK Current Release Downloads page and the CUDA Toolkit downloads page. Any provided code is written specifically to build and run on NCAR's Casper HPC system but may be adapted to other systems or personal machines. Material will be updated as appropriate for the future deployment of NCAR's Derecho cluster and as technology progresses.
Computing Module: Introduces fundamental concepts and skills of Cyberinfrastructure (CI) and High-Performance Computing (HPC) to lower the barrier to becoming CI users in disaster management research. The module will cover the critical topics of CI and HPC with hands-on sessions.
Disaster Data Module: Introduces concepts of geospatial big data in disaster management. Students will learn how to access and process disaster data.
Geospatial Analytic Module: Introduces geospatial analytics skills to address real-world challenges in disaster management. The module will use the data introduced in the Disaster Data Module and cover various geospatial analytics topics such as geosimulation, spatial optimization, network analysis, terrain analysis, Geospatial Artificial Intelligence (GeoAI), social sensing, and CyberGIS.
Iterative programming takes place when you can explore your code and play with your objects and functions without needing to save, recompile, or leave your development environment. This has traditionally been achieved with a REPL or an interactive shell. The magic of Jupyter Notebooks is that the interactive shell is saved as a persistent document, so you don't have to flip back and forth between your code files and the shell in order to program iteratively.
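A toy illustration of that workflow (a generic example, not from the repository), with each comment marking what would be a separate notebook cell:

```python
# Cell 1: look at the object before deciding what to do with it.
data = [3, 1, 4, 1, 5, 9, 2, 6]
print(type(data), len(data), data[:3])

# Cell 2: try a transformation and inspect the result immediately.
cleaned = sorted(set(data))
print(cleaned)

# Cell 3: wrap the working snippet into a function once it behaves correctly.
def dedupe_and_sort(values):
    return sorted(set(values))

print(dedupe_and_sort(data))
```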
There are several editors and IDEs intended for notebook development, but JupyterLab is a natural choice because it is free and open source and most closely related to the Jupyter Notebook/IPython projects. The chief motivation of this repository is to enable an IDE-like development environment through the use of extensions. There are also expository notebooks that show off the usefulness of these features.
This workshop has an introduction to the concepts of visualization followed by hands-on exercises. The concepts section has Speaker Notes, and the hands-on section has an accompanying Jupyter notebook.
The workshop is one in a series of Introduction to HPC workshops.
A tutorial entitled "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev" presented at SciPy 2023 in Austin, TX. This tutorial is hosted in a series of Jupyter Notebooks, which can be accessed at the click of a button using Binder. See the README for more information.