Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
This workshop covers the different ways Python packages can be managed in a cluster environment using conda and Python virtual environments, both in batch mode from the command line and with Jupyter Notebooks and JupyterLab on the cluster. The examples will be run on the GMU HOPPER Cluster.
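As a small taste of the topic (a generic sketch, not taken from the workshop materials), a notebook cell or batch script can confirm which conda environment or virtual environment its interpreter actually belongs to:

```python
# Minimal environment check, runnable in a notebook cell or a batch script.
import sys

print(sys.executable)  # path to the active interpreter (conda env or venv)
print(sys.prefix)      # root directory of the active environment
```

Registering an environment as its own Jupyter kernel (for example, with ipykernel) is the usual next step, so notebooks on the cluster can select it by name.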
This tutorial demonstrates how to create, manage, and deploy containerized Jupyter simulations for High-Performance Computing (HPC) environments, specifically using SLAC's S3DF infrastructure. By utilizing Apptainer (formerly Singularity) containers, users can package complex simulations with all necessary dependencies, input files, and configurations, ensuring reproducibility and ease of use for new users. The automated workflows, powered by GitHub Actions, handle building and updating the containers, while Open OnDemand provides an accessible interface for running Jupyter notebooks directly from the HPC environment. This approach eliminates setup errors, saves time, and ensures consistent simulation environments, enabling researchers to focus on their work instead of system configuration.
Supervised Machine Learning Readiness is a self-paced, beginner-friendly program designed for Earth systems scientists to explore the core principles of supervised machine learning. This series uses a combination of step-by-step frameworks, exploratory widgets, and low-code exercises in Jupyter Notebooks to walk through the full cycle of machine learning model development. No programming experience is required. By the end of the series, you will be able to recognize when machine learning is an appropriate tool and critically evaluate machine learning in Earth systems science contexts.
Access requires a free NSF Unidata eLearning account.
This beginner-friendly guide introduces Retrieval-Augmented Generation (RAG), a technique to enhance Large Language Models (LLMs) by integrating external data sources. It covers the fundamentals of AI, LLMs, and RAG, providing step-by-step instructions, examples, and visual aids. The guide also discusses tools like Milvus, Faiss, and LangChain, offering a practical approach to building smarter AI systems.
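As a minimal sketch of the pattern (toy code, not from the guide): the bag-of-words retriever below stands in for the embedding models and vector stores such as Milvus or Faiss that the guide covers, and the assembled prompt would be sent to an LLM in a real pipeline:

```python
# Minimal sketch of the RAG pattern with a toy retriever; a real system
# would use an embedding model and a vector store such as Milvus or Faiss.
import math
from collections import Counter

documents = [
    "Apptainer packages simulations with all of their dependencies.",
    "Dask scales Python analysis across the nodes of an HPC cluster.",
    "Retrieval-Augmented Generation grounds an LLM in external documents.",
]

def embed(text):
    # Toy bag-of-words "embedding"; stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

query = "How do I ground a language model in my own data?"
context = "\n".join(retrieve(query))
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # a real pipeline would send this prompt to the LLM
```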
This repository contains information about Jupyter Widgets and how they can be used to develop interactive workflows, data dashboards, and web applications that can be run on HPC systems and science gateways. Easy-to-build web applications are useful not only for scientists. They can also be used by software engineers and system admins who want to quickly create tools for file management and more!
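As a small illustration (generic, not from the repository), a few lines of ipywidgets are enough to wire a slider to a Python function in a notebook:

```python
import ipywidgets as widgets

# interact() builds a slider from the keyword argument and re-runs the
# function whenever the slider moves; the control renders inline in Jupyter.
def square(n):
    print(f"{n} squared is {n ** 2}")

widgets.interact(square, n=widgets.IntSlider(min=0, max=10, value=3))
```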
This repository offers accessible resources and workshops on AI and high-performance computing (HPC), designed for both STEM and non-STEM majors. The materials are presented in simple language, requiring no prior technical background, making them suitable for a wide range of learners. The focus is on bridging the AI digital gap and enabling participants to harness the power of AI and HPC for research, innovation, and discovery.
A tutorial on the effective use of Dask on HPC resources. The four-hour tutorial is split into two sections, with early topics focused on novice Dask users and later topics focused on intermediate usage on HPC and associated best practices; a brief code sketch follows the topic list below. The knowledge areas covered include (but are not limited to):
Beginner section
High-level collections including dask.array and dask.dataframe
Distributed Dask clusters using HPC job schedulers
Earth Science data analysis using Dask with Xarray
Using the Dask dashboard to understand your computation
Intermediate section
Optimizing the number of workers and memory allocation
Choosing appropriate chunk shapes and sizes for Dask collections
Querying resource usage and debugging errors
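To make these topics concrete, here is a minimal sketch (illustrative only, not taken from the tutorial): a chunked dask.array computation that runs anywhere, followed by the commented dask-jobqueue pattern for distributing the same work on a SLURM-managed cluster. The cores, memory, and walltime values are placeholders.

```python
import dask.array as da

# 10,000 x 10,000 array split into 1,000 x 1,000 chunks; chunk shape and
# size are among the tuning knobs covered in the intermediate section.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
result = (x + x.T).mean().compute()
print(result)

# On an HPC system, the same computation can be distributed with a cluster
# object from dask-jobqueue, e.g. (SLURM shown; parameters are illustrative):
# from dask_jobqueue import SLURMCluster
# from dask.distributed import Client
# cluster = SLURMCluster(cores=4, memory="16GB", walltime="01:00:00")
# cluster.scale(jobs=2)
# client = Client(cluster)
```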
GPU training series for scientists, software engineers, and students, with emphasis on Earth science applications.
The content of this course is coordinated with the six-month series of GPU training sessions starting in February 2022. The NVIDIA High Performance Computing Software Development Kit (NVHPC SDK) and CUDA Toolkit are the primary software requirements for this training; both are already available on NCAR's HPC clusters as modules you may load. The software is also free to download from NVIDIA by navigating to the NVHPC SDK Current Release Downloads page and the CUDA Toolkit downloads page. Any provided code is written specifically to build and run on NCAR's Casper HPC system but may be adapted to other systems or personal machines. Material will be updated as appropriate for the future deployment of NCAR's Derecho cluster and as technology progresses.
Computing Module: Introduces fundamental concepts and skills of Cyberinfrastructure (CI) and High-Performance Computing (HPC) to lower the barrier to becoming CI users in disaster management research. The module will cover the critical topics of CI and HPC with hands-on sessions.
Disaster Data Module: Introduces concepts of geospatial big data in disaster management. Students will learn how to access and process disaster data.
Geospatial Analytic Module: Introduces geospatial analytics skills to address real-world challenges in disaster management. The module will use the data introduced in the Disaster Data Module and cover various geospatial analytics topics such as geosimulation, spatial optimization, network analysis, terrain analysis, Geospatial Artificial Intelligence (GeoAI), social sensing, and CyberGIS.
Iterative programming takes place when you can explore your code and play with your objects and functions without needing to save, recompile, or leave your development environment. This has traditionally been achieved with a REPL or an interactive shell. The magic of Jupyter Notebooks is that the interactive shell is saved as a persistent document, so you don't have to flip back and forth between your code files and the shell in order to program iteratively.
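A toy illustration of that workflow (a generic example, not from the repository), with each comment marking what would be a separate notebook cell:

```python
# Cell 1: look at the object before deciding what to do with it.
data = [3, 1, 4, 1, 5, 9, 2, 6]
print(type(data), len(data), data[:3])

# Cell 2: try a transformation and inspect the result immediately.
cleaned = sorted(set(data))
print(cleaned)

# Cell 3: wrap the working snippet into a function once it behaves correctly.
def dedupe_and_sort(values):
    return sorted(set(values))

print(dedupe_and_sort(data))
```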
There are several editors and IDEs intended for notebook development, but JupyterLab is a natural choice because it is free and open source and most closely related to the Jupyter Notebook/IPython projects. The chief motivation of this repository is to enable an IDE-like development environment through the use of extensions. There are also expository notebooks that show off the usefulness of these features.
This workshop has an introduction to the concepts of visualization followed by hands-on exercises. The concepts section has Speaker Notes, and the hands-on section has an accompanying Jupyter notebook.
The workshop is one in a series of Introduction to HPC workshops.
A tutorial entitled "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev" presented at SciPy 2023 in Austin, TX. This tutorial is hosted in a series of Jupyter Notebooks, which can be accessed at the click of a button using Binder. See the README for more information.