Knowledge Base Resources

Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!


HPC University

HPC University Resources

A comprehensive list of training resources from the HPC University. HPCU is a virtual organization whose primary goal is to provide a cohesive, persistent, and sustainable online environment to share educational and training materials for a continuum of high performance computing environments that span desktop computing capabilities to the highest-end of computing facilities offered by HPC centers.


Type: learning

An Introduction to Cryptography with Python

Workshop Tutorial

This comprehensive workshop is designed to guide participants through the world of cryptography, from foundational concepts to advanced implementations. Starting with the basics of encryption, decryption, and hashing, the workshop discusses real-world applications like SSL, blockchain, and digital signatures. Interactive Python-based coding examples, such as symmetric and asymmetric encryption, provide hands-on experience. Participants will also learn to identify cryptographic vulnerabilities and carry out attacks such as length-extension attacks. Finally, the workshop explores future trends such as quantum cryptography and zero-knowledge proofs, giving participants the knowledge to apply cryptography in securing modern digital systems. Ideal for beginners and intermediate learners alike, this workshop is a step-by-step journey into mastering cryptographic principles and practices.
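As a minimal sketch of the hashing material (using only Python's standard-library hashlib, hmac, and secrets modules, not the workshop's own code), the snippet below contrasts a naive sha256(key + message) tag, the construction that length-extension attacks target, with HMAC-SHA256, which is designed to resist them. The message content is made up for illustration.

```python
import hashlib
import hmac
import secrets

def naive_tag(key: bytes, message: bytes) -> str:
    # sha256(key || message): vulnerable to length extension, because a
    # SHA-256 digest exposes the hash's full internal state
    return hashlib.sha256(key + message).hexdigest()

def hmac_tag(key: bytes, message: bytes) -> str:
    # HMAC-SHA256 uses two keyed hash passes and resists length extension
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    # compare_digest compares in constant time to avoid timing leaks
    return hmac.compare_digest(hmac_tag(key, message), tag)

key = secrets.token_bytes(32)
msg = b"amount=100&to=alice"
tag = hmac_tag(key, msg)
print(verify(key, msg, tag))                      # True
print(verify(key, b"amount=999&to=alice", tag))   # False
```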

python data-security cybersecurity encryption secure-data-architecture


Type: website

Using Linux commands in a Python script (and the difference between the subprocess and os Python modules)

Using Linux Commands in a Python Script

Learn how to run Linux commands from a Python script. Specifically, learn how to use Python's subprocess and os modules to run shell commands (which in turn run Linux commands) from a Python script that runs on a cluster.
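The core difference can be sketched with the standard library alone. This assumes a Linux (or other POSIX) system where echo is available; echo simply stands in for any Linux command. subprocess.run hands back the command's output and exit status as a Python object, while os.system runs a string through the shell and returns only an exit status.

```python
import os
import subprocess

# subprocess.run: pass the command as a list (no shell involved), capture its
# output as text, and raise CalledProcessError on a nonzero exit status
result = subprocess.run(["echo", "hello from subprocess"],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())   # hello from subprocess
print(result.returncode)       # 0

# os.system: the string is run by the shell; output goes straight to the
# terminal and only the exit status (a raw wait status on Linux) comes back
status = os.system("echo hello from os.system")
print(status)                  # 0 on success
```

Passing a list to subprocess.run avoids invoking a shell, which sidesteps quoting problems and shell-injection risks; the Python documentation recommends the subprocess module over os.system.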

cluster-management programming python


Type: learning

Introduction to Python for Digital Humanities and Computational Research

Introduction to Python book

This documentation contains introductory material on Python programming for digital humanities and computational research. It is a go-to resource for beginners learning Python and for anyone who wants a Python refresher.

ai big-data data-analysis deep-learning data-science python


Type: documentation

PyTorch Documentation

PyTorch Documentation

PyTorch is an optimized tensor computation library that supports automatic differentiation and is designed to accelerate deep learning research and production on both GPUs and CPUs. Built with flexibility and performance in mind, PyTorch provides a dynamic computational graph and a rich ecosystem of tools for building and deploying deep learning models.


Type: documentation

NCSA HPC Training Moodle

NCSA HPC Training Moodle Site

Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Other related topics include 'Cybersecurity for End Users' and 'Developing Webinar Training.' Some of the tutorials also offer digital badges. Many of these tutorials were previously offered on CI-Tutor. The open-access training courses are listed below:

- Parallel Computing on High-Performance Systems
- Profiling Python Applications
- Using an HPC Cluster for Scientific Applications
- Debugging Serial and Parallel Codes
- Introduction to MPI
- Introduction to OpenMP
- Introduction to Visualization
- Introduction to Performance Tools
- Multilevel Parallel Programming
- Introduction to Multi-core Performance
- Using the Lustre File System

performance-tuning profiling parallelization lustre training workforce-development openmp python mpi cybersecurity


Type: learning

Tutorial: Localized RAG Chatbot with ACCESS HPC

Tutorial: Localized RAG Chatbot with ACCESS HPC

This tutorial shows how to set up an open-source, customizable RAG chatbot that answers questions about documents of your choice. It uses Indiana University's Jetstream2, but it should work on any major ACCESS HPC resource.


Type: tool

Enhancing LLMs with RAG: A Beginner’s Guide

Open-Source LLM RAG Enhancement

This beginner-friendly guide introduces Retrieval-Augmented Generation (RAG), a technique to enhance Large Language Models (LLMs) by integrating external data sources. It covers the fundamentals of AI, LLMs, and RAG, providing step-by-step instructions, examples, and visual aids. The guide also discusses tools like Milvus, Faiss, and LangChain, offering a practical approach to building smarter AI systems.
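As a toy sketch of the retrieval and augmentation steps, the code below ranks documents by cosine similarity to a question and prepends the best match to the prompt. Assumptions: a hypothetical three-document corpus, bag-of-words counts in place of a learned embedding model, and no actual LLM call. In a real system, tools such as Milvus or Faiss perform the similarity search at scale, and a framework like LangChain can wire the steps together.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real RAG system would store embedding
# vectors in a vector database such as Milvus or in a Faiss index
DOCUMENTS = [
    "Slurm is a job scheduler used on many HPC clusters.",
    "Faiss is a library for efficient similarity search over dense vectors.",
    "Matplotlib creates static, animated, and interactive plots in Python.",
]

def vectorize(text: str) -> Counter:
    # Bag-of-words term counts: a stand-in for an embedding model
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Retrieval: rank every document by similarity to the query
    q = vectorize(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, k: int = 1) -> str:
    # Augmentation: put the retrieved passages in front of the question
    context = "\n".join(retrieve(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Which scheduler do HPC clusters use?"))
```

The generation step would send the returned prompt string to an LLM of your choice.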

ai llm NAIRR-pilot generative-ai nlp deep-learning machine-learning neural-networks reporting artificial-intelligence computer-science data-science jupyterhub python


Type: learning

Data Visualization Tools for Python

Matplotlib Docs

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It makes analyzing and presenting data straightforward, and it works with Python, which many researchers already know.
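A minimal sketch of a scripted plot, assuming Matplotlib is installed (for example with pip). Selecting the non-interactive Agg backend lets the script run on a cluster node with no display; the output file name example_plot.png is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: works on headless nodes
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A minimal Matplotlib line plot")
ax.legend()

# Write the figure to a file instead of opening a window
out_path = os.path.join(tempfile.gettempdir(), "example_plot.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)  # release the figure's memory in long-running scripts
print("saved", out_path)
```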

documentation python


Type: documentation

Containerized Jupyter Notebooks for HPCs

Containerized Jupyter Notebooks for HPCs

This tutorial demonstrates how to create, manage, and deploy containerized Jupyter simulations for High-Performance Computing (HPC) environments, specifically using SLAC's S3DF infrastructure. By utilizing Apptainer (formerly Singularity) containers, users can package complex simulations with all necessary dependencies, input files, and configurations, ensuring reproducibility and ease of use for new users. The automated workflows, powered by GitHub Actions, handle building and updating the containers, while Open OnDemand provides an accessible interface for running Jupyter notebooks directly from the HPC environment. This approach eliminates setup errors, saves time, and ensures consistent simulation environments, enabling researchers to focus on their work instead of system configuration.


Type: learning

Enhanced Sampling for MD simulations

Tools and plugins to enhance molecular dynamics sampling

data-analysis computational-chemistry c++ conda cuda python


Type: tool

Managing Python Packages on an HPC Cluster

Python Packages on HPC

This workshop covers the different ways Python packages can be managed in a cluster environment using conda and Python virtual environments, both in batch mode from the command line and in Jupyter Notebook and JupyterLab on the cluster. The examples are run on the GMU Hopper cluster.
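A Python virtual environment can also be created from Python itself with the standard-library venv module, which is what `python -m venv <dir>` runs. This is a minimal sketch; the directory name myproject-venv and the temporary parent directory are hypothetical, and the activate path assumes a Linux layout.

```python
import os
import tempfile
import venv

# Hypothetical location; on a cluster you would usually choose a directory in
# your home or project space instead of a temporary one
env_dir = os.path.join(tempfile.mkdtemp(), "myproject-venv")

# with_pip=False keeps this sketch fast and offline; pass with_pip=True
# (the default for `python -m venv`) to get pip inside the environment
venv.create(env_dir, with_pip=False)

activate = os.path.join(env_dir, "bin", "activate")  # Linux/macOS layout
print("pyvenv.cfg present:", os.path.isfile(os.path.join(env_dir, "pyvenv.cfg")))
print("activate with: source", activate)
```

Each environment keeps its own site-packages directory, so packages installed after activation do not affect other projects or the system Python.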

documentation pytorch data-science open-ondemand batch-jobs job-submission slurm environment-modules anaconda jupyterhub python library-paths dependencies pip version-control


Type: documentation

Cornell Virtual Workshop

Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.

jetstream matlab cloud-computing data-analysis performance-tuning parallelization file-transfer globus slurm training cuda python r mpi


Type: learning

DeapSECURE – Data-Enabled Advanced Computational Training Platform for Cybersecurity Research and Education

DeapSECURE lesson modules

DeapSECURE is a training program to infuse high-performance computational techniques into cybersecurity research and education. It is an NSF-funded project of the ODU School of Cybersecurity along with the Department of Electrical and Computer Engineering and the Information Technology Services at ODU. The DeapSECURE team has developed six non-degree training modules to expose cybersecurity students to advanced CI platforms and techniques rooted in big data, machine learning, neural networks, and high-performance programming. Techniques taught in DeapSECURE workshops are rather general and transferable to other areas including science, engineering, finance, linguistics, etc. All lesson materials are made available as open-source educational resources.

ai deep-learning machine-learning neural-networks visualization big-data data-analysis jekyll batch-jobs slurm bash ssh training workforce-development python scikit-learn cybersecurity


Type: learning

Gentle Introduction to Programming With Python

A Gentle Introduction to Programming with Python (MIT OCW)

This course from MIT OpenCourseWare (OCW) covers the basics of getting started with programming in Python. Lectures and practice assignments are available at no cost. Python has many applications in tech today, from web frameworks to machine learning. The course also shows users how to set up an IDE, which makes debugging much more efficient.

python


Type: learning

Introductory Python Lecture Series

Python Handbook Series

A lecture series with notes that teaches introductory Python. It starts with how to download and start using Python, then covers basic syntax for lists, arrays, loops, and methods.
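A few lines in the spirit of the topics listed above (a list, a loop, and methods), written as an illustration rather than taken from the lecture series; the variable names and values are made up.

```python
# Lists: ordered, mutable sequences
temperatures = [21.5, 23.0, 19.8]
temperatures.append(22.1)   # a list method that changes the list in place

# A for loop that visits each element in order
total = 0.0
for t in temperatures:
    total += t
average = total / len(temperatures)

# A method called on a string object
label = "average temperature".title()
print(f"{label}: {average:.2f}")   # Average Temperature: 21.60
```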

documentation programming python


Type: learning

Setting up PyFR flow solver on clusters

PyFR installation to local machine

These instructions were executed on the FASTER and Grace cluster computing facilities at Texas A&M University. However, the process can be applied to other clusters with similar environments. For a local installation, please refer to the PyFR documentation. Please note that these instructions were valid at the time of writing; depending on when you run them, the module versions may need to be updated.

1. Loading Modules

The first step loads the pre-installed software libraries that PyFR requires. Run the following commands in your terminal:

```shell
module load foss/2022b
module load libffi/3.4.4
module load OpenSSL/1.1.1k
module load METIS/5.1.0
module load HDF5/1.13.1
```

2. Python Installation from Source

Choose a location for the Python 3.11.1 installation, preferably a .local directory; below, $LOCAL is that location and $INSTALL is the directory containing the Python-3.11.1 source code. Then configure, build, and install Python:

```shell
cd $INSTALL/Python-3.11.1/
./configure --prefix=$LOCAL --enable-shared --with-system-ffi \
    --with-openssl=/sw/eb/sw/OpenSSL/1.1.1k-GCCcore-11.2.0/ \
    PKG_CONFIG_PATH=$LOCAL/pkgconfig LDFLAGS=/usr/lib64/libffi.so.6.0.2
# Stop at the first failing step instead of continuing
make clean && make -j20 && make install
```

Because Python is built with --enable-shared into a custom prefix, $LOCAL/bin typically needs to be on your PATH and $LOCAL/lib on your LD_LIBRARY_PATH before python3.11 and pip3.11 will run.

3. Virtual Environment Setup

A virtual environment isolates this project's Python packages from others on your system. Create and activate one:

```shell
# Optional: the built-in venv module used below does not require virtualenv
pip3.11 install virtualenv
python3.11 -m venv pyfr-venv
. pyfr-venv/bin/activate
```

4. Install PyFR Dependencies

Several Python packages are required for PyFR. Install them with:

```shell
pip3 install --upgrade pip
pip3 install --no-cache-dir wheel
# Installing the PyFR release pulls in its dependencies; the release is then
# removed so that the source checkout from step 5 is used instead
pip3 install --no-cache-dir botorch pandas matplotlib pyfr
pip3 uninstall -y pyfr
```

5. Install PyFR from Source

Finally, change to the directory containing the PyFR source code and install PyFR:

```shell
# The author's checkout path; replace it with the location of your PyFR clone
cd /scratch/user/sambit98/github/PyFR/
# Development install (pip install -e . is the newer equivalent)
python3 setup.py develop
```

Congratulations! You've successfully set up PyFR on the FASTER and Grace cluster computing facilities, and you should now be able to use PyFR for your computational fluid dynamics simulations.
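To confirm that the interpreter built in step 2 picked up the shared-library, OpenSSL, and libffi options, and that the virtual environment from step 3 is active, a short standard-library check such as the one below can help. It is a hypothetical helper (saved, say, as check_build.py, also an assumption), not part of PyFR. If ssl or ctypes was not built, the corresponding import fails with ImportError, which is itself the diagnostic.

```python
import ctypes     # needs the _ctypes extension, built against libffi
import ssl        # needs the _ssl extension, built against OpenSSL
import sys
import sysconfig

def build_report() -> dict:
    """Summarize interpreter build features that the PyFR setup relies on."""
    return {
        "version": sys.version.split()[0],
        # Py_ENABLE_SHARED is 1 when configured with --enable-shared
        "shared_libpython": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        "openssl": ssl.OPENSSL_VERSION,
        "ctypes_ok": ctypes.sizeof(ctypes.c_int) > 0,
        # Inside a venv, sys.prefix differs from the base installation prefix
        "in_venv": sys.prefix != sys.base_prefix,
    }

for name, value in build_report().items():
    print(f"{name}: {value}")
```

Run it with the environment's interpreter (for example, python check_build.py after activating pyfr-venv); version should report 3.11.1 and in_venv should be True.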

faster fluid-dynamics c++ cuda python mpi software-installation


Type: learning

GPU Acceleration in Python

GPU Acceleration in Python

This tutorial explains how to use Python for GPU acceleration with libraries like CuPy, PyOpenCL, and PyCUDA. It shows how these libraries can speed up tasks like array operations and matrix multiplication by using the GPU. Examples include replacing NumPy with CuPy for large datasets and using PyOpenCL or PyCUDA for more control with custom GPU kernels. It focuses on practical steps to integrate GPU acceleration into Python programs.
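CuPy implements a large subset of the NumPy API, so a common pattern is to import whichever array module is available under one name and write the array code once. The sketch below assumes CuPy (with an NVIDIA GPU) or, failing that, NumPy is installed; the function name matmul_sum is hypothetical. It only catches ImportError, so a system with CuPy installed but no usable GPU would still raise at runtime.

```python
try:
    import cupy as xp      # GPU arrays (requires an NVIDIA GPU and CUDA)
    ON_GPU = True
except ImportError:
    import numpy as xp     # CPU fallback; these calls exist in both libraries
    ON_GPU = False

def matmul_sum(n: int) -> float:
    """Multiply two n x n matrices and return the sum of the product."""
    a = xp.ones((n, n), dtype=xp.float32)
    b = xp.full((n, n), 2.0, dtype=xp.float32)
    c = a @ b              # matrix multiply; runs on the GPU when xp is CuPy
    # CuPy results live in GPU memory; float() copies the scalar to the host
    return float(c.sum())

print("GPU" if ON_GPU else "CPU", matmul_sum(512))
```

Each element of a @ b here equals 2n, so the result is 2n^3, which makes the output easy to check on either device.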

machine-learning big-data data-analysis optimization parallelization gpu cuda python


Type: learning

Optimizing Research Workflows: Snakemake Documentation

https://snakemake.readthedocs.io/en/stable/

Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.
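Snakemake decides which steps of a pipeline to run by comparing each rule's declared input and output files. The plain-Python sketch below illustrates the modification-time check behind that decision; it is not Snakemake's API, the file names are hypothetical, and recent Snakemake versions can also consider other rerun triggers (for example, changed code or parameters).

```python
import os
import tempfile

def needs_rerun(inputs: list[str], outputs: list[str]) -> bool:
    """Make-style check: rerun if an output is missing or older than an input."""
    if not all(os.path.exists(p) for p in outputs):
        return True
    if not inputs:
        return False
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input

workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "raw.csv")      # hypothetical rule input
clean = os.path.join(workdir, "clean.csv")  # hypothetical rule output
open(raw, "w").close()
print(needs_rerun([raw], [clean]))   # True: the output does not exist yet

open(clean, "w").close()
os.utime(raw, (1000, 1000))          # set explicit times for a clear ordering
os.utime(clean, (2000, 2000))
print(needs_rerun([raw], [clean]))   # False: the output is up to date
```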

documentation data-analysis data-reproducibility workflow bioinformatics data-science python


Type: documentation

Official Documentation for PyTorch and NumPy

The official documentation for PyTorch, a tensor-based machine learning framework, and NumPy, which provides the ndarray, an n-dimensional array type that is useful for building tensors when implementing neural networks (NNs). Both libraries can be installed with pip.
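A small NumPy sketch of the ndarray properties that matter when preparing tensor data (shape, dtype, and broadcasting), assuming NumPy is installed; the array names and sizes are illustrative only.

```python
import numpy as np

# An ndarray stores elements of a single dtype in an n-dimensional grid
batch = np.zeros((2, 3), dtype=np.float32)   # e.g. 2 samples x 3 features
weights = np.arange(3, dtype=np.float32)     # [0., 1., 2.]

# Broadcasting adds the 1-D weights to every row of the 2-D batch
shifted = batch + weights
print(shifted.shape, shifted.dtype)          # (2, 3) float32
print(shifted.ndim)                          # 2
```

PyTorch can wrap such an array as a tensor that shares its memory with torch.from_numpy(shifted).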

deep-learning neural-networks pytorch python


Type: documentation

AI/ML TechLab - Accelerating AI/ML Workflows on a Composable Cyberinfrastructure

This technology lab contains a set of sessions to help a new user start an AI project on the ACES cluster, a composable accelerator testbed at Texas A&M University. You will learn how to create and activate a virtual environment, manipulate and visualize data with Pandas and Matplotlib, use Scikit-learn for linear regression and classification applications, and use PyTorch to create and train a simple image classification model with deep neural networks (DNN).
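For intuition about what the linear regression session fits, the sketch below computes ordinary least squares for a single feature in plain Python; ordinary least squares is the model scikit-learn's LinearRegression estimates. The helper fit_line is hypothetical, not part of scikit-learn or the ACES materials, and the data points are made up.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)   # 2.0 1.0
```

With scikit-learn, the equivalent is LinearRegression().fit(X, y) with X shaped (n_samples, 1), after which coef_ and intercept_ hold the slope and intercept.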

ACES documentation TAMU ai visualization deep-learning machine-learning neural-networks login authentication composable-systems gpu nvidia slurm bash modules vim anaconda conda programming python scikit-learn


Type: documentation

CI Computing Module For All

Computing Module: Introduces fundamental concepts and skills of Cyberinfrastructure (CI) and High-Performance Computing (HPC) to lower the barrier to becoming CI users in disaster management research. The module will cover the critical topics of CI and HPC with hands-on sessions.

Disaster Data Module: Introduces concepts of geospatial big data in disaster management. Students will learn how to access and process disaster data.

Geospatial Analytic Module: Introduces geospatial analytics skills to address real-world challenges in disaster management. The module will use the data introduced in the Disaster Data Module and cover various geospatial analytics topics such as geosimulation, spatial optimization, network analysis, terrain analysis, Geospatial Artificial Intelligence (GeoAI), social sensing, and CyberGIS.


Type: learning

Research Software Development in JupyterLab: A Platform for Collaboration Between Scientists and RSEs

JupyterLabIDE GitHub Repository

Iterative programming takes place when you can explore your code and experiment with your objects and functions without needing to save, recompile, or leave your development environment. This has traditionally been achieved with a REPL or an interactive shell. The magic of Jupyter Notebooks is that the interactive shell is saved as a persistent document, so you don't have to flip back and forth between your code files and the shell to program iteratively. Several editors and IDEs are intended for notebook development, but JupyterLab is a natural choice because it is free and open source and is the most closely related to the Jupyter Notebook and IPython projects. The chief motivation of this repository is to enable an IDE-like development environment through extensions. The repository also includes expository notebooks that demonstrate the usefulness of these features.


Type: learning

Intro to Statistical Computing with Stan

The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function. Here are some useful links to start your exploration of this statistical programming language, and a Python interface to Stan.
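A Stan model block accumulates a log density term by term. As a plain-Python illustration (not Stan code and not a Python interface to Stan), the sketch below computes the unnormalized log posterior for a hypothetical model with prior mu ~ normal(0, 10) and observations y ~ normal(mu, sigma); the data values are made up. Stan's ~ statements drop constant terms, so Stan's values would differ from these by a constant, which does not affect sampling.

```python
import math

def normal_lpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def log_posterior(mu: float, data: list[float], sigma: float = 1.0) -> float:
    """Unnormalized log posterior: log prior plus one log-likelihood term per point."""
    lp = normal_lpdf(mu, 0.0, 10.0)        # prior: mu ~ normal(0, 10)
    for y in data:
        lp += normal_lpdf(y, mu, sigma)    # likelihood: y ~ normal(mu, sigma)
    return lp

data = [1.8, 2.1, 2.4]
print(log_posterior(2.1, data) > log_posterior(0.0, data))   # True
```

A sampler such as Stan's explores values of mu in proportion to the exponential of this quantity.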

data-analysis machine-learning monte-carlo python


Type: documentation

MATLAB with other Programming Languages

Using MATLAB with Other Programming Languages

MATLAB is a useful tool for data analysis and other computational work. This tutorial walks you through using MATLAB with other programming languages, including C, C++, Fortran, Java, and Python.

c c++ fortran java matlab python


Type: tool
