Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!

HPC University

HPC University Resources

A comprehensive list of training resources from the HPC University. HPCU is a virtual organization whose primary goal is to provide a cohesive, persistent, and sustainable on-line environment to share educational and training materials for a continuum of high performance computing environments that span desktop computing capabilities to the highest-end of computing facilities offered by HPC centers.

3 Likes

Type

learning

Level

Cornell Virtual Workshop

Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.

jetstream matlab cloud-computing data-analysis performance-tuning parallelization file-transfer globus slurm training cuda python r mpi

1 Like

Type

learning

Level

DARWIN Documentation Pages

DARWIN Documentation

DARWIN (Delaware Advanced Research Workforce and Innovation Network) is a big data and high performance computing system designed to catalyze Delaware research and education.

darwin big-data

1 Like

Type

documentation

Level

Using Linux commands in a Python script (and the difference between the subprocess and os Python modules)

Using Linux Commands in a Python Script

Learn how to use Linux commands in a Python script. Specifically, learn how to use Python's subprocess and os modules to run shell commands (which in turn run Linux commands) from a Python script that runs on a cluster.
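As a rough illustration of the difference between the two modules, the sketch below runs the same kind of command both ways. The helper names `run_command` and `run_with_os` are hypothetical and not taken from the linked tutorial, and the example assumes a Linux system where `echo` is available.

```python
import os
import subprocess


def run_command(args):
    """Run a command given as a list of arguments and return its stdout.

    No shell is involved, output is captured as text, and a non-zero exit
    status raises subprocess.CalledProcessError because check=True.
    """
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.strip()


def run_with_os(command):
    """Run a shell command string with os.system.

    Only the exit status is returned; the command's output is not captured.
    """
    return os.system(command)


captured = run_command(["echo", "hello from subprocess"])
status = run_with_os("echo hello from os.system > /dev/null")

print(captured)
print("os.system exit status:", status)
```

In this sketch, subprocess gives the script the command's output and raises an error on failure, while os.system only reports a status code, which is one reason subprocess is often the more convenient choice in scripts.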

cluster-management programming python

1 Like

Type

learning

Level

ACCESS Pegasus Documentation

ACCESS Pegasus Documentation

The documentation provides an overview of using Pegasus, a workflow management system, on ACCESS resources for high throughput computing (HTC) workloads, covering logging in, workflow creation, resource configuration, and monitoring options.

pegasus

1 Like

Type

documentation

Level

DeapSECURE – Data-Enabled Advanced Computational Training Platform for Cybersecurity Research and Education

DeapSECURE lesson modules

DeapSECURE is a training program to infuse high-performance computational techniques into cybersecurity research and education. It is an NSF-funded project of the ODU School of Cybersecurity along with the Department of Electrical and Computer Engineering and the Information Technology Services at ODU. The DeapSECURE team has developed six non-degree training modules to expose cybersecurity students to advanced CI platforms and techniques rooted in big data, machine learning, neural networks, and high-performance programming. Techniques taught in DeapSECURE workshops are rather general and transferable to other areas including science, engineering, finance, linguistics, etc. All lesson materials are made available as open-source educational resources.

ai deep-learning machine-learning neural-networks visualization big-data data-analysis jekyll batch-jobs slurm bash ssh training workforce-development python scikit-learn cybersecurity

1 Like

Type

learning

Level

GIS: Geocoding Services

Geocoding is the process of converting a street address into coordinates that can be plotted on a map. This conversion typically requires an API call to a remote server hosted by an organization or institution. The server compares the address attributes you provide against the data it holds and returns a best estimate of the coordinates for that location. Many geocoding services are available; they differ in world coverage, quality of results, and rate limits for access. For R, a package called "tidygeocoder" provides an easy way to connect to these different services. As an additional benefit, its documentation provides a good summary of the available geocoding services, with links to their documentation. The link to the documentation for geocoding services accessible through "tidygeocoder" is provided below. For Python, the geopy package provides connections to various geocoding services. The link to the documentation for this package is also included below.
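The address-to-coordinates lookup described above can be sketched in Python as follows. This is a minimal illustration, not the approach of any particular service: `geocode_addresses` and `fake_geocode` are hypothetical names, the sample address and coordinates are made up, and the offline stand-in exists only so the sketch runs without network access.

```python
from types import SimpleNamespace


def geocode_addresses(addresses, geocode):
    """Map each address to (latitude, longitude), or None when nothing matches.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude` attributes, or None.
    """
    coordinates = {}
    for address in addresses:
        location = geocode(address)
        coordinates[address] = (
            None if location is None else (location.latitude, location.longitude)
        )
    return coordinates


# Offline stand-in for a remote service (made-up data, for illustration only).
_FAKE_INDEX = {"1 Example Street": SimpleNamespace(latitude=10.0, longitude=20.0)}


def fake_geocode(address):
    return _FAKE_INDEX.get(address)


result = geocode_addresses(["1 Example Street", "Nowhere Lane"], fake_geocode)
print(result)

# With geopy installed and network access, a real service could be swapped in:
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="my-app-name")  # placeholder user agent
#   result = geocode_addresses(["..."], geolocator.geocode)
```

Passing the geocoder in as a callable keeps the lookup logic independent of the service, so a different provider or a rate-limited wrapper can be substituted without changing the loop.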

gis

1 Like

Type

documentation

Level

PyTorch for Deep Learning and Natural Language Processing

Introduction to PyTorch for Deep Learning

PyTorch is a Python library that supports accelerated GPU processing for Machine Learning and Deep Learning. In this tutorial, I will teach the basics of PyTorch from scratch. I will then explore how to use it for some ML projects such as Neural Networks, Multi-layer perceptrons (MLPs), Sentiment analysis with RNN, and Image Classification with CNN.

ai big-data data-analysis deep-learning machine-learning neural-networks

1 Like

Type

documentation

Level

Useful R Packages for Data Science and Statistics

https://www.udacity.com/blog/2021/01/best-r-packages-for-data-science.html

This Udacity article lists the most frequently used R packages for data science and statistics and links to the official documentation for each package. It is a great starting point if you want to begin your data science journey in R.

plotting visualization data-analysis machine-learning data-science r

1 Like

Type

documentation

Level

Version control with Git

Version Control with Git

Understand the benefits of an automated version control system and the basics of how automated version control systems work. Configure Git the first time it is used on a computer and understand the meaning of the --global configuration flag. Create a local Git repository and describe the purpose of the .git directory. Go through the modify-add-commit cycle for one or more files, explain where information is stored at each stage of that cycle, and distinguish between descriptive and non-descriptive commit messages.

version-control github git

1 Like

Type

learning

Level

Introduction to Deep Learning in PyTorch

This workshop series introduces the essential concepts in deep learning and walks through the common steps in a deep learning workflow, from data loading and preprocessing to training and model evaluation. Throughout the sessions, students participate in writing and executing simple deep learning programs using PyTorch, a popular Python library for developing, training, and deploying deep learning models.

ai deep-learning image-processing machine-learning neural-networks pytorch gpu

1 Like

Type

learning

Level

Gentle Introduction to Programming With Python

A Gentle Introduction to Programming with Python (MIT OCW)

This course from MIT OpenCourseWare (OCW) covers very basic information on how to get started with programming using Python. Lectures are available, along with practice assignments, to users at no cost. Python has many applications in tech today, from web frameworks to machine learning. This course also shows users how to set up an IDE, which makes debugging much more efficient.

python

1 Like

Type

learning

Level

Data Visualization tools for Python

MatPlotLib Docs

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It makes analyzing and presenting your data extremely easy and works with Python which many people already know.

documentation python

1 Like

Type

documentation

Level

Introduction to Python for Digital Humanities and Computational Research

Introduction to Python book

This documentation contains introductory material on Python programming for Digital Humanities and Computational Research. It can serve as go-to material for beginners learning Python programming and for anyone wanting a Python refresher.

ai big-data data-analysis deep-learning data-science python

1 Like

Type

documentation

Level

An Introduction to the Julia Programming Language

The Julia programming language is one of the fastest growing languages for AI/ML development. Its syntax is similar to Python's while it runs nearly as fast as C++, and it is open source and reproducible across platforms and environments. The following link provides an introduction to using Julia, including the basic syntax, data structures, key functions, and a few key packages.

ai data-analysis machine-learning julia

0 Likes

Type

learning

Level

GIS: Projections and their distortions

Map Projections

In GIS, projections take something plotted on a globe and convert it to a flat map that we can print or show on a screen. Unfortunately, they also introduce distortions to the objects and features on the map. This not only distorts the objects visually; the results of any spatial attribute calculations (such as distance and area) will also reflect the distortion. Below is a link to a quick primer on projections, the types of distortion that can occur, and suggestions on how to choose a correct projection for your work.
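To make the effect on distance and area calculations concrete, the sketch below uses the spherical Mercator projection as one example; it is chosen here for illustration and is not necessarily the projection discussed in the linked primer. On a sphere, Mercator stretches lengths by a factor of 1/cos(latitude), so areas are inflated by the square of that factor. The function names are hypothetical.

```python
import math


def mercator_scale_factor(latitude_deg):
    """Linear stretch of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg))


def mercator_area_inflation(latitude_deg):
    """Factor by which Mercator inflates small areas at a latitude."""
    return mercator_scale_factor(latitude_deg) ** 2


for latitude in (0, 30, 60):
    print(
        f"latitude {latitude:>2} deg: "
        f"length x{mercator_scale_factor(latitude):.2f}, "
        f"area x{mercator_area_inflation(latitude):.2f}"
    )
```

At the equator nothing is stretched, while at 60 degrees latitude lengths are doubled and areas quadrupled, which is why area or distance measured directly on such a map can be badly off at high latitudes.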

gis

0 Likes

Type

learning

Level

Bioinformatics Workflow Management with Nextflow

Nextflow is an open-source, domain-specific language and workflow manager designed for the execution and coordination of scientific and data-intensive computational workflows. It was specifically created to address the challenges faced by researchers and scientists when dealing with complex and scalable computational pipelines, particularly in fields such as bioinformatics, genomics, and data analysis. Some links to get started are provided here.

cloud-computing parallelization data-management bioinformatics training

0 Likes

Type

documentation

Level

Introduction to Probabilistic Graphical Models

https://ermongroup.github.io/cs228-notes/

This website summarizes the notes of Stanford's introductory course on probabilistic graphical models. It starts from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.

ai machine-learning

0 Likes

Type

learning

Level

R for Data Science

https://r4ds.had.co.nz/index.html

R for Data Science is a comprehensive resource for individuals looking to harness the power of the R programming language for data analysis, visualization, and statistical modeling. Whether you're a beginner or an experienced data scientist, this guide will help you unlock the full potential of R in the realm of data science.

visualization data-analysis data-science r

0 Likes

Type

learning

Level

ACCESS HPC Workshop Series

Monthly workshops sponsored by ACCESS on a variety of HPC topics organized by Pittsburgh Supercomputing Center (PSC). Each workshop will be telecast to multiple satellite sites and workshop materials are archived.

deep-learning machine-learning neural-networks big-data tensorflow gpu training openmpi c c++ fortran openmp programming mpi spark

0 Likes

Type

learning

Level

Building the ArduPilot environment for Linux

Building the ArduPilot environment for Linux

This article provides instructions for building AirSim, an open-source simulator for autonomous vehicles, on Linux. It outlines the steps to build Unreal Engine, clone and build the AirSim repository, and set up the Unreal environment. It also includes information on how to use AirSim and optional setups such as remote control for manual flight.

profiling data-transfer github github-pages cpu-architecture bash environment-modules git modules os permissions ssh vim

0 Likes

Type

documentation

Level

Understanding LLM Fine-tuning

The Ultimate Guide to LLM Fine Tuning: Best Practices & Tools

With the recent rise of LLMs, many businesses are looking at ways to adopt these models and fine-tune them on specific data sets to ensure accuracy. Once fine-tuned, these models can be well suited to the specific needs of a company. This site explains when, how, and why models should be fine-tuned, and it goes over various strategies for LLM fine-tuning.

big-data training

0 Likes

Type

learning

Level

Regular Expressions

Regular expressions (sometimes referred to as RegEx) are an incredibly powerful tool for defining string patterns used in "find" or "find and replace" operations on strings, or for input validation. Regular expressions are used in search engines, in the search and replace dialogs of word processors and text editors, and in text-processing Linux utilities such as sed and awk. They are supported in many programming languages, including Python, R, Perl, Java, and others.
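The three uses mentioned above (find, find and replace, and input validation) can be sketched with Python's built-in `re` module. The sample text, the date pattern, and the username rule are assumptions made up for this illustration.

```python
import re

# Pattern for ISO-style dates: four digits, two digits, two digits.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

text = "Jobs ran on 2024-01-15 and 2024-02-03."

# Find: every match, returned as (year, month, day) tuples.
found = DATE_PATTERN.findall(text)

# Find and replace: rewrite each date as month/day/year using group references.
reformatted = DATE_PATTERN.sub(r"\2/\3/\1", text)

# Input validation: a hypothetical username rule (lowercase letter first,
# then 2 to 15 lowercase letters, digits, or underscores).
USERNAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{2,15}")


def is_valid_username(name):
    return USERNAME_PATTERN.fullmatch(name) is not None


print(found)
print(reformatted)
print(is_valid_username("user_01"), is_valid_username("1user"))
```

Using `fullmatch` for validation matters here: `search` or `match` would accept a string that merely contains or begins with a valid username.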

perl programming python r

0 Likes

Type

learning

Level

CHARMM Links to Install, Run, and Troubleshoot MD Simulations

CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a widely distributed molecular simulation program with a broad array of applications. CHARMM can set up and run simulations on both biological and materials systems, contains a comprehensive set of analysis tools, and has high performance on a variety of platforms. Here you will find links to the CHARMM website, forum, and registration/download page.

charmm molecular-dynamics namd computational-chemistry

0 Likes

Type

learning

Level

The Official Documentation of Pandas

pandas documentation

Pandas is one of the most essential Python libraries for data analysis and manipulation. It provides high-performance, easy-to-use data structures, and data analysis tools for the Python programming language. The official documentation serves as an in-depth guide to using this powerful tool including explanations and examples.

plotting visualization

0 Likes

Type

documentation

Level
