Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators) that are participating in programs such as this one, that are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!

Add a Resource

HPC University

HPC University Resources

A comprehensive list of training resources from the HPC University. HPCU is a virtual organization whose primary goal is to provide a cohesive, persistent, and sustainable on-line environment to share educational and training materials for a continuum of high performance computing environments that span desktop computing capabilities to the highest-end of computing facilities offered by HPC centers.

3 Likes

Type

learning

Level

Cornell Virtual Workshop

Cornell Virtual Workshop is a comprehensive training resource for high performance computing topics. The Cornell University Center for Advanced Computing (CAC) is a leader in the development and deployment of Web-based training programs. Our Cornell Virtual Workshop learning platform is designed to enhance the computational science skills of researchers, accelerate the adoption of new and emerging technologies, and broaden the participation of underrepresented groups in science and engineering. Over 350,000 unique visitors have accessed Cornell Virtual Workshop training on programming languages, parallel computing, code improvement, and data analysis. The platform supports learning communities around the world, with code examples from national systems such as Frontera, Stampede2, and Jetstream2.

jetstream matlab cloud-computing data-analysis performance-tuning parallelization file-transfer globus slurm training cuda matlab python r mpi

1 Like

Type

learning

Level

ACCESS Pegasus Documentation

ACCESS Pegasus Documentation

The documentation provides an overview of using Pegasus, a workflow management system, on ACCESS resources for high throughput computing (HTC) workloads, covering logging in, workflow creation, resource configuration, and monitoring options.

pegasus

1 Like

Type

documentation

Level

Useful R Packages for Data Science and Statistics

https://www.udacity.com/blog/2021/01/best-r-packages-for-data-science.html

This Udacity article listed the most frequently used R packages for data science and statistics. For each package, the article provided the link to its official documentation. It will be a great start point if you want to start your data science journey in R.

plotting visualization data-analysis machine-learning data-science r

1 Like

Type

documentation

Level

DARWIN Documentation Pages

DARWIN Documentation

DARWIN (Delaware Advanced Research Workforce and Innovation Network) is a big data and high performance computing system designed to catalyze Delaware research and education

darwin big-data

1 Like

Type

documentation

Level

Optimizing Research Workflows - A Documentation of Snakemake

https://snakemake.readthedocs.io/en/stable/

Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.

documentation data-analysis data-reproducibility workflow bioinformatics data-science python

0 Likes

Type

documentation

Level

NCSA HPC-Moodle

NCSA HPC-Moodle

Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Some of the tutorials also offer digital badges.

training workforce-development

0 Likes

Type

learning

Level

Official Documentation of VisIt

VisIt is a prominent open-source, interactive parallel visualization and graphical analysis tool predominantly used for viewing scientific data. Its GitHub repository offers a detailed insight into the software's source code, documentation, and contribution guidelines. In particular, it offers useful examples on how it

visIt novel-accelerators particle-physics

0 Likes

Type

documentation

Level

Active inference textbook

Active Inference: The Free Energy Principle in Mind, Brain, and Behavior

This textbook is the first comprehensive treatment of active inference, an integrative perspective on brain, cognition, and behavior used across multiple disciplines including computational neurosciences, machine learning, artificial intelligence, and robotics. It was published in 2022 and it's open access at this time. The contents in this textbook should be educational to those who want to understand how the free energy principle is applied to the normative behavior of living organisms and who want to widen their knowledge of sequential decision making under uncertainty.

ai machine-learning neural-networks

0 Likes

Type

learning

Level

ACCESS KB Guide - Anvil

ACCESS KB Guide - Anvil

Purdue University is the home of Anvil, a powerful supercomputer that provides advanced computing capabilities to support a wide range of computational and data-intensive research spanning from traditional high-performance computing to modern artificial intelligence applications.

anvil

0 Likes

Type

documentation

Level

Globus Documentation

Globus Documentation

Globus is a data transfer, sharing, automation, and discovery service used by hundreds of thousands of researchers to manage "big data" at universities, research labs, and national systems such as ACCESS. The Globus documentation website provides how-to guides, reference documentation, and examples for Globus's web application, command-line interface, Python software development kit (SDK), and APIs.

cloud-storage data-sharing data-management data-management-software data-transfer data-wrangling file-transfer globus dtn python data-security data-compliance federated-authentication secure-data-architecture

0 Likes

Type

documentation

Level

Advanced Mathematical Optimization Techniques

https://scipy-lectures.org/advanced/mathematical_optimization/

Mathematical optimization deals with the problem of finding numerically minimums or maximums of a functions. This tutorial provides the Python solutions for the optimization problems with examples.

optimization python

0 Likes

Type

learning

Level

Machine Learning in R online book

Flexible and Robust Machine Learning Using mlr3 in R

The free online book for the mlr3 machine learning framework for R. Gives a comprehensive overview of the package and ecosystem, suitable from beginners to experts. You'll learn how to build and evaluate machine learning models, build complex machine learning pipelines, tune their performance automatically, and explain how machine learning models arrive at their predictions.

data-analysis machine-learning r

0 Likes

Type

learning

Level

ACCESS Guide (originally given at Duke OIT)

Using Jetstream 2 for Duke members (written for Duke OIT)

A guide for Duke OIT on how to advise users on using ACCESS and allocation credits to jetstream 2 for Duke University members. This can be used for non Duke members. Assumes the reader has basic knowledge of ACCESS.

ACCESS-credits adding-users allocation-management jetstream cloud-computing login ACCESS-website project-management cilogon

0 Likes

Type

documentation

Level

Paraview UArizona HPC links (advanced)

These links take you to visualization resources supported by the University of Arizona's HPC visualization consultant ([rtdatavis.github.io](http://rtdatavis.github.io/)). The following links are specific to the Paraview program and the workflows that have been used my researchers at the U of Arizona. These links are distinct from the others posted in the beginner paraview access ci links from the University of Arizona in that they are for more complex workflows. The links included explain how to use the terminal with paraview (pvpython), and the steps to leverage HPC resources for headless batch rendering. The batch rendering tutorial is significantly more complex than the others so if you find yourself stuck please post on the https://ask.cyberinfrastructure.org/ and I will try to troubleshoot with you.

visualization

0 Likes

Type

documentation

Level

Examples of code using JSON nlohmann header only Library for C++

This code showcases how to work with the header-only nlohmann JSON library for C++. In order to compile, change the extensions from json_test.txt to json_test.cpp and test.txt to test.json. You must also download the header files from https://github.com/nlohmann/json. Complilation instructions are at the bottom of json_test. This code is very helpful for creating config files, for example.

c++

0 Likes

Type

learning

Level

Awesome Jupyter Widgets (for building interactive scientific workflows or science gateway tools)

Awesome Jupyter Widgets List

A curated list of awesome Jupyter widget packages and projects for building interactive visualizations for Python code

0 Likes

Type

learning

Level

Astronomy data analysis with astropy

astropy

Astropy is a community-driven package that offers core functionalities needed for astrophysical computations and data analysis. From coordinate transformations to time and date handling, unit conversions, and cosmological calculations, Astropy ensures that astronomers can focus on their research without getting bogged down by the intricacies of programming. This guide walks you through practical usage of astropy from CCD data reduction to computing galactic orbits of stars.

visualization image-processing astrophysics

0 Likes

Type

learning

Level

Probabilistic Semantic Data Association for Collaborative Human-Robot Sensing

Probabilistic Semantic Data Association for Collaborative Human-Robot Sensing

Humans cannot always be treated as oracles for collaborative sensing. Robots thus need to maintain beliefs over unknown world states when receiving semantic data from humans, as well as account for possible discrepancies between human-provided data and these beliefs. To this end, this paper introduces the problem of semantic data association (SDA) in relation to conventional data association problems for sensor fusion. It then, develops a novel probabilistic semantic data association (PSDA) algorithm to rigorously address SDA in general settings. Simulations of a multi-object search task show that PSDA enables robust collaborative state estimation under a wide range of conditions.

ai machine-learning

0 Likes

Type

documentation

Level

ACCESS KB Guide - DELTA

ACCESS KB Guide - DELTA

NCSA is the home of Delta, a computing and data resource that balances cutting-edge graphics processor and CPU architectures with a non-POSIX file system with a POSIX-like interface. Delta allows applications to reap the benefits of modern file systems without rewriting code.

delta

0 Likes

Type

documentation

Level

A survey on datasets for fairness-aware machine learning

A survey on datasets for fairness-aware machine learning

The research paper provides an overview of various datasets that have been used to study fairness in machine learning. It discusses the characteristics of these datasets, such as their size, diversity, and the fairness-related challenges they address. The paper also examines the different domains and applications covered by these datasets.

ai data-analysis deep-learning data-science

0 Likes

Type

documentation

Level

Biopython Tutorial

The Biopython Tutorial and Cookbook website is a dedicated online resource for users in the field of computational biology and bioinformatics. It provides a collection of tutorials and practical examples focused on using the Biopython library. The website offers a series of tutorials that cover various aspects of Biopython, catering to users with different levels of expertise. It also includes code snippets and examples, and common solutions to common challenges in computational biology.

bioinformatics genomics python

0 Likes

Type

learning

Level

CMake Tutorials

CMake Tutorials

CMake is an open-source tool used to manage the build process in operating systems. This tutorial takes you through how to use CMake from the very basics with example projects.

training compiling

0 Likes

Type

learning

Level

FSL Lectures

FSL Courses

This is the official University of Oxford FSL group lecture page. This includes information on upcoming and past courses (online and in-person), as well as lecture materials. Available lecture materials includes slides and recordings on using FSL, MR physics, and applications of imaging data.

data-analysis image-processing psychology

0 Likes

Type

learning

Level

Examples of Thrust code for GPU Parallelization

thrust_ex.txt

Some examples for writing Thrust code. To compile, download the CUDA compiler from NVIDIA. This code was tested with CUDA 9.2 but is likely compatible with other versions. Before compiling change extension from thrust_ex.txt to thrust_ex.cu. Any code on the device (GPU) that is run through a Thrust transform is automatically parallelized on the GPU. Host (CPU) code will not be. Thrust code can also be compiled to run on a CPU for practice.

parallelization gpu cuda

0 Likes

Type

learning

Level

Knowledge Base Resources

Topics

Programming Language

Science Domain

Skill Level

Content Type