Knowledge Base Resources

These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers, and HPC system administrators) who participate in programs such as this one, which are supported by the ConnectCI community management platform. Additional Knowledge Base resources are always welcome!

Data Imputation Methods for Climate Data and Mortality Data

These slides and videos introduce how to use the K-Nearest-Neighbors method to impute climate data and how to use Bayesian spatio-temporal models in R-INLA to impute mortality data. The demos will be added soon.
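As a rough illustration of the K-Nearest-Neighbors idea mentioned above, the sketch below fills each missing value with the mean of that column over the k most similar rows. It is not taken from the slides or videos: the function name `knn_impute`, the use of `None` for missing values, the distance over shared columns, and the sample station readings are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Hypothetical helper for illustration; rows is a list of equal-length lists.
    """
    filled = [list(row) for row in rows]  # copy so the input is not mutated
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbor must be a different row that observed column j.
                if m == i or other[j] is None:
                    continue
                # Distance uses only the columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = [v for _, v in candidates[:k]]
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Made-up station records: [temperature in C, relative humidity in %]
readings = [
    [10.0, 80.0],
    [11.0, 78.0],
    [25.0, 40.0],
    [10.5, None],
]
completed = knn_impute(readings, k=2)
```

Here the last station's missing humidity is filled from the two stations with the closest temperatures. Real climate data would normally need feature scaling and a careful choice of k, which this sketch leaves out.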

allocation-value documentation ai plotting visualization data-analysis machine-learning

0 Likes

Type

video_link

Level

Neurodesk

Neurodesk provides a containerised data analysis environment to facilitate reproducible analysis of neuroimaging data. Analysis pipelines for neuroimaging data typically rely on specific versions of packages and software, and are dependent on their native operating system. These dependencies mean that a working analysis pipeline may fail or produce different results on a new computer, or even on the same computer after a software update. Neurodesk provides a platform in which anyone, anywhere, using any computer can reproduce your original research findings given the original data and analysis code.

psychology containers software-installation version-control

0 Likes

Type

website

Level

Installing Rocky Linux Operating System

Installing Rocky Linux 9

Rocky Linux is an open-source enterprise operating system that is compatible with Red Hat Enterprise Linux (RHEL). It is a community-driven project that provides a stable and reliable platform for production workloads. It is one of the best open-source alternatives to CentOS, since CentOS Linux reaches end of life (EOL) in 2024 as the project shifts to CentOS Stream.

unix-environment software-installation

0 Likes

Type

learning

Level

Automated Machine Learning Book

Automated Machine Learning: Methods, Systems, Challenges

The authoritative book on automated machine learning, which allows practitioners without ML expertise to develop and deploy state-of-the-art machine learning approaches. Describes the background of techniques used in detail, along with tools that are available for free.

ai data-analysis deep-learning machine-learning neural-networks python r

0 Likes

Type

learning

Level

Time-Series LSTMs Python Walkthrough

A walkthrough (with a Google Colab link) on how to implement your own LSTM to observe time-dependent behavior.
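To give a sense of what "implementing your own LSTM" involves, here is a minimal sketch of one forward step of a scalar LSTM cell in plain Python. It is not the code from the walkthrough (which is tagged as using PyTorch); the names `lstm_step`, `run_sequence`, and the `params` layout are assumptions for this example, and training is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a scalar LSTM cell.

    params maps each gate name ("i", "f", "o", "g") to a tuple (w_x, w_h, b).
    This layout is an assumption made for the sketch.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new information to admit
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre("g"))  # candidate cell value
    c = f * c_prev + i * g   # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell, carrying (h, c) between time steps."""
    h, c = 0.0, 0.0
    hidden = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        hidden.append(h)
    return hidden
```

The cell state `c` is what lets the network carry information across time steps, which is the time-dependent behavior the walkthrough sets out to observe.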

ai deep-learning machine-learning neural-networks pytorch python

0 Likes

Type

website

Level

Globus Documentation

Globus is a data transfer, sharing, automation, and discovery service used by hundreds of thousands of researchers to manage "big data" at universities, research labs, and national systems such as ACCESS. The Globus documentation website provides how-to guides, reference documentation, and examples for Globus's web application, command-line interface, Python software development kit (SDK), and APIs.

cloud-storage data-sharing data-management data-management-software data-transfer data-wrangling file-transfer globus dtn python data-security data-compliance federated-authentication secure-data-architecture

0 Likes

Type

documentation

Level

Building Anaconda Navigator applications

This tutorial explains how to create an Anaconda Navigator Application (app) for JupyterLab. It is intended for users of Windows, macOS, and Linux who want to generate an Anaconda Navigator app conda package from a given recipe. Prior knowledge of conda-build or conda recipes is recommended.

compiling conda programming programming-best-practices

0 Likes

Type

tool

Level

Slurm Scheduling Software Documentation

Slurm Documentation

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
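The three functions described above (allocating nodes, running work on them, and arbitrating a queue of pending jobs) can be pictured with a toy first-in, first-out model. This is only a conceptual sketch, not Slurm's algorithm or API: Slurm's actual scheduling is considerably richer, and the `schedule` function and job tuples here are hypothetical.

```python
from collections import deque


def schedule(jobs, total_nodes):
    """Toy FIFO scheduler (illustrative only, not how Slurm is implemented).

    jobs is a list of (name, nodes_needed, duration) tuples.
    Returns a dict mapping each job name to its start time.
    """
    pending = deque(jobs)  # queue of pending work
    running = []           # (end_time, name, nodes) for started jobs
    free = total_nodes
    clock = 0
    start_times = {}
    while pending or running:
        # Allocate nodes to queued jobs, in order, while the head job fits.
        while pending and pending[0][1] <= free:
            name, nodes, duration = pending.popleft()
            free -= nodes
            running.append((clock + duration, name, nodes))
            start_times[name] = clock
        if not running:
            raise ValueError("job requests more nodes than the cluster has")
        # Advance the clock to the next completion and release its nodes.
        running.sort()
        end_time, _, nodes = running.pop(0)
        clock = end_time
        free += nodes
    return start_times


# Made-up workload on a 4-node cluster.
starts = schedule([("a", 2, 10), ("b", 2, 5), ("c", 3, 4)], total_nodes=4)
```

In this run, jobs `a` and `b` start immediately, while `c` waits in the queue until enough nodes are released.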

cluster-management cluster-support slurm

0 Likes

Type

website

Level

Numba: Compiler for Python

Numba Compiler

Numba is a Python compiler designed for accelerating numerical and array operations, enabling users to enhance their application's performance by writing high-performance functions in Python itself. It uses LLVM to transform pure Python code into optimized machine code, achieving speeds comparable to languages like C, C++, and Fortran. Noteworthy features include dynamic code generation at import time or at runtime, support for both CPU and GPU hardware, and seamless integration with the Python scientific software ecosystem, particularly NumPy.
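A minimal sketch of the typical usage pattern follows: a numerical loop decorated with Numba's `njit` decorator so that it is compiled on first call. The function `sum_of_squares` is a made-up example, and the `ImportError` fallback is an assumption added only so the snippet still runs (uncompiled) where Numba is not installed.

```python
try:
    from numba import njit  # Numba's decorator for nopython-mode compilation
except ImportError:
    def njit(func):
        """Fallback so the sketch still runs, uncompiled, without Numba."""
        return func


@njit
def sum_of_squares(n):
    # A plain Python loop; with Numba present it is compiled to machine code
    # the first time the function is called with a given argument type.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
```

Loops like this are where compilation tends to pay off most, since the interpreter overhead of each iteration is removed.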

vectorization optimization performance-tuning parallelization

0 Likes

Type

documentation

Level

Introductory Tutorial to Numpy and Pandas for Data Analysis

Numpy and Pandas for Data Analysis

In this tutorial, I present an overview, with many examples, of the use of NumPy and Pandas for data analysis. Beginners in the field of data analysis can find it incredibly helpful, and anyone who already has experience in data analysis and needs a refresher can also find value in it. I discuss the use of NumPy for analyzing 1D and 2D data and give an introduction to using Pandas to manipulate CSV files.

ai big-data data-analysis vectorization

0 Likes

Type

documentation

Level

Weka

Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.

big-data data-analysis machine-learning weka data-science java

0 Likes

Type

tool

Level

Anvil Home Page

Purdue University is the home of Anvil, a powerful supercomputer that provides advanced computing capabilities to support a wide range of computational and data-intensive research spanning from traditional high-performance computing to modern artificial intelligence applications.

anvil

0 Likes

Type

website

Level

NERSC Training and Tutorials

A comprehensive collection of NERSC-developed training and tutorial events, offered on regular schedules. All sessions are archived, including slide decks, video recordings, and software examples where available. Some examples of past training and tutorial topics: Deep Learning for Sciences Webinar Series; BerkeleyGW Tutorial Workshop; VASP Trainings; Timemory Software Monitoring Tutorial, April 2021; HPCToolkit to Measure and Analyzing GPU Applications Performance Tutorial; Totalview Tutorial; NVidia HPCSDK - OpenMP Target Offload Training; Parallelware Training Series; ARM Debugging and Profiling Tools Tutorial; Roofline on NVIDIA GPUs; GPUs for Science events; 3-part OpenACC Training Series; 9-part CUDA Training Series.

training

0 Likes

Type

learning

Level

Python Tools for Data Science

Python has become a very popular programming language and software ecosystem for work in Data Science, integrating support for data access, data processing, modeling, machine learning, and visualization. In this webinar, we will describe some of the key Python packages that have been developed to support that work, and highlight some of their capabilities. This webinar will also serve as an introduction and overview of topics addressed in two Cornell Virtual Workshop tutorials, available at https://cvw.cac.cornell.edu/pydatasci1 and https://cvw.cac.cornell.edu/pydatasci2

ai machine-learning big-data data-analysis data-wrangling data-science training workforce-development python scikit-learn sql

0 Likes

Type

video_link

Level

Solving differential equations with Physics-informed Neural Network

Solving DEs with neural networks

Differential equations, the backbone of countless physical phenomena, have traditionally been solved using numerical methods or analytical techniques. However, the advent of deep learning introduces an intriguing alternative: Physics-Informed Neural Networks (PINNs). By leveraging the representational power of neural networks and integrating physical laws (like differential equations), PINNs offer a novel approach to solving complex problems. This guide walks through an implementation of a PINN to solve DEs such as the logistic equation.
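For orientation, one common way to write the PINN objective for the logistic equation is shown below. The symbols are assumptions for this sketch rather than the guide's notation: r is the growth rate, u_0 the initial value, \hat{u}_\theta the neural network approximation, and t_i the N collocation points at which the equation residual is evaluated. The guide itself may weight or arrange the terms differently.

```latex
\frac{du}{dt} = r\,u\,(1 - u), \qquad u(0) = u_0

\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}
  \left( \frac{d\hat{u}_\theta}{dt}(t_i)
         - r\,\hat{u}_\theta(t_i)\bigl(1 - \hat{u}_\theta(t_i)\bigr) \right)^{2}
  + \bigl( \hat{u}_\theta(0) - u_0 \bigr)^{2}
```

The first term penalizes violations of the differential equation at the collocation points, and the second enforces the initial condition; minimizing both drives the network toward a solution of the equation.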

neural-networks

0 Likes

Type

learning

Level

Data Visualization Tools for Julia

Plots.jl is the most widely used plotting library for the Julia programming language. It is known for its versatility and intuitiveness. Its limited set of dependencies and wide applicability across different graphics packages make it especially helpful for visualizing the results of your latest Julia implementation. However, Julia programmers have several other options for visualizing their datasets. The second link compares Plots.jl against a variety of other Julia packages.

plotting visualization julia

0 Likes

Type

tool

Level

Developer Stories Podcast

As developers, we get excited to think about challenging problems. When you ask us what we are working on, our eyes light up like children in a candy store. So why is it that so many of our developer and software origin stories are not told? How did we get to where we are today, and what did we learn along the way? This podcast aims to look “Behind the Scenes of Tech’s Passion Projects and People.” We want to know your developer story, what you have built, and why. We are an inclusive community - whatever kind of institution or country you hail from, if you are passionate about software and technology you are welcome!

community-outreach professional-development training workforce-development

0 Likes

Type

website

Level

Warewulf documentation

Warewulf Documentation

Warewulf is an operating system provisioning platform for Linux that is designed to produce secure, scalable, turnkey cluster deployments that maintain flexibility and simplicity. It can be used to set up stateless provisioning in an HPC environment.

documentation administering-hpc distributed-computing hpc-cluster-architecture provisioning containers

0 Likes

Type

website

Level

MOPAC

Examples of I/O Files for MOPAC

MOPAC (Molecular Orbital PACkage) is a semi-empirical quantum chemistry package used to compute molecular properties and structures by using approximations of the Schrödinger equation. This tutorial explains the process of using MOPAC for different forms of calculations.

computational-chemistry

0 Likes

Type

tool

Level

ACCESS Guide (originally given at Duke OIT)

Using Jetstream 2 for Duke members (written for Duke OIT)

A guide for Duke OIT on how to advise Duke University members on using ACCESS and applying allocation credits to Jetstream 2. It can also be used by non-Duke members. Assumes the reader has basic knowledge of ACCESS.

ACCESS-credits adding-users allocation-management jetstream cloud-computing login ACCESS-website project-management cilogon

0 Likes

Type

documentation

Level

Geocomputation with R (Free Reference Book)

Geocomputation with R

Below is a link to a book that focuses on how to use the "sf" and "terra" packages for GIS computations. As of 5/1/2023, the book is up to date and its examples run without errors. It covers a lot of material while still providing a good overview and example workflows for these tools.

0 Likes

Type

learning

Level

OnShape FeatureScripts: Custom features for everyone

OnShape FeatureScripts

OnShape FeatureScripts allow users to create their own features via OnShape's programming language. The user can make these as simple or complex as they need, and they can save tons of time for heavy OnShape users or complex projects!

documentation materials-science particle-physics

0 Likes

Type

tool

Level

ACCESS Video Learning Center

Video Learning Center

A library of short videos about ACCESS allocations, resources and support.

training

0 Likes

Type

video_link

Level

Higher Ed Controlled Unclassified Information Slack (HigherEdCUI)

HigherEdCUI Slack Channel

Slack channel for the Higher Ed CUI community

cybersecurity

0 Likes

Type

tool

Level

Bioinformatics Workflow Management with Nextflow

Nextflow is an open-source, domain-specific language and workflow manager designed for the execution and coordination of scientific and data-intensive computational workflows. It was specifically created to address the challenges faced by researchers and scientists when dealing with complex and scalable computational pipelines, particularly in fields such as bioinformatics, genomics, and data analysis. Some links to get started are provided here.

cloud-computing parallelization data-management bioinformatics training

0 Likes

Type

documentation

Level
