Knowledge Base Resources

These links have been vetted by the community. Additional CI links are always welcome.

What is fairness in ML?

Building ML models for everyone: understanding fairness in machine learning

This article discusses the importance of fairness in machine learning and provides insights into how Google approaches fairness in their ML models. The article covers several key topics:

Introduction to fairness in ML: It provides an overview of why fairness is essential in machine learning systems, the potential biases that can arise, and the impact of biased models on different communities.

Defining fairness: The article discusses various definitions of fairness, including individual fairness, group fairness, and disparate impact. It explains the challenges in achieving fairness due to trade-offs and the need for thoughtful considerations.

Addressing bias in training data: It explores how biases can be present in training data and offers strategies to identify and mitigate these biases. Techniques like data preprocessing, data augmentation, and synthetic data generation are discussed.

Fairness in ML algorithms: The article examines the potential biases that can arise from different machine learning algorithms, such as classification and recommendation systems. It highlights the importance of evaluating and monitoring models for fairness throughout their lifecycle.

Fairness tools and resources: It showcases various tools and resources available to practitioners and developers to help measure, understand, and mitigate bias in machine learning models. Google's TensorFlow Extended (TFX) and What-If Tool are mentioned as examples.

Google's approach to fairness: The article highlights Google's commitment to fairness and the steps they take to address fairness challenges in their ML models. It mentions the use of fairness indicators, ongoing research, and partnerships to advance fairness in AI.

Overall, the article provides a comprehensive overview of fairness in machine learning and offers insights into Google's approach to building fair ML models.
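To make the group-fairness and disparate-impact ideas mentioned above more concrete, here is a minimal Python sketch. It is not taken from the article: the function names, the group labels, and the toy predictions are all hypothetical, and it uses one common formulation (the ratio of the lowest to the highest per-group selection rate) among several that practitioners use.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Return the fraction of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest selection rate (1.0 means parity)."""
    lowest = min(rates.values())
    highest = max(rates.values())
    return lowest / highest if highest > 0 else float("nan")


# Hypothetical toy data: group membership and binary model predictions.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)
ratio = disparate_impact_ratio(rates)
print(rates, ratio)
```

A ratio well below 1.0 indicates that one group receives positive predictions much less often than another, which is a signal to investigate further rather than proof of unfairness on its own.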

Type

documentation

EasyBuild Documentation

EasyBuild is a software installation framework that allows administrators to easily build and install software on high-performance computing (HPC) systems. It supports a wide range of software packages, toolchains, and compilers. Supported software is listed in the easyconfigs repository, one of several repositories in the EasyBuild project.

easybuild

Type

documentation

Thrust resources

Thrust is a CUDA C++ template library of parallel algorithms that handles much of the GPU parallelization for you. The Thrust tutorial is a good starting point for beginners, and the documentation is a useful reference for anyone using Thrust.

parallelization gpu resources

Type

learning

MDAnalysis - Python library for the analysis of molecular dynamics simulations

MDAnalysis

MDAnalysis is a Python-based library of tools for the analysis of molecular dynamics simulations. It can read and write many popular simulation formats, including CHARMM, LAMMPS, GROMACS, and AMBER. This link contains the documentation pages for all MDAnalysis functions and links to tutorials that use Jupyter Notebooks.

computational-chemistry materials-science python

Type

tool

Docker - Containerized, reproducible workflows

Docker Documentation

Docker lets you containerize almost any task: a container packages the software and its dependencies into a lightweight, scalable alternative to a virtual machine. This is very useful when transferring work across computing environments, as it helps ensure reproducibility.

documentation cloud-computing deep-learning

Type

tool

Open-Source Server Virtualization Platform

Proxmox Virtual Environment - Installation

Proxmox Virtual Environment is open-source software for hyper-converged infrastructure. It is a hosted hypervisor that can run operating systems including Linux and Windows on x64 hardware.

software-installation

Type

learning

Data Visualization Tools for Julia

Plots.jl is the most widely used plotting library for the Julia programming language. It is known for its versatility and intuitiveness. Its limited set of dependencies and wide applicability across different graphics packages make it especially helpful for visualizing the results of your latest Julia implementation. However, Julia programmers have several other options for visualizing their datasets. The second link compares a variety of Julia plotting packages.

plotting visualization julia

Type

tool

UNIX/command line basics tutorial

UNIX/command line basics tutorial

Introductory training materials for working on the UNIX command line.

bash

Type

learning

Official Python Documentation

Python 3.11.5 Documentation

The official documentation for Python 3.11.5. Python comes with a lot of features built into the language, so it is worth taking a look as you code.

documentation python

Type

documentation

Mechanism and Implementation of Various MPI Libraries

This resource gives a detailed explanation of the communication routines and management methods of different MPI libraries, along with several exercises designed to help users get familiar with the MPI build process.

compiling mpi

Type

website

Understanding LLM Fine-tuning

The Ultimate Guide to LLM Fine Tuning: Best Practices & Tools

With the recent rise of LLMs, many businesses are looking at ways to adopt these models and fine-tune them on specific data sets to ensure accuracy. When fine-tuned, these models can be well suited to the specific needs of a company. This site explains when, how, and why models should be fine-tuned, and it goes over various strategies for LLM fine-tuning.

big-data training

Type

learning

NITRC

NITRC

The Neuroimaging Tools and Resources Collaboratory (NITRC) is a neuroimaging informatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, imaging genomics, and computational neuroscience tools and resources.

data-analysis image-processing data-sharing

Type

website

How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev

Tutorial Site

A tutorial entitled "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev" presented at SciPy 2023 in Austin, TX. The tutorial is hosted as a series of Jupyter Notebooks that can be opened at the click of a button using Binder. See the README for more information.

Type

learning

UCLA Extended Reality (XR) collaboration resources and Workshop

Extended Reality (XR) Resource workshop/Guide for Building collaboration

Comprehensive Extended Reality (XR) collaboration resources for building high-performance extended reality (XR), augmented reality (AR), virtual reality (VR), and mixed reality campus teams. The tags set here are a small subset of the topics covered.

documentation neural-networks

Type

presentation

Performance Engineering Of Software Systems

MIT Performance Engineering Of Software Systems Homepage

A class from MIT OpenCourseWare that takes a hands-on approach to building scalable, high-performance software systems. Topics include performance analysis, algorithmic techniques for high performance, instruction-level optimizations, caching optimizations, parallel programming, and building scalable systems.

optimization parallelization training

Type

learning

ACCESS Video Learning Center

Video Learning Center

A library of short videos about ACCESS allocations, resources and support.

training

Type

video_link

NCSA HPC Training Moodle

NCSA HPC Training Moodle Site

Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Other related topics include 'Cybersecurity for End Users' and 'Developing Webinar Training.' Some of the tutorials also offer digital badges. Many of these tutorials were previously offered on CI-Tutor. A list of open access training courses is provided below.

Parallel Computing on High-Performance Systems
Profiling Python Applications
Using an HPC Cluster for Scientific Applications
Debugging Serial and Parallel Codes
Introduction to MPI
Introduction to OpenMP
Introduction to Visualization
Introduction to Performance Tools
Multilevel Parallel Programming
Introduction to Multi-core Performance
Using the Lustre File System

performance-tuning profiling parallelization lustre training workforce-development openmp python mpi cybersecurity

Type

learning

Machine Learning in Astrophysics

Machine learning is becoming increasingly important in fields with large datasets, such as astrophysics. AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, matplotlib, and astropy. It provides a range of statistical and machine learning routines for analyzing astronomical data in Python. In particular, it has loaders for many open astronomical datasets, with examples of how to visualize such large and complicated datasets.

plotting big-data image-processing machine-learning astrophysics

Type

documentation

A visual introduction to Gaussian Belief Propagation

https://gaussianbp.github.io/

This website is an interactive introduction to Gaussian Belief Propagation (GBP), a probabilistic inference algorithm that operates by passing messages between the nodes of arbitrarily structured factor graphs. A special case of loopy belief propagation, GBP updates rely only on local information and will converge independently of the message schedule. The key argument is that, given recent trends in computing hardware, GBP has the right computational properties to act as a scalable distributed probabilistic inference framework for future machine learning systems.
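As a rough illustration of the message-passing idea, the following Python sketch runs Gaussian belief propagation on a small chain of scalar variables. It is not code from the website: it assumes a simplified pairwise model written in information form (a symmetric precision matrix and an information vector) rather than general factor graphs, and the function name `gaussian_bp` and the example numbers are hypothetical.

```python
def gaussian_bp(precision, info, iterations=10):
    """Scalar Gaussian BP for p(x) ~ exp(-0.5 x'Lx + h'x), with L symmetric.

    Returns the marginal means and variances estimated from the messages.
    """
    n = len(info)
    # Two variables are neighbors when their off-diagonal precision is nonzero.
    neighbors = {
        i: [j for j in range(n) if j != i and precision[i][j] != 0.0]
        for i in range(n)
    }
    # Each message i -> j is a Gaussian in x_j, stored as (precision, information).
    msg_prec = {(i, j): 0.0 for i in range(n) for j in neighbors[i]}
    msg_info = {(i, j): 0.0 for i in range(n) for j in neighbors[i]}

    for _ in range(iterations):
        new_prec, new_info = {}, {}
        for (i, j) in msg_prec:
            # Combine node i's own terms with all incoming messages except j's.
            a = precision[i][i] + sum(msg_prec[(k, i)] for k in neighbors[i] if k != j)
            b = info[i] + sum(msg_info[(k, i)] for k in neighbors[i] if k != j)
            # Marginalize out x_i to obtain the message about x_j.
            new_prec[(i, j)] = -precision[i][j] ** 2 / a
            new_info[(i, j)] = -precision[i][j] * b / a
        msg_prec, msg_info = new_prec, new_info  # synchronous update

    means, variances = [], []
    for i in range(n):
        p = precision[i][i] + sum(msg_prec[(k, i)] for k in neighbors[i])
        h = info[i] + sum(msg_info[(k, i)] for k in neighbors[i])
        means.append(h / p)
        variances.append(1.0 / p)
    return means, variances


# Hypothetical 3-variable chain: x0 - x1 - x2.
precision = [[2.0, -1.0, 0.0],
             [-1.0, 2.0, -1.0],
             [0.0, -1.0, 2.0]]
info = [1.0, 0.0, 1.0]
means, variances = gaussian_bp(precision, info)
print(means, variances)
```

Each update uses only a node's own terms and the messages arriving from its neighbors, which is the local-information property described above. On a tree-structured graph such as this chain, the resulting marginals are exact; on graphs with loops the same updates are applied, but convergence and the accuracy of the variances are not guaranteed by this sketch.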

ai machine-learning

Type

website

AI/ML TechLab - Accelerating AI/ML Workflows on a Composable Cyberinfrastructure

This technology lab contains a set of sessions to help a new user start an AI project on the ACES cluster, a composable accelerator testbed at Texas A&M University. You will learn how to create and activate a virtual environment, manipulate and visualize data with Pandas and Matplotlib, use Scikit-learn for linear regression and classification applications, and use PyTorch to create and train a simple image classification model with deep neural networks (DNN).

ACES documentation TAMU ai visualization deep-learning machine-learning neural-networks login authentication composable-systems gpu nvidia slurm bash modules vim anaconda conda programming python scikit-learn

Type

documentation

RMACC Website

RMACC.org

Rocky Mountain Advanced Computing Consortium Website

community-outreach

Type

website

CaRCC Data Facing Track

CaRCC Data Facing Track Page

The Data-Facing Track of the People Network brings together people from research computing groups, libraries, research institutes, and other organizations who support data-enabled research. Many of us are also Researcher-Facing, but this track is an opportunity to discuss the varied challenges of working with data.

data-analysis data-access-protocols data-lifecycle data-management data-management-software data-provenance data-retention data-reproducibility data-transfer data-wrangling storage hpc-storage data-compliance

Type

website

Electric field analyses for molecular simulations

TUPA - Electric field analyses for molecular simulations

A tool to compute electric fields from molecular simulations.

visualization computational-chemistry conda python

Type

tool

HPCwire

HPCwire

HPCwire is a prominent news and information source for the HPC community. Their website offers articles, analysis, and reports on HPC technologies, applications, and industry trends.

documentation pytorch data-science bioinformatics hpc-operations training programming programming-best-practices python

Type

website

Introduction to GPU/Parallel Programming using OpenACC

Intro to OpenACC

Introduction to the basics of OpenACC.

gpu c c++ compiling fortran

Type

presentation
