These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators) that are participating in programs such as this one, that are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!
The NSF-funded ResearchSOC helps make scientific computing resilient to cyberattacks and capable of supporting trustworthy, productive research through operational cybersecurity services, training, and information sharing necessary to a community as unique and variable as research and education (R&E).
ResearchSOC is a service offering from Indiana University's OmniSOC.
Neocortex is a new supercomputing cluster at the Pittsburgh Supercomputing Center (PSC) that features groundbreaking AI hardware from Cerebras Systems.
Warewulf is an operating system provisioning platform for Linux that is designed to produce secure, scalable, turnkey cluster deployments that maintain flexibility and simplicity. It can be used to setup a stateless provisioning in HPC environment.
Feed-forward neural networks are a simple type of network that simply rely on data to be "fed-forward" through a series of layers that makes decisions on how to categorize datum. Gradient descent is a type of optimization tool that is often used to train machines. These two areas in ML are good starting points and are the easiest types of neural network/optimization to understand.
The listed repository contains code written in C++ to model the flow inside a cavity with a lid moving above from left to right by discretizing incompressible N-S equations with finite difference method. For the governing equations, artificial viscosity has been considered to increase the stability. In terms of solving the resulted algebraic equation system, both the Point Jacobi Method and Symmetric Gauss Seidel methods have been used for the iteration process.
Nextflow is an open-source, domain-specific language and workflow manager designed for the execution and coordination of scientific and data-intensive computational workflows. It was specifically created to address the challenges faced by researchers and scientists when dealing with complex and scalable computational pipelines, particularly in fields such as bioinformatics, genomics, and data analysis.
Here provided some links to start with.
InsideHPC is an informational site offers videos, research papers, articles, and other resources focused on machine learning and quantum computing among other topics within high performance computing.
This tutorial introduces machine learning on high performance computing (HPC) clusters. While it focuses on the HPC clusters at The University of Arizona, the content is generic enough that it can be used by students from other institutions.
NVIDIA CUDA Toolkit Documentation: If you are working with GPUs in HPC, the NVIDIA CUDA Toolkit is essential. You can access the CUDA Toolkit documentation, including programming guides and API references, at this provided website
Discover Data Science is all about making connections between prospective students and educational opportunities in an exciting new, hot, and growing field – data science.
Data visualization is a critical aspect of data analysis. It allows for a clear and concise representation of data, making it easier for users to understand and interpret complex datasets. One of the most popular libraries for data visualization in Python is Matplotlib. The included website aims to provide a brief overview of Matplotlib, its features, and examples/exercises to dive deeper into its functionalities.
Connect.Cybinfrastructure is a family of portals, each representing a program that is serving a segment of the research computing and data community. Each portal provides program-specific information, as well a custom "view" into a common database. The portal was originally developed to support project workflows and a knowledge base of self service learning resources for the Northeast Cyberteam. Subsequently, it was expanded to provide support to multiple cyberteams and other research computing communities of practice. We welcome additional communities, please contact us if you are interested in participating. Central to the Portal is an extensive and ever-evolving tagging infrastructure which informs every aspect of the Portal. The tag taxonomy was initially developed by the Northeast Cyberteam to categorize subject matter relevant to practitioners of Research Computing Facilitation and is ever changing due to the frequent introduction of new technology in domains that characterize the field of research computing.
Representation learning is a fundamental concept in machine learning and artificial intelligence, particularly in the field of deep learning. At its core, representation learning involves the process of transforming raw data into a form that is more suitable for a specific task or learning objective. This transformation aims to extract meaningful and informative features or representations from the data, which can then be used for various tasks like classification, clustering, regression, and more.
The official documentation for PyTorch, a machine learning tensor-based framework, and NumPy, which allows for support for ndarrays which is useful to make tensors when implementing NNs. Both libraries can be installed with pip.
Goes through in detail on how to build an application that can run on Android and IOS devices, using Qt Creator to develop Qt Quick applications. Goes through the setting up, creation, configuration, optimization, and overall deployment. This provides the fundamental basis, need to click around on the site for more specifics.
EasyBuild is a software installation framework that allows administrators to easily build and install software on high-performance computing (HPC) systems. It supports a wide range of software packages, toolchains, and compilers.
Supported software are found in the EasyConfigs repository, one of several resositories in EasyBuild project.
In this tutorial, I present an overview with many examples of the use of Numpy and Pandas for data analysis. Beginners in the field of data analysis can find It incredibly helpful, and at the same time, anyone who already has experience in data analysis and needs a refresher can find value in it. I discuss the use of Numpy for analyzing 1D and 2D multidimensional data and an introduction on using Pandas to manipulate CSV files.
The Neuroimaging Tools and Resources Collaboratory (NITRC) is a neuroimaging informatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, imaging genomics, and computational neuroscience tools and resources.
As developers, we get excited to think about challenging problems. When you ask us what we are working on, our eyes light up like children in a candy store. So why is it that so many of our developer and software origin stories are not told? How did we get to where we are today, and what did we learn along the way? This podcast aims to look “Behind the Scenes of Tech’s Passion Projects and People.” We want to know your developer story, what you have built, and why. We are an inclusive community - whatever kind of institution or country you hail from, if you are passionate about software and technology you are welcome!
Purdue University is the home of Anvil, a powerful supercomputer that provides advanced computing capabilities to support a wide range of computational and data-intensive research spanning from traditional high-performance computing to modern artificial intelligence applications.
The research paper provides an overview of various datasets that have been used to study fairness in machine learning. It discusses the characteristics of these datasets, such as their size, diversity, and the fairness-related challenges they address. The paper also examines the different domains and applications covered by these datasets.
Samtools is a suite of programs for interacting with high-throughput sequencing data, especially in the SAM/BAM format. It offers various utilities for processing, analyzing, and managing sequence data generated from next-generation sequencing (NGS) experiments. Samtools is widely used in bioinformatics and genomics research for tasks such as read alignment, variant calling, and data manipulation.