Numpy is a python package that leverages types and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and operations.
CaRCC – the Campus Research Computing Consortium – is an organization of dedicated professionals developing, advocating for, and advancing campus research computing and data and associated professions.
Vision: CaRCC advances the frontiers of research by improving the effectiveness of research computing and data (RCD) professionals, including their career development and visibility, and their ability to deliver services and resources for researchers. CaRCC connects RCD professionals and organizations around common objectives to increase knowledge sharing and enable continuous innovation in research computing and data capabilities.
Bioinformatics Toolbox provides algorithms and apps for Next Generation Sequencing (NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from online databases such as the NCBI Gene Expression Omnibus and GenBank.
Spack is a package manager for supercomputers that can help administrators install scientific software and libraries for multiple complex software stacks.
This repository contains information about Jupyter Widgets and how they can be used to develop interactive workflows, data dashboards, and web applications that can be run on HPC systems and science gateways. Easy to build web applications are not only useful for scientists. They can also be used by software engineers and system admins who want to quickly create tools tools for file management and more!
VSCode is a popular IDE that runs on Windows, MacOS, and Linux. This tutorial will explain how to get set up with VSCode to code in Python. It will also provide a tutorial on how to set up Github integration within VSCode.
As LLMs get larger fine-tuning to the full extent can become difficult to train on consumer hardware. Storing and deploying these tuned models can also be quite expensive and difficult to store. With PEFT (parameter -efficent fine tuning), it approaches fine-tune on a smaller scale of model parameters while freezing most parameters of the pretrained LLMs. Basically it is providing full performance that which is similar if not better than full fine tuning while only having a small number of trainable parameters. This source explains that as well as going over LORA diagrams and a code walk through.
Hour of Cyberinfrastructure (Hour of CI) is a nationwide campaign to introduce undergraduate and graduate students to cyberinfrastructure and geographic information science (GIS).
Raftlib is an open-source C++ Library that provides a framework for implementing parallel and concurrent data processing pipelines. It is designed to simplify the development of high-performance data processing applications by abstracting away the complexities of parallelism, concurrency, and data flow management.
It enables stream/data-flow parallel computation by linking parallel compute kernels together using simple right shift operators, similar to C++ streams for string manipulation. RaftLib eliminates the need for explicit usage of traditional threading libraries such as pthreads, std::thread, or OpenMP, which can lead to non-deterministic behavior when misused.
InsideHPC is an informational site offers videos, research papers, articles, and other resources focused on machine learning and quantum computing among other topics within high performance computing.
The Better Scientific Software (BSSw) project provides a community to collaborate and learn about best practices in scientific software development. Software—the foundation of discovery in computational science & engineering—faces increasing complexity in computational models and computer architectures. BSSw provides a central hub for the community to address pressing challenges in software productivity, quality, and sustainability.
JSON is a lightweight format for storing and transporting data, for example in a config file. This library is header-only, and has easy-to-read documentation. It is a C++ library.
Data visualization is a critical aspect of data analysis. It allows for a clear and concise representation of data, making it easier for users to understand and interpret complex datasets. One of the most popular libraries for data visualization in Python is Matplotlib. The included website aims to provide a brief overview of Matplotlib, its features, and examples/exercises to dive deeper into its functionalities.
This website summarizes the notes of Stanford's introductory course on probabilistic graphical models.
It starts from the very basics and concludes by explaining from first principles the variational auto-encoder, an important probabilistic model that is also one of the most influential recent results in deep learning.
This tutorial introduces the use of Containers using the Charliecloud software suite. This tutorial will provide participants with background and hands-on experience to use basic Charliecloud containers for HPC applications. We discuss what containers are, why they matter for HPC, and how they work. We'll give an overview of Charliecloud, the unprivileged container solution from Los Alamos National Laboratory's HPC Division. Students will learn how to build toy containers and containerize real HPC applications, and then run them on a cluster. Exercises are demonstrated using the ACES cluster, a composable accelerator testbed at Texas A&M University. Students with an allocation on the ACES cluster can follow along with the ACES-specific exercises.