Contributed by cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators), these resources are shared through the ConnectCI community platform. Add resources you find helpful!
A comprehensive list of training resources from the HPC University. HPCU is a virtual organization whose primary goal is to provide a cohesive, persistent, and sustainable on-line environment to share educational and training materials for a continuum of high performance computing environments that span desktop computing capabilities to the highest-end of computing facilities offered by HPC centers.
Supervised Machine Learning Readiness is a self-paced, beginner-friendly program designed for Earth systems scientists to explore the core principles of supervised machine learning. This series uses a combination of step-by-step frameworks, exploratory widgets, and low-code exercises in Jupyter Notebooks, to explore the full cycle of machine learning model development. No programming experience is required. By the end of the series, you will be able to recognize when machine learning is an appropriate tool and critically evaluate machine learning in Earth systems science contexts.
Access requires a free NSF Unidata eLearning account.
DeapSECURE is a training program to infuse high-performance computational techniques into cybersecurity research and education. It is an NSF-funded project of the ODU School of Cybersecurity along with the Department of Electrical and Computer Engineering and the Information Technology Services at ODU. The DeapSECURE team has developed six non-degree training modules to expose cybersecurity students to advanced CI platforms and techniques rooted in big data, machine learning, neural networks, and high-performance programming. Techniques taught in DeapSECURE workshops are rather general and transferable to other areas including science, engineering, finance, linguistics, etc. All lesson materials are made available as open-source educational resources.
Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Other related topics include 'Cybersecurity for End Users' and 'Developing Webinar Training.' Some of the tutorials also offer digital badges. Many of these tutorials were previously offered on CI-Tutor. A list of open access training courses are provided below.
Parallel Computing on High-Performance Systems
Profiling Python Applications
Using an HPC Cluster for Scientific Applications
Debugging Serial and Parallel Codes
Introduction to MPI
Introduction to OpenMP
Introduction to Visualization
Introduction to Performance Tools
Multilevel Parallel Programming
Introduction to Multi-core Performance
Using the Lustre File System
This tutorial shows how to set up an open-source customizable RAG chatbot to answer questions about documents you can choose. It uses Indiana's Jetstream 2 HPC, but should work on any major ACCESS HPC.
The Better Scientific Software (BSSw) project provides a community to collaborate and learn about best practices in scientific software development. Software—the foundation of discovery in computational science & engineering—faces increasing complexity in computational models and computer architectures. BSSw provides a central hub for the community to address pressing challenges in software productivity, quality, and sustainability.
The Use of High-Performance Computing Services in University Settings: A Usability Case Study of the University of Cincinnati’s High-Performance Computing Cluster
0
This presentation gives a detailed breakdown of the outcome of my master's thesis which was focused on making HPC Clusters accessible across all disciplines in a university setting "Our Case Study was the university of Cincinnati".
This webinar series is an orientation to R. We start with an overview of R’s history and place in the larger data science ecosystem. Next, we introduce the R Studio user interface and how to access R’s excellent documentation. Finally, we present the fundamental concepts you need to use the R environment and language for data analysis. Along the way, we compare R script files (.R) to R Notebook (.Rmd) files and show how the features of R Notebook support better communication and encourage more dynamic engagement with statistical analysis and code. It is helpful to be familiar with tabular data analysis using statistical software, database tools, or spreadsheet programs.
Workshop materials, including setup directions and slides are available at https://github.com/CornellCAC/r_for_edu/ The Rstudio Cloud project used in the workshop is https://rstudio.cloud/project/4044219.
CaRCC – the Campus Research Computing Consortium – is an organization of dedicated professionals developing, advocating for, and advancing campus research computing and data and associated professions.
Vision: CaRCC advances the frontiers of research by improving the effectiveness of research computing and data (RCD) professionals, including their career development and visibility, and their ability to deliver services and resources for researchers. CaRCC connects RCD professionals and organizations around common objectives to increase knowledge sharing and enable continuous innovation in research computing and data capabilities.
Network packet processing faces significant performance challenges due to kernel overheads. These issues have become more pronounced with the rapid growth of network traffic. To address these challenges, the Data Plane Development Kit (DPDK) was developed. DPDK bypasses the kernel and operates directly in user space, offering significant improvements in performance and latency for packet processing tasks. However, DPDK's steep learning curve presents a barrier to entry for developers and network administrators. In recent years, P4 has emerged as a language specifically designed for expressing packet processing data paths. Building on this development, P4-DPDK has been introduced as a new technology that bridges P4 and DPDK. It allows developers to create P4 code which is then translated into a DPDK pipeline, combining the expressiveness of P4 with the performance benefits of DPDK. This lab series offers a hands-on experience on the basics of P4-DPDK.
Self-paced tutorials on high-end computing topics such as parallel computing, multi-core performance, and performance tools. Some of the tutorials also offer digital badges.
CS244N is a renowned natural language processing course offered by Stanford University and taught by Christopher Manning. It covers a wide range of topics in NLP, including language modeling, machine translation, sentiment analysis, and more. It teaches both foundational concepts and cutting-edge research to gain a comprehensive understanding of NLP techniques and applications.
The United Nations (UN) is an international organization comprising 193 Member States, including the United States. As a global organization, the UN is the one place on Earth where the world's nations can gather to discuss common problems and find shared solutions that benefit all humanity. This handbook has been produced for UN staff of all backgrounds and levels and provides an overview of how to approach your participation in a mentorship program. This resource is quickly digestible and provides a basic structure that will be helpful to review before the first meeting with your mentee.
Tableau is a popular and capable software product for creating charts that present data and dashboards that allow you to explore data. It is typically used to present business or statistical data, but can also create compelling visualizations of scientific data. However, scientific data is often generated or stored in formats that are not immediately accessible by Tableau. This seminar will explore the data formats that work best with Tableau and the available mechanisms for generating scientific data in (or converting it to) those formats so that you can apply the full power of Tableau to create the best possible visualizations of your data.
As developers, we get excited to think about challenging problems. When you ask us what we are working on, our eyes light up like children in a candy store. So why is it that so many of our developer and software origin stories are not told? How did we get to where we are today, and what did we learn along the way? This podcast aims to look “Behind the Scenes of Tech’s Passion Projects and People.” We want to know your developer story, what you have built, and why. We are an inclusive community - whatever kind of institution or country you hail from, if you are passionate about software and technology you are welcome!
phenoACCESS-24: Workshop on Research Computing and Plant Phenotyping
High-throughput plant phenotyping is computationally intensive, requiring data storage, data processing and analysis, research computing expertise, and mechanisms for data sharing. This workshop is aimed at research computing workforce development by addressing questions such as what is plant phenotyping; what types of data are collected; what are the preprocessing and analytical needs; what tools and platforms exist for data capture, management, analysis, and storage; and how best to collaborate and engage with phenotyping researchers. The full-day agenda will include speakers (scientists and research compute staff); panel discussions (how to work with research computing staff and facilities; how to engage with phenotyping scientists), and networking opportunities (meet-and-greet, ice breakers, small group discussions). The videos and slide decks for the talks are included on the linked page.
Discover Data Science is all about making connections between prospective students and educational opportunities in an exciting new, hot, and growing field – data science.
Backed by collegiate white papers, top industry professionals, and researchers, The Plank Center’s Mentorship Guide offers basic tips and tricks on how to get the most out of a mentorship relationship. This easy-to-follow guide supplements mentorship programs, lesson plans, and professional relationships.
Python has become a very popular programming language and software ecosystem for work in Data Science, integrating support for data access, data processing, modeling, machine learning, and visualization. In this webinar, we will describe some of the key Python packages that have been developed to support that work, and highlight some of their capabilities. This webinar will also serve as an introduction and overview of topics addressed in two Cornell Virtual Workshop tutorials, available at https://cvw.cac.cornell.edu/pydatasci1 and https://cvw.cac.cornell.edu/pydatasci2
An ongoing collection of RSE training material, workshops, and resources. We are compiling this list as a starting point for future activities. We are especially seeking material that goes beyond basic research computing competency (e.g. what The Carpentries does so well) and is general enough to span multiple domains. Specific tools and technologies used only in one domain, or applicable to only one subset of computing (i.e. HPC) are typically too narrowly focused. When in doubt, submit it to be included or reach out and we’d be happy to discuss.
The CyberAmbassadors project was funded through a workforce development grant from the National Science Foundation (Award #1730137). Starting in 2017, the initial focus of this project was to develop, test, and refine new curriculum to help CyberInfrastructure (CI) Professionals strengthen their communications, teamwork and leadership skills. With support and collaboration from a number of academic and professional organizations, the CyberAmbassadors project was expanded to offer professional skills training to college students and professionals working across STEM (science, technology, engineering, math) disciplines.
A book for researchers who contribute code to R projects: This booklet is the result of my work with the Social Cognition for Social Justice lab. It was developed in response to questions I was getting from students; both grad students that were making software design decisions, and undergraduates who were using things like version control for the first time. Although many tutorials and resources exist for these topics, there was not a single source that I thought covered just enough material to build up to the workflow used by the lab without extraneous detail.