These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators) that are participating in programs such as this one, that are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!
This workshop will go into the different ways python packages can be managed in a cluster environment using conda and python virtual environments both in batch mode from the command line and with Jupyter Notebooks and Jupyter Lab on the cluster. The examples will be run on the GMU HOPPER Cluster.
This documentation contains introductory material on Python Programming for Digital Humanities and Computational Research. This can be a go-to material for a beginner trying to learn Python programming and for anyone wanting a Python refresher.
This Udacity article listed the most frequently used R packages for data science and statistics. For each package, the article provided the link to its official documentation. It will be a great start point if you want to start your data science journey in R.
Gaussian 16 is a computational chemistry package that is used in predicting molecular properties and understanding molecular behavior at a quantum mechanical level.
R for Data Science is a comprehensive resource for individuals looking to harness the power of the R programming language for data analysis, visualization, and statistical modeling. Whether you're a beginner or an experienced data scientist, this guide will help you unlock the full potential of R in the realm of data science.
A tutorial entitled "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev" presented at SciPy 2023 in Austin, TX. This tutorial is hosted in a series of Jupyter Notebooks which can be accessed in the click of a button using Binder. See the README for more information.
HPCwire is a prominent news and information source for the HPC community. Their website offers articles, analysis, and reports on HPC technologies, applications, and industry trends.
An ongoing collection of RSE training material, workshops, and resources. We are compiling this list as a starting point for future activities. We are especially seeking material that goes beyond basic research computing competency (e.g. what The Carpentries does so well) and is general enough to span multiple domains. Specific tools and technologies used only in one domain, or applicable to only one subset of computing (i.e. HPC) are typically too narrowly focused. When in doubt, submit it to be included or reach out and we’d be happy to discuss.
MOPAC (Molecular Orbital PACkage) is a semi-empirical quantum chemistry package used to compute molecular properties and structures by using approximations of the Schrödinger equation. This tutorial explains the process of using MOPAC for different forms of calculations.
Samtools is a suite of programs for interacting with high-throughput sequencing data, especially in the SAM/BAM format. It offers various utilities for processing, analyzing, and managing sequence data generated from next-generation sequencing (NGS) experiments. Samtools is widely used in bioinformatics and genomics research for tasks such as read alignment, variant calling, and data manipulation.
This repository contains information about Jupyter Widgets and how they can be used to develop interactive workflows, data dashboards, and web applications that can be run on HPC systems and science gateways. Easy to build web applications are not only useful for scientists. They can also be used by software engineers and system admins who want to quickly create tools tools for file management and more!
CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a widely distributed molecular simulation program with a broad array of applications. CHARMM has the capabilities to setup and run simulations on both biological and materials systems, contains a comprehensive set of analysis and tools, and has high performance on a variety of platforms. Here you will find links to the CHARMM website, forum, and registration/download page.
Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.
Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.
MDAnalysis is a python based library of tools for the analysis of molecular dynamics simulations. It is able to read and write many popular simulation formats including CHARMM, LAMMPS, GROMACS, and AMBER and more. This link contains the documentation pages of all MDAnalysis functions and has links to tutorials using Jupyter Notebooks.
This webinar series is an orientation to R. We start with an overview of R’s history and place in the larger data science ecosystem. Next, we introduce the R Studio user interface and how to access R’s excellent documentation. Finally, we present the fundamental concepts you need to use the R environment and language for data analysis. Along the way, we compare R script files (.R) to R Notebook (.Rmd) files and show how the features of R Notebook support better communication and encourage more dynamic engagement with statistical analysis and code. It is helpful to be familiar with tabular data analysis using statistical software, database tools, or spreadsheet programs.
Workshop materials, including setup directions and slides are available at https://github.com/CornellCAC/r_for_edu/ The Rstudio Cloud project used in the workshop is https://rstudio.cloud/project/4044219.
Numpy is a python package that leverages types and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and operations.
The research paper provides an overview of various datasets that have been used to study fairness in machine learning. It discusses the characteristics of these datasets, such as their size, diversity, and the fairness-related challenges they address. The paper also examines the different domains and applications covered by these datasets.
Astropy is a community-driven package that offers core functionalities needed for astrophysical computations and data analysis. From coordinate transformations to time and date handling, unit conversions, and cosmological calculations, Astropy ensures that astronomers can focus on their research without getting bogged down by the intricacies of programming. This guide walks you through practical usage of astropy from CCD data reduction to computing galactic orbits of stars.