These resources have been contributed and “vetted” by the community of cyberinfrastructure professionals (researchers, research computing facilitators, research software engineers and HPC system administrators) that are participating in programs such as this one, that are supported by the ConnectCI community management platform. Additional Knowledge Base Resources are always welcome!
Samtools is a suite of programs for interacting with high-throughput sequencing data, especially in the SAM/BAM format. It offers various utilities for processing, analyzing, and managing sequence data generated from next-generation sequencing (NGS) experiments. Samtools is widely used in bioinformatics and genomics research for tasks such as read alignment, variant calling, and data manipulation.
This slices and videos introduced how to use K-Nearest-Neighbors method to impute climate data and how to use Bayesian Spatio-Temporal models in R-INLA to impute mortality data. The demos will be added soon.
Snakemake is a powerful and versatile workflow management system that simplifies the creation, execution, and management of data analysis pipelines. It uses a user-friendly, Python-based language to define workflows, making it particularly valuable for automating and reproducibly managing complex computational tasks in research and data analysis.
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. It lets you store data in easy to manage and display data frames, with column names and datatypes.
Scikit-learn is free software machine learning library for Python. It has a variety of features you can use on data, from linear regression classifiers to xg-boost and random forests. It is very useful when you want to analyze small parts of data quickly.
Numpy is a python package that leverages types and compiled C code to make many math operations in Python efficient. It is especially useful for matrix manipulation and operations.