The volume of text data is increasing tremendously across multiple disciplines.
It has become necessary to develop easy-to-use research frameworks that draw on high-performance computing
(HPC) capabilities for research with text data, because analyzing even medium-sized text datasets is nearly
impossible on a typical personal computer. For example, an attempt to run sentiment analysis algorithms on a social
media text data file with just 100,000 records would fail on a computer with 16 GB or less of RAM.
Such frameworks need to be beginner-friendly and user-friendly, and need to be customized to Rutgers’
computing environments to benefit researchers, faculty, students, and other users and stakeholders. This
will empower all relevant users to focus on the core aspects of their research rather than struggle with
HPC-related technological challenges.
To put this concept into practice at Rutgers University, we propose developing standardized
processes for basic multidisciplinary natural language processing (NLP) analyses to support both beginners
and current users of the Amarel system.
Our work will focus on preparing Jupyter Notebooks in Python for textual data analyses, NLP, and textual
data visualization. We anticipate producing materials that will help researchers at Rutgers.
Project Information Subsection
Jupyter Notebooks in Python for textual data analyses, NLP and textual data visualization.
Some hands-on experience
CR-Rutgers
Yes
Already behind
3
Start date is flexible
6
Milestone Title: Launch Presentation
Milestone Description: Give a launch presentation during the March CAREERS Monthly meeting.
Completion Date Goal: 2024-03-13

Milestone Title: Gathering Datasets
Milestone Description: Prepare text datasets, establish and preliminary training.
Completion Date Goal: 2024-03-10

Milestone Title: Putting the code together
Milestone Description: Experiment with code, process and datasets.
Completion Date Goal: 2024-04-12

Milestone Title: Creating the Notebooks
Milestone Description: Create scripts, standardized notebooks using NLP methods.
Completion Date Goal: 2024-05-15

Milestone Title: Finalize materials
Milestone Description: Finalize slides, notebooks and documentation.
Completion Date Goal: 2024-06-12

Milestone Title: Wrap Presentation
Milestone Description: Give a wrap presentation at the June CAREERS monthly meeting.
Completion Date Goal: 2024-07-10
The student will gain familiarity with Rutgers' HPC system, Amarel, and understand how to run NLP analysis using Amarel.
Access to the Amarel cluster, Rutgers' HPC system.