This project trains GloVe on two corpora, one general-purpose and one drawn from finance, to produce two sets of word vectors. GloVe is an unsupervised learning algorithm for obtaining vector representations of words. The work involves extracting text from two sets of documents, building the two corpora, and training GloVe on each corpus to generate the vector representations. The two sets of vectors will then be compared to analyze how a domain-specific corpus affects the resulting representations.
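One way the final comparison step might look is sketched below. It assumes the trained vectors are available in GloVe's plain-text output format (one word per line followed by space-separated floats) and measures how much a word's nearest neighbors differ between the two vector sets. The function names, the toy vectors, and the choice of neighbor overlap as the comparison metric are illustrative assumptions, not part of the project plan.

```python
import math


def load_vectors(lines):
    """Parse GloVe-style text lines: a word followed by space-separated floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(vectors, word, k=5):
    """Return the k words most similar to `word` by cosine similarity."""
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def neighbor_overlap(general, finance, word, k=5):
    """Jaccard overlap of a word's top-k neighbors in the two vector sets.

    This metric is an assumption chosen for illustration: 1.0 means the
    neighborhoods are identical, 0.0 means they share no words.
    """
    a = set(nearest_neighbors(general, word, k))
    b = set(nearest_neighbors(finance, word, k))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy two-dimensional vectors standing in for real GloVe output (hypothetical data).
general = load_vectors(["bank 1.0 0.0", "river 0.9 0.1", "loan 0.1 0.9"])
finance = load_vectors(["bank 0.0 1.0", "river 0.9 0.1", "loan 0.1 0.9"])

print(nearest_neighbors(general, "bank", k=1))        # ['river']
print(nearest_neighbors(finance, "bank", k=1))        # ['loan']
print(neighbor_overlap(general, finance, "bank", k=1))  # 0.0
```

With real data, `load_vectors` could be passed an open file handle for each trained vector file, and the overlap could be averaged over a list of finance-related terms to summarize the effect of the domain corpus.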
The project requires storage space for the large corpora and computing power to train GloVe on them. A computing platform such as URI’s HPC or MGHPCC will be used for these tasks. The student facilitator will help the project PI set up the computational workflow in an HPC environment, i.e., develop and test the job submission scripts and properly set up the required software and data on the chosen computational resource.