Use PACER datasets to collect bankruptcy filings and identify cases filed for business reasons. Then design a web scraper to collect petitioner names and other information from the bankruptcy petitions, and develop a text-analysis routine that analyzes the names and classifies the filings by gender.
Project Information Subsection
Penn State University
CR-Penn State
Already behind
3
Start date is flexible
Milestone Title: Collect Bankruptcy Data
Milestone Description: Write code to obtain bankruptcy case filings from 2008 to the present from the Federal Judicial Center website. Organize the available data elements in a SAS dataset. Produce summary statistics of the data where applicable. Codify textual categorical responses where possible for further data analysis. Validate the data and ensure proper formatting.
Completion Date Goal: 2024-05-10
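The codification and summary steps in this milestone can be sketched as follows. This is a minimal Python illustration rather than the project's SAS code; the column names (`case_id`, `filing_year`, `nature_of_debt`), the code book, and the sample rows are hypothetical placeholders, and the real field names and values would come from the Federal Judicial Center documentation.

```python
import csv
import io
from collections import Counter

# Hypothetical code book mapping a textual response to an integer code.
NATURE_OF_DEBT_CODES = {"business": 1, "consumer": 2}


def codify(rows, field, codebook, missing=-1):
    """Return copies of rows with an added '<field>_code' integer column.

    Values are normalized (trimmed, lowercased) before lookup; anything
    not in the code book, including blanks, gets the `missing` code.
    """
    coded = []
    for row in rows:
        key = (row.get(field) or "").strip().lower()
        new_row = dict(row)
        new_row[field + "_code"] = codebook.get(key, missing)
        coded.append(new_row)
    return coded


def summarize(rows, field):
    """Frequency count of one field, a basic summary statistic."""
    return Counter(row[field] for row in rows)


# Placeholder rows standing in for a downloaded extract (not real cases).
SAMPLE = """case_id,filing_year,nature_of_debt
A1,2008,Business
A2,2009,consumer
A3,2009, BUSINESS
A4,2010,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
coded = codify(rows, "nature_of_debt", NATURE_OF_DEBT_CODES)
counts = summarize(coded, "nature_of_debt_code")
```

Routing unrecognized or blank responses to an explicit missing code keeps them visible in the frequency counts, which doubles as a simple validation check.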
Milestone Title: Develop NLP code to classify petitions by gender
Milestone Description: Obtain the list of names from the dataset produced in the first milestone. Develop code to scrape the web for gender identifiers associated with the names. Use NLP logic to associate names with gender. Assign a gender identifier to each bankruptcy filing based on the petitioner's name.
Completion Date Goal: 2024-06-10
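One possible shape for the name-to-gender step is sketched below in Python. It assumes the scraped gender identifiers have already been reduced to per-name counts; the `NAME_COUNTS` table holds made-up illustrative numbers, and the 0.9 threshold, the function names, and the output labels are all assumptions, not part of the project specification.

```python
import re

# Hypothetical (female, male) counts per first name. Illustrative only:
# a real table would be built from the scraped gender identifiers.
NAME_COUNTS = {
    "mary": (990, 10),
    "james": (5, 995),
    "jordan": (400, 600),
}


def first_name(petitioner):
    """Extract a lowercase first name from 'Last, First M.' or 'First M. Last'."""
    text = petitioner.strip()
    if "," in text:
        # 'Last, First Middle' format: the first name follows the comma.
        text = text.split(",", 1)[1]
    tokens = re.findall(r"[A-Za-z]+", text)
    return tokens[0].lower() if tokens else ""


def classify(petitioner, threshold=0.9):
    """Label a petitioner 'female', 'male', 'ambiguous', or 'unknown'."""
    counts = NAME_COUNTS.get(first_name(petitioner))
    if counts is None:
        return "unknown"  # name not in the table, e.g. a business entity
    female, male = counts
    share_female = female / (female + male)
    if share_female >= threshold:
        return "female"
    if share_female <= 1 - threshold:
        return "male"
    return "ambiguous"
```

For example, `classify("Smith, Mary A.")` yields `"female"` under these placeholder counts, while a name absent from the table, such as a company name, falls through to `"unknown"`. Keeping separate "ambiguous" and "unknown" labels lets the later analysis report how many filings could not be assigned.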
Milestone Title: Assist with statistical analysis
Milestone Description: Add local economic data and run regressions in SAS to explore the gender-based distribution of bankruptcy filings.
Completion Date Goal: 2024-07-10
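Adding local economic data amounts to a keyed merge before any regression is run. The Python sketch below illustrates the idea only; the join keys (`county`, `year`), the `unemployment_rate` field, and all values are hypothetical placeholders, and the project itself would perform this step in SAS.

```python
def attach_local_data(filings, local_data, keys=("county", "year")):
    """Left-join local indicators onto filings.

    Every filing is kept; filings with no matching local record get None
    for each indicator so gaps stay visible before modeling.
    """
    index = {tuple(row[k] for k in keys): row for row in local_data}
    extra_fields = sorted(
        {field for row in local_data for field in row if field not in keys}
    )
    merged = []
    for filing in filings:
        match = index.get(tuple(filing[k] for k in keys), {})
        out = dict(filing)
        for field in extra_fields:
            out[field] = match.get(field)
        merged.append(out)
    return merged


# Placeholder records, not real data.
filings = [
    {"case_id": "A1", "county": "county_a", "year": 2009, "gender": "female"},
    {"case_id": "A2", "county": "county_b", "year": 2010, "gender": "male"},
]
local_data = [{"county": "county_a", "year": 2009, "unemployment_rate": 8.1}]

merged = attach_local_data(filings, local_data)
```

A left join is used so that no filing is silently dropped when a locality lacks economic data.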