Chapter 2 Proposal
2.1 Research topic
In September 2022, the World Health Organization announced that the end of the Covid-19 pandemic was in sight. This was great news for the United States and the rest of the world. However, the pandemic has already lasted three years and has had a significant impact on the economy, culture, and society. Most importantly, many people’s daily lives have been severely disrupted. Beyond its effects on social life and physical activity, the pandemic has cost many people their jobs. For this reason, our group is interested in the unemployment situation in the U.S. after the Covid-19 outbreak, especially in the recent period from September 2021 to September 2022. Several questions came to mind. How did the unemployment situation change after the outbreak? Can we find any pattern over time? How did economic conditions influence unemployment rates in the U.S.? Was state location a vital factor in the unemployment rate? Did factors such as race and age affect a person’s employment status? In this project, we intend to gain some insight into these questions by conducting a series of exploratory data analyses and visualizations.
We collected our data from the U.S. Bureau of Labor Statistics (BLS) website according to our interests. The BLS is the principal fact-finding agency for the Federal government in the broad fields of labor economics and statistics. All data on the website are public, so we may use them legally.
2.2 Data availability
To investigate unemployment conditions in the United States, we have decided to use data from the U.S. Bureau of Labor Statistics and the U.S. Bureau of Economic Analysis. We chose these sources because they are official U.S. government agencies that publish authoritative data, which will allow us to draw more convincing conclusions. More details can be found in the [Documentation of Unemployment Situation](https://www.bls.gov/news.release/pdf/empsit.pdf).
Most of our data can be downloaded directly as Excel files. Some must be downloaded as HTML files, from which we will extract the relevant information using the ‘rvest’ package in R. If we have any questions regarding the data, we will attempt to contact the U.S. Bureau of Labor Statistics and the U.S. Bureau of Economic Analysis via email, Twitter, or LinkedIn. The data are fairly accurate, although they may suffer from the sampling bias inherent in surveys.
Below are the five aspects we intend to explore and the corresponding data sources for our project.
2.2.1 Unemployment Situation by Month
First, we intend to visualize the seasonally adjusted unemployed population and unemployment rate for each month in the United States from September 2021 to September 2022. The data come from the Current Population Survey (CPS), a household survey about the employment situation that the U.S. Census Bureau administers to a sample of 60,000 households; the results are provided by the U.S. Bureau of Labor Statistics. The historical data are updated monthly and can be retrieved from the [Labor Force Statistics (CPS) Database](https://www.bls.gov/webapps/legacy/cpsatab1.htm). This database contains employment and unemployment data for the total civilian population, men aged 16 and over, women aged 16 and over, men aged 20 and over, women aged 20 and over, and both sexes aged 16 to 19. Under each category, employment metrics such as the employment-population ratio, unemployed population, and unemployment rate are listed. In this task, we will focus only on the unemployed population and unemployment rate for the total civilian population. We will use seasonally adjusted data to remove the influence of predictable seasonal patterns and obtain a more meaningful metric. By selecting these options and clicking the “Retrieve Data” button, we will get two tables: one for the unemployed population and one for the unemployment rate of the total civilian population. Each table records continuous numerical data from January 2012 to September 2022. We will download the two tables from the website as Excel files, import them into RStudio, and use the “dplyr” package in R to keep the data between September 2021 and September 2022. If we have any questions, we will attempt to contact the U.S. Bureau of Labor Statistics via email or social media such as Twitter.
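The date filter described above can be sketched as follows. This is a minimal illustration, assuming the imported table has already been arranged as one row per month; the data frame `unemp` and its `value` column are hypothetical placeholders, and the values are synthetic rather than actual BLS figures.

```r
library(dplyr)

# Placeholder series covering January 2012 to September 2022.
# `value` is synthetic and only stands in for the imported BLS column.
dates <- seq(as.Date("2012-01-01"), as.Date("2022-09-01"), by = "month")
unemp <- data.frame(date = dates, value = seq_along(dates))

# Keep only the study window, September 2021 through September 2022.
study_window <- unemp %>%
  filter(date >= as.Date("2021-09-01"), date <= as.Date("2022-09-01"))

nrow(study_window)  # one row per month in the window
```

Filtering on a proper `Date` column rather than on separate year and month fields keeps the window condition to a single comparison.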
2.2.2 Unemployment Rates by State
Second, we plan to investigate the geographical distribution of unemployment in the United States from September 2021 to September 2022. Specifically, we will visualize each state’s seasonally adjusted unemployment rate, provided by the Local Area Unemployment Statistics (LAUS) program of the U.S. Bureau of Labor Statistics. The state unemployment rates are model-based estimates that are controlled in real time to estimates from the Current Population Survey (CPS), the source of the national unemployment rate. These models reflect current and historical data from the CPS, the Current Employment Statistics (CES) survey, and state unemployment insurance (UI) systems. The unemployment rates are updated monthly, and the data can be accessed from the [Local Area Unemployment Statistics Map](https://data.bls.gov/lausmap/showMap.jsp;jsessionid=26EFE5EB9A9FF91E84278E09CECCDA06._t3_08v). Each page presents the state unemployment rates for a specific month. By selecting different years and months, we can access the historical unemployment data. The monthly unemployment data form a two-column table. The first column records the name of each state, which is categorical data, and the second records the unemployment rate for that month, which is continuous numerical data. We plan to use the “rvest” package in R to scrape the unemployment information from the table. We will first download the page for each month and then process the pages one by one in R. If we have any questions, we will contact the U.S. Bureau of Labor Statistics via email or social media such as Twitter.
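The scraping step described above might look like the sketch below. It is only an illustration under stated assumptions: the inline HTML, the `rates` table id, and the state names and values are hypothetical stand-ins for a saved monthly page, whose real markup is likely more complex. For a saved file, `read_html()` with the file path would take the place of `minimal_html()`.

```r
library(rvest)

# Hypothetical stand-in for one saved monthly page (not the real LAUS markup).
page <- minimal_html('
  <table id="rates">
    <tr><th>State</th><th>Rate</th></tr>
    <tr><td>State A</td><td>3.1</td></tr>
    <tr><td>State B</td><td>4.2</td></tr>
  </table>')

# Select the table and convert it to a data frame;
# the header row becomes the column names.
state_rates <- page %>%
  html_element("table#rates") %>%
  html_table()

# Tag every row with the month this page refers to,
# so that the monthly tables can later be stacked.
state_rates$month <- as.Date("2022-09-01")
```

Adding the month as a column at this stage means the per-month tables can be combined into one long table without losing track of which page each row came from.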
2.2.3 Unemployment Situation by Race
Third, we plan to investigate the unemployment status of the civilian population by race (i.e., White, Black or African American, and Asian) from September 2021 to September 2022. Specifically, we will focus on the seasonally adjusted unemployed population and unemployment rate for each race. The data are collected through the Current Population Survey (CPS) and provided by the U.S. Bureau of Labor Statistics. They can be retrieved from the [Labor Force Statistics (CPS) Database](https://www.bls.gov/webapps/legacy/cpsatab2.htm), which contains employment data for White, Black or African American, and Asian people. Under each race, employment status data such as the employed population, unemployed population, and unemployment rate are provided for the total civilian population, men aged 20 and over, women aged 20 and over, and both sexes aged 16 to 19. Here we will select only the seasonally adjusted unemployed population and unemployment rate for the total civilian population of each race. This gives us six tables: “Unemployment Level-White”, “Unemployment Rates-White”, “Unemployment Level-Black or African American”, “Unemployment Rates-Black or African American”, “Unemployment Level-Asian”, and “Unemployment Rates-Asian”. The rows of each table are the years from 2012 to 2022, and the columns are the months from January to December. Each table records continuous numerical data for one metric. Each table can be downloaded directly as an Excel file; we will import the six files into RStudio and keep the data between September 2021 and September 2022. For more information, we will contact the U.S. Bureau of Labor Statistics via email or social media such as Twitter.
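Because each of these tables has years as rows and months as columns, it will probably need to be reshaped before filtering. The sketch below shows one possible way to do this with “tidyr” and “dplyr”. It assumes the month columns are named `Jan` to `Dec` and the year column is named `Year`, which may not match the downloaded files exactly; the values are synthetic, and the `race` and `metric` labels are added by hand for illustration.

```r
library(dplyr)
library(tidyr)

# Placeholder for one downloaded table: one row per year, one column per month.
# Values are synthetic, not actual BLS figures.
wide <- data.frame(Year = 2021:2022)
for (m in month.abb) wide[[m]] <- seq_len(nrow(wide))

long <- wide %>%
  # One row per year-month pair instead of one column per month.
  pivot_longer(-Year, names_to = "month", values_to = "value") %>%
  mutate(
    date   = as.Date(sprintf("%d-%02d-01", Year, match(month, month.abb))),
    race   = "White",              # hypothetical label for this table
    metric = "Unemployment Rate"   # hypothetical label for this table
  ) %>%
  filter(date >= as.Date("2021-09-01"), date <= as.Date("2022-09-01"))
```

Once all six tables are in this long form with their own `race` and `metric` labels, they could be stacked with `bind_rows()` into a single table for plotting.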
2.2.4 Unemployment Situation by Age
Fourth, we want to investigate the unemployment status of the civilian population by age from September 2021 to September 2022. Specifically, we would like to visualize the seasonally adjusted unemployed population in each age interval, provided by the Division of Labor Force Statistics of the Bureau of Labor Statistics. The figures are estimated from the household survey conducted for the Bureau of Labor Statistics, which is based on a sample of 60,000 households drawn by the U.S. Census Bureau. The unemployment data by age can be retrieved from the [Labor Force Statistics (CPS) Database](https://www.bls.gov/news.release/empsit.t09.htm). The data show the estimated unemployed population in the U.S. for different age intervals and are further separated into two tables by gender. Each table uses six age intervals: 16 to 17 years, 18 to 19 years, 25 to 34 years, 35 to 44 years, 45 to 54 years, and 55 years and over. Within each table, the data are split into two parts: not seasonally adjusted and seasonally adjusted. Because unemployment is influenced by the season, seasonally adjusted data help us better understand the influence of the economy on the labor market. We may use the web-scraping package “rvest” in R, as we did in PSet2, to import the data from September 2021 to September 2022. For more information on unemployment by age, we can contact the Division of Labor Force Statistics at the Bureau of Labor Statistics by email or phone. The data quality is quite high; the only issue is that sampling error may partly affect the accuracy of the estimates.
2.2.5 GDP and CPI by Region
Finally, we will investigate regional GDP and CPI data across the United States from September 2021 to September 2022. We plan to display these geospatial data on a map showing the percentage changes of the two indicators, which will help us better understand the unemployment situation in its economic context in the post-pandemic period. The GDP data are maintained by the Bureau of Economic Analysis (BEA), and the CPI data are maintained by the U.S. Bureau of Labor Statistics (BLS). The GDP data are compiled by the Regional Product Division of the BEA from the Department of Agriculture, the Census Bureau, and other federal agencies, based on indicators such as income for the services-producing industries. The CPI is calculated by the BLS by sampling the prices of goods from 6,000 households and 22,000 retail stores each month in 75 major regions of the U.S. The GDP data are updated quarterly and can be accessed from the [Gross Domestic Product Dataset](https://www.bea.gov/data/gdp/gdp-state). The CPI data are monthly and can be accessed from the [Geographic Information of Consumer Price Index](https://www.bls.gov/regions/subjects/consumer-price-indexes.htm). The GDP data show the seasonally adjusted percentage change of GDP from Q1 2022 to Q2 2022 for different industries, based on about 50 major cities across the U.S. Each table of the CPI dataset covers one region, shows the one-month and twelve-month percentage changes in the CPI for all items, and has monthly data from January 2018 to September 2022. Detailed CPI index data are also available for different categories of goods. Both the GDP and CPI data are rendered as tables on the websites, and we may use the web-scraping package “rvest” in R, as we did in PSet2, to import them. For GDP data, we can contact the Bureau of Economic Analysis, an agency of the Department of Commerce, by email for more information. For CPI data, we can contact the Division of Consumer Prices and Price Indexes at the Bureau of Labor Statistics by email or phone.
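To make the one-month and twelve-month percentage changes concrete, the sketch below derives both from a monthly index series. It is an illustration only: the `cpi` data frame and its `index` column are hypothetical, the index values are synthetic (a constant 0.5% monthly increase), and the published BLS figures may differ slightly from such a calculation because of rounding.

```r
library(dplyr)

# Synthetic monthly index for one region; not actual CPI values.
cpi <- data.frame(
  date  = seq(as.Date("2021-01-01"), by = "month", length.out = 24),
  index = 100 * 1.005^(0:23)
)

cpi_changes <- cpi %>%
  arrange(date) %>%
  mutate(
    # Change relative to the previous month, in percent.
    pct_1m  = 100 * (index / dplyr::lag(index, 1) - 1),
    # Change relative to the same month one year earlier, in percent.
    pct_12m = 100 * (index / dplyr::lag(index, 12) - 1)
  )
```

The first month has no one-month change and the first twelve months have no twelve-month change, so those entries are `NA`.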