Clear Skies
I have worked over the past year or so to speed up routine reporting tasks with rudimentary code. Python gave me the most success, and I still rely heavily on Excel formulas. Argus prefers the statistics language R.
R easily chews through the massive sheets of data I review as an energy reporter. But coaxing conclusions back out of that maw has proven a much finickier conversation than with Python, and certainly harder than with the formulas I cobbled together over a decade in Excel. Still, I try to figure out new problems in R first as often as possible when coding for work.
The collapse of the air transportation industry offers one of the clearest illustrations of economic devastation wrought by global efforts to mitigate COVID-19. Major US air carriers slashed their capacity in March and April with dizzying speed. But without access to data about scheduled flights, it was tough to get a sense of the drop.
The US Transportation Security Administration began posting data about daily passenger screenings in March. The numbers popped up in a simple HTML table updated every morning with the previous day’s total. Nice! This simple year-over-year comparison is very helpful for several reasons.
First, it’s much more timely than the information government agencies usually make available; you can wait weeks, if not months, for this kind of data to go public. Second, it’s published with context that illustrates how the current data is changing. Sure, it’d be great to see a few years of the same data instead of just one. But year-over-year data still means we can draw some very apples-to-apples comparisons about the current year by setting it against the same travel season last year. One additional helpful bit is that TSA compares each day to the same day of the week in 2019, not simply the exact same date. This matters because, in normal circumstances, travelers favor specific days of the week. Simply comparing 12 April 2020 to 12 April 2019 would muddy the comparison by mixing different days of the week.
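A quick check in R bears this out. This snippet is just for illustration, not part of the scraping script; it shows the same calendar date fell on different weekdays in the two years:
weekdays(as.Date(c("2020-04-12", "2019-04-12")))
# [1] "Sunday" "Friday"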
This is useful data. And because it’s in a simple table, it is very easy to grab with a script, a task called scraping.
R has a package called rvest that will help make this easy. I’m also going to call in some tools from a huge and useful library called tidyverse. Tidyverse also has tools to help the computer understand the dates in the first column, but at the time I wrote this I was using a package called anytime to process those. Finally, I’ll use ggplot2 to show me what’s happening over time.
# load the packages: rvest for scraping, anytime for date parsing,
# tidyverse and ggplot2 for data wrangling and charts
# (anything missing can be installed with install.packages())
library(rvest)
library(anytime)
library(tidyverse)
library(ggplot2)
First, I need to tell my script what to grab. Obviously, it’s going to need the website’s URL. I also need to tell it where on the webpage it should look.
I’m going to do this with the table’s XPath, which tells my script how to navigate the website. It’s the computer equivalent of telling it to look at the table below the masthead. XPath is simple to find: in most modern browsers you can right-click the table, choose Inspect, and copy the XPath straight out of the developer tools.
OK, so I can now describe through R how to find the table we want. I’m still learning rvest at this point, so for now I’m going to read the page with read_html() from the xml2 package and pull the table out with rvest’s html_table(). Why? Because I’m learning R pretty much the same way I learned higher-level Excel: Google the question and try the most recent answers. This is not a great way to go about it! R has changed a lot over the years. But it does let you eavesdrop on lots of conversations among people who struggled with these questions before you.
# fetch the latest screening numbers
url <- "https://www.tsa.gov/coronavirus/passenger-throughput"
TSA <- url %>%
  xml2::read_html() %>%
  html_nodes(xpath = '//*[@id="node-page-25164"]/div[1]/div/div/div/table') %>%
  html_table()
Alright! I parked the url in a variable called “url”. Then I told rvest to look at the code that makes up the TSA website, find the table, and scrape it into a variable called “TSA”. This comes out of rvest in a list format, and I want it in a dataframe, so I’ll need to deal with that next.
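If you’re curious what that list looks like before moving on, a quick sanity check (not part of the script proper) confirms rvest handed back a list holding a single data frame:
length(TSA)      # 1: the page has just one table
class(TSA[[1]])  # "data.frame"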
I’m going to create a dataframe called “passengers.” I’m going to label my columns of data and trim the old headings. I’m going to tell R to treat the text that writes out dates as dates that a computer understands. I’m going to make sure R knows that the rest of the numbers are, actually, numbers. Finally, I’m going to go ahead and create a column that tells me what the 2020 traffic represents as a percentage of last year’s traffic.
I’d use iterative processes called loops to do this in Python. You can do this in R, too — but I can’t, consistently. So this code does it the longer but clearer way.
# process the HTML table into a data table with numbers and a percentage calculation
passengers <- TSA[[1]]                    # pull the lone data frame out of the list
colnames(passengers) <- c("Date", "Current", "Year-ago")
passengers <- passengers[-1, ]            # trim the old heading row
passengers$Date <- anydate(passengers$Date)
passengers$Current <- as.numeric(gsub(",", "", passengers$Current))
passengers$`Year-ago` <- as.numeric(gsub(",", "", passengers$`Year-ago`))
passengers <- mutate(passengers,
                     pc = round(Current / `Year-ago` * 100, 1))
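For what it’s worth, the loop approach I’d reach for in Python translates to R roughly like this. It’s a sketch of the alternative, not the code I actually ran, and it assumes the same two count columns:
# clean both count columns in one pass instead of line by line
for (col in c("Current", "Year-ago")) {
  passengers[[col]] <- as.numeric(gsub(",", "", passengers[[col]]))
}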
There’s one more thing I want to do, right off the bat. This TSA data kind of appeared from nowhere: it showed up in March, backdated to the first of the month. That tells me it could disappear at any time, too. So I’m going to write a copy of that data onto my computer for later, just in case.
write.csv(passengers,"<not a real path posted to the internet>tsa_screen.csv", row.names = FALSE)
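If I wanted to be extra careful, a small variation would stamp each backup with the date it was saved, so no snapshot ever gets overwritten. A sketch, with the file name as a placeholder:
# hypothetical variant: one dated snapshot per day
write.csv(passengers,
          paste0("tsa_screen_", Sys.Date(), ".csv"),
          row.names = FALSE)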
Now for where R shines: instant graphs! Let’s take a look at what the data looks like over time. I’m going to tell R to gather my dataframe into a tall format so I can plot the two years against each other, leaving the percentage column and the dates out of the reshaping. Then I’m going to do some quality-of-life bits, like telling the scale to include commas, giving the chart a title, and making sure I cite my source.
# plot the current and year-ago screenings
passengers %>%
  gather(key = "When", value = "passengers", -Date, -pc) %>%
  ggplot(aes(x = Date, y = passengers, col = When)) +
  geom_line(size = 1.2) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "TSA Passengers Screened",
       caption = "Source: Transportation Security Administration",
       x = NULL,
       y = NULL) +
  theme(plot.title = element_text(hjust = 0.5))
Ouch.
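If I want to keep a copy of the chart itself, ggplot2’s ggsave() will write the most recently displayed plot to disk. The file name and dimensions here are just examples:
# save the latest plot; ggsave() defaults to the last plot displayed
ggsave("tsa_screenings.png", width = 8, height = 5)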
I need one last thing, and it’s the bit that I struggle with in R the most. The plot is great for showing the story at a glance. But I’m a writer. I need the numbers R found. So for want of a pretty way to do this, I’m just going to tell it to show me the latest day of data.
# What is the latest number of passengers screened?
passengers[1,]
        Date Current Year-ago  pc
1 2020-04-11   93645  2059142 4.5
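For what it’s worth, a slightly prettier readout is possible by having R draft the sentence itself with sprintf(). This is a sketch, and it assumes the latest day sits in row 1, as it does on the TSA page:
# build a writer-friendly sentence from the newest row
latest <- passengers[1, ]
sprintf("On %s, TSA screened %s passengers, %.1f%% of the year-ago figure.",
        format(latest$Date, "%d %B %Y"),
        format(latest$Current, big.mark = ","),
        latest$pc)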
Yesterday, the Saturday of Easter weekend, checkpoints saw just 4.5 percent of the travelers screened on the comparable Saturday last year. An incredible collapse.
And now that I’ve written the script, my computer will check and record this information in the background while showing me the latest update on screenings. No more visiting the website, and no risk of transcription errors or botched arithmetic.