Federal Reserve Scrapings


Daniel

Feb 16, 2012, 3:05:15 AM
to ScraperWiki
I'm not a programmer, but I'm looking for someone who might be able to
write some code that would quickly download documents from the Federal
Reserve, extract the text and create ngrams, as well as measure the
total amount of communication. The documents are available at
http://federalreserve.gov/newsevents/default.htm. I'm working on a
project that would focus on the level of Fed communication before,
during and after the financial crisis. If you're interested, please
reply, and I can provide further details about what I'm looking for
and what I'd like to be able to do with the information.


I'm working on my master's thesis on the topic of transparency in
monetary policy. Specifically I want to look at the information
asymmetries between the Federal Reserve, Congress and the financial
markets. Since Congress and financial markets have different
incentives to learn about monetary policy, the Federal Reserve has to
adjust its communication strategy to avoid creating confusion. I'm
working on a simple model to show how these asymmetries were
exacerbated not only by the financial crisis, but also by the Fed's
unconventional response to the crisis. I need the data in order to
test my model and to see how the Fed has changed its communication in
recent years. I plan to combine the Fed communication data with data
from capitolwords.org to compare the timing of Fed communication and
Congressional concern with topics like inflation.


Here are the details of what I'm looking for. Federal Reserve
communication (press releases, Congressional testimony, speeches) is
just one part of transparency, and that's what I really want to focus
on. I need time series data from at least 2004 on the words that the
Fed has released in these documents. I would like to be able to look
at total words by document type, as well as ngrams for specific words
and phrases.

It would take me a very long time to do this by downloading each file
individually, extracting the text from the pdfs, feeding those files
into a word counter to get a spreadsheet, combining the spreadsheets,
and only then having something that I can work with in Stata.
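
A minimal sketch of the last two steps described above (counting words and combining the results into one spreadsheet), assuming the text has already been downloaded and extracted. The `(date, doc_type, text)` tuple layout, the category labels, and the sample sentences are hypothetical placeholders, not real Fed documents.

```python
import csv
import io
import re
from collections import Counter

# Simple tokenizer: runs of letters and apostrophes count as one word.
WORD_RE = re.compile(r"[A-Za-z']+")

def count_words(text):
    """Return the number of word tokens in a document's text."""
    return len(WORD_RE.findall(text))

def totals_by_type(docs):
    """Sum word counts per (year, doc_type) from (date, doc_type, text) tuples."""
    totals = Counter()
    for date, doc_type, text in docs:
        year = date[:4]  # assumes ISO dates such as "2008-03-16"
        totals[(year, doc_type)] += count_words(text)
    return totals

def write_totals_csv(totals, out):
    """Write one row per (year, doc_type) so Stata can import the file."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["year", "doc_type", "total_words"])
    for (year, doc_type), n in sorted(totals.items()):
        writer.writerow([year, doc_type, n])

# Hypothetical sample input standing in for extracted document text.
sample_docs = [
    ("2008-03-16", "press_release", "The Federal Reserve announced a new facility."),
    ("2008-09-15", "speech", "Liquidity conditions remain strained."),
    ("2008-10-01", "speech", "The Board is monitoring markets."),
]

totals = totals_by_type(sample_docs)
buf = io.StringIO()
write_totals_csv(totals, buf)
csv_text = buf.getvalue()
print(csv_text)
```

Writing to a file opened with `open("totals.csv", "w", newline="")` instead of the in-memory buffer would give a single CSV that Stata can read directly.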

If you could get me something along the lines of the capitolwords.org
setup, plus a way to look at total words and split the time series by
type of document that would be fantastic.

The document categories I'd like to be able to sub-divide the data
into are:
Press releases
Speeches
Testimony
Semi-annual Monetary Policy Report

The website is pretty simply structured. The documents are mostly
pdfs going back to 2008; prior to that, they are just .htm pages.

The ngrams will be helpful to highlight which specific Fed programs
were discussed when (TALF, TAF, PDCF, TSLF, etc.) as well as the
specific topics of the communications (housing bubble, Bear Stearns,
liquidity crunch, shadow banking, Lehman Brothers, etc.).
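
One way the program and topic tracking above might be sketched is to count a fixed list of terms per month. The term list comes from the paragraph above; the `(date, text)` input layout and the sample sentences are assumptions for illustration. Word boundaries matter here: without them, a case-insensitive search for "TAF" would also match inside "staff".

```python
import re
from collections import defaultdict

# Programs and topics named in the post; extend as needed.
TERMS = ["TALF", "TAF", "PDCF", "TSLF", "housing bubble",
         "Bear Stearns", "Lehman Brothers"]

def compile_term(term):
    """Build a case-insensitive, whole-word pattern for a term or phrase."""
    parts = [re.escape(p) for p in term.split()]
    # \s+ lets a phrase match across line breaks or repeated spaces.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def term_counts_by_month(docs, terms=TERMS):
    """Return {month: {term: count}} from (date, text) tuples."""
    patterns = {t: compile_term(t) for t in terms}
    counts = defaultdict(lambda: defaultdict(int))
    for date, text in docs:
        month = date[:7]  # assumes ISO dates such as "2008-03-17"
        for term, pattern in patterns.items():
            n = len(pattern.findall(text))
            if n:
                counts[month][term] += n
    return {m: dict(c) for m, c in counts.items()}

# Hypothetical sample text, not taken from real Fed documents.
sample = [
    ("2008-03-17", "Staff briefed the Board on Bear Stearns. "
                   "The TSLF and the PDCF were announced."),
    ("2008-09-16", "Lehman  Brothers filed; TAF auctions continued. "
                   "Bear Stearns was cited again."),
]

result = term_counts_by_month(sample)
for month in sorted(result):
    print(month, result[month])
```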

If you have any questions feel free to send me a message.

Neil Turley

Feb 28, 2012, 8:35:59 AM
to ScraperWiki
I've never really done this before, but I wanted to give it a try and
I managed to scrape 2006-12 speeches.

No promises on quality because, like I said, it's my first time, but I
think you can access my scraper at https://scraperwiki.com/scrapers/fed_speeches/
if that's any help to you. I also uploaded the data to my server at
http://neilstuff.comuv.com/data/fed_speeches.csv. I put it all in
lower case, which makes some acronyms difficult to recognize, and it
caught the footnotes and a link or two (usually one that said "return
to the top"). Every once in a while one of the pages had something
nasty like a flash movie, so it caught occasional things like this:
---------------------
"swfloader('src', '/newsevents/speech/speech20070323f.swf', 'name',
'thisimagemovie', 'width', '550', 'height', '650');
these figures are presented with flash®; the software to
view these figures is available at
adobe's web site.  "
----------------------
but I figured these sorts of aberrations shouldn't bias the data in a
big way, because these are a small percentage of the corpus.
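
If those aberrations ever did need removing, a rough cleanup pass might look like the sketch below. The three patterns are guesses based only on the sample above and the "return to the top" links mentioned earlier; they are not taken from the actual scraper, and the last pattern could also strip the same words from genuine prose.

```python
import re

# Patterns inferred from the residue shown above (assumptions, not exhaustive).
SWF_CALL_RE = re.compile(r"swfloader\(.*?\);", re.DOTALL)
FLASH_NOTICE_RE = re.compile(
    r"these figures are presented with flash.*?adobe's web site\.", re.DOTALL
)
RETURN_LINK_RE = re.compile(r"return to (the )?top")

def clean_speech_text(text):
    """Strip Flash loader calls, the Flash notice, and 'return to top' links."""
    for pattern in (SWF_CALL_RE, FLASH_NOTICE_RE, RETURN_LINK_RE):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

# Lower-cased sample built around the residue quoted in the post; the
# surrounding sentences are made up for illustration.
raw = (
    "economic growth slowed. "
    "swfloader('src', '/newsevents/speech/speech20070323f.swf', 'name',\n"
    "'thisimagemovie', 'width', '550', 'height', '650');\n"
    "these figures are presented with flash®; the software to\n"
    "view these figures is available at\n"
    "adobe's web site.  inflation remained stable. return to the top"
)

cleaned = clean_speech_text(raw)
print(cleaned)
```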

I started writing code to analyze ngrams, but scraperwiki has a CPU
timer that clocked me out before it finished analyzing the data. Before
it died, it was saying interesting things (e.g. high-occurrence
two-word phrases were things like "i am" and "the fed" and "the board").
I'll see if I can run it on my own machine natively and give you
something interesting to look at if I have time this week.
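
For anyone picking this up, a single-pass bigram count along these lines should be cheap enough to run on a local machine. This is a sketch using only the standard library, not the code that timed out on scraperwiki, and the sample sentences are made up. Note that it lets bigrams span sentence boundaries, which a more careful version would probably avoid.

```python
import re
from collections import Counter

# Lower-case word tokens; apostrophes are kept inside words.
TOKEN_RE = re.compile(r"[a-z']+")

def bigram_counts(texts):
    """Count adjacent word pairs across an iterable of document texts."""
    counts = Counter()
    for text in texts:
        tokens = TOKEN_RE.findall(text.lower())
        # zip pairs each token with its successor, giving every bigram once.
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical sample sentences standing in for the scraped speeches.
sample_texts = [
    "I am pleased to be here. The Fed and the Board met.",
    "The Fed acted, and I am confident the Board agrees.",
]

counts = bigram_counts(sample_texts)
for (first, second), n in counts.most_common(3):
    print(first, second, n)
```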

This took a lot more time than I anticipated, though, so I'm not making
any promises that I'll finish this project. But hopefully the work
I've done will save somebody time, even if it isn't me.

Good luck,

-Neil


On Feb 16, 1:05 am, Daniel <danieldb...@gmail.com> wrote:
> I'm not a programmer, but I'm looking for someone who might be able to
> write some code that would quickly download documents, extract text
> and create ngrams from the Federal Reserve-as well as be able to look
> at the total amount of communication. The documents are available at http://federalreserve.gov/newsevents/default.htm. I'm working on a