990 XML data moving forward as IRS moves data off AWS


Daniel Fonner

Feb 2, 2022, 1:27:16 PM
to ARNOVA Data and Analytics Section
Hi everyone,

I just came across information from the IRS/AWS that the IRS stopped updating e-filed 990 data on AWS at the end of 2021 and is instead making the data available solely from the IRS website. It appears the IRS has all of the e-filed XML data through the end of 2021 available for download on its webpage, with the XML documents in compressed folders.

It appears the document naming convention is the same as when accessing 990s via AWS and the index file. It's just less efficient, since you need to download and unzip the folders and then query for the desired document names.
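For readers who want to try the download/unzip/query workflow described above, here is a minimal sketch in Python. It assumes you have already downloaded one of the compressed folders from the IRS website as a ZIP file; the function name `read_selected_xml`, the archive file name, and the document names in the usage comment are hypothetical placeholders, not actual IRS names. It reads only the wanted documents straight from the archive, so a full unzip is not required.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import Iterable, Iterator, Tuple


def read_selected_xml(
    archive_path: Path, wanted_names: Iterable[str]
) -> Iterator[Tuple[str, bytes]]:
    """Yield (member_name, raw_bytes) for each wanted XML file in a ZIP archive.

    `wanted_names` holds bare document names (no folder prefix), e.g. names
    taken from an index file. This is an illustrative helper, not an IRS tool.
    """
    wanted = set(wanted_names)
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            # ZIP member paths use forward slashes and may be nested in a
            # folder, so compare on the base name only.
            if PurePosixPath(member).name in wanted:
                yield member, archive.read(member)


# Example usage (hypothetical file and document names):
# for name, xml_bytes in read_selected_xml(
#     Path("downloaded_990_archive.zip"), ["example_document_public.xml"]
# ):
#     print(name, len(xml_bytes))
```

Matching on the base name is a design choice that keeps the lookup working whether or not the archive wraps its files in a top-level folder, which I have not verified for the IRS downloads.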

Does anybody have any insight as to how the IRS will handle this data moving forward? Will they continue generating new index files as things are filed in 2022, and will they keep the same document naming conventions? Will they build a system for querying specific XML documents on their own, or will download/unzip/query on our own machines be the method going forward?

I'd appreciate any insight anyone has.

Thanks!
Daniel

Francisco J Santamarina

Feb 2, 2022, 2:59:38 PM
to arnova...@googlegroups.com

Hi Daniel,

 

The Nonprofit Open Data Collective held a meeting on January 11th (hosted by Cinthia Schuman at the Aspen Institute) on precisely this issue. The group brought together academics, practitioners, and (former) government staff to discuss the questions that you raised. Please find below a summary of the discussion:

 

Together, we essentially decided on a two-pronged strategy.

 

  1. We, as a community, should work together to ensure more accessible, “normalized” 990 data. We know that the IRS EO division is understaffed, and the agency is plagued by major delays. While we learned from Andrea Suozzo that the agency plans to take at least one step (providing indices) to improve access to the bulk 990 e-filed data being released to the IRS website, we do not know when such improvements will happen. Open 990 data is a common good, and by collaborating on various pieces of this puzzle, we can jumpstart the process of upgrading the 990 “ecosystem.” Many good ideas for working together were provided, including jointly creating a “bucket” from which individuals can gain access to 990 files more easily; creating a coding competition to accelerate production of the 990 data “pipeline”; and continuing to jointly tackle the missing 990 schemas, which are so essential.

 

  2. Advocacy and communication with the IRS must continue. In addition to our regular comments to the IRS, we should reach out to individuals, both within and outside the IRS, to ensure that our recommendations are heard, and the spirit of the Taxpayer First Act’s transparency provisions is realized. It was noted that officials within the Statistics of Income division of the IRS are natural allies and are open to learning what is needed to further research. In addition, there are future legislative opportunities, and we should work with our partners at Independent Sector and elsewhere, to express and underscore our policy concerns.

 

As you can tell, there were not answers so much as a refining of some questions and needs for overcoming the challenges with accessing 990 data that the situation brings to light.

 

It seems like the .ZIP approach will be what they do moving forward, but I’m not 100% sure. Happy to chat further, via email, this listserv, or one-on-one!

 

 

-Francisco

Francisco J. Santamarina (he/his/him)
PhD Candidate
Evans School of Public Policy & Governance | University of Washington
360-836-0731 | Seattle, WA 98195 |
website
fjsa...@uw.edu
The University of Washington acknowledges the Coast Salish peoples of this land, the land which touches the shared waters of all tribes and bands within the Duwamish, Puyallup, Suquamish, Tulalip and Muckleshoot nations. To learn more, visit Native-Land.ca.


Daniel Fonner

Feb 2, 2022, 3:39:41 PM
to arnova...@googlegroups.com
Thank you so much, Francisco! This is really helpful. I completely missed the Nonprofit Open Data Collective meeting; I'm really glad the conversations are going on!

All the best,
Daniel

Francisco J Santamarina

Feb 2, 2022, 10:16:37 PM
to arnova...@googlegroups.com

Hi Daniel,

 

Glad to be of service 😊.

 

 

-Francisco

