Subject Add and Modify workflow changes.

David Zwarg

Dec 6, 2011, 1:53:48 PM
to districtb...@googlegroups.com
Greetings,

In the development branch of 'datatools', I'm putting together the workflow for uploading new Subject data into a running DistrictBuilder via the administrative interface.

I've put up some workflow instructions on the "Add a Subject" and "Modify a Subject" wiki pages.
On each page, there is a section entitled "1.5 and Later", which is the targeted version for datatools.  These instructions allow administrators to do the following:
  1. Generate a template CSV for all blocks in the system.
  2. Upload this template CSV into the app.
  3. Have the CSV verified and checked for integrity.
  4. Allow the user to edit the metadata of the subject (verbose name, sort order, and descriptions).
Let me know if you have any questions or comments.
Thanks,
David

--
David Zwarg, Software Developer

Azavea  |  One Cambridge Center, 6th Floor, Cambridge, MA 02142-1601
dzw...@azavea.com  | T 617.649.2227  | F 215.925.2663
Web azavea.com  |  Blog azavea.com/blogs

Dr. Micah Altman

Dec 8, 2011, 3:51:01 PM
to districtb...@googlegroups.com
A few questions based on your docs for add/modify:


: template

- Format = CSV? What are the rules for quote characters and quote escaping?
- Prefill: is this prefilled with current values when downloaded?

: Metadata Edit

- I assume the denominator and subject order can be specified when
editing. Confirm?
- I assume that metadata can be edited on an existing subject without
reloading data. Confirm?

Thanks,

Micah

--
________________________________________________________________________
Micah Altman, Ph.D. <http://redistricting.info>           Twitter: @drmaltman
Senior Research Scientist, Institute for Quantitative Social Science, Harvard U.
Director of Archiving and Acquisitions, IQSS;
"Entia non sunt multiplicanda sine necessitate" - Dr. Invincibilis
(Corollary, "Ad indicia spectate.")

David Zwarg

Dec 8, 2011, 4:01:31 PM
to districtb...@googlegroups.com
Hello,
  1. The format is CSV. Normal rules for CSV quoting and escaping apply: long subject names may be quoted in double quotes if a comma is in their name. Unfortunately, there is no 'spec' for CSV files, so all fields should be UTF-8 or printable ASCII characters. Field delimiters are ",", the quote character is '"' (a single double quote), and a quote in a quoted name is '""' (two double quotes).
  2. The subject values are prefilled with "0.0" for all geounits in the template.
  3. The subject name, short display, description, denominator, and subject order are all editable after upload has completed.
  4. Subject metadata may be edited at any time, and does not require a reload of the subject data.
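
To illustrate the quoting rules in item 1 and the "0.0" prefill in item 2, here is a minimal sketch using Python's standard csv module. It is not DistrictBuilder's actual code: the function names and the column headings are hypothetical, chosen only for this example.

```python
import csv
import io

# Hypothetical column headings; the real template defines its own.
FIELDNAMES = ["GEOID", "Total Population, 2010"]


def write_template(geoids, fieldnames=FIELDNAMES):
    """Return template CSV text with every subject value prefilled to 0.0."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf,
        delimiter=",",       # field delimiter
        quotechar='"',       # quote character
        doublequote=True,    # a quote inside a quoted field becomes ""
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(fieldnames)
    for geoid in geoids:
        writer.writerow([geoid, "0.0"])
    return buf.getvalue()


def read_rows(text):
    """Parse CSV text using the same delimiter and quoting rules."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",
        quotechar='"',
        doublequote=True,
    )
    return list(reader)
```

A heading that contains a comma or a double quote is quoted on output and restored unchanged by read_rows. When saving the text to disk, encoding it with text.encode("utf-8") keeps the file within the UTF-8 requirement.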
Thanks,
David

Dr. Micah Altman

Dec 8, 2011, 4:27:25 PM
to districtb...@googlegroups.com
Thanks.

> The subject values are prefilled with "0.0" for all geounits in the
> template.

Um, if you are modifying an existing subject shouldn't the template
prefill the existing subject values?

David Zwarg

Dec 8, 2011, 4:40:49 PM
to districtb...@googlegroups.com
The template download occurs before a subject is uploaded, and the modification process is only triggered after uploading a complete template with a subject name that matches a subject already in the system.
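
The add-versus-modify decision described above can be sketched roughly as follows. The function name is hypothetical, and exact, case-sensitive name matching is an assumption; the thread does not say how names are compared.

```python
def classify_upload(uploaded_name, existing_names):
    """Return "modify" when the uploaded subject name matches a subject
    already in the system, otherwise "add".

    Exact, case-sensitive matching is an assumption of this sketch.
    """
    return "modify" if uploaded_name in set(existing_names) else "add"
```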

If it's important that administrators have a copy of an existing subject's data, then we can add a separate administrative command that fetches a copy of a single subject.

Thanks,
David

Dr. Micah Altman

Dec 8, 2011, 5:02:10 PM
to districtb...@googlegroups.com
If all the existing data is going to be available in CSV anyway with
completion of issue 242, we don't strictly need the admin to be able to
download an individual subject, though it might be convenient if it's
trivial to add.

In either case, I'm still confused about the use case corresponding to
downloading an empty template. If they are adding a new subject,
they must have the data already in some machine-readable form. In
which case they'll simply export it as CSV and upload ... the
template doesn't help there. And if they want to modify existing
values but don't have a copy, a null template isn't useful either.
When is an empty template used?

David Zwarg

Dec 9, 2011, 9:15:46 AM
to districtb...@googlegroups.com
The details of 242 need to be hashed out -- I did not assume that the original data would be in CSV format, because the original data used in setup.py comes in the form of shapefiles and dbf attribute files.

The template file guides administrators in building a CSV that contains the correct formatting and column headings. We have used this model in other applications before, and it has been useful to end users. In our experience, even if an administrator only opens the template, looks at it, and creates a new CSV with the same structure, it helps.

In addition, the template contains all the GEOIDs that are in the instance. These will be uniform within an instance. They can be correlated with the GEOIDs in the administrator's machine-readable data, to ensure data integrity prior to upload.
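
As a rough sketch of that correlation, an administrator could compare the GEOID column of the template with the GEOID column of their own CSV before uploading. The column name "GEOID" and both function names are assumptions made for this example, not part of DistrictBuilder.

```python
import csv
import io


def geoids_in(csv_text, column="GEOID"):
    """Collect the set of GEOIDs found in the named column of a CSV."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {row[column] for row in reader}


def compare_geoids(template_text, data_text, column="GEOID"):
    """Report GEOIDs missing from the data and GEOIDs the template lacks."""
    template_ids = geoids_in(template_text, column)
    data_ids = geoids_in(data_text, column)
    return {
        "missing": sorted(template_ids - data_ids),
        "unexpected": sorted(data_ids - template_ids),
    }
```

Two empty lists mean the administrator's file covers exactly the geounits in the instance.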

Essentially, having the template (even if it's all zeros) helps to reduce human error when creating new or modifying existing Subject data. At Azavea, we've used this mechanism before, and it really helps administrators and regular users construct well-formatted documents for upload -- in any format (in our case, CSV).

Thanks,
David

Dr. Micah Altman

Dec 9, 2011, 7:00:56 PM
to districtb...@googlegroups.com
I see the value in having the GEOIDs and in having an example of what the column headings are supposed to be. And it seems to me that we should make this (which seems like "data input") consistent with 242 (which seems like "data output") if possible. This allows for round-tripping, which is a good design principle and robustness check. It also allows the user to get the values they'll need to modify, without building a separate single-column CSV download, as per your previous suggestion. The latter would be OK, but seems somewhat redundant with 242.

They seem pretty close in requirements already ... both should have a GEOID column, column headers, and subject values. CSV would be fine for 242 (or maybe dbf is fine for import?). Are there key differences in the requirements that pull these apart that I'm not seeing?
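
A round-trip check of the kind described above could look roughly like this: export a subject to CSV, read it back, and confirm that nothing changed. The two-column layout (GEOID plus one subject column) and the function names are assumptions for illustration only, not the format issue 242 will actually use.

```python
import csv
import io


def export_subject(values, subject="subject"):
    """Write a {GEOID: value} mapping as CSV text with a header row."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["GEOID", subject])
    for geoid in sorted(values):
        # repr() of a float parses back to the identical float
        writer.writerow([geoid, repr(values[geoid])])
    return buf.getvalue()


def import_subject(text):
    """Read CSV text back into (subject name, {GEOID: value})."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    return header[1], {geoid: float(value) for geoid, value in reader}
```

If import_subject(export_subject(values, name)) returns the same name and values, the download and upload formats are consistent with each other.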

David Zwarg

Dec 14, 2011, 9:10:03 AM
to districtb...@googlegroups.com
Making 242 a CSV download in order to support "round-tripping" is definitely a plus. The only difference between the Subject Upload/Modify task and 242 is that 242 addresses the user-accessible (vs. admin-accessible) subject data download.  That should not be time consuming, since the mechanics of generating the subject file are already in place.

-David