User Privacy and controls around metadata


Thad Guidry

Dec 6, 2020, 8:39:28 PM
to openref...@googlegroups.com
Do we have strong feelings about user privacy, specifically about NOT keeping any record of a filename in the metadata when importing? I think we should not keep a record of the original imported filename in the metadata. If users want to keep some metadata about the original format, name, etc., we currently present that at Project Creation time and let them rename it or keep it stored as part of the project metadata, where it is searchable, correct? I'm pretty sure complete privacy was the ask I heard from the Open News folks and a few others at NICAR who process whistleblower files.

To that end... do we or do we not keep anything about the original imported files, other than displaying the original filename and letting users easily rename it on import?
[Screenshot: project_filename_rename.png]


Noting this option on some of the importers...

  Store file source (file names, URLs) in each row

[Screenshot: project_options.png]

Tom Morris

Dec 7, 2020, 12:49:13 AM
to openref...@googlegroups.com
For the vast majority of users, the original data source, whether it be a filename or URL, is a very valuable piece of provenance. We can make storing it optional for the paranoid, but we shouldn't make life more difficult for the many to placate the few.

Tom

--
You received this message because you are subscribed to the Google Groups "OpenRefine Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine-dev/CAChbWaMUPZsZAkdf%3DwUkWXfBQML-v-B0c9uWu0WMCP1Ot72jXA%40mail.gmail.com.

Thad Guidry

Dec 7, 2020, 6:13:26 AM
to openref...@googlegroups.com
Agree Tom.

So my question still stands.  Are we internally keeping the original filename somewhere (maybe in the metadata), even when users at Project Creation time are renaming the filename?



Tom Morris

Dec 7, 2020, 11:35:18 AM
to openref...@googlegroups.com
On Mon, Dec 7, 2020 at 6:13 AM Thad Guidry <thadg...@gmail.com> wrote:
> Are we internally keeping the original filename somewhere (maybe in the metadata), even when users at Project Creation time are renaming the filename?

I don't know off the top of my head, but a simple experiment should tell you. After you've imported your super secret file, click "Browse workspace directory" and search the workspace.json file for your secrets to see if they're in there. If OpenRefine isn't behaving the way you want, please create a ticket for the feature/bug.

Tom 
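Tom's experiment can be scripted. The Python sketch below walks a workspace directory and lists every .json file whose text contains a given string, such as the original filename. It assumes OpenRefine's usual layout (workspace.json at the top and one folder per project holding metadata.json); the function name find_in_workspace is hypothetical, and compressed files such as a project's data.zip are not searched.

```python
import os
import sys
from typing import List


def find_in_workspace(workspace_dir: str, needle: str) -> List[str]:
    """Return sorted paths of *.json files under workspace_dir containing needle.

    Assumption: the workspace keeps its metadata in plain-text .json files
    (workspace.json, <id>.project/metadata.json). Zipped row data is skipped.
    """
    hits = []
    for root, _dirs, files in os.walk(workspace_dir):
        for name in files:
            if not name.endswith(".json"):
                continue
            path = os.path.join(root, name)
            # Decode leniently so an odd byte in one file doesn't stop the scan.
            with open(path, encoding="utf-8", errors="replace") as fh:
                if needle in fh.read():
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_in_workspace.py <workspace_dir> <text_to_look_for>
    for hit in find_in_workspace(sys.argv[1], sys.argv[2]):
        print(hit)
```

For example, `python find_in_workspace.py /path/to/workspace github-recovery-codes.txt` would print the path of every metadata file that still mentions that filename.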

Thad Guidry

Dec 7, 2020, 12:49:07 PM
to openref...@googlegroups.com


Thad Guidry

Dec 7, 2020, 1:13:44 PM
to openref...@googlegroups.com
OK, so as suspected, and as I heard from folks at NICAR... yes, metadata.json does hold onto this: "github-recovery-codes.txt" was stored as fileSource, even though I changed the name to "mineTest" ("name":"mineTest") at project creation.

It also travels with the project, inside metadata.json, when the OpenRefine project is exported.


{
    "created": "2020-12-07T17:49:58Z",
    "modified": "2020-12-07T17:50:01Z",
    "name": "mineTest",
    "password": "",
    "encoding": "US-ASCII",
    "encodingConfidence": 0,
    "tags": [],
    "creator": "",
    "contributors": "",
    "subject": "",
    "description": "",
    "rowCount": 16,
    "title": "",
    "version": "",
    "license": "",
    "homepage": "",
    "image": "",
    "importOptionMetadata": [
        {
            "encoding": "US-ASCII",
            "linesPerRow": 1,
            "ignoreLines": -1,
            "limit": -1,
            "skipDataLines": -1,
            "storeBlankRows": true,
            "storeBlankCellsAsNulls": true,
            "includeFileSources": false,
            "projectName": "mineTest",
            "projectTags": [
                ""
            ],
            "headerLines": 0,
            "fileSource": "github-recovery-codes.txt"
        }
    ],
    "customMetadata": {},
    "preferences": {
        "entries": {
            "scripting.starred-expressions": {
                "top": 2147483647,
                "list": [],
                "class": "com.google.refine.preference.TopList"
            },
            "scripting.expressions": {
                "top": 100,
                "list": [],
                "class": "com.google.refine.preference.TopList"
            }
        }
    }
}
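Until there is a proper option for this, the recorded source filenames can at least be audited and blanked by hand. The Python sketch below reads a metadata.json like the one above, lists every fileSource under importOptionMetadata, and prints a copy with those values replaced by an empty string. This is a hypothetical workaround, not an OpenRefine feature: the helper names are made up, it is safest to run it with OpenRefine closed and a backup of the file kept, and it does not touch filenames written into rows by the "Store file source" option.

```python
import json
import sys
from typing import Any, Dict, List


def file_sources(metadata: Dict[str, Any]) -> List[str]:
    """List every fileSource recorded under importOptionMetadata."""
    return [
        opts["fileSource"]
        for opts in metadata.get("importOptionMetadata", [])
        if "fileSource" in opts
    ]


def scrub_file_sources(metadata: Dict[str, Any], placeholder: str = "") -> Dict[str, Any]:
    """Return a copy of metadata with each fileSource replaced by placeholder.

    The input dict is left unchanged; other fields (name, encoding, ...) are kept.
    """
    cleaned = json.loads(json.dumps(metadata))  # deep copy of plain JSON data
    for opts in cleaned.get("importOptionMetadata", []):
        if "fileSource" in opts:
            opts["fileSource"] = placeholder
    return cleaned


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as fh:
        meta = json.load(fh)
    # Report findings on stderr so stdout stays valid JSON for redirection.
    print("fileSource values found:", file_sources(meta), file=sys.stderr)
    json.dump(scrub_file_sources(meta), sys.stdout, indent=4)
```

For example, `python scrub_sources.py metadata.json > metadata.clean.json` reports the recorded filenames and writes a scrubbed copy that can be reviewed before it replaces the original.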
