[Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json

Ram Rachum

Mar 27, 2017, 8:51:58 AM
to python-ideas
Hi guys,

What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?

This would make writing / reading JSON to a file a one liner instead of a two-line with clause.


Thanks,
Ram.
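The proposal is essentially "serialize plus write_text" and "read_text plus parse". The sketch below shows that as two standalone helpers built on the existing `Path.read_text`/`Path.write_text`. `read_json`/`write_json` are hypothetical names here and not part of pathlib or json. UTF-8 is an assumed default.

```python
import json
from pathlib import Path


def write_json(path, obj, **dump_kwargs):
    """Hypothetical helper: serialize obj and write it to path in one call."""
    # Encoding is pinned to UTF-8 (an assumption) rather than the locale default
    Path(path).write_text(json.dumps(obj, **dump_kwargs), encoding="utf-8")


def read_json(path, **load_kwargs):
    """Hypothetical helper: read path and parse its contents as JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"), **load_kwargs)
```

Usage: `write_json("config.json", {"debug": True})` followed by `read_json("config.json")`.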

Ram Rachum

Mar 27, 2017, 8:52:37 AM
to python-ideas
Oh, and also it saves you from having to import json.

Paul Moore

Mar 27, 2017, 8:57:29 AM
to Ram Rachum, python-ideas
On 27 March 2017 at 13:50, Ram Rachum <r...@rachum.com> wrote:
> This would make writing / reading JSON to a file a one liner instead of a
> two-line with clause.

That hardly seems like a significant benefit...

Paul
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Steve Dower

Mar 27, 2017, 10:06:53 AM
to Paul Moore, Ram Rachum, python-ideas
It was enough of a benefit for text (and I never forget the argument order for writing text to a file, unlike json.dump(file_or_data?, data_or_file?) )

+1

Top-posted from my Windows Phone

From: Paul Moore
Sent: 3/27/2017 5:57
To: Ram Rachum
Cc: python-ideas
Subject: Re: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json
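For reference, `json.dump` takes the object first and the file second (`json.dump(obj, fp)`), while `json.load` takes only the file. The example below is a small illustration that uses an in-memory `io.StringIO` in place of a real file:

```python
import io
import json

data = {"name": "example", "values": [1, 2, 3]}

buf = io.StringIO()
json.dump(data, buf)       # object first, then the file-like object
buf.seek(0)                # rewind before reading back
restored = json.load(buf)  # load only needs the file-like object

print(restored == data)  # True
```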

Markus Meskanen

Mar 27, 2017, 10:10:00 AM
to Steve Dower, Ram Rachum, Python-Ideas
-1, should we also include write_ini, write_yaml, etc?

A class cannot account for everyone who wants to use it in different ways.

Donald Stufft

Mar 27, 2017, 10:34:26 AM
to Ram Rachum, python-ideas

On Mar 27, 2017, at 8:50 AM, Ram Rachum <r...@rachum.com> wrote:

> What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?

-1, I also think that write_* and read_* were mistakes to begin with.


Donald Stufft



Paul Moore

Mar 27, 2017, 10:37:13 AM
to Donald Stufft, Ram Rachum, python-ideas
On 27 March 2017 at 15:33, Donald Stufft <don...@stufft.io> wrote:
> What do you think about adding methods pathlib.Path.write_json and
> pathlib.Path.read_json , similar to write_text, write_bytes, read_text,
> read_bytes?
>
>
>
> -1, I also think that write_* and read_* were mistakes to begin with.

Text is (much) more general-use than JSON.

Ram Rachum

Mar 27, 2017, 10:42:09 AM
to python-ideas
Another idea: Maybe make json.load and json.dump support Path objects?

Donald Stufft

Mar 27, 2017, 10:43:35 AM
to Paul Moore, Ram Rachum, python-ideas

On Mar 27, 2017, at 10:36 AM, Paul Moore <p.f....@gmail.com> wrote:

> On 27 March 2017 at 15:33, Donald Stufft <don...@stufft.io> wrote:
>> What do you think about adding methods pathlib.Path.write_json and
>> pathlib.Path.read_json , similar to write_text, write_bytes, read_text,
>> read_bytes?
>>
>> -1, I also think that write_* and read_* were mistakes to begin with.
>
> Text is (much) more general-use than JSON.


Sure. I also think touch() and all the others are the same :) I think they’re just an unfortunate detritus of a time before PathLike and that it’s super weird to have some operations you do to a file path (compared to things you do to generate, modify, or resolve a path) be hung off of the Path object and every other be an independent thing that takes it as an input. I’d find it equally weird if dictionary objects supported a print() or a .json() method.


Donald Stufft



Serhiy Storchaka

Mar 27, 2017, 10:44:40 AM
to python...@python.org
Good try, but you have published this idea 5 days ahead of schedule.

Markus Meskanen

Mar 27, 2017, 10:45:03 AM
to Ram Rachum, Python-Ideas

> Another idea: Maybe make json.load and json.dump support Path objects?

Much better. Or maybe add json.load_path and dump_path

Paul Moore

Mar 27, 2017, 11:00:52 AM
to Ram Rachum, python-ideas
On 27 March 2017 at 15:40, Ram Rachum <r...@rachum.com> wrote:
> Another idea: Maybe make json.load and json.dump support Path objects?

If they currently supported filenames, I'd say that's a reasonable
extension. Given that they don't, it still seems like more effort than
it's worth to save a few characters

with path.open('w') as f: json.dump(obj, f)
with path.open() as f: obj = json.load(f)

Steven D'Aprano

Mar 27, 2017, 11:11:01 AM
to python...@python.org
Reading/writing JSON is already a one liner, for people who care about
writing one liners:

obj = json.load(open("foo.json"))
json.dump(obj, open("foo.json", "w"))

Pathlib exists as an OO interface to low-level path and file operations.
It understands how to read and write to files, but it doesn't understand
the content of those files. I don't think it should.

Of course pathlib can already read JSON, or for that matter ReST text
or JPG binary files. It can read anything as text or bytes, including
JSON:

some_path.write_text(json.dumps(obj))
json.loads(some_path.read_text())


I don't think it should be pathlib's responsibility to deal with the
file format (besides text). Today you want to add JSON support. What
about XML and plists and ini files? Tomorrow you'll ask for HTML
support, next week someone will want pathlib to support .wav files as a
one liner, and before you know it pathlib is responsible for a hundred
different file formats with separate read_* and write_* methods.

That's not pathlib's responsibility, and there is nothing wrong with
writing two lines of code.


--
Steve

Chris Barker

Mar 27, 2017, 11:35:51 AM
to Paul Moore, Ram Rachum, python-ideas
On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore <p.f....@gmail.com> wrote:
> On 27 March 2017 at 15:40, Ram Rachum <r...@rachum.com> wrote:
>> Another idea: Maybe make json.load and json.dump support Path objects?
>
> If they currently supported filenames, I'd say that's a reasonable
> extension. Given that they don't, it still seems like more effort than
> it's worth to save a few characters

Sure, but they probably should -- it's a REALLY common (most common) use-case to read and write JSON from a file. And many APIs support "filename or open file-like object".

I'd love to see that added, and, of course, support for Path objects as well.

-CHB




--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov
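The "filename or open file-like object" convention can be sketched as a small wrapper. `load_json` is a hypothetical name used only for illustration and is not a json module API. A `str` or any `os.PathLike` (such as `pathlib.Path`, per PEP 519) is opened as UTF-8, which is an assumed default. Anything else is treated as an already-open text file.

```python
import json
import os


def load_json(source, **kwargs):
    """Hypothetical wrapper: accept a filename, a PathLike, or an open text file."""
    if isinstance(source, (str, os.PathLike)):
        # open() accepts PathLike directly since Python 3.6; os.fspath()
        # just makes the conversion explicit.
        with open(os.fspath(source), encoding="utf-8") as f:
            return json.load(f, **kwargs)
    # Otherwise assume a file-like object with a .read() method
    return json.load(source, **kwargs)
```

The same dispatch would work for a `dump_json(obj, target)` counterpart opened in `"w"` mode.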

Eric V. Smith

Mar 27, 2017, 11:37:34 AM
to python...@python.org
On 3/27/17 10:40 AM, Ram Rachum wrote:
> Another idea: Maybe make json.load and json.dump support Path objects?

json.dump requires open file objects, not strings or Paths representing
filenames.

But does this not already do what you want:

Path('foo.json').write_text(json.dumps(obj))
?

Eric.

Paul Moore

Mar 27, 2017, 11:42:36 AM
to Eric V. Smith, Python-Ideas
On 27 March 2017 at 15:48, Eric V. Smith <er...@trueblade.com> wrote:
> On 3/27/17 10:40 AM, Ram Rachum wrote:
>>
>> Another idea: Maybe make json.load and json.dump support Path objects?
>
>
> json.dump requires open file objects, not strings or Paths representing
> filenames.
>
> But does this not already do what you want:
>
> Path('foo.json').write_text(json.dumps(obj))
> ?

Indeed. There have now been a few posts quoting ways of reading and
writing JSON, all of which are pretty short (if that matters). Do we
*really* need another way?

Paul

Ethan Furman

Mar 27, 2017, 11:59:30 AM
to python...@python.org
On 03/27/2017 08:04 AM, Steven D'Aprano wrote:
> On Mon, Mar 27, 2017 at 02:50:38PM +0200, Ram Rachum wrote:

>> What do you think about adding methods pathlib.Path.write_json and
>> pathlib.Path.read_json , similar to write_text, write_bytes, read_text,
>> read_bytes?
>
> That's not pathlib's responsibility, and there is nothing wrong with
> writing two lines of code.

+1

Bruce Leban

Mar 27, 2017, 12:45:31 PM
to Ram Rachum, python-ideas
I'm not in favor of this idea for the reason mentioned by many of the other posters. BUT ... this does bring up something missing from json readers: the ability to read one json object from the input rather than reading the entire input and attempting to interpret it as one object. For my use case, it would be sufficient to read whole lines only but I can imagine other use cases. 

The basic rule would be to read as much of the input as necessary (and no more) to read a single json object, ignoring leading white space.

In practical terms:
  • if the first character is [ or { or " read to the matching ] or } or "
  • otherwise if the first character is a digit or '-' read as many characters as possible to parse a number
  • otherwise attempt to match 'true', 'false' or 'null'
  • otherwise fail 

--- Bruce
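Bruce's rule of reading one value and stopping can be sketched with the existing `json.JSONDecoder.raw_decode`. That method parses one value from the start of a string and returns `(value, index_where_it_ended)`. This version works on an in-memory string, so no character ever needs to be pushed back into a stream. `iter_json_values` is a hypothetical helper name. Note that `raw_decode` does not skip leading whitespace itself, and adjacent numbers must be separated by whitespace: `12` is one number, not two.

```python
import json

_WHITESPACE = " \t\n\r"  # the four insignificant-whitespace characters in JSON


def iter_json_values(text):
    """Yield successive top-level JSON values from a string (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between values; raw_decode would reject it
        while pos < len(text) and text[pos] in _WHITESPACE:
            pos += 1
        if pos == len(text):
            return
        # raw_decode parses exactly one value and reports how much it consumed
        value, consumed = decoder.raw_decode(text[pos:])
        pos += consumed
        yield value
```

Usage: `list(iter_json_values('{"a": 1} [2, 3]\n"x" 4 true null'))` yields the six values in order.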

Chris Barker

Mar 27, 2017, 4:34:48 PM
to Bruce Leban, Ram Rachum, python-ideas
On Mon, Mar 27, 2017 at 9:43 AM, Bruce Leban <br...@leban.us> wrote:
> I'm not in favor of this idea for the reason mentioned by many of the other posters. BUT ... this does bring up something missing from json readers: the ability to read one json object from the input rather than reading the entire input and attempting to interpret it as one object.

I can't tell from the JSON spec (at least not quickly), but is it possible to have more than one object at the top level?

Experimenting with the Python json module seems to indicate that it is not -- you can only have one "thing" in a JSON file -- either an "object" or an array.

Then, of course, you can arbitrarily nest stuff inside that top-level container.

Since the nesting is arbitrary, I'm not sure it's clear how a one-object-at-a-time reader would work in the general case?

-CHB
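The check below shows how the stdlib json module treats top-level values in current CPython. RFC 7159 and later define a JSON text as any single serialized value. The older RFC 4627 required an object or array. Python accepts a lone scalar but rejects two values in one string:

```python
import json

# A single scalar is accepted as a complete JSON text
number = json.loads("42")
string = json.loads('"hello"')

# Two values back to back are rejected: json.loads expects exactly one
try:
    json.loads("{} {}")
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

print(number, string, error_message)  # 42 hello Extra data
```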

Paul Moore

Mar 27, 2017, 4:36:16 PM
to Bruce Leban, Ram Rachum, python-ideas
On 27 March 2017 at 17:43, Bruce Leban <br...@leban.us> wrote:
> the ability to read one json object from the input rather than reading the
> entire input

Is this a well-defined idea? From a quick read of the JSON spec (which
is remarkably short on details of how JSON is stored in files, etc)
the only reference I can see is to a "JSON text" which is a JSON
representation of a single value. There's nothing describing how
multiple values would be stored in the same file/transmitted in the
same stream. It's not unreasonable to assume "read one object, then
read another" but without an analysis of the grammar, it's not 100%
clear if the grammar supports that (you sort of have to assume that
when you hit "the end of the object" you skip some whitespace, then
start on the next, but the spec doesn't say anything like that).
Alternatively, it's just as reasonable to assume that
json.load/json.loads expect to be passed a single "JSON text" as
defined by the spec.

If the spec was clear on how multiple objects in a single stream
should be handled, then yes the json module should support that. But
without anything explicit in the spec, it's not as obvious. What do
other languages do?

Paul

David Mertz

Mar 27, 2017, 4:46:01 PM
to Paul Moore, Ram Rachum, python-ideas, Bruce Leban
The format JSON lines (http://jsonlines.org/) is pretty widely used, but is an extension of JSON itself. Basically, it's the idea that you can put one object per physical line to allow incremental reading or sending of objects.

It's a good idea, and I think the `json` module should support it. But it definitely doesn't belong in `pathlib`.
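JSON Lines can already be handled with the existing json module by encoding one document per line. `write_jsonl`/`read_jsonl` are hypothetical helper names, not json module APIs. They rely on `json.dumps` with the default `indent=None`, which never emits a raw newline: newlines inside strings are escaped as `\n`.

```python
import json


def write_jsonl(records, fp):
    """Write each record as one JSON document per line (hypothetical helper)."""
    for record in records:
        # No indent: the whole document stays on a single physical line
        fp.write(json.dumps(record) + "\n")


def read_jsonl(fp):
    """Yield one decoded record per non-blank line (hypothetical helper)."""
    for line in fp:
        if line.strip():
            yield json.loads(line)
```

Because each line is decoded on its own, a reader can stop after any record without loading the rest of the file.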

David Mertz

Mar 27, 2017, 4:47:41 PM
to Paul Moore, Ram Rachum, python-ideas, Bruce Leban

Wes Turner

Mar 27, 2017, 5:28:34 PM
to Chris Barker, Ram Rachum, python-ideas
On Mon, Mar 27, 2017 at 10:34 AM, Chris Barker <chris....@noaa.gov> wrote:
> On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore <p.f....@gmail.com> wrote:
>> On 27 March 2017 at 15:40, Ram Rachum <r...@rachum.com> wrote:
>>> Another idea: Maybe make json.load and json.dump support Path objects?
>>
>> If they currently supported filenames, I'd say that's a reasonable
>> extension. Given that they don't, it still seems like more effort than
>> it's worth to save a few characters
>
> Sure, but they probably should -- it's a REALLY common (most common) use-case to read and write JSON from a file. And many APIs support "filename or open file-like object".
>
> I'd love to see that added, and, of course, support for Path objects as well.


import json
import pathlib


class PathJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, pathlib.PurePath):
            # Simpler alternative: return str(obj), but then the type is lost
            # when reading back. (What about bytes paths?)
            return {'@type': 'pydatatypes:pathlib.Path',  # JSON-LD-style tag
                    'path': str(obj)}
        return super().default(obj)


def as_pathlib_Path(obj):
    # object_hook: turn tagged dicts back into Path objects
    if obj.get('@type') == 'pydatatypes:pathlib.Path':
        return pathlib.Path(obj['path'])
    return obj


def read_json(self, **kwargs):
    # json.load ignores object_hook when object_pairs_hook is also given,
    # so only object_hook is defaulted (dicts keep key order in 3.7+)
    kwargs.setdefault('object_hook', as_pathlib_Path)
    encoding = kwargs.pop('encoding', 'utf-8')
    with open(self, 'r', encoding=encoding) as _file:
        return json.load(_file, **kwargs)


def write_json(self, obj, **kwargs):
    kwargs.setdefault('cls', PathJSONEncoder)
    encoding = kwargs.pop('encoding', 'utf-8')
    with open(self, 'w', encoding=encoding) as _file:
        json.dump(obj, _file, **kwargs)


# Experimental monkeypatch so the test can call p.write_json()/p.read_json()
pathlib.Path.read_json = read_json
pathlib.Path.write_json = write_json


def test_pathlib_json_encoder_decoder():
    p = pathlib.Path('./test.json')
    obj = dict(path=p, _path=str(p))
    p.write_json(obj)
    obj2 = p.read_json()
    assert obj['path'] == obj2['path']
    assert isinstance(obj2['path'], pathlib.Path)



open()
bytes()
chunks()
write_bytes()
text()
write_text(self, text, encoding=None, errors='strict', linesep=os.linesep, append=False)
lines()
write_lines()

read_hash()
read_md5()
read_hexhash()


Wes Turner

Mar 27, 2017, 5:42:17 PM
to David Mertz, Ram Rachum, python-ideas, Bruce Leban
FWIW, pyline could produce streaming JSON w/ json.dumps(indent=None),
but because indent>0, there are newlines.

    pydoc json | pyline '{"a":l} if "json" in l.lower() else None' -O json
    pydoc json | pyline -r '.*JSON.*' 'rgx and line' -O json 

It's a similar issue:
what are good default JSON encoding/decoding settings?

 # loads/JSONDecoder
 file.encoding # UTF-8
 object_pairs_hook
 object_hook

 # dumps/JSONEncoder
 file.encoding # UTF-8
 cls
 separators
 indent

- [ ] ENH: pyline: add 'jsonlines' as an {output,} format



#python tip:  Set separators=(',', ':') to dump JSON more compactly.
>>> json.dumps({'a':1, 'b':2}, separators=(',',':'))
'{"a":1,"b":2}'
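The example below shows why only the default `indent=None` gives one object per line. Any integer indent, including 0, makes `json.dumps` insert newlines. The compact `separators` tip is included for comparison:

```python
import json

obj = {"a": 1, "b": [2, 3]}

single_line = json.dumps(obj)                      # indent=None (default): no newlines
zero_indent = json.dumps(obj, indent=0)            # indent=0 still inserts newlines
compact = json.dumps(obj, separators=(",", ":"))   # single line, no spaces

print(single_line)  # {"a": 1, "b": [2, 3]}
print(compact)      # {"a":1,"b":[2,3]}
```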

Philipp A.

Mar 27, 2017, 5:45:36 PM
to Ram Rachum, python-ideas
Ram Rachum <r...@rachum.com> wrote on Mon, 27 March 2017 at 16:42:
> Another idea: Maybe make json.load and json.dump support Path objects?

Yes, all stdlib APIs expecting string paths should support PEP 519.
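For context, PEP 519 added `os.fspath()` and the `os.PathLike` protocol. That is the hook a stdlib function would use to accept `pathlib.Path` wherever it already accepts a string path, while still telling paths apart from open file objects:

```python
import io
import os
import pathlib

# os.fspath() returns str and bytes unchanged and calls __fspath__()
# on path-like objects such as pathlib paths.
plain = os.fspath("data.json")
from_path = os.fspath(pathlib.PurePosixPath("logs") / "data.json")

# A file object is not path-like, so os.fspath() raises TypeError
try:
    os.fspath(io.StringIO())
    rejected = False
except TypeError:
    rejected = True

print(plain, from_path, rejected)  # data.json logs/data.json True
```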

Greg Ewing

Mar 27, 2017, 7:38:44 PM
to python-ideas
Paul Moore wrote:
> Is this a well-defined idea? ... There's nothing describing how
> multiple values would be stored in the same file/transmitted in the
> same stream.

I think this is something that's outside the scope of the spec.

But since the grammar makes it clear when you've reached the end
of a value, it seems entirely reasonable for a parser to just
stop reading from the stream at that point, and leave whatever
remains for the application to deal with as it sees fit. The
application can then choose to immediately read another value
from the same stream if it wants.

--
Greg

Victor Stinner

Mar 27, 2017, 7:43:36 PM
to Steven D'Aprano, python-ideas
2017-03-27 17:04 GMT+02:00 Steven D'Aprano <st...@pearwood.info>:
> Of course pathlib can already read JSON, or for that matter ReST text
> or JPG binary files. It can read anything as text or bytes, including
> JSON:
>
> some_path.write_text(json.dumps(obj))
> json.loads(some_path.read_text())

Note: You should specify the encoding:

some_path.write_text(json.dumps(obj), encoding='utf8')
json.loads(some_path.read_text(encoding='utf8'))


> I don't think it should be pathlib's responsibility to deal with the
> file format (besides text).

Right.

Victor
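Victor's point matters because `Path.read_text`/`write_text` default to the platform's locale encoding, just like `open()`. With the default `ensure_ascii=True`, `json.dumps` output is pure ASCII. The encoding starts to matter once non-ASCII text is written literally (`ensure_ascii=False`) or when reading files produced elsewhere. The sketch below passes the encoding explicitly; the temporary directory is just for the demo:

```python
import json
import pathlib
import tempfile

obj = {"city": "Zürich", "greeting": "こんにちは"}

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "data.json"
    # ensure_ascii=False writes the characters as-is, so pin the encoding
    path.write_text(json.dumps(obj, ensure_ascii=False), encoding="utf-8")
    restored = json.loads(path.read_text(encoding="utf-8"))

print(restored == obj)  # True
```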

Serhiy Storchaka

Mar 28, 2017, 2:58:53 AM
to python...@python.org
On 28.03.17 02:35, Greg Ewing wrote:
> Paul Moore wrote:
>> Is this a well-defined idea? ... There's nothing describing how
>> multiple values would be stored in the same file/transmitted in the
>> same stream.
>
> I think this is something that's outside the scope of the spec.
>
> But since the grammar makes it clear when you've reached the end
> of a value, it seems entirely reasonable for a parser to just
> stop reading from the stream at that point, and leave whatever
> remains for the application to deal with as it sees fit. The
> application can then choose to immediately read another value
> from the same stream if it wants.

You can determine the end of an integer literal only after reading a
character past the end of the integer literal. Since there is no way
to put back a character, it will be lost for subsequent readers.

And currently json.load() is implemented by reading all file content at
once and passing it to json.loads(). A different implementation would be
much more complex (if we don't want to lose performance).

Barry Scott

Mar 29, 2017, 4:39:11 PM
to Python-Ideas
On 27 Mar 2017, at 15:08, Markus Meskanen <markusm...@gmail.com> wrote:

> -1, should we also include write_ini, write_yaml, etc?


Markus, you illustrate why this is a bad design pattern to implement: it does not scale.

I attended a talk at PYCON UK that talked to the point of using object composition
rather than rich interfaces. I cannot recall the term that was used to cover this idea.

I also think that it's a mistake to open a text file from pathlib.

-1

A pattern that allows pathlib.Path to be composed with content handling is an
interesting idea. Maybe that should be explored?

But that should be a separate topic.

Barry

Nick Timkovich

Mar 29, 2017, 5:04:44 PM
to Barry Scott, Python-Ideas
> I attended a talk at PYCON UK that talked to the point of using object composition
> rather than rich interfaces. I cannot recall the term that was used to cover this idea.

Separating things by concern/abstraction (the storage vs. the serialization) results in easier-to-learn code, *especially* incrementally, as you can (for example) plug reading from a file, a socket, a database into the same JSON, INI, XML... functions.

Learn N ways to read data, M ways to transform the data, and you can do N*M things with N+M knowledge. If the libraries start tightly coupling everything, you need to start going through N*M methods, then do it yourself anyways, because reader X doesn't support new-hotness-format Y directly.
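The N+M idea can be sketched with two hypothetical registries, `OPENERS` and `FORMATS`. These are illustrative names, not from any library. Each opener only produces a text file object, and each format only reads or writes one. Two openers and two formats therefore give four combinations without writing four combined functions.

```python
import gzip
import json


def _open_plain(path, mode):
    return open(path, mode + "t", encoding="utf-8")


def _open_gzip(path, mode):
    return gzip.open(path, mode + "t", encoding="utf-8")


def _load_lines(f):
    return f.read().splitlines()


def _dump_lines(lines, f):
    f.write("\n".join(lines))


# N storage openers: each returns a text-mode file object for "r" or "w"
OPENERS = {"plain": _open_plain, "gzip": _open_gzip}

# M formats: each is a (load, dump) pair that works on any text file object
FORMATS = {"json": (json.load, json.dump), "lines": (_load_lines, _dump_lines)}


def save(obj, path, opener="plain", fmt="json"):
    _, dump = FORMATS[fmt]
    with OPENERS[opener](path, "w") as f:
        dump(obj, f)


def load(path, opener="plain", fmt="json"):
    load_fn, _ = FORMATS[fmt]
    with OPENERS[opener](path, "r") as f:
        return load_fn(f)
```

Adding a new format, say a YAML pair, makes it usable with every opener at once, and vice versa.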

Perhaps less code could result from making objects "quack" alike, so instead of you doing the plumbing, the libraries themselves would. I recently was satisfied by being able to exchange

    with open('dump.txt') as f:
        for line in f:...

with

    import gzip
    with gzip.open('dump.gz', 'rt') as f:
        for line in f:...

and it just worked through the magic of file-like objects and context managers.

Nick

Wes Turner

Mar 29, 2017, 5:30:47 PM
to Chris Barker, Ram Rachum, python-ideas
import json
import pathlib


def as_pathlib_Path(obj):
    # Guard so the hook is safe even for values without a .get() method
    if hasattr(obj, 'get') and obj.get('@type') == 'pydatatypes:pathlib.Path':
        return pathlib.Path(obj.get('path'))
    return obj


def read_json(self, **kwargs):
    # object_pairs_hook would take priority over object_hook, so only
    # object_hook is defaulted (plain dicts preserve order in 3.7+)
    kwargs.setdefault('object_hook', as_pathlib_Path)
    encoding = kwargs.pop('encoding', 'utf-8')
    with open(self, 'r', encoding=encoding) as _file:
        return json.load(_file, **kwargs)

def write_json(self, obj, **kwargs):
    kwargs.setdefault('cls', PathJSONEncoder)  # defined in the earlier message
    encoding = kwargs.pop('encoding', 'utf-8')
    with open(self, 'w', encoding=encoding) as _file:
        json.dump(obj, _file, **kwargs)


def test_pathlib_json_encoder_decoder():
    # assumes read_json/write_json have been attached to pathlib.Path
    p = pathlib.Path('./test.json')
    obj = dict(path=p, _path=str(p))
    p.write_json(obj)
    obj2 = p.read_json()
    assert obj['path'] == obj2['path']
    assert isinstance(obj2['path'], pathlib.Path)

should it be 'self' or 'obj'?