Apoc Json

Argelia Long

Aug 5, 2024, 11:01:09 AM

to mafectimyf

The Load JSON procedures retrieve data from URLs or maps and turn it into map value(s) for Cypher to consume. Cypher has support for deconstructing nested documents with dot syntax, slices, UNWIND, etc., so it is easy to turn nested data into graphs.
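To make the "nested data into rows" idea concrete outside Cypher, here is a small Python sketch of what a double UNWIND over a loaded document produces. The sample data and the `unwind_tags` helper are hypothetical; only the row-per-element behaviour is the point.

```python
from typing import Any, Dict, Iterator, Tuple


def unwind_tags(doc: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield one (title, tag) row per tag of every item.

    Roughly what `UNWIND value.items AS item UNWIND item.tags AS tag`
    produces in Cypher: the nested lists become flat rows.
    """
    for item in doc.get("items", []):
        for tag in item.get("tags", []):
            yield item["title"], tag


# Hypothetical sample, loosely shaped like a StackOverflow API response.
sample = {
    "items": [
        {"title": "Q1", "tags": ["neo4j", "cypher"]},
        {"title": "Q2", "tags": ["apoc"]},
        {"title": "Q3"},  # no tags: contributes no rows
    ]
}

rows = list(unwind_tags(sample))
print(rows)  # [('Q1', 'neo4j'), ('Q1', 'cypher'), ('Q2', 'apoc')]
```

Items without tags simply contribute no rows, much as UNWIND over a null or empty list yields nothing.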

This procedure (apoc.load.jsonParams) takes a file or HTTP URL and parses the JSON into a map data structure. It is a more configurable version of apoc.load.json that can call endpoints requiring HTTP headers or JSON payloads.
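As a rough analogy in Python's standard library, the request below carries custom headers and a JSON payload, which are the two things that make this procedure more configurable than a plain GET. The endpoint, token and `build_json_request` helper are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_json_request(
    url: str,
    headers: Dict[str, str],
    payload: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) an HTTP request with custom headers and an
    optional JSON body, similar in spirit to apoc.load.jsonParams."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    method = "POST" if data is not None else "GET"
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Hypothetical endpoint and token, for illustration only.
req = build_json_request(
    "https://api.example.com/search",
    {"Authorization": "Bearer TOKEN"},
    {"q": "neo4j"},
)
# To actually fetch and parse the response:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```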

Using JSON paths gives you a condensed way to read and process sub-documents and sub-values from nested JSON structures. This is especially helpful if you need to skip over unwinding higher-level parent objects in order to access more deeply nested data, or if you need to manipulate values in those substructures.
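A minimal sketch of what a path such as `$.items[0].tags` selects, using a tiny hypothetical evaluator that understands only `.key` and `[index]` steps. It is not the JsonPath engine APOC uses; it just shows how a path skips the parent levels you would otherwise UNWIND.

```python
import re
from typing import Any

# One step of the supported subset: ".key" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def json_path(doc: Any, path: str) -> Any:
    """Evaluate a tiny JSONPath subset: '$' followed by .key and [n] steps.

    Wildcards, filters and slices are intentionally not supported.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    pos, current = 1, doc
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        key, index = match.group(1), match.group(2)
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


# Hypothetical data shaped like the StackOverflow "items" array.
questions = {"items": [{"tags": ["neo4j", "apoc"]}, {"tags": ["cypher"]}]}
print(json_path(questions, "$.items[0].tags"))  # ['neo4j', 'apoc']
```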

Many of the apoc.convert.*Json* procedures and functions (for example apoc.convert.fromJsonMap and apoc.convert.fromJsonList), as well as the apoc.load.json procedure, now accept a JSON path as the last argument. Note that these functions are meant to stream arrays (of values or objects) and maps, not a single value. If the path selects a single scalar value, the function has to wrap it and may not return the expected result.

There is also the apoc.json.path(json, path) function, which takes a JSON string (not a map or list) and retrieves values from the JSON path provided as the second argument. Note: if the JSON is not already in string format, you can use the apoc.convert.toJson function to convert it.
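The "string, not map" rule is easy to miss, so here is a small Python analogy: the hypothetical `path_from_json_string` refuses anything but a JSON string, and `json.dumps` stands in for apoc.convert.toJson. Paths are given as a list of keys and indexes rather than JSONPath syntax, to keep the sketch short.

```python
import json
from typing import Any, Sequence, Union

Step = Union[str, int]


def path_from_json_string(json_text: str, steps: Sequence[Step]) -> Any:
    """Parse a JSON string, then follow key/index steps into it.

    Mirrors the rule that apoc.json.path expects a JSON *string*:
    maps and lists must be serialized first.
    """
    if not isinstance(json_text, str):
        raise TypeError("expected a JSON string; serialize maps/lists first")
    value: Any = json.loads(json_text)
    for step in steps:
        value = value[step]
    return value


doc = {"items": [{"tags": ["neo4j", "apoc"]}]}
# json.dumps plays the role that apoc.convert.toJson plays in Cypher.
tags = path_from_json_string(json.dumps(doc), ["items", 0, "tags"])
print(tags)  # ['neo4j', 'apoc']
```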

More examples can be found at the links provided above, but let us look at an example of the syntax for JSON paths. The syntax shown below pulls the items array from the StackOverflow API of Neo4j questions and retrieves the array of tags from the first object in the item list.

Moreover, we can customize the JSON path options by adding the config pathOptions: LIST OF STRINGS, where the strings are based on the Option enum of the Jayway JsonPath library. The default value is ["SUPPRESS_EXCEPTIONS", "DEFAULT_PATH_LEAF_TO_NULL"]. Note that we can also pass [], meaning "without options". So with the following JSON:
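To build intuition for the two default options, here is a Python emulation of their observable effect on a lookup. This is an approximation under stated assumptions, not the Jayway JsonPath implementation: `read_path` is hypothetical, paths are lists of keys/indexes, and only these two flags are modelled.

```python
from typing import Any, Iterable, Sequence, Union

Step = Union[str, int]
DEFAULT_OPTIONS = ("SUPPRESS_EXCEPTIONS", "DEFAULT_PATH_LEAF_TO_NULL")


def read_path(doc: Any, steps: Sequence[Step],
              options: Iterable[str] = DEFAULT_OPTIONS) -> Any:
    """Follow key/index steps, emulating two option flags:

    DEFAULT_PATH_LEAF_TO_NULL: a missing *last* step yields None.
    SUPPRESS_EXCEPTIONS: any lookup failure yields None instead of raising.
    With neither option, lookup errors propagate.
    """
    opts = set(options)
    current = doc
    for i, step in enumerate(steps):
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            is_leaf = i == len(steps) - 1
            if is_leaf and "DEFAULT_PATH_LEAF_TO_NULL" in opts:
                return None
            if "SUPPRESS_EXCEPTIONS" in opts:
                return None
            raise
    return current


doc = {"a": {"b": 1}}
print(read_path(doc, ["a", "b"]))  # 1
print(read_path(doc, ["a", "c"]))  # None (missing leaf, default options)
print(read_path(doc, ["x", "y"]))  # None (error suppressed)
# read_path(doc, ["a", "c"], [])   # raises KeyError: no options given
```

Passing `[]` corresponds to the "without options" case: the missing key surfaces as an error instead of a null.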

You can use the failOnError configuration to handle the result in case of an incorrect URL or JSON. For example, with the help of the apoc.when procedure, you can return nothingToDo as the result for an incorrect URL:
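The same "fail or return nothing" choice can be sketched in plain Python. The `load_json` helper and its `fail_on_error` flag are hypothetical analogues of the failOnError setting, not APOC code; with the flag off, a bad URL or bad JSON yields `None`, which the caller turns into "nothingToDo".

```python
import json
from typing import Any, Optional
from urllib.request import urlopen


def load_json(url: str, fail_on_error: bool = True,
              timeout: float = 10.0) -> Optional[Any]:
    """Fetch and parse JSON from a URL.

    With fail_on_error=False, a bad URL, network error or invalid JSON
    returns None instead of raising.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError):
        # URLError is an OSError; an unknown URL scheme and bad JSON raise ValueError.
        if fail_on_error:
            raise
        return None


result = load_json("not-a-valid-url", fail_on_error=False)
print("nothingToDo" if result is None else result)  # nothingToDo
```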

We can narrow down the data that we sift through and import using the JSON path syntax. This allows us to specify substructures to import and ignore the rest of the data. For this example, we only want to import answers and the members posting those answers.

Notice that we are only looking at StackOverflow questions that have an answer count greater than 0. That means we are only passing along the question JSON objects that have answers, as the rest do not pertain to our use case. With this in mind, let us import those with this statement:
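Here is a Python sketch of the same filtering step: keep only questions with a positive answer count, then emit one row per answer and its owner. The field names (`question_id`, `answer_count`, `answers`, `answer_id`, `owner.display_name`) follow the StackExchange API as an assumption, and the sample data is made up.

```python
from typing import Any, Dict, List, Tuple


def answers_with_owners(questions: List[Dict[str, Any]]) -> List[Tuple[int, int, str]]:
    """Return (question_id, answer_id, owner_name) rows, skipping questions
    whose answer_count is 0 (or missing)."""
    rows: List[Tuple[int, int, str]] = []
    for question in questions:
        if question.get("answer_count", 0) <= 0:
            continue  # no answers: irrelevant for this import
        for answer in question.get("answers", []):
            owner = answer.get("owner", {})
            rows.append((question["question_id"], answer["answer_id"],
                         owner.get("display_name", "")))
    return rows


# Hypothetical sample; field names follow the StackExchange API as an assumption.
sample = [
    {"question_id": 1, "answer_count": 0},
    {"question_id": 2, "answer_count": 1,
     "answers": [{"answer_id": 20, "owner": {"display_name": "alice"}}]},
]
print(answers_with_owners(sample))  # [(2, 20, 'alice')]
```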

I would like to include this URL in a Neo4j query to retrieve the data directly in Neo4j. So I turned to APOC. Below is a query that calls apoc.load.json, which you can paste into your Neo4j Desktop query window for testing:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.load.json: Caused by: java.lang.RuntimeException: Can't read url or key Welcome - pne data Stavangerregionen as json: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

But I don't think that's the version used by Neo4j Desktop, which I am using here. Neo4j Desktop installs Neo4j Enterprise, and Neo4j Enterprise supplies its own JDK, which I read somewhere is Zulu OpenJDK. (I can't remember the source.)

An answer to the question in that post would help answer this one. If I can find how to swap out Neo4j Enterprise's JDK I would experiment with different JDKs and the website certificate issue of this post.

I found a workaround for using Neo4j to pull the data directly from the Open Data site, by using an alternative URL. The original URL I used when asking the question invokes an API provided by CKAN, and that API requires a certificate. But near the API link there is also a download button that fetches the data in .csv format. That link does not require Neo4j to negotiate with the API above. So here is the working Cypher query with the download URL.

I would like to add that I am able to download the wiki.json file and import it using CALL apoc.load.json(file:///), but this doesn't really fix the problem, as it fails when trying to fetch a file in another Neo4j example shown here.

I have managed to load the same file with the same user on the same computer before. I'm now rebuilding my graph from scratch - the only difference is that I just upgraded to Neo4j version 4.4.8 from 4.4.5. I know some people will say to go back to version 4.4.5 but I'm wondering if there is something I'm missing.

I'm going to go back to 4.4.5 and see if it works. My old graph (on 4.4.5) still loads the JSON even with "dbms.directories.import" not commented out; my settings file allows that, as well as "apoc.import.file.enabled=true".

Hi @stephflint - is the file in the correct directory? By default, it would need to be in the "import" directory of your database, or you can comment out the "dbms.directories.import" setting to allow reading files from anywhere.
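To illustrate the rule described in this reply, here is a hypothetical Python model of how a `file:///` URL could be mapped onto an import directory, with paths that escape it rejected. It is an illustration of the idea only, not Neo4j's actual implementation, and `resolve_import_url` is an invented helper.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_import_url(url: str, import_dir: Path) -> Path:
    """Map a file:/// URL onto import_dir and refuse paths that escape it.

    Hypothetical model of the "files must live in the import directory"
    rule; it is not Neo4j's actual implementation.
    """
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError("only file: URLs are handled here")
    base = import_dir.resolve()
    candidate = (base / unquote(parsed.path).lstrip("/")).resolve()
    if candidate != base and base not in candidate.parents:
        raise PermissionError(f"{candidate} is outside the import directory {base}")
    return candidate
```

Under this model, `file:///wiki.json` resolves to `wiki.json` inside the import directory, while `file:///../wiki.json` is refused.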

@glilienfield hi, thanks for your answer. I tried that and it caused an interesting situation: when I clicked the import folder it says "Trying to open undefined directory". Is there a way to link to the import folder, and how might I find it on my system? At this stage, I'm thinking of upgrading to the latest version, but it would be good to know in case it happens again.

With apoc.load.json you can retrieve data from URLs and turn it into map value(s) for Cypher to consume. Cypher is pretty good at deconstructing nested documents with dot syntax, slices, UNWIND etc. so it is easy to turn nested data into graphs.

I am trying to load a large (4.5 GB) JSON file into Neo4j. This file is in JSONL format, meaning each JSON object is on its own line. There are about 5.3 million entries.

I read about the apoc.load.*() procedures but have a few questions:

You should try apoc.periodic.iterate(). It loads your data in transactional batches, optionally in parallel. Because each batch is committed separately, heap memory is released after every batch, and the load should also be faster.
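The memory argument behind batching can be sketched in Python: read a JSONL file lazily and hand out fixed-size batches, so only one batch is held at a time. The `iter_jsonl_batches` helper is hypothetical; in Neo4j the batching itself is what apoc.periodic.iterate does, with each batch committed in its own transaction.

```python
import json
from itertools import islice
from typing import Any, Iterator, List, TextIO


def iter_jsonl_batches(stream: TextIO, batch_size: int) -> Iterator[List[Any]]:
    """Lazily parse one JSON object per line and yield lists of batch_size
    records; blank lines are skipped. Only one batch is in memory at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    records = (json.loads(line) for line in stream if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


# Usage sketch (write_batch is a hypothetical per-batch writer):
# with open("data.jsonl", encoding="utf-8") as f:
#     for batch in iter_jsonl_batches(f, 10_000):
#         write_batch(batch)
```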

I tried out several combinations of this, but I always ran into an out-of-memory exception. I tried different batch sizes, different ways to split the query, etc. However, I think that this approach is the right one!

We can use the output of procedures in a query as well as pass the output of a query into a procedure. For example, if we want to get the list of labels in the database in alphabetical order, we could write the following query:

A common request in the Neo4j forums is for a function to format timestamps in a human-friendly way, and we have just the function for that in APOC. We can use it to find out the date and time of the last 5 events in our dataset:
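For comparison, here is the same task in Python: sort events by an epoch-millisecond timestamp, keep the latest five, and format them readably. The event list is made up, and the helpers only resemble what APOC's date formatting function (apoc.date.format) is used for here. Formatting is done in UTC to keep the output deterministic.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def format_timestamp_ms(ms: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Render epoch milliseconds as a human-friendly UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime(fmt)


def latest_events(events: List[Tuple[str, int]], n: int = 5) -> List[Tuple[str, str]]:
    """Return the n most recent (name, formatted time) pairs, newest first."""
    newest = sorted(events, key=lambda event: event[1], reverse=True)[:n]
    return [(name, format_timestamp_ms(ms)) for name, ms in newest]


# Hypothetical events: (name, epoch milliseconds), one per day from 1970-01-01.
events = [(f"event-{day}", day * 86_400_000) for day in range(7)]
for name, when in latest_events(events):
    print(name, when)
```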

As an extension author, it is likely that you will want to test your extension during its development. This chapter details how extension authors can set up automatic extension testing. We'll do that with two examples. Both embed the given extension in a TYPO3 instance and run tests within this environment; both examples also configure GitHub Actions to execute tests. We'll use Docker containers for test execution again and use an extension-specific runTests.sh script for test setup and execution.

If a project needs a TYPO3 extension, it will add the required extension using composer require to its own root composer.json file. The extension's composer.json then specifies additional details, for instance which PHP class namespaces it provides and where they can be found. This properly integrates the extension into the project, and the project then "knows" the location of extension classes.

If we want to test extension code directly, we make a similar change: we turn the composer.json file of the extension into a root composer.json file. That file then serves two needs at the same time: it is used by projects that require the extension as a dependency, and it is used as the root composer.json to specify dependencies, turning the extension into a project of its own for testing. The latter allows us to set up a full TYPO3 environment in a subfolder of the extension and execute the tests within this subfolder.

The extension enetcache is a small extension that helps with frontend plugin based caches. It has been available as a Composer package and a TER extension for quite some time and is loosely maintained to keep up with current Core versions.

On this page, we focus on testing one TYPO3 version at a time, though it is possible to support and test 2 TYPO3 versions in one branch with the typo3/testing-framework, and enetcache does this. But, for the sake of simplicity, we describe the simpler use case here.

As outlined in the general strategy, we need to extend the existing composer.json file by adding some root composer.json specific things. This does not harm the functionality of the existing composer.json properties if the extension is a project dependency and not used as root composer.json: root properties are ignored by Composer if the file is not used as the root project file; see the "root-only" notes in the Composer documentation for details.

This is a typical composer.json file without any complexity: it's a typo3-cms-extension, with an author and a license. We are stating that "I need at least 13.0.0 of cms-core", and we tell the autoloader "find all class names starting with Lolli\Enetcache in the Classes/ directory".

Now let's add our properties to put these tests into action. First, we add a series of properties to composer.json to add root composer.json details, turning the extension into a project at the same time:

Note that all added properties are only used within our root composer.json file; they are ignored if the extension is loaded as a dependency in our project. Note: we specify .Build as the build directory. This is where our TYPO3 instance will be set up. We add typo3/testing-framework in a v13-compatible version as a require-dev dependency. We add an autoload-dev section to tell Composer that test classes are found in the Tests/ directory.
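One way to see why the added properties are harmless for consumers is to script the change. The Python sketch below adds root-only keys to an extension's composer.json data and checks that a view with root-only keys dropped is unchanged. Several details are assumptions, not taken from enetcache: the package name `lolli/enetcache`, the `^13.0` constraint, the `config.vendor-dir`/`bin-dir` layout under `.Build`, the tests namespace, and the subset of keys treated as root-only. The `"*"` testing-framework constraint is a placeholder to be replaced with a v13-compatible version.

```python
import copy
import json
from typing import Any, Dict

# Keys Composer documents as root-only (a subset, listed here as an assumption).
ROOT_ONLY_KEYS = {"require-dev", "autoload-dev", "config", "repositories",
                  "minimum-stability", "prefer-stable", "scripts"}


def add_root_properties(composer: Dict[str, Any], build_dir: str,
                        testing_framework_constraint: str,
                        tests_namespace: str) -> Dict[str, Any]:
    """Return a copy of an extension's composer.json data with root-only
    properties added so it can also act as a root project for testing."""
    result = copy.deepcopy(composer)
    result.setdefault("config", {}).update({
        "vendor-dir": f"{build_dir}/vendor",  # assumed layout under the build dir
        "bin-dir": f"{build_dir}/bin",
    })
    result.setdefault("require-dev", {})["typo3/testing-framework"] = testing_framework_constraint
    result.setdefault("autoload-dev", {}).setdefault("psr-4", {})[tests_namespace] = "Tests/"
    return result


def dependency_view(composer: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified view of what a consuming project uses: root-only keys dropped."""
    return {key: value for key, value in composer.items() if key not in ROOT_ONLY_KEYS}


base = {
    "name": "lolli/enetcache",  # assumed package name
    "type": "typo3-cms-extension",
    "require": {"typo3/cms-core": "^13.0"},  # example constraint
    "autoload": {"psr-4": {"Lolli\\Enetcache\\": "Classes/"}},
}
# "*" is a placeholder; substitute a v13-compatible typo3/testing-framework constraint.
root = add_root_properties(base, ".Build", "*", "Lolli\\Enetcache\\Tests\\")
print(json.dumps(root, indent=4))
```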