Importing Schema into EDG Data Assets does not work properly

Tim Smith

Nov 20, 2019, 6:03:15 PM
to topbrai...@googlegroups.com

Hi,

 

I’m using EDG Version: 6.2.3.v20190722-1623R.  When I create a new Data Asset collection and try to use the JDBC importer to bring in a schema from a SQL Server database, two things of note happen.

 

  1. The import fails.  This database is set up to use integrated security (IntegratedSecurity=True), and I receive the following error:

 

Error generating schema: This driver is not configured for integrated authentication. ClientConnectionId:1878a5f8-61ed-4459-899b-4f2d180203d9

 

  2. Once the error appears, I click OK.  This takes me back to the Import tab listing all import options.  It does not take me back to the JDBC Import parameter page so I can modify my parameters.  Instead, it makes me start over by re-entering everything.  It would be much more efficient if clicking OK on the error took me back to the parameter page.

 

Any thoughts on how I can import a schema when using integrated security?  Not having this will eliminate most SQL Server databases from use with EDG.

 

Thanks,

 

Tim

John Beard

Nov 21, 2019, 12:09:13 PM
to TopBraid Suite Users
Hi Tim,

From the error page you can return to the JDBC import form by using the browser back button.  As of now the importer only supports SQL Server authentication.  This would have to be enabled by your DBA.

Tim Smith

Nov 21, 2019, 4:07:05 PM
to topbrai...@googlegroups.com
Are there plans to support additional access methods?  It is not an option to change the access/security footprint of a database.

Also, I've noticed there is no support for big data or cloud service environments.  How do you envision companies capturing such assets in EDG?  Manual entry is not workable in all but the smallest environments.


Irene Polikoff

Nov 22, 2019, 10:40:14 PM
to topbrai...@googlegroups.com
Tim,

As I understand it, the issue is specific to MS SQL Server and its authentication methods. In other words, Oracle and other databases would not have this problem. MS SQL Server has two options for authentication: "Integrated" or "Integrated plus SQL Server".  Integrated is integrated Windows authentication.  SQL Server authentication is where the users and passwords are stored in SQL Server itself.  This works just like authentication in every other database vendor's product, and it is the only method we currently support.
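
To make the difference between the two modes concrete, here is a minimal sketch of how the two kinds of connection URL differ when using the Microsoft JDBC driver's URL format. The host, database, and account names are made-up placeholders, and the helper function itself is only an illustration, not part of EDG.

```python
def sqlserver_jdbc_url(host, database, port=1433,
                       integrated=False, user=None, password=None):
    """Build a Microsoft JDBC driver URL for SQL Server.

    All argument values used in the examples below are hypothetical.
    """
    base = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    if integrated:
        # Integrated Windows authentication: no credentials in the URL.
        return base + ";integratedSecurity=true"
    if user is None or password is None:
        raise ValueError("SQL Server authentication needs a user and a password")
    # SQL Server authentication: the login is stored in SQL Server itself.
    return base + f";user={user};password={password}"


# The form the importer currently supports (SQL Server authentication):
sql_auth_url = sqlserver_jdbc_url("dbhost.example.com", "Sales",
                                  user="edg_reader", password="secret")

# The form that leads to the error reported above (integrated authentication):
integrated_url = sqlserver_jdbc_url("dbhost.example.com", "Sales",
                                    integrated=True)
```

With SQL Server authentication, the account (here the placeholder `edg_reader`) has to exist as a login inside SQL Server, which is the part a DBA would need to enable.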

To support integrated Windows authentication, we would have to implement our own Kerberos client.  This would be a non-trivial new feature. We do understand that some administrators have a problem enabling SQL Server Authentication. There is also the option of connecting to EDG and sending it the information, as opposed to EDG reaching out and pulling it.

We want to provide the most commonly used data cataloging support pre-packaged. However, the types of sources and access methods in place in a given organization vary widely. This diversity is only growing. Further, with datasets in data lakes, organizations typically need cataloging to be a part of the dataset creation and exchange processes as opposed to an "after the fact" inventory. This is because these environments are much more dynamic than an RDBMS, where schemas change rarely and, while data changes, its nature is fairly stable.

To support this, EDG offers open APIs for writing, making it possible to connect any data source and process to EDG. Many organizations today are using JSON for APIs and data interchange. For this reason, we provide GraphQL APIs that use JSON to add information to EDG. The APIs are model-based. In the case of cataloging data sources, the models are mostly pre-built with EDG. And, of course, they can be extended. Any new class or property you define immediately becomes part of the API.

This applies to any source (e.g., a dataset in a cloud). Click on any sample data asset collection (e.g., Northwind), then select Export -> GraphQL. Click on Documentation and select mutationRootRDFMutation. The list is very long because the models are extensive. If you type “dataset” in the search field, you will see the following:


If you click on createDataSet, you will see the following input.


And so on - this is a partial view. A dataset can have a dataset schema if a common schema is used across multiple datasets. A dataset schema consists of data elements. Or, alternatively, there can be a direct connection between a dataset and a data element.

The idea here is that you can connect from any processing environment into EDG and provide as much or as little information as deemed necessary to register a dataset.
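
As a rough sketch of what such a registration call could look like from a script, the snippet below builds the JSON body of a GraphQL request for the createDataSet mutation. Only the mutation name comes from the documentation panel described above. The argument name `input`, the fields `uri` and `label`, and the absence of a selection set are assumptions for illustration, so check the Documentation panel for the real input shape and return type before using anything like this.

```python
import json


def graphql_literal(value):
    """Render a Python value as a GraphQL input literal (object keys unquoted)."""
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {graphql_literal(v)}" for k, v in value.items())
        return "{" + fields + "}"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(graphql_literal(v) for v in value) + "]"
    # Strings, numbers, booleans and None map onto JSON-style literals.
    return json.dumps(value)


def build_mutation_request(field_name, input_fields):
    """Return the JSON request body for a single GraphQL mutation field.

    Assumes the mutation takes one argument called `input` (hypothetical).
    """
    mutation = f"mutation {{ {field_name}(input: {graphql_literal(input_fields)}) }}"
    return json.dumps({"query": mutation})


# Hypothetical input fields; the real ones are listed under createDataSet.
body = build_mutation_request(
    "createDataSet",
    {"uri": "urn:example:sales", "label": "Sales"},
)
```

The resulting `body` string would then be sent as an HTTP POST with content type `application/json` to the GraphQL endpoint of the asset collection, from whatever process creates the dataset.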

What we hear is happening with cataloging done “after the fact” is that organizations end up with millions of data elements and data entities. We all know, however, that no organization has millions of unique data elements. Information about the nature of the data is available in the context of its processing and can be captured. Figuring out what it is and how it may relate to other data items after a dataset is already in the lake becomes very hard. We know that even after years of effort and significant investment, only a tiny portion of this data becomes understood. If data cataloging becomes a part of the data creation processes, the odds of getting in front of these problems increase.


Similarly, you can filter on “database”.


And select, for example, createDatabaseColumn to see input details:



