Tim,
As I understand it, the issue is specific to MS SQL Server and its authentication methods; Oracle and other databases would not have this problem. MS SQL Server has two options for authentication: "Integrated" or "Integrated plus SQL Server". "Integrated" means integrated Windows authentication. SQL Server authentication is where the users and passwords are stored in SQL Server itself. This works just like every other database vendor, and it is the only method we currently support.
To support integrated Windows authentication, we would have to implement our own Kerberos client; this would be a non-trivial new feature. We do understand that some administrators have a problem enabling SQL Server Authentication. For those situations, there is the option of connecting to EDG and pushing the information in, as opposed to EDG reaching out and pulling it.
We want to provide pre-packaged support for the most commonly used data cataloging sources. However, the types of sources and access methods in use at a given organization vary widely, and this diversity is only growing. Further, with datasets in data lakes, organizations typically need cataloging to be part of the dataset creation and exchange processes, as opposed to an "after the fact" inventory. These environments are much more dynamic than an RDBMS, where schemas rarely change and, while the data changes, its nature is fairly stable.
To support this, EDG offers open write APIs, making it possible to connect any data source and process to EDG. Many organizations today use JSON for APIs and data interchange; for this reason, we provide GraphQL APIs that use JSON to add information to EDG. The APIs are model-based. In the case of cataloging data sources, the models are mostly pre-built with EDG and, of course, they can be extended. Any new class or property you define immediately becomes part of the API.
This applies to any source (e.g., a dataset in a cloud). Click on any sample data asset collection (e.g., Northwind), then select Export -> GraphQL. Click on Documentation and select mutation: RootRDFMutation. The list is very long because the models are extensive. If you type “dataset” in the search field, you will see the following:
If you click on createDataSet, you will see the following input.
And so on; this is a partial view. A dataset can have a dataset schema if a common schema is used across multiple datasets. A dataset schema consists of data elements. Alternatively, there can be a direct connection between a dataset and a data element.
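To make this a bit more concrete, a call to createDataSet that describes a dataset together with its schema and data elements might look roughly like the sketch below (shown as a Python string so it can be posted from a script). The input field names (label, dataSetSchema, dataElements) and the returned uri are assumptions for illustration only; the authoritative input definition is the one in the generated GraphQL documentation.

# Illustrative sketch only: the nesting mirrors the model described above
# (dataset -> dataset schema -> data elements, or dataset -> data element
# directly). Field names are assumed, not taken from the actual EDG schema.
CREATE_DATASET_WITH_SCHEMA = """
mutation {
  createDataSet(input: {
    label: "Customer orders extract"   # assumed field name
    dataSetSchema: {                   # assumed: the shared schema, if one is used
      label: "Order schema v2"
      dataElements: [
        { label: "order_id" }
        { label: "order_total" }
      ]
    }
  }) {
    uri                                # assumed return field
  }
}
"""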
The idea is that you can connect to EDG from any processing environment and provide as much or as little information as you deem necessary to register a dataset.
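For example, a data pipeline written in Python could register a newly created dataset with nothing more than a label, and add detail later. This is a minimal sketch, assuming a GraphQL endpoint URL, basic authentication, and a label field on the createDataSet input; none of these specifics come from the product documentation, so check your own EDG installation for the actual endpoint and input shape.

# Minimal sketch: register a dataset in EDG from any processing environment.
# The endpoint URL, credentials, and field names are assumptions for
# illustration; the generated GraphQL documentation has the real signatures.
import requests

EDG_GRAPHQL_URL = "https://edg.example.com/tbl/graphql"  # hypothetical endpoint

MUTATION = """
mutation RegisterDataset($label: String!) {
  createDataSet(input: { label: $label }) {  # input field name is assumed
    uri                                      # assumed return field
  }
}
"""

def register_dataset(label: str) -> dict:
    """POST the mutation as JSON and return the parsed GraphQL response."""
    response = requests.post(
        EDG_GRAPHQL_URL,
        json={"query": MUTATION, "variables": {"label": label}},
        auth=("edg_user", "edg_password"),  # placeholder credentials
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Register the dataset with only a label; more fields can be supplied in
    # the same call as the pipeline learns them.
    print(register_dataset("Daily customer extract"))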
What we hear happens with cataloging done "after the fact" is that organizations end up with millions of data elements and data entities. We all know, however, that no organization has millions of unique data elements. Information about the nature of the data is available in the context of its processing and can be captured there. Figuring out what a data item is and how it may relate to other data items after a dataset is already in the lake is very hard. We know that even after years of effort and significant investment, only a tiny portion of this data becomes understood. If data cataloging becomes part of the data creation processes, the odds of getting in front of these problems increase.
Similarly, you can filter on “database”.
And select, for example, createDatabaseColumn to see input details:
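Purely as a sketch, a createDatabaseColumn call might look like the following; as before, the input fields shown (label, columnDatatype, databaseTable) are assumptions rather than the actual schema, and the request can be posted exactly like the dataset example above.

# Illustrative sketch of a createDatabaseColumn call; field names are assumed.
CREATE_COLUMN = """
mutation {
  createDatabaseColumn(input: {
    label: "customer_id"        # assumed field name
    columnDatatype: "INTEGER"   # assumed field name
    databaseTable: "Orders"     # assumed reference to the parent table
  }) {
    uri                         # assumed return field
  }
}
"""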