Required Fields


Ning Zhu

Jul 26, 2014, 1:27:48 PM
to rav...@googlegroups.com
Hi all,

This might be a stupid question, but I cannot find any reference for it.

I have a collection whose documents should have at least, say, the fields "Name", "Address", and "Phone"; all other fields are optional.

Is there an index or some other mechanism to ensure that documents have those required fields?

Thanks,
Ning

Felipe Leusin

Jul 28, 2014, 11:12:06 AM
to rav...@googlegroups.com

Jeff Harris

Jul 29, 2014, 11:49:52 AM
to rav...@googlegroups.com
There isn't a "built-in" way to require values in fields, but you can use the extension points to either:

a) Create a client-side "listener" that throws an exception if the client tries to store a document with null values in these fields (http://ravendb.net/docs/2.0/client-api/advanced/client-side-listeners). There's a good example at the bottom of that page showing one way to block saving documents.
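Option (a) might look roughly like the sketch below. This assumes the RavenDB 2.0 client API (the BeforeStore signature varies slightly between client versions), and the Contact class and its field names are hypothetical examples, not anything from the docs:

```csharp
// Sketch of a client-side listener (RavenDB 2.0 client API assumed).
// Contact and its required fields are hypothetical examples.
public class RequiredFieldsListener : IDocumentStoreListener
{
    public bool BeforeStore(string key, object entityInstance, RavenJObject metadata)
    {
        var contact = entityInstance as Contact;
        if (contact != null)
        {
            if (string.IsNullOrWhiteSpace(contact.Name) ||
                string.IsNullOrWhiteSpace(contact.Address) ||
                string.IsNullOrWhiteSpace(contact.Phone))
            {
                // Throwing here aborts the SaveChanges() call.
                throw new InvalidOperationException(
                    "Contact documents require Name, Address and Phone.");
            }
        }
        return false; // false = the entity instance was not modified
    }

    public void AfterStore(string key, object entityInstance, RavenJObject metadata)
    {
    }
}
```

The listener would then be registered once on the document store before use, e.g. documentStore.RegisterListener(new RequiredFieldsListener()).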

b) Create a server-side "PUT Trigger" plugin (http://ravendb.net/docs/2.5/server/extending/plugins). You'd want to write some code in the "AllowPut()" method to block saving documents.
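Option (b) might be sketched as below, assuming the RavenDB 2.5 server-side plugin API (Raven.Database.Plugins); the "Contacts" collection name and the required field names are made-up examples. The compiled dll would go in the server's Plugins folder:

```csharp
// Sketch of a server-side PUT trigger (RavenDB 2.5 plugin API assumed).
public class RequiredFieldsPutTrigger : AbstractPutTrigger
{
    public override VetoResult AllowPut(string key, RavenJObject document,
        RavenJObject metadata, TransactionInformation transactionInformation)
    {
        // Only check documents in the (hypothetical) "Contacts" collection.
        if (metadata.Value<string>("Raven-Entity-Name") != "Contacts")
            return VetoResult.Allowed;

        foreach (var field in new[] { "Name", "Address", "Phone" })
        {
            if (string.IsNullOrWhiteSpace(document.Value<string>(field)))
                return VetoResult.Deny("Missing required field: " + field);
        }
        return VetoResult.Allowed;
    }
}
```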


Kijana Woodard

Jul 29, 2014, 12:01:57 PM
to rav...@googlegroups.com
And then you have to consider whether it's really worthwhile.

If you have to write this code, you have to test this code for correctness.
Why not test the "real code" for correctness?
What if in some cases the fields are required and in other cases they are not?

If you're worried about someone modifying the db directly and "messing things up":
- that ability might be the very feature that gets you out of a tight squeeze one day
- restrict access to the db so people who don't know what they're doing don't go in there
- turn on the versioning bundle or some equivalent to log direct modifications

But, quantify the fears, development costs, potential costs of malfeasance, etc. Make the decision based on reasoned facts about the problem at hand instead of "in sql server, there was a 'nullable' checkbox that seemed useful for data integrity".


--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeff Harris

Jul 29, 2014, 12:22:03 PM
to rav...@googlegroups.com
@Kijana: I'm with you...any time you have to "extend" RavenDB to implement something like this, it's important to think it through 100% and consider the additional complexity and maintenance you are introducing.

One additional reason why this should be a *last-resort*: blocking storage of documents (especially at the database-level, via triggers) will only throw an exception during "SaveChanges()", which can be confusing to troubleshoot since the exception is not thrown at the place in code where you are actually storing this document.
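To illustrate the point: with a unit of work like the one below (the Contact class is a made-up example), the failure surfaces at SaveChanges(), not at the Store() call that queued the document.

```csharp
using (var session = documentStore.OpenSession())
{
    // No exception here: Store() only registers the document in the session.
    session.Store(new Contact { Name = null, Address = "...", Phone = null });

    try
    {
        // The whole batch is sent to the server here; a veto from a
        // server-side trigger (or a client-side listener throwing)
        // surfaces at this point.
        session.SaveChanges();
    }
    catch (Exception ex)
    {
        Console.WriteLine("Save rejected: " + ex.Message);
    }
}
```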

Chris Marisic

Jul 29, 2014, 1:49:20 PM
to rav...@googlegroups.com
Worrying that certain fields HAVE TO EXIST is an RDBMS concept.

You really should think in terms of "what if it doesn't exist? Can I go on, or do I make the application error?"

Partial data is almost always better than zero data.

Kijana Woodard

Jul 29, 2014, 2:02:10 PM
to rav...@googlegroups.com
I consider these design decisions as a user. I type info for 9 out of the 10 fields. I need to talk with someone tomorrow about the 10th field. It sucks that I can't save what I've already typed.

Thinking deeper about the scenarios makes you question underlying assumptions about data, integrity, state transitions, etc.



Ning Zhu

Jul 31, 2014, 9:09:34 AM
to rav...@googlegroups.com
Thanks, guys, for the input! I will look into AllowPut on the server side.

Let me make a few points. So far RavenDB has worked pretty well for us, but addressing these could make it even better.

1. Overall, Raven has to keep gov't and big-enterprise users at its core, no less than community users; otherwise it will never fly.

2. Database flexibility is just one side of the coin; the other side is enforcing data quality and integrity, which could not be more important in our situation. We need both at the same time. Crazy, right?

3. It cannot be assumed that data quality and integrity can always be ensured at the business-logic level. We are using Raven to pull all our data sources together for overall data cleaning and analysis. We do not even have access to the business-logic layer; all we care about is the data. Since the data comes from different sources, we need to ensure its minimum quality and load it for further cleaning and analysis.

Sounds like AllowPut is the way to ensure data quality on the server side. We will develop a framework around it.

Chris Marisic

Jul 31, 2014, 10:26:33 AM
to rav...@googlegroups.com


On Thursday, July 31, 2014 9:09:34 AM UTC-4, Ning Zhu wrote:
3. It cannot be assumed that data quality and integrity can always be ensured at the business-logic level. We are using Raven to pull all our data sources together for overall data cleaning and analysis. We do not even have access to the business-logic layer; all we care about is the data. Since the data comes from different sources, we need to ensure its minimum quality and load it for further cleaning and analysis.



That is a patently false statement. Don't give anyone direct access to your database. Then you can never have integrity issues that your software doesn't specifically allow.

A database should always be proxied by a service, whether it's a web service, REST API, WCF service, service bus, BizTalk, etc.

Kijana Woodard

Jul 31, 2014, 10:29:04 AM
to rav...@googlegroups.com
"we need to ensure their minimum quality and load them for further cleaning and analysis."

What defines minimum quality? 

Subjective business logic. 

What you are asking for is a bit of business logic that runs on the database server itself. SQL Server, Oracle, etc. have sold this as an unmitigated Good Thing. It may be good, it may not. But we should be clear that we are merely putting some of our application's logic in the database. It's not "something else".

"Further cleaning and analysis" is an interesting statement. This demonstrates that the database layer may need to be more flexible about the data that it accepts. The decision to reject some data as "unacceptable" is arbitrary and chosen by the organization writing the software. 

There is not _one_ business logic layer.

One could just as easily have a dedicated import process that enforces the "import rules". This process has no relationship to the broader application's "business logic".
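As a sketch of that idea (every name here is hypothetical, not part of any RavenDB API): a small import step can enforce the "import rules" on its own, and route rejects to a separate store for later cleaning.

```csharp
// Hypothetical import-rule check, independent of the main application's
// business logic. Rejected records keep a reason for the audit trail.
public class ImportRecord
{
    public string Name { get; set; }
    public string Address { get; set; }
    public string Phone { get; set; }
}

public static class ImportRules
{
    public static bool IsAcceptable(ImportRecord record, out string reason)
    {
        if (string.IsNullOrWhiteSpace(record.Name))    { reason = "missing Name";    return false; }
        if (string.IsNullOrWhiteSpace(record.Address)) { reason = "missing Address"; return false; }
        if (string.IsNullOrWhiteSpace(record.Phone))   { reason = "missing Phone";   return false; }
        reason = null;
        return true;
    }
}
```

Accepted records would go on to the application database; rejected ones would go to a separate "import database" together with the rejection reason.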

Furthermore, it _may_ be a good idea to have a completely separate "import database" which handles all the activities around importing data:

- Audit trail of rejected data
- Workflow for data cleaning, including automated cleaning, manual cleaning assignments, manager approval, etc.
- Analysis and statistical workflows

Only once the data has been through this process is it migrated to other databases for the broader application's use.

This allows separation of concerns and reduces the complexity of the overall application. The "end user" application(s) need not worry about filtering out "unclean" data.

One mistake enterprise and gov't shops make is getting trapped in the notion that everything must fit in one db. This isn't an accident: the large database vendors have been selling this notion for years. :-D

The main fear is often that buggy code will "corrupt the database". Bugs exist. Bugs can also exist in database data restrictions. Code is code. Just because it's "in the db" doesn't make it perfect. Furthermore, what happens to data that the database rejects? Is it lost? Is it buried in a log file? Is this acceptable? How could it be better? How can we allow flexibility for exceptional situations? How do we handle shifting business expectations over time?

Once we start to answer those questions, a different viewpoint begins to emerge.

Enterprise software shouldn't be about locking down servers and calling it done. It's about understanding the data flows, gaining insight into reality, and then making informed decisions to mitigate risk and maximize upside. Management.



Jeff Harris

Jul 31, 2014, 1:23:29 PM
to rav...@googlegroups.com
Ning Zhu,

I think Kijana and Chris have good points -- and I think the solutions they suggest here are much better than using "AllowPut()". 

I would be *very* careful to fully examine the side-effects of AllowPut(). Are you inserting the data via bulk insert? Do you understand that data is only saved when you call SaveChanges(), not when you call Store()? Do you fully understand the ramifications of an exception being thrown during a batch operation? Are all the clients that will be inserting data smart enough to know the difference between the database being offline and a data record being invalid? Is it okay for your application to treat those two things the same way? Will clients be able to "skip" inserting invalid records? If so, how will they know which records are invalid?

While "AllowPut()" *looks* like what you need, I'd just be concerned that there are side-effects that will come up that will require a more robust solution.

I don't think I would have phrased things as bluntly as Chris did, but he's right: if you have the organizational power to shift the thinking on this project from "just a database project" toward a separate application with its own business logic, I think your life will be much improved.

Oren Eini (Ayende Rahien)

Aug 1, 2014, 1:48:50 AM
to ravendb
I would _strongly_ recommend that you would do the data integrity checks at the client side.
Doing this via the PutTrigger is probably going to end up in tears.



Oren Eini

CEO


Mobile: + 972-52-548-6969

Office:  + 972-4-622-7811

Fax:      + 972-153-4622-7811



Oren Eini (Ayende Rahien)

Aug 1, 2014, 1:52:13 AM
to ravendb
Just to note: the main use for RavenDB triggers is actually internal; they are how we implement bundles, handle new features, etc. They are exposed externally so that other users can also extend RavenDB easily this way. However, most of the time you need to know what you are doing to get things done, and we have seen a lot of poorly written triggers cause issues.

In particular, data-integrity triggers will also block you from doing things like "fixing the data" when a business rule changes (middle name is no longer a requirement): now you have to update the trigger and take down the server to replace the dll, which is massively unpleasant.








Chris Marisic

Aug 1, 2014, 9:03:42 AM
to rav...@googlegroups.com
Great advice, Oren. I didn't look at it from the logistics side, and the logistics of this would be a nightmare.