XBlock User service, nearly implemented

Jason Bau

unread,

Nov 26, 2014, 11:27:39 PM11/26/14

to edx-...@googlegroups.com

Hi,

As a product of the open edX conference hackathon, I've got code that's roughly in shape to merge which provides a simple user info service to xblocks. Basically, this service as currently implemented will return to requesting xblocks the following information about the currently logged-in user: username, email, user_id, and full name. I'd like to circulate the design and the fields provided by the service to the community, as hopefully one of the last steps before merging, so I've written it up here:

https://docs.google.com/document/d/1Vd6jlfVpoBy5LQ-8SV6y-vIOnxYOC3OEtErv_j6yZYM/edit?usp=sharing

The edx-platform part of this PR is at https://github.com/edx/edx-platform/pull/6013/files, while the xblock addition was merged in https://github.com/edx/XBlock/pull/273/files.

Some previous discussion surrounding this happened on https://groups.google.com/forum/#!searchin/edx-code/user$20service/edx-code/LST8teVqQKI/Es0W8z0vof8J. The current implementation differs from that discussion in that it returns no pseudonymous identifiers associated with the user, with the rationale explained in the Google doc.

Your comments are welcome.

Jason

Piotr Mitros

unread,

Dec 2, 2014, 8:44:58 AM12/2/14

to edx-...@googlegroups.com

Thank you. I wanted to propose two small changes to the PR which might make it substantially more future-proof. First, to motivate the changes, the issues with the PR are:

1) The version from Stanford presumes a static set of fields matching what we currently have in edx-platform, and that they will be used in a specific way.

In a little more detail:

When you look at something like the edX Insider course and the Profile XBlock (or Novoed), the set of fields is flexible and course-specific. This provides great pedagogical value.
If you look at different LMSes, they collect different sets of profile information (edX, as an international platform, uses a full legal name, which handles non-US name formats, while more US-centric platforms will have first/last name split). If we want XBlocks to be used outside of edX platform, the service cannot presume the edX fields exist or are identical.
Different LMSes (and edX as it evolves) may want to change how it uses these fields. We may want semantic abstractions for those use cases, which may not map 1:1 to what we have in the back-end. I expect this might be something like:

"Give me the user's name, as we would show to other users." Depending on LMS, this may be username, e-mail, real name, etc.
"Give me the user's name, as I would use to address the user." Depending on LMS, this may be full name, first name, or username.
"Give me the user's contact info to show to other users"
"Contact the user" (a.k.a. notifications)

2) The caller of the service also does not advertise what they need. An e-mail, for example, is more sensitive than a username. An LMS might want to allow untrusted XBlocks to have a username, but not an e-mail.

3) There isn't a clear set of use cases. For example, before putting in e-mail, I'd consider why we want that here. We have a notifications framework for sending e-mails (without knowing usernames) coming down the line. Is what we're looking for e-mail, or the user's contact information? It may be e-mail, but there should be a clear explanation of use case and why (which would, in turn, drive what we call the field, how we document it, etc).

With that background, I wanted to propose a long-term architectural direction, as well as a minor change to the patch which puts us on that direction.

As a proposal, we would split semantic information (e.g. displayname for showing to other users) from the source information. The XBlocks framework would guarantee that the semantic information was there, but it would not guarantee any particular set of source information. The semantic information would be explicit variables in the service, while the source information would be in a dict-like object:

xblock_user.display_name -- Returns the user's name, as shown to peers on the platform. This might be a username on edX MOOCs, and the real name in residential classes.
xblock_user.address_name -- Returns the user's name, as used when addressing the user. This could be a first name, a username, a title+last name, or a full legal name.
xblock_user["edx-platfrom.user_id"] -- Returns the user ID, as per edX' internal representation. Either None or KeyError it doesn't exist.
xblock_user["edx-platfrom.legal_name"] -- Returns the user's legal name, as per edX' internal representation. Either None or KeyError it doesn't exist.
xblock_user["udacity.first_name"] -- Returns the user's first name, as per Udacity's internal representation. Either None or KeyError it doesn't exist.
xblock_user["profile.country"] -- Returns the user's country, as registered with the Profile XBlock. Either None or KeyError it doesn't exist.

If someone can come up with cleaner names, that would be better.

In addition (but less importantly), it'd be nice if the constructor of the service took an optional argument which listed the set of data required (won't work without) and requested (will work better with). Right now, this could be ignored, or it could raise an exception if the block asks for something not on the list. I think we'll probably get it wrong on the first shot, but a first pass would be a good starting point for a discussion.

If we wanted to go in this direction, the required changes to the PR would be modest:

Defining a clear set of use-cases for the user service.
Changing the variable names in __init__ to cover those use cases. So instead of self.username=kwargs.get('username'), the code would have self.displayname=kwargs.get('username')
Adding appropriate documentation around what those fields mean.

In addition, as nice-to-haves:

Adding a __getitem__ that always returns None.
Adding a __contains__ that always returns False.
Make a first pass at how we declare what data the XBlock will want during operation, to get the discussion going.

I'd suggest we build out the standard fields as specific use cases come up. So the initial PR would add the specific fields which Stanford needs, and we would add additional fields on an as-needed basis. In the PR, we could either add the full set of edX data to __getitem__ and __contains__, or just leave them blank until the need arises.

Thoughts?

Piotr

Calen Pennington

unread,

Dec 2, 2014, 10:21:08 AM12/2/14

to edx-...@googlegroups.com

From an ease of implementation and clarity of api, I'd tend to prefer something along the lines of xblock_user.raw["edx-platform.user_id"], where xblock_user.raw would be (or act) like a normal python dictionary, including using .get("edx-platform.user_id") to retrieve with a default value, and raising KeyError for direct ["edx-platform.user_id"] access of a raw attribute that doesn't exist. I think that interface will end up being less confusing to when reading the source in the future, as it will conform to expectations around dictionary access.

-Cale

Piotr Mitros

unread,

Dec 2, 2014, 10:31:42 AM12/2/14

to edx-...@googlegroups.com

I'm good either way. I tend to favor brevity as a matter of personal style.

The name raw may be a bit misleading. It makes sense for edx-platform.user_id, where the non-brevity also has the advantage of directing people to the semantic rather than raw version. However, I suspect the majority use case might be more like profilexblock.country. This is neither raw nor discouraged (just not standardized or guaranteed to exist).

Piotr

Calen Pennington

unread,

Dec 2, 2014, 11:16:18 AM12/2/14

to edx-...@googlegroups.com

Good points. I'd be happy to substitute a different name for `raw`.

-Cale

Ned Batchelder

unread,

Dec 2, 2014, 1:35:03 PM12/2/14

to edx-...@googlegroups.com

If we're going to have two levels of information, I would prefer that we use more conventional means of access: attributes and functions. Instead of xblock_user["edx.userid"], why not: xblock_user.get_data("edx.userid") . This makes the interface clearer and easier to document.

--Ned.

Piotr Mitros

unread,

Dec 3, 2014, 9:51:58 AM12/3/14

to edx-...@googlegroups.com

Hi Ned,

I really do think the profile is conceptually a dictionary. If you look at how services behind this might look, they're likely to do something like:

user_profile = dict()

user_profile.update(tbd_service_architecture("edx_platform").get_profile())

for service in tbd_service_architecture.list_providers("profile_information", course_level):

user_profile.update(service.get_profile())

return JSONResponse(user_profile)

Most of the time, what will pass over the wire will be JSON dictionaries. Databases will store dictionaries in some form (for Mongo, JSON; otherwise, as appropriate). Overall, abstraction are helpful when they abstract things away or make things more clear, but here, I expect, 90% of the time, there is a 1:1 mapping. In a case like this, they just confuse things since they rename things, add new APIs, and prevent reuse of existing APIs.

For V1, I'd be happy to make this an actual dict(), but I would not want to guarantee a literal dict() in the docs; just a dict()-like object. I can see us wanting to change it in a number of ways in the future:

Evaluating pull vs. push for performance
Adding metadata (a function which essentially asks: "describe key profilexblock.country," determine which keys are private/public, etc.)
More specific search (instead of for key in user_service.source, we might have for key in user_service.source.list("edx-platform") to get just fields in the edx-platform namespace.

The interface and documentation will be much simpler if XBlock authors can leverage what they know about Python, rather than relearning our own renamings and reinvented semantics of __getitem__, __contains__, __iter__, etc.

Again, I don't have strong feelings about whether the service object should be this dict-like object, contain this dict-like object, or a have a function which returns it, but using get_data for each key would make me sad.

Piotr

Piotr Mitros

unread,

Dec 3, 2014, 9:53:04 AM12/3/14

to edx-...@googlegroups.com

(Just to be clear -- I'm not proposing the bullet changes as all things we'll want to do -- just that want to be able to make those kinds of changes)

Yarko Tymciurak

unread,

Dec 3, 2014, 10:45:51 AM12/3/14

to edx-...@googlegroups.com

... I briefly looked at Ned's "a.get_data(...)". and wondered how this is clearer (or what I'd missed) over the idiomatic " a.get(...)"

Jason Bau

unread,

Dec 3, 2014, 7:07:52 PM12/3/14

to edx-...@googlegroups.com

Given the discussion, I am planning the following to move forward. It changes few things but is reasonably future proof in that user attributes are now two-layered and seems to be a decent compromise of peoples' opinions.

xblock_user.is_current_user is a Boolean (it will always be true for an XBlockUser returned from get_current_user())

xblock_user.email is a string or None.

xblock_user.full_name is a string or None.

(more of these can be added in the future)

xblock_user.opt_attrs is a dict-like thing that _is_ a dict in this version. So it can support .get(). It will never be None.

To start, it will fill in the following attributes when available:

xblock_user.opt_attrs['edx-platform.is_authenticated'] (Boolean: pretty much always available if the runtime is edx-platform)

xblock_user.opt_attrs['edx-platform.username'] (available when is_authenticated=True)

xblock_user.opt_attrs['edx-platform.user_id'] (available when is_authenticated=True)

Speak your objections now, if any,

Jason

Calen Pennington

unread,

Dec 4, 2014, 9:17:07 AM12/4/14

to edx-...@googlegroups.com

Hi, Jason,

I'm curious about why you opted to stick with `.full_name`, compared with Piotr's suggestion of .display_name and .address_name. In particular, I think being clear about the context that the name is going to be used in is a useful distinction to make.

-Cale

Jason Bau

unread,

Dec 4, 2014, 12:06:26 PM12/4/14

to edx-...@googlegroups.com

Hi Cale,

Some related points in my rationale:

0. "Full Name" is what edx-platform currently asks of users upon registration. See attached screenshot.

1. I think `.full_name` is useful to have as a fall-back field. We aren't going to be able to anticipate all the contexts in which a name is going to be used. One good example of where `.display_name` and `.address_name` don't fit perfectly is on Certificates. This could of course be addressed by adding `.certificate_name`, but you can see where we may just want to have a sane fall-back. (As an aside, certificates are the only use case for "Full Name" the platform mentions to users upon signup: see attached screenshot)

2. I'm _slightly_ uncomfortable with providing `address_name` and `display_name` when currently the platform doesn't ask the user for their preferences for these (not ruling out the possibility it may in the future). I think computing `address_name` is fairly fraught difficult, given all the cultural and personal preferences, and `full_name` is often not a great default. On the other hand, `full_name` has become the de-facto `display_name` by virtue of the edx-platform top navigation, so computing an acceptable value for `display_name` at present seems doable.

So, I would be fine with adding `.display_name`, while keeping `.full_name`, and setting both to the same value to start. I think `.address_name` in particular should be added when we have a reliable value to provide.

Jason

Screen Shot 2014-12-04 at 7.58.45 AM.png

Calen Pennington

unread,

Dec 4, 2014, 1:56:06 PM12/4/14

to edx-...@googlegroups.com

Just adding .display_name sounds reasonable to me, and the rationale behind full_name as well.

-Cale

Piotr Mitros

unread,

Dec 4, 2014, 4:28:56 PM12/4/14

to edx-...@googlegroups.com

I'd be a lot more comfortable adding full_name with specific use cases. I have not seen any. Jason: Would you mind providing some?

That we ask for that on registering is a specific reason to not have it in the semantic fields, but rather to put it in the platform-extensions. It is edX-specific. XBlocks are designed to be cross-platform (although now, only 2 or 3 platforms support it). In many platforms, calculating the equivalent of full name is impossible. They ask for first name/last name. In China, we have last/first. In the US, it is first/last. In other parts of the world, there are other algorithms for doing this. I'm hesitant to add a dangerous feature (in terms of technical debt, privacy, and compatibility) without clear use-cases. In terms of default fall-back, I'd rather use platform['edx.full_name'] for now. Most XBlocks run in edX, and this clearly signals that it might not work outside of edX. As we see how people most use this, we could figure out whether it is something we want to add, whether we want to add a few fields like address_name and similar, or if we want to omit this. Jason: What is the downside of accessing this through something like platform['edx.fullname'] in the short term?

My preference is to add address_name as well, with something like username for now (or another not-completely-reasonable default), and swap what it points to when something better comes along. Otherwise, we accumulate technical debt in XBlocks. However, I'm okay leaving it out.

I don't want to be a gatekeeper here. I'll argue, but I ultimately defer to Cale, Dave, and Jason.

Piotr

Jason Bau

unread,

Dec 4, 2014, 6:34:37 PM12/4/14

to edx-...@googlegroups.com

Hi Piotr,

On Thursday, December 4, 2014 1:28:56 PM UTC-8, Piotr Mitros wrote:

I'd be a lot more comfortable adding full_name with specific use cases. I have not seen any. Jason: Would you mind providing some?

I think my previous post does provide a use case, but to be completely explicit, certificates are one such use case. We run self-service certs at Stanford, with a "certify me" button that kicks off cert generation. Say we want to xblockify what is now just an HTML template, and in doing that, the person's full name, which edx-platform implies will appear on their cert, will be needed. One could think of other potential social-like xblocks where full name, in addition to contextual forms of address, might be useful.

That we ask for that on registering is a specific reason to not have it in the semantic fields, but rather to put it in the platform-extensions. It is edX-specific. XBlocks are designed to be cross-platform (although now, only 2 or 3 platforms support it). In many platforms, calculating the equivalent of full name is impossible. They ask for first name/last name. In China, we have last/first. In the US, it is first/last. In other parts of the world, there are other algorithms for doing this. I'm hesitant to add a dangerous feature (in terms of technical debt, privacy, and compatibility) without clear use-cases. In terms of default fall-back, I'd rather use platform['edx.full_name'] for now. Most XBlocks run in edX, and this clearly signals that it might not work outside of edX. As we see how people most use this, we could figure out whether it is something we want to add, whether we want to add a few fields like address_name and similar, or if we want to omit this. Jason: What is the downside of accessing this through something like platform['edx.fullname'] in the short term?

I really think Full Name deserves to be an easily accessible semantic field, and having that field be nullable alleviates potential incompatibility. I think it was smart and correct of edx-platform to ask for Full Name rather than First and Last, so let's defer the problem of providing that field to other platforms that want to support xblocks (presumably, this is also giving them the power to provide an implementation they like.) WRT privacy, I've said all along the intention for this service is to provide PII to xblocks, and that the entire service could/should be access controlled when xblocks runtimes have evolved to allow that.

Jason

Piotr Mitros

unread,

Dec 5, 2014, 8:02:34 AM12/5/14

to edx-...@googlegroups.com

That sounds reasonable.

Nit: I'd consider something like legal_name, but your call.

Piotr

Piotr Mitros

unread,

Dec 5, 2014, 8:02:47 AM12/5/14

to edx-...@googlegroups.com

Thanks for building this.

Jason Bau

unread,

Dec 5, 2014, 4:18:29 PM12/5/14

to edx-...@googlegroups.com

I'm happy to contribute to the platform.

I have one proposed addition, which I'm not dogmatic about adding. Just wanted to throw it out there.

Looking at code like this makes me a bit sad:

in lms/lib/xblock/runtime.py:

```

class UserTagsService(object):

def _get_current_user(self):

"""Returns the real, not anonymized, current user."""

real_user = self.runtime.get_real_user(self.runtime.anonymous_student_id)

return real_user

```

The runtime instantiation converts the django user to an anonymous_student_id, and UserTagService uses anonymous_student_id to re-lookup the django user.

Should we just add xblock_user.opt_attrs['edx-platform.django_user'] ?

Jason

Piotr Mitros

unread,

Dec 8, 2014, 7:18:33 PM12/8/14

to edx-...@googlegroups.com

It seems sensible to me. I don't like 'django_user.' Perhaps 'edx-platform.username' for auth.user.username, and 'edx-platform.student_id' for auth.user.id? That's the terminology used elsewhere.

Piotr

Jason Bau

unread,

Dec 8, 2014, 7:49:06 PM12/8/14

to edx-...@googlegroups.com

actually, I was proposing adding the actual django user object to opt_attrs. I think the other attributes are already part of the implementation and are similarly named to your suggestions.

Reply all

Reply to author

Forward