How to manage a large taxonomy

Jon Ku

Jul 18, 2024, 7:49:24 PM
to dotCMS User Group
We're a new, licensed dotCMS managed site, migrating from a bespoke knowledge base.

So far we have successfully tested the following:
  • WebDav to manage files/folders
  • dotCLI running
  • Webhooks to load content (future batch load of existing content)
  • prototype Taxonomy and Content page content types
  • Build a single page application to demonstrate hierarchical menus based on the Taxonomy content type with parent Relationship
  • Content page content type attached to the taxonomies with another Relationship

QUESTION
The question at hand is how to use native tools to allow easy author/editor access to select a Taxonomy (i.e. menus and breadcrumbs) when there are 1,000 different taxonomies.

This also requires that certain branches and certain leaf pages be attached to multiple parts of the hierarchy, and also that some taxonomies will have the same name, for example Billing is a common one. This seems to rule out Categories and Folders.

We are exploring Relationships as the best approach, having created a Taxonomy content object that has both parent and child relationships for navigation, as well as relating content to be found on those menus.

ISSUE
With all 1,000 taxonomies loaded, using the Related selector panel will be unwieldy at best. Search will help; however, our current users are comfortable navigating a tree-structure UI similar to Folders in dotCMS.

SOLUTION
Is there a solution in place to manage this type of taxonomy using Relationships?

We speculate that creating a parallel folder structure to the taxonomy would do the following:
  • Use Relationship taxonomy contentlets on the front end to create menus, including duplicate branches.
  • Each taxonomy has a matching folder with the same title, and the folder contains that single taxonomy.
  • Users of the Relate panel can navigate the folders with the Sites/Folder selector widget on the left-hand side, and click on the resulting taxonomy once they reach it.

BIG QUESTION
Can a vtl actionlet or other workflow action generate a new folder and set its parent when the taxonomy contentlet is saved? I don't see that referenced.

If not, is it realistic to have a separate batch process traverse the taxonomy relationship tree and create a matching folder for each taxonomy, and could that succeed if triggered on saving a new or edited taxonomy contentlet?
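
As a rough illustration of the traversal part of such a batch process, here is a minimal JavaScript sketch. It assumes the taxonomies have already been pulled into memory as objects with an identifier, a title, and a list of parent identifiers; that shape, the sample data, and the function name `folderPaths` are all hypothetical. It only computes which folder paths would be needed, parents before children, and does not call any dotCMS API to create them.

```javascript
// Hypothetical in-memory shape: identifier, title, and parent identifiers
// (an empty list marks a root). Two different nodes share the title "Billing".
const taxonomies = [
  { id: "t1", title: "Support", parents: [] },
  { id: "t2", title: "Sales", parents: [] },
  { id: "t3", title: "Billing", parents: ["t1"] },
  { id: "t4", title: "Billing", parents: ["t2"] },
  { id: "t5", title: "Refunds", parents: ["t3", "t4"] },
];

// Return one { id, path } entry per place a taxonomy appears in the tree,
// ordered so that every parent folder comes before its children.
function folderPaths(items) {
  // Index children by parent identifier.
  const children = new Map();
  for (const item of items) {
    for (const parentId of item.parents) {
      if (!children.has(parentId)) children.set(parentId, []);
      children.get(parentId).push(item);
    }
  }

  const paths = [];
  // Breadth-first walk; each queue entry remembers the path that led to it.
  const queue = items
    .filter((item) => item.parents.length === 0)
    .map((item) => ({ item, path: "/" + item.title, seen: new Set([item.id]) }));

  while (queue.length > 0) {
    const { item, path, seen } = queue.shift();
    paths.push({ id: item.id, path });
    for (const child of children.get(item.id) || []) {
      if (seen.has(child.id)) continue; // guard against accidental cycles
      queue.push({
        item: child,
        path: path + "/" + child.title,
        seen: new Set(seen).add(child.id),
      });
    }
  }
  return paths;
}

console.log(folderPaths(taxonomies).map((entry) => entry.path));
```

A taxonomy with several parents yields one path per parent, so "Refunds" needs two folders here. The sketch does not handle two siblings with the same title under the same parent, which would collide as folder names.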

Note: I'm using my personal Google account for this Google Group because, for some reason, my work email can't find it.

Mark Pitely

Jul 19, 2024, 10:18:16 AM
to dot...@googlegroups.com
It is fairly easy to write some html/vtl/javascript to interact with the API to 'reproduce' whatever the system does. That is, build a back-end-like app/page that hides the complexity from the user. It pulls the data they need and filters the options available to them and gives them a form to update. 
Keep the real backend for the developers. I don't know if you can create folders via the API, but I would be surprised if you couldn't. You also could just create a structure that provides 90% of what a folder would offer with 5 times the flexibility. (I'm clearly not a fan of folders at this stage, they are more of a legacy concept)

Here's an example concept:

This is a tool that allows people to create new pages, adding them to the navigation tree. (I have disabled the ability to actually do anything for the time being, and normally the page is password-protected).
They cannot create new top-level nav (a place where I am restricting them)
If you click new, it gives you a form. If you complete it, the tool extracts the data from the old site (that's the 'Verify' button, which does a lot of work) and then builds a new page in dotCMS with the old content along the correct path.
This is all using a URLMap to make the folder structure.
This lets non-admin users help build out a site without really understanding what they are doing, while I control which options they have and automate the actual cut-and-paste work. This sounds similar to what you need. It's a tool with training wheels.
The code will follow.

Hope this helps!

Mark Pitely
Albright College

<style>
.openstar::before{ content: '\2606';}
.closedstar::before{ content: '\2605';}    
   
</style>

<div style="padding-top:20px"></div>
#set ($pull=$dotcontent.pull("+contentType:Navigation",0,"Navigation.title"))
#foreach($con in $pull)
#if (!$con.parent)
<div  class="toplevel" data-title="$con.title" data-top="$!con.path" data-path="$con.path" data-identifier="$con.identifier" id="p${velocityCount}" style="display:inline-block;font-size:22px;margin-bottom:10px"><span class="star openstar" style="cursor:pointer" id="id-${con.path}" onclick="focuser('id-${con.path}','p${velocityCount}')"></span> <a style="width:500px;min-width:500px;display:inline-block" target="_blank" href="/${con.path}">$con.title</a><div style="width:40px;border:2px solid red;display:inline-block;cursor:pointer;padding:3px;margin-left:20px;font-size:14px;" onclick="adder('p${velocityCount}')">Add</div></div>


#foreach($cone in $dotcontent.pull("+contentType:Navigation ",0,"Navigation.title"))
#if ($cone.parent.get(0).title==$con.title)
<div class="first-child hidden" data-title="$cone.title" data-top="$cone.top" data-path="$cone.path" data-identifier="$cone.identifier" id="c${velocityCount}" style="padding-left:20px;display:inline-block;font-size:16px;margin-top:10px;margin-bottom:10px"><a style="width:400px;min-width:400px;display:inline-block" target="_blank" href="/${cone.top}/${cone.path}">$cone.title</a><div style="display:inline-block;width:40px;border:2px solid red;cursor:pointer;padding:3px;margin-left:20px;font-size:14px" onclick="adder('c${velocityCount}')">Add</div></div>

#end
#end



#end
##Of has parent

#end
##of pull







<form id="theform" style="margin-top:20px;margin-bottom:20px" class="hidden">
   <div style="float:left;font-size:18px;">Add Page to <span style="color:#a51e36;" id="adder"></span></div>
    <div style="height:10px;clear:both"></div>
  <div style="width:100px;float:left">New Page Title</div>
  <input type="text" id="title" name="title" style="float:left;font-size:16px;width:400px">
  <div style="height:10px;clear:both"></div>
  <div style="width:100px;float:left">New Page Name (for url)</div>
  <input style="float:left;font-size:16px;width:400px" type="text" id="path" name="path">
   <div style="height:10px;clear:both"></div>
   <div style="width:100px;float:left">Wordpress URL</div>
  <input style="float:left;font-size:16px;width:400px" type="text" id="wordpressurl" name="wordpressurl" > <div style="float:left;border:1px solid green;margin-left:5px;cursor:pointer" onclick="gethtml()">Verify</div>
   <div style="height:10px;clear:both"></div>
   <div style="width:100px;float:left">Top-level Path</div>
  <input style="float:left;font-size:16px;width:400px"  type="text" id="top" name="top">
   <div style="height:10px;clear:both"></div>
   <div style="width:100px;float:left">Notes/Comments</div>
  <input style="float:left;font-size:16px;width:400px"  type="text" id="notes" name="notes">
   <div style="height:10px;clear:both"></div>
  <input style="display:none" type="text" id="identifier" name="identifier">
 
 <input id="button" class="hidden" type="button" onclick="poster();" value="Submit" style="color:#000">
</form>




<script>



function focuser(starid,whatid){
   
var where=document.getElementById(whatid);
 var parents=document.getElementsByClassName("toplevel");
 var children=document.getElementsByClassName("first-child");
 var star=document.getElementById(starid);
 
 
if (!where.classList.contains("focus")){
    star.classList.remove("openstar");
    where.classList.add("focus");
    star.classList.add("closedstar");
for (var i = 0; i < parents.length; i += 1){
    parents[i].classList.add("hidden");
}
for (var i = 0; i < children.length; i += 1){
    children[i].classList.add("hidden");
 if (children[i].dataset.top==where.dataset.top) children[i].classList.remove("hidden");  
}    
where.classList.remove("hidden");  
return;
}//of not focused  

if (where.classList.contains("focus")){
    star.classList.remove("closedstar");
    star.classList.add("openstar");
    where.classList.remove("focus");
   
for (var i = 0; i < parents.length; i += 1){
    parents[i].classList.remove("hidden");
}
for (var i = 0; i < children.length; i += 1){
    children[i].classList.add("hidden");
 
}    
 
}//of focused





   
}//of function

function adder(whatid){
    console.log("Adder");
    console.log(whatid);
    // Keep everything local: a global variable named "adder" would
    // overwrite this function and break the next click on "Add".
    var what=document.getElementById(whatid);
    console.log(what);
    var title=what.dataset.title;
    var adderLabel=document.getElementById('adder');
    var topt=what.dataset.top;
    var id=what.dataset.identifier;
    var form=document.getElementById('theform');

    // Show the form and name the parent the new page will be added to
    form.classList.remove("hidden");
    adderLabel.innerHTML=title;

    // Pre-fill the top-level path and the parent relationship query
    var ftop=document.getElementById('top');
    var fid=document.getElementById('identifier');
    ftop.value=topt;
    fid.value="+identifier:"+id;
}

function show_news(out){
   
    alert(out);
   
}



function gettoken(res){
 console.log(res);
 console.log(res.entity);
 console.log(res.entity.token);
document.token=res.entity.token;    
   
}


function gethtml(){
pather=document.getElementById('wordpressurl');
path=pather.value;
   
fullpath='https://www.albright.edu/wp-content/themes/albright2017/functions/ripper.php?url='+path;  
   
  fetch(fullpath)
    .then(function(response) {
        // When the page is loaded convert it to text
        return response.text()  })
    .then(function (text) {
        document.html=text;  
      console.log(text);
      pather.style.color="green";
      document.getElementById('button').classList.remove('hidden');
    });    
   
 
}

function finish(res){
    console.log(res);
    location.reload();
}


function poster(){
var formData = new FormData();


var top=document.getElementById('top').value;
var path=document.getElementById('path').value;
var title=document.getElementById('title').value;
var wordpressurl=document.getElementById('wordpressurl').value;
var identifier=document.getElementById('identifier').value;
var notes=document.getElementById('notes').value;

//wordpressurl='https://www.albright.edu/about-albright/about-reading-pa/';

//var html=document.getElementById('htmlcontent').value;

var dataObj={"contentlet":[{"title":title,"contentType":"navigation","top":top,"path":path,"wordpress":wordpressurl,
"parent":identifier,'html':document.html,'notes':notes}]};
   
   
jsonout=JSON.stringify(dataObj);    
console.log(jsonout);    
     
     
     
     
     
     
     
     
     
     
     
let url = '/api/v1/workflow/actions/default/fire/PUBLISH';      
fetch(url, {
  method: 'POST',
  headers: {
    'Accept': 'application/json, text/plain, */*',
    'Content-Type': 'application/json',
    "Authorization": "Bearer "+document.token
  },
  body: jsonout
}).then(res => res.json())
  .then(res => finish(res));
         
     

}

    let url = '/api/v1/authentication/api-token';  
fetch(url, {
  method: 'POST',
  headers: {
    'Accept': 'application/json, text/plain, */*',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({"user":"a...@albright.edu", "password":"XXXXXXX", "expirationDays": 10 })
}).then(res => res.json())
  .then(res => gettoken(res));


</script>





John Michael Thomas

Jul 19, 2024, 10:58:52 AM
to dotCMS User Group
It sounds like you've chosen to use a content type to represent the taxonomies in a tree structure, which is a time-tested approach many customers use.  It also sounds like the main challenge you have with that is how the relationships among those taxonomies are displayed - e.g. it would be easier in a tree structure than in a flat structure.

If I'm right about both of those, then I'll follow up on what Mark said, that you can use code to represent that however you want.  And if you want to keep that within the dotCMS back-end, you can use a custom field to display and select the taxonomies (you don't have to use an external app).

There are a couple of caveats to using custom fields for this that are worth knowing when deciding whether they will work for your needs.

1. The value of the custom field that gets stored in the content is a string.

So, once a user has selected the taxonomies, you'll need to represent that as a string.  And when you load a content item, the custom field will need to parse that string to show the already assigned taxonomy in whatever tree component you use.

2. When that string gets indexed, Elasticsearch tokenizes it.

By default, ES will strip out all white space and punctuation.  So, for example, if you were storing the taxonomy "path" as something like "Billing/Department/BillingCode", then it would strip out the slashes and index each of those labels separately - which would mean you couldn't search on the whole path.  And that's probably not what you want.

But you can override how each individual field is tokenized using the esCustomMapping field variable.  So, you can set that field up to index the entire path as a single string, which will allow you to include slashes in your search, to pull content with taxonomies matching any part of a path.
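
To make both caveats concrete, here is a minimal JavaScript sketch. It assumes a hypothetical storage format of one taxonomy path per line in the custom field's string value, and it uses a crude stand-in for tokenization (`roughTokens`) that only approximates the behavior described above; it is not Elasticsearch's actual analyzer, and the function names are made up for illustration. The real mapping JSON for esCustomMapping is not shown here.

```javascript
// Caveat 1: the custom field stores a string, so the selection has to be
// serialized on save and parsed again on load.
// Hypothetical format: one taxonomy path per line.
function serializePaths(paths) {
  return paths.join("\n");
}

function parsePaths(stored) {
  return stored
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Caveat 2: crude approximation of default tokenization. Split on anything
// that is not a letter or digit and lowercase the pieces.
// This is NOT the real Elasticsearch analyzer, only an illustration.
function roughTokens(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

const stored = serializePaths(["Billing/Department/BillingCode", "Sales/Billing"]);
const restored = parsePaths(stored);

// Tokenized: a search for "billing" cannot tell the two paths apart.
const tokenMatches = restored.filter((path) => roughTokens(path).includes("billing"));

// Whole-path: matching on the untokenized value can target one branch.
const prefixMatches = restored.filter((path) => path.startsWith("Billing/Department"));

console.log(tokenMatches);
console.log(prefixMatches);
```

The token-based filter returns both paths, while the whole-path filter returns only the one under Billing/Department, which is the distinction a custom mapping is meant to preserve.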

If you think custom fields might work for you, then it's probably worth taking a look at some of our custom field examples on the demo site.  I don't think we have any examples there that use a tree component, but you can use pretty much any vanilla JS component easily.  You can also use components from specific frameworks (Next, Angular, etc.), though to do that you'll also need to import the appropriate libraries for the framework that the component relies on.

Hope it helps,
John

Jon Ku

Jul 23, 2024, 5:18:57 PM
to dotCMS User Group
Thanks for your replies John, and Mark for your real-world example and sample code, everything helps.

We are trying to stay with the relationships paradigm for taxonomy; there are about 1,000 taxonomies in the tree. This allows us to have content in multiple locations and use a native content type, a many-to-many relationship called Taxonomy. The Relate control when editing will have 1,000 rows to select from, so even with Search functionality it will be difficult for our authors to use.

To manage editing the taxonomy tree I've instantiated the Dojo tree control within a custom field which queries the hierarchy and creates a Dojo data store in memory, i.e. inline within the custom field's execution. I've added a Dojo popup dialog which fires when you click the name of a tree element, and it will have buttons to action Cut, Copy, Delete, Rename and Paste the taxonomy to another branch. 

This works well within the custom control to update the tree view, but takes no action within the dotCMS context to change the existing relationships. As the custom field stores only strings, it seems best to carry the relationship change back into dotCMS.

My thought is to have a post.vtl or patch.vtl and call that from Dojo Ajax, passing in the identifier of the copied object, and the identifier of the target parent. The webhook should query the copied object and find its existing parents, then add the new parent to that list and save that as the new parent relationship.
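
The "add the new parent to the existing list" step could be sketched in JavaScript as below. The function names are hypothetical. The request body shape is modelled on the fire/PUBLISH call in Mark's example earlier in the thread, and expressing several parents as a single Lucene query on `identifier` is an assumption that should be checked against your dotCMS version before relying on it.

```javascript
// Return the parent list with the new parent added, without duplicates
// and without letting a taxonomy become its own parent.
function mergeParents(existingParentIds, newParentId, ownId) {
  if (newParentId === ownId) {
    throw new Error("A taxonomy cannot be its own parent");
  }
  const merged = [...existingParentIds];
  if (!merged.includes(newParentId)) {
    merged.push(newParentId);
  }
  return merged;
}

// Build a hypothetical request body for saving the taxonomy.
// ASSUMPTION: the relationship field accepts a Lucene query, as in the
// "+identifier:" + id pattern used earlier in the thread.
function buildUpdateBody(ownId, parentIds) {
  return {
    contentlet: [
      {
        identifier: ownId,
        contentType: "Taxonomy",
        parent: "+identifier:(" + parentIds.join(" OR ") + ")",
      },
    ],
  };
}

// Example with made-up identifiers.
const parents = mergeParents(["parentA"], "parentB", "copiedItem");
console.log(JSON.stringify(buildUpdateBody("copiedItem", parents)));
```

Keeping the merge as a pure function makes it easy to reuse for Cut (remove the old parent, add the new one) as well as Copy.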

When I try to prototype this in the Velocity editor I can retrieve the title and identifier of a taxonomy, but when I try to get a relationship for it the system returns the entire source code of the custom field, although I am requesting a different field from the contentlet. If I could access the parent relationship then I could update it and save it back, or delete and recreate the contentlet.

Any ideas are welcome, thanks!

- Jon

This works

#foreach( $taxonomy in $dotcontent.pull( "+contentType:TreecontrolTest", 30, "title asc") )
$taxonomy.title
$taxonomy.identifier
#end

result
Africa
eb3e595badc0eae92c62ec724e921cb6
Nairobi
bf73602ed1cedeb8aef77b576077b17c
North America
de8510f350212f92dd5d6da72ea25b3f
The earth
a051c35987a440e895f40211eb7a1e5f


This doesn't
#foreach( $taxonomy in $dotcontent.pull( "+contentType:TreecontrolTest", 30, "title asc") )
$taxonomy.title
$taxonomy.identifier
#$taxonomy.link
#end

result
Africa
eb3e595badc0eae92c62ec724e921cb6
#[com.dotcms.rendering.velocity.viewtools.content.ContentMap@39c5b4a6[content=com.dotmarketing.portlets.contentlet.model.Contentlet@e75c614[map={owner=dotcms.org.1, identifier=a051c35987a440e895f40211eb7a1e5f, nullProperties=[], modDate=2024-07-23 19:28:40.723, languageId=1, title=The earth, inode=3ad533df-2270-487e-9ff5-776d824ffcff, titleImage=TITLE_IMAGE_NOT_FOUND, folder=SYSTEM_FOLDER, disabledWYSIWYG=[ ], sortOrder=0, modUser=dotcms.org.1, host=8a7d5e23-da1e-420a-b4f0-471e7da8ea2d, stInode=331851cd093906b59dd6dc521881fdd9},lowIndexPriority=false,variantId=DEFAULT],conAPI=com.dotmarketing.portlets.contentlet.business.ContentletAPIInterceptor@25338d57,perAPI=com.dotmarketing.business.PermissionBitAPIImpl@7b071adb,fields=[com.dotmarketing.portlets.structure.model.Field@11d05b2f[structureInode=331851cd093906b59dd6dc521881fdd9,fieldName=fields-0,fieldType=com.dotcms.contenttype.model.field.RowField,fieldRelationType=<null>,fieldContentlet=system_field,required=false,velocityVarName=fields0,sortOrder=0,values=<null>,regexCheck=<null>,hint=<null>,defaultValue=<null>,indexed=false,listed=false,fixed=false,readOnly=false,searchable=false,unique=false,modDate=Mon Jul 22 19:33:13 UTC 2024,iDate=Mon Jul 22 19:32:56 UTC 2024,type=field,owner=<null>,inode=7f1401f19e2d8a17792e670053629a2a,identifier=7f1401f19e2d8a17792e670053629a2a], com.dotmarketing.portlets.structure.model.Field@455b68c4[structureInode=331851cd093906b59dd6dc521881fdd9,fieldName=fields-1,fieldType=com.dotcms.contenttype.model.field.ColumnField,fieldRelationType=<null>,fieldContentlet=system_field,required=false,velocityVarName=fields1,sortOrder=1,values=<null>,regexCheck=<null>,hint=<null>,defaultValue=<null>,indexed=false,listed=false,fixed=false,readOnly=false,searchable=false,unique=false,modDate=Mon Jul 22 19:33:13 UTC 2024,iDate=Mon Jul 22 19:32:56 UTC 2024,type=field,owner=<null>,inode=1a8a21c10453abd1ddf701efc6d18cfe,identifier=1a8a21c10453abd1ddf701efc6d18cfe], 
com.dotmarketing.portlets.structure.model.Field@461fcb6b[structureInode=331851cd093906b59dd6dc521881fdd9,fieldName=Title,fieldType=com.dotcms.contenttype.model.field.TextField,fieldRelationType=<null>,fieldContentlet=text1,required=false,velocityVarName=title,sortOrder=2,values=<null>,regexCheck=<null>,hint=<null>,defaultValue=<null>,indexed=false,listed=false,fixed=false,readOnly=false,searchable=false,unique=false,modDate=Tue Jul 23 19:26:01 UTC 2024,iDate=Tue Jul 23 19:26:01 UTC 2024,type=field,owner=<null>,inode=1d57baf475004b8c35295b375294ebf4,identifier=1d57baf475004b8c35295b375294ebf4], com.dotmarketing.portlets.structure.model.Field@52eb278a[structureInode=331851cd093906b59dd6dc521881fdd9,fieldName=Tree control,fieldType=com.dotcms.contenttype.model.field.CustomField,fieldRelationType=<null>,fieldContentlet=text_area1,required=false,velocityVarName=treeControl,sortOrder=3,values=<script>
require([
"dojo/store/Memory", "dojo/store/Observable",
"dijit/tree/ObjectStoreModel", "dijit/Tree", "dojo/parser"
], function(Memory, Observable, ObjectStoreModel, Tree){
// Create test store, adding a getChildren() method needed by the model
myStore = new Memory({
data: [
{ id: 'world', name:'The earth', type:'planet', population: '6 billion'},
{ id: 'AF', name:'Africa'

... and so on ... tree-control.png

Mark Pitely

Jul 24, 2024, 9:45:59 AM
to dot...@googlegroups.com
The return from a relationship field is either a full object or a list of objects. You are missing a layer of abstraction that is actually pretty damn useful.
If you have a structure called Planet with title, number of moons, etc. And then a structure of continents (name, coolest city)...
If you pull all continents:
$continent.planet   would have $continent.planet.title, $continent.planet.moons as fields.
You can also just say $planet=$continent.planet and then do $planet.title.
So, in your case, you want to do something with $taxonomy.link.identifier or $taxonomy.link.title.

If there are multiples in the relationship (many to many), you have to loop through them. 
#foreach ($planet in $planets)
$planet.title
#foreach ($cont in $planet.continents)
$cont.title
#end
#end

Hope that helps. 


M



Jon Ku

Jul 24, 2024, 11:41:29 AM
to dotCMS User Group
Yes, that got me back on track!

- Jon

working code

#foreach( $taxonomy in $dotcontent.pull( "+contentType:Taxonomy", 100, "title asc") )
$taxonomy.title
$taxonomy.identifier
$taxonomy.parent[0].identifier
#end

but as you point out, I need to manage multiple parents
#foreach( $taxonomy in $dotcontent.pull( "+contentType:Taxonomy", 100, "title asc") )
$taxonomy.title
$taxonomy.inode
$taxonomy.identifier
#foreach($parent in $taxonomy.parent)
    $parent.title $parent.identifier
#end
#end
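
Once the loop above has produced each taxonomy with all of its parents, a tree control needs the inverse lookup: the children of a given node. Here is a minimal JavaScript sketch, assuming the Velocity output has been turned into an array of rows with `title`, `identifier`, and `parents` (the row shape, the sample identifiers, and the name `buildChildIndex` are hypothetical).

```javascript
// Hypothetical rows shaped like the fields the Velocity loop prints:
// title, identifier, and the identifiers of every related parent.
const rows = [
  { title: "The earth", identifier: "e1", parents: [] },
  { title: "Africa", identifier: "a1", parents: ["e1"] },
  { title: "North America", identifier: "n1", parents: ["e1"] },
  { title: "Nairobi", identifier: "k1", parents: ["a1"] },
  { title: "Shared topic", identifier: "s1", parents: ["a1", "n1"] },
];

// Build a parent -> children index so a tree can ask for a node's children.
// A row with several parents is listed under each of them.
function buildChildIndex(items) {
  const byParent = new Map();
  const roots = [];
  for (const item of items) {
    if (item.parents.length === 0) {
      roots.push(item);
    }
    for (const parentId of item.parents) {
      if (!byParent.has(parentId)) byParent.set(parentId, []);
      byParent.get(parentId).push(item);
    }
  }
  return {
    roots,
    getChildren: (identifier) => byParent.get(identifier) || [],
  };
}

const index = buildChildIndex(rows);
console.log(index.roots.map((row) => row.title));
console.log(index.getChildren("a1").map((row) => row.title));
console.log(index.getChildren("n1").map((row) => row.title));
```

Whether a given tree widget accepts the same item under two parents depends on the widget, so the multi-parent case is worth testing early.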
