Processing imported HTML

Ruslan Popov

Feb 9, 2020, 3:47:55 AM
to TiddlyWiki
I want to import spreadsheets from HTML files into TiddlyWiki. The files' content looks like this:

<html>
<head> <title>Search ....</title>
</head>
<body>
 <div>
 <div class="grid_9">
 <table> <tr><td colspan="3" width="400"><h1>..... List</h1></td></tr> </table> <br>

<table class="report-layout" border="1">
  <thead>
    <tr>
      <!-- <th style="text-align:center;background-color:#BDBDBD;">Name</th> -->
      <th style="text-align:center;background-color:#BDBDBD;">Application No.</th>
      <th style="text-align:center;background-color:#BDBDBD;">Course Type</th>
      <th style="text-align:center;background-color:#BDBDBD;">Course Name</th>
      <th style="text-align:center;background-color:#BDBDBD;">First Name</th>
      <th style="text-align:center;background-color:#BDBDBD;">Middle Name</th>
      <th style="text-align:center;background-color:#BDBDBD;">Last Name</th>
      <th style="text-align:center;background-color:#BDBDBD;">Email</th>
      <th style="text-align:center;background-color:#BDBDBD;">Phone</th>
      ....
    </tr>
  </thead>
  <tr id="row_8797" class="odd">
    <td>P3.........</td>
    <td>TEACHER TRAINING COURSE</td>
    <td>200H</td>
    <td>Natasha</td>
    <td></td>
    <td>K....</td>
    ...
  </tr>
....

In short, it is a basic table in basic HTML format. The first row has column headers, and the following rows have the data.

This file can be imported using the standard Import tool. The result is a new tiddler containing the same HTML as the file, so the data is displayed nicely in the tiddler.

What I'd like to do, however, is add some wikitext decoration to this data. For example, where there is an "Application ID" in the first column (the "Application No." column in the sample), I want it to be a wiki link. Clicking the link should create a new tiddler titled with that Application ID, and that tiddler's fields should be populated with values from the same row. For example, there should be fields such as "First Name", "Last Name", and "Email", each containing the corresponding data from the table.

I don't know how to approach this, though for a TiddlyWiki guru it may be trivial. For example:

  1. Is it better to process the data during the import itself, or to import it first and then run a post-import script? (I don't need to keep the original HTML in the wiki.)
  2. Is there a DOM / SAX parser available, or do I have to stick with regex?
  3. What is the best practice to pre-populate tiddler fields when it is created?

Thanks in advance for any suggestions.

Mat

Feb 9, 2020, 4:07:06 AM
to TiddlyWiki
There is one trivial way, yes, but it places a requirement on the input:

If, to use your example, the "Application ID" is in CamelCase form, it will automatically be linkified. 

If this is not acceptable, then it gets a little more iffy. One way would be to use the split and join operators for a "search and replace" type of operation, as described at the bottom here. I.e., you must filter the whole tiddler text to locate the specific spots where you insert [[brackets]] to create links. It is certainly doable, but I'd expect it to take some tinkering.
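
The bracket insertion can also be done outside TiddlyWiki, before the file is imported. What follows is a minimal Python sketch, not the split/join filter approach itself: it swaps in a regular expression. It assumes every data row starts with a plain <td> cell, as in the sample table, and that TiddlyWiki will parse [[...]] inside the imported table cell. The function name linkify_first_column and the ID APP-001 are hypothetical.

```python
import re

# Matches the first plain <td> cell that directly follows a <tr ...> tag.
# Header cells use <th>, so the header row is left untouched.
FIRST_CELL = re.compile(r"(<tr\b[^>]*>\s*<td>)([^<]+)(</td>)", re.IGNORECASE)


def linkify_first_column(html: str) -> str:
    """Wrap the text of the first cell of each data row in [[...]].

    Hypothetical helper: assumes the first cell has no attributes and
    contains only text, as in the sample table.
    """
    return FIRST_CELL.sub(
        lambda m: f"{m.group(1)}[[{m.group(2).strip()}]]{m.group(3)}", html
    )


# Made-up sample row; APP-001 is not a real application ID.
sample = """<tr id="row_1" class="odd">
  <td>APP-001</td>
  <td>TEACHER TRAINING COURSE</td>
</tr>"""

linked = linkify_first_column(sample)
print(linked)
```

A regex like this is brittle against markup variations (attributes on the first cell, nested tags), so it only suits input as regular as the sample.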

> what is the best practice to pre-populate tiddler fields when it is created?

Create the tiddlers using a template. Do some searching on the boards.

<:-)

Joshua Fontany

Feb 9, 2020, 9:55:35 PM
to TiddlyWiki
Is this data coming from an actual spreadsheet, or a program that can export *.csv (comma-separated values)?

If so, you could use the new CSV import options in my JsonMangler plugin. After importing the CSV tiddler, there is an option to re-import each row as its own tiddler (auto-naming fields after the header row, if present).

https://joshuafontany.github.io/TW5-JsonMangler/
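
If the data only exists as an HTML table, one possible intermediate step is to convert the table to CSV first. The following is a minimal Python sketch using only the standard library's event-based html.parser (a SAX-style parser). It assumes the data table carries class="report-layout" as in the sample, that there are no nested tables, and that plain comma-separated text with a header row is acceptable input for the CSV import. The names TableRows and html_table_to_csv and the sample values are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.in_table = False   # True while inside the wanted table
        self.rows = []          # list of rows, each a list of cell texts
        self.cell = None        # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.in_table = dict(attrs).get("class") == self.table_class
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th") and self.rows:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None
        elif tag == "table":
            self.in_table = False


def html_table_to_csv(html, table_class="report-layout"):
    """Return the matching table as CSV text, header row first."""
    parser = TableRows(table_class)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(r for r in parser.rows if r)
    return out.getvalue()


# Made-up sample in the shape of the table above; commented-out
# header cells are skipped because HTML comments carry no cell tags.
sample = """
<table><tr><td colspan="3"><h1>List</h1></td></tr></table>
<table class="report-layout" border="1">
  <thead><tr>
    <!-- <th>Name</th> -->
    <th>Application No.</th><th>Course Type</th><th>First Name</th>
  </tr></thead>
  <tr id="row_1" class="odd">
    <td>APP-001</td><td>TEACHER TRAINING COURSE</td><td>Natasha</td>
  </tr>
</table>
"""
csv_text = html_table_to_csv(sample)
print(csv_text)
```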

Best,
Joshua Fontany

Ruslan Popov

Feb 18, 2020, 6:14:53 AM
to TiddlyWiki
> Is this data coming from an actual spreadsheet, or a program that can export *.csv (comma-separated values)?

It's coming from a program, but it's only available in HTML format.

Thanks, your plugin looks good!
 

TonyM

Feb 18, 2020, 7:15:54 AM
to TiddlyWiki
Quick tip

There are browser plugins that let you copy tables from web pages in other formats, such as plain text and CSV. Perhaps you should investigate intermediate steps to reach your goal.
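
One such intermediate step could be a small script that turns the copied CSV into a JSON file of tiddlers, since TiddlyWiki's standard import accepts a JSON array of tiddler objects. The following is a minimal Python sketch under stated assumptions: the CSV has a header row, one column (here "Application No.") supplies the tiddler title, and field names are lowercased with underscores on the assumption that the TiddlyWiki version in use restricts field names. The names field_name and csv_to_tiddlers and the sample values are hypothetical.

```python
import csv
import io
import json
import re


def field_name(header: str) -> str:
    """Turn a column header such as 'First Name' into 'first_name'."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")


def csv_to_tiddlers(csv_text: str, title_column: str) -> list:
    """Build one tiddler dict per CSV row, titled by `title_column`."""
    tiddlers = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tiddler = {"title": row[title_column].strip()}
        for header, value in row.items():
            # Skip surplus cells (header is None), the title column,
            # and empty or missing values.
            if not header or header == title_column or not value:
                continue
            tiddler[field_name(header)] = value.strip()
        tiddlers.append(tiddler)
    return tiddlers


# Made-up sample data; APP-001 is not a real application ID.
sample_csv = (
    "Application No.,Course Type,First Name,Middle Name\n"
    "APP-001,TEACHER TRAINING COURSE,Natasha,\n"
)
tiddlers = csv_to_tiddlers(sample_csv, title_column="Application No.")

# Saving this output as a .json file gives something to try with Import.
print(json.dumps(tiddlers, indent=2))
```

Each row becomes its own tiddler, so a link such as [[APP-001]] in the table would then open a tiddler whose fields already hold that row's values.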

Regards
Tony
