First off, thanks hugely. This type of involvement is exactly what we
need to get this dataset to the level of accuracy that will make it
really useful to people. We did a bunch of checks on the data, but with
so much data, some errors were bound to creep in.
There are legitimate reasons some sanity checks will fail. For instance,
some networks are actually disconnected (as far as their map goes) and
there are some reasons this may make sense (e.g. they get transit from a
provider rather than having their own network connection). And there
are definitely some ambiguities in the data. However, it looks like you
have found a bunch of issues that we really need to fix.
We are using version control, but only locally. I like the idea of using
GitHub or something similar to allow contributions to be merged directly
into the zoo. We'll have a talk about this in the next week or so and
get something more useful set up for contributors.
I guess another thing we need is an FAQ to answer some basic questions
about the data. I'll see about starting that as well.
Can I also ask where you came across the dataset?
Cheers,
Matt
Hi
Sorry about the delay in response.
I have created a GitHub repository, and will soon check the sources into it.
The sources differ slightly from the GML files in the zoo, as they
contain x-y co-ordinates from when the network was traced from the
source image. They also contain some extra information for the
geocoding script. These are then converted into the GML format used in
the zoo using the yed2zoo tool described at
http://topology-zoo.org/toolset.html
I have been working on a paper, so haven't had a chance to investigate
the full set of bugs you reported. I appreciate the report, and will
get onto them early next week.
I agree a schema is important, and will get onto this as well soon.
Is there something in the meantime that I could directly provide you
with that will allow you to get up and running quicker?
Thanks
Simon
On Tue, Jul 26, 2011 at 5:06 AM, Brandon Heller
If the biggest holdup for you is the lack of co-ordinates for the
hyperedges, we can probably provide a programmatic solution. We can
remove the hyperedge, and connect its neighbors together, either as a
clique or as a minimum spanning tree.
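
As a rough illustration of this first option (not the zoo's actual tooling), the sketch below removes a hyperedge node from a plain edge list and reconnects its neighbours, either as a clique or as a minimum spanning tree over straight-line distance. The function names, the edge-list representation and the `pos` co-ordinate dictionary are all assumptions made for this example.

```python
import math
from itertools import combinations


def _split(edges, hyperedge):
    """Return (sorted neighbours of the hyperedge, edges that do not touch it)."""
    neighbours = set()
    kept = set()
    for u, v in edges:
        if hyperedge in (u, v):
            neighbours.update((u, v))
        else:
            kept.add(tuple(sorted((u, v))))
    neighbours.discard(hyperedge)
    return sorted(neighbours), kept


def replace_with_clique(edges, hyperedge):
    """Drop the hyperedge node and join every pair of its neighbours."""
    neighbours, kept = _split(edges, hyperedge)
    kept.update(combinations(neighbours, 2))
    return sorted(kept)


def replace_with_mst(edges, hyperedge, pos):
    """Drop the hyperedge node and join its neighbours with a minimum
    spanning tree (Prim's algorithm, Euclidean distance between pos entries)."""
    neighbours, kept = _split(edges, hyperedge)
    if neighbours:
        in_tree = [neighbours[0]]
        remaining = neighbours[1:]
        while remaining:
            # Cheapest link from the tree built so far to a node outside it.
            a, b = min(
                ((a, b) for a in in_tree for b in remaining),
                key=lambda e: math.dist(pos[e[0]], pos[e[1]]),
            )
            kept.add(tuple(sorted((a, b))))
            in_tree.append(b)
            remaining.remove(b)
    return sorted(kept)
```

The clique keeps every neighbour directly reachable from every other, at the cost of adding up to n(n-1)/2 links. The spanning tree adds only n-1 links but needs co-ordinates (or some other weight) for the neighbours.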
Another option is to retain the hyperedge, but approximate its
location, such as the midpoint of its neighbors.
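
A minimal sketch of this approximation, assuming node positions are held in a dictionary of (x, y) pairs. The names `approximate_location`, `edges` and `pos` are hypothetical. With more than two neighbours, "midpoint" is taken here to mean the average (centroid) of the neighbours' positions.

```python
def approximate_location(hyperedge, edges, pos):
    """Estimate a position for `hyperedge` as the centroid of the
    neighbours that already have co-ordinates; None if there are none."""
    neighbours = set()
    for u, v in edges:
        if hyperedge in (u, v):
            neighbours.update((u, v))
    neighbours.discard(hyperedge)

    located = [pos[n] for n in sorted(neighbours) if n in pos]
    if not located:
        return None
    xs, ys = zip(*located)
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

The result could then be stored on the hyperedge node before the file is written back out.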
Finally, you could manually set the location. This could be done by
editing the GML. However, if I were doing it this way, I would write a
small Python script to update the relevant node information, and then
write it back to a GML file.
Does any of this help?
We will address the transcription errors shortly. Once the sources are
in GitHub, corrections will be simpler, and we will easily be able to
produce a list of diffs.
Thanks for your feedback; it is much appreciated.
Thanks
Simon
Hi Brandon,
I am still finalising the toolset for conversion, which will hopefully
be done in the next few days.
We have also merged in your corrections - thank you for spotting
these. We have started to use the issue tracker on GitHub, so that
should make it easier to report new networks and corrections in the
future: https://github.com/sk2/topologyzoo/issues
We are now using GraphML as our working file format, as this allows
properties (for nodes, edges, and the graph itself) to be directly
edited in yEd. Previously we were using GML with an external
properties CSV, which wasn't going to scale well with a public version
control system.
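
To show what keeping properties inside the file looks like, here is a small sketch that reads graph, node and edge properties from a GraphML document using only the Python standard library. The sample document and its property names (`Network`, `label`, `LinkSpeed`) are made up for illustration and are not taken from the zoo files. Files saved by yEd also carry extra yFiles-specific data elements, which this sketch would report under their raw key ids.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical sample; real zoo files will have different properties.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="graph" attr.name="Network" attr.type="string"/>
  <key id="d1" for="node" attr.name="label" attr.type="string"/>
  <key id="d2" for="edge" attr.name="LinkSpeed" attr.type="string"/>
  <graph id="G" edgedefault="undirected">
    <data key="d0">Example</data>
    <node id="n0"><data key="d1">Adelaide</data></node>
    <node id="n1"><data key="d1">Melbourne</data></node>
    <edge source="n0" target="n1"><data key="d2">10G</data></edge>
  </graph>
</graphml>"""


def read_properties(graphml_text):
    """Return the properties attached to the graph, its nodes and its edges."""
    root = ET.fromstring(graphml_text)
    # <key> elements map short ids (d0, d1, ...) to property names.
    names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    graph = root.find("g:graph", NS)

    def props(element):
        result = {}
        for d in element.findall("g:data", NS):
            key = d.get("key")
            # Fall back to the raw key id when no attr.name is declared.
            result[names.get(key) or key] = d.text
        return result

    return {
        "graph": props(graph),
        "nodes": {n.get("id"): props(n) for n in graph.findall("g:node", NS)},
        "edges": {
            (e.get("source"), e.get("target")): props(e)
            for e in graph.findall("g:edge", NS)
        },
    }
```

Because everything lives in one XML file, a change to a property shows up as an ordinary diff in version control, with no separate CSV to keep in sync.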
Thanks
Simon