1. Rewritten in C++
2. Many source/target/pipeline mappings allowed, as long as they are 1 to 1.
3. Files that are not mapped are transparently read directly from the
source.
4. Readers of a transcoded file can start reading before transcoding is
complete.
5. Transcoded images are built efficiently using STL ropes.
6. Size of transcoded file is always reported accurately (waits for
transcoding to complete).
7. Opendir operation starts transcoding all directory entries in
parallel in anticipation of future need (~readahead).
8. Image cache may be limited by count and/or memory.
9. Image cache entries are indexed by filesystem/inode/modification time.
Any interest?
I don't personally have any problems with C++. But it's really up
to Stanislav, as I have no real time to work on it. Philosophically,
I'm disinclined to do rewrites, but this program is so small and new
that it doesn't really matter.
I think NetBSD has gstfs in ports so it still needs to be portable.
> 2. Many source/target/pipeline mappings allowed, as long as they are 1 to 1.
Nice.
> 3. Files that are not mapped are transparently read directly from the
> source.
Should already be the case with gstfs (well, after a patch by
Stanislav in the git tree). Bind mounts can also accomplish the same.
> 4. Readers of a transcoded file can start reading before transcoding is
> complete.
Yeah, I had this working at one point, I can't remember why I dropped
it. I think it was just a complexity thing.
> 5. Transcoded images are built efficiently using STL ropes.
I had thought that a custom gstreamer plugin would be better than my
hacky mess of pipes, but this might be a better way...
> 6. Size of transcoded file is always reported accurately (waits for
> transcoding to complete).
One problem is that 'ls' may take forever to return.
> 7. Opendir operation starts transcoding all directory entries in
> parallel in anticipation of future need (~readahead).
Which takes care of 6, but might need a whole lot of cache space.
(Think of the case of someone having 5000 media files in one
directory). I'm not sure it's worth the trade-off for size to be
correct the first time.
> 8. Image cache may be limited by count and/or memory.
>
> 9. Image cache entries are indexed by filesystem/inode/modification time.
Just curious, why modification time? Are you using f_fsid from statfs
for the filesystem or the name?
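For illustration only, a key of this kind might look like the sketch below. It is a guess at the shape, not the code from the rewrite: the names ImageKey and makeKey are hypothetical, and it assumes the filesystem is identified by st_dev from stat(2) rather than by f_fsid or a name. In this sketch, including the modification time means a cached entry simply stops matching once the source file has been changed.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <ctime>
#include <tuple>

// Hypothetical cache key: (filesystem, inode, modification time).
// Assumption: the filesystem is identified by st_dev from stat(2).
struct ImageKey {
    dev_t device;
    ino_t inode;
    time_t mtime;

    // Ordering so the key can be used in std::map or std::set.
    bool operator<(const ImageKey& other) const {
        return std::tie(device, inode, mtime)
             < std::tie(other.device, other.inode, other.mtime);
    }
    bool operator==(const ImageKey& other) const {
        return std::tie(device, inode, mtime)
            == std::tie(other.device, other.inode, other.mtime);
    }
};

// Build a key from the result of stat(2) on the source file.
inline ImageKey makeKey(const struct stat& st) {
    return ImageKey{st.st_dev, st.st_ino, st.st_mtime};
}
```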
> Any interest?
Thanks for your work, and speaking for myself, I think there's a lot
of interest in the features, as well as in anyone who wants to put in
the time to work on it. From that point of view, I can't quibble with
such minor details as language choice :)
--
Bob Copeland %% www.bobcopeland.com
2009/4/29 Ross Tyler <rosse...@gmail.com>:
> This is a very interesting project.
> I started experimenting with gstfs 0.1 over a month ago.
> I started fixing bugs (I sent Bob a patch) and adding features but ended
> up rewriting it.
> Here are some differences:
>
> 1. Rewritten in C++
I have no problem with C++ either (it was my first real programming
language anyway).
I'd love to see the code first so that we don't "jump" into it. But as Bob
mentioned, gstfs is really small, so a rewrite at this point is not that
big of a problem. If we can keep the portability and make it easier to
enhance future versions, all the better.
> 2. Many source/target/pipeline mappings allowed, as long as they are 1 to 1.
Great, that was on my TODO list.
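One possible reading of "1 to 1" is that no source or target extension may appear in more than one mapping. A minimal sketch under that assumption follows; the Mapping struct and isOneToOne are hypothetical names, not taken from either code base.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical mapping record: files ending in sourceExt appear in the
// mount with targetExt, produced by running them through pipeline.
struct Mapping {
    std::string sourceExt;
    std::string targetExt;
    std::string pipeline;
};

// True when no source or target extension appears in more than one
// mapping (the assumed meaning of "1 to 1").
inline bool isOneToOne(const std::vector<Mapping>& mappings) {
    std::set<std::string> sources;
    std::set<std::string> targets;
    for (const Mapping& m : mappings) {
        if (!sources.insert(m.sourceExt).second) return false;
        if (!targets.insert(m.targetExt).second) return false;
    }
    return true;
}
```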
> 3. Files that are not mapped are transparently read directly from the
> source.
That fix is already in the master branch in git.
> 4. Readers of a transcoded file can start reading before transcoding is
> complete.
Also great. The current version in master would have problems with this,
since transcoding happens during "open()".
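As a rough sketch of how reading before completion could work (an illustration under assumptions, not the design of the rewrite): the transcoder appends to a shared buffer, and a reader blocks only until the bytes it asked for exist or transcoding has ended. The class name GrowingImage and its methods are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical growing image: the transcoder appends, readers wait
// only until the requested range exists or transcoding has finished.
class GrowingImage {
public:
    // Called by the transcoding thread as output is produced.
    void append(const std::string& chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            data_ += chunk;
        }
        grown_.notify_all();
    }

    // Called once by the transcoding thread when it is done.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            finished_ = true;
        }
        grown_.notify_all();
    }

    // Returns up to size bytes starting at offset; an empty string
    // means end of file.
    std::string read(std::size_t offset, std::size_t size) {
        std::unique_lock<std::mutex> lock(mutex_);
        grown_.wait(lock, [&] {
            return finished_ || data_.size() >= offset + size;
        });
        if (offset >= data_.size()) return std::string();
        return data_.substr(offset, size);
    }

private:
    std::mutex mutex_;
    std::condition_variable grown_;
    std::string data_;
    bool finished_ = false;
};
```

With something like this, open() would only need to start the transcoding thread, and each read() call would wait for just the part of the file it needs.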
> 5. Transcoded images are built efficiently using STL ropes.
I'd say that the most processing/memory-intensive operations are in
the conversion itself, but any improvement will help.
> 6. Size of transcoded file is always reported accurately (waits for
> transcoding to complete).
Just like Bob mentioned, this could be a huge problem with big
directories and simple "ls" getting stuck (but maybe we
misunderstood?)
> 7. Opendir operation starts transcoding all directory entries in
> parallel in anticipation of future need (~readahead).
For me, this is perhaps the most controversial "feature". Admittedly
I have a pretty old computer (one core), so this would effectively
prevent me from doing anything else :-) If it's off by default and only
turned on as an optional feature, I'm OK with it. I suppose that on new
machines with a few spare cores and lots of RAM this could make sense.
> 8. Image cache may be limited by count and/or memory.
I was also thinking about timeouts for cache (clear cache entries
older than 30 minutes for example).
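A cache limited by count and memory could be sketched roughly as follows. This is only an illustration: ImageCache and its methods are hypothetical names, eviction is assumed to be least-recently-used, lookups are a simple linear search, and the time-based expiry mentioned above is left out.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical image cache that evicts least recently used entries
// until both a count limit and a memory limit are satisfied.
class ImageCache {
public:
    ImageCache(std::size_t countLimit, std::size_t memoryLimit)
        : countLimit_(countLimit), memoryLimit_(memoryLimit) {}

    // Insert or replace an image, then evict old entries as needed.
    void put(const std::string& key, const std::string& image) {
        erase(key);
        entries_.emplace_front(key, image);
        memory_ += image.size();
        while (!entries_.empty() &&
               (entries_.size() > countLimit_ || memory_ > memoryLimit_)) {
            memory_ -= entries_.back().second.size();
            entries_.pop_back();
        }
    }

    // Return the cached image, or nullptr on a miss. A hit moves the
    // entry to the front so that it is evicted last.
    const std::string* get(const std::string& key) {
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->first == key) {
                entries_.splice(entries_.begin(), entries_, it);
                return &entries_.front().second;
            }
        }
        return nullptr;
    }

    std::size_t count() const { return entries_.size(); }
    std::size_t memory() const { return memory_; }

private:
    // Remove an existing entry with this key, if there is one.
    void erase(const std::string& key) {
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (it->first == key) {
                memory_ -= it->second.size();
                entries_.erase(it);
                return;
            }
        }
    }

    std::size_t countLimit_;
    std::size_t memoryLimit_;
    std::size_t memory_ = 0;
    std::list<std::pair<std::string, std::string>> entries_;
};
```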
> 9. Image cache entries are indexed by filesystem/inode/modification time.
Another performance improvement, good.
> Any interest?
Definitely. Anyone willing to spend time on this project is welcome.
It would be great if you could share your code somewhere, perhaps even
here so that we can all collectively help out each other.
With that in mind, there is also a mob branch available for you to
commit to if you want:
http://repo.or.cz/w/gstfs-ng.git
I admit though that your rewrite would probably best reside in its own
branch at least. But that's just details.
I was planning to release gstfs-0.2 once autotools was working.
But that will have to wait until after my thesis is done (two weeks from
now). If we make this work with your rewrite, then we'll skip 0.2,
I guess :-)
One last time, thanks for all the work and time you've spent on this.
--
Stanislav Ochotnicky
jabber: socho...@gmail.com
icq: 74274152
Not at all. Git can be intimidating in the beginning.
This is a modified example from git-scm.com:
$ git clone ssh://m...@repo.or.cz/srv/git/gstfs-ng.git
$ cd gstfs-ng
$ git branch mob origin/mob
$ git checkout mob
$ (edit files)
$ git add (files)
$ git commit -m 'Explain what I changed'
$ git format-patch origin/master
Unless I made a typo or mistake, this will create a set of
patches (one for every commit you've made) that you can send to the
mailing list or elsewhere. Instead of the last line (format-patch) you
could also run "git push" and your changes would immediately appear in
the publicly available repository. This is all useful for small changes
and bug fixes. But as far as your changes are concerned, I'd suggest
just sending a tarball of the project here for now.
There are a lot of nice things about git, and if you already have some
experience with SVN/CVS and understand the principles of version control
you should pick it up easily. I am not much of a tutor, so I suggest
you read a basic tutorial on Git. A good start might be "man
gittutorial" and the other resources on git-scm.com.
If you have a specific question you can of course ask; maybe I'll
be able to answer.
Sure, send it in, just so that we know where we stand and can maybe
give you some feedback. I guess the project files are not that
big, so it's not a problem (just be sure to run "make clean" before
packaging :-) )
> I would like to make the changes we have talked about and improve
> backward compatibility first.
> I should have something here by this weekend.
Fine by me
No harm done
> Bob made me realize that I needed to step back and figure out how I
> would use such a thing.
> I have done so and my thoughts are reflected in a man page that I just
> posted.
Posted where, if I may ask? I suppose you wanted to attach it to the
email. It happens to everyone sometimes :-)
> I am working toward delivering what the man page documents.
> Much of this work is done.
> The persistent cache feature is something that I am really interested
> in but will take longer.
> I hope to deliver code that meets the rest of the requirements soon.
I really suggest you find 10 minutes to read part of "The Cathedral
and the Bazaar", namely "Release Early, Release Often". You can find
an online copy here:
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s04.html
Having buggy/incomplete code is OK. If you share it we all learn, and many
good things can come out of it:
1. You get feedback soon, so that if you head in the wrong direction you
won't spend so much time working on it for nothing.
2. Other people will be able to understand your final version
better because they were involved and know why certain things were
done.
3. You learn about your code as well, because sometimes you will have
to explain your decisions.
These are just off the top of my head. I am sure Bob could think of many more.
So whatever you have is OK, even if it doesn't compile right now.
We can see the style, we can see the basic design decisions, et cetera,
et cetera.
> Thank you for your patience/support.
Hope you will have enough patience with me :-)
Note that there is a new man page.
I made the user's job of specifying a pipeline simpler by slapping on
the fdsrc and fdsink elements myself.
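As a sketch of that idea (wrapPipeline is a hypothetical helper, not the function in the posted code), the user's description can be wrapped between an fdsrc and an fdsink using the "fd" property that both GStreamer elements provide:

```cpp
#include <string>

// Hypothetical helper: wrap a user pipeline description between an
// fdsrc reading from inputFd and an fdsink writing to outputFd.
inline std::string wrapPipeline(const std::string& userPipeline,
                                int inputFd, int outputFd) {
    return "fdsrc fd=" + std::to_string(inputFd) + " ! " + userPipeline +
           " ! fdsink fd=" + std::to_string(outputFd);
}
```

The resulting string is in the usual gst-launch syntax, so it could presumably be handed to gst_parse_launch(); the user then only has to write the middle part, for example "decodebin ! audioconvert ! vorbisenc ! oggmux".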
It turned out that the trueSize and readAhead options were harder than
I anticipated.
I think some amount of readAhead should be on by default but trueSize
should be off by default.
Please let me know what you think.
Thanks!
First let me apologize for not answering in at least some way right away.
My inbox has been a mess lately and I've been terrible at answering
people (not just you). Once I read your message I downloaded your
code and gave it a quick glance. It needed more time than I had at the
time, so I put it on hold. And then I forgot about it. Again, I am
terribly sorry.
I must admit I've never used boost before, though I've heard great
things about it. I wondered whether it would be a good thing to
complicate things with additional libraries, but after some thought it
may not be such a bad idea. Parts of Boost are being adopted into the
C++ standard library, after all.
I really like your code overall. It's consistent, and you used exceptions
and boost functionality in a lot of places. However, I'd like to simplify
things as much as possible. I'm not sure that's possible with your codebase,
but I'd like to try. Readability of several parts could be improved with
different formatting IMO, but I'm just nitpicking now. The comments you
provided in the source code are a great start for dev documentation; we
could aggregate them on a wiki later.
Also, I think your assumptions about computers running gstfs are very
optimistic. I still use my 4-year-old laptop: 1.5 GHz Centrino, 1GB
RAM. If you set the image cache to 1GB you will effectively kill off
machines with less than 2GB RAM. Defaults should be very conservative,
with tips for improving performance in the --help output or README.
There is one small bug in your code that makes it impossible to disable
readahead (I attached a patch that should fix this). But if you don't
mind, I'd like to try to add your code to the repo.or.cz repository, more
specifically to the mob branch for now. Once we clean up a few bits here
and there we could replace the master branch (trunk in SVN terminology)
with your code.
Your code in repo.or.cz mob branch:
http://repo.or.cz/w/gstfs-ng.git?a=shortlog;h=refs/heads/mob
More info about mob branches on repo.or.cz:
http://repo.or.cz/mob.html
You can check out the current version like this:
git clone git://repo.or.cz/gstfs-ng.git
Then do:
cd gstfs-ng
git checkout -t origin/mob -b mob
Now you are in the mob branch and can edit/add/remove files as you like.
After you are finished you git-commit to your local copy, and then
you should be able to git-push it back to repo.or.cz so that others can
see the changes too. Maybe you will need to put mob@ in the URL before
the hostname; I'm not sure about that. This will make it easier to track
changes and improve the code together.
You mentioned before that you don't have experience with git, so if you
need a helping hand you can contact me through jabber (the address is in
my sig).
Once again, I apologize for the terrible response time on my side. I'll do
my best to not let it happen again.
--
Stanislav Ochotnicky
jabber: socho...@gmail.com
icq: 74274152
I agree with your assessment on the default cache memory usage.
At your prodding, I was rushing to get something to you for review.
Shortly after I released the code, I made a change to accept a % suffix
so that the cache limit can be specified as a percentage of physical memory.
The default is now 25% but can easily be changed.
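Parsing such a value could be sketched as follows. This is an assumption-laden illustration, not the posted code: parseCacheMemory is a hypothetical name, a plain number is assumed to mean bytes, and the amount of physical memory is passed in by the caller (on Linux it could be obtained with sysconf(_SC_PHYS_PAGES) times sysconf(_SC_PAGE_SIZE)).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical parser for a cacheMemory value: either an absolute
// byte count ("1048576") or a percentage of physical memory ("25%").
inline std::uint64_t parseCacheMemory(const std::string& value,
                                      std::uint64_t physicalBytes) {
    // Require a leading digit so that signs and blanks are rejected.
    if (value.empty() || value[0] < '0' || value[0] > '9')
        throw std::invalid_argument("bad cacheMemory: " + value);
    std::size_t used = 0;
    const unsigned long long number = std::stoull(value, &used);
    if (used == value.size())
        return number;  // plain number: taken as bytes
    if (used + 1 == value.size() && value[used] == '%') {
        if (number > 100)
            throw std::invalid_argument("percentage over 100: " + value);
        // Divide first so the multiplication cannot overflow.
        return physicalBytes / 100 * number;
    }
    throw std::invalid_argument("bad cacheMemory: " + value);
}
```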
Thank you for your bug fix.
I will be evaluating it soon.
I appreciate you checking it in to git.
Learning git will be good for me (as, I hope, using boost will be good
for you/us).
Thank you for reviewing the code.
Do you know if Bob had a chance?
Thanks again!
The original code works as I intended.
Values for cacheCount, cacheMemory, cacheTime and readAhead options all
specify limits on their respective resources.
A value of 0 consistently means "no limit" for all of these options.
This should have been documented in the man page.
I wanted to have an easy way to say "no limit".
I did not consider the need to say "no capability".
These values make more sense in the code where they are named
cacheCountLimit, cacheMemoryLimit, cacheTimeLimit and readAheadLimit.
So, !readAheadLimit naturally means no read ahead limit.
I thought about naming the options the same way but I thought it might
be too verbose.
I see how !readAhead (no read ahead) means something far different than
!readAheadLimit (no read ahead limit).
I see how readAhead=0 looks more like !readAhead than it does
!readAheadLimit.
Do you really think we need the ability to say "no cache" or "no read
ahead"?
If so, we will have to think of a way to say both.
Perhaps I should add readAheadOff and cacheOff options and append
"Limit" to the names of the existing options.
What do you think?
To be honest, I didn't even consider that someone might want to set "no
limit", because as I see it, it makes no sense. No one has an unlimited
number of processors or unlimited memory.
> I wanted to have an easy way to say "no limit".
> I did not consider the need to say "no capability".
> These values make more sense in the code where they are named
> cacheCountLimit, cacheMemoryLimit, cacheTimeLimit and readAheadLimit.
> So, !readAheadLimit naturally means no read ahead limit.
> I thought about naming the options the same way but I thought it might
> be too verbose.
> I see how !readAhead (no read ahead) means something far different than
> !readAheadLimit (no read ahead limit).
> I see how readAhead=0 looks more like !readAhead than it does
> !readAheadLimit.
Well, I knew what the value meant, so I didn't think much about semantics.
The readahead limit is the maximum number of concurrent processing jobs.
In this sense 0 can mean two things, as you said: either no limit or no
processing jobs in advance. Since in this case I really don't consider
'no limit' a viable option, I'd say 0 means no concurrent processing
jobs.
As far as naming the variable goes, readAheadLimit is fine, because it
means limit, but with a special meaning for 0. As far as I know this
behaviour is pretty common.
> Do you really think we need the ability to say "no cache" or "no read
> ahead"?
Definitely. Every user with a single processor will tell you so. Even now
you have, for example, the Atom with HT (but one core anyway).
> If so, we will have to think of a way to say both.
> Perhaps I should add readAheadOff and cacheOff options and append
> "Limit" to the names of the existing options.
> What do you think?
Like I said, I see no reason why 0 should mean "no limit". If someone
really wants to benchmark their system they can set readahead (or cache)
to an arbitrarily large number (say INT_MAX). Also consider that every
option you add has to be supported from that point on, and the more
"knobs" you have, the harder it is for users to figure them out.
But if you can supply some use cases that couldn't be solved by
setting readahead to INT_MAX, I will change my mind of course.
On further reflection, I completely agree with you.
I have changed cacheCountLimit, cacheMemoryLimit, cacheTimeLimit and
readAheadLimit to give no special meaning to 0.
That is, 0 means a limit of 0 - it no longer means unlimited.
The names of the command line options have not been changed.
As I previously mentioned, I have also added the % suffix to the
cacheMemory option so that it can be expressed as a percentage of
physical memory.
I have updated the man page.
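The changed meaning of 0 can be illustrated with a minimal sketch. The function mayStartReadAhead and the constant kEffectivelyUnlimited are hypothetical names made up for this example; only readAheadLimit comes from the code discussed above.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical stand-in for "no limit": just a very large limit.
const std::size_t kEffectivelyUnlimited =
    std::numeric_limits<std::size_t>::max();

// Hypothetical check run before starting another read-ahead job.
// A limit of 0 is an ordinary limit: no read-ahead job is ever started.
inline bool mayStartReadAhead(std::size_t activeJobs,
                              std::size_t readAheadLimit) {
    return activeJobs < readAheadLimit;
}
```

With this shape there is no special case at all: readAhead=0 turns read ahead off, and anyone who wants "unlimited" passes a very large number.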
I would like to push this all back to your git repository but am having
problems.
I believe that I have made the changes to my local git repository (clone).
I did git-add, git-commit, etc.
Now that I try a git-push it fails.
Here is what I have tried:
git push
fatal: The remote end hung up unexpectedly
git push git://repo.or.cz/gstfs-ng.git
fatal: The remote end hung up unexpectedly
git push git://m...@repo.or.cz/gstfs-ng.git
fatal: Unable to look up m...@repo.or.cz (port 9418) (Name or service
not known)
help!
Don't panic! :-)
For the push command you need to use the push URL, which in our case is
ssh://m...@repo.or.cz/srv/git/gstfs-ng.git
So to properly set up your repo:
git config --add remote.mob.url ssh://m...@repo.or.cz/srv/git/gstfs-ng.git
git config --add remote.mob.fetch +refs/heads/*:refs/remotes/mob/*
Then you should be able to do:
git push mob
Great, this will make participation much easier.
Please set your name/email using git config though.
git config --global user.name "Your Name"
git config --global user.email "y...@example.com"
Thanks again for your contributions. I may be a bit unresponsive until
Tuesday because I have my defense then. If that's the case, bear with me,
and if I don't respond to an email you send before then, remind me on
Thursday/Friday :-)