Re: question for the Arista/Transmageddon groups

On Wed, 2010-09-29 at 09:47 -0400, Daniel G. Taylor wrote:
> I'm definitely interested in such functionality as the author of Arista. 
> I get a lot of bugs about ripping the wrong titles on DVDs and use some 
> ridiculous code to try and get all the titles and guess which one to use.
> Arista doesn't have a plugin framework per-se, but you could easily fork 
> it on github and add the feature and I will merge it into the main 
> development repo. If something is a useful feature I don't mind adding 
> it to the core codebase.
> Just so everyone is on the same page, both Arista and Transmageddon:
>   * Are written in python
>   * Use GStreamer as a backend
>   * Shared a lot of stuff initially but have diverged significantly
>   * Are friendly to patches
Greetings once again, and thank you for the warm response. Most of my
tool is written in Python; I use compiled bits only for things I did not
find a Python way to do (or to do easily). Namely, as it applies to this
conversation:
1. The DVD hashing method. At first (and as published in the current
version) I relied on the fact that HandBrake downloads everything as
source and builds it on the spot: part of that is libdvdread, and part
of THAT is a demo applet that (very) quickly does an MD5 checksum on the
DVD image data. I have since extracted the essentials of that into a stand-alone
project that I can easily publish the source for but it just makes the
same calls into libdvdread to do the hashing. The reason I started doing
the stand-alone project is that if I could just ship a single .so or app
I could dispense with the HandBrake source install method. This would be
good for all the right reasons, not the least of which is making the
tool rip-system agnostic. Now, having run transcoders on a variety of
machines over the years (I have been fooling with this stuff for around
six years so far), I have found that the source-built HandBrake (or any
system that takes advantage of multi-core and specifically 64-bit
systems) just plain runs faster when it is compiled on more powerful
hardware. The difference in speed between a dual-core 32-bit system, a
dual-core 64-bit system, or even a quad-core can be striking. What I am
saying is that there is a plus to this madness in general, BUT it makes
my stuff less portable than I would like. Finding a Pythonic way of
doing the hashing would let the ripper be anything (because I am honest
enough to admit that while the disc lookup stuff is pretty handy, I know
not all of the code is as robust as I would like), and if the end-user
is a power-user to begin with, they could still hand-build and optimize
HandBrake (or whatever; I am not hung up on that tool) if they wanted.
Since Arista already is cross-platform, and I think Transmageddon either
is or plans to be (I just got home from work and found the wife had been
experimenting with our network while I was gone, so I had work to do),
this is a must-solve for me.
2. As mentioned, this tool uses a compiled HandBrake (and all the bits
that go into it) but with a couple of configuration items is not locked
to HandBrake and it was never my intention to make it so. As a matter of
fact, while I am not into stripping protection as a goal, when I buy a
DVD and want to put it on our NAS so we can stream it to our TV (my wife
is disabled so this project was initially to help her), it ticks me off
to see some of the childish ways they are protected and if there is an
easy way for me to side-step it on something I already own, I will do
it. Now while HandBrake can do a lot, one of my coders in England has
worked out some ways of using mencoder to do a dump on certain kinds of
things that (for him anyway) work very well, so my interest has always
been in letting this use the right tool for the right job.
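
A minimal sketch of what a pure-Python hash for item 1 could look like,
using only the standard library's hashlib. It assumes the disc is
mounted (or copied) so that VIDEO_TS is an ordinary directory, and it
hashes the .IFO files in sorted name order. That choice, the function
name dvd_md5 and the chunk size are assumptions made for illustration;
the result is not guaranteed to match the ID the libdvdread-based tool
produces.

```python
import hashlib
from pathlib import Path


def dvd_md5(video_ts_dir, chunk_size=1 << 20):
    """Return one MD5 hex digest over every .IFO file in a VIDEO_TS directory.

    Assumption: hashing the IFO files in sorted name order is enough to
    identify a disc. This may not match the libdvdread-based hash.
    """
    digest = hashlib.md5()
    ifo_files = sorted(
        p for p in Path(video_ts_dir).iterdir()
        if p.is_file() and p.suffix.upper() == ".IFO"
    )
    if not ifo_files:
        raise ValueError("no .IFO files found in %s" % video_ts_dir)
    for path in ifo_files:
        with path.open("rb") as handle:
            # Read in chunks so memory use stays flat for large files.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Something like dvd_md5("/media/dvd/VIDEO_TS") (a hypothetical mount
point) would then give a hex string usable as a lookup key, with no
compiled helper needed on the client side.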

So of the bits of what I have that might be useful to the "greater
good", I feel that the hash/dvd-lookup part is probably of most use. The
good news is, once I sort out the hashing, I could have this code (in
Python) isolated and in a stand-alone module or class in about a day.
Since there is no standard plugin system, this would seem to be the way
to get the best results the fastest. That said, there are two aspects of
this that I would like to change, and I would like some off-the-cuff
feedback on them. If this is going to be used by more than a handful of
people, it makes sense to change this stuff sooner rather than later.
This is based on the premise that while it is useful as-is, the more
folks ripping and contributing to this, the more useful and powerful it
would become overall. To that end:
1. Right now the back-end server is just a few lines of Python providing
XML-RPC services as well as marshalling/unmarshalling the data. It also
provides an interface to SQLite as a database (I got used to using
that when I worked at Sony on the PlayStation 3). I doubt the server
could withstand more than a few concurrent users as written. Yes, the
packets are small, the load is light and I am running it on its own
box (a headless tower bought for this purpose). However, if I wanted to
get more serious about this, I should probably rewrite the back-end to
be more scalable, more robust and more secure (it never crashes or
anything; by robust I mean some script-kiddie could probably get into it
and overwrite stuff). Right now, I control it via access keys that I
issue to people I trust; most "users" who just want to do lookups don't
need anything for read-only access. This could use some work.
2. Also, right now basically the front end turns Python objects into
pipe-delimited strings and XML-RPC pushes it to the back-end where a
counterpart turns the strings back into Python objects for interacting
with the database. However, as a recent side-project I started turning
the lookup stuff into a C/C++ plugin and realized how Python-specific
this stuff is. That may not be a problem, but I did want to make
interested folks aware of it.
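
To make the data flow in the two items above concrete, here is a small
sketch of a pipe-delimited round trip feeding a SQLite-backed store. The
field names (disc_hash, title, kind), the backslash escaping scheme and
the table layout are all hypothetical; the real schema and wire format
are not shown in this message, and the XML-RPC transport is left out.

```python
import sqlite3

FIELDS = ("disc_hash", "title", "kind")  # hypothetical record layout


def encode(record):
    """Turn a dict into one pipe-delimited string, escaping backslash and pipe."""
    parts = []
    for name in FIELDS:
        value = str(record[name])
        parts.append(value.replace("\\", "\\\\").replace("|", "\\|"))
    return "|".join(parts)


def decode(line):
    """Inverse of encode(): split on unescaped pipes and rebuild the dict."""
    values, current, escaped = [], [], False
    for char in line:
        if escaped:
            current.append(char)
            escaped = False
        elif char == "\\":
            escaped = True
        elif char == "|":
            values.append("".join(current))
            current = []
        else:
            current.append(char)
    values.append("".join(current))
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def store(conn, line):
    """Back-end side: decode a string and save it with a parameterized query."""
    record = decode(line)
    conn.execute(
        "INSERT OR REPLACE INTO discs (disc_hash, title, kind) VALUES (?, ?, ?)",
        (record["disc_hash"], record["title"], record["kind"]),
    )
    conn.commit()


def lookup(conn, disc_hash):
    """Read-only lookup; returns the encoded record, or None if unknown."""
    row = conn.execute(
        "SELECT disc_hash, title, kind FROM discs WHERE disc_hash = ?",
        (disc_hash,),
    ).fetchone()
    return encode(dict(zip(FIELDS, row))) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discs (disc_hash TEXT PRIMARY KEY, title TEXT, kind TEXT)")
```

Plain strings like these are easy to parse from C/C++ as well, but every
non-Python client then needs its own decode(). XML-RPC also defines a
struct type, which might be worth a look if more languages get involved.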

There is also a part to this that I have/had intentions on expanding but
frankly this bit is weak at best. Essentially it uses track-lengths and
some other bits of information to try to sort out what kind of disc a
DVD is (movie with features, TV show episodes, etc.). This is purely to
help when readying a DVD for the first time: if it's not in the database
(currently 2000+ titles), it tries to guess what it is and makes certain
assumptions about how best to rip. Honestly folks, this is a piece that
could land on the floor, but in the name of openness I wanted it out
there. I really think the hashing/lookup stuff would be of most use to
the projects. Assuming the hashing stuff gets sorted, either project (or
any, for that matter) could just create a Python object, passing in the
DVD path (/dev/dvd for example), and if there is an Internet connection,
read an object representing the DVD itself, a list of tracks with names,
track types (animation, etc.) and so on.
There are other tables of associated data that might be useful, I don't
know. For me/us, the disc names and track-names have been most useful.
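
For anyone curious what guessing a disc type from track lengths might
look like, here is a rough sketch. The function name, the category
labels and every threshold in it are illustrative assumptions, not the
values or logic the actual tool uses.

```python
def guess_disc_kind(track_minutes, min_track=5.0):
    """Guess a disc type from track lengths given in minutes.

    All thresholds here are illustrative assumptions, not the values the
    actual tool uses.
    """
    # Ignore very short tracks (menus, logos, warnings).
    tracks = sorted((t for t in track_minutes if t >= min_track), reverse=True)
    if not tracks:
        return "unknown"
    longest = tracks[0]
    # Tracks within 15% of the longest one count as "similar length".
    similar = [t for t in tracks if t >= 0.85 * longest]
    if len(similar) >= 3 and longest <= 70:
        # Several tracks of about the same, fairly short length.
        return "tv_episodes"
    if longest >= 60:
        # One long track, possibly with shorter extras alongside it.
        return "movie_with_features" if len(tracks) > 1 else "movie"
    return "unknown"
```

A guess like this would only matter for discs that are not yet in the
database; once a disc has been looked up by hash, the stored record
would take precedence.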

Oh yes, something else I should mention: since this project was an
internal "what-if" kind of thing, and I was/am still using Freevo for my
set-top box, the track-naming convention may seem odd to some...if you
have used Freevo you know that what I have listed above automatically
has '_' converted to spaces, the strings are automatically coaster-cased
and the episode-numbering method keeps all episodes in the right order.
Moreover, the extensions are suppressed on the screen and, most
importantly, Freevo turns our directory structure into a menu automatically. Now this
may not be right for everyone and I am way way open to
adjusting/extending/whatever any portion of this so it is more useful to
everyone. Please don't be shy about asking for changes to the way it
does stuff.
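
As a rough illustration of the display behaviour described above, the
sketch below turns a file name into an on-screen title. It assumes the
rules are: drop the extension, turn '_' into spaces, and capitalize each
word. The function name and the example path are hypothetical, and
Freevo's real rules may differ.

```python
import os
import string


def display_name(filename):
    """Approximate the on-screen title for a file name.

    Assumption: drop the extension, turn '_' into spaces, and capitalize
    each word. Freevo's real rules may differ.
    """
    stem, _extension = os.path.splitext(os.path.basename(filename))
    return string.capwords(stem.replace("_", " "))


print(display_name("/media/tv/some_show/01_the_pilot_episode.avi"))
# prints "01 The Pilot Episode"
```

Zero-padded episode numbers at the front of the name are what keep a
plain alphabetical sort in episode order.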

Folks I am really excited and happy about contributing what I can. I
have been doing the code thing for a number of years now and would love
to do it for someone other than "the man" some more. I publish what I
can as GPL or just give it away but would love and welcome the
opportunity to be able to contribute in a more tangible and useful way.

That said, I am in a death march right now, so long, extended periods of
work are a little hard to come by. But the death march won't last, and I
am very motivated to add to one or both of these projects and, in turn,
make my code more useful to me as well.

