Gee [Long] Re: Functional iterators - call for consensus



On Wed, 2011-06-29 at 18:41 +0200, Quikee wrote:
> On Wed, Jun 29, 2011 at 15:04, Maciej Piechotka <uzytkownik2 gmail com> wrote:
> > Well - possibly due to different background I like virtual method in
> > interfaces (i.e. like classes in Haskell). They allow to change
> > interface w/out breaking API (breaking ABI however).
> I acknowledge it is the difference in the background but still for the
> sake of consistency you either get rid of AbstractCollection and co.
> or change the interface. Still I prefer to keep interfaces clean of
> implementation - this helps to separate how from what (How it works is
> something a user of the interface is not interested in and does not
> need to know, what does it do is the only thing he is interested in.)
> and if I use an interface it does not assume anything (no default
> implementation).
> 

Just to clarify - I use virtual function ONLY when there is only one
correct (according to spec/intent) behaviour of given function and the
overloading is due to performance rather then anything else. For example

BAD:

public interface Iterator<G> {
    // Please overload when iterator implements remove();
    public virtual void remove() {
        assert_not_reached ();
    }
    // ...
}

GOOD:

public interface Iterator<G> {
    public virtual void foreach(ForeachFunc<G> f) {
        // Only legal instances are equivalent
        if (valid)
            f(get ());
        while (next ())
            f(get ());
    }
}

Implementing foreach do have high impact on the speed (20x-26x speedup)
but it can only be this function - with inlined get().

In this sense the Iterator.foreach is what written in different way then
plain english and ListIterator.foreach is how.

> > The stream function was written for performance reason. The Vala does
> > not optimize the virtual calls at all so they perform slowly as Java or
> > C# would do. Vala have no static or dynamic type analysis and therefore
> > each call to interface goes through 1-2 jumps by function pointers.
> >
> > For exact numbers see:
> >
> > https://bugzilla.gnome.org/show_bug.cgi?id=645850#c4
> >
> > It is written mostly for implemantators as it:
> >
> >  - Allows to reuse of code across functions (like map etc.) therefore
> > making life of people implementing new combinators easier (i.e. Jurg,
> > Dider, mine, ...). I agree that it is small benefit and therefore it was
> > not in original spec.
> >  - As a side effect it allows to implement all functional methods to be
> > implemented efficiently by just reimplementing forall and stream. As it
> > generally have huge impact (virtual calls are heavy in Vala, virtual
> > calls to interfaces via abstract classes are even more).
> >
> > As shown in the data I've linked it have the  some constant overhead
> > (which quickly outperform other methods except .get for array).
> >
> > In short - it is implementation detail but important one from
> > performance POV (yes, I was surprised by benefits of forall).
> 
> If it is a performance regression in Vala or in GObject it should be
> fixed there not worked around in libgee.

It is not so much as regression in Vala as the improvement in libgee.
I.e. by implementing those function in special way we achieve a
speed-up.

> Also I have nothing against stream besides that it should be part of
> implementation - hidden away from the user as he does not need to know
> it exists. There is a reason that libgee exists.. it is because GLib's
> collections are horrible. They expose implementation details which
> means you can not mix them easily by just changing the type like for
> example you can do with libgee (change from LinkedList to ArrayList
> and semantic doesn't change). With stream that is a implementation
> detail you do the same. For example map can be implemented in various
> ways - even non-lazy and parallel evaluated but when you assume stream
> this is not possible or let's say it is not elegant.
> 

(Transforming from lazy value to strict is relativly easy).

I have chosen specific design. Parallel or non-lazy map are not the same
map as you have to take into account the side-effects. The behaviour of
map method should not depend on the type of collection. For example:

int x = 0;
var i = c.iterator ().map<int> ((e) => {return x++;});
while (i.next()) {
    stdout.printf ("%d\t", x);
    stdout.printf ("%d\t", i.get());
    stdout.printf ("%d\n", x);
}

Should not depend on type of c (yes I agree that this is ugly
implementation). The correct way would be more of:

int x = 0;
var i = c.iterator ().par_map<int> ((e) => {return x++;});
while (i.next()) {
    stdout.printf ("%d\t", x);
    stdout.printf ("%d\t", i.get());
    stdout.printf ("%d\n", x);
}

> > Hmm. +1?
> >
> > What about something like Stream interface from which both Collection
> > and Iterator inherit?
> >
> > Edit: it seems you call it Functional. I'm not particulary happy with
> > either of names.
> I am not sure about the name either. Interfaces define behaviour of a
> particular Class - I thought in the line that a certain Class is
> functional or has functional behaviour if it implements the Functional
> interface. The same as a Class is iterable if it implements Iterable
> interface. Still functional behaviour sound very awkward and
> misleading.
> 
> Stream is also a bad name for the interface - it's just one kind of
> implementation for functional operators.
> 

For some of them there is natural way (say fold).

> > Ok. There was take and skip operators in spec which get lost.
> >
> > it would be something like:
> >
> > iterator().skip(3).map(func)
> 
> This was not the point. Point was what kind of behaviour can I expect
> when I do iterator().next().next().next().map(func). Does map work
> through the whole collection or only the rest of collection?

Oh. Sorry - it works from this point on. Like in python.

Similary:

def my_test(x):
    if x < 3:
         print x
         return True
    else:
         return False

for x in itertools.dropwhile(my_test, itertools.count(0)):
	print "##", x

"""output:

0
1
2
## 3
## 4
## 5
..."""

> Because
> there is no reset it obviously can not go from the beginning. Still I
> don't like that I (or every other user) have to even ask such
> questions - I want that the behaviour is obvious on the interface
> level.
> 
> I like skip(..) but even nicer would be limit(3,10) or maybe
> slice(3,10) also things like slice(3, -2) would be nice. There is
> slice already but that slice copies the collection which is not really
> cool.
> 

+1. I assume slice(3, 10) returns [iter[3], iter[4], ..., iter[12]]

But what does slice(3, -2) do? Return [iter[3], iter[2], iter[1]]?

> > They produces elements as they go. So say there is iterator returning:
> >
> > iter = [1, 2, 3, 4, ....]
> >
> > iter.map((x) => {return x + 1;}) returns
> >
> > [2, 3, 4, 5, ...]
> >
> > iter.scan((x, y) => {return x + y}, 0) returns
> >
> > [0, 1, 3, 6, 10, ...]
> 
> ... and fold and foreach? Because the method is implemented on the
> interface the user doesn't even know that it has a problem. In case an
> interface only prescribes that you need to implement an method the
> user will go "hmmm..." immediately in such situation. :)
> 

Semantically:

foreach (var x in c) {
    f(x);
}

var z = i;
foreach (var x in c) {
    z = g(x, z);
}

and

c.iterator().foreach(f);
c.iterator().fold(g, i);

are equivalent. I think any sane implementation of those function would
do so.

> > To be honest I don't like the separation of FunctionalIterator and
> > Iterator. Every iterator is, conceptually, functional iterator. There is
> > no iterator for which you cannot define meaningfully map function.
> 
> Arguably you can kill a man with almost every knife but still this
> does not mean that a kitchen knife should be designated as a weapon.
> Also here the iterator purpose is to iterate elements, a special
> iterator can do other things besides this but it is not only an
> iterator anymore - it is something more than a iterator. In this case
> it is a functional iterator - iterator that besides all also supports
> functional operations.
> 

See discussion at the end.

> > Sorry for double-posting. Also - your implementation uses foreach by
> > default which is a bad idea from performance POV as mentioned before.
> > (Namely 20x performance degradation for 1023 element array).
> 

Ups. Sorry - my error - it is degradation from POV of vala 0.10 not
0.12. Currently (i.e. in 0.12) using foreach function is as fast as
foreach construct for array lists and 26x faster for linked lists
(regression from 11x faster in vala 0.10).

> I did not know foreach has performance regressions.. still as I said -
> there is no reason foreach should have regressions. If it has it is a
> bug elsewhere.
> 

Please see the bug. The problem is that it started to be using get()
instead of iterator for List which makes it significantly slower on
anything but ArrayList. (In some cases it makes it incorrect).

> And.. another thing I remembered that is currently missing and would
> be nice to have is an ability to create a new collection from the
> iterator. With all the power that can be done now with functional
> operations it would be nice to have a add_all_from_iterator (not a
> good name) method on collection interface or change the signature of
> add_all to
> 

Part of it is missing but I assume it is Iterator or something similar.

> >No. It would be... well stupid if I sent RFC/call for consensus
> >expecting only positive feedback. Well - ok, I do hope everything is
> >fine with my original design ;) but from my experience negative, but
> >constructive, feedback is the most valuable one.
> It is sometimes hard to give constructive feedback.

I know and I appreciate that you do. Sorry if I'm a little harsh but I
fight all day with building one library which tried to find boost
in /include ;)

> Sometimes I know
> that something is not good and we might run against the wall in the
> future (or not) but it is hard to explain and persuade that I might be
> right. As in this case I have a bad feeling about stream on interface
> as I would always design the interface correct, extensible, friendly
> first and never for performance in mind. I just hope we find something
> that will be acceptable by both.

I think we differ in a few points.

I see that if X behaves like Y then X should be Y. I.e. if integers
behave like a field they are field. If matrix behave as a monoid they
are monoid. If parser behaves as monad it is monad (sorry for using
mathematical terms[1]).

The main question is if it is possible to implement function f that
follows certain axioms. To give an example with monoid - is there such
operation that have an identity.

All of those functions - map, stream, forall etc. - can be implemented
in terms of the 0.6 Iterator interface. They could be a static function
if not 2 reasons:

 - Performance. The benefit of using specialized forall is enormous (20x
for linked list).
 - Clarity. This is of course arguable but:
      Iterator.fold(Iterator.map(c.iterator(), f), g, x)
   is less clear (for people accustomed to LTR flow instead of RTL) then
      c.iterator().map(f).fold(g, x)

Please also note that it is not the only such way of solving the
problem. C# do it in very similar way - both putting emphasis on
iterationn and extending it functionality[2][3]:

% cat mycollection.cs
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class Test : IEnumerable<int> {
  public IEnumerator<int> GetEnumerator() {
    return new TestEnumerator();
  }

  IEnumerator IEnumerable.GetEnumerator() {
    return new TestEnumerator();
  }

  public static void Main() {
    Test t = new Test();
    System.Console.WriteLine(t.Sum(x => x));
  }
}

public class TestEnumerator : IEnumerator<int> {
  public int Current {
    get {
      return current;
    }
  }

  object IEnumerator.Current {
    get {
      return current;
    }
  }

  public void Dispose() {
    return;
  }

  public bool MoveNext() {
    if (current >= 4)
      return false;
    current++;
    return true;
  }
  
  public void Reset() {
    current = 0;
  }
  
  private int current = 0;
}
% gmcs mycollection.cs
% mono mycollection.exe 
10

In similar way I think stream would not cause any problems - it can
always be implemented. The definition in iterator.vala is prove as it
uses only the core methods/properties (i.e. valid, get and next).

Correct me if I'm wrong but if I understand correctly you prefer the
model enforced by Java while I prefer much more liberal approach where I
split the methods into 3 groups:

 - Core methods (next/get/...) - in the interface
 - Sane defaults (say read_only_view implementation) - in abstract
classes
 - Methods which semantic can be based on core methods (forall/...) - in
the interface as virtual (sometimes with strong suggestion for
overloading).

> Regards, Tomaž

Regards

[1] For example monoid is an triple (M, ⊕, e) where:
 - ⊕ : M⨉M → M
 - e ∈ M
 - ∀a. a ⊕ e = e ⊕ a = a
Integers are monoid under addition - (ℤ, +, 0) . Similarly lists under
concatenations (L<T>, |, []):

[2] http://msdn.microsoft.com/en-us/library/9eekhta0.aspx
[3] http://msdn.microsoft.com/en-us/library/system.linq.enumerable.aspx

Attachment: signature.asc
Description: This is a digitally signed message part



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]