Re: Folder Comparison with Percentage Similarity?



Alan,

Tools already exist that meet your need more directly.  Any Unix-like system has command-line tools that can do most of this analysis.  I'd start with "diff -b -B -w", which ignores whitespace differences and blank lines, but you can also use "comm".  The comm tool requires its input files to be sorted, though, so line order is lost, and you might want to filter out empty lines and boilerplate lines such as </head> before comparing.
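
If a scripted score is more convenient than reading diff output file by file, something along these lines could be a starting point. It is only a sketch in Python using the standard difflib module. The function names (read_lines, file_similarity, folder_report), the whitespace normalization, and the choice to report an unweighted average of per-file ratios are all assumptions made for illustration, not something diff, comm, or Meld provides. It only pairs files that have the same relative path in both trees, so renamed or moved files are not detected.

```python
import difflib
from pathlib import Path


def read_lines(path):
    """Return non-blank lines with whitespace runs collapsed.

    Assumption: this only roughly approximates "diff -b -B".
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [" ".join(ln.split()) for ln in text.splitlines() if ln.strip()]


def file_similarity(a, b):
    """Line-based similarity ratio in [0, 1] between two files."""
    matcher = difflib.SequenceMatcher(
        None, read_lines(a), read_lines(b), autojunk=False
    )
    return matcher.ratio()


def folder_report(original, suspect, top=50):
    """Score files present at the same relative path in both trees.

    Returns (average_ratio, top_matches), where top_matches is a list
    of (ratio, relative_path) sorted from most to least similar.
    Files that exist in only one tree are skipped (an assumption).
    """
    original, suspect = Path(original), Path(suspect)
    scores = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        other = suspect / rel
        if other.is_file():
            scores.append((file_similarity(src, other), rel.as_posix()))
    scores.sort(reverse=True)
    total = sum(s for s, _ in scores) / len(scores) if scores else 0.0
    return total, scores[:top]
```

Calling folder_report("ours", "his") (both directory names are hypothetical) would return the average ratio and the 50 most similar file pairs. Multiplying a ratio by 100 gives a percentage. An average weighted by file size might be a fairer overall score, but that is a judgment call.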

There are some plagiarism-detector tools that may also help, but I don't have any experience with those.

Feel free to contact me off-list if you need more specific guidance.
Phil


On Wed, Sep 27, 2017 at 2:49 PM Alan Halls <alanjhalls gmail com> wrote:
I am involved in a legal matter regarding an employee's theft of trade secrets. In particular, he stole the source code for a website that he and 2 other programmers worked on for 2 years.

I now have a copy of his project, and of course a copy of mine. I found the software Meld, which seems to do a great job on a file-by-file basis, but it would be very time-consuming to arrive at any "score" of how much of our original code is still in his existing project.

He was sloppy, and his launched public website still has our company info in the 404 page, which links you to the about us, pricing, docs, and contact us pages, all of which still have the original code in them. So there is no question about whether he took it, only about how much "custom" work he did for himself.

I was kind of imagining a report with a total score, then the top 50 matches with each of their scores. Has anyone thought of adding that in? It seems that all that info would be available already in the program, just needing a view for it to display on.

_______________________________________________
meld-list mailing list
meld-list gnome org
https://mail.gnome.org/mailman/listinfo/meld-list

