7
submitted 1 year ago* (last edited 1 year ago) by bahmanm@lemmy.ml to c/lisp@lemmy.ml

cross-posted from: https://lemmy.ml/post/4591838

Suppose I need to find out if the intersection of an arbitrary number of lists or sequences is empty.

Instead of the obvious O(n^2^) approach I used a hash table to achieve an O(n) implementation.

Now, loop mini-language aside, is this idiomatic elisp code? Could it be improved w/o adding a lot of complexity?

You can view the same snippet w/ syntax highlighting on pastebin.

(defun seq-intersect-p (seq1 seq2 &rest sequences)
 "Determine if the intersection of SEQ1, SEQ2 and SEQUENCES is non-nil."
 (cl-do* ((sequences `(,seq1 ,seq2 ,@sequences) (cdr sequences))
          (seq (car sequences) (car sequences))
          (elements (make-hash-table :test #'equal))
          (intersect-p nil))
     ((or (seq-empty-p sequences)) intersect-p)
   (cl-do* ((seq seq (cdr seq))
            (element (car seq) (car seq)))
       ((or intersect-p (seq-empty-p seq)) intersect-p)
     (if (ht-get elements element)
         (setf intersect-p t)
       (ht-set elements element t)))))

(defun test-seq-intersect-p ()
 "Test cases."
 (cl-assert (null (seq-intersect-p '()
                                   '())))
 (cl-assert (null (seq-intersect-p '(1)
                                   '())))
 (cl-assert (null (seq-intersect-p '(1 2)
                                   '(3 4)
                                   '(5 6))))
 (cl-assert (seq-intersect-p '(1 2)
                             '(3 4)
                             '(5 6)
                             '(1)))
 t)

(test-seq-intersect-p)

Version 2

(defun seq-intersect-p (first second & sequences)
  "Determine if FIRST, SECOND and any of the sequences in SEQUENCES have
an intersection."
  (if (seq-empty-p sequences)
      (seq-intersection first second)
    (or (seq-intersection first second)
        (apply #'seq-intersect-p
               first
               (seq-first sequences)
               `,@(seq-rest sequences))
        (apply #'seq-intersect-p
               second
               (seq-first sequences)
               `,@(seq-rest sequences))
        (apply #'seq-intersect-p
               (seq-first sequences)
               (seq-elt sequences 2)
               `,@(seq-rest (seq-rest sequences))))))
top 9 comments
sorted by: hot top controversial new old
[-] jwlgowi 2 points 1 year ago* (last edited 1 year ago)

If you don’t care about the intersection values:

init empty hashmap w best guess on size
Iterate sequences
  Iterate elements
    If elt in hashmap return t
  Add elt to hashmap
return nil

If you have maybe million+ elements, a db like sqlite might help. Unique index, insert each item til you get a unique constraint failure.

[-] bahmanm@lemmy.ml 1 points 1 year ago

Yes, that's essentially the snippet in my post πŸ‘

[-] jwlgowi 2 points 1 year ago

Personally, I’d drop all the macro template syntax, applys, intersects, etc. And simplify the function arg to be just: (sequences)

let item-map make-hash-table
  dolist seq sequences
    dolist item seq
      ;; gethash / return / or setf & continue
[-] charje@lemmy.ml 1 points 1 year ago* (last edited 1 year ago)

Instead of storing intersect-p as a variable and keeping it until the end of the loop, you can return early as soon as you find the first intersection.

Even though a hash table has better symtotic run time, you might find after benchmarking that the O(n^2) is faster for your use case. If you are set on using a hash table, you might consider setting the initial size to something a bit larger (relative to the input lists) to avoid having to dynamically grow the hash table.

I think also the return value of the inner loop is never used...

I personally like to keep my tests assertions top level so I can interactively run each one by itself.

[-] bahmanm@lemmy.ml 1 points 1 year ago

tests assertions top level

Noted. Makes sense.

[-] bahmanm@lemmy.ml 1 points 1 year ago

Thanks for the code review and feedback. Here's a 2nd attempt: https://pastebin.com/WBqs9u8L

I essentially threw away my bloated Java/C#'esq implementation and started from scratch. Please let me know what you think πŸ™

[-] charje@lemmy.ml 1 points 1 year ago

This version is definitely a bit harder to follow what is going on.

[-] bahmanm@lemmy.ml 1 points 1 year ago* (last edited 1 year ago)

Oh!? And I was under the impression that the code reads more naturally than the initial version πŸ˜‚


Let me try putting it in words and see if it makes sense to you:

Given sequences seq1 and seq2 and sequence of sequences sequences, seq-intersect-p should return non-nil if at least one pair of the input sequences have got an intersection.

  1. If seq1 and seq2 intersect return t
  2. Recursively check if seq1 intersects w/ any element in sequences. If it does, return t. Otherwise we know seq1 is safe to be ignored - no intersection whatsoever.
  3. Recursively check if seq2 intersects w/ any element in sequences. If they don't, we know seq2 is safe to be ignored too.
  4. Recursively check if any elements of sequences intersect w/ each other.

There's no caching or optimisation in this version. So it's always O(n^2^).

[-] bahmanm@lemmy.ml 1 points 1 year ago

Thanks all for your feedback πŸ™

I'm going to stick to "version 2" which, to my mind, reads more naturally. I'll definitely consider the iterative suggestions for the sake of performance if I ever decide to submit a patch upstream. But for now, what I've got does the job for me dealing w/ sequences w/ less than 50 elements.

this post was submitted on 08 Sep 2023
7 points (100.0% liked)

Lisp Community

687 readers
1 users here now

A community for the Lisp family of programming languages.

Lisp (historically LISP) is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation. Originally specified in 1958, Lisp is the second-oldest high-level programming language. Only Fortran is older, by one year.

History

Associations and meetings

Resources - TODO

Related communities (dialects) - TODO

founded 5 years ago
MODERATORS