Friday, 08 April 2022

Jaccard coefficient exercises

1. Given A = {1,2,3,4}, B = {1,2,4}, and C = {1,2,4,5}, compute Jaccard(A,B), Jaccard(B,C), and Jaccard(A,C).

2. Next, a query-and-document case. Suppose we have:

query: ideas of march

doc 1: caesar died in march

doc 2: the long march

Find the Jaccard coefficient between the query and doc 1, and between the query and doc 2.

3. Given three documents:

d1: "Jack London traveled to Oakland"

d2: "Jack London traveled to the city of Oakland"

d3: "Jack traveled from Oakland to London"

Compute the Jaccard coefficients J(d1,d2) and J(d1,d3) using n-gram analysis with n = 2 (bigrams).


Answers:

1. Jaccard(A,B)

|A| = 4

|B| = 3

|A ∩ B| = 3

|A ∪ B| = |A| + |B| - |A ∩ B| = 4 + 3 - 3 = 4

Jaccard(A,B) = |A ∩ B| / |A ∪ B| = 3/4 = 0.75

Jaccard(B,C)

|B| = 3

|C| = 4

|B ∩ C| = 3

|B ∪ C| = |B| + |C| - |B ∩ C| = 3 + 4 - 3 = 4

Jaccard(B,C) = |B ∩ C| / |B ∪ C| = 3/4 = 0.75

Jaccard(A,C)

|A| = 4

|C| = 4

|A ∩ C| = 3

|A ∪ C| = |A| + |C| - |A ∩ C| = 4 + 4 - 3 = 5

Jaccard(A,C) = |A ∩ C| / |A ∪ C| = 3/5 = 0.6
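
The three results for exercise 1 can be checked with a few lines of Python. This is only a minimal sketch: the helper name `jaccard` is an illustrative choice, and returning 0.0 for two empty sets is an assumed convention, not something the exercise defines.

```python
def jaccard(x, y):
    """Jaccard coefficient |x ∩ y| / |x ∪ y| of two collections, treated as sets."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(x & y) / len(union)


A = {1, 2, 3, 4}
B = {1, 2, 4}
C = {1, 2, 4, 5}

print(jaccard(A, B))  # 0.75
print(jaccard(B, C))  # 0.75
print(jaccard(A, C))  # 0.6
```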


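2. Jaccard(Q, DOC1), treating each text as a set of words

|Q| = 3 (ideas, of, march)

|DOC1| = 4 (caesar, died, in, march)

|Q ∩ DOC1| = 1 (march)

|Q ∪ DOC1| = 3 + 4 - 1 = 6

Jaccard(Q, DOC1) = |Q ∩ DOC1| / |Q ∪ DOC1| = 1/6 ≈ 0.17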

Jaccard(Q, DOC2)

|Q| = 3 (ideas, of, march)

|DOC2| = 3 (the, long, march)

|Q ∩ DOC2| = 1 (march)

|Q ∪ DOC2| = 3 + 3 - 1 = 5

Jaccard(Q, DOC2) = |Q ∩ DOC2| / |Q ∪ DOC2| = 1/5 = 0.2
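
For the query-and-document case, the same calculation can be sketched in Python by turning each text into a set of words. The helper names `tokens` and `jaccard` are illustrative, and splitting on whitespace after lowercasing is an assumed tokenization that happens to match the counts used in the answer.

```python
def jaccard(x, y):
    """Jaccard coefficient |x ∩ y| / |x ∪ y| of two sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def tokens(text):
    """Split a text into a set of lowercase words (assumed tokenization)."""
    return set(text.lower().split())


query = "ideas of march"
doc1 = "caesar died in march"
doc2 = "the long march"

print(round(jaccard(tokens(query), tokens(doc1)), 2))  # 0.17
print(round(jaccard(tokens(query), tokens(doc2)), 2))  # 0.2
```

Under this measure doc 2 scores slightly higher than doc 1, only because it is shorter: both share the single word "march" with the query.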


3. Jaccard(D1,D2), treating each document as a set of bigrams

|D1| = 4 (Jack London, London traveled, traveled to, to Oakland)

|D2| = 7 (Jack London, London traveled, traveled to, to the, the city, city of, of Oakland)

|D1 ∩ D2| = 3 (Jack London, London traveled, traveled to)

|D1 ∪ D2| = 4 + 7 - 3 = 8

Jaccard(D1,D2) = |D1 ∩ D2| / |D1 ∪ D2| = 3/8 = 0.375

Jaccard(D1,D3)

|D1| = 4 (Jack London, London traveled, traveled to, to Oakland)

|D3| = 5 (Jack traveled, traveled from, from Oakland, Oakland to, to London)

|D1 ∩ D3| = 0

|D1 ∪ D3| = 4 + 5 - 0 = 9

Jaccard(D1,D3) = |D1 ∩ D3| / |D1 ∪ D3| = 0/9 = 0
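
The bigram version can be sketched the same way in Python, with each document reduced to its set of adjacent word pairs. The helper names `bigrams` and `jaccard` are illustrative, and splitting on whitespace without changing case is an assumption that reproduces the bigram lists above.

```python
def jaccard(x, y):
    """Jaccard coefficient |x ∩ y| / |x ∪ y| of two sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def bigrams(text):
    """Return the set of adjacent word pairs in a text (whitespace split assumed)."""
    words = text.split()
    return set(zip(words, words[1:]))


d1 = "Jack London traveled to Oakland"
d2 = "Jack London traveled to the city of Oakland"
d3 = "Jack traveled from Oakland to London"

print(jaccard(bigrams(d1), bigrams(d2)))  # 0.375
print(jaccard(bigrams(d1), bigrams(d3)))  # 0.0
```

Although d1 and d3 share four of their words, they have no bigram in common, so word order alone drives their score to 0.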
