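If the values of two or more variables show some regularity, that regularity is called an association.
Association rules look for correlations between different items that appear in the same event, for example between the different products bought in a single purchase.
"Of the customers who buy a computer, 30% also buy a printer."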
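If the support and confidence of an association rule X -> Y are both greater than or equal to the user-specified minimum support (minsupport) and minimum confidence (minconfidence), X -> Y is called a strong association rule; otherwise it is called a weak association rule.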
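To make the definition concrete, here is a minimal pure-Python sketch of support, confidence and the strong-rule test. The function names, the toy transactions and the thresholds are illustrative assumptions, not part of any library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate P(Y | X) as support(X union Y) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


def is_strong_rule(transactions, x, y, min_support, min_confidence):
    """True when X -> Y meets both user-specified thresholds."""
    return (support(transactions, set(x) | set(y)) >= min_support
            and confidence(transactions, x, y) >= min_confidence)


# Hypothetical toy data: four purchases.
transactions = [
    {"computer", "printer"},
    {"computer"},
    {"computer", "printer", "mouse"},
    {"mouse"},
]

print(support(transactions, {"computer", "printer"}))       # 0.5
print(confidence(transactions, {"computer"}, {"printer"}))  # about 0.667
print(is_strong_rule(transactions, {"computer"}, {"printer"}, 0.5, 0.7))  # False
```

With these toy numbers the rule passes the support threshold but fails a 0.7 confidence threshold, so it counts as weak; lowering the confidence threshold to 0.6 would make it strong.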
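So lift makes up for this shortcoming of confidence. If lift = 1, X and Y are independent, and X does nothing to raise the likelihood of Y. The larger the value (lift > 1), the more X raises the likelihood of Y, and the stronger the association.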
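A small sketch of how lift can expose a rule that confidence alone makes look good. The `lift` helper and the percentages are made up for illustration.

```python
def lift(support_x, support_y, support_xy):
    """lift(X -> Y) = confidence(X -> Y) / support(Y)."""
    confidence = support_xy / support_x
    return confidence / support_y


# Hypothetical numbers: Y is in 90% of all transactions, X in 50%, both in 40%.
conf = 0.4 / 0.5
print(conf)                 # 0.8, which looks like a good rule
print(lift(0.5, 0.9, 0.4))  # below 1: knowing X makes Y slightly less likely
```

Here confidence is high only because Y is popular on its own; dividing by support(Y) removes that effect.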
### Leverage and Conviction play a role similar to lift: the larger the value, the stronger the association
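For reference, the usual definitions can be sketched in a few lines. Leverage is 0 and conviction is 1 when X and Y are independent, so "larger" here means larger than those baselines. The helper names are illustrative; the inputs are the supports of Potato and Onion from the six-basket example used later in this article.

```python
import math


def leverage(support_x, support_y, support_xy):
    """support(X union Y) - support(X) * support(Y); 0 under independence."""
    return support_xy - support_x * support_y


def conviction(support_x, support_y, support_xy):
    """(1 - support(Y)) / (1 - confidence(X -> Y)); 1 under independence."""
    confidence = support_xy / support_x
    if confidence >= 1.0:
        return math.inf  # the rule never fails in the data
    return (1 - support_y) / (1 - confidence)


# Potato -> Onion: support(Potato) = 5/6, support(Onion) = 4/6,
# support(Potato and Onion) = 4/6.
print(round(leverage(5/6, 4/6, 4/6), 6))    # 0.111111
print(round(conviction(5/6, 4/6, 4/6), 6))  # 1.666667
# Onion -> Potato has confidence 1, so conviction is infinite.
print(conviction(4/6, 5/6, 4/6))            # inf
```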
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

Build a small custom shopping dataset:

data = {'ID': [1, 2, 3, 4, 5, 6],
        'Onion': [1, 0, 0, 1, 1, 1],
        'Potato': [1, 1, 0, 1, 1, 1],
        'Burger': [1, 1, 0, 0, 1, 1],
        'Milk': [0, 1, 1, 1, 0, 1],
        'Beer': [0, 0, 1, 0, 1, 0]}
df = pd.DataFrame(data)
df = df[['ID', 'Onion', 'Potato', 'Burger', 'Milk', 'Beer']]
df

| | ID | Onion | Potato | Burger | Milk | Beer |
---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 1 | 0 | 0 |
1 | 2 | 0 | 1 | 1 | 1 | 0 |
2 | 3 | 0 | 0 | 0 | 1 | 1 |
3 | 4 | 1 | 1 | 0 | 1 | 0 |
4 | 5 | 1 | 1 | 1 | 0 | 1 |
5 | 6 | 1 | 1 | 1 | 1 | 0 |
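Choose a minimum support of 50%. The call has the form apriori(df, min_support=0.5, use_colnames=True); pass only the 0/1 item columns, because ID is an identifier rather than an item:

frequent_itemsets = apriori(df[['Onion', 'Potato', 'Burger', 'Milk', 'Beer']], min_support=0.50, use_colnames=True)
frequent_itemsets

| | support | itemsets |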
---|---|---|
0 | 0.666667 | (Onion) |
1 | 0.833333 | (Potato) |
2 | 0.666667 | (Burger) |
3 | 0.666667 | (Milk) |
4 | 0.666667 | (Potato, Onion) |
5 | 0.500000 | (Burger, Onion) |
6 | 0.666667 | (Burger, Potato) |
7 | 0.500000 | (Milk, Potato) |
8 | 0.500000 | (Burger, Potato, Onion) |
The returned itemsets, of sizes 1, 2 and 3, all have support >= 50%.
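The table can be cross-checked without mlxtend. The sketch below simply enumerates every item combination and keeps those whose support reaches the threshold; it is a brute-force check for this tiny dataset, not the candidate-pruning procedure that Apriori itself uses.

```python
from itertools import combinations

# Same six baskets as above, one column per item.
data = {
    'Onion':  [1, 0, 0, 1, 1, 1],
    'Potato': [1, 1, 0, 1, 1, 1],
    'Burger': [1, 1, 0, 0, 1, 1],
    'Milk':   [0, 1, 1, 1, 0, 1],
    'Beer':   [0, 0, 1, 0, 1, 0],
}
n_rows = 6
min_support = 0.5

frequent = {}
for size in range(1, len(data) + 1):
    for itemset in combinations(data, size):
        # A row supports the itemset when every one of its columns is 1.
        hits = sum(all(data[item][row] for item in itemset) for row in range(n_rows))
        if hits / n_rows >= min_support:
            frequent[frozenset(itemset)] = hits / n_rows

for itemset, supp in frequent.items():
    print(sorted(itemset), round(supp, 6))
```

It finds the same nine itemsets. Beer is missing because it appears in only 2 of the 6 baskets.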
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
---|---|---|---|---|---|---|---|---|---|
0 | (Potato) | (Onion) | 0.833333 | 0.666667 | 0.666667 | 0.80 | 1.200 | 0.111111 | 1.666667 |
1 | (Onion) | (Potato) | 0.666667 | 0.833333 | 0.666667 | 1.00 | 1.200 | 0.111111 | inf |
2 | (Burger) | (Onion) | 0.666667 | 0.666667 | 0.500000 | 0.75 | 1.125 | 0.055556 | 1.333333 |
3 | (Onion) | (Burger) | 0.666667 | 0.666667 | 0.500000 | 0.75 | 1.125 | 0.055556 | 1.333333 |
4 | (Burger) | (Potato) | 0.666667 | 0.833333 | 0.666667 | 1.00 | 1.200 | 0.111111 | inf |
5 | (Potato) | (Burger) | 0.833333 | 0.666667 | 0.666667 | 0.80 | 1.200 | 0.111111 | 1.666667 |
6 | (Burger, Potato) | (Onion) | 0.666667 | 0.666667 | 0.500000 | 0.75 | 1.125 | 0.055556 | 1.333333 |
7 | (Burger, Onion) | (Potato) | 0.500000 | 0.833333 | 0.500000 | 1.00 | 1.200 | 0.083333 | inf |
8 | (Potato, Onion) | (Burger) | 0.666667 | 0.666667 | 0.500000 | 0.75 | 1.125 | 0.055556 | 1.333333 |
9 | (Burger) | (Potato, Onion) | 0.666667 | 0.666667 | 0.500000 | 0.75 | 1.125 | 0.055556 | 1.333333 |
10 | (Potato) | (Burger, Onion) | 0.833333 | 0.500000 | 0.500000 | 0.60 | 1.200 | 0.083333 | 1.250000 |
11 | (Onion) | (Burger, Potato) | 0.666667 | 0.666667 | 0.500000 | 0.75 | 1.125 | 0.055556 | 1.333333 |
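The result lists the value of every metric for each rule. You can sort by whichever metric interests you, but interpreting the rules still depends on what the data actually means.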
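As a rough illustration of where the twelve rules come from, the sketch below splits each frequent itemset into every antecedent/consequent pair, keeps the pairs with lift >= 1, and sorts them. It is a pure-Python approximation that starts from the supports in the table above; it is not mlxtend's implementation, and sorting by confidence and then lift is just one possible choice.

```python
from fractions import Fraction
from itertools import combinations

# Supports of the nine frequent itemsets listed above, in sixths.
S = {frozenset(k): Fraction(v, 6) for k, v in [
    (('Onion',), 4), (('Potato',), 5), (('Burger',), 4), (('Milk',), 4),
    (('Potato', 'Onion'), 4), (('Burger', 'Onion'), 3),
    (('Burger', 'Potato'), 4), (('Milk', 'Potato'), 3),
    (('Burger', 'Potato', 'Onion'), 3),
]}

rules = []
for itemset, supp in S.items():
    # Every non-empty proper subset can serve as the antecedent.
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), r)):
            consequent = itemset - antecedent
            confidence = supp / S[antecedent]
            lift = confidence / S[consequent]
            if lift >= 1:  # mirrors metric='lift', min_threshold=1
                rules.append((sorted(antecedent), sorted(consequent),
                              float(confidence), float(lift)))

# Rank by confidence first, then by lift.
rules.sort(key=lambda r: (r[2], r[3]), reverse=True)
for rule in rules[:3]:
    print(rule)
```

The two Milk/Potato rules are dropped because their lift is 0.9, which leaves twelve rules; the three with confidence 1.0 all have Potato as the consequent.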
rules[(rules['lift'] > 1.125) & (rules['confidence'] > 0.8)]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
---|---|---|---|---|---|---|---|---|---|
1 | (Onion) | (Potato) | 0.666667 | 0.833333 | 0.666667 | 1.0 | 1.2 | 0.111111 | inf |
4 | (Burger) | (Potato) | 0.666667 | 0.833333 | 0.666667 | 1.0 | 1.2 | 0.111111 | inf |
7 | (Burger, Onion) | (Potato) | 0.500000 | 0.833333 | 0.500000 | 1.0 | 1.2 | 0.083333 | inf |
These few rules are more useful: in this sample, every basket that contains Onion, Burger, or both also contains Potato.

retail_shopping_basket = {'ID': [1, 2, 3, 4, 5, 6],
                          'Basket': [['Beer', 'Diaper', 'Pretzels', 'Chips', 'Aspirin'],
                                     ['Diaper', 'Beer', 'Chips', 'Lotion', 'Juice', 'BabyFood', 'Milk'],
                                     ['Soda', 'Chips', 'Milk'],
                                     ['Soup', 'Beer', 'Diaper', 'Milk', 'IceCream'],
                                     ['Soda', 'Coffee', 'Milk', 'Bread'],
                                     ['Beer', 'Chips']]}
retail = pd.DataFrame(retail_shopping_basket)
retail = retail[['ID', 'Basket']]
pd.options.display.max_colwidth = 100
retail

| | ID | Basket |
---|---|---|
0 | 1 | [Beer, Diaper, Pretzels, Chips, Aspirin] |
1 | 2 | [Diaper, Beer, Chips, Lotion, Juice, BabyFood, Milk] |
2 | 3 | [Soda, Chips, Milk] |
3 | 4 | [Soup, Beer, Diaper, Milk, IceCream] |
4 | 5 | [Soda, Coffee, Milk, Bread] |
5 | 6 | [Beer, Chips] |
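The baskets in this dataset are lists of strings, so they have to be converted to a numeric encoding.

retail_id = retail.drop(columns='Basket')  # keyword form; the positional axis argument is not accepted by recent pandas
retail_id

| | ID |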
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
5 | 6 |
retail_Basket = retail.Basket.str.join(',')
retail_Basket

0    Beer,Diaper,Pretzels,Chips,Aspirin
1    Diaper,Beer,Chips,Lotion,Juice,BabyFood,Milk
2    Soda,Chips,Milk
3    Soup,Beer,Diaper,Milk,IceCream
4    Soda,Coffee,Milk,Bread
5    Beer,Chips
Name: Basket, dtype: object

retail_Basket = retail_Basket.str.get_dummies(',')
retail_Basket

| | Aspirin | BabyFood | Beer | Bread | Chips | Coffee | Diaper | IceCream | Juice | Lotion | Milk | Pretzels | Soda | Soup |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
3 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
4 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
5 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
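The join-then-get_dummies step amounts to one-hot encoding each basket. A minimal pure-Python sketch of the same transformation, shown only to make the encoding explicit (the variable names are illustrative):

```python
baskets = [
    ['Beer', 'Diaper', 'Pretzels', 'Chips', 'Aspirin'],
    ['Diaper', 'Beer', 'Chips', 'Lotion', 'Juice', 'BabyFood', 'Milk'],
    ['Soda', 'Chips', 'Milk'],
    ['Soup', 'Beer', 'Diaper', 'Milk', 'IceCream'],
    ['Soda', 'Coffee', 'Milk', 'Bread'],
    ['Beer', 'Chips'],
]

# Column order: every distinct item, sorted alphabetically.
items = sorted({item for basket in baskets for item in basket})

# One row per basket, 1 where the basket contains the item.
one_hot = [[int(item in basket) for item in items] for basket in baskets]

print(items)
for row in one_hot:
    print(row)
```

The rows match the table above. mlxtend also ships a `TransactionEncoder` in `mlxtend.preprocessing` that performs this kind of encoding directly on a list of lists, which may be a convenient alternative.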
retail = retail_id.join(retail_Basket)
retail

| | ID | Aspirin | BabyFood | Beer | Bread | Chips | Coffee | Diaper | IceCream | Juice | Lotion | Milk | Pretzels | Soda | Soup |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1 | 2 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
2 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
3 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
4 | 5 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
5 | 6 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
frequent_itemsets_2 = apriori(retail.drop(columns='ID'), use_colnames=True)  # min_support defaults to 0.5
frequent_itemsets_2

| | support | itemsets |
---|---|---|
0 | 0.666667 | (Beer) |
1 | 0.666667 | (Chips) |
2 | 0.500000 | (Diaper) |
3 | 0.666667 | (Milk) |
4 | 0.500000 | (Chips, Beer) |
5 | 0.500000 | (Diaper, Beer) |
Judging by support alone, [Beer, Chips] and [Beer, Diaper] are equally frequent (support 0.5 each), so which combination is more strongly associated?

association_rules(frequent_itemsets_2, metric='lift')

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
---|---|---|---|---|---|---|---|---|---|
0 | (Chips) | (Beer) | 0.666667 | 0.666667 | 0.5 | 0.75 | 1.125 | 0.055556 | 1.333333 |
1 | (Beer) | (Chips) | 0.666667 | 0.666667 | 0.5 | 0.75 | 1.125 | 0.055556 | 1.333333 |
2 | (Diaper) | (Beer) | 0.500000 | 0.666667 | 0.5 | 1.00 | 1.500 | 0.166667 | inf |
3 | (Beer) | (Diaper) | 0.666667 | 0.500000 | 0.5 | 0.75 | 1.500 | 0.166667 | 2.000000 |
Clearly {Diaper, Beer} is the more strongly associated pair (lift 1.5 versus 1.125).
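The two lift values can be recomputed directly from the six baskets. The sketch below uses illustrative helper names and only the standard library.

```python
baskets = [
    {'Beer', 'Diaper', 'Pretzels', 'Chips', 'Aspirin'},
    {'Diaper', 'Beer', 'Chips', 'Lotion', 'Juice', 'BabyFood', 'Milk'},
    {'Soda', 'Chips', 'Milk'},
    {'Soup', 'Beer', 'Diaper', 'Milk', 'IceCream'},
    {'Soda', 'Coffee', 'Milk', 'Bread'},
    {'Beer', 'Chips'},
]


def support(*items):
    """Fraction of baskets containing all the given items."""
    return sum(1 for b in baskets if set(items) <= b) / len(baskets)


def lift(x, y):
    """support(X and Y) / (support(X) * support(Y))."""
    return support(x, y) / (support(x) * support(y))


print(round(lift('Beer', 'Chips'), 3))   # 1.125
print(round(lift('Beer', 'Diaper'), 3))  # 1.5
```

Both pairs occur together in 3 of 6 baskets, but Diaper is rarer than Chips (3 baskets versus 4), so its co-occurrence with Beer is less likely to be a coincidence.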
Dataset: MovieLens (small)

movies = pd.read_csv('ml-latest-small/movies.csv')
movies.head(10)

| | movieId | title | genres |
---|---|---|---|
0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
4 | 5 | Father of the Bride Part II (1995) | Comedy |
5 | 6 | Heat (1995) | Action\|Crime\|Thriller |
6 | 7 | Sabrina (1995) | Comedy\|Romance |
7 | 8 | Tom and Huck (1995) | Adventure\|Children |
8 | 9 | Sudden Death (1995) | Action |
9 | 10 | GoldenEye (1995) | Action\|Adventure\|Thriller |
The data contains each movie's title and its genre labels. As before, the first step is to convert it to one-hot format.

movies_ohe = movies.drop(columns='genres').join(movies.genres.str.get_dummies())
pd.options.display.max_columns = 100
movies_ohe.head()

| | movieId | title | (no genres listed) | Action | Adventure | Animation | Children | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | IMAX | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Toy Story (1995) | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 2 | Jumanji (1995) | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 3 | Grumpier Old Men (1995) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
3 | 4 | Waiting to Exhale (1995) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
4 | 5 | Father of the Bride Part II (1995) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
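movies_ohe.shape
(9125, 22)

The dataset contains 9125 movies and 20 genre columns in total (22 columns minus movieId and title; one of the 20 is the placeholder "(no genres listed)").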
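The genre columns come from splitting the pipe-separated `genres` string; `Series.str.get_dummies` uses `'|'` as its default separator, which is why the call above needs no argument. A pure-Python sketch of the same split on the first five rows shown earlier (the variable names are illustrative):

```python
# First five rows of movies.csv as shown above: (title, genres).
sample = [
    ('Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy'),
    ('Jumanji (1995)', 'Adventure|Children|Fantasy'),
    ('Grumpier Old Men (1995)', 'Comedy|Romance'),
    ('Waiting to Exhale (1995)', 'Comedy|Drama|Romance'),
    ('Father of the Bride Part II (1995)', 'Comedy'),
]

# Distinct genre labels become the one-hot columns.
genres = sorted({g for _, field in sample for g in field.split('|')})

# One 0/1 row per movie.
rows = {title: [int(g in field.split('|')) for g in genres]
        for title, field in sample}

print(genres)
print(rows['Toy Story (1995)'])
```

On the full file the same procedure yields the 20 genre columns; this five-row sample only covers 7 of them.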
movies_ohe.set_index(['movieId', 'title'], inplace=True)
movies_ohe.head()

movieId | title | (no genres listed) | Action | Adventure | Animation | Children | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | IMAX | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Toy Story (1995) | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | Jumanji (1995) | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | Grumpier Old Men (1995) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
4 | Waiting to Exhale (1995) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
5 | Father of the Bride Part II (1995) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
frequent_itemsets_movies = apriori(movies_ohe, use_colnames=True, min_support=0.025)
frequent_itemsets_movies

| | support | itemsets |
---|---|---|
0 | 0.169315 | (Action) |
1 | 0.122411 | (Adventure) |
2 | 0.048986 | (Animation) |
3 | 0.063890 | (Children) |
4 | 0.363288 | (Comedy) |
5 | 0.120548 | (Crime) |
6 | 0.054247 | (Documentary) |
7 | 0.478356 | (Drama) |
8 | 0.071671 | (Fantasy) |
9 | 0.096110 | (Horror) |
10 | 0.043178 | (Musical) |
11 | 0.059507 | (Mystery) |
12 | 0.169315 | (Romance) |
13 | 0.086795 | (Sci-Fi) |
14 | 0.189479 | (Thriller) |
15 | 0.040219 | (War) |
16 | 0.058301 | (Action, Adventure) |
17 | 0.037589 | (Action, Comedy) |
18 | 0.038247 | (Action, Crime) |
19 | 0.051178 | (Action, Drama) |
20 | 0.040986 | (Sci-Fi, Action) |
21 | 0.062904 | (Action, Thriller) |
22 | 0.029260 | (Adventure, Children) |
23 | 0.036712 | (Adventure, Comedy) |
24 | 0.032438 | (Adventure, Drama) |
25 | 0.030685 | (Adventure, Fantasy) |
26 | 0.027726 | (Sci-Fi, Adventure) |
27 | 0.027068 | (Children, Animation) |
28 | 0.032877 | (Children, Comedy) |
29 | 0.032438 | (Crime, Comedy) |
30 | 0.104000 | (Drama, Comedy) |
31 | 0.026959 | (Fantasy, Comedy) |
32 | 0.090082 | (Romance, Comedy) |
33 | 0.067616 | (Crime, Drama) |
34 | 0.057863 | (Crime, Thriller) |
35 | 0.031671 | (Mystery, Drama) |
36 | 0.101260 | (Romance, Drama) |
37 | 0.087123 | (Drama, Thriller) |
38 | 0.031014 | (War, Drama) |
39 | 0.043397 | (Horror, Thriller) |
40 | 0.036055 | (Mystery, Thriller) |
41 | 0.028932 | (Sci-Fi, Thriller) |
42 | 0.035068 | (Romance, Drama, Comedy) |
43 | 0.032000 | (Crime, Drama, Thriller) |
rules_movies = association_rules(frequent_itemsets_movies, metric='lift', min_threshold=1.25)
rules_movies

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
---|---|---|---|---|---|---|---|---|---|
0 | (Action) | (Adventure) | 0.169315 | 0.122411 | 0.058301 | 0.344337 | 2.812955 | 0.037575 | 1.338475 |
1 | (Adventure) | (Action) | 0.122411 | 0.169315 | 0.058301 | 0.476276 | 2.812955 | 0.037575 | 1.586111 |
2 | (Action) | (Crime) | 0.169315 | 0.120548 | 0.038247 | 0.225890 | 1.873860 | 0.017836 | 1.136081 |
3 | (Crime) | (Action) | 0.120548 | 0.169315 | 0.038247 | 0.317273 | 1.873860 | 0.017836 | 1.216716 |
4 | (Sci-Fi) | (Action) | 0.086795 | 0.169315 | 0.040986 | 0.472222 | 2.789015 | 0.026291 | 1.573929 |
5 | (Action) | (Sci-Fi) | 0.169315 | 0.086795 | 0.040986 | 0.242071 | 2.789015 | 0.026291 | 1.204870 |
6 | (Action) | (Thriller) | 0.169315 | 0.189479 | 0.062904 | 0.371521 | 1.960746 | 0.030822 | 1.289654 |
7 | (Thriller) | (Action) | 0.189479 | 0.169315 | 0.062904 | 0.331984 | 1.960746 | 0.030822 | 1.243510 |
8 | (Adventure) | (Children) | 0.122411 | 0.063890 | 0.029260 | 0.239033 | 3.741299 | 0.021439 | 1.230158 |
9 | (Children) | (Adventure) | 0.063890 | 0.122411 | 0.029260 | 0.457976 | 3.741299 | 0.021439 | 1.619096 |
10 | (Adventure) | (Fantasy) | 0.122411 | 0.071671 | 0.030685 | 0.250671 | 3.497518 | 0.021912 | 1.238881 |
11 | (Fantasy) | (Adventure) | 0.071671 | 0.122411 | 0.030685 | 0.428135 | 3.497518 | 0.021912 | 1.534608 |
12 | (Sci-Fi) | (Adventure) | 0.086795 | 0.122411 | 0.027726 | 0.319444 | 2.609607 | 0.017101 | 1.289519 |
13 | (Adventure) | (Sci-Fi) | 0.122411 | 0.086795 | 0.027726 | 0.226500 | 2.609607 | 0.017101 | 1.180614 |
14 | (Children) | (Animation) | 0.063890 | 0.048986 | 0.027068 | 0.423671 | 8.648758 | 0.023939 | 1.650122 |
15 | (Animation) | (Children) | 0.048986 | 0.063890 | 0.027068 | 0.552573 | 8.648758 | 0.023939 | 2.092205 |
16 | (Children) | (Comedy) | 0.063890 | 0.363288 | 0.032877 | 0.514580 | 1.416453 | 0.009666 | 1.311672 |
17 | (Comedy) | (Children) | 0.363288 | 0.063890 | 0.032877 | 0.090498 | 1.416453 | 0.009666 | 1.029255 |
18 | (Romance) | (Comedy) | 0.169315 | 0.363288 | 0.090082 | 0.532039 | 1.464511 | 0.028572 | 1.360609 |
19 | (Comedy) | (Romance) | 0.363288 | 0.169315 | 0.090082 | 0.247964 | 1.464511 | 0.028572 | 1.104581 |
20 | (Crime) | (Thriller) | 0.120548 | 0.189479 | 0.057863 | 0.480000 | 2.533256 | 0.035022 | 1.558693 |
21 | (Thriller) | (Crime) | 0.189479 | 0.120548 | 0.057863 | 0.305379 | 2.533256 | 0.035022 | 1.266089 |
22 | (Romance) | (Drama) | 0.169315 | 0.478356 | 0.101260 | 0.598058 | 1.250236 | 0.020267 | 1.297810 |
23 | (Drama) | (Romance) | 0.478356 | 0.169315 | 0.101260 | 0.211684 | 1.250236 | 0.020267 | 1.053746 |
24 | (War) | (Drama) | 0.040219 | 0.478356 | 0.031014 | 0.771117 | 1.612015 | 0.011775 | 2.279087 |
25 | (Drama) | (War) | 0.478356 | 0.040219 | 0.031014 | 0.064834 | 1.612015 | 0.011775 | 1.026321 |
26 | (Horror) | (Thriller) | 0.096110 | 0.189479 | 0.043397 | 0.451539 | 2.383052 | 0.025186 | 1.477810 |
27 | (Thriller) | (Horror) | 0.189479 | 0.096110 | 0.043397 | 0.229034 | 2.383052 | 0.025186 | 1.172413 |
28 | (Mystery) | (Thriller) | 0.059507 | 0.189479 | 0.036055 | 0.605893 | 3.197672 | 0.024779 | 2.056601 |
29 | (Thriller) | (Mystery) | 0.189479 | 0.059507 | 0.036055 | 0.190283 | 3.197672 | 0.024779 | 1.161509 |
30 | (Sci-Fi) | (Thriller) | 0.086795 | 0.189479 | 0.028932 | 0.333333 | 1.759206 | 0.012486 | 1.215781 |
31 | (Thriller) | (Sci-Fi) | 0.189479 | 0.086795 | 0.028932 | 0.152689 | 1.759206 | 0.012486 | 1.077769 |
32 | (Drama, Comedy) | (Romance) | 0.104000 | 0.169315 | 0.035068 | 0.337197 | 1.991536 | 0.017460 | 1.253291 |
33 | (Romance) | (Drama, Comedy) | 0.169315 | 0.104000 | 0.035068 | 0.207120 | 1.991536 | 0.017460 | 1.130057 |
34 | (Crime, Drama) | (Thriller) | 0.067616 | 0.189479 | 0.032000 | 0.473258 | 2.497673 | 0.019188 | 1.538742 |
35 | (Drama, Thriller) | (Crime) | 0.087123 | 0.120548 | 0.032000 | 0.367296 | 3.046884 | 0.021497 | 1.389989 |
36 | (Crime) | (Drama, Thriller) | 0.120548 | 0.087123 | 0.032000 | 0.265455 | 3.046884 | 0.021497 | 1.242778 |
37 | (Thriller) | (Crime, Drama) | 0.189479 | 0.067616 | 0.032000 | 0.168884 | 2.497673 | 0.019188 | 1.121845 |
rules_movies[(rules_movies.lift > 4)].sort_values(by=['lift'], ascending=False)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
---|---|---|---|---|---|---|---|---|---|
14 | (Children) | (Animation) | 0.063890 | 0.048986 | 0.027068 | 0.423671 | 8.648758 | 0.023939 | 1.650122 |
15 | (Animation) | (Children) | 0.048986 | 0.063890 | 0.027068 | 0.552573 | 8.648758 | 0.023939 | 2.092205 |
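Children and Animation are the two most strongly associated genres, which matches common sense. The following lists the Children movies that are not Animation:

movies[(movies.genres.str.contains('Children')) & (~movies.genres.str.contains('Animation'))]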
| | movieId | title | genres |
---|---|---|---|
1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
7 | 8 | Tom and Huck (1995) | Adventure\|Children |
26 | 27 | Now and Then (1995) | Children\|Drama |
32 | 34 | Babe (1995) | Children\|Drama |
36 | 38 | It Takes Two (1995) | Children\|Comedy |
51 | 54 | Big Green, The (1995) | Children\|Comedy |
56 | 60 | Indian in the Cupboard, The (1995) | Adventure\|Children\|Fantasy |
74 | 80 | White Balloon, The (Badkonake sefid) (1995) | Children\|Drama |
81 | 87 | Dunston Checks In (1996) | Children\|Comedy |
98 | 107 | Muppet Treasure Island (1996) | Adventure\|Children\|Comedy\|Musical |
114 | 126 | NeverEnding Story III, The (1994) | Adventure\|Children\|Fantasy |
125 | 146 | Amazing Panda Adventure, The (1995) | Adventure\|Children |
137 | 158 | Casper (1995) | Adventure\|Children |
148 | 169 | Free Willy 2: The Adventure Home (1995) | Adventure\|Children\|Drama |
160 | 181 | Mighty Morphin Power Rangers: The Movie (1995) | Action\|Children |
210 | 238 | Far From Home: The Adventures of Yellow Dog (1995) | Adventure\|Children |
213 | 241 | Fluke (1995) | Children\|Drama |
215 | 243 | Gordy (1995) | Children\|Comedy\|Fantasy |
222 | 250 | Heavyweights (Heavy Weights) (1995) | Children\|Comedy |
230 | 258 | Kid in King Arthur's Court, A (1995) | Adventure\|Children\|Comedy\|Fantasy\|Romance |
234 | 262 | Little Princess, A (1995) | Children\|Drama |
280 | 314 | Secret of Roan Inish, The (1994) | Children\|Drama\|Fantasy\|Mystery |
308 | 343 | Baby-Sitters Club, The (1995) | Children |
320 | 355 | Flintstones, The (1994) | Children\|Comedy\|Fantasy |
326 | 362 | Jungle Book, The (1994) | Adventure\|Children\|Romance |
338 | 374 | Richie Rich (1994) | Children\|Comedy |
361 | 410 | Addams Family Values (1993) | Children\|Comedy\|Fantasy |
371 | 421 | Black Beauty (1994) | Adventure\|Children\|Drama |
404 | 455 | Free Willy (1993) | Adventure\|Children\|Drama |
431 | 484 | Lassie (1994) | Adventure\|Children |
... | ... | ... | ... |
7707 | 83177 | Yogi Bear (2010) | Children\|Comedy |
7735 | 84312 | Home Alone 4 (2002) | Children\|Comedy\|Crime |
7823 | 87383 | Curly Top (1935) | Children\|Musical\|Romance |
7900 | 89881 | Superman and the Mole-Men (1951) | Children\|Mystery\|Sci-Fi |
7929 | 90866 | Hugo (2011) | Children\|Drama\|Mystery |
7935 | 91094 | Muppets, The (2011) | Children\|Comedy\|Musical |
7942 | 91286 | Little Colonel, The (1935) | Children\|Comedy\|Crime\|Drama |
7971 | 91886 | Dolphin Tale (2011) | Children\|Drama |
8096 | 95740 | Adventures of Mary-Kate and Ashley, The: The Case of the United States Navy Adventure (1997) | Children\|Musical\|Mystery |
8199 | 98441 | Rebecca of Sunnybrook Farm (1938) | Children\|Comedy\|Drama\|Musical |
8200 | 98458 | Baby Take a Bow (1934) | Children\|Comedy\|Drama |
8377 | 104074 | Percy Jackson: Sea of Monsters (2013) | Adventure\|Children\|Fantasy |
8450 | 106441 | Book Thief, The (2013) | Children\|Drama\|War |
8558 | 110461 | We Are the Best! (Vi är bäst!) (2013) | Children\|Comedy\|Drama |
8592 | 111659 | Maleficent (2014) | Action\|Adventure\|Children\|IMAX |
8689 | 115139 | Challenge to Lassie (1949) | Children\|Drama |
8761 | 118997 | Into the Woods (2014) | Children\|Comedy\|Fantasy\|Musical |
8765 | 119155 | Night at the Museum: Secret of the Tomb (2014) | Adventure\|Children\|Comedy\|Fantasy |
8766 | 119655 | Seventh Son (2014) | Adventure\|Children\|Fantasy |
8792 | 122932 | Elsa & Fred (2014) | Children\|Comedy\|Romance |
8845 | 130073 | Cinderella (2015) | Children\|Drama\|Fantasy\|Romance |
8850 | 130450 | Pan (2015) | Adventure\|Children\|Fantasy |
8871 | 132046 | Tomorrowland (2015) | Action\|Adventure\|Children\|Mystery\|Sci-Fi |
8916 | 135264 | Zenon: Girl of the 21st Century (1999) | Adventure\|Children\|Comedy |
8917 | 135266 | Zenon: The Zequel (2001) | Adventure\|Children\|Comedy\|Sci-Fi |
8918 | 135268 | Zenon: Z3 (2004) | Adventure\|Children\|Comedy |
8960 | 139620 | Everything's Gonna Be Great (1998) | Adventure\|Children\|Comedy\|Drama |
8967 | 140152 | Dreamcatcher (2015) | Children\|Crime\|Documentary |
8981 | 140747 | 16 Wishes (2010) | Children\|Drama\|Fantasy |
9052 | 149354 | Sisters (2015) | Children\|Comedy |
336 rows × 3 columns
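The 336 rows are consistent with the supports reported earlier. As a quick sanity check, assuming the rounded support values from the frequent-itemset and rules tables, the counts can be recovered like this:

```python
n_movies = 9125
support_children = 0.063890            # support of (Children)
support_children_animation = 0.027068  # support of (Children, Animation)

# Convert the rounded supports back to whole-movie counts.
children = round(support_children * n_movies)
both = round(support_children_animation * n_movies)
children_only = children - both

print(children, both, children_only)       # 583 247 336
print(round(children_only / children, 3))  # 0.576
```

So even for the pair with the highest lift, more than half of the Children movies are not Animation, which is the kind of detail that only shows up when you go back to the data.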
Any concrete analysis still has to come back to the data itself, and that requires understanding the data thoroughly.