jeorj_euler's profile

10.7K Messages

 • 

225.4K Points

Tuesday, July 25th, 2017 7:02 PM

Closed

The advanced search tools should be more advanced, like being able to exclude results.

Not only should there be a way to exclude results based upon particular attributes, there should also be a way to control whether multiple parts of a query are intersected (treated with logical "and") or united (treated with logical "or"), or for that matter, to combine these operations as an expression in a formal language.

For example, I would like to be able to search for titles that are rated G but also not animated. Without a plot keyword such as, say, "live-action" in regular use (or applied automatically on title entries that do not specify the nature of the work), there is no way to filter out cartoons, stop-action cinematography or computer-generated pictures.

Some get-us-up-to-speed information...

The query string of a search results URL is organized into a collection of parameters, some of which accept as values a comma-delimited sequence of movie/person properties. For some parameters (such as ones that neither take numeric ranges as values nor are otherwise special), the commas ultimately function as either the logical operator "and" or the logical operator "or". An example of such a URL is http://www.imdb.com/search/title?genres=biography,history,war&title_type=feature,tv_movie,short: "Most Popular Biography-History-War Feature Films/TV Movies/Short Films". Each result is a title that belongs to at least all three of the genres "biography", "history" and "war" (an intersection ["and"] operation), while also being any of the types "feature film", "TV movie" and "short film" (a union ["or"] operation). In the title of the results page, notice that the genres are separated by hyphens whereas the types are separated by forward slashes.
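As a minimal sketch of how such a URL is put together, the following Python joins each parameter's values with commas and appends them to the search path. The helper name `build_title_search_url` is made up for illustration; only the URL shape comes from the example above.

```python
from urllib.parse import urlencode

def build_title_search_url(params):
    """Join each parameter's values with commas and build the search URL."""
    query = {name: ",".join(values) for name, values in params.items()}
    # safe="," keeps the commas literal instead of percent-encoding them.
    return "http://www.imdb.com/search/title?" + urlencode(query, safe=",")

url = build_title_search_url({
    "genres": ["biography", "history", "war"],          # "and" semantics
    "title_type": ["feature", "tv_movie", "short"],     # "or" semantics
})
print(url)
```

Note that nothing in the URL itself says which parameters intersect and which unite; that is decided on the server side.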

Some of the possible parameters in advanced title searches are as follows:
  • certificates, which can have any combination of the values "us:g", "us:pg", "us:pg_13", "us:r" and "us:nc_17" (and likely more) which are ored together;
  • colors, which can have any combination of values "color", "black_and_white", "colorized" and "aces" which are ored together;
  • companies, which can have any combination of the values "fox", "columbia", "dreamworks", "mgm", "paramount", "universal", "disney" and "warner" which are anded together;
  • genres, which can have any combination of the values "action", "adventure", "animation", "biography", "comedy", "crime", "documentary", "drama", "family", "fantasy", "film_noir", "game_show", "history", "horror", "music", "musical", "mystery", "news", "reality_tv", "romance", "sci_fi", "sport", "talk_show", "thriller", "war" and "western" which (as stated before) are anded together;
  • groups, which can have any combination of the values "top_100", "top_250", "top_1000", "now-playing-us", "oscar_winners", "oscar_best_picture_winners", "oscar_best_director_winners", "oscar_nominees", "emmy_winners", "emmy_nominees", "golden_globe_winners", "golden_globe_nominees", "razzie_winners", "razzie_nominees", "national_film_registry", "bottom_100", "bottom_250" and "bottom_1000" which are anded together (which in some cases nullifies the outcome [but that is no big deal]);
  • keywords, which can have any combination of alphanumeric strings or hyphen-separated alphanumeric strings which are anded together;
  • online_availability, which can have any combination of the values "US/today/IMDb/free", "US/today/Amazon/paid", "US/today/Amazon/subs", "US/today/WithoutABox/free" and "US/today/Internet Archive/free" which are ored together;
  • production_status, which can have any combination of the values "released", "post_production", "filming", "pre_production", "completed", "script", "optioned_property", "announced", "treatment_outline", "pitch", "turnaround", "abandoned", "delayed", "indefinitely_delayed", "active" and "unknown" which are ored together;
  • role, which can have any combination of person keys (for example, "nm1000000") which are anded together ("collaborations and overlaps" in other words, one of IMDb's best features, by the way);
  • title_type, which can have any combination of the values "feature", "tv_movie", "tv_series", "tv_episode", "tv_special", "mini_series", "documentary", "game", "short", "video" and "tvshort" which (as stated before) are ored together.
The above is a partial specification of the system in place. I do not yet have a detailed proposal for what would be even more advanced and improved, because there are lots of contextual nuances (of how to apply set theory) to sort out.
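To make the "anded" and "ored" behaviour concrete, here is a rough model of it using Python sets. The grouping of parameters follows the list above; the sample records and the `matches` function are invented for illustration and say nothing about how IMDb actually implements the search.

```python
# Which parameters intersect ("and") and which unite ("or"), per the list above.
AND_PARAMS = {"companies", "genres", "groups", "keywords", "role"}
OR_PARAMS = {"certificates", "colors", "online_availability",
             "production_status", "title_type"}

def matches(title, query):
    """Return True if a title record satisfies every parameter in the query."""
    for name, wanted in query.items():
        have = set(title.get(name, ()))
        wanted = set(wanted)
        if name in AND_PARAMS:
            if not wanted <= have:       # every requested value must be present
                return False
        elif name in OR_PARAMS:
            if not (wanted & have):      # at least one requested value must be present
                return False
        else:
            raise ValueError(f"unknown parameter: {name}")
    return True

# Made-up sample records, for illustration only.
TITLES = [
    {"id": "a", "genres": ["biography", "history", "war"], "title_type": ["feature"]},
    {"id": "b", "genres": ["biography", "war"], "title_type": ["feature"]},
    {"id": "c", "genres": ["biography", "history", "war", "drama"], "title_type": ["tv_series"]},
]

query = {"genres": ["biography", "history", "war"],
         "title_type": ["feature", "tv_movie", "short"]}
hits = [t["id"] for t in TITLES if matches(t, query)]
print(hits)
```

Separate parameters are always intersected with each other in this model, which matches the genre-and-type example given earlier.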

By the way, the techniques presented in the IMDb GS topic "Excluding genres in advanced title search" are outdated or inaccurate. To elaborate on that point, we will note that http://www.imdb.com/search/title?!genres=comedy,music&genres=documentary produces the same results as http://www.imdb.com/search/title?genres=documentary does.
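One way to see why the two URLs could behave alike is to parse their query strings: "!genres" comes out as a parameter name of its own, not as a negated form of "genres". The sketch below shows this with Python's standard library. That the server silently drops names it does not recognize is only an assumption here, and the `KNOWN` set is a hypothetical stand-in for whatever the server accepts.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url):
    """Return the query string of a URL as {parameter name: [raw values]}."""
    return parse_qs(urlsplit(url).query)

with_negation = query_params(
    "http://www.imdb.com/search/title?!genres=comedy,music&genres=documentary")
plain = query_params("http://www.imdb.com/search/title?genres=documentary")

# "!genres" is parsed as a name distinct from "genres".
print(sorted(with_negation))

# Assumed server behaviour: parameter names outside a known set are ignored.
KNOWN = {"genres", "title_type"}
recognized = {k: v for k, v in with_negation.items() if k in KNOWN}
print(recognized == plain)
```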

This conversation has been merged. Please refer to the main conversation:

List of URL search parameters

Champion

 • 

19.4K Messages

 • 

476.9K Points

7 years ago

Hi Jeorj Euler,

Nicely stated specifications. I would love to have an advanced search with the power you specified.

10.7K Messages

 • 

225.4K Points

Just to clarify, in case my words were misread, most of what I brought up (as a matter of "specification") explains the existing system, which has been in place for many years now.

10.7K Messages

 • 

225.4K Points

7 years ago

Oh, all right. Thanks, Dan Dassow. It would appear that there are some other GS topics that go into some of the hypothetical details of using Boolean expressions to control union, intersection and inversion (exclusions). It would be nice if we could at least exclude unwanted results based upon their known properties.

Champion

 • 

5K Messages

 • 

118.2K Points

Jeorj Euler

Would you like to point out any of the other ATS ideas that look promising?

10.7K Messages

 • 

225.4K Points

Ha! I'm not sure what would be promising. I don't really have a refined idea short of suggesting that the company host a publicly-accessible SQL site with well-documented parameters, for which machines that have compatible SQL clients and Web daemons would basically broker access for people without SQL clients. Such is inconvenient in a number of ways, including the fact that only Web, e-mail and chat (and shell) systems could ever viably accommodate advertisements being presented to client software. So, instead, there might as well be a website that accepts queries organized into expressions with parentheses, brackets, braces, operators and operands/parameters having alphanumeric identifiers. It could be a burden on the servers if the domain of allowed expressions (or scripts in effect) is too broad, as in enough to allow the risk of lengthy inefficient expressions to be executed. As well, things can get complex and even complicated when multiple nesting of parentheses is needed to convey an idea (including submitting a query). I'd like to think of something a bit more simple or easy, whereby human error (and subsequent frustration) will not be as prevalent.
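For what it's worth, one simpler shape that avoids nested parentheses is three flat lists: values that must all be present, values of which at least one must be present, and values that must be absent. The sketch below is purely hypothetical (the `search` function, the "field:value" tag format and the sample titles are all invented), but it handles the "rated G but not animated" case from the original post.

```python
def search(titles, require_all=(), require_any=(), exclude=()):
    """Filter titles using flat lists of 'field:value' tags (no nesting)."""
    require_all, require_any, exclude = set(require_all), set(require_any), set(exclude)
    results = []
    for title in titles:
        tags = title["tags"]
        if not require_all <= tags:                    # AND: all must be present
            continue
        if require_any and not (require_any & tags):   # OR: at least one present
            continue
        if exclude & tags:                             # NOT: none may be present
            continue
        results.append(title["name"])
    return results

# Made-up sample data, for illustration only.
TITLES = [
    {"name": "Live-action G film",
     "tags": {"certificates:us:g", "genres:family"}},
    {"name": "Animated G film",
     "tags": {"certificates:us:g", "genres:animation", "genres:family"}},
    {"name": "Live-action R film",
     "tags": {"certificates:us:r", "genres:drama"}},
]

found = search(TITLES, require_all=["certificates:us:g"], exclude=["genres:animation"])
print(found)
```

This is less expressive than a full expression language (it cannot say "A or (B and not C)"), but each list is evaluated in a single pass, so the cost per query stays predictable.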

Champion

 • 

19.4K Messages

 • 

476.9K Points

Providing a publicly-accessible SQL server for IMDb data would probably require more server and support resources than IMDb would be willing to expend. As an alternative, IMDb does provide a static subset of its data.

IMDb Datasets
http://www.imdb.com/interfaces/
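With a downloaded dataset, exclusion can be done locally. The sketch below assumes a tab-separated file with `tconst` and `genres` columns, comma-separated genre values and `\N` marking a missing value; those details are assumptions about the file layout, and the inline sample rows are invented so the example runs without a download.

```python
import csv
import io

# Invented sample in the assumed layout (tab-separated, "\N" for missing).
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\tgenres\n"
    "tt0000001\tmovie\tExample One\tFamily,Animation\n"
    "tt0000002\tmovie\tExample Two\tFamily\n"
    "tt0000003\tshort\tExample Three\t\\N\n"
)

def titles_without_genre(tsv_file, unwanted):
    """Yield the ids of rows whose genre list does not contain `unwanted`."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        raw = row["genres"]
        genres = set() if raw == "\\N" else set(raw.split(","))
        if unwanted not in genres:
            yield row["tconst"]

kept = list(titles_without_genre(io.StringIO(SAMPLE), "Animation"))
print(kept)
```

To run this against a real file, the `io.StringIO(SAMPLE)` argument would be replaced with an opened file object.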

5 Messages

 • 

232 Points

7 years ago

And how many angels can dance on the head of a pin?

107 Messages

 • 

2.2K Points

12.5?

10.7K Messages

 • 

225.4K Points

6 years ago

https://d2r1vs3d9006ap.cloudfront.net/s3_images/1753786/RackMultipart20180926-46876-oi4fwq-fist_bump_it_photo_print.png?1537989666

Champion

 • 

7.4K Messages

 • 

276K Points

6 years ago

Jeorj: I see that I have already given you a vote in favor of this idea, so I am just posting here again to indicate that I agree that I would like to see the advanced search tools offer options such as AND, OR, and NOT as the user may prefer.