Rating data files
Rating data files have at least three columns: the user ID, the item ID, and the rating value.
User and item IDs are strings, and the rating value is a single-precision (32-bit, float) floating-point number.
An optional further column with date/time information or a numerical timestamp is used if necessary, and ignored otherwise.
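As a rough illustration of this layout, the following Python sketch reads one whitespace-separated line into a user ID, an item ID, and a rating rounded to single precision. It is not MyMediaLite's reader: the names `to_float32` and `parse_rating_line` are hypothetical, and returning any extra columns untouched is an assumption made for the example.

```python
import struct

def to_float32(value):
    """Round a Python float (64 bit) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def parse_rating_line(line):
    """Split a whitespace-separated line into (user_id, item_id, rating, rest).

    IDs are kept as strings; columns after the third are returned unparsed.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least 3 columns, got %d" % len(fields))
    user_id, item_id = fields[0], fields[1]
    rating = to_float32(float(fields[2]))
    return user_id, item_id, rating, fields[3:]
```

Keeping the IDs as strings means that `5951` and `u5951` are both acceptable user IDs.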
Examples
Tab-separated columns (.tsv)
5951	50	5
5951	223	5
5951	260	5
5951	293	5
5951	356	4
5951	364	3
5951	457	3
Tab-separated columns (.tsv) with non-integer ratings
u5951	i50	5.0
u5951	i223	5.0
u5951	i260	5.0
u5951	i293	4.5
u5951	i356	4.0
u5951	i364	3.5
u5951	i457	3.0
Space-separated columns
5951 50 5
5951 223 5
5951 260 5
5951 293 5
5951 356 4
5951 364 3
5951 457 3
Comma-separated columns (.csv)
5951,50,5
5951,223,5
5951,260,5
5951,293,5
5951,356,4
5951,364,3
5951,457,3
Rating data with timestamps
5951,50,5,978300760
5951,223,5,978302109
5951,260,5,978301968
5951,293,5,978300275
5951,356,4,978824291
5951,364,3,978302268
5951,457,3,978300719
Rating data with dates
5951 50 5 2005-12-04
5951 223 5 2005-12-04
5951 260 5 2005-12-04
5951 293 5 2005-11-27
5951 356 4 2005-11-27
5951 364 3 2005-11-27
5951 457 3 2005-11-27
Rating data with dates and times
5951 50 5 "2009-08-05 00:50:30"
5951 223 5 "2009-08-02 17:19:33"
5951 260 5 "2010-05-04 21:21:03"
5951 293 5 "2009-09-25 05:04:24"
5951 356 4 "2010-06-30 02:07:57"
5951 364 3 "2010-06-11 04:54:41"
5951 457 3 "2010-06-11 14:26:32"
It does not matter whether the date/time information is in quotation marks or not.
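One way to treat the time column uniformly is to strip optional quotation marks first and then try each of the formats shown in the examples. The sketch below is a hedged illustration, not MyMediaLite's parser: `parse_time_field` is a hypothetical name, and reading numerical timestamps as seconds since the Unix epoch in UTC is an assumption.

```python
from datetime import datetime, timezone

def parse_time_field(text):
    """Parse a timestamp, date, or date/time string into a naive datetime."""
    # Quotation marks around the value are optional, so drop them first.
    text = text.strip().strip('"')
    if text.isdigit():
        # Assumption: numerical timestamps are Unix seconds, read as UTC.
        utc = datetime.fromtimestamp(int(text), tz=timezone.utc)
        return utc.replace(tzinfo=None)
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date/time value: %r" % text)
```

Note that an unquoted date and time in a space-separated file spans two whitespace-delimited fields, so everything after the third column should be joined back together before calling the function.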
Command-line tools
Rating prediction
MyMediaLite's rating prediction tool supports all three separator variants (tab, space, and comma) by default for its --training-data=FILE and --test-data=FILE arguments.
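A reader that accepts tab-, space-, and comma-separated files without being told which one it has can simply split on any of the three characters. The following is a minimal sketch of that idea, assuming IDs never contain a tab, space, or comma; `split_columns` is a hypothetical helper and not part of MyMediaLite.

```python
import re

# Any single tab, space, or comma counts as a column separator.
_SEPARATORS = re.compile(r"[\t ,]")

def split_columns(line, max_columns=4):
    """Split a line into at most max_columns fields.

    Limiting the number of splits keeps a trailing "date time" value,
    which itself contains a space, together in the last field.
    """
    return _SEPARATORS.split(line.strip(), maxsplit=max_columns - 1)
```

For example, `split_columns("5951,50,5")` and `split_columns("5951\t50\t5")` both return `["5951", "50", "5"]`.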
If you know that all your ratings are integers between 0 and 255, you can use --rating-type=byte to save memory.
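Before switching to the byte rating type, it can be worth checking that a dataset really meets the condition. The sketch below shows one way to do that in Python; `fits_in_byte` is a hypothetical helper, and it assumes the ratings have already been read into a list of numbers.

```python
def fits_in_byte(ratings):
    """Return True if every rating is a whole number in the range 0..255."""
    return all(float(r).is_integer() and 0 <= r <= 255 for r in ratings)
```

A dataset containing half-star ratings such as 4.5 fails this check and would need the default floating-point type.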
If you specify --prediction-file=FILE, you will get a tab-separated file with predictions for the test data.
With the --prediction-line="FORMAT" option you can modify the line format, e.g. if you want the items to be in the first column: --prediction-line="{1},{0},{2}".
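The placeholders in the line format are positional, and for simple cases like this one Python's `str.format` uses the same notation, which makes it easy to preview a format string. The sketch assumes that `{0}` is the user ID, `{1}` the item ID, and `{2}` the predicted rating; this is inferred from the example above rather than stated explicitly, and `format_prediction` is a hypothetical name.

```python
def format_prediction(line_format, user_id, item_id, prediction):
    """Fill a positional line format with one prediction.

    Assumed placeholder order: {0} user, {1} item, {2} predicted rating.
    """
    return line_format.format(user_id, item_id, prediction)

# Items in the first column, comma-separated:
print(format_prediction("{1},{0},{2}", "5951", "50", 4.5))  # 50,5951,4.5
```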
Item recommendation
The item recommendation tool also supports this rating data format.
By default, every rating is interpreted as positive feedback; the rating value is ignored.
You can use the option --rating-threshold=NUM to define the minimum rating value that is to be interpreted as positive feedback.
Ratings below this value will be ignored.
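The effect of the threshold can be sketched as a simple filter that turns rating triples into user-item pairs. This is an illustration of the described behavior, not the tool's actual code; `positive_feedback` is a hypothetical name, and passing no threshold stands in for the default of accepting every rating.

```python
def positive_feedback(ratings, threshold=None):
    """Yield (user_id, item_id) pairs that count as positive feedback.

    ratings is an iterable of (user_id, item_id, rating) triples. With no
    threshold every rating counts; otherwise ratings below it are dropped.
    """
    for user_id, item_id, rating in ratings:
        if threshold is None or rating >= threshold:
            yield user_id, item_id
```

With the ratings from the examples above and a threshold of 4, the items rated 3 by user 5951 (364 and 457) would be dropped.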
Classes
Reading
The classes that read this kind of rating data are MyMediaLite.IO.RatingData and MyMediaLite.IO.StaticRatingData.
The former's Read() method returns a data structure that can be updated/modified, while the latter's result cannot be updated.
If you also want to read in time information, use the static class MyMediaLite.IO.TimedRatingData.
Writing
MyMediaLite.RatingPrediction.Extensions contains an extension method WritePredictions() that lets you write the predictions to a target, either a file or a TextWriter, for example:
recommender.WritePredictions(ratings, user_mapping, item_mapping, target);
This will use tabs as column separators.
If you want other separators, provide a line format string to the method:
recommender.WritePredictions(ratings, user_mapping, item_mapping, target, "{0}|{1}|{2}");