Before you start, define which parts of each page are scraped as the source for the recommendations. See also Collecting metadata.
Set up the data-epi-type="content" and data-epi-type="title" elements as follows. You can define multiple blocks of data-epi-type="content", which are concatenated for topic extraction.
<html>
  <head>
    <!-- Place normal og tags here -->
  </head>
  <body>
    <header>
      <span>Stuff to ignore.</span>
    </header>
    <div id="wrapper">
      <h1 data-epi-type="title">My page title</h1>
      <div data-epi-type="content">
        <p>The good stuff.</p>
        <p>Even more good stuff.</p>
      </div>
    </div>
    <footer>
      <span>Other stuff to ignore.</span>
    </footer>
  </body>
</html>
Use the data-epi-type="title" attribute to specify what the scraper should use as the title of a piece of content within the instance. This is commonly needed when the og:title meta tag, the <h1> tag, and the <title> tag contain different values. Adding this identifier ensures that the scraper picks up the correct value as the title.
Metadata such as og:title is still picked up and saved against the content, in addition to the main title of the page.
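The title behavior described above can be illustrated with a minimal Python sketch. This is not the service's actual scraper; the class name TitleSketch and the function extract_title are hypothetical, and the sketch assumes that Open Graph values are declared as meta tags with a property attribute. It only shows how a marked title can be read separately from og: metadata and the <title> tag.

```python
from html.parser import HTMLParser


class TitleSketch(HTMLParser):
    """Illustrative only: a hypothetical stand-in, not the real scraper."""

    def __init__(self):
        super().__init__()
        self.title = None    # text of the first data-epi-type="title" element
        self.metadata = {}   # og:* meta properties, kept alongside the title
        self._tag = None     # tag name of the marked element while inside it
        self._depth = 0      # nesting depth of same-named tags
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.metadata[prop] = attrs.get("content") or ""
        elif (self._tag is None and self.title is None
              and attrs.get("data-epi-type") == "title"):
            self._tag, self._depth = tag, 1
        elif tag == self._tag:
            self._depth += 1

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.title = " ".join(" ".join(self._buf).split())
                self._tag, self._buf = None, []


def extract_title(html):
    """Return (marked title or None, dict of og:* metadata)."""
    parser = TitleSketch()
    parser.feed(html)
    parser.close()
    return parser.title, parser.metadata


PAGE = (
    '<html><head><title>Site | Browser title</title>'
    '<meta property="og:title" content="Social title"></head>'
    '<body><h1 data-epi-type="title">My page title</h1></body></html>'
)
title, metadata = extract_title(PAGE)
print(title)     # My page title
print(metadata)  # {'og:title': 'Social title'}
```

In this sketch the <title> tag is ignored, the marked <h1> supplies the title, and og:title is still kept as metadata.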
Use the data-epi-type="content" attribute to specify which text on the page is included for topic extraction. This is commonly needed when a page has disclaimers or footers that you do not want the service to scrape.
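To check which text your markup exposes, a rough Python sketch like the following may help. It is an approximation under stated assumptions, not the service's implementation: ContentSketch and extract_content are hypothetical names, and joining blocks with a single space is an assumption about how the concatenation is done.

```python
from html.parser import HTMLParser


class ContentSketch(HTMLParser):
    """Illustrative only: collects text inside data-epi-type="content" blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # one cleaned text string per marked block
        self._tag = None   # tag name of the marked block while inside it
        self._depth = 0    # nesting depth of same-named tags
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            if dict(attrs).get("data-epi-type") == "content":
                self._tag, self._depth = tag, 1
        elif tag == self._tag:
            self._depth += 1

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join(" ".join(self._buf).split())
                if text:
                    self.blocks.append(text)
                self._tag, self._buf = None, []


def extract_content(html):
    """Concatenate all marked blocks (space separator is an assumption)."""
    parser = ContentSketch()
    parser.feed(html)
    parser.close()
    return " ".join(parser.blocks)


PAGE = (
    '<body><header><span>Stuff to ignore.</span></header>'
    '<div data-epi-type="content"><p>The good stuff.</p></div>'
    '<div data-epi-type="content"><p>Even more good stuff.</p></div>'
    '<footer><span>Disclaimer to ignore.</span></footer></body>'
)
content = extract_content(PAGE)
print(content)  # The good stuff. Even more good stuff.
```

The header and footer text never reaches the output because neither element sits inside a marked block.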