endeca123.com
14 May 2013

Installing OEID Provisioning Service

Posted by Wim Villano

One completely new feature of OEID 3.0 is the Provisioning Service. What is that? The Provisioning Service gives the business user a way to do self-service data loads and application creation. It provides a framework where users can upload spreadsheet data (from a desktop) and create a discovery application on top of it. If we read the documentation carefully we see a note: "In Version 3.0, the Provisioning Service only supports upload of Excel files." This indicates, I think, that more sources will be added in the future ...

Where is the button to upload a spreadsheet? Well, first we need to install the Provisioning Service. It runs as a web application in a WebLogic Server container. There is an expert install (see the documentation) and an easy (test/development) install. Below you will find the steps involved in an easy install.

I am assuming you have already done a basic install of OEID 3.0 as described in this post (or similar) and have downloaded and unzipped the Provisioning software. First you need to create an additional domain via the provided eidProvisioningTemplate.jar. Start config.cmd located in <middleware home>\wlserver_10.3\common\bin.

Create a New WebLogic Domain


Click <Next>

Select the template eidProvisioningTemplate.jar from the location where you unzipped the package.


Click <Next>

Leave all the settings to defaults


Click <Next>

Enter the Administrator credentials


Click <Next>

Select the Production Mode


Click <Next>

Check Administration Server (to be able to change the Listen port)


Click <Next>

Change the Listen port and uncheck SSL enabled


Click <Next>
Click <Create> and after a few seconds <Done>.

The next thing is to change the plan.xml. You will find this in <middleware home>\user_projects\domains\<provisioning domain>\eidProvisioningConfig.
Change the following 4 parameters:

Set endeca-server-ws-port (the Endeca Server port); in my case it is now 7770 (the default was 7001).
Set endeca-server-security-enabled (SSL) to false.
Set transport-guarantee to NONE.
Set protected-url-pattern to /DISABLED.
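
If you prefer to script these four changes, here is a minimal Python sketch. It assumes the parameters are defined as <variable> entries (name/value pairs) in the standard WebLogic deployment-plan namespace; I have not verified that against the shipped plan.xml, and the SAMPLE document below is a shortened, hypothetical stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Namespace used by WebLogic deployment plans (assumed to apply to this plan.xml).
NS = "http://xmlns.oracle.com/weblogic/deployment-plan"

# Values from the steps above; 7770 is the Endeca Server port on my machine.
NEW_VALUES = {
    "endeca-server-ws-port": "7770",
    "endeca-server-security-enabled": "false",
    "transport-guarantee": "NONE",
    "protected-url-pattern": "/DISABLED",
}


def update_plan(xml_text, new_values):
    """Return the plan XML with the value of every named <variable> replaced."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    for variable in root.iter("{%s}variable" % NS):
        name = variable.findtext("{%s}name" % NS)
        value = variable.find("{%s}value" % NS)
        if name in new_values and value is not None:
            value.text = new_values[name]
    return ET.tostring(root, encoding="unicode")


# Hypothetical, shortened plan.xml used only to demonstrate the function.
SAMPLE = """<deployment-plan xmlns="%s">
  <variable-definition>
    <variable><name>endeca-server-ws-port</name><value>7001</value></variable>
    <variable><name>transport-guarantee</name><value>CONFIDENTIAL</value></variable>
  </variable-definition>
</deployment-plan>""" % NS

updated = update_plan(SAMPLE, NEW_VALUES)
print(updated)
```

Names that do not occur in the file are simply skipped, so the same dictionary can be reused when only some of the variables are present. Make a copy of plan.xml before writing anything back.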

The last change is for the upload directory. During the upload, the spreadsheet file is temporarily stored by the Provisioning Service. By default the service uses the operating system's temporary directory, but you can also specify a directory, relative to the domain home directory. I want to use my own directory inside my Provisioning Service directory structure. My domain is D:\oracle\oeid30\fmw\user_projects\domains\oracle.eid-ps and I create the upload directory there:

Set the upload-file-directory:

Save the file plan.xml. Start the Provisioning domain (<Middleware Home>\user_projects\domains\<provisioning domain>\bin\startWebLogic.cmd).

Now you need to configure the connection from Endeca Studio to the Provisioning Service. Start Endeca Studio if not running. Go to the Control Panel. Go to Provisioning Service:

Up to now you are not using SSL, so only the port and server are required:

Click <Save>. Go back to the Home page.

If you now click on you will see the new option to upload your own data:

Be aware: If the Provisioning Service domain has not been started, this option will not show up!

Self-Service for everybody!

3 May 2013

New Screencast Series OEID v3.0

Posted by Wim Villano

As with release v2.3, Oracle has put some screencasts on YouTube: OEID v3.0 Screencasts. They are very nice for getting familiar with v3.0 and doing basic development & configuration.

Enjoy.

 

 

25 Apr 2013

Change the Endeca Server Port

Posted by Wim Villano

 

In the blog entry Installing Oracle Endeca Information Discovery 3.0 the Endeca Server was installed on the default port of 7001. If you run multiple installations on your machine (e.g. Oracle BI) that also use this port, you will probably want to change it.

Here are the steps to do so:

Start the Endeca Server (<Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\bin\startWebLogic.cmd) and log in on the console. Click on Environment -> Servers and then, in the right pane, on AdminServer:

Click on the <Lock & Edit> button and change the Listen Port to e.g. 7770 (can be any free port number):

Click on <Save>

Click <Activate Changes>

Stop the Endeca Server (<Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\bin\stopWebLogic.cmd).

Now the port of the Webservice interface has been changed, but (unfortunately) there are also some hard coded port numbers in some files.

These files need to be changed:

<Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\config\EndecaCmd.properties
Change port=7001 to port=7770

<Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\config\EndecaServer.properties
Change endeca-webserver-port=7001 to endeca-webserver-port=7770

The port is also hard coded in the stopWebLogic.cmd found in directory <Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\bin

Change set ADMIN_URL=t3://<your host name>:7001 to set ADMIN_URL=t3://<your host name>:7770
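
The three edits above can also be scripted. Below is a small Python sketch that works on the text of each file; the key names and the ADMIN_URL line come from the steps above, while the sample contents (for example host=localhost and the host name myhost) are made up for illustration. Reading and writing the actual files is left out on purpose, so nothing is changed until you decide to save the result.

```python
import re


def set_property(text, key, value):
    """Replace the value on every 'key=...' line of properties-style text."""
    pattern = re.compile(
        r"^(%s[ \t]*=[ \t]*)[^\r\n]*" % re.escape(key), re.MULTILINE
    )
    return pattern.sub(lambda m: m.group(1) + value, text)


def set_admin_url_port(text, old_port, new_port):
    """Change the port of the 'set ADMIN_URL=t3://host:port' line."""
    pattern = re.compile(
        r"^(set ADMIN_URL=t3://[^:\r\n]+:)%d\b" % old_port, re.MULTILINE
    )
    return pattern.sub(lambda m: m.group(1) + str(new_port), text)


# Made-up file contents, only to show the effect of each call.
cmd_props = "host=localhost\nport=7001\n"
server_props = "endeca-webserver-port=7001\n"
stop_cmd = "set ADMIN_URL=t3://myhost:7001\n"

print(set_property(cmd_props, "port", "7770"))
print(set_property(server_props, "endeca-webserver-port", "7770"))
print(set_admin_url_port(stop_cmd, 7001, 7770))
```

The patterns are anchored at the start of a line, so changing port in EndecaCmd.properties does not touch a key that merely ends in -port.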

With the above changes you can use the endeca-cmd.bat located in directory <Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\EndecaServer\bin.
But the endeca-cmd.bat located in <Weblogic Middleware Home>\EndecaServer7.5.1_1\endeca-cmd (which is actually called by the one mentioned in the previous sentence :-) ) is missing the environment variable ENDECA_CMD_PROPERTIES (I could not find where that default port is stored ...).
So if you also want to use this endeca-cmd.bat, I suggest adding the environment variable ENDECA_CMD_PROPERTIES to the .bat file. You can copy it from the file <Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\EndecaServer\bin\endeca-cmd.bat.


Start the Endeca Server. It should now run on the new port. Do not forget to change the port in the projects you already created and in the Data Sources in Studio.

8 Apr 2013

Installing Oracle Endeca Information Discovery 3.0

Posted by Wim Villano

So, we saw that the Quickstart Installation is no longer part of the download packages. How do we then install the whole package step-by-step?

All further documentation can be found here.

You have several options for installing the presentation part (Endeca Studio). You can manually deploy it in Tomcat, manually deploy it in Weblogic, or use the all-in-one package bundled with Tomcat.

Since we do not have a choice where to deploy the Endeca Server (this has to be a Weblogic application server), I decided to deploy Endeca Studio manually on the same Weblogic application server as the Endeca Server. But you could save some installation steps by using the all-in-one package bundled with Tomcat.

What software do we need (Bill Of Material):

  • Oracle Endeca Server (7.5.1.0) from edelivery
  • Oracle Endeca Information Discovery Studio (3.0) for WebLogic from edelivery
  • Oracle Endeca Information Discovery Integrator (3.0) from edelivery
  • JDK version 6
    Download the Sun version here or
    On this page download the jRockit version for Windows x86-64 (recommended for Endeca Server, but not certified with Endeca Studio as Brett stated in the comments)
  • Oracle Weblogic Server 11gR1
    On this page download Weblogic 10.3.6, the Generic: 1GB release.
  • ADF Runtime
    On this page download Application Development Runtime version 11.1.1.6:

 

When you've downloaded these components you are ready to go. Perhaps first a brief summary of the steps we will do during this install, so you will not lose the overview:

  1. Install the JDK (needed for the application server)
  2. Install Weblogic Server (the application server)
  3. Install the ADF Runtime software (some additional middle-tier components needed for a successful Endeca Server deployment)
  4. Install the Endeca Server software (the analytical search database)
  5. Create a Weblogic Domain for the Endeca Server
  6. Create a Weblogic Domain for Endeca Studio
  7. Update some Weblogic settings for the purpose of the installation of Endeca Studio
  8. Deploy the Endeca Studio application (the presentation layer)
  9. Install Integrator (the ETL component)
  10. Done

 

1. Install the JDK

I use the jRockit, but the Sun JDK has similar steps. To install the jRockit JDK just start the setup.exe. Click <Next> on the first screen and on the second screen enter a path. DO NOT USE SPACES!

Click <Next>. On the next page make no selections and click <Next> again. Then you are asked if you want to install the JRE as well. It is not necessary, but if you say Yes and <Next> then you are asked for a path again. Enter a different path than the previous one and click <Next>. jRockit will be installed.

2. Install Weblogic Server

To install Weblogic Server execute the downloaded wls1036_generic.jar with the JDK. Go to the directory with wls1036_generic.jar and type:
<JDK Path>\bin\java -jar wls1036_generic.jar
Like:

The installer will start. On the Welcome screen Click <Next>.

Then select 'Create a new Middleware Home' and enter a path where to install (I entered: d:\oracle\oeid30\fmw):

Click <Next>.

Then I try with minimum clicks :-) to make clear to the software that I do not want to be notified of updates by clicking:

Click <Next> Click <Yes> Click <Yes>

On the next screen put a check at "I wish to remain uninformed ..." and click <Continue>.

Then choose a Custom install and click <Next>. On the next screen uncheck Oracle Coherence and click <Next>. The just installed JDK should be checked on the screen. Click <Next>.

Then enter a path for the installation directory:

Click <Next>. Choose not to install the Nodemanager:

Click <Next>. Choose "All users" for the Start Menu Folder:

Click <Next>, again <Next>. Then the install will run for about a minute. Uncheck the "Run Quickstart" and click <Done>.

Weblogic has been installed.

3. Install the ADF Runtime software

After you unzipped the downloaded software go to the directory Disk1. From a command line we start the setup.exe with a jreLoc option: setup.exe -jreLoc <JDK dir>. In my case: setup.exe -jreLoc d:\oracle\jrockit

The installer starts. On the first screen click <Next>. Select the "Skip Software Updates" option and click <Next>. After the Prerequisites Checks have been done click <Next>. Then enter the location of the Oracle Middleware Home. The just installed Weblogic home should appear in the drop down list:


Click <Next>. On the Application Server screen "Weblogic Server" is already checked. Click <Next>. After about 2 minutes the installation is finished. Click <Next> and <Finish>

4. Install the Endeca Server Software

Open a command line box, go to the unzipped Endeca Server software directory Disk1 and start the setup with the -jreLoc option as with the ADF runtime software: setup.exe -jreLoc <JDK dir>. In my case: setup.exe -jreLoc d:\oracle\jrockit

The installer starts. On the Welcome screen click <Next>. After the Prerequisites Checks have been done click <Next>.  Then select the Weblogic Middleware home from a drop down list. Leave the Oracle Home Directory to the default (EndecaServer7.5.1_1):

After entering locations click <Next>. For this installation we will not run the Endeca Server in a secure mode, so uncheck 'Yes':

Click <Next>. After 2 minutes the installation is finished. Click <Next> and then <Finish>. The core Endeca Server software has now been installed. The surrounding Web Application Server software is next.

 

5. Create a Weblogic Domain for the Endeca Server

Now we need to create a Weblogic Domain for the Endeca Server software. To do so, open a command prompt (of course you can also use Windows Explorer, but I like the command line, if something goes wrong ...) and go to the directory: <Weblogic Middleware Home>\wlserver_10.3\common\bin (for my environment: d:\oracle\oeid30\fmw\wlserver_10.3\common\bin). Type there: config.cmd

On the appearing screen choose: Create a new Weblogic domain. Click <Next>. Then select "Oracle Endeca Server - 7.5.1.0". The second check ("Oracle JRF") comes for free:

Click <Next>.

Enter a Domain name (mine: endecaserver_domain) and leave the Domain location untouched:

Click <Next>. On the next screen you enter the username and password for the administrator of the domain:

Click <Next>. Choose Production Mode and see that the right JDK already has been checked:

Click <Next>. No optional configuration. Click <Next>. Click <Create>. After 30 seconds the domain has been created. Click <Done>.

You can verify your install of the Endeca Server. Start the Endeca Server Domain by executing:
<Weblogic Middleware Home>\user_projects\domains\<endeca server domain name>\bin\startWebLogic.cmd. For my installation: d:\oracle\oeid30\fmw\user_projects\domains\endecaserver_domain\bin\startWebLogic.cmd

You will be asked to enter the username and password of the administrator of this domain (you entered this during the installation).

From a browser start this URL: http://localhost:7001/endeca-server/ws/manage?wsdl

That should give a response something like this:

Now the Endeca Server is completely installed and operational (on port 7001). A blog entry on changing the port of the Endeca Server will follow shortly.
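
The same check can be done without a browser. This is only a sketch in Python: the URL is the one given above, but the idea that the answer contains the word "definitions" (the usual root element of a WSDL document) within the first few kilobytes is my assumption, not something taken from the Endeca documentation.

```python
from urllib.request import urlopen


def manage_wsdl_url(host="localhost", port=7001):
    """Build the URL of the Endeca Server 'manage' web service WSDL."""
    return "http://%s:%d/endeca-server/ws/manage?wsdl" % (host, port)


def wsdl_is_served(url, timeout=10.0):
    """Return True if the URL answers with HTTP 200 and looks like a WSDL."""
    try:
        with urlopen(url, timeout=timeout) as response:
            start = response.read(4096).decode("utf-8", errors="replace")
            return response.status == 200 and "definitions" in start
    except OSError:
        # Connection refused, timeout, HTTP error: the service is not reachable.
        return False


if __name__ == "__main__":
    url = manage_wsdl_url()
    print(url, "->", "OK" if wsdl_is_served(url) else "not reachable")
```

If you later move the Endeca Server to another port, pass it in, for example manage_wsdl_url(port=7770).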

 

6.  Create a Weblogic Domain for Endeca Studio

For the Endeca Studio application we also need to create a domain (recommended as per the documentation). This is similar to the Endeca Server Domain creation: Open a command prompt and go to the directory: <Weblogic Middleware Home>\wlserver_10.3\common\bin (for my environment: d:\oracle\oeid30\fmw\wlserver_10.3\common\bin). Type there: config.cmd.

Select "Create a new WebLogic domain". Click <Next>. Use the default Basic domain configuration:

Click <Next>

Enter a Domain name and leave the Domain location untouched:

Click <Next>. Then enter a username and password for the domain administrator:

Click <Next>. Choose Production Mode and see that the JDK has been selected.

Click <Next>.

Select modify settings of Administration Server:

Click <Next>. Since we did a basic domain installation of the Endeca Server this has been configured on port 7001. So we need to change the port of the domain where Endeca Studio will be running. Change the Listen port of the future Endeca Studio Application. I chose port 8880.

Click <Next>. Then click <Create> and after a couple of seconds the domain has been created. Click <Done>.

A domain has been created and is ready to deploy the Studio application.

7. Update some Weblogic settings for the purpose of Endeca Studio

Before you can deploy the Endeca Studio application, we need to modify some Weblogic settings. Edit the file:
<Weblogic Middleware Home>\user_projects\domains\<Endeca Studio Domain>\bin\setDomainEnv.cmd

Copy the text below and paste it near the top of the file, e.g. just after set WL_HOME.

set JAVA_OPTIONS=-Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl -Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl

In the same file update the memory arguments (2 times):
Change set WLS_MEM_ARGS_64BIT=-Xms256m -Xmx512m into: set WLS_MEM_ARGS_64BIT=-Xms256m -Xmx1024m
and change
set MEM_MAX_PERM_SIZE_64BIT=-XX:MaxPermSize=256m into: set MEM_MAX_PERM_SIZE_64BIT=-XX:MaxPermSize=512m

Save the file.
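
Because the same line has to be changed more than once, it is easy to miss an occurrence when editing by hand. The Python sketch below replaces every occurrence and reports how many it found. The old and new lines are the ones listed above; the sample script text is made up, and a real setDomainEnv.cmd may of course contain different defaults, in which case the count will simply be 0.

```python
# Old line -> new line, exactly as listed in the step above.
REPLACEMENTS = {
    "set WLS_MEM_ARGS_64BIT=-Xms256m -Xmx512m":
        "set WLS_MEM_ARGS_64BIT=-Xms256m -Xmx1024m",
    "set MEM_MAX_PERM_SIZE_64BIT=-XX:MaxPermSize=256m":
        "set MEM_MAX_PERM_SIZE_64BIT=-XX:MaxPermSize=512m",
}


def raise_memory_limits(script_text):
    """Apply all replacements; return the new text and a count per old line."""
    counts = {}
    for old, new in REPLACEMENTS.items():
        counts[old] = script_text.count(old)
        script_text = script_text.replace(old, new)
    return script_text, counts


# Made-up fragment standing in for setDomainEnv.cmd.
sample = (
    "set WLS_MEM_ARGS_64BIT=-Xms256m -Xmx512m\r\n"
    "set MEM_MAX_PERM_SIZE_64BIT=-XX:MaxPermSize=256m\r\n"
    "set WLS_MEM_ARGS_64BIT=-Xms256m -Xmx512m\r\n"
)
new_text, counts = raise_memory_limits(sample)
print(new_text)
print(counts)
```

A count of 0 for one of the lines is a hint that the file differs from what this post assumes and should be checked by hand.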

Stop the Endeca Studio domain if started (via command <Weblogic Middleware Home>\user_projects\domains\<Endeca Studio Domain>\bin\stopWebLogic.cmd) and start it (via command <Weblogic Middleware Home>\user_projects\domains\<Endeca Studio Domain>\bin\startWebLogic.cmd).

8. Deploy the Endeca Studio application

Now you will deploy the Endeca Studio application in the just created Weblogic domain. After you unzip the edelivery Studio package for Weblogic you'll see some files:

Copy the files portal-ext.properties and endeca-portal-weblogic-3.0.10089.ear to <Weblogic Middleware Home>\user_projects\domains

In the directory <Weblogic Middleware Home>\user_projects\domains create the following directories: data, deploy and weblogic-deploy. In the newly created directory <Weblogic Middleware Home>\user_projects\domains\data create the directory endeca-data-sources.
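
For repeated installs the directory creation can be scripted. A minimal Python sketch, using only the directory names mentioned above; the demo runs in a temporary directory, so replace that with your own <Weblogic Middleware Home>\user_projects\domains path when using it for real.

```python
import tempfile
from pathlib import Path

# Directories Studio expects, relative to ...\user_projects\domains.
STUDIO_DIRECTORIES = (
    "data",
    "deploy",
    "weblogic-deploy",
    "data/endeca-data-sources",
)


def create_studio_directories(domains_dir):
    """Create the directories (if missing) and return their paths."""
    created = []
    for relative in STUDIO_DIRECTORIES:
        target = Path(domains_dir) / relative
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


# Demo in a throw-away directory instead of a real Middleware Home.
with tempfile.TemporaryDirectory() as tmp:
    created = create_studio_directories(tmp)
    names = sorted(p.relative_to(tmp).as_posix() for p in created)
    all_exist = all(p.is_dir() for p in created)
print(names)
```

Because of exist_ok=True the function can be run again safely if some of the directories are already there.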

The directory <Weblogic Middleware Home>\user_projects\domains will look like:

Now let's deploy the Endeca Studio file (endeca-portal-weblogic-3.0.10089.ear).

Make sure your Endeca Studio domain has been started. Open a browser and go to URL: http://localhost:<port number>/console (for me: http://localhost:8880/console).

Go to Configure applications:

Click Lock & Edit:


Click Install:

Locate the file endeca-portal-weblogic-3.0.10089.ear in <Weblogic Middleware Home>\user_projects\domains and paste the path to that file in the Path space. Hit <enter>:

The file endeca-portal-weblogic-3.0.10089.ear appears. Select it and click <Next>.
Leave the defaults:

Click <Next>.
Again leave the defaults:

Click <Finish>. After the deployment is finished you will return to the overview screen. Click on <Activate Changes>:

Then the message appears that no restart is necessary:

Then click on deployments in the left pane of the Domain Structure. In the deployments then check endeca-portal-weblogic and from the Start drop down select: Servicing All Requests:

On the next screen click <Yes>. When the deployment screen returns and the State has been changed to Active, you can check the URL: http://localhost:<port number> (in my case: http://localhost:8880):

You can log on with username: admin@oracle.com and password: Welcome123. You have to reset the password immediately after log on.

That was the install of Endeca Studio.

9. Install Integrator

The last piece of software before we can create our projects is the Integrator. After unzipping the downloaded file you will see 3 files. 2 of them are only needed if you want to deploy the server version for production environments. For now double click the file EID_3.0_Integrator.exe.

On the opening screen click <Next>, click <Next> again. Then enter a location:

Click <Next>, Click <Next> and click <Finish>.

That was the Integrator install.

10. Done

Hopefully the next release of Endeca will give us the 'next, next, finish' experience again ...

Enjoy Endeca 3.0

 

 

24 Mar 2013

Oracle EID 3.0 is Available

Posted by Wim Villano

Hi all,

Oracle Endeca Information Discovery 3.0 is out! It was released on 22 March.

Some New Features:

Studio 3.0

  • Refreshed Look And Feel
  • New Chart Types (Bubble and Scatter)
  • Lots of components upgraded (Metric Bar, Range Filter, Guided Navigation)

Endeca Server 7.5

  • Language sensitive search for 22 languages
  • New EQL features (like case statement, string join, etc.)

Data Integrator 3.0

  • Some General ETL improvements (like job flow control)
  • Enhanced support for some Data Sources (Spreadsheet, Microsoft Access)
  • New Native JSON Reader
  • New HTTPConnector

 

Something completely new and exciting is the Provisioning Services: The Provisioning Service enables dynamic application creation using data uploaded from the user's desktop. So that means that business users can upload their own data to the server and create their applications with no effort from the IT department.

Something we seem to have lost is the quick installation package. This means we have to do the 3 step installation of: Endeca Server, Integrator and Studio separately.

Also excited to get your hands on this new release? Go to https://edelivery.oracle.com

12 Feb 2013

Loading data from the Oracle Business Intelligence (OBI) Server

Posted by Wim Villano

A new feature with OEID v2.4 is to load data directly from an Oracle Business Intelligence (OBI) Server. For companies who are using OBI and have modeled their common enterprise information model in it this can be a very nice feature. One place to get the structured data from!

How do we load data from OBI into an index with Integrator? According to the documentation we go in the Integrator to the menu bar and choose: File > New > Project. Integrator will display the New Project dialog. Then expand the Information Discovery Node and select Load Data from OBI Server.

Unfortunately this does not appear in my screen:

I can only see the Load Metadata from a Record Store ...

Perhaps this has to do with my quickstart installation? It appears that not all plug-ins are present. To fix this, download the Integrator v2.4 from edelivery.oracle.com. You will find it in the Media Pack: Oracle Endeca Information Discovery Integrator (2.4) Media Pack for Microsoft Windows x64 (64-bit).

Unzip the file. You will see the plug-in appear:

Copy this file to the plug-ins directory of Integrator: <Endeca Install Dir>\Endeca\Discovery\2.4.0\Integrator\plugins

Restart Integrator and the Wizard will appear as stated in the documentation:

You can now follow the instructions in the Wizard documentation to load data from the OBI server. I suggest the first time to choose the option: 'Create new project'. Then it becomes clear how it works (no magic there) and we can use it in other projects.

When you choose the option 'Create new project' a complete project will be generated. If you run the graph Baseline a new index will be created and data loaded from the OBI Server. The 2 important and re-usable information pieces are: How the format of the BI Server query should look (the QueryStatement.sql in the Navigator pane) and what connection information is needed (Connections in the Outline pane):

First have a look at the connection. If you double click the connection in the Outline pane the connection pane opens:

As you can see it uses the Oracle BI JDBC driver which is embedded in the Plug-in. So if we want to create a connection in another project we have to choose the driver oracle.bi.jdbc.AnaJdbcDriver and copy (or type in) the URL jdbc:oraclebi://wvillano-nl:9703/ (where wvillano-nl should be your OBI Server name) upon creation of the connection.

What should be the format of a query against the OBI Server? Double click on the QueryStatement.sql:

It looks like the logical query OBI is normally producing. But there is a difference. It starts with: SELECT_BUSINESS_MODEL. And that is also what it does! So the entities should match the Business Model. Here a piece of my rpd:

As you can see in my presentation layer the Department is still called DEPT and in my Business Model layer it is called Department. The Integrator query uses the Business Model naming: Department.

So be aware: if your presentation layer has different names for attributes/tables than your business model, you cannot copy/paste the logical query created in the OBI Server log. Otherwise, just copy/paste the logical query, put the command SELECT_BUSINESS_MODEL before it and it will run.

We can also create any query we want manually of course e.g. add filters to the query:

 

In the generated project the query is stored in a separate file, but you can also enter it as 'SQL Query' in the DBInputTable Reader component in the graph.

For the edge metadata we can use the same query since we are not able (yet) to create the metadata via the graphical interface.

That should be enough information to use OBI Server as a source in your projects. I hope you like it too.

11 Feb 2013

OEID v2.4 and Upgrade from v2.3

Posted by Wim Villano

It seems there was a silent release of Oracle Endeca Information Discovery  v2.4 on January 31.

As per the notes (http://docs.oracle.com/cd/E35976_01/index.htm) the release is mainly about bug fixes from v2.3. A noticeable change has been made to the numbering of the Endeca Server. The Integrator and Studio numbering went from 2.3 to 2.4 where the numbering of the Endeca Server went from 2.3 to 7.4 (in the documentation is stated that "There have no behavioral changes to Endeca Server since the last release (version 2.3)", so the indexes should all still be valid)

So, any reason to upgrade then? Well, there is a new version of the Oracle Text Enrichment software (now v5.1). This version also supports the languages German, Spanish, Portuguese and French. This is very interesting if your language is now on the list (Dutch still is not ...). Besides the language support, an additional enrichment data directory for social media is available for download. This should provide better handling of short-form content like Twitter.

My personal reason to upgrade is because Oracle Business Intelligence (OBI) server is now supported as a source for Endeca. That could be an interesting feature for current OBI customers who use it as their common enterprise information model and are looking at Endeca.

Upgrade from v2.3

Upgrading to v2.4 is pretty straightforward (http://docs.oracle.com/cd/E35976_01/general.240/eid_migration/toc.htm#About%20this%20guide).

Below you will find my upgrade steps to go from quickstart v2.3 to quickstart v2.4.

I made a back-up of the directories:
- For Endeca Server (the indexes): <Oracle Endeca Dir>\Endeca\Server\2.3.0\endeca-server\data
- For Endeca Studio (the communities, data sources): <Oracle Endeca Dir>\Endeca\Discovery\2.3.0\Studio\data
- For Integrator projects: directory Workspace
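
The three back-ups can be taken in one go. Below is a Python sketch under a few assumptions: the relative paths are the v2.3 quickstart locations listed above, the labels (server-data, studio-data, workspace) are names I made up for the back-up folders, and the demo builds a fake directory tree in a temporary directory rather than touching a real installation.

```python
import shutil
import tempfile
from pathlib import Path


def backup_sources(endeca_dir, workspace):
    """Map a back-up label to each directory named in the list above."""
    endeca_dir = Path(endeca_dir)
    return {
        "server-data": endeca_dir / "Endeca" / "Server" / "2.3.0" / "endeca-server" / "data",
        "studio-data": endeca_dir / "Endeca" / "Discovery" / "2.3.0" / "Studio" / "data",
        "workspace": Path(workspace),
    }


def back_up(sources, backup_root):
    """Copy every source directory to backup_root/<label>."""
    for label, source in sources.items():
        # copytree refuses to overwrite an existing back-up, which is intended.
        shutil.copytree(source, Path(backup_root) / label)


# Demo on a fake tree; marker.txt stands in for the real content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sources = backup_sources(root / "endeca", root / "Workspace")
    for source in sources.values():
        source.mkdir(parents=True)
        (source / "marker.txt").write_text("demo")
    back_up(sources, root / "backup")
    copied = sorted(p.parent.name for p in (root / "backup").rglob("marker.txt"))
print(copied)
```

Stop the Endeca Server and Studio before copying, so that no files are in use during the back-up.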

If you customized or added other content to any of the components, please also back-up! I have a non-customized environment.

Download the software (Oracle Endeca Information Discovery Quick Start (2.4) for Microsoft Windows x64 (64-bit))  from edelivery.oracle.com and run the quickstart installer. This will then first do an uninstall of the previous version:


Part of the v2.3 directory structure will still be on the disk (like the indexes), but to be sure it is safer to back them up as described.

Then you are prompted (after 2 screens) to enter a location for the new installation:


I suggest using a different location than the previous one to avoid potential conflicts.

After a few minutes the installation is completed. The new software has been installed.

Now we have to bring in the old (back-up) content to the new environment.
Let's start with the indexes.

Copy the backed-up Endeca Server data directory to: <New v24 Oracle Endeca Dir>\Endeca\Server\7.4.0\endeca-server\

Then start the Endeca Server.

Now we need to attach the v2.3 indexes to the new Endeca Server. We will do this with the batch file endeca-cmd.bat which you will find in the directory: <New v24 Oracle Endeca Dir>\Endeca\Server\7.4.0\endeca-cmd\. Each index needs to be attached via the command: endeca-cmd attach-ds IndexName.

Secondly we copy the backed-up Endeca Studio data directory to: <New v24 Oracle Endeca Dir>\Endeca\Discovery\2.4.0\Studio.

Remove 2 directories from the just copied data directory (as stated in the documentation): ee and endeca-data-sources. Your data directory should look like this:

Also do/copy additional customizations which you also had on the v2.3 environment (I had none).

Start Studio Server and, the first time, wait until all deployments are finished before you open a page.

As you will see when you start Integrator, it points to the same Workspace as v2.3. So all your projects are immediately available. No additional copy needed here.

That is it.

Oh yeah, the CAS is still v3.0.2 and works without a change.

Now let's have a look how to import data from Oracle BI Server ... In the next post.

25 Oct 2012

How to Crawl Web Content with the CAS

Posted by Wim Villano

One of the unstructured sources for analyses with Oracle Endeca could be web content. Especially when doing some social media analyses for a customer the forum(s) definitely is/are one of the sources. This blog will show you how to crawl a forum with the CAS software.

When you install the CAS software (for installation instructions see the blog entry here), you also get a web crawler tool. Its configuration and usage are different from crawling documents as described in the just mentioned blog entry, so here I will give a short example of how to configure it. More options and information about the Web Crawler can be found in the Oracle documentation: here.

After the installation of the CAS software there are already sample scripts provided to configure a web content crawl. You will find such an example if you go to the directory: <CAS Install Dir>\3.0.2\sample\webcrawler-to-recordstore.

There you will find all necessary files to configure a web crawl. We take this as a starting point for our forum crawl. So make a copy of the directory "webcrawler-to-recordstore" (in the same directory: <CAS Install Dir>\3.0.2\sample) and give it a different name e.g.: "myfirstcrawl".

Go to the  directory "myfirstcrawl". We will make modifications to the following files (details below):
conf/endeca.lst
conf/crawl-urlfilter.txt
conf/site.xml
run-sample.bat

endeca.lst
In the file endeca.lst you can list all the URLs that you want to crawl. In my case I thought it would be nice to crawl the Dutch forum of the Liferay portal (because the number of posts can easily be handled :-) ):
http://www.liferay.com/community/forums/-/message_boards/category/6375573

But you can put there any (number of) URL(s).

crawl-urlfilter.txt
In this file you can define what to do with URLs found on the page: you can specify to follow a URL and crawl it or skip certain URLs and not follow them.

When you hover over hyperlinks on the Liferay forum page you can see the URLs. We do not want to follow all URLs. For example: we do not want to follow the URL which is under the Statistics option (www.liferay.com/community/forums/-/message_boards/statistics), but we do want to follow the category hyperlinks (e.g.: www.liferay.com/community/forums/-/message_boards/category/...).

We also want to follow the threads. So if you click on category "Algemeen" (which means in English: "General") you can see the threads. The threads have a different URL structure: www.liferay.com/community/forums/-/message_boards/message/...

So, to instruct the crawler to follow the following URLs:
www.liferay.com/community/forums/-/message_boards/category/
www.liferay.com/community/forums/-/message_boards/message/

We replace the following code in the file crawl-urlfilter.txt:

With:

The rest of the file we do not touch for now.
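
To make the intended rule concrete, here is the same decision expressed as a Python regular expression. This is not the content of crawl-urlfilter.txt (that file has its own syntax, and the original snippets are not reproduced here); it is only a sketch of which URLs the crawler should follow, written so it can be tried out quickly. The example message id 123 is made up.

```python
import re

# Follow only category and message (thread) pages of the Liferay forum.
FOLLOW = re.compile(
    r"^https?://www\.liferay\.com/community/forums/-/message_boards/"
    r"(category|message)/"
)


def should_follow(url):
    """Return True for URLs the crawl is meant to follow."""
    return FOLLOW.match(url) is not None


examples = [
    "http://www.liferay.com/community/forums/-/message_boards/category/6375573",
    "http://www.liferay.com/community/forums/-/message_boards/message/123",
    "http://www.liferay.com/community/forums/-/message_boards/statistics",
]
for url in examples:
    print(should_follow(url), url)
```

Testing a pattern like this against a handful of real links from the forum page is a cheap way to catch typos before starting a crawl that takes minutes.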

site.xml
Here we can specify the record store to write to (among other things). So at the tag <property> with the name tag output.recordStore.instanceName
we can specify the record store name. If not existing it will be created automatically when running the crawl. My record store will be called rs-myfirstrs:

 

run-sample.bat
The last configuration has to be done in the run-sample.bat. Here we have to point to the right record store (in my case rs-myfirstrs) and we can configure the depth of the crawl (how deep we want to follow the URLs, starting with level 0). The depth I want is 2: level 0 is the URL listed in endeca.lst, level 1 is the category level and level 2 is the thread content level:

 

These are the (minimum) configuration steps. We are now ready to run the web crawl. Execute the run-sample.bat. Be sure that the Endeca CAS Service is running. The crawl can take up to a few minutes. The result will be something like this:

 

To see the web content that has been crawled you can create a simple graph, as also explained in the CAS installation blog here.

Now it is up to us to create better information out of the crawled data with all functionality available in CloverETL like text tagging, regular expressions, etc.

One last remark:

If you need a proxy to get access to the internet (most companies have one), you have to configure the file conf/default.xml. Change the properties under <!-- Proxy properties -->.

11Sep/123

Clean Up an Endeca Data Store

Posted by Wim Villano

This article was written at the time of EID 2.3. Since then some changes have been made to EID.
Some additional remarks for Endeca v3.0:
- the endeca-cmd commands now have a suffix of -dd (data domain) instead of -ds (data store). So endeca-cmd list-ds is now in v3.0: endeca-cmd list-dd

 

With all my tests and tryouts I keep creating new data stores. Every now and then I have to look up again how to clean them up, so I am posting the commands here to be able to find them quickly.

All commands are run via the batch file endeca-cmd.bat, which you will find in the directory: <Install Dir>\Endeca\Server\2.3.0\endeca-cmd\

To get a list of the currently active data stores, type:
endeca-cmd list-ds

Endeca List Active Data Stores

To stop a Data Store, type:
endeca-cmd stop-ds FirstSteps

To detach a Data Store, type:
endeca-cmd detach-ds FirstSteps

And then you can safely delete the Index directory files. Default location of the index files: <Install Dir>\Endeca\Server\2.3.0\endeca-server\data
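If you clean up data stores often, the same sequence can be scripted. The Python sketch below only wraps the three EID 2.3 commands shown above; the helper names and the dry_run switch are hypothetical, and it assumes endeca-cmd.bat can be found on the PATH or is given with its full path. By default it only prints the commands.

```python
import subprocess


def cleanup_commands(data_store, endeca_cmd="endeca-cmd.bat"):
    """Build the list, stop and detach commands for one data store."""
    return [
        [endeca_cmd, "list-ds"],
        [endeca_cmd, "stop-ds", data_store],
        [endeca_cmd, "detach-ds", data_store],
    ]


def run_cleanup(data_store, endeca_cmd="endeca-cmd.bat", dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in cleanup_commands(data_store, endeca_cmd):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command


run_cleanup("FirstSteps")  # dry run: prints the three commands
```

Deleting the index files is deliberately left as a manual step, so nothing is removed by accident.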

9Sep/125

Moving OEID Environments (Project End Results)

Posted by Wim Villano

After finishing a demo project or POC we sometimes only want to move the end result to another machine. In that case we are not interested in the Integrator project and source files, but only in the resulting dashboards (and associated index) for the end user. That is actually quite easy to do.

There are some activities on the source machine and the target machine.

On the source machine there are 2 steps to do:
1 Make an export of the dashboard pages (.lar file)
2 Copy/save the Endeca Server Index files (the data).

On the target machine you have to do 5 steps:
3 Create a data store (with the same name as on the source machine)
4 Copy the source Index files to the new, just created, Endeca Server Index directory
5 Create the data source in Studio (with the same name as on the source machine)
6 Create a Studio Community
7 Import the dashboard pages (.lar file) into the community

Below is a brief explanation of the steps. Here I will move my 'FirstSteps' environment from one machine to another.

1 Create an export of the dashboard.

  • In Studio on the source machine go to the Control panel [A]
  • Select Communities under the heading Portal [B]
  • Select Manage Pages with the Actions button [C]
  • Go to the sub tab Export / Import and select Export (default active) [D]
  • There you can give the .lar a different name or go with the default (here: FirstSteps201209090821.lar).
  • Then hit the Export button. Now you can save it in a directory of your choice (e.g. an OS folder "Transfer").

Export OEID community pages

 

2 Copy/Save the Endeca Server Index files from the source machine

The default location of the index is: <Install Directory>\Endeca\Server\2.3.0\endeca-server\data. From there, copy the directory of the data store you want, together with the associated .worddat file, e.g. to the same folder as the .lar file ("Transfer").

OEID Index files and directories

Now we copy or move the .lar and index files (in this example, the whole "Transfer" folder) to the target machine and do steps 3 to 7.
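Step 2 can also be scripted. The Python sketch below copies everything in the data directory whose name starts with the data store name into a transfer folder. That naming pattern is an assumption (check what your data directory actually contains), and the function name is hypothetical. Note that a plain prefix match would also pick up a data store called, say, FirstSteps2.

```python
import shutil
from pathlib import Path


def collect_index_files(data_dir, data_store, transfer_dir):
    """Copy every entry in data_dir whose name starts with data_store."""
    data_dir, transfer_dir = Path(data_dir), Path(transfer_dir)
    transfer_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in data_dir.iterdir():
        if not entry.name.startswith(data_store):
            continue  # belongs to another data store
        target = transfer_dir / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target)  # the index directory
        else:
            shutil.copy2(entry, target)  # e.g. the .worddat file
        copied.append(entry.name)
    return sorted(copied)
```

A call could look like collect_index_files(r"<Install Directory>\Endeca\Server\2.3.0\endeca-server\data", "FirstSteps", r"C:\Transfer"), with the paths adjusted to your own installation.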

3 Create a data store on the target machine

The easiest way to create a new data store is to go to the Integrator and re-use the graph "InitDataStore.grf" of the Quickstart project. The safest way (without accidentally breaking the QuickStart) is to create a new project and copy the following components:
graph: "InitDataStore.grf"
meta: "AttachAndCreateDataStore.fmt"
meta: "StarDataStore.fmt"
Then add the following snippet to the workspace.prm of the new project (with your data store name instead of "FirstSteps", as named on the source machine):

# Configuration parameters for running Endeca Data Store
ENDECA_SERVER_HOST=localhost
ENDECA_SERVER_PORT=7770
DATA_STORE_NAME=FirstSteps
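Because the data store name must match the one on the source machine, it can be worth checking the parameters before running the graph. The following Python sketch parses the key=value lines shown above; the read_prm helper is hypothetical and ignores anything beyond simple key=value pairs and # comments.

```python
SNIPPET = """# Configuration parameters for running Endeca Data Store
ENDECA_SERVER_HOST=localhost
ENDECA_SERVER_PORT=7770
DATA_STORE_NAME=FirstSteps
"""


def read_prm(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


params = read_prm(SNIPPET)
print(params["DATA_STORE_NAME"])  # FirstSteps
```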

The project will look something like this:
OEID Integrator Simple Project to create Data Store

Now run the graph and a new data store will be created.

4 Copy the files from step 2 to the target machine location

Stop the Endeca Server on the target machine if running.

Stop and Start Endeca Server

Delete the just created data store index and .worddat file (default location: <Install Directory>\Endeca\Server\2.3.0\endeca-server\data).
Copy the files from step 2 to the (target) Endeca data store location.
Start Endeca Server
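The delete-and-copy part of this step can be sketched in Python as well. As before, it assumes the index directory and the .worddat file start with the data store name, and the function name is hypothetical. Since the exported .lar file may share that prefix (as FirstSteps201209090821.lar does), it is skipped explicitly. Run it only while the Endeca Server is stopped, because it deletes files.

```python
import shutil
from pathlib import Path


def replace_index_files(transfer_dir, data_dir, data_store):
    """Swap the target's freshly created data store files for the saved ones."""
    transfer_dir, data_dir = Path(transfer_dir), Path(data_dir)

    def matches(entry):
        # Assumption: index entries start with the data store name.
        return entry.name.startswith(data_store) and entry.suffix != ".lar"

    # Delete the just-created index directory and .worddat file.
    for entry in list(data_dir.iterdir()):
        if matches(entry):
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy the files saved from the source machine into place.
    restored = []
    for entry in list(transfer_dir.iterdir()):
        if matches(entry):
            target = data_dir / entry.name
            if entry.is_dir():
                shutil.copytree(entry, target)
            else:
                shutil.copy2(entry, target)
            restored.append(entry.name)
    return sorted(restored)
```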

5 Create a data source in Studio

The dashboard components are linked to the data store via a data source. So before we can use the dashboards, we create the data source with the same name as on the source machine. You probably know how to create one, so here are just the screenshots:

OEID Steps to create a Data Source

6 Create a new community in Studio

To 'host' the pages/dashboards, we create a new community on the target machine. You probably already know how to do that, so briefly:

OEID Steps Create Community

7 Import the dashboard pages (.lar file)

Click on the Actions button of the freshly created community, select Manage Pages [A], select the sub tab Export / Import [B], click on Import [C], browse for the .lar file copied from the source machine (created in step 1) [D], and click on Import [E]:

OEID Steps Import Dashboard Pages

 

The transfer is complete.

If you now go back to the Studio start page and go to My Places, you will see the new community with the pages of the source machine.