Importing a MySQL database into Apache Solr Part I

This is a tutorial for getting started with Apache Solr enterprise search with PHP and MySQL on Unix/Linux or Mac OS X.  Most of the information I have found is rather incomplete or full of erros so I wanted to share how I have successfully used Solr in a number of different applications.  I’m going to break this down into three separate posts/sections.

  • Importing a MySQL database into Apache Solr
  • Using PHP with Apache Solr
  • Using the Suggester feature in the new Apache Solr 3.1

Apaches Solr is an enterprise search application that was developed at CNET and is now part of the Apache Lucene project.  It runs in a standalone java servlet contain like Jetty or Tomcat.  Since it runs as a webservice, you access it from any application like PHP or in Rails.  Results can be returned in XML, JSON, or now CSV format.

There are several advantages to using Solr for your search function in you application.  The biggest is speed.  In a ~1 gig table Solr will return a result in under 0.2 seconds versus MySql fulltext search which can take up to 10 seconds (or even longer) depending on the length and complexity of the query.  End users expect fast search results and Solr can deliver that, while allowing a level of control over search that a person would not be able to achieve using Google, i.e. search for posts by user name or product by manufacturer.

Another big advantage with Solr is that it recognizes plurals and slight misspellings, for instance a query with “Arc’teryx” will return results with both “Acrteryx” and “Arc’teryx”.  A description of using other search features like facets can be found at the Solr Homepage.

Where would you use Solr?  Any site that has a lot of content will benefit from using enterprise search; large popular blogs, e-commerce sites with a large product database, and forums/message boards.

Server requirements depend on the size of database but are fairly modest while running queries.  I have been able to run Solr on a VPS with 300 megs or RAM with laser quick results.  However, the database import requires much more RAM, for a 1.5 gig DB it requires at least 2 gigs and preferably 4gig.  I don’t believe it’s necessary to run Solr on a separate server or VPS, just making sure that you setup your cron jobs for import to run in the evening when traffic is low.

Let’s get started!

First, all the examples in this tutorial will be run on locally on OS X under the domain solr.local.  If you don’t already develop on your machine locally I encourage you to do so.  On Mac OS X both MAMP and VirtualHostX are great tools for using a local dev environment.

  1. Start by downloading Solr.  The current version is 3.1.  Place the directory in an appropriate location, ie User/Sites/Solr31.
  2. In the termimal change into the example directory.
  3. Start the Jetty server up by executing in the terminal “java -jar start.jar.
  4. Get familiar with the terminal output for Jetty as this will help you track down errors later in your configuration.
  5. In your browser go to http://solr.local:8983/solr/admin
  6. Go through the Solr tutorial to get yourself familiar with the feature set.
  7. Stop Jetty with Control-C.

Great.  Now we are ready to start getting Solr configured to import your MySQL database.  In this example I will be configuring Solr to import a generic message board table.

  1. Download the MySQL JDBC driver from
  2. Copy the downloaded directory into example/lib
  3. In example/solr/conf create a file called data-config.xml.  Change the sql queries to reflect the database you are indexing.
    1. <dataConfig>
    2. <dataSource type="JdbcDataSource"
    3. driver="com.mysql.jdbc.Driver"
    4. url="jdbc:mysql://localhost/ccubbupgrade"
    5. user="root"
    6. password=""/>
    8. <document name="content">
    9. <entity name="id"
    10. query="SELECT
    11. p.post_id,
    12. p.post_posted_time,
    13. FROM_UNIXTIME(p.post_posted_time) AS post_posted_datetime,
    14. p.user_id,
    15. t.forum_id,
    16. p.post_subject,
    17. p.post_default_body
    18. FROM
    19. POSTS AS p
    20. JOIN
    21. TOPICS AS t
    22. ON p.topic_id = t.topic_id
    23. WHERE
    24. p.POST_IS_APPROVED = '1'
    25. AND t.TOPIC_STATUS != 'M'"
    26. deltaImportQuery="SELECT
    27. p.post_id,
    28. p.post_posted_time,
    29. FROM_UNIXTIME(p.post_posted_time) AS post_posted_datetime,
    30. p.user_id,
    31. t.forum_id,
    32. p.post_subject,
    33. p.post_default_body
    34. FROM
    35. POSTS AS p
    36. JOIN
    37. TOPICS AS t
    38. ON p.topic_id = t.topic_id
    39. WHERE
    40. p.post_id = ${}"
    41. deltaQuery="SELECT
    42. post_id
    43. FROM
    44. POSTS
    45. WHERE
    46. post_posted_time > UNIX_TIMESTAMP('${dataimporter.last_index_time}')">
    47. <field column="POST_ID" name="post_id" />
    48. <field column="POST_POSTED_TIME" name="post_posted_time" />
    49. <field column="USER_ID" name="user_id" />
    50. <field column="FORUM_ID" name="forum_id" />
    51. <field column="POST_SUBJECT" name="post_subject" />
    52. <field column="POST_DEFAULT_BODY" name="post_default_body" />
    53. </entity>
    54. </document>
    55. </dataConfig>
  4. We need to tell Solr that we will be using the JDBC driver.  Open the solrconfig.xml file in the same directory and insert the following code.
    1. <requestHandler name="/dataimport"
    2. class="org.apache.solr.handler.dataimport.DataImportHandler">
    3. <lst name="defaults">
    4. <str name="config">data-config.xml</str>
    5. </lst>
    6. </requestHandler>
  5. Now we need to add some info into the schema.xml file, located in the same directory, to tell Solr more about our table columns and how to conduct a search.
    1. <!-- Field to use to determine and enforce document uniqueness.
    2. Unless this field is marked with required="false", it will be a required field
    3. -->
    4. <uniqueKey>post_id</uniqueKey>
    6. <field name="post_id" type="string" indexed="true" stored="true" required="true"/>
    7. <field name="post_posted_time" type="tint" indexed="true" stored="true"/>
    8. <field name="user_id" type="int" indexed="true" stored="true" />
    9. <field name="forum_id" type="int" indexed="true" stored="true"/>
    10. <field name="post_subject" type="text" indexed="true" stored="true"/>
    11. <field name="post_default_body" type="text" indexed="true" stored="true"/>
    13. <!-- field for the QueryParser to use when an explicit fieldname is absent -->
    14. <defaultSearchField>fulltext</defaultSearchField>
  6. Go ahead and start Jetty up again with “java -jar start.jar” and check for any errors in the terminal.
  7. Now it’s time to do a full index.  In your browser go to http://solr.local:8983/solr/dataimport?command=full-import .  This process will take a while depending on the size of your database.  Check the status of the import with http://solr.local:8983/solr/dataimport .  The response format will look like this for the first little bit “00data-config.xmlidleThis response format is experimental. It is likely to change in the future.”  Eventually it will show you the index progress after it has loaded the database.
  8. After the index process is done you should be able to search your data at http://solr.local:8983/solr/admin .  If not, look at the errors popping up in the terminal.  Results will be in XML format.
  9. To setup delta indexing (ie changes made every hour or interval at you choosing) setup a cronjob to the following url http://solr.local:8983/solr/dataimport?command=delta-import
  10. To run the Solr service in the background or in the background on your server make sure you start it like this “java -jar start.jar &”

In another post I will cover locking down Solr so that it is only accessible by IP address or username/password combo.

This entry was posted in Featured, MySQL, PHP, Solr and tagged , , , . Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.


  1. milestones
    Posted June 6, 2011 at 6:29 am | Permalink

    Awesome! I haven’t tried this tutorial yet but, it seems to be a great resource.

  2. Andy
    Posted June 29, 2011 at 12:26 pm | Permalink

    Great tutorial. Exactly what I was looking for! Looking forward to the next part. If you could give some examples of how to access solr from php, that would be sweet.

    • Jon
      Posted July 7, 2011 at 11:16 am | Permalink

      Thanks. I’ll have it up in a few days.

    • badllama77
      Posted August 12, 2011 at 1:50 pm | Permalink

      There is a set of classes in PHP to handle Solr.

      • Jon
        Posted August 18, 2011 at 11:18 am | Permalink

        The problem with this is there are instances where it might be difficult installing the PECL extension to to run this. In those instances it’s much easier to use solr php client library or call the API directly.

  3. xyz
    Posted July 6, 2011 at 6:53 am | Permalink
    • admin
      Posted July 7, 2011 at 11:15 am | Permalink

      I know maybe I made this a little confusing, but I’m using VirtualHostX and setup a domain for this, so I’m not using localhost.

  4. Posted August 1, 2011 at 5:39 am | Permalink

    @Andy: Just call the API from Solr (HTTP Get)

  5. pankaj
    Posted August 18, 2011 at 7:01 am | Permalink

    while hitting :-http://localhost:8983/solr/dataimport
    i am getting 404 error
    while http://localhost:8983/solr/admin works file..

    • Jon
      Posted August 18, 2011 at 11:23 am | Permalink

      @pankaj which step are you in the installation process? Have you done an import yet? Can you run queries?

      • pankaj
        Posted August 18, 2011 at 10:15 pm | Permalink

        yes i have install apache-solr, after that i’ll follow the steps of above and make change in files like create queries in db-data-config.xml & map table in schema.xml file. then i’ll rin java -jar start.jar. it will start without any error..
        i am getting 404 error
        while http://localhost:8983/solr/admin works file..

        i’ll check in start.jar file there is admin folder, that is why (http://localhost:8983/solr/admin works file..)
        but there is no dataimport folder so how could it run

  6. pankaj
    Posted August 18, 2011 at 11:55 pm | Permalink

    also i m using apache-solr 3.3.0

  7. pankaj
    Posted August 19, 2011 at 1:05 am | Permalink
    • Jon
      Posted August 19, 2011 at 12:41 pm | Permalink

      @pankaj I’m glad you figured it out. It’s possible they have made a change since I made this tutorial on 3.1, or maybe you inadvertently changed something in a configuration file?

  8. Russell
    Posted September 5, 2011 at 8:03 am | Permalink


    This looks brilliant and exactly what I am looking for. I am thinking of moving away from Sphinx for PHP and using Solr so this is very interesting.

    Do you know when you will be posting the next part(s) of your series? I am interested in the PHP integration side of things.

    Thanks very much, Russell

    • Jon
      Posted September 7, 2011 at 1:20 pm | Permalink

      @Russell, thanks for the compliments. I’m working on the follow up article at the moment. Should be up in a couple of days. I apologize on the delay to everyone :)

      • gino
        Posted March 21, 2012 at 7:19 am | Permalink

        I’m not that familiar with java and pretty new to using solr,
        is the .jar in the examples dir the one i should be using on my production server or should i install it somehow?
        I just want solr to run on startup!
        and last is there a follow-up for php already?

        tnx for the tutorial, very usefull and clear.

  9. Cynthia
    Posted September 29, 2011 at 8:15 am | Permalink

    What happens if i have a data model with more that one table? For example if i need to do search by customers and by products? Do i need to use multicore for that? One table per Solr instance? How is the performance?
    Thanks a lot in advance,

  10. Anand
    Posted September 30, 2011 at 3:49 pm | Permalink

    Hello Jon,

    Nice work I appreciate this one it very easy to follow these steps. I am very new to Solr. I have installed Apache Solr-3.4.0 on the linux server and works perfect. but How can I import my MySQL data and how can I do indexing on those tables. can you please describe that in details. Also I am not able to figure it out that where can I find the log file for this Solr.

  11. pravin
    Posted February 7, 2012 at 3:32 am | Permalink

    plz give me video of this

  12. pravin
    Posted February 20, 2012 at 1:41 am | Permalink

    I have successfully imported data to solr.
    when i say full import it shows appropriate no of rows fetched.
    but when i search any thing related to data from fetched rows in solr admin it does not finds any result.
    can any one help me out i want to search the data from database using solr admin

  13. Keith
    Posted May 11, 2012 at 4:37 pm | Permalink

    Did I miss Parts II & III? I can’t seem to find them. Thanks so much!

  14. Posted September 3, 2012 at 11:27 pm | Permalink

    Please give database SQL file.. containing create and insert statements…
    Thanks a lot

  15. Sumesh SG
    Posted September 5, 2012 at 3:42 am | Permalink


  16. Nitin Khilwani
    Posted September 17, 2012 at 12:06 am | Permalink

    Good Post exactly what i need i get it . u guy’s are Awesome always helping to other’s
    I also waiting for u r next post.

  17. Jameel
    Posted October 8, 2012 at 9:49 am | Permalink

    This is really helpful…. I was not able to find this much clear and detailed article on this. could u please upload the other parts….? Thanks

  18. me
    Posted October 16, 2012 at 4:02 am | Permalink

    doenst work for me :(


    i just get HTTP ERROR 404

    some ideas?

  19. neha
    Posted November 27, 2012 at 4:28 am | Permalink

    I am successfully imported data to solr.
    but when i search any thing related to data from fetched rows in solr admin it does not finds any result.
    can any one help me out i want to search the data from database using solr admin.

One Trackback

  1. [...] we can write an sql query to populate that and index it directly. Ref. for steps mentioned below: 1. Solr runs on java so we will use a mysql-java-connector from [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>