VIDEO: Kibana 3 Dashboard – 3 Use Cases Demonstrated

Kibana 3 Dashboard: All Events

Kibana dashboards, from the Elasticsearch project, can help you visualise activity and incidents in log files.  Here I show three different use cases for dashboards and how each can answer different questions depending on who is asking.  Video and details follow.

Text Search Dashboard

The first example is the simplest UI I could imagine: a query/search box, a histogram, and a table.  In this instance any user, at any level of curiosity, can find textual data in the logs using a keyword match.

Kibana 3 Dashboard: Text Search

They can also see the relative number of records occurring at given times within the time window of the available data.  These are aggregate counts of all records that match the query keyword or expression.

Likewise, the table shows the subset of records returned by the query, with the ability to display only the fields of interest.

Process Details

A slightly more advanced use is to focus on a particular process (i.e. application) running on a machine that’s being logged.  Here we can take a particular metric, e.g. CPU usage, and graph it instead of just a simple histogram.

Kibana 3 Dashboard: Process Details

A typical user may be in charge of a particular set of services in a system.  Here they can see how those services perform and still dig into the details as desired.

I also add some “markers” to subtly show when events coincide with other process metrics.

All Events

The data example shown here has process, performance and event logging information.  I combine multiple queries and have them drive different parts of the dashboard – a pie chart, summary table, histogram, sparkline and other charts based on numeric data.

Kibana 3 Dashboard: All Events

These can then all be filtered based on time windows that are interactively selected.  This is really the typical picture of a dashboard – giving more densely packed information about a variety of metrics, ideal for system managers to get a handle on things.

Streaming Pipeline

The data are generated by Windows servers using a custom C# application that pushes data into a Kafka topic on a Hadoop cluster running in EC2.  The data stream is then read from the topic using the Actian DataFlow platform and pushed into Elasticsearch for Kibana to use at the end of the pipeline.  There are other reasons I have this kind of pipeline – namely that DataFlow can simultaneously feed other outgoing parts of the pipeline – RDBMS, graph, etc.  More on that in a future video/post.

Next Steps

  • My next plan is to show Kibana version 4 in action, replicating some of what I’ve shown here.
  • If you haven’t seen it already, see this link and my video with some tips and tricks for using Kibana 3.
  • Tell me more about your interests in dashboards and I’ll consider focusing on them too.

Google wants “mobile-friendly” – fix your WordPress site

Google mobile check fixed success

TheNextWeb reports: “Google will begin ranking mobile-friendly sites higher starting April 21“.  It’s always nice having advance warning, so use it wisely – here’s how to tweak WordPress to increase your mobile-friendliness.

Google Mobile-Friendly Check

I use a self-hosted WordPress site and wanted to make sure it was ready for action.  I already thought it was, because I’ve accessed it on a mobile device very often and it worked okay.

I even went onto the Google Web Admin tools and the mobile usability check said things were fine.

Google admin tool says mobile check is okay

However, all was not golden when I ran the Google mobile-friendly checker.  (Obviously two different apps here, hopefully those will merge.)

Google mobile check failure

Try it here, now!  The complaints were that some content was wider than the screen and that links were too close together.  Fair enough.

WordPress Mobile-Friendly Activation

If you’re not already using WordPress’s Jetpack features, you’re really missing out.  I use it mostly for monitoring stats but there are several other features that make it very useful, including one called Mobile Theme.

  1. From the admin sidebar select Jetpack (install it first if not already enabled).  It will show you some suggested plugins to enable, plus a search bar to find others.
  2. Enter “mobile” and click on the Mobile Theme item.
  3. Activate it (lower-right corner button).
  4. And you’re done!

Going back to Google’s checker, it now shows a different preview and says things are fine.  Looking at the site after making these changes, it’s obviously better.

Google mobile check fixed success

However, I still have one plugin (Crayon markup) that helps display code samples and seems to force some posts to be wider than the screen.  I assume the plugin creators will fix that up, but it’s not too bad at this point.  Unless Google complains, it doesn’t matter anyway!

iPhone cable – loose connection?

iPhone cable – mysterious loose connection bothering you?  Before buying a new gold plated cord or adapter, clean out the port with a toothpick.  You will be amazed!

Kafka Consumer – Simple Python Script and Tips

Screenshot from Hortonworks site describing how Kafka works

When you’re pushing data into a Kafka topic, it’s always helpful to monitor the traffic using a simple Kafka consumer script.  Here’s a simple script I’ve been using that subscribes to a given topic and outputs the results.  It depends on the kafka-python module and takes a single argument for the topic name.  Modify the script to point to the right server IP.
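Here’s a minimal sketch of that script – treat it as a reconstruction rather than the exact original: the broker address and consumer group name are placeholders, and it uses the older kafka-python SimpleConsumer API that was current at the time.

    #!/usr/bin/env python
    # Minimal Kafka consumer sketch using the older kafka-python SimpleConsumer API.
    # The broker address and group name are placeholders - point them at your cluster.
    import sys
    from kafka import KafkaClient, SimpleConsumer

    topic = sys.argv[1]                        # topic name is the only argument
    client = KafkaClient("10.0.0.1:9092")      # placeholder broker IP:port
    consumer = SimpleConsumer(client, "test-group", topic,
                              max_buffer_size=0)  # 0 (or None on some versions) = no fetch buffer cap
    consumer.seek(0, 2)                        # optional: jump to the latest offset (see below)

    for msg in consumer:
        print(msg)                             # each item carries the offset and raw message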

Max Buffer Size

There are two lines I wanted to focus on in particular.  The first is the “max_buffer_size” setting:
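From the sketch above, it’s the keyword argument on the consumer constructor (whether 0 or None means “unlimited”, and the default value, can vary between kafka-python versions):

    consumer = SimpleConsumer(client, "test-group", topic,
                              max_buffer_size=0)  # remove the cap on the fetch buffer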

When subscribing to a topic with a large backlog of messages that have not been received before, the consumer/client can max out and fail.  Setting an infinite buffer size (zero) allows it to take everything that is available.

If you kill and restart the script it will continue where it last left off, at the last offset that was received.  This is pretty cool but in some environments it has some trouble, so I changed the default by adding another line.

Offset Out of Range Error

As I regularly kill the servers running Kafka and the producers feeding it (yes, just for fun), things sometimes go a bit crazy.  I’m not entirely sure why, but I got the error:

To fix it I added the “seek” setting:
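It’s a method call on the consumer rather than a constructor argument – seek(offset, whence), where whence=0 means the start of the log and whence=2 means the end:

    consumer.seek(0, 2)  # start reading from the most recent offset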

If you set it to (0,0) it will restart scanning from the first message.  Setting it to (0,2) allows it to start from the most recent offset – so letting you tap back into the stream at the latest moment.

Removing this line puts it back into the mode mentioned earlier, where it will pick up from the last message it previously received.  But if/when that gets broken, you’ll want a line like this to save the day.


For more about Kafka on Hadoop, see Hortonworks’ excellent overview page, from which the screenshot above is taken.

Web Mapping Illustrated – 10 year celebration giveaway [ENDED!]

My O’Reilly, 2005 book

Update: All copies are gone!  If you want Geospatial Desktop or Geospatial Power Tools – go to LocatePress.com – quantity discounts available.  For Web Mapping Illustrated go to Amazon.


 

I’m giving away a couple copies of my circa 2005 classic book.  Details below…  When O’Reilly published Web Mapping Illustrated – Using Open Source GIS Toolkits – nothing like it existed on the market.  It was a gamble but worked out well in the end.

Primarily focused on MapServer, GDAL/OGR and PostGIS, it is a how-to guide for those building web apps that include maps.  That’s right, you couldn’t just use somebody else’s maps all the time – us geographers needed jobs, after all.

For context on the times: a couple of months before the final print date, Google Maps was released.  I blithely added a reference to their site just in case it became popular.

The book is still selling today and though I haven’t reviewed it in a while, I do believe many of the concepts are still as valid as when it was written.  In fact, it’s even easier to install and configure the apps now due to packaging and distribution options that didn’t exist back then.  Note this was also a year before OSGeo.org’s collaborative efforts started to help popularise the tools further.

In celebration of 10 years of sales I have a couple autographed copies as giveaways to the first two people who don’t mind paying only for the shipping (about USD$8) and who drop me a note expressing their interest.

Additionally, I have some of Gary Sherman’s excellent Geospatial Desktop books as giveaways as well.  Same deal, pay actual shipping cost only from my remote hut in northern Canada.  Just let me know you’d like one of them and I’ll email you the PayPal details.  Sorry, not autographed by Gary, though I was editor and publisher, so could scribble on it for you if desired.

Neo4j Cypher Query for Graph Density Analysis

Graph density calculation with Neo4j

Graph analysis is all about finding relationships. In this post I show how to compute graph density (a ratio of how well connected relationships in a graph are) using a Cypher query with Neo4j. This is a follow up to the earlier post: SPARQL Query for Graph Density Analysis.

Installing Neo4j Graph Database

In this example we launch Neo4j and enter Cypher commands into the web console.

  1. Neo4j is a Java application and requires Oracle JDK 7 or OpenJDK 7 on your system.
  2. Download the Community Edition of Neo4j.  In my case I grabbed the Unix .tar.gz file and unpacked it.  Installation on Windows may vary.
  3. From the command line, and within the newly created neo4j-community folder, start the database (the command is shown just after this list).
  4. Use web console at: http://localhost:7474 – the top line of the page is a window for entering Cypher commands.
  5. Load sample CSV data using a Cypher command – I cover this in a separate post here.  Be sure the path to the file is the same on your system, matching where you saved the CSV files to.
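For step 3, starting the database is a one-liner from inside the unpacked folder (use the console variant instead if you want it to stay in the foreground and log to the terminal):

    bin/neo4j start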

Quick Visualization

Using the built-in Neo4j graph browser, you can easily see all your relationships as nodes and edges.  For a query, return results that include all the objects:
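For example, a query along these lines (assuming the Person label and HAS_FRIEND relationship type used later in this post):

    MATCH (p:Person)-[r:HAS_FRIEND]->(f:Person)
    RETURN p, r, f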

Friends graph sample viewed in Neo4j

Compute Graph Density

Graph density requires the total number of nodes and the total number of relationships/edges.  We compute each separately, then pull them together at the end.

Compute the number of unique nodes in the graph
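A count over the Person nodes does it – something like:

    MATCH (p:Person)
    RETURN COUNT(DISTINCT p) AS nodes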

This tells us that there are 21 people as Subjects in the graph.  (I’m not sure how this differed from the 20 I had in my other post – perhaps part of the header from the CSV came in?)

Therefore, the maximum number of edges between all people would be 21² (we’ll use 21×20 instead, since a person doesn’t link to themselves in this example).

Compute the number of edges in the graph
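The shape of the query is a MATCH on the relationship plus a DISTINCT count – roughly:

    MATCH (p:Person)-[r:HAS_FRIEND]->(f)
    RETURN COUNT(DISTINCT r) AS edges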

Here we only select subjects that are “Person” and only where they have a relationship called “HAS_FRIEND”.

(The “p:”, “r:” and “f:” prefixes act like the “?…” variable references in SPARQL – you set them to whatever you want as a pointer to the data returned from the MATCH statement.)

When we defined the data from the CSV, we set up a relationship that I thought would be one-way only.  But you can see that if you don’t provide the DISTINCT keyword you’ll get double the record counts, so I’m assuming it’s treating the relationships as bi-directional.

Total edges is 57.  Do the quick math and see that the ratio is then:
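    57 / (21 × 20) = 57 / 420 ≈ 0.136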

Rolling it all together

We can do all this in a single query and get some nice readable results, even though the query looks a bit long.  Note that I snuck in an extra “-1” to account for that stray record mentioned earlier.
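Here’s a sketch of what that combined query can look like – the exact arithmetic, including where the -1 lands, is my assumption rather than the original query:

    MATCH (p:Person)
    WITH COUNT(DISTINCT p) - 1 AS nodes
    MATCH (:Person)-[r:HAS_FRIEND]->()
    WITH nodes, COUNT(DISTINCT r) AS edges
    RETURN nodes, edges, toFloat(edges) / (nodes * (nodes - 1)) AS density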

Challenge: Why did I get different results than in the SPARQL query example?

In the earlier post I had only 20 nodes, but in this one got 21.  Can you explain why?

Future Post

What’s your favourite graph analytic?  Let me know and I’ll try it out in a future post.

One of the things I have planned is to do some further comparisons between graph analytics in SPARQLverse and other products like Neo4j, but with large amounts of data instead of these tiny examples.  What different results would you expect?

Code snippet: SPARQL Query Graph Density

I’m testing out sharing SPARQL code snippets using GitHub’s Gist feature.  I’ll be adding more as I work through more graph-specific examples using SPARQLverse, but here is my first one:
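Something along these lines – a sketch in which the prefix and predicate are placeholders, and the density arithmetic mirrors the node/edge counting from the companion Neo4j post:

    PREFIX ex: <http://example.org/>

    SELECT (COUNT(DISTINCT ?person) AS ?nodes)
           (COUNT(?friend) AS ?edges)
           (COUNT(?friend) / (COUNT(DISTINCT ?person) * (COUNT(DISTINCT ?person) - 1.0)) AS ?density)
    WHERE {
      ?person ex:hasFriend ?friend .
    }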

Ideally we’d have a common landing place for building up a library of these kinds of examples.

Graph relations in Neo4j – simple load example

Graph of first load test of Neo4j
Basic load example of a handful of relationships.

In preparation for a post about doing graph analytics in Neo4j (paralleling SPARQLverse from this earlier post), I had to learn to load text/CSV data into Neo.  This post just shows the steps I took to load nodes and then establish edges/relationships in the database.

My head hurt trying to find a simple example of loading the data I had used in my earlier example, mostly because I was new to the Cypher language.  I was also getting hung up on previewing the data in the Neo4j visualiser: all my nodes showed only ID numbers, which had me convinced it wasn’t loading my name properties, when it was really just a visualisation setting (more on that another time).  Anyway, enough distractions…

Graph Data File – Simple Graph Relations Example

I took my earlier sample data and dumbed it down to fit the normal paradigm of Neo4j – separate nodes and edges load files.  I appreciated working with triples before as I didn’t have to pre-load all the nodes first, but that’s also a story for another day.

First, the nodes file looked like the following.  Note that I thought I had to add the ID, though I didn’t end up using it after all:
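It was something along these lines – the names here are just placeholders to show the shape (a header row, an id column and a name column):

    id,name
    1,Harry
    2,Sally
    3,Bob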

The second file was simply a list of “source” and “target” names – the graph relations – where the first person had the second person for a friend.  (We handle them as unidirectional in this example.)
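Again with placeholder names, the shape was a header row followed by one source,target pair per line:

    source,target
    Harry,Sally
    Sally,Bob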

Graph of first load test of Neo4j
Basic load example of a handful of relationships.

Loading CSV Relationships into Neo4j

To get the data into Neo4j I had to run two commands.  But first I ran a sort of “delete all”, as I was doing lots of testing:
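Something like this works (newer Neo4j releases can use MATCH (n) DETACH DELETE n instead):

    MATCH (n)
    OPTIONAL MATCH (n)-[r]-()
    DELETE n, r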

Then load all the nodes, assigning each one to a Person entity and grabbing only the name property from the CSV:
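Roughly as follows – the file: URL needs to point at wherever you saved the CSV, and the exact URL format varies a little between Neo4j versions:

    LOAD CSV WITH HEADERS FROM "file:/path/to/nodes.csv" AS row
    CREATE (p:Person {name: row.name})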

And finally, load the edges/relationships to map persons to persons via a HAS_FRIEND relationship:
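And something like this for the relationships, matching the HAS_FRIEND name used in the graph density post:

    LOAD CSV WITH HEADERS FROM "file:/path/to/edges.csv" AS row
    MATCH (a:Person {name: row.source})
    MATCH (b:Person {name: row.target})
    CREATE (a)-[:HAS_FRIEND]->(b)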

The resulting load will say something like:

Created 57 relationships, returned 0 rows in 46 ms

More on exploring and analysing this in a future post.  Tweet it or comment if you are interested in more along this line.  Thanks for reading!

Geospatial Power Tools Reviews [Book]

Thinking of buying my latest book?  We’ve finally got a few reviews on Amazon that might help you decide.  See my other post for more about the book.  Buy the PDF on Locate Press.com.

Reader Reviews

Geospatial Power Tools book cover
From Amazon.com

5.0 out of 5 stars This book makes a great reference manual for using GDAL/OGR suite of command line …,
January 24, 2015 By Leo Hsu
“The GDAL Toolkit is chuckful of ETL commandline tools for working with 100s of spatial (and not so spatial data sources). Sadly the GDAL website only provides the basic API command switches with very few examples to get a user going with. I was really excited when this book was announced and purchased as soon as it came out. This book makes a great reference manual for using GDAL/OGR suite of command line utilities.
Several chapters are devoted to each commandline tool, explaining what its for, the switches it has, and several examples of how to use each one. You’ll learn how to work with both vector/(basic data no vector) data sources and how to convert from one vector format to another. You’ll also learn how to work with raster data and how to transform from one raster data source to another as well as various operations you can perform on these.”

 

“This is a great book for any GIS pro to get their hands dirty and learn how to use spatial tools to do some amazing things. The benefit of using the tools covered in this book (GDAL/OGR) is that you could learn to integrate these tools into other workflows, such as downloading the latest raster data for your organization on weekly basis and running the necessary tools against it.
The first part of the book is filled with some great examples to get you familiar with using these tools and the rest of the book covers the documentation of these tools in detail. This should be on every GIS pros bookshelf.”

 

Amazon UK:

5.0 out of 5 stars Absolute Must Have Reference.
14 Feb. 2015 By SamFranklin
“I have been a GDAL/OGR user for about 3 years and have always struggled to find readable, entry-level GDAL documentation that provides simple examples which are well explained. This I feel this is a barrier to greater adoption of GDAL/OGR.
This book 100% fills this gap and is therefore required reading for GDAL novices and veterans alike. The book is split between “Common Tasks & Workflows” and then a comprehensive guide to all utilities. I read it cover to cover and picked a number of ideas and solutions to problems with existing workflows involving GDAL, for that alone, it’s easily worth buying. The first chapter provides links to sample data which is used throughout the book. This allows the reader to replicate every example which is great for learning trickier and lesser known utilities.Cannot recommend highly enough. Great job.”
Thanks for reading, reviewing and for the encouraging comments guys.

Follow me on Twitter and follow Locate Press too for latest sales and deals!  Want to write a book too?  Contact me to discuss.

 

Graph analytics – the new super power

Graph analytics – is it just hype or is it technology that has come of age?  Mike Hoskins, CTO of Actian, sums it up well in this article from InfoWorld:
Mike Hoskins writes about graph analytics and how it is a game changer for finding unknown unknowns
“One area where graph analytics particularly earns its stripes is in data discovery. While most of the discussion around big data has centered on how to answer a particular question or achieve a specific outcome, graph analytics enables us, in many cases, to discover the “unknown unknowns” — to see patterns in the data when we don’t know the right question to ask in the first place.”

Read Mike’s full article at:
InfoWorld

In the remainder of this post I outline a few more of my thoughts on this topic and give you pointers to some more resources to help you understand what to do next.

Finding the unknown unknowns?

You are already familiar with the idea of writing SQL queries to extract data from a database.  Perhaps more recently you’ve even had to wrap your head around consolidating data from a distributed data store.

But that will only answer your questions.

Say again?  Of course you want answers to your questions, but what about discovering new insights via relationships you didn’t even know existed in your data?  That means extracting data with a question you did not anticipate.  For example, you may be looking for edge cases or anomalies that can only be seen when analysing multiple “hops” of relationships in a network (e.g. friends of friends who like the same music).

Until you try some graph analytics or think through a related problem, you probably won’t really understand the power in these ideas.  In many ways our SQL experience has taught us to not ask some of the harder questions because they weren’t possible to answer with SQL!  I’m speaking from my own personal experience on this and how challenged I was to break out of my previous paradigm.

When SQL is a hammer…

…every data problem that can fit in a table looks like a nail.  However, when a single table is full of a myriad of complex relationships, that’s when things start to get really tricky.  If that data should end up on your Data Analyst’s desk, just wait for the fireworks. That is, unless the DA has graph-like querying tools available.

Let’s look at an example: the many-to-many relationships table.  The table in SQL would just be two columns with names in them, representing a relationship between two people, e.g. Harry, Sally.  Those from the graph world see them as two special subjects with a relationship, not two pieces of text in a table.

These two subjects may have a bidirectional relationship too.  So when doing RDF querying or graph analytics, you will often be building queries to find both relationships: Harry->Knows->Sally and Sally->Knows->Harry.

You can emulate this in SQL with a self join or a second query, but don’t bother unless your dataset is really simple and your questions are very superficial: you’ll only get one degree of relationship for each join, and soon you’ll be tempted to chain 20 joins to emulate what is much easier in a graph analysis engine.
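To make that concrete, here’s a hypothetical two-column friends table and the self join needed just to reach friends-of-friends – every extra hop means another join:

    -- friends(person, friend) holds one row per relationship
    SELECT f1.person, f2.friend AS friend_of_friend
    FROM friends AS f1
    JOIN friends AS f2 ON f2.person = f1.friend
    WHERE f2.friend <> f1.person;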

You need graph analytics in the future

“You don’t know my business!”  Yes, but we know what the common data handling needs are for most organisations.  Without fail, companies, governments and non-profits alike all deal with relationship-laden data:

  •  Customers “know” other customers – who are the key influencers in your buying community?
  • Donors support certain kinds of causes and share about them in social media – what other causes are gaining more traction than yours with similar types of donors?
  • Patients have various diagnoses that relate to other patients with similar issues – how can we predict better outcomes if certain factors of care are modified?

How are you planning to deal with such complexity in a proactive way?  Your competitors (and all the Internet titans) are already using it in some way (likely to sell to you better), so it is not going away any time soon.

If you’re just getting started in understanding graph analytics, may I recommend my earlier blog post which is a mini HOWTO using the SPARQLverse analytic engine (from SPARQLcity).  Install it in a couple minutes, push some simple data in and get your hands dirty with the ultra simple example there.

Also, if you don’t know some of the differences between RDF, graph data, SQL and SPARQL, it’s worth digging into.  I’ll leave you with a great article by Robin Bloor on the subject.  He clears the air very well in this piece from Inside Analysis; here is a snippet, but you should really read the whole thing as it’s very insightful:

“Where the RDF databases really score is when you want to do set processing (a la SQL) at the same time that you want to do graph processing. Consider a query such as “Who are the biggest influencers on Twitter over the past six months?”

Both the RDF and Graph database would handle such a query and return the same results quickly. But if you ask the very different question, “Which influencers have had the same pattern of influence on Twitter over the last six months?” you are asking both for graph processing and set processing at the same time to get to the answer, and the RDF databases do both well. Not only that, but this is an area of analytics, which was virtually untapped until recently, because there was no software that could easily do it.”

Follow me @1tylermitchell for further discussion or see my links in the sidebar.