Genealogy Blog

450+ New Record Indexes Added in One Day

07 Mar 2014

Our effort to bring all of the world’s historical content online for free, forever, is growing even stronger. Today alone, we are adding more than 450 new record indexes in addition to the 1,000 databases that we usually launch every day. These indexes include 25 million names from a variety of countries including the United States, Canada, Australia, and New Zealand.

Start Making Discoveries Now

View New Collections

New Must-Search Collections


Since October, we have added more than 150,000 databases to our existing collection, revealed revolutionary technologies (including handwriting recognition, transcription, and photo detection tools) and worked with organizations to digitize offline collections – all with the ambition to help our community make more discoveries than ever.

Join our revolution and upgrade to Mocavo Gold knowing that every dollar you spend supports our mission to bring all of the world’s historical content online for free. With Mocavo Gold, you enjoy exclusive access to advanced and automated tools that will help you make discoveries faster.

 

Hoosier Daddy? An Old Fashioned Soap Opera to Inspire Your Blog

04 Mar 2014

I frequently encourage genealogists to share their family history research with family members and others. Beyond sharing research findings, people can learn about research techniques and resources from each other. Journals and popular genealogy magazines are a wonderful place to do this.

Blogs are another way to keep your family and friends in the information loop. One of the best ways to figure out what you want to do with your blog, and how you want to write, is to look at blogs you enjoy reading. Take hints from them, and incorporate them into your own style.

Last fall you might have seen my Fireside Chat with my friend Michael Lacopo, a professional genealogist from Indiana. Michael recently made some interesting discoveries in his research. He decided to share his story with others, so he started a blog.

From the get-go one can see Michael’s sense of humor. He titled his blog Hoosier Daddy? His mother was adopted as an infant in 1947. Her adoptive parents were always up front with her about the adoption, and told her what they knew about her birth parents. As Michael got older and started his genealogical research, his interest was piqued. He wanted to find his mother’s birth parents for her. Thus the name, Hoosier Daddy?

The blog starts with the beginning of his search in the 1980s. He writes in a conversational style. No scholarly discussion here. Yet he is still clear in his writing, explaining the history, and taking the reader not only through his research process, but his thought process as well. Reading through the posts, one feels a part of the story.

We often hear about putting the family in historical context. Michael easily brings us into the story, making us care about the people without making up stories. Take the following passage about his great-grandparents from his post Grandma, Part I:

 


 

“The marriage between Volney and Gracie Mae was apparently a monumental mismatch. Married just three months before the birth of their first daughter, Clara Belle, in 1909, it was probably a union that neither entered into with great joy. He was twenty-six years old, an athlete, a musician and a notorious ladies man. She was nineteen years old and pregnant. Volney had an eye for the young girls, and “he had a horse which he rode all over the country courting several girls at the same time.” Granted, when Volney met eighteen-year-old Gracie she shared her home with six sisters, but she was the one who caught his eye, because after all “she was considered to be the smartest and prettiest of the Hanks girls.” Unfortunately, marriage and family did nothing to change Volney’s ways, and everything to change Gracie’s.”

Reading Michael’s posts is like following an old-fashioned soap opera, or other well-crafted television show. You just can’t wait for the next installment to arrive! Read Hoosier Daddy, but start at the very beginning (a very good place to start). And hopefully you will get some inspiration to start writing your own family’s story.

 

The Mean, Lean, Green Mocavo Machine

28 Feb 2014

Here’s a riddle for you: What runs on clean-burning natural gas, is cooled by ice-cold mountain air, has 99.9% reliability, and delivers 40 teraflops of processing power? Why, it’s nothing less than Mocavo’s primary datacenter.

Mean

Reading the world’s genealogical records one at a time and making them searchable is no small feat. It requires a finely tuned infrastructure with plenty of processing power, storage, and redundancy.

With over 500 multi-core, datacenter-grade Dell servers under the hood, we have the ability to perform OCR on over 1 million documents per day. In fact, we’re in the final stages of re-engineering our OCR process to increase that number to over 5 million, all without affecting the performance of the website whatsoever!

The processed documents have to go somewhere, and we’re pleased to announce that we have increased our storage capacity to over 1 petabyte! That’s a lot of spinning platters. Read on to see how we keep them all spinning!

What good are all that power and all those processed documents if a fire, flood, or zombie apocalypse destroys them in one fell swoop? We have an off-site datacenter connected via a 10Gb dedicated fiber link that keeps all of our (and your) precious records safe and available instantly for recovery. We like being able to sleep at night, and the backup cluster makes that possible.

Lean

The most expensive part of running a datacenter isn’t power or cooling; it’s the labor to keep it running all the time. When you’re working with 500 servers, seconds count. Even spending just 30 seconds per server puts you over 4 hours in labor. Out in the wild you’ll find the server-to-administrator ratio ranges from about 15:1 to 100:1. So for 500 physical machines, what do we consider lean? Try 500:1, which is plenty if you have the right tools.
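The labor figures above are easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes every server needs the same amount of hands-on time.

```python
import math

SERVERS = 500  # fleet size quoted in the post


def hands_on_hours(servers: int, seconds_per_server: float) -> float:
    """Total manual labor, in hours, if every server needs the same attention."""
    return servers * seconds_per_server / 3600.0


def admins_needed(servers: int, servers_per_admin: int) -> int:
    """Administrators required at a given server-to-administrator ratio."""
    return math.ceil(servers / servers_per_admin)


print(f"30 s on each of {SERVERS} servers: {hands_on_hours(SERVERS, 30):.2f} hours")
for ratio in (15, 100, 500):
    print(f"{ratio}:1 ratio -> {admins_needed(SERVERS, ratio)} administrator(s)")
```

At 30 seconds apiece, 500 servers come to a little over four hours, and the quoted ratios work out to 34, 5, and 1 administrators respectively.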

Enter Puppet, Icinga, and Fabric.

Puppet is enterprise level configuration management. It works seamlessly in our DevOps workflow. Every 30 minutes every single physical server in our datacenter checks in with the puppet-master asking for updates. Last week I added a new subnet and needed to add a route to about 100 machines. So I opened up our nodes.pp file and added this:


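# Add a static route to the new subnet unless it is already in the routing table.
exec { 'route add -net 10.10.108.0/22 gw 10.10.109.2 dev eth0':
  path     => ['/sbin', '/bin', '/usr/bin'],  # exec needs a search path for unqualified commands
  provider => shell,                          # the unless check uses a pipe
  unless   => 'route -n | grep -q 10.10.108',
}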

So in 5 minutes I added a static route to 100 machines. That equates to about 3 seconds per machine. Not too bad, but I can do better.

Let’s say I wanted to add that route to all 500 machines, and I couldn’t wait for the half-hour puppet update. Let’s get Fabric involved.

Fabric is a Python module that sends pre-defined (or on-the-fly) commands over SSH to hosts in hostgroups or roles. In my fabfile.py I already have a function to restart puppet:


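from fabric.api import parallel, sudo

@parallel
def kick_puppet():
    # Restart the puppet client so it checks in with the puppet-master right away.
    sudo('service puppet restart')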

So after I add the route in puppet, I’ll restart the puppet client on all machines with fab -R All_Machines kick_puppet. I have now touched 500 machines in less than 6 minutes, which takes me to less than a second per machine. I’m sure you see where this is going… but you can’t automate everything, can you? What if you have to reinstall a server from scratch?

In case of a corrupted OS drive, or a new server that has never been on the network, (re)building from scratch is quick and easy. Power on the server, press F12 to boot from the network, and PXE takes over. The OS gets installed, it reboots, then puppet takes it from vanilla OS to production ready, all without being touched again. One-touch installs: try them. You’ll be glad you did.

I suppose there are a few things I’ll never be able to automate, like changing out a hard drive or a bad stick of RAM. I don’t have time to run tests on each machine to see how it’s doing, but Icinga has 24 hours each day to do just that, and it never gets bored or tired of it.

Icinga is a fork of Nagios, and right now it makes over 3000 individual checks for us every 10 minutes, and it’s not even breaking a sweat. We use puppet to automate the creation of the checks, and Icinga will holler when a hard drive fails, puppet stops running, a web server stalls, or a machine becomes unresponsive. It can even perform actions based on an alert through a handler (like restarting puppet if it’s not running or rebooting the unresponsive machine).

So on the occasion that we must physically touch a machine, Icinga narrows it down for us so we can get in and get out, because contrary to what you see in the movies, datacenters are LOUD and generally uncomfortable to work in for long periods of time.

Green

Mocavo is concerned with being efficient and taking care of our natural resources. Often those two goals work very well together. Here are some initiatives we have at Mocavo to lower our footprint while providing an excellent product:

The power we use here at the datacenter comes from clean burning natural gas, which we like because it’s less expensive and better for the environment.

We don’t run redundant power supplies on each server and instead rely on a redundant infrastructure. The load is distributed so if a server drops out, the application can continue to run smoothly until it can be repaired.

We run the datacenter at a balmy 82° F. With adequate airflow for heat removal, our equipment runs comfortably when warm, saving energy from cooling. To give us extra heat ballast for thermal load changes and to prevent static build up, we run a humidifier to boost the ambient humidity above 30%.

We’ve engineered a free-cooling air exchanger to make use of the cold and arid mountain air to cool the datacenter. When running at capacity it saves 4 tons (14 kW) of cooling, which keeps 75 tons of CO2 out of the atmosphere each year and brings our PUE down to around 1.27. According to the Uptime Institute’s 2012 Data Center Survey, the respondents’ largest data centers average a PUE between 1.8 and 1.89. Ours is about 32% lower and is quickly approaching Google’s internal datacenter PUE of 1.12.
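For readers who want to check the arithmetic, here is a small Python sketch. PUE (power usage effectiveness) is total facility power divided by the power drawn by the IT equipment alone. The 100 kW and 27 kW loads below are hypothetical numbers chosen only to reproduce a PUE of 1.27; they are not Mocavo’s actual figures, and the function names are illustrative.

```python
TON_OF_REFRIGERATION_KW = 3.517  # one ton of refrigeration is 12,000 BTU/h


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw


def percent_below(ours: float, theirs: float) -> float:
    """How far 'ours' sits below 'theirs', as a percentage of 'theirs'."""
    return (theirs - ours) / theirs * 100.0


# Hypothetical load: 100 kW of servers plus 27 kW of cooling and other overhead.
our_pue = pue(total_facility_kw=127.0, it_equipment_kw=100.0)
print(f"PUE: {our_pue:.2f}")

# Compare against both ends of the survey range quoted in the post.
for survey_pue in (1.80, 1.89):
    print(f"vs {survey_pue}: {percent_below(our_pue, survey_pue):.1f}% lower")

print(f"4 tons of cooling = {4 * TON_OF_REFRIGERATION_KW:.1f} kW")
```

Against the survey range the gap comes out between roughly 29% and 33%, so the 32% quoted in the post corresponds to the upper end of that range.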

Technology makes genealogy possible, in a lean, mean, green, BIG way!

News Stories and Blog Posts for Genealogists, February 28, 2014

28 Feb 2014

This week’s roundup includes news stories, blog posts for genealogists, and a podcast. You can read about an eighteenth-century biracial woman who was the founder of a large family, how prisoners are helping genealogy, caveats about using vital records, the origins of some fraternal organizations, and interviews conducted at RootsTech. I hope you find them as interesting and informative as I do.

Dominique Bass of Kings Mountain, North Carolina has spent a great deal of time researching the Brooks Family History. He combed many records, including court documents, to piece together the story of Sarah Brooks. Sarah’s mother, Elizabeth, was a white woman from Ireland whose relationship with a black man in Baltimore resulted in Sarah’s birth. Elizabeth gave Sarah to another family to raise, with the understanding that she would be free upon adulthood. This didn’t happen, and Sarah was transported to North Carolina. She was eventually freed and became the matriarch of a huge family that includes Arthur Ashe, the first African-American tennis player to be ranked number one in the world. Read the full story in Family Ties: Man Traces Family History 200 Years to a Slave.

Utah prisons are participating in an interesting program with the LDS church. Church volunteers are teaching inmates how to index records through the FamilySearch indexing system. In 2013, almost 175,000 names were indexed. This year they expect to have about 2 million names indexed. Read about the experience of the Davis County jail in Davis County Inmates Keep Busy with Genealogy Work.

Randy Seaver had an interesting post this week about recordkeeping. A reader had forwarded an 1842 death record from Massachusetts. At the bottom of the record the town clerk went on a rant about the difficulty in registering records. Especially difficult in this instance was that this was the town of Southbridge, which lies on the border with Connecticut. The clerk estimated that as many as 33% of couples went to Connecticut to be married, making his job much more difficult. Randy sums up this and several other lessons to be learned from the rant in Dear Randy: Check Out This Town Clerk’s Lament.

 


 

In the 1860s, a group of actors and entertainers created The Jolly Corks, a group formed to help them escape paying higher taxes for drinking on Sundays in New York. In 1868, they decided to become a formal fraternal order focused on benevolence and charity and chose a new name. An interesting post in The Week discussed the origins of this and several other fraternal organizations. Find out the current, well-known name of The Jolly Corks, and other origins in Elks, Shriners, and Masons: How ‘Old Man’ Frats Got Their Names and Symbols.

Finally this week we have the Genealogy Guys Podcast. During the RootsTech conference a few weeks ago, Drew Smith recorded a number of interviews. This week the podcast features the first group of interviews. Drew’s guests this week are Dennis Brimhall, the CEO of FamilySearch; Ed Thompson, developer of Evidentia; and Michael J. Leclerc, Mocavo’s Chief Genealogist. I had a wonderful time chatting with Drew, and you can enjoy our discussion as well as the others in The Genealogy Guys Podcast #261 – 2014 February 23.

Will Genetic Modifications Make Today’s Genealogy a Thing of the Past?

27 Feb 2014

DNA Helix

Genealogists have many conventions that have been created over time, and founded on bedrock principles. We establish parental and other familial relationships using records and other evidence. Our numbering systems clearly communicate these relationships. Our pedigree charts visually show the relationships between children and parents, tracing the generations back in time.

Over the last few decades, we have had to adapt to many changes. Cultural changes, such as adoptions, have caused us to look at the way we trace our ancestry. Do we follow the birth lines, or the adoptive lines, or both? Blended families include many step- and half-sibling relationships. Same-sex couples are raising children. Sperm and egg donors provide new challenges: trace the parents of record, the genetic parents, or both?

Perhaps nothing has made such a difference as the introduction of DNA testing to genealogy. Because DNA is inherited, relatively unchanged, from our parents, and their parents, etc., it has made a great difference in research. It has not only provided evidence that allows us to identify many unknown ancestors, it has also proven that many people are not descended from individuals they thought were their ancestors.

But just as we are starting to learn how we can use DNA properly, medicine is poised to challenge everything we do, and how we do it.

The news this week told us about researchers from the New York Stem Cell Foundation, Oregon Health and Science University, and England’s Newcastle University, who are petitioning the Food and Drug Administration to allow new procedures.

Some women cannot bear children because of problems with their mitochondrial DNA. These new procedures would allow doctors to replace the mtDNA in a woman’s eggs with that from a donor. The rest of the DNA in the egg would remain. Scientists have successfully done this procedure with animals, and are now looking to test the procedure on humans. Setting aside the moral and health issues for the moment, let’s just look at how this would impact genealogy.

The foundations of genealogy are predicated on the fact that biologically, we have one male parent and one female parent. Societal changes have challenged us to trace multiple branches (adoptive parents, biological parents, etc.), but we usually do this one set at a time. We create one pedigree for adoptive parents, one for the biological parents, etc. We are now, however, looking at a future where a child might have three biological parents.

And what would constitute a lineage? Will mtDNA be considered the “real” umbilical line, or will it be the autosomal DNA? What will the birth certificate look like? Are we now going to be looking at “non-maternal events” in the future? After all, this procedure would not affect just a single individual. The DNA will be passed down from generation to generation.

You can read more about the story from NPR’s Scientists Question Safety of Genetically Altering Human Eggs, and another explanation from CNN’s Dr. Sanjay Gupta’s story: What to Make of ‘Designer Babies.’ Fortunately, the FDA panel has grave concerns about “designer babies” and other moral and ethical issues. But it won’t be long before there will be more and more pressure for events like this. And how will we as genealogists address these implications?

Photo Detection in Historical Documents

27 Feb 2014

We have continued to improve our handwriting detection and recognition tools. In doing so, we stumbled upon another exciting new feature that we think will help change the way people learn about their family history. We are excited to share that we have developed the ability to very easily extract pictures, photographs and other images from our historical books. It’s not exactly like stumbling upon penicillin, but we were pleasantly surprised at how perfectly we are able to identify these images!

Notice the red outline in the examples below:

[Three example page images, each with the detected picture outlined in red]

The next step for us will be to not only extract the image, but to also read the associated caption to enable our community members to search for information about the image. In the vast majority of cases, the caption describing the image is relatively easy for our search engine to identify for the following reasons:

  • its proximity to the image
  • additional whitespace around the block of text
  • the caption may also have different type characteristics from the page content (font size, weight, casing, etc)
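The three cues above can be combined into a simple score for each block of text near a detected image. The Python sketch below is purely illustrative and is not Mocavo’s implementation: the `TextBlock` and `ImageRegion` types, the field names, and the equal weighting of the cues are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageRegion:
    top: float     # y-coordinate of the top edge (larger y is lower on the page)
    bottom: float  # y-coordinate of the bottom edge


@dataclass
class TextBlock:
    text: str
    top: float
    bottom: float
    font_size: float
    margin: float  # whitespace surrounding the block


def vertical_gap(image: ImageRegion, block: TextBlock) -> float:
    """Vertical distance between the image and the block (0 if they overlap)."""
    if block.top >= image.bottom:   # block sits below the image
        return block.top - image.bottom
    if block.bottom <= image.top:   # block sits above the image
        return image.top - block.bottom
    return 0.0


def caption_score(image: ImageRegion, block: TextBlock,
                  body_font_size: float, body_margin: float) -> float:
    """Sum the three cues: proximity, extra whitespace, and different type."""
    proximity = 1.0 / (1.0 + vertical_gap(image, block))
    extra_whitespace = 1.0 if block.margin > body_margin else 0.0
    different_type = 1.0 if block.font_size != body_font_size else 0.0
    return proximity + extra_whitespace + different_type


def pick_caption(image: ImageRegion, blocks: List[TextBlock],
                 body_font_size: float, body_margin: float) -> TextBlock:
    """Return the highest-scoring block; raises ValueError if blocks is empty."""
    return max(blocks, key=lambda b: caption_score(image, b, body_font_size, body_margin))
```

A real system would also need to consider horizontal alignment, casing, and font weight, and would tune the weights against labeled pages rather than treating each cue equally.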

What is particularly exciting about this discovery is that when we put the finishing touches on this technology, we’ll be able to add Image-specific search capabilities to Mocavo. This development will open up a whole new realm of exciting discoveries for our community. Stay tuned!

Friday, March 28th Update:

Now, not only can we identify the image, but we can also read the associated caption, opening up an entirely new realm of exciting possibilities for our community. Using this technology, we’ve found some delightful, surprising, and sometimes hilarious historical images and we couldn’t wait to give you a peek at what we’ve found. We hope you’ll have as much fun as we have finding the gems hidden in these books, so take your time browsing through the pages, or enter a keyword into the search box to find captions that match that term. It’s not yet 100% polished, and we currently have limited results, but we thought you might get a kick out of playing around with it.

Test Drive the Photo Detection Tool Now

Coming Soon: Online Transcription from Mocavo

26 Feb 2014

Every day at Mocavo we’re looking for new opportunities to bring more of the world’s historical content online for free, forever. We are excited to share a new service that will be launching soon: our own web-based transcription tool.

We’re very proud to release 1,000 databases every day, but within those databases are signatures and handwritten notes that could be the answer to a riddle one of our community members (maybe you!) has been trying to solve for decades.

Our transcription tool will soon be “ready for prime time” and we will be inviting our community members to help index these valuable resources. The tool is being tested internally, and the initial experience is so exciting that we wanted to give you a sneak peek of what’s to come.

Transcription


You’ll be able to contribute to transcription projects simply and easily within your browser. No confusing software to install. No frustrating spreadsheets to maintain. You’ll just select an active project and away you go.

The tool is fast, intuitive to use, and relies on the handwriting detection system that we announced several months ago. Popover windows will appear above the text and allow you to easily transcribe without ever leaving your keyboard.

Our arbitration process will allow us to quickly review every submission to ensure we maintain the quality standards the Mocavo community expects.

Current Projects


When the time comes to launch the transcription tool, we’ll send you an invitation along with a tutorial that explains how to get started. You will be able to join a Current Project with a single click, and our system will immediately take you to a page like the one in the example above. It’s that simple: Join a project and start contributing!


Recent Activity

When you’re part of a community, there’s nothing quite as exciting as drawing from the energy and momentum of the people around you! It’s important that we share a collective sense of progress and camaraderie, so we’re including an activity stream that will be constantly updating as other community members add transcriptions.


Leaderboards

As part of the transcription tool, we will show you the top contributors on individual projects, as well as the top contributors overall.


Coming Soon

We still have a little bit of work to do so that your first experience is as rewarding and bug-free as possible, but we hope you’re as excited as we are about the potential to bring even more content online for the world to enjoy for free, forever.

 

5 Research Resources You Cannot Afford to Forget

26 Feb 2014

Five

We often speak of a “reasonably exhaustive search” for your ancestors. The question arises, then: what is a reasonably exhaustive search? I will start with what it is not. It is not simply:

  • Doing a Google search
  • Searching Mocavo/FamilySearch/Ancestry/FindMyPast/etc.
  • Looking in a single compiled genealogy
  • Looking at published vital records

For certain, conducting a reasonably exhaustive search includes all of these items. But it must also include more.

1. Search Commercial and Free Genealogy Websites

Searching a single website isn’t enough. You must search all of them, or at least as many as possible. If you do not have personal subscriptions to all of them, go to your local libraries and archives, which often have institutional subscriptions. While there is a great deal of overlap between some sites, each also has a great deal of content unique to that site. You should also check free websites, such as USGenWeb, CanadaGenWeb, or FreeBMD, which have many abstractions and indexes to original records.

2. Read Compiled Genealogies

Compiled genealogies are an excellent place to look for your ancestors. But they are only a starting point. Many genealogies have no source citations, so you must verify every fact in original records. Even if the genealogy you are looking at has source citations, it is important to review those original records to be certain they were not misinterpreted. Our scholarly journals are filled with corrections to previously published genealogies, prompted by the misreading of an original record or by additional information coming to light that causes one to reinterpret a record.

3. Check Online Catalogs for Other Published Material

WorldCat.org is the largest online library catalog, with links to holdings at thousands and thousands of repositories around the world. A search on WorldCat will show you the nearest repositories with copies of the works you need, and links to the repositories in case you need to borrow something through interlibrary loan.

4. Shelf Read

This one only works for libraries and repositories that have open shelves. When you are researching in a facility, don’t just look at the individual book or books you found in the catalog. Look at the shelves and examine every title around them. If you are looking at local histories or record abstractions, be certain to check the areas for local, county, and state levels. I have found more than one clue by perusing a book I would have ignored in the catalog because the title didn’t intrigue me.

5. Examine the Original Records

Published records are extremely helpful. They can be much easier to read than some originals, and often provide an index that original records do not have. But it is important to not rely too heavily on them. Examining the original records can provide many details missing from the published versions. The published pre-1850 vital records for Massachusetts, for example, are a great resource. But they pulled the information out of its original context and placed the records in alphabetical order. This removes the records of family groupings that connect many of the individuals. And abstractions are known to contain “only the important information.” The problem is that what is important to you may not have been important to the abstractor.

Don’t Miss Out on this week’s Fireside Chat with Judy Russell

25 Feb 2014

Join us this Wednesday at 1:00 P.M. EST for a lively discussion between Chief Genealogist Michael J. Leclerc and the Legal Genealogist, Judy Russell. A well-known genealogist with a law degree, Judy writes, teaches, and lectures on a wide variety of genealogical topics, ranging from using court records in family history to understanding DNA testing.

Judy Russell

A Colorado native with roots deep in the American South on her mother’s side and entirely in Germany on her father’s side, she is a member of the Association of Professional Genealogists, the National Genealogical Society and numerous state and regional genealogical societies. She has written for the National Genealogical Society Quarterly and National Genealogical Society Magazine, among other publications. On the faculty of the Salt Lake Institute of Genealogy, the Institute of Genealogy and Historical Research in Alabama, and the Genealogical Research Institute of Pittsburgh, she is a member of the Board of Trustees of the Board for Certification of Genealogists, from which she holds credentials as a Certified Genealogist℠ and Certified Genealogical Lecturer℠.

You can tune in to tomorrow’s Fireside Chat at www.mocavo.com/fireside 

Pitfalls of Cutting Your Research Short

22 Feb 2014

The other day I wrote a post about the new Genealogy Standards book from the Board for Certification of Genealogists, and it proved popular. This is very heartening to see. Experienced genealogists get extremely frustrated with a new breed of researcher: people who think that quality research is discussed only by “elitists” who have no interest in the average person. Many of these people use their blogs and other social media to denigrate quality work, saying that a bit of work online will tell you what you need to know about your family.

 

Genealogy Standards

 

In addition to being uninformed, these people don’t understand that they are actually undermining their own family history. Instead of gathering their ancestors, they are actually collecting a number of people who are completely unrelated and putting them into their family tree. And this is frustrating to those of us who are trying to help them understand what it takes to confirm that individuals are actually your ancestors.

This week, noted genealogist and author of Evidence Explained, Elizabeth Shown Mills, posted on Facebook the perfect example of the problems of doing this “just enough” type of research. She kindly allowed me to reproduce her story here:

Six years ago, I began work on a Georgia R[evolutionary] W[ar] soldier named William Cooksey. At that time, Cooksey researchers had spent 20 years circulating a packet of six pages or so, with *one paragraph* on William. It called him “William Cooksey IV,” gave him a birth place and specific parents in Maryland, took his lineage back to England, assigned him a wife named “Leanna Wesley,” and gave him 5 sons by Leanna, plus a daughter (b. 1804) by an unknown second wife.

Today, I have 108 typed pages of abstracts and transcripts on this soldier who could not read or write. There is no shred of evidence that he was from Maryland. His parents remain unproved, although the evidence points to a Georgia-S[outh] C[arolina] couple. His first wife was definitely not “Leanna Wesley.” Only two of the sons attributed to him were actually his. Two daughters by his first wife were totally missed. The second marriage to the “unknown” wife didn’t exist. The 1804 child was actually born to the widowed daughter-in-law whom he eventually took for a wife about 1818-19. And the children he fathered by that last wife had been previously called his grandchildren!

All the wrong stuff was attributed to him because someone had plugged together random scraps of data on “same-name” men and then tried to make sense of the hodgepodge. In the end, his name was about the only accurate fact left from that one paragraph!

Yes, all the complaints we see about too-persnickety genealogists are right in one respect: If we spend so much time on one person, “we won’t get very far very fast.” So? What’s our object as genealogists? To scramble up a fantasy tree and remain out on a limb for 20+ years, going nowhere because we aren’t working on real people and actual family units?

In the end, real progress is made only when we do the kind of “reasonably exhaustive research” called for by the Genealogical Proof Standard.