
BigData - Data


 BigData - Finding and Loading the Data
Where do we find the data? Many government sites, as well as private sites, offer their data for download.

I have listed many that I have found here: BigData - Resources and Links

For this exercise, we will use data from the Social Security Administration (SSA.gov). I have provided the names for the years 1946 through 2002 in the Files section below ("yob1946.txt" to "yob2002.txt", where "yob" stands for "year of birth"). If you want years outside that range, you can download them yourself from the SSA site.

Note: These files do not contain every name for those years, only the names given to 5 or more people (i.e., if only you and 3 other people share a name, it was not included). That is still a lot of names (almost 4 million).

1. Create the Application - "Names" 
    a. Open LiveCode and create a new mainstack; size it to the full height of the screen
        Click on "Object" on the EditBar, select "Stack Inspector", and name it "Names"

    b. Save the stack
        Click on "File" on the EditBar, select "Save As", and save it as "Names.livecode" on your computer

2. Add a field to hold the names
    a. Add a text entry field, size it to the card, name it "names", and check the Vertical Scrollbar check box

3. Add a Button to read in the file of names
    a. Add a button and call it "Load File". We will program it to let us select different files from our computer.

    Add this script to it:

on mouseUp
   answer file "Select a Text file"        // open up a file-selection dialog
   if it is empty then exit mouseUp        // if user clicks "Cancel", do nothing
   put it into x                           // save the name of the file they selected
   put url ("file:" & x) into field "names"    // load the file into our text field
end mouseUp

4. Load the Data
    a. Download one of the data files below for the year in which you were born, e.g. "yob1999.txt"
    b. Go into "Run" mode, click on the button, and load that file into your program

It should look like this:

[Screenshot: the "names" field filled with the contents of the loaded file]

You can see three fields on each line. This is a CSV file; the letters stand for Comma Separated Values, and every field (item, value) is separated from the next by a comma. The three fields are the name, the gender, and the number of people with that name.
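
As a small illustration of the three fields, here is a sketch of how LiveCode can pull them apart with its "item" chunk, which splits on commas by default. The sample line and the variable names (tLine, tName, tGender, tCount) are made up for this example; you could put the script on any spare button.

```livecode
on mouseUp
   // a sample line in the same name,gender,count layout as the data file
   put "Sam,M,476" into tLine
   // the itemDelimiter is a comma by default, so each "item" is one CSV field
   put item 1 of tLine into tName
   put item 2 of tLine into tGender
   put item 3 of tLine into tCount
   // show the three pieces in an answer box
   answer tName && "(" & tGender & "):" && tCount
end mouseUp
```

The same "item 3 of ..." idea is what lets us sort or look things up by the count later on.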

Notice that the names are not in alphabetical order (the SSA files are grouped by gender and then ordered by count).



We can open the "Message Box" from the EditBar and try some commands.

    a. Find the name Sam by typing:   Find "Sam" in field "names"

We get this: the name "Samantha" has a box around the letters "Sam".


That helps but it would be nice if all the names were in order.


  Let's put them in alphabetical order by typing this in the Message Box:   Sort field "names"

  That looks better. Now let's do the find again:   Find "Sam" in field "names"


That looks good, because we now have all the names beginning with "Sam" together.

Looking at the data, for this year there were:

    15 Females named Sam
    476 Males named Sam
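
Instead of reading those counts off the screen, a script can look them up. The following is only a sketch, assuming the file is already loaded into field "names"; the variable names and the idea of putting it on a separate button are my own choices, not part of the program so far. It compares the whole of item 1, so "Samantha" does not count as "Sam".

```livecode
on mouseUp
   // the name to look up; change it to any name in your file
   put "Sam" into tName
   put 0 into tGirls
   put 0 into tBoys
   put field "names" into tData          // work on a copy of the text
   repeat for each line tLine in tData
      if item 1 of tLine is tName then   // exact match on the name field
         if item 2 of tLine is "F" then put item 3 of tLine into tGirls
         if item 2 of tLine is "M" then put item 3 of tLine into tBoys
      end if
   end repeat
   answer tGirls && "females and" && tBoys && "males named" && tName
end mouseUp
```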

You can experiment with other commands in the Message Box for now, but we will want to put those commands in our program so they run automatically.

Let's add the sort command to our "Load File" button:

on mouseUp
   answer file "Select a Text file"        // open up a file-selection dialog
   if it is empty then exit mouseUp        // if user clicks "Cancel", do nothing
   put it into x                           // save the name of the file they selected
   put url ("file:" & x) into field "names"    // load the file into our text field
   sort field "names"                      // put the lines in alphabetical order
end mouseUp

Now, when we load a new year, it will be sorted for us.
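
The sort command can also be told which part of each line to sort by. This is a sketch you might try on a spare button (or one line at a time in the Message Box), assuming the data is in field "names"; the second sort is left as a comment so that only one of them runs.

```livecode
on mouseUp
   // alphabetical, looking only at the name (item 1 of each line)
   sort lines of field "names" ascending text by item 1 of each
   // to list the most common names first instead, sort by the count (item 3):
   // sort lines of field "names" descending numeric by item 3 of each
end mouseUp
```

The word "numeric" matters for the count: without it the counts would be compared as text, so "9" would sort after "10".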


For You To Do:
   (answer the questions, then make the changes to your program)

    Questions to Investigate:
    1. Does case matter? Try doing a find on "Sam", then a find on "sam".
    2. What happens when you do Find "Willy" twice? Why?
    3. Look up "find" in the LiveCode Dictionary. Is there a way to reset the find command?


Adding a Look-up Field

Now add another text field, "name2find", and a button, "Find Name".
On the button, put the code that gets the name the user types into that field and looks for it in the list of names,
    e.g.

on mouseUp
   put field "name2find" into x     // get the name the user typed
   find x in field "names"          // look for it in the list of names
end mouseUp

        Notes:
    • You can use whatever names you want for the fields; you do not have to use mine.
    • Do you need to put a "find empty" at the beginning to reset the find command? Try it without and see.
    • A nice feature to add: check the result after the find. If the result is "not found", then the find command did not find the name. Put up an answer box to tell the user that.
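
One possible way to combine those notes is sketched below. It assumes the field names used above ("name2find" and "names"); the wording of the message is just an example. Try your own version first.

```livecode
on mouseUp
   put field "name2find" into x
   find empty                        // reset any earlier find so we start fresh
   find x in field "names"
   if the result is "not found" then // the find command sets the result when it fails
      answer "Sorry," && x && "was not found."
   end if
end mouseUp
```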



    Extra:

    You can load any file into a field. For fun, load any other text file. You can see its contents inside the field.
    Try an HTML page; you can see all the HTML tags and text.



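New Code for the "Load File" button

This version expects two more fields on the card, named "girls" and "boys". It splits the names by gender and lists the most common names first.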

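on mouseUp
   answer file "Select a Text file"       // open up a file-selection dialog
   if it is empty then
      exit mouseUp                        // user clicked "Cancel"
   end if
   put it into x
   put url ("file:" & x) into y           // load the file into a variable, not a field
   sort y                                 // alphabetical order first
   put empty into field "girls"
   put empty into field "boys"
   repeat for each line l in y
      if item 2 of l = "M" then           // item 2 is the gender
         put l & return after b1          // collect the boys
      else if item 2 of l = "F" then
         put l & return after g1          // collect the girls
      end if
   end repeat
   // most common names first: item 3 is the count
   sort lines of b1 descending numeric by item 3 of each
   sort lines of g1 descending numeric by item 3 of each
   put b1 into field "boys"
   put g1 into field "girls"
end mouseUp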

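New code for the Lookup Name button

This version uses a field "myname" for the name to look up and a field "num" for the answer. Because field "boys" is sorted with the most common names first, the line number of the found name is also its rank.

on mouseUp
   put field "myname" into x
   find x in field "boys"

   // the foundLine looks like: line 12 of field 3
   // splitting on spaces, item 2 is the line number
   set the itemDelimiter to space
   put item 2 of foundLine() into field "num"
end mouseUp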
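
The find command stops at the first name that merely starts with what was typed, so here is an alternative sketch that walks through the list itself and insists on an exact match. It is not the script above, just another way to get the same rank along with the count; it assumes the same fields ("myname", "boys", "num"), and the variable names are made up.

```livecode
on mouseUp
   put field "myname" into x
   put field "boys" into tBoys          // the list sorted by count, biggest first
   put 0 into tRank
   repeat for each line tLine in tBoys
      add 1 to tRank                    // the line number is the rank
      if item 1 of tLine is x then      // exact match on the name
         put "rank" && tRank & ", count" && item 3 of tLine into field "num"
         exit mouseUp
      end if
   end repeat
   put "not found" into field "num"     // we reached the end without a match
end mouseUp
```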

cyril.pruszko@pgcps.org,
Feb 28, 2017, 7:51 AM