Setting up MySQL on ec2

http://kylegoslin.wordpress.com/2012/05/18/109/

Advertisements

Repairing tables

I spent all morning figuring out how to unlock a frozen table in MySQL and this is all I had to do.

1) Let’s see the list of locked tables

mysql> show open tables where in_use>0;

2) Let’s see the list of the current processes, one of them is locking your table(s)

mysql> show processlist;

3) Let’s kill one of these processes

mysql> kill put_process_id_here;

Uploading CSV to SQL

A friend of mine asked me today how to upload a csv file to SQL server automatically and I made him this dummy script based on a really convoluted scraper I have running.

#import your modules
import requests
import pyodbc

#Downloading a file
url = ‘www.spc.noaa.gov/climo/reports/yesterday_torn.csv’
r = requests.get(url)

#Writing the file
with open(‘output’, ‘w’) as f:
f.write(r.content)

#Get the name of your server and database here
connstr = ‘DRIVER={SQL Server};SERVER=WIN-XXXX;DATABASE=Weather;’

#Connecting to the server
conn = pyodbc.connect(connstr)
cursor = conn.cursor()

#Deleting everything from current table
#Bulk inserting the output file into your table tornadoes_raw
cursor.execute(“””
DELETE FROM tornadoes_raw;

BULK INSERT tornadoes_raw
FROM \”C:\\Users\\afuller\\Desktop\\tornadoes\\output.txt”
WITH (
fieldterminator = ‘,’,
rowterminator = ‘\n’,
FIRSTROW = 2
)
“””)
#executing
conn.commit()

#you have to close the connection
conn.close()

More SQL Tips, etc.

I’ve once again been remiss in posting SQL tips and tricks. Here are a few that save my life now and again.

Case statements in T-SQL:

UPDATE  [table name] set [field] = CASE
WHEN [value] > 1 then [new_value]
WHEN [value] < 1 then [new_value_2]
END;

Decimal fields:

ALTER TABLE [tablename]
ALTER COLUMN [column] decimal(13,0);

Convert columns to another column type:

SELECT CONVERT(int,[column])
FROM [tablename]

Create table:

CREATE TABLE [tablename] (
[column1] VARCHAR(10) NULL,
[column2] VARCHAR(10) NULL,
);

Pivot in SQL Server:

SELECT [state], [keep], [drop]
FROM
(select [state], [show_title_1]
FROM [tv_shows]) ps
PIVOT
(
COUNT([show_title_1])
for [show_title_1] in
([keep], [drop])
) AS pvt

Medians in MySQL (not SQL Server):

SET @rownum := -1;

SELECT
AVG(t.[y5_totrevenue_growth]) as median
FROM
(SELECT
@rownum := @rownum + 1 AS rownum,
[tablename].[y5_totrevenue_growth] AS y5_totrevenue_growth
FROM
[tablename]
ORDER BY [tablename].[y5_totrevenue_growth]) AS t
WHERE
t.rownum IN (
CEIL(@rownum/2),
FLOOR(@rownum/2))

Medians in SQL Server:

SELECT ((
SELECT TOP 1 [domestic_gross_adj]
FROM (
SELECT TOP 50 PERCENT [domestic_gross_adj]
FROM [table] A
WHERE [domestic_gross_adj] IS NOT NULL
ORDER BY [domestic_gross_adj]
) AS A
ORDER BY [domestic_gross_adj] DESC) +
(
SELECT TOP 1 [domestic_gross_adj]
FROM (
SELECT TOP 50 PERCENT [domestic_gross_adj]
FROM table A
WHERE [domestic_gross_adj] IS NOT NULL
ORDER BY [domestic_gross_adj] DESC
) AS A
ORDER BY [domestic_gross_adj] ASC)) / 2

Convert dates to integers to find the midpoint:

SELECT [year], convert(date,convert(datetime, [avg_date])) as avg_release from
(select [year], avg(CAST(convert(datetime, [release_date]) AS int)) as avg_date from [table]
group by [year]) as a;

And a random other tip – adjusting for inflation:

ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt
= (2013 CPI x original dollar amount) / original CPI

Non-SQL Tips and Tricks

Batch combine files:

This is a huge problem with FEC data. If you have a bunch of files, put them in a folder and hit shift + right click. That brings up the command prompt. Then:
ren *.fec *.fec
copy *.txt target.txt

Splitting first/middle/last in Excel:

I have to do this all the time and for some reason, I can never remember how to do it right.

First: =LEFT(A2, SEARCH(” “,A2,1))

Middle: =IF(IF(ISNUMBER(SEARCH(“* * *”,A2)),”1″,”0″)=”1″,MID(A2,SEARCH(” “,A2,1)+1,SEARCH(” “,A2,SEARCH(” “,A2,1)+1)-SEARCH(” “,A2,1)),””)

Last: =IF(C2=””,RIGHT(A2,LEN(A2)-SEARCH(” “,A2,1)),RIGHT(A2,LEN(A2)-SEARCH(” “,A2,SEARCH(” “,A2,1)+1)))

name first middle last
june jones june jones
june b. jones june b. jones

This link is also helpful: http://office.microsoft.com/en-us/excel-help/split-text-among-columns-by-using-functions-HA010102341.aspx

More SQL Help

I’ve been remiss in updating this because I’ve been lazy. Or studying competitive Scrabble. But here are some few SQL and non-SQL things:

Revised advice on dates in SQL Server:

ALTER TABLE [table_name]
ADD [transaction_date_2] date;

UPDATE [table_name]
SET [transaction_date_2] =
(RIGHT([transaction_date],4)+LEFT([transaction_date],2)+SUBSTRING([transaction_date],3,2));

*This won’t work if there are any typos in the original column (e.g. formatted without a leading zero) but it’s a good trick.

Cleaning data when you don’t want to Export to Excel:

UPDATE [table]
SET [field] = REPLACE([field],’.’,”);

UPDATE [table]
SET [field]= LEFT([field],len([field])-5)
WHERE [field] like (‘% Corp’);

UPDATE [table]
SET [field]= RTRIM([field]);

Join and group by:

SELECT a.[id,] a.[name], b.[field_count] FROM table_1 as a
JOIN
(SELECT [id_2], COUNT([id_2]) as field_count
FROM [table_2]
GROUP BY [id_2]) as b
ON a.[id] = b.[id_2]
ORDER BY field_count desc

*I’ve posted something similar to this before, but hopefully this is a little more clear.

Subquery in the from clause:

SELECT b.[period], COUNT(b.[id]) as total FROM
(SELECT distinct year([start_date]) as [period], [id]
FROM [table]) as b
GROUP BY b.[period]

Alter table:

ALTER TABLE [table_name]
DROP COLUMN [id_new];

ALTER TABLE [table_name]
ADD column_name [datatype];

ALTER TABLE [table_name]
ALTER COLUMN [column_name] [datatype];

*This is always really confusing for me because it’s different in different programs.